[MKSearch-dev] Jspider postgres problems

Chris Croome chris at webarchitects.co.uk
Wed Feb 8 16:21:22 GMT 2006


Hi

On Wed 08-Feb-2006 at 04:08:45PM -0000, Phil Shaw wrote:
> On 8 Feb 2006 at 13:29, Chris Croome wrote:
> 
> > This is the script:
> > 
> >   #!/bin/bash
> >   SITES="tre.ngfl.gov.uk ferl.becta.org.uk www.aclearn.net"
> >   for a in $SITES
> >   do
> >     "$mk_home/bin/java-jspider-pgsql.sh" "http://$a/" rdfstoredb
> >   done
> 
> It's okay to use this type of script with database storage, but you
> are starting and stopping the JVM for each site. If you were to set up
> a start page of index links as we did before, there would be one JVM
> start-up and the spider could make better use of its threading
> capability. Not critical.

Ah, good point.
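
For reference, the single start page approach could be sketched roughly
as below. This is only a sketch under assumptions: the make_start_page
helper, the START_PAGE file location and the START_URL address are all
placeholders that are not part of MKSearch or JSpider, and the page
would need to be published somewhere the spider can fetch it. Whether
the spider then follows the links out to each site depends on the
spidering rules in use.

```shell
#!/bin/bash
# Sketch: build one start page that links to every site, so the spider
# can be launched once instead of once per site.
SITES="tre.ngfl.gov.uk ferl.becta.org.uk www.aclearn.net"

# Print a minimal HTML page with one link per host (hypothetical helper).
make_start_page() {
  echo "<html><body><ul>"
  for host in "$@"; do
    echo "  <li><a href=\"http://$host/\">$host</a></li>"
  done
  echo "</ul></body></html>"
}

# Assumption: START_PAGE is a file published by a web server, and
# START_URL is the address it is served at. Both are placeholders.
START_PAGE="${START_PAGE:-/tmp/spider-start.html}"
START_URL="${START_URL:-http://localhost/spider-start.html}"

# $SITES is deliberately unquoted so it splits into one argument per host.
make_start_page $SITES > "$START_PAGE"

# One JVM start-up for all sites; skipped if the wrapper script is absent.
if [ -x "${mk_home:-}/bin/java-jspider-pgsql.sh" ]; then
  "$mk_home/bin/java-jspider-pgsql.sh" "$START_URL" rdfstoredb
fi
```

The wrapper script is called with the same two arguments as in the
original loop, just with the start page URL in place of a site URL.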

> > And it runs for a while with various warnings and then ends with
> > this:
> > 
> >   PANIC! Task net.javacoding.jspider.core.task.work.SpiderHttpURLTask at 1231fd8 threw an excpetion!  java.lang.NullPointerException
> 
> I have occasionally seen this type of error before and I think it's to
> do with thread scheduling during the spider shutdown process. I once
> traced through the code and found the engine tries to clean up a
> thread that has just terminated itself. I don't think it affects the
> indexing because it's all over by then.

OK.

> That's not very much data, but I suspect it's more likely to be a
> configuration issue governing which pages should be indexed. Take a
> look at the site properties configuration and the rules applied to
> spidering. Perhaps there is no configuration for this domain and it is
> only spidering the first page it encounters.

D'oh, of course... I forgot that step... I'm running it again.
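
A quick way to spot a missing site entry before spidering might look
like the sketch below. It assumes only that a configured host appears
by name somewhere in a plain-text site configuration file; the
hosts_without_config helper and the SITES_CONF variable are
hypothetical, and the real file location and format should be checked
against the installed configuration.

```shell
#!/bin/bash
SITES="tre.ngfl.gov.uk ferl.becta.org.uk www.aclearn.net"

# Print each host that is not mentioned in the given file. Hosts are
# also printed if the file cannot be read at all.
# Usage: hosts_without_config FILE HOST...
hosts_without_config() {
  local file="$1"
  shift
  for host in "$@"; do
    # -F matches the host as a fixed string, -q suppresses grep output.
    grep -qF -- "$host" "$file" 2>/dev/null || echo "$host"
  done
}

# Assumption: SITES_CONF is set to the path of the site configuration
# file. Nothing is checked if it is unset.
if [ -n "${SITES_CONF:-}" ]; then
  hosts_without_config "$SITES_CONF" $SITES
fi
```

Any host it prints has no mention in the file, which would fit the
symptom of only the first page being spidered.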

Chris

-- 
Chris Croome                               <chris at webarchitects.co.uk>
web design                             http://www.webarchitects.co.uk/ 
web content management                               http://mkdoc.com/   

