(Fwd) Re: [MKSearch-dev] Jspider postgres problems

Wed May 31 23:11:57 BST 2006

Should have posted this to the mailing list too...

------- Forwarded message follows -------
From:           	Phil Shaw <phil at mkdoc.com>
To:             	Chris Croome <chris at webarchitects.co.uk>
Subject:        	Re: [MKSearch-dev] Jspider postgres problems
Send reply to:  	phil at mkdoc.com
Date sent:      	Sun, 26 Mar 2006 09:55:13 +0100

On 8 Feb 2006 at 17:05, Chris Croome wrote:

> Right, I had forgotten to create files for these domains, now I have,
> they are all the same and are in:

Chris,

Sorry it's taken so long to get back to this. I think the problem is 
caused by the base-site-only rule. I set it for my test runs against 
our static HTML test site, and it has been carried over into the 
other configuration sets.

site.rules.parser.count=1
site.rules.parser.1.class=net.javacoding.jspider.mod.rule.BaseSiteOnlyRule

For a single run of JSpider there is only one base site: the domain 
of the original URL it is given at the command line. If this rule is 
applied to the parser, only HTML on the base site will be parsed for 
more links to follow. Any external sites will be spidered (to allow 
checks for 404 etc.) but not parsed for further links. So you only 
get metadata for individual linked resources on secondary sites -- I 
think this is the case here, isn't it?
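
As a rough illustration of what a base-site-only check amounts to, 
here is a minimal Java sketch. It is not JSpider's actual 
BaseSiteOnlyRule code: the class BaseSiteCheck, the method 
isOnBaseSite and the example.org URLs are made up for illustration, 
and it assumes "same site" simply means "same host name".

```java
import java.net.URI;

// Hypothetical illustration only; not taken from the JSpider source.
class BaseSiteCheck {

    // Returns true when candidateUrl is on the same host as baseUrl.
    // Assumption: a "site" is identified by its host name alone.
    static boolean isOnBaseSite(String baseUrl, String candidateUrl) {
        String baseHost = URI.create(baseUrl).getHost();
        String candidateHost = URI.create(candidateUrl).getHost();
        // equalsIgnoreCase(null) is false, so a missing host never matches.
        return baseHost != null && baseHost.equalsIgnoreCase(candidateHost);
    }

    public static void main(String[] args) {
        String base = "http://www.example.org/index.html";
        // Same host: under the rule, this page would be parsed for links.
        System.out.println(isOnBaseSite(base, "http://www.example.org/about.html"));
        // Different host: fetched and checked, but not parsed for links.
        System.out.println(isOnBaseSite(base, "http://www.example.com/page.html"));
    }
}
```

With the rule applied to the parser, only resources for which a check 
like this succeeds would be parsed for further links.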

To have a group of sites indexed fully, delete the 
site.rules.parser.1.class line and set the rule count to zero in 
every site configuration file:

site.rules.parser.count=0

The default site properties configuration will ensure that sites that 
do not have a configuration of their own will not be parsed.

Best regards,

Phil

------- End of forwarded message -------
--
MKSearch (beta)

http://www.mksearch.mkdoc.org/

Free, open source metadata search engine with RDF storage and query.