(Fwd) Re: [MKSearch-dev] Jspider postgres problems
Phil Shaw
phil at mkdoc.com
Wed May 31 23:11:57 BST 2006
Should have posted this to the mailing list too...
------- Forwarded message follows -------
From: Phil Shaw <phil at mkdoc.com>
To: Chris Croome <chris at webarchitects.co.uk>
Subject: Re: [MKSearch-dev] Jspider postgres problems
Send reply to: phil at mkdoc.com
Date sent: Sun, 26 Mar 2006 09:55:13 +0100
On 8 Feb 2006 at 17:05, Chris Croome wrote:
> Right, I had forgotten to create files for these domains, now I have,
> they are all the same and are in:
Chris,
Sorry it's taken so long to get back to this. I think the problem is
to do with the base site only rule. This was set for my test runs
against our static HTML test site and has been carried over into
other configuration sets.
site.rules.parser.count=1
site.rules.parser.1.class=net.javacoding.jspider.mod.rule.BaseSiteOnlyRule
For a single run of JSpider there is only one base site: the domain
of the original URL it is given at the command line. If this rule is
applied to the parser, only HTML on the base site will be parsed for
more links to follow. Any external sites will be spidered (to allow
checks for 404 etc.) but not parsed for further links. So you only
get metadata for individual linked resources on secondary sites -- I
think this is the case here, isn't it?
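To make the effect of the rule concrete, here is a rough sketch in Python of the decision as described above. It is not JSpider code: the function names are made up for illustration, and comparing host names is an assumption about what counts as the "base site".

```python
from urllib.parse import urlsplit


def same_site(url, base_url):
    """Return True if url is on the same host as base_url (assumed definition of 'base site')."""
    return urlsplit(url).hostname == urlsplit(base_url).hostname


def plan(url, base_url):
    """Return (fetch, parse) for a discovered URL when a base-site-only parser rule applies.

    Every URL is fetched, so broken links can still be reported,
    but only pages on the base site are parsed for further links.
    """
    return (True, same_site(url, base_url))
```

With a base URL of http://example.org/, a page on example.org is both fetched and parsed, while a page on any other host is fetched but its links are never followed.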
To have a group of sites indexed fully, delete the second parser
configuration line (the one naming the rule class) and set the rule
count to zero in every site configuration file:
site.rules.parser.count=0
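If there are many site configuration files, the edit could be scripted. The following is only a sketch in Python, assuming each file uses simple one-line key=value entries with no backslash continuations. The function name is hypothetical and it works on the text of one file, so reading and writing the files is left to the caller.

```python
import re

# Lines such as site.rules.parser.1.class=... (any numbered parser rule entry)
PARSER_RULE = re.compile(r"^\s*site\.rules\.parser\.\d+\.")
# The line that declares how many parser rules follow
PARSER_COUNT = re.compile(r"^\s*site\.rules\.parser\.count\s*=")


def disable_parser_rules(text):
    """Drop numbered parser rule lines and set the parser rule count to zero."""
    out = []
    for line in text.splitlines():
        if PARSER_RULE.match(line):
            continue  # remove the rule definition itself
        if PARSER_COUNT.match(line):
            line = "site.rules.parser.count=0"
        out.append(line)
    return "\n".join(out) + "\n"
```

Other rule groups in the file are left untouched, since only keys beginning with site.rules.parser are matched. Keep a backup of each file before overwriting it.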
The default site properties configuration will ensure that sites that
do not have a configuration of their own will not be parsed.
Best regards,
Phil
------- End of forwarded message -------
--
MKSearch (beta)
http://www.mksearch.mkdoc.org/
Free, open source metadata search engine with RDF storage and query.