[MKSearch-dev] Jspider postgres problems

Chris Croome chris at webarchitects.co.uk
Wed Feb 8 17:05:42 GMT 2006


Hi

On Wed 08-Feb-2006 at 04:21:22PM +0000, Chris Croome wrote:
> 
> > That's not very much data, but it's more likely to be a
> > configuration issue governing which pages should be indexed I
> > suspect. Take a look at the site properties configuration and the
> > rules applied to spidering. Perhaps there is no configuration for
> > this domain and it is only spidering the first page it encounters.

Right, I had forgotten to create configuration files for these domains.
Now I have; they are all identical and are in:

  $mk_home/conf/rdfstoredb/sites/

And I've attached them to this email.
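To double-check that each file declares the rules I think it does, a
small Java program can load it with `java.util.Properties` and list the
rules. This is only a minimal sketch under assumptions: `SiteRulesSummary`
is a hypothetical helper, not part of JSpider or MKSearch. It relies only
on the `site.rules.<kind>.count` / `site.rules.<kind>.<n>.class` key
layout visible in the attached files, not on how JSpider itself
interprets those keys.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical diagnostic helper (not part of JSpider or MKSearch).
 * Lists the spider and parser rules declared in a site .properties file,
 * assuming the "site.rules.<kind>.count" and "site.rules.<kind>.<n>.class"
 * key layout used in the attached files.
 */
public class SiteRulesSummary {

    /** Loads a .properties document from any Reader. */
    public static Properties load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        return p;
    }

    /** Returns the rule classes for one kind of rule ("spider" or "parser"), in index order. */
    public static List<String> rules(Properties p, String kind) {
        String prefix = "site.rules." + kind + ".";
        int count = Integer.parseInt(p.getProperty(prefix + "count", "0").trim());
        List<String> classes = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            String cls = p.getProperty(prefix + i + ".class");
            // Flag numbered keys that the count promises but the file lacks.
            classes.add(cls == null ? "<missing " + prefix + i + ".class>" : cls.trim());
        }
        return classes;
    }

    private static void report(String name, Properties p) {
        System.out.println(name);
        System.out.println("  site.handle  = " + p.getProperty("site.handle"));
        System.out.println("  spider rules = " + rules(p, "spider"));
        System.out.println("  parser rules = " + rules(p, "parser"));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No paths given: summarise the rule settings from the attached files.
            String sample = String.join("\n",
                "site.handle=true",
                "site.rules.spider.count=0",
                "site.rules.parser.count=1",
                "site.rules.parser.1.class=net.javacoding.jspider.mod.rule.BaseSiteOnlyRule");
            report("(sample)", load(new StringReader(sample)));
            return;
        }
        // Otherwise summarise each site file passed on the command line.
        for (String path : args) {
            try (Reader in = Files.newBufferedReader(Paths.get(path))) {
                report(path, load(in));
            }
        }
    }
}
```

Pass one or more files from `$mk_home/conf/rdfstoredb/sites/` as
arguments. For the settings in the attached files it reports no spider
rules and a single `BaseSiteOnlyRule` parser rule, which at least confirms
the files parse the way they read.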

Running the spidering process as before still results in the same amount
of data being indexed, so I guess I have forgotten something else?

Chris

-- 
Chris Croome                               <chris at webarchitects.co.uk>
web design                             http://www.webarchitects.co.uk/ 
web content management                               http://mkdoc.com/   
-------------- next part --------------
# ---------------------------------------------------------
# MKSearch test site configuration
# ---------------------------------------------------------
#
# Created:  2004-11-16
# Issued:   2004-12-09
# Modified: 2005-03-29
#
###########################################################

site.handle=true

# -----------------------------------------------------------------------------
# Proxy Configuration
# -----------------------------------------------------------------------------

site.proxy.use=true

# -----------------------------------------------------------------------------
# Throttling Configuration
# -----------------------------------------------------------------------------

site.throttle.provider=net.javacoding.jspider.core.throttle.impl.DistributedLoadThrottleProvider
site.throttle.config.interval=500

# -----------------------------------------------------------------------------
# Cookie Configuration
# -----------------------------------------------------------------------------

site.cookies.use=true

# -----------------------------------------------------------------------------
# Robots.txt configuration
# -----------------------------------------------------------------------------

site.robotstxt.fetch=true
site.robotstxt.obey=true

# -----------------------------------------------------------------------------
# User Agent configuration
# -----------------------------------------------------------------------------

#site.userAgent=JSpider (http://j-spider.sourceforge.net)

# -----------------------------------------------------------------------------
# Rules Configuration
# -----------------------------------------------------------------------------

site.rules.spider.count=0

site.rules.parser.count=1
site.rules.parser.1.class=net.javacoding.jspider.mod.rule.BaseSiteOnlyRule

