[MKSearch-dev] Setting MKSearch
Chris Croome
chris at webarchitects.co.uk
Mon Jan 9 13:42:33 GMT 2006
Hi Phil
Best do this on the list I think :-)
On Thu 05-Jan-2006 at 06:22:19PM -0000, Phil Shaw wrote:
> On 5 Jan 2006 at 16:49, Chris Croome wrote:
>
> > The site I have set up is here:
> >
> > http://mksearch.dev.webarch.net/mksearch/
> >
> > And it seems to be fine apart from the FQ URIs, like this:
> >
> > http://localhost:8080/mksearch/?property=dcterms.alternative
>
> The system uses the absolute URI as a key for an (as yet
> unimplemented) result cache. Can you send me the server.xml
> configuration you are using? It should be at:
>
> /etc/tomcat5/server.xml
>
> Tomcat does not know the host name it is operating under. Some work on
> the configuration should sort that out.
>
> > Could MKSearch generate these relative to the DocumentRoot, i.e.:
> >
> > /mksearch/?property=dcterms.alternative
>
> This could be done if fix #1 does not solve it. Even so, it would be
> preferable to add a configuration parameter to the MKSearch servlet to
> tell it the domain it's working under.
Fix #1 won't work unless we can also remove the 8080 port number
via server.xml, since 8080 (Tomcat) is not directly accessible -- I want
to do everything via Apache...
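For the archives, the relative form asked about above amounts to dropping the scheme, host and port from the absolute URI. A rough sketch only: to_relative is a made-up helper name, not anything in MKSearch, and it says nothing about how the servlet builds its links.

```shell
# Hypothetical helper (not part of MKSearch): strip "scheme://host:port"
# from an absolute URI, leaving the path and query string untouched.
to_relative() {
    printf '%s\n' "$1" | sed 's#^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*##'
}

# Usage:
#   to_relative 'http://localhost:8080/mksearch/?property=dcterms.alternative'
```

A URI that is already relative passes through unchanged, because the pattern only matches when the string starts with a scheme.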
> > My next question is which indexes to generate: triple and/or
> > rdfstore?
>
> The triple configuration is a good one to check the indexer is
> properly set up. It will write a mirrored set of N-Triple files for
> each Web page at $mk_home/output by default.
>
> If you start at a deep URL on the test site, it won't take long to
> confirm it's working properly, e.g.:
>
> http://test.mksearch.mkdoc.org/link/rel/index.html
Yes, things look fine :-)
> Then you can switch to the rdfstore configuration. This will create an
> RDF/XML serialization at:
>
> $mk_home/output/com.mkdoc.store.LocalStoreManager.rdf
OK, I have done this. I see that each time java-jspider.sh runs it
clobbers com.mkdoc.store.LocalStoreManager.rdf rather than updating or
adding to it? Or perhaps the problem is that I haven't yet done the
multi-site setup stuff...?
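Until it is clear whether the store is meant to be overwritten, one workaround is to keep a timestamped copy before each run. This is only a sketch: backup_store is a hypothetical helper, and it does not merge anything, it just preserves the previous file.

```shell
# Hypothetical helper: copy the given file to "<file>.<timestamp>.bak"
# if it exists; do nothing (and succeed) if it does not.
backup_store() {
    store="$1"
    [ -f "$store" ] || return 0
    cp -p "$store" "$store.$(date +%Y%m%d%H%M%S).bak"
}

# Usage, before running java-jspider.sh:
#   backup_store "$mk_home/output/com.mkdoc.store.LocalStoreManager.rdf"
```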
> If you use this file to replace the current sample index (below) and
> re-build the WAR, it will make a drop-in replacement:
>
> $mk_home/src/app/WEB-INF/rdf/com.mkdoc.store.LocalStoreManager.rdf
OK, I did that, though the spider might have died before it completed
the task. These are the last couple of lines of output:
PANIC! Task net.javacoding.jspider.core.task.work.SpiderHttpURLTask@133c8d0 threw an excpetion!
java.lang.NullPointerException
> Re-run:
>
> $mk_home/bin/war-mksearch.sh
>
> And the updated WAR file is output at:
>
> $mk_home/dist/mksearch.war
OK, I have done this and deployed the .war file:
sudo cp $mk_home/dist/mksearch.war /var/lib/tomcat5/webapps/
sudo /etc/init.d/tomcat5 restart
So the test install now has an index of http://www.webarchitects.co.uk/ at:
http://mksearch.dev.webarch.net/mksearch/
And Subject searches work OK:
http://mksearch.dev.webarch.net/mksearch/HttpQuery?dc.subject=web&type=html&limit=10
But, when I search for documents contributed to by "Chris Croome" I get
no results:
http://mksearch.dev.webarch.net/mksearch/HttpQuery?dc.subject=&dc.contributor=Chris+Croome&type=html&limit=10
But I'm down as a contributor to the front page of the site...
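One quick way to narrow this down is to check whether the name made it into the generated RDF/XML at all. A sketch, assuming the serialization puts the contributor property and its value on the same line (I have not checked that it does); count_contributor is a made-up helper.

```shell
# Hypothetical helper: count lines in an RDF/XML file that mention a
# "contributor" property and also contain the given literal name.
count_contributor() {
    grep -i 'contributor' "$1" | grep -c -F "$2"
}

# Usage:
#   count_contributor "$mk_home/output/com.mkdoc.store.LocalStoreManager.rdf" 'Chris Croome'
```

A count of 0 would point at the indexing side (for example the spider dying early), while a non-zero count would point at the query side.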
> To configure for multiple sites, you will need to edit the rules in
> the configuration files, see these for example:
>
> $mk_home/conf/rdfstore/sites.properties
> $mk_home/conf/rdfstore/sites/default.properties
> $mk_home/conf/rdfstore/sites/mksearch.mkdoc.org.properties
OK... so as a minimum a file like this is needed for each site?
$mk_home/conf/rdfstore/sites/example.org.properties
And doing this addresses the com.mkdoc.store.LocalStoreManager.rdf
clobbering issue?
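If a per-site file is wanted, copying the defaults as a starting point would look something like this. It is a sketch under assumptions: new_site_config is a hypothetical helper, and I don't know whether sites.properties also needs an entry for the new site.

```shell
# Hypothetical helper: seed "<host>.properties" from default.properties
# in the given sites directory, refusing to overwrite an existing file.
new_site_config() {
    sites_dir="$1"
    host="$2"
    target="$sites_dir/$host.properties"
    if [ -e "$target" ]; then
        echo "already exists: $target" >&2
        return 1
    fi
    cp "$sites_dir/default.properties" "$target"
}

# Usage:
#   new_site_config "$mk_home/conf/rdfstore/sites" example.org
```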
Chris
PS I have left in the rest of your email, since having it in the
archives could help people :-)
> More specific rules override the general rules at the base level. You
> can create any number of per-site configurations for throttling,
> robots.txt, user agent, etc. as above, but it's not necessary. If no
> site-specific configuration is declared, the default properties will
> be used.
>
> The JSpider manual has guidance on configuration, but you'll need to
> skim over lots to find the useful stuff. More pointers below.
>
> http://prdownloads.sourceforge.net/j-spider/jspider-0-5-0-doc-user.pdf?download
>
> See this JavaDoc page for an outline of the JSpider configuration
> rules:
>
> https://svn.mkdoc.com/mksearch/trunk/doc/javadoc/jspider/net/javacoding/jspider/mod/rule/package-summary.html
>
> And some rules I wrote:
>
> https://svn.mkdoc.com/mksearch/trunk/doc/javadoc/com/mkdoc/jspider/HtmlAndRdfMimeTypeOnlyRule.html
> https://svn.mkdoc.com/mksearch/trunk/doc/javadoc/com/mkdoc/jspider/RdfMimeTypeOnlyRule.html
--
Chris Croome <chris at webarchitects.co.uk>
web design http://www.webarchitects.co.uk/
web content management http://mkdoc.com/