From phil at mkdoc.com Sun Mar 9 10:30:45 2008
From: phil at mkdoc.com (Phil Shaw)
Date: Sun Mar 9 10:31:16 2008
Subject: [MKSearch-dev] Server reply unparseable?
In-Reply-To: <47C33F02.5080404@bu.edu>
Message-ID: <47D3BC55.28403.1030697F@phil.mkdoc.com>
On 25 Feb 2008 at 17:19, Jeff Albro wrote:
> I'm trying to spider a local page, and am getting:
>
> java.io.IOException: Server reply was unparseable: "-//IETF//DTD HTML 2.0//EN">
> Server reply was unparseable:
> [Plugin] Error event comment: resource http://emt.bu.edu couldn't be
> fetched [0]
> [Plugin] 0 - ERROR !!!http://emt.bu.edu
>
> It wouldn't surprise me if there were an error in this page's doctype,
> but is there a way to ignore the error?
Jeff,
Sorry for the delay in responding. It would help if you could outline
how you built and configured MKSearch, and which source you used.
All HTML is passed through JTidy to convert it to XHTML before
processing. XML validation and indexing only occur _after_ the
source has passed through JTidy, so this looks like JSpider may be
having trouble processing the source for link extraction.
The start URL you have given, without a trailing slash, results in a
bad request response from the server, which will probably be handled
as an error by JSpider:
    400 Bad Request

    Bad Request
    Your browser sent a request that this server could not understand.

    Apache/2.0.52 (CentOS) Server at imc-sed.bu.edu Port 80
Web browsers will apparently redirect to the default base URL
http://emt.bu.edu/ with a trailing slash. Try starting the indexer
with that URL instead.
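If start URLs come from a list or a script, the trailing slash can be
added before they reach the spider. The following is only a minimal
sketch of that idea: the class and method names (StartUrl,
withRootPath) are made up for illustration and are not part of
MKSearch or JSpider. It uses only java.net.URI from the standard
library and leaves any URL that already has a path untouched.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of MKSearch or JSpider.
class StartUrl {

    /**
     * Returns the URL with "/" as its path when the path is empty,
     * e.g. http://emt.bu.edu becomes http://emt.bu.edu/.
     * URLs that already have a path, or have no host, are returned unchanged.
     */
    static String withRootPath(String url) {
        URI uri = URI.create(url);
        if (uri.getHost() == null || !uri.getRawPath().isEmpty()) {
            return url;
        }
        try {
            // Rebuild the URI with an explicit root path.
            return new URI(uri.getScheme(), uri.getAuthority(), "/",
                           uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withRootPath("http://emt.bu.edu"));
    }
}
```

Run as is, this prints http://emt.bu.edu/, which is the form to pass
to the indexer.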
The target site's markup is declared as XHTML 1.0 Transitional, but
does not validate on many counts. However, this should _probably_ be
cleaned up by JTidy.
It would probably be best to ensure your installation can process
our test site content before trying elsewhere.
Best regards,
Phil
--
MKSearch (beta)
http://www.mksearch.mkdoc.org/
Free, open source metadata search engine with RDF storage
and query.
From phil at mkdoc.com Sun Mar 9 10:39:03 2008
From: phil at mkdoc.com (Phil Shaw)
Date: Sun Mar 9 10:39:27 2008
Subject: [MKSearch-dev] http://test.mksearch.mkdoc.org/ down?
In-Reply-To: <47C33831.8010207@bu.edu>
References: <47BB4691.25583.583EF9B8@phil.mkdoc.com>
Message-ID: <47D3BE47.11391.10380448@phil.mkdoc.com>
On Monday, February 25, 2008 at 16:50, Jeff Albro wrote:
> I can confirm that the site is back up... but it is still not working
> for me... How sensitive it is to java version?
> I'm using:
>
> export mk_build=/home/jalbro/mksearch/build
> export mk_home=/home/jalbro/mksearch
> #export CLASSPATH=/usr/share/java/libgcj-3.4.1.jar
> export CLASSPATH=/usr/share/java/libgcj-3.4.3.jar
>
> And I get the error below. I also got it with gij-jspider.
> exception during spidering
> java.lang.ClassCastException: java.util.List
> java.util.List
Jeff,
This exception does suggest a Java version conflict. It may be that
the java.util.List version the source was compiled against is not
compatible with your runtime system.
To ensure your installation is compatible with your runtime
environment, follow these instructions on compiling with GCJ on
Linux:
http://www.mksearch.mkdoc.org/howto/build-mksearch-with-gcj/
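Before rebuilding, it may be worth confirming which runtime is
actually picking up the classes. The snippet below is a small sketch
for that purpose only; the class name RuntimeReport is made up here
and is not part of MKSearch. It reads standard Java system properties.
Exactly which values a GCJ/gij runtime reports is not something this
thread confirms, so treat the output as a hint rather than proof.

```java
// Hypothetical diagnostic, not part of MKSearch.
class RuntimeReport {

    /** Builds a one-line summary of the running Java environment. */
    static String describe() {
        return System.getProperty("java.vm.name")
                + " " + System.getProperty("java.version")
                + " (vendor: " + System.getProperty("java.vendor")
                + ", class file version: "
                + System.getProperty("java.class.version") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Compile and run it with the same compiler, runtime and CLASSPATH
settings used for the spider, then compare the reported version with
the one the MKSearch classes were built against.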
Hope this helps.
Phil