[MKSearch-dev] Tomcat on FC4
Phil Shaw
phil at mkdoc.com
Tue Sep 6 16:55:48 BST 2005
On 31 Aug 2005, at 18:45, Phil Shaw wrote:
> On 31 Aug 2005, at 16:35, Chris Croome wrote:
>
> > Compiling JTidy...
> > /usr/local/mksearch/lib-src/jtidy/org/w3c/tidy/Lexer.java:1278: error: Unrecognized character for encoding 'UTF-8'.
> > if (doctype != null)// #473490 - fix by Björn Höhrmann 10 Oct 01
> > ^
> > /usr/local/mksearch/lib-src/jtidy/org/w3c/tidy/Lexer.java:132: error: Type 'Lexer.W3CVersionInfo' not found in declaration of field 'W3CVERSION'.
> > private static final Lexer.W3CVersionInfo[] W3CVERSION = {
> > ^
> > 2 errors
Chris,
I believe this is fixed now: I added an explicit Latin 1 encoding
argument to the compiler.
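For anyone who wants to see the underlying problem outside the build, here is a minimal sketch using iconv. It assumes the source comment was saved as Latin 1, so that "ö" is the single byte 0xF6, which is not valid UTF-8 on its own. The sample string and the names latin1_sample, utf8_status and decoded are illustrative only and do not come from the MKSearch build scripts.

```shell
# Illustrative only: a Latin 1 encoded name, as it might sit in a source comment.
# \366 is octal for byte 0xF6, which is "ö" in ISO-8859-1.
latin1_sample() { printf 'Bj\366rn\n'; }

# Reading the bytes as UTF-8 fails: 0xF6 followed by "r" is not a valid sequence.
if latin1_sample | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    utf8_status=ok
else
    utf8_status=invalid
fi

# Reading the same bytes as ISO-8859-1 succeeds and gives the intended text.
decoded=$(latin1_sample | iconv -f ISO-8859-1 -t UTF-8)

echo "as UTF-8:   $utf8_status"
echo "as Latin 1: $decoded"
```

The exact compiler argument is not shown in this message. As far as the documented options go, gcj accepts --encoding=NAME and javac accepts -encoding NAME, so something along those lines with ISO-8859-1 is the likely shape of the fix.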
This update to JTidy required an update of GNU JAXP to version
1.3. This snapshot was taken just before it was merged with the
main classpath project for GCJ 4, as packaged in FC4.
The 1.3 version of JAXP contained a number of errors, but I have
taken out the offending packages. The same has been done for the
classpath version, so I suppose it's a known issue.
Now all library packages and MKSearch itself compile under GCJ
3.3.4 on Cygwin and Sun JDK 1.4 on Windows. I have just indexed
the main MKSearch site and it seems to run much faster than
before, which may be down to improvements in JTidy.
The site has 2340 URLs; I used 5 spider threads with a throttle of
500 ms. It parsed 1330 documents in just under an hour and
produced about 2.5MB of valid RDF/XML.
I'll test this on FC3 next. If you would like to have a go on FC4,
please do. When it comes to using the $mk_home/bin/gij-jspider.sh
script, you may have to remove the reference to our
version of GNU JAXP, to avoid conflicts with the merged classpath
version. If so, take this segment out of the line with the gij
command, including the colon separator:
:$mk_home/lib/gnu-jaxp.jar
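That edit can also be made with sed rather than by hand. The following is only a sketch: the sample gij line is hypothetical (a.jar and b.jar are placeholders, and the real line in the script will differ), and only the gnu-jaxp.jar segment is taken from this message.

```shell
# Hypothetical gij line: a.jar and b.jar are placeholders, only the
# gnu-jaxp.jar segment comes from the message.
sample_line='gij -classpath $mk_home/lib/a.jar:$mk_home/lib/gnu-jaxp.jar:$mk_home/lib/b.jar'

# Delete the segment together with its leading colon. The expression is
# single-quoted and "$" is escaped, so "$mk_home" is matched as literal text
# and not expanded by the shell.
strip_jaxp() {
    sed 's|:\$mk_home/lib/gnu-jaxp\.jar||'
}

edited_line=$(printf '%s\n' "$sample_line" | strip_jaxp)
echo "$edited_line"
```

To apply the same expression to the script itself, running sed with -i.bak on $mk_home/bin/gij-jspider.sh should work and keeps a backup copy, but check the result before running the spider.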
Best regards,
Phil
--
MKSearch (alpha)
http://www.mksearch.mkdoc.org/
Free, open source metadata search engine with RDF storage and query.