[MKSearch-dev] Week 21 round up

Phil Shaw phil at mkdoc.com
Wed Mar 2 19:22:45 GMT 2005


I finished work on the components for a general purpose XHTML indexer 
and started bringing the crawler up to date. Spent lots more time 
picking through government metadata specifications and I'm pleased to 
say I'm over the worst of it. Still some gremlins to sort out.

Best regards,

Phil


Monday
~~~~~~
Created a test suite for the new UKeGMS class, which required some 
additions to the Schema interface and updates to the 
DublinCoreElements and DublinCoreTerms schemas. Testing identified 
some minor oversights in the Dublin Core schemas.

Tuesday
~~~~~~~
Added tests for the new Schema getPrefix methods. Refactored many 
SAX and JSpider classes to share more common code and to create a new 
general purpose XhtmlTripleWriterPlugin. Added test cases for 
XhtmlMetadataFilter and an addSchema method to the RDFHandler types. 
Also renamed various SAX and JSpider classes to simplify the naming.
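
For readers unfamiliar with the approach, here is a minimal sketch of 
how a SAX handler can turn XHTML meta elements into triples by matching 
a schema prefix. It is not the MKSearch code: the class name 
MetaTripleSketch, the extract method, the prefix table and the plain 
string triple format are all hypothetical, and it assumes Dublin Core 
metadata is embedded as meta name="DC.title" style elements. Only the 
two Dublin Core namespaces are registered; an e-GMS schema would need 
its own entry.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of the MKSearch code base.
public class MetaTripleSketch {

    // Assumed mapping from meta-name prefix to schema namespace.
    static final Map<String, String> SCHEMAS = new LinkedHashMap<>();
    static {
        SCHEMAS.put("DC.", "http://purl.org/dc/elements/1.1/");
        SCHEMAS.put("DCTERMS.", "http://purl.org/dc/terms/");
    }

    // Returns "subject predicate object" strings for every meta element
    // whose name starts with a registered prefix (case-sensitive match).
    public static List<String> extract(String subject, String xhtml)
            throws Exception {
        List<String> triples = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default parser is not namespace-aware, so the
                // element name arrives in qName rather than localName.
                if (!"meta".equals(qName)) {
                    return;
                }
                String name = atts.getValue("name");
                String content = atts.getValue("content");
                if (name == null || content == null) {
                    return;
                }
                for (Map.Entry<String, String> e : SCHEMAS.entrySet()) {
                    if (name.startsWith(e.getKey())) {
                        String term = name.substring(e.getKey().length());
                        triples.add(subject + " " + e.getValue() + term
                                + " " + content);
                    }
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return triples;
    }
}
```

The input must be well-formed XML, and a document with a DOCTYPE may 
cause the parser to try to fetch the DTD, so a real crawler would need 
an entity resolver as well.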

Wednesday
~~~~~~~~~
Added the Hansel test coverage tool and supporting BCEL package to 
the optional library directory so that it is properly part of the 
project build system. Updated the JSpider "triple" configuration to 
use the new general purpose XhtmlTripleWriterPlugin with a custom 
UKeGMS Schema and tested it, which required a few minor tweaks. 
Added a target to the Ant build script to create a GNU Servlet API 
JAR and added it to the library JAR target dependencies. Finally, 
created a set of 70 e-GMS test documents, ran the crawler over them 
and extracted no metadata! Something to look at next week...

