[MKSearch-dev] Week 3 round up

Phil Shaw phil at mkdoc.com
Thu Oct 28 15:30:40 BST 2004


I'm doing a long week to complete my days for October. I haven't 
finished yet but I wanted to write up what I've done before I forget 
it. I'm about to start delving more deeply into spider source code 
and feel I need to clear the decks...

As usual, all comments are welcome.

Best regards,

Phil


Monday
------
I've been trying to create a build environment in which GNU/Linux and 
Cygwin usage is as close as possible to Windows usage. To date I have 
been having problems running Ant and JUnit -- JUnit compiles but does 
not run with GIJ and the error messages aren't terribly helpful!

I spent some time trying to compile a code coverage test framework I 
like to use called Hansel, which depends on some Apache tools, and 
this is where I decided to draw the line and take a layered approach 
to the build framework.

I'll continue to investigate the problems running Ant/JUnit with the 
GNU tools, but I have separated out the Hansel tests and, for the 
time being, will keep running the other code conformance checks I 
like under Ant/Windows.

http://hansel.sourceforge.net/
http://checkstyle.sourceforge.net/
http://pmd.sourceforge.net/

Started a more detailed package dependency analysis for the Web 
spiders and RDF frameworks, to check where they rely on non-GPL code.

Modified jar scripts to take an implicit reference to the jar tool.


Tuesday
-------
Further package dependency analysis and licence research for Web 
spiders and RDF frameworks. Completed the AbstractXMLReader class, 
standard JUnit tests and Hansel coverage test. Updated the Ant build 
script to the new layered testing scheme.


Wednesday
---------
Added JTidy to the library source and created scripts to compile, 
archive and run under GNU/Linux and Windows. Completed dependency 
analysis for RDF frameworks and excluded various tools from 
consideration.

https://svn.mkdoc.com/mksearch/doc/licence/index.htm

Identified James Clark's XT as a suitable XSLT processor, if 
required. Confirmed licence compatibility with GNU.

http://www.blnz.com/xt/index.html


Thursday
--------
Completed dependency analysis for Web spiders and excluded various 
tools from consideration. Did some further test runs of JUnit under 
GIJ and identified a known compatibility bug with the Sun Java 
interpreter: running GCJ-compiled code that includes inner classes 
requires version 1.4 or above of the Sun interpreter.
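
A minimal sketch of the kind of probe that could be used to check 
this behaviour on a given interpreter. The class name InnerClassProbe 
and its methods are hypothetical and not part of the MKSearch source; 
the idea is to compile it with GCJ to bytecode and run the result 
under each interpreter version in turn.

```java
// Hypothetical probe class; the name is illustrative, not part of MKSearch.
class InnerClassProbe {

    private int count = 0;

    // Non-static inner class: compiled to a separate class file,
    // InnerClassProbe$Counter, holding a reference to its outer instance.
    class Counter {
        void increment() {
            count++; // updates a private field of the enclosing class
        }
    }

    // Creates an inner-class instance and calls it the given number of times.
    static int run(int times) {
        InnerClassProbe outer = new InnerClassProbe();
        Counter counter = outer.new Counter();
        for (int i = 0; i < times; i++) {
            counter.increment();
        }
        return outer.count;
    }

    public static void main(String[] args) {
        System.out.println("count = " + run(3));
    }
}
```

If the inner class loads and runs correctly, run(n) returns n; a 
failure at class-loading time would point to the interpreter rather 
than to the test code.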

I felt I needed to get a better grasp of the terms used in the 
project proposal, so I updated the high level schematic with 
relations between the descriptive components -- Spider, Indexer, 
Checker, etc. -- and the software components. Saved a print 
resolution PNG and initial notes:

https://svn.mkdoc.com/mksearch/doc/design/MKSearch%20high%20level%20schematic%20v0.2.png

http://www.mksearch.mkdoc.org/howto/system-components/

