[MKSearch-dev] Week 2 round-up

Wed Oct 20 12:49:09 BST 2004

I've crammed my work into the first part of this week; these are the 
highlights. All feedback is appreciated.

Best regards,

Phil

Monday
------
The problems I was facing last week building the GNU JAXP package 
under Cygwin were caused by system-dependent "native" classes. These 
classes are not critical to this project (and don't fit with my idea 
of platform independence), so I wrote bash scripts that compile the 
package without the problem classes and bundle the resulting classes 
into a Java archive (JAR).

Used Ant build scripts with the Sun JDK to test the selective 
compilation, but it does not seem possible to run Ant under Cygwin.

Researched open source Java spidering software and added notes to the 
licence page for the time being. Heritrix looks like a strong 
candidate; it is used by the Internet Archive. Other GPL spidering 
tools have components that may be of use.

http://crawler.archive.org/
http://www.archive.org/

Also looked further into Inkling, which is GPL, and RDF API, which is 
under a W3C licence.

https://svn.mkdoc.com/mksearch/doc/licence/index.htm

Tuesday
-------
Wrote further bash scripts to complete my initial development 
environment; they compile the GNU Servlet API, JUnit, MKSearch and the 
project test classes.

Got the latest CVS source for the GNU ClasspathX project and JUnit, 
and tested all the scripts, making minor modifications along the way. 
Working out how GCJ handles classpath settings caused major headaches, 
but I am gradually getting a better grasp of it.

Created a high-level schematic of how I plan to tackle the main 
components of the system:

https://svn.mkdoc.com/mksearch/doc/design/MKSearch%20high%20level%20schematic%20v0.1.png

Wednesday
---------

Updated the compile and JAR scripts to use variable substitution so 
they can be run from any installation directory, and completed a 
first-draft README.txt file explaining how to do this.

https://svn.mkdoc.com/mksearch/bin/README.txt

It would be really helpful if someone could check out the project and 
test these instructions and scripts on a different machine. You only 
need to make a couple of environment settings and everything should 
run straight out of the box. The two key software dependencies are 
listed in the README file.

Created a first draft of AbstractXMLReader, which will be the basis 
for parsing XHTML output from JTidy into RDF objects.
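The draft class itself isn't shown here, so what follows is only a rough sketch under one assumption: that AbstractXMLReader is an abstract implementation of the SAX2 org.xml.sax.XMLReader interface. Under that assumption the base class would hold the handlers and the two core SAX features, and a concrete reader (for example one driving JTidy) would only need to implement parse(InputSource). DemoReader, CountingHandler and AbstractXMLReaderDemo are hypothetical names made up for illustration. DemoReader ignores its input and reports a single empty XHTML element, standing in for a JTidy-based reader.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.*;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo entry point: wires a concrete reader to a handler.
class AbstractXMLReaderDemo {
    public static void main(String[] args) throws IOException, SAXException {
        CountingHandler handler = new CountingHandler();
        XMLReader reader = new DemoReader();
        reader.setContentHandler(handler);
        reader.parse("urn:example:demo"); // parse(String) wraps the id in an InputSource
        System.out.println("elements seen: " + handler.elements);
    }
}

// Sketch (assumed design): stores handlers and features so that subclasses
// only have to implement parse(InputSource).
abstract class AbstractXMLReader implements XMLReader {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes";

    private final Map<String, Boolean> features = new HashMap<>();
    private ContentHandler contentHandler;
    private DTDHandler dtdHandler;
    private EntityResolver entityResolver;
    private ErrorHandler errorHandler;

    protected AbstractXMLReader() {
        // Every SAX2 reader must recognise these two core features.
        features.put(NAMESPACES, Boolean.TRUE);
        features.put(NAMESPACE_PREFIXES, Boolean.FALSE);
    }

    public boolean getFeature(String name) throws SAXNotRecognizedException, SAXNotSupportedException {
        Boolean value = features.get(name);
        if (value == null) throw new SAXNotRecognizedException(name);
        return value;
    }

    public void setFeature(String name, boolean value) throws SAXNotRecognizedException, SAXNotSupportedException {
        if (!features.containsKey(name)) throw new SAXNotRecognizedException(name);
        features.put(name, value);
    }

    // This base class recognises no properties; subclasses may override both methods.
    public Object getProperty(String name) throws SAXNotRecognizedException, SAXNotSupportedException {
        throw new SAXNotRecognizedException(name);
    }
    public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException {
        throw new SAXNotRecognizedException(name);
    }

    public void setContentHandler(ContentHandler handler) { contentHandler = handler; }
    public ContentHandler getContentHandler() { return contentHandler; }
    public void setDTDHandler(DTDHandler handler) { dtdHandler = handler; }
    public DTDHandler getDTDHandler() { return dtdHandler; }
    public void setEntityResolver(EntityResolver resolver) { entityResolver = resolver; }
    public EntityResolver getEntityResolver() { return entityResolver; }
    public void setErrorHandler(ErrorHandler handler) { errorHandler = handler; }
    public ErrorHandler getErrorHandler() { return errorHandler; }

    // Convenience overload required by XMLReader: delegate to the InputSource form.
    public void parse(String systemId) throws IOException, SAXException {
        parse(new InputSource(systemId));
    }

    // The only method a concrete reader has to supply.
    public abstract void parse(InputSource input) throws IOException, SAXException;
}

// Hypothetical stand-in for a real reader: ignores its input and reports
// a single empty XHTML <html> element.
class DemoReader extends AbstractXMLReader {
    private static final String XHTML = "http://www.w3.org/1999/xhtml";

    @Override
    public void parse(InputSource input) throws IOException, SAXException {
        ContentHandler handler = getContentHandler();
        if (handler == null) return; // no handler registered, nothing to report
        handler.startDocument();
        handler.startElement(XHTML, "html", "html", new AttributesImpl());
        handler.endElement(XHTML, "html", "html");
        handler.endDocument();
    }
}

// Counts element starts; a real handler might build RDF objects instead.
class CountingHandler extends DefaultHandler {
    int elements = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }
}
```

Keeping parse() abstract means the handler and feature bookkeeping is written once, and any SAX ContentHandler (such as one that builds RDF objects) can be plugged into whichever concrete reader ends up wrapping JTidy.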