[MKSearch-dev] Week 2 round up
Phil Shaw
phil at mkdoc.com
Wed Oct 20 12:49:09 BST 2004
I've crammed my work into the first part of this week; these are the
highlights. All feedback appreciated.
Best regards,
Phil
Monday
------
The problems I was facing last week building the GNU JAXP package
under Cygwin related to system-dependent "native" classes. These
classes are not critical to this project (and don't fit with my idea
of platform independence), so I wrote bash scripts to compile this
package without the problem classes and package them in a Java
archive (JAR).
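A minimal sketch of that kind of selective build, for illustration only.
It is not the actual script: the function names, the '/native/' exclusion
pattern, the directory layout and the use of javac are all assumptions,
and source paths containing spaces are not handled.

```shell
#!/bin/bash
# Hypothetical sketch of a selective compile-and-package step.
# All names and paths here are illustrative assumptions.

# Print every .java file under $1 whose path does not match any of the
# remaining arguments (each treated as an extended regular expression).
list_sources() {
    local src_dir=$1
    shift
    local files
    local pattern
    files=$(find "$src_dir" -name '*.java' | sort)
    for pattern in "$@"; do
        files=$(printf '%s\n' "$files" | grep -Ev -- "$pattern")
    done
    printf '%s\n' "$files"
}

# Compile the selected sources and package the classes in a JAR.
# Arguments: source dir, build dir, JAR file, then exclusion patterns.
build_jar() {
    local src_dir=$1
    local out_dir=$2
    local jar_file=$3
    shift 3
    mkdir -p "$out_dir/classes"
    list_sources "$src_dir" "$@" > "$out_dir/sources.txt"
    # javac reads the list of files to compile from an @argfile.
    javac -d "$out_dir/classes" @"$out_dir/sources.txt"
    jar cf "$jar_file" -C "$out_dir/classes" .
}

# Example call (hypothetical paths):
#   build_jar gnujaxp/source build gnujaxp.jar '/native/'
```

Keeping the file selection in its own function means the exclusion list
can be checked without running a compiler.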
Used Ant build scripts to test the selective compilation using the
Sun JDK, but it does not seem possible to run Ant under Cygwin.
Researched open source Java spidering software and added notes to the
licence page for the time being. Heritrix looks a strong candidate; it
is used by the Internet Archive. Other GPL spidering tools have
components that may be of use.
http://crawler.archive.org/
http://www.archive.org/
Also looked further into Inkling, which is GPL-licensed, and RDF API,
which is under a W3C licence.
https://svn.mkdoc.com/mksearch/doc/licence/index.htm
Tuesday
-------
Wrote further bash scripts to complete my initial development
environment, to compile the GNU Servlet API, JUnit, MKSearch and
project test classes.
Got the latest CVS source from the GNU ClasspathX project and JUnit
and tested all scripts, making minor modifications along the way.
Major headaches working out how GCJ works with classpath settings,
but gradually getting a better grasp.
Created a high level schematic of how I am planning to tackle the
main components of the system:
https://svn.mkdoc.com/mksearch/doc/design/MKSearch%20high%20level%20schematic%20v0.1.png
Wednesday
---------
Updated the compile and JAR scripts to use variable substitution so
they can be run from any arbitrary installation directory, and
completed a first draft README.txt file to explain how to do it.
https://svn.mkdoc.com/mksearch/bin/README.txt
It would be really helpful if someone could check out the project and
test these instructions and the scripts on a different machine. You
only need make a couple of environment settings and everything should
run straight out of the box. The two key software dependencies are
noted in the README file.
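As a rough illustration of the variable substitution approach, the sketch
below resolves an installation root and builds a classpath from it. The
MKSEARCH_HOME variable, the bin/ and lib/ layout and the function names
are assumptions for the example, not the names used in the real scripts;
the README is the reference for the actual settings.

```shell
#!/bin/bash
# Hypothetical sketch: make a script independent of where it is installed.
# MKSEARCH_HOME and the bin/ + lib/ layout are assumed names.

# Use MKSEARCH_HOME if the caller set it; otherwise fall back to the
# parent of the directory given as $1 (normally the script's bin/ dir).
resolve_home() {
    local bin_dir=$1
    if [ -n "${MKSEARCH_HOME:-}" ]; then
        printf '%s\n' "$MKSEARCH_HOME"
    else
        (cd "$bin_dir/.." && pwd)
    fi
}

# Build a colon-separated classpath from every JAR under <home>/lib.
# The colon separator assumes a Unix-style tool chain.
build_classpath() {
    local home=$1
    local cp=
    local jar
    for jar in "$home"/lib/*.jar; do
        [ -e "$jar" ] || continue   # skip the unexpanded glob if lib/ is empty
        cp=${cp:+$cp:}$jar
    done
    printf '%s\n' "$cp"
}

# Example use at the top of a compile script:
#   MKSEARCH_HOME=$(resolve_home "$(dirname "$0")")
#   CLASSPATH=$(build_classpath "$MKSEARCH_HOME")
```

With this pattern, setting one environment variable overrides the
location, and leaving it unset lets the script work from wherever the
checkout happens to live.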
Created a first draft AbstractXMLReader, which will be the basis for
parsing XHTML output from JTidy to RDF objects.