[Petal] Problems with HTML Parser

Fri, 9 Aug 2002 17:14:41 +0100

Hi,

> So really, you're outputting valid xml right now which is what is causing 
> the problem with the browsers that don't understand <link></link> tagsets? 

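Yes. <link></link> is perfectly well-formed XML; however, it's not the
form XHTML's HTML compatibility guidelines recommend for empty elements
(that would be <link />), and many browsers trip over it. XHTML is just
a subset of XML.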

> I can see the benefit of adding INPUT and OUTPUT settings which would help 
> clear up the confusion while adding the possibility for lots more bugs 
> <g>. But if you could provide a flowchart of how Petal processes a 
> template and outputs the results, you may improve the chances of getting 
> bug smashers helping you out.

Okay... When I built Petal I thought it would be cool to be able to
support multiple syntaxes. However, directly implementing all the
syntaxes would have been quite painful.

I chose to use <?petal:var name="something"?> as my base, canonical
syntax because it's extremely easy to tokenize and to turn into Perl
code (which is what the Petal::CodeGenerator does).
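
To give an idea of why that syntax is easy to tokenize, here is a
minimal sketch in plain Perl. It is not Petal's actual tokenizer: the
tokenize_canonical() function and the token hash layout are made up for
illustration, and it only assumes that canonical instructions look like
<?petal:OP key="value"?>.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Petal): split a canonical template
# into plain text chunks and <?petal:...?> processing instructions.
sub tokenize_canonical {
    my ($source) = @_;
    my @tokens;

    # split() with a capturing group keeps the delimiters in the result.
    for my $chunk (split /(<\?petal:.*?\?>)/s, $source) {
        next if $chunk eq '';
        if ($chunk =~ /^<\?petal:(\w+)\s*(.*?)\s*\?>$/s) {
            my ($op, $rest) = ($1, $2);
            my %attr;
            # Collect key="value" pairs inside the instruction.
            $attr{$1} = $2 while $rest =~ /(\w+)="([^"]*)"/g;
            push @tokens, { type => 'pi', op => $op, attr => \%attr };
        }
        else {
            push @tokens, { type => 'text', value => $chunk };
        }
    }
    return @tokens;
}

my @tokens = tokenize_canonical('<p>Hello <?petal:var name="user"?>!</p>');
for my $t (@tokens) {
    if ($t->{type} eq 'pi') {
        print "PI   $t->{op} name=$t->{attr}{name}\n";
    }
    else {
        print "TEXT $t->{value}\n";
    }
}
```

One regex for the split and one for the attributes is all it takes,
which is the whole point of having a single canonical form.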

So what Petal does is the following (if you are not using the cache):

1. Read the source file

2. Parse it using either XML::Parser or HTML::TreeBuilder

3. Fire XML events at the Petal::Canonicalizer from the parsed tree

4. Using the XML events it receives, the Petal::Canonicalizer rewrites
the template in a canonical way. You can see the canonical version of a
source file using the following statement:

  print ${$template->_canonicalize()};

5. Turn the canonical template into Perl code, which is
Petal::CodeGenerator's job. You can see the Perl code version using:

  print $template->_code_disk_cached();

6. Cache the Perl code on disk

7. Turn the Perl code into an anonymous subroutine (coderef)

8. Execute this subroutine with the hash passed to the process()
method.

Note that when you're running under mod_perl, most of the time Petal
should only do step 8, because the compiled subroutine stays in memory
between requests. If you are not using mod_perl but are still using
caching, most of the time it'll be doing steps 7 and 8.
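
Steps 7 and 8 can be sketched roughly as follows. This is an
illustration only, not Petal's code: the generated code string, the
%memory_cache hash and the coderef_for() function are all invented for
the example, and the real output of Petal::CodeGenerator looks
different.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Perl code a code generator might emit.
my $generated = <<'EOC';
sub {
    my ($hash) = @_;
    my @out;
    push @out, '<p>Hello ';
    push @out, $hash->{user};
    push @out, '!</p>';
    return join '', @out;
}
EOC

# In a persistent interpreter this hash survives between requests,
# so the compile step below only runs once per template.
my %memory_cache;

sub coderef_for {
    my ($key, $code) = @_;

    # Step 7: compile the code string into a coderef, once.
    $memory_cache{$key} ||= do {
        my $sub = eval $code;
        die "cannot compile template $key: $@" unless ref($sub) eq 'CODE';
        $sub;
    };
    return $memory_cache{$key};
}

# Step 8: run the subroutine with the caller's hash.
my $output = coderef_for('hello.xml', $generated)->({ user => 'world' });
print "$output\n";
```

A second call to coderef_for() with the same key skips the eval
entirely, which is the situation you are in under a persistent
interpreter.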

In the current version of Petal that I'm working on, there are two
canonicalizers, one which outputs XML canonical templates and the other
one which outputs XHTML canonical templates. The latter is just a
subclass of the former and its behavior differs only when it's
processing XHTML-specific tags.
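
The subclassing idea looks something like the sketch below. The
package names, the element() method and the list of empty elements are
all assumptions made for the example; they are not the interface of the
real canonicalizers, and attributes are ignored to keep it short.

```perl
use strict;
use warnings;

package My::Canonicalizer::XML;     # hypothetical name

sub new { my ($class) = @_; return bless {}, $class }

# Rewrite one element around its already-canonicalized content.
sub element {
    my ($self, $tag, $content) = @_;
    return "<$tag>$content</$tag>";
}

package My::Canonicalizer::XHTML;   # hypothetical name
our @ISA = ('My::Canonicalizer::XML');

# Elements that HTML browsers expect in the minimized form
# (an illustrative subset, not a complete list).
my %EMPTY = map { $_ => 1 } qw(br hr img input link meta);

# Only the XHTML-specific case is overridden; the rest is inherited.
sub element {
    my ($self, $tag, $content) = @_;
    return "<$tag />" if $EMPTY{lc $tag} && $content eq '';
    return $self->SUPER::element($tag, $content);
}

package main;

for my $class (qw(My::Canonicalizer::XML My::Canonicalizer::XHTML)) {
    my $c = $class->new;
    print "$class: ", $c->element('link', ''), ' ', $c->element('p', 'hi'), "\n";
}
```

The XML version writes <link></link> and the XHTML version writes
<link />, while both treat <p> the same way.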

> It seems that the Parser and the Canonicalizer may now be doing the
> INPUT/OUTPUT handling although I'm still confused as to which each is
> doing. It seems to me that the  Canonicalizer handles the INPUT part
> right now. Ahhh, that's why you would add the
> Petal::Canonicalizer::XHTML, P::C::XML, etc. modules. Am I getting
> warm? In that case, what module would handle the OUTPUT? 

Nope. The INPUT specifies the parser to use. For example, you could
write a Petal::Parser::POD which takes Perl POD source and fires XML
events from this source.

The OUTPUT specifies the form in which the template should be
rewritten. If you feel courageous, you might be able to write, let's
say, a Petal::Canonicalizer::Word.

The CodeGenerator takes this easily parsable template and turns it into
Perl code... you know the end of the story.
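
To make the INPUT/OUTPUT split a bit more concrete, here is a toy
version of it in plain Perl. Everything in it is hypothetical:
parse_pod_like(), the My::Output::XML package and the start_tag / text
/ end_tag method names are invented for the example and are not the
real Petal parser or canonicalizer interface. It also only understands
=head1 and plain paragraphs, and it does not escape the text.

```perl
use strict;
use warnings;

# INPUT side (hypothetical): read a POD-like source and fire
# start/text/end events at whatever handler object it is given.
sub parse_pod_like {
    my ($source, $handler) = @_;
    for my $para (split /\n{2,}/, $source) {
        $para =~ s/^\s+|\s+$//g;
        next if $para eq '';
        # "=head1 Title" becomes an h1, anything else a paragraph.
        my $tag = $para =~ s/^=head1\s+// ? 'h1' : 'p';
        $handler->start_tag($tag);
        $handler->text($para);
        $handler->end_tag($tag);
    }
    return;
}

# OUTPUT side (hypothetical): one handler that rewrites events as XML.
package My::Output::XML;

sub new { my ($class) = @_; return bless { out => '' }, $class }

sub start_tag { my ($self, $tag)  = @_; $self->{out} .= "<$tag>";  return }
sub text      { my ($self, $text) = @_; $self->{out} .= $text;     return }
sub end_tag   { my ($self, $tag)  = @_; $self->{out} .= "</$tag>"; return }
sub result    { my ($self) = @_; return $self->{out} }

package main;

my $handler = My::Output::XML->new;
parse_pod_like("=head1 Petal\n\nTemplates for Perl.\n", $handler);
print $handler->result, "\n";
```

Swapping the parser changes what you can read, swapping the handler
changes what gets written, and neither needs to know about the other.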

> > Anyway, as I started to use the library myself I have seen bugs stacking up
> > in my petal list folder and I need to do something about it, so I guess I'm
> > not going to wait until tomorrow :-)
> 
> Great! I'm ready to have some of the outstanding bugs smashed.

Soon, soon! I also need to write some extensive documentation... I can't
believe I'm creating all that extra work for myself :-)

Cheers,
-- 
IT'S TIME FOR A DIFFERENT KIND OF WEB
================================================================
  Jean-Michel Hiver - Software Director
  jhiver@mkdoc.com
  +44 (0)114 255 8097
================================================================
                                      VISIT HTTP://WWW.MKDOC.COM