[Petal] HTML::TreeBuilder utf8 troubles

William McKee william at knowmad.com
Thu Jan 20 21:53:37 GMT 2005


On Thu, Jan 20, 2005 at 09:19:00PM +0000, Bruno Postle wrote:
> Ok though I have some questions:
> 
> 1. Your patch seems to assume that all the data will fit into 
>   latin1:

Oops, I didn't mean to discriminate; just chalk it up to learning the
ropes of Unicode.


>   Unless HTML::TreeBuilder and/or HTML::Parser are unsafe with 
>   anything other than latin1 - In which case it doesn't matter and 
>   Petal::Parser::HTB should try and squeeze everything into latin1.

I dunno. I just know that I got rid of my problems by recoding the data
into latin1 before passing it off to HTML::TreeBuilder. Hopefully I'll
hear back from Sean Burke with some ideas/suggestions/fixes. In the
meantime, I wonder if we could figure out what the local encoding is for
the system in question and use it. Or perhaps we should be using the
value of DECODE_CHARSET. If neither of these is available, we could
fall back to latin1.
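
Here is a rough sketch of that fallback idea, using only the core Encode
module. The helper names pick_charset and recode_for_parser are made up
for illustration and are not part of Petal or Petal::Parser::HTB. The
candidate list stands in for whatever the local encoding and
DECODE_CHARSET turn out to be. Writing unrepresentable characters as
numeric entities is just one possible choice, not necessarily what the
patch does.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# Hypothetical helper: return the canonical name of the first candidate
# charset that Encode recognises, falling back to latin1 (iso-8859-1).
sub pick_charset {
    my @candidates = @_;
    for my $name (@candidates) {
        next unless defined $name && length $name;
        my $enc = find_encoding($name);   # undef for unknown names
        return $enc->name if $enc;
    }
    return 'iso-8859-1';
}

# Hypothetical helper: turn a Perl character string into bytes in the
# chosen charset before handing it to the parser.
sub recode_for_parser {
    my ($text, @candidates) = @_;
    my $charset = pick_charset(@candidates);
    # Characters the charset cannot represent become numeric entities
    # (e.g. &#8364;) instead of being replaced or dropped.
    return encode($charset, $text, Encode::FB_HTMLCREF);
}

# Stand-in values: in real use the candidates would be the local
# encoding and the value of DECODE_CHARSET, in whichever order we prefer.
my @candidates = (undef, 'latin1');
my $bytes = recode_for_parser("caf\x{e9} \x{20ac}5", @candidates);
print "$bytes\n";   # the euro sign is emitted as &#8364;
```

The entity fallback only makes sense if the output is HTML; for other
uses a different CHECK value would be needed.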


> 2. Petal::Parser::HTB has (the ancient) Petal-1.10 as a 
>   prerequisite, does it really only work with this exact version of 
>   Petal?

No, it works fine with the latest version of Petal, and I think it will
work OK with earlier versions too. I'm still running v1.06 on my
production server because I haven't wanted to go through the onerous
task of making all of my early projects into valid XHTML documents.
Basically, P::P::HTB allows you to continue to use the old
HTML::TreeBuilder as the parser instead of MKDoc::XML (which requires
valid documents).


>   (You can tell that I don't use this backend)

Consider yourself fortunate. Let's take a straw vote. I'll post it as a
separate message for folks who aren't following this thread.


> 3. Petal::Parser::HTB has HTML::TreeBuilder-3.12 as a prerequisite, 
>   does it really only work with this exact version?

I'm pretty sure there are reasons that at least v3.12 is necessary (you
could check the archives). I'm successfully using it with v3.13, which
is the latest version.


William

-- 
Knowmad Services Inc.
http://www.knowmad.com

