[Petal] HTML::TreeBuilder utf8 troubles
Bruno Postle
bruno at mkdoc.com
Thu Jan 20 21:19:00 GMT 2005
On Thu 20-Jan-2005 at 15:40 -0500, William McKee wrote:
>
> After today's research on character encodings (which I'm finally
> feeling like I'm getting a grasp of), I've attached a patch for
> Petal::Parser::HTB which should fix its desire to output the
> extra Acirc character.
> Bruno, any chance you'd be willing to post this update to CPAN?
OK, though I have some questions:

1. Your patch seems to assume that all the data will fit into
latin1:

    encode ('latin1', $$data_ref)

This isn't necessarily the case. For instance, the euro sign has
no latin1 representation, so this produces the sort of garbage
you are trying to prevent in the first place:

    encode ("latin1", "Euro: \x{20ac} Copyright: \x{00a9}");

Unless HTML::TreeBuilder and/or HTML::Parser are unsafe with
anything other than latin1, in which case it doesn't matter and
Petal::Parser::HTB should try to squeeze everything into latin1.

2. Petal::Parser::HTB has (the ancient) Petal-1.10 as a
prerequisite. Does it really only work with this exact version of
Petal?

(You can tell that I don't use this backend.)

3. Petal::Parser::HTB has HTML::TreeBuilder-3.12 as a prerequisite.
Does it really only work with this exact version?

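To make the concern in point 1 concrete, here is a minimal sketch of what happens to that test string under different encode() settings. It only uses the core Encode module; the variable names are illustrative and nothing here is taken from the patch or from Petal::Parser::HTB itself.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The string from point 1: U+20AC (euro) is outside latin1,
# U+00A9 (copyright) is inside it.
my $text = "Euro: \x{20ac} Copyright: \x{00a9}";

# Default behaviour: characters that do not fit are silently
# replaced with the encoding's substitution character.
my $lossy = encode('latin1', $text);

# Strict behaviour: FB_CROAK makes encode() die instead of
# substituting; LEAVE_SRC stops encode() from modifying $text.
my $strict_ok = eval {
    encode('latin1', $text, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};

# UTF-8 can represent every character, so the round trip is lossless.
my $utf8       = encode('UTF-8', $text);
my $round_trip = decode('UTF-8', $utf8);

printf "lossy latin1 bytes: %d\n", length($lossy);
printf "strict latin1 encode %s\n", $strict_ok ? "succeeded" : "died";
printf "utf8 bytes: %d, round trip %s\n",
    length($utf8), ($round_trip eq $text ? "ok" : "differs");
```

With the default settings the euro sign comes out as a plain "?", so the data is damaged without any warning. Whether a strict check or a different target encoding is the right answer depends on what HTML::TreeBuilder and HTML::Parser can safely accept, which is the open question above.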
--
Bruno