[Petal] HTML::TreeBuilder utf8 troubles

Bruno Postle bruno at mkdoc.com
Thu Jan 20 22:40:34 GMT 2005


On Thu 20-Jan-2005 at 16:53 -0500, William McKee wrote:
> > 
> > 1. Your patch seems to assume that all the data will fit into 
> >   latin1:
>
> Oops, I didn't mean to discriminate; just chalk it up to learning the
> ropes of Unicode.

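You could add a check and convert to latin1 only if the data actually
fits in latin1. The check below round-trips the data through latin1;
by default encode() replaces any character that doesn't fit with a
substitution character, so the round trip no longer matches the
original. Something like this: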

Index: lib/Petal/Parser/HTB.pm
===================================================================
RCS file: /var/spool/cvs/Petal-Parser-HTB/lib/Petal/Parser/HTB.pm,v
retrieving revision 1.3
diff -r1.3 HTB.pm
58a59,60
>         # encode the data as latin1 before passing to HTML::TreeBuilder
>         $$data_ref = Encode::encode('latin1', $$data_ref) if _is_ok_as_latin1 ($$data_ref);
189a192,197
> sub _is_ok_as_latin1
> {
>     my $data = shift;
>     return 1 if Encode::decode ("latin1", Encode::encode ("latin1", $data)) eq $data;
>     return 0;
> }

(not sure if this is an improvement though)
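
For trying the idea outside of Petal, here is a rough standalone sketch.
It is not the patch above: instead of the encode/decode round trip it
asks Encode to die on the first character that has no latin1 mapping
(Encode::FB_CROAK). The sub name is_ok_as_latin1 and the sample strings
are only there for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative stand-in for _is_ok_as_latin1 from the patch (hypothetical name).
sub is_ok_as_latin1 {
    my ($data) = @_;

    # Work on a copy: with a CHECK argument, Encode may modify its input.
    my $copy = $data;

    # FB_CROAK makes encode() die on the first unmappable character,
    # so eval returns true only if every character fits in latin1.
    return eval { Encode::encode('iso-8859-1', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# e-acute (U+00E9) is in latin1, the smiley (U+263A) is not.
print is_ok_as_latin1("caf\x{e9}") ? "latin1 ok\n" : "not latin1\n";
print is_ok_as_latin1("\x{263a}")  ? "latin1 ok\n" : "not latin1\n";
```

This should give the same answer as the round trip for character
strings, and it stops at the first bad character rather than encoding
and decoding the whole document.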

-- 
Bruno
