[Petal] HTML::TreeBuilder utf8 troubles
Bruno Postle
bruno at mkdoc.com
Thu Jan 20 22:40:34 GMT 2005
On Thu 20-Jan-2005 at 16:53 -0500, William McKee wrote:
> >
> > 1. Your patch seems to assume that all the data will fit into
> > latin1:
>
> Oops, I didn't mean to discriminate; just chalk it up to learning the
> ropes of Unicode.
You could add a check and only convert to latin1 if your data is
actually convertible to latin1, something like this:
Index: lib/Petal/Parser/HTB.pm
===================================================================
RCS file: /var/spool/cvs/Petal-Parser-HTB/lib/Petal/Parser/HTB.pm,v
retrieving revision 1.3
diff -r1.3 HTB.pm
58a59,60
> # encode the data as latin1 before passing to HTML::TreeBuilder
> $$data_ref = Encode::encode('latin1', $$data_ref) if _is_ok_as_latin1 ($$data_ref);
189a192,197
> sub _is_ok_as_latin1
> {
>     my $data = shift;
>     return 1 if Encode::decode ("latin1", Encode::encode ("latin1", $data)) eq $data;
>     return 0;
> }
(not sure if this is an improvement though)
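For anyone wanting to try the check outside Petal, here is a minimal standalone sketch. The first sub is the same round-trip test as in the patch: by default Encode substitutes a replacement character for anything that does not fit into latin1, so the decoded copy no longer equals the original. The second sub, is_ok_as_latin1_strict, is a hypothetical alternative that is not part of the patch; it assumes you would rather have Encode die on the first unencodable character than compare strings afterwards.

```perl
use strict;
use warnings;
use Encode ();

# Round-trip check, as in the patch: encode to latin1, decode back,
# and see whether the text survived unchanged.
sub is_ok_as_latin1 {
    my $data = shift;
    my $round_trip = Encode::decode('latin1', Encode::encode('latin1', $data));
    return $round_trip eq $data ? 1 : 0;
}

# Hypothetical alternative (not in the patch): let Encode croak on the
# first character that has no latin1 mapping, and trap that with eval.
# LEAVE_SRC keeps Encode from modifying our copy of the data.
sub is_ok_as_latin1_strict {
    my $data = shift;
    my $ok = eval {
        Encode::encode('latin1', $data, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

print is_ok_as_latin1("caf\x{e9}"), "\n";          # e-acute fits into latin1
print is_ok_as_latin1("smile \x{263A}"), "\n";     # U+263A does not
```

Either way, data that fails the check is passed to HTML::TreeBuilder untouched, so this only decides when the conversion is safe; it does not say what to do with the data that is not.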
--
Bruno