[Petal] HTML::TreeBuilder utf8 troubles

William McKee william at knowmad.com
Thu Jan 20 20:40:58 GMT 2005


Hey gang,

It's probably tacky to reply to a cc that I wrote but then I sometimes
talk to myself IRL as well :).

After today's research on character encodings (which I feel I'm finally
getting a grasp of), I've attached a patch for Petal::Parser::HTB which
should fix its tendency to output the extra Acirc character.
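
For anyone following along, here is a minimal sketch of where an extra
Acirc can come from. It assumes the stray character is the usual case of
UTF-8 bytes being read as Latin-1, and it uses a non-breaking space
(U+00A0) as the example character; both are assumptions for
illustration, not something confirmed against Petal itself. The variable
names and the hex_dump helper are made up for this sketch. Only the core
Encode module is used.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: show each character/byte of a string as hex.
sub hex_dump {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $str;
}

# A non-breaking space (U+00A0) between two ASCII letters,
# held as a Perl character string.
my $chars = "a\x{A0}b";

# UTF-8 encodes U+00A0 as two bytes: 0xC2 0xA0.
my $utf8_bytes = Encode::encode('UTF-8', $chars);

# If something downstream treats those bytes as Latin-1, the 0xC2 byte
# becomes its own character, U+00C2 (A with circumflex, i.e. Acirc).
my $misread = Encode::decode('latin1', $utf8_bytes);

# Encoding as Latin-1 instead gives a single 0xA0 byte, so there is
# no leading 0xC2 left over to be misread.
my $latin1_bytes = Encode::encode('latin1', $chars);

print "utf8 bytes:   ", hex_dump($utf8_bytes),   "\n";  # 61 C2 A0 62
print "misread as:   ", hex_dump($misread),      "\n";  # 61 C2 A0 62 (4 characters)
print "latin1 bytes: ", hex_dump($latin1_bytes), "\n";  # 61 A0 62
```

One caveat with this approach: with its default settings,
Encode::encode substitutes a "?" for any character that Latin-1 cannot
represent, so templates containing characters outside Latin-1 would
presumably lose them.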

Although I have not written a test specifically to test this problem,
all existing 200+ tests continue to pass with the modification.

Bruno, any chance you'd be willing to post this update to CPAN?


Thanks,
William

-- 
Knowmad Services Inc.
http://www.knowmad.com
-------------- next part --------------
--- HTB.pm.orig	2005-01-20 15:36:24.000000000 -0500
+++ HTB.pm	2005-01-20 15:35:56.000000000 -0500
@@ -56,6 +56,8 @@
     
     eval
     {
+      # encode the data as latin1 before passing to HTML::TreeBuilder
+      $$data_ref = Encode::encode('latin1', $$data_ref);
 	$tree->parse ($$data_ref);
 	my @nodes = $tree->guts();
 	$tree->elementify();

