[Petal] HTML::TreeBuilder utf8 troubles

William McKee william at knowmad.com
Sat Jan 22 00:54:24 GMT 2005


On Thu, Jan 20, 2005 at 10:40:34PM +0000, Bruno Postle wrote:
> You could add a check and only convert to latin1 if your data is 
> actually convertible to latin1, something like this:

That looks like an interesting idea. I'd like to throw a warning if the
data doesn't validate. I think it would make sense to use the value of
decode_charset if it's available.
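
As a rough sketch of that check-and-warn idea (not Bruno's code, and not
anything that exists in Petal): the helper name to_charset_or_warn is
hypothetical, and I'm assuming the caller passes in the decode_charset
value when one is set, falling back to latin1 otherwise. It relies on the
core Encode module, so it needs Perl 5.8 or later.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: try to encode $text into $charset (latin1 by default).
# Returns the encoded bytes, or undef (with a warning) if some character
# cannot be represented in that charset.
sub to_charset_or_warn {
    my ($text, $charset) = @_;
    $charset = 'iso-8859-1' unless defined $charset;

    # encode() with a CHECK argument may modify its input, so work on a copy.
    my $copy  = $text;
    my $bytes = eval { Encode::encode($charset, $copy, Encode::FB_CROAK) };

    unless (defined $bytes) {
        warn "Data is not convertible to $charset; leaving it unconverted\n";
        return undef;
    }
    return $bytes;
}
```

Returning undef rather than dying leaves it up to the caller whether to
fall back to the original utf8 data or to give up.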

Hallelujah! I just discovered that the latest release of HTML::Parser
(v3.45, released just a couple of weeks ago) fixes these problems. After
all that time spent coming up with a good test script, I found that it had
stopped reproducing the problem on one of my systems. It turns out that I
had unknowingly upgraded HTML::Parser there, and the new version fixes
several Unicode problems.

So, we can skip all this nonsense and happily recommend the use of
Petal::Parser::HTB, on the condition that HTML::Parser be v3.45 or
better. I've attached patches to update Changes and Makefile.PL.
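
For anyone who wants to check an existing installation before upgrading,
here is a minimal sketch. The helper name module_at_least is hypothetical;
it only relies on the standard UNIVERSAL::VERSION method, which dies when
the loaded module is older than the requested version.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $module can be loaded and its $VERSION is at
# least $min_version, false otherwise (including when it is not installed).
sub module_at_least {
    my ($module, $min_version) = @_;
    my $ok = eval {
        (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
        require $file;
        $module->VERSION($min_version);           # dies if the module is too old
        1;
    };
    return $ok ? 1 : 0;
}

unless (module_at_least('HTML::Parser', '3.45')) {
    warn "HTML::Parser 3.45 or better is recommended for Petal::Parser::HTB\n";
}
```

The PREREQ_PM entry in the attached patch enforces the same minimum at
install time; this is only a convenience for checking a running system.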


William

-- 
Knowmad Services Inc.
http://www.knowmad.com
-------------- next part --------------
--- Changes.orig	2004-03-15 10:59:28.000000000 -0500
+++ Changes	2005-01-21 19:52:18.000000000 -0500
@@ -1,5 +1,10 @@
 Revision history for Petal::Parser::HTB.
 
+1.04 Fri Jan 21 19:51:27 2005
+		- Add requirement for HTML::Parser v3.45 which fixes problems with entity
+		encoding that resulted in the mysterious appearance of  characters in
+		the output
+
 1.03 Mon Mar 15 15:59:06 2004
     - Updated entities test
 
-------------- next part --------------
--- Makefile.PL.orig	2005-01-21 19:50:43.000000000 -0500
+++ Makefile.PL	2005-01-21 19:51:10.000000000 -0500
@@ -7,6 +7,7 @@
     'PREREQ_PM'		=> {
 	'Petal'             => '1.10',
         'HTML::TreeBuilder' => '3.12',
+        'HTML::Parser' => '3.45',
     },
     ($] >= 5.005 ?    ## Add these new keywords supported since 5.005
       (ABSTRACT_FROM => 'lib/Petal/Parser/HTB.pm', # retrieve abstract from module

