[Petal] HTML::TreeBuilder utf8 troubles
William McKee
william at knowmad.com
Sat Jan 22 00:54:24 GMT 2005
On Thu, Jan 20, 2005 at 10:40:34PM +0000, Bruno Postle wrote:
> You could add a check and only convert to latin1 if your data is
> actually convertible to latin1, something like this:
That looks like an interesting idea. I'd like to throw a warning if the data
doesn't validate. I think it would also make sense to use the value of
decode_charset, if it's available.
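Bruno's snippet is not preserved in this archive. Here is a minimal sketch of the check discussed above, with a warning added when conversion isn't possible. It assumes the data is already a decoded Perl character string. The helper name `to_latin1_if_possible` is hypothetical, and wiring in Petal's decode_charset value is left out because the sketch makes no assumption about how that value is exposed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of Petal or Petal::Parser::HTB):
# convert a decoded character string to Latin-1 bytes, but only when
# every character fits in Latin-1 (U+0000..U+00FF). Otherwise warn
# and return the original string untouched.
sub to_latin1_if_possible {
    my ($text) = @_;

    # Any code point above U+00FF has no Latin-1 representation.
    if ($text =~ /[^\x{00}-\x{FF}]/) {
        warn "to_latin1_if_possible: text is not representable in Latin-1; left unconverted\n";
        return $text;
    }

    # Safe: every character maps to exactly one Latin-1 byte.
    return encode('iso-8859-1', $text);
}
```

Checking the code points up front, rather than letting `encode` substitute replacement characters for anything it cannot map, is what makes the warning possible. Data that fails the check passes through unchanged.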
Hallelujah! I just discovered that the latest release of HTML::Parser
(v3.45, released just a couple of weeks ago) fixes these problems. After all
the time I spent coming up with a good test script, I found that it stopped
reproducing the problem on one of my systems. It turns out that I had
unknowingly upgraded HTML::Parser to this version, which fixes several
Unicode problems.
So, we can skip all this nonsense and happily recommend the use of
Petal::Parser::HTB, on the condition that HTML::Parser be v3.45 or
later. I've attached patches to update Makefile.PL and Changes.
William
--
Knowmad Services Inc.
http://www.knowmad.com
-------------- next part --------------
--- Changes.orig 2004-03-15 10:59:28.000000000 -0500
+++ Changes 2005-01-21 19:52:18.000000000 -0500
@@ -1,5 +1,10 @@
Revision history for Petal::Parser::HTB.
+1.04 Fri Jan 21 19:51:27 2005
+ - Add requirement for HTML::Parser v3.45 which fixes problems with entity
+ encoding that resulted in the mysterious appearance of  characters in
+ the output
+
1.03 Mon Mar 15 15:59:06 2004
- Updated entities test
-------------- next part --------------
--- Makefile.PL.orig 2005-01-21 19:50:43.000000000 -0500
+++ Makefile.PL 2005-01-21 19:51:10.000000000 -0500
@@ -7,6 +7,7 @@
'PREREQ_PM' => {
'Petal' => '1.10',
'HTML::TreeBuilder' => '3.12',
+ 'HTML::Parser' => '3.45',
},
($] >= 5.005 ? ## Add these new keywords supported since 5.005
(ABSTRACT_FROM => 'lib/Petal/Parser/HTB.pm', # retrieve abstract from module