[Petal] Recap on utf8 encoding issues and PerlIO patch

William McKee william at knowmad.com
Thu Jan 20 16:58:44 GMT 2005


I'm back to the problem with html entities outputting extra characters
(Acirc) when viewed as latin1/iso-8859-1 (see the messages from Feb 2004
in the archives for more details).

This time I've discovered that one of the leading causes (after
eliminating the issue of the browser trying to display utf8 as Western
ISO-8859-1) is employing Petal::Parser::HTB. It seems that this module
causes Petal to handle the data differently, and no amount of coaxing
(via Encode::encode('latin1', $string)) is making a difference.
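
For anyone following along, here is a minimal sketch of one common way a
stray Acirc shows up. It assumes the character involved is U+00A0 (what
&nbsp; decodes to); I have not confirmed that this is what happens inside
Petal::Parser::HTB. The helper hex_bytes() is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption for illustration: the troublesome character is U+00A0,
# the character that the entity &nbsp; decodes to.
my $nbsp = "\x{A0}";

# Hypothetical helper: show the bytes of a string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $as_utf8   = encode('UTF-8', $nbsp);         # two bytes
my $as_latin1 = encode('iso-8859-1', $nbsp);    # one byte

print 'UTF-8 bytes:  ', hex_bytes($as_utf8),   "\n";    # C2 A0
print 'latin1 bytes: ', hex_bytes($as_latin1), "\n";    # A0

# Reading the UTF-8 bytes as if they were latin1 yields two characters:
# U+00C2 (A with circumflex, i.e. Acirc) followed by U+00A0.
my $misread = decode('iso-8859-1', $as_utf8);
my @code_points = map { ord($_) } split //, $misread;
printf "misread as latin1: U+%04X U+%04X\n", @code_points;
```

If the output already contains the two-byte UTF-8 form by the time it is
written, encoding an earlier copy of the string as latin1 would not change
what the browser receives, which may be why the coaxing has no effect.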

Along the way, I tried using the -decode_charset option. I found a bug
at line 177 of Petal.pm in _file_data_ref() where it loads the template
file. According to the Encode manpage[1], the open command is improperly
using PerlIO. Here is the example from Encode:

  open my $in,  "<:encoding(shiftjis)", $infile  or die;

And here's what Petal is currently doing:

  open FP, "<:$encoding", "$file_path" || die 'Cannot read-open $file_path';

That line causes the error message:

  Unknown PerlIO layer "iso" at /usr/local/lib/perl5/site_perl/5.8.6/Petal.pm line 581.
  readline() on closed filehandle FP at /usr/local/lib/perl5/site_perl/5.8.6/Petal.pm line 588.

Changing it to:

  open FP, "<:encoding($encoding)", "$file_path" or die "Cannot read-open $file_path";

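seems to work. Note that I also fixed the precedence issue we had with
||: it binds more tightly than the comma, so the die was attached to
"$file_path" (always true) and never ran when open failed, which is why
we got the readline() error instead. I also double-quoted the error
message so that $file_path gets properly interpolated. Unfortunately, I
still get the darn Acirc in my output when using Petal::Parser::HTB.
Guess it's time to stop using that crib module and get my html
validating ;>.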
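
To make the difference concrete, here is a small standalone sketch, not
Petal's actual code. The temp file stands in for a template, 'iso-8859-1'
stands in for the -decode_charset value, and read_template() is a
hypothetical helper. It uses a lexical filehandle and adds $! to the
message, which the patch above does not do.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a template file: "café" plus newline, as latin1 bytes.
my ($tmp_fh, $file_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "caf\xE9\n";
close $tmp_fh or die "Cannot close $file_path: $!";

# Stand-in for the value passed via -decode_charset.
my $encoding = 'iso-8859-1';

# Hypothetical helper: slurp a file, decoding through :encoding().
sub read_template {
    my ($path, $enc) = @_;
    open my $fh, "<:encoding($enc)", $path
        or die "Cannot read-open $path: $!";
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

my $data = read_template($file_path, $encoding);
printf "read %d characters; fourth is U+%04X\n",
    length($data), ord(substr($data, 3, 1));

# The bare form "<:$encoding" is not a valid layer spec, so open fails.
# The 'layer' warning is silenced here only to keep the output quiet.
my $ok = do {
    no warnings 'layer';
    open my $bad, "<:$encoding", $file_path;
};
print "bare layer open ", ($ok ? "succeeded" : "failed"), "\n";
```

With "or die" in place, a failed open stops right there with a useful
message instead of surfacing later as a readline() on a closed filehandle.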

I hope someone more familiar with PerlIO can comment on this patch.


Cheers,
William

[1] http://crs.ciril.fr/public/docs/perl/PA-RISC2.0/Encode.html

-- 
Knowmad Services Inc.
http://www.knowmad.com

