[Petal]   to  mystery

William McKee william at knowmad.com
Tue Feb 24 23:08:31 GMT 2004


Chris,

Thanks for your feedback on this character encoding mystery and for the
info about Apache Bench and lynx. Those tips will prove useful in my
education about character encodings.

I've finally had a chance to look into this issue more. When I set the
meta tag, all works as expected. Without it I get the funny A character.

I'm hypothesizing that my test script fails when run in my shell because
LC_CTYPE, LANG, or one of the other locale settings is not set to UTF-8
(it's en_US). I tried changing it to check whether a UTF-8 locale would
correct the error, but I ran into difficulties and abandoned the effort.

The thing that still baffles me is why I would get the funny A character
when using Petal v2.02 with Petal::Parser::HTB and not get it with
straight Petal. If, as Jean-Michel says, Petal is outputting everything
in UTF-8, it seems that I'd be getting that character in both.

Perhaps that character is being generated by HTB. Petal::Entities is
converting nbsp to \240 (octal), which is decimal 160. Is there a way to
print character 160 on the command line to see what it generates? I guess
a simple Perl script would do the job....

    perl -e 'print "-\240-\n"'

Indeed, the above one-liner prints a space between the dashes. At this
point, I'm betting that HTML::TreeBuilder is converting that character
into something it thinks is printable, and that this is what causes the
problem. HTB looks too big and involved for me to track this down any
further. I'd appreciate any insight from others who are more familiar
with it.



Cheers!
William

-- 
Knowmad Services Inc.
http://www.knowmad.com

