[Petal] More on entities and Â

Michele Beltrame mb at italpro.net
Tue May 4 16:02:43 BST 2004


Hi!

> Grant seems to be saying the default is UTF8 whereas Michele says it is
> iso8859-1.

It really depends on Perl, as it has a "use UTF8 if I find a character
that doesn't fit in ISO8859-1" behaviour. The only way you can be sure
output is *always* UTF8 or *always* ISO8859-1 is to use the Encode
module, as in the example I posted in my previous message.
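To illustrate the idea of explicit encoding, here is a minimal sketch in Python rather than Perl. It is not the Encode module's API, just the same principle. The sample string is made up and contains U+00A0 (NO-BREAK SPACE, the character behind &nbsp;). Encoding explicitly fixes the output bytes, and reading UTF-8 bytes as ISO-8859-1 shows where a stray "Â" can come from:

```python
# Python sketch of the principle (not Perl's Encode API): always encode
# explicitly so the output bytes never depend on an implicit default.
text = "price:\u00a0100"  # hypothetical sample; contains U+00A0 (&nbsp;)

utf8_bytes = text.encode("utf-8")         # U+00A0 -> two bytes: C2 A0
latin1_bytes = text.encode("iso-8859-1")  # U+00A0 -> one byte: A0

print(utf8_bytes)    # b'price:\xc2\xa0100'
print(latin1_bytes)  # b'price:\xa0100'

# If UTF-8 output is read by a client that assumes ISO-8859-1, the
# byte C2 is displayed as "Â", followed by the no-break space (A0).
misread = utf8_bytes.decode("iso-8859-1")
print(misread == "price:\u00c2\u00a0100")  # True
```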

> The next thing that confuses me is that I have Perl 5.8.3 installed on
> both systems. Only one is showing the extra character.

This is, of course, a mystery. ;-)

> Finally, my reading of utf8 docs says that a 00 should be appended to
> ANSI characters. Where is the A0 character coming from?

No 00 byte is actually prepended to characters with code points 0-127
in UTF8. This is one of the things that makes UTF8 different from
UCS2 (the fixed-width encoding that UTF16 extends), which always uses
two bytes per character. UTF8 characters take a variable number of
bytes, which allows characters 0-127 to remain single bytes identical
to ASCII, thus maintaining perfect compatibility with US ASCII
documents.
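A quick way to see the variable width is to compare byte sequences. This Python sketch uses a hypothetical helper `encodings()` and UTF-16 big-endian (no byte-order mark) as a stand-in for the two-byte form. ASCII "A" stays one byte in UTF8 but gets a 00 byte in UTF16, while U+00A0 becomes C2 A0 in UTF8:

```python
def encodings(ch: str) -> tuple[bytes, bytes]:
    """Hypothetical helper: return (UTF-8 bytes, UTF-16BE bytes) for one character."""
    # "utf-16-be" is big-endian UTF-16 without a byte-order mark.
    return ch.encode("utf-8"), ch.encode("utf-16-be")

# ASCII text is byte-for-byte identical in UTF-8 and US-ASCII.
ascii_text = "Hello, Petal"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

for ch in ["A", "\u00a0", "\u00e9"]:
    utf8, utf16 = encodings(ch)
    # bytes.hex(sep) needs Python 3.8+
    print(f"U+{ord(ch):04X}: UTF-8 {utf8.hex(' ')} | UTF-16 {utf16.hex(' ')}")

# Prints:
# U+0041: UTF-8 41 | UTF-16 00 41
# U+00A0: UTF-8 c2 a0 | UTF-16 00 a0
# U+00E9: UTF-8 c3 a9 | UTF-16 00 e9
```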

	Michele.

-- 
Michele Beltrame
http://www.italpro.net/mb/
ICQ# 76660101 - e-mail: mb at italpro.net

