[Petal] More on entities and Â

William McKee william at knowmad.com
Tue May 4 16:50:14 BST 2004


On Tue, May 04, 2004 at 05:02:43PM +0200, Michele Beltrame wrote:
> It really depends on Perl, as it has a "use UTF8 if I find a UTF8 character"
> behaviour. The only way you can be sure output is *always* UTF8 or
> *always* ISO8859-1 is to use the Encode module, as per the example I
> posted in my previous message.

OK, this explanation makes sense.
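
For the archive, here is a minimal sketch of what I understand "use the Encode module" to mean in practice. The helper name `to_output` and the sample string are my own assumptions for illustration, not code from the earlier message; only `Encode::encode` itself is the real API.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string for output in one fixed
# charset, so the bytes never depend on how Perl stores the string internally.
sub to_output {
    my ($text, $charset) = @_;
    return encode($charset, $text);
}

# "cafe" with an e-acute (U+00E9) as its last character.
my $text = "caf\x{E9}";

my $as_utf8   = to_output($text, 'UTF-8');
my $as_latin1 = to_output($text, 'ISO-8859-1');

# The e-acute takes two bytes in UTF-8 and one byte in ISO-8859-1.
printf "UTF-8: %d bytes, ISO-8859-1: %d bytes\n",
    length($as_utf8), length($as_latin1);
```

The point of doing the encoding explicitly is that the choice of charset is made once, at the output boundary, instead of being left to whatever Perl happens to guess for a given string.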


> > The next thing that confuses me is that I have Perl 5.8.3 installed on
> > both systems. Only one is showing the extra character.
> 
> This is, of course, a mystery. ;-)

Figures... :-/


> The 00 is not actually prepended to characters with code points 0-127
> in UTF8. This is one of the things that make UTF8 different from
> UCS2 (the fixed-width predecessor of UTF16), which always uses two bytes
> per character. UTF8 characters occupy a variable number of bytes, and that
> allows characters 0-127 to remain the same, thus maintaining
> perfect compatibility with US ASCII documents.

Thanks for the lesson. Can you explain what is happening that makes the
A0 character have a C2 prepended to it when output as utf-8? My
understanding of utf-8 was that it was compatible with latin1. This
behavior is *not* very compatible from my point of view.
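
To make the question concrete, here is a small sketch that dumps the UTF-8 bytes for a few code points. The helper name `utf8_hex` is my own and purely illustrative. As far as I can tell, it shows that only code points 0-127 keep their single-byte form, while anything in the upper half of latin1 (such as A0) becomes two bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of one code point
# as a hex string, for example "C2 A0".
sub utf8_hex {
    my ($code_point) = @_;
    my $bytes = encode('UTF-8', chr($code_point));
    return join ' ', map { sprintf('%02X', $_) } unpack('C*', $bytes);
}

# An ASCII letter, the no-break space, and the euro sign.
for my $cp (0x41, 0xA0, 0x20AC) {
    printf "U+%04X -> %s\n", $cp, utf8_hex($cp);
}
```

If I read this correctly, the compatibility is with US ASCII only, not with the whole of latin1.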

One more point which may be at the root of my problems. I'm trying to
get Apache to add the Content-Type header using the following
declaration in my httpd.conf per the Apache docs:

    AddDefaultCharset utf-8

Whether I put this in my main server configuration or in the virtual
host configuration, if I do a `HEAD http://servername`, I get back a
Content-Type of iso-8859-1. If I view the page in Firefox and manually
tell Firefox to display it as UTF-8, all is well. Any ideas why Apache
isn't playing nice?
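
My working assumption is that the stray character appears because the body is really UTF-8 while the header says iso-8859-1. A rough sketch of that mismatch, decoding the same two bytes both ways (the variable names are just for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes that UTF-8 uses for a no-break space (U+00A0).
my $bytes = "\xC2\xA0";

# Decoded as UTF-8: one character, U+00A0.
my @as_utf8 = map { sprintf('U+%04X', ord($_)) }
    split(//, decode('UTF-8', $bytes));

# Decoded as ISO-8859-1: two characters, U+00C2 (A with circumflex)
# followed by U+00A0.
my @as_latin1 = map { sprintf('U+%04X', ord($_)) }
    split(//, decode('ISO-8859-1', $bytes));

print "as UTF-8:      @as_utf8\n";
print "as ISO-8859-1: @as_latin1\n";
```

That would match what I see in Firefox: the page looks right once the browser is told to treat it as UTF-8.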


Thanks,
William

-- 
Knowmad Services Inc.
http://www.knowmad.com

