[Petal] More on entities and Â
William McKee
william at knowmad.com
Tue May 4 16:50:14 BST 2004
On Tue, May 04, 2004 at 05:02:43PM +0200, Michele Beltrame wrote:
> It really depends on Perl, as it has a "use UTF8 if I find a UTF8 character"
> behaviour. The only way you can be sure output is *always* UTF8 or
> *always* ISO8859-1 is to use the Encode module, as per the example I
> posted in my previous message.
OK, this explanation makes sense.
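For reference, here is a minimal sketch of what forcing the output encoding with Encode can look like. It is not the example from Michele's earlier message (which is not reproduced here); the helper name `utf8_hex` and the sample string are made up for illustration. It shows that the same characters give the same UTF-8 bytes whether or not Perl happens to store the string internally as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string as UTF-8 and
# return the resulting bytes as space-separated hex.
sub utf8_hex {
    my ($string) = @_;
    my $bytes = encode('UTF-8', $string);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

my $plain    = "caf\x{E9}";   # e-acute fits in Latin-1, so Perl keeps it as bytes
my $upgraded = $plain;
utf8::upgrade($upgraded);     # same characters, but stored internally as UTF-8

print utf8_hex($plain),    "\n";   # 63 61 66 C3 A9
print utf8_hex($upgraded), "\n";   # 63 61 66 C3 A9
```

Encoding explicitly at the output boundary means the result no longer depends on which internal representation Perl picked for the string.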
> > The next thing that confuses me is that I have Perl 5.8.3 installed on
> > both systems. Only one is showing the extra character.
>
> This is, of course, a mystery. ;-)
Figures... :-/
> The 00 is not actually prepended to characters with code points 0-127
> in UTF8. This is one of the things that makes UTF8 different from
> UCS2 (the fixed-width predecessor of UTF16), which always uses two bytes
> per character. UTF8 characters occupy a variable number of bytes, and
> that allows characters 0-127 to remain the same, thus maintaining
> perfect compatibility with US ASCII documents.
Thanks for the lesson. Can you explain what is happening that makes the
A0 character have a C2 prepended to it when output as utf-8? My
understanding of utf-8 was that it was compatible with latin1. This
behavior is *not* very compatible from my point of view.
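A small sketch of the byte-level picture, assuming only the core Encode module (the helper name `hex_bytes` is made up for illustration). Code points 0-127 encode to the same single byte in both charsets, but U+00A0 (the no-break space) is the single byte A0 in ISO-8859-1 and the two bytes C2 A0 in UTF-8. If those two bytes are then read back as ISO-8859-1, the C2 shows up as its own character, U+00C2, which is the "Â" from the subject line.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode a character string in the given charset
# and return the resulting bytes as space-separated hex.
sub hex_bytes {
    my ($charset, $string) = @_;
    my $bytes = encode($charset, $string);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

print hex_bytes('ISO-8859-1', "A"),      "\n";   # 41
print hex_bytes('UTF-8',      "A"),      "\n";   # 41
print hex_bytes('ISO-8859-1', "\x{A0}"), "\n";   # A0
print hex_bytes('UTF-8',      "\x{A0}"), "\n";   # C2 A0

# Misreading the UTF-8 bytes as ISO-8859-1 yields two characters.
my @code_points = map { sprintf 'U+%04X', ord }
                  split //, decode('ISO-8859-1', "\xC2\xA0");
print "@code_points\n";                          # U+00C2 U+00A0
```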
One more point which may be at the root of my problems. I'm trying to
get Apache to add the Content-Type header using the following
declaration in my httpd.conf per the Apache docs:
    AddDefaultCharset utf-8
No matter if I have this in my main server configuration or the virtual
host configuration, if I do a `HEAD http://servername`, I get back a
Content-Type of iso-8859-1. If I view the page in Firefox and manually
tell Firefox to display it as UTF-8, all is well. Any ideas why Apache
isn't playing nice?
Thanks,
William
--
Knowmad Services Inc.
http://www.knowmad.com