[Petal] UTF8 under Perl 5.6.1

William McKee william at knowmad.com
Mon Feb 28 20:23:55 GMT 2005


On Mon, Feb 28, 2005 at 04:49:54PM +0100, Michele Beltrame wrote:
> >I tried using Encode but quickly realized that it is not available under
> >Perl 5.6. So, does anyone have any idea of what encoding Perl uses in
> 
> I think you can install it, it's available on CPAN. However, I don't 
> know how it works under Perl 5.6.x. ;-)

Apparently not, since Encode's Makefile.PL contains this line:

  use 5.007003;

My initial research led me to conclude that I should not use Petal 2 with
Perl 5.6.x because its Unicode support is incomplete and experimental. As an
example, I have an application with 3 templates. Two of the pages
generate iso-8859-1 output. One of the pages generates utf8 output.

This is a major problem because when chr(160), aka nbsp, is output as
utf8 but the page is viewed as latin1, an extra Acirc character is
displayed (btw, I found an excellent explanation[1] of why this happens
which has helped clear up my understanding of some of the Unicode issues
I've encountered).
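
To make the byte-level cause concrete, here is a minimal shell sketch. It
is only an illustration (it is not part of the patch, and the variable
names are made up): it dumps the bytes for nbsp in each encoding.

```shell
# U+00A0 (nbsp) is the single byte 0xA0 in iso-8859-1 ...
latin1_hex=$(printf '\240' | od -An -tx1 | tr -d ' \n')

# ... but the two bytes 0xC2 0xA0 in UTF-8.
utf8_hex=$(printf '\302\240' | od -An -tx1 | tr -d ' \n')

echo "latin1: $latin1_hex"
echo "utf8:   $utf8_hex"
```

A latin1 viewer reads each byte as one character, so it shows 0xC2 as
Acirc and then 0xA0 as the nbsp itself.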

The crux of the problem is that I was having difficulty finding a way to
influence how perl 5.6.1 is reading in data from my templates. According
to the perlunicode docs for 5.6.1[2], this is a limitation of the
system. It turns out that pack can be used to influence this behavior.
By replacing the chr() calls with pack("U", xxx) calls, I was
able to get perl 5.6.1 to output UTF8 on all 3 of my pages.

Despite the following patch, I'd still advise against using Petal 2 with
Perl 5.6.x due to this kind of crazy behavior. I modified
MKDoc::XML::Decode::[XMLBase|XHTML|Numeric] to enable this behavior.
From my initial tests, the code also works under 5.8.6. I've included a
patch to this message in case anyone else would like to see how I
implemented this behavior. Comments would be much appreciated. This
patch also contains my modifications to Decode.pm which enable PAR
support. It is based on the older 0.72 codebase.

BTW, can anyone tell me how to generate a recursive diff that includes
the primary module? For instance, here's the basic syntax of the
command-line I used to generate this attachment:

  diff -ru ~/MKDoc-XML-0.74/lib/MKDoc ~/client1/perl5/MKDoc

However, that does not include modifications to MKDoc.pm (not that I
made any). Would that have to be submitted as a separate patch or is it
possible to concatenate these?
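
For what it's worth, unified diffs can be concatenated into one file, and
patch processes each file section in turn. A rough sketch of that
approach follows; the orig/ and new/ trees and their file contents are
hypothetical stand-ins, not my real directories.

```shell
# Hypothetical stand-ins for the pristine and the modified trees.
work=$(mktemp -d)
mkdir -p "$work/orig/lib/MKDoc" "$work/new/lib/MKDoc"
echo 'v1' > "$work/orig/lib/MKDoc.pm"
echo 'v2' > "$work/new/lib/MKDoc.pm"
echo 'v1' > "$work/orig/lib/MKDoc/XML.pm"
echo 'v2' > "$work/new/lib/MKDoc/XML.pm"

cd "$work"
# diff exits with status 1 when the inputs differ, hence the "|| true".
diff -u  orig/lib/MKDoc.pm new/lib/MKDoc.pm  > combined.patch || true
diff -ru orig/lib/MKDoc    new/lib/MKDoc    >> combined.patch || true

# Count the per-file headers that ended up in the combined patch.
headers=$(grep -c '^--- ' combined.patch)
echo "files in patch: $headers"
```

The first diff covers the top-level module, the second covers the
directory beneath it, and appending with >> keeps both in one attachment.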


Thanks,
William

[1] http://www.dpawson.co.uk/xsl/sect2/N7150.html#d8562e1215
[2] http://www.sunsite.ualberta.ca/Documentation/Misc/perl-5.6.1/pod/perlunicode.html

-- 
Knowmad Services Inc.
http://www.knowmad.com
-------------- next part --------------
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode/Numeric.pm MKDoc/XML/Decode/Numeric.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode/Numeric.pm	2003-12-10 10:52:13.000000000 -0500
+++ MKDoc/XML/Decode/Numeric.pm	2005-02-28 15:03:03.000000000 -0500
@@ -14,8 +14,10 @@
     # if hex, convert to hex
     $stuff =~ s/^\[xX]([0-9a-fA-F])+$/hex($1)/e;
     
+    #warn "Converting $stuff to an entity via chr --> " . chr ($stuff);
     return unless ($stuff =~ /^\d+$/);
-    return chr ($stuff);
+    #return chr ($stuff) if $] > 5.007;
+    return pack ("U", $stuff);
 }
 
 1;
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode/XHTML.pm MKDoc/XML/Decode/XHTML.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode/XHTML.pm	2004-10-06 07:31:44.000000000 -0400
+++ MKDoc/XML/Decode/XHTML.pm	2005-02-28 15:02:42.000000000 -0500
@@ -317,7 +317,13 @@
     (@_ == 2) or warn "MKDoc::XML::Encode::process() should be called with two arguments";
     my $class = shift;
     my $stuff = shift;
-    return $ENTITY_2_CHAR{$stuff};
+    #return $ENTITY_2_CHAR{$stuff} if $] > 5.007;
+
+    my $char = $ENTITY_2_CHAR{$stuff} || '';
+    return unless $char;
+    my $ord = ord($char);
+    #warn "found a character = $stuff --> " . $char . " [$ord]";
+    return pack("U", ord($char));
 }
 
 
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode/XMLBase.pm MKDoc/XML/Decode/XMLBase.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode/XMLBase.pm	2003-12-10 10:52:13.000000000 -0500
+++ MKDoc/XML/Decode/XMLBase.pm	2005-02-28 15:00:50.000000000 -0500
@@ -14,7 +14,14 @@
 {
     my $class = shift;
     my $stuff = shift;
-    return $XML_Decode{$stuff};
+    #return $XML_Decode{$stuff};
+
+    my $char = $XML_Decode{$stuff} || '';
+    #warn "char = $char";
+    return unless $char;
+    my $ord = ord($char);
+    #warn "ord = $ord";
+    return pack("U", $ord);
 }
 
 sub module_name { 'xml' }
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode.pm MKDoc/XML/Decode.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Decode.pm	2004-10-06 07:44:27.000000000 -0400
+++ MKDoc/XML/Decode.pm	2005-02-28 15:03:21.000000000 -0500
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::Decode
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This modules expands XML entities &amp; &gt; &lt; &quot; and &apos;.
@@ -21,6 +21,25 @@
 # import all plugins once
 foreach my $include_dir (@INC)
 {
+  my @modules;
+  # handle PAR archives
+  if (ref $include_dir eq 'CODE')
+  {
+    next unless keys %PAR::LibCache; # skip if PAR is not loaded
+    while (my ($filename, $zip) = each %PAR::LibCache)
+    {
+      my @mods = $zip->membersMatching( "MKDoc/XML/Decode/" );
+      foreach my $mod (@mods)
+      {
+        my $fn = $mod->fileName;
+        my ($pm) = $fn =~ /\/(\w+)\.pm$/;
+        #warn "$fn = $pm";
+        push @modules, $pm;
+      }
+    }
+  }
+  else
+  {
     my $dir = "$include_dir/MKDoc/XML/Decode";
     if (-e $dir and -d $dir)
     {
@@ -35,7 +54,11 @@
                       readdir (DD);
 
         closedir DD;
+      }
+    }
 	
+    #use Data::Dumper;
+    #warn Dumper(\@modules);
         foreach my $module (@modules)
         {
 	    $module =~ /^(\w+)$/;
@@ -49,7 +72,6 @@
 	    
 	    $Modules{$name} = "MKDoc::XML::Decode::$module";
         }
-    }
 }
 
 
@@ -72,6 +94,7 @@
 {
     my $self = shift;
     my $char = shift;
+    #warn "Converting $char to an entity";
     for (@{$self}) {
 	my $res = $_->process ($char);
 	return $res if (defined $res);
@@ -88,6 +111,9 @@
 
     my $self = shift;
     my $data = join '', map { defined $_ ? $_ : () } @_;
+    #use Data::Dumper;
+    #warn "Ref = " . ref $self if ($data  =~ m/&(#?[0-9A-Za-z]+);/);
+    #warn "Data = " . Dumper($data) if ($data  =~ m/&(#?[0-9A-Za-z]+);/);
     $data    =~ s/&(#?[0-9A-Za-z]+);/$self->entity_to_char ($1)/eg;
     return $data;
 }
@@ -145,7 +171,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Dumper.pm MKDoc/XML/Dumper.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Dumper.pm	2004-10-06 07:44:34.000000000 -0400
+++ MKDoc/XML/Dumper.pm	2004-01-31 05:48:15.000000000 -0500
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::Dumper
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This module serializes / dumps / freezes Perl structures to a well-formed XML string
@@ -455,7 +455,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Encode.pm MKDoc/XML/Encode.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Encode.pm	2004-10-06 07:44:37.000000000 -0400
+++ MKDoc/XML/Encode.pm	2003-09-19 11:28:30.000000000 -0400
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::Encode
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This modules encodes XML entities &amp; &gt; &lt; &quot; and &apos;.
@@ -80,7 +80,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Stripper/mkdoc16.txt MKDoc/XML/Stripper/mkdoc16.txt
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Stripper/mkdoc16.txt	2004-10-06 07:31:44.000000000 -0400
+++ MKDoc/XML/Stripper/mkdoc16.txt	2004-07-21 13:40:36.000000000 -0400
@@ -331,7 +331,6 @@
 h1 lang
 h1 title
 h1 xml:lang
-h1 align
 
 
 # h2
@@ -343,7 +342,7 @@
 h2 lang
 h2 title
 h2 xml:lang
-h2 align
+
 
 # h3
 # heading
@@ -354,7 +353,7 @@
 h3 lang
 h3 title
 h3 xml:lang
-h3 align
+
 
 # h4
 # heading
@@ -365,7 +364,7 @@
 h4 lang
 h4 title
 h4 xml:lang
-h4 align
+
 
 # h5
 # heading
@@ -376,7 +375,7 @@
 h5 lang
 h5 title
 h5 xml:lang
-h5 align
+
 
 # h6
 # heading
@@ -387,7 +386,7 @@
 h6 lang
 h6 title
 h6 xml:lang
-h6 align
+
 
 # head
 # document head
@@ -448,7 +447,7 @@
 img usemap
 img width
 img xml:lang
-img border
+
 
 # input
 # form control
@@ -619,7 +618,6 @@
 p lang
 p title
 p xml:lang
-p align
 
 
 # param
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Stripper.pm MKDoc/XML/Stripper.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Stripper.pm	2004-10-06 07:44:39.000000000 -0400
+++ MKDoc/XML/Stripper.pm	2003-09-30 05:49:57.000000000 -0400
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::Stripper
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This module removes user-defined markup from an existing XML file / variable.
@@ -326,7 +326,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Tagger/Preserve.pm MKDoc/XML/Tagger/Preserve.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Tagger/Preserve.pm	2004-10-06 07:44:56.000000000 -0400
+++ MKDoc/XML/Tagger/Preserve.pm	2004-02-06 10:38:31.000000000 -0500
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::Tagger::Preserve
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This module uses MKDoc::XML::Tagger, except it preserves specific tags to prevent
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Tagger.pm MKDoc/XML/Tagger.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Tagger.pm	2004-12-09 08:49:22.000000000 -0500
+++ MKDoc/XML/Tagger.pm	2004-06-21 08:38:23.000000000 -0400
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::Tagger
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This module adds markup to an existing XML file / variable by matching expression.
@@ -108,15 +108,12 @@
         $text = _text_replace ($text, $expr, $tag, \%attr);
     }
     
-    while ($text =~ /\&\(\d+\)/)
-    {
     for (my $i = 0; $i < @{$tags}; $i++)
     {
         my $c   = $i + 1;
         my $tag = $tags->[$i];
         $text =~ s/\&\($c\)/$tag/g;
     }
-    }
     
     return $text;
 }
@@ -365,7 +362,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Tokenizer.pm MKDoc/XML/Tokenizer.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Tokenizer.pm	2004-10-06 07:44:50.000000000 -0400
+++ MKDoc/XML/Tokenizer.pm	2003-10-16 07:20:17.000000000 -0400
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::Tokenizer
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This module turns an XML string into a list of tokens and returns this list.
@@ -199,7 +199,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Token.pm MKDoc/XML/Token.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/Token.pm	2004-10-06 07:44:47.000000000 -0400
+++ MKDoc/XML/Token.pm	2004-05-04 12:03:15.000000000 -0400
@@ -413,7 +413,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/TreeBuilder.pm MKDoc/XML/TreeBuilder.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/TreeBuilder.pm	2004-10-06 07:44:53.000000000 -0400
+++ MKDoc/XML/TreeBuilder.pm	2003-10-09 11:22:04.000000000 -0400
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::TreeBuilder
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This module turns an XML string into a tree of elements and returns the top elements.
@@ -306,7 +306,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/TreePrinter.pm MKDoc/XML/TreePrinter.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML/TreePrinter.pm	2004-10-06 07:45:02.000000000 -0400
+++ MKDoc/XML/TreePrinter.pm	2004-07-20 08:46:07.000000000 -0400
@@ -1,7 +1,7 @@
 # -------------------------------------------------------------------------------------
 # MKDoc::XML::TreePrinter
 # -------------------------------------------------------------------------------------
-# Author : Jean-Michel Hiver.
+# Author : Jean-Michel Hiver <jhiver at mkdoc.com>.
 # Copyright : (c) MKDoc Holdings Ltd, 2003
 #
 # This module is the counterpart of MKDoc::XML::TreePrinter. It turns an XML
@@ -75,8 +75,6 @@
 sub _encode_quot
 {
     my $res = shift;
-    return '' unless (defined $res);
-
     $res =~ s/\"/\&quot\;/g;
     return $res;
 }
@@ -110,7 +108,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
diff -ru /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML.pm MKDoc/XML.pm
--- /home/william/local/MKDoc-XML-0.74/lib/MKDoc/XML.pm	2004-12-09 09:02:39.000000000 -0500
+++ MKDoc/XML.pm	2004-07-21 13:41:54.000000000 -0400
@@ -2,7 +2,7 @@
 use strict;
 use warnings;
 
-our $VERSION = '0.74';
+our $VERSION = '0.72';
 
 
 1;
@@ -111,7 +111,7 @@
 
 Copyright 2003 - MKDoc Holdings Ltd.
 
-Author: Jean-Michel Hiver
+Author: Jean-Michel Hiver <jhiver at mkdoc.com>
 
 This module is free software and is distributed under the same license as Perl
 itself. Use it at your own risk.
@@ -119,7 +119,7 @@
 
 =head1 SEE ALSO
 
-  Petal: http://search.cpan.org/dist/Petal/
+  Petal: http://search.cpan.org/author/JHIVER/Petal/
   MKDoc: http://www.mkdoc.com/
 
 Help us open-source MKDoc. Join the mkdoc-modules mailing list: