[Petal] [REQ] Escape HTML entities

Armin Helbach armin at syrius.de
Tue Oct 7 22:15:14 BST 2003


Hi,

I wanted to use "input => 'XML'" together with HTML entities such as "&nbsp;".

With a recent expat library and XML::Parser, entities declared in an external DTD work fine, but you have to supply the
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
line (or a DOCTYPE for whichever DTD declares the entities you need).
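A minimal sketch of why the DOCTYPE has to reach the parser, assuming XML::Parser (with expat) is installed. To stay self-contained it declares `nbsp` in an internal DTD subset instead of loading the external XHTML DTD, and the helper name `text_of` is made up for the example.

```perl
use strict;
use warnings;
use XML::Parser;    # requires the expat C library

# Hypothetical helper: parse $xml and return all character data as one
# string. Char handlers can fire several times per text node (for example
# around an entity reference), so the pieces are concatenated.
sub text_of {
    my ($xml) = @_;
    my $text = '';
    my $parser = XML::Parser->new(
        Handlers => { Char => sub { $text .= $_[1] } },
    );
    $parser->parse($xml);    # dies on any well-formedness error
    return $text;
}

# The DOCTYPE declares nbsp (in an internal subset), so expat can expand it.
my $with_dtd    = '<!DOCTYPE p [ <!ENTITY nbsp "&#160;"> ]><p>a&nbsp;b</p>';

# Same body without a DOCTYPE: nbsp is undeclared.
my $without_dtd = '<p>a&nbsp;b</p>';

my $expanded = text_of($with_dtd);
my $ok       = eval { text_of($without_dtd); 1 };

print 'with DOCTYPE:    ', ($expanded eq "a\x{a0}b" ? 'nbsp expanded to U+00A0' : 'unexpected'), "\n";
print 'without DOCTYPE: ', ($ok ? "parsed\n" : "error: $@");
```

With the declaration, expat replaces `&nbsp;` by U+00A0 in the character data; without it, `parse` dies on the undefined entity, which is the failure you get when the DOCTYPE never reaches XML::Parser.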

I found that this line is never passed to XML::Parser in Parser/XMLWrapper.pm, because
Canonicalizer/XML.pm strips everything before the first element tag, DOCTYPE included.
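The stripping comes from the two header statements at the top of `process` shown below. This core-Perl sketch runs those same two regexes on a small XHTML template (the template text is invented for the example) to show that the XML declaration and the DOCTYPE end up in `$header` and are cut out of the data that is later handed to the parser.

```perl
use strict;
use warnings;

# A template with an XML declaration and an XHTML DOCTYPE, as in the post.
my $template = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Strict//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'>
<html><body>a&nbsp;b</body></html>
XML

my $data = $template;

# Same two statements as in Canonicalizer/XML.pm: capture everything
# before the first '<' that does not start '<?' or '<!', then cut it off.
my ($header) = $data =~ /(^.*?)<(?!\?|\!)/sm;
$data =~ s/(^.*?)<(?!\?|\!)/\</sm;

print "header: $header";
print "data:   $data";
```

Because the lookahead skips both `<?` and `<!`, the lazy match runs past the XML declaration and the DOCTYPE and stops at `<html>`, so the parser only ever receives the document starting at the root element.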

Since I have not yet fully worked out how my small change affects the rest of the system, I have not attached a patch.
But it works for the entities!

Please comment on this change; I love Petal and will use it in big projects where I need a stable system.

One line changed in Canonicalizer/XML.pm:

sub process
{
    my $class = shift;
    my $parser = shift;
    my $data_ref = shift;
    $data_ref = (ref $data_ref) ? $data_ref : \$data_ref;

    # grab anything that's before the first '<' tag
    my ($header) = $$data_ref =~ /(^.*?)<(?!\?|\!)/sm;
    $$data_ref =~ s/(^.*?)<(?!\?|\!)/\</sm;

    # grab the <!...> tags which the parser is going to strip
    # in order to reinclude them afterwards
    my @decls = $$data_ref =~ /(<!.*?>)/gsm;

    # take the existing processing instructions out and replace
    # them with temporary xml-friendly handlers
    my $pis = $class->_processing_instructions_out ($data_ref);

    local @Result = ();
    local @NodeStack = ();

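    # Changed: I want my DOCTYPE declaration passed to XML::Parser,
    # so put the stripped header back in front of the data.
    # ($header is undef if the template has no element tag at all.)
    $$data_ref = ($header || '') . $$data_ref;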

    $parser->process ($class, $data_ref);

    $header ||= '';
    my $res = (join '', @Result);

    $class->_processing_instructions_in (\$res, $pis);

    return \$res;
}
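A sketch of what the changed `process` now hands to XML::Parser. `data_for_parser` is a hypothetical stand-in for the relevant lines, the processing-instruction step is left out, and the `|| ''` guard is an added assumption: `$header` is undef when a template contains no element tag, and concatenating undef triggers an "uninitialized value" warning (made fatal here so the guard is actually exercised).

```perl
use strict;
use warnings FATAL => 'all';   # turn "uninitialized value" warnings into errors

# Hypothetical helper mirroring the changed part of process(): split off
# the header, then glue it back on before the data goes to XML::Parser.
sub data_for_parser {
    my ($text) = @_;
    my ($header) = $text =~ /(^.*?)<(?!\?|\!)/sm;
    $text =~ s/(^.*?)<(?!\?|\!)/\</sm;
    # ... _processing_instructions_out would rewrite $text here ...
    return ($header || '') . $text;    # $header is undef if no element tag
}

my $doc = "<?xml version=\"1.0\"?>\n<!DOCTYPE p [ <!ENTITY nbsp \"&#160;\"> ]>\n<p>a&nbsp;b</p>\n";

my $seen  = data_for_parser($doc);
my $empty = data_for_parser('plain text, no tags');

print $seen eq $doc ? "parser sees the original text, DOCTYPE included\n"
                    : "text was changed\n";
```

Because the substitution removes exactly what `$header` captured, putting `$header` back restores the text unchanged, so the net effect of the one-line change is simply that the header is no longer stripped before parsing. Judging from the code above, one side effect is that processing instructions inside the header are re-added after `_processing_instructions_out` has run, so that method does not rewrite them.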


armin
