I'm trying to create the following DOCTYPE declaration, whose internal subset contains entity declarations:

<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" 
[ <!ENTITY icon.url "https://example.com/icon.png"> 
<!ENTITY base.url "https://example.com/content/" > ]>

I can successfully create the DOCTYPE without the entity declarations:

#!/usr/bin/perl -w
use strict;
use XML::LibXML;

my $doc = XML::LibXML::Document->new('1.0','UTF-8');
my $dtd = $doc->createInternalSubset( "LinkSet", "-//NLM//DTD LinkOut 1.0//EN", "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" );

my $ls = $doc->createElement( "LinkSet" );
$doc->setDocumentElement($ls);

print $doc->toString;
exit;

Results in:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd">
<LinkSet/>

The XML::LibXML documentation shows how to add an entity reference to a document, but not how to declare an entity in the DOCTYPE.

A similar (but PHP-based) question suggests writing the ENTITY declarations as a string and parsing that. Is this the best approach in Perl too?

Comment:

Entity declarations are nodes of type XML_ENTITY_DECL that exist as children of the DTD node. $node->addChild doesn't support adding nodes of type XML_ENTITY_DECL, so you wouldn't be able to add one even if you could create it. Therefore, the parse-a-string approach that has been posted is probably the only way to do what you want.
– ikegami

1 Answer

The documentation for XML::LibXML::Document says this:

[The Document Class] inherits all functions from XML::LibXML::Node as specified in the DOM specification. This enables access to the nodes besides the root element on document level - a "DTD" for example. The support for these nodes is limited at the moment.

It also makes it clear later on that the source of these limitations is libxml2 itself, not the Perl module. This makes sense, as the DTD has a completely different syntax from XML (or even an XML Processing Instruction) even though it may look superficially similar.

The only way appears to be to parse a skeleton document that already contains the required DOCTYPE, and then build on the resulting document object, like so:

use strict;
use warnings 'all';

use XML::LibXML;

# Single-quoted heredoc tag: the XML is taken literally, with no interpolation
my $doc = XML::LibXML->load_xml(string => <<'__END_XML__');
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" 
[
  <!ENTITY icon.url "https://example.com/icon.png"> 
  <!ENTITY base.url "https://example.com/content/">
]>

<LinkSet/>
__END_XML__

print $doc;

Output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE LinkSet PUBLIC "-//NLM//DTD LinkOut 1.0//EN" "https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd" [
<!ENTITY icon.url "https://example.com/icon.png">
<!ENTITY base.url "https://example.com/content/">
]>
<LinkSet/>
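
If the entity list is not fixed, the same parse-a-string idea can be wrapped in a small helper. What follows is only a sketch under stated assumptions: new_doc_with_entities is a hypothetical function, not part of XML::LibXML; it refuses values containing &, % or " rather than trying to escape them; and the IconUrl element name is used purely for illustration. Once the skeleton is parsed, the rest of the document can be built with the usual DOM calls, including createEntityReference.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper, not part of XML::LibXML: writes a DOCTYPE with an
# internal subset as text, parses that skeleton, and returns the document.
sub new_doc_with_entities {
    my ($root, $public_id, $system_id, @entities) = @_;

    my $decls = '';
    # @entities is a flat list of name/value pairs, kept in the given order.
    while ( my ($name, $value) = splice @entities, 0, 2 ) {
        # Assumption: values are plain strings. Characters that are special
        # inside an entity value literal are rejected, not escaped.
        die "Unsupported character in value of entity $name\n"
            if $value =~ /[&%"]/;
        $decls .= qq{  <!ENTITY $name "$value">\n};
    }

    my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
            . qq{<!DOCTYPE $root PUBLIC "$public_id" "$system_id" [\n}
            . $decls
            . qq{]>\n<$root/>\n};

    return XML::LibXML->load_xml(string => $xml);
}

my $doc = new_doc_with_entities(
    'LinkSet',
    '-//NLM//DTD LinkOut 1.0//EN',
    'https://www.ncbi.nlm.nih.gov/projects/linkout/doc/LinkOut.dtd',
    'icon.url' => 'https://example.com/icon.png',
    'base.url' => 'https://example.com/content/',
);

# From here on the ordinary DOM API applies.
# "IconUrl" is only an illustrative element name.
my $icon = $doc->createElement('IconUrl');
$icon->appendChild( $doc->createEntityReference('icon.url') );
$doc->documentElement->appendChild($icon);

print $doc->toString(1);
```

Passing the entities as an ordered list of pairs rather than a hash keeps the declarations in the order you wrote them.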