1
votes

Task: I want to parse an XML document using DOMParser (https://developer.mozilla.org/en-US/docs/Web/API/DOMParser). I neither have nor need a formal DTD, and parsing the document as "text/xml" worked pretty well. Now I want to use certain symbolic entities, such as &nbsp;, in my XML, and the parser, of course, complains that they are not known. Since I want to be able to access, in principle, all existing HTML entities, I tried to use a doctype specification

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/html4/strict.dtd">

and this worked as expected, since DOMParser seems to have this doctype and its associated entity list preloaded. However, this doctype is outdated, so I tried the new <!DOCTYPE html>, but this did not work. That is also expected, since the HTML5 doctype works differently from the older XML/SGML-based ones.

Question: Is there a standardized !DOCTYPE for HTML5 that the browser recognizes and that comes with the HTML entities preloaded? (I do not want to copy in a list of all entities as separate entity definitions. The browser has them somewhere; I just do not know how to activate them through an XML/SGML-style DTD for HTML5.)


1 Answer

0
votes

If you want to continue using XML but don't want to use the XHTML doctype, then you have to declare the XHTML character entities yourself via ENTITY declarations, either in the document's internal subset or in an external declaration set. Only HTML has nbsp and many others as predefined entities; XML predefines only quot, amp, apos, lt, and gt. You can use the HTML5 entity set from https://www.w3.org/2003/entities/2007/htmlmathml-f.ent (which includes the large set of MathML entities), or the much smaller set of classic HTML4 entities.

But I would first check whether DOMParser actually processes markup declarations in the internal subset and/or external declaration sets containing markup declarations. Try to parse the following

<?xml version="1.0"?>
<!DOCTYPE test [
  <!ENTITY nbsp "&#xA0;">
]>
<test>
  &nbsp;
</test>

and check the console for error messages.
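A minimal sketch of how this check might be run from script. Note that DOMParser does not throw on malformed XML; it returns a document containing a `parsererror` element, so it is worth inspecting the result as well as the console. The function name `parseXmlOrThrow` and the constant `TEST_DOCUMENT` are hypothetical, and the final block only runs where `DOMParser` is defined (that is, in a browser).

```javascript
const TEST_DOCUMENT = `<?xml version="1.0"?>
<!DOCTYPE test [
  <!ENTITY nbsp "&#xA0;">
]>
<test>
  &nbsp;
</test>`;

// Parses `source` as XML and throws if the parser reported an error.
// DOMParser signals failure through a <parsererror> element, not an exception.
function parseXmlOrThrow(source, parser) {
  const doc = parser.parseFromString(source, "application/xml");
  const error = doc.getElementsByTagName("parsererror")[0];
  if (error) {
    throw new Error(error.textContent);
  }
  return doc;
}

// Skipped in environments without DOMParser (for example plain Node.js).
if (typeof DOMParser !== "undefined") {
  const doc = parseXmlOrThrow(TEST_DOCUMENT, new DOMParser());
  // If the internal subset was processed, the text contains U+00A0.
  console.log(doc.documentElement.textContent.includes("\u00A0"));
}
```

If the call throws, the parser rejected the entity reference, which would suggest that the internal subset is not being processed.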

There is no "official" DTD for HTML (in fact, no formal grammar at all), but there's my SGML DTD for W3C HTML 5.1, which has much more information about parsing HTML5 than you are probably interested in, including information about HTML5's predefined entities.