Is there a way to escape a CDATA end token in xml?

Question

I was wondering if there is any way to escape a CDATA end token (]]>) within a CDATA section in an xml document. Or, more generally, if there is some escape sequence for use within a CDATA section (though if one exists, I guess it would probably only make sense for escaping the begin or end tokens anyway).

Basically, can you embed a begin or end token in a CDATA section and tell the parser not to interpret it, but to treat it as just another character sequence?

Probably, you should just refactor your xml structure or your code if you find yourself trying to do that, but even though I've been working with xml on a daily basis for the last 3 years or so and I have never had this problem, I was wondering if it was possible. Just out of curiosity.

Edit:

Other than using html encoding...

First, I accept the answer as correct, but note: nothing precludes someone from encoding > as &gt; within CData to ensure an embedded ]]> will not be parsed as CDEnd. It simply means it's unexpected, and that & must FIRST be encoded as &amp; too so that the data can be properly decoded. Users of the document must know to decode this CData too. It's not unheard of, since part of the purpose of CData is to contain content that a specific consumer understands how to handle. Such a CData just can't be expected to be interpreted properly by any generic consumer. — nix
@nix, CDATA just provides an explicit way to declare text node content such that language tokens within (other than ]]>) do not get parsed. It specifically does not expand entity references like &gt; for this reason, so in a CDATA block, that just means those four characters, not '>'. To put it in perspective: in the xml spec, all text content is called "cdata" ("character data"), not just these sequences. Also, it's not about specific consuming agents. (Such a thing does exist though: processing instructions, <?target instruction?>.) — Semicolon
(I should add, even if this sort of thing runs contrary to the original intent of the node, all is fair in the long & torturous battle with XML. I just feel it could be useful for readers to know that <![CDATA[]]> was not actually designed for that purpose.) — Semicolon
@Semicolon CDATA was designed to allow anything: "they are used to escape blocks of text containing characters which would otherwise be recognized as markup". That implies CDATA too, since it is also markup. But, in fact, you don't need the double encoding I implied. ]]&gt; is an acceptable means of encoding a CDEnd within a CDATA. — nix
True, you wouldn't need double encoding -- but you would still need the agent to have special knowledge, since the parser wouldn't parse &gt; as >. That's what you mean though, I think? That you could replace them as you see fit, after parsing? — Semicolon
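
The point debated in the comments above (entity references are not expanded inside a CDATA section) can be checked with a short sketch. This assumes Python's standard `xml.etree.ElementTree` parser; the sample strings are made up for illustration and are not from the thread.

```python
import xml.etree.ElementTree as ET

# Inside CDATA the parser leaves "&gt;" alone: it stays four literal characters.
in_cdata = ET.fromstring("<root><![CDATA[a ]]&gt; b]]></root>").text

# In ordinary text content the same reference is expanded to ">".
in_text = ET.fromstring("<root>a ]]&gt; b</root>").text

print(repr(in_cdata))
print(repr(in_text))
```

So a consumer that writes ]]&gt; inside CDATA has to do its own replacement after parsing, which is the "special knowledge" the last comment refers to.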

S.Lott · Accepted Answer · 2008-10-21T22:27:56

You have to break your data into pieces to conceal the ]]>.

Here's the whole thing:

<![CDATA[]]]]><![CDATA[>]]>

The first <![CDATA[]]]]> has the ]]. The second <![CDATA[>]]> has the >.
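
As a rough illustration of the splitting trick above, here is a minimal Python sketch. The helper name `wrap_cdata` and the sample payload are hypothetical, and the round trip is checked with the standard `xml.etree.ElementTree` parser; treat it as one possible way to automate the split, not a definitive implementation.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two sections.

    Each ']]>' becomes ']]' + end of section + start of a new section + '>',
    so the end token never appears intact inside any single section.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical payload that happens to contain the CDATA end token.
payload = "if (a[b[0]]> 1) { done(); }"
doc = "<root>" + wrap_cdata(payload) + "</root>"

# Adjacent CDATA sections are reported as one run of character data.
parsed = ET.fromstring(doc).text
print(parsed == payload)
```

Calling `wrap_cdata("]]>")` yields exactly the string shown in the answer, `<![CDATA[]]]]><![CDATA[>]]>`.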
