1101
votes

I often find this strange CDATA tag in XML files:

<![CDATA[some stuff]]>

I have observed that this CDATA tag always comes at the beginning, followed by some stuff.

But sometimes it is used, sometimes it is not. I assume it is to mark that some stuff is the "data" that will be inserted after that. But what kind of data is some stuff? Isn't anything I write in XML tags some sort of data?

13 Answers

1025
votes

CDATA stands for Character Data and it means that the data in between these strings includes data that could be interpreted as XML markup, but should not be.

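The key differences between CDATA and comments are that CDATA is still part of the document while a comment is not, and that CDATA cannot contain the string ]]> while a comment cannot contain two adjacent dashes.

For example, take these four snippets of XML from one well-formed document: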

<!ENTITY % MyParamEntity "Has been expanded">

<!--
Within this comment I can use ]]>
and other reserved characters like <
&, ', and ", but %MyParamEntity; will not be expanded
(if I retrieve the text of this node it will contain
%MyParamEntity; and not "Has been expanded")
and I can't place two dashes next to each other.
-->

<![CDATA[
Within this Character Data block I can
use double dashes as much as I want (along with <, &, ', and ").
%MyParamEntity; is not expanded here either, because parameter
entity references are only recognized inside the DTD.
The one thing I can't use is
the CEND sequence. If I need to use CEND I must escape one of the
brackets or the greater-than sign using concatenated CDATA sections.
]]>

<description>An example of escaped CENDs</description>
<!-- This text contains a CEND ]]> -->
<!-- In this first case we put the ]] at the end of the first CDATA block
     and the > in the second CDATA block -->
<data><![CDATA[This text contains a CEND ]]]]><![CDATA[>]]></data>
<!-- In this second case we put a ] at the end of the first CDATA block
     and the ]> in the second CDATA block -->
<alternative><![CDATA[This text contains a CEND ]]]><![CDATA[]>]]></alternative>
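
To check that the two escaped forms really carry the same text, here is a minimal Python sketch using the standard library's ElementTree. The <root> wrapper is only an assumption added so both elements can sit in one document. It parses the <data> and <alternative> elements from the snippet above and compares what the parser hands back.

```python
import xml.etree.ElementTree as ET

# The two escaped forms from the snippet above, inside a hypothetical <root>.
DOC = (
    "<root>"
    "<data><![CDATA[This text contains a CEND ]]]]><![CDATA[>]]></data>"
    "<alternative><![CDATA[This text contains a CEND ]]]><![CDATA[]>]]></alternative>"
    "</root>"
)

root = ET.fromstring(DOC)
data_text = root.find("data").text
alternative_text = root.find("alternative").text

# Adjacent CDATA sections are reported as one run of character data,
# so both elements end up holding the literal ]]> sequence.
print(repr(data_text))
print(repr(alternative_text))
```

Both spellings split the ]]> at a different point, but the parser joins the adjacent sections, so the element text is the same either way.
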
350
votes

A CDATA section is "a section of element content that is marked for the parser to interpret as only character data, not markup."

Syntactically, it behaves similarly to a comment:

<exampleOfAComment>
<!--
    Since this is a comment
    I can use all sorts of reserved characters
    like > < " and &
    or write things like
    <foo></bar>
    but my document is still well-formed!
-->
</exampleOfAComment>

... but it is still part of the document:

<exampleOfACDATA>
<![CDATA[
    Since this is a CDATA section
    I can use all sorts of reserved characters
    like > < " and &
    or write things like
    <foo></bar>
    but my document is still well formed!
]]>
</exampleOfACDATA>
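
The difference can also be seen from a DOM, as in this small Python sketch with the standard library's xml.dom.minidom. The <example> document and the describe_children helper are made up for illustration. The comment and the CDATA section both parse without complaint, but they come back as different node types, and only the CDATA section counts as character data of the element.

```python
from xml.dom import Node, minidom

# Hypothetical document: a comment and a CDATA section side by side.
DOC = "<example><!-- hidden < & --><![CDATA[shown < & <foo></bar>]]></example>"


def describe_children(xml_text):
    """Return (kind, data) pairs for the children of the root element."""
    names = {
        Node.COMMENT_NODE: "comment",
        Node.CDATA_SECTION_NODE: "cdata",
        Node.TEXT_NODE: "text",
    }
    root = minidom.parseString(xml_text).documentElement
    return [(names.get(child.nodeType, "other"), child.data)
            for child in root.childNodes]


children = describe_children(DOC)
for kind, data in children:
    print(kind, repr(data))
```
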

Try saving the following as a .xhtml file (not .html) and opening it in Firefox (not Internet Explorer) to see the difference between the comment and the CDATA section. The comment won't appear when you look at the document in a browser, while the CDATA section will:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" >
<head>
<title>CDATA Example</title>
</head>
<body>

<h2>Using a Comment</h2>
<div id="commentExample">
<!--
You won't see this in the document
and can use reserved characters like
< > & "
-->
</div>

<h2>Using a CDATA Section</h2>
<div id="cdataExample">
<![CDATA[
You will see this in the document
and can use reserved characters like
< > & "
]]>
</div>

</body>
</html>

Something to take note of with CDATA sections is that they have no escaping mechanism, so there's no way to include the string ]]> in a single section. Any character data which contains ]]> will have to be, as far as I know, a text node instead (or be split across two adjacent CDATA sections). Likewise, from a DOM manipulation perspective you can't create a CDATA section which includes ]]>:

var myEl = xmlDoc.getElementById("cdata-wrapper");
myEl.appendChild(xmlDoc.createCDATASection("This section cannot contain ]]>"));

This DOM manipulation code will either throw an exception (in Firefox) or result in a poorly structured XML document: http://jsfiddle.net/9NNHA/
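
One way around this when building the XML text yourself is to split every ]]> across two adjacent CDATA sections before serializing. The following is a rough Python sketch of that idea, not something a DOM API does for you. The to_cdata helper and the <wrapper> element are hypothetical names.

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting every ]]> across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "This section can now contain ]]> safely"
serialized = "<wrapper>" + to_cdata(payload) + "</wrapper>"

# Parsing the result gives back the original string, ]]> included.
round_tripped = ET.fromstring(serialized).text
print(serialized)
print(round_tripped == payload)
```
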

72
votes

One big use-case: your xml includes a program, as data (e.g. a web-page tutorial for Java). In that situation your data includes a big chunk of characters that include '&' and '<' but those characters aren't meant to be xml.

Compare:

<example-code>
while (x &lt; len &amp;&amp; !done) {
    print( &quot;Still working, &apos;zzz&apos;.&quot; );
    ++x;
    }
</example-code>

with

<example-code><![CDATA[
while (x < len && !done) {
    print( "Still working, 'zzz'." );
    ++x;
    }
]]></example-code>

Especially if you are copy/pasting this code from a file (or including it, in a pre-processor), it's nice to just have the characters you want in your xml file, w/o confusing them with XML tags/attributes. As @paary mentioned, other common uses include when you're embedding URLs that contain ampersands. Finally, even if the data only contains a few special characters but the data is very very long (the text of a chapter, say), it's nice to not have to be en/de-coding those few entities as you edit your xml file.
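
To make the comparison concrete, here is a small Python sketch, assuming a shortened version of the loop above as sample data. It builds both forms of the element and checks that a parser sees the same text in each. xml.sax.saxutils.escape does the entity encoding for the first form.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Shortened sample program text (an assumption, not the exact code above).
code = 'while (x < len && !done) { print("Still working"); ++x; }'

# Form 1: every & and < (and >) replaced by an entity reference.
escaped_form = "<example-code>" + escape(code) + "</example-code>"
# Form 2: the characters kept as typed, wrapped in a CDATA section.
cdata_form = "<example-code><![CDATA[" + code + "]]></example-code>"

print(escaped_form)
print(cdata_form)

# Both documents carry exactly the same character data.
same_text = ET.fromstring(escaped_form).text == ET.fromstring(cdata_form).text == code
print(same_text)
```

So the choice between the two is about how readable the source file is, not about what the consuming program receives.
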

(I suspect all the comparisons to comments are kinda misleading/unhelpful.)

45
votes

I once had to use CDATA when my xml element needed to store HTML code. Something like

<codearea>
  <![CDATA[ 
  <div> <p> my para </p> </div> 
  ]]>
</codearea>

So CDATA means the parser will not treat characters such as < and >, which could otherwise be interpreted as XML tags, as markup.

34
votes

The data contained therein will not be parsed as XML, so it does not need to be valid XML and can contain elements that may appear to be XML but are not.

19
votes

As another example of its use:

If you have an RSS feed (an XML document) and want to include some basic HTML formatting in the display of the description, you can use CDATA to wrap it:

<item>
  <title>Title of Feed Item</title>
  <link>/mylink/article1</link>
  <description>
    <![CDATA[
      <p>
      <a href="/mylink/article1"><img style="float: left; margin-right: 5px;" height="80" src="/mylink/image" alt=""/></a>
      Author Names
      <br/><em>Date</em>
      <br/>Paragraph of text describing the article to be displayed</p>
    ]]>
  </description>
</item>

The RSS Reader pulls in the description and renders the HTML within the CDATA.
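
As a rough illustration of the reader's side, here is a Python sketch with the standard library's ElementTree. The trimmed-down item is an assumption based on the example above. It shows that the description arrives as one plain string of HTML, which the reader is then free to render.

```python
import xml.etree.ElementTree as ET

# A shortened version of the feed item above (illustrative only).
FEED_ITEM = """<item>
  <title>Title of Feed Item</title>
  <link>/mylink/article1</link>
  <description>
    <![CDATA[
      <p><a href="/mylink/article1">Author Names</a>
      <br/><em>Date</em></p>
    ]]>
  </description>
</item>"""

item = ET.fromstring(FEED_ITEM)
description = item.find("description")

# The markup inside CDATA is text: <description> has no child elements.
html_fragment = description.text.strip()
print(len(description))
print(html_fragment)
```
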

Note - not all HTML tags work - I think it depends on the RSS reader you are using.


And as an explanation for why this example uses CDATA (and not the appropriate pubDate and dc:creator tags): this is for website display using an RSS widget for which we have no real formatting control.

This enables us to specify the height and position of the included image, format the author names and date correctly, and so forth, without the need for a new widget. It also means I can script this and not have to add them by hand.

17
votes

From Wikipedia:

[In] an XML document or external parsed entity, a CDATA section is a section of element content that is marked for the parser to interpret as only character data, not markup.

http://en.wikipedia.org/wiki/CDATA

Thus: text inside CDATA is seen by the parser, but only as characters, not as XML nodes.

11
votes

CDATA stands for Character Data. You can use it to escape characters which would otherwise be treated as XML markup. The data inside it will not be parsed as markup. For example, if you want to pass a URL that contains & in it, you can use CDATA to do it. Otherwise, you will get an error, because the & will be parsed as the start of an entity reference.

8
votes

It escapes a string that cannot be passed to XML as usual:

Example:

The string contains "&" in it.

You cannot write:

<FL val="Company Name">Dolce & Gabbana</FL>

Instead, you can use CDATA (or escape the ampersand as &amp;):

<FL val="Company Name"><![CDATA[Dolce & Gabbana]]></FL>
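
A quick way to see this is to feed each variant to a parser. This is a minimal Python sketch using the standard library's ElementTree. The try_parse helper is a made-up name, and the &amp; variant is included only for comparison.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the element text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


raw = try_parse('<FL val="Company Name">Dolce & Gabbana</FL>')
with_cdata = try_parse('<FL val="Company Name"><![CDATA[Dolce & Gabbana]]></FL>')
with_entity = try_parse('<FL val="Company Name">Dolce &amp; Gabbana</FL>')

# The bare ampersand is rejected; the other two give the same text.
print(raw, with_cdata, with_entity)
```
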
7
votes

It's used to contain data which could otherwise be seen as xml because it contains certain characters.

This way the data inside will be displayed, but not interpreted.

2
votes

CDATA is data that you want to pass through an XML parser without it being interpreted as XML.

Say, for example, you have an XML document that encapsulates a question/answer object. Such open fields can hold any data, which does not strictly fall under the basic data types or the custom data types defined for the XML. Like --Is this a correct tag for xml comment ? .-- You may need to pass such text through as it is, without the XML parser interpreting it as another child element. Here CDATA comes to your rescue. By declaring the text as CDATA you are telling the parser not to treat the wrapped data as XML (though it may look like XML).

1
votes

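Usually used for embedding custom data, like pictures or sound data, within an XML document. Note that raw binary is not valid character data, so such data is typically encoded as text (for example Base64) first.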

1
votes

Note that the CDATA construct is only needed if placing text directly in the XML text file.

That is, you only need to use CDATA if hand typing or programmatically building the XML text directly.

Any text entered using a DOM processor API or SimpleXML will be automatically escaped to avoid falling foul of XML content rules.

Notwithstanding that, there can be times where using CDATA reduces the text size that would otherwise be produced with all entities encoded, such as for CSS in style tags or JavaScript in script tags, where many language constructs use characters that are reserved in HTML or XML, like < and >.
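
The automatic escaping mentioned above can be seen with Python's standard ElementTree, used here as a stand-in for "a DOM processor API". The style rule is made-up sample data. Text assigned through the API is entity-encoded on output and decoded again on parsing, with no CDATA needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical CSS text full of characters that are reserved in XML.
style = ET.Element("style")
style.text = "a > b { content: '<&>'; }"

# Serializing escapes the reserved characters automatically.
serialized = ET.tostring(style, encoding="unicode")
print(serialized)

# Parsing the output restores the original text.
restored = ET.fromstring(serialized).text
print(restored == style.text)
```
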