I am printing two docx files with XmlUtils.marshalToString() from docx4j. The files have the following content at identical locations (extracted from the corresponding word/document.xml after unzipping each .docx), and these are the only differences between them:
<w:t xml:space="preserve">New line. First is </w:t>
and
<w:t xml:space="preserve"> <w:r> <w:t xml:space="preserve">New line.</w:t> </w:r> <w:r> <w:t xml:space="preserve"> First is </w:t> </w:r> </w:t>
In the first document, the <w:t> node is output as above. However, in the second, an empty <w:t> node is printed as follows:
<w:t xml:space="preserve"></w:t>
I checked the w:t schema at http://www.schemacentral.com/sc/ooxml/e-w_p-1.html and w:r is a valid contained element.
Edit: the link above is the schema for the w:p element, not w:t. The proper link for w:t is: http://www.schemacentral.com/sc/ooxml/e-w_t-1.html. It clearly shows that the only acceptable content for w:t is a string (not a w:r or any other tag). Consequently (as suggested in Jason's answer below), the XML in document.xml was invalid and, as such, was not unmarshalled into docx4j. As a result, the text was not available for output by XmlUtils.marshalToString().
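For anyone who wants to check a document.xml for this kind of problem before handing it to docx4j, here is a rough sketch that flags any w:t containing child elements. It uses only the JDK's DOM parser, not docx4j, and it is not a schema validation. The class name WtContentCheck and the method countInvalidTextNodes are made up for illustration, and the surrounding w:p/w:r wrapper with its namespace declaration is an assumption added only so the fragments from the question parse on their own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of docx4j.
class WtContentCheck {
    // WordprocessingML main namespace (the "w" prefix in document.xml).
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts w:t elements that contain at least one child element. */
    static int countInvalidTextNodes(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            // Returns every w:t in the document, including nested ones.
            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            int invalid = 0;
            for (int i = 0; i < texts.getLength(); i++) {
                NodeList children = texts.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    // w:t may only hold text, so any element child is suspect.
                    if (children.item(j).getNodeType() == Node.ELEMENT_NODE) {
                        invalid++;
                        break;
                    }
                }
            }
            return invalid;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Assumed wrapper so the fragments are well-formed on their own.
        String open = "<w:p xmlns:w=\"" + W_NS + "\"><w:r>";
        String close = "</w:r></w:p>";

        String valid = open
            + "<w:t xml:space=\"preserve\">New line. First is </w:t>" + close;
        String nested = open
            + "<w:t xml:space=\"preserve\"> <w:r> <w:t xml:space=\"preserve\">New line.</w:t> </w:r>"
            + " <w:r> <w:t xml:space=\"preserve\"> First is </w:t> </w:r> </w:t>" + close;

        System.out.println(countInvalidTextNodes(valid));   // prints 0
        System.out.println(countInvalidTextNodes(nested));  // prints 1
    }
}
```

Only the outer w:t in the second fragment is counted, since the two inner w:t nodes hold plain text. A count above zero would suggest the file was produced or edited by something that nested runs inside a text node.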
What is keeping the second block from being output?