Using XmlUtils.marshalToString() from docx4j, I see the following content at the same location in two docx files (taken from each file's word/document.xml after unzipping the .docx). These are the only differences between the files:

 <w:t xml:space="preserve">New line.  First is </w:t>

and

 <w:t xml:space="preserve">
 <w:r>
 <w:t xml:space="preserve">New line.</w:t>
 </w:r>
 <w:r>
 <w:t xml:space="preserve">  First is </w:t>
 </w:r>
 </w:t>

In the first document, the <w:t> node is output as above.

However, in the second, an empty <w:t> node is printed as follows:

   <w:t xml:space="preserve"></w:t>

I checked the w:t schema at http://www.schemacentral.com/sc/ooxml/e-w_p-1.html and w:r is a valid contained element.

Edit: the above link is the schema of the w:p element, not w:t. The proper link for w:t is: http://www.schemacentral.com/sc/ooxml/e-w_t-1.html. It clearly shows that the only acceptable content for w:t is a string (not a w:r or any other tag). Consequently (as suggested in Jason's answer below), the XML in document.xml was invalid and, as such, was not unmarshalled by docx4j. As a result, the text was not available for output by XmlUtils.marshalToString().

What is keeping the second block from being output?

1 Answer

You can trust marshalToString.

If it is returning an empty w:t, that's because the underlying org.docx4j.wml.Text object has a null or empty value field.

You need to look at whatever code is supposed to be populating that.
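
One way to narrow this down is to check the raw document.xml for w:t elements that contain child elements, since w:t is meant to hold only a string. The following is a minimal sketch that uses only the JDK's DOM parser, not docx4j; the class name WtContentCheck and the method countTextElementsWithChildren are hypothetical names made up for this illustration, and the sample strings are modelled on the fragments in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of docx4j): flags w:t elements with element children.
final class WtContentCheck {

    // WordprocessingML main namespace, bound to the "w" prefix in document.xml.
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns how many w:t elements have at least one child element.
    static int countTextElementsWithChildren(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookup
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList texts = doc.getElementsByTagNameNS(W_NS, "t");
            int offending = 0;
            for (int i = 0; i < texts.getLength(); i++) {
                NodeList children = texts.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    if (children.item(j).getNodeType() == Node.ELEMENT_NODE) {
                        offending++; // this w:t holds markup, not just text
                        break;
                    }
                }
            }
            return offending;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String ns = "xmlns:w=\"" + W_NS + "\"";
        // Plain string content, as in the first document.
        String plain = "<w:r " + ns + "><w:t xml:space=\"preserve\">"
                + "New line.  First is </w:t></w:r>";
        // Runs nested inside w:t, as in the second document.
        String nested = "<w:r " + ns + "><w:t xml:space=\"preserve\">"
                + "<w:r><w:t xml:space=\"preserve\">New line.</w:t></w:r>"
                + "<w:r><w:t xml:space=\"preserve\">  First is </w:t></w:r>"
                + "</w:t></w:r>";
        System.out.println(countTextElementsWithChildren(plain));
        System.out.println(countTextElementsWithChildren(nested));
    }
}
```

A non-zero count for a real document.xml would suggest that whatever produced the file wrote markup inside w:t, which is the place to look rather than the marshalling step.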