I took the suggestion for comparing docx files from this question: "OutOfMemoryError while doing docx comparison using docx4j".
However, this line:
Body newBody = (Body) org.docx4j.XmlUtils.unmarshalString(contentStr);
triggers a number of JAXB Warnings such as:
WARN org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 80 - [ERROR] : unexpected element (uri:"", local:"ins"). Expected elements are <{[?]}text>
INFO org.docx4j.jaxb.JaxbValidationEventHandler .handleEvent line 106 - continuing (with possible element/attribute loss)
That is understandable, given that org.docx4j.wml.Text does not declare any nested elements, while the string written by Docx4jDriver.diff() contains:
<w:t dfx:insert="true" xml:space="preserve"><ins>This</ins><ins> </ins><ins>first</ins><ins> </ins><ins>line</ins><ins> </ins><ins>has</ins><ins> </ins><ins>a</ins><ins> </ins></w:t>
Consequently, Text.getValue() returns an empty String for every w:t element whose content is wrapped in <ins> tags.
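To confirm that the text really is present in the diff string and is only lost during unmarshalling, the string can be read without JAXB. The following is a minimal sketch, assuming the diff output is parsed with the JDK's DOM parser instead of org.docx4j.XmlUtils.unmarshalString; the class and method names (DiffTextSketch, runTexts) are hypothetical and not part of docx4j. Namespace awareness is switched off so that "w:t" can be matched as a literal tag name, which also lets the fragment above be parsed on its own without prefix declarations.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a docx4j class: reads w:t text straight from the diff XML.
class DiffTextSketch {

    /** Returns the flattened text of every w:t element, including text nested in <ins>. */
    static List<String> runTexts(String diffXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: prefixes are treated as part of the tag name, so "w:t" is
        // matched literally and undeclared prefixes in a fragment do not fail.
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(diffXml)));

        List<String> texts = new ArrayList<>();
        NodeList textNodes = doc.getElementsByTagName("w:t");
        for (int i = 0; i < textNodes.getLength(); i++) {
            Element t = (Element) textNodes.item(i);
            // getTextContent() concatenates the text of all descendants,
            // so the content of nested <ins> elements is kept.
            texts.add(t.getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:t dfx:insert=\"true\" xml:space=\"preserve\">"
                + "<ins>This</ins><ins> </ins><ins>first</ins></w:t>";
        System.out.println(runTexts(xml));
    }
}
```

This only recovers the text; it does not map the result back onto docx4j's Body/P/R object model, so it is a diagnostic rather than a replacement for the unmarshalling step.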
I'm attempting to programmatically determine the diffs between two docx files (the original plus the result of round-tripping a docx transformation process) using the suggested approach plus the following code:
import javax.xml.bind.JAXBElement;
import org.docx4j.wml.Body;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.Text;

// contentStr is the XML string written by Docx4jDriver.diff()
Body newBody = (Body) org.docx4j.XmlUtils.unmarshalString(contentStr);
// Walk body -> paragraphs (P) -> runs (R) -> run content
for (Object bodyPart : newBody.getContent()) {
    if (bodyPart instanceof P) {
        P paragraph = (P) bodyPart;
        for (Object paragraphContent : paragraph.getContent()) {
            if (paragraphContent instanceof R) {
                R run = (R) paragraphContent;
                for (Object runContent : run.getContent()) {
                    // Run children such as w:t arrive wrapped in a JAXBElement
                    if (runContent instanceof JAXBElement) {
                        Object jaxbValue = ((JAXBElement<?>) runContent).getValue();
                        if (jaxbValue instanceof Text) {
                            Text textValue = (Text) jaxbValue;
                            System.out.println("Text: --> " + textValue.getValue());
                        }
                    }
                }
            }
        }
    }
}
So, the question is: if this isn't the correct approach for processing the details of the differences between two files, what is?
I'm using docx4j version 2.8.0 and the two docx files being compared are: