In my answer to "How do I change color of a particular word document using apache poi?" I showed an algorithm for splitting XWPFRuns for formatting purposes. That version formats only a single character and does not clone the run properties, but it shows the basic idea. We have to work on the entire paragraph, since only the paragraph provides methods for inserting runs. And we have to loop over the run text character by character, since any approach that splits the text into words runs into problems with punctuation marks when the words are reassembled into a paragraph afterwards.
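To illustrate the punctuation problem, here is a minimal sketch in plain Java without Apache POI. The class and method names are only illustrative and not part of the original code. It compares a naive whitespace split with a search over the raw characters:

```java
import java.util.Arrays;

final class WordSplitPitfall {

    // Naive approach: split on whitespace and compare each token with the keyword.
    static boolean foundByWordSplit(String text, String keyword) {
        return Arrays.asList(text.split("\\s+")).contains(keyword);
    }

    // Character-level approach: search the raw text, so punctuation does not matter.
    static boolean foundByCharScan(String text, String keyword) {
        return text.contains(keyword);
    }

    public static void main(String[] args) {
        String text = "The quick brown fox, the lazy dog.";
        // The whitespace split yields the token "fox," (with comma), which is not equal to "fox".
        System.out.println(foundByWordSplit(text, "fox"));
        System.out.println(foundByCharScan(text, "fox"));
    }
}
```

A word-based split could of course be made aware of punctuation, but then the punctuation and the original spacing would have to be restored when putting the paragraph together again, which is what the character-wise approach avoids.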
What is missing there is a method for cloning the run properties from the original run to the newly added ones. This can be done by copying the underlying w:rPr element.
The whole approach is then as follows. Go through all runs in the paragraph. If a run contains the keyword, split the run text into characters, go through these characters and buffer them. As soon as the buffered characters end with the keyword, set all currently buffered characters except the keyword as the text of the current run. Then insert a new run for the keyword, clone the run properties from the original run, set the keyword as its text and apply the additional formatting. Then insert another new run for the following characters and clone the run properties from the original run into it as well. Repeat this for each run in the paragraph.
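The buffering step can be tried out in isolation on plain strings. The following is a simplified sketch and not the POI code itself: `RunSplitSketch` and `split` are hypothetical names, and each returned segment stands for the text of one run. It assumes a non-empty keyword.

```java
import java.util.ArrayList;
import java.util.List;

final class RunSplitSketch {

    // Splits runText so that each keyword occurrence becomes its own segment.
    // Assumption: keyword is not empty.
    static List<String> split(String runText, String keyword) {
        List<String> segments = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        for (char c : runText.toCharArray()) {
            buffer.append(c); // buffer all characters
            String buffered = buffer.toString();
            if (buffered.endsWith(keyword)) {
                // text before the keyword stays in the current run (may be empty)
                segments.add(buffered.substring(0, buffered.length() - keyword.length()));
                // the keyword gets its own run, which would receive the extra formatting
                segments.add(keyword);
                buffer.setLength(0); // empty the buffer for the following run
            }
        }
        segments.add(buffer.toString()); // remaining characters go into the last run
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(split("The fox and the dog", "fox"));
    }
}
```

As in the POI version, a keyword at the very start or end of the text leaves an empty segment, which corresponds to an empty run in the document.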
Complete example:
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.XmlCursor;
import java.util.*;
import java.awt.Desktop;

public class WordFormatWords {

    static void cloneRunProperties(XWPFRun source, XWPFRun dest) { // clones the underlying w:rPr element
        CTR tRSource = source.getCTR();
        CTRPr rPrSource = tRSource.getRPr();
        if (rPrSource != null) {
            CTRPr rPrDest = (CTRPr)rPrSource.copy();
            CTR tRDest = dest.getCTR();
            tRDest.setRPr(rPrDest);
        }
    }

    static void formatWord(XWPFParagraph paragraph, String keyword, Map<String, String> formats) {
        int runNumber = 0;
        while (runNumber < paragraph.getRuns().size()) { //go through all runs, we cannot use for each since we will possibly insert new runs
            XWPFRun run = paragraph.getRuns().get(runNumber);
            XWPFRun run2 = run;
            String runText = run.getText(0);
            if (runText != null && runText.contains(keyword)) { //if we have a run with keyword in it, then

                // This code part is to manage comment ranges.
                // Do we have commentRangeEnd immediately after the run?
                // If so then remember that in a cursor.
                XmlCursor commentRangeEndCursor = null;
                XmlCursor cursor = run.getCTR().newCursor();
                cursor.toEndToken();
                if (cursor.hasNextToken()) {
                    cursor.toNextToken();
                    XmlObject commentRangeEnd = cursor.getObject();
                    if (commentRangeEnd != null && commentRangeEnd instanceof CTMarkupRange) {
                        commentRangeEndCursor = cursor;
                    }
                }

                char[] runChars = runText.toCharArray(); //split run text into characters
                StringBuffer sb = new StringBuffer();
                for (int charNumber = 0; charNumber < runChars.length; charNumber++) { //go through all characters in that run
                    sb.append(runChars[charNumber]); //buffer all characters
                    runText = sb.toString();
                    if (runText.endsWith(keyword)) { //if the buffered character stream ends with the keyword
                        //set all chars, which are currently buffered, except the keyword, as the text of the current run
                        run.setText(runText.substring(0, runText.length() - keyword.length()), 0);
                        run2 = paragraph.insertNewRun(++runNumber); //insert new run for the formatted keyword
                        cloneRunProperties(run, run2); // clone the run properties from original run
                        run2.setText(keyword, 0); // set the keyword in run
                        for (String toSet : formats.keySet()) { // do the additional formatting
                            if ("color".equals(toSet)) {
                                run2.setColor(formats.get(toSet));
                            } else if ("bold".equals(toSet)) {
                                run2.setBold(Boolean.valueOf(formats.get(toSet)));
                            }
                        }
                        run2 = paragraph.insertNewRun(++runNumber); //insert a new run for the next characters
                        cloneRunProperties(run, run2); // clone the run properties from original run
                        run = run2;
                        sb = new StringBuffer(); //empty the buffer
                    }
                }
                run.setText(sb.toString(), 0); //set all characters, which are currently buffered, as the text of the current run

                // This code part is to manage comment ranges.
                // If we had remembered commentRangeEnd, then move this to here now.
                if (commentRangeEndCursor != null) {
                    cursor = run.getCTR().newCursor();
                    cursor.toEndToken();
                    if (cursor.hasNextToken()) {
                        cursor.toNextToken();
                        commentRangeEndCursor.moveXml(cursor);
                    }
                    cursor.dispose();
                    commentRangeEndCursor.dispose();
                }
            }
            runNumber++;
        }
    }

    public static void main(String[] args) throws Exception {
        XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
        String[] keywords = new String[]{"fox", "dog"};
        Map<String, String> formats = new HashMap<String, String>();
        formats.put("bold", "true");
        formats.put("color", "DC143C");
        for (XWPFParagraph paragraph : doc.getParagraphs()) { //go through all paragraphs
            for (String keyword : keywords) {
                formatWord(paragraph, keyword, formats);
            }
        }
        FileOutputStream out = new FileOutputStream("result.docx");
        doc.write(out);
        out.close();
        doc.close();
        System.out.println("Done");
        Desktop.getDesktop().open(new File("result.docx"));
    }
}
This code also takes care of XML markup range elements such as commentRangeEnd which immediately follow the run's r element. Such markup range elements mark the start and end of groups of other elements. For example, the group of text run elements to which a comment applies lies between a commentRangeStart and a commentRangeEnd having the same id.
If a commentRangeEnd immediately follows the run which needs to be split, then we remember it in a cursor. After splitting the run, we move this commentRangeEnd immediately behind the last newly inserted run. So comments should stay correct.
Of course even this will have some disadvantages because of the clumsy way Microsoft Word sometimes stores text in text runs. There is no one-size-fits-all solution for this when Microsoft Word is the source.
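One such case, sketched here on plain strings with hypothetical names and made-up sample texts: if a keyword is stored across two adjacent runs, no single run text contains it, so a per-run check like the one in formatWord will not find it even though the paragraph text as a whole contains it.

```java
import java.util.Arrays;
import java.util.List;

final class SplitKeywordCheck {

    // Per-run check: true only if one single run text contains the whole keyword.
    static boolean inSingleRun(List<String> runTexts, String keyword) {
        for (String text : runTexts) {
            if (text != null && text.contains(keyword)) return true;
        }
        return false;
    }

    // Paragraph-level check: concatenates all run texts before searching.
    static boolean inParagraph(List<String> runTexts, String keyword) {
        StringBuilder all = new StringBuilder();
        for (String text : runTexts) {
            if (text != null) all.append(text);
        }
        return all.indexOf(keyword) >= 0;
    }

    public static void main(String[] args) {
        // Assumed example: the word "fox" is stored in two separate runs.
        List<String> runTexts = Arrays.asList("The quick brown f", "ox jumps");
        System.out.println(inSingleRun(runTexts, "fox"));
        System.out.println(inParagraph(runTexts, "fox"));
    }
}
```

Comparing both checks before formatting at least shows which paragraphs contain keywords the per-run approach would miss.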