I'm trying to take some data stored in a database and populate a Word template's Content Controls with it using the Open XML SDK. The data contains paragraphs and so there are carriage return and line feed characters in it. The data is stored in the database as nvarchar.

When I open the generated document, each CR+LF combination shows up as a question mark with a box around it (I'm not sure what this character is called). The data actually contains two sequences back to back (CR+LF CR+LF), which produces two of these strange characters:

[image: strange character]

If I unzip the .docx, take the Custom XML part and do a hex dump, I can clearly see 0d0a 0d0a so the CR+LF is there. Word is just printing it weird.

I've tried enforcing UTF-8 encoding in my XmlWriter's settings, but that didn't seem to help:

Imports System.IO
Imports System.Text
Imports System.Xml

Dim docStream As New MemoryStream()
Dim settings As New XmlWriterSettings()
settings.Encoding = New UTF8Encoding(False) ' UTF-8 without a byte order mark
Dim docWriter As XmlWriter = XmlWriter.Create(docStream, settings)

Does anyone know how I can get Word to render these characters correctly when written to a .docx through the Open XML SDK?

Perhaps helpful to understand that Word would not save CR/CRLF combinations but has XML constructs instead - e.g. a paragraph with "abc" then a line break, then "def" would be more like: <w:p><w:r><w:t>abc</w:t></w:r><w:r><w:br/></w:r><w:r><w:t>def</w:t></w:r></w:p> - user1379931
Correct. @bibadia provided the answer. I use an add-in with Open XML editor to analyze what is happening. You can also debug from VS using an add-in you develop and look at the xml version continuously when stepping through. - Guido Leenders
@bibadia Okay, that is helpful. I can probably just run a regex replace over the string from the database to apply the proper tags. But I tried to add those tags in manually to the custom XML part and Word can no longer read the document. I included the w namespace from some MSDN articles but that didn't help. Any idea if these tags are different for custom XML parts? - embedded.kyle
My mistake - if the data is going in the custom part then it shouldn't have the XML encoding (unless you are going to open in Word 2013 and your control is a rich text control, in which case the Custom Part needs rather more). I see 0d0a here for a line break in the control. I suppose it is possible that the control is not set to be multiline, but here the line breaks just disappear when displayed in that case. - user1379931
@bibadia Well, I am opening in Word 2013 and the control is a rich text control. Could you please expand on what you mean by "rather more"? Possibly an MSDN link with the relevant information? Also, I see no option in the properties to set the control to be multiline. Where is that located? - embedded.kyle

1 Answer

To bind to a Word 2013 rich text control, your XML element has to contain a complete docx. See [MS-DOCX]:

the data stored in the XML element will be an escaped string comprised of a flattened WordprocessingML document representing the formatted data in the structured document tag range.

Earlier versions couldn't bind a rich text control.

Things should work, though (with CR/LF in the data, not w:br), if you bind to a plain text control with multiline set to true.
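
As a rough illustration of the multiline setting: in the document markup, a plain text content control is a w:text element inside w:sdtPr, and the flag is its w:multiLine attribute. The sketch below sets that attribute with LINQ to XML on an XML fragment. The module name MultiLineSketch, the function EnableMultiLine, and the sample fragment are made up for illustration, and it assumes you have already loaded the main document part's XML yourself. The Open XML SDK should expose the same setting as a MultiLine property on SdtContentText, but check that against your SDK version.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module MultiLineSketch
    ' WordprocessingML main namespace (the usual "w" prefix).
    Private ReadOnly W As XNamespace =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    ' Sets w:multiLine="1" on every plain text content control
    ' (w:sdtPr/w:text) under the given element and returns how many
    ' controls were updated. Hypothetical helper, not part of any SDK.
    Function EnableMultiLine(root As XElement) As Integer
        Dim textProps = root.Descendants(W + "sdtPr").
                             Elements(W + "text").
                             ToList()
        For Each textProp As XElement In textProps
            textProp.SetAttributeValue(W + "multiLine", "1")
        Next
        Return textProps.Count
    End Function

    Sub Main()
        ' A minimal, made-up content control fragment for demonstration.
        Dim sdt As XElement = XElement.Parse(
            "<w:sdt xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">" &
            "<w:sdtPr><w:text/></w:sdtPr><w:sdtContent/></w:sdt>")

        Dim updated As Integer = EnableMultiLine(sdt)
        Console.WriteLine("Controls updated: " & updated)
        Console.WriteLine(sdt.ToString())
    End Sub
End Module
```

Rich text controls have no w:text element in their properties, so this helper leaves them untouched. That matches the point above: the CR/LF approach only applies to plain text controls.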