I am to migrate the contents of a Lotus Notes database to SharePoint. The entire database is exported to XML files (this requirement cannot be changed) and I have to parse these XML files and insert the data into SharePoint.
What's tripping me up are the elements that contain rich text. These XML elements hold a DXL representation of the exact rich text format used in the Lotus Notes field, as described in http://publib.boulder.ibm.com/infocenter/domhelp/v8r0/index.jsp?topic=%2Fcom.ibm.designer.domino.main.doc%2FH_PARAGRAPH_DEFINITIONS_ELEMENT_XML.html
I don't need to keep the actual formatting of the text (unless that is as easy as retrieving the plain text), but if I simply read the value of the XML element containing the rich text (using LINQ to XML), I get the plain text without line breaks, which is not acceptable. Additionally, embedded images show up in the retrieved text as base64-encoded strings (that is how they are embedded in the XML).
Can anyone give me guidance on how to extract the text from the XML element, either as proper RTF that can be written to an RTF file or as plain text that keeps the correct line breaks and doesn't contain the embedded images?
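
To make the goal concrete, here is a rough sketch of the plain-text output I am after. It is in Python only to keep it short (my real code uses LINQ to XML). The DXL fragment is made up, and the namespace URI and the element names `par`, `run`, `break`, and `picture` are my assumptions about what the export looks like, not a verified schema. The function name `richtext_to_plain` is just something I picked for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed DXL namespace; check it against the root element of the real export.
DXL_NS = "http://www.lotus.com/dxl"

# Made-up fragment for illustration; real exports are larger and more varied.
SAMPLE = f"""
<richtext xmlns="{DXL_NS}">
  <pardef id="1"/>
  <par def="1">First <run><font style="bold"/>paragraph</run>.</par>
  <par def="1">Second line<break/>after a break.</par>
  <par def="1"><picture width="1px" height="1px"><gif>R0lGODlhAQABAAAAACw=</gif></picture>Caption</par>
</richtext>
"""


def local(tag):
    """Strip the '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def collect(elem, out):
    """Append the text below elem to out, skipping pictures entirely."""
    name = local(elem.tag)
    if name == "picture":
        return  # drops the image together with its base64 payload
    if name == "break":
        out.append("\n")  # explicit line break inside a paragraph
    if elem.text:
        out.append(elem.text)
    for child in elem:
        collect(child, out)
        if child.tail:
            out.append(child.tail)  # text that follows the child element


def richtext_to_plain(richtext):
    """Return one line of text per <par>, joined with newlines."""
    lines = []
    for par in richtext.iter(f"{{{DXL_NS}}}par"):
        parts = []
        collect(par, parts)
        lines.append("".join(parts))
    return "\n".join(lines)


if __name__ == "__main__":
    print(richtext_to_plain(ET.fromstring(SAMPLE)))
```

In other words: one output line per paragraph, a newline wherever the rich text has an explicit break, and nothing at all for embedded images. I would expect the same walk to be possible with `XElement.Nodes()` in LINQ to XML, but I have not gotten that far.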