3
votes

Environment: ASP.NET, C#, Open XML SDK

Ok, so I've been reading a ton of snippets and trying to reinvent the wheel, but I'm hoping that someone can help me get to my destination faster. I have multiple documents that I need to merge together... check... I'm able to do that with the Open XML SDK. Birds are singing, sun is shining so far. Now that I have the document the way I want it, I need to search and replace text and/or content controls.

I've tried using my own placeholder text, {replace this}, but when I look at the XML (rename the .docx to .zip and view the file), the { is nowhere near the text. So I either need to know how to protect the placeholder within the document so its pieces don't get separated, or I need to find another way to search and replace.

I'm able to search/replace if it is an XML file, but then I'm back to not being able to combine the documents easily.

Code below... and as I mentioned... document merge works fine... just need to replace stuff.

* Update * I changed my replace call to go after the tag instead of using a regex. I have the right info now, but the .Replace call doesn't seem to want to work. The last four lines are there to validate that I was seeing the right tag contents. I simply want to replace those contents now.

    protected void exeProcessTheDoc(object sender, EventArgs e)
    {
        string doc1 = Server.MapPath("~/Templates/doc1.docx");
        string doc2 = Server.MapPath("~/Templates/doc2.docx");
        string final_doc = Server.MapPath("~/Templates/extFinal.docx");

        File.Delete(final_doc);
        File.Copy(doc1, final_doc);

        using (WordprocessingDocument myDoc = WordprocessingDocument.Open(final_doc, true))
        {
            string altChunkId = "AltChunkId2";

            MainDocumentPart mainPart = myDoc.MainDocumentPart;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open(doc2, FileMode.Open))
                chunk.FeedData(fileStream);
            AltChunk altChunk = new AltChunk();
            altChunk.Id = altChunkId;
            mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
            mainPart.Document.Save();
        }
        exeSearchReplace(final_doc);
    }

    public static void GetPropertyFromDocument(string document, string outdoc)
    {
        XmlDocument xmlProperties = new XmlDocument();

        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, false))
        {
            ExtendedFilePropertiesPart appPart = wordDoc.ExtendedFilePropertiesPart;

            xmlProperties.Load(appPart.GetStream());
        }
        XmlNodeList chars = xmlProperties.GetElementsByTagName("Company");
        chars.Item(0).InnerText.Replace("{ClientName}", "Penn Inc.");

        StreamWriter sw;
        sw = File.CreateText(outdoc);
        sw.WriteLine(chars.Item(0).InnerText);
        sw.Close();
     }    
}

}
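
A note on the .Replace call from the update: C# strings are immutable, so `InnerText.Replace(...)` returns a new string and leaves the node untouched unless the result is assigned back. Below is a minimal sketch of that assignment using only `System.Xml`; the class and method names (`CompanyReplaceSketch`, `ReplaceCompany`) are hypothetical, and the XML is passed in as a string to keep the example self-contained.

```csharp
using System.Xml;

public static class CompanyReplaceSketch
{
    // Hypothetical helper: returns the XML with the placeholder replaced
    // inside the first <Company> element (if there is one).
    public static string ReplaceCompany(string appXml, string placeholder, string value)
    {
        var xml = new XmlDocument();
        xml.LoadXml(appXml);

        XmlNode company = xml.GetElementsByTagName("Company").Item(0);
        if (company == null)
        {
            return xml.OuterXml; // nothing to replace
        }

        // This line alone would have no effect, because the result is discarded:
        //     company.InnerText.Replace(placeholder, value);
        // Assigning the result back is what actually changes the node.
        company.InnerText = company.InnerText.Replace(placeholder, value);

        return xml.OuterXml;
    }
}
```

Even with the assignment in place, the change only lives in the in-memory XmlDocument. The method in the question opens the package read-only (`Open(document, false)`), so to persist the new value the document would presumably have to be opened for editing and the XML saved back into the part's stream.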

2 Answers

1
votes

If I'm reading this right, you have something like "{replace me}" in a .docx, and when you loop through the XML you're finding things like <t>{replace</t><t> me</t><t>}</t> or some such havoc. With XML like that, a routine that looks for "{replace me}" inside any single text element will never find it.

If that's the case, then it's very, very likely related to the fact that it's considered a proofing error. i.e. it's misspelled as far as Word is concerned. The cause of it is that you've opened the document in Word and have proofing turned on. As such, the text is marked as "isDirty" and split up into different runs.

There are two ways to fix this:

  1. Client-side. In Word, just make sure all proofing errors are either corrected or ignored.
  2. Format-side. Use the MarkupSimplifier tool that is part of PowerTools for Open XML to fix this outside of the client. Eric White has a great (and timely for you, just a few days old) write-up on it: Getting Started with Open XML PowerTools Markup Simplifier
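
To check whether a placeholder really has been split the way described above, one option is to compare each text element on its own against the paragraph's concatenated text. This is only a rough sketch with LINQ to XML; the class, its method names, and the sample paragraph are made up for illustration, and it assumes you already have the paragraph as an XElement.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SplitPlaceholderCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True when a single <w:t> element contains the whole placeholder.
    public static bool InSingleTextElement(XElement paragraph, string placeholder) =>
        paragraph.Descendants(W + "t").Any(t => t.Value.Contains(placeholder));

    // True when the placeholder shows up once all <w:t> values are joined.
    public static bool InParagraphText(XElement paragraph, string placeholder) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value))
              .Contains(placeholder);

    // Made-up paragraph: the placeholder is spread over three runs,
    // with proofing markers in between.
    public static XElement SampleParagraph() => XElement.Parse(
        @"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
            <w:r><w:t>{replace</w:t></w:r>
            <w:proofErr w:type=""spellStart""/>
            <w:r><w:t xml:space=""preserve""> me</w:t></w:r>
            <w:proofErr w:type=""spellEnd""/>
            <w:r><w:t>}</w:t></w:r>
          </w:p>");
}
```

For the sample paragraph, `InSingleTextElement` is false while `InParagraphText` is true, which is exactly the situation where a per-element string replace silently does nothing.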
1
votes

If you want to search and replace text in a WordprocessingML document, there is a fairly easy algorithm that you can use:

  • Break all runs into runs of a single character. This includes runs that have special characters such as a line break, carriage return, or hard tab.
  • It is then pretty easy to find a set of runs that match the characters in your search string.
  • Once you have identified a set of runs that match, then you can replace that set of runs with a newly created run (which has the run properties of the run containing the first character that matched the search string).
  • After replacing the single-character runs with a newly created run, you can then consolidate adjacent runs with identical formatting.
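
The four steps above can be sketched roughly as follows. This is not the code from the blog post or screen-cast, just a simplified illustration with LINQ to XML over a single `w:p` element. `RunSearchReplace` and its helpers are hypothetical names. It assumes proofing markers can simply be dropped, and it only looks at runs that are direct children of the paragraph.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RunSearchReplace
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a run with a copy of the given run properties (may be null) and one <w:t>.
    static XElement MakeRun(XElement rPr, string text) =>
        new XElement(W + "r",
            rPr == null ? null : new XElement(rPr),
            new XElement(W + "t",
                new XAttribute(XNamespace.Xml + "space", "preserve"), text));

    // The character held by a single-character text run, or null for anything else.
    static char? CharOf(XElement run)
    {
        var t = run.Element(W + "t");
        return t != null && t.Value.Length == 1 ? t.Value[0] : (char?)null;
    }

    // True when the run holds exactly one <w:t> besides its optional <w:rPr>.
    static bool IsTextOnly(XElement run) =>
        run.Elements().Where(e => e.Name != W + "rPr")
           .Select(e => e.Name).SequenceEqual(new[] { W + "t" });

    public static void ReplaceInParagraph(XElement paragraph, string search, string replacement)
    {
        if (string.IsNullOrEmpty(search)) return;

        // Assumption: spelling/grammar markers carry no content and can be dropped.
        paragraph.Elements(W + "proofErr").Remove();

        // Step 1: break every run into single-character runs; breaks, tabs and
        // other non-text children each get a run of their own.
        foreach (var run in paragraph.Elements(W + "r").ToList())
        {
            var rPr = run.Element(W + "rPr");
            var pieces = new List<XElement>();
            foreach (var child in run.Elements().Where(e => e.Name != W + "rPr"))
            {
                if (child.Name == W + "t")
                    foreach (char c in child.Value) pieces.Add(MakeRun(rPr, c.ToString()));
                else
                    pieces.Add(new XElement(W + "r",
                        rPr == null ? null : new XElement(rPr), new XElement(child)));
            }
            run.ReplaceWith(pieces);
        }

        // Steps 2 and 3: find consecutive runs spelling the search string and
        // replace them with one run that takes the first matched run's formatting.
        var runs = paragraph.Elements(W + "r").ToList();
        int i = 0;
        while (i + search.Length <= runs.Count)
        {
            bool match = true;
            for (int k = 0; k < search.Length && match; k++)
                match = CharOf(runs[i + k]) == search[k];
            if (!match) { i++; continue; }

            runs[i].ReplaceWith(MakeRun(runs[i].Element(W + "rPr"), replacement));
            for (int k = 1; k < search.Length; k++) runs[i + k].Remove();
            i += search.Length;
        }

        // Step 4: merge neighbouring text runs whose formatting is identical.
        XElement prev = null;
        foreach (var run in paragraph.Elements(W + "r").ToList())
        {
            bool mergeable = prev != null && IsTextOnly(prev) && IsTextOnly(run)
                && run.ElementsBeforeSelf().LastOrDefault() == prev
                && XNode.DeepEquals(prev.Element(W + "rPr"), run.Element(W + "rPr"));
            if (mergeable)
            {
                prev.Element(W + "t").Value += run.Element(W + "t").Value;
                run.Remove();
            }
            else prev = run;
        }
    }
}
```

To use something like this on a real document you would load the main document part's XML into an XDocument, call `ReplaceInParagraph` for each `w:p`, and write the XML back to the part. Runs nested inside hyperlinks, content controls, or tracked changes are not direct children of the paragraph, so they would need extra handling that this sketch leaves out.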

I've written a blog post and recorded a screen-cast that walks through this algorithm.

Blog post: http://openxmldeveloper.org/archive/2011/05/12/148357.aspx
Screen cast: http://www.youtube.com/watch?v=w128hJUu3GM

-Eric