2
votes

I want to traverse all the elements of a Word document one by one and process each element according to its type (header, sentence, table, image, text box, shape, etc.). I searched the Office interop API for an enumerator or object that represents the elements of a document, but could not find one. The API offers Sentences, Paragraphs and Shapes collections, but it doesn't provide a generic object that can point to the next element. For example:

<header of document>
<plain text sentences>
<table with many rows,columns>
<text box>
<image>
<footer>

(Please imagine it as a word document)


So, now I want some enumerator that will first give me <header of document>, then on the next iteration <plain text sentences>, then <table with many rows,columns>, and so on. Does anyone know how we can achieve this? Is it possible?

I am using C#, Visual Studio 2005 and Word 2003.

Thanks a lot

2 Answers

4
votes

The reason that you don't have a simple iterator is that Word documents can be far more complex than the simple structure outlined in your question.

For example, a document may have separate headers and footers for the first page and for even and odd pages, and it may contain more than one section, each with a different header and footer setup. It may also contain footnotes, comments and revisions, and objects such as tables, text boxes, images and shapes may appear inline with the text or floating. In short, there is no fixed sequence of elements.

You would have to check how complex your input documents are and based on the result of that analysis decide how to iterate over paragraphs and attached images and shapes etc.
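
For a simple document, one possible approach is sketched below. It is only an illustration under some assumptions: `DocumentWalker` and `Walk` are hypothetical names, the document is assumed to be already open, and only the primary header of each section is read. The main body is walked paragraph by paragraph in reading order, each paragraph is labelled as table content, inline picture or plain text, and floating shapes are listed separately with the position of their anchor.

```csharp
using System.Diagnostics;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of the interop API.
static class DocumentWalker
{
    public static void Walk(Word.Document doc)
    {
        // Headers live outside the main body, one set per section.
        // Only the primary header is read here (an assumption of this sketch).
        foreach (Word.Section section in doc.Sections)
        {
            Word.HeaderFooter header =
                section.Headers[Word.WdHeaderFooterIndex.wdHeaderFooterPrimary];
            Debug.WriteLine("header: " + header.Range.Text);
        }

        // Paragraphs of the main body come back in reading order.
        foreach (Word.Paragraph paragraph in doc.Paragraphs)
        {
            Word.Range range = paragraph.Range;
            bool inTable = (bool)range.get_Information(
                Word.WdInformation.wdWithInTable);

            string kind;
            if (inTable)
                kind = "table content";
            else if (range.InlineShapes.Count > 0)
                kind = "inline picture/object";
            else
                kind = "text";

            Debug.WriteLine(kind + " at " + range.Start + ": " + range.Text);
        }

        // Floating shapes and text boxes are not part of the paragraph flow;
        // each one is anchored to a position in the text.
        foreach (Word.Shape shape in doc.Shapes)
        {
            Debug.WriteLine("floating shape '" + shape.Name +
                "' anchored at " + shape.Anchor.Start);
        }
    }
}
```

Sorting the floating shapes by `shape.Anchor.Start` and merging them with the paragraph positions would give an approximate overall reading order, but footnotes, comments and the other header and footer variants would still need their own handling.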

3
votes

For example:

        // Requires at the top of the file:
        //   using System.Diagnostics;
        //   using Word = Microsoft.Office.Interop.Word;

        // open the file
        Word.ApplicationClass app = new Word.ApplicationClass();
        object path = @"c:\Users\name\Desktop\Весь набор.docx";
        object missing = System.Reflection.Missing.Value;

        try
        {
            Word.Document doc = app.Documents.Open(ref path,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing);

            // list every section with its character range
            foreach (Word.Section section in doc.Sections)
            {
                Debug.WriteLine("Section index: " + section.Index);
                Debug.WriteLine("section start: " + section.Range.Start + ", section end: " + section.Range.End);
            }

            // print the text of every paragraph in the document body
            foreach (Word.Paragraph paragraph in doc.Paragraphs)
            {
                Debug.WriteLine(paragraph.Range.Text);
            }

            // visit every cell of every table in the document body
            foreach (Word.Table table in doc.Tables)
            {
                foreach (Word.Row row in table.Rows)
                {
                    foreach (Word.Cell cell in row.Cells)
                    {
                        Debug.WriteLine(cell.Range.Text);
                    }
                }
            }
        }
        finally
        {
            // quit Word without saving, even if opening the document failed;
            // Quit takes "ref object", so the flag must be declared as object
            object saveChanges = false;
            app.Quit(ref saveChanges, ref missing, ref missing);
        }