8
votes

I am trying to use openxml to produce automated excel files. One problem I am facing is to accomodate my object model with open xml object model for excel. I have to come to a point where I realise that the order in which I append the child elements for a worksheet matters.

For Example:

workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData);
workSheet.Append(mergeCells);
workSheet.Append(drawing);

the above ordering doesnot give any error.

But the following:

workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData);
workSheet.Append(drawing);
workSheet.Append(mergeCells);

gives an error

So this doesn't let me to create a drawing object whenever I want to and append it to the worksheet. Which forces me to create these elements before using them.

Can anyone tell me if I have understood the problem correctly ? Because I believe we should be able to open any excel file create a new child element for a worksheet if necessary and append it. But now this might break the order in which these elements are supposed to be appended.

Thanks.

5

5 Answers

9
votes

According to the Standard ECMA-376 Office Open XML File Formats, CT_Worksheet has a required sequence:

CT_Worksheet Schema Diagram

The reason the following is crashing:

workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData);
workSheet.Append(drawing);
workSheet.Append(mergeCells);

Is because you have drawing before mergeCells. As long as you append your mergeCells after drawing, your code should work fine.

Note: You can find the full XSD in ECMA-376 3rd edition Part 1 (.zip) -> OfficeOpenXML-XMLSchema-Strict -> sml.xsd.

1
votes

I found that for all "Singleton" children where the parent objects has a Property defined (such as Worksheet.sheetViews) use the singleton property and assign the new object to that instead of using "Append" This causes the class itself to ensure the order is correct.

workSheet.Append(sheetViews);
workSheet.Append(columns);
workSheet.Append(sheetData);  // bad idea(though it does work if the order is good)
workSheet.Append(drawing);
workSheet.Append(mergeCells);

More correct format...

workSheet.sheetViews=sheetViews; // order doesn't matter.
workSheet.columns=columns;
...
1
votes

As Joe Masilotti already explained, the order is defined in the schema.

Unfortunately, the OpenXML library does not ensure the correct order of child elements in the serialized XML as required by the underlying XML schema. Applications may not be able to parse the XML successfully if the order is not correct.

Here is a generic solution which I am using in my code:

private T GetOrCreateWorksheetChildCollection<T>(Spreadsheet.Worksheet worksheet) 
    where T : OpenXmlCompositeElement, new()
{
    T collection = worksheet.GetFirstChild<T>();
    if (collection == null)
    {
        collection = new T();
        if (!worksheet.HasChildren)
        {
            worksheet.AppendChild(collection);
        }
        else
        {
            // compute the positions of all child elements (existing + new collection)
            List<int> schemaPositions = worksheet.ChildElements
                .Select(e => _childElementNames.IndexOf(e.LocalName)).ToList();
            int collectionSchemaPos = _childElementNames.IndexOf(collection.LocalName);
            schemaPositions.Add(collectionSchemaPos);
            schemaPositions = schemaPositions.OrderBy(i => i).ToList();

            // now get the index where the position of the new child is
            int index = schemaPositions.IndexOf(collectionSchemaPos);

            // this is the index to insert the new element
            worksheet.InsertAt(collection, index);
        }
    }
    return collection;
}

// names and order of possible child elements according to the openXML schema
private static readonly List<string> _childElementNames = new List<string>() { 
    "sheetPr", "dimension", "sheetViews", "sheetFormatPr", "cols", "sheetData", 
    "sheetCalcPr", "sheetProtection", "protectedRanges", "scenarios", "autoFilter",
    "sortState", "dataConsolidate", "customSheetViews", "mergeCells", "phoneticPr",
    "conditionalFormatting", "dataValidations", "hyperlinks", "printOptions", 
    "pageMargins", "pageSetup", "headerFooter", "rowBreaks", "colBreaks", 
    "customProperties", "cellWatches", "ignoredErrors", "smartTags", "drawing",
    "drawingHF", "picture", "oleObjects", "controls", "webPublishItems", "tableParts",
    "extLst"
};

The method always inserts the new child element at the correct position, ensuring that the resulting document is valid.

0
votes

helb's answer is beautiful - thank you for that, helb.

It has the slight drawback that it does not test if there are already problems with the order of child elements. The following slight modification makes sure there are no pre-existing problems when adding a new element (you still need his _childElementNames, which is priceless) and it's slightly more efficient:

    private static int getChildElementOrderIndex(OpenXmlElement collection)
    {
        int orderIndex = _childElementNames.IndexOf(collection.LocalName);
        if( orderIndex < 0)
            throw new InvalidOperationException($"Internal: worksheet part {collection.LocalName} not found");
        return orderIndex;
    }
    private static T GetOrCreateWorksheetChildCollection<T>(Worksheet worksheet) where T : OpenXmlCompositeElement, new()
    {
        T collection = worksheet.GetFirstChild<T>();
        if (collection == null)
        {
            collection = new T();
            if (!worksheet.HasChildren)
            {
                worksheet.AppendChild(collection);
            }
            else
            {
                int collectionSchemaPos = getChildElementOrderIndex(collection);
                int insertPos = 0;
                int lastOrderNum = -1;
                for(int i=0; i<worksheet.ChildElements.Count; ++i)
                {
                    int thisOrderNum = getChildElementOrderIndex(worksheet.ChildElements[i]);
                    if(thisOrderNum<=lastOrderNum)
                        throw new InvalidOperationException($"Internal: worksheet parts {_childElementNames[lastOrderNum]} and {_childElementNames[thisOrderNum]} out of order");
                    lastOrderNum = thisOrderNum;
                    if( thisOrderNum < collectionSchemaPos )
                        ++insertPos;
                }
                // this is the index to insert the new element
                worksheet.InsertAt(collection, insertPos);
            }
        }
        return collection;
    }
0
votes

For those end up here via Google like I did, the function below solves the ordering problem after the child element is inserted:

public static T ReorderChildren<T>(T element) where T : OpenXmlElement
{
  Dictionary<Type, int> childOrderHashTable = element.GetType()
                                                  .GetCustomAttributes()
                                                  .Where(x => x is ChildElementInfoAttribute)
                                                  .Select( (x, idx) => new KeyValuePair<Type, int>(((ChildElementInfoAttribute)x).ElementType, idx))
                                                  .ToDictionary(x => x.Key, x => x.Value);

  List<OpenXmlElement> reorderedChildren = element.ChildElements
                                                .OrderBy(x => childOrderHashTable[x.GetType()])
                                                .ToList();
  element.RemoveAllChildren();
  element.Append(reorderedChildren);
  return element;         
}

The generated types in the DocumentFormat.OpenXml library have custom attributes that can be used to reflect metadata from the the OOXML schema. This solution relies on System.Reflection and System.Linq (i.e., not very fast) but eliminates the need to hardcode a list of strings to correctly order the child elements for a specific type.

I use this function after validation on the ValidationErrorInfo.Node property and it and cleans up the newly created element by reference. That way I don't have apply this method recursively across an entire document.