3
votes

I'm attempting to clear out rows from an excel template. The code before this goes through and creates the workbook based off a template. That code produces fine excel files with no errors. It's only when adding this part do I run into problems:

Sheet theSheet = workbookPart.Workbook.Descendants<Sheet>()
    .Where(s => s.Name == task).FirstOrDefault();

if (theSheet != null)
{
    WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(theSheet.Id);

    SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();

    var rows = sheetData.Elements<Row>().Where(r => r.RowIndex > 1).ToArray();

    for (int x = 0; x < rows.Count(); x++)
    {
        ((Row)rows[x]).Remove();
    }

    worksheetPart.Worksheet.Save();
}

It clears out the rows successfully. However when I open the file in excel I receive the following error:

Excel found unreadable content in 'excel.xlsx'. Do you want to recover the contents of this workbook?...

Clicking yes gives the following details:

<repairedRecords summary="Following is a list of repairs:">
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1c.xml part</repairedRecord>
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1d.xml part</repairedRecord>
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1b.xml part</repairedRecord>
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1a.xml part</repairedRecord>
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet26.xml part</repairedRecord>
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1f.xml part</repairedRecord>
    <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1e.xml part</repairedRecord>
</repairedRecords>

If I open the Excel File in "Open XML SDK 2.5 Productivity Tool" and validate it. It gives a bit more info:

Error Node Type: Worksheet
Error Part: /xl/worksheets/sheet1a.xml (this is the only line chat changes and it corresponds to the above errors)
Error Node Path: /x:worksheet[1]
Related Node Type: OpenXmlUnknownElement
Related Part: 
Description: The element has invalid child element 'http://schemas.openxmlformats.org/sheadsheetml/2006/main:row'.

If I open the original Excel file that this code is modifying, sheet1a/sheet1b, etc. do not exist. Where are they coming from? Is there something I'm missing? How can these sheets contain invalid row elements when all I'm doing is removing rows? Thank you for any suggestions.

Edit: Trimmed down form of sheet1a.xml:

    <?xml version="1.0" encoding="utf-8"?>
<x:worksheet xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <x:dimension ref="A1:AK180" />
    <x:sheetViews>
        <x:sheetView workbookViewId="0" />
    </x:sheetViews>
    <x:sheetFormatPr defaultRowHeight="15" />
    <x:cols>
        <x:col min="1" max="1" width="13.85546875" bestFit="1" customWidth="1" />
    </x:cols>
    <x:sheetData>
        <x:row>
            <x:c r="A1" t="inlineStr">
                <x:is>
                    <x:t>TestResultFileId</x:t>
                </x:is>
            </x:c>
            </x:row>
        <x:row r="2">
            <x:c r="A2" t="inlineStr">
                <x:is>
                    <x:t>6F2DFA01-27EE-E211-8250-0025906392BB</x:t>
                </x:is>
            </x:c>
        </x:row>
    </x:sheetData>
    <x:row r="1" spans="1:37">
        <x:c r="A1" t="s">
            <x:v />
        </x:c>
        </x:row>
    <x:conditionalFormatting sqref="A1:AK1048576">
        <x:cfRule type="expression" dxfId="7" priority="1">
            <x:formula />
        </x:cfRule>
    </x:conditionalFormatting>
    <x:pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3" />
</x:worksheet>
1
Does the worksheet contain some formulas (with cell references)? What do you mean by it clears out the rows successfully? Did you try the snippet from here. This is what you do, but the snippet shows how to take care of cell reference. Also the snippet shows how to clean up the SharedStringTable. - Chris
There are pivot tables, but as long as I leave the header row intact I don't get pivot table errors. I will look at those snippets and see if I'm missing something. It clears out the rows successfully meaning the rows are empty. - Shawn
I'm guessing posting the original Excel template file might be difficult (confidentiality or size). But is it ok to just post the XML of a worksheet? The "sheet1a.xml" (and others) are zipped in the Excel file. Rename the file from say exceltemplate.xlsx to exceltemplate.zip. Unzip the file. Go to the folder /xl/worksheet/ and you'll find all the worksheet XML files. Post the XML content of one of them. The problem could be due to some weird relationship that wasn't taken care of when the Row's (and thus Cell's) are removed. - Vincent Tan
I added a trimmed down form on sheeta1a.xml. The sheets with errors only exist in the export, not in the template. I don't understand why these sheets are getting added as I'm not adding any... - Shawn
I can't reproduce the error. So suggestion: maybe the Cell object held a formula that references other sheets. It's like Sheet1!A2 references Sheet85!N888 and whatnot. Since you're removing all the Rows (above 1) and thus most of the Cells, all the references turn invalid. Although a different error should come up, like the calcChain is wrong, but whatever. Try removing the CalculationChainPart of the WorkbookPart and see if that works. Without the actual Excel file, I can't reproduce the error. So the actual XML of the worksheet means something. Maybe show a few Row's and Cell data? - Vincent Tan

1 Answers

4
votes

Based on the XML of the worksheet, this part:

<x:row r="1" spans="1:37">
    <x:c r="A1" t="s">
        <x:v />
    </x:c>
</x:row>

should not exist outside of the SheetData element. In fact, there appears to be a duplicate, since

<x:row>
    <x:c r="A1" t="inlineStr">
        <x:is>
            <x:t>TestResultFileId</x:t>
        </x:is>
    </x:c>
</x:row>

also exist. Note that the "actual" header Row has no RowIndex assigned, but according to the CellReference of "A1", this particular row is on row 1.

Note that this:

var rows = sheetData.Elements<Row>().Where(r => r.RowIndex > 1).ToArray();

will probably ignore any Row objects without the RowIndex assigned (I didn't test this though...). Which can happen. Excel should have assigned a value, but any third-party software isn't bound to do so (because the Open XML specs state that RowIndex is an optional attribute).

I don't know why there's a Row object outside of SheetData. Check that the original template Excel file doesn't have this "Row object outside of SheetData object" case. If it does, then the original template file was faulty in the first place.

You might want to consider the option of storing the 1st Row in a separate variable first. Then wipe out all the child elements of SheetData. Then Append() that 1st Row in. This might be easier. You can wipe out children like so (insert bad parenting joke here):

sheetData.RemoveAllChildren();