0
votes

I have a workbook, which I must clear from all references to other workbooks. I am currently trying to parse the cell formulas wo check if they are referencing any excel file.

For that I use this line

cell.getCellFormula().matches(".*\\[.*\\.xls[xm]?\\].*")

the issue with this is, that the cell looks like this in XML format:

 <c r="K64" s="2128">
    <f>[5]Segments!$AS$7/Annual!AF38</f>
    <v>0.0</v>
 </c>

As you can see, the formula doesn't actually contain .xls, '.xlsx' or .xlsm at all. As far as I know [5] indicates a shared string which holds the actual path and therefore the actual value for the formula.

Now one could say and change the regex to .*\\[\d+\\].*, but I think that this can be pretty error prone. Also I think that not literally every external reference will look like this.

So my question is:

How can I identify formulas which reference an external workbook?


If you have any questions, feel free to ask.

EDIT:

I have prepared an sample excel file showcasing the issue. It's available for download at workupload.com

1
Did you look in the shared XML for the id 5? In these cases: the simpler, the better. Even if erroneous.Joop Eggen
@JoopEggen In no XML file I could find the actual file name. I used a test file and not the actual file, but the result should be the same. And if not, it just adds another layer of complexityXtremeBaumer
Maybe, this question holds some hints on how to do that. The code in the accepted (and bountied) answer works with formulae and external sheets.deHaar
@deHaar The answer lets me list all external references, but so far I couldn't find a way to match the numbers ([5]) to the referenceXtremeBaumer
The xlsx unzipped, xl/sharedStrings.xml the text should be in XML element /sst/si[5]/tJoop Eggen

1 Answers

2
votes

The way shown in Dynamically add External (Cross-Workbook) references definitely is the way to go. Go through all formula tokens and if one of those has an external sheet index, then this formula refers an external sheet.

Example using your uploaded file:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.formula.*;
import org.apache.poi.ss.formula.ptg.*;
import org.apache.poi.ss.formula.EvaluationWorkbook.ExternalSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.hssf.usermodel.HSSFEvaluationWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFEvaluationWorkbook;

import java.io.FileInputStream;

public class ExcelReadExternalReference {
    public static void main(String[] args) throws Exception {

        String filePath = "TestExternalLinks.xlsx";
        // String filePath = "TestExternalLinks.xls";

        Workbook workbook = WorkbookFactory.create(new FileInputStream(filePath));

        EvaluationWorkbook evalWorkbook = null;
        if (workbook instanceof HSSFWorkbook) {
            evalWorkbook = HSSFEvaluationWorkbook.create((HSSFWorkbook) workbook);
        } else if (workbook instanceof XSSFWorkbook) {
            evalWorkbook = XSSFEvaluationWorkbook.create((XSSFWorkbook) workbook);
        }

        Sheet sheet = workbook.getSheetAt(0);
        EvaluationSheet evalSheet = evalWorkbook.getSheet(0);

        for (Row row : sheet) {
            for (Cell cell : row) {
                if (cell.getCellType() == CellType.FORMULA) {
                    String cellFormula = cell.getCellFormula();
                    System.out.println(cellFormula);

                    EvaluationCell evaluationCell = evalSheet.getCell(cell.getRowIndex(), cell.getColumnIndex());
                    Ptg[] formulaTokens = evalWorkbook.getFormulaTokens(evaluationCell);
                    for (Ptg formulaToken : formulaTokens) {
                        int externalSheetIndex = -1;
                        if (formulaToken instanceof Ref3DPtg) {
                            Ref3DPtg refToken = (Ref3DPtg) formulaToken;
                            externalSheetIndex = refToken.getExternSheetIndex();
                        } else if (formulaToken instanceof Area3DPtg) {
                            Area3DPtg refToken = (Area3DPtg) formulaToken;
                            externalSheetIndex = refToken.getExternSheetIndex();
                        } else if (formulaToken instanceof Ref3DPxg) {
                            Ref3DPxg refToken = (Ref3DPxg) formulaToken;
                            externalSheetIndex = refToken.getExternalWorkbookNumber();
                        } else if (formulaToken instanceof Area3DPxg) {
                            Area3DPxg refToken = (Area3DPxg) formulaToken;
                            externalSheetIndex = refToken.getExternalWorkbookNumber();
                        }

                        if (externalSheetIndex >= 0) {
                            System.out.print("We have extrenal sheet index: " + externalSheetIndex
                                    + ". So this formula refers an external sheet in workbook: ");

                            ExternalSheet externalSheet = null;
                            if (workbook instanceof HSSFWorkbook) {
                                externalSheet = evalWorkbook.getExternalSheet(externalSheetIndex);
                            } else if (workbook instanceof XSSFWorkbook) {
                                externalSheet = evalWorkbook.getExternalSheet(null, null, externalSheetIndex);
                            }
                            String linkedFileName = externalSheet.getWorkbookName();
                            System.out.println(linkedFileName);

                        }
                    }
                }
            }
        }

        workbook.close();
    }

}