3
votes

I have an EXCEL file with dates in it. They are formatted as TEXT, for example: =TEXT(TODAY(); "yyyy-MM-dd")

In EXCEL the date is correctly formatted as text, but when I read the cell with Apache POI, it will return the numeric value. Why? Why does POI not read the formatted text value?

I don't want to format the date in my JAVA application, because the EXCEL file should define the format (it could be different for each value).

Here's my code for reading the cell value:

private static String getString(Cell cell) {
 if (cell == null) return null; 

 if (cell.getCellTypeEnum() != CellType.FORMULA) { 
  switch (cell.getCellTypeEnum()) { 
   case STRING: 
    return cell.getStringCellValue().trim(); 
   case BOOLEAN: 
    return String.valueOf(cell.getBooleanCellValue());
   case NUMERIC: 
    return String.valueOf(cell.getNumericCellValue()); 
   case BLANK: 
    return null; 
   case ERROR: 
    throw new RuntimeException(ErrorEval.getText(cell.getErrorCellValue())); 
   default: 
    throw new RuntimeException("unexpected cell type " + cell.getCellTypeEnum());
  }
 } 
 FormulaEvaluator evaluator = cell.getSheet().getWorkbook().getCreationHelper().createFormulaEvaluator();
 try { 
  CellValue cellValue = evaluator.evaluate(cell); 
  switch (cellValue.getCellTypeEnum()) { 
   case NUMERIC: 
    return String.valueOf(cellValue.getNumberValue());
   case STRING: 
    return cellValue.getStringValue().trim(); 
   case BOOLEAN: 
    return String.valueOf(cellValue.getBooleanValue()); 
   case ERROR: 
    throw new RuntimeException(ErrorEval.getText(cellValue.getErrorValue())); 
   default: 
    throw new RuntimeException("unexpected
cell type " + cellValue.getCellTypeEnum()); 
  } 
 } catch (RuntimeException e) { 
  throw new RuntimeException("Could not evaluate the value of " + cell.getAddress() + " in sheet " + cell.getSheet().getSheetName(), e);
 }
}
2
If I were faced with your problem in a VBA environment I would use the Format function to convert the numeric value to text, like, MyDate$ = Format(Date, "dd/mm/yyyy")Variatus
Thanks, but that's not what I want. I don't want to define the format in my application. The exelfile should define it. As I wrote, the format could change between each cell. That's why I tried with =TEXT(xxx, "format")ner0
Sorry, don't understand your problem. If you want Excel to determine the format Excel needs a number but if Excel actually does get a number then what is your complaint?Variatus
The problem only occurs if the Excel used is not a English one. Then the formula is not really =TEXT(A2,"yyyy-MM-dd") for example but =TEXT(A2,"JJJJ-MM-TT") in my German Excel for example. And because apache poi's FormulaEvaluator does not have locale settings until now, that formula cannot be evaluated properly. Then one could only hope the stored cell value which should be the needed string. So if cell formula starts with "TEXT" then do not evaluating but do taking the string cell value from Excel's last evaluation.Axel Richter

2 Answers

2
votes

The problem only occurs if the Excel used is not a English one. Then the formula is not really =TEXT(A2,"yyyy-MM-dd") for example but =TEXT(A2,"JJJJ-MM-TT") in my German Excel for example.

As you see, the format part within the TEXT function will always be locale dependant although all other formula parts will be en_US locale always. This is because that format part is in a string within the formula which will not be changed. So in German it is =TEXT(A2,"JJJJ-MM-TT") (Year = Jahr, Day = Tag) and in French it is =TEXT(A2,"AAAA-MM-JJ") (Year = Année, Day = Jour) .

And because apache poi's FormulaEvaluator does not have locale settings until now, that formula cannot be evaluated properly.

Then we have two possibilities.

First we could hope the stored cell value should be the needed string. So if cell formula starts with "TEXT" and contains "JJJJ-MM-TT" then do not evaluating because this will not be properly. Instead take the string cell value from Excel's last evaluation.

Second we could replacing the locale dependent format part with the en_US one in the formula and then let apache poi evaluate. At least if we only wants reading and not rewriting the Excel file this will not destroing something in the Excel file.


Code first approach:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.util.*;

import org.apache.poi.ss.formula.eval.ErrorEval;

import java.io.FileInputStream;

class ReadExcelExample {

 private static String getString(Cell cell, FormulaEvaluator evaluator) {
  if (cell == null) return "null";
  String text = "";
  switch (cell.getCellType()) {
  //switch (cell.getCellTypeEnum()) {
   case STRING:
    text = cell.getRichStringCellValue().getString();
   break;
   case NUMERIC:
    if (DateUtil.isCellDateFormatted(cell)) {
     text = String.valueOf(cell.getDateCellValue());
    } else {
     text = String.valueOf(cell.getNumericCellValue());
    }
   break;
   case BOOLEAN:
    text = String.valueOf(cell.getBooleanCellValue());
   break;
   case FORMULA:
    text = cell.getCellFormula();

    //if formula is TEXT(...,"JJJJ-MM-TT") then do not evaluating:
    if (cell.getCellFormula().startsWith("TEXT") && cell.getCellFormula().contains("JJJJ-MM-TT")) {
     text = text + ": value got from cell = " + cell.getRichStringCellValue().getString();

    } else {
     CellValue cellValue = evaluator.evaluate(cell); 
     switch (cellValue.getCellType()) {
     //switch (cellValue.getCellTypeEnum()) {
      case STRING:
       text = text + ": " + cellValue.getStringValue();
      break;
      case NUMERIC:
       if (DateUtil.isCellDateFormatted(cell)) {
        text = text + ": " + String.valueOf(DateUtil.getJavaDate(cellValue.getNumberValue()));
       } else {
        text = text + ": " + String.valueOf(cellValue.getNumberValue());
       }
      break;
      case BOOLEAN:
       text = text + ": " + String.valueOf(cellValue.getBooleanValue());
      break;
      case ERROR:
       throw new RuntimeException("from CellValue: " + ErrorEval.getText(cellValue.getErrorValue()));
      default:
       throw new RuntimeException("unexpected cellValue type " + cellValue.getCellType()); 
     }
    }
   break;
   case ERROR:
    throw new RuntimeException("from Cell: " + ErrorEval.getText(cell.getErrorCellValue())); 
   case BLANK:
    text = "";
   break;
   default:
    throw new RuntimeException("unexpected cell type " + cell.getCellType());
  }

  return text;
 }

 public static void main(String[] args) throws Exception {

  //Workbook wb  = WorkbookFactory.create(new FileInputStream("SAMPLE.xls"));
  Workbook wb  = WorkbookFactory.create(new FileInputStream("SAMPLE.xlsx"));

  DataFormatter formatter = new DataFormatter(new java.util.Locale("en", "US"));
  FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();

  Sheet sheet = wb.getSheetAt(0);
  for (Row row : sheet) {
   for (Cell cell : row) {
    CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
    System.out.print(cellRef.formatAsString());
    System.out.print(" - ");

    String text = "";
    try {
    text = getString(cell, evaluator);
    } catch (Exception ex) {
     text = ex.toString();
    }
    System.out.println(text);

   }
  }

  wb.close();

 }
}

German Excel:

enter image description here

Result:

A1 - Value
B1 - Formula
A2 - Fri Jan 11 00:00:00 CET 2019
B2 - TEXT(A2,"JJJJ-MM-TT"): value got from cell = 2019-01-11
A3 - 123.45
B3 - A3*2: 246.9
B4 - java.lang.RuntimeException: from CellValue: #DIV/0!
B5 - TODAY(): Fri Jan 11 00:00:00 CET 2019
B6 - B5=A2: true
A7 - java.lang.RuntimeException: from CellValue: #N/A
B8 - TEXT(TODAY(),"JJJJ-MM-TT"): value got from cell = 2019-01-11

English Calc:

enter image description here

Result:

A1 - Value
B1 - Formula
A2 - Fri Jan 11 00:00:00 CET 2019
B2 - TEXT(A2,"yyyy-MM-dd"): 2019-01-11
A3 - 123.45
B3 - A3*2: 246.9
B4 - java.lang.RuntimeException: from CellValue: #DIV/0!
B5 - TODAY(): Fri Jan 11 00:00:00 CET 2019
B6 - B5=A2: true
A7 - java.lang.RuntimeException: from CellValue: #N/A
B8 - TEXT(TODAY(),"yyyy-MM-dd"): 2019-01-11

Code second approach (replacing the locale dependent format part with the en_US one):

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.util.*;

import java.io.FileInputStream;
import java.util.Locale;

class ExcelEvaluateTEXTDiffLocales {

 private static String getString(Cell cell, DataFormatter formatter, FormulaEvaluator evaluator, Locale locale) {
  String text = "";
  if (cell.getCellType() == CellType.FORMULA) {
   String cellFormula = cell.getCellFormula();
   text += cellFormula + ":= ";

   if (cellFormula.startsWith("TEXT")) {
    int startFormatPart = cellFormula.indexOf('"');
    int endFormatPart = cellFormula.lastIndexOf('"') + 1;
    String formatPartOld = cellFormula.substring(startFormatPart, endFormatPart);
    String formatPartNew = formatPartOld;
    if ("de".equals(locale.getLanguage())) {
     formatPartNew = formatPartNew.replace("T", "D"); // Tag = Day
     // Monat = Month
     formatPartNew = formatPartNew.replace("J", "Y"); // Jahr = Year
     //...
    } else if ("fr".equals(locale.getLanguage())) {
     formatPartNew = formatPartNew.replace("J", "D"); // Jour = Day
     // Mois = Month
     formatPartNew = formatPartNew.replace("A", "Y"); // Année = Year
     //...
    } //...
    cellFormula = cellFormula.replace(formatPartOld, formatPartNew);
    cell.setCellFormula(cellFormula);
   }

  }
  try {
   text += formatter.formatCellValue(cell, evaluator);
  } catch (org.apache.poi.ss.formula.eval.NotImplementedException ex) {
   text += ex.toString();
  }

  return text;
 }

 public static void main(String[] args) throws Exception {

  //Workbook wb  = WorkbookFactory.create(new FileInputStream("SAMPLE.xls"));
  Workbook wb  = WorkbookFactory.create(new FileInputStream("SAMPLE.xlsx"));

  Locale locale = new Locale("fr", "CH");
  DataFormatter formatter = new DataFormatter(locale);
  FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();

  Sheet sheet = wb.getSheetAt(0);
  for (Row row : sheet) {
   for (Cell cell : row) {
    CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
    System.out.print(cellRef.formatAsString());
    System.out.print(" - ");

    String text = "";
    text = getString(cell, formatter, evaluator, locale);

    System.out.println(text);

   }
  }

  wb.close();

 }
}

French Calc:

enter image description here

Result:

A1 - Value
B1 - Formula
A2 - 1/11/2019
B2 - TEXT(A2,"AAAA-MM-JJ"):= 2019-01-11
A3 - 123.45
B3 - A3*2:= 246.9
B4 - 1/A4:= #DIV/0!
B5 - TODAY():= 1/12/2019
B6 - B5=A2:= FALSE
A7 - NA():= #N/A
B8 - TEXT(TODAY(),"AAAA-MM-JJ"):= 2019-01-12

Hint: Used apache poi version here is 4.0.1. Maybe lower versions might have further evaluation issues.

0
votes

Providing a patch for org/apache/poi/ss/formula/functions/TextFunction.java

Of course my first answer is only correcting the symptoms. The final solution clearly should be that evaluating the TEXT function should take into account different locales.

Working draft:

Changed org/apache/poi/ss/formula/functions/TextFunction.javaas follows:

...
    /**
     * An implementation of the TEXT function<br>
     * TEXT returns a number value formatted with the given number formatting string. 
     * This function is not a complete implementation of the Excel function, but
     *  handles most of the common cases. All work is passed down to 
     *  {@link DataFormatter} to be done, as this works much the same as the
     *  display focused work that that does. 
     *
     * <b>Syntax<b>:<br> <b>TEXT</b>(<b>value</b>, <b>format_text</b>)<br>
     */
    public static final Function TEXT = new Fixed2ArgFunction() {

        public ValueEval evaluate(int srcRowIndex, int srcColumnIndex, ValueEval arg0, ValueEval arg1) {
            double s0;
            String s1;
            try {
                s0 = evaluateDoubleArg(arg0, srcRowIndex, srcColumnIndex);
                s1 = evaluateStringArg(arg1, srcRowIndex, srcColumnIndex);
            } catch (EvaluationException e) {
                return e.getErrorEval();
            }

            try {
            // Correct locale dependent format strings
                Locale locale = org.apache.poi.util.LocaleUtil.getUserLocale();
                if ("de".equals(locale.getLanguage())) {
                    s1 = s1.replace("T", "D"); // Tag = Day
                    // Monat = Month
                    s1 = s1.replace("J", "Y"); // Jahr = Year
                    //... further replacements
                } else if ("fr".equals(locale.getLanguage())) {
                    s1 = s1.replace("J", "D"); // Jour = Day
                    // Mois = Month
                    s1 = s1.replace("A", "Y"); // Année = Year
                    //... further replacements
                } //... further languages

            // Ask DataFormatter to handle the String for us
                String formattedStr = formatter.formatRawCellContents(s0, -1, s1);
                return new StringEval(formattedStr);
            } catch (Exception e) {
                return ErrorEval.VALUE_INVALID;
            }
        }
    };
...

Then the getting the content as simple as:

import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.util.*;
import org.apache.poi.util.LocaleUtil;

import java.io.FileInputStream;
import java.util.Locale;

class ExcelEvaluateDiffLocales {

 private static String getString(Cell cell, DataFormatter formatter, FormulaEvaluator evaluator) {
  String text = "";
  if (cell.getCellType() == CellType.FORMULA) {
   String cellFormula = cell.getCellFormula();
   text += cellFormula + ":= ";
  }
  try {
   text += formatter.formatCellValue(cell, evaluator);
  } catch (org.apache.poi.ss.formula.eval.NotImplementedException ex) {
   text += ex.toString();
  }
  return text;
 }

 public static void main(String[] args) throws Exception {

  //Workbook wb  = WorkbookFactory.create(new FileInputStream("SAMPLE.xls"));
  Workbook wb  = WorkbookFactory.create(new FileInputStream("SAMPLE.xlsx"));

  Locale locale = new Locale("fr", "FR");
  LocaleUtil.setUserLocale(locale);
  DataFormatter formatter = new DataFormatter();
  FormulaEvaluator evaluator = wb.getCreationHelper().createFormulaEvaluator();

  Sheet sheet = wb.getSheetAt(0);
  for (Row row : sheet) {
   for (Cell cell : row) {
    CellReference cellRef = new CellReference(row.getRowNum(), cell.getColumnIndex());
    System.out.print(cellRef.formatAsString());
    System.out.print(" - ");

    String text = "";
    text = getString(cell, formatter, evaluator);

    System.out.println(text);

   }
  }

  wb.close();

 }
}