I'm trying to use the Univocity format auto-detection for parsing this CSV table:
HEADER1, HEADER2, HEADER3
11, 12, 13
21, 22, 23
31, 32, 33
As you can see, there are the same number of commas ',' and spaces ' '. The problem is that the heuristic for finding the delimiter gives preference to the ' ' over the ',' character.
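To confirm the tie, here is a minimal sketch in plain Scala that counts both candidate characters in each row. It does not use Univocity at all, and the names `DelimiterCount`, `sample` and `countPerRow` are only illustrative:

```scala
object DelimiterCount {
  // The sample table from the question, one row per line.
  val sample: String =
    """HEADER1, HEADER2, HEADER3
      |11, 12, 13
      |21, 22, 23
      |31, 32, 33""".stripMargin

  // Count how often a candidate delimiter occurs in each row.
  def countPerRow(candidate: Char): List[Int] =
    sample.linesIterator.map(_.count(_ == candidate)).toList

  def main(args: Array[String]): Unit = {
    println(s"commas per row: ${countPerRow(',')}")
    println(s"spaces per row: ${countPerRow(' ')}")
  }
}
```

Both characters occur exactly twice in every row, so a detector that only looks at per-row counts has nothing to tell them apart.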
So in this case the detected separator is the space ' ', and the cell values are wrong because the comma is taken as part of the value (for example, the first cell of the second row comes back as "11," instead of "11").
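The effect on the values can be illustrated with plain string splitting. This is only a sketch of the symptom, not what the Univocity parser does internally, and `SplitDemo`, `row`, `bySpace` and `byComma` are made-up names:

```scala
object SplitDemo {
  // One data row from the sample table.
  val row = "11, 12, 13"

  // Splitting on the space keeps the comma attached to the value.
  val bySpace: List[String] = row.split(' ').toList

  // Splitting on the comma leaves a leading space, which trimming removes.
  val byComma: List[String] = row.split(',').map(_.trim).toList

  def main(args: Array[String]): Unit = {
    println(bySpace.mkString("[", "|", "]"))
    println(byComma.mkString("[", "|", "]"))
  }
}
```

With the space as separator the first two cells keep their trailing comma, which matches the wrong values I get from the parser.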
I saw that setDelimiterDetectionEnabled lets you define the candidate delimiters in order of priority, but I couldn't make it work.
I call it like this: setDelimiterDetectionEnabled(true, ',', ' '), but it still chooses the space as the delimiter.
If I remove one space from the CSV table (so that there are more commas than spaces), the comma is chosen as the delimiter.
This is the code. It is Scala, but I don't think that is relevant because the library is written in Java:
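import java.io.File
import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}
val settings = new CsvParserSettings
// Enable delimiter detection, listing the comma before the space
settings.setDelimiterDetectionEnabled(true, ',', ' ')
val parser = new CsvParser(settings)
val spaceAndCommaTable = new File("/home/pr/SPACE_AND_COMMA.csv")
val parsed = parser.parseAll(spaceAndCommaTable, "UTF-8")
// The detected format is only available after parsing
val format = parser.getDetectedFormat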
I expected format.getDelimiter to return the comma ',', but the actual delimiter is the space ' '.