I have to read a CSV file. The file can use any delimiter, and its fields may or may not be enclosed by a quote character such as "\"". The file should also be parsed according to RFC4180. (I know that in RFC4180 the delimiter is ",", but a user should also be able to read a file delimited by "|", for example.)
This is my current method, using Apache Commons CSV:
    public List<List<String>> readFileAsListOfList(File file, String delimiter, String lineEnding, String enclosure) throws Exception {
        if (!file.exists()) {
            throw new Exception("File doesn't exist.");
        }
        if (!file.isFile()) {
            throw new Exception("File must be a file.");
        }
        List<List<String>> fileContent = new ArrayList<>();
        CSVFormat csvFormat = CSVFormat.RFC4180.withDelimiter(delimiter.charAt(0)).withEscape(lineEnding.charAt(0));
        if (StringUtils.isNotEmpty(enclosure)) {
            csvFormat.withQuote(enclosure.charAt(0));
        } else {
            csvFormat.withQuote(null);
        }
        System.out.println(csvFormat);
        List<String> lineContent = new ArrayList<>();
        for (CSVRecord rec : csvFormat.parse(new FileReader(file))) {
            for (String field : rec) {
                lineContent.add(field);
            }
            fileContent.add(lineContent);
        }
        return fileContent;
    }
Now, if the fields in the file are not enclosed and a line looks like this:
aaa|bbb|"|ccc
I get the following error:
Exception in thread "main" java.lang.IllegalStateException: IOException reading next record: java.io.IOException: (startline 120707) EOF reached before encapsulated token finished
    at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:530)
    at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:540)
    at com.ids.dam.pim.validation.CSVFileReaderApache.readFileAsListOfList(CSVFileReaderApache.java:61)
    at com.ids.dam.pim.validation.CSVFileReaderApache.main(CSVFileReaderApache.java:78)
Caused by: java.io.IOException: (startline 120707) EOF reached before encapsulated token finished
    at org.apache.commons.csv.Lexer.parseEncapsulatedToken(Lexer.java:288)
    at org.apache.commons.csv.Lexer.nextToken(Lexer.java:158)
    at org.apache.commons.csv.CSVParser.nextRecord(CSVParser.java:586)
    at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:527)
    ... 3 more
I think this happens because my CSVFormat still uses a double quote as the enclosure, since that is the default in RFC4180.
Printing the format gives the following:
Delimiter=<|> Escape=<L> QuoteChar=<"> RecordSeparator=< > SkipHeaderRecord:false
To me, this means I can override the default delimiter with CSVFormat.RFC4180.withDelimiter(delimiter.charAt(0)...
but I cannot set the enclosure to null.
Is there a way to set the enclosure to null while still using RFC4180?
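One thing that may explain the printed format: as far as I understand, CSVFormat in Commons CSV is immutable, so each with...() call returns a new format instead of changing the one it is called on. The sketch below illustrates that pattern with a small hypothetical stand-in class. ImmutableFormat, WitherDemo, discardResult and keepResult are made-up names for illustration and are not part of the library. It shows that ignoring the return value leaves the original quote character in place.

```java
// Hypothetical stand-in for an immutable format; this is NOT the Commons CSV class.
final class ImmutableFormat {
    private final char delimiter;
    private final Character quote; // null means "no quote character"

    ImmutableFormat(char delimiter, Character quote) {
        this.delimiter = delimiter;
        this.quote = quote;
    }

    // Returns a modified copy; the receiver itself is never changed.
    ImmutableFormat withQuote(Character newQuote) {
        return new ImmutableFormat(delimiter, newQuote);
    }

    Character getQuote() {
        return quote;
    }
}

public class WitherDemo {
    // Mirrors the pattern in the question: the returned copy is thrown away.
    static Character discardResult() {
        ImmutableFormat format = new ImmutableFormat('|', '"');
        format.withQuote(null);
        return format.getQuote();
    }

    // Reassigns the variable so the modified copy is the one used afterwards.
    static Character keepResult() {
        ImmutableFormat format = new ImmutableFormat('|', '"');
        format = format.withQuote(null);
        return format.getQuote();
    }

    public static void main(String[] args) {
        System.out.println("discarded: " + discardResult());
        System.out.println("reassigned: " + keepResult());
    }
}
```

If the same applies to the method above, the corresponding change would be to keep the returned format, for example csvFormat = csvFormat.withQuote(null); in the else branch (and likewise in the if branch). The other RFC4180 defaults should then stay as they are.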