14
votes

In my case, valid CSV files are those delimited by either a comma or a semicolon. I am open to other libraries, but the solution needs to be in Java. Reading through the Apache CSVParser API, the only thing I can think of is the following, which seems inefficient and ugly.

BufferedReader reader = new BufferedReader(new InputStreamReader(file));
CSVFormat csvFormat;
CSVParser parser;
try
{
   csvFormat = CSVFormat.EXCEL.withHeader().withDelimiter(';');
   parser = csvFormat.parse( reader );
   // now read the records
}
catch (IOException e)
{
   try
   {
      // try the other valid delimiter
      csvFormat = CSVFormat.EXCEL.withHeader().withDelimiter(',');
      parser = csvFormat.parse( reader );
      // now read the records
   }
   catch (IOException e2)
   {
      // then it's really not a valid CSV file
   }
}

Is there a way to check the delimiter first, or perhaps to allow two delimiters? Does anyone have a better idea than just catching an exception?

3
I think your code is the best option. There is no method for detecting the delimiter in a plain CSV file; the only way is to retry with several delimiters. - gilchris
Just a thought: if you have well-formed CSV, could you do a pattern match for one of your options? If every field is wrapped in quotes and then separated by commas, you might find several instances of the pattern ",". - Ryan E

3 Answers

8
votes

We built support for this in uniVocity-parsers:

public static void main(String... args) {
    CsvParserSettings settings = new CsvParserSettings();
    settings.setDelimiterDetectionEnabled(true);

    CsvParser parser = new CsvParser(settings);

    List<String[]> rows = parser.parseAll(file);

}

The parser has many more features that I'm sure you will find useful. Give it a try.

Disclaimer: I'm the author of this library. It's open source and free (Apache 2.0 license).

0
votes

I had the same problem and solved it this way:

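    BufferedReader in = Files.newBufferedReader(Paths.get(fileName));
    in.mark(1024); // reset() below may fail if the first line is longer than 1024 characters
    String line = in.readLine();
    CSVFormat fileFormat;

    // A semicolon in the first line selects ';'; otherwise keep EXCEL's default comma
    if (line != null && line.indexOf(';') != -1)
        fileFormat = CSVFormat.EXCEL.withDelimiter(';');
    else
        fileFormat = CSVFormat.EXCEL;

    in.reset(); // rewind so the parser sees the first line again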

After that you can parse it with CSVParser.
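
One caveat with the first-line check: a comma-delimited file whose header contains a semicolon inside a quoted field would be misdetected. The following is a minimal sketch of a stricter check that counts each candidate only outside double quotes and picks the most frequent one. The class and method names (`DelimiterSniffer`, `countOutsideQuotes`, `detect`) are hypothetical and not part of Commons CSV, and the sketch assumes fields are quoted with `"` and that the header does not span multiple lines.

```java
import java.util.Optional;

// Hypothetical helper, not part of any library: guesses the delimiter of one CSV line.
final class DelimiterSniffer {
    // Assumption: only these two delimiters are considered valid, as in the question.
    private static final char[] CANDIDATES = {';', ','};

    // Counts how often the delimiter occurs outside double-quoted fields.
    static int countOutsideQuotes(String line, char delimiter) {
        boolean inQuotes = false;
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    // Returns the candidate seen most often outside quotes, or empty if none occurs.
    // On a tie, the earlier entry in CANDIDATES wins.
    static Optional<Character> detect(String firstLine) {
        if (firstLine == null) {
            return Optional.empty();
        }
        char best = 0;
        int bestCount = 0;
        for (char candidate : CANDIDATES) {
            int n = countOutsideQuotes(firstLine, candidate);
            if (n > bestCount) {
                best = candidate;
                bestCount = n;
            }
        }
        if (bestCount == 0) {
            return Optional.empty();
        }
        return Optional.of(best);
    }
}
```

The detected character could then be passed to `CSVFormat.EXCEL.withDelimiter(...)` in place of the hard-coded `';'`, and an empty result treated as "not a valid CSV file".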

0
votes

Below is my solution to this problem:

    private static final Character[] DELIMITERS = {';', ','};
    private static final char NO_DELIMITER = '\0'; // sentinel: no known delimiter found

    private char detectDelimiter() throws IOException {
        try (
            final var reader = new BufferedReader(
                new InputStreamReader(resource.getInputStream(), StandardCharsets.UTF_8))
        ) {
            // Only the first line is inspected; an empty file yields NO_DELIMITER
            final String line = reader.readLine();
            if (line == null) {
                return NO_DELIMITER;
            }

            return Arrays.stream(DELIMITERS)
                .filter(s -> line.contains(s.toString()))
                .findFirst()
                .orElse(NO_DELIMITER);
        }
    }

Example usage:

    private CSVParser openCsv() throws IOException {
        final var csvFormat = CSVFormat.DEFAULT
            .withFirstRecordAsHeader()
            .withDelimiter(detectDelimiter())
            .withTrim();

        return new CSVParser(new InputStreamReader(resource.getInputStream(), StandardCharsets.UTF_8), csvFormat);
    }
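
Looking only at the first line can pick the wrong delimiter when a header happens to contain both characters. A possible refinement, sketched below under stated assumptions, is to sample several lines and accept a candidate only if it splits every sampled line into the same number of fields (more than one). The names `ConsistentDelimiterSniffer`, `fieldCount`, and `detect` are hypothetical; the sketch assumes double-quote quoting and does not handle quoted fields that span multiple lines.

```java
import java.util.List;

// Hypothetical helper: picks the delimiter that yields a consistent column count.
final class ConsistentDelimiterSniffer {

    // Number of fields the line would have with this delimiter,
    // ignoring delimiters inside double-quoted fields.
    static int fieldCount(String line, char delimiter) {
        boolean inQuotes = false;
        int fields = 1;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Returns the first candidate that gives the same field count (> 1) on every
    // non-empty sample line, or the fallback if no candidate qualifies.
    static char detect(List<String> sampleLines, char[] candidates, char fallback) {
        for (char candidate : candidates) {
            int expected = -1;
            boolean consistent = true;
            for (String line : sampleLines) {
                if (line.isEmpty()) {
                    continue; // blank lines carry no information
                }
                int n = fieldCount(line, candidate);
                if (expected == -1) {
                    expected = n;
                } else if (n != expected) {
                    consistent = false;
                    break;
                }
            }
            if (consistent && expected > 1) {
                return candidate;
            }
        }
        return fallback;
    }
}
```

In `detectDelimiter()` this would mean reading the first few lines instead of one and passing `NO_DELIMITER` as the fallback; how many lines to sample is a trade-off between robustness and read cost.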