16
votes

Suppose I have this CSV file:

NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"

I would like to store each token that is enclosed in double quotes in an array. Is there a safe way to do this instead of using the String Split() function? Currently I load the file into a RichTextBox, and then, using its Lines[] property, I loop over each Lines[] element and do this:

string[] line = s.Split(',');

s is an element of RichTextBox.Lines[]. As you can see, the commas inside a token easily trip up the Split() function: instead of the three tokens I want, I end up with six.

Any help will be appreciated!

Unless you want to display anything, do not (ab)use GUI components for data storage. If you need the contents of the file line by line, use the File.ReadLines method. – O. R. Mapper
@O.R.Mapper You're absolutely right! I'll change my code design accordingly. – swdev
@chancea CsvHelper and CsvReader in that link should be good, but I think I will go with the solution that uses RegEx. :) Thanks! – swdev
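
A minimal sketch of the File.ReadLines suggestion from the comments, in case it helps. The class and method names are hypothetical, and skipping the first line assumes the file starts with a header row such as NAME,ADDRESS,DATE.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class CsvFile
{
    // Hypothetical helper: lazily yields the data rows of a CSV file.
    // Skip(1) assumes the first line is a header row; blank lines are ignored.
    public static IEnumerable<string> ReadDataLines(string path)
    {
        return File.ReadLines(path)
                   .Skip(1)
                   .Where(line => !string.IsNullOrWhiteSpace(line));
    }
}
```

Each string returned by CsvFile.ReadDataLines("people.csv") (a made-up path) can then take the place of s in the loop from the question, with no RichTextBox involved.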

6 Answers

27
votes

You could use regex too:

string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = @"""\s*,\s*""";

// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
    input.Substring(1, input.Length - 2), pattern);

This will give you:

Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
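
For reuse, the same split could be wrapped in a small helper. This is only a sketch: SplitQuotedLine is a hypothetical name, the Trim() call and the guard are additions, and it still assumes every field is double-quoted and contains no escaped quotes.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuotedCsv
{
    // Hypothetical helper: assumes every field is wrapped in double quotes
    // and that no field contains an escaped quote ("").
    public static string[] SplitQuotedLine(string line)
    {
        string trimmed = line.Trim();
        if (trimmed.Length < 2 || trimmed[0] != '"' || trimmed[trimmed.Length - 1] != '"')
        {
            throw new FormatException("Expected a line that starts and ends with a double quote.");
        }

        // Drop the outer quotes, then split on the quote-comma-quote boundary,
        // allowing optional whitespace around the comma.
        string inner = trimmed.Substring(1, trimmed.Length - 2);
        return Regex.Split(inner, @"""\s*,\s*""");
    }
}
```

Calling QuotedCsv.SplitQuotedLine(input) on the sample line returns the same three tokens listed above.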
9
votes

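I've done this with my own method. It simply counts the number of " and ' characters and splits on a comma only when that count is even. The returned tokens keep their surrounding quotes and whitespace.
Adapt this to your needs.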

    public List<string> SplitCsvLine(string s) {
        int i;
        int a = 0;
        int count = 0;
        List<string> str = new List<string>();
        for (i = 0; i < s.Length; i++) {
            switch (s[i]) {
                case ',':
                    if ((count & 1) == 0) {
                        str.Add(s.Substring(a, i - a));
                        a = i + 1;
                    }
                    break;
                case '"':
                case '\'': count++; break;
            }
        }
        str.Add(s.Substring(a));
        return str;
    }
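
One possible way to adapt it, as a sketch rather than a drop-in replacement: instead of counting quote characters, track whether the scanner is currently inside a double-quoted field. The assumptions here are mine: one record per line, only double quotes delimit fields (apostrophes are ordinary characters), a doubled "" inside quotes stands for a literal quote, and unquoted fields are trimmed. The class name CsvLine is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

static class CsvLine
{
    // Sketch of a quote-aware splitter; see the assumptions in the text above.
    public static List<string> Split(string s)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;   // currently between an opening and closing quote
        bool wasQuoted = false;  // the current field had a quoted part

        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < s.Length && s[i + 1] == '"')
                {
                    current.Append('"'); // doubled quote is a literal quote
                    i++;
                }
                else if (c == '"')
                {
                    inQuotes = false;    // closing quote
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                // Discard whitespace that preceded the opening quote.
                if (string.IsNullOrWhiteSpace(current.ToString())) current.Clear();
                inQuotes = true;
                wasQuoted = true;
            }
            else if (c == ',')
            {
                fields.Add(Finish(current, wasQuoted));
                current.Clear();
                wasQuoted = false;
            }
            else if (!(wasQuoted && char.IsWhiteSpace(c)))
            {
                current.Append(c);       // skips whitespace after a closing quote
            }
        }

        fields.Add(Finish(current, wasQuoted));
        return fields;
    }

    // Quoted fields are kept verbatim; unquoted fields are trimmed.
    private static string Finish(StringBuilder sb, bool wasQuoted)
    {
        return wasQuoted ? sb.ToString() : sb.ToString().Trim();
    }
}
```

With the line from the question, CsvLine.Split should yield three fields without their quotes. It does not handle quoted fields that span several lines.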
2
votes

It's not an exact answer to your question, but why not use an existing library to handle the CSV file? A good example is LinqToCsv. CSV files can be delimited with various punctuation characters, and there are other gotchas that library authors have already addressed, such as handling the header row, dealing with different date formats, and mapping rows to C# objects.
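
As an illustration of the "use an existing parser" idea, here is a sketch that uses TextFieldParser from Microsoft.VisualBasic.FileIO rather than LinqToCsv; that is a swapped-in choice on my part, not the library named above. It assumes your project can reference the Microsoft.VisualBasic assembly, and the class and method names are hypothetical.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class BuiltInCsv
{
    // Hypothetical helper: parses a single CSV line with TextFieldParser.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside quotes stay in the field
            parser.TrimWhiteSpace = true;
            return parser.ReadFields();              // null when there is nothing to read
        }
    }
}
```

For a whole file, the same parser can be constructed from a file path and read in a loop until EndOfData is true. It is worth checking the behavior against your own data, for example how it treats spaces between a comma and an opening quote.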

2
votes

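You can replace "," with ; and then split on ;. This assumes that no field contains a ; and that there is no whitespace around the separating commas. The first and last tokens keep their outer quote.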

var values= s.Replace("\",\"",";").Split(';');
0
votes

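If your CSV line is tightly packed, it's easiest to strip the leading and trailing quotes as mentioned earlier and then do a simple split on the joining string:

 // The string[] overload of Split is used because it is available on every .NET version.
 string[] tokens = input.Substring(1, input.Length - 2).Split(new[] { "\",\"" }, StringSplitOptions.None);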

This will only work if ALL fields are double-quoted, even those that don't (officially) need to be. It will be faster than RegEx, but only under those conditions.

Really useful if your data looks like "Name","1","12/03/2018","Add1,Add2,Add3","other stuff"

0
votes

Five years old but there is always somebody new who wants to split a CSV.

If your data is simple and predictable (i.e. never has any special characters like commas, quotes and newlines) then you can do it with split() or regex.

But to support all the nuances of the CSV format properly without code soup you should really use a library where all the magic has already been figured out. Don't re-invent the wheel (unless you are doing it for fun of course).

CsvHelper is simple enough to use:

https://joshclose.github.io/CsvHelper/2.x/

using (var parser = new CsvParser(textReader))
{
    while(true)
    {
        string[] line = parser.Read();

        if (line != null)
        {
            // do something
        }
        else
        {
            break;
        }
    }
}

More discussion / same question: Dealing with commas in a CSV file