4
votes

I have a text file that needs to be converted into a csv file. My plan is to:

  • parse the file line by line
  • search and replace commas inside double quotes with a space
  • then delete all double quotes
  • append the line to a new csv file

Question: I need a function that will recognize the comma inside a double quote and replace it.

Here is a sample line:

"MRS Brown","4611 BEAUMONT ST"," ","WARRIOR RUN, PA"

8
Might there be quotes inside the quoted strings? If so, how will they be escaped? (E.g. "quotes \"inside\" other quotes" or "quotes ""inside"" other quotes"?)Aasmund Eldhuset
Yes, there can be quotes inside double quotes.Internet Engineer
I have tried regex expressions and replace.Internet Engineer
.NET has good support for this, use the TextFieldParser class. It gives you a string[], you can then simply use string.Replace to whack the commas.Hans Passant
Does this answer your question? Dealing with commas in a CSV fileTylerH

8 Answers

5
votes

Your file seems to be already in a CSV complaint format. Any good CSV reader would be able to read it correctly.

If your problem is just reading the field values correctly, then you need to read it the correct way.

Here is one way to do it:

using Microsoft.VisualBasic.FileIO; 


    private void button1_Click(object sender, EventArgs e)
    {
        TextFieldParser tfp = new TextFieldParser("C:\\Temp\\Test.csv");
        tfp.Delimiters = new string[] { "," };
        tfp.HasFieldsEnclosedInQuotes = true;
        while (!tfp.EndOfData)
        {
            string[] fields = tfp.ReadFields();

            // do whatever you want to do with the fields now...
            // e.g. remove the commas and double-quotes from the fields.
            for (int i = 0; i < fields.Length;i++ )
            {
                fields[i] = fields[i].Replace(","," ").Replace("\"","");
            }

            // this is to show what we got as the output
            textBox1.AppendText(String.Join("\t", fields) + "\n");
        }
        tfp.Close();
    }

EDIT:

I just noticed that the question has been filed under C#, VB.NET-2010. Here is the VB.NET version, just in case you are coding in VB.

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim tfp As New FileIO.TextFieldParser("C:\Temp\Test.csv")
    tfp.Delimiters = New String() {","}
    tfp.HasFieldsEnclosedInQuotes = True
    While Not tfp.EndOfData
        Dim fields() As String = tfp.ReadFields

        '' do whatever you want to do with the fields now...
        '' e.g. remove the commas and double-quotes from the fields.
        For i As Integer = 0 To fields.Length - 1
            fields(i) = fields(i).Replace(",", " ").Replace("""", "")
        Next
        '' this is to show what we got as the output
        TextBox1.AppendText(Join(fields, vbTab) & vbCrLf)
    End While
    tfp.Close()
End Sub
2
votes

Here's a simple function that will remove commas embedded between two double quotes in a string. You can pass in a long string that has multiple occurrences of "abc,123",10/13/12,"some description"...etc. It will also remove the double quotes.

Private Function ParseCommasInQuotes(ByVal arg As String) As String

    Dim foundEndQuote As Boolean = False
    Dim foundStartQuote As Boolean = False
    Dim output As New StringBuilder()

    '44 = comma
    '34 = double quote

    For Each element As Char In arg

        If foundEndQuote Then
            foundStartQuote = False
            foundEndQuote = False
        End If

        If element.Equals(Chr(34)) And (Not foundEndQuote) And foundStartQuote Then
            foundEndQuote = True
            Continue For
        End If


        If element.Equals(Chr(34)) And Not foundStartQuote Then
            foundStartQuote = True
            Continue For
        End If


        If (element.Equals(Chr(44)) And foundStartQuote) Then
            'skip the comma...its between double quotes
        Else
            output.Append(element)
        End If

    Next

    Return output.ToString()

End Function
2
votes

Thanks to Baz, The Glockster Answer in VB, I just converted it in C# and its works well. With this code you don't need any third party parser.

string line = reader.ReadLine();                    
line = ParseCommasInQuotes(line);

private string ParseCommasInQuotes(string arg)
{

  bool foundEndQuote = false;
  bool foundStartQuote = false;
  StringBuilder output = new StringBuilder();

  //44 = comma
  //34 = double quote

  foreach (char element in arg)
  {
    if (foundEndQuote)
    {
      foundStartQuote = false;
      foundEndQuote = false;
    }

    if (element.Equals((Char)34) & (!foundEndQuote) & foundStartQuote)
    {
      foundEndQuote = true;
      continue;
    }

    if (element.Equals((Char)34) & !foundStartQuote)
    {
      foundStartQuote = true;
      continue;
    }

    if ((element.Equals((Char)44) & foundStartQuote))
    {
      //skip the comma...its between double quotes
    }
    else
    {
      output.Append(element);
    }
  }
  return output.ToString();
}
0
votes

I did not understand your question before. Now I'm pretty sure I got it right:

TextFieldParser parser = new TextFieldParser(@"c:\file.csv");
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData) 
{
    //Processing row
    string[] fields = parser.ReadFields();
    foreach (string field in fields) 
    {
        //TODO: Do whatever you need
    }
}
parser.Close();
0
votes
var result = Regex.Replace(input,
                           @"[^\""]([^\""])*[^\""]", 
                           m => m.Value.Replace(",", " ") );
0
votes

It doesn't sound as though what you are describing will end up as a csv file but to answer your question I would do this.

First you need to get the text file into some usable code that can be looped through like this:

    public static List<String> GetTextListFromDiskFile(String fileName)
    {
        List<String> list = new List<String>();
        try
        {
            //load the file into the streamreader 
            System.IO.StreamReader sr = new System.IO.StreamReader(fileName);

            //loop through each line of the file
            while (sr.Peek() >= 0)
            {
                list.Add(sr.ReadLine());
            }
            sr.Close();
        }
        catch (Exception ex)
        {
            list.Add("Error: Could not read file from disk. Original error: " + ex.Message);
        }

        return list;
    }

Then loop through the list and using a simple foreach loop and run the replace on the list like this:

        foreach (String item in list)
        {
            String x = item.Replace("\",\"", "\" \"");
            x = x.Replace("\"", "");
        }

After you do this then you need to create you csv file line by line. I would use a StringBuilder again and then just do an sb.AppendLine(x) to create the String that will become the text file then write it out to the disk using something like this.

    public static void SaveFileToDisk(String filePathName, String fileText)
    {
        using (StreamWriter outfile = new StreamWriter(filePathName))
        {
            outfile.Write(fileText);
        }
    }
0
votes

This worked for me. Hope it helps someone else.

Private Sub Command1_Click()
Open "c:\\dir\file.csv" For Input As #1
Open "c:\\dir\file2.csv" For Output As #2
Do Until EOF(1)
Line Input #1, test$
99
c = InStr(test$, """""")
If c > 0 Then
test$ = Left$(test$, c - 1) + Right$(test$, Len(test$) - (c + 1))
GoTo 99
End If
Print #2, test$
Loop
End Sub
0
votes

I would do it all before you start to process it line by line. Also, check out CsvHelper. It's fast and easy. Just take your results and throw them into a TextReader and then pass it to CvsReader.

Here is your comma (in double quotes) and then subsequent double quote stripper.

        using (TextReader reader = File.OpenText(file))
        {
            // remove commas and double quotes inside file
            var pattern = @"\""(.+?,.+)+\""";
            var results = Regex.Replace(reader.ReadToEnd(), pattern, match => match.Value.Replace(",", " "));
            results = results.Replace("\"", "");
         }