2
votes

I have CSV text file which I'm trying to read from my code into a Dataset. If the starting character is #, the code reads alternate columns.

Dataset

following is the code that I'm using.

DataSet dsFileRecords = new DataSet();
string selectQuery = "SELECT * FROM " + filename + "";
OleDbConnection fileConnection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + decryptedFilePath + ";Extended Properties=\"text;HDR=NO;FMT=CSVDelimited;\"");
OleDbDataAdapter dsCmd = new OleDbDataAdapter(selectQuery, fileConnection);
//Fill the DataSet object
dsCmd.Fill(dsFileRecords, "ReconciliationRecords");

Update

I have updated my screenshot based on the suggested answer,

enter image description here

Input Data

#Header,TPCCH,LTPTP,TRC,F,2012/06/06 23:59:59,0000000002,0000000003,00.00,00.00,PTP0101011,PTP0101010 
011001001,0110212122,W,W1,2012/06/06 23:59:59,01100110,2L,10.00,,A,,AP09BK4890, 
011001002,0110212123,W,W1,2012/06/06 23:59:59,01100110,2L,10.00,10.00,AD,,AP09BK4890, 
011001003,0110212123,E,E1,2012/06/06 23:59:59,01100110,2L,10.00,10.00,R,012,AP09BK4891, #TRAILER,0000000003
1

1 Answers

3
votes

Sandeep, hashtag sing (#) isn't your issue here.

Since you turned off the headers, Jet engine interprets each column as the same data type. Say here

#Header, 011001001, 011001002, 011001003 are in the same column and the format for #header is different than the rest so the engine rejects it.

The similar situation for the rest of the missing ones. If you notice this issue doesn't happen int he 3rd and 4th columns since they are of a same format.

To fix this issue turn your headers ON by changing HDR=No to HDR=Yes in your connection string and the first column will be treated differently and don't get used for data type recognition purposes.

To give you a little glimps of what is happening I have written a little console application to replicate the situation.

var file = new FileInfo("C:\\TextFile.txt");

OleDbConnection fileConnection = new OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0;Data Source=\"" + file.DirectoryName + "\";Extended Properties='text;HDR=No;FMT=Delimited(,)';");
string selectQuery = "SELECT * FROM " + file.Name;

using (var cmd = new OleDbCommand(selectQuery, fileConnection))
{
    fileConnection.Open();
    OleDbDataAdapter dsCmd = new OleDbDataAdapter(selectQuery, fileConnection);
    //Fill the DataSet object
    DataSet dsFileRecords = new DataSet();
    dsCmd.Fill(dsFileRecords, "ReconciliationRecords");

    foreach (DataTable dsFileRecord in dsFileRecords.Tables)
    {
        foreach (DataRow row in dsFileRecord.Rows)
        {
            foreach (var item in row.ItemArray)
            {
                Console.WriteLine(item.ToString());
            }
            Console.WriteLine( "\r\n" );
        }
    }
}

Console.ReadLine();

What I am going to test here are the contents of my TextFile.txt

If the contents are

#FName, LName, Phone 
Hank, Donald, 0202
Walter, W, 0203
Jimmy, Jones, 0201

The output is,

#FName
LName


Hank
Donald
202

Walter
W
203

Jimmy
Jones
201

If you noticed I also added # before the first header to clear up the suspension . Now you can see the the header Phone is missing, since all it's following row's values in the same column have a numerical value.

Now check this out. My TextFile.txt content as,

#FName, LName, Phone 
Hank, Donald, 0202
Walter, W, 0203
Jimmy, Jones, 0201

gives the output in the console app as,

#FName
LName
1111

Hank
Donald
202

Walter
W
203

Jimmy
Jones
201

How good is that?! The value 1111 that I have replaced for value Phone comes up fine. :)

Turn the headers On and add this chunk to your code inside the first Foreach and before the rows Foreach and you'll get your headers printed and sorted as bellow,

foreach (DataTable dsFileRecord in dsFileRecords.Tables)
{
    // Added foreach statement. 
    foreach (DataColumn column in dsFileRecord.Columns)
    {
        Console.WriteLine(column.ColumnName.ToString());
    }

    Console.WriteLine("\r\n");

    foreach (DataRow row in dsFileRecord.Rows)
    {
        foreach (var item in row.ItemArray)
        {
            Console.WriteLine(item.ToString());
        }
        Console.WriteLine("\r\n");
    }
}

and the output is,

#FName
LName
Phone

Hank
Donald
202

Walter
W
203

Jimmy
Jones
201

Voila!

Update

#Header and #Trailer aren't part of your data, so then need to be removed before processing. A little Regex would do the trick (however if you don't know Regex (Regular Expression Language) this might look like a black magic).

This is the code need to be modified at the start

var file = new FileInfo("C:\\RealTextFile.txt");

// Regex to remove the "#Header" and "#Trailer".
// The modigied txt file will be saved with the "_" prefix.
File.WriteAllLines(file.DirectoryName + "\\_" + file.Name,
    File.ReadAllLines(file.FullName).Select(content =>
        Regex.Replace(content, @".*(?>#Header,)|#TRAILER.+", string.Empty)
    ));

file = new FileInfo(file.DirectoryName + "\\_" + file.Name);


OleDbConnection fileConnection = ...