5
votes

Im trying to load a csv file into a datatable using oledb.

This is no problem but unfortunately one of the fields which looks numeric has a string value in about 3% of the fields and so is not being populated.

because im converting the csv into xml i really don't care about inferring datatypes and simply need the data in a string as i can cast it later in a Linq2XMl phase.

I am hoping to be able to do this in the connection string.

I don't want to just copy the table, set it up with new columns with the datatype I want and then write the data into it because that would involve loading the csv file twice.

any ideas?

my current connection string is

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + thefile.DirectoryName + ";Extended Properties='text;HDR=Yes;FMT=Delimited'";

2

2 Answers

6
votes

Did some researchand the answer is use a schema.ini but generate it on the fly for your dataset.

http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx

contains the info required. to construct the schema:

   public static void ConstructSchema(FileInfo theFile)
    {
        StringBuilder schema = new StringBuilder(); 
        DataTable data = LoadCSV(theFile); 
        schema.AppendLine("[" + theFile.Name + "]");
        schema.AppendLine("ColNameHeader=True"); 
        for (int i = 0; i < data.Columns.Count; i++)
        {
            schema.AppendLine("col" + (i + 1).ToString() + "=" + data.Columns[i].ColumnName + " Text");
        }   
        string schemaFileName = theFile.DirectoryName + @"\Schema.ini";
        TextWriter tw = new StreamWriter(schemaFileName);   
        tw.WriteLine(schema.ToString());
        tw.Close();  
    }

to load the csv as datatable

public static DataTable LoadCSV(FileInfo theFile)
    {   
        string sqlString = "Select * FROM [" + theFile.Name + "];";
        string conStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source="
            + theFile.DirectoryName + ";" + "Extended Properties='text;HDR=YES;'";
        DataTable theCSV = new DataTable();

        using (OleDbConnection conn = new OleDbConnection(conStr))
        {
            using (OleDbCommand comm = new OleDbCommand(sqlString, conn))
            {
                using (OleDbDataAdapter adapter = new OleDbDataAdapter(comm))
                {
                    adapter.Fill(theCSV);
                }
            }
        }
        return theCSV;
    }

to convert to xml

 public static XElement GetXMLFromCSV(FileInfo theFile, string rootNodeName, string itemName)
    {
        XElement retVal;
        DataTable data;
        data = CrateCsvAndSchema(theFile); 
        DataSet ds = new DataSet(rootNodeName);
        data.TableName = itemName;
        ds.Tables.Add(data); 
        retVal = XElement.Parse(ds.GetXml());  
        return retVal;
    }
0
votes

For reading a CSV into a DataTable I recommend this CSV parser.

It's really easy to use. Here's how you can use it to fill a DataTable with data from a comma delimited, quote qualified CSV:

    DataTable dt = null;
    using (GenericParserAdapter gp = new GenericParser.GenericParserAdapter(yourCsvFullname)) {
        dt = gp.GetDataTable();
    }

There are a number of options you can set: the delimiter, the text qualifer character(s)whether the first line in the CSV show column headers (if true, each DataColumn in your DataTable will be named accordingly), etc.

There are a number of fast, flexible CSV parsers out there but for simple requirements this one can't be beat.