
I wrote a short function using OleDb that should read a large amount of data, but there are still several issues I couldn't solve. It is about reading semicolon-separated data and inserting it into a SQL database.

Private Function GetCSVFile(ByVal file As String) As DataTable
    Dim dt As New DataTable
    Try
        ' Data Source is the folder that contains the CSV file
        Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited"""
        Dim conn As New OleDb.OleDbConnection(ConStr)

        Dim da As New OleDb.OleDbDataAdapter("SELECT * FROM [" & _table & ".csv]", conn)
        da.Fill(dt)
        Application.DoEvents()
    Catch ex As OleDb.OleDbException
        MessageBox.Show(ex.Message)
    Catch ex As Exception
        MessageBox.Show(ex.Message)
    End Try
    Return dt
End Function

1. It reads the whole file, but I need the function to read only 50,000 rows at a time and pass them on to another function step by step. It should probably work in a loop, because Jet OLEDB doesn't read files larger than 1 GB.

2. I need to replace escaped characters:

    value(i) = value(i).Replace("\t", Constants.vbTab).Replace("\n", Constants.vbLf).Replace("\r", Constants.vbCr).Replace("\""", """").Replace("\\", "\")

   but this only works for strings (see the sketch after this list).

3. I need to recognize data types like Integer, Double, String, etc. My first idea was to get the column types from a SQL query and then check the values with TryParse:

    
    ' get an empty result set just to read the column types of the target table
    Dim dtInserts As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)
    Dim ListOfTypes As New List(Of System.Type)
    For Each _col As DataColumn In dtInserts.Columns
        ListOfTypes.Add(_col.DataType)
    Next

    ' example: check whether a value fits the expected column type
    Dim _wert1 As String = "11.11.2011"
    Dim _type1 As System.Type = ListOfTypes.Item(1)
    Dim _date As Date
    If DateTime.TryParse(_wert1, _date) Then
        ' the value can be treated as a Date
    End If
    

   but I'm still not sure whether it will work.

4. All data that is read should be encoded with code page 1252. This one doesn't really work:

     Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited;CODEPAGE=1252""" 
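To illustrate item 2, here is a rough sketch of what I mean (a hypothetical loop over the DataTable returned by GetCSVFile above; the replacements are only valid when the field really is a String):

    For Each _row As DataRow In dt.Rows
        For i As Integer = 0 To dt.Columns.Count - 1
            ' only String fields can contain the escaped sequences
            If TypeOf _row(i) Is String Then
                Dim value As String = CStr(_row(i))
                _row(i) = value.Replace("\t", Constants.vbTab).Replace("\n", Constants.vbLf).Replace("\r", Constants.vbCr).Replace("\""", """").Replace("\\", "\")
            End If
        Next
    Next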

Does anyone have an idea what could be done?

1 question at a time please, bundling 4 together makes it too broad and too difficult to post a short answer. – Ňɏssa Pøngjǣrdenlarp
#3 and #4 are very doable, #2 is too vague without sample data to understand what you are trying to do, #1 might be able to be done natively depending on the data, otherwise you may need to add a step. If you refine the question to something less broad you might get some answers before it is closed. – Ňɏssa Pøngjǣrdenlarp
You should remove the line Application.DoEvents() - it can make strange things happen. – Andrew Morton
If you need to parse a CSV file, you can use a CSV parser. The TextFieldParser class might suffice. Then your program will not be limited by JET. – Andrew Morton

1 Answer


CSVHelper can do most of what it sounds like you want. FileHelpers might also work, I just haven't worked with it as much. I wouldn't recommend VB's TextFieldParser because it returns a string array rather than typed data.

OleDB provides a great way to import, but saving to the database presents a small challenge. All the loaded rows will have a RowState of Unchanged. The only way to change that to Added is to copy the rows into a new table:

For Each dr As DataRow In dtCSV.Rows
    dtDest.Rows.Add(dr.ItemArray)
Next
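For context, dtDest is assumed to be an empty table with the same schema as the loaded one; a minimal sketch (assuming dtCSV is the table filled by OleDb as above):

' Clone() copies only the column structure, not the rows,
' so rows added to dtDest afterwards get a RowState of Added.
Dim dtDest As DataTable = dtCSV.Clone()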

It will work, but as a result you will have 2 tables and all those rows loaded at once. It turns out that using CSVHelper and a simple INSERT query is the most economical: since the records are fetched from an IEnumerable(Of T), only one source record is loaded at a time. It is also a bit faster than copying rows: about 20% faster on 500k rows.

Note: We have no idea whatsoever what the data looks like other than it is semicolon delimited...and there is apparently a lot of it.

' requires: Imports System.IO, Imports CsvHelper, Imports System.Data.OleDb
Using sr As New StreamReader(CSVFilePath, False),
     csv As New CsvReader(sr)

    ' some CSVHelper config options
    csv.Configuration.HasHeaderRecord = True
    csv.Configuration.TrimFields = True
    csv.Configuration.TrimHeaders = True
    csv.Configuration.Delimiter = ";"
    csv.Configuration.IsHeaderCaseSensitive = False
    csv.Configuration.RegisterClassMap(Of RandItem.RandItemMap)()

    ' get the file into IEnumerable collection
    Dim csvData = csv.GetRecords(Of RandItem)()

    Dim SQL = <sql>
              INSERT INTO RandomData 
                     (Foo, Bar, Cat, Dog...)
              VALUES 
                    (@p1, @p2, @p3, @p4...)
            </sql>.Value

    Using dbcon As New OleDbConnection(ACEConnStr)
        Using cmd As New OleDbCommand(SQL, dbcon)
            dbcon.Open()

            ' create the parameters
            cmd.Parameters.Add("@p1", OleDbType.VarChar)
            cmd.Parameters.Add("@p2", OleDbType.VarChar)
            cmd.Parameters.Add("@p3", OleDbType.Integer)
            cmd.Parameters.Add("@p4", OleDbType.Integer)
            ...
            ' load one item at a time, to save it
            For Each item In csvData
                cmd.Parameters("@p1").Value = item.Foo
                cmd.Parameters("@p2").Value = item.Bar
                cmd.Parameters("@p3").Value = item.Cat
                cmd.Parameters("@p4").Value = item.Dog
                ...
                cmd.ExecuteNonQuery()
            Next
        End Using
    End Using
End Using

I can't give a complete tutorial on how to use it, but in general you create a type (Class) which defines the data type for each column (here, RandItem); RandItemMap is another class which specifies the order of those properties in the file. That way, CSVHelper knows the data type of each column and will convert for you.
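As a rough sketch of what those two classes might look like (the property names Foo, Bar, Cat and Dog are placeholders taken from the INSERT above, and the exact base class and mapping syntax depend on the CsvHelper version):

Public Class RandItem
    Public Property Foo As String
    Public Property Bar As String
    Public Property Cat As Integer
    Public Property Dog As Integer

    ' The map ties each property to a column position in the file.
    ' In some CsvHelper versions the mappings go in an overridden
    ' CreateMap method instead of the constructor.
    Public Class RandItemMap
        Inherits CsvHelper.Configuration.CsvClassMap(Of RandItem)

        Public Sub New()
            Map(Function(m) m.Foo).Index(0)
            Map(Function(m) m.Bar).Index(1)
            Map(Function(m) m.Cat).Index(2)
            Map(Function(m) m.Dog).Index(3)
        End Sub
    End Class
End Class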

There are several ways to use it; this way reads one line at a time from the file and immediately saves that item to the database: csvData = csv.GetRecords(Of RandItem)() initializes csvData as an IEnumerable(Of RandItem), so only one item is loaded at a time in the loop, which makes it very economical.

In the loop, the code gets a typed item from the csv file which is then mapped to the respective parameters and saved. This is about 20% faster than copying the data to a DataTable in batches and saving; and much less memory intensive since there is only ever 1 item loaded.


For smaller CSV files, you could use .ToArray() or .ToList() to load the file into a collection.
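For example, something like this (a sketch; ToList() is the LINQ extension, so it needs Imports System.Linq, and it loads the whole file into memory at once):

' materialize the whole file at once - only sensible for small files
Dim allItems As List(Of RandItem) = csv.GetRecords(Of RandItem)().ToList()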

Note: Importing/parsing a text file without headers is a little different from what is shown, but just as easy.
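The main difference is in the configuration; a minimal sketch, assuming the class map assigns Index() positions as above:

' no header row in the file, so match columns strictly by position
csv.Configuration.HasHeaderRecord = False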

It sounds like CSVHelper might obviate most of the issues on your laundry list.