I have written the following snippet to import a CSV file into an MS SQL Server database, but it gives me an error. It is based on Python code written for SQLite, which I adapted for MSSQL.

import csv, pyodbc
import logging

def _get_col_datatypes(fin):
    dr = csv.DictReader(fin) # comma is default delimiter
    fieldTypes = {}
    for entry in dr:
        fieldsLeft = [f for f in dr.fieldnames if f not in fieldTypes.keys()]
        if not fieldsLeft: break # We're done
        for field in fieldsLeft:
            data = entry[field]

            # Need data to decide
            if len(data) == 0:
                continue

            if data.isdigit():
                fieldTypes[field] = "INTEGER"
            else:
                fieldTypes[field] = "TEXT"
        # TODO: Currently there's no support for DATE in SQLite

    if len(fieldsLeft) > 0:
        raise Exception("Failed to find all the columns data types - Maybe some are empty?")

    return fieldTypes


def escapingGenerator(f):
    for line in f:
        yield line.encode("ascii", "xmlcharrefreplace").decode("ascii")


def csvToDb(csvFile, outputToFile = False):
    # TODO: implement output to file

    with open(csvFile, mode='r', newline='') as fin:
        dt = _get_col_datatypes(fin)

        fin.seek(0)

        reader = csv.DictReader(fin)

        # Keep the order of the column names just as in the CSV
        fields = reader.fieldnames
        cols = []


        # Set field and type
        for f in fields:
            cols.append("%s %s" % (f, dt[f]))

        # Generate create table statement:
        stmt = "CREATE TABLE ads (%s)" % ",".join(cols)

        con = pyodbc.connect('DRIVER={SQL Server};SERVER=localhost;DATABASE=sd;UID=Test;PWD=11')
        cur = con.cursor()
        cur.execute(stmt)

        fin.seek(0)


        reader = csv.reader(escapingGenerator(fin))

        # Generate insert statement:
        stmt = "INSERT INTO ads VALUES(%s);" % ','.join('?' * len(cols))

        cur.executemany(stmt, reader)
        con.commit()

    return con


csvToDb('Books.csv')

The error I am getting is:

pyodbc.DataError: ('22018', "[22018] [Microsoft][ODBC SQL Server Driver][SQL Server]Conversion failed when converting the varchar value 'a' to data type int. (245) (SQLExecDirectW)")

Also, please suggest any other methods you know of for dynamically importing CSV or text files into an MSSQL database.

1 Answer

The error message

Conversion failed when converting the varchar value 'a' to data type int.

reveals that your code can be "fooled" into thinking that a column is integer when it is really text, because it decides each column's type from the first non-empty value it finds and never checks the rows after that. Testing reveals that both

ID,txt1,txt2,int1
1,foo,123,3
2,bar,abc,4

and

"ID","txt1","txt2","int1"
1,"foo","123",3
2,"bar","abc",4

result in your code producing the CREATE TABLE statement:

CREATE TABLE ads (ID INTEGER,txt1 TEXT,txt2 INTEGER,int1 INTEGER)

which is wrong because the [txt2] column is not really INTEGER.
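The misdetection can be reproduced without a database. The sketch below is a simplified stand-in for the question's logic, assuming it boils down to "type each column from its first non-empty value"; the helper name `first_value_types` and the in-memory `SAMPLE` string are made up for this illustration.

```python
import csv
import io

# The first sample file from above, held in memory instead of on disk.
SAMPLE = "ID,txt1,txt2,int1\n1,foo,123,3\n2,bar,abc,4\n"

def first_value_types(fin):
    """Type each column from its first non-empty value (hypothetical helper)."""
    reader = csv.DictReader(fin)
    types = {}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # Skip columns that are already decided and cells with no data.
            if name in types or not value:
                continue
            types[name] = "INTEGER" if value.isdigit() else "TEXT"
    return types

types = first_value_types(io.StringIO(SAMPLE))
print(types)
```

Here txt2 comes out as INTEGER because "123" is seen first and "abc" in the second row is never examined.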

You could investigate tweaking your code to look at more than the first data row. (Microsoft's own import routines often default to the first eight rows when attempting to auto-detect data types.) You could also just import all columns as text and then convert them later in SQL Server.
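As a rough sketch of the first suggestion, the type detection could scan every row and treat a column as INTEGER only if all of its non-empty values are digits. The function name `infer_col_types` is hypothetical, and falling back to TEXT for a column with no data at all is an assumption, not something the original code does (it raises an exception instead).

```python
import csv
import io

def infer_col_types(fin):
    """Return {column: "INTEGER" or "TEXT"} after scanning every data row."""
    reader = csv.DictReader(fin)
    fieldnames = reader.fieldnames or []
    # Start optimistic; demote a column as soon as one value is not all digits.
    is_int = {name: True for name in fieldnames}
    seen_value = {name: False for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            value = (row[name] or "").strip()
            if not value:
                continue  # empty cells do not influence the decision
            seen_value[name] = True
            if not value.isdigit():
                is_int[name] = False
    # Assumption: a column with no data at all falls back to TEXT.
    return {
        name: "INTEGER" if is_int[name] and seen_value[name] else "TEXT"
        for name in fieldnames
    }

sample = io.StringIO("ID,txt1,txt2,int1\n1,foo,123,3\n2,bar,abc,4\n")
col_types = infer_col_types(sample)
print(col_types)
```

Like the original, this relies on `str.isdigit()`, so negative numbers and decimals would still be classified as TEXT. It also reads the whole file once before the import starts, which may matter for very large files.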

However, given that there must be hundreds, if not thousands, of examples out there for importing CSV data to SQL Server, you should also consider doing a more exhaustive search for existing (debugged) code before you continue investing time and effort into "rolling your own solution".