2
votes

I have about 20 .csv files which are around 100-200mb each.

They each have about 100 columns.

90% of the columns of each file are the same; however, some files have more columns and some files have less columns.

I need to import all of these files into one table in a sql server 2008 database.

If the field does not exist, I need it to be created.

question: What should be the process with this import? How do I more efficiently and quickly import all of these files into one table in a database, and make sure that if a field does not exist, then it is created? Please also keep in mind that the same field might be in a different location. For example, CAR can be in field AB in one csv whereas the same field name (CAR) can be AC in the other csv file. The solution can be SQL or C# or both.

8
Would this be an ongoing thing? or a one time import? - John Hartsock
@JohnHartsock why do you ask? - Alex Gordon
Because it may be easier to create SSIS package or temporarily import the files to dumb tables then using TSQL import the data. - John Hartsock
for a one time just import each file using bcp. You should already have clear understanding of how many columns there are and how each column in the table maps to a field in each csv. You should also understand how the csv data denormalizes into relations. - Remus Rusanu

8 Answers

3
votes

You may choose a number of options 1. Use the DTS package 2. Try to produce one uniform CSV file, get the db table in sync with its columns and bulk insert it 3. Bulk insert every file to its own table, and after that merge the tables into the target table.

3
votes

I would recommend looking at the BCP program which comes with SQL Server and is intended to help with jobs just like this:

http://msdn.microsoft.com/en-us/library/aa337544.aspx

There are "format files" which allow you to specify which CSV columns go to which SQL columns.

If you are more inclined to use C#, have a look at the SqlBulkCopy class:

http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx

Also take a look at this SO thread, also about importing from CSV files into SQL Server:

SQL Bulk import from CSV

1
votes

I recommend writing a small c# application that reads each of the CSV file headers and stores a dictionary of the columns needed and either outputs a 'create table' statement or directly runs a create table operation on the database. Then you can use Sql Management Studio to load the 20 files individually using the import routine.

1
votes

Use SqlBulkCopy class in System.Data.SqlClient

It facilitates bulk data transfer. only catch it wont work with DataTime DB column

1
votes

Less of an answer and more of a direction, but here I go. The way I would do it is first enumerate the column names from both the CSV files and the DB, then make sure the ones from your CSV all exist in the destination.

Once you have validated and/or created all the columns, then you can do your bulk insert. Assuming you don't have multiple imports happening at the same time, you could cache the column names from the DB when you start the import, as they shouldn't be changing.

If you will have multiple imports running at the same time, then you will need to make sure you have a full table lock during the import, as race conditions could show up.

I do a lot of automated imports for SQL DBs, and I haven't ever seen what you asked, as it's an assumed requirement that one knows the data that is coming in to the DB. Not knowing columns ahead of time is typically a very bad thing, but it sounds like you have an exception to the rule.

1
votes

Roll your own.

Keep (or create) a runtime representation of the target table's columns in the database. Before importing each file, check to see if the column exists already. If it doesn't, run the appropriate ALTER statement. Then import the file.

The actual import process can and probably should be done by BCP or whatever Bulk protocol you have available. You will have to do some fancy kajiggering since the source data and destination align only logically, not physically. So you will need BCP format files.

1
votes

There are several possibilities that you have here.

  • You can use SSIS if it is available to you.
  • In Sql Server you can use SqlBulkCopy to bulk insert in a staging table where you will insert the whole .csv file and then use a stored procedure with possibly MERGE statement in it to place each row where it belongs or create a new one if it doesn't exist.
  • You can use C# code to read the files and write them using SqlBulkInsert or EntityDataReader
0
votes

For those data volumes, you should use an ETL. See this tutorial.

ETLs are designed for large amount of data manipulation