How can I use python to change the delimiter of a csv file while also stripping the fields of the new delimiter?

Question

I receive a well-formatted CSV file, with double quotes around text fields that contain commas.

Alas, I need to load it into SQL Server, which, as far as I know (please tell me if I am wrong here), cannot handle quote-enclosed fields that contain the delimiter.

So, I would like to write a Python script which will a) convert the file to pipe-delimited, and b) strip whatever pipes exist in the fields. My sense is that commas are more common in the data, so I'd like to keep them; I also have some numeric fields that might, at least in the future, contain commas.

Here is the code that I have to do a:

import csv
import sys

source_file=sys.argv[1]
good_file=sys.argv[2]
bad_file=sys.argv[3]

with open(source_file, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    with open(good_file, 'w') as new_file:
        csv_writer = csv.DictWriter(new_file, csv_reader.fieldnames, delimiter='|')
        headers = dict((n, n) for n in csv_reader.fieldnames)
        csv_writer.writerow(headers)
        for line in csv_reader:
            csv_writer.writerow(str.replace(line, '|', ' '))

How can I augment it to do b?

P.S. I am using Python 2.6, IIRC.

Could you add a few lines from the source CSV to your question and show your expected output for it? — Martin Evans

jamie jamie · Accepted Answer · 2018-11-06T03:45:12

SQL Server can load the type of file you describe. The file can certainly be loaded with an SSIS package, and it can also be loaded with the SQL Server bcp utility. I would not write a Python script for this; it introduces another technology into the mix when one is not needed (just IMHO). SQL Server is equipped to handle exactly what you want to do.

SSIS is pretty straightforward. For bcp, do not use the -t option (which specifies one field terminator for the entire file); use a format file instead. With a format file, you can customize each field's terminator. For the fields that are quoted, you'll want a custom terminator, for example one that includes the quote character. See this post, or many others like it, which detail how to use bcp with delimited files whose quoted fields hide delimiters that appear in the data.

SQL Server BCP Export where comma in SQL field
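
If you still want the Python route the question asks about, here is a minimal sketch of one way to do it. It assumes Python 3 rather than the 2.6 mentioned in the question, and the function name convert_to_pipe_delimited and the sample data are made up for illustration. It uses plain csv.reader/csv.writer instead of the DictReader/DictWriter pair, so the header row is treated like any other row, and it replaces pipes inside each field with a space before writing.

```python
import csv
import io


def convert_to_pipe_delimited(source, target, replacement=" "):
    """Read comma-delimited rows from `source` and write them to `target`
    pipe-delimited, replacing any pipe found inside a field.

    `source` and `target` are open text file objects (hypothetical helper,
    not part of the csv module).
    """
    reader = csv.reader(source)  # default dialect: comma delimiter, double quotes
    writer = csv.writer(target, delimiter="|", lineterminator="\n")
    for row in reader:
        # Each row is a list of strings, so str.replace works per field.
        writer.writerow([field.replace("|", replacement) for field in row])


if __name__ == "__main__":
    # Illustrative sample data, not taken from the asker's file.
    sample = io.StringIO('id,name,note\n1,"Smith, John",a|b\n')
    result = io.StringIO()
    convert_to_pipe_delimited(sample, result)
    print(result.getvalue())
```

With real files you would pass objects opened with open(path, newline=''), as the csv module documentation recommends. Note that the writer still quotes any field containing a double quote or a line break, so fields like that would need separate handling if the loader cannot deal with quotes at all.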
