90
votes

I am trying to create a text file in CSV format from a PyQt4 QTableWidget. I want to write the text with UTF-8 encoding because it contains special characters. I use the following code:

import codecs
...
myfile = codecs.open(filename, 'w','utf-8')
...
f = result.table.item(i,c).text()
myfile.write(f+";")

It works until a cell contains a special character. I also tried:

myfile = open(filename, 'w')
...
f = unicode(result.table.item(i,c).text(), "utf-8")

But it also stops when a special character appears. I have no idea what I am doing wrong.

"it salso tops"? What does that mean? What error do you get? What is your input?user1907906
The input is a PyQt4 QTableWidgetItem. The problem is that I don't get any error because the script is running as a plugin. – Martin
Then try to reproduce the problem outside of Qt. – user1907906
Found the solution. I had to write myfile.write(u"%s" % f + ";") – Martin
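
In other words, a minimal sketch of that fix (assuming item.text() returns a QString, as it does under PyQt4's default API, and that myfile was opened with codecs.open as in the question):

f = result.table.item(i, c).text()
myfile.write(u"%s" % f + u";")  # u"%s" % f converts the QString to a Python unicode object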

7 Answers

108
votes

It's very simple for Python 3.x (docs).

import csv

with open('output_file_name', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerow(['my_utf8_string'])  # writerow expects a sequence of fields, not a bare string

For Python 2.x, look here.
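
Applied to the question, a sketch of dumping a whole QTableWidget this way on Python 3 might look like the following (the function name and the empty-cell guard are assumptions; QTableWidget.item() returns None for cells that were never set):

import csv

def table_to_csv(table, filename):
    # Write every cell of a QTableWidget to a ;-separated UTF-8 CSV file
    with open(filename, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file, delimiter=';')
        for row in range(table.rowCount()):
            items = [table.item(row, col) for col in range(table.columnCount())]
            writer.writerow([item.text() if item is not None else '' for item in items])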

106
votes

From your shell run:

pip2 install unicodecsv

Then, presuming you're using Python's built-in csv module (unlike the original question), change import csv to import unicodecsv as csv in your code.
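
Presuming unicodecsv's writer takes the same arguments as csv.writer plus an encoding keyword (UTF-8 is its documented default), a minimal sketch on Python 2 would be:

import unicodecsv as csv

with open('output_file_name', 'wb') as csv_file:  # binary mode: unicodecsv writes encoded bytes
    writer = csv.writer(csv_file, delimiter=';', encoding='utf-8')
    writer.writerow([u'spécial', u'chäracters'])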

14
votes
5
votes

For me, the UnicodeWriter class from the Python 2 csv module documentation didn't really work, as it breaks the csv.writer.writerow() interface.

For example:

csv_writer = csv.writer(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)

works, while:

csv_writer = UnicodeWriter(csv_file)
row = ['The meaning', 42]
csv_writer.writerow(row)

will throw AttributeError: 'int' object has no attribute 'encode'.

As UnicodeWriter obviously expects all column values to be strings, we can convert the values ourselves and just use the default CSV module:

def to_utf8(lst):
    return [unicode(elem).encode('utf-8') for elem in lst]

...
csv_writer.writerow(to_utf8(row))

Or we can even monkey-patch csv_writer to add a write_utf8_row function; the exercise is left to the reader.
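
For what it's worth, the C-implemented writer object returned by csv.writer does not accept new attributes, so in practice the "monkey-patch" ends up as a thin wrapper; a sketch of that idea in Python 2 (the class and file names are made up):

import csv

class UnicodeCSVWriter(object):
    """Wraps csv.writer and encodes every cell to UTF-8 before writing."""
    def __init__(self, f, **kwargs):
        self.writer = csv.writer(f, **kwargs)

    def write_utf8_row(self, row):
        self.writer.writerow([unicode(elem).encode('utf-8') for elem in row])

with open('output.csv', 'wb') as csv_file:
    csv_writer = UnicodeCSVWriter(csv_file)
    csv_writer.write_utf8_row([u'The meaning', 42])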

2
votes

The examples in the Python documentation show how to write Unicode CSV files: http://docs.python.org/2/library/csv.html#examples

(can't copy the code here because it's protected by copyright)

0
votes

For Python 2 you can run this on your rows before calling csv_writer.writerows(rows). Note that it will NOT convert integers to UTF-8 strings; non-string values are passed through unchanged:

def encode_rows_to_utf8(rows):
    encoded_rows = []
    for row in rows:
        encoded_row = []
        for value in row:
            if isinstance(value, basestring):
                value = unicode(value).encode("utf-8")
            encoded_row.append(value)
        encoded_rows.append(encoded_row)
    return encoded_rows
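
The call site then becomes something like:

csv_writer.writerows(encode_rows_to_utf8(rows))
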
-1
votes

A very simple hack is to use the json module instead of csv. For example, instead of csv.writer, just do the following:

    import codecs
    import json

    fd = codecs.open(tempfilename, 'wb', 'utf-8')
    for c in whatever:
        fd.write(json.dumps(c)[1:-1])  # json.dumps produces ["a", ...]; strip the brackets
        fd.write('\n')
    fd.close()

Basically, given the list of fields in the correct order, the JSON-formatted string is essentially a CSV line without the [ and ] at the start and end. And json handles UTF-8 robustly in Python 2.x.
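
One caveat: by default json.dumps escapes non-ASCII characters as \uXXXX sequences, so the special characters do not appear literally in the output. If the values are already unicode objects, passing ensure_ascii=False keeps them as real UTF-8 text:

fd.write(json.dumps(c, ensure_ascii=False)[1:-1])  # assumes c contains unicode strings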