
Pandas' to_sql() with if_exists='replace' is setting the wrong charset for my table. To copy multiple CSVs into MySQL while ignoring row-level errors (such as duplicate key errors), I first read each CSV into a DataFrame, copy it into a temp table, and finally run an INSERT IGNORE from that temp table into the proper table.

However, when using if_exists='replace' to load the DataFrame into the temp table, the table is created with charset latin1 instead of utf8. This causes errors because the data contains Chinese characters. Is there a way to make sure the table (and the relevant columns) uses UTF-8? I also tried specifying the charset in the connection string, but that doesn't stop to_sql() from (re)creating the table as latin1.

Relevant code:

import pandas as pd
from sqlalchemy import create_engine
import configparser

#load configuration
config = configparser.ConfigParser()
config.read('config-local.ini')

# construct database address
db_url = 'mysql+mysqlconnector://' + config['mysql']['user'] + ':' + config['mysql']['passwd'] + '@' + \
         config['mysql']['host'] + ':' + config['mysql']['port'] + \
         '/' + config['mysql']['database'] + '?charset=utf8'

engine = create_engine(db_url)
local_filename = 'some-file.csv'
df = pd.read_csv(local_filename, header=None, encoding='utf8')
try:
    # write to database
    df.to_sql('TempTable', con=engine, if_exists='replace', index=False)
except Exception as e:
    print(e)
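
The INSERT IGNORE step described above (not shown in the snippet) is roughly this, with MainTable standing in for the real destination table:

from sqlalchemy import text

# Copy everything from the temp table into the real table, silently
# skipping rows that violate constraints such as duplicate keys.
# 'MainTable' is a placeholder for the actual target table name.
with engine.begin() as conn:
    conn.execute(text("INSERT IGNORE INTO MainTable SELECT * FROM TempTable"))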

1 Answer


Setting the default character set and collation at the database level solved the problem: tables created by to_sql() without an explicit charset inherit the database default, so the temp table is no longer created as latin1.
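
A minimal sketch of that fix, run once against the existing database (the database name is a placeholder for config['mysql']['database'], and utf8mb4/utf8mb4_unicode_ci are assumptions; plain utf8 also works but cannot store all Unicode characters):

from sqlalchemy import create_engine, text

# db_url built exactly as in the question
engine = create_engine(db_url)

# Change the database-level defaults so that tables created by
# to_sql(..., if_exists='replace') inherit a Unicode charset instead of latin1.
# 'my_database' is a placeholder for the actual database name.
with engine.begin() as conn:
    conn.execute(text(
        "ALTER DATABASE my_database "
        "CHARACTER SET = 'utf8mb4' COLLATE = 'utf8mb4_unicode_ci'"
    ))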