mysql - Python Pandas to_sql, how to create a table with a primary key?

Question

I would like to create a MySQL table with Pandas' to_sql function which has a primary key (it is usually kind of good to have a primary key in a mysql table) as so:

group_export.to_sql(con = db, name = config.table_group_export, if_exists = 'replace', flavor = 'mysql', index = False)

but this creates a table without any primary key, (or even without any index).

The documentation mentions the parameter 'index_label' which combined with the 'index' parameter could be used to create an index but doesn't mention any option for primary keys.

Documentation

@unutbu I think the index=True just ensures the index is written to the table and that it is an index in sql, and not yet a primary key — joris
Yes, index just uses the row number as an index which is not what I want. — patapouf_ai
For now, there is not yet support for specifying primary keys (it's on the feature wishlist). Possible workaround to first create the table, and then use the 'append' option in to_sql. To create the table, pd.io.sql.get_schema could be helpful to create the schema (that then can be adapted/executed to create the table) — joris
Thanks @joris, you're right index=True makes an index, but not a primary key. — unutbu
@joris, trying to add to existing table gives error: "NOT NULL constraint failed" for 'id INT PRIMARY KEY NOT NULL'. How to fill-in PRIMARY KEY? — Alexei Martianov

krvkir krvkir · Accepted Answer · 2015-06-25T08:29:44

Disclaimer: this answer is more experimental then practical, but maybe worth mention.

I found that class pandas.io.sql.SQLTable has named argument key and if you assign it the name of the field then this field becomes the primary key:

Unfortunately you can't just transfer this argument from DataFrame.to_sql() function. To use it you should:

create pandas.io.SQLDatabase instance

engine = sa.create_engine('postgresql:///somedb')
pandas_sql = pd.io.sql.pandasSQL_builder(engine, schema=None, flavor=None)

define function analoguous to pandas.io.SQLDatabase.to_sql() but with additional *kwargs argument which is passed to pandas.io.SQLTable object created inside it (i've just copied original to_sql() method and added *kwargs):

def to_sql_k(self, frame, name, if_exists='fail', index=True,
           index_label=None, schema=None, chunksize=None, dtype=None, **kwargs):
    if dtype is not None:
        from sqlalchemy.types import to_instance, TypeEngine
        for col, my_type in dtype.items():
            if not isinstance(to_instance(my_type), TypeEngine):
                raise ValueError('The type of %s is not a SQLAlchemy '
                                 'type ' % col)

    table = pd.io.sql.SQLTable(name, self, frame=frame, index=index,
                     if_exists=if_exists, index_label=index_label,
                     schema=schema, dtype=dtype, **kwargs)
    table.create()
    table.insert(chunksize)

call this function with your SQLDatabase instance and the dataframe you want to save

to_sql_k(pandas_sql, df2save, 'tmp',
        index=True, index_label='id', keys='id', if_exists='replace')

And we get something like

CREATE TABLE public.tmp
(
  id bigint NOT NULL DEFAULT nextval('tmp_id_seq'::regclass),
...
)

in the database.

PS You can of course monkey-patch DataFrame, io.SQLDatabase and io.to_sql() functions to use this workaround with convenience.

mysql - Python Pandas to_sql, how to create a table with a primary key?

4 Answers