I am currently trying to load the data from a csv file to a table in a database, insert any new entries and update old entries. The table has columns say (field1, field2, field3, field4, field5) with field1 as the primary key and the csv has values with only the columns (field1, field2, field3). What would be the best way to approach this? Currently I was thinking of approaching it as follows:
- Creating a table in tmp, say tmp_table
- Load csv in tmp_table using 'LOAD DATA'
Updating the old entries with a query like
UPDATE db.table1 t INNER JOIN tmp.tmp_table s ON (t.field1=s.field1) SET t.field1 = s.field1, t.field2 = s.field2, t.field3 = s.field3
Insert new entries
INSERT INTO db.table1(field1, field2, field3) SELECT t.field1, t.field2, t.field3 FROM tmp.tmp_table t LEFT JOIN db.table1 v ON (t.field1=v.field1) WHERE v.field1=NULL
Any comments on improving/optimizing/correcting the steps?