1
votes

I am currently trying to load the data from a csv file to a table in a database, insert any new entries and update old entries. The table has columns say (field1, field2, field3, field4, field5) with field1 as the primary key and the csv has values with only the columns (field1, field2, field3). What would be the best way to approach this? Currently I was thinking of approaching it as follows:

  • Creating a table in tmp, say tmp_table
  • Load csv in tmp_table using 'LOAD DATA'
  • Updating the old entries with a query like

    UPDATE db.table1 t INNER JOIN tmp.tmp_table s ON (t.field1=s.field1)
    SET t.field1 = s.field1, t.field2 = s.field2, t.field3 = s.field3
    
  • Insert new entries

    INSERT INTO db.table1(field1, field2, field3)
    SELECT t.field1, t.field2, t.field3 
    FROM tmp.tmp_table t 
    LEFT JOIN db.table1 v ON (t.field1=v.field1) 
    WHERE v.field1=NULL
    

Any comments on improving/optimizing/correcting the steps?

1

1 Answers

1
votes

Since you are using MySQL I suggest this non-general solution.

Include field4 and field5 in your tmp table.

After creating the tmp table update field4 and field5 with values for existing records in table1.

UPDATE tmp.tmp_table s INNER JOIN db.table1 t ON (t.field1=s.field1)
SET s.field4 = t.field4, s.field5 =t.field5

Then

REPLACE INTO db.table1 VALUES (field1, field2, field3, field4, field5)
SELECT t.field1, t.field2, t.field3, t.field4, t.field5 
FROM tmp.tmp_table t