2
votes

I have a MySQL query that performs batch INSERTs and uses ON DUPLICATE KEY UPDATE to update a row in case there's a unique key duplicate.

INSERT INTO table1
    (col1,col2,col3)
    VALUES 
    (val1,val2,val3),
    (val4,val5,val6),
    (val7,val8,val9),
    ...
    (valn,valx,valz)
    ON DUPLICATE KEY UPDATE
    col3 = VALUES(col3);

In other words, new rows are inserted unless there's a duplicate unique key, in which case col3 is updated.

When the query is finished, I would like to know how many rows were INSERTED as well as how many rows were UPDATED. Is this possible?

3

3 Answers

2
votes

No, there's no definitive way to tell from the rows_affected count. There's some corner cases where we can tell... if rows_affected is exactly twice the number of rows we attempted to insert, we know they were all updates. If the rows_affected count is zero, we know that no rows were inserted. If the rows_affected count is one, we know that one row was inserted. But aside from that, there are a lot of permutations.

It might be possible to craft BEFORE INSERT and BEFORE UPDATE trigger to increment user-defined variables. If we initialize the user-defined variables immediately before the INSERT ... ON DUPLICATE KEY UPDATE statement, we could combine use those variables to determine how many rows we attempted to insert, and how many of those rows caused a duplicate key exception. (MySQL doesn't increment the rows_affected for an UPDATE action that causes no actual update to the row.)


EDIT

If you have a guarantee that an UPDATE action will cause an actual change to the row... if you are changing the value of at least one column on each row, for every row that is changed... and if you have a count of the actual number of rows you are attempting to insert, then you could determine from the rows_affected count how many rows were inserted, and how many rows were updated.

The INSERT ... ON DUPLICATE KEY can cause the same row to be inserted and updated, and/or cause the same row to be updated multiple times.

Did you want a count of the number of "update" operations, including updates to the same row, or did you want a count of the number of rows in the table that got updated?

2
votes

expanding on the hakkikonu's Answer, and read it first or this will make no sense if it does at all ...

and agreeing with @spencer7593 's comments, such things as "concurrency killing (CK) operations with locks", and the need to fix the formula for determining the update count in hak's Answer

i see no way of getting accurate insert and update counts without CK. Throwing in AFTER triggers certainly doesn't help solve it without CK, "alone and at the same time being accurate".

were one to have the nullable table1.blabla column only for use with batches against table1, regardless of the frequency of such batches. if a batch is not running against table1, blabla is guaranteed to be null even if the column is not dropped. it is obvious how.

i believe you can get insert and update counts accurately. here is how and based on your Insert statement.

table1 has write lock given exclusive to the batch code. let's assume the MyISAM storage engine. hey why not, we are making assumptions here. blabla column shows null 'inserted' and 'updated' based on your statement (barely different that what Hakkikonu suggested). You have your counts.

Concerning what spencer wrote in his Answer about updates and or inserts happening more than once for a given row based on your Question's Insert statement, I don't see it that way. Unless your batch data has duplicate keys presented in which case what does accuracy matter anyway.

Either the row is there or not to begin with based on whatever threw the ON DUPLICATE KEY. If it threw it, it is an update, if didn't, insert. Someone correct me.

Then at end the alter table drop blabla is performed or update to null. lock released.

so i guess how important is update and insert counts, the size of the table, and the frequency of batches.

1
votes

Add a new column something like blabla and give it null as default value. I assumed, you will use this only once.

Then

ON DUPLICATE KEY UPDATE
    col3   = VALUES(col3),
    blabla = 'up' ;

SELECT count(blabla) as allrows FROM table1;                      # returns all rows count
SELECT count(blabla) as updrows FROM table1 WHERE blabla = 'up';  # returns update count
SELECT count(blabla) as insrows FROM table1 WHERE blabla IS NULL; # returns inserts