I have a table in my database to store user data. I found a defect in the code that adds data to this table database where if a network timeout occurs, the code updated the next user's data with the previous user's data. I've addressed this defect but I need to clean the database. I've added a flag to indicate the rows that need to be ignored and my goal is to mark these flags accordingly for duplicates. In some cases, though, duplicate values may actually be legitimate so I am more interested in finding several user's with the same data (i.e, u> 2).
Here's an example (tablename = Data):
id---- user_id----data1----data2----data3----datetime-----------flag
1-----usr1--------3---------- 2---------2---------2012-02-16..-----0
2-----usr2--------3---------- 2---------2---------2012-02-16..-----0
3-----usr3--------3---------- 2---------2---------2012-02-16..-----0
In this case, I'd like to mark the 1 and 2 id flags as 1 (to indicate ignore). Since we know usr1 was the original datapoint (assuming the oldest dates are earlier in the list).
At this point there are so many entries in the table that I'm not sure the best way to identify the users that have duplicate entries.
I'm looking for a mysql command to identify the problem data first and then I'll be able to mark the entries. Could someone guide me in the right direction?