5
votes

I need SQL code to solve the table-merging problem described below:

Old data (table old):

    name     version    status    lastupdate      ID
    A        0.1        on        6/8/2010        1
    B        0.1        on        6/8/2010        2
    C        0.1        on        6/8/2010        3
    D        0.1        on        6/8/2010        4
    E        0.1        on        6/8/2010        5
    F        0.1        on        6/8/2010        6
    G        0.1        on        6/8/2010        7

New data (table new):

    name     version    status    lastupdate     ID         
    A        0.1        on        6/18/2010                
                                                           #B entry deleted
    C        0.3        on        6/18/2010                #version_updated
    C1       0.1        on        6/18/2010                #new_added
    D        0.1        on        6/18/2010                
    E        0.1        off       6/18/2010                #status_updated
    F        0.1        on        6/18/2010                
    G        0.1        on        6/18/2010                
    H        0.1        on        6/18/2010                #new_added
    H1       0.1        on        6/18/2010                #new_added

The differences between the new data and the old data:

B entry deleted

C entry version updated

E entry status updated

C1/H/H1 entry new added

What I want is to always keep the ID-name mapping from the old data table, no matter how the data changes later; in other words, each name always has a unique ID number bound to it.

If an entry has been updated, update its data; if an entry is newly added, insert it into the table and assign it a new unique ID. If an entry was deleted, delete it and do not reuse its ID later.

However, I can only use SQL with simple select or update statements, so writing such code may be too hard for me. I hope someone with expertise can give me some direction. There is no need to cover the differences between SQL variants; standard SQL sample code is enough.

Thanks in advance!

Rgs

KC

======== I have listed my draft SQL here, but I am not sure whether it works. Could someone with expertise please comment? Thanks!

1. Duplicate the old table as tmp to store the updates

create table tmp as select * from old

2. Update the rows in tmp where the "name" is the same in the old and new tables

update tmp where name in (select name from new)

3. Insert the "name" values that differ (old vs new) into tmp and assign new IDs

insert into tmp (name version status lastupdate ID) set idvar = max(select max(id) from tmp) + 1 select * from (select new.name new.version new.status new.lastupdate new.ID from old, new where old.name <> new.name)

4. delete the deleted entries from tmp table (such as B)

delete from tmp where (select ???)

7
You do not have the ID in the new data table? - tzup
Your sample output is not indicative of what you expect given your description. Is it the case that you want the IDs to remain sequential? - Thomas
In addition, what database product and version are you using? - Thomas
To tzup: the ID should be generated based on the old table, so before the new table is generated there is no ID for it. - K. C
@Registered User KC - You should update your tags to indicate it is MySQL, as that significantly changes the available solutions. Second, I'm still not clear on the problem. If a delete has taken place, there is no means to determine what was deleted unless it is logged somewhere or you are inspecting the row in a transaction before you delete it. Can you list out some more specific workflow? - Thomas

7 Answers

1
votes

You never mentioned what DBMS you are using, but if you are using SQL Server, one really good option is the SQL MERGE statement. See: http://www.mssqltips.com/tip.asp?tip=1704

The MERGE statement basically works as separate insert, update, and delete statements all within the same statement. You specify a "Source" record set and a "Target" table, and the join between the two. You then specify the type of data modification that is to occur when the records between the two data sets are matched or are not matched. MERGE is very useful, especially when it comes to loading data warehouse tables, which can be very large and require specific actions to be taken when rows are or are not present.

Example:

MERGE Products AS TARGET
USING UpdatedProducts AS SOURCE 
ON (TARGET.ProductID = SOURCE.ProductID) 
--When records are matched, update 
--the records if there is any change
WHEN MATCHED AND TARGET.ProductName <> SOURCE.ProductName 
OR TARGET.Rate <> SOURCE.Rate THEN 
UPDATE SET TARGET.ProductName = SOURCE.ProductName, 
TARGET.Rate = SOURCE.Rate 
--When no records are matched, insert
--the incoming records from source
--table to target table
WHEN NOT MATCHED BY TARGET THEN 
INSERT (ProductID, ProductName, Rate) 
VALUES (SOURCE.ProductID, SOURCE.ProductName, SOURCE.Rate)
--When there is a row that exists in target table and
--same record does not exist in source table
--then delete this record from target table
WHEN NOT MATCHED BY SOURCE THEN 
DELETE
--$action specifies a column of type nvarchar(10) 
--in the OUTPUT clause that returns one of three 
--values for each row: 'INSERT', 'UPDATE', or 'DELETE', 
--according to the action that was performed on that row
OUTPUT $action, 
DELETED.ProductID AS TargetProductID, 
DELETED.ProductName AS TargetProductName, 
DELETED.Rate AS TargetRate, 
INSERTED.ProductID AS SourceProductID, 
INSERTED.ProductName AS SourceProductName, 
INSERTED.Rate AS SourceRate; 
SELECT @@ROWCOUNT;
GO
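
For readers on SQLite rather than SQL Server, here is a rough analogue of the MERGE idea, not a translation of it. SQLite has no MERGE, so this sketch swaps in an upsert (INSERT ... ON CONFLICT DO UPDATE, which needs SQLite 3.24 or newer) followed by a DELETE. The table names old_t and new_t, the UNIQUE constraint on name, and the AUTOINCREMENT id column are all assumptions made for the illustration.

```python
import sqlite3

NEW_ROWS = [("A", "0.1", "on"), ("C", "0.3", "on"), ("C1", "0.1", "on"),
            ("D", "0.1", "on"), ("E", "0.1", "off"), ("F", "0.1", "on"),
            ("G", "0.1", "on"), ("H", "0.1", "on"), ("H1", "0.1", "on")]


def merge_like_sync(conn):
    """Upsert new_t into old_t, then drop names that vanished (hypothetical schema)."""
    cur = conn.cursor()
    # Matched names are updated in place, so they keep their id;
    # unmatched names are inserted and receive a fresh id.
    # "WHERE 1" avoids a parsing ambiguity between SELECT and ON CONFLICT.
    cur.execute("""
        INSERT INTO old_t (name, version, status, lastupdate)
        SELECT name, version, status, lastupdate FROM new_t WHERE 1
        ON CONFLICT(name) DO UPDATE SET
            version    = excluded.version,
            status     = excluded.status,
            lastupdate = excluded.lastupdate
    """)
    # Rows present in the target but missing from the source are deleted.
    cur.execute("DELETE FROM old_t WHERE name NOT IN (SELECT name FROM new_t)")
    conn.commit()


def merge_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE old_t (id INTEGER PRIMARY KEY AUTOINCREMENT,
                            name TEXT UNIQUE, version TEXT,
                            status TEXT, lastupdate TEXT);
        CREATE TABLE new_t (name TEXT, version TEXT, status TEXT, lastupdate TEXT);
    """)
    conn.executemany(
        "INSERT INTO old_t (name, version, status, lastupdate) "
        "VALUES (?, '0.1', 'on', '6/8/2010')", [(n,) for n in "ABCDEFG"])
    conn.executemany(
        "INSERT INTO new_t VALUES (?, ?, ?, '6/18/2010')", NEW_ROWS)
    merge_like_sync(conn)
    rows = conn.execute("SELECT name, id, version, status FROM old_t").fetchall()
    return {name: (id_, version, status) for name, id_, version, status in rows}
```

Calling merge_demo() returns a dict keyed by name. Names that were already in old_t keep their original id, B is gone, and the new names get ids above the previous maximum.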
1
votes

Let me start from the end:

In #4 you would delete all rows in tmp; what you wanted to say there is WHERE tmp.name NOT IN (SELECT name FROM new). Similarly, #3 is not correct syntax, but if it were, it would try to insert all rows.

Regarding #2, why not use auto increment on the ID?

Regarding #1, if your tmp table is the same as new the queries #2-#4 make no sense, unless you change (update, insert, delete) new table in some way.

But (!), if you do update the table new and it has an auto increment field on ID and if you are properly updating the table (using ID) from the application then your whole procedure is unnecessary (!).

So, the important thing is that you should not design the system to work like above.

To get the concept of updating data in the database from the application side take a look at examples here (php/mysql).

Also, to get the syntax correct on your queries go through the basic version of SET, INSERT, DELETE and SELECT commands (no way around this).
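
To make the remarks above concrete, here is a minimal sketch of the delete / update / insert sequence with an auto-increment ID, written against SQLite from Python. The table names old_t and new_t and the exact schema are assumptions for the illustration; other databases spell auto-increment differently. In SQLite, an INTEGER PRIMARY KEY AUTOINCREMENT column never hands out an ID that was used before, which matches the "do not reuse that ID" requirement.

```python
import sqlite3

NEW_ROWS = [("A", "0.1", "on"), ("C", "0.3", "on"), ("C1", "0.1", "on"),
            ("D", "0.1", "on"), ("E", "0.1", "off"), ("F", "0.1", "on"),
            ("G", "0.1", "on"), ("H", "0.1", "on"), ("H1", "0.1", "on")]


def sync(conn):
    """Apply new_t to old_t while keeping each name's id stable."""
    cur = conn.cursor()
    # Corrected form of step #4: remove names that no longer exist.
    cur.execute("DELETE FROM old_t WHERE name NOT IN (SELECT name FROM new_t)")
    # Step #2: refresh rows whose name exists in both tables.
    cur.execute("""
        UPDATE old_t SET
            version    = (SELECT n.version    FROM new_t n WHERE n.name = old_t.name),
            status     = (SELECT n.status     FROM new_t n WHERE n.name = old_t.name),
            lastupdate = (SELECT n.lastupdate FROM new_t n WHERE n.name = old_t.name)
        WHERE name IN (SELECT name FROM new_t)
    """)
    # Step #3: insert only the names that are new; the id is assigned
    # by AUTOINCREMENT instead of max(id) + 1.
    cur.execute("""
        INSERT INTO old_t (name, version, status, lastupdate)
        SELECT name, version, status, lastupdate FROM new_t
        WHERE name NOT IN (SELECT name FROM old_t)
        ORDER BY name
    """)
    conn.commit()


def sync_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE old_t (id INTEGER PRIMARY KEY AUTOINCREMENT,
                            name TEXT UNIQUE, version TEXT,
                            status TEXT, lastupdate TEXT);
        CREATE TABLE new_t (name TEXT, version TEXT, status TEXT, lastupdate TEXT);
    """)
    conn.executemany(
        "INSERT INTO old_t (name, version, status, lastupdate) "
        "VALUES (?, '0.1', 'on', '6/8/2010')", [(n,) for n in "ABCDEFG"])
    conn.executemany(
        "INSERT INTO new_t VALUES (?, ?, ?, '6/18/2010')", NEW_ROWS)
    sync(conn)
    return conn.execute(
        "SELECT id, name, version, status FROM old_t ORDER BY id").fetchall()
```

With the sample data from the question, sync_demo() keeps IDs 1 and 3 to 7 for the surviving names, drops B, and gives C1, H and H1 the IDs 8, 9 and 10.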

1
votes

Note - if you are concerned about performance you can skip this whole answer :-)

If you can redesign, have two tables: one with the data and the other with the name-ID linkage. Something like:

table_original

name     version    status    lastupdate
A        0.1        on        6/8/2010
B        0.1        on        6/8/2010
C        0.1        on        6/8/2010
D        0.1        on        6/8/2010
E        0.1        on        6/8/2010
F        0.1        on        6/8/2010
G        0.1        on        6/8/2010

and name_id

name     ID 
A        1 
B        2 
C        3 
D        4 
E        5 
F        6 
G        7

When you get table_new with the new set of data:

  1. TRUNCATE table_original
  2. INSERT INTO name_id (names from table_new not in name_id)
  3. copy table_new to table_original

Note : I think there's a bit of ambiguity about the deletion here

If the entry was deleted, delete the entry and do not reuse that ID later.

If name A gets deleted, and it turns up again in a later set of updates do you want to a. reuse the original ID tagged to A, or b. generate a new ID?

If it's b. you need a column Deleted? in name_id and a last step

4 . set Deleted? = Y where name not in table_original

and 2. would exclude Deleted? = Y records.

You could also do the same thing without the name_id table, based on the logic that the only thing you need from table_old is the name-ID links. Everything else you need is in table_new.
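
The two-table design described above can be sketched as follows, using SQLite from Python. This is only an illustration under assumptions: the AUTOINCREMENT id column and the UNIQUE constraint on name_id.name are my additions, SQLite has no TRUNCATE so a DELETE without WHERE stands in for step 1, and the sketch implements option a. (a name that comes back gets its old ID again).

```python
import sqlite3

NEW_ROWS = [("A", "0.1", "on"), ("C", "0.3", "on"), ("C1", "0.1", "on"),
            ("D", "0.1", "on"), ("E", "0.1", "off"), ("F", "0.1", "on"),
            ("G", "0.1", "on"), ("H", "0.1", "on"), ("H1", "0.1", "on")]


def load_new_snapshot(conn):
    """Replace table_original with table_new; name_id only ever grows."""
    cur = conn.cursor()
    # 1. Empty the data table (stand-in for TRUNCATE).
    cur.execute("DELETE FROM table_original")
    # 2. Register names that have never been seen before.
    cur.execute("""
        INSERT INTO name_id (name)
        SELECT name FROM table_new
        WHERE name NOT IN (SELECT name FROM name_id)
        ORDER BY name
    """)
    # 3. Copy the new snapshot into the data table.
    cur.execute("INSERT INTO table_original SELECT * FROM table_new")
    conn.commit()


def name_id_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE name_id (id INTEGER PRIMARY KEY AUTOINCREMENT,
                              name TEXT UNIQUE);
        CREATE TABLE table_original (name TEXT, version TEXT,
                                     status TEXT, lastupdate TEXT);
        CREATE TABLE table_new (name TEXT, version TEXT,
                                status TEXT, lastupdate TEXT);
    """)
    conn.executemany("INSERT INTO name_id (name) VALUES (?)",
                     [(n,) for n in "ABCDEFG"])
    conn.executemany(
        "INSERT INTO table_original VALUES (?, '0.1', 'on', '6/8/2010')",
        [(n,) for n in "ABCDEFG"])
    conn.executemany(
        "INSERT INTO table_new VALUES (?, ?, ?, '6/18/2010')", NEW_ROWS)
    load_new_snapshot(conn)
    # Current data joined with its permanent id.
    current = conn.execute("""
        SELECT t.name, m.id FROM table_original t
        JOIN name_id m ON m.name = t.name ORDER BY t.name
    """).fetchall()
    mapping = dict(conn.execute("SELECT name, id FROM name_id").fetchall())
    return current, mapping
```

After the load, the join shows only the names present in the new snapshot, while name_id still remembers that B was ID 2, so that ID is never handed to another name.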

1
votes

This works in Informix and gives exactly the display you require. Same or similar should work in MySQL, one would think. The trick here is to get the union of all names into a temp table and left join on that so that the values from the other two can be compared.

SELECT DISTINCT name FROM old
UNION
SELECT DISTINCT name FROM new
INTO TEMP _tmp;

SELECT 
  CASE WHEN b.name IS NULL THEN ''
       ELSE aa.name
       END AS name, 
  CASE WHEN b.version IS NULL THEN ''
       WHEN a.version = b.version THEN a.version 
       ELSE b.version
       END AS version,
  CASE WHEN a.status = b.status THEN a.status 
       WHEN b.status IS NULL THEN ''
       ELSE b.status
       END AS status,
  CASE WHEN a.lastupdate = b.lastupdate THEN a.lastupdate 
       WHEN b.lastupdate IS NULL THEN null
       ELSE b.lastupdate
       END AS lastupdate,
  CASE WHEN a.name IS NULL THEN '#new_added'
       WHEN b.name IS NULL THEN '#' || aa.name || ' entry deleted'
       WHEN a.version <> b.version THEN '#version_updated'
       WHEN a.status <> b.status THEN '#status_updated'
       ELSE ''
  END AS change
  FROM _tmp aa
  LEFT JOIN old a
         ON a.name = aa.name
  LEFT JOIN new b
         ON b.name = aa.name;
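
The same union-then-left-join trick can be tried in SQLite from Python. This is a reduced sketch rather than a port of the Informix query: it only produces the change label for each name, and the table names old_t, new_t and all_names as well as the label strings are assumptions chosen for the example.

```python
import sqlite3

NEW_ROWS = [("A", "0.1", "on"), ("C", "0.3", "on"), ("C1", "0.1", "on"),
            ("D", "0.1", "on"), ("E", "0.1", "off"), ("F", "0.1", "on"),
            ("G", "0.1", "on"), ("H", "0.1", "on"), ("H1", "0.1", "on")]

DIFF_SQL = """
SELECT aa.name,
       CASE WHEN a.name IS NULL THEN 'new_added'
            WHEN b.name IS NULL THEN 'deleted'
            WHEN a.version <> b.version THEN 'version_updated'
            WHEN a.status  <> b.status  THEN 'status_updated'
            ELSE ''
       END AS change_kind
FROM all_names aa
LEFT JOIN old_t a ON a.name = aa.name
LEFT JOIN new_t b ON b.name = aa.name
ORDER BY aa.name
"""


def classify_changes(conn):
    """Label every name found in either table with the kind of change."""
    # UNION removes duplicates, so each name appears exactly once.
    conn.execute("CREATE TEMP TABLE all_names AS "
                 "SELECT name FROM old_t UNION SELECT name FROM new_t")
    return conn.execute(DIFF_SQL).fetchall()


def diff_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE old_t (name TEXT, version TEXT, status TEXT, lastupdate TEXT);
        CREATE TABLE new_t (name TEXT, version TEXT, status TEXT, lastupdate TEXT);
    """)
    conn.executemany(
        "INSERT INTO old_t VALUES (?, '0.1', 'on', '6/8/2010')",
        [(n,) for n in "ABCDEFG"])
    conn.executemany(
        "INSERT INTO new_t VALUES (?, ?, ?, '6/18/2010')", NEW_ROWS)
    return classify_changes(conn)
```

For the question's sample data, diff_demo() labels B as deleted, C as version_updated, E as status_updated, C1, H and H1 as new_added, and leaves the label empty for unchanged names.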
0
votes

A draft approach; I have no idea whether it works:

CREATE TRIGGER auto_next_id AFTER INSERT ON table FOR EACH ROW BEGIN UPDATE table SET uid = max(uid) + 1 ; END;

0
votes

If I understood correctly what you need, based on the comments in the two tables, I think you can simplify your problem a lot by not merging or updating the old table at all. What you need is table new, with the IDs from table old where they exist and new IDs where they do not, right?

New records: table new has the new records already - OK (but they need a new ID)

Deleted records: they are not in table new - OK

Updated records: already updated in table new - OK (need to copy ID from table old)

Unmodified records: already in table new - OK (need to copy ID from table old)

So the only thing you need to do is to: (a) copy the IDs from table old to table new when they exist (b) create new IDs in table new when they do not exist in table old (c) copy table new to table old.

(a) UPDATE new SET ID = IFNULL((SELECT ID FROM old WHERE new.name = old.name),0);

(b) UPDATE new SET ID = FUNCTION_TO_GENERATE_ID(new.name) WHERE ID = 0;

(c) Drop table old; CREATE TABLE old (select * from new);

As I don't know which SQL database you are using, in (b) you can use an SQL function to generate the unique ID, depending on the database. With SQL Server, newid() works. With PostgreSQL, now() is tempting, but it returns the transaction start time, so every row updated in the same statement would get the same value; a sequence read with nextval() is a safer choice. Timestamps are not a good fit in MySQL either, as I think the precision is limited to seconds.

Edit: Sorry, I hadn't seen you're using sqlite and python. In this case you can use str(uuid.uuid4()) function (uuid module) in python to generate the uuid and fill the ID in new table where ID = 0 in step (b). This way you'll be able to join 2 independent databases if needed without conflicts on the IDs.
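
Steps (a) to (c) with the uuid suggestion can be sketched like this in Python with sqlite3. The table names old_t and new_t are assumptions, and the id column of new_t is deliberately left untyped so it can hold both the old integer IDs and the new UUID strings. In SQLite, CREATE TABLE ... AS SELECT replaces the MySQL-style form used in (c).

```python
import sqlite3
import uuid

NEW_ROWS = [("A", "0.1", "on"), ("C", "0.3", "on"), ("C1", "0.1", "on"),
            ("D", "0.1", "on"), ("E", "0.1", "off"), ("F", "0.1", "on"),
            ("G", "0.1", "on"), ("H", "0.1", "on"), ("H1", "0.1", "on")]


def carry_ids_over(conn):
    """Copy known ids into new_t, mint UUIDs for the rest, then swap tables."""
    cur = conn.cursor()
    # (a) Copy the id from old_t where the name already exists, else 0.
    cur.execute("""
        UPDATE new_t SET id = IFNULL(
            (SELECT o.id FROM old_t o WHERE o.name = new_t.name), 0)
    """)
    # (b) Give every still-unassigned row a freshly generated UUID string.
    pending = [row[0] for row in
               cur.execute("SELECT name FROM new_t WHERE id = 0").fetchall()]
    for name in pending:
        cur.execute("UPDATE new_t SET id = ? WHERE name = ?",
                    (str(uuid.uuid4()), name))
    # (c) Replace old_t with the completed new_t.
    cur.execute("DROP TABLE old_t")
    cur.execute("CREATE TABLE old_t AS SELECT * FROM new_t")
    conn.commit()


def uuid_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE old_t (name TEXT, version TEXT, status TEXT,
                            lastupdate TEXT, id INTEGER);
        CREATE TABLE new_t (name TEXT, version TEXT, status TEXT,
                            lastupdate TEXT, id);
    """)
    conn.executemany(
        "INSERT INTO old_t VALUES (?, '0.1', 'on', '6/8/2010', ?)",
        [(n, i) for i, n in enumerate("ABCDEFG", start=1)])
    conn.executemany(
        "INSERT INTO new_t (name, version, status, lastupdate) "
        "VALUES (?, ?, ?, '6/18/2010')", NEW_ROWS)
    carry_ids_over(conn)
    return dict(conn.execute("SELECT name, id FROM old_t").fetchall())
```

After uuid_demo(), the surviving names still map to their integer IDs, while C1, H and H1 map to three distinct UUID strings. Mixing integers and strings in one column works in SQLite but may not in stricter databases, so a real schema would probably pick one ID type.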

0
votes

Why don't you use a UUID for this? Generate it once for a plug-in, and incorporate/keep it into the plug-in, not into the DB. Now that you mention python, here's how to generate it:

import uuid
UID = str(uuid.uuid4()) # this will yield new UUID string

Sure, it does not strictly guarantee global uniqueness, but the chance of getting the same string twice in your project is extremely low.