I've been thinking about this for a while now and can only come up with two ways to do it. Both can work fully transparently when crafted into an abstract data layer / model.
By the way, there is an implementation for "versionable" table data in the ORM mapper Doctrine. See this example in their docs. Maybe that fits your needs, but it doesn't fit mine: it seems to delete all history data when the original record is deleted, which makes it not really revision-safe.
Option A: have a copy of each table to hold revision data
Let's say you have a simple contact table:
CREATE TABLE contact (
    id INT NOT NULL auto_increment,
    name VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255),
    PRIMARY KEY (id)
)
You would create a copy of that table and add revision data:
CREATE TABLE contact_revisions (
    id INT NOT NULL,
    name VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255),
    revision_id INT NOT NULL auto_increment,
    type ENUM('INSERT', 'UPDATE', 'DELETE') NOT NULL,
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (revision_id)
)
Keep track of INSERT and UPDATE using AFTER triggers. On each new data revision in the original table, insert a copy of the new data into the revision table and set the modification type accordingly.
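A minimal sketch of such triggers in MySQL might look like this (the trigger names are my own invention; change_time fills itself through its DEFAULT):

CREATE TRIGGER contact_after_insert AFTER INSERT ON contact
FOR EACH ROW
    -- log the freshly inserted values
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (NEW.id, NEW.name, NEW.firstname, NEW.lastname, 'INSERT');

CREATE TRIGGER contact_after_update AFTER UPDATE ON contact
FOR EACH ROW
    -- log the new values after the update
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (NEW.id, NEW.name, NEW.firstname, NEW.lastname, 'UPDATE');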
To log a DELETE in a revision-safe way, you must also insert a new row in the history table! For this you should use a BEFORE DELETE trigger and store the latest values before they are deleted. Otherwise you would have to remove every NOT NULL constraint in the history table as well.
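A matching sketch for the delete case; note that it reads from OLD, because the row is about to disappear:

CREATE TRIGGER contact_before_delete BEFORE DELETE ON contact
FOR EACH ROW
    -- store the last known values before the row is gone
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (OLD.id, OLD.name, OLD.firstname, OLD.lastname, 'DELETE');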
Some important notes regarding this implementation:
- In the history table you must drop every UNIQUE KEY (here: the PRIMARY KEY) that you copied from the original table, because you will have the same key multiple times, once for each data revision.
- When you ALTER the schema and data of the original table in an update (e.g. a software update), you must ensure that the same schema and data corrections are applied to the history table and its data, too. Otherwise you will run into trouble when reverting to an older revision of a record set.
- In a real-world implementation you would want to know which user modified the data. For that to be revision-safe, a user record should never be deleted from the users table. You should just mark the account as disabled with a flag.
- Usually, a single user action involves more than one table. In a real-world implementation you would therefore also have to keep track of which changes across multiple tables belong to a single user transaction, and in which order, because you would want to revert all changes of one transaction together, in reverse order. That requires an additional revision table which keeps track of users and transactions and holds a loose relationship to all those individual revisions in the history tables; see the sketch below.
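Such an additional table pair might look like this (all names are made up for illustration; the table_name / revision_id pair is the "loose relationship", so no real foreign key is possible):

CREATE TABLE revision_transactions (
    transaction_id INT NOT NULL auto_increment,
    user_id INT NOT NULL,
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (transaction_id)
)

CREATE TABLE revision_transaction_entries (
    transaction_id INT NOT NULL,
    sequence_no INT NOT NULL,        -- revert in descending order
    table_name VARCHAR(64) NOT NULL, -- e.g. 'contact_revisions'
    revision_id INT NOT NULL,        -- revision_id within that history table
    PRIMARY KEY (transaction_id, sequence_no)
)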
Benefits:
- completely in the database, independent from application code (well, not when tracking user transactions is important; that requires some logic outside the scope of a single query)
- all data is kept in its original format, with no implicit type conversions
- good performance when searching the revisions
- easy rollback: just run a simple INSERT .. ON DUPLICATE KEY UPDATE .. statement on the original table, using the data from the revision you want to roll back to (see the example below)
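For the schema above, rolling a contact back to a given revision (here 123, as an example) could look like this; the AFTER triggers will then log the rollback itself as a new revision, which is exactly what you want:

INSERT INTO contact (id, name, firstname, lastname)
SELECT id, name, firstname, lastname
FROM contact_revisions
WHERE revision_id = 123
ON DUPLICATE KEY UPDATE
    name = VALUES(name),
    firstname = VALUES(firstname),
    lastname = VALUES(lastname);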
Drawbacks:
- Hard to implement manually.
- Hard (but not impossible) to automate when it comes to database migrations / application updates.
As already stated above, Doctrine's Versionable does something similar.
Option B: have a central change log table
Preface: this is bad practice, shown only to illustrate the alternative.
This approach relies heavily on application logic, which should be hidden in a data layer / model.
You have a central history table that keeps track of
- who did
- when
- an INSERT, UPDATE or DELETE
- of what data
- in which field
- of which table
As in the other approach, you may also want to track which individual data changes belong to a single user action / transaction, and in which order.
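A sketch of what such a central table might look like (all column names invented; note that old and new values must be squeezed into a generic text column, which is where the implicit type conversions listed under the drawbacks come from):

CREATE TABLE change_log (
    log_id INT NOT NULL auto_increment,
    user_id INT NOT NULL,                                      -- who
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when
    type ENUM('INSERT', 'UPDATE', 'DELETE') NOT NULL,          -- which operation
    table_name VARCHAR(64) NOT NULL,                           -- of which table
    row_id INT NOT NULL,                                       -- which record
    field_name VARCHAR(64) NOT NULL,                           -- in which field
    old_value TEXT,                                            -- what data, as text
    new_value TEXT,
    transaction_id INT,                                        -- groups changes of one user action
    sequence_no INT,                                           -- order within that action
    PRIMARY KEY (log_id)
)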
Benefits:
- no need to keep anything in sync with the original tables when adding fields to a table or creating a new table; it scales transparently
Drawbacks:
- bad practice: a simple key = value store in the database
- bad search performance because of implicit type conversions
- may slow down overall performance of the application/database when the central history table becomes a bottleneck because of write locks (this only applies to engines with table locks, e.g. MyISAM)
- rollbacks are much harder to implement
- possible data conversion errors / precision loss because of implicit type conversion
- does not track changes when you access the database directly somewhere in your code instead of going through your model / data layer and forget that you then have to write to the revision log manually; this can be a big issue when working in a team with other programmers
Conclusion:
- Option B can be very handy for small apps as a simple "drop-in" when it's just for logging changes.
- If you want to go back in time, easily compare the differences between, say, historic revision 123 and revision 125, and/or revert to the old data, then Option A is the way to go, hard as it may be.
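For completeness: with Option A, fetching two revisions for such a comparison is a trivial query on the history table; the field-by-field diff is then easy to do in application code:

SELECT revision_id, id, name, firstname, lastname, type, change_time
FROM contact_revisions
WHERE revision_id IN (123, 125)
ORDER BY revision_id;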