I've been thinking about this for a while now and can only come up with two ways to do it. Both can work fully transparently when crafted into an abstract data layer / model.
By the way, there is an implementation for "versionable" table data in the ORM mapper Doctrine. See this example in their docs. Maybe that fits your needs, but it doesn't fit mine: it seems to delete all history data when the original record is deleted, which makes it not really revision-safe.
Option A: have a copy of each table to hold revision data
Let's say you have a simple contact table:
CREATE TABLE contact (
    id INT NOT NULL auto_increment,
    name VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255),
    PRIMARY KEY (id)
)
You would create a copy of that table and add revision data:
CREATE TABLE contact_revisions (
    id INT NOT NULL,
    name VARCHAR(255),
    firstname VARCHAR(255),
    lastname VARCHAR(255),
    revision_id INT NOT NULL auto_increment,
    type ENUM('INSERT', 'UPDATE', 'DELETE') NOT NULL,
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (revision_id)
)
Keep track of INSERT and UPDATE using AFTER triggers. On each new data revision in the original table, insert a copy of the new data into the revision table and set the modification type accordingly.
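A minimal sketch of such triggers in MySQL might look like this (the trigger names are my own invention; change_time fills itself through its DEFAULT):

CREATE TRIGGER contact_after_insert AFTER INSERT ON contact
FOR EACH ROW
    -- log the freshly inserted values
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (NEW.id, NEW.name, NEW.firstname, NEW.lastname, 'INSERT');

CREATE TRIGGER contact_after_update AFTER UPDATE ON contact
FOR EACH ROW
    -- log the new values after the update
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (NEW.id, NEW.name, NEW.firstname, NEW.lastname, 'UPDATE');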
To log a DELETE in a revision-safe way, you must also insert a new row in the history table! For this you should use a BEFORE DELETE trigger and store the latest values before they are deleted. Otherwise you would have to remove every NOT NULL constraint in the history table as well.
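A matching sketch for the delete case; note that it reads from OLD, because the row is about to disappear:

CREATE TRIGGER contact_before_delete BEFORE DELETE ON contact
FOR EACH ROW
    -- store the last known values before the row is gone
    INSERT INTO contact_revisions (id, name, firstname, lastname, type)
    VALUES (OLD.id, OLD.name, OLD.firstname, OLD.lastname, 'DELETE');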
Some important notes regarding this implementation:
- In the history table you must drop every UNIQUE KEY (here: the PRIMARY KEY) that you copied from the original table, because you will have the same key multiple times, once for each data revision.
- When you ALTER the schema and data of the original table in an update (e.g. a software update), you must ensure that the same schema and data corrections are applied to the history table and its data, too. Otherwise you will run into trouble when reverting to an older revision of a record set.
- In a real-world implementation you would want to know which user modified the data. For that to be revision-safe, a user record should never be deleted from the users table. You should just mark the account as disabled with a flag.
- Usually, a single user action involves more than one table. In a real-world implementation you would therefore also have to keep track of which changes across multiple tables belong to a single user transaction, and in which order, because you would want to revert all changes of one transaction together, in reverse order. That requires an additional revision table which keeps track of users and transactions and holds a loose relationship to all those individual revisions in the history tables; see the sketch below.
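Such an additional table pair might look like this (all names are made up for illustration; the table_name / revision_id pair is the "loose relationship", so no real foreign key is possible):

CREATE TABLE revision_transactions (
    transaction_id INT NOT NULL auto_increment,
    user_id INT NOT NULL,
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (transaction_id)
)

CREATE TABLE revision_transaction_entries (
    transaction_id INT NOT NULL,
    sequence_no INT NOT NULL,        -- revert in descending order
    table_name VARCHAR(64) NOT NULL, -- e.g. 'contact_revisions'
    revision_id INT NOT NULL,        -- revision_id within that history table
    PRIMARY KEY (transaction_id, sequence_no)
)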
Benefits:
- completely in the database, independent from application code (well, not when tracking user transactions is important; that requires some logic outside the scope of a single query)
- all data is kept in its original format, with no implicit type conversions
- good performance when searching the revisions
- easy rollback: just run a simple INSERT .. ON DUPLICATE KEY UPDATE .. statement on the original table, using the data from the revision you want to roll back to (see the example below)
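For the schema above, rolling a contact back to a given revision (here 123, as an example) could look like this; the AFTER triggers will then log the rollback itself as a new revision, which is exactly what you want:

INSERT INTO contact (id, name, firstname, lastname)
SELECT id, name, firstname, lastname
FROM contact_revisions
WHERE revision_id = 123
ON DUPLICATE KEY UPDATE
    name = VALUES(name),
    firstname = VALUES(firstname),
    lastname = VALUES(lastname);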
Drawbacks:
- Hard to implement manually.
- Hard (but not impossible) to automate when it comes to database migrations / application updates.
As already stated above, Doctrine's Versionable does something similar.
Option B: have a central change log table
Preface: this is bad practice, shown only to illustrate the alternative.
This approach relies heavily on application logic, which should be hidden in a data layer / model.
You have a central history table that keeps track of
- who did
- when
- an INSERT, UPDATE or DELETE
- of what data
- in which field
- of which table
As in the other approach, you may also want to track which individual data changes belong to a single user action / transaction, and in which order.
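A sketch of what such a central table might look like (all column names invented; note that old and new values must be squeezed into a generic text column, which is where the implicit type conversions listed under the drawbacks come from):

CREATE TABLE change_log (
    log_id INT NOT NULL auto_increment,
    user_id INT NOT NULL,                                      -- who
    change_time TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- when
    type ENUM('INSERT', 'UPDATE', 'DELETE') NOT NULL,          -- which operation
    table_name VARCHAR(64) NOT NULL,                           -- of which table
    row_id INT NOT NULL,                                       -- which record
    field_name VARCHAR(64) NOT NULL,                           -- in which field
    old_value TEXT,                                            -- what data, as text
    new_value TEXT,
    transaction_id INT,                                        -- groups changes of one user action
    sequence_no INT,                                           -- order within that action
    PRIMARY KEY (log_id)
)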
Benefits:
- no need to keep anything in sync with the original tables when adding fields to a table or creating a new table; it scales transparently
Drawbacks:
- bad practice: a simple key = value store in the database
- bad search performance because of implicit type conversions
- may slow down overall performance of the application/database when the central history table becomes a bottleneck because of write locks (this only applies to engines with table locks, e.g. MyISAM)
- rollbacks are much harder to implement
- possible data conversion errors / precision loss because of implicit type conversion
- does not track changes when you access the database directly somewhere in your code instead of going through your model / data layer and forget that you then have to write to the revision log manually; this can be a big issue when working in a team with other programmers
Conclusion:
- Option B can be very handy for small apps as a simple "drop-in" when it's just for logging changes.
- If you want to go back in time, easily compare the differences between, say, historic revision 123 and revision 125, and/or revert to the old data, then Option A is the way to go, hard as it may be.
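For completeness: with Option A, fetching two revisions for such a comparison is a trivial query on the history table; the field-by-field diff is then easy to do in application code:

SELECT revision_id, id, name, firstname, lastname, type, change_time
FROM contact_revisions
WHERE revision_id IN (123, 125)
ORDER BY revision_id;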