I have a Laravel project (API for an iOS app) and am currently setting up a continuous integration server (Jenkins) to handle deployments to AWS. I'm using tools such as Capistrano, Packer and Terraform to accomplish this.
The app currently has two environments: Staging and Production.
However, I'm trying to find a good way to work with databases in this system.
Basically, I envision the pipeline being something like:
- Check out the code
- Run the tests
- Deploy Staging AMIs and stand up new infrastructure
- QA and deploy the AMIs to Production
However, between steps 3 and 4, I'd love to do a "dry run" of the production deployment: trying out the migrations while having access to the potentially large data set that production will have.
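For the migration half of that dry run, I'm picturing something like the following wrapper command (a rough sketch; the `deploy:migrate-dry-run` name is made up), which relies on Laravel's `--pretend` option to print the SQL that pending migrations would run without executing it:

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\Artisan;

// Hypothetical command: prints the SQL that pending migrations would run,
// without executing anything, so it can be pointed at a staging database
// holding a copy (or a seeded approximation) of production data.
class MigrateDryRun extends Command
{
    protected $signature = 'deploy:migrate-dry-run';

    protected $description = 'Show the SQL pending migrations would execute, without running them';

    public function handle()
    {
        // --pretend echoes the queries instead of running them;
        // --force skips the confirmation prompt in production-like environments.
        Artisan::call('migrate', ['--pretend' => true, '--force' => true]);

        $this->line(Artisan::output());
    }
}
```

In the pipeline this would just be one more `php artisan` step, pointed at a staging database that holds either a copy of production data or a seeded approximation of it.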
So I see 2 options:
1) When we're ready to QA, export the Production DB and import it into Staging, then run "the process" (migrations, Terraform, Packer, etc.). If all goes well, move to Production.
PROS:
- You get to try everything out on the actual production data set, so you have confidence things will work
- You get to work with large data sets and see whether there are any bottlenecks caused by record counts far larger than a typical staging environment would hold
CONS:
- Eventually, the production database could get very big, and exporting it daily, or several times a day, could become very slow
- For the same reason, this makes for very slow continuous integration
2) Instead of importing from Production, write configurable seeders for all the database models and run them as needed for QA (a rough sketch of one such seeder follows the cons below).
PROS:
- You can seed the database with small or very large data sets, depending on your needs for that particular deployment
- The seeders are just simple scripts and can be run very quickly
CONS:
- You have to keep the seeders up to date with any model changes you make.
- In general, this process seems more prone to human error than exporting the actual data set from Production.
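For reference, here's a rough sketch of what I mean by a configurable seeder, assuming a recent Laravel with class-based model factories; the `QaSeeder` name, the `QA_SEED_COUNT` variable, and the chunk size are just placeholders:

```php
<?php

namespace Database\Seeders;

use App\Models\User;
use Illuminate\Database\Seeder;

// Hypothetical seeder: the record count comes from an environment variable,
// so the same class can produce a small smoke-test data set or a
// production-sized one, depending on the deployment being QA'd.
class QaSeeder extends Seeder
{
    public function run()
    {
        $target = (int) env('QA_SEED_COUNT', 1000);
        $chunk  = 5000; // insert in chunks to keep memory usage flat for large counts

        for ($created = 0; $created < $target; $created += $chunk) {
            User::factory()
                ->count(min($chunk, $target - $created))
                ->create();
        }
    }
}
```

It could be run from the pipeline with `php artisan db:seed --class=QaSeeder`, with `QA_SEED_COUNT` set per deployment.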
How do people generally approach this process?