I have a Laravel project (API for an iOS app) and am currently setting up a continuous integration server (Jenkins) to handle deployments to AWS. I'm using tools such as Capistrano, Packer and Terraform to accomplish this.
The app currently has two environments: Staging and Production.
However, I'm trying to find a good way to work with databases in this system.
Basically, I envision the pipeline being something like:
- Check out the code
- Run the tests
- Deploy Staging AMIs and stand up new infrastructure
- QA and deploy the AMIs to Production
However, between steps 3 and 4, I'd love to do a "dry run" of the production deployment: trying out the migrations while having access to the potentially large data set that production will have.
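For the migration half of that dry run, I'm picturing something like the following wrapper command (a rough sketch; the `deploy:migrate-dry-run` name is made up), which relies on Laravel's `--pretend` option to print the SQL that pending migrations would run without executing it:

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;
use Illuminate\Support\Facades\Artisan;

// Hypothetical command: prints the SQL that pending migrations would run,
// without executing anything, so it can be pointed at a staging database
// holding a copy (or a seeded approximation) of production data.
class MigrateDryRun extends Command
{
    protected $signature = 'deploy:migrate-dry-run';

    protected $description = 'Show the SQL pending migrations would execute, without running them';

    public function handle()
    {
        // --pretend echoes the queries instead of running them;
        // --force skips the confirmation prompt in production-like environments.
        Artisan::call('migrate', ['--pretend' => true, '--force' => true]);

        $this->line(Artisan::output());
    }
}
```

In the pipeline this would just be one more `php artisan` step, pointed at a staging database that holds either a copy of production data or a seeded approximation of it.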
So I see 2 options:
1) When we're ready to QA, export the Production DB and import it into Staging, then run "the process" (migrations, Terraform, Packer, etc.). If all goes well, move to Production.
PROS:
- You get to try everything out on the actual production data set, so you have confidence things will work
- You get to work with large data sets and see whether there are any bottlenecks caused by record counts far larger than a typical staging environment would hold
CONS:
- Eventually, the production database could get very big, and exporting it daily, or several times a day, could become very slow
- For the same reason, this makes for very slow continuous integration
2) Instead of importing from Production, write configurable seeders for all the database models and run them as needed for QA (a rough sketch of one such seeder follows the cons below).
PROS:
- You can seed the database with small or very large data sets, depending on your needs for that particular deployment
- The seeders are just simple scripts and can be run very quickly
CONS:
- You have to keep the seeders up to date with any model changes you make.
- In general, this process seems more prone to human error than exporting the actual data set from Production.
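For reference, here's a rough sketch of what I mean by a configurable seeder, assuming a recent Laravel with class-based model factories; the `QaSeeder` name, the `QA_SEED_COUNT` variable, and the chunk size are just placeholders:

```php
<?php

namespace Database\Seeders;

use App\Models\User;
use Illuminate\Database\Seeder;

// Hypothetical seeder: the record count comes from an environment variable,
// so the same class can produce a small smoke-test data set or a
// production-sized one, depending on the deployment being QA'd.
class QaSeeder extends Seeder
{
    public function run()
    {
        $target = (int) env('QA_SEED_COUNT', 1000);
        $chunk  = 5000; // insert in chunks to keep memory usage flat for large counts

        for ($created = 0; $created < $target; $created += $chunk) {
            User::factory()
                ->count(min($chunk, $target - $created))
                ->create();
        }
    }
}
```

It could be run from the pipeline with `php artisan db:seed --class=QaSeeder`, with `QA_SEED_COUNT` set per deployment.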
How do people generally approach this process?