2
votes

Background: I run end-to-end tests that depend on a PostgreSQL database. I need these tests to be fast, parallelizable, deterministic, and isolated.

Therefore, I want to repeatedly recreate the database -- including schema, data, and indexes -- as quickly as possible. I create a base image:

  1. start a PostgreSQL instance
  2. run SQL statements
  3. run VACCUM FULL
  4. stop the instance
  5. tar the files in /var/lib/postgresql/data

For each test I then quickly untar that image and run a PostgreSQL instance.


Problem: This all works, but the disk files seem larger than necessary. A rather small database is still 64MB in size.

How can I achieve a smaller set of files for file-level restores? Shrink the size of the existing files? Exclude some files from the backup?

1
Why not instantiate them from a customised template?wildplasser
@wildplasser, I suppose could do that, if I reused the same PostgreSQL instance.Paul Draper
If that is not what you want, you could remove all databases except template0 and postgres. It won't get much smaller than that.Laurenz Albe
Have you considered docker? It won't save you space, but it's simple and fast.teppic
If you look into the data directory, there are also thw WAL-files. You can shrink that, but it will likely cost you performance. And yes, what is your goal? Why do you pack and unpack the data directory?A. Scherbaum

1 Answers

0
votes

Thank you for making plain your use case:

I run end-to-end tests that depend on a PostgreSQL database. I need these tests to be fast, parallelizable, deterministic, and isolated.

Therefore, I want to repeatedly recreate the database -- including schema, data, and indexes -- as quickly as possible.

The initial idea – to locate a specific set of files by path, and restore those – is fragile because the paths can and do change with different PostgreSQL versions.

As discussed in comments, a more deterministic way to do this is with more general container technologies; one that is well trodden and suitable for the use case is Docker. There are various other container options also.

The documentation for Docker has an example of how to set up a container running PostgreSQL.