I’m trying to set up daily backups (using Persistent Disk snapshots) for a PostgreSQL instance I’m running on Google Compute Engine and whose data directory lives on a Persistent Disk.
Now, according to the Persistent Disk Backups blog post, I should:
- stop my application (PostgreSQL)
- `fsfreeze` my file system to prevent further modifications and flush pending blocks to disk
- take a Persistent Disk snapshot
- unfreeze my filesystem
- start my application (PostgreSQL)
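Put together, that procedure would look roughly like the script below (the mount point, disk name and zone are placeholders for my actual setup, and I'm assuming the `gcloud` CLI is available on the instance):

```bash
#!/usr/bin/env bash
# Rough sketch of the blog post's procedure; all names are placeholders.
set -euo pipefail

MOUNT_POINT=/var/lib/postgresql        # Persistent Disk holding the data directory
DISK=pg-data                           # name of that Persistent Disk
ZONE=us-central1-a
SNAPSHOT="pg-data-$(date +%Y%m%d-%H%M%S)"

sudo systemctl stop postgresql         # stop the application
sudo fsfreeze -f "$MOUNT_POINT"        # freeze the filesystem and flush pending blocks

# take the Persistent Disk snapshot while the filesystem is frozen
gcloud compute disks snapshot "$DISK" --snapshot-names "$SNAPSHOT" --zone "$ZONE"

sudo fsfreeze -u "$MOUNT_POINT"        # unfreeze the filesystem
sudo systemctl start postgresql        # start the application again
```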
This obviously brings with it some downtime (each of the steps took from seconds to minutes in my tests) that I’d like to avoid or at least minimize.
The steps of the blog post are labeled as necessary to ensure the snapshot is consistent (I'm assuming on the filesystem level), but I'm not interested in a clean filesystem per se; I'm interested in being able to restore all the data that's in my PostgreSQL instance from such a snapshot.
PostgreSQL calls `fsync` when committing, so all data which PostgreSQL acknowledges as committed has already made its way to the disk (`fsync` writes go all the way to the disk).
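For what it's worth, the durability-related settings that matter here can be checked like this (assuming local access as the `postgres` OS user):

```bash
# Check the durability-related settings (all of these default to "on").
sudo -u postgres psql -c "SHOW fsync;"              # WAL is flushed to disk with fsync()
sudo -u postgres psql -c "SHOW synchronous_commit;" # commits wait for that flush
sudo -u postgres psql -c "SHOW full_page_writes;"   # protects against torn page writes
```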
For the purpose of this discussion, I think it makes sense to compare a Persistent Disk snapshot taken without stopping PostgreSQL and without using `fsfreeze` to the filesystem on a disk that has just experienced an unexpected power outage.
After reading https://wiki.postgresql.org/wiki/Corruption and http://www.postgresql.org/docs/current/static/wal-reliability.html, my understanding is that all committed data should survive an unexpected power outage.
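To make the comparison concrete, the restore path I'd be relying on would look roughly like this (snapshot, disk, instance and mount-point names are placeholders); on first startup PostgreSQL should go through the same crash recovery (WAL replay) it would after a power outage:

```bash
set -euo pipefail

SNAPSHOT=pg-data-20140101-000000       # placeholder snapshot name
NEW_DISK=pg-data-restored
INSTANCE=pg-restore-test
ZONE=us-central1-a

# Create a new Persistent Disk from the snapshot and attach it to a test instance.
gcloud compute disks create "$NEW_DISK" --source-snapshot "$SNAPSHOT" --zone "$ZONE"
gcloud compute instances attach-disk "$INSTANCE" \
    --disk "$NEW_DISK" --device-name "$NEW_DISK" --zone "$ZONE"

# On the test instance: mount the disk and start PostgreSQL. If the snapshot
# was taken without fsfreeze, PostgreSQL performs crash recovery (WAL replay)
# on startup, just as it would after a power outage.
sudo mount "/dev/disk/by-id/google-$NEW_DISK" /var/lib/postgresql
sudo systemctl start postgresql
```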
My questions are:
- Is my comparison with an unexpected power outage accurate, or am I missing anything?
- Can I take snapshots without stopping PostgreSQL and without using `fsfreeze`, or am I missing some side effect?
- If the answer to the above is that I shouldn't just take a snapshot, would it be idiomatic to create another Persistent Disk, periodically use `pg_dumpall(1)` to dump the entire database, and then snapshot that other Persistent Disk?
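For reference, the `pg_dumpall` alternative I have in mind would be something like the following, run from cron (mount point, disk name and zone are again placeholders):

```bash
set -euo pipefail

DUMP_DIR=/mnt/pg-dumps                 # second Persistent Disk mounted here
DUMP_DISK=pg-dumps                     # name of that second Persistent Disk
ZONE=us-central1-a
STAMP=$(date +%Y%m%d-%H%M%S)

# pg_dumpall takes a consistent logical backup without stopping PostgreSQL.
sudo -u postgres pg_dumpall | gzip > "$DUMP_DIR/dumpall-$STAMP.sql.gz"
sync                                   # make sure the dump has hit the disk

# Snapshot only the disk holding the dumps.
gcloud compute disks snapshot "$DUMP_DISK" --snapshot-names "pg-dumps-$STAMP" --zone "$ZONE"
```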