1
votes

Enviroment: 800GB Postgres-Database (OpenSuse)

Normal Restore-Process:

  • You have pg_basebackup to restore (from let's say: every Saturday)
  • You have WAL files from last Saturday to today
  • First: Restore with pg_basebackup
  • Then: Update database with WAL-files to have newest data. (with recovery.conf)

My Idea:
Why do every week big pg_basebackup and copy 800GB over Internet to NAS, when you have everyday incremental Backups with some backup-software.

  • Restore complete database-vm (stand yesterday)
  • Add WAL-files (restore) to bring this vm-clone up to date.

Now i've done:

  • I restored a vm
  • create recovery.conf

    restore_command = 'cp /.../%f %p'

  • rcpostgresql start

I get following errors:

2017-05-09 16:46:07.780 CEST [2938]: [1-1] user=,db=,app=,client= LOG:  database system was shut down at 2017-05-09 16:45:47 CEST
2017-05-09 16:46:07.780 CEST [2938]: [2-1] user=,db=,app=,client= LOG:  starting archive recovery
2017-05-09 16:46:08.588 CEST [2952]: [1-1] user=[unknown],db=[unknown],app=[unknown],client=[local] LOG:  connection received: host=[local]
2017-05-09 16:46:08.588 CEST [2952]: [2-1] user=postgres,db=postgres,app=[unknown],client=[local] FATAL:  the database system is starting up
2017-05-09 16:46:09.391 CEST [2938]: [3-1] user=,db=,app=,client= LOG:  restored log file "000000010000070D0000008A" from archive
2017-05-09 16:46:09.434 CEST [2938]: [4-1] user=,db=,app=,client= LOG:  contrecord is requested by 70D/8A000028
2017-05-09 16:46:09.434 CEST [2938]: [5-1] user=,db=,app=,client= LOG:  invalid primary checkpoint record
2017-05-09 16:46:09.434 CEST [2938]: [6-1] user=,db=,app=,client= LOG:  invalid secondary checkpoint link in control file
2017-05-09 16:46:09.434 CEST [2938]: [7-1] user=,db=,app=,client= PANIC:  could not locate a valid checkpoint record
2017-05-09 16:46:09.434 CEST [2936]: [4-1] user=,db=,app=,client= LOG:  startup process (PID 2938) was terminated by signal 6: Aborted
2017-05-09 16:46:09.434 CEST [2936]: [5-1] user=,db=,app=,client= LOG:  aborting startup due to startup process failure

After pg_resetxlog the next WAL-file was restored. and i get same error (with next wal-file-name)

Is there any way to get this working?

2
So long as you are calling pg_start_backup() and pg_stop_backup(), your incremental backup covers all database files and you have all the WAL files from pg_start_backup onwards this should work.Richard Huxton

2 Answers

0
votes

From your error I assume you skipped pg_start_backup. Otherwise you should have missing checkpoint:

pg_start_backup accepts an arbitrary user-defined label for the backup. (Typically this would be the name under which the backup dump file will be stored.) When used in exclusive mode, the function writes a backup label file (backup_label) and, if there are any links in the pg_tblspc/ directory, a tablespace map file (tablespace_map) into the database cluster's data directory, performs a checkpoint, and then returns the backup's starting transaction log location as text.

Following the logic the sequence should be this:

  • backup:

    1. the day before - just before VM copy, run select pg_start_backup('some label') (make sure it returned location - it can take long time to make a savepoint, or force fast creation at IO spike price)
    2. VM backup
    3. select pg_stop_backup()
  • restore:
    1. I restored a vm
    2. create recovery.conf with restore_command = 'cp /.../%f %p'
    3. rcpostgresql start
    4. let people know if it worked

Also yo might want to read about the pg_control, chechpoints and recover sequence here.

0
votes

After a few days, I was able to get this working. @Vao Tsun 's help bring me into right direction, but sadly wasn't necessary.

How you can restore Postgres-Database with WAL-Files and complete VM Backup | Restore:

  • backup:
    • [ Maybe create a new postgres checkpoint. for me it was not necessary but my last checkpoint wasn't too old; for checkpoints there is a direct way without pg_start_backup() ]
    • Do a simple backup of your VM containing the postgres-database. full/incremental -> your choice. (I do this while VM was running)
    • select pg_start_backup('some label') is NOT necessary.
      Just normal backup with [maybe a checkpoint before]
  • restore VM:
    • Do NOT boot this VM automatically. You need to make sure postgres doesn't start automatically. You can do this with special boot mode if you have, rename postgres binary or data-directory with live-linux-CD or having a script, which checks if system was restored so postgres should not start.
    • Boot VM
    • [ Maybe check pg_log-File if disabling postgres work. -> no new log file ]
  • restore database:
    • create the recovery.conf inside the $pgdata directory:
      restore_command = 'cp /[path_to_your_wal_backups]/%f "%p"' recovery_target_timeline = 'latest'
    • start postgres
    • see pg_log if restoring of wal files worked
    • [ connect to database. and search for new data as last test ]