Debezium causes Postgres to run out of disk space on RDS

Question

I have a small Postgres development database running on Amazon RDS, and I'm running K8s. As far as I can tell, there is barely any traffic. I want to enable change capture, so I enabled rds.logical_replication and started a Debezium instance. The topics appear in Kafka, and all seems fine.

After a few hours, the free disk space starts tanking:

Disk usage grew at a constant rate and ate up all of the 20 GB available within 24 hours. Stopping Debezium doesn't do anything. The way I got my disk space back was by running:

select pg_drop_replication_slot('services_debezium')

and:

vacuum full

Then, after a few minutes, as you can see in the graph, disk space is reclaimed.

Any tips? I would love to see what is actually filling up the space, but I don't think I can. Nothing seems to happen on the Debezium side (no ominous logs), and the Postgres logs don't show anything special either. Or is there some external event that triggers the start of this?

As far as I understood, there is an 'invisible' AWS database on the instance that does stuff, which shares the WAL and has quite a bit of activity. So if change capture isn't progressing, either because there is no activity on your database or some other cause, it will eat disk space pretty quickly. Setting up a heartbeat 'heartbeat.interval.ms' helps when your database has very little activity. — Frank Lee
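One way to give the slot something to acknowledge on a quiet database is a small heartbeat table that the connector writes to periodically, for example through Debezium's heartbeat.action.query option together with heartbeat.interval.ms. The sketch below is only an illustration: the table name debezium_heartbeat and its columns are hypothetical, and depending on the logical decoding plugin the table may also need to be among the captured tables or in the publication.

```sql
-- Hypothetical heartbeat table; the name and columns are illustrative only.
CREATE TABLE IF NOT EXISTS debezium_heartbeat (
    id      integer PRIMARY KEY,
    last_ts timestamptz NOT NULL
);

-- Statement that could be supplied as the periodic heartbeat query.
-- Each run writes a change in the captured database, so the connector
-- has an event to process and can confirm a newer WAL position.
INSERT INTO debezium_heartbeat (id, last_ts)
VALUES (1, now())
ON CONFLICT (id) DO UPDATE SET last_ts = EXCLUDED.last_ts;
```

The upsert keeps the table at a single row, so the heartbeat itself does not grow the database.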

Laurenz Albe · Accepted Answer · 2020-12-22T15:07:07

The replication slot is the problem. It marks a position in the WAL, and PostgreSQL won't delete any WAL segments newer than that. Those files are in the pg_wal subdirectory of the data directory.
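To see which slot is holding WAL back, and roughly how much, a query along these lines can help. It is a minimal sketch assuming PostgreSQL 10 or later (the function names differ in older versions); the column alias retained_wal is just illustrative.

```sql
-- List every replication slot with the amount of WAL it forces the server to keep.
-- restart_lsn is the oldest WAL position the slot still needs.
SELECT slot_name,
       plugin,
       active,
       restart_lsn,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY restart_lsn;
```

A slot whose retained_wal keeps growing between runs is the one pinning the files, whether or not a consumer is currently connected to it.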

Dropping the replication slot and running CHECKPOINT will delete the files and free space.

The cause of the problem must be a misconfiguration of Debezium: it does not consume changes and move the replication slot ahead. Fix that problem and you are good.
