3
votes

Here are the top bloated tables before and after running a manual vacuum operation on all the tables in a heroku managed postgresql 9.2 database. As you can see, not much has changed, some waste has even increased...
What could the reason be? Is this normal behavior?

Before:

 type  | schemaname |       object_name      | bloat |   waste
-------+------------+------------------------+-------+------------
 index | public     | table_1                |   1.4 | 113 MB
 table | public     | table_2                |   1.1 | 92 MB
 table | public     | table_3                |   1.1 | 70 MB
 index | public     | table_4                |   1.2 | 66 MB
 index | public     | table_5                |   1.2 | 65 MB
 index | public     | table_6                |   1.2 | 64 MB
 index | public     | table_7                |   1.1 | 34 MB
 table | public     | table_8                |   1.1 | 19 MB

After:

 type  | schemaname |       object_name      | bloat |   waste
-------+------------+------------------------+-------+------------
 index | public     | table_1                |   1.4 | 123 MB
 table | public     | table_2                |   1.1 | 82 MB
 table | public     | table_3                |   1.1 | 82 MB
 index | public     | table_4                |   1.3 | 72 MB
 index | public     | table_5                |   1.3 | 72 MB
 index | public     | table_6                |   1.3 | 71 MB
 index | public     | table_7                |   1.1 | 39 MB
 table | public     | table_8                |   1.1 | 19 MB
2
You can use VACUUM FULL to shrink the tables to minimum size, but the effect will be temporary. Also do not forget about vacuuming indexes.Ihor Romanchenko

2 Answers

3
votes

If you need to pack your tables to minimum size, run VACUUM FULL. VACUUM does not try to compact data pages or free disk space, except from the end of a table (which is a cheap operation).

Normally, plain VACUUM is the much preferable approach. The space occupied by dead tuples can be reused by later updates, and updated row versions can be written to the same data page this way. If you pack everything tightly, new row versions always have to be appended to the end of the table. Some slack generally improves write performance - except for read-only tables, which would be candidates for VACUUM FULL (once) or even CLUSTER.

The client program vacuumdb has the -f (--full) switch.

Much more information in the Postgres Wiki on vacuuming.

2
votes

Tl;dr version: doesn't look odd.

When rows get updated or deleted, mvcc marks the old row as dead from a txid onward; a new one is inserted from that txid onward for inserts and updates.

Auto vacuum then kicks in on occasion, and is fine for normal DB operation.

Vacuum forces what auto vacuum normally does periodically and locally. You'd run that after a big update or delete, for instance.

What vacuuming does is, basically, trim dead rows from disk pages. And, unless I'm mistaking, create some new space for future new rows by splitting disk pages that are too full (as in too far from the table's fill factor), or remove needless space by merging disk pages that are too empty (for the same reason).