VACUUM Command in PostgreSQL

Pradip · 29 Sep, 2024


In PostgreSQL, VACUUM is a maintenance command that optimizes database performance and reclaims disk space. It removes deleted or outdated row versions from tables and indexes and updates the statistics used by the query planner. This maintenance is necessary because obsolete row versions, known as “dead tuples” or “dead rows,” would otherwise accumulate, take up significant space, and slow down queries.

Multi-version concurrency control (MVCC)

To keep reads consistent and prevent data loss from concurrent updates, PostgreSQL, like many other database management systems, employs multi-version concurrency control (MVCC). MVCC works by storing multiple versions of each row in the database, allowing every transaction to see a consistent snapshot of the data.

In PostgreSQL, each row version in a table carries the ID of the transaction that inserted it, stored in a system column called “xmin”. When a row is updated or deleted, the old version is not immediately removed from the table. Instead, an update inserts a new version with its own transaction ID, while the old version is marked as “dead” by recording the deleting transaction’s ID in a column called “xmax”.

When a transaction reads a row, it uses the xmin and xmax values to determine whether that row version is visible. If the inserting transaction (xmin) had not yet committed when the reading transaction took its “snapshot” (a record of which transactions were still in progress at that moment), the row is not visible. If the xmax value is equal to the transaction’s own ID, the row has been deleted by that transaction and is likewise not visible. In all other cases, the row is visible to the transaction.
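
These system columns can be queried directly. A minimal sketch, assuming the large_table created in the simulation steps later in this article (any existing table works the same way):

-- Inspect the hidden MVCC columns on a table
SELECT xmin, xmax, ctid, id
FROM large_table
LIMIT 5;

-- The current transaction ID, for comparison with the xmin/xmax values
SELECT txid_current();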

This lets transactions work from a consistent snapshot of the data, since each transaction only sees row versions that were committed before its snapshot was taken. It also prevents data loss due to concurrent updates, because conflicting updates insert new row versions rather than overwriting existing data.

Although MVCC incurs some overhead in terms of storage and performance due to the need to maintain multiple versions of each row, it is a crucial feature of PostgreSQL and other database systems that support concurrent updates.

Because multiple versions of each row are retained, transactions always see a consistent snapshot. The trade-off is that dead tuples accumulate as rows are updated or deleted, and that is exactly what VACUUM cleans up.

Which Parameters Does PostgreSQL Accept for the VACUUM Command?

VACUUM accepts several optional arguments; the most commonly used are FULL, FREEZE, and VERBOSE, and each serves a different purpose.

FULL: Rewrites the entire contents of the table into a new file, allowing all of the unused space to be returned to the operating system.

FREEZE: Aggressively freezes row versions during the vacuum, marking them as permanently valid so that their original transaction IDs no longer need to be kept around; this helps guard against transaction ID wraparound.

VERBOSE: Prints a detailed vacuum activity report for each table as it is processed.
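
As a quick illustration of the syntax, here is a sketch using the large_table from the simulation steps below:

VACUUM large_table;                    -- plain vacuum: make dead-tuple space reusable
VACUUM VERBOSE large_table;            -- same, with a detailed activity report
VACUUM (FREEZE, VERBOSE) large_table;  -- aggressively freeze rows and report progress
VACUUM FULL large_table;               -- rewrite the table and return space to the OS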

The difference between VACUUM FULL large_table and VACUUM large_table in PostgreSQL lies in their behavior and the impact on the table and database performance: 

  1. VACUUM large_table (Regular VACUUM):
    • Function:
      • Reclaims storage by removing dead tuples (row versions that were updated or deleted but not yet physically removed).
      • Marks the space occupied by these dead tuples as free, so future inserts or updates can reuse it.
      • Does not release space back to the operating system; the space is only returned to PostgreSQL for reuse within the table.
    • Locks:
      • Does not take an exclusive lock on the table, so it can run concurrently with other operations such as SELECT, INSERT, UPDATE, or DELETE.
    • Performance:
      • A non-blocking operation, so it is generally safe to run during normal database activity.
      • It may take longer if the table has a large number of dead tuples.
    • When to use:
      • Use regular VACUUM for routine maintenance: it keeps performance healthy without locking the table and lets PostgreSQL reuse space for future operations.
  2. VACUUM FULL large_table:
    • Function:
      • Performs a more aggressive vacuum: it compacts the table by physically rewriting it into a new file, removing dead tuples and releasing the unused space back to the operating system.
      • The new, smaller version of the table replaces the old one, making it more efficient in terms of disk usage.
    • Locks:
      • Takes an exclusive lock on the table for the entire duration of the operation, so no other read or write operations can run against the table while it is being vacuumed.
    • Performance:
      • Much slower than regular VACUUM because it rewrites the entire table.
      • Because of the exclusive lock, it can cause downtime for that table, especially in environments with heavy read/write activity.
    • When to use:
      • Use VACUUM FULL when you need to reclaim space back to the operating system (for instance, after large DELETE operations).
      • It is more intrusive and should be run during maintenance windows or periods of low activity.

Key Differences:

Feature | VACUUM large_table (Regular) | VACUUM FULL large_table
Releases space to OS | No | Yes
Locks the table | No | Yes (exclusive lock)
Concurrency | Can run concurrently with other queries | Blocks all other operations
Performance | Faster, lightweight | Slower, rewrites the entire table
Use case | Routine maintenance | Reclaiming disk space after major DELETEs
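
The “Releases space to OS” difference can be observed directly by comparing the table’s on-disk size around each command. A minimal sketch, assuming the large_table created in the simulation below:

-- On-disk size before any vacuum
SELECT pg_size_pretty(pg_table_size('large_table')) AS size_before;

-- Plain VACUUM frees dead-tuple space for reuse; the file usually does not shrink
VACUUM large_table;
SELECT pg_size_pretty(pg_table_size('large_table')) AS size_after_vacuum;

-- VACUUM FULL rewrites the table, so the file shrinks and space returns to the OS
VACUUM FULL large_table;
SELECT pg_size_pretty(pg_table_size('large_table')) AS size_after_vacuum_full;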

Recommendation:

  • For routine maintenance, use regular VACUUM.
  • Use VACUUM FULL only when significant space needs to be reclaimed after large deletions, and you’re prepared for the table to be locked during the operation.
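
To check when a table was last vacuumed, either manually or by the autovacuum daemon, the statistics views can be queried. A small sketch for the large_table example used below:

-- Last vacuum times and counts for large_table
SELECT relname,
       last_vacuum,       -- last manual VACUUM
       last_autovacuum,   -- last run by the autovacuum daemon
       vacuum_count,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'large_table';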

 

Steps to Simulate a Vacuum Issue: 
  • Create a Large Table with Random Data: Create a table and fill it with random data. This will simulate a workload in which the table gets bloated over time.
CREATE TABLE large_table (
    id SERIAL PRIMARY KEY,
    data TEXT
);

-- Insert 1 million rows
INSERT INTO large_table (data)
SELECT md5(random()::text)
FROM generate_series(1, 1000000);
  • Update Rows Frequently: Frequent updates can cause dead tuples in the table, which VACUUM will eventually need to clean up.

-- Update rows frequently to create dead tuples
UPDATE large_table
SET data = md5(random()::text)
WHERE id % 10 = 0;

 

  • Delete a Substantial Number of Rows: Deleting rows creates dead tuples that are not removed until VACUUM is run.

-- Delete 10% of the rows
DELETE FROM large_table
WHERE id % 10 = 0;
  • Run VACUUM and Monitor Behavior: You can then run VACUUM manually and check for issues.
  • Regular VACUUM:
VACUUM large_table;
  • VACUUM FULL (more aggressive):
VACUUM FULL large_table;

  • Monitor Table Statistics: You can use the following query to check for bloat before and after VACUUM:

SELECT
    relname AS table_name,
    pg_size_pretty(pg_table_size(relid)) AS table_size,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    n_dead_tup AS dead_tuples,
    n_live_tup AS live_tuples
FROM pg_stat_all_tables
WHERE relname = 'large_table';

  • Check pg_stat_activity: While running VACUUM, check the running queries and see if there are any blocking issues.

SELECT * FROM pg_stat_activity WHERE query ILIKE '%VACUUM%';
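
To see which sessions hold or are waiting for locks on the table (useful while a VACUUM FULL is running in another session), pg_locks can be joined with pg_stat_activity. A sketch:

-- Sessions holding or waiting for locks on large_table
SELECT a.pid, a.query, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'large_table'::regclass;
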
  • Disable Auto-Vacuum to Simulate Issues: Turning autovacuum off for the table lets dead tuples accumulate so the bloat can be observed directly:
ALTER TABLE large_table SET (autovacuum_enabled = false);
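
Once the experiment is finished, autovacuum should be re-enabled for the table, for example by resetting the storage parameter to its default:

-- Re-enable autovacuum for the table (RESET restores the default setting)
ALTER TABLE large_table RESET (autovacuum_enabled);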