Buffer Bulk Eviction

If bulk reads or writes are performed, there is a risk that one-time data can quickly oust useful pages from the buffer cache.

As a precaution, bulk operations use rather small buffer rings, and eviction is performed within their boundaries, without affecting other buffers.

A buffer ring of a particular size consists of an array of buffers that are used one after another. At first, the buffer ring is empty, and individual buffers join it one by one, each selected from the buffer cache in the usual manner. Once the ring is full, eviction comes into play, but only within the ring's limits.

Buffers added to a ring are not excluded from the buffer cache and can still be used by other operations. So if the buffer about to be reused turns out to be pinned, or its usage count is higher than one, it is simply detached from the ring and replaced by another buffer.

PostgreSQL supports three eviction strategies.
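The reuse logic described above can be sketched as a toy model in Python (a simplified illustration only, not PostgreSQL's actual implementation; the `Buffer` and `BufferRing` classes and all names are invented for this sketch):

```python
class Buffer:
    """A stand-in for a shared-cache buffer (hypothetical model)."""
    def __init__(self, buf_id):
        self.buf_id = buf_id
        self.pinned = False
        self.usage_count = 0

class BufferRing:
    """Simplified model of a buffer ring: eviction happens in place,
    cycling over a small fixed set of slots."""
    def __init__(self, size):
        self.slots = [None] * size
        self.next = 0  # ring position to reuse next

    def get_buffer(self, allocate_from_cache):
        slot = self.next
        self.next = (self.next + 1) % len(self.slots)
        buf = self.slots[slot]
        # An empty slot, a pinned buffer, or one with usage count > 1
        # cannot be reused: detach it and take a fresh buffer from the
        # shared cache instead.
        if buf is None or buf.pinned or buf.usage_count > 1:
            buf = allocate_from_cache()
        self.slots[slot] = buf
        return buf

# Toy "shared cache" that hands out new buffers on demand.
counter = iter(range(1_000_000))
fresh = lambda: Buffer(next(counter))

ring = BufferRing(4)
first_pass = [ring.get_buffer(fresh) for _ in range(4)]   # ring fills up
second_pass = [ring.get_buffer(fresh) for _ in range(4)]  # same buffers reused
print([b.buf_id for b in first_pass])    # [0, 1, 2, 3]
print([b.buf_id for b in second_pass])   # [0, 1, 2, 3] -- evicted in place

# Pin one buffer: on the next lap its slot is refilled from the cache.
ring.slots[0].pinned = True
third = ring.get_buffer(fresh)
print(third.buf_id)                      # 4 -- pinned buffer detached
```

The second pass lands on the same four buffers, which is exactly why a large scan cannot flood the rest of the cache, while the pinned buffer is quietly dropped from the ring rather than waited on.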

| Strategy | Trigger | Buffer ring size |
|----------|---------|------------------|
| Bulk reads | Sequential scans of large tables whose size exceeds 1/4 of the buffer cache (with the default 128 MB cache: 16,384 pages) | 256 KB (32 pages) |
| Bulk writes | COPY FROM, CREATE TABLE AS SELECT, and CREATE MATERIALIZED VIEW commands, as well as those ALTER TABLE variants that rewrite the table | 16 MB (2048 pages) by default |
| Vacuuming | Full table scan performed without consulting the visibility map | 256 KB (32 pages) |

Buffer rings do not always prevent undesired eviction. If UPDATE or DELETE commands affect many rows, the table scan they perform applies the bulk reads strategy, but since the pages are constantly being modified, the buffer ring becomes virtually useless.

Another example worth mentioning is oversized data stored in TOAST tables. Despite the potentially large volume of data that has to be read, toasted values are always accessed via an index, so they bypass buffer rings.

Let’s take a closer look at the bulk reads strategy. For simplicity, we will create a table in such a way that an inserted row takes a whole page. By default, the buffer cache holds 16,384 pages of 8 KB each, so the table must occupy more than 4096 pages for its scan to use a buffer ring.
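The threshold arithmetic is easy to check (a trivial verification using only the default values quoted above):

```python
# Default buffer cache: 16,384 pages of 8 KB = 128 MB.
page_size_kb = 8
shared_buffers_pages = 16_384
cache_mb = shared_buffers_pages * page_size_kb // 1024
print(cache_mb)          # 128

# Bulk-read strategy cutoff: a quarter of the cache.
threshold_pages = shared_buffers_pages // 4
print(threshold_pages)   # 4096

# The 256 KB ring used for such scans holds 32 pages.
ring_pages = 256 // page_size_kb
print(ring_pages)        # 32
```

A table of 4097 pages just exceeds the 4096-page threshold, which is what the experiment below relies on.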

test=# SHOW shared_buffers;
shared_buffers
----------------
128MB
(1 row)

test=# CREATE TABLE big(
test(# id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY, s char(1000)
test(# ) WITH (fillfactor = 10);
CREATE TABLE
test=# INSERT INTO big(s)
test-# SELECT 'FOO' FROM generate_series(1,4096+1);
INSERT 0 4097
test=# ANALYZE big;
ANALYZE
test=# SELECT relname, relfilenode, relpages FROM pg_class
test-# WHERE relname IN ('big', 'big_pkey');
relname | relfilenode | relpages
----------+-------------+----------
big | 16487 | 4097
big_pkey | 16492 | 14
(2 rows)

test=# EXPLAIN (analyze, costs off, timing off, summary off) SELECT * FROM big;
QUERY PLAN
--------------------------------------------
Seq Scan on big (actual rows=4097 loops=1)
(1 row)

test=# SELECT count(*) FROM pg_buffercache
test-# WHERE relfilenode = pg_relation_filenode('big'::regclass);
count
-------
32
(1 row)

References

  1. Egor Rogov, PostgreSQL 14 Internals, https://postgrespro.com/community/books/internals