🔗 Read efficiency issues in PostgreSQL queries

263 words, 2 min read

⚠️ This post links to an external website. ⚠️

A lot of the time in database land, our queries are I/O constrained. As such, performance work often involves reducing the number of page reads. Indexes are a prime example, but they don’t solve every issue (a couple of which we’ll now explore).

Postgres handles consistency while serving concurrent queries by maintaining multiple row versions, both in the main part of a table (the “heap”) and in the indexes (docs). Old row versions take up space until they are no longer needed, at which point that space can be reused. This extra space is commonly referred to as “bloat”. Below we’ll look into both heap bloat and index bloat, how they can affect query performance, and what you can do to both prevent and respond to issues.

In pgMustard, we originally called tips around this “Bloat Likelihood”, but we learned that bloat isn’t the only way queries can end up reading more data than necessary. Another cause is data locality: for example, if a query needs to read several rows that all happen to be on a single page, that’s quicker than if those rows are each on separate pages. We’ll look into this below too. As such, the tips in pgMustard are now titled “Read Efficiency”.
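To make the locality point concrete, here is a minimal Python sketch that counts how many distinct pages a query would have to touch for the same number of rows. It is a toy model, not how Postgres lays out data: the helper `pages_touched` and the fixed `ROWS_PER_PAGE` value are assumptions made up for illustration, since real pages hold a variable number of rows.

```python
def pages_touched(row_positions, rows_per_page):
    """Count the distinct pages that hold the given rows.

    row_positions: zero-based physical positions of rows in the table.
    rows_per_page: hypothetical fixed number of rows per page.
    """
    return len({pos // rows_per_page for pos in row_positions})


# Assumed for illustration only; real pages hold a variable number of rows.
ROWS_PER_PAGE = 100

# Ten rows stored right next to each other.
clustered = list(range(500, 510))

# Ten rows spread evenly across the table.
scattered = [i * 1000 for i in range(10)]

clustered_pages = pages_touched(clustered, ROWS_PER_PAGE)
scattered_pages = pages_touched(scattered, ROWS_PER_PAGE)
print(clustered_pages, scattered_pages)  # prints: 1 10
```

Both queries return ten rows, but in this model the scattered one reads ten times as many pages.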

These read efficiency issues can be tricky to spot, especially without looking at the buffer numbers reported in EXPLAIN ANALYZE and pg_stat_statements, so I don’t see them discussed much. However, they are relatively common in slow query plans that I see, so I thought it might be worth writing about.
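As a rough illustration of what looking at the buffer numbers can mean, the Python sketch below pulls the shared hit and read counts out of a `Buffers:` line from a query plan and relates them to the rows returned. The sample line and row count are invented, the helper `shared_pages` is hypothetical, and the 8 kB page size is the Postgres default, assumed here rather than taken from the post.

```python
import re

# Default PostgreSQL block size; assumed here, since it can be changed at build time.
PAGE_SIZE_BYTES = 8192


def shared_pages(buffers_line):
    """Return shared hit + read page counts from one 'Buffers:' plan line."""
    # Capture only the key=value pairs that directly follow "shared".
    match = re.search(r"shared((?:\s+\w+=\d+)+)", buffers_line)
    if match is None:
        return 0
    counts = dict(re.findall(r"(\w+)=(\d+)", match.group(1)))
    return int(counts.get("hit", 0)) + int(counts.get("read", 0))


# Invented example values, not taken from a real plan.
sample_line = "Buffers: shared hit=3 read=120, temp read=5 written=5"
rows_returned = 10

pages = shared_pages(sample_line)
kib_read = pages * PAGE_SIZE_BYTES // 1024
print(f"{pages} pages ({kib_read} kB) for {rows_returned} rows")
```

Many pages per returned row does not prove there is a problem, but it is the sort of ratio that hints at bloat or poor locality.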

continue reading on www.pgmustard.com

If this post was enjoyable or useful for you, please share it! If you have comments, questions, or feedback, you can email me at my personal email. To get new posts, subscribe using the RSS feed.