PostgreSQL

PostgreSQL is the RDBMS (relational database management system) that we use at Overloop.

Basic information

Like any SQL database engine, PostgreSQL is software that executes SQL commands.

To install it on macOS, just get the package using brew:

$ brew install postgres
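Then start the server as a background service (depending on your Homebrew version, the formula may be named postgresql rather than postgres):

$ brew services start postgresql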

Then, we can start executing SQL commands with the psql command-line interface, or by connecting our favorite backend system to it.
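For example, to open an interactive session on the default postgres database and check that the server answers:

$ psql -d postgres
postgres=# SELECT version();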

Usage

To use the database well, please follow the rules below:

Contact it as little as possible

Try to group your queries.

-- Instead of doing:
SELECT * FROM table WHERE id = 1;
SELECT * FROM table WHERE id = 2;

-- Do:
SELECT * FROM table WHERE id IN (1, 2);

Make it return only what you need

Returning unnecessary columns forces the database to transmit a lot of data over the network, and so to answer slowly. Prefer SELECT id FROM ... over SELECT * FROM ....

Monitor it frequently

Like any complex engine, it can sometimes be slow or work hard in unexpected situations. It's therefore important to check periodically that the DB is doing well, using the monitoring tips given below: check that all tables have a reasonable size, that indexes are used, that queries are fast, ...

Monitoring

Overall diagnosis on Heroku

If you are using Postgres on Heroku, you can just run the command below to get some basic insights:

$ heroku pg:diagnose -a app-name

To reset the statistics this report is based on, just run:

$ heroku pg:stats_reset -a app-name

Table & index sizes

To get the size of all tables:

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 25;

To get the size and usage of the (non-unique, non-constraint) indexes on a given table, replacing 'table_name':

SELECT s.schemaname,
       s.relname AS tablename,
       s.indexrelname AS indexname,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
       s.idx_scan as usage
FROM pg_catalog.pg_stat_user_indexes s
   JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE 0 <>ALL (i.indkey)  -- no index column is an expression
  AND NOT i.indisunique   -- is not a UNIQUE index
  AND NOT EXISTS          -- does not enforce a constraint
         (SELECT 1 FROM pg_catalog.pg_constraint c
          WHERE c.conindid = s.indexrelid)
  AND s.relname = 'table_name'
ORDER BY pg_relation_size(s.indexrelid) DESC;

Index usage

-- Index hit rate
WITH idx_hit_rate as (
SELECT 
  relname as table_name, 
  n_live_tup,
  round(100.0 * idx_scan / (seq_scan + idx_scan),2) as idx_hit_rate
FROM pg_stat_user_tables WHERE (seq_scan + idx_scan) > 0 
ORDER BY n_live_tup DESC
),

-- Cache hit rate
cache_hit_rate as (
SELECT
 relname as table_name,
 heap_blks_read + heap_blks_hit as reads,
 round(100.0 * sum (heap_blks_read + heap_blks_hit) over (ORDER BY heap_blks_read + heap_blks_hit DESC) / sum(heap_blks_read + heap_blks_hit) over (),4) as cumulative_pct_reads,
 round(100.0 * heap_blks_hit / (heap_blks_hit + heap_blks_read),2) as cache_hit_rate
FROM  pg_statio_user_tables
WHERE heap_blks_hit + heap_blks_read > 0
ORDER BY 2 DESC
)

SELECT 
  idx_hit_rate.table_name,
  idx_hit_rate.n_live_tup as size,
  cache_hit_rate.reads,
  cache_hit_rate.cumulative_pct_reads,
  idx_hit_rate.idx_hit_rate,
  cache_hit_rate.cache_hit_rate
FROM idx_hit_rate, cache_hit_rate
WHERE idx_hit_rate.table_name = cache_hit_rate.table_name
  AND cumulative_pct_reads < 100.0
ORDER BY reads DESC;

Slow queries

Run the snippet below to find all slow queries:

SELECT query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows
FROM pg_stat_statements
ORDER BY mean_time DESC;
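Note that this view is provided by the pg_stat_statements extension, which must be listed in shared_preload_libraries (it is enabled by default on Heroku). Once that's the case, it can be created with:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

Also note that on Postgres 13 and later, the timing columns are named total_exec_time, mean_exec_time, etc. instead of total_time and mean_time.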

The data used to find them can be reset using:

SELECT pg_stat_statements_reset();

Table usage

To get a global picture of how a single table is used (rather than studying a particular slow query), run:

SELECT * FROM pg_stat_user_tables WHERE relname = 'table_name' ORDER BY relname;
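To focus on the most telling counters (the columns below are standard pg_stat_user_tables columns; 'table_name' is a placeholder):

-- Many seq_scan on a large table often points at a missing index
SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'table_name';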

Optimize a single query

When you have found a slow query, run the EXPLAIN command (see the PostgreSQL documentation) to find the origin of the problem.

EXPLAIN query                               # Explain the query plan of the query
EXPLAIN ANALYZE query                       # Explain the query plan and run the query
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF) query # More detailed EXPLAIN call
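For example, using the hypothetical table from the troubleshooting section below:

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM mytable WHERE columnname = 'some_value';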

You'll obtain a detailed query plan (an explanation of what the database does to run a particular query). For more information about how to read it, check this article.

Common issue: missing index

Very often, database performance problems can be explained by a missing index. In that case, the query plan will look like:

QUERY PLAN
Limit  (cost=0.09..240.66 rows=2 width=501)
  ->  Index Scan Backward using table_pkey on table  (cost=0.09..209539.43 rows=1742 width=501)
        Filter: (foreign_key_id = 2)

It states that the DB is doing:

  1. An index scan over the primary key of the table

  2. Then, a filter on the rows returned by that scan

This kind of plan is very slow, because it has to browse a whole index to find a particular record.

Adding an index will change it to:

Limit  (cost=0.09..4.09 rows=1 width=501)
  ->  Index Scan using index_table_on_foreign_key_id on table (cost=0.09..4.09 rows=1 width=501)
        Index Cond: (foreign_key_id = 2)

Here, it's much faster because instead of scanning the whole index and filtering the rows afterwards, the database uses an index condition to jump directly to the matching records.
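The fix is simply to create the missing index. A minimal sketch, reusing the (hypothetical) table and column names from the plan above:

-- CONCURRENTLY builds the index without locking writes on the table
CREATE INDEX CONCURRENTLY index_table_on_foreign_key_id
  ON "table" (foreign_key_id);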

Troubleshooting performance

Sometimes, some requests experience timeouts or slowness. You can discover this through logs that indicate "ERROR: canceling statement due to statement timeout", or because some features appear to be slower than expected.

There can be many causes for that, but generally you can use the procedure below to find what is going wrong.

Step 1: Find the slow query

The first step of the investigation is to find the query that is running slow.

To do that, search the logs for the following statement: ERROR: canceling statement due to statement timeout.

Around that, you will find the exact query causing the problem.

It can be something like:

SELECT COUNT(*) FROM mytable WHERE columnname = ? 

Step 2: Reproduce

Once you've got the problematic query, try reproducing the issue, replacing the placeholders with values that trigger the same slow query.

If possible, do that in the same environment the original query ran in, since the amount of data which caused the problem can be part of the answer.

Note that for some queries, running them once can make subsequent calls faster, thanks to the cache. So you must find the right set of parameters to reproduce the problem.
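A reproduction session could look like the following (the host, database, and filter value are hypothetical; \timing is a standard psql meta-command that prints the execution time of each query):

$ psql -h replica-host -d mydb
mydb=# \timing on
Timing is on.
mydb=# SELECT COUNT(*) FROM mytable WHERE columnname = 'value_from_the_logs';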

Step 3: Analyze

Then, you need to analyze the query.

Generally, you can solve the problem using the section Optimize a single query in the Monitoring section above, with a simple EXPLAIN ANALYZE call. If the problem was due to a missing index, you can stop here.

Step 4: Analyze temporal correlations

If the previous step did not work, you'll have to look for the root cause elsewhere.

Try a time-correlation analysis: have a look at the general state of the database, other bugs, etc. to see whether the problem is tied to the timeframe of the request.

For example, the DB may be particularly overloaded at a specific moment in time: if the timeouts all happen between 00:00 and 01:00 AM, it can be because the database was generally overloaded during that window.

Step 5: Fix overall performance degradation

If the root cause is a temporary database overload, try the following:

  • Look for other slow queries which can impact the overall performance of the DB, and fix them (don't focus on the initial issue)

  • Look at all the other monitoring tools, such as the overall cache/index hit rates, and try to improve them if they are too low: fix other issues, look for poorly used indexes or tables in other queries, run VACUUM or rebuild indexes, and, as a last resort, tune Postgres settings (like shared_buffers) or upgrade the DB (see the maintenance sketch below).
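For the vacuum/reindex part, a minimal sketch (mytable and index_name are placeholders; REINDEX CONCURRENTLY requires Postgres 12+):

-- Reclaim dead rows and refresh planner statistics, without locking the table
VACUUM (ANALYZE) mytable;

-- Rebuild a bloated index without blocking writes
REINDEX INDEX CONCURRENTLY index_name;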

IO and cache

Note that, even when all of those recommendations are applied, we're still limited by the amount of IO a database can perform per second. Even with the best indexes and query optimization, retrieving terabytes of data will take ages. That's why proper "raw" IO performance is also an important property of the database.

If one expects a table to be easily sortable and filterable along many dimensions, that's possible without indexes. For that, it needs to fit entirely in RAM and to be cached by Postgres. With that properly set up, we can expect a search within millions of records to be performed in under 10 seconds.

We thus need to ensure that:

  • Database RAM > size of our working set (a quick check is sketched after this list)

  • Database CPU or IOPS are not overloaded

  • There is no query flushing the cache of the database.

  • Postgres is aware that the data should stay in RAM (i.e. the data is queried frequently enough to stay in the cache)
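To check the cache side of this list, the standard statistics views give a global cache hit ratio. A quick sketch:

-- A ratio close to 99% means the working set is essentially served from RAM
SELECT round(100.0 * sum(heap_blks_hit)
             / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS cache_hit_ratio
FROM pg_statio_user_tables;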

To find out which queries read significant amounts of data from the disk, and are thus not served from RAM, you can use the following query:

SELECT query, calls, rows, blk_read_time
FROM pg_stat_statements
ORDER BY blk_read_time DESC
LIMIT 10;
