Like any SQL database engine, PostgreSQL is software that executes SQL commands.
To install it on macOS, just get the package with Homebrew:
$ brew install postgresql
Then, we can start executing SQL commands through the
psql command-line interface, or by connecting our favorite backend system to it.
To use the database efficiently, please follow the rules below:
Query it as rarely as possible
Try to group your queries.
-- Instead of doing:
SELECT * FROM table WHERE id = 1;
SELECT * FROM table WHERE id = 2;

-- Do:
SELECT * FROM table WHERE id IN (1, 2);
Make it return only what it needs
Returning unnecessary data forces the database to transmit more over the network and, therefore, to answer slowly.
So, prefer
SELECT id FROM ...
over
SELECT * FROM ...
Monitor it frequently
Like any complex engine, it can sometimes be slow or work hard in unexpected situations. So it's important to check periodically that the DB is doing well, using the monitoring tips given below: check that all tables have a reasonable size, that indexes are used, that queries are fast, and so on.
If you are using Postgres on Heroku, you can just run the command below to have some basic insights:
$ heroku pg:diagnose -a app-name
To reset the query statistics behind these diagnostics (not the data cache itself), run:
$ heroku pg:stats_reset -a app-name
To get the size of all tables:
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind <> 'i'
  AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 25;
To get the size and usage of the non-constraint indexes of a given table:
SELECT s.schemaname,
       s.relname AS tablename,
       s.indexrelname AS indexname,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
       s.idx_scan AS usage
FROM pg_catalog.pg_stat_user_indexes s
JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE 0 <> ALL (i.indkey)  -- no index column is an expression
  AND NOT i.indisunique    -- is not a UNIQUE index
  AND NOT EXISTS           -- does not enforce a constraint
    (SELECT 1 FROM pg_catalog.pg_constraint c
     WHERE c.conindid = s.indexrelid)
  AND s.relname = 'table_name'
ORDER BY pg_relation_size(s.indexrelid) DESC;
To get the index hit rate and cache hit rate of each table:
-- Index hit rate
WITH idx_hit_rate AS (
  SELECT relname AS table_name,
         n_live_tup,
         round(100.0 * idx_scan / (seq_scan + idx_scan), 2) AS idx_hit_rate
  FROM pg_stat_user_tables
  WHERE (seq_scan + idx_scan) > 0
  ORDER BY n_live_tup DESC
),
-- Cache hit rate
cache_hit_rate AS (
  SELECT relname AS table_name,
         heap_blks_read + heap_blks_hit AS reads,
         round(100.0 * sum(heap_blks_read + heap_blks_hit)
                       OVER (ORDER BY heap_blks_read + heap_blks_hit DESC)
               / sum(heap_blks_read + heap_blks_hit) OVER (), 4) AS cumulative_pct_reads,
         round(100.0 * heap_blks_hit / (heap_blks_hit + heap_blks_read), 2) AS cache_hit_rate
  FROM pg_statio_user_tables
  WHERE heap_blks_hit + heap_blks_read > 0
  ORDER BY 2 DESC
)
SELECT idx_hit_rate.table_name,
       idx_hit_rate.n_live_tup AS size,
       cache_hit_rate.reads,
       cache_hit_rate.cumulative_pct_reads,
       idx_hit_rate.idx_hit_rate,
       cache_hit_rate.cache_hit_rate
FROM idx_hit_rate, cache_hit_rate
WHERE idx_hit_rate.table_name = cache_hit_rate.table_name
  AND cumulative_pct_reads < 100.0
ORDER BY reads DESC;
Run the snippet below to find the slowest queries (it requires the pg_stat_statements extension; note that on Postgres 13+ the timing columns are named total_exec_time, min_exec_time, and so on):
SELECT query, calls, total_time, min_time, max_time, mean_time, stddev_time, rows
FROM pg_stat_statements
ORDER BY mean_time DESC;
The data used to find them can be reset using:
SELECT pg_stat_statements_reset();
To find the usage of a single table (not studying a particular slow query), run:
SELECT * FROM pg_stat_user_tables WHERE relname = 'table_name' ORDER BY relname;
When you have found a slow query, run the EXPLAIN command (see doc) to find the origin of the problem.
EXPLAIN query                                -- Explain the query plan of the query
EXPLAIN ANALYZE query                        -- Explain the query plan and run the query
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF) query  -- More detailed EXPLAIN call
You'll obtain a detailed query plan (an explanation of what the DB does to run a particular query). For more information about how to read it, check this article.
Very often, performance problems in the database can be explained by the lack of an index. In that case, the query plan will look like:
QUERY PLAN
Limit  (cost=0.09..240.66 rows=2 width=501)
  ->  Index Scan Backward using table_pkey on table  (cost=0.09..209539.43 rows=1742 width=501)
        Filter: (foreign_key_id = 2)
It states that the DB is doing:
An index scan on the primary key of the table
Then, a filter on the rows returned by that scan
This kind of plan is very slow, because it has to walk through a whole index to find a particular record.
Adding an index will change it to:
Limit  (cost=0.09..4.09 rows=1 width=501)
  ->  Index Scan using table_pkey on table  (cost=0.09..4.09 rows=1 width=501)
        Index Cond: (id = 2)
Here, it's much faster: instead of scanning the index and filtering afterwards, the condition is applied through the index itself (Index Cond), which greatly improves performance.
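To obtain such a plan, the filtered column must be indexed. As a minimal sketch (the table and column names here are hypothetical placeholders):

```sql
-- Hypothetical example: index the column used in the slow filter.
-- CONCURRENTLY builds the index without blocking writes on the table.
CREATE INDEX CONCURRENTLY index_table_on_foreign_key_id
  ON table_name (foreign_key_id);
```

Once the index exists, re-run EXPLAIN to confirm the plan switched from Filter to Index Cond.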
Sometimes, some requests experience timeouts or slowness. This can be discovered by seeing logs containing "ERROR: canceling statement due to statement timeout", or because some features appear slower than expected.
There can be many causes for that, but generally, you can use the procedure below to find what is going wrong.
The first step of the investigation is to find the query that is running slow.
To do that, search in the logs for the following statement:
ERROR: canceling statement due to statement timeout.
Around that, you will find the exact query causing the problem.
It can be something like:
SELECT COUNT(*) FROM mytable WHERE columnname = ?
Once you've got the problematic query, try reproducing the issue, replacing the placeholders with values that trigger the same slow behavior.
If possible, do that in the same environment as the one the original query ran in, since the amount of data which caused the problem can be part of the explanation.
Note that for some queries, running it once can make subsequent calls faster, thanks to the cache. So you must find the right set of parameters to reproduce the problem.
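Concretely, the reproduction step can look like the following, where mytable, columnname, and the tested value stand in for the real parameters found in the logs:

```sql
-- Re-run the suspect query with concrete parameter values,
-- with timing and buffer statistics enabled.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM mytable WHERE columnname = 'suspicious_value';
```

Comparing the actual time and buffer reads across different parameter values helps identify which ones hit the slow path.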
Then, you need to analyze the query.
Generally, you can solve the problem by following the section Optimize a single query in the Monitoring section above, with a simple EXPLAIN ANALYZE call. If the problem was due to a missing index, you can stop here.
If the previous step did not work, you'll have to look for the root cause elsewhere.
Try a time-correlation analysis: look at the general state of the database, other bugs, etc., to see whether the problem is tied to the timeframe of the request.
An example can be that the DB is particularly overloaded at a particular moment in time.
For instance, if all the timeouts happen between 00:00 and 01:00 AM, it can be because the database was generally overloaded during that window.
If the root cause is a temporary database overload, try the following, if applicable:
Look for other slow queries, which can impact the overall performance of the DB, and fix them (don't focus only on the initial issue)
Look at all the other monitoring tools, such as the overall cache/index hit rates, and if they are too low, improve them by: fixing other issues, looking for poorly used indexes or tables in other queries, running VACUUM or rebuilding indexes, and, as a last resort, tuning Postgres settings (like shared_buffers) or upgrading the DB.
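As a sketch, the vacuum/reindex step mentioned above could look like this (table and index names are hypothetical; REINDEX ... CONCURRENTLY requires Postgres 12+):

```sql
-- Reclaim dead rows and refresh the planner statistics of the table.
VACUUM (VERBOSE, ANALYZE) table_name;

-- Rebuild a bloated index without blocking writes on the table.
REINDEX INDEX CONCURRENTLY index_name;
```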
Note that, even when all of these recommendations are applied, we're still limited by the amount of IO a database can perform per second. Even with the best indexes and query optimizations, retrieving terabytes of data will take ages. That's why proper "raw" IO performance is also an important property of the database.
If you expect a table to be easily sortable and filterable across many dimensions, that's possible even without indexes. For that, it needs to fit entirely in RAM and to be cached by Postgres. With that properly set up, we can expect a search within millions of records to complete in under 10 seconds.
We thus need to ensure that:
Database RAM > Size of our working set
Database CPU or IOPS are not overloaded
There is no query flushing the cache of the database.
Postgres actually keeps the data cached in RAM (which it does when the data is accessed frequently enough)
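The checklist above can be partly verified with a global cache hit ratio; a value close to 99% usually means the working set fits in RAM. A minimal sketch:

```sql
-- Share of table reads served from the Postgres buffer cache,
-- across all user tables (nullif avoids a division by zero).
SELECT round(100.0 * sum(heap_blks_hit)
             / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2)
       AS cache_hit_ratio
FROM pg_statio_user_tables;
```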
To get a global view of which queries read a lot of data from disk (and whose data is therefore likely not cached in RAM), you can use the following query:
SELECT query, calls, rows, blk_read_time
FROM pg_stat_statements
ORDER BY blk_read_time DESC
LIMIT 10;