- The database statistics are exposed using views. The fastest way to spot performance issues is by using the pg_stat_statements view
- Particularly valuable statistics to monitor include table/index caching and query index usage statistics.
- Statistics can help you find and sort the queries that are responsible for most of the load on the system.
PostgreSQL has many features that can help you monitor your databases, and you will learn to use them in this section. There is no way to improve performance and reliability without first collecting the necessary data. Afterward, you could make educated decisions based on facts.
PostgreSQL offers a large set of system views that allows administrators and developers alike to take a deep look into what is really going on in their system. The challenge is to really make sense out of this plethora of metrics. Therefore, the goal in this section is to make use of PostgreSQL statistics and to make it easier for people to take advantage of their insights.
Taking advantage of pg_stat_statements
One of the easiest and fastest methods to spot performance problems is to use pg_stat_statements view. This will give us information about queries within the system. It helps us figure out which types of queries are slow and how often these queries are called.
This view was introduced in version 8.4, and it replaces several situations that required parsing the database log files. This view gathers profiling information needed for query tuning.
This module is not installed by default. It needs to be installed in each database you use it against. Here’s an example of how you can install it on your system.
- In postgresql.conf file, you should search for shared_preload_libraries and if necessary uncomment it. Then add ‘pg_stat_statements ’ as value
- Restart the database server
- Run “CREATE EXTENSION pg_stat_statements;” for the database of interest
We can inspect the definition of this view:
pgbench=# \d pg_stat_statements View "public.pg_stat_statements" Column | Type | Collation | Nullable | Default ---------------------+------------------+-----------+----------+--------- userid | oid | | | dbid | oid | | | queryid | bigint | | | query | text | | | calls | bigint | | | total_time | double precision | | | min_time | double precision | | | max_time | double precision | | | mean_time | double precision | | | stddev_time | double precision | | | rows | bigint | | | shared_blks_hit | bigint | | | shared_blks_read | bigint | | | shared_blks_dirtied | bigint | | | shared_blks_written | bigint | | | local_blks_hit | bigint | | | local_blks_read | bigint | | | local_blks_dirtied | bigint | | | local_blks_written | bigint | | | temp_blks_read | bigint | | | temp_blks_written | bigint | | | blk_read_time | double precision | | | blk_write_time | double precision | | |
When using this view, PostgreSQL will separate queries and parameters. It will put placeholders into the query. Therefore, identical queries but with different parameters can be aggregated. SELECT … FROM x WHERE y = 1 will be turned into SELECT … FROM x WHERE y = ?.
For each query, PostgreSQL will tell us the total time it has consumed, along with the number of calls.
The first parameter that we will discuss is stddev. The standard deviation is especially significant because it will tell us whether a query has stable or fluctuating runtimes. Unstable runtimes might occur because of the following reasons:
- Concurrency and locking
- PostgreSQL chooses different execution plans
- Requested data might be or not cache
In this regard, PostgreSQL will tell us about the caching behavior of a query. The columns starting with shared_ will indicate to us if blocks came from the cache (_hit) or from the operating system (_read). If many blocks come from the operating system, the runtime of a query might fluctuate.
Next, PostgreSQL provides information about temporary file I/O. These files can naturally happen when a large index is built or when some large DDL is executed. However, in OLTP systems having temporary files is usually a bad thing. Temporary files will slow down the entire system by potentially blocking the disk. A high amount of temporary file I/O can be caused by things such as:
- Suboptimal work_mem settings (OLTP)
- Suboptimal maintenance_work_mem settings (DDLs)
Finally, there are two fields containing information about how much time was consumed by I/O operations. By default, these two fields are empty. The reason for this is that measuring timing can involve quite a lot of overhead on some systems. In order to measure the time for I/O we should set track_io_timing parameter to true.
Top 10 time-consuming queries
We can run the following query to get an overview of all run queries. However, it’s important to create a sorted output so that the most relevant information can be seen instantly.
pgbench=# SELECT round((100 * total_time / sum(total_time) OVER ())::numeric, 2) percent, round(total_time::numeric, 2) AS total, calls, round(mean_time::numeric, 2) AS mean, substring(query, 1, 40) FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10; percent | total | calls | mean | substring ---------+----------+--------+-------+------------------------------------------ 99.69 | 42123.26 | 200000 | 0.21 | SELECT abalance FROM pgbench_accounts 0.21 | 87.57 | 1 | 87.57 | CREATE EXTENSION pg_stat_statements 0.08 | 32.50 | 1 | 32.50 | vacuum pgbench_branches 0.01 | 5.38 | 1 | 5.38 | vacuum pgbench_tellers 0.00 | 2.01 | 1 | 2.01 | truncate pgbench_history ....
It shows the top 10 queries by total time consumed. It also makes sense to display the average execution time of the queries so that you can see if the runtime for those queries is higher than expected. Checking through the top 1,000 queries is usually not worth it. Normally, the first queries are responsible for most of the load on the system.
Remember that track_activity_query_size is by default set to 1,024 bytes. This means that queries which are bigger in size are cut off before the interesting part is shown. Therefore, it is recommended to increase this value, especially for Java Client such as Hibernate.
Inspecting databases wide statistics
At the database level, we could inspect statistics via pg_stat_database view as follow:
pgbench=# \d pg_stat_database View "pg_catalog.pg_stat_database" Column | Type | Collation | Nullable | Default ----------------+--------------------------+-----------+----------+--------- datid | oid | | | datname | name | | | numbackends | integer | | | xact_commit | bigint | | | xact_rollback | bigint | | | blks_read | bigint | | | blks_hit | bigint | | | tup_returned | bigint | | | tup_fetched | bigint | | | tup_inserted | bigint | | | tup_updated | bigint | | | tup_deleted | bigint | | | conflicts | bigint | | | temp_files | bigint | | | temp_bytes | bigint | | | deadlocks | bigint | | | blk_read_time | double precision | | | blk_write_time | double precision | | | stats_reset | timestamp with time zone | | |
First, blks_hit and blks_read will tell you about cache hits and cache misses. Here these two columns refer to shared buffer hits and shared buffer misses. There is no reasonable way, on the database level, to distinguish filesystem cache hits and real disk hits. Thus, you could correlate disk wait and cache misses in pg_stat_database to get an idea of what really goes on in the system.
If you want to see if your application tends to commit or rollback, you can checks it via xact_commit and xact_rollback columns.
Next, we have the tup_ columns which will tell you whether there is a lot of reading or a lot of writing going on in your system.
Then, we have temp_files and temp_bytes. These two columns are quite important because if you frequently write temporary files to disk then you will inevitably have slow operations. The major reasons for the high usage of temporary files could be:
- Poor settings: If your work_mem settings are too low, there is no way to do anything in RAM, and therefore PostgreSQL will go to disk.
- Expensive queries: If you see many temporary files on an OLTP system, consider checking for expensive queries.
- Indexing and other administrative tasks: Once in a while, people create indexes and run DDLs operation. These processes can lead to temporary files but are not necessarily considered a problem.
Thus, it definitely makes sense to keep an eye on temporary files to ensure that these files are not needed frequently.
Finally, there are two more relevant fields: blk_read_time and blk_write_time. By default, these two fields are empty and no data is collected. The idea behind these fields is to give you a way of seeing how much time was spent on I/O. A reasonable overhead is around 2 digits nanoseconds.
Inspecting table statistics
Going further, we are now interested in statistics at the table level. To do this we’ll inspect pg_stat_user_tables view which gives us information on tables defined by us, and we’ll ignore the system’s tables.
pg_stat_user_tables is one of the most important system views. When used properly, it could be a revelation in some cases.
The first useful thing you can use this data for is monitoring how well vacuum is working. You can see live, and dead tuple counts. Also, you’ll see timestamps for when vacuum and autovacuum last processed the table. Information about how to monitor that data is covered in Configuring Vacuum for Performance.
Secondly, you can use this view to determine whether tables are being accessed by sequential or index scans. Before seeing the view, we should run a command to create some dummy load.
pgbench --username postgres -S -c 8 -t 25000 pgbench pgbench=# SELECT schemaname,relname,seq_scan,idx_scan, cast(idx_scan AS numeric) / (idx_scan + seq_scan) AS idx_scan_pct FROM pg_stat_user_tables WHERE (idx_scan + seq_scan)>0 ORDER BY idx_scan_pct; schemaname | relname | seq_scan | idx_scan | idx_scan_pct ------------+------------------+----------+----------+--------------------- public | pgbench_branches | 1 | 0 | 0.0000 public | pgbench_accounts | 0 | 200000 | 1.0000 (2 rows)
Here, pgbench_branches table is accessed by a sequential scan because all of the data fits into a single data page. However, the pgbench_accounts table is large enough that SELECT statements on it are using an index to lookup values.
More noteworthy is how many tuples were actually processed by these scans.
pgbench=# SELECT relname,seq_tup_read,idx_tup_fetch, cast(idx_tup_fetch AS numeric) / (idx_tup_fetch + seq_tup_read) AS idx_tup_pct FROM pg_stat_user_tables WHERE (idx_tup_fetch + seq_tup_read)>0 ORDER BY idx_tup_pct; relname | seq_tup_read | idx_tup_fetch | idx_tup_pct ------------------+--------------+---------------+------------------------ pgbench_branches | 50 | 0 | 0.00000000000000000000 pgbench_accounts | 0 | 200000 | 1.00000000000000000000 (2 rows)
Both metrics, the number of scans and how many total tuples each fetched are significant numbers to monitor, as they suggest both how many queries run and the total volume executed by them.
A similar query worth monitoring is how often “HOT” is being used to update rows, instead of a less efficient regular update:
pgbench=# SELECT relname,n_tup_upd,n_tup_hot_upd, cast(n_tup_hot_upd AS numeric) / n_tup_upd AS hot_pct FROM pg_stat_user_tables WHERE n_tup_upd>0 ORDER BY hot_pct; relname | n_tup_upd | n_tup_hot_upd | hot_pct ---------+-----------+---------------+--------- pgbench_accounts | 28142 | 26499 | 0.94161 pgbench_branches | 28142 | 28142 | 1.00000 (2 rows)
In this example, almost all of the updates are processed using HOT UPDATES, which is what you’d like to see for the best UPDATE performance. This percentage is particularly valuable to monitor when doing initial testing of your database, to confirm whether or not you are satisfying the conditions for which HOT UPDATES kicks in.
When running an UPDATE, PostgreSQL has to copy a row to ensure that ROLLBACK will work correctly. HOT UPDATE comes in handy because it allows PostgreSQL to ensure that a row does not have to leave a block or page. The copy of the row stays inside the same page, which is beneficial for performance in general. A fair amount of HOT UPDATE indicates that you are on the right track in case of an intense workload of UPDATE.
Finally, a useful derived view is to consider the insert/update/delete characteristics of your tables:
pgbench=# SELECT relname, cast(n_tup_ins AS numeric) / (n_tup_ins + n_tup_upd + n_tup_del) AS ins_pct, cast(n_tup_upd AS numeric) / (n_tup_ins + n_tup_upd + n_tup_del) AS upd_pct, cast(n_tup_del AS numeric) / (n_tup_ins + n_tup_upd + n_tup_del) AS del_pct FROM pg_stat_user_tables ORDER BY relname; ---------+-----------+---------------+--------- pgbench_accounts | 0.78038 | 0.21961 | 0.00000 pgbench_branches | 0.00003 | 0.99996 | 0.00000 pgbench_history | 1.00000 | 0.00000 | 0.00000
This confirms that pgbench_history is what’s sometimes called an append-only table, one that is inserted into but never updated or deleted from. It also shows that pgbench_branches is being heavily updated relative to insertions. In this case, tables are likely to need periodic REINDEX operations. Similarly, you can use the deletion percentage figure for finding tables that are likely to have a large number of sparse data or index blocks. In this case, you might need an operation like CLUSTER to fully clean up after.
Detect which tables need an index
Reading all of this data might be interesting; however, unless you are able to make sense out of it, it is pretty pointless. One way to use pg_stat_user_tables is to detect which tables might need an index using the following query.
SELECT schemaname, relname, seq_scan, seq_tup_read, seq_tup_read / seq_scan AS avg, idx_scan FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC LIMIT 25;
The idea is to find large tables that have been used frequently in a sequential scan. Those tables will come out on top of the list with huge values for seq_tup_read. Sequential scans are not always bad. They appear naturally in backups and analytical statements. However, if you are running large sequential scans all the time, your performance will suffer.
Note that this query is really valuable since it will help you spot tables with missing indexes. It’s a known fact that missing indexes are the single most important reason for bad performance.
In addition to the operation statistics, there is also a set of counters in the database that concentrate on physical I/O. These can be found in pg_statio_user_tables view. When a read happens, the database distinguishes between data that is read using a block already in the database buffer cache, or whether it required a read from the operating system. If data is read from buffer cache, we call it a heap block hit.
pgbench=# SELECT relname, cast(heap_blks_hit as numeric) / (heap_blks_hit + heap_blks_read) AS hit_pct,heap_blks_hit,heap_blks_read FROM pg_statio_user_tables WHERE (heap_blks_hit + heap_blks_read)>0 ORDER BY hit_pct; relname | hit_pct | heap_blks_hit | heap_blks_read ------------------+------------------------+---------------+---------------- pgbench_accounts | 0.23566500000000000000 | 47133 | 152867 pgbench_tellers | 0.36363636363636363636 | 4 | 7 pgbench_branches | 0.37500000000000000000 | 3 | 5 (3 rows)
In this example, the database is not well cached since we have a pretty high number of reads that require actual disk I/O.
A similar query shows disk I/O but this time for every index on this table:
pgbench=# SELECT relname, cast(idx_blks_hit as numeric) / (idx_blks_hit + idx_blks_read) AS hit_pct,idx_blks_hit,idx_blks_read FROM pg_statio_user_tables WHERE (idx_blks_hit + idx_blks_read)>0 ORDER BY hit_pct; relname | hit_pct | idx_blks_hit | idx_blks_read ------------------+------------------------+--------------+--------------- pgbench_tellers | 0.00000000000000000000 | 0 | 1 pgbench_branches | 0.50000000000000000000 | 1 | 1 pgbench_accounts | 0.94779648948859524645 | 569729 | 31380 (3 rows)
Inspecting index statistics
While pg_stat_user_tables is essential for spotting missing indexes, it is sometimes necessary to find indexes that shouldn’t actually exist. While this might not be a problem if your database is really small, it does make a difference in the case of large systems. If you have hundreds of gigabytes of pointless indexes, it can seriously harm your overall performance.
pgbench=# SELECT schemaname, relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS idx_size FROM pg_stat_user_indexes; schemaname | relname | indexrelname | idx_scan | idx_size ------------+----------------------+-----------------+----------+----- public | pgbench_branches | pgbench_branches_pkey | 0 | 16 kB public | pgbench_tellers | pgbench_tellers_pkey | 0 | 32 kB public | pgbench_accounts | pgbench_accounts_pkey | 200000 | 107 MB public | test_indexing | idx_id | 0 | 107 MB
The output of this statement can be very useful. It contains information about how often an index was used and it also tells us how much space has been wasted for each index. It is difficult to come up with a general rule regarding when to drop an index, so some manual checking makes sense.
idx_tup_read vs idx_tup_fetch
There is a subtle but important difference between these two columns in pg_stat_user_indexes view.
idx_tup_read represents the number of index entries returned by index scans. However, idx_tup_fetch returns the number of live table rows fetched by index scans. Live rows highlight rows that are not yet committed or dead rows. This difference can point a valuable insight.
pgbench=# SELECT indexrelname, cast(idx_tup_read AS numeric) / idx_scan AS avg_tuples, idx_scan,idx_tup_read FROM pg_stat_user_indexes WHERE idx_scan > 0; indexrelname | avg_tuples | idx_scan | idx_tup_read ----------------------+------------+----------+------------- pgbench_accounts_pkey | 1.01016 | 200629 | 202668
The amount for avg_tuples is greater than 1. This indicates that, occasionally, a dead tuple is scanned.
Usually, it is appropriate to set up a monitoring system using pre-built external tools. However, knowing how to directly query and formulate interesting conclusions from the database activity can prove really valuable. Finally, monitoring activity and statistics lookup is a must, when it comes to troubleshooting system performance issues.