Importance of PostgreSQL Vacuum Tuning and Custom Scheduled Vacuum Job
source link: https://www.percona.com/blog/importance-of-postgresql-vacuum-tuning-and-custom-scheduled-vacuum-job/
PostgreSQL’s built-in autovacuum, the housekeeper, keeps improving version after version. It is becoming more capable while reducing its overhead and addressing edge cases. I cannot think of a PostgreSQL release that came out without some autovacuum improvement, and no doubt it is good enough for the majority of use cases.
But it is still far from a perfect fit for every specific environment. While working with many customer environments, we keep seeing cases where the built-in logic is not sufficient. As I keep making such fixes for Percona customers, I thought of noting down the important points for everyone.
Common Problems/Limitations
- Tables become candidates for autovacuum during peak hours.
The autovacuum settings are based on scale factors/thresholds. The chance that a table crosses these limits is highest when there is a high number of transactions on it, which means peak hours. Effectively, autovacuum kicks in at exactly the wrong time.
- Starving tables.
It is very common to see a few tables become candidates for vacuuming too frequently and occupy all the workers repeatedly, while other tables further down the candidate list remain unvacuumed for a long time. The current autovacuum logic is not sufficient to understand which tables have starved the longest and give them better priority.
- No way to control the throttle of autovacuum workers dynamically.
This is probably the worst one. Even an informed DBA who wants to adjust autovacuum_vacuum_cost_limit based on need or a time window and signal PostgreSQL, for example:
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();
will find that this has no effect on the currently running autovacuum workers. Only workers that start after the reload consider the new setting, so this cannot be used to address the problem.
- Attempts by DBAs to tune parameters often backfire.
After seeing aged tables and starving tables, desperate DBAs apply aggressive settings and a higher number of workers. This often pushes the system well beyond its limits, because everything happens at the wrong time with high aggression while the system already has a high number of active sessions. Multiplied by the maintenance_work_mem allocation of each worker, system performance suffers to a great extent. The worst I have seen is autovacuum workers occupying up to 50% of server resources.
- Autovacuum during the active time window defeats its own purpose.
An autovacuum worker will be referring to an old xid/snapshot if it takes a long time to complete during a high-activity window. So effectively it will not clean the dead tuples generated during that same period, which is against the very purpose of autovacuum.
- Starved tables trigger wraparound-prevention autovacuum.
It is very common for tables that have been starved of autovacuum for a long time to reach autovacuum_freeze_max_age, at which point an aggressive wraparound-prevention vacuum gets triggered.
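One way to spot starving tables is to look at when each table was last autovacuumed and how many dead tuples it has accumulated. The query below is an illustrative sketch (not from the original article) using only standard pg_stat_all_tables columns:

```sql
-- Illustrative: tables that have waited longest for autovacuum,
-- with their dead-tuple counts. NULLs sort first, i.e. never autovacuumed.
SELECT schemaname, relname,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_all_tables
ORDER BY last_autovacuum ASC NULLS FIRST,
         n_dead_tup DESC
LIMIT 20;
```

Tables at the top of this list with large n_dead_tup values are the ones the built-in prioritization has been neglecting.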
Due to such ineffectiveness, we keep seeing DBAs tempted to disable autovacuum altogether, inviting a bigger set of problems and even outages. At the very least, my request to anyone who is new to PostgreSQL is: please never turn off autovacuum. That is not the way to address autovacuum-related issues.
Tuning Autovacuum
Tuning autovacuum is obviously the first line of action.
Global level settings
The parameters autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay are the two main knobs for controlling the throttle of the autovacuum workers, while autovacuum_max_workers controls how many workers run at a time on different tables. By default, autovacuum_vacuum_cost_limit is disabled (-1), which means the value of the other parameter, vacuum_cost_limit, is in effect. So the very first suggestion is to set a value for autovacuum_vacuum_cost_limit, which lets us control the autovacuum workers independently.
One common mistake I see across many installations is autovacuum_max_workers set to a very high value like 15, assuming that this makes autovacuum run faster. Please remember that autovacuum_vacuum_cost_limit is divided among all active workers: the higher the number of workers, the slower each worker runs. And slower workers mean ineffective cleanup, as mentioned above. Moreover, each worker can occupy up to maintenance_work_mem of memory.
In general, the default value of autovacuum_max_workers, which is 3, will be sufficient. Please consider increasing it only if it is an absolute necessity.
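Putting these global settings together might look like the following sketch; the values are illustrative starting points only and must be tuned for each system:

```sql
-- Illustrative values, not recommendations for any specific workload.
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;  -- budget shared by all active workers
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms'; -- sleep once the budget is consumed
ALTER SYSTEM SET autovacuum_max_workers = 3;           -- default; changing this requires a server restart
SELECT pg_reload_conf();
```

Note that pg_reload_conf() applies the cost settings only to workers started after the reload, and autovacuum_max_workers takes effect only after a full restart.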
Table level analysis settings
A blanket tuning at the instance level might not work well for at least a few tables. These outliers need special treatment, and tuning the settings at the table level may become unavoidable. I would start with the tables that become autovacuum candidates too frequently.
The PostgreSQL logs, with the log_autovacuum_min_duration setting, give great detail about the tables that frequently become candidates and about the autovacuum runs that took considerable time and effort. Personally, I prefer this as the starting point. A summary of autovacuum runs can also be obtained by comparing the autovacuum_count of pg_stat_all_tables taken at two different timestamps. Another thing to consider is HOT (Heap Only Tuple) updates and fillfactor. HOT update information can be analyzed using the n_tup_hot_upd column of the same view (pg_stat_all_tables); tuning fillfactor accordingly can bring down the vacuum requirements drastically.
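One way to gauge HOT effectiveness is to compare total updates against HOT updates per table. The query below is an illustrative sketch using only standard pg_stat_all_tables columns:

```sql
-- Illustrative: HOT update ratio per table. A low ratio on an
-- update-heavy table suggests that lowering fillfactor may help.
SELECT schemaname, relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_ratio_pct
FROM pg_stat_all_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC
LIMIT 20;
```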
Equipped with all this analysis, specific table-level settings can be adjusted. For example:
alter table t1 set (autovacuum_vacuum_scale_factor=0.0, autovacuum_vacuum_threshold=130000, autovacuum_analyze_scale_factor=0.0, autovacuum_analyze_threshold=630000, autovacuum_enabled=true, fillfactor=82);
Supplementary Scheduled Vacuum Job
Our aim is not to disable autovacuum, but to supplement it with our knowledge about the system. It need not be complex at all. The simplest approach is to run a VACUUM FREEZE on the tables with the highest age, either of the table itself or of its TOAST table.
For example, we can have a vacuumjob.sql
file with the following content:
WITH cur_vaccs AS (
    SELECT split_part(split_part(substring(query from '.*\..*'), '.', 2), ' ', 1) AS tab
    FROM pg_stat_activity
    WHERE query LIKE 'autovacuum%'
)
SELECT 'VACUUM FREEZE "' || n.nspname || '"."' || c.relname || '";'
FROM pg_class c
INNER JOIN pg_namespace n ON c.relnamespace = n.oid
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid AND t.relkind = 't'
WHERE c.relkind IN ('r', 'm')
  AND NOT EXISTS (SELECT * FROM cur_vaccs WHERE tab = c.relname)
ORDER BY GREATEST(age(c.relfrozenxid), age(t.relfrozenxid)) DESC
LIMIT 100;
\gexec
The query selects the 100 most-aged tables that are not currently undergoing autovacuum and runs a VACUUM FREEZE on each of them. (The \gexec at the end executes each line of the query output as a statement.)
This can be scheduled using cron for a low-activity window, for example:
20 11 * * * /full/path/to/psql -X -f /path/to/vacuumjob.sql > /tmp/vacuumjob.out 2>&1
If there are multiple low-impact windows, all of them can be used with multiple schedules.
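For instance, assuming two quiet windows (the times, output file names, and paths here are illustrative), the crontab could contain two entries pointing at the same script:

```
# Illustrative crontab entries: run the same vacuum job in two low-activity windows
20 2  * * * /full/path/to/psql -X -f /path/to/vacuumjob.sql > /tmp/vacuumjob_night.out 2>&1
20 14 * * * /full/path/to/psql -X -f /path/to/vacuumjob.sql > /tmp/vacuumjob_noon.out 2>&1
```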
In practice, we have seen that supplementary scheduled vacuum jobs based on this table-age approach have the following positive effects:
- The chance of those tables becoming candidates again during peak times is drastically reduced.
- Server resources are utilized very effectively during off-peak times for the vacuum and freeze operations.
- Since the candidates are selected on completely different criteria (age of the table) than the defaults (scale factor and threshold), the chance of a few tables starving forever is eliminated. It also removes the possibility of the same table becoming a vacuum candidate again and again.
- In customer/user environments, the wraparound-prevention autovacuum is almost never reported again.
Summary
It is not rare to see systems where autovacuum remains untuned or where poor settings are used, from the instance level down to the table level. To summarize:
- Default settings may not work well in many systems. Repeated autovacuum runs on a few tables while other tables starve for autovacuum is very common.
- Poor settings can result in autovacuum workers consuming a considerable share of server resources with little gain.
- Autovacuum has a natural tendency to start at the wrong time, when the system is undergoing heavy transactions.
- Practically, a scheduled vacuum job becomes necessary for systems that undergo heavy transactions across a large number of tables and are expected to have spikes and peak-time periods of load.
Clear analysis and tuning are important. And it is always highly recommended to have a custom vacuum job that applies your knowledge of the system and its windows of least impact.