Skip to main content

Postgres maintenance: vacuum hell, cluster to the rescue

I have an OpenNMS server (a PC really) sitting under a desk at a company that I don't hear from often, but I still get weekly backup mail-recaps.
This week the recap brought bad news: postgres stopped accepting connections until the db is vacuumed to prevent transaction id wraparound.

I have know of this issue of postgres for some time and planned accordingly: a cron job every month would run a full vacuum. Unfortunately in one month the db grew so much that vacuum never reached completion because the customer simply rebooted the server thinking that it was 'stuck'.

This vacuum-stuck-reboot had gone on for months until today. As I knew from prior experience this was going to be a painful experience: a quick du on the database folder reported 60GB of data. Vacuum would probably take days.

Luckily for me I had heard from the great Postgresql 9.0 high performance book that there is another tool in the toolbox called cluster.
So I decided to execute this plan this time:

  1. delete old events, alarms and notifications (by far the largest tables)
  2. cluster each table (including the 5 largest)
  3. vacuum full
Well, believe it or not the whole process took no longer that a couple of hours, most of which spent deleting records. I am not kidding you when I say that other times I have spent days waiting for vacuum to end.
And best of all the database shrunk from 60GB to 7.9GB! Most of the shrinking happened already after clustering.


Comments

Popular posts from this blog

Indexing Apache access logs with ELK (Elasticsearch+Logstash+Kibana)

Who said that grepping Apache logs has to be boring?

The truth is that, as Enteprise applications move to the browser too, Apache access logs are a gold mine, it does not matter what your role is: developer, support or sysadmin. If you are not mining them you are most likely missing out a ton of information and, probably, making the wrong decisions.
ELK (Elasticsearch, Logstash, Kibana) is a terrific, Open Source stack for visually analyzing Apache (or nginx) logs (but also any other timestamped data).

From 0 to ZFS replication in 5m with syncoid

The ZFS filesystem has many features that once you try them you can never go back. One of the lesser known is probably the support for replicating a zfs filesystem by sending the changes over the network with zfs send/receive.
Technically the filesystem changes don't even need to be sent over a network: you could as well dump them on a removable disk, then receive  from the same removable disk.

Detect missed executions with OpenNMS

Everyone knows that OpenNMS is a powerful monitoring solution, but not everyone knows that since version 1.10 circa it embeds the Drools rule processing engine. Drools programs can then be used to extend the event handling logic in new and powerful ways.

The following example shows how OpenNMS can be extended to detect missed executions for recurring activities like backups or scheduled jobs.