Skip to main content

OpenNMS: PostgreSQL 9.1 tuning

I have just completed an upgrade from OpenNMS 1.8.11 to the latest and greatest 1.10. The upgrade in itself is easy and the guides on the OpenNMS wiki will serve you well. Instead in this post I'll describe a couple of other changes that I made which improved very much the overall performance and responsiveness of the system.

One is the upgrade from PostgreSQL 8.4 (which came with CentOS) to 9.1 + tuning.
The other is switching from apache to nginx.

Upgrading postgres is mostly a matter of taking a backup, pulling in the right repo, running yum install and finally importing the database.

Tuning postgres
I left opennms running on PostgreSQL 9.1 for a while and then I went checking how well postgres was doing. Postgres 9 already performs significantly better that its 8.x predecessors, but I wanted to do better than out-of-the-box.

As the postgres user I logged in into the opennms database to install a utility that will help me estimate how much of the database is being cached. If the server had enough memory I could then configure postgres to use more memory for caching which means less disk i/o and overall better performance (unless your server starts thrashing, that is).

CREATE EXTENSION pg_buffercache;
create view v_database_cache as SELECT c.relname,pg_size_pretty(count(*) * 8192) as buffered,round(100.0 * count(*) /(SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer,1) AS buffers_percent,round(100.0 * count(*) * 8192 /pg_relation_size(c.oid),1) AS percent_of_relation 
FROM pg_class c INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.oid,c.relname ORDER BY 3 DESC LIMIT 10;

The second command creates a view as a placeholder for the complex query behind it. The query is documented in the (highly recommended) book PostgreSQL 9.0 High performance.
I then queried the database for its size as follows:

opennms=# select pg_size_pretty(pg_database_size('opennms')) as db_size;
 691 MB
(1 row)

Since the whole database is a little less than 700MB and the server has 4GB of RAM I thought that I could raise the shared_buffers param from the measly 32MB that is the default to something more appropriate like 512MB. After that I restarted both opennms and postgres and waited for a while to make sure that the server wasn't swapping. Two days after I queried the view I created above to see how much of the database was in the cache (which means in RAM):

opennms=# select * from v_database_cache;
            relname            | buffered | buffers_percent | percent_of_relation 
 events                        | 266 MB   |            52.0 |                64.5
 event_archives                | 22 MB    |             4.3 |               100.0
 notifications                 | 20 MB    |             3.9 |               100.1
 events_nodeid_display_ackuser | 5616 kB  |             1.1 |                33.2
 outages                       | 5000 kB  |             1.0 |               100.3
 events_ipaddr_idx             | 4496 kB  |             0.9 |                27.5
 events_nodeid_idx             | 4456 kB  |             0.8 |                36.4
 events_uei_idx                | 3648 kB  |             0.7 |                10.5
 iprouteinterface              | 2096 kB  |             0.4 |               101.6
 events_time_idx               | 2256 kB  |             0.4 |                20.0
(10 rows)

Looks good: the events table (the largest by far) and its indexes are mostly cached into RAM!
I could try to raise the shared_buffers value to a number larger than the database size, but since the most expensive relation and its indexes are for more than half cached and since postgres is already using lots of other memory besides shared_buffers I left it as it is.
Performance and responsiveness of the UI improved a lot. I/O wait went down too by a 2-4%.

Swapping NGINX in for Apache
I did it mostly to save CPU and RAM because the amount of resources that even a lightly loaded apache can consume is astounding. Don't take my word for it: google it!
This one's easy too: head to the download page and grab the rpm for your distro.

The net result of the above changes is that the UI is now faster to load and page transitions are smooth without eccessive waiting, even on the node detail page for a system with lots of events.

Hope this helps!


Popular posts from this blog

Indexing Apache access logs with ELK (Elasticsearch+Logstash+Kibana)

Who said that grepping Apache logs has to be boring?

The truth is that, as Enteprise applications move to the browser too, Apache access logs are a gold mine, it does not matter what your role is: developer, support or sysadmin. If you are not mining them you are most likely missing out a ton of information and, probably, making the wrong decisions.
ELK (Elasticsearch, Logstash, Kibana) is a terrific, Open Source stack for visually analyzing Apache (or nginx) logs (but also any other timestamped data).

From 0 to ZFS replication in 5m with syncoid

The ZFS filesystem has many features that once you try them you can never go back. One of the lesser known is probably the support for replicating a zfs filesystem by sending the changes over the network with zfs send/receive.
Technically the filesystem changes don't even need to be sent over a network: you could as well dump them on a removable disk, then receive  from the same removable disk.

Detect missed executions with OpenNMS

Everyone knows that OpenNMS is a powerful monitoring solution, but not everyone knows that since version 1.10 circa it embeds the Drools rule processing engine. Drools programs can then be used to extend the event handling logic in new and powerful ways.

The following example shows how OpenNMS can be extended to detect missed executions for recurring activities like backups or scheduled jobs.