One of the nice aspects of OpenNMS is that, out of the box, it collects a lot of data from most SNMP-enabled resources. The downside is that this collection is I/O heavy (in terms of IOPS, not throughput).
Even on moderate installations with a few hundred nodes, it is enough to swamp the fastest disk subsystems (except for those whose controllers are backed by large write caches). A typical symptom is high I/O wait on the OpenNMS box itself.
|I/O wait before and after switching the JRobin backend from FILE to MNIO|
The graph above shows the I/O wait on a RAID 10 array of four 15K RPM drives, storing RRD data from approximately 300 nodes sampled at the usual 5-minute interval.
The I/O wait was constantly at 30% (and at 70% before I applied some PostgreSQL tuning).
As the graph shows, I/O wait fell sharply after 7:30, when I applied a simple change to JRobin, the OpenNMS subsystem responsible for reading and writing RRDs.
The change involves using an alternative I/O strategy called MNIO instead of the default, FILE. It requires editing just one properties file, followed by a restart of OpenNMS.
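As a rough sketch of what that edit looks like: on a stock install, the JRobin settings normally live in rrd-configuration.properties under the OpenNMS etc directory, and the backend is selected by the org.jrobin.core.RrdBackendFactory key. Both the file location and the key name are assumptions here, so check them against your own version before changing anything.

```properties
# $OPENNMS_HOME/etc/rrd-configuration.properties  (path assumed; verify on your install)

# Default backend: plain file I/O for every RRD update
#org.jrobin.core.RrdBackendFactory=FILE

# Alternative backend described in this post
org.jrobin.core.RrdBackendFactory=MNIO
```

Keeping the old line commented out makes it easy to revert if the new backend misbehaves. The setting is only read at startup, hence the restart.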
The box has been running with the new setting for several days now, without errors and with excellent performance. On the mailing list, someone reported running successfully with MNIO for years.