I’m _still_ trying to get our 1.3.2 installation finished up and ready to go. Whilst I’ve been doing that I’ve been giving a lot of thought to OpenNMS performance issues. I probably ought to write this up in the wiki, but in the meantime a blog entry will have to do.
Like most OpenNMS users, we started out small, on a shared machine, and as we grew to rely on OpenNMS more and more, we got a dedicated machine. Then, when it didn’t perform as well as we expected, we got around to installing more disks, then a disk array, then space on a NetApp filer, then (the next logical step perhaps) a separate SATA tray for said NetApp filer.
I guess my point is that in an enterprise setting, OpenNMS requires enterprise grade hardware. A very frequent question on opennms-discuss goes along the lines of “the web UI is slow”, or “My OpenNMS machine has a load average of 10 – help!”, or “the daemon is running out of heap”. These can almost always be tracked down to an IO bottleneck.
Back in the day, when I worked for EDS, I was involved in a (failed) Unicenter TNG implementation. The biggest overhead in that piece of work was DBA involvement. The back end was Oracle, and the first order of the day was OFA (Optimal Flexible Architecture). This meant at least four mount points for the database, all on separate spindles. That was just a starter for ten. The CA consultants recommended an Intergraph workstation, and we were in real danger of spending more on monitoring hardware than we had on the boxes that ran the applications. It was only thanks to CA and EDS’s global software licensing agreement that we didn’t have a software spend in excess of the GDP of Belgium to go with the hardware costs.
Now, with OpenNMS, I’m seeing the opposite. We can get enterprise grade software that runs on a laptop. Open Source advocates are used to introducing software below the IT department’s radar on borrowed or “too slow for (insert big commercial product here)” cast-offs. When we outgrow those cast-offs we try to scale horizontally.
OpenNMS isn’t quite like that though. Part of its strength is that it will go out there and collect a whole bunch of performance and response data with minimal configuration. That data has got to be written somewhere. It will also happily attempt to swallow and store away vast quantities of SNMP traps, and the resulting events consume tablespace. With the best will in the world, it’s hard to scale a database horizontally.
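To put a rough number on "that data has got to be written somewhere", here is a back-of-envelope sketch. It assumes, purely for illustration, that every collected metric results in one data file update per collection interval. The function name and all the figures (500 nodes, 24 interfaces each, 10 metrics per interface, a 300 second interval) are made up for the example and are not measurements from our installation.

```python
def updates_per_second(nodes, interfaces_per_node, metrics_per_interface, interval_seconds):
    """Average data file updates per second.

    Assumption: each metric causes exactly one file update per interval.
    """
    files = nodes * interfaces_per_node * metrics_per_interface
    return files / interval_seconds


# Hypothetical figures, chosen only to show the scale of the problem.
rate = updates_per_second(
    nodes=500,
    interfaces_per_node=24,
    metrics_per_interface=10,
    interval_seconds=300,
)
print(f"{rate:.0f} updates per second")
```

Even with modest made-up inputs like these, the result is a steady stream of small writes, which is exactly the kind of load that a single shared disk handles badly.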
I guess this is a homily on realism. OpenNMS can, in a lot of situations, replace commercial products. As it has a lot less feature bloat, it will probably require less in the way of hardware resources. We still need to be realistic though, especially with regard to storage and backup. I know that the whole Capex vs Opex argument can skew management’s viewpoint regarding hardware to run OpenNMS. It is, however, up to us, as the IT professionals in charge, to set realistic expectations regarding the kind of hardware spend OpenNMS requires.
The written body of knowledge on scaling up OpenNMS is small right now. I guess we should be addressing this.