I'm currently evaluating monitoring software for what is, by my standards, a large network, expected to grow to around 3000 devices. I'm finding data on the hardware requirements for scaling hard to come by. (Edit: the devices are satellite receivers monitored by SNMP, so we need an agentless monitor. Our main concern is to identify failing devices, and we don't need a great deal of analysis.)
The 3000 devices will have about 40 data points each, logged on a cycle of 5 to 10 minutes. That is 120,000 data points per cycle, so at a 10 minute polling interval it's 12,000 points per minute (24,000 at 5 minutes). That creates two sorts of load: CPU load for the polling application and, most critically, disk write load to store those data points.
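As a rough sanity check on those numbers, here is a minimal sketch of the arithmetic. The function name is just illustrative, and it assumes polls are spread evenly across the interval, which a real poller may not do:

```python
def polling_load(devices, points_per_device, interval_minutes):
    """Return (points per cycle, points per minute, points per second).

    Assumes polling is spread evenly over the interval (an idealisation).
    """
    per_cycle = devices * points_per_device
    per_minute = per_cycle / interval_minutes
    per_second = per_minute / 60
    return per_cycle, per_minute, per_second


# Figures from the question: 3000 devices, about 40 data points each.
for interval in (5, 10):
    per_cycle, per_minute, per_second = polling_load(3000, 40, interval)
    print(f"{interval:>2} min interval: {per_cycle} points/cycle, "
          f"{per_minute:.0f} points/min, {per_second:.0f} points/s")
```

So the sustained write rate the storage back end has to absorb is somewhere around 200 to 400 data point updates per second, before any burstiness from uneven polling.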
I've looked at Solarwinds Orion, Zenoss, Zabbix, and OpenNMS. We have experience of Zenoss and Orion on smaller networks of a few hundred devices. My initial impressions are:
- Zenoss doesn't have a very efficient RRD implementation, but allows us to scale horizontally by adding collectors, which store RRD data locally.
- Orion allows us to add polling engines, but requires a shared SQL server for the performance data.
- Zabbix claims to scale to this level, but I've not found any useful guidance. As it uses a database for performance data, database tuning is key.
- OpenNMS looks like the performance leader, due to an optimized RRD implementation and support for grouping.
Does anybody have experience or performance data for monitoring this scale of network?