Admin-friendly Nagios-Plugins for NetApp
Nagios-Plugins for NetApp
The monitoring framework Nagios alerts users when systems deviate from normal operation and records measured values over the long term. Nagios is highly flexible and is suitable for monitoring complex and heterogeneous IT infrastructures.
Nagios-Plugins for NetApp are a suite of professionally developed and tested check scripts that enable extensive monitoring of NetApp devices (or the IBM N series) by Nagios.
Customer Satisfaction
Test versions are sold to selected customers for production use even before the stable version is released. None of our customers has ever returned the plugins during the trial period of several months.
0% Returns - 100% Customer Satisfaction
(since February 2009)
Unique Features
- Simple, uniform authentication and data-retrieval (HTTP/XML). We don't use SNMP for obvious reasons.
- Simple and uniform user-interface and output
- Implementation- and maintenance-friendly, i.e. a stable configuration in Nagios
- A comprehensive, constantly updated variety of monitored values (usage, lag-time, snap reserve, SnapVaults, SnapMirrors, cluster, hardware, ops, transfer-rate, ...)
- Quick and simple reaction to feature requests (see under References)
- Long term planning and development (see under Roadmap).
Development objectives
Our plugins are developed in collaboration with our clients to ensure a quick and stable implementation.
Plugins developed and tested in a professional environment save implementation and maintenance time.
As a result we put a lot of effort into the planning, development and test phases – to make implementation as easy as possible. Our plugins can be embedded and maintained in Nagios with minimum effort. The following features illustrate this concept:
- The plugins authenticate themselves on the NetApp filers with a username and password. The credentials are stored in a text file that all plugins can share, and they are transmitted encrypted.
- All plugins request their data from the NetApp API via HTTPS. This removes the need for firewall configuration for SNMP, SSH or other protocols.
- Multiple instances (aggregates, volumes, SnapMirrors, …) are determined and monitored dynamically by the plugins. Listing, adding or deleting instances by hand (for example volumes for usage monitoring) is therefore no longer needed. This saves valuable time during implementation and keeps the monitoring stable over time.
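The dynamic handling of multiple instances can be illustrated with a minimal Python sketch. It is not the plugins' actual code: the function name `check_volumes`, the thresholds and the volume names are assumptions made for illustration, and the usage figures are passed in directly instead of being fetched from the filer. Only the exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) follow the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_volumes(usage_percent, warn=80.0, crit=90.0):
    """Evaluate every volume reported in this run; the worst state wins.

    usage_percent: dict mapping volume name -> used space in percent.
    A real plugin would obtain this from the filer at run-time; here it
    is hypothetical data passed in by the caller.
    """
    worst = OK
    problems = []
    for name, used in sorted(usage_percent.items()):
        if used >= crit:
            state = CRITICAL
        elif used >= warn:
            state = WARNING
        else:
            state = OK
        if state != OK:
            problems.append(f"{name}={used:.0f}%")
        worst = max(worst, state)
    return worst, problems


# Hypothetical inventories: a volume appears on the second day,
# yet the call itself does not change.
monday = {"vol0": 41.0, "vol_home": 72.5}
tuesday = {"vol0": 41.0, "vol_home": 85.0, "vol_new": 93.0}

print(check_volumes(monday))
print(check_volumes(tuesday))
```

Because the set of volumes is an input determined at run-time and not part of the configuration, a single service definition in Nagios can stay unchanged while volumes come and go.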
Overall-Checks (dynamic identification of multiple instances)
This can be of interest, for example, for volumes, aggregates and disks, but also for hardware checks: adding service checks manually for each volume or network device is tedious and error-prone, especially since monitoring has to be constantly adapted to changing circumstances.
Checks that automatically recognize which instances have been added or removed are far better. For example, the command
$ check_netapp_disk.pl -H toaster -u nagios%mypass
checks the status of all existing disks on the NetApp filer at run-time. The status is then displayed very clearly in a single line.
One click on disks then displays all further details to the admin.
As soon as the check identifies a faulty disk on the NetApp-Filer the display changes to:
The details as well as the faulty disks are displayed together with the reason for the failure.
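How such an overall check might condense many disks into one status line plus details can be sketched in Python as follows. This is a hedged illustration only: the function `summarize_disks`, the disk names, the failure reason and the exact wording of the output are assumptions, not the real output of check_netapp_disk.pl.

```python
def summarize_disks(disks):
    """Build a one-line summary plus detail lines for faulty disks.

    disks: list of (name, state, reason) tuples, where state is "ok"
    or anything else for a faulty disk (hypothetical data model).
    Returns (exit_code, text) using the Nagios convention
    0 = OK, 2 = CRITICAL.
    """
    failed = [(name, reason) for name, state, reason in disks if state != "ok"]
    if failed:
        code = 2
        first_line = f"DISK CRITICAL - {len(failed)} of {len(disks)} disks faulty"
    else:
        code = 0
        first_line = f"DISK OK - {len(disks)} disks checked"
    # Detail lines follow the summary line, one per faulty disk.
    details = [f"{name}: {reason}" for name, reason in failed]
    return code, "\n".join([first_line] + details)


# Hypothetical inventory as it might be reported at run-time.
healthy = [("0a.16", "ok", ""), ("0a.17", "ok", "")]
degraded = [("0a.16", "ok", ""), ("0a.17", "broken", "admin failed")]

print(summarize_disks(healthy)[1])
print(summarize_disks(degraded)[1])
```

A real plugin would print the text and then terminate with the exit code, which is how Nagios decides the state shown for the service.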
Scope of services of Nagios-Plugins for NetApp
As of today (April 2012) we have developed 24 Nagios-plugins for monitoring a variety of aspects of a NetApp-device. These can be divided into the following groups:
- Bundles: Bundles of several plugins.
- Caches: Buffer Cache, FlashCache and FlexCache
- Hardware: Broken disks, temperature, cooling devices, power supplies, NVRAM
- Management: High-level overview for decision-makers
- Network: Stats per interface (ifnet): Bytes and packets read and sent, errors per second, multicasts, collisions, ...
- Performance: Operations per second (HTTP, CIFS, ...), transfer-rate (network, disks, ...), utilization in % (processor, disk), performance per volume (latency, ops)
- Snap: Available snap-size, lag-time and transfer-errors from SnapMirrors and SnapVaults, utilization of the snap-reserve
- Storage: Available space for aggregates and volumes, quotas
- Status: Global System-Status, cluster-status, status (online/offline) of the iSCSI-adapters, RAID-status of aggregates and volumes
- Other: Other tools, modules and documentation
A short description of features for each plugin can be found here: [PDF] check_netapp: feature description.
