An introduction to OpenNMS as a large-scale operations support platform
Abstract: This note provides a brief introduction to the OpenNMS network management platform.
OpenNMS Overview
OpenNMS (www.opennms.org) is the world’s first enterprise-grade network management platform developed using the open source model.
Project Details
Out of the box, OpenNMS provides a very capable and flexible OSS solution offering equivalent and, in a number of areas, more advanced FCAPS[1] capabilities than many similar commercial off-the-shelf OSS systems, with the added benefits of the low capital cost, community support and the ease of adaptation and customisation which an open source project provides. After nearly 10 years of development, the OpenNMS project has a very active community and has over 4000 downloads a month from SourceForge. It is used by a number of large organisations including Swisscom, Wind Italy, Ocado, the BBC, MySpace and Papa John’s Pizza.
Commercial Support
All of the code for OpenNMS is freely downloadable under the GPL Licence. There are no hidden ‘extra for a price’ features. All users have access to the same code and can use as much or as little of the commercial support as they need.
From a commercial perspective, nobody sells the software (it’s free!) but commercial professional services such as 24×7 help desk support, custom development, integration and training are available through The OpenNMS Group Inc (www.opennms.com) and their partners.
Many users of OpenNMS find that the mutual community support for the platform is sufficient. However, larger organisations and those with more sophisticated integration or support needs can take advantage of the professional services provided by The OpenNMS Group Inc.
OpenNMS Problem Management Workflow
OpenNMS supports the workflow for resolving network problems in a number of ways. Firstly, a sophisticated event correlation capability allows network faults and service problems to be accurately identified. OpenNMS can record all event occurrences, but in the case of a recurring event (perhaps caused by a trap storm), OpenNMS uses an alarm management system to convert configurable ‘alarm raising events’ or ‘alarm clearing events’ into a manageable alarm lifecycle and event count. On first receiving an event, an alarm is raised. Subsequent events are counted against the alarm. A clearing event clears the alarm ready for a new raise event. This is the simplest use of the alarm list. However, user configured ‘automations’ can process the alarm list for more sophisticated event reduction or for alarm enrichment using external data sources. In addition, OpenNMS has a correlation module built around the open source JBoss Rules Rete correlation engine[2]. This can execute rules for more sophisticated downstream alarm suppression.
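The raise, count and clear lifecycle described above can be illustrated with a minimal Python sketch. It is not OpenNMS code: the Alarm and AlarmList classes, the on_event method and the idea of keying alarms by a "reduction key" string are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    """One alarm summarising repeated events that share a reduction key."""
    reduction_key: str
    count: int = 1
    cleared: bool = False


class AlarmList:
    """Hypothetical in-memory alarm list (illustrative, not the OpenNMS model)."""

    def __init__(self):
        self.alarms = {}

    def on_event(self, reduction_key, is_clear=False):
        alarm = self.alarms.get(reduction_key)
        if is_clear:
            # A clearing event clears the alarm, ready for a new raise event.
            if alarm is not None:
                alarm.cleared = True
            return alarm
        if alarm is None or alarm.cleared:
            # The first event (or the first after a clear) raises a fresh alarm.
            alarm = Alarm(reduction_key)
            self.alarms[reduction_key] = alarm
        else:
            # Subsequent events are only counted against the open alarm.
            alarm.count += 1
        return alarm
```

Under these assumptions, a trap storm of several hundred identical events leaves a single alarm whose count records how many events arrived, rather than several hundred entries for an operator to read.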
Secondly, OpenNMS supports a policy-driven problem notification and escalation mechanism across multiple groups of users, taking into account the users’ on-call schedules. If an event is detected which an administrator has configured as notifiable, this can generate a notification which is escalated over time through a list of users until it is acknowledged. Devices can also be scheduled as out of service for maintenance, during which time alarms from the device are not escalated.
The system can also drive external paging, emails or instant messages to attract attention to an escalated problem notification. If the basic escalation mechanism is not enough, OpenNMS also has a Generic Trouble Ticket interface which has been used to integrate with a number of trouble ticket systems including the open source trouble ticket implementations RT[3] and OTRS[4].
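As a rough illustration of escalation over time, the sketch below walks a list of (delay, user) steps and stops once the notification has been acknowledged. The function name notified_users, the shape of the escalation path and the on_call callback are hypothetical; OpenNMS configures its own notification paths differently.

```python
def notified_users(path, now, ack_time=None, on_call=None):
    """Return the users notified by time `now` (all times in seconds).

    path     -- list of (delay_seconds, user) escalation steps
    ack_time -- time at which the notification was acknowledged, if any
    on_call  -- optional callable (user, time) -> bool for on-call schedules
    """
    notified = []
    for delay, user in sorted(path):
        if delay > now:
            break  # this escalation step has not been reached yet
        if ack_time is not None and delay >= ack_time:
            break  # acknowledged before this step, so escalation stops
        if on_call is None or on_call(user, delay):
            notified.append(user)
    return notified
```

For example, with the path [(0, "alice"), (300, "bob"), (900, "carol")] and an acknowledgement at 400 seconds, only alice and bob are notified.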
OpenNMS Data Collection and Management
Out of the box, OpenNMS provides policy driven data collection, management and visualisation.
OpenNMS presents performance data as graphs. These graphs can also be exported in the form of performance reports. A number of users have successfully used OpenNMS data collection as a feed into their customer SLA accounting systems. In addition to collecting performance statistics, OpenNMS can generate threshold crossing events based on changes in the data. Services can be polled centrally or through a distributed group of remote pollers as described below.
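The idea of a threshold crossing event can be sketched as follows. This is an illustrative assumption about how a high threshold with a re-arm level might behave, not a description of the OpenNMS thresholding configuration; the function name threshold_events and its parameters are hypothetical.

```python
def threshold_events(samples, threshold, rearm):
    """Return (sample_index, event_name) pairs for a high threshold.

    An "exceeded" event fires when a sample reaches `threshold` while the
    threshold is armed. It cannot fire again until a sample falls to
    `rearm` or below, which produces a "rearmed" event.
    """
    events = []
    armed = True
    for index, value in enumerate(samples):
        if armed and value >= threshold:
            events.append((index, "exceeded"))
            armed = False
        elif not armed and value <= rearm:
            events.append((index, "rearmed"))
            armed = True
    return events
```

Keeping the re-arm level below the threshold stops a value that hovers around the threshold from generating a new event on every sample.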
A key service management feature of OpenNMS is its ability to perform policy directed polled synthetic transactions to test that a remote service is behaving as expected. Rather than just ‘ping’ a service, OpenNMS plug-ins can be configured to interrogate a remote service by simulating the actions of a client and then use the results to determine the service up-time and latency.
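A polled synthetic transaction can be thought of as a timed function call whose outcome decides whether the service counts as up. The sketch below assumes a hypothetical transaction callable supplied by the caller (for example, code that logs in to a web page); poll_service and availability are illustrative names, not part of the OpenNMS plug-in API.

```python
import time


def poll_service(transaction, timeout_seconds=5.0, clock=time.monotonic):
    """Run one synthetic transaction and return (is_up, latency_seconds).

    `transaction` is any zero-argument callable that acts like a client
    and returns a truthy value when the service behaved as expected.
    """
    start = clock()
    try:
        ok = bool(transaction())
    except Exception:
        # In this sketch, any failure while acting as the client is an outage.
        ok = False
    latency = clock() - start
    return (ok and latency <= timeout_seconds), latency


def availability(results):
    """Fraction of polls, given as (is_up, latency) pairs, that were up."""
    if not results:
        return 0.0
    return sum(1 for up, _ in results if up) / len(results)
```

Collecting the (is_up, latency) pairs over a reporting period gives both figures mentioned above: up-time as the availability fraction and latency from the recorded timings.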
An exciting new feature is the OpenNMS remote poller which runs on a remote client PC. It detects service outages from the perspective of the client and reports them back to a central OpenNMS application. This feature was first proven by a large healthcare provider in California, which uses it to ensure that remote doctors’ surgeries can continuously access their centrally hosted medical applications. The remote poller can be installed by a complete novice on a client PC using Java Web Start[5] technology simply by pointing a browser at the central OpenNMS installation. Once installed, the poller requests its configuration from the central system and begins reporting outages as experienced by the client PC. The OpenNMS project is currently investigating porting this technology to Java ME running on mobile phones in order to facilitate service monitoring for mobile networks.
In common with many other network management tools such as MRTG[6], Nagios[7] or Cricket[8], OpenNMS efficiently stores performance data in RRD files. It can use RRDTool[9] to do the storage, but the preferred library is JRobin[10], which is a Java implementation of RRD. However, unlike other tools using RRD, all of the scheduling of data collection is controlled by a policy engine within OpenNMS, which makes the solution very scalable. Data can be collected from a variety of sources including SNMP, ASCII Syslog messages, TL1, JMX and WMI. Polled synthetic transactions can even be used to log into a web page and retrieve data directly from the page. There is an API to allow new plug-ins to be written to collect data from other sources. There is also an integration with Nagios to allow OpenNMS to use Nagios plugins. OpenNMS has also been integrated with Snort[11] for security and network intrusion detection. OpenNMS has MIBs already installed for most large vendors’ equipment but users can add their own configurations. The user community often shares this work along with its experiences of integrating new equipment.
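For readers unfamiliar with round robin databases, the sketch below shows the general principle: a fixed number of rows, each holding the average of several raw samples, with the oldest row overwritten once the archive is full. It is a conceptual illustration only. The RoundRobinArchive class is hypothetical and does not reflect the RRDTool or JRobin file format or API.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive of averaged samples (conceptual sketch only)."""

    def __init__(self, rows, steps_per_row):
        # A deque with maxlen silently drops the oldest row when full,
        # so storage never grows beyond `rows` entries.
        self.rows = deque(maxlen=rows)
        self.steps_per_row = steps_per_row
        self._pending = []

    def update(self, value):
        """Add one raw sample; store an averaged row every `steps_per_row` samples."""
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            self.rows.append(sum(self._pending) / self.steps_per_row)
            self._pending = []
```

Because the storage size is fixed when the archive is created, disk usage does not grow however long data collection runs.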
OpenNMS Unified Configuration and Network Discovery
Given a set of IP address ranges, OpenNMS can self-discover the elements and services in a network and use policies to determine how to manage the services on each discovered device (such as deciding the appropriate scan rate to retrieve port performance data for a given service). The discovery system has recently been completely rewritten to meet service provider requirements and can now very rapidly discover the configuration of large core network devices having thousands of ports. OpenNMS provides several APIs to allow configuration from and reconciliation with an external Configuration Management Database. The latest API can also provide direct integration with RANCID[12]. The RANCID integration allows OpenNMS to be used as the front end for auditing all configuration changes and remote logins to network devices. Users can thus be restricted to accessing network devices only through the OpenNMS/RANCID front end.
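The first two steps of range based discovery, expanding an address range and choosing a policy for each address, can be sketched with the Python standard library. The function names, the (network, interval) policy format and the default interval are assumptions for illustration. A real discovery would also probe each address, which is omitted here.

```python
import ipaddress


def expand_range(first, last):
    """List every IP address from `first` to `last` inclusive."""
    start = int(ipaddress.ip_address(first))
    end = int(ipaddress.ip_address(last))
    return [str(ipaddress.ip_address(n)) for n in range(start, end + 1)]


def choose_poll_interval(address, policies, default=300):
    """Return the interval (seconds) of the first policy whose network
    contains `address`, or `default` when no policy matches.

    policies -- list of (cidr_network, interval_seconds), hypothetical format
    """
    ip = ipaddress.ip_address(address)
    for network, interval in policies:
        if ip in ipaddress.ip_network(network):
            return interval
    return default
```

With the policy list [("10.0.1.0/24", 60)], the address 10.0.1.5 would be polled every 60 seconds, while an address outside that network falls back to the default.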
OpenNMS Platform
Two major strengths of the OpenNMS project are, firstly, its use of a robust, coherent and extensible technology platform supporting a core model accessed using the Data Access Object design pattern and, secondly, its APIs for external integration. The internal OpenNMS model can now be accessed using external REST interfaces, which allow easy integration with other systems or scripts written in Perl, PHP or .NET.
The OpenNMS team uses test driven development supported by rigorous and comprehensive JUnit testing during the build process. Although nearly 10 years old, the OpenNMS project continues to be active in refreshing the technology of its core design. As an example, the latest release uses leading edge Java atomic transitions to dramatically increase the speed of asynchronous network discovery.
OpenNMS is written completely in Java (JDK 1.5+) and is designed around a unified component model implemented using Spring[13] and Hibernate[14]. It uses Jetty or Tomcat as its web server and PostgreSQL[15] for its database. OpenNMS can be installed on a single system or be distributed across several machines for performance in very large networks. Binary distributions exist for most major Unix/Linux distributions and for Microsoft Windows. Alternatively, users can download and build their own distribution.
Administration, such as the addition of new devices or the modification of policies, can be performed through the user interface. Less frequent configuration changes, such as trap to event/alarm mapping and MIB management, are performed by modifying a set of XML files contained within one directory. This requires no programming ability.
OpenNMS is a sophisticated OSS system with a surprisingly rich set of features for an open source project. Being an open source project keeps us honest. If it all seems too good to be true, we encourage you to get involved by downloading and trying out the features for yourself. To get started you can try the online demo at http://demo.opennms.org/opennms/.
Open Source and TM Forum Standards
An increasing number of large users of the OpenNMS project are asking for interfaces to other systems. OpenNMS has already implemented a number of proprietary APIs, but the project has long realised that it would benefit from a more consistent approach.
The TM Forum[16] specifies standardised Operations Support System interfaces for the Telecommunications industry. OpenNMS’s first foray into OSS standardisation occurred when the project implemented an experimental TM Forum OSS/J Quality of Service Interface[17]. At that time it took the contributors 6 months to create the OSS/J interface, mainly because, although some OSS/J example code was available, there were no production grade open source libraries one could leverage to speed up the process. We believe that this lack of easy-to-use open source code significantly slowed the industrial uptake of OSS/J. The exercise created an experimental standalone Apache 2 licensed OSS/J library which OpenNMS and other OSS platforms could use[18].
OSS/J and MTOSI have now been subsumed into the TM Forum’s Interface Program (TIP)[19] which will have a common technology framework. The TM Forum has started a public open source project called the TIP Implementation Open Source project[20] with the objective of developing software in support of the common interface framework. The OpenNMS project is now contributing its experience of running a successful open source project to TIP by becoming a founding member of the open source program alongside several other OSS vendors.
The primary reason for getting involved in the TIP program is to share the lessons of our OSS/J experience in order to create an Apache 2 licensed open source library which will make it much easier for all industry players to create TIP interfaces than it has historically been with either OSS/J or MTOSI. The project will be creating an interface library using automatic code generation from the SID model and leveraging the Cisco sponsored Eclipse Tigerstripe project[21]. This will allow vendors to extend TM Forum information models (or use their own proprietary models) to generate the core code for their interfaces. It should significantly reduce the time to market both for the TM Forum standardisation activities and for the creation of production interfaces.
The resulting libraries will be made freely available to the whole industry. The libraries will be used to create the TIP standard Reference Implementations and Compatibility Test Kits but they will also be designed for use in production solutions. Any OSS vendor, integrator or service provider will be free to use the library as they choose. The libraries will be developed completely separately from OpenNMS. However, just like any other OSS vendor supporting TIP, OpenNMS will be free to use the libraries to create TIP interfaces which are relevant to its user community.
© OpenNMS Group Inc 2010. This document is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License http://creativecommons.org/licenses/by-sa/3.0/
[1] FCAPS: Fault, Configuration, Accounting, Performance, Security
[2] JBoss Rules open source Rete algorithm correlation engine (http://www.jboss.org/drools/)
[3] RT Request Tracker (http://bestpractical.com/rt/)
[4] OTRS Open source Ticket Request System (http://otrs.org/)
[5] Java Web Start (http://java.sun.com/javase/technologies/desktop/javawebstart/index.jsp)
[6] MRTG (http://oss.oetiker.ch/mrtg/)
[7] Nagios (http://www.nagios.org/)
[8] Cricket (http://cricket.sourceforge.net/)
[9] RRDTool (http://oss.oetiker.ch/rrdtool/)
[10] JRobin (http://www.jrobin.org)
[11] Snort (http://www.snort.org/)
[12] RANCID, Really Awesome New Cisco Config Differ (http://www.shrubbery.net/rancid/)
[13] Spring (www.springframework.org)
[14] Hibernate (www.hibernate.org)
[15] PostgreSQL (http://www.postgresql.org)
[16] TM Forum (www.tmforum.org)
[17] JSR 90: OSS Quality of Service API
[18] OpenNMS experimental OSS/J Interface. See http://www.opennms.org/index.php/Dev-Jam:QosDocumentation. This experimental code took about 6 months to write. It would however need some refactoring to make it production quality code. This project is now being superseded by the TIP activities.
[19] TM Forum’s Interface Program (TIP) (http://www.tmforum.org/InterfaceProgram/5733/home.html)
[20] The TIP Open Source Implementation project is hosted on the OpenOSS site at http://openoss.sourceforge.net/
[21] Eclipse Tigerstripe project (http://www.eclipse.org/tigerstripe/)