<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN" > <book> <title>Nagios Plug-in Developer Guidelines</title> <bookinfo> <authorgroup> <author> <affiliation> <orgname>Nagios Plugins Development Team</orgname> </affiliation> </author> </authorgroup> <pubdate>2006</pubdate> <title>Nagios plug-in development guidelines</title> <revhistory> <revision> <revnumber>$Revision$</revnumber> <date>$Date$</date> </revision> </revhistory> <copyright> <year>2000 - 2006</year> <holder>Nagios Plugins Development Team</holder> </copyright> </bookinfo> <preface id="preface"><title>Preface</title> <para>The purpose of this guidelines is to provide a reference for the plug-in developers and encourage the standarization of the different kind of plug-ins: C, shell, perl, python, etc.</para> <para>Nagios Plug-in Development Guidelines Copyright (C) 2000-2006 (Nagios Plugins Team)</para> <para>Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.</para> <para>The plugins themselves are copyrighted by their respective authors.</para> </preface> <article> <section id="DevRequirements"><title>Development platform requirements</title> <para> Nagios plugins are developed to the GNU standard, so any OS which is supported by GNU should run the plugins. While the requirements for compiling the Nagios plugins release is very small, to develop from CVS needs additional software to be installed. These are the minimum levels of software required: <literallayout> gnu make 3.79 automake 1.9.2 autoconf 2.58 gnu m4 1.4.2 </literallayout> To compile from CVS, after you have checked out the code, run: <literallayout> tools/setup ./configure make make install </literallayout> </para> <para>Note: gettext is no longer a developer platform requirement. A lot of the files in lib/ and m4/ are synced with the coreutils project and we use the same levels of gettext that they distribute. </para> <para>Note: gnu libtool, which must be at version 1.5.22 or above, has files installed into CVS, so is not a development platform requirement. </para> </section> <section id="PlugOutput"><title>Plugin Output for Nagios</title> <para>You should always print something to STDOUT that tells if the service is working or why it is failing. Try to keep the output short - probably less that 80 characters. Remember that you ideally would like the entire output to appear in a pager message, which will get chopped off after a certain length.</para> <section><title>Print only one line of text</title> <para>Nagios will only grab the first line of text from STDOUT when it notifies contacts about potential problems. If you print multiple lines, you're out of luck. Remember, keep it short and to the point.</para> <para>Output should be in the format:</para> <literallayout> SERVICE STATUS: Information text </literallayout> <para>However, note that this is not a requirement of the API, so you cannot depend on this being an accurate reflection of the status of the service - the status should always be determined by the return code.</para> </section> <section><title>Verbose output</title> <para>Use the -v flag for verbose output. You should allow multiple -v options for additional verbosity, up to a maximum of 3. The standard type of output should be:</para> <table id="verboselevels"><title>Verbose output levels</title> <tgroup cols="2"> <thead> <row> <entry><para>Verbosity level</para></entry> <entry><para>Type of output</para></entry> </row> </thead> <tbody> <row> <entry align="center"><para>0</para></entry> <entry><para>Single line, minimal output. Summary</para></entry> </row> <row> <entry align="center"><para>1</para></entry> <entry><para>Single line, additional information (eg list processes that fail)</para></entry> </row> <row> <entry align="center"><para>2</para></entry> <entry><para>Multi line, configuration debug output (eg ps command used)</para></entry> </row> <row> <entry align="center"><para>3</para></entry> <entry><para>Lots of detail for plugin problem diagnosis</para></entry> </row> </tbody> </tgroup> </table> </section> <section><title>Screen Output</title> <para>The plug-in should print the diagnostic and just the synopsis part of the help message. A well written plugin would then have --help as a way to get the verbose help.</para> <para>Code and output should try to respect the 80x25 size of a crt (remember when fixing stuff in the server room!)</para> </section> <section><title>Plugin Return Codes</title> <para>The return codes below are based on the POSIX spec of returning a positive value. Netsaint prior to v0.0.7 supported non-POSIX compliant return code of "-1" for unknown. Nagios supports POSIX return codes by default.</para> <para>Note: Some plugins will on occasion print on STDOUT that an error occurred and error code is 138 or 255 or some such number. These are usually caused by plugins using system commands and having not enough checks to catch unexpected output. Developers should include a default catch-all for system command output that returns an UNKNOWN return code.</para> <table id="ReturnCodes"><title>Plugin Return Codes</title> <tgroup cols="3"> <thead> <row> <entry><para>Numeric Value</para></entry> <entry><para>Service Status</para></entry> <entry><para>Status Description</para></entry> </row> </thead> <tbody> <row> <entry align="center"><para>0</para></entry> <entry valign="middle"><para>OK</para></entry> <entry><para>The plugin was able to check the service and it appeared to be functioning properly</para></entry> </row> <row> <entry align="center"><para>1</para></entry> <entry valign="middle"><para>Warning</para></entry> <entry><para>The plugin was able to check the service, but it appeared to be above some "warning" threshold or did not appear to be working properly</para></entry> </row> <row> <entry align="center"><para>2</para></entry> <entry valign="middle"><para>Critical</para></entry> <entry><para>The plugin detected that either the service was not running or it was above some "critical" threshold</para></entry> </row> <row> <entry align="center"><para>3</para></entry> <entry valign="middle"><para>Unknown</para></entry> <entry><para>Invalid command line arguments were supplied to the plugin or low-level failures internal to the plugin (such as unable to fork, or open a tcp socket) that prevent it from performing the specified operation. Higher-level errors (such as name resolution errors, socket timeouts, etc) are outside of the control of plugins and should generally NOT be reported as UNKNOWN states. </para></entry> </row> </tbody> </tgroup> </table> </section> <section id="thresholdformat"><title>Threshold and ranges</title> <para>A range is defined as a start and end point (inclusive) on a numeric scale (possibly negative or positive infinity). </para> <para>A threshold is a range with an alert level (either warning or critical). Use the set_thresholds(thresholds *, char *, char *) function to set the thresholds. </para> <para>The theory is that the plugin will do some sort of check which returns back a numerical value, or metric, which is then compared to the warning and critical thresholds. Use the get_status(double, thresholds *) function to compare the value against the thresholds.</para> <para>This is the generalised format for ranges:</para> <literallayout> [@]start:end </literallayout> <para>Notes:</para> <orderedlist> <listitem><para>start ≤ end</para> </listitem> <listitem><para>start and ":" is not required if start=0</para> </listitem> <listitem><para>if range is of format "start:" and end is not specified, assume end is infinity</para> </listitem> <listitem><para>to specify negative infinity, use "~"</para> </listitem> <listitem><para>alert is raised if metric is outside start and end range (inclusive of endpoints)</para> </listitem> <listitem><para>if range starts with "@", then alert if inside this range (inclusive of endpoints)</para> </listitem> </orderedlist> <para>Note: Not all plugins are coded to expect ranges in this format yet. There will be some work in providing multiple metrics.</para> <table id="ExampleRanges"><title>Example ranges</title> <tgroup cols="2"> <thead> <row> <entry><para>Range definition</para></entry> <entry><para>Generate an alert if x...</para></entry> </row> </thead> <tbody> <row> <entry>10</entry> <entry>< 0 or > 10, (outside the range of {0 .. 10})</entry> </row> <row> <entry>10:</entry> <entry>< 10, (outside {10 .. ∞})</entry> </row> <row> <entry>~:10</entry> <entry>> 10, (outside the range of {-∞ .. 10})</entry> </row> <row> <entry>10:20</entry> <entry>< 10 or > 20, (outside the range of {10 .. 20})</entry> </row> <row> <entry>@10:20</entry> <entry>≥ 10 and ≤ 20, (inside the range of {10 .. 20})</entry> </row> <row> <entry>10</entry> <entry>< 0 or > 10, (outside the range of {0 .. 10})</entry> </row> </tbody> </tgroup> </table> <table id="CommandLineExamples"><title>Command line examples</title> <tgroup cols="2"> <thead> <row> <entry><para>Command line</para></entry> <entry><para>Meaning</para></entry> </row> </thead> <tbody> <row> <entry>check_stuff -w10 -c20</entry> <entry>Critical if "stuff" is over 20, else warn if over 10 (will be critical if "stuff" is less than 0)</entry> </row> <row> <entry>check_stuff -w~:10 -c~:20</entry> <entry>Same as above. Negative "stuff" is OK</entry> </row> <row> <entry>check_stuff -w10: -c20</entry> <entry>Critical if "stuff" is over 20, else warn if "stuff" is below 10 (will be critical if "stuff" is less than 0)</entry> </row> <row> <entry>check_stuff -c1:</entry> <entry>Critical if "stuff" is less than 1</entry> </row> <row> <entry>check_stuff -w~:0 -c10</entry> <entry>Critical if "stuff" is above 10; Warn if "stuff" is above zero</entry> </row> <row> <entry>check_stuff -c5:6</entry> <entry>The only noncritical range is 5:6</entry> </row> <row> <entry>check_stuff -c10:20</entry> <entry>Critical if "stuff" is 10 to 20</entry> </row> </tbody> </tgroup> </table> </section> <section><title>Performance data</title> <para>Performance data is defined by Nagios as "everything after the | of the plugin output" - please refer to Nagios documentation for information on capturing this data to logfiles. However, it is the responsibility of the plugin writer to ensure the performance data is in a "Nagios plugins" format. This is the expected format:</para> <literallayout> 'label'=value[UOM];[warn];[crit];[min];[max] </literallayout> <para>Notes:</para> <orderedlist> <listitem><para>space separated list of label/value pairs</para> </listitem> <listitem><para>label can contain any characters</para> </listitem> <listitem><para>the single quotes for the label are optional. Required if spaces, = or ' are in the label</para> </listitem> <listitem><para>label length is arbitrary, but ideally the first 19 characters are unique (due to a limitation in RRD). Be aware of a limitation in the amount of data that NRPE returns to Nagios</para> </listitem> <listitem><para>to specify a quote character, use two single quotes</para> </listitem> <listitem><para>warn, crit, min or max may be null (for example, if the threshold is not defined or min and max do not apply). Trailing unfilled semicolons can be dropped</para> </listitem> <listitem><para>min and max are not required if UOM=%</para> </listitem> <listitem><para>value, min and max in class [-0-9.]. Must all be the same UOM</para> </listitem> <listitem><para>warn and crit are in the range format (see <xref linkend="thresholdformat">). Must be the same UOM</para> </listitem> <listitem><para>UOM (unit of measurement) is one of:</para> <orderedlist> <listitem><para>no unit specified - assume a number (int or float) of things (eg, users, processes, load averages)</para> </listitem> <listitem><para>s - seconds (also us, ms)</para></listitem> <listitem><para>% - percentage</para></listitem> <listitem><para>B - bytes (also KB, MB, TB)</para></listitem> <listitem><para>c - a continous counter (such as bytes transmitted on an interface)</para></listitem> </orderedlist> </listitem> </orderedlist> <para>It is up to third party programs to convert the Nagios plugins performance data into graphs.</para> </section> <section><title>Translations</title> <para>If possible, use translation tools for all output to respect the user's language settings. See <xref linkend="translationsdevelopers"> for guidelines for the core plugins. </para> </section> </section> <section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title> <section><title>Don't execute system commands without specifying their full path</title> <para>Don't use exec(), popen(), etc. to execute external commands without explicity using the full path of the external program.</para> <para>Doing otherwise makes the plugin vulnerable to hijacking by a trojan horse earlier in the search path. See the main plugin distribution for examples on how this is done.</para> </section> <section><title>Use spopen() if external commands must be executed</title> <para>If you have to execute external commands from within your plugin and you're writing it in C, use the spopen() function that Karl DeBisschop has written.</para> <para>The code for spopen() and spclose() is included with the core plugin distribution.</para> </section> <section><title>Don't make temp files unless absolutely required</title> <para>If temp files are needed, make sure that the plugin will fail cleanly if the file can't be written (e.g., too few file handles, out of disk space, incorrect permissions, etc.) and delete the temp file when processing is complete.</para> </section> <section><title>Don't be tricked into following symlinks</title> <para>If your plugin opens any files, take steps to ensure that you are not following a symlink to another location on the system.</para> </section> <section><title>Validate all input</title> <para>use routines in utils.c or utils.pm and write more as needed</para> </section> </section> <section id="PerlPlugin"><title>Perl Plugins</title> <para>Perl plugins are coded a little more defensively than other plugins because of embedded Perl. When configured as such, embedded Perl Nagios (ePN) requires stricter use of the some of Perl's features. This section outlines some of the steps needed to use ePN effectively.</para> <orderedlist> <listitem><para> Do not use BEGIN and END blocks since they will be called only once (when Nagios starts and shuts down) with Embedded Perl (ePN). In particular, do not use BEGIN blocks to initialize variables.</para> </listitem> <listitem><para>To use utils.pm, you need to provide a full path to the module in order for it to work.</para> <literallayout> e.g. use lib "/usr/local/nagios/libexec"; use utils qw(...); </literallayout> </listitem> <listitem><para>Perl scripts should be called with "-w"</para> </listitem> <listitem><para>All Perl plugins must compile cleanly under "use strict" - i.e. at least explicitly package names as in "$main::x" or predeclare every variable. </para> <para>Explicitly initialize each variable in use. Otherwise with caching enabled, the plugin will not be recompiled each time, and therefore Perl will not reinitialize all the variables. All old variable values will still be in effect.</para> </listitem> <listitem><para>Do not use >DATA< handles (these simply do not compile under ePN).</para> </listitem> <listitem><para>Do not use global variables in named subroutines. This is bad practise anyway, but with ePN the compiler will report an error "<global_var> will not stay shared ..". Values used by subroutines should be passed in the argument list.</para> </listitem> <listitem><para>If writing to a file (perhaps recording performance data) explicitly close close it. The plugin never calls <emphasis role="strong">exit</emphasis>; that is caught by p1.pl, so output streams are never closed.</para> </listitem> <listitem><para>As in <xref linkend="runtime"> all plugins need to monitor their runtime, specially if they are using network resources. Use of the <emphasis>alarm</emphasis> is recommended noting that some Perl modules (eg LWP) manage timers, so that an alarm set by a plugin using such a module is overwritten by the module. (workarounds are cunning (TM) or using the module timer) Plugins may import a default time out ($TIMEOUT) from utils.pm. </para> </listitem> <listitem><para>Perl plugins should import %ERRORS from utils.pm and then "exit $ERRORS{'OK'}" rather than "exit 0" </para> </listitem> </orderedlist> </section> <section id="runtime"><title>Runtime Timeouts</title> <para>Plugins have a very limited runtime - typically 10 sec. As a result, it is very important for plugins to maintain internal code to exit if runtime exceeds a threshold. </para> <para>All plugins should timeout gracefully, not just networking plugins. For instance, df may lock if you have automounted drives and your network fails - but on first glance, who'd think df could lock up like that. Plus, it should just be more error resistant to be able to time out rather than consume resources.</para> <section><title>Use DEFAULT_SOCKET_TIMEOUT</title> <para>All network plugins should use DEFAULT_SOCKET_TIMEOUT to timeout</para> </section> <section><title>Add alarms to network plugins</title> <para>If you write a plugin which communicates with another networked host, you should make sure to set an alarm() in your code that prevents the plugin from hanging due to abnormal socket closures, etc. Nagios takes steps to protect itself against unruly plugins that timeout, but any plugins you create should be well behaved on their own.</para> </section> </section> <section id="PlugOptions"><title>Plugin Options</title> <para>A well written plugin should have --help as a way to get verbose help. Code and output should try to respect the 80x25 size of a crt (remember when fixing stuff in the server room!)</para> <section><title>Option Processing</title> <para>For plugins written in C, we recommend the C standard getopt library for short options. Getopt_long is always available. </para> <para>For plugins written in Perl, we recommend Getopt::Long module.</para> <para>Positional arguments are strongly discouraged.</para> <para>There are a few reserved options that should not be used for other purposes:</para> <literallayout> -V version (--version) -h help (--help) -t timeout (--timeout) -w warning threshold (--warning) -c critical threshold (--critical) -H hostname (--hostname) -v verbose (--verbose) </literallayout> <para>In addition to the reserved options above, some other standard options are:</para> <literallayout> -C SNMP community (--community) -a authentication password (--authentication) -l login name (--logname) -p port or password (--port or --passwd/--password)monitors operational -u url or username (--url or --username) </literallayout> <para>Look at check_pgsql and check_procs to see how I currently think this can work. Standard options are:</para> <para>The option -V or --version should be present in all plugins. For C plugins it should result in a call to print_revision, a function in utils.c which takes two character arguments, the command name and the plugin revision.</para> <para>The -? option, or any other unparsable set of options, should print out a short usage statement. Character width should be 80 and less and no more that 23 lines should be printed (it should display cleanly on a dumb terminal in a server room).</para> <para>The option -h or --help should be present in all plugins. In C plugins, it should result in a call to print_help (or equivalent). The function print_help should call print_revision, then print_usage, then should provide detailed help. Help text should fit on an 80-character width display, but may run as many lines as needed.</para> <para>The option -v or --verbose should be present in all plugins. The user should be allowed to specify -v multiple times to increase the verbosity level, as described in <xref linkend="verboselevels">.</para> </section> <section> <title>Plugins with more than one type of threshold, or with threshold ranges</title> <para>Old style was to do things like -ct for critical time and -cv for critical value. That goes out the window with POSIX getopt. The allowable alternatives are:</para> <orderedlist> <listitem> <para>long options like -critical-time (or -ct and -cv, I suppose).</para> </listitem> <listitem> <para>repeated options like `check_load -w 10 -w 6 -w 4 -c 16 -c 10 -c 10`</para> </listitem> <listitem> <para>for brevity, the above can be expressed as `check_load -w 10,6,4 -c 16,10,10`</para> </listitem> <listitem> <para>ranges are expressed with colons as in `check_procs -C httpd -w 1:20 -c 1:30` which will warn above 20 instances, and critical at 0 and above 30</para> </listitem> <listitem> <para>lists are expressed with commas, so Jacob's check_nmap uses constructs like '-p 1000,1010,1050:1060,2000'</para> </listitem> <listitem> <para>If possible when writing lists, use tokens to make the list easy to remember and non-order dependent - so check_disk uses '-c 10000,10%' so that it is clear which is the precentage and which is the KB values (note that due to my own lack of foresight, that used to be '-c 10000:10%' but such constructs should all be changed for consistency, though providing reverse compatibility is fairly easy).</para> </listitem> </orderedlist> <para>As always, comments are welcome - making this consistent without a host of long options was quite a hassle, and I would suspect that there are flaws in this strategy. </para> </section> </section> <section id="Testcases"><title>Test cases</title> <para> Tests are the best way of knowing if the plugins work as expected. Please create and update test cases where possible. </para> <para> To run a test, from the top level directory, run "make test". This will run all the current tests and report an overall success rate. </para> <para> See the <ulink url="http://tinderbox.altinity.org">Nagios Plugins Tinderbox server</ulink> for the daily test results. </para> <section><title>Test cases for plugins</title> <para>These use perl's Test::More. To do a one time test, run "cd plugins && perl t/check_disk.t". </para> <para>There will somtimes be failures seen in this output which are known failures that need to be fixed. As long as the return code is 0, it will be reported as "test pass". (If you have a fix so that the specific test passes, that will be gratefully received!) </para> <para> If you want a summary test, run: "cd plugins && prove t/check_disk.t". This runs the test in a summary format. </para> <para> For a good and amusing tutorial on using Test::More, see this <ulink url="http://search.cpan.org/~mschwern/Test-Simple-0.62/lib/Test/Tutorial.pod"> link</ulink> </para> </section> <section><title>Testing the C library functions</title> <para> We use <ulink url="http://jc.ngo.org.uk/trac-bin/trac.cgi/wiki/LibTap">the libtap library</ulink>, which gives perl's TAP (Test Anything Protocol) output. This is used by the FreeBSD team for their regression testing. </para> <para> To run tests using the libtap library, download the latest tar ball and extract. There is a problem with tap-1.01 where <ulink url="http://jc.ngo.org.uk/trac-bin/trac.cgi/ticket/25">pthread support doesn't appear to work</ulink> properly on non-FreeBSD systems. Install with 'CPPFLAGS="-UHAVE_LIBPTHREAD" ./configure && make && make check && make install'. </para> <para> When you run Nagios Plugins' configure, it will look for the tap library and will automatically setup the tests. Run "make test" to run all the tests. </para> </section> </section> <section id="CodingGuidelines"><title>Coding guidelines</title> <para>See <ulink url="http://www.gnu.org/prep/standards_toc.html">GNU Coding standards</ulink> for general guidelines.</para> <section><title>C coding</title> <para>Variables should be declared at the beginning of code blocks and not inline because of portability with older compilers.</para> <para>You should use /* */ for comments and not // as some compilers do not handle the latter form.</para> </section> <section><title>Crediting sources</title> <para>If you have copied a routine from another source, make sure the licence from your source allows this. Add a comment referencing the ACKNOWLEDGEMENTS file, where you can put more detail about the source.</para> <para>For contributed code, do not add any named credits in the source code - contributors should be added into the THANKS.in file instead. </para> </section> <section><title>CVS comments</title> <para>When adding CVS comments at commit time, you can use the following prefixes: <variablelist> <varlistentry><term>- comment</term> <listitem> <para>for a comment that can be removed from the Changelog</para> </listitem> </varlistentry> <varlistentry><term>* comment</term> <listitem> <para>for an important amendment to be included into a features list</para> </listitem> </varlistentry> </variablelist> </para> <para>If the change is due to a contribution, please quote the contributor's name and, if applicable, add the SourceForge Tracker number. Don't forget to update the THANKS.in file.</para> </section> <section id="translationsdevelopers"><title>Translations for developers</title> <para>To make the job easier for translators, please follow these guidelines:</para> <orderedlist> <listitem><para> Before creating new strings, check the po/nagios-plugins.pot file to see if a similar string already exists </para></listitem> <listitem><para> For help texts, break into individual options so that these can be reused between plugins </para></listitem> <listitem><para>Try to avoid linefeeds unless you are working on a block of text</para></listitem> <listitem><para>Short help is not translated</para></listitem> <listitem><para>Long help has options in English language, but text translated</para></listitem> <listitem><para>"Copyright" kept in English</para></listitem> <listitem><para>Copyright holder names kept in original text</para></listitem> <listitem><para>Debugging output does not need to be translated</para></listitem> </orderedlist> </section> <section><title>Translations for translators</title> <para>To create an up to date list of translatable strings, run: tools/gen_locale.sh</para> </section> </section> <section id="SubmittingChanges"><title>Submission of new plugins and patches</title> <section id="Patches"><title>Patches</title> <para>If you have a bug patch, please supply a unified or context diff against the version you are using. For new features, please supply a diff against the CVS HEAD version.</para> <para>Patches should be submitted via <ulink url="http://sourceforge.net/tracker/?group_id=29880&atid=397599">SourceForge's tracker system for Nagiosplug patches</ulink> and be announced to the nagiosplug-devel mailing list.</para> <para>Submission of a patch implies that the submmitter acknowledges that they are the author of the code (or have permission from the author to release the code) and agree that the code can be released under the GPL. The copyright for the changes will then revert to the Nagios Plugin Development Team - this is required so that any copyright infringements can be investigated quickly without contacting a huge list of copyright holders. Credit will always be given for any patches through a THANKS file in the distribution.</para> </section> <section id="Newplugins"><title>New plugins</title> <para>If you would like others to use your plugins, please add it to the official 3rd party plugin repository, <ulink url="http://www.nagiosexchange.org">NagiosExchange</ulink>. </para> <para>We are not accepting requests for inclusion of plugins into our distribution at the moment, but when we do, these are the minimum requirements: </para> <orderedlist> <listitem> <para>Include copyright and license information in all files</para> </listitem> <listitem> <para>The standard command options are supported (--help, --version, --timeout, --warning, --critical)</para> </listitem> <listitem> <para>It is determined to be not redundant (for instance, we would not add a new version of check_disk just because someone had provide a plugin that had perf checking - we would incorporate the features into an exisiting plugin)</para> </listitem> <listitem> <para>One of the developers has had the time to audit the code and declare it ready for core</para> </listitem> <listitem> <para>It should also follow code format guidelines, and use functions from utils (perl or c or sh) rather than using its own</para> </listitem> <listitem> <para>Includes patches to configure.in if required (via the EXTRAS list if it will only work on some platforms)</para> </listitem> <listitem> <para>If possible, please submit a test harness. Documentation on sample tests coming soon</para> </listitem> </orderedlist> </section> </section> </article> </book>