From 911f631f59a4d6cafe760d55dffa330b13d4f921 Mon Sep 17 00:00:00 2001
From: Subhendu Ghosh
Date: Mon, 27 May 2002 02:05:55 +0000
Subject: added developer guidelines.

git-svn-id: https://nagiosplug.svn.sourceforge.net/svnroot/nagiosplug/nagiosplug/trunk@38 f882894a-f735-0410-b71e-b25c423dba1c
---
 doc/README | 3 +
 doc/developer-guidelines.html | 931 ++++++++++++++++++++++++++++++++++++++++++
 doc/developer-guidelines.sgml | 483 ++++++++++++++++++++++
 3 files changed, 1417 insertions(+)
 create mode 100644 doc/README
 create mode 100644 doc/developer-guidelines.html
 create mode 100644 doc/developer-guidelines.sgml

diff --git a/doc/README b/doc/README
new file mode 100644
index 00000000..388bc1d7
--- /dev/null
+++ b/doc/README
@@ -0,0 +1,3 @@
+The developer documentation here is generated from the DocBook format.
+
+
diff --git a/doc/developer-guidelines.html b/doc/developer-guidelines.html
new file mode 100644
index 00000000..efac605f
--- /dev/null
+++ b/doc/developer-guidelines.html
@@ -0,0 +1,931 @@
+
+Nagios plug-in development guidelines

Nagios plug-in development guidelines

Karl DeBisschop

karl@debisschop.net

Ethan Galstad

netsaint@linuxbox.com

Hugo Gayosso

hgayosso@gnu.org

Subhendu Ghosh

sghosh@sourceforge.net

Stanley Hopcroft

stanleyhopcroft@sourceforge.net

Copyright © 2000 2001 2002 by Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh


Table of Contents
About the guidelines
Copyright
Plugin Output for Nagios
Print only one line of text
Screen Output
Return the proper status code
Plugin Return Codes
System Commands and Auxiliary Files
Don't execute system commands without specifying their full path
Use spopen() if external commands must be executed
Don't make temp files unless absolutely required
Don't be tricked into following symlinks
Validate all input
Perl Plugins
Runtime Timeouts
Use DEFAULT_SOCKET_TIMEOUT
Add alarms to network plugins
Plugin Options
Option Processing
Plugins with more than one type of threshold, or with threshold ranges
New submissions and patches

About the guidelines

The purpose of these guidelines is to provide a reference for plug-in developers and to encourage the standardization of the different kinds of plug-ins: C, shell, Perl, Python, etc.


Copyright

Nagios Plug-in Development Guidelines Copyright (C) 2000 2001 2002 Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

The plugins themselves are copyrighted by their respective authors.


Plugin Output for Nagios

You should always print something to STDOUT that tells whether the service is working or why it is failing. Try to keep the output short, probably less than 80 characters. Remember that you ideally would like the entire output to appear in a pager message, which will get chopped off after a certain length.


Print only one line of text

Nagios will only grab the first line of text from STDOUT when it notifies contacts about potential problems. If you print multiple lines, you're out of luck. Remember, keep it short and to the point.


Screen Output

The plug-in should print the diagnostic and just the synopsis part of the help message. A well-written plugin would then have --help as a way to get the verbose help.

Code and output should try to respect the 80x25 size of a CRT (remember when fixing stuff in the server room!).


Return the proper status code

See Table 1 in the section called Plugin Return Codes below for the numeric values of status codes and their descriptions. Remember to return an UNKNOWN state if bogus or invalid command line arguments are supplied or if you are unable to check the service.


Plugin Return Codes

The return codes below are based on the POSIX spec of returning a positive value. Netsaint prior to v0.0.7 supported a non-POSIX-compliant return code of "-1" for unknown. Nagios supports POSIX return codes by default.

Note: Some plugins will on occasion print on STDOUT that an error occurred and that the error code is 138 or 255 or some such number. These are usually caused by plugins that use system commands without enough checks to catch unexpected output. Developers should include a default catch-all for system command output that returns an UNKNOWN return code.

Table 1. Plugin Return Codes

Numeric Value | Service Status | Status Description
0 | OK | The plugin was able to check the service and it appeared to be functioning properly
1 | Warning | The plugin was able to check the service, but it appeared to be above some "warning" threshold or did not appear to be working properly
2 | Critical | The plugin detected that either the service was not running or it was above some "critical" threshold
3 | Unknown | Invalid command line arguments were supplied to the plugin or the plugin was unable to check the status of the given hosts/service
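As a rough illustration of the output and return-code rules above, the following C sketch prints a single short line and maps any out-of-range value to UNKNOWN. The names plugin_state, normalize_state and report are made up for this example and are not taken from the plugin distribution.

```c
#include <assert.h>
#include <stdio.h>

/* Return codes from Table 1. The STATE_* names are illustrative. */
enum plugin_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/* Catch-all: any value outside 0..3 (for example 138 or 255 leaked
 * from a system command) is reported as UNKNOWN. */
static int normalize_state(int raw)
{
    if (raw < STATE_OK || raw > STATE_UNKNOWN)
        return STATE_UNKNOWN;
    return raw;
}

/* Print exactly one line on STDOUT and hand back the exit code.
 * The message is truncated to 70 characters to keep the line short. */
static int report(int raw_state, const char *message)
{
    static const char *const labels[] = {
        "OK", "WARNING", "CRITICAL", "UNKNOWN"
    };
    int state = normalize_state(raw_state);

    assert(message != NULL);
    printf("%s - %.70s\n", labels[state], message);
    return state;
}
```

A plugin's main() could then end with something like `return report(state, "load average within limits");` so that the printed line and the exit status always agree.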


System Commands and Auxiliary Files

Don't execute system commands without specifying their full path

Don't use exec(), popen(), etc. to execute external commands without explicitly using the full path of the external program.

Doing otherwise makes the plugin vulnerable to hijacking by a trojan horse earlier in the search path. See the main plugin distribution for examples of how this is done.
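One way to enforce this in C, sketched under the assumption that the plugin calls execv() directly, is to reject anything that is not an absolute path before executing it. The helpers is_absolute_path and exec_full_path are hypothetical; the main distribution may handle this differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: accept only paths that start at the root. */
static int is_absolute_path(const char *cmd)
{
    assert(cmd != NULL);
    return cmd[0] == '/';
}

/* Run a command without consulting PATH. execv(), unlike execvp(),
 * performs no search-path lookup. Returns -1 if the path is rejected
 * or execv() fails; on success it does not return. */
static int exec_full_path(const char *path, char *const argv[])
{
    if (!is_absolute_path(path))
        return -1;
    execv(path, argv);
    return -1;
}
```

In a real plugin this would normally run in a child process after fork(), with the parent reading the command's output.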


Use spopen() if external commands must be executed

If you have to execute external commands from within your plugin and you're writing it in C, use the spopen() function that Karl DeBisschop has written.

The code for spopen() and spclose() is included with the core plugin distribution.


Don't make temp files unless absolutely required

If temp files are needed, make sure that the plugin will fail cleanly if the file can't be written (e.g., too few file handles, out of disk space, incorrect permissions, etc.) and delete the temp file when processing is complete.
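A minimal C sketch of that pattern, assuming POSIX mkstemp() is available: the function fails cleanly with UNKNOWN (3) when the file cannot be created or written, and always removes the file afterwards. The function name and the /tmp/plugin_XXXXXX template are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write data to a scratch file and remove it again.
 * Returns 0 on success, 3 (UNKNOWN) if the file cannot be
 * created or fully written. */
static int use_temp_file(const char *data)
{
    char path[] = "/tmp/plugin_XXXXXX";   /* hypothetical template */
    size_t len;
    int fd;
    int rc = 0;

    assert(data != NULL);
    len = strlen(data);

    fd = mkstemp(path);                   /* creates a uniquely named file */
    if (fd < 0) {
        printf("UNKNOWN - cannot create temp file\n");
        return 3;
    }
    if (write(fd, data, len) != (ssize_t)len) {
        printf("UNKNOWN - cannot write temp file\n");
        rc = 3;
    }
    /* ... real processing of the file would go here ... */
    close(fd);
    unlink(path);                         /* delete the file when done */
    return rc;
}
```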


Don't be tricked into following symlinks

If your plugin opens any files, take steps to ensure that you are not following a symlink to another location on the system.
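One possible safeguard in C, assuming a platform that supports the POSIX O_NOFOLLOW flag, is to let open() itself refuse a symlink in the final path component and then confirm that the result is a regular file. The helper name open_regular_nofollow is invented for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a file for reading, refusing symlinks. O_NOFOLLOW makes open()
 * fail when the last path component is a symbolic link; the fstat()
 * check also rejects anything that is not a regular file.
 * Returns a file descriptor, or -1 on refusal or error. */
static int open_regular_nofollow(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Note that O_NOFOLLOW only covers the last component of the path; symlinks in parent directories are still followed.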


Validate all input

Use routines in utils.c or utils.pm, and write more as needed.
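The routines in utils.c are not reproduced here. As a stand-in, this C sketch shows the general shape of such a validator: it accepts a decimal integer within given bounds and rejects empty strings, trailing junk and overflow. The name parse_int_in_range is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical validator: parse a decimal integer in [min, max].
 * Returns 1 and stores the value on success, 0 on malformed or
 * out-of-range input (in which case *out is left untouched). */
static int parse_int_in_range(const char *text, long min, long max,
                              long *out)
{
    char *end = NULL;
    long value;

    assert(out != NULL);
    if (text == NULL || *text == '\0')
        return 0;
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0')
        return 0;                 /* overflow, no digits, or trailing junk */
    if (value < min || value > max)
        return 0;
    *out = value;
    return 1;
}
```

A port argument, for instance, could be checked with `parse_int_in_range(optarg, 1, 65535, &port)` and the plugin could exit with UNKNOWN when the call returns 0.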


Perl Plugins

Perl plugins are coded a little more defensively than other plugins because of embedded Perl. When configured as such, embedded Perl Nagios (ePN) requires stricter use of some of Perl's features. This section outlines some of the steps needed to use ePN effectively.

  1. Do not use BEGIN and END blocks, since with Embedded Perl (ePN) they will be called only the first time the plugin runs and when Nagios shuts down. In particular, do not use BEGIN blocks to initialize variables.

  2. To use utils.pm, you need to provide a full path to the module in order for it to work with ePN.

      e.g.
      use lib "/usr/local/nagios/libexec";
      use utils qw(...);

  3. Perl scripts should be called with "-w"

  4. All Perl plugins must compile cleanly under "use strict", i.e., at least use explicit package names as in "$main::x" or predeclare every variable.

    Explicitly initialize each variable in use. Otherwise, with caching enabled, the plugin will not be recompiled each time, and therefore Perl will not reinitialize all the variables. All old variable values will still be in effect.

  5. Do not use <DATA> handles (these simply do not compile under ePN).

  6. Do not use named subroutines

  7. If writing to a file (perhaps recording performance data), explicitly close it. The plugin never calls exit; that is caught by p1.pl, so output streams are never closed.

  8. As in the section called Runtime Timeouts, all plugins need to monitor their runtime, especially if they are using network resources. Use of alarm is recommended. Plugins may import a default timeout ($TIMEOUT) from utils.pm.

  9. Perl plugins should import %ERRORS from utils.pm and then "exit $ERRORS{'OK'}" rather than "exit 0".


Runtime Timeouts

Plugins have a very limited runtime - typically 10 sec. As a result, it is very important for plugins to maintain internal code to exit if runtime exceeds a threshold.

All plugins should time out gracefully, not just networking plugins. For instance, df may lock if you have automounted drives and your network fails - but at first glance, who'd think df could lock up like that? Plus, a plugin that can time out is more error resistant than one that keeps consuming resources.


Use DEFAULT_SOCKET_TIMEOUT

All network plugins should use DEFAULT_SOCKET_TIMEOUT to time out.


Add alarms to network plugins

If you write a plugin which communicates with another networked host, you should make sure to set an alarm() in your code that prevents the plugin from hanging due to abnormal socket closures, etc. Nagios takes steps to protect itself against unruly plugins that time out, but any plugins you create should be well behaved on their own.
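The following C sketch shows one way to arm such an alarm. It makes two assumptions that the text above does not specify: the value 10 for DEFAULT_SOCKET_TIMEOUT (the real definition lives in the plugin headers) and the choice to report UNKNOWN (3) on timeout, on the grounds that the service could not be checked. The function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

/* Assumed value; the real DEFAULT_SOCKET_TIMEOUT comes from the
 * plugin distribution's headers. */
#ifndef DEFAULT_SOCKET_TIMEOUT
#define DEFAULT_SOCKET_TIMEOUT 10
#endif

/* Runs when the alarm fires. Only async-signal-safe calls
 * (write and _exit) are used inside the handler. */
static void timeout_handler(int signo)
{
    static const char msg[] = "UNKNOWN - plugin timed out\n";

    (void)signo;
    if (write(STDOUT_FILENO, msg, sizeof msg - 1) < 0) {
        /* nothing useful can be done if the write fails */
    }
    _exit(3);
}

/* Arm the timeout before any blocking call. Returns the number of
 * seconds left on a previously scheduled alarm, or 0 if none. */
static unsigned int arm_timeout(unsigned int seconds)
{
    assert(seconds > 0);
    signal(SIGALRM, timeout_handler);
    return alarm(seconds);
}
```

A network plugin would call `arm_timeout(DEFAULT_SOCKET_TIMEOUT)` before connecting and `alarm(0)` once the check has completed.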


Plugin Options

A well-written plugin should have --help as a way to get verbose help. Code and output should try to respect the 80x25 size of a CRT (remember when fixing stuff in the server room!).


Option Processing

For plugins written in C, we recommend the C standard getopt library for short options. If using getopt_long, check to be sure that HAVE_GETOPT_H is defined (configure checks this and sets the #define in common/config.h).

For plugins written in Perl, we recommend the Getopt::Long module.

Positional arguments are strongly discouraged.

There are a few reserved options that should not be used for other purposes:

          -V version (--version)
          -h help (--help)
          -t timeout (--timeout)
          -w warning threshold (--warning)
          -c critical threshold (--critical)
          -H hostname (--hostname)

In addition to the reserved options above, some other standard options are:

          -C SNMP community (--community)
          -a authentication password (--authentication)
          -l login name (--logname)
          -p port or password (--port or --passwd/--password)
          -u url or username (--url or --username)

Look at check_pgsql and check_procs to see how I currently think this can work. Standard options are:

The option -V or --version should be present in all plugins. For C plugins it should result in a call to print_revision, a function in utils.c which takes two character-string arguments, the command name and the plugin revision.

The -? option, or any other unparsable set of options, should print out a short usage statement. Line width should be 80 characters or less, and no more than 23 lines should be printed (it should display cleanly on a dumb terminal in a server room).

The option -h or --help should be present in all plugins. In C plugins, it should result in a call to print_help (or equivalent). The function print_help should call print_revision, then print_usage, then should provide detailed help. Help text should fit on an 80-character width display, but may run as many lines as needed.
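To make the reserved short options concrete, here is a hedged C sketch built on POSIX getopt(). The struct plugin_opts, the function parse_options and the default timeout of 10 seconds are assumptions made for this example; calls to print_revision, print_usage and print_help are left to the caller because their exact signatures are not shown in this document.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Illustrative container for the reserved options. */
struct plugin_opts {
    const char *hostname;   /* -H */
    const char *warning;    /* -w */
    const char *critical;   /* -c */
    int timeout;            /* -t, in seconds */
    int show_version;       /* -V */
    int show_help;          /* -h */
};

/* Returns 0 on success, 3 (UNKNOWN) on any unparsable option,
 * in which case the caller should print the short usage text. */
static int parse_options(int argc, char *const argv[],
                         struct plugin_opts *o)
{
    struct plugin_opts defaults = { NULL, NULL, NULL, 10, 0, 0 };
    int c;

    assert(o != NULL);
    *o = defaults;          /* timeout of 10 is an assumed default */
    opterr = 0;             /* the plugin prints its own usage message */
    optind = 1;             /* restart scanning on repeated calls */

    while ((c = getopt(argc, argv, "Vht:w:c:H:")) != -1) {
        switch (c) {
        case 'V': o->show_version = 1; break;
        case 'h': o->show_help = 1; break;
        case 'w': o->warning = optarg; break;
        case 'c': o->critical = optarg; break;
        case 'H': o->hostname = optarg; break;
        case 't': {
            char *end = NULL;
            long t = strtol(optarg, &end, 10);
            if (end == optarg || *end != '\0' || t <= 0 || t > 3600)
                return 3;   /* reject non-numeric or absurd timeouts */
            o->timeout = (int)t;
            break;
        }
        default:
            return 3;       /* covers -? and unknown options */
        }
    }
    return 0;
}
```

For an invocation such as `check_demo -H db1 -w 10 -c 20 -t 5`, the struct would carry the host name, both threshold strings and a timeout of 5 for the rest of the plugin to use.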


Plugins with more than one type of threshold, or with threshold ranges

Old style was to do things like -ct for critical time and -cv for critical value. That goes out the window with POSIX getopt. The allowable alternatives are:

  1. long options like -critical-time (or -ct and -cv, I suppose).

  2. repeated options like `check_load -w 10 -w 6 -w 4 -c 16 -c 10 -c 10`

  3. for brevity, the above can be expressed as `check_load -w 10,6,4 -c 16,10,10`

  4. ranges are expressed with colons as in `check_procs -C httpd -w 1:20 -c 1:30` which will warn above 20 instances, and go critical at 0 and above 30

  5. lists are expressed with commas, so Jacob's check_nmap uses constructs like '-p 1000,1010,1050:1060,2000'

  6. If possible when writing lists, use tokens to make the list easy to remember and non-order dependent - so check_disk uses '-c 10000,10%' so that it is clear which is the percentage and which is the KB value (note that due to my own lack of foresight, that used to be '-c 10000:10%' but such constructs should all be changed for consistency, though providing backward compatibility is fairly easy).

As always, comments are welcome - making this consistent without a host of long options was quite a hassle, and I would suspect that there are flaws in this strategy. Perhaps clear long options are the most important of the above choices, but not all POSIX systems have C libraries for long options, so the short forms must exist as well.
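The colon range syntax from item 4 can be sketched in C as follows. This is only one reading of the rule, in which a value raises an alert when it falls outside the inclusive low:high range; struct range, parse_range and outside_range are names invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Inclusive range as written on the command line, e.g. "1:20". */
struct range {
    long low;
    long high;
};

/* Parse "low:high". Returns 1 on success, 0 on malformed input
 * (missing colon, missing number, trailing junk, or low > high). */
static int parse_range(const char *text, struct range *r)
{
    char *end = NULL;

    assert(text != NULL && r != NULL);
    r->low = strtol(text, &end, 10);
    if (end == text || *end != ':')
        return 0;
    text = end + 1;
    r->high = strtol(text, &end, 10);
    if (end == text || *end != '\0' || r->low > r->high)
        return 0;
    return 1;
}

/* A value alerts when it falls outside the inclusive range. */
static int outside_range(const struct range *r, long value)
{
    assert(r != NULL);
    return value < r->low || value > r->high;
}
```

With `-w 1:20 -c 1:30`, a plugin would test the critical range first and the warning range second, so that 25 processes give a warning while 0 or 31 processes are critical.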


New submissions and patches

If you would like others to use your plugin and have it included in the standard distribution, please include patches for the relevant configuration files, in particular "configure.in". Otherwise, submitted plugins will be included in the contrib directory.

Plugins in the contrib directory are going to be migrated to the standard plugins/plugin-scripts directory as time permits and per user requests.

Patches should be submitted via SourceForge and be announced to the mailing list.

For new plugins, provide a diff to add to the EXTRAS list (configure.in) unless you are fairly sure that the plugin will work for all platforms with no non-standard software added.

If possible, please submit a test harness. Documentation on sample tests is coming soon.

\ No newline at end of file diff --git a/doc/developer-guidelines.sgml b/doc/developer-guidelines.sgml new file mode 100644 index 00000000..42ad8964 --- /dev/null +++ b/doc/developer-guidelines.sgml @@ -0,0 +1,483 @@ + + + Nagios Plug-in Developer Guidelines + + + + + Karl + DeBisschop + +
karl@debisschop.net
+
+
+ + + Ethan + Galstad + + Author of Nagios + + + +
netsaint@linuxbox.com
+
+
+ + + Hugo + Gayosso + +
hgayosso@gnu.org
+
+
+ + + + Subhendu + Ghosh + +
sghosh@sourceforge.net
+
+
+ + + Stanley + Hopcroft + +
stanleyhopcroft@sourceforge.net
+
+
+ +
+ + 2002 + Nagios plug-in development guidelines + + + + 0.4 + 2 May 2002 + + + + + 2000 2001 2002 + Karl DeBisschop, Ethan Galstad, + Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh + + +
+ + + + About the guidelines + + The purpose of this guidelines is to provide a reference for + the plug-in developers and encourage the standarization of the + different kind of plug-ins: C, shell, perl, python, etc. + + +
Copyright + + Nagios Plug-in Development Guidelines Copyright (C) 2000 2001 + 2002 + Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft, + Subhendu Ghosh + + Permission is granted to make and distribute verbatim + copies of this manual provided the copyright notice and this + permission notice are preserved on all copies. + + The plugins themselves are copyrighted by their respective + authors. + +
+
+ +
+
Plugin Output for Nagios + + You should always print something to STDOUT that tells if the + service is working or why its failing. Try to keep the output short - + probably less that 80 characters. Remember that you ideally would like + the entire output to appear in a pager message, which will get chopped + off after a certain length. + +
Print only one line of text + Nagios will only grab the first line of text from STDOUT + when it notifies contacts about potential problems. If you print + multiple lines, you're out of luck. Remember, keep it short and + to the point. +
+ +
Screen Output + The plug-in should print the diagnostic and just the + synopsis part of the help message. A well written plugin would + then have --help as a way to get the verbose help. + Code and output should try to respect the 80x25 size of a + crt (remember when fixing stuff in the server room!) +
+ +
Return the proper status code + See below + for the numeric values of status codes and their + description. Remember to return an UNKNOWN state if bogus or + invalid command line arguments are supplied or it you are unable + to check the service. +
+ +
Plugin Return Codes + The return codes below are based on the POSIX spec of returning + a positive value. Netsaint prior to v0.0.7 supported non-POSIX + compliant return code of "-1" for unknown. Nagios supports POSIX return + codes by default. + + Note: Some plugins will on occasion print on STDOUT that an error + occurred and error code is 138 or 255 or some such number. These + are usually caused by plugins using system commands and having not + enough checks to catch unexpected output. Developers should include a + default catch-all for system command output that returns an UNKOWN + return code. + + Plugin Return Codes + + + + Numeric Value + Service Status + Status Description + + + + + 0 + OK + The plugin was able to check the service and it + appeared to be functioning properly + + + 1 + Warning + The plugin was able to check the service, but it + appeared to be above some "warning" threshold or did not appear + to be working properly + + + 2 + Critical + The plugin detected that either the service was not + running or it was above some "critical" threshold + + + 3 + Unknown + Invalid command line arguments were supplied to the + plugin or the plugin was unable to check the status of the given + hosts/service + + + +
+ + +
+ + +
+ +
System Commands and Auxiliary Files + +
Don't execute system commands without specifying their + full path + Don't use exec(), popen(), etc. to execute external + commands without explicity using the full path of the external + program. + + Doing otherwise makes the plugin vulnerable to hijacking + by a trojan horse earlier in the search path. See the main + plugin distribution for examples on how this is done. +
+ +
Use spopen() if external commands must be executed + + If you have to execute external commands from within your + plugin and you're writing it in C, use the spopen() function + that Karl DeBisschop has written. + + The code for spopen() and spclose() is included with the + core plugin distribution. +
+ +
Don't make temp files unless absolutely required + + If temp files are needed, make sure that the plugin will + fail cleanly if the file can't be written (e.g., too few file + handles, out of disk space, incorrect permissions, etc.) and + delete the temp file when processing is complete. +
+ +
Don't be tricked into following symlinks + + If your plugin opens any files, take steps to ensure that + you are not following a symlink to another location on the + system. +
+ +
Validate all input + + use routines in utils.c or utils.pm and write more as needed +
+ +
+ + + + +
Perl Plugins + + Perl plugins are coded a little more defensively than other + plugins because of embedded Perl. When configured as such, embedded + Perl Nagios (ePN) requires stricter use of the some of Perl's features. + This section outlines some of the steps needed to use ePN + effectively. + + + + Do not use BEGIN and END blocks since they will be called + the first time and when Nagios shuts down with Embedded Perl (ePN). In + particular, do not use BEGIN blocks to initialize variables. + + + To use utils.pm, you need to provide a full path to the + module in order for it to work with ePN. + + + e.g. + use lib "/usr/local/nagios/libexec"; + use utils qw(...); + + + + Perl scripts should be called with "-w" + + + All Perl plugins must compile cleanly under "use strict" - i.e. at + least explicitly package names as in "$main::x" or predeclare every + variable. + + + Explicitly initialize each varialable in use. Otherwise with + caching enabled, the plugin will not be recompilied each time, and + therefore Perl will not reinitialize all the variables. All old + variable values will still be in effect. + + + Do not use < DATA > (these simply do not compile under ePN). + + + Do not use named subroutines + + + If writing to a file (perhaps recording + performance data) explicitly close close it. The plugin never + calls exit; that is caught by + p1.pl, so output streams are never closed. + + + As in all plugins need + to monitor their runtime, specially if they are using network + resources. Use of the alarm is recommended. + Plugins may import a default time out ($TIMEOUT) from utils.pm. + + + + Perl plugins should import %ERRORS from utils.pm + and then "exit $ERRORS{'OK'}" rather than "exit 0" + + + + + +
+ +
Runtime Timeouts + + Plugins have a very limited runtime - typically 10 sec. + As a result, it is very important for plugins to maintain internal + code to exit if runtime exceeds a threshold. + + All plugins should timeout gracefully, not just networking + plugins. For instance, df may lock if you have automounted + drives and your network fails - but on first glance, who'd think + df could lock up like that. Plus, it should just be more error + resistant to be able to time out rather than consume + resources. + +
Use DEFAULT_SOCKET_TIMEOUT + + All network plugins should use DEFAULT_SOCKET_TIMEOUT to timeout + +
+ + +
Add alarms to network plugins + + If you write a plugin which communicates with another + networked host, you should make sure to set an alarm() in your + code that prevents the plugin from hanging due to abnormal + socket closures, etc. Nagios takes steps to protect itself + against unruly plugins that timeout, but any plugins you create + should be well behaved on their own. + +
+ + + +
+ +
Plugin Options + + A well written plugin should have --help as a way to get + verbose help. Code and output should try to respect the 80x25 size of a + crt (remember when fixing stuff in the server room!) + +
Option Processing + + For plugins written in C, we recommend the C standard + getopt library for short options. If using getopt_long, check to + be sure that HAVE_GETOPT_H is defined (configure checks this and + sets the #define in common/config.h). + + For plugins written in Perl, we recommend Getopt::Long module. + + Positional arguments are strongly discouraged. + + There are a few reserved options that should not be used + for other purposes: + + + -V version (--version) + -h help (--help) + -t timeout (--timeout) + -w warning threshold (--warning) + -c critical threshold (--critical) + -H hostname (--hostname) + + + In addition to the reserved options above, some other standard options are: + + + -C SNMP community (--community) + -a authentication password (--authentication) + -l login name (--logname) + -p port or password (--port or --passwd/--password)monitors operational + -u url or username (--url or --username) + + + Look at check_pgsql and check_procs to see how I currently + think this can work. Standard options are: + + + The option -V or --version should be present in all + plugins. For C plugins it should result in a call to print_revision, a + function in utils.c which takes two character arguments, the + command name and the plugin revision. + + The -? option, or any other unparsable set of options, + should print out a short usage statement. Character width should + be 80 and less and no more that 23 lines should be printed (it + should display cleanly on a dumb terminal in a server + room). + + The option -h or --help should be present in all plugins. + In C plugins, it should result in a call to print_help (or + equivalent). The function print_help should call print_revision, + then print_usage, then should provide detailed + help. Help text should fit on an 80-character width display, but + may run as many lines as needed. + +
+ +
+ Plugins with more than one type of threshold, or with + threshold ranges + + Old style was to do things like -ct for critical time and + -cv for critical value. That goes out the window with POSIX + getopt. The allowable alternatves are: + + + + long options like -critical-time (or -ct and -cv, I + suppose). + + + + repeated options like `check_load -w 10 -w 6 -w 4 -c + 16 -c 10 -c 10` + + + + for brevity, the above can be expressed as `check_load + -w 10,6,4 -c 16,10,10` + + + + ranges are expressed with colons as in `check_procs -C + httpd -w 1:20 -c 1:30` which will warn above 20 instances, + and critical at 0 and above 30 + + + + lists are expressed with commas, so Jacob's check_nmap + uses constructs like '-p 1000,1010,1050:1060,2000' + + + + If possible when writing lists, use tokens to make the + list easy to remember and non-order dependent - so + check_disk uses '-c 10000,10%' so that it is clear which is + the precentage and which is the KB values (note that due to + my own lack of foresight, that used to be '-c 10000:10%' but + such constructs should all be changed for consistency, + though providing reverse compatibility is fairly + easy). + + + + + As always, comments are welcome - making this consistent + without a host of long options was quite a hassle, and I would + suspect that there are flaws in this strategy. Perhaps clear + long-options is the most important of the above choices, but not + all POSIX systems have C libraries for long options, so the + short forms must exist as well. +
+
+ +
New submissions and patches + + If you would like other to use your plugins and have it included in + the standard distribution, please include patches for the relavant + configuration files, in particular "configure.in" Otherwise submitted + plugins will be included in the contrib directory. + + Plugins in the contrib directory are going to be migrated to the + standard plugins/plugin-scripts directory as time permits and per user + requests + + Patches should be submitted via the SourceForge and be announced to + the mailing list. + + For new plugins, provide a diff to add to the EXTRAS list (configure.in) + unless you are fairly sure that the plugin will work for all platforms with + no non-standard software added. + + If possible please submit a test harness. Documentation on sample + tests coming soon. + +
+
+ +