Diffstat (limited to 'doc/developer-guidelines.sgml')
-rw-r--r-- | doc/developer-guidelines.sgml | 483 |
1 files changed, 483 insertions, 0 deletions
diff --git a/doc/developer-guidelines.sgml b/doc/developer-guidelines.sgml new file mode 100644 index 0000000..42ad896 --- /dev/null +++ b/doc/developer-guidelines.sgml | |||
@@ -0,0 +1,483 @@ | |||
1 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.1//EN"> | ||
2 | <book> | ||
3 | <title>Nagios Plug-in Developer Guidelines</title> | ||
4 | |||
5 | <bookinfo> | ||
6 | <authorgroup> | ||
7 | <author> | ||
8 | <firstname>Karl</firstname> | ||
9 | <surname>DeBisschop</surname> | ||
10 | <affiliation> | ||
11 | <address><email>karl@debisschop.net</email></address> | ||
12 | </affiliation> | ||
13 | </author> | ||
14 | |||
15 | <author> | ||
16 | <firstname>Ethan</firstname> | ||
17 | <surname>Galstad</surname> | ||
18 | <authorblurb> | ||
19 | <para>Author of Nagios</para> | ||
20 | <para><ulink url="http://www.nagios.org"></ulink></para> | ||
21 | </authorblurb> | ||
22 | <affiliation> | ||
23 | <address><email>netsaint@linuxbox.com</email></address> | ||
24 | </affiliation> | ||
25 | </author> | ||
26 | |||
27 | <author> | ||
28 | <firstname>Hugo</firstname> | ||
29 | <surname>Gayosso</surname> | ||
30 | <affiliation> | ||
31 | <address><email>hgayosso@gnu.org</email></address> | ||
32 | </affiliation> | ||
33 | </author> | ||
34 | |||
35 | |||
36 | <author> | ||
37 | <firstname>Subhendu</firstname> | ||
38 | <surname>Ghosh</surname> | ||
39 | <affiliation> | ||
40 | <address><email>sghosh@sourceforge.net</email></address> | ||
41 | </affiliation> | ||
42 | </author> | ||
43 | |||
44 | <author> | ||
45 | <firstname>Stanley</firstname> | ||
46 | <surname>Hopcroft</surname> | ||
47 | <affiliation> | ||
48 | <address><email>stanleyhopcroft@sourceforge.net</email></address> | ||
49 | </affiliation> | ||
50 | </author> | ||
51 | |||
52 | </authorgroup> | ||
53 | |||
54 | <pubdate>2002</pubdate> | ||
55 | <title>Nagios plug-in development guidelines</title> | ||
56 | |||
57 | <revhistory> | ||
58 | <revision> | ||
59 | <revnumber>0.4</revnumber> | ||
60 | <date>2 May 2002</date> | ||
61 | </revision> | ||
62 | </revhistory> | ||
63 | |||
64 | <copyright> | ||
65 | <year>2000 2001 2002</year> | ||
66 | <holder>Karl DeBisschop, Ethan Galstad, | ||
67 | Hugo Gayosso, Stanley Hopcroft, Subhendu Ghosh</holder> | ||
68 | </copyright> | ||
69 | |||
70 | </bookinfo> | ||
71 | |||
72 | |||
73 | <preface id=preface> | ||
74 | <title>About the guidelines</title> | ||
75 | |||
76 | <para>The purpose of these guidelines is to provide a reference for | ||
77 | plug-in developers and to encourage the standardization of the | ||
78 | different kinds of plug-ins: C, shell, Perl, Python, etc.</para> | ||
79 | |||
80 | |||
81 | <section> <title>Copyright</title> | ||
82 | |||
83 | <para>Nagios Plug-in Development Guidelines Copyright (C) 2000 2001 | ||
84 | 2002 | ||
85 | Karl DeBisschop, Ethan Galstad, Hugo Gayosso, Stanley Hopcroft, | ||
86 | Subhendu Ghosh</para> | ||
87 | |||
88 | <para>Permission is granted to make and distribute verbatim | ||
89 | copies of this manual provided the copyright notice and this | ||
90 | permission notice are preserved on all copies.</para> | ||
91 | |||
92 | <para>The plugins themselves are copyrighted by their respective | ||
93 | authors.</para> | ||
94 | |||
95 | </section> | ||
96 | </preface> | ||
97 | |||
98 | <article> | ||
99 | <section id="PlugOutput"><title>Plugin Output for Nagios</title> | ||
100 | |||
101 | <para>You should always print something to STDOUT that tells if the | ||
102 | service is working or why it's failing. Try to keep the output short - | ||
103 | probably less than 80 characters. Remember that you ideally would like | ||
104 | the entire output to appear in a pager message, which will get chopped | ||
105 | off after a certain length.</para> | ||
106 | |||
107 | <section><title>Print only one line of text</title> | ||
108 | <para>Nagios will only grab the first line of text from STDOUT | ||
109 | when it notifies contacts about potential problems. If you print | ||
110 | multiple lines, you're out of luck. Remember, keep it short and | ||
111 | to the point.</para> | ||
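<para>As a rough illustration of the one-line rule, the C sketch below trims a message to its first line and to an assumed limit of 80 characters before it is printed. The helper name first_line and the MAX_OUTPUT constant are hypothetical and are not part of the plugin distribution.</para>

```c
#include <stddef.h>
#include <string.h>

#define MAX_OUTPUT 80   /* assumed limit, taken from the 80-character advice */

/* Hypothetical helper: copy the first line of msg into out, truncated to
 * fit out_size bytes including the terminating NUL. Returns the number of
 * characters stored. */
size_t first_line(const char *msg, char *out, size_t out_size)
{
    size_t n;

    if (out_size == 0)
        return 0;
    n = strcspn(msg, "\r\n");      /* length up to the first line break */
    if (n > out_size - 1)
        n = out_size - 1;
    memcpy(out, msg, n);
    out[n] = '\0';
    return n;
}
```

<para>A caller might declare a buffer of MAX_OUTPUT + 1 bytes, pass the raw message through first_line, and print the result followed by a single newline.</para>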
112 | </section> | ||
113 | |||
114 | <section><title>Screen Output</title> | ||
115 | <para>The plug-in should print the diagnostic and just the | ||
116 | synopsis part of the help message. A well written plugin would | ||
117 | then have --help as a way to get the verbose help.</para> | ||
118 | <para>Code and output should try to respect the 80x25 size of a | ||
119 | crt (remember when fixing stuff in the server room!)</para> | ||
120 | </section> | ||
121 | |||
122 | <section><title>Return the proper status code</title> | ||
123 | <para>See <xref linkend="ReturnCodes"> below | ||
124 | for the numeric values of status codes and their | ||
125 | description. Remember to return an UNKNOWN state if bogus or | ||
126 | invalid command line arguments are supplied or if you are unable | ||
127 | to check the service.</para> | ||
128 | </section> | ||
129 | |||
130 | <section><title>Plugin Return Codes</title> | ||
131 | <para>The return codes below are based on the POSIX spec of returning | ||
132 | a positive value. Netsaint prior to v0.0.7 supported a non-POSIX | ||
133 | compliant return code of "-1" for unknown. Nagios supports POSIX return | ||
134 | codes by default.</para> | ||
135 | |||
136 | <para>Note: Some plugins will on occasion print on STDOUT that an error | ||
137 | occurred and that the error code is 138 or 255 or some such number. This | ||
138 | is usually caused by plugins using system commands without | ||
139 | enough checks to catch unexpected output. Developers should include a | ||
140 | default catch-all for system command output that returns an UNKNOWN | ||
141 | return code.</para> | ||
142 | |||
143 | <table id="ReturnCodes"><title>Plugin Return Codes</title> | ||
144 | <tgroup cols="3"> | ||
145 | <thead> | ||
146 | <row> | ||
147 | <entry><para>Numeric Value</para></entry> | ||
148 | <entry><para>Service Status</para></entry> | ||
149 | <entry><para>Status Description</para></entry> | ||
150 | </row> | ||
151 | </thead> | ||
152 | <tbody> | ||
153 | <row> | ||
154 | <entry align=center><para>0</para></entry> | ||
155 | <entry valign=middle><para>OK</para></entry> | ||
156 | <entry><para>The plugin was able to check the service and it | ||
157 | appeared to be functioning properly</para></entry> | ||
158 | </row> | ||
159 | <row> | ||
160 | <entry align=center><para>1</para></entry> | ||
161 | <entry valign=middle><para>Warning</para></entry> | ||
162 | <entry><para>The plugin was able to check the service, but it | ||
163 | appeared to be above some "warning" threshold or did not appear | ||
164 | to be working properly</para></entry> | ||
165 | </row> | ||
166 | <row> | ||
167 | <entry align=center><para>2</para></entry> | ||
168 | <entry valign=middle><para>Critical</para></entry> | ||
169 | <entry><para>The plugin detected that either the service was not | ||
170 | running or it was above some "critical" threshold</para></entry> | ||
171 | </row> | ||
172 | <row> | ||
173 | <entry align=center><para>3</para></entry> | ||
174 | <entry valign=middle><para>Unknown</para></entry> | ||
175 | <entry><para>Invalid command line arguments were supplied to the | ||
176 | plugin or the plugin was unable to check the status of the given | ||
177 | hosts/service</para></entry> | ||
178 | </row> | ||
179 | </tbody> | ||
180 | </tgroup> | ||
181 | </table> | ||
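<para>The table and the catch-all advice above could be expressed in C roughly as follows. This is only a sketch: the enum names and the normalize_state helper are illustrative assumptions, not code taken from the plugin distribution.</para>

```c
/* Plugin return codes, mirroring the table above.
 * The identifier names are assumptions made for this sketch. */
enum plugin_state {
    STATE_OK = 0,
    STATE_WARNING = 1,
    STATE_CRITICAL = 2,
    STATE_UNKNOWN = 3
};

/* Hypothetical catch-all: map whatever status a system command produced
 * onto one of the four plugin states. Anything outside 0..2, such as
 * 138, 255 or -1, is reported as UNKNOWN. */
int normalize_state(int raw_status)
{
    switch (raw_status) {
    case STATE_OK:
    case STATE_WARNING:
    case STATE_CRITICAL:
        return raw_status;
    default:
        return STATE_UNKNOWN;
    }
}
```

<para>A plugin would then end with something like return normalize_state(result), so that an unexpected value never reaches Nagios.</para>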
182 | |||
183 | |||
184 | </section> | ||
185 | |||
186 | |||
187 | </section> | ||
188 | |||
189 | <section id="SysCmdAuxFiles"><title>System Commands and Auxiliary Files</title> | ||
190 | |||
191 | <section><title>Don't execute system commands without specifying their | ||
192 | full path</title> | ||
193 | <para>Don't use exec(), popen(), etc. to execute external | ||
194 | commands without explicitly using the full path of the external | ||
195 | program.</para> | ||
196 | |||
197 | <para>Doing otherwise makes the plugin vulnerable to hijacking | ||
198 | by a trojan horse earlier in the search path. See the main | ||
199 | plugin distribution for examples on how this is done.</para> | ||
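<para>One simple guard, sketched here under the assumption that the command string begins with the program name, is to refuse any command that does not start with an absolute path. The helper has_full_path is hypothetical and is not the mechanism used in the main plugin distribution.</para>

```c
#include <stddef.h>

/* Hypothetical check: return 1 when cmd starts with an absolute path and
 * 0 otherwise. A command such as "df -k" would be resolved through the
 * search path and is therefore rejected. */
int has_full_path(const char *cmd)
{
    return cmd != NULL && cmd[0] == '/';
}
```

<para>A plugin could call this before handing the string to popen() or exec(), and return UNKNOWN when the check fails.</para>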
200 | </section> | ||
201 | |||
202 | <section><title>Use spopen() if external commands must be executed</title> | ||
203 | |||
204 | <para>If you have to execute external commands from within your | ||
205 | plugin and you're writing it in C, use the spopen() function | ||
206 | that Karl DeBisschop has written.</para> | ||
207 | |||
208 | <para>The code for spopen() and spclose() is included with the | ||
209 | core plugin distribution.</para> | ||
210 | </section> | ||
211 | |||
212 | <section><title>Don't make temp files unless absolutely required</title> | ||
213 | |||
214 | <para>If temp files are needed, make sure that the plugin will | ||
215 | fail cleanly if the file can't be written (e.g., too few file | ||
216 | handles, out of disk space, incorrect permissions, etc.) and | ||
217 | delete the temp file when processing is complete.</para> | ||
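<para>A minimal sketch of this pattern in C, assuming a POSIX system with mkstemp() available, is shown below. The function name scratch_roundtrip, the file name template, and the messages are hypothetical.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum { STATE_OK = 0, STATE_UNKNOWN = 3 };

/* Hypothetical helper: write data to a scratch file, then remove the
 * file again. Fails cleanly with UNKNOWN if the file cannot be created
 * or written. */
int scratch_roundtrip(const char *data)
{
    char path[] = "/tmp/check_example_XXXXXX";  /* template is an assumption */
    size_t len = strlen(data);
    int state = STATE_OK;
    int fd = mkstemp(path);                     /* creates the file with mode 0600 */

    if (fd < 0) {
        printf("UNKNOWN - cannot create temp file\n");
        return STATE_UNKNOWN;
    }
    if (write(fd, data, len) != (ssize_t)len) {
        printf("UNKNOWN - cannot write temp file\n");
        state = STATE_UNKNOWN;
    }
    close(fd);
    unlink(path);                               /* always delete when done */
    return state;
}
```

<para>The file is removed on both the success and the failure path, so a full disk or exhausted file handles do not leave debris behind.</para>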
218 | </section> | ||
219 | |||
220 | <section><title>Don't be tricked into following symlinks</title> | ||
221 | |||
222 | <para>If your plugin opens any files, take steps to ensure that | ||
223 | you are not following a symlink to another location on the | ||
224 | system.</para> | ||
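<para>On systems that provide the POSIX O_NOFOLLOW flag, one way to do this is to ask open() to fail when the final path component is a symlink. The wrapper name open_nofollow is hypothetical; the sketch assumes O_NOFOLLOW is available on the target platform.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wrapper: open path read-only, but fail (return -1) if the
 * last component of path is a symbolic link. */
int open_nofollow(const char *path)
{
    return open(path, O_RDONLY | O_NOFOLLOW);
}
```

<para>Note that O_NOFOLLOW only covers the last path component; symlinks in parent directories are still followed, so this is a partial safeguard rather than a complete one.</para>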
225 | </section> | ||
226 | |||
227 | <section><title>Validate all input</title> | ||
228 | |||
229 | <para>Use the routines in utils.c or utils.pm, and write more as needed.</para> | ||
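<para>The routines in utils.c are not reproduced here. As a stand-alone illustration of the kind of check meant, the hypothetical validator below accepts a port argument only if it is a plain decimal number between 1 and 65535.</para>

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical validator: parse arg as a port number.
 * Returns the port (1..65535) on success, or -1 if arg is missing,
 * empty, not purely numeric, or out of range. */
int parse_port(const char *arg)
{
    char *end;
    long value;

    if (arg == NULL || *arg == '\0')
        return -1;
    errno = 0;
    value = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1 || value > 65535)
        return -1;
    return (int)value;
}
```

<para>A plugin would typically return UNKNOWN when such a validator rejects a command line argument.</para>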
230 | </section> | ||
231 | |||
232 | </section> | ||
233 | |||
234 | |||
235 | |||
236 | |||
237 | <section id="PerlPlugin"><title>Perl Plugins</title> | ||
238 | |||
239 | <para>Perl plugins are coded a little more defensively than other | ||
240 | plugins because of embedded Perl. When configured as such, embedded | ||
241 | Perl Nagios (ePN) requires stricter use of some of Perl's features. | ||
242 | This section outlines some of the steps needed to use ePN | ||
243 | effectively.</para> | ||
244 | |||
245 | <orderedlist> | ||
246 | |||
247 | <listitem><para>Do not use BEGIN and END blocks: with Embedded Perl (ePN) | ||
248 | they are called only the first time and when Nagios shuts down. In | ||
249 | particular, do not use BEGIN blocks to initialize variables.</para> | ||
250 | </listitem> | ||
251 | |||
252 | <listitem><para>To use utils.pm, you need to provide a full path to the | ||
253 | module in order for it to work with ePN.</para> | ||
254 | |||
255 | <literallayout> | ||
256 | e.g. | ||
257 | use lib "/usr/local/nagios/libexec"; | ||
258 | use utils qw(...); | ||
259 | </literallayout> | ||
260 | </listitem> | ||
261 | |||
262 | <listitem><para>Perl scripts should be called with "-w"</para> | ||
263 | </listitem> | ||
264 | |||
265 | <listitem><para>All Perl plugins must compile cleanly under "use strict" - i.e. at | ||
266 | least explicitly use package names as in "$main::x" or predeclare every | ||
267 | variable. </para> | ||
268 | |||
269 | |||
270 | <para>Explicitly initialize each variable in use. Otherwise, with | ||
271 | caching enabled, the plugin will not be recompiled each time, and | ||
272 | therefore Perl will not reinitialize all the variables. All old | ||
273 | variable values will still be in effect.</para> | ||
274 | </listitem> | ||
275 | |||
276 | <listitem><para>Do not use < DATA > (these simply do not compile under ePN).</para> | ||
277 | </listitem> | ||
278 | |||
279 | <listitem><para>Do not use named subroutines</para> | ||
280 | </listitem> | ||
281 | |||
282 | <listitem><para>If writing to a file (perhaps recording | ||
283 | performance data) explicitly close it. The plugin never | ||
284 | calls <emphasis role=strong>exit</emphasis>; that is caught by | ||
285 | p1.pl, so output streams are never closed.</para> | ||
286 | </listitem> | ||
287 | |||
288 | <listitem><para>As in <xref linkend="runtime"> all plugins need | ||
289 | to monitor their runtime, especially if they are using network | ||
290 | resources. Use of <emphasis>alarm</emphasis> is recommended. | ||
291 | Plugins may import a default timeout ($TIMEOUT) from utils.pm. | ||
292 | </para> | ||
293 | </listitem> | ||
294 | |||
295 | <listitem><para>Perl plugins should import %ERRORS from utils.pm | ||
296 | and then "exit $ERRORS{'OK'}" rather than "exit 0" | ||
297 | </para> | ||
298 | </listitem> | ||
299 | |||
300 | </orderedlist> | ||
301 | |||
302 | </section> | ||
303 | |||
304 | <section id="runtime"><title>Runtime Timeouts</title> | ||
305 | |||
306 | <para>Plugins have a very limited runtime - typically 10 sec. | ||
307 | As a result, it is very important for plugins to maintain internal | ||
308 | code to exit if runtime exceeds a threshold. </para> | ||
309 | |||
310 | <para>All plugins should time out gracefully, not just networking | ||
311 | plugins. For instance, df may lock if you have automounted | ||
312 | drives and your network fails - but at first glance, who'd think | ||
313 | df could lock up like that? Plus, a plugin that can time out is | ||
314 | more error resistant than one that keeps consuming | ||
315 | resources.</para> | ||
316 | |||
317 | <section><title>Use DEFAULT_SOCKET_TIMEOUT</title> | ||
318 | |||
319 | <para>All network plugins should use DEFAULT_SOCKET_TIMEOUT to time out.</para> | ||
320 | |||
321 | </section> | ||
322 | |||
323 | |||
324 | <section><title>Add alarms to network plugins</title> | ||
325 | |||
326 | <para>If you write a plugin which communicates with another | ||
327 | networked host, you should make sure to set an alarm() in your | ||
328 | code that prevents the plugin from hanging due to abnormal | ||
329 | socket closures, etc. Nagios takes steps to protect itself | ||
330 | against unruly plugins that time out, but any plugins you create | ||
331 | should be well behaved on their own.</para> | ||
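<para>A minimal sketch of such an alarm in C is given below. The value given to DEFAULT_SOCKET_TIMEOUT here (10 seconds, matching the typical runtime mentioned above), the function names, the message text, and the choice of UNKNOWN as the timeout state are all assumptions made for illustration.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

#ifndef DEFAULT_SOCKET_TIMEOUT
#define DEFAULT_SOCKET_TIMEOUT 10   /* assumed value, in seconds */
#endif

enum { STATE_UNKNOWN = 3 };

/* Runs when the alarm fires: print one line and exit.
 * Only async-signal-safe calls (write, _exit) are used here. */
static void timeout_handler(int signo)
{
    static const char msg[] = "UNKNOWN - plugin timed out\n";
    ssize_t ignored;

    (void)signo;
    ignored = write(STDOUT_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(STATE_UNKNOWN);
}

/* Hypothetical helper: arm the timeout before starting network I/O. */
void arm_timeout(unsigned int seconds)
{
    signal(SIGALRM, timeout_handler);
    alarm(seconds);
}
```

<para>A plugin might call arm_timeout(DEFAULT_SOCKET_TIMEOUT) right after option parsing and alarm(0) once the check has completed.</para>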
332 | |||
333 | </section> | ||
334 | |||
335 | |||
336 | |||
337 | </section> | ||
338 | |||
339 | <section id="PlugOptions"><title>Plugin Options</title> | ||
340 | |||
341 | <para>A well written plugin should have --help as a way to get | ||
342 | verbose help. Code and output should try to respect the 80x25 size of a | ||
343 | crt (remember when fixing stuff in the server room!)</para> | ||
344 | |||
345 | <section><title>Option Processing</title> | ||
346 | |||
347 | <para>For plugins written in C, we recommend the C standard | ||
348 | getopt library for short options. If using getopt_long, check to | ||
349 | be sure that HAVE_GETOPT_H is defined (configure checks this and | ||
350 | sets the #define in common/config.h).</para> | ||
351 | |||
352 | <para>For plugins written in Perl, we recommend the Getopt::Long module.</para> | ||
353 | |||
354 | <para>Positional arguments are strongly discouraged.</para> | ||
355 | |||
356 | <para>There are a few reserved options that should not be used | ||
357 | for other purposes:</para> | ||
358 | |||
359 | <literallayout> | ||
360 | -V version (--version) | ||
361 | -h help (--help) | ||
362 | -t timeout (--timeout) | ||
363 | -w warning threshold (--warning) | ||
364 | -c critical threshold (--critical) | ||
365 | -H hostname (--hostname) | ||
366 | </literallayout> | ||
367 | |||
368 | <para>In addition to the reserved options above, some other standard options are:</para> | ||
369 | |||
370 | <literallayout> | ||
371 | -C SNMP community (--community) | ||
372 | -a authentication password (--authentication) | ||
373 | -l login name (--logname) | ||
374 | -p port or password (--port or --passwd/--password) | ||
375 | -u url or username (--url or --username) | ||
376 | </literallayout> | ||
377 | |||
378 | <para>Look at check_pgsql and check_procs to see how I currently | ||
379 | think this can work.</para> | ||
380 | |||
381 | |||
382 | <para>The option -V or --version should be present in all | ||
383 | plugins. For C plugins it should result in a call to print_revision, a | ||
384 | function in utils.c which takes two string arguments, the | ||
385 | command name and the plugin revision.</para> | ||
386 | |||
387 | <para>The -? option, or any other unparsable set of options, | ||
388 | should print out a short usage statement. Character width should | ||
389 | be 80 or less and no more than 23 lines should be printed (it | ||
390 | should display cleanly on a dumb terminal in a server | ||
391 | room).</para> | ||
392 | |||
393 | <para>The option -h or --help should be present in all plugins. | ||
394 | In C plugins, it should result in a call to print_help (or | ||
395 | equivalent). The function print_help should call print_revision, | ||
396 | then print_usage, then should provide detailed | ||
397 | help. Help text should fit on an 80-character width display, but | ||
398 | may run as many lines as needed.</para> | ||
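<para>The reserved short options could be handled with getopt() along the following lines. This is a sketch only: the plugin_opts structure and the parse_opts function are hypothetical, and a real plugin would call print_revision, print_help or a usage routine where this sketch merely sets a flag or returns -1.</para>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical container for the reserved options. */
struct plugin_opts {
    const char *hostname;   /* -H */
    const char *warning;    /* -w */
    const char *critical;   /* -c */
    int timeout;            /* -t, in seconds */
    int show_help;          /* -h */
    int show_version;       /* -V */
};

/* Parse the reserved short options. Returns 0 on success, or -1 when the
 * options cannot be parsed (including -?), in which case the caller
 * should print the short usage statement. */
int parse_opts(int argc, char **argv, struct plugin_opts *o)
{
    int c;
    char *end;
    long t;

    optind = 1;     /* start scanning from the first argument */
    opterr = 0;     /* the plugin prints its own usage message */
    while ((c = getopt(argc, argv, "Vht:w:c:H:")) != -1) {
        switch (c) {
        case 'V': o->show_version = 1; break;
        case 'h': o->show_help = 1; break;
        case 'w': o->warning = optarg; break;
        case 'c': o->critical = optarg; break;
        case 'H': o->hostname = optarg; break;
        case 't':
            t = strtol(optarg, &end, 10);
            if (*end != '\0' || t <= 0)
                return -1;
            o->timeout = (int)t;
            break;
        default:
            return -1;
        }
    }
    /* Positional arguments are discouraged, so treat them as an error. */
    return optind < argc ? -1 : 0;
}
```

<para>Thresholds are kept as strings here so that range or list syntax can be parsed in a separate step.</para>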
399 | |||
400 | </section> | ||
401 | |||
402 | <section> | ||
403 | <title>Plugins with more than one type of threshold, or with | ||
404 | threshold ranges</title> | ||
405 | |||
406 | <para>Old style was to do things like -ct for critical time and | ||
407 | -cv for critical value. That goes out the window with POSIX | ||
408 | getopt. The allowable alternatives are:</para> | ||
409 | |||
410 | <orderedlist> | ||
411 | <listitem> | ||
412 | <para>long options like -critical-time (or -ct and -cv, I | ||
413 | suppose).</para> | ||
414 | </listitem> | ||
415 | |||
416 | <listitem> | ||
417 | <para>repeated options like `check_load -w 10 -w 6 -w 4 -c | ||
418 | 16 -c 10 -c 10`</para> | ||
419 | </listitem> | ||
420 | |||
421 | <listitem> | ||
422 | <para>for brevity, the above can be expressed as `check_load | ||
423 | -w 10,6,4 -c 16,10,10`</para> | ||
424 | </listitem> | ||
425 | |||
426 | <listitem> | ||
427 | <para>ranges are expressed with colons as in `check_procs -C | ||
428 | httpd -w 1:20 -c 1:30` which will warn above 20 instances, | ||
429 | and critical at 0 and above 30</para> | ||
430 | </listitem> | ||
431 | |||
432 | <listitem> | ||
433 | <para>lists are expressed with commas, so Jacob's check_nmap | ||
434 | uses constructs like '-p 1000,1010,1050:1060,2000'</para> | ||
435 | </listitem> | ||
436 | |||
437 | <listitem> | ||
438 | <para>If possible when writing lists, use tokens to make the | ||
439 | list easy to remember and non-order dependent - so | ||
440 | check_disk uses '-c 10000,10%' so that it is clear which is | ||
441 | the percentage and which is the KB value (note that due to | ||
442 | my own lack of foresight, that used to be '-c 10000:10%' but | ||
443 | such constructs should all be changed for consistency, | ||
444 | though providing backward compatibility is fairly | ||
445 | easy).</para> | ||
446 | </listitem> | ||
447 | |||
448 | </orderedlist> | ||
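<para>The colon range syntax from the list above might be parsed as in the sketch below. The range structure and both function names are hypothetical, and only the simple low:high form is handled; comma separated lists would need an additional splitting step.</para>

```c
#include <stdio.h>

/* Hypothetical representation of a threshold range "low:high". */
struct range {
    long low;
    long high;
};

/* Parse arg of the form "low:high". Returns 0 on success, or -1 if the
 * text is malformed, has trailing characters, or low exceeds high. */
int parse_range(const char *arg, struct range *r)
{
    char extra;

    if (sscanf(arg, "%ld:%ld%c", &r->low, &r->high, &extra) != 2)
        return -1;
    return r->low <= r->high ? 0 : -1;
}

/* Return 1 when value lies outside the range, meaning the threshold
 * is violated, and 0 when it lies inside. */
int outside_range(const struct range *r, long value)
{
    return value < r->low || value > r->high;
}
```

<para>With the check_procs example, a warning range of 1:20 is violated by 21 processes, and a critical range of 1:30 is violated by 0 processes or by 31 and more.</para>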
449 | |||
450 | <para>As always, comments are welcome - making this consistent | ||
451 | without a host of long options was quite a hassle, and I would | ||
452 | suspect that there are flaws in this strategy. Perhaps clear | ||
453 | long options are the most important of the above choices, but not | ||
454 | all POSIX systems have C libraries for long options, so the | ||
455 | short forms must exist as well.</para> | ||
456 | </section> | ||
457 | </section> | ||
458 | |||
459 | <section id="SubmittingChanges"><title>New submissions and patches</title> | ||
460 | |||
461 | <para>If you would like others to use your plugins and have them included in | ||
462 | the standard distribution, please include patches for the relevant | ||
463 | configuration files, in particular "configure.in". Otherwise submitted | ||
464 | plugins will be included in the contrib directory.</para> | ||
465 | |||
466 | <para>Plugins in the contrib directory are going to be migrated to the | ||
467 | standard plugins/plugin-scripts directory as time permits and per user | ||
468 | requests.</para> | ||
469 | |||
470 | <para>Patches should be submitted via SourceForge and be announced to | ||
471 | the mailing list.</para> | ||
472 | |||
473 | <para>For new plugins, provide a diff to add to the EXTRAS list (configure.in) | ||
474 | unless you are fairly sure that the plugin will work for all platforms with | ||
475 | no non-standard software added.</para> | ||
476 | |||
477 | <para>If possible please submit a test harness. Documentation on sample | ||
478 | tests coming soon.</para> | ||
479 | |||
480 | </section> | ||
481 | </article> | ||
482 | |||
483 | </book> | ||