<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
On 2019-11-27 08:45, Csaba Dobo wrote:<br>
<blockquote type="cite"
cite="mid:CAOW7Zz3t2bzh4ZwuRJrA4o6iJU-mFVc+L9kR_E6qh9KkaEJNpQ@mail.gmail.com">
<div dir="ltr">I am investigating this plugin and would like to
know the calculating method.<br>
<div><a
href="https://www.monitoring-plugins.org/doc/man/check_load.html"
moz-do-not-send="true">https://www.monitoring-plugins.org/doc/man/check_load.html</a></div>
<div><br>
</div>
<div>
<pre><code>-w, --warning=WLOAD1,WLOAD5,WLOAD15
Exit with WARNING status if load average exceeds WLOADn
-c, --critical=CLOAD1,CLOAD5,CLOAD15
Exit with CRITICAL status if load average exceed CLOADn
the load average format is the same used by "uptime" and "w"</code></pre>
</div>
<div>So when the system reports 3 values, e.g. from uptime, they
would be read by the plugin. And what is the evaluation logic?</div>
<br>
</div>
</blockquote>
<br>
Hi Csaba,<br>
<br>
This plugin simply returns WARNING or CRITICAL when a load average
exceeds the corresponding WARNING or CRITICAL threshold. Each
threshold is expressed as a floating point number. The plugin is very
lax about missing thresholds and behaves as follows:<br>
<br>
1. Missing LOAD5 and/or LOAD15 value (for either threshold):
back-fill from the last given threshold value (LOAD1 or LOAD5)<br>
2. Missing warning or critical value: assume 0 (probably not
desired)<br>
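<br>
The rules above can be sketched roughly as follows. This is an
illustrative Python sketch, not the plugin's actual C code: the
function names parse_thresholds and evaluate are made up for this
example, and the strict "greater than" comparison is an assumption
based on the "exceeds" wording of the help text.<br>
<pre>
```python
def parse_thresholds(spec):
    """Parse "LOAD1[,LOAD5[,LOAD15]]" into three floats."""
    if not spec:
        # Assumption from the description above: a missing threshold is 0.
        return [0.0, 0.0, 0.0]
    values = [float(part) for part in spec.split(",")][:3]
    # Back-fill LOAD5/LOAD15 from the last value that was given.
    while len(values) != 3:
        values.append(values[-1])
    return values


def evaluate(load, warning, critical):
    """Return "OK", "WARNING" or "CRITICAL" for three load averages."""
    # CRITICAL wins as soon as any of the three averages exceeds its threshold.
    if any(value > limit for value, limit in zip(load, critical)):
        return "CRITICAL"
    if any(value > limit for value, limit in zip(load, warning)):
        return "WARNING"
    return "OK"
```
</pre>
With this sketch, evaluate([5.2, 3.1, 2.0], parse_thresholds("4,3"),
parse_thresholds("8")) returns "WARNING", while leaving the critical
value out entirely makes any non-zero load CRITICAL, which is why
that case is probably not desired.<br>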
<br>
<br>
The load average is the average number of processes on the run queue
over the last 1, 5 and 15 minutes. That number includes currently
running processes as well as those waiting to run (which usually
happens when the count is greater than the number of CPUs/cores) and,
most importantly, processes blocked in uninterruptible sleep (e.g.
blocked on I/O).<br>
<br>
For a purely CPU-bound load, a number equal to the number of cores
simply means you're fully utilizing your system resources. Below it
you are under-utilizing them, and above it you have processing
contention. For I/O load it depends on your I/O capacity, and load
average isn't the best way to monitor specific block device usage
(especially if you have multiple devices, as it doesn't tell you which
one processes are blocked on).<br>
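<br>
To illustrate the CPU-bound case, here is a small Python sketch that
divides the load averages by the core count. The helper names
load_per_core and describe are hypothetical, os.getloadavg() is only
available on Unix-like systems, and the labels only make sense for a
purely CPU-bound load, as explained above.<br>
<pre>
```python
import os


def load_per_core(load=None, cores=None):
    """Divide the 1, 5 and 15 minute load averages by the core count."""
    if load is None:
        load = os.getloadavg()  # Unix only; raises OSError if unavailable
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return [value / cores for value in load]


def describe(per_core):
    """Label a per-core load value, assuming a purely CPU-bound load."""
    if per_core > 1.0:
        return "contention"
    if per_core == 1.0:
        return "fully utilized"
    return "under-utilized"
```
</pre>
For example, load_per_core([4.0, 2.0, 1.0], cores=4) gives
[1.0, 0.5, 0.25], so on a 4-core machine only the 1-minute average
corresponds to full utilization.<br>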
<br>
Since load often consists of a mix of the two, you have to determine
the right value for your specific workload, and it's usually best
combined with other monitoring methods (like user/system CPU cycles,
context switch rate and per-device I/O count/average service time).
On most systems those metrics need a running daemon like sadc
(sysstat) or snmpd to collect them because, unlike load average, they
cannot just be read in an instant (plugins that offer this will often
just poll for a very short time, between 500 ms and 2 seconds, which
isn't a representative value and doesn't scale when you need to poll
many thousands of machines).<br>
<br>
Regards,<br>
<br>
Thomas<br>
<br>
<br>
</body>
</html>