[Nagiosplug-help] check_icmp problems
Israel Brewster
israel at frontierflying.com
Tue Aug 26 00:43:38 CEST 2008
On Aug 25, 2008, at 1:11 PM, Lee Scott wrote:
> What is the load on the routers?
>
> I have seen something similar with Extreme Networks equipment.
>
> ICMP packets are not routed through the ASIC of the equipment and
> require CPU intervention for handling. Under high loads on my Extreme
> gear I would notice high ping times and packet loss. All other traffic
> on the equipment worked fine; it was just a design flaw of the
> equipment and not really an issue, but Nagios was monitoring ICMP
> correctly.
>
> Just a thought.
Thanks for the thought. However, as I mentioned, I am NOT getting
packet loss from these routers: a standard ping, even when set to ping
flood, returns all packets. Additionally, check_icmp consistently
reports ALL packets other than the first two as lost. I can set it
to however many pings I want, and run it many times in a row, and
every time the first two packets will be returned and the rest lost.
I can also simultaneously run a ping flood from another terminal and
get no lost packets to speak of. These routers are also generally
only feeding a handful of devices, mostly workstations and no
servers, so they shouldn't be under any significant load.
So no, it's not just that there really is packet loss going on. I do
appreciate the thought and your taking the time to express it,
however. Thanks!
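
The pattern described above (the first two replies come back, everything after them is lost) predicts the loss figures reported earlier in this thread. A minimal sketch of that arithmetic, assuming exactly two replies are counted per run; the function name is made up for illustration, and the 5-packet case is an assumption chosen because it matches the reported 60%:

```python
def expected_loss_percent(packets_sent: int, replies_counted: int = 2) -> float:
    """Loss in percent when only the first `replies_counted` probes are answered."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    # Never count more replies than probes were sent.
    answered = min(packets_sent, replies_counted)
    return 100.0 * (packets_sent - answered) / packets_sent


if __name__ == "__main__":
    # 2, 4 and 10 are the packet counts mentioned in the thread; 5 is assumed.
    for n in (2, 4, 5, 10):
        print(f"{n:2d} packets sent -> {expected_loss_percent(n):.0f}% loss")
```

If the reported percentages ever stop following this formula, the "only two replies are counted" reading of the symptom no longer holds.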
-----------------------------------------------
Israel Brewster
Computer Support Technician
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------
>
> From: Israel Brewster <israel at frontierflying.com>
> Sent by: nagiosplug-help-bounces at lists.sourceforge.net
> To: Andreas Ericsson <ae at op5.se>
> cc: nagiosplug-help at lists.sourceforge.net
> Date: 08/25/2008 05:05 PM
> Subject: Re: [Nagiosplug-help] check_icmp problems
>
> On Aug 25, 2008, at 11:45 AM, Andreas Ericsson wrote:
>
>> Israel Brewster wrote:
>>> I think I may have mentioned this before, and if so I apologize,
>>> but it remains an issue so I thought I'd try again. I am having a
>>> problem using check_icmp where it consistently shows a number of
>>> my hosts (all of which are on the same hardware) as having 60%
>>> packet loss, even though a straight ping against these hosts
>>> returns no packet loss, even when doing a ping flood (thus
>>> implying that the issue is not rate limiting). More specifically,
>>> it would appear that all but the first two packets are being
>>> dropped: if I increase the number of pings to 10, I get 80% loss,
>>> if I decrease the number of pings to 4 I only get 50% loss, and if
>>> I drop to two pings, I get no loss. Increasing the delay between
>>> packets has no noticeable effect until the delay numbers get
>>> ridiculously high, also indicating that rate limiting is not the
>>> problem here.
>>
>> Are you pinging the nodes one by one, or all at once?
>
> Both. To be more specific, I have been trying to use it as a general
> ping, rather than check_ping, for most of my hosts simply because of
> the speed factor: check_ping, as you are certainly aware, takes much
> longer to check a host. For those, I am doing a single node. However,
> I do have a couple of hosts with dual-WAN ports, and on those I am
> pinging both at once, so Nagios will only report critical on the host
> as a whole if both ports are down. The devices with this issue show
> the problem regardless of whether they are the only node being pinged
> or not.
>
>>
>>
>>> FWIW, I was having the same problem with the fping program from
>>> smokeping, and it turned out that this was caused by the fping
>>> binary using the ICMP sequence number to indicate which host the
>>> packet was for, rather than incrementing the sequence number with
>>> each packet sent to a given host. After patching that, fping
>>> worked fine. Perhaps this is the same problem with check_icmp? I
>>> seem to recall someone giving a patch for that a while back, but I
>>> could never get the patch to apply properly, so I don't know if it
>>> would have worked. Thanks for any help/function patches that can
>>> be provided!
>>
>> check_icmp is an fping derivative (although rewritten so much that I
>> can't say which lines, if any, remain from the original binary).
>>
>> check_icmp does indeed maintain the host id number in the icmp->seq
>> field. It's impossible to do otherwise when scanning multiple nodes
>> if one wants to determine which of the hosts generated a particular
>> error code, since error codes do not echo the data payload of the
>> original packet.
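
A minimal sketch of the two sequence-numbering schemes discussed here, for readers who want to see the difference on the wire. It only builds ICMP echo request bytes and sends nothing; the function names and the scheme labels are made up for illustration and are not taken from the check_icmp or fping sources:

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used in the ICMP header."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Echo request: type, code, checksum, identifier, sequence, payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def seq_numbers(host_index: int, packets: int, scheme: str) -> list[int]:
    """Sequence values used for `packets` probes sent to one host."""
    if scheme == "host_id":
        # Scheme described above: seq identifies the target host,
        # so every probe to that host carries the same value.
        return [host_index] * packets
    if scheme == "incrementing":
        # Scheme the fping patch moved to: seq counts probes per host.
        return list(range(packets))
    raise ValueError(f"unknown scheme: {scheme}")


if __name__ == "__main__":
    for scheme in ("host_id", "incrementing"):
        seqs = seq_numbers(host_index=3, packets=5, scheme=scheme)
        packets = [build_echo_request(ident=0x1234, seq=s) for s in seqs]
        print(scheme, seqs, packets[0].hex())
```

Under the "host_id" scheme every probe to a given host is identical in its identifier and sequence fields, which is one plausible reason a device might treat later probes differently from the first ones; that is a guess, not something established in this thread.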
>
> So maybe the patch that fixed fping also broke something else? Haven't
> noticed any problems yet, but maybe that's just because of how it is
> being used in smokeping.
>
>>
>>
>> According to the ICMP RFC though (792, iirc), the sequence number
>> of the header really shouldn't matter. It's for the sending host to
>> determine and for the responding node to echo back.
>
> Interesting. So apparently it is the remote device that is at fault,
> although unfortunately there is nothing we can do about that.
>
>>
>>
>> May I ask what kind of equipment you're working on? It could be that
>> it's worth more to have accurate error responses on most hardware
>> than it is to get accurate multi-node pings for some rather special
>> hardware. Otoh, if you're running one check_icmp process per host,
>> then the issue can be worked around while maintaining accuracy in
>> error messages.
>
> One per host, although as I mentioned a couple of the hosts have dual
> interfaces, so check_icmp is pinging both. The devices are Linksys
> RV082 routers running firmware newer than 1.3.2 (1.3.2 and older
> firmware works fine, but isn't available), so nothing terribly
> special. These are dual-WAN 8-port routers with VPN capabilities built
> in, although we are not using the dual-WAN functionality on all of
> them. The commonality between the ones that don't work is that they
> all have firmware newer than 1.3.2.
>
>>
>>
>> Btw, I wrote check_icmp once upon a time, and I'd like to keep it
>> working as well as possible. The arse it one day bites might, after
>> all, be my own ;-)
>>
>> --
>> Andreas Ericsson andreas.ericsson at op5.se
>> OP5 AB www.op5.se
>> Tel: +46 8-230225 Fax: +46 8-230231
>
>