[Nagiosplug-help] check_icmp problems
Israel Brewster
israel at frontierflying.com
Mon Aug 25 23:04:51 CEST 2008
-----------------------------------------------
Israel Brewster
Computer Support Technician
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------
On Aug 25, 2008, at 11:45 AM, Andreas Ericsson wrote:
> Israel Brewster wrote:
>> I think I may have mentioned this before, and if so I apologize,
>> but it remains an issue so I thought I'd try again. I am having a
>> problem using check_icmp where it consistently shows a number of
>> my hosts (all of which are on the same hardware) as having 60%
>> packet loss, even though a straight ping against these hosts
>> returns no packet loss, even when doing a ping flood (thus
>> implying that the issue is not rate limiting). More specifically,
>> it would appear that all but the first two packets are being
>> dropped: if I increase the number of pings to 10, I get 80% loss,
>> if I decrease the number of pings to 4 I only get 50% loss, and if
>> I drop to two pings, I get no loss. Increasing the delay between
>> packets has no noticeable effect until the delay numbers get
>> ridiculously high, also indicating that rate limiting is not the
>> problem here.
>
> Are you pinging the nodes one by one, or all at once?
Both. To be more specific, I have been trying to use it as a general
ping check instead of check_ping for most of my hosts, simply because
of the speed factor: check_ping, as you are certainly aware, takes much
longer to check a host. For those, I am pinging a single node. However,
I do have a couple of hosts with dual-WAN ports, and on those I am
pinging both at once, so nagios will only report critical on the host
as a whole if both ports are down. The devices with this issue show
the problem regardless of whether they are the only node being pinged
or not.
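For what it's worth, the loss figures I quoted are exactly what you would get if only the first two echo requests were ever answered. A quick sketch of the arithmetic (the 5-packet case is my assumption for the run that shows 60%, and the function name is just for illustration):

```python
def expected_loss_percent(packets_sent, replies_received=2):
    """Loss percentage if at most `replies_received` requests get an answer."""
    received = min(packets_sent, replies_received)
    return 100 * (packets_sent - received) // packets_sent


for n in (2, 4, 5, 10):
    print(f"{n} packets -> {expected_loss_percent(n)}% loss")
# 2 packets -> 0% loss
# 4 packets -> 50% loss
# 5 packets -> 60% loss
# 10 packets -> 80% loss
```

That pattern is why I suspect the device stops answering after the second packet rather than dropping packets at random.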
>
>
>> FWIW, I was having the same problem with the fping program from
>> smokeping, and it turned out that this was caused by the fping
>> binary using the ICMP sequence number to indicate which host the
>> packet was for, rather than incrementing the sequence number with
>> each packet sent to a given host. After patching that, fping
>> worked fine. Perhaps this is the same problem with check_icmp? I
>> seem to recall someone giving a patch for that a while back, but I
>> could never get the patch to apply properly, so I don't know if it
>> would have worked. Thanks for any help/function patches that can
>> be provided!
>
> check_icmp is an fping derivative (although rewritten so much that I
> can't say which lines, if any, remain from the original binary).
>
> check_icmp does indeed maintain the host id number in the icmp->seq
> field. It's impossible to do otherwise when scanning multiple nodes
> if one wants to determine which of the hosts generated a particular
> error code, since error codes do not echo the data payload of the
> original packet.
So maybe the patch that fixed fping also broke something else? Haven't
noticed any problems yet, but maybe that's just because of how it is
being used in smokeping.
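To make the difference concrete, here is a rough Python sketch of the two sequence-numbering schemes as I understand them from this thread. It is not taken from the check_icmp or fping sources: the function names are made up for illustration, and the third scheme is only my guess at a possible compromise that keeps the host recoverable from the sequence number while giving every packet a distinct one.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used in the ICMP header."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an echo request: type, code, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def seq_host_id(host_index, packet_index, packets_per_host):
    # Scheme described in this thread: seq identifies the host,
    # so every packet sent to one host carries the same seq.
    return host_index


def seq_per_packet(host_index, packet_index, packets_per_host):
    # What the fping patch did: seq increments with each packet to a host.
    return packet_index


def seq_combined(host_index, packet_index, packets_per_host):
    # Hypothetical compromise (my assumption): unique per packet, and the
    # host is still recoverable as seq // packets_per_host.
    return host_index * packets_per_host + packet_index


for scheme in (seq_host_id, seq_per_packet, seq_combined):
    seqs = [[scheme(h, p, 5) for p in range(5)] for h in range(2)]
    print(scheme.__name__, seqs)
```

Passing one of those values as `seq` to `echo_request()` gives the bytes that would go out on a raw socket. Whether the RV082 actually cares about repeated sequence numbers is still a guess on my part.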
>
>
> According to the ICMP RFC though (792, iirc), the sequence number
> of the header really shouldn't matter. It's for the sending host to
> determine and for the responding node to echo back.
Interesting. So apparently it is the remote device that is at fault,
although unfortunately there is nothing we can do about that.
>
>
> May I ask what kind of equipment you're working on? It could be that
> accurate error responses on most hardware are worth more than
> accurate multi-node pings on some rather special hardware. Otoh, if
> you're running one check_icmp process per host,
> then the issue can be worked around while maintaining accuracy in
> error messages.
One per host, although as I mentioned a couple of the hosts have dual
interfaces, so check_icmp is pinging both. The devices are Linksys
RV082 routers running firmware newer than 1.3.2 (1.3.2 and older
firmware works fine, but isn't available), so nothing terribly
special. These are dual-WAN 8-port routers with VPN capabilities built
in, although we are not using the dual-WAN functionality on all of
them. What the ones that don't work have in common is that they all
run firmware newer than 1.3.2.
>
>
> Btw, I wrote check_icmp once upon a time, and I'd like to keep it
> working as well as possible. The arse it one day bites might, after
> all, be my own ;-)
>
> --
> Andreas Ericsson andreas.ericsson at op5.se
> OP5 AB www.op5.se
> Tel: +46 8-230225 Fax: +46 8-230231