[Nagiosplug-help] check_icmp problems
Andreas Ericsson
ae at op5.se
Mon Aug 25 21:45:33 CEST 2008
Israel Brewster wrote:
> I think I may have mentioned this before, and if so I apologize, but
> it remains an issue so I thought I'd try again. I am having a problem
> using check_icmp where it consistently shows a number of my hosts (all
> of which are on the same hardware) as having 60% packet loss, even
> though a straight ping against these hosts returns no packet loss,
> even when doing a ping flood (thus implying that the issue is not rate
> limiting). More specifically, it would appear that all but the first
> two packets are being dropped: if I increase the number of pings to
> 10, I get 80% loss, if I decrease the number of pings to 4 I only get
> 50% loss, and if I drop to two pings, I get no loss. Increasing the
> delay between packets has no noticeable effect until the delay numbers
> get ridiculously high, also indicating that rate limiting is not the
> problem here.
>
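The loss figures quoted above are all consistent with exactly two
replies being counted per run. A minimal sketch of that arithmetic
follows. It assumes the 60% figure came from a run of five packets
(the message does not state the count), and loss_percent is a
hypothetical helper, not a function from check_icmp.

```python
def loss_percent(sent: int, received: int) -> int:
    """Packet loss as a whole percentage of the packets sent."""
    return 100 * (sent - received) // sent


def loss_if_only_first_two_answered(sent: int) -> int:
    """Loss seen when at most the first two packets get a counted reply."""
    return loss_percent(sent, min(sent, 2))


if __name__ == "__main__":
    for sent in (2, 4, 5, 10):
        print(sent, "packets ->", loss_if_only_first_two_answered(sent), "% loss")
```

The 0%, 50% and 80% results for 2, 4 and 10 packets match the numbers
reported, which points at replies after the second being discarded
rather than at rate limiting.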
Are you pinging the nodes one by one, or all at once?
> FWIW, I was having the same problem with the fping program from
> smokeping, and it turned out that this was caused by the fping binary
> using the ICMP sequence number to indicate which host the packet was
> for, rather than incrementing the sequence number with each packet
> sent to a given host. After patching that, fping worked fine. Perhaps
> this is the same problem with check_icmp? I seem to recall someone
> giving a patch for that a while back, but I could never get the patch
> to apply properly, so I don't know if it would have worked. Thanks for
> any help/function patches that can be provided!
>
check_icmp is an fping derivative (although rewritten so much that I
can't say which lines, if any, remain from the original binary).
check_icmp does indeed maintain the host id number in the icmp->seq
field. It's impossible to do otherwise when scanning multiple nodes
if one wants to determine which of the hosts generated a particular
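error code, since ICMP error messages are only guaranteed to quote
the IP header and the first 8 bytes of the original packet (which
cover the ICMP id and sequence fields), not its data payload.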
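To make the trade-off concrete, here is a rough sketch of the ICMP
echo header layout and of two ways to fill the 16-bit sequence field.
The function names are hypothetical and this is not check_icmp's
actual code. seq_host_only mirrors the behaviour described above (the
host id goes in icmp->seq); seq_host_and_packet is one assumed
alternative that still lets the sender recover the host from the
8 quoted bytes while giving every packet a distinct sequence number.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for an echo request


def checksum(data: bytes) -> int:
    """16-bit ones' complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an echo request: type, code, checksum, id, seq, then payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def quoted_id_seq(quoted: bytes) -> tuple:
    """Read id and seq back from the 8 bytes an ICMP error quotes."""
    _type, _code, _csum, ident, seq = struct.unpack("!BBHHH", quoted[:8])
    return ident, seq


def seq_host_only(host_id: int, n: int) -> int:
    """Scheme described in the thread: every packet to a host has the same seq."""
    return host_id


def seq_host_and_packet(host_id: int, n: int, packets: int) -> int:
    """Assumed alternative: encode both the host and the packet number."""
    return host_id * packets + n


if __name__ == "__main__":
    packets = 5
    print([seq_host_only(3, n) for n in range(packets)])
    print([seq_host_and_packet(3, n, packets) for n in range(packets)])
    pkt = echo_request(0x1234, seq_host_and_packet(3, 2, packets), b"data")
    ident, seq = quoted_id_seq(pkt[:8])
    print(ident, divmod(seq, packets))  # divmod gives (host id, packet number)
```

With the second scheme the host is still recoverable from an error
reply via integer division, at the cost of limiting hosts times
packets to what fits in 16 bits.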
According to the ICMP RFC though (792, iirc), the sequence number
of the header really shouldn't matter. It's for the sending host to
determine and for the responding node to echo back.
May I ask what kind of equipment you're working on? It could be that
it's worth more to have accurate error responses on most hardware
than it is to get accurate multi-node pings for some rather special
hardware. Otoh, if you're running one check_icmp process per host,
then the issue can be worked around while maintaining accuracy in
error messages.
Btw, I wrote check_icmp once upon a time, and I'd like to keep it
working as well as possible. The arse it one day bites might, after
all, be my own ;-)
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231