[Nagiosplug-help] check_disk hanging on bad nfs mount
Thomas Guyot-Sionnest
dermoth at aei.ca
Thu Jan 24 04:50:03 CET 2008
On 23/01/08 03:05 PM, Mike Lindsey wrote:
> I'm running check_disk v1848 (nagios-plugins 1.4.11) on FreeBSD 6.0
>
> I've got a bad nfs mount, which is causing check_disk to hang, leaving,
> eventually, thousands of check_disk processes.
>
> truss ./check_disk -vvvv results in:
> [...]
> For /logs, total=830472192, available=324530624,
> available_to_root=324530624, used=505941568, fsp.fsu_files=15218898,
> fsp.fsu_ffree=15168033
> write(1,0x8066000,141) = 141 (0x8d)
> For /logs, used_pct=61 free_pct=39 used_units=247041 free_units=158462
> total_units=405504 used_inodes_pct=1 free_inodes_pct=99
> fsp.fsu_blocksize=512 mult=1048576
> write(1,0x8066000,162) = 162 (0xa2)
> Freespace_units result=0
> write(1,0x8066000,25) = 25 (0x19)
> Freespace% result=0
> write(1,0x8066000,20) = 20 (0x14)
> Usedspace_units result=0
> write(1,0x8066000,25) = 25 (0x19)
> Usedspace_percent result=0
> write(1,0x8066000,27) = 27 (0x1b)
> Usedinodes_percent result=0
> write(1,0x8066000,28) = 28 (0x1c)
> Freeinodes_percent result=0
> write(1,0x8066000,28) = 28 (0x1c)
> calling stat on /host
> write(1,0x8066000,22) = 22 (0x16)
>
> After which, it hangs. My standard arguments just set it to check the
> partition to see if it's mounted.
>
> check_disk -w 20 -c 10 -e -A -L -X procfs -X devfs
>
> Ideas, thoughts, workarounds or fixes?
Hanging on NFS when the server goes away is normal behavior. All
processes waiting for I/O on the NFS mount will block until the server is
back. If you have a properly configured HA cluster, NFS operations should
also resume after a failover.
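Since the check blocks inside the stat call, one possible workaround on the
monitoring side is to run the probe in a child process and stop waiting
after a timeout. The following is only a rough sketch: path_responds and
its probe parameter are made-up names, not part of check_disk or the
nagios-plugins. On a hard mount that is not interruptible the child may
stay blocked until the server returns, so this limits how long the caller
waits but does not clean up the stuck process.

```python
import subprocess


def path_responds(path, timeout=5.0, probe=("stat",)):
    """Return True if `probe path` exits with status 0 within `timeout` seconds.

    Hypothetical helper for illustration; `probe` is the command to run
    (by default the system `stat` utility).
    """
    proc = subprocess.Popen(
        [*probe, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Ask the child to die, but do not wait for it: a process blocked
        # on an unresponsive hard mount may not exit until the server is back.
        proc.kill()
        return False


if __name__ == "__main__":
    # /host is the mount point from the report above.
    print("/host responds:", path_responds("/host", timeout=5.0))
```

The caller gets an answer after at most the timeout, which a wrapper could
turn into a CRITICAL result instead of piling up blocked checks.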
If you don't want this behavior, look in your nfs or mount manual for an
option to avoid it. Here's what the manual says on Linux:
  soft   If an NFS file operation has a major timeout
         then report an I/O error to the calling program.
         The default is to continue retrying NFS file
         operations indefinitely.

  hard   If an NFS file operation has a major timeout
         then report "server not responding" on the
         console and continue retrying indefinitely.
         This is the default.

  intr   If an NFS file operation has a major timeout and
         it is hard mounted, then allow signals to
         interrupt the file operation and cause it to
         return EINTR to the calling program. The
         default is to not allow file operations to be
         interrupted.
So on a Linux machine, mounting with "-o soft" would fix it;
alternatively, "-o intr" would still leave the hung processes behind but
allow you to kill them.
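To find out which NFS mounts on a Linux machine are still hard-mounted,
something along these lines could be used. It is a minimal sketch that
assumes the /proc/mounts format (device, mount point, type, options) and
assumes the kernel lists "soft" in the options of soft mounts; the function
name hard_nfs_mounts is made up for this example. It does not apply to
FreeBSD, which has no /proc/mounts.

```python
def hard_nfs_mounts(mounts_text):
    """Return mount points of NFS filesystems that lack the 'soft' option.

    `mounts_text` is expected to look like the contents of /proc/mounts.
    Hypothetical helper for illustration only.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type not in ("nfs", "nfs4"):
            continue  # only NFS client mounts are of interest
        if "soft" not in options.split(","):
            result.append(mount_point)
    return result


if __name__ == "__main__":
    with open("/proc/mounts") as mounts:
        for mount_point in hard_nfs_mounts(mounts.read()):
            print("hard-mounted NFS:", mount_point)
```

Any mount point it prints would hang a check the way /host does above when
its server goes away.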
Thomas