RE: help needed with nagios alert
Heinze, Markus
Markus.Heinze at esta-bw.de
Tue Jan 27 12:03:11 CET 2015
Hi,
this is only an attempt to sort a few things out.
I don't know much about Hadoop clusters, but I assume a cluster consists of several cluster nodes.
Do you check free disk space on the master node only, or on each node independently?
Do you have one entry in hosts.cfg for the world-accessible Hadoop cluster IP/DNS name and separate entries for each cluster node?
We use a small Linux web cluster with replicated MySQL databases and web directories.
For replication we use DRBD, with Pacemaker as the resource manager.
We get alerts for the whole cluster and each cluster node.
So I use two different check_disk alerts. The first is for the replicated volume: check_linux_drbd0_disk.
Volume size and free disk space are the same on every cluster node.
The second check_disk alert checks the real HDD in each cluster node: check_linux_root_disk.
That is the physical HDD plugged into each cluster node.
$HOSTADDRESS$:
For check_linux_drbd0_disk it is the active, world accessible address. For example: www.example.com
For check_linux_root_disk it is the internal address of each clusternode. For example clusternode1.internal.com, clusternode2.internal.com
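As a rough sketch, the matching host entries could look like this. The addresses are the examples above; the file name, the short host names clusternode1/clusternode2 and the linux-server template (taken from the Nagios sample configuration) are assumptions, not our real config:

```
# objects/hosts.cfg (file name assumed)
define host{
    use        linux-server               ; sample-config template, assumed
    host_name  www.example.com
    address    www.example.com            ; active, world accessible address
    }
define host{
    use        linux-server
    host_name  clusternode1               ; hypothetical short name
    address    clusternode1.internal.com  ; internal address of this node
    }
define host{
    use        linux-server
    host_name  clusternode2               ; hypothetical short name
    address    clusternode2.internal.com
    }
```

Each host has its own address, so $HOSTADDRESS$ expands to a different target for every host the check runs against.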
The objects/commands.cfg:
# Replicated DRBD volume, queried via NRPE on the host's address
define command{
    command_name  check_linux_drbd0_disk
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_drbd0
    }
# Physical root disk of a single cluster node, queried via NRPE
define command{
    command_name  check_linux_root_disk
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -n -c check_sda1
    }
The /usr/local/nagios/etc/nrpe.cfg on each clusternode:
command[check_drbd0]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/drbd0
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda1
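The services that bind these commands to hosts might look like the sketch below. The generic-service template (from the Nagios sample configuration), the service descriptions, and the host names clusternode1/clusternode2 are assumptions for illustration; they stand for host objects whose address is the internal name of each node:

```
define service{
    use                  generic-service         ; sample-config template, assumed
    host_name            www.example.com
    service_description  DRBD volume disk space  ; description assumed
    check_command        check_linux_drbd0_disk
    }
define service{
    use                  generic-service
    host_name            clusternode1,clusternode2  ; hypothetical host objects
    service_description  Root disk space            ; description assumed
    check_command        check_linux_root_disk
    }
```

Because the root disk check is run once per listed host against that host's own address, a full disk on one node should only raise an alert for that node.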
With this, we get alerts:
Running out of disk space for www.example.com
Running out of disk space for each clusternode
Regards,
Markus.
From: Help [mailto:help-bounces+markus.heinze=esta-bw.de at monitoring-plugins.org] On behalf of Natva, Arun Kumar
Sent: Friday, 23 January 2015 23:47
To: help at monitoring-plugins.org
Subject: help needed with nagios alert
Hi,
I am using nagios for alerting in our hadoop cluster.
When I set up a check_disk alert on all the nodes in the cluster, we get emails for all the hosts even though only one of the nodes exceeds the disk space threshold.
I tried multiple things but I am unable to figure out why nagios sends alerts for all hosts instead of just one host. Can you please help?
Regards,
Arun.