Nagios Script Sample BASH

What it does

I've been responsible for helping to run the servers and hardware that keep a technology company running. I set up and deployed Nagios as a monitoring solution so that we can see problems developing before they affect our operations. While implementing Nagios, I could not find a good existing plugin for monitoring OS X machines, so I took a week and wrote my own in BASH. It queries each machine over SNMP and checks mounted volumes, disk usage, CPU load, and running processes. We use this script to monitor every OS X based machine in the building, and it works incredibly well.

The actual code I wrote is shown below, followed by the direct output from Nagios when the script runs.

The code

#!/bin/bash
###########################DECLARING GLOBAL VARIABLES#################################
SKIPVOLUMES=false
SKIPCPU=false
SKIPPROCESSES=false
SKIPDISKFULL=false
HOST=""
VERSION="1" #SNMP VERSION - This script only works with v.1
COMMUNITY=""
VOLUMES=""
PROCESSES=""
PERCENTLIMIT=""
PROCESSESTOCHECK=""
EXITCODE=0
ALLRESULTSMESSAGE=""
WARNINGMESSAGE=""
###########################PARSING ARGUMENTS AND SETTING ERROR MESSAGES#################################
# -c (community) must be given before -H, because the host check uses $COMMUNITY.
while getopts ":c:H:d:l:p:f:" opt; do
    case $opt in
        c)
            COMMUNITY=$OPTARG
            ;;
        H)
            # Ping the host being checked; the result is not used, so discard the output.
            ping -c 3 "$OPTARG" >/dev/null 2>&1
            # The host counts as reachable if it answers an SNMP query for its system name.
            if snmpget -v "$VERSION" -c "$COMMUNITY" "$OPTARG" SNMPv2-MIB::sysName.0 >/dev/null 2>&1
            then
                HOSTEXISTS=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$OPTARG" SNMPv2-MIB::sysName.0)
                # snmpget prints "OID = TYPE: value"; field 4 is the value.
                HOSTEXISTS=$(cut -d " " -f 4 <<< "$HOSTEXISTS")
                ALLRESULTSMESSAGE+="Host $HOSTEXISTS Exists and is Reachable\n"
                HOST=$OPTARG
            else
                ALLRESULTSMESSAGE+="Host is Unreachable. Check your Host,Community and Version variables\n"
                WARNINGMESSAGE+="Host is Unreachable. Check your Host,Community and Version variables"
                echo "Host is Unreachable. It may be powered down or improperly configured"
                exit 2
            fi
            ;;
        d)
            if [ "$OPTARG" == "SKIP" ] || [ "$OPTARG" == "skip" ]
            then
                SKIPVOLUMES=true
                ALLRESULTSMESSAGE+="Skipping the check for mounted volumes. "
            elif [ "$OPTARG" == "HELP" ] || [ "$OPTARG" == "help" ]
            then
                SKIPVOLUMES=true
                echo "You can list the Volume path as it appears in your snmpd.conf file. Generally this would be /Volumes/DISKNAME1. You can list multiple volumes as a comma separated list. /Volumes/DISKNAME1,/Volumes/DISKNAME2,/Volumes/DISKNAME3,/Volumes/DISKNAME4. You can skip this check by typing skip or SKIP as an argument"
            else
                # Split the comma separated list into an array.
                IFS=","
                VOLUMES=($OPTARG)
                unset IFS
            fi
            ;;
        l)
            if [ "$OPTARG" == "SKIP" ] || [ "$OPTARG" == "skip" ]
            then
                SKIPCPU=true
            elif [ "$OPTARG" == "HELP" ] || [ "$OPTARG" == "help" ]
            then
                SKIPCPU=true
                echo "You can specify CPU Loads limits at 1 minute, 5 minute and 15 minutes averages. You must use an integer 1-1000. 1 = 0.1% load while 1000 = 100% load. i.e. 850 is 85% load. You must specify the loads in order in a comma separated list. 1min,5min,15min or 980,950,850 which would equal 98%(1 minute) 95%(5 minutes) 85%(15 minutes). You can skip this check by typing skip or SKIP as an argument"
            else
                IFS=","
                CPULIMITS=($OPTARG)
                unset IFS
                if [ ${#CPULIMITS[@]} == 3 ]; then
                    for CPULIMIT in "${CPULIMITS[@]}"; do
                        # -eq fails on non-integers, which makes this an integer test.
                        if [ "$CPULIMIT" -eq "$CPULIMIT" ] 2>/dev/null; then
                            :
                        else
                            echo "You must specify an integer 1-1000. $CPULIMIT is not a number"
                        fi
                    done
                else
                    echo "You must specify 3 integers (1-1000) separated by commas (no space) -l 1000,900,800"
                fi
            fi
            ;;
        p)
            if [ "$OPTARG" == "SKIP" ] || [ "$OPTARG" == "skip" ]
            then
                SKIPPROCESSES=true
            elif [ "$OPTARG" == "HELP" ] || [ "$OPTARG" == "help" ]
            then
                SKIPPROCESSES=true
                echo "You must list the process name as it appears in your snmpd.conf file which should match your activity monitor. Generally this would be httpd or ssh or something like that. You can list multiple processes as a comma separated list like this: httpd,ssh,CrashPlanService. You can skip this check by typing skip or SKIP as an argument"
            else
                IFS=","
                PROCESSESTOCHECK=($OPTARG)
                unset IFS
            fi
            ;;
        f)
            if [ "$OPTARG" == "SKIP" ] || [ "$OPTARG" == "skip" ]
            then
                SKIPDISKFULL=true
            elif [ "$OPTARG" == "HELP" ] || [ "$OPTARG" == "help" ]
            then
                SKIPDISKFULL=true
                echo "You must list an integer 1-100 to specify how full you would like the disk to be before it generates a critical warning. 85 will trigger a warning if the disk space becomes more than 85 percent full. This check will run for all mounted volumes. You cannot specify specific volumes. You can skip this check by typing skip or SKIP as an argument"
            else
                if [ "$OPTARG" -eq "$OPTARG" ] 2>/dev/null; then
                    PERCENTLIMIT=$OPTARG
                else
                    echo "You must specify a fullness threshold as an integer 1-100. Alert me when my disk is 80% full. (i.e. -f 80). $OPTARG is not a number."
                fi
            fi
            ;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            echo "Flags are -c (Community) -H(host) -d(check for specific mounted volumes.) -l(CPU load check) -p(check for specific processes) -f(check how full all mounted disks are). You will need to configure your /etc/snmp/snmpd.conf on each mac you monitor. You must specify arguments for each flag, for help type HELP as an argument. IE -l HELP or -f HELP. You can also skip a check by specifying SKIP (IE -f SKIP or -l SKIP) as an argument."
            exit 3 # Nagios UNKNOWN: invalid command line arguments
            ;;
        :)
            echo "Option -$OPTARG requires an argument. Try using HELP as an argument. (IE -l HELP or -f HELP ...)" >&2
            exit 3 # Nagios UNKNOWN: invalid command line arguments
            ;;
    esac
done
###########################RUNNING THE CHECKS#################################
###########################Checking Mounted Disks#############################
ALLRESULTSMESSAGE+="#################\n"
if [ $SKIPVOLUMES == false ]
then
    # Number of disk entries the remote snmpd reports.
    MOUNTEDDISKINDEX=$(snmpwalk -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::dskIndex | wc -l)
    for VOLUME in "${VOLUMES[@]}"; do
        ISMOUNTED=false
        COUNT=1
        while [ "$COUNT" -le "$MOUNTEDDISKINDEX" ]
        do
            IFMOUNTED=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::dskPath.${COUNT})
            IFMOUNTED=$(cut -d " " -f 4 <<< "$IFMOUNTED")
            if [ "$IFMOUNTED" == "$VOLUME" ]
            then
                ALLRESULTSMESSAGE+="$VOLUME is mounted\n"
                ISMOUNTED=true
            fi
            COUNT=$((COUNT + 1))
        done
        if [ $ISMOUNTED != true ]
        then
            ALLRESULTSMESSAGE+="WARNING! $VOLUME is not mounted!\n"
            WARNINGMESSAGE+="WARNING! $VOLUME is not mounted! "
            EXITCODE=2
        fi
    done
else
    ALLRESULTSMESSAGE+="Skipping mounted volumes test.\n"
fi
ALLRESULTSMESSAGE+="#################\n"
###########################Getting Disk Space#################################
if [ $SKIPDISKFULL == false ]
then
    i=1
    LIMIT=false
    # Walk the disk table by index until snmpget fails, meaning no more entries.
    while [ $LIMIT = false ]
    do
        if snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::dskIndex.${i} >/dev/null 2>&1
        then
            MESSAGE=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::dskPath.${i})
            MESSAGE=$(cut -d " " -f 4 <<< "$MESSAGE")
            PERCENT=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::dskPercent.${i})
            PERCENT=$(cut -d " " -f 4 <<< "$PERCENT")
            if [ "$PERCENT" -lt "$PERCENTLIMIT" ]
            then
                ALLRESULTSMESSAGE+="$MESSAGE is $PERCENT% full\n"
            else
                ALLRESULTSMESSAGE+="WARNING! $MESSAGE has exceeded its fullness threshold. It is $PERCENT% full!\n"
                WARNINGMESSAGE+="WARNING! $MESSAGE has exceeded its fullness threshold. It is $PERCENT% full! "
                EXITCODE=2
            fi
        else
            #echo "No more volumes to report"
            LIMIT=true
        fi
        i=$((i + 1))
    done
else
    ALLRESULTSMESSAGE+="Skipping Disk full test.\n"
fi
ALLRESULTSMESSAGE+="#################\n"
###########################Getting CPU LOADS#################################
if [ $SKIPCPU == false ]
then
    CPULOAD1=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::laLoadInt.1)
    CPULOAD1=$(cut -d " " -f 4 <<< "$CPULOAD1")
    if [ "$CPULOAD1" -gt "${CPULIMITS[0]}" ]
    then
        ALLRESULTSMESSAGE+="WARNING! CPU load over the last minute is $CPULOAD1/1000 and has exceeded its quota of ${CPULIMITS[0]}!\n"
        WARNINGMESSAGE+="WARNING! CPU load over the last minute is $CPULOAD1/1000 and has exceeded its quota of ${CPULIMITS[0]}! "
        EXITCODE=2
    else
        ALLRESULTSMESSAGE+="CPU load over the last minute is $CPULOAD1/1000\n"
    fi
    CPULOAD5=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::laLoadInt.2)
    CPULOAD5=$(cut -d " " -f 4 <<< "$CPULOAD5")
    if [ "$CPULOAD5" -gt "${CPULIMITS[1]}" ]
    then
        ALLRESULTSMESSAGE+="WARNING! CPU load over the last 5 minutes is $CPULOAD5/1000 and has exceeded its quota of ${CPULIMITS[1]}!\n"
        WARNINGMESSAGE+="WARNING! CPU load over the last 5 minutes is $CPULOAD5/1000 and has exceeded its quota of ${CPULIMITS[1]}! "
        EXITCODE=2
    else
        ALLRESULTSMESSAGE+="CPU load over the last 5 minutes is $CPULOAD5/1000\n"
    fi
    CPULOAD15=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::laLoadInt.3)
    CPULOAD15=$(cut -d " " -f 4 <<< "$CPULOAD15")
    if [ "$CPULOAD15" -gt "${CPULIMITS[2]}" ]
    then
        ALLRESULTSMESSAGE+="WARNING! CPU load over the last 15 minutes is $CPULOAD15/1000 and has exceeded its quota of ${CPULIMITS[2]}!\n"
        WARNINGMESSAGE+="WARNING! CPU load over the last 15 minutes is $CPULOAD15/1000 and has exceeded its quota of ${CPULIMITS[2]}! "
        EXITCODE=2
    else
        ALLRESULTSMESSAGE+="CPU load over the last 15 minutes is $CPULOAD15/1000\n"
    fi
else
    ALLRESULTSMESSAGE+="Skipping CPU load test.\n"
fi
ALLRESULTSMESSAGE+="#################\n"
###########################Checking Processes#################################
if [ $SKIPPROCESSES == false ]
then
    # Number of process entries the remote snmpd is configured to watch.
    PROCESSINDEX=$(snmpwalk -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::prIndex | wc -l)
    for PROCESS in "${PROCESSESTOCHECK[@]}"; do
        ISPROCESS=false
        PROCESSCOUNT=1
        while [ "$PROCESSCOUNT" -le "$PROCESSINDEX" ]; do
            GETPROCESS=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::prNames.$PROCESSCOUNT)
            GETPROCESS=$(cut -d " " -f 4 <<< "$GETPROCESS")
            if [ "$GETPROCESS" == "$PROCESS" ]
            then
                GETPROCESSCOUNT=$(snmpget -v "$VERSION" -c "$COMMUNITY" "$HOST" UCD-SNMP-MIB::prCount.$PROCESSCOUNT)
                GETPROCESSCOUNT=$(cut -d " " -f 4 <<< "$GETPROCESSCOUNT")
                if [ "$GETPROCESSCOUNT" -gt 0 ]
                then
                    ALLRESULTSMESSAGE+="$PROCESS is present\n"
                    ISPROCESS=true
                fi
            fi
            PROCESSCOUNT=$((PROCESSCOUNT + 1))
        done
        if [ $ISPROCESS != true ]
        then
            ALLRESULTSMESSAGE+="WARNING! $PROCESS is not present!\n"
            WARNINGMESSAGE+="WARNING! $PROCESS is not present! "
            EXITCODE=2
        fi
    done
else
    ALLRESULTSMESSAGE+="Skipping present processes test.\n"
fi
###########################EXIT CODE#################################
# Print any warnings first so they appear on the first line of the Nagios output.
if [[ $WARNINGMESSAGE == "" ]]
then
    echo -e "$ALLRESULTSMESSAGE"
else
    echo -e "$WARNINGMESSAGE"
    echo -e "$ALLRESULTSMESSAGE"
fi
exit $EXITCODE
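
The script repeats one two-step pattern for every value it reads: run snmpget, then keep the fourth space-separated field of the reply. As a minimal sketch of how that pattern could be factored out, the helper below does the field extraction on its own. The function name snmp_value_from_line and the sample line are illustrative assumptions and are not part of the original script. The sketch also assumes Net-SNMP's default output format of "OID = TYPE: value".

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): pull the value out of
# one line of snmpget output shaped like "OID = TYPE: value".
snmp_value_from_line() {
    # Space-separated fields are: OID, "=", "TYPE:", value.
    # Like the original script, this keeps field 4 only.
    cut -d " " -f 4 <<< "$1"
}

# Illustrative sample line, standing in for a real snmpget reply.
SAMPLE_LINE="UCD-SNMP-MIB::dskPercent.1 = INTEGER: 80"
PERCENT=$(snmp_value_from_line "$SAMPLE_LINE")
echo "$PERCENT"
```

Because only field 4 is kept, a value that contains spaces (a volume path such as "/Volumes/My Disk", for example) would be cut off at the first space. Using -f 4- instead would keep the rest of the line, if that case matters on your machines.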

The result

COMMAND:
/usr/local/nagios/libexec/apple_snmp_akb.sh -c public -H 10.1.1.112 -d /Volumes/DVD2_SYS-SSD,/Volumes/DVD2_SCRATCH,/Volumes/DVD2_SYS-CLONE,/Volumes/SHERWIN,/Volumes/WILLIAMS,/Volumes/PRATT -l 1000,1000,1000 -p httpd -f 90

OUTPUT:
Host dvd2.splice.lan Exists and is Reachable
#################
/Volumes/DVD2_SYS-SSD is mounted
/Volumes/DVD2_SCRATCH is mounted
/Volumes/DVD2_SYS-CLONE is mounted
/Volumes/SHERWIN is mounted
/Volumes/WILLIAMS is mounted
/Volumes/PRATT is mounted
#################
/Volumes/SHERWIN is 80% full
/Volumes/WILLIAMS is 69% full
/Volumes/DVD2_SCRATCH is 63% full
/Volumes/PRATT is 73% full
/Volumes/DVD2_SYS-SSD is 75% full
/Volumes/DVD2_SYS-CLONE is 33% full
#################
CPU load over the last minute is 168/1000
CPU load over the last 5 minutes is 164/1000
CPU load over the last minute is 165/1000
#################
httpd is present
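
Nagios takes the service state from the plugin's exit code, not from the printed text, which is why the script sets EXITCODE=2 whenever a check fails. The sketch below maps exit codes to state names following the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name nagios_state_name is a hypothetical helper for illustration and does not appear in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): translate a plugin
# exit code into the state name Nagios assigns to it.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNDEFINED" ;;  # not one of the four defined plugin states
    esac
}

# The script exits with 2 when any check fails, so Nagios reports CRITICAL.
STATE=$(nagios_state_name 2)
echo "$STATE"
```

One consequence worth knowing: the script's messages say "WARNING!", but because they are paired with exit code 2, Nagios displays the service as CRITICAL.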

Nagios Script

A custom Nagios monitoring plugin for OS X machines.

Python Backup Script

A script to back up web domains from a Linux VPS.

Resolve Database

Script to back up multiple databases for a Resolve Server running Postgres.

Producer Portal

This is a PHP web application that is used by producers to start a job.

Editor's WTF?!

My effort to create a desktop app written in Python. This is a simple video file transcoder.