cron job that will only send out an alert message when an ESX host exceeds a specified threshold. Due to the simple design of the health report scripts, to set up this functionality we only need to modify a few lines from the run-esx-report.sh script:
- The first change is in the loop where we SSH into each ESX host and run the esx-report.sh script. We'll simply change the append redirection symbols, >>, to the create or truncate symbol, >, so that we create a new report output file for each host rather than a combined report. To be extra sure the temp file is truncated each time through the loop, we'll use the noclobber override option as well, so the >> symbols become >|
- Next, we grep for the word WARNING in the output file, and wrap the rest of the script in an if statement so the email is only sent out if the grep command returns true
- And finally, we'll just change the subject of the email message
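The three changes can be sketched roughly as follows. This is not the original run-esx-threshold.sh: fetch_report is a hypothetical stand-in for the SSH call to esx-report.sh so the example runs without any ESX hosts, the host names are made up, and an echo takes the place of the email step.

```shell
#!/bin/sh
# Sketch only. fetch_report is a hypothetical stand-in for
# "ssh <host> esx-report.sh"; it fakes one host over its threshold.
fetch_report() {
    case "$1" in
        esx02) echo "Memory Usage: 97% WARNING" ;;
        *)     echo "Memory Usage: 41%" ;;
    esac
}

check_hosts() {
    tmpfile=$(mktemp) || return 1
    for host in "$@"; do
        # >| truncates the temp file even when noclobber is set,
        # so each pass holds the report for one host only
        fetch_report "$host" >| "$tmpfile"
        # Only act when this host's report contains the word WARNING
        if grep -q WARNING "$tmpfile"; then
            echo "ALERT: $host exceeded a threshold"   # the email step would go here
        fi
    done
    rm -f "$tmpfile"
}

check_hosts esx01 esx02 esx03
```

With the fake data above, only esx02 produces an alert line; the other two hosts are checked silently.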
Don't spam yourself
When considering how often you want to run the threshold check script, keep one shortcoming of this method in mind: if a parameter continues to exceed its threshold, the script will continue to email you every time it runs. If you set this up to run every five minutes, and head out into the woods over a holiday weekend, you're going to get a thousand alert messages before you get a chance to resolve the issue.
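To put a rough number on that, here is a back-of-the-envelope calculation. It assumes a three-day weekend and a threshold that stays exceeded the whole time; alerts_for is just a helper name made up for this example.

```shell
# alerts_for DAYS INTERVAL_MINUTES
# Number of runs (and therefore alert emails) in that span, assuming
# the threshold is exceeded on every single run.
alerts_for() {
    echo $(( $1 * 24 * 60 / $2 ))
}

alerts_for 3 5     # every 5 minutes for 3 days  -> 864
alerts_for 3 30    # every 30 minutes for 3 days -> 144
```

And that is per host with a problem, so a longer interval keeps the inbox a lot more manageable.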
For our purposes, once every 30 minutes will suffice, so we'll add another cron job. Issue a crontab -e command as the non-root user, press i to enter insert mode, and below the line containing the 7:10 AM ESX server health report job, add:
0,30 * * * * ${HOME}/esx-report/run-esx-threshold.sh ESX LIST >/dev/null 2>&1
Press Esc, then :wq to write the crontab and exit vi, and we're done!
If you do want to run the threshold check every five minutes, instead of specifying a list like 0,5,10,15, etc., use the range of minutes followed by a forward slash and interval, like:
0-59/5 * * * * ${HOME}/esx-report/run-esx-threshold.sh ESX LIST >/dev/null 2>&1
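If you want to double-check which minutes a range/interval field will fire on before saving the crontab, a small helper like this one spells them out. It is only an illustration of the step syntax; expand_minutes is a made-up name and is not part of cron or of the report scripts.

```shell
# expand_minutes START END STEP
# Lists the minutes matched by a START-END/STEP cron minute field.
expand_minutes() {
    m=$1
    list=""
    while [ "$m" -le "$2" ]; do
        # Append a comma only when the list already has an entry
        list="${list:+$list,}$m"
        m=$(( m + $3 ))
    done
    echo "$list"
}

expand_minutes 0 59 5     # 0,5,10,15,20,25,30,35,40,45,50,55
expand_minutes 0 59 30    # 0,30
```

So 0-59/30 is just another way of writing the 0,30 schedule used above.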
Tweak the thresholds
You'll definitely want to play with the threshold settings from the esx-report.sh script in Part 1. The threshold is the third parameter supplied to the scale function. In the memory usage check below, that's the "100" on the Mem: line and the "1" on the Swap: line:
printf " Memory Usage:\n"
(free | awk '/^Mem:/ {print $3, $2, "100", $1}
/^Swap:/ {print $3, $2, "1", $1}') | \
while read line; do scale $line; done
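To see how changing that third value plays out, here is a simplified stand-in for scale. The real function lives in Part 1 and is not reproduced here; this sketch assumes the arguments arrive as used, total, threshold and label (the order awk prints them in above) and that the threshold is a percentage. scale_sketch is a hypothetical name.

```shell
# scale_sketch USED TOTAL THRESHOLD LABEL
# Hypothetical stand-in for scale: prints the usage percentage and
# appends WARNING once it reaches THRESHOLD (assumed to be a percentage).
scale_sketch() {
    used=$1 total=$2 threshold=$3 label=$4
    pct=0
    # Guard against a host with no swap configured (total of 0)
    if [ "$total" -gt 0 ]; then
        pct=$(( used * 100 / total ))
    fi
    if [ "$pct" -ge "$threshold" ]; then
        echo "$label $pct% WARNING"
    else
        echo "$label $pct%"
    fi
}

scale_sketch 3900 4000 100 Mem:    # Mem: 97%
scale_sketch 3900 4000 90 Mem:     # Mem: 97% WARNING
scale_sketch 50 1000 1 Swap:       # Swap: 5% WARNING
```

Under those assumptions, lowering the memory threshold from 100 to 90 is what turns a 97% reading into a WARNING line that the threshold script's grep will pick up.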
That does it for the DIY ESX Server Health Monitoring project. I hope you'll find this information easy to customize for your own environment. If you add new performance checks or enhancements, feel free to describe the changes in a comment.
Install it
If you'd like to set the whole thing up, just copy and paste each code segment with a light blue background into a putty session. To install:
- Create the esx-report.sh script from Part 1 as the non-root user. Copying the entire code segment in the light blue box into a putty window will create the esx-report folder under the home folder of the user that executes it.
- From Part 2, execute the ssh-keygen command as the non-root user. Then run the esxcfg-firewall command as root to open an outbound port for SSH. Create the two remaining scripts, copykey.sh and start-ssh-agent.sh, as the non-root user. Use copykey.sh to distribute the public key file, then launch ~/start-ssh-agent.sh to load the private key into memory, both as the non-root user. Make test ssh connections to each ESX host you need to run the report on, but make sure you source the .ssh-agent file first so the variables are exported to your shell: source ~/.ssh-agent
- Now create the html-mailer.pl script from Part 3 as the non-root user. As root, run the esxcfg-firewall command to open outbound SMTP in the firewall. Change users back to the non-root user, and create the run-esx-report.sh script and change the email settings for your environment.
- Create the run-esx-threshold.sh script from this post as the non-root user and change the email settings.
- Set up the cron jobs for the daily health report and the threshold check. Customize the whole thing any way you see fit.
- Try to schedule the daily health check and threshold checks so they don't run at the same time. The jobs will run fine simultaneously, but the usage numbers could be inflated.
- Configure reverse DNS records for your ESX hosts on the DNS servers they point to or you'll see long pauses during SSH connection attempts as the server times out attempting to resolve the connecting client's hostname from its IP.
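For the test ssh connections in the Part 2 step, a quick loop saves some typing. This is a sketch under a few assumptions: the host names are placeholders, SSH_BIN is a made-up override that only exists so the function can be tried with a stub, and your ssh client is assumed to support the BatchMode option.

```shell
# check_ssh_hosts HOST...
# Tries a non-interactive login to each host and reports OK or FAIL.
# SSH_BIN is a hypothetical override for testing; it defaults to ssh.
check_ssh_hosts() {
    failed=0
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # which is the same situation the cron jobs will be in
        if "${SSH_BIN:-ssh}" -o BatchMode=yes "$host" true </dev/null; then
            echo "OK $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, after sourcing ~/.ssh-agent:
#   check_ssh_hosts esx01 esx02
```

A host whose key has not been accepted yet will also show up as FAIL, so make the very first connection to each host by hand.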