May 15, 2009

DIY ESX Server Health Monitoring - Part 2

In Part 1 of this series, we created the shell script that generates a formatted health report for each ESX server. To build a combined health report for all of the ESX hosts, we'll use SSH to run the shell script on each host and collect the output on the central ESX server responsible for gathering the data and sending the email.

To schedule the health report from a script and cron job, we'll need to use key-based authentication rather than interactive passwords to access the remote ESX servers. We'll also encrypt the private key with a passphrase. That way, if the filesystem security were ever compromised and someone managed to obtain the private key, they would still need the passphrase to unlock it and gain access to the remote systems.

Encrypting the private key presents a problem, however, as unlocking it during a connection attempt requires entering the password interactively. Thankfully there is a solution, ssh-agent, which will allow us to unlock the private key once with an interactive password prompt, and then keep it in memory until ssh-agent is terminated or the server is rebooted.

SSH authentication using keys
If you are new to the concept of using key-based authentication for SSH, a quick Google search for 'ssh using keys' will provide a wealth of info. Here are a couple of links that explain it much better than I can: http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO and http://wiki.archlinux.org/index.php/Using_SSH_Keys
And this link covers some challenges with using ssh-agent: http://www.ibm.com/developerworks/library/l-keyc2

To get started, we'll generate a 2048 bit RSA key pair for authentication. There's plenty of debate on the merits of DSA versus RSA, but we'll flip a coin and pick RSA.

Logged in as the non-root account you are planning to use for the ESX health report, execute this command to generate a 2048 bit RSA key pair:


  ssh-keygen -t rsa -b 2048

When prompted with: Enter file in which to save the key, hit return to accept the default location. At the prompt: Enter passphrase (empty for no passphrase):, enter a strong passphrase for the private key.
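
The exchange should look something like this; the home directory path will match your own account (reportuser is just a placeholder here), and ssh-keygen will finish by printing the key's fingerprint:

  ssh-keygen -t rsa -b 2048
  Generating public/private rsa key pair.
  Enter file in which to save the key (/home/reportuser/.ssh/id_rsa):
  Enter passphrase (empty for no passphrase):
  Enter same passphrase again:
  Your identification has been saved in /home/reportuser/.ssh/id_rsa.
  Your public key has been saved in /home/reportuser/.ssh/id_rsa.pub.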

If you type ls -la in the user's home directory, you should see that a .ssh folder was created. If you cd into that directory, you'll find the private key file, id_rsa, and the public key file, id_rsa.pub, have been created.
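
The relevant entries should look something like this, with the private key readable only by its owner (sizes and dates will vary; reportuser is a placeholder account):

  ls -la ~/.ssh
  -rw-------  1 reportuser reportuser 1743 May 15 10:02 id_rsa
  -rw-r--r--  1 reportuser reportuser  400 May 15 10:02 id_rsa.pub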

Distribute the public key
For the next step, we need to copy the public key we just generated to each ESX host we want to SSH into and execute the health report script on. You could simply scp it, or even use a Windows SFTP client if you wish (yuck). One issue with that approach is that unless you have used the SSH client or gone through the ssh-keygen process on each remote host, the necessary .ssh folder won't exist in the user's home folder. The following script takes care of the whole process: SSH to each host, create the .ssh directory if needed, and add the public key to the authorized_keys file on the remote ESX server.

By default, the ESX server firewall blocks outgoing SSH client connections, so issue this command as root on the central reporting ESX server to enable outbound SSH:


  # Execute as root
  esxcfg-firewall -e sshClient
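
If you want to confirm the change took effect, esxcfg-firewall should also accept a query flag; I'm going from memory on the exact syntax here, so double-check against your ESX version if it complains:

  # Execute as root
  esxcfg-firewall -q sshClient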

To use the public key distribution script below, paste the entire code block into a putty window, then execute the script with the list of ESX hosts you wish to copy the key to. You'll get a series of password prompts and host key authenticity prompts, but we only have to do this once. Remember to run this from the ESX server that will be polling the others, logged in as the non-root user that will be executing the health check script:


###############################################################################
#
#  copykey.sh
#
###############################################################################
#
#  To create the copykey.sh script, copy this whole code segment into your
#  shell. If you'd rather copy just the script, select everything between the
#  SCRIPTCREATOR limit strings.
#
###############################################################################

# If the ~/esx-report directory exists, cd to it so the script is created there
[ -d ~/esx-report ] && cd ~/esx-report

cat > ./copykey.sh <<'SCRIPTCREATOR'
#! /bin/bash
  PATH="/bin:/usr/bin"

  if [ ! -e ~/.ssh/id_rsa.pub ]; then
    echo "RSA public key file ~/.ssh/id_rsa.pub not found!"
    exit 1
  fi

  # Loop over each ESX host name passed on the command line
  for esxhost in "$@"; do
    ssh -q "${USER}@${esxhost}" \
    "if [ ! -d ~/.ssh ]; then mkdir -m 0700 ~/.ssh; fi; \
    echo $(cat ~/.ssh/id_rsa.pub) >> ~/.ssh/authorized_keys; \
    chmod 0600 ~/.ssh/authorized_keys" || echo "Unable to connect to $esxhost"
  done
SCRIPTCREATOR

chmod 0700 ./copykey.sh

###############################################################################

Invoke the copykey.sh script with a space delimited list of ESX hosts:
./copykey.sh esx02.vmnet.local esx03.vmnet.local esx04.vmnet.local

Or read them from a text file if you have a lot of hosts. The text file can be space delimited or have each host on a new line:
./copykey.sh $(cat hostlist.txt)
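
A hostlist.txt along these lines would work, using the same example hosts as above:

  esx02.vmnet.local
  esx03.vmnet.local
  esx04.vmnet.local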

In the copykey.sh script, notice how we used the command substitution syntax, $( ), to echo the text of the public key file into the authorized_keys file on the remote host. Because the remote command string is enclosed in double quotes, the local shell interprets the command substitution before the SSH command runs, so cat reads the local public key file. This is a handy trick, and we'll use it later to execute the locally stored health report shell script on the remote hosts.
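
If the quoting behavior isn't obvious, here's a quick way to see it in action against one of the example hosts:

  # Double quotes: $(hostname) is expanded by the LOCAL shell before ssh runs,
  # so the remote host simply echoes back the local hostname.
  ssh esx02.vmnet.local "echo $(hostname)"

  # Single quotes: the substitution is passed through untouched and expanded
  # on the REMOTE host instead.
  ssh esx02.vmnet.local 'echo $(hostname)'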

Keep the private key unlocked with ssh-agent
Now that the public keys are pushed out, make a test SSH connection to one of the remote servers with an ssh somehost command. If you've set everything up correctly to this point, you should receive a prompt like Enter passphrase for key '/home/user/.ssh/id_rsa':, which is different from the user@host password: prompt of a typical SSH connection. We're being asked to decrypt the local private key before key authentication with the remote host can proceed. Obviously, that's not going to work from a cron job.

This is where ssh-agent comes in. If you run it from a putty session, you should get some unusual output like:

SSH_AUTH_SOCK=/tmp/ssh-UIbA2689/agent.2689; export SSH_AUTH_SOCK;
SSH_AGENT_PID=2690; export SSH_AGENT_PID;
echo Agent pid 2690;

The output provides the environment variables needed to locate the ssh-agent socket and PID. ssh-agent doesn't actually export any of that information into your shell; it expects you to do it yourself. You can test this by typing echo $SSH_AGENT_PID after running ssh-agent; the variable isn't defined in the current shell.

There are a couple of ways to fix that: you could invoke it as ssh-agent bash, which opens a new bash shell with the variables exported, or you could execute eval $(ssh-agent) to export the variables into your current shell. Since we won't be using it interactively, but rather from a cron job, we'll redirect the output from ssh-agent into a file, and then source that file from the cron job.
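
To see the difference, try both in a throwaway session (the PID will be whatever your agent happens to get; any stray agents this starts get cleaned up by the killall in the script below):

  # Plain invocation: the variables are printed but never set in this shell
  ssh-agent
  echo $SSH_AGENT_PID        # prints an empty line

  # Wrapped in eval, the printed commands are executed in the current shell
  eval $(ssh-agent)
  echo $SSH_AGENT_PID        # prints the agent's PID, e.g. 2690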

Something has to get ssh-agent running every time the ESX server is rebooted or someone kills the process, so let's create a handy start-ssh-agent.sh shell script in the non-root user's home directory:


###############################################################################
#
#  start-ssh-agent.sh
#
###############################################################################
#
#  To create the start-ssh-agent.sh script in the current user's home
#  directory, copy this whole code segment into your shell.
#  If you'd rather copy just the script, select everything between the
#  SCRIPTCREATOR limit strings.
#
###############################################################################

cat > ~/start-ssh-agent.sh <<'SCRIPTCREATOR'
#! /bin/bash
  PATH="/bin:/usr/bin"
  # Kill any ssh-agent already running for this user
  killall ssh-agent >/dev/null 2>&1
  # Write the agent's environment variables to a file we can source later
  ssh-agent > ~/.ssh-agent
  chmod 0600 ~/.ssh-agent
  source ~/.ssh-agent
  export SSH_AUTH_SOCK
  export SSH_AGENT_PID
  # Load the private key into the agent (prompts for the passphrase)
  ssh-add
SCRIPTCREATOR

chmod 0700 ~/start-ssh-agent.sh

###############################################################################

The ssh-add command at the end of the script loads the private key into ssh-agent, and will prompt for the private key passphrase. Once the key is loaded, you can log off and ssh-agent will continue to run until the process is killed or the server is rebooted. You'll need to run start-ssh-agent.sh from an interactive login each time the ESX server is rebooted, but that's probably not very often, and the added security of using an encrypted private key certainly makes up for the hassle.

Execute the script above by typing ~/start-ssh-agent.sh to load the private key into ssh-agent, and we can test the health report script on multiple hosts. Paste the following into a putty window, after replacing the hostnames with your own, and the script output should display on your terminal:

[ -d ~/esx-report ] && cd ~/esx-report
source ~/.ssh-agent; \
for host in esx02.vmnet.local esx03.vmnet.local; do \
ssh -q $host "$(cat esx-report.sh)"; done
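
If the loop above just hangs at a passphrase prompt instead of returning report output, check that the private key actually made it into the agent. ssh-add -l should print the bit length, fingerprint, and path of the loaded key, while "The agent has no identities." means ssh-add didn't complete:

  source ~/.ssh-agent
  ssh-add -l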

Notice again how we used command substitution, $( ), to cat the locally stored script file through the SSH session, running the commands in the script on the remote host. For a small script like esx-report.sh, this is a really simple and efficient method, and it makes it very easy to add additional checks to the script.

Stay tuned
Coming up in Part 3, we'll take a look at emailing the script output in HTML format, and tie the whole process together from a cron job.
