May 29, 2009

Instantly Serve an ESX Directory via HTTP


python -m SimpleHTTPServer 9090

Before you type that in, understand that it's not going to work unless you've made some ill-advised changes to the Service Console firewall. Also, make sure you fully grasp the security risk you're about to take. This command will start up a simple web server on TCP port 9090 in the current working directory, allowing anyone to browse the files and subdirectories from a web browser under the security context of the user that executed the command. In other words, if you execute this as the root user, in the root directory, any file in the Service Console can be downloaded from a web browser.

This one-liner is extremely dangerous, but it is also extremely handy, and if used correctly in a properly designed environment, the potential risks can be managed. I use this all the time in my test lab to get output files from scripts by simply cd'ing to the script directory, running the above command, and pointing a web browser to http://IP_OF_ESX:9090 from the vCenter server.

How to make it safer:
  • The ESX Service Console network should be completely isolated from the LAN, and only vCenter servers and specific administrative workstations are allowed access

  • The Python command should be executed while the working directory is a folder created just for this purpose, and only contains the specific files you want to share and no subdirectories

  • The command should only be executed by a non-root user and the web server torn down as soon as the files have been downloaded by issuing a Ctrl-C

  • The root user must open a specific port in the firewall prior to using the command; for example, to open TCP port 9090:

    esxcfg-firewall --openPort 9090,tcp,in,SimpleHTTP

  • The port should then be closed immediately after the needed files have been downloaded; for example, to close down the previous command:

    esxcfg-firewall --closePort 9090,tcp,in
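
The advice about a dedicated directory can be sketched as a small shell helper. This is only a sketch: stage_share_dir is a hypothetical name that does not come from the Service Console, and it assumes mktemp and cp behave as they do on a typical Linux system.

```shell
#!/bin/bash
# Sketch only: stage_share_dir is a hypothetical helper, not part of ESX.

# Copy the named regular files into a fresh, private directory and print
# its path. Directories and missing files are skipped, so the share never
# contains subdirectories.
stage_share_dir() {
  local share_dir file
  share_dir=$(mktemp -d "${TMPDIR:-/tmp}/share.XXXXXXXXXX") || return 1
  for file in "$@"; do
    if [ -f "$file" ]; then
      cp -- "$file" "$share_dir/" || return 1
    else
      echo "Skipping $file: not a regular file" >&2
    fi
  done
  printf '%s\n' "$share_dir"
}

# Typical use as a non-root user (commented out so nothing is served by accident):
#   share_dir=$(stage_share_dir ./results.txt ./errors.log)
#   cd "$share_dir" && python -m SimpleHTTPServer 9090    # Ctrl-C when finished
#   cd / && rm -rf "$share_dir"
```

Because the web server only ever sees copies in a throwaway directory, a mistake in the file list exposes far less than serving the script directory itself.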


This also works in ESX 3.5, but the version of Python in the Service Console lacks the -m option, so the path to SimpleHTTPServer.py must be specified:

# ESX 3.5
python /usr/lib/python2.2/SimpleHTTPServer.py 9090


This might be too dangerous for production, so consider the risks carefully. For testing, though, it can be really handy.


May 26, 2009

VM Security in vSphere - Same Ol' Situation (S.O.S.)

Over the weekend, I had a chance to test out the directives for locking down the virtual machine security issues discussed in Hardening the VMX File with vSphere / ESX 4.0. Unfortunately, all of the security issues are still present in the GA release of vSphere, including non-privileged users having the ability to disconnect virtual NICs and change the time synchronization behavior.

I can't imagine why this situation still persists through version 4.0 of VMware's enterprise virtualization platform. Are there customers who prefer non-privileged user accounts retain this ability? And if so, couldn't we disable this functionality by default, and require .vmx directives to enable it?

Yes, it is easy to change the default settings, and any sysadmin worth his or her salary will make the changes and audit their environment for compliance. That's a tired argument, however, and better "out of the box" security should be a goal for any product. Anybody remember Windows 2000?


May 25, 2009

DIY ESX Server Health Monitoring - Part 4

If you're just catching this series on creating an ESX health report, in Part 1, Part 2, and Part 3 we set up everything we need to schedule the daily health check and send the results in an HTML formatted email. Running the health check once a day is probably not sufficient if you want to be on top of developing issues, however, and if you have a lot of ESX hosts, reading through a long list of performance statistics may be unreasonable. So to wrap this project up, we'll look at setting up a second cron job that will only send out an alert message when an ESX host exceeds a specified threshold.

Due to the simple design of the health report scripts, to set up this functionality we only need to modify a few lines from the run-esx-report.sh script:
  • The first change is in the loop where we SSH into each ESX host and run the esx-report.sh script. We'll change the append redirection symbols, >>, to the create-or-truncate symbol, >, so that we create a new report output file for each host rather than a combined report. To be extra sure the temp file is truncated each time through the loop, we'll use the noclobber override as well, so the >> symbols become >|

  • Next, we grep for the word WARNING in the output file, and wrap the rest of the script in an if statement so the email is only sent out if the grep command returns true

  • And finally, we'll just change the subject of the email message
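
The first two changes can be tried in isolation with a short sketch. The helper name report_has_warning and the sample report lines are made up for illustration; only the >| operator and the grep for WARNING come from the script itself.

```shell
#!/bin/bash
# Sketch only: report_has_warning is a hypothetical helper name.

# Succeed when the report file contains the word WARNING.
report_has_warning() {
  grep -q WARNING "$1"
}

report=$(mktemp)    # mktemp creates the file, so it already exists

set -o noclobber    # plain > now refuses to overwrite an existing file
{ echo "CPU OK" > "$report"; } 2>/dev/null || echo "plain > was refused"
echo "WARNING: disk usage 95%" >| "$report"    # >| overrides noclobber and truncates
set +o noclobber

if report_has_warning "$report"; then
  echo "alert would be sent"
fi
```

With noclobber off, > and >| behave the same, which is why the override is only a belt-and-braces measure here.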


###############################################################################
#
#  run-esx-threshold.sh
#
###############################################################################
#
#  To create the run-esx-threshold.sh script in the ~/esx-report directory,
#  copy this entire code segment into your shell.
#  If you'd rather copy just the script, select everything between the
#  SCRIPTCREATOR limit strings.
#
#  putty will ignore all the tabs, making the copied script quite ugly
#
###############################################################################

# If the ~/esx-report directory exists, cd to it so the script is created there
[ -d ~/esx-report ] && cd ~/esx-report

cat > ./run-esx-threshold.sh <<'SCRIPTCREATOR'
#! /bin/bash
  PATH="/bin:/usr/bin"

  if [ -z "$1" ]; then
    echo "No ESX hosts specified, exiting"
    exit 1
  fi

  if ! pgrep ssh-agent >/dev/null; then
    echo "The ssh-agent process does not appear to be running, exiting"
    exit 1
  fi

  RUNDIR=$(dirname "$(which "$0")")

  source "${HOME}/.ssh-agent" >/dev/null || exit 1

  THISHOST=$(hostname | cut -d . -f 1)

  TEMPTEXT=$(mktemp "${RUNDIR}/temptext.XXXXXXXXXX")

  TEMPHTML=$(mktemp "${RUNDIR}/temphtml.XXXXXXXXXX")

  for host in $@; do
    if [ $(echo $host | cut -d . -f 1) = $THISHOST ]; then
      "${RUNDIR}/esx-report.sh" >| "$TEMPTEXT"
    else
      ssh -q $host "$(cat "${RUNDIR}/esx-report.sh")" >| "$TEMPTEXT" || \
        printf "WARNING: SSH connection to $host failed\n\n\n\n" >| "$TEMPTEXT"
    fi

    if grep WARNING "$TEMPTEXT" >/dev/null; then

      cat >| "$TEMPHTML" <<-'HEADEREOF'
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
	<html>
	<head>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<style type="text/css">
	body { font-family: monospace; font-size: 12px }
	pre { font-family: monospace; font-size: 12px }
	</style>
	</head>
	<body>
	<pre>
	HEADEREOF

      cat "$TEMPTEXT" | \
      sed -e 's/>/\&#62;/g' \
          -e 's/WARNING:.*/<span style="color: red">&<\/span>/' >> "$TEMPHTML"

      cat >> "$TEMPHTML" <<-'FOOTEREOF'
	</pre>
	</body>
	</html>
	FOOTEREOF

      "${RUNDIR}/html-mailer.pl" -f esx-report@yourdomain.dom \
                                 -r administrator@yourdomain.dom \
                                 -s "Alert on $host" \
                                 -m exchange.yourdomain.com \
                                 -b "$TEMPHTML"
    fi
  done

  rm -f "$TEMPTEXT"; rm -f "$TEMPHTML"

SCRIPTCREATOR

chmod 0700 ./run-esx-threshold.sh

###############################################################################


Don't spam yourself
When considering how often you want to run the threshold check script, keep one shortcoming of this method in mind: if a parameter continues to exceed its threshold, the script will continue to email you every time it runs. If you set this up to run every five minutes, and head out into the woods over a holiday weekend, you're going to get a thousand alert messages before you get a chance to resolve the issue.

For our purposes, once every 30 minutes will suffice. To add the second cron job, issue a crontab -e command as the non-root user, press i to enter insert mode, and below the line containing the 7:10 AM ESX server health report job, add:

0,30 * * * * ${HOME}/esx-report/run-esx-threshold.sh ESX LIST >/dev/null 2>&1

Press Esc, then :wq to write the crontab and exit vi, and we're done!

If you do want to run the threshold check every five minutes, instead of specifying a list like 0,5,10,15, etc., use the range of minutes followed by a forward slash and interval, like:

0-59/5 * * * * ${HOME}/esx-report/run-esx-threshold.sh ESX LIST >/dev/null 2>&1
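
To see what a range with a step stands for, and how quickly alert runs add up, here is a small sketch. Both function names are hypothetical, and the three-day weekend is just an example figure.

```shell
#!/bin/bash
# Sketch only: both function names are hypothetical.

# Expand a cron-style range and step (e.g. 0-59/5) into the explicit
# comma separated list it stands for.
expand_minutes() {
  local start="$1" end="$2" step="$3" minute list=""
  minute=$start
  while [ "$minute" -le "$end" ]; do
    list="${list:+$list,}$minute"
    minute=$(( minute + step ))
  done
  printf '%s\n' "$list"
}

# Count how many times a job scheduled every INTERVAL minutes runs in DAYS days.
runs_in_days() {
  local interval="$1" days="$2"
  echo $(( days * 24 * 60 / interval ))
}

expand_minutes 0 59 5     # the minutes matched by 0-59/5
expand_minutes 0 59 30    # the minutes matched by 0,30
runs_in_days 5 3          # runs over a three-day weekend at five-minute intervals
```

The last call prints 864, which is the worst-case number of alert emails from a single stuck threshold over that weekend.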

Tweak the thresholds
You'll definitely want to play with the threshold settings from the esx-report.sh script in Part 1. The threshold is the third parameter supplied to the scale function. In the memory usage check below, it is the third value printed by each awk rule: "100" for memory and "1" for swap:

  printf "  Memory Usage:\n"
  (free | awk '/^Mem:/ {print $3, $2, "100", $1}
              /^Swap:/ {print $3, $2, "1", $1}') | \
    while read line; do scale $line; done
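
The real scale function lives in esx-report.sh from Part 1 and is not reproduced here. As a rough stand-in for experimenting with thresholds, the sketch below uses a hypothetical threshold_check function. It assumes the arguments arrive in the order the awk command prints them (used, total, threshold, label) and that the threshold is a percentage, which is my reading rather than something the script confirms.

```shell
#!/bin/bash
# Sketch only: threshold_check is a hypothetical stand-in for the scale
# function from Part 1. Assumed arguments: used, total, threshold (as a
# percentage), label.
threshold_check() {
  local used="$1" total="$2" threshold="$3" label="$4" percent=0
  if [ "$total" -gt 0 ]; then
    percent=$(( used * 100 / total ))
  fi
  if [ "$percent" -gt "$threshold" ]; then
    echo "WARNING: $label ${percent}% used exceeds ${threshold}%"
  else
    echo "$label ${percent}% used"
  fi
}

# Made-up sample values in the same field order the awk command prints
printf '%s\n' "900 1000 100 Mem:" "50 1000 1 Swap:" |
  while read -r line; do threshold_check $line; done
```

Under that reading, a memory threshold of 100 never fires, while a swap threshold of 1 warns as soon as more than 1% of swap is in use.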

That does it for the DIY ESX Server Health Monitoring project. I hope you'll find this information easy to customize for your own environment. If you add new performance checks or enhancements, feel free to describe the changes in a comment.

Install it
If you'd like to set the whole thing up, just copy and paste each code segment with a light blue background into a putty session. To install:
  • Create the esx-report.sh script from Part 1 as the non-root user. Copying the entire code segment in the light blue box into a putty window will create the esx-report folder under the home folder of the user that executes it.

  • From Part 2, execute the ssh-keygen command as the non-root user. Then run the esxcfg-firewall command as root to open an outbound port for SSH. Create the two remaining scripts, copykey.sh and start-ssh-agent.sh, as the non-root user.
    Use copykey.sh to distribute the public key file, then launch ~/start-ssh-agent.sh to load the private key into memory, both as the non-root user. Make test ssh connections to each ESX host you need to run the report on, but first source the .ssh-agent file so the variables are exported to your shell: source ~/.ssh-agent

  • Now create the html-mailer.pl script from Part 3 as the non-root user. As root, run the esxcfg-firewall command to open outbound SMTP in the firewall. Change users back to the non-root user, and create the run-esx-report.sh script and change the email settings for your environment.

  • Create the run-esx-threshold.sh script from this post as the non-root user and change the email settings.

  • Set up the cron jobs for the daily health report and the threshold check. Customize the whole thing any way you see fit.

A couple of tips:
  • Try to schedule the daily health check and threshold checks so they don't run at the same time. The jobs will run fine simultaneously, but the usage numbers could be inflated.

  • Configure reverse DNS records for your ESX hosts on the DNS servers they point to or you'll see long pauses during SSH connection attempts as the server times out attempting to resolve the connecting client's hostname from its IP.



May 19, 2009

DIY ESX Server Health Monitoring - Part 3

Updated: June 18, 2009
Added a semicolon to run-esx-report.sh that was left out and responsible for some ugly HTML formatting.

With the secure SSH access problem solved in Part 2, we'll move on to getting the data in the proper format and emailing it from the ESX Service Console. As you probably know, the Linux distribution installed with ESX 3.5 lacks sendmail or an equivalent command, but we can roll our own from a perl script.

The perl mailer script
We need to import two perl modules for the script, and both are included by default in the Service Console. Getopt::Std provides a simple way to get command line options, and Net::SMTP will interface with an Exchange or SMTP server accessible from the console network:

  use Getopt::Std;
  use Net::SMTP;

This getopt call is all that's necessary to declare the command line options (-f, -r, -s, etc.), and it will automatically populate a set of corresponding variables named opt_*. We'll do a quick check to make sure all the command line options were specified, and if not display the usage message:

  getopt ('frsmb');

  unless ($opt_f && $opt_r && $opt_s && $opt_m && $opt_b) {
    print_usage();
    exit 1;
  }

Next we'll create a filehandle named BODY, opening the file specified on the command line. After reading every line into the array @body_data, we'll close the handle:

  open(BODY, $opt_b) || error("Could not open file $opt_b.");

  my @body_data=<BODY>;
  close(BODY);

The Net::SMTP module is pretty straightforward. To generate an HTML formatted email, we just need to specify the MIME version, the content type as HTML, and the character encoding as ISO 8859-1. If you would rather send the message as plain text, just remove the MIME-Version and Content-Type lines:

  my $smtp = Net::SMTP->new($opt_m) ||
    error("SMTP connection to $opt_m failed.");
  $smtp->mail($opt_f);
  $smtp->to($opt_r);
  $smtp->data();
  $smtp->datasend("MIME-Version: 1.0\n");
  $smtp->datasend("Content-Type: text/html; charset=iso-8859-1\n");
  $smtp->datasend("To: $opt_r\n");
  $smtp->datasend("From: $opt_f\n");
  $smtp->datasend("Subject: $opt_s\n");
  $smtp->datasend("\n");   # an empty line separates the headers from the body
  foreach $line (@body_data)
    {
      $smtp->datasend("$line");
    }
  $smtp->dataend();
  $smtp->quit;
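
If it helps to picture what the server receives after the DATA command, the sketch below prints the same headers followed by the body. build_message is a hypothetical helper and the addresses are examples. Note that RFC 5322 expects an empty line between the headers and the body, so the sketch emits one explicitly.

```shell
#!/bin/bash
# Sketch only: build_message is a hypothetical helper that prints the text
# an SMTP client would send after the DATA command.
build_message() {
  local from="$1" to="$2" subject="$3" body_file="$4"
  printf 'MIME-Version: 1.0\n'
  printf 'Content-Type: text/html; charset=iso-8859-1\n'
  printf 'To: %s\n' "$to"
  printf 'From: %s\n' "$from"
  printf 'Subject: %s\n' "$subject"
  printf '\n'    # an empty line separates the headers from the body
  cat "$body_file"
}

body=$(mktemp)
echo "<html><body><pre>test</pre></body></html>" > "$body"
build_message esx-report@lab.local administrator@lab.local "Test message" "$body"
```

Saving that output to a file is a handy way to check the headers by eye before involving a mail server at all.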

Here's the complete html-mailer.pl script:


###############################################################################
#
#  html-mailer.pl
#
###############################################################################
#
#  To create the html-mailer.pl script in the ~/esx-report directory, copy
#  this entire code segment into your shell.
#  If you'd rather copy just the script, select everything between the
#  SCRIPTCREATOR limit strings.
#
###############################################################################

# If the ~/esx-report directory exists, cd to it so the script is created there
[ -d ~/esx-report ] && cd ~/esx-report

cat > ./html-mailer.pl <<'SCRIPTCREATOR'
#! /usr/bin/perl -w

 use strict;
 use Getopt::Std;
 use Net::SMTP;

 # Options:
 # $opt_f  email address of the sender
 # $opt_r  recipient email address
 # $opt_s  message subject, enclose in quotes if spaces
 # $opt_m  SMTP server FQDN or IP address
 # $opt_b  HTML formatted file for the message body

  our ($opt_f, $opt_r, $opt_s, $opt_m, $opt_b);

  getopt ('frsmb');

  unless ($opt_f && $opt_r && $opt_s && $opt_m && $opt_b) {
    print_usage();
    exit 1;
  }

  open(BODY, $opt_b) || error("Unable to open file $opt_b");

  my @body_data=<BODY>;
  close(BODY);
  my $line;

  my $smtp = Net::SMTP->new($opt_m) ||
    error("SMTP connection to $opt_m failed");
  $smtp->mail($opt_f);
  $smtp->to($opt_r);
  $smtp->data();
  $smtp->datasend("MIME-Version: 1.0\n");
  $smtp->datasend("Content-Type: text/html; charset=iso-8859-1\n");
  $smtp->datasend("To: $opt_r\n");
  $smtp->datasend("From: $opt_f\n");
  $smtp->datasend("Subject: $opt_s\n");
  $smtp->datasend("\n");   # an empty line separates the headers from the body
  foreach $line (@body_data)
    {
      $smtp->datasend("$line");
    }
  $smtp->dataend();
  $smtp->quit;

sub error {
  my $msg = shift;
  print STDERR "html-mailer.pl: $msg\n";
  exit 1;
}

sub print_usage {
  print STDERR <<EOF

  html-mailer.pl - HTML Formatted Message Mailer

  Usage: html-mailer.pl -f FROM -r RECIP -s SUBJ -m SMTP_HOST -b HTML_FILE

  Sends an email to the specified address, filling the message body with the
  HTML formatted file specified.

EOF
}

SCRIPTCREATOR

chmod 0700 ./html-mailer.pl

###############################################################################


Enable outbound SMTP
Now that we've got a script that we can send test messages with, we need to enable outbound SMTP through the ESX firewall on the ESX server that will have the script scheduled from a cron job. Just type this command as root to open the port:

  # Execute as root
  esxcfg-firewall --openPort 25,tcp,out,SMTP


If your Exchange or SMTP server is reachable from the Service Console network, execute html-mailer.pl with the appropriate parameters, and specify any old text file:

  ./html-mailer.pl -f me@mydomain.dom \
                   -r me@mydomain.dom \
                   -s "Test message" \
                   -m exchange.mydomain.dom \
                   -b ./testfile.txt


The Service Console can't reach the Exchange server...
No worries, as long as you're able to reach the VirtualCenter server, we can install the SMTP service and set it up to forward to the Exchange server. To install SMTP on a Windows 2003 server, do the following:

  • Open the Control Panel > Add or Remove Programs > Add/Remove Windows Components > double click Application Server > and then double click Internet Information Services (IIS). Put a check next to SMTP Service and click OK, OK, and Next

  • After the SMTP install is complete, open the Start Menu > Programs > Administrative Tools > Internet Information Services (IIS) Manager, then right click Default SMTP Virtual Server and select Properties

  • In the General tab, set the IP address: drop-down to the address on the Service Console network, if it differs from the LAN address. This will prevent the SMTP service from popping up on your network security guy's port scans :)

  • In the Access tab, click the Connection button, choose the Only the list below radio button, then click Add to add the appropriate subnet address and mask to the Group of computers option, or add each ESX server one at a time

  • In the Access tab again, click the Relay button, choose the Only the list below radio button, then click Add to add the appropriate subnet address and mask to the Group of computers option, or add each ESX server one at a time. Uncheck the option Allow all computers which successfully authenticate to relay

  • On the Delivery tab, click the Advanced button and add your Exchange server information in the Smart host: box. By specifying a smart host, the SMTP server will simply forward everything to the Exchange server, letting it make all the decisions about which domains to accept mail for, etc.

  • Now test out the SMTP forwarder by using telnet to initiate an SMTP session from the ESX server that will be sending the messages:
    
     telnet virtualcenter.lab.local 25
     ehlo
     mail from:spongebob@lab.local
     rcpt to:administrator@lab.local
     data
     Subject:test
     .
     quit
    

One script to rule them all
Almost there, so let's recap what we've done so far. In Part 1, we created the health check script that will run on each ESX server and send key performance stats and scaled histograms to the terminal. Then in Part 2, we covered how to distribute public keys so the script can be executed on several ESX servers via SSH. So far in Part 3, we've looked at a perl script that will email the combined script output, and now we need to create a script to tie it all together, and then schedule the script from a cron job.

Let's break down the main components of the script. First of all, if ssh-agent isn't running, the script isn't going to get very far, so we'll use pgrep to check for the process and exit if it's not found:

  if ! pgrep ssh-agent >/dev/null; then
    echo "The ssh-agent process does not appear to be running, exiting"
    exit 1
  fi

We need to source .ssh-agent, the file with the ssh-agent PID and socket info set up by the start-ssh-agent.sh script, or exit if it doesn't exist:

  source "${HOME}/.ssh-agent" >/dev/null || exit 1

Since we'll be running everything from an ESX Service Console, and that server is likely to be part of the health check, we should compare the list of ESX hosts to the local hostname so we don't open an SSH connection to the local machine. We use cut here to strip off the domain name so we'll match whether the FQDN or just the bare hostname is specified:

  THISHOST=$(hostname | cut -d . -f 1)

  for host in $@; do
    if [ $(echo $host | cut -d . -f 1) = $THISHOST ]; then
      "${RUNDIR}/esx-report.sh" >> "$TEMPTEXT"
    else
      ssh -q $host "$(cat "${RUNDIR}/esx-report.sh")" >> "$TEMPTEXT" || \
        printf "WARNING: SSH connection to $host failed\n\n\n\n" >> "$TEMPTEXT"
    fi
  done
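
The hostname comparison is easy to test on its own. In this sketch the helper names short_name and is_local_host are hypothetical, and a fixed string stands in for the output of hostname so the result doesn't depend on the machine it runs on.

```shell
#!/bin/bash
# Sketch only: short_name and is_local_host are hypothetical helper names.

# Strip everything after the first dot, so esx01.lab.local becomes esx01.
short_name() {
  echo "$1" | cut -d . -f 1
}

# Succeed when the two names refer to the same host once domains are stripped.
is_local_host() {
  [ "$(short_name "$1")" = "$(short_name "$2")" ]
}

this_host=esx01.lab.local    # stands in for the output of hostname
for host in esx01 esx02.lab.local; do
  if is_local_host "$host" "$this_host"; then
    echo "$host: run the report locally"
  else
    echo "$host: run the report over SSH"
  fi
done
```

Comparing only the short names means two hosts with the same name in different domains would be treated as the same machine, which is fine for a single lab but worth knowing.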

After the health check script has looped through the list of ESX hosts, we'll start building the HTML file with the necessary tags. Setting the font size for the pre tag is the secret sauce for getting the email to display perfectly on a BlackBerry:

  cat > "$TEMPHTML" <<-'HEADEREOF'
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
	<html>
	<head>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<style type="text/css">
	body { font-family: monospace; font-size: 12px }
	pre { font-family: monospace; font-size: 12px }
	</style>
	</head>
	<body>
	<pre>
	HEADEREOF

If you need to use < or > symbols in an HTML document, you have to write them as character references, because HTML treats text wrapped in those symbols as tags. We'll use a sed filter to replace every > with its numeric character reference, &#62;, and wrap the text from WARNING: to the end of the line in a red span tag to make it stand out:

  cat "$TEMPTEXT" | \
  sed -e 's/>/\&#62;/g' \
      -e 's/WARNING:.*/<span style="color: red">&<\/span>/' >> "$TEMPHTML"
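
The filter can be tried on a couple of sample lines before wiring it into the report. In this sketch html_escape_report is a hypothetical wrapper name, and the extra rules for & and < are my own addition beyond the filter above, on the assumption that report output might contain them.

```shell
#!/bin/bash
# Sketch only: html_escape_report is a hypothetical wrapper name. The & and <
# rules are an addition; the other two rules match the filter in the script.
html_escape_report() {
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&#62;/g' \
      -e 's/WARNING:.*/<span style="color: red">&<\/span>/'
}

printf '%s\n' "Load: 0.5 -> ok" "WARNING: swap usage > 1%" | html_escape_report
```

The WARNING rule has to come last; otherwise the angle brackets of the span tag it inserts would be escaped as well.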

Then we'll add the closing tags for everything to the end of the HTML file:

  cat >> "$TEMPHTML" <<-'FOOTEREOF'
	</pre>
	</body>
	</html>
	FOOTEREOF

And finally, we'll execute html-mailer.pl with the appropriate parameters. You'll need to change this section of the script for your environment:

  "${RUNDIR}/html-mailer.pl" -f esx-report@lab.local \
                             -r administrator@lab.local \
                             -s "ESX Health Report" \
                             -m lab-vc \
                             -b "$TEMPHTML"

Here's the run-esx-report.sh script. Remember to change the email address and mail server parameters for your environment:

###############################################################################
#
#  run-esx-report.sh
#
###############################################################################
#
#  To create the run-esx-report.sh script in the ~/esx-report directory, copy
#  this entire code segment into your shell.
#  If you'd rather copy just the script, select everything between the
#  SCRIPTCREATOR limit strings.
#
#  putty will ignore all the tabs, making the copied script quite ugly
#
###############################################################################

# If the ~/esx-report directory exists, cd to it so the script is created there
[ -d ~/esx-report ] && cd ~/esx-report

cat > ./run-esx-report.sh <<'SCRIPTCREATOR'
#! /bin/bash
  PATH="/bin:/usr/bin"

  if [ -z "$1" ]; then
    echo "No ESX hosts specified, exiting"
    exit 1
  fi

  if ! pgrep ssh-agent >/dev/null; then
    echo "The ssh-agent process does not appear to be running, exiting"
    exit 1
  fi

  RUNDIR=$(dirname "$(which "$0")")

  source "${HOME}/.ssh-agent" >/dev/null || exit 1

  THISHOST=$(hostname | cut -d . -f 1)

  TEMPTEXT=$(mktemp "${RUNDIR}/temptext.XXXXXXXXXX")

  TEMPHTML=$(mktemp "${RUNDIR}/temphtml.XXXXXXXXXX")

  for host in $@; do
    if [ $(echo $host | cut -d . -f 1) = $THISHOST ]; then
      "${RUNDIR}/esx-report.sh" >> "$TEMPTEXT"
    else
      ssh -q $host "$(cat "${RUNDIR}/esx-report.sh")" >> "$TEMPTEXT" || \
        printf "WARNING: SSH connection to $host failed\n\n\n\n" >> "$TEMPTEXT"
    fi
  done

  cat > "$TEMPHTML" <<-'HEADEREOF'
	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
	<html>
	<head>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
	<style type="text/css">
	body { font-family: monospace; font-size: 12px }
	pre { font-family: monospace; font-size: 12px }
	</style>
	</head>
	<body>
	<pre>
	HEADEREOF

  cat "$TEMPTEXT" | \
  sed -e 's/>/\&#62;/g' \
      -e 's/WARNING:.*/<span style="color: red">&<\/span>/' >> "$TEMPHTML"

  cat >> "$TEMPHTML" <<-'FOOTEREOF'
	</pre>
	</body>
	</html>
	FOOTEREOF

  "${RUNDIR}/html-mailer.pl" -f esx-report@yourdomain.dom \
                             -r administrator@yourdomain.dom \
                             -s "ESX Health Report" \
                             -m exchange.yourdomain.com \
                             -b "$TEMPHTML"

  rm -f "$TEMPTEXT"; rm -f "$TEMPHTML"

SCRIPTCREATOR

chmod 0700 ./run-esx-report.sh

###############################################################################


To cron foo, thanks for everything
Still with us? One more step, and it's an easy one. We'll add a cron job to run the script at 7:10 AM every morning. Remember to add the job for the user account you distributed SSH keys for.

To edit the cron entries for the user, type:

  crontab -e

This starts vi and opens up the user's crontab. To enter insert mode, type i

Assuming you've set everything up using the code segments used in this series, to add an entry for 7:10 AM, type this line, replacing ESX LIST with a space separated list of ESX hosts:

  10 7 * * * ${HOME}/esx-report/run-esx-report.sh ESX LIST >/dev/null 2>&1

After adding the entry, press the Esc key, and type :wq to write the crontab and quit.

If you have a long list of hosts, put them all in a text file, separated by spaces or each on its own line, and use command substitution to feed the list to run-esx-report.sh:

  run-esx-report.sh $(cat ${HOME}/esx-report/hostlist.txt)
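
Here is a quick way to confirm that a mixed file of spaces and newlines really does turn into one argument per host. The host names and the list_hosts function are made up for the demonstration.

```shell
#!/bin/bash
# Sketch only: the host names and list_hosts are hypothetical.
hostlist=$(mktemp)
printf 'esx01.lab.local esx02.lab.local\nesx03.lab.local\n' > "$hostlist"

# Print one line per argument to show how the list was split.
list_hosts() {
  local host
  for host in "$@"; do
    echo "host: $host"
  done
}

# Left unquoted on purpose: the shell splits the file's text on spaces and
# newlines, so each host becomes a separate argument.
list_hosts $(cat "$hostlist")
```

Quoting the substitution as "$(cat ...)" would pass the whole file as a single argument, which is not what the report script expects.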

There's more?!
What if we wanted to trigger an email warning if an ESX host exceeds a threshold value? As we'll see in Part 4, we can do this easily with a quick modification to the run-esx-report.sh script.


May 15, 2009

DIY ESX Server Health Monitoring - Part 2

In Part 1 of this series, we created the shell script that will generate a formatted health report for each ESX server. In order to have a combined health report for all of the ESX hosts, we'll use SSH to run the shell script on each host and send the output to the central ESX server responsible for gathering the data and sending the email.

To schedule the health report from a script and cron job, we'll need to use key based authentication rather than interactive passwords to access the remote ESX servers. We'll also encrypt the private key with a passphrase. That way if the filesystem security were ever compromised and someone were able to obtain the private key, they would still need the passphrase to unlock it and gain access to the remote systems.

Encrypting the private key presents a problem, however, as unlocking it during a connection attempt requires entering the passphrase interactively. Thankfully there is a solution: ssh-agent, which allows us to unlock the private key once at an interactive passphrase prompt and then keeps it in memory until ssh-agent is terminated or the server is rebooted.

SSH authentication using keys
If you are new to the concept of using key based authentication for SSH, a quick Google search for 'ssh using keys' will provide a wealth of info. Here are a couple of links that explain it much better than I can: http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO and http://wiki.archlinux.org/index.php/Using_SSH_Keys
And this link covers some challenges with using ssh-agent: http://www.ibm.com/developerworks/library/l-keyc2

To get started, we'll generate a 2048 bit RSA key pair for authentication. There's plenty of debate on the merits of DSA over RSA, and vice versa, but we'll flip a coin, and pick RSA.

Logged in as the non-root account you are planning to use for the ESX health report, execute this command to generate a 2048 bit RSA key pair:


  ssh-keygen -t rsa -b 2048

When prompted with: Enter file in which to save the key, hit return to accept the default location. At the prompt: Enter passphrase (empty for no passphrase):, enter a strong passphrase for the private key.

If you type ls -la in the user's home directory, you should see that a .ssh folder was created. If you cd into that directory, you'll find the private key file, id_rsa, and the public key file, id_rsa.pub, have been created.

Distribute the public key
For the next step, we need to copy the public key just generated to each ESX host we want to SSH into and execute the health report script on. You can simply scp it, or even use a Windows SFTP client if you wish (yuck). One issue with that approach is that unless you have used the SSH client or gone through the ssh-keygen process on each remote host, the necessary .ssh folder hasn't been created in the user's home folder. The following script will take care of the whole process: SSH to each host, create the .ssh directory if needed, and add the public key to the authorized_keys file on the remote ESX server.

By default, the ESX server firewall blocks outgoing SSH client connections, so issue this command as root on the central reporting ESX server to enable outbound SSH:


  # Execute as root
  esxcfg-firewall -e sshClient

To use the public key distribution script below, paste the entire code block into a putty window, then execute the script with the list of ESX hosts you wish to copy the key to. You'll get a bunch of password prompts and host key fingerprint (authenticity) prompts, but we only have to do this once. Remember to run this from the ESX server that will be polling the others, logged in as the non-root user that will be executing the health check script:


###############################################################################
#
#  copykey.sh
#
###############################################################################
#
#  To create the copykey.sh script, copy this whole code segment into your
#  shell. If you'd rather copy just the script, select everything between the
#  SCRIPTCREATOR limit strings.
#
###############################################################################

# If the ~/esx-report directory exists, cd to it so the script is created there
[ -d ~/esx-report ] && cd ~/esx-report

cat > ./copykey.sh <<'SCRIPTCREATOR'
#! /bin/bash
  PATH="/bin:/usr/bin"

  if [ ! -e ~/.ssh/id_rsa.pub ]; then
    echo "RSA public key file ~/.ssh/id_rsa.pub not found!"
    exit 1
  fi

  for esxhost in $@; do
    ssh -q "${USER}@${esxhost}" \
    "if [ ! -d ~/.ssh ]; then mkdir -m 0700 ~/.ssh; fi; \
    echo $(cat ~/.ssh/id_rsa.pub) >> ~/.ssh/authorized_keys; \
    chmod 0600 ~/.ssh/authorized_keys" || echo "Unable to connect to $esxhost"
  done
SCRIPTCREATOR

chmod 0700 ./copykey.sh

###############################################################################

Invoke the copykey.sh script with a space delimited list of ESX hosts:
./copykey.sh esx02.vmnet.local esx03.vmnet.local esx04.vmnet.local

Or read them from a text file if you have a lot of hosts. The text file can be space delimited or have each host on a new line:
./copykey.sh $(cat hostlist.txt)

In the copykey.sh script, notice how we used the command substitution syntax, $( ), to echo the text of the public key file into the authorized_keys file on the remote host. The local shell interprets the command substitution before the SSH command, so it executes cat on the local public key file. This is a handy trick, and we'll use it later to execute the locally stored health report shell script on the remote hosts.
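
You can watch the local expansion happen without touching a remote host. In this sketch fake_remote is a hypothetical stand-in for ssh, and the key text is a placeholder rather than a real public key.

```shell
#!/bin/bash
# Sketch only: fake_remote stands in for ssh so nothing leaves this machine,
# and the key text is a placeholder rather than a real public key.
fake_remote() {
  echo "remote shell would run: $*"
}

keyfile=$(mktemp)
echo "ssh-rsa PLACEHOLDER-KEY-TEXT user@esx01" > "$keyfile"

# Inside double quotes, $( ) is expanded by the local shell, so the key text
# is already part of the command string by the time it is handed over.
# The ~ is not expanded inside double quotes, so it is left for the remote side.
fake_remote "echo $(cat "$keyfile") >> ~/.ssh/authorized_keys"
```

Switching the outer quotes to single quotes would send the literal $(cat ...) text instead, and the remote host would look for the key file on its own disk.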

Keep the private key unlocked with ssh-agent
Now that the public keys are pushed out, make a test SSH connection to one of the remote servers with an ssh somehost command. If you've set everything up correctly to this point, you should receive a prompt like Enter passphrase for key '/home/user/.ssh/id_rsa':, which is different from the user@host password: prompt of a typical SSH connection. We're being prompted to decrypt the local private key before the key algorithm is run to verify the connection attempt. Obviously, that's not going to work from a cron job.

This is where ssh-agent comes in. If you run it from a putty session, you should get some unusual output like:

SSH_AUTH_SOCK=/tmp/ssh-UIbA2689/agent.2689; export SSH_AUTH_SOCK;
SSH_AGENT_PID=2690; export SSH_AGENT_PID;
echo Agent pid 2690;

The output provides the environment variables needed to locate the ssh-agent PID and socket. The application doesn't actually export any of this information into your shell; it expects you to do that. You can test this out by typing echo $SSH_AGENT_PID after running ssh-agent; the variable isn't defined in the current shell.

There are a couple of ways to fix that. You could invoke it like ssh-agent bash, which will open a new bash shell with the variables exported, or you could execute it with eval $(ssh-agent) to export the variables into your current shell. Since we won't be using it interactively, but rather from a cron job, we'll redirect the output from ssh-agent into a file, and then source that file from the cron job.
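
The difference between the approaches is easy to try out without starting a real agent. In this sketch, fake_agent is a hypothetical stand-in that just prints ssh-agent style text (the socket path and PID are invented), and the whole demo runs in a subshell so it can't disturb a real agent's variables:

```shell
# fake_agent is a hypothetical stand-in that prints the same kind of text
# ssh-agent does, so the demo runs without starting a real agent.
fake_agent() {
  echo 'SSH_AUTH_SOCK=/tmp/ssh-demo/agent.1234; export SSH_AUTH_SOCK;'
  echo 'SSH_AGENT_PID=1235; export SSH_AGENT_PID;'
  echo 'echo Agent pid 1235;'
}

# The function body is a subshell ( ), so nothing leaks into the caller.
agent_env_demo() (
  unset SSH_AUTH_SOCK SSH_AGENT_PID

  # 1. Just running it: the text is printed, nothing is set.
  fake_agent > /dev/null
  plain=${SSH_AGENT_PID:-unset}

  # 2. eval executes the printed text in this shell.
  eval "$(fake_agent)" > /dev/null
  evaled=$SSH_AGENT_PID

  # 3. Save the text to a file and source it later (the cron approach).
  unset SSH_AUTH_SOCK SSH_AGENT_PID
  envfile=$(mktemp)
  fake_agent > "$envfile"
  source "$envfile" > /dev/null
  sourced=$SSH_AGENT_PID
  rm -f "$envfile"

  echo "plain=$plain eval=$evaled source=$sourced"
)

agent_env_demo   # prints: plain=unset eval=1235 source=1235
```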

Something has to get ssh-agent running every time the ESX server is rebooted or someone kills the process, so let's create a handy start-ssh-agent.sh shell script in the non-root user's home directory:


###############################################################################
#
#  start-ssh-agent.sh
#
###############################################################################
#
#  To create the start-ssh-agent.sh script in the current user's home
#  directory, copy this whole code segment into your shell.
#  If you'd rather copy just the script, select everything between the
#  SCRIPTCREATOR limit strings.
#
###############################################################################

cat > ~/start-ssh-agent.sh <<'SCRIPTCREATOR'
#! /bin/bash
  PATH="/bin:/usr/bin"
  killall ssh-agent >/dev/null 2>&1
  ssh-agent > ~/.ssh-agent
  chmod 0600 ~/.ssh-agent
  source ~/.ssh-agent
    export SSH_AUTH_SOCK
    export SSH_AGENT_PID
  ssh-add
SCRIPTCREATOR

chmod 0700 ~/start-ssh-agent.sh

###############################################################################

The ssh-add command at the end of the script loads the private key into ssh-agent, and will prompt for the private key passphrase. Once the key is loaded, you can log off and ssh-agent will continue to run until the process is killed or the server is rebooted. You'll need to run start-ssh-agent.sh from an interactive login each time the ESX server is rebooted, but that's probably not very often, and the added security of using an encrypted private key certainly makes up for the hassle.

Execute the script above by typing ~/start-ssh-agent.sh to load the private key into ssh-agent, and we can test the health report script on multiple hosts. Paste the following into a PuTTY window, after replacing the hostnames with your own, and the script output should display on your terminal:

[ -d ~/esx-report ] && cd ~/esx-report
source ~/.ssh-agent; \
for host in esx02.vmnet.local esx03.vmnet.local; do \
ssh -q $host "$(cat esx-report.sh)"; done

Notice again how we used command substitution, $( ), to cat the locally stored script file through the SSH session, running the commands in the script on the remote host. For a small script like esx-report.sh, this is a really simple and efficient method, and it makes it very easy to add additional checks to the script.

Stay tuned
Coming up in Part 3, we'll take a look at emailing the script output in HTML format, and tie the whole process together from a cron job.


May 12, 2009

DIY ESX Server Health Monitoring - Part 1

Updated: May 25, 2009
I've had a chance to test this project out with vSphere - ESX 4.0, and everything works the same, except for the last section of the esx-report.sh script that parsed the /proc/vmware/mem file. This file has moved and the format was changed. Since the hypervisor memory usage can be monitored and alerts triggered from vCenter Server, I've just removed that section from the script.

VMware creates some pretty amazing stuff. If they didn't, I wouldn't have a blog about it, and you wouldn't be reading blogs about it. But let's be honest, it's not perfect (what software is?), and every now and then something buggy will happen. Sometimes these quirks occur at scary moments, like when removing a snapshot, or in the middle of a VMotion operation. Rarely do they cause any actual downtime, however.

You could classify the quirks into two categories: vCenter Server and ESX. The vCenter issues are almost always harmless, and are sometimes simply resolved by closing and reopening the VI/vSphere client. It's the ESX quirks that can be really serious, as any problems with ESX can lead to virtual machine downtime.

That's why it is so important to have some type of ESX performance monitoring in place. vCenter provides some basics, but doesn't offer any real visibility into the Service Console, which is where the serious problems can be lurking. There are some very good commercial products, and one in particular, Veeam Monitor, is even offered in a free edition. Sometimes you need something more customized, however, and the only option is to build it yourself.

In this four-part series, we'll build our own ESX health report with a shell script, use key-based SSH authentication so that one ESX server can run the script on the others, and then email the report using a perl script. We'll finish with a quick modification to enable the report to trigger an email when performance thresholds are reached. The format of the report will be designed to display perfectly on a BlackBerry Curve set to its smallest font size, allowing us to know about issues from anywhere.

The motivation
I started working on this script after two separate ESX Service Console incidents. The first occurred after upgrading an ESX 3.5 server to Update 2. The upgrade was successful, and everything seemed perfectly normal. But as it turned out, there was an issue between the HP Systems Insight Manager server and the new update, causing a new process to be launched in the Service Console every five minutes, but the processes never terminated. After a week of this, several thousand zombie processes were running in the Service Console. There is a limit on the number of processes before the ESX server will stop launching new ones, and once you hit that limit, chaos ensues.

The second incident was less severe, but was just as scary because no one really knew how long the problem had been occurring. For reasons never understood, one of the VMware log files started filling up with the same generic HA message, logging more than five entries a second. The log file was rotating through all of its file names several times an hour, and the Service Console processor usage was pegged. The problem was finally noticed when a VMotion took longer than fifteen minutes to complete.

The plan
The setup for the ESX health report is fairly simple and it satisfies two key components of a good security policy: do not permit SSH access for root, and never store passwords in scripts. The basic plan is:
  • A non-root user account will be used for the entire process, and a public SSH key for this account will be distributed to the other ESX servers

  • From the central ESX server the non-root account will SSH to each ESX host, execute the script, and redirect the script output to the central ESX host

  • The combined output from each ESX server will then be emailed using a perl script, formatted to display nicely on a BlackBerry Curve

For this setup to work, you'll need to use the same non-root account on each ESX server. Even though most of the VMware command-line tools can only be executed by root, we can get most of the critical stats we need with just a regular non-privileged account.

It will also be necessary to open an outbound port for SMTP in the firewall on the ESX host that will be emailing the report. If your network design has isolated the ESX hosts from the rest of the network (and it should!), and only the server running vCenter has access to the service console network, you'll need to set up SMTP on the vCenter server and configure it to forward to an Exchange server or whatever groupware application you use, which we'll cover in Part 3.

The script
The shell script used to gather data from the ESX hosts is pretty straightforward, and can be developed and tested from a local ESX console, as the output is just being sent to the terminal.

The scale function will perform all calculations and print the histogram output. The function expects four parameters: the value, the maximum range for the value, a threshold percentage for generating a warning, and a description for the histogram.

The CSCALE local variable determines how the data is scaled, or how many intervals the graph will display. I've used a value of twenty here, mainly because it displays perfectly on my BlackBerry Curve, so each hash will represent a 5% interval. If you need more resolution than that in the graphs, it's just a matter of changing the CSCALE constant:

  function scale {
    local CSCALE=20

The bash shell lacks the ability to perform floating point calculations, which makes getting percentages pretty tough. However, awk fills this gap easily, and can round the percentages by simply using a printf format specifying a floating point number with a precision of 0, so no decimal point will be printed (%.0f):

   avg=$(echo $1 $2 | awk '{printf "%.0f", ($1/$2) * 100}')

We'll use awk again to determine how many hashes should be printed in the histogram, but this time we'll multiply by the CSCALE value rather than 100:

  scaled=$(echo $1 $2 $CSCALE | awk '{printf "%.0f", ($1/$2) * $3}')
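
Here's a quick way to check the two awk calculations on their own. The function name percent_and_hashes is made up for this sketch, and the input numbers are arbitrary test values:

```shell
# percent_and_hashes (hypothetical name) prints the rounded percentage and
# the number of hashes for a value, its maximum, and a scale width.
percent_and_hashes() {
  echo "$1" "$2" "$3" | awk '{
    printf "%.0f %.0f\n", ($1/$2) * 100, ($1/$2) * $3
  }'
}

percent_and_hashes 1536 2048 20   # prints: 75 15
percent_and_hashes 1 3 20         # prints: 33 7
```

With a scale of twenty, 75% works out to fifteen hashes, and the second call shows the rounding at work: 33.3% becomes 33, and 6.67 hashes becomes 7.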

With the percentage value rounded and stored in avg, we will compare it to the threshold value specified as the third parameter to the scale function. If avg is greater than or equal to the threshold, we'll store the third parameter in a variable named alert that we can check for later:

  if [ $avg -ge $3 ]; then
    alert=$3
  fi

When calling a bash function, the parameters are specified as a space-delimited list. This is problematic if the arguments themselves contain spaces, which the descriptions for the histograms most certainly will. So in the scale function, we'll make the description the last parameter; that way we know that everything from the fourth parameter on is part of the description. Using the built-in shift command, we can shift off the first three parameters, leaving $@, the array of all remaining arguments to the function, holding only the description. We can then grab everything left over and store it in a single variable:

  shift 3; histtext=$@
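
The shift trick is easy to try in isolation. In this sketch, show_description is a hypothetical function that only echoes what it received, and it uses "$*" to join the leftover arguments, which gives the same result as the assignment above:

```shell
# show_description (hypothetical) takes three fixed parameters and treats
# everything after them as one description.
show_description() {
  local value=$1 max=$2 threshold=$3
  # Drop the first three parameters; only the description words remain.
  shift 3
  local text="$*"
  echo "value=$value max=$max threshold=$threshold text=[$text]"
}

show_description 42 100 75 Service Console Load
# prints: value=42 max=100 threshold=75 text=[Service Console Load]
```

Because the shell has already split the description into separate words, runs of multiple spaces in it would be collapsed to single spaces when the words are joined back together.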

Now that the calculations are done and parameters stored, we can start printing some histograms. To keep everything lined up in the output, we'll let printf format specifiers do all the work. This first printf command outputs the description for the histogram followed by the average. The first format directive, %-10.10s, specifies a string value (s) will be printed in a 10 character width field (10), and it should be left justified (-) with a precision of 10, meaning only print the first 10 characters even if it is longer (.10).
The second format directive, %3d%%, specifies an integer value will be printed in a three character width field, and a percent sign will follow it:

  printf "    %-10.10s %3d%% " "$histtext" "$avg"
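
To see what the two format directives do, here is a small sketch that wraps each field in brackets so the padding is visible. The function name format_field and the sample strings are invented for the demonstration:

```shell
# format_field (hypothetical) prints a description and a percentage with
# the same directives, bracketed so the field widths are visible.
format_field() {
  printf '[%-10.10s][%3d%%]\n' "$1" "$2"
}

format_field 'User:' 7                     # short text is padded on the right
format_field '/vmfs/volumes/datastore1' 42 # long text is cut to 10 characters
```

The first call pads User: out to ten characters and right-aligns the 7 in a three character field. The second call prints only /vmfs/volu, the first ten characters of the longer string.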

This loop adds # characters to the hist variable up to the value of scaled, giving us the bar for the histogram:

  for ((i=0; i<scaled; i++)); do
    hist="${hist}#"
  done

And now we'll print the histogram bar, enclose it in [] brackets, left justify it (-), and print it in a field width equal to the CSCALE value:

  printf "[%-${CSCALE}.${CSCALE}s]" "$hist"
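
Putting the loop and the bracketed printf together, a stripped-down sketch of just the bar drawing might look like this. The draw_bar name and its two parameters are assumptions for the example, not part of the real script:

```shell
# draw_bar (hypothetical) builds a bar of hashes and prints it inside
# brackets, padded or truncated to a fixed width.
draw_bar() {
  local scaled=$1 width=$2 bar="" i
  # Add one hash per step until we reach the scaled value.
  for ((i=0; i<scaled; i++)); do
    bar="${bar}#"
  done
  # Left justify and clamp the bar to exactly $width characters.
  printf "[%-${width}.${width}s]\n" "$bar"
}

draw_bar 5 20    # 5 hashes followed by 15 spaces
draw_bar 25 20   # more hashes than the width, so only 20 are printed
```

Using the same number for the field width and the precision is what keeps every bar exactly the same length, no matter what value comes in.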

If the average was greater than or equal to the threshold, the alert variable will be defined, so we will print a warning message right below the histogram, lining it up perfectly by using printf format directives to specify the field width:

  if [ $alert ]; then
    printf "%28s%3d%% %8s" "WARNING:" "$alert" "threshold"
    printf "\n"
  fi

With the scale function defined, we can start gathering data and formatting the report. We'll begin by printing the hostname of the ESX server:

  printf "$(hostname)\n\n"

To get the load average for the service console, we'll use the last section of the output from uptime. Using egrep with the -o option tells it to only print out the section of the line that matches. Then use tr to change the lower case 'l' in 'load' to upper case, and we have a nicely formatted load average:

  printf " Service Console Stats:\n"
  printf "  $(uptime | egrep -o 'load.*' | tr 'l' 'L')\n"
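
You can test the pipeline against a canned line instead of live uptime output. The sample line below is made up, format_load is a hypothetical name, and grep -E -o is used here as the equivalent spelling of egrep -o:

```shell
# format_load (hypothetical) reads an uptime-style line on stdin and keeps
# only the load average portion, capitalizing the leading "l".
format_load() {
  grep -E -o 'load.*' | tr 'l' 'L'
}

# Invented sample of what uptime prints.
sample=' 10:15:01 up 12 days,  3:02,  1 user,  load average: 0.08, 0.03, 0.01'
echo "$sample" | format_load   # prints: Load average: 0.08, 0.03, 0.01
```

Keep in mind that tr changes every lower case 'l' in the matched text, which is harmless here only because "load average" contains just the one.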

Getting the number of running processes is easy, just pipe ps into grep, match everything (.), and output the number of lines that matched (-c). This won't be completely accurate, as the script itself and any subshells it launches will be included in the count. But that's not a big deal, the process count is only critical when it gets into the thousands, so we don't really care if it's inflated by a couple of counts:

   printf "  Number of running processes: $(ps -e | grep . -c)\n\n"

We can get the processor use average from vmstat, telling it to calculate the average over five seconds. We're grabbing two stats here, the user and system processor use percentages, so the '\n' newline in the middle of the awk command splits the output onto two lines. In order to handle sending the two lines to the scale function, we use read in a while loop to process each line.

In the awk command, we'll send the scale function the appropriate field parsed from vmstat, and then follow it with the maximum value for the data (in this case 100 as vmstat is reporting a percentage), the threshold value for triggering a warning (75), and a description for the histogram (User: or System:):

  printf "  Proc 5 sec avg:\n"
  (vmstat 5 2 | awk 'END {
   print $13, "100", "75", "User:\n" $14, "100", "75", "System:"}') | \
    while read line; do scale $line; done
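
To watch how the two lines from awk turn into two function calls, here is a sketch that feeds the same awk command some canned vmstat output. The numbers are invented, and show_args is a hypothetical stand-in for scale that just prints its parameters:

```shell
# Invented vmstat output; the last line is the 5 second average.
sample_vmstat() {
  cat <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0 123456  23456 345678    0    0     1     2   10    20  3  1 95  1
 0  0      0 123400  23456 345678    0    0     0     4   12    25  7  2 90  1
EOF
}

# show_args (hypothetical) stands in for scale and prints what it was given.
show_args() {
  echo "value=$1 max=$2 threshold=$3 label=$4"
}

cpu_demo() {
  # END runs after the last line is read, so $13 and $14 come from it.
  sample_vmstat | awk 'END {
    print $13, "100", "75", "User:\n" $14, "100", "75", "System:"}' |
    while read -r line; do show_args $line; done
}

cpu_demo
# prints: value=7 max=100 threshold=75 label=User:
#         value=2 max=100 threshold=75 label=System:
```

Leaving $line unquoted in the loop is deliberate: the shell splits it back into separate words, which become the individual parameters of the function.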

To parse the memory use information from free, we'll use the pattern matching capabilities of awk to execute print statements only on the lines that begin with Mem: and Swap:

You might want to exclude the memory use histogram and only report on swap usage, as Linux will use memory not allocated to applications for buffering files, resulting in free almost always reporting very high memory usage:

  printf "  Memory Usage:\n"
  (free | awk '/^Mem:/ {print $3, $2, "100", $1}
              /^Swap:/ {print $3, $2, "1", $1}') | \
    while read line; do scale $line; done
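
The awk pattern matching can be checked against canned output from free. The figures below are invented for the sketch, and parse_free is a hypothetical wrapper around the same awk program:

```shell
# Invented sample of what free prints.
sample_free() {
  cat <<'EOF'
             total       used       free     shared    buffers     cached
Mem:        268548     251176      17372          0      32600     118260
-/+ buffers/cache:     100316     168232
Swap:       554200       1024     553176
EOF
}

# parse_free (hypothetical) prints: used, total, threshold, label
parse_free() {
  awk '/^Mem:/  {print $3, $2, "100", $1}
       /^Swap:/ {print $3, $2, "1", $1}'
}

sample_free | parse_free
# prints: 251176 268548 100 Mem:
#         1024 554200 1 Swap:
```

Only the two lines that start with Mem: and Swap: produce output; the header and the buffers/cache line match neither pattern and are silently skipped.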

And finally, for disk usage output from df, we want to print every line except the header line, so we'll use the pattern exclusion command in awk to skip the first line. We also need to filter out the % signs from the df output, and awk can do this for us with a sub command:

  printf "  Disk Usage:\n"
  (df -mP | \
   awk '$1 !~ /Filesystem/ {sub(/%/,""); print $5, "100", "75", $6}') | \
    while read line; do scale $line; done
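
Here is the same df parsing run against canned output, so you can see the header being skipped and the % signs removed. The sample lines are invented, and parse_df is a hypothetical wrapper that takes the threshold as an argument instead of hard-coding it:

```shell
# Invented sample of what df -mP prints.
sample_df() {
  cat <<'EOF'
Filesystem         1048576-blocks      Used Available Capacity Mounted on
/dev/sda2                   4956      1683      3018      36% /
/dev/sda1                     99        27        67      29% /boot
EOF
}

# parse_df (hypothetical) prints: percent used, max, threshold, mount point
parse_df() {
  awk -v threshold="$1" '$1 !~ /Filesystem/ {
    sub(/%/, "")                      # strip the % so $5 is a plain number
    print $5, "100", threshold, $6
  }'
}

sample_df | parse_df 75
# prints: 36 100 75 /
#         29 100 75 /boot
```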

Try it out
You can copy the entire script below into a PuTTY session as a non-root user and try the script out for yourself. In Part 2 of this series, we'll explore a technique for running the script on multiple ESX hosts using SSH.


###############################################################################
#
#  esx-report.sh
#
###############################################################################
#
#  Copy this entire code segment into your shell to create the ~/esx-report
#  directory and esx-report.sh script and make it executable.
#  If you would rather copy just the script itself, select everything between 
#  the SCRIPTCREATOR limit strings.
#
###############################################################################
if [ ! -d ~/esx-report ]; then mkdir ~/esx-report -m 0700; fi
cd ~/esx-report

cat > ./esx-report.sh <<'SCRIPTCREATOR'
#! /bin/bash

 PATH="/bin:/usr/bin"

# Usage: [value] [max value] [threshold percentage] [description]
function scale ()
{
  # exit if called without four parameters
  if [ -z $4 ]; then
    return 1
  fi

  local avg alert scaled hist histtext i
  # change histogram scale here
  local CSCALE=20

  # protect against divide by zero, even though awk doesn't complain
  if [ $1 -gt 0 ] && [ $2 -gt 0 ]; then

    # no floating point in bash, use awk to get avg and round (%.0f) to int
    avg=$(echo $1 $2 | awk '{printf "%.0f", ($1/$2) * 100}')

    scaled=$(echo $1 $2 $CSCALE | awk '{printf "%.0f", ($1/$2) * $3}')
  else
    avg=0; scaled=0
  fi

  if [ $avg -ge $3 ]; then
    alert=$3
  fi

  # shift off first three args, leaving rest of args for description
  shift 3
  # grab whole array of args left over, this allows for spaces in description
  histtext="$@"

  printf "    %-10.10s %3d%% " "$histtext" "$avg"

  for ((i=0; i<scaled; i++)); do
    hist="${hist}#"
  done

  # with scaling, low values may show nothing in histogram
  # if hist undef, add a #, but we want a zero value to show nothing
  if [ ! $hist ] && [ $avg -gt 0 ]; then
    hist='#'
  fi

  printf "[%-${CSCALE}.${CSCALE}s]" "$hist"
  printf "\n"

  if [ $alert ]; then
    printf "%28s%3d%% %8s" "WARNING:" "$alert" "threshold"
    printf "\n"
  fi
}

  printf ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n"
  printf ">\n"
  printf "> $(hostname)\n\n"

  printf " Service Console Stats:\n"
  printf "  $(uptime | egrep -o 'load.*' | tr 'l' 'L')\n"
  printf "  Number of running processes: $(ps -e | grep . -c)\n\n"
  printf "  Proc 5 sec avg:\n"
  (vmstat 5 2 | awk 'END {
   print $13, "100", "75", "User:\n" $14, "100", "75", "System:"}') | \
    while read line; do scale $line; done

  printf "  Memory Usage:\n"
  (free | awk '/^Mem:/ {print $3, $2, "100", $1}
              /^Swap:/ {print $3, $2, "1", $1}') | \
    while read line; do scale $line; done

  printf "  Disk Usage:\n"
  (df -mP | \
   awk '$1 !~ /Filesystem/ {sub(/%/,""); print $5, "100", "75", $6}') | \
    while read line; do scale $line; done
  printf "\n\n\n\n"

SCRIPTCREATOR

chmod 0700 ./esx-report.sh

###############################################################################

May 6, 2009

Become Friends with find

A while back I noticed a tip posted somewhere on how to use the find utility to register a bunch of virtual machines at once. It was a really helpful post and illustrated some of the potential of the Swiss Army-like find utility. But it overlooked one of the coolest features of find, the -exec option.

Using -exec, you can launch a command with find and pass each found file as a parameter to it, eliminating the need to run vmware-cmd in a for loop. Just place the command to run after the -exec parameter, and find will replace the string {} with the current file being processed:

[root@esx02 root]# find /vmfs/volumes/ -name '*.vmx' -exec vmware-cmd -s register {} \;
 
register(/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-dc1/lab-dc1.vmx) = 1
register(/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-ex1/lab-ex1.vmx) = 1
register(/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/uda14-esx/uda14-esx.vmx) = 1

The -exec option is often considered less efficient than using xargs, which will feed multiple parameters to a command at once rather than launching the command with a single argument like -exec does. But for use with a command like vmware-cmd, which only expects to have one .vmx file passed to it, -exec is perfect.
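
If you want to get comfortable with the -exec syntax before pointing it at real virtual machines, you can practice in a scratch directory. In this sketch the directory layout is invented, and echo stands in for vmware-cmd so nothing is actually registered:

```shell
# Build a throwaway directory tree with a couple of fake .vmx files.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/lab-dc1" "$demo_dir/lab-ex1"
touch "$demo_dir/lab-dc1/lab-dc1.vmx" \
      "$demo_dir/lab-dc1/lab-dc1.vmdk" \
      "$demo_dir/lab-ex1/lab-ex1.vmx"

# echo runs once per matching file, with {} replaced by the file's path.
found=$(find "$demo_dir" -name '*.vmx' -exec echo would-register {} \;)
echo "$found"

rm -rf "$demo_dir"
```

Two lines are printed, one for each .vmx file, and the .vmdk file is ignored. Once the output looks right, swapping echo would-register for the real command is all that's left.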

And find has another option, -ok, which does the same thing but will present a prompt before running the command on each .vmx file that is found. This can be really handy if you are registering or powering on a bunch of VMs but know that there are some you are going to want to skip:

[root@esx02 root]# find /vmfs/volumes/ -name '*.vmx' -ok vmware-cmd -s register {} \;

< vmware-cmd ... /vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-dc1/lab-dc1.vmx > ? y
register(/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-dc1/lab-dc1.vmx) = 1

< vmware-cmd ... /vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-ex1/lab-ex1.vmx > ? y
register(/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-ex1/lab-ex1.vmx) = 1

< vmware-cmd ... /vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/uda14-esx/uda14-esx.vmx > ? n

find rules
I've been in a couple of jams where find really saved the day. Consider a situation where a standalone ESX server needs to be rebuilt, but the virtual machines are all on networked storage. Using find afterwards to register and start the VMs makes the process trivial:

[root@esx02 root]# find /vmfs/volumes/ -name 'lab*.vmx' -exec vmware-cmd -s register {} \;

register(/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-dc1/lab-dc1.vmx) = 1
register(/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-ex1/lab-ex1.vmx) = 1


[root@esx02 root]# find /vmfs/volumes/ -name 'lab*.vmx' -exec vmware-cmd {} start \;

VMControl error -16: Virtual machine requires user input to continue
VMControl error -16: Virtual machine requires user input to continue


[root@esx02 root]# find /vmfs/volumes/ -name 'lab*.vmx' -print -exec vmware-cmd {} getstate \;

/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-dc1/lab-dc1.vmx
getstate() = stuck

/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-ex1/lab-ex1.vmx
getstate() = stuck


[root@esx02 root]# find /vmfs/volumes/ -name 'lab*.vmx' -print -exec vmware-cmd {} answer \;

/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-dc1/lab-dc1.vmx

Question (id = 1) :msg.uuid.moved:The location of this virtual machine's configuration file has changed since it was last powered on.

If the virtual machine has been copied, you should create a new unique identifier (UUID).  If it has been moved, you should keep its old identifier.

If you are not sure, create a new identifier.

What do you want to do?
        0) Create
        1) Keep
        2) Always Create
        3) Always Keep
        4) Cancel
Select choice. Press enter for default <0> : 1
selected 1 : Keep

/vmfs/volumes/495e44a0-d41258bc-fac4-000c299206d0/lab-ex1/lab-ex1.vmx

Question (id = 1) :msg.uuid.moved:The location of this virtual machine's configuration file has changed since it was last powered on.

If the virtual machine has been copied, you should create a new unique identifier (UUID).  If it has been moved, you should keep its old identifier.

If you are not sure, create a new identifier.

What do you want to do?
        0) Create
        1) Keep
        2) Always Create
        3) Always Keep
        4) Cancel
Select choice. Press enter for default <0> : 1
selected 1 : Keep

You could even use the -ok option to introduce a pause between the VM start-ups.

Shrimp tacos
The syntax for -exec and -ok can be a little difficult to remember. The ; at the end of the command designates the end of the command find should execute, and it has to be escaped like \; so the shell doesn't interpret it.

I could never get the syntax right from memory until I associated it with shrimp tacos. The curly braces {} are a corn tortilla, and I'm using a spatula to get my shrimp off the skillet, \;

It may be corny, but I never forget the syntax now!


May 3, 2009

Secrets of the e1000

Updated: May 23, 2009
Note that as of vSphere - ESX 4.0, when using the new virtual machine wizard, if you select a custom configuration, and virtual machine version 7, the e1000 is now presented as a virtual adapter option for a Windows Guest (along with Flexible, VMXNET 2 and VMXNET 3. Sweet!). The operating system swap around and .vmx file editing hacks detailed below should no longer be necessary if you want to use the e1000 in a VM. For details on the available virtual NIC options, see this KB article.

There is still no equivalent to vlance.noOprom = "true" or vmxnet.noOprom = "true" for the e1000 to directly disable the PXE option ROM. However, the solution described below also works with vSphere.


If you haven't used the e1000 virtual NIC before, it's a virtual implementation of the ubiquitous Intel PRO/1000 Ethernet adapter. According to this VMware KB article, the performance of the e1000 device lies somewhere in between the vlance and vmxnet devices, making it the perfect choice for a virtual machine that doesn't have VMware tools installed, and is therefore unable to utilize the advanced vmxnet virtual NIC.

Pesky pixies
I've been thinking all along that the e1000 virtual NIC lacks the option ROM necessary for PXE booting, and so have never bothered to disable it. While doing some testing on the Vyatta virtual machine I set up for the Protect the Service Console Network With a Virtual Firewall project, I noticed the familiar network boot screen after forgetting to connect the Vyatta installation CD for the initial boot.

If you read my post on Hardening the VMX File, you'll remember I discussed a potential exploit using PXE. Since I thought the e1000 was option ROM free, I failed to discuss disabling it if you have no use for PXE in your environment.

I can make at least one excuse for this oversight: there actually is no .vmx directive to disable the option ROM in the e1000. The option ROMs in the vlance and vmxnet adapters can be disabled with one of these directives, vlance.noOprom = "true" or vmxnet.noOprom = "true", but there is no equivalent command for the e1000.

There is an easy workaround for the lack of a disabling command, however. We can set the memory size that the BIOS makes available to the option ROM to zero, effectively preventing it from loading with this .vmx directive:


ethernet0.opromsize = "0"


You'll need to add this directive for each e1000 adapter, ethernet1.opromsize = "0" for example, if you have two of them in a virtual machine. I've gone back to the Hardening the VMX File post and added this as an additional recommended directive if you are using e1000 adapters.
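
If a virtual machine has several adapters, a little awk can list the directives you'd need. This is only a sketch: oprom_directives is a hypothetical helper, the sample file is invented, and it assumes the adapters are declared with lines in the ethernetN.virtualDev = "e1000" form. It only prints the directives and doesn't modify the .vmx file:

```shell
# oprom_directives (hypothetical) prints an opromsize directive for every
# adapter declared as an e1000 in the given .vmx file.
oprom_directives() {
  awk -F'[. ]' '/^ethernet[0-9]+\.virtualDev = "e1000"/ {
    print $1 ".opromsize = \"0\""
  }' "$1"
}

# Invented sample .vmx content with two e1000 adapters and one vmxnet.
demo_vmx=$(mktemp)
cat > "$demo_vmx" <<'EOF'
ethernet0.present = "true"
ethernet0.virtualDev = "e1000"
ethernet1.present = "true"
ethernet1.virtualDev = "vmxnet"
ethernet2.virtualDev = "e1000"
EOF

oprom_out=$(oprom_directives "$demo_vmx")
echo "$oprom_out"
rm -f "$demo_vmx"
```

For the sample file this prints a directive for ethernet0 and ethernet2, and skips the vmxnet adapter on ethernet1.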

Not so secret
In an attempt to understand how I never noticed that PXE boot was available with the e1000, I searched through the release notes for each update version of ESX 3.5 on the VMware downloads page. I was able to confirm that I am actually crazy, and PXE boot has always been available with the e1000. I also discovered that the e1000 is the default when creating virtual machines with some specific operating systems. As of Update 4, selecting one of the Linux 64-bit options, Netware, Solaris, or the Other (64-bit) guest operating system will cause the New Virtual Machine Wizard to present the e1000 as the default network adapter option. So if you don't feel like manually adding an e1000 adapter by editing the .vmx file for a virtual machine, you could initially set it up as one of those guest operating systems and then change it back after the VM is created.

If you don't mind editing a .vmx file, just add this directive, changing the device name to the specific adapter you wish to change:


ethernet0.virtualDev = "e1000"

