April 12, 2009

The Ultimate Kickstart File - Part 1

If you've built a few ESX servers by hand in a moderately complex Virtual Infrastructure environment, you know that it can be a very tedious task. The CD-ROM based portion of the install is simple and quick, but the post-install configuration can be downright painful. In this three-part series, we'll explore the options for automating ESX installations, and detail the methods and commands we can use to make the process as painless as possible.

Is it really that tedious?
Here's a short list of the post-install tasks I've had to endure:
  • Create multiple vSwitches and name them accordingly, often having to log into VirtualCenter and open up the network configuration of an existing host to ensure the vSwitches are all named exactly the same

  • Set up multiple vmkernels and consult diagrams to figure out the IP addresses for them

  • Add iSCSI hosts and configure CHAP authentication, requiring that a password safe be logged into and the CHAP passwords copied out of it

  • Enable the NTP service and configure multiple NTP servers

  • Enable VMotion

  • Customize the .bashrc file for both root and our non-root user so they have the aliases we like to use

  • Enable account password expiration, lockout intervals, etc. according to our security policies

  • Find the document with our standard SSH banner and copy that into a PuTTY window on the new host

  • Configure the various little tweaks and fixes we have found to be helpful in our environment

And if even one of these tasks is overlooked, it's sure to cause severe annoyance down the road, especially during a moment of crisis.

So it's no wonder that after the third or fourth ESX install I went right to the web to search for any way to automate the process. And lo and behold, VMware has actually provided us with a scripted install generator right from the home pages of our ESX servers! But even better, two of our ESX building brethren are actively working on projects to provide automated network installations!

UDA and EDA
If you haven't checked out the Ultimate Deployment Appliance (UDA) or the ESX Deployment Appliance (EDA) yet, you need to jump on that immediately. They are virtual appliances that allow you to PXE boot your hosts and run scripted ESX installations over the network. Both projects have their strengths and weaknesses, but we will focus on the UDA for the purposes of this tutorial. That's not to take anything away from the EDA. It's a great appliance and seems to be the preferred ESX deployment vehicle at the moment, but we're used to the quirks of the UDA, and prefer its single kickstart configuration window to the multiple windows and drop-downs of the EDA. We've actually gotten so obsessed with creating the perfect kickstart file that each new change is checked into an svn repository for some basic source control, which is very easy to manage with UDA as the entire kickstart file is easily copied and pasted to and from the window. You should definitely check out EDA, however, if shell scripting doesn't interest you, as it takes care of a lot of the details for you. There are many good sources of information for setting up both UDA and EDA in your environment, so we'll leave that to the experts and assume that you are up and running with UDA for the rest of the tutorial.

A lofty goal
We've got some excellent tools available to us for scripting ESX installations, and an extreme aversion to repetitive tasks, so we're going to hack away at this until the entire installation and post-configuration are completely automated. Our goal is to have to change only one item in our kickstart files: the hostname. The only task left after the scripted install is complete will be to add the ESX host to VirtualCenter. That's a big undertaking, so we'd better start with the basics.

Anatomy of a kickstart file
To get a good feel for how a scripted installation works, it's a worthwhile exercise to create a basic kickstart configuration file using the ESX Server Scripted Installer, which is accessible via a link available from the web interface of any ESX server. Before using the Scripted Installer it must be enabled by editing a configuration file on the ESX server itself. The details of this are covered in the ESX Server 3 Installation Guide, in the Remote and Scripted Installations section.

A kickstart file consists of four basic sections: command, %packages, %pre, and %post.

The command section consists of kickstart specific configuration settings used during the operating system install. We'll let UDA create much of this for us when we create a new template using the UDA web interface.

The %packages section defines the software packages that should be included during the operating system installation. This is always @base _ @ everything for an ESX installation, so we won't be touching this section.

The %pre section runs immediately after the kickstart options have been parsed, but before the operating system installation begins. The %pre section would be handy for any advanced partitioning options you may want to include before files are written to the system's hard drives. We won't be working with the %pre section in this tutorial, but it could be very handy in certain situations.

The %post section allows us to specify commands to be run immediately following the operating system installation, and it's where we'll spend most of our time in this tutorial. The commands in this section run after the installation, but before the system reboots. This really restricts the ESX specific things we can do here as there are no running VMware services yet. To work around this, we'll primarily use the %post section to configure a shell script that will be executed after the system reboots.

Note that by default the commands in the %post section are interpreted by the bash shell, but that can be changed with the --interpreter option, allowing you to use Python syntax, for example, in the %post section. To change the interpreter, add the option to the %post line itself, as in %post --interpreter /usr/bin/python, and replace /usr/bin/python with the interpreter you prefer, provided it's available in the default install of ESX of course.
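Put together, the layout described above might look like the skeleton below. This is only a sketch: the comment lines are placeholders rather than real settings, and the actual command section and %packages contents are whatever UDA or the Scripted Installer generates for you.

```
# Command section: kickstart settings used during the OS install
# (placeholder only; UDA generates these lines for us)

%packages
# package selection, left exactly as generated

%pre
# runs after the kickstart options are parsed, before the install begins
# (not used in this tutorial)

%post
# runs after the install but before the reboot; bash by default
# to use another interpreter, the header would instead read:
#   %post --interpreter /usr/bin/python
```

Nearly everything we write in this series goes under that final %post marker.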

Now that we understand the basic layout of a kickstart file, we can develop some strategies for making our ESX installations as automated as possible:

- Use UDA to create our most generic configuration image
We build most of our ESX servers with almost exactly the same basic configuration, so we can use UDA to generate this baseline template. Create a new template in UDA using the root password you use for deployments, your Linux partitioning scheme, and your regional and licensing information. We'll use this basic development template to build and test our custom kickstart file.

- Enter our generic post-install commands in the %post section
If we edit our new template from the UDA website, we'll find that UDA has placed a %post section marker at the bottom of the kickstart file for us. As we discovered earlier, these commands are interpreted by the bash shell immediately after the operating system installation completes, but before the post-installation reboot. Since we won't have any VMware services running at this phase of the install, only our most generic Linux customization commands should go here. After putting in all of our non-ESX configuration commands, we will use the %post section to create a shell script that will be run after the ESX server automatically reboots as part of the install process.

- Use the %post section to create our shell script
It's a common practice to use a shell script to create another shell script, and that's exactly what we need to do here: use the kickstart script to create a new shell script. To do this, we'll use a here document to feed a list of commands to cat, and cat will output the commands to a shell script. You've probably seen here documents used a hundred times without looking at exactly what is going on under the hood, but we need to understand the basics in order to avoid a big pitfall with our shell script.

Building the post-install script
A here document is just a special block of text. When bash encounters a here document, instead of seeing the commands and the carriage returns in the text block as an indication that we want bash to execute the commands, bash feeds the block of text into the command we are directing it into. The here document is indicated by two less-than characters, <<, so in its basic form it looks like this:

COMMAND <<InputComesFromHERE
text
InputComesFromHERE

For our purposes, we'll be using cat as the COMMAND, so we'll call our here document a cat script, and it will look like this:

/bin/cat <<UntilYouSeeEOF
bash command 1
bash command 2
etc.
UntilYouSeeEOF

If you paste that code block into a bash shell, the output will be:

bash command 1
bash command 2
etc.

The shell fed the whole block of text to cat, and cat spit it back out to the terminal just as we wrote it, with the original white space and line feeds. If you remove the << characters, you'll get several 'command not found' messages from the shell, because without the <<, bash thinks you want to run the lines of text as commands.

Since we want to use our cat script to create a bash script, we need to tell cat to redirect its output to a file, rather than to our terminal. That's easy enough to do with the > character.

/bin/cat > ~/our_script.sh <<EOF
cp /etc/skel/.bashrc /etc/skel/backup_bashrc
cp /root/.bashrc /root/backup_bashrc
cp /root/.dircolors /root/backup_dircolors
EOF

If you copy that into a bash shell, you'll find it creates a new shell script in your home folder called our_script.sh, and the script has three commands in it to back up the .bashrc and .dircolors files.

So you're probably thinking, "Enough already, we already knew what a here document was. What's the pitfall we need to look out for?" Well, in its default form, the shell will substitute anything it thinks is a parameter in our cat script. So if we write:

/bin/cat > ~/our_script.sh <<EOF
echo $HOSTNAME
EOF

Bash will substitute $HOSTNAME with our actual hostname at the moment the cat script runs, which may not be what we intended. This can be a real problem when trying to create users with useradd and specifying the password. Any $, backtick, or backslash in our cat script can trigger shell expansion. This is easy enough to fix though: we just need to single quote our limit string like this:

/bin/cat > ~/our_script.sh <<'EOF'
echo $HOSTNAME
EOF

Now check out the script our cat script created:

cat ~/our_script.sh

The output should be:

echo $HOSTNAME

Bash did not substitute $HOSTNAME in our cat script with our real host name.
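If you ever need both behaviors in one cat script, an unquoted limit string combined with a backslash in front of selected $ characters is one way to get it. The sketch below compares the two forms side by side. The BUILD_TAG variable and the scratch directory are made up for illustration and are not part of our kickstart file.

```shell
#!/bin/bash
# Scratch directory so nothing in the home folder is touched (illustrative only).
WORKDIR=$(mktemp -d)
BUILD_TAG="built-at-install"   # hypothetical value known while the cat script runs

# Unquoted limit string: $BUILD_TAG is expanded now,
# while the escaped \$HOSTNAME is written out as a literal $HOSTNAME.
/bin/cat > "$WORKDIR/mixed.sh" <<EOF
echo "tag: $BUILD_TAG"
echo "host: \$HOSTNAME"
EOF

# Quoted limit string: nothing is expanded, every line is written as typed.
/bin/cat > "$WORKDIR/literal.sh" <<'EOF'
echo "tag: $BUILD_TAG"
echo "host: $HOSTNAME"
EOF

echo "--- mixed.sh ---"
cat "$WORKDIR/mixed.sh"
echo "--- literal.sh ---"
cat "$WORKDIR/literal.sh"
```

In mixed.sh the tag line already contains the expanded value, while the host line still contains $HOSTNAME and will be resolved when the generated script runs. For a script full of passwords and $ characters, the fully quoted form is usually the safer default.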

Start your hacking
So now that we understand how our kickstart file needs to be constructed, and we have our basic strategy planned out, we can start building our scripted installation. If you have a development or test environment, or even a new server you can spend a few days using for kickstart testing, you're set. Even an old server sitting in the recycle pile can be very useful for this purpose, especially if it is from the same vendor as your newer servers.

If you have used UDA before, and have been paying attention, you may be wondering what all the talk about parameter substitution and quoted 'here documents' was for. The first time you attempt to copy one of your kickstart scripts to another template for an additional ESX server you want to bring up, you'll realize one of the big bummers of this setup: we have to change the hostname and every IP address in the script. Ugh!

One of the big draws of EDA is that it will do some of this for you, but if you read through the forum postings, it's pretty clear that it's not 100% yet. So what do we do, just deal with hunting through our kickstart scripts and meticulously changing IP addresses for each ESX host we want a template for? Not a chance! We'll use our natural tendency to name and address our virtual environments in an organized manner to our advantage.

Compulsive labeling
If you've seen a few VMware environments in the wild, you'll know that the ESX servers are almost always named with a series of digits indicating their uniqueness. Whether it's the simple (esx01, esx02, esx03) or something amazingly complex (nyc01-dc03-esx001, nyc01-dc03-esx002), we all appear to be labeling ESX servers in this format. Chalk it up to the fact that an ESX host can be serving many different roles in an enterprise at the same time, so this is really the only way to name them. The IP addressing schemes also tend to follow the naming scheme, so for esx01, the IP address of the service console might be 172.20.1.101, and esx02 would be 172.20.1.102, etc.

esx01
  Service Console:    172.20.1.101
  iSCSI VMkernel:     172.21.1.101
  iSCSI Svc Console:  172.21.1.201
  VMotion VMkernel:   172.22.1.101
esx02
  Service Console:    172.20.1.102
  iSCSI VMkernel:     172.21.1.102
  iSCSI Svc Console:  172.21.1.202
  VMotion VMkernel:   172.22.1.102

So if we're doing this in our environment, and our IP addressing scheme is tied to our host naming scheme in some fashion, couldn't we just grab those digits from the end of our hostname and feed them to our networking configuration for the IP addresses? You bet we can! Our kickstart script is creating a bash script that will run after the initial reboot, and we can do stuff like this easily in bash. So how do we use bash to grab just the two digits in our hostnames?

As with all problems like this, there are many good solutions. We could use sed or awk to grab the digits, but the syntax for capturing backreferences in both of those can look like Klingon. Backreferences? In the simplest terms, a backreference allows us to enclose a section of our regular expression in parentheses and save it for later as a match within our match. So we can create a regular expression to match the FQDN of our host naming scheme, and enclose the digits at the end of our hostname in (..) to grab only those after the match. Since Perl handles backreferences (and regular expressions in general) in a very straightforward manner, we're going to use a perl one-liner from our bash script to grab our host digits. Let's imagine that we have an ESX server named esx42.area51.mil, and we want to pull the 42 (or 43, 44, etc.) out of that FQDN. Our regular expression for matching the whole FQDN format would be:

/esx[0-9]+\.area51\.mil/

The leading / and trailing / serve to enclose our regular expression, and we're telling perl to match something that begins with esx followed by the set of numbers 0 to 9 ([0-9]) one or more times (+), then a period (\.), the word area51, another period (\.), and the word mil.

To have perl store the digits right after esx in a backreference for us, we simply enclose them, including the +, in parentheses:

/esx([0-9]+)\.area51\.mil/

And then reference the backreference using the special variable $1 that perl uses to store it, like this:

perl -e '$_ = "esx42.area51.mil" ; /esx([0-9]+)\.area51\.mil/ ; print $1'

That should output 42 to our terminal. Since we're using perl anyway, we might as well use perl's Sys::Hostname module in our one-liner to grab the hostname of our ESX server. So our complete one-liner to grab just the hostname digits of our ESX servers is:

perl -e 'use Sys::Hostname; hostname() =~ /esx([0-9]+)\.area51\.mil/ ; print $1'

Let's make it a little more portable so we can use it without modification in other scripts. We know our hostnames will always begin with some combination of letters, followed by a two or three digit number, followed by a period, and then some domain name, but possibly no domain name if we somehow forgot to put one in the kickstart.

In the original version of this post, I wasn't concerned about a hostname containing numbers with leading zeroes, like 003, as the VMware utilities will gladly accept an IP address in that format. But keeping the final octet of the IP in that format is causing trouble later in the script, so we'll just use printf instead of plain old print and specify that printf should output a decimal value, which will strip any leading zeroes. So let's change our perl one-liner to:

perl -e 'use Sys::Hostname; hostname() =~ /^[a-zA-Z]+([0-9]+)\.?.*/ ; printf "%d", $1'

That's pretty compact, fairly readable, and it's going to save us a ton of tedium.

"Well, I can see how that would work in your environment, but we're naming our ESX servers after Greek deities, or characters from the Smurfs. How am I going to get an IP address from azrael, hefty, johan and peewit?"

We can easily use a bash case statement to maintain a chart of hostnames to IPs. To grab our hostname this time, we're going to switch things up a little and call the hostname command and then pipe it through the cut utility, using a period as the delimiter, and only print the first field, basically stripping out our domain name. We'll then pipe it to the tr utility and tell it to translate any upper case letters to lower case so we don't have to worry about that in our case statement.

"Why didn't you just call `hostname --short` to grab the bare host name without the domain name?" I've seen the --short option return localhost on several different types of Linux hosts in the past, even though the man page claims it cuts the full hostname at the first dot, so I don't trust it. Our cut method is also safe if hostname returns something without any periods, so even if you forgot to put your domain name in the kickstart file, this still works.
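The cut and tr pipeline is easy to try out on sample names before trusting it in a kickstart. In this small sketch the bare_host function is a hypothetical wrapper that takes the name as an argument instead of calling hostname, and it uses the short -d and -f forms of the cut options.

```shell
#!/bin/bash
# Hypothetical wrapper around the pipeline so it can be fed sample names.
bare_host() {
    # Keep everything before the first period, then lower-case it.
    echo "$1" | cut -d . -f 1 | tr '[:upper:]' '[:lower:]'
}

bare_host "Hefty.smurf.village"   # name with a domain, mixed case
bare_host "AZRAEL"                # no domain and no periods at all
```

Both calls print just the lower-cased host name. When there is no period in the input, cut passes the whole line through unchanged.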

As you can see below, a case statement like this becomes burdensome very quickly, but it may be the only way to go, so here's how it could look:

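# Bare, lower-cased host name with any domain stripped off
BAREHOST=`hostname | cut --delimiter . --field 1 | tr '[:upper:]' '[:lower:]'`

# Map each host name to the number used in its IP addresses
case "$BAREHOST" in
  azrael)
    IPNUM='1'
    ;;
  hefty)
    IPNUM='2'
    ;;
  johan)
    IPNUM='3'
    ;;
  peewit)
    IPNUM='4'
    ;;
  *)
    # Unknown host: fall back to 99
    IPNUM='99'
    ;;
esac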

The *) in our case statement is a catch-all, so in the event we don't match any of the host names, we'll give the host an IP ending in 99 so the installation doesn't just bomb out.

It's hard to see how maintaining a case statement like this would be less work than just editing the kickstart for each host by hand, but at least it is possible.

Where's the beef?
That's a lot to digest, so we'll end Part 1 of our quest for kickstart nirvana here. In Part 2, we'll get into the real meat of automating the install process, including vSwitch creation and IP addressing, iSCSI configuration, SSH customization, creation of a non-root user account, and much more.

2 comments:

  1. Thanks for the tip about single quoting the limit string. Figuring out the var substitution was killing me!!!

  2. I was just thinking could we not use nslookup with the ip and get the host name this way or the other way around too?
