April 14, 2009

The Ultimate Kickstart File - Part 2

In Part 2 of our quest for The Ultimate Kickstart File, we'll get down and dirty with the %post section. We'll create a shell script and set it to execute after reboot, and use the strategy we explored in Part 1 for deriving the IP addresses for the various networking components by grabbing the digits from the ESX hostname. The kickstart we build here could be used to install hundreds of ESX hosts, requiring only one change per host.

Since we'll be changing the IP configuration of the Service Console after the kickstart installation, use DHCP for the boot protocol when setting up a template with UDA.

There's a lot going on here, so we'll break out each section as we go. To see the whole kickstart file come together, just scroll down to the bottom.

In the first part of our %post section, we'll get some of our Linux customizations out of the way. As we saw in Part 1, commands in the %post section will execute before the ESX host reboots for the first time, so there are no VMware services running yet. Our VMware commands will need to wait until we construct the shell script that will run after the first reboot. These first commands will affect the Linux service console only.

As you are working your way through these commands, notice when we are using the > symbol to clobber and create new files, and the >> symbol to append to the end of existing files.
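To make the difference concrete, here is a minimal sketch you can run on any Linux box. The function name demo_redirects and the scratch file are only for illustration and are not part of the kickstart file:

```shell
#!/bin/bash
# Illustration only: shows > (clobber) versus >> (append) on a scratch file.
demo_redirects() {
    local scratch
    scratch=$(mktemp)
    echo "first"  >  "$scratch"   # creates (or truncates) the file
    echo "second" >  "$scratch"   # clobbers: "first" is gone
    echo "third"  >> "$scratch"   # appends after "second"
    cat "$scratch"
    rm -f "$scratch"
}

demo_redirects
```

Only "second" and "third" survive, which is why the config files we want to extend get >> and the files we want to build from scratch get >.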

We'll create our SSH banner and append sshd_config with its location:

/bin/cat > /etc/ssh/banner <<'SSHEOF' 
         ========================================= 
          WARNING: UNAUTHORIZED USE IS PROHIBITED 
         ----------------------------------------- 

     All Virtual Foundry maintained telecommunications, 
     data information systems,  and  related  equipment 
     are  for the communication, transmission, process- 
     ing, and  storage of Virtual  Foundry  information 
     only,  and  should  only be accessed by authorized 
     Virtual  Foundry   employees.  These  systems  and 
     equipment  are subject to authorized monitoring to 
     ensure  proper  functioning,  to  protect  against 
     unauthorized  use,  and to verify the presence and 
     performance of applicable security features.  Such 
     monitoring  may result in the acquisition, record- 
     ing, and analysis of all data being  communicated, 
     transmitted,  processed,  or stored in this system 
     by a user. 
     If monitoring reveals possible evidence of  crimi- 
     nal activity, such evidence may be provided to law 
     enforcement personnel.  Anyone using  this  system 
     expressly consents to such monitoring. 

SSHEOF

/bin/echo "banner /etc/ssh/banner" >> /etc/ssh/sshd_config 

Next we'll back up syslog.conf and append a few rules that send warnings and errors to the various consoles, which can come in handy during those crazy 'the network is down!' moments when you are running to the server racks. You can press Alt-F2, Alt-F3, and Alt-F4 to access the different consoles and see various degrees of message importance:

/bin/cp /etc/syslog.conf /etc/backup_syslog.conf

/bin/cat >> /etc/syslog.conf <<'EOFSYSLOG'
# Send error msgs to tty2-tty4 for troubleshooting
*.crit /dev/tty2
*.err /dev/tty3
*.warning /dev/tty4
EOFSYSLOG

Now we'll customize the root account's .bashrc file and add some custom colors so any VMware specific file extensions will stand out when we list directories. I always add an alias named lah for ls -lah, as I use this command constantly:

/bin/cp /root/.bashrc /root/backup_bashrc 

/usr/bin/dircolors -p > /root/.dircolors 

/bin/cat >> /root/.bashrc <<'BASHRCEOF' 
[ -e "$HOME/.dircolors" ] && DIR_COLORS="$HOME/.dircolors" 
[ -e "$DIR_COLORS" ] || DIR_COLORS="" 
eval "`dircolors -b $DIR_COLORS`" 
alias ls='ls --color=auto' 
alias lah='ls -lah' 
BASHRCEOF

/bin/cat >> /root/.dircolors <<'DIRCOLORSEOF' 
# VMware files 
.vmx 00;36 
.vmdk 01;35 
.vmtx 01;32 
.vmsn 01;31 
DIRCOLORSEOF

I like to have the customizations made to root's bash shell available to the non-root user automatically, so I'll copy root's .bashrc and .dircolors files to the /etc/skel directory, which is like a template directory for new user accounts. There are several different ways you could do this, like using /etc/profile and /etc/bashrc for instance, but I like the flexibility of giving each user a base set of rc files that they are free to customize as they see fit:

mv -f /etc/skel/.bashrc /root/backup_skel_bashrc
cp /root/.bashrc /etc/skel/.bashrc 
cp /root/.dircolors /etc/skel/.dircolors 

Adding the non-root user is simple enough with useradd, but first we need to generate an encrypted password. To do this, just execute /sbin/grub-md5-crypt from the console of one of your ESX servers and enter the password you want to use. grub-md5-crypt will output an encrypted password that we can use with useradd. Make sure you single quote the password; we don't want bash doing any parameter substitution when it encounters the $ characters in it:

useradd -p '$1$IIoWw$UBdW2FnKMwci0OeBXmM.i0' admin
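Here is a small sketch of what goes wrong without the single quotes. It assumes the shell variables $IIoWw and $UBdW2FnKMwci0OeBXmM are unset, as they would be in any normal shell, and show_quoting is just a name made up for the demo:

```shell
#!/bin/bash
# Illustration only: why the password hash must be single quoted.
# Assumes $IIoWw and $UBdW2FnKMwci0OeBXmM are unset.
show_quoting() {
    # Inside this function $1 is the (missing) first argument, so it is empty.
    printf 'single: %s\n' '$1$IIoWw$UBdW2FnKMwci0OeBXmM.i0'
    printf 'double: %s\n' "$1$IIoWw$UBdW2FnKMwci0OeBXmM.i0"
}

show_quoting
```

With double quotes, bash treats each $ as the start of a variable and the hash collapses to just .i0, so the account would end up with a useless password field.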

Now we'll start writing our post-reboot configuration script. When these commands are executed, we'll have running VMware services, so the vmware-vim-cmd and esxcfg-* commands are at our disposal. Also notice that we're creating the post-reboot script in the root directory. Since this script will have encrypted passwords and a good deal of information about the ESX server that was configured by it, we should place it in a secure location rather than on /tmp, which is world writable by default:

/bin/cat > /root/esx_script.sh <<'ESXCFG'
#!/bin/bash

It's always a good practice to define constants for values you know might change at some point in the future, so we'll define our DNS, NTP, and iSCSI server IP addresses. The definitions themselves can be single quoted, but wherever you use the constants later, enclose them in double quotes rather than single quotes if you need to quote them, so bash will still perform parameter substitution. Also, we have to place these within the post-reboot cat script: because we have quoted our cat script limit string, nothing inside it is expanded until the post-reboot script runs.

CDNSSERVER1='172.20.1.240'
CDNSSERVER2='172.20.1.241'
CNTPSERVER1='172.20.1.240'
CISCSITARG1='172.21.1.250'
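If the quoted limit string business seems abstract, this sketch shows both behaviors side by side. The variable SERVER and the limit strings QEOF and UEOF are made up for the demo:

```shell
#!/bin/bash
# Illustration only: quoted versus unquoted limit strings.
SERVER='172.20.1.240'   # sample value borrowed from the constants above

quoted=$(cat <<'QEOF'
nameserver $SERVER
QEOF
)

unquoted=$(cat <<UEOF
nameserver $SERVER
UEOF
)

echo "quoted:   $quoted"     # text is kept literally, $SERVER and all
echo "unquoted: $unquoted"   # $SERVER was expanded when cat ran
```

The quoted form is what we want for esx_script.sh, since the variables in it should be expanded after the reboot, not while the %post section is writing the file.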

We'll have to pause for a bit after the reboot to give the VMware services time to initialize. Four minutes is probably overkill, but it's safe:

sleep 4m
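A reader comment at the end of this post suggests polling for the VMware services instead of sleeping for a fixed time. A rough sketch of that idea with a timeout added might look like the following. wait_for is a hypothetical helper, and the probe command in the usage comment comes from that comment and is untested here:

```shell
#!/bin/bash
# Sketch: poll a command until it succeeds or a timeout expires.
# wait_for is a hypothetical helper, not a VMware or Red Hat tool.
# Usage: wait_for <timeout-seconds> <interval-seconds> <command> [args...]
wait_for() {
    local timeout=$1 interval=$2 waited=0
    shift 2
    until "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1                      # gave up waiting
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Possible use in the post-reboot script:
#   wait_for 600 20 /usr/bin/vmware-vim-cmd /hostsvc/runtimeinfo
```

The timeout keeps a host with a broken management service from hanging at boot forever, which a bare while loop would do.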

We need to add our DNS servers to resolv.conf, as the ESX host is currently picking those up from DHCP:

/bin/echo "nameserver $CDNSSERVER1" > /etc/resolv.conf
/bin/echo "nameserver $CDNSSERVER2" >> /etc/resolv.conf

Let's grab our host IP address from our hostname like we covered in Part 1:

HOSTIP=`perl -e 'use Sys::Hostname; hostname() =~ /^[a-zA-Z]+([0-9]+)\.?.*/ ; printf "%d", $1'`
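If you want to try the pattern against a few sample hostnames without touching an ESX host, here is a rough equivalent using sed and bash arithmetic. It is a stand-in for the perl one-liner, not a replacement for it, and host_index is a made-up name:

```shell
#!/bin/bash
# Sketch: extract the numeric part of a hostname with sed, so the pattern
# can be checked against sample names on any machine.
# host_index is a hypothetical helper name.
host_index() {
    local digits
    digits=$(printf '%s\n' "$1" | sed -n 's/^[a-zA-Z]\{1,\}\([0-9]\{1,\}\).*/\1/p')
    [ -n "$digits" ] || return 1     # name did not match letters+digits
    echo "$((10#$digits))"           # force base 10 so "08" is not read as octal
}

host_index esx02.vmnet.local    # prints 2
host_index esx017               # prints 17
```

A name with no digits after the letters, such as localhost, makes the function return non-zero, which is a handy thing to check for before handing the result to esxcfg-vswif.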

We'll change the Service Console IP address from the DHCP setting we put in the kickstart configuration section to the real IP address we derived from the hostname:

/usr/sbin/esxcfg-vswif --ip 172.20.1.$HOSTIP --netmask 255.255.255.0 vswif0

We're also going to change the name of the Service Console to differentiate it from the iSCSI Service Console we'll need to configure as well:

/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--portgroup-name="ESX_Console" vSwitch0 "Service Console"

Then we'll add a port group named Console_Net that we can use to home VMs in the ESX Console Network:

/usr/sbin/esxcfg-vswitch --add-pg="Console_Net" vSwitch0

Now we'll add another vSwitch, linking vmnic1 to it, and create our primary VM network, Server_Net:

/usr/sbin/esxcfg-vswitch --add vSwitch1
/usr/sbin/esxcfg-vswitch --link=vmnic1 vSwitch1
/usr/sbin/esxcfg-vswitch --add-pg="Server_Net" vSwitch1

Let's create our iSCSI networking at this point. Notice we're using the 172.21.1.1 - 100 range for the vmkernel IP addresses, and the 172.21.1.101 - 200 range for the iSCSI Service Console. In order to get the iSCSI Service Console IP address from the hostname, we tell bash to perform arithmetic expansion with the $(( )) syntax and add 100 to our host IP:

/usr/sbin/esxcfg-vswitch --add vSwitch2
/usr/sbin/esxcfg-vswitch --link=vmnic2 vSwitch2
/usr/sbin/esxcfg-vswitch --add-pg="iSCSI_Console" vSwitch2
/usr/sbin/esxcfg-vswif --add vswif1 --ip 172.21.1.$(($HOSTIP+100)) \
--netmask 255.255.255.0 --portgroup "iSCSI_Console"

/usr/sbin/esxcfg-vswitch --add-pg="iSCSI_VMkernel" vSwitch2
/usr/sbin/esxcfg-vmknic --add --ip 172.21.1.$HOSTIP \
--netmask 255.255.255.0 "iSCSI_VMkernel"

/usr/sbin/esxcfg-route --add default 172.21.1.254
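The address arithmetic only works while the host number stays inside the 1 - 100 range, so a guard like the following could be worth adding. This is a sketch under that assumption, and iscsi_addrs is a hypothetical helper that only prints the addresses:

```shell
#!/bin/bash
# Sketch: derive both iSCSI addresses from the host index and refuse
# indexes that would spill out of the 1-100 / 101-200 ranges.
# iscsi_addrs is a hypothetical helper name.
iscsi_addrs() {
    local idx=$1
    if [ "$idx" -lt 1 ] || [ "$idx" -gt 100 ]; then
        echo "host index $idx is outside 1-100" >&2
        return 1
    fi
    echo "vmkernel=172.21.1.$idx console=172.21.1.$((idx + 100))"
}

iscsi_addrs 2    # prints vmkernel=172.21.1.2 console=172.21.1.102
```

Without a check like this, a host named esx150 would quietly get a Service Console address of 172.21.1.250, which in this example is the iSCSI target itself.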

We've got one more vSwitch to add, this one will handle VMotion traffic:

/usr/sbin/esxcfg-vswitch --add vSwitch3
/usr/sbin/esxcfg-vswitch --link=vmnic3 vSwitch3
/usr/sbin/esxcfg-vswitch --add-pg="VMotion_VMkernel" vSwitch3
/usr/sbin/esxcfg-vmknic --add --ip 172.22.1.$HOSTIP \
--netmask 255.255.255.0 "VMotion_VMkernel"

/usr/bin/vmware-vim-cmd hostsvc/vmotion/vnic_set vmk1

Our security policy dictates that we disable Promiscuous Mode, MAC Address Changes, and Forged Transmits on all vSwitches and port groups, so we'll put those commands here:

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch0

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch0

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch0

/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--securepolicy-promisc=false vSwitch0 ESX_Console

/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--securepolicy-macchange=false vSwitch0 ESX_Console

/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--securepolicy-forgedxmit=false vSwitch0 ESX_Console

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch1

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch1

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch1

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch2

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch2

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch2

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch3

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch3

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch3
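Those twelve vswitch_setpolicy calls differ only in the policy name and the vSwitch, so they could be generated by a pair of loops. The sketch below only prints the commands so you can review them first; policy_commands is a made-up function name, and the flags are exactly the ones used above:

```shell
#!/bin/bash
# Sketch: generate the vswitch_setpolicy command lines with nested loops.
# The function only prints the commands; nothing is executed here.
policy_commands() {
    local vswitch policy
    for vswitch in vSwitch0 vSwitch1 vSwitch2 vSwitch3; do
        for policy in promisc macchange forgedxmit; do
            echo "/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy" \
                 "--securepolicy-$policy=false $vswitch"
        done
    done
}

policy_commands
```

In the post-reboot script you could drop the echo and run the command directly inside the loop. The portgroup_set calls for ESX_Console would still need their own three lines.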

We'll strip the domain from the hostname to get the bare host name, which we'll need for the /etc/hosts entry and again for the iSCSI IQN:

BAREHOST=`hostname | cut --delimiter . --field 1`

Since we started out with a DHCP assigned address, we'll have to manually add an entry to /etc/hosts with the proper IP:

printf "172.20.1.$HOSTIP\t\t`hostname` $BAREHOST\n" >>/etc/hosts

Now we'll allow iSCSI traffic through the firewall, enable the software iSCSI adapter, set an IQN name, and add a send target. We'll even set up our CHAP password:

/usr/bin/vmware-vim-cmd hostsvc/firewall_enable_ruleset swISCSIClient
/usr/bin/vmware-vim-cmd hostsvc/storage/software_iscsi_enabled true
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_set_name \
vmhba32 iqn.1998-01.com.vmware:$BAREHOST

/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_add_send_target \
vmhba32 $CISCSITARG1

# Set CHAP password
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_enable_chap \
vmhba32 iqn.1998-01.com.vmware:$BAREHOST "0123456789abcdef"

EDIT - this section had turned into an essay on random number generation, so I removed the bad options and am just listing the one now ~ RP

Notice that we specified the CHAP password in clear text here. That's not very secure as it will be hanging around on our UDA server. So let's do that a little more securely, by first using a bash one-liner to generate a random password into a text file on the ESX server itself, and then we'll read it from the text file for our iscsi_enable_chap command:

until [ ${#KEY} -eq 16 ]; do D=$(od -A n -N 1 -t u1 </dev/random); \
if ([ $D -ge 48 ] && [ $D -le 57 ]) || \
   ([ $D -ge 65 ] && [ $D -le 90 ]) || \
   ([ $D -ge 97 ] && [ $D -le 122 ]); \
then KEY="$KEY"$(printf \\$(printf '%03o' $D)); fi; done; \
printf $KEY >/root/chap_pwd.txt; unset KEY

We're telling bash to loop until the key is 16 characters long (${#KEY} returns the length of the variable), read one byte from /dev/random with od (octal dump), suppress the offset column (-A n) since we just want the data, and output the byte as an unsigned decimal (-t u1). This gives us an integer from 0 to 255, a range that covers the decimal values of the ASCII characters we want to keep. We run each number through a chain of short-circuit tests to see whether it falls within one of three sets: [0-9] (48-57), [A-Z] (65-90), or [a-z] (97-122). If it does, we append it to the key by using printf to convert the decimal to an octal value, and printf again to print the ASCII character that corresponds to that octal value.
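The two building blocks of that one-liner are easier to see when pulled apart. This is just an illustration of the same range tests and the same double printf trick; the function names is_alnum_code and dec_to_char are made up:

```shell
#!/bin/bash
# Illustration only: the two building blocks of the password loop.
# Both function names are hypothetical.

# Succeeds if the decimal byte value falls in 0-9, A-Z, or a-z.
is_alnum_code() {
    local d=$1
    { [ "$d" -ge 48 ] && [ "$d" -le 57 ]; } ||
    { [ "$d" -ge 65 ] && [ "$d" -le 90 ]; } ||
    { [ "$d" -ge 97 ] && [ "$d" -le 122 ]; }
}

# Decimal -> three-digit octal -> character, using printf twice.
dec_to_char() {
    printf "\\$(printf '%03o' "$1")"
}

dec_to_char 65; echo     # A
dec_to_char 48; echo     # 0
is_alnum_code 64 || echo "64 (@) is rejected"
```

Since only 62 of the 256 possible byte values pass the test, the loop throws away roughly three bytes out of every four it reads.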

Now make sure the chap_pwd.txt file is only readable by root:

chmod 600 /root/chap_pwd.txt
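One optional refinement: between the redirect that creates the file and the chmod, the file briefly carries whatever permissions the default umask allows. A sketch of one way to close that gap is to set the umask in a subshell before writing. write_secret is a hypothetical helper name, and this only helps when the file does not already exist:

```shell
#!/bin/bash
# Sketch: create a secret file that is never readable by other users.
# write_secret is a hypothetical helper name.
write_secret() {
    local path=$1 value=$2
    (
        umask 077                       # new files are created with mode 600
        printf '%s' "$value" > "$path"
    )
}

# Example with a scratch directory instead of /root:
scratch_dir=$(mktemp -d)
write_secret "$scratch_dir/chap_pwd.txt" "0123456789abcdef"
ls -l "$scratch_dir/chap_pwd.txt"
rm -rf "$scratch_dir"
```

The subshell keeps the stricter umask from leaking into the rest of the script.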

And we'll read the password file into the iscsi_enable_chap command:

# Set CHAP password
CHAP_PWD=`cat /root/chap_pwd.txt`; \
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_enable_chap \
vmhba32 iqn.1998-01.com.vmware:$BAREHOST $CHAP_PWD

On to the NTP setup. This was gleaned from viewing the ntp.conf and step-tickers files after setting up NTP through the VI Client:

/usr/sbin/esxcfg-firewall --enableService ntpClient

/bin/mv -f /etc/ntp.conf /etc/backup_ntp.conf

/bin/cat > /etc/ntp.conf <<NTPEOF
restrict kod nomodify notrap noquery nopeer
restrict 127.0.0.1
server 127.127.1.0
server $CNTPSERVER1
driftfile /var/lib/ntp/drift
NTPEOF

/bin/mv -f /etc/ntp/step-tickers /etc/ntp/backup_step-tickers
 
/bin/cat > /etc/ntp/step-tickers <<STEPEOF
server 127.127.1.0
server $CNTPSERVER1
STEPEOF

chkconfig --level 2345 ntpd on
service ntpd restart

Our security policies dictate that we set the non-root user account used for SSH access, admin in this example, to have a minimum password age of 2 days, a maximum of 90 days between password changes, and 14 days notice before the password expires. We'll use the chage command to change the password expiration settings. The esxcfg-auth command will enable the PAM module pam_tally.so, which tracks failed login attempts, and uses by default a failed attempt counter database /var/log/faillog, which we'll create and restrict to root access only. Finally, we'll use the faillog command to set the admin account to 5 login attempts with a 120 minute lockout duration:

chage -m 2 -M 90 -W 14 admin

# enables pam_tally.so for failed attempt lockout
esxcfg-auth --maxfailedlogins=5

# create a failed login log file
touch /var/log/faillog
chown root:root /var/log/faillog
chmod 600 /var/log/faillog

# set max attempts to 5 and duration to 120 minutes for user admin
faillog -u admin -m 5 -l 7200

We'll end the cat script with the closing limit string:

ESXCFG

And add the executable bit to our shell script and remove all access for everyone but root:

chmod 700 /root/esx_script.sh

So we've created a shell script, but what's going to execute it before we log in to the ESX server for the first time? We're going to use the rc.local script to run our shell script after the ESX installation process reboots the host. The rc.local script is the last init script to run before the system presents a login prompt, and it's perfect for launching a one-time configuration script.

First of all, let's make a backup of rc.local:

cp /etc/rc.d/rc.local /etc/rc.d/rc.local.backup

Now we'll use a cat script to append our commands on to the end of rc.local:

cat >> /etc/rc.d/rc.local <<RCCFGEOF
/root/esx_script.sh
# Move rc.local.backup back to rc.local, restoring the original
mv -f /etc/rc.d/rc.local.backup /etc/rc.d/rc.local
RCCFGEOF
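If the self-restoring rc.local trick looks like it shouldn't work, here is the same pattern acted out in a scratch directory rather than /etc/rc.d. Everything in it is a stand-in: the directory, the echo lines, and the demo_run_once name are made up for the illustration:

```shell
#!/bin/bash
# Illustration only: the run-once pattern, acted out in a scratch directory
# instead of /etc/rc.d. All file names here are stand-ins.
demo_run_once() {
    local dir
    dir=$(mktemp -d)
    echo 'echo "normal rc.local work"' > "$dir/rc.local"

    cp "$dir/rc.local" "$dir/rc.local.backup"
    cat >> "$dir/rc.local" <<RCDEMOEOF
echo "one-time configuration"
mv -f $dir/rc.local.backup $dir/rc.local
RCDEMOEOF

    sh "$dir/rc.local"    # "first boot": runs the extra step, then restores
    sh "$dir/rc.local"    # "second boot": only the original contents run
    rm -rf "$dir"
}

demo_run_once
```

The one-time line shows up only in the first run. The mv has to stay the last command in the file, since it swaps the file out from under the shell that is still executing it.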

We're done! We'll paste our worked up %post section into a template through the UDA web console and fire up an unconfigured server.

During the kickstart install, you can see what's going on behind the scenes by pressing Alt-F3, Alt-F4, and Alt-F5 to see the logging from various processes as the system installs. From the Alt-F3 screen, you can see if there are any errors in your script as the %post section is parsed.

Use the source
If you watch the console of the ESX server as it is being deployed automatically, press Alt-F1 when the ESX splash screen comes up and you can see the post-reboot processes execute. You'll notice that some of the commands being executed by the shell script are sending their output to the terminal. It would be nice if we could write our own messages during the post-reboot shell script, not only to see the script's progress as it executes, but also to see if any of the commands fail. It would be even better if we could format the output to match the output of the init scripts, with the fancy green OKs and red FAILs. But we don't have time for that and it would be a pain, wouldn't it?

It actually couldn't be any easier! Take a look at the shell scripts in /etc/init.d, they almost all contain this line near the top:

. /etc/init.d/functions

- or -

. /etc/rc.d/init.d/functions

The period is a bash shortcut for the source builtin command, which runs the script specified under the current process, giving the current shell access to the subroutines and variables in the sourced script. And since /etc/init.d is just a symbolic link to /etc/rc.d/init.d, they are all sourcing the same file.

If you look through /etc/rc.d/init.d/functions, you'll see the functions that the startup scripts are calling, such as success, failure, action, and you can try them out in your shell. Just source functions in your shell by typing, . /etc/rc.d/init.d/functions

Then type success to see the familiar [ OK ], or failure to get a [FAILED].

You can browse through the startup scripts to get all kinds of ideas for how to use these, but we're going to use the action function in our Ultimate Kickstart File. Calling action will print out the descriptive string we want, run the command we specify, and then give us an [ OK ] or [FAILED] depending on the exit status of the command, plus it will even log the results to /var/log/messages for us!
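To get a feel for what action is doing, here is a stripped-down stand-in that works on machines without /etc/rc.d/init.d/functions. my_action is hypothetical and skips the colors, cursor positioning, and logging that the real function provides:

```shell
#!/bin/bash
# Sketch of the idea behind action. my_action is a hypothetical stand-in
# and does not reproduce the real function's colors or logging.
my_action() {
    local message=$1 rc
    shift
    printf '%s' "$message"      # print the description first
    "$@"                        # run the remaining arguments as a command
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "[  OK  ]"
    else
        echo "[FAILED]"
    fi
    return "$rc"
}

my_action "Doing nothing successfully: " true
my_action "Failing on purpose: " false || true
```

The command's exit status is passed back to the caller, so the result can still be tested after the status has been printed.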

We could do one big action, using it to run the entire post-reboot script:

cat >> /etc/rc.d/rc.local <<RCCFGEOF
. /etc/init.d/functions
action "Executing ESX Configuration Script: " /root/esx_script.sh
# Move rc.local.backup back to rc.local, restoring the original
action "Restoring original rc.local: " mv -f /etc/rc.d/rc.local.backup /etc/rc.d/rc.local
RCCFGEOF

But we want to know when the script hits a few key places, so when we're staring at the boot screen, we have something to look at. Plus it makes things much more exciting when showing your boss the kickstart script you've just spent the past two weeks customizing! In the final kickstart script at the bottom, you'll notice a few action calls sprinkled around. Notice that they also provide some handy comments throughout the script, documenting things as we go. One pitfall: don't wrap a command in action if it redirects or pipes its output to a file or another command, as you'll actually be redirecting the output from action itself. Instead, just tell action to run /bin/true, which does nothing more than return successfully, and run the real command on its own line.

Known issues
I've heard of folks having issues with physical NICs getting swapped around during a kickstart deployment, and have seen this issue myself on a HP DL360. This article from Frank Denneman's excellent blog seems to point to a solution, but unfortunately I don't have any HP hardware right now to test with. If you have seen this issue, and have a working solution, posting a comment with the details would be greatly appreciated.

There are also a couple of syntactic gotchas to watch out for. Make sure there is no white space after your limit strings in the cat scripts, you know, the BLAHBLAHEOF things scattered all over this script. Just one space at the end of a limit string will cause everything below it to be included in the cat command. Also, watch for when a backtick (`) is used as opposed to just a single quote (').
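A quick way to hunt for the trailing white space problem is a grep over the kickstart file before you paste it into UDA. This sketch assumes your limit strings are written in capitals, as they are in this article, and check_limit_strings is a made-up helper name:

```shell
#!/bin/bash
# Sketch: list lines that look like a limit string followed by stray
# whitespace. Assumes limit strings are written in capitals, as they are in
# this article; check_limit_strings is a hypothetical helper name.
check_limit_strings() {
    grep -nE '^[A-Z]+[[:space:]]+$' "$1"
}

# Example: check_limit_strings ks.cfg
```

Each hit is printed with its line number, and grep exits non-zero when the file is clean, so this can also be used as a quick pre-flight test.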

And finally, this kickstart script isn't meant to address every configuration item you'll need in your environment, there's a lot of security tweaking you can put in here as well. But hopefully this will give you some ideas to play with when you are getting it dialed in.

Pretty sweet, but...
So that's slick and all, but what if we had 100 ESX hosts to build?! We've gotten each template down to only one change per ESX host, but that's still 100 edits to make by hand! Plus we'll have to add 100 templates to the UDA web console, not cool!

In Part 3, the final part of our kickstart crushing series, we'll look at this exact scenario, and configure 100 UDA templates for 100 unique ESX hosts with just two commands.

Sample script:
Many of the commands listed in this script were generously shared with the community by virtualization and kickstart pioneers. Many thanks and kudos to the blogs and user forum members who have developed and given us this info!


# Regional Settings
keyboard us
lang en_US
langsupport --default en_US
timezone America/Los_Angeles

# Installation settings
skipx
mouse none
firewall --disabled
rootpw --iscrypted  $1$hL0l./$ifxVO8MxcXwYoQ5sfdzQn0
reboot
install
url --url http://172.20.1.245/esx/esx301/

# Driver disks

# Load drivers

# Bootloader options
bootloader --location=mbr --driveorder=sda  

# Authentication
auth --enableshadow --enablemd5

# Partitioning
clearpart --all --drives=sda --initlabel
part /boot --fstype ext3  --size 250  --ondisk=sda --asprimary
part / --fstype ext3  --size 5120  --ondisk=sda --asprimary
part swap   --size 1600  --ondisk=sda --asprimary
part /var/log --fstype ext3  --size 4096  --ondisk=sda 
part /var --fstype ext3  --size 4096  --ondisk=sda 
part /opt --fstype ext3  --size 2048  --ondisk=sda 
part None --fstype vmfs3  --size 1 --grow --ondisk=sda 
part /tmp --fstype ext3  --size 2048  --ondisk=sda 
part /home --fstype ext3  --size 2048  --ondisk=sda 
part None --fstype vmkcore  --size 100  --ondisk=sda 


#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
#+ Initial Network Configuration
#+ Change --hostname
#+
network --device eth0 --bootproto dhcp --hostname esx02.vmnet.local --addvmportgroup=0 
#+
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

vmaccepteula

%packages
@base
@ everything

%post

/bin/cat > /etc/ssh/banner <<'SSHEOF'
         ========================================= 
          WARNING: UNAUTHORIZED USE IS PROHIBITED 
         ----------------------------------------- 

     All Virtual Foundry maintained telecommunications, 
     data information systems,  and  related  equipment 
     are  for the communication, transmission, process- 
     ing, and  storage of Virtual  Foundry  information 
     only,  and  should  only be accessed by authorized 
     Virtual  Foundry   employees.  These  systems  and 
     equipment  are subject to authorized monitoring to 
     ensure  proper  functioning,  to  protect  against 
     unauthorized  use,  and to verify the presence and 
     performance of applicable security features.  Such 
     monitoring  may result in the acquisition, record- 
     ing, and analysis of all data being  communicated, 
     transmitted,  processed,  or stored in this system 
     by a user. 
     If monitoring reveals possible evidence of  crimi- 
     nal activity, such evidence may be provided to law 
     enforcement personnel.  Anyone using  this  system 
     expressly consents to such monitoring. 

SSHEOF

/bin/echo "banner /etc/ssh/banner" >> /etc/ssh/sshd_config

/bin/cp /etc/syslog.conf /etc/backup_syslog.conf

/bin/cat >> /etc/syslog.conf <<'EOFSYSLOG'
# Send error msgs to tty2-tty4 for troubleshooting
*.crit /dev/tty2
*.err /dev/tty3
*.warning /dev/tty4
EOFSYSLOG

/bin/cp /root/.bashrc /root/backup_bashrc

/usr/bin/dircolors -p > /root/.dircolors

/bin/cat >> /root/.bashrc <<'BASHRCEOF'
[ -e "$HOME/.dircolors" ] && DIR_COLORS="$HOME/.dircolors"
[ -e "$DIR_COLORS" ] || DIR_COLORS=""
eval "`dircolors -b $DIR_COLORS`"
alias ls='ls --color=auto'
alias lah='ls -lah'
BASHRCEOF

/bin/cat >> /root/.dircolors <<'DIRCOLORSEOF'
# VMware files
.vmx 00;36
.vmdk 01;35
.vmtx 01;32
.vmsn 01;31
DIRCOLORSEOF

mv -f /etc/skel/.bashrc /root/backup_skel_bashrc
cp /root/.bashrc /etc/skel/.bashrc
cp /root/.dircolors /etc/skel/.dircolors

useradd -p '$1$IIoWw$UBdW2FnKMwci0OeBXmM.i0' admin

#########################################################################################
#
# The section below is the post-reboot script, VMware commands should be placed here
#
/bin/cat > /root/esx_script.sh <<'ESXCFG'
#!/bin/bash

. /etc/rc.d/init.d/functions

# Define constants for various servers here
CDNSSERVER1='172.20.1.240'
CDNSSERVER2='172.20.1.241'
CNTPSERVER1='172.20.1.240'
CISCSITARG1='172.21.1.250'

action "   Sleeping four minutes: " sleep 4m

BAREHOST=`hostname | cut --delimiter . --field 1`

action "   Using hostname $BAREHOST: " /bin/true

HOSTIP=`/usr/bin/perl -e \
'use Sys::Hostname; hostname() =~ /^[a-zA-Z]+([0-9]+)\.?.*/ ; printf "%d", $1'`

action "   Using host IP index of $HOSTIP: " /bin/true

action "   Setting Service Console IP address: " \
/usr/sbin/esxcfg-vswif --ip 172.20.1.$HOSTIP --netmask 255.255.255.0 vswif0

action "   Adding DNS server info to resolv.conf: " /bin/true

/bin/echo "nameserver $CDNSSERVER1" > /etc/resolv.conf
/bin/echo "nameserver $CDNSSERVER2" >> /etc/resolv.conf

action "   Setting up virtual networking: " /bin/true

/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--portgroup-name="ESX_Console" vSwitch0 "Service Console"

/usr/sbin/esxcfg-vswitch --add-pg="Console_Net" vSwitch0

/usr/sbin/esxcfg-vswitch --add vSwitch1
/usr/sbin/esxcfg-vswitch --link=vmnic1 vSwitch1
/usr/sbin/esxcfg-vswitch --add-pg="Server_Net" vSwitch1

/usr/sbin/esxcfg-vswitch --add vSwitch2
/usr/sbin/esxcfg-vswitch --link=vmnic2 vSwitch2
/usr/sbin/esxcfg-vswitch --add-pg="iSCSI_Console" vSwitch2
/usr/sbin/esxcfg-vswif --add vswif1 --ip 172.21.1.$(($HOSTIP+100)) \
--netmask 255.255.255.0 --portgroup "iSCSI_Console" >/dev/null 2>&1

/usr/sbin/esxcfg-vswitch --add-pg="iSCSI_VMkernel" vSwitch2
/usr/sbin/esxcfg-vmknic --add --ip 172.21.1.$HOSTIP \
--netmask 255.255.255.0 "iSCSI_VMkernel"

/usr/sbin/esxcfg-route --add default 172.21.1.254 >/dev/null 2>&1

/usr/sbin/esxcfg-vswitch --add vSwitch3
/usr/sbin/esxcfg-vswitch --link=vmnic3 vSwitch3
/usr/sbin/esxcfg-vswitch --add-pg="VMotion_VMkernel" vSwitch3
/usr/sbin/esxcfg-vmknic --add --ip 172.22.1.$HOSTIP \
--netmask 255.255.255.0 "VMotion_VMkernel"

/usr/bin/vmware-vim-cmd hostsvc/vmotion/vnic_set vmk1

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch0
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch0
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch0

/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--securepolicy-macchange=false vSwitch0 ESX_Console
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--securepolicy-promisc=false vSwitch0 ESX_Console
/usr/bin/vmware-vim-cmd hostsvc/net/portgroup_set \
--securepolicy-forgedxmit=false vSwitch0 ESX_Console

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch1
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch1
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch1

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch2
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch2
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch2

/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-promisc=false vSwitch3
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-macchange=false vSwitch3
/usr/bin/vmware-vim-cmd hostsvc/net/vswitch_setpolicy \
--securepolicy-forgedxmit=false vSwitch3

action "   Adding entry to /etc/hosts: " /bin/true
printf "172.20.1.$HOSTIP\t\t`hostname` $BAREHOST\n" >>/etc/hosts

action "   Opening iSCSI firewall ports: " \
/usr/bin/vmware-vim-cmd hostsvc/firewall_enable_ruleset swISCSIClient

action "   Enabling software iSCSI: " \
/usr/bin/vmware-vim-cmd hostsvc/storage/software_iscsi_enabled true

action "   Setting iSCSI IQN name: " \
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_set_name \
vmhba32 iqn.1998-01.com.vmware:$BAREHOST

action "   Adding iSCSI send target: " \
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_add_send_target \
vmhba32 $CISCSITARG1

action "   Generating random iSCSI CHAP password: " /bin/true

until [ ${#KEY} -eq 16 ]; do D=$(od -A n -N 1 -t u1 </dev/random); \
if ([ $D -ge 48 ] && [ $D -le 57 ]) || \
   ([ $D -ge 65 ] && [ $D -le 90 ]) || \
   ([ $D -ge 97 ] && [ $D -le 122 ]); \
then KEY="$KEY"$(printf \\$(printf '%03o' $D)); fi; done; \
printf $KEY >/root/chap_pwd.txt; unset KEY

chmod 600 /root/chap_pwd.txt

CHAP_PWD=`cat /root/chap_pwd.txt`

action "   Setting iSCSI CHAP password: " \
/usr/bin/vmware-vim-cmd hostsvc/storage/iscsi_enable_chap \
vmhba32 iqn.1998-01.com.vmware:$BAREHOST $CHAP_PWD

unset CHAP_PWD

action "   Setting up NTP: " /bin/true

/usr/sbin/esxcfg-firewall --enableService ntpClient
/bin/mv -f /etc/ntp.conf /etc/backup_ntp.conf
/bin/cat > /etc/ntp.conf <<NTPEOF
restrict kod nomodify notrap noquery nopeer
restrict 127.0.0.1
server 127.127.1.0
server $CNTPSERVER1
driftfile /var/lib/ntp/drift
NTPEOF

/bin/mv -f /etc/ntp/step-tickers /etc/ntp/backup_step-tickers

/bin/cat > /etc/ntp/step-tickers <<STEPEOF
server 127.127.1.0
server $CNTPSERVER1
STEPEOF

chkconfig --level 2345 ntpd on
service ntpd restart >/dev/null 2>&1

action "   Configuring user account limits: " /bin/true

chage -m 2 -M 90 -W 14 admin

# enables pam_tally.so for failed attempt lockout
esxcfg-auth --maxfailedlogins=5

# create a failed login log file
touch /var/log/faillog
chown root:root /var/log/faillog
chmod 600 /var/log/faillog

# set max attempts to 5 and duration to 120 minutes for user admin
faillog -u admin -m 5 -l 7200

ESXCFG
#
# limit string ESXCFG marks the end of the post-reboot script
#########################################################################################

chmod 700 /root/esx_script.sh

cp /etc/rc.d/rc.local /etc/rc.d/rc.local.backup

cat >> /etc/rc.d/rc.local <<RCCFGEOF
. /etc/init.d/functions
action "Running ESX Configuration Script: "  /bin/true
# Execute post-reboot config script
/root/esx_script.sh
# Move rc.local.backup back to rc.local, restoring the original
action "   Restoring original rc.local: " mv -f /etc/rc.d/rc.local.backup /etc/rc.d/rc.local
RCCFGEOF

3 comments:

  1. Robert some possible additions / tricks:

    *Point towards the license server (all one line)
    vmlicense --mode=server --server=27000@server.corp.com --edition=esxFull --features=backup,vsmp

    *The 4 minute sleep could be replaced by:
    while ! vmware-vim-cmd /hostsvc/runtimeinfo; do
    sleep 20
    done

    *Increase the Service Console memory to 800MB, remember SWAP should be twice this.
    mv -f /etc/vmware/esx.conf /etc/vmware/esx.conf.orig
    sed -e 's/boot\/memSize = \"272\"/boot\/memSize = \"800\"/g' /etc/vmware/esx.conf.orig > /etc/vmware/esx.conf
    mv -f /boot/grub/grub.conf /tmp/grub.conf.bak
    sed -e 's/uppermem 277504/uppermem 818176/g' -e 's/mem=272M/mem=800M/g' /tmp/grub.conf.bak > /boot/grub/grub.conf
    rm -f /tmp/grub.conf.bak

    *And there are more. Everything can and should be done with the kickstart script. That way you know all 3 or 100 of your Hosts are configured the same.

    I noticed you hardcoded your SW iSCSI to vmhba32. I have some servers that use vmhba33. Try this (though it might be esx 4...):
    esxcfg-scsidevs -a |grep "Software iSCSI" |awk '{print $1}'
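
The Service Console memory tip earlier in this comment hard-codes the current value of 272 in its sed patterns. A slightly more forgiving variant is sketched below. The helper names uppermem_for and set_memsize are hypothetical, and the uppermem formula, (MB - 1) * 1024, is only inferred from the two pairs of numbers in that tip (272 and 277504, 800 and 818176), not taken from VMware documentation. It also assumes the installed sed supports -i with a backup suffix.

```shell
#!/bin/bash
# Hypothetical helper: uppermem (KB) for a Service Console size in MB,
# inferred from the pairs 272 -> 277504 and 800 -> 818176.
uppermem_for() {
    echo $(( ($1 - 1) * 1024 ))
}

# Hypothetical helper: set memSize without assuming the old value is 272.
# usage: set_memsize <MB> <path to esx.conf>
set_memsize() {
    sed -i.orig -e "s|\(boot/memSize = \)\"[0-9]*\"|\1\"$1\"|" "$2"
}

# Rehearse on a scratch file, never on the live esx.conf
scratch=$(mktemp)
echo '/boot/memSize = "272"' > "$scratch"
set_memsize 800 "$scratch"

cat "$scratch"
uppermem_for 800
```

The original file is kept next to the edited one with an .orig suffix, which mirrors the esx.conf.orig copy in the tip.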

  2. David, thanks for the excellent tips! I completely forgot about the license server, and the while loop is far superior to my sleep for 4 minutes method. I'm going to completely rebuild my lab environment in a couple of days, so I'll get to test these out.

    You mentioned you might have more? If so, it would be great to have them here.
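
A bounded version of the polling loop from the first comment might look like the sketch below, so that a host whose management service never answers cannot hang rc.local forever. The wait_for helper and the retry numbers are assumptions for illustration; true and false stand in for vmware-vim-cmd so the sketch runs anywhere.

```shell
#!/bin/bash
# wait_for is a hypothetical helper: retry a command until it succeeds
# or the attempt budget runs out.
# usage: wait_for <tries> <delay seconds> <command...>
wait_for() {
    local tries=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= tries; i++)); do
        "$@" >/dev/null 2>&1 && return 0
        # no point sleeping after the final failed attempt
        [ "$i" -lt "$tries" ] && sleep "$delay"
    done
    return 1
}

# On the host this would be something like:
#   wait_for 30 20 vmware-vim-cmd /hostsvc/runtimeinfo || echo "hostd never answered"
# Stand-in commands for a dry run:
wait_for 3 0 true  && echo "ready"
wait_for 2 0 false || echo "gave up"
```

Thirty tries at 20 seconds each gives a ten minute ceiling, which is an arbitrary choice; tune it to how long your hosts take to come up.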

  3. You're welcome, Rob. I don't think any of these are my original work; they have been accumulated over time from the smarts of others.

    *For lab environments we are less concerned with security. The service restart is redundant, as the Host will be rebooted later.
    # DANGEROUS Allow ROOT access using SSH
    sed -e 's/PermitRootLogin no/PermitRootLogin yes/' /etc/ssh/sshd_config > /etc/ssh/sshd_config.new
    mv -f /etc/ssh/sshd_config.new /etc/ssh/sshd_config
    service sshd restart

    *Two 3rd party packages I've installed are the QLogic iscli tool and Dell OpenManage. I store these on an NFS mount.
    #First NFS access
    echo "Opening firewall port for NFS client"
    esxcfg-firewall -e nfsClient
    echo "Starting portmap service"
    service portmap start
    echo "Mounting NFS mount point"
    mkdir /tmp/nfs
    mount -t nfs server:/directory /tmp/nfs
    #
    echo "Installing Dell OpenManage"
    /bin/bash /tmp/nfs/vmware/OMSA_5.4/linux/supportscripts/srvadmin-install.sh -b -w -r -s
    echo "Opening firewall port for OMSA agent"
    esxcfg-firewall -o 1311,tcp,in,OpenManageRequest
    echo "Opening firewall port for SNMP traffic"
    esxcfg-firewall -e snmpd
    #
    echo "Installing iSCLI"
    rpm -i /tmp/nfs/vmware/iscli/iscli-1.1.00-13_linux_i386.rpm
    #This is a good place to use iscli to tweak your QLogic HBAs
    /usr/local/bin/iscli 0 -n ExeThrottle 128 Large_Frames on KeepAliveTO 60 IP_ARP_Redirect on
    #Clean up, remove NFS mount point
    echo "Unmounting NFS mount point"
    /bin/umount /tmp/nfs
    echo "Stopping portmap service"
    service portmap stop
    echo "Closing firewall port for NFS client"
    esxcfg-firewall -d nfsClient

    * Using vRanger?
    echo "Enabling SSH client for vRanger"
    esxcfg-firewall -e sshClient

    *Redundancy is important, but there are only so many NICs you can put in a 2U server, so the Service Console and VMotion share a pair of NICs. In normal usage, however, I don't want them to use the same NIC.
    Note: The unlinking of the NIC at the beginning was to work around some bug; it may no longer be needed.
    echo "Configuring vSwitch0 for two NICs supporting Service Console & VMotion in an active/standby & standby/active"
    echo "Adding second NIC to vSwitch0"
    esxcfg-vswitch -U vmnic0 vSwitch0
    vmware-vim-cmd internalsvc/refresh_network
    esxcfg-vswitch -L vmnic0 vSwitch0
    esxcfg-vswitch -L vmnic2 vSwitch0

    echo "Creating VMotion portgroup"
    esxcfg-vswitch --add-pg VMotion vSwitch0
    esxcfg-vswitch --vlan 47 --pg VMotion vSwitch0
    echo "Configuring VMotion IP setting"
    esxcfg-vmknic --add VMotion --ip 10.14.17.1XXXX --netmask 255.255.255.0

    echo "Enabling VMotion"
    vmware-vim-cmd hostsvc/vmotion/vnic_set vmk0
    vmware-vim-cmd internalsvc/refresh_network

    echo "Setting Service Console portgroup to active/standby"
    vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active vmnic0 vSwitch0 "Service Console"
    vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-standby vmnic2 vSwitch0 "Service Console"

    echo "Setting VMotion portgroup standby/active"
    vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active vmnic2 vSwitch0 VMotion
    vmware-vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-standby vmnic0 vSwitch0 VMotion
    vmware-vim-cmd internalsvc/refresh_network

    *If you are using Cisco switches, your network admins might want Cisco Discovery Protocol turned on:
    esxcfg-vswitch --set-cdp both vSwitch0

    * Configure HW iSCSI:
    echo "Configuring iSCSI settings in HBA 1"
    vmkiscsi-tool vmhba1 -i -a 10.14.17.1XXXX
    vmkiscsi-tool vmhba1 -s -a 255.255.255.0
    vmkiscsi-tool vmhba1 -g -a 10.14.17.1
    vmkiscsi-tool vmhba1 -k -a esxXXXX-vmhba1
    vmkiscsi-tool vmhba1 -X
    #Point it at some storage!
    vmkiscsi-tool vmhba1 -D -a 10.14.17.31:3260

    I think that is it :)
