Creating OS images or templates for image-based provisioning systems, such as those embedded in most virtualization platforms, involves one critical step: cleaning up residual instance-specific data from the base, or golden, image. In Windows administration terminology this step is called ‘sysprep’. Skipping it can lead to various problems, such as provisioned hosts failing to boot or failing to gain network connectivity automatically, or subtle identification issues when integrating the provisioned host into network-wide distributed systems.
Recent changes in the low-level plumbing of Linux systems, mostly due to the switch from System V-based to systemd-based system and service management, necessitate some updates to the procedures used to perform golden image cleanup. This post documents the steps needed to clean up a RHEL7 golden image. While I haven’t tested them directly on such systems, similar steps should apply to Fedora and CentOS 7 systems as well.
Erasing network configuration data
Failing to perform this step may result in issues ranging from provisioned hosts failing to automatically gain network connectivity to duplicate IPs on the network created during the provisioning process. To perform this step one must delete all the files that match the ‘ifcfg-*’ pattern that reside under the following path, except the ‘
Some provisioning systems create the network configuration during the provisioning process, but I have found this to be somewhat fragile. You can increase the chances of a provisioned host successfully gaining network connectivity, especially in DHCP-managed networks, by creating a generic configuration for the first network interface. You do this by creating the ‘ifcfg-eth0’ file with the following configuration:
DEVICE=eth0
ONBOOT=yes
TYPE=Ethernet
BOOTPROTO=dhcp
Note that line 4 above (‘BOOTPROTO=dhcp’) was not needed in RHEL6, as DHCP was used by default, but it seems to be needed in RHEL7. Apart from the interface configuration files, all host-specific configuration should be removed from the ‘/etc/sysconfig/network’ file. I find it easiest to simply empty the file, but others might prefer to leave some generic configuration, such as disabling the IPv6 stack or setting up a DNS domain search suffix. One could also simply erase the ‘/etc/sysconfig/network’ file, but I found that this causes needless error messages during the boot process.
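The whole network cleanup can be sketched as a single shell function. This is only an illustration under stated assumptions: the interface files are assumed to live under ‘/etc/sysconfig/network-scripts’, the file to keep is assumed to be the loopback one (‘ifcfg-lo’), and the optional root-prefix argument is a hypothetical convenience that lets you try the function on a scratch directory first.

```shell
#!/bin/sh
# Sketch only: clean_network_config and its root-prefix argument are
# hypothetical names, not part of any RHEL tool.
clean_network_config() {
    root="${1:-}"
    # Assumed location of the interface configuration files.
    dir="$root/etc/sysconfig/network-scripts"

    # Delete every ifcfg-* file except the loopback one
    # (keeping ifcfg-lo is an assumption).
    find "$dir" -maxdepth 1 -type f -name 'ifcfg-*' ! -name 'ifcfg-lo' -delete

    # Write the generic DHCP configuration for the first interface.
    cat > "$dir/ifcfg-eth0" <<'EOF'
DEVICE=eth0
ONBOOT=yes
TYPE=Ethernet
BOOTPROTO=dhcp
EOF

    # Empty, rather than delete, the global network file.
    : > "$root/etc/sysconfig/network"
}
```

Calling ‘clean_network_config’ with no argument acts on the live system; passing a directory makes it act on a copy of the tree under that directory instead.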
Erasing network interface hardware association
This step is needed so that the network interface created for the provisioned hosts ends up being called ‘eth0’ by the OS, so that it picks up the basic configuration created in the previous step. To perform this step one must delete the following file if it exists:
Sysadmins should be familiar with this step, as a similar one was also needed in older RHEL versions. In my testing I have found that this file is not always created on RHEL7, but this might be a function of the virtualization environment I was testing with (RHEV/oVirt).
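As a rough sketch of this step, the function below removes the udev rule file that older RHEL releases used to pin interface names to MAC addresses. The path ‘/etc/udev/rules.d/70-persistent-net.rules’ is an assumption about which file is meant, and the root-prefix argument is hypothetical.

```shell
#!/bin/sh
# Sketch only: the function name and root-prefix argument are hypothetical.
remove_persistent_net_rules() {
    root="${1:-}"
    # Assumed path of the persistent interface-naming rules.
    # rm -f succeeds quietly when the file does not exist.
    rm -f "$root/etc/udev/rules.d/70-persistent-net.rules"
}
```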
This step is needed because ‘systemd’ works differently than previous systems. One needs to erase the following file:
Failing to delete this file will result in the provisioned hosts retaining the hostname of the golden image host.
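A minimal sketch of this step, assuming the file in question is ‘/etc/hostname’ (the file systemd reads the static hostname from). The function name and the root-prefix argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the static hostname lives in /etc/hostname.
remove_hostname_file() {
    root="${1:-}"
    f="$root/etc/hostname"
    if [ -f "$f" ]; then
        # Show which name is being dropped before removing the file.
        echo "removing static hostname: $(cat "$f")"
        rm -f "$f"
    fi
}
```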
‘/etc/machine-id’ is an important file for ‘systemd’. Removing this file will not only cause a host to fail to boot, but will also cause systemd to behave erratically, yielding many errors and taking very long to reach even single-user mode or the rescue console. The code in this file can be used by various systems to uniquely identify a host on the network; therefore, steps must be taken to make sure a new code is generated for provisioned hosts.
The ‘systemd’ package provides the ‘systemd-machine-id-setup’ tool that can be used by provisioning systems to generate a new ‘machine-id’ file. However, not all image-based provisioning systems provide a means to run such a tool during the provisioning process. One way to clear out the ‘machine-id’ code from the golden image host is to empty out the ‘/etc/machine-id’ file with the following command:
Running the above command will cause ‘systemd’ to create a temporary machine-id code every time the provisioned host boots up. It may be desirable to retain the machine-id generated during the first boot-up of a provisioned host over the lifetime of that host. For that purpose some sort of first-boot script needs to be written. The ‘systemd’ developers have already implemented such a script, but it is not yet included in RHEL7 or even Fedora 20.
In some cases the environment running the provisioned hosts may take care of this ‘machine-id’ issue for you. The ‘systemd-machine-id-setup’ script, used by ‘systemd’ to generate the machine-id, supports using a UUID that can be passed into the virtual machine in KVM-based environments. I can confirm this indeed happens in ‘oVirt’, so that each VM gets its own unique and persistent ID when the ‘/etc/machine-id’ file is empty. The same script also supports container environments as detailed in this document.
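One way to empty the file without deleting it is a plain shell redirection. The function below is a sketch of that idea, not necessarily the exact command intended above; the function name and the root-prefix argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: truncates the machine-id file in place.
reset_machine_id() {
    root="${1:-}"
    # ':' produces no output, so the redirection leaves an empty file.
    # The file itself must stay, since removing it upsets systemd.
    : > "$root/etc/machine-id"
}
```

Truncating rather than deleting matters here, because the text above describes an empty file as the state in which a new ID gets generated.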
Erasing SSH host keys
The SSH service is configured by default in such a way that when the SSH host keys, used to uniquely identify a host to the SSH client, are erased, a new set is generated during service start-up. Failing to erase the SSH host keys will result in all provisioned hosts using the same set of keys, which is harmful to SSH protocol security. To erase the SSH host keys one must simply erase all files matching the following pattern:
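A minimal sketch of this step, assuming the keys follow the usual OpenSSH naming of ‘/etc/ssh/ssh_host_*’ (that pattern is an assumption, as are the function name and the root-prefix argument):

```shell
#!/bin/sh
# Sketch only: assumes OpenSSH host keys are named /etc/ssh/ssh_host_*.
remove_ssh_host_keys() {
    root="${1:-}"
    # Matches both private keys and their .pub counterparts;
    # rm -f stays quiet if nothing matches.
    rm -f "$root"/etc/ssh/ssh_host_*
}
```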
Erasing RHN system ID
The RHN system ID is used to uniquely identify a host in the RHN or when managing hosts with tools such as Spacewalk or Satellite. Failing to erase this system ID will cause all provisioned hosts to be recognized as the same host by the management system, rendering it useless. To erase the RHN system ID and cause it to be re-generated one must simply erase the following file:
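As a sketch, assuming the system ID is stored in ‘/etc/sysconfig/rhn/systemid’ (the path is an assumption, and the function name and root-prefix argument are hypothetical):

```shell
#!/bin/sh
# Sketch only: assumes the RHN system ID file is /etc/sysconfig/rhn/systemid.
remove_rhn_system_id() {
    root="${1:-}"
    rm -f "$root/etc/sysconfig/rhn/systemid"
}
```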
Regenerating ‘initrd’ image
In RHEL7, ‘systemd’ is started from the ‘initrd’ image even before the root file system has been mounted. Therefore, many of the basic system configuration files, such as ‘/etc/hostname’, are actually read from the ‘initrd’ image and not from the root file system. In order to properly clean up those files, a generic ‘initrd’ file needs to be generated with the following command:
dracut --no-hostonly --force
Note: The installation process sets up a host-specific ‘initrd’ file by default. This means that hosts that boot with a generic ‘initrd’ will work differently in subtle ways and may boot more slowly. In particular, removing the ‘/etc/hostname’ file from the ‘initrd’ image will cause systemd to initialize the hostname to ‘localhost’. That value is retained as the hostname until NetworkManager starts up and sets the hostname using a reverse DNS lookup (by default). This can be seen in the system logs, for example. This behavior might influence various services that start up before the network is set up. It is advisable to use some kind of first-boot script to create the host-specific configuration files and the host-specific ‘initrd’ image once the provisioned host has booted and detected its final configuration.
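The first-boot idea could look roughly like the sketch below, which rebuilds a host-specific ‘initrd’ exactly once. The marker file path, the ‘MARKER’ and ‘DRACUT’ variables and the function name are all hypothetical, and how the function gets invoked at boot (for example from a one-shot service) is left open.

```shell
#!/bin/sh
# Sketch only: MARKER and DRACUT are illustrative names, not part of dracut.
MARKER="${MARKER:-/var/lib/firstboot-initrd.done}"
DRACUT="${DRACUT:-dracut}"

regen_initrd_once() {
    # A previous boot already rebuilt the image: nothing to do.
    if [ -e "$MARKER" ]; then
        return 0
    fi
    # --hostonly builds an image tailored to this host;
    # --force overwrites the existing generic image.
    "$DRACUT" --hostonly --force || return 1
    # Record success so later boots skip the rebuild.
    : > "$MARKER"
}
```

The marker is written only after ‘dracut’ succeeds, so a failed rebuild is retried on the next boot.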
Cleaning up log files
While not essential, it is better to have the provisioned hosts free of log messages created when the golden image host was installed. One needs to erase all the log files under ‘/var/log’ and other log directories (if any).
Once the files are erased, some processes may continue writing to the orphaned inodes of those files. The files will, however, be removed once those processes have been shut down along with the system. When a provisioned host boots up, a new, empty set of files will be created for it.
The above is needed only if your system is running ‘rsyslog’ (which is set up by default on RHEL7), an equivalent service, or another service that writes its logs directly to ‘/var/log’. Unless the ‘/var/log/journal’ directory is manually created, the ‘journald’ service saves its journal file to ‘/run/log/journal’, which is erased automatically on boot.
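A sketch of the log cleanup for ‘/var/log’. Deleting only regular files and keeping the directory tree is a design choice made here, not something the steps above prescribe; the function name and root-prefix argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: removes log files but keeps the directories, so services
# that expect e.g. /var/log/audit to exist still find it.
clean_logs() {
    root="${1:-}"
    find "$root/var/log" -type f -delete
}
```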
Triggering manual first-boot setup (anti-step)
I believe this step should not be done in practice, since it creates a situation in which a sysadmin must manually configure various details for each and every provisioned host (in my view, this defeats the purpose of having images in the first place). However, since I’ve seen this step mentioned in various guides on creating system images, I’ve seen fit to mention it here, if only to discourage others from performing it. This step essentially consists of creating the ‘/.unconfigured’ file or running the ‘sys-unconfig’ script. As stated, this causes the provisioned host to interactively ask for various configuration details such as the keyboard layout, the root password and the timezone when it boots.
Automating the process
Here is a GitHub repository where you can find a small script I wrote to perform all the steps mentioned above. This script is written in such a way that, when run without parameters, it only displays what is about to be done, without actually committing any changes. To make the script actually clean up the host-specific data, one needs to run it with the ‘-f’ command-line argument.
I simply copy this script to ‘/usr/local/sbin’ on the golden host. That way I can run it on that host when it is ready to be turned into an image. I can also run it on a provisioned host when I decide that it has reached a point where an image should be made from it.
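The dry-run-by-default behaviour described above can be illustrated with a small wrapper. This is not the script from the repository, only a sketch of the pattern; ‘FORCE’, ‘run’ and ‘parse_args’ are hypothetical names.

```shell
#!/bin/sh
# Sketch only: a dry-run wrapper, not the actual cleanup script.
FORCE=no

# Set FORCE=yes when -f appears among the arguments.
parse_args() {
    for arg in "$@"; do
        case "$arg" in
            -f) FORCE=yes ;;
        esac
    done
}

# Execute the given command only in force mode; otherwise just print it.
run() {
    if [ "$FORCE" = yes ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}
```

A cleanup script built this way would call ‘parse_args "$@"’ once and then prefix every destructive command with ‘run’, so that nothing changes unless ‘-f’ was given.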