I’m a fan of version control in systems administration. If you don’t have a central VCS for your server configuration files, you can always use RCS. I habitually add #$Id$ at the top of configuration files. RCS expands that tag on checkout, so I can easily see who touched the file last and when.
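For the curious, here is a rough Python sketch of pulling the "who and when" back out of an expanded tag. It assumes the classic RCS layout ($Id: filename revision date time author state $, with an optional locker field). The sample line and the username "admin" are made up for illustration.

```python
import re
from typing import Optional

# Assumed layout of an expanded RCS Id keyword:
#   $Id: filename revision date time author state [locker] $
# The date may use slashes or hyphens, and the time may carry a zone suffix.
RCS_ID = re.compile(
    r"\$Id: (?P<file>\S+) (?P<rev>\d+(?:\.\d+)+) "
    r"(?P<date>\d{4}[/-]\d{2}[/-]\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\S*) "
    r"(?P<author>\S+) (?P<state>\S+)(?: (?P<locker>\S+))? \$"
)


def last_touched(text: str) -> Optional[dict]:
    """Return who last checked the file in and when, or None if the
    Id keyword is missing or still unexpanded ("$Id$")."""
    match = RCS_ID.search(text)
    if match is None:
        return None
    return {
        "file": match["file"],
        "revision": match["rev"],
        "when": f"{match['date']} {match['time']}",
        "author": match["author"],
    }


if __name__ == "__main__":
    # Hypothetical expanded tag, as it might appear after checkout.
    print(last_touched("#$Id: default,v 1.3 2011/04/05 14:22:01 admin Exp $\n"))
```

An unexpanded #$Id$ deliberately returns None, since it carries no author or date yet.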
On an unrelated note, I’m upgrading my virtualization cluster to Ubuntu 10.10. The worker nodes run diskless. Each diskless node boots via PXE and reads its PXELINUX configuration file over TFTP. The key line in mine looked like this:
APPEND root=/dev/nfs initrd=initrd.img-2.6.35-27-server-pxe nfsroot=192.0.2.2:/data1/imagine,noacl ip=dhcp rw
This has worked fine for a year or so now, with me changing the kernel and initrd versions as I upgraded. With the Ubuntu 10.10 update, however, a few machines wouldn’t come back up after a reboot. The rest booted fine.
This is especially annoying because the hardware is in a remote datacenter. Driving out to view the console messages burns an hour and, more annoyingly, requires that I stir my lazy carcass out of my house. I have a serial console on one of the machines, but not on any of the affected ones. Fortunately, I do have remote power, and I can make changes on the diskless filesystem.
Packet sniffing revealed that the machine successfully made a TFTP request, then just… stopped. The exact same configuration and filesystem worked on other machines, however. The one difference: the affected machines all had #$Id$ on the first line of their pxelinux.cfg files, and the machines that booted successfully didn’t.
That shouldn’t matter. Really, it shouldn’t. pxelinux.cfg files accept comments. But I removed the tag, making the first line the LABEL statement, and power cycled the machine. And it came up perfectly.
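If you keep a pile of per-host PXELINUX files, a small Python sketch like this can find any that still start with an Id tag, and strip it if you choose. It only automates the workaround above; it doesn’t explain why PXELINUX objected. The directory path in the comment is hypothetical, and the script as written only reports files rather than rewriting them.

```python
import re
import sys
from pathlib import Path

# Matches a comment line holding an RCS Id keyword, expanded or not,
# e.g. "#$Id$" or "# $Id: default,v 1.3 2011/04/05 14:22:01 admin Exp $".
ID_COMMENT = re.compile(r"^\s*#\s*\$Id(?::[^$]*)?\$\s*$")


def starts_with_id_tag(text: str) -> bool:
    """Return True if the first line of a config file is an RCS Id comment."""
    lines = text.splitlines()
    return bool(lines) and ID_COMMENT.match(lines[0]) is not None


def strip_leading_id_tag(text: str) -> str:
    """Drop an RCS Id comment on the first line; leave everything else alone."""
    if not starts_with_id_tag(text):
        return text
    # keepends=True preserves the file's original line endings.
    return "".join(text.splitlines(keepends=True)[1:])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 check_pxe.py /srv/tftp/pxelinux.cfg
    # Reports offending files; it does not modify them.
    for path in sorted(Path(sys.argv[1]).iterdir()):
        if path.is_file() and starts_with_id_tag(path.read_text(errors="replace")):
            print(f"{path}: RCS Id tag on first line")
```

Reporting rather than rewriting is deliberate: once you know which files are affected, you can move the tag below the LABEL line by hand and check the change in with RCS as usual.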
Apparently this particular release of PXELINUX is incompatible with version control ID tags. Oh joy, oh rapture!