The folks at undeadly.org have started posting “how I discovered OpenBSD” stories. This isn’t a story of how I discovered OpenBSD, but rather why I like it. Before you ask, I don’t have similar stories about any other operating system, not even any other BSDs. I was guided to FreeBSD in 1995, and I discovered NetBSD on my own shortly after. (An earlier version of this was previously published in a small promo pamphlet handed out at a tech conference years ago.)
Back around 2000, my employer’s main business was designing Web applications, but once those applications were built our clients would turn around and ask “Where should we host this?” That’s where I came in, building and running a small but professional-grade data center for custom applications.
As with any new business, our hosting operation had to make the most of existing resources. Hardware was strictly limited to cast-off hardware from the web developers, and software had to be free. The only major expense was a big-name commercial firewall, purchased for marketing reasons rather than technical ones. With a whole mess of open-source software, we built a reliable network management system that provided the clients with a more insight into their equipment than their in-house people could offer. The clients paid for their own hardware, and so had fancy high-end rackmount servers with their chosen applications, platforms, and operating systems. As the business grew we upgraded the hardware – disk drives less than five years old are nice – but saw no need to replace the software.
One Monday morning, a customer that had expected to use very little bandwidth found that they had sufficient requests to devour twice the bandwidth we had for the entire datacenter. This affected every customer. If your $9.95/month web page is slow you have little to complain about, but if your multiple-thousands-of-dollars-a-month Web application is slow you pick up the phone and scream until the problem stops.
To make matters worse, my grandmother had died only a couple days before. Visitation was on Tuesday, the funeral Wednesday morning. I handed the problem to a minion and said “Here, do something about this.” I knew bandwidth could be managed at many points: the Web servers themselves, the load balancer in front of them, the commercial firewall, and even the router all claimed to have traffic management capacity.
Tuesday after visitation I found my cellphone full of messages. The version of Internet Information Server could manage bandwidth — in eight megabyte increments, and only if the content was static HTML and JPEG files. With several Web servers behind the load balancer, that fell somewhere between useless and laughable. The load balancer did support traffic shaping, if we bought the new feature set. If we plopped down a credit card number, we could have it installed by next Sunday. Our big-name commercial firewall also had traffic shaping features available, if we upgraded our service level and paid an additional (and quite hefty) fee for the feature set. That left the router, which I had previously investigated and found would support traffic shaping with only an IOS upgrade.
I was on the phone until midnight Tuesday night, making arrangements to do an emergency OS upgrade on the router on Wednesday night. I had planned to go to the funeral Wednesday morning, give the eulogy, go home and take a nap, and arrive at work at midnight ready to rock. The funeral was more dramatic than I had expected and I showed up at work at midnight sleepless, bleary-eyed, and upright only courtesy of the twin blessings of caffeine and adrenaline. In my email, I found a note that several big clients had threatened to leave unless the problem were resolved Thursday morning. If I hadn’t already been stressed out, the prospect of choosing a minion to lay off would have done the trick. (Before any of those minions start to think I care about them personally: I work hard training minions, and swinging the Club of Correction makes my arms sore. Eventually. I don’t like to replace them.)
Still, only a simple router flash upgrade and some basic configuration stood between me and relief. What could possibly go wrong?
The upgrade went smoothly, but the router behaved oddly when I enabled traffic shaping. Over the next few hours, I discovered that the router didn’t have enough memory to simultaneously support all of our BGP feeds and the traffic shaping functionality. Worse, this router wouldn’t accept more memory. At about six in the morning, I finally got an admission from the router vendor that they could not help me.
I hung up the phone. The first client who had threatened departure would be checking in at seven thirty AM. I had slept four hours of the last forty-eight, and had spent most of that time under fiendish levels of emotional stress. I had already emptied my stash of quarters for the soda machine, and had pillaged a co-worker’s desk for his. The caffeine and adrenaline that had gotten me to the office had long since worn off, and further doses of each merely slowed my collapse. We had support contracts on every piece of equipment, and they were all useless. All the hours of work my team and I had put in left me with absolutely nothing.
I made myself sit still for two minutes simply focusing on breathing, making my head stop sliding around loose on my shoulders, and ignoring the loud ticking clock. What could be done in ninety minutes — now only eighty-eight?
I really had one only option. If it didn’t work, I would either lay someone off or file for unemployment myself.
6:05 AM. I started downloading the OpenBSD install floppy image then grabbed a spare desktop machine, selecting it from amongst many similar machines by virtue of it being on top of the pile. The next few minutes I alternated between hitting the few required installation commands and dismantling every unused machine unlucky enough to be in reach to find two decent network cards.
By 6:33 AM I had two Intel EtherExpress cards in my hands and a virgin OpenBSD system. I logged in long enough to shut the system down so I could wrench the case off, slam the cards into place, and boot again. Even early versions of PF included all sorts of nifty filtering abilities, all of which I ignored in favor of the newly-integrated traffic-shaping functions. By 6:37 AM I was wheeling a cart with a monitor, keyboard, and my new traffic shaper over to the rack.
Then things got hard. I didn’t have a spare switch that could handle our Internet bandwidth. The router rack was jammed to overflowing, leaving me no place to put the new shaper. I lost almost half an hour finding a crossover cable, and when I discovered one it was only two feet long. The router, of course, was mounted in the top of the rack. About 7:10 AM, I discovered that if I put the desktop PC on end, balanced it on an empty shipping box, and put the box on the mail cart, the cable just reached the router. I stacked everything so it would reach and began re-wiring the network and reconfiguring subnets.
I vaguely recall my manager coming in about 7:15 AM, asking with taut calmness if he could help. If I remember correctly, as I typed madly at the router console I said “Yes. Go away.”
At 7:28 AM we had an OpenBSD traffic shaper between the hosting area and our router. All the client applications were reachable from the Internet. I collapsed in my chair and stared blankly at the wall.
While everything seemed to work, the proof would be in what happened as our offending site started its daily business. I watched with growing tension as that client’s network traffic climbed towards the red line that indicated trouble. The traffic grew to just short of the danger line — and flatlined. Other clients called, happy that their service was restored to its usual quality. (One complained that his site was still slow, but it turned out that bandwidth problems had masked an application problem.) The problem client complained that their web site now ran even slower than before, to which we offered to purchase more bandwidth if they’d agree to buy it.
I taped a note to the shipping box that said “Touch this and I will kill you,” staggered to my car, and by some miracle got home.
Shortly afterwards, I had two new routers and new DS3s. The racks were again clean. The decrepit desktop machine was replaced by two rack-mount OpenBSD boxes in a live-failover configuration, protecting our big-name commercial firewall as well as shaping traffic. And I now keep a crossover cables in a variety of lengths.
Should we have had traffic shaping in place before selling service? Absolutely. As with any startup, though, our hands were full fixing the agonies of the moment and less on the future.
If I had started with OpenBSD, I would have had a much better night.