When the Intern Writes the Billing System

Posted April 19, 2013

In the autumn of 2005 I was finishing an internship at Fog Creek Software. My colleagues and I on Project Aardvark had successfully developed and released Copilot. After the launch, we cleaned up the few remaining bugs and added a few small, requested features. Then people started leaving. First Ben, then Yaron, and finally Michael. By the end of August, I was the only intern left.

It would stay that way for the next three months. Unlike the other interns, I didn't have to return to school immediately. I had graduated from Rose-Hulman already and was able to defer my admission to Stanford for the first quarter of classes. This gave me another three months at Fog Creek to tie up any loose ends with Copilot and start adding the next big feature. My assignment, as the last remaining intern? A subscription billing system for Copilot.

(Yes, they had an intern writing the new billing system. No, I'm not making this up.)

When we first launched Copilot, there was only one option for payments: a $5 pass that was good for 24 hours of use. This was great for occasional users, since the cost was more than worth the time saved by not having to do remote support blind. But for more frequent users, it was a hassle. They'd have to enter their credit card information every time they needed to help someone. This was especially bad for companies that used it for customer support, since it forced their support people to request access to the company card on a daily basis.

Our plan was to implement a subscription billing system, similar to cell phone plans, based on minutes used per month, except with transparent, benevolent pricing. It would have several plans with a different number of included minutes and a per-minute charge for any overages. To make sure users weren't locked into something too expensive, we also said the system needed to let users change plans at any time.
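To make that pricing model concrete, here is a minimal sketch of the monthly calculation: a flat fee covers the included minutes, and every minute past the allowance is billed at the plan's overage rate. The plan names, prices, and the `monthly_charge_cents` helper are all invented for illustration; they are not Copilot's actual plans or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical subscription plan; all amounts are in cents."""
    name: str
    monthly_fee_cents: int         # flat charge for the month
    included_minutes: int          # minutes covered by the flat charge
    overage_cents_per_minute: int  # price of each minute past the allowance


def monthly_charge_cents(plan: Plan, minutes_used: int) -> int:
    """Return the total charge for one month, in cents."""
    if minutes_used < 0:
        raise ValueError("minutes_used must be non-negative")
    # Only minutes beyond the allowance are billed per minute.
    overage_minutes = max(0, minutes_used - plan.included_minutes)
    return plan.monthly_fee_cents + overage_minutes * plan.overage_cents_per_minute


# Made-up plans for the example, not real prices.
SMALL = Plan("small", monthly_fee_cents=2000, included_minutes=200,
             overage_cents_per_minute=15)
LARGE = Plan("large", monthly_fee_cents=5000, included_minutes=600,
             overage_cents_per_minute=10)

if __name__ == "__main__":
    for minutes in (150, 200, 450):
        print(minutes,
              monthly_charge_cents(SMALL, minutes),
              monthly_charge_cents(LARGE, minutes))
```

With these made-up numbers, a heavy month on the small plan costs more than the large plan's flat fee, which is the kind of situation that letting users change plans at any time is meant to address. It also shows why the amount charged can differ from month to month.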

My first task was to research options for storing and charging credit cards. I read dozens of marketing pages, sent emails to companies asking for info, and made phone calls to salespeople. But every option had the same basic flaw: it could only do rigid, set-price subscriptions. This wouldn't work for us, since users could easily be charged a different amount each month, depending on how much they used the service.

Instead, we decided that we would write the whole subscription system from scratch ourselves. We already had a similar system in place for one-off purchases, so we thought we might be able to reuse some of the old code. I started writing the recurring billing code while one of our co-founders started on the credit card vault, which we called Roach Motel. (Get it? Cards check in, but they don't check out...)

We did everything possible to isolate credit cards from the rest of our infrastructure, keeping it all on a separate server stored in our secure cage in the office, instead of the data center where data-center employees might have access to it. The code was written to accept credit cards, but never to send them out again, except in encrypted form, and only to the payment processor. Access to the box required a thumb drive, held in secure storage, and a password. (Note: As of a couple years ago, this was no longer the case, as we began using a proper credit card vault instead.)

Over the next couple of months, the system came together nicely, being designed and redesigned to be flexible, so more products could be added in the future, while still being simple and robust. When I left that December, the system was up and running, billing customers each month. Overall, the project seemed like a total success.

Upon returning to Fog Creek the following summer, I found that was not the case. While I was away, FogBugz had grown from an install-only product to its own hosted solution. As with Copilot, this new SaaS-y version of FogBugz required a subscription billing system. So one of the developers on the FogBugz team added a significant amount of new code to support it.

While part of this new code was to deal with cases I had not foreseen in the original version, much more duplicated existing concepts in a different and incompatible way. (This might sound like an admonishment of the developers adding that new code, but most of the blame lies with me for not communicating, through documentation, the intent of the features they had duplicated.)

As time went on, the code base continued to grow organically. Occasionally, a change in one side of the code would cause billing errors for the other side. Bugs were introduced that caused miscalculations in customers' bills. Other bugs caused duplicate payments to be triggered without properly recording them.

When Kiln came along, it needed to be added to the system as well. Because of its tight integration with FogBugz, even more code warts were added on to deal with the possibility that accounts might only have Kiln, only have FogBugz, or have both.

In all, thousands of developer hours were spent developing and maintaining something that had nothing to do with the software our customers were paying us for. Time that we could have spent on writing new features and fixing bugs for customers was instead spent digging into stringy old billing code. What's worse, the system always was and, unless some great rewrite happens, will always be a mess. (Little known fact: The odds of such a rewrite happening on such a critical system are 2276709 to 1 against.) The fact is, writing billing software is not Fog Creek's core competency.

I bring this story up because it seems to closely parallel the situation that led to Linode's recent security incident, in which they lost both their password database and their entire credit card database. (Fortunately, Fog Creek never suffered a similar loss of credit card data while we were still storing it ourselves.) In these sorts of security breaches, passwords are relatively easy to fix by requiring a password reset. The credit cards, however, are a much bigger problem, since they are often used in many different places, requiring worried customers to update their credit card with every vendor they work with.

(Fortunately, the credit card numbers were encrypted in the database. Unfortunately, Linode decided to store both the public and private keys for those credit card numbers on the same server. According to Linode, the private key is encrypted with a "complex passphrase" which is "not guessable, sufficiently long and complex, not based on dictionary words, and not stored anywhere but in our heads".)

So how did Linode lose their entire credit card database to a hacker? Putting aside the technical details (which involve a zero-day exploit of ColdFusion), it comes down to Linode, a great VPS provider, spending time developing sensitive, complex systems that were outside of both their core competency and their main business. They, like Fog Creek, are not a payment processing company. Literally speaking, the billing system brings in the money, but it's not what their customers pay them for. If, instead, they'd simply paid for one of the many affordable credit card vaults available, whose job it is to securely store credit card numbers, this never would have happened.