32 Is The New 20

Posted December 09, 2013

Recently, a dear friend turned 30. As she is my longest-tenured friend, I gave her a call to wish her well at the dawn of her fourth decade.

Like many, she was somewhat distressed by the prospect of turning 30. (In retrospect, opening the conversation with "Hey old lady!" was probably not the best choice.) Being that I had passed that same milestone only a few months before, she asked me how I got through it.

"I realized that 30 is a totally arbitrary number and that it is just another birthday."

I could tell she recognized the logic of the statement, but the sentiment rang hollow. Sure, it's an arbitrary number, but it's an arbitrary number that ends in zero. I decided to try another tack.

"That's just because we use base-10 for our numbers. If we used base-16, you would still have two years until you turn 20!"

She was intrigued by this prospect. But being one of the 99% of people who do not regularly think in other bases, she needed a quick refresher. "What's base-16?"

"It's when you don't go into double digits until 16, so in base-16, one-zero equals 16."

"Oh, right. But what do you do when you get above 9?"

"Well, most commonly you use letters. In base-16, which is also called hexadecimal, you use letters. So it goes 8, 9, A, B, C, D, E, F, 10."

"So I just turned 1E?"

"Yeah, exactly!"

"I like that! I'm going to tell everyone that I'm not turning 30, I'm just turning 1E!"

I'm not sure the detour into base-16 helped assuage her fears about the interminable march of time, but for a moment a little math distracted her from her worries.
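
(If you'd rather not take my mental math on faith, a couple of lines in any Python shell will confirm it. This is just an illustration on my part, not something from the conversation itself.)

# Quick check of the base-16 claims above
print(hex(30))        # '0x1e'  -> 30 is 1E in hexadecimal
print(int("1e", 16))  # 30      -> and 1E converts back to 30
print(int("20", 16))  # 32      -> hex 20 is 32, hence the title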

Microcaching for a Faster Site

Posted May 21, 2013

My website, this site, is not fast. But, because of this little trick I'm about to show you, you probably think it is.

It's not particularly slow, either, at least not under light load. ab reports that the median request time for the homepage is about 60ms when only one request per second is coming in. But if traffic picks up, it starts slowing down. At 2 req/sec, the median jumps to 90ms per request, a 50% increase. At 5 req/sec, it slows to 225ms per request. Do some quick math and you'll see we'll soon have a problem.

Let's take a quick look under the hood. The website is a heavily modified version of an early iteration of Simple. It is written in Python using Flask and SQLAlchemy, talking to a PostgreSQL database. This is all being run by uWSGI in Emperor mode and served by Nginx.

Each of these layers could be a source of slowness. We could profile the Python code to figure out where we're spending our time. Is Jinja2 being slow about autoescaping the HTML? Maybe. Perhaps it's in the database layer. SQLAlchemy might be creating some bad queries, so we should log those. And, of course, we need to make sure that PostgreSQL is tuned properly so we're getting the most out of its caching. Then there's uWSGI; should we allocate 2, 4, or 8 processes to the site?
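
(If you did want to start down that road, at least the query logging is cheap to turn on. This snippet isn't taken from my site's code; it's just the stock SQLAlchemy way of echoing every query it runs:)

import logging

# Log every SQL statement SQLAlchemy emits
logging.basicConfig()
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)

# Alternatively, pass echo=True when creating the engine:
# from sqlalchemy import create_engine
# engine = create_engine("postgresql://localhost/mydb", echo=True)

And that's just one layer of the stack.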

But you know what? That's difficult, tedious work, and it's easy to make things worse in the process.


Optimization is hard! Let's go shopping.

What if we could just speed the whole thing up all at once?

It turns out that, for this type of site, where every user sees the same version of the content (as opposed to a web app, where each user has their own version of the site), microcaching is an ideal solution.

Microcaching is the practice of caching the entire contents of a web page for a very short amount of time. In our case, this will be just one second.

By doing this, we ensure that when the site is under any sort of load, the vast majority of visitors are getting a copy of the site served as static content from the cache, which Nginx is very good at. In fact, because of the way the caching is set up, the only time a user would wait for the "slow" site would be if they were the first person to hit the site in over a second. But, we know that the "slow" site is pretty fast when it's under such light load.

The following is a slightly modified version of my Nginx config file for tghw.com, which shows how to do this:

# Set cache dir
proxy_cache_path /var/cache/nginx levels=1:2 
                 keys_zone=microcache:5m max_size=1000m;

# Actual server
server {
    listen 80;
    server_name a.tghw.com;
    # ...the rest of your normal server config...
}

# Virtualhost/server configuration
server {
    listen   80;
    server_name  tghw.com;

    # Define cached location (may not be whole site)
    location / {
        # Setup var defaults
        set $no_cache "";
        # If non GET/HEAD, don't cache & mark user as uncacheable for 1 second via cookie
        if ($request_method !~ ^(GET|HEAD)$) {
            set $no_cache "1";
        }
        # Drop no cache cookie if need be
        # (for some reason, add_header fails if included in prior if-block)
        if ($no_cache = "1") {
            add_header Set-Cookie "_mcnc=1; Max-Age=2; Path=/";            
            add_header X-Microcachable "0";
        }
        # Bypass cache if no-cache cookie is set
        if ($http_cookie ~* "_mcnc") {
            set $no_cache "1";
        }
        # Bypass cache if flag is set
        proxy_no_cache $no_cache;
        proxy_cache_bypass $no_cache;
        # Point nginx to the real app/web server
        proxy_pass http://a.tghw.com/;
        # Set cache zone
        proxy_cache microcache;
        # Set cache key to include identifying components
        proxy_cache_key $scheme$host$request_method$request_uri;
        # Only cache valid HTTP 200 responses for 1 second
        proxy_cache_valid 200 1s;
        # Serve from cache if currently refreshing
        proxy_cache_use_stale updating;
        # Send appropriate headers through
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Set files larger than 1M to stream rather than cache
        proxy_max_temp_file_size 1M;
    }
}

(Most of this code was originally derived from Fenn Bailey. Unfortunately, it seems his site has gone down with the Posterous shutdown.)

So what's going on here? Let's start with the top. I've set up the Flask site to respond to a.tghw.com. That's the actual site that we'll be caching. Note that the subdomain is not required. You could just as easily use another port, like 8080.

Next, for tghw.com, we first check to see if there's any reason we shouldn't use the cache. That includes the request being something other than GET or HEAD, or the presence of a certain cookie (which the admin page sets for me). If that's the case, we set a cookie for the next 2 seconds that says not to use the cache, and we skip the cache for this request. (You want the cookie to outlive the caching time so your next GET request will grab a fresh copy.)

If we are using the cache for this request, then we defer to Nginx's usual proxy_pass. We tell it that all successful responses (HTTP 200) should be cached for 1 second. The choice of 1 second is pretty arbitrary; it could be longer, but since I know the app itself performs well at 1 request per second, there wouldn't be a lot of benefit to making it longer. We also set proxy_cache_use_stale to serve from the cache if Nginx is still busy updating it, meaning that users won't see a slower response while we fetch a fresh copy from the actual site.
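
One small addition worth making if you try this yourself, though it isn't in my config above: Nginx can tell you whether a given request was served from the microcache. Adding this line inside the cached location block exposes that status as a response header.

# Report the cache status (MISS, HIT, UPDATING, etc.) to the client
add_header X-Cache-Status $upstream_cache_status;

Hitting the page a couple of times in quick succession with curl -I should then show a MISS followed by a HIT, which is an easy way to confirm everything is wired up before bothering with benchmarks.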

So how does this do compared to the stock site? Well...it blows it out of the water.

Command used: ab -k -n 50000 -c [1|5|10|25|50|100] -t 10 http://[a.]tghw.com/

 -c    a.tghw.com (uncached)      tghw.com (microcached)
       req/sec   med resp (ms)    req/sec   med resp (ms)
  1         15              64      5,952               0
  5         32             151     17,283               0
 10         31             312     19,991               0
 25         33             751     19,916               1
 50         30           1,589     17,397               3
100         32           2,984     16,717               5

While the Flask site can reliably serve up to about 30 requests per second, it starts slowing down pretty significantly. The microcached site, on the other hand, serves almost 20,000 requests per second at its peak. More importantly, the response times stay in the single digit milliseconds, making the site feel nice and fast, regardless of load.

So there's an easy way to speed up your blog without having to make any changes to the application code.

When the Intern Writes the Billing System

Posted April 19, 2013

In the autumn of 2005 I was finishing an internship at Fog Creek Software. My colleagues and I on Project Aardvark had successfully developed and released Copilot. After the launch, we cleaned up the few remaining bugs and added a few small, requested features. Then people started leaving. First Ben, then Yaron, and finally Michael. By the end of August, I was the only intern left.

It would stay that way for the next three months. Unlike the other interns, I didn't have to return to school immediately. I had graduated from Rose-Hulman already and was able to defer my admission to Stanford for the first quarter of classes. This gave me another three months at Fog Creek to tie up any loose ends with Copilot and start adding the next big feature. My assignment, as the last remaining intern? A subscription billing system for Copilot.

(Yes, they had an intern writing the new billing system. No, I'm not making this up.)

When we first launched Copilot, there was only one option for payments: a $5 pass that was good for 24 hours of use. This was great for occasional users, since the cost was more than worth the time saved by not having to do remote support blind. But for more frequent users, it was a hassle. They'd have to enter their credit card information every time they needed to help someone. This was especially bad for companies that used it for customer support, since it forced their support people to request access to the company card on a daily basis.

Our plan was to implement a subscription billing system, similar to cell phone plans, based on minutes used per month, except with transparent, benevolent pricing. It would have several plans, each with a different number of included minutes and a per-minute charge for any overages. To make sure users weren't locked into something too expensive, we also said the system needed to let users change plans at any time.

My first task was to research options for storing and charging credit cards. I read dozens of marketing pages, sent emails to companies asking for info, and made phone calls to sales people. But every option had the same basic flaw: they could only do rigid, set-price subscriptions. This wouldn't work for us, since users could easily be charged a different amount each month, depending on how much they utilized the service.

Instead, we decided that we would write the whole subscription system from scratch ourselves. We already had a similar system in place for one-off purchases, so we thought we might be able to reuse some of the old code. I started writing the recurring billing code while one of our co-founders started on the credit card vault, which we called Roach Motel. (Get it? Cards check in, but they don't check out...)

We did everything possible to isolate credit cards from the rest of our infrastructure, keeping it all on a separate server stored in our secure cage in the office, instead of the data center where data-center employees might have access to it. The code was written to accept credit cards, but never to send them out again, except in encrypted form, and only to the payment processor. Access to the box required a thumb drive, held in secure storage, and a password. (Note: As of a couple years ago, this was no longer the case, as we began using a proper credit card vault instead.)

Over the next couple of months, the system came together nicely, being designed and redesigned to be flexible, so more products could be added in the future, while still being simple and robust. When I left that December, the system was up and running, billing customers each month. Overall, the project seemed like a total success.

Upon returning to Fog Creek the following summer, I found that was not the case. While I was away, FogBugz had grown from an install-only product to its own hosted solution. As with Copilot, this new SaaS-y version of FogBugz required a subscription billing system. So one of the developers on the FogBugz team added a significant amount of new code to support it.

While part of this new code dealt with cases I had not foreseen in the original version, much more of it duplicated existing concepts in a different and incompatible way. (This might sound like an admonishment of the developers who added that new code, but most of the blame lies with me for not communicating, through documentation, the intent of the features they ended up duplicating.)

As time went on, the code base continued to grow organically. Occasionally, a change in one side of the code would cause billing errors for the other side. Bugs were introduced that caused miscalculations in customers' bills. Other bugs caused duplicate payments to be triggered without properly recording them.

When Kiln came along, it needed to be added to the system as well. Because of its tight integration with FogBugz, even more code warts were added to deal with the possibility that accounts might have only Kiln, only FogBugz, or both.

In all, thousands of developer hours were spent developing and maintaining something that had nothing to do with the software our customers were paying us for. Time that we could have spent on writing new features and fixing bugs for customers was instead spent digging into stringy old billing code. What's worse, the system always was and, unless some great rewrite happens, will always be a mess. (Little known fact: The odds of such a rewrite happening on such a critical system are 2276709 to 1 against.) The fact is, writing billing software is not Fog Creek's core competency.

I bring this story up because it seems to closely parallel the situation that led to Linode's recent security incident, in which they lost both their password database and their entire credit card database. (Fortunately, Fog Creek never suffered a similar loss of credit card data while we were still storing it ourselves.) In these sorts of security breaches, passwords are relatively easy to fix by requiring a password reset. The credit cards, however, are a much bigger problem, since they are often used in many different places, requiring worried customers to update their credit card with every vendor they work with.

(Fortunately, the credit card numbers were encrypted in the database. Unfortunately, they decided to store the public and private key for those credit card numbers on the same server. According to Linode, the private key is encrypted with a "complex passphrase" which is "not guessable, sufficiently long and complex, not based on dictionary words, and not stored anywhere but in our heads".)

So how did Linode lose their entire credit card database to a hacker? Putting aside the technical details (which involve a zero-day exploit of ColdFusion), it comes down to Linode, a great VPS provider, spending time developing sensitive, complex systems that were outside of both their core competency and their main business. They, like Fog Creek, are not a payment processing company. The billing system may literally bring in the money, but it's not what their customers pay them for. If, instead, they'd simply paid for one of the many affordable credit card vaults available, whose job it is to securely store credit card numbers, this never would have happened.

Well that sucks, what else you got?

Posted October 16, 2012

The Conversation

It was getting towards the end of the breakfast hour on the second day of Business of Software, and people started getting up to refill their coffee before the first session, leaving only Jason Cohen and me to chat while we finished our meal.

"So what do you do?" he asked me. I explained that I used to work for Fog Creek, but now I was off on my own, trying to start a company that provides photo backup for serious photographers. He asked the questions I've come to expect from people in the tech industry. "How is it different than Dropbox/BackBlaze? Where are you storing the files?" And so on.

Then he asked the key question. "How's it going so far?"

"Well, to be honest, very slowly." I proceeded to describe how we'd gotten a strong showing of interest by being featured on betali.st, only to have to drastically restructure pricing in a way that significantly cut our margins. Then, how, even with the new pricing, conversion rates were dismally low. I explained who our competition was and how we were having trouble explaining to our users why we were better. I repeated an observation that a former coworker had made, that the photography market is inverted, amateurs have plenty of money but don't care (that much) if they lose their photos, while professionals, whose businesses could be devastated be losing a single shot, don't have a lot of extra money to spend. Lastly, I pointed out that an effective marketing campaign would almost necessarily rely on convincing people that something bad (i.e. losing all of their photos) will likely happen to them sometime in the future, a particularly unappealing proposition to me.

Jason listened patiently while I explained all of this, and then summed it all up.

"Well that sucks, what else you got?"

The Realization

It had slowly been occurring to me over the past week: Snaposit was probably done. My co-founder, Jeff, and I had already put it on the back burner to focus on Jeff's photo deblurring software, Blurity. Further development was becoming harder to justify.

Over the course of Business of Software, I became more open about talking with people about the problems we were running into. The hope was that the more experienced entrepreneurs I was talking to would be able to provide some reassurance. But reassurance was not forthcoming. If anything, I was hearing more reasons why it would be a tough sell.

My talk with Jason gelled it all together in that single phrase. "Well that sucks, what else you got?"

I realized it at the time, but it took me another few days before I could admit it. Keeping Snaposit going just didn't make any sense. Sure, it wasn't hemorrhaging cash, but it did have a slow leak. More importantly, it was still demanding enough attention to be detracting from other projects. To keep it running in the hopes that it might miraculously take off at some point in the future was unrealistic. It was time to shut it down.

The Causes

Poor Conversion Rates

While we were thrilled with the response from betali.st, the subsequent conversions were abysmally low. With those numbers, we would have to find traffic sources ten times larger than betali.st every week for two years before we would even reach the level of a "lifestyle business". Of course, we could hope to improve the conversion rate or the incoming traffic, but the numbers were still pretty poor. More realistically, it would take us much longer to hit the sort of revenue levels we were looking for.

Misunderstanding Our Market

In hindsight, there was a disconnect between what people said they thought was a good idea and what they would actually pay for. Almost everyone we talked to said something similar: "That's a really good idea." We heard it enough times that we really thought we were onto something. But there were two problems with this feedback: First, we were talking to more software developers, friends, and family than actual photographers. Second, we were just getting feedback on the idea itself, not on whether people would pay for it. Of course people think an automatic backup service is a good idea. It's like flossing: one of those things that everyone knows they should do, but that the majority of people still neglect actually doing.

We also didn't fully understand what photographers wanted and needed. One of Snaposit's big features was that it backed up your entire library faster than a general backup service. It did this by compressing the photos to full-resolution, high-quality JPEGs, which saved anywhere from 50% to 90% of the size. The downside of this, of course, is that photographers who shoot in RAW would only have a JPEG copy backed up. We figured this was a reasonable compromise. In a catastrophic event, wouldn't a photographer rather have a JPEG copy than nothing at all? It turns out the answer was no. But we didn't ask that question, because we thought the answer was so obvious.

Lastly, we wrote the first version of the Snaposit desktop application for Windows. We figured, "We're photographers who use Windows, we'll be able to find beta testers for this version and we'll get to the Mac version once all the bugs are ironed out." Nope. We couldn't find a single beta tester who used Windows.

The Loss of a Founder

For various reasons, the amount of personal runway Jeff and I had differed greatly. At a certain point, it became very clear that Snaposit would not be pulling in revenue before we reached the end of Jeff's runway. He decided that, in order to remain financially secure, he would shift his focus to Blurity, which was much closer to having a reliable revenue stream. I fully supported his decision, but it meant that the majority of the Snaposit work fell to me after that point. While this seemed feasible at the time, in retrospect it did cut the chances of Snaposit's success. Unfortunately, there wasn't really any way around it; it was just the reality of the situation. To this day, I still fully support Jeff's decision.

The Lessons

Of course, the experience was not a total loss, by any means. We learned a lot from failing. Below is a quick list of what we learned about:

  • How to apply for Y Combinator and Tech Stars.
  • How to integrate Stripe's subscription plans.
  • Python on the desktop with WxPython.
  • Mac development and deployment.
  • How to organize and form an LLC.
  • A better look at the backup industry.
  • That Jeff and I work well together.
  • How to run an effective 99designs competition.
  • Twitter's Bootstrap framework.
  • Desktop deployment and installers.
  • How to ask better questions about our market.

All told, we actually did quite well. For this experience, we spent less than $1,000. Compared to an entrepreneurship class at any college, that's quite cheap!

The Next Step

Jeff is continuing to work on Blurity. The recent press he's gotten has kicked Blurity into overdrive and he's continuing to knock it out of the park.

As for me, there are a few small side projects that could use some attention. I'm also getting back into writing. A few larger ideas are percolating, but nothing's quite finished brewing yet.

Comments for this post can be found on Hacker News.

Gatekeeper: Using FUD to get Developers to Pony Up

Posted August 29, 2012

A new user comes to your site and downloads your installer. They extract the disk image and fire up the PKG file. They're greeted with this message.

Corrupt

"Hmm," they think to themselves, "maybe something went wrong with the download." So they download it again, extract the disk image and run the PKG file again. Same error. The file is damaged. Now running late to pick up their kids from school, your new users leaves, likely never to come back.

This is their first introduction to Gatekeeper, a new feature of OS X 10.8, a.k.a. Mountain Lion. Intended as an anti-malware feature, Gatekeeper prevents downloaded applications from running, depending on their source. Similar to SmartScreen on Windows, Gatekeeper presents the user with a warning when they try to run an unsigned application.

However, there are two important differences between Gatekeeper and SmartScreen.

First, with SmartScreen, the user has the option to continue running the application even though it isn't signed. Gatekeeper's default setting, however, provides no way to continue.

Gatekeeper dialog

(Note: for applications, unlike PKG files, the warning dialog is somewhat more helpful, though it still does not allow the user to actually run anything.)

This causes problems for any developers who have not paid Apple $100 for the Mac Developer Program, as the users are not left with any obvious way of continuing past the warning. This includes many developers who have chosen not to release via the App Store. Jeff and I have run into this problem already with users of both Snaposit and Blurity.

The second, and possibly more troubling, difference is that Apple is the only source for developer certificates. With SmartScreen, there is a wide variety of certificate authorities and brokers to choose from, each offering different pricing options, so the developer has a chance to comparison shop, which yields prices on the order of a third of the cost of the Mac Developer Program. But for Mac developers, Apple has created an artificial barrier instead of using existing, well-established signing practices, making itself a monopoly for developer certificates.

Is there any way around it?

Yes, but you're going to have to educate your users. Each and every one of them.

There are two options:

The first, and easiest, is to instruct your users to Control-Click (or right-click) the application once they've downloaded and installed it. From the context menu, they'll need to choose "Open".

Context menu

When they start the application this way, they'll be presented with a slightly different warning dialog, which allows them to continue.

Gatekeeper dialog

The next time they launch the application, they shouldn't see any warnings.
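
(An aside for the developers reading this rather than their users: Gatekeeper only inspects files that carry the quarantine extended attribute browsers attach to downloads, so you can clear that attribute from Terminal when testing your own builds. The path here is just a placeholder:

xattr -d com.apple.quarantine ~/Downloads/YourApp.pkg

It's a handy trick for checking your own installer, but obviously not something you'd ask a customer to do.)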

The other option is to have them disable Gatekeeper entirely. To do this, have them go to System Preferences and choose Security & Privacy.

System preferences

Once there, they'll need to click the lock in order to make changes. Once they've unlocked the preferences, they'll need to choose "Anywhere" from the list of options under "Allow applications downloaded from:"

Security preferences

This will essentially disable Gatekeeper, reverting their computer's security to a state similar to OS X Lion. Obviously, this isn't ideal, but if it's something they might run into often, it might be the best choice.
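
As a footnote, the same "Anywhere" setting can also be flipped from Terminal, which can save a few clicks if you're walking someone through this over the phone. As far as I can tell, it's equivalent to the change in the preference pane above:

sudo spctl --master-disable

It carries the same downside, of course, since it turns Gatekeeper off entirely.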