Microcaching for a Faster Site

Posted May 21, 2013

My website, this site, is not fast. But, because of this little trick I'm about to show you, you probably think it is.

It's not particularly slow, either, at least under light load. ab reports that the median request time for the homepage is about 60ms when only one request per second is coming in. But as traffic picks up, it slows down: at 2 req/sec, the median jumps to 90ms per request, a 50% increase, and at 5 req/sec it reaches 225ms per request. Do some quick math and you'll see we'll soon have a problem.

Let's take a quick look under the hood. The website is a heavily modified version of an early iteration of Simple. It's written in Python using Flask and SQLAlchemy, talking to a PostgreSQL database. The whole thing is run by uWSGI in Emperor mode and served by Nginx.

Each of these layers could be a source of slowness. We could profile the Python code to figure out where we're spending our time. Is Jinja2 being slow about autoescaping the HTML? Maybe. Perhaps it's in the database layer. SQLAlchemy might be generating some bad queries, so we should log those. And, of course, we need to make sure PostgreSQL is tuned properly so we're getting the most out of its caching. Then there's uWSGI: should we allocate 2, 4, or 8 processes to the site?

But you know what? That's difficult, tedious work, and it's easy to make things worse in the process.


Optimization is hard! Let's go shopping.

What if we could just speed the whole thing up all at once?

It turns out that for this type of site, where every user sees the same content (as opposed to a web app, where each user sees their own version), microcaching is an ideal solution.

Microcaching is the practice of caching the entire contents of a web page for a very short amount of time. In our case, this will be just one second.
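
Stripped to its core, the whole technique is just a few Nginx directives. Here's a minimal sketch (the cache path, zone name, and backend address are placeholders; my full config follows below):

# Minimal sketch: cache successful responses from the app for 1 second
proxy_cache_path /var/cache/nginx keys_zone=microcache:5m;

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8080;  # stand-in for your actual app server
        proxy_cache microcache;
        proxy_cache_valid 200 1s;
    }
}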

By doing this, we ensure that when the site is under any sort of load, the vast majority of visitors are getting a copy of the site served as static content from the cache, which Nginx is very good at. In fact, because of the way the caching is set up, the only time a user would wait for the "slow" site would be if they were the first person to hit the site in over a second. But, we know that the "slow" site is pretty fast when it's under such light load.

The following is a slightly modified version of my Nginx config file for tghw.com, which shows how to do this:

# Set cache dir: the 'microcache' zone keeps 5MB of keys in memory,
# with up to 1000MB of cached responses on disk
proxy_cache_path /var/cache/nginx levels=1:2 
                 keys_zone=microcache:5m max_size=1000m;

# Actual server
server {
    listen 80;
    server_name a.tghw.com;
    # ...the rest of your normal server config...
}

# Virtualhost/server configuration
server {
    listen   80;
    server_name  tghw.com;

    # Define cached location (may not be whole site)
    location / {
        # Setup var defaults
        set $no_cache "";
        # If non GET/HEAD, don't cache & mark user as uncacheable for 2 seconds via cookie
        if ($request_method !~ ^(GET|HEAD)$) {
            set $no_cache "1";
        }
        # Drop no cache cookie if need be
        # (for some reason, add_header fails if included in prior if-block)
        if ($no_cache = "1") {
            add_header Set-Cookie "_mcnc=1; Max-Age=2; Path=/";
            add_header X-Microcachable "0";
        }
        # Bypass cache if no-cache cookie is set
        if ($http_cookie ~* "_mcnc") {
            set $no_cache "1";
        }
        # Bypass cache if flag is set
        proxy_no_cache $no_cache;
        proxy_cache_bypass $no_cache;
        # Point nginx to the real app/web server
        proxy_pass http://a.tghw.com/;
        # Set cache zone
        proxy_cache microcache;
        # Set cache key to include identifying components
        proxy_cache_key $scheme$host$request_method$request_uri;
        # Only cache valid HTTP 200 responses for 1 second
        proxy_cache_valid 200 1s;
        # Serve from cache if currently refreshing
        proxy_cache_use_stale updating;
        # Send appropriate headers through
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Set files larger than 1M to stream rather than cache
        proxy_max_temp_file_size 1M;
    }
}

(Most of this code was originally derived from Fenn Bailey. Unfortunately, it seems his site has gone down with the Posterous shutdown.)

So what's going on here? Let's start at the top. I've set up the Flask site to respond to a.tghw.com. That's the actual site that we'll be caching. Note that the subdomain isn't required; you could just as easily use another port, like 8080.
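
For reference, the backend's server block (elided above) might look something like this if uWSGI is listening on a local Unix socket; this is just a sketch, and the socket path is hypothetical:

server {
    listen 80;
    server_name a.tghw.com;

    location / {
        # Hand requests to the Flask app via uWSGI (hypothetical socket path)
        include uwsgi_params;
        uwsgi_pass unix:/tmp/tghw.sock;
    }
}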

Next, for tghw.com, we first check whether there's any reason we shouldn't use the cache. That includes requests other than GET or HEAD, as well as requests carrying a certain cookie (which the admin page sets for me). If that's the case, we set a cookie telling Nginx not to use the cache for the next 2 seconds, and we skip the cache for this request. (You want this to be longer than the caching time so that your next GET request grabs a fresh copy.)
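
The admin cookie check isn't shown in the config above, but it would just be one more bypass condition alongside the _mcnc one, something like this (the cookie name here is made up):

# Hypothetical: skip the cache whenever an admin session cookie is present
if ($http_cookie ~* "admin_session") {
    set $no_cache "1";
}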

If we are using the cache for this request, then we defer to Nginx's usual proxy_pass. We tell it that all successful responses (HTTP 200) should be cached for 1 second. The choice of 1 second is pretty arbitrary; it could be longer, but since I know the app itself performs well at 1 request per second, there wouldn't be much benefit to making it longer. We also set proxy_cache_use_stale so that while Nginx is busy refreshing the cache, other users are served the stale copy rather than waiting for a trip to the actual site.
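
Two optional additions, neither of which is in my config above, can make this behavior easier to observe and more forgiving: Nginx records the result of each cache lookup in the $upstream_cache_status variable, which you can expose as a response header, and proxy_cache_use_stale accepts more conditions than just updating if you'd rather serve stale content than an error page:

# Optional: expose the cache result (HIT, MISS, BYPASS, UPDATING, ...)
add_header X-Cache-Status $upstream_cache_status;
# Optional: also fall back to stale content on backend errors and timeouts
proxy_cache_use_stale error timeout updating;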

So how does this do compared to the stock site? Well...it blows it out of the water.

Command used: ab -k -n 50000 -c [1|5|10|25|50|100] -t 10 http://[a.]tghw.com/

           a.tghw.com                    tghw.com
 -c    req/sec   med resp (ms)    req/sec   med resp (ms)
  1         15              64      5,952               0
  5         32             151     17,283               0
 10         31             312     19,991               0
 25         33             751     19,916               1
 50         30           1,589     17,397               3
100         32           2,984     16,717               5

While the Flask site can reliably serve up to about 30 requests per second, its response times degrade quickly as concurrency grows. The microcached site, on the other hand, serves almost 20,000 requests per second at its peak. More importantly, its response times stay in the single-digit milliseconds, making the site feel nice and fast regardless of load.

So there's an easy way to speed up your blog without making any changes to the application code.