July 15, 2009

Project Mercury: A pre-configured Drupal+Varnish EC2 AMI

Do you want your Drupal front page to render in less than a second? Do you want your site to be fast for logged-in as well as anonymous users? Do you want to have total confidence in your ability to weather the storms of internet fortune (e.g. links from Digg, Drudge, Slashdot or MSN.com)? If so, then we hope the Mercury project will be of interest to you.

The goal of this project is to make Drupal as fast as possible for as many people as possible. To that end, we are developing a pre-built Amazon Machine Image (AMI) which will allow anyone with an Amazon Web Services account to spin up an EC2 instance and see how all this works in real-time. The ultimate goal is a production-ready release that can be used for deploying real websites.

Today, thanks to this inspiring post from Eric Hammond at Alestic and some excellent feedback from the Drupal community, I'm proud to announce the public availability of an initial Alpha release. Don't use this for production, but if you want to see how these techniques work in action, you can get a working copy with root access for just a little scratch: ten cents an hour, to be precise.

For those hungry to get started, the public AMI ID is ami-0722c36e. It's a 32-bit instance, and I've run all my tests on the "small" instance type. You can find it pretty easily by searching for "chapter3" in the AMI list.
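
If you'd rather skip the AWS console, launching from the command line with the EC2 API tools looks something like this (the key pair name is yours to substitute; everything else matches the AMI details above):

    # Launch one small instance of the Mercury alpha AMI
    ec2-run-instances ami-0722c36e -n 1 -t m1.small -k my-keypair

    # Poll until it's running, then grab the public DNS name to ssh in
    ec2-describe-instances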

This ready-to-run machine image contains the following high-performance options, all configured to work for a harmonious liquid-metal fast WIN:

  • Ubuntu Jaunty base operating system
  • The latest Pressflow Drupal
  • Varnish HTTP acceleration
  • mod_deflate configured to compress pages
  • An up-to-date memcached/libevent and libmemcached install. This includes experimental support for the new libmemcached-based PHP library via this patch to Cacherouter.
  • A basic boot script to update Pressflow from the BZR repository and move disk-intensive operations like MySQL and Varnish storage into /mnt (sketched just below)
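
For the curious, that boot-time juggling is nothing fancy. Here's a minimal sketch of the idea; the paths and repository location are illustrative, not a verbatim copy of the script on the image:

    #!/bin/sh
    # Illustrative boot-time setup -- not the exact script shipped on the AMI

    # Pull the latest Pressflow from its Bazaar repository
    cd /var/www/pressflow && bzr pull

    # Move disk-hungry data onto the big ephemeral /mnt volume
    /etc/init.d/mysql stop
    mv /var/lib/mysql /mnt/mysql
    ln -s /mnt/mysql /var/lib/mysql
    /etc/init.d/mysql start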

Since the install comes pre-configured and I didn't have time to do this as an install profile, you'll need to use the user #1 credentials I set up. Login: root. Pass: drupal. Change this immediately.
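
The quickest way to change it on a running instance is a one-liner against the database. Pressflow tracks Drupal 6, which stores plain MD5 password hashes; the database name and credentials below are assumptions, so adjust them to match the image:

    # Reset the user #1 password from the shell (Drupal 6 / Pressflow = MD5 hashes)
    mysql -u root drupal -e "UPDATE users SET pass = MD5('a-new-strong-password') WHERE uid = 1;"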

Now, if you want to know more about how this works, see links to the giants whose shoulders we're standing on here, or if you've got a minute to kill while your instance spins up, read on for a full explanation of what we've done in this initial Alpha release.

Pressflow
As our good friends at Four Kitchens say, Pressflow makes Drupal scale. The Pressflow project is important because Drupal core is code-frozen well in advance of most real-world deployments, meaning the kinds of tuning patches and tweaks necessary to make Drupal-powered sites screamin' fast aren't included in the stock download. Pressflow fills that gap by including core patches which are well-tested and necessary for high performance.

The project is maintained by some of the top high-availability minds (we're working on some advanced memcached features, for instance) and is what powers drupal.org itself. The upshot here is that you can trust it, and it doesn't materially differ from stock Drupal in ways that matter for 99% of development.

It's just like Drupal, but with a geared icon to remind you you're a power user.

Varnish
If you run a complex Drupal site and want to stop stressing about traffic spikes and start seeing pageloads in under a second, you need to start treating Apache+PHP as an application server, not as a simple web server. Varnish is a purpose-built HTTP accelerator that carries traditional reverse-proxy work forward by focusing on one specific application: delivering web content as quickly as possible.

Pressflow allows us to configure Varnish to run "in front" of Apache+PHP+Drupal, handling all anonymous page requests as well as static files. This requires configuring Apache to run on a non-standard HTTP port (in this case we use 8080), and configuring Varnish to respect the headers and conditions which Drupal operates within for determining a logged-in vs. logged-out page request.
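
To make that division of labor concrete, here's a stripped-down sketch of the kind of VCL involved (Varnish 2.x syntax; the cookie check is illustrative, and the config on the image does more):

    # Varnish owns port 80; Apache+PHP+Drupal listens on 8080
    backend default {
      .host = "127.0.0.1";
      .port = "8080";
    }

    sub vcl_recv {
      # Logged-in users carry a Drupal session cookie: hand them to Apache
      if (req.http.Cookie ~ "SESS") {
        pass;
      }
      # Everything else is fair game for the cache
      lookup;
    }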

The result is that Apache no longer has to bother itself with CSS, JS, or JPEG requests, or even anonymous page requests (except for once, after which Varnish takes over). This gives you the fastest possible anonymous pageview performance. Better than Boost, even.

Drupal, PHP and Apache
Back in more familiar territory, we've configured Drupal's performance settings to maximize the benefits of Varnish, and enabled the must-have APC opcode cache to accelerate Drupal in general.
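
For reference, getting APC going is just a couple of php.ini lines (the shared-memory size here is an illustrative value, not necessarily what the image ships with):

    ; Enable the APC opcode cache
    extension=apc.so
    apc.enabled=1
    ; Shared memory for cached opcodes, in MB
    apc.shm_size=64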

We also installed and configured Steve Rude's exciting CacheRouter module to allow the use of different engines for Drupal cache optimization. Since I wanted to try out the newest version of memcached, I built some support for that, as well as a "none" cache option which we use for page caching since Varnish handles all that for us. Patches for these methods are here and here.
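
In settings.php, the wiring looks roughly like this. The precise CacheRouter option names here are from memory, so treat this as a sketch and check the module's README before copying it:

    <?php
    // Route Drupal's cache API through CacheRouter
    $conf['cache_inc'] = './sites/all/modules/cacherouter/cacherouter.inc';

    // Default bins go to memcached; the page cache is a no-op, since
    // Varnish answers anonymous page requests before Drupal is ever hit.
    $conf['cacherouter'] = array(
      'default' => array(
        'engine' => 'memcache',
        'server' => array('127.0.0.1:11211'),
      ),
      'cache_page' => array(
        'engine' => 'none',
      ),
    );
    ?>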

I was previously using APC as the local Drupal cache, since it's a little simpler: there's no separate service to run. That makes it less error-prone, more secure, and, according to the folks at Facebook, allegedly as fast as (if not faster than) memcached. More testing is needed to see what the difference is here, if any. The major limitation of APC is that its caches can't be shared across machines, but since our initial goal is a one-box solution, it's all good.

Finally, our Apache config enables mod_deflate, which is important for end-user page load times. Not only does mod_deflate compress (gzip) everything compressible on its way out of the server, which can shave hundreds of milliseconds off real-world pageloads, it also lets us turn off Drupal's own page compression and avoid additional modules like css_compress. That means less work for Drupal and faster response times overall.
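
The heart of it is one Apache directive plus a couple of guards for ancient browsers (a typical mod_deflate stanza, not necessarily verbatim from the image):

    # Compress text responses on the way out
    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript

    # Old Netscape 4.x builds choke on gzip for anything but HTML
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    BrowserMatch \bMSIE !no-gzip !gzip-only-text/html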

Testing
To see what all this means, I spun up a Mercury instance, logged in as user #1 (login: root, pass: drupal, change this immediately), and installed the Devel module to generate 500 dummy nodes.

I then set up a local JMeter test plan to hammer the /node page. This isn't a good real-world use case, but it's fine for a simple performance benchmark. With 50 threads hammering as hard as they could, I got up to over 2,000 successful requests per minute, during which server load was a whopping 0.02.
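
If you want to reproduce this without clicking around the JMeter GUI, the same plan runs headless from the shell (the .jmx filename is whatever you saved your test plan as):

    # Run the test plan headless and log results for later analysis
    jmeter -n -t node-hammer.jmx -l results.jtl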

Indeed, with Varnish handling requests, your operative bottleneck rapidly becomes bandwidth and networking capacity. That's where you want to be though, as these are the simplest and least expensive resources to scale.

Credit Where Credit Is Due

I should acknowledge all the great developers who have made this possible with their trailblazing documentation and code. I am doing virtually no innovation in this process, just integrating a bunch of existing pieces. As they say, we stand on the shoulders of giants.

Roadmap

The next steps for me are to continue working on the EC2 rollout piece, as well as developing a better test suite so I can easily benchmark configuration changes and tune the stack more finely.

I also plan on integrating a much more fully-featured install of Drupal, with a lot of a-list modules included and configured. The goal is to eventually get to a public release that can provide immediate value to folks looking for a Drupal CMS solution in the cloud.

The roadmap is something like:

  • Continued alpha releases (at least one or two) focusing on improving underlying infrastructure and making testing/benchmarking easy.
  • Beta releases focusing on the pre-configured Drupal install and admin experience. Maybe including Apache Solr?
  • A public release (in time for DrupalCon?) that could be used in small-scale production cases. Possibly including a 64-bit version.

Ok then. If you've read this far, what are you waiting for? Go get an Amazon Web Services account and try this thing out. Running it for an afternoon will literally cost you only a dollar. :)