Aug 20

HTTP response caching in Haywire

By kellabyte  //  Performance  //  4 Comments

Now that I’ve introduced Haywire, I’m going to dig into some of the things I’ve done so far to get the performance I’ve gotten.

At first Haywire was parsing HTTP requests and routing them to handler functions registered to a URI, but it was still responding with static responses. Benchmarking an HTTP server in this state isn’t all that interesting since it wasn’t doing anything dynamic, but I still used it as a baseline to compare against so that as I added code I could tell when I introduced something really bad.

I then created a response struct and started writing the functions that support dynamic construction of the response and let you set things like HTTP status code, headers and the body.
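For context, here’s a rough sketch of what such a response struct and its setter functions could look like. The names and layout below are illustrative only, not Haywire’s actual API:

#include <stddef.h>

/* Illustrative sketch only -- field and function names are hypothetical,
 * not Haywire's actual API. */
typedef struct
{
    char* name;
    char* value;
} http_header;

typedef struct
{
    unsigned short http_major;   /* e.g. 1 */
    unsigned short http_minor;   /* e.g. 1 */
    int status_code;             /* e.g. 200 */
    http_header headers[32];     /* fixed-size header list to keep the sketch simple */
    int header_count;
    char* body;
    size_t body_length;
} http_response;

void http_response_set_status(http_response* response, int status_code)
{
    response->status_code = status_code;
}

void http_response_add_header(http_response* response, char* name, char* value)
{
    response->headers[response->header_count].name = name;
    response->headers[response->header_count].value = value;
    response->header_count++;
}

void http_response_set_body(http_response* response, char* body, size_t length)
{
    response->body = body;
    response->body_length = length;
}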

Performance before (keep an eye on requests/s, CPU and network rates)

Performance after

Benchmarking showed I had really killed performance. Comparing the two graphs, the newer version was handling about 178,000 fewer requests/second while using 40% more CPU and pushing about 200 Mbps less over the network than the previous version. At first I thought this was normal: Haywire wasn’t doing anything actually interesting before, and now that it is, this is just the price you pay for actually implementing features. I wasn’t comfortable with that though, so I spent a few days thinking about what to do about it.

Problem

After talking to a few people I realized that just because I’m writing C doesn’t mean I’m doing things in efficient ways. I learned that functions like strlen() and strcat() may be reasonably fast in isolation, but not so much when millions of calls are happening every second.

An example of an HTTP response looks like this:

HTTP/1.1 200 OK
Server: Haywire/master
Date: Fri, 06 Aug 2013 00:31:53 GMT
Connection: Keep-Alive
Content-Type: text/html
Content-Length: 11

hello world

I was using strlen() and strcat() to construct this response from the status code, headers and body in the response struct. A dozen calls to strlen() and strcat() were happening on every request; multiply that by 400,000 requests/second and you quickly get to millions of slow string function calls happening every second.
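To make the cost concrete, here’s roughly the shape of that naive construction (a sketch, not Haywire’s actual code). Every strcat() rescans the destination buffer from the beginning and every strlen() walks an entire string, so a dozen of these per response adds up quickly at 400,000 requests/second:

#include <stdio.h>
#include <string.h>

/* Sketch of the naive approach: build the response by repeated
 * concatenation. Each strcat() rescans the buffer from the start and
 * each strlen() walks a whole string. The caller supplies a buffer
 * large enough to hold the full response. */
static size_t build_response_naive(char* buffer, const char* status_line,
                                   const char* server, const char* date,
                                   const char* content_type, const char* body)
{
    char content_length[64];
    sprintf(content_length, "Content-Length: %d\r\n", (int)strlen(body));

    buffer[0] = '\0';
    strcat(buffer, status_line);   /* "HTTP/1.1 200 OK\r\n" */
    strcat(buffer, server);        /* "Server: Haywire/master\r\n" */
    strcat(buffer, date);          /* "Date: ...\r\n" */
    strcat(buffer, "Connection: Keep-Alive\r\n");
    strcat(buffer, content_type);  /* "Content-Type: text/html\r\n" */
    strcat(buffer, content_length);
    strcat(buffer, "\r\n");
    strcat(buffer, body);

    return strlen(buffer);         /* one more full scan just to learn the length */
}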

HTTP response caching

I’ve done a few things to reduce the string functions I call, but the first big win was introducing an HTTP response cache. Looking at the response above, I realized there are parts of it that never change or change infrequently.

HTTP/1.1 200 OK

There are currently only 2 protocol versions (1.0 and 1.1) and well under 100 response status codes, so it makes sense to pre-construct each combination of this string, hold it in memory and never concatenate it again.
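Something along these lines does the trick: build every version/status combination once at startup and hand out pointers afterwards. This is a sketch with made-up names, not Haywire’s internals:

#include <stdio.h>

/* Sketch only: pre-build "HTTP/1.x NNN Reason" status lines once at
 * startup so the hot path never concatenates them again. */
#define MAX_STATUS 600

static char status_lines_10[MAX_STATUS][64];  /* HTTP/1.0 variants */
static char status_lines_11[MAX_STATUS][64];  /* HTTP/1.1 variants */

static const char* reason_phrase(int code)
{
    switch (code)
    {
        case 200: return "OK";
        case 404: return "Not Found";
        case 500: return "Internal Server Error";
        /* ...and the rest of the registered status codes... */
        default:  return "Unknown";
    }
}

void init_status_lines(void)
{
    for (int code = 100; code < MAX_STATUS; code++)
    {
        snprintf(status_lines_10[code], sizeof(status_lines_10[code]),
                 "HTTP/1.0 %d %s\r\n", code, reason_phrase(code));
        snprintf(status_lines_11[code], sizeof(status_lines_11[code]),
                 "HTTP/1.1 %d %s\r\n", code, reason_phrase(code));
    }
}

/* At request time the status line is just a lookup, no concatenation. */
const char* get_status_line(int http_minor, int code)
{
    return http_minor == 0 ? status_lines_10[code] : status_lines_11[code];
}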

Server: Haywire/master

This isn’t going to change from one request to another, so this can also be constructed and kept in memory.

Date: Fri, 06 Aug 2013 00:31:53 GMT

The date/time was an interesting one, and this header was ultimately where I had my “aha” moment. I did a bit of research and, as far as I can tell, HTTP servers typically only emit the Date header with one-second resolution. That means I could construct all combinations of the first 3 lines of the response, cache them in memory once a second and still have the proper time resolution. This significantly reduced the number of times I called strlen() and strcat().
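A sketch of the idea for the Date header (illustrative, not Haywire’s actual code): re-format the string only when the wall-clock second changes and reuse it for every response built in between:

#include <time.h>

/* Sketch only: refresh the RFC 1123 date string at most once per second
 * and let every response built in that second reuse it. */
static char cached_date_header[64];
static time_t cached_second = 0;

const char* get_date_header(void)
{
    time_t now = time(NULL);

    if (now != cached_second)
    {
        struct tm gmt;
        gmtime_r(&now, &gmt);  /* POSIX; plain gmtime() also works single-threaded */
        strftime(cached_date_header, sizeof(cached_date_header),
                 "Date: %a, %d %b %Y %H:%M:%S GMT\r\n", &gmt);
        cached_second = now;
    }

    return cached_date_header;
}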

The first implementation cleared the cache every 500ms (just to give a little tighter resolution) and re-constructed all the possible combinations. This worked out really well, but I decided it wasn’t the ideal solution since I doubt all HTTP status codes get used every second. I still felt like I was doing wasteful work.

What I decided on in the end was that every 500ms a timer runs and clears the response cache, but doesn’t construct anything. When the next request comes in, it constructs the response for that HTTP status code and stores it in the cache. Any following requests in the 500ms window using the same status code get a cache hit.
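Here’s a sketch of that lazy-fill cache. Haywire is built on libuv, so a uv_timer_t is the natural way to get the 500ms tick; the cache layout and function names below are illustrative, not Haywire’s actual code:

#include <stdio.h>
#include <uv.h>

/* Sketch only: a lazily filled response cache. Every 500ms a libuv timer
 * invalidates the entries; the next request for a given status code
 * rebuilds its entry once, and later requests in the window hit the cache. */
#define MAX_STATUS 600

typedef struct
{
    int valid;
    char prefix[256];   /* status line + Server + Date headers */
    size_t length;
} cached_response;

static cached_response response_cache[MAX_STATUS];
static uv_timer_t cache_timer;

static void clear_cache(uv_timer_t* handle)
{
    (void)handle;
    /* Cheap: just invalidate; nothing gets rebuilt until it's needed. */
    for (int i = 0; i < MAX_STATUS; i++)
        response_cache[i].valid = 0;
}

const char* get_cached_prefix(int status_code, const char* status_line,
                              const char* server_header, const char* date_header)
{
    cached_response* entry = &response_cache[status_code];

    if (!entry->valid)
    {
        /* Cache miss: build the static prefix for this status code once. */
        entry->length = (size_t)snprintf(entry->prefix, sizeof(entry->prefix),
                                         "%s%s%s", status_line, server_header,
                                         date_header);
        entry->valid = 1;
    }

    return entry->prefix;   /* cache hit for the rest of the 500ms window */
}

void start_cache_timer(uv_loop_t* loop)
{
    uv_timer_init(loop, &cache_timer);
    uv_timer_start(&cache_timer, clear_cache, 500, 500);  /* fire every 500ms */
}

The point of the lazy fill is that only the status codes actually served in a given window pay the construction cost.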

Reducing what was adding 4-ish million string function calls per second to around 1 million was a huge gain. I quickly learned that avoiding string functions is key to keeping Haywire fast. Some of these calls still happen, but far fewer than before. Response caching was a big win for making that happen, but there was another big win I’ll blog about soon.

  • http://haacked.com/ Haacked

    Great write-up! Very cool approach to optimizing perf! I love seeing novel uses of precalculation and amortizing them across multiple uses like this to eke out performance gains.

    p.s. typo in the 3rd paragraph “funcitons”

  • http://blog.scooletz.com/ Scooletz

    What a nice optimization! Have you used any tools besides talking with C people?

    As I currently go through the internals of strings in .NET, I sometimes miss the ability to operate on pure char* and do it without plenty of reallocations. Waiting for your managed wrapper of Haywire. I’m curious which direction you will take: will there be strings, or maybe some unsafe pointerish structs as well?

  • kellabyte

    Scooletz,

    I think you’ll find the next blog post about Haywire interesting. It’ll talk about how I handled strings :)

  • http://byterot.blogspot.com Ali

    Back in 2006, there was a bunch of C++ desktop guys (a 3rd-party company) that did a web application/service for us.

    The application wasn’t really doing a lot but was running like a dog. It was an ISAPI, which is supposed to be super fast. They had cobbled together all the various operations, including HTTP and even XML parsing.

    The profiling showed that the bottleneck was… no surprise… string functions.