Now that I’ve introduced Haywire, I’m going to dig into some of the things I’ve done so far to get the performance it has.
At first, Haywire was parsing HTTP requests and routing them to handler functions registered to a URI, but it was still sending static responses. Benchmarking an HTTP server in this state isn’t all that interesting since it wasn’t doing anything dynamic, but I still used it as a baseline to compare against so that when I added code I would know when I introduced something really bad.
I then created a response struct and started writing the functions that support dynamic construction of the response and let you set things like the HTTP status code, headers, and the body.
Performance before (keep an eye on requests/s, CPU and network rates)
Benchmarking showed I had really killed performance. Comparing the two graphs, the newer version was handling 178,000 fewer requests/second while using 40% more CPU and pushing 200 Mbps less network traffic than the previous version. At first I thought this was normal: Haywire wasn’t doing anything actually interesting before, and now that it is, this is just the price to pay for implementing real features. I wasn’t comfortable with that, though, so I spent a few days thinking about what to do.
After talking to a few people I realized that just because I’m writing C doesn’t mean I’m doing things efficiently. I learned that functions like strlen() and strcat() may be fast enough in isolation, but not when millions of calls are happening every second.
An example of an HTTP response looks like this:
HTTP/1.1 200 OK
Server: Haywire/master
Date: Fri, 06 Aug 2013 00:31:53 GMT
Connection: Keep-Alive
Content-Type: text/html
Content-Length: 14

hello world
I was using strlen() and strcat() to construct this response from the status code, headers, and body in the response struct. A dozen calls to strlen() and strcat() were happening on every request; multiply that by 400,000 requests/second and you quickly get millions of slow string function calls happening every second.
HTTP response caching
I’ve done a few things to reduce the string functions I call, but the first big impact came from introducing an HTTP response cache. After looking at the above response I realized there are parts of it that never change or change infrequently.
HTTP/1.1 200 OK
There are currently only 2 protocol versions (1.0 and 1.1) and well under 100 response status codes, so it makes sense to pre-construct each combination of this string, hold it in memory, and never concatenate it again.
Server: Haywire/master

This isn’t going to change from one request to another, so it can also be constructed once and kept in memory.
Date: Fri, 06 Aug 2013 00:31:53 GMT
The date/time was an interesting one, and this header was ultimately where I had my “aha” moment. I did a bit of research, and as far as I can tell, HTTP servers typically only have up-to-the-second resolution on the Date header. That means I could construct all combinations of the first 3 lines of the response, cache them in memory once a second, and still have the proper time resolution. This reduced the number of times I called strlen() and strcat() significantly.
The first implementation would, every 500ms (just to give a little tighter resolution), clear the cache and re-construct all the possible combinations. This worked out really well, but I decided it wasn’t the ideal solution since I doubt all HTTP status codes get used every second. I still felt like I was doing wasteful work.
What I decided on in the end was that every 500ms a timer runs and clears the response cache, but doesn’t construct anything. When the next request comes in, it constructs the response for its HTTP status code and stores it in the cache. Any following requests in the same 500ms window using the same status code get a cache hit.
Reducing what was around 4-ish million string function calls per second to around 1 million was a huge gain. I learned quickly that avoiding string functions is key to keeping Haywire fast. Some of these still happen now, but far fewer than before. A big win in making that happen was response caching, but there was another big win I’ll blog about soon.