Aug
22

Further reducing memory allocations and use of string functions in Haywire

By kellabyte  //  Performance  //  3 Comments

I wrote previously about how I learned that calling string functions like strlen() and strcat() several million times per second really hurt performance in Haywire. Fixing that helped regain requests per second and network throughput.

After looking at some other really high-performance C code (LMDB, for example) and speaking with Howard Chu, I quickly learned that I was still calling too many string functions, that I was calling malloc() way too much, and that memory allocation is expensive. At that point I went off and learned all kinds of techniques for improving memory allocation performance, such as slab allocators. I wasn’t prepared yet to write a memory pool or anything like that in Haywire, and I didn’t want to just cover up my excessive malloc() calls by introducing one. I wanted to write better code first.

The LMDB code base has a struct that looks like this.

typedef struct MDB_val {
    size_t mv_size;   /**< size of the data item */
    void *mv_data;    /**< address of the data item */
} MDB_val;

At first I didn’t quite get how this struct was going to save me a lot of CPU cycles. Then a bunch of people on Twitter started asking when I was going to write an OWIN layer for Haywire, and I started to ponder how I was going to deliver good performance to the managed world through P/Invoke. That’s when it hit me.

A string on the CLR or JVM already carries its length. A struct like this lets me concatenate multiple instances into the response string without ever calling strlen() to figure out how many bytes I need to memcpy(), because the size member of the struct already tells me.
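As a rough sketch of that idea (this is not Haywire or LMDB code), the block below concatenates length-carrying strings with memcpy() and never calls strlen(). The sized_str type, the SIZED_LITERAL macro, and the concat_parts() and build_example_response() functions are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string, modeled loosely on LMDB's MDB_val. */
typedef struct {
    size_t length;       /* number of bytes, excluding any terminator */
    const char *value;   /* address of the bytes */
} sized_str;

/* Build a sized_str from a string literal; sizeof is resolved at compile time. */
#define SIZED_LITERAL(s) { sizeof(s) - 1, (s) }

/* Concatenate `count` parts into one malloc'd, NUL-terminated buffer.
   Returns NULL on allocation failure; the caller frees the result. */
char *concat_parts(const sized_str *parts, size_t count, size_t *out_len)
{
    assert(parts != NULL && out_len != NULL);

    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += parts[i].length;   /* no strlen(): lengths are already known */

    char *buf = malloc(total + 1);
    if (buf == NULL)
        return NULL;

    char *cursor = buf;
    for (size_t i = 0; i < count; i++) {
        memcpy(cursor, parts[i].value, parts[i].length);
        cursor += parts[i].length;
    }
    *cursor = '\0';
    *out_len = total;
    return buf;
}

/* Usage example: assemble a tiny HTTP response head with a single malloc(). */
char *build_example_response(size_t *out_len)
{
    static const sized_str parts[] = {
        SIZED_LITERAL("HTTP/1.1 200 OK\r\n"),
        SIZED_LITERAL("Content-Type: text/html\r\n"),
        SIZED_LITERAL("\r\n"),
    };
    return concat_parts(parts, sizeof parts / sizeof parts[0], out_len);
}
```

The whole response costs one pass to sum the lengths, one malloc(), and one memcpy() per part, regardless of how long the individual strings are.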

The second realization was that this also allows me to reduce malloc() calls and do all kinds of zero-copy optimizations. I was calling strdup() all over the place before, and I was able to eliminate almost all of those calls. I think I can do better on the ones that remain.

Now, in a function that receives a routed request, you can do this and Haywire won’t allocate any memory internally.

hw_set_response_header(response, &content_type_name, &content_type_value);
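To illustrate why a call like this needs no allocation, here is a minimal sketch that stores pointers to the caller's strings in a fixed-size table instead of strdup()-ing them. It is an assumption about the general technique, not Haywire's implementation: sketch_string, sketch_response, sketch_set_header(), sketch_headers_size(), and the 16-header limit are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-carrying string (an assumed shape, not Haywire's definition). */
typedef struct {
    const char *value;
    size_t length;
} sketch_string;

#define SKETCH_MAX_HEADERS 16   /* arbitrary limit chosen for this sketch */

typedef struct {
    const sketch_string *name;
    const sketch_string *value;
} sketch_header;

typedef struct {
    sketch_header headers[SKETCH_MAX_HEADERS];
    size_t header_count;
} sketch_response;

/* Record a header by pointer only: no malloc(), no strdup(), no strlen().
   The caller must keep `name` and `value` alive until the response is written.
   Returns 0 on success, -1 if the header table is full. */
int sketch_set_header(sketch_response *response,
                      const sketch_string *name,
                      const sketch_string *value)
{
    assert(response != NULL && name != NULL && value != NULL);
    if (response->header_count == SKETCH_MAX_HEADERS)
        return -1;
    response->headers[response->header_count].name = name;
    response->headers[response->header_count].value = value;
    response->header_count++;
    return 0;
}

/* Bytes needed to serialize every header as "Name: value\r\n",
   computed from the stored lengths alone. */
size_t sketch_headers_size(const sketch_response *response)
{
    size_t total = 0;
    for (size_t i = 0; i < response->header_count; i++) {
        total += response->headers[i].name->length + 2;    /* ": "   */
        total += response->headers[i].value->length + 2;   /* "\r\n" */
    }
    return total;
}
```

The trade-off is ownership: because nothing is copied, the strings passed in have to outlive the response. That is easy for static header names and values, and needs more care for data built per request.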

Introducing some zero-copy optimizations eliminated millions of malloc() calls per second.

When I complete the CLR wrapper for Haywire, we won’t need to strlen() anything on the C side of the P/Invoke fence as long as I pass the structs across, because on the CLR the length is already stored with the byte buffer and exposed by the Length property.

I created the hw_string struct during this work, and we gained significant performance, to the point that the 800 Mbps network link is holding Haywire back while the CPU is barely breaking a sweat. I hope to decrease CPU usage further so that I can sustain the same results with less CPU.

Haywire’s code base isn’t all transitioned to hw_string yet, and I plan to finish that, but the major areas I did rework have made really noticeable gains.

 

  • http://dcreager.net/ Douglas Creager

    Removing malloc calls is a great suggestion. C has a reputation for being hard to work with since you have to manage memory allocation and deallocation by hand, but I’ve never found that to be the hard part — just make sure you know which part of your code “owns” each object and everything else falls into place. Eliminating as much memory allocation as possible is the more important part (at least for efficiency, which is presumably why you’re coding in C in the first place (besides the fact that C is awesome!)). If you can get the number of mallocs to be constant, and not tied to the amount of data or number of requests you’re processing, then you’re golden.

    And my favorite part is that this rule of thumb is just as important in GC languages like Java! “You should use a GC language so you don’t have to worry about memory allocation. Oh, and make sure to keep track of your memory allocation so your GC pause times aren’t too large!”

  • http://dcreager.net/ Douglas Creager

    Also if you want to play around with memory pools, managed buffers, etc, without having to write them, you can take a look at libcork (github, docs), which is where I’ve been putting some of this stuff as I’ve needed it for work projects. Basically a lighter-weight alternative to glib, apr, etc. It should be easy to embed into Haywire’s source tree, if you don’t want to declare it as an external dependency.

  • gimenete

    This reminds me of this article by Joel Spolsky: Back to Basics http://www.joelonsoftware.com/articles/fog0000000319.html