In the industry there is a lot of reuse of storage engines, because writing a good, reliable and efficient one is a tough challenge and takes a lot of time to get right. If a database implements its own storage engine, I'm likely to be skeptical of the implementation until it has been proven to be solid. It takes time to iron out all the bugs, so if someone tries to sell you a solution built on a storage engine they whipped up in a short amount of time, think carefully before trusting it with your data. In general, I rarely want to be the trailblazer with this kind of thing in a production system.
In some of my database experiments on GitHub I used Google's LevelDB as the backing store. LevelDB is generally a good storage engine, but it's important to remember that it was designed and developed for a mobile web browser. It's a testament to its design that people are using LevelDB in databases running on servers, but it has limitations and several people have run into issues. Some of its design decisions clearly show a mobile influence that doesn't scale well to hardware with more capacity.
I know of some companies that have made modifications in private forks, and there are also open source forks that make significant changes to fix major issues and performance problems experienced in service and database workloads.
If you're thinking of using LevelDB on the server, or of copying the LevelDB design or code to write a new storage engine, I highly recommend looking at these forks because they are much better suited to these kinds of workloads.
I'm aware of a dozen or so database and service projects that have taken some influence from my experiments, so I feel compelled to share this information.
Basho has made significant improvements to LevelDB for use with Riak, which you can find here. Take a look at the GitHub issues and closed items to get a sense of the kind of problems I'm talking about. Matthew Von-Maszewski of Basho gave an absolutely phenomenal deep-dive talk on some of the problems in LevelDB and how they fixed them (one of my favorite talks ever). A line from the talk that stuck with me: "Measuring your performance against LevelDB is like running a race against someone with their 2 legs cut off".
HyperDex has also made significant changes in a fork they call HyperLevelDB.
Thanks to Justin Sheehy for the links!