Write amplification, or “write cost,” describes how many bytes are physically written to disk for each logical byte of data written to the database.
Typically (but not always), when you write to a database the data is written to at least two different places:
- In the commit log, for durability, so that nothing is lost if a failure happens in the later stages.
- In a form that suits the orientation of the database's data model.
For example, Log-Structured Merge-Tree (LSM) based storage engines store SSTables (sorted string tables) on disk.
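As a rough illustration (not tied to any particular engine), write amplification can be expressed as the ratio of physical bytes the storage stack writes to the logical bytes the application asked to store. The breakdown below assumes a hypothetical LSM engine where every write hits the commit log, is flushed to an SSTable, and may be rewritten by compaction:

```python
def write_amplification(logical_bytes, commit_log_bytes, flush_bytes, compaction_bytes):
    """Ratio of total physical bytes written to logical bytes stored."""
    physical = commit_log_bytes + flush_bytes + compaction_bytes
    return physical / logical_bytes

# 1 GB of application data: logged once, flushed once,
# then rewritten twice by compaction.
gb = 1024 ** 3
print(write_amplification(gb, gb, gb, 2 * gb))  # → 4.0
```

A ratio of 4.0 means the disks did four bytes of work for every byte the application wrote, which is bandwidth unavailable to new writes.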
Often we see database or storage engine benchmarks that claim superiority based on runs of only a few minutes, but that isn't a realistic view of what the technology will face in a production environment.
The graph above shows CPU usage; the green dots are measurements of CPU I/O wait time, meaning the CPU is waiting for I/O requests to complete. The part of the graph visible in this image covers about 1 hour of the 3-hour test. It's interesting how the iowaits change over time, right? How will this look 48 hours from now?
If we look at the disk throughput measurements and zoom in to 30 minutes into the test, we see another interesting story: the writes are fairly consistent, and this continues through most of the rest of the test.
Unfortunately, the test I was running doesn't output records inserted per second, so I can't graph that for comparison. It does print a total record count on the screen, and that count was growing more slowly as time went on, even though disk throughput stayed fairly consistent. I'll be instrumenting the test to output these metrics so I can plot the trend more accurately.
What's this telling us? It hints that in the beginning X records were being written using a disk throughput rate of Y. Y was sustained, but X degraded, which suggests the write cost per record was going up the longer the test ran.
Write amplification was potentially getting worse over time.
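The reasoning above can be sketched with made-up numbers: if throughput Y holds steady while the insert rate X falls, the implied bytes written per record climbs. All figures here are hypothetical, chosen only to show the shape of the trend:

```python
# Hypothetical measurements over the course of the test: disk
# throughput stays flat while the observed insert rate degrades.
throughput_mb_s = [200, 200, 200, 200]              # Y: roughly constant
records_per_s = [100_000, 80_000, 60_000, 40_000]   # X: degrading

# Implied physical write cost per record, in bytes.
cost = [t * 1024 * 1024 / r for t, r in zip(throughput_mb_s, records_per_s)]
for sample, c in enumerate(cost):
    print(f"sample {sample}: ~{c:,.0f} bytes written per record")
```

Constant disk throughput divided by a shrinking record rate means each record is costing more physical writes, which is write amplification getting worse.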
The database measured used an LSM-based storage engine, and it was compacting, which means re-writing bytes that were previously written. Disk bandwidth was being spent on previously written records, which impacts new writes currently in progress. Compacting log segments isn't necessarily bad, but it's something you need to keep an eye on, because the compaction strategy may not align well with your production workload characteristics. Databases often offer multiple storage engine options or compaction strategies that can help with these sorts of things.
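A toy simulation (not any real engine's strategy) shows why compaction re-writes bytes: in a simple size-tiered scheme, whenever a fixed number of same-sized segments accumulate they are merged into one larger segment, writing all of their contents again, so a record can be rewritten several times over its lifetime:

```python
def simulate_flushes(num_flushes, flush_bytes, fanout=4):
    """Toy size-tiered compaction: when `fanout` segments of the same
    size accumulate, merge them into one segment, re-writing every byte."""
    tiers = {}      # segment size -> count of segments at that size
    rewritten = 0   # bytes written by compaction, beyond the flush itself
    for _ in range(num_flushes):
        size = flush_bytes
        tiers[size] = tiers.get(size, 0) + 1
        while tiers[size] == fanout:    # a full tier merges upward
            tiers[size] = 0
            size *= fanout
            rewritten += size           # the merged segment is written again
            tiers[size] = tiers.get(size, 0) + 1
    return rewritten

# 64 flushes of 1 unit each: compaction rewrites 3x the flushed data.
print(simulate_flushes(64, 1))  # → 192
```

Every byte here was flushed once but rewritten three more times by merges, and that rewriting competes with in-flight writes for disk bandwidth.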
In many cases I would rather have a slower database and storage engine that behaves predictably over long periods of time than one that is really fast in the beginning but can degrade badly later in production. Getting a call at 2 AM is never fun.