Yesterday I was reading a blog post by Jimmy Bogard where he describes using ACID 2.0 (Associative, Commutative, Idempotent, Distributed) to create a ledger for a point of sales system. I’m not going in depth on “ACID 2.0″ in this post but the idea is that you can provide correctness in distributed systems using different techniques.
One part of the post describes idempotency as:
Idempotent – we only record a transaction if we know we haven’t seen that transaction ID before
There are a few problems with this approach which make it difficult to ensure idempotency.
- It requires the entire history of transaction id’s to be available to query for de-duplication logic.
- The list of transaction id’s is unbounded.
- It requires every write do a read before the write.
- It penalizes every transaction commit with the cost of compensating for something that has a low rate of failure. Why make 95% of the commits pay the price for 5% of the problems?
In my experience, even single system databases with OLTP workloads with designs like this aren’t sustainable very long because the transaction id index grows unbounded. If the system does any kind of archiving of old transactions, you can no longer guarantee idempotency.
Providing correctness for a window of time can be a good solution and services in the cloud like Azure Service Bus do this to de-dupe messages that occur within a bounded time constraint (because you can’t hold every single message id). However this is not idempotency. A transaction can come in outside that window of time that alters the state of the system.
I also don’t like the fact I’m forcing a read before a write. Databases work hard at making append-only internals for performance gains and here we throw all that away by forcing a query before a commit.
Another problem is if this is a highly available system and you are sharding by some property say userid and the shards for the given userid are down you may still want to accept the transaction on other shards for the time being. Unfortunately these nodes do not have the history to make the decision about de-duplication and the write will be accepted.
Let me propose another solution.
We don’t actually have to incur the cost of de-duplication at transaction commit time. Instead we can amortizing the de-duplication cost at read time. Record the transaction, even the duplicates and for any query, query for unique transaction ids which will leave out the duplicates. SQL views, stored procs or the public API of this service will isolate the private implementation details (you shouldn’t be doing shared database anyway!).
There are a few options for dealing with the duplicate data. One approach is during reads we can issue a “garbage collection” that removes duplicate transaction ids in the background. Another approach is a periodic schedule to clean up duplicate transactions. Another value add to this approach is the GC process can report how many duplicates it cleans up and we can set alerts if a system is misbehaving and sending excessive duplicates. A lot of duplicates may indicate a problem elsewhere in the system.
This option has a predictable capacity by eliminating the unbound query which is a really good characteristic to have. We gain transaction performance by eliminating a read before a write and we don’t have to hold an index with high cardinality in memory to solve it.
This isn’t the only option, just one that came to my mind while reading the post.
- Responsible benchmarking
- Understanding hardware still matters in the cloud
- The “network partitions are rare” fallacy
- Messaging and event sourcing
- Further reducing memory allocations and use of string functions in Haywire
- HTTP response caching in Haywire
- Atomic sector writes and misdirected writes
- How memory mapped files, filesystems and cloud storage works
- Hello haywire
- Active Anti-Entropy
- Lightning Memory-Mapped Database
- Write amplification
- Amortizing de-duplication at read time instead of write time
- LevelDB was designed for mobile devices
- AMQP and wire format interopability
- Convergent Replicated Data Types
- Configuration is bad but what about operational flexibility?
- An alternative to Paxos, the RAFT consensus algorithm
- Version tolerance and accidental operation complexity
- Hardware configurations can introduce tight coupling and increase failure foot print
- November 2013
- October 2013
- August 2013
- July 2013
- June 2013
- May 2013
- April 2013
- March 2013
- January 2013
- October 2012
- September 2012
- August 2012
- May 2012
- April 2012
- February 2012
- January 2012
- December 2011
- September 2011
- July 2011
- June 2011
- May 2011
- April 2011
- March 2011
- February 2011
- December 2010
- November 2010
- October 2010
- September 2010
- August 2010
- July 2010
- June 2010
- May 2010