The other day I had a long conversation on Twitter about serialization and how I’ve changed my stance.
In the beginning I loved the convenience of serializing data from formats like XML or JSON into classes (which, in practice, more resemble structs).
I got into many debates where I held this opinion strongly.
But later on I built a system that could not be shut off to deploy updates. Availability was important, and the system was large enough that an upgrade had to be a rolling operation: multiple versions of the system would be online in production until the cluster reached consensus on a single version.
During these operations I witnessed something that completely changed my opinion on serialization (I can hear Darrel Miller saying “I told you so!”).
The components that used serialization were more brittle than the ones that did not.
When a v1 service receives a v2 message and deserializes it into its v1 struct, the v2-only data is lost. If that service then does something that ends up on another v2 service, the v2 context is no longer available.
Some messaging frameworks and document databases make serialization very convenient and lossless handling more difficult, and you may not find out until it hits production, because it isn’t obvious up front that you’re making this trade-off.
Serialization into a typed struct is a lossy copy operation. It doesn’t matter whether it’s for messaging or a database: if you deserialize v2 into a v1 struct and then serialize again, you end up with v1. It also doesn’t matter how flexible the wire format is, protobuf included. If you serialize from the struct, which is a lossy copy of the original data, the format’s flexibility can’t help you, because you’ve already left the data behind.
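To make the loss concrete, here is a minimal sketch in Python. The message shape and field names (`order_id`, `amount`, `priority`) are made up for illustration; `priority` stands in for a field added in v2 that the v1 struct has no slot for.

```python
import json
from dataclasses import dataclass

# Hypothetical v1 shape of a message. It predates v2, so it has
# no field for anything v2 added (here, "priority").
@dataclass
class OrderV1:
    order_id: str
    amount: int

v2_message = '{"order_id": "42", "amount": 100, "priority": "high"}'

# Deserialize into the v1 struct: the copy silently drops "priority".
data = json.loads(v2_message)
order = OrderV1(order_id=data["order_id"], amount=data["amount"])

# Re-serialize from the struct: the v2 field is gone,
# no matter how version-tolerant the wire format itself is.
round_tripped = json.dumps({"order_id": order.order_id, "amount": order.amount})
print(round_tripped)  # {"order_id": "42", "amount": 100}
```

The struct, not the format, is where the data dies: everything the struct has no field for is discarded at the moment of the copy.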
During those upgrades, the systems that didn’t rely on serialization had no problems. Instead of making a lossy copy of the data, they interpreted it: they coupled only to the fields they knew how to interpret, without losing the fields they didn’t know existed.
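A sketch of that interpreting style, using the same hypothetical message: the v1 handler reads and updates only the one field it understands (`status`, an assumed name) and passes the rest of the document through untouched.

```python
import json

# Hypothetical v1 handler that interprets the document in place
# rather than copying it into a v1 struct. It touches only the
# field it knows about and forwards everything else as-is.
def mark_processed(raw_message: str) -> str:
    doc = json.loads(raw_message)   # generic dict, not a typed struct
    doc["status"] = "processed"     # the one field v1 understands
    return json.dumps(doc)          # unknown v2 fields survive the hop

v2_message = '{"order_id": "42", "status": "new", "priority": "high"}'
forwarded = mark_processed(v2_message)
print(forwarded)  # "priority" is still there for downstream v2 services
```

The v1 service stays ignorant of `priority` but never destroys it, so a v2 service later in the chain still sees the full v2 context.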