As you build more advanced solutions, you may need collaborative processing that spans multiple systems. Because failures are normal in these environments, some activities may fail while others succeed, which yields inconsistent outcomes. There are mechanisms to handle these situations, such as coordinated two-phase commit (distributed transactions), but in the world of higher scale and cloud computing these options are no longer suitable.
The Saga interaction pattern was designed to handle these failures. A Saga is a long-lived transaction split into steps that may interleave with other work, where each step has an associated compensating transaction. Together the compensating transactions provide a compensation path across databases when a fault occurs, which may or may not compensate the entire chain back to the originator.
In the 1987 paper that introduced the Saga pattern, Hector Garcia-Molina and Kenneth Salem presented two possible implementations. In the first, the Saga is implemented directly in the DBMS (database management system) alongside the transaction coordinator, where the Saga log and the database transaction log could be merged. In the second, the Saga is implemented on top of an existing DBMS using save points.
The original paper intended Sagas for use within or on top of a database. A Saga is useful in other contexts as well, but this doesn’t change the fact that a Saga is about compensation across systems.
A Saga is a set of rules for routing a job to multiple collaborating parties, and allowing these parties to backtrack and/or take corrective action in the case of failure.
There are frameworks and examples out there that use the name Saga but don’t exhibit the properties defined by Hector Garcia-Molina, and that resemble a state machine or workflow instead of a Saga. A Saga may be built on top of a workflow, and a workflow is built on top of a state machine, but state machines and workflows are not Sagas.
The machine is in only one state at a time; the state it is in at any given time is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is called a transition.
A state machine has a set of defined states. Each state has a set of defined operations available.
An example is a car alarm. When the car alarm is engaged, the possible operations are to disengage the alarm or trip the alarm.
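The car alarm can be sketched as a small transition table. This is a minimal illustration, not a reference design: only the engaged state and its two operations (disengage, trip) come from the example above, while the `DISENGAGED` and `TRIPPED` states, the `engage` event, and all class and function names are assumptions added to make the sketch complete.

```python
from enum import Enum, auto


class State(Enum):
    DISENGAGED = auto()  # assumed state, not named in the text
    ENGAGED = auto()
    TRIPPED = auto()     # assumed state, not named in the text


# (current state, event) -> next state.
# Any pair that is not listed here is not a valid transition.
TRANSITIONS = {
    (State.DISENGAGED, "engage"): State.ENGAGED,
    (State.ENGAGED, "disengage"): State.DISENGAGED,
    (State.ENGAGED, "trip"): State.TRIPPED,
    (State.TRIPPED, "disengage"): State.DISENGAGED,
}


class CarAlarm:
    def __init__(self):
        # The machine is in exactly one state at a time.
        self.state = State.DISENGAGED

    def available_operations(self):
        """Operations defined for the current state."""
        return sorted(e for (s, e) in TRANSITIONS if s is self.state)

    def fire(self, event):
        """Apply a triggering event; reject events the current state does not define."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state
```

After `fire("engage")`, `available_operations()` returns only `disengage` and `trip`, which matches the example: the set of operations depends on the current state, and transitions happen in response to events.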
Kevin Junghans describes a workflow as
A workflow represents a sequence of activities. The transition between each activity, or step, occurs when a previous activity is completed. Workflows can have decisions on transitions that can cause branching to other activities. Workflows are commonly used to depict business processes.
Kevin goes on to state
The main difference between our state machine and activity diagram (i.e. workflow) is that the focus is on actions instead of states and the transitions occur when an action is completed, instead of when an event occurs.
In Service-oriented architectures an application can be represented through an executable workflow, where different, possibly geographically distributed, service components interact to provide the corresponding functionality, under the control of a Workflow Management System.
A workflow can be built on top of a state machine and is similar to a program in that it has a defined set of steps that can alter the control flow based on the results of each step execution.
An example is creating an IT ticket because your computer won’t boot. The user submits a ticket into the system, the system emails the IT staff, the IT staff fixes your computer and the ticket is closed.
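The ticket example can be sketched as a workflow in which each activity runs to completion and then names the next activity. This is only an illustration under stated assumptions: the function names, the `ticket` dictionary, and the retry decision in `fix_computer` (controlled by a hypothetical `attempts_needed` field) are not part of the original example.

```python
def submit_ticket(ticket):
    ticket["status"] = "submitted"
    return "notify_staff"


def notify_staff(ticket):
    ticket["notified"] = True  # stand-in for emailing the IT staff
    return "fix_computer"


def fix_computer(ticket):
    ticket["attempts"] = ticket.get("attempts", 0) + 1
    # Decision on the transition (assumed): repeat the fix until it succeeds.
    if ticket["attempts"] < ticket.get("attempts_needed", 1):
        return "fix_computer"
    return "close_ticket"


def close_ticket(ticket):
    ticket["status"] = "closed"
    return None  # no further activity: the workflow ends


ACTIVITIES = {
    "submit_ticket": submit_ticket,
    "notify_staff": notify_staff,
    "fix_computer": fix_computer,
    "close_ticket": close_ticket,
}


def run_workflow(ticket, start="submit_ticket"):
    """Run activities in sequence; each transition fires when an activity completes."""
    history = []
    step = start
    while step is not None:
        history.append(step)
        step = ACTIVITIES[step](ticket)
    return history
```

Note where the focus lies compared with a state machine: the transition is driven by the completion of an action rather than by an external event, and the result of a step can alter the control flow.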
The Process Manager Pattern
The Process Manager pattern from the Enterprise Integration Patterns book by Gregor Hohpe and Bobby Woolf is a workflow pattern and more closely resembles the implementations we are seeing in the community that are incorrectly being labelled Sagas.
A Saga is a distribution of multiple workflows across multiple systems, each providing a path (fork) of compensating actions in the event that any of the steps in the workflow fails.
Figure 1. Saga compensating due to a failed transaction (T4).
In the 1992 publication ACTA: The SAGA Continues, Panos K. Chrysanthis and Krithi Ramamritham describe the characteristics of a Saga as
Sagas have been proposed as a transaction model for long lived activities. A saga is a set of relatively independent (component) transactions T1, T2…Tn which can interleave in any way with component transactions of other sagas. Component transactions within a saga execute in a predefined order which, in the simplest case, is either sequential or parallel (no order).
Each component transaction Ti (0 ≤ i < n) is associated with a compensating transaction CTi. A compensating transaction CTi undoes, from a semantic point of view, any effects of Ti, but does not necessarily restore the database to the state that existed when Ti began executing.
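The pairing of each component transaction with a compensating transaction can be sketched in a few lines. This is a single-process illustration of the compensation semantics only, under several assumptions: the names `Step`, `run_saga`, and `make_step` are hypothetical, steps run sequentially, and compensations run in reverse order of completion. It deliberately leaves out the parts that make a real Saga distributed (routing between participants, a durable Saga log, and the absence of centralized state).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[List[str]], None]        # component transaction Ti
    compensation: Callable[[List[str]], None]  # compensating transaction CTi


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on a fault, compensate the completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action(log)
        except Exception as exc:
            log.append(f"{step.name} failed: {exc}")
            # The failed step did not complete, so only earlier steps compensate.
            for done in reversed(completed):
                done.compensation(log)
            return False
        completed.append(step)
    return True


def make_step(i: int, fail: bool = False) -> Step:
    """Build a toy step that records its work in the log (illustrative only)."""
    def action(log: List[str]) -> None:
        if fail:
            raise RuntimeError("simulated fault")
        log.append(f"T{i}")

    def compensation(log: List[str]) -> None:
        # Semantic undo: records a counter-action, does not restore prior state.
        log.append(f"CT{i}")

    return Step(f"T{i}", action, compensation)


log: List[str] = []
ok = run_saga([make_step(i, fail=(i == 4)) for i in range(1, 6)], log)
```

With T4 failing, the log records T1, T2, T3, the failure of T4, and then CT3, CT2, CT1, while T5 never runs. The compensations append new entries rather than erasing the earlier ones, which mirrors the point in the quote: the effects are undone semantically, not by restoring the original state.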
Furthermore, there are other patterns that build on the Saga pattern, such as Kangaroo Transactions (Margaret H. Dunham, 1997). Kangaroo transactions deal with transactions in mobile environments that hop from one base station to another as the mobile unit moves through cells. Similar to the Saga pattern, a kangaroo transaction is designed for the purpose of compensating in the event of failure.
Saga is similar to a transaction in the sense that it provides a shared context for an attempt to get a distributed consensus. Unlike a transaction, which ensures ACID properties, Sagas do not.
When a saga is aborted the only thing the coordinator can do is pass the status to the participants. Each of the services is responsible to do its best effort to handle the abort (either by rolling back, compensation or whatever).
Workflow is another thing altogether, which keeps a context between calls and means externalizing the decisions on the logic flow from the business logic (usually with a workflow engine). You can use workflows within a service (a pattern I call workflodize) or you can use them externally (a pattern I call orchestrated choreography e.g. BPM).
You can use either form of workflow to support the implementation of a saga but you can also implement sagas without workflows.
Example of a Saga
Clemens Vasters has kindly implemented an example of a Saga that he describes on his blog. I highly recommend reading his explanation of the example. The key differences from implementations that call themselves Sagas but are not are the lack of centralized coordination and the lack of centralized state. As Arnon Rotem-Gal-Oz stated above: “a shared context”.