Saga Choreography implementation problems

Question

I am designing and developing a microservice platform based on the specifications of http://microservices.io/

The entire framework communicates over sockets, which avoids the overhead of issuing multiple HTTP requests (as most REST APIs do).

A service registry host keeps the registrations of multiple microservice hosts; each microservice is responsible for one business domain. Another host, which we call a router (or API gateway), is responsible for exposing the microservices for consumption by third parties.

We will use Sagas (in the choreography style) to distribute requests, so we have some questions:

Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events? (the same logic applies to rollback)
Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?
If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?

I think the main point is this: in this router and microservice structure, who is responsible for building the Sagas and propagating their events?

We ended up adopting an event sourcing model (using Kafka) with one microservice per business domain. Every microservice has all the data it needs to do its job. After any resource is created, a Kafka message is published, and other microservices can get the resource's data from that message. We don't need transactions involving more than one microservice; all tasks are done asynchronously. To implement the choreography model, you have to think of every operation as an event. - Victor França

Byron Ruth · Accepted Answer · 2018-06-18T13:32:10

The article Patterns for Microservices — Sync vs. Async does a great job defining many of the terms used here and has animated gifs demonstrating sync vs. async and orchestrated vs. choreographed as well as hybrid setups.

I know the OP answered his own question for his use case, but I want to try to address the questions raised a bit more generally, in light of the linked article.

Should a microservice issue the event in any process manager or should it be passed directly to the next microservice responsible for the chain of events?

To use a more general term, a process manager is an orchestrator. A concrete implementation of this may involve a stateful actor that orchestrates a workflow, keeping track of the progress in some way. Since a saga is itself a workflow (composed of both forward and compensating actions), it would be the job of the process manager to keep track of the state of the saga until completion (success or failure). This typically involves the actor sending synchronous calls to services and waiting for each result before going to the next step. Parallel operations can of course be introduced, but the point is that this actor dictates the progression of the saga.
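As a rough illustration of the orchestrated style described above, here is a minimal in-process sketch. The names `Step`, `SagaOrchestrator`, and the state strings are hypothetical and chosen for this example only; a real process manager would persist its state and call remote services rather than local functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]      # forward action
    compensate: Callable[[dict], None]  # undoes the forward action


@dataclass
class SagaOrchestrator:
    """Hypothetical process manager: it alone decides what runs next."""
    steps: List[Step]
    state: str = "PENDING"
    completed: List[str] = field(default_factory=list)

    def run(self, ctx: dict) -> str:
        self.state = "RUNNING"
        done: List[Step] = []
        for step in self.steps:
            try:
                step.action(ctx)  # synchronous call: wait before moving on
            except Exception:
                # Compensate the steps that already succeeded, newest first.
                for prev in reversed(done):
                    prev.compensate(ctx)
                    self.completed.remove(prev.name)
                self.state = "FAILED"
                return self.state
            done.append(step)
            self.completed.append(step.name)
        self.state = "COMPLETED"
        return self.state
```

Because the orchestrator holds `state` and `completed`, the progress of the saga can be read from a single place at any time, which is the property the choreographed style gives up.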

This is fundamentally different from the choreography model. With this model there is no central actor keeping track of the state of a saga; rather, the saga progresses implicitly via the events that each step emits. Arguably, this is a purer case of an event-driven model since there is no central coordination.
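To make the contrast concrete, here is a small sketch of the choreographed style. The `EventBus` class is an in-memory stand-in for a real broker, and the service functions and event names (`OrderCreated`, `PaymentCompleted`, and so on) are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Tiny in-memory stand-in for a message broker (illustrative only)."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.log: List[str] = []  # every event type, in publication order

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)


def wire_services(bus: EventBus) -> None:
    """Each service knows only which events it reacts to and which it emits."""

    def payment_service(order: dict) -> None:
        if order["amount"] <= order["credit"]:
            bus.publish("PaymentCompleted", order)
        else:
            bus.publish("PaymentFailed", order)

    def shipping_service(order: dict) -> None:
        bus.publish("OrderShipped", order)

    def order_compensation(order: dict) -> None:
        # Compensating action, triggered by a failure event.
        bus.publish("OrderCancelled", order)

    bus.subscribe("OrderCreated", payment_service)
    bus.subscribe("PaymentCompleted", shipping_service)
    bus.subscribe("PaymentFailed", order_compensation)
```

Publishing a single `OrderCreated` event is enough to drive the whole chain. No component holds the plan as a whole; it exists only as the sum of the subscriptions.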

That said, the challenge with this model is observing the state at any given point in time. With the orchestration model above, in theory, each actor could be queried for the state of the saga. In the choreographed model we don't have this luxury, so in practice a correlation ID identifying the saga is added to every message. If the messages are queryable in some way (the event bus supports it, or they are stored somewhere else), then the messages belonging to a saga can be queried and the saga state reconstructed (effectively an event-sourced model).
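A minimal sketch of that reconstruction, assuming the messages are available as an ordered, queryable list. The `Message` type, the `TERMINAL` mapping, and the event names are hypothetical; which events count as terminal depends entirely on the saga in question.

```python
from typing import Iterable, NamedTuple


class Message(NamedTuple):
    correlation_id: str  # identifies the saga this message belongs to
    event_type: str


# Assumed terminal events for this example saga.
TERMINAL = {"OrderShipped": "COMPLETED", "OrderCancelled": "COMPENSATED"}


def saga_state(messages: Iterable[Message], correlation_id: str) -> str:
    """Derive a saga's state from the stored messages that share its ID."""
    history = [m.event_type for m in messages if m.correlation_id == correlation_id]
    if not history:
        return "UNKNOWN"
    last = history[-1]
    return TERMINAL.get(last, f"IN_PROGRESS (last event: {last})")
```

The state is never stored anywhere; it is derived on demand by filtering the message log on the correlation ID and looking at the most recent event.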

Who should know how to build the Saga chain of events? The first microservice that receives a certain work or the router?

This is an interesting question by itself and one that I have been thinking about quite a lot. The easiest and default answer would be to hard-code the saga plans and map them to the incoming message types, e.g. message A triggers plan X, message B triggers plan Y, etc.
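The hard-coded approach can be as simple as a lookup table. In this sketch the message types and step names are invented for illustration, and a plan is just an ordered list of step names that whichever component receives the message would act on.

```python
from typing import Dict, List

# Hypothetical, hard-coded mapping from incoming message type to saga plan.
SAGA_PLANS: Dict[str, List[str]] = {
    "CreateOrder": ["reserve_stock", "charge_payment", "schedule_shipping"],
    "CancelOrder": ["refund_payment", "release_stock"],
}


def plan_for(message_type: str) -> List[str]:
    """Return a copy of the plan so callers cannot mutate the table."""
    try:
        return list(SAGA_PLANS[message_type])
    except KeyError:
        raise ValueError(f"no saga plan registered for {message_type!r}") from None
```

Changing a plan here means redeploying whichever service owns the table, which is the limitation a dynamic control plane would address.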

However, I have been thinking about what a control plane might look like that manages these plans and provides a mechanism for pushing changes dynamically to message handlers and/or orchestrators. The two specific use cases I have in mind are changes in authorization policies and adding new steps to a plan at runtime.

If an event needs to pass a very large volume of data to the next Saga event, how is this done in terms of the request structure? Is it divided into multiple Sagas for example (as a result pagination type)?

The way I have approached this is to include a reference to the large data in the message when the data is a discrete object such as a file. For data that is inherently a stream, the message could reference a parallel channel that a consumer reads from once it receives the message. I think the important distinction here is to decouple the messages driving the workflow from where the data is physically materialized, which depends on the data representation.
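Passing a reference instead of the data is sometimes called the claim-check pattern. Here is a minimal sketch, using a directory on disk as a stand-in for shared object storage; `BlobStore`, `make_event`, `handle_event`, and the event fields are all hypothetical names for this example.

```python
import json
import tempfile
import uuid
from pathlib import Path


class BlobStore:
    """Stand-in for shared storage that both producer and consumer can reach."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex
        (self.root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def make_event(store: BlobStore, correlation_id: str, payload: bytes) -> str:
    """Producer: store the payload, publish only a small reference to it."""
    ref = store.put(payload)
    return json.dumps({
        "type": "ReportGenerated",  # assumed event name
        "correlation_id": correlation_id,
        "data_ref": ref,
        "size": len(payload),
    })


def handle_event(store: BlobStore, raw_event: str) -> bytes:
    """Consumer: resolve the reference only when the data is needed."""
    event = json.loads(raw_event)
    return store.get(event["data_ref"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = BlobStore(Path(tmp))
        event = make_event(store, "saga-1", b"x" * 1_000_000)
        print(len(event), len(handle_event(store, event)))
```

The message that drives the workflow stays small regardless of the payload size, and the consumer decides when, or whether, to fetch the data.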
