Why Event Sourcing?
Event Sourcing is a concept that is becoming more popular by the day. Even ThoughtWorks has brought it into its latest Technology Radar. Let's do a quick overview of ES one more time.
In essence, event sourcing is about persisting data in a way that preserves every single bit of information. It's about representing objects as a sequence of events that took place over time and led them to their current state.
For instance, if I were to persist information about my pocket money (e.g. 67 EUR), I could simply save the latest state somewhere in a variable or database:
Balance: 100 EUR
Now, whenever there is a change, we would overwrite this value with the new value (discarding the previous one). Then at some point in time we will have something like this:
Balance: 67 EUR
Simple and elegant (and it works perfectly in a large number of scenarios). However, we are performing a lossy logical compression here and discarding some information. Let's see what would happen if we were to preserve all the changes:
Got from ATM: 100 EUR
Bought metro tickets: -12 EUR
Grabbed a lunch: -8 EUR
Found a coin: 1 EUR
Took taxi: -14 EUR
Obviously, if we have such a sequence of events, we can always "reconstruct" the current balance by adding them up:
Balance: 100 - 12 - 8 + 1 - 14 = 67 EUR
In essence, the final state (Balance) is a left-fold function of the previous states (equivalent of IEnumerable.Aggregate in .NET, std::accumulate in C++ or array.reduce in JavaScript).
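The same fold can be sketched in Python with functools.reduce. This is only an illustration of the pocket money example above: the Event class, its field names and the apply function are hypothetical names introduced for the sketch, and the starting balance of 0 is an assumption.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Event:
    """One recorded change to the pocket money (hypothetical shape)."""
    description: str
    amount_eur: int  # positive for money received, negative for money spent


# The events from the example, in the order they happened.
events = [
    Event("Got from ATM", 100),
    Event("Bought metro tickets", -12),
    Event("Grabbed a lunch", -8),
    Event("Found a coin", 1),
    Event("Took taxi", -14),
]


def apply(balance: int, event: Event) -> int:
    """Return the balance that results from applying one event."""
    return balance + event.amount_eur


# Left fold: start from an assumed initial balance of 0 and apply
# each event in order to obtain the current state.
balance = reduce(apply, events, 0)
print(f"Balance: {balance} EUR")  # prints "Balance: 67 EUR"
```

Folding over only a prefix of the list, for example reduce(apply, events[:2], 0), gives the balance as it was after the second event, which is exactly the information the overwrite approach throws away.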
Continue Reading on Bliki...
Reader Comments (13)
Great stuff! This will work nicely as an introduction to colleagues who have not yet heard about ES.
Olav, glad you've liked it!
ES intro was precisely the intent :)
Jdon Framework is an open-source Event Sourcing framework: DDD + DCI + Domain Events. Here is an introduction: jivejdon.blogspot.com/
I find ES to be an interesting concept, and while I have re-implemented the simple CQRS sample by Greg Young and the sample app for the Ncqrs framework, the questions that remain unanswered but are the most critical for me are the following.
1) How should one go about making changes to the event definition/structure? Particularly if you need to rename a field, break up/split a field, replace a field, remove a field, or even break up/split an event, how should one proceed to a) apply the changes to the serialized event instances and b) apply the changes to subscribers/listeners (the units that project events into read models and other)? Most of the answers I get to this question go something like "if you model your domain right, you won't have to change your events much." I find that answer a bit naive and disingenuous. I am willing to consider that thinking about your domain in terms of streams of events may eliminate some of the challenges that come with modeling your domain in terms of entities, but I don't buy that the result will be a perfect domain model that will never have to change.
2) How does one deal with new subscribers/listeners (the units that project events into read models and other)? The answer I usually hear is to replay the events. Well, that part is rather self-evident: you need to repeat the information to the guy who just joined the conversation. What I'm interested in are things like: should one replay the events only for a specific subscriber, and if so, how is that done, and doesn't that eliminate the loose coupling between pub and sub? Or, if the answer is to replay the events on their usual channel and rely on the idempotency of the events, then my question is: won't this negatively impact performance in a large system if, every time a new sub is added, all the existing subs have to consider and then discard events they had previously received?
I hope you won't think my questions too terse but I think that these details are frequently glossed over and yet they are, in my view, the primary obstacles to begin implementing cqrs and event sourcing.
Great intro! One aspect to consider is that an event you created yesterday may not be loadable in the new version of the software. So, restoring state by reading events from the beginning of time is not guaranteed to work without knowing which version of the software created each event. This may have implications in some scenarios.
Another idea is to put events in an atom feed. Atom has a syntax for the temporal aspects. That may be suitable if you want to get a clean interface to the outside world.
Peter, event versioning is never a problem, if you handle it explicitly. I use in-memory upgraders (something to be blogged about) as a way to keep the entire event streams up-to-date without even rewriting them. There are other options as well.
As for the interfaces, the cleanest one that I found was about downloading files containing serialized event streams. Faster and independent from any syntax :)
Kristoffer, thank you for investing time in this comment.
In short: the solution to 1 in my situation lies in the conscious combination of the following: (a) use of an evolution-friendly serialization format (e.g. Google ProtoBuf, which allows renaming fields and can handle unknown data), (b) use of in-memory upgraders to make sure that old events are upgraded properly to the latest version (e.g. filling in missing fields, splitting old events or converting them completely) and (c) careful modeling effort that considers all these from the start. No silver bullet here, though.
The solution to 2 is purely technological (although I rarely care about the performance, since this could be handled later). The simplest way to do replays in a pub/sub scenario is for each subscriber to maintain a log of all events that it has received. If it goes down (or goes for a rebuild), then it first replays the old log and then continues listening to the new events. Essentially, this approach (one of many) follows the principles of building reliable distributed systems (esp. in cloud scenarios).
Does this help?
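To make point (b) a bit more concrete, here is a minimal sketch of what an in-memory upgrader could look like. It is an assumption-laden illustration, not the Lokad.CQRS implementation: the dictionary-based event shape, the "version" field, the renamed "name" field and all function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

EventData = Dict[str, object]
Upgrader = Callable[[EventData], EventData]


def upgrade_project_created_v1(event: EventData) -> EventData:
    """Hypothetical change: v2 renames "name" to "project_name" and adds "rank"."""
    upgraded = {key: value for key, value in event.items() if key != "name"}
    upgraded["project_name"] = event["name"]
    upgraded["rank"] = 0  # assumed default for events written before "rank" existed
    upgraded["version"] = 2
    return upgraded


# Registry keyed by (event type, stored version); each entry lifts an
# event one version up. All names here are illustrative.
UPGRADERS: Dict[Tuple[str, int], Upgrader] = {
    ("ProjectCreated", 1): upgrade_project_created_v1,
}


def upgrade(event: EventData) -> EventData:
    """Apply upgraders until no newer version is registered for the event."""
    current = event
    while (current["type"], current["version"]) in UPGRADERS:
        current = UPGRADERS[(current["type"], current["version"])](current)
    return current


def load_stream(stored: Iterable[EventData]) -> Iterator[EventData]:
    """Upgrade events while reading; the stored events are never rewritten."""
    for event in stored:
        yield upgrade(event)


stored_events = [
    {"type": "ProjectCreated", "version": 1, "project_id": 7, "name": "Demo"},
]
latest_events = list(load_stream(stored_events))
```

Because each upgrader builds a new dictionary, the persisted stream stays untouched and only the in-memory representation moves to the latest version.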
Rinat, thanks for following up. I hope you get a chance to blog about the in-memory upgraders. I imagine they are conceptually similar to database migrations but I'm sure there are important differences as well.
I understand what you are saying about the management of subscribers and I will have to spend a little more time learning about this space to get a handle on the implementation details.
BTW, I just took a look at the new site for Lokad.CQRS and it looks pretty good. Keep up the good work.
Kristoffer, upgraders are much simpler than any SQL I've worked with. I've got to write an article on that. Please don't hesitate to ping me if I don't deliver one in the next few weeks.
Rinat : "hint: if you have a lot of events with following words in their names, then you are doing something wrong: Create, Insert,"
Rinat :
"// projects
CreateProject? (projectId, name, int Rank, auth)
ProjectCreated! (projectId, name, int Rank, security)"
I am confused :)
Omari, key point was "a lot of events with the following words", which would indicate lack of proper domain modeling. Sometimes you can't avoid these terms just because they are present in the language of the domain that you are trying to model.
For instance, in our field (Lokad.com), CRUD words show up (on average) on one event out of 10, which is acceptable here. Your mileage may vary, though.
Rinat,
You might want to mention that events are naturally immutable, which helps a lot with parallel processing.
Changes in event structure are events too and can be subject to ES. Might help with upgrades ;)
Oleg, thanks for the reminders. I've added parallel processing notes to the article (migrated to bliki for easier updates).
As for the event upgrades - there is another article on versioning, that explains one of the approaches here.