
What is Event Sourcing?

Traditional Databases and Event Sourcing

In a traditional application, we usually save only the most recent state of a record and don’t care about the historical states it has had. This is a perfectly valid approach: if your application’s requirements are simple and don’t need all the previous states of your records, then I believe it’s best to make your life easier and stay with the traditional database approach. Why? Because traditional databases are good at their job, i.e. keeping track of your records in a systematic manner. However, there are times when this approach doesn’t quite work. For example, there may be a business requirement to keep track of every state of each record in certain tables, or indeed every state of every record in your database.

Keeping track of every state is the idea behind Event Sourcing. For this purpose it works very well, since its data integrity is unmatched compared with the traditional approach of storing only the current state of your records. However, there are also some caveats (as with everything else in software engineering). The caveat I usually see is that it’s not very efficient compared with the traditional approach unless your implementation is highly optimised, and even then you’re likely to find things that the traditional approach still does more efficiently.

Anyhow, I want to make it clear at the outset that I’m not against this approach, nor am I advocating its use on every project you come across. Software engineering is about weighing up the pros and cons of each approach and choosing the right one for your use case. Only you can answer that, not this article, so keep it in mind when deciding which approach to go with.

Event Sourcing

Remember that the fundamental idea of Event Sourcing is to keep track of all the states your records have had in the lifetime of your application (and beyond.)

To that end, we begin with the traditional database approach. Consider a database table called orders with the fields id, status, price, pending_amount and customer. Each order record must have one of the statuses “PENDING”, “PARTIALLY_PAID” or “PAID”. In this design, an order can go through multiple “PARTIALLY_PAID” updates until its pending_amount reaches 0.
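As a minimal sketch, assuming SQLite through Python’s built-in sqlite3 module (any relational database would do) and amounts stored as whole pounds, the orders table could be created like this. The CHECK constraint restricting status to the three allowed values is an illustrative addition that enforces the rule above, and the second customer, Jane, is made up for the example:

```python
import sqlite3

# In-memory SQLite database, used here purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id             INTEGER PRIMARY KEY,
        -- Only the three statuses from the text are allowed.
        status         TEXT NOT NULL
                       CHECK (status IN ('PENDING', 'PARTIALLY_PAID', 'PAID')),
        price          INTEGER NOT NULL,  -- whole pounds (an assumption)
        pending_amount INTEGER NOT NULL CHECK (pending_amount >= 0),
        customer       TEXT NOT NULL
    )
    """
)

# John's order: price £100, £10 paid up front, so £90 is still pending.
conn.execute(
    "INSERT INTO orders (id, status, price, pending_amount, customer) "
    "VALUES (1, 'PARTIALLY_PAID', 100, 90, 'John')"
)
conn.commit()

# A status outside the allowed set is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO orders VALUES (2, 'SHIPPED', 50, 50, 'Jane')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```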

But in this design, we don’t keep track of what the record looked like before each update. NOTE: we could improve on this with another table that records each change, but that is essentially event sourcing, which is what we will do here in a very basic form without much optimisation (it won’t be efficient, but it will give you a clear idea of what event sourcing consists of).

Back to the earlier design, consider the record in its different update phases. Initially, a customer called John orders product A, which costs £100, pays £10 up front and asks for it to be delivered:

id status price pending_amount customer
1 PARTIALLY_PAID 100 90 John

Then we give this customer what he ordered. Delivery doesn’t change what he owes, so the record stays as it was:

id status price pending_amount customer
1 PARTIALLY_PAID 100 90 John

Now suppose John sends us another £50 towards what he owes us. In our system, we would update the record like so:

id status price pending_amount customer
1 PARTIALLY_PAID 100 40 John

Finally, he sends us £40 to settle his payment, which results in the following update to the record:

id status price pending_amount customer
1 PAID 100 0 John
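Here is a minimal sketch of this in-place design, again assuming SQLite via Python’s sqlite3 module; the helper names make_orders_db and apply_payment are hypothetical. Each payment overwrites the single row, so after the last update only the final state is left:

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
        "price INTEGER NOT NULL, pending_amount INTEGER NOT NULL, customer TEXT NOT NULL)"
    )
    # John has paid £10 of £100.
    conn.execute("INSERT INTO orders VALUES (1, 'PARTIALLY_PAID', 100, 90, 'John')")
    conn.commit()
    return conn


def apply_payment(conn, order_id, amount):
    """Overwrite the order in place; the previous state is lost."""
    (pending,) = conn.execute(
        "SELECT pending_amount FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    new_pending = pending - amount
    if new_pending < 0:
        raise ValueError("payment exceeds the amount owed")
    new_status = "PAID" if new_pending == 0 else "PARTIALLY_PAID"
    conn.execute(
        "UPDATE orders SET status = ?, pending_amount = ? WHERE id = ?",
        (new_status, new_pending, order_id),
    )
    conn.commit()


conn = make_orders_db()
apply_payment(conn, 1, 50)  # pending 90 -> 40
apply_payment(conn, 1, 40)  # pending 40 -> 0, status becomes PAID
print(conn.execute("SELECT * FROM orders").fetchall())
# → [(1, 'PAID', 100, 0, 'John')]
```

Nothing in the table records that John ever owed £90 or £40.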

Now suppose your business wants to track every single transaction. In this design, we have no way of doing this unless we store more information in a separate table. So we create a new history table, call it order_transactions, with the same fields as orders (id, status, price, pending_amount, customer) plus a version field that counts the changes made to each order.

Now consider the same scenario for John in this design. Initially, John orders product A with a £10 initial payment and asks for it to be delivered, so in the backend we store the following:

Table 1: Main table

id status price pending_amount customer
1 PARTIALLY_PAID 100 90 John

and

Table 2: History Table

id status price pending_amount customer version
1 PARTIALLY_PAID 100 90 John 1

Now John comes back and pays £40 to reduce what he owes us, so we update the previous record in our Main Table and add a new record to our History Table, like so:

Table 1: Main table

id status price pending_amount customer
1 PARTIALLY_PAID 100 50 John

and

Table 2: History Table

id status price pending_amount customer version
1 PARTIALLY_PAID 100 50 John 2

Notice the version number of this newly added record.

Finally, John pays the remaining £50 to clear his account with us, which we record in the same way:

Table 1: Main table

id status price pending_amount customer
1 PAID 100 0 John

and

Table 2: History Table

id status price pending_amount customer version
1 PAID 100 0 John 3
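The walkthrough above can be sketched in Python with sqlite3, assuming the history table is called order_transactions and has the columns shown in Table 2 (the orders columns plus version). The helpers record_history, status_for, place_order and apply_payment are hypothetical, and so is the rule that PENDING means nothing has been paid yet. Each change updates orders and then appends a copy of the new row with the next version number:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, status TEXT NOT NULL, price INTEGER NOT NULL,
            pending_amount INTEGER NOT NULL, customer TEXT NOT NULL
        );
        -- History rows are only ever appended; (id, version) identifies each one.
        CREATE TABLE order_transactions (
            id INTEGER NOT NULL, status TEXT NOT NULL, price INTEGER NOT NULL,
            pending_amount INTEGER NOT NULL, customer TEXT NOT NULL,
            version INTEGER NOT NULL,
            PRIMARY KEY (id, version)
        );
        """
    )
    return conn


def record_history(conn, order_id):
    """Append the current state of the order to the history with the next version."""
    conn.execute(
        """
        INSERT INTO order_transactions
            (id, status, price, pending_amount, customer, version)
        SELECT o.id, o.status, o.price, o.pending_amount, o.customer,
               COALESCE((SELECT MAX(t.version) FROM order_transactions AS t
                         WHERE t.id = o.id), 0) + 1
        FROM orders AS o
        WHERE o.id = ?
        """,
        (order_id,),
    )


def status_for(pending, price):
    # Assumption: PENDING means nothing has been paid yet.
    if pending == 0:
        return "PAID"
    return "PENDING" if pending == price else "PARTIALLY_PAID"


def place_order(conn, order_id, customer, price, paid):
    pending = price - paid
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
        (order_id, status_for(pending, price), price, pending, customer),
    )
    record_history(conn, order_id)
    conn.commit()


def apply_payment(conn, order_id, amount):
    price, pending = conn.execute(
        "SELECT price, pending_amount FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    new_pending = pending - amount
    conn.execute(
        "UPDATE orders SET status = ?, pending_amount = ? WHERE id = ?",
        (status_for(new_pending, price), new_pending, order_id),
    )
    record_history(conn, order_id)
    conn.commit()


conn = make_db()
place_order(conn, 1, "John", 100, 10)  # version 1: £90 pending
apply_payment(conn, 1, 40)             # version 2: £50 pending
apply_payment(conn, 1, 50)             # version 3: £0 pending, PAID
for row in conn.execute("SELECT * FROM order_transactions ORDER BY version"):
    print(row)
# (1, 'PARTIALLY_PAID', 100, 90, 'John', 1)
# (1, 'PARTIALLY_PAID', 100, 50, 'John', 2)
# (1, 'PAID', 100, 0, 'John', 3)
```

The composite primary key on (id, version) also stops two history rows from claiming the same version of an order.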

As you can see, we now have a record of every transaction a customer has with us. This is essentially what event sourcing is. You could probably do this more efficiently with separate account and transactions tables for customers like John, but I leave the design of such a system to the reader. Another reason I didn’t go with separate account and transactions tables is that our requirements were not clear at the start. This can happen to anybody in software engineering, and you may not have the resources to restructure your database, in which case this is the bare minimum you can do to get the benefits of event sourcing.
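To illustrate what the history buys us, here is a small sketch in plain Python over the three History Table rows above; HistoryRow, state_at and payments are hypothetical names. It recovers the order as it stood at any version and works out each payment from the difference between consecutive pending_amount values, assuming the price never changes:

```python
from typing import NamedTuple, Optional


class HistoryRow(NamedTuple):
    id: int
    status: str
    price: int
    pending_amount: int
    customer: str
    version: int


# The three rows of John's History Table (Table 2) from the walkthrough.
history = [
    HistoryRow(1, "PARTIALLY_PAID", 100, 90, "John", 1),
    HistoryRow(1, "PARTIALLY_PAID", 100, 50, "John", 2),
    HistoryRow(1, "PAID", 100, 0, "John", 3),
]


def state_at(rows, order_id, version) -> Optional[HistoryRow]:
    """Return the order as it was at a given version (None if it did not exist yet)."""
    candidates = [r for r in rows if r.id == order_id and r.version <= version]
    return max(candidates, key=lambda r: r.version, default=None)


def payments(rows, order_id):
    """Work out how much was paid at each version from consecutive snapshots."""
    ordered = sorted((r for r in rows if r.id == order_id), key=lambda r: r.version)
    result = []
    previous_pending = None
    for r in ordered:
        # Before the first snapshot the customer owed the full price.
        owed_before = r.price if previous_pending is None else previous_pending
        result.append((r.version, owed_before - r.pending_amount))
        previous_pending = r.pending_amount
    return result


print(state_at(history, 1, 2).pending_amount)  # → 50
print(payments(history, 1))                     # → [(1, 10), (2, 40), (3, 50)]
```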

Database Transactions

Anyway, I want to point out that this design is not yet robust, especially when we’re working with distributed systems. If you’re familiar with database transactions, what follows will sound familiar. If not, please read on anyway, because it’s a very important concept.

Consider the case in our earlier system where John gives us £50 to settle his account. We update the Main Table without issues, but the insert into the History Table fails. Now the change is present in one table but missing from the other. Here comes the rub: which table should we trust for an audit? If we trust the Main Table, the account is settled but the History Table has no record of the final payment; if we trust the History Table, John still owes us £50. Isn’t that a big problem? I think it is, but not a big enough problem that we would have to devise a new system.
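The failure can be reproduced with a small sketch, assuming SQLite via Python’s sqlite3 module in autocommit mode (so each statement is committed on its own) and a simulated error between the two writes; settle and fail_before_history are hypothetical names:

```python
import sqlite3

# isolation_level=None puts Python's sqlite3 module in autocommit mode:
# every statement is committed as soon as it runs, with no transaction around them.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, price INTEGER, "
    "pending_amount INTEGER, customer TEXT)"
)
conn.execute(
    "CREATE TABLE order_transactions (id INTEGER, status TEXT, price INTEGER, "
    "pending_amount INTEGER, customer TEXT, version INTEGER, PRIMARY KEY (id, version))"
)
# State after John's second payment: £50 still pending, two history rows.
conn.execute("INSERT INTO orders VALUES (1, 'PARTIALLY_PAID', 100, 50, 'John')")
conn.executemany(
    "INSERT INTO order_transactions VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "PARTIALLY_PAID", 100, 90, "John", 1), (1, "PARTIALLY_PAID", 100, 50, "John", 2)],
)


def settle(conn, fail_before_history):
    conn.execute("UPDATE orders SET status = 'PAID', pending_amount = 0 WHERE id = 1")
    if fail_before_history:
        # Stand-in for a crash, timeout or constraint error on the second write.
        raise ConnectionError("simulated failure before writing history")
    conn.execute("INSERT INTO order_transactions VALUES (1, 'PAID', 100, 0, 'John', 3)")


try:
    settle(conn, fail_before_history=True)
except ConnectionError:
    pass

# The two tables now disagree about John's account.
print(conn.execute("SELECT status, pending_amount FROM orders").fetchone())
# → ('PAID', 0)
print(conn.execute(
    "SELECT pending_amount FROM order_transactions ORDER BY version DESC LIMIT 1"
).fetchone())
# → (50,)
```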

All we need is a feature that most modern database systems provide: database transactions (even MongoDB supports transactions, so they are not limited to SQL databases).

You may be wondering what a database transaction is. A transaction is a unit of work that is treated as a whole: it either happens in full or not at all. In our case, if either the update to the Main Table or the insert into the History Table fails, the whole operation fails and neither change is kept. This is why it’s essential to be aware of transactions and to be able to use them when the need arises in your career.
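Here is the same failure scenario wrapped in a transaction, as a sketch using Python’s sqlite3 module (an assumption, as before; settle and fail_before_history are hypothetical). Used as a context manager, the connection commits when the block completes and rolls back when it raises, so the simulated failure leaves both tables untouched:

```python
import sqlite3

# Default isolation level: the sqlite3 module opens a transaction implicitly
# before the first UPDATE/INSERT, and `with conn:` commits or rolls it back.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, price INTEGER,
                         pending_amount INTEGER, customer TEXT);
    CREATE TABLE order_transactions (id INTEGER, status TEXT, price INTEGER,
                                     pending_amount INTEGER, customer TEXT,
                                     version INTEGER, PRIMARY KEY (id, version));
    INSERT INTO orders VALUES (1, 'PARTIALLY_PAID', 100, 50, 'John');
    INSERT INTO order_transactions VALUES
        (1, 'PARTIALLY_PAID', 100, 90, 'John', 1),
        (1, 'PARTIALLY_PAID', 100, 50, 'John', 2);
    """
)


def settle(conn, fail_before_history):
    # Both writes are kept together, or the rollback undoes the UPDATE as well.
    with conn:
        conn.execute("UPDATE orders SET status = 'PAID', pending_amount = 0 WHERE id = 1")
        if fail_before_history:
            raise ConnectionError("simulated failure before writing history")
        conn.execute("INSERT INTO order_transactions VALUES (1, 'PAID', 100, 0, 'John', 3)")


try:
    settle(conn, fail_before_history=True)
except ConnectionError:
    pass
after_failure = conn.execute("SELECT status, pending_amount FROM orders").fetchone()
print(after_failure)  # → ('PARTIALLY_PAID', 50)

settle(conn, fail_before_history=False)
print(conn.execute("SELECT status, pending_amount FROM orders").fetchone())  # → ('PAID', 0)
print(conn.execute("SELECT MAX(version) FROM order_transactions").fetchone())  # → (3,)
```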

Transactions are quite straightforward if you’re working directly with SQL, and if you’re using an ORM it very likely provides this functionality as well, so I urge you to check its documentation for transaction support.
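When working directly with SQL, a transaction boils down to three statements: BEGIN, then COMMIT if everything succeeded or ROLLBACK if anything failed. Here is a sketch that again drives SQLite from Python, in autocommit mode so that these statements are the only transaction control in play. This time the failure is a real one: a hypothetical writer passes a stale version number, and the history table’s primary key rejects the duplicate:

```python
import sqlite3

# Autocommit mode: the explicit BEGIN / COMMIT / ROLLBACK below are the
# only transaction control, mirroring SQL written by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, price INTEGER,
                         pending_amount INTEGER, customer TEXT);
    CREATE TABLE order_transactions (id INTEGER, status TEXT, price INTEGER,
                                     pending_amount INTEGER, customer TEXT,
                                     version INTEGER, PRIMARY KEY (id, version));
    INSERT INTO orders VALUES (1, 'PARTIALLY_PAID', 100, 50, 'John');
    INSERT INTO order_transactions VALUES
        (1, 'PARTIALLY_PAID', 100, 90, 'John', 1),
        (1, 'PARTIALLY_PAID', 100, 50, 'John', 2);
    """
)


def settle(conn, version):
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE orders SET status = 'PAID', pending_amount = 0 WHERE id = 1")
        conn.execute(
            "INSERT INTO order_transactions VALUES (1, 'PAID', 100, 0, 'John', ?)",
            (version,),
        )
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # discards the UPDATE as well
        raise


# Version 2 already exists, so the history insert violates the primary key.
try:
    settle(conn, version=2)
except sqlite3.IntegrityError:
    pass
after_failure = conn.execute("SELECT status, pending_amount FROM orders").fetchone()
print(after_failure)  # → ('PARTIALLY_PAID', 50)

settle(conn, version=3)
after_success = conn.execute("SELECT status, pending_amount FROM orders").fetchone()
print(after_success)  # → ('PAID', 0)
```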

Closing thoughts

In my mind, Event Sourcing is not a new idea. Anyone who has worked with SQL databases will likely recognise this pattern of keeping a history table as a common way to track all changes to a particular table, and it is used widely. I just wanted to share my theoretical understanding of this concept, since it has become a buzzword at this point. It may be that you find my understanding of the concept primitive, in which case I would like to hear why, and where my understanding is incorrect.