Event Sourcing: CQRS and querying using read models

In my previous post I introduced the concept of Event Sourcing, which is a radically different way of storing application state: instead of storing the current state, we only store the events that led up to that state. More than one person asked the following question in response: how do you query your application state? If you have a number of appointments in your system, how do you get a list of all the appointments on a given day?

I consciously skirted around that question in the previous post, because it was already long and loaded with information. In this post I will attempt to address that question properly.

Don’t query your events

One of the rules of Event Sourcing is: you don’t query your events. Why not?

First of all, they are hard, if not impossible, to query. If you use an RDBMS to store your events, you’re most likely serializing events into something like JSON, so unless your RDBMS supports a JSON datatype, it’s incredibly hard to get data out of them inside a query, let alone use that data to build an index of some sort.

If you use a document database, like RavenDB or DynamoDB, to store your events, this becomes a lot easier, but it’s still not a good idea. Let’s say that you have an AppointmentCreated event to indicate that an appointment was created, and an AppointmentCanceled event to indicate it was canceled. Yes, you can ask your database ‘give me the information from AppointmentCreated events where the date is X unless an AppointmentCanceled event exists for that appointment,’ which is still somewhat manageable.

What if we add rescheduling into the equation? ‘Give me the information from AppointmentCreated where the date is X, unless an AppointmentCanceled event exists for that appointment, or an AppointmentRescheduled event exists where the date is no longer X.’ Wait, what about rescheduling to that date? ‘Oh, and include appointments where the latest AppointmentRescheduled event has the date X.’ You see where I’m going with this, I hope.

Don’t query your state

You might be tempted to replay all the events in your system in memory and query over the resulting state. This will be slow for large-ish systems, and while it might work for very small systems in theory, it’s still not a very good idea. It requires you to know about all the streams in your system. While most storage solutions can probably tell you which streams exist, it’s usually a rather slow operation. A workaround is to maintain a list of streams yourself, but this is error-prone (if done manually) and not very elegant.
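To make the cost concrete, here is a minimal sketch of what ‘replay everything and query the result’ looks like. The event fields and the InMemoryReplay helper are my own assumptions for illustration (the events are only named in this post, not defined), and the sketch assumes C# 9 or later for records.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event shapes: the post names these events but not their fields.
public record AppointmentCreated(int AppointmentId, DateTimeOffset Time, string Title);
public record AppointmentCanceled(int AppointmentId);
public record AppointmentRescheduled(int AppointmentId, DateTimeOffset NewTime);

public static class InMemoryReplay
{
    // Folds every event of every stream into current state, then filters by day.
    // The caller has to supply all events in the system, in order.
    public static IList<string> TitlesOn(DateTime day, IEnumerable<object> allEvents)
    {
        var current = new Dictionary<int, (DateTimeOffset Time, string Title)>();
        foreach (var e in allEvents)
        {
            switch (e)
            {
                case AppointmentCreated c:
                    current[c.AppointmentId] = (c.Time, c.Title);
                    break;
                case AppointmentCanceled x:
                    current.Remove(x.AppointmentId);
                    break;
                case AppointmentRescheduled r:
                    if (current.TryGetValue(r.AppointmentId, out var old))
                        current[r.AppointmentId] = (r.NewTime, old.Title);
                    break;
            }
        }
        return current.Values
            .Where(a => a.Time.Date == day.Date)
            .OrderBy(a => a.Time)
            .Select(a => a.Title)
            .ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        var may1 = new DateTimeOffset(2024, 5, 1, 9, 0, 0, TimeSpan.Zero);
        var may2 = may1.AddDays(1);
        var events = new object[]
        {
            new AppointmentCreated(1, may1, "Dentist"),
            new AppointmentCreated(2, may1, "Lunch"),
            new AppointmentRescheduled(2, may2),
            new AppointmentCreated(3, may2, "Review"),
            new AppointmentCanceled(3),
        };
        Console.WriteLine(string.Join(", ", InMemoryReplay.TitlesOn(may1.Date, events))); // Dentist
        Console.WriteLine(string.Join(", ", InMemoryReplay.TitlesOn(may2.Date, events))); // Lunch
    }
}
```

It works, but every single query walks the complete event history of the whole system, which is exactly the problem.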

Enter CQRS

CQRS stands for ‘Command-Query Responsibility Segregation’. It’s a pattern (not an infrastructure, as some people refer to it) coined by Greg Young. It requires you to have separate classes for writing data and reading data. This then allows you to have separate models for reading and writing. You could write to a set of properly normalized tables and read from a different set of tables that is entirely denormalized. You could write to a relational database and read from a NoSQL database or, in our case, write events and read from whatever tickles your fancy and suits the application.

It does this, in the words of Greg Young, by ‘creating two objects where there was previously only one.’

Let’s say we have a single repository for our (not Event Sourcing) application that deals with appointments.

interface IAppointmentRepository  
{
    void CreateAppointment(
         int appointmentId, 
         DateTimeOffset time, 
         string title);
    void CancelAppointment(int appointmentId);
    Appointment GetAppointment(int appointmentId);
    IList<Appointment> AllAppointments();
}

We simply separate this out into two new repositories. One for writing:

interface IAppointmentWriteRepository  
{
    void CreateAppointment(
         int appointmentId, 
         DateTimeOffset time, 
         string title);
    void CancelAppointment(int appointmentId);
}

And one for reading:

interface IAppointmentReadRepository  
{
    Appointment GetAppointment(int appointmentId);
    IList<Appointment> AllAppointments();    
}

You should be able to see how this makes it very easy for the implementation of IAppointmentWriteRepository to use, for example, Event Sourcing, while the implementation of IAppointmentReadRepository uses, for example, a NoSQL database like DynamoDB.

In this situation we talk about a ‘write model’ and a ‘read model’. The write model is the storage used by the implementation of IAppointmentWriteRepository. The ‘read model’ is then, logically, the storage used by the implementation of IAppointmentReadRepository. Note that these models can be exactly the same. They don’t have to, however.
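As a sketch of the ‘exactly the same model’ case: a single class can implement both interfaces on top of one piece of storage. The InMemoryAppointmentRepository name, the dictionary, and the shape of the Appointment class are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape; the post uses Appointment without defining it.
public class Appointment
{
    public Appointment(int id, DateTimeOffset time, string title)
    {
        Id = id;
        Time = time;
        Title = title;
    }

    public int Id { get; }
    public DateTimeOffset Time { get; }
    public string Title { get; }
}

public interface IAppointmentWriteRepository
{
    void CreateAppointment(int appointmentId, DateTimeOffset time, string title);
    void CancelAppointment(int appointmentId);
}

public interface IAppointmentReadRepository
{
    Appointment GetAppointment(int appointmentId);
    IList<Appointment> AllAppointments();
}

// One class, one dictionary: the write model and the read model are the same storage.
public class InMemoryAppointmentRepository :
    IAppointmentWriteRepository,
    IAppointmentReadRepository
{
    private readonly Dictionary<int, Appointment> appointments =
        new Dictionary<int, Appointment>();

    public void CreateAppointment(int appointmentId, DateTimeOffset time, string title)
        => appointments.Add(appointmentId, new Appointment(appointmentId, time, title));

    public void CancelAppointment(int appointmentId)
        => appointments.Remove(appointmentId);

    public Appointment GetAppointment(int appointmentId)
        => appointments[appointmentId];

    public IList<Appointment> AllAppointments()
        => appointments.Values.ToList();
}

public static class Demo
{
    public static void Main()
    {
        var repository = new InMemoryAppointmentRepository();

        // Callers only see the half they need.
        IAppointmentWriteRepository write = repository;
        IAppointmentReadRepository read = repository;

        write.CreateAppointment(1, DateTimeOffset.UtcNow, "Dentist");
        write.CreateAppointment(2, DateTimeOffset.UtcNow, "Lunch");
        write.CancelAppointment(2);

        Console.WriteLine(read.AllAppointments().Count); // 1
    }
}
```

Because callers depend on only one of the two interfaces, the read side can later move to different storage without any of them noticing.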

Commands and Queries

The first two letters of CQRS stand for, respectively, Command and Query. Let’s look at what that means. Greg Young’s definition is as follows:

‘A command is any method that mutates state and a query is any method that returns a value.’

This implies that a request to the data store is either a command or a query. Not a little bit of both. This means, by extension, that a command does not return a value. It also means that a query does not mutate state.
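A small sketch of that distinction, using a made-up TaskList class (not something from this series): two commands, two queries, and one method that breaks the rule by doing both.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class, purely to illustrate the command/query split.
public class TaskList
{
    private readonly List<string> tasks = new List<string>();

    // Commands: mutate state, return nothing.
    public void Add(string task) => tasks.Add(task);
    public void RemoveFirst() => tasks.RemoveAt(0);

    // Queries: return a value, mutate nothing.
    public string First => tasks[0];
    public int Count => tasks.Count;

    // Neither: mutates state *and* returns a value.
    public string TakeFirst()
    {
        var first = tasks[0];
        tasks.RemoveAt(0);
        return first;
    }
}

public static class Demo
{
    public static void Main()
    {
        var list = new TaskList();
        list.Add("write post");
        list.Add("publish post");

        var next = list.First;   // query: safe to call as often as you like
        list.RemoveFirst();      // command: changes state, tells you nothing

        Console.WriteLine(next);       // write post
        Console.WriteLine(list.Count); // 1
    }
}
```

Calling First twice gives the same answer twice; calling TakeFirst twice does not, which is what makes it awkward to reason about.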

Why is this? One of the reasons behind CQRS is that it allows your read and write models to be asymmetric. Another very important reason is that it allows commands to become asynchronous. A command might be put onto a queue and executed at a later stage, which means you might have to wait for quite a long while before the command completes.

One of the most common patterns that would compromise this is using auto-generated identifiers. Something like this:

// returns ID of newly created user
int CreateUser(string userName, string password);  

Usually you need the result of this method, for example to redirect the user to their own profile page after an account was created. You have to wait for the CreateUser method to return before you can do anything. If creating the user is done asynchronously, this means the user is sitting there, waiting for something to happen.

In the CQRS pattern you would fix this by specifying the ID beforehand (using a GUID, for example). The CreateUser method can then return as soon as the request has been validated and put on a queue to be executed. You can then show a page to the user telling them ‘please hold on, we’re processing your request’. Meanwhile you’re polling your read model to see if the account has been created successfully.
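Here is a rough, single-threaded sketch of that flow. The UserService class, its in-memory queue and its ProcessPending method are all assumptions of mine; ProcessPending stands in for whatever background worker would drain a real queue, and I have left the password out to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical service; queue and read model are both in-memory stand-ins.
public class UserService
{
    private readonly Queue<(Guid UserId, string UserName)> pending =
        new Queue<(Guid UserId, string UserName)>();
    private readonly Dictionary<Guid, string> readModel =
        new Dictionary<Guid, string>();

    // Command: validates, enqueues, returns nothing. The caller already knows the ID.
    public void CreateUser(Guid userId, string userName)
    {
        if (string.IsNullOrWhiteSpace(userName))
            throw new ArgumentException("User name is required.", nameof(userName));
        pending.Enqueue((userId, userName));
    }

    // Stands in for a background worker draining the queue.
    public void ProcessPending()
    {
        while (pending.Count > 0)
        {
            var (userId, userName) = pending.Dequeue();
            readModel[userId] = userName;
        }
    }

    // Query: reads from the read model, mutates nothing.
    public bool UserExists(Guid userId) => readModel.ContainsKey(userId);
}

public static class Demo
{
    public static void Main()
    {
        var service = new UserService();
        var userId = Guid.NewGuid();           // chosen by the caller, up front

        service.CreateUser(userId, "alice");   // returns right after validation
        Console.WriteLine(service.UserExists(userId)); // False: not processed yet

        service.ProcessPending();              // the worker catches up
        Console.WriteLine(service.UserExists(userId)); // True
    }
}
```

The first UserExists call returning False is the ‘please hold on’ window; a real client would keep polling until it flips.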

Synchronization

If you’re writing to and reading from separate models, how do you go about keeping the data up to date? Let’s assume we’re implementing the IAppointmentWriteRepository from earlier. You could add some code to the end of the CreateAppointment method that calls into another method that updates the read model for you.

That method should really be in another class, to avoid violating the Single Responsibility Principle (which is really the ‘Single Reason for Change Principle’). Let’s call it a Normalizer.

void CreateAppointment(int appointmentId,  
                       DateTimeOffset time, 
                       string title)
{
    var appointment = new Appointment(appointmentId, time, title);
    this.eventStore.Store(appointment.NewEvents);
    this.normalizer.AddAppointment(appointmentId, time, title);
}

There are a number of problems here. First of all, this violates the Open-Closed Principle. When you add other read models, you might have to change this repository as well. Second of all, the class knows too much about how the Normalizer works; a change to its parameters means this class will also have to change. And what if the logic for creating an appointment changes and we also want to store, for example, which attendees have been invited to which appointments?

All in all, this doesn’t sound like a very scalable and maintainable solution. So what is? The answer is: events. We already have events that describe what has happened to an appointment, since they are the source of truth the system is built upon. Why not broadcast those events and have the read models respond to those events?

void CreateAppointment(int appointmentId,  
                       DateTimeOffset time, 
                       string title)
{
    var appointment = new Appointment(appointmentId, time, title);
    this.eventStore.Store(appointment.NewEvents);
    this.eventBus.BroadcastEvents(appointment.NewEvents);
}

Obviously we need some infrastructure for this, so let’s create a (simplistic) example.

interface IEventBus  
{
    void BroadcastEvents(IEnumerable<object> events);
}

interface IEventHandler<T>  
{
    void Handle(T @event);
}

Then our ‘event bus’ enumerates over all the events and notifies all known handlers for that event type. The implementation of a handler would then look something like this:

class AppointmentNormalizer:
    IEventHandler<AppointmentCreated>,
    IEventHandler<AppointmentCanceled>,
    IEventHandler<AppointmentRescheduled>
{
    // Implicit interface implementations have to be public in C#.
    public void Handle(AppointmentCreated @event)
    {
        // add a new appointment to your read model
    }

    public void Handle(AppointmentCanceled @event)
    {
        // remove an appointment from your read model
    }

    public void Handle(AppointmentRescheduled @event)
    {
        // update an appointment's time
    }
}
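The bus itself is not shown here, so this is one possible, deliberately naive in-memory version. The InMemoryEventBus class and its Register method are my own additions (IEventBus only declares BroadcastEvents), and the sketch matches handlers on the exact runtime type of each event, ignoring inheritance.

```csharp
using System;
using System.Collections.Generic;

public interface IEventBus
{
    void BroadcastEvents(IEnumerable<object> events);
}

public interface IEventHandler<T>
{
    void Handle(T @event);
}

// Hypothetical synchronous, in-memory bus.
public class InMemoryEventBus : IEventBus
{
    private readonly Dictionary<Type, List<Action<object>>> handlers =
        new Dictionary<Type, List<Action<object>>>();

    // Register is an assumption: handlers have to become 'known' somehow.
    public void Register<T>(IEventHandler<T> handler)
    {
        if (!handlers.TryGetValue(typeof(T), out var list))
        {
            list = new List<Action<object>>();
            handlers[typeof(T)] = list;
        }
        list.Add(e => handler.Handle((T)e));
    }

    // Enumerates the events and notifies every handler registered for
    // the exact runtime type of each one; unknown events are skipped.
    public void BroadcastEvents(IEnumerable<object> events)
    {
        foreach (var e in events)
        {
            if (!handlers.TryGetValue(e.GetType(), out var list))
                continue;
            foreach (var handle in list)
                handle(e);
        }
    }
}

// Minimal event and handler, just enough to exercise the bus.
public record AppointmentCreated(int AppointmentId);

public class CountingHandler : IEventHandler<AppointmentCreated>
{
    public int Seen { get; private set; }
    public void Handle(AppointmentCreated @event) => Seen++;
}

public static class Demo
{
    public static void Main()
    {
        var bus = new InMemoryEventBus();
        var handler = new CountingHandler();
        bus.Register(handler);

        bus.BroadcastEvents(new object[]
        {
            new AppointmentCreated(1),
            "an event nobody handles",
            new AppointmentCreated(2),
        });

        Console.WriteLine(handler.Seen); // 2
    }
}
```

A class like AppointmentNormalizer would be registered once per event type it handles, for example bus.Register<AppointmentCreated>(normalizer).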

This nicely solves most of the problems, except that each method in the implementation of IAppointmentWriteRepository has to store events and then broadcast them. If we combine the two steps into one method, that avoids a lot of code repetition.
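One way to combine the two steps, sketched under some assumptions: wrap the event store in a decorator that broadcasts whatever it has just stored. The IEventStore interface is hypothetical (the snippets in this post call eventStore.Store without ever declaring its type), and ListEventStore and RecordingBus are throwaway stand-ins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IEventBus
{
    void BroadcastEvents(IEnumerable<object> events);
}

// Hypothetical interface for the event store used in the repository.
public interface IEventStore
{
    void Store(IEnumerable<object> events);
}

// Decorator: storing and broadcasting become a single call for the repository.
public class BroadcastingEventStore : IEventStore
{
    private readonly IEventStore inner;
    private readonly IEventBus bus;

    public BroadcastingEventStore(IEventStore inner, IEventBus bus)
    {
        this.inner = inner;
        this.bus = bus;
    }

    public void Store(IEnumerable<object> events)
    {
        var batch = events.ToList();       // enumerate the source only once
        this.inner.Store(batch);           // persist first...
        this.bus.BroadcastEvents(batch);   // ...then notify the read models
    }
}

// In-memory stand-ins so the sketch can run on its own.
public class ListEventStore : IEventStore
{
    public List<object> Stored { get; } = new List<object>();
    public void Store(IEnumerable<object> events) => Stored.AddRange(events);
}

public class RecordingBus : IEventBus
{
    public List<object> Broadcast { get; } = new List<object>();
    public void BroadcastEvents(IEnumerable<object> events) => Broadcast.AddRange(events);
}

public static class Demo
{
    public static void Main()
    {
        var store = new ListEventStore();
        var bus = new RecordingBus();
        IEventStore eventStore = new BroadcastingEventStore(store, bus);

        // The repository now makes one call instead of two.
        eventStore.Store(new object[] { "AppointmentCreated", "AppointmentRescheduled" });

        Console.WriteLine(store.Stored.Count);  // 2
        Console.WriteLine(bus.Broadcast.Count); // 2
    }
}
```

Note that if the broadcast throws, the events have already been stored; whether that ordering is acceptable depends on how your read models recover.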

CQRS and Event Sourcing

CQRS on its own is a very powerful pattern, but using it in combination with Event Sourcing is especially potent. Let’s look at a scenario.

We’ve created our little calendar application using Event Sourcing and CQRS. We’ve created a read model that aggregates appointments by month and stores the results in a NoSQL database. Alas, there is a bug in the code that updates the read model, so it doesn’t handle rescheduling across months. What do we do? Well, we start by throwing away the entire NoSQL database. Then we replay all the events in the system to the (bugfixed) read model code, which will rebuild the database from scratch. Problem solved!
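A sketch of that rebuild, with a lot of simplification: the read model lives in a dictionary instead of a NoSQL database, and the event fields, the AppointmentsByMonth class and its Rebuild method are all assumptions for illustration. ‘Throwing away the database’ here is just creating a fresh instance.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event shapes: the post names these events but not their fields.
public record AppointmentCreated(int AppointmentId, DateTimeOffset Time);
public record AppointmentRescheduled(int AppointmentId, DateTimeOffset NewTime);
public record AppointmentCanceled(int AppointmentId);

// Hypothetical read model: how many appointments fall in a given month.
public class AppointmentsByMonth
{
    private readonly Dictionary<int, DateTimeOffset> times =
        new Dictionary<int, DateTimeOffset>();

    // The same code handles live events and replayed events.
    public void Apply(object @event)
    {
        switch (@event)
        {
            case AppointmentCreated c:
                times[c.AppointmentId] = c.Time;
                break;
            case AppointmentRescheduled r:
                if (times.ContainsKey(r.AppointmentId))
                    times[r.AppointmentId] = r.NewTime; // may move to another month
                break;
            case AppointmentCanceled x:
                times.Remove(x.AppointmentId);
                break;
        }
    }

    public int CountFor(int year, int month)
        => times.Values.Count(t => t.Year == year && t.Month == month);

    // Start from nothing and feed every event through Apply, in order.
    public static AppointmentsByMonth Rebuild(IEnumerable<object> allEvents)
    {
        var model = new AppointmentsByMonth();
        foreach (var e in allEvents)
            model.Apply(e);
        return model;
    }
}

public static class Demo
{
    public static void Main()
    {
        var history = new object[]
        {
            new AppointmentCreated(1, new DateTimeOffset(2024, 1, 30, 9, 0, 0, TimeSpan.Zero)),
            new AppointmentRescheduled(1, new DateTimeOffset(2024, 2, 2, 9, 0, 0, TimeSpan.Zero)),
        };

        var rebuilt = AppointmentsByMonth.Rebuild(history);
        Console.WriteLine(rebuilt.CountFor(2024, 1)); // 0
        Console.WriteLine(rebuilt.CountFor(2024, 2)); // 1
    }
}
```

Rebuild contains no repair logic of its own; it only calls the same Apply method the live system uses, which is the point.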

This could have probably been solved in a non-Event-Sourcing system by writing a custom piece of code that loops over all the appointments and adjusts the data in the NoSQL database. The beauty of CQRS with Event Sourcing is: you don’t have to write any custom code! This is part of the normal way the system works!

Caveats

There are a number of consequences, depending on your implementation, that you should be aware of when implementing CQRS.

Your storage space requirements will, relatively, explode. Most storage nowadays is pretty cheap, so this is usually not such a problem. Still, it’s worth being aware of it. The complexity of your codebase will also increase significantly.

If your ‘event bus’ is asynchronous, you will have to come to grips with the realities of Eventual Consistency. In a lot of applications this is not really a big issue, but of all the potential problems to be aware of, this is one of the few that might impact users of your application, so it’s the most important one to be aware of. And, unfortunately, also one of the most mind-boggling ones.

It is important to realize that, like with Event Sourcing, the decision to use CQRS with separate models (note the emphasis) should not be made on an application-wide level. You should make this decision for each part of your application separately. For instance, it might be valuable to use separate models for appointments, but not for users. If you use separate models anywhere in your application, it’s not a bad idea to use the ‘separate classes for reading and writing’ part of CQRS in your entire application. If you can, of course.

Further reading

There is a lot of stuff to be found around the web about CQRS and its many wonderful advantages and dangerous pitfalls. Here is some stuff I found useful.