Simple event sourcing - introduction (part 1)

This is the first part of a series on building an event-sourced application. We’ll build a simple blogging application (inspired by the Ruby on Rails “Getting Started” tutorial), so the domain should be familiar. This allows us to focus on implementing a memory-image-based architecture using event sourcing. Another goal is to show that this kind of architecture is no more complex (and arguably simpler) than the architecture of traditional database-centered applications. The example code is in Scala using the Play! web framework, but should be understandable enough if you come from a Java (or similar) background.

Other Parts

Part 1 – Introduction
Part 2 – Consistency
Part 3 – Redis Event Store
Part 4 – Conflict Resolution
Part 5 – Refactoring and Transactions
Part 6 – Users, Authentication, Authorization

Memory Image and Event Sourcing

The example application is built using a Memory Image. This means that the current state of the application is not stored in a database, but kept in main memory instead. This immediately raises the question of how our state is going to survive application restarts or crashes. Since we cannot save the entire state of our application on every change, we’ll need to keep something like a durable transaction log, just like databases do. To implement this, every change our application makes is first represented as a domain event. These domain events are stored as our application’s transaction log, using an event store.

After a restart, we can replay the saved domain events to rebuild the in-memory data structure. This concept is known as event sourcing. Keeping this durable record of domain events also means we will continue to have access to all historical information, which is very useful for auditing, troubleshooting, or data mining purposes.
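The replay idea can be sketched with a toy domain before looking at the blog model. Everything in this sketch (the counter, its events, and the in-memory log standing in for an event store) is hypothetical and not part of the example application:

```scala
// Toy domain: a counter whose state is derived purely from recorded events.
sealed trait CounterEvent
case class Incremented(by: Int) extends CounterEvent
case object Reset extends CounterEvent

case class Counter(value: Int = 0) {
  // Apply a single event, returning the new state.
  def apply(event: CounterEvent): Counter = event match {
    case Incremented(by) => copy(value = value + by)
    case Reset           => Counter()
  }
}

object ReplayDemo {
  // Stand-in for the durable event log; a real event store would persist this.
  val log: Vector[CounterEvent] =
    Vector(Incremented(2), Incremented(3), Reset, Incremented(4))

  // "Restarting" means starting from the empty state and replaying the log.
  val rebuilt: Counter = log.foldLeft(Counter())((state, event) => state(event))

  // History stays available: replaying a prefix gives the state at that point.
  val beforeReset: Counter = log.take(2).foldLeft(Counter())((state, event) => state(event))
}
```

Here ReplayDemo.rebuilt is Counter(4), while ReplayDemo.beforeReset is Counter(5), the state just before the reset was recorded.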

Keeping our current state in memory has some obvious advantages:

No need to perform mapping between the in-memory model and a database model. This allows you to quickly evolve your application without worrying about database schema migrations.
No need to access the database to answer a query or render a view. The database should no longer be the first scalability bottleneck you hit.

There are some disadvantages too:

Your data must fit in memory. Fortunately, you can start out with a pure in-memory implementation and later refactor your biggest data into a database. Being able to safely delay major architectural decisions until you really need to decide is core to agile architecture and development.
You must be able to safely and efficiently manage this memory. Garbage collection pause times can become problematic. With the OpenJDK Java VM you can probably get up to 10 gigabytes or so, and Azul’s Zing JVM will go all the way to 2 terabytes. The latter should be enough to handle tens of millions of blog posts.

Running the application

The code for this part can be found at https://github.com/zilverline/event-sourced-blog-example. You can use the git clone -b part-1 https://github.com/zilverline/event-sourced-blog-example command to clone and check out the code that matches the contents of this part.

To run the example you need to install either Play! 2.0.2 (or later) or sbt 0.11.3 or later. If you have a Mac and use homebrew you can simply install these using brew install play or brew install sbt.

After installing, execute play run or sbt run to start the application in development mode. Once all required dependencies have been downloaded and the code has been compiled, the application should be available on http://localhost:9000/.

Click around a bit to see how everything works. The functionality is pretty minimal, so this shouldn’t take long.

The domain events

The domain events (and supporting classes) for blog postings are defined in PostEvents.scala. Since the application is (still) extremely simple, the events model adding a new blog post, and editing or deleting an existing one. These mimic the typical create/update/delete actions, but are named in terms our users understand. This way we also capture the intent of the user, not just the effects of the action they just performed. Later we’ll add more events, such as Post Published and Comment Added.

sealed trait PostEvent {
  def postId: PostId
}
case class PostAdded(postId: PostId, content: PostContent) extends PostEvent
case class PostEdited(postId: PostId, content: PostContent) extends PostEvent
case class PostDeleted(postId: PostId) extends PostEvent

Notice that events are named in the past tense. This can be a bit confusing when writing code that actually generates the events, but at all other times events have always happened in the past so this naming convention makes sense.

The events are implemented using Scala case classes. The Scala compiler will translate these into ordinary JVM classes but will add the fields specified in the constructor, accessors, field based equals/hashCode implementations, and a factory method so that you do not need to use the new keyword when instantiating an object. Case classes can also be used with Scala’s match expression.
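A small sketch of what the compiler generates for a case class may help if you are new to Scala. The classes below are simplified stand-ins for the ones in PostEvents.scala, and the sample values are made up for illustration:

```scala
import java.util.UUID

// Simplified stand-ins for the support classes used by the events.
case class PostId(uuid: UUID)
case class PostContent(author: String, title: String, content: String)

object CaseClassDemo {
  // Factory method: no `new` keyword needed.
  val id = PostId(new UUID(0L, 1L))

  val a = PostContent("Mark", "Hello", "First post")
  val b = PostContent("Mark", "Hello", "First post")

  val sameValue: Boolean = a == b                   // equals compares the fields
  val sameHash: Boolean = a.hashCode == b.hashCode  // hashCode is field based too
  val title: String = a.title                       // generated accessor

  // copy creates a new instance with only the named fields changed.
  val edited: PostContent = a.copy(title = "Hello again")

  // toString is generated as well.
  val printed: String = a.toString // "PostContent(Mark,Hello,First post)"
}
```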

In Scala the type is written after the name of a variable or parameter. Explicit types are usually only required on parameters, as the Scala compiler can usually figure out the type in other cases by itself.

The events all extend the PostEvent trait (which is translated into a JVM interface). The PostEvent trait is marked as sealed so that it can only be extended in the same source file. This allows the Scala compiler to do compile-time analysis of match expressions to ensure all possible cases are covered. This is very useful when new events are added later!
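To see what the sealed modifier buys us, here is a minimal sketch of a match over all PostEvent subtypes. The describe helper is hypothetical and only serves as an illustration; the event definitions are repeated so the snippet stands on its own:

```scala
import java.util.UUID

case class PostId(uuid: UUID)
case class PostContent(author: String, title: String, content: String)

sealed trait PostEvent { def postId: PostId }
case class PostAdded(postId: PostId, content: PostContent) extends PostEvent
case class PostEdited(postId: PostId, content: PostContent) extends PostEvent
case class PostDeleted(postId: PostId) extends PostEvent

object EventDescriptions {
  // Hypothetical helper, not part of the example application.
  def describe(event: PostEvent): String = event match {
    case PostAdded(_, content)  => "added '" + content.title + "'"
    case PostEdited(_, content) => "edited '" + content.title + "'"
    case PostDeleted(_)         => "deleted a post"
    // Because PostEvent is sealed, removing one of the cases above makes
    // the compiler warn that the match may not be exhaustive.
  }
}
```

When a new event type is added to the sealed trait later, the compiler points out every match that still needs a case for it.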

The events also use two support classes to represent the blog post’s identifier and content, respectively:

import java.util.UUID
import scala.util.control.Exception.catching

case class PostContent(author: String, title: String, content: String)

case class PostId(uuid: UUID)
object PostId {
  def generate(): PostId = PostId(UUID.randomUUID())

  def fromString(s: String): Option[PostId] = s match {
    case PostIdRegex(uuid) => catching(classOf[RuntimeException]) opt { PostId(UUID.fromString(uuid)) }
    case _                 => None
  }

  private val PostIdRegex = """PostId\(([a-fA-F0-9-]{36})\)""".r
}

The PostContent class is a simple case class with three fields, while the PostId class has a companion object with a generate factory method and fromString parse method.
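As a usage sketch, the PostId definition from above can be exercised as follows. The PostIdDemo object and its sample inputs are assumptions added for illustration:

```scala
import java.util.UUID
import scala.util.control.Exception.catching

case class PostId(uuid: UUID)
object PostId {
  def generate(): PostId = PostId(UUID.randomUUID())

  def fromString(s: String): Option[PostId] = s match {
    case PostIdRegex(uuid) => catching(classOf[RuntimeException]) opt { PostId(UUID.fromString(uuid)) }
    case _                 => None
  }

  private val PostIdRegex = """PostId\(([a-fA-F0-9-]{36})\)""".r
}

object PostIdDemo {
  val id: PostId = PostId.generate()

  // The generated toString has the form "PostId(<uuid>)", which fromString accepts.
  val roundTrip: Option[PostId] = PostId.fromString(id.toString)

  // Input that does not match the regular expression yields None.
  val wrongFormat: Option[PostId] = PostId.fromString("not-a-post-id")

  // Input that matches the regular expression but is not a valid UUID
  // (36 dashes) makes UUID.fromString throw, which is turned into None.
  val notAUuid: Option[PostId] = PostId.fromString("PostId(" + ("-" * 36) + ")")
}
```

Returning an Option instead of throwing lets callers handle bad identifiers (for example from a URL) with an ordinary match.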

Since events make up the durable record of everything that happened in our domain, they need to be stable. Stability can be achieved by getting the design right (hard!) and ensuring the event-related definitions have very few dependencies on other code. See Stability (PDF) by Robert C. Martin for a great explanation of managing stability and dependencies within a program.

Keeping track of the current state

Although events are very useful to track transactional and historical information, they cannot easily be used to determine the current state. So in addition to capturing the events, we also need to derive the current state from these events. We do not necessarily need to track every piece of data, only what we need for our application to make decisions (by creating new events) or to show information to the user (by rendering views, or sending out emails, etc). The state needed for the blogging application is implemented in Post.scala:

/**
 * A specific blog post with its current content.
 */
case class Post(id: PostId, content: PostContent)

/**
 * The current state of blog posts, derived from all committed PostEvents.
 */
case class Posts(byId: Map[PostId, Post] = Map.empty, orderedByTimeAdded: Seq[PostId] = Vector.empty) {
  def get(id: PostId): Option[Post] = byId.get(id)
  def mostRecent(n: Int): Seq[Post] = orderedByTimeAdded.takeRight(n).reverse.map(byId)

  def apply(event: PostEvent): Posts = event match {
    case PostAdded(id, content) =>
      this.copy(byId = byId.updated(id, Post(id, content)), orderedByTimeAdded = orderedByTimeAdded :+ id)
    case PostEdited(id, content) =>
      this.copy(byId = byId.updated(id, byId(id).copy(content = content)))
    case PostDeleted(id) =>
      this.copy(byId = byId - id, orderedByTimeAdded = orderedByTimeAdded.filterNot(_ == id))
  }
}
object Posts {
  def fromHistory(events: PostEvent*): Posts = events.foldLeft(Posts())(_ apply _)
}

Since the model classes are in-memory only, we’re free to use regular Scala data structure classes to implement whatever query capabilities we need. There is no need to worry about Object-Relational Mapping, etc. Here we use a map to track each blog post by its identifier and a simple Seq (ordered collection or list) to keep track of the order that posts were added. Notice that this entire model is represented using immutable values, which allows us to safely share the current state between multiple concurrent requests.

The Posts.get method simply looks up a post by its identifier. The Posts.mostRecent method takes the last n added post identifiers, reverses the result (so the most recently added post is first), and translates each identifier into a post using the byId map.

The Posts.apply method implements updating the current state based on an event. It matches on the type of event and updates its state accordingly. Posts.fromHistory builds up the current state by folding a sequence of events using the Posts.apply method. As with all immutable structures, an update returns a new copy of the original state with the necessary changes applied. Fortunately we do not need to copy everything to apply changes; only the parts that are changed need to be copied. This makes immutable data structures efficient enough to be practical.
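A short usage sketch shows how a history of events turns into the current state. The model is repeated so the snippet stands on its own (with the PostId companion object left out); the identifiers and post contents in PostsDemo are made up for illustration:

```scala
import java.util.UUID

case class PostId(uuid: UUID)
case class PostContent(author: String, title: String, content: String)

sealed trait PostEvent { def postId: PostId }
case class PostAdded(postId: PostId, content: PostContent) extends PostEvent
case class PostEdited(postId: PostId, content: PostContent) extends PostEvent
case class PostDeleted(postId: PostId) extends PostEvent

case class Post(id: PostId, content: PostContent)

case class Posts(byId: Map[PostId, Post] = Map.empty, orderedByTimeAdded: Seq[PostId] = Vector.empty) {
  def get(id: PostId): Option[Post] = byId.get(id)
  def mostRecent(n: Int): Seq[Post] = orderedByTimeAdded.takeRight(n).reverse.map(byId)

  def apply(event: PostEvent): Posts = event match {
    case PostAdded(id, content) =>
      this.copy(byId = byId.updated(id, Post(id, content)), orderedByTimeAdded = orderedByTimeAdded :+ id)
    case PostEdited(id, content) =>
      this.copy(byId = byId.updated(id, byId(id).copy(content = content)))
    case PostDeleted(id) =>
      this.copy(byId = byId - id, orderedByTimeAdded = orderedByTimeAdded.filterNot(_ == id))
  }
}
object Posts {
  def fromHistory(events: PostEvent*): Posts = events.foldLeft(Posts())(_ apply _)
}

object PostsDemo {
  val first = PostId(new UUID(0L, 1L))
  val second = PostId(new UUID(0L, 2L))

  // State after two posts have been added.
  val before: Posts = Posts.fromHistory(
    PostAdded(first, PostContent("Mark", "First", "Hello")),
    PostAdded(second, PostContent("Erik", "Second", "World")))

  // Applying more events returns new values; `before` itself is unchanged.
  val after: Posts = before(PostEdited(first, PostContent("Mark", "First (edited)", "Hello")))(PostDeleted(second))

  val titlesBefore: Seq[String] = before.mostRecent(20).map(_.content.title) // Second, First
  val titlesAfter: Seq[String] = after.mostRecent(20).map(_.content.title)   // First (edited)
}
```

Because before is never modified, a request that is still rendering the old state is not affected by a concurrent update.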

Another advantage of using an in-memory model is that it can be thoroughly tested, with no need to start or stub a database. PostsSpec.scala uses both manually written examples and randomly generated data to test the model. Using the randomly generated data we can run hundreds of tests in less than a second.
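The following is not the contents of PostsSpec.scala, only a rough sketch of the random-testing idea using nothing but scala.util.Random. The generator, the invariants, and all names in RandomHistoryCheck are assumptions chosen for illustration:

```scala
import java.util.UUID
import scala.util.Random

// Compact copy of the model so the sketch stands on its own.
case class PostId(uuid: UUID)
case class PostContent(author: String, title: String, content: String)
sealed trait PostEvent { def postId: PostId }
case class PostAdded(postId: PostId, content: PostContent) extends PostEvent
case class PostEdited(postId: PostId, content: PostContent) extends PostEvent
case class PostDeleted(postId: PostId) extends PostEvent
case class Post(id: PostId, content: PostContent)
case class Posts(byId: Map[PostId, Post] = Map.empty, orderedByTimeAdded: Seq[PostId] = Vector.empty) {
  def mostRecent(n: Int): Seq[Post] = orderedByTimeAdded.takeRight(n).reverse.map(byId)
  def apply(event: PostEvent): Posts = event match {
    case PostAdded(id, content) =>
      copy(byId = byId.updated(id, Post(id, content)), orderedByTimeAdded = orderedByTimeAdded :+ id)
    case PostEdited(id, content) =>
      copy(byId = byId.updated(id, byId(id).copy(content = content)))
    case PostDeleted(id) =>
      copy(byId = byId - id, orderedByTimeAdded = orderedByTimeAdded.filterNot(_ == id))
  }
}

object RandomHistoryCheck {
  private def randomContent(random: Random): PostContent =
    PostContent("author" + random.nextInt(10), "title" + random.nextInt(1000), "content" + random.nextInt(1000))

  // Picks an event that is valid for the given state:
  // edits and deletes only target posts that currently exist.
  private def randomEvent(random: Random, posts: Posts): PostEvent = {
    val existing = posts.orderedByTimeAdded
    if (existing.isEmpty || random.nextInt(3) == 0)
      PostAdded(PostId(new UUID(random.nextLong(), random.nextLong())), randomContent(random))
    else {
      val id = existing(random.nextInt(existing.size))
      if (random.nextBoolean()) PostEdited(id, randomContent(random)) else PostDeleted(id)
    }
  }

  // Properties that should hold after every event.
  private def invariantsHold(posts: Posts): Boolean =
    posts.byId.keySet == posts.orderedByTimeAdded.toSet &&
      posts.orderedByTimeAdded.distinct == posts.orderedByTimeAdded &&
      posts.mostRecent(5).size == math.min(5, posts.byId.size)

  // Generates `histories` random histories of `length` events each and
  // checks the invariants after every single event.
  def run(seed: Long, histories: Int, length: Int): Boolean = {
    val random = new Random(seed)
    (1 to histories).forall { _ =>
      var posts = Posts()
      (1 to length).forall { _ =>
        posts = posts(randomEvent(random, posts))
        invariantsHold(posts)
      }
    }
  }
}
```

Calling RandomHistoryCheck.run(42L, 200, 50) checks ten thousand state transitions without touching a database or the network.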

Putting it all together: the UI

The UI pulls everything together into a working application. In this case it is a standard Play! 2.0 Scala application and the main work is done by the PostsController.

In this example application the PostsController keeps a Software Transactional Memory (STM) reference to the current state of the application. This reference is our only piece of mutable data in the entire application!

object PostsController extends Controller {
  /**
   * A Scala STM reference holding the current state of the application,
   * which is derived from all committed events.
   */
  val posts = Ref(Posts()).single

By using an STM reference any controller method can always access the current state simply by reading from the reference. New events are applied to the current state using the commit method:

/**
 * Commits an event and applies it to the current state.
 */
def commit(event: PostEvent): Unit = {
  posts.transform(_.apply(event))
  Logger.debug("Committed event: " + event)
}

The posts reference is updated by using the transform method. This ensures the update occurs in a concurrency safe manner. In this version of the application the events are not yet saved to durable storage, but to help you see what is happening while running the application the event is printed to the debug log instead. Later we’ll implement saving the committed events to durable storage before updating the current state. The commit method returns Unit, which is similar to void in Java and provides no useful information to the caller.
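If you have not used Scala STM before, the effect of transform on a single reference can be approximated with a plain java.util.concurrent.atomic.AtomicReference. The sketch below is an analogy using only the standard library, not the mechanism the example application uses; Tally and AtomicStateDemo are hypothetical names, and updateAndGet requires Java 8 or later:

```scala
import java.util.concurrent.atomic.AtomicReference
import java.util.function.UnaryOperator

object AtomicStateDemo {
  // Immutable state, standing in for Posts.
  case class Tally(events: Vector[String] = Vector.empty)

  // The single mutable reference to the current (immutable) state.
  private val state = new AtomicReference[Tally](Tally())

  // Roughly analogous to posts.transform(_.apply(event)): the update
  // function is retried until it has been applied atomically.
  def commit(event: String): Unit = {
    state.updateAndGet(new UnaryOperator[Tally] {
      def apply(old: Tally): Tally = old.copy(events = old.events :+ event)
    })
    ()
  }

  def current: Tally = state.get()

  // Commits from several threads at once and returns the number of
  // recorded events; no update may be lost.
  def runConcurrently(threads: Int, perThread: Int): Int = {
    val workers = (1 to threads).map { t =>
      new Thread(new Runnable {
        def run(): Unit = (1 to perThread).foreach(i => commit("t" + t + "-" + i))
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    current.events.size
  }
}
```

Readers never block: they simply read whichever immutable state is current at that moment.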

The PostsController.index method is invoked by Play! to render a list of posts (the mapping from URL to a controller method is configured in the routes file). The index method returns a Play! action which generates an HTTP response from an HTTP request. In this case the response is an HTTP OK (200) containing the most recent 20 posts as rendered by the index template. Notice the use of the parentheses to read the current value of the posts reference:

/**
 * Show an overview of the most recent blog posts.
 */
def index = Action { implicit request =>
  Ok(views.html.posts.index(posts().mostRecent(20)))
}

To change an existing post we need to implement an HTTP GET to render an HTML form, and an HTTP POST to validate the form and update the post. The postContentForm defines the mapping between the HTML add/edit form and the PostContent class using a Play! form:

/*
 * Blog content form definition.
 */
private val postContentForm = Form(mapping(
  "author"  -> trimmedText.verifying(minLength(3)),
  "title"   -> trimmedText.verifying(minLength(3)),
  "content" -> trimmedText.verifying(minLength(3)))(PostContent.apply)(PostContent.unapply))

The show method first tries to look up the specified post by its id. If successful it fills the form with the current contents of the post and renders the views.html.posts.edit template. Otherwise it returns an HTTP 404 (Not Found) response:

def show(id: PostId) = Action { implicit request =>
  posts().get(id) match {
    case Some(post) => Ok(views.html.posts.edit(id, postContentForm.fill(post.content)))
    case None       => NotFound(notFound(request, None))
  }
}

When the user has made the modifications and hits the save button the submit method is invoked. First we bind the HTTP POST parameters to the form (bindFromRequest) and then perform validation (fold). If form validation succeeds it commits a new PostEdited event and redirects the browser to the PostsController.show action to show the updated post. Otherwise the method rerenders the HTML form together with the validation errors:

def submit(id: PostId) = Action { implicit request =>
  postContentForm.bindFromRequest.fold(
    formWithErrors => BadRequest(views.html.posts.edit(id, formWithErrors)),
    postContent => {
      commit(PostEdited(id, postContent))
      Redirect(routes.PostsController.show(id)).flashing("info" -> "Post saved.")
    })
}
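The fold call takes two functions, one for each outcome of binding the form. The same shape can be sketched without Play! by using Either from the standard library. The bind and submit functions below are hypothetical stand-ins that return plain strings instead of HTTP results:

```scala
case class PostContent(author: String, title: String, content: String)

object FormFoldSketch {
  // Hypothetical stand-in for postContentForm.bindFromRequest: trims every
  // field and requires at least three characters, like the form definition.
  def bind(params: Map[String, String]): Either[Seq[String], PostContent] = {
    val names = Seq("author", "title", "content")
    val values = names.map(name => name -> params.getOrElse(name, "").trim)
    val errors = values.collect {
      case (name, value) if value.length < 3 => name + ": minimum length is 3"
    }
    if (errors.nonEmpty) Left(errors)
    else {
      val byName = values.toMap
      Right(PostContent(byName("author"), byName("title"), byName("content")))
    }
  }

  // Mirrors the two branches of submit: report the errors,
  // or commit an event and redirect.
  def submit(params: Map[String, String]): String =
    bind(params).fold(
      errors => "bad request: " + errors.mkString(", "),
      content => "redirect after committing PostEdited for '" + content.title + "'")
}
```

The left function handles the failure case and the right function the success case, just as the first argument to the form’s fold handles a form with errors and the second a successfully bound PostContent.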

Note that our “domain logic” is implemented directly in the controller. This is fine for simple applications like this example, but a rich domain model is usually introduced when the logic to generate the events becomes less straightforward.

Performance

As discussed above, one of the advantages of an in-memory model is that there is no need to talk to a database just to render a view. Even though this application has not been performance tuned, it is still interesting to get a basic idea of the performance. I ran some trivial benchmarks on my laptop (an early 2010 MacBook Pro with 2.66 GHz dual core i7 with hyper-threading).

Just rendering the blog posts index view runs at about 5,500 GETs per second (with the three example blog posts you see when you start the application). This was measured with Apache JMeter using 25 concurrent client threads running on the same machine as the server.

Just submitting the blog post edit form runs at 4,200 HTTP POSTs per second or so.

In both tests the server is basically CPU bound, since we’re not performing any disk I/O. It will be interesting to see how well we can do with an actual event store implementation. The server was running on JDK 1.7.0u5 with the Garbage First GC (JVM option -XX:+UseG1GC).

Conclusion

The example application shows a simplified implementation of the event sourcing and memory image principles. Domain events are generated and used to apply updates to the memory image, which is then used to render views, without the need for a traditional database.

Hopefully you’ve noticed that none of the code is very complex. Each part (events, model, controller actions) simply focuses on a single task, without any of the “magic” that is so common with many web- or CRUD-frameworks. Even the most complicated method (Posts.apply) is a rather straightforward translation of events into updates to the current state and is easy to thoroughly test.

But the events are not yet committed to durable storage, which is necessary to build a production-ready application. In the next part we’ll see what is needed to capture the events correctly so that we can start writing the events to durable storage.
