Learning Akka Streams

2016, June 28

This blog post differs from my usual ones; I’m writing it as I learn something. As such, it is more of a story that contains errors and misunderstanding than a factual blog post.

Hello World

This blog post follows me trying to learn how to use Akka Streams. I haven’t needed to use them before, and whenever I glance at the documentation, I usually get confused about just how many new terms are being introduced.

Runnable code is available here. I’ve included more type annotations that normal, as they will assist us discussing what is going on.

We start by compiling and running the ‘Hello World’ example in the Quick Start Guide.

Source will be widely used. It represents the inputs to the stream.

A couple of things stand out in the code. Firstly, the ActorMaterializer is new, compared to standard Akka. I have no idea what it does, but I’m guessing it could have been named “ActorFactory”.

Secondly, the Source takes two type parameters. The first is the type that the Source emits. The documentations says that “the second one may signal that running the source produces some auxiliary value (e.g. a network source may provide information about the bound port or the peer’s address)”. The first one makes sense. The second one doesn’t yet. Things will hopefully become clearer once I write something else.

Either way, this stream runs and prints out 1 to 100.

Using a Sink

Source –> Sink

The first example uses a Source. It is not really a Stream. We just send everything to stdout. Let’s use Sink, which are the outputs of a stream.

There seem to be lots of kinds of Sink, including a foreach one, which we can use to println again.

Interestingly, the type signature of this is Sink[Any, Future[Done]]. From reading the ScalaDoc, Done is essentially Unit, but they used included it so the code could also run on Java. We’ve also used that second mysterious type parameter.

ScalaDoc says “The sink is materialized into a [[scala.concurrent.Future]]”. Perhaps the ActorMaterializer has been used, and deals with side-effects? Let’s keep going, and see if a more complicated example makes it easier to understand.

Simple Transform

Source –> Flow –> Sink

So far, we generate a Stream from an Iterable, send it to a Sink, and then print it out. Let’s include an intermediate step, where we do some “stream processing”. For this, we need the Flow class. This is beginning to look more like the Stream I imagined.

We discovered another type now; RunnableGraph. This is a “Flow with attached input and output, can be executed.”

That makes sense. We’ve attached a Source, and a Sink, to our Flow. Therefore it has input and output, and should work.

RunnableGraph also specifies that it has a ClosedShape, which also hints at the role that a Materializer takes. I’m still yet to figure them out.

Graphs

All of our examples so far have been very linear. There has been no way of splitting the stream to do different pieces of work, or combine multiple sources.

One use-case I can think of when using Akka Streams is to persist events on the stream to a database, in addition to continuing the stream processing.

Source --> Broadcast --> Flow --> Sink
               |
               --> Save to DB

Akka Streams has a thing named Broadcast especially for this. However, constructing and using one is more complex than I imaged. You need to start using a mutable GraphDSL.Builder. The GraphDSL implies that we now need to learn what lots of funny symbols mean, such as ~>.

We need to build up a Graph[ClosedShape, Mat] where Mat is one of those Materializers that I still don’t understand.

Interesting, it seems as though Graph doesn’t type-check very well. It is possible to construct and run a Graph that doesn’t do anything except fail at runtime. The following code gets checked via a require assertion when running the code:

I’m not really sure how one would go about asserting this at compile-time. It definitely seems possible though, as other Scala libraries have similar builder patterns that type-check. Hopefully this will be addressed in a future release.

Anyway, lets try and build a Stream that saves events to a DB.

There is a lot of new stuff here. Starting with the easiest thing; I’ve renamed some vals to reflect what they do now. We also have a new Flow named intToEvent which maps an Int to a DB.Event case class.

We also have the GraphDSL syntax, and implicit conversions. ~> means “Add Edge to Graph” in my head. As DSLs go, it isn’t too bad. Arrows are Stream processing go hand-in-hand. I’ll try and use this syntax from now on. The Broadcast class also checks at compile time that we’ve linked all the specified ‘ports’. In our example, we create the Broadcast and say it will have two things listening to it. If we only connect, one, we get a runtime error.

Lastly, we have a new Sink named dbSink. I basically copied the code from inside the printlnSink and changed the map method. It seems as though in order to perform actions on a Stream, we need to materialize the values contained inside. I now assume that Streams are inherently lazy, and Materializing is the act of evaluating the Stream.

We need to dig into Materializers, and finally figure them out.

Basic Materializer

Having read the docs some more, it seems as though “materialization” is the thing that actually runs our Stream. When using Akka, actors are created (or materialized) in order to do the work. Makes sense. I can’t help but feel that this should have been more obvious at the start…

Source ~> Sink

We return to the simplest graph, with just a Source and a Sink. The Source generates Ints, and the Sink just takes the first one.

I’ve included code of two RunnableGraphs. They only vary in second arguments; Keep.right versus Keep.left. This seems to be the key to Materializers, and the mysterious second type parameter included everywhere in our Source, Flows, and Sinks.

In order to get any values out of a Stream flowing left to right, we need to keep the right values. Presumably, in a stream going the other way, we need to keep the left values. In our case, the right side of our graph has Future[Int] as the second type parameter. This is the one we need.

The types on these methods are very obvious; given two types select the one on the left, or the right.

This now also answers my questions above about the previously used Sink named dbSink. Here’s the implementation again:

DB.persistEvent returns a Future[Unit]. In order to actually evalutate these Futures, we need to materialize the stream. As we’re Keeping right, we pass our futures into Sink.ignore. If we kept left, we pass NotUsed. Whilst ignoring them doesn’t sound very useful, this actually runs them before ignoring whatever they return.

I feel like I understand roughly how Akka Streams work now.

Here’s a recap.

This is enough blog post for now. I’d like to continue learning Akka Streams; they seem a lot easier to use than Actors, and patterns are built in. Streams are more type-safe than Actors too, which hopefully will permit writing more maintainable code.