Richard Jonas's blog


To be honest, in the last couple of months when I implemented REST interfaces in cowboy, I always implemented the happy path and followed the okay-we-will-see principle for the corner cases. That didn't sound good enough to me, so I decided to dig deeper into how to properly implement a REST interface in cowboy. The first thing we need to decide is how many handler modules we will need. The fewer the better, of course: we don't want to maintain a bunch of modules for a bunch of different HTTP methods. It seemed logical to me to have one module per REST resource (product, shop, user, etc.).

The next question is how to handle the different outcomes: not found, not authorized, malformed request, etc. So I checked the cowboy documentation and found that in rest modules we need to implement an initialization function and handlers for the different content types. For a while I lived with those happily... but they are not enough. There are other callback functions which are good for something.

Cracking the code

In order to know how to implement a proper rest interface, we need to understand how cowboy implements rest handlers. Under the hood, cowboy rest handlers are implemented via a finite state machine in the cowboy_rest module. Check the source! Yeah I know, it is an intimidatingly long module, but don't fear, it is easy: the functions are just the states of the automaton.

As you can see, the first interaction with the rest FSM is the invocation of rest_init/2. Here we have the opportunity to inspect the HTTP request and set a default state which will be stored by the underlying rest FSM. You can also make certain initializations here if you want (get some cookie values, set some defaults, etc.).

A bit below you can find a good example of how our callbacks are called. Check the known_methods/2 function: it will call our ?MODULE:known_methods/2 function. There is a wrapper function to execute that call, called call/3. It gives back no_call if the callback is not exported (the default case, in a sense), or whatever value our callback function returns. If you are unsure what a callback should return, check the source of the function which calls that callback. Here you can see that we can return {:halt, req, state} from our function, or {["GET", "POST"], req, state} as a normal response. The next/3 function sets the next state of the rest FSM. Easy, right?
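The idea behind the wrapper is easy to reimagine. Here is my own simplified sketch of it, not the actual cowboy source (the real call/3 reads the handler module and handler state from its state record):

```erlang
%% Sketch only: invoke the handler callback if it is exported,
%% otherwise signal the default case with no_call.
call(Req, HandlerState, Handler, Callback) ->
    case erlang:function_exported(Handler, Callback, 2) of
        true  -> Handler:Callback(Req, HandlerState);
        false -> no_call
    end.
```

The caller of call/3 then either applies a built-in default (on no_call) or feeds our return value into next/3.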


Since most of us understand graphs better, here are the REST flowcharts which describe what happens on the different return values of the different callbacks.

In known_methods we can check and give back the possible methods for a resource (the OPTIONS method queries them). Generally it comes from what the rest handler implements and what it doesn't; sometimes it depends on the content type of the request, so we can implement a little logic which generates the possible HTTP methods. In allowed_methods we can restrict the methods for a specific resource (or for a specific user identified by a cookie). Let us say we have a read-only table in the SQL database, so we don't want to provide the POST and PUT methods. In is_authorized we can check permissions depending on the current user, if we have such a feature.
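For example (my own sketch, not from the post), a read-only resource with a hypothetical cookie-based permission check could look like this in the handler:

```elixir
# Sketch: restrict a read-only resource and check a made-up cookie.
def allowed_methods(req, state) do
  # the backing table is read-only, so no POST/PUT
  {["GET", "HEAD", "OPTIONS"], req, state}
end

def is_authorized(req, state) do
  # cowboy 1.x request functions return {value, req} tuples
  case :cowboy_req.cookie("user_token", req) do
    {:undefined, req2} ->
      # {false, header} sets the www-authenticate response header
      {{false, "Cookie realm=\"store\""}, req2, state}
    {_token, req2} ->
      {true, req2, state}
  end
end
```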

We have the content_types_provided callback for specifying what kind of content type will be generated by which of our sub-handler functions. Later you will see how to bind the content type the client wants to the functions we have. In content_types_accepted we can do the same for the incoming data described by the request headers.

Let us implement a very simple, CRUD-like data access service for products. It will have

  • GET / for listing all the data in json
  • GET /:id for fetching a specific product (results in 200 or 404)
  • POST / for creating a new product (results in 201 CREATED)
  • PUT /:id for updating a product
  • DELETE /:id for deleting a product
You can implement PATCH as homework. It is a partial update, where the client sends only the changing part of the object.

defmodule Store.ProductHandler do
  require Logger

  def init(protocol, _req, _opts) do
    Logger.info("In init/3 #{inspect protocol}")
    {:upgrade, :protocol, :cowboy_rest}
  end

  # Init state to be a map caching the method and the path segments
  def rest_init(req, _state) do
    {method, req2} = :cowboy_req.method(req)
    {path_info, req3} = :cowboy_req.path_info(req2)
    state = %{method: method, path_info: path_info}
    Logger.info("state = #{inspect state}")
    {:ok, req3, state}
  end

  def content_types_provided(req, state) do
    {[{"application/json", :handle_req}], req, state}
  end

  def content_types_accepted(req, state) do
    {[{{"application", "json", :"*"}, :handle_in}], req, state}
  end

  def allowed_methods(req, state) do
    {["GET", "PUT", "POST", "PATCH", "DELETE"], req, state}
  end

  # Handling the 404 code
  def resource_exists(req, %{:path_info => []} = state) do
    {true, req, state}
  end

  def resource_exists(req, %{:path_info => [id]} = state) do
    Logger.info("Checking if #{id} exists")
    case :ets.lookup(:repo, String.to_integer(id)) do
      [{_id, obj}] ->
        {true, req, Map.put(state, :obj, obj)}
      _ ->
        {false, req, state}
    end
  end

  # Handle the DELETE method
  def delete_resource(req, %{:obj => obj} = state) do
    :ets.delete(:repo, obj["id"])
    {true, req, state}
  end

  # Handle GET /:id
  def handle_req(req, %{:obj => obj} = state) do
    {Poison.encode!(obj), req, state}
  end

  # Handle GET /
  def handle_req(req, state) do
    response = :ets.tab2list(:repo)
                |> Enum.map(fn {_id, obj} -> obj end)
                |> Poison.encode!
    {response, req, state}
  end

  # Don't allow POST on a missing resource -> 404
  def allow_missing_post(req, state) do
    {false, req, state}
  end

  # Handle PUT or POST if the resource is not missing
  def handle_in(req, state) do
    {:ok, body, req2} = :cowboy_req.body(req)
    obj = Poison.decode!(body)
    Logger.info("Accepting #{inspect obj}")
    :ets.insert(:repo, [{obj["id"], obj}])
    {true, req2, state}
  end
end
(That is what I like in Elixir: I didn't even need to reformat my code after pasting it here ;) ).

You can see that in rest_init I cache the method and the path elements in the state, which is a map. The handle_req function handles the requests which typically don't have request bodies (GET, HEAD). In handle_in I handle the incoming data (POST, PUT).

In resource_exists I check if a specific record exists in the ETS table. If we get the index (the list of all objects), we don't have an id and we don't need such a check. If I check the existence of an object, I cache the object, since it is very probable that I will need to fetch it again. I am caching it in the rest FSM state, which is freed after the rest request is served.

In handle_req I split the two cases: fetching the index or fetching a specific object. We get to handle_in if the request is a POST or PUT. So I get the body of the request and decode the JSON with Poison. It gives back a map, so I can build the tuple which I store in the ETS table. Here we can give back true, which generates a 204 NO CONTENT, or {true, url}, which sets the Location header (so a POST results in a 201 CREATED).
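If we wanted the created resource's URL in the response, a variant of handle_in could return the new path (a sketch, assuming the same :repo table):

```elixir
# Sketch: return {true, url} so cowboy sets the Location header
# and a POST results in 201 CREATED.
def handle_in(req, state) do
  {:ok, body, req2} = :cowboy_req.body(req)
  obj = Poison.decode!(body)
  :ets.insert(:repo, [{obj["id"], obj}])
  {{true, "/product/#{obj["id"]}"}, req2, state}
end
```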

In allow_missing_post, by giving back false we say that we don't want clients to send POST requests to non-existing resources. So POST / is allowed and a key will be generated, but POST /99 is not allowed if the product with id 99 doesn't exist. In delete_resource we cover the DELETE method.

From the flowchart you can see in detail how, for example, a DELETE request is handled: if the object to be deleted changed since we fetched it, we can detect that with the If-Match header by checking whether the request carries the same ETag, and so on. So we have tons of choices to fully exploit the cowboy rest handlers.

For testing I use the Advanced REST Client Google Chrome extension. It records my HTTP requests, so I can easily replay them from the history.

Future changes

If you look at a very recent version of cowboy, you can see that several things will change. For example, rest_init and rest_terminate are removed, init/3 becomes init/2, and the function return values are refactored too. We will probably cover the changes later, but since this is the current stable version of cowboy, we can safely use it (1.0.4 at the moment).

In the last couple of months I demonstrated what Riak is, what features it has and how easy it is to manage. That is OK, and one may believe that it is easy to install and set up Riak, but the best way to prove that is to show how to set up a cluster.

Build a cluster

There are many ways to install Riak. There are prebuilt binaries per platform, so for example if you have Red Hat Enterprise Linux, you can choose the appropriate binary. Since we just want to play with Riak, we don't really want to set up multiple physical (or virtual) hosts, just build a cluster. We can easily do that by compiling the source and building a dev cluster.

Check the Riak downloads page for the binaries and the source package. With the next couple of commands you will have the Riak source in your riak-2.1.3 directory.

mkdir ~/riak
cd ~/riak
curl -O http://s3.amazonaws.com/downloads.basho.com/riak/2.1/2.1.3/riak-2.1.3.tar.gz
tar xzf riak-2.1.3.tar.gz
cd riak-2.1.3

So far so good; now we need to build Riak with 5 nodes. You need to have Erlang R16 or 17 in your path to build Riak, so activate it first. With DEVNODES=5 make devrel we can build Riak and create 5 dev nodes. As a result we get Riak nodes in the dev/dev1, dev/dev2, etc. directories with non-conflicting ports. You can check that in dev/dev1/etc/riak.conf and so on. Here are some lines about ports in riak.conf.

## Name of the Erlang node
## Default: dev1@127.0.0.1
## Acceptable values:
##   - text
nodename = dev1@127.0.0.1

## listener.http.<name> is an IP address and TCP port that the Riak
## HTTP interface will bind.
## Default: 127.0.0.1:10018
## Acceptable values:
##   - an IP/port pair, e.g. 127.0.0.1:10011
listener.http.internal = 127.0.0.1:10018

## listener.protobuf.<name> is an IP address and TCP port that the Riak
## Protocol Buffers interface will bind.
## Default: 127.0.0.1:10017
## Acceptable values:
##   - an IP/port pair, e.g. 127.0.0.1:10011
listener.protobuf.internal = 127.0.0.1:10017

So we have 5 isolated Riak instances here. Let us run the first 3 and build a cluster. During cluster build we need to tell the nodes to join another designated node; in our example we will tell dev2 and dev3 to join dev1. It won't happen instantly as we execute the commands: a cluster plan is staged instead. We need to check the plan and commit it in order for the changes to kick in.

$ cd dev
$ for i in dev{1..3}; do $i/bin/riak start; done
$ dev1/bin/riak-admin member-status
=============================== Membership ================================
Status     Ring    Pending    Node
valid     100.0%      --      'dev1@127.0.0.1'
Valid:1 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

So the nodes are running, and with member-status we can check that dev1 holds the whole ring. Riak puts all key-value pairs in a ring which is divided into 64 parts by default, called vnodes or partitions. Now the dev2 and dev3 nodes will join dev1, and we check the cluster plan.
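A quick back-of-the-envelope check of the percentages we will see below (my own arithmetic, not from the post): with 64 vnodes spread over 3 nodes, one node keeps 22 vnodes and the other two get 21 each.

```erlang
1> 22/64 * 100.    %% dev1 keeps 22 vnodes -> the 34.4% in member-status
34.375
2> 21/64 * 100.    %% dev2 and dev3 get 21 vnodes each -> 32.8%
32.8125
3> 21 + 21.        %% vnodes leaving dev1: the 42 transfers in the plan
42
```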

$ dev2/bin/riak-admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev2@127.0.0.1' to 'dev1@127.0.0.1'
$ dev3/bin/riak-admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev3@127.0.0.1' to 'dev1@127.0.0.1'
$ dev1/bin/riak-admin cluster plan
============================= Staged Changes ==============================
Action         Details(s)
join           'dev2@127.0.0.1'
join           'dev3@127.0.0.1'

NOTE: Applying these changes will result in 1 cluster transition

                       After cluster transition 1/1

=============================== Membership ================================
Status     Ring    Pending    Node
valid     100.0%     34.4%    'dev1@127.0.0.1'
valid       0.0%     32.8%    'dev2@127.0.0.1'
valid       0.0%     32.8%    'dev3@127.0.0.1'
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 42
  21 transfers from 'dev1@127.0.0.1' to 'dev3@127.0.0.1'
  21 transfers from 'dev1@127.0.0.1' to 'dev2@127.0.0.1'

You can see how the ring will be distributed after dev2 and dev3 join the cluster (which is a 1-node cluster right now). Now we commit the changes and check how the partitions move.

$ dev1/bin/riak-admin cluster commit
Cluster changes committed

$ dev1/bin/riak-admin transfers
'dev3@127.0.0.1' waiting to handoff 1 partitions
'dev2@127.0.0.1' waiting to handoff 1 partitions
'dev1@127.0.0.1' does not have 8 primary partitions running

Active Transfers:

$ dev1/bin/riak-admin member-status
=============================== Membership ================================
Status     Ring    Pending    Node
valid      75.0%     34.4%    'dev1@127.0.0.1'
valid      17.2%     32.8%    'dev2@127.0.0.1'
valid       7.8%     32.8%    'dev3@127.0.0.1'
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

$ dev1/bin/riak-admin transfers
'dev3@127.0.0.1' waiting to handoff 3 partitions
'dev2@127.0.0.1' waiting to handoff 3 partitions
'dev1@127.0.0.1' waiting to handoff 15 partitions
'dev1@127.0.0.1' does not have 3 primary partitions running

Active Transfers:

$ dev1/bin/riak-admin transfers
No transfers active

Active Transfers:

When we see that there are no active transfers, we are done: all the partitions are distributed. Let us check with riak-admin member-status.

$ dev1/bin/riak-admin member-status
=============================== Membership ================================
Status     Ring    Pending    Node
valid      34.4%      --      'dev1@127.0.0.1'
valid      32.8%      --      'dev2@127.0.0.1'
valid      32.8%      --      'dev3@127.0.0.1'
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Node failure

Let us simulate the situation when node 2 is down. Bring it down with dev2/bin/riak stop.

$ dev1/bin/riak-admin ring-status
================================ Claimant =================================
Claimant:  'dev1@127.0.0.1'
Status:     up
Ring Ready: true

============================ Ownership Handoff ============================
No pending changes.

============================ Unreachable Nodes ============================
The following nodes are unreachable: ['dev2@127.0.0.1']

This happens when dev2 crashes for some reason. Probably our monitoring system will detect this situation faster than we could by checking the ring status. But it can also happen that dev2 is not down but unreachable (a netsplit). Netsplits can happen when the net tick message is not received on time (on an overloaded network), even though the node itself may not be overloaded at all. So it is a different situation from a node crash, and we need a different monitoring tool to detect netsplits.

Extending the cluster

Let us suppose that Black Friday is coming and we expect growth in the number of transactions. We don't want to extend the Riak cluster node by node; we want to add two nodes in one step (which is the recommended way of extending the cluster). Let us start the two nodes and join them to the cluster.

$ dev4/bin/riak start
$ dev5/bin/riak start
$ dev4/bin/riak-admin cluster join dev1@127.0.0.1
$ dev5/bin/riak-admin cluster join dev1@127.0.0.1
$ dev1/bin/riak-admin cluster plan
$ dev1/bin/riak-admin cluster commit

We pretty much know what to expect from the commands, but I pasted the cluster plan here. It shows how many partitions will be moved during the cluster extension.

$ dev1/bin/riak-admin cluster plan
============================= Staged Changes ==============================
Action         Details(s)
join           'dev4@127.0.0.1'
join           'dev5@127.0.0.1'

NOTE: Applying these changes will result in 1 cluster transition

                       After cluster transition 1/1

=============================== Membership ================================
Status     Ring    Pending    Node
valid      34.4%     20.3%    'dev1@127.0.0.1'
valid      32.8%     20.3%    'dev2@127.0.0.1'
valid      32.8%     20.3%    'dev3@127.0.0.1'
valid       0.0%     20.3%    'dev4@127.0.0.1'
valid       0.0%     18.8%    'dev5@127.0.0.1'
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Transfers resulting from cluster changes: 49
  4 transfers from 'dev3@127.0.0.1' to 'dev5@127.0.0.1'
  4 transfers from 'dev1@127.0.0.1' to 'dev2@127.0.0.1'
  4 transfers from 'dev3@127.0.0.1' to 'dev1@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev4@127.0.0.1'
  4 transfers from 'dev1@127.0.0.1' to 'dev3@127.0.0.1'
  4 transfers from 'dev3@127.0.0.1' to 'dev2@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev5@127.0.0.1'
  5 transfers from 'dev1@127.0.0.1' to 'dev4@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev1@127.0.0.1'
  4 transfers from 'dev1@127.0.0.1' to 'dev5@127.0.0.1'
  4 transfers from 'dev3@127.0.0.1' to 'dev4@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev3@127.0.0.1'

That is it

So basically we know how to build a cluster on our development machine. Obviously, if we install a Riak cluster in a production server environment we need to act differently (install binary packages which use the system-wide /etc, /var/lib and /var/log directories). But the thinking is the same: new nodes always join existing nodes in the cluster, and we always have to check the cluster plan.

In this second blogpost I introduce some advanced features of ErlyDTL templating (in the first post I showed how to create custom behaviours), like template inclusion, template extension and the basic setup of a template-enabled Erlang application.

Set up environment

Create a directory and download erlang.mk. For the last couple of projects I used erlang.mk over rebar because for me it is better customizable.

curl -O https://erlang.mk/erlang.mk

In the project we will have src and templates directories. Let us create a Makefile with which we can compile the application, including the templates, and also create the release itself. The Makefile looks like this:

PROJECT = product
DEPS = cowboy erlydtl

include erlang.mk

Issuing a make command will bootstrap erlang.mk, and then we can compile the application with make app. Let us create a simple RESTful application.

Restful Cowboy

The product application will have an application description file (product.app.src), the application behaviour (product.erl), a supervisor (product_sup.erl) and a rest handler (product_rest.erl). The supervisor won't do anything; the application module will start cowboy and register the dispatch rules. That is the minimal set of Erlang files we can live with. The files will be put into the src directory. See the application description file:

{application, product, [
    {description, "Simple product RESTful app"},
    {id, "product"},
    {vsn, "0.0.1"},
    {modules, []},
    {applications, [
        kernel, stdlib,
        cowboy, erlydtl
    ]},
    {registered, []},
    {mod, {product, []}}
]}.

It contains our minimal needs, the dependent applications (cowboy and erlydtl), and it starts the product application module.


-module(product).
-behaviour(application).

-export([start/2, stop/1]).

start(_StartType, _Args) ->
    Rules = [{'_', [{"/product/:id", product_rest, []}]}],
    Dispatch = cowboy_router:compile(Rules),
    cowboy:start_http(product_http, 5, [{port, 8080}],
                      [{env, [{dispatch, Dispatch}]}]),
    product_sup:start_link().

stop(_State) ->
    ok.

The application module compiles the cowboy dispatch rules, starts cowboy with them, and then starts the supervisor.



-module(product_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    {ok, {{one_for_one, 1, 1}, []}}.

Now let us stop a bit and create the relx.config to see if we have the minimal application we want. With relx, release creation is very simple; we just need to specify the release name and the applications we use, and that is it. During development it is a good idea to specify dev_mode: this way the available OTP applications won't be copied into our _rel directory, but symlinks will be created. Put the relx.config into the project root; erlang.mk will download the relx executable automatically, so when we execute make rel, make will call relx to create the release (in the _rel directory by default). If extended_start_script is true, it creates a product script in the _rel/product/bin directory with which we can start the application (or use make run).

{release, {product, "0.0.1"},
          [cowboy, erlydtl, product]}.
{extended_start_script, true}.
{dev_mode, true}.

With make rel the Erlang release is built, so with make run we can run the application. Bingo.

Implement rest handler

For the sake of simplicity the rest handler contains a hard-wired database. It contains a generic product (with id 1) which will be displayed by the generic.dtl template (see later), and another product which is rendered by the guitar.dtl template.



-module(product_rest).

-export([init/3, content_types_provided/2, content_types_accepted/2,
         allowed_methods/2, handle_get/2, handle_post/2]).

init(_Protocol, _Req, _Opts) ->
    {upgrade, protocol, cowboy_rest}.

content_types_provided(Req, State) ->
    Handlers = [{<<"application/json">>, handle_get}],
    {Handlers, Req, State}.

content_types_accepted(Req, State) ->
    Accepted = [{{<<"application">>, <<"json">>, '*'}, handle_post}],
    {Accepted, Req, State}.

allowed_methods(Req, State) ->
    {[<<"GET">>, <<"POST">>, <<"OPTIONS">>], Req, State}.

handle_get(Req, State) ->
    {Param, _} = cowboy_req:binding(id, Req),
    Id = binary_to_integer(Param),
    {ok, Msg} = case {Id, get_product(Id)} of
                    {1, P} ->
                        generic_dtl:render([{product, P}]);
                    {2, P} ->
                        guitar_dtl:render([{product, P}])
                end,
    {Msg, Req, State}.

%% Sample POST handler for the sake of example :)
handle_post(Req, State) ->
    {ok, Body, Req2} = cowboy_req:body(Req),
    {process_json(Body), Req2, State}.

process_json(Binary) ->
    case post_handler(Binary) of
        ok ->
            true;
        {error, _Reason} ->
            false
    end.

get_product(Id) ->
    case Id of
        1 ->
            #{id => 1,
              description => <<"Guitar leather bag">>,
              category => #{name => <<"Other">>}};
        2 ->
            #{id => 2,
              description => <<"Jackson SL-3">>,
              category => #{name => <<"Electric guitar">>},
              frets => 24,
              body => <<"Alder">>,
              pickup => <<"Seymour Duncan">>}
    end.

post_handler(_) ->
    ok.
This is the longest module; the exported functions are the mandatory callbacks which implement the REST API. The get_product/1 function is the hard-wired product database, and in handle_get/2 we take the id sent in the URL path, fetch the product, and render it conditionally.

The base template is generic.dtl, which is:

{
    "id": {{ product.id }},
    "description": "{{ product.description }}",
    {% block category %}
    "category": "{{ product.category.name }}"
    {% endblock %}
    {% block specific %}
    {% endblock %}
}

It defines the generic part, serializing id and description, which are in every product. We give a default implementation for category, and we leave the specific part to be defined by specific products like guitars, keyboards, etc. The guitar.dtl template extends the generic template and refines the implementation of category.

{% extends "generic.dtl" %}

{% block category %}
    "category": "Guitar/{{ product.category.name }}"
{% endblock %}
{% block specific %},
    "frets": {{ product.frets }},
    "body": "{{ product.body }}"
{% endblock %}

We can use {% include "file.dtl" %} for externalizing complex and/or reusable parts of the template. It is not a big deal: all variables which are visible in the including template will be visible in the included template too.
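For example (a hypothetical snippet, not part of the product app), the category line could live in its own file; the included file sees the product variable of the including template:

```django
{# category.dtl #}
"category": "{{ product.category.name }}"

{# in the including template #}
{% include "category.dtl" %}
```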


So when we need to create formatted messages which collect a number of variables or behave differently depending on some input parameters, we can use ErlyDTL templates with success. Creating JSONs, or the maps which are the data source of JSONs, can make the source hard to understand: a lot of boilerplate code and only a few lines of real business logic. If that is the case, use templates. In part 1 you can learn how to implement a custom ErlyDTL library, which gives you the possibility to enrich the functionality of the templates you create.

Erlang, like Java, doesn't have a single standard tool with which we can handle dependencies and compile the application modules. In Java, Maven and Ivy are there for dependency handling; the Erlang world also has a palette from which we can choose when it comes to dependency handling.

In my earlier projects I used rebar, which was good, but as I wanted to extend my build with specific steps (generating sys.config from a template, or starting riak during testing), I found it difficult to solve those problems with rebar. Then I found erlang.mk. At first I found it overly complex, but as I made more and more project builds with it, I started to like it, mostly because of its extensibility. In this post I would like to show how erlang.mk works and how one can extend the build lifecycle.

How erlang.mk works

Basically erlang.mk is a big parametric Makefile. First we need to provide the parameter values, and then we can include erlang.mk. Let us create an empty directory and download the bootstrap file.

curl -O https://erlang.mk/erlang.mk
And create a Makefile.
PROJECT = webshop
PROJECT_DESCRIPTION = A webshop application

DEPS = cowboy jsx ejson lager riakc

dep_ejson = git https://github.com/jonasrichard/ejson.git master

include erlang.mk

In a typical makefile there will be a project identification and description, and the DEPS variable will contain the list of dependencies we require. Dependencies are project names, so a question can pop into our mind: how does erlang.mk know the url of jsx or cowboy? erlang.mk contains the names and repo urls of the typical Erlang dependencies, so they are in the file. However, ejson is not such a package, therefore I need to specify its repo url in the dep_ejson variable.

After running make, erlang.mk will be downloaded and then all the dependencies are fetched and compiled. With make help you can see the targets defined by default. With make deps, erlang.mk gets the dependencies from the various source repositories; if you add a new dependency, you need to fetch it with make deps if it is not fetched automatically. make app compiles the dependencies and the application itself. With make tests we can run the unit and common tests. make rel builds the Erlang release, which we can run with the generated start script, or we can run the project with make run.

As always, we can pass parameters to the targets via environment variables; for example, to run the unit tests with cover enabled, run COVER=1 make tests cover-report. In the same way we can specify ERLC_OPTS, which holds the command line switches of the Erlang compiler. We can also write a build-environment-sensitive makefile, so that with ENV=dev make run we can run the application in the development environment.
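Such an environment-sensitive makefile can be sketched like this (ENV is our own convention, not something erlang.mk defines; RELX_CONFIG is the erlang.mk variable pointing at the relx config file):

```make
# extra flags for the Erlang compiler on every build
ERLC_OPTS += +debug_info

# pick a relx config depending on our own ENV variable
ifeq ($(ENV),dev)
RELX_CONFIG = $(CURDIR)/relx.dev.config
endif
```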

Build lifecycle

Simple makefiles can be debugged with make -sn, which prints what make would do. Try this with erlang.mk: it results in tons of messages. That is because erlang.mk creates variables which contain Erlang code snippets, which are evaluated by erl -eval. It is a fair way of defining custom build targets by writing Erlang code. So it is way easier to open the source of erlang.mk, which shows you what will be done. At first this many-thousand-line makefile can be intimidating, but after a bit of analysis you can see that it starts with the general part, then there are the embedded parts (like unit testing, cover, dialyzer), then come some thousand lines of known dependencies, then the 3rd party plugins.

In a makefile one can write something like

.PHONY: compile eunit

compile:
    erlc src/*.erl

eunit: compile
    erlc test/*.erl -o test
    erl -pz ebin test -eval "run the eunit tests :)"

This is nice if the parameters are correctly specified. The compile target compiles the source; the eunit target depends on compile and also compiles the tests and runs them. The only problem is that if we define such a framework, nobody can add hooks to be executed before or after compiling or running the unit tests.

In most cases erlang.mk uses double-colon rules. There may be several double-colon rules with the same name; in that case all of them will be run in the order of their occurrence. The app target is such a double-colon target, so one can hook commands before and after it. If we write a rule before including erlang.mk, that rule will occur earlier than the rules in erlang.mk, so it will be executed before the app rule, and vice versa. So one can, for example, filter sources before compilation and make an archive file afterwards.

app::
    filter source files, replace things in them

include erlang.mk

app::
    tar cvfz beams.tar.gz ebin/

The only problem with this solution is that in a Makefile the first target is the default target, and we overwrote the default target of erlang.mk, which was all (erlang.mk has all:: deps app rel as its first target); now it is app. So we need to write .DEFAULT_GOAL := all somewhere in the Makefile to set the default target back. Why is this a problem? Because if we override the rel target, rel will be the default, and if somebody uses our project as a dependency, when erlang.mk builds the dependencies it goes into each deps/project directory and executes make without specifying a target. In our case rel (or app) would be the default target, which may or may not work.

But back to business: let us build a release.

Specify relx.config

By default erlang.mk uses relx to build releases. Relx makes Erlang release creation very simple, or at least it simplifies the first steps very much. To run relx we need a relx.config file which describes what relx should do. Here is a simple relx.config:

{release, {webshop_release, "0.0.1"}, [webshop]}.
{extended_start_script, true}.

relx relies on the application's app file (in this case ebin/webshop.app), namely on the applications tuple which contains the dependencies. So in relx.config we don't need to specify the dependencies of our applications, only our applications; relx will traverse the app files and collect all the applications required. So after running make rel, the release is created in the _rel directory.

How to go on?

We are just scratching the surface of what we can do with erlang.mk. There are a lot of plugins which can execute not-so-common tasks, and we can also define 3rd party plugins for erlang.mk. Anyway, in this post I described the basic idea of how erlang.mk can be used; the advanced topics can be understood after this introduction. The main takeaway is: if you have a question about erlang.mk, always use the source.

Recently I needed to implement a RESTful service in Erlang which got a large JSON and, depending on the content, needed to react to different events (like a new product registered in a webshop, etc.). Reacting meant that the application had to call services with JSON response bodies. I don't know if you have ever built even medium-sized JSONs (or maps) in your source, but it doesn't look good.

Let us imagine that we have a product service which needs to provide various information about products, like code, description and price. We have some database which can give us maps describing our products. Those maps can be arbitrarily deep, so we want some chained access to things 3 or 4 levels deep in the maps. The products look like this:


-module(product).

-export([create_product/0, map_to_json/1]).

-spec create_product() -> map().
create_product() ->
    #{id => 483,
      product_code => "C7HR-BCH",
      brand => "Schecter",
      description => "Schecter C7HR-BCH",
      category => #{name => "electric guitar",
                    category => #{name => "7 string model"}},
      frets => 24,
      body => "mahogany",
      pickup => "2x EMG 707TW"}.

%% convert a map to a JSON binary
map_to_json(Map) ->
    jsx:encode(Map).

In this module we handle data as maps. Maps can be converted to the JSON format and back; I am using the jsx library to do that. During decoding we should give jsx a hint that we expect maps: jsx:decode(Binary, [return_maps]).
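A round-trip sketch in the shell (assuming jsx is on the code path; note that jsx wants binaries for keys and string values):

```erlang
1> Bin = jsx:encode(#{<<"id">> => 483, <<"brand">> => <<"Schecter">>}).
2> Map = jsx:decode(Bin, [return_maps]).
3> maps:get(<<"id">>, Map).
483
```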

In this simple example you can see what happens when we have embedded JSON objects in the code. When I needed to get object paths out of JSONs, the situation was even worse. So let us suppose that we need to create a category path from this product, resulting in the "electric guitar/7 string model" string.


get_category_path(Product) ->
    string:join(get_categories(Product), "/").

get_categories(Product) ->
    case maps:is_key(category, Product) of
        true ->
            C = maps:get(category, Product),
            [maps:get(name, C)] ++ get_categories(C);
        false ->
            []
    end.
The problem is that we need to take care of the safeness of accessing the map elements, otherwise we will face badkey exceptions. That can make the code complex, even if we know that when there is no such field as name, an empty string is fine for now: the category path will be rendered on a webpage anyway, and someone will fix it later. But we don't want to crash the page generation.
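One way out (my own sketch, not from the post) is maps:get/3 with a default value, so a missing name or category cannot crash the rendering:

```erlang
%% Safe variant: every lookup has a default, no badkey possible.
get_categories_safe(Product) ->
    case maps:get(category, Product, undefined) of
        undefined ->
            [];
        C ->
            [maps:get(name, C, "")] ++ get_categories_safe(C)
    end.
```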

Solve the problem with templates

ErlyDTL implements Django templates in the Erlang environment. One writes templates and saves them into files with the .dtl extension. The ErlyDTL compiler compiles them to Erlang source files; in the next step those are compiled to beam files with erlc as usual. A template becomes a generated Erlang module, so for example my.dtl will be the my_dtl module. That module has a render(Vars) function which renders the template with the context we provide by passing the variables to the render function. As a result we get an iolist, which is good from an optimization point of view, but we need to be aware that it is an iolist when we pass the result to our functions (io:format("~s", ...) handles iolists, but not every function is prepared for them).

{
    "id": {{ product.id }},
    "productCode": "{{ product.product_code }}",
    "brand": "{{ product.brand }}",
    "description": "{{ product.description }}",
    "category": "{{ product.category.name }}",
    "subCategory": "{{ product.category.category.name }}",
    "frets": {{ product.frets|default:22 }},
  {% if product.body %}
    "body": "mahogany",
  {% endif %}
    "pickup": "2x EMG 707TW"
}

Templates contain tags, expressions and pure text (see the ErlyDTL GitHub page for details). Between double curly brackets you can write an expression which is evaluated in the variable context passed to the render() function of the module. So basically I need to define a product variable and put it in the context.

my_dtl:render([{product, product:create_product()}]).

With tags like if or ifequal we can write control structures, so if the guitar body is not specified we don't emit such a property in the JSON. Also, with filters I can say that if the number of frets is not specified, the default should be 22. It depends on the business logic, but you probably see what they are good for.

Custom tags, filters

In most situations the functionality Django/ErlyDTL gives us is enough, but as always, the last 10% of the problems is what makes software development complex. So we are almost there, but we need to query the product price from an external database. Or not the price but the availability. How do we solve that problem, given that a template can only contain pre-defined variables, expressions, control structures and text-formatting helpers?

The big power of ErlyDTL is the possibility to extend its functionality by creating custom tags and filters, in other words a custom library. To create such a library we need to create a module which implements the erlydtl_library behaviour. The module should provide all the filter and tag names it defines, and it needs to export the functions which implement those tags and filters. A custom filter is a one- or two-parameter function which gets the value of the variable we want to filter; the second, optional parameter is the parameter of the filter (like the default value in case of the default filter in the example above). Custom tags are two-parameter functions which get the variables provided in the parameter list and a list of rendering options. A custom tag may return new variable bindings, so by executing a custom tag we can define variables in the page context. A very powerful tool.


-module(guitar_lib).
-behaviour(erlydtl_library).

-export([version/0, inventory/1]).
-export([get_price/2, frets/2]).

version() -> 1.

inventory(filters) -> [frets];
inventory(tags) -> [get_price].

%% In the lack of a fret number, we can give defaults
%% depending on the guitar brand
frets(undefined, Brand) ->
    case Brand of
        <<"Schecter">> -> 24;
        <<"Jackson">> -> 24;
        _ -> 22
    end;
frets(FretNum, _Brand) ->
    FretNum.

get_price(Vars, _Opts) ->
    case lists:keyfind(id, 1, Vars) of
        {id, Id} ->
            %% Let it crash if the service fails
            {ok, Price} = guitar_store:get_price_by_id(Id),
            [{value, Price}];
        false ->
            %% No id specified: we can crash or we can
            %% leave the context variables as they are
            []
    end.

We have the module, so we need to compile it, and we need to tell the erlydtl compiler that we have a library to be loaded during compilation. We can do that by adding {libraries, [{guitar, guitar_lib}]} to the compiler options. Under the name guitar the guitar_lib module will be accessible in the templates, so we can rewrite the template a bit now.
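In a rebar-based build this can go into rebar.config (a sketch; the exact option key can depend on the erlydtl/rebar version you use):

```erlang
%% rebar.config: tell the erlydtl compiler about our
%% custom library so templates can {% load guitar %}
{erlydtl_opts, [
    {libraries, [{guitar, guitar_lib}]}
]}.
```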

{% load guitar %}
{% get_price id=product.id as price %}
{
    "id": {{ product.id }},
    "productCode": "{{ product.product_code }}",
    "brand": "{{ product.brand }}",
    "description": "{{ product.description }}",
    "price": "{{ price.value|default:"n/a" }}",
    "category": "{{ product.category.name }}",
    "subCategory": "{{ product.category.category.name }}",
    "frets": {{ product.frets|frets:product.brand }},
  {% if product.body %}
    "body": "mahogany",
  {% endif %}
    "pickup": "2x EMG 707TW"
}

We don't want 0 prices, otherwise if somebody manages to buy a product at that price, we have to ship it to them. Also, the frets filter gets the brand of the product as its parameter so that it can choose sensible defaults. With the load tag we load the library, making all its features accessible. If we are using a small number of libraries whose functionalities don't collide, we can load them by default by specifying the {default_libraries, [guitar]} tuple in the compiler options.


So a custom library is a powerful feature of ErlyDTL, since we can implement custom business logic in templates. Also, in the lack of a set tag, we can now implement a specific variable setter (including some business logic). As always, the advice is not to make the template library overly complex, since the goal is to have easy-to-read templates.

Actually, when I worked as a Java developer I never heard about this kind of testing. We wrote small unit tests, some integration tests and a lot of end-to-end tests, but property-based testing somehow never came up.

What is property-based testing?

Property-based testing is a good addition when we already have enough unit tests but want to make sure that our functions or modules are prepared for any kind of incoming data. The main idea behind the two tools I know (PropEr, QuickCheck) is that we should not write test cases but generate them instead. From the function specification one can easily guess what kind of inputs a function can receive: if we have an add/2 function and the specification says that it adds two numbers, we know that both parameters will be numbers. So we can generate a practically infinite number of test cases.

The problem is that we don't know the expected result, since we don't know the input parameters in advance. Ok, we could write add(X, Y) =:= X + Y, but in that case we would reimplement the function itself inside the test code. We don't want to do that. Instead we can find properties of those operations with which we can describe their nature.

Add is commutative, so add(X, Y) =:= add(Y, X). It is trivial here, but testing an equals method in Java that way is a very useful test. Property-based tests become much more useful when we have the reverse operation at hand. Imagine that we implemented a sub/2 function which subtracts the second parameter from the first one. Now we can test the two functions together: sub(add(X, Y), Y) =:= X. If we implement a test that way, the PropEr tool generates 100 tests with random numbers and checks whether the condition we wrote holds. Sometimes we get surprising test failures because we didn't know about -0 or 0.0 or -0.0, things like that.
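A sketch of such a reverse-operation property with PropEr (the add/2 and sub/2 functions are the hypothetical examples from above, not library code):

```erlang
-module(arith_prop).
-include_lib("proper/include/proper.hrl").
-export([add/2, sub/2, prop_add_sub/0]).

add(X, Y) -> X + Y.
sub(X, Y) -> X - Y.

%% For every generated integer pair, subtracting Y from
%% the sum must give back the original X
prop_add_sub() ->
    ?FORALL({X, Y}, {integer(), integer()},
            sub(add(X, Y), Y) =:= X).

%% Run with: proper:quickcheck(arith_prop:prop_add_sub()).
```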

If the test fails, PropEr has the exact test case on which it failed. It can be very complicated, containing long lists or big float numbers, and the error can be hard to understand if the tool just spits out those numbers. Instead, those tools shrink the test case: they convert it to a simpler form and check whether it still fails. Once the minimal failing test case is found, it is reported.

JSON name conversion

I implemented a simple framework which helps to convert Erlang records to JSON. Right now it can convert an Erlang record to JSON, but there is no way back. So I started to implement decoding too, and since encoding and decoding are reverse operations, we can use QuickCheck or PropEr to test both of them together.

The project is here: ejson github repo, and our first task is to convert an Erlang atom to a JSON string. Unfortunately the set of Erlang atoms is wider than the set of JSON names, and since we want to convert JSON values to JavaScript objects, we need to put some restrictions on record field names.

A record field should look like this: number_of_connections, and it should be converted into numberOfConnections. The atom contains small letters and underscores (numbers as well), and the JSON name will be camel cased accordingly: if there is an underscore in the name, the next character becomes a capital letter. These restrictions make it possible to convert JSON names back into Erlang atoms unambiguously.



-include_lib("proper/include/proper.hrl").
-include_lib("eunit/include/eunit.hrl").

all_test() ->
    ?assert(proper:quickcheck(camel_case_prop(),
                              [{to_file, user}])).

identifier_char() ->
    frequency([
        {$z - $a + 1, choose($a, $z)},
        {3, $_},
        {10, choose($0, $9)}
    ]).

record_name() ->
    ?LET(Chars, list(identifier_char()),
         list_to_atom(Chars)).

camel_case_prop() ->
    ?FORALL(Name,
            ?SUCHTHAT(R, record_name(),
                      ejson_util:is_convertable_atom(R)),
            begin
                CC = ejson_util:atom_to_binary_cc(Name),
                ejson_util:binary_to_atom_cc(CC) =:= Name
            end).

I am using PropEr and eunit together. This module has an eunit test which is the main entry point: it is picked up by eunit and executed, and it calls PropEr to check the camel_case_prop property. The property basically says that for every generated Name, if we convert the name to a binary (cc means camel case) and then convert that binary back, the resulting atom should equal the generated one.

The FORALL macro is the executor which generates the test cases (see the documentation); the generated value is bound to the Name variable. The record_name() function is a generator which generates atoms of the form we specified above: a list of identifier characters, where each character is produced by another generator. Inside identifier_char() there is a frequency (a generator, too) which generates weighted cases: with weight 26 it chooses a letter between 'a' and 'z', with weight 3 an underscore, and with weight 10 a decimal digit.

Obviously we need to filter out some names like 1st_step or _main, so we include the SUCHTHAT macro to filter out all test cases which don't conform to is_convertable_atom/1 (which enforces those rules). So let us create an Erlang module ejson_util and start putting the functions there.

is_convertable_atom(Atom) ->
    %% true if the atom can be converted by the
    %% two functions unambiguously
    L = atom_to_list(Atom),
    start_with_char(L) andalso proper_underscore(L).

start_with_char([L | _]) when L >= $a andalso L =< $z ->
    true;
start_with_char(_) ->
    false.

%% If there is an underscore, it needs to
%% be followed by a letter
proper_underscore([]) ->
    true;
proper_underscore([$_, L | T]) when L >= $a
                            andalso L =< $z ->
    proper_underscore(T);
proper_underscore([$_ | _]) ->
    false;
proper_underscore([_ | T]) ->
    proper_underscore(T).

Now PropEr can generate test cases. Let us implement the atom-binary conversions while continuously running the property tests. The tests can be run this way:

./rebar compile
./rebar eunit apps=ejson

Try to implement the functions by yourself, it is a very useful experience. At first the test will fail with the empty atom '', and so on. Here is the final implementation of the two functions along with the utility functions.

atom_to_binary_cc(Atom) ->
    CC = camel_case(atom_to_list(Atom), []),
    list_to_binary(CC).

binary_to_atom_cc(Binary) ->
    UScore = underscore(binary_to_list(Binary), []),
    list_to_atom(UScore).

camel_case([], R) ->
    lists:reverse(R);
camel_case([$_, L | T], R) ->
    camel_case(T, [string:to_upper(L) | R]);
camel_case([H | T], R) ->
    camel_case(T, [H | R]).

underscore([], R) ->
    lists:reverse(R);
underscore([Cap | T], R) when Cap >= $A
                      andalso Cap =< $Z ->
    underscore(T, [Cap + 32, $_ | R]);
underscore([Low | T], R) ->
    underscore(T, [Low | R]).
Recently I needed to write a massive amount of sensor data into a database and I quickly ran into its limitations. After some analysis I found that the data can be written independently, based on the source the data are coming from. So a solution can be to write the sensor data into files belonging to the individual sources.

There was no real problem with that solution until I needed to implement a Bitcask-like merge operation. During that operation we open a data file for reading, create a new file for writing, read all records from the first file, check if some retention condition holds, and write the record into the new file if we need to keep it. It requires a massive number of writes of small chunks of data (around 1 KB each). The speed of the copy wasn't very convincing, to put it gently.

Erlang file types

In Erlang there are two types of files we can use. The first is the (non-raw) file, where a process is spawned for the file, so every file operation is a message sent to that process, which reacts to the message and reads or writes the data. One can feel that this works well with larger binaries but won't perform brilliantly if the binaries are small. The other is the raw file, where no controlling process is spawned, so all we have is a wrapped Erlang port. Even then, according to some fprof profiling, a big amount of the computing time is spent on port communication.
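The difference between the two is only in the open options (standard file module calls, nothing specific to this post):

```erlang
%% Non-raw: a dedicated process owns the descriptor, every
%% operation is message passing to that process.
{ok, Plain} = file:open("plain.log", [write, binary]),

%% Raw: no intermediate process, operations go straight to
%% the port, so they must be called from the owner process.
{ok, Raw} = file:open("raw.log", [raw, write, binary]),

ok = file:write(Plain, <<"chunk">>),
ok = file:write(Raw, <<"chunk">>),
ok = file:close(Plain),
ok = file:close(Raw).
</imports>
```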

Fast file

That is what drove me to implement a fast_file based on Joe Armstrong's idea. Instead of writing the data somewhere that requires a cross-context call (kernel call or port command), let us collect the data in a buffer, and when the buffer grows big enough, flush it.

The fast file module defines a record which holds a single buffer for both reading and writing. Yes, one buffer. If we are writing data, we use it as a write buffer; if we want to read, the buffer is synced and we use it as a read buffer. So the fast file also remembers the last operation.
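A minimal sketch of the write side of this idea (the record and field names here are my assumptions, not the actual fast_file implementation):

```erlang
-define(BUF_LIMIT, 65536).

-record(ffile, {fd, buf = [], size = 0}).

%% Collect the data in the in-memory buffer; only when the
%% buffer exceeds the limit do we pay for a real file:write.
write(#ffile{buf = Buf, size = Size} = F, Data)
        when Size + byte_size(Data) < ?BUF_LIMIT ->
    F#ffile{buf = [Data | Buf], size = Size + byte_size(Data)};
write(#ffile{fd = Fd, buf = Buf} = F, Data) ->
    ok = file:write(Fd, lists:reverse([Data | Buf])),
    F#ffile{buf = [], size = 0}.
```

Since the buffer is accumulated in reverse order (prepending is O(1)), it has to be reversed before flushing; the caller must keep the updated record returned by every write.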


I wrote small and bigger chunks of binaries into a normal Erlang file, a raw file and a fast file. I ran the tests on my laptop (Core i5 2.4GHz, 6GB RAM, 640GB HDD 5400rpm, ext4).

Test            Normal file   Raw file   Fast file
100 big         280ms         15ms       24ms
1000 big        2 336ms       123ms      222ms
10000 small     338ms         153ms      7ms
100000 small    2 366ms       1 604ms    79ms
200000 small    4 854ms       3 088ms    163ms

In case of one million writes, only the fast file didn't run into a timeout (763ms). We can see that buffering is still a good technique.

How dangerous is it to buffer data?

I can see questions like: what if the process, the Erlang VM or the OS crashes? Since fast_file works with an ever-changing record, we need to update our fast file record whenever a read or write happens. The usage of a normal file is much more comfortable: we have an {ok, file:io_device()}, and reads and writes leave the io device (the port in most cases) unchanged.

If the process crashes, we lose the data that hasn't been written yet. The good news is that we don't cross record boundaries during writing, so we don't need to repair the file when we open it after a crash. In case of an Erlang VM crash the story is the same. In case of an OS crash it depends on how the OS handles the file buffers. Linux knows a commit=nrsecs option when mounting a device, which means that every nrsecs seconds Linux syncs all data to the device. If the crash happens between two commits, there is a chance of data loss.

Till I find a good place for my implementation you can check Joe's elib1_fast_write.erl.