Sunday, January 24, 2016

Build a Riak cluster from source

In the last couple of months I demonstrated what Riak is, what features it has, and how easy it is to manage. That is fine, and one can believe that Riak is easy to install and set up, but the best way to prove it is to show how to set up a cluster.

Build a cluster

There are many ways to install Riak. Prebuilt binaries are available for each platform, so if you run Red Hat Enterprise Linux, for example, you can pick the appropriate package. But since we only want to play with Riak, we don't really want to set up multiple physical (or virtual) hosts; we can easily build a cluster on a single machine by compiling the source into a dev cluster.

Check the Riak downloads page for binaries and the source package. With the next couple of commands you will have the Riak source in your riak-2.1.3 directory.

$ mkdir ~/riak
$ cd ~/riak
$ curl http://s3.amazonaws.com/downloads.basho.com/riak/2.1/2.1.3/riak-2.1.3.tar.gz -O
$ tar xzf riak-2.1.3.tar.gz
$ cd riak-2.1.3
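
So far so good; now we need to build Riak with 5 nodes. You need Erlang R16 or 17 in your PATH to build Riak, so activate it first, then create the dev release:

$ DEVNODES=5 make devrel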

The devrel target compiles Riak and creates 5 dev nodes in the dev/dev1, dev/dev2, etc. directories, with non-conflicting ports already configured. You can check them in dev/dev1/etc/riak.conf and so on. Here are some of the port-related lines from riak.conf:

...
## Name of the Erlang node
##
## Default: dev1@127.0.0.1
##
## Acceptable values:
##   - text
nodename = dev1@127.0.0.1
## listener.http.<name> is an IP address and TCP port that the Riak
## HTTP interface will bind.
##
## Default: 127.0.0.1:10018
##
## Acceptable values:
##   - an IP/port pair, e.g. 127.0.0.1:10011
listener.http.internal = 127.0.0.1:10018

## listener.protobuf.<name> is an IP address and TCP port that the Riak
## Protocol Buffers interface will bind.
##
## Default: 127.0.0.1:10017
##
## Acceptable values:
##   - an IP/port pair, e.g. 127.0.0.1:10011
listener.protobuf.internal = 127.0.0.1:10017
...
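
Each dev node got its own set of ports. To check at a glance that they don't collide, we can grep all five configs (a quick sketch using plain grep, run from the riak-2.1.3 directory; only dev1's line is shown here, the other nodes follow the same pattern with different ports):

$ grep -H '^listener.http' dev/dev*/etc/riak.conf
dev/dev1/etc/riak.conf:listener.http.internal = 127.0.0.1:10018
...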

So we have 5 isolated Riak instances here. Let us run the first 3 and build a cluster. During cluster building we tell each node to join another designated node: in our example we will tell dev2 to join dev1, and dev3 to join dev1. The join doesn't happen instantly as we execute the commands; instead, a cluster plan is staged. We then need to check the plan and commit it for the changes to take effect.

$ cd dev
$ for i in dev{1..3}; do $i/bin/riak start; done
$ dev1/bin/riak-admin member-status
=============================== Membership ================================
Status     Ring    Pending    Node
---------------------------------------------------------------------------
valid     100.0%      --      'dev1@127.0.0.1'
---------------------------------------------------------------------------
Valid:1 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

So the nodes are running, and with member-status we can check that dev1 holds the whole ring. Riak maps all key-value pairs onto a ring, which is divided into 64 parts by default, called vnodes or partitions.
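
We can confirm the ring size from the node statistics (a quick check; I am assuming the stat name ring_num_partitions as it appears in riak-admin status output):

$ dev1/bin/riak-admin status | grep ring_num_partitions
ring_num_partitions : 64

Now dev2 and dev3 will join dev1, and we check the cluster plan.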

$ dev2/bin/riak-admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev2@127.0.0.1' to 'dev1@127.0.0.1'
$ dev3/bin/riak-admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev3@127.0.0.1' to 'dev1@127.0.0.1'
$ dev1/bin/riak-admin cluster plan
============================= Staged Changes ==============================
Action         Details(s)
---------------------------------------------------------------------------
join           'dev2@127.0.0.1'
join           'dev3@127.0.0.1'
---------------------------------------------------------------------------


NOTE: Applying these changes will result in 1 cluster transition

###########################################################################
                       After cluster transition 1/1
###########################################################################

=============================== Membership ================================
Status     Ring    Pending    Node
---------------------------------------------------------------------------
valid     100.0%     34.4%    'dev1@127.0.0.1'
valid       0.0%     32.8%    'dev2@127.0.0.1'
valid       0.0%     32.8%    'dev3@127.0.0.1'
---------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

WARNING: Not all replicas will be on distinct nodes

Transfers resulting from cluster changes: 42
  21 transfers from 'dev1@127.0.0.1' to 'dev3@127.0.0.1'
  21 transfers from 'dev1@127.0.0.1' to 'dev2@127.0.0.1'

You can see how the ring will be distributed after dev2 and dev3 join the cluster (which is a 1-node cluster right now). Now we commit the changes and check how the partitions move.

$ dev1/bin/riak-admin cluster commit
Cluster changes committed

$ dev1/bin/riak-admin transfers
'dev3@127.0.0.1' waiting to handoff 1 partitions
'dev2@127.0.0.1' waiting to handoff 1 partitions
'dev1@127.0.0.1' does not have 8 primary partitions running

Active Transfers:

$ dev1/bin/riak-admin member-status
=============================== Membership ================================
Status     Ring    Pending    Node
---------------------------------------------------------------------------
valid      75.0%     34.4%    'dev1@127.0.0.1'
valid      17.2%     32.8%    'dev2@127.0.0.1'
valid       7.8%     32.8%    'dev3@127.0.0.1'
---------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

$ dev1/bin/riak-admin transfers
'dev3@127.0.0.1' waiting to handoff 3 partitions
'dev2@127.0.0.1' waiting to handoff 3 partitions
'dev1@127.0.0.1' waiting to handoff 15 partitions
'dev1@127.0.0.1' does not have 3 primary partitions running

Active Transfers:

$ dev1/bin/riak-admin transfers
No transfers active

Active Transfers:
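
Rather than re-running riak-admin transfers by hand as above, we can also poll it (a simple sketch using the standard watch utility; the 5-second interval is an arbitrary choice):

$ watch -n 5 dev1/bin/riak-admin transfers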

When we see that there are no active transfers left, we are done; all the partitions are distributed. Let us verify with riak-admin member-status.

$ dev1/bin/riak-admin member-status
=============================== Membership ================================
Status     Ring    Pending    Node
---------------------------------------------------------------------------
valid      34.4%      --      'dev1@127.0.0.1'
valid      32.8%      --      'dev2@127.0.0.1'
valid      32.8%      --      'dev3@127.0.0.1'
---------------------------------------------------------------------------
Valid:3 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
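
The cluster is now usable. As a quick smoke test we can store and fetch a value over dev1's HTTP interface on port 10018 (the port comes from the riak.conf excerpt above; the test bucket and key names are arbitrary examples):

$ curl -XPUT -H 'Content-Type: text/plain' -d 'world' http://127.0.0.1:10018/buckets/test/keys/hello
$ curl http://127.0.0.1:10018/buckets/test/keys/hello
world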

Node failure

Let us simulate the situation when node 2 is down.
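
We bring it down with riak stop (it should acknowledge with a simple ok), then check the ring status from another node:

$ dev2/bin/riak stop
ok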

$ dev1/bin/riak-admin ring-status
================================ Claimant =================================
Claimant:  'dev1@127.0.0.1'
Status:     up
Ring Ready: true

============================ Ownership Handoff ============================
No pending changes.

============================ Unreachable Nodes ============================
The following nodes are unreachable: ['dev2@127.0.0.1']

This is what we see when dev2 crashes for some reason. In practice our monitoring system will probably detect the situation faster than we could by checking the ring status. But it can also happen that dev2 is not down, only unreachable (a netsplit). Netsplits can occur when the Erlang net tick heartbeat messages do not arrive on time (for example over an overloaded network), even though the node itself may be healthy. Since this is a different situation from a node crash, we need a different kind of monitoring to detect netsplits.
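
Once the node is reachable again, the warning disappears; starting dev2 back up and re-running the check should report no unreachable nodes:

$ dev2/bin/riak start
$ dev1/bin/riak-admin ring-status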


Extending the cluster

Let us suppose that Black Friday is coming and we expect growth in the number of transactions. We don't want to extend the Riak cluster node by node; instead we add two nodes in one step (adding multiple nodes in a single plan is the recommended way of extending a cluster, since the ring is rebalanced in a single transition). Let us start the two nodes and join them to the cluster.

$ dev4/bin/riak start
$ dev5/bin/riak start
$ dev4/bin/riak-admin cluster join dev1@127.0.0.1
$ dev5/bin/riak-admin cluster join dev1@127.0.0.1
$ dev1/bin/riak-admin cluster plan
$ dev1/bin/riak-admin cluster commit

We pretty much know what to expect from these commands, but I have pasted the cluster plan here. It shows how many partitions will be moved during the cluster extension.

$ dev1/bin/riak-admin cluster plan
============================= Staged Changes ==============================
Action         Details(s)
---------------------------------------------------------------------------
join           'dev4@127.0.0.1'
join           'dev5@127.0.0.1'
---------------------------------------------------------------------------


NOTE: Applying these changes will result in 1 cluster transition

###########################################################################
                       After cluster transition 1/1
###########################################################################

=============================== Membership ================================
Status     Ring    Pending    Node
---------------------------------------------------------------------------
valid      34.4%     20.3%    'dev1@127.0.0.1'
valid      32.8%     20.3%    'dev2@127.0.0.1'
valid      32.8%     20.3%    'dev3@127.0.0.1'
valid       0.0%     20.3%    'dev4@127.0.0.1'
valid       0.0%     18.8%    'dev5@127.0.0.1'
---------------------------------------------------------------------------
Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Transfers resulting from cluster changes: 49
  4 transfers from 'dev3@127.0.0.1' to 'dev5@127.0.0.1'
  4 transfers from 'dev1@127.0.0.1' to 'dev2@127.0.0.1'
  4 transfers from 'dev3@127.0.0.1' to 'dev1@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev4@127.0.0.1'
  4 transfers from 'dev1@127.0.0.1' to 'dev3@127.0.0.1'
  4 transfers from 'dev3@127.0.0.1' to 'dev2@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev5@127.0.0.1'
  5 transfers from 'dev1@127.0.0.1' to 'dev4@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev1@127.0.0.1'
  4 transfers from 'dev1@127.0.0.1' to 'dev5@127.0.0.1'
  4 transfers from 'dev3@127.0.0.1' to 'dev4@127.0.0.1'
  4 transfers from 'dev2@127.0.0.1' to 'dev3@127.0.0.1'
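
Once these transfers finish, each node should own roughly a fifth of the ring. The percentages cannot all be exactly 20%, because the 64 partitions do not divide evenly among 5 nodes: the split is 13+13+13+13+12, which matches the plan above (13/64 ≈ 20.3% and 12/64 ≈ 18.8%). We can follow the progress the same way as before:

$ dev1/bin/riak-admin transfers
$ dev1/bin/riak-admin member-status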

That is it

So now we know how to build a cluster on our development machine. Obviously, if we install a Riak cluster in a production server environment we need to act differently (install the binary packages, which use the system-wide /etc, /var/lib and /var/log directories). But the thinking is the same: new nodes always join existing nodes in the cluster, and we always check the cluster plan before committing it.

