Setup Mesos Multi-node Cluster on Ubuntu

Building on Setup Standalone Mesos on Ubuntu, I will be publishing at least two posts that walk through expanding the cluster to a multi-node high availability (HA) design that begins to approach what might be run in production. Rather than going directly from a single node with Mesos Master, Mesos Slave, Zookeeper, and Marathon co-located to a dozen or more nodes dedicated to Zookeeper, Mesos Master, etc., I opted to step through the process of incrementally expanding the cluster. I may be the only one, but this method helped me understand the interaction between the architecture and configuration files. At the conclusion of this post we will have built a two-node cluster: one Master and one Slave.

For those who might want to start with the finished product and work back from there, I would recommend the options that Michael Hamrah and Shingo Omura put together.

Move Mesos Slave to Another Node

The environment in the Setup Standalone Mesos on Ubuntu post had everything on a single node:

http://frankhinek.com/wp-content/uploads/2014/08/2014-08-28_mesos-standalone-cluster.png

In this initial step we’ll stop running a Mesos Slave on the existing node and spin up a second node with just a Mesos Slave. I assume you already have two Ubuntu 14.04.1 LTS Server 64-bit systems built. When we are done the environment will be:

http://frankhinek.com/wp-content/uploads/2014/08/2014-08-28_mesos-2node-cluster.png

Build Node 1

We’ll follow exactly the same process as we did when building a standalone Mesos environment, but at the end we’ll stop the Mesos Slave from running on this node.

  1. Import the Mesosphere Archive Automatic Signing Key:
    $ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
  2. Add the Mesosphere Ubuntu 14.04 Repo:
    $ DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
    $ CODENAME=$(lsb_release -cs)
    $ echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \
        sudo tee /etc/apt/sources.list.d/mesosphere.list
  3. Download package lists and information of latest versions:
    $ sudo apt-get -y update
  4. Install Mesos and Marathon (Zookeeper is pulled in as a dependency of the mesos package):
    $ sudo apt-get -y install mesos marathon
  5. Reboot the system:
    $ sudo reboot
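
Before writing to /etc/apt/sources.list.d it can help to preview the repository line that step 2 will generate. The sketch below is only an illustration: mesosphere_repo_line is a hypothetical helper name of my own, and it simply mirrors the lowercase conversion and the URL layout used in step 2.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Mesosphere tooling): print the APT
# source line for a given distributor ID and release codename.
# The distributor ID is lowercased, mirroring the `tr` call in step 2.
mesosphere_repo_line() {
    distro=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    codename=$2
    printf 'deb http://repos.mesosphere.io/%s %s main\n' "$distro" "$codename"
}

# Example (run manually) to preview the line for the current system:
#   mesosphere_repo_line "$(lsb_release -is)" "$(lsb_release -cs)"
```

On an Ubuntu 14.04 system, whose codename is trusty, the previewed line should end in "ubuntu trusty main".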

Mesos Master Configuration

The Mesosphere package we installed for Mesos includes an init script that automatically starts up a Mesos Slave. Since we don’t want Node 1 to provide a Slave and offer resources we’ll disable the service from running:

  1. Stop the Mesos Slave service:
    $ sudo service mesos-slave stop
  2. Disable the Mesos Slave service on Node 1:
    $ echo manual | sudo tee /etc/init/mesos-slave.override
  3. I’ve run into issues with systems that have multiple ethernet interfaces when the Master or Slave registers with a loopback or otherwise undesirable interface. To ensure that the Mesos Master on Node 1 listens on your preferred interface execute the command below. If Node 1 has an IP address of 10.1.1.10 the command would be:
    $ echo 10.1.1.10 | sudo tee /etc/mesos-master/ip
  4. Specify the master Zookeeper URL which the Mesos Master and Marathon will register with. The IP address used is the same interface address on Node 1 used in a previous step.
    $ echo zk://10.1.1.10:2181/mesos | sudo tee /etc/mesos/zk
  5. Specify a human readable name for the cluster which will be displayed in the Mesos web console:
    $ echo MyCluster | sudo tee /etc/mesos-master/cluster
  6. By default, the Master will use the system hostname, which can cause issues if the system name isn’t resolvable via your DNS server. If you are working in a test or Vagrant environment without a resolvable system hostname, switch to using the IP address of Node 1:
    $ echo 10.1.1.10 | sudo tee /etc/mesos-master/hostname
  7. Restart the zookeeper, mesos-master, and marathon services:
    $ sudo service zookeeper restart
    $ sudo service mesos-master restart
    $ sudo service marathon restart
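
Since the steps above touch several small files, I find it useful to stage them in a scratch directory and review them before touching the real /etc. The following is a minimal sketch under that assumption; stage_master_config and the scratch root are hypothetical names, and the file paths and values are the ones used in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: stage the Node 1 settings under a scratch root
# directory so they can be reviewed before being copied into place.
# Arguments: <root> <ip> <cluster-name>
stage_master_config() {
    root=$1 ip=$2 cluster=$3
    mkdir -p "$root/etc/init" "$root/etc/mesos" "$root/etc/mesos-master"
    # Keep the Upstart job for the Slave from starting automatically.
    echo manual                > "$root/etc/init/mesos-slave.override"
    # Bind the Master to the chosen interface and advertise the same address.
    echo "$ip"                 > "$root/etc/mesos-master/ip"
    echo "$ip"                 > "$root/etc/mesos-master/hostname"
    # Zookeeper URL used by the Mesos Master and Marathon.
    echo "zk://$ip:2181/mesos" > "$root/etc/mesos/zk"
    # Human readable cluster name.
    echo "$cluster"            > "$root/etc/mesos-master/cluster"
}

# Example (run manually):
#   stage_master_config /tmp/node1 10.1.1.10 MyCluster
#   grep -r . /tmp/node1/etc
```

Nothing here requires root or restarts a service; it only makes it easier to spot a mistyped address before the files land on the node.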

Zookeeper Configuration

Each Zookeeper needs to know its position in the quorum. In this tutorial there is only one Zookeeper; however, we’ll set the value now so that it can be modified when the Zookeeper node count is increased. See the Zookeeper documentation for more detail.

$ echo 1 | sudo tee /etc/zookeeper/conf/myid
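
The Zookeeper documentation asks for an id that is unique within the ensemble and has a value between 1 and 255. As a small sanity check before writing the file, something like the sketch below could be used; valid_myid is a hypothetical helper name, not a Zookeeper command.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument is an integer from 1 to 255.
valid_myid() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 255 ]
}

# Example (run manually):
#   valid_myid 1 && echo 1 | sudo tee /etc/zookeeper/conf/myid
```

It does not check uniqueness across nodes; that still has to be tracked by hand as more Zookeeper nodes are added.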

Build Node 2

We’ll install the same Mesos package as we did when building a standalone environment, but we’ll stop the Mesos Master and Zookeeper from running on this node. Marathon will not be installed on Node 2.

  1. Import the Mesosphere Archive Automatic Signing Key:
    $ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
  2. Add the Mesosphere Ubuntu 14.04 Repo:
    $ DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
    $ CODENAME=$(lsb_release -cs)
    $ echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | \
        sudo tee /etc/apt/sources.list.d/mesosphere.list
  3. Download package lists and information of latest versions:
    $ sudo apt-get -y update
  4. Install Mesos and Zookeeper (Zookeeper is pulled in as a dependency of the mesos package):
    $ sudo apt-get -y install mesos
  5. Reboot the system:
    $ sudo reboot

Mesos Slave Configuration

The Mesosphere package we installed for Mesos includes an init script that automatically starts up a Mesos Master and Zookeeper. Since we don’t want Node 2 to provide a Master or Zookeeper we’ll disable the services from running:

  1. Stop and disable the Mesos Master service on Node 2:
    $ sudo service mesos-master stop
    $ echo manual | sudo tee /etc/init/mesos-master.override
  2. Stop, disable, and remove the Zookeeper service on Node 2:
    $ sudo service zookeeper stop
    $ echo manual | sudo tee /etc/init/zookeeper.override
    $ sudo apt-get -y remove --purge zookeeper
  3. I’ve run into issues with systems that have multiple ethernet interfaces when the Master or Slave registers with a loopback or otherwise undesirable interface. To ensure that the Mesos Slave on Node 2 listens on your preferred interface, execute the command below. If Node 2 has an IP address of 10.1.1.11 the command would be:
    $ echo 10.1.1.11 | sudo tee /etc/mesos-slave/ip
  4. Now we need the Slave to discover the Master. This is accomplished by setting /etc/mesos/zk to the master Zookeeper URL. The command below will cause the Slave to connect to the Zookeeper at 10.1.1.10, which is Node 1’s IP address.
    $ echo zk://10.1.1.10:2181/mesos | sudo tee /etc/mesos/zk
  5. By default, the Slave will use the system hostname, which results in the Mesos Sandbox not being able to reach the Slave if the system name isn’t resolvable via your DNS server. If you are working in a test or Vagrant environment without a resolvable system hostname, switch to using the IP address of Node 2:
    $ echo 10.1.1.11 | sudo tee /etc/mesos-slave/hostname
  6. Restart the mesos-slave service:
    $ sudo service mesos-slave restart
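
The Zookeeper URL from step 4 is the one value that both nodes must agree on, so it is worth generating rather than typing twice. Below is a rough sketch of how that could be done; zk_url is a hypothetical helper name, and it assumes every Zookeeper listens on port 2181 and uses the /mesos path, as in this post. The multi-host form is the comma-separated layout Mesos accepts for a Zookeeper ensemble.

```shell
#!/bin/sh
# Hypothetical helper: build a zk:// URL from one or more Zookeeper addresses.
# Assumes port 2181 and the /mesos path, matching the values used in this post.
zk_url() {
    hosts=""
    for h in "$@"; do
        # Append "host:2181", inserting a comma between entries.
        hosts="${hosts:+$hosts,}$h:2181"
    done
    printf 'zk://%s/mesos\n' "$hosts"
}

# Example (run manually) on either node:
#   zk_url 10.1.1.10 | sudo tee /etc/mesos/zk
```

With the single Zookeeper in this post the output matches the URL written in step 4; passing several addresses would produce one URL listing all of them.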

Summary

In this first post of a series detailing the expansion of a Mesos environment from standalone to multi-node, we built a second node and migrated the Mesos Slave role to it. I will be publishing at least one more post that will walk through building additional nodes in an HA Zookeeper, Mesos Master, and Marathon configuration.