
This post comes in two parts. In this first part, I introduce Ryba, its goals, and how to install and start using it. Ryba bootstraps and manages a fully secured Hadoop cluster with a single command. In Part 2, I detail how we came to write Ryba and how it addresses multi-tenancy, and we will also explore its target users.

So, let’s get started. Ryba was born out of the following needs:

  • For the system operator, it comes down to a single command, ryba install, which turns freshly installed servers into fully configured Hadoop clusters.
  • It relies on the Hortonworks distribution and follows the manual installation instructions published on the Hortonworks website, which keeps deployments compatible with the support offered by Hortonworks.
  • It is not limited to Hadoop. It also configures the system, for example local repositories or SSSD, as well as any complementary software you might need.
  • It has proven flexible enough to adjust to the constraints of any organization, such as access policies, integration with existing directory services, reuse of tools already deployed across the datacenter, and even unusual DNS configurations that are problematic for Kerberos.
  • Being file based, without any database, and running from any operating system, it lets you roll back or deploy a hot fix within minutes, without the need to compile, install or deploy anything.
  • It is not invasive: nothing related to Ryba is deployed on the target servers.
  • It is secure and relies on standards, using SSH and SSL keys for all communications, and it can reach servers behind a firewall as long as an SSH connection is allowed.
  • It is written in CoffeeScript, a language which is fast to write, easy to read and self-documenting, and which runs on any operating system with Node.js. Code may also be written in JavaScript or in any language which transpiles to it.
  • All of the configuration and source code is under version control with Git and versioned with NPM, the Node.js package manager.
  • From its early days, Ryba embraced idempotence by design: running the same command multiple times must produce the same effect.
  • Every modification must be logged with clear information, and a backup is made of any modified configuration file.
  • It runs transparently with an Internet connection, on an Intranet behind a proxy, or in an offline environment without any Internet access.
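To make the idempotence and backup principles concrete, here is what they mean for a single configuration change, as a minimal shell sketch. It is not Ryba's implementation (Ryba is written in CoffeeScript); `ensure_line` is a hypothetical helper which appends a line to a file only when that exact line is missing, backs up the file before modifying it, and reports whether anything changed.

```shell
# Hypothetical helper, not part of Ryba: make sure FILE contains LINE.
# Prints "changed" when FILE was modified and "unchanged" otherwise.
ensure_line() {
  local file=$1 line=$2
  # Already present: running the helper again has no further effect.
  if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
    echo "unchanged"
    return 0
  fi
  if [ -f "$file" ]; then
    # Back up the previous version before touching it.
    cp -p -- "$file" "$file.bak"
    # Terminate a last line lacking its newline so LINE stays on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 -- "$file")" ]; then
      printf '\n' >> "$file"
    fi
  fi
  printf '%s\n' "$line" >> "$file"
  echo "changed"
}
```

Calling it twice with the same arguments leaves the file exactly as it was after the first call, and the `.bak` copy keeps the version from before that first change. A plain `tee -a`, by contrast, appends a duplicate on every run.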

The easiest way to get started is to install the ryba-cluster package and use it as an example. It provides a preconfigured Ryba deployment for a 6-node cluster: 3 master nodes, 1 front node (also called edge node) and 2 worker nodes. The reason we only use 2 worker nodes is simple: those 6 nodes fit inside 6 virtual machines on our development laptop, which has 16GB of memory. To this effect, the ryba-cluster package includes a Vagrantfile you can use.

The following instructions install the tools you’ll need, download the Ryba packages, start a local cluster of virtual machines and run Ryba to bootstrap the cluster. They assume your host is connected to the Internet; get in touch with us or visit our website if you wish to work offline. They apply to any OSX or Linux system and will work on Windows with minimal effort.

1. Install Git

You can either install it as a package, via another installer, or download the source code and compile it yourself.

On Linux, you can for example run yum install git or apt-get install git, depending on whether you are running a Red Hat-based (RHEL, CentOS, Fedora) or a Debian-based distribution. On OSX or Windows, download the Git installer available for your operating system.
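If you script the preparation of several hosts, the choice between the two package managers can be automated. A minimal sketch, where `git_install_cmd` is a hypothetical helper that only prints the command, so you can review it before running it:

```shell
# Hypothetical helper: print the command installing Git with the package
# manager given as argument, or with the first of yum/apt-get found on the PATH.
git_install_cmd() {
  local pm=${1:-} candidate
  if [ -z "$pm" ]; then
    for candidate in yum apt-get; do
      if command -v "$candidate" >/dev/null 2>&1; then
        pm=$candidate
        break
      fi
    done
  fi
  case $pm in
    yum) echo "sudo yum install -y git" ;;
    apt-get) echo "sudo apt-get install -y git" ;;
    *)
      # OSX, Windows or another distribution: fall back to the installer.
      echo "no yum or apt-get found, download the Git installer instead" >&2
      return 1
      ;;
  esac
}
```

For example, `cmd=$(git_install_cmd) && eval "$cmd" && git --version` installs Git and confirms it is on your PATH.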

2. Install Node.js

To install Node.js, the recommended way is to use n, a Node.js version manager. If you are not familiar with Node.js, it is easier to simply download the Node.js installer available for your operating system.
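Note that n is usually installed with npm itself (`npm install -g n`, then for example `n lts` to install the latest long-term support release), so it assumes a first Node.js is already present. Before moving on, a quick check that the tools from steps 1 and 2 are on your PATH can save a confusing failure later. `missing_tools` is a hypothetical helper written for this post:

```shell
# Hypothetical helper: print each named command that is not found on the PATH.
# Returns 0 when everything is available, 1 otherwise.
missing_tools() {
  local status=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return $status
}
```

Running `missing_tools git node npm || echo "install the tools listed above first"` prints nothing but the names still to install.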

3. Download the ryba-cluster starting package

We use Git to download the default configuration and NPM to install all its dependencies. Ryba is a good Node.js citizen: getting familiar with the Node.js platform is all you need to understand its internals.

git clone https://github.com/ryba-io/ryba-cluster.git
cd ryba-cluster
npm install

4. Get familiar with the package

Inside the “bin” folder are commands to run Vagrant and Ryba, as well as to synchronize local YUM repositories. The “conf” folder stores configuration files that Ryba merges when it starts. The “node_modules” folder is managed by NPM and Node.js to store all your dependencies, including the Ryba source code. The “package.json” file is a Node.js file that describes your package.
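After `npm install`, a quick sanity check confirms the checkout has the layout described above. A minimal sketch, where `check_layout` is a hypothetical helper and the expected entries are exactly the four listed in this step:

```shell
# Hypothetical sanity check: list the expected entries of the ryba-cluster
# checkout that are missing from DIR (default: the current directory).
check_layout() {
  local dir=${1:-.} entry status=0
  for entry in bin conf node_modules package.json; do
    if [ ! -e "$dir/$entry" ]; then
      echo "missing: $entry"
      status=1
    fi
  done
  return $status
}
```

Run it as `check_layout ryba-cluster`; a missing “node_modules” folder usually means `npm install` did not run or did not complete.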

5. Start your cluster

This step uses Vagrant to bootstrap a cluster of 6 nodes on a private network. You’ll need 16GB of memory. It also registers the server names and IP addresses inside your “/etc/hosts” file. You can skip this step if you already have physical or virtual nodes at your disposal; just modify the “conf/server.coffee” file to reflect your network topology.
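Since the 6 virtual machines need about 16GB, checking the host memory before `bin/vagrant up` avoids a cluster that starts but swaps. A minimal sketch; `host_mem_gib` is a hypothetical helper reading `/proc/meminfo` on Linux and `sysctl -n hw.memsize` on OSX, and the 15 GiB threshold is an assumption explained in the comment:

```shell
# Report the total memory of the host in GiB (rounded down): Linux reads
# /proc/meminfo, OSX asks sysctl for hw.memsize (in bytes).
host_mem_gib() {
  local kb
  if [ -r /proc/meminfo ]; then
    kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  else
    kb=$(( $(sysctl -n hw.memsize) / 1024 ))
  fi
  echo $(( kb / 1024 / 1024 ))
}

mem=$(host_mem_gib)
# On Linux, MemTotal excludes memory reserved by the firmware and kernel,
# so a 16GB machine usually reports 15 GiB here: only warn below that.
if [ "$mem" -lt 15 ]; then
  echo "warning: ${mem} GiB of memory, the 6-node Vagrant cluster expects 16GB" >&2
fi
```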

bin/vagrant up

6. Run Ryba

Once your cluster nodes are started and your configuration is ready, running Ryba to install, start and check your components is as simple as executing:

bin/ryba install
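Because Ryba is idempotent, the usual reaction to a failed run is to fix the cause and run the same command again, so it can help to keep the console output of every attempt. A minimal bash sketch; `run_logged` is hypothetical and adds nothing to Ryba itself, it only wraps the command:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command, show its output and append a copy to
# a log file, while returning the command's own exit status (not tee's).
run_logged() {
  local log=$1
  shift
  "$@" 2>&1 | tee -a "$log"
  return "${PIPESTATUS[0]}"
}
```

For example, `run_logged "install-$(date +%Y%m%d-%H%M%S).log" bin/ryba install` keeps one log per attempt and still lets a script detect a failure through the exit status.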

7. Configure your host machine

On your host, you need to declare the names and IP addresses of your cluster nodes (if using Vagrant). You’ll also need to import the Kerberos client configuration file.

sudo tee -a /etc/hosts << RYBA
10.10.10.11 master1.ryba
10.10.10.12 master2.ryba
10.10.10.13 master3.ryba
10.10.10.14 front1.ryba
10.10.10.16 worker1.ryba
10.10.10.17 worker2.ryba
10.10.10.18 worker3.ryba
RYBA
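# Write "vagrant" as a password
# Be careful, this will overwrite your local krb5 file
# Writing to /etc requires root, so copy to a temporary location first
scp [email protected]:/etc/krb5.conf /tmp/krb5.conf
sudo cp /tmp/krb5.conf /etc/krb5.conf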
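The realm used by the cluster is declared in the krb5.conf file you just copied, under the `default_realm` key of its `[libdefaults]` section. A minimal sketch to print it; `default_realm` is a hypothetical helper, and the `EXAMPLE.RYBA` realm in the test data is made up:

```shell
# Print the default_realm value of the [libdefaults] section of a krb5.conf
# file (default: /etc/krb5.conf).
default_realm() {
  awk '
    # Track whether the current line belongs to the [libdefaults] section.
    /^[ \t]*\[/ { in_lib = ($0 ~ /^[ \t]*\[libdefaults\]/) }
    in_lib && /^[ \t]*default_realm[ \t]*=/ {
      sub(/^[^=]*=[ \t]*/, "")
      sub(/[ \t]+$/, "")
      print
      exit
    }
  ' "${1:-/etc/krb5.conf}"
}
```

When a principal is given without a realm, kinit appends this default realm, so `kinit hdfs` requests a ticket for `hdfs@` followed by the value printed here.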

8. Access the Hadoop Cluster web interfaces

Your host machine is now configured with Kerberos. From the command line, you should be able to obtain a new ticket:

echo hdfs123 | kinit [email protected]
klist

Most of the web applications started by Hadoop use SPNEGO to provide Kerberos authentication. SPNEGO isn’t limited to Kerberos and is already supported by your favorite web browsers. However, most browsers (with the exception of Safari) need some specific configuration. Refer to your browser’s documentation to configure it, or use curl:

curl -k --negotiate -u: https://master1.ryba:50470
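For Firefox, the relevant setting is the `network.negotiate-auth.trusted-uris` preference, which lists the domains allowed to use SPNEGO. A minimal sketch appending it to the `user.js` file of a profile, which Firefox reads at startup; `enable_firefox_spnego` is a hypothetical helper, and you pass the profile directory yourself since its location depends on your system:

```shell
# Hypothetical helper: allow SPNEGO (Kerberos) authentication for the given
# domain (default: .ryba) in the Firefox profile directory PROFILE.
enable_firefox_spnego() {
  local profile=$1 domain=${2:-.ryba}
  local pref="user_pref(\"network.negotiate-auth.trusted-uris\", \"$domain\");"
  # Skip if the exact preference line is already there, so it is safe to re-run.
  grep -qxF -- "$pref" "$profile/user.js" 2>/dev/null && return 0
  printf '%s\n' "$pref" >> "$profile/user.js"
}
```

Restart Firefox afterwards; with a valid ticket from kinit, pages such as the NameNode interface on master1.ryba should then authenticate without a password prompt.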

You should now be familiar with Ryba. Join us and participate in this project on GitHub.

Ryba is a tool licensed under the BSD New license used to deploy secured Hadoop clusters with a focus on multi-tenancy.

About this author

The author, David Worms, is the owner of Adaltas, a French company based in Paris and specialized in the deployment of secure Hadoop clusters.
