This post has two parts. In this first part, I introduce Ryba, its goals, and how to install and start using it. Ryba bootstraps and manages a fully secured Hadoop cluster with one command. In Part 2, I detail how we came to write Ryba, how it addresses multi-tenancy, and who its targeted users are.
So, let’s get started. Ryba is born out of the following needs:
The easiest way to get started is to install the package ryba-cluster and use it as an example. It provides a pre-configured Ryba deployment for a 6-node cluster: 3 master nodes, 1 front node (also called an edge node) and 2 worker nodes. We limit the cluster to 2 worker nodes for a simple reason: those 6 nodes fit inside 6 virtual machines on our development laptops configured with 16GB of memory. To this end, the ryba-cluster package includes a Vagrant file you can use.
The following instructions install the tools you’ll need, download the Ryba packages, start a local cluster of virtual machines and run Ryba to bootstrap the cluster. They assume your host is connected to the Internet; get in touch with us or visit our website if you wish to work offline. They apply to any OSX or Linux system and will work on Windows with minimal effort.
1. Install Git
You can either install it as a package, via another installer, or download the source code and compile it yourself.
On Linux, you can run, for example, yum install git on a Fedora-based distribution or apt-get install git on a Debian-based distribution. On OSX or Windows, you can download the Git installer available for your operating system.
2. Install Node.js
The recommended way to install Node.js is to use n, a Node.js version manager. If you are not familiar with Node.js, it is easier to simply download the Node.js installer available for your operating system.
3. Download the ryba-cluster starting package
We use Git to download the default configuration and NPM to install all its dependencies. Ryba is a Node.js good citizen: getting familiar with the Node.js platform is all you need to understand its internals.
git clone https://github.com/ryba-io/ryba-cluster.git
cd ryba-cluster
npm install
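Before going further, it can help to confirm that the tools from steps 1 and 2, plus Vagrant for step 5, are actually on your PATH. The following is a minimal sketch, not part of ryba-cluster: the function name missing_tools is hypothetical and it only relies on the POSIX command -v builtin.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ryba-cluster): print the name of
# every command given as argument that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command is available
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Check the toolchain used in this tutorial
missing=$(missing_tools git node npm vagrant)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi
```

If a tool is reported as missing, install it and run the check again before continuing.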
4. Get familiar with the package
Inside the “bin” folder are commands to run Vagrant and Ryba, as well as to synchronize local YUM repositories. The “conf” folder stores the configuration files that Ryba merges when it starts. The “node_modules” folder is managed by NPM and Node.js to store all your dependencies, including the Ryba source code. The “package.json” file is a Node.js file that describes your package.
5. Start your cluster
This step uses Vagrant to bootstrap a cluster of 6 nodes with a private network. You’ll need 16GB of memory. It also registers the server names and IP addresses inside your “/etc/hosts” file. You can skip this step if you already have physical or virtual nodes at your disposal. Just modify the “conf/server.coffee” file to reflect your network topology.
bin/vagrant up
6. Run Ryba
Once your cluster nodes are started and your configuration is ready, running Ryba to install, start and check your components is as simple as executing:
bin/ryba install
7. Configure your host machine
On your host, you need to declare the name and IP address of your cluster (if using Vagrant). You’ll also need to import the Kerberos client configuration file.
sudo tee -a /etc/hosts << RYBA
10.10.10.11 master1.ryba
10.10.10.12 master2.ryba
10.10.10.13 master3.ryba
10.10.10.14 front1.ryba
10.10.10.16 worker1.ryba
10.10.10.17 worker2.ryba
10.10.10.18 worker3.ryba
RYBA
# Write "vagrant" as a password
# Be careful, this will overwrite your local krb5 file
scp vagrant@master1.ryba:/etc/krb5.conf /etc/krb5.conf
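Note that tee -a appends its lines every time it runs, so repeating this step duplicates the entries. As a sketch of one way to make the step repeatable, the hypothetical add_host function below only appends a name that is not already declared. It is demonstrated on a temporary file; applying it to /etc/hosts is an assumption on our part and would require running the script with root privileges.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file only when NAME is
# not already declared on a non-comment line of that file.
add_host() {
  hosts_file=$1
  ip=$2
  name=$3
  # Scan every field after the IP address; exit 0 when the name is found
  if awk -v n="$name" '
       $1 ~ /^#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
       END { exit !found }
     ' "$hosts_file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Demonstration on a temporary file instead of the real /etc/hosts
hosts=$(mktemp)
add_host "$hosts" 10.10.10.11 master1.ryba
add_host "$hosts" 10.10.10.11 master1.ryba   # second call changes nothing
cat "$hosts"
rm -f "$hosts"
```

In the same spirit, copying your existing /etc/krb5.conf to a backup location before running scp lets you restore it later.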
8. Access the Hadoop Cluster web interfaces
Your host machine is now configured with Kerberos. From the command line, you should be able to get a new ticket:
echo hdfs123 | kinit hdfs@HADOOP.RYBA
klist
Most of the web applications started by Hadoop use SPNEGO to provide Kerberos authentication. SPNEGO isn’t limited to Kerberos and is already supported by your favorite web browsers. However, most browsers (with the exception of Safari) need some specific configuration. Refer to the web to configure yours, or use curl:
curl -k --negotiate -u: https://master1.ryba:50470
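When the curl call does not return what you expect, the HTTP status code is usually the quickest hint. The sketch below wraps the same curl flags and adds -s, -o and -w to print only the status. The function names explain_status and check_url and the wording of the messages are our own, given for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn the HTTP status of a SPNEGO-protected URL
# into a short hint.
explain_status() {
  case $1 in
    2??) echo "ok" ;;
    401) echo "not authenticated: check your ticket with klist" ;;
    000) echo "no HTTP response: check /etc/hosts and that the service is up" ;;
    *)   echo "status $1" ;;
  esac
}

# Hypothetical helper: query a URL with SPNEGO and explain the result.
check_url() {
  # -s hides the progress meter, -o discards the body,
  # -w prints only the status code (000 when no response was received)
  status=$(curl -k --negotiate -u: -s -o /dev/null -w '%{http_code}' "$1")
  explain_status "$status"
}

# Example, to run once the cluster is up:
#   check_url https://master1.ryba:50470
```

A 401 status generally means the server asked for SPNEGO credentials and did not receive a valid one, which points back to the kinit step.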
You should now be familiar with Ryba. Join us and participate in this project on GitHub.
Ryba is a tool licensed under the New BSD license and used to deploy secured Hadoop clusters with a focus on multi-tenancy.
The author, David Worms, is the owner of Adaltas, a French company based in Paris and specialized in the deployment of secure Hadoop clusters.