





















































Many cloud vendors support Hadoop, as the popularity of MapReduce has grown over the past few years. Accumulo is another story: its popularity is growing, but support from cloud vendors hasn't caught up.
Amazon has great support for Accumulo, Hadoop, and ZooKeeper. For Hadoop and ZooKeeper, there is a set of libraries called Apache Whirr, which supports Amazon EC2, Rackspace, and many more cloud providers and is built on low-level API libraries. For Accumulo, you have two options: use the Amazon EMR command-line interface, or create a new virtual machine and set it up manually.
Prerequisites needed to complete the setup phase for Amazon EC2 are as follows:
The following steps are required to create the Hadoop and ZooKeeper cluster on Amazon EC2:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr
The RSA key is used later when configuring Whirr. You do not need to copy the key to the ~/.ssh/authorized_keys file, because the key is used from its current location.
cd /usr/local
sudo wget http://apache.claz.org/whirr/stable/whirr-0.8.2.tar.gz
sudo tar xzf whirr-0.8.2.tar.gz
sudo mv whirr-0.8.2 whirr
sudo chown -R hadoopuser:hadoopgroup whirr
Download Whirr to the /usr/local folder, unpack it, and rename the unpacked folder to whirr. For Cygwin, don't run the last command in the script.
sudo cp /usr/local/whirr/conf/credentials.sample /usr/local/whirr/conf/credentials
sudo nano /usr/local/whirr/conf/credentials
PROVIDER=aws-ec2
IDENTITY=<The value from the variable AWSAccessKeyId>
CREDENTIAL=<The value from the variable AWSSecretKey>
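As an alternative to editing the file by hand, the same three lines could be written by a small script. The following is a minimal sketch, not part of Whirr: the function name and the environment variables WHIRR_IDENTITY and WHIRR_CREDENTIAL are hypothetical names chosen for this example, and the target path is passed in by the caller.

```shell
#!/usr/bin/env bash
# Sketch: write the Whirr credentials file from environment variables.
# WHIRR_IDENTITY and WHIRR_CREDENTIAL are hypothetical names used only here.

write_whirr_credentials() {
  local target="$1"
  # Refuse to write a file with an empty key.
  if [ -z "${WHIRR_IDENTITY:-}" ] || [ -z "${WHIRR_CREDENTIAL:-}" ]; then
    echo "WHIRR_IDENTITY and WHIRR_CREDENTIAL must both be set" >&2
    return 1
  fi
  # The subshell keeps the restrictive umask from leaking into the caller;
  # a newly created file ends up readable by its owner only.
  (
    umask 077
    printf 'PROVIDER=aws-ec2\nIDENTITY=%s\nCREDENTIAL=%s\n' \
      "$WHIRR_IDENTITY" "$WHIRR_CREDENTIAL" > "$target"
  )
}
```

With both variables exported, a call such as write_whirr_credentials /usr/local/whirr/conf/credentials would produce the file described above, provided the current user can write to that folder.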
sudo nano /usr/local/whirr/conf/cluster.properties
nano /usr/local/whirr/conf/cluster.properties
Add the following lines:
whirr.cluster-name=demo-cluster
whirr.instance-templates=1 zookeeper,1 hadoop-jobtracker+hadoop-namenode,1 hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=${sys:user.home}/.ssh/id_rsa_whirr.pub
This file describes a single cluster with one ZooKeeper node, one Hadoop node running JobTracker and NameNode, and one Hadoop node running DataNode and TaskTracker.
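If you expect to vary the number of worker nodes, the properties file can be generated rather than edited. The sketch below is an illustration under assumptions: the function name and the worker-count parameter are hypothetical, and the property values are the ones from the demo cluster described here.

```shell
#!/usr/bin/env bash
# Sketch: generate cluster.properties for the demo cluster.
# The function name and the worker-count argument are hypothetical.

write_cluster_properties() {
  local target="$1" workers="${2:-1}"
  # Accept digits only for the worker count.
  case "$workers" in
    ''|*[!0-9]*) echo "worker count must be a number" >&2; return 1 ;;
  esac
  # \$ keeps ${sys:user.home} literal so that Whirr, not the shell, expands it.
  cat > "$target" <<EOF
whirr.cluster-name=demo-cluster
whirr.instance-templates=1 zookeeper,1 hadoop-jobtracker+hadoop-namenode,${workers} hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.private-key-file=\${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=\${sys:user.home}/.ssh/id_rsa_whirr.pub
EOF
}
```

Calling write_cluster_properties conf/cluster.properties 3 would describe the same cluster with three DataNode and TaskTracker nodes instead of one.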
su - hadoopuser
cd /usr/local/whirr
bin/whirr launch-cluster --config conf/cluster.properties
If you get the error message java.io.FileNotFoundException: whirr.log (Permission denied), the current user does not have permission to access the whirr.log file.
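You can check for this problem before launching. The following is a minimal sketch that assumes Whirr writes whirr.log into the folder it is started from; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether the current user can write whirr.log in a folder.
# Assumes whirr.log lives in the folder Whirr is launched from.

check_log_writable() {
  local dir="$1"
  local log="$dir/whirr.log"
  if [ -e "$log" ]; then
    # The log already exists: it must itself be writable.
    [ -w "$log" ]
  else
    # No log yet: the folder must allow creating one.
    [ -d "$dir" ] && [ -w "$dir" ]
  fi
}
```

For example, check_log_writable /usr/local/whirr || echo "fix ownership first" reports the problem before the launch command fails. If the check fails, running the chown command from the download step again is one way to restore ownership.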
After a few seconds, the script starts printing status messages and information about what is going to be done, as shown in the following screenshot:
The result from creating a cluster using Whirr is very detailed and important for troubleshooting and monitoring purposes, as shown in the following screenshot:
The output from running the script gives very valuable information about the cluster created. Every instance has a role and an external and internal IP address. The ID of every node is in the form <region>/<unique id>.
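Because node IDs follow the <region>/<unique id> form, they can be split with plain shell parameter expansion when you script against the output. This is a small sketch; the function names and the sample ID in the usage note are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split a node ID of the form <region>/<unique id>.
# Function names are hypothetical.

node_region() {
  # Drop everything from the first slash onward.
  printf '%s\n' "${1%%/*}"
}

node_instance() {
  # Drop everything up to and including the first slash.
  printf '%s\n' "${1#*/}"
}
```

Given the made-up ID us-east-1/i-0abc123, node_region prints us-east-1 and node_instance prints i-0abc123.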
cd /usr/local/whirr
bin/whirr destroy-cluster --config conf/cluster.properties
The easiest way to set up Accumulo on Amazon is to use the Amazon CLI (command-line interface). A single ZooKeeper node is already up and running, and it should be used while setting up Accumulo.
For Linux and Windows:
elastic-mapreduce --create --alive --name "Accumulo" \
  --bootstrap-action s3://elasticmapreduce/samples/accumulo/accumulo-install.sh \
  --args "<zookeeper ip address>,Demo-Database,DBPassword" \
  --bootstrap-name "install Accumulo" \
  --enable-debugging --log-uri s3://demo-accumulo/Accumulo-logs/ \
  --instance-type m1.large --instance-count 4 \
  --key-pair <Key Pair Name>
Locate the key pair name at https://console.aws.amazon.com/ec2/home?region=us-east-1#s=KeyPairs.
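The value passed to --args combines the ZooKeeper address, the database name, and the password into one comma-separated string. The sketch below assembles that string from three separate values and rejects values that would break the separation. The function name is hypothetical, and joining without spaces is an assumption made here on the basis that the install script may not trim whitespace.

```shell
#!/usr/bin/env bash
# Sketch: build the comma-separated --args value for the Accumulo bootstrap action.
# The function name is hypothetical; it only builds a string and calls no AWS tool.

accumulo_bootstrap_args() {
  if [ "$#" -ne 3 ]; then
    echo "usage: accumulo_bootstrap_args <zookeeper-ip> <database> <password>" >&2
    return 1
  fi
  local value
  for value in "$@"; do
    # Empty values, commas, and spaces would corrupt the argument list.
    case "$value" in
      ''|*,*|*' '*) echo "invalid value: '$value'" >&2; return 1 ;;
    esac
  done
  # With IFS set to a comma, "$*" joins the three values with commas.
  local IFS=,
  printf '%s\n' "$*"
}
```

The result can then be substituted into the command, for example --args "$(accumulo_bootstrap_args "$ZK_IP" Demo-Database DBPassword)", where ZK_IP is a placeholder variable holding the ZooKeeper node's address.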