Building a test Hadoop Cluster
I have been working with HortonWorks recently, and they wanted to see the installation process of a Hadoop cluster using Isilon as the storage layer, along with any differences from a standard DAS-based install. I ran into numerous issues just getting the Linux hosts into a sensible state to install Hadoop, so I thought I would summarise some of the simple issues you should resolve before starting to install Hadoop.
Initial starting point (HortonWorks instructions and Isilon-specific setup instructions)
- I built a few VMs using CentOS 6.5 DVD 1
- Selected a Basic Database Server as the install flavour
- Choose a reasonably sized OS partition, as you might want to make a local Hadoop repository, and that is roughly a 10GB tar.gz download. You then have to extract it, so over 20GB is needed to complete that process. I ended up resizing the VM a couple of times, so I would suggest at least 60GB for the Ambari server VM, including the local repository area.
- You might want to set up a simple script for copying files to all nodes, or running a command on all nodes, to save logging into each node one at a time. Something as simple as (for node in 1 2 3 4 5 6; do scp $1 yourvmname$node:$1; done) will save a lot of time; a slightly fuller version appears after this list.
- Set up networking and name resolution for all the nodes and the Isilon cluster (use SmartConnect); an example /etc/hosts follows the list.
- Enable NTP so the node clocks stay in sync (commands after the list)
- Turn off or edit the iptables rules so the nodes can reach the various ports used by Hadoop
- I needed to update the OpenSSL package, as otherwise the Hadoop install process fails quite a few steps along the way, and you may run into other issues if you then restart the process. (# yum update openssl)
- Disable transparent huge pages (edit the /boot/grub/grub.conf file and reboot; see the snippet after this list)
- Set up passwordless root SSH access from the Ambari server to the other compute nodes in the cluster (sketch after this list)
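For reference, here is a slightly fuller version of the copy/run helper mentioned above. It is a minimal sketch that assumes six nodes named yourvmname1 through yourvmname6 and passwordless root SSH (set up as in the last snippet below); adjust the names to match your own hosts.

```bash
#!/bin/bash
# push-run.sh - copy a file to every node, or run a command on every node.
# Hostnames are assumptions; edit NODES to match your cluster.
NODES="1 2 3 4 5 6"

case "$1" in
  copy)
    # Usage: ./push-run.sh copy /etc/hosts
    for n in $NODES; do
      scp "$2" "yourvmname$n:$2"
    done
    ;;
  run)
    # Usage: ./push-run.sh run "service ntpd restart"
    for n in $NODES; do
      ssh "yourvmname$n" "$2"
    done
    ;;
  *)
    echo "usage: $0 copy <file> | run <command>" >&2
    exit 1
    ;;
esac
```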
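For name resolution on a small test cluster, static /etc/hosts entries pushed to every node are often enough. The addresses and names below are made up for illustration; the Isilon entry would normally be a SmartConnect zone name delegated in DNS rather than a static entry, so treat that line as a placeholder.

```
# Example /etc/hosts entries - all addresses and names are assumptions
192.168.10.11   yourvmname1.test.local   yourvmname1
192.168.10.12   yourvmname2.test.local   yourvmname2
192.168.10.13   yourvmname3.test.local   yourvmname3
192.168.10.20   isilon.test.local        isilon
```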
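On CentOS 6 both the NTP and firewall items are quick fixes. Disabling iptables outright is only sensible for a throwaway test cluster like this one; on anything shared you would open the specific Hadoop ports instead.

```bash
# Keep the clocks in sync (Ambari complains if NTP is not running)
yum install -y ntp
chkconfig ntpd on
service ntpd start

# Crude but effective for a test lab: disable the firewall entirely
service iptables stop
chkconfig iptables off
```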
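Transparent huge pages can be turned off for the running system via sysfs and then made permanent on the kernel line in /boot/grub/grub.conf. The sysfs path below is the RHEL/CentOS 6 one; other kernels use /sys/kernel/mm/transparent_hugepage/enabled.

```bash
# Disable THP for the running system (CentOS/RHEL 6 path)
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

# Make it permanent: append transparent_hugepage=never to the kernel
# line in /boot/grub/grub.conf, then reboot. For example:
#   kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=... transparent_hugepage=never
```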
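The passwordless root access is just a key pair on the Ambari server plus a copy of the public key to each node; again, the hostnames are assumptions. Keep the private key handy, as the Ambari install wizard asks for it when registering hosts.

```bash
# On the Ambari server, as root: generate a key pair with no passphrase
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to every node (asks for the root password once per node)
for n in 1 2 3 4 5 6; do
  ssh-copy-id root@yourvmname$n
done
```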
- The only real changes during the Ambari-based install process occur during the initial setup, as below:
- Add all the compute and master nodes into the install process and use the SSH key.
- Go to the next page so they all register and install the Ambari agent.
- Then press the back button, add the Isilon FQDN to the list with manual registration (not an SSH login), and then continue.
- Later, during the service/node selection process, place the NameNode and DataNode services on the Isilon only.
- Just follow the rest of the install process (changing the repository to a local one if you set that up; see the sketch below). I did, as my link to the remote repositories was limited to around 500k, so it took ages to install multiple nodes without the local option.
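If you do build the local repository, the rough shape of it is to unpack the big tarball under a web server's document root on the Ambari server and point the install wizard's repository base URLs at that. The tarball name and paths below are illustrative, so check the HortonWorks local repository documentation for your HDP version.

```bash
# Serve the extracted repository over HTTP from the Ambari server
yum install -y httpd
service httpd start
chkconfig httpd on

# Unpack the (roughly 10GB) repository tarball under the web root;
# the file name here is an assumption
mkdir -p /var/www/html/hdp
tar -xzf HDP-2.1-centos6-rpm.tar.gz -C /var/www/html/hdp
```

During the install wizard you can then swap the public base URLs for the http://your-ambari-server/hdp/... equivalents under the stack selection's advanced repository options.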
I now have two Hadoop clusters up and running with Isilon as the HDFS store, so there is more to play with 🙂