If you’ve viewed part 1, be sure to go back and take another look, as I had an error in some of the XML conf files.
Be aware this is a work in progress. What I have here works, but the steps to set up the cluster could use some polish and optimization for ease of use.
While Hadoop running on one node is slightly interesting, Hadoop running across several ARM nodes, now that’s more like it. It’s time to add in additional nodes. In my case I’m going to have a 4-node Hadoop cluster made up of 3 TI PandaBoards and 1 Freescale i.MX53. Let’s walk through the steps to get that up and running. At the end of this exercise, there’s a great opportunity to have a hadoop-server image which is mostly set up.
In Hadoop you must specify one machine as the master. The rest will be slaves. So across the collection of machines, let’s first get the hostnames all set up and organized. You’ll also want to get all the machines set up with static IP addresses.
One way to set up a static IP address is to edit /etc/network/interfaces. All my machines are on a 192.168.1.x network. You’ll have to make adjustments as appropriate for your setup. Note in this example the line involving DHCP is commented out:
auto lo
iface lo inet loopback

auto eth0
#iface eth0 inet dhcp
iface eth0 inet static
address 192.168.1.15
netmask 255.255.255.0
gateway 192.168.1.1
Next we’ll update the /etc/hosts file with entries for the master and all the slaves across all the machines. Edit /etc/hosts on the respective machines with their IP address to name mappings. Here’s an example from the system that is named master. Note the respective system’s name appears on the first line:
127.0.0.1       localhost master
::1             localhost ip6-localhost ip6-loopback
fe00::0         ip6-localnet
ff00::0         ip6-mcastprefix
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters

192.168.1.14    master
192.168.1.15    slave1
192.168.1.16    slave2
192.168.1.17    slave3
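With /etc/hosts in place, a quick loop can confirm that every cluster name resolves on each machine (the hostnames here are the ones from the example above; adjust to your own):

```shell
# Check that each cluster hostname resolves; getent consults /etc/hosts
for h in master slave1 slave2 slave3; do
  getent hosts "$h" || echo "WARNING: $h does not resolve"
done
```

Run this on every node; a WARNING line means that machine's /etc/hosts still needs the entry.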
Next it’s time to update the /etc/hostname file on each system with its name. On my system named master, for instance, the file contains just:

master
With that complete, go back to my previous post and repeat those steps for each node. There are, however, some exceptions:
- If you issued start-all.sh, make sure you shut down the node with stop-all.sh
- Reboot each node after you’ve completed.
All done? Good! Now we’re ready to connect the nodes so that work will be dispatched across the cluster. First we need to get hduser’s public SSH key onto all the slaves. As the hduser on your master node, issue the following for each slave node:
master$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
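Rather than typing that once per slave, the copies can be scripted with a small loop (slave names taken from the /etc/hosts example above; you’ll be prompted for each slave’s hduser password):

```shell
# Push hduser's public key to every slave so passwordless ssh works
for h in slave1 slave2 slave3; do
  ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hduser@$h" || echo "key copy to $h failed"
done
```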
Afterwards, test that you can ssh from the master to each of the slave nodes. It’s extremely important that this works.
master $ slogin slave1
Multi-node Hadoop configuration
Now it’s time to configure the various services that will run. Just like in the first blog post, you can have your master node run as both a slave node and a master node, or you can have it run just as a master node. Up to you!
The master node’s job is to run the NameNode and JobTracker. The slave nodes’ job is to run a DataNode and TaskTracker. We’ll now configure this.
On the master machine, as root, edit /usr/local/hadoop/conf/masters and change localhost to the name of your master machine:

master
Now on the master machine we are going to tell it which nodes are its slaves. This is done by listing the names of the machines in /usr/local/hadoop/conf/slaves. If you want the master machine to also serve as a slave, then you need to list it. If you don’t want the master to be a slave, then you should remove the entry for localhost and make sure the name of the master isn’t listed. You have to update this file on all machines in the cluster.
master
slave1
slave2
slave3
Now for all machines in the cluster, we need to update /usr/local/hadoop/conf/core-site.xml. Specifically, look for the fs.default.name property and change localhost to the name of your master node.
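If your core-site.xml came from the usual single-node setup, the edited property would look something like the following. The 54310 port is an assumption here; keep whatever port your file already uses:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
</property>
```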
Now we’re going to update /usr/local/hadoop/conf/mapred-site.xml, again on all machines. It specifies where the JobTracker runs, which is our master node. Look for the mapred.job.tracker property and change localhost to the name of your master node.
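Again assuming the usual single-node starting point, the edited property would look something like this; the 54311 port is an assumption, keep the port already in your file:

```xml
<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
</property>
```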
Next, on all nodes, edit /usr/local/hadoop/conf/hdfs-site.xml to adjust the value for dfs.replication. This value needs to be equal to or less than the number of slave nodes you have in your cluster. Specifically, it controls how many nodes each block of data is replicated to. The default if the file is unchanged is 3. If the number is larger than the number of slave nodes that you have, your jobs will experience errors. Here’s how mine is set up, which is acceptable since I have a total of 4 nodes.
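For reference, the property takes this shape; the value of 3 shown here is an example, so adjust it to be no larger than your slave count:

```xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```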
Next on the master node, su - hduser and issue the following to format the HDFS file system.
master ~$ hadoop namenode -format
Starting the cluster daemons
Now it’s time to start the cluster. Here’s the high level order:
- HDFS daemons are started; this starts the NameNode daemon on the master node
- DataNode daemons are started on all slaves
- JobTracker is started on master
- TaskTracker daemons are started on all slaves
Here are the commands in practice. On master as hduser:

hduser@master:~$ start-dfs.sh
Presuming things successfully start, you can run jps and see the following:
hduser@master:~$ jps
3658 Jps
3203 DataNode
3536 SecondaryNameNode
2920 NameNode
If you were to also run jps on your slave nodes you’d notice that DataNode is also running there.
Now we will start the MapReduce daemons: the JobTracker on master and the TaskTrackers on the slaves. This is done with:

hduser@master:~$ start-mapred.sh
Presuming you don’t encounter any errors, your cluster should be fully up and running. In my next blog post I’ll do some runs with the cluster and take some performance measurements.