Hadoop 3.2.1 Installation Steps on Ubuntu
- November 23, 2021
- Posted by: Ankush Thavali
- Category: Hadoop


PuTTY Setup [Assuming the Ubuntu installation is complete]
The settings below let you connect to the Hadoop cluster remotely using PuTTY.
Change Hostname in Ubuntu
sudo hostnamectl set-hostname hadoop.com
Open the /etc/hosts file and update the hostname entry:
root@ankush-virtual-machine:/home/ankush# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       hadoop.com
Run the hostname command and confirm that the hostname has changed:
root@ankush-virtual-machine:/home/ankush# hostname
hadoop.com
Install packages that enable SSH access and copy-paste between the VMware guest and the host
sudo apt update
sudo apt install net-tools
sudo apt install open-vm-tools-desktop -y
sudo apt install vim -y
sudo apt install openssh-server -y
sudo service ssh status
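To connect from PuTTY you need the VM's IP address. A quick way to find it (net-tools was installed above; the interface name, e.g. ens33, may differ on your machine):

ifconfig
# or
ip addr show

Use the IP address of your network interface as the Host Name in PuTTY, with port 22.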
Switch to root user
sudo su -
Java Installation
sudo apt install openjdk-8-jdk -y
java -version; javac -version
Create the Dedicated Hadoop User and Set Up Passwordless SSH
sudo adduser hdoop
su - hdoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost
Add the hdoop user to the sudoers list
su - ankush
sudo adduser hdoop sudo
Downloading Hadoop
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
Alternatively, download Hadoop directly from the official Apache website and move the tarball to /home/hdoop. Then extract it:
tar xzf hadoop-3.2.1.tar.gz
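Optionally, verify the integrity of the downloaded tarball. A minimal check, assuming the matching .sha512 file is still published next to the tarball:

wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz.sha512
sha512sum hadoop-3.2.1.tar.gz
# compare the printed hash against the value in hadoop-3.2.1.tar.gz.sha512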
Editing 6 important files
1st file [.bashrc]
cd /home/hdoop
sudo vi .bashrc
##If this fails with a message that hdoop is not a sudo user, switch back to the admin user, add hdoop to the sudo group, and try again:
su - ankush
sudo adduser hdoop sudo

#Add below lines in this file
#Hadoop Related Options
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
source ~/.bashrc
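A quick way to confirm the new environment variables are in effect (run as the hdoop user; hadoop should now resolve via the updated PATH):

echo $HADOOP_HOME
hadoop version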
2nd File [hadoop-env.sh]
sudo vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
#Add the line below at the end of this file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
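If you are unsure of the exact Java path on your machine (the value above assumes the amd64 OpenJDK 8 package), you can locate it first:

readlink -f $(which javac)
# e.g. /usr/lib/jvm/java-8-openjdk-amd64/bin/javac -- JAVA_HOME is everything before /bin/javac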
3rd File [core-site.xml]
vi $HADOOP_HOME/etc/hadoop/core-site.xml
#Add below lines in this file (between <configuration> and </configuration>)
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hdoop/tmpdata</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system.</description>
</property>
4th File [hdfs-site.xml]
vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
#Add below lines in this file (between <configuration> and </configuration>)
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
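You may also want to create the NameNode and DataNode directories referenced above ahead of time, as the hdoop user (paths taken from the values above):

mkdir -p /home/hdoop/dfsdata/namenode /home/hdoop/dfsdata/datanode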
5th File [mapred-site.xml]
vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
#Add below lines in this file (between <configuration> and </configuration>)
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
6th File [yarn-site.xml]
sudo vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
#Add below lines in this file (between <configuration> and </configuration>)
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>0</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
Launching Hadoop
hdfs namenode -format
start-all.sh
Note: start-all.sh starts both the HDFS and YARN daemons; alternatively, run start-dfs.sh and start-yarn.sh separately.
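Once the scripts return, you can check which daemons are running with jps (the exact list can vary with your configuration):

jps
# expect NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager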
Hadoop Web UI URLs
The Hadoop NameNode web UI runs on default port 9870: http://192.168.0.60:9870/
Port 8042 serves the NodeManager web UI with information about the node and the applications running on it: http://localhost:8042/
Access port 9864 to get details about an individual DataNode: http://localhost:9864/
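If the browser cannot reach a UI, a quick check from the VM itself (assuming curl is installed):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/
# 200 means the NameNode web UI is responding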
Start/Stop Hadoop Services
start-all.sh & stop-all.sh: start and stop all Hadoop daemons at once. Issuing them on the master machine will start/stop the daemons on all nodes of the cluster. Deprecated, as the scripts themselves warn.
start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh: same as above, but start/stop the HDFS and YARN daemons separately on all nodes from the master machine. These are now preferred over start-all.sh & stop-all.sh.
hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager: start individual daemons on an individual machine manually. You need to go to the particular node and issue these commands. Use case: you have added a new DataNode to the cluster and need to start only the DataNode daemon on that machine: bin/hadoop-daemon.sh start datanode
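As a final smoke test, try a few basic HDFS commands as the hdoop user (the /test directory name is just an example):

hdfs dfs -mkdir /test
hdfs dfs -ls /
hdfs dfs -rm -r /test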
