wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
or
Alternatively, download Hadoop directly from the official Apache website. [Move the downloaded hadoop-3.2.1.tar.gz file to the /home/hdoop directory.]
tar xzf hadoop-3.2.1.tar.gz
Editing 6 important files
1st file [.bashrc]
cd /home/hdoop
sudo vi .bashrc   ##here you may get an error saying hdoop is not in the sudoers file
If that happens, switch to a user with sudo rights (ankush in this example) and add hdoop to the sudo group:
su - ankush
sudo adduser hdoop sudo
cd /home/hdoop
sudo vi .bashrc
#Add the below lines to this file
#Hadoop Related Options
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
source ~/.bashrc
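A quick sanity check that the new environment is active (exact output depends on your setup):
echo $HADOOP_HOME     # should print /home/hdoop/hadoop-3.2.1
hadoop version        # should report Hadoop 3.2.1 if PATH is set correctly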
2nd File [hadoop-env.sh]
sudo vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
#Add the below line at the end of this file
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
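If you are unsure of the JDK path on your machine, it can usually be found like this (assuming a full OpenJDK package installed via apt):
readlink -f /usr/bin/javac | sed "s:/bin/javac::"
#or list the registered alternatives
update-alternatives --list java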
3rd File [core-site.xml]
vi $HADOOP_HOME/etc/hadoop/core-site.xml
#Add the below lines in this file (between <configuration> and </configuration>)
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdoop/tmpdata</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system.</description>
</property>
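Note: fs.default.name is the older, deprecated name of this setting; on Hadoop 3.x the same value can equally be set via fs.defaultFS:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>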
4th File [hdfs-site.xml]
vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
#Add the below lines in this file (between <configuration> and </configuration>)
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
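The directories referenced above do not exist yet; create them up front (paths are the ones used in core-site.xml and hdfs-site.xml):
mkdir -p /home/hdoop/tmpdata
mkdir -p /home/hdoop/dfsdata/namenode
mkdir -p /home/hdoop/dfsdata/datanode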
5th File [mapred-site.xml]
vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
#Add below lines in this file(between <configuration> and </configuration>)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
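6th File [yarn-site.xml]
vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
#A minimal single-node setup usually only needs the MapReduce shuffle service enabled; adjust as required (between <configuration> and </configuration>)
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Once all six files are edited, format the NameNode once and start the daemons so the web UIs below become reachable:
hdfs namenode -format
start-dfs.sh
start-yarn.sh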
The Hadoop NameNode web UI runs on its default port 9870:
http://192.168.0.60:9870/
Port 8042 serves the YARN NodeManager web UI, with information about the node and the applications running on it.
http://localhost:8042/
Access port 9864 for the DataNode web UI, with details about the individual data node.
http://localhost:9864/
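If a web UI is not reachable, jps (shipped with the JDK) shows which Hadoop daemons are actually running on the node:
jps
#On a working single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager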
Start/Stop Hadoop Services
start-all.sh & stop-all.sh : Start and stop all Hadoop daemons at once. Issuing these on the master machine will start/stop the daemons on all the nodes of the cluster. Both scripts are deprecated.
start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh : Same as above, but start/stop the HDFS and YARN daemons separately on all the nodes from the master machine. It is advisable to use these over start-all.sh & stop-all.sh.
hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager : Start an individual daemon on an individual machine manually. You need to log in to that particular node and issue these commands.
Use case : Suppose you have added a new DataNode to your cluster and need to start the DataNode daemon only on this machine:
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
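Note that in Hadoop 3.x these per-daemon scripts are themselves deprecated; the equivalent commands (assuming the same PATH setup as above) are:
hdfs --daemon start datanode
yarn --daemon start nodemanager
yarn --daemon start resourcemanager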