
PuTTY Setup [assuming the Ubuntu installation is complete]

The settings below will let you connect remotely to the Hadoop machine using PuTTY.

Change Hostname in Ubuntu

sudo hostnamectl set-hostname hadoop.com

Open the /etc/hosts file and update the hostname

root@ankush-virtual-machine:/home/ankush# cat /etc/hosts
127.0.0.1       localhost
127.0.1.1       hadoop.com

Run hostname and confirm that the hostname has changed

root@ankush-virtual-machine:/home/ankush# hostname
hadoop.com

Install the packages that enable SSH access and copy-paste between the host and the VMware guest

sudo apt update
sudo apt install net-tools
sudo apt install open-vm-tools-desktop -y
sudo apt install vim -y
sudo apt install openssh-server -y
sudo service ssh status
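
To connect with PuTTY you will need the VM's IP address. A quick way to find it (ifconfig comes from the net-tools package installed above; hostname -I works even without it):

# note the IP of the VM's network interface
ifconfig
# or
hostname -I

Then open PuTTY on your host machine and connect to that IP on port 22 (SSH).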

Switch to root user

sudo su -

Java Installation

sudo apt install openjdk-8-jdk -y

java -version
javac -version

sudo adduser hdoop
su - hdoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost

Add the hdoop user to the sudoers list

su - ankush
sudo adduser hdoop sudo
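
To confirm the change took effect, a quick sanity check (not part of the original steps) is to list hdoop's groups:

# "sudo" should appear in the output
groups hdoop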

Downloading Hadoop


wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
or
download Hadoop directly from the official Apache website [move the tar.gz file to /home/hdoop].

tar xzf hadoop-3.2.1.tar.gz
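
As a quick sanity check (assuming the archive was downloaded to /home/hdoop), confirm that extraction created the hadoop-3.2.1 directory:

# should list bin, sbin, etc, share, ...
ls /home/hdoop/hadoop-3.2.1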

Editing 6 Important Files

1st File [.bashrc]


cd /home/hdoop
sudo vi .bashrc    ## here you might get an error saying hdoop is not a sudo user
If you hit that error, switch back to your admin user and add hdoop to the sudo group:
su - ankush
sudo adduser hdoop sudo

cd /home/hdoop
sudo vi .bashrc
#Add the below lines to this file

#Hadoop Related Options
export HADOOP_HOME=/home/hdoop/hadoop-3.2.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

source ~/.bashrc
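
A quick check that the new variables were picked up (note: hadoop commands themselves will not run until JAVA_HOME is set in the next step):

# both should point into /home/hdoop/hadoop-3.2.1
echo $HADOOP_HOME
which hadoop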

2nd File [hadoop-env.sh]

sudo vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

#Add the below line at the end of this file

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
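
The path above matches the openjdk-8-jdk package on 64-bit Ubuntu. If you are unsure of the exact JDK location on your machine, one way to derive it is:

# prints the JDK directory to use as JAVA_HOME
readlink -f /usr/bin/javac | sed "s:/bin/javac::"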

3rd File [core-site.xml]


vi $HADOOP_HOME/etc/hadoop/core-site.xml

#Add the below lines to this file (between <configuration> and </configuration>)
   
   <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hdoop/tmpdata</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
        <description>The name of the default file system.</description>
    </property>
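
The hadoop.tmp.dir path referenced above must exist before you format HDFS; if it does not, create it as the hdoop user so it owns the directory:

# create the base temporary directory for Hadoop
mkdir -p /home/hdoop/tmpdata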

4th File [hdfs-site.xml]

sudo vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml

#Add the below lines to this file (between <configuration> and </configuration>)


<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hdoop/dfsdata/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hdoop/dfsdata/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
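
As with the temporary directory, create the NameNode and DataNode directories referenced above as the hdoop user:

# create the directories used by the NameNode and DataNode
mkdir -p /home/hdoop/dfsdata/namenode /home/hdoop/dfsdata/datanode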

5th File [mapred-site.xml]

sudo vi $HADOOP_HOME/etc/hadoop/mapred-site.xml

#Add the below lines to this file (between <configuration> and </configuration>)

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

6th File [yarn-site.xml]

sudo vi $HADOOP_HOME/etc/hadoop/yarn-site.xml

#Add the below lines to this file (between <configuration> and </configuration>)

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>0</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

Launching Hadoop

hdfs namenode -format

start-dfs.sh
start-yarn.sh
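
Once the scripts finish, you can verify that all daemons are up with jps (shipped with the JDK installed earlier):

# expected: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager (and Jps)
jps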

Hadoop Web UI URLs


The Hadoop NameNode UI runs on default port 9870.

http://192.168.0.60:9870/    [replace 192.168.0.60 with your VM's IP address]

Port 8042 (the NodeManager UI) gives information about the node and the applications running on it.

http://localhost:8042/

Access port 9864 (the DataNode UI) to get details about your Hadoop DataNode.

http://localhost:9864/

Start/Stop Hadoop Services

start-all.sh & stop-all.sh : Start and stop all Hadoop daemons at once. Issuing them on the master machine starts/stops the daemons on all nodes of the cluster. These scripts are deprecated.

start-dfs.sh, stop-dfs.sh and start-yarn.sh, stop-yarn.sh : Same as above, but they start/stop the HDFS and YARN daemons separately on all nodes from the master machine. These are now the recommended alternatives to start-all.sh & stop-all.sh.

hadoop-daemon.sh namenode/datanode and yarn-daemon.sh resourcemanager : Start an individual daemon on an individual machine manually. You need to log in to that particular node and issue the command there.

Use case: Suppose you have added a new DataNode (DN) to your cluster and need to start the DN daemon only on that machine:

sbin/hadoop-daemon.sh start datanode
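
On Hadoop 3.x the hadoop-daemon.sh / yarn-daemon.sh scripts are deprecated in favour of per-daemon options on the hdfs and yarn commands; the equivalents would look like this (a sketch for Hadoop 3, run on the node in question):

# start the DataNode daemon on this machine
hdfs --daemon start datanode
# start the NodeManager daemon on this machine
yarn --daemon start nodemanager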


Caution: Your use of any information or materials on this website is entirely at your own risk. It is provided for educational purposes only. It has been tested internally; however, we do not guarantee that it will work for you. Ensure that you run it in your test environment before using it.