Spark Hadoop Training | Best Training for Hadoop Spark | Data Engineer Training

Duration - 30+ Hours

Spark Hadoop Syllabus | Start your Career with us

Module 1 : Introduction To Big Data

Module 2 : Introduction To Hadoop

Module 3 : Hadoop Installation

  • Hadoop Installation
  • Hive Installation
  • Sqoop Installation
  • Spark Installation
  • Python Installation
  • Jupyter Notebook Installation
  • PyCharm Installation

Module 4 : HDFS (Hadoop Distributed File System)

  • What is DFS?
  • Benefits of DFS
  • What is HDFS?
  • HDFS Daemons
  • Fault tolerance : File blocks and Replication
  • Rack Awareness
  • HDFS Read mechanism
  • HDFS write mechanism
  • Different file formats in HDFS
  • HDFS safe mode
  • How Hadoop Handles metadata
  • HDFS permissions
  • Data Compression
  • Working with HDFS (HDFS commands)

Module 5 : YARN (Yet Another Resource Negotiator)

  • Introduction to YARN
  • MRv1 vs YARN
  • YARN Daemons
  • Schedulers in YARN : Fair Scheduler vs Capacity Scheduler
  • Application Manager
  • Application Master vs Application Manager
  • YARN Architecture
  • How YARN handles failures
  • Types of applications supported by YARN

Module 6 : MapReduce

  • How MapReduce works
  • MapReduce phases : map and reduce
  • Shuffling and sorting
  • Use cases for MapReduce
  • Limitations of MapReduce
  • WordCount MapReduce program
  • MapReduce programming examples
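
As a taste of the WordCount exercise, the three stages listed above (map, shuffle and sort, reduce) can be sketched in plain Python. This is a minimal single-process illustration, not the Hadoop MapReduce API: the function names map_phase, shuffle_and_sort, reduce_phase and word_count are hypothetical, and the sample lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Shuffle and sort: group the values by key, then order the keys."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return sorted(grouped.items())


def reduce_phase(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three stages in order over a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(word, counts) for word, counts in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    # Illustrative input only; a real job would read its input from HDFS.
    sample = ["big data big ideas", "data moves fast"]
    for word, total in word_count(sample):
        print(word, total)
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; here everything runs in one process so the data flow is easy to follow.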

Module 7 : Hive

  • Introduction to Apache Hive
  • Hive vs Pig
  • Hive Architecture and Components
  • Hive Metastore
  • Limitations of Hive
  • Comparison with Traditional Database
  • Hive Data Types and Data Models
  • Hive Partition
  • Hive Bucketing
  • Hive Tables (Managed Tables and External Tables)
  • Importing Data
  • Querying Data & Managing Outputs
  • Hive Script & Hive UDF

Module 8 : Sqoop

  • Introduction to Sqoop
  • Sqoop’s working mechanism
  • Importing data from RDBMS to HDFS using Sqoop
  • Exporting data to RDBMS from HDFS using Sqoop
  • Sqoop’s Incremental import

Module 9 : Apache Spark

  • Introduction to Apache Spark
  • Spark unified stack
  • Features of Apache Spark
  • Why Spark is Faster than Hadoop
  • Spark Architecture
  • Spark Drivers and Executors
  • A typical Spark application
  • SparkContext vs SparkSession
  • Cluster managers in Spark
  • Set YARN as cluster manager for Spark applications
  • Getting familiar with Spark shell
  • Getting familiar with Scala IDE
  • spark-submit : Submitting applications to a cluster

Module 10 : Spark Programming Model

  • What are RDDs?
  • RDDs and Partitions
  • Reading data from various sources
  • RDD : Transformations and Actions
  • Narrow transformations vs Wide transformations
  • The concept of DAG and Lazy Evaluation
  • The concept of RDD Persistence
  • Spark Application vs MapReduce Application
  • RDD Programming examples

Module 11 : Spark SQL

  • Need for Spark SQL
  • Workflow for Spark SQL
  • The concept of DataFrames
  • The concept of Datasets
  • DataFrame vs Dataset
  • Views in Spark SQL
  • Hive and Spark Integration
  • Working with Spark SQL through : spark-shell, spark-sql shell and IDE
  • Spark SQL programming examples

Bonus Module :

Oozie : The Workflow Scheduler

  • What is Oozie?
  • Need for Oozie
  • Scheduling Jobs using Oozie

Crontab Introduction

  • What is crontab?
  • Scheduling a job in crontab