Read CSV, Parquet, and Avro File Formats Using Spark

  • ANKUSH THAVALI
  • 20 Mar, 2022


Read CSV, Avro and Parquet File Formats

from pyspark.sql import SparkSession


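def getsparkSession():
    # "yarn" assumes a configured Hadoop/YARN cluster;
    # use "local[*]" instead to run on a single machine.
    spark = SparkSession.builder.master("yarn") \
        .appName('Learnomate Example') \
        .getOrCreate()
    return spark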

spark = getsparkSession()

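# Read a pipe-delimited CSV file whose first row holds the column names.
# Without a schema or inferSchema, every column is read as a string.
origin_df = spark.read.format('csv').option('header', 'True').option('delimiter', '|') \
    .load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\empdata.csv")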
origin_df.show()


# Parquet files store their own schema, so no extra options are needed.
df = spark.read.format('parquet') \
    .load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\Train.parquet")
df.show()

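# Avro support lives in the external spark-avro module, which must be
# added to the job (for example with --packages) before this read works.
df = spark.read.format('avro') \
    .load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\variants.avro")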
df.show()
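
Because the Avro reader is not bundled with Spark itself, the read above fails unless the spark-avro module is on the classpath. A minimal sketch of one way to pull it in when the session is built is shown below. The helper names `avro_package` and `get_spark_session_with_avro` are hypothetical, the Scala version default of 2.12 is an assumption, and the package version has to match your own Spark version.

```python
def avro_package(spark_version, scala_version="2.12"):
    """Build the Maven coordinate of the external spark-avro module.

    Assumption: scala_version matches the Scala build of your Spark install.
    """
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


def get_spark_session_with_avro(spark_version, master="local[*]"):
    """Create a session that downloads spark-avro at startup (hypothetical helper)."""
    # Imported here so avro_package() can be used without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.master(master)
        .appName("Learnomate Example")
        # spark.jars.packages must be set before the session is first created.
        .config("spark.jars.packages", avro_package(spark_version))
        .getOrCreate()
    )
```

One possible call is `get_spark_session_with_avro(pyspark.__version__)`, which keeps the package version in step with the installed PySpark.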


Read Data from HDFS 

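# Same CSV read as before; only the path changes to an HDFS URI
# (NameNode host and port, followed by the file path).
origin_df = spark.read.format('csv').option('header', 'True').option('delimiter', '|') \
    .load("hdfs://sandbox-hdp.hortonworks.com:8020/input/empdata.csv")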
origin_df.show()
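
All of the reads in this post follow the same pattern: pick a format, set a few options, and load a path, whether that path is local or on HDFS. As a rough sketch, that pattern could be wrapped in one helper that picks the format from the file extension. The names `FORMAT_BY_EXTENSION`, `detect_format` and `read_any` are hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
import os

# Hypothetical lookup table: file extension -> Spark data source name.
FORMAT_BY_EXTENSION = {".csv": "csv", ".parquet": "parquet", ".avro": "avro"}


def detect_format(path):
    """Return the Spark format name implied by the file extension of path."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    return FORMAT_BY_EXTENSION[extension]


def read_any(spark, path, **options):
    """Read path into a DataFrame, passing any reader options through."""
    return spark.read.format(detect_format(path)).options(**options).load(path)
```

With this in place, the HDFS read could be written as `read_any(spark, "hdfs://sandbox-hdp.hortonworks.com:8020/input/empdata.csv", header="True", delimiter="|").show()`.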