Read CSV, Parquet, and Avro File Formats Using Spark

  • 20 Mar, 2022

Read CSV, Avro, and Parquet File Formats

from pyspark.sql import SparkSession


def getsparkSession():
    # "yarn" assumes a configured Hadoop/YARN cluster;
    # use "local[*]" instead to run on a single machine.
    spark = SparkSession.builder.master("yarn") \
        .appName('Learnomate Example') \
        .getOrCreate()
    return spark

spark = getsparkSession()

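# CSV: the first line is a header row and fields are pipe-delimited.
origin_df = spark.read.format('csv').option('header', 'True').option('delimiter', '|') \
    .load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\empdata.csv")
origin_df.show()


# Parquet: the schema is stored in the file itself, so no reader options are needed.
df = spark.read.format('parquet').load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\Train.parquet")
df.show()

# Avro: the reader lives in the external spark-avro module, which must be
# available to the session (for example, supplied with --packages at submit time).
df = spark.read.format('avro').load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\variants.avro")
df.show()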
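
The three reads above differ only in the format name and the reader options, so they could be folded into one helper. The following is a minimal sketch, not part of the original example: `READER_SETTINGS`, `reader_settings`, and `read_any` are hypothetical names, and the CSV options simply assume the same pipe-delimited layout as empdata.csv.

```python
import os

# Hypothetical lookup table: file extension -> (Spark format name, reader options).
# The CSV options assume a header row and a pipe delimiter, as in empdata.csv.
READER_SETTINGS = {
    ".csv": ("csv", {"header": "True", "delimiter": "|"}),
    ".parquet": ("parquet", {}),
    ".avro": ("avro", {}),
}


def reader_settings(path):
    """Return (format, options) for a path, based on its file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in READER_SETTINGS:
        raise ValueError(f"Unsupported file extension: {ext!r}")
    fmt, options = READER_SETTINGS[ext]
    return fmt, dict(options)  # copy so callers cannot mutate the table


def read_any(spark, path):
    """Load a file into a DataFrame using the settings for its extension."""
    fmt, options = reader_settings(path)
    return spark.read.format(fmt).options(**options).load(path)
```

With an existing session, `read_any(spark, path)` would then replace each of the individual `spark.read.format(...)` calls. Reading Avro this way still requires the spark-avro module to be available to the session.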


Read Data from HDFS 

origin_df = spark.read.format('csv').option('header', 'True').option('delimiter', '|') \
    .load("hdfs://sandbox-hdp.hortonworks.com:8020/input/empdata.csv")
origin_df.show()
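
The HDFS path above follows the pattern hdfs://host:port/absolute/path, where the host and port presumably identify the cluster's NameNode. As a small sketch, the URI could be assembled from its parts so the host is not repeated in every read; `hdfs_uri` is a hypothetical helper, not a Spark function.

```python
def hdfs_uri(host, port, path):
    """Build an hdfs:// URI from a host, a port, and an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /input/empdata.csv")
    return f"hdfs://{host}:{port}{path}"


# Host, port, and path are taken from the example above.
uri = hdfs_uri("sandbox-hdp.hortonworks.com", 8020, "/input/empdata.csv")
print(uri)
```

The resulting string can be passed directly to `.load(...)` in place of the hard-coded URL.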
