
Read CSV, Parquet, and Avro file formats using Spark

  • 20 Mar, 2022


Read CSV, Avro, and Parquet file formats

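from pyspark.sql import SparkSession


def get_spark_session():
    # Build (or reuse) a session. master("yarn") expects a configured Hadoop/YARN cluster.
    return SparkSession.builder.master("yarn") \
        .appName('Learnomate Example') \
        .getOrCreate()


spark = get_spark_session()

# The paths below are local Windows paths; they must be readable from wherever Spark runs.

# CSV: the first line holds the column names and fields are separated by '|'.
origin_df = spark.read.format('csv').option('header', 'true').option('delimiter', '|') \
    .load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\empdata.csv")
origin_df.show()

# Parquet: the schema is stored in the file itself, so no reader options are needed.
parquet_df = spark.read.format('parquet') \
    .load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\Train.parquet")
parquet_df.show()

# Avro: the reader ships in the external spark-avro module, which must be on the classpath,
# e.g. spark-submit --packages org.apache.spark:spark-avro_<scala version>:<spark version>
avro_df = spark.read.format('avro') \
    .load(r"C:\Users\ankus\PycharmProjects\pythonProject2\venv\resources\variants.avro")
avro_df.show()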
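To try the CSV options without the original empdata.csv, the sketch below writes a tiny pipe-delimited file and wraps the same read call in a helper. The column names, the sample rows, and the helper names `write_sample_empdata` and `read_pipe_csv` are assumptions made for illustration; they are not taken from the original data set. The `inferSchema` option is also an addition here, so that Spark guesses column types instead of reading everything as strings.

```python
import csv
import os
import tempfile


def write_sample_empdata(path):
    """Write a small pipe-delimited file with a header row (hypothetical columns)."""
    rows = [
        ["empno", "ename", "sal"],
        ["1", "Asha", "1000"],
        ["2", "Ravi", "2000"],
    ]
    with open(path, "w", newline="") as f:
        # delimiter="|" matches the .option('delimiter', '|') used when reading
        csv.writer(f, delimiter="|", lineterminator="\n").writerows(rows)
    return path


def read_pipe_csv(spark, path):
    """Read a pipe-delimited CSV whose first line holds the column names."""
    return (
        spark.read.format("csv")
        .option("header", "true")       # first line is the header
        .option("delimiter", "|")       # fields are separated by '|'
        .option("inferSchema", "true")  # assumption: let Spark guess column types
        .load(path)
    )


sample_path = write_sample_empdata(
    os.path.join(tempfile.mkdtemp(), "empdata_sample.csv")
)
with open(sample_path) as f:
    print(f.read())
```

With a session that can see the local file system (for example one built with `master("local[*]")`), `read_pipe_csv(spark, sample_path).show()` should display the two sample rows under the three header columns.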


Read Data from HDFS 

origin_df = spark.read.format('csv').option('header', 'True').option('delimiter', '|') \
    .load("hdfs://sandbox-hdp.hortonworks.com:8020/input/empdata.csv")
origin_df.show()
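The HDFS path above is a URI made of three parts: the NameNode host, its port, and the absolute path of the file. A minimal sketch of assembling it, assuming hypothetical helper names `hdfs_uri` and `read_emp_from_hdfs` (neither is part of Spark):

```python
def hdfs_uri(host, port, path):
    """Build an hdfs:// URI from a NameNode host, port, and absolute file path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /input/empdata.csv")
    return f"hdfs://{host}:{port}{path}"


def read_emp_from_hdfs(spark, uri):
    """Read the pipe-delimited employee file from HDFS with the same CSV options."""
    return (
        spark.read.format("csv")
        .option("header", "true")
        .option("delimiter", "|")
        .load(uri)
    )


# Host, port, and path are the ones used in the example above.
uri = hdfs_uri("sandbox-hdp.hortonworks.com", 8020, "/input/empdata.csv")
print(uri)
```

Calling `read_emp_from_hdfs(spark, uri).show()` would then behave like the read shown above, provided the cluster is reachable from the machine running the driver.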
