hadoop
The var keyword is similar to variable declaration in Java, whereas val is a little different. Once a variable is declared using val, the reference cannot be changed to point to another reference. This behavior of the val keyword in Scala is comparable to Java's final keyword. Val refers to an immutable declaration of […]
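As a quick illustration of the point this excerpt makes, here is a minimal Scala sketch (names are purely illustrative) showing that a var can be reassigned while a val cannot:

```scala
// Minimal sketch of the var/val difference described above.
object VarValDemo extends App {
  var counter = 1      // var: the reference can be reassigned
  counter = 2          // compiles fine

  val limit = 10       // val: the reference is fixed, much like a Java final variable
  // limit = 20        // would not compile: "reassignment to val"

  println(s"counter = $counter, limit = $limit")
}
```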
Lazy evaluation in Spark means that execution does not start until an action is triggered. With lazy evaluation, users can compose a job out of smaller operations, and Spark reduces the number of passes over the data by grouping transformations together. Lazy evaluation also saves round trips between the driver and the cluster, which speeds up processing. […]
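A minimal sketch of this behavior, assuming a local SparkSession (object and variable names are illustrative): the map and filter calls below only record the lineage, and nothing actually runs until the count() action is called.

```scala
import org.apache.spark.sql.SparkSession

object LazyEvalDemo extends App {
  val spark = SparkSession.builder().appName("LazyEvalDemo").master("local[*]").getOrCreate()
  val sc = spark.sparkContext

  val numbers = sc.parallelize(1 to 1000000)

  // Transformations: nothing executes yet, Spark only records the lineage.
  val doubled = numbers.map(_ * 2)
  val evens   = doubled.filter(_ % 4 == 0)

  // Action: triggers the whole chain in a single pass over the data.
  val total = evens.count()
  println(s"count = $total")

  spark.stop()
}
```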
Hive Partitions: Create Hive Partition Table, Create Non-Partition Table, Load Data to Non-Partition Table, Set the Hive Property, Load Data from Non-Partition to Partition Table, Show All Partitions on Hive Table, Add New Partition to the Hive Table, Rename or Update Hive Partition, Manually Renaming Partitions on HDFS. When you manually modify the partitions directly […]
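A hedged sketch of the partition workflow this excerpt lists, driven from Scala through spark.sql with Hive support. The table names zipcodes and zipcodes_part are assumptions for illustration, not names from the article.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitionDemo extends App {
  // Assumes a Hive metastore is reachable from this Spark session.
  val spark = SparkSession.builder()
    .appName("HivePartitionDemo")
    .enableHiveSupport()
    .getOrCreate()

  // Partitioned table: data is laid out on HDFS in one directory per state value.
  spark.sql(
    """CREATE TABLE IF NOT EXISTS zipcodes_part (zipcode INT, city STRING)
      |PARTITIONED BY (state STRING)""".stripMargin)

  // Set the Hive property so dynamic partitions can be written.
  spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

  // Load data from a non-partitioned table (zipcodes, assumed to exist) into the partitioned one.
  spark.sql(
    """INSERT INTO TABLE zipcodes_part PARTITION (state)
      |SELECT zipcode, city, state FROM zipcodes""".stripMargin)

  // Show all partitions currently registered in the metastore.
  spark.sql("SHOW PARTITIONS zipcodes_part").show(false)

  spark.stop()
}
```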
Start Hive Metastore, Create Database, SHOW DATABASES, Use Database, Describe Database, Drop Database, Hive DDL Table Commands (Show Tables, Describe Table, Truncate Table, Alter Table, Drop Table, Internal Tables, External Table), Hive – Load Data Into Table. In Hive, with DML statements we can add data to a Hive table in two different ways: using INSERT […]
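A small sketch of a few of the DDL and DML steps mentioned above, again issued from Scala via spark.sql; the database demo_db and table employees are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

object HiveDdlDemo extends App {
  // Assumes Hive support is available; database and table names are illustrative.
  val spark = SparkSession.builder()
    .appName("HiveDdlDemo")
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("CREATE DATABASE IF NOT EXISTS demo_db")
  spark.sql("SHOW DATABASES").show(false)
  spark.sql("USE demo_db")

  // Internal (managed) table; dropping it also removes its data.
  spark.sql("CREATE TABLE IF NOT EXISTS employees (id INT, name STRING)")

  // One of the two ways the excerpt mentions: INSERT (the other is LOAD DATA from a file).
  spark.sql("INSERT INTO employees VALUES (1, 'Asha'), (2, 'Ravi')")
  spark.sql("SELECT * FROM employees").show(false)

  spark.stop()
}
```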
This article explains the difference between Hadoop and big data. I have personally found that many students confuse the two, but they are actually different things. In one sentence, I would say "big data is the problem" and "Hadoop is a framework that provides the solution to the big data problem." When you go for […]
I have seen many people who want to switch their career from Oracle DBA to Hadoop but are not sure how or where to start. After working for many years in Oracle DBA technology, I recently switched my career from Oracle DBA […]
Apache Spark SQL is a Spark module that simplifies working with structured data through the DataFrame and Dataset abstractions in Python, Java, and Scala. These abstractions are distributed collections of data organized into named columns, and Spark SQL applies good optimization techniques to them. Using Spark SQL we can query data both from inside a Spark program and […]
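A minimal sketch of querying from inside a Spark program, assuming a local SparkSession; the people data and column names are made up for illustration. It shows the same data queried once through the DataFrame API and once through SQL on a temporary view.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlDemo extends App {
  val spark = SparkSession.builder().appName("SparkSqlDemo").master("local[*]").getOrCreate()
  import spark.implicits._

  // A DataFrame: a distributed collection organized into named columns.
  val people = Seq(("Alice", 34), ("Bob", 45), ("Cara", 29)).toDF("name", "age")

  // Query it with the DataFrame API...
  people.filter($"age" > 30).show()

  // ...or register it as a view and query it with SQL from inside the program.
  people.createOrReplaceTempView("people")
  spark.sql("SELECT name FROM people WHERE age > 30").show()

  spark.stop()
}
```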
Basically, the reduceByKey function works only on RDDs whose elements are key-value pairs (i.e., RDDs of tuples). It is a transformation operation, which means it is lazily evaluated. We need to pass an associative function as a parameter, which will be applied to the source RDD and […]
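A short sketch of reduceByKey on a pair RDD, assuming a local SparkSession; the sales data is illustrative. The associative function here is simple addition, and nothing executes until the collect() action.

```scala
import org.apache.spark.sql.SparkSession

object ReduceByKeyDemo extends App {
  val spark = SparkSession.builder().appName("ReduceByKeyDemo").master("local[*]").getOrCreate()
  val sc = spark.sparkContext

  // A pair RDD: each element is a (key, value) tuple.
  val sales = sc.parallelize(Seq(("apple", 2), ("banana", 3), ("apple", 5), ("banana", 1)))

  // reduceByKey merges the values for each key with the given associative function.
  // It is a transformation, so it runs only when an action such as collect() is called.
  val totals = sales.reduceByKey(_ + _)

  totals.collect().foreach(println)   // e.g. (apple,7), (banana,4)

  spark.stop()
}
```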