spark
The var keyword is similar to an ordinary variable declaration in Java, whereas val is a little different. Once a variable is declared with val, it cannot be reassigned to point to another object. This behaviour of the val keyword in Scala is comparable to that of the final keyword in Java. Val refers to immutable declaration of […]
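The difference between var and val can be sketched in a few lines of plain Scala. This is only an illustration: the object name ValVsVar and the variable names are hypothetical, and the commented-out lines show assignments that the compiler would reject.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical example object; the names are for illustration only.
object ValVsVar {
  def main(args: Array[String]): Unit = {
    var counter = 1            // var: the name may be reassigned later
    counter = 2                // compiles

    val limit = 10             // val: the reference is fixed, much like Java's final
    // limit = 20              // would not compile: reassignment to val

    // val fixes the reference, not the contents of the object it refers to.
    val buffer = ArrayBuffer(1, 2)
    buffer += 3                // allowed: the same object is mutated in place
    // buffer = ArrayBuffer(9) // would not compile: reassignment to val

    println(s"counter=$counter, limit=$limit, buffer=$buffer")
  }
}
```

As with a final reference in Java, a val only prevents reassignment; whether the underlying object can change depends on the type it refers to.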
Lazy evaluation in Spark means that execution does not start until an action is triggered. With lazy evaluation, users can divide a program into smaller operations. Spark reduces the number of passes over the data by grouping transformations together. Lazy evaluation also saves round trips between the driver and the cluster, which speeds up processing. […]
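A minimal sketch of this behaviour, assuming a local Spark installation with the spark-sql dependency on the classpath. The object name, application name, and sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; assumes Spark is available on the classpath.
object LazyEvaluationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lazy-evaluation-sketch") // illustrative name
      .master("local[*]")                // run locally on all cores
      .getOrCreate()
    val sc = spark.sparkContext

    val numbers = sc.parallelize(1 to 10)

    // Transformations: Spark only records them in the lineage; no job runs yet.
    val doubled = numbers.map(_ * 2)
    val large = doubled.filter(_ > 10)

    // Action: collect() triggers a job. The map and filter above are
    // pipelined, so each partition is traversed only once.
    val result = large.collect()
    println(result.mkString(", "))

    spark.stop()
  }
}
```

Until collect() is called, the driver holds only a description of the computation, which is what allows Spark to group the two transformations into a single pass.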
Apache Spark SQL is a Spark module that simplifies working with structured data through the DataFrame and Dataset abstractions in Python, Java, and Scala (the typed Dataset API is available only in Scala and Java). These abstractions are distributed collections of data organized into named columns. Spark SQL also optimizes queries automatically through its Catalyst optimizer. Using Spark SQL we can query data, both from inside a Spark program and […]
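As a rough sketch of querying structured data from inside a Spark program, the example below builds a small DataFrame and queries it with both SQL and the DataFrame API. It assumes a local Spark installation; the object name, the view name people, and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; assumes Spark is available on the classpath.
object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch") // illustrative name
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._       // enables toDF and the $"column" syntax

    // A DataFrame: distributed rows organized into named columns.
    val people = Seq(("Alice", 34), ("Bob", 19), ("Carol", 45))
      .toDF("name", "age")

    // Register the DataFrame as a temporary view so SQL can refer to it.
    people.createOrReplaceTempView("people")
    val adultsSql = spark.sql("SELECT name, age FROM people WHERE age >= 21")
    adultsSql.show()

    // The same query expressed with the DataFrame API.
    val adultsApi = people.filter($"age" >= 21).select("name", "age")
    adultsApi.show()

    spark.stop()
  }
}
```

Both forms go through the same optimizer, so the choice between SQL strings and the DataFrame API is largely a matter of style.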
The reduceByKey function works only on RDDs whose elements are key-value pairs (i.e., RDDs whose elements are two-element tuples). It is a transformation, which means it is lazily evaluated. We need to pass one associative and commutative function as a parameter, which will be applied to the source RDD and […]
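A minimal sketch of reduceByKey on a pair RDD, assuming a local Spark installation. The object name and the sample data are hypothetical; addition is used as the reduce function because it is both associative and commutative.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; assumes Spark is available on the classpath.
object ReduceByKeySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("reduce-by-key-sketch") // illustrative name
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // An RDD of (key, value) tuples; reduceByKey is only defined for such RDDs.
    val sales = sc.parallelize(Seq(("apples", 3), ("pears", 5), ("apples", 4)))

    // Transformation: values sharing a key are merged with the given function.
    // Nothing is computed at this point.
    val totals = sales.reduceByKey(_ + _)

    // Action: triggers the job and brings the per-key totals to the driver.
    totals.collect().foreach { case (item, total) =>
      println(s"$item -> $total")
    }

    spark.stop()
  }
}
```

The order in which collect() returns the keys is not guaranteed, so the printed lines may appear in either order.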