reduceByKey Spark

ANKUSH THAVALI
06 Jun, 2019

The reduceByKey function works only on RDDs whose elements are key-value pairs (i.e. RDDs of two-element tuples). It is a transformation, which means it is lazily evaluated: nothing runs until an action such as collect is called. It takes one function as a parameter, which must be associative (and commutative, since the order in which values are combined is not fixed). Spark applies that function to the values of each key in the source RDD and returns a new RDD of key-value pairs holding one reduced value per key. It is a wide transformation, because data may be shuffled across partitions so that all values for a key end up together.
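To make the two stages concrete, here is a minimal sketch in plain Scala collections, with no Spark involved. The helper reduceByKeySim is hypothetical and is not a Spark API; it only mimics the idea of combining values per key inside each partition and then merging the partial results across partitions. It uses the same pairs as the example in this post, and the split into partitions is an assumption made for illustration, since Spark may slice the data differently.

```scala
object ReduceByKeySketch {
  // Hypothetical helper (not a Spark API): mimics reduceByKey on data that is
  // already split into partitions, using only Scala collections.
  def reduceByKeySim[K, V](partitions: Seq[Seq[(K, V)]])(f: (V, V) => V): Map[K, V] = {
    // Step 1: combine the values of each key inside every partition.
    // No data moves between partitions here.
    val locallyCombined: Seq[Map[K, V]] = partitions.map { part =>
      part.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
    }
    // Step 2: the "shuffle". Partial results for the same key are brought
    // together and merged with the same function.
    locallyCombined
      .flatMap(_.toSeq)
      .groupBy(_._1)
      .map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(("a", 1), ("b", 1), ("a", 1), ("a", 1),
                   ("b", 1), ("b", 1), ("b", 1), ("b", 1))
    // Assumed layout: chunks of 3 elements, which gives 3 partitions.
    val partitions = data.grouped(3).toSeq
    val result = reduceByKeySim(partitions)(_ + _)
    println(result.toSeq.sortBy(_._1).mkString(", ")) // prints (a,3), (b,5)
  }
}
```

Because the same function is applied first to raw values and then to partial results, it has to give the same answer however the values are grouped. This is why an associative function such as addition is required.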

The following video explains this briefly with an example. Please follow the YouTube channel for further updates.

Example from the video:

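// Build a pair RDD with 3 partitions; sc is the SparkContext (predefined in spark-shell)
val x = sc.parallelize(Array(("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1)), 3)

// Sum the values of each key; accum is the running total, n is the next value
val y = x.reduceByKey((accum, n) => accum + n)

// collect is an action, so it triggers the computation; the result holds (a,3) and (b,5), in no guaranteed order
y.collect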
