reduceByKey Spark

  • Kiran Dalvi
  • 06 Jun, 2019


The reduceByKey function works only on RDDs whose elements are key-value pairs (i.e., RDDs of two-element tuples). It is a transformation, which means it is lazily evaluated. We pass an associative and commutative function as a parameter; Spark uses it to merge the values for each key in the source RDD and produces a new RDD of key-value pairs. reduceByKey is a wide operation, since data shuffling may happen across partitions; however, values are first combined locally within each partition, so less data moves over the network.
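As a minimal, self-contained sketch of the same idea (the object name ReduceByKeyDemo and the local SparkContext setup are my own additions for illustration, not from the post):

import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyDemo {
  def main(args: Array[String]): Unit = {
    // Local SparkContext for demonstration (assumption: running as a standalone app, not in spark-shell)
    val conf = new SparkConf().setAppName("reduceByKeyDemo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // An RDD of (key, value) pairs
    val pairs = sc.parallelize(Seq(("apple", 2), ("banana", 1), ("apple", 3)))

    // The associative, commutative function (_ + _) merges the values of each key.
    // Nothing runs yet: reduceByKey is a lazy transformation.
    val totals = pairs.reduceByKey(_ + _)

    // collect is an action; it triggers the shuffle and the actual computation.
    totals.collect().foreach(println) // prints (apple,5) and (banana,1), in some order

    sc.stop()
  }
}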

The following video explains reduceByKey briefly, along with an example. Please follow the YouTube channel for further updates.

Example from the video:

val x = sc.parallelize(Array(("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1)), 3) // pair RDD spread across 3 partitions

val y = x.reduceByKey((accum, n) => accum + n) // sum the values for each key

y.collect // action: triggers the shuffle and returns the result
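For this input, y.collect returns Array((a,3), (b,5)), since "a" appears three times and "b" five times; the order of the pairs may vary between runs.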