GPU621/Apache Spark Fall 2022

5 bytes added, 15:52, 5 December 2022
RDD APIs
Unlike a transformation, an action does not produce a new RDD when it is triggered. Actions are therefore the Spark RDD operations that return non-RDD values. The result of an action is returned to the driver or written to an external storage system. Because transformations are evaluated lazily, it is an action that sets the pending computation in motion.
2.1. reduce(func)
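reduce(func) aggregates the elements of the dataset using a function func, which takes two arguments and returns one. The function should be commutative and associative so that it can be computed correctly in parallel.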
System.out.println(result);
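To illustrate why the function passed to reduce should give the same answer regardless of how the elements are grouped, here is a minimal plain-Java sketch. It does not use the Spark API: the class `PartitionedReduceModel` and its `reduce` helper are hypothetical names, and nested lists stand in for partitions. It only models the idea of reducing each partition separately and then merging the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

// Plain-Java model of a partitioned reduce. This is NOT the Spark API;
// each inner list stands in for one partition of a dataset.
final class PartitionedReduceModel {

    // Reduce every partition on its own, then merge the partial results.
    static <T> T reduce(List<List<T>> partitions, BinaryOperator<T> func) {
        List<T> partials = new ArrayList<>();
        for (List<T> partition : partitions) {
            if (partition.isEmpty()) {
                continue; // an empty partition contributes nothing
            }
            T partial = partition.get(0);
            for (int i = 1; i < partition.size(); i++) {
                partial = func.apply(partial, partition.get(i));
            }
            partials.add(partial);
        }
        if (partials.isEmpty()) {
            throw new UnsupportedOperationException("cannot reduce an empty dataset");
        }
        T result = partials.get(0);
        for (int i = 1; i < partials.size(); i++) {
            result = func.apply(result, partials.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> twoParts = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4));
        List<List<Integer>> onePart = Arrays.asList(Arrays.asList(1, 2, 3, 4));

        // Addition is associative and commutative: partitioning does not matter.
        System.out.println(reduce(twoParts, Integer::sum));     // prints 10
        System.out.println(reduce(onePart, Integer::sum));      // prints 10

        // Subtraction is neither, so the answer depends on the partitioning.
        System.out.println(reduce(twoParts, (a, b) -> a - b));  // prints 0
        System.out.println(reduce(onePart, (a, b) -> a - b));   // prints -8
    }
}
```

With addition both layouts agree, while subtraction gives 0 for two partitions and -8 for one, which is the kind of inconsistency a non-associative function can cause in a distributed reduce.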
2.2. count()
count() returns the number of elements in the RDD.
2.3. take(n)
The action take(n) returns n elements from the RDD. It tries to minimize the number of partitions it accesses, so the result is a biased sample, not a representative one. We cannot presume the order of the elements.
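The following plain-Java sketch models why take(n) yields a biased collection. It is a simplification under stated assumptions, not Spark's implementation: the class `TakeModel`, its `take` helper, and the `partitionsScanned` counter are hypothetical, nested lists stand in for partitions, and partitions are read strictly one at a time until n elements have been gathered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of take(n). This is NOT the Spark API;
// each inner list stands in for one partition of a dataset.
final class TakeModel {

    // Number of partitions read by the most recent call (illustration only).
    static int partitionsScanned = 0;

    // Read partitions in order and stop as soon as n elements are collected.
    static <T> List<T> take(List<List<T>> partitions, int n) {
        List<T> taken = new ArrayList<>();
        partitionsScanned = 0;
        for (List<T> partition : partitions) {
            if (taken.size() >= n) {
                break; // enough elements: later partitions are never touched
            }
            partitionsScanned++;
            for (T element : partition) {
                if (taken.size() >= n) {
                    break;
                }
                taken.add(element);
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(5, 1), Arrays.asList(9, 3), Arrays.asList(7));

        System.out.println(take(partitions, 3));   // prints [5, 1, 9]
        System.out.println(partitionsScanned);     // prints 2
    }
}
```

In this model the third partition is never read, so its elements can never appear in the result. That is the sense in which the returned elements favor whichever partitions happen to be scanned first.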
2.4. collect()
The action collect() is the most common and simplest operation; it returns the entire content of the RDD to the driver program.
2.5. foreach()
The foreach() function is useful when we want to apply an operation to each element of the RDD without returning a value to the driver.