Changes


The Real A Team

319 bytes added, 20:40, 29 March 2016
Using Apache's Spark
{{GPU621/DPS921 Index | 20161}}
= Using Apache's Spark =
This assignment is dedicated to learning how to use Apache Spark. I chose the programming language Scala because Spark itself is written in Scala.
 
== A Team Members ==
# [mailto:aasauvageot@myseneca.ca?subject=dps921 Adrian Sauvageot], All
</nowiki>
====Transformation====
The next step is to transform the data in the first RDD into something you want to use. You can do this with operations such as filter, map, union, join, and sort. In this example we would like to count the number of times each word is used in a piece of text.
The following code snippet splits each line of the RDD on spaces and maps each word to a key-value pair. The word is the key, and the value is set to 1 so that each occurrence counts once.
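As a rough illustration of what this transformation produces, the sketch below runs the same two steps on an ordinary Scala collection rather than on an RDD, so it is not Spark code. The sample text and the names `WordPairs`, `lines`, `words`, and `pairs` are made up for this example. On a real RDD the `flatMap` and `map` calls have the same shape.

```scala
object WordPairs {
  // Hypothetical sample input standing in for the lines of an RDD.
  val lines: Seq[String] = Seq("the quick brown fox", "the lazy dog")

  // Split every line on spaces and flatten the result into one sequence of words.
  val words: Seq[String] = lines.flatMap(line => line.split(" "))

  // Pair each word with the value 1 so that every occurrence counts once.
  val pairs: Seq[(String, Int)] = words.map(word => (word, 1))
}
```

Each word now carries its own count of 1, which is the form the later reduction step expects.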
<nowiki>
====Action====
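Finally, an action is taken to give the programmer the desired output, for example count, collect, reduce, lookup, or save. In this example we want to combine the pairs for each word by adding their values together, which gives the number of times each word occurs. Note that reduceByKey itself returns a new RDD, so an action such as collect is still needed afterwards to retrieve the counts. To do the reduction we can use: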
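For intuition, the following minimal sketch emulates this per-key addition on a plain Scala collection. It is not Spark code, and the names `WordCountSketch`, `reduceByKeyLocal`, `pairs`, and `counts` are hypothetical, introduced only for this example. The Spark call itself follows the sketch.

```scala
object WordCountSketch {
  // Hypothetical (word, 1) pairs, as produced by the transformation step.
  val pairs: Seq[(String, Int)] =
    Seq(("the", 1), ("fox", 1), ("the", 1), ("dog", 1))

  // Local stand-in for reduceByKey: group the pairs by key, then combine
  // the values in each group with the supplied function.
  def reduceByKeyLocal(data: Seq[(String, Int)])(f: (Int, Int) => Int): Map[String, Int] =
    data.groupBy(_._1).map { case (word, group) =>
      word -> group.map(_._2).reduce(f)
    }

  // Passing _ + _ adds the 1s together, giving the occurrences per word.
  val counts: Map[String, Int] = reduceByKeyLocal(pairs)(_ + _)
}
```

On an actual pair RDD, `reduceByKey` performs the same per-key combination, distributed across partitions.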
<nowiki>
.reduceByKey(_ + _)
