==Apache Spark Core API==
// From an S3 file
JavaRDD<String> sentences = sc.textFile("s3://gpu621-demo/input.txt");
===RDD APIs===
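There are two types of RDD APIs: transformations and actions. Transformations are functions that return a new RDD, so they let us derive a child RDD from a parent RDD. They are evaluated lazily: Spark only records them until an action needs the result. Actions are the functions that trigger the computation on the actual dataset. They do not return new RDDs; instead they return a value to the driver program or write data to external storage.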
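Spark defers the work described by a transformation until an action asks for a result. The sketch below is only an analogy built on the JDK's java.util.stream, not Spark code: a stream's intermediate operation plays the role of a transformation and a terminal operation plays the role of an action. The class name LazyPipelineSketch and the helpers describeSqrt and run are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyPipelineSketch {
    // Counts how many times the mapping function has actually been executed.
    static final AtomicInteger calls = new AtomicInteger();

    // "Transformation" analog: returns a new pipeline description; nothing is computed yet.
    static Stream<Double> describeSqrt(List<Integer> input) {
        return input.stream().map(v -> {
            calls.incrementAndGet();
            return Math.sqrt(v);
        });
    }

    // "Action" analog: forces evaluation and returns a plain result, not another pipeline.
    static List<Double> run(Stream<Double> pipeline) {
        return pipeline.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(11, 22, 33, 44);
        Stream<Double> pipeline = describeSqrt(input);
        System.out.println("calls before the action: " + calls.get());
        List<Double> result = run(pipeline);
        System.out.println("calls after the action: " + calls.get());
        System.out.println("number of results: " + result.size());
    }
}
```

The analogy is loose: a JDK stream runs in a single JVM and can be consumed only once, whereas an RDD is distributed and can be reused by several actions.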
1. Transformations
1.1 map(func)
The map function applies the function it receives to every element of the RDD (for an RDD created with textFile, every line) and returns a new RDD containing the results.
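import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;

// create the input data list
List<Integer> inputData = new ArrayList<>();
inputData.add(11);
inputData.add(22);
inputData.add(33);
inputData.add(44);
// create an RDD from the list (sc is the JavaSparkContext used above)
JavaRDD<Integer> javaRDD = sc.parallelize(inputData);
// map each element to its square root, producing a new RDD
JavaRDD<Double> mapRDD = javaRDD.map(value -> Math.sqrt(value));
// foreach is an action; it prints the values below (order may vary across partitions)
// 3.3166247903554, 4.69041575982343, 5.744562646538029, 6.6332495807108
mapRDD.foreach(value -> System.out.println(value));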
==Deploy Apache Spark Application On AWS==