===RDD Overview===
One of the most important concepts in Spark is the resilient distributed dataset (RDD). An RDD is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by starting with a file, or with an existing Java collection in the driver program, and transforming it.
We will introduce some key APIs provided by Spark Core 2.2.1 using Java 8.
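As a first taste of these APIs, here is a minimal sketch of creating an RDD from an existing Java collection and transforming it in parallel. The class name and file path are illustrative choices, not part of the Spark API; `local[*]` runs Spark in-process on all available cores.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// A minimal sketch (class name is illustrative): building RDDs
// from a driver-side collection and operating on them in parallel.
public class RddExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("RddExample")
                .setMaster("local[*]"); // run locally on all cores
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Create an RDD from an existing Java collection in the driver program
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> numbers = sc.parallelize(data);

        // Transform and aggregate in parallel: square each element, then sum
        int sumOfSquares = numbers.map(x -> x * x).reduce(Integer::sum);
        System.out.println(sumOfSquares); // prints 55

        // An RDD can also start from a file (hypothetical path):
        // JavaRDD<String> lines = sc.textFile("data.txt");

        sc.close();
    }
}
```

Operations like `map` are lazy transformations that define a new RDD, while `reduce` is an action that triggers the actual parallel computation.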
===Spark Library Installation Using Maven===