==Apache Spark Core API==
===RDD Overview===
One of the most important concepts in Spark is the resilient distributed dataset (RDD). An RDD is a collection of elements partitioned across the nodes of a cluster that can be operated on in parallel. RDDs are created by starting with a file or an existing Java collection in the driver program and then transforming it. We will introduce some key APIs provided by Spark Core 2.2.1 using Java 8. You can find more information about RDDs in the official programming guide: https://spark.apache.org/docs/2.2.1/rdd-programming-guide.html
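As a brief illustration, the sketch below creates an RDD from an existing Java collection and applies a transformation to it in parallel, matching the description above. It assumes a local-mode Spark 2.2.1 installation on the classpath; the class name and application name are placeholders chosen for this example.

<syntaxhighlight lang="java">
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddOverviewExample {
    public static void main(String[] args) {
        // Run locally, using as many worker threads as logical cores ("local[*]").
        SparkConf conf = new SparkConf()
                .setAppName("RddOverview")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Create an RDD from an existing Java collection in the driver program.
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> numbers = sc.parallelize(data);

        // Transform the RDD; the map runs in parallel across partitions.
        JavaRDD<Integer> squares = numbers.map(x -> x * x);

        // Collect the results back to the driver and print them.
        System.out.println(squares.collect()); // [1, 4, 9, 16, 25]

        sc.close();
    }
}
</syntaxhighlight>

An RDD can equally be created from a file with sc.textFile(path) instead of parallelize; the subsequent transformations work the same way.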
===Spark Library Installation Using Maven===