Changes

GPU621/Apache Spark Fall 2022

12 bytes removed, 11:18, 7 December 2022


===1. Spark Core===

Spark Core is the project's foundation, providing distributed task scheduling and basic I/O functionality. Its underlying programming abstraction is the Resilient Distributed Dataset (RDD), a collection of data that can be manipulated in parallel and is protected by fault-tolerance mechanisms. RDDs are exposed through language-integrated APIs in Scala, Java, and Python, which reduces programming complexity and lets applications manipulate RDDs in much the same way as native collections.
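The RDD idea described above can be illustrated with a small, plain-Python sketch. It is not Spark's implementation or API: `ToyRDD` and all of its methods are hypothetical names invented for this example, and the sketch assumes a single process with no real cluster. It shows only the three properties the paragraph mentions: data split into partitions, transformations recorded rather than run immediately, and a lost partition rebuilt by replaying those recorded transformations. In PySpark the corresponding real calls would be along the lines of `sc.parallelize(...)`, `.map(...)`, and `.reduce(...)`.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD: source partitions plus a lineage."""

    def __init__(self, source_partitions, lineage=()):
        self._source = source_partitions   # immutable input partitions
        self._lineage = tuple(lineage)     # element-wise functions, applied lazily

    @classmethod
    def parallelize(cls, data, num_partitions):
        # Split the data into num_partitions striped partitions.
        data = list(data)
        parts = tuple(tuple(data[i::num_partitions]) for i in range(num_partitions))
        return cls(parts)

    def map(self, fn):
        # Record the transformation only; nothing is computed yet.
        return ToyRDD(self._source, self._lineage + (fn,))

    def compute_partition(self, index):
        # Rebuild one partition from its source by replaying the lineage.
        # This is how a "lost" partition could be recovered in this toy model.
        part = self._source[index]
        for fn in self._lineage:
            part = tuple(fn(x) for x in part)
        return part

    def reduce(self, fn):
        # Reduce each non-empty partition, then combine the partial results.
        # fn should be associative and commutative, since partition order
        # differs from the original element order.
        partials = [
            reduce(fn, self.compute_partition(i))
            for i in range(len(self._source))
            if self._source[i]
        ]
        return reduce(fn, partials)


numbers = ToyRDD.parallelize(range(1, 11), 3)    # 1..10 in 3 partitions
squares = numbers.map(lambda x: x * x)           # lazy: only recorded
total = squares.reduce(lambda a, b: a + b)       # triggers computation
recovered = squares.compute_partition(1)         # replay lineage for one partition
```

Here `total` is the sum of the squares of 1 through 10, and `recovered` shows that any single partition can be recomputed independently of the others, which is the essence of lineage-based fault tolerance.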

===2. Spark SQL===

RobinYu

