GPU621/Apache Spark Fall 2022

12 bytes removed, 12:18, 7 December 2022
===1. Spark Core===
Spark Core is the foundation of the project, providing distributed task dispatching, scheduling, and basic I/O functionality. Its underlying programming abstraction is the Resilient Distributed Dataset (RDD), a fault-tolerant collection of data that can be operated on in parallel. RDDs are exposed through language-integrated APIs in Scala, Java, and Python, which reduces programming complexity and lets applications manipulate RDDs in much the same way as native collections.
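To make the RDD idea concrete, the following is a minimal pure-Python sketch of a partitioned collection whose transformations are recorded rather than applied immediately. It is not Spark's implementation or API: the class `ToyRDD` and its method `compute_partition` are hypothetical names invented for illustration, and threads stand in for cluster workers. The method names `parallelize`, `map`, `filter`, and `collect` mirror the PySpark RDD operations of the same names, but everything here runs in one process.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical in-memory stand-in for an RDD (NOT Spark's API)."""

    def __init__(self, partitions, lineage=()):
        self._partitions = [list(p) for p in partitions]  # source data splits
        self._lineage = tuple(lineage)  # recorded (kind, function) steps

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = list(data)
        # Round-robin split of the input into num_partitions slices.
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Lazy: only record the step; nothing is computed yet.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def compute_partition(self, index):
        # Replay the recorded lineage over one source partition. A lost
        # result can be rebuilt this way without touching other partitions.
        rows = self._partitions[index]
        for kind, fn in self._lineage:
            if kind == "map":
                rows = [fn(x) for x in rows]
            else:
                rows = [x for x in rows if fn(x)]
        return rows

    def collect(self):
        # Evaluate every partition concurrently, then flatten the results.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self.compute_partition,
                                  range(len(self._partitions))))
        return [x for part in parts for x in part]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = sorted(even_squares.collect())
print(result)  # [4, 16, 36, 64, 100]

# Simulate recovering a single lost partition from its lineage.
print(even_squares.compute_partition(1))  # [4, 64]
```

The chained `map` and `filter` calls read like operations on an ordinary list, which is the sense in which RDDs resemble native collections. Recording the lineage instead of storing intermediate results is what lets this sketch rebuild one partition on demand.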
===2. Spark SQL===