===1. Spark Core===
Spark Core is the foundation of the project, providing distributed task dispatching, scheduling, and basic I/O functionality. Its underlying programming abstraction is the Resilient Distributed Dataset (RDD), a fault-tolerant collection of data that can be operated on in parallel. RDDs are exposed through language-integrated APIs in Scala, Java, and Python, which reduces programming complexity and lets applications manipulate RDDs in much the same way as native, local collections.
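The idea behind the RDD abstraction can be illustrated with a small single-machine sketch. The `ToyRDD` class below is hypothetical and is not Spark's API; it only assumes that a dataset is split into partitions, that transformations are recorded rather than run immediately, and that a lost partition can be rebuilt by replaying the recorded transformations on its source data.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, partitions, lineage=()):
        # Source data, split into independent chunks.
        self._partitions = [list(p) for p in partitions]
        # Recorded transformations; nothing is computed until collect().
        self._lineage = tuple(lineage)

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Round-robin split of a local collection into partitions.
        data = list(data)
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        # Returns a new ToyRDD; the original is left unchanged.
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def compute_partition(self, index):
        # Replays the lineage on one partition. Because the result depends
        # only on the source data and the lineage, a lost partition can be
        # rebuilt by calling this again.
        rows = self._partitions[index]
        for kind, fn in self._lineage:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def collect(self):
        # Partitions are independent, so they can be processed in parallel.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self.compute_partition,
                                  range(len(self._partitions))))
        return [row for part in parts for row in part]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = sorted(even_squares.collect())
# Simulate recovering partition 0 after a failure by recomputing it.
recomputed = even_squares.compute_partition(0)
```

The chained `map` and `filter` calls read like operations on an ordinary local list, which is the point of the language-integrated design. In PySpark the analogous chain would be written with `SparkContext.parallelize`, `RDD.map`, `RDD.filter`, and `RDD.collect`, with the partitions distributed across a cluster rather than across threads.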
===2. Spark SQL===