Changes

GPU621/Apache Spark

145 bytes added, 17:46, 30 November 2020


== Architecture ==

One of Spark's distinguishing features is that it processes data in RAM using an abstraction known as the Resilient Distributed Dataset (RDD): an immutable, distributed collection of objects that can hold any type of Python, Java, or Scala object, including user-defined classes. Each RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs serve as a working set for distributed programs and offer a restricted form of distributed shared memory. Another important abstraction in Spark is the Directed Acyclic Graph (DAG) of operations built from the transformations applied to RDDs. Spark's DAG scheduler is the scheduling layer that splits this graph into stages and implements stage-oriented scheduling.
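The ideas above (immutability, logical partitions, and a graph of recorded operations that is only evaluated on demand) can be illustrated with a small toy model. This is a plain-Python sketch, not Spark's API: the class <code>ToyRDD</code> and its methods are hypothetical names chosen for illustration, everything runs in one process, and no stage splitting or cluster scheduling is modelled.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)  # frozen: a ToyRDD cannot be modified after creation
class ToyRDD:
    """Hypothetical stand-in for an RDD; NOT the real Spark class."""
    name: str                                   # operation that produced this dataset
    source: Tuple[tuple, ...] = ()              # partition data, set only on the root
    parent: Optional["ToyRDD"] = None           # lineage edge to the parent dataset
    op: Optional[Callable[[Iterator], Iterator]] = None  # per-partition function

    @staticmethod
    def parallelize(data: Iterable, num_partitions: int) -> "ToyRDD":
        """Split local data into contiguous logical partitions."""
        items = list(data)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        parts = tuple(
            tuple(items[i * size:(i + 1) * size]) for i in range(num_partitions)
        )
        return ToyRDD(name="parallelize", source=parts)

    def map(self, f: Callable) -> "ToyRDD":
        # Returns a new dataset; nothing is computed yet.
        return ToyRDD(name="map", parent=self,
                      op=lambda part: (f(x) for x in part))

    def filter(self, pred: Callable) -> "ToyRDD":
        # Returns a new dataset; nothing is computed yet.
        return ToyRDD(name="filter", parent=self,
                      op=lambda part: (x for x in part if pred(x)))

    def num_partitions(self) -> int:
        if self.parent is None:
            return len(self.source)
        return self.parent.num_partitions()

    def compute(self, index: int) -> Iterator:
        """Evaluate one partition by replaying the lineage from the root."""
        if self.parent is None:
            return iter(self.source[index])
        return self.op(self.parent.compute(index))

    def collect(self) -> List:
        """Action: evaluate every partition and gather the results."""
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def lineage(self) -> List[str]:
        """Operation names from the root dataset to this one."""
        if self.parent is None:
            return [self.name]
        return self.parent.lineage() + [self.name]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.lineage())   # ['parallelize', 'map', 'filter']
print(even_squares.collect())   # [4, 16, 36, 64, 100]
```

Because each partition is evaluated independently from its lineage, a lost partition could in principle be recomputed on its own, which is the intuition behind the "resilient" in RDD. In real Spark, transformations such as these that need no data movement between partitions are pipelined within a single stage, while operations that require a shuffle start a new stage; the toy model does not attempt to show that.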

Abalachandran7


CDOT Wiki β
