GPU621/Apache Spark
== Architecture ==
[[File: Cluster-overview.png|thumb|upright=1|right|alt=Spark cluster|4.1 Spark Cluster components]]
At a fundamental level, an Apache Spark application consists of two main components: a driver, which converts the user's code into multiple tasks that can be distributed across worker nodes, and executors, which run on those nodes and carry out the tasks assigned to them. These processes are coordinated by the SparkContext object in the driver program. The SparkContext can connect to several types of cluster managers, which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster: processes that run computations and store data for the application. The driver then sends the application code to the executors and, finally, sends them tasks to run.
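The sketch below illustrates this flow with a minimal Scala driver program. It assumes the Spark libraries are on the classpath; the application name <code>SimpleApp</code> and the <code>local[*]</code> master URL are placeholders chosen for illustration, not values from this article.

<syntaxhighlight lang="scala">
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // The driver builds a SparkConf; the master URL tells the SparkContext
    // which cluster manager to contact ("local[*]" runs executors in-process,
    // a URL such as "spark://host:7077" would target a standalone cluster).
    val conf = new SparkConf()
      .setAppName("SimpleApp")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    // The driver turns this code into tasks, one per partition; each task is
    // shipped to an executor, which computes its slice of the data.
    val data = sc.parallelize(1 to 1000000, numSlices = 8)
    val sumOfSquares = data.map(x => x.toLong * x).reduce(_ + _)

    println(s"Sum of squares: $sumOfSquares")

    // Stopping the context releases the executors acquired from the cluster manager.
    sc.stop()
  }
}
</syntaxhighlight>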
== Components ==