CDOT Wiki β

GPU621/Apache Spark Fall 2022

905 bytes added, 03:43, 7 December 2022
Spark has the ability to process massive data
*Thanks to the high throughput of Spark Streaming, Spark can also process massive data streams in near real time.
 
==Spark cluster mode==
 
===1. Local mode===
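Local mode is used for local development and testing. It is usually divided into a single-threaded variant (master URL <code>local</code>) and a multi-threaded variant (<code>local[N]</code>, or <code>local[*]</code> to use all available cores).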
===2. Standalone cluster mode===
Spark runs on its built-in Standalone cluster manager. The Standalone manager is responsible for resource management, and Spark is responsible for task scheduling and computation.
===3. Hadoop YARN cluster mode===
Spark runs on the Hadoop YARN cluster manager. Hadoop YARN is responsible for resource management, and Spark is responsible for task scheduling and computation.
===4. Apache Mesos cluster mode===
Spark runs on the Apache Mesos cluster manager. Apache Mesos is responsible for resource management, and Spark is responsible for task scheduling and computation.
===5. Kubernetes cluster mode===
Spark runs on the Kubernetes cluster manager. Kubernetes is responsible for resource management, and Spark is responsible for task scheduling and computation.
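
In practice, the mode is selected through the master URL given to Spark when an application is submitted. The following is a minimal sketch of how the five modes above map to master URL strings. The function <code>master_url</code> and its parameters are hypothetical helpers written for illustration and are not part of Spark. The host names are placeholders, and the default ports (7077 for Standalone, 5050 for Mesos) are the usual defaults, which a given cluster may override.

```python
# Hypothetical helper (not part of Spark): builds the master URL string
# for each of the five modes described above.

def master_url(mode, host=None, port=None, threads=None):
    """Return a Spark master URL for the given mode name."""
    if mode == "local":
        # "local" uses one worker thread; "local[N]" uses N threads;
        # "local[*]" uses as many threads as there are cores.
        return "local" if threads is None else f"local[{threads}]"
    if mode == "yarn":
        # The cluster location is read from the Hadoop configuration,
        # so the URL carries no host or port.
        return "yarn"

    # The remaining modes all need the address of the cluster manager.
    if host is None:
        raise ValueError(f"mode '{mode}' requires a host")
    if mode == "standalone":
        return f"spark://{host}:{port or 7077}"
    if mode == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if mode == "kubernetes":
        # The port is always written out, even for HTTPS port 443.
        return f"k8s://https://{host}:{port or 443}"
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Placeholder host names, for illustration only.
    print(master_url("local", threads="*"))
    print(master_url("standalone", host="master-host"))
    print(master_url("kubernetes", host="k8s-api-host", port=6443))
```

The resulting string is what would be passed as <code>spark-submit --master &lt;url&gt;</code>. The application code itself stays the same across modes, and only the master URL changes.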
 
==Apache Spark Core API==