Changes

GPU621/Apache Spark Fall 2022

905 bytes added, 03:43, 7 December 2022

Spark has the ability to process massive data

*Because Spark Streaming offers high throughput, Spark can also process massive data streams in near real time.

==Spark cluster mode==

===1. Local mode===

Local mode is intended for development and testing on a single machine. It is usually divided into single-threaded local mode and multi-threaded local (local-cluster) mode.
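As a minimal sketch of how the thread count is selected in local mode, the helper below builds the master URL string that Spark accepts for a local run. The function name <code>local_master</code> is hypothetical and not part of Spark; only the URL forms <code>local</code>, <code>local[N]</code> and <code>local[*]</code> come from Spark itself.

```python
def local_master(threads=1):
    """Return a Spark master URL for local mode (hypothetical helper).

    threads=1    -> "local"     (one worker thread, no parallelism)
    threads=N    -> "local[N]"  (N worker threads)
    threads=None -> "local[*]"  (one worker thread per logical core)
    """
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return "local" if threads == 1 else f"local[{threads}]"


if __name__ == "__main__":
    # Print the three common forms of the local master URL.
    for n in (1, 4, None):
        print(local_master(n))
```

The resulting string is what would typically be passed to <code>spark-submit --master</code> or to <code>SparkConf.setMaster</code>.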

===2. Standalone cluster mode===

Spark runs on its built-in Standalone cluster manager. Standalone is responsible for resource management, and Spark is responsible for task scheduling and computation.

===3. Hadoop YARN cluster mode===

Spark runs on the Hadoop YARN cluster manager. Hadoop YARN is responsible for resource management, and Spark is responsible for task scheduling and computation.

===4. Apache Mesos cluster mode===

Spark runs on the Apache Mesos cluster manager. Apache Mesos is responsible for resource management, and Spark is responsible for task scheduling and computation.

===5. Kubernetes cluster mode===

Spark runs on the Kubernetes cluster manager. Kubernetes is responsible for resource management, and Spark is responsible for task scheduling and computation.
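In practice, the cluster manager is chosen through the master URL given to Spark. The sketch below shows the URL form for each of the four cluster managers described above. The function <code>cluster_master</code> and the host names are hypothetical, for illustration only. The default ports 7077 (Standalone) and 5050 (Mesos) are the usual defaults, and a real deployment may use different ones.

```python
def cluster_master(manager, host=None, port=None):
    """Return a Spark master URL for a cluster manager (hypothetical helper)."""
    if manager == "yarn":
        # YARN takes no host: the cluster location is read from the Hadoop
        # configuration (HADOOP_CONF_DIR or YARN_CONF_DIR).
        return "yarn"
    if host is None:
        raise ValueError(f"{manager} requires a host")
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if manager == "kubernetes":
        # Points at the Kubernetes API server; the port must be given explicitly.
        if port is None:
            raise ValueError("kubernetes requires an explicit port")
        return f"k8s://https://{host}:{port}"
    raise ValueError(f"unknown cluster manager: {manager}")


if __name__ == "__main__":
    # Host names below are placeholders, not real servers.
    print(cluster_master("standalone", "master-node"))
    print(cluster_master("yarn"))
    print(cluster_master("mesos", "mesos-node"))
    print(cluster_master("kubernetes", "k8s-api.example.com", 6443))
```

In every case the resource manager changes, but the application code stays the same; only the value passed to <code>spark-submit --master</code> differs.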

==Apache Spark Core API==

AlanHuang

CDOT Wiki β
