Changes

GPU621/Apache Spark Fall 2022

618 bytes added, 03:48, 7 December 2022


==Spark Application==

===1. Iterative operations and repeated operations on specific data sets===

Spark is built on a memory-based iterative computing framework, so its advantage grows as the number of iterations increases and the same data is read again and again: the repeated reads are served from memory instead of being reloaded from storage. Spark is therefore very effective when iterative operations are applied or when a specific data set needs to be operated on multiple times.
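To make the benefit of in-memory reuse concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the `CountingSource` class and both helper functions are hypothetical names introduced only for illustration. It counts how often the underlying source is read when an iterative job reloads its data each round versus when it keeps the data in memory.

```python
# Toy model in plain Python (not Spark's API): count how often the
# underlying source is read with and without keeping the data in memory.


class CountingSource:
    """Hypothetical data source that records how many times it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Each call stands in for one expensive read from storage.
        self.reads += 1
        return list(self._values)


def iterate_without_cache(source, iterations):
    """Re-read the source on every iteration."""
    total = 0
    for _ in range(iterations):
        total += sum(source.load())
    return total


def iterate_with_cache(source, iterations):
    """Read the source once, then reuse the in-memory copy."""
    cached = source.load()
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total


uncached_source = CountingSource(range(5))
cached_source = CountingSource(range(5))

uncached_total = iterate_without_cache(uncached_source, iterations=10)
cached_total = iterate_with_cache(cached_source, iterations=10)

# Both runs compute the same result, but the number of reads differs.
print(uncached_source.reads, cached_source.reads)
```

In Spark itself, the analogous step is calling `cache()` or `persist()` on an RDD or DataFrame before the loop that reuses it.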

Worker nodes submit tasks to executors and report executor status, CPU, and memory information to the cluster manager.

====4. Executor====

The component that performs computational tasks. It is a process responsible for running tasks, saving data, and returning result data.

===The implementation of Spark has the following steps===

1. The SparkContext applies for computing resources from the Cluster Manager.

2. The Cluster Manager receives the request and starts allocating the resources. (It creates and activates the executor on the worker node.)

3. The SparkContext sends the program/application code (jar package, Python file, etc.) and the task to the Executor. The Executor executes the task and saves/returns the result.

4. The SparkContext collects the results.
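The steps above can be sketched as a small simulation in plain Python. This is only an illustration under stated assumptions: `ToyContext`, `ToyClusterManager`, and `ToyExecutor` are hypothetical classes that mirror the roles named in the steps, not Spark's real classes, and the round-robin task assignment is an assumption made for simplicity.

```python
# Toy simulation in plain Python. Every class here is hypothetical and
# only mirrors the roles named in the steps, not Spark's real classes.


class ToyExecutor:
    """Runs tasks, saves the results, and returns them."""

    def __init__(self, executor_id):
        self.executor_id = executor_id
        self.saved_results = []

    def run(self, code, task):
        result = code(task)
        self.saved_results.append(result)
        return result


class ToyClusterManager:
    """Allocates executors, limited by the number of worker nodes."""

    def __init__(self, worker_count):
        self.worker_count = worker_count

    def allocate(self, requested):
        # Step 2: create and activate one executor per granted worker.
        granted = min(requested, self.worker_count)
        return [ToyExecutor(i) for i in range(granted)]


class ToyContext:
    """Plays the role of the SparkContext in the steps."""

    def __init__(self, cluster_manager):
        self.cluster_manager = cluster_manager

    def run_job(self, code, tasks, requested_executors):
        # Step 1: apply for computing resources.
        executors = self.cluster_manager.allocate(requested_executors)
        if not executors:
            raise RuntimeError("no executors were allocated")
        # Step 3: send the code and each task to an executor
        # (round-robin assignment is an assumption of this sketch).
        results = []
        for index, task in enumerate(tasks):
            executor = executors[index % len(executors)]
            results.append(executor.run(code, task))
        # Step 4: the collected results go back to the caller.
        return results


def square(x):
    return x * x


context = ToyContext(ToyClusterManager(worker_count=2))
results = context.run_job(square, tasks=[1, 2, 3, 4], requested_executors=3)
print(results)
```

Here three executors are requested but only two workers exist, so the toy cluster manager grants two and the four tasks are shared between them.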

==Apache Spark Core API==

AlanHuang


CDOT Wiki β
