GPU621/Apache Spark Fall 2022

546 bytes added, 03:20, 7 December 2022
==Spark Application==
===1. Suitable for iterative operations and repeated operations on specific data sets===
Spark is built on an in-memory iterative computing framework. Intermediate data can stay in memory between steps, so Spark's advantage grows with the number of iterations and with the amount of data that would otherwise be re-read. Spark is therefore very effective when a job applies iterative operations or needs to operate on the same data set multiple times.
===2. Real-time computation===
Spark Streaming processes live data in small batches (micro-batches). This gives Spark high throughput in real-time statistical analysis.
===3. Batch data processing===
Spark can process massive amounts of data. Batch processing emphasizes the ability to handle large volumes rather than processing speed, so a typical batch job runs anywhere from minutes to several hours. Hadoop's MapReduce computing model is a similar case. With the high throughput of Spark Streaming, Spark can also process massive data in near real time.
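To illustrate point 1, the following plain-Python sketch (not Spark code) compares an iterative algorithm that re-reads its input on every pass with one that loads the data once and keeps it in memory. Keeping data in memory is roughly what calling cache() or persist() on a Spark RDD or DataFrame allows. The CountingSource class is a hypothetical stand-in for disk or HDFS storage, and the gradient-descent mean estimate is only an example workload.

```python
# Plain-Python illustration, not Spark code: compares re-reading a data set on
# every iteration with loading it once and reusing it from memory.


class CountingSource:
    """Hypothetical stand-in for disk/HDFS storage that counts full reads."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def read(self):
        self.reads += 1  # each call models one full scan of the stored data
        return list(self._values)


def mean_by_gradient_descent(source, iterations, lr=0.5, cache=True):
    """Estimate the mean of the data by iteratively minimizing squared error."""
    data = source.read() if cache else None  # load once when caching
    estimate = 0.0
    for _ in range(iterations):
        batch = data if cache else source.read()  # otherwise re-read every pass
        grad = sum(estimate - x for x in batch) / len(batch)
        estimate -= lr * grad
    return estimate


if __name__ == "__main__":
    for use_cache in (False, True):
        src = CountingSource([2.0, 4.0, 6.0, 8.0])
        est = mean_by_gradient_descent(src, iterations=20, cache=use_cache)
        print(f"cache={use_cache}: estimate={est:.4f}, reads={src.reads}")
```

Both variants compute the same estimate. Only the number of full reads differs: one read versus one per iteration. On real storage, that difference is where in-memory reuse saves time.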
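Point 2 can be illustrated with a plain-Python sketch of micro-batching. A time-ordered event stream is cut into fixed-length intervals, and a statistic is computed for each interval. This mirrors the batch interval idea described in the Spark Streaming programming guide. The function names, the (timestamp, value) event format, and the assumption that events arrive in time order are all hypothetical. Unlike Spark, this sketch also skips intervals that receive no events.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event format: (arrival time in seconds, measured value).
Event = Tuple[float, float]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[Tuple[float, List[float]]]:
    """Cut a time-ordered event stream into fixed-interval micro-batches.

    Assumes events arrive sorted by timestamp; empty intervals are not emitted.
    """
    current_id = None
    buffer: List[float] = []
    for ts, value in events:
        batch_id = int(ts // interval)
        if current_id is not None and batch_id != current_id:
            yield current_id * interval, buffer  # close the previous batch
            buffer = []
        current_id = batch_id
        buffer.append(value)
    if buffer:
        yield current_id * interval, buffer  # flush the final batch


def batch_statistics(events: Iterable[Event], interval: float) -> List[Tuple[float, int, float]]:
    """Return (batch start time, event count, mean value) for each micro-batch."""
    return [
        (start, len(values), sum(values) / len(values))
        for start, values in micro_batches(events, interval)
    ]


if __name__ == "__main__":
    stream = [(0.2, 10.0), (0.7, 20.0), (1.1, 30.0), (2.5, 5.0), (2.9, 15.0)]
    for start, count, mean in batch_statistics(stream, interval=1.0):
        print(f"batch starting at {start:.1f}s: count={count}, mean={mean:.1f}")
```

Each batch is processed as soon as the stream moves past its interval. That is how a micro-batch engine trades a small amount of latency for higher throughput.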
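The MapReduce model mentioned in point 3 can be sketched in a few lines of plain Python as three phases: map, shuffle (group by key), and reduce. This single-process word count is only illustrative. Hadoop and Spark run the same phases in parallel across a cluster, and in PySpark an equivalent word count is commonly written with the RDD methods flatMap, map, and reduceByKey. The function names below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in every input line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (a single-process sketch)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark processes data", "Hadoop MapReduce processes data in batches"]
    print(word_count(docs))
```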
==Apache Spark Core API==
* https://data-flair.training/blogs/spark-rdd-operations-transformations-actions/
* https://hevodata.com/learn/spark-batch-processing/
* https://spark.apache.org/docs/latest/streaming-programming-guide.html