Changes

GPU621/Apache Spark Fall 2022

546 bytes added, 03:20, 7 December 2022

no edit summary

==Spark Application==

===1. Iterative operations and repeated operations on specific data sets===
Spark is built on an in-memory iterative computing framework, so its advantage grows with the number of iterations: the more often the same data is read, the more Spark gains from keeping that data in memory. Spark is therefore very effective when an application runs iterative operations or operates on the same data set multiple times.

===2. Real-time calculation===
Spark Streaming processes incoming data in batches, which gives Spark high throughput in real-time statistical analysis calculations.

===3. Batch data processing===
Spark can process massive data sets.
* With the high throughput of Spark Streaming, Spark can also process massive data in real time.
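The point in section 1 about iterative workloads can be illustrated with a toy model in plain Python. This is not Spark code: `CountingSource`, `iterate_without_cache`, and `iterate_with_cache` are hypothetical names invented for this sketch, and a simple load counter stands in for the cost of scanning data from disk. It only shows why reusing an in-memory copy pays off more as the number of iterations grows.

```python
# Toy model, not Spark code: count how often the source is read when an
# iterative job either re-reads its input every pass or caches it once.


class CountingSource:
    """Hypothetical data source that records how many times it is loaded."""

    def __init__(self, records):
        self._records = list(records)
        self.loads = 0

    def load(self):
        self.loads += 1  # stands in for one expensive scan from disk
        return list(self._records)


def iterate_without_cache(source, iterations):
    """Re-read the source on every iteration."""
    total = 0
    for _ in range(iterations):
        total += sum(source.load())
    return total


def iterate_with_cache(source, iterations):
    """Read the source once, then reuse the in-memory copy."""
    cached = source.load()
    total = 0
    for _ in range(iterations):
        total += sum(cached)
    return total


uncached_source = CountingSource(range(1000))
cached_source = CountingSource(range(1000))

result_uncached = iterate_without_cache(uncached_source, iterations=10)
result_cached = iterate_with_cache(cached_source, iterations=10)

# Both strategies compute the same answer; only the number of loads differs.
print(uncached_source.loads, cached_source.loads)
```

In this model the uncached job loads the data once per iteration while the cached job loads it once in total, so the saving scales with the iteration count. In Spark itself, the comparable step is marking a data set for reuse with `cache()` or `persist()` before iterating over it.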

==Apache Spark Core API==

https://data-flair.training/blogs/spark-rdd-operations-transformations-actions/

https://hevodata.com/learn/spark-batch-processing/

https://spark.apache.org/docs/latest/streaming-programming-guide.html

AlanHuang (10 edits)
