CDOT Wiki β

GPU621/Apache Spark

393 bytes added, 14:42, 30 November 2020
= Comparison: Spark vs Hadoop MapReduce =
== Performance ==
Spark processes data in RAM, while Hadoop MapReduce persists data back to disk after each map or reduce action. Spark has been reported to run up to '''100 times faster in memory''' and up to '''10 times faster on disk''' than Hadoop MapReduce. Spark won the 2014 Gray Sort Benchmark, sorting 100 TB of data on 206 machines in 23 minutes and beating the previous world record of 72 minutes, set by a Hadoop MapReduce cluster of 2100 nodes.
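
To put the benchmark figures in perspective, the short Python sketch below works through the arithmetic using only the numbers quoted above. The "machine-minutes" measure (wall-clock minutes multiplied by machine count) is an illustrative metric introduced here, not part of the official benchmark, and the comparison assumes the two runs sorted roughly the same amount of data on comparable hardware, which the figures above do not establish.

```python
# Figures quoted in the paragraph above (2014 Gray Sort Benchmark).
spark_minutes, spark_machines = 23, 206
hadoop_minutes, hadoop_machines = 72, 2100


def machine_minutes(minutes, machines):
    """Illustrative cost metric: wall-clock minutes times machine count."""
    return minutes * machines


# How much faster the Spark run finished, in wall-clock time.
time_speedup = hadoop_minutes / spark_minutes

# How many times more machines the Hadoop MapReduce run used.
machine_ratio = hadoop_machines / spark_machines

# Ratio of total cluster time consumed (assumes comparable workloads).
resource_ratio = (
    machine_minutes(hadoop_minutes, hadoop_machines)
    / machine_minutes(spark_minutes, spark_machines)
)

print(f"Wall-clock speedup:    {time_speedup:.1f}x")    # 3.1x
print(f"Machine count ratio:   {machine_ratio:.1f}x")   # 10.2x
print(f"Machine-minutes ratio: {resource_ratio:.1f}x")  # 31.9x
```

Under these assumptions, the Spark run finished about 3 times sooner while using about a tenth of the machines, which is roughly a 30-fold reduction in total cluster time.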
 
== Ease of Use ==
== Data Processing ==