Changes

GPU621/Apache Spark

96 bytes added, 15:50, 30 November 2020

→‎Comparison: Spark vs Hadoop MapReduce

= Comparison: Spark vs Hadoop MapReduce =

=== Performance ===

Spark processes data in RAM while Hadoop persists data back to the disk after a map or reduce action. Spark has been found to run '''100 times faster in-memory''', and '''10 times faster on disk'''. Spark won the 2014 Gray Sort Benchmark where it sorted 100TB of data using 206 machines in 23 minutes beating a Hadoop MapReduce cluster's previous world record of 72 minutes using 2100 nodes.

=== Security ===

Both Spark and Hadoop have access to support for Kerberos authentication, but Hadoop has more fine-grained security controls for HDFS.

= Spark vs Hadoop Wordcount Performance =

Abalachandran7

73

edits

Changes

GPU621/Apache Spark

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

get involved with CDOT

courses

course projects

links

Tools