GPU621/Apache Spark

51 bytes added, 17:04, 30 November 2020
=== Compatibility ===
Spark can run standalone, using its built-in cluster manager, or on top of Hadoop YARN or Apache Mesos. Spark can read from any data source that implements Hadoop's InputFormat interface, so it integrates with the same data sources and file formats that Hadoop supports.
[[File:Hadoop-vs-spark.png|upright=2|right||300px]]
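The choice of cluster manager shows up in application code mainly as the master URL. The following is a minimal sketch, assuming PySpark is installed; the helper names <code>master_url</code>, <code>build_session</code> and <code>read_with_hadoop_input_format</code> are hypothetical and exist only for illustration, and the ports shown are the usual defaults rather than values taken from this page.

```python
def master_url(cluster_manager, host=None, port=None):
    """Build the Spark master URL for a cluster manager (hypothetical helper)."""
    if cluster_manager == "local":
        return "local[*]"  # run in-process, using all local cores
    if cluster_manager == "yarn":
        return "yarn"  # cluster location is read from the Hadoop configuration
    if cluster_manager == "standalone":
        return f"spark://{host}:{port or 7077}"  # 7077: usual standalone default
    if cluster_manager == "mesos":
        return f"mesos://{host}:{port or 5050}"  # 5050: usual Mesos default
    raise ValueError(f"unknown cluster manager: {cluster_manager}")


def build_session(app_name, cluster_manager, host=None, port=None):
    """Create a SparkSession bound to the chosen cluster manager."""
    from pyspark.sql import SparkSession  # imported lazily; requires PySpark

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master_url(cluster_manager, host, port))
        .getOrCreate()
    )


def read_with_hadoop_input_format(spark, path):
    """Read a text file through a Hadoop InputFormat class.

    Returns an RDD of (byte offset, line) pairs.
    """
    return spark.sparkContext.newAPIHadoopFile(
        path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
    )
```

For example, <code>build_session("demo", "local")</code> starts a local session, while passing <code>"yarn"</code> would submit to whichever YARN cluster the Hadoop configuration points at. The same application code runs in either case; only the master URL changes.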
=== Data Processing ===
In addition to plain data processing, Spark can process graphs through its GraphX library, and it includes the MLlib machine learning library. Because of its high performance, Spark can handle both real-time (stream) and batch processing, whereas Hadoop MapReduce is suited only to batch processing.
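To illustrate the batch versus stream distinction, here is a minimal sketch, assuming PySpark with Structured Streaming. It applies one word-count transformation to a static file and to a socket stream. The function names, and any file path, host or port passed to them, are hypothetical choices for this example.

```python
def word_counts(lines_df):
    """Split each line into words and count them.

    Works on both static and streaming DataFrames that have a "value" column.
    """
    from pyspark.sql.functions import explode, split  # requires PySpark

    words = lines_df.select(explode(split(lines_df["value"], " ")).alias("word"))
    return words.groupBy("word").count()


def run_batch(spark, path):
    """Batch: read the whole file once, compute the counts, print them."""
    lines = spark.read.text(path)  # one row per line, in column "value"
    word_counts(lines).show()


def run_streaming(spark, host, port):
    """Streaming: keep updating the counts as new lines arrive on a socket."""
    lines = (
        spark.readStream
        .format("socket")
        .option("host", host)
        .option("port", port)
        .load()
    )
    query = (
        word_counts(lines)
        .writeStream
        .outputMode("complete")  # re-emit the full table of counts on each update
        .format("console")
        .start()
    )
    query.awaitTermination()  # block until the stream is stopped
```

The transformation in <code>word_counts</code> is shared by both modes; only the way the data is read and written differs. A MapReduce job would cover the batch case only.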