== Introduction ==
=== What is Apache Spark? ===
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework for big data processing.
=== History of Apache Spark ===
2009: Started as a distributed computing framework at UC Berkeley's AMPLab by Matei Zaharia
2010: Open sourced under a BSD license
2013: The project was donated to the Apache Software Foundation and the license was changed to Apache 2.0
2014: Became an Apache Top-Level Project. In November, Databricks used Spark to set a world record in large-scale sorting.
2014-present: Continues as a next-generation framework for both real-time and batch processing.
=== Why Apache Spark ===
Data has exploded in volume, velocity, and variety.
Getting analytic results faster has become increasingly important.
Businesses need near-real-time analytics to answer their questions.
== What is Apache Spark ==
== How it works ==