Difference between revisions of "GPU621/ApacheSpark"
(Created page with "== Team Members == # [mailto:sathia@senecacollege.ca?subject=gpu621 Shreena Athia ] # [mailto:wpan17@senecacollege.ca?subject=gpu621 Wang Pan] == Introduction == == What is...") |
(→Introduction) |
||
Line 5: | Line 5: | ||
== Introduction == | == Introduction == | ||
+ | |||
+ | === What is Apache Spark? === | ||
+ | |||
+ | Apache Spark is an open-source, distributed, general-purpose cluster-computing framework for Big Data. | ||
+ | |||
+ | === History of Apache Spark === | ||
+ | |||
+ | 2009: Started as a distributed-system framework at UC Berkeley's AMPLab by Matei Zaharia. | ||
+ | 2010: Open-sourced under a BSD license. | ||
+ | 2013: The project was donated to the Apache Software Foundation, and the license was changed to Apache 2.0. | ||
+ | 2014: Became an Apache Top-Level Project. Used by Databricks to set a world record in large-scale sorting in November. | ||
+ | 2014-present: Continues as a next-generation framework for real-time and batch processing. | ||
+ | |||
+ | === Why Apache Spark === | ||
+ | |||
+ | Data has exploded in volume, velocity, and variety. | ||
+ | The need for faster analytic results has become increasingly important. | ||
+ | Support for near-real-time analytics is needed to answer business questions. | ||
+ | |||
== What is Apache Spark == | == What is Apache Spark == | ||
== How it works == | == How it works == |
Revision as of 15:11, 25 November 2018
Team Members
Introduction
What is Apache Spark?
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework for Big Data.
History of Apache Spark
2009: Started as a distributed-system framework at UC Berkeley's AMPLab by Matei Zaharia.
2010: Open-sourced under a BSD license.
2013: The project was donated to the Apache Software Foundation, and the license was changed to Apache 2.0.
2014: Became an Apache Top-Level Project. Used by Databricks to set a world record in large-scale sorting in November.
2014-present: Continues as a next-generation framework for real-time and batch processing.
Why Apache Spark
Data has exploded in volume, velocity, and variety.
The need for faster analytic results has become increasingly important.
Support for near-real-time analytics is needed to answer business questions.