CDOT Wiki β

GPU621/ApacheSpark

805 bytes added, 15:11, 25 November 2018
== Introduction ==
 
=== What is Apache Spark? ===
 
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework for big data processing.
 
=== History of Apache Spark ===
 
2009: Initiated as a distributed system framework at UC Berkeley's AMPLab by Matei Zaharia.
2010: Open-sourced under a BSD license.
2013: Donated to the Apache Software Foundation; the license was changed to Apache 2.0.
2014: Became an Apache Top-Level Project. In November, Databricks used Spark to set a world record in large-scale sorting.
2014-present: Continues as a next-generation framework for real-time and batch processing.
 
=== Why Apache Spark ===
 
Data has exploded in volume, velocity and variety.
The need for faster analytic results has become increasingly important.
Spark supports near-real-time analytics to answer business questions.
 
== What is Apache Spark ==
== How it works ==