
GPU621/ApacheSpark

805 bytes added, 16:11, 25 November 2018


== Introduction ==

=== What is Apache Spark? ===

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework for Big Data.

=== History of Apache Spark ===

2009: Started as a distributed system framework at UC Berkeley's AMPLab by Matei Zaharia

2010: Open-sourced under a BSD license

2013: The project was donated to the Apache Software Foundation and the license was changed to Apache 2.0

2014: Became an Apache Top-Level Project. Used by Databricks to set a world record in large-scale sorting in November.

2014-present: Serves as a next-generation framework for real-time and batch processing.

=== Why Apache Spark ===

Data is exploding in volume, velocity, and variety

The need for faster analytic results is becoming increasingly important

Spark supports near-real-time analytics to answer business questions

== What is Apache Spark ==

== How it works ==
