Spark is a big data framework for large-scale data processing. Its API is centred on a data structure called the Resilient Distributed Dataset (RDD): a read-only, fault-tolerant multiset of data items distributed over a cluster of machines. High-level APIs are available for Scala, Java, Python, and R. This tutorial uses Python for its simplicity and popularity.
=== History ===
Spark was developed in 2009 at UC Berkeley's AMPLab and open-sourced in 2010 under the BSD license. As of this writing (November 2016), the current version is 2.0.2.