
CDOT Wiki β

Changes

GPU621/Spark

13 bytes removed, 15:43, 24 November 2016
These notes supplement the presentation for those who want an in-depth walkthrough of the concepts and code. The presentation introduced the technical and practical aspects of Spark, and these notes focus on the same.
=== Spark ===
Spark is a Big Data framework for large-scale data processing. Its API is centred on a data structure called the Resilient Distributed Dataset (RDD): a read-only, fault-tolerant multiset of data items distributed over a cluster of machines. High-level APIs are available for Scala, Java, Python, and R. This tutorial focuses on Python code for its simplicity and popularity.
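To make the RDD definition above more concrete, here is a minimal toy model in plain Python. It is not the Spark API: the class <code>ToyRDD</code> and its methods are hypothetical names invented for illustration, and everything runs in a single process. It only sketches three ideas from the definition: the data is split into partitions, a dataset is never modified in place, and a derived dataset remembers its parent so that one partition could be recomputed on its own if it were lost.

```python
# Toy, single-process model of the RDD idea. NOT the Spark API:
# ToyRDD and its methods are hypothetical names used for illustration only.


class ToyRDD:
    """A read-only, partitioned collection that remembers how it was derived."""

    def __init__(self, partitions=None, parent=None, fn=None):
        # Source datasets hold their partitions as immutable tuples.
        # Derived datasets hold no data, only a parent and a function (lineage).
        self._partitions = (
            None if partitions is None else tuple(tuple(p) for p in partitions)
        )
        self._parent = parent
        self._fn = fn

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        """Split local data round-robin into num_partitions partitions."""
        items = list(data)
        return cls(partitions=[items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn):
        """Return a NEW dataset; nothing is computed and self is left unchanged."""
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self):
        if self._partitions is not None:
            return len(self._partitions)
        return self._parent.num_partitions()

    def compute_partition(self, index):
        """(Re)build one partition by replaying the lineage back to the source."""
        if self._partitions is not None:
            return self._partitions[index]
        return tuple(self._fn(x) for x in self._parent.compute_partition(index))

    def collect(self):
        """Gather every partition into one local list, partition by partition."""
        return [
            x
            for i in range(self.num_partitions())
            for x in self.compute_partition(i)
        ]


base = ToyRDD.parallelize(range(10), num_partitions=2)
squares = base.map(lambda x: x * x)

# Rebuilding a single "lost" partition touches only the matching parent partition.
lost_partition = squares.compute_partition(1)
all_squares = sorted(squares.collect())
```

In this sketch the "fault tolerance" is simply that <code>compute_partition</code> can be called again for any index; real Spark tracks lineage across machines and schedules the recomputation itself.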
[[File: rdd-lineage-graph.png]]
<syntaxhighlight lang="scala">
val r00 = sc.parallelize(0 to 9)
val r01 = sc.parallelize(0 to 90 by 10)