
GPU621/ApacheSpark

2,074 bytes added, 13:36, 26 November 2018


=== Why Apache Spark ===

* Data has exploded in volume, velocity, and variety.
* The need for faster analytic results is becoming increasingly important.
* Spark supports near-real-time analytics to answer business questions.

=== Spark and Hadoop ===

* Hadoop = HDFS (Hadoop Distributed File System) + MapReduce (data processing model).
* Spark is an advanced data processing and analysis model that is replacing MapReduce.
* Spark does not have its own file system, so it commonly runs on top of HDFS.

[[File:10a.PNG]]

=== Spark vs MapReduce ===

[[File:3.PNG]]

== Features ==

* Easy to use
** Supports Python, Java, and Scala
** Libraries for SQL, machine learning, and streaming
* General-purpose
** Batch processing, as in MapReduce, is included
** Iterative algorithms
** Interactive queries and streaming, which return results immediately
* Speed
** In-memory computations
** Faster than MapReduce for complex applications on disk

[[File:2abc.png]]

== Resilient Distributed Datasets (RDDs) ==

Spark revolves around RDDs, its fundamental data structure. An RDD is an immutable, distributed collection of objects that can be operated on in parallel.

There are two ways to create RDDs:
# Parallelizing an existing collection
# Referencing a data set in an external storage system

=== Operations ===

* Transformations: create a new data set from an existing one.

[[File:5bc.PNG]]

* Actions: return a value to the driver program after running a computation on the data set.

[[File:6.PNG]]

These examples and more are found at https://spark.apache.org/docs/latest/rdd-programming-guide.html
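
The difference between transformations and actions can be illustrated with a minimal plain-Python sketch. The <code>MiniRDD</code> class below is a hypothetical, single-machine toy written for this page; it is not Spark's API. It ignores distribution and evaluates eagerly, whereas real Spark transformations are evaluated lazily. It only shows that transformations return a new immutable data set, while actions return an ordinary value to the caller.

```python
from functools import reduce as _reduce


class MiniRDD:
    """Toy, single-machine stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, items):
        # Store a tuple so the underlying collection cannot be modified.
        self._items = tuple(items)

    # Transformations: each one returns a NEW MiniRDD.
    def map(self, fn):
        return MiniRDD(fn(x) for x in self._items)

    def filter(self, predicate):
        return MiniRDD(x for x in self._items if predicate(x))

    # Actions: each one returns a plain value to the caller (the "driver").
    def collect(self):
        return list(self._items)

    def reduce(self, fn):
        return _reduce(fn, self._items)

    def count(self):
        return len(self._items)


numbers = MiniRDD([1, 2, 3, 4, 5])                    # like parallelizing a collection
squares = numbers.map(lambda x: x * x)                # transformation
even_squares = squares.filter(lambda x: x % 2 == 0)   # transformation

total = squares.reduce(lambda a, b: a + b)            # action
print(even_squares.collect())   # [4, 16]
print(total)                    # 55
print(numbers.collect())        # [1, 2, 3, 4, 5], the source data set is unchanged
```

The method names mirror common RDD operations, but the signatures and behaviour here are simplified assumptions; see the programming guide linked above for the real API.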

== Examples ==

=== Word Count ===

[[File:4.PNG]]

The example uses transformations (flatMap, map, reduceByKey) to build a data set of (String, Int) pairs, which is then saved to a file.
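
The data flow of the word count can be sketched in plain Python without a Spark installation. The helpers <code>flat_map</code> and <code>reduce_by_key</code> are hypothetical local functions that imitate what the corresponding transformations do on one machine, and the two input lines are made-up sample data. The sketch prints its result instead of saving it to a file.

```python
from itertools import chain


def flat_map(fn, items):
    """Apply fn to every item and flatten the resulting lists into one list."""
    return list(chain.from_iterable(fn(x) for x in items))


def reduce_by_key(fn, pairs):
    """Merge the values of identical keys with fn, as reduceByKey does per key."""
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged


# Made-up sample input standing in for the lines of a text file.
lines = ["to be or not to be", "that is the question"]

words = flat_map(lambda line: line.split(" "), lines)   # flatMap: lines -> words
pairs = [(word, 1) for word in words]                   # map: word -> (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)       # reduceByKey: sum per word

print(counts["to"], counts["be"], counts["question"])   # 2 2 1
```

In Spark the same three steps run in parallel across partitions of the data set; this sketch only mirrors the logic of each step.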

=== Finance and Stock trading Use Case ===

Imagine that you work for a financial company and your job is to buy and sell stocks for profit. Your decisions depend heavily on the predictions produced by your financial model, so the time the model takes to make a prediction is critical. Stock prices change quickly; a price can move drastically within a couple of seconds. If your model cannot respond in near real time, you may miss the opportunity to trade at the best price. Apache Spark can be used to build financial models that make predictions in near real time.

[[File:7ab.png]]

Sathia
