Difference between revisions of "SLEEPy"
Revision as of 15:42, 11 April 2016
Intel Data Analytics Acceleration Library (DAAL)
Team Member
Intro OLD
Local DAAL Examples Location: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016\windows\daal\examples
Data: http://open.canada.ca/data/en/dataset/cad804cd-454e-4bd7-9f22-fcee64f60719
New Data: http://open.canada.ca/data/en/dataset/be3880f2-0d04-4583-8265-611b231ebce8
Parser code: https://software.intel.com/en-us/node/610127
Low Order Moments: https://software.intel.com/en-us/node/599561
Our goal is to parse and process this crime data to draw more meaning from it, applying the parallel techniques taught in the course and comparing them using the DAAL library.
Introduction
What is Big Data
Big Data is data so large that traditional data processing methods cannot keep up. Much of it relates to human behavior that computers can measure, such as trends and web analytics. With so many people using technology, the data gathered this way is becoming massive, which is one reason for the rise of Big Data.
What is DAAL
DAAL is a C++ and Java/Scala library for data analytics. It is similar to MKL (Math Kernel Library), with some differences:
- MKL focuses on computation. DAAL focuses on the entire data flow (acquisition, transformation, processing).
- DAAL is optimized for all kinds of Intel-based devices (from data centers to home computers).
DAAL supports 3 processing modes:
- Offline Processing (Batch) - The data fits in memory and can be processed all at once.
- Online Processing (Streaming) - The data is too big for memory, so DAAL processes it in chunks and combines the partial results into the final result.
- Distributed Processing - The processing is spread across multiple nodes. DAAL is not tied to a communication method and leaves that choice to the developer (Hadoop, Spark, MPI, etc.).
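The "process in chunks, then combine partial results" idea behind the online and distributed modes can be sketched in plain C++. This is only an illustration and does not use the DAAL API: the PartialMean struct and the update, merge and finalize functions are hypothetical names chosen for this example, and a running mean stands in for whatever algorithm is actually being computed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Partial result kept between chunks: enough information to recover the mean later.
struct PartialMean {
    double sum;
    std::size_t count;
    PartialMean() : sum(0.0), count(0) {}
};

// Online mode: fold one chunk of observations into the running partial result.
void update(PartialMean& partial, const std::vector<double>& chunk) {
    for (double value : chunk) {
        partial.sum += value;
        ++partial.count;
    }
}

// Distributed mode: combine the partial results produced by two workers.
PartialMean merge(const PartialMean& a, const PartialMean& b) {
    PartialMean combined;
    combined.sum = a.sum + b.sum;
    combined.count = a.count + b.count;
    return combined;
}

// Turn the accumulated partial result into the final mean.
double finalize(const PartialMean& partial) {
    assert(partial.count > 0 && "finalize needs at least one observation");
    return partial.sum / static_cast<double>(partial.count);
}
```

Because only the sum and the count are carried between chunks, no chunk has to stay in memory after it has been processed, and partial results from different workers can be merged in any order.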
How DAAL Works
Code Examples
Batch Sorting
CSV Data:
-55.558252,63.051427,-27.793776
-75.622534,61.212279,-16.283311
-86.747071,-28.132241,-17.824316
-34.172101,-51.404172,14.670925
-61.506308,48.248030,-99.235341
9.746765,-89.879258,55.561778
48.896723,-32.648097,48.313603
-15.346015,9.769776,-33.483281
56.726081,-87.272631,8.724224
-1.926802,54.960580,-78.723429
45.237223,-79.764218,-47.271926
84.138339,11.547818,-92.962952
46.711824,-42.623510,-34.664673
55.813112,19.803475,4.807766
-55.474098,-72.163755,89.425736
-7.566596,-77.829218,58.630172
-76.081937,-12.089445,-44.065054
-25.888944,46.425499,-37.515164
-30.201387,-16.237217,-50.716591
-88.085869,60.136249,54.812866
Code:
/* file: sorting_batch.cpp
* Copyright 2014-2016 Intel Corporation All Rights Reserved.*/
#include "daal.h"
#include "service.h"
using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace std;
/* Input data set parameters */
string datasetFileName = "../data/batch/sorting.csv";
int main(int argc, char *argv[])
{
    checkArguments(argc, argv, 1, &datasetFileName);

    /* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv file */
    FileDataSource<CSVFeatureManager> dataSource(datasetFileName,
                                                 DataSource::doAllocateNumericTable,
                                                 DataSource::doDictionaryFromContext);

    /* Retrieve the data from the input file */
    dataSource.loadDataBlock();

    /* Create algorithm objects to sort data using the default (radix) method */
    sorting::Batch<> algorithm;

    /* Print the input observations matrix */
    printNumericTable(dataSource.getNumericTable(), "Initial matrix of observations:");

    /* Set input objects for the algorithm */
    algorithm.input.set(sorting::data, dataSource.getNumericTable());

    /* Sort data observations */
    algorithm.compute();

    /* Get the sorting result */
    services::SharedPtr<sorting::Result> res = algorithm.getResult();
    printNumericTable(res->get(sorting::sortedData), "Sorted matrix of observations:");

    return 0;
}
Results:
The data is sorted from smallest to largest per column.
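To make "sorted per column" concrete, here is a minimal sketch in standard C++ of what sorting each column independently means. It does not call DAAL and says nothing about how DAAL implements its sort (it uses std::sort rather than radix sort); the Matrix alias and the sortEachColumn function are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // one inner vector per row

// Return a copy of `table` in which every column is sorted ascending on its own.
Matrix sortEachColumn(const Matrix& table) {
    Matrix sorted = table;
    if (sorted.empty()) return sorted;

    const std::size_t columns = sorted[0].size();
    for (const auto& row : sorted) {
        assert(row.size() == columns && "all rows must have the same width");
    }

    for (std::size_t c = 0; c < columns; ++c) {
        // Pull out column c, sort it, and write it back in place.
        std::vector<double> column;
        column.reserve(sorted.size());
        for (const auto& row : sorted) column.push_back(row[c]);
        std::sort(column.begin(), column.end());
        for (std::size_t r = 0; r < sorted.size(); ++r) sorted[r][c] = column[r];
    }
    return sorted;
}
```

One consequence worth noting: since the columns are sorted independently, a row of the sorted matrix no longer corresponds to a single original observation.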
Data Blocks:
dataSource.loadDataBlock(5);
//dataSource.loadDataBlock(5);
dataSource.loadDataBlock(5);
dataSource.loadDataBlock(5);
Each call loads a "block" of 5 rows. This makes it easy to section off data for processing, and it is also one way to divide data among workers, for example MPI processes.