Difference between revisions of "SLEEPy"
Revision as of 19:05, 10 April 2016
Intel Data Analytics Acceleration Library (DAAL)
Team Member
Intro OLD
Local DAAL Examples Location: C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2016\windows\daal\examples
Data: http://open.canada.ca/data/en/dataset/cad804cd-454e-4bd7-9f22-fcee64f60719
New Data: http://open.canada.ca/data/en/dataset/be3880f2-0d04-4583-8265-611b231ebce8
Parser code: https://software.intel.com/en-us/node/610127
Low Order Moments: https://software.intel.com/en-us/node/599561
Our goal is to parse and process this crime data and add more meaning to it, using various parallel techniques taught in the course and comparing them via the DAAL library.
Introduction
DAAL is a C++ and Java/Scala library for data analytics. It is similar to MKL, with some differences:
- MKL focuses on computation. DAAL focuses on the entire data flow (acquisition, transformation, processing).
- It is optimized for a wide range of Intel-based devices, from data center servers to home computers.
DAAL supports 3 processing modes:
- Offline Processing (Batch) - The data fits in memory and is processed all at once.
- Online Processing (Streaming) - The data is too big for memory, so DAAL processes it in chunks and combines the partial results into the final result.
- Distributed Processing - The data processing is distributed across several nodes. DAAL does not dictate the communication method and leaves that choice to the developer (Hadoop, Spark, MPI, etc.).
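The "process in chunks, then combine partial results" idea behind the online and distributed modes can be sketched in plain C++. This is not DAAL code: the names PartialMoments, computeChunk, combine and mean are hypothetical and chosen only for illustration. The sketch assumes the statistic of interest is the mean, which can be rebuilt from per-chunk counts and sums.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical partial result for one chunk (not a DAAL type).
// It keeps just enough state to be merged with other chunks.
struct PartialMoments {
    std::size_t count;
    double sum;
    double sumSquares;
    PartialMoments() : count(0), sum(0.0), sumSquares(0.0) {}
};

// Step 1: compute a partial result from one chunk that fits in memory.
PartialMoments computeChunk(const std::vector<double>& chunk) {
    PartialMoments p;
    for (std::size_t i = 0; i < chunk.size(); ++i) {
        ++p.count;
        p.sum += chunk[i];
        p.sumSquares += chunk[i] * chunk[i];
    }
    return p;
}

// Step 2: combine two partial results. The order of combination does not
// matter, so chunks may arrive one by one (online) or come from
// different nodes (distributed).
PartialMoments combine(const PartialMoments& a, const PartialMoments& b) {
    PartialMoments r;
    r.count = a.count + b.count;
    r.sum = a.sum + b.sum;
    r.sumSquares = a.sumSquares + b.sumSquares;
    return r;
}

// Step 3: finalize the combined partial result into the statistic.
double mean(const PartialMoments& p) {
    assert(p.count > 0);  // the mean of an empty data set is undefined
    return p.sum / static_cast<double>(p.count);
}
```

A caller would start from an empty PartialMoments, call combine(total, computeChunk(chunk)) for each chunk as it is read, and call mean(total) once at the end. Batch mode corresponds to a single chunk holding all the data.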
Parallel
Code Examples
Batch Sorting
CSV Data
-55.558252,63.051427,-27.793776,
-75.622534,61.212279,-16.283311,
-86.747071,-28.132241,-17.824316,
-34.172101,-51.404172,14.670925,
-61.506308,48.248030,-99.235341,
9.746765,-89.879258,55.561778,
48.896723,-32.648097,48.313603,
-15.346015,9.769776,-33.483281,
56.726081,-87.272631,8.724224,
-1.926802,54.960580,-78.723429,
45.237223,-79.764218,-47.271926,
84.138339,11.547818,-92.962952,
46.711824,-42.623510,-34.664673,
55.813112,19.803475,4.807766,
-55.474098,-72.163755,89.425736,
-7.566596,-77.829218,58.630172,
-76.081937,-12.089445,-44.065054,
-25.888944,46.425499,-37.515164,
-30.201387,-16.237217,-50.716591,
-88.085869,60.136249,54.812866
Code
/* file: sorting_batch.cpp
 * Copyright 2014-2016 Intel Corporation All Rights Reserved.*/
#include "daal.h"
/* service.h ships with the DAAL examples and provides checkArguments() and printNumericTable() */
#include "service.h"

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;
using namespace std;

/* Input data set parameters */
string datasetFileName = "../data/batch/sorting.csv";

int main(int argc, char *argv[])
{
    checkArguments(argc, argv, 1, &datasetFileName);

    /* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv file */
    FileDataSource<CSVFeatureManager> dataSource(datasetFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);

    /* Retrieve the data from the input file */
    dataSource.loadDataBlock();

    /* Create algorithm objects to sort data using the default (radix) method */
    sorting::Batch<> algorithm;

    /* Print the input observations matrix */
    printNumericTable(dataSource.getNumericTable(), "Initial matrix of observations:");

    /* Set input objects for the algorithm */
    algorithm.input.set(sorting::data, dataSource.getNumericTable());

    /* Sort data observations */
    algorithm.compute();

    /* Get the sorting result */
    services::SharedPtr<sorting::Result> res = algorithm.getResult();
    printNumericTable(res->get(sorting::sortedData), "Sorted matrix of observations:");

    return 0;
}
The data is sorted in ascending order (smallest to largest) within each column, independently of the other columns.
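As a rough illustration of what per-column sorting means, the following plain C++ sketch sorts every column of a small table on its own. It does not use DAAL and uses std::sort rather than the radix method named in the example; the Table typedef and the sortColumns function are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major table: table[row][column].
typedef std::vector<std::vector<double> > Table;

// Return a copy of the input in which every column is sorted in
// ascending order independently of the other columns.
Table sortColumns(const Table& input) {
    Table result = input;
    if (result.empty()) {
        return result;
    }
    const std::size_t rows = result.size();
    const std::size_t cols = result[0].size();
    std::vector<double> column(rows);
    for (std::size_t c = 0; c < cols; ++c) {
        // Gather one column into a contiguous buffer.
        for (std::size_t r = 0; r < rows; ++r) {
            assert(result[r].size() == cols);  // the table must be rectangular
            column[r] = result[r][c];
        }
        std::sort(column.begin(), column.end());
        // Scatter the sorted values back into the same column.
        for (std::size_t r = 0; r < rows; ++r) {
            result[r][c] = column[r];
        }
    }
    return result;
}
```

Because each column is sorted separately, a row of the output generally no longer corresponds to any single row of the input. Applied to the first three rows of the CSV data above, the first column becomes -86.747071, -75.622534, -55.558252 while the second becomes -28.132241, 61.212279, 63.051427.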