GPU621/Jedd Clan

Jedd Chionglo
Gabriel Dizon
 
== Background ==
Intel DAAL (Data Analytics Acceleration Library) is a library of highly optimized algorithmic building blocks that covers all stages of data analytics, from data acquisition and preprocessing through training and prediction. It is tuned for Intel architectures and provides C++, Java, and Python interfaces.
To install the Intel DAAL library, follow the [https://software.intel.com/en-us/get-started-with-daal-for-linux instructions].
== Features ==
*The Intel® Data Analytics Acceleration Library provides optimized building blocks for the various stages of data analysis
[[File:DataAnalyticsStages.jpg]]
 
== Data Management ==
*Raw data acquisition
*Data preparation
*Algorithm computation
[[File:ManagemenFlowDal.jpg]]
[[File:DataSet.jpg]]
 
== Building Blocks ==
*DAAL provides building blocks for every aspect of data analytics, from the tools used to manage data to the computational algorithms themselves
[[File:BuildingBlocks.jpg]]
 
== Computations ==
*An algorithm must be chosen to suit the application
[[File:Algorithims.jpg]]
*Modes of Computation
**Batch Mode - the simplest mode; the algorithm processes a single, complete data set in one call
**Online Mode - the algorithm processes the data incrementally, block by block (for example multiple training sets), and combines the partial results at the end (a minimal sketch follows the figure below)
**Distributed Mode - partial results are computed on separate data sets (for example on different nodes) and then merged into the final result
[[File:ComputationMode.jpg]]
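To make online mode concrete, the sketch below feeds a CSV file to the low-order-moments algorithm one block at a time. This is a minimal sketch rather than part of the original example: the file path, the block size, and the choice of the low_order_moments algorithm are illustrative assumptions.
<source lang=c++>
#include "daal.h"
#include <string>
using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;

int main()
{
    // Assumptions: placeholder file path and block size; the row count is
    // assumed to be a multiple of the block size for simplicity
    std::string dataFileName = "/path/to/file/datafile.csv";
    const size_t nRowsInBlock = 250;

    FileDataSource<CSVFeatureManager> dataSource(dataFileName,
                                                 DataSource::doAllocateNumericTable,
                                                 DataSource::doDictionaryFromContext);

    // Online mode: a single algorithm object accumulates partial results
    low_order_moments::Online<> algorithm;

    // Process the data block by block
    while (dataSource.loadDataBlock(nRowsInBlock) == nRowsInBlock)
    {
        algorithm.input.set(low_order_moments::data, dataSource.getNumericTable());
        algorithm.compute();            // updates the partial result
    }

    // Merge the partial results into the final result
    algorithm.finalizeCompute();
    services::SharedPtr<low_order_moments::Result> result = algorithm.getResult();
    return 0;
}
</source>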
 
 
== How To Use Intel DAAL ==
The example below shows the basics of the Intel DAAL library. It looks at the hydrodynamics of yachts and builds a predictive model from that data: a regression model is trained on the data set and then used for prediction (more specifically, a polynomial regression fitted with DAAL's linear regression training and prediction algorithms).
 
== How To Load Data ==
Intel DAAL requires inputs to be supplied as numeric tables. There are three different types of tables:
* Heterogeneous - contains multiple data types
* Homogeneous - contains only one data type
* Matrices - used when matrix algebra is needed
The information can be loaded offline using two different methods:
 
Arrays
<source lang=c++>
#include "daal.h"
#include <cstdlib>
using namespace daal;
using namespace daal::data_management;

// Array containing the data
const int nRows = 100;
const int nCols = 100;
double* rawData = (double*) malloc(sizeof(double)*nRows*nCols);

// Creating the numeric table (the constructor takes columns first, then rows)
NumericTable* dataTable = new HomogenNumericTable<double>(rawData, nCols, nRows);

// Wrapping the table in a SharedPtr so its lifetime is managed automatically
services::SharedPtr<NumericTable> sharedNTable(dataTable);
</source>
 
CSV Files - the number of rows would normally be determined at runtime; in this example it is hard-coded to 1000
<source lang=c++>
#include "daal.h"
#include <string>
using namespace daal;
using namespace daal::data_management;

std::string dataFileName = "/path/to/file/datafile.csv";
const int nRows = 1000; // number of rows to be read

// Create the data source
FileDataSource<CSVFeatureManager> dataSource(dataFileName,
                                             DataSource::doAllocateNumericTable,
                                             DataSource::doDictionaryFromContext);

// Load data from the CSV file
dataSource.loadDataBlock(nRows);

// Extract the NumericTable allocated by the data source
services::SharedPtr<NumericTable> sharedNTable = dataSource.getNumericTable();
</source>
 
== How to Extract Data ==
Information can be extracted directly from the table.
<source lang=c++>
// getArray() is provided by HomogenNumericTable, so keep the concrete type
services::SharedPtr<HomogenNumericTable<double> > dataTable;
// ... Populate dataTable ... //

double* rawData = dataTable->getArray();
</source>
 
Data can also be acquired by transferring it into a BlockDescriptor object:
<source lang=c++>
services::SharedPtr<NumericTable> dataTable;
// ... Populate dataTable ... //

BlockDescriptor<double> block;
// offset is the first row to read, numRows is how many rows to read,
// readOnly/readWrite sets the access mode, and block receives the data
dataTable->getBlockOfRows(offset, numRows, readOnly, block);
double* rawData = block.getBlockPtr();

// Release the block once the data is no longer needed
dataTable->releaseBlockOfRows(block);
</source>
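Putting these pieces together, the helper below reads a block from any NumericTable and prints it. This is a small sketch rather than part of the original page: the function name printRows, the double value type, and the output format are illustrative assumptions.
<source lang=c++>
#include <iostream>
#include "daal.h"
using namespace daal;
using namespace daal::data_management;

// Sketch: print the first numRows rows of a NumericTable via a BlockDescriptor
void printRows(const services::SharedPtr<NumericTable> &table, size_t numRows)
{
    BlockDescriptor<double> block;
    table->getBlockOfRows(0, numRows, readOnly, block);   // read-only access from row 0

    const double *data = block.getBlockPtr();
    const size_t nCols = table->getNumberOfColumns();

    for (size_t i = 0; i < numRows; ++i)
    {
        for (size_t j = 0; j < nCols; ++j)
            std::cout << data[i * nCols + j] << " ";
        std::cout << std::endl;
    }

    table->releaseBlockOfRows(block);                      // release when done
}
</source>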
 
 
== How To Create the Training Model ==
After the data has been loaded into a numeric table, it is fed into an algorithm to produce a trained model. The training algorithm requires two inputs: the feature vectors and the corresponding response (dependent variable) values.
 
<source lang=c++>
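// Assumption: this snippet uses the linear regression algorithm, i.e. it
// relies on "using namespace daal::algorithms::linear_regression;"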
// Setting up the training sets.
services::SharedPtr<NumericTable> trnFeatures(trnFeatNumTable);
services::SharedPtr<NumericTable> trnResponse(trnRespNumTable);
 
// Setting up the algorithm object
training::Batch<> algorithm;
algorithm.input.set(training::data, trnFeatures);
algorithm.input.set(training::dependentVariables, trnResponse);
 
// Training
algorithm.compute();
 
// Extracting the result
services::SharedPtr<training::Result> trainingResult;
trainingResult = algorithm.getResult();
</source>
 
The example uses default template arguments for the Batch object, but these can be set explicitly to choose the floating-point precision (float or double) and the method, which selects the mathematical algorithm used for the computation.
<source lang=c++>
// template parameters: algorithmFPType (float/double) and method, e.g. for linear regression training:
training::Batch<float, training::qrDense> algorithm;
</source>
 
 
== Creating the Prediction Model ==
The prediction step requires the training step to be completed first, because it reuses the model produced there. Like the training step, it is driven by a Batch object with the same style of inputs. The prediction algorithm needs two inputs: the test data and the trained model, which is extracted from the training result object. Once both inputs are set, compute() produces the predicted responses for the test feature vectors, returned as a one-dimensional result.
<source lang=c++>
// ... Set up: training algorithm ... //
services::SharedPtr<training::Result> trainingResult;
trainingResult = algorithm.getResult();
services::SharedPtr<NumericTable> testFeatures(tstFeatNumTable);
// ... Set up: populating testFeatures ... //
 
// Creating the prediction algorithm object (named predAlgorithm so it does
// not clash with the training algorithm object above)
prediction::Batch<> predAlgorithm;
predAlgorithm.input.set(prediction::data, testFeatures);
predAlgorithm.input.set(prediction::model, trainingResult->get(training::model));

// Computing the predictions
predAlgorithm.compute();

// Extracting the result
services::SharedPtr<prediction::Result> predictionResult;
predictionResult = predAlgorithm.getResult();
BlockDescriptor<double> resultBlock;
// numDepVariables is the number of predicted responses (rows) to read
predictionResult->get(prediction::prediction)->getBlockOfRows(0, numDepVariables,
                                                              readOnly, resultBlock);
double* result = resultBlock.getBlockPtr();
</source>
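To tie off the one-dimensional result mentioned above, the sketch below walks the raw pointer returned by getBlockPtr() and then releases the block. It simply continues the snippet above, so result, numDepVariables, predictionResult and resultBlock are the same placeholders; the output format is an illustrative assumption.
<source lang=c++>
// Continues the snippet above: result points at numDepVariables predicted
// responses laid out contiguously, one per test feature vector
for (size_t i = 0; i < numDepVariables; ++i)
{
    std::cout << "Predicted response " << i << ": " << result[i] << std::endl;  // needs <iostream>
}

// Release the block once the values have been consumed
predictionResult->get(prediction::prediction)->releaseBlockOfRows(resultBlock);
</source>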