GPU621/Jedd Clan

From CDOT Wiki
Revision as of 16:39, 11 August 2021

Project Name

Intel Data Analytics Acceleration Library

Group Members

  • Jedd Chionglo
  • Gabriel Dizon

Background

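Intel DAAL (Data Analytics Acceleration Library) provides optimized building blocks for every stage of data analysis, from data management to algorithm computation. To install the Intel DAAL library, follow the instructions.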

Features

  • Produce quicker and better predictions
  • Analyze large datasets with available computer resources
  • Optimize data ingestion and algorithmic compute simultaneously
  • Supports offline, distributed, and streaming usage models
  • Handles big data better than libraries such as Intel’s Math Kernel Library (MKL)
  • Maximum calculation performance
  • High-speed algorithms

Users of the Library

  • Data Scientists
  • Researchers
  • Data Analysts

Data Analytics Pipeline

  • The Intel® Data Analytics Acceleration Library provides optimized building blocks for the various stages of data analysis

DataAnalyticsStages.jpg

Data Management

  • Raw data acquisition
  • Data preparation
  • Algorithm computation

ManagemenFlowDal.jpg
DataSet.jpg
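The three stages above can be illustrated with a minimal plain C++ sketch that does not use DAAL. In this sketch, raw text records stand in for acquired data. The preparation stage parses the records and drops malformed ones, and the computation stage runs a simple algorithm (a mean). The function names `prepare` and `computeMean` are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Stage 2 (preparation): parse raw records (the output of stage 1) into numbers,
// dropping records that do not start with a valid number.
std::vector<double> prepare(const std::vector<std::string>& rawRecords) {
    std::vector<double> values;
    for (const std::string& record : rawRecords) {
        std::istringstream in(record);
        double v = 0.0;
        if (in >> v) values.push_back(v);  // skip malformed records
    }
    return values;
}

// Stage 3 (computation): run an algorithm on the prepared data; here, the mean.
double computeMean(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / static_cast<double>(values.size());
}
```

For example, the records `{"1.5", "n/a", "2.5", "5"}` prepare to three values whose mean is 3.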

Building Blocks

  • DAAL covers all aspects of data analytics, from the tools used to manage data to the computational algorithms

BuildingBlocks.jpg

Computations

  • You must choose an algorithm suited to the application

Algorithims.jpg

  • Modes of Computation
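    • Batch Mode - the simplest mode; the algorithm works on a single data set at once
    • Online Mode - the data arrives as multiple training sets (blocks); the algorithm updates partial results with each block and produces the final result at the end
    • Distributed Mode - the data is split into multiple data sets; partial results are computed on each one and then combined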

ComputationMode.jpg
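The three modes differ mainly in how they handle partial results. The plain C++ sketch below computes a mean in each mode. It is not DAAL code, and all names in it are hypothetical. Batch mode sees all of the data at once. Online mode updates a partial result block by block and finalizes it at the end. Distributed mode merges partial results that were computed on separate chunks. DAAL's online algorithms follow the same shape: they call `compute()` for each block and `finalizeCompute()` at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Partial result for a mean: running sum and number of values seen.
struct PartialMean {
    double sum = 0.0;
    std::size_t count = 0;
};

// Batch mode: the whole data set is available at once.
double batchMean(const std::vector<double>& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double x : data) sum += x;
    return sum / static_cast<double>(data.size());
}

// Online mode: each arriving block updates the partial result.
void onlineUpdate(PartialMean& p, const std::vector<double>& block) {
    for (double x : block) { p.sum += x; ++p.count; }
}

// Distributed mode: partial results from separate nodes are merged.
PartialMean mergePartials(const std::vector<PartialMean>& partials) {
    PartialMean total;
    for (const PartialMean& p : partials) { total.sum += p.sum; total.count += p.count; }
    return total;
}

// Finalize step: turn a partial result into the final answer.
double finalizeMean(const PartialMean& p) {
    assert(p.count > 0);
    return p.sum / static_cast<double>(p.count);
}
```

For the same data, all three paths return the same mean. This property makes it safe to split the work for a statistic like this one.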


How To Use Intel DAAL

The example below shows the basics of the Intel DAAL library. It uses data on the hydrodynamics of yachts to build a predictive model. The model is built with linear regression, more specifically polynomial regression: a training algorithm learns from known data, and the resulting model then predicts responses for new data. The example shows how to load data, access the loaded data, create a training model, use the trained model for predictions, extend the model with polynomial feature expansion, and finally test the quality of the predictions.

How To Load Data

Intel DAAL requires numeric tables as inputs. There are three different types of tables:

  • Heterogeneous - contains multiple data types
  • Homogeneous - only one data type
  • Matrices - used when matrix algebra is needed
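The sketch below gives a rough picture of the difference between the first two table types. It models a homogeneous table as one row-major array of doubles, and a heterogeneous table as rows of `std::variant` cells. These structs are hypothetical and only mirror the idea; DAAL's own classes (such as `HomogenNumericTable`) have different interfaces. The sketch assumes row-major storage, which is how `HomogenNumericTable` lays out the raw array in the array example: element (i, j) sits at index `i * nCols + j`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Homogeneous table: every value has the same type, stored row-major in one array.
struct HomogenTable {
    std::size_t nRows;
    std::size_t nCols;
    std::vector<double> values;  // nRows * nCols entries
    double at(std::size_t row, std::size_t col) const {
        assert(row < nRows && col < nCols);
        return values[row * nCols + col];
    }
};

// Heterogeneous table: columns may hold different types, so each cell is a variant.
using Cell = std::variant<int, double, std::string>;
struct HeterogenTable {
    std::vector<std::vector<Cell>> rows;
};
```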

The information can be loaded offline using two different methods:

Arrays

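#include "daal.h"
using namespace daal;
using namespace daal::data_management;

// Array containing the data, stored row by row (row-major)
const size_t nRows = 100;
const size_t nCols = 100;
double* rawData = new double[nRows * nCols];
// ... fill rawData ... //

// Wrap the array in a homogeneous numeric table (number of columns first, then rows).
// The table uses rawData in place, so the array must outlive the table.
services::SharedPtr<NumericTable> sharedNTable(
    new HomogenNumericTable<double>(rawData, nCols, nRows));

// ... use sharedNTable, then free the array ... //
delete[] rawData;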

CSV Files - The number of rows should be determined at run time; this example hard-codes 1000.

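#include <string>
#include "daal.h"
using namespace daal;
using namespace daal::data_management;

std::string dataFileName = "/path/to/file/datafile.csv";
const size_t nRows = 1000; // number of rows to read (hard-coded here)

// Create the data source; it allocates the numeric table and builds
// the data dictionary from the file contents
FileDataSource<CSVFeatureManager> dataSource(dataFileName,
    DataSource::doAllocateNumericTable,
    DataSource::doDictionaryFromContext);

// Load nRows rows from the CSV file
dataSource.loadDataBlock(nRows);

// Extract the NumericTable that holds the loaded data
services::SharedPtr<NumericTable> sharedNTable = dataSource.getNumericTable();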
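The row count should come from the file rather than from a hard-coded constant. Here is a minimal plain C++ sketch that counts rows while it reads; it does not use DAAL's `CSVFeatureManager`. It assumes purely numeric fields, with no quoting and no header line. `CsvData` and `readCsv` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parsed CSV contents: dimensions plus values stored row-major.
struct CsvData {
    std::size_t nRows = 0;
    std::size_t nCols = 0;
    std::vector<double> values;
};

// Reads comma-separated numeric rows, counting them as it goes.
CsvData readCsv(std::istream& in) {
    CsvData data;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::istringstream fields(line);
        std::string field;
        std::size_t cols = 0;
        while (std::getline(fields, field, ',')) {
            data.values.push_back(std::stod(field));
            ++cols;
        }
        if (data.nRows == 0) data.nCols = cols;  // first row fixes the width
        assert(cols == data.nCols && "every row must have the same number of columns");
        ++data.nRows;
    }
    return data;
}
```

The resulting `nRows` could then be passed to `loadDataBlock` instead of the hard-coded 1000.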

How to Extract Data

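If the table is a HomogenNumericTable, its underlying array can be accessed directly.

services::SharedPtr<NumericTable> dataTable;
// ... Populate dataTable with a HomogenNumericTable<double> ... //

// getArray() is a member of HomogenNumericTable, so cast the generic pointer first
services::SharedPtr<HomogenNumericTable<double> > homogenTable =
    services::staticPointerCast<HomogenNumericTable<double>, NumericTable>(dataTable);
double* rawData = homogenTable->getArray();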

Data can also be accessed through a BlockDescriptor object, which works with any type of numeric table.

services::SharedPtr<NumericTable> dataTable;
// ... Populate dataTable ... //

BlockDescriptor<double> block;
// offset: index of the first row to read; numRows: number of rows;
// readWrite: access mode (readOnly, writeOnly or readWrite); block: receives the rows
dataTable->getBlockOfRows(offset, numRows, readWrite, block);
double* rawData = block.getBlockPtr();
// ... work with rawData ... //
dataTable->releaseBlockOfRows(block);
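The sketch below makes the offset and row-count arguments concrete. Conceptually, a block of rows is rows `offset` through `offset + numRows - 1` exposed as one contiguous buffer. This plain C++ sketch copies such a block out of a row-major array. `copyBlockOfRows` is a hypothetical helper, not part of DAAL.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies rows [offset, offset + numRows) of a row-major table with nCols columns.
std::vector<double> copyBlockOfRows(const std::vector<double>& table, std::size_t nCols,
                                    std::size_t offset, std::size_t numRows) {
    assert(nCols > 0 && table.size() % nCols == 0);
    const std::size_t nRows = table.size() / nCols;
    assert(offset + numRows <= nRows);  // the block must lie inside the table
    const auto first = table.begin() + static_cast<std::ptrdiff_t>(offset * nCols);
    const auto last = first + static_cast<std::ptrdiff_t>(numRows * nCols);
    return std::vector<double>(first, last);
}
```

For a 4 x 2 table, an offset of 1 with 2 rows returns the second and third rows.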


How to Create the Training Model

Once the data is loaded into numeric tables, it is passed to a training algorithm, which produces a trained model. The algorithm needs two inputs: the feature values and the response (dependent variable) values.

// Linear regression training lives in daal::algorithms::linear_regression::training
using namespace daal::algorithms::linear_regression;

// Setting up the training sets (trnFeatNumTable and trnRespNumTable are NumericTable pointers)
services::SharedPtr<NumericTable> trnFeatures(trnFeatNumTable);
services::SharedPtr<NumericTable> trnResponse(trnRespNumTable);

// Setting up the algorithm object
training::Batch<> algorithm;
algorithm.input.set(training::data, trnFeatures);
algorithm.input.set(training::dependentVariables, trnResponse);

// Training
algorithm.compute();

// Extracting the result (it holds the trained model)
services::SharedPtr<training::Result> trainingResult;
trainingResult = algorithm.getResult();

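The example uses the default template parameters of the Batch class, but they can be set explicitly. The first parameter selects the floating-point precision (float or double), and the second selects the method, that is, the mathematical algorithm used for the computation.

// Floating-point type, then computation method
// (normEqDense = normal equations, the default; qrDense = QR decomposition)
training::Batch<double, training::qrDense> algorithm;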
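For linear regression, training means estimating one coefficient per feature plus an intercept. The sketch below shows one way to do this in plain C++: it builds the normal equations (X^T X) b = X^T y and solves them with Gaussian elimination. It is an illustrative stand-in, not DAAL's implementation, although DAAL's default method (`normEqDense`) is also based on normal equations. The function name `trainLinearRegression` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Fits y = b0 + b1*x1 + ... + bp*xp. X is row-major (n rows, p columns).
// Returns {b0, b1, ..., bp}.
std::vector<double> trainLinearRegression(const std::vector<double>& X,
                                          const std::vector<double>& y,
                                          std::size_t n, std::size_t p) {
    const std::size_t m = p + 1;  // number of coefficients, including the intercept
    // Augmented system [A | r], with A = X^T X and r = X^T y (X has a leading 1s column).
    std::vector<std::vector<double>> A(m, std::vector<double>(m + 1, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> row(m, 1.0);  // row[0] = 1 for the intercept
        for (std::size_t j = 0; j < p; ++j) row[j + 1] = X[i * p + j];
        for (std::size_t a = 0; a < m; ++a) {
            for (std::size_t b = 0; b < m; ++b) A[a][b] += row[a] * row[b];
            A[a][m] += row[a] * y[i];
        }
    }
    // Forward elimination with partial pivoting.
    for (std::size_t col = 0; col < m; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < m; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[pivot][col])) pivot = r;
        std::swap(A[col], A[pivot]);
        assert(std::fabs(A[col][col]) > 1e-12 && "X^T X is singular");
        for (std::size_t r = col + 1; r < m; ++r) {
            const double f = A[r][col] / A[col][col];
            for (std::size_t c = col; c <= m; ++c) A[r][c] -= f * A[col][c];
        }
    }
    // Back substitution.
    std::vector<double> beta(m);
    for (std::size_t k = m; k-- > 0;) {
        double s = A[k][m];
        for (std::size_t c = k + 1; c < m; ++c) s -= A[k][c] * beta[c];
        beta[k] = s / A[k][k];
    }
    return beta;
}
```

Normal equations can become numerically unstable when features are strongly correlated, as high-order polynomial terms often are. This is one reason a QR-based method is offered as an alternative.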


Creating the Prediction Model

The prediction step requires a completed training step, because it uses the model produced there. Like training, it uses a Batch object with similar inputs. It needs two inputs: the test data and the trained model, which is extracted from the training result object. The algorithm then computes the predicted responses for the test feature vectors. With a single dependent variable, the result table has one column, so its rows can be read as a one-dimensional array of predictions.

// ... Set up: training algorithm (named algorithm, as above) ... //
services::SharedPtr<training::Result> trainingResult;
trainingResult = algorithm.getResult();
services::SharedPtr<NumericTable> testFeatures(tstFeatNumTable);
// ... Set up: populating testFeatures  ... //

// Creating the prediction algorithm object
prediction::Batch<> predictionAlgorithm;
predictionAlgorithm.input.set(prediction::data, testFeatures);
predictionAlgorithm.input.set(prediction::model, trainingResult->get(training::model));

// Prediction
predictionAlgorithm.compute();

// Extracting the result: one row of predicted responses per test vector
services::SharedPtr<prediction::Result> predictionResult;
predictionResult = predictionAlgorithm.getResult();
services::SharedPtr<NumericTable> predictions = predictionResult->get(prediction::prediction);
BlockDescriptor<double> resultBlock;
predictions->getBlockOfRows(0, predictions->getNumberOfRows(), readOnly, resultBlock);
double* result = resultBlock.getBlockPtr();
// ... use result ... //
predictions->releaseBlockOfRows(resultBlock);
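Once the coefficients are known, the prediction for a test vector is a weighted sum of its features plus the intercept. Here is a minimal plain C++ sketch. It assumes the coefficient layout {b0, b1, ..., bp} with the intercept first, which is a hypothetical convention for this example. `predictLinearRegression` is also a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Predicts one response per test vector (X is row-major, n rows, p features).
std::vector<double> predictLinearRegression(const std::vector<double>& beta,
                                            const std::vector<double>& X,
                                            std::size_t n, std::size_t p) {
    assert(beta.size() == p + 1 && X.size() == n * p);
    std::vector<double> predictions(n);
    for (std::size_t i = 0; i < n; ++i) {
        double yhat = beta[0];  // intercept
        for (std::size_t j = 0; j < p; ++j) yhat += beta[j + 1] * X[i * p + j];
        predictions[i] = yhat;
    }
    return predictions;
}
```

With b0 = 1 and b1 = 2, the inputs 4 and 10 give the predictions 9 and 21.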

TrainingPrediction.jpg
This image shows what has been created so far. The training step built a model from the features and responses, and the prediction step uses that model to compute responses for new feature vectors.

Implementation

This example uses polynomial regression, so the feature data must be expanded into higher-order terms before training. Once the features are expanded, training and prediction work as shown above.

// Getting data from the source
FileDataSource<CSVFeatureManager> featuresSrc(trainingFeaturesFile,
    DataSource::doAllocateNumericTable,
    DataSource::doDictionaryFromContext);
featuresSrc.loadDataBlock(nTrnVectors);

// Creating a block object to extract data
BlockDescriptor<double> features_block;
featuresSrc.getNumericTable()->getBlockOfRows(0, nTrnVectors,
    readOnly, features_block);

// Getting the pointer to the data
double* featuresArray = features_block.getBlockPtr();

// Expanding the data with the project's expand_feature_vector helper
// (see source for full implementation)
const size_t features_count = nFeatures*expansion*nTrnVectors;
double* expanded_trnFeatures = new double[features_count];
expand_feature_vector(featuresArray, expanded_trnFeatures,
    nFeatures, nTrnVectors, expansion);
featuresSrc.getNumericTable()->releaseBlockOfRows(features_block);

// Repackaging the result into a numeric table
// (expanded_trnFeatures must stay allocated while the table is in use)
trainingFeaturesTable = services::SharedPtr<NumericTable>(
    new HomogenNumericTable<double>(expanded_trnFeatures,
        nFeatures*expansion, nTrnVectors));
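The expansion helper itself is not shown here. The sketch below is a hypothetical version named `expandPolynomialFeatures`; it is not the project's `expand_feature_vector`. It writes x, x^2, ..., x^expansion for every feature, so each row grows from `nFeatures` to `nFeatures * expansion` columns, which matches the table dimensions used above. The project's helper may order the columns differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For each row (row-major, nFeatures values), writes x^1 .. x^expansion per feature.
void expandPolynomialFeatures(const double* in, double* out, std::size_t nFeatures,
                              std::size_t nVectors, std::size_t expansion) {
    assert(expansion >= 1);
    for (std::size_t i = 0; i < nVectors; ++i) {
        double* outRow = out + i * nFeatures * expansion;
        for (std::size_t f = 0; f < nFeatures; ++f) {
            const double x = in[i * nFeatures + f];
            double power = x;
            for (std::size_t k = 0; k < expansion; ++k) {
                outRow[f * expansion + k] = power;  // x^(k+1)
                power *= x;
            }
        }
    }
}
```

A first-order expansion (expansion = 1) reproduces the original features, so plain linear regression is the special case.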


TableValue.jpg
Reference values compared with the predictions for 1st to 3rd order expansions


TableValueGraph.jpg
The information plotted graphically
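One way to summarize the comparison between reference values and predictions is to compute the mean squared error for each expansion order. A lower value means the predictions track the reference more closely. This metric is an assumption for illustration and not necessarily the one behind the table above. `meanSquaredError` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean squared error between reference values and predictions.
double meanSquaredError(const std::vector<double>& reference,
                        const std::vector<double>& predicted) {
    assert(!reference.empty() && reference.size() == predicted.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double d = reference[i] - predicted[i];
        sum += d * d;
    }
    return sum / static_cast<double>(reference.size());
}
```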