GPU621/Jedd Clan

Jedd Chionglo
Gabriel Dizon
 
== Background ==
The Intel Data Analytics Acceleration Library (DAAL) is a library of optimized building blocks, with C++, Java and Python APIs, that covers every stage of data analytics, from data acquisition and preparation through model training and prediction.
To install the Intel DAAL library follow the [https://software.intel.com/en-us/get-started-with-daal-for-linux instructions].
== Features ==
*The Intel® Data Analytics Acceleration Library provides optimized building blocks for the various stages of data analysis
[[File:DataAnalyticsStages.jpg]]
 
== Data Management ==
*Raw Data Acquisition
*Data preparation
*Algorithm computation
[[File:ManagemenFlowDal.jpg]]
[[File:DataSet.jpg]]
 
== Building Blocks ==
*DAAL provides building blocks for every aspect of data analytics, from the tools used for managing data to the computational algorithms
[[File:BuildingBlocks.jpg]]
 
== Computations ==
*An algorithm must be chosen to suit the application
[[File:Algorithims.jpg]]
*Modes of Computation
**Batch Mode - the simplest mode; the algorithm processes the entire data set in a single pass
**Online Mode - the algorithm processes the data set incrementally in blocks, updating partial results as each block arrives (a minimal sketch follows the figure below)
**Distributed Mode - partial results are computed on separate portions of the data (for example on different nodes) and then combined into a final result
[[File:ComputationMode.jpg]]
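To make the difference between the modes concrete, the sketch below outlines online-mode training with the linear regression algorithm used later on this page. This is a minimal sketch rather than code from the example: nBlocks, featureBlocks and responseBlocks are assumed to be per-block numeric tables loaded elsewhere. In online mode compute() is called once per block, and finalizeCompute() merges the partial results into the final model.
<source lang=c++>
// Online-mode training (sketch): feed the data one block at a time
training::Online<> algorithm;

for (size_t i = 0; i < nBlocks; ++i)
{
    // featureBlocks[i] and responseBlocks[i] are SharedPtr<NumericTable> loaded elsewhere (assumed)
    algorithm.input.set(training::data, featureBlocks[i]);
    algorithm.input.set(training::dependentVariables, responseBlocks[i]);
    algorithm.compute();                 // update the partial result with this block
}

// Merge the partial results into the final trained model
algorithm.finalizeCompute();
services::SharedPtr<training::Result> trainingResult = algorithm.getResult();
</source>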
 
 
== How To Use Intel DAAL ==
The example below demonstrates the basics of the Intel DAAL library. It looks at the hydrodynamics of yachts and builds a predictive model from that data: a linear regression algorithm is trained on polynomial features of the measurements (in effect, polynomial regression), and the trained model is then used to predict responses for new feature vectors.
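The snippets in the sections that follow are fragments rather than complete programs. As a minimal sketch (the namespace layout may vary slightly between DAAL versions), they assume the DAAL umbrella header and the following using-declarations, so that names such as NumericTable, services::SharedPtr, training::Batch and prediction::Batch resolve to the DAAL data-management and linear regression classes:
<source lang=c++>
#include "daal.h"     // umbrella header for the DAAL data management and algorithm APIs
#include <iostream>   // used when printing results

using namespace daal;                                  // services::SharedPtr, etc.
using namespace daal::data_management;                 // NumericTable, FileDataSource, BlockDescriptor
using namespace daal::algorithms::linear_regression;   // training::Batch, prediction::Batch
</source>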
 
== How To Load Data ==
Intel DAAL requires input data to be stored in numeric tables. There are three different types of table:
* Heterogeneous - contains multiple data types
* Homogeneous - contains only one data type
* Matrices - used when matrix algebra is needed
The data can be loaded offline in two different ways:
 
Arrays
<source lang=c++>
// Array containing the data
const int nRows = 100;
const int nCols = 100;
double* rawData = (double*) malloc(sizeof(double)*nRows*nCols);
 
// Creating the numeric table
NumericTable* dataTable = new HomogenNumericTable<double>(rawData, nCols, nRows);
 
// Creating a SharedPtr table
services::SharedPtr<NumericTable> sharedNTable(dataTable);
</source>
 
CSV files - the number of rows would normally be determined at run time; in this example it is hard-coded to 1000
<source lang=c++>
string dataFileName = "/path/to/file/datafile.csv";
const int nRows = 1000; // number of rows to be read
// Create the data source
FileDataSource<CSVFeatureManager> dataSource(dataFileName,
DataSource::doAllocateNumericTable,
DataSource::doDictionaryFromContext);
// Load data from the CSV file
dataSource.loadDataBlock(nRows);
 
// Extract the NumericTable from the data source
services::SharedPtr<NumericTable> sharedNTable = dataSource.getNumericTable();
</source>
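For the training step later on, the feature columns and the response column(s) must end up in separate numeric tables. One way to do this, sketched below after the pattern used in the DAAL linear regression samples (the column counts and the reuse of dataFileName are assumptions), is to load the CSV data through a MergedNumericTable backed by two separate tables:
<source lang=c++>
// Assumed column layout: the first nFeatures columns are features,
// the remaining nResponses columns are the dependent variables
const size_t nFeatures  = 6;
const size_t nResponses = 1;

FileDataSource<CSVFeatureManager> dataSource(dataFileName,
    DataSource::notAllocateNumericTable,   // the tables below supply the storage
    DataSource::doDictionaryFromContext);

// Two empty tables that DAAL will fill, plus a merged view over both of them
services::SharedPtr<NumericTable> features(
    new HomogenNumericTable<double>(nFeatures, 0, NumericTable::doNotAllocate));
services::SharedPtr<NumericTable> responses(
    new HomogenNumericTable<double>(nResponses, 0, NumericTable::doNotAllocate));
services::SharedPtr<NumericTable> merged(new MergedNumericTable(features, responses));

// Loading into the merged table splits each row across the two underlying tables
dataSource.loadDataBlock(merged.get());
</source>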
 
== How to Extract Data ==
If the table is a homogeneous table, the underlying array can be extracted directly from it.
<source lang=c++>
// getArray() is a HomogenNumericTable method, so the table is declared with its concrete type
services::SharedPtr<HomogenNumericTable<double> > dataTable;
// ... Populate dataTable ... //

double* rawData = dataTable->getArray();
</source>
 
Data can also be acquired by reading a block of rows into a BlockDescriptor object, which works for any type of numeric table.
<source lang=c++>
services::SharedPtr<NumericTable> dataTable;
// ... Populate dataTable ... //

BlockDescriptor<double> block;
// offset is the first row to read, numRows is the number of rows to read,
// readOnly is the access mode (readOnly, writeOnly or readWrite),
// and block is the descriptor the rows are written into
dataTable->getBlockOfRows(offset, numRows, readOnly, block);
double* rawData = block.getBlockPtr();

// Release the block once the data is no longer needed
dataTable->releaseBlockOfRows(block);
</source>
 
 
== How to create the Training Model ==
After the data is loaded into numeric tables it is passed to a training algorithm, which produces a trained model. The algorithm takes two inputs: the feature vectors and the corresponding response values.
 
<source lang=c++>
// Setting up the training sets.
services::SharedPtr<NumericTable> trnFeatures(trnFeatNumTable);
services::SharedPtr<NumericTable> trnResponse(trnRespNumTable);
 
// Setting up the algorithm object
training::Batch<> algorithm;
algorithm.input.set(training::data, trnFeatures);
algorithm.input.set(training::dependentVariables, trnResponse);
 
// Training
algorithm.compute();
 
// Extracting the result
services::SharedPtr<training::Result> trainingResult;
trainingResult = algorithm.getResult();
</source>
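The trained model can also be inspected before it is used for prediction. As a small sketch (it assumes the linear regression Model class, whose getBeta() member returns the table of regression coefficients), the coefficients can be read back like any other numeric table:
<source lang=c++>
// Retrieve the regression coefficients (betas) from the trained model
services::SharedPtr<NumericTable> betaTable =
    trainingResult->get(training::model)->getBeta();

BlockDescriptor<double> betaBlock;
betaTable->getBlockOfRows(0, betaTable->getNumberOfRows(), readOnly, betaBlock);
double* beta = betaBlock.getBlockPtr();   // with the default settings the first coefficient in each row is the intercept

betaTable->releaseBlockOfRows(betaBlock);
</source>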
 
The example uses the default template arguments for the Batch object, but these can be configured: algorithmFPType sets the floating-point precision (float or double) and method selects the mathematical algorithm used for the computation.
<source lang=c++>
// General form: training::Batch<algorithmFPType, method>
// For example, single precision with the QR-decomposition-based training method
// (an assumption: linear regression also offers the default normal-equations method, normEqDense)
training::Batch<float, training::qrDense> algorithm;
</source>
 
 
== Creating the Prediction Model ==
The prediction step requires the training step to be completed first, because it consumes the model produced there. Like the training step it uses a Batch object with the same style of inputs, but this algorithm needs two different inputs: the test data and the trained model. The model is extracted from the training result object; once both inputs are set, the algorithm computes the predicted responses for the test feature vectors and returns them in a numeric table, which is then read out through a raw pointer.
<source lang=c++>
// ... Set up: training algorithm ... //
services::SharedPtr<training::Result> trainingResult;
trainingResult = algorithm.getResult();
services::SharedPtr<NumericTable> testFeatures(tstFeatNumTable);
// ... Set up: populating testFeatures ... //
 
// Creating the algorithm object
prediction::Batch<> algorithm;
algorithm.input.set(prediction::data, testFeatures);
algorithm.input.set(prediction::model, trainingResult->get(training::model));
 
// Computing the predictions
algorithm.compute();
 
// Extracting the result
services::SharedPtr<prediction::Result> predictionResult;
predictionResult = algorithm.getResult();

// Read every row of the prediction table (one predicted response per test feature vector)
services::SharedPtr<NumericTable> predictions = predictionResult->get(prediction::prediction);
BlockDescriptor<double> resultBlock;
predictions->getBlockOfRows(0, predictions->getNumberOfRows(), readOnly, resultBlock);
double* result = resultBlock.getBlockPtr();
</source>
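As a small usage sketch (the loop bound and output format are assumptions), the raw pointer can then be walked to print the predicted responses, and the block should be released once it is no longer needed:
<source lang=c++>
// Print the first few predicted responses (assumes one dependent variable per row)
for (size_t i = 0; i < 5 && i < predictions->getNumberOfRows(); ++i)
{
    std::cout << "Predicted response " << i << ": " << result[i] << std::endl;
}

// Release the block when done
predictions->releaseBlockOfRows(resultBlock);
</source>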