Alpha Centauri

2,068 bytes added, 13:50, 22 December 2017
== Introduction ==
Intel Data Analytics Acceleration Library, also known as Intel DAAL, is a library created by Intel in 2015 to solve problems associated with Big Data and Machine Learning.<br/>It is available for Linux, OS X, and Windows platforms and works with the following programming languages: C++, Python, and Java.<br/>It is also optimized to run on a wide range of devices, from home computers to data centers, and it uses vectorization to deliver the best performance.<br/>
Intel DAAL helps speed big data analytics by providing highly optimized algorithmic building blocks for all data analysis stages and by supporting different processing modes.
== How Intel DAAL Works ==
Intel DAAL comes pre-bundled with Intel® Parallel Studio XE and Intel® System Studio. It is also available as a stand-alone version and can be installed by following these [https://software.intel.com/en-us/get-started-with-daal-for-linux instructions].<br/>
Intel DAAL is a simple and efficient solution to solve problems related to Big Data, Machine Learning, and Deep Learning.<br/>
This is because it handles the complex and tedious algorithms for you, so software developers only have to worry about feeding in the data and following the Data Analytics Ecosystem Flow.<br/>
The following pictures show how Intel DAAL works in greater detail.
[[File:Daal-flow.png]] <br/>
This picture shows the Data Flow in Intel DAAL.
The picture shows the data being fed to the program and all the steps that Intel DAAL goes through when processing the data.
[[File:DaalModel.png|500px]] <br/>
This picture shows the Intel DAAL Model (Data Management, Algorithms, and Services).
This model represents all the functionalities that Intel DAAL offers from grabbing the data to making a final decision.
 
 
 
 
 
[[File:DAALDataflow.PNG]] <br/>
Handwritten digit recognition is a good example of a machine learning problem. Intel DAAL does a good job at solving this problem by providing several relevant application algorithms such as Support Vector Machine (SVM), Principal Component Analysis (PCA), Naïve Bayes, and Neural Networks. The example below uses SVM to solve this problem.
Recognition is essentially the prediction or inference stage in the machine learning pipeline. In simple words, when given a handwritten digit, the system should be able to determine which digit had been written. In order for a system to be able to predict the output with a given set of data, it needs a trained model learned from the training data set that would provide the system with the capability to make an inference or a calculated prediction. The first step before constructing a training model is to collect training data from the given data within the .csv data set files.
=== Loading Data in Intel DAAL ===
Data is read from CSV files through a FileDataSource<CSVFeatureManager> object such as trainDataSource, which loads the file into memory as a numeric table. With the CSVFeatureManager, the table is created automatically, and the data is loaded by calling the member function loadDataBlock(). The objects used throughout the example are declared first.

Setting Training and Prediction Models
<source lang=c++>
services::SharedPtr<svm::training::Batch<> > training(new svm::training::Batch<>());
services::SharedPtr<svm::prediction::Batch<> > prediction(new svm::prediction::Batch<>());
</source>
Setting Training and Prediction Algorithm Results
<source lang=c++>
services::SharedPtr<multi_class_classifier::training::Result> trainingResult;
services::SharedPtr<classifier::prediction::Result> predictionResult;
</source>
Setting up Kernel Function Parameters for Multi-Class Classifier
<source lang=c++>
kernel_function::rbf::Batch<> *rbfKernel = new kernel_function::rbf::Batch<>();
services::SharedPtr<kernel_function::KernelIface> kernel(rbfKernel);
services::SharedPtr<multi_class_classifier::quality_metric_set::ResultCollection> qualityMetricSetResult;
</source>
Initializing Numeric Tables for Predicted and Ground Truth
<source lang=c++>
services::SharedPtr<NumericTable> predictedLabels;
services::SharedPtr<NumericTable> groundTruthLabels;
</source>
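To give a feel for what a radial basis function (RBF) kernel computes, here is a minimal plain C++ sketch of the commonly used Gaussian form, K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)). It does not use Intel DAAL; the function name <code>rbfKernelValue</code> and the <code>sigma</code> parameter are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Gaussian RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
// rbfKernelValue and sigma are illustrative names, not Intel DAAL identifiers.
double rbfKernelValue(const std::vector<double>& x,
                      const std::vector<double>& y,
                      double sigma)
{
    assert(x.size() == y.size());  // both vectors must have the same number of features
    assert(sigma > 0.0);

    double squaredDistance = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double diff = x[i] - y[i];
        squaredDistance += diff * diff;
    }
    return std::exp(-squaredDistance / (2.0 * sigma * sigma));
}
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a smaller <code>sigma</code> makes that decay faster.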
 
=== Training Data in Intel DAAL ===
[[File:trainpic.jpg]]<br/>
Initialize FileDataSource<CSVFeatureManager> to retrieve input data from .csv file
<source lang=c++>
FileDataSource<CSVFeatureManager> trainDataSource(trainDatasetFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);
FileDataSource<CSVFeatureManager> trainGroundTruthSource(trainGroundTruthFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);
</source>
Load data from files
<source lang=c++>
trainDataSource.loadDataBlock(nTrainObservations);
trainGroundTruthSource.loadDataBlock(nTrainObservations);
</source>
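Conceptually, loading a block of rows from a CSV file means parsing each line into numbers and appending them to a table held in memory. The following is a rough sketch in plain C++, not the Intel DAAL implementation; <code>SimpleTable</code> and <code>loadCsvBlock</code> are hypothetical names, and the sketch assumes a rectangular file with purely numeric fields and no header row.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a numeric table: row-major values plus dimensions.
struct SimpleTable {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> values;
};

// Reads up to maxRows comma-separated rows of numbers from the stream.
SimpleTable loadCsvBlock(std::istream& in, std::size_t maxRows)
{
    SimpleTable table;
    std::string line;
    while (table.rows < maxRows && std::getline(in, line)) {
        if (line.empty()) continue;  // skip blank lines

        std::stringstream fields(line);
        std::string field;
        std::size_t colsInRow = 0;
        while (std::getline(fields, field, ',')) {
            table.values.push_back(std::stod(field));
            ++colsInRow;
        }

        if (table.rows == 0) table.cols = colsInRow;  // the first row fixes the width
        assert(colsInRow == table.cols);              // every row must have the same width
        ++table.rows;
    }
    return table;
}
```

Passing a row limit mirrors the way the example loads exactly nTrainObservations rows rather than the whole file.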
Initialize algorithm object for multi-class SVM training
<source lang=c++>
multi_class_classifier::training::Batch<> algorithm;
</source>
Setting algorithm parameters
<source lang=c++>
algorithm.parameter.nClasses = nClasses;
algorithm.parameter.training = training;
algorithm.parameter.prediction = prediction;
</source>
Pass dependent parameters and training data to the algorithm
<source lang=c++>
algorithm.input.set(classifier::training::data, trainDataSource.getNumericTable());
algorithm.input.set(classifier::training::labels, trainGroundTruthSource.getNumericTable());
</source>
Build the multi-class SVM model by calling the SVM computation on the algorithm
<source lang=c++>
algorithm.compute();
</source>
Retrieve the results from the algorithm and place them within the trainingResult object
<source lang=c++>
trainingResult = algorithm.getResult();
</source>
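An SVM separates two classes, so a multi-class classifier has to combine several two-class models. One common strategy is one-against-one voting: a two-class model is trained for every pair of classes, and the class that wins the most pairwise comparisons is chosen. The sketch below shows only the voting step in plain C++; it is an illustration under that assumption rather than Intel DAAL code, and <code>voteOneAgainstOne</code> and <code>pairwiseWinner</code> are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// pairwiseWinner[i][j], for i < j, holds the class (either i or j) chosen by the
// two-class classifier trained on classes i and j. Other entries are ignored.
std::size_t voteOneAgainstOne(const std::vector<std::vector<std::size_t>>& pairwiseWinner)
{
    const std::size_t nClasses = pairwiseWinner.size();
    assert(nClasses >= 2);

    std::vector<std::size_t> votes(nClasses, 0);
    for (std::size_t i = 0; i < nClasses; ++i) {
        assert(pairwiseWinner[i].size() == nClasses);
        for (std::size_t j = i + 1; j < nClasses; ++j) {
            const std::size_t winner = pairwiseWinner[i][j];
            assert(winner == i || winner == j);
            ++votes[winner];
        }
    }

    std::size_t best = 0;
    for (std::size_t c = 1; c < nClasses; ++c) {
        if (votes[c] > votes[best]) best = c;  // ties go to the lowest class index
    }
    return best;
}
```

With ten digit classes this scheme needs 45 pairwise models, which is why the two-class training and prediction objects are handed to the multi-class algorithm as parameters.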
Serialize the learned model into a disk file. The trained model held in trainingResult is written to the file ./model.
<source lang=c++>
ModelFileWriter writer("./model");
writer.serializeToFile(trainingResult->get(classifier::training::model));
</source>
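Serialization in general means turning the in-memory model into a byte sequence that can be read back later. As a minimal sketch of that idea in plain C++ (not the format used by the example or by Intel DAAL), the code below writes a vector of model weights to a stream and restores it; <code>serializeWeights</code> and <code>deserializeWeights</code> are hypothetical names, and the byte layout is an assumption that is only portable between machines with the same endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Writes the element count followed by the raw values.
void serializeWeights(std::ostream& out, const std::vector<double>& weights)
{
    const std::uint64_t count = weights.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    out.write(reinterpret_cast<const char*>(weights.data()),
              static_cast<std::streamsize>(count * sizeof(double)));
}

// Reads back what serializeWeights wrote: the count first, then the values.
std::vector<double> deserializeWeights(std::istream& in)
{
    std::uint64_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof(count));
    assert(in.good());

    std::vector<double> weights(static_cast<std::size_t>(count));
    in.read(reinterpret_cast<char*>(weights.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    assert(in.good());
    return weights;
}
```

Working on streams rather than file names keeps the sketch easy to try with an in-memory std::stringstream; a std::ofstream opened in binary mode would write the same bytes to disk.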
 
=== Testing The Trained Model ===
[[File:testpic.jpg]]<br/>
Initialize testDataSource to retrieve test data from a .csv file
<source lang=c++>
FileDataSource<CSVFeatureManager> testDataSource(testDatasetFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);
testDataSource.loadDataBlock(nTestObservations);
</source>
Initialize algorithm object for prediction of SVM values
<source lang=c++>
multi_class_classifier::prediction::Batch<> algorithm;
</source>
Setting algorithm parameters
<source lang=c++>
algorithm.parameter.nClasses = nClasses;
algorithm.parameter.training = training;
algorithm.parameter.prediction = prediction;
</source>
Pass into the algorithm the testing data and trained model
<source lang=c++>
algorithm.input.set(classifier::prediction::data, testDataSource.getNumericTable());
algorithm.input.set(classifier::prediction::model, trainingResult->get(classifier::training::model));
</source>
Predict multi-class SVM values
<source lang=c++>
algorithm.compute();
</source>
Retrieve results from algorithm
<source lang=c++>
predictionResult = algorithm.getResult();
</source>
=== Testing the Quality of the Model ===
Initialize testGroundTruth to retrieve ground truth test data from .csv file
<source lang=c++>
FileDataSource<CSVFeatureManager> testGroundTruth(testGroundTruthFileName, DataSource::doAllocateNumericTable, DataSource::doDictionaryFromContext);
testGroundTruth.loadDataBlock(nTestObservations);
</source>
Retrieve label for ground truth
<source lang=c++>
groundTruthLabels = testGroundTruth.getNumericTable();
</source>
 
Retrieve prediction label
<source lang=c++>
predictedLabels = predictionResult->get(classifier::prediction::prediction);
</source>
 
Create quality metric object to quantify the quality metrics of the classifier algorithm
<source lang=c++>
multi_class_classifier::quality_metric_set::Batch qualityMetricSet(nClasses);
services::SharedPtr<multiclass_confusion_matrix::Input> input =
qualityMetricSet.getInputDataCollection()->getInput(multi_class_classifier::quality_metric_set::confusionMatrix);
input->set(multiclass_confusion_matrix::predictedLabels, predictedLabels);
input->set(multiclass_confusion_matrix::groundTruthLabels, groundTruthLabels);
</source>
 
Compute quality
<source lang=c++>
qualityMetricSet.compute();
</source>
 
Retrieve quality results
<source lang=c++>
qualityMetricSetResult = qualityMetricSet.getResultCollection();
</source>
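The quality metrics above are derived from a confusion matrix, which counts how often each true class was predicted as each class. As a minimal plain C++ sketch of that idea (independent of Intel DAAL; <code>buildConfusionMatrix</code> and <code>accuracy</code> are hypothetical names, and labels are assumed to be integers from 0 to nClasses - 1):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using ConfusionMatrix = std::vector<std::vector<std::size_t>>;

// matrix[t][p] counts the samples whose true class is t and predicted class is p.
ConfusionMatrix buildConfusionMatrix(const std::vector<std::size_t>& groundTruth,
                                     const std::vector<std::size_t>& predicted,
                                     std::size_t nClasses)
{
    assert(groundTruth.size() == predicted.size());

    ConfusionMatrix matrix(nClasses, std::vector<std::size_t>(nClasses, 0));
    for (std::size_t i = 0; i < groundTruth.size(); ++i) {
        assert(groundTruth[i] < nClasses && predicted[i] < nClasses);
        ++matrix[groundTruth[i]][predicted[i]];
    }
    return matrix;
}

// Accuracy = correctly classified samples (the diagonal) / all samples.
double accuracy(const ConfusionMatrix& matrix)
{
    std::size_t correct = 0;
    std::size_t total = 0;
    for (std::size_t t = 0; t < matrix.size(); ++t) {
        for (std::size_t p = 0; p < matrix[t].size(); ++p) {
            total += matrix[t][p];
            if (t == p) correct += matrix[t][p];
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(correct) / static_cast<double>(total);
}
```

Off-diagonal entries show which digits are confused with each other, which is often more informative than the single accuracy figure.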
<br/>
=== SVM Digit Recognition Code Example Output ===
[[File:outputDAAL.jpg|950px]]
== Sources ==