===Optimized with DAAL===
Intel's Data Analytics Acceleration Library (DAAL) targets big data problems and provides optimized algorithmic building blocks for constructing efficient solutions.
The library includes algorithms for a range of machine learning problems, including linear regression.
Two data sets were generated with the serial version of the regression algorithm. The serial version was run twice, and the x[N] and y[N] arrays produced by its normal random number generator were written to two CSV files, train.csv and test.csv. The x[N] and y[N] values in both files follow the normal distribution defined in the serial code, with N = 99,999,999.
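The serial generator itself is not reproduced in this section. As a rough sketch of how such files could be written, the hypothetical helper below draws x from a normal distribution and sets y to a noisy linear function of x. The function name, the seed, the noise level and the default slope and intercept are all assumptions chosen for illustration, not the values used by the serial program. Each row is written as "x,y", so the feature column comes first and the dependent variable second.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <random>
#include <string>

// Hypothetical helper: writes n rows of "x,y" to `path` and returns the
// number of rows written (0 if the file could not be opened).
// x ~ N(xMean, xStd); y = slope * x + intercept + noise, noise ~ N(0, noiseStd).
// Every default value here is an illustrative assumption.
std::size_t writeRegressionCsv(const std::string& path, std::size_t n,
                               unsigned seed, double slope = 0.5,
                               double intercept = 1.0, double xMean = 0.0,
                               double xStd = 1.0, double noiseStd = 0.1)
{
    assert(xStd > 0.0 && noiseStd > 0.0);

    std::mt19937_64 gen(seed);
    std::normal_distribution<double> xDist(xMean, xStd);
    std::normal_distribution<double> noise(0.0, noiseStd);

    std::ofstream out(path);
    if (!out) return 0;

    std::size_t written = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = xDist(gen);
        const double y = slope * x + intercept + noise(gen);
        out << x << ',' << y << '\n';  // feature first, dependent variable second
        ++written;
    }
    return written;
}
```

Calling the helper twice with different seeds, for example `writeRegressionCsv("train.csv", n, 1)` and `writeRegressionCsv("test.csv", n, 2)`, would give two independent samples from the same distribution.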
The DAAL example file "lin_reg_norm_eq_dense_batch.cpp" was adapted to test the linear regression model. First, "trainModel()" is called. This function reads "train.csv" and splits its columns between a table of independent variables and a table of dependent variables; here the problem is simple regression with 1 independent and 1 dependent variable. An optimized training algorithm is then initialized, the training data and dependent values are passed in, and the model is trained on the contents of the CSV file. The training result is a line-of-best-fit model for the data. "testModel()" is then called, which initializes a prediction algorithm. It passes the independent values from "test.csv" into the trained model, which predicts the corresponding dependent values.
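For a single independent variable, the normal equations reduce to a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the two means. The sketch below shows that calculation in plain C++, without DAAL. It is meant only to illustrate what the training step solves for; `LineFit`, `fitSimpleRegression` and `predict` are hypothetical names, and DAAL's internal implementation is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for a fitted line y = intercept + slope * x.
struct LineFit {
    double intercept;
    double slope;
};

// Least-squares fit for one independent variable, i.e. the closed-form
// solution of the normal equations in the 1-feature case.
LineFit fitSimpleRegression(const std::vector<double>& x,
                            const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());

    double sumX = 0.0, sumY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sumX += x[i];
        sumY += y[i];
    }
    const double meanX = sumX / n;
    const double meanY = sumY / n;

    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        sxx += dx * dx;               // sum of squared deviations of x
        sxy += dx * (y[i] - meanY);   // sum of cross deviations of x and y
    }
    assert(sxx > 0.0);  // x must not be constant

    const double slope = sxy / sxx;
    return LineFit{meanY - slope * meanX, slope};
}

// Prediction step: feed an independent value into the fitted line.
double predict(const LineFit& fit, double x)
{
    return fit.intercept + fit.slope * x;
}
```

Fitting points that lie exactly on y = 0.5x + 1 recovers a slope of 0.5 and an intercept of 1, up to floating-point rounding.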
The fitted model was y = 0.5x + 1, which matches the random data stored in train.csv and test.csv almost exactly.
#include "daal.h"
#include "service.h"
using namespace std;
using namespace daal;
using namespace daal::algorithms::linear_regression;
/* Input data set parameters */
string trainDatasetFileName = "train.csv";
string testDatasetFileName = "test.csv";
const size_t nFeatures = 1; /* Number of features in training and testing data sets */
const size_t nDependentVariables = 1; /* Number of dependent variables that correspond to each observation */
void trainModel();
void testModel();
training::ResultPtr trainingResult;
prediction::ResultPtr predictionResult;
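int main(int argc, char *argv[])
{
    //checkArguments(argc, argv, 2, &trainDatasetFileName, &testDatasetFileName);

    /* Train on train.csv, then predict on test.csv with the trained model */
    trainModel();
    testModel();

    return 0;
}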
void trainModel()
{
    /* Initialize FileDataSource<CSVFeatureManager> to retrieve the input data from a .csv file */
    FileDataSource<CSVFeatureManager> trainDataSource(trainDatasetFileName,
                                                      DataSource::notAllocateNumericTable,
                                                      DataSource::doDictionaryFromContext);

    /* Create Numeric Tables for training data and dependent variables */
    NumericTablePtr trainData(new HomogenNumericTable<>(nFeatures, 0, NumericTable::doNotAllocate));
    NumericTablePtr trainDependentVariables(new HomogenNumericTable<>(nDependentVariables, 0, NumericTable::doNotAllocate));
    NumericTablePtr mergedData(new MergedNumericTable(trainData, trainDependentVariables));

    /* Retrieve the data from input file */
    trainDataSource.loadDataBlock(mergedData.get());

    /* Create an algorithm object to train the multiple linear regression model with the normal equations method */
    training::Batch<> algorithm;

    /* Pass a training data set and dependent values to the algorithm */
    algorithm.input.set(training::data, trainData);
    algorithm.input.set(training::dependentVariables, trainDependentVariables);

    /* Build the multiple linear regression model */
    algorithm.compute();

    /* Retrieve the algorithm results */
    trainingResult = algorithm.getResult();
    printNumericTable(trainingResult->get(training::model)->getBeta(), "Linear Regression coefficients:");
}
void testModel()
{
    /* Initialize FileDataSource<CSVFeatureManager> to retrieve the test data from a .csv file */
    FileDataSource<CSVFeatureManager> testDataSource(testDatasetFileName,
                                                     DataSource::doAllocateNumericTable,
                                                     DataSource::doDictionaryFromContext);

    /* Create Numeric Tables for testing data and ground truth values */
    NumericTablePtr testData(new HomogenNumericTable<>(nFeatures, 0, NumericTable::doNotAllocate));
    NumericTablePtr testGroundTruth(new HomogenNumericTable<>(nDependentVariables, 0, NumericTable::doNotAllocate));
    NumericTablePtr mergedData(new MergedNumericTable(testData, testGroundTruth));

    /* Load the data from the data file */
    testDataSource.loadDataBlock(mergedData.get());

    /* Create an algorithm object to predict values of multiple linear regression */
    prediction::Batch<> algorithm;

    /* Pass a testing data set and the trained model to the algorithm */
    algorithm.input.set(prediction::data, testData);
    algorithm.input.set(prediction::model, trainingResult->get(training::model));

    /* Predict values of multiple linear regression */
    algorithm.compute();

    /* Retrieve the algorithm results */
    predictionResult = algorithm.getResult();
    printNumericTable(predictionResult->get(prediction::prediction),
                      "Linear Regression prediction results: (first 10 rows):", 10);
    printNumericTable(testGroundTruth, "Ground truth (first 10 rows):", 10);
}
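The listing prints the first ten predictions and ground-truth values for visual comparison. A single summary number, such as the root mean squared error over all rows, can make that comparison more precise. The sketch below is an assumption about how this could be done and is not part of the DAAL example: `rootMeanSquaredError` is a hypothetical helper operating on plain vectors, so the values would first have to be copied out of the numeric tables.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: root mean squared error between predicted values
// and ground-truth values of equal length.
double rootMeanSquaredError(const std::vector<double>& predicted,
                            const std::vector<double>& truth)
{
    assert(predicted.size() == truth.size() && !predicted.empty());

    double sumSquares = 0.0;
    for (std::size_t i = 0; i < predicted.size(); ++i) {
        const double diff = predicted[i] - truth[i];
        sumSquares += diff * diff;
    }
    return std::sqrt(sumSquares / static_cast<double>(predicted.size()));
}
```

A value close to the standard deviation of the noise in the generated data would indicate that the fitted line explains everything except the random scatter.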
