== Intel DAAL ==
The Intel Data Analytics Acceleration Library (Intel DAAL) is a library optimized for working with large data sources and analytics. It covers the full range of tasks that arise when working with large data: preprocessing, transformation, analysis, modeling, validation and decision making. This makes it quite flexible, as it can be used in many end-to-end analytics frameworks (data management + algorithms + services). To optimize performance, Intel DAAL uses algorithms from the Intel Math Kernel Library as well as the Intel Integrated Performance Primitives. What this means is that we can use this library to extract data from files, store and structure that data in an orderly way, and perform complex operations on it, all within the same library.
[[File:alow1.jpg]]
Having a complete framework is a powerful advantage, because we can be assured that all parts of the system will work together. This appears to be one of the main appeals of the library: we do not have to worry about whether the way we handle data sets, for example reading a large CSV file, will affect our ability to process them algorithmically.
The framework is composed of three major components: data management, algorithms and services.
[[File:alow6.jpg]]
The '''data management''' part of the system is critical to the overall structure, since data must be formatted in such a way that the algorithms can operate on it swiftly and efficiently. It also handles compression and decompression of very large data sets. This is the part of the system that reads long CSV files and puts the data into structures where the algorithms can access it. Additionally, it handles the data in such a way that the algorithms can still work even if parts of the data are missing. The following image is an example of how data management structures data by putting it within a "data set". In the data set, table rows represent observations and columns represent features.
[[File:alow9.jpg]]
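 #include <fstream>
 #include <string>
 
 // Reads the whole file into a newly allocated buffer (*data) and returns its size in bytes.
 // daal::byte comes from the DAAL headers; fileOpenError, checkAllocation and fileReadError
 // are error-handling helpers defined elsewhere in the example code.
 size_t readTextFile(const std::string& datasetFileName, daal::byte** data)
 {
     // Open in binary mode, positioned at the end so tellg() reports the file size
     std::ifstream file(datasetFileName.c_str(), std::ios::binary | std::ios::ate);
     if (!file.is_open())
     {
         fileOpenError(datasetFileName.c_str());
     }
     std::streampos end = file.tellg();
     file.seekg(0, std::ios::beg);
     size_t fileSize = static_cast<size_t>(end);
     (*data) = new daal::byte[fileSize];
     checkAllocation(data);
     if (!file.read((char*)(*data), fileSize))
     {
         delete[] (*data); // free the buffer itself, not the pointer to it
         fileReadError();
     }
     return fileSize;
 }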
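The data set layout described above (rows as observations, columns as features, with missing values tolerated) can be illustrated with a minimal sketch in plain C++. This is not the DAAL API: <code>DenseTable</code> and <code>parseCsv</code> are hypothetical names used only for illustration, and storing missing fields as NaN is an assumption about one reasonable way to mark them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a numeric table (not a DAAL class).
// Values are stored row-major: one row per observation, one column per feature.
struct DenseTable {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;

    double at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return values[r * cols + c];
    }
};

// Parse comma-separated text into a DenseTable with a fixed number of columns.
// Empty or non-numeric fields, and fields missing from short rows, become NaN
// so that later stages can detect them.
DenseTable parseCsv(const std::string& text, std::size_t cols) {
    assert(cols > 0);
    const double missing = std::numeric_limits<double>::quiet_NaN();
    DenseTable table{0, cols, {}};
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;  // skip blank lines
        std::istringstream fields(line);
        std::string field;
        std::size_t c = 0;
        while (c < cols && std::getline(fields, field, ',')) {
            char* end = nullptr;
            double v = std::strtod(field.c_str(), &end);
            // strtod consumed nothing: treat the field as missing
            table.values.push_back(end == field.c_str() ? missing : v);
            ++c;
        }
        for (; c < cols; ++c) table.values.push_back(missing);  // pad short rows
        ++table.rows;
    }
    return table;
}
```

An algorithm working on such a table can check <code>std::isnan(table.at(r, c))</code> before using a value, which is one simple way to skip missing entries.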
== How to enable: ==
1) Download and install the Intel oneAPI Base Toolkit
2) Within Visual Studio -> Project Properties -> Intel Libraries for oneAPI -> Use oneDAL
[[File:alow2.jpg]]
== Computation Modes ==
'''Batching:'''
[[File:alow5.jpg]]
[[File:alow4.jpg]]
We can see that the DAAL sort is much slower than a quick sort on a vector when the data collection is small, but as the data set gets larger it outperforms the quick sort by a growing margin. This suggests that calling the DAAL function carries a fixed overhead, so it is best reserved for larger data sets.
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/sorting/sorting_dense_batch.cpp Batch Sort Code Link]
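A comparison like the one above can be reproduced with a small timing harness. The following is a minimal sketch in plain C++, not the code used to produce the charts; <code>makeData</code> and <code>timeSortMicros</code> are hypothetical helper names, and the sort being measured is passed in as a callable so that either <code>std::sort</code> or a library sort can be timed the same way.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Fill a vector with pseudo-random values from a fixed seed so runs are repeatable.
std::vector<double> makeData(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> v(n);
    for (double& x : v) x = dist(gen);
    return v;
}

// Time one call of sortFn on a private copy of the input, in microseconds.
// The correctness check runs after the clock has stopped.
template <typename SortFn>
double timeSortMicros(std::vector<double> input, SortFn sortFn) {
    auto start = std::chrono::steady_clock::now();
    sortFn(input);
    auto stop = std::chrono::steady_clock::now();
    assert(std::is_sorted(input.begin(), input.end()));
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

Calling <code>timeSortMicros</code> for a range of sizes (for example 10^3 up to 10^7 elements) with each sort in turn would show where the fixed overhead stops dominating. A single run per size is noisy, so averaging several runs is advisable.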
'''Online:'''
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/svd/svd_dense_online.cpp SVD Code Example]
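In online mode the data arrives in blocks, a partial result is updated after each block, and the final result is computed once all blocks have been seen. The sketch below illustrates that pattern in plain C++ with a running mean and variance instead of SVD, purely to keep it short. <code>PartialMoments</code>, <code>update</code> and <code>finalize</code> are hypothetical names and not part of the DAAL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Partial result carried between blocks: enough to recover mean and variance.
struct PartialMoments {
    std::size_t n = 0;
    double sum = 0.0;
    double sumSq = 0.0;
};

// Fold one block of data into the partial result. Only this block needs to be
// in memory; earlier blocks are already summarized in p.
void update(PartialMoments& p, const std::vector<double>& block) {
    for (double x : block) {
        ++p.n;
        p.sum += x;
        p.sumSq += x * x;
    }
}

struct Moments {
    double mean;
    double variance;  // population variance
};

// Turn the accumulated partial result into the final statistics.
Moments finalize(const PartialMoments& p) {
    assert(p.n > 0);
    double mean = p.sum / static_cast<double>(p.n);
    double variance = p.sumSq / static_cast<double>(p.n) - mean * mean;
    return Moments{mean, variance};
}
```

The sum-of-squares formula is used here for brevity; it can lose precision on data with a large mean, so a production implementation would likely prefer a more numerically stable update.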
'''Distributed:'''
The final method of processing in the library is distributed processing. This is exactly what it sounds like: the library splits the data into chunks and sends them to different compute nodes, then combines the results in one place. These functions are best used for larger data sets and more complex operations. One example is K-means clustering, which models the data as vectors and finds the points they cluster around. Below is a list of algorithms, all of which are optimized for large data sets and use distributed processing.
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/kmeans/kmeans_dense_distr.cpp K-means Code Example]
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/moments/low_order_moms_dense_distr.cpp Moments of Low Order Example]
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/pca/pca_cor_dense_distr.cpp Principal Component Analysis]
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/svd/svd_dense_distr.cpp SVD Distributed Processing Example]
[[File:alow7.jpg]]
Locally, our code operates on different chunks.
[[File:alow8.jpg]]
On the master thread, all operations eventually rejoin.
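The local and master steps shown in the two diagrams can be sketched for a single K-means iteration. This is a simplified illustration in plain C++ using one-dimensional points, not the DAAL implementation; <code>PartialClusters</code>, <code>localStep</code> and <code>masterStep</code> are hypothetical names, and in a real deployment each call to <code>localStep</code> would run on a different node.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Partial result produced by one node: per-cluster sum of points and point count.
struct PartialClusters {
    std::vector<double> sums;
    std::vector<std::size_t> counts;
};

// Local step: assign each point in this chunk to its nearest centroid and
// accumulate per-cluster sums and counts. Only the local chunk is touched.
PartialClusters localStep(const std::vector<double>& chunk,
                          const std::vector<double>& centroids) {
    assert(!centroids.empty());
    PartialClusters p{std::vector<double>(centroids.size(), 0.0),
                      std::vector<std::size_t>(centroids.size(), 0)};
    for (double x : chunk) {
        std::size_t best = 0;
        for (std::size_t k = 1; k < centroids.size(); ++k) {
            if (std::fabs(x - centroids[k]) < std::fabs(x - centroids[best])) best = k;
        }
        p.sums[best] += x;
        ++p.counts[best];
    }
    return p;
}

// Master step: merge the partial results from all nodes and compute the
// updated centroids. A cluster that received no points keeps its old centroid.
std::vector<double> masterStep(const std::vector<PartialClusters>& partials,
                               const std::vector<double>& oldCentroids) {
    std::vector<double> next = oldCentroids;
    for (std::size_t k = 0; k < oldCentroids.size(); ++k) {
        double sum = 0.0;
        std::size_t count = 0;
        for (const PartialClusters& p : partials) {
            sum += p.sums[k];
            count += p.counts[k];
        }
        if (count > 0) next[k] = sum / static_cast<double>(count);
    }
    return next;
}
```

Only the small per-cluster sums and counts travel to the master, never the raw points, which is what makes this pattern attractive for large data sets. Repeating the two steps until the centroids stop moving gives the full algorithm.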