Open main menu

CDOT Wiki β

Changes

GPU621/Intel DAAL

1,979 bytes added, 13:28, 7 December 2022
Intel DAAL
== Intel DAAL ==
Intel Data Analytics Acceleration Library is essentially a library which is optimized to work with large data sources and analytics. It covers the comprehensive range of tasks that arise when working with large data, from preprocessing, transformation, analysis, modeling, validation and decision making. This makes it quite flexible as it can be used in many end-to-end analytics frameworks. What this means is that we can use this library to extract data from files, store files, structure the data in the files in an orderly way and perform complex operations on that data - all within the same library.
[[File:alow1.jpg]]
Having a complete framework is a very powerful perk to have as we can be assured that all parts of the system will link together. This appears to be one of the main appeals of the system as we will not have to worry if how we are handling data sets, for example reading a large csv file, will affect our ability to process them algorithmically.
The framework is composed of 3 major components: data management + algorithms + services
1) Download Intel oneAPI Base Toolkit
2) Within VS Code -> Project Properties -> intel libraries for oneAPI -> use oneDAL
[[File:alow2.jpg]]
== Computation Modes ==
 
Computation modes refer to how the functions in the library interact with the data that the data management part of the library handled. This can be divided into 3 types of interactions, batching, online and distributed. While batching and online types are just efficient code handling of large data sets, distributed processing falls under what we know as parallel processing. One of the reasons why the computation is quick is simply because the data being accessed has been efficiently organized by the data management part of the library. Additionally, many functions in the library have the potential to be used by two or even three different computation modes. This gives the developer a lot of freedom in deciding how to handle the data. If the data set is not so large, it may be much better to do batch processing. Whereas if the data set is actually very large, it may make a lot more sense to use distributed processing.
'''Batching:'''
The majority of Many common functions in the library appears appear to be simple batch processing. I believe batch processing is the equivalent to serial code, where the algorithm works with the entire block of data at once. However, the library is still quite optimized even in these situations. For example the sort function:
[[File:alow5.jpg]]
[[File:alow4.jpg]]
We can see that the DAAL version versus a vector quick sort is much slower with a small data collection but as the data set gets larger and larger it starts to outperform the quick sort more and more. This shows that there is a small amount of overhead when calling this function and to only use it for larger data sets.
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/sorting/sorting_dense_batch.cpp Batch Sort Code Link]
'''Distributed:'''
The final method of processing in the library is distributed processing. This is exactly what it sounds like, the library now forks different chunks of data to different compute nodes before finally rejoining all the data in one place. These functions are obviously best used for larger data sets and more complex operations.One example Below are a list of this is K-means clusteringalgorithms, all of which is basically just modelling vectors are optimized for large data sets and seeing where they end up clustering arounduse distributed processing.
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/kmeans/kmeans_dense_distr.cpp K-means Code Example]
 
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/moments/low_order_moms_dense_distr.cpp Moments of Low Order Example]
 
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/pca/pca_cor_dense_distr.cpp Principle Component Analysis]
 
[https://github.com/oneapi-src/oneDAL/blob/master/examples/daal/cpp/source/svd/svd_dense_distr.cpp Distributed Processing]
[[File:alow7.jpg]]
42
edits