Changes

Jump to: navigation, search

GPU621/Intel Data Analytics Acceleration Library

4,539 bytes added, 07:27, 9 December 2021
no edit summary
Intel Data Analytics Acceleration Library is a library of optimized algorithmic building blocks for data analysis. It provides tools to build compute-intense applications that run fast on Intel architecture. It is optimized for CPUs and GPUs and includes algorithms for analysis functions, math functions, and training and library prediction functions for C++, Java and machine-learning Python libraries.
[[File:blocks.png|Left|1200px]]
=== What can you do with it? ===
Intel Data Analytics Acceleration Library is a toolkit specifically designed to provide the user the tools it needs to optimize compute-intense applications that use intel architecture. This involves analyzing datasets with the limited compute resources the user has available, optimize predictions created by systems to make them run faster, optimize data ingestion and algorithmic compute simultaneously. All of this while still being able to be used in offline, streaming or even distributed models allowing you to download the library or use it from cloud.
 
=== Why use it? ===
When working with Intel DAAL is supports the concept of the end to end analytics when some of data analytics stages are preformed on edge devices. These edge devices are close to where data is generated and consumed for example a router. For example, an edge device could be any device that controls data flow at the boundary of two networks like a phone, computer or IoT sensor. Intel DAAL APIs are agnostic about a particular cross-device communication technology this way they can actually be used with different types of end to end analytical frameworks.
 
[[File:End.png|Left|1200px]]
=== Features ===
==== Portability ====
Working with Intel Data Analytics Acceleration Library gives a range of languages it can support and integrate with. These languages are Python, Java, C and C++. As these languages are common in modern coding Intel Data Analytics Acceleration Library will be able to integrate with most applications.
 
==== Components ====
Intel DAAL consist of three major components:
* Data management
* Algorithms
* Services
 
[[File:compo.png|Left|1200px]]
 
==== Data Management ====
The data management portion of the Intel DAAL includes the classes and utilities used for data acquisition, initial preprocessing and normalization, “for” data conversion into numeric formats done by a supported data source and “for” model representation. An example would be the NumericTable class which is used for in-memory data manipulation. There is also the Model class which would be used to mimic any actual data that would be missing. If we move more towards more generic methods for dictionary manipulation there is DataSourceDictionary and NumericTableDictionary which are for things like accessing a particular data feature, adding a new feature or setting and retrieving the number of features.
 
==== Algorithms ====
As we move to look at algorithms it’s important to note that we are referring to classes that implement algorithms for data analysis and data modeling. They can be broke up into three computation modes: Batch processing, Online processing and Distributed processing.
 
Batch processing mode, the algorithm will work with the entire data set in mind to produce the output/results. This can be an issue when an entire data set is not available or the data set is too large to fit onto in the device’s memory.
 
Online Processing mode will incrementally update the results with partial results as the data set is streamed into the device’s memory in blocks.
 
Distributed processing mode will work on the data set across several devices. While it is working on the data set the devices will create partial results and upon the final result being completed all results will be merged into one.
 
==== Services ====
When looking at the services component it will include classes that are used across data management and algorithm components. To be more specific these classes will enable memory allocation, error handling and implementation of collections and shared pointers. These come in the form of collects which allow you to store different types of objects in a unified way. This comes in use when handling outputs or inputs of algorithms in use and at times can be used for error handling. How Intel DAAL will handle enabling memory for operations that require them or any deallocation is with shared pointers.
 
=== In-Depth Algorithm Support ===
* Limited-Memory BFGS (L-BFGS) Optimization Solver
* Linear Regression
 
==== Deeper analysis on supported algorithms ====
Analysis Algorithms:
* Low Order Moments: Includes computing min, max, deviation, variance for the data set
* Principal Components Analysis: Used for dimensionality reduction
* Clustering: Taking groups of data and putting them into unlabeled groups.
* Outlier detection: Identifying data that abnormal compared to typical data.
* Association rules mining: This will identify any cooccurrence patterns.
* Quantiles: Splitting data into equal groups
* Data transformation through matrix decomposition
* Correlation matrix and variance-covariance matrix:For understanding statistical dependence among variables given
* Cosine distance matrix: Determining pairwise distance using cosine
* Correlation distance matrix: Determining pairwise distance by using other objects/items
 
==== Supported CPU & GPU Algorithms via DPC++ Interfaces ====
* Density-based Special Clustering of Applications with Noise (DBSCAN)
* Random Forest
 
==== Training and prediction algorithms ====
Regression
* Ex. Linear regression: Predicting by using a linear equation to try to predict the relationship between variables that are explanatory (Known) or dependent.
 
Classification
* This would be trying to build a model to assign items into groups that would be uniquely labeled.
 
Recommendation Systems
* These systems would try to predict “preferences” a user would give by using commonly recognized examples
 
Neural networks
* Using algorithms they try to identify hidden patterns and any correlations that are within raw data, cluster and classifying it (More of a trial and error)
 
==== Code Examples ====
13
edits

Navigation menu