GPU621/Intel Data Analytics Acceleration Library

From CDOT Wiki
Revision as of 12:55, 28 November 2021 by Ang65 (talk | contribs)

Intel Data Analytics Acceleration Library

Group Members

- Adrian Ng

- Milosz Zapolski

- Muhammad Faaiz

Progress

100/100

Intel Data Analytics Acceleration

Introduction

Intel Data Analytics Acceleration Library is a library of optimized algorithmic building blocks for data analysis. It provides tools for building compute-intensive applications that run fast on Intel architecture. It is optimized for CPUs and GPUs and includes algorithms for analysis, math, training, and prediction, with interfaces for C++, Java, and Python machine-learning libraries.


What can you do with it?

Intel Data Analytics Acceleration Library is a toolkit designed to give users the tools they need to optimize compute-intensive applications that run on Intel architecture. This includes analysing datasets with the limited compute resources the user has available, making the predictions produced by a system run faster, and optimizing data ingestion and algorithmic computation together. The library supports offline, streaming, and distributed usage models, and it can be downloaded and run locally or used from the cloud.
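The difference between the offline, streaming, and distributed models can be illustrated with a small sketch. The code below is plain Python and does not use the library's API; the `PartialStats` class and its method names are hypothetical, chosen only for this illustration. It computes a mean and a population variance three ways: from the whole dataset at once, from blocks that arrive one at a time, and from two workers whose partial results are merged.

```python
from dataclasses import dataclass


@dataclass
class PartialStats:
    """Hypothetical partial result: count, sum, and sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, block):
        """Streaming step: fold one block of values into the partial result."""
        for x in block:
            self.n += 1
            self.total += x
            self.total_sq += x * x
        return self

    def merge(self, other):
        """Distributed step: combine the partial results of two workers."""
        return PartialStats(self.n + other.n,
                            self.total + other.total,
                            self.total_sq + other.total_sq)

    def finalize(self):
        """Return (mean, population variance) from the accumulated sums."""
        mean = self.total / self.n
        # Sum-of-squares formula: short, but not the most numerically stable.
        variance = self.total_sq / self.n - mean * mean
        return mean, variance


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Offline (batch): the whole dataset is available at once.
batch = PartialStats().update(data).finalize()

# Streaming: blocks arrive one at a time and update a single partial result.
stream = PartialStats()
for start in range(0, len(data), 3):
    stream.update(data[start:start + 3])
streaming = stream.finalize()

# Distributed: each worker summarizes its share, then the partials are merged.
worker_a = PartialStats().update(data[:4])
worker_b = PartialStats().update(data[4:])
distributed = worker_a.merge(worker_b).finalize()

print(batch, streaming, distributed)
```

All three paths give the same mean and variance for this dataset. The point of the sketch is that an algorithm written around mergeable partial results can serve all three models without ever holding the full dataset in one place.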

Features

Performance

Intel Data Analytics Acceleration Library is a highly specialized toolkit that uses specific algorithms to analyse, train on, predict from, or process the datasets given to it. This allows the user to choose the algorithm and function best suited to their data in order to maximize performance. Each algorithm within Intel Data Analytics Acceleration Library is tuned for specific scenarios so that every available resource is used to its full potential.

Portability

Intel Data Analytics Acceleration Library supports and integrates with a range of languages: Python, Java, C, and C++. Because these languages are common in modern software development, the library can integrate with most applications.

In-Depth Algorithm Support

Supported Algorithms

  • Apriori for Association Rules Mining
  • Correlation and Variance-Covariance Matrices
  • Decision Forest for Classification and Regression
  • Expectation-Maximization Using a Gaussian Mixture Model (EM-GMM)
  • Gradient Boosted Trees (GBT) for Classification and Regression
  • Alternating Least Squares (ALS) for Collaborative Filtering
  • Multinomial Naïve Bayes Classifier
  • Multiclass Classification Using a One-Against-One Strategy
  • Limited-Memory BFGS (L-BFGS) Optimization Solver
  • Logistic Regression with L1 and L2 regularization support
  • Linear Regression
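As a rough illustration of the train-then-predict pattern that algorithms such as linear regression follow, here is a minimal sketch in plain Python. It does not call the library, and the function names `fit_simple_linear_regression` and `predict` are hypothetical. It fits a single-feature model by ordinary least squares and then uses the fitted model on a new input.

```python
def fit_simple_linear_regression(xs, ys):
    """Training step: return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Prediction step: apply a fitted (slope, intercept) model to x."""
    slope, intercept = model
    return slope * x + intercept


# Training data that lies exactly on the line y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_simple_linear_regression(train_x, train_y)
prediction = predict(model, 10.0)
print(model, prediction)
```

Keeping training and prediction as separate steps means a model can be fitted once and then reused on new data, which is the split the introduction describes as training and prediction functions.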

Supported CPU & GPU Algorithms via DPC++ Interfaces

  • K-Means Clustering
  • K-Nearest Neighbor (KNN)
  • Support Vector Machines (SVM) with Linear and Radial Basis Function (RBF) Kernels
  • Principal Components Analysis (PCA)
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • Random Forest
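To give a feel for what the first algorithm in this list does, the following is a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. It is not the library's DPC++ or Python interface, and the function names are hypothetical. The initial centroids are passed in explicitly, an assumption made here to keep the example deterministic.

```python
def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster where it was
            continue
        new_centroids.append(
            tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

The assignment step is independent for every point, which is the kind of data-parallel work that maps well onto many CPU cores or a GPU.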

Code Examples

The examples below demonstrate algorithms from Intel Data Analytics Acceleration Library, along with how to access the oneAPI code samples from the command line or an IDE:

Benchmark Example

Intel Data Analytics Acceleration Library addresses all stages of the data analytics pipeline: it pre-processes the data, transforms it, analyses it, builds models, validates them, and makes decisions based on the dataset given to it.

IDAAL Benchmark.jpg

Conclusion

The Intel Data Analytics Acceleration Library is a great tool if we want to maximize performance while still processing large amounts of data without losing accuracy. Thanks to its flexible nature, it can be run in the cloud, which gives users access to large amounts of compute power from home. The Intel Data Analytics Acceleration Library is part of the oneAPI Base Toolkit and is updated often to ensure its continued reliability.