
GPU621/MKL

264 bytes added, 16:45, 30 November 2022


== Overview ==

This project explores the Intel Math Kernel Library (MKL): how it works, how efficient it is, and what its advantages and disadvantages are when it is used in the real world. This is done by examining how to include and apply Math Kernel Library functionality in a program, and the resulting effect on computational efficiency.

== What is Math Kernel Library? ==

First released on May 9, 2003, Intel's Math Kernel Library, now called Intel oneAPI Math Kernel Library and also known as Intel oneMKL or Intel MKL, is a library tailored towards optimizing numerical computation in fields such as science, engineering and finance. MKL speeds up computation by providing parallelized routines that run on both the CPU and the GPU. The library provides optimized routines for calculations including:

== How MKL Improves Efficiency ==

In the example used to demonstrate matrix multiplication, MKL uses DGEMM to reduce the calculation time. DGEMM stands for '''D'''ouble-precision, '''GE'''neral '''M'''atrix-'''M'''atrix multiplication: given matrices A, B and C and the scaling factors alpha and beta, it computes C = alpha·A·B + beta·C. Without MKL, the example performs the matrix multiplication through nested loops; in the MKL-optimized version, cblas_dgemm() is called instead. Here, dgemm refers to DGEMM as defined above, and cblas refers to the CBLAS interface, the '''C''' interface to the '''B'''asic '''L'''inear '''A'''lgebra '''S'''ubprograms. One part of BLAS, level 3, is dedicated to matrix-matrix operations, which includes the matrix multiplication in this example. While the math and logic behind the implementation of cblas_dgemm() are fairly complicated, a simplified explanation is that it decomposes one or both of the matrices being multiplied into smaller blocks and takes advantage of cache memory to improve computation speed.
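
To make the comparison concrete, the following is a minimal sketch of what the nested-loop version computes, written in C with row-major storage. The function name naive_dgemm and the example values are hypothetical and not taken from the project's code; the comment shows the corresponding cblas_dgemm call for reference, assuming row-major layout and no transposition. The call is not compiled here, so the sketch does not need MKL installed.

```c
#include <assert.h>

/* Hypothetical reference DGEMM written as plain nested loops
 * (row-major storage):
 *   C = alpha * A * B + beta * C
 * A is m x k, B is k x n, C is m x n.
 * With MKL, the same operation would be a single call such as:
 *   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *               m, n, k, alpha, A, k, B, n, beta, C, n);
 */
static void naive_dgemm(int m, int n, int k, double alpha,
                        const double *A, const double *B,
                        double beta, double *C)
{
    assert(A && B && C);

    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Dot product of row i of A with column j of B. */
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];

            /* Scale the product by alpha and the old C entry by beta. */
            C[i * n + j] = alpha * sum + beta * C[i * n + j];
        }
    }
}
```

For example, with A = [[1, 2, 3], [4, 5, 6]], B = [[7, 8], [9, 10], [11, 12]], every entry of C initially 1, alpha = 1 and beta = 2, calling naive_dgemm(2, 2, 3, 1.0, A, B, 2.0, C) leaves C = [[60, 66], [141, 156]]. Every entry of C is computed independently and B is read column by column, which is the access pattern that blocked implementations try to make more cache friendly.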

Decomposing the matrices into block matrices allows general matrix-matrix multiplication to be carried out recursively, one block product at a time. By calling DGEMM on the block matrices with beta = 1, each block product is added directly to the partial result already stored in C, so the beta·C term needs no scaling and one multiplication is eliminated for each element of the resulting matrix.
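
The block decomposition can be illustrated with the following sketch. It is not MKL's implementation, which is proprietary and far more heavily tuned (vectorized and multithreaded); it uses simple loop tiling rather than recursion, and the names blocked_dgemm and min_int and the tile-size parameter bs are hypothetical. beta is applied once up front and skipped entirely when it equals 1, after which each bs×bs tile product is accumulated into C.

```c
#include <assert.h>

/* Hypothetical helper: smaller of two ints. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Hypothetical blocked (tiled) DGEMM sketch, row-major storage:
 *   C = alpha * A * B + beta * C
 * A is m x k, B is k x n, C is m x n; bs is the tile edge length.
 * Illustrates cache blocking only; not MKL's actual algorithm. */
static void blocked_dgemm(int m, int n, int k, double alpha,
                          const double *A, const double *B,
                          double beta, double *C, int bs)
{
    assert(A && B && C && bs > 0);

    /* Apply beta once. With beta == 1 no scaling multiplication is
     * needed; with beta == 0 the old contents of C are ignored. */
    if (beta == 0.0) {
        for (int i = 0; i < m * n; ++i)
            C[i] = 0.0;
    } else if (beta != 1.0) {
        for (int i = 0; i < m * n; ++i)
            C[i] *= beta;
    }

    /* Walk over tiles: the product of tile A(ii, pp) and tile B(pp, jj)
     * is accumulated into tile C(ii, jj). For a suitable bs, the data
     * touched by one tile update can stay in cache. */
    for (int ii = 0; ii < m; ii += bs)
        for (int pp = 0; pp < k; pp += bs)
            for (int jj = 0; jj < n; jj += bs)
                for (int i = ii; i < min_int(ii + bs, m); ++i)
                    for (int p = pp; p < min_int(pp + bs, k); ++p) {
                        const double a = alpha * A[i * k + p];
                        /* Row-wise sweep over B and C keeps access contiguous. */
                        for (int j = jj; j < min_int(jj + bs, n); ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
}
```

For example, with A = [[1, 2, 3], [4, 5, 6]], B = [[7, 8], [9, 10], [11, 12]], alpha = 2, beta = 0 and bs = 1, blocked_dgemm leaves C = [[116, 128], [278, 308]]. Because each element of C still accumulates its terms in the same order of p, changing bs changes only the order in which memory is visited, not the result.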

== Other Mathematical Functionality ==

