GPU621/Intel oneMKL - Math Kernel Library

Revision as of 01:32, 1 December 2021

Intel® oneAPI Math Kernel Library

Group Members

  1. Menglin Wu
  2. Syed Muhammad Saad Bukhari
  3. Lin Xu

Introduction

Intel Math Kernel Library, now known as oneMKL (part of Intel’s oneAPI), is a library of highly optimized, extensively parallelized math routines built to provide maximum performance across a variety of CPUs and accelerators.

It covers domains such as sparse and dense linear algebra, sparse solvers, fast Fourier transforms, random number generation, and basic statistics, and many of its routines are supported by the DPC++ interface on both CPU and GPU.

Progress Report

progress 100%

Setting up MKL

First, download the MKL library from the official Intel website: https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html
Then, in Visual Studio, set Additional Include Directories and Additional Library Directories in the project properties. Remember to select the configuration and platform you are building before changing them.
Finally, fill in Additional Dependencies with the libraries suggested by the oneMKL Link Line Advisor: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html

MKL Testing

This test compares the running time of a serial matrix multiplication with that of MKL's optimized cblas_dgemm routine as the number of threads varies.

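Serial version

/* Naive triple-loop product C = A * B, repeated LOOP_COUNT times.
   A is m x p, B is p x n, C is m x n, all stored row-major. */
clock_t startTime = clock();

for (r = 0; r < LOOP_COUNT; r++) {
    for (i = 0; i < m; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            /* Dot product of row i of A with column j of B. */
            for (k = 0; k < p; k++)
                sum += A[p * i + k] * B[n * k + j];
            C[n * i + j] = sum;
        }
    }
}
clock_t endTime = clock();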


MKL version

max_threads = mkl_get_max_threads();

printf(" Finding max number %d of threads Intel(R) MKL can use for parallel runs \n\n", max_threads);
printf(" Running Intel(R) MKL from 1 to %i threads \n\n", max_threads * 2);
/* Sweep the thread count from 1 up to twice the reported maximum. */
for (i = 1; i <= max_threads * 2; i++) {
    /* Reset the result matrix before each measurement. */
    for (j = 0; j < (m * n); j++)
        C[j] = 0.0;
    mkl_set_num_threads(i);
    /* Warm-up call, excluded from the timing. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
        m, n, p, alpha, A, p, B, n, beta, C, n);
    s_initial = dsecnd();
    for (r = 0; r < LOOP_COUNT; r++) {
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            m, n, p, alpha, A, p, B, n, beta, C, n);
    }
    /* Average time of one cblas_dgemm call, in seconds. */
    s_elapsed = (dsecnd() - s_initial) / LOOP_COUNT;
}

https://raw.githubusercontent.com/MenglinWu9527/m3u/main/Snipaste_2021-12-01_00-20-37.jpeg

Threads  serial  1     2    3    4    5    6    7    8    9    10   11   12
Time     1500    15.7  7.7  6.4  8.1  7.4  7.5  8.0  7.9  7.2  7.5  7.2  8.0  8.0

The test machine has 6 logical processors, as reported by wmic:

wmic:root\cli>cpu get numberoflogicalprocessors
NumberOfLogicalProcessors
6
