CDOT Wiki β

GPU610/gpuchill

5,043 bytes added, 21:29, 4 April 2019
= GPU n' Chill =
== Team Members ==
# [mailto:dserpa@myseneca.ca?subject=gpu610 Daniel Serpa], Calculation of Pi, Shrink & Rotate
# [mailto:akkabia@myseneca.ca?subject=gpu610 Abdul Kabia], Some responsibility
# [mailto:jtardif1@myseneca.ca?subject=gpu610 Josh Tardif], Some responsibility
Most of the time in this part of the code seems to be spent assigning the enlarged image to the new one and creating the image object in the first place. If we used a GPU for this process, I think we would see a decrease in run time for this part of the library. There also seems to be room for improvement in the 'Image::enlargeImage' function itself: by moving that functionality onto the GPU, we may be able to reduce its 0.76s to something lower.
 
Using the same image as above (16MB file), I also profiled the Negate option. As the name implies, this turns the image into its negative.
<pre>
real 0m5.707s
user 0m0.000s
sys 0m0.000s
</pre>
 
As you can see, this takes about half the time of the Enlarge option, which is expected considering it does less work.
 
<pre>
Flat profile:
 
Each sample counts as 0.01 seconds.
  %   cumulative   self               self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 23.53      0.16     0.16         2    80.00    80.00  Image::Image(Image const&)
 16.18      0.27     0.11         2    55.00    55.00  Image::Image(int, int, int)
 14.71      0.37     0.10                              _fu62___ZSt4cout
 13.24      0.46     0.09  17117346     0.00     0.00  Image::getPixelVal(int, int)
 13.24      0.55     0.09         1    90.00    90.00  Image::operator=(Image const&)
  7.35      0.60     0.05         1    50.00   140.00  writeImage(char*, Image&)
  7.35      0.65     0.05         1    50.00   195.00  Image::negateImage(Image&)
  4.41      0.68     0.03  17117346     0.00     0.00  Image::setPixelVal(int, int, int)
  0.00      0.68     0.00         4     0.00     0.00  Image::~Image()
  0.00      0.68     0.00         3     0.00     0.00  std::operator|(std::_Ios_Openmode, std::_Ios_Openmode)
  0.00      0.68     0.00         1     0.00     0.00  readImageHeader(char*, int&, int&, int&, bool&)
  0.00      0.68     0.00         1     0.00     0.00  readImage(char*, Image&)
  0.00      0.68     0.00         1     0.00     0.00  Image::getImageInfo(int&, int&, int&)
</pre>
 
Notice that for both the Enlarge and Negate options, "Image::Image(int, int, int)" is always among the top three functions by time. The functions "Image::setPixelVal(int, int, int)" and "Image::getPixelVal(int, int)" are also called very often (17,117,346 times each in the Negate run). If we focus on offloading the "Image::getPixelVal(int, int)" and "Image::setPixelVal(int, int, int)" work onto the GPU, since these are very repetitive tasks, and also try to optimize the "Image::Image(int, int, int)" constructor, we should see an improvement in performance for this program.
==== Merge Sort Algorithm ====
===== Results =====
You need many billions of points, and maybe even trillions, to reach high precision in the final result, but using just 2 billion points causes the program to take over 30 seconds to run. The most intensive part of the program is the loop, which executed 2 billion times in my profiling run and can be fully parallelized. The profile attributes 100% of the execution time to the loop, but since that cannot be exactly true we will use 99.9%. Using a GTX 1080 as an example GPU, which has 20 streaming multiprocessors each supporting 2048 resident threads (40,960 threads in total), Amdahl's Law gives an expected speedup of about 976.191 times.
=== Assignment 2 ===
==== Beginning Information ====
 
Image used for all of the testing
 
[[File:Duck.JPG||400px]]
 
==== Enlarge Image====
<pre>
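// a: enlarged output image, b: original image (both flattened row by row)
__global__ void enlargeImg(int* a, int* b, int matrixSize, int growthVal, int imgCols, int enlargedCols) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x; // one thread per output pixel
    int x = idx / enlargedCols; // output row
    int y = idx % enlargedCols; // output column
    if (idx < matrixSize) {
        // Each output pixel copies the source pixel it was scaled up from
        a[idx] = b[(x / growthVal) * imgCols + (y / growthVal)];
    }
}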
</pre>
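
To check the index mapping without a GPU, the same arithmetic can be run on the CPU, with one loop iteration standing in for one thread. This is only a sketch: <code>enlargeRef</code> is a hypothetical name, and the flat row-by-row <code>std::vector</code> layout is an assumption made to mirror the kernel's arguments.

```cpp
#include <vector>

// CPU reference for the index mapping used by enlargeImg.
// enlargeRef is a hypothetical name; src is assumed to be stored row by row.
std::vector<int> enlargeRef(const std::vector<int>& src, int rows, int cols, int growthVal) {
    const int outRows = rows * growthVal;
    const int outCols = cols * growthVal;
    std::vector<int> dst(outRows * outCols);
    for (int idx = 0; idx < outRows * outCols; ++idx) {
        int x = idx / outCols;  // output row
        int y = idx % outCols;  // output column
        // Integer division maps a block of growthVal x growthVal outputs to one source pixel
        dst[idx] = src[(x / growthVal) * cols + (y / growthVal)];
    }
    return dst;
}
```

For example, enlarging a 2x2 image by a factor of 2 turns every source pixel into a 2x2 block in the 4x4 result.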
 
==== Shrink Image ====
 
<pre>
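// a: shrunken output image, b: original image (both flattened row by row)
__global__ void shrinkImg(int* a, int* b, int matrixSize, int shrinkVal, int imgCols, int shrinkCols) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x; // one thread per output pixel
    int x = idx / shrinkCols; // output row
    int y = idx % shrinkCols; // output column
    if (idx < matrixSize) {
        // Sample every shrinkVal-th source pixel (multiply, not divide, to map back to the source)
        a[idx] = b[(x * shrinkVal) * imgCols + (y * shrinkVal)];
    }
}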
</pre>
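
Shrinking by a factor s is usually done by keeping every s-th pixel, so each output coordinate is multiplied by s to find its source pixel. The CPU sketch below shows that mapping under two assumptions: the output is rows / shrinkVal by cols / shrinkVal, and the buffer is stored row by row. <code>shrinkRef</code> is a hypothetical name.

```cpp
#include <vector>

// CPU reference for nearest-neighbour shrinking (a sketch, not the project's code).
// shrinkRef is a hypothetical name; src is assumed to be stored row by row.
std::vector<int> shrinkRef(const std::vector<int>& src, int rows, int cols, int shrinkVal) {
    const int outRows = rows / shrinkVal;
    const int outCols = cols / shrinkVal;
    std::vector<int> dst(outRows * outCols);
    for (int idx = 0; idx < outRows * outCols; ++idx) {
        int x = idx / outCols;  // output row
        int y = idx % outCols;  // output column
        // Output pixel (x, y) samples source pixel (x * shrinkVal, y * shrinkVal)
        dst[idx] = src[(x * shrinkVal) * cols + (y * shrinkVal)];
    }
    return dst;
}
```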
 
==== Reflect Image====
 
<pre>
// Reflect Image Horizontally
// Note: the j * cols + i indexing below is only consistent when rows == cols.
__global__ void reflectImgH(int* a, int* b, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    //tempImage.pixelVal[rows - (i + 1)][j] = oldImage.pixelVal[i][j];
    if (i < rows && j < cols) // skip threads that fall outside the image
        a[j * cols + (rows - (i + 1))] = b[j * cols + i];
}

//Reflect Image Vertically
__global__ void reflectImgV(int* a, int* b, int rows, int cols) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    //tempImage.pixelVal[i][cols - (j + 1)] = oldImage.pixelVal[i][j];
    if (i < rows && j < cols) // skip threads that fall outside the image
        a[(cols - (j + 1)) * cols + i] = b[j * cols + i];
}
</pre>
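
For comparison, here is a CPU sketch of the two reflections on a flat buffer that also works for non-square images. It assumes a row-by-row layout (index = row * cols + col), which may differ from the layout the kernels expect. The names <code>flipRows</code> and <code>flipCols</code> are hypothetical.

```cpp
#include <vector>

// Mirror top to bottom: dst[rows - (r + 1)][c] = src[r][c].
// Hypothetical helper; assumes a row-by-row flat buffer.
std::vector<int> flipRows(const std::vector<int>& src, int rows, int cols) {
    std::vector<int> dst(src.size());
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dst[(rows - (r + 1)) * cols + c] = src[r * cols + c];
    return dst;
}

// Mirror left to right: dst[r][cols - (c + 1)] = src[r][c].
std::vector<int> flipCols(const std::vector<int>& src, int rows, int cols) {
    std::vector<int> dst(src.size());
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            dst[r * cols + (cols - (c + 1))] = src[r * cols + c];
    return dst;
}
```

Applying either function twice returns the original image, which makes a quick sanity check for a GPU version.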
 
==== Translate Image====
 
<pre>
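__global__ void translateImg(int* a, int* b, int cols, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    //tempImage.pixelVal[i + value][j + value] = oldImage.pixelVal[i][j];
    // Skip pixels that would be shifted outside the image. The kernel receives no row count,
    // so the upper bound on j still depends on the launch grid.
    if (i < cols && i + value >= 0 && i + value < cols && j - value >= 0)
        a[(j - value) * cols + (i + value)] = b[j * cols + i];
}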
</pre>
 
==== Rotate Image====
 
<pre>
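// Note: the c * imgCols + r indexing below is only consistent when imgRows == imgCols.
__global__ void rotateImg(int* a, int* b, int matrixSize, int imgCols, int imgRows, int r0, int c0, float rads) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int r = idx / imgCols;
    int c = idx % imgCols;
    if (idx < matrixSize) {
        // Rotate (r, c) about the centre (r0, c0) by rads radians
        int r1 = (int)(r0 + ((r - r0) * cos(rads)) - ((c - c0) * sin(rads)));
        int c1 = (int)(c0 + ((r - r0) * sin(rads)) + ((c - c0) * cos(rads)));
        // Only write pixels that land inside the image
        if (r1 >= 0 && r1 < imgRows && c1 >= 0 && c1 < imgCols) {
            a[c1 * imgCols + r1] = b[c * imgCols + r];
        }
    }
}

// Fills pixels left black (0) by the rotation with the value of a neighbouring pixel
__global__ void rotateImgBlackFix(int* a, int imgCols) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int r = idx / imgCols;
    int c = idx % imgCols;
    // Bounds guard (assumes a square image): keeps the neighbour read inside the buffer
    if (r < imgCols && c + 1 < imgCols && a[c * imgCols + r] == 0)
        a[c * imgCols + r] = a[(c + 1) * imgCols + r];
}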
</pre>
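
The black pixels appear because the rotation pushes each source pixel forward to a truncated destination, so some destination pixels are never written. A common alternative, shown here only as a CPU sketch and not as the approach used in this project, is inverse mapping: each destination pixel looks up its source by rotating backwards. <code>rotateGather</code> is a hypothetical name, and the buffer is assumed to be stored row by row.

```cpp
#include <cmath>
#include <vector>

// Inverse-mapping rotation on the CPU (a swapped-in alternative to forward mapping).
// rotateGather is a hypothetical name; src is assumed row by row (row * cols + col).
std::vector<int> rotateGather(const std::vector<int>& src, int rows, int cols,
                              int r0, int c0, double rads) {
    std::vector<int> dst(src.size(), 0);  // pixels with no source stay 0 (black)
    const double cs = std::cos(-rads);    // rotate destination coordinates backwards
    const double sn = std::sin(-rads);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Round to the nearest source pixel instead of truncating
            long sr = std::lround(r0 + (r - r0) * cs - (c - c0) * sn);
            long sc = std::lround(c0 + (r - r0) * sn + (c - c0) * cs);
            if (sr >= 0 && sr < rows && sc >= 0 && sc < cols)
                dst[r * cols + c] = src[sr * cols + sc];
        }
    }
    return dst;
}
```

Because every destination pixel is visited exactly once, the only black pixels left are those whose source falls outside the image, and no separate fix-up pass is needed for holes inside it.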
 
==== Negate Image====
 
<pre>
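__global__ void negateImg(int* a, int* b, int matrixSize) {
    int matrixCol = blockIdx.x * blockDim.x + threadIdx.x;
    if (matrixCol < matrixSize)
        // Body missing in the source; this line assumes 8-bit grayscale (maximum value 255)
        a[matrixCol] = 255 - b[matrixCol];
}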
</pre>
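
The <code>matrixCol < matrixSize</code> guard is needed because a one-dimensional launch is rounded up to a whole number of blocks, so the last block usually has threads past the end of the image. The sketch below shows that rounding and a CPU reference for the negation. The names <code>blocksFor</code> and <code>negateRef</code> are hypothetical, and the maximum gray level is passed in as <code>maxVal</code> because the value the project uses is an assumption here.

```cpp
#include <cstddef>
#include <vector>

// Number of blocks needed to cover n elements (ceiling division).
// blocksFor is a hypothetical helper name.
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU reference for negation; maxVal is the largest gray level
// (255 for an 8-bit image, which is an assumption about the input).
std::vector<int> negateRef(const std::vector<int>& src, int maxVal) {
    std::vector<int> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = maxVal - src[i];
    return dst;
}
```

Taking the 17,117,346 pixel accesses from the Negate profile as the pixel count, 1024 threads per block would need 16,717 blocks, and the final block would have threads with no pixel to process.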
 
====Results====
[[File:CHART2GOOD.png]]
 
=== Assignment 3 ===