Changes

GPU610/gpuchill

1,984 bytes added, 22:55, 15 February 2019

→‎GPU n' Chill

It seems most of our time in this part of the code is spent assigning our enlarged image to the now one, and also creating our image object in the first place. I think if we were to somehow use a GPU for this process, we would see an decrease in run-time for this part of the library. Also, there also seems to be room for improvement on the very 'Image::enlargeImage' function itself. I feel like by loading said functionality onto thje GPU, we can reduce it's 0.76s to something even lower.

===Merge Sort Algorithm===

I decide to profile a vector merge sort algorithm. A merge sort is based on a based on divide and conquer technique which recursively breaks down a problem into two or more sub-problems of the same or related types. When these become simple enough to be solved directly the sub-problems are then combined to give a solution to the original problem. It first divides the array into equal halves and then combines them in a sorted manner. Due to this type of sort being broken into equal parts, I thought that it would be perfect for a GPU to be able to accelerate the process. With the sort being broken down into multiple chunks and then sent to the GPU it will be able to accomplish its task more efficiently. I was able to find the source code [https://codereview.stackexchange.com/questions/167680/merge-sort-implementation-with-vectors/ here].

Profile for 10 million elements between 1 and 10000. Using -02 optimization.

<pre>

Flat profile:

Each sample counts as 0.01 seconds.

% cumulative self self total

time seconds seconds calls ns/call ns/call name

48.35 1.16 1.16 9999999 115.56 115.56 mergeSort(std::vector<int, std::allocator<int> >&, std::vector<int, std::allocator<int> >&,

std::vector<int, std::allocator<int> >&)

32.80 1.94 0.78 sort(std::vector<int, std::allocator<int> >&)

19.34 2.40 0.46 43708492 10.58 10.58 std::vector<int, std::allocator<int> >::_M_insert_aux(__gnu_cxx::__normal_iterator<int*, std::vector<int,

std::allocator<int> > >, int const&)

0.00 2.40 0.00 1 0.00 0.00 _GLOBAL__sub_I_main

</pre>

As you can see 80% of the total time was spent in mergeSort and sort functions. <br />

If we look at Amdahl's law Sn = 1 / ( 1 - 0.80 + 0.80/8 ) we can expect a maximum speedup of 3.3x.

=== Assignment 2 ===

=== Assignment 3 ===

Jtardif1

46

edits

CDOT Wiki β

Changes

GPU610/gpuchill

CDOT Wiki ^β