BarraCUDA Boiz

4,305 bytes added, 00:53, 14 April 2017

# [mailto:mamichalski@myseneca.ca?subject=DPS915 Michael Michalski]

# [mailto:addogra@myseneca.ca?subject=DPS915 Agam Dogra]

[mailto:vcbui@myseneca.ca;mamichalski@myseneca.ca;addogra@myseneca.ca?subject=GPU610 eMail All]

== Progress ==

=== Assignment 1 ===

==== <span style="color: red">&#x2717; EucideanDistance</span> ====

We profiled the following project on GitHub, which computes the Euclidean distance transform of a chart stored in a text file. The project can be found [https://github.com/lanipse/Euclidean-Distance-Transform-CPP here].

====<span style="color: green">&#x2713; KmeansPlusPlus</span>====

Kmeansplusplus is a clustering method which, in this case, takes an image and splits it into k clusters. It selects k pixels from the image and uses them as reference points: every other pixel is compared against them and recolored based on the reference pixel it is closest to.

The first integer is k (the number of reference points), the second integer is the number of times to iterate through the image.

As you can see, if you select a large number of clusters the image appears very similar to the original, but if you select a small number of clusters most of the detail is lost.
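The recoloring step described above can be sketched in plain C++ as follows. This is a minimal illustration rather than the project's actual code: the `Rgb` struct and the function names `squaredDistance`, `nearestReference`, and `recolor` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical pixel type; the original project's representation may differ.
struct Rgb { int r, g, b; };

// Squared Euclidean distance between two colors.
// The square root is skipped because only comparisons are needed.
inline long squaredDistance(const Rgb& a, const Rgb& b) {
    long dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

// Returns the index of the reference color closest to `pixel`.
inline std::size_t nearestReference(const Rgb& pixel, const std::vector<Rgb>& refs) {
    assert(!refs.empty());
    std::size_t best = 0;
    long bestDist = std::numeric_limits<long>::max();
    for (std::size_t i = 0; i < refs.size(); ++i) {
        long d = squaredDistance(pixel, refs[i]);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}

// Replaces every pixel with its nearest reference color.
inline void recolor(std::vector<Rgb>& image, const std::vector<Rgb>& refs) {
    for (Rgb& p : image) p = refs[nearestReference(p, refs)];
}
```

With k reference colors, `recolor` performs k distance evaluations per pixel, and that work is repeated on every iteration.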

==== Conclusion and Observations ====

While the EucideanDistance project shows great promise for optimization and parallelization (will be upwards of 90% once redundant functions are merged), it suffers from the drawback of limited test data. The test data it requires is a text-based image made up of 1s and 0s. In addition, the 'image' file used is rather unpresentable.

The seam carver project displays poor results with regard to parallelization and optimization. There is no clearly identifiable hotspot in any of its function calls.

The KmeansPlusPlus project appears somewhat promising for parallelization and optimization. At first glance, there does not seem to be a single hotspot, as two functions each take around 50% of the work. On closer examination, however, one of those functions is called only once per execution, while the other is called more than 8 million times. Optimizing at least one of these portions of the project would be advantageous.

In conclusion, our findings indicate that KmeansPlusPlus would be the best project to continue into assignment 2, with EucideanDistance as the backup.

=== Assignment 2 ===

==== Problem ====

After surveying the original code, we found three major hot-spots of heavy CPU usage.

This block of code reshapes the input pixels into a set of samples for classification.

[[File:SetSamplesSerial.png]]

This block of code computes the distances between sampled centers and other input samples.

[[File:CalculateDistanceSerial.png|550px]]
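As a rough illustration of the kind of distance computation described above, the sketch below keeps, for every sample, the squared distance to its closest chosen center. It assumes the usual k-means++ seeding bookkeeping; the `Sample` struct and the names `squaredDistance` and `updateMinDistances` are hypothetical and are not taken from the project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sample type: one pixel's color channels as floats.
struct Sample { float r, g, b; };

// Squared Euclidean distance between two samples.
inline float squaredDistance(const Sample& a, const Sample& b) {
    float dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

// After a new center is chosen, lowers each sample's stored distance
// if the new center is closer than any center seen so far.
inline void updateMinDistances(const std::vector<Sample>& samples,
                               const Sample& newCenter,
                               std::vector<float>& minDist) {
    assert(samples.size() == minDist.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        minDist[i] = std::min(minDist[i], squaredDistance(samples[i], newCenter));
    }
}
```

Each loop iteration touches only its own `minDist[i]`, so the iterations are independent of one another, which is what makes a loop of this shape a natural candidate for one-thread-per-sample parallelization.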

This block of code generates the output image.

[[File:GenerateImageSerial.png|550px]]

==== Analysis ====

After analyzing these blocks of code, we decided to parallelize them.

You can find the new parallelized KmeansPlusPlus code [https://github.com/MajinBui/kmeansplusplusCUDA here].

Here are the kernels that we programmed.

Set Samples kernel

[[File:SetSamplesKernel.png|550px]]

Calculate Distance kernel

[[File:CalculateDistanceKernel.png|550px]]

Generate Image kernel

[[File:GenerateImageKernel.png|550px]]

==== Conclusion ====

By comparing the run-times of the serial KmeansPlusPlus and the parallelized version, we can see that the performance of the program has improved.

[[File:GraphAssignment2.png|900px]]

The improvement is not significant for small numbers of clusters and iterations, but performance improves noticeably in the larger test cases.

=== Assignment 3 ===

For assignment 3, we optimized the kernels by allocating the correct number of blocks and threads per block for each kernel. Previously, we allocated 32 threads by 32 blocks for every kernel call, even when the kernel did not require that many. After the adjustments, we found significant improvements for many of the kernels.
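One common way to size a launch to the amount of work is a ceiling division of the number of work items by the threads per block. The host-side helper below is a sketch of that arithmetic only; `LaunchConfig` and `launchConfigFor` are hypothetical names, and the project's own calculation may differ.

```cpp
#include <cassert>

// Hypothetical holder for the two values passed in a <<<blocks, threads>>> launch.
struct LaunchConfig {
    unsigned blocks;
    unsigned threadsPerBlock;
};

// Computes the smallest number of blocks that covers `workItems` elements.
// Assumes workItems + threadsPerBlock - 1 does not overflow `unsigned`.
inline LaunchConfig launchConfigFor(unsigned workItems, unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    unsigned blocks = (workItems + threadsPerBlock - 1) / threadsPerBlock;  // ceiling division
    return {blocks, threadsPerBlock};
}
```

Because the last block can contain more threads than remaining elements, a kernel launched this way typically guards its body with a bounds check on the global index.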

====Runtime of program====

Adjusting the number of threads per block reduced the overall runtime of the program.

For large images, the improvement grew as the number of clusters and iterations increased.

[[File:Big Image.png]]

For medium images, the results were less consistent.

[[File:Med Image.png]]

For small images, the results after optimization were the least consistent.

[[File:Small Image.png]]

As the image size increases, the kernels become more efficient.

====Runtime of each kernel====

Each kernel individually showed either significant or marginal improvement after the thread/block sizes were adjusted.

The Set Samples kernel showed small improvements on average.

[[File:Set Samples.png]]

Here we moved the calculation of y_index outside the inner loop.

[[File:SetSamplesKernelOptimized.png|550px]]
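The effect of moving the y_index calculation can be illustrated with a serial C++ loop. This is a sketch under assumptions, not the kernel shown in the image: the function name `flattenRows` and the nested-vector image layout are chosen for the example, and only the name `y_index` comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copies a 2D image into a flat, row-major sample buffer.
// The row offset is computed once per row instead of once per pixel.
inline std::vector<float> flattenRows(const std::vector<std::vector<float>>& image,
                                      std::size_t width) {
    std::vector<float> samples(image.size() * width);
    for (std::size_t y = 0; y < image.size(); ++y) {
        assert(image[y].size() == width);
        const std::size_t y_index = y * width;  // hoisted out of the inner loop
        for (std::size_t x = 0; x < width; ++x) {
            samples[y_index + x] = image[y][x];
        }
    }
    return samples;
}
```

An optimizing compiler may perform this hoisting on its own, so the gain from doing it by hand can be small.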

The Calculate Distance kernel showed significant improvement.

[[File:Calculate Distance Kernel.png]]

The biggest change was the thread/block size.

[[File:CalculateDistanceKernelOptimized.png|550px]]

The Generate Image kernel improved as well, since image sizes varied. Sizing the thread/block configuration to match the number of pixels enabled better use of memory.

[[File:Generate Image Kernel.png]]

The biggest change was the thread/block size.

[[File:GenerateImageKernelOptimized.png|550px]]
