Changes


Unique Project Page

93 bytes removed, 01:57, 26 February 2017

→‎Introduction : GPU Benchmarking/Gaussian Blur Filter : Colin Paul

= Assignment 1 - Select and Assess =

== Introduction : GPU Benchmarking/Testing using Mandelbrot Sets : Kartik Nagarajan ==

|}

----

== Introduction : GPU Benchmarking/Gaussian Blur Filter : Colin Paul ==

[[Image:Cinque_terre.jpg|860px]][[Image:Cinque_terre_BLURRED.jpg|860px]]

[[Image:F2RiP.gif|500px|thumb|alt=convolution pattern]]

[[Image:Img16.png|500px|thumb|alt=Plot of frequency response of the 2D Gaussian]]

===What is a Gaussian blur filter?===

At a high level, Gaussian blurring works just like box blurring: there is a weight per pixel, and for each pixel you apply the weights to that pixel and its neighbors to come up with the final value for the blurred pixel.
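The weighting described above can be sketched for a single row of pixels. This is a minimal illustration, not the program's actual code: <code>BlurRow</code> is a hypothetical helper, the kernel is assumed to have an odd length with weights that sum to 1, and samples past the edges are clamped to the nearest valid pixel (one of several possible edge policies).

```cpp
#include <vector>

// Hypothetical helper (not from the original program): blur one row of
// pixel values with an odd-length kernel whose weights sum to 1.
// Reads past either edge are clamped to the nearest valid pixel.
std::vector<float> BlurRow(const std::vector<float>& row,
                           const std::vector<float>& kernel)
{
    const int n = static_cast<int>(row.size());
    const int radius = static_cast<int>(kernel.size()) / 2;
    std::vector<float> out(row.size(), 0.0f);

    for (int i = 0; i < n; ++i) {
        float sum = 0.0f;
        // Weighted sum of the pixel and its neighbors on both sides.
        for (int k = -radius; k <= radius; ++k) {
            int j = i + k;
            if (j < 0) j = 0;
            if (j >= n) j = n - 1;
            sum += row[j] * kernel[k + radius];
        }
        out[i] = sum;
    }
    return out;
}
```

With the kernel {0.25, 0.5, 0.25}, a single bright pixel in the middle of a dark row is spread over its two neighbors, while a row of constant brightness is left unchanged.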

There are a couple of ways to calculate a Gaussian kernel.

Pascal’s triangle approaches the Gaussian bell curve as the row number approaches infinity. The rows of Pascal’s triangle are also the coefficients of the terms you get when expanding the binomial (x + y)<sup>N</sup>. So technically, you could use a row from Pascal’s triangle as a 1D kernel and normalize the result, but it isn’t the most accurate.
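As a rough sketch of the Pascal’s triangle approach, the following builds one row and divides each entry by the row sum. <code>PascalKernel</code> is a hypothetical name introduced here for illustration; rows are counted from 0, so row <code>n</code> has <code>n + 1</code> entries.

```cpp
#include <vector>

// Hypothetical helper: normalized 1D kernel from row n of Pascal's
// triangle (row 0 is {1}). Row n has n + 1 entries that sum to 2^n.
std::vector<float> PascalKernel(int n)
{
    std::vector<double> row(n + 1, 1.0);
    // Binomial recurrence: C(n, k) = C(n, k - 1) * (n - k + 1) / k
    for (int k = 1; k <= n; ++k)
        row[k] = row[k - 1] * (n - k + 1) / k;

    double sum = 0.0;
    for (double v : row) sum += v;

    // Divide by the row sum so the weights add up to 1.
    std::vector<float> kernel(n + 1);
    for (int k = 0; k <= n; ++k)
        kernel[k] = static_cast<float>(row[k] / sum);
    return kernel;
}
```

For example, row 4 is 1, 4, 6, 4, 1 with a sum of 16, which gives the 5-tap kernel 0.0625, 0.25, 0.375, 0.25, 0.0625.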

A better way is to use the Gaussian function: <math>e^{-x^2 / (2\sigma^2)}</math>

Here sigma (σ) is your blur amount and x ranges over the kernel positions from negative to positive. For instance, if your kernel has 5 values, x ranges from -2 to +2.
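A minimal sketch of point sampling the Gaussian function, assuming an odd number of taps centered on x = 0. <code>SampledGaussianKernel</code> is a hypothetical name, not a function from the program; the weights are divided by their sum so that they add up to 1.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: sample e^(-x^2 / (2 * sigma^2)) at integer x in
// [-radius, +radius] and normalize. Assumes taps is odd, so 5 taps
// means x runs from -2 to +2.
std::vector<float> SampledGaussianKernel(float sigma, int taps)
{
    const int radius = taps / 2;
    std::vector<float> kernel(taps);
    float sum = 0.0f;

    for (int i = 0; i < taps; ++i) {
        const float x = static_cast<float>(i - radius);
        kernel[i] = std::exp(-(x * x) / (2.0f * sigma * sigma));
        sum += kernel[i];
    }

    // Normalize so the weights add up to 1.
    for (float& w : kernel) w /= sum;
    return kernel;
}
```

Calling <code>SampledGaussianKernel(1.0f, 5)</code> yields a symmetric kernel whose largest weight is at the center tap.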

An even better way would be to integrate the Gaussian function instead of just taking point samples. Refer to the diagram on the right.

Below you can find a plot of the continuous distribution function and the discrete kernel approximation. One thing to look out for is how much of the distribution's tails falls outside the kernel support:

For the current configuration, 13.36% of the curve’s area lies outside the discrete kernel. The weights are renormalized so that they sum to one; in other words, the probability mass outside the discrete kernel is redistributed across the taps within the kernel in proportion to their weights. The weights are calculated by numerical integration of the continuous Gaussian distribution over each discrete kernel tap.
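The per-tap integration and renormalization can be sketched as follows. Note the swap: instead of numerical integration, this sketch uses the closed form based on the error function (<code>std::erf</code>), which gives the same per-tap areas. The names <code>GaussianArea</code> and <code>IntegratedGaussianKernel</code> are hypothetical, each tap is assumed to be one pixel wide and centered on an integer x, and the number of taps is assumed to be odd.

```cpp
#include <cmath>
#include <vector>

// Area under the unit-area Gaussian (standard deviation sigma) on [a, b].
double GaussianArea(double a, double b, double sigma)
{
    const double s = sigma * std::sqrt(2.0);
    return 0.5 * (std::erf(b / s) - std::erf(a / s));
}

// Hypothetical helper: integrate the Gaussian over each one-pixel-wide
// tap, then renormalize. If outsideMass is non-null it receives the
// fraction of the curve's area that falls outside the kernel.
std::vector<float> IntegratedGaussianKernel(double sigma, int taps,
                                            double* outsideMass)
{
    const int radius = taps / 2;
    std::vector<double> area(taps);
    double covered = 0.0;

    for (int i = 0; i < taps; ++i) {
        const double x = i - radius;
        area[i] = GaussianArea(x - 0.5, x + 0.5, sigma);
        covered += area[i];
    }
    if (outsideMass) *outsideMass = 1.0 - covered;

    // Dividing by the covered area scales every tap by the same factor,
    // so the missing mass is spread in proportion to the weights.
    std::vector<float> kernel(taps);
    for (int i = 0; i < taps; ++i)
        kernel[i] = static_cast<float>(area[i] / covered);
    return kernel;
}
```

With sigma = 1 and 5 taps, the kernel spans x from -2.5 to +2.5, and a little over 1% of the curve's area falls outside it; a larger sigma with the same number of taps leaves more area outside.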

Whichever way you do it, make sure to normalize the result so that the weights add up to 1. This ensures that blurring does not make the image brighter (weights summing to more than 1) or dimmer (weights summing to less than 1).

speed versus quality.

====Code====

Original source code (Windows) can be found [http://blog.demofox.org/2015/08/19/gaussian-blur/ here].

{| class="wikitable mw-collapsible mw-collapsed"

#include <vector>

#include <functional>

#include <windows.h> // for bitmap headers.

const float c_pi = 3.14159265359f;

#include <vector>

#include <functional>

#include "windows.h" // for bitmap headers.

/* uncomment the line below if you want to run gprof */

|}

===Running program===

====Windows====

To compile and run the program:
# Set up an empty Visual C++ project in Visual Studio.
# Save [http://matrix.senecac.on.ca/~cpaul12/cinque_terre.bmp this] image and place it in your project's directory.
# Copy the source code from the Code section and paste it into a [your chosen file name].cpp file.
# Open the Debug properties of your project.
# Add four (4) values to Debugging -> Command Arguments:

 [input image filename].bmp [output image filename].bmp [x - sigma value] [y - sigma value]

For example:

 cinque_terre.bmp cinque_terre_BLURRED.bmp 3.0 3.0

====Linux====

To compile and run the program:
# Navigate to the directory you want to run the program in.
# Save [http://matrix.senecac.on.ca/~cpaul12/cinque_terre.bmp this] image and place it in the directory you will be running the program from.
# Copy the main source code from the Code section and paste it into a [your chosen file name].cpp file.
# Copy the header source code from the Code section and paste it into a file named windows.h.

Compile the binaries using the following command:

 g++ -O2 -std=c++0x -Wall -pedantic gaussian.cpp -o blur

Run the compiled program:

 ./blur cinque_terre.bmp cinque_terre_BLURRED.bmp 3.0 3.0

The command line arguments are structured as follows:

 [input image filename].bmp [output image filename].bmp [x - sigma value] [y - sigma value]

===Analysis===

Flat profile:

=== Observations ===

The program does not take long to run, but its run time depends on the values of sigma (σ) and the kernel block size. Larger values for these parameters increase the run time significantly. The code is straightforward, and parallelization should be easy to implement.

=== Hotspot ===

The call graph provides further evidence that this application spends nearly all of its execution time in the BlurImage function, so this function is the prime candidate for parallelization using CUDA. Increasing sigma (σ) and the kernel size makes the computation heavy enough to stress the GPU and produce a meaningful benchmark.
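To illustrate why a blur of this kind is a natural fit for a GPU, here is a simplified sketch of a horizontal blur pass over a single-channel image. It is not the program's BlurImage function: <code>BlurPixelHorizontal</code> and <code>BlurHorizontal</code> are hypothetical names, the image is assumed to be stored row by row as floats, and edge reads are clamped. The point is only that each output pixel reads from the source image and writes to its own destination element.

```cpp
#include <vector>

// Hypothetical per-pixel work: weighted sum of the pixel at (x, y) and
// its horizontal neighbors. Reads only src, so calls for different
// (x, y) do not depend on each other.
float BlurPixelHorizontal(const std::vector<float>& src, int width,
                          int x, int y, const std::vector<float>& kernel)
{
    const int radius = static_cast<int>(kernel.size()) / 2;
    float sum = 0.0f;
    for (int k = -radius; k <= radius; ++k) {
        int sx = x + k;
        if (sx < 0) sx = 0;
        if (sx >= width) sx = width - 1;
        sum += src[y * width + sx] * kernel[k + radius];
    }
    return sum;
}

// Serial driver. Because every iteration writes a distinct dst element,
// the two loops could in principle be replaced by one GPU thread per
// (x, y) pair.
void BlurHorizontal(const std::vector<float>& src, std::vector<float>& dst,
                    int width, int height, const std::vector<float>& kernel)
{
    dst.resize(src.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[y * width + x] = BlurPixelHorizontal(src, width, x, y, kernel);
}
```

The inner loop over the kernel is also where the cost grows with kernel size: each output pixel performs one multiply-add per kernel tap.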

= Assignment 2 - Parallelize =

= Assignment 3 - Optimize =

Cpaul12


CDOT Wiki β
