Changes

Unique Project Page

30,664 bytes added, 01:32, 26 February 2017

→‎Assignment 1 - Profiling

|}

== Introduction : GPU Benchmarking/Gaussian Blur Filter : Colin Paul ==

[[Image:Cinque_terre.jpg|860px]][[Image:Cinque_terre_BLURRED.jpg|860px]]

[[Image:F2RiP.gif|500px|thumb|alt=convolution pattern]]

[[Image:Img16.png|500px|thumb|alt=Plot of frequency response of the 2D Gaussian]]

===What is Gaussian blurring?===

At a high level, Gaussian blurring works just like box blurring: there is a weight per pixel, and for each pixel you apply the weights to that pixel and its neighbors to come up with the final value for the blurred pixel. It uses a convolution pattern, which is a linear stencil that applies fixed weights to the elements of a neighborhood in the combination operation.

With true Gaussian blurring, however, the function that defines the weights for each pixel technically never reaches zero; it just gets smaller and smaller with distance. In theory, this makes a Gaussian kernel infinitely large. In practice, you can choose a cut-off point and call it good enough.

====The parameters to a Gaussian blur are:====

*Sigma (σ) – This defines how much blur there is. A larger number means more blur.
*Radius – The size of the kernel in pixels. The appropriate size can be calculated for a specific sigma; see Calculating The Kernel Size further down.

Just like a box blur, a Gaussian blur is separable, which means that you can either apply a 2D convolution kernel or apply a 1D convolution kernel on each axis. Doing a single 2D convolution means more calculations, but you only need one buffer to put the results into. Doing two 1D convolutions (one on each axis) ends up being fewer calculations, but requires two buffers to put the results into (one intermediate buffer to hold the first axis results).
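To put rough numbers on that trade-off, the sketch below counts how many input samples are read per output pixel for an n-tap kernel. The function names are hypothetical and are not part of the program on this page.

```cpp
#include <cassert>
#include <cstdint>

// Samples read per output pixel by a single-pass 2D blur with an n x n kernel.
std::uint64_t samplesSinglePass2D(std::uint64_t n)
{
    assert(n > 0);
    return n * n;
}

// Samples read per output pixel by two 1D passes (horizontal, then vertical)
// with an n-tap kernel on each axis.
std::uint64_t samplesTwoPass1D(std::uint64_t n)
{
    assert(n > 0);
    return 2 * n;
}
```

For a 21-tap kernel this gives 441 samples per pixel for the single 2D pass versus 42 for the two 1D passes.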

Here is a 3 pixel 1D Gaussian Kernel for a sigma of 1.0:

[[Image:1dkernel.png|250px]]

This kernel is useful for a two-pass algorithm: first, perform a horizontal blur with the weights above, and then perform a vertical blur on the resulting image (or vice versa).

Below is a 3×3 pixel 2D Gaussian Kernel also with a sigma of 1.0. Note that this can be calculated as an outer product (tensor product) of 1D kernels:

[[Image:2dkernel.png|250px]]

The weights above can be used directly in a single-pass blur algorithm, which takes <math>n^2</math> samples per pixel for an <math>n \times n</math> kernel.
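The outer-product relationship between the 1D and 2D kernels can be sketched as follows. This is a minimal illustration rather than code from the program on this page; the function name is hypothetical, and the weights used in the usage note are placeholders, not the sigma 1.0 values shown in the images.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds an n x n kernel (stored row-major) as the outer product of a
// 1D kernel with itself. If the 1D weights sum to 1, so do the 2D weights.
std::vector<float> outerProductKernel(const std::vector<float>& kernel1D)
{
    assert(!kernel1D.empty());
    const std::size_t n = kernel1D.size();
    std::vector<float> kernel2D(n * n);
    for (std::size_t row = 0; row < n; ++row)
    {
        for (std::size_t col = 0; col < n; ++col)
        {
            kernel2D[row * n + col] = kernel1D[row] * kernel1D[col];
        }
    }
    return kernel2D;
}
```

Calling outerProductKernel with the placeholder weights 0.25, 0.5, 0.25 returns nine weights whose centre value is 0.25 and whose corner values are 0.0625.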

An interesting property of Gaussian blurs is that you can apply multiple smaller blurs and get the same result as if you had applied one larger blur. Unfortunately, doing multiple smaller blurs takes more calculations, so it is not usually worthwhile.

If you apply multiple blurs, the equivalent blur radius is the square root of the sum of the squares of the individual blur radii. Taking Wikipedia’s example, if you applied a blur with a radius of 6 and a blur with a radius of 8, you’d end up with the equivalent of a radius 10 blur, because <math>\sqrt{6^2 + 8^2} = 10</math>.
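The combination rule is short enough to write down directly. The helper below is a hypothetical sketch, not a function from the program on this page.

```cpp
#include <cassert>
#include <cmath>

// Radius of the single blur that is equivalent to applying a blur of
// radius a followed by a blur of radius b.
double combinedBlurRadius(double a, double b)
{
    assert(a >= 0.0 && b >= 0.0);
    return std::sqrt(a * a + b * b);
}
```

With the radii from the example, combinedBlurRadius(6.0, 8.0) evaluates to 10.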

[[Image:Kernalweightperpixel.PNG|500px|thumb|alt=2D Gaussian]]

<h4>Calculating The Kernel</h4>

There are a couple ways to calculate a Gaussian kernel.

Believe it or not, Pascal’s triangle approaches the Gaussian bell curve as the row number approaches infinity. If you remember, Pascal’s triangle also gives the coefficients of each term after expanding the binomial <math>(x + y)^N</math>. So technically, you could use a row from Pascal’s triangle as a 1D kernel and normalize the result, but it isn’t the most accurate.
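As a rough illustration of the Pascal’s triangle approach, the sketch below builds one row and normalizes it. The function name and the row limit are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns row `row` of Pascal's triangle (row 0 is {1}) divided by the row
// sum 2^row, so the weights add up to 1. The row is capped at 60 to keep
// the 64-bit intermediate products from overflowing.
std::vector<double> pascalKernel(unsigned row)
{
    assert(row <= 60);
    std::vector<double> weights;
    const double sum = static_cast<double>(std::uint64_t(1) << row);
    std::uint64_t coeff = 1; // C(row, 0)
    for (unsigned i = 0; i <= row; ++i)
    {
        weights.push_back(static_cast<double>(coeff) / sum);
        coeff = coeff * (row - i) / (i + 1); // C(row, i + 1) from C(row, i)
    }
    return weights;
}
```

pascalKernel(4) yields the five weights 1/16, 4/16, 6/16, 4/16 and 1/16.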

A better way is to use the Gaussian function, which is this: <math>e^{-x^2 / (2\sigma^2)}</math>

Here σ is your blur amount and x ranges across your values from the negative to the positive. For instance, if your kernel had 5 values, x would range from -2 to +2.
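A point-sampled kernel built from the Gaussian function might look like the following sketch. The function name is hypothetical, the tap count is assumed to be odd, and the weights are normalized so they sum to 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Samples exp(-x^2 / (2 sigma^2)) at integer offsets centred on zero and
// normalizes the samples so the weights sum to 1. `taps` must be odd.
std::vector<double> pointSampledKernel(double sigma, int taps)
{
    assert(sigma > 0.0 && taps > 0 && taps % 2 == 1);
    std::vector<double> weights;
    double total = 0.0;
    for (int i = 0; i < taps; ++i)
    {
        const double x = i - taps / 2; // e.g. -2 .. +2 for 5 taps
        const double w = std::exp(-(x * x) / (2.0 * sigma * sigma));
        weights.push_back(w);
        total += w;
    }
    for (double& w : weights)
    {
        w /= total;
    }
    return weights;
}
```

Because this only samples the curve at the centre of each tap, it is less accurate than integrating over each tap, which is the approach the program on this page takes.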

An even better way would be to integrate the Gaussian function instead of just taking point samples. Refer to the diagram on the right.

Below you can find a plot of the continuous distribution function and the discrete kernel approximation. One thing to look out for is the tails of the distribution versus the kernel support: for the current configuration, 13.36% of the curve’s area lies outside the discrete kernel. Note that the weights are renormalized so that the sum of all weights is one. In other words, the probability mass outside the discrete kernel is redistributed across the taps within the kernel in proportion to their weights. The weights are calculated by numerical integration of the continuous Gaussian distribution over each discrete kernel tap. Take a look at the JavaScript source in case you are interested.

Whichever way you do it, make sure to normalize the result so that the weights add up to 1. This ensures that blurring doesn’t make the image brighter (weights summing to more than 1) or dimmer (weights summing to less than 1).

====Calculating The Kernel Size====

Given a sigma value, you can calculate the size of the kernel you need by using this formula: <math>1 + 2\sqrt{-2\sigma^2 \ln 0.005}</math>

That formula makes a kernel large enough that it cuts off where the Gaussian function falls below 0.5% (0.005) of its peak value. You can raise or lower that threshold depending on your desired trade-off between speed and quality.
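As a worked example of the kernel size formula, the sketch below assumes a threshold of 0.005 (0.5%) and then rounds the result to an odd tap count, which is the rounding the program on this page applies. The function name and the default argument are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// Kernel size in pixels for a given sigma:
//   1 + 2 * sqrt(-2 * sigma^2 * ln(threshold))
// The result is floored, incremented and forced odd with "| 1" so that the
// kernel always has a centre tap.
int kernelSizeForSigma(double sigma, double threshold = 0.005)
{
    assert(sigma > 0.0 && threshold > 0.0 && threshold < 1.0);
    const double size =
        1.0 + 2.0 * std::sqrt(-2.0 * sigma * sigma * std::log(threshold));
    return (static_cast<int>(std::floor(size)) + 1) | 1;
}
```

With the default threshold, kernelSizeForSigma(3.0) gives 21 taps. That agrees with the 42 GaussianSimpsonIntegration calls in the profile further down, which correspond to two 21-tap kernels.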

===Running program===

====Windows====

To compile and run the program:

# Set up an empty Visual C++ project in Visual Studio.
# Save [http://matrix.senecac.on.ca/~cpaul12/cinque_terre.bmp this] image and place it in your project's directory.
# Copy the source code below and paste it into a [your chosen file name].cpp file.
# Go into the Debug properties of your project.
# Add four (4) values to Debugging -> Command Arguments:

[input image filename].bmp [output image filename].bmp [x - sigma value] [y - sigma value]

For example:

cinque_terre.bmp cinque_terre_BLURRED.bmp 3.0 3.0

====Linux====

To compile and run the program:

# Navigate to the directory you want to run the program in.
# Save [http://matrix.senecac.on.ca/~cpaul12/cinque_terre.bmp this] image and place it in the directory you will be running the program from.
# Copy the main source code below and paste it into a [your chosen file name].cpp file (the compile command below assumes gaussian.cpp).
# Copy the header source code below and paste it into a file named windows.h.

Compile the binary using the following command:

g++ -O2 -std=c++0x -Wall -pedantic gaussian.cpp -o blur

Run the compiled program:

./blur cinque_terre.bmp cinque_terre_BLURRED.bmp 3.0 3.0

The command line arguments are structured as follows:

[input image filename].bmp [output image filename].bmp [x - sigma value] [y - sigma value]

====Code====

Original source code (Windows) can be found [http://blog.demofox.org/2015/08/19/gaussian-blur/ here].

{| class="wikitable mw-collapsible mw-collapsed"

! Windows - Gaussian Blur Filter Source Code (Visual Studio)

|-

|
<syntaxhighlight lang="cpp">

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>    // floor, sqrtf, log, expf
#include <array>
#include <vector>
#include <functional>
#include <windows.h> // for bitmap headers. Sorry non windows people!

const float c_pi = 3.14159265359f;

struct SImageData
{
    SImageData()
        : m_width(0)
        , m_height(0)
        , m_pitch(0)
    { }

    long m_width;   // width in pixels
    long m_height;  // height in pixels
    long m_pitch;   // bytes per row, including padding
    std::vector<uint8_t> m_pixels;
};

void WaitForEnter()
{
    char c;
    std::cout << "Press Enter key to exit ... ";
    std::cin.get(c);
}

bool LoadImage(const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;

    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }

    // read in our pixel data if we can. Note that it's in BGR order, and each row is padded to a multiple of 4 bytes
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }

    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;

    // pitch is the row size in bytes: 3 bytes per pixel, rounded up to a multiple of 4
    imageData.m_pitch = imageData.m_width * 3;
    if (imageData.m_pitch & 3)
    {
        imageData.m_pitch &= ~3;
        imageData.m_pitch += 4;
    }

    fclose(file);
    return true;
}

bool SaveImage(const char *fileName, const SImageData &image)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file)
        return false;

    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;

    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;

    infoHeader.biSize = 40;
    infoHeader.biWidth = image.m_width;
    infoHeader.biHeight = image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;

    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;

    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
    return true;
}

int PixelsNeededForSigma(float sigma)
{
    // returns the number of pixels needed to represent a gaussian kernel that has values
    // down to the threshold amount. A gaussian function technically has values everywhere
    // on the image, but the threshold lets us cut it off where the pixels contribute to
    // only small amounts that aren't as noticeable.
    const float c_threshold = 0.005f; // 0.5%
    return int(floor(1.0f + 2.0f * sqrtf(-2.0f * sigma * sigma * log(c_threshold)))) + 1;
}

float Gaussian(float sigma, float x)
{
    return expf(-(x*x) / (2.0f * sigma*sigma));
}

float GaussianSimpsonIntegration(float sigma, float a, float b)
{
    // Simpson's rule over the interval [a, b]
    return
        ((b - a) / 6.0f) *
        (Gaussian(sigma, a) + 4.0f * Gaussian(sigma, (a + b) / 2.0f) + Gaussian(sigma, b));
}

std::vector<float> GaussianKernelIntegrals(float sigma, int taps)
{
    std::vector<float> ret;
    float total = 0.0f;
    for (int i = 0; i < taps; ++i)
    {
        // integrate the gaussian over the one-pixel-wide interval centred on this tap
        float x = float(i) - float(taps / 2);
        float value = GaussianSimpsonIntegration(sigma, x - 0.5f, x + 0.5f);
        ret.push_back(value);
        total += value;
    }

    // normalize it
    for (unsigned int i = 0; i < ret.size(); ++i)
    {
        ret[i] /= total;
    }
    return ret;
}

const uint8_t* GetPixelOrBlack(const SImageData& image, int x, int y)
{
    static const uint8_t black[3] = { 0, 0, 0 };
    if (x < 0 || x >= image.m_width ||
        y < 0 || y >= image.m_height)
    {
        return black;
    }

    return &image.m_pixels[(y * image.m_pitch) + x * 3];
}

void BlurImage(const SImageData& srcImage, SImageData &destImage, float xblursigma, float yblursigma, unsigned int xblursize, unsigned int yblursize)
{
    // allocate space for copying the image for destImage and tmpImage
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = srcImage.m_pitch;
    destImage.m_pixels.resize(destImage.m_height * destImage.m_pitch);

    SImageData tmpImage;
    tmpImage.m_width = srcImage.m_width;
    tmpImage.m_height = srcImage.m_height;
    tmpImage.m_pitch = srcImage.m_pitch;
    tmpImage.m_pixels.resize(tmpImage.m_height * tmpImage.m_pitch);

    // horizontal blur from srcImage into tmpImage
    {
        auto row = GaussianKernelIntegrals(xblursigma, xblursize);

        int startOffset = -1 * int(row.size() / 2);

        for (int y = 0; y < tmpImage.m_height; ++y)
        {
            for (int x = 0; x < tmpImage.m_width; ++x)
            {
                std::array<float, 3> blurredPixel = { { 0.0f, 0.0f, 0.0f } };
                for (unsigned int i = 0; i < row.size(); ++i)
                {
                    // cast i to int so the (possibly negative) offset stays in signed arithmetic
                    const uint8_t *pixel = GetPixelOrBlack(srcImage, x + startOffset + int(i), y);
                    blurredPixel[0] += float(pixel[0]) * row[i];
                    blurredPixel[1] += float(pixel[1]) * row[i];
                    blurredPixel[2] += float(pixel[2]) * row[i];
                }

                uint8_t *destPixel = &tmpImage.m_pixels[y * tmpImage.m_pitch + x * 3];
                destPixel[0] = uint8_t(blurredPixel[0]);
                destPixel[1] = uint8_t(blurredPixel[1]);
                destPixel[2] = uint8_t(blurredPixel[2]);
            }
        }
    }

    // vertical blur from tmpImage into destImage
    {
        auto row = GaussianKernelIntegrals(yblursigma, yblursize);

        int startOffset = -1 * int(row.size() / 2);

        for (int y = 0; y < destImage.m_height; ++y)
        {
            for (int x = 0; x < destImage.m_width; ++x)
            {
                std::array<float, 3> blurredPixel = { { 0.0f, 0.0f, 0.0f } };
                for (unsigned int i = 0; i < row.size(); ++i)
                {
                    const uint8_t *pixel = GetPixelOrBlack(tmpImage, x, y + startOffset + int(i));
                    blurredPixel[0] += float(pixel[0]) * row[i];
                    blurredPixel[1] += float(pixel[1]) * row[i];
                    blurredPixel[2] += float(pixel[2]) * row[i];
                }

                uint8_t *destPixel = &destImage.m_pixels[y * destImage.m_pitch + x * 3];
                destPixel[0] = uint8_t(blurredPixel[0]);
                destPixel[1] = uint8_t(blurredPixel[1]);
                destPixel[2] = uint8_t(blurredPixel[2]);
            }
        }
    }
}

int main(int argc, char **argv)
{
    float xblursigma, yblursigma;

    bool showUsage = argc < 5 ||
        (sscanf(argv[3], "%f", &xblursigma) != 1) ||
        (sscanf(argv[4], "%f", &yblursigma) != 1);

    if (showUsage)
    {
        printf("Usage: <source> <dest> <xblur> <yblur>\nBlur values are sigma\n\n");
        WaitForEnter();
        return 1;
    }

    // argv[1] and argv[2] are only read once we know they exist
    char *srcFileName = argv[1];
    char *destFileName = argv[2];

    // calculate pixel sizes, and make sure they are odd
    int xblursize = PixelsNeededForSigma(xblursigma) | 1;
    int yblursize = PixelsNeededForSigma(yblursigma) | 1;

    printf("Attempting to blur a 24 bit image.\n");
    printf(" Source=%s\n Dest=%s\n blur=[%0.1f, %0.1f] px=[%d,%d]\n\n", srcFileName, destFileName, xblursigma, yblursigma, xblursize, yblursize);

    SImageData srcImage;
    if (LoadImage(srcFileName, srcImage))
    {
        printf("%s loaded\n", srcFileName);
        SImageData destImage;
        BlurImage(srcImage, destImage, xblursigma, yblursigma, xblursize, yblursize);
        if (SaveImage(destFileName, destImage))
            printf("Blurred image saved as %s\n", destFileName);
        else
        {
            printf("Could not save blurred image as %s\n", destFileName);
            WaitForEnter();
            return 1;
        }
    }
    else
    {
        printf("could not read 24 bit bmp file %s\n\n", srcFileName);
        WaitForEnter();
        return 1;
    }
    return 0;
}

</syntaxhighlight>

|}

Ported to Linux:

{| class="wikitable mw-collapsible mw-collapsed"

! Linux - Gaussian Blur Filter Source Code (Command Line)

|-

|
<syntaxhighlight lang="cpp">

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>
#include <array>
#include <vector>
#include <functional>
#include "windows.h" // local header that replicates the Windows bitmap structs (see the header source below)

/* uncomment the line below if you want to run gprof */
//#define RUN_GPROF

const float c_pi = 3.14159265359f;

struct SImageData
{
    SImageData()
        : m_width(0)
        , m_height(0)
        , m_pitch(0)
    { }

    long m_width;   // width in pixels
    long m_height;  // height in pixels
    long m_pitch;   // bytes per row, including padding
    std::vector<uint8_t> m_pixels;
};

void WaitForEnter()
{
    char c;
    std::cout << "Press Enter key to exit ... ";
    std::cin.get(c);
}

bool LoadImage(const char *fileName, SImageData& imageData)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "rb");
    if (!file)
        return false;

    // read the headers if we can
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;
    if (fread(&header, sizeof(header), 1, file) != 1 ||
        fread(&infoHeader, sizeof(infoHeader), 1, file) != 1 ||
        header.bfType != 0x4D42 || infoHeader.biBitCount != 24)
    {
        fclose(file);
        return false;
    }

    // read in our pixel data if we can. Note that it's in BGR order, and each row is padded to a multiple of 4 bytes
    imageData.m_pixels.resize(infoHeader.biSizeImage);
    fseek(file, header.bfOffBits, SEEK_SET);
    if (fread(&imageData.m_pixels[0], imageData.m_pixels.size(), 1, file) != 1)
    {
        fclose(file);
        return false;
    }

    imageData.m_width = infoHeader.biWidth;
    imageData.m_height = infoHeader.biHeight;

    // pitch is the row size in bytes: 3 bytes per pixel, rounded up to a multiple of 4
    imageData.m_pitch = imageData.m_width * 3;
    if (imageData.m_pitch & 3)
    {
        imageData.m_pitch &= ~3;
        imageData.m_pitch += 4;
    }

    fclose(file);
    return true;
}

bool SaveImage(const char *fileName, const SImageData &image)
{
    // open the file if we can
    FILE *file;
    file = fopen(fileName, "wb");
    if (!file)
        return false;

    // make the header info
    BITMAPFILEHEADER header;
    BITMAPINFOHEADER infoHeader;

    header.bfType = 0x4D42;
    header.bfReserved1 = 0;
    header.bfReserved2 = 0;
    header.bfOffBits = 54;

    infoHeader.biSize = 40;
    infoHeader.biWidth = image.m_width;
    infoHeader.biHeight = image.m_height;
    infoHeader.biPlanes = 1;
    infoHeader.biBitCount = 24;
    infoHeader.biCompression = 0;
    infoHeader.biSizeImage = image.m_pixels.size();
    infoHeader.biXPelsPerMeter = 0;
    infoHeader.biYPelsPerMeter = 0;
    infoHeader.biClrUsed = 0;
    infoHeader.biClrImportant = 0;

    header.bfSize = infoHeader.biSizeImage + header.bfOffBits;

    // write the data and close the file
    fwrite(&header, sizeof(header), 1, file);
    fwrite(&infoHeader, sizeof(infoHeader), 1, file);
    fwrite(&image.m_pixels[0], infoHeader.biSizeImage, 1, file);
    fclose(file);
    return true;
}

int PixelsNeededForSigma(float sigma)
{
    // returns the number of pixels needed to represent a gaussian kernel that has values
    // down to the threshold amount. A gaussian function technically has values everywhere
    // on the image, but the threshold lets us cut it off where the pixels contribute to
    // only small amounts that aren't as noticeable.
    const float c_threshold = 0.005f; // 0.5%
    return int(floor(1.0f + 2.0f * sqrtf(-2.0f * sigma * sigma * log(c_threshold)))) + 1;
}

float Gaussian(float sigma, float x)
{
    return expf(-(x*x) / (2.0f * sigma*sigma));
}

float GaussianSimpsonIntegration(float sigma, float a, float b)
{
    // Simpson's rule over the interval [a, b]
    return
        ((b - a) / 6.0f) *
        (Gaussian(sigma, a) + 4.0f * Gaussian(sigma, (a + b) / 2.0f) + Gaussian(sigma, b));
}

std::vector<float> GaussianKernelIntegrals(float sigma, int taps)
{
    std::vector<float> ret;
    float total = 0.0f;
    for (int i = 0; i < taps; ++i)
    {
        // integrate the gaussian over the one-pixel-wide interval centred on this tap
        float x = float(i) - float(taps / 2);
        float value = GaussianSimpsonIntegration(sigma, x - 0.5f, x + 0.5f);
        ret.push_back(value);
        total += value;
    }

    // normalize it
    for (unsigned int i = 0; i < ret.size(); ++i)
    {
        ret[i] /= total;
    }
    return ret;
}

const uint8_t* GetPixelOrBlack(const SImageData& image, int x, int y)
{
    static const uint8_t black[3] = { 0, 0, 0 };
    if (x < 0 || x >= image.m_width ||
        y < 0 || y >= image.m_height)
    {
        return black;
    }

    return &image.m_pixels[(y * image.m_pitch) + x * 3];
}

void BlurImage(const SImageData& srcImage, SImageData &destImage, float xblursigma, float yblursigma, unsigned int xblursize, unsigned int yblursize)
{
    // allocate space for copying the image for destImage and tmpImage
    destImage.m_width = srcImage.m_width;
    destImage.m_height = srcImage.m_height;
    destImage.m_pitch = srcImage.m_pitch;
    destImage.m_pixels.resize(destImage.m_height * destImage.m_pitch);

    SImageData tmpImage;
    tmpImage.m_width = srcImage.m_width;
    tmpImage.m_height = srcImage.m_height;
    tmpImage.m_pitch = srcImage.m_pitch;
    tmpImage.m_pixels.resize(tmpImage.m_height * tmpImage.m_pitch);

    // horizontal blur from srcImage into tmpImage
    {
        auto row = GaussianKernelIntegrals(xblursigma, xblursize);

        int startOffset = -1 * int(row.size() / 2);

        for (int y = 0; y < tmpImage.m_height; ++y)
        {
            for (int x = 0; x < tmpImage.m_width; ++x)
            {
                std::array<float, 3> blurredPixel = { { 0.0f, 0.0f, 0.0f } };
                for (unsigned int i = 0; i < row.size(); ++i)
                {
                    // cast i to int so the (possibly negative) offset stays in signed arithmetic
                    const uint8_t *pixel = GetPixelOrBlack(srcImage, x + startOffset + int(i), y);
                    blurredPixel[0] += float(pixel[0]) * row[i];
                    blurredPixel[1] += float(pixel[1]) * row[i];
                    blurredPixel[2] += float(pixel[2]) * row[i];
                }

                uint8_t *destPixel = &tmpImage.m_pixels[y * tmpImage.m_pitch + x * 3];
                destPixel[0] = uint8_t(blurredPixel[0]);
                destPixel[1] = uint8_t(blurredPixel[1]);
                destPixel[2] = uint8_t(blurredPixel[2]);
            }
        }
    }

    // vertical blur from tmpImage into destImage
    {
        auto row = GaussianKernelIntegrals(yblursigma, yblursize);

        int startOffset = -1 * int(row.size() / 2);

        for (int y = 0; y < destImage.m_height; ++y)
        {
            for (int x = 0; x < destImage.m_width; ++x)
            {
                std::array<float, 3> blurredPixel = { { 0.0f, 0.0f, 0.0f } };
                for (unsigned int i = 0; i < row.size(); ++i)
                {
                    const uint8_t *pixel = GetPixelOrBlack(tmpImage, x, y + startOffset + int(i));
                    blurredPixel[0] += float(pixel[0]) * row[i];
                    blurredPixel[1] += float(pixel[1]) * row[i];
                    blurredPixel[2] += float(pixel[2]) * row[i];
                }

                uint8_t *destPixel = &destImage.m_pixels[y * destImage.m_pitch + x * 3];
                destPixel[0] = uint8_t(blurredPixel[0]);
                destPixel[1] = uint8_t(blurredPixel[1]);
                destPixel[2] = uint8_t(blurredPixel[2]);
            }
        }
    }
}

int main(int argc, char **argv)
{
#ifdef RUN_GPROF
    // fixed inputs so the profiled run needs no command line arguments
    float xblursigma = 3.0f, yblursigma = 3.0f;
    bool showUsage = false;
    const char *srcFileName = "cinque_terre.bmp";
    const char *destFileName = "cinque_terre_BLURRED.bmp";
#else
    float xblursigma, yblursigma;
    bool showUsage = argc < 5 ||
        (sscanf(argv[3], "%f", &xblursigma) != 1) ||
        (sscanf(argv[4], "%f", &yblursigma) != 1);

    // only read argv[1] and argv[2] when they are known to exist
    const char *srcFileName = showUsage ? "" : argv[1];
    const char *destFileName = showUsage ? "" : argv[2];
#endif /* RUN_GPROF */

    if (showUsage)
    {
        printf("Usage: <source> <dest> <xblur> <yblur>\nBlur values are sigma\n\n");
        WaitForEnter();
        return 1;
    }

    // calculate pixel sizes, and make sure they are odd
    int xblursize = PixelsNeededForSigma(xblursigma) | 1;
    int yblursize = PixelsNeededForSigma(yblursigma) | 1;

    printf("Attempting to blur a 24 bit image.\n");
    printf(" Source=%s\n Dest=%s\n blur=[%0.1f, %0.1f] px=[%d,%d]\n\n", srcFileName, destFileName, xblursigma, yblursigma, xblursize, yblursize);

    SImageData srcImage;
    if (LoadImage(srcFileName, srcImage))
    {
        printf("%s loaded\n", srcFileName);
        SImageData destImage;
        BlurImage(srcImage, destImage, xblursigma, yblursigma, xblursize, yblursize);
        if (SaveImage(destFileName, destImage))
            printf("Blurred image saved as %s\n", destFileName);
        else
        {
            printf("Could not save blurred image as %s\n", destFileName);
            WaitForEnter();
            return 1;
        }
    }
    else
    {
        printf("could not read 24 bit bmp file %s\n\n", srcFileName);
        WaitForEnter();
        return 1;
    }
    return 0;
}

</syntaxhighlight>

|}

{| class="wikitable mw-collapsible mw-collapsed"

! Linux - Header Source Code (Linux cannot use the Windows API, had to replicate the required structs)

|-

|
<syntaxhighlight lang="cpp">

#pragma once

// for Linux platform, please make sure the size of data type is correct for BMP spec.
// if you use this on Windows or other platforms, please pay attention to this.
typedef int            LONG;
typedef unsigned char  BYTE;
typedef unsigned int   DWORD;
typedef unsigned short WORD;

// __attribute__((packed)) on non-Intel arch may cause some unexpected errors!
typedef struct tagBITMAPFILEHEADER
{
    WORD  bfType;      // 2  /* Magic identifier */
    DWORD bfSize;      // 4  /* File size in bytes */
    WORD  bfReserved1; // 2
    WORD  bfReserved2; // 2
    DWORD bfOffBits;   // 4  /* Offset to image data, bytes */
} __attribute__((packed)) BITMAPFILEHEADER;

typedef struct tagBITMAPINFOHEADER
{
    DWORD biSize;          // 4  /* Header size in bytes */
    LONG  biWidth;         // 4  /* Width of image */
    LONG  biHeight;        // 4  /* Height of image */
    WORD  biPlanes;        // 2  /* Number of colour planes */
    WORD  biBitCount;      // 2  /* Bits per pixel */
    DWORD biCompression;   // 4  /* Compression type */
    DWORD biSizeImage;     // 4  /* Image size in bytes */
    LONG  biXPelsPerMeter; // 4
    LONG  biYPelsPerMeter; // 4  /* Pixels per meter */
    DWORD biClrUsed;       // 4  /* Number of colours */
    DWORD biClrImportant;  // 4  /* Important colours */
} __attribute__((packed)) BITMAPINFOHEADER;

</syntaxhighlight>

|}

<h3>Analysis</h3>

<pre>
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self                self     total
 time   seconds   seconds     calls  ns/call  ns/call  name
 61.38      3.37     3.37                              BlurImage(SImageData const&, SImageData&, float, float, unsigned int, unsigned int)
 38.62      5.49     2.12 172032000    12.32    12.32  GetPixelOrBlack(SImageData const&, int, int)
  0.00      5.49     0.00       126     0.00     0.00  Gaussian(float, float)
  0.00      5.49     0.00        42     0.00     0.00  GaussianSimpsonIntegration(float, float, float)
  0.00      5.49     0.00        12     0.00     0.00  void std::vector<float, std::allocator<float> >::_M_insert_aux<float const&>(__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > >, float const&&&)
  0.00      5.49     0.00         3     0.00     0.00  std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned int)
  0.00      5.49     0.00         2     0.00     0.00  GaussianKernelIntegrals(float, int)
  0.00      5.49     0.00         1     0.00     0.00  _GLOBAL__sub_I__Z12WaitForEnterv
</pre>

<pre>
Call graph

granularity: each sample hit covers 4 byte(s) for 0.18% of 5.49 seconds

index % time    self  children    called               name
[1]    100.0    3.37    2.12                           BlurImage(SImageData const&, SImageData&, float, float, unsigned int, unsigned int) [1]
                2.12    0.00 172032000/172032000           GetPixelOrBlack(SImageData const&, int, int) [2]
                0.00    0.00         2/2                   GaussianKernelIntegrals(float, int) [11]
                0.00    0.00         2/3                   std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned int) [10]
-----------------------------------------------
                2.12    0.00 172032000/172032000           BlurImage(SImageData const&, SImageData&, float, float, unsigned int, unsigned int) [1]
[2]     38.6    2.12    0.00 172032000                 GetPixelOrBlack(SImageData const&, int, int) [2]
-----------------------------------------------
                0.00    0.00       126/126                 GaussianSimpsonIntegration(float, float, float) [8]
[7]      0.0    0.00    0.00       126                 Gaussian(float, float) [7]
-----------------------------------------------
                0.00    0.00        42/42                  GaussianKernelIntegrals(float, int) [11]
[8]      0.0    0.00    0.00        42                 GaussianSimpsonIntegration(float, float, float) [8]
                0.00    0.00       126/126                 Gaussian(float, float) [7]
-----------------------------------------------
                0.00    0.00        12/12                  GaussianKernelIntegrals(float, int) [11]
[9]      0.0    0.00    0.00        12                 void std::vector<float, std::allocator<float> >::_M_insert_aux<float const&>(__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > >, float const&&&) [9]
-----------------------------------------------
                0.00    0.00         1/3                   LoadImage(char const*, SImageData&) [15]
                0.00    0.00         2/3                   BlurImage(SImageData const&, SImageData&, float, float, unsigned int, unsigned int) [1]
[10]     0.0    0.00    0.00         3                 std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned int) [10]
-----------------------------------------------
                0.00    0.00         2/2                   BlurImage(SImageData const&, SImageData&, float, float, unsigned int, unsigned int) [1]
[11]     0.0    0.00    0.00         2                 GaussianKernelIntegrals(float, int) [11]
                0.00    0.00        42/42                  GaussianSimpsonIntegration(float, float, float) [8]
                0.00    0.00        12/12                  void std::vector<float, std::allocator<float> >::_M_insert_aux<float const&>(__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > >, float const&&&) [9]
-----------------------------------------------
                0.00    0.00         1/1                   __do_global_ctors_aux [18]
[12]     0.0    0.00    0.00         1                 _GLOBAL__sub_I__Z12WaitForEnterv [12]
-----------------------------------------------

Index by function name

  [12] _GLOBAL__sub_I__Z12WaitForEnterv (gaussian.cpp)
   [8] GaussianSimpsonIntegration(float, float, float)
   [9] void std::vector<float, std::allocator<float> >::_M_insert_aux<float const&>(__gnu_cxx::__normal_iterator<float*, std::vector<float, std::allocator<float> > >, float const&&&)
   [2] GetPixelOrBlack(SImageData const&, int, int)
   [7] Gaussian(float, float)
  [10] std::vector<unsigned char, std::allocator<unsigned char> >::_M_default_append(unsigned int)
  [11] GaussianKernelIntegrals(float, int)
   [1] BlurImage(SImageData const&, SImageData&, float, float, unsigned int, unsigned int)
</pre>

=== Observations ===

The program does not take a long time to run, but the run time depends on the value of sigma (σ) and on the kernel size derived from it. Larger values of these parameters increase the run time significantly. The code is relatively straightforward, and the parallelization should also be easy to implement and test.

=== Hotspot ===

According to the flat profile, 61.38% of the run time is spent in the BlurImage function itself, and the remaining 38.62% is spent in GetPixelOrBlack, which is called only from BlurImage. BlurImage contains two sets of triply nested for loops (image rows, image columns and kernel taps), which equates to a run time T(n) of <math>O(n^3)</math>.

The call graph supports this: BlurImage and its children account for 100% of the sampled time, and GetPixelOrBlack is called 172,032,000 times, once per kernel tap per pixel per pass. BlurImage is therefore the prime candidate for parallelization using CUDA. The sigma (σ) and the kernel size can be increased to make the computation more demanding on the GPU and to obtain a meaningful benchmark.
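Each output pixel of a blur pass depends only on the input image and the kernel, which is what makes the loop nest a natural fit for one thread per pixel. The C++ sketch below isolates that per-pixel unit of work for a single-channel image. The function name and the flat float image layout are assumptions made for illustration; they differ from the 24-bit BGR layout the program uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Horizontal blur of one output pixel (x, y) of a single-channel, row-major
// image. Samples that fall outside the image count as black (0), in the
// same spirit as GetPixelOrBlack. No other output pixel is read or written,
// so every call is independent of the others.
float blurPixelHorizontal(const std::vector<float>& src, int width, int height,
                          int x, int y, const std::vector<float>& kernel)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    assert(x >= 0 && x < width && y >= 0 && y < height);

    const int taps = static_cast<int>(kernel.size());
    const int half = taps / 2;
    float sum = 0.0f;
    for (int i = 0; i < taps; ++i)
    {
        const int sx = x - half + i; // source column for this tap
        if (sx >= 0 && sx < width)
        {
            sum += src[static_cast<std::size_t>(y) * width + sx] * kernel[i];
        }
    }
    return sum;
}
```

A parallel version would evaluate a function like this once per output pixel and repeat the same pattern along the other axis for the vertical pass.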

Cpaul12

147

edits
