Changes

BETTERRED

579 bytes added, 11:47, 12 April 2017


In the original program, a BMP is loaded into a vector of uint8_t. This layout is not ideal for CUDA, so an array of pinned memory was allocated instead. The array holds the same number of elements but stores them as a structure, "BGRPixel", which is three contiguous floats. The contents of the vector are then copied into the pinned memory.
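The conversion described above might look roughly like the following host-side sketch. The field names of BGRPixel, the helper name toFloatPixels, the tightly packed 3-bytes-per-pixel input, and keeping the values in the 0 to 255 range are all assumptions made for illustration; the source only states that BGRPixel is three contiguous floats. In the CUDA program the destination pointer would refer to pinned memory (for example from cudaMallocHost); here any host buffer can be used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Three contiguous floats per pixel, in the BGR channel order used by BMP data.
// Field names are hypothetical; the source only gives the struct name.
struct BGRPixel {
    float b;
    float g;
    float r;
};

// Converts tightly packed 8-bit BGR bytes into BGRPixel structs.
// `dst` must have room for src.size() / 3 pixels. In the CUDA program it would
// point at pinned memory; this sketch accepts any host buffer.
void toFloatPixels(const std::vector<std::uint8_t>& src, BGRPixel* dst) {
    assert(src.size() % 3 == 0);  // assumption: no per-row padding bytes
    const std::size_t pixelCount = src.size() / 3;
    for (std::size_t i = 0; i < pixelCount; ++i) {
        dst[i].b = static_cast<float>(src[3 * i + 0]);
        dst[i].g = static_cast<float>(src[3 * i + 1]);
        dst[i].r = static_cast<float>(src[3 * i + 2]);
    }
}
```

Each 3-byte pixel becomes a 12-byte BGRPixel, which is where the fourfold growth per pixel in device memory comes from.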

=== Device Memory Management ===

To compute a blurred pixel, the surrounding pixels must be sampled; near the edges this means sampling pixels outside the bounds of the image. The original program used a simple if check to determine whether a pixel was outside the bounds of the image, and returned a black pixel if it was. In a kernel, this if statement would most likely have caused significant thread divergence, so the images created in device memory include additional padding of black pixels instead. Two such images were created, one for the horizontal blur and one for the vertical blur. Other small device arrays were also needed to store the Gaussian integrals that are used to produce the blurring effect.<br>
{| class="wikitable mw-collapsible mw-collapsed"
! Padding example
|-
| <div style="display:inline;">

[[File:shrunk.png]]

</div>

<div style="display:inline;">

[[File:pad.png]]

</div>

<br>

This is how the image would be padded for a blur with sigmas [x = 3, y = 3].

The original image is 2560x1600 -> 11.7MB

With blur sigmas [x = 3, y = 3] and conversion to float the padded images will be 2600x1640 -> 48.8MB

The padding adds 4.1% more pixels; combined with the conversion from uint8_t to float, the memory required on the GPU increases by about 316%.

Since two padded images are needed, at least 97.6MB will be on the GPU.

|}
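The padding arithmetic in the example above can be reproduced with a small host-side sketch. The names PaddedSize, paddedSize, imageBytes and paddedIndex are hypothetical, and the padding of 20 pixels per side is inferred from the figures, (2600 - 2560) / 2; how the program derives that value from the blur sigmas is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions of an image after adding `pad` black pixels on every side.
struct PaddedSize {
    std::size_t width;
    std::size_t height;
};

PaddedSize paddedSize(std::size_t width, std::size_t height, std::size_t pad) {
    return {width + 2 * pad, height + 2 * pad};
}

// Bytes needed to store width * height pixels at bytesPerPixel each
// (3 for uint8_t BGR, 12 for three floats).
std::size_t imageBytes(std::size_t width, std::size_t height,
                       std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Linear index of original pixel (x, y) inside the padded image. Neighbours
// that fall off the original image land in the black border, so the sampling
// code needs no bounds check.
std::size_t paddedIndex(std::size_t x, std::size_t y,
                        std::size_t paddedWidth, std::size_t pad) {
    assert(x + pad < paddedWidth);
    return (y + pad) * paddedWidth + (x + pad);
}
```

For the image above, paddedSize(2560, 1600, 20) gives 2600x1640, imageBytes(2560, 1600, 3) is 12,288,000 bytes (about 11.7MB, counting 1MB as 1024x1024 bytes), and imageBytes(2600, 1640, 12) is 51,168,000 bytes (about 48.8MB).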

=== Host to Device ===

Jkraitberg

49

edits
