== Steps ==
=== Host Memory Management ===
In the original program, a BMP is loaded into a vector of uint8_t. This is not ideal for CUDA, so an array of pinned (page-locked) host memory was allocated instead. The array holds the same number of elements as the vector but stores them as floats, so that each pixel can be treated as a "BGRPixel" structure of three contiguous floats. The contents of the vector are then copied into the pinned memory.

{| class="wikitable mw-collapsible mw-collapsed"
! Unoptimized - BlurImage( ... )
|-
|
<syntaxhighlight lang="cpp">
struct SImageData
{
    SImageData()
        : m_width(0)
        , m_height(0)
        , m_pitch(0)
    { }

    long m_width;
    long m_height;
    long m_pitch;
    std::vector<uint8_t> m_pixels;
};

// One pixel stored as three contiguous floats
struct BGRPixel {
    float b;
    float g;
    float r;
};

void BlurImage(const SImageData& srcImage, SImageData &destImage, float xblursigma, float yblursigma, unsigned int xblursize, unsigned int yblursize)
{
    int xImage = srcImage.m_width;            // Width of image
    int yImage = srcImage.m_height;           // Height of image
    int imageSize = xImage*yImage;

    int xPadded = xImage + (xblursize - 1);   // Width including padding
    int yPadded = yImage + (yblursize - 1);   // Height including padding
    int paddedSize = xPadded*yPadded;

    int xPad = xblursize / 2;                 // Number of padding columns on each side
    int yPad = yblursize / 2;                 // Number of padding rows on each side
    int padOffset = xPadded*yPad + xPad;      // Offset to first pixel in padded image

    float* pinnedImage = nullptr;
    BGRPixel* d_padded1 = nullptr;
    BGRPixel* d_padded2 = nullptr;

    // ...

    // Allocate memory for host and device
    check(cudaHostAlloc((void**)&pinnedImage, 3 * imageSize * sizeof(float), cudaHostAllocDefault));
    check(cudaMalloc((void**)&d_padded1, paddedSize * sizeof(BGRPixel)));
    check(cudaMalloc((void**)&d_padded2, paddedSize * sizeof(BGRPixel)));

    // Copy image to pinned memory, widening each 8-bit channel to a float
    for (int i = 0; i < 3 * imageSize; ++i) {
        pinnedImage[i] = (float)srcImage.m_pixels[i];
    }

    // ...
}
</syntaxhighlight>
|}
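The conversion step on its own can be sketched in plain C++, independent of CUDA. The helper name <code>toFloatPixels</code> is hypothetical and not part of the original program. The sketch assumes the source bytes are tightly packed in B, G, R order with no per-row padding bytes. In the CUDA version the destination would be the pinned buffer rather than a <code>std::vector</code>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Three contiguous floats per pixel, matching the layout described above.
struct BGRPixel {
    float b;
    float g;
    float r;
};

// The struct must have no padding so that a float array of length
// 3 * pixelCount can be viewed as pixelCount BGRPixel values.
static_assert(sizeof(BGRPixel) == 3 * sizeof(float), "BGRPixel must be packed");

// Hypothetical helper: widen tightly packed 8-bit BGR bytes to float pixels.
// Assumes bytes.size() is a multiple of 3 (no per-row padding bytes).
std::vector<BGRPixel> toFloatPixels(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() % 3 == 0);
    std::vector<BGRPixel> pixels(bytes.size() / 3);
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        pixels[i].b = static_cast<float>(bytes[3 * i + 0]);
        pixels[i].g = static_cast<float>(bytes[3 * i + 1]);
        pixels[i].r = static_cast<float>(bytes[3 * i + 2]);
    }
    return pixels;
}
```

Storing floats up front means the kernel does not have to convert each channel from an 8-bit integer every time a pixel is sampled, at the cost of four times the memory per channel.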
=== Device Memory Management ===
To compute a blurred pixel, the surrounding pixels must be sampled; near the edges, this means sampling pixels outside the bounds of the image. The original program used a simple if check to determine whether a sample fell outside the image and, if so, returned a black pixel instead. In a kernel, this if statement would most likely have caused significant thread divergence, so the images created in device memory were given additional padding of black pixels to make the check unnecessary. Two such images were created: one for the horizontal blur and one for the vertical blur. Several small device arrays were also needed to store the Gaussian integrals used to produce the blurring effect.
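The padding idea can be illustrated with a minimal host-side sketch. It is not the project's kernel: it works on a single float channel, handles only the horizontal pass, and the function names <code>padRows</code> and <code>blurRows</code> are hypothetical. It also assumes an odd kernel size, so that the padding is the same on both sides. The <code>weights</code> argument stands in for the precomputed blur coefficients.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a width x height image into a zero-filled (black) buffer that has
// xPad extra columns on the left and on the right of every row.
std::vector<float> padRows(const std::vector<float>& image,
                           int width, int height, int xPad) {
    assert(image.size() == static_cast<std::size_t>(width) * height);
    const int paddedWidth = width + 2 * xPad;
    std::vector<float> padded(static_cast<std::size_t>(paddedWidth) * height, 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            padded[y * paddedWidth + xPad + x] = image[y * width + x];
        }
    }
    return padded;
}

// Horizontal blur with no bounds check in the inner loop: every tap falls
// inside the padded buffer, and taps past the image edge read black (0).
std::vector<float> blurRows(const std::vector<float>& padded,
                            int width, int height,
                            const std::vector<float>& weights) {
    assert(weights.size() % 2 == 1);  // assumption: odd kernel size
    const int taps = static_cast<int>(weights.size());
    const int paddedWidth = width + (taps - 1);
    assert(padded.size() == static_cast<std::size_t>(paddedWidth) * height);
    std::vector<float> out(static_cast<std::size_t>(width) * height, 0.0f);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = 0; k < taps; ++k) {
                // Output column x corresponds to padded columns x .. x + taps - 1.
                sum += weights[k] * padded[y * paddedWidth + x + k];
            }
            out[y * width + x] = sum;
        }
    }
    return out;
}
```

Because every output pixel executes exactly the same sequence of reads and multiply-adds, the same loop body translated into a kernel would give all threads in a warp an identical control path, which is the property the padding is meant to provide.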