1
edit
Changes
→Assignment 2
{{GPU610/DPS915 Index | 20123}}
= The B-Team =
== Team Members ==
# [mailto:kmbarnhart@learn.senecac.on.ca?sujbect=DPS915 Kyle Barnhart]
= Assignment 3 =
== Mandelbrot Set ==
=== Repo ===
https://github.com/KyleBarnhart/Fractal/tree/cuda
In actual fact the main Mandelbrot calculation was already as efficient as possible. It is a simple function and there simply were no changes to make. The initial CUDA version of the program saw and average improvement of 737.6% over the original CPU version of the program. The optimized version was 802.9% faster than the CPU version, which is 10.4% better than the original CUDA version.
=== Code ===
I'm only going to post the kernels that changed. There were also many changes in the host code that helped make the program faster.
<big><pre>
// part of common.h
const int BLOCK_SIZE_X = 16;
const int BLOCK_SIZE_Y = 8;
const int BLOCK_SIZE_SSAA = 256;
const int BLOCK_SIZE_RGB = 16;
const int MAX_GRID_SIZE_X === Description ===65536;
// Get device information for total global memory
error = cudaGetDevice(&iDevice);
if(error != cudaSuccess)
displayCudeError(error);
error = cudaGetDeviceProperties(&prop, iDevice);
if(error != cudaSuccess)
displayCudeError(error);
// The max amount to do per pass demends on the size of GPU memory and the size of unsigned integer.
// Global memory is devided by two so that both the value array and RGB array can both fit in memory.
DimensionSqType maxPixelsPerPass = (UINT_MAX > (prop.totalGlobalMem / 2)) ? (prop.totalGlobalMem / 2) : UINT_MAX;
// RGB + alpha is 4 BYTEs. Make sure two copies of the larger can fit in device memory. DimensionType largerType === Code ===((4 * sizeof(BYTE)) < sizeof(ElementType)) ? sizeof(ElementType) : (4 * sizeof(BYTE));
// Divide by two for extra safty. maxPixelsPerPass /= (largerType * 2);</pre></big><big><pre>inline __device__ ElementType mandelbrot(ElementType c_rc_i, ElementType c_ic_r, IterationType iterations) {
{
ElementType z_r = c_r;
ElementType z_i = c_i;
ElementType z2_r = z_r * z_r;
ElementType z2_i = z_i * z_i;
IterationType n = 0;
while(n < iterations && z2_r + z2_i < 4.0)
{
z_i = 2.0 * z_r * z_i + c_i;
z_r = z2_r - z2_i + c_r;
z2_r = z_r * z_r;
z2_i = z_i * z_i;
n++;
}
z_i = 2.0 * z_r * z_i + c_i;
z_r = z2_r - z2_i + c_r;
z2_r = z_r * z_r;
z2_i = z_i * z_i;
z_i = 2.0 * z_r * z_i + c_i;
z_r = z2_r - z2_i + c_r;
z2_r = z_r * z_r;
z2_i = z_i * z_i;
n += 2;
if(n > iterations) { return (ElementType)iterations; } else { return (ElementType)n + 1.0 - __logf(__logf(__dsqrt_rn(z2_r + z2_i )))/__logf(2.0);; }}</pre></big><big><pre>// Calculate if c is in Mandelbrot set.// Return number of iterations.__global__ void getFractal(ElementType* img, ElementType yMax, ElementType xMin, ElementType xScale, ElementType yScale, IterationType iterations, DimensionType width, DimensionType height){ DimensionType dx = blockIdx.x * BLOCK_SIZE_X + threadIdx.x; DimensionType dy = z_i blockIdx.y * z_iBLOCK_SIZE_Y + threadIdx.y;
if(dx >= width || dy >= height) return;
if(curr >= length)
return;
DimensionSqType val = list[curr];
ElementType xSubScale = xScale / ((ElementType)ssaafactor);
ElementType ySubScale = yScale / ((ElementType)ssaafactor);
{
for(AlisingFactorType y = 0; y < ssaafactor; y++)
{
subpixels[x * ssaafactor + y] = mandelbrot(yMax - ySubScale * y , xMin + xSubScale * x, iterations);
}
}
if(factor2 % 2 != 0)
{
img[val] = getMedian(subpixels, (AlisingFactorSqType)ssaafactor * (AlisingFactorSqType)ssaafactor / 2, factor2);
}
else
{
img[val] = (getMedian(subpixels, factor2 / 2 - 1, factor2)
+ getMedian(subpixels, factor2 / 2, factor2))
/ 2.0;
}
}
</pre></big>
=== Code ===
<big><pre>
__device__ void swap(ElementType* a, ElementType* b)
{
ElementType t = *a;
*a = *b;
*b = t;
}
</pre></big>
<big><pre>
__device__ ElementType getMedian(ElementType* arr, AlisingFactorSqType median, AlisingFactorSqType n)
{
AlisingFactorSqType low, high ;
AlisingFactorSqType middle, ll, hh;
low = 0 ; high = n-1 ;
for (;;) {
if (high <= low) /* One element only */
return arr[median] ;
if (high == low + 1) { /* Two elements only */
if (arr[low] > arr[high])
swap(&arr[low], &arr[high]) ;
return arr[median];
}
/* Find median of low, middle and high items; swap into position low */ middle == Calculating Pi ==(low + high) / 2; if (arr[middle] > arr[high]) swap(&arr[middle], &arr[high]) ; if (arr[low] > arr[high]) swap(&arr[low], &arr[high]) ; if (arr[middle] > arr[low]) swap(&arr[middle], &arr[low]) ;
if (hh < ll)
break;
swap(&arr[ll], &arr[hh]) ;
}
/* Swap middle item (in position low) back into correct position */
swap(&arr[low], &arr[hh]) ;
/* Re-set active partition */ if (hh <=median) low =ll; if (hh >= Description median) high =hh - 1; }}</pre></big><big><pre>__device__ ElementType mandelbrot(ElementType c_i, ElementType c_r, IterationType iterations){ ElementType z_r =c_r; ElementType z_i =c_i; ElementType z2_r = z_r * z_r; ElementType z2_i = z_i * z_i; IterationType n = 0; while(n < iterations && z2_r + z2_i < 4.0) { z_i = 2.0 * z_r * z_i + c_i; z_r = z2_r - z2_i + c_r; z2_r = z_r * z_r; z2_i = z_i * z_i; n++; }
z_i = 2.0 * z_r * z_i + c_i;
z_r = z2_r - z2_i + c_r;
z2_r = z_r * z_r;
z2_i = z_i * z_i;
n += 2;
if(n > iterations)
{
return (ElementType)iterations;
}
else
{
return (ElementType)n + 1.0 - log(log(sqrt(z2_r + z2_i)))/log(2.0);;
}
}
</pre></big>
<big><pre>
// Calculate if c is in Mandelbrot set.
// Return number of iterations.
__global__ void getFractal(ElementType* img, ElementType yMax, ElementType xMin, ElementType xScale, ElementType yScale, IterationType iterations, DimensionType width, DimensionType height)
{
DimensionType dx = blockIdx.x * BLOCK_SIZE + threadIdx.x;
DimensionType dy = blockIdx.y * BLOCK_SIZE + threadIdx.y;
if(dx >= width || dy >= height)
return;
ElementType c_i = yMax - (ElementType)dy * yScale;
ElementType c_r = xMin + (ElementType)dx * xScale;
DimensionSqType c =(DimensionSqType)dy * (DimensionSqType)width + (DimensionSqType)dx; img[c] =mandelbrot(c_i, c_r, iterations);}</pre></big><big><pre>// Calculate if c is in Mandelbrot set.// Return number of iterations.__global__ void getFractalSSAA(ElementType* img, DimensionSqType* list, DimensionSqType length, ElementType yMax, ElementType xMin, ElementType xScale, ElementType yScale, IterationType iterations, DimensionType width, AlisingFactorType ssaafactor){ DimensionType curr = Code ===blockIdx.x * BLOCK_SIZE * BLOCK_SIZE + threadIdx.x;
AlisingFactorSqType factor2 = (AlisingFactorSqType)ssaafactor * (AlisingFactorSqType)ssaafactor;
// Get the values for each pixel in fractal
ElementType subpixels[MAX_ALIASING_FACTOR * MAX_ALIASING_FACTOR];
for(AlisingFactorType x = 0; x < ssaafactor; x++)
{
for(AlisingFactorType y = 0; y < ssaafactor; y++)
{
subpixels[x * ssaafactor + y] = mandelbrot(yMax - ySubScale * y , xMin + xSubScale * x, iterations);
}
}
switch (bracket)
{
case 0:
rgb[0] = colour;
rgb[1] = 0;
rgb[2] = 0;
break;
case 1:
rgb[0] = 255;
rgb[1] = colour;
rgb[2] = 0;
break;
case 2:
rgb[0] = 255 - colour;
rgb[1] = 255;
rgb[2] = 0;
break;
case 3:
rgb[0] = 0;
rgb[1] = 255;
rgb[2] = colour;
break;
case 4:
rgb[0] = 0;
rgb[1] = 255 - colour;
rgb[2] = 255;
break;
case 5:
rgb[0] = colour;
rgb[1] = 0;
rgb[2] = 255;
break;
default:
rgb[0] = 0;
rgb[1] = 0;
rgb[2] = 0;
break;
}
}
</pre></big>
<big><pre>
__global__ void getBmpRGB(BYTE* image, ElementType* values, DimensionType width, DimensionType height, IterationType iterations)
{
DimensionType dy = blockIdx.y * BLOCK_SIZE + threadIdx.y;
DimensionType dx = blockIdx.x * BLOCK_SIZE + threadIdx.x;
if(dx >= width || dy >= height)
return;
DimensionType c = dy * width + dx;
BYTE rgbValue[3];
getRGB(values[c]/(ElementType)iterations, rgbValue); image[c*3] = rgbValue[2]; image[c*3 + 1] = rgbValue[1]; image[c*3 + 2] = rgbValue[0];}</pre></big><big><pre>__global__ void getBmpRGBfromHistorgram(ElementType* map, BYTE* image, ElementType* values, DimensionType width, DimensionType height){ DimensionType dy = blockIdx.y * BLOCK_SIZE + threadIdx.y; DimensionType dx = blockIdx.x * BLOCK_SIZE + threadIdx.x; if(dx >= width || dy >= height) return result; DimensionType c = dy * width + dx; IterationType ival = (IterationType)values[c]; ElementType colourVal = map[ival] + (values[c] - (ElementType)ival) * (map[ival + 1] - map[ival]); BYTE rgbValue[3]; getRGB(colourVal, rgbValue); image[c*3] = rgbValue[2]; image[c*3 + 1] = rgbValue[1]; image[c*3 + 2] = rgbValue[0];}</pre></big> = Assignment 1 === Mandelbrot Set ===== Repo ===https://github.com/KyleBarnhart/Fractal === Description ===I wrote this program because I could not find a suitable fractal program. This program generates fractal images of the Mandelbrot Set and saves them in the BMP image format. It can generate images of an almost arbitrary size. You also set the locations, number of iterations, and the zoom factor. It uses a smoothing algorithm and a histogram to create smoothly and evenly coloured images. It has selective super sampling for anti-aliasing. And it can generate a sequence of images that can be put together to make a video. In testing the program I made a video (http://www.youtube.com/watch?v=vQigSMuHxuU).
=== Code ===
<big><pre>
inline ElementType mandelbrot(ElementType c_r, ElementType c_i, IterationType iterations)
{
ElementType z_r = c_r;
ElementType z_i = c_i;
ElementType z2_r = z_r * z_r;
ElementType z2_i = z_i * z_i;
IterationType n = 0;
while(n < iterations && z2_r + z2_i < 4.0)
{
z_i = 2.0 * z_r * z_i + c_i;
z_r = z2_r - z2_i + c_r;
z2_r = z_r * z_r;
z2_i = z_i * z_i;
n++;
}
z_i = 2.0 * z_r * z_i + c_i;
z_r = z2_r - z2_i + c_r;
z2_r = z_r * z_r;
z2_i = z_i * z_i;
z_i = 2.0 * z_r * z_i + c_i;
z_r = z2_r - z2_i + c_r;
z2_r = z_r * z_r;
z2_i = z_i * z_i;
n += 2;
if(n > iterations)
{
return (ElementType)iterations;
}
else
{
return (ElementType)n + 1.0 - log(log(sqrt(z2_r + z2_i)))/log(2.0);;
}
}
</pre></big>
== Calculating Pi ==
=== Repo ===
https://github.com/KyleBarnhart/Pi
=== Description ===
I wrote this program based on reading some articles I found. The program calculates the value of pi by randomly generating points in a one by one area and determining if they are no more than a distance of one from the bottom left corner (0, 0). By randomly generating many uniformly distributed numbers and dividing the number that fall within the circle by the number of iterations you can determine pi. The more iterations that you preform, the closer the value will be to pi.
=== Code ===<big><pre>inline double moreRandom(unsigned iterations){ double result = (double)rand() / (double)RAND_MAX; for(unsigned i = 1; i < iterations; i++) { result = (((double)rand()) + result) / (double)RAND_MAX; } return result;}</pre></big>