=== Description ===
My Mandelbrot Set program is not the type of program that reduces an array to a single value, nor does computing one value require knowledge of any other value in the array. In fact, the only time global memory is used in the Mandelbrot function is when the result is written. The input for each point is derived from a number of constants and the position of the thread within the grid and block. There are a number of constraints, but the most prominent are the size of global device memory and the maximum value of an unsigned integer. Very large images need very large arrays of values, so I enhanced the calculation that determines the largest possible array size. The total number of values is divided by that size, and the image is built over that many passes.
In fact, the main Mandelbrot calculation was already about as efficient as it could be. It is a simple function, and there were no changes to make to it. The initial CUDA version of the program saw an average improvement of 737.6% over the original CPU version of the program. The optimized version was 802.9% faster than the CPU version, which is 10.4% better than the original CUDA version.
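The multi-pass scheme described above can be sketched in host-side C++. This is only an illustration, not the program's actual code: the names <code>Pass</code>, <code>planPasses</code>, <code>mandelbrot</code> and <code>renderPass</code> are hypothetical, the viewing window of [-2, 1] by [-1.5, 1.5] is an assumption, and the loop in <code>renderPass</code> stands in for what a kernel launch would do with one thread per pixel.

```cpp
#include <cstdint>
#include <vector>

// One pass covers the contiguous pixel range [first, first + count).
struct Pass {
    std::uint64_t first;  // global index of the first pixel in this pass
    std::uint64_t count;  // number of pixels computed in this pass
};

// Split `total` pixels into passes of at most `maxPerPass` pixels each.
// `maxPerPass` would come from the device-memory and unsigned-int limits.
std::vector<Pass> planPasses(std::uint64_t total, std::uint64_t maxPerPass) {
    std::vector<Pass> passes;
    if (maxPerPass == 0) return passes;  // nothing can be allocated
    for (std::uint64_t first = 0; first < total; first += maxPerPass) {
        std::uint64_t remaining = total - first;
        passes.push_back({first, remaining < maxPerPass ? remaining : maxPerPass});
    }
    return passes;
}

// Escape-time iteration count for the point c = (cr, ci).
unsigned mandelbrot(double cr, double ci, unsigned maxIter) {
    double zr = 0.0, zi = 0.0;
    unsigned n = 0;
    while (n < maxIter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;  // real part of z^2 + c
        zi = 2.0 * zr * zi + ci;            // imaginary part of z^2 + c
        zr = t;
        ++n;
    }
    return n;
}

// Compute one pass into `out`. Each value depends only on constants and
// the pixel's own index, so no value reads any other value.
// Assumed window: real axis [-2, 1], imaginary axis [-1.5, 1.5].
void renderPass(const Pass& pass, std::uint64_t width, std::uint64_t height,
                unsigned maxIter, std::vector<unsigned>& out) {
    out.resize(pass.count);
    for (std::uint64_t k = 0; k < pass.count; ++k) {
        std::uint64_t idx = pass.first + k;  // global pixel index
        std::uint64_t x = idx % width;
        std::uint64_t y = idx / width;
        double cr = -2.0 + 3.0 * static_cast<double>(x) / static_cast<double>(width);
        double ci = -1.5 + 3.0 * static_cast<double>(y) / static_cast<double>(height);
        out[k] = mandelbrot(cr, ci, maxIter);  // the only write per pixel
    }
}
```

A caller would loop over <code>planPasses(width * height, maxPerPass)</code>, call <code>renderPass</code> for each entry, and append each result buffer to the output image before reusing it for the next pass. Because every pass is independent, the buffer never has to hold more than <code>maxPerPass</code> values at once.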
=== Code ===