== Assignment 2/3 - Parallelize & Optimize ==
The main objective was to refactor the get_number_iterations() function and the functions it subsequently called, which together created the nested loops. The objective was met: all of these functions were refactored into a single device function that performs the calculation for a single pixel of the image. The original program used doubles throughout; all of them were changed to floats.
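To illustrate the refactoring described above, here is a minimal sketch of a single-precision per-pixel function together with a kernel that calls it once per thread. The names <code>pixel_iterations</code>, <code>mandelbrot_kernel</code> and <code>render_cpu</code>, the parameter layout, and the <code>int</code> output buffer are assumptions made for this example and are not taken from the project's source. The <code>HOST_DEVICE</code> macro only lets the same file build with NVCC or with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lets the same function compile under nvcc (host + device) or a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical per-pixel routine: iterates z = z^2 + c in single precision
// and returns how many iterations ran before |z| exceeded 2 (or max_iter).
HOST_DEVICE inline int pixel_iterations(float cx, float cy, int max_iter) {
    float zx = 0.0f, zy = 0.0f;
    int n = 0;
    while (n < max_iter && zx * zx + zy * zy <= 4.0f) {
        float t = zx * zx - zy * zy + cx;  // real part of z^2 + c
        zy = 2.0f * zx * zy + cy;          // imaginary part of z^2 + c
        zx = t;
        ++n;
    }
    return n;
}

#ifdef __CUDACC__
// One thread per pixel: this takes the place of the nested x/y loops.
__global__ void mandelbrot_kernel(int* out, int width, int height,
                                  float x0, float y0, float dx, float dy,
                                  int max_iter) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px < width && py < height)
        out[py * width + px] = pixel_iterations(x0 + px * dx, y0 + py * dy, max_iter);
}
#endif

// CPU reference built on the same per-pixel function, useful for checking GPU output.
inline std::vector<int> render_cpu(int width, int height,
                                   float x0, float y0, float dx, float dy,
                                   int max_iter) {
    assert(width > 0 && height > 0);
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int py = 0; py < height; ++py)
        for (int px = 0; px < width; ++px)
            out[static_cast<std::size_t>(py) * width + px] =
                pixel_iterations(x0 + px * dx, y0 + py * dy, max_iter);
    return out;
}
```

Sharing one function between the kernel and a CPU reference makes it straightforward to compare the float GPU output against a host run pixel by pixel.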
=== Steps ===
=== Host Memory Management ===
After that is done, the image is copied back to the host using a single memcpy.
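A single copy of the whole image might look like the sketch below. The helper name <code>copy_image_to_host</code> and the <code>int</code> element type are assumptions for illustration. Under NVCC the source pointer is assumed to be device memory and the copy uses <code>cudaMemcpy</code>; the <code>std::memcpy</code> branch exists only so the helper can be exercised on a machine without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: moves the full width*height result buffer to the host
// in one call instead of copying row by row.
inline bool copy_image_to_host(std::vector<int>& host, const int* src,
                               int width, int height) {
    assert(src != nullptr && width > 0 && height > 0);
    std::size_t count = static_cast<std::size_t>(width) * height;
    host.resize(count);
    std::size_t bytes = count * sizeof(int);
#ifdef __CUDACC__
    // GPU build: src is assumed to be a device pointer.
    return cudaMemcpy(host.data(), src, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
#else
    // CPU-only build: plain host-to-host copy, for testing without a GPU.
    std::memcpy(host.data(), src, bytes);
    return true;
#endif
}
```

One large transfer pays the per-call overhead once, which is usually preferable to many small copies.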
=== Results ===
The program was compiled with clang++, icpc (the Intel Parallel Studio compiler), and NVCC for the GPU. Runtimes for the standard clang++ build rose sharply as the size of the resulting image increased. Compiling with icpc drastically reduced CPU-only runtimes without any code changes. The CUDA-based parallel version was far faster than the clang++ build and also faster than the icpc build, because many pixel values can be calculated in parallel.
[[Image:Mandelbrot.png | 750px]]
=== Output Images ===
[http://imgur.com/a/R3ZAH Image Output]
=== Future Optimizations ===
As there aren't any data-intensive tasks in this program, further optimizations would include creating streams of kernels and having them execute concurrently in order to improve the runtime of the current solution.
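One possible way to organize the stream idea is to split the image into bands of rows and give each band its own stream. The sketch below is an assumption about how that could be done, not the project's implementation: <code>RowBand</code>, <code>split_rows</code>, <code>band_kernel</code> and <code>render_with_streams</code> are hypothetical names, the kernel body is a placeholder for the real per-pixel calculation, and error checking is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// A horizontal band of image rows handled by one stream (hypothetical type).
struct RowBand { int first_row; int row_count; };

// Splits `height` rows into at most `parts` nearly equal bands;
// the earliest bands absorb the remainder.
inline std::vector<RowBand> split_rows(int height, int parts) {
    assert(height > 0 && parts > 0);
    std::vector<RowBand> bands;
    int base = height / parts, extra = height % parts, row = 0;
    for (int i = 0; i < parts; ++i) {
        int count = base + (i < extra ? 1 : 0);
        if (count == 0) break;  // more parts requested than rows available
        bands.push_back({row, count});
        row += count;
    }
    return bands;
}

#ifdef __CUDACC__
// Placeholder kernel standing in for the per-pixel Mandelbrot kernel.
__global__ void band_kernel(int* out, int width, int first_row, int row_count) {
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px < width && py < row_count)
        out[(first_row + py) * width + px] = 0;  // real code would store the iteration count
}

// Launches one kernel per band in its own stream, then copies that band back
// asynchronously. host_pinned is assumed to come from cudaMallocHost so the
// copies can overlap with kernels running in other streams.
inline void render_with_streams(int* dev_out, int* host_pinned,
                                int width, int height, int num_streams) {
    std::vector<RowBand> bands = split_rows(height, num_streams);
    std::vector<cudaStream_t> streams(bands.size());
    dim3 block(16, 16);
    for (std::size_t i = 0; i < bands.size(); ++i) {
        cudaStreamCreate(&streams[i]);
        dim3 grid((width + block.x - 1) / block.x,
                  (bands[i].row_count + block.y - 1) / block.y);
        band_kernel<<<grid, block, 0, streams[i]>>>(
            dev_out, width, bands[i].first_row, bands[i].row_count);
        std::size_t offset = static_cast<std::size_t>(bands[i].first_row) * width;
        std::size_t bytes = static_cast<std::size_t>(bands[i].row_count) * width * sizeof(int);
        cudaMemcpyAsync(host_pinned + offset, dev_out + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[i]);
    }
    for (cudaStream_t s : streams) {
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
}
#endif
```

With this layout, the copy of one band can proceed while kernels for other bands are still running. Whether that yields a measurable gain for this program would need to be timed.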