Changes

BETTERRED

8 bytes added, 13:59, 12 April 2017

== Assignment 2/3 - Parallelize & Optimize ==

The main objective was to refactor the get_number_iterations() function and the functions it called, which together formed the nested loops. This objective was met: all of those functions were refactored into a single device function that performs the calculation for one pixel of the image. The original program used doubles, and all of them were changed to floats.
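The per-pixel refactoring described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the program is an escape-time Mandelbrot renderer (as the Mandelbrot.png figure suggests), and the function name pixel_iterations, its parameters, and the HOST_DEVICE macro are all hypothetical. The macro only lets the same source build under both nvcc and a plain C++ compiler.

```cpp
// Under nvcc the function can be called from kernels; under a plain C++
// compiler the qualifier expands to nothing so the sketch runs on the CPU.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical per-pixel routine (name and parameters are assumptions).
// Maps pixel (px, py) of a width x height image onto the region
// [x_min, x_max] x [y_min, y_max] of the complex plane and returns the
// number of iterations performed before escape, capped at max_iter.
HOST_DEVICE inline int pixel_iterations(int px, int py, int width, int height,
                                        float x_min, float x_max,
                                        float y_min, float y_max,
                                        int max_iter)
{
    // Complex coordinate c = (cr, ci) for this pixel, in single precision.
    float cr = x_min + (x_max - x_min) * px / width;
    float ci = y_min + (y_max - y_min) * py / height;

    float zr = 0.0f, zi = 0.0f;
    int iter = 0;

    // Iterate z = z*z + c while |z|^2 <= 4 and the cap is not reached.
    while (zr * zr + zi * zi <= 4.0f && iter < max_iter) {
        float tmp = zr * zr - zi * zi + cr;
        zi = 2.0f * zr * zi + ci;
        zr = tmp;
        ++iter;
    }
    return iter;
}
```

In a CUDA kernel, each thread would typically derive its own (px, py) from blockIdx, blockDim and threadIdx and call such a function once, which is what replaces the nested loops over rows and columns in the CPU version.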

=== Steps ===

=== Host Memory Management ===

After that is done, the image is copied back to the host using a single memcpy.

=== Results ===

The program was compiled using clang++, icpc (Intel Parallel Studio Compiler) and NVCC for the GPU. The standard clang++ build became extremely slow as the size of the resultant image increased. Compiling with icpc, without modifying any code, drastically reduced runtimes for the CPU-only version. The parallel CUDA version improved the runtime massively over the clang++ build, and also over the icpc build, because more values could be calculated in parallel.

[[Image:Mandelbrot.png | 750px]]

=== Output Images ===

[http://imgur.com/a/R3ZAH Image Output]

=== Future Optimizations ===

As there are no data-intensive tasks in this program, further optimizations would include creating streams of kernels and having them execute concurrently in order to improve the runtime of the current solution.

Knagarajan1
