|}
== Objectives ==
The main objective was to leave the main function unchanged. This objective was met, although profiling code had to be added.
== Steps ==
=== Host Memory Management ===
In the original program, a BMP is loaded into a vector of uint8_t. This is not ideal for CUDA, so an array of pinned (page-locked) memory was allocated instead. This array contains the same number of elements but stores them as a structure, "BGRPixel", which is three contiguous floats. The contents of the vector are then copied into the pinned memory.
= Assignment 3 - Optimize =