Another area where the program could be sped up is the render function.
// Begin tracing // trace(rayOrigin, rayDirection, 0, pixel, sphere, k); } } </nowiki></code> This function casts a ray for each pixel of the image, traces it, and returns a color.
= Assignment 2 =
==Parallelization==
After parallelizing a portion of the code, we tested the run time of the program at various resolutions. Each run rendered a picture at a certain quality, and the run time increased with each increase in resolution.

===Changing the Render function===
Instead of the regular C++ indexing in the render() function, we parallelized it using block and thread indexing. In the C++ version of the code we had a nested for loop that iterates over the x and y axes of the image, with bounds set by the image resolution.

[[File:RenderCPP.jpg]]

This was changed to thread-based indexing when we converted the render function into a kernel.

[[File:Render.png]]

===Declaring Device pointer===
We declared a device pointer to the sphere object, allocated memory for the device object, and finally copied the data from the host object to the device object.

[[File:Htod.png]]

===Setting up the Grid===
We sized the grid of threads from the image resolution we set the code to render, divided by the number of threads per block.

[[File:Grid.png]]

===Launching the Kernel===
Instead of calling the render function from main, we changed the function render() into a __global__ void render() kernel.

[[File:RCpp.png]]

Finally, we launch the kernel to render the image and copy the rendered data from device memory to host memory.
[[File:block.jpg]]
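
The mapping from the nested loop to block and thread indices can be checked on the host before touching the GPU. The following is a minimal C++ sketch, not the project's actual kernel: it emulates the usual <code>blockIdx * blockDim + threadIdx</code> computation with plain loops and confirms that every pixel is visited exactly once. The names <code>Dim2</code>, <code>blocksFor</code>, and <code>coverImage</code> are hypothetical, and the ceiling division is an assumption; with power-of-two resolutions such as 512 to 4096, a plain division gives the same grid size.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 (only two dimensions are needed here).
struct Dim2 { unsigned x, y; };

// Blocks needed to cover n pixels with ntpb threads per block.
// Ceiling division (an assumption) so a partial last block still covers the edge.
unsigned blocksFor(unsigned n, unsigned ntpb) {
    return (n + ntpb - 1) / ntpb;
}

// Emulates the kernel's index computation on the host. Each (block, thread)
// pair maps to one pixel; threads that fall outside the image do nothing.
// Returns, per pixel, how many emulated threads wrote to it.
std::vector<int> coverImage(unsigned width, unsigned height, unsigned ntpb) {
    std::vector<int> hits(static_cast<std::size_t>(width) * height, 0);
    Dim2 grid{blocksFor(width, ntpb), blocksFor(height, ntpb)};
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < ntpb; ++ty)
                for (unsigned tx = 0; tx < ntpb; ++tx) {
                    // In a CUDA kernel: blockIdx.x * blockDim.x + threadIdx.x
                    unsigned x = bx * ntpb + tx;
                    unsigned y = by * ntpb + ty;
                    // Bounds guard for resolutions that are not a multiple of ntpb
                    if (x < width && y < height)
                        hits[static_cast<std::size_t>(y) * width + x] += 1;
                }
    return hits;
}
```

For example, <code>blocksFor(512, 16)</code> gives 32 blocks per axis. In the real kernel the four loops disappear, because the GPU runs one thread per (block, thread) pair, but the index arithmetic and the bounds guard stay the same.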
===Image resolution 512===
[[File:512.jpg]]
===Image resolution 1024===
[[File:1024.jpg]]
===Image resolution 2048===
[[File:2048.jpg]]
===Image resolution 4096===
[[File:4096.jpg]]
===Analysis===
[[File:excelgraph2.jpg]]
From this chart we can see a significant drop in run time when we switch from serial to parallel processing with CUDA in the ray tracer, at each resolution as we double it from 512 up to 4096. There is still room for improvement, which will be implemented and analyzed in Assignment 3.
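
The comparison in the chart comes down to timing the same render twice and dividing the two results. The following is a small C++ sketch of how that could be measured, assuming wall-clock timing with <code>std::chrono</code>; the helper names <code>secondsTaken</code> and <code>speedup</code> are hypothetical and are not taken from the project code.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename F>
double secondsTaken(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}

// Speedup of the parallel version relative to the serial one:
// values above 1 mean the parallel version is faster.
double speedup(double serialSeconds, double parallelSeconds) {
    return serialSeconds / parallelSeconds;
}
```

When timing the CUDA version this way, the timed work should include a call to <code>cudaDeviceSynchronize()</code> (or the device-to-host copy), because kernel launches return to the host before the kernel has finished.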
= Assignment 3 =