Another area that could speed up the program is the render function:
<code> <nowiki>
for (unsigned y = 0; y < IMG_RES; ++y) {
    for (unsigned x = 0; x < IMG_RES; ++x, ++pixel) {
        int k = x + y * IMG_RES;
        float xxPoints = (2 * ((x + 0.5) * iwidth) - 1) * viewangle * aspectratio;
        float yyPoints = (1 - 2 * ((y + 0.5) * iheight)) * viewangle;
        Vec3f rayDirection, rayOrigin;
        rayDirection.init(xxPoints, yyPoints, -1);
        rayDirection.normalize();
        rayOrigin.init(0);
        *pixel = trace(rayOrigin, rayDirection, spheres, 0);
        // Begin tracing
        // trace(rayOrigin, rayDirection, 0, pixel, sphere, k);
    }
}
</nowiki></code>
This function generates a ray for each pixel of the image, traces it through the scene, and returns a color.
= Assignment 2 =
Instead of using regular C++ loop indexing in the render() function, we parallelized it using block and thread indexing.
 
In the C++ version of the code we had a nested for loop that iterates over the x and y axes of the image, with bounds that depend on the image resolution that was set.
 
[[File:RenderCPP.jpg]]
 
This was changed to thread-based indexing when we converted the render function into a kernel, as sketched after the figures below.
[[File:Render.png]]
[[File:Grid.png]]
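Each thread now computes exactly one pixel, so the nested for loops disappear. The snippet below is a minimal sketch of this thread-based indexing, assuming the Vec3f, Sphere and trace types from the serial version and IMG_RES as a compile-time constant; the parameter list is illustrative, not our exact kernel signature.
<code> <nowiki>
__global__ void render(Vec3f* image, const Sphere* spheres, int numSpheres,
                       float iwidth, float iheight,
                       float viewangle, float aspectratio) {
    // one thread per pixel: recover the (x, y) coordinate from the grid
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= IMG_RES || y >= IMG_RES) return;   // discard threads outside the image

    int k = x + y * IMG_RES;                    // flattened pixel index

    // same camera math as the serial loop body
    float xxPoints = (2 * ((x + 0.5f) * iwidth) - 1) * viewangle * aspectratio;
    float yyPoints = (1 - 2 * ((y + 0.5f) * iheight)) * viewangle;

    Vec3f rayOrigin, rayDirection;
    rayOrigin.init(0);
    rayDirection.init(xxPoints, yyPoints, -1);
    rayDirection.normalize();

    // a device-side trace taking a pointer and a sphere count is assumed here
    image[k] = trace(rayOrigin, rayDirection, spheres, numSpheres, 0);
}
</nowiki></code>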
===Launching the Kernel===
Instead of calling the render function in main, we changed the function render() into a __global__ void render() kernel.
 
[[File:RCpp.png]]
 
In the end, we launch the kernel to render the image and copy the rendered data from device memory back to host memory.
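A rough sketch of that launch and copy-back, assuming a square IMG_RES x IMG_RES image, ntpb x ntpb threads per block, and the illustrative kernel signature sketched above (h_image, h_spheres and the other names are placeholders, not necessarily the ones used in our code):
<code> <nowiki>
// allocate device memory for the image and the scene, and copy the scene over
Vec3f*  d_image   = nullptr;
Sphere* d_spheres = nullptr;
cudaMalloc(&d_image, IMG_RES * IMG_RES * sizeof(Vec3f));
cudaMalloc(&d_spheres, numSpheres * sizeof(Sphere));
cudaMemcpy(d_spheres, h_spheres, numSpheres * sizeof(Sphere), cudaMemcpyHostToDevice);

// one thread per pixel: a 2D grid of ntpb x ntpb blocks covering the image
dim3 dBlock(ntpb, ntpb);
dim3 dGrid((IMG_RES + ntpb - 1) / ntpb, (IMG_RES + ntpb - 1) / ntpb);
render<<<dGrid, dBlock>>>(d_image, d_spheres, numSpheres,
                          iwidth, iheight, viewangle, aspectratio);
cudaDeviceSynchronize();

// copy the rendered pixels from device memory back to the host
cudaMemcpy(h_image, d_image, IMG_RES * IMG_RES * sizeof(Vec3f), cudaMemcpyDeviceToHost);

cudaFree(d_image);
cudaFree(d_spheres);
</nowiki></code>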
===Analysis===
[[File:excelgraph2.jpg]]

From this chart we can see a significant drop in run time when we switch from serial to parallel ray tracing with CUDA, and the gap grows as we double the resolution starting from 512. There is still room for improvement, which will be implemented and analyzed in Assignment 3.
= Assignment 3 =
In this assignment we decided to improve memory access to a heavily used piece of data, which cut the run time of the render kernel almost in half. This effect is shown in the graph below:

[[File:optimizedExcel.jpg]]

We can also see the difference in kernel run times in the NVIDIA Visual Profiler captures below.

===Optimized Image Resolution Results at 512===
[[File:512Optimized.jpg]]

===Optimized Image Resolution Results at 1024===
[[File:1024Optimized.jpg]]

===Optimized Image Resolution Results at 2048===
[[File:2048Optimized.jpg]]

===Optimized Image Resolution Results at 4096===
[[File:4096Optimized.jpg]]

There are more ways to optimize the code by making better use of the available GPU resources, such as using more of the available memory bandwidth, using more cores depending on compute capability, and improving memcpy efficiency. For simplicity, we decided to focus on reducing memory access time, since that was the main area where the kernel was spending its time according to the nvvp profiles we collected.
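The exact piece of data that was moved is not spelled out above, but as an illustration of the idea: a small read-only structure that every thread reads over and over (such as the sphere list) can be placed in __constant__ memory, so repeated intersection tests are served by the constant cache instead of global memory. The names below (c_spheres, MAX_SPHERES, uploadSpheres) are hypothetical, and this sketch is not necessarily the change we made:
<code> <nowiki>
#define MAX_SPHERES 16                        // illustrative upper bound on scene size
__constant__ Sphere c_spheres[MAX_SPHERES];   // read-only scene data, cached on-chip

// host side: copy the sphere list into constant memory once, before launching render
void uploadSpheres(const Sphere* h_spheres, int numSpheres) {
    cudaMemcpyToSymbol(c_spheres, h_spheres, numSpheres * sizeof(Sphere));
}

// inside the kernel, intersection tests then read c_spheres[i] instead of a
// global-memory pointer, so the repeated reads no longer go out to DRAM
</nowiki></code>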
= Presentation =
[[File:Presentation.pdf]]
