Changes

GPUSquad

699 bytes added, 18:41, 11 April 2018

→‎Assignment 3

Note how the run times for each kernel with shared memory are significantly longer than those with global memory.

To try to determine whether this issue was one of warp divergence, we timed a kernel that also initialized shared memory but referenced global memory when carrying out the actual calculations:

[[File:GlobalInitSharedKernelTimes.png]]

The figure above shows the run times of a kernel that initialized shared memory using a series of if statements but carried out its calculations on global memory. While slightly slower than the global memory kernel that does not initialize shared memory for ghost cells, it still takes less time to run than the version that computes from shared memory. It is likely that our group's attempts to employ shared memory failed because we did not adequately schedule or partition the shared memory, and the kernel was slowed as a result. The shared memory used per block was 34 x 32 (the dimensions of the shared memory matrix) x 4 bytes (the size of a float), which equals 4,352 bytes. That is well below the maximum of about 49 KB per block stated for a device of compute capability 5.0, which is what this series of tests on individual kernel run times was performed on. With this in mind, it is still unclear why the shared memory implementation performed more poorly than the global memory implementation.
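As a quick check of the arithmetic above, here is a minimal host-side C++ sketch. It is not the group's code: the names `tileBytes`, `fitsInSharedMemory`, and the constants are hypothetical, and the 48 KiB (49,152 byte) per-block limit is hard-coded on the assumption that it matches the documented figure for compute capability 5.0. On a real device the limit could instead be read from `cudaDeviceProp::sharedMemPerBlock`.

```cpp
#include <cstddef>

// Assumed tile shape, taken from the text: a 34 x 32 matrix of floats per block.
constexpr std::size_t kTileCols = 34;
constexpr std::size_t kTileRows = 32;

// Assumed per-block shared memory limit for compute capability 5.0 (48 KiB).
constexpr std::size_t kSharedLimitBytes = 48 * 1024;

// Bytes of shared memory needed by one cols x rows tile of floats.
constexpr std::size_t tileBytes(std::size_t cols, std::size_t rows) {
    return cols * rows * sizeof(float);
}

// True when a per-block allocation of `bytes` fits under the assumed limit.
constexpr bool fitsInSharedMemory(std::size_t bytes) {
    return bytes <= kSharedLimitBytes;
}

// Compile-time checks of the figures quoted in the text.
static_assert(tileBytes(kTileCols, kTileRows) == 4352,
              "34 x 32 floats should occupy 4,352 bytes");
static_assert(fitsInSharedMemory(tileBytes(kTileCols, kTileRows)),
              "the tile should fit within the per-block limit");
```

Under these assumptions the tile uses less than a tenth of the per-block limit, so the allocation size alone would not be expected to explain the slowdown.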

Unfortunately, our group's inability to use profiling tools effectively has left this discrepancy unexplained.

Moverall

41

edits
