Changes

GPUSquad

1,059 bytes added, 23:58, 9 April 2018

Assignment 3

PROPER TIMINGS:

[[File:Code_timings.png]]

The graph above shows the total run times for five versions of the code: the serial code, the 1D kernel from assignment 2, the 1D kernel using constant memory for calculation constants, a kernel using global and constant memory with a 2D thread arrangement, and the same 2D arrangement using shared memory with ghost cells.
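As a rough illustration of the 1D constant-memory variant and of how a total run time over many launches can be measured, here is a minimal sketch. It is not the project's code: the 3-point stencil, the names `c_alpha`, `step1D`, and `runConstant1D`, and the block size of 256 are all assumptions made for the example.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical calculation constant; the project's real constants are not
// shown in the source. Constant memory is cached and read-only in kernels.
__constant__ float c_alpha;

// 1D arrangement: one thread per element of an assumed 3-point stencil.
__global__ void step1D(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i > 0 && i < n - 1) {
        out[i] = in[i] + c_alpha * (in[i - 1] - 2.0f * in[i] + in[i + 1]);
    }
}

// Launches the kernel `steps` times, swapping buffers between launches.
// If elapsedMs is non-null it receives the GPU time for the whole loop.
// Error checking is omitted to keep the sketch short.
std::vector<float> runConstant1D(std::vector<float> field, float alpha,
                                 int steps, float* elapsedMs = nullptr) {
    assert(field.size() >= 3);
    const int n = static_cast<int>(field.size());
    const size_t bytes = n * sizeof(float);

    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    // Copy to both buffers so the untouched boundary values exist in each.
    cudaMemcpy(a, field.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, field.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_alpha, &alpha, sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;  // assumed block size
    const int blocks = (n + threads - 1) / threads;
    cudaEventRecord(start);
    for (int s = 0; s < steps; ++s) {
        step1D<<<blocks, threads>>>(a, b, n);
        std::swap(a, b);  // the newest result is always in `a`
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    if (elapsedMs) *elapsedMs = ms;

    cudaMemcpy(field.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    return field;
}
```

Calling `runConstant1D(field, 0.25f, 5000, &ms)` would time 5000 launches in one measurement, which is the kind of total the graph reports. The value 0.25 is only a placeholder.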

We found that the most efficient version of the code was the 1D implementation that used constant memory. The shared memory version has to load its data (including ghost cells) into shared memory and synchronize its threads every time the kernel runs, and each version of our code runs its kernel 5000 times. This repeated memory setup overhead made execution slower than the version that used global memory.
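The per-launch setup cost of the shared memory version can be seen in a sketch like the one below. It is an illustration under assumptions, not the project's kernel: the 5-point stencil, the tile width `TILE`, and the names `c_k`, `stepShared2D`, and `runShared2D` are hypothetical. Every launch repeats the tile load, the ghost-cell loads, and the `__syncthreads()` barrier before any arithmetic happens.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

#define TILE 16  // hypothetical block width; the project's block size is not given

__constant__ float c_k;  // hypothetical calculation constant

// 2D arrangement with a shared-memory tile padded by one ghost cell per side.
__global__ void stepShared2D(const float* in, float* out, int nx, int ny) {
    __shared__ float tile[TILE + 2][TILE + 2];
    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int tx = threadIdx.x + 1, ty = threadIdx.y + 1;
    bool inside = x < nx && y < ny;

    if (inside) {
        tile[ty][tx] = in[y * nx + x];
        // Threads on the tile edge also fetch the neighbouring ghost cell.
        if (threadIdx.x == 0 && x > 0)             tile[ty][0]        = in[y * nx + x - 1];
        if (threadIdx.x == TILE - 1 && x < nx - 1) tile[ty][TILE + 1] = in[y * nx + x + 1];
        if (threadIdx.y == 0 && y > 0)             tile[0][tx]        = in[(y - 1) * nx + x];
        if (threadIdx.y == TILE - 1 && y < ny - 1) tile[TILE + 1][tx] = in[(y + 1) * nx + x];
    }
    __syncthreads();  // all loads must finish before any thread reads the tile

    if (inside && x > 0 && x < nx - 1 && y > 0 && y < ny - 1) {
        out[y * nx + x] = tile[ty][tx] + c_k * (tile[ty][tx - 1] + tile[ty][tx + 1] +
                          tile[ty - 1][tx] + tile[ty + 1][tx] - 4.0f * tile[ty][tx]);
    }
}

// Runs `steps` launches on an nx-by-ny row-major field and returns the result.
// Error checking is omitted to keep the sketch short.
std::vector<float> runShared2D(std::vector<float> field, int nx, int ny,
                               float k, int steps) {
    assert(field.size() == static_cast<size_t>(nx) * ny);
    const size_t bytes = field.size() * sizeof(float);

    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    // Copy to both buffers so the untouched boundary values exist in each.
    cudaMemcpy(a, field.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, field.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_k, &k, sizeof(float));

    dim3 threads(TILE, TILE);
    dim3 blocks((nx + TILE - 1) / TILE, (ny + TILE - 1) / TILE);
    for (int s = 0; s < steps; ++s) {
        stepShared2D<<<blocks, threads>>>(a, b, nx, ny);
        std::swap(a, b);  // the newest result is always in `a`
    }

    cudaMemcpy(field.data(), a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    return field;
}
```

In this sketch each input element is reused only a handful of times per launch, so the extra loads and the barrier are not guaranteed to pay for themselves. That is consistent with the slowdown we measured, although only profiling can confirm the cause.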

The 1D design ran faster than the 2D implementation for a couple of reasons, including that it scaled along the m dimension while still producing readable graphs.

[TODO: INCLUDE PROFILING BREAKDOWNS OF INDIVIDUAL (NOT 5000) KERNEL RUNS TO SEE SPECIFIC TIMELINE FEATURES. EXPLAIN THE DIFFERENCES IN RUN TIMES]

Moverall

41

edits
