Changes

GPU610/Team AGC

1,209 bytes added, 19:41, 29 November 2014

→‎Bench

Since my video card has 48 KB of shared memory and all of my arrays together use no more than 20 KB, everything fits in shared memory. Coalescing applies to global memory accesses, so I do not need to worry about coalescing my data; shared memory is much faster.
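As a rough check of that budget, here is a minimal sketch of the arithmetic. The sizes are assumptions for illustration only (three arrays of double-precision values, one value per thread of an 800-thread block); the real arrays may differ. The function name `shared_bytes_needed` is hypothetical.

```shell
#!/usr/bin/env bash
# Assumed sizes, not taken from the actual kernel:
# 800 points, 3 arrays, 8 bytes per double-precision value.
shared_bytes_needed() {
    local points=$1 arrays=$2 bytes_per_value=$3
    echo $(( points * arrays * bytes_per_value ))
}

needed=$(shared_bytes_needed 800 3 8)   # 800 * 3 * 8 = 19200 bytes
available=$(( 48 * 1024 ))              # 48 KB = 49152 bytes

echo "needed=$needed available=$available"
if [ "$needed" -le "$available" ]; then
    echo "fits in shared memory"
fi
```

Under these assumptions the arrays need 19,200 bytes, which is consistent with the "under 20 KB" figure and well below the 48 KB limit.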

Due to operational limits, the operating system's watchdog kills the kernel before it completes. I have therefore capped the maximum step count at 1 million. Otherwise the kernel would need to be rethought or run in Tesla Compute Cluster (TCC) mode on a secondary GPU that is not used for display, but I just don't have that kind of money right now.
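The watchdog normally applies to a GPU that is driving a display. A minimal sketch for checking this from the command line follows, assuming `nvidia-smi` is installed. The function name `gpu_display_status` is hypothetical, and the `display_active` field is only a proxy for whether the watchdog is in effect.

```shell
#!/usr/bin/env bash
# Sketch: list each GPU with its display state and driver model.
# Assumes nvidia-smi is on the PATH; prints a notice otherwise.
gpu_display_status() {
    if ! command -v nvidia-smi > /dev/null 2>&1; then
        echo "nvidia-smi not found"
        return 0
    fi
    # display_active: a display is initialized on this GPU
    # driver_model.current: WDDM or TCC on Windows, N/A on Linux
    nvidia-smi --query-gpu=name,display_active,driver_model.current \
               --format=csv || echo "nvidia-smi query failed"
}

gpu_display_status
```

A GPU that reports an active display is the one likely to have long kernels killed. A second GPU with no display, or one in TCC mode, would not be.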

====== Testing ======

I have written the following script to benchmark the MPI implementation in dual-core and quad-core modes, and the CUDA implementation using 1 block of 800 threads:

<pre>

#!/usr/bin/env bash
# 1D Wave Equation Benchmark
# output_master() must be commented out
# Author: Christopher Markieta

set -e # Exit on error

MYDIR=$(dirname "$0")

if [ "$1" == "mpi" ]; then
    if [ -z "$2" ]; then
        echo "Usage: $0 mpi [2-8]"
        exit 1
    fi

    # Number of MPI processes to launch
    run="mpirun -n $2 $MYDIR/wave.o"
elif [ "$1" == "cuda" ]; then
    run="$MYDIR/wave.o"
else
    echo "Usage: $0 [cuda|mpi] ..."
    exit 1
fi

# Step counts from 1 up to 1 million
for steps in 1 10 100 1000 10000 100000 1000000
do
    # The step count is fed to the program on standard input
    time echo $steps | $run &> /dev/null
done
</pre>
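The script above prints its timings to the terminal through the bash `time` keyword. If the results are wanted in a form that is easier to tabulate, a minimal sketch like the following could capture the wall-clock time per step count as CSV. The helper name `time_steps` is hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch: time one run and print "steps,seconds".
# Usage: time_steps STEPS COMMAND [ARGS...]
time_steps() {
    local steps=$1
    shift
    local TIMEFORMAT=%R   # report only elapsed (real) seconds
    local elapsed
    # The program's own output is discarded; only the timing
    # line written by the time keyword is captured.
    elapsed=$( { time echo "$steps" | "$@" > /dev/null 2>&1; } 2>&1 )
    printf '%s,%s\n' "$steps" "$elapsed"
}
```

For example, `time_steps 1000 mpirun -n 4 ./wave.o` would print one line with the step count and the elapsed seconds, and the same call with `./wave.o` alone would cover the CUDA build.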

Christopher Markieta
