CDOT Wiki β

Changes

GPU610/DPS915 MCM Decrypt

896 bytes added, 22:45, 18 November 2013
Kernel Attempts
I am currently trying to get the above kernels working before handing in the assignment, since the initialization kernel alone would not be sufficient for the purpose of this assignment.
 
==== Work With Prefix Scan ====
At the moment I am working on simplified versions of a prefix sum algorithm that I hope will put me on the right path to completing my assignment. These algorithms were gathered from sources such as MIT, NVIDIA, and the CUDA documentation.
<source lang="cpp">
void scan(float* arr1, float* input, int n) {
    arr1[0] = 0; // since this is an exclusive scan, we do not include the first element
    for (int i = 1; i < n; ++i) {
        arr1[i] = input[i-1] + arr1[i-1];
    }
}
 
</source>
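To make the expected result concrete, here is a minimal sketch of the same sequential exclusive scan written with `std::vector`. The name `exclusive_scan_host` is hypothetical and is not part of the assignment code; it only illustrates what the output should look like.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the assignment): sequential exclusive prefix sum.
// out[i] holds the sum of in[0] .. in[i-1], so out[0] is always 0.
std::vector<float> exclusive_scan_host(const std::vector<float>& in) {
    std::vector<float> out(in.size(), 0.0f);
    for (std::size_t i = 1; i < in.size(); ++i) {
        out[i] = out[i - 1] + in[i - 1];
    }
    return out;
}
```

For example, the input {3, 1, 7, 0, 4, 1, 6, 3} produces {0, 3, 4, 11, 11, 15, 16, 22}: each output element is the sum of all input elements before it.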
 
Below is a parallel scan that computes the same result as the function above.
 
<source lang="cpp">
 
// Assumes a single block of n threads and 2 * n floats of dynamic shared memory.
__global__ void scan(float *g_odata, float *g_idata, int n) {
    extern __shared__ float temp[]; // allocated on invocation

    int thid = threadIdx.x;
    int pout = 0, pin = 1;

    // Load input into shared memory.
    // This is an exclusive scan, so shift right by one and set the first element to 0.
    temp[pout*n + thid] = (thid > 0) ? g_idata[thid-1] : 0;
    __syncthreads();

    for (int offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout; // swap double buffer indices
        pin = 1 - pout;

        // Read only from the pin buffer and write only to the pout buffer.
        if (thid >= offset)
            temp[pout*n+thid] = temp[pin*n+thid] + temp[pin*n+thid - offset];
        else
            temp[pout*n+thid] = temp[pin*n+thid];

        __syncthreads();
    }

    g_odata[thid] = temp[pout*n+thid]; // write output
}
</source>
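One way to check the double-buffer indexing without a GPU is to replay the kernel's loop on the CPU. The sketch below is only an illustration under that assumption: `emulate_double_buffered_scan` is a hypothetical name, and the inner `thid` loop stands in for the threads of a single block. Because every step reads only from the `pin` half and writes only to the `pout` half, running the "threads" one after another gives the same result as running them in parallel.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the assignment): CPU replay of the
// double-buffered scan. Each pass of the inner loop acts as one thread.
std::vector<float> emulate_double_buffered_scan(const std::vector<float>& in) {
    const std::size_t n = in.size();
    std::vector<float> temp(2 * n, 0.0f);  // two buffers of n floats each
    std::size_t pout = 0, pin = 1;

    // Exclusive scan: shift the input right by one and set element 0 to 0.
    for (std::size_t thid = 0; thid < n; ++thid) {
        temp[pout * n + thid] = (thid > 0) ? in[thid - 1] : 0.0f;
    }

    for (std::size_t offset = 1; offset < n; offset *= 2) {
        pout = 1 - pout;  // swap double buffer indices
        pin = 1 - pout;
        for (std::size_t thid = 0; thid < n; ++thid) {
            if (thid >= offset) {
                temp[pout * n + thid] =
                    temp[pin * n + thid] + temp[pin * n + thid - offset];
            } else {
                temp[pout * n + thid] = temp[pin * n + thid];
            }
        }
    }

    // The buffer written last holds the finished scan.
    return std::vector<float>(temp.begin() + pout * n,
                              temp.begin() + pout * n + n);
}
```

Comparing this output against the sequential scan for a few inputs, including lengths that are not powers of two, is a quick sanity check before debugging the kernel itself.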
=== Assignment 3 ===