Open main menu

CDOT Wiki β

Changes

GPU610/Team DAG

2,437 bytes added, 22:44, 20 April 2013
Assignment 1
The later restriction on the efficiency of the existing code in addition to the file IO is the inclusion of a CRITICAL pragma, which restricts the execution of that block of code to 1 thread at a time. This may be difficult to convert or optimize for CUDA since these sections of code appear to make frequent use if file IO, and parallel IO may not be available in the operating environment.
-------------------------------------------------
 
Due to difficulties in code conversion, limitations of Direct file IO in the methods, and library linking, the attempt to optimize methods from the Drive_God project have been abandoned for a more feasible topic.
 
New Project Topic: Optimizing Summed-Area-Tables with GPGPU algorithms.
 
Work is based on initial examples from another member of class, with examinations of improved serial algorithm and using this as a step towards a parallel kernel.
 
The improved serial algorithm uses 2-pass prefix-scan-sum to generate the Summed-Area-Table. The code for the kernels will aim to start with implementing the 2 pass prefix-scan-sum on the GPU, possibly look at an improved algorithm released by Nehab,Maximo,Lima and Hoppe in 2011 that reduces memory traffic by overlapping computation of block partitions and buffering intermediate partition sums across multiple stage passes.
 
Initial Naive Serial Algorithm: Iterative Summation: O -> 1/4 *n^4 ? ( 1/2*n^2) *(1/2* n^2)
 
(based on sum of operations iterated across from 0 to n in 2 dimensions. )
 
------------------------------------
/* Creates the summarized area table */
void summarizedAreaTable(float** a, float** b, int size){
int k = 0;
float sum = 0.0;
for(int i = size-1; i >= 0; i--){
for(int j = 0; j < size; j++){
for(int k = i; k < size; k++){
for(int m = 0; m <= j; m++){
sum += a[k][m];
}
}
b[i][j] = sum;
sum = 0.0;
}
}
}
 
------------------------------------------
Improved Serial SAT Algorithm: 2-pass Serial Prefix Sum : O-> 2*n^2
(Requires 1.5x the memory as above, to store interim stage values)
 
void twopassscan_SAT(float* a, float* b, float* c, int size)
{
// perform serial prefix sum on horizontal dimension - store in b
for(int j = 0; j < size; j++)
{
b[0][j] = a[0][j]; // no recursive math for first column -- else out of bounds
for(int i = 1; i < size; i++)
{
// result element in row = source element + recursive element sum of previous item in row
b[i][j] = a[i][j] + b[i-1][j];
}
}
// Now perform prefix sum along vertical dimension on intermediate result b
for(int x = 0; x < size; x ++)
{
c[x][0] = b[x][0]; // no recusive sum for first elements down columns
for(y = 1; y< size; y++)
{
c[x][y] = b[x][y] + c[x][y-1];
}
}
 
}
=== Assignment 2 ===
=== Assignment 3 ===