Changes


BarraCUDA Boiz

853 bytes added, 00:24, 26 March 2017
=== Assignment 2 ===
=== Problem ===

After surveying the original code, we found one major hot spot for heavy CPU usage. This block of code reshapes the input pixels into a set of samples for classification.

 const int N = width * height;
 const int dim = img.channels();
 cv::Mat samples = cv::Mat(N, dim, CV_32FC1);
 for (int x = 0; x < width; x++) {
     for (int y = 0; y < height; y++) {
         for (int d = 0; d < dim; d++) {
             int index = y * width + x;
             samples.at<float>(index, d) = (float)img.at<uchar>(y, x * dim + d);
         }
     }
 }

After analyzing this block of code, we decided to parallelize it.

 __global__ void setCenter(float* d_center, float* d_sample, int n, int dim, int randi) {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     int j = blockIdx.y * blockDim.y + threadIdx.y;
     if (i < n && j < n)
         d_center[j * n + i] = d_sample[j * randi + i];
 }
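The reshaping loop above writes each output element independently of every other one, which is what makes it a candidate for one-thread-per-element parallelism. The following is a minimal sketch of that per-element mapping in plain C++, not the project's actual kernel. The names <code>reshapeElement</code>, <code>reshapeAll</code>, and <code>rowStep</code> are hypothetical, and the sketch assumes an 8-bit interleaved image whose rows are <code>rowStep</code> bytes apart. The flat index <code>t</code> stands in for a GPU thread's global index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the original project.
// Converts one element of the sample matrix. `t` plays the role of a GPU
// thread's global index, in the range [0, width * height * dim).
// Assumption: `img` holds 8-bit interleaved channels and consecutive rows
// are `rowStep` bytes apart (rowStep >= width * dim).
inline void reshapeElement(const unsigned char* img, std::size_t rowStep,
                           float* samples, int width, int dim, std::size_t t) {
    const std::size_t pixel = t / dim;    // sample row, equal to y * width + x
    const std::size_t d = t % dim;        // channel within the pixel
    const std::size_t y = pixel / width;  // image row
    const std::size_t x = pixel % width;  // image column
    samples[pixel * dim + d] = static_cast<float>(img[y * rowStep + x * dim + d]);
}

// Serial stand-in for a kernel launch: every iteration is independent, so
// each value of `t` could be handled by its own thread.
inline std::vector<float> reshapeAll(const unsigned char* img, std::size_t rowStep,
                                     int width, int height, int dim) {
    const std::size_t total = static_cast<std::size_t>(width) * height * dim;
    std::vector<float> samples(total);
    for (std::size_t t = 0; t < total; ++t) {
        reshapeElement(img, rowStep, samples.data(), width, dim, t);
    }
    return samples;
}
```

In a CUDA version of this sketch, the body of <code>reshapeElement</code> would become the kernel body, with <code>t</code> computed from <code>blockIdx.x * blockDim.x + threadIdx.x</code> and guarded by a bounds check against <code>width * height * dim</code>. When the image rows carry no padding (<code>rowStep == width * dim</code>), the input and output offsets coincide and the whole operation reduces to an element-wise <code>uchar</code> to <code>float</code> conversion.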