
UnknownX

344 bytes added, 05:30, 13 April 2017
Assignment 2 - V1 Parallelization
== Assignment 2 - V1 Parallelization ==
Output result (converted to PNG format):
[[File:GpuassOutput.PNG]]
CPU code:
The nested loop over all N * N pixels is the most expensive part of the program.
 
for (int y = 0; y < N; ++y) {
    for (int x = 0; x < N; ++x) {
        // per-pixel work (not shown in this revision)
    }
}
GPU
Main code in the .cu file:
1. Allocate memory on the device.
2. Run the kernel, with ntpb = 1024 threads per block.
3. Copy the key data back out.
int size = N * N;
int nblocks = (size + ntpb - 1) / ntpb;
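The block count above is a ceiling division: it rounds up so that a final, partly filled block covers the leftover pixels whenever size is not a multiple of ntpb. A minimal host-side sketch in plain C++ (no GPU needed) may make this clearer; the helper name blocks_for is a hypothetical one introduced here for illustration and does not come from the assignment code.

```cpp
// Hypothetical helper (not from the assignment): smallest number of blocks
// such that nblocks * ntpb >= size, using integer ceiling division.
int blocks_for(int size, int ntpb) {
    return (size + ntpb - 1) / ntpb;
}

// Number of launched threads that fall past the end of the data.
// These are the threads a kernel would need to skip with a bounds check.
int surplus_threads(int size, int ntpb) {
    return blocks_for(size, ntpb) * ntpb - size;
}
```

For example, with ntpb = 1024, a size of 1024 needs one block, while a size of 1025 needs two blocks and leaves 1023 surplus threads in the last one.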
Kernel:
before:
for (int y = 0; y < N; ++y) for (int x = 0; x < N; ++x)
after:
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int x = idx / N;
int y = idx % N;
__global__ void kernel_tray(Vec3 pix_col, int N, int* pixs_x, int* pixs_y, int* pixs_z) {
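The replacement of the two nested loops by a single flat index can be checked on the host. The sketch below is plain C++ rather than CUDA: it emulates every thread of a 1D launch and confirms that the idx to (x, y) mapping reaches each of the N * N pixels exactly once. The function name and the idx >= size guard are assumptions made for this illustration; the revision shown here does not include the kernel body, so it is not known whether the original kernel has such a guard.

```cpp
#include <vector>

// Hypothetical host-side check (not part of the assignment code):
// emulate a 1D launch of nblocks blocks with ntpb threads each.
bool covers_every_pixel_once(int N, int ntpb) {
    const int size = N * N;
    const int nblocks = (size + ntpb - 1) / ntpb;
    std::vector<int> hits(size, 0);
    for (int block = 0; block < nblocks; ++block) {
        for (int thread = 0; thread < ntpb; ++thread) {
            // Same formula as blockIdx.x * blockDim.x + threadIdx.x.
            int idx = block * ntpb + thread;
            if (idx >= size) continue;  // assumed guard for the padded last block
            int x = idx / N;
            int y = idx % N;
            if (x >= N || y >= N) return false;  // mapping left the N x N grid
            ++hits[x * N + y];
        }
    }
    for (int h : hits) {
        if (h != 1) return false;  // a pixel was missed or visited twice
    }
    return true;
}
```

Without the guard, the surplus threads in the last block would compute x >= N and index past the end of the pixel arrays, which is why a bounds check of this kind is usually paired with a rounded-up block count.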