
UnknownX

412 bytes added, 05:50, 13 April 2017


== Assignment 2 - V1 Parallelization ==

Output result (converted to PNG format):

[[File:GpuassOutput.PNG]]

Run time graph:

[[File:Pygpu2.PNG]]

CPU code:

The most expensive part of the program is the nested loop over all N * N pixels:

for (int y = 0; y < N; ++y) {
    for (int x = 0; x < N; ++x) {
        // per-pixel computation (body not shown)
    }
}

Main code in the .cu file: 1. Allocate memory on the device. 2. Run the kernel with ntpb = 1024 threads per block. 3. Copy the key data back out of the device.

int size = N * N;

int nblocks = (size + ntpb - 1) / ntpb;
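The block count above is a ceiling division, so the last block is usually only partly used. A minimal host-side C++ sketch of that arithmetic follows; the helper names <code>blocks_needed</code> and <code>idle_threads</code> are hypothetical and do not appear in the assignment code.

```cpp
#include <cassert>

// Hypothetical helper: smallest number of blocks whose threads cover
// `size` items, using the same formula as the launch configuration above.
int blocks_needed(int size, int ntpb) {
    assert(size >= 0 && ntpb > 0);
    return (size + ntpb - 1) / ntpb;
}

// Hypothetical helper: threads launched beyond the last item. These
// threads have no pixel to work on.
int idle_threads(int size, int ntpb) {
    return blocks_needed(size, ntpb) * ntpb - size;
}
```

For example, assuming N = 1000 (the actual N is not stated here), size is 1,000,000 and ntpb = 1024 gives 977 blocks, of which the last has 448 threads with nothing to do.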

Kernel:

Before (nested loops on the CPU):

for (int y = 0; y < N; ++y)
    for (int x = 0; x < N; ++x)

After (one thread per pixel):

int idx = blockIdx.x * blockDim.x + threadIdx.x;
int x = idx / N;
int y = idx % N;

__global__ void kernel_tray(Vec3 pix_col, int N, int* pixs_x, int* pixs_y, int* pixs_z) {

pixs_z[y * N + x] = (int)pix_col.z;

}
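The index mapping can be checked on the host without a GPU. The sketch below is plain C++, not CUDA: the two loops stand in for <code>blockIdx.x</code> and <code>threadIdx.x</code>, and the function names are hypothetical. The <code>idx >= size</code> guard is an assumption, since the kernel body is not shown in full; some guard of this kind is needed because the rounded-up block count launches more threads than there are pixels.

```cpp
#include <cassert>
#include <vector>

// Host-side emulation of the 1D launch (hypothetical helper, not CUDA).
// Returns, for each pixel slot, how many emulated threads wrote to it.
std::vector<int> emulate_launch(int N, int ntpb) {
    assert(N > 0 && ntpb > 0);
    int size = N * N;
    int nblocks = (size + ntpb - 1) / ntpb;
    std::vector<int> writes(size, 0);
    for (int block = 0; block < nblocks; ++block) {      // stands in for blockIdx.x
        for (int thread = 0; thread < ntpb; ++thread) {  // stands in for threadIdx.x
            int idx = block * ntpb + thread;             // blockDim.x equals ntpb
            if (idx >= size) continue;                   // assumed guard for surplus threads
            int x = idx / N;                             // same mapping as the kernel
            int y = idx % N;
            ++writes[y * N + x];
        }
    }
    return writes;
}

// True when every pixel is written exactly once.
bool covers_each_pixel_once(int N, int ntpb) {
    for (int count : emulate_launch(N, ntpb))
        if (count != 1) return false;
    return true;
}
```

Note that with x = idx / N and y = idx % N, consecutive threads write to addresses N elements apart in <code>pixs_z[y * N + x]</code>. Swapping the two expressions would make neighbouring threads write neighbouring elements, which may be worth measuring in the optimization stage.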

Profile on nvvp:

[[File:matrix.senecac.on.ca/~zzha1/Capture.PNG]]

== Assignment 3 - Optimization ==

Zzha1
