Changes

GPUSquad

242 bytes added, 12:26, 7 April 2018

→‎Assignment 2

</source>

The hotspot seems to ~~clearly~~ be the ~~triple~~ double for-loop over m and n in the Jacobi iteration code of the dojacobi() function. I believe these matrix calculations could be parallelized for improved performance. Note that the outer for-loop enclosing the double loop runs a constant number of times, iters, so it does not grow with the problem size. The cost is O(iters * n^2), which is still O(n^2), not O(n^3).
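To make the cost argument concrete, here is a minimal serial sketch of what such a sweep might look like. The names `jacobi_sweep` and `jacobi`, the 5-point averaging stencil, and the row-major layout are assumptions for illustration only and are not taken from the actual dojacobi() code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: one Jacobi sweep over the interior of an m x n grid
// stored row-major. It reads only from `a` and writes only to `b`, so every
// (i, j) update is independent of the others (assumed 5-point stencil).
void jacobi_sweep(const std::vector<float>& a, std::vector<float>& b,
                  std::size_t m, std::size_t n) {
    for (std::size_t i = 1; i + 1 < m; ++i) {
        for (std::size_t j = 1; j + 1 < n; ++j) {
            b[i * n + j] = 0.25f * (a[(i - 1) * n + j] + a[(i + 1) * n + j] +
                                    a[i * n + j - 1] + a[i * n + j + 1]);
        }
    }
}

// Runs `iters` sweeps, swapping the buffers after each one.
// Total work is O(iters * m * n); with iters fixed this is O(m * n).
// Both buffers must start with the same boundary values, because the
// sweep never writes the boundary cells.
void jacobi(std::vector<float>& a, std::vector<float>& b,
            std::size_t m, std::size_t n, int iters) {
    for (int it = 0; it < iters; ++it) {
        jacobi_sweep(a, b, m, n);
        a.swap(b);
    }
}
```

Because each interior cell depends only on the previous iteration's values, the two inner loops are the natural candidates to map onto GPU threads, while the outer `iters` loop has to stay sequential.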

==== Idea 2 - LZW Compression ====

const int total_iters = 5000;

const int error_every = 2;

const int m = ~~500~~32, n = ~~500~~1024;

const float xmin = -1, xmax = 1;

const float ymin = -1, ymax = 1;

cudaMemcpy(d_b, b, n * m * sizeof(float), cudaMemcpyHostToDevice);

int nblocks = n / 1024; dim3 dGrid(~~1~~nblocks); dim3 dBlock(~~m~~1024);

// Carry out Jacobi iterations
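The launch configuration above divides n by 1024 with integer division, which works for n = 1024 but truncates to zero blocks for a smaller n such as the earlier value of 500, and a launch with a zero-sized grid is rejected by CUDA. A hedged sketch of the usual ceiling-division arithmetic follows, written as plain host-side C++ so it can be checked without a GPU. The helper names `blocks_for` and `global_index` are hypothetical and do not appear in the original code.

```cpp
// Threads per block; 1024 matches the value used in the snippet above.
constexpr int kThreadsPerBlock = 1024;

// Hypothetical helper: ceiling division, so a trailing partial block is
// still launched when n is not a multiple of threads_per_block.
int blocks_for(int n, int threads_per_block = kThreadsPerBlock) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: the element index a thread would compute inside a
// kernel as blockIdx.x * blockDim.x + threadIdx.x.
int global_index(int block_idx, int block_dim, int thread_idx) {
    return block_idx * block_dim + thread_idx;
}
```

With a rounded-up grid, some threads in the last block can fall past the end of the data, so a kernel using this mapping would need a guard such as `if (idx < n)` before touching memory.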

Tsarkarcd


CDOT Wiki β
