===Subtracting the row and column minimum===
With a cilk_for reducer and a vectorized cilk_for, this should have been possible. The catch is that I wrote the algorithm so that, instead of searching for the minimum value, it searches for the location of the minimum, and that turned out to be hard to parallelise. In the end I also tried making it track just a regular number and parallelising that, but for some reason the compiler broke on the cilk_for, so I got around the problem by reverting to the simpler solution.
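The difference between the two versions can be sketched as follows. This is not the original code: it uses an OpenMP user-defined reduction (declare reduction) as a stand-in for a Cilk reducer so that it builds with a standard C++ compiler, and the names MinLoc, combineMinLoc and findRowMinimumIndex are hypothetical. Tracking the location means every partial result has to carry an index along with the value, and the combine step needs a tie-break so the answer does not depend on how the iterations were split.

<pre>
#include <cassert>
#include <climits>
#include <cstdint>
#include <vector>

// Hypothetical (value, index) pair: the smallest value seen so far and where it was found.
struct MinLoc {
    int16_t value;
    int index;
};

// Combine two partial results: keep the smaller value and, on a tie, the
// smaller index, so the result is the same however iterations are split.
inline MinLoc combineMinLoc(const MinLoc& a, const MinLoc& b) {
    if (b.value < a.value || (b.value == a.value && b.index < a.index)) {
        return b;
    }
    return a;
}

// Custom OpenMP reduction built on combineMinLoc. The identity element has
// the largest possible value and index, so it loses every comparison.
#pragma omp declare reduction(minloc : MinLoc : omp_out = combineMinLoc(omp_out, omp_in)) \
    initializer(omp_priv = MinLoc{INT16_MAX, INT_MAX})

// Returns the column index of the smallest entry in row (lowest index on ties).
int findRowMinimumIndex(const std::vector<int16_t>& row) {
    assert(!row.empty());
    MinLoc best{INT16_MAX, INT_MAX};
    const int n = static_cast<int>(row.size());
#pragma omp parallel for reduction(minloc : best)
    for (int j = 0; j < n; ++j) {
        best = combineMinLoc(best, MinLoc{row[j], j});
    }
    return best.index;
}
</pre>

Compiled without OpenMP support (no -fopenmp), the pragmas are ignored and the loop runs serially with the same result.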
Here is what the parallelisation should have looked like:
<pre>
int16_t findRowMinimum(int row) {
}
</pre>
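A possible shape for the value-only version of the row and column reduction is sketched below. It is a hypothetical illustration, not the author's code: the Matrix type, the extra matrix parameter on findRowMinimum, and the helpers findColumnMinimum and reduceMatrix are assumptions, OpenMP's built-in reduction(min : ...) stands in for a Cilk min reducer, and all matrix entries are assumed to be finite (no "infinity" sentinel that must survive the subtraction).

<pre>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical square cost matrix stored row-major: at(i, j) is row i, column j.
struct Matrix {
    int n;
    std::vector<int16_t> data;
    int16_t& at(int i, int j) { return data[static_cast<std::size_t>(i) * n + j]; }
    int16_t at(int i, int j) const { return data[static_cast<std::size_t>(i) * n + j]; }
};

// Smallest value in the given row; only the value is tracked, not its column.
int16_t findRowMinimum(const Matrix& m, int row) {
    assert(m.n > 0);
    int16_t best = m.at(row, 0);
#pragma omp parallel for reduction(min : best)
    for (int j = 1; j < m.n; ++j) {
        best = std::min(best, m.at(row, j));
    }
    return best;
}

// Smallest value in the given column.
int16_t findColumnMinimum(const Matrix& m, int col) {
    assert(m.n > 0);
    int16_t best = m.at(0, col);
#pragma omp parallel for reduction(min : best)
    for (int i = 1; i < m.n; ++i) {
        best = std::min(best, m.at(i, col));
    }
    return best;
}

// Subtracts each row's minimum from that row, then each column's minimum
// from that column, and returns the total amount subtracted.
int reduceMatrix(Matrix& m) {
    int total = 0;
    for (int i = 0; i < m.n; ++i) {
        const int16_t rowMin = findRowMinimum(m, i);
        for (int j = 0; j < m.n; ++j) m.at(i, j) -= rowMin;
        total += rowMin;
    }
    for (int j = 0; j < m.n; ++j) {
        const int16_t colMin = findColumnMinimum(m, j);
        for (int i = 0; i < m.n; ++i) m.at(i, j) -= colMin;
        total += colMin;
    }
    return total;
}
</pre>

For the 3 by 3 matrix with rows (5, 2, 8), (6, 4, 3) and (9, 7, 7), the row minima sum to 12 and the first column then still has a minimum of 2, so reduceMatrix returns 14.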
The strange thing is that after these parallelisations the algorithm performed worse. I think this has to do with the granularity of the parallelism: we aren't solving 1000 by 1000 matrices but a very large number of 20 by 20 matrices, so each parallel loop only has about 20 iterations to split up, and the overhead of spawning that work probably outweighs the gain. Some of these loops are likely faster when run serially.
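One way to act on the granularity issue is to go parallel only when a loop has enough work. The sketch below assumes OpenMP again (its if clause runs the region on a single thread when the condition is false); rowMinimum and kParallelCutoff are hypothetical, and the cutoff value is illustrative rather than measured.

<pre>
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical threshold: below this many elements the loop runs serially,
// because creating and joining threads would cost more than the scan itself.
// The value is illustrative and untuned.
constexpr int kParallelCutoff = 4096;

int16_t rowMinimum(const std::vector<int16_t>& row) {
    assert(!row.empty());
    const int n = static_cast<int>(row.size());
    int16_t best = row[0];
    // The if clause keeps short rows (e.g. 20 columns) on a single thread.
#pragma omp parallel for reduction(min : best) if (n >= kParallelCutoff)
    for (int j = 1; j < n; ++j) {
        best = std::min(best, row[j]);
    }
    return best;
}
</pre>

With this cutoff, a 20-element row from a 20 by 20 matrix is always scanned serially, while a long row would still be split across threads.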
Another approach would have been to parallelise whole nodes instead of the operations inside them, but this is somewhat difficult because a node can only be created once its parent has been evaluated.
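The dependency still leaves some parallelism: once a node has been evaluated, all of its children can be evaluated independently, so the tree can be expanded one level at a time with each level processed in parallel. The sketch below is a simplified, hypothetical illustration of that idea (Node, evaluateChild and expandLevels are made up, evaluateChild stands in for the real matrix work, and there is no pruning), again using OpenMP for the parallel loop.

<pre>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical search-tree node: a child can only be evaluated once its
// parent has been, because it starts from the parent's result.
struct Node {
    int depth;
    int bound;  // result of evaluating this node (e.g. a reduced-cost bound)
};

// Hypothetical evaluation that derives a child's bound from its parent's.
// Stands in for "copy the parent's matrix, apply the branch, reduce it".
int evaluateChild(const Node& parent, int childIndex) {
    return parent.bound + childIndex + 1;
}

// Expands the tree level by level. Every node in the next level depends only
// on a node in the current level, which is already evaluated, so a whole
// level can be evaluated in parallel. Returns the number of nodes evaluated.
std::size_t expandLevels(Node root, int branching, int maxDepth) {
    assert(branching > 0);
    std::vector<Node> frontier{root};
    std::size_t evaluated = 1;  // the root is assumed to be evaluated already
    for (int depth = 1; depth <= maxDepth; ++depth) {
        std::vector<Node> next(frontier.size() * branching);
        const int count = static_cast<int>(next.size());
        // Each iteration writes only its own slot, so no locking is needed.
#pragma omp parallel for
        for (int k = 0; k < count; ++k) {
            const Node& parent = frontier[k / branching];
            next[k] = Node{depth, evaluateChild(parent, k % branching)};
        }
        evaluated += next.size();
        frontier.swap(next);
    }
    return evaluated;
}
</pre>

Without pruning, the number of nodes per level grows exponentially; the point is only the ordering, which guarantees every parent is finished before its children are evaluated.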
==Source==