Latest revision as of 12:29, 14 April 2016

GPU621/DPS921 | Participants | Groups and Projects | Resources | Glossary

T-eaSyPeasy

Team Members

Berwout de Vries Robles

Progress

Introduction

For my project instead of picking a new method to learn to parallelise with, I decided to try to paralellise an algorithm that is fairly hard to write, to see if after following this course I could actually parallelise an algorithm. The thing is, in the workshops we have only been parallelising algorithms that have been specifically written to be parallelised. I wanted to change that and see if I could still do it. The algorithm I am parallelising is a Branch & Bound Algorithm by Little from 1966. A good explanation of the algorithm can be found here: https://www.youtube.com/watch?v=nN4K8xA8ShM, the explanation of the branch and bound I wrote starts at 34:30.

Description

First off because I am not a C / C++ programmer I had a lot of trouble learning C++ syntax and what I could and could not do in my program. I think as a result the program is slower than it should be.

Some difficulties and how I solved them:

After removing a row or column, the matrix would be skewed, we would have to keep track of the exact columns and rows we removed and calculate our current value based on that entire history along the way.

The way I remedied that is that instead of removing the rows and columns, I put a sentinel value in them (UINT16_MAX), the rest of the algorithm just ignores that value.

Not all of the column minima need to be found, since if you have a row minimum in a column that automatically turns into a column minimum, due to the nature of the algorithm.

When searching for row minima I kept a bool array of which columns I had already found a value for, so that I would not have to search those *columns in the findColMinima function.

Subtour elimination

Store the result as partial tours, If an assignment is the start of a tour and the end of a tour, place it in between and concatenate the tour. If an assignment is the start of a tour, place it before the tour. If an assignment is the end of a tour, place it behind the tour. If an assignment is not part of a tour, create a new tour with it. Find out which tour the assignment got added to and eliminate the subtour that goes from the end of that tour to the beginning.

I wanted to parallelise certain sections of the algorithm, most notable : finding the row and column minimum subtracting the row and column minimum

With a cilk_for reducer and a vectorized cilk_for this should have been possible, the thing is, I wrote the algorithm in such a way that instead of searching for a minimum, it searches for the location of the minimum. This turned out to be a problem when trying to parallelise it. In the end I got around the problem by reverting to the simpler solution.

Here is what the parallelisation looks like:

    int16_t findRowMinimum(int row) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for (int col = 0; col < matrixSize; col++) {
            best->calc_min(workingCopy[row*matrixSize + col]);
        }
        return best.get_value();
    }

 int16_t findColMinimum(int col) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for (int row= 0; row< matrixSize; row++) {
            best->calc_min(workingCopy[row*matrixSize + col]);
        }
        return best.get_value();
    }

    void subtractRowMinimum(uint16_t val, int row) {
        cilk_for(int col = 0; col < matrixSize; col++) {
            workingCopy[row*matrixSize + col] -= val;
        }
    }

    void subtractColMinimum(uint16_t val, int col) {
        cilk_for(int row = 0; row < matrixSize; row++) {
            workingCopy[row*matrixSize + col] -= val;
        }
    }

The strange thing is that after these parallelisations thealgorithm performed a lot worse. I think this has to do with the granularity of the parallelism. Our serial version can solve a 500 city problem around 10 times faster than the non-serial version can. Another approach would have been to parallelise the whole nodes instead of the operations in them, but this is a little bit difficult because for each node to exist, the parent needs to have been evaluated already. There would have to be some very complex Node management logic to determine which nodes to spawn and which not to spawn, when to halt evaluation on certain nodes and that sort of thing.

Learn from me and think long and hard about what you want to parallelise in regards of granularity, because in this case parallelising the individual operations in the Node made my algorithm significantly worse. It took a lot of programmer effort and gave me no reward in terms of speed. I did learn a lot about C++ writing this algorithm, so all was not lost. I left the cilk/cilk_stubs.h in the solution below, because the serial solution was the best working solution I found. If you want to try the parallel version, just remove the stubs.

Source

The program takes it's input from a text file I generated in the same folder as the source code, the file contains the cities for the problem. The text file has 2 columns of integers. The first column is read as the x value of the city and the second column is the y value of the city. You can adjust the number of cities to be taken from the file in the source code at the top of the main section.

#include <stdint.h>
#include <vector>
#include <algorithm>
#include <fstream>
#include <iostream>
#include <cmath>
#include <time.h>
#include <cilk/cilk.h>
#include <cilk/reducer_min.h>
#include <cilk/cilk_stub.h>

struct Selection {
    uint16_t row;
    uint16_t col;
    uint16_t val;
    Selection(uint16_t x, uint16_t y, uint16_t v);

    bool operator==(const Selection& rhs)
    {
        return (col == rhs.col && row == rhs.row);
    }
    bool operator!=(const Selection& rhs)
    {
        return !(col == rhs.col && row == rhs.row);
    }
    Selection() : row(UINT16_MAX), col(UINT16_MAX), val(UINT16_MAX) {}
};

Selection::Selection(uint16_t r, uint16_t c, uint16_t v) {
    row = r;
    col = c;
    val = v;
}

struct Zero {
    uint16_t row;
    uint16_t col;
    uint16_t horPen;
    uint16_t vertPen;
    Zero(uint16_t row, uint16_t col);
    Zero() {};
};

Zero::Zero(uint16_t r, uint16_t c) {
    row = r;
    col = c;
};

struct Node {
    int lowerBound;
    int matrixSize;
    bool solutionFound = false;
    bool invalidSolution = false;
    uint16_t* originalMatrix;
    Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution, uint16_t lb);
    Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution);
    std::vector<std::vector<Selection>> solution;
    Node* left;
    Node* right;

    void evaluate(){
        std::vector<uint16_t> rowMins(matrixSize);
        std::vector <uint16_t> colMins(matrixSize);
        std::vector<Zero> zeroes;
        std::vector<bool> zeroInColumn(matrixSize, false);
        lowerBound = sumCurrentSolution();

        for (int row = 0; row < matrixSize; row++) {
            int16_t minimum = findRowMinimum(row);
            rowMins[row] = minimum;
            if(minimum != UINT16_MAX)
            lowerBound += minimum;
        }
        for (int row = 0; row < matrixSize; row++) {
            if (rowMins[row] != UINT16_MAX && rowMins[row] != 0) {
                subtractRowMinimum(rowMins[row], row);
            }
        }

        for (int col = 0; col < matrixSize; col++) {
            int16_t minimum = findColMinimum(col);
            colMins[col] = minimum;
            if (minimum != UINT16_MAX)
            lowerBound += minimum;
        }
        for (int col = 0; col < matrixSize; col++) {
            if (colMins[col] != UINT16_MAX && colMins[col] != 0) {
                subtractRowMinimum(colMins[col], col);
            }
        }
        gatherZeroes(zeroes);
        if (zeroes.empty()) {
            invalidSolution = true;
        }
        else {
            findPenalties(zeroes);
            Zero assignment = assignLargestPenalty(zeroes);
            createLeftNode(solution, assignment);
            int subtourIndex = addToResult(solution, assignment);
            createRightNode(solution, assignment, subtourIndex);
            checkSolutionFound();
        }
        free(originalMatrix);
    }

    int16_t findColMinimum(int col) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for(int row = 0; row< matrixSize; row++) {
            best->calc_min(originalMatrix[row*matrixSize + col]);
        }
        return best.get_value();
    }

    int16_t findRowMinimum(int row) {
        cilk::reducer< cilk::op_min<uint16_t> > best;
        cilk_for (int col = 0; col < matrixSize; col++) {
            best->calc_min(originalMatrix[row*matrixSize + col]);
        }
        return best.get_value();
    }

    void subtractRowMinimum(uint16_t val, int row) {
        if (val != 0) {
            cilk_for(int col = 0; col < matrixSize; col++) {
                if (originalMatrix[row*matrixSize + col] != UINT16_MAX)
                    originalMatrix[row*matrixSize + col] -= val;
            }
        }
    }

    void subtractColMinimum(uint16_t val, int col) {
        if (val != 0) {
            cilk_for(int row = 0; row < matrixSize; row++) {
                if (originalMatrix[row*matrixSize + col] != UINT16_MAX)
                    originalMatrix[row*matrixSize + col] -= val;
            }
        }
    }

    int sumCurrentSolution() {
        int result = 0;
        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
            std::for_each(v.begin(), v.end(), [&](Selection s) {
                result += s.val;
            });
        });
        return result;
    }

    void checkSolutionFound() {
        solutionFound = (solution.size() == 1 && solution[0].size() == matrixSize - 1);
    }

    bool isSolutionFound() {
        return solutionFound;
    }

    std::vector<std::vector<Selection>> getSolution() {
        return solution;
    }

    //void createWorkingCopy() {
    //    originalMatrix = (uint16_t*)malloc(sizeof(uint16_t) * matrixSize * matrixSize);
    //    memcpy(originalMatrix, originalMatrix, (sizeof(uint16_t) * matrixSize * matrixSize));
    //}

    void printMatrix() {
        std::cout << "\n Matrix: ";
        for (int i = 0; i < matrixSize; i++) {
            std::cout << "\n";
            for (int j = 0; j < matrixSize; j++) {
                std::cout << originalMatrix[i*matrixSize + j] << " ";
            }
        }
        std::cout << "\n";
    }

    void printOriginalMatrix() {
        std::cout << "\nOriginal Matrix: ";
        for (int i = 0; i < matrixSize; i++) {
            std::cout << "\n";
            for (int j = 0; j < matrixSize; j++) {
                std::cout << originalMatrix[i*matrixSize + j] << " ";
            }
        }
        std::cout << "\n";
    }

    void printSolution() {
        std::cout << "\n intermediate :\n";
        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
            std::for_each(v.begin(), v.end(), [&](Selection s) {
                std::cout << "(" << s.row << ", " << s.col << ") value = " << s.val << " ";
            });
            std::cout << "\n";
        });
        std::cout << "\n";
    }
    void gatherZeroes(std::vector<Zero>& zeroes) {
        for (int i = 0; i < matrixSize; i++) {
            for (int j = 0; j < matrixSize; j++) {
                if (originalMatrix[i*matrixSize + j] == 0) {
                    zeroes.push_back(Zero(i, j));
                }
            }
        }
        //std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
        //    std::cout << " zero found : row : " << z.row << " col : " << z.col << "\n";
        //});
    }

    void findPenalties(std::vector<Zero>& zeroes) {
        std::for_each(zeroes.begin(), zeroes.end(), [&](Zero& z) {
            uint16_t horizontalMin = UINT16_MAX;
            for (int col = 0; col < matrixSize; col++) {
                if (col != z.col && originalMatrix[z.row*matrixSize + col] < horizontalMin) {
                    horizontalMin = originalMatrix[z.row*matrixSize + col];
                }
            }
            z.horPen = horizontalMin;

            uint16_t verticalMin = UINT16_MAX;
            for (int row = 0; row < matrixSize; row++) {
                if (row != z.row && originalMatrix[row * matrixSize + z.col] < verticalMin) {
                    verticalMin = originalMatrix[row * matrixSize + z.col];
                }
            }
            z.vertPen = verticalMin;
        });

        //std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
        //    std::cout << " (" << z.row << ", " << z.col << ") Horizontal penalty: " << z.horPen << " Vertical penalty: " << z.vertPen << "\n";
        //});
    }

    Zero assignLargestPenalty(std::vector<Zero>& zeroes) {
        Zero assignment = zeroes[0];
        std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
            if ((z.vertPen + z.horPen) > (assignment.vertPen + assignment.horPen)) {
                assignment = z;
            }
        });
        return assignment;
    }

    int addToResult(std::vector<std::vector<Selection>>& solution, Zero assignment) {
        int startOfSubtourIndex = -1;
        int endOfSubtourIndex = -1;
        int index = 0;
        int result = 0;

        //dealing with subtours....
        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection>& subtour) {
            if (subtour[0].row == assignment.col) {
                startOfSubtourIndex = index;
            }
            else if (subtour.back().col == assignment.row) {
                endOfSubtourIndex = index;
            }
            index++;
        });
        if (startOfSubtourIndex > -1 && endOfSubtourIndex > -1) {
            //this case is true if our solution connects 2 subtours, by being at the start of one and at the end of the other.
            //we combine them by concatenating the subtour that it is the start of after the subtour that it is the end of.
            solution[endOfSubtourIndex].push_back(Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col]));
            solution[endOfSubtourIndex].insert(solution[endOfSubtourIndex].end(), solution[startOfSubtourIndex].begin(), solution[startOfSubtourIndex].end());
            solution.erase(solution.begin() + startOfSubtourIndex);
            result = endOfSubtourIndex;
            if (startOfSubtourIndex < endOfSubtourIndex) {
                result--;
            }

        }
        else if (startOfSubtourIndex > -1) {
            solution[startOfSubtourIndex].insert(solution[startOfSubtourIndex].begin(), Selection(assignment.row, assignment.col, originalMatrix[assignment.row * matrixSize + assignment.col]));
            //subtour elimination, we can no longer go FROM(column) subtour.back().row TO(row) assignment.col.
            result = startOfSubtourIndex;
        }
        else if (endOfSubtourIndex > -1) {
            solution[endOfSubtourIndex].push_back(Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col]));
            //subtour elimination, we can no longer go FROM(column) assignment.row TO(row) subtour[0].col.
            result = endOfSubtourIndex;
        }
        else {
            solution.push_back(std::vector<Selection>(1, Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col])));
            result = solution.size() - 1;
        }
        return result;
    }

    void createLeftNode(std::vector<std::vector<Selection>>& solution, Zero assignment) {
        uint16_t* leftSideMatrix = (uint16_t*)malloc(sizeof(uint16_t)*matrixSize*matrixSize);
        for (int row = 0; row < matrixSize; row++) {
            for (int col = 0; col < matrixSize; col++) {
                if (row == assignment.row && col == assignment.col) {
                    leftSideMatrix[row*matrixSize + col] = UINT16_MAX;
                }
                else {
                    leftSideMatrix[row*matrixSize + col] = originalMatrix[row*matrixSize + col];
                }
            }
        }
        left = new Node(matrixSize, leftSideMatrix, solution, lowerBound + assignment.horPen + assignment.vertPen);
    }

    void createRightNode(std::vector<std::vector<Selection>>& solution, Zero assignment, int subtourIndex) {
        uint16_t* rightSideMatrix = (uint16_t*)malloc(sizeof(uint16_t)*matrixSize*matrixSize);
        for (int row = 0; row < matrixSize; row++) {
            for (int col = 0; col < matrixSize; col++) {
                if (assignment.row == row || assignment.col == col) {
                    rightSideMatrix[row*matrixSize + col] = UINT16_MAX;
                }
                else {
                    rightSideMatrix[row*matrixSize + col] = originalMatrix[row*matrixSize + col];
                }
            }
        }
        rightSideMatrix[solution[subtourIndex].back().col * matrixSize + solution[subtourIndex][0].row] = UINT16_MAX;
        right = new Node(matrixSize, rightSideMatrix, solution);
    }

    Node getRight() {
        return *right;
    }

    Node getLeft() {
        return *left;
    }

};

Node::Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution) : solution(solution) {
    matrixSize = amountOfCities;
    originalMatrix = matrix;
}

Node::Node(int amountOfCities,uint16_t* matrix, std::vector<std::vector<Selection>> solution, uint16_t lb) : solution(solution){
    matrixSize = amountOfCities;
    originalMatrix = matrix;
    lowerBound = lb;
}

class Tree {
    Node *start;
    int lowerBound;
    int upperBound;
    std::vector<Node> nodes;
};

void getMatrixFromFile(std::string filename, int matrixSize, uint16_t* originalMatrix) {
    std::vector<int> xValues(matrixSize);
    std::vector<int> yValues(matrixSize);
    int n, m, i = 0;
    std::ifstream read(filename);
    while (i < matrixSize && read >> n >> m) {
        xValues[i] = n;
        yValues[i] = m;
        i++;
    }
   
    for (int i = 0; i < matrixSize; i++) {
        for (int j = 0; j < matrixSize; j++) {
            if (i == j) {
                originalMatrix[i*matrixSize + j] = UINT16_MAX;
            }
            else {
                originalMatrix[i*matrixSize + j] = (uint16_t)sqrt(pow(xValues[i]-xValues[j], 2) + pow(yValues[i] - yValues[j],2));
            }
        }
    }
}

void pruneNodeList(std::vector<Node>& nodeList, int upperBound) {
    int i = 0;
    while (i < nodeList.size()) {
        if (nodeList[i].lowerBound >= upperBound) {
            nodeList.erase(nodeList.begin() + i);
        }
        else {
            ++i;
        }
    }
}

int calculateSolution(uint16_t* originalMatrix, std::vector<std::vector<Selection>>& solution, int matrixSize) {
    int sum = 0;
    std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
        std::for_each(v.begin(), v.end(), [&](Selection s) {
            sum += s.val;
        });
    });
    sum += originalMatrix[solution[0].back().col * matrixSize + solution[0][0].row];
    return sum;
}

int findNextEvaluation(std::vector<Node>& nodeList) {
    Node eval = nodeList[0];
    int index = 0;
    int currentIndex = 0;
    std::for_each(nodeList.begin(), nodeList.end(), [&](Node n) {
        if (n.lowerBound < eval.lowerBound) {
            currentIndex = index;
        }
        index++;
    });
    return currentIndex;
}

int main() {
    int amountOfCities = 500;
    uint16_t* originalMatrix = (uint16_t*)malloc(sizeof(uint16_t)*amountOfCities*amountOfCities);
    getMatrixFromFile("cities.txt", amountOfCities, originalMatrix);
    std::vector<std::vector<Selection>> solution;
    Node node = Node(amountOfCities, originalMatrix, solution);
    std::vector<Node> nodeList;

    std::vector<std::vector<Selection>> currentSolution;
    std::vector<std::vector<Selection>> tempSolution;
    int upperBound = UINT16_MAX;
    int count = 0;
    clock_t t1, t2;
    t1 = clock();
    while (!node.isSolutionFound()) {
        node.evaluate();
        count++;
        if (!node.isSolutionFound()) {
            nodeList.push_back(node.getLeft());
            node = node.getRight();
        }
    }
    currentSolution = node.getSolution();
    upperBound = calculateSolution(originalMatrix, currentSolution, amountOfCities);
    pruneNodeList(nodeList, upperBound);
    while (!nodeList.empty()) {
        int nextNode = findNextEvaluation(nodeList);
        node = nodeList[nextNode];
        nodeList.erase(nodeList.begin() + nextNode);
        while (!node.isSolutionFound() && !node.invalidSolution) {
            node.evaluate();
            count++;
            if (!node.isSolutionFound() && !node.invalidSolution && node.lowerBound < upperBound) {
                if (node.getLeft().lowerBound < upperBound) {
                    nodeList.push_back(node.getLeft());
                }
                node = node.getRight();
            }
            else if (node.lowerBound >= upperBound) {
                break;
            } else if(!node.invalidSolution){
                tempSolution = node.getSolution();
                int value = calculateSolution(originalMatrix, tempSolution, amountOfCities);
                if (value < upperBound) {
                    currentSolution = tempSolution;
                    upperBound = value;
                    pruneNodeList(nodeList, upperBound);
                }
            }
        }

    }
    t2 = clock();
    float diff((float)t2 - (float)t1);
    std::cout << "time spent: " << diff << "\n";
    std::cout << "count = " << count << "\n";
    std::cout << "\n Solution :\n";
    std::for_each(currentSolution.begin(), currentSolution.end(), [&](std::vector<Selection> v) {
        std::for_each(v.begin(), v.end(), [&](Selection s) {
            std::cout << "(" << s.row << ", " << s.col << ") ";
        });
        std::cout << "\n";
    });
    std::cout << "\nResult = " << upperBound;
}

useful links

https://www.youtube.com/watch?v=-cLsEHP0qt0 - Branch and Bound explanation from minute 34:30.
https://www.cilkplus.org/tutorial-reducer-min - using the min reducer in cilk_for

Difference between revisions of "GPU621/DPS921 T-eaSyPeasy"

Latest revision as of 12:29, 14 April 2016

Contents

T-eaSyPeasy

Team Members

Progress

Introduction

Description

Source

useful links

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

get involved with CDOT

courses

course projects

links

Tools

@@ Line 5: / Line 5: @@
 == Progress ==
-=== Assignment 1 ===
+==Introduction==
-=== Assignment 2 ===
+For my project instead of picking a new method to learn to parallelise with, I decided to try to paralellise an algorithm that is fairly hard to write, to see if after following this course I could actually parallelise an algorithm. The thing is, in the workshops we have only been parallelising algorithms that have been specifically written to be parallelised. I wanted to change that and see if I could still do it. The algorithm I am parallelising is a Branch & Bound Algorithm by Little from 1966. A good explanation of the algorithm can be found here: https://www.youtube.com/watch?v=nN4K8xA8ShM, the explanation of the branch and bound I wrote starts at 34:30.
-=== Assignment 3 ===
+==Description==
+First off because I am not a C / C++ programmer I had a lot of trouble learning C++ syntax and what I could and could not do in my program. I think as a result the program is slower than it should be.
+Some difficulties and how I solved them:
+* After removing a row or column, the matrix would be skewed, we would have to keep track of the exact columns and rows we removed and calculate our current value based on that entire history along the way.
+The way I remedied that is that instead of removing the rows and columns, I put a sentinel value in them (UINT16_MAX), the rest of the algorithm just ignores that value.
+* Not all of the column minima need to be found, since if you have a row minimum in a column that automatically turns into a column minimum, due to the nature of the algorithm.
+When searching for row minima I kept a bool array of which columns I had already found a value for, so that I would not have to search those *columns in the findColMinima function.
+*Subtour elimination
+Store the result as partial tours,
+If an assignment is the start of a tour and the end of a tour, place it in between and concatenate the tour.
+If an assignment is the start of a tour, place it before the tour.
+If an assignment is the end of a tour, place it behind the tour.
+If an assignment is not part of a tour, create a new tour with it.
+Find out which tour the assignment got added to and eliminate the subtour that goes from the end of that tour to the beginning.
+I wanted to parallelise certain sections of the algorithm, most notable :
+finding the row and column minimum
+subtracting the row and column minimum
+With a cilk_for reducer and a vectorized cilk_for this should have been possible, the thing is, I wrote the algorithm in such a way that instead of searching for a minimum, it searches for the location of the minimum. This turned out to be a problem when trying to parallelise it. In the end I got around the problem by reverting to the simpler solution.
+Here is what the parallelisation looks like:
+<pre>
+    int16_t findRowMinimum(int row) {
+        cilk::reducer< cilk::op_min<uint16_t> > best;
+        cilk_for (int col = 0; col < matrixSize; col++) {
+            best->calc_min(workingCopy[row*matrixSize + col]);
+        }
+        return best.get_value();
+    }
+</pre>
+<pre>
+ int16_t findColMinimum(int col) {
+        cilk::reducer< cilk::op_min<uint16_t> > best;
+        cilk_for (int row= 0; row< matrixSize; row++) {
+            best->calc_min(workingCopy[row*matrixSize + col]);
+        }
+        return best.get_value();
+    }
+</pre>
+<pre>
+    void subtractRowMinimum(uint16_t val, int row) {
+        cilk_for(int col = 0; col < matrixSize; col++) {
+            workingCopy[row*matrixSize + col] -= val;
+        }
+    }
+</pre>
+<pre>
+    void subtractColMinimum(uint16_t val, int col) {
+        cilk_for(int row = 0; row < matrixSize; row++) {
+            workingCopy[row*matrixSize + col] -= val;
+        }
+    }
+</pre>
+The strange thing is that after these parallelisations thealgorithm performed a lot worse. I think this has to do with the granularity of the parallelism. Our serial version can solve a 500 city problem around 10 times faster than the non-serial version can.
+Another approach would have been to parallelise the whole nodes instead of the operations in them, but this is a little bit difficult because for each node to exist, the parent needs to have been evaluated already. There would have to be some very complex Node management logic to determine which nodes to spawn and which not to spawn, when to halt evaluation on certain nodes and that sort of thing.
+Learn from me and think long and hard about what you want to parallelise in regards of granularity, because in this case parallelising the individual operations in the Node made my algorithm significantly worse. It took a lot of programmer effort and gave me no reward in terms of speed. I did learn a lot about C++ writing this algorithm, so all was not lost. I left the cilk/cilk_stubs.h in the solution below, because the serial solution was the best working solution I found. If you want to try the parallel version, just remove the stubs.
+==Source==
+The program takes it's input from a text file I generated in the same folder as the source code, the file contains the cities for the problem. The text file has 2 columns of integers.
+The first column is read as the x value of the city and the second column is the y value of the city. You can adjust the number of cities to be taken from the file in the source code at the top of the main section.
+<pre>
+#include <stdint.h>
+#include <vector>
+#include <algorithm>
+#include <fstream>
+#include <iostream>
+#include <cmath>
+#include <time.h>
+#include <cilk/cilk.h>
+#include <cilk/reducer_min.h>
+#include <cilk/cilk_stub.h>
+struct Selection {
+    uint16_t row;
+    uint16_t col;
+    uint16_t val;
+    Selection(uint16_t x, uint16_t y, uint16_t v);
+    bool operator==(const Selection& rhs)
+    {
+        return (col == rhs.col && row == rhs.row);
+    }
+    bool operator!=(const Selection& rhs)
+    {
+        return !(col == rhs.col && row == rhs.row);
+    }
+    Selection() : row(UINT16_MAX), col(UINT16_MAX), val(UINT16_MAX) {}
+};
+Selection::Selection(uint16_t r, uint16_t c, uint16_t v) {
+    row = r;
+    col = c;
+    val = v;
+}
+struct Zero {
+    uint16_t row;
+    uint16_t col;
+    uint16_t horPen;
+    uint16_t vertPen;
+    Zero(uint16_t row, uint16_t col);
+    Zero() {};
+};
+Zero::Zero(uint16_t r, uint16_t c) {
+    row = r;
+    col = c;
+};
+struct Node {
+    int lowerBound;
+    int matrixSize;
+    bool solutionFound = false;
+    bool invalidSolution = false;
+    uint16_t* originalMatrix;
+    Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution, uint16_t lb);
+    Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution);
+    std::vector<std::vector<Selection>> solution;
+    Node* left;
+    Node* right;
+    void evaluate(){
+        std::vector<uint16_t> rowMins(matrixSize);
+        std::vector <uint16_t> colMins(matrixSize);
+        std::vector<Zero> zeroes;
+        std::vector<bool> zeroInColumn(matrixSize, false);
+        lowerBound = sumCurrentSolution();
+        for (int row = 0; row < matrixSize; row++) {
+            int16_t minimum = findRowMinimum(row);
+            rowMins[row] = minimum;
+            if(minimum != UINT16_MAX)
+            lowerBound += minimum;
+        }
+        for (int row = 0; row < matrixSize; row++) {
+            if (rowMins[row] != UINT16_MAX && rowMins[row] != 0) {
+                subtractRowMinimum(rowMins[row], row);
+            }
+        }
+        for (int col = 0; col < matrixSize; col++) {
+            int16_t minimum = findColMinimum(col);
+            colMins[col] = minimum;
+            if (minimum != UINT16_MAX)
+            lowerBound += minimum;
+        }
+        for (int col = 0; col < matrixSize; col++) {
+            if (colMins[col] != UINT16_MAX && colMins[col] != 0) {
+                subtractRowMinimum(colMins[col], col);
+            }
+        }
+        gatherZeroes(zeroes);
+        if (zeroes.empty()) {
+            invalidSolution = true;
+        }
+        else {
+            findPenalties(zeroes);
+            Zero assignment = assignLargestPenalty(zeroes);
+            createLeftNode(solution, assignment);
+            int subtourIndex = addToResult(solution, assignment);
+            createRightNode(solution, assignment, subtourIndex);
+            checkSolutionFound();
+        }
+        free(originalMatrix);
+    }
+    int16_t findColMinimum(int col) {
+        cilk::reducer< cilk::op_min<uint16_t> > best;
+        cilk_for(int row = 0; row< matrixSize; row++) {
+            best->calc_min(originalMatrix[row*matrixSize + col]);
+        }
+        return best.get_value();
+    }
+    int16_t findRowMinimum(int row) {
+        cilk::reducer< cilk::op_min<uint16_t> > best;
+        cilk_for (int col = 0; col < matrixSize; col++) {
+            best->calc_min(originalMatrix[row*matrixSize + col]);
+        }
+        return best.get_value();
+    }
+    void subtractRowMinimum(uint16_t val, int row) {
+        if (val != 0) {
+            cilk_for(int col = 0; col < matrixSize; col++) {
+                if (originalMatrix[row*matrixSize + col] != UINT16_MAX)
+                    originalMatrix[row*matrixSize + col] -= val;
+            }
+        }
+    }
+    void subtractColMinimum(uint16_t val, int col) {
+        if (val != 0) {
+            cilk_for(int row = 0; row < matrixSize; row++) {
+                if (originalMatrix[row*matrixSize + col] != UINT16_MAX)
+                    originalMatrix[row*matrixSize + col] -= val;
+            }
+        }
+    }
+    int sumCurrentSolution() {
+        int result = 0;
+        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
+            std::for_each(v.begin(), v.end(), [&](Selection s) {
+                result += s.val;
+            });
+        });
+        return result;
+    }
+    void checkSolutionFound() {
+        solutionFound = (solution.size() == 1 && solution[0].size() == matrixSize - 1);
+    }
+    bool isSolutionFound() {
+        return solutionFound;
+    }
+    std::vector<std::vector<Selection>> getSolution() {
+        return solution;
+    }
+    //void createWorkingCopy() {
+    //    originalMatrix = (uint16_t*)malloc(sizeof(uint16_t) * matrixSize * matrixSize);
+    //    memcpy(originalMatrix, originalMatrix, (sizeof(uint16_t) * matrixSize * matrixSize));
+    //}
+    void printMatrix() {
+        std::cout << "\n Matrix: ";
+        for (int i = 0; i < matrixSize; i++) {
+            std::cout << "\n";
+            for (int j = 0; j < matrixSize; j++) {
+                std::cout << originalMatrix[i*matrixSize + j] << " ";
+            }
+        }
+        std::cout << "\n";
+    }
+    void printOriginalMatrix() {
+        std::cout << "\nOriginal Matrix: ";
+        for (int i = 0; i < matrixSize; i++) {
+            std::cout << "\n";
+            for (int j = 0; j < matrixSize; j++) {
+                std::cout << originalMatrix[i*matrixSize + j] << " ";
+            }
+        }
+        std::cout << "\n";
+    }
+    void printSolution() {
+        std::cout << "\n intermediate :\n";
+        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
+            std::for_each(v.begin(), v.end(), [&](Selection s) {
+                std::cout << "(" << s.row << ", " << s.col << ") value = " << s.val << " ";
+            });
+            std::cout << "\n";
+        });
+        std::cout << "\n";
+    }
+    void gatherZeroes(std::vector<Zero>& zeroes) {
+        for (int i = 0; i < matrixSize; i++) {
+            for (int j = 0; j < matrixSize; j++) {
+                if (originalMatrix[i*matrixSize + j] == 0) {
+                    zeroes.push_back(Zero(i, j));
+                }
+            }
+        }
+        //std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
+        //    std::cout << " zero found : row : " << z.row << " col : " << z.col << "\n";
+        //});
+    }
+    void findPenalties(std::vector<Zero>& zeroes) {
+        std::for_each(zeroes.begin(), zeroes.end(), [&](Zero& z) {
+            uint16_t horizontalMin = UINT16_MAX;
+            for (int col = 0; col < matrixSize; col++) {
+                if (col != z.col && originalMatrix[z.row*matrixSize + col] < horizontalMin) {
+                    horizontalMin = originalMatrix[z.row*matrixSize + col];
+                }
+            }
+            z.horPen = horizontalMin;
+            uint16_t verticalMin = UINT16_MAX;
+            for (int row = 0; row < matrixSize; row++) {
+                if (row != z.row && originalMatrix[row * matrixSize + z.col] < verticalMin) {
+                    verticalMin = originalMatrix[row * matrixSize + z.col];
+                }
+            }
+            z.vertPen = verticalMin;
+        });
+        //std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
+        //    std::cout << " (" << z.row << ", " << z.col << ") Horizontal penalty: " << z.horPen << " Vertical penalty: " << z.vertPen << "\n";
+        //});
+    }
+    Zero assignLargestPenalty(std::vector<Zero>& zeroes) {
+        Zero assignment = zeroes[0];
+        std::for_each(zeroes.begin(), zeroes.end(), [&](Zero z) {
+            if ((z.vertPen + z.horPen) > (assignment.vertPen + assignment.horPen)) {
+                assignment = z;
+            }
+        });
+        return assignment;
+    }
+    int addToResult(std::vector<std::vector<Selection>>& solution, Zero assignment) {
+        int startOfSubtourIndex = -1;
+        int endOfSubtourIndex = -1;
+        int index = 0;
+        int result = 0;
+        //dealing with subtours....
+        std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection>& subtour) {
+            if (subtour[0].row == assignment.col) {
+                startOfSubtourIndex = index;
+            }
+            else if (subtour.back().col == assignment.row) {
+                endOfSubtourIndex = index;
+            }
+            index++;
+        });
+        if (startOfSubtourIndex > -1 && endOfSubtourIndex > -1) {
+            //this case is true if our solution connects 2 subtours, by being at the start of one and at the end of the other.
+            //we combine them by concatenating the subtour that it is the start of after the subtour that it is the end of.
+            solution[endOfSubtourIndex].push_back(Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col]));
+            solution[endOfSubtourIndex].insert(solution[endOfSubtourIndex].end(), solution[startOfSubtourIndex].begin(), solution[startOfSubtourIndex].end());
+            solution.erase(solution.begin() + startOfSubtourIndex);
+            result = endOfSubtourIndex;
+            if (startOfSubtourIndex < endOfSubtourIndex) {
+                result--;
+            }
+        }
+        else if (startOfSubtourIndex > -1) {
+            solution[startOfSubtourIndex].insert(solution[startOfSubtourIndex].begin(), Selection(assignment.row, assignment.col, originalMatrix[assignment.row * matrixSize + assignment.col]));
+            //subtour elimination, we can no longer go FROM(column) subtour.back().row TO(row) assignment.col.
+            result = startOfSubtourIndex;
+        }
+        else if (endOfSubtourIndex > -1) {
+            solution[endOfSubtourIndex].push_back(Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col]));
+            //subtour elimination, we can no longer go FROM(column) assignment.row TO(row) subtour[0].col.
+            result = endOfSubtourIndex;
+        }
+        else {
+            solution.push_back(std::vector<Selection>(1, Selection(assignment.row, assignment.col, originalMatrix[assignment.row* matrixSize + assignment.col])));
+            result = solution.size() - 1;
+        }
+        return result;
+    }
+    void createLeftNode(std::vector<std::vector<Selection>>& solution, Zero assignment) {
+        uint16_t* leftSideMatrix = (uint16_t*)malloc(sizeof(uint16_t)*matrixSize*matrixSize);
+        for (int row = 0; row < matrixSize; row++) {
+            for (int col = 0; col < matrixSize; col++) {
+                if (row == assignment.row && col == assignment.col) {
+                    leftSideMatrix[row*matrixSize + col] = UINT16_MAX;
+                }
+                else {
+                    leftSideMatrix[row*matrixSize + col] = originalMatrix[row*matrixSize + col];
+                }
+            }
+        }
+        left = new Node(matrixSize, leftSideMatrix, solution, lowerBound + assignment.horPen + assignment.vertPen);
+    }
+    void createRightNode(std::vector<std::vector<Selection>>& solution, Zero assignment, int subtourIndex) {
+        uint16_t* rightSideMatrix = (uint16_t*)malloc(sizeof(uint16_t)*matrixSize*matrixSize);
+        for (int row = 0; row < matrixSize; row++) {
+            for (int col = 0; col < matrixSize; col++) {
+                if (assignment.row == row || assignment.col == col) {
+                    rightSideMatrix[row*matrixSize + col] = UINT16_MAX;
+                }
+                else {
+                    rightSideMatrix[row*matrixSize + col] = originalMatrix[row*matrixSize + col];
+                }
+            }
+        }
+        rightSideMatrix[solution[subtourIndex].back().col * matrixSize + solution[subtourIndex][0].row] = UINT16_MAX;
+        right = new Node(matrixSize, rightSideMatrix, solution);
+    }
+    Node getRight() {
+        return *right;
+    }
+    Node getLeft() {
+        return *left;
+    }
+};
+Node::Node(int amountOfCities, uint16_t* matrix, std::vector<std::vector<Selection>> solution) : solution(solution) {
+    matrixSize = amountOfCities;
+    originalMatrix = matrix;
+}
+Node::Node(int amountOfCities,uint16_t* matrix, std::vector<std::vector<Selection>> solution, uint16_t lb) : solution(solution){
+    matrixSize = amountOfCities;
+    originalMatrix = matrix;
+    lowerBound = lb;
+}
+class Tree {
+    Node *start;
+    int lowerBound;
+    int upperBound;
+    std::vector<Node> nodes;
+};
+void getMatrixFromFile(std::string filename, int matrixSize, uint16_t* originalMatrix) {
+    std::vector<int> xValues(matrixSize);
+    std::vector<int> yValues(matrixSize);
+    int n, m, i = 0;
+    std::ifstream read(filename);
+    while (i < matrixSize && read >> n >> m) {
+        xValues[i] = n;
+        yValues[i] = m;
+        i++;
+    }
+    for (int i = 0; i < matrixSize; i++) {
+        for (int j = 0; j < matrixSize; j++) {
+            if (i == j) {
+                originalMatrix[i*matrixSize + j] = UINT16_MAX;
+            }
+            else {
+                originalMatrix[i*matrixSize + j] = (uint16_t)sqrt(pow(xValues[i]-xValues[j], 2) + pow(yValues[i] - yValues[j],2));
+            }
+        }
+    }
+}
+void pruneNodeList(std::vector<Node>& nodeList, int upperBound) {
+    int i = 0;
+    while (i < nodeList.size()) {
+        if (nodeList[i].lowerBound >= upperBound) {
+            nodeList.erase(nodeList.begin() + i);
+        }
+        else {
+            ++i;
+        }
+    }
+}
+int calculateSolution(uint16_t* originalMatrix, std::vector<std::vector<Selection>>& solution, int matrixSize) {
+    int sum = 0;
+    std::for_each(solution.begin(), solution.end(), [&](std::vector<Selection> v) {
+        std::for_each(v.begin(), v.end(), [&](Selection s) {
+            sum += s.val;
+        });
+    });
+    sum += originalMatrix[solution[0].back().col * matrixSize + solution[0][0].row];
+    return sum;
+}
+int findNextEvaluation(std::vector<Node>& nodeList) {
+    Node eval = nodeList[0];
+    int index = 0;
+    int currentIndex = 0;
+    std::for_each(nodeList.begin(), nodeList.end(), [&](Node n) {
+        if (n.lowerBound < eval.lowerBound) {
+            currentIndex = index;
+        }
+        index++;
+    });
+    return currentIndex;
+}
+int main() {
+    int amountOfCities = 500;
+    uint16_t* originalMatrix = (uint16_t*)malloc(sizeof(uint16_t)*amountOfCities*amountOfCities);
+    getMatrixFromFile("cities.txt", amountOfCities, originalMatrix);
+    std::vector<std::vector<Selection>> solution;
+    Node node = Node(amountOfCities, originalMatrix, solution);
+    std::vector<Node> nodeList;
+    std::vector<std::vector<Selection>> currentSolution;
+    std::vector<std::vector<Selection>> tempSolution;
+    int upperBound = UINT16_MAX;
+    int count = 0;
+    clock_t t1, t2;
+    t1 = clock();
+    while (!node.isSolutionFound()) {
+        node.evaluate();
+        count++;
+        if (!node.isSolutionFound()) {
+            nodeList.push_back(node.getLeft());
+            node = node.getRight();
+        }
+    }
+    currentSolution = node.getSolution();
+    upperBound = calculateSolution(originalMatrix, currentSolution, amountOfCities);
+    pruneNodeList(nodeList, upperBound);
+    while (!nodeList.empty()) {
+        int nextNode = findNextEvaluation(nodeList);
+        node = nodeList[nextNode];
+        nodeList.erase(nodeList.begin() + nextNode);
+        while (!node.isSolutionFound() && !node.invalidSolution) {
+            node.evaluate();
+            count++;
+            if (!node.isSolutionFound() && !node.invalidSolution && node.lowerBound < upperBound) {
+                if (node.getLeft().lowerBound < upperBound) {
+                    nodeList.push_back(node.getLeft());
+                }
+                node = node.getRight();
+            }
+            else if (node.lowerBound >= upperBound) {
+                break;
+            } else if(!node.invalidSolution){
+                tempSolution = node.getSolution();
+                int value = calculateSolution(originalMatrix, tempSolution, amountOfCities);
+                if (value < upperBound) {
+                    currentSolution = tempSolution;
+                    upperBound = value;
+                    pruneNodeList(nodeList, upperBound);
+                }
+            }
+        }
+    }
+    t2 = clock();
+    float diff((float)t2 - (float)t1);
+    std::cout << "time spent: " << diff << "\n";
+    std::cout << "count = " << count << "\n";
+    std::cout << "\n Solution :\n";
+    std::for_each(currentSolution.begin(), currentSolution.end(), [&](std::vector<Selection> v) {
+        std::for_each(v.begin(), v.end(), [&](Selection s) {
+            std::cout << "(" << s.row << ", " << s.col << ") ";
+        });
+        std::cout << "\n";
+    });
+    std::cout << "\nResult = " << upperBound;
+}
+</pre>
+==useful links==
+#https://www.youtube.com/watch?v=-cLsEHP0qt0 - Branch and Bound explanation from minute 34:30.
+#https://www.cilkplus.org/tutorial-reducer-min - using the min reducer in cilk_for