Open main menu

CDOT Wiki β

GPU621/Go Kimchi

Revision as of 00:26, 4 August 2021 by Ywon7 (talk | contribs) (General Implementation Info)

Project Name

Parallel Programming in Go & Java

Group Members

Yuseok Won Minsu Kim

Project Description

This project will look at the performance difference between Go and Java using a classic matrix multiplication algorithm. It will present how to do parallel program in each language and show the different syntax codes that's using the similar algorithm. We will gather the results of running the code and show the graphical result to clearly show to performance difference between the two programming language. It will also briefly cover the history of each language, key term of parallel programming, look at the fundamental difference between two languages to see what may have caused the performance difference.

Progress

What's GO Language?


Go Language is a open source programming language that is designed at Google by Robert Griesemer, Ken Thompson, and Rob Pike.

Go lang is a Fast and Pretty language. Go language is a compiler language which makes its execution fast. Go is faster than Python & JavaScript. In some cases, Go may be faster than Java or as fast as Java.
Go also has a relatively easy syntax which makes it easy to learn and fast to develop.
Is Go Popular?
GO is used by Google, Youtube, Google Download Server, Uber replaced Node.JS to Go, Twitch's trance-coding process Go & C++.

What's Java Language?


Java is open source and Object-oriented programming language developed by Sun Microsystems in 1995. The founder is James Gosling. In 2010, Oracle acquired Sun Microsystems and owned Java's copyright.

Java is simple programming language, easy to learn, clean and easy to understand. This is because syntax of Java is based on C++ that has removed many confusing and rarely used functionality by user (e.g. operator overloading, explicit pointer, automatic garbage collection etc.)

Java is the platform independent language. Programs can run on several different types of computer as long as computer has a Java Runtime Environment(JRE) installed.





Parallel Programming in GO & Java


Parallelism and Concurrency

Rob Pike who is developer of Go language has defined parallelism and concurrency in his speech "Concurrency is not Parallelism":

Parallelism defined as

     Parallelism is about doing lots of things at once.

Concurrency defined as

     Concurrency is about dealing with lots of things at once.

Concurrency allows to execute multiple tasks in partial order or out of order. The executions must finished before the next execution can start. While, parallelism has simultaneous execution on a multicore per CPU.

If we combine two concepts:

  Concurrency                 Concurrency + parallelism
  (Single-Core CPU)           (Multi-Core CPU)
   ___                         ___ ___
  |th1|                       |th1|th2|
  |   |                       |   |___|
  |___|___                    |   |___
      |th2|                   |___|th2|
   ___|___|                    ___|___|
  |th1|                       |th1|
  |___|___                    |   |___
      |th2|                   |   |th2|

diagram cited by Pithikos in Stack Overflow

Go language focuses on concurrency and parallelism by implementing its own Goroutine instead of regular threading. Goroutines uses the built-in scheduler to conduct the threads in behind and hide many complexity of thread management from the user to specify which function to run and execute them with “Go” command.

Java language can implement thread in two different methods. The first method is to create a class which extends the “Thread” class that gives the user to have control of the lifecycle of the thread. But Java language only allows single inheritance which prevents the thread from being extend other base class. The second method is using ‘Runnable’ interface. This method allows to user to extend other base class while still being run as a new thread. However, this method does not give user to control lifecycle of thread.

Java version 5.0 and later provides concurrency to simplify thread creation and processing. The "Executor" that implement by java.util.concurrent provides APIs for creating and processing new threads using reusable threads.

General Implementation Info


Before diving into the code, here are some things to know.

1. To Increase the accuracy, both Go & Java program will run in terminal using .sh script
- To do this, you will need to enable "Root".

2.To Increase the accuracy, both Go & Java program will run in the same environment/machine
Environment/machine used for below experiment is:

Machine Mac Book Pro 2019
Processor 2.3GHz 8-core Intel Core i9
macOS Big Sur 11.5.1
Memory 16GB
Go Version 1.16.6
Java Version 15 2020-09-15

3. To run both program, 4096 * 4096 pre-generated Matrix with random values between 0 and 1000 will be used


You can create & download Matrix in the following link https://onlinenumbertools.com/random-number-matrix

Implementation for Go


Installation of Go
Go to https://golang.org/dl/ & Download Go accordingly to your OS.

Multi-Thread Program in Go

   package main
   import (
       "bufio"
       "fmt"
       "log"
       "os"
       "runtime"
       "runtime/debug"
       "strconv"
       "strings"
       "sync"
       "time"
   )
   func multiply(A [][]int, B [][]int) [][]int { sizeA := len(A)
       sizeB := len(B)
       n := make([][]int, sizeA)
       for i := range n {
           n[i] = make([]int, sizeB)
       }
       for i := 0; i < sizeA; i++ {
           for k := 0; k < sizeB; k++ {
               temp := A[i][k]
               for j := 0; j < sizeB; j++ {
                   n[i][j] += temp * B[k][j]
               }
           } 		
       }
       return n 
   }
   func splitMatrix(nrOfThreads int, matrix [][]int) (matrixes [][][]int) {
       splitter := len(matrix) / nrOfThreads
       for i := 0; i < nrOfThreads; i++ {
           matrixes = append(matrixes, matrix[splitter*i:(splitter*(i+1))])
       }
       return
   }
   func multiplyStuff(finalMatrix *[][][]int, matrix1 [][]int, matrix2 [][]int, i int) {
       (*finalMatrix)[i] = multiply(matrix1, matrix2) 
   }
   func readFile(filePath string) (matrix1 [][]int, matrix2 [][]int) {
       file, err := os.Open(filePath)
       if err != nil {
           log.Fatal(err)
       }
       defer file.Close()
       var temp []int
       matrixNr := 1
       scanner := bufio.NewScanner(file)
       for scanner.Scan() {
           words := strings.Fields(scanner.Text()) 
           if len(words) != 0 {
               for _, element := range words {
                   i, err := strconv.Atoi(element)
                   if err != nil {
                       log.Fatal(err)
                   }
                   temp = append(temp, i)
               }
               if matrixNr == 1 {
                   matrix1 = append(matrix1, temp)
                   matrix2 = append(matrix2, temp)
               } 
               temp = nil
           } else {
               matrixNr = 2 
    	    }
       }	
       if err := scanner.Err(); err != nil {
           log.Fatal(err)
       }
       return
   }
   func main() {
       file := os.Args[1]
       nrOfThreads, err := strconv.Atoi(os.Args[2]) 
       if err != nil {  
           log.Fatal("USAGE: " + os.Args[0] + " <file> <nrOfThreads>")
       }
       debug.SetGCPercent(-1)
       if nrOfThreads <= 0 {
           runtime.GOMAXPROCS(1)
       } else if nrOfThreads >= 16 {
           runtime.GOMAXPROCS(8)
       } else {
           runtime.GOMAXPROCS(nrOfThreads)
       }
       var wg sync.WaitGroup
       finishedMatrix := make([][][]int, nrOfThreads)
       matrix1, matrix2 := readFile(file)
       if len(matrix1) != len(matrix2) || (nrOfThreads != 0 && len(matrix1)%nrOfThreads != 0) {
           log.Fatal("USAGE: " + os.Args[0] + " <file> <nrOfThreads>")
       }
       var start int64
       if nrOfThreads == 0 {
           start = time.Now().UnixNano()
           multiply(matrix1, matrix2)
       } else {
           matrixes := splitMatrix(nrOfThreads, matrix1)
           start = time.Now().UnixNano()
           for i := 0; i < nrOfThreads; i++ {
               wg.Add(1)
               go func(index int) {
                   defer wg.Done()
                   multiplyStuff(&finishedMatrix, matrixes[index], matrix2, index)
               }(i)
           }
       wg.Wait() 
       }
       end := time.Now().UnixNano()
       fmt.Printf("Execution took %d ns\n", (end - start)) 
       runtime.GC()
   }

.sh Code to Run Go Program
Above Go code will run in terminal using .sh script.

   #! /bin/sh
   COUNTER=0
   MAX=7
   go build MatrixMultiGo.go > result.txt
   echo "Running tests for no multithreading..." 
   echo "### No Multithreading" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 0 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 1 thread..."
   echo "### Threading with 1 thread" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 1 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 2 threads..."
   echo "### Multithreading with 2 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
   ((COUNTER++))
   echo "Running test #$COUNTER"
   nice -n -20 ./MatrixMultiGo 4096.i 0>> result.txt
   done
   COUNTER=0
   echo "Running tests for 4 threads..."
   echo "### Multithreading with 4 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 4 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 8 threads..."
   echo "### Multithreading with 8 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 8 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 16 threads..."
   echo "### Multithreading with 16 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 16 >> result.txt
   done
   echo "Done";

Implementation for Java


Installation of Java
Go to https://www.java.com/ko/download/manual.jsp & Download Java accordingly to your OS.

Multi-Thread Program in Java

 import java.io.BufferedReader;
 import java.io.FileReader; 
 import java.io.IOException;
 import java.util.LinkedList;
 import java.util.List;
 import java.util.ArrayList;
 public class Java_Thread {
   public List<ArrayList<ArrayList<Integer>>> read(String filename) {
       ArrayList<ArrayList<Integer>> A = new ArrayList<ArrayList<Integer>>();
       ArrayList<ArrayList<Integer>> B = new ArrayList<ArrayList<Integer>>();
       String thisLine;
       try {
           BufferedReader br = new BufferedReader(new FileReader(filename));
           while ((thisLine = br.readLine()) != null) {
               if (thisLine.trim().equals("")) {
                   break;
               } else {
                   ArrayList<Integer> line = new ArrayList<Integer>();
                   String[] lineArray = thisLine.split("\t"); for (String number : lineArray) {
                       line.add(Integer.parseInt(number)); 
                   }
                   A.add(line);
                   B.add(line);
               }
           }
           br.close();
       } catch (IOException e) {
           System.err.println("Error: " + e);
       }
       List<ArrayList<ArrayList<Integer>>> res = new LinkedList<ArrayList<ArrayList<Integer>>>();
       res.add(A);
       res.add(B);
       return res;
   }
   public int[][] matrixMultiplication(ArrayList<ArrayList<Integer>> A, ArrayList<ArrayList<Integer>> B, int m, int n) { 
       int[][] C = new int[m][n];
       for (int i = 0; i < m; i++) {
           for (int k = 0; k < n; k++) {
               int temp = A.get(i).get(k);
               for (int j = 0; j < n; j++) {
                   C[i][j] += temp * B.get(k).get(j);
               } 
           }       
       }
       return C; 
   }
   public ArrayList<ArrayList<ArrayList<Integer>>> splitMatrix(ArrayList<ArrayList<Integer>> A, int nrOfThreads) {
       int n = A.size();
       int m = n / nrOfThreads; 
       ArrayList<ArrayList<ArrayList<Integer>>> B = new
       ArrayList<ArrayList<ArrayList<Integer>>>(); 
       for (int i = 0; i < nrOfThreads; i++){
           B.add(new ArrayList<ArrayList<Integer>>(A.subList(i*m, (i+1)*m)));
       }
       return B; 
   }
   public static void main(String[] args) {
       String filename = "";
       int nrOfThreads = 0;
       if (args.length < 2) {
           System.out.println("Missing filename and/or number of threads\nUSAGE: java Java_Thread <file> <nrOfThreads>"); 
           System.exit(1);
       } else {
           filename = args[0];
           nrOfThreads = Integer.parseInt(args[1]); 
           Java_Thread java_Thread = new Java_Thread(); 
           java_Thread.start(filename, nrOfThreads);
       } 
   }
   public void start(String filename, int nrOfThreads) { 
       List<ArrayList<ArrayList<Integer>>> matrices = read(filename);
       ArrayList<ArrayList<Integer>> A = matrices.get(0); 
       ArrayList<ArrayList<Integer>> B = matrices.get(1);
       if (nrOfThreads <= 0) {
           int n = A.size();
           long startTime = System.nanoTime();
           int[][] C = matrixMultiplication(A, B, n, n); 
           long endTime = System.nanoTime();
           System.out.println("Execution took " + (endTime - startTime) + " ns");
       } else {
           if (A.size() % nrOfThreads != 0) {
               System.out.println("Size of matrix is not divisible by the supplied number of threads");
               System.exit(1);
           }
           ArrayList<Worker> threads = new ArrayList<Worker>(); 
           ArrayList<int[][]> result = new ArrayList<int[][]>(); 
           int[][] empty = new int[][]{{}};
           for (int i = 0; i < nrOfThreads; i++) {
               result.add(empty);
           }
           ArrayList<ArrayList<ArrayList<Integer>>> workerMatrices = splitMatrix(A, nrOfThreads);
           long startTime = System.nanoTime();
           for (int i = 0; i < nrOfThreads; i++) {
               i, result));
           }
           threads.add(new Worker(workerMatrices.get(i), B, threads.get(i).start();
           for (int i = 0; i < nrOfThreads; i++) {
               try {
                   threads.get(i).join();
               } catch (Exception e) {
                   System.err.println(e);
               }
           }
           long endTime = System.nanoTime();
           System.out.println("Execution took " + (endTime - startTime) + " ns");
       }  
   }
   class Worker extends Thread {
       private ArrayList<ArrayList<Integer>> A; 
       private ArrayList<ArrayList<Integer>> B; private int index;
       private ArrayList<int[][]> result; private int m;
       private int n;
       public Worker(ArrayList<ArrayList<Integer>> A, ArrayList<ArrayList<Integer>> B, int index, ArrayList<int[][]> result) {
           this.A = A;
           this.B = B;
           this.index = index;
           this.result = result;
           this.m = A.size();
           this.n = B.size();
       }
       @Override
       public void run() {
           this.result.set(this.index, matrixMultiplication(this.A, this.B, this.m, this.n));
       }
   } 
 }

.sh Code to Run Go Program
Above Java code will run in terminal using .sh script. .sh Code to Run Java Program

   #! /bin/sh
   COUNTER=0
   MAX=7
   javac Java_Thread.java > result.txt
   echo "Running tests for no multithreading..." 
   echo "### No Multithreading" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 0 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 1 thread..."
   echo "### Threading with 1 thread" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 1 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 2 threads..."
   echo "### Multithreading with 2 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 2 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 4 threads..."
   echo "### Multithreading with 4 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 4 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 8 threads..."
   echo "### Multithreading with 8 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 8 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 16 threads..."
   echo "### Multithreading with 16 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 16 >> result.txt
   done
   echo "Done";

Result


The data used in the implementation is a 4096x4096 matrix


Performance difference in Go & Java
Go Java
Main thread 84.66 s 171.76 s
1 thread 84.28 s 169.51 s
2 thread 37.08 s 73.04 s
4 thread 19.2 s 34.91 s
8 thread 11.41 s 17.63 s
16 thread 14.38 s 16.08 s


Performance difference for Java Thread
difference
Main Thread to 1 thread 2.25%
1 thread to 2 threads 56.91%
2 threads to 4 threads 51.90%
4 threads to 8 threads 49.49
8 threads to 16 threads 8.79%


Performance increase of Go in comparsion to Java Thread
difference
Main thread 102%
1 thread 101%
2 thread 96.90%
4 thread 81.80%
8 thread 55%
16 thread 12%
Performance increase of Go in comparsion to Java Thread
difference
Main thread 102%
1 thread 101%
2 thread 96.90%
4 thread 81.80%
8 thread 55%
16 thread 12%

Analysis


The results of matrix multiplication for Go shows fastest in serial implementation.