Open main menu

CDOT Wiki β

GPU621/Go Kimchi

Group Members

Project Description

This project will look at the performance difference between Go and Java for Parallel/Multithread program using the matrix multiplication algorithm. The project will present parallel program codes written in each language and .sh script to run the codes. Graphical result of running the codes will be given to clearly show the performance difference between Go & Java. It will also briefly cover key terms & concepts of parallel programming to help readers' understanding and the historical/fundamental difference of each language to determine what may have caused the performance difference.

Programming Language Introduction

What's GO Language?

Go Language is a open source programming language that is designed at Google by Robert Griesemer, Ken Thompson, and Rob Pike.

Go lang is a Fast and Pretty language. Go language is a compiler language which makes its execution fast. Go is faster than Python & JavaScript. In some cases, Go may be faster than Java or as fast as Java.
Go also has a relatively easy syntax which makes it easy to learn and fast to develop.
Is Go Popular?
GO is used by Google, Youtube, Google Download Server, Uber replaced Node.JS to Go, Twitch's trance-coding process Go & C++.




What's Java Language?

Java is open source and object-oriented programming language developed by Sun Microsystems in 1995. The founder is James Gosling. In 2010, Oracle acquired Sun Microsystems and owned Java's copyright.

Java is simple programming language, easy to learn, clean and easy to understand. This is because syntax of Java is based on C++ that has removed many confusing and rarely used functionality by user (e.g. operator overloading, explicit pointer, manual memory management etc.)

Java is the platform independent language. Programs can run on several different types of computer as long as computer has a Java Runtime Environment(JRE) installed.





Parallel Programming in GO & Java

Parallelism and Concurrency

Rob Pike who is developer of Go language has defined parallelism and concurrency in his speech "Concurrency is not Parallelism":

Parallelism defined as

     Parallelism is about doing lots of things at once.

Concurrency defined as

     Concurrency is about dealing with lots of things at once.

Concurrency allows to execute multiple tasks in partial order or out of order. The executions must finished before the next execution can start. While, parallelism has simultaneous execution on a multicore per CPU.

If we combine two concepts:

  Concurrency                 Concurrency + parallelism
  (Single-Core CPU)           (Multi-Core CPU)
   ___                         ___ ___
  |th1|                       |th1|th2|
  |   |                       |   |___|
  |___|___                    |   |___
      |th2|                   |___|th2|
   ___|___|                    ___|___|
  |th1|                       |th1|
  |___|___                    |   |___
      |th2|                   |   |th2|

diagram cited by Pithikos in Stack Overflow

Implementation

Go language focuses on concurrency and parallelism by implementing its own Goroutine instead of regular threading. Goroutines uses the built-in scheduler to conduct the threads in behind and hide many complexity of thread management from the user and execute them with “Go” command.

Java language can implement thread in three different methods. The first method is to create a class which extends the "Thread" class that gives the user to have control of the lifecycle of the thread. But Java language only allows single inheritance which prevents the thread from extending other base class. The second method is using "Runnable" interface. This method allows to user to extend other base class while still being run as a new thread. However, this method does not give user to control lifecycle of thread. The third, Java version 5.0 and later provides "concurrency" to simplify thread creation and processing. The "Executor" that implement by java.util.concurrent provides APIs for creating and processing new threads.

Implementation

General Implementation Info

Before diving into the code, here are some things to know.

1. To Increase the accuracy, both Go & Java program will run in terminal using .sh script
- To do this, you will need to enable "Root".
Run below command to start root-user terminal

 sudo -s

2. To Increase the accuracy, both Go & Java program will run in the same environment/machine
- Environment/machine used for below experiment is:

Machine Mac Book Pro 2019
Processor 2.3GHz 8-core 16 threads Intel Core i9
macOS Big Sur 11.5.1
Memory 16GB
Go Version v1.16.6
Java Version v15 2020-09-15

3. To run both program, 4096 * 4096 pre-generated Matrix with random values between 0 and 1000 will be used

- Create & Download Matrix here https://onlinenumbertools.com/random-number-matrix

Implementation for Go

Installation of Go

Go to https://golang.org/dl/ & Download Go accordingly to your OS.

Multi-Thread Program in Go

   package main
   import (
       "bufio"
       "fmt"
       "log"
       "os"
       "runtime"
       "runtime/debug"
       "strconv"
       "strings"
       "sync"
       "time"
   )
   func multiply(A [][]int, B [][]int) [][]int { sizeA := len(A)
       sizeB := len(B)
       n := make([][]int, sizeA)
       for i := range n {
           n[i] = make([]int, sizeB)
       }
       for i := 0; i < sizeA; i++ {
           for k := 0; k < sizeB; k++ {
               temp := A[i][k]
               for j := 0; j < sizeB; j++ {
                   n[i][j] += temp * B[k][j]
               }
           } 		
       }
       return n 
   }
   func splitMatrix(nrOfThreads int, matrix [][]int) (matrixes [][][]int) {
       splitter := len(matrix) / nrOfThreads
       for i := 0; i < nrOfThreads; i++ {
           matrixes = append(matrixes, matrix[splitter*i:(splitter*(i+1))])
       }
       return
   }
   func multiplyStuff(finalMatrix *[][][]int, matrix1 [][]int, matrix2 [][]int, i int) {
       (*finalMatrix)[i] = multiply(matrix1, matrix2) 
   }
   func readFile(filePath string) (matrix1 [][]int, matrix2 [][]int) {
       file, err := os.Open(filePath)
       if err != nil {
           log.Fatal(err)
       }
       defer file.Close()
       var temp []int
       matrixNr := 1
       scanner := bufio.NewScanner(file)
       for scanner.Scan() {
           words := strings.Fields(scanner.Text()) 
           if len(words) != 0 {
               for _, element := range words {
                   i, err := strconv.Atoi(element)
                   if err != nil {
                       log.Fatal(err)
                   }
                   temp = append(temp, i)
               }
               if matrixNr == 1 {
                   matrix1 = append(matrix1, temp)
                   matrix2 = append(matrix2, temp)
               } 
               temp = nil
           } else {
               matrixNr = 2 
    	    }
       }	
       if err := scanner.Err(); err != nil {
           log.Fatal(err)
       }
       return
   }
   func main() {
       file := os.Args[1]
       nrOfThreads, err := strconv.Atoi(os.Args[2]) 
       if err != nil {  
           log.Fatal("USAGE: " + os.Args[0] + " <file> <nrOfThreads>")
       }
       debug.SetGCPercent(-1)
       if nrOfThreads <= 0 {
           runtime.GOMAXPROCS(1)
       } else if nrOfThreads >= 16 {
           runtime.GOMAXPROCS(8)
       } else {
           runtime.GOMAXPROCS(nrOfThreads)
       }
       var wg sync.WaitGroup
       finishedMatrix := make([][][]int, nrOfThreads)
       matrix1, matrix2 := readFile(file)
       if len(matrix1) != len(matrix2) || (nrOfThreads != 0 && len(matrix1)%nrOfThreads != 0) {
           log.Fatal("USAGE: " + os.Args[0] + " <file> <nrOfThreads>")
       }
       var start int64
       if nrOfThreads == 0 {
           start = time.Now().UnixNano()
           multiply(matrix1, matrix2)
       } else {
           matrixes := splitMatrix(nrOfThreads, matrix1)
           start = time.Now().UnixNano()
           for i := 0; i < nrOfThreads; i++ {
               wg.Add(1)
               go func(index int) {
                   defer wg.Done()
                   multiplyStuff(&finishedMatrix, matrixes[index], matrix2, index)
               }(i)
           }
       wg.Wait() 
       }
       end := time.Now().UnixNano()
       fmt.Printf("Execution took %d ns\n", (end - start)) 
       runtime.GC()
   }

.sh Code to Run Go Program

Above Go code will run in terminal using .sh script.

   #! /bin/sh
   COUNTER=0
   MAX=7
   go build MatrixMultiGo.go > result.txt
   echo "Running tests for no multithreading..." 
   echo "### No Multithreading" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 0 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 1 thread..."
   echo "### Threading with 1 thread" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 1 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 2 threads..."
   echo "### Multithreading with 2 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
   ((COUNTER++))
   echo "Running test #$COUNTER"
   nice -n -20 ./MatrixMultiGo 4096.i 0>> result.txt
   done
   COUNTER=0
   echo "Running tests for 4 threads..."
   echo "### Multithreading with 4 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 4 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 8 threads..."
   echo "### Multithreading with 8 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 8 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 16 threads..."
   echo "### Multithreading with 16 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 ./MatrixMultiGo 4096.in 16 >> result.txt
   done
   echo "Done";

Implementation for Java

Installation of Java

Go to https://www.java.com/ko/download/manual.jsp & Download Java accordingly to your OS.

Multi-Thread Program in Java

 import java.io.BufferedReader;
 import java.io.FileReader; 
 import java.io.IOException;
 import java.util.LinkedList;
 import java.util.List;
 import java.util.ArrayList;
 public class Java_Thread {
   public List<ArrayList<ArrayList<Integer>>> read(String filename) {
       ArrayList<ArrayList<Integer>> A = new ArrayList<ArrayList<Integer>>();
       ArrayList<ArrayList<Integer>> B = new ArrayList<ArrayList<Integer>>();
       String thisLine;
       try {
           BufferedReader br = new BufferedReader(new FileReader(filename));
           while ((thisLine = br.readLine()) != null) {
               if (thisLine.trim().equals("")) {
                   break;
               } else {
                   ArrayList<Integer> line = new ArrayList<Integer>();
                   String[] lineArray = thisLine.split("\t"); for (String number : lineArray) {
                       line.add(Integer.parseInt(number)); 
                   }
                   A.add(line);
                   B.add(line);
               }
           }
           br.close();
       } catch (IOException e) {
           System.err.println("Error: " + e);
       }
       List<ArrayList<ArrayList<Integer>>> res = new LinkedList<ArrayList<ArrayList<Integer>>>();
       res.add(A);
       res.add(B);
       return res;
   }
   public int[][] matrixMultiplication(ArrayList<ArrayList<Integer>> A, ArrayList<ArrayList<Integer>> B, int m, int n) { 
       int[][] C = new int[m][n];
       for (int i = 0; i < m; i++) {
           for (int k = 0; k < n; k++) {
               int temp = A.get(i).get(k);
               for (int j = 0; j < n; j++) {
                   C[i][j] += temp * B.get(k).get(j);
               } 
           }       
       }
       return C; 
   }
   public ArrayList<ArrayList<ArrayList<Integer>>> splitMatrix(ArrayList<ArrayList<Integer>> A, int nrOfThreads) {
       int n = A.size();
       int m = n / nrOfThreads; 
       ArrayList<ArrayList<ArrayList<Integer>>> B = new
       ArrayList<ArrayList<ArrayList<Integer>>>(); 
       for (int i = 0; i < nrOfThreads; i++){
           B.add(new ArrayList<ArrayList<Integer>>(A.subList(i*m, (i+1)*m)));
       }
       return B; 
   }
   public static void main(String[] args) {
       String filename = "";
       int nrOfThreads = 0;
       if (args.length < 2) {
           System.out.println("Missing filename and/or number of threads\nUSAGE: java Java_Thread <file> <nrOfThreads>"); 
           System.exit(1);
       } else {
           filename = args[0];
           nrOfThreads = Integer.parseInt(args[1]); 
           Java_Thread java_Thread = new Java_Thread(); 
           java_Thread.start(filename, nrOfThreads);
       } 
   }
   public void start(String filename, int nrOfThreads) { 
       List<ArrayList<ArrayList<Integer>>> matrices = read(filename);
       ArrayList<ArrayList<Integer>> A = matrices.get(0); 
       ArrayList<ArrayList<Integer>> B = matrices.get(1);
       if (nrOfThreads <= 0) {
           int n = A.size();
           long startTime = System.nanoTime();
           int[][] C = matrixMultiplication(A, B, n, n); 
           long endTime = System.nanoTime();
           System.out.println("Execution took " + (endTime - startTime) + " ns");
       } else {
           if (A.size() % nrOfThreads != 0) {
               System.out.println("Size of matrix is not divisible by the supplied number of threads");
               System.exit(1);
           }
           ArrayList<Worker> threads = new ArrayList<Worker>(); 
           ArrayList<int[][]> result = new ArrayList<int[][]>(); 
           int[][] empty = new int[][]{{}};
           for (int i = 0; i < nrOfThreads; i++) {
               result.add(empty);
           }
           ArrayList<ArrayList<ArrayList<Integer>>> workerMatrices = splitMatrix(A, nrOfThreads);
           long startTime = System.nanoTime();
           for (int i = 0; i < nrOfThreads; i++) {
               i, result));
           }
           threads.add(new Worker(workerMatrices.get(i), B, threads.get(i).start();
           for (int i = 0; i < nrOfThreads; i++) {
               try {
                   threads.get(i).join();
               } catch (Exception e) {
                   System.err.println(e);
               }
           }
           long endTime = System.nanoTime();
           System.out.println("Execution took " + (endTime - startTime) + " ns");
       }  
   }
   class Worker extends Thread {
       private ArrayList<ArrayList<Integer>> A; 
       private ArrayList<ArrayList<Integer>> B; private int index;
       private ArrayList<int[][]> result; private int m;
       private int n;
       public Worker(ArrayList<ArrayList<Integer>> A, ArrayList<ArrayList<Integer>> B, int index, ArrayList<int[][]> result) {
           this.A = A;
           this.B = B;
           this.index = index;
           this.result = result;
           this.m = A.size();
           this.n = B.size();
       }
       @Override
       public void run() {
           this.result.set(this.index, matrixMultiplication(this.A, this.B, this.m, this.n));
       }
   } 
 }

.sh Code to Run Go Program

Above Java code will run in terminal using .sh script. .sh Code to Run Java Program

   #! /bin/sh
   COUNTER=0
   MAX=7
   javac Java_Thread.java > result.txt
   echo "Running tests for no multithreading..." 
   echo "### No Multithreading" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 0 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 1 thread..."
   echo "### Threading with 1 thread" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 1 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 2 threads..."
   echo "### Multithreading with 2 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 2 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 4 threads..."
   echo "### Multithreading with 4 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 4 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 8 threads..."
   echo "### Multithreading with 8 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 8 >> result.txt
   done
   COUNTER=0
   echo "Running tests for 16 threads..."
   echo "### Multithreading with 16 threads" >> result.txt 
   while [ $COUNTER -lt $MAX ]; 
   do
       ((COUNTER++))
       echo "Running test #$COUNTER"
       nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 16 >> result.txt
   done
   echo "Done";

Result

Charts

Performance difference in Go & Java
Go Java
Main thread 84.66 s 171.76 s
1 thread 84.28 s 169.51 s
2 thread 37.08 s 73.04 s
4 thread 19.2 s 34.91 s
8 thread 11.41 s 17.63 s
16 thread 14.38 s 16.08 s


 


Performance difference for Java Thread
difference
Main Thread to 1 thread 2.25%
1 thread to 2 threads 56.91%
2 threads to 4 threads 51.90%
4 threads to 8 threads 49.49%
8 threads to 16 threads 8.79%
Performance difference for Go Thread
difference
Main Thread to 1 thread 0.45%
1 thread to 2 threads 127.30%
2 threads to 4 threads 93.12%
4 threads to 8 threads 68.27%
8 threads to 16 threads -20.65%
Performance increase of Go in comparsion to Java Thread
difference
Main thread 102%
1 thread 101%
2 thread 96.90%
4 thread 81.80%
8 thread 55%
16 thread 12%

Analysis

The results of matrix multiplication implementation on Go language has remarkable superiority in main thread. However, the both language shows the tendency that execution time gradually decreases as the number of threads increase. The curve of execution time on the graph become flatten for both Go and Java when the number of threads become 8 and 16. We infer the Go has a reduction tendency when the number of threads reach 8 and 16 but Java shows the promotion tendency as number of threads reach 8 and 16. The best performance has been shown that the 8 threads for Go and 16 threads on Java. The percentage difference of implementation time between GO and Java shows minimum 12% when 16 threads being used and maximum 102% when main thread being used.

There is a few possible reason that Go and Java shows difference performance despite using same calculating matrix multiplication algorithm. First, the code for Go and Java is not exactly matched. second, Java uses JVM(Java Virtual Machine) to calculate the algorithm which may take more time to implement.

Reference