Difference between revisions of "GPU621/Go Kimchi"
(→Parallel Programming in Go & Java) |
(→Reference) |
||
(5 intermediate revisions by the same user not shown) | |||
Line 575: | Line 575: | ||
There is a few possible reason that Go and Java shows difference performance despite using same calculating matrix multiplication algorithm. First, the code for Go and Java is not exactly matched. second, Java uses JVM(Java Virtual Machine) to calculate the algorithm which may take more time to implement. | There is a few possible reason that Go and Java shows difference performance despite using same calculating matrix multiplication algorithm. First, the code for Go and Java is not exactly matched. second, Java uses JVM(Java Virtual Machine) to calculate the algorithm which may take more time to implement. | ||
+ | |||
+ | =Reference= | ||
+ | https://golang.org/doc/faq#goroutines <br> | ||
+ | https://docs.oracle.com/javase/tutorial/essential/concurrency/highlevel.html<br> | ||
+ | http://www.oracle.com/technetwork/java/javase/overview/javahistory-index-198355.html<br> | ||
+ | https://golang.org/doc/faq#Origins<br> | ||
+ | https://www.youtube.com/watch?v=oV9rvDllKEg<br> | ||
+ | https://stackoverflow.com/questions/1050222/what-is-the-difference-between-concurrency-and-parallelism<br> | ||
+ | https://onlinenumbertools.com/random-number-matrix<br> | ||
+ | https://golang.org/dl/<br> | ||
+ | https://www.java.com/ko/download/manual.jsp<br> | ||
+ | https://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html |
Latest revision as of 11:13, 11 August 2021
GPU621/DPS921 | Participants | Groups and Projects | Resources | Glossary
Contents
Group Members
Project Description
This project will look at the performance difference between Go and Java for Parallel/Multithread program using the matrix multiplication algorithm. The project will present parallel program codes written in each language and .sh script to run the codes. Graphical result of running the codes will be given to clearly show the performance difference between Go & Java. It will also briefly cover key terms & concepts of parallel programming to help readers' understanding and the historical/fundamental difference of each language to determine what may have caused the performance difference.
Programming Language Introduction
What's GO Language?
Go Language is a open source programming language that is designed at Google by Robert Griesemer, Ken Thompson, and Rob Pike.Go lang is a Fast and Pretty language.
Go language is a compiler language which makes its execution fast. Go is faster than Python & JavaScript. In some cases, Go may be faster than Java or as fast as Java.
Go also has a relatively easy syntax which makes it easy to learn and fast to develop.
Is Go Popular?
GO is used by Google, Youtube, Google Download Server, Uber replaced Node.JS to Go, Twitch's trance-coding process Go & C++.
What's Java Language?
Java is open source and object-oriented programming language developed by Sun Microsystems in 1995. The founder is James Gosling. In 2010, Oracle acquired Sun Microsystems and owned Java's copyright.
Java is simple programming language, easy to learn, clean and easy to understand. This is because syntax of Java is based on C++ that has removed many confusing and rarely used functionality by user (e.g. operator overloading, explicit pointer, manual memory management etc.)
Java is the platform independent language. Programs can run on several different types of computer as long as computer has a Java Runtime Environment(JRE) installed.
Parallel Programming in GO & Java
Parallelism and Concurrency
Rob Pike who is developer of Go language has defined parallelism and concurrency in his speech "Concurrency is not Parallelism":
Parallelism defined as
Parallelism is about doing lots of things at once.
Concurrency defined as
Concurrency is about dealing with lots of things at once.
Concurrency allows to execute multiple tasks in partial order or out of order. The executions must finished before the next execution can start. While, parallelism has simultaneous execution on a multicore per CPU.
If we combine two concepts:
Concurrency Concurrency + parallelism (Single-Core CPU) (Multi-Core CPU) ___ ___ ___ |th1| |th1|th2| | | | |___| |___|___ | |___ |th2| |___|th2| ___|___| ___|___| |th1| |th1| |___|___ | |___ |th2| | |th2|
diagram cited by Pithikos in Stack Overflow
Implementation
Go language focuses on concurrency and parallelism by implementing its own Goroutine instead of regular threading. Goroutines uses the built-in scheduler to conduct the threads in behind and hide many complexity of thread management from the user and execute them with “Go” command.
Java language can implement thread in three different methods. The first method is to create a class which extends the "Thread" class that gives the user to have control of the lifecycle of the thread. But Java language only allows single inheritance which prevents the thread from extending other base class. The second method is using "Runnable" interface. This method allows to user to extend other base class while still being run as a new thread. However, this method does not give user to control lifecycle of thread. The third, Java version 5.0 and later provides "concurrency" to simplify thread creation and processing. The "Executor" that implement by java.util.concurrent provides APIs for creating and processing new threads.
Implementation
General Implementation Info
Before diving into the code, here are some things to know.
1. To Increase the accuracy, both Go & Java program will run in terminal using .sh script
- To do this, you will need to enable "Root".
Run below command to start root-user terminal
sudo -s
2. To Increase the accuracy, both Go & Java program will run in the same environment/machine
- Environment/machine used for below experiment is:
Machine | Mac Book Pro 2019 |
Processor | 2.3GHz 8-core 16 threads Intel Core i9 |
macOS | Big Sur 11.5.1 |
Memory | 16GB |
Go Version | v1.16.6 |
Java Version | v15 2020-09-15 |
3. To run both program, 4096 * 4096 pre-generated Matrix with random values between 0 and 1000 will be used
- Create & Download Matrix here https://onlinenumbertools.com/random-number-matrix
Implementation for Go
Installation of Go
Go to https://golang.org/dl/ & Download Go accordingly to your OS.
Multi-Thread Program in Go
package main import ( "bufio" "fmt" "log" "os" "runtime" "runtime/debug" "strconv" "strings" "sync" "time" ) func multiply(A [][]int, B [][]int) [][]int { sizeA := len(A) sizeB := len(B) n := make([][]int, sizeA) for i := range n { n[i] = make([]int, sizeB) } for i := 0; i < sizeA; i++ { for k := 0; k < sizeB; k++ { temp := A[i][k] for j := 0; j < sizeB; j++ { n[i][j] += temp * B[k][j] } } } return n } func splitMatrix(nrOfThreads int, matrix [][]int) (matrixes [][][]int) { splitter := len(matrix) / nrOfThreads for i := 0; i < nrOfThreads; i++ { matrixes = append(matrixes, matrix[splitter*i:(splitter*(i+1))]) } return } func multiplyStuff(finalMatrix *[][][]int, matrix1 [][]int, matrix2 [][]int, i int) { (*finalMatrix)[i] = multiply(matrix1, matrix2) } func readFile(filePath string) (matrix1 [][]int, matrix2 [][]int) { file, err := os.Open(filePath) if err != nil { log.Fatal(err) } defer file.Close() var temp []int matrixNr := 1 scanner := bufio.NewScanner(file) for scanner.Scan() { words := strings.Fields(scanner.Text()) if len(words) != 0 { for _, element := range words { i, err := strconv.Atoi(element) if err != nil { log.Fatal(err) } temp = append(temp, i) } if matrixNr == 1 { matrix1 = append(matrix1, temp) matrix2 = append(matrix2, temp) } temp = nil } else { matrixNr = 2 } } if err := scanner.Err(); err != nil { log.Fatal(err) } return } func main() { file := os.Args[1] nrOfThreads, err := strconv.Atoi(os.Args[2]) if err != nil { log.Fatal("USAGE: " + os.Args[0] + " <file> <nrOfThreads>") } debug.SetGCPercent(-1) if nrOfThreads <= 0 { runtime.GOMAXPROCS(1) } else if nrOfThreads >= 16 { runtime.GOMAXPROCS(8) } else { runtime.GOMAXPROCS(nrOfThreads) } var wg sync.WaitGroup finishedMatrix := make([][][]int, nrOfThreads) matrix1, matrix2 := readFile(file) if len(matrix1) != len(matrix2) || (nrOfThreads != 0 && len(matrix1)%nrOfThreads != 0) { log.Fatal("USAGE: " + os.Args[0] + " <file> <nrOfThreads>") } var start int64 if nrOfThreads == 0 { start = time.Now().UnixNano() multiply(matrix1, matrix2) } else { matrixes := splitMatrix(nrOfThreads, matrix1) start = time.Now().UnixNano() for i := 0; i < nrOfThreads; i++ { wg.Add(1) go func(index int) { defer wg.Done() multiplyStuff(&finishedMatrix, matrixes[index], matrix2, index) }(i) } wg.Wait() } end := time.Now().UnixNano() fmt.Printf("Execution took %d ns\n", (end - start)) runtime.GC() }
.sh Code to Run Go Program
Above Go code will run in terminal using .sh script.
#! /bin/sh COUNTER=0 MAX=7 go build MatrixMultiGo.go > result.txt echo "Running tests for no multithreading..." echo "### No Multithreading" >> result.txt while [ $COUNTER -lt $MAX ] do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 ./MatrixMultiGo 4096.in 0 >> result.txt done COUNTER=0 echo "Running tests for 1 thread..." echo "### Threading with 1 thread" >> result.txt while [ $COUNTER -lt $MAX ] do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 ./MatrixMultiGo 4096.in 1 >> result.txt done COUNTER=0 echo "Running tests for 2 threads..." echo "### Multithreading with 2 threads" >> result.txt while [ $COUNTER -lt $MAX ] do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 ./MatrixMultiGo 4096.i 0>> result.txt done COUNTER=0 echo "Running tests for 4 threads..." echo "### Multithreading with 4 threads" >> result.txt while [ $COUNTER -lt $MAX ] do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 ./MatrixMultiGo 4096.in 4 >> result.txt done COUNTER=0 echo "Running tests for 8 threads..." echo "### Multithreading with 8 threads" >> result.txt while [ $COUNTER -lt $MAX ] do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 ./MatrixMultiGo 4096.in 8 >> result.txt done COUNTER=0 echo "Running tests for 16 threads..." echo "### Multithreading with 16 threads" >> result.txt while [ $COUNTER -lt $MAX ] do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 ./MatrixMultiGo 4096.in 16 >> result.txt done echo "Done";
Implementation for Java
Installation of Java
Go to https://www.java.com/ko/download/manual.jsp & Download Java accordingly to your OS.
Multi-Thread Program in Java
import java.io.BufferedReader; import java.io.FileReader; import java.io.IOException; import java.util.LinkedList; import java.util.List; import java.util.ArrayList; public class Java_Thread { public List<ArrayList<ArrayList<Integer>>> read(String filename) { ArrayList<ArrayList<Integer>> A = new ArrayList<ArrayList<Integer>>(); ArrayList<ArrayList<Integer>> B = new ArrayList<ArrayList<Integer>>(); String thisLine; try { BufferedReader br = new BufferedReader(new FileReader(filename)); while ((thisLine = br.readLine()) != null) { if (thisLine.trim().equals("")) { break; } else { ArrayList<Integer> line = new ArrayList<Integer>(); String[] lineArray = thisLine.split("\t"); for (String number : lineArray) { line.add(Integer.parseInt(number)); } A.add(line); B.add(line); } } br.close(); } catch (IOException e) { System.err.println("Error: " + e); } List<ArrayList<ArrayList<Integer>>> res = new LinkedList<ArrayList<ArrayList<Integer>>>(); res.add(A); res.add(B); return res; } public int[][] matrixMultiplication(ArrayList<ArrayList<Integer>> A, ArrayList<ArrayList<Integer>> B, int m, int n) { int[][] C = new int[m][n]; for (int i = 0; i < m; i++) { for (int k = 0; k < n; k++) { int temp = A.get(i).get(k); for (int j = 0; j < n; j++) { C[i][j] += temp * B.get(k).get(j); } } } return C; } public ArrayList<ArrayList<ArrayList<Integer>>> splitMatrix(ArrayList<ArrayList<Integer>> A, int nrOfThreads) { int n = A.size(); int m = n / nrOfThreads; ArrayList<ArrayList<ArrayList<Integer>>> B = new ArrayList<ArrayList<ArrayList<Integer>>>(); for (int i = 0; i < nrOfThreads; i++){ B.add(new ArrayList<ArrayList<Integer>>(A.subList(i*m, (i+1)*m))); } return B; } public static void main(String[] args) { String filename = ""; int nrOfThreads = 0; if (args.length < 2) { System.out.println("Missing filename and/or number of threads\nUSAGE: java Java_Thread <file> <nrOfThreads>"); System.exit(1); } else { filename = args[0]; nrOfThreads = Integer.parseInt(args[1]); Java_Thread java_Thread = new Java_Thread(); java_Thread.start(filename, nrOfThreads); } } public void start(String filename, int nrOfThreads) { List<ArrayList<ArrayList<Integer>>> matrices = read(filename); ArrayList<ArrayList<Integer>> A = matrices.get(0); ArrayList<ArrayList<Integer>> B = matrices.get(1); if (nrOfThreads <= 0) { int n = A.size(); long startTime = System.nanoTime(); int[][] C = matrixMultiplication(A, B, n, n); long endTime = System.nanoTime(); System.out.println("Execution took " + (endTime - startTime) + " ns"); } else { if (A.size() % nrOfThreads != 0) { System.out.println("Size of matrix is not divisible by the supplied number of threads"); System.exit(1); } ArrayList<Worker> threads = new ArrayList<Worker>(); ArrayList<int[][]> result = new ArrayList<int[][]>(); int[][] empty = new int[][]{{}}; for (int i = 0; i < nrOfThreads; i++) { result.add(empty); } ArrayList<ArrayList<ArrayList<Integer>>> workerMatrices = splitMatrix(A, nrOfThreads); long startTime = System.nanoTime(); for (int i = 0; i < nrOfThreads; i++) { i, result)); } threads.add(new Worker(workerMatrices.get(i), B, threads.get(i).start(); for (int i = 0; i < nrOfThreads; i++) { try { threads.get(i).join(); } catch (Exception e) { System.err.println(e); } } long endTime = System.nanoTime(); System.out.println("Execution took " + (endTime - startTime) + " ns"); } } class Worker extends Thread { private ArrayList<ArrayList<Integer>> A; private ArrayList<ArrayList<Integer>> B; private int index; private ArrayList<int[][]> result; private int m; private int n; public Worker(ArrayList<ArrayList<Integer>> A, ArrayList<ArrayList<Integer>> B, int index, ArrayList<int[][]> result) { this.A = A; this.B = B; this.index = index; this.result = result; this.m = A.size(); this.n = B.size(); } @Override public void run() { this.result.set(this.index, matrixMultiplication(this.A, this.B, this.m, this.n)); } } }
.sh Code to Run Go Program
Above Java code will run in terminal using .sh script. .sh Code to Run Java Program
#! /bin/sh COUNTER=0 MAX=7 javac Java_Thread.java > result.txt echo "Running tests for no multithreading..." echo "### No Multithreading" >> result.txt while [ $COUNTER -lt $MAX ]; do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 0 >> result.txt done COUNTER=0 echo "Running tests for 1 thread..." echo "### Threading with 1 thread" >> result.txt while [ $COUNTER -lt $MAX ]; do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 1 >> result.txt done COUNTER=0 echo "Running tests for 2 threads..." echo "### Multithreading with 2 threads" >> result.txt while [ $COUNTER -lt $MAX ]; do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 2 >> result.txt done COUNTER=0 echo "Running tests for 4 threads..." echo "### Multithreading with 4 threads" >> result.txt while [ $COUNTER -lt $MAX ]; do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 4 >> result.txt done COUNTER=0 echo "Running tests for 8 threads..." echo "### Multithreading with 8 threads" >> result.txt while [ $COUNTER -lt $MAX ]; do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 8 >> result.txt done COUNTER=0 echo "Running tests for 16 threads..." echo "### Multithreading with 16 threads" >> result.txt while [ $COUNTER -lt $MAX ]; do ((COUNTER++)) echo "Running test #$COUNTER" nice -n -20 java -Xms8g -Xmx8g Java_Thread 4096.in 16 >> result.txt done echo "Done";
Result
Charts
|
|
|
|
Analysis
The results of matrix multiplication implementation on Go language has remarkable superiority in main thread. However, the both language shows the tendency that execution time gradually decreases as the number of threads increase. The curve of execution time on the graph become flatten for both Go and Java when the number of threads become 8 and 16. We infer the Go has a reduction tendency when the number of threads reach 8 and 16 but Java shows the promotion tendency as number of threads reach 8 and 16. The best performance has been shown that the 8 threads for Go and 16 threads on Java. The percentage difference of implementation time between GO and Java shows minimum 12% when 16 threads being used and maximum 102% when main thread being used.
There is a few possible reason that Go and Java shows difference performance despite using same calculating matrix multiplication algorithm. First, the code for Go and Java is not exactly matched. second, Java uses JVM(Java Virtual Machine) to calculate the algorithm which may take more time to implement.
Reference
https://golang.org/doc/faq#goroutines
https://docs.oracle.com/javase/tutorial/essential/concurrency/highlevel.html
http://www.oracle.com/technetwork/java/javase/overview/javahistory-index-198355.html
https://golang.org/doc/faq#Origins
https://www.youtube.com/watch?v=oV9rvDllKEg
https://stackoverflow.com/questions/1050222/what-is-the-difference-between-concurrency-and-parallelism
https://onlinenumbertools.com/random-number-matrix
https://golang.org/dl/
https://www.java.com/ko/download/manual.jsp
https://docs.oracle.com/javase/tutorial/essential/concurrency/executors.html