Studyapplocator
Assignment 1
Sudoku
What is Sudoku?
It is a puzzle game where players insert number on a grid consisting of squares with equal number of smaller squares inside. Most games consist of 9x9 grid with higher difficulty and run-time at bigger sizes and more missing numbers. The rules of the game are to insert numbers in such a way that every number appears once in each horizontal and vertical line and the inner square.
Profiling and Analysis
The following program was run on Matrix command line without any modification to the original code.
Easy Problem
command: ./Sudoku sample-puzzle-1 Input: 0 6 0 0 0 0 9 7 2 0 5 0 0 0 2 0 0 3 0 7 0 3 9 0 5 0 0 2 0 0 0 0 5 4 0 8 0 0 0 0 0 0 0 0 0 3 0 1 8 0 0 0 0 6 0 0 4 0 2 3 0 8 0 7 0 0 9 0 0 0 2 0 9 2 5 0 0 0 0 4 0 Solution: 1 6 3 4 5 8 9 7 2 4 5 9 7 1 2 8 6 3 8 7 2 3 9 6 5 1 4 2 9 7 1 6 5 4 3 8 5 8 6 2 3 4 1 9 7 3 4 1 8 7 9 2 5 6 6 1 4 5 2 3 7 8 9 7 3 8 9 4 1 6 2 5 9 2 5 6 8 7 3 4 1
Flat Profile - Easy
From this flat profile we can see that most of the functions being called were for checks of rows and columns with the third most called function being placeNum(int,int). As the problem complexity was low, the program completed quickly with far less calls compared to the hard problem.
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total time seconds seconds calls Ts/call Ts/call name 0.00 0.00 0.00 4539 0.00 0.00 checkRow(int, int) 0.00 0.00 0.00 1620 0.00 0.00 checkColumn(int, int) 0.00 0.00 0.00 1120 0.00 0.00 placeNum(int, int) 0.00 0.00 0.00 698 0.00 0.00 checkSquare(int, int, int) 0.00 0.00 0.00 476 0.00 0.00 goBack(int&, int&) 0.00 0.00 0.00 2 0.00 0.00 print(int (*) [9]) 0.00 0.00 0.00 1 0.00 0.00 _GLOBAL__sub_I_sudoku 0.00 0.00 0.00 1 0.00 0.00 _GLOBAL__sub_I_temp 0.00 0.00 0.00 1 0.00 0.00 solveSudoku() 0.00 0.00 0.00 1 0.00 0.00 storePositions() 0.00 0.00 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 0.00 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
Granularity - Easy
The call graph gives us a further breakdown of the program and the number of calls that were made to the functions in order to solve the easy puzzle.
Call graph granularity: each sample hit covers 4 byte(s) no time propagated index % time self children called name 0.00 0.00 4539/4539 placeNum(int, int) [7] [5] 0.0 0.00 0.00 4539 checkRow(int, int) [5] ----------------------------------------------- 0.00 0.00 1620/1620 placeNum(int, int) [7] [6] 0.0 0.00 0.00 1620 checkColumn(int, int) [6] ----------------------------------------------- 0.00 0.00 1120/1120 solveSudoku() [13] [7] 0.0 0.00 0.00 1120 placeNum(int, int) [7] 0.00 0.00 4539/4539 checkRow(int, int) [5] 0.00 0.00 1620/1620 checkColumn(int, int) [6] 0.00 0.00 698/698 checkSquare(int, int, int) [8] ----------------------------------------------- 0.00 0.00 698/698 placeNum(int, int) [7] [8] 0.0 0.00 0.00 698 checkSquare(int, int, int) [8] ----------------------------------------------- 0.00 0.00 476/476 solveSudoku() [13] [9] 0.0 0.00 0.00 476 goBack(int&, int&) [9] ----------------------------------------------- 0.00 0.00 2/2 main [4] [10] 0.0 0.00 0.00 2 print(int (*) [9]) [10] ----------------------------------------------- 0.00 0.00 1/1 __do_global_ctors_aux [22] [11] 0.0 0.00 0.00 1 _GLOBAL__sub_I_sudoku [11] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [15] ----------------------------------------------- 0.00 0.00 1/1 __do_global_ctors_aux [22] [12] 0.0 0.00 0.00 1 _GLOBAL__sub_I_temp [12] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [16] ----------------------------------------------- 0.00 0.00 1/1 main [4] [13] 0.0 0.00 0.00 1 solveSudoku() [13] 0.00 0.00 1120/1120 placeNum(int, int) [7] 0.00 0.00 476/476 goBack(int&, int&) [9] ----------------------------------------------- 0.00 0.00 1/1 main [4] [14] 0.0 0.00 0.00 1 storePositions() [14] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_sudoku [11] [15] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [15] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_temp [12] [16] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [16] ----------------------------------------------- Index by function name [11] _GLOBAL__sub_I_sudoku [13] solveSudoku() [10] print(int (*) [9]) [12] _GLOBAL__sub_I_temp [14] storePositions() [9] goBack(int&, int&) [6] checkColumn(int, int) [15] __static_initialization_and_destruction_0(int, int) [5] checkRow(int, int) [8] checkSquare(int, int, int) [16] __static_initialization_and_destruction_0(int, int) [7] placeNum(int, int)
Hard Problem
command: ./Sudoku sample-puzzle-2-hard Input: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 8 5 0 0 1 0 2 0 0 0 0 0 0 0 5 0 7 0 0 0 0 0 4 0 0 0 1 0 0 0 9 0 0 0 0 0 0 0 5 0 0 0 0 0 0 7 3 0 0 2 0 1 0 0 0 0 0 0 0 0 4 0 0 0 9 Output: 9 8 7 6 5 4 3 2 1 2 4 6 1 7 3 9 8 5 3 5 1 9 2 8 7 4 6 1 2 8 5 3 7 6 9 4 6 3 4 8 9 2 1 5 7 7 9 5 4 6 1 8 3 2 5 1 9 2 8 6 4 7 3 4 7 2 3 1 9 5 6 8 8 6 3 7 4 5 2 1 9
Flat Profile - Hard
From this flat profile we can see how the number of calls exploded as the complexity of the puzzle was increased with more missing numbers without changing the size of the grid. If the grid size was increased with an equal amount of missing numbers then the program would take far longer to complete with exponentially higher number of calls. As the problem became more complex, the previously low called functions in the easy version, checkSquare and goBack had a sudden spike in usage.
Flat profile: Each sample counts as 0.01 seconds. % cumulative self self total time seconds seconds calls s/call s/call name 46.41 21.66 21.66 622577597 0.00 0.00 checkRow(int, int) 19.43 30.73 9.07 223365661 0.00 0.00 checkColumn(int, int) 15.31 37.88 7.14 157353814 0.00 0.00 placeNum(int, int) 13.11 43.99 6.12 100608583 0.00 0.00 checkSquare(int, int, int) 3.10 45.44 1.45 69175252 0.00 0.00 goBack(int&, int&) 2.64 46.67 1.23 1 1.23 46.67 solveSudoku() 0.00 46.67 0.00 2 0.00 0.00 print(int (*) [9]) 0.00 46.67 0.00 1 0.00 0.00 _GLOBAL__sub_I_sudoku 0.00 46.67 0.00 1 0.00 0.00 _GLOBAL__sub_I_temp 0.00 46.67 0.00 1 0.00 0.00 storePositions() 0.00 46.67 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int) 0.00 46.67 0.00 1 0.00 0.00 __static_initialization_and_destruction_0(int, int)
Granularity - Hard
The call graph gives us a more detailed look on where exactly most of the time was spent in solving the hard puzzle problem and which areas can be possible candidates for optimization.
Call graph granularity: each sample hit covers 4 byte(s) for 0.02% of 46.67 seconds index % time self children called name <spontaneous> [1] 100.0 0.00 46.67 main [1] 1.23 45.44 1/1 solveSudoku() [2] 0.00 0.00 2/2 print(int (*) [9]) [11] 0.00 0.00 1/1 storePositions() [14] ----------------------------------------------- 1.23 45.44 1/1 main [1] [2] 100.0 1.23 45.44 1 solveSudoku() [2] 7.14 36.85 157353814/157353814 placeNum(int, int) [3] 1.45 0.00 69175252/69175252 goBack(int&, int&) [7] ----------------------------------------------- 7.14 36.85 157353814/157353814 solveSudoku() [2] [3] 94.3 7.14 36.85 157353814 placeNum(int, int) [3] 21.66 0.00 622577597/622577597 checkRow(int, int) [4] 9.07 0.00 223365661/223365661 checkColumn(int, int) [5] 6.12 0.00 100608583/100608583 checkSquare(int, int, int) [6] ----------------------------------------------- 21.66 0.00 622577597/622577597 placeNum(int, int) [3] [4] 46.4 21.66 0.00 622577597 checkRow(int, int) [4] ----------------------------------------------- 9.07 0.00 223365661/223365661 placeNum(int, int) [3] [5] 19.4 9.07 0.00 223365661 checkColumn(int, int) [5] ----------------------------------------------- 6.12 0.00 100608583/100608583 placeNum(int, int) [3] [6] 13.1 6.12 0.00 100608583 checkSquare(int, int, int) [6] ----------------------------------------------- 1.45 0.00 69175252/69175252 solveSudoku() [2] [7] 3.1 1.45 0.00 69175252 goBack(int&, int&) [7] ----------------------------------------------- 0.00 0.00 2/2 main [1] [11] 0.0 0.00 0.00 2 print(int (*) [9]) [11] ----------------------------------------------- 0.00 0.00 1/1 __do_global_ctors_aux [22] [12] 0.0 0.00 0.00 1 _GLOBAL__sub_I_sudoku [12] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [15] ----------------------------------------------- 0.00 0.00 1/1 __do_global_ctors_aux [22] [13] 0.0 0.00 0.00 1 _GLOBAL__sub_I_temp [13] 0.00 0.00 1/1 __static_initialization_and_destruction_0(int, int) [16] ----------------------------------------------- 0.00 0.00 1/1 main [1] [14] 0.0 0.00 0.00 1 storePositions() [14] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_sudoku [12] [15] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [15] ----------------------------------------------- 0.00 0.00 1/1 _GLOBAL__sub_I_temp [13] [16] 0.0 0.00 0.00 1 __static_initialization_and_destruction_0(int, int) [16] ----------------------------------------------- Index by function name [12] _GLOBAL__sub_I_sudoku [2] solveSudoku() [11] print(int (*) [9]) [13] _GLOBAL__sub_I_temp [14] storePositions() [7] goBack(int&, int&) [5] checkColumn(int, int) [15] __static_initialization_and_destruction_0(int, int) [4] checkRow(int, int) [6] checkSquare(int, int, int) [16] __static_initialization_and_destruction_0(int, int) [3] placeNum(int, int)
Analysis
The program was quite fast and had a really short run time for solving the given problem on a standard 9x9 block. This project was found on GitHub. It is GNU C++ compiler compatible and both the flat profile and call graph were generated on matrix successfully.
For the easy problem, the run time was quick as the inputted grid had more filled cells compared to the hard problem where most of the cells were set to 0. This increased the complexity scope and which is why the program took 46.67 seconds to complete.
From the hard puzzle call graph we can see that the function which could be prime candidates for optimization are the checkRow and checkColumn functions in which the program spends most of its time. Because of the type of mathematical problem set, this Sudoku solver can be an excellent application for a parallelization project.
Ray Tracing
Ray tracing is a rendering technique for generating an image by tracing the path of light as pixels in an image plane and simulating the effects of its encounters with virtual objects. The technique is capable of producing a very high degree of visual realism, usually higher than that of typical scan line rendering methods,but at a grater computational cost.(Wikipedia [1]).
Source Code
Source code taken from this location.[2]
Compile using the following command -
g++ -O2 -std=c++0x -pg raytracer.cpp -o raytracer
Profile using the command
gprof -p -b ./raytracer gmon.out > raytracer.flt
Profile generated
Each sample counts as 0.01 seconds.
% cumulative self self total time seconds seconds calls us/call us/call name 81.82 0.36 0.36 307200 1.17 1.17 trace(Vec3<float> const&, Vec3<float> const&, std::vector<Sphere, std::allocator<Sphere> > const&, int const&) 18.18 0.44 0.08 render(std::vector<Sphere, std::allocator<Sphere> > const&) 0.00 0.44 0.00 4 0.00 0.00 void std::vector<Sphere, std::allocator<Sphere> >::_M_insert_aux<Sphere>(__gnu_cxx::__normal_iterator<Sphere*, std::vector<Sphere, std::allocator<Sphere> > >, Sphere&&) 0.00 0.44 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z3mixRKfS0_S0_
Where to parallelize the program?
From the above profile we can see that the trace function require faster computation time.
Finding the intersection of this ray with the sphere in the scene the algorithm takes a longer .Hence an area to parellelize the program will be here.
for (unsigned i = 0; i < spheres.size(); ++i) { float t0 = INFINITY, t1 = INFINITY; if (spheres[i].intersect(rayorig, raydir, t0, t1)) { if (t0 < 0) t0 = t1; if (t0 < tnear) { tnear = t0; sphere = &spheres[i]; } } }
Another area that will be speed up the program would be the render function
for (unsigned y = 0; y < height; ++y) { for (unsigned x = 0; x < width; ++x, ++pixel) { float xx = (2 * ((x + 0.5) * invWidth) - 1) * angle * aspectratio; float yy = (1 - 2 * ((y + 0.5) * invHeight)) * angle; Vec3f raydir(xx, yy, -1); raydir.normalize(); *pixel = trace(Vec3f(0), raydir, spheres, 0); } }
This function traces the rays for each pixel of the image traces it and returns a color.
Assignment 2
For our parallelization example we decided on using Ray Tracing as it is the best candidate for optimization based on the code.
Parallelization
After converting a portion of the code to become more parallelized we decided to test the run times of the program at various resolutions. This rendered a picture at a certain quality and at each resolution the run time increased.
Image resolution 512
Image resolution 1024
Image resolution 2048
Image resolution 4096
Assignment 3
Under Progress