0xCAFEBABE
Team Members
Progress
Assignment 1
Profile 1: Solving unsteady 1D heat transfer equation
For this project, the code analyzed was to solve a problem with constant value boundary condition.
The code was borrowed from Dr. Pengfei Du, and I had to split up his class calls into function calls, respectively, to help me better analyze it.
Flat profile: Each sample counts as 0.01 seconds.
% cumulative self self total time seconds seconds calls Ts/call Ts/call name 0.03 0.01 0.01 1 0.00 0.00 _GLOBAL__sub_I__Z5setupR14pdu1dsteadyExp 99.78 1.03 1.02 1 1.01 1.01 _GLOBAL__sub_I__Z5solvingR15pdu1dsteadyExp
As shown, most of the time taken was in the solving function, which took care of calculating the wave.
Here is the result in a graphical animation: http://pengfeidu.net/Media/gif/1d-animated.gif
- Martin Ristov
Profile 2: Solving a Rubiks Cube
My choice of an application to profile is a Rubiks cube solver. While a Rubiks cube is typically thought of as a 3x3 cube, you can exponentially increase the number of squares. By doing so, solving a much larger Rubiks cube on a typical CPU becomes useless.
I came across a github repo for a (GPL v3) Rubiks cube solver at: https://github.com/brownan/Rubiks-Cube-Solver
I compiled Brown's code with profiling flags and got some interesting results.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total time seconds seconds calls s/call s/call name 44.24 301.99 301.99 1768367109 0.00 0.00 cube_turn 41.94 588.30 286.31 1768367110 0.00 0.00 edge_hash2 13.51 680.53 92.24 1 92.24 682.02 edge_generate 0.10 681.24 0.71 132162051 0.00 0.00 stack_push 0.06 681.67 0.43 cube_120convert 0.05 682.03 0.36 edge_hash1 0.04 682.33 0.31 132162010 0.00 0.00 stack_peek_cube 0.03 682.52 0.19 132162010 0.00 0.00 stack_peek_distance 0.03 682.70 0.18 132162051 0.00 0.00 stack_pop 0.02 682.81 0.11 132162010 0.00 0.00 stack_peek_turn 0.00 682.82 0.02 _GLOBAL__sub_I_65535_0_cube_solved 0.00 682.82 0.00 1 0.00 0.00 edge_write 0.00 682.82 0.00 1 0.00 682.02 make_edge
cube_turn and edge_hash2 are the top two hotspots.
cube_turn - https://github.com/brownan/Rubiks-Cube-Solver/blob/master/cube.c#L181
edge_hash2 - https://github.com/brownan/Rubiks-Cube-Solver/blob/master/edgetable.c#L227
I feel like either two of these hotspots would be an excellent opportunity to parallelize.
~ Steven
Profile 3: Solving a Maze
I decided to profile an application which generates a maze according to the user provided rows and columns and then provides with the solution for the generated maze. As a maze can quickly rise in complexity when dealing with larger number of rows and columns, I believe that this type of an application could be a perfect fit to parallelize and make use of a much powerful GPU.
The above application has been taken form the following github repo : https://github.com/Syntaf/MazeSolver
Flat Profile :
Each sample counts as 0.01 seconds.
% cumulative self self total time seconds seconds calls ms/call ms/call name 99.97 13.00 13.00 mazeGen::printMazeData(std::string) const 0.08 13.01 0.01 1 10.00 10.00 AStar::findPath(int const&, int const&, int, int const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >) 0.00 13.01 0.00 159198 0.00 0.00 disjointSets::setFind(int) 0.00 13.01 0.00 70742 0.00 0.00 void std::__adjust_heap<__gnu_cxx::__normal_iterator<node*, std::vector<node, std::allocator<node> > >, long, node, std::less<node> >(__gnu_cxx::__normal_iterator<node*, std::vector<node, std::allocator<node> > >, long, long, node, std::less<node>) 0.00 13.01 0.00 39999 0.00 0.00 disjointSets::setUnion(int, int) 0.00 13.01 0.00 4433 0.00 0.00 void std::vector<int, std::allocator<int> >::_M_emplace_back_aux<int>(int&&) 0.00 13.01 0.00 803 0.00 0.00 void std::vector<char, std::allocator<char> >::emplace_back<char>(char&&) [clone .constprop.131] 0.00 13.01 0.00 10 0.00 0.00 void std::vector<std::vector<char, std::allocator<char> >, std::allocator<std::vector<char, std::allocator<char> > > >::_M_emplace_back_aux<std::vector<char, std::allocator<char> > const&>(std::vector<char, std::allocator<char> > const&) 0.00 13.01 0.00 9 0.00 0.00 void std::vector<node, std::allocator<node> >::_M_emplace_back_aux<node const&>(node const&) 0.00 13.01 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN12disjointSetsC2Ei 0.00 13.01 0.00 1 0.00 0.00 _GLOBAL__sub_I__ZN4node14updatePriorityERKiS1_ 0.00 13.01 0.00 1 0.00 0.00 _GLOBAL__sub_I_main 0.00 13.01 0.00 1 0.00 0.00 disjointSets::disjointSets(int) 0.00 13.01 0.00 1 0.00 0.00 disjointSets::~disjointSets() 0.00 13.01 0.00 1 0.00 0.00 AStar::makeReadyMap(std::vector<std::vector<char, std::allocator<char> >, std::allocator<std::vector<char, std::allocator<char> > > > const&)
As you can see the maximum time spent by the CPU is while generating the maze, due to a number of nested loops which evaluate heavy computation of the maze data
- Luv