Test Team Please Ignore
Team Members
Email All: ebi@senecacollege.ca (subject: gpu610)
Progress
Assignment 1
Image Rotation
I profiled code found at http://www.dreamincode.net/forums/topic/76816-image-processing-tutorial/. The code provides several functions, and I tried three of them: enlarge, flip, and rotate. Rotation took the longest, which makes it a good place to apply parallelization.
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
34.55 0.19 0.19 Image::rotateImage(int, Image&)
25.45 0.33 0.14 Image::Image(Image const&)
18.18 0.43 0.10 1 100.00 100.00 Image::operator=(Image const&)
12.73 0.50 0.07 1 70.00 70.00 Image::Image(int, int, int)
5.45 0.53 0.03 writeImage(char*, Image&)
3.64 0.55 0.02 readImage(char*, Image&)
0.00 0.55 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
0.00 0.55 0.00 1 0.00 0.00 Image::~Image()
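To illustrate why rotation is a natural candidate for parallelization, here is a minimal sketch of rotation by inverse mapping. It is not the tutorial's Image::rotateImage: the function name rotateNearest, the flat grayscale buffer, and the nearest-neighbour sampling are all assumptions made for this example. The point it shows is that each destination pixel is computed from the source image alone, so the iterations of the two loops do not depend on one another.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (NOT the tutorial's Image::rotateImage).
// Rotates a rows x cols grayscale image, stored row by row in a flat
// vector, by `degrees` about its centre. Uses inverse mapping with
// nearest-neighbour sampling; destination pixels whose source falls
// outside the image are left at 0.
std::vector<int> rotateNearest(const std::vector<int>& src,
                               int rows, int cols, double degrees) {
    assert(rows > 0 && cols > 0);
    assert(src.size() == static_cast<std::size_t>(rows) * cols);

    const double pi = std::acos(-1.0);
    const double rad = degrees * pi / 180.0;
    const double c = std::cos(rad);
    const double s = std::sin(rad);
    const double r0 = (rows - 1) / 2.0;  // centre row
    const double c0 = (cols - 1) / 2.0;  // centre column

    std::vector<int> dst(src.size(), 0);
    for (int r = 0; r < rows; ++r) {
        for (int q = 0; q < cols; ++q) {
            // Each (r, q) reads only from src and writes only dst[r][q],
            // so every iteration is independent of the others.
            const double dr = r - r0;
            const double dq = q - c0;
            const int sr = static_cast<int>(std::lround(r0 + dr * c - dq * s));
            const int sq = static_cast<int>(std::lround(c0 + dr * s + dq * c));
            if (sr >= 0 && sr < rows && sq >= 0 && sq < cols) {
                dst[r * cols + q] = src[sr * cols + sq];
            }
        }
    }
    return dst;
}
```

Because no iteration reads a value written by another, the loop body could in principle be assigned one destination pixel per thread. Whether the profiled code follows this structure would need to be confirmed against its source.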
Erquan Bi
Code source: https://people.sc.fsu.edu/~jburkardt/cpp_src/mandelbrot/mandelbrot.cpp
This program computes an image of the Mandelbrot set using the following functions:
1. Carry out the iteration for each pixel: void iterPixel(int n, int* count, int count_max, double x_max, double x_min, double y_max, double y_min); This includes three nested loops and is a hotspot, consuming around 70% of the total time.
2. Determine the coloring of each pixel: void pixelColor(int& c_max, int n, int *count);
3. Set the image data: void setImageData(int n, int *r, int *g, int *b, int c_max, int* count); This includes two nested loops and is a hotspot, taking up about 26% of the time.
4. Write an image file: bool ppma_write(string file_out_name, int xsize, int ysize, int *r, int *g, int *b);
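The first step can be sketched as follows. Only the signature of iterPixel comes from this page; the body is an assumption based on the usual escape-time iteration for the Mandelbrot set, with count treated as a flat n x n array and 0 meaning "did not escape within count_max iterations". The actual function may differ in its pixel mapping and escape test.

```cpp
#include <cassert>

// Sketch of the per-pixel iteration. The signature is taken from the
// page; the body is an ASSUMED escape-time implementation.
// count must point to n * n ints, stored row by row.
void iterPixel(int n, int* count, int count_max,
               double x_max, double x_min, double y_max, double y_min) {
    assert(n > 1 && count != nullptr && count_max > 0);

    for (int i = 0; i < n; ++i) {          // loop 1: rows
        for (int j = 0; j < n; ++j) {      // loop 2: columns
            // Map pixel (i, j) to a point (x, y) in the complex plane.
            const double x = (j * x_max + (n - 1 - j) * x_min) / (n - 1);
            const double y = (i * y_max + (n - 1 - i) * y_min) / (n - 1);

            double x1 = x;
            double y1 = y;
            count[i * n + j] = 0;          // 0 = never escaped

            for (int k = 1; k <= count_max; ++k) {  // loop 3: iterations
                const double x2 = x1 * x1 - y1 * y1 + x;
                const double y2 = 2.0 * x1 * y1 + y;
                if (x2 < -2.0 || 2.0 < x2 || y2 < -2.0 || 2.0 < y2) {
                    count[i * n + j] = k;  // escaped at iteration k
                    break;
                }
                x1 = x2;
                y1 = y2;
            }
        }
    }
}
```

In this sketch each pixel's count depends only on its own coordinates, so the two outer loops carry no dependencies between iterations. That independence is what makes this kind of loop nest a candidate for running on a GPU.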
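iterPixel contains three nested loops: two over the n x n pixel grid and an inner iteration loop for each pixel. Treating all three loops as growing with n gives a Big-O class of O(n^3); with the iteration limit count_max held fixed, the work grows as O(n^2), which matches the timings below (user time rises from 0.220 s for the run with argument 501 to 0.948 s for 1001, roughly 4 times). This function is the hotspot of the program: it takes 67% to 75% of the profiled time, and its cost grows quickly with the image size. The program will be faster if this function can be sped up.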
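How much faster the whole program can get from speeding up one function can be estimated with Amdahl's law. The page does not mention this law; it is added here as an illustration, and the function names amdahlSpeedup and amdahlLimit are hypothetical. The sketch takes the fraction p of run time spent in the accelerated function (about 0.75 for iterPixel in the smallest profile) and the factor s by which that function is sped up.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the run time is
// accelerated by a factor s and the remaining (1 - p) is unchanged.
double amdahlSpeedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

// Upper bound on the overall speedup as s grows without limit.
// Requires p < 1, otherwise the bound is infinite.
double amdahlLimit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

With p = 0.75, amdahlLimit gives 4, so under this model accelerating iterPixel alone cannot make the program more than 4 times faster. Also accelerating setImageData would raise p and with it the bound.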
ebi@matrix:~/610/a1> time A1 501
real 0m0.336s
user 0m0.220s
sys 0m0.068s
ebi@matrix:~/610/a1> time A1 1001
real 0m1.289s
user 0m0.948s
sys 0m0.188s
ebi@matrix:~/610/a1> time A1 1501
real 0m2.859s
user 0m2.204s
sys 0m0.368s
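The three timings above allow a quick empirical check of how run time scales with the size argument. The sketch below assumes the run time follows t = c * n^k and solves for k from two measurements; the helper name growthExponent is hypothetical and the model is a simplification.

```cpp
#include <cassert>
#include <cmath>

// Estimates the exponent k in t = c * n^k from two (n, t) measurements:
// k = log(t2 / t1) / log(n2 / n1).
double growthExponent(double n1, double t1, double n2, double t2) {
    assert(n1 > 0.0 && n2 > 0.0 && n1 != n2);
    assert(t1 > 0.0 && t2 > 0.0);
    return std::log(t2 / t1) / std::log(n2 / n1);
}
```

Calling growthExponent(501, 0.220, 1001, 0.948) and growthExponent(1001, 0.948, 1501, 2.204) with the user times above both give values a little above 2, which suggests roughly quadratic growth in the size argument over this range.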
A1.501.flt
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
75.00 0.09 0.09 iterPixel(int, int*, int, double, double, double, double)
25.00 0.12 0.03 setImageData(int, int*, int*, int*, int, int*)
0.00 0.12 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
0.00 0.12 0.00 1 0.00 0.00 ppma_write_data(std::basic_ofstream<char, std::char_traits<char> >&, int, int, int*, int*, int*)
0.00 0.12 0.00 1 0.00 0.00 ppma_write_header(std::basic_ofstream<char, std::char_traits<char> >&, std::string, int, int, int)
A1.1001.flt
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
67.31 0.35 0.35 iterPixel(int, int*, int, double, double, double, double)
26.92 0.49 0.14 setImageData(int, int*, int*, int*, int, int*)
5.77 0.52 0.03 1 30.00 30.00 ppma_write_data(std::basic_ofstream<char, std::char_traits<char> >&, int, int, int*, int*, int*)
0.00 0.52 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
0.00 0.52 0.00 1 0.00 0.00 ppma_write_header(std::basic_ofstream<char, std::char_traits<char> >&, std::string, int, int, int)
A1.1501.flt
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
67.31 0.35 0.35 iterPixel(int, int*, int, double, double, double, double)
26.92 0.49 0.14 setImageData(int, int*, int*, int*, int, int*)
5.77 0.52 0.03 1 30.00 30.00 ppma_write_data(std::basic_ofstream<char, std::char_traits<char> >&, int, int, int*, int*, int*)
0.00 0.52 0.00 1 0.00 0.00 _GLOBAL__sub_I_main
0.00 0.52 0.00 1 0.00 0.00 ppma_write_header(std::basic_ofstream<char, std::char_traits<char> >&, std::string, int, int, int)