==== Conclusion ====
There must be a way to optimize this application, but as of today (March 4, 2018) I am not sure which path to take.<br>
For me, the most important thing is to solve the problem regardless of the tools used, and I think that reimplementing everything from scratch using OpenCV and CUDA is a viable solution.
=== Source Code for Vehicle Detection ===
<syntaxhighlight lang="cpp">
void detect_vehicles() {
    for (unsigned int i = 0; i < files.size(); i++) {
        // Load one image at a time and display it
        load_image(img, files[i]);
        win.set_image(img);
        // Run the detector on the image and show the output
        for (auto&& d : net(img)) {
            // Run the shape predictor on the detection and
            // accumulate the bounding box of its parts
            auto fd = sp(img, d);
            rectangle rect;
            for (unsigned long j = 0; j < fd.num_parts(); ++j)
                rect += fd.part(j);
            // Detections labelled "rear" are drawn in red, all others in yellow
            if (d.label == "rear")
                win.add_overlay(rect, rgb_pixel(255, 0, 0), d.label);
            else
                win.add_overlay(rect, rgb_pixel(255, 255, 0), d.label);
        }
        // Show the result for one second, then clear the overlay
        dlib::sleep(1000);
        win.clear_overlay();
    }
}
</syntaxhighlight>
=== Box Blur on an image using opencv C++ Library (Max Fainshtein) ===
The application stands to gain a large performance boost from parallel programming. Most of the computational time is spent calculating the average for every pixel, and these averages are independent of one another, so they can be calculated concurrently. Only a single synchronization is required at the end, before the image is displayed.
=== Source Code for Box Blur ===
<syntaxhighlight lang="cpp">
// Averages the colour channels in a neighbour x neighbour window centred on pixel (i, j).
// Pixels that fall outside the image contribute zero to the sums.
int findingNeighbors(const Mat& img, int i, int j, int neighbour, float* b, float* g, float* r) {
    int row_limit = img.rows;
    int column_limit = img.cols;
    int half = neighbour / 2;
    double blue = 0, red = 0, green = 0;
    for (int x = i - half; x <= i + half; x++) {
        for (int y = j - half; y <= j + half; y++) {
            if (x >= 0 && y >= 0 && x < row_limit && y < column_limit) {
                // Channels 0, 1, 2 are treated as blue, green, red
                const Vec3b& pixel = img.at<Vec3b>(x, y);
                blue += pixel[0];
                green += pixel[1];
                red += pixel[2];
            }
        }
    }
    // Divide by the full window area, even when part of the window is out of range
    double area = static_cast<double>(neighbour) * neighbour;
    *b = blue / area;
    *g = green / area;
    *r = red / area;
    return 1;
}
</syntaxhighlight>
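The independence of the per-pixel averages described above can be tried on the CPU before moving to CUDA. The following is a minimal sketch rather than the project's code: it assumes a single-channel image stored row-major in a <code>std::vector<unsigned char></code>, and the names <code>boxBlurRows</code> and <code>boxBlurParallel</code> are hypothetical. Each thread blurs a disjoint band of rows, and the only synchronization is the <code>join</code> at the end.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: blur rows [rowBegin, rowEnd) of a single-channel image.
// Out-of-range neighbours contribute zero, as in findingNeighbors.
void boxBlurRows(const std::vector<unsigned char>& src, std::vector<unsigned char>& dst,
                 int rows, int cols, int neighbour, int rowBegin, int rowEnd) {
    const int half = neighbour / 2;
    for (int i = rowBegin; i < rowEnd; ++i) {
        for (int j = 0; j < cols; ++j) {
            int sum = 0;
            for (int x = i - half; x <= i + half; ++x)
                for (int y = j - half; y <= j + half; ++y)
                    if (x >= 0 && y >= 0 && x < rows && y < cols)
                        sum += src[static_cast<std::size_t>(x) * cols + y];
            dst[static_cast<std::size_t>(i) * cols + j] =
                static_cast<unsigned char>(sum / (neighbour * neighbour));
        }
    }
}

// Hypothetical driver: split the rows into bands, one thread per band.
std::vector<unsigned char> boxBlurParallel(const std::vector<unsigned char>& src,
                                           int rows, int cols, int neighbour,
                                           unsigned numThreads) {
    std::vector<unsigned char> dst(src.size());
    const int threads = static_cast<int>(std::max(1u, numThreads));
    const int band = (rows + threads - 1) / threads;  // rows per thread, rounded up
    std::vector<std::thread> workers;
    for (int begin = 0; begin < rows; begin += band) {
        const int end = std::min(rows, begin + band);
        workers.emplace_back(boxBlurRows, std::cref(src), std::ref(dst),
                             rows, cols, neighbour, begin, end);
    }
    for (auto& w : workers) w.join();  // the single synchronization point
    return dst;
}
```

Because every thread writes only to its own rows of <code>dst</code> and reads only from <code>src</code>, no locking is needed. A CUDA version would follow the same idea with one thread per pixel instead of one thread per band.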
=== Algorithms (Joseph Pildush) ===
0.00 17.52 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z10setRandArrRSt6vectorIiSaIiEEi
0.00 17.52 0.00 1 0.00 0.00 void std::__insertion_sort<__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, __gnu_cxx::__ops::_Iter_less_iter)
</source>
==== Source Code ====
<source>
//std::sort Algorithm
void stdSort(vector<int>& array, int arrSize, steady_clock::time_point ts, steady_clock::time_point te) {
    cout << "--==Execution Time of std::sort Algorithm==--" << endl;
    /*std::sort Algorithm*/
    //Time the fill of 1 vector
    ts = steady_clock::now();
    //Fill array with random numbers
    setRandArr(array, arrSize);
    te = steady_clock::now();
    printTiming("std::sort Vector (1) Initialize", te - ts);
    //Start timing of std::sort
    ts = steady_clock::now();
    //Use std::sort to sort the vector
    sort(array.begin(), array.end());
    //End timing std::sort
    te = steady_clock::now();
    //Print Results
    printTiming("std::sort algorithm", te - ts);
}
//saxpy Algorithm
void saxpyAlg(int arrSize, steady_clock::time_point ts, steady_clock::time_point te) {
    cout << endl << "--==Execution Time of saxpy Algorithm==--" << endl;
    /*saxpy Algorithm*/
    vector<int> saxpyX, saxpyY;
    int saxpyA = 15;
    //Time the fill of 2 vectors
    ts = steady_clock::now();
    setRandArr(saxpyX, arrSize);
    setRandArr(saxpyY, arrSize);
    te = steady_clock::now();
    printTiming("saxpy Vectors (2) Initialize", te - ts);
    //Start timing of saxpy
    ts = steady_clock::now();
    //Compute y = a*x + y element by element
    for (int i = 0; i < arrSize; ++i)
        saxpyY[i] = saxpyA * saxpyX[i] + saxpyY[i];
    //End timing of saxpy
    te = steady_clock::now();
    printTiming("saxpy Algorithm", te - ts);
}
//Prefix Sum Algorithm
void prefixSum(vector<int>& array, int arrSize, steady_clock::time_point ts, steady_clock::time_point te) {
    cout << endl << "--==Execution Time of Prefix-Sum Algorithm==--" << endl;
    /*Prefix-Sum Algorithm*/
    vector<int> psSum;
    array.clear();
    //Time the fill of 1 vector
    ts = steady_clock::now();
    //Fill array with random numbers
    setRandArr(array, arrSize);
    te = steady_clock::now();
    printTiming("Prefix-Sum Vector (1) Initialize", te - ts);
    //Start timing of Prefix-Sum
    ts = steady_clock::now();
    //Each output element is the previous running total plus the current input
    psSum.push_back(array[0]);
    for (int i = 1; i < arrSize; ++i)
        psSum.push_back(psSum[i - 1] + array[i]);
    //End timing of Prefix-Sum
    te = steady_clock::now();
    printTiming("Prefix-Sum Algorithm", te - ts);
}
</source>
4.00 0.75 0.03 2176681 0.00 0.00 std::__detail::_Hashtable_iterator<std::pair<unsigned int const, std::string>, false, false> std::_Hashtable<unsigned int, std::pair<unsigned int const, std::string>, std::allocator<std::pair<unsigned int const, std::string> >, std::_Select1st<std::pair<unsigned int const, std::string> >, std::equal_to<unsigned int>, std::hash<unsigned int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, false, false, true>::_M_insert_bucket<std::pair<unsigned int, std::string> >(std::pair<unsigned int, std::string>&&, unsigned int, unsigned int)
0.00 0.75 0.00 1 0.00 0.00 _GLOBAL__sub_I__Z18convert_int_to_bini
==== Source Code ====
<source>
void compress(string input, int size, string filename) {
    unordered_map<string, int> compress_dictionary(MAX_DEF);
    //Dictionary initializing with ASCII
    for (unsigned int i = 0; i < 256; i++) {
        compress_dictionary[string(1, static_cast<char>(i))] = i;
    }
    string current_string;
    unsigned int next_code = 256;
    //Output file for compressed data
    ofstream outputFile;
    outputFile.open(filename + ".lzw");
    for (char& c : input) {
        current_string = current_string + c;
        if (compress_dictionary.find(current_string) == compress_dictionary.end()) {
            //New string: add it to the dictionary while codes remain
            if (next_code <= MAX_DEF)
                compress_dictionary.insert(make_pair(current_string, next_code++));
            //Emit the code for the longest known prefix, then restart from c
            current_string.erase(current_string.size() - 1);
            outputFile << convert_int_to_bin(compress_dictionary[current_string]);
            current_string = c;
        }
    }
    //Emit the code for whatever is left at the end of the input
    if (current_string.size())
        outputFile << convert_int_to_bin(compress_dictionary[current_string]);
    outputFile.close();
}
void decompress(string input, int size, string filename) {
    unordered_map<unsigned int, string> dictionary(MAX_DEF);
    //Dictionary initializing with ASCII
    for (unsigned int i = 0; i < 256; i++) {
        dictionary[i] = string(1, static_cast<char>(i));
    }
    string previous_string;
    unsigned int code;
    unsigned int next_code = 256;
    //Output file for decompressed data
    ofstream outputFile;
    outputFile.open(filename + "_uncompressed.txt");
    int i = 0;
    while (i < size) {
        //Extracting 12 bits and converting binary to decimal
        string subinput = input.substr(i, 12);
        bitset<12> binary(subinput);
        code = binary.to_ullong();
        i += 12;
        //Unknown code: it stands for previous_string plus its own first character
        if (dictionary.find(code) == dictionary.end())
            dictionary.insert(make_pair(code, (previous_string + previous_string.substr(0, 1))));
        outputFile << dictionary[code];
        //Register previous_string plus the first character of the current entry
        if (previous_string.size())
            dictionary.insert(make_pair(next_code++, previous_string + dictionary[code][0]));
        previous_string = dictionary[code];
    }
    outputFile.close();
}
</source>
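The listing above calls <code>convert_int_to_bin</code>, whose body is not shown here. Since <code>decompress</code> reads the input back 12 characters at a time through <code>bitset<12></code>, one plausible shape for that helper is sketched below. This is an assumption, not the project's implementation: the function names are hypothetical, and the sketch assumes each code is written as exactly 12 characters of '0' and '1'.

```cpp
#include <bitset>
#include <string>

// Hypothetical stand-in for convert_int_to_bin (the real helper is not shown).
// Assumption: every code is emitted as a fixed-width, 12-character binary string.
std::string convertIntToBinSketch(unsigned int code) {
    return std::bitset<12>(code).to_string();
}

// Inverse step, mirroring what decompress does with bitset<12>.
unsigned int convertBinToIntSketch(const std::string& bits) {
    return static_cast<unsigned int>(std::bitset<12>(bits).to_ulong());
}
```

With a fixed width of 12 bits, only codes 0 to 4095 survive a round trip; <code>bitset<12></code> keeps just the low 12 bits of anything larger. Under this assumption the dictionary limit in <code>compress</code> has to keep <code>next_code</code> within that range.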
=== Assignment 2 ===
=== Assignment 3 ===
<br><br>
Problem:
----
<br><br>
Solution:
----
<br><br>
<br><br>
The number of blocks in the grid was recalculated to account for the size of the image and the new number of threads per block.
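Recalculating the grid is usually a ceiling division of the image dimensions by the block dimensions. The sketch below shows that arithmetic on the host side in plain C++. It assumes a 2D launch with one thread per pixel; <code>Dim2</code>, <code>blocksFor</code> and <code>gridFor</code> are hypothetical names, and the 32 x 32 block in the usage note is only an illustrative value, not the configuration used in this project.

```cpp
struct Dim2 { int x; int y; };  // hypothetical stand-in for CUDA's dim3

// Ceiling division: the smallest number of blocks of `threads` threads covering `n` items.
int blocksFor(int n, int threads) {
    return (n + threads - 1) / threads;
}

// One thread per pixel: x covers the columns, y covers the rows.
Dim2 gridFor(int cols, int rows, Dim2 threadsPerBlock) {
    return { blocksFor(cols, threadsPerBlock.x), blocksFor(rows, threadsPerBlock.y) };
}
```

For a 1920 x 1080 image and 32 x 32 threads per block this gives a 60 x 34 grid. The last row of blocks extends past the image, so a kernel launched this way needs a bounds check before reading or writing a pixel.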
<br><br>
Problem:
----
We considered shared memory when optimizing our kernel. Implementing it proved difficult because every pixel in a block needs access to a different range of pixels for averaging, and a pixel's neighborhood may extend beyond the range of its block. We also attempted to store the entire image in shared memory, but this does not scale to larger images because shared memory is a limited resource.
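The size constraint can be made concrete with some arithmetic. One common way to handle out-of-range neighbors is to load each block's tile together with a border ("halo") of half the window width on every side. The sketch below compares the bytes needed for such a tile with the bytes needed for a whole image. It is an illustration under stated assumptions, not something this project implemented: the function names are hypothetical, and the 48 KB budget is an assumed per-block figure that should be checked against the target device.

```cpp
#include <cstddef>

// Bytes of shared memory for one square block tile plus its halo.
// blockSide: threads per block side; neighbour: blur window width;
// channels: bytes per pixel (3 for 8-bit BGR).
std::size_t tileBytes(int blockSide, int neighbour, int channels) {
    const int halo = neighbour / 2;
    const std::size_t side = static_cast<std::size_t>(blockSide + 2 * halo);
    return side * side * static_cast<std::size_t>(channels);
}

// Bytes needed to hold the entire image.
std::size_t imageBytes(int rows, int cols, int channels) {
    return static_cast<std::size_t>(rows) * cols * channels;
}

// Assumption: 48 KB of shared memory available per block.
constexpr std::size_t kAssumedSharedBytes = 48 * 1024;

bool fitsInShared(std::size_t bytes) {
    return bytes <= kAssumedSharedBytes;
}
```

Under these assumptions a 16 x 16 block with a 9 x 9 window needs 24 x 24 x 3 = 1728 bytes, which fits easily, while a 1920 x 1080 BGR image needs 6,220,800 bytes and does not.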
<br><br>
As shown below, our optimizations sometimes yielded slight improvements but were not effective overall. We are still looking for a way to implement shared memory, which we expect to improve efficiency and execution time.
==== Graph ====
[[File:boxFilterOptimize.png | 750px]]
With further optimization, we managed to slightly improve the execution time of the blur effect.
<br><br>
Below are the final results of all the test runs as well as the corresponding graph.
==== Results ====
[[File:boxFilterFinalTable.png | 500px]]