=== Intel Adviser (Dmytro) ===
See the details below:
[[File:Ascii vtune runtime.png]]
As you can see, '''calcLum''' takes the second-highest amount of time. However, it does not contain a single loop and is really only one line of code:
 int calcLum(const unsigned char *pixels)
 {
     return (0.2126 * (int)pixels[2]) + (0.7152 * (int)pixels[1]) + (0.0722 * (int)pixels[0]);
 }
My initial thought was to take the code from the function and put it directly inside the loop; after all, it is only called from a single place, and by inlining it into the main function I would save the stack-frame setup of a function call on every pixel.
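A minimal sketch of what that inlining might look like. The original loop is not shown in this section, so the buffer layout (tightly packed BGR, 3 bytes per pixel) and the names `frame`, `lum`, and `pixelCount` are assumptions of mine, not the project's actual code:

```cpp
#include <cstddef>

// calcLum's body moved directly into the pixel loop, removing the
// per-pixel function call. BGR layout is assumed for illustration.
void computeLuminance(const unsigned char *frame, int *lum,
                      std::size_t pixelCount)
{
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const unsigned char *p = frame + i * 3;
        // Former calcLum body, now inlined:
        lum[i] = (0.2126 * (int)p[2]) + (0.7152 * (int)p[1])
               + (0.0722 * (int)p[0]);
    }
}
```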
[[File:ascii_variable_sizes.JPG]]
In the snippet above, I moved the hard-coded weights into constants declared before the loop starts, and assigned each RGB value to an int variable. The latter is needed because we need the numeric (ASCII) value of each character to compute its intensity.
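Since the snippet itself only exists as a screenshot, here is a hedged reconstruction of that refactoring: the weights hoisted into named constants before the loop, and each channel read into an `int` first. All identifiers are my own assumptions:

```cpp
#include <cstddef>

void computeLuminance(const unsigned char *frame, int *lum,
                      std::size_t pixelCount)
{
    // Weights hoisted out of the loop as named constants.
    const double RED_WEIGHT   = 0.2126;
    const double GREEN_WEIGHT = 0.7152;
    const double BLUE_WEIGHT  = 0.0722;

    for (std::size_t i = 0; i < pixelCount; ++i) {
        const unsigned char *p = frame + i * 3;
        // Each channel cast to int to get its numeric value.
        int r = (int)p[2];
        int g = (int)p[1];
        int b = (int)p[0];
        // char -> int -> double conversions still happen every iteration.
        lum[i] = (int)(RED_WEIGHT * r + GREEN_WEIGHT * g + BLUE_WEIGHT * b);
    }
}
```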
In doing so, I got the following results:
[[File:ascii_data_type_missmatch_refactoring.JPG]]
The above indicates that we now have 3 datatype mismatches that cost additional time on every iteration: converting from char to int, and then from int to float for the final calculation of the sum.
=== Solution ===
Unfortunately, sometimes there is no way around a datatype conversion, as it is the only way to retrieve the information; all we can do is minimize its impact.
[[File:ascii_solution.JPG]]
The final solution above keeps the data types consistent across the calculation by converting the chars straight to floats and using float constants, so only a single type cast is made per bracketed term.
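A sketch of what that final, type-consistent version could look like: `float` constants and a direct `char`-to-`float` cast, so each term needs only one conversion and the sum stays in `float` until the single truncation at the end. The identifiers are again assumptions, not the original code:

```cpp
#include <cstddef>

void computeLuminance(const unsigned char *frame, int *lum,
                      std::size_t pixelCount)
{
    // float constants, so neither int nor double enters the expression.
    const float RED_WEIGHT   = 0.2126f;
    const float GREEN_WEIGHT = 0.7152f;
    const float BLUE_WEIGHT  = 0.0722f;

    for (std::size_t i = 0; i < pixelCount; ++i) {
        const unsigned char *p = frame + i * 3;
        // One char -> float cast per term; a single final cast to int.
        lum[i] = (int)(RED_WEIGHT   * (float)p[2]
                     + GREEN_WEIGHT * (float)p[1]
                     + BLUE_WEIGHT  * (float)p[0]);
    }
}
```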
The final solution saved around '''200ms''' on a 10s video clip.