Difference between revisions of "DPS921/ASCII"
Revision as of 13:44, 30 November 2018
NoName or Pixels2
Team Members
Presentation
Google Slides presentation can be found here
ASCII Art (Yuriy)
Introduction
The idea is to take an image and turn it into a pictorial representation using ASCII characters. We use a PNG file as input and a TXT file as output. The advantage of the TXT format is that the result can be pasted into editors, where the font, text colour, and background can be changed for a customized look.
We decided to take it a step further and output a PNG file as well. This loses some of the flexibility mentioned above, but it lets us process videos, since we can take each frame and run our algorithm on it. Video also highlights how efficiently our algorithm runs and whether it can keep up with live processing.
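The core step described above, turning pixel intensity into a character, can be sketched as follows. This is a minimal illustration and not the project's actual code: the character ramp `kRamp` and the helper names `lumToChar` and `rowToAscii` are hypothetical, and the ramp assumes dark text on a light background (dense glyphs for dark pixels).

```cpp
#include <cstddef>
#include <string>

// Hypothetical character ramp, ordered from the densest glyph (dark pixels)
// to the sparsest glyph (light pixels). The project's real ramp is not shown.
const std::string kRamp = "@%#*+=-:. ";

// Map a greyscale intensity in [0, 255] to one character of the ramp.
char lumToChar(int lum) {
    if (lum < 0) lum = 0;
    if (lum > 255) lum = 255;
    std::size_t index = static_cast<std::size_t>(lum) * (kRamp.size() - 1) / 255;
    return kRamp[index];
}

// Convert one row of greyscale pixels into one line of text.
std::string rowToAscii(const unsigned char *grey, std::size_t width) {
    std::string line;
    line.reserve(width);
    for (std::size_t x = 0; x < width; ++x) {
        line.push_back(lumToChar(grey[x]));
    }
    return line;
}
```

Reversing the ramp would give the white-on-black look mentioned in the output samples below.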
OUTPUT Samples
- Live video stream (MP4)
- Video file processing (MKV)
- Reverse sampling with white colour font and black colour background (PNG)
- R2D2 above (TXT)
Note: for the best video quality, download the files to view them, since Google compresses them further for playback in browsers. For a text file, you'll need to decrease the font size so that no lines wrap onto the next line, and don't forget to use a monospace font.
Working with image and video files
OpenCV is an open source computer vision library that can do a lot of cool stuff. As beginners, we use it to:
- Read and write image files
- Read and write video files
- Read from video stream like camera
- Display video in native OS window
vTune Amplifier with OpenMP (Alex)
Intel Advisor (Dmytro)
What Advisor can do:
1. Vectorization Optimization
- Use the cache-aware Roofline Analysis to identify high-impact, under-optimized loops and get tips for faster code.
- Quickly find what's blocking vectorization where it matters most to make the best use of your machine's Single Instruction Multiple Data (SIMD) capabilities.
- Identify where it is safe to force compiler vectorization.
- Use memory analysis to find inefficient memory usage.
2. Thread Prototyping
- Use Threading Advisor to fast-track threading design.
- Its simple workflow lets you quickly model threading designs while delivering the data and tips you need to make faster design and optimization decisions.
(More can be found at https://software.intel.com/en-us/advisor)
Our Hotspot analysis from vTune (see the previous section for details on vTune) identified two bottlenecks in our code:
- imageToScaleeNaive - the main conversion function, which creates an image drawn with characters from the original buffer.
- calcLum - computes a weighted average of the RGB values of a single pixel to convert it to greyscale.
See the details below:
As you can see, calcLum takes the second-highest share of the run time. However, it contains no loop at all; its body is a single line of code:
int calcLum(const unsigned char *pixels) {
    return (0.2126 * (int)pixels[2]) + (0.7152 * (int)pixels[1]) + (0.0722 * (int)pixels[0]);
}
My initial thought was to take the code from the function and put it directly inside the loop. After all, it is called in only one place, and by putting it inside the main function I would save the stack allocation of a function call.
In the snippet above, I moved the hard-coded values into constants declared before the loop starts and assigned each RGB value to an int variable. The latter is needed because each channel is stored as a char, and we need its numeric (ASCII) code to get its intensity value.
Doing so gave the following results:
The above indicates that we now have 3 data type mismatches, which cost additional time on every loop iteration: each value is converted from char to int and then from int to float for the final sum.
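The snippet itself is not preserved on this page. As a rough sketch of what such an inlined loop might look like, assuming a buffer of 3 bytes per pixel in B, G, R order (inferred from the indices used in calcLum), consider the following. The function name `lumInlined`, the variable names, and the output vector are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Sketch only: assumes 3 bytes per pixel in B, G, R order, as the indices
// in calcLum suggest. Names are illustrative, not the project's.
std::vector<int> lumInlined(const unsigned char *pixels, std::size_t pixelCount) {
    // Weights hoisted out of the loop; double, like the literals in calcLum.
    const double rWeight = 0.2126;
    const double gWeight = 0.7152;
    const double bWeight = 0.0722;

    std::vector<int> lum(pixelCount);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const unsigned char *p = pixels + 3 * i;
        int b = p[0];  // unsigned char -> int
        int g = p[1];
        int r = p[2];
        // Each multiply converts int -> floating point; the store converts
        // the floating-point sum back to int (truncating).
        lum[i] = static_cast<int>(rWeight * r + gWeight * g + bWeight * b);
    }
    return lum;
}
```

Each channel passes through two conversions here (char to int, then int to floating point), which is the kind of chain a type-mismatch report would flag.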
Solution
Unfortunately, sometimes there is no way around the datatype conversion, as it is the only way to retrieve the information. So all we can do now is minimize the impact.
The final solution above keeps the data types consistent across the calculation by converting chars straight to floats and using float constants, so only a single type cast is made for every bracket pair.
The final solution saved around 200 ms on a 10 s video clip.
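The final code is likewise not preserved here. A minimal sketch of the idea, under the same assumed B, G, R buffer layout and with hypothetical names (`lumFloat`, `rWeight`, and so on), might look like this:

```cpp
#include <cstddef>
#include <vector>

// Sketch only: every operand is a float, so each channel needs exactly one
// cast (unsigned char -> float) and no int or double intermediate appears.
std::vector<float> lumFloat(const unsigned char *pixels, std::size_t pixelCount) {
    // Float literals (note the f suffix) keep the arithmetic in single precision.
    const float rWeight = 0.2126f;
    const float gWeight = 0.7152f;
    const float bWeight = 0.0722f;

    std::vector<float> lum(pixelCount);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const unsigned char *p = pixels + 3 * i;  // assumed B, G, R order
        lum[i] = rWeight * static_cast<float>(p[2])
               + gWeight * static_cast<float>(p[1])
               + bWeight * static_cast<float>(p[0]);
    }
    return lum;
}
```

Whether the result stays a float or is converted to an integer at the end depends on how the caller picks a character; this sketch leaves it as a float to avoid adding another conversion.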
Summary
...todo