
CDOT Wiki β



4,897 bytes added, 23:09, 11 December 2018
=NoName or Group: Pixels2=
===Team Members===
# [ Alex]
==ASCII Art (Yuriy)==
The idea is to take an image and turn it into a pictorial representation made of ASCII characters. We use a PNG as input and a TXT file as output. The advantage of the TXT format is that the result can be pasted into editors, where its font, text colour and background can be changed for a customized look.
*Display video in a native '''OS window'''
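For the TXT output described above, every character in the text file stands for one region of the picture. The following is a minimal sketch of that last step. It assumes the greyscale image has already been reduced so that one pixel corresponds to one chunk. The ten-character ramp and the name `toAsciiText` are hypothetical choices for illustration and do not come from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical brightness ramp, densest glyph first: dark pixels become '@'
// and bright pixels become ' ' (suits dark text on a light background;
// reverse the string for light text on a dark background).
const std::string kRamp = "@%#*+=-:. ";

// Maps a row-major 8-bit greyscale image to text, one character per pixel
// and one line per row. Assumes the image was already reduced so that each
// pixel stands for one chunk of the original picture.
std::string toAsciiText(const std::vector<unsigned char>& grey, int width, int height) {
    assert(width > 0 && height > 0 &&
           grey.size() == static_cast<std::size_t>(width) * height);
    const int last = static_cast<int>(kRamp.size()) - 1;
    std::string text;
    text.reserve(static_cast<std::size_t>(width + 1) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x)
            text += kRamp[grey[y * width + x] * last / 255];  // 0 -> '@', 255 -> ' '
        text += '\n';  // one text line per image row
    }
    return text;
}
```

The returned string can be written unchanged to a .txt file, for example with `std::ofstream`. The font and colours are then decided entirely by the editor the text is pasted into.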
OpenCV gives us a native matrix object (cv::Mat) that represents a frame of a video or live feed. We can easily extract an array of unsigned chars from it. In C++ an unsigned char holds an integer value from 0 to 255. This number represents the luminosity of a colour or greyscale channel, 0 being completely black and 255 pure white. This is the same range we use in front-end work with the CSS rgb() function ('''background-color: rgb(0, 191, 255)'''). In the tests below we use the RGB (red, green and blue) channels and calculate luminosity ourselves, which leaves more areas to optimize. However, to keep the explanation of the algorithm in the next section simple, we assume that OpenCV already gives us a greyscale frame, which it is able to do.
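The section does not say which formula turns the RGB channels into a single luminosity value. The following is a minimal sketch under two assumptions. First, it uses the ITU-R BT.601 weights, the same weights OpenCV documents for `cv::COLOR_BGR2GRAY`. Second, it assumes OpenCV's default interleaved BGR channel order. The function name `bgrToGrey` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts an interleaved 8-bit BGR buffer (OpenCV's default channel order
// for colour frames) to one luminosity byte per pixel.
// Assumed formula (ITU-R BT.601): Y = 0.299 R + 0.587 G + 0.114 B,
// computed in integer fixed point (weights scaled by 1000, +500 rounds).
std::vector<unsigned char> bgrToGrey(const std::vector<unsigned char>& bgr) {
    assert(bgr.size() % 3 == 0);  // three channels per pixel
    std::vector<unsigned char> grey(bgr.size() / 3);
    for (std::size_t i = 0; i < grey.size(); ++i) {
        unsigned b = bgr[3 * i];
        unsigned g = bgr[3 * i + 1];
        unsigned r = bgr[3 * i + 2];
        // Maximum is (1000 * 255 + 500) / 1000 = 255, so it fits in a byte.
        grey[i] = static_cast<unsigned char>((299 * r + 587 * g + 114 * b + 500) / 1000);
    }
    return grey;
}
```

Integer weights keep the per-pixel loop free of floating-point conversions. If the team lets OpenCV do this step, `cv::cvtColor(frame, grey, cv::COLOR_BGR2GRAY)` produces a comparable greyscale frame directly.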
This means that for a standard HD frame of 1920 by 1080 pixels we are given an array of 2,073,600 elements (unsigned char data[2073600];). If we are processing a live feed at 30 frames per second, we have to process 62,208,000 elements (30 * 2,073,600) per second. From the perspective of a single frame, each frame must be processed in under 1/30 of a second, about 33 milliseconds, so that no frames are dropped.

===Algorithm and Pseudocode===
The implementation idea of ASCII art is to break an image up into chunks, represented by the yellow boxes in the picture below. We iterate over the chunks in a line, and then over the lines of chunks (green boxes). These form the two outermost loops in our code.

[[File:zoomed.png|400px]]

Next, consider processing a chunk itself. We need to average the values of all pixels within it to a single value, which requires iterating over each pixel in the chunk. Indexing is more complex because the rows of a chunk are not contiguous in memory: consecutive rows of the same chunk are a full image width apart. This iteration forms the two innermost loops of our algorithm.

[[File: chunk.png|150px]]

Once we have a single value, we map its luminosity to the character that will replace the chunk. The character templates are small images in our project folder (Pound.png, AT.png, W.png, etc.). They are only 7 by 11 pixels, and we read all of them into a 2D array. The dimensions of the character template determine the size of the chunk. We experimented with different template sizes and found a smaller font preferable. Once we know which character to print, we copy it pixel by pixel from its template into an output array of unsigned chars with the same length as the original frame.

[[File:charTmpl.png|150px]]

Iterating over the character template forms the second pair of innermost loops. The pseudocode for this is below.
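The following is a minimal C++ sketch of the loop structure just described. It makes several assumptions:
* the frame is a row-major greyscale array;
* the templates are 7 pixels wide and 11 pixels tall;
* the templates are already loaded into memory, sorted from darkest to brightest.

The names `asciiFrame`, `Glyph`, `kGlyphW` and `kGlyphH` are hypothetical. The linear mapping from average luminosity to template index is also an assumption, since the section does not say how that mapping is done.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed template size: 7 pixels wide, 11 pixels tall.
constexpr int kGlyphW = 7;
constexpr int kGlyphH = 11;

// One character template: kGlyphW * kGlyphH greyscale pixels, row-major.
using Glyph = std::vector<unsigned char>;

// Replaces every kGlyphW x kGlyphH chunk of a row-major greyscale frame with
// the template whose position in `glyphs` matches the chunk's average
// luminosity. `glyphs` is assumed sorted from darkest to brightest.
// Pixels in partial chunks at the right/bottom edges stay 0.
std::vector<unsigned char> asciiFrame(const std::vector<unsigned char>& input,
                                      int width, int height,
                                      const std::vector<Glyph>& glyphs) {
    assert(!glyphs.empty());
    assert(input.size() == static_cast<std::size_t>(width) * height);
    std::vector<unsigned char> output(input.size(), 0);
    const int n = static_cast<int>(glyphs.size());
    for (int j = 0; j + kGlyphH <= height; j += kGlyphH) {      // lines of chunks
        for (int k = 0; k + kGlyphW <= width; k += kGlyphW) {   // chunks in a line
            // First inner pair: average the chunk. Rows of a chunk are
            // `width` bytes apart, so the index is row * width + column.
            int sum = 0;
            for (int y = 0; y < kGlyphH; ++y)
                for (int x = 0; x < kGlyphW; ++x)
                    sum += input[(j + y) * width + (k + x)];
            const int ave = sum / (kGlyphW * kGlyphH);
            const Glyph& g = glyphs[ave * (n - 1) / 255];
            // Second inner pair: copy the chosen template into the chunk.
            for (int y = 0; y < kGlyphH; ++y)
                for (int x = 0; x < kGlyphW; ++x)
                    output[(j + y) * width + (k + x)] = g[y * kGlyphW + x];
        }
    }
    return output;
}
```

Each input pixel is read once and each output pixel is written once, which is why four loop levels still amount to O(N) work. Each chunk touches only its own region of `input` and `output`, so chunks do not depend on one another.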
Although it may look like we have four nested loops, the runtime complexity of our algorithm is O(N): each pixel of the input is read exactly once and each pixel of the output is written exactly once. Because the work breaks down into chunks, the chunks can be processed independently of each other. This problem can be classified as an '''embarrassingly parallel''' problem; however, I prefer the newly proposed term '''perfectly parallel'''. (Source: [])

[[File:codePxl.png|300px]]

==VTune Amplifier with OpenMP (Alex)==
'''VTune Amplifier Overview:'''
We added a '''#pragma omp parallel for''' statement to our code. Referring to our pseudocode, the new code looked something like this:
<pre>
#pragma omp parallel for
for (j) {                          // each line of chunks
    for (k) {                      // each chunk in the line
        int sum = 0;
        for (y) {
            for (x) {
                int index;         // using j, k, x, y
                sum += input[index];
            }
        }
        int ave;                   // get average using sum
        for (y) {
            for (x) {
                int index;         // using j, k, x, y
                int charIndex;     // using x, y
                output[index] = ascii[charIndex];   // ascii: template chosen using ave
            }
        }
    }
}
</pre>
This improved our overall runtime and reduced it to '''36.095 seconds'''.
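The following is a minimal, self-contained sketch of the parallel loop, with a few assumptions:
* The outer loop runs over lines of chunks, and `schedule(dynamic)` reflects the dynamic scheduling mentioned in the summary.
* The function name `asciiFrameOmp` and its parameters `gw`, `gh` and `glyphs` are hypothetical.
* Templates are assumed sorted from darkest to brightest.

Every iteration writes a disjoint block of `output` and keeps its own `sum`, so no locking or reduction is needed.

```cpp
#include <cassert>
#include <vector>

// Parallel sketch of the chunk loops. Lines of chunks are handed to OpenMP
// threads on demand (schedule(dynamic)). Each chunk reads and writes only
// its own pixels, so threads never write the same output bytes.
// Without OpenMP (no -fopenmp) the pragma is ignored and the loop runs
// serially with identical results.
// glyphs: hypothetical gw x gh templates, darkest first.
void asciiFrameOmp(const std::vector<unsigned char>& input,
                   std::vector<unsigned char>& output,
                   int width, int height, int gw, int gh,
                   const std::vector<std::vector<unsigned char>>& glyphs) {
    assert(output.size() == input.size() && !glyphs.empty());
    const int chunkRows = height / gh;
    const int chunkCols = width / gw;
    const int n = static_cast<int>(glyphs.size());

    #pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < chunkRows; ++j) {          // line of chunks
        for (int k = 0; k < chunkCols; ++k) {      // chunk within the line
            const int top = j * gh, left = k * gw;
            int sum = 0;                           // private to this iteration
            for (int y = 0; y < gh; ++y)
                for (int x = 0; x < gw; ++x)
                    sum += input[(top + y) * width + left + x];
            const int ave = sum / (gw * gh);
            const std::vector<unsigned char>& g = glyphs[ave * (n - 1) / 255];
            for (int y = 0; y < gh; ++y)
                for (int x = 0; x < gw; ++x)
                    output[(top + y) * width + left + x] = g[y * gw + x];
        }
    }
}
```

Parallelizing the outer loop gives each thread a whole line of chunks to work on. Dynamic scheduling lets a thread that finishes early take the next line instead of waiting for a fixed share.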
Note the pink section on the master thread. That section represents OpenCV initializing the window on which the resulting video is displayed. This is outside the scope of our code, so we treat it as overhead we cannot control.
The orange blocks show the parallel regions where we process and convert each frame, and the gaps between them are where OpenCV grabs each frame from the source.
Then we ran our code through the HPC Performance Characterization analysis to verify the efficiency of our OpenMP implementation.
==Intel Advisor (Dmytro)==
'''What the Advisor can do:'''
=== Summary ===
Our code changes involved:
# Parallelizing our code with the '''#pragma omp parallel for''' statement.
#* This improved our runtime to roughly 5 (± 1) seconds of OpenCV overhead plus the length of the video, for a 1080p video.
# Adding dynamic scheduling to our code.
#* This improved load balancing. There was minimal runtime gain on our test videos, because the initial parallelization had already optimized the code to handle frames as fast as they are retrieved.
# Eliminating the data type conversion in our logic.
#* This improved memory management. For the same reason, there was minimal overall runtime gain on our test videos; however, the section itself showed a 50% improvement in runtime.

To preface our results: we used OpenCV to process and display our results. OpenCV added roughly 5 seconds of overhead to create the window and run the related processes required to display the resulting video. This is outside of our control. After our initial parallelization, the bottleneck is no longer our code but the speed at which we get frames from the source. Our further improvements therefore do not reduce the runtime for a 1080p video, but they could help maintain this level of performance on higher-resolution videos.