SPO600 Algorithm Selection Lab
[[Category:SPO600 Labs]]{{Admon/lab|Purpose of this Lab|In this lab, you will investigate the impact of different algorithms which produce the same effect. You will test and select one of three algorithms for adjusting the volume of PCM audio samples based on benchmarking of the possible approaches.}}
== Lab 4 ==
=== Background ===
* Digital sound is typically represented, uncompressed, as signed 16-bit integer signal samples. There are two streams of samples, one each for the left and right stereo channels, at typical sample rates of 44.1 or 48 thousand samples per second per channel, for a total of 88.2 or 96 thousand samples per second (kHz). Since there are 16 bits (2 bytes) per sample, the data rate is 88.2 * 1000 * 2 = 176,400 bytes/second (~172 KiB/sec) or 96 * 1000 * 2 = 192,000 bytes/second (~187.5 KiB/sec).
* To change the volume of sound, each sample can be scaled by a volume factor, in the range of 0.00 (silence) to 1.00 (full volume); a one-line sketch of this operation follows this list.
* On a mobile device, the amount of processing required to scale sound will affect battery life.
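For illustration only (this is not taken from the provided lab files -- the function name and the use of a <code>float</code> factor are placeholder assumptions), the most direct way to scale a single sample looks roughly like this:

<pre>
#include <stdint.h>

// Naive per-sample scaling: convert to floating point, multiply by the
// volume factor, and truncate back to a signed 16-bit integer.
static inline int16_t scale_sample(int16_t sample, float volume_factor) {
    return (int16_t)(sample * volume_factor);
}
</pre>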
=== Basic Sound Scale Program ===
Get the files for this lab on one of the [[SPO600 Servers]] -- but you can perform the lab wherever you want.

# Unpack the archive <code>/public/spo600-algorithm-selection-lab.tgz</code>
# Examine the <code>vol1.c</code> source code. This program:
## Creates 5,000,000 random "sound samples" in a data array (the number of samples is set in the <code>vol.h</code> file).
## Scales those samples by the volume factor 0.75 and stores them back to the data array.
## Sums the output array and prints the sum.
# Build and test this file.
#* Does it produce the same output each time?
# Test the performance of this program.
#* How long does it take to run the scaling?
#* How much time is spent scaling the sound samples? Be sure to eliminate the time taken for the non-scaling parts of the program (e.g., random sample generation).
#* Do multiple runs take the same time? How much variation do you observe? What is the likely cause of this variation?
#* Is there any difference in the results produced by the various algorithms? How much does numeric accuracy matter in this application?

=== Alternate Approaches ===

The sample program uses the most basic, obvious algorithm for the problem. Let's call this "Algorithm 0", or the "Naive Algorithm". Note that it uses casting between integer and floating-point formats as well as multiplication -- both of which can be [[Expensive|expensive]] operations.

Try these alternate algorithms for scaling the sound samples by modifying copies of <code>vol1.c</code>. Edit the <code>Makefile</code> to build your modified programs as well as the original. Test each approach to see the performance impact (rough sketches of both approaches appear below, after the Deliverables section):

# Pre-calculate a lookup table (array) of all possible sample values multiplied by the volume factor, and look up each sample in that table to get the scaled values. (You'll have to handle the fact that the input values range from -32768 to +32767, while C arrays accept only a non-negative index.)
# Convert the volume factor 0.75 to a fixed-point integer by multiplying by a binary number representing a fixed-point value of "1". For example, you could use 0b100000000 (= 256 in decimal) to represent 1.00, and therefore use 0.75 * 256 = 192 as your volume factor. Multiply this fixed-point integer volume factor by each sample, then shift the result to the right by the required number of bits after the multiplication (>>8 if you're using 256 as the multiplier).

=== Deliverables ===

Blog about your experiments with a detailed analysis of your results, including memory usage, time performance, accuracy, and trade-offs.

Important! -- explain what you're doing so that a reader coming across your blog post understands the context (in other words, don't just jump into a discussion of optimization results -- give your post some context).

'''Optional - Recommended:''' Compare results across several implementations of AArch64 and x86_64 systems. Note that on different implementations, the relative performance of different algorithms will vary; for example, table lookup may outperform other algorithms on a system with a fast memory system (cache), but not on a system with a slower memory system.
* For AArch64, you could compare the performance on AArchie against another 64-bit ARM system such as a Raspberry Pi 3 or an ARM Chromebook.
* For x86_64, you could compare the performance of different processors, such as xerxes, your own laptop or desktop, and Seneca systems such as Matrix, Zenit, or lab desktops.
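The following is a rough, untested sketch of what the two alternate approaches might look like. The function names, the hard-coded volume factor, and the per-call table rebuild are illustrative assumptions only, not the expected solution:

<pre>
#include <stdint.h>
#include <stdlib.h>

#define VOLUME 0.75f

// Approach 1: lookup table of all 65536 possible sample values.
// Samples range from -32768 to +32767, so add 32768 to form a
// non-negative array index.
void scale_with_table(int16_t *data, size_t count) {
    static int16_t table[65536];
    for (int i = 0; i < 65536; i++) {
        table[i] = (int16_t)((i - 32768) * VOLUME);
    }
    for (size_t s = 0; s < count; s++) {
        data[s] = table[(int32_t)data[s] + 32768];
    }
}

// Approach 2: fixed-point multiplication. 256 (0b100000000) represents
// 1.00, so the volume factor 0.75 becomes 0.75 * 256 = 192; shift the
// product right by 8 bits after the multiplication to undo the scaling.
void scale_fixed_point(int16_t *data, size_t count) {
    int32_t factor = (int32_t)(VOLUME * 256);   // 192
    for (size_t s = 0; s < count; s++) {
        data[s] = (int16_t)((data[s] * factor) >> 8);
    }
}
</pre>

In a real solution the lookup table would only be rebuilt when the volume factor changes, as described in the approach above.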
=== Things to consider ===
==== Design of Your Tests ====

* Most solutions for a problem of this type involve generating a large amount of data in an array, processing that array using the function being evaluated, and then storing that data back into an array. Make sure that you measure the time taken in the test function only -- you need to be able to remove the rest of the processing time from your evaluation.
* You may need to run a very large amount of sample data through the function to be able to detect its performance. Feel free to edit the sample count in <code>vol.h</code> as necessary.
* If you do not use the output of your calculation (for example, by doing something with the output array), the compiler may recognize that and remove the code you're trying to test. Be sure to process the results in some way so that the optimizer preserves the code you want to test. It is a good idea to calculate some sort of verification value to ensure that both approaches generate the same results (a sketch combining timing and a verification checksum follows this list).
* Be aware of what other tasks the system is handling during your test run, including software running on behalf of other users.
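As an example of the points above, one way to time only the scaling step while keeping the optimizer from discarding it is sketched below. This assumes a POSIX system with <code>clock_gettime()</code>, and <code>scale_samples()</code> is a hypothetical stand-in for whichever function you are testing:

<pre>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

// Hypothetical scaling routine under test, defined elsewhere.
void scale_samples(int16_t *data, size_t count);

double time_scaling(int16_t *data, size_t count) {
    struct timespec start, end;

    // Time only the scaling call, not data generation or the checksum.
    clock_gettime(CLOCK_MONOTONIC, &start);
    scale_samples(data, count);
    clock_gettime(CLOCK_MONOTONIC, &end);

    // Use the output so the compiler cannot remove the scaling loop; the
    // checksum also lets you verify that different approaches agree.
    long long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += data[i];
    }
    printf("checksum: %lld\n", sum);

    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}
</pre>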
==== Analyzing Results ====
* What is the impact of various optimization levels on the software performance? (For example, compiling with -O0 / -O1 / -O2 / -O3.)
* Does the distribution of data matter? (For example, is there any difference if there are no large-magnitude values, or no negative values?)
* If samples are fed at CD rate (44100 samples per second x 2 channels x 2 bytes per sample), can each of the algorithms keep up?
* What is the memory footprint of each approach?
* What is the performance of each approach?
* What is the energy consumption of each approach? (What information do you need to calculate this?)
* Various machines within an architecture have very different performance profiles, energy consumption, and hardware costs -- so it's not reasonable to compare performance between the machines, but it is reasonable to compare the relative performance of the algorithms in each context. Does the ratio of performance of the various approaches remain constant across the machines? Why or why not?
* What other optimizations can be applied to this problem?
=== Tips ===

{{Admon/tip|Non-Decimal Notation|In this lab, the number prefix 0x indicates a hexadecimal number, and 0b indicates a binary number, in harmony with the C language.}}
{{Admon/tip|Time and Memory Usage of a Program|You can get basic timing information for a program by running <code>time ''programName''</code> -- the output will show the total time taken (real), the amount of CPU time used to run the application (user), and the amount of CPU time used by the operating system on behalf of the application (system). Another version of the <code>time</code> command, located in <code>/bin/time</code>, gives slightly different information, including maximum resident memory usage: <code>/bin/time ''programName''</code>}}
{{Admon/tip|SOX|If you want to try this with actual sound samples, you can convert a sound file of your choice to raw 16-bit signed integer PCM data using the [http://sox.sourceforge.net/ sox] utility present on most Linux systems and available for a wide range of platforms.}}
{{Admon/tip|Stack Limit|Fixed-size, non-static arrays will be placed in the stack space. The size of the stack space is controlled by per-process limits, inherited from the shell, and adjustable with the <code>ulimit</code> command. Allocating an array larger than the stack size limit will cause a segmentation fault, usually on the first write. To see the current stack limit, use <code>ulimit -s</code> (displayed value is in KB; default is usually 8192 KB or 8 MB). To set the current stack limit, place a new size in KB or the keyword <code>unlimited</code> after the <code>-s</code> argument.<br /><br />Alternate (and preferred) approach, as used in the provided sample code: allocate the array space with <code>malloc()</code> or <code>calloc()</code>.}}
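A minimal illustration of the preferred heap-allocation approach mentioned above (the <code>SAMPLES</code> macro is a placeholder standing in for the sample count defined in <code>vol.h</code>):

<pre>
#include <stdint.h>
#include <stdlib.h>

#define SAMPLES 5000000

int main(void) {
    // calloc() allocates on the heap, so the ~10 MB of sample data is not
    // limited by the default 8 MB stack size.
    int16_t *data = calloc(SAMPLES, sizeof(int16_t));
    if (data == NULL) {
        return 1;
    }
    /* ... generate, scale, and sum the samples here ... */
    free(data);
    return 0;
}
</pre>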
{{Admon/tip|stdint.h|The <code>stdint.h</code> header provides definitions for many specialized integer size types. Use <code>int16_t</code> for 16-bit signed integers.}}
