[[Category:SPO600 Labs- Retired]]{{Admon/lab|Purpose of this Lab|In this lab, you will investigate the impact of different algorithms which produce the same effect: you will benchmark several algorithms for adjusting the volume of PCM audio samples and compare their performance.}}{{Admon/important|x86_64 and AArch64 Systems|This lab must be performed on both x86_64 and AArch64 systems. You may use the [[SPO600 Servers]], or you may use other system(s) -- for example, it might make sense to use your own x86_64 system and [[SPO600_Servers#AArch64:_israel.cdot.systems|israel.cdot.systems]] for AArch64.}}
== Lab 5 ==
=== Background ===
* Digital sound is typically represented, uncompressed, as signed 16-bit integer signal samples. There are two streams of samples, one each for the left and right stereo channels, at typical sample rates of 44.1 or 48 thousand samples per second (kHz) per channel, for a total of 88.2 or 96 thousand samples per second. Since there are 16 bits (2 bytes) per sample, the data rate is 88.2 * 1000 * 2 = 176,400 bytes/second (~172 KiB/sec) or 96 * 1000 * 2 = 192,000 bytes/second (~187.5 KiB/sec).
* To change the volume of sound, each sample can be scaled (multiplied) by a volume factor, in the range of 0.00 (silence) to 1.00 (full volume). A minimal code sketch of this operation follows this list.
* On a mobile device, the amount of processing required to scale sound will affect battery life.
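For illustration only -- this is a sketch, not the provided lab code, and the function name is an assumption -- the basic scaling operation looks like this in C:

 #include <stdint.h>
 
 /* Sketch: scale signed 16-bit PCM samples by a floating-point volume factor
    in the range 0.00 (silence) to 1.00 (full volume). */
 void scale_samples(const int16_t *in, int16_t *out, int count, float volume) {
     for (int i = 0; i < count; i++) {
         /* Multiply, then truncate back to a signed 16-bit integer. */
         out[i] = (int16_t)(in[i] * volume);
     }
 }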
=== Multiple Approaches ===
Several approaches to scaling the sound samples are possible; the simplest is to multiply each signed 16-bit sample by the floating-point volume factor, as shown above. Two alternate approaches (both sketched below) are:
* Pre-calculate a lookup table (array) of all possible sample values multiplied by the volume factor, then look up each sample in that table to get the scaled value.
* Convert the volume factor (e.g., 0.75) to a fixed-point integer by multiplying it by a binary number representing the fixed-point value "1". For example, you could use 0b100000000 (= 256 in decimal) to represent 1.00, so a factor of 0.75 becomes 192. After multiplying each sample by this fixed-point factor, shift the result to the right by the required number of bits (8 bits if you are using 256 as the multiplier).
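These alternate approaches might be sketched as follows. This is illustrative code under the assumptions above (a 256-based fixed-point factor and a full 65536-entry table); it is not the code from the provided programs:

 #include <stdint.h>
 
 /* Lookup-table approach: pre-compute the scaled value of every possible
    16-bit sample (65536 entries), then scaling becomes a table lookup. */
 void scale_lookup(const int16_t *in, int16_t *out, int count, float volume) {
     static int16_t table[65536];
     for (int i = 0; i < 65536; i++) {
         /* Index 0 corresponds to sample value -32768. */
         table[i] = (int16_t)((i - 32768) * volume);
     }
     for (int i = 0; i < count; i++) {
         out[i] = table[in[i] + 32768];
     }
 }
 
 /* Fixed-point approach: represent the volume factor as an integer scaled by
    256 (0b100000000), multiply, then shift right 8 bits to undo the scaling. */
 void scale_fixed(const int16_t *in, int16_t *out, int count, float volume) {
     int32_t factor = (int32_t)(volume * 256);    /* e.g., 0.75 -> 192 */
     for (int i = 0; i < count; i++) {
         /* The right shift removes the 8 fixed-point fraction bits. */
         out[i] = (int16_t)((in[i] * factor) >> 8);
     }
 }

In a real program the lookup table would be built once and reused; it is rebuilt inside the function here only to keep the sketch self-contained.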
=== Benchmarking ===
Perform these steps '''on both x86_64 and AArch64 systems''':
# Unpack the archive <code>/public/spo600-volume-examples.tgz</code>
# Study each of the source code files and make sure that you understand what the code is doing.
# '''Make a prediction''' of the relative performance of each scaling algorithm.
# Build and test each of the programs.
#* Do all of the algorithms produce the same output?
#** How can you verify this?
#** If there is a difference, is it significant enough to matter?
#* Change the number of samples so that each program takes a reasonable amount of time to execute (suggested minimum is 20 seconds).
# Test the performance of each program (vol0 through vol3 on x86_64, and vol0 through vol5 on AArch64).
#* Find a way to measure performance ''without'' the time taken to perform the pre-processing (generating the samples) and post-processing (summing the results), so that you measure ''only'' the time taken to scale the samples. '''This is the hard part!''' A timing sketch appears after this list.
#* How much time is spent scaling the sound samples?
#* Do multiple runs take the same time? How much variation do you observe? What is the likely cause of this variation?
#* Is there any difference in the results produced by the various algorithms?
#* Does the difference between the algorithms vary depending on the architecture and implementation on which you test?
#* What is the relative memory usage of each program?
# See if you can measurably increase performance by changing the compiler options (via the Makefile).
# Was your prediction about performance accurate?
# Find all of the questions, marked with <code>'''Q:'''</code>, in the program comments, and answer them.
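One possible way to isolate the scaling time is to read a monotonic clock immediately before and after the call that scales the samples. This is a sketch only; the function-pointer signature and names are assumptions, not taken from the provided code:

 #include <stdint.h>
 #include <time.h>
 
 /* Time only the scaling step: sample generation happens before this function
    is called, and result summing happens after, so neither is included. */
 double time_scaling(void (*scale)(const int16_t *, int16_t *, int, float),
                     const int16_t *in, int16_t *out, int count, float volume) {
     struct timespec start, end;
 
     clock_gettime(CLOCK_MONOTONIC, &start);
     scale(in, out, count, volume);
     clock_gettime(CLOCK_MONOTONIC, &end);
 
     return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
 }

Printing or accumulating the elapsed time inside each program keeps the reported figure free of setup and teardown time no matter how the program is invoked.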
=== Deliverables ===
Blog about your experiments with a detailed analysis of your results, including memory usage, time performance, accuracy, and trade-offs. Include answers to all of the questions marked with Q: in the source code.
'''Optional - Recommended:''' Compare results across several '''implementations''' of AArch64 and x86_64 systems. Note that on different CPU implementations, the relative performance of different algorithms will vary; for example, table lookup may outperform other algorithms on a system with a fast memory system (cache), but not on a system with a slower memory system.
* For AArch64, you could compare the performance on AArchie against another 64-bit ARM system, such as a Raspberry Pi 3 or 4 (running in 64-bit mode) or an ARM Chromebook.
* For x86_64, you could compare the performance of different processors, such as portugal.cdot.systems, your own laptop or desktop, and Seneca systems such as Matrix, Zenit, or lab desktops.
=== Things to consider ===
==== Design of Your Tests ====
* Most solutions for a problem of this type involve generating a large amount of data in an array, processing that array using the function being evaluated, and then storing that data back into an array. The test setup can take more time than the actual test! Make sure that you measure ONLY the time taken by the code in question (the part that scales the sound samples) -- you need to be able to remove the rest of the processing time from your evaluation.
* You may need to run a very large amount of sample data through the function to be able to detect its performance. Feel free to edit the sample count in <code>vol.h</code> as necessary.
* If you do not use the output from your calculation, the compiler may recognize that the computation has no observable effect and remove the code you're trying to test. Be sure to process the results in some way (e.g., do something with the output array) so that the optimizer preserves the code you want to test. It is a good idea to calculate some sort of verification value to ensure that the different approaches generate the same results; a sketch of this technique appears after this list.
* Be aware of what other tasks the system is handling during your test run, including software running on behalf of other users.
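One common way to keep the optimizer from discarding the scaling code, and to check that the different algorithms agree, is to compute and print a simple sum of the output. This is a sketch under the assumption that the scaled output is held in an <code>int16_t</code> array:

 #include <stdint.h>
 #include <stdio.h>
 
 /* Summing (and printing) the output forces the compiler to keep the scaling
    code, and the total doubles as a cheap verification value: algorithms that
    produce identical samples will print identical sums. */
 int64_t output_checksum(const int16_t *out, int count) {
     int64_t sum = 0;
     for (int i = 0; i < count; i++) {
         sum += out[i];
     }
     return sum;
 }
 
 /* Usage, after the scaling step:
    printf("Result checksum: %lld\n", (long long)output_checksum(out, count)); */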
==== Analyzing Results ====
* What is the impact of various optimization levels on the software performance? (For example, -O0 / -O1 / -O2 / -O3)
* Does the distribution of the test data matter? (For example, is there any difference if there are no large numbers, or no negative numbers?)
* If samples are fed at CD rate (44100 samples per second x 2 channels x 2 bytes per sample), can each of the algorithms keep up?
* What is the memory footprint of each approach?
* What is the performance of each approach?
* What is the energy consumption of each approach? (What information do you need to calculate this?)
* Various machines within an architecture have very different performance profiles, energy consumption, and hardware costs, so it is not reasonable to compare absolute performance between machines, but it is reasonable to compare the relative performance of the algorithms in each context. Does the ratio of performance of the various approaches remain constant across machines? Why or why not?
* What other optimizations can be applied to this problem?

=== Tips ===
{{Admon/tip|Analysis|Do a thorough analysis of the results. Be certain (and prove!) that your performance measurement ''does not'' include the generation or summarization of the test data. Do multiple runs and discard the outliers. Decide whether to use mean, minimum, or maximum time values from the multiple runs, and explain why you made that decision. Control your variables well. Show relative performance as percentage change, e.g., "this approach was NN% faster than that approach".}}
{{Admon/tip|Time and Memory Usage of a Program|You can get basic timing information for a program by running <code>time ''programName''</code> -- the output will show the total time taken (real), the amount of CPU time used to run the application (user), and the amount of CPU time used by the operating system on behalf of the application (system). The version of the <code>time</code> command located in <code>/bin/time</code> gives slightly different information than the version built in to bash, including maximum resident memory usage: <code>/bin/time ''programName''</code>}}
{{Admon/tip|SOX|If you want to try this with actual sound samples, you can convert a sound file of your choice to raw 16-bit signed integer PCM data using the [http://sox.sourceforge.net/ sox] utility, which is present on most Linux systems and available for a wide range of platforms.}}
{{Admon/tip|Stack Limit|Fixed-size, non-static arrays will be placed in the stack space. The size of the stack space is controlled by per-process limits, inherited from the shell, and adjustable with the <code>ulimit</code> command. Allocating an array larger than the stack size limit will cause a segmentation fault, usually on the first write. To see the current stack limit, use <code>ulimit -s</code> (the displayed value is in KB; the default is usually 8192 KB or 8 MB). To set the current stack limit, place a new size in KB or the keyword <code>unlimited</code> after the <code>-s</code> argument.<br /><br />Alternate (and preferred) approach, as used in the provided sample code: allocate the array space with <code>malloc()</code> or <code>calloc()</code>.}}
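A minimal sketch of the heap-allocation approach described in the tip above (the sample count and variable names are placeholders, not the definitions from <code>vol.h</code>):

 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 
 #define NUM_SAMPLES 250000000   /* placeholder: a large sample count */
 
 int main(void) {
     /* "int16_t samples[NUM_SAMPLES];" here would exceed the default 8 MB
        stack limit; allocating on the heap avoids the stack size limit. */
     int16_t *samples = calloc(NUM_SAMPLES, sizeof(int16_t));
     if (samples == NULL) {
         perror("calloc");
         return 1;
     }
     /* ... generate, scale, and verify the samples here ... */
     free(samples);
     return 0;
 }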
{{Admon/tip|stdint.h|The <code>stdint.h</code> header provides definitions for many specialized integer size types. Use <code>int16_t</code> for 16-bit signed integers and <code>uint16_t</code> for 16-bit unsigned integers.}}
{{Admon/tip|Scripting|Use bash scripting capabilities to reduce tedious manual steps!}}