Difference between revisions of "SPO600 Algorithm Selection Lab"
Chris Tyler (talk | contribs) (→Conclusions) |
Chris Tyler (talk | contribs) |
Purpose of this Lab: In this lab, you will investigate the impact of different algorithms which produce the same effect. You will test and select one of three algorithms for adjusting the volume of PCM audio samples based on benchmarking.
Revision as of 15:09, 24 September 2018
Lab 5
Background
- Digital sound is typically represented, uncompressed, as signed 16-bit integer signal samples. There is one stream of samples for each of the left and right stereo channels, at typical sample rates of 44.1 or 48 thousand samples per second per channel, for a total of 88.2 or 96 thousand samples per second. Since there are 16 bits (2 bytes) per sample, the data rate is 88.2 * 1000 * 2 = 176,400 bytes/second (~172 KiB/sec) or 96 * 1000 * 2 = 192,000 bytes/second (~187.5 KiB/sec).
- To change the volume of sound, each sample can be scaled by a volume factor, in the range of 0.00 (silence) to 1.00 (full volume).
- On a mobile device, the amount of processing required to scale sound will affect battery life.
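The scaling described above can be sketched in C as follows. This is a minimal illustration, not the code from the lab archive: the function name `scale_samples` and its parameters are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not taken from vol1.c): scale each signed 16-bit
 * sample by a volume factor in the range 0.0 (silence) to 1.0 (full volume). */
static void scale_samples(const int16_t *in, int16_t *out, size_t count,
                          float volume)
{
    for (size_t i = 0; i < count; i++) {
        /* The sample is promoted to float, multiplied, and then truncated
         * toward zero when converted back to a 16-bit integer. */
        out[i] = (int16_t)(in[i] * volume);
    }
}
```

Every sample costs one integer-to-float conversion, one floating-point multiply, and one float-to-integer conversion, which is the per-sample work the alternate approaches try to reduce.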
Basic Sound Scale Program
Perform this lab on one of the ARMv8 AArch64 SPO600 Servers.
- Unpack the archive /public/spo600-20181-algorithm-selection-lab.tgz
- Examine the vol1.c source code. This program:
  - Creates 500,000 random "sound samples" in an input array (the number of samples is set in the vol.h file).
  - Scales those samples by the volume factor 0.75 and stores them in an output array.
  - Sums the output array and prints the sum.
- Build and test this file.
  - Does it produce the same output each time?
- Test the performance of this program. Adjust the number of samples as necessary.
  - How long does it take to run?
  - How much time is spent scaling the sound samples?
  - Do multiple runs take the same time?
Alternate Approaches
Try these alternate approaches to scaling the sound samples by modifying copies of vol1.c. Edit the Makefile to build your modified programs as well as the original. Test each approach to see the performance impact:
- Pre-calculate a lookup table (array) of all possible sample values multiplied by the volume factor, and look up each sample in that table to get the scaled values.
- Convert the volume factor 0.75 to a fixed-point integer by multiplying by a binary number representing a fixed-point value "1". For example, you could use 0b100000000 (= 256 in decimal) to represent 1.00. Shift the result to the right the required number of bits after the multiplication (>>8 if you're using 256 as the multiplier).
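The two approaches above might look something like the following sketch. All names here (`build_table`, `scale_lookup`, `scale_fixed`, and the macros) are hypothetical and do not come from the lab archive; the fixed-point version assumes 256 is used to represent 1.00, as in the example above.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE  65536   /* one entry per possible 16-bit sample value */
#define FIXED_ONE   256     /* fixed-point representation of 1.00 */
#define FIXED_SHIFT 8       /* log2(FIXED_ONE) */

static int16_t table[TABLE_SIZE];

/* Approach 1: precompute sample * volume for every possible sample value. */
static void build_table(float volume)
{
    for (int32_t s = INT16_MIN; s <= INT16_MAX; s++) {
        /* Index the table by the sample's unsigned 16-bit pattern. */
        table[(uint16_t)s] = (int16_t)(s * volume);
    }
}

/* Scaling becomes a single memory read per sample. */
static inline int16_t scale_lookup(int16_t sample)
{
    return table[(uint16_t)sample];
}

/* Approach 2: multiply by an integer volume factor, then shift right.
 * For a volume of 0.75, volume_fixed is 0.75 * 256 = 192.
 * Note: right-shifting a negative value is implementation-defined in C;
 * compilers such as GCC use an arithmetic shift, which rounds toward
 * negative infinity rather than toward zero. */
static inline int16_t scale_fixed(int16_t sample, int32_t volume_fixed)
{
    return (int16_t)((sample * volume_fixed) >> FIXED_SHIFT);
}
```

With these sizes the lookup table occupies 65,536 entries of 2 bytes each, or 128 KiB, which matters when comparing memory footprint. Because the shift rounds differently from a float-to-integer conversion, the fixed-point result can differ by 1 from the other approaches for some negative samples, so a verification sum may not match exactly.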
Conclusions
Blog about your experiments with an analysis of your results. Do a detailed analysis, including memory usage, time performance, and other trade-offs.
Important! -- explain what you're doing so that a reader coming across your blog post understands the context (in other words, don't just jump into a discussion of optimization results -- give your post some context).
Optional - Recommended: Compare results across several implementations of AArch64 and x86_64 systems. Note that on different implementations, the relative performance of different algorithms will vary; for example, table lookup may outperform other algorithms on a system with a fast memory system (cache), but not on a system with a slower memory system.
- For AArch64, you could compare the performance of Cortex-A57 octa-core CPU (on aarchie) against the APM XGene-1 octa-core CPUs (on bbetty or ccharlie), or against Cortex-A53 cores (e.g., on a Raspberry Pi 3, or on ddouglas).
- For x86_64, you could compare the performance of different processors, such as xerxes, your own laptop or desktop, and Seneca systems such as Matrix, Zenit, or lab desktops.
Things to consider
Design of Your Tests
- Most solutions for a problem of this type involve generating a large amount of data in an array, processing that array using the function being evaluated, and then storing that data back into an array. Make sure that you measure the time taken in the test function only -- you need to be able to remove the rest of the processing time from your evaluation.
- You may need to run a very large amount of sample data through the function to be able to detect its performance. Feel free to edit the sample count in vol.h as necessary.
- If you do not use the output from your calculation (e.g., do something with the output array), the compiler may recognize that, and remove the code you're trying to test. Be sure to process the results in some way so that the optimizer preserves the code you want to test. It is a good idea to calculate some sort of verification value to ensure that all of the approaches generate the same results.
- Be aware of what other tasks the system is handling during your test run.
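One way to time only the scaling step, while still using the output so the optimizer keeps the loop, is sketched below. It assumes a POSIX system that provides `clock_gettime()` with `CLOCK_MONOTONIC`; the function names `elapsed_seconds` and `time_scaling` are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request clock_gettime() in strict modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Convert a pair of timestamps into elapsed seconds. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical harness: times only the scaling loop and returns the elapsed
 * time in seconds. The checksum is computed outside the timed region. */
static double time_scaling(const int16_t *in, int16_t *out, size_t count,
                           float volume, int64_t *checksum)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < count; i++) {
        out[i] = (int16_t)(in[i] * volume);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Summing the output gives the compiler a reason to keep the loop above
     * and doubles as a verification value for comparing approaches. */
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += out[i];
    }
    *checksum = sum;

    return elapsed_seconds(start, end);
}
```

Generating the random input before calling such a function keeps the data setup out of the measurement. Running it several times and comparing the results helps reveal interference from other tasks on the system.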
Analyzing Results
- What is the impact of various optimization levels on the software performance?
- Does the distribution of data matter?
- If samples are fed at CD rate (44100 samples per second x 2 channels x 2 bytes per sample), can each of the algorithms keep up?
- What is the memory footprint of each approach?
- What is the performance of each approach?
- What is the energy consumption of each approach? (What information do you need to calculate this?)
- aarchie and bbetty have different performance profiles, so it's not reasonable to compare performance between the machines, but it is reasonable to compare the relative performance of the algorithms in each context. Do you get similar results?
- What other optimizations can be applied to this problem?
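For the CD-rate question above, the arithmetic can be sketched as follows. The macro and function names are hypothetical; the sample count and elapsed time passed to `realtime_ratio` would come from your own measurements.

```c
#include <stdint.h>

#define CD_SAMPLE_RATE   44100   /* samples per second per channel */
#define CD_CHANNELS      2       /* left and right */
#define BYTES_PER_SAMPLE 2       /* 16-bit samples */

/* Samples that must be scaled each second: 44100 * 2 = 88,200. */
static uint32_t cd_samples_per_second(void)
{
    return CD_SAMPLE_RATE * CD_CHANNELS;
}

/* Bytes that must be processed each second: 88,200 * 2 = 176,400. */
static uint32_t cd_bytes_per_second(void)
{
    return cd_samples_per_second() * BYTES_PER_SAMPLE;
}

/* Ratio of measured throughput to the CD requirement. A value above 1.0
 * means the algorithm processed samples faster than CD playback needs. */
static double realtime_ratio(uint64_t samples_processed, double elapsed_secs)
{
    double measured = (double)samples_processed / elapsed_secs;
    return measured / (double)cd_samples_per_second();
}
```

For example, an algorithm that scales 882,000 samples in one second would have a ratio of about 10, meaning it needs roughly a tenth of the processor time to keep up with CD-rate audio.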