CDOT Wiki β

SPO600 Algorithm Selection Lab

1,899 bytes added, 09:43, 1 November 2023
[[Category:SPO600 Labs- Retired]]{{Admon/lab|Purpose of this Lab|In this lab, you will investigate the impact of different algorithms which produce the same effect. You will test and select one of three algorithms for adjusting the volume of PCM audio samples based on benchmarking.}}
{{Admon/important|x86_64 and AArch64 Systems|This lab must be performed on both x86_64 and AArch64 systems. You may use the [[SPO600 Servers]] or you may use other system(s) -- it might make sense to use your own x86_64 system and [[SPO600_Servers#AArch64:_israel.cdot.systems|israel.cdot.systems]] for AArch64.}}
== Project Stage 1 (aka Lab 6) 5 ==
=== Background ===
* Digital sound is typically represented, uncompressed, as signed 16-bit integer signal samples. There are two streams of samples, one each for the left and right stereo channels, at typical sample rates of 44.1 or 48 thousand samples per second (kHz) per channel, for a total of 88.2 or 96 thousand samples per second. Since there are 16 bits (2 bytes) per sample, the data rate is 88.2 * 1000 * 2 = 176,400 bytes/second (~172 KiB/sec) or 96 * 1000 * 2 = 192,000 bytes/second (~187.5 KiB/sec).
* To change the volume of sound, each sample can be scaled (multiplied) by a volume factor, in the range of 0.00 (silence) to 1.00 (full volume).
* On a mobile device, the amount of processing required to scale sound will affect battery life.
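The data-rate arithmetic and the basic scaling operation described above can be sketched in C. This is a minimal illustration, not code taken from the lab's vol*.c files: the function names <code>pcm_bytes_per_second()</code> and <code>scale_samples()</code> are hypothetical, and the plain floating-point multiply is simply the most direct way to apply a volume factor.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per second of uncompressed PCM audio:
   samples per second per channel * channels * bytes per sample. */
long pcm_bytes_per_second(long sample_rate_hz, int channels, int bytes_per_sample) {
    return sample_rate_hz * channels * bytes_per_sample;
}

/* Scale each signed 16-bit sample by a volume factor in [0.0, 1.0].
   Because the factor is at most 1.0, the product always fits back into
   an int16_t; the cast truncates the fractional part toward zero. */
void scale_samples(int16_t *samples, size_t count, float volume) {
    for (size_t i = 0; i < count; i++) {
        samples[i] = (int16_t)(samples[i] * volume);
    }
}
```

For example, <code>pcm_bytes_per_second(44100, 2, 2)</code> returns 176400, matching the figure above, and scaling the samples {1000, -1000} by 0.5 yields {500, -500}.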
# vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build for x86_64.
# vol5.c uses SIMD instructions accessed through compiler intrinsics. This program is also specific to AArch64.
 
'''Note that vol4.c and vol5.c will build only on AArch64 systems because they use architecture-specific SIMD instructions.'''
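To illustrate what "SIMD instructions accessed through compiler intrinsics" looks like, here is a hedged sketch; it is not the contents of vol5.c. It assumes the volume is held in Q15 fixed point (an assumption made for this example only) and, when compiled for AArch64 (detected with the <code>__aarch64__</code> macro), uses the NEON intrinsics <code>vld1q_s16</code>, <code>vqdmulhq_s16</code> and <code>vst1q_s16</code> from <code>&lt;arm_neon.h&gt;</code> to process 8 samples per instruction. Unlike the lab's programs, it falls back to a scalar loop elsewhere so that it also builds on x86_64. The helper names <code>volume_to_q15()</code> and <code>scale_q15()</code> are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Convert a volume factor to Q15 fixed point (0.5 -> 16384).
   1.0 itself is not representable in Q15, so it is clamped to INT16_MAX. */
int16_t volume_to_q15(float volume) {
    if (volume >= 1.0f) return INT16_MAX;
    if (volume <= 0.0f) return 0;
    return (int16_t)(volume * 32768.0f);
}

/* Scale samples by a Q15 volume. vqdmulhq_s16 is a saturating doubling
   multiply returning the high half, i.e. (2 * a * b) >> 16 == (a * b) >> 15,
   applied to 8 lanes at once. The scalar loop computes the same value for
   leftover samples, or for all samples on non-AArch64 builds. */
void scale_q15(int16_t *samples, size_t count, int16_t vol_q15) {
    size_t i = 0;
#if defined(__aarch64__)
    int16x8_t vol = vdupq_n_s16(vol_q15);          /* broadcast volume to 8 lanes */
    for (; i + 8 <= count; i += 8) {
        int16x8_t s = vld1q_s16(samples + i);      /* load 8 samples */
        vst1q_s16(samples + i, vqdmulhq_s16(s, vol)); /* scale and store */
    }
#endif
    for (; i < count; i++) {
        /* Assumes >> on negative values is an arithmetic shift (true for GCC/Clang). */
        samples[i] = (int16_t)(((int32_t)samples[i] * vol_q15) >> 15);
    }
}
```

One design consequence worth noting: the shift rounds toward negative infinity, whereas a float-to-integer cast truncates toward zero, so for some negative samples a fixed-point result can differ by 1 from a floating-point version.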
=== Don't Compare Across Machines ===
In this lab, ''do not'' compare the relative performance across different machines, because various systems have different microarchitectures, memory configurations, peripheral implementations, and clock speeds, from mobile-class to server-class (e.g., Intel Atom vs. Xeon; AMD APU vs. Threadripper; ARM Cortex-A35/A55 vs. Neoverse-V1/V2).
However, ''do'' compare the relative performance of the various algorithms on the ''same'' machine.
** What background operations are being performed?
** How does your login on the machine affect performance (e.g., network activity)?
* <span style="background: #ffff00">Isolate the performance of the volume scaling code. </span> This is one of the most important parts of this lab! There are two practical approaches:
** Subtract the performance of the dummy version of the program from each of the other versions, or
** Add code to the program to measure and report just the performance of the volume-scaling code
* Take some measure of the amount of variation of your results (e.g., tolerance limits or standard deviation).
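The second isolation approach (instrumenting the program itself) can be sketched with POSIX <code>clock_gettime()</code> and <code>CLOCK_MONOTONIC</code>: read the clock immediately before and after the scaling loop and nothing else. This is an assumption-laden illustration, not the lab's own instrumentation; <code>scale_loop()</code>, <code>time_scaling()</code> and <code>summarize_runs()</code> are hypothetical names. Minimum and maximum are used as a simple measure of spread here; a standard deviation would additionally need <code>sqrt()</code> from <code>&lt;math.h&gt;</code> (linked with <code>-lm</code>).

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the volume-scaling loop being measured. */
static void scale_loop(int16_t *samples, size_t count, float volume) {
    for (size_t i = 0; i < count; i++)
        samples[i] = (int16_t)(samples[i] * volume);
}

/* Time only the scaling loop. Sample generation (setup) and result summing
   must happen before and after this call, outside the timed region. */
double time_scaling(int16_t *samples, size_t count, float volume) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    scale_loop(samples, count, volume);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Summarize repeated runs (runs must be > 0): mean, plus min and max
   as a simple indication of run-to-run variation. */
void summarize_runs(const double *times, size_t runs,
                    double *mean, double *min, double *max) {
    double sum = 0.0;
    *min = *max = times[0];
    for (size_t r = 0; r < runs; r++) {
        sum += times[r];
        if (times[r] < *min) *min = times[r];
        if (times[r] > *max) *max = times[r];
    }
    *mean = sum / (double)runs;
}
```

Calling <code>time_scaling()</code> several times on freshly generated samples and passing the collected times to <code>summarize_runs()</code> gives a mean together with a spread; a wide spread suggests background activity is distorting the measurement.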
=== Resources ===
* ARM AArch64 documentation
** [http://developer.arm.com/ ARM Developer Information Centre]
*** [https://developer.arm.com/docs/den0024/latest ARM Cortex-A Series Programmer’s Guide for ARMv8-A]
*** The ''short'' guide to the ARMv8 instruction set: [https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf ARMv8 Instruction Set Overview] ("ARM ISA Overview")
*** The ''long'' guide to the ARMv8 instruction set: [https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile] ("ARM ARM")
** [https://developer.arm.com/docs/ihi0055/latest/procedure-call-standard-for-the-arm-64-bit-architecture Procedure Call Standard for the ARM 64-bit Architecture (AArch64)]
* x86_64 Documentation
** AMD: https://developer.amd.com/resources/developer-guides-manuals/ (see the AMD64 Architecture section, particularly the ''AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions'')
** Intel: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
** Web sites
*** http://ref.x86asm.net/
*** http://sandpile.org/
* Assembler and Inline Assembler
** [[Assembler Basics]]
** [[Inline Assembly Language]]
** GAS Manual - Using as, The GNU Assembler: https://sourceware.org/binutils/docs/as/
*** Specifically, the [http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Extended-Asm.html Assembler Instructions with C Expression Operands] section

=== Benchmarking ===
Get the files for this lab from one of the [[SPO600 Servers]] -- but you can perform the lab wherever you want (feel free to use your laptop or home system). Test on both an x86_64 and an AArch64 system.
#** If there is a difference, is it significant enough to matter?
#* Change the number of samples so that each program takes a reasonable amount of time to execute (suggested minimum is 20 seconds).
# Test the performance of each program (vol0 through vol3 on x86_64, and vol0 through vol5 on AArch64).
#* Find a way to measure performance ''without'' the time taken to perform the test setup pre-processing (generating the samples) and post-processing (summing the results) so that you can measure ''only'' the time taken to scale the samples. '''This is the hard part!'''
#* How much time is spent scaling the sound samples?
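The arithmetic behind the dummy-subtraction approach and the sample-count adjustment can be sketched as a few helper functions. These are hypothetical helpers written for illustration (<code>net_scaling_seconds()</code>, <code>ns_per_sample()</code>, <code>samples_for_target()</code>), and they rest on two assumptions: that setup and summing cost the same in the dummy version as in the real one, and that run time grows roughly linearly with the number of samples.

```c
#include <stddef.h>

/* Estimate scaling time by subtracting the run time of the dummy version
   (same setup and summing, no scaling) from the run time of a real version. */
double net_scaling_seconds(double version_seconds, double dummy_seconds) {
    return version_seconds - dummy_seconds;
}

/* Express a net scaling time as nanoseconds per sample, which makes runs
   with different sample counts easier to compare. */
double ns_per_sample(double net_seconds, size_t samples) {
    return net_seconds * 1e9 / (double)samples;
}

/* Estimate the sample count needed for a run to last target_seconds,
   assuming run time is roughly proportional to the number of samples. */
size_t samples_for_target(size_t current_samples, double current_seconds,
                          double target_seconds) {
    return (size_t)((double)current_samples * (target_seconds / current_seconds));
}
```

For example, if a test run with 1,000,000 samples takes 2 seconds, <code>samples_for_target(1000000, 2.0, 20.0)</code> suggests 10,000,000 samples to reach the suggested 20-second minimum. Because both the version and the dummy run vary from run to run, the subtracted result inherits the variation of both measurements.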
Blog about your experiments with a detailed analysis of your results, including memory usage, performance, accuracy, and trade-offs. Include answers to all of the questions marked with Q: in the source code.
'''Make sure you convincingly <u>prove</u> your results to your reader!''' Re-read the [[#Important.21|section marked ''Important'' above]] and make sure you address the issues explained there. Also be sure to explain what you're doing so that a reader coming across your blog post understands the context (in other words, don't just jump into a discussion of optimization results -- give your post some context).
'''Optional - Recommended:''' Compare results across several '''implementations''' of AArch64 and x86_64 systems. Note that on different CPU implementations, the relative performance of different algorithms will vary; for example, table lookup may outperform other algorithms on a system with a fast memory system (cache), but not on a system with a slower memory system.
* For AArch64, you could compare the performance on AArchie against a Raspberry Pi 4 (in 64-bit mode) or an ARM Chromebook.
* For x86_64, you could compare the performance of different processors, such as xerxes or portugal.cdot.systems, your own laptop or desktop, and Seneca systems such as Matrix or lab desktops.
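Table lookup, mentioned above as an algorithm whose relative performance depends on the memory system, can be sketched as follows. This is a hypothetical illustration, not one of the lab's programs; <code>build_volume_table()</code> and <code>scale_with_table()</code> are assumed names. It precomputes the scaled value for every possible 16-bit input, so scaling becomes a memory load instead of a multiply, and its speed then depends on how well the 128 KiB table stays in cache.

```c
#include <stddef.h>
#include <stdint.h>

/* One precomputed output for every possible int16_t input:
   65536 entries * 2 bytes = 128 KiB, indexed by (sample + 32768). */
static int16_t volume_table[65536];

/* Fill the table once per volume setting; this cost is paid up front,
   before any samples are scaled. */
void build_volume_table(float volume) {
    for (int32_t s = INT16_MIN; s <= INT16_MAX; s++) {
        volume_table[s + 32768] = (int16_t)(s * volume);
    }
}

/* Scale samples with one table load each; no multiplication at scaling time. */
void scale_with_table(int16_t *samples, size_t count) {
    for (size_t i = 0; i < count; i++) {
        samples[i] = volume_table[samples[i] + 32768];
    }
}
```

The trade-off is memory traffic for arithmetic: on a core with a small or slow cache, the scattered table reads can cost more than the multiply they replace.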
=== Things to consider ===