SPO600 Algorithm Selection Lab
[[Category:SPO600 Labs - Retired]]{{Admon/lab|Purpose of this Lab|In this lab, you will investigate the impact of different algorithms which produce the same effect.}}
{{Admon/important|x86_64 and AArch64 Systems|This lab must be performed on both x86_64 and AArch64 systems. You may use the [[SPO600 Servers]] or you may use other system(s) -- it might make sense to use your own x86_64 system and [[SPO600_Servers#AArch64:_israel.cdot.systems|israel.cdot.systems]] for AArch64.}}
  
 
 
== Lab 5 ==
  
 
 
=== Background ===
* Digital sound is typically represented, uncompressed, as signed 16-bit integer signal samples. There are two streams of samples, one each for the left and right stereo channels, at typical sample rates of 44.1 or 48 thousand samples per second (44.1 or 48 kHz) per channel, for a total of 88.2 or 96 thousand samples per second. Since there are 16 bits (2 bytes) per sample, the data rate is 88.2 * 1000 * 2 = 176,400 bytes/second (~172 KiB/sec) or 96 * 1000 * 2 = 192,000 bytes/second (~187.5 KiB/sec).
* To change the volume of sound, each sample can be scaled (multiplied) by a volume factor, in the range of 0.00 (silence) to 1.00 (full volume).
 
* On a mobile device, the amount of processing required to scale sound will affect battery life.
  
=== Multiple Approaches ===
  
Six programs are provided, each with a different approach to the problem, named <code>vol0.c</code> through <code>vol5.c</code>. A header file, <code>vol.h</code>, defines how much data (the number of samples) will be processed by each program, as well as the volume level to be used for scaling (50%).
  
These are the six programs:
# vol0.c is the basic or naive algorithm. This approach multiplies each sound sample by the volume scaling factor, casting from signed 16-bit integer to floating point and back again. Casting between integer and floating point can be an [[Expensive|expensive]] operation.
# vol1.c does the math using fixed-point calculations. This avoids the overhead of casting between integer and floating point and back again.
# vol2.c pre-calculates all 65,536 possible results, and then looks up the answer for each input value. (An illustrative sketch of these first three scalar approaches appears after this list.)
# vol3.c is a dummy program - it doesn't scale the volume at all. It can be used to determine some of the overhead of the rest of the processing (besides scaling the volume) done by the other programs.
# vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build for x86_64.
# vol5.c uses SIMD instructions accessed through compiler intrinsics. This program is also specific to AArch64.
'''Note that vol4.c and vol5.c will build only on AArch64 systems because they use architecture-specific SIMD instructions.'''
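To make the scalar approaches concrete, here is a minimal illustrative sketch of the first three ideas -- naive floating-point multiplication, fixed-point multiplication, and table lookup. It is ''not'' the provided lab code: the function names, the Q8 scale factor, and the hard-coded 50% volume are assumptions for illustration only, so consult <code>vol0.c</code> through <code>vol2.c</code> for the actual implementations.

<pre>
#include <stdint.h>

#define VOLUME 0.5f   /* assumed 50% volume factor, matching the description of vol.h */

/* Naive approach (vol0-style): cast to floating point, multiply, cast back. */
static int16_t scale_naive(int16_t sample) {
    return (int16_t)(sample * VOLUME);
}

/* Fixed-point approach (vol1-style): represent the volume as an integer
   scaled by 256 (Q8), multiply, then shift right by 8 bits. The right
   shift of a negative value is an arithmetic shift on these systems.   */
static int16_t scale_fixed(int16_t sample) {
    const int32_t vol_q8 = (int32_t)(VOLUME * 256);
    return (int16_t)((sample * vol_q8) >> 8);
}

/* Lookup-table approach (vol2-style): pre-compute the scaled value for
   every possible 16-bit input once, then index the table per sample.   */
static int16_t table[65536];

static void build_table(void) {
    for (int32_t s = -32768; s <= 32767; s++)
        table[(uint16_t)s] = (int16_t)(s * VOLUME);
}

static int16_t scale_table(int16_t sample) {
    return table[(uint16_t)sample];
}
</pre>

The trade-offs are visible even in this sketch: the fixed-point version avoids the integer/float conversions at the cost of slightly coarser rounding, and the table version replaces the multiplication with a 128 KiB array (65,536 entries x 2 bytes) whose benefit depends heavily on the memory system of the machine.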
=== Don't Compare Across Machines ===
In this lab, ''do not'' compare the relative performance across different machines, because various systems have different microarchitectures, memory configurations, peripheral implementations, and clock speeds, from mobile-class to server-class (e.g., Intel Atom vs. Xeon; AMD APU vs. Threadripper; ARM Cortex-A55 vs. Neoverse-V2).

However, ''do'' compare the relative performance of the various algorithms on the ''same'' machine.

=== Important! ===

The hardest part of this lab, and the most critical component, is being able to separate the performance of the volume scaling code from the rest of the code (which only exists to set up the test of the scaling code). The volume scaling code runs ''very'' quickly, and is dwarfed by the rest of the code.

You '''must''':
* Control variables in your test environment.
** What else is the machine doing while you are testing?
** Who else is logged in to the machine?
** What background operations are being performed?
** How does your login on the machine affect performance (e.g., network activity)?
* <span style="background: #ffff00">Isolate the performance of the volume scaling code.</span> This is one of the most important parts of this lab! There are two practical approaches:
** Subtract the performance of the dummy version of the program from each of the other versions, or
** Add code to the program to measure and report just the performance of the volume-scaling code (a sketch of this approach appears after this list).
* Repeat the tests multiple times to ensure that the results you are getting are consistent, valid, and accurately reflect the performance of the volume scaling code.
** Make sure you are performing enough calculation to give a useful result -- adjust the SAMPLES value in <code>vol.h</code> to a sufficiently high value.
** Discard outliers (unusually high or low results).
** Average the results.
** Take some measure of the amount of variation of your results (e.g., tolerance limits or standard deviation).
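As a starting point for the second of the two practical approaches above (adding measurement code to the program), here is a minimal sketch of timing ''only'' the scaling loop with <code>clock_gettime()</code>. The names <code>in</code>, <code>out</code>, <code>count</code>, and <code>timed_scale</code> are placeholders rather than names from the provided files; the point is simply that the timer is read immediately before and after the scaling loop, so sample generation and result summing are excluded.

<pre>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Time only the scaling loop; placeholder names, naive 50% scaling shown. */
static double timed_scale(const int16_t *in, int16_t *out, size_t count) {
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < count; i++)          /* the code under test */
        out[i] = (int16_t)(in[i] * 0.5f);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("scaling time: %.6f s\n", elapsed);
    return elapsed;
}
</pre>

The dummy-program approach works the same way at a coarser grain: time the whole run of the dummy program and subtract that from the whole-run time of each of the other programs.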
=== Resources ===
* ARM AArch64 documentation
** [http://developer.arm.com/ ARM Developer Information Centre]
*** [https://developer.arm.com/docs/den0024/latest ARM Cortex-A Series Programmer’s Guide for ARMv8-A]
*** The ''short'' guide to the ARMv8 instruction set: [https://www.element14.com/community/servlet/JiveServlet/previewBody/41836-102-1-229511/ARM.Reference_Manual.pdf ARMv8 Instruction Set Overview] ("ARM ISA Overview")
*** The ''long'' guide to the ARMv8 instruction set: [https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile ARM Architecture Reference Manual ARMv8, for ARMv8-A architecture profile] ("ARM ARM")
** [https://developer.arm.com/docs/ihi0055/latest/procedure-call-standard-for-the-arm-64-bit-architecture Procedure Call Standard for the ARM 64-bit Architecture (AArch64)]

* x86_64 documentation
** AMD: https://developer.amd.com/resources/developer-guides-manuals/ (see the AMD64 Architecture section, particularly the ''AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions'')
** Intel: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
** Web sites
*** http://ref.x86asm.net/
*** http://sandpile.org/

* Assembler and Inline Assembler
** [[Assembler Basics]]
** [[Inline Assembly Language]]
** GAS Manual - Using as, The GNU Assembler: https://sourceware.org/binutils/docs/as/
*** Specifically, the [http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Extended-Asm.html Assembler Instructions with C Expression Operands] section

== Benchmarking ==
  
Get the files for this lab from one of the [[SPO600 Servers]] -- but you can perform the lab wherever you want (feel free to use your laptop or home system). Test on both an x86_64 and an AArch64 system.
  
The files for this lab are in the archive <code>/public/spo600-volume-examples.tgz</code> on each of the SPO600 servers. The archive contains:
* <code>vol.h</code> controls the number of samples to be processed and the volume level to be used
* <code>vol0.c</code> through <code>vol5.c</code> implement the various algorithms
* <code>vol_createsample.c</code> contains a function to create dummy samples
* The <code>Makefile</code> can be used to build the programs
  
Perform these steps '''on both x86_64 and AArch64 systems''':
# Unpack the archive <code>/public/spo600-volume-examples.tgz</code>
# Study each of the source code files and make sure that you understand what the code is doing.
# '''Make a prediction''' of the relative performance of each scaling algorithm.
# Build and test each of the programs.
#* Do all of the algorithms produce the same output?
#** How can you verify this?
#** If there is a difference, is it significant enough to matter?
#* Change the number of samples so that each program takes a reasonable amount of time to execute (suggested minimum is 20 seconds).
# Test the performance of each program (vol0 through vol3 on x86_64, and vol0 through vol5 on AArch64).
#* Find a way to measure performance ''without'' the time taken to perform the test setup pre-processing (generating the samples) and post-processing (summing the results) so that you can measure ''only'' the time taken to scale the samples. '''This is the hard part!'''
#* How much time is spent scaling the sound samples?
#* Do multiple runs take the same time? How much variation do you observe? What is the likely cause of this variation?
#* Is there any difference in the results produced by the various algorithms?
#* Does the difference between the algorithms vary depending on the architecture and implementation on which you test?
#* What is the relative memory usage of each program? (One way to report peak memory usage from inside a program is sketched after this list.)
# See if you can measurably increase performance by changing the compiler options (via the <code>Makefile</code>).
# Was your prediction about performance accurate?
# Find all of the questions, marked with <code>'''Q:'''</code>, in the program comments, and answer those questions.
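For the memory-usage question, <code>/bin/time</code> (see the Tips section below) is usually enough, but a program can also report its own peak resident memory. This is a hedged sketch using <code>getrusage()</code>; it is not part of the provided code, and on Linux <code>ru_maxrss</code> is reported in kilobytes.

<pre>
#include <stdio.h>
#include <sys/resource.h>

/* Print the peak resident set size; call this near the end of main(). */
static void report_peak_memory(void) {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) == 0)
        printf("peak resident memory: %ld kB\n", usage.ru_maxrss);
}
</pre>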
  
=== Deliverables ===

Blog about your experiments with a detailed analysis of your results, including memory usage, performance, accuracy, and trade-offs. Include answers to all of the questions marked with Q: in the source code.

'''Make sure you convincingly <u>prove</u> your results to your reader!''' Re-read the [[#Important.21|section marked ''Important'' above]] and make sure you address the issues explained there. Also be sure to explain what you're doing so that a reader coming across your blog post understands the context (in other words, don't just jump into a discussion of optimization results -- give your post some context).

'''Optional - Recommended:''' Compare results across several '''implementations''' of AArch64 and x86_64 systems. Note that on different CPU implementations, the relative performance of different algorithms will vary; for example, table lookup may outperform other algorithms on a system with a fast memory system (cache), but not on a system with a slower memory system.
* For AArch64, you could compare the performance on AArchie against a Raspberry Pi 4 (in 64-bit mode) or an ARM Chromebook.
* For x86_64, you could compare the performance of different processors, such as portugal.cdot.systems, your own laptop or desktop, and Seneca systems such as Matrix or lab desktops.
  
 
=== Things to consider ===

==== Design of Your Tests ====
* Most solutions for a problem of this type involve generating a large amount of data in an array, processing that array using the function being evaluated, and then storing that data back into an array. The test setup can take more time than the actual test! Make sure that you measure the time taken for the code in question (the part that scales the sound samples) ONLY -- you need to be able to remove the rest of the processing time from your evaluation.
* You may need to run a very large amount of sample data through the function to be able to detect its performance.
* If you do not use the output from your calculation (e.g., by doing something with the output array), the compiler may recognize that and remove the code you're trying to test. Be sure to process the results in some way so that the optimizer preserves the code you want to test. It is a good idea to calculate some sort of verification value to ensure that both approaches generate the same results. (A minimal sketch of this idea appears after this list.)
* Be aware of what other tasks the system is handling during your test run, including software running on behalf of other users.
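Here is a minimal sketch of the "use the output" advice above; the names are illustrative and not from the provided code. Accumulating a simple verification value from the output array and printing it forces the compiler to keep the scaling work, and it doubles as a quick check that two algorithms produced the same results.

<pre>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sum the output so the optimizer cannot discard the scaling loop,
   and so different algorithms can be compared for identical output. */
static int64_t checksum(const int16_t *out, size_t count) {
    int64_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += out[i];
    printf("checksum: %lld\n", (long long)sum);
    return sum;
}
</pre>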
=== Tips ===
{{Admon/tip|Analysis|Do a thorough analysis of the results. Be certain (and prove!) that your performance measurement ''does not'' include the generation or summarization of the test data. Do multiple runs and discard the outliers. Decide whether to use mean, minimum, or maximum time values from the multiple runs, and explain why you made that decision. Control your variables well. Show relative performance as percentage change, e.g., "this approach was NN% faster than that approach".}}
  
{{Admon/tip|Time and Memory Usage of a Program|You can get basic timing information for a program by running <code>time ''programName''</code> -- the output will show the total time taken (real), the amount of CPU time used to run the application (user), and the amount of CPU time used by the operating system on behalf of the application (system).
The version of the <code>time</code> command located in <code>/bin/time</code> gives slightly different information than the version built in to bash -- including maximum resident memory usage: <code>/bin/time ''./programName''</code>}}
  
 
 
 
{{Admon/tip|SOX|If you want to try this with actual sound samples, you can convert a sound file of your choice to raw 16-bit signed integer PCM data using the [http://sox.sourceforge.net/ sox] utility present on most Linux systems and available for a wide range of platforms.}}
  
{{Admon/tip|stdint.h|The <code>stdint.h</code> header provides definitions for many specialized integer size types. Use <code>int16_t</code> for 16-bit signed integers and <code>uint16_t</code> for 16-bit unsigned integers.}}
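For example, the sample buffers in this lab are naturally declared with these fixed-width types. This is an illustrative fragment only: the <code>SAMPLES</code> name mirrors the description of <code>vol.h</code> but the value here is an assumption, and heap allocation is shown as one reasonable way to hold large buffers.

<pre>
#include <stdint.h>
#include <stdlib.h>

#define SAMPLES 500000   /* placeholder count; the real value is set in vol.h */

int main(void) {
    /* Heap-allocated 16-bit signed sample buffers. */
    int16_t *in  = calloc(SAMPLES, sizeof(int16_t));
    int16_t *out = calloc(SAMPLES, sizeof(int16_t));
    if (in == NULL || out == NULL)
        return 1;
    /* ... generate, scale, and verify samples here ... */
    free(in);
    free(out);
    return 0;
}
</pre>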
  
{{Admon/tip|Scripting|Use bash scripting capabilities to reduce tedious manual steps!}}
