Benchmarking

{{Chris Tyler Draft}}[[Category:SPO600]]
A '''bench mark''' was originally a surveyor's mark used to record a reference elevation. A surveyor would literally mark a pillar, post, or stone with the reference elevation, which would correspond to the height at which they would place their bench (platform for measuring equipment) to measure other elevations. (See the [http://en.wiktionary.org/wiki/benchmark#Etymology etymology of benchmark at Wiktionary].)

== Benchmarking Software ==

In the software industry, benchmarking means measuring performance in a reliable, repeatable way. This is done to compare the relative performance of:
* two versions of the same software (to gauge the effect of changes made to the software);
* two different pieces of software which do the same thing (e.g., two webservers);
* the same software running with different libraries or operating systems (e.g., Apache under Windows and OS X);
* the same software built in two different ways (e.g., using different compilers or optimization options; see the sketch after this list); or
* the same software on two different computers (x86_64 vs mainframe) or computer configurations (SSD vs. hard disk, or 8GB RAM vs 64GB RAM).
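
As an example of the fourth comparison above, the same program can be built at two different optimization levels and each binary timed on identical input. This is a minimal sketch only; <code>myprog.c</code> and <code>input.dat</code> are hypothetical placeholder names:

 # Build the same program two ways, then time each binary on identical input.
 gcc -O0 -o myprog-O0 myprog.c     # unoptimized build
 gcc -O3 -o myprog-O3 myprog.c     # optimized build
 time ./myprog-O0 < input.dat > /dev/null
 time ./myprog-O3 < input.dat > /dev/null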

== Factors to Control ==

In order to produce reliable, repeatable results, variables must be controlled or eliminated. The most common variables affecting performance results on a system are:
* the data being processed;
* the state of caches;
* other activity on the system (other processes, users, network activity, and so forth).
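
Two of these factors can be controlled directly: caches can be put into a known state, either by performing a discarded warm-up run (for warm-cache measurements) or by dropping the kernel caches (for cold-cache measurements), and unneeded services can be stopped to reduce other activity. A minimal sketch, run as root (the service name here is just an example; stop whichever daemons are active on your system):

 # Quiet the system and reset caches before a cold-cache benchmark run.
 systemctl stop crond                 # stop an unneeded daemon (example service name)
 sync                                 # flush dirty pages to disk
 echo 3 > /proc/sys/vm/drop_caches    # drop page cache, dentries, and inodes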

== Typical Benchmark Process ==

=== Execution-time Benchmarks ===
# Decide on the processing to be benchmarked. It is best to avoid all human interaction (user interfaces) and to use data that is consistent (data sets should be provided by a file, random numbers should be generated by a PRNG given identical seeds, etc.). Pick a data set size that is large enough to yield measurable, repeatable execution times.
# Disable any unnecessary background processing (daemons, cron jobs, screen sessions, and so forth).
# Warm up disk and network caches by doing an initial program run and discarding the results.
# Execute the benchmark process several times, recording the execution time for each run. If the results are not consistent, determine why and eliminate the variation.

{{Admon/tip|Timing Commands on a Linux System|You can record the execution time of a command on a Linux system using the <code>time</code> command. However, this includes command startup and shutdown time, and is not suitable for all purposes.}}
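
Putting these steps together, here is a minimal sketch that performs one discarded warm-up run followed by five timed runs; <code>./myprog</code> and <code>input.dat</code> are hypothetical placeholders:

 # One discarded warm-up run, then five timed runs.
 ./myprog < input.dat > /dev/null                    # warm-up; result discarded
 for i in 1 2 3 4 5 ; do
     /usr/bin/time -f "%e" ./myprog < input.dat > /dev/null
 done

Note that <code>/usr/bin/time</code> (GNU time) is used rather than the shell builtin so that the <code>-f</code> format option is available; <code>%e</code> prints elapsed wall-clock seconds. As the tip above notes, this measures the whole process, including startup and shutdown.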

=== Volume-of-Work Benchmarks ===
For some workloads, such as serving web pages or providing remote storage, it may be more appropriate to determine how much work (e.g., web requests) can be completed in a given amount of time.

# Decide on the processing to be benchmarked, and which program will be used to generate the test load (e.g., a program to request web pages, such as httpbench, or a program to generate storage requests, such as bonnie++).
# Decide whether the load generator should be run on the same system as the server, or on another network-connected system (ensure that the network connection is fast enough that it will not be the limiting factor).
# Set up the server.
# Run the benchmark several times. Discard the first result (it may be affected by cache state).
# If the results are not consistent, determine why and eliminate the variation.
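
As a concrete sketch, here is one way to run such a benchmark using ApacheBench (<code>ab</code>) as the load generator; any of the tools mentioned above could be substituted, and the URL is a placeholder:

 # First run warms caches and is discarded; the next three are recorded.
 # -n = total requests, -c = concurrent requests.
 ab -n 10000 -c 50 http://server.example.com/
 for i in 1 2 3 ; do
     ab -n 10000 -c 50 http://server.example.com/
 done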

== Comparing Different Systems ==

To compare benchmarks on different systems or with different software, it is important to configure the systems as similarly as possible. Doing so is left as an exercise for the reader :-)
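
One practical first step is to record each system's configuration alongside the results, so that any remaining differences are at least documented; a minimal sketch:

 # Capture basic system configuration for later comparison.
 uname -a        # kernel version and architecture
 lscpu           # CPU model, core count, and clock speed
 free -h         # installed memory
 gcc --version   # compiler version (relevant when comparing builds)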
