Dhrystone howto

Latest revision as of 15:25, 30 December 2010

This page is a guide to running the Dhrystone benchmark on ARM machines. Please visit the Main Project Page: http://zenit.senecac.on.ca/wiki/index.php/Supporting_Architectures_above_armv5tel

About Dhrystone

From Wikipedia, the free encyclopedia

Dhrystone is a synthetic computing benchmark program developed in 1984 by Reinhold P. Weicker, intended to be representative of system (integer) programming. Dhrystone grew to become representative of general processor (CPU) performance. Later the CPU89 benchmark suite from the Standard Performance Evaluation Corporation, today known as the "SPECint" suite, was introduced, but SPEC programs are quite expensive whereas Dhrystone is free, so Dhrystone remains popular.

The name "Dhrystone" is a pun on a different benchmark algorithm called Whetstone.

With Dhrystone, Weicker gathered meta-data from a broad range of software, including programs written in FORTRAN, PL/1, SAL, ALGOL 68, and Pascal. He then characterized these programs in terms of various common constructs: procedure calls, pointer indirections, assignments, etc. From this he wrote the Dhrystone benchmark to correspond to a representative mix. Dhrystone was published in Ada, with the C version for Unix developed by Rick Richardson ("version 1.1") greatly contributing to its popularity.

The Dhrystone benchmark contains no floating point operations, thus the name is a pun on the then-popular Whetstone benchmark for floating point operations. The output from the benchmark is the number of Dhrystones per second (the number of iterations of the main code loop per second).

Both Whetstone and Dhrystone are synthetic benchmarks, meaning that they are simple programs that are carefully designed to statistically mimic the processor usage of some common set of programs. Whetstone, developed in 1972, originally strove to mimic typical Algol 60 programs based on measurements from 1970, but eventually became most popular in its Fortran version, reflecting the highly numerical orientation of computing in the 1960s.

Dhrystone Fundamentals

From ARM White Paper

Dhrystone has a number of attributes that have led to it being widely used in the past as a measure of CPU performance. Foremost, Dhrystone is compact, widely available in the public domain, and simple to run. Significantly, there are no lengthy certification processes to go through before citing Dhrystone figures. Dhrystone compares the performance of the processor under benchmark to that of a reference machine. This is an advantage over quoting ‘straight’ MIPS numbers since using a reference machine effectively compensates for differences in the richness of competing instruction sets. For example, literal comparison of the ‘millions of instructions per second’ numbers for a RISC architecture and a CISC architecture is not meaningful.

The industry has adopted the VAX 11/780 as the reference 1 MIPS machine. The VAX 11/780 achieves 1757 Dhrystones per second. The Dhrystone figure is calculated by measuring the number of Dhrystones per second for the system, and dividing that by 1757. So "80 MIPS" means "80 Dhrystone VAX MIPS", which means 80 times faster than a VAX 11/780. A DMIPS/MHz rating takes this normalization process one step further, enabling comparison of processor performance at different clock rates. For all of these reasons, in the past, Dhrystone has been a widely quoted benchmark figure. In theory, Dhrystone should provide a basis for the comparison of processor performances.
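The normalization described above can be sketched in a few lines of Python. This is only an illustration: the function names and the 100 MHz clock in the example are assumptions made here, not part of the white paper; only the 1757 reference score comes from the text.

```python
# Reference score quoted above: Dhrystones per second on the VAX 11/780.
VAX_11_780_DHRYSTONES_PER_SECOND = 1757


def dmips(dhrystones_per_second: float) -> float:
    """Normalize a raw Dhrystone score against the VAX 11/780."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SECOND


def dmips_per_mhz(dhrystones_per_second: float, clock_mhz: float) -> float:
    """Divide DMIPS by the clock rate so different clock speeds can be compared."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return dmips(dhrystones_per_second) / clock_mhz


if __name__ == "__main__":
    # Hypothetical machine scoring 80 times the VAX reference at an assumed 100 MHz.
    score = 80 * VAX_11_780_DHRYSTONES_PER_SECOND
    print(dmips(score))                 # 80.0
    print(dmips_per_mhz(score, 100.0))  # 0.8
```

Keeping the two steps separate mirrors the text: DMIPS compares machines against the VAX, while DMIPS/MHz additionally removes the effect of clock rate.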

However, some of the apparent advantages of Dhrystone are also significant weaknesses of the benchmark. Dhrystone numbers actually reflect the performance of the C compiler and libraries, probably more so than the performance of the processor itself. Also, lack of independent certification means that customers are dependent on processor vendors to quote accurate and meaningful Dhrystone data.

What Dhrystone really does

From Clarify.doc (Included in Dhrystone 2.1), Rick Richardson

  • DHRYSTONE is a measure of processor+compiler efficiency in executing a 'typical' program. The 'typical' program was designed by measuring statistics on a great number of 'real' programs. The 'typical' program was then written by Reinhold P. Weicker using these statistics. The program is balanced according to statement type, as well as data type.
  • DHRYSTONE does not use floating point. Typical programs don't.
  • DHRYSTONE does not do I/O. Typical programs do, but then we'd have a whole can of worms opened up.
  • DHRYSTONE does not contain much code that can be optimized by vector processors. That is why a CRAY doesn't look real fast, they weren't built to do this sort of computing.
  • DHRYSTONE does not measure OS performance, as it avoids calling the O.S. The O.S. is indicated in the results only to help in identifying the compiler technology.
  • DHRYSTONE is not perfect, but is a hell of a lot better than the "sieve", or "SI".
  • DHRYSTONE gives results in dhrystones/second. Bigger numbers are better. As a baseline, the original IBM PC gives around 300-400 dhrystones/second with a good compiler. The fastest machines today are approaching 100,000.

Dhrystone Characteristics

Strengths

  • Written in C (allows code portability)
  • Small in size (An easy to understand program)
  • Single easy to report score (DMIPS which uses a reference VAX MIPS)
  • Potentially useful for 8 and 16-bit microcontroller benchmark

Weaknesses

  • Cannot hope to mimic the breadth of applications encountered by a processor-based system
  • Dhrystone only measures a few mathematical and basic operations
  • Does not measure multiply-accumulate, floating-point, SIMD, or any other type of operations
  • Dhrystone’s execution time is largely spent in standard C library functions, such as strcmp(), strcpy(), and memcpy(). Compiler vendors generally provide these libraries, which are typically optimized and hand-written in assembly language. While you may think you are benchmarking a processor, what you are really benchmarking is the compiler writer’s optimizations of the C library functions for a particular platform

Installation

1. Obtaining the Source Code

One of the most important defects in Dhrystone is that it is often unclear what version is being quoted. Furthermore, since there are no "disclosure rules" or independent certification of scores, companies and individuals are free to state, or not state, anything. Because Dhrystone is non-proprietary, individuals and companies have produced their own modified versions, resulting in various alterations of the original source code.

The following package is the most widely quoted and used Dhrystone release. It is also the cleanest and most customisable version of Dhrystone available on the internet.

Dhrystone-2.1.tar.gz

2. Extract the file

Extract the tarball using the command:

tar xvf dhrystone-2.1.tar.gz -C destination_directory/

There will be a total of 19 files once extracted. Change to the directory that contains the extracted files.
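For readers who prefer to script this step, the following is a minimal Python sketch of the same extraction using the standard tarfile module. The function name is hypothetical, and the sketch assumes the tarball comes from a trusted source; it returns the number of regular files so the count can be compared with the figure above.

```python
import tarfile
from pathlib import Path


def extract_and_count(tarball: str, destination: str) -> int:
    """Extract a tarball into destination and return how many regular files it holds."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    # tarfile.open() detects gzip compression automatically in its default mode.
    with tarfile.open(tarball) as archive:
        archive.extractall(dest)  # only do this with archives you trust
        return sum(1 for member in archive.getmembers() if member.isfile())
```

A call such as extract_and_count("dhrystone-2.1.tar.gz", "destination_directory") would correspond to the tar command shown above.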

3. Edit the Makefile

Open Makefile with any text editor; UNCOMMENT (if commented) then EDIT the following fields using the GIVEN values:

Line #25 Fedora uses -DTIME for the time function; this field is commented out by default

TIME_FUNC=     -DTIME                # Use time(2) for measurement

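Line #28 HZ is the frequency of times(2) clock ticks. This guide sets it to 166, the DDR memory speed of the beagleboardXM in MHz. Because line #25 selects -DTIME rather than -DTIMES, this value should not influence the reported score.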

HZ=             166                  # Frequency of times(2) clock ticks

Line #39 This option is for the generic C compiler

OPTIMIZE=       -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer # Optimization Level (generic UNIX)

Line #40 This option is for the GCC compiler

GCCOPTIM=       -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer

Comment out/disable the following lines:

Line #26

TIME_FUNC=     -DTIMES                # Use times(2) for measurement

Line #38

OPTIMIZE=      -Ox -G2                 # Optimization Level (MSC, 80286)

Makefile snapshot

Dhry21.png


Compiler Optimization Options

For more information, see GCC ARM-Options: http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

The options on lines #39 and #40 optimize the Dhrystone build specifically for the armv7 architecture. These optimizations improve the program's performance; removing them leaves only baseline, unoptimized performance.

4. Run "make"

Running make in the current directory should produce only warnings, not errors. Below is sample output from the make command; the warnings relate to C library functions and can be ignored.

[mjeamiguel@cdot-beagleXM-0-3 dhrystone]$ make
gcc -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer  -DTIME -DHZ=166 dhry_1.c dhry_2.c  -o gcc_dry2
dhry_1.c:31: warning: conflicting types for built-in function ‘malloc’
dhry_1.c: In function ‘main’:
dhry_1.c:98: warning: incompatible implicit declaration of built-in function ‘strcpy’
gcc -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fomit-frame-pointer  -DTIME -DHZ=166 -DREG=register dhry_1.c dhry_2.c  -o gcc_dry2reg
dhry_1.c:31: warning: conflicting types for built-in function ‘malloc’
dhry_1.c: In function ‘main’:
dhry_1.c:98: warning: incompatible implicit declaration of built-in function ‘strcpy’
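
To make the link between the Makefile fields and the two gcc commands in the make output explicit, here is a small Python sketch that assembles the same command lines from the values set in step 3. The function and parameter names are hypothetical, and the ordering of the flags is inferred from the make output shown here, not read from the Makefile rules themselves.

```python
def compile_command(optimize, time_func, hz, output, use_register=False):
    """Assemble a gcc command line from the Makefile field values (inferred layout)."""
    flags = optimize.split() + [time_func, f"-DHZ={hz}"]
    if use_register:
        # The second executable is built with register variables enabled.
        flags.append("-DREG=register")
    return ["gcc", *flags, "dhry_1.c", "dhry_2.c", "-o", output]


# Value given to GCCOPTIM on line #40 of the Makefile.
GCCOPTIM = ("-O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon "
            "-mfloat-abi=softfp -fomit-frame-pointer")

if __name__ == "__main__":
    print(" ".join(compile_command(GCCOPTIM, "-DTIME", 166, "gcc_dry2")))
    print(" ".join(compile_command(GCCOPTIM, "-DTIME", 166, "gcc_dry2reg",
                                   use_register=True)))
```

If an edit to the Makefile did not take effect, comparing the real make output against these two lines is a quick way to spot which field was missed.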

Running the benchmark and gathering results

The make command produces two files named gcc_dry2 and gcc_dry2reg. The author of this version chose to build two Dhrystone executables: one with register variables and one without. Either one will work for the benchmark, so feel free to try both.


1. Run the executable by typing ./gcc_dry2

The program will ask for a number of runs. By convention on newer machines (>=1GHz), about 100000000 (100 million) runs are used. There are no rules or standards for how many runs to use. Some people derive the number of runs from "Dhrystone run time errors", which is beyond the scope of this guide; what matters is the consistency of the result. For consistent results, run dhrystone more than 5 times with the same number of runs.
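
Repeating the run and checking consistency can be automated. The Python sketch below is one possible approach, not part of the Dhrystone package: it assumes the executable reads the number of runs from standard input and prints a "Dhrystones per Second:" line as in the sample result in step 2. All function names are hypothetical.

```python
import re
import statistics
import subprocess

# Matches the score line printed at the end of a Dhrystone run.
SCORE_PATTERN = re.compile(r"Dhrystones per Second:\s*([0-9.]+)")


def parse_score(output: str) -> float:
    """Extract the Dhrystones-per-second figure from the program output."""
    match = SCORE_PATTERN.search(output)
    if match is None:
        raise ValueError("no 'Dhrystones per Second' line found")
    return float(match.group(1))


def run_once(binary: str, runs: int) -> float:
    """Run the benchmark once, answering the 'number of runs' prompt via stdin."""
    result = subprocess.run([binary], input=f"{runs}\n",
                            capture_output=True, text=True, check=True)
    return parse_score(result.stdout)


def relative_spread(scores) -> float:
    """(max - min) / mean; smaller values mean more consistent results."""
    return (max(scores) - min(scores)) / statistics.mean(scores)


def benchmark(binary: str, runs: int, repeats: int = 6):
    """Repeat the benchmark with the same run count and summarize the scores."""
    scores = [run_once(binary, runs) for _ in range(repeats)]
    return statistics.mean(scores), relative_spread(scores)
```

For example, benchmark("./gcc_dry2", 100_000_000) would execute the program six times with 100 million runs each and return the mean score together with its relative spread.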


2. Calculate DMIPS

One common representation of the Dhrystone benchmark is DMIPS (Dhrystone MIPS). It is obtained by dividing the Dhrystone score by 1757 (the number of Dhrystones per second obtained on the VAX 11/780, nominally a 1 MIPS machine).

Given the result:

Microseconds for one run through Dhrystone: 0.8
Dhrystones per Second: 1333333.4

Using the formula:

1333333.4 / 1757 = 758.87 DMIPS


The result shown is from an actual test on a beagleboardXM machine with a 1GHz processor.
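
As a cross-check of the figures above, the short Python sketch below recomputes them from the reported score. The variable names are chosen here for illustration, and the 1000 MHz clock is taken from the 1GHz figure stated for the board; the DMIPS/MHz value is a derived number, not one reported by the program.

```python
REPORTED_DHRYSTONES_PER_SECOND = 1333333.4  # from the program output above
VAX_DHRYSTONES_PER_SECOND = 1757            # VAX 11/780 reference score
CLOCK_MHZ = 1000.0                          # 1GHz beagleboardXM, as stated above

# Time for one run is the reciprocal of the score, expressed in microseconds.
microseconds_per_run = 1_000_000 / REPORTED_DHRYSTONES_PER_SECOND
dmips_score = REPORTED_DHRYSTONES_PER_SECOND / VAX_DHRYSTONES_PER_SECOND
dmips_per_mhz = dmips_score / CLOCK_MHZ

print(f"{microseconds_per_run:.2f} microseconds per run")  # 0.75
print(f"{dmips_score:.2f} DMIPS")                          # 758.87
print(f"{dmips_per_mhz:.2f} DMIPS/MHz")                    # 0.76
```

The reciprocal comes out at about 0.75 microseconds, so the 0.8 in the program output appears to be the same value shown to one decimal place.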

And...

From ARM White paper

"When first released, the Dhrystone benchmark fulfilled a useful function – at least it gave an alternative indicator to vendors’ literal MIPS ratings. However, more than twenty years later, there are undoubtedly better benchmarks available for measuring processor performance."