SPO600 Vectorization Lab
Purpose of this Lab: This lab is designed to explore single instruction/multiple data (SIMD) vectorization, and the auto-vectorization capabilities of the GCC compiler.
Tiny Lab: This is intended to be a very short lab. Don't overcomplicate it!
Important: This lab is not used in the current semester. Please refer to the other labs in the SPO600 Labs category.
− | |||
− | # Write a short program that creates two 1000-element integer arrays and fills them with random numbers, then sums those two arrays to a third array, and finally sums the third array | + | == Optional Lab (Recommended!) == |
− | # Compile this program on [[SPO600 Servers | + | |
+ | # Write a short program that creates two 1000-element integer arrays and fills them with random numbers in the range -1000 to +1000, then sums those two arrays element-by-element to a third array, and finally sums the third array and prints the result. | ||
+ | # Compile this program on one of the AArch64/ARM64 [[SPO600 Servers]] in such a way that the code is auto-vectorized. | ||
# Annotate the emitted code (i.e., obtain a dissassembly via <code>objdump -d</code> and add comments to the instructions in <code><main></code> explaining what the code does). | # Annotate the emitted code (i.e., obtain a dissassembly via <code>objdump -d</code> and add comments to the instructions in <code><main></code> explaining what the code does). | ||
− | # | + | # Write a blog post discussing your findings. Include: |
− | |||
#* The source code | #* The source code | ||
#* The compiler command line used to build the code | #* The compiler command line used to build the code | ||
− | #* Your annotated dissassembly listing | + | #* Your annotated dissassembly listing - '''Prove that the code is vectorized''', for example, by pointing out the use of vector registers and SIMD instructions. |
#* Your reflections on the experience and the results | #* Your reflections on the experience and the results | ||
− | |||
=== Resources === | === Resources === | ||
* [https://gcc.gnu.org/projects/tree-ssa/vectorization.html Auto-Vectorization in GCC] - Main project page for the GCC auto-vectorizer. | * [https://gcc.gnu.org/projects/tree-ssa/vectorization.html Auto-Vectorization in GCC] - Main project page for the GCC auto-vectorizer. | ||
* [http://locklessinc.com/articles/vectorize/ Auto-vectorization with gcc 4.7] - An excellent discussion of the capabilities and limitations of the GCC auto-vectorizer, intrinsics for providing hints to GCC, and other code pattern changes that can improve results. Note that there has been some improvement in the auto-vectorizer since this article was written. '''This article is strongly recommended.''' | * [http://locklessinc.com/articles/vectorize/ Auto-vectorization with gcc 4.7] - An excellent discussion of the capabilities and limitations of the GCC auto-vectorizer, intrinsics for providing hints to GCC, and other code pattern changes that can improve results. Note that there has been some improvement in the auto-vectorizer since this article was written. '''This article is strongly recommended.''' | ||
+ | * [https://software.intel.com/sites/default/files/8c/a9/CompilerAutovectorizationGuide.pdf Intel (Auto)Vectorization Tutorial] - this deals with the Intel compiler (ICC) but the general technical discussion is valid for other compilers such as gcc and llvm |
Latest revision as of 11:52, 2 October 2019
Optional Lab (Recommended!)
- Write a short program that creates two 1000-element integer arrays and fills them with random numbers in the range -1000 to +1000, then sums those two arrays element-by-element to a third array, and finally sums the third array and prints the result.
- Compile this program on one of the AArch64/ARM64 SPO600 Servers in such a way that the code is auto-vectorized.
- Annotate the emitted code (i.e., obtain a disassembly via objdump -d and add comments to the instructions in <main> explaining what the code does).
- Write a blog post discussing your findings. Include:
  - The source code
  - The compiler command line used to build the code
  - Your annotated disassembly listing - Prove that the code is vectorized, for example, by pointing out the use of vector registers and SIMD instructions.
  - Your reflections on the experience and the results
Resources
- Auto-Vectorization in GCC - Main project page for the GCC auto-vectorizer.
- Auto-vectorization with gcc 4.7 - An excellent discussion of the capabilities and limitations of the GCC auto-vectorizer, intrinsics for providing hints to GCC, and other code pattern changes that can improve results. Note that there has been some improvement in the auto-vectorizer since this article was written. This article is strongly recommended.
- Intel (Auto)Vectorization Tutorial - This deals with the Intel compiler (ICC), but the general technical discussion is valid for other compilers such as GCC and LLVM.