Optional Lab (Recommended!)
- Write a short program that creates two 1000-element integer arrays and fills them with random numbers in the range -1000 to +1000, then adds those two arrays element-by-element into a third array, and finally sums the third array and prints the result.
- Compile this program on one of the AArch64/ARM64 SPO600 Servers in such a way that the code is auto-vectorized.
- Annotate the emitted code (i.e., obtain a disassembly via objdump -d and add comments to the instructions in <main> explaining what the code does).
- Write a blog post discussing your findings. Include:
  - The source code
  - The compiler command line used to build the code
  - Your annotated disassembly listing
  - Proof that the code is vectorized, for example, by pointing out the use of vector registers and SIMD instructions
  - Your reflections on the experience and the results
Resources
- Auto-Vectorization in GCC - Main project page for the GCC auto-vectorizer.
- Auto-vectorization with gcc 4.7 - An excellent discussion of the capabilities and limitations of the GCC auto-vectorizer, intrinsics for providing hints to GCC, and other code pattern changes that can improve results. Note that there has been some improvement in the auto-vectorizer since this article was written. This article is strongly recommended.
- Intel (Auto)Vectorization Tutorial - This deals with the Intel compiler (ICC), but the general technical discussion is valid for other compilers such as GCC and LLVM.