</pre>
=== SSE Examples ===
For SSE4.1 and later, to multiply two 128-bit vectors of signed 32-bit integers, you would use the following Intel intrinsic function:
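The intrinsic itself is not reproduced in this revision. As a hedged illustration only, the sketch below assumes that the intended function is <code>_mm_mullo_epi32</code>, an SSE4.1 intrinsic that multiplies four packed 32-bit integers and keeps the low 32 bits of each product. The helper name <code>mul4_i32</code> is hypothetical, and the scalar fallback is there only so that the code still compiles when SSE4.1 is not enabled.

```cpp
#include <cstdint>
#if defined(__SSE4_1__)
#include <smmintrin.h>  // SSE4.1 intrinsics (enable with -msse4.1 on GCC/Clang)
#endif

// Hypothetical helper: element-wise product of four signed 32-bit integers,
// keeping the low 32 bits of each product.
void mul4_i32(const std::int32_t* a, const std::int32_t* b, std::int32_t* out) {
#if defined(__SSE4_1__)
    // Unaligned loads, so the inputs do not need 16-byte alignment.
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // Assumption: this is the intrinsic the text refers to.
    __m128i vr = _mm_mullo_epi32(va, vb);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), vr);
#else
    // Scalar fallback with the same wrap-around behaviour.
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<std::int32_t>(static_cast<std::uint32_t>(a[i]) *
                                           static_cast<std::uint32_t>(b[i]));
    }
#endif
}
```

Calling <code>mul4_i32</code> on <code>{1, -2, 3, 4}</code> and <code>{5, 6, -7, 8}</code> yields <code>{5, -12, -21, 32}</code> on either code path.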
Here is a link to an interactive guide to Intel Intrinsics: [https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2 Intel Intrinsics SSE,SSE2,SSE3,SSSE3,SSE4.1,SSE4.2]
== Vectorization Examples ==
[INSERT IMAGE HERE]
A pointer alias means that two pointers point to the same location in memory or the two pointers overlap in memory.
If you compile the vec_samples project with the <code>NOALIAS</code> macro, the <code>matvec</code> function declaration will include the <code>restrict</code> keyword. The <code>restrict</code> keyword tells the compiler that pointers <code>a</code> and <code>b</code> do not overlap, so the compiler is free to optimize the code blocks that use those pointers.
[INSERT IMAGE HERE]
</source>
To learn more about the <code>restrict</code> keyword and how the compiler can optimize code if it knows that two pointers do not overlap, you can visit this StackOverflow thread: [https://stackoverflow.com/a/30827880 What does the restrict keyword mean in C++?]
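As a rough sketch of how a macro such as <code>NOALIAS</code> can switch the qualifier on and off, consider the simplified matrix-vector product below. It is not the vec_samples source: the function name <code>matvec_sketch</code>, the <code>RESTRICT</code> macro, and the flat row-major layout are assumptions made for illustration. Standard C++ has no <code>restrict</code> keyword, so the sketch uses the <code>__restrict</code> extension that GCC, Clang, and MSVC accept.

```cpp
#include <cstddef>

// Assumption: NOALIAS is defined on the command line (e.g. -DNOALIAS).
#ifdef NOALIAS
#define RESTRICT __restrict   // compiler extension, not standard C++
#else
#define RESTRICT
#endif

// Hypothetical, simplified stand-in for matvec:
// b[i] = sum over j of a[i * cols + j] * x[j]
void matvec_sketch(std::size_t rows, std::size_t cols,
                   const float* a, float* RESTRICT b, const float* x) {
    for (std::size_t i = 0; i < rows; ++i) {
        b[i] = 0.0f;
        for (std::size_t j = 0; j < cols; ++j) {
            // Without RESTRICT the compiler must assume that the store to b[i]
            // could change a[...] or x[...], which limits vectorization.
            b[i] += a[i * cols + j] * x[j];
        }
    }
}
```

The result is the same with or without <code>NOALIAS</code> as long as the caller really passes non-overlapping arrays; the qualifier only changes what the compiler is allowed to assume.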
=== Loop-Carried Dependency ===
Pointers that overlap one another may introduce a loop-carried dependency when they point into an array of data. Unless it can prove otherwise, the vectorizer assumes that such a dependency may exist and, as a result, will not auto-vectorize the code.
In the code example below, <code>a</code> is a function of <code>b</code>. If pointers <code>a</code> and <code>b</code> overlap, then writing to <code>a</code> may also modify <code>b</code>, which can create a loop-carried dependency. Because the compiler cannot rule this out, it will not vectorize the loop.
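To make the dependency concrete, here is a minimal sketch that is separate from the project code. The function name <code>add_offset</code> and the test values are hypothetical. When the two pointers refer to overlapping parts of one buffer, each iteration reads a value written by the previous iteration.

```cpp
#include <cstddef>

// Hypothetical example: a[i] is computed from b[i].
// No restrict qualifier, so a and b are allowed to overlap.
void add_offset(float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = b[i] + 1.0f;
    }
}

// Overlapping call: a starts one element after b in the same buffer,
// so a[i] and b[i + 1] are the same memory location.
void overlapping_call(float* buf /* at least 5 elements */) {
    add_offset(buf + 1, buf, 4);
}
```

With <code>buf</code> initialised to five zeros, <code>overlapping_call</code> leaves <code>{0, 1, 2, 3, 4}</code> in the buffer, because every element depends on the one written just before it. With two separate arrays of zeros, the same loop would write <code>1</code> to every element of <code>a</code>. A vectorized loop that loaded four values of <code>b</code> at once would produce the second result in both cases, which is why the compiler must be conservative.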
<source lang="cpp">
=== Alignment ===
To align data elements to <code>x</code> bytes in memory, use the <code>align</code> macro.
Code snippet that is used to align the data elements in the 'vec_samples' project.
To address this issue, add some padding.
For example, if you have a <code>4 x 19</code> array of floats and your system has access to 128-bit vector registers, you should add 1 column to make the array <code>4 x 20</code>. The number of columns is then evenly divisible by 4, which is the number of floats that can be loaded into a 128-bit vector register.
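The padding arithmetic can be sketched as follows. This is an illustration, not the vec_samples code: the names <code>padded_cols</code> and <code>PaddedMatrix</code> are hypothetical, a 4-byte <code>float</code> is assumed, and the standard C++11 <code>alignas</code> specifier is used in place of the project's <code>align</code> macro.

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes 4-byte floats");

// Number of floats that fit in one 128-bit (16-byte) vector register.
constexpr std::size_t kFloatsPerVector = 16 / sizeof(float);  // 4

// Round a column count up to the next multiple of kFloatsPerVector.
constexpr std::size_t padded_cols(std::size_t cols) {
    return (cols + kFloatsPerVector - 1) / kFloatsPerVector * kFloatsPerVector;
}

constexpr std::size_t kRows = 4;
constexpr std::size_t kCols = 19;                        // logical width
constexpr std::size_t kPaddedCols = padded_cols(kCols);  // storage width

static_assert(kPaddedCols == 20, "19 columns pad to 20");

// Hypothetical container: the array starts on a 16-byte boundary, and every
// row is 20 * 4 = 80 bytes, so each row also starts on a 16-byte boundary.
struct PaddedMatrix {
    alignas(16) float data[kRows][kPaddedCols];
};
```

Loops should still iterate over the logical width of 19 when the padding column holds no meaningful data, or the padding should be initialised to a neutral value such as zero.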
[INSERT IMAGE HERE]
= Summary =