Keep in mind: this really only happens on the particle level, and not even to all particles. Only particles with low mass and high energy are capable of quantum tunneling, at least consistently.
=OpenMP=
OpenMP is a parallel programming API for C, C++, and Fortran. It is specified by the OpenMP Architecture Review Board and implemented by most major compilers, including Intel's. It enables flexible implementation of parallel algorithms, allowing a program to use all the cores available on the machine it runs on.
In this project, the <nowiki>#pragma omp parallel for</nowiki> statement was used in several places in the program where for loops had no external dependencies. Where there were dependencies, the math was restructured so that the loops no longer had them. Their usage is discussed further down.
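As an illustration of a loop with no external dependencies, here is a minimal sketch of <nowiki>#pragma omp parallel for</nowiki> in C++. The function name <code>square_all</code> is hypothetical and does not come from the project; it only shows the shape of a loop that is safe to parallelize, because every iteration reads and writes its own index and nothing else.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example, not taken from the project code.
// Squares every element of the input. Iteration i only touches index i,
// so the iterations may run in any order and on any thread.
std::vector<double> square_all(const std::vector<double>& values) {
    std::vector<double> result(values.size());
    // A signed loop counter keeps the loop valid for older OpenMP versions.
    const long n = static_cast<long>(values.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        result[k] = values[k] * values[k];
    }
    return result;
}
```

If the compiler is not told to enable OpenMP (for example <code>/openmp</code> in Visual Studio or <code>-fopenmp</code> in GCC and Clang), the pragma is ignored and the loop simply runs serially with the same result.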
=Program=
==Without OpenMP==
[[File:FFT.png|thumb|fig 3.1 Fourier transformation code block.]]
[[File:NonParallel.png|1000px|center|Non-Parallel Process]]
==With OpenMP and Optimization==
[[File:PFFT.png|thumb|fig 4.1 Fourier transformation code block, with OpenMP parallelization.]]
[[File:PFT.png|thumb|fig 4.2 Fourier transformations called in OpenMP.]]
[[File:Parallel.png|1000px|center|Parallel Process]]
=Comparison and Analysis=
The analysis below was produced by VTune Amplifier, run through Visual Studio. It shows the total elapsed time as well as the overhead time that the program used:
[[File:initial_analysis.png|750px|none|Initial performance]]
[[File:initial_histo_analysis.png|750px|none|First histogram analysis]]
[[File:initial_analysis_parallel_vs_serial.png|750px|none|Initial comparison between parallel and serial implementations]]
==Parallelized and Dynamic Performance==
[[File:FFT_crop.png|500px|none|Hotspot]]
The issue is the <code>T *= phiT;</code> line: every time the loop counter <code>l</code> increases, <code>T</code> is multiplied by <code>phiT</code>. This looks harmless, but it prevents us from parallelizing the loop, because each iteration depends on the value of <code>T</code> left behind by the previous one, so the iterations can't be done in an arbitrary order.

What it does mean, however, is that we can remove that line and replace every use of <code>T</code> with <code>pow(phiT, l)</code>. The loop can then be parallelized, since the iterations are now order-invariant. When we do that, the FPS surprisingly does not change. Worse, if we remove the parallel for construct, the FPS drops to a meager 1 frame per second, most likely because the <code>pow()</code> operation is computationally expensive.

Are we stuck? Of course not. We can apply math. A property of complex numbers (de Moivre's formula) lets us turn exponentiation into multiplication. If we write the complex number <code>phiT</code> as <code>phiT = cos(arg) + i * sin(arg)</code>, which we can since it has norm 1, we have <code>phiT ** l = cos(l * arg) + i * sin(l * arg)</code>. This gave a tremendous speedup, since the trigonometric functions are apparently less costly than exponentiation. The code and new VTune analyses are below.
[[File:PFFT_crop_no_dynamic.png|500px|none|No dynamic]]
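The rewrite described above can be sketched in C++ as follows. This is not the project's code, which is only shown in the screenshots. The function names, the <code>in</code> and <code>out</code> vectors, and the assumption that <code>T</code> starts at 1 are all hypothetical, chosen to show the dependency and its removal side by side.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Serial form: T carries state from one iteration to the next,
// so the iterations must run in order.
std::vector<cplx> apply_twiddles_serial(const std::vector<cplx>& in, double arg) {
    const cplx phiT(std::cos(arg), std::sin(arg));  // norm 1 by construction
    std::vector<cplx> out(in.size());
    cplx T(1.0, 0.0);  // assumed starting value: phiT to the power 0
    for (std::size_t l = 0; l < in.size(); ++l) {
        out[l] = in[l] * T;
        T *= phiT;  // loop-carried dependency
    }
    return out;
}

// Parallel form: phiT to the power l is rebuilt from l alone using
// cos(l * arg) + i * sin(l * arg), so iterations are independent.
std::vector<cplx> apply_twiddles_parallel(const std::vector<cplx>& in, double arg) {
    std::vector<cplx> out(in.size());
    const long n = static_cast<long>(in.size());

    #pragma omp parallel for
    for (long l = 0; l < n; ++l) {
        const std::size_t k = static_cast<std::size_t>(l);
        const double angle = static_cast<double>(l) * arg;
        out[k] = in[k] * cplx(std::cos(angle), std::sin(angle));
    }
    return out;
}
```

Both functions should agree up to floating-point rounding. The parallel form spends two trigonometric calls per iteration in exchange for removing the dependency on the previous iteration.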
[[File:after_fft_hist.png|750px|none|After FFT analysis]]
[[File:PFFT_crop.png|500px|none|Dynamic]]
[[File:after_dynamic_analysis.png|750px|none|Dynamic Performance]]
[[File:after_dynamic_histo.png|750px|none|Dynamic Histogram]]
=Conclusion=