Keep in mind: this really only happens on the particle level, and not even to all particles. Only particles with low mass and high energy are capable of quantum tunneling, at least consistently.
=OpenMP=
OpenMP is a parallel programming API for C, C++, and Fortran, specified by the OpenMP Architecture Review Board and supported by most major compilers, including Intel's. It enables flexible implementation of parallel algorithms, allowing a program to use every core available on the machine it runs on.
In this project, the <nowiki>#pragma omp parallel for</nowiki> directive was used in several places in the program where for loops had no external dependencies. Where dependencies existed, the calculations were rewritten mathematically so that the for loops no longer relied on them. Their usage is discussed further down.
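As a minimal sketch of what factoring out a dependency can look like, consider a loop that accumulates an angle from one iteration to the next. The function names and the phase-factor calculation below are hypothetical illustrations and are not taken from the project's code; the point is only that once the angle is computed directly from the loop index, each iteration is independent and the loop can carry the <nowiki>#pragma omp parallel for</nowiki> directive.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using cd = std::complex<double>;

// Hypothetical serial version: `angle` is carried from one iteration
// to the next, so the iterations cannot safely run in parallel.
std::vector<cd> phase_factors_serial(int n, double step) {
    assert(n >= 0);
    std::vector<cd> out(n);
    double angle = 0.0;
    for (int k = 0; k < n; ++k) {
        out[k] = std::polar(1.0, angle);
        angle += step;  // loop-carried dependency
    }
    return out;
}

// Same values with the dependency factored out: the angle is derived
// from the index k, so every iteration stands on its own.
std::vector<cd> phase_factors_parallel(int n, double step) {
    assert(n >= 0);
    std::vector<cd> out(n);
    #pragma omp parallel for
    for (int k = 0; k < n; ++k) {
        out[k] = std::polar(1.0, k * step);
    }
    return out;
}
```

Both functions produce the same values up to floating-point rounding. If the compiler is not invoked with its OpenMP flag (for example <nowiki>-fopenmp</nowiki> on GCC and Clang), the pragma is ignored and the second loop simply runs serially.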
=Program=
==Without OpenMP==
[[File:FFT.png|thumb|fig 3.1 Fourier transformation code block.]]
[[File:NonParallel.png|1000px|center|Non-Parallel Process]]
==With OpenMP and Optimization==
[[File:PFFT.png|thumb|fig 4.1 Fourier transformation code block, with OpenMP parallelization.]]
[[File:PFT.png|thumb|fig 4.2 Fourier transformations called in OpenMP.]]
[[File:Parallel.png|1000px|center|Parallel Process]]
=Comparison and Analysis=
Below is the analysis produced by Intel VTune Amplifier, run through Visual Studio. It shows the total elapsed time as well as the overhead time that the program incurred:
[[File:initial_analysis.png|750px|none|Initial performance]]
[[File:initial_histo_analysis.png|750px|none|First histogram analysis]]
[[File:initial_analysis_parallel_vs_serial.png|750px|none|Initial comparison between parallel and serial implementations]]
==Parallelized and Dynamic Performance==
[[File:PFFT_crop_no_dynamic.png|500px|none|No dynamic]]
[[File:after_fft_analysis.png|750px|none|After FFT analysis]]
[[File:after_fft_hist.png|750px|none|After FFT analysis]]
[[File:PFFT_crop.png|500px|none|Dynamic]]
[[File:after_dynamic_analysis.png|750px|none|Dynamic Performance]]
[[File:after_dynamic_histo.png|750px|none|Dynamic Histogram]]
=Conclusion=
After countless hours programming this simulation, incorporating OpenMP into the finished program did not prove to be very difficult. The most challenging part was identifying the performance bottleneck, that is, where CPU idle time was occurring. Once the Fourier transformation function was identified as the cause, the loop-carried dependency was factored out and an OpenMP for loop was added. Because a considerable amount of CPU idle time remained, the loop was changed to use dynamic scheduling, which tells the runtime to hand out iterations to threads as they become free, as opposed to statically pre-assigning a fixed share of iterations to threads that may not have much to actually compute. After dynamic scheduling was added, the program reached a consistent efficiency and run time.
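The dynamic scheduling described above can be sketched as follows. This is an illustrative example under assumed conditions, not the project's code: <code>triangular_work</code> and <code>run_dynamic</code> are hypothetical names, and the workload is deliberately uneven so that a static split would leave the threads holding the cheap iterations idle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical uneven workload: iteration i costs time proportional to i.
long long triangular_work(int i) {
    long long sum = 0;
    for (int j = 0; j <= i; ++j) {
        sum += j;
    }
    return sum;
}

// With schedule(dynamic), each thread takes the next unprocessed
// iteration when it finishes its current one, instead of receiving
// a fixed block of iterations up front.
std::vector<long long> run_dynamic(int n) {
    assert(n >= 0);
    std::vector<long long> results(n);
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        results[i] = triangular_work(i);
    }
    return results;
}
```

Dynamic scheduling adds some bookkeeping per chunk of iterations, so it tends to pay off only when iteration costs vary; a chunk size can be given as a second argument, as in <nowiki>schedule(dynamic, 16)</nowiki>, to reduce that overhead.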