Simulating Quantum Tunneling With OpenMP
Members:
Introduction
The concept of quantum tunneling is a subset of the Quantum Mechanics branch of theoretical physics. The core concept of Quantum Mechanics is that on a microscopic level, particles behave strangely, often in counter-intuitive ways. Quantum Tunneling refers to the phenomenon in which particles pass through barriers if the particles have enough energy and if the barrier is thin enough. In essence, said particles 'ignore' the barrier, continuing on as if nothing were there at all.
Visualizing
A good way to contrast quantum tunneling with intuition is to consider the following scenario: picture yourself at the bottom of a large hill. You have a tennis ball that you want to roll up the hill far enough that it rolls down the other side. If you don't give it enough energy to reach the top of the hill, the ball will merely roll back to you.
Now replace the tennis ball with an electron. The electron, intuitively, would require a specific amount of energy needed to surpass a barrier, such as a gap of air. However, quantum mechanics and intuition tend not to occupy the same space! The electron, instead, always has a probability of passing through the barrier without ever having come in contact with the barrier in the first place. If this happens, the electron's probability will from there on out be lower than it was prior to tunneling through the barrier.
Keep in mind: this really only happens on the particle level, and not even to all particles. Only particles with low mass and high energy are capable of quantum tunneling, at least consistently.
OpenMP
OpenMP is a parallel programming API provided by Intel for C, C++, and Fortran. It enables flexible implementation of parallel algorithms, allowing computers of all builds to utilize all cores it has access to.
In this project, the #pragma omp parallel for statement was used in several locations in the program where for loops had no external dependencies. Where there were dependencies, math was used in such a way that the for loops no longer required the external variables. Their usage will be discussed further down.
Program
Without Parallel Processes
Originally, this program would calculate the path of a particle using Fourier transformations. These were used in place of the time resource consuming Schrodinger equations because the Schrodinger equations require the simulation to take into account, at each point: the potential energy after a half step, the kinetic energy after a whole step, then finally going back to take the potential energy of the half step. This was even more complicated because, for each particle, every neighboring particle had to be analyzed and accounted for as well. The Fourier transformations converted this into a simple process of multiplication, as shown in the code in figure 1.2:
After the values for all the particles' energies are calculated, they are rendered on the screen. Also rendered on the top left are the frames per second being rendered on the screen.
With Optimized Parallel Processes
With the introduction of OpenMP into this project, several processes could be done in parallel. In the evolve() function, whenever a for loop was called, it could be parallelized because none of them had external variable dependencies. That eliminated quite a bit of overhead in the runtime. With the inclusion of OpenMP, the Fourier function, itself, was condensed substantially. This was achieved by introducing the Complex type into the program, so that complex calculations could be done in-line. The dynamic call to #pragma omp parallel also cut out a considerable amount of idle time that the CPU spent waiting to initialized all the threads that the program indicated that it required, rather than created threads on a need basis.
On this screen, the parallelized version of this code is running. Observe that the framerate is considerably better than the non-OpenMP program.
Comparison and Analysis
Initial Performance
Below, you can see the analysis provided by the VTune Amplifier, through visual studio. Here we can see the total elapsed time as well as the overhead time that the program used:
What these graphs and figures are telling us is that we've efficiently parallelized the lowest of the low hanging fruit. The parallel regions themselves are reasonably efficient at what they do, but as it turns out, they contributed very, very little to the computational load in the first place. A lot of CPU time is spent at synchronization barriers, and there is a major hotspot which can still be parallelized.
Parallelized and Dynamic Performance
We parallelized the code for rendering and state evolution above. We did not, however, yet parallelize the FFT code. The reason for this is that we initially thought that there was a loop-carried dependency that could not be resolved:
Conclusion
After countless hours programming this simulation, incorporating OpenMP into the finished program did not prove to be very difficult. The most challenging part of including it was identifying the bottleneck for the performance, where CPU idle time was occurring. After the Fourier Transformation Function was identified as the cause of the bottleneck, the loop-carried dependency was factored out and an OpenMP for loop was added. After observing that there was still a considerable amount of CPU idle time, the program was changed to include a dynamic version of the OpenMP for loop, which indicated to the program to dynamically load-balance threads, as opposed to wasting time creating a (statically) set amount of threads that it may not have much to actually compute. After the dynamic scheduling was added, the program jumped to a consistent efficiency and run time.