
CDOT Wiki β

Team NP Complete

Revision as of 22:46, 21 December 2017 by Jali-clarke

Simulating Quantum Tunneling With OpenMP

Introduction

Quantum tunneling is a phenomenon described by quantum mechanics, the branch of theoretical physics that governs behavior at microscopic scales. A core idea of quantum mechanics is that, at that level, particles behave strangely, often in counter-intuitive ways. Quantum tunneling refers to a particle passing through a potential barrier even though, classically, it does not have enough energy to get over it. The chance of this happening is significant only when the barrier is thin and not much higher than the particle's energy. In effect, the particle sometimes 'ignores' the barrier, continuing on as if nothing were there at all.

 
fig 1.1 Quantum tunnelling through a barrier. The energy of the tunnelled particle is the same but the probability amplitude is decreased.

Visualizing

A good way to contrast quantum tunneling with intuition is to consider the following scenario: picture yourself at the bottom of a large hill. You have a tennis ball that you want to roll up the hill far enough that it rolls down the other side. If you don't give it enough energy to reach the top of the hill, the ball will merely roll back to you.

Now replace the tennis ball with an electron. Intuitively, the electron would need a certain minimum amount of energy to cross a barrier, such as a gap of air. However, quantum mechanics and intuition tend not to occupy the same space! Even without that energy, the electron has some probability of turning up on the other side of the barrier, as if it had passed straight through it. When this happens, the electron keeps the same energy. However, the part of its wavefunction that emerges past the barrier has a smaller amplitude than the part that arrived. As a result, the probability of finding the electron beyond the barrier is lower than it was in front of it.

Keep in mind: this effect is only noticeable at the scale of individual particles, and not even for all of them. Tunneling happens consistently only for particles with low mass, such as electrons, whose energy is not far below the height of a thin barrier.
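The dependence on mass, energy, and barrier width can be made concrete with the standard estimate for a rectangular barrier of height V and width L, crossed by a particle of energy E < V. When the barrier is thick and high, the transmission probability is approximately T ≈ exp(−2κL), with κ = √(2m(V − E))/ħ. This sketch drops the order-one prefactor of the exact result. It is a textbook approximation, not a formula taken from the simulation, and `tunnelingProbability` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Physical constants in SI units (CODATA 2018 values).
constexpr double kHbar = 1.054571817e-34;          // reduced Planck constant, J*s
constexpr double kElectronMass = 9.1093837015e-31; // kg
constexpr double kProtonMass = 1.67262192369e-27;  // kg
constexpr double kElectronVolt = 1.602176634e-19;  // J

// Hypothetical helper: approximate probability that a particle of the given
// mass and energy tunnels through a rectangular barrier of the given height
// and width, using T ~ exp(-2 * kappa * L), kappa = sqrt(2 m (V - E)) / hbar.
// Assumes E < V and kappa * L >> 1; the order-one prefactor is dropped.
double tunnelingProbability(double mass, double energy, double barrierHeight,
                            double barrierWidth) {
    assert(energy < barrierHeight && "tunneling regime requires E < V");
    double kappa = std::sqrt(2.0 * mass * (barrierHeight - energy)) / kHbar;
    return std::exp(-2.0 * kappa * barrierWidth);
}
```

Take a barrier 1 eV above the particle's energy and 1 nm wide. The estimate is about 3.5 × 10⁻⁵ for an electron but around 10⁻¹⁹¹ for a proton, which is roughly 1836 times heavier. Halving the width raises the electron's probability sharply.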


OpenMP

 
fig 2.1 A chart of OpenMP constructs.

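OpenMP is a parallel programming API for C, C++, and Fortran. It is specified by the OpenMP Architecture Review Board, a consortium of vendors that includes Intel, and is supported by most major compilers. It enables flexible implementation of parallel algorithms, allowing a program to use all of the processor cores available on the machine it runs on.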

In this project, the #pragma omp parallel for directive was used in several places where a for loop's iterations did not depend on one another or on external variables updated inside the loop. Where such dependencies existed, the loop body was rewritten mathematically so that each iteration no longer relied on those external variables. Both techniques are discussed further below.
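Here is a minimal sketch of the dependency-removal idea. It is not the project's code, and `initializeGridSerial`, `initializeGridParallel`, `xMin`, and `dx` are hypothetical names. A loop that advances a running coordinate with `x += dx` cannot be split across threads, because each iteration needs the value left by the previous one. Computing `x` directly from the loop index removes that dependency, so `#pragma omp parallel for` can be applied.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Serial version (hypothetical): x carries a dependency from one iteration to
// the next, so these iterations cannot safely run in parallel as written.
std::vector<double> initializeGridSerial(std::size_t n, double xMin, double dx) {
    std::vector<double> values(n);
    double x = xMin;
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = std::exp(-x * x);  // example initial profile: a Gaussian
        x += dx;                       // depends on the previous iteration
    }
    return values;
}

// Parallel version (hypothetical): x is computed from the loop index alone, so
// every iteration is independent and OpenMP can divide them among threads.
// If OpenMP is not enabled at compile time, the pragma is ignored and the
// loop runs serially with the same results.
std::vector<double> initializeGridParallel(std::size_t n, double xMin, double dx) {
    std::vector<double> values(n);
    const long long count = static_cast<long long>(n);  // signed index for OpenMP 2.0 compilers
    #pragma omp parallel for
    for (long long i = 0; i < count; ++i) {
        double x = xMin + static_cast<double>(i) * dx;  // no carried state
        values[static_cast<std::size_t>(i)] = std::exp(-x * x);
    }
    return values;
}
```

The two versions can differ in the last few bits. Repeated addition of `dx` accumulates rounding error, whereas `xMin + i * dx` rounds only once per point.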

Program

Without Parallel Processes

 
fig 3.1 Fourier transformation code block.
 
fig 3.2 The Fourier transforms are called: first the potential energy half step is applied, then the full kinetic energy step, and finally the second potential energy half step.

Originally, this program solved the time-dependent Schrödinger equation for the particle's wavefunction using Fourier transforms, because integrating the equation directly is time-consuming. Each time step has to take into account the potential energy for a half step, the kinetic energy for a whole step, and finally the potential energy for the remaining half step. Computed directly, the kinetic energy term involves spatial derivatives, so every point on the grid has to account for its neighboring points as well. A Fourier transform converts the wavefunction to momentum space, where the kinetic energy step becomes a simple multiplication at each point, as shown in the code in figures 3.1 and 3.2. This approach is known as the split-step (or split-operator) Fourier method.
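The half step, full step, half step scheme (the split-step Fourier method) can be sketched in one dimension. This is a minimal sketch under assumed conventions, not the project's code:

- units with ħ = m = 1;
- a periodic grid whose size is a power of two;
- a hand-written radix-2 FFT standing in for whatever transform the program actually uses.

`fft` and `splitStep` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// In-place iterative radix-2 Cooley-Tukey FFT (hypothetical helper).
// inverse = true computes the inverse transform, including the 1/N factor.
void fft(std::vector<Complex>& a, bool inverse) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0 && "size must be a power of two");
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {  // butterfly passes
        double angle = 2.0 * pi / static_cast<double>(len) * (inverse ? 1.0 : -1.0);
        Complex wLen(std::cos(angle), std::sin(angle));
        for (std::size_t i = 0; i < n; i += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                Complex u = a[i + k];
                Complex v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
    if (inverse)
        for (Complex& x : a) x /= static_cast<double>(n);
}

// One split-step time step (hypothetical, hbar = m = 1): half potential step
// in position space, full kinetic step in momentum space, second half
// potential step in position space. Every step is a pointwise multiplication.
void splitStep(std::vector<Complex>& psi, const std::vector<double>& potential,
               double dx, double dt) {
    const std::size_t n = psi.size();
    assert(potential.size() == n);
    const double pi = std::acos(-1.0);
    const Complex imagUnit(0.0, 1.0);
    for (std::size_t j = 0; j < n; ++j)
        psi[j] *= std::exp(-imagUnit * potential[j] * (dt / 2.0));
    fft(psi, false);
    for (std::size_t j = 0; j < n; ++j) {
        // FFT index j maps to wavenumber k for j < n/2, and to k - n otherwise.
        double kIndex = (j < n / 2) ? double(j) : double(j) - double(n);
        double k = 2.0 * pi * kIndex / (double(n) * dx);
        psi[j] *= std::exp(-imagUnit * (k * k / 2.0) * dt);  // kinetic phase
    }
    fft(psi, true);
    for (std::size_t j = 0; j < n; ++j)
        psi[j] *= std::exp(-imagUnit * potential[j] * (dt / 2.0));
}
```

Every factor applied in `splitStep` has magnitude one, and the forward/inverse FFT pair preserves the sum of |ψ|² up to rounding. The total probability therefore stays constant from step to step, which is a useful sanity check for a simulation like this.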

After the new values have been calculated for every point on the grid, they are rendered on the screen. The current frame rate, in frames per second, is also displayed in the top left corner.
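A frames-per-second display like the one mentioned above can be driven by a small counter. This is a minimal sketch that assumes the render loop passes in a timestamp in seconds for each frame. `FrameRateCounter` and `secondsNow` are hypothetical names, not the program's own classes.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical frames-per-second counter: counts frames and, once at least one
// second has elapsed since the start of the current window, publishes
// frames / elapsed seconds and starts a new window.
class FrameRateCounter {
public:
    explicit FrameRateCounter(double startSeconds) : windowStart_(startSeconds) {}

    // Call once per rendered frame with the current time in seconds.
    void frameRendered(double nowSeconds) {
        ++frames_;
        double elapsed = nowSeconds - windowStart_;
        if (elapsed >= 1.0) {
            fps_ = frames_ / elapsed;
            frames_ = 0;
            windowStart_ = nowSeconds;
        }
    }

    // Most recent measurement; 0 until the first full second has passed.
    double fps() const { return fps_; }

private:
    double windowStart_;
    int frames_ = 0;
    double fps_ = 0.0;
};

// Current time in seconds from a monotonic clock, suitable as nowSeconds.
double secondsNow() {
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
}
```

A monotonic clock such as `std::chrono::steady_clock` is used so that system clock adjustments cannot make the measured frame rate jump.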



With Optimized Parallel Processes

 
fig 4.1 Fourier transformation code block, with OpenMP parallelization.
 
fig 4.2 Fourier transformations called in OpenMP.

With the introduction of OpenMP into this project, several processes could be done in parallel. In the evolve() function, every for loop could be parallelized because none of them had external variable dependencies, which reduced the run time. Alongside the OpenMP changes, the Fourier function itself was condensed substantially. This was achieved by introducing the Complex type into the program, so that complex calculations could be done in-line. Switching the loops to OpenMP's dynamic schedule also cut out a considerable amount of time that the CPU spent idle. With a dynamic schedule, loop iterations are handed out in chunks to whichever thread is free next, instead of being divided evenly among the threads up front, so fewer threads sit waiting for the slowest one to finish. The schedule does not change how many threads are created; it only controls how the work is divided among them.
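A loop of the kind described above might look like the following. This is a minimal sketch, not the project's evolve() function. `applyPotentialHalfStep`, the chunk size of 64, and the use of `std::complex<double>` as the Complex type are assumptions. The `schedule(dynamic, 64)` clause makes the OpenMP runtime give the next chunk of 64 iterations to whichever thread in the team finishes first.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;  // assumed stand-in for the program's Complex type

// Hypothetical sketch of one evolve() loop: multiply every grid point by the
// potential energy phase factor exp(-i V dt / 2), in units where hbar = 1.
// Each iteration reads and writes only its own element, so iterations are
// independent. schedule(dynamic, 64) assigns chunks of 64 iterations to
// threads as they become free.
void applyPotentialHalfStep(std::vector<Complex>& psi,
                            const std::vector<double>& potential, double dt) {
    assert(psi.size() == potential.size());
    const Complex imagUnit(0.0, 1.0);
    const long long n = static_cast<long long>(psi.size());
    #pragma omp parallel for schedule(dynamic, 64)
    for (long long j = 0; j < n; ++j) {
        const auto idx = static_cast<std::size_t>(j);
        psi[idx] *= std::exp(-imagUnit * potential[idx] * (dt / 2.0));  // complex math in-line
    }
}
```

When every iteration does the same amount of work, as here, `schedule(static)` usually has lower overhead. Dynamic scheduling tends to pay off when iteration costs vary, because it keeps threads from waiting at the barrier at the end of the loop.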

In this screenshot, the parallelized version of the code is running. Observe that the frame rate is considerably higher than in the non-OpenMP version.


Comparison and Analysis

Initial Performance

Below is the analysis provided by Intel VTune Amplifier, run through Visual Studio. It shows the total elapsed time as well as the overhead time that the program used:

These graphs and figures show that we have efficiently parallelized only the lowest of the low-hanging fruit. The parallel regions themselves are reasonably efficient at what they do, but as it turns out, they accounted for very little of the computational load in the first place. A lot of CPU time is spent waiting at synchronization barriers, and there is still a major hotspot that could be parallelized.
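Amdahl's law puts numbers on why parallelizing a small share of the work barely helps. Suppose a fraction p of the serial run time is parallelized perfectly over N threads. The best possible speedup is then S = 1 / ((1 − p) + p/N). The fractions used below are illustrative, not measurements from the VTune runs, and `amdahlSpeedup` is a hypothetical helper.

```cpp
#include <cassert>

// Amdahl's law (hypothetical helper): upper bound on speedup when a fraction
// of the serial run time (0 <= parallelFraction <= 1) is spread perfectly
// across `threads` threads and the rest stays serial. It ignores scheduling
// and synchronization overhead, so real speedups are lower.
double amdahlSpeedup(double parallelFraction, int threads) {
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0 && threads >= 1);
    return 1.0 / ((1.0 - parallelFraction) + parallelFraction / threads);
}
```

With 8 threads, parallelizing 10% of the run time (p = 0.1) gives at most about a 1.10× speedup, while p = 0.9 allows about 4.7×. This is why the remaining hotspot matters far more than further tuning of the regions already parallelized.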

Parallelized and Dynamic Performance

Here, we can see the performance of the program after OpenMP has been applied to several functions:





Conclusion

After countless hours spent programming this simulation, we found that incorporating OpenMP into the finished program was not very difficult. The most challenging part was identifying the performance bottleneck, where CPU idle time was occurring. Once the Fourier transform function was identified as the cause of the bottleneck, its external dependencies were factored out and an OpenMP parallel for loop was added. Because a considerable amount of CPU idle time remained, the loop was then changed to use OpenMP's dynamic schedule. This schedule hands out iterations to threads as each one becomes free rather than dividing the work among them up front. After the dynamic schedule was added, the program ran with consistent efficiency and run time.