Midnight Tiger

From CDOT Wiki
Revision as of 23:42, 4 April 2016 by Taeyang Chung (talk | contribs) (How to parallelize your code using Chapel Cray)
Jump to: navigation, search

Chapel Cray

Team Information

  1. Taeyang Chung

Introduction to Chapel

Chapel(Cascade High-Productivity Language) is an alternative parallel programming language that focuses on the productivity of high-end computing systems.

The concept of "Productivity" is somewhat more special that we might think.

Who are interested in Parallel Programming and what do they want?

* Student: I want something that is similar to the languages that I learned in school such as c++, c, and Java. I want it to be easy to implement the parallel programming on my code.

* HPC Programmers: I want the full control that gives me more spot to increase the performance.

* Computational Scientists: I want something that I can easily implement my computation without knowing much architecture knowledge.


Chapel: Chapel is the language that easy to implement the parallel computation that similar to other languages with granting full control to the users.

Advantages of Chapel Cray

From one of Cray articles

  • General Parallelism: Chapel has the goal of supporting any parallel algorithm you can conceive of on any parallel hardware you want to target. In particular, you should never hit a point where you think “Well, that was fun while it lasted, but now that I want to do x, I’d better go back to MPI.”
  • Separation of Parallelism and Locality: Chapel supports distinct concepts for describing parallelism (“These things should run concurrently”) from locality (“This should be placed here; that should be placed over there”). This is in sharp contrast to conventional approaches that either conflate the two concepts or ignore locality altogether.
  • Multiresolution Design: Chapel is designed to support programming at higher or lower levels, as required by the programmer. Moreover, higher-level features—like data distributions or parallel loop schedules—may be specified by advanced programmers within the language.
  • Productivity Features: In addition to all of its features designed for supercomputers, Chapel also includes a number of sequential language features designed for productive programming. Examples include type inference, iterator functions, object-oriented programming, and a rich set of array types. The result combines productivity features as in Python™, Matlab®, or Java™ software with optimization opportunities as in Fortran or C.

Installation Process

  • On commandline, type tar xzf chapel-1.12.0.tar.gz
>tar xzf chapel-1.12.0.tar.gz
  • Go to the chapel folder
>cd $CHPL_HOME
  • Type gmake or make on command line to build the compiler
>gmake

or

>make

How to compile sample code

  • How to compile
>chpl -o execution_name file_name
  • How to run
>./execution_name
  • How to run with the configuration of const variable
>./execution_name --variable_name=value
  • Chapel offers some example codes in examples folder. Use one of the files to test.
>cd $CHPL_HOME/examples

How to parallelize your code using Chapel Cray

Iterations in a loop will be executed in parallel.

  • forall: At the beginning of the first iteration all the threads will be created on each core.
  • Sample Parallel Code using forall
config const n = 10;
forall i in 1..n do
   writeln("Iteration #", i," is executed.");    
  • Sample Parallel Code Output using forall

Output Example

  • coforall: A thread will be created at each iteration. It's recommended to use coforall when the inside of loop is big and the iteration size is equal to the total number of logical cores.
  • Sample Parallel Code using coforall
config const n = here.numCores;
corforall i in 1..n do
   writeln("Thread #", i,"of ", n);    
  • Sample Parallel Code Output using coforall

Output Example

  • begin: Each begin statement will create a different thread.
  • Sample Parallel Code using begin
begin writeln("There is an apple");
begin writeln("There is a banana");
begin writeln("There is an orange");
begin writeln("There is a melon");
  • Sample Parallel Code Output using begin

Output Example

  • cobegin: the each statement in cobegin block will be parallelized.
  • Sample Parallel Code using cobegin
cobegin {
 writeln("Item 1 is loaded");
 writeln("Item 2 is loaded");
 writeln("Item 3 is loaded");
}
writeln("All the items are loaded");
  • Sample Parallel Code Output using cobegin


Output Example

For more detail: http://faculty.knox.edu/dbunde/teaching/chapel

Demonstration of Sample Code

While learning Chapel, I found a pi program for serial & parallel. I tweaked a little bit to remove errors in the code and added few more lines to check the performance. This is tested on dual-core computer. You can find original code here: http://chapel.cray.com/tutorials/SC10/

Serial Pi Program

use Random;
use Time;

config const n = 100000, // number of random points to try
seed = 589494289; // seed for random number generator
const ts = getCurrentTime();

writeln("Number of points = ", n);
writeln("Random number seed = ", seed);

var rs = new RandomStream(seed, parSafe=false);
var count = 0;

for i in 1..n do
if (rs.getNext()**2 + rs.getNext()**2) <= 1.0 then
count += 1;
const te = getCurrentTime();
writeln("Approximation of pi = ", count * 4.0 / n);
writeln("Integration :", te-ts);
delete rs;


Task Parallelized Pi Program

use Time;
use Random;

config const n = 100000,
       tasks = here.numCores,
       seed = 589494289;
const ts = getCurrentTime();

writeln("Number of points    = ", n);
writeln("Random number seed  = ", seed);
writeln("Number of tasks     = ", tasks);

var counts: [0..#tasks] int;
coforall tid in 0..#tasks {
  var rs = new RandomStream(seed, parSafe=false);
  const nPerTask = n/tasks,
        extras = n%tasks;
  rs.skipToNth(2*(tid*nPerTask + (if tid < extras then tid else extras)) + 1);

  var count = 0;
  for i in 1..nPerTask + (tid < extras) do
    count += (rs.getNext()**2 + rs.getNext()**2) <= 1.0;

  counts[tid] = count;

  delete rs;
}

var count = + reduce counts;
const te = getCurrentTime();
writeln("Approximation of pi = ", count * 4.0 / n);
writeln("Integration: ", te - ts);

Performance

Performance

Working in progress...

While developing my own program, I got the following issue.

bug

Useful Links

This is the list of links that I found useful while learning the basic of Chapel

https://learnxinyminutes.com/docs/chapel/

https://www.cs.colostate.edu/wiki/Chapel_language

http://www.cray.com/blog/chapel-productive-parallel-programming/

http://www.cray.com/blog/six-ways-to-say-hello-in-chapel-part-1/

http://chapel.cray.com/publications/cug06.pdf

https://www.youtube.com/watch?v=lo3a_b34zX0

http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture02-types.pdf

http://chapel.cray.com/presentations/ChapelForCopenhagen-presented.pdf

http://chapel.cray.com/tutorials/

http://faculty.knox.edu/dbunde/teaching/chapel/

http://chapel.cray.com/tutorials/SC10/MonteCarloPi.pdf

http://chapel.cray.com/tutorials/NOTUR09/NOTUR-5-DATAPAR.pdf

http://faculty.knox.edu/dbunde/teaching/chapel