CDOT Wiki β

Midnight Tiger

Revision as of 09:24, 5 April 2016 by Taeyang Chung (talk | contribs) (How to parallelize your code using Chapel Cray)

Chapel

Team Information

  1. Taeyang Chung

Introduction to Chapel

Chapel (Cascade High Productivity Language) is an alternative parallel programming language that focuses on the productivity of programmers working on high-end computing systems.

The concept of "productivity" is broader than we might think, because different users mean different things by it.

Who is interested in parallel programming, and what do they want?

* Students: I want something similar to the languages I learned in school, such as C++, C, and Java. I want it to be easy to add parallelism to my code.

* HPC programmers: I want full control, so that I have more room to increase performance.

* Computational scientists: I want something that lets me implement my computation easily without knowing much about the architecture.


Chapel: Chapel is a language that makes parallel computation easy to implement, looks similar to other languages, and still grants full control to its users.

Advantages of Chapel

From one of Cray's articles:

  • General Parallelism: Chapel has the goal of supporting any parallel algorithm you can conceive of on any parallel hardware you want to target. In particular, you should never hit a point where you think “Well, that was fun while it lasted, but now that I want to do x, I’d better go back to MPI.”
  • Separation of Parallelism and Locality: Chapel supports distinct concepts for describing parallelism (“These things should run concurrently”) from locality (“This should be placed here; that should be placed over there”). This is in sharp contrast to conventional approaches that either conflate the two concepts or ignore locality altogether.
  • Multiresolution Design: Chapel is designed to support programming at higher or lower levels, as required by the programmer. Moreover, higher-level features—like data distributions or parallel loop schedules—may be specified by advanced programmers within the language.
  • Productivity Features: In addition to all of its features designed for supercomputers, Chapel also includes a number of sequential language features designed for productive programming. Examples include type inference, iterator functions, object-oriented programming, and a rich set of array types. The result combines productivity features as in Python™, Matlab®, or Java™ software with optimization opportunities as in Fortran or C.

Installation Process

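  • On the command line, unpack the release archive
>tar xzf chapel-1.12.0.tar.gz
  • Go to the chapel folder ($CHPL_HOME is the directory that the archive unpacks into, here chapel-1.12.0; if $CHPL_HOME is not set yet, use cd chapel-1.12.0 instead)
>cd $CHPL_HOME
  • Type gmake or make on the command line to build the compiler
>gmake

or

>make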

How to compile sample code

  • How to compile
>chpl -o execution_name file_name
  • How to run
>./execution_name
  • How to run while overriding the value of a config const variable
>./execution_name --variable_name=value
  • Chapel ships some example programs in the examples folder. Use one of those files to test the compiler.
>cd $CHPL_HOME/examples
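
The compile and run steps above can be tried on a small program. This is a minimal sketch; the file name hello.chpl and the variables n and name are made up for illustration and are not part of the Chapel examples folder.

```chapel
// hello.chpl (hypothetical file name, used only for this illustration)

// config const values get a default here and can be overridden
// on the command line when the program is launched.
config const n = 3;
config const name = "Chapel";

// A plain serial loop: prints one greeting per iteration.
for i in 1..n do
  writeln("Hello from ", name, " (", i, " of ", n, ")");
```

Assuming the file is saved as hello.chpl, it would be compiled with chpl -o hello hello.chpl, run with ./hello, and run with different values using ./hello --n=5 --name=world.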

How to parallelize your code using Chapel

  • Task: A unit of computation.


  • Thread: A hardware resource that a task can be mapped to.

With forall and coforall, the iterations of a loop are executed in parallel.

  • forall: When the loop starts, tasks are created on the available threads and the iterations are divided among them. It is recommended when the number of iterations is larger than the number of threads. The loop might not run in parallel when there are not enough threads available.
  • Sample Parallel Code using forall
config const n = 10;
forall i in 1..n do
   writeln("Iteration #", i," is executed.");    
  • Sample Parallel Code Output using forall

 

  • coforall: A task is created for each iteration. It is recommended when each iteration does a large amount of computation and the number of iterations equals the total number of logical cores (threads). Parallel execution is mandatory here.
  • Sample Parallel Code using coforall
config const n = here.numCores;
coforall i in 1..n do
   writeln("Thread #", i, " of ", n);
  • Sample Parallel Code Output using coforall

 

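  • begin: Each begin statement creates a new task. The program does not wait for that task at the begin statement, so the order of the output can change from run to run.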
  • Sample Parallel Code using begin
begin writeln("There is an apple");
begin writeln("There is a banana");
begin writeln("There is an orange");
begin writeln("There is a melon");
  • Sample Parallel Code Output using begin

 

  • cobegin: Each statement in the cobegin block runs as its own task. The program continues past the block only after all of those tasks have finished.
  • Sample Parallel Code using cobegin
cobegin {
 writeln("Item 1 is loaded");
 writeln("Item 2 is loaded");
 writeln("Item 3 is loaded");
}
writeln("All the items are loaded");
  • Sample Parallel Code Output using cobegin


 

For more detail: http://faculty.knox.edu/dbunde/teaching/chapel
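
The constructs above can be combined with arrays and reductions. The following is a minimal sketch, not taken from the original page: the names n, squares, and total are made up for illustration, and the sync block (a standard Chapel statement that does not appear in the samples above) is used here only to show how to wait for begin tasks.

```chapel
config const n = 8;

// Data parallelism with forall: each iteration writes a different
// array element, so no two tasks update the same memory location.
var squares: [1..n] int;
forall i in 1..n do
  squares[i] = i * i;

// A reduction combines all elements into a single value.
const total = + reduce squares;
writeln("squares = ", squares);   // squares = 1 4 9 16 25 36 49 64
writeln("total   = ", total);     // total   = 204

// Task parallelism with begin: the sync block waits until every
// task started inside it has finished. The order of the two fruit
// lines may differ from run to run.
sync {
  begin writeln("There is an apple");
  begin writeln("There is a banana");
}
writeln("Both fruits were reported");
```

The output comments hold for the default n = 8. Without the sync block, the last line could be printed before the two begin tasks have written their output.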

Demonstration of Sample Code

While learning Chapel, I found a pi program in a serial and a parallel version. I tweaked it a little to remove errors in the code and added a few more lines to measure the performance. It was tested on a dual-core computer. You can find the original code here: http://chapel.cray.com/tutorials/SC10/

Serial Pi Program

use Random;
use Time;

config const n = 100000,       // number of random points to try
             seed = 589494289; // seed for random number generator
const ts = getCurrentTime();   // start time

writeln("Number of points = ", n);
writeln("Random number seed = ", seed);

var rs = new RandomStream(seed, parSafe=false);
var count = 0;

// count the points (x, y) that fall inside the unit quarter circle
for i in 1..n do
  if (rs.getNext()**2 + rs.getNext()**2) <= 1.0 then
    count += 1;
const te = getCurrentTime();   // end time
writeln("Approximation of pi = ", count * 4.0 / n);
writeln("Integration: ", te - ts);  // elapsed time
delete rs;


Task Parallelized Pi Program

use Time;
use Random;

config const n = 100000,
       tasks = here.numCores,
       seed = 589494289;
const ts = getCurrentTime();

writeln("Number of points    = ", n);
writeln("Random number seed  = ", seed);
writeln("Number of tasks     = ", tasks);

var counts: [0..#tasks] int;
coforall tid in 0..#tasks {
  var rs = new RandomStream(seed, parSafe=false);
  const nPerTask = n/tasks,
        extras = n%tasks;
  rs.skipToNth(2*(tid*nPerTask + (if tid < extras then tid else extras)) + 1);

  var count = 0;
  for i in 1..nPerTask + (tid < extras) do
    count += (rs.getNext()**2 + rs.getNext()**2) <= 1.0;

  counts[tid] = count;

  delete rs;
}

var count = + reduce counts;
const te = getCurrentTime();
writeln("Approximation of pi = ", count * 4.0 / n);
writeln("Integration: ", te - ts);
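
The index arithmetic inside the coforall loop decides which points each task handles and where its random stream has to start. The sketch below prints that split serially for a small case. The values n = 10 and tasks = 4 and the names firstPoint and numPoints are chosen only for illustration; the formulas are the ones used in the program above.

```chapel
config const n = 10,
             tasks = 4;

const nPerTask = n / tasks,   // points every task gets at least
      extras   = n % tasks;   // leftover points, one each for the first tasks

for tid in 0..#tasks {
  // index of the first point handled by this task
  const firstPoint = tid*nPerTask + (if tid < extras then tid else extras);
  // the first 'extras' tasks handle one additional point
  const numPoints = nPerTask + (if tid < extras then 1 else 0);
  // each point consumes two random numbers (x and y); this is the
  // value passed to skipToNth in the program above
  writeln("task ", tid, ": points ", firstPoint, "..",
          firstPoint + numPoints - 1,
          ", first random number index = ", 2*firstPoint + 1);
}
// With the defaults this prints:
// task 0: points 0..2, first random number index = 1
// task 1: points 3..5, first random number index = 7
// task 2: points 6..7, first random number index = 13
// task 3: points 8..9, first random number index = 17
```

Because every task skips to its own position in the same seeded stream, the parallel version draws the same random numbers as the serial version, so both should report the same approximation of pi for the same n and seed.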

Performance

 

Work in progress...

While developing my own program, I ran into the following issue.

 

Useful Links

This is the list of links that I found useful while learning the basics of Chapel:

https://learnxinyminutes.com/docs/chapel/

https://www.cs.colostate.edu/wiki/Chapel_language

http://www.cray.com/blog/chapel-productive-parallel-programming/

http://www.cray.com/blog/six-ways-to-say-hello-in-chapel-part-1/

http://chapel.cray.com/publications/cug06.pdf

https://www.youtube.com/watch?v=lo3a_b34zX0

http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture02-types.pdf

http://chapel.cray.com/presentations/ChapelForCopenhagen-presented.pdf

http://chapel.cray.com/tutorials/

http://faculty.knox.edu/dbunde/teaching/chapel/

http://chapel.cray.com/tutorials/SC10/MonteCarloPi.pdf

http://chapel.cray.com/tutorials/NOTUR09/NOTUR-5-DATAPAR.pdf
