= Parallelization of the C++ Standard Library: Intel Parallel STL, MCSTL, and C++17 / Parallelism with Rust =
== Introduction: The Standard Template Library ==
The Standard Template Library (STL) for C++ is a library that provides a set of classes and functions for C++, like iterators and vectors. The STL is divided into four parts: algorithms, containers, functors, and iterators. For this project, we will focus on the ''algorithms'' library.

The algorithms library provides many functions to be used on ranges of elements, using iterators or pointers to navigate through them. Some of these algorithms are:

* '''all_of''' - Tests if a condition is true for all the elements in the range
* '''for_each''' - Applies a function to all elements in the range
* '''find''' - Finds a value in the range
* '''sort''' - Sorts the elements in the range

These algorithms provide a ''functional'' way to work on collections, and are present in several programming languages. They replace what usually would be a for/while loop, shifting the attention from "how to loop" to "what to do".

<pre><code>#include <iostream>
#include <algorithm>
#include <vector>

using namespace std;

int main() {
  vector<int> myVector { 10, 6, 7, 8, 5, 4, 1, 2, 9, 3 };

  // Sorting the elements in descending order
  sort(myVector.begin(), myVector.end(), [](int a, int b) { return b < a; });

  cout << "== FOR EACH ==" << endl;
  for_each(myVector.begin(), myVector.end(), [](int i) { cout << "Element: " << i << endl; });
  cout << endl;

  cout << "== COUNTING NUMBERS LARGER THAN 6 ==" << endl;
  long nLt6 = count_if(myVector.begin(), myVector.end(), [](int i) { return i > 6; });
  cout << "There are " << nLt6 << " numbers larger than 6 in the vector" << endl << endl;

  cout << "== FINDING AN ELEMENT ==" << endl;
  vector<int>::iterator elFound = find_if(myVector.begin(), myVector.end(), [](int i) { return i > 4 && i % 6 == 0; });
  cout << "The element " << *elFound << " is the first that satisfies i > 4 && i % 6 == 0" << endl << endl;

  return 0;
}</code></pre>

Output:

<pre>== FOR EACH ==
Element: 10
Element: 9
Element: 8
Element: 7
Element: 6
Element: 5
Element: 4
Element: 3
Element: 2
Element: 1

== COUNTING NUMBERS LARGER THAN 6 ==
There are 4 numbers larger than 6 in the vector

== FINDING AN ELEMENT ==
The element 6 is the first that satisfies i > 4 && i % 6 == 0</pre>

These algorithms not only make the code easier to write and read, but they also tend to be faster: they are heavily optimized, often making them much faster than hand-written for/while loops.

The execution of these algorithms, however, is not parallel by default: it is sequential. It is possible, though, to parallelize many of them, making their execution much faster. This part of the project discusses the following method for parallelizing the STL: policy-based execution for C++17 with the Intel Parallel STL.

== Policy-based execution for C++17 and Intel Parallel STL ==
C++17 introduces a feature called Execution Policies, which can be used to specify what kind of execution is desired for an algorithm. There are three possible execution policies:

* '''std::execution::seq''' - Sequential execution (no parallelism)
* '''std::execution::par''' - Parallel execution (multiple threads)
* '''std::execution::par_unseq''' - Parallel execution + vectorization

The Intel C++ compiler also supports another policy, which is not specified in the [http://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t C++ Documentation]:

* '''std::execution::unseq''' - Vectorization

These policies are passed as the first parameter of the algorithm:

<pre><code>std::copy(
  std::execution::par,
  a.begin(),
  a.end(),
  b.begin()
);</code></pre>

Most compilers nowadays do not yet support this feature. Probably the only compiler that can already make use of these policies is the Intel C++ Compiler, which was used to perform the tests below.

Here are the tests performed: we compared the speed of the four execution policies above for six algorithms: '''sort''', '''count_if''', '''for_each''', '''reduce''', '''transform''', and '''copy''', using three different data structures: '''array''', '''vector''', and '''list'''. We used the Intel C++ Compiler on Windows, on an x86_64 machine with 8 cores. The programs were compiled with '''O3''' optimization, including the Intel TBB (Threading Building Blocks) library and OpenMP. The purpose of these tests was to see how the elapsed time would change with different execution policies and different data structures.

Some observations to point out:
* There is not much documentation available on how to properly use execution policies (at least I did not find anything in depth).
* The programmer who is using execution policies is still responsible for avoiding deadlocks and race conditions.
* The purpose of these tests is not to compare one compiler versus another or one algorithm versus another.
* The code for all the tests is available [https://github.com/hscasn/testpstl here].
* '''std::vector''' is a dynamic array: the data is kept sequentially in memory, while '''std::list''' is a linked list (the data is scattered in memory).
* The tests were run three times, and the average of the elapsed times was used as the result, avoiding "abnormal" results.

----
'''std::copy'''

''Code snippet:''

Copying values from array ''a'' to array ''b'':

<pre><code>std::copy(
  std::execution::par,
  a,
  a + ARRSZ,
  b
);</code></pre>

''Results:''

[[File:http://public.hcoelho.com/images/blog/pstl_copy.png]]

First, for the array and the vector, these results look rather strange - we were expecting that for both the array and the vector we would notice a downward trend in time across the execution policies: since the memory is sequential, they could easily be vectorized.

There are some important guesses/conclusions here that I want to mention. They will be important to explain these results and the results for the following tests:
* An explanation for why the vectorized method did not yield a good result could be that, because I was compiling with '''O3''', the serial code was already being vectorized in the background; asking for it to be vectorized again would add some overhead.
* I am not entirely sure why the parallel policy did not always have a good result, but if I had to guess, I would say it was because the copies were creating a '''race condition''', which was slowing the parallel execution down.

Now, about the ''list'': the parallel versions were better than the serial and vectorized ones. Why did the vectorized version not yield a good result? This could be explained by the fact that vectorization did not happen: because of the nature of a linked list, the memory is too scattered to be put together in a vector register and processed with '''SIMD''', so the timing would not improve at all. One thing to point out: '''since a list is not sequential in memory, it is costly to "slice" the collection into equal parts and parallelize them (we basically have to iterate through the whole list), so the gains for parallel execution are not great. If, however, the operation to be performed on every element is slow, this slow slicing can still be worth it.'''

----
'''std::count_if'''

''Snippet:''

<pre><code>// Counting all multiples of 3
auto condition = [](mytype& i) { return i % 3 == 0; };

size_t sum = std::count_if(
  std::execution::par,
  a,
  a + ARRSZ,
  condition
);</code></pre>

''Results:''

[[File:http://public.hcoelho.com/images/blog/pstl_count_if.png]]

These results look a lot more like what we were expecting:
* The vectorized version is the slowest, because vectorizing an operation like this is not possible, and we have the extra overhead of making this decision.
* The parallel version performed better than any other policy, while the parallel + vectorized version carried the extra overhead of deciding whether vectorization should happen.
* We had a similar result for the list, but the gains were dramatically smaller. This is because, as cited before, a list is not sequential in memory, so it is costly to "slice" the collection into equal parts and parallelize them. We did have a very small gain, but the cost of iterating through the list was way too big.

----
'''std::for_each'''

''Snippet:''

<pre><code>size_t sum;
auto action = [&](mytype& i) { return sum += i; };

std::for_each(
  std::execution::par,
  a,
  a + ARRSZ,
  action
);</code></pre>

Notice how this '''for_each''' behaves like a '''reduce''': I am modifying the variable "sum" outside of the lambda. This is very likely to cause a '''race condition'''.

''Results:''

[[File:http://public.hcoelho.com/images/blog/pstl_for_each.png]]

Even though we had what looked like a race condition, we still got good results with the parallel execution policies for the array and the vector. Again, this operation could not be vectorized, and this is why the vectorization policy did not do well. For the '''list''', we see the same pattern as before: since slicing the collection is costly, it seems that either the compiler did not parallelize it, or the parallel version was just as slow as the serial version.

----
'''std::reduce'''

''Snippet:''

<pre><code>size_t sum = std::reduce(
  std::execution::par,
  a.begin(),
  a.end()
);</code></pre>

''Results:''

[[File:http://public.hcoelho.com/images/blog/pstl_reduce.png]]

The results here are very similar to the '''for_each''' algorithm - it seems that the race condition I made with "sum" in the previous test was not really a problem for the algorithm.

----
'''std::sort'''

''Snippet:''

This algorithm is a bit different: we cannot use ''std::list'' with it. Since the algorithm requires random access, it is only possible to use arrays or vectors.

<pre><code>auto sorter = [](mytype a, mytype b) { return a < b; };

std::sort(
  std::execution::par,
  a,
  a + ARRSZ,
  sorter
);</code></pre>

''Results:''

[[File:http://public.hcoelho.com/images/blog/pstl_sort.png]]

Probably the most dramatic results so far. Vectorization for sorting is probably not something that can be done, so it makes sense that the vectorized policies did not yield a good result. The parallel versions, however, had a very dramatic improvement. It seems that this '''std::sort''' implementation is really optimised for parallel execution!

----
'''std::transform'''

''Snippet:''

<pre><code>auto action = [](mytype i) { return i += 5; };

std::transform(
  std::execution::par,
  a,
  a + ARRSZ,
  a,
  action
);</code></pre>

''Results:''

[[File:http://public.hcoelho.com/images/blog/pstl_transform.png]]

For the list, what happened here seems similar to what happened before: it is too costly to parallelize or vectorize the operations, so the execution policies did not have a good result. For the array, we also had a similar result as before, with the parallel version working better than the other ones.

For the '''vector''' data structure, however, we finally had a different result: for some reason, the vectorized version had a good effect this time. Apparently the serial version could not be vectorized by default, but with the '''unseq''' execution policy the operation could finally be vectorized. An explanation for the fact that the parallel + vectorization version did not have a better result than the parallel one, however, is not very obvious to me. If I had to guess, I would say it is because the parallel version had the vectorization done by the compiler as well, without us needing to specify it.

In conclusion, it is refreshing to see a tool like this. It makes parallelization so easy that we almost don't need to worry about anything else. It would be nice to have clearer documentation for these methods, so the results could be a little more predictable - some of these results are not exactly intuitive. A suggestion for a next project is to analyse the generated assembly code to see exactly why we got some of these results.
== Introduction: The Rust Programming Language ==
Rust is a systems programming language (like C and C++, offering direct access to physical memory) sponsored by Mozilla and created in 2010. It is compiled, imperative, functional, and strongly typed; it aims to provide better memory safety while maintaining performance similar to C++, offering reference counting as an opt-in alternative to garbage collection. Rust won first place as the "most loved programming language" in 2016 and 2017 in a survey from Stack Overflow [https://insights.stackoverflow.com/survey/2016#technology-most-loved-dreaded-and-wanted 1] [https://insights.stackoverflow.com/survey/2017#most-loved-dreaded-and-wanted 2].

Rust has a very strong emphasis on security, more specifically on memory safety. Because of this, the same rules that the compiler enforces on a program end up solving many problems that are common in concurrent execution, such as race conditions. This part of the project explains what these safety rules are, how they can prevent problems in parallel programming, and demonstrates the correct way to tackle concurrency with Rust.

== Memory Ownership and Borrowing ==
Rust has a very peculiar way to deal with memory deallocation, safety, and ownership. Understanding how memory management works is essential in order to understand parallel programming in Rust. Consider the following example, where we create a vector, push some elements into it, and then print the elements:

<source lang="rust">
fn main() {
    let mut vec = Vec::new();

    vec.push(0);
    vec.push(1);
    vec.push(2);

    for i in vec.iter() {
        println!("{}", i);
    }
}
</source>

Output:

<source>
0
1
2
</source>

Notice how variables in Rust are immutable (constant) by default - we need to manually specify that the vector can be changed. Failing to do so will result in a compilation error. In the next example, we abstract the "push" into a function called "add_to_vec", which takes a vector and a number and pushes the number into the vector:

<source lang="rust">
fn add_to_vec(mut v: Vec<i32>, n: i32) {
    v.push(n);
}

fn main() {
    let mut vec = Vec::new();

    vec.push(0);
    vec.push(1);
    vec.push(2);

    add_to_vec(vec, 3);

    for i in vec.iter() {
        println!("{}", i);
    }
}
</source>

In a language like C++, a vector passed like this would be copied into the function and modified, leaving the original vector untouched; in a language like Java, a similar logic would work, since objects are passed by reference by default. Rust, on the other hand, has a different approach: sending the vector to a function like this means transferring the ownership of the object completely to that function. This means that after the vector is sent to a function, it is not available to the main function anymore - once the '''add_to_vec''' function ends, the vector will then be deallocated, and the next ''for'' loop will fail. If we try compiling this code, the compiler will produce an error:

<source>
error[E0382]: use of moved value: `vec`
  --> src/main.rs:13:12
   |
11 |     add_to_vec(vec, 3);
   |                --- value moved here
12 |
13 |     for i in vec.iter() {
   |              ^^^ value used here after move
   |
   = note: move occurs because `vec` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
</source>

To fix this, we have to transfer the ownership of the vector back to the main function. We do this by returning the vector and reassigning it to the variable '''vec''':

<source lang="rust">
fn add_to_vec(mut v: Vec<i32>, n: i32) -> Vec<i32> {
    v.push(n);
    v // shorthand for returning. notice the lack of ;
}

fn main() {
    let mut vec = Vec::new();

    vec.push(0);
    vec.push(1);
    vec.push(2);

    vec = add_to_vec(vec, 3); // receiving the ownership back

    for i in vec.iter() {
        println!("{}", i);
    }
}
</source>

Now the output is as expected:

<source>
0
1
2
3
</source>

An easier way to deal with memory ownership is lending/borrowing memory instead of completely transferring control over it. This is called memory '''borrowing'''. To borrow a value, we use the '''&''' operator:

<source lang="rust">
fn add_to_vec(v: &mut Vec<i32>, n: i32) {
    v.push(n);
}

fn main() {
    let mut vec = Vec::new();

    vec.push(0);
    vec.push(1);
    vec.push(2);

    add_to_vec(&mut vec, 3);

    for i in vec.iter() {
        println!("{}", i);
    }
}
</source>

This will also produce the correct output. In this case, we don't need to send the ownership of the vector back, because it was never transferred, only borrowed. There is one important detail: '''mutable references (&mut T) are only allowed to exist if no other mutable references to that value are active. In other words: only one mutable reference to a value may exist at a time'''.
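For example, the following sketch (added here for illustration; not one of the original examples) fails to compile because it tries to hold two mutable references to the same vector at once:

<source lang="rust">
fn main() {
    let mut vec = vec![1, 2, 3];

    let first = &mut vec;  // first mutable borrow
    let second = &mut vec; // error[E0499]: cannot borrow `vec` as mutable more than once at a time

    first.push(4);
    second.push(5);
}
</source>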
From what was presented above, some problems can be identified when applying parallelism:

* If memory ownership is transferred in function calls, we cannot transfer the ownership of something to several threads at the same time - only one thread can have the ownership.
* Only one mutable reference to a value may exist at a time, which means we cannot spawn several threads holding mutable references to the same value.

These two factors limit the capability of conventional shared memory. For example, the following code would not compile:

<source lang="rust">
use std::thread;

fn add_to_vec(v: &mut Vec<i32>, n: i32) {
    v.push(n);
}

fn main() {
    let mut vec = Vec::new();

    vec.push(0);
    vec.push(1);
    vec.push(2);

    let t1 = thread::spawn(move || {
        add_to_vec(&mut vec, 4);
    });

    add_to_vec(&mut vec, 3);

    for i in vec.iter() {
        println!("{}", i);
    }

    t1.join();
}
</source>

<source>
error[E0382]: use of moved value: `vec`
  --> src/main.rs:20:19
   |
13 |     let t1 = thread::spawn(move || {
   |                            ------- value moved (into closure) here
...
20 |     add_to_vec(&mut vec, 3);
   |                     ^^^ value used here after move
   |
</source>

Although this seems unnecessarily strict, it has a great advantage: race conditions are impossible - they are checked at compile time, which means that the systems will be safer and less likely to present concurrency problems. To correctly use parallel programming in Rust, we can make use of tools such as Channels, Locks (Mutex), and Atomic pointers.

== Channels ==
Instead of sharing memory, Rust encourages communication between threads in order to combine information. Channels, which are available in Rust's standard library, can be pictured as rivers: an object put into the river is carried downstream. Likewise, a channel has two parts: a sender and a receiver. The sender is responsible for putting information into the channel, which can be retrieved by the receiver. When we ask a receiver to get a message from the channel, the execution may or may not wait until a message arrives: the method '''recv''' will block the execution, while '''try_recv''' will not.

The following example will spawn a thread, which will send a few strings down a channel to be received by the main thread.

<source lang="rust">
use std::thread;
use std::sync::mpsc;
use std::sync::mpsc::{Sender, Receiver};

fn main() {
    let (sender, receiver): (Sender<String>, Receiver<String>) = mpsc::channel();

    thread::spawn(move || {
        let messages = vec![
            String::from("blue"),
            String::from("yellow"),
            String::from("green"),
            String::from("red"),
        ];

        for colour in messages {
            sender.send(colour).unwrap();
        }
    });

    for message in receiver {
        println!("Got {}", message);
    }
}
</source>

This produces the following output:

<source>
Got blue
Got yellow
Got green
Got red
</source>
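As mentioned above, '''recv''' blocks until a message arrives, while '''try_recv''' returns immediately. The following minimal sketch (added here for illustration; not one of the original examples) shows '''try_recv''' failing before a message is sent and succeeding afterwards:

<source lang="rust">
use std::sync::mpsc;

fn main() {
    let (sender, receiver) = mpsc::channel();

    // Nothing has been sent yet, so try_recv returns Err instead of blocking
    match receiver.try_recv() {
        Ok(msg) => println!("Got {}", msg),
        Err(_) => println!("No message yet, moving on"),
    }

    sender.send(String::from("hello")).unwrap();

    // Now a message is waiting, so try_recv returns Ok
    match receiver.try_recv() {
        Ok(msg) => println!("Got {}", msg),
        Err(_) => println!("No message yet, moving on"),
    }
}
</source>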
Similarly to the previous examples, the '''sender''' will be moved into the closure executed by the thread, which means it will go out of scope when the thread finishes. To distribute the sender across several threads, it can be cloned: objects sent through a cloned sender will still be received by the same receiver, but the clones are different objects, whose ownership can be moved independently:

<source lang="rust">
use std::thread;
use std::sync::mpsc;
use std::sync::mpsc::{Sender, Receiver};

static NUM_THREADS: i32 = 8;

fn main() {
    let mut vec = Vec::new();

    let (sender, receiver): (Sender<i32>, Receiver<i32>) = mpsc::channel();

    for i in 0..NUM_THREADS {
        // cloning the sender so it can be exclusive to the new thread
        let thread_sender = sender.clone();

        thread::spawn(move || {
            thread_sender.send(i).unwrap();
        });
    }

    for _ in 0..NUM_THREADS {
        vec.push(receiver.recv().unwrap());
    }

    for i in vec.iter() {
        println!("{}", i);
    }
}
</source>

Output:

<source>
1
3
0
2
5
6
7
4
</source>

== Locks (Mutex) ==
The Rust language also supports shared-state concurrency, which allows two or more threads to share some state (data) that they can write to and read from. Sharing data between multiple threads can get pretty complicated, since it introduces the strong hazard of race conditions. Imagine a situation where one thread grabs some data and attempts to change it while another thread is starting to read the same value: there is no way to predict whether the latter retrieves the updated data or gets hold of the old value. Thus, shared-state concurrency has to deliver some implementation of a guard and a synchronized access mechanism.

[[File:RustMutex.png|thumb|left|alt=Mutex|Acquire/Release Lock Process]]

Access to the shared state (the critical section) and synchronization are typically implemented through a ''mutex''. Mutual exclusion principles (mutex) provide a way to prevent race conditions by allowing only one thread to reach some data at any given time. If a thread wants to read/write a piece of data, it must first give a signal and then, once permitted, acquire the mutex's lock. The lock is a special data structure that keeps track of who currently has exclusive access to the data. You can think of locks as a dedicated mechanism that guards the critical section.

Since Rust enforces ownership principles, threads are isolated from each other automatically, and only one thread can access a given piece of data at any given time.

The snippet below demonstrates how you can create and access a mutex in Rust:

<source lang="rust">
fn use_lock(mutex: &Mutex<Vec<i32>>) {
    let mut guard = lock(mutex);      // take ownership
    let numbers = access(&mut guard); // borrow access
    numbers.push(42);                 // change the data
}
</source>

The Mutex construct is generic: it accepts the piece of shared data protected by the lock. It is important to remember that the ownership of that data is transferred into the mutex structure at its creation.

The lock function will attempt to acquire the lock by blocking the local thread until the mutex is available. The data is automatically unlocked once the return value of the lock() function is released (at the end of the function scope), so there is no need to release the mutex lock manually.
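The '''lock''' and '''access''' helpers above are illustrative shorthands rather than standard library functions. As a rough runnable equivalent (added here for illustration), the guard returned by '''Mutex::lock''' in the standard API dereferences directly to the protected data:

<source lang="rust">
use std::sync::Mutex;

fn use_lock(mutex: &Mutex<Vec<i32>>) {
    // lock() blocks until the mutex is available; unwrap() handles poisoning
    let mut guard = mutex.lock().unwrap();
    guard.push(42); // the guard dereferences to the protected Vec<i32>
} // the guard goes out of scope here, releasing the lock automatically

fn main() {
    let mutex = Mutex::new(vec![1, 2, 3]);
    use_lock(&mutex);
    println!("{:?}", mutex.lock().unwrap()); // [1, 2, 3, 42]
}
</source>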
== RC and Atomic ==
In Rust, some data types are defined as "thread safe" while others are not. For example, the Rc<T> type, which is Rust's own implementation of a "smart pointer", is considered unsafe to share across threads. This type keeps track of the number of references and increments/decrements the count each time a new reference is created or an old one gets destroyed. Rc<T> does not enforce any mechanism that makes sure changes to the reference counter value cannot be interrupted by another thread. The snippet below demonstrates the issue and results in a compiler error:

<source lang="rust">
let mutex = Rc::new(Mutex::new(2));
let mut handles = vec![];

for _ in 0..4 {
    let mutex = mutex.clone();
    let handle = thread::spawn(move || {
        let mut num = mutex.lock().unwrap();

        *num *= 2;
        println!("Intermediate Result : {}", *num);
    });
    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}

println!("Final Result: {}", *mutex.lock().unwrap());
</source>

This will produce an error:

<source>
error[E0277]: the trait bound `std::rc::Rc<std::sync::Mutex<i32>>: std::marker::Send` is not satisfied in `[closure@src/main.rs:11:32: 16:6 mutex:std::rc::Rc<std::sync::Mutex<i32>>]`
  --> src/main.rs:11:18
   |
11 |     let handle = thread::spawn(move || {
   |                  ^^^^^^^^^^^^^ `std::rc::Rc<std::sync::Mutex<i32>>` cannot be sent between threads safely
   |
   = help: within `[closure@src/main.rs:11:32: 16:6 mutex:std::rc::Rc<std::sync::Mutex<i32>>]`, the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::sync::Mutex<i32>>`
   = note: required because it appears within the type `[closure@src/main.rs:11:32: 16:6 mutex:std::rc::Rc<std::sync::Mutex<i32>>]`
   = note: required by `std::thread::spawn`
</source>

Luckily for us, Rust ships another thread-safe implementation of a "smart pointer" called Arc<T>, which implements reference counting via atomic operations. It is important to note that in serial code it makes much more sense to use the standard Rc<T> over Arc<T>, since the latter structure is more expensive performance-wise.

<source lang="rust">
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let mutex = Arc::new(Mutex::new(2));
    let mut handles = vec![];

    for _ in 0..4 {
        let mutex = mutex.clone();
        let handle = thread::spawn(move || {
            let mut num = mutex.lock().unwrap();

            *num *= 2;
            println!("Intermediate Result : {}", *num);
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Final Result: {}", *mutex.lock().unwrap());
}
</source>

This will produce the expected output:

<source>
Intermediate Result : 4
Intermediate Result : 8
Intermediate Result : 16
Intermediate Result : 32
Final Result: 32
</source>
This time the code worked. This example was very simple and not too impressive; much more complicated algorithms can be implemented with Mutex<T>.
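The Arc/Mutex combination is not the only option for simple shared state: the standard library also provides atomic types (see the Atomic Pointers source below). As a minimal sketch added here for illustration (not part of the original examples), a shared counter can be implemented with '''AtomicUsize''' and no lock at all:

<source lang="rust">
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let mut handles = vec![];

    for _ in 0..4 {
        let counter = counter.clone();
        handles.push(thread::spawn(move || {
            // fetch_add is a single atomic operation: no lock is needed
            counter.fetch_add(1, Ordering::SeqCst);
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Final count: {}", counter.load(Ordering::SeqCst)); // Final count: 4
}
</source>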
 
== Rayon Library ==
=== Introduction ===
 
Rust users and the community are heavily involved in the development of language extensions. Although the core features offer quite a few programming tools on their own, there is still a demand for more complex functionality with a higher-level API. Rust has its own package registry that stores various libraries that can satisfy programmers' needs. These packages vary from networking tools and web services to image processing and multi-threading libraries.
One package in particular can be of interest to parallel programming enthusiasts: a data-parallelism library called Rayon. The package is praised for being extremely lightweight and for making it easy to convert a sequential computation into a parallel one. It also abstracts away some of Rust's threading boilerplate code and makes coding parallel tasks a bit more fun.
Just like TBB, it can be used with iterators, where iterator chains are executed in parallel. It also supports the join model and provides ways to convert recursive, divide-and-conquer style problems into parallel implementations.
 
=== Parallel Iterators ===
Rayon offers an experimental API called "parallel iterators". The example below demonstrates a parallel sum-of-squares function:
<source lang="rust">
use rayon::prelude::*;
fn sum_of_squares(input: &[i32]) -> i32 {
    input.par_iter()
         .map(|&i| i * i)
         .sum()
}
</source>
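Assuming the rayon crate is declared as a dependency, a small driver program (added here for illustration; not part of the original snippet) shows that the parallel version returns the same result as a sequential iterator chain:

<source lang="rust">
use rayon::prelude::*;

// same function as in the snippet above
fn sum_of_squares(input: &[i32]) -> i32 {
    input.par_iter()
         .map(|&i| i * i)
         .sum()
}

fn main() {
    let numbers: Vec<i32> = (1..11).collect();

    // the parallel version matches the sequential result
    let sequential: i32 = numbers.iter().map(|&i| i * i).sum();
    assert_eq!(sum_of_squares(&numbers), sequential);

    println!("{}", sum_of_squares(&numbers)); // prints 385
}
</source>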
 
=== Using Join for Divide-and-Conquer Problems ===
Parallel iterators are implemented on top of a more primitive method called join, which takes two closures and potentially runs them in parallel. The code snippet below demonstrates an increment algorithm:
 
<source lang="rust">
 
fn increment_all(slice: &mut [i32]) {
    if slice.len() < 1000 {
        for p in slice { *p += 1; }
    } else {
        let mid_point = slice.len() / 2;
        let (left, right) = slice.split_at_mut(mid_point);
        rayon::join(|| increment_all(left), || increment_all(right));
    }
}
</source>
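'''join''' also returns the results of the two closures as a tuple, which makes it easy to parallelize divide-and-conquer computations that produce values. The classic naive Fibonacci function is a minimal sketch of this (added here for illustration, assuming the rayon crate):

<source lang="rust">
// rayon::join runs both closures, potentially in parallel,
// and returns their results as a tuple once both have finished
fn fib(n: u32) -> u32 {
    if n < 2 {
        return n;
    }
    let (a, b) = rayon::join(|| fib(n - 1), || fib(n - 2));
    a + b
}

fn main() {
    println!("{}", fib(20)); // prints 6765
}
</source>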
=== How it works ===
Rayon borrowed its work-stealing concepts from the Cilk project. The library attempts to dynamically ascertain how many multi-threading resources are available. The idea is very simple: there is always a pool of worker threads waiting for work to do. When join is called the first time, execution shifts over into that pool of threads.
[[File:Rust_Join1.png]]
But if you call join(a, b) from a worker thread W, then W will place b into its work queue, advertising that this is work that other worker threads might help out with. W will then start executing a. While W is busy with a, other threads might come along and take b from its queue. That is called stealing b.
[[File:Rust_Join_2.png|600px]]
Once a is done, W checks whether b was stolen by another thread and, if not, executes b itself. If W runs out of jobs in its own queue, it will look through the other threads' queues and try to steal work from them.
[[File:Rust_Join_3.png|600px]]
== Group Members ==
# [mailto:obelavina@myseneca.ca Olga Belavina]
== Sources ==
* [https://en.wikipedia.org/wiki/Standard_Template_Library Standard Template Library]
* [http://www.cplusplus.com/reference/algorithm/ STL Algorithms]
* [https://scs.senecac.on.ca/~oop345/pages/content/multi.html#alg Policy-Based Execution for C++17]
* [https://software.intel.com/en-us/get-started-with-pstl Intel Parallel STL]
* [https://doc.rust-lang.org/book/second-edition/ch03-01-variables-and-mutability.html Variables and Mutability]
* [https://doc.rust-lang.org/book/second-edition/ch04-01-what-is-ownership.html Ownership]
* [https://doc.rust-lang.org/book/second-edition/ch04-02-references-and-borrowing.html References and Borrowing]
* [https://doc.rust-lang.org/std/sync/atomic/struct.AtomicPtr.html Atomic Pointers]
* [https://doc.rust-lang.org/std/sync/struct.Mutex.html Mutex]
* [https://docs.rs/rayon/0.9.0/rayon/ Rayon Library]
* [https://static.rust-lang.org/doc/master/std/sync/mpsc/index.html Channels]
* [https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.html Fearless Concurrency with Rust]
== Progress ==
-5%
