Changes

← Older edit

Team Hortons

5,544 bytes added, 14:10, 15 December 2017

→‎Sources

= ~~Parallelization of the C++ Standard Library: Intel Parallel STL, MCSTL, and C++17~~ Parallelism with Rust =

== Introduction: The ~~Standard Template Library~~ Rust Programming language ==The Standard Template Library (STL) for C++ is a library that provides a set of classes and functions for C++, like iterators and vectors. The STL is divided in four parts: algorithms, containers, functors, and iterators. For this project, we will focus on the ''algorithms'' library.

~~The algorithms library provides many functions~~ Rust is a system programming language (like C and C++, offering direct access to ~~be used for ranges of elements~~physical memory) sponsored by Mozilla, created in 2010. It is compiled, imperative, functional and ~~use iterators or pointers~~ strongly typed; it aims to ~~navigate~~ provide better memory safety and still maintain a performance similar to C++, while providing garbage collection through ~~them~~reference counting. Rust won the first place as the "most loved programming language" in 2016 and 2017, in a survey from Stack Overflow [https://insights. ~~Some of these algorithms are~~stackoverflow.com/survey/2016#technology-most-loved-dreaded-and-wanted 1][https://insights.stackoverflow.com/survey/2017#most-loved-dreaded-and-wanted 2].

* '''all_of''' - Tests if Rust has a ~~condition is true for all the elements~~ very strong emphasis in security, more specifically, memory safety. Because of this, the ~~range~~* '''for_each''' - Applies a function to all elements in same rules that the ~~range~~* '''find''' - Finds compiler requires from a ~~value~~ program end up solving many problems that are common in ~~the range~~* '''sort''' - Sorts the elements concurrent executions, such as race conditions. This project will explain what these security rules are, how they can prevent problems in parallel programming, as well as demonstrating the ~~range~~correct way to tackle concurrency with Rust.

~~These algorithms provide a ''functional'' way to work on collections,~~ == Memory Ownership and ~~are present in several programming languages. These algorithms replace what usually would be a for/while loop, shifting the attention from "how to loop" to "what to do".~~Borrowing ==

~~<pre>#include <iostream>#include <algorithm>#include <~~Rust has a very peculiar way to deal with memory deallocation, security, and ownership. Understanding how memory management works is essential in order to understand how to use parallel programming in Rust. Consider the following example, where we create a vector>, push some elements into it, and then print the elements:

~~using namespace std~~<source lang="rust">fn main() { let mut vec = Vec::new(); vec.push(0); vec.push(1); vec.push(2);

~~int main~~ for i in vec.iter(){ println!("{}", i); }}</source> Output: <source>012</source> Notice how variables in Rust are immutable (constant) by default - we need to manually specify that the vectorcan be changed. Failing to do so will result in a compilation error. In the next example, we abstracted the "push" method to a function called "add_to_vec", which takes a vector and a number, and pushes the number into the vector: <source lang="rust">fn add_to_vec(mut v: Vec<~~int~~i32> ~~myVector~~ , n: i32) { v.push(n);} fn main() { let mut vec = Vec::new(); vec.push(0); vec.push(1); vec.push(2); add_to_vec(vec, 3); for i in vec.iter() { println!("{ 10}", i); }}</source> In a language like C++, 6a vector passed like this would be copied to the function and modified, 7leaving the original vector untouched; in a language like Java, 8a similar logic would work, 5since objects are passed by reference by default. Rust, on the other hand, has a different approach: sending the vector to a function like this means transferring the ownership of the object completely to that function. This means that after the vector is sent to a function, it is not available to the main function anymore - once the '''add_to_vec''' function ends, 4the vector will then be deallocated, 1and the next *for* loop will fail. If we try compiling this code, 2the compiler will produce an error: <source>error[E0382]: use of moved value: `vec` --> src/main.rs:13:12 |11 | add_to_vec(vec, 3); | --- value moved here12 | 13 | for i in vec.iter() { | ^^^ value used here after move | = note: move occurs because `vec` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait</source> To fix this, 9 we will have to transfer the ownership of the vector back to the main function. We do this by returning the vector back and reassigning it to the variable '''vec''': <source lang="rust">fn add_to_vec(mut v: Vec<i32>, n: i32) -> Vec<i32> { v.push(n); v // shorthand for returning. notice the lack of ;} fn main() { let mut vec = Vec::new(); vec.push(0); vec.push(1); vec.push(2);

~~// Sorting the elements~~ ~~sort(myVector.begin~~vec = add_to_vec()vec, ~~myVector.end(), [](int a, int b) { return b < a; }~~3);// receiving the ownership back

~~cout << "== FOR EACH ==" << endl;~~ ~~for_each(myVector~~for i in vec.~~begin~~iter()~~, myVector.end~~{ println!(~~), [](int i)~~ "{ ~~cout <<~~ }"~~Element: " <<~~ , i ~~<< endl; }~~); ~~cout <~~}}< ~~endl;~~/source>

Now the output is the expected:

~~cout~~ <~~< "== COUNTING NUMBERS LARGER THAN 6 ==" << endl;~~source>012 ~~long nLt6 = count_if(myVector.begin(), myVector.end(), [](int i) { return i > 6; });~~3 ~~cout~~ <~~< "There are " << nLt6 << " numbers larger than 6 in the vector" << endl << endl;~~/source>

An easier way to deal with memory ownership is by lending/borrowing memory instead of completely transferring control over it. This is called memory '''borrowing'''. To borrow a value, we use the '''&''' operator:

~~cout <~~< source lang="~~== FINDING AN ELEMENT ==~~rust" ~~<< endl;~~> ~~vector~~fn add_to_vec(v: &~~lt;int~~mut Vec<i32>, n:~~:iterator elFound = find_if(myVector.begin(~~i32)~~, myVector~~{ v.~~end~~push(~~), [](int i) { return i > 4 && i % 6 == 0; }~~n); ~~cout << "The element " << *elFound << " is the first that satisfies i > 4 && i % 6 == 0" << endl << endl;~~}

fn main() { ~~return~~ let mut vec = Vec::new(); vec.push(0); vec.push(1); vec.push(2); add_to_vec(&mut vec, 3); for i in vec.iter() { println!("{}", i); }

}

</~~pre~~source>

~~Output~~This will also produce the correct output. In this case, we don't need to send the ownership of the vector back, because it was never transferred, only borrowed. There is one important detail:'''mutable references (&mut T) are only allowed to exist if no other mutable references to that value are active. In other words: only one mutable reference to a value may exist at a time'''.

~~<pre>== FOR EACH ==Element: 10Element: 9Element: 8Element: 7Element~~From what was presented above, some problems can be identified when applying parallelism: 6~~Element: 5~~* If memory ownership is transferred in function calls, we can not transfer the ownership of one something to several threads at the same time - only one thread can have the ownership~~Element: 4Element: 3Element: 2Element: 1~~* Mutable references to something can only exist one at a time, this means that we can not spawn several threads with mutable references to a value

~~== COUNTING NUMBERS LARGER THAN 6 ==There are 4 numbers larger than 6 in~~ These two factors limit the ~~vector~~capability of conventional shared memory. For example, the following code would not compile:

<source lang=~~= FINDING AN ELEMENT ==The element 6 is the first that satisfies i~~ "rust"> ~~4 && i % 6 == 0</pre>~~use std::thread;

~~These algorithms not only make the code easier to write and read~~fn add_to_vec(v: &mut Vec<i32>, ~~but it also tends to be faster~~n: ~~these algorithms are heavily optimized, making them much faster than for/while loops~~i32) { v.push(n);}

~~The execution for these algorithms, however, are not parallel by default~~fn main() { let mut vec = Vec: ~~they are sequential~~:new(); vec. ~~However, it is possible to parallelize many of these algorithms, making their execution much faster~~push(0); vec. ~~This project will discuss the following methods for parallelizing the STL: Policy-based execution for C++17 + Intel Parallel STL~~push(1); vec.push(2);

let t1 =~~= Policy-based execution for C++17 and Intel Parallel STL ==~~thread::spawn(move || {~~C++17~~ add_to_vec(&mut vec, ~~introduces a feature called Execution Policies, which can be used to specify what kind of execution is is desired for the algorithm. There are three possible execution policies:~~4); });

* '''std::execution::seq''' - Sequential execution add_to_vec(~~no parallelism~~&mut vec, 3)* '''std::execution::par''' - Parallel execution (multiple threads)* '''std::execution::par_unseq''' - Parallel execution + vectorization;

~~The Intel C++ compiler also supports another policy, which was not specified~~ for i in ~~the [http://en~~vec.~~cppreference.com/w/cpp/algorithm/execution_policy_tag_t C++ Documentation]:~~iter() { println!("{}", i); }

* '''std::execution::unseq''' - Vectorization t1.join();}</source>

~~These executions are passed as the first parameter for the algorithm~~<source>error[E0382]:use of moved value: `vec` --> src/main.rs:20:19 |13 | let t1 = thread::spawn(move || { | ------- value moved (into closure) here...20 | add_to_vec(&mut vec, 3); | ^^^ value used here after move |</source>

~~<pre>std~~Although this seems unnecessarily strict, it has a great advantage:~~:copy(~~ ~~std::execution::par~~race conditions are impossible - they are checked at compile time, awhich means that the systems will be safer and less likely to present concurrency problems.~~start()~~To correctly use parallel programming in Rust, we can make use of tools such as Channels, ~~a.end~~Locks (Mutex), band Atomic pointers.~~start()~~);~~</pre>~~

~~Most compilers nowadays do not yet support this feature. Probably the only compiler that can already make use of these policies is the Intel C++ Compiler, which was used to perform the tests below.~~== Channels ==

~~Here are the tests performed: we compared the speed~~ Instead of the four execution policies above for six algorithms: **sort**sharing memory, ~~**count\_if**~~Rust encourages communication between thread in order to combine information. Channels, ~~**for_each**~~which are available in Rust's standard library, ~~**reduce**, **transform**~~can be pictured as rivers: an object put into this river will be carried downstream. Likewise, ~~and **copy**, using three different data structures~~a Channel has two parts: **array**, **vector**, a sender and ~~**list**~~a receiver. ~~We used~~ The sender is responsible for putting information in the ~~Intel C++ Compiler on Windows~~channel, ~~on an x86 64 machine with 8 cores~~which can be retrieved by the receiver. ~~The programs were compiled with **O3** optimization~~When we ask for a receiver to get a message from the channel, ~~including~~ the ~~Intel TBB (Thread Building Blocks) library and OpenMP. The purpose of these tests was to see how~~ execution may wait until the message is received or not: the method '''recv''' will block the ~~elapsed time would change with different~~ execution ~~policies and different data structures~~, while '''try_recv''' will not.

The following example will spawn a thread, which will send a few strings down a channel which will be received by the main thread.

~~Some observations to point out~~<source lang="rust">use std::thread;use std::sync::mpsc;use std::sync::mpsc::{Sender, Receiver};

* There is not much documentation available on how to properly use execution policies fn main(~~at least I did not find anything in depth~~).{* The programmer who is using execution policies is still responsible for avoiding deadlocks and race conditions.* The purpose of these tests is not to compare one compiler versus another or one algorithm versus another.* The code for all the tests is available [https let (sender, receiver):~~//github.com/hscasn/testpstl here].~~(Sender<String>, Receiver<String>) = * **std mpsc::~~vector** is a dynamic array: the data is kept sequentially in memory, while **std::list** is a linked list~~ channel(~~the data is scattered in memory~~).* The tests were ran three times, and the average of the elapsed time was used as the results, avoiding "abnormal" results.;

~~----~~ thread::spawn(move || { let messages = vec![**std String::copy**from("blue"), String::from("yellow"), String::from("green"), String::from("red"), ];

~~''Code snippet''~~ for colour in messages { sender.send(colour).unwrap(); } });

~~Copying values from array ''a'' to array ''b''.~~ for message in receiver { println!("Got {}", message); }

~~<pre>std::copy(~~ ~~std::execution::par,~~ a, ~~a + ARRSZ,~~ b);}</~~pre~~source>

~~''Results''~~This produces the following output:

~~[[File:http:~~<source>Got blueGot yellowGot greenGot red</~~/public.hcoelho.com/images/blog/pstl_copy.png]]~~source>

~~First~~Similarly to the previous examples, ~~for~~ the ~~array and~~ '''sender''' will be moved to the closure executed by the ~~vector~~thread, ~~these results look rather strange - we were expecting~~ which means that ~~both for~~ it will go out of scope when the ~~array and~~ thread finishes. To distribute the ~~vector~~sender through several threads, ~~we would notice a downward trend for~~ they can be cloned: objects sent through this sender will still be received by the ~~time for~~ same receiver, but the ~~execution policies~~two senders are different objects, which can have their ownership moved: ~~since the memory is sequential, they could easily be vectorized.~~

~~There are some important guesses/conclusions here that I want to mention. They will be important to explain these results and the results for the following tests~~<source lang="rust">use std::thread;~~- An explanation for why the vectorized method did not yield a good result could be that because I was compiling with **O3**, the serial code is already being vectorized in the background~~use std::sync::mpsc; ~~the fact that I would be asking for it to be vectorized again would add some overhead- I am not entirely sure why the parallel policy did not always have a good result~~use std::sync::mpsc::{Sender, ~~but if I had to guess, I would say it was because the copies were creating a **race condition**, which was slowing the parallel execution down~~Receiver};

Now, about the *list*static NUM_THREADS: the parallel versions were better than the serial and vectorized ones. Why did the vectorize version did not yield a good result? This could be explained by the fact that vectorization did not happen: because of the nature of a linked list, the memory is too scattered to be put together in a vector register and processed with **SIMD**, and this would not improve the timing at all. One thing to point out: **since a list is not sequential in memory, it is costly to "slice" the collection in equal parts and parallelize them (we basically have to iterate through the whole list), so the gains for parallel execution are not great. If, however, the operation to be performed in every element is slow, this slow slicing can still be worth it**.i32 = 8;

~~----~~fn main() {**std let mut vec = Vec::count_if**new();

~~''Snippet~~ let (sender, receiver):''(Sender<i32>, Receiver<i32>) = mpsc::channel();

~~<pre>~~ for i in 0..NUM_THREADS { // cloning the sender so it can be exclusive to // ~~Counting all multiples of 3~~the new thread~~auto condition~~ let thread_sender = []sender.clone(~~mytype& i~~) ~~{ return i % 3 == 0; }~~;

~~size_t sum = std~~ thread::~~count_if~~spawn(move || { ~~std::execution::par,~~ a, ~~a + ARRSZ,~~ ~~condition~~ thread_sender.send(i).unwrap(); });~~</pre>~~ }

~~''Results:''~~ for _ in 0..NUM_THREADS { vec.push(receiver.recv().unwrap()); }

~~[[File:http://public~~ for i in vec.~~hcoelho.com/images/blog/pstl_count_if.png]]~~iter() { println!("{}", i); }

~~These results look a lot more like what we were expecting:~~}* The vectorized version is the slowest, because vectorizing an operation like this is not possible, and we have the extra overhead of making this decision* The parallel version performed better than any other policy, because the parallel + vectorized version had the overhead for deciding if the vectorization should happen or not* We had a similar result for the list, but the gains were dramatically smaller. This is because, as I cited before, a list is not sequential in memory, so it is costly to "slice" the collection in equal parts and parallelize them. We did have a very small gain, but the cost for iterating through the list was way too big.</source>

~~----'''std~~Output:~~:for_each'''~~

~~''Snippet:''~~<source>13025674</source>

~~<pre>size_t sum;auto action~~ = ~~[&]~~= Locks (~~mytype& i~~Mutex) ~~{ return sum +~~= ~~i; };std::for_each(~~ ~~std::execution::par,~~ a, ~~a + ARRSZ,~~ ~~action~~);~~</pre>~~=

~~Notice how this **for_each** behaves like~~ Rust language also supports shared-state concurrency which allows two or more processes have some shared state (data) between them they can write to and read from. Sharing data between multiple threads can get pretty complicated since it introduces the strong hazard of race conditions. Imagine a **reduce**: I am modifying situation when one thread grabs some data and attempts to change it, while another thread is starting to read the ~~variable "sum" outside~~ same value, there’s no way to predict if latter one retrieves updated data or if it gets hold of ~~this function~~the old value. ~~This is very likely~~ Thus shared-state concurrency has to cause a **race condition**deliver some implementation of guard and synchronized access mechanism.[[File:RustMutex.png|thumb|left|alt=Mutex|Acquire/Release Lock Process ]]

~~''Results:''~~The access to shared state (critical section) and synchronization methods are typically implemented through ‘mutex’. Mutual exclusion principles (mutex) provide ways to prevent race conditions by only allowing one thread to reach out some data at any given time. If one thread wants to read/write some piece of data, it must first give a signal and then, once permitted, acquire the mutex’s lock. The lock is a special data structure that can keep track of who currently has exclusive access to the data. You can think of locks as dedicated mechanism that guards critical section.

~~[[File:http://public~~Since Rust Lang supports ownership principles, the threads are inaccessible to each other automatically and only one thread can access data at any given time time.~~hcoelho.com/images/blog/pstl_for_each.png]]~~

~~Even though we had (what it looked like) a race condition,we still got good results with the parallel excution policies for array~~ The snippet below demonstrates how you can create and ~~vector. Again, this process could not be vectorized, and this is why the vectorization policy did not do well. For the **list**, we see the same pattern as before~~access mutex in rustlang: ~~since the slicing for the collection is costly, it seems like the either the compiler did not parallelize it, or the parallel version was just as slow as the serial version.~~

~~----~~<source lang="rust">~~'''std~~fn use_lock(mutex:~~:reduce'''~~&Mutex<Vec<i32>>) { let mut guard = lock(mutex); // take ownership let numbers = access(&mut guard); // borrow access numbers.push(42); // change the data} </source>

~~''Snippet:''~~Mutex construct is generic and it accepts the piece of shared data protected by the lock. It is important to remember that the ownership of that data is transferred into the mutex structure at its creation.

~~<pre>size_t sum = std::reduce(~~ ~~std::execution::par,~~ aThe lock function will attempt to acquire the lock by blocking the local thread until it is available to obtain the mutex.~~begin~~The data is automatically unlocked once the return value of lock(), a.function gets released (at the end(of the function scope));~~</pre>~~so there is no need to release the mutex lock manually.

~~''Results:''~~== RC and Atomic ==

~~[[File:http://public~~In Rust, some data types are defined as “thread safe” while others are not.~~hcoelho~~For example, Rc<T> type, which is Rust’s own implementation of “smart pointer”, is considered to be unsafe to share across threads.~~com~~This type keeps track of number of references and increments/~~images/blog/pstl_reduce~~decrements count each time a new reference is created or old one gets destroyed. Rc<T> does not enforce any mechanisms that makes sure that changes to the reference counter value can’t be interrupted by another thread.~~png]]~~The snippet below demonstrates the issue an results in compiler error:

~~The results here are very similar to the **for_each** algorithm - it seems like the race condition I made with the~~ <source lang="~~*sum*~~rust" ~~for the previous test was not really a problem for the algorithm.~~>let mutex = Rc::new(Mutex::new(2));let mut handles = vec![];

for _ in 0..4 {

let mutex = mutex.clone();

let handle = thread::spawn(move || {

let mut num = mutex.lock().unwrap();

~~----~~ *num *= 2; println!("Intermediate Result : {}", *num); }); handles.push(handle);}

~~''' std::sort'''~~for handle in handles { handle.join().unwrap();}

~~''Snippet~~println!("Final Result:''{}", *mutex.lock().unwrap());</source>

This ~~algorithm is a bit different~~will produce an error : ~~we cannot use ''std::list'' with it. Since the algorithm requires random access, it is only possible to use arrays or vectors.~~

<~~pre~~source lang="rust">error[E0277]: the trait bound `std::rc::Rc<std::sync::Mutex<i32>>: std::marker::Send` is not satisfied in `[closure@src/main.rs:11:32: 16:6 mutex:std::rc::Rc<std::sync::Mutex<i32>>]` --> src/main.rs:11:18 |11 | let handle = thread::spawn(move || { | ^^^^^^^^^^^^^ `std::rc::Rc<std::sync::Mutex<i32>>` cannot be sent between threads safely |~~auto sorter~~ = help: within `[closure@src/main.rs:11:32: 16:6 mutex:std::rc::Rc<std::sync::Mutex<i32>>]~~(mytype a~~`, ~~mytype b) { return a~~ the trait `std::marker::Send` is not implemented for `std::rc::Rc<std::sync::Mutex<i32>>` = note: required because it appears within the type `[closure@src/main.rs:11:32: 16:6 mutex:std::rc::Rc<std::sync::Mutex< ~~b; };~~i32>>]` = note: required by `std::thread::spawn`

~~std::sort(~~ ~~std::execution::par,~~ a, ~~a + ARRSZ,~~ ~~sorter~~);</~~pre~~source>

~~''Results:''~~Luckily for us, Rust ships another thread-safe implementation of ‘smart pointer’ called Arc<T> which implements reference counting via atomic operations. It is important to note that in serial code, it makes much more sense to use standard Rc<T> over Arc<T> type since the latter structure is more expensive performance-wise.

~~[[File~~<source lang="rust">use std:~~http~~:~~//public.hcoelho.com/images/blog/pstl_sort.png]]~~sync::{Arc, Mutex};use std::thread;

Probably the most dramatic results so far. Vectorization for sorting is probably not something that can be done, so it makes sense that the vectorized policies did not yield a good result. The parallel versions, however, had a very dramatic improvement. It seems that this **stdfn main() {let mutex = Arc::~~sort** implementation is really optimised for parallel execution~~new(Mutex::new(2));let mut handles = vec![];

for _ in 0..4 {

let mutex = mutex.clone();

let handle = thread::spawn(move || {

let mut num = mutex.lock().unwrap();

~~----~~ *num *= 2; println!("Intermediate Result : {}", *num); }); handles.push(handle);}

~~'''std::transform'''~~for handle in handles { handle.join().unwrap(); }

~~''Snippet~~println!("Final Result:''{}", *mutex.lock().unwrap());

~~<pre>auto action = [](mytype i) { return i += 5;~~ };

~~std::transform(~~ ~~std::execution::par,~~ a, ~~a + ARRSZ,~~ a, ~~action~~);</~~pre~~source>

~~''Results~~This will produce the expected result :''

~~[[File~~<source>Intermediate Result :~~http~~4Intermediate Result :8Intermediate Result : 16Intermediate Result : 32Final Result: 32</~~/public.hcoelho.com/images/blog/pstl_transform.png]]~~source>

~~For~~ This time the ~~list, what happened here seems to be similar to what happened before: it is~~ code worked. This example was very simple and not too ~~costly to parallelize or vectorize the operations, so the execution policies did not have a good result~~impressive. ~~For the array, we also had a similar result as before,~~ Much more complicated algorithms can be implemented with the ~~parallel version working better than the other ones~~Mutex<T>.

For the **vector** data structure, however, we finally had a different result: for some reason, the vectorized version, this time, had a good effect. Apparently the serial version could not be vectorized by default, but when we used the **unseq** execution policy, it was now able to vectorize the operation. An explanation for the fact that the parallel + vectorization version did not have a better result than the parallel one, however, is not very obvious to me. If I had to guess, I would say it is because the parallel version had the vectorization done by the compiler as well, without us needing to specify it.== Rayon Library ===== Introduction ===

~~In conclusion~~Rust lang users and community are heavily involved in development of language extensions. Although core features offer quite a bit of programming tools on their own, there is still a demand for more complex functionality with a higher-level API. Rust has its own package registry that stores various libraries that can satisfy programmers needs. These packages vary from networking tools, web services to image processing and multi-threading libraries.One package can be of particular interest to parallel programming enthusiasts – it is ~~refreshing to see~~ a ~~tool like this~~data-parallelism library called Rayon. It The package is praised to be extremely lightweight and it makes ~~parallelization so~~ it easy ~~that we almost don't need~~ to ~~worry about anything else~~convert a sequential computation into a parallel one. It ~~would be nice to have~~ also abstracts some Rust-lang threading boilerplate code and makes coding parallel tasks a bit more ~~clear documentation for these methods~~fun.Just like TBB, ~~so the results could~~ it can be ~~a little more predictable - Some of these results~~ used with iterators, where iterator chains are ~~not exactly intuitive~~executed in parallel. ~~A suggestion for the next project is~~ It also supports join model and provides ways to ~~analyse the assembly code that was generated to see exactly why we got some of these results~~convert recursive, divide-and-conquer style problems into parallel implementations.

=== Parallel Iterators ===

Rayon offers experimental API called “parallel iterators”. Example below demonstrates parallel sum of squares function:

use rayon::prelude::*;

fn sum_of_squares(input: &[i32]) -> i32 {

input.par_iter()

.map(|&i| i * i)

.sum()

}

</source>

=== Using Join for Divide-and-Conquer Problems ===

Parallel iterators are implemented with a more primitive method called join which takes 2 closures and runs them in parallel. The code snippet below demonstrates increment algorithm:

fn increment_all(slice: &mut [i32]) {

if slice.len() < 1000 {

for p in slice { *p += 1; }

} else {

let mid_point = slice.len() / 2;

let (left, right) = slice.split_at_mut(mid_point);

rayon::join(|| increment_all(left), || increment_all(right));

}

</source>

=== How it works ===

Rayon borrowed work stealing concepts from Cilk project. The library attempts to dynamically ascertain how much multi-threading resources are available. The idea is very simple: there is always a pool of worker threads available, waiting for some work to do. When you call join the first time, we shift over into that pool of threads.

[[File:Rust_Join1.png]]

~~== ??? ==<pre style="color: red">???</pre>~~But if you call join(a, b) from a worker thread W, then W will place b into its work queue, advertising that this is work that other worker threads might help out with. W will then start executing a. While W is busy with a, other threads might come along and take b from its queue. That is called stealing b.

[[File:Rust_Join_2.png|600px]]

~~== Sources ==~~* STL: https://enOnce a is done, W checks whether b was stolen by another thread and, if not, executes b itself.~~wikipedia~~If W runs out of jobs in its own queue, it will look through the other threads' queues and try to steal work from them.~~org/wiki/Standard_Template_Library~~* STL: http://www.cplusplus.com/reference/algorithm/* Policy-Based execution for C++17: https://scs.senecac.on.ca/~oop345/pages/content/multi.html#alg~~* Intel Parallel STL: https://software.intel.com/en-us/get-started-with-pstl*~~

[[File:Rust_Join_3.png|600px]]

== Group Members ==

# [mailto:obelavina@myseneca.ca Olga Belavina]

== Sources ==

* [https://doc.rust-lang.org/book/second-edition/ch03-01-variables-and-mutability.html Variables and Mutability]

* [https://doc.rust-lang.org/book/second-edition/ch04-01-what-is-ownership.html Ownership]

* [https://doc.rust-lang.org/book/second-edition/ch04-02-references-and-borrowing.html References and Borrowing]

* [https://doc.rust-lang.org/std/sync/atomic/struct.AtomicPtr.html Atomic Pointers]

* [https://doc.rust-lang.org/std/sync/struct.Mutex.html Mutex]

* [https://docs.rs/rayon/0.9.0/rayon/ Rayon Library]

* [https://static.rust-lang.org/doc/master/std/sync/mpsc/index.html Channels]

* [https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.html Fearless Concurrency with Rust]

== Progress ==

-5%

Henrique Salvadori Coelho

55

edits

Changes

Team Hortons

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

get involved with CDOT

courses

course projects

links

Tools