Team Hortons
Once upon a time there was a team. In their programs, they never used floats. They only used double doubles.
Contents
Multithreading with Rust
Introduction: The Rust Programming language
Rust is a system programming language (like C and C++, offering direct access to physical memory) sponsored by Mozilla, created in 2010. It is compiled, imperative, functional and strongly typed; it aims to provide better memory safety and still maintain a performance similar to C++, while providing garbage collection through reference counting. Rust won the first place as the "most loved programming language" in 2016 and 2017, in a survey from Stack Overflow 1 2.
Rust has a very strong emphasis in security, more specifically, memory safety. Because of this, the same rules that the compiler requires from a program end up solving many problems that are common in concurrent executions, such as race conditions. This project will explain what these security rules are, how they can prevent problems in parallel programming, as well as demonstrating the correct way to tackle concurrency with Rust.
Memory Ownership and Borrowing
Rust has a very peculiar way to deal with memory deallocation, security, and ownership. Understanding how memory management works is essential in order to understand how to use parallel programming in Rust. Consider the following example, where we create a vector, push some elements into it, and then print the elements:
fn main() {
let mut vec = Vec::new();
vec.push(0);
vec.push(1);
vec.push(2);
for i in vec.iter() {
println!("{}", i);
}
}
Output:
0
1
2
Notice how variables in Rust are immutable (constant) by default - we need to manually specify that the vector can be changed. Failing to do so will result in a compilation error.
In the next example, we abstracted the "push" method to a function called "add_to_vec", which takes a vector and a number, and pushes the number into the vector:
fn add_to_vec(mut v: Vec<i32>, n: i32) {
v.push(n);
}
fn main() {
let mut vec = Vec::new();
vec.push(0);
vec.push(1);
vec.push(2);
add_to_vec(vec, 3);
for i in vec.iter() {
println!("{}", i);
}
}
In a language like C++, a vector passed like this would be copied to the function and modified, leaving the original vector untouched; in a language like Java, a similar logic would work, since objects are passed by reference by default. Rust, on the other hand, has a different approach: sending the vector to a function like this means transferring the ownership of the object completely to that function. This means that after the vector is sent to a function, it is not available to the main function anymore - once the add_to_vec function ends, the vector will then be deallocated, and the next *for* loop will fail.
If we try compiling this code, the compiler will produce an error:
error[E0382]: use of moved value: `vec`
--> src/main.rs:13:12
|
11 | add_to_vec(vec, 3);
| --- value moved here
12 |
13 | for i in vec.iter() {
| ^^^ value used here after move
|
= note: move occurs because `vec` has type `std::vec::Vec<i32>`, which does not implement the `Copy` trait
To fix this, we will have to transfer the ownership of the vector back to the main function. We do this by returning the vector back and reassigning it to the variable vec:
fn add_to_vec(mut v: Vec<i32>, n: i32) -> Vec<i32> {
v.push(n);
v // shorthand for returning. notice the lack of ;
}
fn main() {
let mut vec = Vec::new();
vec.push(0);
vec.push(1);
vec.push(2);
vec = add_to_vec(vec, 3); // receiving the ownership back
for i in vec.iter() {
println!("{}", i);
}
}
Now the output is the expected:
0
1
2
3
An easier way to deal with memory ownership is by lending/borrowing memory instead of completely transferring control over it. This is called memory borrowing. To borrow a value, we use the & operator:
fn add_to_vec(v: &mut Vec<i32>, n: i32) {
v.push(n);
}
fn main() {
let mut vec = Vec::new();
vec.push(0);
vec.push(1);
vec.push(2);
add_to_vec(&mut vec, 3);
for i in vec.iter() {
println!("{}", i);
}
}
This will also produce the correct output. In this case, we don't need to send the ownership of the vector back, because it was never transferred, only borrowed. There is one important detail: mutable references (&mut T) are only allowed to exist if no other mutable references to that value are active. In other words: only one mutable reference to a value may exist at a time.
From what was presented above, some problems can be identified when applying parallelism:
- If memory ownership is transferred in function calls, we can not transfer the ownership of one something to several threads at the same time - only one thread can have the ownership
- Mutable references to something can only exist one at a time, this means that we can not spawn several threads with mutable references to a value
These two factors limit the capability of conventional shared memory. For example, the following code would not compile:
use std::thread;
fn add_to_vec(v: &mut Vec<i32>, n: i32) {
v.push(n);
}
fn main() {
let mut vec = Vec::new();
vec.push(0);
vec.push(1);
vec.push(2);
let t1 = thread::spawn(move || {
add_to_vec(&mut vec, 4);
});
add_to_vec(&mut vec, 3);
for i in vec.iter() {
println!("{}", i);
}
t1.join();
}
error[E0382]: use of moved value: `vec`
--> src/main.rs:20:19
|
13 | let t1 = thread::spawn(move || {
| ------- value moved (into closure) here
...
20 | add_to_vec(&mut vec, 3);
| ^^^ value used here after move
|
Although this seems unnecessarily strict, it has a great advantage: race conditions are impossible - they are checked at compile time, which means that the systems will be safer and less likely to present concurrency problems. To correctly use parallel programming in Rust, we can make use of tools such as Channels, Locks (Mutex), and Atomic pointers.
Channels
Instead of sharing memory, Rust encourages communication between thread in order to combine information. Channels, which are available in Rust's standard library, can be pictured as rivers: an object put into this river will be carried downstream. Likewise, a Channel has two parts: a sender and a receiver. The sender is responsible for putting information in the channel, which can be retrieved by the receiver. When we ask for a receiver to get a message from the channel, the execution may wait until the message is received or not: the method recv will block the execution, while try_recv will not.
The following example will spawn a thread, which will send a few strings down a channel which will be received by the main thread.
use std::thread;
use std::sync::mpsc;
use std::sync::mpsc::{Sender, Receiver};
fn main() {
let (sender, receiver): (Sender<String>, Receiver<String>) =
mpsc::channel();
thread::spawn(move || {
let messages = vec![
String::from("blue"),
String::from("yellow"),
String::from("green"),
String::from("red"),
];
for colour in messages {
sender.send(colour).unwrap();
}
});
for message in receiver {
println!("Got {}", message);
}
}
This produces the following output:
Got blue
Got yellow
Got green
Got red
Similarly to the previous examples, the sender will be moved to the closure executed by the thread, which means that it will go out of scope when the thread finishes. To distribute the sender through several threads, they can be cloned: objects sent through this sender will still be received by the same receiver, but the two senders are different objects, which can have their ownership moved:
use std::thread;
use std::sync::mpsc;
use std::sync::mpsc::{Sender, Receiver};
static NUM_THREADS: i32 = 8;
fn main() {
let mut vec = Vec::new();
let (sender, receiver): (Sender<i32>, Receiver<i32>) =
mpsc::channel();
for i in 0..NUM_THREADS {
// cloning the sender so it can be exclusive to the new thread
let thread_sender = sender.clone();
thread::spawn(move || {
thread_sender.send(i).unwrap();
});
}
for _ in 0..NUM_THREADS {
vec.push(receiver.recv().unwrap());
}
for i in vec.iter() {
println!("{}", i);
}
}
Output:
1
3
0
2
5
6
7
4
Locks (Mutex)
[text]
RC and Atomic
[text]
Group Members
Progress
-5%