From f41bcf37dc64a9c33bbcb34917f3305a21e36844 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Mon, 21 Jan 2019 17:14:58 -0500 Subject: [PATCH] Finish up `static` and start on stack --- _drafts/understanding-allocations-in-rust.md | 222 ++++++++++++++++++- 1 file changed, 213 insertions(+), 9 deletions(-) diff --git a/_drafts/understanding-allocations-in-rust.md b/_drafts/understanding-allocations-in-rust.md index 963e23b..9c33032 100644 --- a/_drafts/understanding-allocations-in-rust.md +++ b/_drafts/understanding-allocations-in-rust.md @@ -75,7 +75,7 @@ Now let's address some conditions and caveats before going much further: - We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation API's (think the [libc] and [winapi] implementations of [malloc]) that we'll ignore. - We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`) - and address (hehe) "release" mode at the end (`cargo run --release` and `cargo test --release`). + and address (pun intended) "release" mode at the end (`cargo run --release` and `cargo test --release`). - All content will be run using Rust 1.31, as that's the highest currently supported in the [Compiler Exporer](https://godbolt.org/). As such, we'll avoid talking about things like [compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) @@ -257,7 +257,7 @@ pub fn multiply_twice(value: u32) -> u32 { value * FACTOR * FACTOR } ``` --- [Compiler Explorer](https://godbolt.org/z/Qc7tHM) +-- [Compiler Explorer](https://odbolt.org/z/Qc7tHM) In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the `multiply` and `multiply_twice` functions; the "1000" value is never @@ -282,15 +282,215 @@ and `thread_local!()` macros later. More generally, `static` variables are globally unique locations in memory, the contents of which are loaded as part of your program being read into main memory. 
-They allow initialization with both raw values and (most) `const fn` calls, and must -implement the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) -marker trait. The initial value is loaded along with the program/library binary, -though it can change during the time your program is running. While `static mut` -variables are allowed, mutating a static is considered an `unsafe` operation. -Unlike `const` though, interior mutability is accepted, and can be done with `safe` code. +They allow initialization with both raw values and `const fn` calls, and the initial +value is loaded along with the program/library binary. All static variables must +be of a type that implements the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) +marker trait. And while `static mut` variables are allowed, mutating a static is considered +an `unsafe` operation. + +The single biggest difference between `const` and `static` is the guarantee +provided about uniqueness. Where `const` variables may or may not be copied +in code, `static` variables are guaranteed to be unique. If we take a previous +`const` example and change it to `static`, the difference should be clear: + +```rust +static FACTOR: u32 = 1000; + +pub fn multiply(value: u32) -> u32 { + value * FACTOR +} + +pub fn multiply_twice(value: u32) -> u32 { + value * FACTOR * FACTOR +} +``` +-- [Compiler Explorer](https://godbolt.org/z/MGBr5Y) + +Where [previously](https://godbolt.org/z/Qc7tHM) there were plenty of +references to multiplying by 1000, the new assembly refers to `FACTOR` +as a named memory location instead. No initialization work needs to be done, +but the compiler can no longer prove the value never changes during execution. + +Next, let's talk about initialization. 
The simplest case is initializing +static variables with either scalar or struct notation: + +```rust +#[derive(Debug)] +struct MyStruct { + x: u32 +} + +static MY_STRUCT: MyStruct = MyStruct { + // You can even reference other statics + // declared later + x: MY_VAL +}; + +static MY_VAL: u32 = 24; + +fn main() { + println!("Static MyStruct: {:?}", MY_STRUCT); +} +``` +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e) + +Things get a bit weirder when using `const fn`. In most cases, things just work: + +```rust +#[derive(Debug)] +struct MyStruct { + x: u32 +} + +impl MyStruct { + const fn new() -> MyStruct { + MyStruct { x: 24 } + } +} + +static MY_STRUCT: MyStruct = MyStruct::new(); + +fn main() { + println!("const fn Static MyStruct: {:?}", MY_STRUCT); +} +``` +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255) + +However, there's a caveat: you're currently not allowed to use `const fn` to initialize +static variables of types that aren't marked `Sync`. As an example, even though +[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) +is `const fn`, because [`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), +you'll get an error at compile time: + +```rust +use std::cell::RefCell; + +// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely +static MY_LOCK: RefCell<u8> = RefCell::new(0); +``` +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560) + +It's likely that this will [change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though, +so be on the lookout. + +Which leads nicely into the next point: static variable types must implement the +[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). 
+Because they're globally unique, it must be safe for you to access static variables +from any thread at any time. Most `struct` definitions automatically implement the +`Sync` trait because they contain only elements which themselves +implement `Sync`. This is why earlier examples could get away with initializing +statics, even though we never included an `impl Sync for MyStruct` in the code. +For more on the `Sync` trait, the [Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html) +has a much more thorough treatment. But as an example, Rust refuses to compile +our earlier example if we add a non-`Sync` element to the `struct` definition: + +```rust +use std::cell::RefCell; + +struct MyStruct { + x: u32, + y: RefCell<u8>, +} + +// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely +static MY_STRUCT: MyStruct = MyStruct { + x: 8, + y: RefCell::new(8) +}; +``` +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc) + +Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. +Unlike `const` however, interior mutability is acceptable. 
To demonstrate: + +```rust +use std::sync::Once; + +// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once +static INIT: Once = Once::new(); + +fn main() { + // Note that while `INIT` is declared immutable, we're still allowed + // to mutate its interior + INIT.call_once(|| println!("Initializing...")); + // This code won't panic, as the interior of INIT was modified + // as part of the previous `call_once` + INIT.call_once(|| panic!("INIT was called twice!")); +} +``` +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59) ## **push** and **pop**: Stack Allocations +**const** and **static** are perfectly fine, but it's very rare that we know +at compile-time about either references or values that will be the same for the entire +time our program is running. Put another way, it's not often the case that either you +or your compiler know how much memory your entire program will need. + +However, there are still some optimizations the compiler can do if it knows how much +memory individual functions will need. Specifically, the compiler can make use of +"stack" memory (as opposed to "heap" memory) which can be managed far faster in +both the short- and long-term. When requesting memory, the +[`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html) +can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) +(<1 nanosecond on modern CPUs). Heap memory instead requires using an allocator +(specialized software to track what memory is in use) to reserve space. +And when you're finished with memory, the `pop` instruction likewise runs in +1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation +and other issues. 
All sorts of incredibly sophisticated techniques have been used +to design allocators: +- [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)) + strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) + (used in [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) + and [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) + (used in [Python](https://docs.python.org/3/extending/extending.html#reference-counts)) +- Thread-local structures to prevent locking the allocator in [tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html) +- Arena structures used in [jemalloc](http://jemalloc.net/), which until recently + was the primary allocator for Rust programs! + +But no matter how sophisticated your allocator is, the principle remains: the +fastest allocator is the one you never use. As such, we're not going to go +into detail on how exactly the +[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), +and we'll focus instead on the conditions that enable the Rust compiler to use +stack-based allocation for variables. + +Now, one question I hope you're asking is "how do we distinguish stack- and +heap-based allocations in Rust code?" There are three strategies I'm going +to use for this: + +1. Any time the `push` or `pop` instructions are used, or the `rsp` register is modified, + this is a stack allocation: + ```rust + pub fn stack_alloc(x: u32) -> u32 { + // Space for `y` is allocated by subtracting from `rsp`, + // and then populated + let y = [1u8, 2, 3, 4]; + // Space for `y` is deallocated by adding back to `rsp` + x + } + ``` + -- [Compiler Explorer](https://godbolt.org/z/gKFOgB) +2. 
Any time `call core::ptr::drop_in_place` occurs, a heap allocation has occurred + sometime in the past and it is now time for us to de-allocate the memory: + ```rust + pub fn heap_alloc(x: usize) -> usize { + // Space for elements in a vector has to be allocated + // on the heap, and is then de-allocated once the + // vector goes out of scope + let y: Vec<u8> = Vec::with_capacity(x); + x + } + ``` + -- [Compiler Explorer](https://godbolt.org/z/T2xoh8) (`drop_in_place` happens on line 1321) +3. Using a special [`GlobalAlloc`](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html) + implementation to track when heap allocations occur. For this post, I'll be using + [qadapt](https://crates.io/crates/qadapt) to trigger a panic if heap allocations + occur; code that doesn't panic doesn't use heap allocations, and by necessity + uses stack allocation instead. + +With all that in mind, let's talk about how to use the stack in Rust. + Example: Why doesn't `Vec::new()` go to the allocator? Questions: @@ -304,6 +504,8 @@ Questions: 7. Legal to pass an array as an argument? 8. Can you force a heap allocation with arrays that are larger than stack size? - Check `ulimit -s` +9. Can you force heap allocation by returning something that escapes the stack? + - Will `#[inline(always)]` move this back to a stack allocation? # Piling On - Rust and the Heap @@ -322,7 +524,9 @@ Questions: # Compiler Optimizations Make Everything Complicated -Example: Compiler stripping out allocations of Box<>, Vec::push() +1. Box<> getting inlined into stack allocations +2. Vec::push() === Vec::with_capacity() for fixed/predictable capacities +3. Inlining statics that don't change value # Appendix and Further Reading