diff --git a/blog/2018-12-15-allocation-safety/index.mdx b/blog/2018-12-15-allocation-safety/index.mdx index 9333459..2579024 100644 --- a/blog/2018-12-15-allocation-safety/index.mdx +++ b/blog/2018-12-15-allocation-safety/index.mdx @@ -12,7 +12,7 @@ bit over a month ago, I was dispensing sage wisdom for the ages: > I had a really great idea: build a custom allocator that allows you to track your own allocations. > I gave it a shot, but learned very quickly: **never write your own allocator.** > -> -- [me](../2018-10-08-case-study-optimization) +> -- [me](/2018/10/case-study-optimization) I proceeded to ignore it, because we never really learn from our mistakes. diff --git a/blog/2019-02-04-understanding-allocations-in-rust/_article.md b/blog/2019-02-04-understanding-allocations-in-rust/_article.md new file mode 100644 index 0000000..48b9df6 --- /dev/null +++ b/blog/2019-02-04-understanding-allocations-in-rust/_article.md @@ -0,0 +1,113 @@ +--- +layout: post +title: "Allocations in Rust" +description: "An introduction to the memory model." +category: +tags: [rust, understanding-allocations] +--- + +There's an alchemy of distilling complex technical topics into articles and videos that change the +way programmers see the tools they interact with on a regular basis. I knew what a linker was, but +there's a staggering amount of complexity in between +[the OS and `main()`](https://www.youtube.com/watch?v=dOfucXtyEsU). Rust programmers use the +[`Box`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html) type all the time, but there's a +rich history of the Rust language itself wrapped up in +[how special it is](https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/). + +In a similar vein, this series attempts to look at code and understand how memory is used; the +complex choreography of operating system, compiler, and program that frees you to focus on +functionality far-flung from frivolous book-keeping. 
The Rust compiler relieves a great deal of the
+cognitive burden associated with memory management, but we're going to step into its world for a
+while.
+
+Let's learn a bit about memory in Rust.
+
+# Table of Contents
+
+This series is intended as both learning and reference material; we'll work through the different
+memory types Rust uses, and explain the implications of each. Ultimately, a summary will be provided
+as a cheat sheet for easy future reference. To that end, a table of contents is in order:
+
+- Foreword
+- [Global Memory Usage: The Whole World](/2019/02/the-whole-world.html)
+- [Fixed Memory: Stacking Up](/2019/02/stacking-up.html)
+- [Dynamic Memory: A Heaping Helping](/2019/02/a-heaping-helping.html)
+- [Compiler Optimizations: What It's Done For You Lately](/2019/02/compiler-optimizations.html)
+- [Summary: What Are the Rules?](/2019/02/summary.html)
+
+# Foreword
+
+Rust's three defining features of
+[Performance, Reliability, and Productivity](https://www.rust-lang.org/) are all driven to a great
+degree by how the Rust compiler understands memory usage. Unlike managed memory languages (Java,
+Python), Rust
+[doesn't really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)
+garbage collect; instead, it uses an
+[ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) system to reason about
+how long objects will last in your program. In some cases, if the life of an object is fairly
+transient, Rust can make use of a very fast region called the "stack." When that's not possible,
+Rust uses
+[dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation)
+and the ownership system to ensure you can't accidentally corrupt memory. It's not as fast, but it
+is important to have available.
+
+That said, there are specific situations in Rust where you'd never need to worry about the
+stack/heap distinction! If you:
+
+1. Never use `unsafe`
+2. 
Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html) + +...then it's not possible for you to use dynamic memory! + +For some uses of Rust, typically embedded devices, these constraints are OK. They have very limited +memory, and the program binary size itself may significantly affect what's available! There's no +operating system able to manage this +["virtual memory"](https://en.wikipedia.org/wiki/Virtual_memory) thing, but that's not an issue +because there's only one running application. The +[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind, and +interacting with the "real world" through extra peripherals is accomplished by reading and writing +to [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html). + +Most Rust programs find these requirements overly burdensome though. C++ developers would struggle +without access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector) (except those +hardcore no-STL people), and Rust developers would struggle without +[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But with the constraints above, +`std::vec` is actually a part of the +[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits. `Box`, +`Rc`, etc., are also unusable for the same reason. + +Whether writing code for embedded devices or not, the important thing in both situations is how much +you know _before your application starts_ about what its memory usage will look like. In embedded +devices, there's a small, fixed amount of memory to use. In a browser, you have no idea how large +[google.com](https://www.google.com)'s home page is until you start trying to download it. 
The
+compiler uses this knowledge (or lack thereof) to optimize how memory is used; put simply, your code
+runs faster when the compiler can guarantee exactly how much memory your program needs while it's
+running. This series is all about understanding how the compiler reasons about your program, with an
+emphasis on the implications for performance.
+
+Now let's address some conditions and caveats before going much further:
+
+- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation APIs
+  ([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we'll
+  ignore.
+- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`) and
+  address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).
+- All content will be run using Rust 1.32, as that's the highest currently supported in the
+  [Compiler Explorer](https://godbolt.org/). As such, we'll avoid upcoming innovations like
+  [compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
+  that are available in nightly.
+- Because of the nature of the content, being able to read assembly is helpful. We'll keep it
+  simple, but I [found](https://stackoverflow.com/a/4584131/1454178) a
+  [refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop`
+  [instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html) was helpful while writing
+  this.
+- I've tried to be precise in saying only what I can prove using the tools (ASM, docs) that are
+  available, but if there's something said in error it will be corrected expeditiously. Please let
+  me know at [bradlee@speice.io](mailto:bradlee@speice.io)
+
+Finally, I'll do what I can to flag potential future changes but the Rust docs have a notice worth
+repeating:
+
+> Rust does not currently have a rigorously and formally defined memory model. 
+>
+> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html)
diff --git a/blog/2019-02-04-understanding-allocations-in-rust/index.mdx b/blog/2019-02-04-understanding-allocations-in-rust/index.mdx
new file mode 100644
index 0000000..4d73e40
--- /dev/null
+++ b/blog/2019-02-04-understanding-allocations-in-rust/index.mdx
@@ -0,0 +1,102 @@
+---
+slug: 2019/02/understanding-allocations-in-rust
+title: "Allocations in Rust: Foreword"
+date: 2019-02-04 12:00:00
+authors: [bspeice]
+tags: []
+---
+
+There's an alchemy of distilling complex technical topics into articles and videos that change the
+way programmers see the tools they interact with on a regular basis. I knew what a linker was, but
+there's a staggering amount of complexity in between
+[the OS and `main()`](https://www.youtube.com/watch?v=dOfucXtyEsU). Rust programmers use the
+[`Box`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html) type all the time, but there's a
+rich history of the Rust language itself wrapped up in
+[how special it is](https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/).
+
+In a similar vein, this series attempts to look at code and understand how memory is used; the
+complex choreography of operating system, compiler, and program that frees you to focus on
+functionality far-flung from frivolous book-keeping. The Rust compiler relieves a great deal of the
+cognitive burden associated with memory management, but we're going to step into its world for a
+while.
+
+Let's learn a bit about memory in Rust.
+
+
+
+---
+
+Rust's three defining features of
+[Performance, Reliability, and Productivity](https://www.rust-lang.org/) are all driven to a great
+degree by how the Rust compiler understands memory usage. 
Unlike managed memory languages (Java, +Python), Rust +[doesn't really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis) +garbage collect; instead, it uses an +[ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) system to reason about +how long objects will last in your program. In some cases, if the life of an object is fairly +transient, Rust can make use of a very fast region called the "stack." When that's not possible, +Rust uses +[dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation) +and the ownership system to ensure you can't accidentally corrupt memory. It's not as fast, but it +is important to have available. + +That said, there are specific situations in Rust where you'd never need to worry about the +stack/heap distinction! If you: + +1. Never use `unsafe` +2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html) + +...then it's not possible for you to use dynamic memory! + +For some uses of Rust, typically embedded devices, these constraints are OK. They have very limited +memory, and the program binary size itself may significantly affect what's available! There's no +operating system able to manage this +["virtual memory"](https://en.wikipedia.org/wiki/Virtual_memory) thing, but that's not an issue +because there's only one running application. The +[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind, and +interacting with the "real world" through extra peripherals is accomplished by reading and writing +to [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html). + +Most Rust programs find these requirements overly burdensome though. 
C++ developers would struggle
+without access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector) (except those
+hardcore no-STL people), and Rust developers would struggle without
+[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But with the constraints above,
+`std::vec` is actually a part of the
+[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits. `Box`,
+`Rc`, etc., are also unusable for the same reason.
+
+Whether writing code for embedded devices or not, the important thing in both situations is how much
+you know _before your application starts_ about what its memory usage will look like. In embedded
+devices, there's a small, fixed amount of memory to use. In a browser, you have no idea how large
+[google.com](https://www.google.com)'s home page is until you start trying to download it. The
+compiler uses this knowledge (or lack thereof) to optimize how memory is used; put simply, your code
+runs faster when the compiler can guarantee exactly how much memory your program needs while it's
+running. This series is all about understanding how the compiler reasons about your program, with an
+emphasis on the implications for performance.
+
+Now let's address some conditions and caveats before going much further:
+
+- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation APIs
+  ([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we'll
+  ignore.
+- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`) and
+  address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).
+- All content will be run using Rust 1.32, as that's the highest currently supported in the
+  [Compiler Explorer](https://godbolt.org/). 
As such, we'll avoid upcoming innovations like + [compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) + that are available in nightly. +- Because of the nature of the content, being able to read assembly is helpful. We'll keep it + simple, but I [found](https://stackoverflow.com/a/4584131/1454178) a + [refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop` + [instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html) was helpful while writing + this. +- I've tried to be precise in saying only what I can prove using the tools (ASM, docs) that are + available, but if there's something said in error it will be corrected expeditiously. Please let + me know at [bradlee@speice.io](mailto:bradlee@speice.io) + +Finally, I'll do what I can to flag potential future changes but the Rust docs have a notice worth +repeating: + +> Rust does not currently have a rigorously and formally defined memory model. +> +> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html) diff --git a/blog/2019-02-05-the-whole-world/_article.md b/blog/2019-02-05-the-whole-world/_article.md new file mode 100644 index 0000000..ef3bc47 --- /dev/null +++ b/blog/2019-02-05-the-whole-world/_article.md @@ -0,0 +1,337 @@ +--- +layout: post +title: "Global Memory Usage: The Whole World" +description: "Static considered slightly less harmful." +category: +tags: [rust, understanding-allocations] +--- + +The first memory type we'll look at is pretty special: when Rust can prove that a _value_ is fixed +for the life of a program (`const`), and when a _reference_ is unique for the life of a program +(`static` as a declaration, not +[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) as a +lifetime), we can make use of global memory. 
This special section of data is embedded directly in +the program binary so that variables are ready to go once the program loads; no additional +computation is necessary. + +Understanding the value/reference distinction is important for reasons we'll go into below, and +while the +[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md) for +these two keywords is available, we'll take a hands-on approach to the topic. + +# **const** + +When a _value_ is guaranteed to be unchanging in your program (where "value" may be scalars, +`struct`s, etc.), you can declare it `const`. This tells the compiler that it's safe to treat the +value as never changing, and enables some interesting optimizations; not only is there no +initialization cost to creating the value (it is loaded at the same time as the executable parts of +your program), but the compiler can also copy the value around if it speeds up the code. + +The points we need to address when talking about `const` are: + +- `Const` values are stored in read-only memory - it's impossible to modify. +- Values resulting from calling a `const fn` are materialized at compile-time. +- The compiler may (or may not) copy `const` values wherever it chooses. + +## Read-Only + +The first point is a bit strange - "read-only memory." +[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants) +mentions in a couple places that using `mut` with constants is illegal, but it's also important to +demonstrate just how immutable they are. _Typically_ in Rust you can use +[interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) to modify +things that aren't declared `mut`. 
+[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an example of this
+pattern in action:
+
+```rust
+use std::cell::RefCell;
+
+fn my_mutator(cell: &RefCell<u8>) {
+    // Even though we're given an immutable reference,
+    // the `replace` method allows us to modify the inner value.
+    cell.replace(14);
+}
+
+fn main() {
+    let cell = RefCell::new(25);
+    // Prints out 25
+    println!("Cell: {:?}", cell);
+    my_mutator(&cell);
+    // Prints out 14
+    println!("Cell: {:?}", cell);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)
+
+When `const` is involved though, interior mutability is impossible:
+
+```rust
+use std::cell::RefCell;
+
+const CELL: RefCell<u8> = RefCell::new(25);
+
+fn my_mutator(cell: &RefCell<u8>) {
+    cell.replace(14);
+}
+
+fn main() {
+    // First line prints 25 as expected
+    println!("Cell: {:?}", &CELL);
+    my_mutator(&CELL);
+    // Second line *still* prints 25
+    println!("Cell: {:?}", &CELL);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)
+
+And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):
+
+```rust
+use std::sync::Once;
+
+const SURPRISE: Once = Once::new();
+
+fn main() {
+    // This is how `Once` is supposed to be used
+    SURPRISE.call_once(|| println!("Initializing..."));
+    // Because `Once` is a `const` value, we never record it
+    // having been initialized the first time, and this closure
+    // will also execute. 
+    SURPRISE.call_once(|| println!("Initializing again???"));
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)
+
+When the
+[`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)
+refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this
+behavior is what they refer to. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this
+as an error, but it's still something to be aware of.
+
+## Initialization == Compilation
+
+The next thing to mention is that `const` values are loaded into memory _as part of your program
+binary_. Because of this, any `const` values declared in your program will be "realized" at
+compile-time; accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may
+be able to prefetch the value), but that's it.
+
+```rust
+use std::cell::RefCell;
+
+const CELL: RefCell<u32> = RefCell::new(24);
+
+pub fn multiply(value: u32) -> u32 {
+    // CELL is stored at `.L__unnamed_1`
+    value * (*CELL.get_mut())
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/Th8boO)
+
+The compiler creates one `RefCell`, uses it everywhere, and never needs to call the `RefCell::new`
+function.
+
+## Copying
+
+If it's helpful though, the compiler can choose to copy `const` values. 
+ +```rust +const FACTOR: u32 = 1000; + +pub fn multiply(value: u32) -> u32 { + // See assembly line 4 for the `mov edi, 1000` instruction + value * FACTOR +} + +pub fn multiply_twice(value: u32) -> u32 { + // See assembly lines 22 and 29 for `mov edi, 1000` instructions + value * FACTOR * FACTOR +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/ZtS54X) + +In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the +`multiply` and `multiply_twice` functions; the "1000" value is never "stored" anywhere, as it's +small enough to inline into the assembly instructions. + +Finally, getting the address of a `const` value is possible, but not guaranteed to be unique +(because the compiler can choose to copy values). I was unable to get non-unique pointers in my +testing (even using different crates), but the specifications are clear enough: _don't rely on +pointers to `const` values being consistent_. To be frank, caring about locations for `const` values +is almost certainly a code smell. + +# **static** + +Static variables are related to `const` variables, but take a slightly different approach. When we +declare that a _reference_ is unique for the life of a program, you have a `static` variable +(unrelated to the `'static` lifetime). Because of the reference/value distinction with +`const`/`static`, static variables behave much more like typical "global" variables. + +But to understand `static`, here's what we'll look at: + +- `static` variables are globally unique locations in memory. +- Like `const`, `static` variables are loaded at the same time as your program being read into + memory. +- All `static` variables must implement the + [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) marker trait. +- Interior mutability is safe and acceptable when using `static` variables. + +## Memory Uniqueness + +The single biggest difference between `const` and `static` is the guarantees provided about +uniqueness. 
Where `const` variables may or may not be copied in code, `static` variables are
+guaranteed to be unique. If we take a previous `const` example and change it to `static`, the
+difference should be clear:
+
+```rust
+static FACTOR: u32 = 1000;
+
+pub fn multiply(value: u32) -> u32 {
+    // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
+    value * FACTOR
+}
+
+pub fn multiply_twice(value: u32) -> u32 {
+    // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
+    value * FACTOR * FACTOR
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ)
+
+Where [previously](#copying) there were plenty of references to multiplying by 1000, the new
+assembly refers to `FACTOR` as a named memory location instead. No initialization work needs to be
+done, but the compiler can no longer prove the value never changes during execution.
+
+## Initialization == Compilation
+
+Next, let's talk about initialization. The simplest case is initializing static variables with
+either scalar or struct notation:
+
+```rust
+#[derive(Debug)]
+struct MyStruct {
+    x: u32
+}
+
+static MY_STRUCT: MyStruct = MyStruct {
+    // You can even reference other statics
+    // declared later
+    x: MY_VAL
+};
+
+static MY_VAL: u32 = 24;
+
+fn main() {
+    println!("Static MyStruct: {:?}", MY_STRUCT);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
+
+Things can get a bit weirder when using `const fn` though. 
In most cases, it just works:
+
+```rust
+#[derive(Debug)]
+struct MyStruct {
+    x: u32
+}
+
+impl MyStruct {
+    const fn new() -> MyStruct {
+        MyStruct { x: 24 }
+    }
+}
+
+static MY_STRUCT: MyStruct = MyStruct::new();
+
+fn main() {
+    println!("const fn Static MyStruct: {:?}", MY_STRUCT);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
+
+However, there's a caveat: you're currently not allowed to use `const fn` to initialize static
+variables of types that aren't marked `Sync`. For example,
+[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) is a
+`const fn`, but because
+[`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), you'll
+get an error at compile time:
+
+```rust
+use std::cell::RefCell;
+
+// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
+static MY_LOCK: RefCell<u8> = RefCell::new(0);
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
+
+It's likely that this will
+[change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.
+
+## **Sync**
+
+Which leads well to the next point: static variable types must implement the
+[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). Because they're globally
+unique, it must be safe for you to access static variables from any thread at any time. Most
+`struct` definitions automatically implement the `Sync` trait because they contain only elements
+which themselves implement `Sync` (read more in the
+[Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)). This is why earlier examples could
+get away with initializing statics, even though we never included an `impl Sync for MyStruct` in the
+code. 
To demonstrate this property, Rust refuses to compile our earlier example if we add a
+non-`Sync` element to the `struct` definition:
+
+```rust
+use std::cell::RefCell;
+
+struct MyStruct {
+    x: u32,
+    y: RefCell<u8>,
+}
+
+// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
+static MY_STRUCT: MyStruct = MyStruct {
+    x: 8,
+    y: RefCell::new(8)
+};
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
+
+## Interior Mutability
+
+Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. If we
+want to stay in `safe` Rust, we can use interior mutability to accomplish similar goals:
+
+```rust
+use std::sync::Once;
+
+// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once
+static INIT: Once = Once::new();
+
+fn main() {
+    // Note that while `INIT` is declared immutable, we're still allowed
+    // to mutate its interior
+    INIT.call_once(|| println!("Initializing..."));
+    // This code won't panic, as the interior of INIT was modified
+    // as part of the previous `call_once`
+    INIT.call_once(|| panic!("INIT was called twice!"));
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)
diff --git a/blog/2019-02-05-the-whole-world/index.mdx b/blog/2019-02-05-the-whole-world/index.mdx
new file mode 100644
index 0000000..69c5269
--- /dev/null
+++ b/blog/2019-02-05-the-whole-world/index.mdx
@@ -0,0 +1,339 @@
+---
+slug: 2019/02/the-whole-world
+title: "Allocations in Rust: Global memory"
+date: 2019-02-05 12:00:00
+authors: [bspeice]
+tags: []
+---
+
+The first memory type we'll look at is pretty special: when Rust can prove that a _value_ is fixed
+for the life of a program (`const`), and when a _reference_ is unique for the life of a program
+(`static` as a declaration, not 
+[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) as a +lifetime), we can make use of global memory. This special section of data is embedded directly in +the program binary so that variables are ready to go once the program loads; no additional +computation is necessary. + +Understanding the value/reference distinction is important for reasons we'll go into below, and +while the +[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md) for +these two keywords is available, we'll take a hands-on approach to the topic. + + + +## `const` values + +When a _value_ is guaranteed to be unchanging in your program (where "value" may be scalars, +`struct`s, etc.), you can declare it `const`. This tells the compiler that it's safe to treat the +value as never changing, and enables some interesting optimizations; not only is there no +initialization cost to creating the value (it is loaded at the same time as the executable parts of +your program), but the compiler can also copy the value around if it speeds up the code. + +The points we need to address when talking about `const` are: + +- `Const` values are stored in read-only memory - it's impossible to modify. +- Values resulting from calling a `const fn` are materialized at compile-time. +- The compiler may (or may not) copy `const` values wherever it chooses. + +### Read-Only + +The first point is a bit strange - "read-only memory." +[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants) +mentions in a couple places that using `mut` with constants is illegal, but it's also important to +demonstrate just how immutable they are. _Typically_ in Rust you can use +[interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) to modify +things that aren't declared `mut`. 
+[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an example of this
+pattern in action:
+
+```rust
+use std::cell::RefCell;
+
+fn my_mutator(cell: &RefCell<u8>) {
+    // Even though we're given an immutable reference,
+    // the `replace` method allows us to modify the inner value.
+    cell.replace(14);
+}
+
+fn main() {
+    let cell = RefCell::new(25);
+    // Prints out 25
+    println!("Cell: {:?}", cell);
+    my_mutator(&cell);
+    // Prints out 14
+    println!("Cell: {:?}", cell);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)
+
+When `const` is involved though, interior mutability is impossible:
+
+```rust
+use std::cell::RefCell;
+
+const CELL: RefCell<u8> = RefCell::new(25);
+
+fn my_mutator(cell: &RefCell<u8>) {
+    cell.replace(14);
+}
+
+fn main() {
+    // First line prints 25 as expected
+    println!("Cell: {:?}", &CELL);
+    my_mutator(&CELL);
+    // Second line *still* prints 25
+    println!("Cell: {:?}", &CELL);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)
+
+And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):
+
+```rust
+use std::sync::Once;
+
+const SURPRISE: Once = Once::new();
+
+fn main() {
+    // This is how `Once` is supposed to be used
+    SURPRISE.call_once(|| println!("Initializing..."));
+    // Because `Once` is a `const` value, we never record it
+    // having been initialized the first time, and this closure
+    // will also execute. 
+    SURPRISE.call_once(|| println!("Initializing again???"));
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)
+
+When the
+[`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)
+refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this
+behavior is what they refer to. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this
+as an error, but it's still something to be aware of.
+
+### Initialization
+
+The next thing to mention is that `const` values are loaded into memory _as part of your program
+binary_. Because of this, any `const` values declared in your program will be "realized" at
+compile-time; accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may
+be able to prefetch the value), but that's it.
+
+```rust
+use std::cell::RefCell;
+
+const CELL: RefCell<u32> = RefCell::new(24);
+
+pub fn multiply(value: u32) -> u32 {
+    // CELL is stored at `.L__unnamed_1`
+    value * (*CELL.get_mut())
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/Th8boO)
+
+The compiler creates one `RefCell`, uses it everywhere, and never needs to call the `RefCell::new`
+function.
+
+### Copying
+
+If it's helpful though, the compiler can choose to copy `const` values. 
+
+```rust
+const FACTOR: u32 = 1000;
+
+pub fn multiply(value: u32) -> u32 {
+    // See assembly line 4 for the `mov edi, 1000` instruction
+    value * FACTOR
+}
+
+pub fn multiply_twice(value: u32) -> u32 {
+    // See assembly lines 22 and 29 for `mov edi, 1000` instructions
+    value * FACTOR * FACTOR
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/ZtS54X)
+
+In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the
+`multiply` and `multiply_twice` functions; the "1000" value is never "stored" anywhere, as it's
+small enough to inline into the assembly instructions.
+
+Finally, getting the address of a `const` value is possible, but not guaranteed to be unique
+(because the compiler can choose to copy values). I was unable to get non-unique pointers in my
+testing (even using different crates), but the specifications are clear enough: _don't rely on
+pointers to `const` values being consistent_. To be frank, caring about locations for `const` values
+is almost certainly a code smell.
+
+## `static` values
+
+Static variables are related to `const` variables, but take a slightly different approach. When a
+_reference_ is guaranteed to be unique for the life of a program, you have a `static` variable
+(unrelated to the `'static` lifetime). Because of the reference/value distinction with
+`const`/`static`, static variables behave much more like typical "global" variables.
+
+But to understand `static`, here's what we'll look at:
+
+- `static` variables are globally unique locations in memory.
+- Like `const`, `static` variables are loaded at the same time as your program is read into
+  memory.
+- All `static` variables must implement the
+  [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) marker trait.
+- Interior mutability is safe and acceptable when using `static` variables.
+
+### Memory Uniqueness
+
+The single biggest difference between `const` and `static` is the guarantees provided about
+uniqueness.
Where `const` variables may or may not be copied in code, `static` variables are
+guaranteed to be unique. If we take a previous `const` example and change it to `static`, the
+difference should be clear:
+
+```rust
+static FACTOR: u32 = 1000;
+
+pub fn multiply(value: u32) -> u32 {
+    // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
+    value * FACTOR
+}
+
+pub fn multiply_twice(value: u32) -> u32 {
+    // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
+    value * FACTOR * FACTOR
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ)
+
+Where [previously](#copying) there were plenty of references to multiplying by 1000, the new
+assembly refers to `FACTOR` as a named memory location instead. No initialization work needs to be
+done, but the compiler can no longer prove the value never changes during execution.
+
+### Initialization
+
+Next, let's talk about initialization. The simplest case is initializing static variables with
+either scalar or struct notation:
+
+```rust
+#[derive(Debug)]
+struct MyStruct {
+    x: u32
+}
+
+static MY_STRUCT: MyStruct = MyStruct {
+    // You can even reference other statics
+    // declared later
+    x: MY_VAL
+};
+
+static MY_VAL: u32 = 24;
+
+fn main() {
+    println!("Static MyStruct: {:?}", MY_STRUCT);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
+
+Things can get a bit weirder when using `const fn` though.
In most cases, it just works:
+
+```rust
+#[derive(Debug)]
+struct MyStruct {
+    x: u32
+}
+
+impl MyStruct {
+    const fn new() -> MyStruct {
+        MyStruct { x: 24 }
+    }
+}
+
+static MY_STRUCT: MyStruct = MyStruct::new();
+
+fn main() {
+    println!("const fn Static MyStruct: {:?}", MY_STRUCT);
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
+
+However, there's a caveat: you're currently not allowed to use `const fn` to initialize static
+variables of types that aren't marked `Sync`. For example,
+[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) is a
+`const fn`, but because
+[`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), you'll
+get an error at compile time:
+
+```rust
+use std::cell::RefCell;
+
+// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
+static MY_LOCK: RefCell<u8> = RefCell::new(0);
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
+
+It's likely that this will
+[change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.
+
+### The `Sync` marker
+
+Which leads well to the next point: static variable types must implement the
+[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). Because they're globally
+unique, it must be safe for you to access static variables from any thread at any time. Most
+`struct` definitions automatically implement the `Sync` trait because they contain only elements
+which themselves implement `Sync` (read more in the
+[Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)). This is why earlier examples could
+get away with initializing statics, even though we never included an `impl Sync for MyStruct` in the
+code.
To demonstrate this property, Rust refuses to compile our earlier example if we add a
+non-`Sync` element to the `struct` definition:
+
+```rust
+use std::cell::RefCell;
+
+struct MyStruct {
+    x: u32,
+    y: RefCell<u8>,
+}
+
+// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
+static MY_STRUCT: MyStruct = MyStruct {
+    x: 8,
+    y: RefCell::new(8)
+};
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
+
+### Interior mutability
+
+Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. If we
+want to stay in `safe` Rust, we can use interior mutability to accomplish similar goals:
+
+```rust
+use std::sync::Once;
+
+// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once
+static INIT: Once = Once::new();
+
+fn main() {
+    // Note that while `INIT` is declared immutable, we're still allowed
+    // to mutate its interior
+    INIT.call_once(|| println!("Initializing..."));
+    // This code won't panic, as the interior of INIT was modified
+    // as part of the previous `call_once`
+    INIT.call_once(|| panic!("INIT was called twice!"));
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)
diff --git a/blog/2019-02-06-stacking-up/_article.md b/blog/2019-02-06-stacking-up/_article.md
new file mode 100644
index 0000000..b060ea1
--- /dev/null
+++ b/blog/2019-02-06-stacking-up/_article.md
@@ -0,0 +1,601 @@
+---
+layout: post
+title: "Fixed Memory: Stacking Up"
+description: "We don't need no allocator."
+category:
+tags: [rust, understanding-allocations]
+---
+
+`const` and `static` are perfectly fine, but it's relatively rare that we know at compile-time about
+either values or references that will be the same for the duration of our program.
Put another way,
+it's not often the case that either you or your compiler knows how much memory your entire program
+will ever need.
+
+However, there are still some optimizations the compiler can do if it knows how much memory
+individual functions will need. Specifically, the compiler can make use of "stack" memory (as
+opposed to "heap" memory) which can be managed far faster in both the short- and long-term. When
+requesting memory, the [`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
+can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) (<1
+nanosecond on modern CPUs). Contrast that to heap memory which requires an allocator (specialized
+software to track what memory is in use) to reserve space. When you're finished with stack memory,
+the `pop` instruction runs in 1-3 cycles, as opposed to an allocator needing to worry about memory
+fragmentation and other issues with the heap. All sorts of incredibly sophisticated techniques have
+been used to design allocators:
+
+- [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science))
+  strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) (used in
+  [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) and
+  [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) (used in
+  [Python](https://docs.python.org/3/extending/extending.html#reference-counts))
+- Thread-local structures to prevent locking the allocator in
+  [tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)
+- Arena structures used in [jemalloc](http://jemalloc.net/), which
+  [until recently](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default)
+  was the primary allocator for Rust programs!
+
+But no matter how fast your allocator is, the principle remains: the fastest allocator is the one
+you never use.
As such, we're not going to discuss how exactly the +[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), but +we'll focus instead on the conditions that enable the Rust compiler to use faster stack-based +allocation for variables. + +So, **how do we know when Rust will or will not use stack allocation for objects we create?** +Looking at other languages, it's often easy to delineate between stack and heap. Managed memory +languages (Python, Java, +[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) place +everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/), +[HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize +some heap allocations away, but you should never assume it will happen. C makes things clear with +calls to special functions (like [malloc(3)](https://linux.die.net/man/3/malloc)) needed to access +heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though +modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii). + +For Rust, we can summarize as follows: **stack allocation will be used for everything that doesn't +involve "smart pointers" and collections**. We'll skip over a precise definition of the term "smart +pointer" for now, and instead discuss what we should watch for to understand when stack and heap +memory regions are used: + +1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) indicate + allocation of stack memory: + + ```rust + pub fn stack_alloc(x: u32) -> u32 { + // Space for `y` is allocated by subtracting from `rsp`, + // and then populated + let y = [1u8, 2, 3, 4]; + // Space for `y` is deallocated by adding back to `rsp` + x + } + ``` + + -- [Compiler Explorer](https://godbolt.org/z/5WSgc9) + +2. Tracking when exactly heap allocation calls occur is difficult. 
It's typically easier to watch
+   for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened in the recent
+   past:
+
+   ```rust
+   pub fn heap_alloc(x: usize) -> usize {
+       // Space for elements in a vector has to be allocated
+       // on the heap, and is then de-allocated once the
+       // vector goes out of scope
+       let y: Vec<u8> = Vec::with_capacity(x);
+       x
+   }
+   ```
+
+   -- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)
+   Note: While the
+   [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) is
+   [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46),
+   the Rust standard library only defines `Drop` implementations for types that involve heap
+   allocation.
+
+3. If you don't want to inspect the assembly, use a custom allocator that's able to track and alert
+   when heap allocations occur. Crates like
+   [`alloc_counter`](https://crates.io/crates/alloc_counter) are designed for exactly this purpose.
+
+With all that in mind, let's talk about situations in which we're guaranteed to use stack memory:
+
+- Structs are created on the stack.
+- Function arguments are passed on the stack, meaning the
+  [`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute) will
+  not change the memory region used.
+- Enums and unions are stack-allocated.
+- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are always stack-allocated.
+- Closures capture their arguments on the stack.
+- Generics will use stack allocation, even with dynamic dispatch.
+- [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be
+  stack-allocated, and copying them will be done in stack memory.
+- [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library are
+  stack-allocated even when iterating over heap-based collections.
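The custom-allocator approach mentioned in point 3 can also be sketched without any external crates: Rust's `std::alloc::GlobalAlloc` trait lets a program wrap the system allocator and count heap requests. This is a minimal illustration, not the `alloc_counter` crate's API; the names `CountingAllocator` and `ALLOCATIONS` are made up for this sketch:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Wraps the system allocator and counts every heap allocation request.
struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

fn main() {
    let before = ALLOCATIONS.load(Ordering::SeqCst);
    let x = [1u8, 2, 3, 4]; // stack only: counter unchanged
    let after_stack = ALLOCATIONS.load(Ordering::SeqCst);
    let y: Vec<u8> = Vec::with_capacity(4); // heap: counter increments
    let after_heap = ALLOCATIONS.load(Ordering::SeqCst);

    assert_eq!(before, after_stack);
    assert!(after_heap > after_stack);
    drop((x, y));
}
```

One caveat worth hedging on: anything between the two counter reads that allocates internally (a `println!`, for example) will also be counted, so keep the measurement window tight.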
+ +# Structs + +The simplest case comes first. When creating vanilla `struct` objects, we use stack memory to hold +their contents: + +```rust +struct Point { + x: u64, + y: u64, +} + +struct Line { + a: Point, + b: Point, +} + +pub fn make_line() { + // `origin` is stored in the first 16 bytes of memory + // starting at location `rsp` + let origin = Point { x: 0, y: 0 }; + // `point` makes up the next 16 bytes of memory + let point = Point { x: 1, y: 2 }; + + // When creating `ray`, we just move the content out of + // `origin` and `point` into the next 32 bytes of memory + let ray = Line { a: origin, b: point }; +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/vri9BE) + +Note that while some extra-fancy instructions are used for memory manipulation in the assembly, the +`sub rsp, 64` instruction indicates we're still working with the stack. + +# Function arguments + +Have you ever wondered how functions communicate with each other? Like, once the variables are given +to you, everything's fine. But how do you "give" those variables to another function? How do you get +the results back afterward? The answer: the compiler arranges memory and assembly instructions using +a pre-determined [calling convention](http://llvm.org/docs/LangRef.html#calling-conventions). This +convention governs the rules around where arguments needed by a function will be located (either in +memory offsets relative to the stack pointer `rsp`, or in other registers), and where the results +can be found once the function has finished. And when multiple languages agree on what the calling +conventions are, you can do things like having [Go call Rust code](https://blog.filippo.io/rustgo/)! + +Put simply: it's the compiler's job to figure out how to call other functions, and you can assume +that the compiler is good at its job. 
We can see this in action using a simple example:
+
+```rust
+struct Point {
+    x: i64,
+    y: i64,
+}
+
+// We use integer division operations to keep
+// the assembly clean, understanding the result
+// isn't accurate.
+fn distance(a: &Point, b: &Point) -> i64 {
+    // Immediately subtract from `rsp` the bytes needed
+    // to hold all the intermediate results - this is
+    // the stack allocation step
+
+    // The compiler used the `rdi` and `rsi` registers
+    // to pass our arguments, so read them in
+    let x1 = a.x;
+    let x2 = b.x;
+    let y1 = a.y;
+    let y2 = b.y;
+
+    // Do the actual math work
+    let x_pow = (x1 - x2) * (x1 - x2);
+    let y_pow = (y1 - y2) * (y1 - y2);
+    let squared = x_pow + y_pow;
+    squared / squared
+
+    // Our final result will be stored in the `rax` register
+    // so that our caller knows where to retrieve it.
+    // Finally, add back to `rsp` the stack memory that is
+    // now ready to be used by other functions.
+}
+
+pub fn total_distance() {
+    let start = Point { x: 1, y: 2 };
+    let middle = Point { x: 3, y: 4 };
+    let end = Point { x: 5, y: 6 };
+
+    let _dist_1 = distance(&start, &middle);
+    let _dist_2 = distance(&middle, &end);
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)
+
+As a consequence of function arguments never using heap memory, we can also infer that functions
+using the `#[inline]` attribute also do not heap allocate. But better than inferring, we can look
+at the assembly to prove it:
+
+```rust
+struct Point {
+    x: i64,
+    y: i64,
+}
+
+// Note that there is no `distance` function in the assembly output,
+// and the total line count goes from 229 with inlining off
+// to 306 with inline on. Even still, no heap allocations occur.
+#[inline(always)] +fn distance(a: &Point, b: &Point) -> i64 { + let x1 = a.x; + let x2 = b.x; + let y1 = a.y; + let y2 = b.y; + + let x_pow = (a.x - b.x) * (a.x - b.x); + let y_pow = (a.y - b.y) * (a.y - b.y); + let squared = x_pow + y_pow; + squared / squared +} + +pub fn total_distance() { + let start = Point { x: 1, y: 2 }; + let middle = Point { x: 3, y: 4 }; + let end = Point { x: 5, y: 6 }; + + let _dist_1 = distance(&start, &middle); + let _dist_2 = distance(&middle, &end); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/30Sh66) + +Finally, passing by value (arguments with type +[`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html)) and passing by reference (either +moving ownership or passing a pointer) may have slightly different layouts in assembly, but will +still use either stack memory or CPU registers: + +```rust +pub struct Point { + x: i64, + y: i64, +} + +// Moving values +pub fn distance_moved(a: Point, b: Point) -> i64 { + let x1 = a.x; + let x2 = b.x; + let y1 = a.y; + let y2 = b.y; + + let x_pow = (x1 - x2) * (x1 - x2); + let y_pow = (y1 - y2) * (y1 - y2); + let squared = x_pow + y_pow; + squared / squared +} + +// Borrowing values has two extra `mov` instructions on lines 21 and 22 +pub fn distance_borrowed(a: &Point, b: &Point) -> i64 { + let x1 = a.x; + let x2 = b.x; + let y1 = a.y; + let y2 = b.y; + + let x_pow = (x1 - x2) * (x1 - x2); + let y_pow = (y1 - y2) * (y1 - y2); + let squared = x_pow + y_pow; + squared / squared +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/06hGiv) + +# Enums + +If you've ever worried that wrapping your types in +[`Option`](https://doc.rust-lang.org/stable/core/option/enum.Option.html) or +[`Result`](https://doc.rust-lang.org/stable/core/result/enum.Result.html) would finally make them +large enough that Rust decides to use heap allocation instead, fear no longer: `enum` and union +types don't use heap allocation: + +```rust +enum MyEnum { + Small(u8), + Large(u64) +} + +struct MyStruct 
{ + x: MyEnum, + y: MyEnum, +} + +pub fn enum_compare() { + let x = MyEnum::Small(0); + let y = MyEnum::Large(0); + + let z = MyStruct { x, y }; + + let opt = Option::Some(z); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/HK7zBx) + +Because the size of an `enum` is the size of its largest element plus a flag, the compiler can +predict how much memory is used no matter which variant of an enum is currently stored in a +variable. Thus, enums and unions have no need of heap allocation. There's unfortunately not a great +way to show this in assembly, so I'll instead point you to the +[`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums) +documentation. + +# Arrays + +The array type is guaranteed to be stack allocated, which is why the array size must be declared. +Interestingly enough, this can be used to cause safe Rust programs to crash: + +```rust +// 256 bytes +#[derive(Default)] +struct TwoFiftySix { + _a: [u64; 32] +} + +// 8 kilobytes +#[derive(Default)] +struct EightK { + _a: [TwoFiftySix; 32] +} + +// 256 kilobytes +#[derive(Default)] +struct TwoFiftySixK { + _a: [EightK; 32] +} + +// 8 megabytes - exceeds space typically provided for the stack, +// though the kernel can be instructed to allocate more. +// On Linux, you can check stack size using `ulimit -s` +#[derive(Default)] +struct EightM { + _a: [TwoFiftySixK; 32] +} + +fn main() { + // Because we already have things in stack memory + // (like the current function call stack), allocating another + // eight megabytes of stack memory crashes the program + let _x = EightM::default(); +} +``` + +-- +[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4) + +There aren't any security implications of this (no memory corruption occurs), but it's good to note +that the Rust compiler won't move arrays into heap memory even if they can be reasonably expected to +overflow the stack. 
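When a buffer that large is genuinely needed, the usual fix is to place it on the heap explicitly rather than hoping the compiler will. A short sketch (hedged: `Box::new([0u8; N])` can still construct the array on the stack before moving it, especially in debug builds, so building through `vec!` is the safer route for very large buffers):

```rust
fn main() {
    // Eight megabytes, but allocated on the heap; the stack only
    // holds a pointer and a length, so no overflow occurs.
    let eight_m: Box<[u8]> = vec![0u8; 8 * 1024 * 1024].into_boxed_slice();
    assert_eq!(eight_m.len(), 8 * 1024 * 1024);
}
```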
+
+# Closures
+
+Rules for how anonymous functions capture their arguments are typically language-specific. In Java,
+[Lambda Expressions](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html) are
+actually objects created on the heap that capture local primitives by copying, and capture local
+non-primitives as (`final`) references.
+[Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and
+[JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/)
+both bind _everything_ by reference normally, but Python can also
+[capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has
+[Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).
+
+In Rust, arguments to closures are the same as arguments to other functions; closures are simply
+functions that don't have a declared name. Some weird ordering of the stack may be required to
+handle them, but it's the compiler's responsibility to figure that out.
+
+Each example below has the same effect, but a different assembly implementation. In the simplest
+case, we immediately run a closure returned by another function. Because we don't store a reference
+to the closure, the stack memory needed to store the captured values is contiguous:
+
+```rust
+fn my_func() -> impl FnOnce() {
+    let x = 24;
+    // Note that this closure in assembly looks exactly like
+    // any other function; you even use the `call` instruction
+    // to start running it.
+    move || { x; }
+}
+
+pub fn immediate() {
+    my_func()();
+    my_func()();
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions
+
+If we store a reference to the closure, the Rust compiler keeps values it needs in the stack memory
+of the original function.
Getting the details right is a bit harder, so the instruction count goes +up even though this code is functionally equivalent to our original example: + +```rust +pub fn simple_reference() { + let x = my_func(); + let y = my_func(); + y(); + x(); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions + +Even things like variable order can make a difference in instruction count: + +```rust +pub fn complex() { + let x = my_func(); + let y = my_func(); + x(); + y(); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions + +In every circumstance though, the compiler ensured that no heap allocations were necessary. + +# Generics + +Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) and dynamic +dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often _associated_ with trait +objects being stored in the heap, dynamic dispatch can be used with stack allocated objects as well: + +```rust +trait GetInt { + fn get_int(&self) -> u64; +} + +// vtable stored at section L__unnamed_1 +struct WhyNotU8 { + x: u8 +} +impl GetInt for WhyNotU8 { + fn get_int(&self) -> u64 { + self.x as u64 + } +} + +// vtable stored at section L__unnamed_2 +struct ActualU64 { + x: u64 +} +impl GetInt for ActualU64 { + fn get_int(&self) -> u64 { + self.x + } +} + +// `&dyn` declares that we want to use dynamic dispatch +// rather than monomorphization, so there is only one +// `retrieve_int` function that shows up in the final assembly. +// If we used generics, there would be one implementation of +// `retrieve_int` for each type that implements `GetInt`. +pub fn retrieve_int(u: &dyn GetInt) { + // In the assembly, we just call an address given to us + // in the `rsi` register and hope that it was set up + // correctly when this function was invoked. 
+    let x = u.get_int();
+}
+
+pub fn do_call() {
+    // Note that even though the vtable for `WhyNotU8` and
+    // `ActualU64` includes a pointer to
+    // `core::ptr::real_drop_in_place`, it is never invoked.
+    let a = WhyNotU8 { x: 0 };
+    let b = ActualU64 { x: 0 };
+
+    retrieve_int(&a);
+    retrieve_int(&b);
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/u_yguS)
+
+It's hard to imagine practical situations where dynamic dispatch would be used for objects that
+aren't heap allocated, but it technically can be done.
+
+# Copy types
+
+Understanding move semantics and copy semantics in Rust is weird at first. The Rust docs
+[go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html) far better than can
+be addressed here, so I'll leave them to do the job. From a memory perspective though, their
+guideline is reasonable:
+[if your type can implement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).
+While there are potential speed tradeoffs to _benchmark_ when discussing `Copy` (move semantics for
+stack objects vs. copying stack pointers vs. copying stack `struct`s), _it's impossible for `Copy`
+to introduce a heap allocation_.
+
+But why is this the case? Fundamentally, it's because the language controls what `Copy` means -
+["the behavior of `Copy` is not overloadable"](https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone)
+because it's a marker trait. From there we'll note that a type
+[can implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy)
+if (and only if) its components implement `Copy`, and that
+[no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors).
+Thus, assignments involving heap types are always move semantics, and new heap allocations won't
+occur because of implicit operator behavior.
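To make the move/copy distinction concrete, here's a small runnable sketch (the values are chosen purely for illustration): assigning a `Copy` type duplicates the value in stack memory, while assigning a `Box` transfers ownership of the existing allocation without creating a new one:

```rust
fn main() {
    // `u32` is `Copy`: assignment duplicates the value on the stack.
    let a: u32 = 5;
    let b = a;
    assert_eq!(a + b, 10); // `a` is still usable after the assignment

    // `Box<u32>` is not `Copy`: assignment moves ownership.
    // No new heap allocation happens here; the existing pointer
    // is simply handed from `c` to `d`.
    let c = Box::new(5u32);
    let d = c;
    assert_eq!(*d, 5);
    // `c` can no longer be used; uncommenting the next line fails with
    // error[E0382]: borrow of moved value: `c`
    // println!("{}", c);
}
```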
+
+```rust
+#[derive(Clone)]
+struct Cloneable {
+    x: Box<u64>
+}
+
+// error[E0204]: the trait `Copy` may not be implemented for this type
+#[derive(Copy, Clone)]
+struct NotCopyable {
+    x: Box<u64>
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/VToRuK)
+
+# Iterators
+
+In managed memory languages (like
+[Java](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)), there's a subtle
+difference between these two code samples:
+
+```java
+public static long sum_for(List<Long> vals) {
+    long sum = 0;
+    // Regular for loop
+    for (int i = 0; i < vals.size(); i++) {
+        sum += vals.get(i);
+    }
+    return sum;
+}
+
+public static long sum_foreach(List<Long> vals) {
+    long sum = 0;
+    // "Foreach" loop - uses iteration
+    for (Long l : vals) {
+        sum += l;
+    }
+    return sum;
+}
+```
+
+In the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`, an object of type
+[`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html)
+is allocated on the heap, and will eventually be garbage-collected. This isn't a great design;
+iterators are often transient objects that you need during a function and can discard once the
+function ends. Sounds exactly like the issue stack-allocated objects address, no?
+
+In Rust, iterators are allocated on the stack. The objects to iterate over are almost certainly in
+heap memory, but the iterator itself
+([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap. In
+each of the examples below we iterate over a collection, but never use heap allocation:
+
+```rust
+use std::collections::HashMap;
+// There's a lot of assembly generated, but if you search in the text,
+// there are no references to `real_drop_in_place` anywhere.
+
+pub fn sum_vec(x: &Vec<u32>) {
+    let mut s = 0;
+    // Basic iteration over vectors doesn't need allocation
+    for y in x {
+        s += y;
+    }
+}
+
+pub fn sum_enumerate(x: &Vec<u32>) {
+    let mut s = 0;
+    // More complex iterators are just fine too
+    for (_i, y) in x.iter().enumerate() {
+        s += y;
+    }
+}
+
+pub fn sum_hm(x: &HashMap<String, u32>) {
+    let mut s = 0;
+    // And it's not just Vec, all types will allocate the iterator
+    // on stack memory
+    for y in x.values() {
+        s += y;
+    }
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)
diff --git a/blog/2019-02-06-stacking-up/index.mdx b/blog/2019-02-06-stacking-up/index.mdx
new file mode 100644
index 0000000..0ae8bb6
--- /dev/null
+++ b/blog/2019-02-06-stacking-up/index.mdx
@@ -0,0 +1,604 @@
+---
+slug: 2019/02/stacking-up
+title: "Allocations in Rust: Fixed memory"
+date: 2019-02-06 12:00:00
+authors: [bspeice]
+tags: []
+---
+
+`const` and `static` are perfectly fine, but it's relatively rare that we know at compile-time about
+either values or references that will be the same for the duration of our program. Put another way,
+it's not often the case that either you or your compiler knows how much memory your entire program
+will ever need.
+
+However, there are still some optimizations the compiler can do if it knows how much memory
+individual functions will need. Specifically, the compiler can make use of "stack" memory (as
+opposed to "heap" memory) which can be managed far faster in both the short- and long-term.
+
+
+
+When requesting memory, the [`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
+can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) (<1ns
+on modern CPUs). Contrast that to heap memory which requires an allocator (specialized
+software to track what memory is in use) to reserve space.
When you're finished with stack memory,
+the `pop` instruction runs in 1-3 cycles, as opposed to an allocator needing to worry about memory
+fragmentation and other issues with the heap. All sorts of incredibly sophisticated techniques have
+been used to design allocators:
+
+- [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science))
+  strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) (used in
+  [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) and
+  [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) (used in
+  [Python](https://docs.python.org/3/extending/extending.html#reference-counts))
+- Thread-local structures to prevent locking the allocator in
+  [tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)
+- Arena structures used in [jemalloc](http://jemalloc.net/), which
+  [until recently](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default)
+  was the primary allocator for Rust programs!
+
+But no matter how fast your allocator is, the principle remains: the fastest allocator is the one
+you never use. As such, we're not going to discuss how exactly the
+[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), but
+we'll focus instead on the conditions that enable the Rust compiler to use faster stack-based
+allocation for variables.
+
+So, **how do we know when Rust will or will not use stack allocation for objects we create?**
+Looking at other languages, it's often easy to delineate between stack and heap. Managed memory
+languages (Python, Java,
+[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) place
+everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/),
+[HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize
+some heap allocations away, but you should never assume it will happen.
C makes things clear with
calls to special functions (like [malloc(3)](https://linux.die.net/man/3/malloc)) needed to access
heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though
modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii).

For Rust, we can summarize as follows: **stack allocation will be used for everything that doesn't
involve "smart pointers" and collections**. We'll skip over a precise definition of the term "smart
pointer" for now, and instead discuss what we should watch for to understand when stack and heap
memory regions are used:

1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) indicate
   allocation of stack memory:

   ```rust
   pub fn stack_alloc(x: u32) -> u32 {
       // Space for `y` is allocated by subtracting from `rsp`,
       // and then populated
       let y = [1u8, 2, 3, 4];
       // Space for `y` is deallocated by adding back to `rsp`
       x
   }
   ```

   -- [Compiler Explorer](https://godbolt.org/z/5WSgc9)

2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to watch
   for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened in the recent
   past:

   ```rust
   pub fn heap_alloc(x: usize) -> usize {
       // Space for elements in a vector has to be allocated
       // on the heap, and is then de-allocated once the
       // vector goes out of scope
       let y: Vec<u8> = Vec::with_capacity(x);
       x
   }
   ```

   -- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)

   Note: While the
   [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) is
   [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46),
   the Rust standard library only defines `Drop` implementations for types that involve heap
   allocation.

3.
If you don't want to inspect the assembly, use a custom allocator that's able to track and alert
when heap allocations occur. Crates like
[`alloc_counter`](https://crates.io/crates/alloc_counter) are designed for exactly this purpose.

With all that in mind, let's talk about situations in which we're guaranteed to use stack memory:

- Structs are created on the stack.
- Function arguments are passed on the stack, meaning the
  [`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute) will
  not change the memory region used.
- Enums and unions are stack-allocated.
- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are always stack-allocated.
- Closures capture the variables they use on the stack.
- Generics will use stack allocation, even with dynamic dispatch.
- [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be
  stack-allocated, and copying them will be done in stack memory.
- [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library are
  stack-allocated even when iterating over heap-based collections.

## Structs

The simplest case comes first.
When creating vanilla `struct` objects, we use stack memory to hold +their contents: + +```rust +struct Point { + x: u64, + y: u64, +} + +struct Line { + a: Point, + b: Point, +} + +pub fn make_line() { + // `origin` is stored in the first 16 bytes of memory + // starting at location `rsp` + let origin = Point { x: 0, y: 0 }; + // `point` makes up the next 16 bytes of memory + let point = Point { x: 1, y: 2 }; + + // When creating `ray`, we just move the content out of + // `origin` and `point` into the next 32 bytes of memory + let ray = Line { a: origin, b: point }; +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/vri9BE) + +Note that while some extra-fancy instructions are used for memory manipulation in the assembly, the +`sub rsp, 64` instruction indicates we're still working with the stack. + +## Function arguments + +Have you ever wondered how functions communicate with each other? Like, once the variables are given +to you, everything's fine. But how do you "give" those variables to another function? How do you get +the results back afterward? The answer: the compiler arranges memory and assembly instructions using +a pre-determined [calling convention](http://llvm.org/docs/LangRef.html#calling-conventions). This +convention governs the rules around where arguments needed by a function will be located (either in +memory offsets relative to the stack pointer `rsp`, or in other registers), and where the results +can be found once the function has finished. And when multiple languages agree on what the calling +conventions are, you can do things like having [Go call Rust code](https://blog.filippo.io/rustgo/)! + +Put simply: it's the compiler's job to figure out how to call other functions, and you can assume +that the compiler is good at its job. 
We can see this in action using a simple example:

```rust
struct Point {
    x: i64,
    y: i64,
}

// We use integer division operations to keep
// the assembly clean, understanding the result
// isn't accurate.
fn distance(a: &Point, b: &Point) -> i64 {
    // Immediately subtract from `rsp` the bytes needed
    // to hold all the intermediate results - this is
    // the stack allocation step

    // The compiler used the `rdi` and `rsi` registers
    // to pass our arguments, so read them in
    let x1 = a.x;
    let x2 = b.x;
    let y1 = a.y;
    let y2 = b.y;

    // Do the actual math work
    let x_pow = (x1 - x2) * (x1 - x2);
    let y_pow = (y1 - y2) * (y1 - y2);
    let squared = x_pow + y_pow;
    squared / squared

    // Our final result will be stored in the `rax` register
    // so that our caller knows where to retrieve it.
    // Finally, add back to `rsp` the stack memory that is
    // now ready to be used by other functions.
}

pub fn total_distance() {
    let start = Point { x: 1, y: 2 };
    let middle = Point { x: 3, y: 4 };
    let end = Point { x: 5, y: 6 };

    let _dist_1 = distance(&start, &middle);
    let _dist_2 = distance(&middle, &end);
}
```

-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)

As a consequence of function arguments never using heap memory, we can infer that functions using
the `#[inline]` attribute do not heap allocate either. But better than inferring, we can look at
the assembly to prove it:

```rust
struct Point {
    x: i64,
    y: i64,
}

// Note that there is no `distance` function in the assembly output,
// and the total line count goes from 229 with inlining off
// to 306 with inline on. Even still, no heap allocations occur.
#[inline(always)]
fn distance(a: &Point, b: &Point) -> i64 {
    let x1 = a.x;
    let x2 = b.x;
    let y1 = a.y;
    let y2 = b.y;

    let x_pow = (x1 - x2) * (x1 - x2);
    let y_pow = (y1 - y2) * (y1 - y2);
    let squared = x_pow + y_pow;
    squared / squared
}

pub fn total_distance() {
    let start = Point { x: 1, y: 2 };
    let middle = Point { x: 3, y: 4 };
    let end = Point { x: 5, y: 6 };

    let _dist_1 = distance(&start, &middle);
    let _dist_2 = distance(&middle, &end);
}
```

-- [Compiler Explorer](https://godbolt.org/z/30Sh66)

Finally, passing by value (arguments with type
[`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html)) and passing by reference (either
moving ownership or passing a pointer) may have slightly different layouts in assembly, but will
still use either stack memory or CPU registers:

```rust
pub struct Point {
    x: i64,
    y: i64,
}

// Moving values
pub fn distance_moved(a: Point, b: Point) -> i64 {
    let x1 = a.x;
    let x2 = b.x;
    let y1 = a.y;
    let y2 = b.y;

    let x_pow = (x1 - x2) * (x1 - x2);
    let y_pow = (y1 - y2) * (y1 - y2);
    let squared = x_pow + y_pow;
    squared / squared
}

// Borrowing values has two extra `mov` instructions on lines 21 and 22
pub fn distance_borrowed(a: &Point, b: &Point) -> i64 {
    let x1 = a.x;
    let x2 = b.x;
    let y1 = a.y;
    let y2 = b.y;

    let x_pow = (x1 - x2) * (x1 - x2);
    let y_pow = (y1 - y2) * (y1 - y2);
    let squared = x_pow + y_pow;
    squared / squared
}
```

-- [Compiler Explorer](https://godbolt.org/z/06hGiv)

## Enums

If you've ever worried that wrapping your types in
[`Option`](https://doc.rust-lang.org/stable/core/option/enum.Option.html) or
[`Result`](https://doc.rust-lang.org/stable/core/result/enum.Result.html) would finally make them
large enough that Rust decides to use heap allocation instead, fear no longer: `enum` and union
types don't use heap allocation:

```rust
enum MyEnum {
    Small(u8),
    Large(u64)
}

struct
MyStruct { + x: MyEnum, + y: MyEnum, +} + +pub fn enum_compare() { + let x = MyEnum::Small(0); + let y = MyEnum::Large(0); + + let z = MyStruct { x, y }; + + let opt = Option::Some(z); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/HK7zBx) + +Because the size of an `enum` is the size of its largest element plus a flag, the compiler can +predict how much memory is used no matter which variant of an enum is currently stored in a +variable. Thus, enums and unions have no need of heap allocation. There's unfortunately not a great +way to show this in assembly, so I'll instead point you to the +[`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums) +documentation. + +## Arrays + +The array type is guaranteed to be stack allocated, which is why the array size must be declared. +Interestingly enough, this can be used to cause safe Rust programs to crash: + +```rust +// 256 bytes +#[derive(Default)] +struct TwoFiftySix { + _a: [u64; 32] +} + +// 8 kilobytes +#[derive(Default)] +struct EightK { + _a: [TwoFiftySix; 32] +} + +// 256 kilobytes +#[derive(Default)] +struct TwoFiftySixK { + _a: [EightK; 32] +} + +// 8 megabytes - exceeds space typically provided for the stack, +// though the kernel can be instructed to allocate more. +// On Linux, you can check stack size using `ulimit -s` +#[derive(Default)] +struct EightM { + _a: [TwoFiftySixK; 32] +} + +fn main() { + // Because we already have things in stack memory + // (like the current function call stack), allocating another + // eight megabytes of stack memory crashes the program + let _x = EightM::default(); +} +``` + +-- +[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4) + +There aren't any security implications of this (no memory corruption occurs), but it's good to note +that the Rust compiler won't move arrays into heap memory even if they can be reasonably expected to +overflow the stack. 
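Both of the size claims above (an enum occupies its largest variant plus a discriminant, and arrays
add no overhead beyond their elements) can be checked directly with
[`std::mem::size_of`](https://doc.rust-lang.org/std/mem/fn.size_of.html). This is a quick sketch
rather than a guarantee: Rust's default enum layout is not fixed by the language, so the 16-byte
figure assumes a typical 64-bit target:

```rust
use std::mem::size_of;

// Same shape as the enum from the previous section
enum MyEnum {
    Small(u8),
    Large(u64),
}

// Same 256-byte struct as above
#[derive(Default)]
struct TwoFiftySix {
    _a: [u64; 32],
}

fn main() {
    // Largest variant (a u64) plus a discriminant, rounded up to the
    // enum's 8-byte alignment: 16 bytes on a typical 64-bit target
    assert_eq!(size_of::<MyEnum>(), 16);
    // Arrays store only their elements: 32 * 8 bytes, no extra bookkeeping
    assert_eq!(size_of::<TwoFiftySix>(), 256);
}
```

Everything here is resolved at compile time; `size_of` is a `const fn`, so these checks never touch
the heap.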
## Closures

Rules for how anonymous functions capture their arguments are typically language-specific. In Java,
[Lambda Expressions](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html) are
actually objects created on the heap that capture local primitives by copying, and capture local
non-primitives as (`final`) references.
[Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and
[JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/)
both bind _everything_ by reference normally, but Python can also
[capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has
[Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).

In Rust, arguments to closures are the same as arguments to other functions; closures are simply
functions that don't have a declared name. Some weird ordering of the stack may be required to
handle them, but it's the compiler's responsibility to figure that out.

Each example below has the same effect, but a different assembly implementation. In the simplest
case, we immediately run a closure returned by another function. Because we don't store a reference
to the closure, the stack memory needed to store the captured values is contiguous:

```rust
fn my_func() -> impl FnOnce() {
    let x = 24;
    // Note that this closure in assembly looks exactly like
    // any other function; you even use the `call` instruction
    // to start running it.
    move || { x; }
}

pub fn immediate() {
    my_func()();
    my_func()();
}
```

-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions

If we store a reference to the closure, the Rust compiler keeps values it needs in the stack memory
of the original function.
Getting the details right is a bit harder, so the instruction count goes +up even though this code is functionally equivalent to our original example: + +```rust +pub fn simple_reference() { + let x = my_func(); + let y = my_func(); + y(); + x(); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions + +Even things like variable order can make a difference in instruction count: + +```rust +pub fn complex() { + let x = my_func(); + let y = my_func(); + x(); + y(); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions + +In every circumstance though, the compiler ensured that no heap allocations were necessary. + +## Generics + +Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) and dynamic +dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often _associated_ with trait +objects being stored in the heap, dynamic dispatch can be used with stack allocated objects as well: + +```rust +trait GetInt { + fn get_int(&self) -> u64; +} + +// vtable stored at section L__unnamed_1 +struct WhyNotU8 { + x: u8 +} +impl GetInt for WhyNotU8 { + fn get_int(&self) -> u64 { + self.x as u64 + } +} + +// vtable stored at section L__unnamed_2 +struct ActualU64 { + x: u64 +} +impl GetInt for ActualU64 { + fn get_int(&self) -> u64 { + self.x + } +} + +// `&dyn` declares that we want to use dynamic dispatch +// rather than monomorphization, so there is only one +// `retrieve_int` function that shows up in the final assembly. +// If we used generics, there would be one implementation of +// `retrieve_int` for each type that implements `GetInt`. +pub fn retrieve_int(u: &dyn GetInt) { + // In the assembly, we just call an address given to us + // in the `rsi` register and hope that it was set up + // correctly when this function was invoked. 
    let x = u.get_int();
}

pub fn do_call() {
    // Note that even though the vtable for `WhyNotU8` and
    // `ActualU64` includes a pointer to
    // `core::ptr::real_drop_in_place`, it is never invoked.
    let a = WhyNotU8 { x: 0 };
    let b = ActualU64 { x: 0 };

    retrieve_int(&a);
    retrieve_int(&b);
}
```

-- [Compiler Explorer](https://godbolt.org/z/u_yguS)

It's hard to imagine practical situations where dynamic dispatch would be used for objects that
aren't heap allocated, but it technically can be done.

## Copy types

Understanding move semantics and copy semantics in Rust is weird at first. The Rust docs
[go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html) far better than can
be addressed here, so I'll leave them to do the job. From a memory perspective though, their
guideline is reasonable:
[if your type can implement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).
While there are potential speed tradeoffs to _benchmark_ when discussing `Copy` (move semantics for
stack objects vs. copying stack pointers vs. copying stack `struct`s), _it's impossible for `Copy`
to introduce a heap allocation_.

But why is this the case? Fundamentally, it's because the language controls what `Copy` means -
["the behavior of `Copy` is not overloadable"](https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone)
because it's a marker trait. From there we'll note that a type
[can implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy)
if (and only if) its components implement `Copy`, and that
[no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors).
Thus, assignments involving heap types are always move semantics, and new heap allocations won't
occur because of implicit operator behavior.
```rust
#[derive(Clone)]
struct Cloneable {
    x: Box<i64>
}

// error[E0204]: the trait `Copy` may not be implemented for this type
#[derive(Copy, Clone)]
struct NotCopyable {
    x: Box<i64>
}
```

-- [Compiler Explorer](https://godbolt.org/z/VToRuK)

## Iterators

In managed memory languages (like
[Java](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)), there's a subtle
difference between these two code samples:

```java
public static long sum_for(List<Long> vals) {
    long sum = 0;
    // Regular for loop
    for (int i = 0; i < vals.size(); i++) {
        sum += vals.get(i);
    }
    return sum;
}

public static long sum_foreach(List<Long> vals) {
    long sum = 0;
    // "Foreach" loop - uses iteration
    for (Long l : vals) {
        sum += l;
    }
    return sum;
}
```

In the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`, an object of type
[`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html)
is allocated on the heap, and will eventually be garbage-collected. This isn't a great design;
iterators are often transient objects that you need during a function and can discard once the
function ends. Sounds exactly like the issue stack-allocated objects address, no?

In Rust, iterators are allocated on the stack. The objects to iterate over are almost certainly in
heap memory, but the iterator itself
([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap. In
each of the examples below we iterate over a collection, but never use heap allocation:

```rust
use std::collections::HashMap;
// There's a lot of assembly generated, but if you search in the text,
// there are no references to `real_drop_in_place` anywhere.
pub fn sum_vec(x: &Vec<u32>) {
    let mut s = 0;
    // Basic iteration over vectors doesn't need allocation
    for y in x {
        s += y;
    }
}

pub fn sum_enumerate(x: &Vec<u32>) {
    let mut s = 0;
    // More complex iterators are just fine too
    for (_i, y) in x.iter().enumerate() {
        s += y;
    }
}

pub fn sum_hm(x: &HashMap<u32, u64>) {
    let mut s = 0;
    // And it's not just `Vec`: other collection types also
    // allocate their iterators in stack memory
    for y in x.values() {
        s += y;
    }
}
```

-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)

diff --git a/blog/2019-02-07-a-heaping-helping/_article.md b/blog/2019-02-07-a-heaping-helping/_article.md
new file mode 100644
index 0000000..b68c447
--- /dev/null
+++ b/blog/2019-02-07-a-heaping-helping/_article.md
@@ -0,0 +1,254 @@
---
layout: post
title: "Dynamic Memory: A Heaping Helping"
description: "The reason Rust exists."
category:
tags: [rust, understanding-allocations]
---

Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), and
some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust, how
the language uses dynamic memory (also referred to as the **heap**) is governed by a system called
_ownership_. And as the docs mention, ownership
[is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).

The heap is used in two situations: when the compiler is unable to predict either the _total size of
memory needed_ or _how long the memory is needed for_, it allocates space in the heap. This happens
pretty frequently; if you want to download the Google home page, you won't know how large it is
until your program runs. And when you're finished with Google, the memory is deallocated so it can
be used to store other webpages.
If you're interested in a slightly longer explanation of the heap, +check out +[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap) +in Rust's documentation. + +We won't go into detail on how the heap is managed; the +[ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) does a +phenomenal job explaining both the "why" and "how" of memory management. Instead, we're going to +focus on understanding "when" heap allocations occur in Rust. + +To start off, take a guess for how many allocations happen in the program below: + +```rust +fn main() {} +``` + +It's obviously a trick question; while no heap allocations occur as a result of that code, the setup +needed to call `main` does allocate on the heap. Here's a way to show it: + +```rust +#![feature(integer_atomics)] +use std::alloc::{GlobalAlloc, Layout, System}; +use std::sync::atomic::{AtomicU64, Ordering}; + +static ALLOCATION_COUNT: AtomicU64 = AtomicU64::new(0); + +struct CountingAllocator; + +unsafe impl GlobalAlloc for CountingAllocator { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst); + System.alloc(layout) + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + System.dealloc(ptr, layout); + } +} + +#[global_allocator] +static A: CountingAllocator = CountingAllocator; + +fn main() { + let x = ALLOCATION_COUNT.fetch_add(0, Ordering::SeqCst); + println!("There were {} allocations before calling main!", x); +} +``` + +-- +[Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e) + +As of the time of writing, there are five allocations that happen before `main` is ever called. 
But when we want to understand more practically where heap allocation happens, we'll follow this
guide:

- Smart pointers hold their contents in the heap
- Collections are smart pointers for many objects at a time, and reallocate when they need to grow

Finally, there are two "addendum" issues that are important to address when discussing Rust and the
heap:

- Non-heap alternatives to many standard library types are available.
- Special allocators to track memory behavior should be used to benchmark code.

# Smart pointers

The first thing to note is the "smart pointer" types. When you have data that must outlive the
scope in which it is declared, or your data is of unknown or dynamic size, you'll make use of these
types.

The term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer) comes from C++, and while it's
closely linked to a general design pattern of
["Resource Acquisition Is Initialization"](https://en.cppreference.com/w/cpp/language/raii), we'll
use it here specifically to describe objects that are responsible for managing ownership of data
allocated on the heap. The smart pointers available in the `alloc` crate should look mostly
familiar:

- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)

The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers to manage
heap objects, though more than can be covered here.
Some examples are: + +- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) +- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html) + +Finally, there is one ["gotcha"](https://www.merriam-webster.com/dictionary/gotcha): **cell types** +(like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) look and behave +similarly, but **don't involve heap allocation**. The +[`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html) have more information. + +When a smart pointer is created, the data it is given is placed in heap memory and the location of +that data is recorded in the smart pointer. Once the smart pointer has determined it's safe to +deallocate that memory (when a `Box` has +[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or a reference count +[goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), the heap space is reclaimed. We can +prove these types use heap memory by looking at code: + +```rust +use std::rc::Rc; +use std::sync::Arc; +use std::borrow::Cow; + +pub fn my_box() { + // Drop at assembly line 1640 + Box::new(0); +} + +pub fn my_rc() { + // Drop at assembly line 1650 + Rc::new(0); +} + +pub fn my_arc() { + // Drop at assembly line 1660 + Arc::new(0); +} + +pub fn my_cow() { + // Drop at assembly line 1672 + Cow::from("drop"); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/4AMQug) + +# Collections + +Collection types use heap memory because their contents have dynamic size; they will request more +memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can +[release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) when it's +no longer necessary. This dynamic property forces Rust to heap allocate everything they contain. In +a way, **collections are smart pointers for many objects at a time**. 
Common types that fall under
this umbrella are [`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html),
[`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and
[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) (not
[`str`](https://doc.rust-lang.org/std/primitive.str.html)).

While collections store the objects they own in heap memory, _creating new collections will not
allocate on the heap_. This is a bit weird; if we call `Vec::new()`, the assembly shows a
corresponding call to `real_drop_in_place`:

```rust
pub fn my_vec() {
    // Drop in place at line 481
    Vec::<u64>::new();
}
```

-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)

But because the vector has no elements to manage, no calls to the allocator will ever be dispatched:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    // Turn on panicking if we allocate on the heap
    DO_PANIC.store(true, Ordering::SeqCst);

    // Interesting bit happens here
    let x: Vec<u64> = Vec::new();
    drop(x);

    // Turn panicking back off, some deallocations occur
    // after main as well.
+ DO_PANIC.store(false, Ordering::SeqCst); +} + +#[global_allocator] +static A: PanicAllocator = PanicAllocator; +static DO_PANIC: AtomicBool = AtomicBool::new(false); +struct PanicAllocator; + +unsafe impl GlobalAlloc for PanicAllocator { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + if DO_PANIC.load(Ordering::SeqCst) { + panic!("Unexpected allocation."); + } + System.alloc(layout) + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + if DO_PANIC.load(Ordering::SeqCst) { + panic!("Unexpected deallocation."); + } + System.dealloc(ptr, layout); + } +} +``` + +-- +[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6) + +Other standard library types follow the same behavior; make sure to check out +[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new), +and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new). + +# Heap Alternatives + +While it is a bit strange to speak of the stack after spending time with the heap, it's worth +pointing out that some heap-allocated objects in Rust have stack-based counterparts provided by +other crates. If you have need of the functionality, but want to avoid allocating, there are +typically alternatives available. + +When it comes to some standard library smart pointers +([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and +[`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives are +provided in crates like [parking_lot](https://crates.io/crates/parking_lot) and +[spin](https://crates.io/crates/spin). 
You can check out
[`lock_api::RwLock`](https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLock.html),
[`lock_api::Mutex`](https://docs.rs/lock_api/0.1.5/lock_api/struct.Mutex.html), and
[`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html) if you're in need
of synchronization primitives.

[thread_id](https://crates.io/crates/thread-id) may be necessary if you're implementing an allocator
because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) uses a
[`thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#17-36)
that needs heap allocation.

# Tracing Allocators

When writing performance-sensitive code, there's no alternative to measuring your code. If you
didn't write a benchmark,
[you don't care about its performance](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=263).
You should never rely on your instincts when
[a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM).

Similarly, there's great work going on in Rust with allocators that keep track of what they're doing
(like [`alloc_counter`](https://crates.io/crates/alloc_counter)). When it comes to tracking heap
behavior, it's easy to make mistakes; please write tests and make sure you have tools to guard
against future issues.

diff --git a/blog/2019-02-07-a-heaping-helping/index.mdx b/blog/2019-02-07-a-heaping-helping/index.mdx
new file mode 100644
index 0000000..292306b
--- /dev/null
+++ b/blog/2019-02-07-a-heaping-helping/index.mdx
@@ -0,0 +1,258 @@
---
slug: 2019/02/a-heaping-helping
title: "Allocations in Rust: Dynamic memory"
date: 2019-02-07 12:00:00
authors: [bspeice]
tags: []
---

Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), and
some languages go to extreme lengths to protect users from themselves (Java, Python).
In Rust, how
the language uses dynamic memory (also referred to as the **heap**) is governed by a system called
_ownership_. And as the docs mention, ownership
[is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).

The heap is used in two situations: when the compiler is unable to predict either the _total size of
memory needed_ or _how long the memory is needed for_, it allocates space in the heap.

This happens
pretty frequently; if you want to download the Google home page, you won't know how large it is
until your program runs. And when you're finished with Google, the memory is deallocated so it can
be used to store other webpages. If you're interested in a slightly longer explanation of the heap,
check out
[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap)
in Rust's documentation.

We won't go into detail on how the heap is managed; the
[ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) does a
phenomenal job explaining both the "why" and "how" of memory management. Instead, we're going to
focus on understanding "when" heap allocations occur in Rust.

To start off, take a guess at how many allocations happen in the program below:

```rust
fn main() {}
```

It's obviously a trick question; while no heap allocations occur as a result of that code, the setup
needed to call `main` does allocate on the heap.
Here's a way to show it:

```rust
#![feature(integer_atomics)]
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

static ALLOCATION_COUNT: AtomicU64 = AtomicU64::new(0);

struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
    }
}

#[global_allocator]
static A: CountingAllocator = CountingAllocator;

fn main() {
    let x = ALLOCATION_COUNT.fetch_add(0, Ordering::SeqCst);
    println!("There were {} allocations before calling main!", x);
}
```

--
[Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e)

As of the time of writing, there are five allocations that happen before `main` is ever called.

But when we want to understand more practically where heap allocation happens, we'll follow this
guide:

- Smart pointers hold their contents in the heap
- Collections are smart pointers for many objects at a time, and reallocate when they need to grow

Finally, there are two "addendum" issues that are important to address when discussing Rust and the
heap:

- Non-heap alternatives to many standard library types are available.
- Special allocators to track memory behavior should be used to benchmark code.

## Smart pointers

The first thing to note is the "smart pointer" types. When you have data that must outlive the
scope in which it is declared, or your data is of unknown or dynamic size, you'll make use of these
types.
+
+The term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer) comes from C++, and while it's
+closely linked to a general design pattern of
+["Resource Acquisition Is Initialization"](https://en.cppreference.com/w/cpp/language/raii), we'll
+use it here specifically to describe objects that are responsible for managing ownership of data
+allocated on the heap. The smart pointers available in the `alloc` crate should look mostly
+familiar:
+
+- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
+- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
+- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
+- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)
+
+The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers to manage
+heap objects, though there are more than can be covered here. Some examples are:
+
+- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)
+- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)
+
+Finally, there is one ["gotcha"](https://www.merriam-webster.com/dictionary/gotcha): **cell types**
+(like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) look and behave
+similarly, but **don't involve heap allocation**. The
+[`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html) have more information.
+
+When a smart pointer is created, the data it is given is placed in heap memory and the location of
+that data is recorded in the smart pointer. Once the smart pointer has determined it's safe to
+deallocate that memory (when a `Box` has
+[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or a reference count
+[goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), the heap space is reclaimed. 
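
The reference-count half of that story can be observed directly with `Rc::strong_count`; a minimal sketch:

```rust
use std::rc::Rc;

fn main() {
    // `first` owns heap-allocated contents behind a reference count.
    let first = Rc::new(vec![1, 2, 3]);
    assert_eq!(Rc::strong_count(&first), 1);

    {
        // Cloning an `Rc` adds a handle, not a new copy of the data.
        let second = Rc::clone(&first);
        assert_eq!(Rc::strong_count(&second), 2);
    } // `second` drops here, decrementing the count.

    assert_eq!(Rc::strong_count(&first), 1);
    // When `first` drops at the end of `main`, the count reaches zero
    // and the heap space is reclaimed.
}
```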
We can +prove these types use heap memory by looking at code: + +```rust +use std::rc::Rc; +use std::sync::Arc; +use std::borrow::Cow; + +pub fn my_box() { + // Drop at assembly line 1640 + Box::new(0); +} + +pub fn my_rc() { + // Drop at assembly line 1650 + Rc::new(0); +} + +pub fn my_arc() { + // Drop at assembly line 1660 + Arc::new(0); +} + +pub fn my_cow() { + // Drop at assembly line 1672 + Cow::from("drop"); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/4AMQug) + +## Collections + +Collection types use heap memory because their contents have dynamic size; they will request more +memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can +[release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) when it's +no longer necessary. This dynamic property forces Rust to heap allocate everything they contain. In +a way, **collections are smart pointers for many objects at a time**. Common types that fall under +this umbrella are [`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html), +[`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and +[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) (not +[`str`](https://doc.rust-lang.org/std/primitive.str.html)). + +While collections store the objects they own in heap memory, _creating new collections will not +allocate on the heap_. 
This is a bit weird; if we call `Vec::new()`, the assembly shows a
+corresponding call to `real_drop_in_place`:
+
+```rust
+pub fn my_vec() {
+    // Drop in place at line 481
+    Vec::<u8>::new();
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)
+
+But because the vector has no elements to manage, no calls to the allocator will ever be dispatched:
+
+```rust
+use std::alloc::{GlobalAlloc, Layout, System};
+use std::sync::atomic::{AtomicBool, Ordering};
+
+fn main() {
+    // Turn on panicking if we allocate on the heap
+    DO_PANIC.store(true, Ordering::SeqCst);
+
+    // Interesting bit happens here
+    let x: Vec<u8> = Vec::new();
+    drop(x);
+
+    // Turn panicking back off, some deallocations occur
+    // after main as well.
+    DO_PANIC.store(false, Ordering::SeqCst);
+}
+
+#[global_allocator]
+static A: PanicAllocator = PanicAllocator;
+static DO_PANIC: AtomicBool = AtomicBool::new(false);
+struct PanicAllocator;
+
+unsafe impl GlobalAlloc for PanicAllocator {
+    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
+        if DO_PANIC.load(Ordering::SeqCst) {
+            panic!("Unexpected allocation.");
+        }
+        System.alloc(layout)
+    }
+
+    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
+        if DO_PANIC.load(Ordering::SeqCst) {
+            panic!("Unexpected deallocation.");
+        }
+        System.dealloc(ptr, layout);
+    }
+}
+```
+
+--
+[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6)
+
+Other standard library types follow the same behavior; make sure to check out
+[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new)
+and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new).
+
+## Heap Alternatives
+
+While it is a bit strange to speak of the stack after spending time with the heap, it's worth
+pointing out that some heap-allocated objects in Rust have stack-based counterparts provided by
+other crates. 
If you have need of the functionality, but want to avoid allocating, there are
+typically alternatives available.
+
+When it comes to some standard library smart pointers
+([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and
+[`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives are
+provided in crates like [parking_lot](https://crates.io/crates/parking_lot) and
+[spin](https://crates.io/crates/spin). You can check out
+[`lock_api::RwLock`](https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLock.html),
+[`lock_api::Mutex`](https://docs.rs/lock_api/0.1.5/lock_api/struct.Mutex.html), and
+[`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html) if you're in need
+of synchronization primitives.
+
+[thread_id](https://crates.io/crates/thread-id) may be necessary if you're implementing an allocator,
+because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) uses a
+[`thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#17-36)
+that needs heap allocation.
+
+## Tracing Allocators
+
+When writing performance-sensitive code, there's no alternative to measuring your code. If you
+didn't write a benchmark,
+[you don't care about its performance](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=263).
+You should never rely on your instincts when
+[a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM).
+
+Similarly, there's great work going on in Rust with allocators that keep track of what they're doing
+(like [`alloc_counter`](https://crates.io/crates/alloc_counter)). When it comes to tracking heap
+behavior, it's easy to make mistakes; please write tests and make sure you have tools to guard
+against future issues.
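
As a hedged sketch of that guard-rail idea, the counting-allocator pattern from earlier in this article can be wrapped into a reusable helper (the helper name `allocations_during` is illustrative):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

static ALLOCATIONS: AtomicU64 = AtomicU64::new(0);

struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Count every heap allocation, then defer to the system allocator.
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
    }
}

#[global_allocator]
static A: CountingAllocator = CountingAllocator;

/// Run `f` and report how many heap allocations happened while it ran.
fn allocations_during(f: impl FnOnce()) -> u64 {
    let before = ALLOCATIONS.load(Ordering::SeqCst);
    f();
    ALLOCATIONS.load(Ordering::SeqCst) - before
}

fn main() {
    // Creating an empty Vec performs no allocation...
    assert_eq!(allocations_during(|| { let _v: Vec<u8> = Vec::new(); }), 0);

    // ...while pushing the first element does. `black_box` keeps the
    // optimizer from eliding the unused vector entirely.
    assert!(allocations_during(|| {
        let mut v = Vec::new();
        v.push(1u8);
        std::hint::black_box(&v);
    }) >= 1);

    println!("allocation counts verified");
}
```

In a test suite, assertions like these can pin down the allocation behavior of a hot path so that regressions fail loudly.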
diff --git a/blog/2019-02-08-compiler-optimizations/_article.md b/blog/2019-02-08-compiler-optimizations/_article.md new file mode 100644 index 0000000..4b8b385 --- /dev/null +++ b/blog/2019-02-08-compiler-optimizations/_article.md @@ -0,0 +1,148 @@ +--- +layout: post +title: "Compiler Optimizations: What It's Done Lately" +description: "A lot. The answer is a lot." +category: +tags: [rust, understanding-allocations] +--- + +**Update 2019-02-10**: When debugging a +[related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1), it was discovered that the +original code worked because LLVM optimized out the entire function, rather than just the allocation +segments. The code has been updated with proper use of +[`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html), and a previous section +on vector capacity has been removed. + +--- + +Up to this point, we've been discussing memory usage in the Rust language by focusing on simple +rules that are mostly right for small chunks of code. We've spent time showing how those rules work +themselves out in practice, and become familiar with reading the assembly code needed to see each +memory type (global, stack, heap) in action. + +Throughout the series so far, we've put a handicap on the code. In the name of consistent and +understandable results, we've asked the compiler to pretty please leave the training wheels on. Now +is the time where we throw out all the rules and take off the kid gloves. As it turns out, both the +Rust compiler and the LLVM optimizers are incredibly sophisticated, and we'll step back and let them +do their job. + +Similar to +["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we're +focusing on interesting things the Rust language (and LLVM!) can do with memory management. 
We'll
+still be looking at assembly code to understand what's going on, but it's important to mention
+again: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to
+double-check memory behavior if it's something you care about**. It's far too easy to misread
+assembly in large code sections; you should always verify behavior if you care about memory usage.
+
+The guiding principle as we move forward is this: _optimizing compilers won't produce worse programs
+than we started with._ There won't be any situations where stack allocations get moved to heap
+allocations. There will, however, be an opera of optimization.
+
+# The Case of the Disappearing Box
+
+Our first optimization comes when LLVM can reason that the lifetime of an object is sufficiently
+short that heap allocations aren't necessary. In these cases, LLVM will move the allocation to the
+stack instead! The way this interacts with `#[inline]` attributes is a bit opaque, but the important
+part is that LLVM can sometimes do better than the baseline Rust language:
+
+```rust
+use std::alloc::{GlobalAlloc, Layout, System};
+use std::sync::atomic::{AtomicBool, Ordering};
+
+pub fn cmp(x: u32) {
+    // Turn on panicking if we allocate on the heap
+    DO_PANIC.store(true, Ordering::SeqCst);
+
+    // The compiler is able to see through the constant `Box`
+    // and directly compare `x` to 24 - assembly line 73
+    let y = Box::new(24);
+    let equals = x == *y;
+
+    // This call to drop is eliminated
+    drop(y);
+
+    // Need to mark the comparison result as volatile so that
+    // LLVM doesn't strip out all the code. If `y` is marked
+    // volatile instead, allocation will be forced.
+    unsafe { std::ptr::read_volatile(&equals) };
+
+    // Turn off panicking, as there are some deallocations
+    // when we exit main.
+    DO_PANIC.store(false, Ordering::SeqCst);
+}
+
+fn main() {
+    cmp(12)
+}
+
+#[global_allocator]
+static A: PanicAllocator = PanicAllocator;
+static DO_PANIC: AtomicBool = AtomicBool::new(false);
+struct PanicAllocator;
+
+unsafe impl GlobalAlloc for PanicAllocator {
+    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
+        if DO_PANIC.load(Ordering::SeqCst) {
+            panic!("Unexpected allocation.");
+        }
+        System.alloc(layout)
+    }
+
+    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
+        if DO_PANIC.load(Ordering::SeqCst) {
+            panic!("Unexpected deallocation.");
+        }
+        System.dealloc(ptr, layout);
+    }
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)
+
+-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)
+
+# Dr. Array or: How I Learned to Love the Optimizer
+
+Finally, this isn't so much about LLVM figuring out different memory behavior, but LLVM stripping
+out code that doesn't do anything. Optimizations of this type have a lot of nuance to them; if
+you're not careful, they can make your benchmarks look
+[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). In Rust, the
+`black_box` function (implemented in both
+[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and
+[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler
+to disable this kind of optimization. But if you let LLVM remove unnecessary code, you can end up
+running programs that previously caused errors:
+
+```rust
+#[derive(Default)]
+struct TwoFiftySix {
+    _a: [u64; 32]
+}
+
+#[derive(Default)]
+struct EightK {
+    _a: [TwoFiftySix; 32]
+}
+
+#[derive(Default)]
+struct TwoFiftySixK {
+    _a: [EightK; 32]
+}
+
+#[derive(Default)]
+struct EightM {
+    _a: [TwoFiftySixK; 32]
+}
+
+pub fn main() {
+    // Normally this blows up because we can't reserve size on stack
+    // for the `EightM` struct. 
But because the compiler notices we
+    // never do anything with `_x`, it optimizes out the stack storage
+    // and the program completes successfully.
+    let _x = EightM::default();
+}
+```
+
+-- [Compiler Explorer](https://godbolt.org/z/daHn7P)
+
+-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)
diff --git a/blog/2019-02-08-compiler-optimizations/index.mdx b/blog/2019-02-08-compiler-optimizations/index.mdx
new file mode 100644
index 0000000..000e68d
--- /dev/null
+++ b/blog/2019-02-08-compiler-optimizations/index.mdx
@@ -0,0 +1,149 @@
+---
+title: "Allocations in Rust: Compiler optimizations"
+description: "A lot. The answer is a lot."
+date: 2019-02-08 12:00:00
+last_updated:
+  date: 2019-02-10 12:00:00
+tags: []
+---
+
+Up to this point, we've been discussing memory usage in the Rust language by focusing on simple
+rules that are mostly right for small chunks of code. We've spent time showing how those rules work
+themselves out in practice, and become familiar with reading the assembly code needed to see each
+memory type (global, stack, heap) in action.
+
+Throughout the series so far, we've put a handicap on the code. In the name of consistent and
+understandable results, we've asked the compiler to pretty please leave the training wheels on. Now
+is the time where we throw out all the rules and take off the kid gloves. As it turns out, both the
+Rust compiler and the LLVM optimizers are incredibly sophisticated, and we'll step back and let them
+do their job.
+
+
+
+Similar to
+["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we're
+focusing on interesting things the Rust language (and LLVM!) can do with memory management. 
We'll
+still be looking at assembly code to understand what's going on, but it's important to mention
+again: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to
+double-check memory behavior if it's something you care about**. It's far too easy to misread
+assembly in large code sections; you should always verify behavior if you care about memory usage.
+
+The guiding principle as we move forward is this: _optimizing compilers won't produce worse programs
+than we started with._ There won't be any situations where stack allocations get moved to heap
+allocations. There will, however, be an opera of optimization.
+
+**Update 2019-02-10**: When debugging a
+[related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1), it was discovered that the
+original code worked because LLVM optimized out the entire function, rather than just the allocation
+segments. The code has been updated with proper use of
+[`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html), and a previous section
+on vector capacity has been removed.
+
+## The Case of the Disappearing Box
+
+Our first optimization comes when LLVM can reason that the lifetime of an object is sufficiently
+short that heap allocations aren't necessary. In these cases, LLVM will move the allocation to the
+stack instead! 
The way this interacts with `#[inline]` attributes is a bit opaque, but the important +part is that LLVM can sometimes do better than the baseline Rust language: + +```rust +use std::alloc::{GlobalAlloc, Layout, System}; +use std::sync::atomic::{AtomicBool, Ordering}; + +pub fn cmp(x: u32) { + // Turn on panicking if we allocate on the heap + DO_PANIC.store(true, Ordering::SeqCst); + + // The compiler is able to see through the constant `Box` + // and directly compare `x` to 24 - assembly line 73 + let y = Box::new(24); + let equals = x == *y; + + // This call to drop is eliminated + drop(y); + + // Need to mark the comparison result as volatile so that + // LLVM doesn't strip out all the code. If `y` is marked + // volatile instead, allocation will be forced. + unsafe { std::ptr::read_volatile(&equals) }; + + // Turn off panicking, as there are some deallocations + // when we exit main. + DO_PANIC.store(false, Ordering::SeqCst); +} + +fn main() { + cmp(12) +} + +#[global_allocator] +static A: PanicAllocator = PanicAllocator; +static DO_PANIC: AtomicBool = AtomicBool::new(false); +struct PanicAllocator; + +unsafe impl GlobalAlloc for PanicAllocator { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + if DO_PANIC.load(Ordering::SeqCst) { + panic!("Unexpected allocation."); + } + System.alloc(layout) + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + if DO_PANIC.load(Ordering::SeqCst) { + panic!("Unexpected deallocation."); + } + System.dealloc(ptr, layout); + } +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3) + +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d) + +## Dr. Array or: how I learned to love the optimizer + +Finally, this isn't so much about LLVM figuring out different memory behavior, but LLVM stripping +out code that doesn't do anything. 
Optimizations of this type have a lot of nuance to them; if +you're not careful, they can make your benchmarks look +[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). In Rust, the +`black_box` function (implemented in both +[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and +[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler +to disable this kind of optimization. But if you let LLVM remove unnecessary code, you can end up +running programs that previously caused errors: + +```rust +#[derive(Default)] +struct TwoFiftySix { + _a: [u64; 32] +} + +#[derive(Default)] +struct EightK { + _a: [TwoFiftySix; 32] +} + +#[derive(Default)] +struct TwoFiftySixK { + _a: [EightK; 32] +} + +#[derive(Default)] +struct EightM { + _a: [TwoFiftySixK; 32] +} + +pub fn main() { + // Normally this blows up because we can't reserve size on stack + // for the `EightM` struct. But because the compiler notices we + // never do anything with `_x`, it optimizes out the stack storage + // and the program completes successfully. + let _x = EightM::default(); +} +``` + +-- [Compiler Explorer](https://godbolt.org/z/daHn7P) + +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0) diff --git a/blog/2019-02-09-summary/_article.md b/blog/2019-02-09-summary/_article.md new file mode 100644 index 0000000..dd7f06d --- /dev/null +++ b/blog/2019-02-09-summary/_article.md @@ -0,0 +1,35 @@ +--- +layout: post +title: "Summary: What are the Allocation Rules?" +description: "A synopsis and reference." +category: +tags: [rust, understanding-allocations] +--- + +While there's a lot of interesting detail captured in this series, it's often helpful to have a +document that answers some "yes/no" questions. 
You may not care about what an `Iterator` looks like
+in assembly; you just need to know whether it allocates an object on the heap or not. And while Rust
+will prioritize the fastest behavior it can, here are the rules for each memory type:
+
+**Heap Allocation**:
+
+- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.
+- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory.
+- Some smart pointers in the standard library have counterparts in other crates that don't need heap
+  memory. If possible, use those.
+
+**Stack Allocation**:
+
+- Everything not using a smart pointer will be allocated on the stack.
+- Structs, enums, iterators, arrays, and closures are all stack-allocated.
+- Cell types (`RefCell`) behave like smart pointers, but are stack-allocated.
+- Inlining (`#[inline]`) will not affect allocation behavior for better or worse.
+- Types that are marked `Copy` are guaranteed to have their contents stack-allocated.
+
+**Global Allocation**:
+
+- `const` is a fixed value; the compiler is allowed to copy it wherever useful.
+- `static` is a fixed reference; the compiler will guarantee it is unique. 
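
A tiny sketch of the `const`/`static` distinction above (identifiers are illustrative):

```rust
// `const` is a value: the compiler may copy or inline it at each use site.
const LIMIT: u32 = 42;

// `static` is a single location: every reference observes the same address.
static GREETING: &str = "hello";

fn greeting_addr() -> usize {
    &GREETING as *const &str as usize
}

fn main() {
    assert_eq!(LIMIT, 42);
    // Two separate borrows of the static resolve to one unique address.
    assert_eq!(greeting_addr(), greeting_addr());
}
```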
+
+![Container Sizes in Rust](/assets/images/2019-02-04-container-size.svg) --
+[Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing)
diff --git a/blog/2019-02-09-summary/container-size.svg b/blog/2019-02-09-summary/container-size.svg
new file mode 100644
index 0000000..16d4fc0
--- /dev/null
+++ b/blog/2019-02-09-summary/container-size.svg
@@ -0,0 +1 @@
+ 
\ No newline at end of file
diff --git a/blog/2019-02-09-summary/index.mdx b/blog/2019-02-09-summary/index.mdx
new file mode 100644
index 0000000..94b3f82
--- /dev/null
+++ b/blog/2019-02-09-summary/index.mdx
@@ -0,0 +1,39 @@
+---
+slug: 2019/02/summary
+title: "Allocations in Rust: Summary"
+date: 2019-02-09 12:00:00
+authors: [bspeice]
+tags: []
+---
+
+While there's a lot of interesting detail captured in this series, it's often helpful to have a
+document that answers some "yes/no" questions. You may not care about what an `Iterator` looks like
+in assembly; you just need to know whether it allocates an object on the heap or not. And while Rust
+will prioritize the fastest behavior it can, here are the rules for each memory type:
+
+
+
+**Global Allocation**:
+
+- `const` is a fixed value; the compiler is allowed to copy it wherever useful.
+- `static` is a fixed reference; the compiler will guarantee it is unique.
+
+**Stack Allocation**:
+
+- Everything not using a smart pointer will be allocated on the stack.
+- Structs, enums, iterators, arrays, and closures are all stack-allocated.
+- Cell types (`RefCell`) behave like smart pointers, but are stack-allocated.
+- Inlining (`#[inline]`) will not affect allocation behavior for better or worse.
+- Types that are marked `Copy` are guaranteed to have their contents stack-allocated.
+
+**Heap Allocation**:
+
+- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.
+- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory. 
+- Some smart pointers in the standard library have counterparts in other crates that don't need heap + memory. If possible, use those. + +![Container Sizes in Rust](./container-size.svg) + +-- [Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing)