From d1153d07f69aed0c27358e236a8e6956b21219a7 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sun, 10 Feb 2019 22:44:40 -0500 Subject: [PATCH] Final draft! I think. --- _posts/2019-02-05-the-whole-world.md | 126 ++++++++++++-------- _posts/2019-02-06-stacking-up.md | 80 +++++++++---- _posts/2019-02-07-a-heaping-helping.md | 50 ++++---- _posts/2019-02-08-compiler-optimizations.md | 43 +++---- _posts/2019-02-09-summary.md | 25 +--- 5 files changed, 182 insertions(+), 142 deletions(-) diff --git a/_posts/2019-02-05-the-whole-world.md b/_posts/2019-02-05-the-whole-world.md index 7d38760..20886a4 100644 --- a/_posts/2019-02-05-the-whole-world.md +++ b/_posts/2019-02-05-the-whole-world.md @@ -7,33 +7,42 @@ tags: [rust, understanding-allocations] --- The first memory type we'll look at is pretty special: when Rust can prove that -a *value* is fixed for the life of a program (`const`), and when a *reference* is valid for -the duration of the program (`static` as a declaration, not +a *value* is fixed for the life of a program (`const`), and when a *reference* is unique for +the life of a program (`static` as a declaration, not [`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) -as a lifetime). -Understanding the distinction between value and reference is important for reasons -we'll go into below. The +as a lifetime), we can make use of global memory. This special section of data is embedded +directly in the program binary so that variables are ready to go once the program loads; +no additional computation is necessary. + +Understanding the value/reference distinction is important for reasons we'll go into below, +and while the [full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md) -for these two memory types is available, but we'll take a hands-on approach to the topic. +for these two keywords is available, we'll take a hands-on approach to the topic. 
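To make the value/reference distinction concrete before digging into each keyword, here's a minimal sketch (the names are hypothetical, not drawn from the sections that follow) showing both kinds of global declaration side by side:

```rust
// Both declarations live in global memory, embedded directly in the
// program binary; neither runs any initialization code at runtime.
const SECONDS_PER_MINUTE: u32 = 60; // a fixed *value*
static GREETING: &str = "hello"; // a *reference* valid for the whole program

fn main() {
    // Using them is just a read; no allocation or computation happens here.
    assert_eq!(SECONDS_PER_MINUTE * 60, 3600);
    assert_eq!(GREETING.len(), 5);
}
```

The rest of this post unpacks what each of these two forms actually guarantees.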
# **const** -The quick summary is this: `const` declares a read-only block of memory that is loaded -as part of your program binary (during the call to [exec(3)](https://linux.die.net/man/3/exec)). -Any `const` value resulting from calling a `const fn` is guaranteed to be materialized -at compile-time (meaning that access at runtime will not invoke the `const fn`), -even though the `const fn` functions are available at run-time as well. The compiler -can choose to copy the constant value wherever it is deemed practical. Getting the address -of a `const` value is legal, but not guaranteed to be the same even when referring to the -same named identifier. +When a *value* is guaranteed to be unchanging in your program (where "value" may be scalars, +`struct`s, etc.), you can declare it `const`. +This tells the compiler that it's safe to treat the value as never changing, and enables +some interesting optimizations; not only is there no initialization cost to +creating the value (it is loaded at the same time as the executable parts of your program), +but the compiler can also copy the value around if it speeds up the code. -The first point is a bit strange - "read-only memory". +The points we need to address when talking about `const` are: +- `const` values are stored in read-only memory, making them impossible to modify. +- Values resulting from calling a `const fn` are materialized at compile-time. +- The compiler may (or may not) copy `const` values wherever it chooses. + +## Read-Only + +The first point is a bit strange - "read-only memory." [The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants) mentions in a couple places that using `mut` with constants is illegal, but it's also important to demonstrate just how immutable they are. *Typically* in Rust -you can use "inner mutability" to modify things that aren't declared `mut`. 
-[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an API -to guarantee at runtime that some consistency rules are enforced: +you can use [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) +to modify things that aren't declared `mut`. +[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an +example of this pattern in action: ```rust use std::cell::RefCell; @@ -55,7 +64,7 @@ fn main() { ``` -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2) -When `const` is involved though, modifications are silently ignored: +When `const` is involved though, interior mutability is impossible: ```rust use std::cell::RefCell; @@ -95,10 +104,12 @@ fn main() { -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed) When the [`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md) -refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this is -what they mean. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this as an error, +refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this behavior +is what they refer to. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this as an error, but it's still something to be aware of. +## Initialization == Compilation + The next thing to mention is that `const` values are loaded into memory *as part of your program binary*. 
Because of this, any `const` values declared in your program will be "realized" at compile-time; accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may @@ -110,13 +121,16 @@ use std::cell::RefCell; const CELL: RefCell<u32> = RefCell::new(24); pub fn multiply(value: u32) -> u32 { + // CELL is stored at `.L__unnamed_1` value * (*CELL.get_mut()) } ``` --- [Compiler Explorer](https://godbolt.org/z/2KXUcN) +-- [Compiler Explorer](https://godbolt.org/z/Th8boO) -The compiler only creates one `RefCell`, and uses it everywhere. However, that value -is fully realized at compile time, and is fully stored in the `.L__unnamed_1` section. +The compiler creates one `RefCell`, uses it everywhere, and never +needs to call the `RefCell::new` function. + +## Copying If it's helpful though, the compiler can choose to copy `const` values. @@ -124,22 +138,24 @@ If it's helpful though, the compiler can choose to copy `const` values. const FACTOR: u32 = 1000; pub fn multiply(value: u32) -> u32 { + // See assembly line 4 for the `mov edi, 1000` instruction value * FACTOR } pub fn multiply_twice(value: u32) -> u32 { + // See assembly lines 22 and 29 for `mov edi, 1000` instructions value * FACTOR * FACTOR } ``` --- [Compiler Explorer](https://godbolt.org/z/_JiT9O) +-- [Compiler Explorer](https://godbolt.org/z/ZtS54X) In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the `multiply` and `multiply_twice` functions; the "1000" value is never "stored" anywhere, as it's small enough to inline into the assembly instructions. -Finally, getting the address of a `const` value is possible but not guaranteed -to be unique (given that the compiler can choose to copy values). In my testing -I was never able to get the compiler to copy a `const` value and get differing pointers, +Finally, getting the address of a `const` value is possible, but not guaranteed -to be unique (because the compiler can choose to copy values). 
I was unable to +get non-unique pointers in my testing (even using different crates), but the specifications are clear enough: *don't rely on pointers to `const` values being consistent*. To be frank, caring about locations for `const` values is almost certainly a code smell. @@ -147,20 +163,19 @@ is almost certainly a code smell. # **static** Static variables are related to `const` variables, but take a slightly different approach. -When the compiler can guarantee that a *reference* is fixed for the life of a program, -you end up with a `static` variable (as opposed to *values* that are fixed for the -duration a program is running). Because of this reference/value distinction, -static variables behave much more like what people expect from "global" variables. -We'll look at regular static variables first, and then address the `lazy_static!()` -and `thread_local!()` macros later. +When we declare that a *reference* is unique for the life of a program, +we have a `static` variable (unrelated to the `'static` lifetime). Because of the +reference/value distinction with `const`/`static`, +static variables behave much more like typical "global" variables. -More generally, `static` variables are globally unique locations in memory, -the contents of which are loaded as part of your program being read into main memory. -They allow initialization with both raw values and `const fn` calls, and the initial -value is loaded along with the program/library binary. All static variables must -be of a type that implements the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) -marker trait. And while `static mut` variables are allowed, mutating a static is considered -an `unsafe` operation. +But to understand `static`, here's what we'll look at: +- `static` variables are globally unique locations in memory. +- Like `const`, `static` variables are loaded at the same time your program is read into memory. 
+- All `static` variables must implement the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) +marker trait. +- Interior mutability is safe and acceptable when using `static` variables. + +## Memory Uniqueness The single biggest difference between `const` and `static` is the guarantees provided about uniqueness. Where `const` variables may or may not be copied @@ -171,20 +186,24 @@ in code, `static` variables are guaranteed to be unique. If we take a previous static FACTOR: u32 = 1000; pub fn multiply(value: u32) -> u32 { + // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used value * FACTOR } pub fn multiply_twice(value: u32) -> u32 { + // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used value * FACTOR * FACTOR } ``` --- [Compiler Explorer](https://godbolt.org/z/bSfBxn) +-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ) -Where [previously](https://godbolt.org/z/_JiT90) there were plenty of +Where [previously](#copying) there were plenty of references to multiplying by 1000, the new assembly refers to `FACTOR` as a named memory location instead. No initialization work needs to be done, but the compiler can no longer prove the value never changes during execution. +## Initialization == Compilation + Next, let's talk about initialization. The simplest case is initializing static variables with either scalar or struct notation: @@ -208,7 +227,7 @@ fn main() { ``` -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e) -Things get a bit weirder when using `const fn`. In most cases, things just work: +Things can get a bit weirder when using `const fn` though. 
In most cases, it just works: ```rust #[derive(Debug)] @@ -231,9 +250,9 @@ fn main() { -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255) However, there's a caveat: you're currently not allowed to use `const fn` to initialize -static variables of types that aren't marked `Sync`. As an example, even though +static variables of types that aren't marked `Sync`. For example, [`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) -is `const fn`, because [`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), +is a `const fn`, but because [`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), you'll get an error at compile time: ```rust @@ -246,16 +265,18 @@ static MY_LOCK: RefCell = RefCell::new(0); It's likely that this will [change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though. +## **Sync** + Which leads well to the next point: static variable types must implement the [`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). Because they're globally unique, it must be safe for you to access static variables from any thread at any time. Most `struct` definitions automatically implement the `Sync` trait because they contain only elements which themselves -implement `Sync`. This is why earlier examples could get away with initializing +implement `Sync` (read more in the [Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)). +This is why earlier examples could get away with initializing statics, even though we never included an `impl Sync for MyStruct` in the code. -For more on the `Sync` trait, the [Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html) -has a much more thorough treatment. 
But as an example, Rust refuses to compile -our earlier example if we add a non-`Sync` element to the `struct` definition: +To demonstrate this property, Rust refuses to compile our earlier +example if we add a non-`Sync` element to the `struct` definition: ```rust use std::cell::RefCell; @@ -273,8 +294,11 @@ static MY_STRUCT: MyStruct = MyStruct { ``` -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc) +## Interior Mutability + Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. -Unlike `const` however, interior mutability is acceptable. To demonstrate: +If we want to stay in `safe` Rust, we can use interior mutability to accomplish +similar goals: ```rust use std::sync::Once; diff --git a/_posts/2019-02-06-stacking-up.md b/_posts/2019-02-06-stacking-up.md index 0dd85bc..0b06e1d 100644 --- a/_posts/2019-02-06-stacking-up.md +++ b/_posts/2019-02-06-stacking-up.md @@ -6,10 +6,10 @@ category: tags: [rust, understanding-allocations] --- -`const` and `static` are perfectly fine, but it's very rare that we know +`const` and `static` are perfectly fine, but it's relatively rare that we know at compile-time about either values or references that will be the same for the duration of our program. Put another way, it's not often the case that either you -or your compiler knows how much memory your entire program will need. +or your compiler knows how much memory your entire program will ever need. However, there are still some optimizations the compiler can do if it knows how much memory individual functions will need. Specifically, the compiler can make use of @@ -19,9 +19,9 @@ both the short- and long-term. When requesting memory, the can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) (<1 nanosecond on modern CPUs). 
Contrast that to heap memory which requires an allocator (specialized software to track what memory is in use) to reserve space. -And when you're finished with your memory, the `pop` instruction likewise runs in +When you're finished with stack memory, the `pop` instruction runs in 1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation -and other issues. All sorts of incredibly sophisticated techniques have been used +and other issues with the heap. All sorts of incredibly sophisticated techniques have been used to design allocators: - [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)) strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) @@ -37,7 +37,7 @@ But no matter how fast your allocator is, the principle remains: the fastest allocator is the one you never use. As such, we're not going to discuss how exactly the [`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), but we'll focus instead on the conditions that enable the Rust compiler to use -the faster stack-based allocation for variables. +faster stack-based allocation for variables. So, **how do we know when Rust will or will not use stack allocation for objects we create?** Looking at other languages, it's often easy to delineate @@ -46,14 +46,14 @@ between stack and heap. Managed memory languages (Python, Java, place everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/), [HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize some heap allocations away, but you should never assume it will happen. -C makes things clear with calls to special functions ([malloc(3)](https://linux.die.net/man/3/malloc) -is one) being the way to use heap memory. 
Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) +C makes things clear with calls to special functions (like [malloc(3)](https://linux.die.net/man/3/malloc)) +needed to access heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii). -For Rust specifically, the principle is this: **stack allocation will be used for everything -that doesn't involve "smart pointers" and collections.** We'll skip over a precise definition -of the term "smart pointer" for now, and instead discuss what we should watch for when talking -about the memory region used for allocation: +For Rust, we can summarize as follows: **stack allocation will be used for everything +that doesn't involve "smart pointers" and collections**. We'll skip over a precise definition +of the term "smart pointer" for now, and instead discuss what we should watch for to understand +when stack and heap memory regions are used: 1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) indicate allocation of stack memory: @@ -68,7 +68,7 @@ about the memory region used for allocation: ``` -- [Compiler Explorer](https://godbolt.org/z/5WSgc9) -2. Tracking when exactly heap allocation calls happen is difficult. It's typically easier to +2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to watch for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened in the recent past: ```rust @@ -200,7 +200,7 @@ pub fn total_distance() { -- [Compiler Explorer](https://godbolt.org/z/Qmx4ST) As a consequence of function arguments never using heap memory, we can also -infer that functions using the `#[inline]` attributes also do not heap-allocate. +infer that functions using the `#[inline]` attributes also do not heap allocate. 
But better than inferring, we can look at the assembly to prove it: ```rust @@ -239,8 +239,42 @@ pub fn total_distance() { Finally, passing by value (arguments with type [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html)) and passing by reference (either moving ownership or passing a pointer) may have -[slightly different layouts in assembly](https://godbolt.org/z/sKi_kl), but will -still use either stack memory or CPU registers. +slightly different layouts in assembly, but will still use either stack memory +or CPU registers: + +```rust +pub struct Point { + x: i64, + y: i64, +} + +// Moving values +pub fn distance_moved(a: Point, b: Point) -> i64 { + let x1 = a.x; + let x2 = b.x; + let y1 = a.y; + let y2 = b.y; + + let x_pow = (x1 - x2) * (x1 - x2); + let y_pow = (y1 - y2) * (y1 - y2); + let squared = x_pow + y_pow; + squared / squared +} + +// Borrowing values has two extra `mov` instructions on lines 21 and 22 +pub fn distance_borrowed(a: &Point, b: &Point) -> i64 { + let x1 = a.x; + let x2 = b.x; + let y1 = a.y; + let y2 = b.y; + + let x_pow = (x1 - x2) * (x1 - x2); + let y_pow = (y1 - y2) * (y1 - y2); + let squared = x_pow + y_pow; + squared / squared +} +``` +-- [Compiler Explorer](https://godbolt.org/z/06hGiv) # Enums @@ -340,9 +374,9 @@ both bind *everything* by reference normally, but Python can also In Rust, arguments to closures are the same as arguments to other functions; closures are simply functions that don't have a declared name. Some weird ordering of the stack may be required to handle them, but it's the compiler's responsibility -to figure it out. +to figure that out. -Each example below has the same effect, but compile to very different programs. +Each example below has the same effect, but a different assembly implementation. In the simplest case, we immediately run a closure returned by another function. 
Because we don't store a reference to the closure, the stack memory needed to store the captured values is contiguous: @@ -457,7 +491,7 @@ used for objects that aren't heap allocated, but it technically can be done. Understanding move semantics and copy semantics in Rust is weird at first. The Rust docs [go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html) far better than can be addressed here, so I'll leave them to do the job. -Even from a memory perspective though, their guideline is reasonable: +From a memory perspective though, their guideline is reasonable: [if your type can implemement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy). While there are potential speed tradeoffs to *benchmark* when discussing `Copy` (move semantics for stack objects vs. copying stack pointers vs. copying stack `struct`s), @@ -471,8 +505,7 @@ because it's a marker trait. From there we'll note that a type if (and only if) its components implement `Copy`, and that [no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors). Thus, assignments involving heap types are always move semantics, and new heap -allocations won't occur without explicit calls to -[`clone()`](https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone). +allocations won't occur because of implicit operator behavior. ```rust #[derive(Clone)] @@ -490,8 +523,8 @@ struct NotCopyable { # Iterators -In [managed memory languages](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357) -(like Java), there's a subtle difference between these two code samples: +In managed memory languages (like [Java](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)), +there's a subtle difference between these two code samples: ```java public static int sum_for(List vals) { @@ -522,8 +555,7 @@ once the function ends. 
Sounds exactly like the issue stack-allocated objects ad In Rust, iterators are allocated on the stack. The objects to iterate over are almost certainly in heap memory, but the iterator itself ([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap. -In each of the examples below we iterate over a collection, but will never need to allocate -a object on the heap to clean up: +In each of the examples below we iterate over a collection, but never use heap allocation: ```rust use std::collections::HashMap; diff --git a/_posts/2019-02-07-a-heaping-helping.md b/_posts/2019-02-07-a-heaping-helping.md index b1f0b9e..2c8c962 100644 --- a/_posts/2019-02-07-a-heaping-helping.md +++ b/_posts/2019-02-07-a-heaping-helping.md @@ -12,11 +12,11 @@ how the language uses dynamic memory (also referred to as the **heap**) is a sys And as the docs mention, ownership [is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html). -The heap is used in two situations: when the compiler is unable to predict the *total size -of memory needed*, or *how long the memory is needed for*, it will allocate space in the heap. +The heap is used in two situations; when the compiler is unable to predict either the *total size +of memory needed*, or *how long the memory is needed for*, it allocates space in the heap. This happens pretty frequently; if you want to download the Google home page, you won't know -how large it is until your program runs. And when you're finished with Google, whenever that -happens to be, we deallocate the memory so it can be used to store other webpages. If you're +how large it is until your program runs. And when you're finished with Google, we deallocate +the memory so it can be used to store other webpages. 
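The Google-page scenario above can be sketched in a few lines (`fetch_page` is a hypothetical stand-in, not code from this post):

```rust
// A stand-in for downloading a page: the size of the result isn't known
// until runtime, so the bytes must live on the heap.
fn fetch_page(n_bytes: usize) -> String {
    // `String` stores its contents on the heap and grows as needed.
    "a".repeat(n_bytes)
}

fn main() {
    let page = fetch_page(1024);
    assert_eq!(page.len(), 1024);
    // When `page` goes out of scope here, the heap memory is handed
    // back to the allocator for reuse.
}
```

The returned `String` owns heap memory sized at runtime, and that memory is released as soon as the value is dropped.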
If you're interested in a slightly longer explanation of the heap, check out [The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap) in Rust's documentation. @@ -32,8 +32,8 @@ To start off, take a guess for how many allocations happen in the program below: fn main() {} ``` -It's obviously a trick question; while no heap allocations happen as a result of -the code listed above, the setup needed to call `main` does allocate on the heap. +It's obviously a trick question; while no heap allocations occur as a result of +that code, the setup needed to call `main` does allocate on the heap. Here's a way to show it: ```rust @@ -78,8 +78,8 @@ we'll follow this guide: Finally, there are two "addendum" issues that are important to address when discussing Rust and the heap: -- Stack-based alternatives to some standard library types are available -- Special allocators to track memory behavior are available +- Non-heap alternatives to many standard library types are available. +- Special allocators to track memory behavior should be used to benchmark code. # Smart pointers @@ -99,7 +99,7 @@ crate should look mostly familiar: - [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html) The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers -to manage heap objects, though more than can be covered here. Some examples: +to manage heap objects, though more than can be covered here. Some examples are: - [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) - [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html) @@ -112,8 +112,8 @@ have more information. When a smart pointer is created, the data it is given is placed in heap memory and the location of that data is recorded in the smart pointer. 
Once the smart pointer has determined it's safe to deallocate that memory (when a `Box` has -[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or when -reference count for an object [goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), +[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or a +reference count [goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), the heap space is reclaimed. We can prove these types use heap memory by looking at code: @@ -146,18 +146,18 @@ pub fn my_cow() { # Collections -Collections types use heap memory because their contents have dynamic size; they will request +Collection types use heap memory because their contents have dynamic size; they will request more memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can [release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) when it's no longer necessary. This dynamic property forces Rust to heap allocate -everything they contain. In a way, **collections are smart pointers for many objects at once.** +everything they contain. In a way, **collections are smart pointers for many objects at a time**. Common types that fall under this umbrella are [`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html), [`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and [`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) -(not [`&str`](https://doc.rust-lang.org/std/primitive.str.html)). +(not [`str`](https://doc.rust-lang.org/std/primitive.str.html)). -But while collections store the objects they own in heap memory, *creating new collections +While collections store the objects they own in heap memory, *creating new collections will not allocate on the heap*. 
This is a bit weird; if we call `Vec::new()`, the assembly shows a corresponding call to `real_drop_in_place`: @@ -169,7 +169,7 @@ pub fn my_vec() { ``` -- [Compiler Explorer](https://godbolt.org/z/1WkNtC) -But because the vector has no elements it is managing, no calls to the allocator +But because the vector has no elements to manage, no calls to the allocator will ever be dispatched: ```rust @@ -218,12 +218,12 @@ and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#me # Heap Alternatives -While it is a bit strange for us to talk of the stack after spending time with the heap, +While it is a bit strange to speak of the stack after spending time with the heap, it's worth pointing out that some heap-allocated objects in Rust have stack-based counterparts provided by other crates. If you have need of the functionality, but want to avoid allocating, -there are some great alternatives. +there are typically alternatives available. -When it comes to some of the standard library smart pointers +When it comes to some standard library smart pointers ([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives are provided in crates like [parking_lot](https://crates.io/crates/parking_lot) and @@ -233,10 +233,9 @@ are provided in crates like [parking_lot](https://crates.io/crates/parking_lot) [`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html) if you're in need of synchronization primitives. 
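To tie this back to the collections discussion earlier in this post, here's a small std-only check (a sketch, independent of the crates just mentioned) confirming that `Vec::new()` defers heap allocation until the first element arrives:

```rust
fn main() {
    // A brand-new vector owns no heap memory; its capacity is
    // guaranteed to be zero until an element is pushed.
    let mut v: Vec<u32> = Vec::new();
    assert_eq!(v.capacity(), 0);

    // The first push is what finally dispatches a call to the allocator.
    v.push(1);
    assert!(v.capacity() >= 1);
}
```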
-[thread_id](https://crates.io/crates/thread-id) -may still be necessary if you're implementing an allocator (*cough cough* the author *cough cough*) +[thread_id](https://crates.io/crates/thread-id) may be necessary if you're implementing an allocator because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) -[uses a `thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#22-40) +uses a [`thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#17-36) that needs heap allocation. # Tracing Allocators @@ -248,7 +247,6 @@ You should never rely on your instincts when [a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM). Similarly, there's great work going on in Rust with allocators that keep track of what -they're doing. [`alloc_counter`](https://crates.io/crates/alloc_counter) was designed -for exactly this purpose. When it comes to tracking heap behavior, you shouldn't just -rely on the language; please measure and make sure that you have tools in place to catch -any issues that come up. +they're doing (like [`alloc_counter`](https://crates.io/crates/alloc_counter)). +When it comes to tracking heap behavior, it's easy to make mistakes; +please write tests and make sure you have tools to guard against future issues. diff --git a/_posts/2019-02-08-compiler-optimizations.md b/_posts/2019-02-08-compiler-optimizations.md index 2e96008..ac5f864 100644 --- a/_posts/2019-02-08-compiler-optimizations.md +++ b/_posts/2019-02-08-compiler-optimizations.md @@ -12,25 +12,25 @@ We've spent time showing how those rules work themselves out in practice, and become familiar with reading the assembly code needed to see each memory type (global, stack, heap) in action. -But throughout the content so far, we've put a handicap on the code. +But throughout the series so far, we've put a handicap on the code. 
In the name of consistent and understandable results, we've asked the compiler to pretty please leave the training wheels on. Now is the time -where we throw out all the rules and take the kid gloves off. As it turns out, +where we throw out all the rules and take off the kid gloves. As it turns out, both the Rust compiler and the LLVM optimizers are incredibly sophisticated, and we'll step back and let them do their job. Similar to ["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we're focusing on interesting things the Rust language (and LLVM!) can do -as regards memory management. We'll still be looking at assembly code to +with memory management. We'll still be looking at assembly code to understand what's going on, but it's important to mention again: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to double-check memory behavior if it's something you care about**. It's far too easy to misread assembly in large code sections; you should -always have an automated tool verify behavior if you care about memory usage. +always verify behavior if you care about memory usage. The guiding principle as we move forward is this: *optimizing compilers -won't produce worse assembly than we started with.* There won't be any +won't produce worse programs than we started with.* There won't be any situations where stack allocations get moved to heap allocations. There will, however, be an opera of optimization. @@ -40,7 +40,7 @@ Our first optimization comes when LLVM can reason that the lifetime of an object is sufficiently short that heap allocations aren't necessary. In these cases, LLVM will move the allocation to the stack instead! The way this interacts with `#[inline]` attributes is a bit opaque, but the important part is that LLVM -can sometimes do better than the baseline Rust language. 
+can sometimes do better than the baseline Rust language:

```rust
use std::alloc::{GlobalAlloc, Layout, System};

@@ -87,13 +87,13 @@ unsafe impl GlobalAlloc for PanicAllocator {

With some collections, LLVM can predict how large they will become
and allocate the entire size on the stack instead of the heap.
-This works whether with both the pre-allocation (`Vec::with_capacity`)
-*and re-allocation* (`Vec::push`) methods for collections types.
-Not only can LLVM predict sizing if you reserve the fully size up front,
+This works with both the pre-allocation (`Vec::with_capacity`)
+*and re-allocation* (`Vec::push`) methods for collection types.
+Not only can LLVM predict sizing if you reserve everything up front,
it can see through the resizing operations and find the total size.
While this specific optimization is unlikely to come up in production usage,
it's cool to note that LLVM does a considerable amount of work
-to understand what code actually does.
+to understand what the code will do:

```rust
use std::alloc::{GlobalAlloc, Layout, System};

@@ -104,13 +104,16 @@ fn main() {
    DO_PANIC.store(true, Ordering::SeqCst);

    // If the compiler can predict how large a vector will be,
-    // it can optimize out the heap storage needed. This also
-    // works with `Vec::with_capacity()`, but the push case
-    // is a bit more interesting.
+    // it can optimize out the heap storage needed.
    let mut x: Vec<u64> = Vec::new();
    x.push(12);
-    assert_eq!(x[0], 12);
+
+    let mut y: Vec<u64> = Vec::with_capacity(1);
+    y.push(12);
+
+    assert_eq!(x[0], y[0]);
    drop(x);
+    drop(y);

    // Turn off panicking, as there are some deallocations
    // when we exit main.
@@ -138,21 +141,21 @@ unsafe impl GlobalAlloc for PanicAllocator {
        }
    }
}
```
--- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1dfccfcf63d8800e644a3b948f1eeb7b)
+-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=af660a87b2cd94213afb906beeb32c15)

# Dr. 
Array or: How I Learned to Love the Optimizer Finally, this isn't so much about LLVM figuring out different memory behavior, -but LLVM totally stripping out code that has no side effects. Optimizations of +but LLVM stripping out code that doesn't do anything. Optimizations of this type have a lot of nuance to them; if you're not careful, they can make your benchmarks look [impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). -In Rust, the `black_box` function (in both +In Rust, the `black_box` function (implemented in both [`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and [`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler to disable this kind of optimization. But if you let -LLVM remove unnecessary code, you can end up with programs that -would have previously caused errors running just fine: +LLVM remove unnecessary code, you can end up running programs that +previously caused errors: ```rust #[derive(Default)] @@ -183,5 +186,5 @@ pub fn main() { let _x = EightM::default(); } ``` --- [Compiler Explorer](https://godbolt.org/z/daHn7P) +-- [Compiler Explorer](https://godbolt.org/z/daHn7P) -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0) diff --git a/_posts/2019-02-09-summary.md b/_posts/2019-02-09-summary.md index ab29d14..7a90a8b 100644 --- a/_posts/2019-02-09-summary.md +++ b/_posts/2019-02-09-summary.md @@ -9,16 +9,8 @@ tags: [rust, understanding-allocations] While there's a lot of interesting detail captured in this series, it's often helpful to have a document that answers some "yes/no" questions. You may not care about what an `Iterator` looks like in assembly, you just need to know whether it allocates -an object on the heap or not. - -To that end, it should be said once again: if you care about memory behavior, -use an allocator to verify the correct behavior. 
Tools like -[`alloc_counter`](https://crates.io/crates/alloc_counter) are designed to make -testing this behavior simple easy. - -Finally, a summary of the content that's been covered. Rust will prioritize -the fastest behavior it can, but here are the ground rules for understanding -the memory model in Rust: +an object on the heap or not. And while Rust will prioritize the fastest behavior it can, +here are the rules for each memory type: **Heap Allocation**: - Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory. @@ -27,7 +19,7 @@ the memory model in Rust: don't need heap memory. If possible, use those. **Stack Allocation**: -- Everything not using a smart pointer type will be allocated on the stack. +- Everything not using a smart pointer will be allocated on the stack. - Structs, enums, iterators, arrays, and closures are all stack allocated. - Cell types (`RefCell`) behave like smart pointers, but are stack-allocated. - Inlining (`#[inline]`) will not affect allocation behavior for better or worse. @@ -37,14 +29,5 @@ the memory model in Rust: - `const` is a fixed value; the compiler is allowed to copy it wherever useful. - `static` is a fixed reference; the compiler will guarantee it is unique. -And a nice visualizaton of the rules, courtesy of -[Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing): - ![Container Sizes in Rust](/assets/images/2019-02-04-container-size.svg) - ---- - -If you've taken the time to read through this series: thanks. I've enjoyed the -process that went into writing this, both in building new tools and learning -the material well enough to explain it. I hope this is valuable as a reference -to you as well. +-- [Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing)
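
To make the heap/stack rules in the summary concrete, here is a minimal sketch of the kind of automated verification the series recommends: a counting `GlobalAlloc` in the spirit of `alloc_counter` that defers to the system allocator while tallying allocations. The names (`CountingAllocator`, `Point`, `ALLOCATIONS`) are illustrative, not taken from the series or from `alloc_counter`'s API:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Defers all real work to the system allocator, but counts
// every heap allocation that passes through.
struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOCATOR: CountingAllocator = CountingAllocator;

struct Point {
    x: u64,
    y: u64,
}

fn main() {
    let before = ALLOCATIONS.load(Ordering::SeqCst);

    // Stack allocation: a plain struct never touches the heap.
    // `black_box` discourages the optimizer from removing the value entirely.
    let p = std::hint::black_box(Point { x: 1, y: 2 });
    assert_eq!(p.x + p.y, 3);
    assert_eq!(ALLOCATIONS.load(Ordering::SeqCst), before);

    // Heap allocation: smart pointers like `Box` allocate their contents,
    // so the counter goes up by exactly one.
    let b = std::hint::black_box(Box::new(Point { x: 3, y: 4 }));
    assert_eq!(b.x + b.y, 7);
    assert_eq!(ALLOCATIONS.load(Ordering::SeqCst), before + 1);
}
```

Note the caveat from the optimization post still applies: in release builds LLVM may elide the `Box` allocation entirely, which is exactly why counting at runtime beats eyeballing assembly.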