diff --git a/_drafts/a-heaping-helping.md b/_drafts/a-heaping-helping.md index 408a3ba..ddc7e6b 100644 --- a/_drafts/a-heaping-helping.md +++ b/_drafts/a-heaping-helping.md @@ -12,18 +12,21 @@ how the language uses dynamic memory (also referred to as the **heap**) is a sys And as the docs mention, ownership [is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html). -The heap is used in two situations; when the compiler is unable to predict either the *total size +The heap is used in two situations: when the compiler is unable to predict the *total size of memory needed*, or *how long the memory is needed for*, it will allocate space in the heap. This happens pretty frequently; if you want to download the Google home page, you won't know -how large it is until your program runs. And when you're finished with Google, whenever that might be, -we deallocate the memory so it can be used to store other webpages. +how large it is until your program runs. And when you're finished with Google, whenever that +happens to be, we deallocate the memory so it can be used to store other webpages. If you're +interested in a slightly longer explanation of the heap, check out +[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap) +in Rust's documentation. We won't go into detail on how the heap is managed; the [ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) does a phenomenal job explaining both the "why" and "how" of memory management. Instead, we're going to focus on understanding "when" heap allocations occur in Rust. -To start off: take a guess for how many allocations happen in the program below: +To start off, take a guess for how many allocations happen in the program below: ```rust fn main() {} @@ -72,8 +75,11 @@ we'll follow this guide: - Smart pointers hold their contents in the heap - Collections are smart pointers for many objects at a time, and reallocate when they need to grow -- `lazy_static!` and `thread_local!` force heap allocation for everything. -- Stack-based alternatives to standard library types should be preferred (spin, parking_lot) + +Finally, there are two "addendum" issues that are important to address when discussing +Rust and the heap: +- Stack-based alternatives to some standard library types are available +- Special allocators to track memory behavior are available # Smart pointers @@ -98,10 +104,10 @@ to manage heap objects, though more than can be covered here. Some examples: - [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html) Finally, there is one ["gotcha"](https://www.merriam-webster.com/dictionary/gotcha): -cell types (like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) -follow the RAII pattern, but don't involve heap allocation. Check out the +**cell types** (like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) +look and behave similarly, but **don't involve heap allocation**. The [`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html) -for more information. +have more information. When a smart pointer is created, the data it is given is placed in heap memory and the location of that data is recorded in the smart pointer. 
Once the smart pointer @@ -117,40 +123,43 @@ use std::sync::Arc; use std::borrow::Cow; pub fn my_box() { - // Drop at line 1640 + // Drop at assembly line 1640 Box::new(0); } pub fn my_rc() { - // Drop at line 1650 + // Drop at assembly line 1650 Rc::new(0); } pub fn my_arc() { - // Drop at line 1660 + // Drop at assembly line 1660 Arc::new(0); } pub fn my_cow() { - // Drop at line 1672 + // Drop at assembly line 1672 Cow::from("drop"); } ``` --- [Compiler Explorer](https://godbolt.org/z/SaDpWg) +-- [Compiler Explorer](https://godbolt.org/z/4AMQug) # Collections -Collections types use heap memory because they have dynamic size; they will request more memory -[when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), +Collections types use heap memory because their contents have dynamic size; they will request +more memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can [release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) -when it's no longer necessary. This dynamic memory usage forces Rust to heap allocate +when it's no longer necessary. This dynamic property forces Rust to heap allocate everything they contain. In a way, **collections are smart pointers for many objects at once.** -Common types that fall under this umbrella are `Vec`, `HashMap`, and `String` +Common types that fall under this umbrella are +[`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html), +[`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and +[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) (not [`&str`](https://doc.rust-lang.org/std/primitive.str.html)). But while collections store the objects they own in heap memory, *creating new collections -will not allocate on the heap*. This is a bit weird, because if we call `Vec::new()` the -assembly shows a corresponding call to `drop_in_place`: +will not allocate on the heap*. This is a bit weird; if we call `Vec::new()`, the +assembly shows a corresponding call to `real_drop_in_place`: ```rust pub fn my_vec() { @@ -161,27 +170,58 @@ pub fn my_vec() { -- [Compiler Explorer](https://godbolt.org/z/1WkNtC) But because the vector has no elements it is managing, no calls to the allocator -will ever be dispatched. A couple of places to look at for confirming this behavior: -[`Vec::new()`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new), +will ever be dispatched: + +```rust +use std::alloc::{GlobalAlloc, Layout, System}; +use std::sync::atomic::{AtomicBool, Ordering}; + +fn main() { + // Turn on panicking if we allocate on the heap + DO_PANIC.store(true, Ordering::SeqCst); + + // Interesting bit happens here + let x: Vec = Vec::new(); + drop(x); + + // Turn panicking back off, some deallocations occur + // after main as well. 
+ DO_PANIC.store(false, Ordering::SeqCst); +} + +#[global_allocator] +static A: PanicAllocator = PanicAllocator; +static DO_PANIC: AtomicBool = AtomicBool::new(false); +struct PanicAllocator; + +unsafe impl GlobalAlloc for PanicAllocator { + unsafe fn alloc(&self, layout: Layout) -> *mut u8 { + if DO_PANIC.load(Ordering::SeqCst) { + panic!("Unexpected allocation."); + } + System.alloc(layout) + } + + unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) { + if DO_PANIC.load(Ordering::SeqCst) { + panic!("Unexpected deallocation."); + } + System.dealloc(ptr, layout); + } +} +``` +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6) + +Other standard library types follow the same behavior; make sure to check out [`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new), and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new). -# **lazy_static!** and **thread_local!** - -There are two macros worth addressing in a conversation about heap memory. The first isn't part -of the standard library, but it's the [5th most downloaded crate](https://crates.io/crates/lazy_static) -in Rust. The second - -TODO: Not so sure about lazy_static anymore. Is thread_local possibly heap-allocated too? -- Think it may actually be that lazy_static has a no_std mode that uses `spin`, std-mode uses std::Once. -- Reasonably confident thread_local always allocates - # Heap Alternatives -While it is a bit strange for us to talk of the stack after spending so much time with the heap, +While it is a bit strange for us to talk of the stack after spending time with the heap, it's worth pointing out that some heap-allocated objects in Rust have stack-based counterparts -provided by other crates. There are a number of cases where this may be helpful, so it's useful -to know that alternatives exist if you need them. +provided by other crates. If you have need of the functionality, but want to avoid allocating, +there are some great alternatives. When it comes to some of the standard library smart pointers ([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and @@ -198,3 +238,15 @@ may still be necessary if you're implementing an allocator (*cough cough* the au because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) [uses a `thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#22-40) that needs heap allocation. + +# Tracing Allocators + +When writing performance-sensitive code, there's no alternative to measuring your code. +[Measure first](https://youtu.be/nXaxk27zwlk?t=583), because you should never rely on +your instincts when [a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM). + +Similarly, there's great work going on in Rust with allocators that keep track of what +they're doing. [`alloc_counter`](https://crates.io/crates/alloc_counter) was designed +for exactly this purpose. When it comes to tracking heap behavior, you shouldn't just +rely on the language; please measure and make sure that you have tools in place to catch +any issues that come up. diff --git a/_drafts/compiler-optimizations.md b/_drafts/compiler-optimizations.md index 753bc8e..d74c140 100644 --- a/_drafts/compiler-optimizations.md +++ b/_drafts/compiler-optimizations.md @@ -19,13 +19,13 @@ where we throw out all the rules and take the kid gloves off. 
As it turns out, both the Rust compiler and the LLVM optimizers are incredibly
sophisticated, and we'll step back and let them do their job.

+Similar to ["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4),
+we're focusing on interesting things the Rust language (and LLVM!) can do.
We'll still be looking at assembly code to understand what's going on, but it's important to mention
again: **please use automated tools like
-[qadapt](https://crates.io/crates/qadapt) to double-check memory behavior**.
+[alloc-counter](https://crates.io/crates/alloc_counter) to double-check memory behavior**.
It's far too easy to misread assembly in large code sections; you should always have an automated
tool verify behavior if you care about memory usage.

-Similar to ["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4),
-we're just focused on interesting things the Rust language can do.

The guiding principle as we move forward is this: *optimizing compilers
won't produce worse assembly than we started with.* There won't be any
@@ -35,19 +35,24 @@ There will, however, be an opera of optimization.

# The Case of the Disappearing Box

```rust
-// Currently doesn't work, not sure why.
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

fn allocate_box() {
-    let x = Box::new(0);
+    let _x = Box::new(0);
}

pub fn main() {
    // Turn on panicking if we allocate on the heap
    DO_PANIC.store(true, Ordering::SeqCst);

+    // This code will only run with the mode set to "Release".
+    // If you try running in "Debug", you'll get a panic.
    allocate_box();
+
+    // Turn off panicking, as there are some deallocations
+    // when we exit main.
+    DO_PANIC.store(false, Ordering::SeqCst);
}

#[global_allocator]
@@ -71,7 +76,81 @@ unsafe impl GlobalAlloc for PanicAllocator {
    }
}
```
+-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=3fe2846dac6755dbb7bb90342d0bf135)

# Vectors of Usual Size

+```rust
+use std::alloc::{GlobalAlloc, Layout, System};
+use std::sync::atomic::{AtomicBool, Ordering};
+
+fn main() {
+    // Turn on panicking if we allocate on the heap
+    DO_PANIC.store(true, Ordering::SeqCst);
+
+    // If the compiler can predict how large a vector will be,
+    // it can optimize out the heap storage needed.
+    let x: Vec<u8> = Vec::with_capacity(5);
+    drop(x);
+
+    // Turn off panicking, as there are some deallocations
+    // when we exit main.
+    DO_PANIC.store(false, Ordering::SeqCst);
+}
+
+#[global_allocator]
+static A: PanicAllocator = PanicAllocator;
+static DO_PANIC: AtomicBool = AtomicBool::new(false);
+struct PanicAllocator;
+
+unsafe impl GlobalAlloc for PanicAllocator {
+    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
+        if DO_PANIC.load(Ordering::SeqCst) {
+            panic!("Unexpected allocation.");
+        }
+        System.alloc(layout)
+    }
+
+    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
+        if DO_PANIC.load(Ordering::SeqCst) {
+            panic!("Unexpected deallocation.");
+        }
+        System.dealloc(ptr, layout);
+    }
+}
+```
+-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=5e9761b63243018d094829d901dd85c4)
+
# Dr. 
Array or: How I Learned to Love the Optimizer + +```rust +#[derive(Default)] +struct TwoFiftySix { + _a: [u64; 32] +} + +#[derive(Default)] +struct EightK { + _a: [TwoFiftySix; 32] +} + +#[derive(Default)] +struct TwoFiftySixK { + _a: [EightK; 32] +} + +#[derive(Default)] +struct EightM { + _a: [TwoFiftySixK; 32] +} + +pub fn main() { + // Normally this blows up because we can't reserve size on stack + // for the `EightM` struct. But because the compiler notices we + // never do anything with `_x`, it optimizes out the stack storage + // and the program completes successfully. + let _x = EightM::default(); +} +``` +-- [Compiler Explorer](https://godbolt.org/z/daHn7P) +-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0) diff --git a/_drafts/stacking-up.md b/_drafts/stacking-up.md index c5492e8..ebfe353 100644 --- a/_drafts/stacking-up.md +++ b/_drafts/stacking-up.md @@ -42,16 +42,16 @@ the faster stack-based allocation for variables. With that in mind, let's get into the details. How do we know when Rust will or will not use stack allocation for objects we create? Looking at other languages, it's often easy to delineate between stack and heap. Managed memory languages (Python, Java, -[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) assume -everything is on the heap. JIT compilers ([PyPy](https://www.pypy.org/), +[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) +place everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/), [HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize some heap allocations away, but you should never assume it will happen. C makes things clear with calls to special functions ([malloc(3)](https://linux.die.net/man/3/malloc) is one) being the way to use heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii). -For Rust specifically, the principle is this: *stack allocation will be used for everything -that doesn't involve "smart pointers" and collections.* If we're interested in dissecting it though, +For Rust specifically, the principle is this: **stack allocation will be used for everything +that doesn't involve "smart pointers" and collections.** If we're interested in dissecting it though, there are three things we pay attention to: 1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) @@ -101,9 +101,7 @@ With all that in mind, let's talk about situations in which we're guaranteed to - [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be stack-allocated, and copying them will be done in stack memory. - [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library - are stack-allocated. No worrying about some - ["managed languages"](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357) - creating garbage. + are stack-allocated even when iterating over heap-based collections. 
# Structs
@@ -491,3 +489,69 @@ struct NotCopyable {

# Iterators

+In [managed memory languages](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)
+(like Java), there's a subtle difference between these two code samples:
+
+```java
+public static long sum_for(List<Long> vals) {
+    long sum = 0;
+    // Regular for loop
+    for (int i = 0; i < vals.size(); i++) {
+        sum += vals.get(i);
+    }
+    return sum;
+}
+
+public static long sum_foreach(List<Long> vals) {
+    long sum = 0;
+    // "Foreach" loop - uses iteration
+    for (Long l : vals) {
+        sum += l;
+    }
+    return sum;
+}
+```
+
+In the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`,
+an object of type [`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html)
+is allocated on the heap, and will eventually be garbage-collected. This isn't a great design;
+iterators are often transient objects that you need during a function and can discard
+once the function ends. Sounds exactly like the issue stack-allocated objects address, no?
+
+In Rust, iterators are allocated on the stack. The objects to iterate over are almost
+certainly in heap memory, but the iterator itself
+([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap.
+In each of the examples below we iterate over a collection, but never need to allocate
+an object on the heap to clean up:
+
+```rust
+use std::collections::HashMap;
+// There's a lot of assembly generated, but if you search in the text,
+// there are no references to `real_drop_in_place` anywhere.
+
+pub fn sum_vec(x: &Vec<u32>) {
+    let mut s = 0;
+    // Basic iteration over vectors doesn't need allocation
+    for y in x {
+        s += y;
+    }
+}
+
+pub fn sum_enumerate(x: &Vec<u32>) {
+    let mut s = 0;
+    // More complex iterators are just fine too
+    for (_i, y) in x.iter().enumerate() {
+        s += y;
+    }
+}
+
+pub fn sum_hm(x: &HashMap<u32, u32>) {
+    let mut s = 0;
+    // And it's not just Vec; other collection types keep their iterators
+    // in stack memory too
+    for y in x.values() {
+        s += y;
+    }
+}
+```
+-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)
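
The Compiler Explorer link above checks for the *absence* of `real_drop_in_place` in the
generated assembly. Another way to convince yourself is to reuse the `PanicAllocator` pattern
from the earlier drafts; the sketch below is only illustrative (the element types and collection
sizes are arbitrary), but it will panic if any of the iteration touches the allocator:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    // Build the collections up front; these allocations are expected.
    let vec: Vec<u32> = (0..100).collect();
    let mut map: HashMap<u32, u32> = HashMap::new();
    for i in 0..100 {
        map.insert(i, i);
    }

    // Turn on panicking if we allocate on the heap
    DO_PANIC.store(true, Ordering::SeqCst);

    // Iterating over the existing collections never calls the allocator
    let mut sum: u32 = 0;
    for v in &vec {
        sum += v;
    }
    for v in map.values() {
        sum += v;
    }

    // Turn panicking back off before the collections are dropped,
    // since freeing them will (legitimately) use the allocator.
    DO_PANIC.store(false, Ordering::SeqCst);
    assert_eq!(sum, 2 * (0..100).sum::<u32>());
}

#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;

unsafe impl GlobalAlloc for PanicAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if DO_PANIC.load(Ordering::SeqCst) {
            panic!("Unexpected allocation.");
        }
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if DO_PANIC.load(Ordering::SeqCst) {
            panic!("Unexpected deallocation.");
        }
        System.dealloc(ptr, layout);
    }
}
```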