More content on stack allocations

Bradlee Speice 2019-01-26 13:53:33 -05:00
parent e7c273a5ec
commit 178abe5dfe


section for easy citation in the future. To that end, a table of contents is provided
to assist in easy navigation:
- [Foreword](#foreword)
- [The Whole World: Global Memory Usage](#the-whole-world-global-memory-usage)
- [Stacking Up: Non-Heap Memory](#stacking-up-non-heap-memory)
- [A Heaping Helping: Rust and Dynamic Memory](#a-heaping-helping-rust-and-dynamic-memory)
- [Compiler Optimizations: What It's Done For You Lately](#compiler-optimizations-what-its-done-for-you-lately)
- Summary: When Does Rust Allocate?
- [Appendix and Further Reading](#appendix-and-further-reading)
# Foreword
Rust's three defining features of [Performance, Reliability, and Productivity](https://www.rust-lang.org/)
are all driven to a great degree by how the Rust compiler understands
[memory ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html). Unlike managed memory
languages (Java, Python), Rust [doesn't really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)
garbage collect, leading to fast code when [dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation)
isn't necessary. When heap memory is necessary, Rust ensures you can't accidentally mis-manage it.
And because the compiler handles memory "ownership" for you, developers never need to worry about
accidentally deleting data that was needed somewhere else.
That said, there are situations where you won't benefit from the work the Rust compiler is doing.
If you:
1. Never use `unsafe`
2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html)
...then it's not possible for you to use dynamic memory!
For some uses of Rust, typically embedded devices, these constraints make sense.
They have very limited memory, and the program binary size itself may significantly
affect what's available! There's no operating system able to manage
this ["virtual memory"](https://en.wikipedia.org/wiki/Virtual_memory) junk, but that's
not an issue because there's only one running application. The
[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind,
and interacting with the "real world" through extra peripherals is accomplished by
reading and writing to [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html).
Most Rust programs find these requirements overly burdensome though. C++ developers
would struggle without access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector)
(except those hardcore no-STL people), and Rust developers would struggle without
[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But in this scenario,
`std::vec` is actually aliased to a part of the
[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits.
`Box`, `Rc`, etc., are also unusable for the same reason.
Whether writing code for embedded devices or not, the important thing in both situations
is how much you know *before your application starts* about what its memory usage will look like.
In a browser, you have no idea how large [google.com](https://www.google.com)'s home page is until you start
trying to download it. The compiler uses this information (or lack thereof) to optimize
how memory is used; put simply, your code runs faster when the compiler can guarantee exactly
how much memory your program needs while it's running. This post is all about understanding
how the compiler reasons about your program, with an emphasis on how to design your programs
for performance.
Now let's address some conditions and caveats before going much further:
- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation APIs
([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we'll ignore.
- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`)
and address (pun intended) "release" mode at the end (`cargo run --release` and `cargo test --release`).
and address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).
- All content will be run using Rust 1.31, as that's the highest currently supported in the
[Compiler Explorer](https://godbolt.org/). As such, we'll avoid upcoming innovations like
[compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
that are available in nightly.
- Because of the nature of the content, some (very simple) assembly-level code is involved.
We'll keep this simple, but I [found](https://stackoverflow.com/a/4584131/1454178)
a [refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop`
[instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
was helpful while writing this post.
Finally, I'll do what I can to flag potential future changes, but the Rust docs
have a notice worth repeating:
> Rust does not currently have a rigorously and formally defined memory model.
>
> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html)
We'll start with the ["happy path"](https://en.wikipedia.org/wiki/Happy_path):
what happens when Rust is able to figure out *at compile time* how much memory
will be used in your program.
This is important because of the extra optimizations Rust uses when it can predict
how much memory is needed! Let's go over a quick example:
```rust
const MICROS_PER_MILLI: u32 = 1000;
const NANOS_PER_MICRO: u32 = 1000;
pub fn millis_to_nanos(millis: u32) -> u32 {
let micros = millis * MICROS_PER_MILLI;
let nanos = micros * NANOS_PER_MICRO;
return nanos;
}
```
-- [Compiler Explorer](https://godbolt.org/z/tOwngk)
Forgive the overly simple code, but this shows off what the compiler can figure out
about your program:
1. There's one `u32` passed to the function, and two `u32`'s used in the function body.
Each one is 4 bytes, for a total of 12 bytes. We can temporarily reserve space for all
variables because we know exactly how much space is needed.
- If you're looking at the assembly: `millis` is stored in `edi`,
`micros` is stored in `eax`, and `nanos` is stored in `ecx`.
The `eax` register is re-used to store the final result.
2. Because `MICROS_PER_MILLI` and `NANOS_PER_MICRO` are constants, the compiler never
allocates memory, and just burns the constants into the final program.
- Look for the instructions `mov edi, 1000` and `mov ecx, 1000`.
Given this information, the compiler can efficiently lay out your memory usage so
that the program never needs to ask the kernel/allocator for memory! This example
was a bit silly though, so let's talk about the more interesting details.
# The Whole World: Global Memory Usage
The first memory type we'll look at is pretty special: when Rust can prove that
a *value* is fixed for the life of a program (`const`), and when a *reference* is valid for
the duration of the program (`static` as a declaration, not
[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime)
as a lifetime).
Understanding the distinction between value and reference is important for reasons
we'll go into below. The
[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md)
for these two memory types is available, but we'll take a hands-on approach to the topic.
## **const**
The quick summary is this: `const` declares a read-only block of memory that is loaded
as part of your program binary (during the call to [exec(3)](https://linux.die.net/man/3/exec)).
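As a quick sketch of what that means in practice (the `SCALE_FACTOR` name here is just for illustration), the value behaves as though it were pasted directly into each place it gets used:

```rust
const SCALE_FACTOR: u32 = 1_000;

pub fn scale(x: u32) -> u32 {
    // `SCALE_FACTOR` isn't guaranteed a single fixed address; its value
    // is effectively copied into every expression that mentions it.
    x * SCALE_FACTOR
}
```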
But the specifications are clear enough: *don't rely on pointers to `const`
values being consistent*. To be frank, caring about locations for `const` values
is almost certainly a code smell.
## **static**
Static variables are related to `const` variables, but take a slightly different approach.
When the compiler can guarantee that a *reference* is fixed for the life of a program,
you're looking at a `static` declaration. A worked example is available on the
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59).
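As a minimal sketch of the difference (the `START_VALUE` name is made up for illustration), a `static` has exactly one address for the whole program, which is precisely what you can't count on with `const`:

```rust
static START_VALUE: u32 = 42;

pub fn start_value_address() -> *const u32 {
    // Every call returns the same address; there is exactly one
    // `START_VALUE` for the life of the program.
    &START_VALUE
}
```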
# Stacking Up: Non-Heap Memory
`const` and `static` are perfectly fine, but it's very rare that we know
at compile-time about either values or references that will be the same for the entire
time our program is running. Put another way, it's not often the case that either you
or your compiler know how much memory your entire program will need.
Stack allocation wins out in both the short- and long-term. When requesting memory, the `push` instruction
can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods)
(<1 nanosecond on modern CPUs). Heap memory instead requires using an allocator
(specialized software to track what memory is in use) to reserve space.
And when you're finished with your memory, the `pop` instruction likewise runs in
1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation
and other issues. All sorts of incredibly sophisticated techniques have been used
to design allocators:
- Arena structures used in [jemalloc](http://jemalloc.net/), which until recently
was the primary allocator for Rust programs!
But no matter how fast your allocator is, the principle remains: the
fastest allocator is the one you never use. As such, we're not going to go
in detail on how exactly the
[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html),
Now, one question I hope you're asking is "how do we distinguish stack- and
heap-based allocations in Rust code?" There are three strategies I'm going
to use for this:
1. When the stack pointer is modified to initialize a variable (done through either
`push`/`pop` instructions or the `rsp` register being modified),
this is a stack allocation:
```rust
pub fn stack_alloc(x: u32) -> u32 {
    // Space for `y` is reserved and reclaimed by adjusting the
    // stack pointer (`rsp`); no allocator is ever involved
    let y = [1u8, 2, 3, 4];
    x
}
```
-- [Compiler Explorer](https://godbolt.org/z/gKFOgB)
2. Because there's a good deal of setup before heap allocations actually happen,
it's typically easier to watch for
["dropping"](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#ownership-rules)
variables instead. Any time `call core::ptr::drop_in_place` occurs, we can infer
a heap allocation has occurred sometime in the past related to our variable:
```rust
pub fn heap_alloc(x: usize) -> usize {
    // Space for elements in a vector has to be allocated
    // on the heap, and is reclaimed (via `drop_in_place`)
    // when the vector goes out of scope
    let y: Vec<u8> = Vec::with_capacity(x);
    x
}
```
-- [Compiler Explorer](https://godbolt.org/z/T2xoh8) (`drop_in_place` happens on line 1321)
<span style="font-size: .8em">Note: While the [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) is run
for stack-allocated objects, the Rust standard library only defines `Drop` implementations
for types that involve heap allocation.</span>
3. Using a special [`GlobalAlloc`](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html)
implementation to track when heap allocations occur. For this post, I'll be using
[qadapt](https://crates.io/crates/qadapt) to [trigger a panic](https://speice.io/2018/12/allocation-safety.html)
if heap allocations occur; code that doesn't panic doesn't use heap allocations (sketched just below).
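Here's what that third strategy can look like in miniature. This isn't qadapt; it's a toy sketch (the `CountingAllocator` name and its counter are made up for illustration) built on the standard `GlobalAlloc` trait, delegating the real work to the system allocator:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts every allocation request before handing it to the system allocator.
struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

fn main() {
    let before = ALLOCATIONS.load(Ordering::SeqCst);
    let stack_array = [1u8, 2, 3, 4]; // stack only
    let heap_vec = vec![1u8, 2, 3, 4]; // heap
    let after = ALLOCATIONS.load(Ordering::SeqCst);
    println!("{} allocation(s): {:?} {:?}", after - before, stack_array, heap_vec);
}
```

qadapt takes the same idea one step further, panicking instead of counting whenever an allocation happens somewhere it shouldn't.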
With all that in mind, let's get into the details. How do we know when Rust will or will not use
stack allocation for objects we create? Looking at other languages, it's often easy to identify
when this happens: Java only cares about `new MyObject()` (yes, I'm conveniently ignoring
autoboxing). C makes things clear with calls to [malloc(3)](https://linux.die.net/man/3/malloc),
and old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword.
Modern C++ is a bit more complicated with C++11 and [RAII](https://en.cppreference.com/w/cpp/language/raii);
[`std::make_unique()`](https://en.cppreference.com/w/cpp/memory/unique_ptr/make_unique) and
[`std::make_shared()`](https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared) are
used most often in this context (and are equivalent to [`Box`](https://doc.rust-lang.org/stable/alloc/boxed/struct.Box.html)
and [`Rc`](https://doc.rust-lang.org/stable/alloc/rc/struct.Rc.html) in Rust!).
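To make the comparison concrete, here's a small illustrative snippet: `Box::new` plays the `std::make_unique` role, and `Rc::new` plays the `std::make_shared` role:

```rust
use std::rc::Rc;

fn main() {
    // Unique ownership of a heap value, freed when `unique` goes out of scope
    // (the `std::make_unique` analogue).
    let unique = Box::new(42);

    // Shared ownership via reference counting, freed when the last clone drops
    // (the `std::make_shared` analogue).
    let shared = Rc::new(42);
    let also_shared = Rc::clone(&shared);

    println!("{} {} {}", unique, shared, also_shared);
}
```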
For Rust specifically, the principle is this: *stack allocation will be used for all types
that don't use "smart pointers" and collections.* We're going to expand on this to clarify
some common questions though; a short example after these guidelines pulls a few of them together:
**For code you control**:
- Cloning or copying stack-allocated objects creates new objects that are
stack-allocated.
- Smart pointer types (`Box`, `Rc`) and collections (`String`, `Vec`, `HashMap`)
force heap allocation for the data they manage.
- Enums and other wrapper types will not trigger heap allocations unless
their contents need heap allocation.
- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are guaranteed to be
stack-allocated, even if their size overflows available stack memory.
- Using the [`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute)
will not change the memory region used.
- [Closures](https://doc.rust-lang.org/reference/types/closure.html) obey the same
rules as `struct` and `enum` types; only closures wrapped in smart pointers
trigger an allocation.
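To pull a few of these guidelines together, here's a small sketch (the `Point` type and names are made up for illustration); nothing in it touches the heap:

```rust
#[derive(Clone, Copy)]
struct Point {
    x: u32,
    y: u32,
}

pub fn stack_only(p: Point) -> u32 {
    // Copying a stack-allocated struct produces another stack-allocated struct.
    let copy = p;

    // Wrapping it in an enum like `Option` doesn't force a heap allocation either.
    let maybe = Some(copy);

    // Fixed-size arrays also live on the stack.
    let points = [copy; 4];

    // A closure capturing stack data is itself a plain stack value
    // until it gets wrapped in a smart pointer.
    let sum = |pt: Point| pt.x + pt.y;

    maybe.map(sum).unwrap_or(0) + points.len() as u32
}
```

If you drop this function into the Compiler Explorer, you shouldn't see any calls into the allocator or to `drop_in_place`.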
**For code outside your control**: (crates you rely on)
- Review the code to make sure it abides by the guidelines above
- Use an allocator like [qadapt](https://crates.io/crates/qadapt) as an automated check
to make sure that stack allocations are used in code you care about.
## Smart pointers and collections
The first things to note are the "smart pointer" and collection types.
When you have data that must outlive the scope in which it is declared,
or your data is of unknown or dynamic size, you'll make use of these types.
The term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer)
comes from C++, and is used to describe objects that are responsible for managing
ownership of data allocated on the heap. In Rust, the smart pointer types are:
- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)
1. What is the "Push" instruction? Why do we like the stack?
2. How does Rust allocate arguments to the function?
3. How does Rust allocate variables created in the function but never returned?
4. How does Rust allocate variables created in the function and returned?
5. How do Option<> or Result<> affect structs?
6. How are arrays allocated?
7. Legal to pass an array as an argument?
8. Can you force a heap allocation with arrays that are larger than stack size?
- Check `ulimit -s`
- Are array implementations larger than 32 needed? 32 x u64 == 256 bytes
9. Can you force heap allocation by returning something that escapes the stack?
- Will `#[inline(always)]` move this back to a stack allocation?
- Will `#[inline(never)]` force a heap allocation?
When a smart pointer is created, the data it is given is placed in heap memory and
the location of that data is recorded in the smart pointer. Once the smart pointer
has determined it's safe to deallocate that memory (when a `Box` has
[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or when
the [last reference](https://doc.rust-lang.org/alloc/rc/index.html) to an object
is lost), the heap space is reclaimed. We can prove these types use heap memory by
looking at some quick code:
```rust
use std::rc::Rc;
use std::sync::Arc;
use std::borrow::Cow;
pub fn my_box() {
// Drop at line 1674
Box::new(0);
}
pub fn my_rc() {
// Drop at line 1684
Rc::new(0);
}
pub fn my_arc() {
// Drop at line 1694
Arc::new(0);
}
pub fn my_cow() {
// Drop at line 1707
Cow::from("drop");
}
```
-- [Compiler Explorer](https://godbolt.org/z/QOPR4V)
Collections types use heap memory because they have dynamic size; they will
request more memory
[when they need it](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve),
and can be [asked to release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit)
when it's no longer necessary. This dynamic memory usage forces Rust to use
heap allocations for everything they contain. In a way, collections are smart pointers
for many objects at once. Common types that fall under this umbrella
are `Vec`, `HashMap`, and `String` (not [`&str`](https://doc.rust-lang.org/std/primitive.str.html)).
There's an interesting caveat worth addressing though: *creating empty collections
will not allocate on the heap*. This is a bit weird, because if we call `Vec::new()` the
assembly shows a corresponding call to `drop_in_place`:
```rust
pub fn my_vec() {
// Drop in place at line 481
Vec::<u8>::new();
}
```
-- [Compiler Explorer](https://godbolt.org/z/3-Gjqz)
But because the vector has no elements it is managing, no calls to the allocator
will ever be dispatched. A couple of places to look at for confirming this behavior:
[`Vec::new()`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.new),
[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new),
and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new).
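One cheap way to see this from safe code (not a proof about the generated assembly, just a sanity check) is that a freshly created collection reports zero capacity, and only gains capacity once something is pushed:

```rust
fn main() {
    let mut numbers: Vec<u8> = Vec::new();
    // No backing storage has been requested from the allocator yet.
    assert_eq!(numbers.capacity(), 0);

    // The first push is what finally asks the allocator for space.
    numbers.push(1);
    assert!(numbers.capacity() > 0);

    // Strings behave the same way.
    assert_eq!(String::new().capacity(), 0);
}
```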
# A Heaping Helping: Rust and Dynamic Memory
Example: How to trigger a heap allocation
Questions:
- Use `Borrow` to abstract over Pointer/Box/Rc/Arc/CoW
7. How expensive is move? Vs. C++ std::move?
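These are still outline notes, but as a sketch of where the `Borrow` item could go (the `shout` function is hypothetical), a single generic bound can accept borrowed, owned, reference-counted, or copy-on-write string data:

```rust
use std::borrow::{Borrow, Cow};
use std::rc::Rc;

// Accepts any owner of string data, heap-allocated or not.
fn shout<S: Borrow<str>>(s: S) -> String {
    s.borrow().to_uppercase()
}

fn main() {
    println!("{}", shout("a string literal"));         // &str
    println!("{}", shout(String::from("a String")));   // String
    println!("{}", shout(Rc::<str>::from("an Rc")));   // Rc<str>
    println!("{}", shout(Cow::from("a Cow")));         // Cow<str>
}
```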
# Compiler Optimizations: What It's Done For You Lately
1. Box<> getting inlined into stack allocations
2. Vec::push() === Vec::with_capacity() for fixed/predictable capacities
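These are only outline notes so far, but the second point is easy to sketch: when the final length is predictable, `Vec::with_capacity` reserves everything up front so the reallocation path in `push` never has to run. Whether the optimizer reaches the same result on its own is exactly what this section will need to verify:

```rust
fn squares(n: usize) -> Vec<u64> {
    // One allocation up front instead of a series of grow-and-copy steps.
    let mut out = Vec::with_capacity(n);
    for i in 0..n as u64 {
        out.push(i * i);
    }
    out
}

fn main() {
    let v = squares(8);
    assert_eq!(v.len(), 8);
    assert_eq!(v[3], 9);
}
```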