Move the files to final resting location

2019-02-09 23:31:11 -05:00
parent 543e4253cc
commit e9099d191e
6 changed files with 12 additions and 9 deletions


@ -0,0 +1,108 @@
---
layout: post
title: "Allocations in Rust"
description: "An introduction to the memory model."
category:
tags: [rust, understanding-allocations]
---
There's an alchemy of distilling complex technical topics into articles and videos
that change the way programmers see the tools they interact with on a regular basis.
I knew what a linker was, but there's a staggering amount of complexity in between
[`main()` and your executable](https://www.youtube.com/watch?v=dOfucXtyEsU).
Rust programmers use the [`Box`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html)
type all the time, but there's a rich history of the Rust language itself wrapped up in
[how special it is](https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/).
In a similar vein, I want you to look at code and understand how memory is used;
the complex choreography of operating system, compiler, and program that frees you
to focus on functionality far-flung from frivolous book-keeping. The Rust compiler relieves
a great deal of the cognitive burden associated with memory management, but we're going
to step into its world for a while.
Let's learn a bit about memory in Rust.
# Table of Contents
This post is intended as both guide and reference material; we'll work to establish
an understanding of the different memory types Rust makes use of, then summarize each
section at the end for easy future citation. To that end, a table of contents is in order:
- Foreword
- [Global Memory Usage: The Whole World](/2019/02/the-whole-world)
- [Fixed Memory: Stacking Up](/2019/02/stacking-up)
- [Dynamic Memory: A Heaping Helping](/2019/02/a-heaping-helping)
- [Compiler Optimizations: What It's Done For You Lately](/2019/02/compiler-optimizations)
- [Summary: What Are the Rules?](/2019/02/summary)
# Foreword
Rust's three defining features of [Performance, Reliability, and Productivity](https://www.rust-lang.org/)
are all driven to a great degree by how the Rust compiler understands
[memory ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html). Unlike managed memory
languages (Java, Python), Rust [doesn't really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)
garbage collect, leading to fast code when [dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation)
isn't necessary. When heap memory is necessary, Rust ensures you can't accidentally mis-manage it.
And because the compiler handles memory "ownership" for you, developers never need to worry about
accidentally deleting data that was needed somewhere else.
That said, there are situations where you won't benefit from work the Rust compiler is doing.
If you:
1. Never use `unsafe`
2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html)
...then it's not possible for you to use dynamic memory!
For some uses of Rust, typically embedded devices, these constraints make sense.
They have very limited memory, and the program binary size itself may significantly
affect what's available! There's no operating system able to manage
this ["virtual memory"](https://en.wikipedia.org/wiki/Virtual_memory) junk, but that's
not an issue because there's only one running application. The
[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind,
and interacting with the "real world" through extra peripherals is accomplished by
reading and writing to [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html).
Most Rust programs find these requirements overly burdensome though. C++ developers
would struggle without access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector)
(except those hardcore no-STL people), and Rust developers would struggle without
[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But in this scenario,
`std::vec` is actually aliased to a part of the
[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits.
`Box`, `Rc`, etc., are also unusable for the same reason.
Whether writing code for embedded devices or not, the important thing in both situations
is how much you know *before your application starts* about what its memory usage will look like.
In embedded devices, there's a small, fixed amount of memory to use.
In a browser, you have no idea how large [google.com](https://www.google.com)'s home page is until you start
trying to download it. The compiler uses this information (or lack thereof) to optimize
how memory is used; put simply, your code runs faster when the compiler can guarantee exactly
how much memory your program needs while it's running. This post is all about understanding
how the compiler reasons about your program, with an emphasis on how to design your programs
for performance.
Now let's address some conditions and caveats before going much further:
- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation APIs
([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we'll ignore.
- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`)
and address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).
- All content will be run using Rust 1.32, as that's the highest currently supported in the
[Compiler Explorer](https://godbolt.org/). As such, we'll avoid upcoming innovations like
[compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
that are available in nightly.
- Because of the nature of the content, some (very simple) assembly-level code is involved.
We'll keep this simple, but I [found](https://stackoverflow.com/a/4584131/1454178)
a [refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop`
[instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
was helpful while writing this post.
- I've tried to be precise in saying only what I can prove using the tools (ASM, docs)
that are available. That said, if there's something said in error, please reach out
and let me know - [bradlee@speice.io](mailto:bradlee@speice.io)
Finally, I'll do what I can to flag potential future changes but the Rust docs
have a notice worth repeating:
> Rust does not currently have a rigorously and formally defined memory model.
>
> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html)


@ -0,0 +1,294 @@
---
layout: post
title: "Global Memory Usage: The Whole World"
description: "Static considered slightly less harmful."
category:
tags: [rust, understanding-allocations]
---
The first memory type we'll look at is pretty special: memory used when Rust can prove that
a *value* is fixed for the life of a program (`const`), or that a *reference* is valid for
the duration of the program (`static` as a declaration, not
[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime)
as a lifetime).
Understanding the distinction between value and reference is important for reasons
we'll go into below. The
[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md)
for these two memory types is available, but we'll take a hands-on approach to the topic.
# **const**
The quick summary is this: `const` declares a read-only block of memory that is loaded
as part of your program binary (during the call to [exec(3)](https://linux.die.net/man/3/exec)).
Any `const` value resulting from calling a `const fn` is guaranteed to be materialized
at compile-time (meaning that access at runtime will not invoke the `const fn`),
even though the `const fn` functions are available at run-time as well. The compiler
can choose to copy the constant value wherever it is deemed practical. Getting the address
of a `const` value is legal, but not guaranteed to be the same even when referring to the
same named identifier.
The first point is a bit strange - "read-only memory".
[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants)
mentions in a couple places that using `mut` with constants is illegal,
but it's also important to demonstrate just how immutable they are. *Typically* in Rust
you can use "inner mutability" to modify things that aren't declared `mut`.
[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an API
to guarantee at runtime that some consistency rules are enforced:
```rust
use std::cell::RefCell;

fn my_mutator(cell: &RefCell<u8>) {
    // Even though we're given an immutable reference,
    // the `replace` method allows us to modify the inner value.
    cell.replace(14);
}

fn main() {
    let cell = RefCell::new(25);
    // Prints out 25
    println!("Cell: {:?}", cell);
    my_mutator(&cell);
    // Prints out 14
    println!("Cell: {:?}", cell);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)
When `const` is involved though, modifications are silently ignored:
```rust
use std::cell::RefCell;

const CELL: RefCell<u8> = RefCell::new(25);

fn my_mutator(cell: &RefCell<u8>) {
    cell.replace(14);
}

fn main() {
    // First line prints 25 as expected
    println!("Cell: {:?}", &CELL);
    my_mutator(&CELL);
    // Second line *still* prints 25
    println!("Cell: {:?}", &CELL);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)
And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):
```rust
use std::sync::Once;

const SURPRISE: Once = Once::new();

fn main() {
    // This is how `Once` is supposed to be used
    SURPRISE.call_once(|| println!("Initializing..."));
    // Because `Once` is a `const` value, we never record it
    // having been initialized the first time, and this closure
    // will also execute.
    SURPRISE.call_once(|| println!("Initializing again???"));
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)
When the [`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)
refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this is
what they mean. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this as an error,
but it's still something to be aware of.
The next thing to mention is that `const` values are loaded into memory *as part of your program binary*.
Because of this, any `const` values declared in your program will be "realized" at compile-time;
accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may
be able to prefetch the value), but that's it.
```rust
use std::cell::RefCell;

const CELL: RefCell<u32> = RefCell::new(24);

pub fn multiply(value: u32) -> u32 {
    value * (*CELL.get_mut())
}
```
-- [Compiler Explorer](https://godbolt.org/z/2KXUcN)
The compiler only creates one `RefCell`, and uses it everywhere. However, that value
is fully realized at compile time, and is fully stored in the `.L__unnamed_1` section.
If it's helpful though, the compiler can choose to copy `const` values.
```rust
const FACTOR: u32 = 1000;

pub fn multiply(value: u32) -> u32 {
    value * FACTOR
}

pub fn multiply_twice(value: u32) -> u32 {
    value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/_JiT9O)
In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction
in both the `multiply` and `multiply_twice` functions; the "1000" value is never
"stored" anywhere, as it's small enough to inline into the assembly instructions.
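The earlier claim about `const fn` can be checked in a similar way. Here is a minimal sketch; the names `kilobytes`, `BUFFER_LEN`, and `SCRATCH` are hypothetical and don't come from the examples above. Array lengths must be known at compile time, so using the result of a `const fn` as a length shows the call was evaluated by the compiler rather than at runtime:

```rust
// Hypothetical helper: `const fn` lets the compiler evaluate this call
// while building the program.
const fn kilobytes(count: usize) -> usize {
    count * 1024
}

// Evaluated at compile time; using `BUFFER_LEN` at runtime never
// invokes `kilobytes`.
const BUFFER_LEN: usize = kilobytes(4);

// An array length must be a compile-time constant, so this only compiles
// because `BUFFER_LEN` is already known to the compiler.
static SCRATCH: [u8; BUFFER_LEN] = [0; BUFFER_LEN];

pub fn scratch_len() -> usize {
    SCRATCH.len()
}

fn main() {
    // `const fn` functions can still be called at runtime as well.
    let runtime_value = kilobytes(2);
    println!("compile-time: {}, runtime: {}", scratch_len(), runtime_value);
}
```

The same function serves both purposes: in a `const` context it is folded into the binary, and everywhere else it behaves like an ordinary function.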
Finally, getting the address of a `const` value is possible but not guaranteed
to be unique (given that the compiler can choose to copy values). In my testing
I was never able to get the compiler to copy a `const` value and get differing pointers,
but the specifications are clear enough: *don't rely on pointers to `const`
values being consistent*. To be frank, caring about locations for `const` values
is almost certainly a code smell.
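If you'd like to experiment with this yourself, a minimal sketch follows; `factor_address` and `factor_value` are hypothetical helpers. It deliberately makes no assumption about whether the two addresses match, and relies only on the value:

```rust
const FACTOR: u32 = 1000;

/// Hypothetical helper: address of whichever copy of `FACTOR`
/// the compiler chose for this reference.
pub fn factor_address() -> usize {
    &FACTOR as *const u32 as usize
}

/// Hypothetical helper: the value is the same regardless of
/// which copy we end up reading.
pub fn factor_value() -> u32 {
    let reference: &u32 = &FACTOR;
    *reference
}

fn main() {
    let first = factor_address();
    let second = factor_address();
    // The addresses may or may not match; only the value is guaranteed.
    println!("first: {:#x}, second: {:#x}", first, second);
    println!("value: {}", factor_value());
}
```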
# **static**
Static variables are related to `const` variables, but take a slightly different approach.
When the compiler can guarantee that a *reference* is fixed for the life of a program,
you end up with a `static` variable (as opposed to *values* that are fixed for the
duration a program is running). Because of this reference/value distinction,
static variables behave much more like what people expect from "global" variables.
We'll look at regular static variables first, and then address the `lazy_static!()`
and `thread_local!()` macros later.
More generally, `static` variables are globally unique locations in memory,
the contents of which are loaded as part of your program being read into main memory.
They allow initialization with both raw values and `const fn` calls, and the initial
value is loaded along with the program/library binary. All static variables must
be of a type that implements the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html)
marker trait. And while `static mut` variables are allowed, mutating a static is considered
an `unsafe` operation.
The single biggest difference between `const` and `static` is the guarantees
provided about uniqueness. Where `const` variables may or may not be copied
in code, `static` variables are guaranteed to be unique. If we take a previous
`const` example and change it to `static`, the difference should be clear:
```rust
static FACTOR: u32 = 1000;

pub fn multiply(value: u32) -> u32 {
    value * FACTOR
}

pub fn multiply_twice(value: u32) -> u32 {
    value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/bSfBxn)
Where [previously](https://godbolt.org/z/_JiT90) there were plenty of
references to multiplying by 1000, the new assembly refers to `FACTOR`
as a named memory location instead. No initialization work needs to be done,
but the compiler can no longer prove the value never changes during execution.
Next, let's talk about initialization. The simplest case is initializing
static variables with either scalar or struct notation:
```rust
#[derive(Debug)]
struct MyStruct {
    x: u32
}

static MY_STRUCT: MyStruct = MyStruct {
    // You can even reference other statics
    // declared later
    x: MY_VAL
};

static MY_VAL: u32 = 24;

fn main() {
    println!("Static MyStruct: {:?}", MY_STRUCT);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
Things get a bit weirder when using `const fn`. In most cases, things just work:
```rust
#[derive(Debug)]
struct MyStruct {
    x: u32
}

impl MyStruct {
    const fn new() -> MyStruct {
        MyStruct { x: 24 }
    }
}

static MY_STRUCT: MyStruct = MyStruct::new();

fn main() {
    println!("const fn Static MyStruct: {:?}", MY_STRUCT);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
However, there's a caveat: you're currently not allowed to use `const fn` to initialize
static variables of types that aren't marked `Sync`. As an example, even though
[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new)
is `const fn`, because [`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync),
you'll get an error at compile time:
```rust
use std::cell::RefCell;
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
static MY_LOCK: RefCell<u8> = RefCell::new(0);
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
It's likely that this will [change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.
Which leads well to the next point: static variable types must implement the
[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html).
Because they're globally unique, it must be safe for you to access static variables
from any thread at any time. Most `struct` definitions automatically implement the
`Sync` trait because they contain only elements which themselves
implement `Sync`. This is why earlier examples could get away with initializing
statics, even though we never included an `impl Sync for MyStruct` in the code.
For more on the `Sync` trait, the [Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)
has a much more thorough treatment. But as an example, Rust refuses to compile
our earlier example if we add a non-`Sync` element to the `struct` definition:
```rust
use std::cell::RefCell;

struct MyStruct {
    x: u32,
    y: RefCell<u8>,
}

// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
static MY_STRUCT: MyStruct = MyStruct {
    x: 8,
    y: RefCell::new(8)
};
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation.
Unlike `const` however, interior mutability is acceptable. To demonstrate:
```rust
use std::sync::Once;

// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once
static INIT: Once = Once::new();

fn main() {
    // Note that while `INIT` is declared immutable, we're still allowed
    // to mutate its interior
    INIT.call_once(|| println!("Initializing..."));
    // This code won't panic, as the interior of INIT was modified
    // as part of the previous `call_once`
    INIT.call_once(|| panic!("INIT was called twice!"));
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)
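The same pattern works for other `Sync` types with interior mutability. As a minimal sketch (the `NEXT_ID` counter and `next_id` function are hypothetical names chosen for illustration), an atomic integer gives you a global counter without `static mut` or `unsafe`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// `AtomicUsize` implements `Sync`, and `AtomicUsize::new` is a `const fn`,
// so it can initialize a `static` directly.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical helper: returns a fresh ID on every call.
pub fn next_id() -> usize {
    // `fetch_add` takes `&self`, so the static never has to be `mut`.
    NEXT_ID.fetch_add(1, Ordering::SeqCst)
}

fn main() {
    let first = next_id();
    let second = next_id();
    // Every use of `NEXT_ID` refers to the same memory location,
    // so the second call observes the update made by the first.
    println!("first: {}, second: {}", first, second);
}
```

Had `NEXT_ID` been declared `const`, each call would have operated on a fresh temporary and the counter would never advance.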


@ -0,0 +1,558 @@
---
layout: post
title: "Fixed Memory: Stacking Up"
description: "We don't need no allocator."
category:
tags: [rust, understanding-allocations]
---
`const` and `static` are perfectly fine, but it's very rare that we know
at compile-time about either values or references that will be the same for the
duration of our program. Put another way, it's not often the case that either you
or your compiler knows how much memory your entire program will need.
However, there are still some optimizations the compiler can do if it knows how much
memory individual functions will need. Specifically, the compiler can make use of
"stack" memory (as opposed to "heap" memory) which can be managed far faster in
both the short- and long-term. When requesting memory, the
[`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods)
(<1 nanosecond on modern CPUs). Contrast that to heap memory which requires an allocator
(specialized software to track what memory is in use) to reserve space.
And when you're finished with your memory, the `pop` instruction likewise runs in
1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation
and other issues. All sorts of incredibly sophisticated techniques have been used
to design allocators:
- [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science))
strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection)
(used in [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html))
and [Reference counting](https://en.wikipedia.org/wiki/Reference_counting)
(used in [Python](https://docs.python.org/3/extending/extending.html#reference-counts))
- Thread-local structures to prevent locking the allocator in [tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)
- Arena structures used in [jemalloc](http://jemalloc.net/), which
[until recently](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default)
was the primary allocator for Rust programs!
But no matter how fast your allocator is, the principle remains: the
fastest allocator is the one you never use. As such, we're not going to discuss how exactly the
[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html),
but we'll focus instead on the conditions that enable the Rust compiler to use
the faster stack-based allocation for variables.
So, **how do we know when Rust will or will not use stack allocation for objects we create?**
Looking at other languages, it's often easy to delineate
between stack and heap. Managed memory languages (Python, Java,
[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/))
place everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/),
[HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may
optimize some heap allocations away, but you should never assume it will happen.
C makes things clear with calls to special functions ([malloc(3)](https://linux.die.net/man/3/malloc)
is one) being the way to use heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178)
keyword, though modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii).
For Rust specifically, the principle is this: **stack allocation will be used for everything
that doesn't involve "smart pointers" and collections.** We'll skip over a precise definition
of the term "smart pointer" for now, and instead discuss what we should watch for when talking
about the memory region used for allocation:
1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register)
indicate allocation of stack memory:
```rust
pub fn stack_alloc(x: u32) -> u32 {
    // Space for `y` is allocated by subtracting from `rsp`,
    // and then populated
    let y = [1u8, 2, 3, 4];
    // Space for `y` is deallocated by adding back to `rsp`
    x
}
```
-- [Compiler Explorer](https://godbolt.org/z/5WSgc9)
2. Tracking when exactly heap allocation calls happen is difficult. It's typically easier to
watch for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened
in the recent past:
```rust
pub fn heap_alloc(x: usize) -> usize {
    // Space for elements in a vector has to be allocated
    // on the heap, and is then de-allocated once the
    // vector goes out of scope
    let y: Vec<u8> = Vec::with_capacity(x);
    x
}
```
-- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)
<span style="font-size: .8em">Note: While the [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html)
is [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46),
the Rust standard library only defines `Drop` implementations for types that involve heap allocation.</span>
3. If you don't want to inspect the assembly, use a custom allocator that's able to track
and alert when heap allocations occur. Crates like [`alloc_counter`](https://crates.io/crates/alloc_counter)
are designed for exactly this purpose.
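To give a feel for the third approach, here is a minimal sketch of a counting allocator built only on the standard library's `GlobalAlloc` trait. It is not the `alloc_counter` API; `CountingAllocator`, `ALLOCATIONS`, and `allocations_during` are hypothetical names, and the sketch assumes a debug build (an optimizing build is free to remove an unused allocation entirely):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that forwards to the system allocator
/// and counts every allocation request along the way.
pub struct CountingAllocator;

pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Route every heap allocation in the program through the wrapper.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Hypothetical helper: number of heap allocations made while `work` runs.
/// The counter is global, so other threads allocating at the same time
/// would be counted too.
pub fn allocations_during<F: FnOnce()>(work: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::SeqCst);
    work();
    ALLOCATIONS.load(Ordering::SeqCst) - before
}

fn main() {
    let stack_only = allocations_during(|| {
        let y = [1u8, 2, 3, 4];
        let _sum: u32 = y.iter().map(|&b| u32::from(b)).sum();
    });
    let with_vec = allocations_during(|| {
        let v: Vec<u8> = Vec::with_capacity(16);
        drop(v);
    });
    println!("array only: {}, with Vec: {}", stack_only, with_vec);
}
```

A dedicated crate is still the better choice for real measurements; the point here is only that the tracking hook is an ordinary trait implementation.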
With all that in mind, let's talk about situations in which we're guaranteed to use stack memory:
- Structs are created on the stack.
- Function arguments are passed on the stack, meaning the
[`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute)
will not change the memory region used.
- Enums and unions are stack-allocated.
- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are always stack-allocated.
- Closures capture their arguments on the stack.
- Generics will use stack allocation, even with dynamic dispatch.
- [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be
stack-allocated, and copying them will be done in stack memory.
- [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library
are stack-allocated even when iterating over heap-based collections.
# Structs
The simplest case comes first. When creating vanilla `struct` objects, we use stack memory
to hold their contents:
```rust
struct Point {
    x: u64,
    y: u64,
}

struct Line {
    a: Point,
    b: Point,
}

pub fn make_line() {
    // `origin` is stored in the first 16 bytes of memory
    // starting at location `rsp`
    let origin = Point { x: 0, y: 0 };

    // `point` makes up the next 16 bytes of memory
    let point = Point { x: 1, y: 2 };

    // When creating `ray`, we just move the content out of
    // `origin` and `point` into the next 32 bytes of memory
    let ray = Line { a: origin, b: point };
}
```
-- [Compiler Explorer](https://godbolt.org/z/vri9BE)
Note that while some extra-fancy instructions are used for memory manipulation in the assembly,
the `sub rsp, 64` instruction indicates we're still working with the stack.
# Function arguments
Have you ever wondered how functions communicate with each other? Like, once the variables are
given to you, everything's fine. But how do you "give" those variables to another function?
How do you get the results back afterward? The answer: the compiler arranges memory and
assembly instructions using a pre-determined
[calling convention](http://llvm.org/docs/LangRef.html#calling-conventions).
This convention governs the rules around where arguments needed by a function will be located
(either in memory offsets relative to the stack pointer `rsp`, or in other registers), and
where the results can be found once the function has finished. And when multiple languages
agree on what the calling conventions are, you can do things like having
[Go call Rust code](https://blog.filippo.io/rustgo/)!
Put simply: it's the compiler's job to figure out how to call other functions, and you can assume
that the compiler is good at its job.
We can see this in action using a simple example:
```rust
struct Point {
    x: i64,
    y: i64,
}

// We use integer division operations to keep
// the assembly clean, understanding the result
// isn't accurate.
fn distance(a: &Point, b: &Point) -> i64 {
    // Immediately subtract from `rsp` the bytes needed
    // to hold all the intermediate results - this is
    // the stack allocation step

    // The compiler used the `rdi` and `rsi` registers
    // to pass our arguments, so read them in
    let x1 = a.x;
    let x2 = b.x;
    let y1 = a.y;
    let y2 = b.y;

    // Do the actual math work
    let x_pow = (x1 - x2) * (x1 - x2);
    let y_pow = (y1 - y2) * (y1 - y2);
    let squared = x_pow + y_pow;
    squared / squared

    // Our final result will be stored in the `rax` register
    // so that our caller knows where to retrieve it.
    // Finally, add back to `rsp` the stack memory that is
    // now ready to be used by other functions.
}

pub fn total_distance() {
    let start = Point { x: 1, y: 2 };
    let middle = Point { x: 3, y: 4 };
    let end = Point { x: 5, y: 6 };

    let _dist_1 = distance(&start, &middle);
    let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)
As a consequence of function arguments never using heap memory, we can also
infer that functions using the `#[inline]` attribute also do not heap-allocate.
But better than inferring, we can look at the assembly to prove it:
```rust
struct Point {
    x: i64,
    y: i64,
}

// Note that there is no `distance` function in the assembly output,
// and the total line count goes from 229 with inlining off
// to 306 with inline on. Even still, no heap allocations occur.
#[inline(always)]
fn distance(a: &Point, b: &Point) -> i64 {
    let x1 = a.x;
    let x2 = b.x;
    let y1 = a.y;
    let y2 = b.y;

    let x_pow = (a.x - b.x) * (a.x - b.x);
    let y_pow = (a.y - b.y) * (a.y - b.y);
    let squared = x_pow + y_pow;
    squared / squared
}

pub fn total_distance() {
    let start = Point { x: 1, y: 2 };
    let middle = Point { x: 3, y: 4 };
    let end = Point { x: 5, y: 6 };

    let _dist_1 = distance(&start, &middle);
    let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/30Sh66)
Finally, passing by value (arguments with type
[`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html))
and passing by reference (either moving ownership or passing a pointer) may have
[slightly different layouts in assembly](https://godbolt.org/z/sKi_kl), but will
still use either stack memory or CPU registers.
# Enums
If you've ever worried that wrapping your types in
[`Option`](https://doc.rust-lang.org/stable/core/option/enum.Option.html) or
[`Result`](https://doc.rust-lang.org/stable/core/result/enum.Result.html) would
finally make them large enough that Rust decides to use heap allocation instead,
fear no longer: `enum` and union types don't use heap allocation:
```rust
enum MyEnum {
    Small(u8),
    Large(u64)
}

struct MyStruct {
    x: MyEnum,
    y: MyEnum,
}

pub fn enum_compare() {
    let x = MyEnum::Small(0);
    let y = MyEnum::Large(0);
    let z = MyStruct { x, y };
    let opt = Option::Some(z);
}
```
-- [Compiler Explorer](https://godbolt.org/z/HK7zBx)
Because the size of an `enum` is the size of its largest element plus a flag,
the compiler can predict how much memory is used no matter which variant
of an enum is currently stored in a variable. Thus, enums and unions have no
need of heap allocation. There's unfortunately not a great way to show this
in assembly, so I'll instead point you to the
[`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums)
documentation.
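As a rough stand-in for the missing assembly, a minimal sketch can ask the compiler for the sizes directly. It reuses `MyEnum` and `MyStruct` from above and adds two hypothetical helpers, `enum_size` and `struct_size`. The precise byte counts depend on the target and on layout decisions Rust doesn't formally guarantee, so the code only checks relationships between sizes:

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum MyEnum {
    Small(u8),
    Large(u64),
}

#[allow(dead_code)]
struct MyStruct {
    x: MyEnum,
    y: MyEnum,
}

/// Hypothetical helper: size of the enum, known at compile time.
pub fn enum_size() -> usize {
    size_of::<MyEnum>()
}

/// Hypothetical helper: size of a struct holding two enums.
pub fn struct_size() -> usize {
    size_of::<MyStruct>()
}

fn main() {
    // The enum must be able to hold its largest variant...
    assert!(enum_size() >= size_of::<u64>());
    // ...and a struct of two enums needs room for both of them.
    assert!(struct_size() >= 2 * enum_size());

    println!("MyEnum: {} bytes", enum_size());
    println!("MyStruct: {} bytes", struct_size());
    println!("Option<MyStruct>: {} bytes", size_of::<Option<MyStruct>>());
}
```

Because each of these sizes is a compile-time constant, the compiler can reserve the space on the stack no matter which variant ends up being stored.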
# Arrays
The array type is guaranteed to be stack allocated, which is why the array size must
be declared. Interestingly enough, this can be used to cause safe Rust programs to crash:
```rust
// 256 bytes
#[derive(Default)]
struct TwoFiftySix {
    _a: [u64; 32]
}

// 8 kilobytes
#[derive(Default)]
struct EightK {
    _a: [TwoFiftySix; 32]
}

// 256 kilobytes
#[derive(Default)]
struct TwoFiftySixK {
    _a: [EightK; 32]
}

// 8 megabytes - exceeds space typically provided for the stack,
// though the kernel can be instructed to allocate more.
// On Linux, you can check stack size using `ulimit -s`
#[derive(Default)]
struct EightM {
    _a: [TwoFiftySixK; 32]
}

fn main() {
    // Because we already have things in stack memory
    // (like the current function call stack), allocating another
    // eight megabytes of stack memory crashes the program
    let _x = EightM::default();
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4)
There aren't any security implications of this (no memory corruption occurs),
but it's good to note that the Rust compiler won't move arrays into heap memory
even if they can be reasonably expected to overflow the stack.
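If you do need a buffer this large, the usual workaround is to ask for heap memory explicitly. A minimal sketch, assuming a hypothetical `big_buffer` helper: `vec!` with a length requests the memory from the allocator, so the eight megabytes never have to fit on the stack.

```rust
// 1,048,576 elements of 8 bytes each: eight megabytes in total.
const ELEMENTS: usize = 1024 * 1024;

/// Hypothetical helper: builds the buffer in heap memory.
pub fn big_buffer() -> Vec<u64> {
    // Only the `Vec` handle (pointer, capacity, length) lives on the stack;
    // the elements themselves are placed in heap memory.
    vec![0u64; ELEMENTS]
}

fn main() {
    let buffer = big_buffer();
    println!("elements: {}", buffer.len());
    println!("bytes: {}", buffer.len() * std::mem::size_of::<u64>());
}
```

The trade-off is the one this series keeps returning to: the program no longer crashes, but it now pays for a trip through the allocator.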
# Closures
Rules for how anonymous functions capture their arguments are typically language-specific.
In Java, [Lambda Expressions](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html)
are actually objects created on the heap that capture local primitives by copying, and capture
local non-primitives as (`final`) references.
[Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and
[JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/)
both bind *everything* by reference normally, but Python can also
[capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has
[Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).
In Rust, arguments to closures are the same as arguments to other functions;
closures are simply functions that don't have a declared name. Some weird ordering
of the stack may be required to handle them, but it's the compiler's responsibility
to figure it out.
Each example below has the same effect, but they compile to very different programs.
In the simplest case, we immediately run a closure returned by another function.
Because we don't store a reference to the closure, the stack memory needed to
store the captured values is contiguous:
```rust
fn my_func() -> impl FnOnce() {
let x = 24;
// Note that this closure in assembly looks exactly like
// any other function; you even use the `call` instruction
// to start running it.
move || { x; }
}
pub fn immediate() {
my_func()();
my_func()();
}
```
-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions
If we store a reference to the closure, the Rust compiler keeps values it needs
in the stack memory of the original function. Getting the details right is a bit harder,
so the instruction count goes up even though this code is functionally equivalent
to our original example:
```rust
pub fn simple_reference() {
let x = my_func();
let y = my_func();
y();
x();
}
```
-- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions
Even things like variable order can make a difference in instruction count:
```rust
pub fn complex() {
let x = my_func();
let y = my_func();
x();
y();
}
```
-- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions
In every circumstance though, the compiler ensured that no heap allocations were necessary.
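One way to see why no heap is needed: a closure is just a value whose size is determined by what it captures. The sketch below (the function name `closure_sizes` is hypothetical) measures three closures with `std::mem::size_of_val`:

```rust
use std::mem::size_of_val;

/// Returns the in-memory sizes (in bytes) of three closures:
/// one capturing nothing, one capturing a `u64` by value,
/// and one capturing a `u64` by reference.
pub fn closure_sizes() -> (usize, usize, usize) {
    let x: u64 = 24;

    // No captures: the closure is a zero-sized value.
    let captures_nothing = || 42u64;
    // `move` copies `x` into the closure, so it is as large as a `u64`.
    let captures_value = move || x + 1;
    // Without `move`, reading `x` only needs a shared reference to it.
    let captures_reference = || x + 1;

    (
        size_of_val(&captures_nothing),
        size_of_val(&captures_value),
        size_of_val(&captures_reference),
    )
}
```

On a 64-bit target this should come out as zero bytes, eight bytes, and one pointer's worth of bytes respectively; all of it is ordinary stack data.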
# Generics
Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`)
and dynamic dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often
*associated* with trait objects being stored in the heap, dynamic dispatch can be used
with stack allocated objects as well:
```rust
trait GetInt {
fn get_int(&self) -> u64;
}
// vtable stored at section L__unnamed_1
struct WhyNotU8 {
x: u8
}
impl GetInt for WhyNotU8 {
fn get_int(&self) -> u64 {
self.x as u64
}
}
// vtable stored at section L__unnamed_2
struct ActualU64 {
x: u64
}
impl GetInt for ActualU64 {
fn get_int(&self) -> u64 {
self.x
}
}
// `&dyn` declares that we want to use dynamic dispatch
// rather than monomorphization, so there is only one
// `retrieve_int` function that shows up in the final assembly.
// If we used generics, there would be one implementation of
// `retrieve_int` for each type that implements `GetInt`.
pub fn retrieve_int(u: &dyn GetInt) {
// In the assembly, we just call an address given to us
// in the `rsi` register and hope that it was set up
// correctly when this function was invoked.
let x = u.get_int();
}
pub fn do_call() {
// Note that even though the vtable for `WhyNotU8` and
// `ActualU64` includes a pointer to
// `core::ptr::real_drop_in_place`, it is never invoked.
let a = WhyNotU8 { x: 0 };
let b = ActualU64 { x: 0 };
retrieve_int(&a);
retrieve_int(&b);
}
```
-- [Compiler Explorer](https://godbolt.org/z/u_yguS)
It's hard to imagine practical situations where dynamic dispatch would be
used for objects that aren't heap allocated, but it technically can be done.
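One case where it might come in handy is treating a handful of differently typed local values uniformly. A small sketch, re-declaring the `GetInt` trait so the block stands alone (`sum_ints` and `mixed_on_stack` are hypothetical helpers):

```rust
pub trait GetInt {
    fn get_int(&self) -> u64;
}

pub struct WhyNotU8 {
    pub x: u8,
}

impl GetInt for WhyNotU8 {
    fn get_int(&self) -> u64 {
        u64::from(self.x)
    }
}

pub struct ActualU64 {
    pub x: u64,
}

impl GetInt for ActualU64 {
    fn get_int(&self) -> u64 {
        self.x
    }
}

/// Sums values through trait objects. Only one copy of this function
/// is compiled, whatever concrete types sit behind the references.
pub fn sum_ints(items: &[&dyn GetInt]) -> u64 {
    items.iter().map(|item| item.get_int()).sum()
}

pub fn mixed_on_stack() -> u64 {
    // Both values live in this function's stack frame.
    let a = WhyNotU8 { x: 3 };
    let b = ActualU64 { x: 39 };
    // The array of trait-object references is on the stack as well.
    let items: [&dyn GetInt; 2] = [&a, &b];
    sum_ints(&items)
}
```

Each `&dyn GetInt` is a "fat" pointer (data address plus vtable address), so the array above is four machine words and nothing more.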
# Copy types
Understanding move semantics and copy semantics in Rust is weird at first. The Rust docs
[go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html)
far better than can be addressed here, so I'll leave them to do the job.
Even from a memory perspective though, their guideline is reasonable:
[if your type can implement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).
While there are potential speed tradeoffs to *benchmark* when discussing `Copy`
(move semantics for stack objects vs. copying stack pointers vs. copying stack `struct`s),
*it's impossible for `Copy` to introduce a heap allocation*.
But why is this the case? Fundamentally, it's because the language controls
what `Copy` means -
["the behavior of `Copy` is not overloadable"](https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone)
because it's a marker trait. From there we'll note that a type
[can implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy)
if (and only if) its components implement `Copy`, and that
[no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors).
Thus, assignments involving heap types are always move semantics, and new heap
allocations won't occur without explicit calls to
[`clone()`](https://doc.rust-lang.org/std/clone/trait.Clone.html#tymethod.clone).
```rust
#[derive(Clone)]
struct Cloneable {
x: Box<u64>
}
// error[E0204]: the trait `Copy` may not be implemented for this type
#[derive(Copy, Clone)]
struct NotCopyable {
x: Box<u64>
}
```
-- [Compiler Explorer](https://godbolt.org/z/VToRuK)
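To make the difference concrete, here's a sketch that contrasts the two behaviors at a call site. The `Point` and `Boxed` types are made up for illustration:

```rust
/// Plain data: every field is `Copy`, so the struct can be too.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Point {
    pub x: u64,
    pub y: u64,
}

/// Owns heap memory through `Box`, so it can only be `Clone`.
#[derive(Clone, Debug, PartialEq)]
pub struct Boxed {
    pub x: Box<u64>,
}

pub fn copy_point() -> (Point, Point) {
    let a = Point { x: 1, y: 2 };
    // A bitwise copy; `a` stays usable and the allocator is not involved.
    let b = a;
    (a, b)
}

pub fn clone_boxed() -> (Boxed, Boxed) {
    let a = Boxed { x: Box::new(7) };
    // `let b = a;` would *move* `a`. Duplicating it takes an explicit
    // `clone()`, which allocates a second `Box`.
    let b = a.clone();
    (a, b)
}
```

The two `Boxed` values compare equal, but their contents live at different heap addresses; the only way to get that second allocation was to ask for it.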
# Iterators
In [managed memory languages](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)
(like Java), there's a subtle difference between these two code samples:
```java
public static long sum_for(List<Long> vals) {
    long sum = 0;
    // Regular for loop; a `List` is indexed with `size()` and `get()`
    for (int i = 0; i < vals.size(); i++) {
        sum += vals.get(i);
    }
    return sum;
}
public static long sum_foreach(List<Long> vals) {
    long sum = 0;
    // "Foreach" loop - uses iteration
    for (Long l : vals) {
        sum += l;
    }
    return sum;
}
```
In the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`,
an object of type [`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html)
is allocated on the heap, and will eventually be garbage-collected. This isn't a great design;
iterators are often transient objects that you need during a function and can discard
once the function ends. Sounds exactly like the issue stack-allocated objects address, no?
In Rust, iterators are allocated on the stack. The objects to iterate over are almost
certainly in heap memory, but the iterator itself
([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap.
In each of the examples below we iterate over a collection, but never need to allocate
an object on the heap that has to be cleaned up afterwards:
```rust
use std::collections::HashMap;
// There's a lot of assembly generated, but if you search in the text,
// there are no references to `real_drop_in_place` anywhere.
pub fn sum_vec(x: &Vec<u32>) {
let mut s = 0;
// Basic iteration over vectors doesn't need allocation
for y in x {
s += y;
}
}
pub fn sum_enumerate(x: &Vec<u32>) {
let mut s = 0;
// More complex iterators are just fine too
for (_i, y) in x.iter().enumerate() {
s += y;
}
}
pub fn sum_hm(x: &HashMap<u32, u32>) {
let mut s = 0;
// And it's not just Vec, all types will allocate the iterator
// on stack memory
for y in x.values() {
s += y;
}
}
```
-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)
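The same holds when adapters are chained together: each adapter is a plain struct wrapping the previous one, and nothing runs until the chain is consumed. A short sketch (the function name is hypothetical):

```rust
/// Sums the squares of the elements at even indices of `values`.
///
/// `iter`, `enumerate`, `filter`, and `map` each wrap the previous
/// iterator in another struct; `sum` then drives the whole chain.
pub fn sum_even_index_squares(values: &[u32]) -> u32 {
    values
        .iter()
        .enumerate()
        .filter(|(i, _)| i % 2 == 0)
        .map(|(_, v)| v * v)
        .sum()
}
```

For the input `[1, 2, 3, 4, 5]` the elements at indices 0, 2, and 4 are squared and added, giving `1 + 9 + 25 = 35`.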

View File

@ -0,0 +1,254 @@
---
layout: post
title: "Dynamic Memory: A Heaping Helping"
description: "The reason Rust exists."
category:
tags: [rust, understanding-allocations]
---
Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++),
and some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust,
dynamic memory (also referred to as the **heap**) is managed through a system called *ownership*.
And as the docs mention, ownership
[is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).
The heap is used in two situations: when the compiler is unable to predict either the *total size
of memory needed* or *how long the memory is needed for*, it allocates space in the heap.
This happens pretty frequently; if you want to download the Google home page, you won't know
how large it is until your program runs. And when you're finished with Google, whenever that
happens to be, we deallocate the memory so it can be used to store other webpages. If you're
interested in a slightly longer explanation of the heap, check out
[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap)
in Rust's documentation.
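As a rough illustration of the "size" half of that rule, compare a value whose length depends on a runtime argument with one whose size is part of its type. Both function names are invented for this sketch:

```rust
/// The length of the result depends on `n`, which is only known when
/// the program runs, so the bytes live in a heap-allocated `String`.
pub fn repeat_word(word: &str, n: usize) -> String {
    word.repeat(n)
}

/// The array's size is part of its type and known at compile time,
/// so it can be returned by value without touching the heap.
pub fn fixed_greeting() -> [u8; 5] {
    *b"hello"
}
```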
We won't go into detail on how the heap is managed; the
[ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
does a phenomenal job explaining both the "why" and "how" of memory management. Instead,
we're going to focus on understanding "when" heap allocations occur in Rust.
To start off, take a guess at how many allocations happen in the program below:
```rust
fn main() {}
```
It's obviously a trick question; while no heap allocations happen as a result of
the code listed above, the setup needed to call `main` does allocate on the heap.
Here's a way to show it:
```rust
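// `AtomicU64` is stable as of Rust 1.34; older nightly toolchains
// needed `#![feature(integer_atomics)]` at the top of the file.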
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};
static ALLOCATION_COUNT: AtomicU64 = AtomicU64::new(0);
struct CountingAllocator;
unsafe impl GlobalAlloc for CountingAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst);
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
System.dealloc(ptr, layout);
}
}
#[global_allocator]
static A: CountingAllocator = CountingAllocator;
fn main() {
let x = ALLOCATION_COUNT.fetch_add(0, Ordering::SeqCst);
println!("There were {} allocations before calling main!", x);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e)
As of the time of writing, there are five allocations that happen before `main`
is ever called.
But when we want to understand more practically where heap allocation happens,
we'll follow this guide:
- Smart pointers hold their contents in the heap
- Collections are smart pointers for many objects at a time, and reallocate
when they need to grow
Finally, there are two "addendum" issues that are important to address when discussing
Rust and the heap:
- Stack-based alternatives to some standard library types are available
- Special allocators to track memory behavior are available
# Smart pointers
The first things to note are the "smart pointer" types.
When you have data that must outlive the scope in which it is declared,
or your data is of unknown or dynamic size, you'll make use of these types.
The term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer)
comes from C++, and while it's closely linked to a general design pattern of
["Resource Acquisition Is Initialization"](https://en.cppreference.com/w/cpp/language/raii),
we'll use it here specifically to describe objects that are responsible for managing
ownership of data allocated on the heap. The smart pointers available in the `alloc`
crate should look mostly familiar:
- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)
The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers
to manage heap objects, though more than can be covered here. Some examples:
- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)
- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)
Finally, there is one ["gotcha"](https://www.merriam-webster.com/dictionary/gotcha):
**cell types** (like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html))
look and behave similarly, but **don't involve heap allocation**. The
[`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html)
have more information.
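Here's a small sketch of a `RefCell` living entirely on the stack, assuming a simple counter as the payload (`bump` and `refcell_on_stack` are hypothetical names):

```rust
use std::cell::RefCell;

/// Mutates a value through a shared reference to a `RefCell`.
pub fn bump(counter: &RefCell<u64>) {
    *counter.borrow_mut() += 1;
}

pub fn refcell_on_stack() -> (u64, bool) {
    // The cell and the `u64` inside it both live in this stack frame.
    let counter = RefCell::new(0u64);
    bump(&counter);
    bump(&counter);

    // Borrow rules are checked at runtime: while a shared borrow is
    // alive, a request for a mutable borrow is refused.
    let reader = counter.borrow();
    let mutable_borrow_refused = counter.try_borrow_mut().is_err();
    let value = *reader;
    (value, mutable_borrow_refused)
}
```

The cell gives interior mutability plus a runtime borrow flag, but there's no pointer to follow and nothing for an allocator to do.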
When a smart pointer is created, the data it is given is placed in heap memory and
the location of that data is recorded in the smart pointer. Once the smart pointer
has determined it's safe to deallocate that memory (when a `Box` has
[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or when the
reference count for an object [goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)),
the heap space is reclaimed. We can prove these types use heap memory by
looking at code:
```rust
use std::rc::Rc;
use std::sync::Arc;
use std::borrow::Cow;
pub fn my_box() {
// Drop at assembly line 1640
Box::new(0);
}
pub fn my_rc() {
// Drop at assembly line 1650
Rc::new(0);
}
pub fn my_arc() {
// Drop at assembly line 1660
Arc::new(0);
}
pub fn my_cow() {
// Drop at assembly line 1672
Cow::from("drop");
}
```
-- [Compiler Explorer](https://godbolt.org/z/4AMQug)
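The reference-counting behavior can also be observed at runtime rather than in assembly. A small sketch, assuming a hypothetical `Noisy` type whose only job is to record when it gets dropped:

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Flips a flag when the value is dropped.
struct Noisy<'a> {
    dropped: &'a Cell<bool>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Returns the strong count with two owners, then whether the value
/// had been dropped after the first and after the second owner went away.
pub fn rc_lifecycle() -> (usize, bool, bool) {
    let dropped = Cell::new(false);

    let first = Rc::new(Noisy { dropped: &dropped });
    let second = Rc::clone(&first);
    let count_with_two_owners = Rc::strong_count(&first);

    // One owner remains, so the heap value must stay alive.
    drop(first);
    let dropped_after_first = dropped.get();

    // The count reaches zero here and the value is dropped.
    drop(second);
    let dropped_after_second = dropped.get();

    (count_with_two_owners, dropped_after_first, dropped_after_second)
}
```

The result is `(2, false, true)`: the shared value survives the first `drop` and is only cleaned up once the last `Rc` is gone.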
# Collections
Collection types use heap memory because their contents have dynamic size; they will request
more memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve),
and can [release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit)
when it's no longer necessary. This dynamic property forces Rust to heap allocate
everything they contain. In a way, **collections are smart pointers for many objects at once.**
Common types that fall under this umbrella are
[`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html),
[`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and
[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html)
(not [`&str`](https://doc.rust-lang.org/std/primitive.str.html)).
But while collections store the objects they own in heap memory, *creating new collections
will not allocate on the heap*. This is a bit weird; if we call `Vec::new()`, the
assembly shows a corresponding call to `real_drop_in_place`:
```rust
pub fn my_vec() {
// Drop in place at line 481
Vec::<u8>::new();
}
```
-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)
But because the vector has no elements it is managing, no calls to the allocator
will ever be dispatched:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};
fn main() {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// Interesting bit happens here
let x: Vec<u8> = Vec::new();
drop(x);
// Turn panicking back off, some deallocations occur
// after main as well.
DO_PANIC.store(false, Ordering::SeqCst);
}
#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;
unsafe impl GlobalAlloc for PanicAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected allocation.");
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
}
System.dealloc(ptr, layout);
}
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6)
Other standard library types follow the same behavior; make sure to check out
[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new)
and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new).
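A cheap way to sanity-check this without a custom allocator is to look at `capacity()`, which reports how much space a collection has reserved. A sketch with hypothetical helper names:

```rust
use std::collections::HashMap;

/// Capacities of freshly created, still-empty collections.
pub fn empty_capacities() -> (usize, usize, usize) {
    let v: Vec<u8> = Vec::new();
    let s = String::new();
    let m: HashMap<u32, u32> = HashMap::new();
    (v.capacity(), s.capacity(), m.capacity())
}

/// Capacity of a vector after its first `push`, which is the point
/// where it finally has to ask the allocator for space.
pub fn capacity_after_push() -> usize {
    let mut v: Vec<u8> = Vec::new();
    v.push(1);
    v.capacity()
}
```

All three empty collections report a capacity of zero; the vector only gains capacity once an element is pushed. How much it reserves on that first push is an implementation detail, so the sketch doesn't rely on an exact number.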
# Heap Alternatives
While it is a bit strange for us to talk of the stack after spending time with the heap,
it's worth pointing out that some heap-allocated objects in Rust have stack-based counterparts
provided by other crates. If you have need of the functionality, but want to avoid allocating,
there are some great alternatives.
When it comes to some of the standard library smart pointers
([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and
[`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives
are provided in crates like [parking_lot](https://crates.io/crates/parking_lot) and
[spin](https://crates.io/crates/spin). You can check out
[`lock_api::RwLock`](https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLock.html),
[`lock_api::Mutex`](https://docs.rs/lock_api/0.1.5/lock_api/struct.Mutex.html), and
[`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html)
if you're in need of synchronization primitives.
[thread_id](https://crates.io/crates/thread-id)
may still be necessary if you're implementing an allocator (*cough cough* the author *cough cough*)
because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html)
[uses a `thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#22-40)
that needs heap allocation.
# Tracing Allocators
When writing performance-sensitive code, there's no alternative to measuring your code.
If you didn't write a benchmark,
[you don't care about its performance](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=263).
You should never rely on your instincts when
[a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM).
Similarly, there's great work going on in Rust with allocators that keep track of what
they're doing. [`alloc_counter`](https://crates.io/crates/alloc_counter) was designed
for exactly this purpose. When it comes to tracking heap behavior, you shouldn't just
rely on the language; please measure and make sure that you have tools in place to catch
any issues that come up.
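If pulling in a crate isn't an option, a hand-rolled counter gets you part of the way. What follows is only a sketch of the idea, not how `alloc_counter` is implemented; `CountingAllocator`, `count_allocations`, and `boxed_value_allocations` are all names invented here:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

/// Forwards every request to the system allocator and counts
/// the calls to `alloc` along the way.
pub struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `f` and reports how many allocations happened in the meantime.
/// The counter is process-wide, so allocations made by other threads
/// during the call are counted as well.
pub fn count_allocations<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::SeqCst);
    f();
    ALLOCATIONS.load(Ordering::SeqCst) - before
}

pub fn boxed_value_allocations() -> usize {
    count_allocations(|| {
        // `black_box` discourages the optimizer from removing the `Box`.
        black_box(Box::new(42u64));
    })
}
```

Because the counter is shared by the whole process, treat the numbers as a lower bound in multi-threaded programs rather than an exact figure.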

View File

@ -0,0 +1,187 @@
---
layout: post
title: "Compiler Optimizations: What It's Done Lately"
description: "A lot. The answer is a lot."
category:
tags: [rust, understanding-allocations]
---
Up to this point, we've been discussing memory usage in the Rust language
by focusing on simple rules that are mostly right for small chunks of code.
We've spent time showing how those rules work themselves out in practice,
and become familiar with reading the assembly code needed to see each memory
type (global, stack, heap) in action.
But throughout the content so far, we've put a handicap on the code.
In the name of consistent and understandable results, we've asked the
compiler to pretty please leave the training wheels on. Now is the time
where we throw out all the rules and take the kid gloves off. As it turns out,
both the Rust compiler and the LLVM optimizers are incredibly sophisticated,
and we'll step back and let them do their job.
Similar to ["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4),
we're focusing on interesting things the Rust language (and LLVM!) can do
as regards memory management. We'll still be looking at assembly code to
understand what's going on, but it's important to mention again:
**please use automated tools like
[alloc-counter](https://crates.io/crates/alloc_counter) to double-check
memory behavior if it's something you care about**.
It's far too easy to misread assembly in large code sections; you should
always have an automated tool verify behavior if you care about memory usage.
The guiding principle as we move forward is this: *optimizing compilers
won't produce worse assembly than we started with.* There won't be any
situations where stack allocations get moved to heap allocations.
There will, however, be an opera of optimization.
# The Case of the Disappearing Box
Our first optimization comes when LLVM can reason that the lifetime of an object
is sufficiently short that heap allocations aren't necessary. In these cases,
LLVM will move the allocation to the stack instead! The way this interacts
with `#[inline]` attributes is a bit opaque, but the important part is that LLVM
can sometimes do better than the baseline Rust language.
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};
pub fn main() {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// This code will only run with the mode set to "Release".
// If you try running in "Debug", you'll get a panic.
let x = Box::new(0);
drop(x);
// Turn off panicking, as there are some deallocations
// when we exit main.
DO_PANIC.store(false, Ordering::SeqCst);
}
#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;
unsafe impl GlobalAlloc for PanicAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected allocation.");
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
}
System.dealloc(ptr, layout);
}
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=614994a20e362bf04de868b19daf5ca4)
# Vectors of Usual Size
With some collections, LLVM can predict how large they will become
and allocate the entire size on the stack instead of the heap.
This works with both the pre-allocation (`Vec::with_capacity`)
*and re-allocation* (`Vec::push`) methods for collection types.
Not only can LLVM predict sizing if you reserve the full size up front,
it can see through the resizing operations and find the total size.
While this specific optimization is unlikely to come up in production
usage, it's cool to note that LLVM does a considerable amount of work
to understand what code actually does.
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};
fn main() {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// If the compiler can predict how large a vector will be,
// it can optimize out the heap storage needed. This also
// works with `Vec::with_capacity()`, but the push case
// is a bit more interesting.
let mut x: Vec<u64> = Vec::new();
x.push(12);
assert_eq!(x[0], 12);
drop(x);
// Turn off panicking, as there are some deallocations
// when we exit main.
DO_PANIC.store(false, Ordering::SeqCst);
}
#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;
unsafe impl GlobalAlloc for PanicAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected allocation.");
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
}
System.dealloc(ptr, layout);
}
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1dfccfcf63d8800e644a3b948f1eeb7b)
# Dr. Array or: How I Learned to Love the Optimizer
Finally, this isn't so much about LLVM figuring out different memory behavior
as it is about LLVM totally stripping out code that has no side effects. Optimizations of
this type have a lot of nuance to them; if you're not careful, they can
make your benchmarks look
[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199).
In Rust, the `black_box` function (in both
[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and
[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html))
will tell the compiler to disable this kind of optimization. But if you let
LLVM remove unnecessary code, you can end up with programs that
would have previously caused errors running just fine:
```rust
#[derive(Default)]
struct TwoFiftySix {
_a: [u64; 32]
}
#[derive(Default)]
struct EightK {
_a: [TwoFiftySix; 32]
}
#[derive(Default)]
struct TwoFiftySixK {
_a: [EightK; 32]
}
#[derive(Default)]
struct EightM {
_a: [TwoFiftySixK; 32]
}
pub fn main() {
// Normally this blows up because we can't reserve size on stack
// for the `EightM` struct. But because the compiler notices we
// never do anything with `_x`, it optimizes out the stack storage
// and the program completes successfully.
let _x = EightM::default();
}
```
-- [Compiler Explorer](https://godbolt.org/z/daHn7P)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)
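For the opposite situation, where you want the work to stay in the program so it can be measured, the standard library also offers `std::hint::black_box` (stable in recent Rust releases). A minimal sketch, with `sum_to` and `measured_work` as hypothetical names; the standard library describes the hint as best-effort, so it's worth confirming in the assembly:

```rust
use std::hint::black_box;

/// Adds up the integers from 1 through `n`.
pub fn sum_to(n: u64) -> u64 {
    (1..=n).sum()
}

pub fn measured_work() -> u64 {
    // Hide the input so the sum is less likely to be folded into a constant...
    let n = black_box(1_000u64);
    // ...and mark the result as used so the call is less likely to be removed.
    black_box(sum_to(n))
}
```

`black_box` returns its argument unchanged, so the result is still the ordinary sum; only the optimizer's view of the value changes.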

View File

@ -0,0 +1,45 @@
---
layout: post
title: "Summary: What Are the Rules?"
description: "A synopsis and reference."
category:
tags: [rust, understanding-allocations]
---
While there's a lot of interesting detail captured in this series, it's often helpful
to have a document that answers some "yes/no" questions. You may not care about
what an `Iterator` looks like in assembly, you just need to know whether it allocates
an object on the heap or not.
To that end, it should be said once again: if you care about memory behavior,
use an allocator to verify the correct behavior. Tools like
[`alloc_counter`](https://crates.io/crates/alloc_counter) are designed to make
testing this behavior simple and easy.
Finally, a summary of the content that's been covered. Rust will prioritize
the fastest behavior it can, but here are the ground rules for understanding
the memory model in Rust:
**Heap Allocation**:
- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.
- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory.
- Some smart pointers in the standard library have counterparts in other crates that
don't need heap memory. If possible, use those.
**Stack Allocation**:
- Everything not using a smart pointer or collection type will be allocated on the stack.
- Structs, enums, iterators, arrays, and closures are all stack allocated.
- Cell types (`RefCell`) behave like smart pointers, but are stack-allocated.
- Inlining (`#[inline]`) will not affect allocation behavior for better or worse.
- Types that are marked `Copy` are guaranteed to have their contents stack-allocated.
**Global Allocation**:
- `const` is a fixed value; the compiler is allowed to copy it wherever useful.
- `static` is a fixed reference; the compiler will guarantee it is unique.
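The `static` rule is easy to check for yourself. In this sketch (all names are made up for illustration), two separate functions hand back the address of the same `static`:

```rust
/// A `const` is a value; the compiler may paste a copy into each use.
pub const LIMIT: u32 = 10;

/// A `static` is a single location in memory with one address.
pub static SHARED: u32 = 10;

pub fn shared_address_a() -> *const u32 {
    &SHARED
}

pub fn shared_address_b() -> *const u32 {
    &SHARED
}

pub fn doubled_limit() -> u32 {
    // `LIMIT` behaves as if the literal `10` had been written here.
    LIMIT * 2
}
```

Both address functions return the same pointer. No similar promise exists for a `const`, which is why the sketch only uses `LIMIT` by value.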
---
And if you've read through this series: thanks. I've enjoyed the process that went
into writing this, both in building new tools and forcing myself to understand
the content well enough to explain it. I hope this is valuable as a reference to you
as well.