Allocations in Rust series

This commit is contained in:
Bradlee Speice 2024-11-09 21:05:00 -05:00
parent 7426890685
commit 97f997dc99
14 changed files with 2981 additions and 1 deletions

View File

@ -12,7 +12,7 @@ bit over a month ago, I was dispensing sage wisdom for the ages:
> I had a really great idea: build a custom allocator that allows you to track your own allocations.
> I gave it a shot, but learned very quickly: **never write your own allocator.**
>
> -- [me](../2018-10-08-case-study-optimization)
> -- [me](/2018/10/case-study-optimization)
I proceeded to ignore it, because we never really learn from our mistakes.

View File

@ -0,0 +1,113 @@
---
layout: post
title: "Allocations in Rust"
description: "An introduction to the memory model."
category:
tags: [rust, understanding-allocations]
---
There's an alchemy of distilling complex technical topics into articles and videos that change the
way programmers see the tools they interact with on a regular basis. I knew what a linker was, but
there's a staggering amount of complexity in between
[the OS and `main()`](https://www.youtube.com/watch?v=dOfucXtyEsU). Rust programmers use the
[`Box`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html) type all the time, but there's a
rich history of the Rust language itself wrapped up in
[how special it is](https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/).
In a similar vein, this series attempts to look at code and understand how memory is used; the
complex choreography of operating system, compiler, and program that frees you to focus on
functionality far-flung from frivolous book-keeping. The Rust compiler relieves a great deal of the
cognitive burden associated with memory management, but we're going to step into its world for a
while.
Let's learn a bit about memory in Rust.
# Table of Contents
This series is intended as both learning and reference material; we'll work through the different
memory types Rust uses, and explain the implications of each. Ultimately, a summary will be provided
as a cheat sheet for easy future reference. To that end, a table of contents is in order:
- Foreword
- [Global Memory Usage: The Whole World](/2019/02/the-whole-world.html)
- [Fixed Memory: Stacking Up](/2019/02/stacking-up.html)
- [Dynamic Memory: A Heaping Helping](/2019/02/a-heaping-helping.html)
- [Compiler Optimizations: What It's Done For You Lately](/2019/02/compiler-optimizations.html)
- [Summary: What Are the Rules?](/2019/02/summary.html)
# Foreword
Rust's three defining features of
[Performance, Reliability, and Productivity](https://www.rust-lang.org/) are all driven to a great
degree by the how the Rust compiler understands memory usage. Unlike managed memory languages (Java,
Python), Rust
[doesn't really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)
garbage collect; instead, it uses an
[ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) system to reason about
how long objects will last in your program. In some cases, if the life of an object is fairly
transient, Rust can make use of a very fast region called the "stack." When that's not possible,
Rust uses
[dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation)
and the ownership system to ensure you can't accidentally corrupt memory. It's not as fast, but it
is important to have available.
That said, there are specific situations in Rust where you'd never need to worry about the
stack/heap distinction! If you:
1. Never use `unsafe`
2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html)
...then it's not possible for you to use dynamic memory!
For some uses of Rust, typically embedded devices, these constraints are OK. They have very limited
memory, and the program binary size itself may significantly affect what's available! There's no
operating system able to manage this
["virtual memory"](https://en.wikipedia.org/wiki/Virtual_memory) thing, but that's not an issue
because there's only one running application. The
[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind, and
interacting with the "real world" through extra peripherals is accomplished by reading and writing
to [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html).
Most Rust programs find these requirements overly burdensome though. C++ developers would struggle
without access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector) (except those
hardcore no-STL people), and Rust developers would struggle without
[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But with the constraints above,
`std::vec` is actually a part of the
[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits. `Box`,
`Rc`, etc., are also unusable for the same reason.
Whether writing code for embedded devices or not, the important thing in both situations is how much
you know _before your application starts_ about what its memory usage will look like. In embedded
devices, there's a small, fixed amount of memory to use. In a browser, you have no idea how large
[google.com](https://www.google.com)'s home page is until you start trying to download it. The
compiler uses this knowledge (or lack thereof) to optimize how memory is used; put simply, your code
runs faster when the compiler can guarantee exactly how much memory your program needs while it's
running. This series is all about understanding how the compiler reasons about your program, with an
emphasis on the implications for performance.
Now let's address some conditions and caveats before going much further:
- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation API's
([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we'll
ignore.
- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`) and
address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).
- All content will be run using Rust 1.32, as that's the highest currently supported in the
[Compiler Exporer](https://godbolt.org/). As such, we'll avoid upcoming innovations like
[compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
that are available in nightly.
- Because of the nature of the content, being able to read assembly is helpful. We'll keep it
simple, but I [found](https://stackoverflow.com/a/4584131/1454178) a
[refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop`
[instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html) was helpful while writing
this.
- I've tried to be precise in saying only what I can prove using the tools (ASM, docs) that are
available, but if there's something said in error it will be corrected expeditiously. Please let
me know at [bradlee@speice.io](mailto:bradlee@speice.io)
Finally, I'll do what I can to flag potential future changes but the Rust docs have a notice worth
repeating:
> Rust does not currently have a rigorously and formally defined memory model.
>
> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html)

View File

@ -0,0 +1,102 @@
---
slug: 2019/02/understanding-allocations-in-rust
title: "Allocations in Rust: Foreword"
date: 2019-02-04 12:00:00
authors: [bspeice]
tags: []
---
There's an alchemy of distilling complex technical topics into articles and videos that change the
way programmers see the tools they interact with on a regular basis. I knew what a linker was, but
there's a staggering amount of complexity in between
[the OS and `main()`](https://www.youtube.com/watch?v=dOfucXtyEsU). Rust programmers use the
[`Box`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html) type all the time, but there's a
rich history of the Rust language itself wrapped up in
[how special it is](https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/).
In a similar vein, this series attempts to look at code and understand how memory is used; the
complex choreography of operating system, compiler, and program that frees you to focus on
functionality far-flung from frivolous book-keeping. The Rust compiler relieves a great deal of the
cognitive burden associated with memory management, but we're going to step into its world for a
while.
Let's learn a bit about memory in Rust.
<!-- truncate -->
---
Rust's three defining features of
[Performance, Reliability, and Productivity](https://www.rust-lang.org/) are all driven to a great
degree by the how the Rust compiler understands memory usage. Unlike managed memory languages (Java,
Python), Rust
[doesn't really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)
garbage collect; instead, it uses an
[ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) system to reason about
how long objects will last in your program. In some cases, if the life of an object is fairly
transient, Rust can make use of a very fast region called the "stack." When that's not possible,
Rust uses
[dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation)
and the ownership system to ensure you can't accidentally corrupt memory. It's not as fast, but it
is important to have available.
That said, there are specific situations in Rust where you'd never need to worry about the
stack/heap distinction! If you:
1. Never use `unsafe`
2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html)
...then it's not possible for you to use dynamic memory!
For some uses of Rust, typically embedded devices, these constraints are OK. They have very limited
memory, and the program binary size itself may significantly affect what's available! There's no
operating system able to manage this
["virtual memory"](https://en.wikipedia.org/wiki/Virtual_memory) thing, but that's not an issue
because there's only one running application. The
[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind, and
interacting with the "real world" through extra peripherals is accomplished by reading and writing
to [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html).
Most Rust programs find these requirements overly burdensome though. C++ developers would struggle
without access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector) (except those
hardcore no-STL people), and Rust developers would struggle without
[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But with the constraints above,
`std::vec` is actually a part of the
[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits. `Box`,
`Rc`, etc., are also unusable for the same reason.
Whether writing code for embedded devices or not, the important thing in both situations is how much
you know _before your application starts_ about what its memory usage will look like. In embedded
devices, there's a small, fixed amount of memory to use. In a browser, you have no idea how large
[google.com](https://www.google.com)'s home page is until you start trying to download it. The
compiler uses this knowledge (or lack thereof) to optimize how memory is used; put simply, your code
runs faster when the compiler can guarantee exactly how much memory your program needs while it's
running. This series is all about understanding how the compiler reasons about your program, with an
emphasis on the implications for performance.
Now let's address some conditions and caveats before going much further:
- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation API's
([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we'll
ignore.
- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`) and
address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).
- All content will be run using Rust 1.32, as that's the highest currently supported in the
[Compiler Exporer](https://godbolt.org/). As such, we'll avoid upcoming innovations like
[compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
that are available in nightly.
- Because of the nature of the content, being able to read assembly is helpful. We'll keep it
simple, but I [found](https://stackoverflow.com/a/4584131/1454178) a
[refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop`
[instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html) was helpful while writing
this.
- I've tried to be precise in saying only what I can prove using the tools (ASM, docs) that are
available, but if there's something said in error it will be corrected expeditiously. Please let
me know at [bradlee@speice.io](mailto:bradlee@speice.io)
Finally, I'll do what I can to flag potential future changes but the Rust docs have a notice worth
repeating:
> Rust does not currently have a rigorously and formally defined memory model.
>
> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html)

View File

@ -0,0 +1,337 @@
---
layout: post
title: "Global Memory Usage: The Whole World"
description: "Static considered slightly less harmful."
category:
tags: [rust, understanding-allocations]
---
The first memory type we'll look at is pretty special: when Rust can prove that a _value_ is fixed
for the life of a program (`const`), and when a _reference_ is unique for the life of a program
(`static` as a declaration, not
[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) as a
lifetime), we can make use of global memory. This special section of data is embedded directly in
the program binary so that variables are ready to go once the program loads; no additional
computation is necessary.
Understanding the value/reference distinction is important for reasons we'll go into below, and
while the
[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md) for
these two keywords is available, we'll take a hands-on approach to the topic.
# **const**
When a _value_ is guaranteed to be unchanging in your program (where "value" may be scalars,
`struct`s, etc.), you can declare it `const`. This tells the compiler that it's safe to treat the
value as never changing, and enables some interesting optimizations; not only is there no
initialization cost to creating the value (it is loaded at the same time as the executable parts of
your program), but the compiler can also copy the value around if it speeds up the code.
The points we need to address when talking about `const` are:
- `Const` values are stored in read-only memory - it's impossible to modify.
- Values resulting from calling a `const fn` are materialized at compile-time.
- The compiler may (or may not) copy `const` values wherever it chooses.
## Read-Only
The first point is a bit strange - "read-only memory."
[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants)
mentions in a couple places that using `mut` with constants is illegal, but it's also important to
demonstrate just how immutable they are. _Typically_ in Rust you can use
[interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) to modify
things that aren't declared `mut`.
[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an example of this
pattern in action:
```rust
use std::cell::RefCell;
fn my_mutator(cell: &RefCell<u8>) {
// Even though we're given an immutable reference,
// the `replace` method allows us to modify the inner value.
cell.replace(14);
}
fn main() {
let cell = RefCell::new(25);
// Prints out 25
println!("Cell: {:?}", cell);
my_mutator(&cell);
// Prints out 14
println!("Cell: {:?}", cell);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)
When `const` is involved though, interior mutability is impossible:
```rust
use std::cell::RefCell;
const CELL: RefCell<u8> = RefCell::new(25);
fn my_mutator(cell: &RefCell<u8>) {
cell.replace(14);
}
fn main() {
// First line prints 25 as expected
println!("Cell: {:?}", &CELL);
my_mutator(&CELL);
// Second line *still* prints 25
println!("Cell: {:?}", &CELL);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)
And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):
```rust
use std::sync::Once;
const SURPRISE: Once = Once::new();
fn main() {
// This is how `Once` is supposed to be used
SURPRISE.call_once(|| println!("Initializing..."));
// Because `Once` is a `const` value, we never record it
// having been initialized the first time, and this closure
// will also execute.
SURPRISE.call_once(|| println!("Initializing again???"));
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)
When the
[`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)
refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this
behavior is what they refer to. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this
as an error, but it's still something to be aware of.
## Initialization == Compilation
The next thing to mention is that `const` values are loaded into memory _as part of your program
binary_. Because of this, any `const` values declared in your program will be "realized" at
compile-time; accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may
be able to prefetch the value), but that's it.
```rust
use std::cell::RefCell;
const CELL: RefCell<u32> = RefCell::new(24);
pub fn multiply(value: u32) -> u32 {
// CELL is stored at `.L__unnamed_1`
value * (*CELL.get_mut())
}
```
-- [Compiler Explorer](https://godbolt.org/z/Th8boO)
The compiler creates one `RefCell`, uses it everywhere, and never needs to call the `RefCell::new`
function.
## Copying
If it's helpful though, the compiler can choose to copy `const` values.
```rust
const FACTOR: u32 = 1000;
pub fn multiply(value: u32) -> u32 {
// See assembly line 4 for the `mov edi, 1000` instruction
value * FACTOR
}
pub fn multiply_twice(value: u32) -> u32 {
// See assembly lines 22 and 29 for `mov edi, 1000` instructions
value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/ZtS54X)
In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the
`multiply` and `multiply_twice` functions; the "1000" value is never "stored" anywhere, as it's
small enough to inline into the assembly instructions.
Finally, getting the address of a `const` value is possible, but not guaranteed to be unique
(because the compiler can choose to copy values). I was unable to get non-unique pointers in my
testing (even using different crates), but the specifications are clear enough: _don't rely on
pointers to `const` values being consistent_. To be frank, caring about locations for `const` values
is almost certainly a code smell.
# **static**
Static variables are related to `const` variables, but take a slightly different approach. When we
declare that a _reference_ is unique for the life of a program, you have a `static` variable
(unrelated to the `'static` lifetime). Because of the reference/value distinction with
`const`/`static`, static variables behave much more like typical "global" variables.
But to understand `static`, here's what we'll look at:
- `static` variables are globally unique locations in memory.
- Like `const`, `static` variables are loaded at the same time as your program being read into
memory.
- All `static` variables must implement the
[`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) marker trait.
- Interior mutability is safe and acceptable when using `static` variables.
## Memory Uniqueness
The single biggest difference between `const` and `static` is the guarantees provided about
uniqueness. Where `const` variables may or may not be copied in code, `static` variables are
guarantee to be unique. If we take a previous `const` example and change it to `static`, the
difference should be clear:
```rust
static FACTOR: u32 = 1000;
pub fn multiply(value: u32) -> u32 {
// The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
value * FACTOR
}
pub fn multiply_twice(value: u32) -> u32 {
// The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ)
Where [previously](#copying) there were plenty of references to multiplying by 1000, the new
assembly refers to `FACTOR` as a named memory location instead. No initialization work needs to be
done, but the compiler can no longer prove the value never changes during execution.
## Initialization == Compilation
Next, let's talk about initialization. The simplest case is initializing static variables with
either scalar or struct notation:
```rust
#[derive(Debug)]
struct MyStruct {
x: u32
}
static MY_STRUCT: MyStruct = MyStruct {
// You can even reference other statics
// declared later
x: MY_VAL
};
static MY_VAL: u32 = 24;
fn main() {
println!("Static MyStruct: {:?}", MY_STRUCT);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
Things can get a bit weirder when using `const fn` though. In most cases, it just works:
```rust
#[derive(Debug)]
struct MyStruct {
x: u32
}
impl MyStruct {
const fn new() -> MyStruct {
MyStruct { x: 24 }
}
}
static MY_STRUCT: MyStruct = MyStruct::new();
fn main() {
println!("const fn Static MyStruct: {:?}", MY_STRUCT);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
However, there's a caveat: you're currently not allowed to use `const fn` to initialize static
variables of types that aren't marked `Sync`. For example,
[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) is a
`const fn`, but because
[`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), you'll
get an error at compile time:
```rust
use std::cell::RefCell;
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
static MY_LOCK: RefCell<u8> = RefCell::new(0);
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
It's likely that this will
[change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.
## **Sync**
Which leads well to the next point: static variable types must implement the
[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). Because they're globally
unique, it must be safe for you to access static variables from any thread at any time. Most
`struct` definitions automatically implement the `Sync` trait because they contain only elements
which themselves implement `Sync` (read more in the
[Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)). This is why earlier examples could
get away with initializing statics, even though we never included an `impl Sync for MyStruct` in the
code. To demonstrate this property, Rust refuses to compile our earlier example if we add a
non-`Sync` element to the `struct` definition:
```rust
use std::cell::RefCell;
struct MyStruct {
x: u32,
y: RefCell<u8>,
}
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
static MY_STRUCT: MyStruct = MyStruct {
x: 8,
y: RefCell::new(8)
};
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
## Interior Mutability
Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. If we
want to stay in `safe` Rust, we can use interior mutability to accomplish similar goals:
```rust
use std::sync::Once;
// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once
static INIT: Once = Once::new();
fn main() {
// Note that while `INIT` is declared immutable, we're still allowed
// to mutate its interior
INIT.call_once(|| println!("Initializing..."));
// This code won't panic, as the interior of INIT was modified
// as part of the previous `call_once`
INIT.call_once(|| panic!("INIT was called twice!"));
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)

View File

@ -0,0 +1,339 @@
---
slug: 2019/02/the-whole-world
title: "Allocations in Rust: Global memory"
date: 2019-02-05 12:00:00
authors: [bspeice]
tags: []
---
The first memory type we'll look at is pretty special: when Rust can prove that a _value_ is fixed
for the life of a program (`const`), and when a _reference_ is unique for the life of a program
(`static` as a declaration, not
[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) as a
lifetime), we can make use of global memory. This special section of data is embedded directly in
the program binary so that variables are ready to go once the program loads; no additional
computation is necessary.
Understanding the value/reference distinction is important for reasons we'll go into below, and
while the
[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md) for
these two keywords is available, we'll take a hands-on approach to the topic.
<!-- truncate -->
## `const` values
When a _value_ is guaranteed to be unchanging in your program (where "value" may be scalars,
`struct`s, etc.), you can declare it `const`. This tells the compiler that it's safe to treat the
value as never changing, and enables some interesting optimizations; not only is there no
initialization cost to creating the value (it is loaded at the same time as the executable parts of
your program), but the compiler can also copy the value around if it speeds up the code.
The points we need to address when talking about `const` are:
- `Const` values are stored in read-only memory - it's impossible to modify.
- Values resulting from calling a `const fn` are materialized at compile-time.
- The compiler may (or may not) copy `const` values wherever it chooses.
### Read-Only
The first point is a bit strange - "read-only memory."
[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants)
mentions in a couple places that using `mut` with constants is illegal, but it's also important to
demonstrate just how immutable they are. _Typically_ in Rust you can use
[interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) to modify
things that aren't declared `mut`.
[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an example of this
pattern in action:
```rust
use std::cell::RefCell;
fn my_mutator(cell: &RefCell<u8>) {
// Even though we're given an immutable reference,
// the `replace` method allows us to modify the inner value.
cell.replace(14);
}
fn main() {
let cell = RefCell::new(25);
// Prints out 25
println!("Cell: {:?}", cell);
my_mutator(&cell);
// Prints out 14
println!("Cell: {:?}", cell);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)
When `const` is involved though, interior mutability is impossible:
```rust
use std::cell::RefCell;
const CELL: RefCell<u8> = RefCell::new(25);
fn my_mutator(cell: &RefCell<u8>) {
cell.replace(14);
}
fn main() {
// First line prints 25 as expected
println!("Cell: {:?}", &CELL);
my_mutator(&CELL);
// Second line *still* prints 25
println!("Cell: {:?}", &CELL);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)
And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):
```rust
use std::sync::Once;
const SURPRISE: Once = Once::new();
fn main() {
// This is how `Once` is supposed to be used
SURPRISE.call_once(|| println!("Initializing..."));
// Because `Once` is a `const` value, we never record it
// having been initialized the first time, and this closure
// will also execute.
SURPRISE.call_once(|| println!("Initializing again???"));
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)
When the
[`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)
refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this
behavior is what they refer to. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this
as an error, but it's still something to be aware of.
### Initialization
The next thing to mention is that `const` values are loaded into memory _as part of your program
binary_. Because of this, any `const` values declared in your program will be "realized" at
compile-time; accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may
be able to prefetch the value), but that's it.
```rust
use std::cell::RefCell;
const CELL: RefCell<u32> = RefCell::new(24);
pub fn multiply(value: u32) -> u32 {
// CELL is stored at `.L__unnamed_1`
value * (*CELL.get_mut())
}
```
-- [Compiler Explorer](https://godbolt.org/z/Th8boO)
The compiler creates one `RefCell`, uses it everywhere, and never needs to call the `RefCell::new`
function.
### Copying
If it's helpful though, the compiler can choose to copy `const` values.
```rust
const FACTOR: u32 = 1000;
pub fn multiply(value: u32) -> u32 {
// See assembly line 4 for the `mov edi, 1000` instruction
value * FACTOR
}
pub fn multiply_twice(value: u32) -> u32 {
// See assembly lines 22 and 29 for `mov edi, 1000` instructions
value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/ZtS54X)
In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the
`multiply` and `multiply_twice` functions; the "1000" value is never "stored" anywhere, as it's
small enough to inline into the assembly instructions.
Finally, getting the address of a `const` value is possible, but not guaranteed to be unique
(because the compiler can choose to copy values). I was unable to get non-unique pointers in my
testing (even using different crates), but the specifications are clear enough: _don't rely on
pointers to `const` values being consistent_. To be frank, caring about locations for `const` values
is almost certainly a code smell.
## `static` values
Static variables are related to `const` variables, but take a slightly different approach. When we
declare that a _reference_ is unique for the life of a program, you have a `static` variable
(unrelated to the `'static` lifetime). Because of the reference/value distinction with
`const`/`static`, static variables behave much more like typical "global" variables.
But to understand `static`, here's what we'll look at:
- `static` variables are globally unique locations in memory.
- Like `const`, `static` variables are loaded at the same time as your program being read into
memory.
- All `static` variables must implement the
[`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) marker trait.
- Interior mutability is safe and acceptable when using `static` variables.
### Memory Uniqueness
The single biggest difference between `const` and `static` is the guarantees provided about
uniqueness. Where `const` variables may or may not be copied in code, `static` variables are
guarantee to be unique. If we take a previous `const` example and change it to `static`, the
difference should be clear:
```rust
static FACTOR: u32 = 1000;
pub fn multiply(value: u32) -> u32 {
// The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
value * FACTOR
}
pub fn multiply_twice(value: u32) -> u32 {
// The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used
value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ)
Where [previously](#copying) there were plenty of references to multiplying by 1000, the new
assembly refers to `FACTOR` as a named memory location instead. No initialization work needs to be
done, but the compiler can no longer prove the value never changes during execution.
### Initialization
Next, let's talk about initialization. The simplest case is initializing static variables with
either scalar or struct notation:
```rust
#[derive(Debug)]
struct MyStruct {
x: u32
}
static MY_STRUCT: MyStruct = MyStruct {
// You can even reference other statics
// declared later
x: MY_VAL
};
static MY_VAL: u32 = 24;
fn main() {
println!("Static MyStruct: {:?}", MY_STRUCT);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
Things can get a bit weirder when using `const fn` though. In most cases, it just works:
```rust
#[derive(Debug)]
struct MyStruct {
x: u32
}
impl MyStruct {
const fn new() -> MyStruct {
MyStruct { x: 24 }
}
}
static MY_STRUCT: MyStruct = MyStruct::new();
fn main() {
println!("const fn Static MyStruct: {:?}", MY_STRUCT);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
However, there's a caveat: you're currently not allowed to use `const fn` to initialize static
variables of types that aren't marked `Sync`. For example,
[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) is a
`const fn`, but because
[`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), you'll
get an error at compile time:
```rust
use std::cell::RefCell;
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
static MY_LOCK: RefCell<u8> = RefCell::new(0);
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
It's likely that this will
[change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.
### The `Sync` marker
Which leads well to the next point: static variable types must implement the
[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). Because they're globally
unique, it must be safe for you to access static variables from any thread at any time. Most
`struct` definitions automatically implement the `Sync` trait because they contain only elements
which themselves implement `Sync` (read more in the
[Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)). This is why earlier examples could
get away with initializing statics, even though we never included an `impl Sync for MyStruct` in the
code. To demonstrate this property, Rust refuses to compile our earlier example if we add a
non-`Sync` element to the `struct` definition:
```rust
use std::cell::RefCell;
struct MyStruct {
x: u32,
y: RefCell<u8>,
}
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
static MY_STRUCT: MyStruct = MyStruct {
x: 8,
y: RefCell::new(8)
};
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
### Interior mutability
Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. If we
want to stay in `safe` Rust, we can use interior mutability to accomplish similar goals:
```rust
use std::sync::Once;
// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once
static INIT: Once = Once::new();
fn main() {
// Note that while `INIT` is declared immutable, we're still allowed
// to mutate its interior
INIT.call_once(|| println!("Initializing..."));
// This code won't panic, as the interior of INIT was modified
// as part of the previous `call_once`
INIT.call_once(|| panic!("INIT was called twice!"));
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)

View File

@ -0,0 +1,601 @@
---
layout: post
title: "Fixed Memory: Stacking Up"
description: "We don't need no allocator."
category:
tags: [rust, understanding-allocations]
---
`const` and `static` are perfectly fine, but it's relatively rare that we know at compile-time about
either values or references that will be the same for the duration of our program. Put another way,
it's not often the case that either you or your compiler knows how much memory your entire program
will ever need.
However, there are still some optimizations the compiler can do if it knows how much memory
individual functions will need. Specifically, the compiler can make use of "stack" memory (as
opposed to "heap" memory) which can be managed far faster in both the short- and long-term. When
requesting memory, the [`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) (<1
nanosecond on modern CPUs). Contrast that to heap memory which requires an allocator (specialized
software to track what memory is in use) to reserve space. When you're finished with stack memory,
the `pop` instruction runs in 1-3 cycles, as opposed to an allocator needing to worry about memory
fragmentation and other issues with the heap. All sorts of incredibly sophisticated techniques have
been used to design allocators:
- [Garbage Collection](<https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)>)
strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) (used in
[Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) and
[Reference counting](https://en.wikipedia.org/wiki/Reference_counting) (used in
[Python](https://docs.python.org/3/extending/extending.html#reference-counts))
- Thread-local structures to prevent locking the allocator in
[tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)
- Arena structures used in [jemalloc](http://jemalloc.net/), which
[until recently](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default)
was the primary allocator for Rust programs!
But no matter how fast your allocator is, the principle remains: the fastest allocator is the one
you never use. As such, we're not going to discuss how exactly the
[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), but
we'll focus instead on the conditions that enable the Rust compiler to use faster stack-based
allocation for variables.
So, **how do we know when Rust will or will not use stack allocation for objects we create?**
Looking at other languages, it's often easy to delineate between stack and heap. Managed memory
languages (Python, Java,
[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) place
everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/),
[HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize
some heap allocations away, but you should never assume it will happen. C makes things clear with
calls to special functions (like [malloc(3)](https://linux.die.net/man/3/malloc)) needed to access
heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though
modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii).
For Rust, we can summarize as follows: **stack allocation will be used for everything that doesn't
involve "smart pointers" and collections**. We'll skip over a precise definition of the term "smart
pointer" for now, and instead discuss what we should watch for to understand when stack and heap
memory regions are used:
1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) indicate
allocation of stack memory:
```rust
pub fn stack_alloc(x: u32) -> u32 {
// Space for `y` is allocated by subtracting from `rsp`,
// and then populated
let y = [1u8, 2, 3, 4];
// Space for `y` is deallocated by adding back to `rsp`
x
}
```
-- [Compiler Explorer](https://godbolt.org/z/5WSgc9)
2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to watch
for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened in the recent
past:
```rust
pub fn heap_alloc(x: usize) -> usize {
// Space for elements in a vector has to be allocated
// on the heap, and is then de-allocated once the
// vector goes out of scope
let y: Vec<u8> = Vec::with_capacity(x);
x
}
```
-- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)
<span style="font-size: .8em">Note: While the
[`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) is
[called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46),
the Rust standard library only defines `Drop` implementations for types that involve heap
allocation.</span>
3. If you don't want to inspect the assembly, use a custom allocator that's able to track and alert
when heap allocations occur. Crates like
[`alloc_counter`](https://crates.io/crates/alloc_counter) are designed for exactly this purpose.
With all that in mind, let's talk about situations in which we're guaranteed to use stack memory:
- Structs are created on the stack.
- Function arguments are passed on the stack, meaning the
[`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute) will
not change the memory region used.
- Enums and unions are stack-allocated.
- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are always stack-allocated.
- Closures capture their arguments on the stack.
- Generics will use stack allocation, even with dynamic dispatch.
- [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be
stack-allocated, and copying them will be done in stack memory.
- [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library are
stack-allocated even when iterating over heap-based collections.
# Structs
The simplest case comes first. When creating vanilla `struct` objects, we use stack memory to hold
their contents:
```rust
struct Point {
x: u64,
y: u64,
}
struct Line {
a: Point,
b: Point,
}
pub fn make_line() {
// `origin` is stored in the first 16 bytes of memory
// starting at location `rsp`
let origin = Point { x: 0, y: 0 };
// `point` makes up the next 16 bytes of memory
let point = Point { x: 1, y: 2 };
// When creating `ray`, we just move the content out of
// `origin` and `point` into the next 32 bytes of memory
let ray = Line { a: origin, b: point };
}
```
-- [Compiler Explorer](https://godbolt.org/z/vri9BE)
Note that while some extra-fancy instructions are used for memory manipulation in the assembly, the
`sub rsp, 64` instruction indicates we're still working with the stack.
# Function arguments
Have you ever wondered how functions communicate with each other? Like, once the variables are given
to you, everything's fine. But how do you "give" those variables to another function? How do you get
the results back afterward? The answer: the compiler arranges memory and assembly instructions using
a pre-determined [calling convention](http://llvm.org/docs/LangRef.html#calling-conventions). This
convention governs the rules around where arguments needed by a function will be located (either in
memory offsets relative to the stack pointer `rsp`, or in other registers), and where the results
can be found once the function has finished. And when multiple languages agree on what the calling
conventions are, you can do things like having [Go call Rust code](https://blog.filippo.io/rustgo/)!
Put simply: it's the compiler's job to figure out how to call other functions, and you can assume
that the compiler is good at its job.
We can see this in action using a simple example:
```rust
struct Point {
x: i64,
y: i64,
}
// We use integer division operations to keep
// the assembly clean, understanding the result
// isn't accurate.
fn distance(a: &Point, b: &Point) -> i64 {
// Immediately subtract from `rsp` the bytes needed
// to hold all the intermediate results - this is
// the stack allocation step
// The compiler used the `rdi` and `rsi` registers
// to pass our arguments, so read them in
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
// Do the actual math work
let x_pow = (x1 - x2) * (x1 - x2);
let y_pow = (y1 - y2) * (y1 - y2);
let squared = x_pow + y_pow;
squared / squared
// Our final result will be stored in the `rax` register
// so that our caller knows where to retrieve it.
// Finally, add back to `rsp` the stack memory that is
// now ready to be used by other functions.
}
pub fn total_distance() {
let start = Point { x: 1, y: 2 };
let middle = Point { x: 3, y: 4 };
let end = Point { x: 5, y: 6 };
let _dist_1 = distance(&start, &middle);
let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)
As a consequence of function arguments never using heap memory, we can also infer that functions
using the `#[inline]` attributes also do not heap allocate. But better than inferring, we can look
at the assembly to prove it:
```rust
struct Point {
x: i64,
y: i64,
}
// Note that there is no `distance` function in the assembly output,
// and the total line count goes from 229 with inlining off
// to 306 with inline on. Even still, no heap allocations occur.
#[inline(always)]
fn distance(a: &Point, b: &Point) -> i64 {
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
let x_pow = (a.x - b.x) * (a.x - b.x);
let y_pow = (a.y - b.y) * (a.y - b.y);
let squared = x_pow + y_pow;
squared / squared
}
pub fn total_distance() {
let start = Point { x: 1, y: 2 };
let middle = Point { x: 3, y: 4 };
let end = Point { x: 5, y: 6 };
let _dist_1 = distance(&start, &middle);
let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/30Sh66)
Finally, passing by value (arguments with type
[`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html)) and passing by reference (either
moving ownership or passing a pointer) may have slightly different layouts in assembly, but will
still use either stack memory or CPU registers:
```rust
pub struct Point {
x: i64,
y: i64,
}
// Moving values
pub fn distance_moved(a: Point, b: Point) -> i64 {
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
let x_pow = (x1 - x2) * (x1 - x2);
let y_pow = (y1 - y2) * (y1 - y2);
let squared = x_pow + y_pow;
squared / squared
}
// Borrowing values has two extra `mov` instructions on lines 21 and 22
pub fn distance_borrowed(a: &Point, b: &Point) -> i64 {
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
let x_pow = (x1 - x2) * (x1 - x2);
let y_pow = (y1 - y2) * (y1 - y2);
let squared = x_pow + y_pow;
squared / squared
}
```
-- [Compiler Explorer](https://godbolt.org/z/06hGiv)
# Enums
If you've ever worried that wrapping your types in
[`Option`](https://doc.rust-lang.org/stable/core/option/enum.Option.html) or
[`Result`](https://doc.rust-lang.org/stable/core/result/enum.Result.html) would finally make them
large enough that Rust decides to use heap allocation instead, fear no longer: `enum` and union
types don't use heap allocation:
```rust
enum MyEnum {
Small(u8),
Large(u64)
}
struct MyStruct {
x: MyEnum,
y: MyEnum,
}
pub fn enum_compare() {
let x = MyEnum::Small(0);
let y = MyEnum::Large(0);
let z = MyStruct { x, y };
let opt = Option::Some(z);
}
```
-- [Compiler Explorer](https://godbolt.org/z/HK7zBx)
Because the size of an `enum` is the size of its largest element plus a flag, the compiler can
predict how much memory is used no matter which variant of an enum is currently stored in a
variable. Thus, enums and unions have no need of heap allocation. There's unfortunately not a great
way to show this in assembly, so I'll instead point you to the
[`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums)
documentation.
# Arrays
The array type is guaranteed to be stack allocated, which is why the array size must be declared.
Interestingly enough, this can be used to cause safe Rust programs to crash:
```rust
// 256 bytes
#[derive(Default)]
struct TwoFiftySix {
_a: [u64; 32]
}
// 8 kilobytes
#[derive(Default)]
struct EightK {
_a: [TwoFiftySix; 32]
}
// 256 kilobytes
#[derive(Default)]
struct TwoFiftySixK {
_a: [EightK; 32]
}
// 8 megabytes - exceeds space typically provided for the stack,
// though the kernel can be instructed to allocate more.
// On Linux, you can check stack size using `ulimit -s`
#[derive(Default)]
struct EightM {
_a: [TwoFiftySixK; 32]
}
fn main() {
// Because we already have things in stack memory
// (like the current function call stack), allocating another
// eight megabytes of stack memory crashes the program
let _x = EightM::default();
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4)
There aren't any security implications of this (no memory corruption occurs), but it's good to note
that the Rust compiler won't move arrays into heap memory even if they can be reasonably expected to
overflow the stack.
# Closures
Rules for how anonymous functions capture their arguments are typically language-specific. In Java,
[Lambda Expressions](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html) are
actually objects created on the heap that capture local primitives by copying, and capture local
non-primitives as (`final`) references.
[Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and
[JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/)
both bind _everything_ by reference normally, but Python can also
[capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has
[Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).
In Rust, arguments to closures are the same as arguments to other functions; closures are simply
functions that don't have a declared name. Some weird ordering of the stack may be required to
handle them, but it's the compiler's responsiblity to figure that out.
Each example below has the same effect, but a different assembly implementation. In the simplest
case, we immediately run a closure returned by another function. Because we don't store a reference
to the closure, the stack memory needed to store the captured values is contiguous:
```rust
fn my_func() -> impl FnOnce() {
let x = 24;
// Note that this closure in assembly looks exactly like
// any other function; you even use the `call` instruction
// to start running it.
move || { x; }
}
pub fn immediate() {
my_func()();
my_func()();
}
```
-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions
If we store a reference to the closure, the Rust compiler keeps values it needs in the stack memory
of the original function. Getting the details right is a bit harder, so the instruction count goes
up even though this code is functionally equivalent to our original example:
```rust
pub fn simple_reference() {
let x = my_func();
let y = my_func();
y();
x();
}
```
-- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions
Even things like variable order can make a difference in instruction count:
```rust
pub fn complex() {
let x = my_func();
let y = my_func();
x();
y();
}
```
-- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions
In every circumstance though, the compiler ensured that no heap allocations were necessary.
# Generics
Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) and dynamic
dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often _associated_ with trait
objects being stored in the heap, dynamic dispatch can be used with stack allocated objects as well:
```rust
trait GetInt {
fn get_int(&self) -> u64;
}
// vtable stored at section L__unnamed_1
struct WhyNotU8 {
x: u8
}
impl GetInt for WhyNotU8 {
fn get_int(&self) -> u64 {
self.x as u64
}
}
// vtable stored at section L__unnamed_2
struct ActualU64 {
x: u64
}
impl GetInt for ActualU64 {
fn get_int(&self) -> u64 {
self.x
}
}
// `&dyn` declares that we want to use dynamic dispatch
// rather than monomorphization, so there is only one
// `retrieve_int` function that shows up in the final assembly.
// If we used generics, there would be one implementation of
// `retrieve_int` for each type that implements `GetInt`.
pub fn retrieve_int(u: &dyn GetInt) {
// In the assembly, we just call an address given to us
// in the `rsi` register and hope that it was set up
// correctly when this function was invoked.
let x = u.get_int();
}
pub fn do_call() {
// Note that even though the vtable for `WhyNotU8` and
// `ActualU64` includes a pointer to
// `core::ptr::real_drop_in_place`, it is never invoked.
let a = WhyNotU8 { x: 0 };
let b = ActualU64 { x: 0 };
retrieve_int(&a);
retrieve_int(&b);
}
```
-- [Compiler Explorer](https://godbolt.org/z/u_yguS)
It's hard to imagine practical situations where dynamic dispatch would be used for objects that
aren't heap allocated, but it technically can be done.
# Copy types
Understanding move semantics and copy semantics in Rust is weird at first. The Rust docs
[go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html) far better than can
be addressed here, so I'll leave them to do the job. From a memory perspective though, their
guideline is reasonable:
[if your type can implemement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).
While there are potential speed tradeoffs to _benchmark_ when discussing `Copy` (move semantics for
stack objects vs. copying stack pointers vs. copying stack `struct`s), _it's impossible for `Copy`
to introduce a heap allocation_.
But why is this the case? Fundamentally, it's because the language controls what `Copy` means -
["the behavior of `Copy` is not overloadable"](https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone)
because it's a marker trait. From there we'll note that a type
[can implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy)
if (and only if) its components implement `Copy`, and that
[no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors).
Thus, assignments involving heap types are always move semantics, and new heap allocations won't
occur because of implicit operator behavior.
```rust
#[derive(Clone)]
struct Cloneable {
x: Box<u64>
}
// error[E0204]: the trait `Copy` may not be implemented for this type
#[derive(Copy, Clone)]
struct NotCopyable {
x: Box<u64>
}
```
-- [Compiler Explorer](https://godbolt.org/z/VToRuK)
# Iterators
In managed memory languages (like
[Java](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)), there's a subtle
difference between these two code samples:
```java
public static int sum_for(List<Long> vals) {
long sum = 0;
// Regular for loop
for (int i = 0; i < vals.length; i++) {
sum += vals[i];
}
return sum;
}
public static int sum_foreach(List<Long> vals) {
long sum = 0;
// "Foreach" loop - uses iteration
for (Long l : vals) {
sum += l;
}
return sum;
}
```
In the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`, an object of type
[`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html)
is allocated on the heap, and will eventually be garbage-collected. This isn't a great design;
iterators are often transient objects that you need during a function and can discard once the
function ends. Sounds exactly like the issue stack-allocated objects address, no?
In Rust, iterators are allocated on the stack. The objects to iterate over are almost certainly in
heap memory, but the iterator itself
([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap. In
each of the examples below we iterate over a collection, but never use heap allocation:
```rust
use std::collections::HashMap;
// There's a lot of assembly generated, but if you search in the text,
// there are no references to `real_drop_in_place` anywhere.
pub fn sum_vec(x: &Vec<u32>) {
let mut s = 0;
// Basic iteration over vectors doesn't need allocation
for y in x {
s += y;
}
}
pub fn sum_enumerate(x: &Vec<u32>) {
let mut s = 0;
// More complex iterators are just fine too
for (_i, y) in x.iter().enumerate() {
s += y;
}
}
pub fn sum_hm(x: &HashMap<u32, u32>) {
let mut s = 0;
// And it's not just Vec, all types will allocate the iterator
// on stack memory
for y in x.values() {
s += y;
}
}
```
-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)

View File

@ -0,0 +1,604 @@
---
slug: 2019/02/stacking-up
title: "Allocations in Rust: Fixed memory"
date: 2019-02-06 12:00:00
authors: [bspeice]
tags: []
---
`const` and `static` are perfectly fine, but it's relatively rare that we know at compile-time about
either values or references that will be the same for the duration of our program. Put another way,
it's not often the case that either you or your compiler knows how much memory your entire program
will ever need.
However, there are still some optimizations the compiler can do if it knows how much memory
individual functions will need. Specifically, the compiler can make use of "stack" memory (as
opposed to "heap" memory) which can be managed far faster in both the short- and long-term.
<!-- truncate -->
When requesting memory, the [`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) (&lt;1ns
on modern CPUs). Contrast that to heap memory which requires an allocator (specialized
software to track what memory is in use) to reserve space. When you're finished with stack memory,
the `pop` instruction runs in 1-3 cycles, as opposed to an allocator needing to worry about memory
fragmentation and other issues with the heap. All sorts of incredibly sophisticated techniques have
been used to design allocators:
- [Garbage Collection](<https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)>)
strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) (used in
[Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) and
[Reference counting](https://en.wikipedia.org/wiki/Reference_counting) (used in
[Python](https://docs.python.org/3/extending/extending.html#reference-counts))
- Thread-local structures to prevent locking the allocator in
[tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)
- Arena structures used in [jemalloc](http://jemalloc.net/), which
[until recently](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default)
was the primary allocator for Rust programs!
But no matter how fast your allocator is, the principle remains: the fastest allocator is the one
you never use. As such, we're not going to discuss how exactly the
[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), but
we'll focus instead on the conditions that enable the Rust compiler to use faster stack-based
allocation for variables.
So, **how do we know when Rust will or will not use stack allocation for objects we create?**
Looking at other languages, it's often easy to delineate between stack and heap. Managed memory
languages (Python, Java,
[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) place
everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/),
[HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize
some heap allocations away, but you should never assume it will happen. C makes things clear with
calls to special functions (like [malloc(3)](https://linux.die.net/man/3/malloc)) needed to access
heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though
modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii).
For Rust, we can summarize as follows: **stack allocation will be used for everything that doesn't
involve "smart pointers" and collections**. We'll skip over a precise definition of the term "smart
pointer" for now, and instead discuss what we should watch for to understand when stack and heap
memory regions are used:
1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) indicate
allocation of stack memory:
```rust
pub fn stack_alloc(x: u32) -> u32 {
// Space for `y` is allocated by subtracting from `rsp`,
// and then populated
let y = [1u8, 2, 3, 4];
// Space for `y` is deallocated by adding back to `rsp`
x
}
```
-- [Compiler Explorer](https://godbolt.org/z/5WSgc9)
2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to watch
for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened in the recent
past:
```rust
pub fn heap_alloc(x: usize) -> usize {
// Space for elements in a vector has to be allocated
// on the heap, and is then de-allocated once the
// vector goes out of scope
let y: Vec<u8> = Vec::with_capacity(x);
x
}
```
-- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)
<small>Note: While the
[`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) is
[called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46),
the Rust standard library only defines `Drop` implementations for types that involve heap
allocation.</small>
3. If you don't want to inspect the assembly, use a custom allocator that's able to track and alert
when heap allocations occur. Crates like
[`alloc_counter`](https://crates.io/crates/alloc_counter) are designed for exactly this purpose.
With all that in mind, let's talk about situations in which we're guaranteed to use stack memory:
- Structs are created on the stack.
- Function arguments are passed on the stack, meaning the
[`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute) will
not change the memory region used.
- Enums and unions are stack-allocated.
- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are always stack-allocated.
- Closures capture their arguments on the stack.
- Generics will use stack allocation, even with dynamic dispatch.
- [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be
stack-allocated, and copying them will be done in stack memory.
- [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library are
stack-allocated even when iterating over heap-based collections.
## Structs
The simplest case comes first. When creating vanilla `struct` objects, we use stack memory to hold
their contents:
```rust
struct Point {
x: u64,
y: u64,
}
struct Line {
a: Point,
b: Point,
}
pub fn make_line() {
// `origin` is stored in the first 16 bytes of memory
// starting at location `rsp`
let origin = Point { x: 0, y: 0 };
// `point` makes up the next 16 bytes of memory
let point = Point { x: 1, y: 2 };
// When creating `ray`, we just move the content out of
// `origin` and `point` into the next 32 bytes of memory
let ray = Line { a: origin, b: point };
}
```
-- [Compiler Explorer](https://godbolt.org/z/vri9BE)
Note that while some extra-fancy instructions are used for memory manipulation in the assembly, the
`sub rsp, 64` instruction indicates we're still working with the stack.
## Function arguments
Have you ever wondered how functions communicate with each other? Like, once the variables are given
to you, everything's fine. But how do you "give" those variables to another function? How do you get
the results back afterward? The answer: the compiler arranges memory and assembly instructions using
a pre-determined [calling convention](http://llvm.org/docs/LangRef.html#calling-conventions). This
convention governs the rules around where arguments needed by a function will be located (either in
memory offsets relative to the stack pointer `rsp`, or in other registers), and where the results
can be found once the function has finished. And when multiple languages agree on what the calling
conventions are, you can do things like having [Go call Rust code](https://blog.filippo.io/rustgo/)!
Put simply: it's the compiler's job to figure out how to call other functions, and you can assume
that the compiler is good at its job.
We can see this in action using a simple example:
```rust
struct Point {
x: i64,
y: i64,
}
// We use integer division operations to keep
// the assembly clean, understanding the result
// isn't accurate.
fn distance(a: &Point, b: &Point) -> i64 {
// Immediately subtract from `rsp` the bytes needed
// to hold all the intermediate results - this is
// the stack allocation step
// The compiler used the `rdi` and `rsi` registers
// to pass our arguments, so read them in
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
// Do the actual math work
let x_pow = (x1 - x2) * (x1 - x2);
let y_pow = (y1 - y2) * (y1 - y2);
let squared = x_pow + y_pow;
squared / squared
// Our final result will be stored in the `rax` register
// so that our caller knows where to retrieve it.
// Finally, add back to `rsp` the stack memory that is
// now ready to be used by other functions.
}
pub fn total_distance() {
let start = Point { x: 1, y: 2 };
let middle = Point { x: 3, y: 4 };
let end = Point { x: 5, y: 6 };
let _dist_1 = distance(&start, &middle);
let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)
As a consequence of function arguments never using heap memory, we can also infer that functions
using the `#[inline]` attributes also do not heap allocate. But better than inferring, we can look
at the assembly to prove it:
```rust
struct Point {
x: i64,
y: i64,
}
// Note that there is no `distance` function in the assembly output,
// and the total line count goes from 229 with inlining off
// to 306 with inline on. Even still, no heap allocations occur.
#[inline(always)]
fn distance(a: &Point, b: &Point) -> i64 {
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
let x_pow = (a.x - b.x) * (a.x - b.x);
let y_pow = (a.y - b.y) * (a.y - b.y);
let squared = x_pow + y_pow;
squared / squared
}
pub fn total_distance() {
let start = Point { x: 1, y: 2 };
let middle = Point { x: 3, y: 4 };
let end = Point { x: 5, y: 6 };
let _dist_1 = distance(&start, &middle);
let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/30Sh66)
Finally, passing by value (arguments with type
[`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html)) and passing by reference (either
moving ownership or passing a pointer) may have slightly different layouts in assembly, but will
still use either stack memory or CPU registers:
```rust
pub struct Point {
x: i64,
y: i64,
}
// Moving values
pub fn distance_moved(a: Point, b: Point) -> i64 {
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
let x_pow = (x1 - x2) * (x1 - x2);
let y_pow = (y1 - y2) * (y1 - y2);
let squared = x_pow + y_pow;
squared / squared
}
// Borrowing values has two extra `mov` instructions on lines 21 and 22
pub fn distance_borrowed(a: &Point, b: &Point) -> i64 {
let x1 = a.x;
let x2 = b.x;
let y1 = a.y;
let y2 = b.y;
let x_pow = (x1 - x2) * (x1 - x2);
let y_pow = (y1 - y2) * (y1 - y2);
let squared = x_pow + y_pow;
squared / squared
}
```
-- [Compiler Explorer](https://godbolt.org/z/06hGiv)
## Enums
If you've ever worried that wrapping your types in
[`Option`](https://doc.rust-lang.org/stable/core/option/enum.Option.html) or
[`Result`](https://doc.rust-lang.org/stable/core/result/enum.Result.html) would finally make them
large enough that Rust decides to use heap allocation instead, fear no longer: `enum` and union
types don't use heap allocation:
```rust
enum MyEnum {
Small(u8),
Large(u64)
}
struct MyStruct {
x: MyEnum,
y: MyEnum,
}
pub fn enum_compare() {
let x = MyEnum::Small(0);
let y = MyEnum::Large(0);
let z = MyStruct { x, y };
let opt = Option::Some(z);
}
```
-- [Compiler Explorer](https://godbolt.org/z/HK7zBx)
Because the size of an `enum` is the size of its largest element plus a flag, the compiler can
predict how much memory is used no matter which variant of an enum is currently stored in a
variable. Thus, enums and unions have no need of heap allocation. There's unfortunately not a great
way to show this in assembly, so I'll instead point you to the
[`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums)
documentation.
## Arrays
The array type is guaranteed to be stack allocated, which is why the array size must be declared.
Interestingly enough, this can be used to cause safe Rust programs to crash:
```rust
// 256 bytes
#[derive(Default)]
struct TwoFiftySix {
_a: [u64; 32]
}
// 8 kilobytes
#[derive(Default)]
struct EightK {
_a: [TwoFiftySix; 32]
}
// 256 kilobytes
#[derive(Default)]
struct TwoFiftySixK {
_a: [EightK; 32]
}
// 8 megabytes - exceeds space typically provided for the stack,
// though the kernel can be instructed to allocate more.
// On Linux, you can check stack size using `ulimit -s`
#[derive(Default)]
struct EightM {
_a: [TwoFiftySixK; 32]
}
fn main() {
// Because we already have things in stack memory
// (like the current function call stack), allocating another
// eight megabytes of stack memory crashes the program
let _x = EightM::default();
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4)
There aren't any security implications of this (no memory corruption occurs), but it's good to note
that the Rust compiler won't move arrays into heap memory even if they can be reasonably expected to
overflow the stack.
## Closures
Rules for how anonymous functions capture their arguments are typically language-specific. In Java,
[Lambda Expressions](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html) are
actually objects created on the heap that capture local primitives by copying, and capture local
non-primitives as (`final`) references.
[Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and
[JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/)
both bind _everything_ by reference normally, but Python can also
[capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has
[Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).
In Rust, arguments to closures are the same as arguments to other functions; closures are simply
functions that don't have a declared name. Some weird ordering of the stack may be required to
handle them, but it's the compiler's responsiblity to figure that out.
Each example below has the same effect, but a different assembly implementation. In the simplest
case, we immediately run a closure returned by another function. Because we don't store a reference
to the closure, the stack memory needed to store the captured values is contiguous:
```rust
fn my_func() -> impl FnOnce() {
let x = 24;
// Note that this closure in assembly looks exactly like
// any other function; you even use the `call` instruction
// to start running it.
move || { x; }
}
pub fn immediate() {
my_func()();
my_func()();
}
```
-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions
If we store a reference to the closure, the Rust compiler keeps values it needs in the stack memory
of the original function. Getting the details right is a bit harder, so the instruction count goes
up even though this code is functionally equivalent to our original example:
```rust
pub fn simple_reference() {
let x = my_func();
let y = my_func();
y();
x();
}
```
-- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions
Even things like variable order can make a difference in instruction count:
```rust
pub fn complex() {
let x = my_func();
let y = my_func();
x();
y();
}
```
-- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions
In every circumstance though, the compiler ensured that no heap allocations were necessary.
## Generics
Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) and dynamic
dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often _associated_ with trait
objects being stored in the heap, dynamic dispatch can be used with stack allocated objects as well:
```rust
trait GetInt {
fn get_int(&self) -> u64;
}
// vtable stored at section L__unnamed_1
struct WhyNotU8 {
x: u8
}
impl GetInt for WhyNotU8 {
fn get_int(&self) -> u64 {
self.x as u64
}
}
// vtable stored at section L__unnamed_2
struct ActualU64 {
x: u64
}
impl GetInt for ActualU64 {
fn get_int(&self) -> u64 {
self.x
}
}
// `&dyn` declares that we want to use dynamic dispatch
// rather than monomorphization, so there is only one
// `retrieve_int` function that shows up in the final assembly.
// If we used generics, there would be one implementation of
// `retrieve_int` for each type that implements `GetInt`.
pub fn retrieve_int(u: &dyn GetInt) {
// In the assembly, we just call an address given to us
// in the `rsi` register and hope that it was set up
// correctly when this function was invoked.
let x = u.get_int();
}
pub fn do_call() {
// Note that even though the vtable for `WhyNotU8` and
// `ActualU64` includes a pointer to
// `core::ptr::real_drop_in_place`, it is never invoked.
let a = WhyNotU8 { x: 0 };
let b = ActualU64 { x: 0 };
retrieve_int(&a);
retrieve_int(&b);
}
```
-- [Compiler Explorer](https://godbolt.org/z/u_yguS)
It's hard to imagine practical situations where dynamic dispatch would be used for objects that
aren't heap allocated, but it technically can be done.
## Copy types
Understanding move semantics and copy semantics in Rust is weird at first. The Rust docs
[go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html) far better than can
be addressed here, so I'll leave them to do the job. From a memory perspective though, their
guideline is reasonable:
[if your type can implemement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).
While there are potential speed tradeoffs to _benchmark_ when discussing `Copy` (move semantics for
stack objects vs. copying stack pointers vs. copying stack `struct`s), _it's impossible for `Copy`
to introduce a heap allocation_.
But why is this the case? Fundamentally, it's because the language controls what `Copy` means -
["the behavior of `Copy` is not overloadable"](https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone)
because it's a marker trait. From there we'll note that a type
[can implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy)
if (and only if) its components implement `Copy`, and that
[no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors).
Thus, assignments involving heap types are always move semantics, and new heap allocations won't
occur because of implicit operator behavior.
```rust
#[derive(Clone)]
struct Cloneable {
x: Box<u64>
}
// error[E0204]: the trait `Copy` may not be implemented for this type
#[derive(Copy, Clone)]
struct NotCopyable {
x: Box<u64>
}
```
-- [Compiler Explorer](https://godbolt.org/z/VToRuK)
## Iterators
In managed memory languages (like
[Java](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)), there's a subtle
difference between these two code samples:
```java
public static int sum_for(List<Long> vals) {
long sum = 0;
// Regular for loop
for (int i = 0; i < vals.length; i++) {
sum += vals[i];
}
return sum;
}
public static int sum_foreach(List<Long> vals) {
long sum = 0;
// "Foreach" loop - uses iteration
for (Long l : vals) {
sum += l;
}
return sum;
}
```
In the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`, an object of type
[`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html)
is allocated on the heap, and will eventually be garbage-collected. This isn't a great design;
iterators are often transient objects that you need during a function and can discard once the
function ends. Sounds exactly like the issue stack-allocated objects address, no?
In Rust, iterators are allocated on the stack. The objects to iterate over are almost certainly in
heap memory, but the iterator itself
([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap. In
each of the examples below we iterate over a collection, but never use heap allocation:
```rust
use std::collections::HashMap;
// There's a lot of assembly generated, but if you search in the text,
// there are no references to `real_drop_in_place` anywhere.
pub fn sum_vec(x: &Vec<u32>) {
let mut s = 0;
// Basic iteration over vectors doesn't need allocation
for y in x {
s += y;
}
}
pub fn sum_enumerate(x: &Vec<u32>) {
let mut s = 0;
// More complex iterators are just fine too
for (_i, y) in x.iter().enumerate() {
s += y;
}
}
pub fn sum_hm(x: &HashMap<u32, u32>) {
let mut s = 0;
// And it's not just Vec, all types will allocate the iterator
// on stack memory
for y in x.values() {
s += y;
}
}
```
-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)

View File

@ -0,0 +1,254 @@
---
layout: post
title: "Dynamic Memory: A Heaping Helping"
description: "The reason Rust exists."
category:
tags: [rust, understanding-allocations]
---
Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), and
some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust, how
the language uses dynamic memory (also referred to as the **heap**) is a system called _ownership_.
And as the docs mention, ownership
[is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).
The heap is used in two situations; when the compiler is unable to predict either the _total size of
memory needed_, or _how long the memory is needed for_, it allocates space in the heap. This happens
pretty frequently; if you want to download the Google home page, you won't know how large it is
until your program runs. And when you're finished with Google, we deallocate the memory so it can be
used to store other webpages. If you're interested in a slightly longer explanation of the heap,
check out
[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap)
in Rust's documentation.
We won't go into detail on how the heap is managed; the
[ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) does a
phenomenal job explaining both the "why" and "how" of memory management. Instead, we're going to
focus on understanding "when" heap allocations occur in Rust.
To start off, take a guess for how many allocations happen in the program below:
```rust
fn main() {}
```
It's obviously a trick question; while no heap allocations occur as a result of that code, the setup
needed to call `main` does allocate on the heap. Here's a way to show it:
```rust
#![feature(integer_atomics)]
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};
static ALLOCATION_COUNT: AtomicU64 = AtomicU64::new(0);
struct CountingAllocator;
unsafe impl GlobalAlloc for CountingAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst);
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
System.dealloc(ptr, layout);
}
}
#[global_allocator]
static A: CountingAllocator = CountingAllocator;
fn main() {
let x = ALLOCATION_COUNT.fetch_add(0, Ordering::SeqCst);
println!("There were {} allocations before calling main!", x);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e)
As of the time of writing, there are five allocations that happen before `main` is ever called.
But when we want to understand more practically where heap allocation happens, we'll follow this
guide:
- Smart pointers hold their contents in the heap
- Collections are smart pointers for many objects at a time, and reallocate when they need to grow
Finally, there are two "addendum" issues that are important to address when discussing Rust and the
heap:
- Non-heap alternatives to many standard library types are available.
- Special allocators to track memory behavior should be used to benchmark code.
# Smart pointers
The first thing to note are the "smart pointer" types. When you have data that must outlive the
scope in which it is declared, or your data is of unknown or dynamic size, you'll make use of these
types.
The term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer) comes from C++, and while it's
closely linked to a general design pattern of
["Resource Acquisition Is Initialization"](https://en.cppreference.com/w/cpp/language/raii), we'll
use it here specifically to describe objects that are responsible for managing ownership of data
allocated on the heap. The smart pointers available in the `alloc` crate should look mostly
familiar:
- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)
The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers to manage
heap objects, though more than can be covered here. Some examples are:
- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)
- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)
Finally, there is one ["gotcha"](https://www.merriam-webster.com/dictionary/gotcha): **cell types**
(like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) look and behave
similarly, but **don't involve heap allocation**. The
[`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html) have more information.
When a smart pointer is created, the data it is given is placed in heap memory and the location of
that data is recorded in the smart pointer. Once the smart pointer has determined it's safe to
deallocate that memory (when a `Box` has
[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or a reference count
[goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), the heap space is reclaimed. We can
prove these types use heap memory by looking at code:
```rust
use std::rc::Rc;
use std::sync::Arc;
use std::borrow::Cow;
pub fn my_box() {
// Drop at assembly line 1640
Box::new(0);
}
pub fn my_rc() {
// Drop at assembly line 1650
Rc::new(0);
}
pub fn my_arc() {
// Drop at assembly line 1660
Arc::new(0);
}
pub fn my_cow() {
// Drop at assembly line 1672
Cow::from("drop");
}
```
-- [Compiler Explorer](https://godbolt.org/z/4AMQug)
# Collections
Collection types use heap memory because their contents have dynamic size; they will request more
memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can
[release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) when it's
no longer necessary. This dynamic property forces Rust to heap allocate everything they contain. In
a way, **collections are smart pointers for many objects at a time**. Common types that fall under
this umbrella are [`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html),
[`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and
[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) (not
[`str`](https://doc.rust-lang.org/std/primitive.str.html)).
While collections store the objects they own in heap memory, _creating new collections will not
allocate on the heap_. This is a bit weird; if we call `Vec::new()`, the assembly shows a
corresponding call to `real_drop_in_place`:
```rust
pub fn my_vec() {
// Drop in place at line 481
Vec::<u8>::new();
}
```
-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)
But because the vector has no elements to manage, no calls to the allocator will ever be dispatched:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};
fn main() {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// Interesting bit happens here
let x: Vec<u8> = Vec::new();
drop(x);
// Turn panicking back off, some deallocations occur
// after main as well.
DO_PANIC.store(false, Ordering::SeqCst);
}
#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;
unsafe impl GlobalAlloc for PanicAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected allocation.");
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
}
System.dealloc(ptr, layout);
}
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6)
Other standard library types follow the same behavior; make sure to check out
[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new),
and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new).
# Heap Alternatives
While it is a bit strange to speak of the stack after spending time with the heap, it's worth
pointing out that some heap-allocated objects in Rust have stack-based counterparts provided by
other crates. If you have need of the functionality, but want to avoid allocating, there are
typically alternatives available.
When it comes to some standard library smart pointers
([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and
[`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives are
provided in crates like [parking_lot](https://crates.io/crates/parking_lot) and
[spin](https://crates.io/crates/spin). You can check out
[`lock_api::RwLock`](https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLock.html),
[`lock_api::Mutex`](https://docs.rs/lock_api/0.1.5/lock_api/struct.Mutex.html), and
[`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html) if you're in need
of synchronization primitives.
[thread_id](https://crates.io/crates/thread-id) may be necessary if you're implementing an allocator
because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) uses a
[`thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#17-36)
that needs heap allocation.
# Tracing Allocators
When writing performance-sensitive code, there's no alternative to measuring your code. If you
didn't write a benchmark,
[you don't care about it's performance](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=263)
You should never rely on your instincts when
[a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM).
Similarly, there's great work going on in Rust with allocators that keep track of what they're doing
(like [`alloc_counter`](https://crates.io/crates/alloc_counter)). When it comes to tracking heap
behavior, it's easy to make mistakes; please write tests and make sure you have tools to guard
against future issues.

View File

@ -0,0 +1,258 @@
---
slug: 2019/02/a-heaping-helping
title: "Allocations in Rust: Dynamic memory"
date: 2019-02-07 12:00:00
authors: [bspeice]
tags: []
---
Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), and
some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust, how
the language uses dynamic memory (also referred to as the **heap**) is a system called _ownership_.
And as the docs mention, ownership
[is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).
The heap is used in two situations; when the compiler is unable to predict either the _total size of
memory needed_, or _how long the memory is needed for_, it allocates space in the heap.
<!-- truncate -->
This happens
pretty frequently; if you want to download the Google home page, you won't know how large it is
until your program runs. And when you're finished with Google, we deallocate the memory so it can be
used to store other webpages. If you're interested in a slightly longer explanation of the heap,
check out
[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap)
in Rust's documentation.
We won't go into detail on how the heap is managed; the
[ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) does a
phenomenal job explaining both the "why" and "how" of memory management. Instead, we're going to
focus on understanding "when" heap allocations occur in Rust.
To start off, take a guess for how many allocations happen in the program below:
```rust
fn main() {}
```
It's obviously a trick question; while no heap allocations occur as a result of that code, the setup
needed to call `main` does allocate on the heap. Here's a way to show it:
```rust
#![feature(integer_atomics)]
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};
static ALLOCATION_COUNT: AtomicU64 = AtomicU64::new(0);
struct CountingAllocator;
unsafe impl GlobalAlloc for CountingAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst);
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
System.dealloc(ptr, layout);
}
}
#[global_allocator]
static A: CountingAllocator = CountingAllocator;
fn main() {
let x = ALLOCATION_COUNT.fetch_add(0, Ordering::SeqCst);
println!("There were {} allocations before calling main!", x);
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e)
As of the time of writing, there are five allocations that happen before `main` is ever called.
But when we want to understand more practically where heap allocation happens, we'll follow this
guide:
- Smart pointers hold their contents in the heap
- Collections are smart pointers for many objects at a time, and reallocate when they need to grow
Finally, there are two "addendum" issues that are important to address when discussing Rust and the
heap:
- Non-heap alternatives to many standard library types are available.
- Special allocators to track memory behavior should be used to benchmark code.
## Smart pointers
The first thing to note are the "smart pointer" types. When you have data that must outlive the
scope in which it is declared, or your data is of unknown or dynamic size, you'll make use of these
types.
The term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer) comes from C++, and while it's
closely linked to a general design pattern of
["Resource Acquisition Is Initialization"](https://en.cppreference.com/w/cpp/language/raii), we'll
use it here specifically to describe objects that are responsible for managing ownership of data
allocated on the heap. The smart pointers available in the `alloc` crate should look mostly
familiar:
- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)
The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers to manage
heap objects, though more than can be covered here. Some examples are:
- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)
- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)
Finally, there is one ["gotcha"](https://www.merriam-webster.com/dictionary/gotcha): **cell types**
(like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) look and behave
similarly, but **don't involve heap allocation**. The
[`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html) have more information.
When a smart pointer is created, the data it is given is placed in heap memory and the location of
that data is recorded in the smart pointer. Once the smart pointer has determined it's safe to
deallocate that memory (when a `Box` has
[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or a reference count
[goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), the heap space is reclaimed. We can
prove these types use heap memory by looking at code:
```rust
use std::rc::Rc;
use std::sync::Arc;
use std::borrow::Cow;
pub fn my_box() {
// Drop at assembly line 1640
Box::new(0);
}
pub fn my_rc() {
// Drop at assembly line 1650
Rc::new(0);
}
pub fn my_arc() {
// Drop at assembly line 1660
Arc::new(0);
}
pub fn my_cow() {
// Drop at assembly line 1672
Cow::from("drop");
}
```
-- [Compiler Explorer](https://godbolt.org/z/4AMQug)
## Collections
Collection types use heap memory because their contents have dynamic size; they will request more
memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can
[release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) when it's
no longer necessary. This dynamic property forces Rust to heap allocate everything they contain. In
a way, **collections are smart pointers for many objects at a time**. Common types that fall under
this umbrella are [`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html),
[`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and
[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) (not
[`str`](https://doc.rust-lang.org/std/primitive.str.html)).
While collections store the objects they own in heap memory, _creating new collections will not
allocate on the heap_. This is a bit weird; if we call `Vec::new()`, the assembly shows a
corresponding call to `real_drop_in_place`:
```rust
pub fn my_vec() {
// Drop in place at line 481
Vec::<u8>::new();
}
```
-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)
But because the vector has no elements to manage, no calls to the allocator will ever be dispatched:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};
fn main() {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// Interesting bit happens here
let x: Vec<u8> = Vec::new();
drop(x);
// Turn panicking back off, some deallocations occur
// after main as well.
DO_PANIC.store(false, Ordering::SeqCst);
}
#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;
unsafe impl GlobalAlloc for PanicAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected allocation.");
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
}
System.dealloc(ptr, layout);
}
}
```
--
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6)
Other standard library types follow the same behavior; make sure to check out
[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new),
and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new).
## Heap Alternatives
While it is a bit strange to speak of the stack after spending time with the heap, it's worth
pointing out that some heap-allocated objects in Rust have stack-based counterparts provided by
other crates. If you have need of the functionality, but want to avoid allocating, there are
typically alternatives available.
When it comes to some standard library smart pointers
([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and
[`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives are
provided in crates like [parking_lot](https://crates.io/crates/parking_lot) and
[spin](https://crates.io/crates/spin). You can check out
[`lock_api::RwLock`](https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLock.html),
[`lock_api::Mutex`](https://docs.rs/lock_api/0.1.5/lock_api/struct.Mutex.html), and
[`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html) if you're in need
of synchronization primitives.
[thread_id](https://crates.io/crates/thread-id) may be necessary if you're implementing an allocator
because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) uses a
[`thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#17-36)
that needs heap allocation.
## Tracing Allocators
When writing performance-sensitive code, there's no alternative to measuring your code. If you
didn't write a benchmark,
[you don't care about it's performance](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=263)
You should never rely on your instincts when
[a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM).
Similarly, there's great work going on in Rust with allocators that keep track of what they're doing
(like [`alloc_counter`](https://crates.io/crates/alloc_counter)). When it comes to tracking heap
behavior, it's easy to make mistakes; please write tests and make sure you have tools to guard
against future issues.

View File

@ -0,0 +1,148 @@
---
layout: post
title: "Compiler Optimizations: What It's Done Lately"
description: "A lot. The answer is a lot."
category:
tags: [rust, understanding-allocations]
---
**Update 2019-02-10**: When debugging a
[related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1), it was discovered that the
original code worked because LLVM optimized out the entire function, rather than just the allocation
segments. The code has been updated with proper use of
[`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html), and a previous section
on vector capacity has been removed.
---
Up to this point, we've been discussing memory usage in the Rust language by focusing on simple
rules that are mostly right for small chunks of code. We've spent time showing how those rules work
themselves out in practice, and become familiar with reading the assembly code needed to see each
memory type (global, stack, heap) in action.
Throughout the series so far, we've put a handicap on the code. In the name of consistent and
understandable results, we've asked the compiler to pretty please leave the training wheels on. Now
is the time where we throw out all the rules and take off the kid gloves. As it turns out, both the
Rust compiler and the LLVM optimizers are incredibly sophisticated, and we'll step back and let them
do their job.
Similar to
["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we're
focusing on interesting things the Rust language (and LLVM!) can do with memory management. We'll
still be looking at assembly code to understand what's going on, but it's important to mention
again: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to
double-check memory behavior if it's something you care about**. It's far too easy to mis-read
assembly in large code sections, you should always verify behavior if you care about memory usage.
The guiding principal as we move forward is this: _optimizing compilers won't produce worse programs
than we started with._ There won't be any situations where stack allocations get moved to heap
allocations. There will, however, be an opera of optimization.
# The Case of the Disappearing Box
Our first optimization comes when LLVM can reason that the lifetime of an object is sufficiently
short that heap allocations aren't necessary. In these cases, LLVM will move the allocation to the
stack instead! The way this interacts with `#[inline]` attributes is a bit opaque, but the important
part is that LLVM can sometimes do better than the baseline Rust language:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};
pub fn cmp(x: u32) {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// The compiler is able to see through the constant `Box`
// and directly compare `x` to 24 - assembly line 73
let y = Box::new(24);
let equals = x == *y;
// This call to drop is eliminated
drop(y);
// Need to mark the comparison result as volatile so that
// LLVM doesn't strip out all the code. If `y` is marked
// volatile instead, allocation will be forced.
unsafe { std::ptr::read_volatile(&equals) };
// Turn off panicking, as there are some deallocations
// when we exit main.
DO_PANIC.store(false, Ordering::SeqCst);
}
fn main() {
cmp(12)
}
#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;
unsafe impl GlobalAlloc for PanicAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected allocation.");
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
}
System.dealloc(ptr, layout);
}
}
```
## -- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)
# Dr. Array or: How I Learned to Love the Optimizer
Finally, this isn't so much about LLVM figuring out different memory behavior, but LLVM stripping
out code that doesn't do anything. Optimizations of this type have a lot of nuance to them; if
you're not careful, they can make your benchmarks look
[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). In Rust, the
`black_box` function (implemented in both
[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and
[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler
to disable this kind of optimization. But if you let LLVM remove unnecessary code, you can end up
running programs that previously caused errors:
```rust
#[derive(Default)]
struct TwoFiftySix {
_a: [u64; 32]
}
#[derive(Default)]
struct EightK {
_a: [TwoFiftySix; 32]
}
#[derive(Default)]
struct TwoFiftySixK {
_a: [EightK; 32]
}
#[derive(Default)]
struct EightM {
_a: [TwoFiftySixK; 32]
}
pub fn main() {
// Normally this blows up because we can't reserve size on stack
// for the `EightM` struct. But because the compiler notices we
// never do anything with `_x`, it optimizes out the stack storage
// and the program completes successfully.
let _x = EightM::default();
}
```
## -- [Compiler Explorer](https://godbolt.org/z/daHn7P)
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)

View File

@ -0,0 +1,149 @@
---
title: "Allocations in Rust: Compiler optimizations"
description: "A lot. The answer is a lot."
date: 2019-02-08 12:00:00
last_updated:
date: 2019-02-10 12:00:00
tags: []
---
Up to this point, we've been discussing memory usage in the Rust language by focusing on simple
rules that are mostly right for small chunks of code. We've spent time showing how those rules work
themselves out in practice, and become familiar with reading the assembly code needed to see each
memory type (global, stack, heap) in action.
Throughout the series so far, we've put a handicap on the code. In the name of consistent and
understandable results, we've asked the compiler to pretty please leave the training wheels on. Now
is the time where we throw out all the rules and take off the kid gloves. As it turns out, both the
Rust compiler and the LLVM optimizers are incredibly sophisticated, and we'll step back and let them
do their job.
<!-- truncate -->
Similar to
["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we're
focusing on interesting things the Rust language (and LLVM!) can do with memory management. We'll
still be looking at assembly code to understand what's going on, but it's important to mention
again: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to
double-check memory behavior if it's something you care about**. It's far too easy to mis-read
assembly in large code sections, you should always verify behavior if you care about memory usage.
The guiding principal as we move forward is this: _optimizing compilers won't produce worse programs
than we started with._ There won't be any situations where stack allocations get moved to heap
allocations. There will, however, be an opera of optimization.
**Update 2019-02-10**: When debugging a
[related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1), it was discovered that the
original code worked because LLVM optimized out the entire function, rather than just the allocation
segments. The code has been updated with proper use of
[`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html), and a previous section
on vector capacity has been removed.
## The Case of the Disappearing Box
Our first optimization comes when LLVM can reason that the lifetime of an object is sufficiently
short that heap allocations aren't necessary. In these cases, LLVM will move the allocation to the
stack instead! The way this interacts with `#[inline]` attributes is a bit opaque, but the important
part is that LLVM can sometimes do better than the baseline Rust language:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};
pub fn cmp(x: u32) {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// The compiler is able to see through the constant `Box`
// and directly compare `x` to 24 - assembly line 73
let y = Box::new(24);
let equals = x == *y;
// This call to drop is eliminated
drop(y);
// Need to mark the comparison result as volatile so that
// LLVM doesn't strip out all the code. If `y` is marked
// volatile instead, allocation will be forced.
unsafe { std::ptr::read_volatile(&equals) };
// Turn off panicking, as there are some deallocations
// when we exit main.
DO_PANIC.store(false, Ordering::SeqCst);
}
fn main() {
cmp(12)
}
#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;
unsafe impl GlobalAlloc for PanicAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected allocation.");
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
}
System.dealloc(ptr, layout);
}
}
```
-- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)
## Dr. Array or: how I learned to love the optimizer
Finally, this isn't so much about LLVM figuring out different memory behavior, but LLVM stripping
out code that doesn't do anything. Optimizations of this type have a lot of nuance to them; if
you're not careful, they can make your benchmarks look
[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). In Rust, the
`black_box` function (implemented in both
[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and
[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler
to disable this kind of optimization. But if you let LLVM remove unnecessary code, you can end up
running programs that previously caused errors:
```rust
#[derive(Default)]
struct TwoFiftySix {
_a: [u64; 32]
}
#[derive(Default)]
struct EightK {
_a: [TwoFiftySix; 32]
}
#[derive(Default)]
struct TwoFiftySixK {
_a: [EightK; 32]
}
#[derive(Default)]
struct EightM {
_a: [TwoFiftySixK; 32]
}
pub fn main() {
// Normally this blows up because we can't reserve size on stack
// for the `EightM` struct. But because the compiler notices we
// never do anything with `_x`, it optimizes out the stack storage
// and the program completes successfully.
let _x = EightM::default();
}
```
-- [Compiler Explorer](https://godbolt.org/z/daHn7P)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)

View File

@ -0,0 +1,35 @@
---
layout: post
title: "Summary: What are the Allocation Rules?"
description: "A synopsis and reference."
category:
tags: [rust, understanding-allocations]
---
While there's a lot of interesting detail captured in this series, it's often helpful to have a
document that answers some "yes/no" questions. You may not care about what an `Iterator` looks like
in assembly, you just need to know whether it allocates an object on the heap or not. And while Rust
will prioritize the fastest behavior it can, here are the rules for each memory type:
**Heap Allocation**:
- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.
- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory.
- Some smart pointers in the standard library have counterparts in other crates that don't need heap
memory. If possible, use those.
**Stack Allocation**:
- Everything not using a smart pointer will be allocated on the stack.
- Structs, enums, iterators, arrays, and closures are all stack allocated.
- Cell types (`RefCell`) behave like smart pointers, but are stack-allocated.
- Inlining (`#[inline]`) will not affect allocation behavior for better or worse.
- Types that are marked `Copy` are guaranteed to have their contents stack-allocated.
**Global Allocation**:
- `const` is a fixed value; the compiler is allowed to copy it wherever useful.
- `static` is a fixed reference; the compiler will guarantee it is unique.
![Container Sizes in Rust](/assets/images/2019-02-04-container-size.svg) --
[Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing)

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 426 KiB

View File

@ -0,0 +1,39 @@
---
slug: 2019/02/summary
title: "Allocations in Rust: Summary"
date: 2019-02-09 12:00:00
authors: [bspeice]
tags: []
---
While there's a lot of interesting detail captured in this series, it's often helpful to have a
document that answers some "yes/no" questions. You may not care about what an `Iterator` looks like
in assembly, you just need to know whether it allocates an object on the heap or not. And while Rust
will prioritize the fastest behavior it can, here are the rules for each memory type:
<!-- truncate -->
**Global Allocation**:
- `const` is a fixed value; the compiler is allowed to copy it wherever useful.
- `static` is a fixed reference; the compiler will guarantee it is unique.
**Stack Allocation**:
- Everything not using a smart pointer will be allocated on the stack.
- Structs, enums, iterators, arrays, and closures are all stack allocated.
- Cell types (`RefCell`) behave like smart pointers, but are stack-allocated.
- Inlining (`#[inline]`) will not affect allocation behavior for better or worse.
- Types that are marked `Copy` are guaranteed to have their contents stack-allocated.
**Heap Allocation**:
- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.
- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory.
- Some smart pointers in the standard library have counterparts in other crates that don't need heap
memory. If possible, use those.
![Container Sizes in Rust](./container-size.svg)
-- [Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing)