mirror of
https://github.com/bspeice/speice.io
synced 2025-01-13 03:00:03 -05:00
Finish up static
and start on stack
This commit is contained in:
parent
00b8d05387
commit
f41bcf37dc
@ -75,7 +75,7 @@ Now let's address some conditions and caveats before going much further:
|
|||||||
- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation API's
|
- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation API's
|
||||||
(think the [libc] and [winapi] implementations of [malloc]) that we'll ignore.
|
(think the [libc] and [winapi] implementations of [malloc]) that we'll ignore.
|
||||||
- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`)
|
- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`)
|
||||||
and address (hehe) "release" mode at the end (`cargo run --release` and `cargo test --release`).
|
and address (pun intended) "release" mode at the end (`cargo run --release` and `cargo test --release`).
|
||||||
- All content will be run using Rust 1.31, as that's the highest currently supported in the
|
- All content will be run using Rust 1.31, as that's the highest currently supported in the
|
||||||
[Compiler Exporer](https://godbolt.org/). As such, we'll avoid talking about things like
|
[Compiler Exporer](https://godbolt.org/). As such, we'll avoid talking about things like
|
||||||
[compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
|
[compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
|
||||||
@ -257,7 +257,7 @@ pub fn multiply_twice(value: u32) -> u32 {
|
|||||||
value * FACTOR * FACTOR
|
value * FACTOR * FACTOR
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
-- [Compiler Explorer](https://godbolt.org/z/Qc7tHM)
|
-- [Compiler Explorer](https://odbolt.org/z/Qc7tHM)
|
||||||
|
|
||||||
In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction
|
In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction
|
||||||
in both the `multiply` and `multiply_twice` functions; the "1000" value is never
|
in both the `multiply` and `multiply_twice` functions; the "1000" value is never
|
||||||
@ -282,15 +282,215 @@ and `thread_local!()` macros later.
|
|||||||
|
|
||||||
More generally, `static` variables are globally unique locations in memory,
|
More generally, `static` variables are globally unique locations in memory,
|
||||||
the contents of which are loaded as part of your program being read into main memory.
|
the contents of which are loaded as part of your program being read into main memory.
|
||||||
They allow initialization with both raw values and (most) `const fn` calls, and must
|
They allow initialization with both raw values and `const fn` calls, and the initial
|
||||||
implement the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html)
|
value is loaded along with the program/library binary. All static variables must
|
||||||
marker trait. The initial value is loaded along with the program/library binary,
|
be of a type that implements the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html)
|
||||||
though it can change during the time your program is running. While `static mut`
|
marker trait. And while `static mut` variables are allowed, mutating a static is considered
|
||||||
variables are allowed, mutating a static is considered an `unsafe` operation.
|
an `unsafe` operation.
|
||||||
Unlike `const` though, interior mutability is accepted, and can be done with `safe` code.
|
|
||||||
|
The single biggest difference between `const` and `static` is the guarantees
|
||||||
|
provided about uniqueness. Where `const` variables may or may not be copied
|
||||||
|
in code, `static` variables are guarantee to be unique. If we take a previous
|
||||||
|
`const` example and change it to `static`, the difference should be clear:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
static FACTOR: u32 = 1000;
|
||||||
|
|
||||||
|
pub fn multiply(value: u32) -> u32 {
|
||||||
|
value * FACTOR
|
||||||
|
}
|
||||||
|
|
||||||
|
pub fn multiply_twice(value: u32) -> u32 {
|
||||||
|
value * FACTOR * FACTOR
|
||||||
|
}
|
||||||
|
```
|
||||||
|
-- [Compiler Explorer](https://godbolt.org/z/MGBr5Y)
|
||||||
|
|
||||||
|
Where [previously](https://godbolt.org/z/MGBr5Y) there were plenty of
|
||||||
|
references to multiplying by 1000, the new assembly refers to `FACTOR`
|
||||||
|
as a named memory location instead. No initialization work needs to be done,
|
||||||
|
but the compiler can no longer prove the value never changes during execution.
|
||||||
|
|
||||||
|
Next, let's talk about initialization. The simplest case is initializing
|
||||||
|
static variables with either scalar or struct notation:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
#[derive(Debug)]
|
||||||
|
struct MyStruct {
|
||||||
|
x: u32
|
||||||
|
}
|
||||||
|
|
||||||
|
static MY_STRUCT: MyStruct = MyStruct {
|
||||||
|
// You can even reference other statics
|
||||||
|
// declared later
|
||||||
|
x: MY_VAL
|
||||||
|
};
|
||||||
|
|
||||||
|
static MY_VAL: u32 = 24;
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
println!("Static MyStruct: {:?}", MY_STRUCT);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
|
||||||
|
|
||||||
|
Things get a bit weirder when using `const fn`. In most cases, things just work:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
#[derive(Debug)]
|
||||||
|
struct MyStruct {
|
||||||
|
x: u32
|
||||||
|
}
|
||||||
|
|
||||||
|
impl MyStruct {
|
||||||
|
const fn new() -> MyStruct {
|
||||||
|
MyStruct { x: 24 }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
static MY_STRUCT: MyStruct = MyStruct::new();
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
println!("const fn Static MyStruct: {:?}", MY_STRUCT);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
|
||||||
|
|
||||||
|
However, there's a caveat: you're currently not allowed to use `const fn` to initialize
|
||||||
|
static variables of types that aren't marked `Sync`. As an example, even though
|
||||||
|
[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new)
|
||||||
|
is `const fn`, because [`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync),
|
||||||
|
you'll get an error at compile time:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use std::cell::RefCell;
|
||||||
|
|
||||||
|
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
|
||||||
|
static MY_LOCK: RefCell<u8> = RefCell::new(0);
|
||||||
|
```
|
||||||
|
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
|
||||||
|
|
||||||
|
It's likely that this will [change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though,
|
||||||
|
so be on the lookout.
|
||||||
|
|
||||||
|
Which leads well to the next point: static variable types must implement the
|
||||||
|
[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html).
|
||||||
|
Because they're globally unique, it must be safe for you to access static variables
|
||||||
|
from any thread at any time. Most `struct` definitions automatically implement the
|
||||||
|
`Sync` trait because they contain only elements which themselves
|
||||||
|
implement `Sync`. This is why earlier examples could get away with initializing
|
||||||
|
statics, even though we never included an `impl Sync for MyStruct` in the code.
|
||||||
|
For more on the `Sync` trait, the [Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)
|
||||||
|
has a much more thorough treatment. But as an example, Rust refuses to compile
|
||||||
|
our earlier example if we add a non-`Sync` element to the `struct` definition:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use std::cell::RefCell;
|
||||||
|
|
||||||
|
struct MyStruct {
|
||||||
|
x: u32,
|
||||||
|
y: RefCell<u8>,
|
||||||
|
}
|
||||||
|
|
||||||
|
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
|
||||||
|
static MY_STRUCT: MyStruct = MyStruct {
|
||||||
|
x: 8,
|
||||||
|
y: RefCell::new(8)
|
||||||
|
};
|
||||||
|
```
|
||||||
|
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
|
||||||
|
|
||||||
|
Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation.
|
||||||
|
Unlike `const` however, interior mutability is acceptable. To demonstrate:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use std::sync::Once;
|
||||||
|
|
||||||
|
// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once
|
||||||
|
static INIT: Once = Once::new();
|
||||||
|
|
||||||
|
fn main() {
|
||||||
|
// Note that while `INIT` is declared immutable, we're still allowed
|
||||||
|
// to mutate its interior
|
||||||
|
INIT.call_once(|| println!("Initializing..."));
|
||||||
|
// This code won't panic, as the interior of INIT was modified
|
||||||
|
// as part of the previous `call_once`
|
||||||
|
INIT.call_once(|| panic!("INIT was called twice!"));
|
||||||
|
}
|
||||||
|
```
|
||||||
|
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)
|
||||||
|
|
||||||
## **push** and **pop**: Stack Allocations
|
## **push** and **pop**: Stack Allocations
|
||||||
|
|
||||||
|
**const** and **static** are perfectly fine, but it's very rare that we know
|
||||||
|
at compile-time about either references or values that will be the same for the entire
|
||||||
|
time our program is running. Put another way, it's not often the case that either you
|
||||||
|
or your compiler know how much memory your entire program will need.
|
||||||
|
|
||||||
|
However, there are still some optimizations the compiler can do if it knows how much
|
||||||
|
memory individual functions will need. Specifically, the compiler can make use of
|
||||||
|
"stack" memory (as opposed to "heap" memory) which can be managed far faster in
|
||||||
|
both the short- and long-term. When requesting memory, the
|
||||||
|
[`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
|
||||||
|
can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods)
|
||||||
|
(<1 nanosecond on modern CPUs). Heap memory instead requires using an allocator
|
||||||
|
(specialized software to track what memory is in use) to reserve space.
|
||||||
|
And when you're finished with memory, the `pop` instruction likewise runs in
|
||||||
|
1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation
|
||||||
|
and other issues. All sorts of incredibly sophisticated techniques have been used
|
||||||
|
to design allocators:
|
||||||
|
- [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science))
|
||||||
|
strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection)
|
||||||
|
(used in [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html))
|
||||||
|
and [Reference counting](https://en.wikipedia.org/wiki/Reference_counting)
|
||||||
|
(used in [Python](https://docs.python.org/3/extending/extending.html#reference-counts))
|
||||||
|
- Thread-local structures to prevent locking the allocator in [tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)
|
||||||
|
- Arena structures used in [jemalloc](http://jemalloc.net/), which until recently
|
||||||
|
was the primary allocator for Rust programs!
|
||||||
|
|
||||||
|
But no matter how sophisticated your allocator is, the principle remains: the
|
||||||
|
fastest allocator is the one you never use. As such, we're not going to go
|
||||||
|
in detail on how exactly the
|
||||||
|
[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html),
|
||||||
|
and we'll focus instead on the conditions that enable the Rust compiler to use
|
||||||
|
stack-based allocation for variables.
|
||||||
|
|
||||||
|
Now, one question I hope you're asking is "how do we distinguish stack- and
|
||||||
|
heap-based allocations in Rust code?" There are three strategies I'm going
|
||||||
|
to use for this:
|
||||||
|
|
||||||
|
1. Any time the `push` or `pop` instructions are used, or the `rsp` register is modified,
|
||||||
|
this is a stack allocation:
|
||||||
|
```rust
|
||||||
|
pub fn stack_alloc(x: u32) -> u32 {
|
||||||
|
// Space for `y` is allocated by subtracting from `rsp`,
|
||||||
|
// and then populated
|
||||||
|
let y = [1u8, 2, 3, 4];
|
||||||
|
// Space for `y` is deallocated by adding back to `rsp`
|
||||||
|
x
|
||||||
|
}
|
||||||
|
```
|
||||||
|
-- [Compiler Explorer](https://godbolt.org/z/gKFOgB)
|
||||||
|
2. Any time `call core::ptr::drop_in_place` occurs, a heap allocation has occurred
|
||||||
|
sometime in the past and it is now time for us to de-allocate the memory:
|
||||||
|
```rust
|
||||||
|
pub fn heap_alloc(x: usize) -> usize {
|
||||||
|
// Space for elements in a vector has to be allocated
|
||||||
|
// on the heap, and is then de-allocated once the
|
||||||
|
// vector goes out of scope
|
||||||
|
let y: Vec<u8> = Vec::with_capacity(x);
|
||||||
|
x
|
||||||
|
}
|
||||||
|
```
|
||||||
|
-- [Compiler Explorer](https://godbolt.org/z/T2xoh8) (`drop_in_place` happens on line 1321)
|
||||||
|
3. Using a special [`GlobalAlloc`](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html)
|
||||||
|
implementation to track when heap allocations occur. For this post, I'll be using
|
||||||
|
[qadapt](https://crates.io/crates/qadapt) to trigger a panic if heap allocations
|
||||||
|
occur; code that doesn't panic doesn't use heap allocations, and by necessity
|
||||||
|
uses stack allocation instead.
|
||||||
|
|
||||||
|
With all that in mind, let's talk about how to use the stack in Rust.
|
||||||
|
|
||||||
Example: Why doesn't `Vec::new()` go to the allocator?
|
Example: Why doesn't `Vec::new()` go to the allocator?
|
||||||
|
|
||||||
Questions:
|
Questions:
|
||||||
@ -304,6 +504,8 @@ Questions:
|
|||||||
7. Legal to pass an array as an argument?
|
7. Legal to pass an array as an argument?
|
||||||
8. Can you force a heap allocation with arrays that are larger than stack size?
|
8. Can you force a heap allocation with arrays that are larger than stack size?
|
||||||
- Check `ulimit -s`
|
- Check `ulimit -s`
|
||||||
|
9. Can you force heap allocation by returning something that escapes the stack?
|
||||||
|
- Will `#[inline(always)]` move this back to a stack allocation?
|
||||||
|
|
||||||
# Piling On - Rust and the Heap
|
# Piling On - Rust and the Heap
|
||||||
|
|
||||||
@ -322,7 +524,9 @@ Questions:
|
|||||||
|
|
||||||
# Compiler Optimizations Make Everything Complicated
|
# Compiler Optimizations Make Everything Complicated
|
||||||
|
|
||||||
Example: Compiler stripping out allocations of Box<>, Vec::push()
|
1. Box<> getting inlined into stack allocations
|
||||||
|
2. Vec::push() === Vec::with_capacity() for fixed/predictable capacities
|
||||||
|
3. Inlining statics that don't change value
|
||||||
|
|
||||||
# Appendix and Further Reading
|
# Appendix and Further Reading
|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user