mirror of
https://github.com/bspeice/speice.io
synced 2024-12-22 16:48:10 -05:00
149 lines
5.6 KiB
Markdown
149 lines
5.6 KiB
Markdown
|
---
|
||
|
layout: post
|
||
|
title: "Compiler Optimizations: What It's Done Lately"
|
||
|
description: "A lot. The answer is a lot."
|
||
|
category:
|
||
|
tags: [rust, understanding-allocations]
|
||
|
---
|
||
|
|
||
|
**Update 2019-02-10**: When debugging a
|
||
|
[related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1), it was discovered that the
|
||
|
original code worked because LLVM optimized out the entire function, rather than just the allocation
|
||
|
segments. The code has been updated with proper use of
|
||
|
[`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html), and a previous section
|
||
|
on vector capacity has been removed.
|
||
|
|
||
|
---
|
||
|
|
||
|
Up to this point, we've been discussing memory usage in the Rust language by focusing on simple
|
||
|
rules that are mostly right for small chunks of code. We've spent time showing how those rules work
|
||
|
themselves out in practice, and become familiar with reading the assembly code needed to see each
|
||
|
memory type (global, stack, heap) in action.
|
||
|
|
||
|
Throughout the series so far, we've put a handicap on the code. In the name of consistent and
|
||
|
understandable results, we've asked the compiler to pretty please leave the training wheels on. Now
|
||
|
is the time where we throw out all the rules and take off the kid gloves. As it turns out, both the
|
||
|
Rust compiler and the LLVM optimizers are incredibly sophisticated, and we'll step back and let them
|
||
|
do their job.
|
||
|
|
||
|
Similar to
|
||
|
["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we're
|
||
|
focusing on interesting things the Rust language (and LLVM!) can do with memory management. We'll
|
||
|
still be looking at assembly code to understand what's going on, but it's important to mention
|
||
|
again: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to
|
||
|
double-check memory behavior if it's something you care about**. It's far too easy to mis-read
|
||
|
assembly in large code sections, you should always verify behavior if you care about memory usage.
|
||
|
|
||
|
The guiding principal as we move forward is this: _optimizing compilers won't produce worse programs
|
||
|
than we started with._ There won't be any situations where stack allocations get moved to heap
|
||
|
allocations. There will, however, be an opera of optimization.
|
||
|
|
||
|
# The Case of the Disappearing Box
|
||
|
|
||
|
Our first optimization comes when LLVM can reason that the lifetime of an object is sufficiently
|
||
|
short that heap allocations aren't necessary. In these cases, LLVM will move the allocation to the
|
||
|
stack instead! The way this interacts with `#[inline]` attributes is a bit opaque, but the important
|
||
|
part is that LLVM can sometimes do better than the baseline Rust language:
|
||
|
|
||
|
```rust
|
||
|
use std::alloc::{GlobalAlloc, Layout, System};
|
||
|
use std::sync::atomic::{AtomicBool, Ordering};
|
||
|
|
||
|
pub fn cmp(x: u32) {
|
||
|
// Turn on panicking if we allocate on the heap
|
||
|
DO_PANIC.store(true, Ordering::SeqCst);
|
||
|
|
||
|
// The compiler is able to see through the constant `Box`
|
||
|
// and directly compare `x` to 24 - assembly line 73
|
||
|
let y = Box::new(24);
|
||
|
let equals = x == *y;
|
||
|
|
||
|
// This call to drop is eliminated
|
||
|
drop(y);
|
||
|
|
||
|
// Need to mark the comparison result as volatile so that
|
||
|
// LLVM doesn't strip out all the code. If `y` is marked
|
||
|
// volatile instead, allocation will be forced.
|
||
|
unsafe { std::ptr::read_volatile(&equals) };
|
||
|
|
||
|
// Turn off panicking, as there are some deallocations
|
||
|
// when we exit main.
|
||
|
DO_PANIC.store(false, Ordering::SeqCst);
|
||
|
}
|
||
|
|
||
|
fn main() {
|
||
|
cmp(12)
|
||
|
}
|
||
|
|
||
|
#[global_allocator]
|
||
|
static A: PanicAllocator = PanicAllocator;
|
||
|
static DO_PANIC: AtomicBool = AtomicBool::new(false);
|
||
|
struct PanicAllocator;
|
||
|
|
||
|
unsafe impl GlobalAlloc for PanicAllocator {
|
||
|
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
|
||
|
if DO_PANIC.load(Ordering::SeqCst) {
|
||
|
panic!("Unexpected allocation.");
|
||
|
}
|
||
|
System.alloc(layout)
|
||
|
}
|
||
|
|
||
|
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
|
||
|
if DO_PANIC.load(Ordering::SeqCst) {
|
||
|
panic!("Unexpected deallocation.");
|
||
|
}
|
||
|
System.dealloc(ptr, layout);
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
## -- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)
|
||
|
|
||
|
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)
|
||
|
|
||
|
# Dr. Array or: How I Learned to Love the Optimizer
|
||
|
|
||
|
Finally, this isn't so much about LLVM figuring out different memory behavior, but LLVM stripping
|
||
|
out code that doesn't do anything. Optimizations of this type have a lot of nuance to them; if
|
||
|
you're not careful, they can make your benchmarks look
|
||
|
[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). In Rust, the
|
||
|
`black_box` function (implemented in both
|
||
|
[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and
|
||
|
[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler
|
||
|
to disable this kind of optimization. But if you let LLVM remove unnecessary code, you can end up
|
||
|
running programs that previously caused errors:
|
||
|
|
||
|
```rust
|
||
|
#[derive(Default)]
|
||
|
struct TwoFiftySix {
|
||
|
_a: [u64; 32]
|
||
|
}
|
||
|
|
||
|
#[derive(Default)]
|
||
|
struct EightK {
|
||
|
_a: [TwoFiftySix; 32]
|
||
|
}
|
||
|
|
||
|
#[derive(Default)]
|
||
|
struct TwoFiftySixK {
|
||
|
_a: [EightK; 32]
|
||
|
}
|
||
|
|
||
|
#[derive(Default)]
|
||
|
struct EightM {
|
||
|
_a: [TwoFiftySixK; 32]
|
||
|
}
|
||
|
|
||
|
pub fn main() {
|
||
|
// Normally this blows up because we can't reserve size on stack
|
||
|
// for the `EightM` struct. But because the compiler notices we
|
||
|
// never do anything with `_x`, it optimizes out the stack storage
|
||
|
// and the program completes successfully.
|
||
|
let _x = EightM::default();
|
||
|
}
|
||
|
```
|
||
|
|
||
|
## -- [Compiler Explorer](https://godbolt.org/z/daHn7P)
|
||
|
|
||
|
[Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)
|