---
layout: post
title: "Compiler Optimizations: What It's Done Lately"
description: "A lot. The answer is a lot."
category:
tags: [rust, understanding-allocations]
---

**Update 2019-02-10**: When debugging a [related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1),
it was discovered that the original code worked because LLVM optimized out
the entire function, rather than just the allocation segments.
The code has been updated with proper use of [`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html),
and a previous section on vector capacity has been removed.

---

Up to this point, we've been discussing memory usage in the Rust language
by focusing on simple rules that are mostly right for small chunks of code.
We've spent time showing how those rules work themselves out in practice,
and become familiar with reading the assembly code needed to see each memory
type (global, stack, heap) in action.

Throughout the series so far, we've put a handicap on the code.
In the name of consistent and understandable results, we've asked the
compiler to pretty please leave the training wheels on. Now is the time
where we throw out all the rules and take off the kid gloves. As it turns out,
both the Rust compiler and the LLVM optimizers are incredibly sophisticated,
and we'll step back and let them do their job.

Similar to ["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4),
we're focusing on interesting things the Rust language (and LLVM!) can do
with memory management. We'll still be looking at assembly code to
understand what's going on, but it's important to mention again:
**please use automated tools like
[alloc-counter](https://crates.io/crates/alloc_counter) to double-check
memory behavior if it's something you care about**.
It's far too easy to misread assembly in large code sections, so you should
always verify behavior if you care about memory usage.

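
As a rough illustration of what such tools do under the hood, here is a minimal
sketch of a counting allocator built only on the standard library. It is not
the `alloc_counter` API: the names `CountingAllocator`, `ALLOCATIONS`, and
`allocations_during` are made up for this example, and `std::hint::black_box`
is used only to keep the optimizer from removing the `Box`.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical allocator for illustration: forwards to the system
// allocator and counts every call to `alloc`.
struct CountingAllocator;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::SeqCst);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static A: CountingAllocator = CountingAllocator;

// Run `f` and report how many heap allocations were observed meanwhile.
pub fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCATIONS.load(Ordering::SeqCst);
    f();
    ALLOCATIONS.load(Ordering::SeqCst) - before
}

fn main() {
    let n = allocations_during(|| {
        // `black_box` makes the pointer opaque to the optimizer,
        // so the allocation is expected to stay in place.
        black_box(Box::new(24u32));
    });
    println!("allocations observed: {}", n);
}
```

The counter is global, so allocations made by other threads during the call are
counted too; the number is easiest to interpret in a single-threaded program.
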
The guiding principle as we move forward is this: *optimizing compilers
won't produce worse programs than we started with.* There won't be any
situations where stack allocations get moved to heap allocations.
There will, however, be an opera of optimization.

# The Case of the Disappearing Box

Our first optimization comes when LLVM can reason that the lifetime of an object
is sufficiently short that heap allocations aren't necessary. In these cases,
LLVM will move the allocation to the stack instead! The way this interacts
with `#[inline]` attributes is a bit opaque, but the important part is that LLVM
can sometimes do better than the baseline Rust language:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

pub fn cmp(x: u32) {
    // Turn on panicking if we allocate on the heap
    DO_PANIC.store(true, Ordering::SeqCst);

    // The compiler is able to see through the constant `Box`
    // and directly compare `x` to 24 - assembly line 73
    let y = Box::new(24);
    let equals = x == *y;

    // This call to drop is eliminated
    drop(y);

    // Need to mark the comparison result as volatile so that
    // LLVM doesn't strip out all the code. If `y` is marked
    // volatile instead, allocation will be forced.
    unsafe { std::ptr::read_volatile(&equals) };

    // Turn off panicking, as there are some deallocations
    // when we exit main.
    DO_PANIC.store(false, Ordering::SeqCst);
}

fn main() {
    cmp(12)
}

#[global_allocator]
static A: PanicAllocator = PanicAllocator;
static DO_PANIC: AtomicBool = AtomicBool::new(false);
struct PanicAllocator;

unsafe impl GlobalAlloc for PanicAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if DO_PANIC.load(Ordering::SeqCst) {
            panic!("Unexpected allocation.");
        }
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        if DO_PANIC.load(Ordering::SeqCst) {
            panic!("Unexpected deallocation.");
        }
        System.dealloc(ptr, layout);
    }
}
```
-- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)

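
The comment about marking `y` volatile can be illustrated with a small variant.
This is a sketch, not code from the original example: `cmp_forced` is a
hypothetical name, and it returns the comparison result only so the behavior is
easy to check. Following the comment in the code above, the volatile read
through the `Box` would be expected to keep the heap allocation in place, so
running it with `PanicAllocator` and `DO_PANIC` enabled should trip the panic
rather than optimize the `Box` away.

```rust
use std::ptr;

// Hypothetical variant of `cmp`: the volatile read targets the boxed
// value itself rather than the comparison result.
pub fn cmp_forced(x: u32) -> bool {
    let y = Box::new(24u32);

    // A volatile read may not be elided or merged, so the value has to
    // be loaded through the pointer held by the `Box`.
    let boxed = unsafe { ptr::read_volatile(&*y) };

    x == boxed
}

fn main() {
    println!("{}", cmp_forced(12));
    println!("{}", cmp_forced(24));
}
```

As with the original, the only reliable way to confirm what happened is to check
the generated assembly or run the code under an allocator that reports
allocations.
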
# Dr. Array or: How I Learned to Love the Optimizer

Finally, this isn't so much about LLVM figuring out different memory behavior
as it is about LLVM stripping out code that doesn't do anything. Optimizations of
this type have a lot of nuance to them; if you're not careful, they can
make your benchmarks look
[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199).
In Rust, the `black_box` function (implemented in both
[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and
[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html))
will tell the compiler to disable this kind of optimization. But if you let
LLVM remove unnecessary code, you can end up running programs that
previously caused errors:

```rust
#[derive(Default)]
struct TwoFiftySix {
    _a: [u64; 32]
}

#[derive(Default)]
struct EightK {
    _a: [TwoFiftySix; 32]
}

#[derive(Default)]
struct TwoFiftySixK {
    _a: [EightK; 32]
}

#[derive(Default)]
struct EightM {
    _a: [TwoFiftySixK; 32]
}

pub fn main() {
    // Normally this blows up because we can't reserve size on stack
    // for the `EightM` struct. But because the compiler notices we
    // never do anything with `_x`, it optimizes out the stack storage
    // and the program completes successfully.
    let _x = EightM::default();
}
```
-- [Compiler Explorer](https://godbolt.org/z/daHn7P)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)

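
On newer toolchains, `black_box` is also available as `std::hint::black_box`
(stable since Rust 1.66), so no extra crate is needed. The sketch below uses a
hypothetical `sum_to` function to show the usual pattern of wrapping both the
input and the result, so the optimizer cannot fold the work into a constant or
drop it entirely. The standard library describes `black_box` as best-effort, so
treat it as a benchmarking aid rather than a guarantee.

```rust
use std::hint::black_box;

// Hypothetical workload: sums the integers in `0..n`.
pub fn sum_to(n: u64) -> u64 {
    let mut total = 0u64;
    for i in 0..n {
        total += i;
    }
    total
}

fn main() {
    // Hiding the input keeps the compiler from precomputing the answer;
    // hiding the output keeps it from discarding the computation.
    let result = black_box(sum_to(black_box(1_000)));
    println!("{}", result);
}
```
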