diff --git a/blog/2018-10-08-case-study-optimization/_article.md b/blog/2018-10-08-case-study-optimization/_article.md new file mode 100644 index 0000000..00f24ae --- /dev/null +++ b/blog/2018-10-08-case-study-optimization/_article.md @@ -0,0 +1,168 @@ +--- +layout: post +title: "A Case Study in Heaptrack" +description: "...because you don't need no garbage collection" +category: +tags: [] +--- + +One of my earliest conversations about programming went like this: + +> Programmers have it too easy these days. They should learn to develop in low memory environments +> and be more efficient. +> +> -- My Father (paraphrased) + +...though it's not like the first code I wrote was for a +[graphing calculator](https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-se) +packing a whole 24KB of RAM. By the way, _what are you doing on my lawn?_ + +The principle remains though: be efficient with the resources you have, because +[what Intel giveth, Microsoft taketh away](http://exo-blog.blogspot.com/2007/09/what-intel-giveth-microsoft-taketh-away.html). +My professional work is focused on this kind of efficiency; low-latency financial markets demand +that you understand at a deep level _exactly_ what your code is doing. As I continue experimenting +with Rust for personal projects, it's exciting to bring a utilitarian mindset with me: there's +flexibility for the times I pretend to have a garbage collector, and flexibility for the times that +I really care about how memory is used. + +This post is a (small) case study in how I went from the former to the latter. And ultimately, it's +intended to be a starting toolkit to empower analysis of your own code. + +# Curiosity + +When I first started building the [dtparse] crate, my intention was to mirror as closely as possible +the equivalent [Python library][dateutil]. Python, as you may know, is garbage collected. Very +rarely is memory usage considered in Python, and I likewise wasn't paying too much attention when +`dtparse` was first being built. + +This lackadaisical approach to memory works well enough, and I'm not planning on making `dtparse` +hyper-efficient. But every so often, I've wondered: "what exactly is going on in memory?" With the +advent of Rust 1.28 and the +[Global Allocator trait](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), I had a really +great idea: _build a custom allocator that allows you to track your own allocations._ That way, you +can do things like writing tests for both correct results and correct memory usage. I gave it a +[shot][qadapt], but learned very quickly: **never write your own allocator**. It went from "fun +weekend project" to "I have literally no idea what my computer is doing" at breakneck speed. + +Instead, I'll highlight a separate path I took to make sense of my memory usage: [heaptrack]. + +# Turning on the System Allocator + +This is the hardest part of the post. Because Rust uses +[its own allocator](https://github.com/rust-lang/rust/pull/27400#issue-41256384) by default, +`heaptrack` is unable to properly record unmodified Rust code. To remedy this, we'll make use of the +`#[global_allocator]` attribute. + +Specifically, in `lib.rs` or `main.rs`, add this: + +```rust +use std::alloc::System; + +#[global_allocator] +static GLOBAL: System = System; +``` + +...and that's it. Everything else comes essentially for free. + +# Running heaptrack + +Assuming you've installed heaptrack (Homebrew in Mac, package manager +in Linux, ??? in Windows), all that's left is to fire up your application: + +``` +heaptrack my_application +``` + +It's that easy. After the program finishes, you'll see a file in your local directory with a name +like `heaptrack.my_appplication.XXXX.gz`. If you load that up in `heaptrack_gui`, you'll see +something like this: + +![heaptrack](/assets/images/2018-10-heaptrack/heaptrack-before.png) + +--- + +And even these pretty colors: + +![pretty colors](/assets/images/2018-10-heaptrack/heaptrack-flamegraph.png) + +# Reading Flamegraphs + +To make sense of our memory usage, we're going to focus on that last picture - it's called a +["flamegraph"](http://www.brendangregg.com/flamegraphs.html). These charts are typically used to +show how much time your program spends executing each function, but they're used here to show how +much memory was allocated during those functions instead. + +For example, we can see that all executions happened during the `main` function: + +![allocations in main](/assets/images/2018-10-heaptrack/heaptrack-main-colorized.png) + +...and within that, all allocations happened during `dtparse::parse`: + +![allocations in dtparse](/assets/images/2018-10-heaptrack/heaptrack-dtparse-colorized.png) + +...and within _that_, allocations happened in two different places: + +![allocations in parseinfo](/assets/images/2018-10-heaptrack/heaptrack-parseinfo-colorized.png) + +Now I apologize that it's hard to see, but there's one area specifically that stuck out as an issue: +**what the heck is the `Default` thing doing?** + +![pretty colors](/assets/images/2018-10-heaptrack/heaptrack-flamegraph-default.png) + +# Optimizing dtparse + +See, I knew that there were some allocations during calls to `dtparse::parse`, but I was totally +wrong about where the bulk of allocations occurred in my program. Let me post the code and see if +you can spot the mistake: + +```rust +/// Main entry point for using `dtparse`. +pub fn parse(timestr: &str) -> ParseResult<(NaiveDateTime, Option)> { + let res = Parser::default().parse( + timestr, None, None, false, false, + None, false, + &HashMap::new(), + )?; + + Ok((res.0, res.1)) +} +``` + +> [dtparse](https://github.com/bspeice/dtparse/blob/4d7c5dd99572823fa4a390b483c38ab020a2172f/src/lib.rs#L1286) + +--- + +Because `Parser::parse` requires a mutable reference to itself, I have to create a new +`Parser::default` every time it receives a string. This is excessive! We'd rather have an immutable +parser that can be re-used, and avoid allocating memory in the first place. + +Armed with that information, I put some time in to +[make the parser immutable](https://github.com/bspeice/dtparse/commit/741afa34517d6bc1155713bbc5d66905fea13fad#diff-b4aea3e418ccdb71239b96952d9cddb6). +Now that I can re-use the same parser over and over, the allocations disappear: + +![allocations cleaned up](/assets/images/2018-10-heaptrack/heaptrack-flamegraph-after.png) + +In total, we went from requiring 2 MB of memory in +[version 1.0.2](https://crates.io/crates/dtparse/1.0.2): + +![memory before](/assets/images/2018-10-heaptrack/heaptrack-closeup.png) + +All the way down to 300KB in [version 1.0.3](https://crates.io/crates/dtparse/1.0.3): + +![memory after](/assets/images/2018-10-heaptrack/heaptrack-closeup-after.png) + +# Conclusion + +In the end, you don't need to write a custom allocator to be efficient with memory, great tools +already exist to help you understand what your program is doing. + +**Use them.** + +Given that [Moore's Law](https://en.wikipedia.org/wiki/Moore%27s_law) is +[dead](https://www.technologyreview.com/s/601441/moores-law-is-dead-now-what/), we've all got to do +our part to take back what Microsoft stole. + +[dtparse]: https://crates.io/crates/dtparse +[dateutil]: https://github.com/dateutil/dateutil +[heaptrack]: https://github.com/KDE/heaptrack +[qadapt]: https://crates.io/crates/qadapt diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-after.png b/blog/2018-10-08-case-study-optimization/heaptrack-after.png new file mode 100644 index 0000000..b5f8d8f Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-after.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-before.png b/blog/2018-10-08-case-study-optimization/heaptrack-before.png new file mode 100644 index 0000000..7e994f3 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-before.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-closeup-after.png b/blog/2018-10-08-case-study-optimization/heaptrack-closeup-after.png new file mode 100644 index 0000000..51c8be6 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-closeup-after.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-closeup.png b/blog/2018-10-08-case-study-optimization/heaptrack-closeup.png new file mode 100644 index 0000000..364ee3b Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-closeup.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-dtparse-colorized.png b/blog/2018-10-08-case-study-optimization/heaptrack-dtparse-colorized.png new file mode 100644 index 0000000..d41f715 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-dtparse-colorized.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph-after.png b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph-after.png new file mode 100644 index 0000000..89a9432 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph-after.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph-default.png b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph-default.png new file mode 100644 index 0000000..b727d26 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph-default.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph.png b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph.png new file mode 100644 index 0000000..ff3b720 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph.xcf b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph.xcf new file mode 100644 index 0000000..3e804e0 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-flamegraph.xcf differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-main-colorized.png b/blog/2018-10-08-case-study-optimization/heaptrack-main-colorized.png new file mode 100644 index 0000000..5489653 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-main-colorized.png differ diff --git a/blog/2018-10-08-case-study-optimization/heaptrack-parseinfo-colorized.png b/blog/2018-10-08-case-study-optimization/heaptrack-parseinfo-colorized.png new file mode 100644 index 0000000..4d02990 Binary files /dev/null and b/blog/2018-10-08-case-study-optimization/heaptrack-parseinfo-colorized.png differ diff --git a/blog/2018-10-08-case-study-optimization/index.mdx b/blog/2018-10-08-case-study-optimization/index.mdx new file mode 100644 index 0000000..1924dd8 --- /dev/null +++ b/blog/2018-10-08-case-study-optimization/index.mdx @@ -0,0 +1,169 @@ +--- +slug: 2018/10/case-study-optimization +title: A case study in heaptrack +date: 2018-10-08 12:00:00 +authors: [bspeice] +tags: [] +--- + +I remember early in my career someone joking that: + +> Programmers have it too easy these days. They should learn to develop in low memory environments +> and be more efficient. + +...though it's not like the first code I wrote was for a +[graphing calculator](https://web.archive.org/web/20180924060530/https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-se) +packing a whole 24KB of RAM. + +But the principle remains: be efficient with the resources you have, because +[what Intel giveth, Microsoft taketh away](http://exo-blog.blogspot.com/2007/09/what-intel-giveth-microsoft-taketh-away.html). + + + +My professional work is focused on this kind of efficiency; low-latency financial markets demand +that you understand at a deep level _exactly_ what your code is doing. As I continue experimenting +with Rust for personal projects, it's exciting to bring a utilitarian mindset with me: there's +flexibility for the times I pretend to have a garbage collector, and flexibility for the times that +I really care about how memory is used. + +This post is a (small) case study in how I went from the former to the latter. And ultimately, it's +intended to be a starting toolkit to empower analysis of your own code. + +## Curiosity + +When I first started building the [dtparse] crate, my intention was to mirror as closely as possible +the equivalent [Python library][dateutil]. Python, as you may know, is garbage collected. Very +rarely is memory usage considered in Python, and I likewise wasn't paying too much attention when +`dtparse` was first being built. + +This lackadaisical approach to memory works well enough, and I'm not planning on making `dtparse` +hyper-efficient. But every so often, I've wondered: "what exactly is going on in memory?" With the +advent of Rust 1.28 and the +[Global Allocator trait](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), I had a really +great idea: _build a custom allocator that allows you to track your own allocations._ That way, you +can do things like writing tests for both correct results and correct memory usage. I gave it a +[shot][qadapt], but learned very quickly: **never write your own allocator**. It went from "fun +weekend project" to "I have literally no idea what my computer is doing" at breakneck speed. + +Instead, I'll highlight a separate path I took to make sense of my memory usage: [heaptrack]. + +## Turning on the System Allocator + +This is the hardest part of the post. Because Rust uses +[its own allocator](https://github.com/rust-lang/rust/pull/27400#issue-41256384) by default, +`heaptrack` is unable to properly record unmodified Rust code. To remedy this, we'll make use of the +`#[global_allocator]` attribute. + +Specifically, in `lib.rs` or `main.rs`, add this: + +```rust +use std::alloc::System; + +#[global_allocator] +static GLOBAL: System = System; +``` + +...and that's it. Everything else comes essentially for free. + +## Running heaptrack + +Assuming you've installed heaptrack (Homebrew in Mac, package manager +in Linux, ??? in Windows), all that's left is to fire up your application: + +``` +heaptrack my_application +``` + +It's that easy. After the program finishes, you'll see a file in your local directory with a name +like `heaptrack.my_appplication.XXXX.gz`. If you load that up in `heaptrack_gui`, you'll see +something like this: + +![heaptrack](./heaptrack-before.png) + +--- + +And even these pretty colors: + +![pretty colors](./heaptrack-flamegraph.png) + +## Reading Flamegraphs + +To make sense of our memory usage, we're going to focus on that last picture - it's called a +["flamegraph"](http://www.brendangregg.com/flamegraphs.html). These charts are typically used to +show how much time your program spends executing each function, but they're used here to show how +much memory was allocated during those functions instead. + +For example, we can see that all executions happened during the `main` function: + +![allocations in main](./heaptrack-main-colorized.png) + +...and within that, all allocations happened during `dtparse::parse`: + +![allocations in dtparse](./heaptrack-dtparse-colorized.png) + +...and within _that_, allocations happened in two different places: + +![allocations in parseinfo](./heaptrack-parseinfo-colorized.png) + +Now I apologize that it's hard to see, but there's one area specifically that stuck out as an issue: +**what the heck is the `Default` thing doing?** + +![pretty colors](./heaptrack-flamegraph-default.png) + +## Optimizing dtparse + +See, I knew that there were some allocations during calls to `dtparse::parse`, but I was totally +wrong about where the bulk of allocations occurred in my program. Let me post the code and see if +you can spot the mistake: + +```rust +/// Main entry point for using `dtparse`. +pub fn parse(timestr: &str) -> ParseResult<(NaiveDateTime, Option)> { + let res = Parser::default().parse( + timestr, None, None, false, false, + None, false, + &HashMap::new(), + )?; + + Ok((res.0, res.1)) +} +``` + +> [dtparse](https://github.com/bspeice/dtparse/blob/4d7c5dd99572823fa4a390b483c38ab020a2172f/src/lib.rs#L1286) + +--- + +Because `Parser::parse` requires a mutable reference to itself, I have to create a new +`Parser::default` every time it receives a string. This is excessive! We'd rather have an immutable +parser that can be re-used, and avoid allocating memory in the first place. + +Armed with that information, I put some time in to +[make the parser immutable](https://github.com/bspeice/dtparse/commit/741afa34517d6bc1155713bbc5d66905fea13fad#diff-b4aea3e418ccdb71239b96952d9cddb6). +Now that I can re-use the same parser over and over, the allocations disappear: + +![allocations cleaned up](./heaptrack-flamegraph-after.png) + +In total, we went from requiring 2 MB of memory in +[version 1.0.2](https://crates.io/crates/dtparse/1.0.2): + +![memory before](./heaptrack-closeup.png) + +All the way down to 300KB in [version 1.0.3](https://crates.io/crates/dtparse/1.0.3): + +![memory after](./heaptrack-closeup-after.png) + +## Conclusion + +In the end, you don't need to write a custom allocator to be efficient with memory, great tools +already exist to help you understand what your program is doing. + +**Use them.** + +Given that [Moore's Law](https://en.wikipedia.org/wiki/Moore%27s_law) is +[dead](https://www.technologyreview.com/s/601441/moores-law-is-dead-now-what/), we've all got to do +our part to take back what Microsoft stole. + +[dtparse]: https://crates.io/crates/dtparse +[dateutil]: https://github.com/dateutil/dateutil +[heaptrack]: https://github.com/KDE/heaptrack +[qadapt]: https://crates.io/crates/qadapt diff --git a/docusaurus.config.ts b/docusaurus.config.ts index 9667860..0f60ead 100644 --- a/docusaurus.config.ts +++ b/docusaurus.config.ts @@ -94,7 +94,7 @@ const config: Config = { }, ], future: { - experimental_faster: true + // experimental_faster: true } };