Actually format everything

Bradlee Speice 2020-06-29 16:00:26 -04:00
parent e02556a770
commit b8c12b9cc1
17 changed files with 1217 additions and 889 deletions


@@ -6,9 +6,11 @@ category:
tags: []
---
I'll do what I can to keep this short, there's plenty of other things we both should be doing right now.
If you're here for the bread pics, and to marvel at some other culinary side projects, I've got you covered:
![Saturday Bread]({{ "/assets/images/2018-05-28-bread.jpg" | absolute_url }})

@@ -18,15 +20,15 @@ Okay, just one:

![Bread as rock]({{ "/assets/images/2018-05-28-rocks.jpg" | absolute_url }})
If you're here for keeping up with the man Bradlee Speice, I've got plenty of that too. Plus some upcoming super-nerdy posts about how I'm changing the world.
And if you're not here for those things: I don't have a lot for you, sorry. But you're welcome to let me know what needs to change.
I'm looking forward to making this a place to talk about what's going on in life; I hope you'll stick it out with me. The best way to follow what's going on is on my [About](/about/) page, but if you want the joy of clicking links, here are a few good ones:
- Email (people still use this?): [bradlee@speice.io](mailto:bradlee@speice.io)
- Mastodon (nerd Twitter): [@bradlee](https://mastodon.social/@bradlee)


@@ -20,142 +20,158 @@ what to do with your life (but you should totally keep reading).
OK, fine, I guess I should start with _why_ someone would do this.
[Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The standard library support for time in Python is kinda dope, but there are a lot of extras that go into making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html) module. `dateutil.parser` specifically is code to take all the super-weird time formats people come up with and turn them into something actually useful.
Date/time parsing, it turns out, is just like everything else involving [computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and [time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it feels like it shouldn't be that difficult to do, until you try to do it, and you realize that people suck and this is why [we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But alas, we'll try and make contemporary art out of the rubble and give it a pretentious name like _Time_.
![A gravel mound](/assets/images/2018-06-25-gravel-mound.jpg)

> [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php)
What makes `dateutil.parser` great is that there's a single function with a single argument that drives what programmers interact with: [`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258). It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can possibly do to make sense of your input" value. It doesn't expect much of you.
[And now it's in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332)
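For a feel of the mirrored API, here's a minimal sketch of calling it from Rust. The `(NaiveDateTime, Option<FixedOffset>)` return shape is paraphrased from the linked source, so double-check the details against the crate docs:

```rust
use dtparse::parse;

fn main() {
    // One function, one string argument, and a best-effort answer back,
    // just like dateutil's `parse(timestr)`.
    match parse("January 4, 2018 10:00pm") {
        Ok((datetime, offset)) => println!("{} (offset: {:?})", datetime, offset),
        Err(e) => println!("even best-effort parsing has limits: {:?}", e),
    }
}
```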
# Lost in Translation
Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm admittedly hesitant to publish Python code that's trying to be Rust. Interestingly, Rust code can actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better experiences than [this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us) who attempted the same thing for D. These are the actual takeaways:
When transcribing code, **stay as close to the original library as possible**. I'm talking about using the same variable names, same access patterns, the whole shebang. It's way too easy to make a couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a reference manual for verbatim what your code should be means that you don't spend that long debugging complicated logic; you're mostly looking for typos.
Also, **don't use nice Rust things like enums**. While [one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94), I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a boolean and I mixed up which was true, and which was false (side note: AM is false, PM is true). In general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate the same functionality.
**Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out [he's not quite right](https://github.com/rust-lang/rfcs/pull/243), and I'm OK with that. And while `dateutil` is pretty well-behaved about not skipping multiple stack frames, [130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865) take a while to verify.
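The translation pattern that falls out of this (a hypothetical fragment, far smaller than the 130-line block above) is turning each `except` arm into an explicit `Result` branch; nothing gets to silently skip a stack frame:

```rust
#[derive(Debug)]
enum TokenError {
    InvalidNumber,
}

// Python: `try: value = int(token) except ValueError: ...`
// Rust: the failure is a value you're forced to route somewhere.
fn parse_numeric_token(token: &str) -> Result<u32, TokenError> {
    token.trim().parse::<u32>().map_err(|_| TokenError::InvalidNumber)
}

fn main() {
    match parse_numeric_token("42") {
        Ok(n) => println!("parsed {}", n),
        Err(e) => println!("handled without a try-catch: {:?}", e),
    }
}
```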
As another Python quirk, **be very careful about [long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**. I used to think that Python's whitespace was just there to get you to format your code correctly. I no longer think that. It's way too easy to close a block too early and have incredibly weird issues in the logic. Make sure you use an editor that displays indentation levels so you can keep things straight.
**Rust macros are not free.** I originally had the [main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217) wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. After [moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205) compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually functions that need to be liberated, man.
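A stripped-down sketch of that refactor (hypothetical names, and it assumes `dtparse` as a dependency; the real harness is linked above). The macro stamps a fresh copy of its body into every test, while the function body is compiled exactly once:

```rust
// Before: each invocation expands to a full copy of the body,
// so 100 tests means 100 compiled copies.
macro_rules! assert_parses {
    ($input:expr) => {{
        let result = dtparse::parse($input);
        assert!(result.is_ok(), "failed to parse {:?}", $input);
    }};
}

// After: one compiled body, many cheap calls.
fn assert_parses(input: &str) {
    let result = dtparse::parse(input);
    assert!(result.is_ok(), "failed to parse {:?}", input);
}

fn main() {
    assert_parses!("2018-06-25"); // macros and functions live in separate namespaces
    assert_parses("2018-06-25");
}
```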
Finally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison, see [this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476) and [the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629). I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be added through macros or syntax extensions. Either way, they're expressive, save typing, and are super-readable. Let's get more of that.
# Using a young language
Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On more than one occasion though, I've had issues navigating the Rust ecosystem.
What I'll call the "canonical library" is still being built. In Python, if you need datetime parsing, you use `dateutil`. If you want `decimal` types, it's already in the [standard library](https://docs.python.org/3.6/library/decimal.html). While I might've gotten away with `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to the original library as possible**. Thus began my quest to find a decimal library in Rust. What I quickly found was summarized in a comment:
> Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard.
>
> [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794)
In practice, this means that there are at least [4](https://crates.io/crates/bigdecimal) [different](https://crates.io/crates/rust_decimal) [implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate). And that's a lot of decisions to worry about when all I'm thinking is "why can't [calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing" and I'm forced to dig through a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916) [different](https://github.com/rust-lang/rfcs/issues/334) [threads](https://github.com/rust-num/num/issues/8) to figure out if the library I'm looking at is dead or just stable.
And even when the "canonical library" exists, there are no guarantees that it will be well-maintained. [Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just released version 0.4.4 like two days ago. Meanwhile, [chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though [there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I know relatively little about it, but it appears that most of the release process is automated; keeping that up to date should be a no-brainer.
## Trial Maintenance Policy
Specifically, given that "maintenance" is an [oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/) issue, I'm going to try out the following policy to keep things moving on `dtparse`:
1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure nobody's blocking on me.
2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the contributor to check in after two weeks, and close the issue without resolution if I hear nothing back after a month.
The second point I think has the potential to be a bit controversial, so I'm happy to receive feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And second: I don't mind keeping those requests going indefinitely. I just want to try and balance keeping things moving with giving people the necessary time they need.
I should also note that I'm still getting some best practices in place - CONTRIBUTING and CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are perfect.
# Roadmap and Conclusion
So if I've now built a `dateutil`-compatible parser, we're done, right? Of course not! That's not nearly ambitious enough.
Ultimately, I'd love to have a library that's capable of parsing everything the Linux `date` command can do (and not `date` on OSX, because seriously, BSD coreutils are the worst). I know Rust has a coreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it doesn't bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime) could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each other?
All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going to take it hard.
And in the meantime, I'm looking forward to building more. Onwards.


@@ -15,22 +15,23 @@ fn main() {
```rust
}
```
And to my complete befuddlement, it compiled, ran, and produced a completely sensible output. The reason I was so surprised has to do with how Rust treats a special category of things I'm going to call _primitives_. In the current version of the Rust book, you'll see them referred to as [scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], but we're going to stick with the name _primitive_ for the time being. Explaining why this program is so cool requires talking about a number of other programming languages, and keeping a consistent terminology makes things easier.
**You've been warned:** this is going to be a tedious post about a relatively minor issue that involves Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking about with assembly.
# Defining primitives (Java)
The reason I'm using the name _primitive_ comes from how much of my life is Java right now. Spoiler alert: a lot of it. And for the most part I like Java, but I digress. In Java, there's a special name for some specific types of values:
> ```
> boolean char byte
> ```

@@ -70,8 +71,8 @@ the fields and methods they define can be used. In contrast, _primitive types ar
there's nothing to be dereferenced. In memory, they're just a sequence of bits.
If we really want, we can turn the `int` into an [`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then dereference it, but it's a bit wasteful:
```java
class Main {
```

@@ -89,9 +90,9 @@ differently, but we have to dig into the low-level details to see it in action.
# Low Level Handling of Primitives (C)
We first need to build a foundation for reading and understanding the assembly code the final answer requires. Let's begin with showing how the `C` language (and your computer) thinks about "primitive" values in memory:
```c
void my_function(int num) {}
@@ -102,8 +103,9 @@ int main() {
}
```
The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off the assembly-level code that's generated: <span style="font-size:.6em">whose output has been lightly edited</span>
```nasm
main:
@@ -139,8 +141,9 @@ my_function:
ret
```
At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction; nothing crazy. But to show how similar Rust is, let's take a look at our program translated from C to Rust:
```rust
fn my_function(x: i32) {}
@@ -151,8 +154,9 @@ fn main() {
}
```
And the assembly generated when we stick it in the [compiler explorer](https://godbolt.org/z/cAlmk0): <span style="font-size:.6em">again, lightly edited</span>
```nasm
example::main:
@@ -178,22 +182,23 @@ example::my_function:
ret
```
The generated Rust assembly is functionally pretty close to the C assembly: _When working with primitives, we're just dealing with bits in memory_.
In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to dereference. So what exactly is going on with this `.to_string()` function call?
# impl primitive (and Python)
Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this together: _Rust has implementations for its primitive types._ That's right, `impl` blocks aren't only for `structs` and `traits`, primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html), [f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html) as examples.
But the really interesting bit is how Rust turns those `impl` blocks into assembly. Let's break out the [compiler explorer](https://godbolt.org/z/6LBEwq) once again:
```rust
pub fn main() {
@@ -220,16 +225,16 @@ example::main:
```
Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling `to_string()` as a function that exists all on its own, and giving it the instance of `8`**. Instead of thinking of the value 8 as an instance of `u32` and then peeking in to find the location of the function we want to call (like Java), we have a function that exists outside of the instance, and we just give that function the value `8`.
This is an incredibly technical detail, but the interesting idea I had was this: _if `to_string()` is a static function, can I refer to the unbound function and give it an instance?_ Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link because I seriously love this thing):
```rust
struct MyVal {
```

@@ -255,8 +260,8 @@ Rust is totally fine "binding" the function call to the instance, and also as a

MIND == BLOWN.
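And it works for the primitive from the opening example too; a minimal sketch, relying on the standard library's blanket `ToString` impl:

```rust
fn main() {
    // Method syntax: looks like Java, but there's no object header to chase.
    let bound = 8u32.to_string();

    // The same function named by path, handed its instance explicitly.
    let unbound = u32::to_string(&8);

    assert_eq!(bound, unbound);
}
```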
Python does the same thing where I can both call functions bound to their instances and also call as an unbound function where I give it the instance:
```python
class MyClass():
```

@@ -297,18 +302,18 @@ SyntaxError: invalid syntax

...but in practice it's a bit complicated.
So while Python handles binding instance methods in a way similar to Rust, it's still not able to run the example we started with.
# Conclusion
This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor details like primitives leads to really cool effects. Primitives are optimized like C in how they have a space-efficient memory layout, yet the language still has a lot of features I enjoy in Python (like both instance and late binding).
And when you put it together, there are areas where Rust does cool things nobody else can; as a quirky feature of Rust's type system, `8.to_string()` is actually valid code.
Now go forth and fool your friends into thinking you know assembly. This is all I've got.


@@ -7,29 +7,32 @@ tags: [rust, javascript, webassembly]
---
Forgive me, but this is going to be a bit of a schizophrenic post. I both despise Javascript and the modern ECMAScript ecosystem, and I'm stunned by its success doing some really cool things. It's [this duality](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript) that's led me to a couple of (very) late nights over the past weeks trying to reconcile myself as I bootstrap a simple desktop application.
See, as much as [Webassembly isn't trying to replace Javascript](https://webassembly.org/docs/faq/#is-webassembly-trying-to-replace-javascript), **I want Javascript gone**. There are plenty of people who don't share my views, and they are probably nicer and more fun at parties. But I cringe every time "Webpack" is mentioned, and I think it's hilarious that the [language specification](https://ecma-international.org/publications/standards/Ecma-402.htm) dramatically outpaces anyone's [actual implementation](https://kangax.github.io/compat-table/es2016plus/). The answer to this conundrum is of course to recompile code from newer versions of the language to older versions _of the same language_ before running. At least [Babel] is a nice tongue-in-cheek reference.
Yet for as much hate as [Electron] receives, it does a stunningly good job at solving a really hard problem: _how the hell do I put a button on the screen and react when the user clicks it_? GUI programming is hard, straight up. But if browsers are already able to run everywhere, why don't we take advantage of someone else solving the hard problems for us? I don't like that I have to use Javascript for it, but I really don't feel inclined to whip out good ol' [wxWidgets].
Now there are other native solutions ([libui-rs], [conrod], [oh hey wxWidgets again!][wxrust]), but those also have their own issues with distribution, styling, etc. With Electron, I can `yarn create electron-app my-app` and just get going, knowing that packaging/upgrades/etc. are built in.
My question is: given recent innovations with WASM, _are we Electron yet_?
@@ -39,25 +42,28 @@ Instead, **what would it take to get to a point where we can skip Javascript in
# Setting the Stage
Truth is, WASM/Webassembly is a pretty new technology and I'm a total beginner in this area. There may already be solutions to the issues I discuss, but I'm totally unaware of them, so I'm going to try and organize what I did manage to discover.
I should also mention that the content and things I'm talking about here are not intended to be prescriptive, but more "if someone else is interested, what do we already know doesn't work?" _I expect everything in this post to be obsolete within two months._ Even over the course of writing this, [a separate blog post](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/) had to be modified because [upstream changes](https://github.com/WebAssembly/binaryen/pull/1642) broke a [Rust tool](https://github.com/rustwasm/wasm-bindgen/pull/787) the post tried to use. The post ultimately [got updated](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/#comment-477), **but all this happened within the span of a week.** Things are moving quickly.
I'll also note that we're going to skip [asm.js] and [emscripten]. Truth be told, I couldn't get either of these to output anything, and so I'm just going to say [here be dragons.](https://en.wikipedia.org/wiki/Here_be_dragons) Everything I'm discussing here uses the `wasm32-unknown-unknown` target.
The code that I _did_ get running is available [over here](https://github.com/speice-io/isomorphic-rust). Feel free to use it as a starting point, but I'm mostly including the link as a reference for the things that were attempted.
# An Example Running Application
@@ -73,32 +79,33 @@ cd isomorphic_rust/percy
```sh
yarn install && yarn start
```
...but I wouldn't really call it a "high quality" starting point to base future work on. It's mostly there to prove this is possible in the first place. And that's something to be proud of! There's a huge amount of engineering that went into showing a window with the text "It's alive!".
There are also a lot of usability issues that prevent me from recommending anyone try Electron and WASM apps at the moment, and I think that's the more important thing to discuss.
# Issue the First: Complicated Toolchains
I quickly established that [wasm-bindgen] was necessary to "link" my Rust code to Javascript. At that point you've got an Electron app that starts an HTML page which ultimately fetches your WASM blob. To keep things simple, the goal was to package everything using [webpack] so that I could just load a `bundle.js` file on the page. That decision was to be the last thing that kinda worked in this process.
The first issue [I ran into](https://www.reddit.com/r/rust/comments/98lpun/unable_to_load_wasm_for_electron_application/) while attempting to bundle everything via `webpack` is a detail in the WASM spec:
> This function accepts a Response object, or a promise for one, and ... **[if it] does not match the `application/wasm` MIME type**, the returned promise will be rejected with a TypeError;
>
> [WebAssembly - Additional Web Embedding API](https://webassembly.org/docs/web/#additional-web-embedding-api)
Specifically, if you try and load a WASM blob without the MIME type set, you'll get an error. On the web this isn't a huge issue, as the server can set MIME types when delivering the blob. With Electron, you're resolving things with a `file://` URL and thus can't control the MIME type:
![TypeError: Incorrect response MIME type. Expected 'application/wasm'.](/assets/images/2018-09-15-incorrect-MIME-type.png)

@@ -108,9 +115,10 @@ There are a couple of solutions depending on how far into the deep end you care

- Use a [custom protocol](https://electronjs.org/docs/api/protocol) and custom protocol handler
- Host your WASM blob on a website that you resolve at runtime
But all these are pretty bad solutions and defeat the purpose of using WASM in the first place. Instead, my workaround was to [open a PR with `webpack`](https://github.com/webpack/webpack/issues/7918) and use regex to remove calls to `instantiateStreaming` in the [build script](https://github.com/speice-io/isomorphic-rust/blob/master/percy/build.sh#L21-L25):
```sh
@@ -132,12 +140,14 @@ cargo +nightly build --target=wasm32-unknown-unknown && \
"$DIR/node_modules/webpack-cli/bin/cli.js" --mode=production "$APP_DIR/app_loader.js" -o "$APP_DIR/bundle.js"
```
But we're not done yet! After we compile Rust into WASM and link WASM to Javascript (via `wasm-bindgen` and `webpack`), we still have to make an Electron app. For this purpose I used a starter app from [Electron Forge], and then a [`prestart` script](https://github.com/speice-io/isomorphic-rust/blob/master/percy/package.json#L8) to actually handle starting the application.
The [final toolchain](https://github.com/speice-io/isomorphic-rust/blob/master/percy/package.json#L8) looks something like this:
- `yarn start` triggers the `prestart` script
@@ -145,23 +155,24 @@ looks something like this:
- Uses `cargo` to compile the Rust code into WASM
- Uses `wasm-bindgen` to link the WASM blob into a Javascript file with exported symbols
- Uses `webpack` to bundle the page start script with the Javascript we just generated
- Uses `babel` under the hood to compile the `wasm-bindgen` code down from ES6 into something browser-compatible
- The `start` script runs an Electron Forge handler to do some sanity checks
- Electron actually starts
...which is complicated. I think more work needs to be done to either build a high-quality starter app that can manage these steps, or another tool that "just handles" the complexity of linking a compiled WASM file into something the Electron browser can run.
# Issue the Second: WASM tools in Rust
For as much as I didn't enjoy the Javascript tooling needed to interface with Rust, the Rust-only bits aren't any better at the moment. I get it, a lot of projects are just starting off, and that leads to a fragmented ecosystem. Here's what I can recommend as a starting point:
Don't check in your `Cargo.lock` files to version control. If there's a disagreement between the version of `wasm-bindgen-cli` you have installed and the `wasm-bindgen` you're compiling with in `Cargo.lock`, you get a nasty error:
```
it looks like the Rust project used to create this wasm file was linked against
```

@@ -180,8 +191,9 @@ Not that I ever managed to run into this myself (_coughs nervously_).
There are two projects attempting to be "application frameworks": [percy] and [yew]. Between those, I managed to get [two](https://github.com/speice-io/isomorphic-rust/tree/master/percy) [examples](https://github.com/speice-io/isomorphic-rust/tree/master/percy_patched_webpack) running using `percy`, but was unable to get an [example](https://github.com/speice-io/isomorphic-rust/tree/master/yew) running with `yew` because of issues with "missing modules" during the `webpack` step:
```sh
ERROR in ./dist/electron_yew_wasm_bg.wasm
@@ -192,9 +204,10 @@ Module not found: Error: Can't resolve 'env' in '/home/bspeice/Development/isomo
@ ./dist/app_loader.js
```
If you want to work with the browser APIs directly, your choices are [percy-webapis] or [stdweb] (or eventually [web-sys]). See above for my `percy` examples, but when I tried [an example with `stdweb`](https://github.com/speice-io/isomorphic-rust/tree/master/stdweb), I was unable to get it running:
```sh
ERROR in ./dist/stdweb_electron_bg.wasm
@@ -204,28 +217,30 @@ Module not found: Error: Can't resolve 'env' in '/home/bspeice/Development/isomo
@ ./dist/app_loader.js
```
At this point I'm pretty convinced that `stdweb` is causing issues for `yew` as well, but can't prove it.
I did also get a [minimal example](https://github.com/speice-io/isomorphic-rust/tree/master/minimal) running that doesn't depend on any tools besides `wasm-bindgen`. However, it requires manually writing "`extern C`" blocks for everything you need from the browser. Es no bueno.
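For reference, the hand-written binding pattern looks something like this (a minimal sketch of a `wasm-bindgen` extern block; `alert` is the canonical hello-world import):

```rust
use wasm_bindgen::prelude::*;

// Every browser API you want has to be declared by hand like this.
#[wasm_bindgen]
extern "C" {
    fn alert(s: &str);
}

// Exported to Javascript; calling it pops a browser alert.
#[wasm_bindgen]
pub fn greet() {
    alert("It's alive!");
}
```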
Finally, from a tools and platform view, there are two up-and-coming packages that should be mentioned: [js-sys] and [web-sys]. Their purpose is to be fundamental building blocks that expose the browser's APIs to Rust. If you're interested in building an app framework from scratch, these should give you the most flexibility. I didn't touch either in my research, though I expect them to be essential long-term.
So there's a lot in play from the Rust side of things, and it's just going to take some time to figure out what works and what doesn't.
# Issue the Third: Known Unknowns
Alright, so after I managed to get an application started, I stopped there. It was a good deal of effort to chain together even a proof of concept, and at this point I'd rather learn [Typescript] than keep trying to maintain an incredibly brittle pipeline. Blasphemy, I know...
The important point I want to make is that there's a lot unknown about how any of this holds up outside proofs of concept. Things I didn't attempt:
- Testing
- Packaging
@@ -234,23 +249,27 @@ proofs of concept. Things I didn't attempt:
# What it Would Take
Much as I don't like Javascript, the tools are too shaky for me to recommend mixing Electron and WASM at the moment. There's a lot of innovation happening, so who knows? Someone might have an application in production a couple months from now. But at the moment, I'm personally going to stay away.
Let's finish with a wishlist then - here are the things that I think need to happen before Electron/WASM/Rust can become a thing:
- Webpack still needs some updates. The necessary work is in progress, but hasn't landed yet ([#7983](https://github.com/webpack/webpack/pull/7983))
- Browser API libraries (`web-sys` and `stdweb`) need to make sure they can support running in Electron (see module error above)
- Projects need to stabilize. There's talk of `stdweb` being turned into a Rust API [on top of web-sys](https://github.com/rustwasm/team/issues/226#issuecomment-418475778), and percy [moving to web-sys](https://github.com/chinedufn/percy/issues/24), both of which are big changes
- `wasm-bindgen` is great, but still in the "move fast and break things" phase - `wasm-bindgen` is great, but still in the "move fast and break things" phase
- A good "boilerplate" app would dramatically simplify the start-up costs; - A good "boilerplate" app would dramatically simplify the start-up costs;
[electron-react-boilerplate](https://github.com/chentsulin/electron-react-boilerplate) [electron-react-boilerplate](https://github.com/chentsulin/electron-react-boilerplate) comes to
comes to mind as a good project to imitate mind as a good project to imitate
- More blog posts/contributors! I think Electron + Rust could be cool, but I have no idea what I'm doing - More blog posts/contributors! I think Electron + Rust could be cool, but I have no idea what I'm
doing
[wxwidgets]: https://wxwidgets.org/
[libui-rs]: https://github.com/LeoTindall/libui-rs/
View File
@@ -8,8 +8,8 @@ tags: []

One of my earliest conversations about programming went like this:

> Programmers have it too easy these days. They should learn to develop in low memory environments
> and be more efficient.
>
> -- My Father (paraphrased)
@@ -19,27 +19,30 @@ packing a whole 24KB of RAM. By the way, _what are you doing on my lawn?_

The principle remains though: be efficient with the resources you have, because
[what Intel giveth, Microsoft taketh away](http://exo-blog.blogspot.com/2007/09/what-intel-giveth-microsoft-taketh-away.html).
My professional work is focused on this kind of efficiency; low-latency financial markets demand
that you understand at a deep level _exactly_ what your code is doing. As I continue experimenting
with Rust for personal projects, it's exciting to bring a utilitarian mindset with me: there's
flexibility for the times I pretend to have a garbage collector, and flexibility for the times that
I really care about how memory is used.

This post is a (small) case study in how I went from the former to the latter. And ultimately, it's
intended to be a starting toolkit to empower analysis of your own code.
# Curiosity

When I first started building the [dtparse] crate, my intention was to mirror as closely as
possible the equivalent [Python library][dateutil]. Python, as you may know, is garbage collected.
Very rarely is memory usage considered in Python, and I likewise wasn't paying too much attention
when `dtparse` was first being built.

This lackadaisical approach to memory works well enough, and I'm not planning on making `dtparse`
hyper-efficient. But every so often, I've wondered: "what exactly is going on in memory?" With the
advent of Rust 1.28 and the
[Global Allocator trait](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), I had a really
great idea: _build a custom allocator that allows you to track your own allocations._ That way, you
can do things like writing tests for both correct results and correct memory usage. I gave it a
[shot][qadapt], but learned very quickly: **never write your own allocator**. It went from "fun
weekend project" to "I have literally no idea what my computer is doing" at breakneck speed.

Instead, I'll highlight a separate path I took to make sense of my memory usage: [heaptrack].
@@ -47,8 +50,8 @@ Instead, I'll highlight a separate path I took to make sense of my memory usage:

This is the hardest part of the post. Because Rust uses
[its own allocator](https://github.com/rust-lang/rust/pull/27400#issue-41256384) by default,
`heaptrack` is unable to properly record unmodified Rust code. To remedy this, we'll make use of the
`#[global_allocator]` attribute.

Specifically, in `lib.rs` or `main.rs`, add this:
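A minimal version of that snippet, assuming the stock `System` allocator is all you want to swap in
(matching the `static GLOBAL: System = System;` context below):

```rust
use std::alloc::System;

// Route this program's allocations through the system allocator,
// which heaptrack knows how to intercept.
#[global_allocator]
static GLOBAL: System = System;
```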
@@ -63,8 +66,8 @@ static GLOBAL: System = System;

# Running heaptrack

Assuming you've installed heaptrack <span style="font-size: .6em;">(Homebrew in Mac, package manager
in Linux, ??? in Windows)</span>, all that's left is to fire up your application:
```
heaptrack my_application
```
@@ -84,10 +87,10 @@ And even these pretty colors:

# Reading Flamegraphs

To make sense of our memory usage, we're going to focus on that last picture - it's called a
["flamegraph"](http://www.brendangregg.com/flamegraphs.html). These charts are typically used to
show how much time your program spends executing each function, but they're used here to show how
much memory was allocated during those functions instead.
For example, we can see that all executions happened during the `main` function:

@@ -101,16 +104,16 @@ For example, we can see that all executions happened during the `main` function:

![allocations in parseinfo](/assets/images/2018-10-heaptrack/heaptrack-parseinfo-colorized.png)

Now I apologize that it's hard to see, but there's one area specifically that stuck out as an issue:
**what the heck is the `Default` thing doing?**

![pretty colors](/assets/images/2018-10-heaptrack/heaptrack-flamegraph-default.png)
# Optimizing dtparse

See, I knew that there were some allocations during calls to `dtparse::parse`, but I was totally
wrong about where the bulk of allocations occurred in my program. Let me post the code and see if
you can spot the mistake:

```rust
/// Main entry point for using `dtparse`.
@@ -129,9 +132,9 @@ pub fn parse(timestr: &str) -> ParseResult<(NaiveDateTime, Option<FixedOffset>)>

---
Because `Parser::parse` requires a mutable reference to itself, I have to create a new
`Parser::default` every time it receives a string. This is excessive! We'd rather have an immutable
parser that can be re-used, and avoid allocating memory in the first place.
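As a rough sketch of the difference (my own toy example, not dtparse's actual signatures):

```rust
// Toy example, not dtparse's real API: a `&mut self` parser drags scratch
// state around, so each caller tends to build (and allocate) a fresh one...
struct Parser {
    scratch: Vec<String>,
}

impl Parser {
    fn parse_mut(&mut self, timestr: &str) -> usize {
        self.scratch.clear();
        self.scratch.push(timestr.to_string());
        self.scratch.len()
    }

    // ...while a `&self` parser can be built once and shared by every call.
    fn parse(&self, timestr: &str) -> usize {
        timestr.split_whitespace().count()
    }
}

fn main() {
    let parser = Parser { scratch: Vec::new() };
    for s in &["2018-10-01", "Sep 03 2009"] {
        parser.parse(s); // one parser, many parses
    }
}
```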
Armed with that information, I put some time in to
[make the parser immutable](https://github.com/bspeice/dtparse/commit/741afa34517d6bc1155713bbc5d66905fea13fad#diff-b4aea3e418ccdb71239b96952d9cddb6).

@@ -139,7 +142,8 @@ Now that I can re-use the same parser over and over, the allocations disappear:

![allocations cleaned up](/assets/images/2018-10-heaptrack/heaptrack-flamegraph-after.png)

In total, we went from requiring 2 MB of memory in
[version 1.0.2](https://crates.io/crates/dtparse/1.0.2):

![memory before](/assets/images/2018-10-heaptrack/heaptrack-closeup.png)
@@ -154,9 +158,9 @@ already exist to help you understand what your program is doing.

**Use them.**

Given that [Moore's Law](https://en.wikipedia.org/wiki/Moore%27s_law) is
[dead](https://www.technologyreview.com/s/601441/moores-law-is-dead-now-what/), we've all got to do
our part to take back what Microsoft stole.

[dtparse]: https://crates.io/crates/dtparse
[dateutil]: https://github.com/dateutil/dateutil
View File
@@ -8,29 +8,27 @@ tags: []

I recently stumbled across a phenomenal small article entitled
[What Startups Really Mean By "Why Should We Hire You?"](https://angel.co/blog/what-startups-really-mean-by-why-should-we-hire-you).
Having been interviewed by smaller companies (though not exactly startups), the questions and
subtexts are the same. There's often a question behind the question that you're actually trying to
answer, and I wish I had spotted the nuance earlier in my career.
Let me also make note of one more question/euphemism I've come across:

# How do you feel about Production Support?

**Translation**: _We're a fairly small team, and when things break on an evening/weekend/Christmas
Day, can we call on you to be there?_
I've met decidedly few people in my life who truly enjoy the "ops" side of "devops". They're
incredibly good at taking an impossible problem, pre-existing knowledge of arcane arts, and turning
that into a functioning system at the end. And if they all left for lunch, we probably wouldn't make
it out the door before the zombie apocalypse.
Larger organizations (in my experience, 500+ person organizations) have the luxury of hiring people
who either enjoy that, or play along nicely enough that our systems keep working.

Small teams have no such luck. If you're interviewing at a small company, especially as a "data
scientist" or other somesuch position, be aware that systems can and do spontaneously combust at the
most inopportune moments.

**Terrible-but-popular answers include**: _It's a part of the job, and I'm happy to contribute._
View File
@@ -6,12 +6,11 @@ category:
tags: []
---

I think it's part of the human condition to ignore perfectly good advice when it comes our way. A
bit over a month ago, I was dispensing sage wisdom for the ages:
> I had a really great idea: build a custom allocator that allows you to track your own allocations.
> I gave it a shot, but learned very quickly: **never write your own allocator.**
>
> -- [me](/2018/10/case-study-optimization.html)
@@ -25,23 +24,23 @@ And _that's_ the part I'm going to focus on.

# Why an Allocator?

So why, after complaining about allocators, would I still want to write one? There are three reasons
for that:
1. Allocation/dropping is slow
2. It's difficult to know exactly when Rust will allocate or drop, especially when using code that
   you did not write
3. I want automated tools to verify behavior, instead of inspecting by hand
When I say "slow," it's important to define the terms. If you're writing web applications, you'll
spend orders of magnitude more time waiting for the database than you will the allocator. However,
there's still plenty of code where micro- or nano-seconds matter; think
[finance](https://www.youtube.com/watch?v=NH1Tta7purM),
[real-time audio](https://www.reddit.com/r/rust/comments/9hg7yj/synthesizer_progress_update/e6c291f),
[self-driving cars](https://polysync.io/blog/session-types-for-hearty-codecs/), and
[networking](https://carllerche.github.io/bytes/bytes/index.html). In these situations it's simply
unacceptable for you to spend time doing things that are not your program, and waiting on the
allocator is not cool.

As I continue to learn Rust, it's difficult for me to predict where exactly allocations will happen.
So, I propose we play a quick trivia game: **Does this code invoke the allocator?**
@@ -54,10 +53,9 @@ fn my_function() {
}
```
**No**: Rust [knows how big](https://doc.rust-lang.org/std/mem/fn.size_of.html) the `Vec` type is,
and reserves a fixed amount of memory on the stack for the `v` vector. However, if we wanted to
reserve extra space (using `Vec::with_capacity`) the allocator would get invoked.
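For contrast, a small sketch (mine, not from the original example) that does reach the allocator:

```rust
fn my_function() {
    // Unlike a plain `Vec::new()`, requesting capacity up front reserves
    // heap space immediately, so this line invokes the allocator.
    let v: Vec<u8> = Vec::with_capacity(5);
    drop(v);
}
```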
## Example 2

@@ -67,10 +65,10 @@ fn my_function() {
}
```
**Yes**: Because Boxes allow us to work with things that are of unknown size, it has to allocate on
the heap. While the `Box` is unnecessary in this snippet (release builds will optimize out the
allocation), reserving heap space more generally is needed to pass a dynamically sized type to
another function.
## Example 3

@@ -80,20 +78,20 @@ fn my_function(v: Vec<u8>) {
}
```
**Maybe**: Depending on whether the Vector we were given has space available, we may or may not
allocate. Especially when dealing with code that you did not author, it's difficult to verify that
things behave as you expect them to.
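You can watch the "maybe" happen by checking spare capacity before pushing (a sketch of my own, not
part of the original examples):

```rust
fn my_function(mut v: Vec<u8>) {
    // `push` only allocates when the vector is already full -
    // a condition the caller controls, not this function.
    let will_allocate = v.len() == v.capacity();
    v.push(0);
    assert!(!will_allocate); // holds only because main() reserved room below
}

fn main() {
    my_function(Vec::with_capacity(1)); // no allocation inside my_function
}
```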
# Blowing Things Up

So, how exactly does QADAPT solve these problems? **Whenever an allocation or drop occurs in code
marked allocation-safe, QADAPT triggers a thread panic.** We don't want to let the program continue
as if nothing strange happened, _we want things to explode_.
However, you don't want code to panic in production because of circumstances you didn't predict.
Just like [`debug_assert!`](https://doc.rust-lang.org/std/macro.debug_assert.html), **QADAPT will
strip out its own code when building in release mode to guarantee no panics and no performance
impact.**
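The `debug_assert!` behavior being mirrored is easy to check for yourself:

```rust
fn main() {
    // Panics under `cargo run`, but is compiled away entirely under
    // `cargo run --release` - the same build-profile trick QADAPT relies on.
    debug_assert!(false, "only debug builds ever see this panic");
}
```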
Finally, there are three ways to have QADAPT check that your code will not invoke the allocator:

@@ -180,8 +178,8 @@ fn main() {
## Caveats

It's important to point out that QADAPT code is synchronous, so please be careful when mixing in
asynchronous functions:

```rust
use futures::future::Future;
@@ -208,16 +206,13 @@ fn main() {

# Conclusion
While there's a lot more to writing high-performance code than managing your usage of the allocator,
it's critical that you do use the allocator correctly. QADAPT will verify that your code is doing
what you expect. It's usable even on stable Rust from version 1.31 onward, which isn't the case for
most allocators. Version 1.0 was released today, and you can check it out over at
[crates.io](https://crates.io/crates/qadapt) or on [github](https://github.com/bspeice/qadapt).

I'm hoping to write more about high-performance Rust in the future, and I expect that QADAPT will
help guide that. If there are topics you're interested in, let me know in the comments below!

[qadapt]: https://crates.io/crates/qadapt
View File
@@ -6,28 +6,27 @@ category:
tags: [rust, understanding-allocations]
---
There's an alchemy of distilling complex technical topics into articles and videos that change the
way programmers see the tools they interact with on a regular basis. I knew what a linker was, but
there's a staggering amount of complexity in between
[the OS and `main()`](https://www.youtube.com/watch?v=dOfucXtyEsU). Rust programmers use the
[`Box`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html) type all the time, but there's a
rich history of the Rust language itself wrapped up in
[how special it is](https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/).

In a similar vein, this series attempts to look at code and understand how memory is used; the
complex choreography of operating system, compiler, and program that frees you to focus on
functionality far-flung from frivolous book-keeping. The Rust compiler relieves a great deal of the
cognitive burden associated with memory management, but we're going to step into its world for a
while.

Let's learn a bit about memory in Rust.
# Table of Contents

This series is intended as both learning and reference material; we'll work through the different
memory types Rust uses, and explain the implications of each. Ultimately, a summary will be provided
as a cheat sheet for easy future reference. To that end, a table of contents is in order:

- Foreword
- [Global Memory Usage: The Whole World](/2019/02/the-whole-world.html)
@@ -38,72 +37,76 @@ a table of contents is in order:

# Foreword
Rust's three defining features of
[Performance, Reliability, and Productivity](https://www.rust-lang.org/) are all driven to a great
degree by how the Rust compiler understands memory usage. Unlike managed memory languages (Java,
Python), Rust
[doesn't really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)
garbage collect; instead, it uses an
[ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) system to reason about
how long objects will last in your program. In some cases, if the life of an object is fairly
transient, Rust can make use of a very fast region called the "stack." When that's not possible,
Rust uses
[dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation)
and the ownership system to ensure you can't accidentally corrupt memory. It's not as fast, but it
is important to have available.
That said, there are specific situations in Rust where you'd never need to worry about the
stack/heap distinction! If you:

1. Never use `unsafe`
2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html)

...then it's not possible for you to use dynamic memory!
For some uses of Rust, typically embedded devices, these constraints are OK. They have very limited
memory, and the program binary size itself may significantly affect what's available! There's no
operating system able to manage this
["virtual memory"](https://en.wikipedia.org/wiki/Virtual_memory) thing, but that's not an issue
because there's only one running application. The
[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind, and
interacting with the "real world" through extra peripherals is accomplished by reading and writing
to [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html).
Most Rust programs find these requirements overly burdensome though. C++ developers would struggle
without access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector) (except those
hardcore no-STL people), and Rust developers would struggle without
[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But with the constraints above,
`std::vec` is actually a part of the
[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits. `Box`,
`Rc`, etc., are also unusable for the same reason.
Whether writing code for embedded devices or not, the important thing in both situations is how much
you know _before your application starts_ about what its memory usage will look like. In embedded
devices, there's a small, fixed amount of memory to use. In a browser, you have no idea how large
[google.com](https://www.google.com)'s home page is until you start trying to download it. The
compiler uses this knowledge (or lack thereof) to optimize how memory is used; put simply, your code
runs faster when the compiler can guarantee exactly how much memory your program needs while it's
running. This series is all about understanding how the compiler reasons about your program, with an
emphasis on the implications for performance.

Now let's address some conditions and caveats before going much further:
- We'll focus on "safe" Rust only; `unsafe` lets you use platform-specific allocation API's
  ([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we'll
  ignore.
- We'll assume a "debug" build of Rust code (what you get with `cargo run` and `cargo test`) and
  address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).
- All content will be run using Rust 1.32, as that's the highest currently supported in the
  [Compiler Explorer](https://godbolt.org/). As such, we'll avoid upcoming innovations like
  [compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)
  that are available in nightly.
- Because of the nature of the content, being able to read assembly is helpful. We'll keep it
  simple, but I [found](https://stackoverflow.com/a/4584131/1454178) a
  [refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop`
  [instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html) was helpful while writing
  this.
- I've tried to be precise in saying only what I can prove using the tools (ASM, docs) that are
  available, but if there's something said in error it will be corrected expeditiously. Please let
  me know at [bradlee@speice.io](mailto:bradlee@speice.io)
Finally, I'll do what I can to flag potential future changes, but the Rust docs have a notice worth
repeating:
> Rust does not currently have a rigorously and formally defined memory model.
>
View File
@@ -6,27 +6,26 @@ category:
tags: [rust, understanding-allocations]
---
The first memory type we'll look at is pretty special: when Rust can prove that a _value_ is fixed
for the life of a program (`const`), and when a _reference_ is unique for the life of a program
(`static` as a declaration, not
[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) as a
lifetime), we can make use of global memory. This special section of data is embedded directly in
the program binary so that variables are ready to go once the program loads; no additional
computation is necessary.

Understanding the value/reference distinction is important for reasons we'll go into below, and
while the
[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md) for
these two keywords is available, we'll take a hands-on approach to the topic.
# **const**

When a _value_ is guaranteed to be unchanging in your program (where "value" may be scalars,
`struct`s, etc.), you can declare it `const`. This tells the compiler that it's safe to treat the
value as never changing, and enables some interesting optimizations; not only is there no
initialization cost to creating the value (it is loaded at the same time as the executable parts of
your program), but the compiler can also copy the value around if it speeds up the code.

The points we need to address when talking about `const` are:
@@ -38,12 +37,12 @@ The points we need to address when talking about `const` are:

The first point is a bit strange - "read-only memory."
[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants)
mentions in a couple places that using `mut` with constants is illegal, but it's also important to
demonstrate just how immutable they are. _Typically_ in Rust you can use
[interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) to modify
things that aren't declared `mut`.

[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an example of this
pattern in action:
```rust
use std::cell::RefCell;

@@ -64,7 +63,8 @@ fn main() {
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)
When `const` is involved though, interior mutability is impossible:

@@ -86,7 +86,8 @@ fn main() {
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)

And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):
@@ -105,18 +106,20 @@ fn main() {
}
```

-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)
When the
[`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)
refers to ["rvalues"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this
behavior is what they refer to. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this
as an error, but it's still something to be aware of.
## Initialization == Compilation

The next thing to mention is that `const` values are loaded into memory _as part of your program
binary_. Because of this, any `const` values declared in your program will be "realized" at
compile-time; accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may
be able to prefetch the value), but that's it.

```rust
@@ -132,8 +135,8 @@ pub fn multiply(value: u32) -> u32 {

-- [Compiler Explorer](https://godbolt.org/z/Th8boO)

The compiler creates one `RefCell`, uses it everywhere, and never needs to call the `RefCell::new`
function.
## Copying

@@ -155,39 +158,38 @@ pub fn multiply_twice(value: u32) -> u32 {

-- [Compiler Explorer](https://godbolt.org/z/ZtS54X)
In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the
`multiply` and `multiply_twice` functions; the "1000" value is never "stored" anywhere, as it's
small enough to inline into the assembly instructions.

Finally, getting the address of a `const` value is possible, but not guaranteed to be unique
(because the compiler can choose to copy values). I was unable to get non-unique pointers in my
testing (even using different crates), but the specifications are clear enough: _don't rely on
pointers to `const` values being consistent_. To be frank, caring about locations for `const` values
is almost certainly a code smell.
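If you want to test that yourself, comparing addresses is a one-liner (a sketch; as noted, neither
outcome is guaranteed):

```rust
const FACTOR: u32 = 1000;

fn main() {
    // The compiler may hand back the same address twice, or two different
    // ones - the specification allows either.
    println!("{:p} vs {:p}", &FACTOR, &FACTOR);
}
```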
# **static**

Static variables are related to `const` variables, but take a slightly different approach. When we
declare that a _reference_ is unique for the life of a program, we have a `static` variable
(unrelated to the `'static` lifetime). Because of the reference/value distinction with
`const`/`static`, static variables behave much more like typical "global" variables.
But to understand `static`, here's what we'll look at:

- `static` variables are globally unique locations in memory.
- Like `const`, `static` variables are loaded at the same time as your program being read into
  memory.
- All `static` variables must implement the
  [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) marker trait.
- Interior mutability is safe and acceptable when using `static` variables.
## Memory Uniqueness

The single biggest difference between `const` and `static` is the guarantees provided about
uniqueness. Where `const` variables may or may not be copied in code, `static` variables are
guaranteed to be unique. If we take a previous `const` example and change it to `static`, the
difference should be clear:
```rust
static FACTOR: u32 = 1000;

@@ -205,15 +207,14 @@ pub fn multiply_twice(value: u32) -> u32 {

-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ)
Where [previously](#copying) there were plenty of references to multiplying by 1000, the new
assembly refers to `FACTOR` as a named memory location instead. No initialization work needs to be
done, but the compiler can no longer prove the value never changes during execution.
## Initialization == Compilation

Next, let's talk about initialization. The simplest case is initializing static variables with
either scalar or struct notation:

```rust
#[derive(Debug)]
@@ -234,7 +235,8 @@ fn main() {
}
```

-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
Things can get a bit weirder when using `const fn` though. In most cases, it just works:

@@ -257,13 +259,15 @@ fn main() {
}
```

-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
However, there's a caveat: you're currently not allowed to use `const fn` to initialize static
variables of types that aren't marked `Sync`. For example,
[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) is a
`const fn`, but because
[`RefCell` isn't `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), you'll
get an error at compile time:

```rust
use std::cell::RefCell;
@@ -272,22 +276,23 @@ use std::cell::RefCell;
static MY_LOCK: RefCell<u8> = RefCell::new(0);
```

-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
It's likely that this will
[change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.

## **Sync**
Which leads well to the next point: static variable types must implement the
[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). Because they're globally
unique, it must be safe for you to access static variables from any thread at any time. Most
`struct` definitions automatically implement the `Sync` trait because they contain only elements
which themselves implement `Sync` (read more in the
[Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)). This is why earlier examples could
get away with initializing statics, even though we never included an `impl Sync for MyStruct` in the
code. To demonstrate this property, Rust refuses to compile our earlier example if we add a
non-`Sync` element to the `struct` definition:
```rust
use std::cell::RefCell;

@@ -304,13 +309,13 @@ static MY_STRUCT: MyStruct = MyStruct {
};
```

-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
## Interior Mutability

Finally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. If we
want to stay in `safe` Rust, we can use interior mutability to accomplish similar goals:

```rust
use std::sync::Once;
@@ -328,4 +333,5 @@ fn main() {
}
```

-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)
View File
@@ -6,58 +6,57 @@ category:
tags: [rust, understanding-allocations]
---
`const` and `static` are perfectly fine, but it's relatively rare that we know at compile-time about
either values or references that will be the same for the duration of our program. Put another way,
it's not often the case that either you or your compiler knows how much memory your entire program
will ever need.
However, there are still some optimizations the compiler can do if it knows how much memory
individual functions will need. Specifically, the compiler can make use of "stack" memory (as
opposed to "heap" memory) which can be managed far faster in both the short- and long-term. When
requesting memory, the [`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)
can typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) (<1
nanosecond on modern CPUs). Contrast that to heap memory, which requires an allocator (specialized
software to track what memory is in use) to reserve space. When you're finished with stack memory,
the `pop` instruction runs in 1-3 cycles, as opposed to an allocator needing to worry about memory
fragmentation and other issues with the heap. All sorts of incredibly sophisticated techniques have
been used to design allocators:
- [Garbage Collection](<https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)>)
  strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) (used in
  [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) and
  [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) (used in
  [Python](https://docs.python.org/3/extending/extending.html#reference-counts))
- Thread-local structures to prevent locking the allocator in
  [tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)
- Arena structures used in [jemalloc](http://jemalloc.net/), which
  [until recently](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default)
  was the primary allocator for Rust programs!
But no matter how fast your allocator is, the principle remains: the fastest allocator is the one you never use. As such, we're not going to discuss how exactly the [`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), but we'll focus instead on the conditions that enable the Rust compiler to use faster stack-based allocation for variables.

So, **how do we know when Rust will or will not use stack allocation for objects we create?**
Looking at other languages, it's often easy to delineate between stack and heap. Managed memory languages (Python, Java, [C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) place everything on the heap. JIT compilers ([PyPy](https://www.pypy.org/), [HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize some heap allocations away, but you should never assume it will happen. C makes things clear with calls to special functions (like [malloc(3)](https://linux.die.net/man/3/malloc)) needed to access heap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though modern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii).

For Rust, we can summarize as follows: **stack allocation will be used for everything that doesn't involve "smart pointers" and collections**. We'll skip over a precise definition of the term "smart pointer" for now, and instead discuss what we should watch for to understand when stack and heap memory regions are used:
1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) indicate allocation of stack memory:
   ```rust
   pub fn stack_alloc(x: u32) -> u32 {
       // ...
   }
   ```

   -- [Compiler Explorer](https://godbolt.org/z/5WSgc9)
2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to watch for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened in the recent past:
   ```rust
   pub fn heap_alloc(x: usize) -> usize {
       // ...
   }
   ```

   -- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)
   <span style="font-size: .8em">Note: While the [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) is [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46), the Rust standard library only defines `Drop` implementations for types that involve heap allocation.</span>
3. If you don't want to inspect the assembly, use a custom allocator that's able to track and alert when heap allocations occur. Crates like [`alloc_counter`](https://crates.io/crates/alloc_counter) are designed for exactly this purpose; a short sketch follows this list.
With all that in mind, let's talk about situations in which we're guaranteed to use stack memory:

- Structs are created on the stack.
- Function arguments are passed on the stack, meaning the [`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute) will not change the memory region used.
- Enums and unions are stack-allocated.
- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are always stack-allocated.
- Closures capture their arguments on the stack.
- Generics will use stack allocation, even with dynamic dispatch.
- [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be stack-allocated, and copying them will be done in stack memory.
- [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library are stack-allocated even when iterating over heap-based collections.
# Structs

The simplest case comes first. When creating vanilla `struct` objects, we use stack memory to hold their contents:
```rust
struct Point {
    // ...
}

pub fn make_line() {
    // ...
}
```

-- [Compiler Explorer](https://godbolt.org/z/vri9BE)
Note that while some extra-fancy instructions are used for memory manipulation in the assembly, the `sub rsp, 64` instruction indicates we're still working with the stack.
# Function arguments

Have you ever wondered how functions communicate with each other? Like, once the variables are given to you, everything's fine. But how do you "give" those variables to another function? How do you get the results back afterward? The answer: the compiler arranges memory and assembly instructions using a pre-determined [calling convention](http://llvm.org/docs/LangRef.html#calling-conventions). This convention governs the rules around where arguments needed by a function will be located (either in memory offsets relative to the stack pointer `rsp`, or in other registers), and where the results can be found once the function has finished. And when multiple languages agree on what the calling conventions are, you can do things like having [Go call Rust code](https://blog.filippo.io/rustgo/)!
Put simply: it's the compiler's job to figure out how to call other functions, and you can assume that the compiler is good at its job.
```rust
pub fn total_distance() {
    // ...
}
```
-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)
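Because the diff elides the body of that example, here's a sketch of the shape it likely takes; the fields and the `distance` math are assumptions for illustration:

```rust
struct Point {
    x: i64,
    y: i64,
}

fn distance(a: &Point, b: &Point) -> i64 {
    // Arguments arrive in registers or at offsets from `rsp`,
    // exactly as the calling convention dictates.
    (a.x - b.x).abs() + (a.y - b.y).abs()
}

pub fn total_distance() {
    let start = Point { x: 1, y: 2 };
    let middle = Point { x: 3, y: 4 };
    let end = Point { x: 5, y: 6 };
    let _total = distance(&start, &middle) + distance(&middle, &end);
}
```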
As a consequence of function arguments never using heap memory, we can also infer that functions using the `#[inline]` attribute do not heap allocate either. But better than inferring, we can look at the assembly to prove it:
```rust
struct Point {
    // ...
}

pub fn total_distance() {
    // ...
}
```

-- [Compiler Explorer](https://godbolt.org/z/30Sh66)
Finally, passing by value (arguments with type [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html)) and passing by reference (either moving ownership or passing a pointer) may have slightly different layouts in assembly, but will still use either stack memory or CPU registers:
```rust
pub struct Point {
    // ...
}

pub fn distance_borrowed(a: &Point, b: &Point) -> i64 {
    // ...
}
```
# Enums

If you've ever worried that wrapping your types in [`Option`](https://doc.rust-lang.org/stable/core/option/enum.Option.html) or [`Result`](https://doc.rust-lang.org/stable/core/result/enum.Result.html) would finally make them large enough that Rust decides to use heap allocation instead, fear no longer: `enum` and union types don't use heap allocation:
```rust
enum MyEnum {
    // ...
}

pub fn enum_compare() {
    // ...
}
```

-- [Compiler Explorer](https://godbolt.org/z/HK7zBx)
Because the size of an `enum` is the size of its largest element plus a flag, the compiler can predict how much memory is used no matter which variant of an enum is currently stored in a variable. Thus, enums and unions have no need of heap allocation. There's unfortunately not a great way to show this in assembly, so I'll instead point you to the [`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums) documentation.
# Arrays

The array type is guaranteed to be stack allocated, which is why the array size must be declared. Interestingly enough, this can be used to cause safe Rust programs to crash:
```rust
// 256 bytes
// ...

fn main() {
    // ...
}
```

-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4)
There aren't any security implications of this (no memory corruption occurs), but it's good to note that the Rust compiler won't move arrays into heap memory even if they can be reasonably expected to overflow the stack.
# Closures

Rules for how anonymous functions capture their arguments are typically language-specific. In Java, [Lambda Expressions](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html) are actually objects created on the heap that capture local primitives by copying, and capture local non-primitives as (`final`) references. [Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and [JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/) both bind _everything_ by reference normally, but Python can also [capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has [Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).
In Rust, arguments to closures are the same as arguments to other functions; closures are simply functions that don't have a declared name. Some weird ordering of the stack may be required to handle them, but it's the compiler's responsibility to figure that out.
Each example below has the same effect, but a different assembly implementation. In the simplest case, we immediately run a closure returned by another function. Because we don't store a reference to the closure, the stack memory needed to store the captured values is contiguous:
```rust
fn my_func() -> impl FnOnce() {
    // ...
}

pub fn immediate() {
    // ...
}
```

-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions
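A hypothetical reconstruction of that example (the captured value and assertion are my own invention):

```rust
fn my_func() -> impl FnOnce() {
    let x = 24;
    // `move` copies x into the closure's storage, which the caller
    // keeps in its own stack frame.
    move || assert_eq!(x, 24)
}

pub fn immediate() {
    // Build the closure and call it right away; its captures never
    // outlive this stack frame.
    my_func()();
}
```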
If we store a reference to the closure, the Rust compiler keeps values it needs in the stack memory of the original function. Getting the details right is a bit harder, so the instruction count goes up even though this code is functionally equivalent to our original example:
```rust
pub fn simple_reference() {
    // ...
}
```

In every circumstance though, the compiler ensured that no heap allocations were necessary.
# Generics

Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) and dynamic dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often _associated_ with trait objects being stored in the heap, dynamic dispatch can be used with stack allocated objects as well:
```rust
trait GetInt {
    // ...
}

pub fn do_call() {
    // ...
}
```

-- [Compiler Explorer](https://godbolt.org/z/u_yguS)
It's hard to imagine practical situations where dynamic dispatch would be used for objects that aren't heap allocated, but it technically can be done.
# Copy types

Understanding move semantics and copy semantics in Rust is weird at first. The Rust docs [go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html) far better than can be addressed here, so I'll leave them to do the job. From a memory perspective though, their guideline is reasonable: [if your type can implement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).
While there are potential speed tradeoffs to _benchmark_ when discussing `Copy` (move semantics for stack objects vs. copying stack pointers vs. copying stack `struct`s), _it's impossible for `Copy` to introduce a heap allocation_.
But why is this the case? Fundamentally, it's because the language controls what `Copy` means: ["the behavior of `Copy` is not overloadable"](https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone) because it's a marker trait. From there we'll note that a type [can implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy) if (and only if) its components implement `Copy`, and that [no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors). Thus, assignments involving heap types are always move semantics, and new heap allocations won't occur because of implicit operator behavior.
```rust
#[derive(Clone)]
struct NotCopyable {
    // ...
}
```
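Filling in the elided body with a hypothetical field makes the point visible; `Vec` owns heap memory, so the struct can derive `Clone` but never `Copy`:

```rust
#[derive(Clone)]
struct NotCopyable {
    data: Vec<u64>, // stand-in field: Vec is not Copy
}

fn main() {
    let a = NotCopyable { data: vec![1, 2, 3] };
    let b = a; // move: ownership transfers, no new heap allocation
    let c = b.clone(); // explicit clone: this one does allocate
    assert_eq!(c.data, vec![1, 2, 3]);
}
```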
# Iterators

In managed memory languages (like [Java](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)), there's a subtle difference between these two code samples:
```java
public static int sum_for(List<Long> vals) {
    // indexed loop: no iterator object is created
    // ...
}

public static int sum_foreach(List<Long> vals) {
    // for-each loop: allocates an Iterator behind the scenes
    // ...
}
```
In the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`, an object of type [`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html) is allocated on the heap, and will eventually be garbage-collected. This isn't a great design; iterators are often transient objects that you need during a function and can discard once the function ends. Sounds exactly like the issue stack-allocated objects address, no?
In Rust, iterators are allocated on the stack. The objects to iterate over are almost certainly in heap memory, but the iterator itself ([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn't need to use the heap. In each of the examples below we iterate over a collection, but never use heap allocation:
```rust
use std::collections::HashMap;
// ...
```
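The block above is cut off in this diff; here's a sketch of the kind of functions it demonstrates (names and signatures are my own):

```rust
use std::collections::HashMap;

pub fn sum_vec(x: &[u32]) -> u32 {
    // Iter<'_, u32> is a small stack-resident struct, even though
    // the slice's contents may live on the heap.
    x.iter().sum()
}

pub fn sum_hashmap(x: &HashMap<u32, u32>) -> u32 {
    // Same story: the Values iterator lives on the stack.
    x.values().sum()
}
```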
category:
tags: [rust, understanding-allocations]
---
Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), and some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust, the language's use of dynamic memory (also referred to as the **heap**) is governed by a system called _ownership_. And as the docs mention, ownership [is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).
The heap is used in two situations: when the compiler is unable to predict either the _total size of memory needed_ or _how long the memory is needed for_, it allocates space in the heap. This happens pretty frequently; if you want to download the Google home page, you won't know how large it is until your program runs. And when you're finished with Google, the memory is deallocated so it can be used to store other webpages. If you're interested in a slightly longer explanation of the heap, check out [The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap) in Rust's documentation.
We won't go into detail on how the heap is managed; the [ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) does a phenomenal job explaining both the "why" and "how" of memory management. Instead, we're going to focus on understanding "when" heap allocations occur in Rust.
To start off, take a guess for how many allocations happen in the program below: To start off, take a guess for how many allocations happen in the program below:
```rust
fn main() {}
```
It's obviously a trick question; while no heap allocations occur as a result of that code, the setup needed to call `main` does allocate on the heap. Here's a way to show it:
```rust
#![feature(integer_atomics)]
// ...

fn main() {
    // ...
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e)
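Here's a stable-Rust sketch of the same idea; the original used `#![feature(integer_atomics)]` before `AtomicU64` was stabilized, and everything else here is my own reconstruction rather than the original code:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

static ALLOCATED: AtomicU64 = AtomicU64::new(0);

struct Counter;

unsafe impl GlobalAlloc for Counter {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(1, Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static A: Counter = Counter;

fn main() {
    // Anything counted by now happened during runtime setup,
    // before main() started executing.
    println!("allocations: {}", ALLOCATED.load(Ordering::SeqCst));
}
```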
As of the time of writing, there are five allocations that happen before `main` is ever called.

But when we want to understand more practically where heap allocation happens, we'll follow this guide:

- Smart pointers hold their contents in the heap
- Collections are smart pointers for many objects at a time, and reallocate when they need to grow

Finally, there are two "addendum" issues that are important to address when discussing Rust and the heap:

- Non-heap alternatives to many standard library types are available.
- Special allocators to track memory behavior should be used to benchmark code.
# Smart pointers

The first things to note are the "smart pointer" types. When you have data that must outlive the scope in which it is declared, or your data is of unknown or dynamic size, you'll make use of these types.
The term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer) comes from C++, and while it's closely linked to a general design pattern of ["Resource Acquisition Is Initialization"](https://en.cppreference.com/w/cpp/language/raii), we'll use it here specifically to describe objects that are responsible for managing ownership of data allocated on the heap. The smart pointers available in the `alloc` crate should look mostly familiar:

- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)
The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers to manage heap objects, though there are more than can be covered here. Some examples are:
- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)
- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)
Finally, there is one ["gotcha"](https://www.merriam-webster.com/dictionary/gotcha): **cell types** (like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) look and behave similarly, but **don't involve heap allocation**. The [`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html) have more information.
When a smart pointer is created, the data it is given is placed in heap memory and the location of that data is recorded in the smart pointer. Once the smart pointer has determined it's safe to deallocate that memory (when a `Box` has [gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or a reference count [goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), the heap space is reclaimed. We can prove these types use heap memory by looking at code:
```rust
use std::rc::Rc;
// ...

pub fn my_cow() {
    // ...
}
```
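A hypothetical reconstruction of the kinds of functions in that example (only `my_cow` appears in this diff; the rest are my own):

```rust
use std::borrow::Cow;
use std::rc::Rc;

pub fn my_box() {
    // The value is copied into a fresh heap allocation owned by Box.
    let _x = Box::new(0u64);
}

pub fn my_rc() {
    // Rc stores the value and its reference counts on the heap.
    let _x = Rc::new(0u64);
}

pub fn my_cow(s: &str) -> Cow<'_, str> {
    // Borrowed input stays borrowed; Cow allocates only when an
    // owned copy is actually needed.
    Cow::from(s)
}
```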
# Collections

Collection types use heap memory because their contents have dynamic size; they will request more memory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can [release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) when it's no longer necessary. This dynamic property forces Rust to heap allocate everything they contain. In a way, **collections are smart pointers for many objects at a time**. Common types that fall under this umbrella are [`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html), [`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and [`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) (not [`str`](https://doc.rust-lang.org/std/primitive.str.html)).
While collections store the objects they own in heap memory, _creating new collections will not allocate on the heap_. This is a bit weird; if we call `Vec::new()`, the assembly shows a corresponding call to `real_drop_in_place`:
```rust
pub fn my_vec() {
    // ...
}
```

-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)
But because the vector has no elements to manage, no calls to the allocator will ever be dispatched:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
// ...

unsafe impl GlobalAlloc for PanicAllocator {
    // ...
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6)
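Here's one way such an allocator can be written; the `ARMED` flag is my own device to let the pre-`main` setup allocations through, and is not part of the original example:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical arming flag: allocations only panic once main() flips it.
static ARMED: AtomicBool = AtomicBool::new(false);

struct PanicAllocator;

#[global_allocator]
static GLOBAL: PanicAllocator = PanicAllocator;

unsafe impl GlobalAlloc for PanicAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Disarm before panicking so the panic machinery itself
        // can still allocate without recursing.
        if ARMED.swap(false, Ordering::SeqCst) {
            panic!("unexpected heap allocation");
        }
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

fn main() {
    ARMED.store(true, Ordering::SeqCst);
    let _v: Vec<u64> = Vec::new(); // no allocation: does not panic
    ARMED.store(false, Ordering::SeqCst);
}
```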
Other standard library types follow the same behavior; make sure to check out
[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new), [`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new),
and [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new).
# Heap Alternatives

While it is a bit strange to speak of the stack after spending time with the heap, it's worth pointing out that some heap-allocated objects in Rust have stack-based counterparts provided by other crates. If you need the functionality but want to avoid allocating, there are typically alternatives available.
When it comes to some standard library smart pointers ([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives are provided in crates like [parking_lot](https://crates.io/crates/parking_lot) and [spin](https://crates.io/crates/spin). You can check out [`lock_api::RwLock`](https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLock.html), [`lock_api::Mutex`](https://docs.rs/lock_api/0.1.5/lock_api/struct.Mutex.html), and [`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html) if you're in need of synchronization primitives.
[thread_id](https://crates.io/crates/thread-id) may be necessary if you're implementing an allocator because [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) uses a [`thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#17-36) that needs heap allocation.
# Tracing Allocators

When writing performance-sensitive code, there's no alternative to measuring your code. If you didn't write a benchmark, [you don't care about its performance](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=263). You should never rely on your instincts when [a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM).
Similarly, there's great work going on in Rust with allocators that keep track of what they're doing (like [`alloc_counter`](https://crates.io/crates/alloc_counter)). When it comes to tracking heap behavior, it's easy to make mistakes; please write tests and make sure you have tools to guard against future issues.
category:
tags: [rust, understanding-allocations]
---
**Update 2019-02-10**: When debugging a [related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1), it was discovered that the original code worked because LLVM optimized out the entire function, rather than just the allocation segments. The code has been updated with proper use of [`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html), and a previous section on vector capacity has been removed.

---
Up to this point, we've been discussing memory usage in the Rust language by focusing on simple rules that are mostly right for small chunks of code. We've spent time showing how those rules work themselves out in practice, and become familiar with reading the assembly code needed to see each memory type (global, stack, heap) in action.

Throughout the series so far, we've put a handicap on the code. In the name of consistent and understandable results, we've asked the compiler to pretty please leave the training wheels on. Now is the time where we throw out all the rules and take off the kid gloves. As it turns out, both the Rust compiler and the LLVM optimizers are incredibly sophisticated, and we'll step back and let them do their job.
Similar to ["What Has My Compiler Done For Me Lately?"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we're focusing on interesting things the Rust language (and LLVM!) can do with memory management. We'll still be looking at assembly code to understand what's going on, but it's important to mention again: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to double-check memory behavior if it's something you care about**. It's far too easy to mis-read assembly in large code sections; you should always verify behavior if you care about memory usage.
The guiding principle as we move forward is this: _optimizing compilers won't produce worse programs than we started with._ There won't be any situations where stack allocations get moved to heap allocations. There will, however, be an opera of optimization.
# The Case of the Disappearing Box

Our first optimization comes when LLVM can reason that the lifetime of an object is sufficiently short that heap allocations aren't necessary. In these cases, LLVM will move the allocation to the stack instead! The way this interacts with `#[inline]` attributes is a bit opaque, but the important part is that LLVM can sometimes do better than the baseline Rust language:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
// ...

unsafe impl GlobalAlloc for PanicAllocator {
    // ...
}
```
-- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)
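A tiny sketch of the effect; the function name here is invented, and you'd check the release-mode assembly to confirm no allocator calls remain:

```rust
pub fn disappearing_box() -> u8 {
    // In release builds LLVM can prove this Box never escapes,
    // so the heap allocation is elided entirely.
    let x = Box::new(42u8);
    *x
}
```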
# Dr. Array or: How I Learned to Love the Optimizer

Finally, this isn't so much about LLVM figuring out different memory behavior, but LLVM stripping out code that doesn't do anything. Optimizations of this type have a lot of nuance to them; if you're not careful, they can make your benchmarks look [impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). In Rust, the `black_box` function (implemented in both [`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and [`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler to disable this kind of optimization. But if you let LLVM remove unnecessary code, you can end up running programs that previously caused errors:
```rust
#[derive(Default)]
// ...

pub fn main() {
    // ...
}
```
-- [Compiler Explorer](https://godbolt.org/z/daHn7P)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)
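And for the benchmark-skewing direction, here's a sketch of guarding a value with `black_box` so the optimizer can't delete it; shown with `criterion`'s version, which requires the crate as a dependency:

```rust
use criterion::black_box;

fn main() {
    let data = [0u8; 1024];
    // black_box pretends to observe its argument, so LLVM must
    // actually materialize the array instead of removing it.
    black_box(&data);
}
```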
category:
tags: [rust, understanding-allocations]
---
While there's a lot of interesting detail captured in this series, it's often helpful to have a document that answers some "yes/no" questions. You may not care about what an `Iterator` looks like in assembly; you just need to know whether it allocates an object on the heap or not. And while Rust will prioritize the fastest behavior it can, here are the rules for each memory type:
**Heap Allocation**:

- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.
- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory.
- Some smart pointers in the standard library have counterparts in other crates that don't need heap memory. If possible, use those.
**Stack Allocation**:
- Structs, enums, unions, and arrays are created on the stack.
- Function arguments are passed on the stack.
- Closures capture their arguments on the stack.
- Generics use stack allocation, even with dynamic dispatch.
- `Copy` types are guaranteed to be stack-allocated.
- `Iterator`s in the standard library are stack-allocated, even when iterating over heap-based collections.

**Global Allocation**:
- `const` is a fixed value; the compiler is allowed to copy it wherever useful.
- `static` is a fixed reference; the compiler will guarantee it is unique.
![Container Sizes in Rust](/assets/images/2019-02-04-container-size.svg)
-- [Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing)
category:
tags: [baking]
---
Having recently started my "gardening leave" between positions, I have some more personal time available. I'm planning to stay productive, contributing to some open-source projects, but it also occurred to me that despite [talking about](https://speice.io/2018/05/hello.html) bread pics, this blog has been purely technical. Maybe I'll change the site title from "The Old Speice Guy" to "Bites and Bytes"?
Either way, I'm baking a little bit again, and figured it was worth taking a quick break to focus on some lighter material. I recently learned two critically important lessons: first, the temperature of the dough when you put the yeast in makes a huge difference.
Previously, when I wasn't paying attention to dough temperature:

![Whole wheat dough](/assets/images/2019-05-03-making-bread/whole-wheat-not-rising.jpg)
Compared with what happens when I put the dough in the microwave for a defrost cycle because the water I used wasn't warm enough:

![White dough](/assets/images/2019-05-03-making-bread/white-dough-rising-before-fold.jpg)
After shaping the dough, I've got two loaves ready:
![Shaped loaves](/assets/images/2019-05-03-making-bread/shaped-loaves.jpg)
Now, the recipe normally calls for a Dutch Oven to bake the bread because it keeps the dough from drying out in the oven. Because I don't own a Dutch Oven, I typically put a casserole dish on the bottom rack and fill it with water so there's still some moisture in the oven. This time, I forgot to add the water and learned my second lesson: never add room-temperature water to a glass dish that's currently at 500 degrees.
![Shattered glass dish](/assets/images/2019-05-03-making-bread/shattered-glass.jpg)
Needless to say, trying to pull out sharp glass from an incredibly hot oven is not what I expected to be doing during my garden leave.
In the end, the bread crust wasn't great, but the bread itself turned out pretty alright:

![Baked bread](/assets/images/2019-05-03-making-bread/final-product.jpg)
I've been writing a lot more during this break, so I'm looking forward to sharing that in the future. In the meantime, I'm planning on making a sandwich.
tags: []
---
**Update 2019-09-21**: Added notes on `isolcpus` and `systemd` affinity.

Prior to working in the trading industry, my assumption was that High Frequency Trading (HFT) is made up of people who have access to secret techniques mortal developers could only dream of. There had to be some secret art that could only be learned if one had an appropriately tragic backstory:
<img src="/assets/images/2019-04-24-kung-fu.webp" alt="kung-fu fight">

> How I assumed HFT people learn their secret techniques
How else do you explain people working on systems that complete the round trip of market data in to orders out (a.k.a. tick-to-trade) consistently within [750-800 nanoseconds](https://stackoverflow.com/a/22082528/1454178)? In roughly the time it takes a computer to access [main memory 8 times](https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html), trading systems are capable of reading the market data packets, deciding what orders to send, doing risk checks, creating new packets for exchange-specific protocols, and putting those packets on the wire.
Having now worked in the trading industry, I can confirm the developers aren't super-human; I've
made some simple mistakes at the very least. Instead, what shows up in public discussions is that
philosophy, not technique, separates high-performance systems from everything else.
Performance-critical systems don't rely on "this one cool C++ optimization trick" to make code fast
(though micro-optimizations have their place); there's a lot more to worry about than just the code
written for the project.
The framework I'd propose is this: **If you want to build high-performance systems, focus first on
reducing performance variance** (reducing the gap between the fastest and slowest runs of the same
code), **and only look at average latency once variance is at an acceptable level**.
Don't get me wrong, I'm a much happier person when things are fast. Computer goes from booting in 20
seconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes
a full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a
week is the same in each situation, but you're painfully aware of that minute when it happens. When
it comes to code, the principle is the same: speeding up a function by an average of 10 milliseconds
doesn't mean much if there's a 100ms difference between your fastest and slowest runs. When
performance matters, you need to respond quickly _every time_, not just in aggregate.
High-performance systems should first optimize for time variance. Once you're consistent at the time
scale you care about, then focus on improving average time.
This focus on variance shows up all the time in industry too (emphasis added in all quotes below):
- In [marketing materials](https://business.nasdaq.com/market-tech/marketplaces/trading) for
  NASDAQ's matching engine, the most performance-sensitive component of the exchange, dependability
  is highlighted in addition to instantaneous metrics:
  > Able to **consistently sustain** an order rate of over 100,000 orders per second at sub-40
  > microsecond average latency
- The [Aeron](https://github.com/real-logic/aeron) message bus has this to say about performance:
  > Performance is the key focus. Aeron is designed to be the highest throughput with the lowest and
  > **most predictable latency possible** of any messaging system
- The company PolySync, which is working on autonomous vehicles,
  [mentions why](https://polysync.io/blog/session-types-for-hearty-codecs/) they picked their
  specific messaging format:
  > In general, high performance is almost always desirable for serialization. But in the world of
  > autonomous vehicles, **steady timing performance is even more important** than peak throughput.
  > This is because safe operation is sensitive to timing outliers. Nobody wants the system that
  > decides when to slam on the brakes to occasionally take 100 times longer than usual to encode
  > its commands.
- [Solarflare](https://solarflare.com/), which makes highly-specialized network hardware, points out
  variance (jitter) as a big concern for
  [electronic trading](https://solarflare.com/electronic-trading/):
  > The high stakes world of electronic trading, investment banks, market makers, hedge funds and
  > exchanges demand the **lowest possible latency and jitter** while utilizing the highest
  > bandwidth and return on their investment.
And to further clarify: we're not discussing _total run-time_, but variance of total run-time. There
are situations where it's not reasonably possible to make things faster, and you'd much rather be
consistent. For example, trading firms use
[wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because
the speed of light through air is faster than through fiber-optic cables. There's still at _absolute
minimum_ a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between,
say,
[Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago).
If a trading system in Chicago calls the function for "send order to Tokyo" and waits to see if a
trade occurs, there's a physical limit to how long that will take. In this situation, the focus is
on keeping variance of _additional processing_ to a minimum, since speed of light is the limiting
factor.
So how does one go about looking for and eliminating performance variance? To tell the truth, I
don't think a systematic answer or flow-chart exists. There's no substitute for (A) building a deep
understanding of the entire technology stack, and (B) actually measuring system performance (though
(C) watching a lot of [CppCon](https://www.youtube.com/channel/UCMlGfpWw-RUdWX_JbLCukXg) videos for
inspiration never hurt). Even then, every project cares about performance to a different degree; you
may need to build an entire
[replica production system](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=3015) to
accurately benchmark at nanosecond precision, or you may be content to simply
[avoid garbage collection](https://www.youtube.com/watch?v=BD9cRbxWQx8&feature=youtu.be&t=1335) in
your Java code.
Even though everyone has different needs, there are still common things to look for when trying to
isolate and eliminate variance. In no particular order, these are my focus areas when thinking about
high-performance systems:
## Language-specific
**Garbage Collection**: How often does garbage collection happen? When is it triggered? What are the
impacts?
- [In Python](https://rushter.com/blog/python-garbage-collector/), individual objects are collected
  if the reference count reaches 0, and each generation is collected if
  `num_alloc - num_dealloc > gc_threshold` whenever an allocation happens. The GIL is acquired for
  the duration of generational collection; there's a sketch of working around this below the list.
- Java has
  [many](https://docs.oracle.com/en/java/javase/12/gctuning/parallel-collector1.html#GUID-DCDD6E46-0406-41D1-AB49-FB96A50EB9CE)
  [different](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector.html#GUID-ED3AB6D3-FD9B-4447-9EDF-983ED2F7A573)
  [collection](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector-tuning.html#GUID-90E30ACA-8040-432E-B3A0-1E0440AB556A)
  [algorithms](https://docs.oracle.com/en/java/javase/12/gctuning/z-garbage-collector1.html#GUID-A5A42691-095E-47BA-B6DC-FB4E5FAA43D0)
  to choose from, each with different characteristics. The default algorithms (Parallel GC in Java
  8, G1 in Java 9) freeze the JVM while collecting, while more recent algorithms
  ([ZGC](https://wiki.openjdk.java.net/display/zgc) and
  [Shenandoah](https://wiki.openjdk.java.net/display/shenandoah)) are designed to keep "stop the
  world" to a minimum by doing collection work in parallel.
**Allocation**: Every language has a different way of interacting with "heap" memory, but the
principle is the same: running the allocator to allocate/deallocate memory takes time that can often
be put to better use. Understanding when your language interacts with the allocator is crucial, and
not always obvious. For example: C++ and Rust don't allocate heap memory for iterators, but Java
does (meaning potential GC pauses). Take time to understand heap behavior (I made
[a guide for Rust](/2019/02/understanding-allocations-in-rust.html)), and look into alternative
allocators ([jemalloc](http://jemalloc.net/),
[tcmalloc](https://gperftools.github.io/gperftools/tcmalloc.html)) that might run faster than the
operating system default.
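Python won't let you swap in jemalloc easily, but the standard library can at least show where
allocations happen. A small sketch using `tracemalloc` (the workload is a placeholder):

```python
import tracemalloc

tracemalloc.start()

# Something that allocates behind the scenes
data = [str(i) for i in range(100_000)]

# Show the source lines responsible for the most allocated memory
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```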
**Data Layout**: How your data is arranged in memory matters;
[data-oriented design](https://www.youtube.com/watch?v=yy8jQgmhbAU) and
[cache locality](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=1185) can have huge
impacts on performance. The C family of languages (C, value types in C#, C++) and Rust all have
guarantees about the shape every object takes in memory that others (e.g. Java and Python) can't
make. [Cachegrind](http://valgrind.org/docs/manual/cg-manual.html) and kernel
[perf](https://perf.wiki.kernel.org/index.php/Main_Page) counters are both great for understanding
how performance relates to memory layout.
**Just-In-Time Compilation**: Languages that are compiled on the fly (LuaJIT, C#, Java, PyPy) are
great because they optimize your program for how it's actually being used, rather than how a
compiler expects it to be used. However, there's a variance problem if the program stops executing
while waiting for translation from VM bytecode to native code. As a remedy, many languages support
ahead-of-time compilation in addition to the JIT versions
([CoreRT](https://github.com/dotnet/corert) in C# and [GraalVM](https://www.graalvm.org/) in Java).
On the other hand, LLVM supports
[Profile Guided Optimization](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization),
which theoretically brings JIT benefits to non-JIT languages. Finally, be careful to avoid comparing
apples and oranges during benchmarks; you don't want your code to suddenly speed up because the JIT
compiler kicked in.
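One way to keep that from skewing measurements, sketched for a JIT runtime like PyPy (`fn` here is
whatever you're measuring): discard warmup iterations before recording anything.

```python
import statistics
import time

def bench(fn, iterations=10_000, warmup=1_000):
    # Discarded iterations give the JIT (and caches, branch predictors)
    # a chance to settle before we record samples.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)
```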
**Programming Tricks**: These won't make or break performance, but can be useful in specific
circumstances. For example, C++ can use
[templates instead of branches](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=1206)
in critical sections.
## Kernel
Code you wrote is almost certainly not the _only_ code running on your hardware. There are many ways
the operating system interacts with your program, from interrupts to system calls, that are
important to watch for. These are written from a Linux perspective, but Windows typically has
equivalent functionality.
**Scheduling**: The kernel is normally free to schedule any process on any core, so it's important
to reserve CPU cores exclusively for the important programs. There are a few parts to this: first,
limit the CPU cores that non-critical processes are allowed to run on by excluding cores from
scheduling
([`isolcpus`](https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html)
kernel command-line option), or by setting the `init` process CPU affinity
([`systemd` example](https://access.redhat.com/solutions/2884991)). Second, set critical processes
to run on the isolated cores by setting the
[processor affinity](https://en.wikipedia.org/wiki/Processor_affinity) using
[taskset](https://linux.die.net/man/1/taskset). Finally, use
[`NO_HZ`](https://github.com/torvalds/linux/blob/master/Documentation/timers/NO_HZ.txt) or
[`chrt`](https://linux.die.net/man/1/chrt) to disable scheduling interrupts. Turning off
hyper-threading is also likely beneficial.
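The affinity piece is even reachable from Python's standard library on Linux; a sketch assuming
cores 2 and 3 were isolated at boot with `isolcpus=2,3` (the core IDs are an assumption about your
setup):

```python
import os

# Pin this process to the isolated cores; the in-process equivalent
# of launching under `taskset -c 2,3`. Linux-only.
os.sched_setaffinity(0, {2, 3})
print(os.sched_getaffinity(0))  # expect {2, 3}
```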
**System calls**: Reading from a UNIX socket? Writing to a file? In addition to not knowing how long
the I/O operation takes, these all trigger expensive
[system calls (syscalls)](https://en.wikipedia.org/wiki/System_call). To handle these, the CPU must
[context switch](https://en.wikipedia.org/wiki/Context_switch) to the kernel, let the kernel
operation complete, then context switch back to your program. We'd rather keep these
[to a minimum](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript) (see
timestamp 18:20). [Strace](https://linux.die.net/man/1/strace) is your friend for understanding when
and where syscalls happen.
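As a small demonstration (the file paths are arbitrary), compare an unbuffered write loop against a
buffered one under `strace -c`; the first issues a `write(2)` per iteration, the second batches them
into a handful of syscalls:

```python
import os

# One write(2) syscall per call to os.write
fd = os.open("/tmp/unbuffered.log", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
for _ in range(1_000):
    os.write(fd, b"tick\n")
os.close(fd)

# The io layer buffers these, so far fewer syscalls hit the kernel
with open("/tmp/buffered.log", "wb", buffering=64 * 1024) as f:
    for _ in range(1_000):
        f.write(b"tick\n")
```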
**Signal Handling**: Far less likely to be an issue, but signals do trigger a context switch if your
code has a handler registered. This will be highly dependent on the application, but you can
[block signals](https://www.linuxprogrammingblog.com/all-about-linux-signals?page=show#Blocking_signals)
if it's an issue.
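A sketch of blocking from Python (`SIGUSR1` and `run_critical_section` are stand-ins): delivery is
deferred until the mask is restored, rather than interrupting the critical section mid-run.

```python
import signal

def run_critical_section():
    pass  # hypothetical hot path

# Block SIGUSR1 for this thread; pthread_sigmask returns the old mask
old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
try:
    run_critical_section()  # signals queue up instead of interrupting
finally:
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```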
**Interrupts**: System interrupts are how devices connected to your computer notify the CPU that
something has happened. The CPU will then choose a processor core to pause and context switch to the
OS to handle the interrupt. Make sure that
[SMP affinity](http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux) is
set so that interrupts are handled on a CPU core not running the program you care about.
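The affinity mask is just a hex bitmap under `/proc`, writable from any language (the IRQ number
below is an assumption; check `/proc/interrupts` for your device, and note this requires root):

```python
IRQ = 24  # hypothetical NIC interrupt

# "1" = binary 0001 = CPU 0 only, keeping the interrupt off isolated cores
with open(f"/proc/irq/{IRQ}/smp_affinity", "w") as f:
    f.write("1")
```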
**[NUMA](https://www.kernel.org/doc/html/latest/vm/numa.html)**: While NUMA is good at making
multi-cell systems transparent, there are variance implications; if the kernel moves a process
across nodes, future memory accesses must wait for the controller on the original node. Use
[numactl](https://linux.die.net/man/8/numactl) to handle memory-/cpu-cell pinning so this doesn't
happen.
## Hardware
**CPU Pipelining/Speculation**: Speculative execution in modern processors gave us vulnerabilities
like Spectre, but it also gave us performance improvements like
[branch prediction](https://stackoverflow.com/a/11227902/1454178). And if the CPU mis-speculates
your code, there's variance associated with rewind and replay. While the compiler knows a lot about
how your CPU [pipelines instructions](https://youtu.be/nAbCKa0FzjQ?t=4467), code can be
[structured to help](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=755) the branch
predictor.
**Paging**: For most systems, virtual memory is incredible. Applications live in their own worlds,
and the CPU/[MMU](https://en.wikipedia.org/wiki/Memory_management_unit) figures out the details.
However, there's a variance penalty associated with memory paging and caching; if you access more
memory pages than the [TLB](https://en.wikipedia.org/wiki/Translation_lookaside_buffer) can store,
you'll have to wait for the page walk. Kernel perf tools are necessary to figure out if this is an
issue, but using [huge pages](https://blog.pythian.com/performance-tuning-hugepages-in-linux/) can
reduce TLB burdens. Alternatively, running applications in a hypervisor like
[Jailhouse](https://github.com/siemens/jailhouse) allows one to skip virtual memory entirely, but
this is probably more work than the benefits are worth.
**Network Interfaces**: When more than one computer is involved, variance can go up dramatically.
Tuning kernel
[network parameters](https://github.com/leandromoreira/linux-network-performance-parameters) may be
helpful, but modern systems more frequently opt to skip the kernel altogether with a technique
called [kernel bypass](https://blog.cloudflare.com/kernel-bypass/). This typically requires
specialized hardware and [drivers](https://www.openonload.org/), but even industries like
[telecom](https://www.bbc.co.uk/rd/blog/2018-04-high-speed-networking-open-source-kernel-bypass) are
finding the benefits.
## Networks
**Routing**: There's a reason financial firms are willing to pay
[millions of euros](https://sniperinmahwah.wordpress.com/2019/03/26/4-les-moeres-english-version/)
for rights to a small plot of land - having a straight-line connection from point A to point B means
the path their data takes is the shortest possible. In contrast, there are currently 6 computers in
between me and Google, but that may change at any moment if my ISP realizes a
[more efficient route](https://en.wikipedia.org/wiki/Border_Gateway_Protocol) is available. Whether
it's using
[research-quality equipment](https://sniperinmahwah.wordpress.com/2018/05/07/shortwave-trading-part-i-the-west-chicago-tower-mystery/)
for shortwave radio, or just making sure there's no data inadvertently going between data centers,
routing matters.
**Protocol**: TCP as a network protocol is awesome: guaranteed and in-order delivery, flow control,
and congestion control all built in. But these attributes make the most sense when networking
infrastructure is lossy; for systems that expect nearly all packets to be delivered correctly, the
setup handshaking and packet acknowledgment are just overhead. Using UDP (unicast or multicast) may
make sense in these contexts as it avoids the chatter needed to track connection state, and
[gap-fill](https://iextrading.com/docs/IEX%20Transport%20Specification.pdf)
[strategies](http://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/moldudp64.pdf)
can handle the rest.
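To make the gap-fill idea concrete, here's a toy receiver; the port and the 8-byte big-endian
sequence prefix are assumptions for illustration, not any exchange's actual wire format. It simply
watches for holes in the sequence numbers, which is what would trigger a re-transmission request:

```python
import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))

expected = None
while True:
    packet, _ = sock.recvfrom(2048)
    (seq,) = struct.unpack_from(">Q", packet)  # sequence number prefix
    if expected is not None and seq != expected:
        # A real system would ask a re-request server to fill the gap
        print(f"gap detected: expected {expected}, got {seq}")
    expected = seq + 1
```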
**Switching**: Many routers/switches handle packets using "store-and-forward" behavior: wait for the
whole packet, validate checksums, and then send to the next device. In variance terms, the time
needed to move data between two nodes is proportional to the size of that data; the switch must
"store" all data before it can calculate checksums and "forward" to the next node. With
["cut-through"](https://www.networkworld.com/article/2241573/latency-and-jitter--cut-through-design-pays-off-for-arista--blade.html)
designs, switches will begin forwarding data as soon as they know where the destination is,
checksums be damned. This means there's a fixed cost (at the switch) for network traffic, no matter
the size.
# Final Thoughts
High-performance systems, regardless of industry, are not magical. They do require extreme precision
and attention to detail, but they're designed, built, and operated by regular people, using a lot of
tools that are publicly available. Interested in seeing how context switching affects performance of
your benchmarks? `taskset` should be installed in all modern Linux distributions, and can be used to
make sure the OS never migrates your process. Curious how often garbage collection triggers during a
crucial operation? Your language of choice will typically expose details of its operations
([Python](https://docs.python.org/3/library/gc.html),
[Java](https://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html#DebuggingOptions)).
Want to know how hard your program is stressing the TLB? Use `perf record` and look for
`dtlb_load_misses.miss_causes_a_walk`.
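For the garbage collection case, a quick sketch of what "expose details" looks like in CPython:

```python
import gc

# Print collection statistics to stderr every time the collector runs
gc.set_debug(gc.DEBUG_STATS)

# Or observe collections programmatically via gc.callbacks
def on_gc(phase, info):
    if phase == "stop":
        print(f"gen {info['generation']}: collected {info['collected']}")

gc.callbacks.append(on_gc)
```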
Two final guiding questions, then: first, before attempting to apply some of the technology above to
your own systems, can you first identify
[where/when you care](http://wiki.c2.com/?PrematureOptimization) about "high-performance"? As an
example, if parts of a system rely on humans pushing buttons, CPU pinning won't have any measurable
effect. Humans are already far too slow to react in time. Second, if you're using benchmarks, are
they being designed in a way that's actually helpful? Tools like
[Criterion](http://www.serpentine.com/criterion/) (also in
[Rust](https://github.com/bheisler/criterion.rs)) and Google's
[Benchmark](https://github.com/google/benchmark) output not only average run time, but variance as
well; your benchmarking environment is subject to the same concerns your production environment is.
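And if your language lacks a Criterion equivalent, reporting variance by hand is only a few lines; a
sketch (Python 3.8+ for `statistics.quantiles`; `fn` is a stand-in for whatever you're measuring):

```python
import statistics
import time

def profile(fn, runs=1_000):
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    # Median tells you the typical case; p99 and max tell you the variance story
    p99 = statistics.quantiles(samples, n=100)[98]
    print(f"median={statistics.median(samples)}ns p99={p99}ns max={max(samples)}ns")
```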
Finally, I believe high-performance systems are a matter of philosophy, not necessarily technique.
Rigorous focus on variance is the first step, and there are plenty of ways to measure and mitigate
it; once that's at an acceptable level, then optimize for speed.
View File
@ -6,29 +6,32 @@ category:
tags: [rust]
---
I've found that in many personal projects,
[analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis) is particularly deadly.
Making good decisions in the beginning avoids pain and suffering later; if extra research prevents
future problems, I'm happy to continue ~~procrastinating~~ researching indefinitely.
So let's say you're in need of a binary serialization format. Data will be going over the network,
not just in memory, so having a schema document and code generation is a must. Performance is
crucial, so formats that support zero-copy de/serialization are given priority. And the more
languages supported, the better; I use Rust, but can't predict what other languages this could
interact with.
Given these requirements, the candidates I could find were:
1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and is the most established
2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler
   encoding
3. [Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) has the simplest
   encoding, but the Rust implementation is unmaintained
Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably
fast, with polyglot support. But how do you actually pick one? It's impossible to know what issues
will follow that choice, so I tend to avoid commitment until the last possible moment.
Still, a choice must be made. Instead of worrying about which is "the best," I decided to build a
small proof-of-concept system in each format and pit them against each other. All code can be found
in the [repository](https://github.com/speice-io/marketdata-shootout) for this post.
We'll discuss each in more detail later, but here's a quick preview of the results:
@ -39,17 +42,19 @@ We'll discuss more in detail, but a quick preview of the results:
# Prologue: Binary Parsing with Nom
Our benchmark system will be a simple data processor; given depth-of-book market data from
[IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema
format, read it back, and calculate total size of stock traded and the lowest/highest quoted prices.
This test isn't complex, but is representative of the project I need a binary format for.
But before we make it to that point, we have to actually read in the market data. To do so, I'm
using a library called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and
brought some big changes, so this was an opportunity to build a non-trivial program and get
familiar.
If you don't already know about `nom`, it's a "parser combinator" library. By combining different
smaller parsers, you can assemble a parser to handle complex structures without writing tedious code
by hand. For example, when parsing
[PCAP files](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3):
```
 0                   1                   2                   3
@ -108,45 +113,57 @@ pub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> {
While this example isn't too interesting, more complex formats (like IEX market data) are where
[`nom` really shines](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs).
Ultimately, because the `nom` code in this shootout was the same for all formats, we're not too
interested in its performance. Still, it's worth mentioning that building the market data parser was
actually fun; I didn't have to write tons of boring code by hand.
# Part 1: Cap'n Proto
Now it's time to get into the meaty part of the story. Cap'n Proto was the first format I tried
because of how long it has supported Rust (thanks to [dwrensha](https://github.com/dwrensha) for
maintaining the Rust port since
[2014!](https://github.com/capnproto/capnproto-rust/releases/tag/rustc-0.10)). However, I had a ton
of performance concerns once I started using it.
To serialize new messages, Cap'n Proto uses a "builder" object. This builder allocates memory on the
heap to hold the message content, but because builders
[can't be re-used](https://github.com/capnproto/capnproto-rust/issues/111), we have to allocate a
new buffer for every single message. I was able to work around this with a
[special builder](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51)
that could re-use the buffer, but it required reading through Cap'n Proto's
[benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156)
to find an example, and used
[`std::mem::transmute`](https://doc.rust-lang.org/std/mem/fn.transmute.html) to bypass Rust's borrow
checker.
The process of reading messages was better, but still had issues. Cap'n Proto has two message
encodings: a ["packed"](https://capnproto.org/encoding.html#packing) representation, and an
"unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into
before we can use it; Cap'n Proto allocates a new buffer for each message we unpack, and I wasn't
able to figure out a way around that. In contrast, the unpacked message format should be where Cap'n
Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/).
However, accomplishing zero-copy deserialization required code in the private API
([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)), and we allocate a vector on
every read for the segment table.
In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too
many issues for me to feel comfortable using it long-term.
# Part 2: Flatbuffers
This is the new kid on the block. After a
[first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out, official support
was [recently launched](https://github.com/google/flatbuffers/pull/4898). Flatbuffers intends to
address the same problems as Cap'n Proto: high-performance, polyglot, binary messaging. The
difference is that Flatbuffers claims to have a simpler wire format and
[more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html).
On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is
nice, and unlike Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. However,
there were still some issues.
First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats
like the following:
```
table Message {
@ -157,51 +174,61 @@ table MultiMessage {
}
```
We want to create a `MultiMessage` which contains a vector of `Message`, and each `Message` itself
contains a vector (the `string` type). I was able to work around this by
[caching `Message` elements](https://github.com/speice-io/marketdata-shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83)
in a `SmallVec` before building the final `MultiMessage`, but it was a painful process that I
believe contributed to poor serialization performance.
Second, streaming support in Flatbuffers seems to be something of an
[afterthought](https://github.com/google/flatbuffers/issues/3898). Where Cap'n Proto in Rust handles
reading messages from a stream as part of the API, Flatbuffers just sticks a `u32` at the front of
each message to indicate the size. Not specifically a problem, but calculating message size without
that tag is nigh on impossible.
Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform
well.
# Part 3: Simple Binary Encoding
Support for SBE was added by the author of one of my favorite
[Rust blog posts](https://web.archive.org/web/20190427124806/https://polysync.io/blog/session-types-for-hearty-codecs/).
I've [talked previously]({% post_url 2019-06-31-high-performance-systems %}) about how important
variance is in high-performance systems, so it was encouraging to read about a format that
[directly addressed](https://github.com/real-logic/simple-binary-encoding/wiki/Why-Low-Latency) my
concerns. SBE has by far the simplest binary format, but it does make some tradeoffs.
Both Cap'n Proto and Flatbuffers use [message offsets](https://capnproto.org/encoding.html#structs)
to handle variable-length data, [unions](https://capnproto.org/language.html#unions), and various
other features. In contrast, messages in SBE are essentially
[just structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml);
variable-length data is supported, but there's no union type.
As mentioned in the beginning, the Rust port of SBE works well, but is
[essentially unmaintained](https://users.rust-lang.org/t/zero-cost-abstraction-frontier-no-copy-low-allocation-ordered-decoding/11515/9).
However, if you don't need union types, and can accept that schemas are XML documents, it's still
worth using. SBE's implementation had the best streaming support of all formats I tested, and
doesn't trigger allocation during de/serialization.
# Results
After building a test harness
[for](https://github.com/speice-io/marketdata-shootout/blob/master/src/capnp_runner.rs)
[each](https://github.com/speice-io/marketdata-shootout/blob/master/src/flatbuffers_runner.rs)
[format](https://github.com/speice-io/marketdata-shootout/blob/master/src/sbe_runner.rs), it was
time to actually take them for a spin. I used
[this script](https://github.com/speice-io/marketdata-shootout/blob/master/run_shootout.sh) to run
the benchmarks, and the raw results are
[here](https://github.com/speice-io/marketdata-shootout/blob/master/shootout.csv). All data reported
below is the average of 10 runs on a single day of IEX data. Results were validated to make sure
that each format parsed the data correctly.
## Serialization
This test measures, on a
[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L268-L272),
how long it takes to serialize the IEX message into the desired format and write to a pre-allocated
buffer.
| Schema               | Median | 99th Pctl | 99.9th Pctl | Total  |
| :------------------- | :----- | :-------- | :---------- | :----- |
## Deserialization
This test measures, on a
[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L294-L298),
how long it takes to read the previously-serialized message and perform some basic aggregation. The
aggregation code is the same for each format, so any performance differences are due solely to the
format implementation.
| Schema               | Median | 99th Pctl | 99.9th Pctl | Total  |
| :------------------- | :----- | :-------- | :---------- | :----- |
# Conclusion
Building a benchmark turned out to be incredibly helpful in making a decision; because a "union"
type isn't important to me, I can be confident that SBE best addresses my needs.
While SBE was the fastest in terms of both median and worst-case performance, its worst-case
performance was proportionately far higher than any other format. It seems that de/serialization
time scales with message size, but I'll need to do some more research to understand what exactly is
going on.


category:
tags: [python]
---
Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock)
(GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision
made before multi-core CPUs were widely available, but the fact that it's still around indicates
that it generally works [Good](https://wiki.c2.com/?PrematureOptimization)
[Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective
workarounds; it's not hard to start a
[new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to
synchronize code running in parallel.
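
As a minimal sketch of that workaround (the `heavy_sum` worker here is just a hypothetical stand-in
for real work):

```python
from multiprocessing import Process, Queue

def heavy_sum(q: Queue) -> None:
    # Each process gets its own interpreter, and thus its own GIL
    q.put(sum(range(10_000_000)))

if __name__ == "__main__":
    q: Queue = Queue()
    p = Process(target=heavy_sum, args=(q,))
    p.start()
    print(q.get())  # Blocks until the worker sends its result
    p.join()
```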
Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of
asynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of
both Python's productivity and the modern CPU's parallel capabilities.
Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes
Python a nice language to start with. Bear in mind: these are just the tools; no claim is made about
whether it's a good idea to use them. Very often, unlocking the GIL is an
[XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the
GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at
the expense of project complexity; messing with the GIL is ultimately messing with Python's memory
model.
```python
%load_ext Cython
from threading import Thread  # Assumed import, used by the threaded examples below

N = 1_000_000_000
```
# Cython
Put simply, [Cython](https://cython.org/) is a programming language that looks a lot like Python,
gets [transpiled](https://en.wikipedia.org/wiki/Source-to-source_compiler) to C/C++, and integrates
well with the [CPython](https://en.wikipedia.org/wiki/CPython) API. It's great for building Python
wrappers to C and C++ libraries, writing optimized code for numerical processing, and tons more. And
when it comes to managing the GIL, there are two special features:
- The `nogil`
  [function annotation](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#declaring-a-function-as-callable-without-the-gil)
  asserts that a Cython function is safe to use without the GIL, and compilation will fail if it
  interacts with Python in an unsafe manner
- The `with nogil`
  [context manager](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#releasing-the-gil)
  explicitly unlocks the CPython GIL while active
Whenever Cython code runs inside a `with nogil` block on a separate thread, the Python interpreter
is unblocked and allowed to continue work elsewhere. We'll define a "busy work" function that
demonstrates this principle in action:
```python
%%cython

# Assumed function body: a CPU-bound loop that never touches Python
# objects. `nogil` asserts it is *safe* to run without the GIL, but
# does not release anything by itself.
cdef unsigned long busy_work(unsigned long n) nogil:
    cdef unsigned long i = 0
    while i < n:
        i += 1
    return i

def cython_gil(unsigned long n):
    # The GIL is never released, so it stays locked throughout
    return busy_work(n)

def cython_nogil(unsigned long n):
    cdef unsigned long result
    # Explicitly unlock the GIL while the busy work runs
    with nogil:
        result = busy_work(n)
    return result
```
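
First, time the GIL-free version on its own (the `%%time` wrapper is assumed; both versions clock in
at effectively the same number, as noted below):

```python
%%time
_ = cython_nogil(N);
```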
> <pre>
> Wall time: 388 ms
> </pre>
Both versions (with and without GIL) take effectively the same amount of time to run. Even when
running this calculation in parallel on separate threads, it is expected that the run time will
double because only one thread can be active at a time:
```python
%%time
# Assumed setup: both threads hold the GIL, so they run serially
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> Wall time: 645 ms
> </pre>
However, if the first thread releases the GIL, the second thread is free to acquire it and run in
parallel:
```python
%%time
# Assumed setup: the first thread releases the GIL, letting the
# second run at the same time
t1 = Thread(target=cython_nogil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> Wall time: 358 ms
> </pre>
Because `user` time represents the sum of processing time on all threads, it doesn't change much.
The ["wall time"](https://en.wikipedia.org/wiki/Elapsed_real_time) has been cut roughly in half
because each function is running simultaneously.
Keep in mind that the **order in which threads are started** makes a difference!
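If the GIL-locked thread starts first, the `nogil` thread must wait for it (a sketch mirroring the
cell above, with the thread order swapped):

```python
%%time
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```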
> <pre>
> Wall time: 672 ms
> </pre>
Even though the second thread releases the GIL while running, it can't start until the first has
completed. Thus, the overall runtime is effectively the same as running two GIL-locked threads.
Finally, be aware that attempting to unlock the GIL from a thread that doesn't own it will crash the
**interpreter**, not just the thread attempting the unlock:
```python
%%cython

# Assumed demonstration: each nested call tries to release a GIL
# that the caller has already released, crashing the interpreter
cdef void cython_recurse(unsigned long n) nogil:
    if n:
        with nogil:
            cython_recurse(n - 1)

cython_recurse(2)
```
> <pre>
> File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
> </pre>
In practice, avoiding this issue is simple. First, `nogil` functions probably shouldn't contain
`with nogil` blocks. Second, Cython can
[conditionally acquire/release](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#conditional-acquiring-releasing-the-gil)
the GIL, so these conditions can be used to synchronize access. Finally, Cython's documentation for
[external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#acquiring-and-releasing-the-gil)
contains more detail on how to safely manage the GIL.
To conclude: use Cython's `nogil` annotation to assert that functions are safe for calling when the
GIL is unlocked, and `with nogil` to actually unlock the GIL and run those functions.
# Numba
Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by
compiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_
at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function
first compiles it to machine code before running. Calling the function a second time re-uses that
machine code unless the argument types have changed.
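
A toy example of that compile-then-reuse behavior (the `add` function is illustrative):

```python
from numba import jit

@jit
def add(a, b):
    return a + b

add(1, 2)      # First call: compiles machine code for integer arguments
add(3, 4)      # Second call: re-uses the compiled code
add(1.0, 2.0)  # New argument types: triggers a fresh compilation
```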
Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions
compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode
avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the
`@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython`
are separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to
release the lock, the GIL will remain locked if `nogil=False` (the default).
Let's repeat the same experiment, this time using Numba instead of Cython:
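A sketch of the Numba setup; the `numba_gil`/`numba_nogil` names and loop bodies are assumptions
mirroring the Cython versions above:

```python
from numba import jit

@jit(nopython=True)
def numba_gil(n):
    while n > 0:
        n -= 1
    return n

@jit(nopython=True, nogil=True)
def numba_nogil(n):
    while n > 0:
        n -= 1
    return n

# Call each once up front so compilation isn't included in the timing
numba_gil(1); numba_nogil(1)
```

With the GIL-free thread started first, both can run simultaneously:

```python
%%time
t1 = Thread(target=numba_nogil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```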
> <pre>
> Wall time: 522 ms
> </pre>
Finally, unlike Cython, Numba will unlock the GIL if and only if it is currently acquired;
recursively calling `@jit(nogil=True)` functions is perfectly safe:
```python
from numba import jit

# Assumed body: any self-recursive `nogil=True` function demonstrates
# that Numba re-acquires/releases the GIL safely on nested calls
@jit(nopython=True, nogil=True)
def numba_recurse(n: int):
    if n:
        return numba_recurse(n - 1)
    return 0

numba_recurse(2);
```
# Conclusion
Before finishing, it's important to address pain points that will show up if these techniques are
used in a more realistic project:
First, code running in a GIL-free context will likely also need non-trivial data structures;
GIL-free functions aren't useful if they're constantly interacting with Python objects whose access
requires the GIL. Cython provides
[extension types](http://docs.cython.org/en/latest/src/tutorial/cdef_classes.html) and Numba
provides a [`@jitclass`](https://numba.pydata.org/numba-doc/dev/user/jitclass.html) decorator to
address this need.
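
A minimal `@jitclass` sketch (the field spec and methods are assumptions):

```python
from numba import int64, jitclass

# All fields must be declared up front so Numba can lay out the struct
@jitclass([("count", int64)])
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1

c = Counter()
c.increment()
```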
Second, building and distributing applications that make use of Cython/Numba can be complicated.
Cython packages require running the compiler, (potentially) linking/packaging external dependencies,
and distributing a binary wheel. Numba is generally simpler because the code being distributed is
pure Python, but can be tricky since errors aren't detected until runtime.
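
For Cython, that build step is typically wired through `setuptools`; a minimal sketch (the module
name is hypothetical):

```python
# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="busy-work",
    ext_modules=cythonize("busy_work.pyx"),
)
```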
Finally, while unlocking the GIL is often a solution in search of a problem, both Cython and Numba
provide tools to directly manage the GIL when appropriate. This enables true parallelism (not just
[concurrency](https://stackoverflow.com/a/1050257)) that is impossible in vanilla Python.