Markdown auto formatting

Bradlee Speice 2020-06-29 15:51:23 -04:00
parent 7fa69dea9f
commit e02556a770
16 changed files with 182 additions and 136 deletions


@ -2,7 +2,7 @@
layout: post
title: "What I Learned: Porting Dateutil Parser to Rust"
description: ""
category:
tags: [dtparse, rust]
---
@ -18,7 +18,7 @@ what to do with your life (but you should totally keep reading).
# Slow down, what?
OK, fine, I guess I should start with *why* someone would do this.
OK, fine, I guess I should start with _why_ someone would do this.
[Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates.
The standard library support for time in Python is kinda dope, but there are a lot of extras
@ -32,9 +32,10 @@ and [time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers
it feels like it shouldn't be that difficult to do, until you try to do it,
and you realize that people suck and this is why [we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right).
But alas, we'll try and make contemporary art out of the rubble and give it a
pretentious name like *Time*.
pretentious name like _Time_.
![A gravel mound](/assets/images/2018-06-25-gravel-mound.jpg)
> [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php)
What makes `dateutil.parser` great is that there's a single function with a single argument that drives
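`dtparse` keeps that same single-entry-point design. As a minimal sketch using the `parse` signature that shows up later in this commit (the date string is just an example input):

```rust
use dtparse::parse;

fn main() {
    // One function, one argument: hand it a string, get a datetime back.
    match parse("2018-06-29 15:51:23") {
        Ok((datetime, offset)) => println!("{} (offset: {:?})", datetime, offset),
        Err(e) => println!("couldn't parse: {:?}", e),
    }
}
```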
@ -60,11 +61,11 @@ your code blows up in new and exciting ways. Having a reference manual for verba
what your code should be means that you don't spend that long debugging complicated logic,
you're more looking for typos.
Also, **don't use nice Rust things like enums**. While
[one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94),
I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a boolean
and I mixed up which was true, and which was false (side note: AM is false, PM is true).
In general, writing nice code *should not be a first-pass priority* when you're just trying to recreate
In general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate
the same functionality.
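For illustration (the `Meridiem` name and helper below are hypothetical, not from `dtparse`), this is the kind of enum that makes the AM/PM mix-up impossible to write in the first place:

```rust
#[derive(Debug, PartialEq)]
enum Meridiem {
    AM,
    PM,
}

// dateutil's convention, per the note above: AM is false, PM is true.
fn from_dateutil_flag(is_pm: bool) -> Meridiem {
    if is_pm { Meridiem::PM } else { Meridiem::AM }
}

fn main() {
    assert_eq!(from_dateutil_flag(false), Meridiem::AM);
    assert_eq!(from_dateutil_flag(true), Meridiem::PM);
}
```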
**Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames.
@ -84,11 +85,11 @@ indentation levels so you can keep things straight.
[main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217)
wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. After
[moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205)
compile times dropped down to ~5 seconds. Turns out 150 lines * 100 tests = a lot of redundant code to be compiled.
compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code to be compiled.
My new rule of thumb is that any macros longer than 10-15 lines are actually functions that need to be liberated, man.
Finally, **I really miss list comprehensions and dictionary comprehensions.**
As a quick comparison, see
[this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476)
and [the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629).
I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be added through macros or syntax extensions.
@ -106,7 +107,7 @@ you use `dateutil`. If you want `decimal` types, it's already in the
Thus began my quest to find a decimal library in Rust. What I quickly found was summarized
in a comment:
> Writing a BigDecimal is easy. Writing a *good* BigDecimal is hard.
> Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard.
>
> [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794)
@ -119,7 +120,7 @@ and I'm forced to dig through a [couple](https://github.com/rust-lang/rust/issue
to figure out if the library I'm looking at is dead or just stable.
And even when the "canonical library" exists, there's no guarantees that it will be well-maintained.
[Chrono](https://github.com/chronotope/chrono) is the *de facto* date/time library in Rust,
[Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust,
and just released version 0.4.4 like two days ago. Meanwhile, [chrono-tz](https://github.com/chronotope/chrono-tz)
appears to be dead in the water even though [there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19).
I know relatively little about it, but it appears that most of the release process is automated; keeping
@ -130,10 +131,10 @@ that up to date should be a no-brainer.
Specifically given "maintenance" being an [oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/)
issue, I'm going to try out the following policy to keep things moving on `dtparse`:
1. Issues/PRs needing *maintainer* feedback will be updated at least weekly. I want to make sure nobody's blocking on me.
1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure nobody's blocking on me.
2. To keep issues/PRs needing *contributor* feedback moving, I'm going to (kindly) ask the contributor to check in after two weeks,
and close the issue without resolution if I hear nothing back after a month.
2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the contributor to check in after two weeks,
and close the issue without resolution if I hear nothing back after a month.
The second point I think has the potential to be a bit controversial, so I'm happy to receive feedback on that.
And if a contributor responds with "hey, still working on it, had a kid and I'm running on 30 seconds of sleep a night,"


@ -2,7 +2,7 @@
layout: post
title: "Primitives in Rust are Weird (and Cool)"
description: "but mostly weird."
category:
tags: [rust, c, java, python, x86]
---
@ -17,9 +17,9 @@ fn main() {
And to my complete befuddlement, it compiled, ran, and produced a completely sensible output.
The reason I was so surprised has to do with how Rust treats a special category of things
I'm going to call *primitives*. In the current version of the Rust book, you'll see them
I'm going to call _primitives_. In the current version of the Rust book, you'll see them
referred to as [scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive],
but we're going to stick with the name *primitive* for the time being. Explaining
but we're going to stick with the name _primitive_ for the time being. Explaining
why this program is so cool requires talking about a number of other programming languages,
and keeping a consistent terminology makes things easier.
@ -28,15 +28,17 @@ Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm t
# Defining primitives (Java)
The reason I'm using the name *primitive* comes from how much of my life is Java right now.
The reason I'm using the name _primitive_ comes from how much of my life is Java right now.
Spoiler alert: a lot of it. And for the most part I like Java, but I digress. In Java, there's a
special name for some specific types of values:
> ```
> bool char byte
> short int long
> float double
> ```
They are referred to as [primitives][java_primitive]. And relative to the other bits of Java,
they have two unique features. First, they don't have to worry about the
@ -51,7 +53,7 @@ class Main {
System.out.println(x.toString()); // Triggers a compiler error
}
}
```
````
The error is:
@ -64,7 +66,7 @@ Main.java:5: error: int cannot be dereferenced
Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html)
and things that inherit from it are pointers under the hood, and we have to dereference them before
the fields and methods they define can be used. In contrast, *primitive types are just values* -
the fields and methods they define can be used. In contrast, _primitive types are just values_ -
there's nothing to be dereferenced. In memory, they're just a sequence of bits.
If we really want, we can turn the `int` into an
@ -138,7 +140,7 @@ my_function:
```
At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction; nothing crazy.
But to show how similar Rust is, let's take a look at our program translated from C to Rust:
```rust
fn my_function(x: i32) {}
@ -177,15 +179,15 @@ example::my_function:
```
The generated Rust assembly is functionally pretty close to the C assembly:
*When working with primitives, we're just dealing with bits in memory*.
_When working with primitives, we're just dealing with bits in memory_.
In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to dereference. So what
exactly is going on with this `.to_string()` function call?
# impl primitive (and Python)
Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this together: *Rust has
implementations for its primitive types.* That's right, `impl` blocks aren't only for `structs` and `traits`,
Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this together: _Rust has
implementations for its primitive types._ That's right, `impl` blocks aren't only for `structs` and `traits`,
primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html),
[f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html)
as examples.
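For a quick taste, these are all ordinary method calls on bare literals - no wrapper objects, no boxing, just impls on the primitive types themselves:

```rust
fn main() {
    let squared = 2u32.pow(10);            // u32::pow
    let rounded = 2.5_f64.floor();         // f64::floor
    let upper = 'a'.to_ascii_uppercase();  // char::to_ascii_uppercase
    println!("{} {} {}", squared, rounded, upper);
}
```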
@ -207,7 +209,7 @@ example::main:
mov rdi, rsp
lea rax, [rip + .Lbyte_str.u]
mov rsi, rax
; Cool stuff right here
call <T as alloc::string::ToString>::to_string@PLT
@ -224,7 +226,7 @@ the location of the function we want to call (like Java), we have a function tha
outside of the instance and just give that function the value `8`.
This is an incredibly technical detail, but the interesting idea I had was this:
*if `to_string()` is a static function, can I refer to the unbound function and give it an instance?*
_if `to_string()` is a static function, can I refer to the unbound function and give it an instance?_
Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link
because I seriously love this thing):
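The post's own example sits behind that Compiler Explorer link; as a stand-in sketch of the same idea, here's the bound call next to the unbound one:

```rust
fn main() {
    // The usual "instance method" syntax...
    let a = 8_i32.to_string();
    // ...the unbound function handed a value explicitly...
    let b = ToString::to_string(&8_i32);
    // ...and the fully-qualified form that matches the assembly above.
    let c = <i32 as ToString>::to_string(&8_i32);
    assert_eq!(a, b);
    assert_eq!(b, c);
}
```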
@ -269,7 +271,7 @@ m.my_function()
MyClass.my_function(m)
```
And Python tries to make you *think* that primitives can have instance methods...
And Python tries to make you _think_ that primitives can have instance methods...
```python
>>> dir(8)
@ -313,4 +315,4 @@ Now go forth and fool your friends into thinking you know assembly. This is all
[x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
[java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
[rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types
[rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html


@ -2,7 +2,7 @@
layout: post
title: "Isomorphic Desktop Apps with Rust"
description: "Electron + WASM = ☣"
category:
tags: [rust, javascript, webassembly]
---
@ -18,20 +18,20 @@ nicer and more fun at parties. But I cringe every time "Webpack" is mentioned, a
that the [language specification](https://ecma-international.org/publications/standards/Ecma-402.htm)
dramatically outpaces anyone's [actual implementation](https://kangax.github.io/compat-table/es2016plus/).
The answer to this conundrum is of course to recompile code from newer versions of the language to
older versions *of the same language* before running. At least [Babel] is a nice tongue-in-cheek reference.
older versions _of the same language_ before running. At least [Babel] is a nice tongue-in-cheek reference.
Yet for as much hate as [Electron] receives, it does a stunningly good job at solving
a really hard problem: *how the hell do I put a button on the screen and react when the user clicks it*?
a really hard problem: _how the hell do I put a button on the screen and react when the user clicks it_?
GUI programming is hard, straight up. But if browsers are already able to run everywhere, why don't
we take advantage of someone else solving the hard problems for us? I don't like that I have to use
Javascript for it, but I really don't feel inclined to whip out good ol' [wxWidgets].
Now there are other native solutions ([libui-rs], [conrod], [oh hey wxWidgets again!][wxRust]),
Now there are other native solutions ([libui-rs], [conrod], [oh hey wxWidgets again!][wxrust]),
but those also have their own issues with distribution, styling, etc. With Electron, I can
`yarn create electron-app my-app` and just get going, knowing that packaging/upgrades/etc.
are built in.
My question is: given recent innovations with WASM, *are we Electron yet*?
My question is: given recent innovations with WASM, _are we Electron yet_?
No, not really.
@ -44,8 +44,8 @@ There may already be solutions to the issues I discuss, but I'm totally unaware
so I'm going to try and organize what I did manage to discover.
I should also mention that the content and things I'm talking about here are not intended to be prescriptive,
but more "if someone else is interested, what do we already know doesn't work?" *I expect everything in this post to be obsolete
within two months.* Even over the course of writing this, [a separate blog post](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/)
but more "if someone else is interested, what do we already know doesn't work?" _I expect everything in this post to be obsolete
within two months._ Even over the course of writing this, [a separate blog post](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/)
had to be modified because [upstream changes](https://github.com/WebAssembly/binaryen/pull/1642)
broke a [Rust tool](https://github.com/rustwasm/wasm-bindgen/pull/787) the post tried to use.
The post ultimately [got updated](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/#comment-477),
@ -55,13 +55,13 @@ I'll also note that we're going to skip [asm.js] and [emscripten]. Truth be told
to output anything, and so I'm just going to say [here be dragons.](https://en.wikipedia.org/wiki/Here_be_dragons)
Everything I'm discussing here uses the `wasm32-unknown-unknown` target.
The code that I *did* get running is available [over here](https://github.com/speice-io/isomorphic-rust).
The code that I _did_ get running is available [over here](https://github.com/speice-io/isomorphic-rust).
Feel free to use it as a starting point, but I'm mostly including the link as a reference for the things
that were attempted.
# An Example Running Application
So, I did *technically* get a running application:
So, I did _technically_ get a running application:
![Electron app using WASM](/assets/images/2018-09-15-electron-percy-wasm.png)
@ -142,10 +142,10 @@ looks something like this:
- `yarn start` triggers the `prestart` script
- `prestart` checks for missing tools (`wasm-bindgen-cli`, etc.) and then:
  - Uses `cargo` to compile the Rust code into WASM
  - Uses `wasm-bindgen` to link the WASM blob into a Javascript file with exported symbols (see the sketch after this list)
  - Uses `webpack` to bundle the page start script with the Javascript we just generated
  - Uses `babel` under the hood to compile the `wasm-bindgen` code down from ES6 into something browser-compatible
- The `start` script runs an Electron Forge handler to do some sanity checks
- Electron actually starts
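For a sense of what that `wasm-bindgen` step is linking, here's a minimal sketch of an exported function (the `greet` name and message are made up, not from the isomorphic-rust repo):

```rust
use wasm_bindgen::prelude::*;

// `wasm-bindgen` generates the glue so the bundled Javascript can call
// this like a normal function.
#[wasm_bindgen]
pub fn greet(name: &str) -> String {
    format!("Hello from Rust, {}!", name)
}
```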
@ -175,7 +175,7 @@ exactly match, so it's required that these two versions are kept in sync by
either updating the wasm-bindgen dependency or this binary.
```
Not that I ever managed to run into this myself (*coughs nervously*).
Not that I ever managed to run into this myself (_coughs nervously_).
There are two projects attempting to be "application frameworks": [percy] and [yew]. Between those,
I managed to get [two](https://github.com/speice-io/isomorphic-rust/tree/master/percy)
@ -256,7 +256,7 @@ can become a thing:
[libui-rs]: https://github.com/LeoTindall/libui-rs/
[electron]: https://electronjs.org/
[babel]: https://babeljs.io/
[wxRust]: https://github.com/kenz-gelsoft/wxRust
[wxrust]: https://github.com/kenz-gelsoft/wxRust
[wasm-bindgen]: https://github.com/rustwasm/wasm-bindgen
[js-sys]: https://crates.io/crates/js-sys
[percy-webapis]: https://crates.io/crates/percy-webapis
@ -272,4 +272,4 @@ can become a thing:
[typescript]: https://www.typescriptlang.org/
[electron forge]: https://electronforge.io/
[conrod]: https://github.com/PistonDevelopers/conrod
[webpack]: https://webpack.js.org/


@ -2,7 +2,7 @@
layout: post
title: "A Case Study in Heaptrack"
description: "...because you don't need no garbage collection"
category:
tags: []
---
@ -15,12 +15,12 @@ One of my earliest conversations about programming went like this:
...though it's not like the first code I wrote was for a
[graphing calculator](https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-se)
packing a whole 24KB of RAM. By the way, *what are you doing on my lawn?*
packing a whole 24KB of RAM. By the way, _what are you doing on my lawn?_
The principle remains though: be efficient with the resources you have, because
[what Intel giveth, Microsoft taketh away](http://exo-blog.blogspot.com/2007/09/what-intel-giveth-microsoft-taketh-away.html).
My professional work is focused on this kind of efficiency; low-latency financial markets demand that
you understand at a deep level *exactly* what your code is doing. As I continue experimenting with Rust for
you understand at a deep level _exactly_ what your code is doing. As I continue experimenting with Rust for
personal projects, it's exciting to bring a utilitarian mindset with me: there's flexibility for the times I pretend
to have a garbage collector, and flexibility for the times that I really care about how memory is used.
@ -29,14 +29,14 @@ to be a starting toolkit to empower analysis of your own code.
# Curiosity
When I first started building the [dtparse] crate, my intention was to mirror as closely as possible
the equivalent [Python library][dateutil]. Python, as you may know, is garbage collected. Very rarely is memory
usage considered in Python, and I likewise wasn't paying too much attention when `dtparse` was first being built.
This lackadaisical approach to memory works well enough, and I'm not planning on making `dtparse` hyper-efficient.
But every so often, I've wondered: "what exactly is going on in memory?" With the advent of Rust 1.28 and the
[Global Allocator trait](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), I had a really great idea:
*build a custom allocator that allows you to track your own allocations.* That way, you can do things like
_build a custom allocator that allows you to track your own allocations._ That way, you can do things like
writing tests for both correct results and correct memory usage. I gave it a [shot][qadapt], but learned
very quickly: **never write your own allocator**. It went from "fun weekend project" to
"I have literally no idea what my computer is doing" at breakneck speed.
@ -97,7 +97,7 @@ For example, we can see that all executions happened during the `main` function:
![allocations in dtparse](/assets/images/2018-10-heaptrack/heaptrack-dtparse-colorized.png)
...and within *that*, allocations happened in two different places:
...and within _that_, allocations happened in two different places:
![allocations in parseinfo](/assets/images/2018-10-heaptrack/heaptrack-parseinfo-colorized.png)
@ -124,6 +124,7 @@ pub fn parse(timestr: &str) -> ParseResult<(NaiveDateTime, Option<FixedOffset>)>
Ok((res.0, res.1))
}
```
> [dtparse](https://github.com/bspeice/dtparse/blob/4d7c5dd99572823fa4a390b483c38ab020a2172f/src/lib.rs#L1286)
---


@ -1,12 +1,12 @@
---
layout: post
title: "More \"What Companies Really Mean\""
description: "when they ask \"Why should we hire you?\""
title: 'More "What Companies Really Mean"'
description: 'when they ask "Why should we hire you?"'
category:
tags: []
---
I recently stumbled across a phenomenal small article entitled
[What Startups Really Mean By "Why Should We Hire You?"](https://angel.co/blog/what-startups-really-mean-by-why-should-we-hire-you).
Having been interviewed by smaller companies (though not exactly startups),
the questions and subtexts are the same. There's often a question behind
@ -17,8 +17,8 @@ Let me also make note of one more question/euphemism I've come across:
# How do you feel about Production Support?
**Translation**: *We're a fairly small team, and when things break on an evening/weekend/Christmas Day,
can we call on you to be there?*
**Translation**: _We're a fairly small team, and when things break on an evening/weekend/Christmas Day,
can we call on you to be there?_
I've met decidedly few people in my life who truly enjoy the "ops" side of "devops".
They're incredibly good at taking an impossible problem, pre-existing knowledge of
@ -33,4 +33,4 @@ Small teams have no such luck. If you're interviewing at a small company, especi
"data scientist" or other somesuch position, be aware that systems can and do spontaneously
combust at the most inopportune moments.
**Terrible-but-popular answers include**: *It's a part of the job, and I'm happy to contribute.*
**Terrible-but-popular answers include**: _It's a part of the job, and I'm happy to contribute._


@ -2,7 +2,7 @@
layout: post
title: "QADAPT - debug_assert! for your memory usage"
description: "...and why you want an allocator that goes 💥."
category:
tags: []
---
@ -21,7 +21,7 @@ There's another part of the human condition that derives joy from seeing things
<iframe src="https://giphy.com/embed/YA6dmVW0gfIw8" width="480" height="336" frameBorder="0"></iframe>
And *that's* the part I'm going to focus on.
And _that's_ the part I'm going to focus on.
# Why an Allocator?
@ -30,7 +30,7 @@ There are three reasons for that:
1. Allocation/dropping is slow
2. It's difficult to know exactly when Rust will allocate or drop, especially when using
code that you did not write
3. I want automated tools to verify behavior, instead of inspecting by hand
When I say "slow," it's important to define the terms. If you're writing web applications,
@ -38,7 +38,7 @@ you'll spend orders of magnitude more time waiting for the database than you wil
However, there's still plenty of code where micro- or nano-seconds matter; think
[finance](https://www.youtube.com/watch?v=NH1Tta7purM),
[real-time audio](https://www.reddit.com/r/rust/comments/9hg7yj/synthesizer_progress_update/e6c291f),
[self-driving cars](https://polysync.io/blog/session-types-for-hearty-codecs/), and
[networking](https://carllerche.github.io/bytes/bytes/index.html).
In these situations it's simply unacceptable for you to spend time doing things
that are not your program, and waiting on the allocator is not cool.
@ -88,7 +88,7 @@ as you expect them to.
So, how exactly does QADAPT solve these problems? **Whenever an allocation or drop occurs in code marked
allocation-safe, QADAPT triggers a thread panic.** We don't want to let the program continue as if
nothing strange happened, *we want things to explode*.
nothing strange happened, _we want things to explode_.
However, you don't want code to panic in production because of circumstances you didn't predict.
Just like [`debug_assert!`](https://doc.rust-lang.org/std/macro.debug_assert.html),
@ -220,4 +220,4 @@ I'm hoping to write more about high-performance Rust in the future, and I expect
that QADAPT will help guide that. If there are topics you're interested in,
let me know in the comments below!
[qadapt]: https://crates.io/crates/qadapt


@ -2,7 +2,7 @@
layout: post
title: "Allocations in Rust"
description: "An introduction to the memory model."
category:
tags: [rust, understanding-allocations]
---
@ -55,7 +55,7 @@ distinction! If you:
1. Never use `unsafe`
2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html)
...then it's not possible for you to use dynamic memory!
For some uses of Rust, typically embedded devices, these constraints are OK.
They have very limited memory, and the program binary size itself may significantly
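As a minimal sketch of that world (assuming a library crate): with `#![no_std]` and no `alloc`, there is simply no allocator to call, so everything lives on the stack or in the binary itself:

```rust
#![no_std]

pub fn checksum(data: &[u8]) -> u8 {
    // Slices, iterators, and integers: no heap required anywhere.
    data.iter().fold(0u8, |acc, b| acc.wrapping_add(*b))
}
```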
@ -75,7 +75,7 @@ would struggle without access to [`std::vector`](https://en.cppreference.com/w/c
`Box`, `Rc`, etc., are also unusable for the same reason.
Whether writing code for embedded devices or not, the important thing in both situations
is how much you know *before your application starts* about what its memory usage will look like.
is how much you know _before your application starts_ about what its memory usage will look like.
In embedded devices, there's a small, fixed amount of memory to use.
In a browser, you have no idea how large [google.com](https://www.google.com)'s home page is until you start
trying to download it. The compiler uses this knowledge (or lack thereof) to optimize
@ -106,5 +106,5 @@ Finally, I'll do what I can to flag potential future changes but the Rust docs
have a notice worth repeating:
> Rust does not currently have a rigorously and formally defined memory model.
>
> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html)


@ -2,12 +2,12 @@
layout: post
title: "Global Memory Usage: The Whole World"
description: "Static considered slightly less harmful."
category:
tags: [rust, understanding-allocations]
---
The first memory type we'll look at is pretty special: when Rust can prove that
a *value* is fixed for the life of a program (`const`), and when a *reference* is unique for
a _value_ is fixed for the life of a program (`const`), and when a _reference_ is unique for
the life of a program (`static` as a declaration, not
[`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime)
as a lifetime), we can make use of global memory. This special section of data is embedded
@ -21,7 +21,7 @@ for these two keywords is available, we'll take a hands-on approach to the topic
# **const**
When a *value* is guaranteed to be unchanging in your program (where "value" may be scalars,
When a _value_ is guaranteed to be unchanging in your program (where "value" may be scalars,
`struct`s, etc.), you can declare it `const`.
This tells the compiler that it's safe to treat the value as never changing, and enables
some interesting optimizations; not only is there no initialization cost to
@ -29,6 +29,7 @@ creating the value (it is loaded at the same time as the executable parts of you
but the compiler can also copy the value around if it speeds up the code.
The points we need to address when talking about `const` are:
- `Const` values are stored in read-only memory - it's impossible to modify.
- Values resulting from calling a `const fn` are materialized at compile-time.
- The compiler may (or may not) copy `const` values wherever it chooses.
@ -38,10 +39,10 @@ The points we need to address when talking about `const` are:
The first point is a bit strange - "read-only memory."
[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants)
mentions in a couple places that using `mut` with constants is illegal,
but it's also important to demonstrate just how immutable they are. *Typically* in Rust
but it's also important to demonstrate just how immutable they are. _Typically_ in Rust
you can use [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html)
to modify things that aren't declared `mut`.
[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an
example of this pattern in action:
```rust
@ -62,6 +63,7 @@ fn main() {
println!("Cell: {:?}", cell);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)
When `const` is involved though, interior mutability is impossible:
@ -83,6 +85,7 @@ fn main() {
println!("Cell: {:?}", &CELL);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)
And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):
@ -101,6 +104,7 @@ fn main() {
SURPRISE.call_once(|| println!("Initializing again???"));
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)
When the [`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)
@ -110,7 +114,7 @@ but it's still something to be aware of.
## Initialization == Compilation
The next thing to mention is that `const` values are loaded into memory *as part of your program binary*.
The next thing to mention is that `const` values are loaded into memory _as part of your program binary_.
Because of this, any `const` values declared in your program will be "realized" at compile-time;
accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may
be able to prefetch the value), but that's it.
@ -125,6 +129,7 @@ pub fn multiply(value: u32) -> u32 {
value * (*CELL.get_mut())
}
```
-- [Compiler Explorer](https://godbolt.org/z/Th8boO)
The compiler creates one `RefCell`, uses it everywhere, and never
@ -147,6 +152,7 @@ pub fn multiply_twice(value: u32) -> u32 {
value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/ZtS54X)
In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction
@ -156,23 +162,24 @@ in both the `multiply` and `multiply_twice` functions; the "1000" value is never
Finally, getting the address of a `const` value is possible, but not guaranteed
to be unique (because the compiler can choose to copy values). I was unable to
get non-unique pointers in my testing (even using different crates),
but the specifications are clear enough: *don't rely on pointers to `const`
values being consistent*. To be frank, caring about locations for `const` values
but the specifications are clear enough: _don't rely on pointers to `const`
values being consistent_. To be frank, caring about locations for `const` values
is almost certainly a code smell.
# **static**
Static variables are related to `const` variables, but take a slightly different approach.
When we declare that a *reference* is unique for the life of a program,
When we declare that a _reference_ is unique for the life of a program,
you have a `static` variable (unrelated to the `'static` lifetime). Because of the
reference/value distinction with `const`/`static`,
static variables behave much more like typical "global" variables.
But to understand `static`, here's what we'll look at:
- `static` variables are globally unique locations in memory.
- Like `const`, `static` variables are loaded at the same time as your program being read into memory.
- All `static` variables must implement the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html)
marker trait.
- Interior mutability is safe and acceptable when using `static` variables.
## Memory Uniqueness
@ -195,6 +202,7 @@ pub fn multiply_twice(value: u32) -> u32 {
value * FACTOR * FACTOR
}
```
-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ)
Where [previously](#copying) there were plenty of
@ -225,6 +233,7 @@ fn main() {
println!("Static MyStruct: {:?}", MY_STRUCT);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)
Things can get a bit weirder when using `const fn` though. In most cases, it just works:
@ -247,6 +256,7 @@ fn main() {
println!("const fn Static MyStruct: {:?}", MY_STRUCT);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)
However, there's a caveat: you're currently not allowed to use `const fn` to initialize
@ -261,6 +271,7 @@ use std::cell::RefCell;
// error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely
static MY_LOCK: RefCell<u8> = RefCell::new(0);
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)
It's likely that this will [change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.
@ -292,6 +303,7 @@ static MY_STRUCT: MyStruct = MyStruct {
y: RefCell::new(8)
};
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)
## Interior Mutability
@ -315,4 +327,5 @@ fn main() {
INIT.call_once(|| panic!("INIT was called twice!"));
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)


@ -2,7 +2,7 @@
layout: post
title: "Fixed Memory: Stacking Up"
description: "We don't need no allocator."
category:
tags: [rust, understanding-allocations]
---
@ -23,7 +23,8 @@ When you're finished with stack memory, the `pop` instruction runs in
1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation
and other issues with the heap. All sorts of incredibly sophisticated techniques have been used
to design allocators:
- [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science))
- [Garbage Collection](<https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)>)
strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection)
(used in [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html))
and [Reference counting](https://en.wikipedia.org/wiki/Reference_counting)
@ -57,6 +58,7 @@ when stack and heap memory regions are used:
1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register)
indicate allocation of stack memory:
```rust
pub fn stack_alloc(x: u32) -> u32 {
// Space for `y` is allocated by subtracting from `rsp`,
@ -66,11 +68,13 @@ when stack and heap memory regions are used:
x
}
```
-- [Compiler Explorer](https://godbolt.org/z/5WSgc9)
2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to
watch for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened
in the recent past:
```rust
pub fn heap_alloc(x: usize) -> usize {
// Space for elements in a vector has to be allocated
@ -80,10 +84,11 @@ when stack and heap memory regions are used:
x
}
```
-- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)
<span style="font-size: .8em">Note: While the [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html)
is [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46),
the Rust standard library only defines `Drop` implementations for types that involve heap allocation.</span>
3. If you don't want to inspect the assembly, use a custom allocator that's able to track
and alert when heap allocations occur. Crates like [`alloc_counter`](https://crates.io/crates/alloc_counter)
@ -132,6 +137,7 @@ pub fn make_line() {
let ray = Line { a: origin, b: point };
}
```
-- [Compiler Explorer](https://godbolt.org/z/vri9BE)
Note that while some extra-fancy instructions are used for memory manipulation in the assembly,
@ -181,7 +187,7 @@ fn distance(a: &Point, b: &Point) -> i64 {
let y_pow = (y1 - y2) * (y1 - y2);
let squared = x_pow + y_pow;
squared / squared
// Our final result will be stored in the `rax` register
// so that our caller knows where to retrieve it.
// Finally, add back to `rsp` the stack memory that is
@ -197,6 +203,7 @@ pub fn total_distance() {
let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)
As a consequence of function arguments never using heap memory, we can also
@ -234,6 +241,7 @@ pub fn total_distance() {
let _dist_2 = distance(&middle, &end);
}
```
-- [Compiler Explorer](https://godbolt.org/z/30Sh66)
Finally, passing by value (arguments with type
@ -274,6 +282,7 @@ pub fn distance_borrowed(a: &Point, b: &Point) -> i64 {
squared / squared
}
```
-- [Compiler Explorer](https://godbolt.org/z/06hGiv)
# Enums
@ -304,6 +313,7 @@ pub fn enum_compare() {
let opt = Option::Some(z);
}
```
-- [Compiler Explorer](https://godbolt.org/z/HK7zBx)
Because the size of an `enum` is the size of its largest element plus a flag,
@ -312,7 +322,7 @@ of an enum is currently stored in a variable. Thus, enums and unions have no
need of heap allocation. There's unfortunately not a great way to show this
in assembly, so I'll instead point you to the
[`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums)
documentation.
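If you'd rather poke at it than read the docs, `size_of` makes the "largest element plus a flag" rule easy to check (sizes assume a typical 64-bit target):

```rust
use std::mem::size_of;

fn main() {
    assert_eq!(size_of::<u32>(), 4);
    // 4 bytes of data, plus the discriminant and padding.
    assert_eq!(size_of::<Option<u32>>(), 8);
    // Niche optimization: `None` is represented by the null pointer,
    // so no extra flag is needed at all.
    assert_eq!(size_of::<Option<Box<u32>>>(), size_of::<Box<u32>>());
    println!("enum sizes check out");
}
```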
# Arrays
@ -353,6 +363,7 @@ fn main() {
let _x = EightM::default();
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4)
There aren't any security implications of this (no memory corruption occurs),
@ -367,7 +378,7 @@ are actually objects created on the heap that capture local primitives by copyin
local non-primitives as (`final`) references.
[Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and
[JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/)
both bind *everything* by reference normally, but Python can also
both bind _everything_ by reference normally, but Python can also
[capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has
[Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).
@ -395,6 +406,7 @@ pub fn immediate() {
my_func()();
}
```
-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions
If we store a reference to the closure, the Rust compiler keeps values it needs
@ -410,6 +422,7 @@ pub fn simple_reference() {
x();
}
```
-- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions
Even things like variable order can make a difference in instruction count:
@ -422,6 +435,7 @@ pub fn complex() {
y();
}
```
-- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions
In every circumstance though, the compiler ensured that no heap allocations were necessary.
@ -430,7 +444,7 @@ In every circumstance though, the compiler ensured that no heap allocations were
Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`)
and dynamic dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often
*associated* with trait objects being stored in the heap, dynamic dispatch can be used
_associated_ with trait objects being stored in the heap, dynamic dispatch can be used
with stack allocated objects as well:
```rust
@ -445,7 +459,7 @@ struct WhyNotU8 {
impl GetInt for WhyNotU8 {
fn get_int(&self) -> u64 {
self.x as u64
}
}
}
// vtable stored at section L__unnamed_2
@ -481,6 +495,7 @@ pub fn do_call() {
retrieve_int(&b);
}
```
-- [Compiler Explorer](https://godbolt.org/z/u_yguS)
It's hard to imagine practical situations where dynamic dispatch would be
@ -493,9 +508,9 @@ Understanding move semantics and copy semantics in Rust is weird at first. The R
far better than can be addressed here, so I'll leave them to do the job.
From a memory perspective though, their guideline is reasonable:
[if your type can implement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).
While there are potential speed tradeoffs to *benchmark* when discussing `Copy`
(move semantics for stack objects vs. copying stack pointers vs. copying stack `struct`s),
*it's impossible for `Copy` to introduce a heap allocation*.
While there are potential speed tradeoffs to _benchmark_ when discussing `Copy`
(move semantics for stack objects vs. copying stack pointers vs. copying stack `struct`s),
_it's impossible for `Copy` to introduce a heap allocation_.
But why is this the case? Fundamentally, it's because the language controls
what `Copy` means -
@ -519,6 +534,7 @@ struct NotCopyable {
x: Box<u64>
}
```
-- [Compiler Explorer](https://godbolt.org/z/VToRuK)
# Iterators
@ -587,4 +603,5 @@ pub fn sum_hm(x: &HashMap<u32, u32>) {
}
}
```
-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)


@ -2,21 +2,21 @@
layout: post
title: "Dynamic Memory: A Heaping Helping"
description: "The reason Rust exists."
category:
tags: [rust, understanding-allocations]
---
Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++),
and some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust,
how the language uses dynamic memory (also referred to as the **heap**) is a system called *ownership*.
how the language uses dynamic memory (also referred to as the **heap**) is a system called _ownership_.
And as the docs mention, ownership
[is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).
The heap is used in two situations; when the compiler is unable to predict either the *total size
of memory needed*, or *how long the memory is needed for*, it allocates space in the heap.
The heap is used in two situations; when the compiler is unable to predict either the _total size
of memory needed_, or _how long the memory is needed for_, it allocates space in the heap.
This happens pretty frequently; if you want to download the Google home page, you won't know
how large it is until your program runs. And when you're finished with Google, we deallocate
the memory so it can be used to store other webpages. If you're
interested in a slightly longer explanation of the heap, check out
[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap)
in Rust's documentation.
@ -50,7 +50,7 @@ unsafe impl GlobalAlloc for CountingAllocator {
ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst);
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
System.dealloc(ptr, layout);
}
@ -64,6 +64,7 @@ fn main() {
println!("There were {} allocations before calling main!", x);
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e)
As of the time of writing, there are five allocations that happen before `main`
@ -78,6 +79,7 @@ we'll follow this guide:
Finally, there are two "addendum" issues that are important to address when discussing
Rust and the heap:
- Non-heap alternatives to many standard library types are available.
- Special allocators to track memory behavior should be used to benchmark code.
@ -93,6 +95,7 @@ comes from C++, and while it's closely linked to a general design pattern of
we'll use it here specifically to describe objects that are responsible for managing
ownership of data allocated on the heap. The smart pointers available in the `alloc`
crate should look mostly familiar:
- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)
- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)
- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)
@ -100,6 +103,7 @@ crate should look mostly familiar:
The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers
to manage heap objects, though more than can be covered here. Some examples are:
- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)
- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)
@ -142,6 +146,7 @@ pub fn my_cow() {
Cow::from("drop");
}
```
-- [Compiler Explorer](https://godbolt.org/z/4AMQug)
# Collections
@ -157,8 +162,8 @@ Common types that fall under this umbrella are
[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html)
(not [`str`](https://doc.rust-lang.org/std/primitive.str.html)).
While collections store the objects they own in heap memory, *creating new collections
will not allocate on the heap*. This is a bit weird; if we call `Vec::new()`, the
While collections store the objects they own in heap memory, _creating new collections
will not allocate on the heap_. This is a bit weird; if we call `Vec::new()`, the
assembly shows a corresponding call to `real_drop_in_place`:
```rust
@ -167,6 +172,7 @@ pub fn my_vec() {
Vec::<u8>::new();
}
```
-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)
But because the vector has no elements to manage, no calls to the allocator
@ -179,11 +185,11 @@ use std::sync::atomic::{AtomicBool, Ordering};
fn main() {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// Interesting bit happens here
let x: Vec<u8> = Vec::new();
drop(x);
// Turn panicking back off, some deallocations occur
// after main as well.
DO_PANIC.store(false, Ordering::SeqCst);
@ -201,7 +207,7 @@ unsafe impl GlobalAlloc for PanicAllocator {
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
@ -210,6 +216,7 @@ unsafe impl GlobalAlloc for PanicAllocator {
}
}
```
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6)
Other standard library types follow the same behavior; make sure to check out


@ -2,7 +2,7 @@
layout: post
title: "Compiler Optimizations: What It's Done Lately"
description: "A lot. The answer is a lot."
category:
tags: [rust, understanding-allocations]
---
@ -32,13 +32,13 @@ we're focusing on interesting things the Rust language (and LLVM!) can do
with memory management. We'll still be looking at assembly code to
understand what's going on, but it's important to mention again:
**please use automated tools like
[alloc-counter](https://crates.io/crates/alloc_counter) to double-check
memory behavior if it's something you care about**.
It's far too easy to mis-read assembly in large code sections, you should
always verify behavior if you care about memory usage.
The guiding principle as we move forward is this: *optimizing compilers
won't produce worse programs than we started with.* There won't be any
The guiding principle as we move forward is this: _optimizing compilers
won't produce worse programs than we started with._ There won't be any
situations where stack allocations get moved to heap allocations.
There will, however, be an opera of optimization.
@ -57,7 +57,7 @@ use std::sync::atomic::{AtomicBool, Ordering};
pub fn cmp(x: u32) {
// Turn on panicking if we allocate on the heap
DO_PANIC.store(true, Ordering::SeqCst);
// The compiler is able to see through the constant `Box`
// and directly compare `x` to 24 - assembly line 73
let y = Box::new(24);
@ -70,7 +70,7 @@ pub fn cmp(x: u32) {
// LLVM doesn't strip out all the code. If `y` is marked
// volatile instead, allocation will be forced.
unsafe { std::ptr::read_volatile(&equals) };
// Turn off panicking, as there are some deallocations
// when we exit main.
DO_PANIC.store(false, Ordering::SeqCst);
@ -92,7 +92,7 @@ unsafe impl GlobalAlloc for PanicAllocator {
}
System.alloc(layout)
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
if DO_PANIC.load(Ordering::SeqCst) {
panic!("Unexpected deallocation.");
@ -101,6 +101,7 @@ unsafe impl GlobalAlloc for PanicAllocator {
}
}
```
-- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)
@ -147,5 +148,6 @@ pub fn main() {
let _x = EightM::default();
}
```
-- [Compiler Explorer](https://godbolt.org/z/daHn7P)
-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)


@ -2,7 +2,7 @@
layout: post
title: "Summary: What are the Allocation Rules?"
description: "A synopsis and reference."
category:
tags: [rust, understanding-allocations]
---
@ -13,12 +13,14 @@ an object on the heap or not. And while Rust will prioritize the fastest behavio
here are the rules for each memory type:
**Heap Allocation**:
- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.
- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory.
- Some smart pointers in the standard library have counterparts in other crates that
don't need heap memory. If possible, use those.
**Stack Allocation**:
- Everything not using a smart pointer will be allocated on the stack.
- Structs, enums, iterators, arrays, and closures are all stack allocated.
- Cell types (`RefCell`) behave like smart pointers, but are stack-allocated.
@ -26,6 +28,7 @@ here are the rules for each memory type:
- Types that are marked `Copy` are guaranteed to have their contents stack-allocated.
**Global Allocation**:
- `const` is a fixed value; the compiler is allowed to copy it wherever useful.
- `static` is a fixed reference; the compiler will guarantee it is unique.
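As a small sketch (not from the original posts) tying those rules together in one place:

```rust
fn main() {
    // Stack: plain values, structs, and arrays never touch the allocator.
    let point: (u64, u64) = (1, 2);

    // Heap: smart pointers and collections allocate their contents.
    let boxed = Box::new(point);
    let mut names: Vec<String> = Vec::new();
    names.push(String::from("hello")); // the Vec buffer and the String both allocate

    // Global: `const` may be copied wherever it's used; `static` is one
    // unique location baked into the binary.
    const FACTOR: u64 = 1_000;
    static GREETING: &str = "hello, world";

    println!("{:?} {:?} {} {} {:?}", boxed, names, FACTOR, GREETING, point);
}
```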


@ -2,7 +2,7 @@
layout: post
title: "Making Bread"
description: "...because I've got some free time now. 🍞"
category:
tags: [baking]
---
@ -36,4 +36,4 @@ In the end, the bread crust wasn't great, but the bread itself turned out pretty
![Baked bread](/assets/images/2019-05-03-making-bread/final-product.jpg)
I've been writing a lot more during this break, so I'm looking forward to sharing that in the future. In the mean-time, I'm planning on making a sandwich.


@ -2,7 +2,7 @@
layout: post
title: "On Building High Performance Systems"
description: ""
category:
tags: []
---
@ -20,23 +20,26 @@ Having now worked in the trading industry, I can confirm the developers aren't s
The framework I'd propose is this: **If you want to build high-performance systems, focus first on reducing performance variance** (reducing the gap between the fastest and slowest runs of the same code), **and only look at average latency once variance is at an acceptable level**.
Don't get me wrong, I'm a much happier person when things are fast. Computer goes from booting in 20 seconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes a full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a week is the same in each situation, but you're painfully aware of that minute when it happens. When it comes to code, the principle is the same: speeding up a function by an average of 10 milliseconds doesn't mean much if there's a 100ms difference between your fastest and slowest runs. When performance matters, you need to respond quickly *every time*, not just in aggregate. High-performance systems should first optimize for time variance. Once you're consistent at the time scale you care about, then focus on improving average time.
Don't get me wrong, I'm a much happier person when things are fast. Computer goes from booting in 20 seconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes a full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a week is the same in each situation, but you're painfully aware of that minute when it happens. When it comes to code, the principle is the same: speeding up a function by an average of 10 milliseconds doesn't mean much if there's a 100ms difference between your fastest and slowest runs. When performance matters, you need to respond quickly _every time_, not just in aggregate. High-performance systems should first optimize for time variance. Once you're consistent at the time scale you care about, then focus on improving average time.
This focus on variance shows up all the time in industry too (emphasis added in all quotes below):
- In [marketing materials](https://business.nasdaq.com/market-tech/marketplaces/trading) for NASDAQ's matching engine, the most performance-sensitive component of the exchange, dependability is highlighted in addition to instantaneous metrics:
> Able to **consistently sustain** an order rate of over 100,000 orders per second at sub-40 microsecond average latency
- The [Aeron](https://github.com/real-logic/aeron) message bus has this to say about performance:
> Performance is the key focus. Aeron is designed to be the highest throughput with the lowest and **most predictable latency possible** of any messaging system
- The company PolySync, which is working on autonomous vehicles, [mentions why](https://polysync.io/blog/session-types-for-hearty-codecs/) they picked their specific messaging format:
> In general, high performance is almost always desirable for serialization. But in the world of autonomous vehicles, **steady timing performance is even more important** than peak throughput. This is because safe operation is sensitive to timing outliers. Nobody wants the system that decides when to slam on the brakes to occasionally take 100 times longer than usual to encode its commands.
- [Solarflare](https://solarflare.com/), which makes highly-specialized network hardware, points out variance (jitter) as a big concern for [electronic trading](https://solarflare.com/electronic-trading/):
> The high stakes world of electronic trading, investment banks, market makers, hedge funds and exchanges demand the **lowest possible latency and jitter** while utilizing the highest bandwidth and return on their investment.
And to further clarify: we're not discussing *total run-time*, but variance of total run-time. There are situations where it's not reasonably possible to make things faster, and you'd much rather be consistent. For example, trading firms use [wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because the speed of light through air is faster than through fiber-optic cables. There's still at *absolute minimum* a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between, say, [Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago). If a trading system in Chicago calls the function for "send order to Tokyo" and waits to see if a trade occurs, there's a physical limit to how long that will take. In this situation, the focus is on keeping variance of *additional processing* to a minimum, since speed of light is the limiting factor.
And to further clarify: we're not discussing _total run-time_, but variance of total run-time. There are situations where it's not reasonably possible to make things faster, and you'd much rather be consistent. For example, trading firms use [wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because the speed of light through air is faster than through fiber-optic cables. There's still at _absolute minimum_ a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between, say, [Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago). If a trading system in Chicago calls the function for "send order to Tokyo" and waits to see if a trade occurs, there's a physical limit to how long that will take. In this situation, the focus is on keeping variance of _additional processing_ to a minimum, since speed of light is the limiting factor.
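For a sense of where that number comes from: the floor is just distance divided by the speed of light. A quick back-of-the-envelope check (the ~10,150 km great-circle distance is my own rough figure, not one taken from the linked calculator):

```python
# Rough sanity check on the minimum Chicago-Tokyo delay.
# The great-circle distance below is an assumption for illustration only.
SPEED_OF_LIGHT_KM_S = 299_792.458
CHICAGO_TOKYO_KM = 10_150

one_way_ms = CHICAGO_TOKYO_KM / SPEED_OF_LIGHT_KM_S * 1_000
print(f"Minimum one-way delay: {one_way_ms:.2f} ms")  # ~33.9 ms, before any processing at all
```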
So how does one go about looking for and eliminating performance variance? To tell the truth, I don't think a systematic answer or flow-chart exists. There's no substitute for (A) building a deep understanding of the entire technology stack, and (B) actually measuring system performance (though (C) watching a lot of [CppCon](https://www.youtube.com/channel/UCMlGfpWw-RUdWX_JbLCukXg) videos for inspiration never hurt). Even then, every project cares about performance to a different degree; you may need to build an entire [replica production system](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=3015) to accurately benchmark at nanosecond precision, or you may be content to simply [avoid garbage collection](https://www.youtube.com/watch?v=BD9cRbxWQx8&feature=youtu.be&t=1335) in your Java code.
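As a small example of what "actually measuring" can look like: a tight loop that timestamps itself will surface scheduler preemption, interrupts, and other interference as outliers in the gaps between samples. This is only a sketch, not a substitute for a real benchmark harness:

```python
import time

def jitter_probe(samples: int = 1_000_000) -> None:
    """Record gaps between consecutive timestamps in a tight loop."""
    gaps = []
    last = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        gaps.append(now - last)
        last = now
    gaps.sort()
    # The median gap is loop overhead; the tail is interference
    # (interrupts, preemption, frequency scaling, page faults, ...).
    print("median:", gaps[len(gaps) // 2], "ns")
    print("99.99th percentile:", gaps[int(len(gaps) * 0.9999)], "ns")
    print("max:", gaps[-1], "ns")

jitter_probe()
```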
@@ -45,6 +48,7 @@ Even though everyone has different needs, there are still common things to look
## Language-specific
**Garbage Collection**: How often does garbage collection happen? When is it triggered? What are the impacts?
- [In Python](https://rushter.com/blog/python-garbage-collector/), individual objects are collected if the reference count reaches 0, and each generation is collected if `num_alloc - num_dealloc > gc_threshold` whenever an allocation happens. The GIL is acquired for the duration of generational collection (see the sketch after this list for inspecting those counters and thresholds).
- Java has [many](https://docs.oracle.com/en/java/javase/12/gctuning/parallel-collector1.html#GUID-DCDD6E46-0406-41D1-AB49-FB96A50EB9CE) [different](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector.html#GUID-ED3AB6D3-FD9B-4447-9EDF-983ED2F7A573) [collection](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector-tuning.html#GUID-90E30ACA-8040-432E-B3A0-1E0440AB556A) [algorithms](https://docs.oracle.com/en/java/javase/12/gctuning/z-garbage-collector1.html#GUID-A5A42691-095E-47BA-B6DC-FB4E5FAA43D0) to choose from, each with different characteristics. The default algorithms (Parallel GC in Java 8, G1 in Java 9) freeze the JVM while collecting, while more recent algorithms ([ZGC](https://wiki.openjdk.java.net/display/zgc) and [Shenandoah](https://wiki.openjdk.java.net/display/shenandoah)) are designed to keep "stop the world" to a minimum by doing collection work in parallel.
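To make the Python bullet above concrete, everything it mentions is visible through the standard `gc` module. A minimal sketch of inspecting (and, only if measurements justify it, tuning) the generational collector; the specific threshold values are arbitrary examples, not recommendations:

```python
import gc

# Allocation-minus-deallocation counts per generation, and the thresholds
# that trigger a collection pass when exceeded.
print(gc.get_count())      # e.g. (421, 3, 1)
print(gc.get_threshold())  # defaults to (700, 10, 10)

# Options if collection pauses show up in your latency measurements:
gc.set_threshold(50_000, 25, 25)  # collect less often: more memory, fewer pauses
gc.freeze()                       # CPython 3.7+: exempt everything currently alive
gc.disable()                      # opt out of generational GC; refcounting still runs
```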
@@ -58,9 +62,9 @@ Even though everyone has different needs, there are still common things to look
## Kernel
Code you wrote is almost certainly not the *only* code running on your hardware. There are many ways the operating system interacts with your program, from interrupts to system calls, that are important to watch for. These are written from a Linux perspective, but Windows does typically have equivalent functionality.
Code you wrote is almost certainly not the _only_ code running on your hardware. There are many ways the operating system interacts with your program, from interrupts to system calls, that are important to watch for. These are written from a Linux perspective, but Windows does typically have equivalent functionality.
**Scheduling**: The kernel is normally free to schedule any process on any core, so it's important to reserve CPU cores exclusively for the important programs. There are a few parts to this: first, limit the CPU cores that non-critical processes are allowed to run on by excluding cores from scheduling ([`isolcpus`](https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html) kernel command-line option), or by setting the `init` process CPU affinity ([`systemd` example](https://access.redhat.com/solutions/2884991)). Second, set critical processes to run on the isolated cores by setting the [processor affinity](https://en.wikipedia.org/wiki/Processor_affinity) using [taskset](https://linux.die.net/man/1/taskset). Finally, use [`NO_HZ`](https://github.com/torvalds/linux/blob/master/Documentation/timers/NO_HZ.txt) or [`chrt`](https://linux.die.net/man/1/chrt) to disable scheduling interrupts. Turning off hyper-threading is also likely beneficial.
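The `taskset` step can also be done from inside the process via [`os.sched_setaffinity`](https://docs.python.org/3/library/os.html#os.sched_setaffinity) (Linux-only). A minimal sketch, assuming cores 2 and 3 were already reserved with `isolcpus=2,3`:

```python
import os

# Pin this process (pid 0 means "self") to the isolated cores so the
# scheduler never migrates it; equivalent to `taskset -c 2,3 ./program`.
os.sched_setaffinity(0, {2, 3})
print("Running on cores:", os.sched_getaffinity(0))
```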
**System calls**: Reading from a UNIX socket? Writing to a file? In addition to not knowing how long the I/O operation takes, these all trigger expensive [system calls (syscalls)](https://en.wikipedia.org/wiki/System_call). To handle these, the CPU must [context switch](https://en.wikipedia.org/wiki/Context_switch) to the kernel, let the kernel operation complete, then context switch back to your program. We'd rather keep these [to a minimum](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript) (see timestamp 18:20). [Strace](https://linux.die.net/man/1/strace) is your friend for understanding when and where syscalls happen.
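To get a feel for that cost, compare pushing the same bytes through many small syscalls versus a single batched write. A rough sketch; the absolute numbers depend entirely on your hardware and kernel:

```python
import os
import time

payload = b"x" * 64
fd = os.open("/dev/null", os.O_WRONLY)

start = time.perf_counter_ns()
for _ in range(10_000):
    os.write(fd, payload)            # one write(2) syscall per message
per_call_ns = (time.perf_counter_ns() - start) / 10_000

start = time.perf_counter_ns()
os.write(fd, payload * 10_000)       # the same bytes, a single syscall
batched_ns = time.perf_counter_ns() - start

os.close(fd)
print(f"per-message write: ~{per_call_ns:.0f} ns, single batched write: {batched_ns} ns total")
```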
@@ -82,7 +86,7 @@ Code you wrote is almost certainly not the *only* code running on your hardware.
**Routing**: There's a reason financial firms are willing to pay [millions of euros](https://sniperinmahwah.wordpress.com/2019/03/26/4-les-moeres-english-version/) for rights to a small plot of land - having a straight-line connection from point A to point B means the path their data takes is the shortest possible. In contrast, there are currently 6 computers in between me and Google, but that may change at any moment if my ISP realizes a [more efficient route](https://en.wikipedia.org/wiki/Border_Gateway_Protocol) is available. Whether it's using [research-quality equipment](https://sniperinmahwah.wordpress.com/2018/05/07/shortwave-trading-part-i-the-west-chicago-tower-mystery/) for shortwave radio, or just making sure there's no data inadvertently going between data centers, routing matters.
**Protocol**: TCP as a network protocol is awesome: guaranteed and in-order delivery, flow control, and congestion control all built in. But these attributes make the most sense when networking infrastructure is lossy; for systems that expect nearly all packets to be delivered correctly, the setup handshaking and packet acknowledgment are just overhead. Using UDP (unicast or multicast) may make sense in these contexts as it avoids the chatter needed to track connection state, and [gap-fill](https://iextrading.com/docs/IEX%20Transport%20Specification.pdf) [strategies](http://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/moldudp64.pdf) can handle the rest.
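To illustrate what a gap-fill strategy is reacting to, here's a sketch of a receiver that watches sequence numbers on an incoming UDP feed. The wire format (an 8-byte big-endian sequence number prefix, port 9000) is invented for the example and is not taken from the IEX or MoldUDP64 specs:

```python
import socket
import struct

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))  # hypothetical market data feed

expected = None
while True:  # sketch only; a real receiver needs shutdown handling
    packet = sock.recv(65536)
    # Assumed layout: 8-byte big-endian sequence number, then the payload.
    (seq,) = struct.unpack_from(">Q", packet)
    if expected is not None and seq != expected:
        # Packets were dropped or reordered; request a replay of the missing
        # range from a re-request server instead of stalling the whole feed.
        print(f"gap detected: missing sequences {expected}..{seq - 1}")
    expected = seq + 1
```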
**Switching**: Many routers/switches handle packets using "store-and-forward" behavior: wait for the whole packet, validate checksums, and then send to the next device. In variance terms, the time needed to move data between two nodes is proportional to the size of that data; the switch must "store" all data before it can calculate checksums and "forward" to the next node. With ["cut-through"](https://www.networkworld.com/article/2241573/latency-and-jitter--cut-through-design-pays-off-for-arista--blade.html) designs, switches will begin forwarding data as soon as they know where the destination is, checksums be damned. This means there's a fixed cost (at the switch) for network traffic, no matter the size.
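The "proportional to the size of that data" part is easy to put numbers on: a store-and-forward hop can't begin transmitting until the entire frame has arrived, so every hop adds at least the frame's serialization delay. A quick calculation (the 10 Gbps link speed is just illustrative):

```python
def serialization_delay_ns(frame_bytes: int, link_gbps: float) -> float:
    # bits / (Gbit/s) works out to nanoseconds
    return frame_bytes * 8 / link_gbps

for size in (64, 512, 1514, 9000):
    delay = serialization_delay_ns(size, 10)
    print(f"{size:>5}-byte frame at 10 Gbps: {delay:7.1f} ns per store-and-forward hop")
```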
@@ -2,7 +2,7 @@
layout: post
title: "Binary Format Shootout"
description: "Cap'n Proto vs. Flatbuffers vs. SBE"
category:
tags: [rust]
---
@@ -26,7 +26,7 @@ Any one of these will satisfy the project requirements: easy to transmit over a
and polyglot support. But how do you actually pick one? It's impossible to know what issues will follow that choice,
so I tend to avoid commitment until the last possible moment.
Still, a choice must be made. Instead of worrying about which is "the best," I decided to build a small
proof-of-concept system in each format and pit them against each other. All code can be found in the
[repository](https://github.com/speice-io/marketdata-shootout) for this post.
@@ -204,7 +204,7 @@ This test measures, on a
how long it takes to serialize the IEX message into the desired format and write to a pre-allocated buffer.
| Schema | Median | 99th Pctl | 99.9th Pctl | Total |
|:---------------------|:-------|:----------|:------------|:-------|
| :------------------- | :----- | :-------- | :---------- | :----- |
| Cap'n Proto Packed | 413ns | 1751ns | 2943ns | 14.80s |
| Cap'n Proto Unpacked | 273ns | 1828ns | 2836ns | 10.65s |
| Flatbuffers | 355ns | 2185ns | 3497ns | 14.31s |
@@ -219,7 +219,7 @@ perform some basic aggregation. The aggregation code is the same for each format
so any performance differences are due solely to the format implementation.
| Schema | Median | 99th Pctl | 99.9th Pctl | Total |
|:---------------------|:-------|:----------|:------------|:-------|
| :------------------- | :----- | :-------- | :---------- | :----- |
| Cap'n Proto Packed | 539ns | 1216ns | 2599ns | 18.92s |
| Cap'n Proto Unpacked | 366ns | 737ns | 1583ns | 12.32s |
| Flatbuffers | 173ns | 421ns | 1007ns | 6.00s |
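For context on how tables like these are produced: each message is timed individually, and the percentiles are read off the sorted samples afterwards. The real harness lives in the Rust repository linked above; the sketch below (with hypothetical `encode` and `iex_messages` stand-ins) only shows that bookkeeping:

```python
import time

def summarize(encode, messages, label):
    """Time `encode` once per message and report median / tail percentiles."""
    samples = []
    for msg in messages:
        start = time.perf_counter_ns()
        encode(msg)  # e.g. serialize into a pre-allocated buffer
        samples.append(time.perf_counter_ns() - start)
    samples.sort()

    def pctl(p: float) -> int:
        return samples[min(int(len(samples) * p), len(samples) - 1)]

    total_s = sum(samples) / 1e9
    print(f"{label}: median={pctl(0.5)}ns p99={pctl(0.99)}ns "
          f"p99.9={pctl(0.999)}ns total={total_s:.2f}s")

# Usage, with hypothetical stand-ins for the real benchmark inputs:
# summarize(capnp_packed_encode, iex_messages, "Cap'n Proto Packed")
```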
@@ -2,13 +2,13 @@
layout: post
title: "Release the GIL"
description: "Strategies for Parallelism in Python"
category:
tags: [python]
---
Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock) (GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision made before multi-core CPUs were widely available, but the fact that it's still around indicates that it generally works [Good](https://wiki.c2.com/?PrematureOptimization) [Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective workarounds; it's not hard to start a [new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to synchronize code running in parallel.
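For reference, that workaround looks something like the sketch below: each worker is a separate interpreter process with its own GIL, so the work genuinely runs in parallel and results come back over a pipe:

```python
from multiprocessing import Pool

def work(n: int) -> int:
    # Runs in a child process, so it never contends for this interpreter's GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        print(pool.map(work, [10_000_000] * 4))  # four processes, truly parallel
```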
Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of asynchronicity and *M:N* threading, Python seems lacking. The ideal scenario is to take advantage of both Python's productivity and the modern CPU's parallel capabilities.
Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of asynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of both Python's productivity and the modern CPU's parallel capabilities.
Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes Python a nice language to start with. Bear in mind: these are just the tools; no claim is made about whether it's a good idea to use them. Very often, unlocking the GIL is an [XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at the expense of project complexity; messing with the GIL is ultimately messing with Python's memory model.
@@ -84,10 +84,8 @@ _ = cython_nogil(N);
> Wall time: 388 ms
> </pre>
Both versions (with and without GIL) take effectively the same amount of time to run. Even when running this calculation in parallel on separate threads, it is expected that the run time will double because only one thread can be active at a time:
```python
%%time
from threading import Thread
@@ -106,10 +104,8 @@ t1.join(); t2.join()
> Wall time: 645 ms
> </pre>
However, if the first thread releases the GIL, the second thread is free to acquire it and run in parallel:
```python
%%time
@@ -153,10 +149,10 @@ Finally, be aware that attempting to unlock the GIL from a thread that doesn't o
cdef int cython_recurse(int n) nogil:
    if n <= 0:
        return 0

    with nogil:
        return cython_recurse(n - 1)

cython_recurse(2)
```
@@ -175,7 +171,7 @@ To conclude: use Cython's `nogil` annotation to assert that functions are safe f
# Numba
Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by compiling a Python-like language to C/C++, Numba compiles Python bytecode *directly to machine code* at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function first compiles it to machine code before running. Calling the function a second time re-uses that machine code unless the argument types have changed.
Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by compiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_ at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function first compiles it to machine code before running. Calling the function a second time re-uses that machine code unless the argument types have changed.
Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the `@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython` are separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to release the lock, the GIL will remain locked if `nogil=False` (the default).
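A minimal sketch of that decorator combination; the loop body is arbitrary, the point is `nopython=True, nogil=True`:

```python
from threading import Thread
from numba import jit

@jit(nopython=True, nogil=True)
def busy_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

busy_sum(1)  # first call compiles to machine code; later calls reuse it

# Because the GIL is released inside busy_sum, these threads can run on
# separate cores at the same time.
t1 = Thread(target=busy_sum, args=(100_000_000,))
t2 = Thread(target=busy_sum, args=(100_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
```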
@@ -299,7 +295,7 @@ def numba_recurse(n: int) -> int:
        return 0

    return numba_recurse(n - 1)

numba_recurse(2);
```