mirror of https://github.com/bspeice/speice.io (synced 2025-10-26 15:10:30 -04:00)

Markdown auto formatting
		| @ -18,7 +18,7 @@ what to do with your life (but you should totally keep reading). | |||||||
|  |  | ||||||
| # Slow down, what? | # Slow down, what? | ||||||
|  |  | ||||||
| OK, fine, I guess I should start with *why* someone would do this. | OK, fine, I guess I should start with _why_ someone would do this. | ||||||
|  |  | ||||||
| [Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. | [Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. | ||||||
| The standard library support for time in Python is kinda dope, but there are a lot of extras | The standard library support for time in Python is kinda dope, but there are a lot of extras | ||||||
| @ -32,9 +32,10 @@ and [time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers | |||||||
| it feels like it shouldn't be that difficult to do, until you try to do it, | it feels like it shouldn't be that difficult to do, until you try to do it, | ||||||
| and you realize that people suck and this is why [we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). | and you realize that people suck and this is why [we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). | ||||||
| But alas, we'll try and make contemporary art out of the rubble and give it a | But alas, we'll try and make contemporary art out of the rubble and give it a | ||||||
| pretentious name like *Time*. | pretentious name like _Time_. | ||||||
|  |  | ||||||
|  |  | ||||||
|  |  | ||||||
| > [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php) | > [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php) | ||||||
|  |  | ||||||
| What makes `dateutil.parser` great is that there's a single function with a single argument that drives | What makes `dateutil.parser` great is that there's a single function with a single argument that drives | ||||||
| @ -64,7 +65,7 @@ Also, **don't use nice Rust things like enums**. While | |||||||
| [one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94), | [one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94), | ||||||
| I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a boolean | I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a boolean | ||||||
| and I mixed up which was true, and which was false (side note: AM is false, PM is true). | and I mixed up which was true, and which was false (side note: AM is false, PM is true). | ||||||
| In general, writing nice code *should not be a first-pass priority* when you're just trying to recreate | In general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate | ||||||
| the same functionality. | the same functionality. | ||||||
|  |  | ||||||
| **Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. | **Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. | ||||||
| @ -84,7 +85,7 @@ indentation levels so you can keep things straight. | |||||||
| [main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217) | [main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217) | ||||||
| wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. After | wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. After | ||||||
| [moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205) | [moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205) | ||||||
| compile times dropped down to ~5 seconds. Turns out 150 lines * 100 tests = a lot of redundant code to be compiled. | compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code to be compiled. | ||||||
| My new rule of thumb is that any macros longer than 10-15 lines are actually functions that need to be liberated, man. | My new rule of thumb is that any macros longer than 10-15 lines are actually functions that need to be liberated, man. | ||||||
|  |  | ||||||
| Finally, **I really miss list comprehensions and dictionary comprehensions.** | Finally, **I really miss list comprehensions and dictionary comprehensions.** | ||||||
| @ -106,7 +107,7 @@ you use `dateutil`. If you want `decimal` types, it's already in the | |||||||
| Thus began my quest to find a decimal library in Rust. What I quickly found was summarized | Thus began my quest to find a decimal library in Rust. What I quickly found was summarized | ||||||
| in a comment: | in a comment: | ||||||
|  |  | ||||||
| > Writing a BigDecimal is easy. Writing a *good* BigDecimal is hard. | > Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard. | ||||||
| > | > | ||||||
| > [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794) | > [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794) | ||||||
|  |  | ||||||
| @ -119,7 +120,7 @@ and I'm forced to dig through a [couple](https://github.com/rust-lang/rust/issue | |||||||
| to figure out if the library I'm looking at is dead or just stable. | to figure out if the library I'm looking at is dead or just stable. | ||||||
|  |  | ||||||
| And even when the "canonical library" exists, there's no guarantees that it will be well-maintained. | And even when the "canonical library" exists, there's no guarantees that it will be well-maintained. | ||||||
| [Chrono](https://github.com/chronotope/chrono) is the *de facto* date/time library in Rust, | [Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, | ||||||
| and just released version 0.4.4 like two days ago. Meanwhile, [chrono-tz](https://github.com/chronotope/chrono-tz) | and just released version 0.4.4 like two days ago. Meanwhile, [chrono-tz](https://github.com/chronotope/chrono-tz) | ||||||
| appears to be dead in the water even though [there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). | appears to be dead in the water even though [there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). | ||||||
| I know relatively little about it, but it appears that most of the release process is automated; keeping | I know relatively little about it, but it appears that most of the release process is automated; keeping | ||||||
| @ -130,9 +131,9 @@ that up to date should be a no-brainer. | |||||||
| Specifically given "maintenance" being an [oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/) | Specifically given "maintenance" being an [oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/) | ||||||
| issue, I'm going to try out the following policy to keep things moving on `dtparse`: | issue, I'm going to try out the following policy to keep things moving on `dtparse`: | ||||||
|  |  | ||||||
| 1. Issues/PRs needing *maintainer* feedback will be updated at least weekly. I want to make sure nobody's blocking on me. | 1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure nobody's blocking on me. | ||||||
|  |  | ||||||
| 2. To keep issues/PRs needing *contributor* feedback moving, I'm going to (kindly) ask the contributor to check in after two weeks, | 2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the contributor to check in after two weeks, | ||||||
|    and close the issue without resolution if I hear nothing back after a month. |    and close the issue without resolution if I hear nothing back after a month. | ||||||
|  |  | ||||||
| The second point I think has the potential to be a bit controversial, so I'm happy to receive feedback on that. | The second point I think has the potential to be a bit controversial, so I'm happy to receive feedback on that. | ||||||
|  | |||||||
| @ -17,9 +17,9 @@ fn main() { | |||||||
|  |  | ||||||
| And to my complete befuddlement, it compiled, ran, and produced a completely sensible output. | And to my complete befuddlement, it compiled, ran, and produced a completely sensible output. | ||||||
| The reason I was so surprised has to do with how Rust treats a special category of things | The reason I was so surprised has to do with how Rust treats a special category of things | ||||||
| I'm going to call *primitives*. In the current version of the Rust book, you'll see them | I'm going to call _primitives_. In the current version of the Rust book, you'll see them | ||||||
| referred to as [scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], | referred to as [scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], | ||||||
| but we're going to stick with the name *primitive* for the time being. Explaining | but we're going to stick with the name _primitive_ for the time being. Explaining | ||||||
| why this program is so cool requires talking about a number of other programming languages, | why this program is so cool requires talking about a number of other programming languages, | ||||||
| and keeping a consistent terminology makes things easier. | and keeping a consistent terminology makes things easier. | ||||||
|  |  | ||||||
| @ -28,15 +28,17 @@ Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm t | |||||||
|  |  | ||||||
| # Defining primitives (Java) | # Defining primitives (Java) | ||||||
|  |  | ||||||
| The reason I'm using the name *primitive* comes from how much of my life is Java right now. | The reason I'm using the name _primitive_ comes from how much of my life is Java right now. | ||||||
| Spoiler alert: a lot of it. And for the most part I like Java, but I digress. In Java, there's a | Spoiler alert: a lot of it. And for the most part I like Java, but I digress. In Java, there's a | ||||||
| special name for some specific types of values: | special name for some specific types of values: | ||||||
|  |  | ||||||
| > ``` | > ``` | ||||||
| bool    char    byte | > bool    char    byte | ||||||
| short   int     long | > short   int     long | ||||||
| float   double | > float   double | ||||||
| ``` | > ``` | ||||||
|  |  | ||||||
|  | ```` | ||||||
|  |  | ||||||
| They are referred to as [primitives][java_primitive]. And relative to the other bits of Java, | They are referred to as [primitives][java_primitive]. And relative to the other bits of Java, | ||||||
| they have two unique features. First, they don't have to worry about the | they have two unique features. First, they don't have to worry about the | ||||||
| @ -51,7 +53,7 @@ class Main { | |||||||
|         System.out.println(x.toString()); // Triggers a compiler error |         System.out.println(x.toString()); // Triggers a compiler error | ||||||
|     } |     } | ||||||
| } | } | ||||||
| ``` | ```` | ||||||
|  |  | ||||||
| The error is: | The error is: | ||||||
|  |  | ||||||
| @ -64,7 +66,7 @@ Main.java:5: error: int cannot be dereferenced | |||||||
|  |  | ||||||
| Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html) | Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html) | ||||||
| and things that inherit from it are pointers under the hood, and we have to dereference them before | and things that inherit from it are pointers under the hood, and we have to dereference them before | ||||||
| the fields and methods they define can be used. In contrast, *primitive types are just values* - | the fields and methods they define can be used. In contrast, _primitive types are just values_ - | ||||||
| there's nothing to be dereferenced. In memory, they're just a sequence of bits. | there's nothing to be dereferenced. In memory, they're just a sequence of bits. | ||||||
|  |  | ||||||
| If we really want, we can turn the `int` into an | If we really want, we can turn the `int` into an | ||||||
| @ -177,15 +179,15 @@ example::my_function: | |||||||
| ``` | ``` | ||||||
|  |  | ||||||
| The generated Rust assembly is functionally pretty close to the C assembly: | The generated Rust assembly is functionally pretty close to the C assembly: | ||||||
| *When working with primitives, we're just dealing with bits in memory*.  | _When working with primitives, we're just dealing with bits in memory_. | ||||||
|  |  | ||||||
| In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to dereference. So what | In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to dereference. So what | ||||||
| exactly is going on with this `.to_string()` function call? | exactly is going on with this `.to_string()` function call? | ||||||
|  |  | ||||||
| # impl primitive (and Python) | # impl primitive (and Python) | ||||||
|  |  | ||||||
| Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this together: *Rust has | Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this together: _Rust has | ||||||
| implementations for its primitive types.* That's right, `impl` blocks aren't only for `structs` and `traits`, | implementations for its primitive types._ That's right, `impl` blocks aren't only for `structs` and `traits`, | ||||||
| primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html), | primitives get them too. Don't believe me? Check out [u32](https://doc.rust-lang.org/std/primitive.u32.html), | ||||||
| [f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html) | [f64](https://doc.rust-lang.org/std/primitive.f64.html) and [char](https://doc.rust-lang.org/std/primitive.char.html) | ||||||
| as examples. | as examples. | ||||||
| @ -224,7 +226,7 @@ the location of the function we want to call (like Java), we have a function tha | |||||||
| outside of the instance and just give that function the value `8`. | outside of the instance and just give that function the value `8`. | ||||||
|  |  | ||||||
| This is an incredibly technical detail, but the interesting idea I had was this: | This is an incredibly technical detail, but the interesting idea I had was this: | ||||||
| *if `to_string()` is a static function, can I refer to the unbound function and give it an instance?* | _if `to_string()` is a static function, can I refer to the unbound function and give it an instance?_ | ||||||
|  |  | ||||||
| Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link | Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link | ||||||
| because I seriously love this thing): | because I seriously love this thing): | ||||||
| @ -269,7 +271,7 @@ m.my_function() | |||||||
| MyClass.my_function(m) | MyClass.my_function(m) | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| And Python tries to make you *think* that primitives can have instance methods... | And Python tries to make you _think_ that primitives can have instance methods... | ||||||
|  |  | ||||||
| ```python | ```python | ||||||
| >>> dir(8) | >>> dir(8) | ||||||
|  | |||||||
| @ -18,20 +18,20 @@ nicer and more fun at parties. But I cringe every time "Webpack" is mentioned, a | |||||||
| that the [language specification](https://ecma-international.org/publications/standards/Ecma-402.htm) | that the [language specification](https://ecma-international.org/publications/standards/Ecma-402.htm) | ||||||
| dramatically outpaces anyone's [actual implementation](https://kangax.github.io/compat-table/es2016plus/). | dramatically outpaces anyone's [actual implementation](https://kangax.github.io/compat-table/es2016plus/). | ||||||
| The answer to this conundrum is of course to recompile code from newer versions of the language to | The answer to this conundrum is of course to recompile code from newer versions of the language to | ||||||
| older versions *of the same language* before running. At least [Babel] is a nice tongue-in-cheek reference. | older versions _of the same language_ before running. At least [Babel] is a nice tongue-in-cheek reference. | ||||||
|  |  | ||||||
| Yet for as much hate as [Electron] receives, it does a stunningly good job at solving | Yet for as much hate as [Electron] receives, it does a stunningly good job at solving | ||||||
| a really hard problem: *how the hell do I put a button on the screen and react when the user clicks it*? | a really hard problem: _how the hell do I put a button on the screen and react when the user clicks it_? | ||||||
| GUI programming is hard, straight up. But if browsers are already able to run everywhere, why don't | GUI programming is hard, straight up. But if browsers are already able to run everywhere, why don't | ||||||
| we take advantage of someone else solving the hard problems for us? I don't like that I have to use | we take advantage of someone else solving the hard problems for us? I don't like that I have to use | ||||||
| Javascript for it, but I really don't feel inclined to whip out good ol' [wxWidgets]. | Javascript for it, but I really don't feel inclined to whip out good ol' [wxWidgets]. | ||||||
|  |  | ||||||
| Now there are other native solutions ([libui-rs], [conrod], [oh hey wxWidgets again!][wxRust]), | Now there are other native solutions ([libui-rs], [conrod], [oh hey wxWidgets again!][wxrust]), | ||||||
| but those also have their own issues with distribution, styling, etc. With Electron, I can | but those also have their own issues with distribution, styling, etc. With Electron, I can | ||||||
| `yarn create electron-app my-app` and just get going, knowing that packaging/upgrades/etc. | `yarn create electron-app my-app` and just get going, knowing that packaging/upgrades/etc. | ||||||
| are built in. | are built in. | ||||||
|  |  | ||||||
| My question is: given recent innovations with WASM, *are we Electron yet*? | My question is: given recent innovations with WASM, _are we Electron yet_? | ||||||
|  |  | ||||||
| No, not really. | No, not really. | ||||||
|  |  | ||||||
| @ -44,8 +44,8 @@ There may already be solutions to the issues I discuss, but I'm totally unaware | |||||||
| so I'm going to try and organize what I did manage to discover. | so I'm going to try and organize what I did manage to discover. | ||||||
|  |  | ||||||
| I should also mention that the content and things I'm talking about here are not intended to be prescriptive, | I should also mention that the content and things I'm talking about here are not intended to be prescriptive, | ||||||
| but more "if someone else is interested, what do we already know doesn't work?" *I expect everything in this post to be obsolete | but more "if someone else is interested, what do we already know doesn't work?" _I expect everything in this post to be obsolete | ||||||
| within two months.* Even over the course of writing this, [a separate blog post](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/) | within two months._ Even over the course of writing this, [a separate blog post](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/) | ||||||
| had to be modified because [upstream changes](https://github.com/WebAssembly/binaryen/pull/1642) | had to be modified because [upstream changes](https://github.com/WebAssembly/binaryen/pull/1642) | ||||||
| broke a [Rust tool](https://github.com/rustwasm/wasm-bindgen/pull/787) the post tried to use. | broke a [Rust tool](https://github.com/rustwasm/wasm-bindgen/pull/787) the post tried to use. | ||||||
| The post ultimately [got updated](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/#comment-477), | The post ultimately [got updated](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/#comment-477), | ||||||
| @ -55,13 +55,13 @@ I'll also note that we're going to skip [asm.js] and [emscripten]. Truth be told | |||||||
| to output anything, and so I'm just going to say [here be dragons.](https://en.wikipedia.org/wiki/Here_be_dragons) | to output anything, and so I'm just going to say [here be dragons.](https://en.wikipedia.org/wiki/Here_be_dragons) | ||||||
| Everything I'm discussing here uses the `wasm32-unknown-unknown` target. | Everything I'm discussing here uses the `wasm32-unknown-unknown` target. | ||||||
|  |  | ||||||
| The code that I *did* get running is available [over here](https://github.com/speice-io/isomorphic-rust). | The code that I _did_ get running is available [over here](https://github.com/speice-io/isomorphic-rust). | ||||||
| Feel free to use it as a starting point, but I'm mostly including the link as a reference for the things | Feel free to use it as a starting point, but I'm mostly including the link as a reference for the things | ||||||
| that were attempted. | that were attempted. | ||||||
|  |  | ||||||
| # An Example Running Application | # An Example Running Application | ||||||
|  |  | ||||||
| So, I did *technically* get a running application: | So, I did _technically_ get a running application: | ||||||
|  |  | ||||||
|  |  | ||||||
|  |  | ||||||
| @ -175,7 +175,7 @@ exactly match, so it's required that these two version are kept in sync by | |||||||
| either updating the wasm-bindgen dependency or this binary. | either updating the wasm-bindgen dependency or this binary. | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| Not that I ever managed to run into this myself (*coughs nervously*). | Not that I ever managed to run into this myself (_coughs nervously_). | ||||||
|  |  | ||||||
| There are two projects attempting to be "application frameworks": [percy] and [yew]. Between those, | There are two projects attempting to be "application frameworks": [percy] and [yew]. Between those, | ||||||
| I managed to get [two](https://github.com/speice-io/isomorphic-rust/tree/master/percy) | I managed to get [two](https://github.com/speice-io/isomorphic-rust/tree/master/percy) | ||||||
| @ -256,7 +256,7 @@ can become a thing: | |||||||
| [libui-rs]: https://github.com/LeoTindall/libui-rs/ | [libui-rs]: https://github.com/LeoTindall/libui-rs/ | ||||||
| [electron]: https://electronjs.org/ | [electron]: https://electronjs.org/ | ||||||
| [babel]: https://babeljs.io/ | [babel]: https://babeljs.io/ | ||||||
| [wxRust]: https://github.com/kenz-gelsoft/wxRust | [wxrust]: https://github.com/kenz-gelsoft/wxRust | ||||||
| [wasm-bindgen]: https://github.com/rustwasm/wasm-bindgen | [wasm-bindgen]: https://github.com/rustwasm/wasm-bindgen | ||||||
| [js-sys]: https://crates.io/crates/js-sys | [js-sys]: https://crates.io/crates/js-sys | ||||||
| [percy-webapis]: https://crates.io/crates/percy-webapis | [percy-webapis]: https://crates.io/crates/percy-webapis | ||||||
|  | |||||||
| @ -15,12 +15,12 @@ One of my earliest conversations about programming went like this: | |||||||
|  |  | ||||||
| ...though it's not like the first code I wrote was for a | ...though it's not like the first code I wrote was for a | ||||||
| [graphing calculator](https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-se) | [graphing calculator](https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-se) | ||||||
| packing a whole 24KB of RAM. By the way, *what are you doing on my lawn?* | packing a whole 24KB of RAM. By the way, _what are you doing on my lawn?_ | ||||||
|  |  | ||||||
| The principle remains though: be efficient with the resources you have, because | The principle remains though: be efficient with the resources you have, because | ||||||
| [what Intel giveth, Microsoft taketh away](http://exo-blog.blogspot.com/2007/09/what-intel-giveth-microsoft-taketh-away.html). | [what Intel giveth, Microsoft taketh away](http://exo-blog.blogspot.com/2007/09/what-intel-giveth-microsoft-taketh-away.html). | ||||||
| My professional work is focused on this kind of efficiency; low-latency financial markets demand that | My professional work is focused on this kind of efficiency; low-latency financial markets demand that | ||||||
| you understand at a deep level *exactly* what your code is doing. As I continue experimenting with Rust for | you understand at a deep level _exactly_ what your code is doing. As I continue experimenting with Rust for | ||||||
| personal projects, it's exciting to bring a utilitarian mindset with me: there's flexibility for the times I pretend | personal projects, it's exciting to bring a utilitarian mindset with me: there's flexibility for the times I pretend | ||||||
| to have a garbage collector, and flexibility for the times that I really care about how memory is used. | to have a garbage collector, and flexibility for the times that I really care about how memory is used. | ||||||
|  |  | ||||||
| @ -36,7 +36,7 @@ usage considered in Python, and I likewise wasn't paying too much attention when | |||||||
| This lackadaisical approach to memory works well enough, and I'm not planning on making `dtparse` hyper-efficient. | This lackadaisical approach to memory works well enough, and I'm not planning on making `dtparse` hyper-efficient. | ||||||
| But every so often, I've wondered: "what exactly is going on in memory?" With the advent of Rust 1.28 and the | But every so often, I've wondered: "what exactly is going on in memory?" With the advent of Rust 1.28 and the | ||||||
| [Global Allocator trait](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), I had a really great idea: | [Global Allocator trait](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), I had a really great idea: | ||||||
| *build a custom allocator that allows you to track your own allocations.* That way, you can do things like | _build a custom allocator that allows you to track your own allocations._ That way, you can do things like | ||||||
| writing tests for both correct results and correct memory usage. I gave it a [shot][qadapt], but learned | writing tests for both correct results and correct memory usage. I gave it a [shot][qadapt], but learned | ||||||
| very quickly: **never write your own allocator**. It went from "fun weekend project" to | very quickly: **never write your own allocator**. It went from "fun weekend project" to | ||||||
| "I have literally no idea what my computer is doing" at breakneck speed. | "I have literally no idea what my computer is doing" at breakneck speed. | ||||||
| @ -97,7 +97,7 @@ For example, we can see that all executions happened during the `main` function: | |||||||
|  |  | ||||||
|  |  | ||||||
|  |  | ||||||
| ...and within *that*, allocations happened in two different places: | ...and within _that_, allocations happened in two different places: | ||||||
|  |  | ||||||
|  |  | ||||||
|  |  | ||||||
| @ -124,6 +124,7 @@ pub fn parse(timestr: &str) -> ParseResult<(NaiveDateTime, Option<FixedOffset>)> | |||||||
|     Ok((res.0, res.1)) |     Ok((res.0, res.1)) | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| > [dtparse](https://github.com/bspeice/dtparse/blob/4d7c5dd99572823fa4a390b483c38ab020a2172f/src/lib.rs#L1286) | > [dtparse](https://github.com/bspeice/dtparse/blob/4d7c5dd99572823fa4a390b483c38ab020a2172f/src/lib.rs#L1286) | ||||||
|  |  | ||||||
| --- | --- | ||||||
|  | |||||||
| @ -1,7 +1,7 @@ | |||||||
| --- | --- | ||||||
| layout: post | layout: post | ||||||
| title: "More \"What Companies Really Mean\"" | title: 'More "What Companies Really Mean"' | ||||||
| description: "when they ask \"Why should we hire you?\"" | description: 'when they ask "Why should we hire you?"' | ||||||
| category: | category: | ||||||
| tags: [] | tags: [] | ||||||
| --- | --- | ||||||
| @ -17,8 +17,8 @@ Let me also make note of one more question/euphemism I've come across: | |||||||
|  |  | ||||||
| # How do you feel about Production Support? | # How do you feel about Production Support? | ||||||
|  |  | ||||||
| **Translation**: *We're a fairly small team, and when things break on an evening/weekend/Christmas Day, | **Translation**: _We're a fairly small team, and when things break on an evening/weekend/Christmas Day, | ||||||
| can we call on you to be there?* | can we call on you to be there?_ | ||||||
|  |  | ||||||
| I've met decidedly few people in my life who truly enjoy the "ops" side of "devops". | I've met decidedly few people in my life who truly enjoy the "ops" side of "devops". | ||||||
| They're incredibly good at taking an impossible problem, pre-existing knowledge of | They're incredibly good at taking an impossible problem, pre-existing knowledge of | ||||||
| @ -33,4 +33,4 @@ Small teams have no such luck. If you're interviewing at a small company, especi | |||||||
| "data scientist" or other somesuch position, be aware that systems can and do spontaneously | "data scientist" or other somesuch position, be aware that systems can and do spontaneously | ||||||
| combust at the most inopportune moments. | combust at the most inopportune moments. | ||||||
|  |  | ||||||
| **Terrible-but-popular answers include**: *It's a part of the job, and I'm happy to contribute.* | **Terrible-but-popular answers include**: _It's a part of the job, and I'm happy to contribute._ | ||||||
|  | |||||||
| @ -21,7 +21,7 @@ There's another part of the human condition that derives joy from seeing things | |||||||
|  |  | ||||||
| <iframe src="https://giphy.com/embed/YA6dmVW0gfIw8" width="480" height="336" frameBorder="0"></iframe> | <iframe src="https://giphy.com/embed/YA6dmVW0gfIw8" width="480" height="336" frameBorder="0"></iframe> | ||||||
|  |  | ||||||
| And *that's* the part I'm going to focus on. | And _that's_ the part I'm going to focus on. | ||||||
|  |  | ||||||
| # Why an Allocator? | # Why an Allocator? | ||||||
|  |  | ||||||
| @ -88,7 +88,7 @@ as you expect them to. | |||||||
|  |  | ||||||
| So, how exactly does QADAPT solve these problems? **Whenever an allocation or drop occurs in code marked | So, how exactly does QADAPT solve these problems? **Whenever an allocation or drop occurs in code marked | ||||||
| allocation-safe, QADAPT triggers a thread panic.** We don't want to let the program continue as if | allocation-safe, QADAPT triggers a thread panic.** We don't want to let the program continue as if | ||||||
| nothing strange happened, *we want things to explode*. | nothing strange happened, _we want things to explode_. | ||||||
|  |  | ||||||
| However, you don't want code to panic in production because of circumstances you didn't predict. | However, you don't want code to panic in production because of circumstances you didn't predict. | ||||||
| Just like [`debug_assert!`](https://doc.rust-lang.org/std/macro.debug_assert.html), | Just like [`debug_assert!`](https://doc.rust-lang.org/std/macro.debug_assert.html), | ||||||
|  | |||||||
| @ -75,7 +75,7 @@ would struggle without access to [`std::vector`](https://en.cppreference.com/w/c | |||||||
| `Box`, `Rc`, etc., are also unusable for the same reason. | `Box`, `Rc`, etc., are also unusable for the same reason. | ||||||
|  |  | ||||||
| Whether writing code for embedded devices or not, the important thing in both situations | Whether writing code for embedded devices or not, the important thing in both situations | ||||||
| is how much you know *before your application starts* about what its memory usage will look like. | is how much you know _before your application starts_ about what its memory usage will look like. | ||||||
| In embedded devices, there's a small, fixed amount of memory to use. | In embedded devices, there's a small, fixed amount of memory to use. | ||||||
| In a browser, you have no idea how large [google.com](https://www.google.com)'s home page is until you start | In a browser, you have no idea how large [google.com](https://www.google.com)'s home page is until you start | ||||||
| trying to download it. The compiler uses this knowledge (or lack thereof) to optimize | trying to download it. The compiler uses this knowledge (or lack thereof) to optimize | ||||||
|  | |||||||
| @ -7,7 +7,7 @@ tags: [rust, understanding-allocations] | |||||||
| --- | --- | ||||||
|  |  | ||||||
| The first memory type we'll look at is pretty special: when Rust can prove that | The first memory type we'll look at is pretty special: when Rust can prove that | ||||||
| a *value* is fixed for the life of a program (`const`), and when a *reference* is unique for | a _value_ is fixed for the life of a program (`const`), and when a _reference_ is unique for | ||||||
| the life of a program (`static` as a declaration, not | the life of a program (`static` as a declaration, not | ||||||
| [`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) | [`'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) | ||||||
| as a lifetime), we can make use of global memory. This special section of data is embedded | as a lifetime), we can make use of global memory. This special section of data is embedded | ||||||
| @ -21,7 +21,7 @@ for these two keywords is available, we'll take a hands-on approach to the topic | |||||||
|  |  | ||||||
| # **const** | # **const** | ||||||
|  |  | ||||||
| When a *value* is guaranteed to be unchanging in your program (where "value" may be scalars, | When a _value_ is guaranteed to be unchanging in your program (where "value" may be scalars, | ||||||
| `struct`s, etc.), you can declare it `const`. | `struct`s, etc.), you can declare it `const`. | ||||||
| This tells the compiler that it's safe to treat the value as never changing, and enables | This tells the compiler that it's safe to treat the value as never changing, and enables | ||||||
| some interesting optimizations; not only is there no initialization cost to | some interesting optimizations; not only is there no initialization cost to | ||||||
| @ -29,6 +29,7 @@ creating the value (it is loaded at the same time as the executable parts of you | |||||||
| but the compiler can also copy the value around if it speeds up the code. | but the compiler can also copy the value around if it speeds up the code. | ||||||
|  |  | ||||||
| The points we need to address when talking about `const` are: | The points we need to address when talking about `const` are: | ||||||
|  |  | ||||||
| - `const` values are stored in read-only memory - they're impossible to modify. | - `const` values are stored in read-only memory - they're impossible to modify. | ||||||
| - Values resulting from calling a `const fn` are materialized at compile-time. | - Values resulting from calling a `const fn` are materialized at compile-time. | ||||||
| - The compiler may (or may not) copy `const` values wherever it chooses. | - The compiler may (or may not) copy `const` values wherever it chooses. | ||||||
| @ -38,7 +39,7 @@ The points we need to address when talking about `const` are: | |||||||
| The first point is a bit strange - "read-only memory." | The first point is a bit strange - "read-only memory." | ||||||
| [The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants) | [The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants) | ||||||
| mentions in a couple places that using `mut` with constants is illegal, | mentions in a couple places that using `mut` with constants is illegal, | ||||||
| but it's also important to demonstrate just how immutable they are. *Typically* in Rust | but it's also important to demonstrate just how immutable they are. _Typically_ in Rust | ||||||
| you can use [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) | you can use [interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) | ||||||
| to modify things that aren't declared `mut`. | to modify things that aren't declared `mut`. | ||||||
| [`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an | [`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an | ||||||
| @ -62,6 +63,7 @@ fn main() { | |||||||
|     println!("Cell: {:?}", cell); |     println!("Cell: {:?}", cell); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2) | ||||||
|  |  | ||||||
| When `const` is involved though, interior mutability is impossible: | When `const` is involved though, interior mutability is impossible: | ||||||
| @ -83,6 +85,7 @@ fn main() { | |||||||
|     println!("Cell: {:?}", &CELL); |     println!("Cell: {:?}", &CELL); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00) | ||||||
|  |  | ||||||
| And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html): | And a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html): | ||||||
| @ -101,6 +104,7 @@ fn main() { | |||||||
|     SURPRISE.call_once(|| println!("Initializing again???")); |     SURPRISE.call_once(|| println!("Initializing again???")); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed) | ||||||
|  |  | ||||||
| When the [`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md) | When the [`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md) | ||||||
| @ -110,7 +114,7 @@ but it's still something to be aware of. | |||||||
|  |  | ||||||
| ## Initialization == Compilation | ## Initialization == Compilation | ||||||
|  |  | ||||||
| The next thing to mention is that `const` values are loaded into memory *as part of your program binary*. | The next thing to mention is that `const` values are loaded into memory _as part of your program binary_. | ||||||
| Because of this, any `const` values declared in your program will be "realized" at compile-time; | Because of this, any `const` values declared in your program will be "realized" at compile-time; | ||||||
| accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may | accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may | ||||||
| be able to prefetch the value), but that's it. | be able to prefetch the value), but that's it. | ||||||
| @ -125,6 +129,7 @@ pub fn multiply(value: u32) -> u32 { | |||||||
|     value * (*CELL.get_mut()) |     value * (*CELL.get_mut()) | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/Th8boO) | -- [Compiler Explorer](https://godbolt.org/z/Th8boO) | ||||||
|  |  | ||||||
| The compiler creates one `RefCell`, uses it everywhere, and never | The compiler creates one `RefCell`, uses it everywhere, and never | ||||||
| @ -147,6 +152,7 @@ pub fn multiply_twice(value: u32) -> u32 { | |||||||
|     value * FACTOR * FACTOR |     value * FACTOR * FACTOR | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/ZtS54X) | -- [Compiler Explorer](https://godbolt.org/z/ZtS54X) | ||||||
|  |  | ||||||
| In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction | In this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction | ||||||
| @ -156,19 +162,20 @@ in both the `multiply` and `multiply_twice` functions; the "1000" value is never | |||||||
| Finally, getting the address of a `const` value is possible, but not guaranteed | Finally, getting the address of a `const` value is possible, but not guaranteed | ||||||
| to be unique (because the compiler can choose to copy values). I was unable to | to be unique (because the compiler can choose to copy values). I was unable to | ||||||
| get non-unique pointers in my testing (even using different crates), | get non-unique pointers in my testing (even using different crates), | ||||||
| but the specifications are clear enough: *don't rely on pointers to `const` | but the specifications are clear enough: _don't rely on pointers to `const` | ||||||
| values being consistent*. To be frank, caring about locations for `const` values | values being consistent_. To be frank, caring about locations for `const` values | ||||||
| is almost certainly a code smell. | is almost certainly a code smell. | ||||||
|  |  | ||||||
| # **static** | # **static** | ||||||
|  |  | ||||||
| Static variables are related to `const` variables, but take a slightly different approach. | Static variables are related to `const` variables, but take a slightly different approach. | ||||||
| When we declare that a *reference* is unique for the life of a program, | When we declare that a _reference_ is unique for the life of a program, | ||||||
| you have a `static` variable (unrelated to the `'static` lifetime). Because of the | you have a `static` variable (unrelated to the `'static` lifetime). Because of the | ||||||
| reference/value distinction with `const`/`static`, | reference/value distinction with `const`/`static`, | ||||||
| static variables behave much more like typical "global" variables. | static variables behave much more like typical "global" variables. | ||||||
|  |  | ||||||
| But to understand `static`, here's what we'll look at: | But to understand `static`, here's what we'll look at: | ||||||
|  |  | ||||||
| - `static` variables are globally unique locations in memory. | - `static` variables are globally unique locations in memory. | ||||||
| - Like `const`, `static` variables are loaded at the same time your program is read into memory. | - Like `const`, `static` variables are loaded at the same time your program is read into memory. | ||||||
| - All `static` variables must implement the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) | - All `static` variables must implement the [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) | ||||||
| @ -195,6 +202,7 @@ pub fn multiply_twice(value: u32) -> u32 { | |||||||
|     value * FACTOR * FACTOR |     value * FACTOR * FACTOR | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/uxmiRQ) | -- [Compiler Explorer](https://godbolt.org/z/uxmiRQ) | ||||||
|  |  | ||||||
| Where [previously](#copying) there were plenty of | Where [previously](#copying) there were plenty of | ||||||
| @ -225,6 +233,7 @@ fn main() { | |||||||
|     println!("Static MyStruct: {:?}", MY_STRUCT); |     println!("Static MyStruct: {:?}", MY_STRUCT); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e) | ||||||
|  |  | ||||||
| Things can get a bit weirder when using `const fn` though. In most cases, it just works: | Things can get a bit weirder when using `const fn` though. In most cases, it just works: | ||||||
| @ -247,6 +256,7 @@ fn main() { | |||||||
|     println!("const fn Static MyStruct: {:?}", MY_STRUCT); |     println!("const fn Static MyStruct: {:?}", MY_STRUCT); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255) | ||||||
|  |  | ||||||
| However, there's a caveat: you're currently not allowed to use `const fn` to initialize | However, there's a caveat: you're currently not allowed to use `const fn` to initialize | ||||||
| @ -261,6 +271,7 @@ use std::cell::RefCell; | |||||||
| // error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely | // error[E0277]: `std::cell::RefCell<u8>` cannot be shared between threads safely | ||||||
| static MY_LOCK: RefCell<u8> = RefCell::new(0); | static MY_LOCK: RefCell<u8> = RefCell::new(0); | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560) | ||||||
|  |  | ||||||
| It's likely that this will [change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though. | It's likely that this will [change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though. | ||||||
| @ -292,6 +303,7 @@ static MY_STRUCT: MyStruct = MyStruct { | |||||||
|     y: RefCell::new(8) |     y: RefCell::new(8) | ||||||
| }; | }; | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc) | ||||||
|  |  | ||||||
| ## Interior Mutability | ## Interior Mutability | ||||||
| @ -315,4 +327,5 @@ fn main() { | |||||||
|     INIT.call_once(|| panic!("INIT was called twice!")); |     INIT.call_once(|| panic!("INIT was called twice!")); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59) | ||||||
|  | |||||||
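
The value/reference distinction between `const` and `static` in the file above can be sketched in a few lines. This is a rough illustration, not code from the post: the names `CONST_COUNTER`, `STATIC_COUNTER`, `bump_const`, and `bump_static` are made up, and it assumes that each use of a `const` produces a fresh copy of the value while a `static` names a single memory location.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical names, for illustration only.
// Every mention of `CONST_COUNTER` materializes a brand-new value.
const CONST_COUNTER: AtomicUsize = AtomicUsize::new(0);
// `STATIC_COUNTER` is one location shared by the whole program.
static STATIC_COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Increments a temporary copy of the `const`, then reads a second fresh copy.
pub fn bump_const() -> usize {
    CONST_COUNTER.fetch_add(1, Ordering::SeqCst);
    CONST_COUNTER.load(Ordering::SeqCst)
}

/// Increments the single `static` location and reads it back.
pub fn bump_static() -> usize {
    STATIC_COUNTER.fetch_add(1, Ordering::SeqCst);
    STATIC_COUNTER.load(Ordering::SeqCst)
}

fn main() {
    // The increment on the `const` is lost; the one on the `static` sticks.
    println!("const after bump:  {}", bump_const());
    println!("static after bump: {}", bump_static());
}
```

`AtomicUsize` is used here instead of `RefCell` because a `static` has to be `Sync`; the same "modifications to a `const` vanish" behavior shows up either way.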
| @ -23,7 +23,8 @@ When you're finished with stack memory, the `pop` instruction runs in | |||||||
| 1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation | 1-3 cycles, as opposed to an allocator needing to worry about memory fragmentation | ||||||
| and other issues with the heap. All sorts of incredibly sophisticated techniques have been used | and other issues with the heap. All sorts of incredibly sophisticated techniques have been used | ||||||
| to design allocators: | to design allocators: | ||||||
| - [Garbage Collection](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)) |  | ||||||
|  | - [Garbage Collection](<https://en.wikipedia.org/wiki/Garbage_collection_(computer_science)>) | ||||||
|   strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) |   strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) | ||||||
|   (used in [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) |   (used in [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) | ||||||
|   and [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) |   and [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) | ||||||
| @ -57,6 +58,7 @@ when stack and heap memory regions are used: | |||||||
|  |  | ||||||
| 1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) | 1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) | ||||||
|    indicate allocation of stack memory: |    indicate allocation of stack memory: | ||||||
|  |  | ||||||
|    ```rust |    ```rust | ||||||
|    pub fn stack_alloc(x: u32) -> u32 { |    pub fn stack_alloc(x: u32) -> u32 { | ||||||
|        // Space for `y` is allocated by subtracting from `rsp`, |        // Space for `y` is allocated by subtracting from `rsp`, | ||||||
| @ -66,11 +68,13 @@ when stack and heap memory regions are used: | |||||||
|        x |        x | ||||||
|    } |    } | ||||||
|    ``` |    ``` | ||||||
|  |  | ||||||
|    -- [Compiler Explorer](https://godbolt.org/z/5WSgc9) |    -- [Compiler Explorer](https://godbolt.org/z/5WSgc9) | ||||||
|  |  | ||||||
| 2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to | 2. Tracking when exactly heap allocation calls occur is difficult. It's typically easier to | ||||||
|    watch for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened |    watch for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened | ||||||
|    in the recent past: |    in the recent past: | ||||||
|  |  | ||||||
|    ```rust |    ```rust | ||||||
|    pub fn heap_alloc(x: usize) -> usize { |    pub fn heap_alloc(x: usize) -> usize { | ||||||
|        // Space for elements in a vector has to be allocated |        // Space for elements in a vector has to be allocated | ||||||
| @ -80,6 +84,7 @@ when stack and heap memory regions are used: | |||||||
|        x |        x | ||||||
|    } |    } | ||||||
|    ``` |    ``` | ||||||
|  |  | ||||||
|    -- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317) |    -- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317) | ||||||
|    <span style="font-size: .8em">Note: While the [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) |    <span style="font-size: .8em">Note: While the [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) | ||||||
|    is [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46), |    is [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46), | ||||||
| @ -132,6 +137,7 @@ pub fn make_line() { | |||||||
|     let ray = Line { a: origin, b: point }; |     let ray = Line { a: origin, b: point }; | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/vri9BE) | -- [Compiler Explorer](https://godbolt.org/z/vri9BE) | ||||||
|  |  | ||||||
| Note that while some extra-fancy instructions are used for memory manipulation in the assembly, | Note that while some extra-fancy instructions are used for memory manipulation in the assembly, | ||||||
| @ -197,6 +203,7 @@ pub fn total_distance() { | |||||||
|     let _dist_2 = distance(&middle, &end); |     let _dist_2 = distance(&middle, &end); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/Qmx4ST) | -- [Compiler Explorer](https://godbolt.org/z/Qmx4ST) | ||||||
|  |  | ||||||
| As a consequence of function arguments never using heap memory, we can also | As a consequence of function arguments never using heap memory, we can also | ||||||
| @ -234,6 +241,7 @@ pub fn total_distance() { | |||||||
|     let _dist_2 = distance(&middle, &end); |     let _dist_2 = distance(&middle, &end); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/30Sh66) | -- [Compiler Explorer](https://godbolt.org/z/30Sh66) | ||||||
|  |  | ||||||
| Finally, passing by value (arguments with type | Finally, passing by value (arguments with type | ||||||
| @ -274,6 +282,7 @@ pub fn distance_borrowed(a: &Point, b: &Point) -> i64 { | |||||||
|     squared / squared |     squared / squared | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/06hGiv) | -- [Compiler Explorer](https://godbolt.org/z/06hGiv) | ||||||
|  |  | ||||||
| # Enums | # Enums | ||||||
| @ -304,6 +313,7 @@ pub fn enum_compare() { | |||||||
|     let opt = Option::Some(z); |     let opt = Option::Some(z); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/HK7zBx) | -- [Compiler Explorer](https://godbolt.org/z/HK7zBx) | ||||||
|  |  | ||||||
| Because the size of an `enum` is the size of its largest element plus a flag, | Because the size of an `enum` is the size of its largest element plus a flag, | ||||||
| @ -353,6 +363,7 @@ fn main() { | |||||||
|     let _x = EightM::default(); |     let _x = EightM::default(); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4) | ||||||
|  |  | ||||||
| There aren't any security implications of this (no memory corruption occurs), | There aren't any security implications of this (no memory corruption occurs), | ||||||
| @ -367,7 +378,7 @@ are actually objects created on the heap that capture local primitives by copyin | |||||||
| local non-primitives as (`final`) references. | local non-primitives as (`final`) references. | ||||||
| [Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and | [Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and | ||||||
| [JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/) | [JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/) | ||||||
| both bind *everything* by reference normally, but Python can also | both bind _everything_ by reference normally, but Python can also | ||||||
| [capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has | [capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has | ||||||
| [Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions). | [Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions). | ||||||
|  |  | ||||||
| @ -395,6 +406,7 @@ pub fn immediate() { | |||||||
|     my_func()(); |     my_func()(); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions | -- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions | ||||||
|  |  | ||||||
| If we store a reference to the closure, the Rust compiler keeps values it needs | If we store a reference to the closure, the Rust compiler keeps values it needs | ||||||
| @ -410,6 +422,7 @@ pub fn simple_reference() { | |||||||
|     x(); |     x(); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions | -- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions | ||||||
|  |  | ||||||
| Even things like variable order can make a difference in instruction count: | Even things like variable order can make a difference in instruction count: | ||||||
| @ -422,6 +435,7 @@ pub fn complex() { | |||||||
|     y(); |     y(); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions | -- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions | ||||||
|  |  | ||||||
| In every circumstance though, the compiler ensured that no heap allocations were necessary. | In every circumstance though, the compiler ensured that no heap allocations were necessary. | ||||||
| @ -430,7 +444,7 @@ In every circumstance though, the compiler ensured that no heap allocations were | |||||||
|  |  | ||||||
| Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) | Traits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) | ||||||
| and dynamic dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often | and dynamic dispatch (trait objects, `dyn Trait`). While dynamic dispatch is often | ||||||
| *associated* with trait objects being stored in the heap, dynamic dispatch can be used | _associated_ with trait objects being stored in the heap, dynamic dispatch can be used | ||||||
| with stack allocated objects as well: | with stack allocated objects as well: | ||||||
|  |  | ||||||
| ```rust | ```rust | ||||||
| @ -481,6 +495,7 @@ pub fn do_call() { | |||||||
|     retrieve_int(&b); |     retrieve_int(&b); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/u_yguS) | -- [Compiler Explorer](https://godbolt.org/z/u_yguS) | ||||||
|  |  | ||||||
| It's hard to imagine practical situations where dynamic dispatch would be | It's hard to imagine practical situations where dynamic dispatch would be | ||||||
| @ -493,9 +508,9 @@ Understanding move semantics and copy semantics in Rust is weird at first. The R | |||||||
| far better than can be addressed here, so I'll leave them to do the job. | far better than can be addressed here, so I'll leave them to do the job. | ||||||
| From a memory perspective though, their guideline is reasonable: | From a memory perspective though, their guideline is reasonable: | ||||||
| [if your type can implement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy). | [if your type can implement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy). | ||||||
| While there are potential speed tradeoffs to *benchmark* when discussing `Copy` | While there are potential speed tradeoffs to _benchmark_ when discussing `Copy` | ||||||
| (move semantics for stack objects vs. copying stack pointers vs. copying stack `struct`s), | (move semantics for stack objects vs. copying stack pointers vs. copying stack `struct`s), | ||||||
| *it's impossible for `Copy` to introduce a heap allocation*. | _it's impossible for `Copy` to introduce a heap allocation_. | ||||||
|  |  | ||||||
| But why is this the case? Fundamentally, it's because the language controls | But why is this the case? Fundamentally, it's because the language controls | ||||||
| what `Copy` means - | what `Copy` means - | ||||||
| @ -519,6 +534,7 @@ struct NotCopyable { | |||||||
|     x: Box<u64> |     x: Box<u64> | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/VToRuK) | -- [Compiler Explorer](https://godbolt.org/z/VToRuK) | ||||||
|  |  | ||||||
| # Iterators | # Iterators | ||||||
| @ -587,4 +603,5 @@ pub fn sum_hm(x: &HashMap<u32, u32>) { | |||||||
|     } |     } | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/FTT3CT) | -- [Compiler Explorer](https://godbolt.org/z/FTT3CT) | ||||||
|  | |||||||
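
Two of the claims in the file above (dynamic dispatch works fine on stack-allocated values, and `Copy` never needs the heap) can be put side by side in a small sketch. The names `GetInt`, `Small`, and `retrieve_int` here are hypothetical stand-ins rather than the post's exact code, and the sketch doesn't inspect the assembly the way the Compiler Explorer links do.

```rust
use std::mem::size_of;

// Hypothetical trait and type names, used only for this sketch.
trait GetInt {
    fn get_int(&self) -> u64;
}

// All fields are `Copy`, so the struct can be `Copy` too.
#[derive(Clone, Copy)]
struct Small {
    x: u64,
}

impl GetInt for Small {
    fn get_int(&self) -> u64 {
        self.x
    }
}

// Dynamic dispatch through a borrowed trait object; the caller decides
// where the concrete value lives, and here no `Box` is involved.
fn retrieve_int(value: &dyn GetInt) -> u64 {
    value.get_int()
}

fn main() {
    let a = Small { x: 7 }; // a local value, no heap involved
    let b = a; // `Copy`: a bitwise copy, and `a` stays usable
    println!("{} {}", retrieve_int(&a), retrieve_int(&b));
    println!("size_of::<Small>() = {}", size_of::<Small>());
}
```

Swapping the `u64` field for a `Box<u64>` makes the `derive(Copy)` fail to compile, which is the compiler enforcing the "no heap in `Copy`" rule.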
| @ -8,12 +8,12 @@ tags: [rust, understanding-allocations] | |||||||
|  |  | ||||||
| Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), | Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), | ||||||
| and some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust, | and some languages go to extreme lengths to protect users from themselves (Java, Python). In Rust, | ||||||
| how the language uses dynamic memory (also referred to as the **heap**) is a system called *ownership*. | how the language uses dynamic memory (also referred to as the **heap**) is a system called _ownership_. | ||||||
| And as the docs mention, ownership | And as the docs mention, ownership | ||||||
| [is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html). | [is Rust's most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html). | ||||||
|  |  | ||||||
| The heap is used in two situations; when the compiler is unable to predict either the *total size | The heap is used in two situations; when the compiler is unable to predict either the _total size | ||||||
| of memory needed*, or *how long the memory is needed for*, it allocates space in the heap. | of memory needed_, or _how long the memory is needed for_, it allocates space in the heap. | ||||||
| This happens pretty frequently; if you want to download the Google home page, you won't know | This happens pretty frequently; if you want to download the Google home page, you won't know | ||||||
| how large it is until your program runs. And when you're finished with Google, we deallocate | how large it is until your program runs. And when you're finished with Google, we deallocate | ||||||
| the memory so it can be used to store other webpages. If you're | the memory so it can be used to store other webpages. If you're | ||||||
| @ -64,6 +64,7 @@ fn main() { | |||||||
|     println!("There were {} allocations before calling main!", x); |     println!("There were {} allocations before calling main!", x); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e) | -- [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e) | ||||||
|  |  | ||||||
| As of the time of writing, there are five allocations that happen before `main` | As of the time of writing, there are five allocations that happen before `main` | ||||||
| @ -78,6 +79,7 @@ we'll follow this guide: | |||||||
|  |  | ||||||
| Finally, there are two "addendum" issues that are important to address when discussing | Finally, there are two "addendum" issues that are important to address when discussing | ||||||
| Rust and the heap: | Rust and the heap: | ||||||
|  |  | ||||||
| - Non-heap alternatives to many standard library types are available. | - Non-heap alternatives to many standard library types are available. | ||||||
| - Special allocators to track memory behavior should be used to benchmark code. | - Special allocators to track memory behavior should be used to benchmark code. | ||||||
|  |  | ||||||
| @ -93,6 +95,7 @@ comes from C++, and while it's closely linked to a general design pattern of | |||||||
| we'll use it here specifically to describe objects that are responsible for managing | we'll use it here specifically to describe objects that are responsible for managing | ||||||
| ownership of data allocated on the heap. The smart pointers available in the `alloc` | ownership of data allocated on the heap. The smart pointers available in the `alloc` | ||||||
| crate should look mostly familiar: | crate should look mostly familiar: | ||||||
|  |  | ||||||
| - [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html) | - [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html) | ||||||
| - [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html) | - [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html) | ||||||
| - [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html) | - [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html) | ||||||
| @ -100,6 +103,7 @@ crate should look mostly familiar: | |||||||
|  |  | ||||||
| The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers | The [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers | ||||||
| to manage heap objects, though more than can be covered here. Some examples are: | to manage heap objects, though more than can be covered here. Some examples are: | ||||||
|  |  | ||||||
| - [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) | - [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) | ||||||
| - [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html) | - [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html) | ||||||
|  |  | ||||||
| @ -142,6 +146,7 @@ pub fn my_cow() { | |||||||
|     Cow::from("drop"); |     Cow::from("drop"); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/4AMQug) | -- [Compiler Explorer](https://godbolt.org/z/4AMQug) | ||||||
|  |  | ||||||
| # Collections | # Collections | ||||||
| @ -157,8 +162,8 @@ Common types that fall under this umbrella are | |||||||
| [`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) | [`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) | ||||||
| (not [`str`](https://doc.rust-lang.org/std/primitive.str.html)). | (not [`str`](https://doc.rust-lang.org/std/primitive.str.html)). | ||||||
|  |  | ||||||
| While collections store the objects they own in heap memory, *creating new collections | While collections store the objects they own in heap memory, _creating new collections | ||||||
| will not allocate on the heap*. This is a bit weird; if we call `Vec::new()`, the | will not allocate on the heap_. This is a bit weird; if we call `Vec::new()`, the | ||||||
| assembly shows a corresponding call to `real_drop_in_place`: | assembly shows a corresponding call to `real_drop_in_place`: | ||||||
|  |  | ||||||
| ```rust | ```rust | ||||||
| @ -167,6 +172,7 @@ pub fn my_vec() { | |||||||
|     Vec::<u8>::new(); |     Vec::<u8>::new(); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/1WkNtC) | -- [Compiler Explorer](https://godbolt.org/z/1WkNtC) | ||||||
|  |  | ||||||
| But because the vector has no elements to manage, no calls to the allocator | But because the vector has no elements to manage, no calls to the allocator | ||||||
| @ -210,6 +216,7 @@ unsafe impl GlobalAlloc for PanicAllocator { | |||||||
|     } |     } | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6) | ||||||
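
The claim that a fresh collection doesn't touch the allocator can also be checked without reading assembly. Below is a minimal sketch, not the post's own method: it uses `Vec::capacity()` as a rough proxy for whether a backing buffer exists yet. The function names are hypothetical and exist only for illustration.

```rust
/// Capacity of a vector that has just been created and never written to.
/// (Hypothetical helper; capacity is used as a proxy for "has a heap buffer".)
pub fn capacity_of_new_vec() -> usize {
    let v: Vec<u8> = Vec::new();
    v.capacity()
}

/// Capacity after one element is pushed, which forces a buffer to exist.
pub fn capacity_after_push() -> usize {
    let mut v: Vec<u8> = Vec::new();
    v.push(1);
    v.capacity()
}

fn main() {
    println!("capacity of new vec: {}", capacity_of_new_vec());
    println!("capacity after push: {}", capacity_after_push());
}
```

A capacity of zero only suggests that nothing was allocated; a panicking allocator like the one above remains the stricter check.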
|  |  | ||||||
| Other standard library types follow the same behavior; make sure to check out | Other standard library types follow the same behavior; make sure to check out | ||||||
|  | |||||||
| @ -37,8 +37,8 @@ memory behavior if it's something you care about**. | |||||||
| It's far too easy to misread assembly in large code sections; you should | It's far too easy to misread assembly in large code sections; you should | ||||||
| always verify behavior if you care about memory usage. | always verify behavior if you care about memory usage. | ||||||
|  |  | ||||||
| The guiding principle as we move forward is this: *optimizing compilers | The guiding principle as we move forward is this: _optimizing compilers | ||||||
| won't produce worse programs than we started with.* There won't be any | won't produce worse programs than we started with._ There won't be any | ||||||
| situations where stack allocations get moved to heap allocations. | situations where stack allocations get moved to heap allocations. | ||||||
| There will, however, be an opera of optimization. | There will, however, be an opera of optimization. | ||||||
|  |  | ||||||
| @ -101,6 +101,7 @@ unsafe impl GlobalAlloc for PanicAllocator { | |||||||
|     } |     } | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)   | -- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)   | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d) | ||||||
|  |  | ||||||
| @ -147,5 +148,6 @@ pub fn main() { | |||||||
|     let _x = EightM::default(); |     let _x = EightM::default(); | ||||||
| } | } | ||||||
| ``` | ``` | ||||||
|  |  | ||||||
| -- [Compiler Explorer](https://godbolt.org/z/daHn7P)   | -- [Compiler Explorer](https://godbolt.org/z/daHn7P)   | ||||||
| -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0) | -- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0) | ||||||
|  | |||||||
| @ -13,12 +13,14 @@ an object on the heap or not. And while Rust will prioritize the fastest behavio | |||||||
| here are the rules for each memory type: | here are the rules for each memory type: | ||||||
|  |  | ||||||
| **Heap Allocation**: | **Heap Allocation**: | ||||||
|  |  | ||||||
| - Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory. | - Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory. | ||||||
| - Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory. | - Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory. | ||||||
| - Some smart pointers in the standard library have counterparts in other crates that | - Some smart pointers in the standard library have counterparts in other crates that | ||||||
|   don't need heap memory. If possible, use those. |   don't need heap memory. If possible, use those. | ||||||
|  |  | ||||||
| **Stack Allocation**: | **Stack Allocation**: | ||||||
|  |  | ||||||
| - Everything not using a smart pointer will be allocated on the stack. | - Everything not using a smart pointer will be allocated on the stack. | ||||||
| - Structs, enums, iterators, arrays, and closures are all stack allocated. | - Structs, enums, iterators, arrays, and closures are all stack allocated. | ||||||
| - Cell types (`RefCell`) behave like smart pointers, but are stack-allocated. | - Cell types (`RefCell`) behave like smart pointers, but are stack-allocated. | ||||||
| @ -26,6 +28,7 @@ here are the rules for each memory type: | |||||||
| - Types that are marked `Copy` are guaranteed to have their contents stack-allocated. | - Types that are marked `Copy` are guaranteed to have their contents stack-allocated. | ||||||
|  |  | ||||||
| **Global Allocation**: | **Global Allocation**: | ||||||
|  |  | ||||||
| - `const` is a fixed value; the compiler is allowed to copy it wherever useful. | - `const` is a fixed value; the compiler is allowed to copy it wherever useful. | ||||||
| - `static` is a fixed reference; the compiler will guarantee it is unique. | - `static` is a fixed reference; the compiler will guarantee it is unique. | ||||||
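
As a rough illustration of the two global rules, here is a small sketch. The names `LIMIT` and `DEFAULT_LIMIT` are made up for this example. It checks only the `static` half directly, since a `const` may be copied wherever the compiler likes and its address isn't something to rely on.

```rust
// Hypothetical globals, for illustration only.
static LIMIT: u32 = 10; // one fixed location in the binary
const DEFAULT_LIMIT: u32 = 10; // a value the compiler may copy at each use

/// Two references to the same `static` point at the same address.
pub fn static_has_one_address() -> bool {
    let first: &'static u32 = &LIMIT;
    let second: &'static u32 = &LIMIT;
    std::ptr::eq(first, second)
}

/// A `const` is used purely by value here; no address is assumed.
pub fn const_value() -> u32 {
    DEFAULT_LIMIT
}

fn main() {
    println!("static is unique: {}", static_has_one_address());
    println!("const value: {}", const_value());
}
```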
|  |  | ||||||
|  | |||||||
| @ -20,23 +20,26 @@ Having now worked in the trading industry, I can confirm the developers aren't s | |||||||
|  |  | ||||||
| The framework I'd propose is this: **If you want to build high-performance systems, focus first on reducing performance variance** (reducing the gap between the fastest and slowest runs of the same code), **and only look at average latency once variance is at an acceptable level**. | The framework I'd propose is this: **If you want to build high-performance systems, focus first on reducing performance variance** (reducing the gap between the fastest and slowest runs of the same code), **and only look at average latency once variance is at an acceptable level**. | ||||||
|  |  | ||||||
| Don't get me wrong, I'm a much happier person when things are fast. Computer goes from booting in 20 seconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes a full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a week is the same in each situation, but you're painfully aware of that minute when it happens. When it comes to code, the principle is the same: speeding up a function by an average of 10 milliseconds doesn't mean much if there's a 100ms difference between your fastest and slowest runs. When performance matters, you need to respond quickly *every time*, not just in aggregate. High-performance systems should first optimize for time variance. Once you're consistent at the time scale you care about, then focus on improving average  time. | Don't get me wrong, I'm a much happier person when things are fast. Computer goes from booting in 20 seconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes a full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a week is the same in each situation, but you're painfully aware of that minute when it happens. When it comes to code, the principle is the same: speeding up a function by an average of 10 milliseconds doesn't mean much if there's a 100ms difference between your fastest and slowest runs. When performance matters, you need to respond quickly _every time_, not just in aggregate. High-performance systems should first optimize for time variance. Once you're consistent at the time scale you care about, then focus on improving average time. | ||||||
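
The boot-time comparison above can be put into numbers. This is a minimal sketch using the figures from the paragraph (20 seconds every day, versus 10 seconds with a 60-second boot every fifth day). The helper names `mean` and `spread` are assumptions for illustration, and "spread" here is simply slowest minus fastest, not a formal variance.

```rust
/// Arithmetic mean of the samples, in seconds.
pub fn mean(samples: &[f64]) -> f64 {
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Gap between the slowest and fastest sample, in seconds.
pub fn spread(samples: &[f64]) -> f64 {
    let slowest = samples.iter().cloned().fold(f64::MIN, f64::max);
    let fastest = samples.iter().cloned().fold(f64::MAX, f64::min);
    slowest - fastest
}

fn main() {
    // Boot times over five days, taken from the example in the text.
    let steady = [20.0, 20.0, 20.0, 20.0, 20.0];
    let spiky = [10.0, 10.0, 10.0, 10.0, 60.0];

    println!("steady: mean {}s, spread {}s", mean(&steady), spread(&steady));
    println!("spiky:  mean {}s, spread {}s", mean(&spiky), spread(&spiky));
}
```

Both machines average the same boot time, so the mean alone can't tell them apart; only the spread shows the bad day.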
|  |  | ||||||
| This focus on variance shows up all the time in industry too (emphasis added in all quotes below): | This focus on variance shows up all the time in industry too (emphasis added in all quotes below): | ||||||
|  |  | ||||||
| - In [marketing materials](https://business.nasdaq.com/market-tech/marketplaces/trading) for NASDAQ's matching engine, the most performance-sensitive component of the exchange, dependability is highlighted in addition to instantaneous metrics: | - In [marketing materials](https://business.nasdaq.com/market-tech/marketplaces/trading) for NASDAQ's matching engine, the most performance-sensitive component of the exchange, dependability is highlighted in addition to instantaneous metrics: | ||||||
|  |  | ||||||
|   > Able to **consistently sustain** an order rate of over 100,000 orders per second at sub-40 microsecond average latency |   > Able to **consistently sustain** an order rate of over 100,000 orders per second at sub-40 microsecond average latency | ||||||
|  |  | ||||||
| - The [Aeron](https://github.com/real-logic/aeron) message bus has this to say about performance: | - The [Aeron](https://github.com/real-logic/aeron) message bus has this to say about performance: | ||||||
|  |  | ||||||
|   > Performance is the key focus. Aeron is designed to be the highest throughput with the lowest and **most predictable latency possible** of any messaging system |   > Performance is the key focus. Aeron is designed to be the highest throughput with the lowest and **most predictable latency possible** of any messaging system | ||||||
|  |  | ||||||
| - The company PolySync, which is working on autonomous vehicles, [mentions why](https://polysync.io/blog/session-types-for-hearty-codecs/) they picked their specific messaging format: | - The company PolySync, which is working on autonomous vehicles, [mentions why](https://polysync.io/blog/session-types-for-hearty-codecs/) they picked their specific messaging format: | ||||||
|  |  | ||||||
|   > In general, high performance is almost always desirable for serialization. But in the world of autonomous vehicles, **steady timing performance is even more important** than peak throughput. This is because safe operation is sensitive to timing outliers. Nobody wants the system that decides when to slam on the brakes to occasionally take 100 times longer than usual to encode its commands. |   > In general, high performance is almost always desirable for serialization. But in the world of autonomous vehicles, **steady timing performance is even more important** than peak throughput. This is because safe operation is sensitive to timing outliers. Nobody wants the system that decides when to slam on the brakes to occasionally take 100 times longer than usual to encode its commands. | ||||||
|  |  | ||||||
| - [Solarflare](https://solarflare.com/), which makes highly-specialized network hardware, points out variance (jitter) as a big concern for [electronic trading](https://solarflare.com/electronic-trading/): | - [Solarflare](https://solarflare.com/), which makes highly-specialized network hardware, points out variance (jitter) as a big concern for [electronic trading](https://solarflare.com/electronic-trading/): | ||||||
|   > The high stakes world of electronic trading, investment banks, market makers, hedge funds and exchanges demand the **lowest possible latency and jitter** while utilizing the highest bandwidth and return on their investment. |   > The high stakes world of electronic trading, investment banks, market makers, hedge funds and exchanges demand the **lowest possible latency and jitter** while utilizing the highest bandwidth and return on their investment. | ||||||
|  |  | ||||||
| And to further clarify: we're not discussing *total run-time*, but variance of total run-time. There are situations where it's not reasonably possible to make things faster, and you'd much rather be consistent. For example, trading firms use [wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because the speed of light through air is faster than through fiber-optic cables. There's still at *absolute minimum* a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between, say, [Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago). If a trading system in Chicago calls the function for "send order to Tokyo" and waits to see if a trade occurs, there's a physical limit to how long that will take. In this situation, the focus is on keeping variance of *additional processing* to a minimum, since speed of light is the limiting factor. | And to further clarify: we're not discussing _total run-time_, but variance of total run-time. There are situations where it's not reasonably possible to make things faster, and you'd much rather be consistent. For example, trading firms use [wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because the speed of light through air is faster than through fiber-optic cables. There's still at _absolute minimum_ a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between, say, [Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago). If a trading system in Chicago calls the function for "send order to Tokyo" and waits to see if a trade occurs, there's a physical limit to how long that will take. In this situation, the focus is on keeping variance of _additional processing_ to a minimum, since speed of light is the limiting factor. | ||||||
|  |  | ||||||
| So how does one go about looking for and eliminating performance variance? To tell the truth, I don't think a systematic answer or flow-chart exists. There's no substitute for (A) building a deep understanding of the entire technology stack, and (B) actually measuring system performance (though (C) watching a lot of [CppCon](https://www.youtube.com/channel/UCMlGfpWw-RUdWX_JbLCukXg) videos for inspiration never hurt). Even then, every project cares about performance to a different degree; you may need to build an entire [replica production system](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=3015) to accurately benchmark at nanosecond precision, or you may be content to simply [avoid garbage collection](https://www.youtube.com/watch?v=BD9cRbxWQx8&feature=youtu.be&t=1335) in your Java code. | So how does one go about looking for and eliminating performance variance? To tell the truth, I don't think a systematic answer or flow-chart exists. There's no substitute for (A) building a deep understanding of the entire technology stack, and (B) actually measuring system performance (though (C) watching a lot of [CppCon](https://www.youtube.com/channel/UCMlGfpWw-RUdWX_JbLCukXg) videos for inspiration never hurt). Even then, every project cares about performance to a different degree; you may need to build an entire [replica production system](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=3015) to accurately benchmark at nanosecond precision, or you may be content to simply [avoid garbage collection](https://www.youtube.com/watch?v=BD9cRbxWQx8&feature=youtu.be&t=1335) in your Java code. | ||||||
|  |  | ||||||
| @ -45,6 +48,7 @@ Even though everyone has different needs, there are still common things to look | |||||||
| ## Language-specific | ## Language-specific | ||||||
|  |  | ||||||
| **Garbage Collection**: How often does garbage collection happen? When is it triggered? What are the impacts? | **Garbage Collection**: How often does garbage collection happen? When is it triggered? What are the impacts? | ||||||
|  |  | ||||||
| - [In Python](https://rushter.com/blog/python-garbage-collector/), individual objects are collected if the reference count reaches 0, and each generation is collected if `num_alloc - num_dealloc > gc_threshold` whenever an allocation happens. The GIL is acquired for the duration of generational collection. | - [In Python](https://rushter.com/blog/python-garbage-collector/), individual objects are collected if the reference count reaches 0, and each generation is collected if `num_alloc - num_dealloc > gc_threshold` whenever an allocation happens. The GIL is acquired for the duration of generational collection. | ||||||
| - Java has [many](https://docs.oracle.com/en/java/javase/12/gctuning/parallel-collector1.html#GUID-DCDD6E46-0406-41D1-AB49-FB96A50EB9CE) [different](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector.html#GUID-ED3AB6D3-FD9B-4447-9EDF-983ED2F7A573) [collection](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector-tuning.html#GUID-90E30ACA-8040-432E-B3A0-1E0440AB556A) [algorithms](https://docs.oracle.com/en/java/javase/12/gctuning/z-garbage-collector1.html#GUID-A5A42691-095E-47BA-B6DC-FB4E5FAA43D0) to choose from, each with different characteristics. The default algorithms (Parallel GC in Java 8, G1 in Java 9) freeze the JVM while collecting, while more recent algorithms ([ZGC](https://wiki.openjdk.java.net/display/zgc) and [Shenandoah](https://wiki.openjdk.java.net/display/shenandoah)) are designed to keep "stop the world" to a minimum by doing collection work in parallel. | - Java has [many](https://docs.oracle.com/en/java/javase/12/gctuning/parallel-collector1.html#GUID-DCDD6E46-0406-41D1-AB49-FB96A50EB9CE) [different](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector.html#GUID-ED3AB6D3-FD9B-4447-9EDF-983ED2F7A573) [collection](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector-tuning.html#GUID-90E30ACA-8040-432E-B3A0-1E0440AB556A) [algorithms](https://docs.oracle.com/en/java/javase/12/gctuning/z-garbage-collector1.html#GUID-A5A42691-095E-47BA-B6DC-FB4E5FAA43D0) to choose from, each with different characteristics. The default algorithms (Parallel GC in Java 8, G1 in Java 9) freeze the JVM while collecting, while more recent algorithms ([ZGC](https://wiki.openjdk.java.net/display/zgc) and [Shenandoah](https://wiki.openjdk.java.net/display/shenandoah)) are designed to keep "stop the world" to a minimum by doing collection work in parallel. | ||||||
|  |  | ||||||
| @ -58,7 +62,7 @@ Even though everyone has different needs, there are still common things to look | |||||||
|  |  | ||||||
| ## Kernel | ## Kernel | ||||||
|  |  | ||||||
| Code you wrote is almost certainly not the *only* code running on your hardware. There are many ways the operating system interacts with your program, from interrupts to system calls, that are important to watch for. These are written from a Linux perspective, but Windows does typically have equivalent functionality. | Code you wrote is almost certainly not the _only_ code running on your hardware. There are many ways the operating system interacts with your program, from interrupts to system calls, that are important to watch for. These are written from a Linux perspective, but Windows does typically have equivalent functionality. | ||||||
|  |  | ||||||
| **Scheduling**: The kernel is normally free to schedule any process on any core, so it's important to reserve CPU cores exclusively for the important programs. There are a few parts to this: first, limit the CPU cores that non-critical processes are allowed to run on by excluding cores from scheduling ([`isolcpus`](https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html) kernel command-line option), or by setting the `init` process CPU affinity ([`systemd` example](https://access.redhat.com/solutions/2884991)). Second, set critical processes to run on the isolated cores by setting the [processor affinity](https://en.wikipedia.org/wiki/Processor_affinity) using [taskset](https://linux.die.net/man/1/taskset). Finally, use [`NO_HZ`](https://github.com/torvalds/linux/blob/master/Documentation/timers/NO_HZ.txt) or [`chrt`](https://linux.die.net/man/1/chrt) to disable scheduling interrupts. Turning off hyper-threading is also likely beneficial. | **Scheduling**: The kernel is normally free to schedule any process on any core, so it's important to reserve CPU cores exclusively for the important programs. There are a few parts to this: first, limit the CPU cores that non-critical processes are allowed to run on by excluding cores from scheduling ([`isolcpus`](https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html) kernel command-line option), or by setting the `init` process CPU affinity ([`systemd` example](https://access.redhat.com/solutions/2884991)). Second, set critical processes to run on the isolated cores by setting the [processor affinity](https://en.wikipedia.org/wiki/Processor_affinity) using [taskset](https://linux.die.net/man/1/taskset). Finally, use [`NO_HZ`](https://github.com/torvalds/linux/blob/master/Documentation/timers/NO_HZ.txt) or [`chrt`](https://linux.die.net/man/1/chrt) to disable scheduling interrupts. Turning off hyper-threading is also likely beneficial. | ||||||
|  |  | ||||||
|  | |||||||
| @ -204,7 +204,7 @@ This test measures, on a | |||||||
| how long it takes to serialize the IEX message into the desired format and write to a pre-allocated buffer. | how long it takes to serialize the IEX message into the desired format and write to a pre-allocated buffer. | ||||||
|  |  | ||||||
| | Schema               | Median | 99th Pctl | 99.9th Pctl | Total  | | | Schema               | Median | 99th Pctl | 99.9th Pctl | Total  | | ||||||
| |:---------------------|:-------|:----------|:------------|:-------| | | :------------------- | :----- | :-------- | :---------- | :----- | | ||||||
| | Cap'n Proto Packed   | 413ns  | 1751ns    | 2943ns      | 14.80s | | | Cap'n Proto Packed   | 413ns  | 1751ns    | 2943ns      | 14.80s | | ||||||
| | Cap'n Proto Unpacked | 273ns  | 1828ns    | 2836ns      | 10.65s | | | Cap'n Proto Unpacked | 273ns  | 1828ns    | 2836ns      | 10.65s | | ||||||
| | Flatbuffers          | 355ns  | 2185ns    | 3497ns      | 14.31s | | | Flatbuffers          | 355ns  | 2185ns    | 3497ns      | 14.31s | | ||||||
| @ -219,7 +219,7 @@ perform some basic aggregation. The aggregation code is the same for each format | |||||||
| so any performance differences are due solely to the format implementation. | so any performance differences are due solely to the format implementation. | ||||||
|  |  | ||||||
| | Schema               | Median | 99th Pctl | 99.9th Pctl | Total  | | | Schema               | Median | 99th Pctl | 99.9th Pctl | Total  | | ||||||
| |:---------------------|:-------|:----------|:------------|:-------| | | :------------------- | :----- | :-------- | :---------- | :----- | | ||||||
| | Cap'n Proto Packed   | 539ns  | 1216ns    | 2599ns      | 18.92s | | | Cap'n Proto Packed   | 539ns  | 1216ns    | 2599ns      | 18.92s | | ||||||
| | Cap'n Proto Unpacked | 366ns  | 737ns     | 1583ns      | 12.32s | | | Cap'n Proto Unpacked | 366ns  | 737ns     | 1583ns      | 12.32s | | ||||||
| | Flatbuffers          | 173ns  | 421ns     | 1007ns      | 6.00s  | | | Flatbuffers          | 173ns  | 421ns     | 1007ns      | 6.00s  | | ||||||
|  | |||||||
| @ -8,7 +8,7 @@ tags: [python] | |||||||
|  |  | ||||||
| Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock) (GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision made before multi-core CPUs were widely available, but the fact that it's still around indicates that it generally works [Good](https://wiki.c2.com/?PrematureOptimization) [Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective workarounds; it's not hard to start a [new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to synchronize code running in parallel. | Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock) (GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision made before multi-core CPUs were widely available, but the fact that it's still around indicates that it generally works [Good](https://wiki.c2.com/?PrematureOptimization) [Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective workarounds; it's not hard to start a [new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to synchronize code running in parallel. | ||||||
|  |  | ||||||
| Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of asynchronicity and *M:N* threading, Python seems lacking. The ideal scenario is to take advantage of both Python's productivity and the modern CPU's parallel capabilities. | Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of asynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of both Python's productivity and the modern CPU's parallel capabilities. | ||||||
|  |  | ||||||
| Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes Python a nice language to start with. Bear in mind: these are just the tools, no claim is made about whether it's a good idea to use them. Very often, unlocking the GIL is an [XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at the expense of project complexity; messing with the GIL is ultimately messing with Python's memory model. | Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes Python a nice language to start with. Bear in mind: these are just the tools, no claim is made about whether it's a good idea to use them. Very often, unlocking the GIL is an [XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at the expense of project complexity; messing with the GIL is ultimately messing with Python's memory model. | ||||||
|  |  | ||||||
| @ -84,10 +84,8 @@ _ = cython_nogil(N); | |||||||
| > Wall time: 388 ms | > Wall time: 388 ms | ||||||
| > </pre> | > </pre> | ||||||
|  |  | ||||||
|  |  | ||||||
| Both versions (with and without GIL) take effectively the same amount of time to run. Even when running this calculation in parallel on separate threads, it is expected that the run time will double because only one thread can be active at a time: | Both versions (with and without GIL) take effectively the same amount of time to run. Even when running this calculation in parallel on separate threads, it is expected that the run time will double because only one thread can be active at a time: | ||||||
|  |  | ||||||
|  |  | ||||||
| ```python | ```python | ||||||
| %%time | %%time | ||||||
| from threading import Thread | from threading import Thread | ||||||
| @ -106,10 +104,8 @@ t1.join(); t2.join() | |||||||
| > Wall time: 645 ms | > Wall time: 645 ms | ||||||
| > </pre> | > </pre> | ||||||
|  |  | ||||||
|  |  | ||||||
| However, if the first thread releases the GIL, the second thread is free to acquire it and run in parallel: | However, if the first thread releases the GIL, the second thread is free to acquire it and run in parallel: | ||||||
|  |  | ||||||
|  |  | ||||||
| ```python | ```python | ||||||
| %%time | %%time | ||||||
|  |  | ||||||
| @ -175,7 +171,7 @@ To conclude: use Cython's `nogil` annotation to assert that functions are safe f | |||||||
|  |  | ||||||
| # Numba | # Numba | ||||||
|  |  | ||||||
| Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by compiling a Python-like language to C/C++, Numba compiles Python bytecode *directly to machine code* at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function first compiles it to machine code before running. Calling the function a second time re-uses that machine code unless the argument types have changed. | Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by compiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_ at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function first compiles it to machine code before running. Calling the function a second time re-uses that machine code unless the argument types have changed. | ||||||
|  |  | ||||||
| Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the `@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython` are separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to release the lock, the GIL will remain locked if `nogil=False` (the default). | Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the `@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython` are separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to release the lock, the GIL will remain locked if `nogil=False` (the default). | ||||||
|  |  | ||||||
|  | |||||||