diff --git a/blog/2018-05-28-hello/_article.md b/blog/2018-05-28-hello/_article.md new file mode 100644 index 0000000..f7c76c7 --- /dev/null +++ b/blog/2018-05-28-hello/_article.md @@ -0,0 +1,38 @@ +--- +layout: post +title: "Hello!" +description: "" +category: +tags: [] +--- + +I'll do what I can to keep this short, there's plenty of other things we both should be doing right +now. + +If you're here for the bread pics, and to marvel in some other culinary side projects, I've got you +covered: + +![Saturday Bread]({{ "/assets/images/2018-05-28-bread.jpg" | absolute_url }}) + +And no, I'm not posting pictures of earlier attempts that ended up turning into rocks in the oven. + +Okay, just one: + +![Bread as rock]({{ "/assets/images/2018-05-28-rocks.jpg" | absolute_url }}) + +If you're here for keeping up with the man Bradlee Speice, got plenty of that too. Plus some +up-coming super-nerdy posts about how I'm changing the world. + +And if you're not here for those things: don't have a lot for you, sorry. But you're welcome to let +me know what needs to change. + +I'm looking forward to making this a place to talk about what's going on in life, I hope you'll +stick it out with me. The best way to follow what's going on is on my [About](/about/) page, but if +you want the joy of clicking links, here's a few good ones: + +- Email (people still use this?): [bradlee@speice.io](mailto:bradlee@speice.io) +- Mastodon (nerd Twitter): [@bradlee](https://mastodon.social/@bradlee) +- Chat (RiotIM): [@bspeice:matrix.com](https://matrix.to/#/@bspeice:matrix.com) +- The comments section (not for people with sanity intact): ↓↓↓ + +Thanks, and keep it amazing. 
diff --git a/blog/2018-05-28-hello/bread.jpg b/blog/2018-05-28-hello/bread.jpg new file mode 100644 index 0000000..79a50b4 Binary files /dev/null and b/blog/2018-05-28-hello/bread.jpg differ diff --git a/blog/2018-05-28-hello/index.mdx b/blog/2018-05-28-hello/index.mdx new file mode 100644 index 0000000..c1ee738 --- /dev/null +++ b/blog/2018-05-28-hello/index.mdx @@ -0,0 +1,25 @@ +--- +slug: 2018/05/hello +title: Hello! +date: 2018-05-28 12:00:00 +authors: [bspeice] +tags: [] +--- + +I'll do what I can to keep this short, there's plenty of other things we both should be doing right +now. + + + +If you're here for the bread pics, and to marvel in some other culinary side projects, I've got you +covered: + +![Saturday Bread](./bread.jpg) + +And no, I'm not posting pictures of earlier attempts that ended up turning into rocks in the oven. + +Okay, just one: + +![Bread as rock](./rocks.jpg) + +Thanks, and keep it amazing. diff --git a/blog/2018-05-28-hello/rocks.jpg b/blog/2018-05-28-hello/rocks.jpg new file mode 100644 index 0000000..20db7fa Binary files /dev/null and b/blog/2018-05-28-hello/rocks.jpg differ diff --git a/blog/2018-06-25-dateutil-parser-to-rust/_article.md b/blog/2018-06-25-dateutil-parser-to-rust/_article.md new file mode 100644 index 0000000..7646f28 --- /dev/null +++ b/blog/2018-06-25-dateutil-parser-to-rust/_article.md @@ -0,0 +1,177 @@ +--- +layout: post +title: "What I Learned: Porting Dateutil Parser to Rust" +description: "" +category: +tags: [dtparse, rust] +--- + +Hi. I'm Bradlee. + +I've mostly been a lurker in Rust for a while, making a couple small contributions here and there. +So launching [dtparse](https://github.com/bspeice/dtparse) feels like a nice step towards becoming a +functioning member of society. But not too much, because then you know people start asking you to +pay bills, and ain't nobody got time for that. + +But I built dtparse, and you can read about my thoughts on the process. Or don't. 
I won't tell you +what to do with your life (but you should totally keep reading). + +# Slow down, what? + +OK, fine, I guess I should start with _why_ someone would do this. + +[Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The +standard library support for time in Python is kinda dope, but there are a lot of extras that go +into making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html) +module. `dateutil.parser` specifically is code to take all the super-weird time formats people come +up with and turn them into something actually useful. + +Date/time parsing, it turns out, is just like everything else involving +[computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and +[time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it +feels like it shouldn't be that difficult to do, until you try to do it, and you realize that people +suck and this is why +[we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But +alas, we'll try and make contemporary art out of the rubble and give it a pretentious name like +_Time_. + +![A gravel mound](/assets/images/2018-06-25-gravel-mound.jpg) + +> [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php) + +What makes `dateutil.parser` great is that there's a single function with a single argument that +drives what programmers interact with: +[`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258). +It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can +possibly do to make sense of your input" value. It doesn't expect much of you. 
+ +[And now it's in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332) + +# Lost in Translation + +Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm +admittedly hesitant to publish Python code that's trying to be Rust. Interestingly, Rust code can +actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better +experiences than +[this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us) +who attempted the same thing for D. These are the actual take-aways: + +When transcribing code, **stay as close to the original library as possible**. I'm talking about +using the same variable names, same access patterns, the whole shebang. It's way too easy to make a +couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a reference +manual for verbatim what your code should be means that you don't spend that long debugging +complicated logic, you're more looking for typos. + +Also, **don't use nice Rust things like enums**. While +[one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94), +I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a +boolean and I mixed up which was true, and which was false (side note: AM is false, PM is true). In +general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate +the same functionality. + +**Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So +when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out +[he's not quite right](https://github.com/rust-lang/rfcs/pull/243), and I'm OK with that. 
And while +`dateutil` is pretty well-behaved about not skipping multiple stack frames, +[130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865) +take a while to verify. + +As another Python quirk, **be very careful about +[long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**. +I used to think that Python's whitespace was just there to get you to format your code correctly. I +think that no longer. It's way too easy to close a block too early and have incredibly weird issues +in the logic. Make sure you use an editor that displays indentation levels so you can keep things +straight. + +**Rust macros are not free.** I originally had the +[main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217) +wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. +After +[moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205) +compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code +to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually +functions that need to be liberated, man. + +Finally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison, +see +[this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476) +and +[the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629). +I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be +added through macros or syntax extensions. 
Either way, they're expressive, save typing, and are +super-readable. Let's get more of that. + +# Using a young language + +Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On +more than one occasion though, I've had issues navigating the Rust ecosystem. + +What I'll call the "canonical library" is still being built. In Python, if you need datetime +parsing, you use `dateutil`. If you want `decimal` types, it's already in the +[standard library](https://docs.python.org/3.6/library/decimal.html). While I might've gotten away +with `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to +the original library as possible**. Thus began my quest to find a decimal library in Rust. What I +quickly found was summarized in a comment: + +> Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard. +> +> [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794) + +In practice, this means that there are at least [4](https://crates.io/crates/bigdecimal) +[different](https://crates.io/crates/rust_decimal) +[implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate). +And that's a lot of decisions to worry about when all I'm thinking is "why can't +[calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing" and I'm forced to dig +through a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916) +[different](https://github.com/rust-lang/rfcs/issues/334) +[threads](https://github.com/rust-num/num/issues/8) to figure out if the library I'm looking at is dead +or just stable. + +And even when the "canonical library" exists, there are no guarantees that it will be well-maintained. +[Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just +released version 0.4.4 like two days ago. 
Meanwhile, +[chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though +[there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I +know relatively little about it, but it appears that most of the release process is automated; +keeping that up to date should be a no-brainer. + +## Trial Maintenance Policy + +Specifically given "maintenance" being an +[oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/) +issue, I'm going to try out the following policy to keep things moving on `dtparse`: + +1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure + nobody's blocking on me. + +2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the + contributor to check in after two weeks, and close the issue without resolution if I hear nothing + back after a month. + +The second point I think has the potential to be a bit controversial, so I'm happy to receive +feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm +running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And +second: I don't mind keeping those requests going indefinitely. I just want to try and balance +keeping things moving with giving people the necessary time they need. + +I should also note that I'm still getting some best practices in place - CONTRIBUTING and +CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are +perfect. + +# Roadmap and Conclusion + +So if I've now built a `dateutil`-compatible parser, we're done, right? Of course not! That's not +nearly ambitious enough. + +Ultimately, I'd love to have a library that's capable of parsing everything the Linux `date` command +can do (and not `date` on OSX, because seriously, BSD coreutils are the worst). 
I know Rust has a +coreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it +doesn't bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime) +could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each +other? + +All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month +on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going +to take it hard. + +And in the mean time, I'm looking forward to building more. Onwards. diff --git a/blog/2018-06-25-dateutil-parser-to-rust/gravel-mound.jpg b/blog/2018-06-25-dateutil-parser-to-rust/gravel-mound.jpg new file mode 100644 index 0000000..15148b0 Binary files /dev/null and b/blog/2018-06-25-dateutil-parser-to-rust/gravel-mound.jpg differ diff --git a/blog/2018-06-25-dateutil-parser-to-rust/index.mdx b/blog/2018-06-25-dateutil-parser-to-rust/index.mdx new file mode 100644 index 0000000..ed919af --- /dev/null +++ b/blog/2018-06-25-dateutil-parser-to-rust/index.mdx @@ -0,0 +1,177 @@ +--- +slug: 2018/06/dateutil-parser-to-rust +title: "What I Learned: Porting Dateutil Parser to Rust" +date: 2018-06-25 12:00:00 +authors: [bspeice] +tags: [] +--- + +I've mostly been a lurker in Rust for a while, making a couple small contributions here and there. +So launching [dtparse](https://github.com/bspeice/dtparse) feels like a nice step towards becoming a +functioning member of society. But not too much, because then you know people start asking you to +pay bills, and ain't nobody got time for that. + + + +But I built dtparse, and you can read about my thoughts on the process. Or don't. I won't tell you +what to do with your life (but you should totally keep reading). + +## Slow down, what? + +OK, fine, I guess I should start with _why_ someone would do this. 
+ +[Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The +standard library support for time in Python is kinda dope, but there are a lot of extras that go +into making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html) +module. `dateutil.parser` specifically is code to take all the super-weird time formats people come +up with and turn them into something actually useful. + +Date/time parsing, it turns out, is just like everything else involving +[computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and +[time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it +feels like it shouldn't be that difficult to do, until you try to do it, and you realize that people +suck and this is why +[we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But +alas, we'll try and make contemporary art out of the rubble and give it a pretentious name like +_Time_. + +![A gravel mound](./gravel-mound.jpg) + +> [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php) + +What makes `dateutil.parser` great is that there's a single function with a single argument that +drives what programmers interact with: +[`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258). +It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can +possibly do to make sense of your input" value. It doesn't expect much of you. 
+ +[And now it's in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332) + +## Lost in Translation + +Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm +admittedly hesitant to publish Python code that's trying to be Rust. Interestingly, Rust code can +actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better +experiences than +[this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us) +who attempted the same thing for D. These are the actual take-aways: + +When transcribing code, **stay as close to the original library as possible**. I'm talking about +using the same variable names, same access patterns, the whole shebang. It's way too easy to make a +couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a reference +manual for verbatim what your code should be means that you don't spend that long debugging +complicated logic, you're more looking for typos. + +Also, **don't use nice Rust things like enums**. While +[one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94), +I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a +boolean and I mixed up which was true, and which was false (side note: AM is false, PM is true). In +general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate +the same functionality. + +**Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So +when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out +[he's not quite right](https://github.com/rust-lang/rfcs/pull/243), and I'm OK with that. 
And while +`dateutil` is pretty well-behaved about not skipping multiple stack frames, +[130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865) +take a while to verify. + +As another Python quirk, **be very careful about +[long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**. +I used to think that Python's whitespace was just there to get you to format your code correctly. I +think that no longer. It's way too easy to close a block too early and have incredibly weird issues +in the logic. Make sure you use an editor that displays indentation levels so you can keep things +straight. + +**Rust macros are not free.** I originally had the +[main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217) +wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. +After +[moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205) +compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code +to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually +functions that need to be liberated, man. + +Finally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison, +see +[this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476) +and +[the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629). +I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be +added through macros or syntax extensions. 
Either way, they're expressive, save typing, and are +super-readable. Let's get more of that. + +## Using a young language + +Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On +more than one occasion though, I've had issues navigating the Rust ecosystem. + +What I'll call the "canonical library" is still being built. In Python, if you need datetime +parsing, you use `dateutil`. If you want `decimal` types, it's already in the +[standard library](https://docs.python.org/3.6/library/decimal.html). While I might've gotten away +with `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to +the original library as possible**. Thus began my quest to find a decimal library in Rust. What I +quickly found was summarized in a comment: + +> Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard. +> +> [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794) + +In practice, this means that there are at least [4](https://crates.io/crates/bigdecimal) +[different](https://crates.io/crates/rust_decimal) +[implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate). +And that's a lot of decisions to worry about when all I'm thinking is "why can't +[calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing" and I'm forced to dig +through a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916) +[different](https://github.com/rust-lang/rfcs/issues/334) +[threads](https://github.com/rust-num/num/issues/8) to figure out if the library I'm looking at is dead +or just stable. + +And even when the "canonical library" exists, there are no guarantees that it will be well-maintained. +[Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just +released version 0.4.4 like two days ago. 
Meanwhile, +[chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though +[there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I +know relatively little about it, but it appears that most of the release process is automated; +keeping that up to date should be a no-brainer. + +## Trial Maintenance Policy + +Specifically given "maintenance" being an +[oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/) +issue, I'm going to try out the following policy to keep things moving on `dtparse`: + +1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure + nobody's blocking on me. + +2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the + contributor to check in after two weeks, and close the issue without resolution if I hear nothing + back after a month. + +The second point I think has the potential to be a bit controversial, so I'm happy to receive +feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm +running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And +second: I don't mind keeping those requests going indefinitely. I just want to try and balance +keeping things moving with giving people the necessary time they need. + +I should also note that I'm still getting some best practices in place - CONTRIBUTING and +CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are +perfect. + +## Roadmap and Conclusion + +So if I've now built a `dateutil`-compatible parser, we're done, right? Of course not! That's not +nearly ambitious enough. + +Ultimately, I'd love to have a library that's capable of parsing everything the Linux `date` command +can do (and not `date` on OSX, because seriously, BSD coreutils are the worst). 
I know Rust has a +coreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it +doesn't bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime) +could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each +other? + +All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month +on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going +to take it hard. + +And in the mean time, I'm looking forward to building more. Onwards. diff --git a/blog/2018-09-01-primitives-in-rust-are-weird/_article.md b/blog/2018-09-01-primitives-in-rust-are-weird/_article.md new file mode 100644 index 0000000..e69de29 diff --git a/blog/2018-09-01-primitives-in-rust-are-weird/index.mdx b/blog/2018-09-01-primitives-in-rust-are-weird/index.mdx new file mode 100644 index 0000000..f740023 --- /dev/null +++ b/blog/2018-09-01-primitives-in-rust-are-weird/index.mdx @@ -0,0 +1,323 @@ +--- +slug: 2018/09/primitives-in-rust-are-weird +title: "Primitives in Rust are weird (and cool)" +date: 2018-09-01 12:00:00 +authors: [bspeice] +tags: [] +--- + +I wrote a really small Rust program a while back because I was curious. I was 100% convinced it +couldn't possibly run: + +```rust +fn main() { + println!("{}", 8.to_string()) +} +``` + +And to my complete befuddlement, it compiled, ran, and produced a completely sensible output. + + + +The reason I was so surprised has to do with how Rust treats a special category of things I'm going to +call _primitives_. In the current version of the Rust book, you'll see them referred to as +[scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], but +we're going to stick with the name _primitive_ for the time being. 
Explaining why this program is so +cool requires talking about a number of other programming languages, and keeping a consistent +terminology makes things easier. + +**You've been warned:** this is going to be a tedious post about a relatively minor issue that +involves Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking +about with assembly. + +## Defining primitives (Java) + +The reason I'm using the name _primitive_ comes from how much of my life is Java right now. For the most part I like Java, but I digress. In Java, there's a special +name for some specific types of values: + +> ``` +> boolean char byte +> short int long +> float double +> ``` + +They are referred to as [primitives][java_primitive]. And relative to the other bits of Java, +they have two unique features. First, they don't have to worry about the +[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions); +primitives in Java can never be `null`. Second: *they can't have instance methods*. +Remember that Rust program from earlier? Java has no idea what to do with it: + +```java +class Main { + public static void main(String[] args) { + int x = 8; + System.out.println(x.toString()); // Triggers a compiler error + } +} +``` + +The error is: + +``` +Main.java:5: error: int cannot be dereferenced + System.out.println(x.toString()); + ^ +1 error +``` + +Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html) +and things that inherit from it are pointers under the hood, and we have to dereference them before +the fields and methods they define can be used. In contrast, _primitive types are just values_ - +there's nothing to be dereferenced. In memory, they're just a sequence of bits. 
+ +If we really want, we can turn the `int` into an +[`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then dereference +it, but it's a bit wasteful: + +```java +class Main { + public static void main(String[] args) { + int x = 8; + Integer y = Integer.valueOf(x); + System.out.println(y.toString()); + } +} +``` + +This creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we +dereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit +differently, but we have to dig into the low-level details to see it in action. + +## Low Level Handling of Primitives (C) + +We first need to build a foundation for reading and understanding the assembly code the final answer +requires. Let's begin with showing how the `C` language (and your computer) thinks about "primitive" +values in memory: + +```c +void my_function(int num) {} + +int main() { + int x = 8; + my_function(x); +} +``` + +The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off the +assembly-level code that's generated (output lightly +edited): + +```nasm +main: + push rbp + mov rbp, rsp + sub rsp, 16 + + ; We assign the value `8` to `x` here + mov DWORD PTR [rbp-4], 8 + + ; And copy the bits making up `x` to a location + ; `my_function` can access (`edi`) + mov eax, DWORD PTR [rbp-4] + mov edi, eax + + ; Call `my_function` and give it control + call my_function + + mov eax, 0 + leave + ret + +my_function: + push rbp + mov rbp, rsp + + ; Copy the bits out of the pre-determined location (`edi`) + ; to somewhere we can use + mov DWORD PTR [rbp-4], edi + nop + + pop rbp + ret +``` + +At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction; +nothing crazy. 
But to show how similar Rust is, let's take a look at our program translated from C +to Rust: + +```rust +fn my_function(x: i32) {} + +fn main() { + let x = 8; + my_function(x) +} +``` + +And the assembly generated when we stick it in the +[compiler explorer](https://godbolt.org/z/cAlmk0) (again, lightly +edited): + +```nasm +example::main: + push rax + + ; Look familiar? We're copying bits to a location for `my_function` + ; The compiler just optimizes out holding `x` in memory + mov edi, 8 + + ; Call `my_function` and give it control + call example::my_function + + pop rax + ret + +example::my_function: + sub rsp, 4 + + ; And copying those bits again, just like in C + mov dword ptr [rsp], edi + + add rsp, 4 + ret +``` + +The generated Rust assembly is functionally pretty close to the C assembly: _When working with +primitives, we're just dealing with bits in memory_. + +In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to +dereference. So what exactly is going on with this `.to_string()` function call? + +## impl primitive (and Python) + +Now it's time to ~~reveal my trap card~~ show the revelation that tied all this +together: _Rust has implementations for its primitive types._ That's right, `impl` blocks aren't +only for `structs` and `traits`, primitives get them too. Don't believe me? Check out +[u32](https://doc.rust-lang.org/std/primitive.u32.html), +[f64](https://doc.rust-lang.org/std/primitive.f64.html) and +[char](https://doc.rust-lang.org/std/primitive.char.html) as examples. + +But the really interesting bit is how Rust turns those `impl` blocks into assembly. 
Let's break out +the [compiler explorer](https://godbolt.org/z/6LBEwq) once again: + +```rust +pub fn main() { + 8.to_string(); +} +``` + +And the interesting bits in the assembly (heavily trimmed down): + +```nasm +example::main: + sub rsp, 24 + mov rdi, rsp + lea rax, [rip + .Lbyte_str.u] + mov rsi, rax + + ; Cool stuff right here + call ::to_string@PLT + + mov rdi, rsp + call core::ptr::drop_in_place + add rsp, 24 + ret +``` + +Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling +`to_string()` as a function that exists all on its own, and giving it the instance of `8`**. Instead +of thinking of the value 8 as an instance of `i32` and then peeking in to find the location of the +function we want to call (like Java), we have a function that exists outside of the instance and +just give that function the value `8`. + +This is an incredibly technical detail, but the interesting idea I had was this: _if `to_string()` +is a static function, can I refer to the unbound function and give it an instance?_ + +Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link because I +seriously love this thing): + +```rust +struct MyVal { + x: u32 +} + +impl MyVal { + fn to_string(&self) -> String { + self.x.to_string() + } +} + +pub fn main() { + let my_val = MyVal { x: 8 }; + + // THESE ARE THE SAME + my_val.to_string(); + MyVal::to_string(&my_val); +} +``` + +Rust is totally fine "binding" the function call to the instance, and also as a static. + +MIND == BLOWN. + +Python does the same thing where I can both call functions bound to their instances and also call as +an unbound function where I give it the instance: + +```python +class MyClass(): + x = 24 + + def my_function(self): + print(self.x) + +m = MyClass() + +m.my_function() +MyClass.my_function(m) +``` + +And Python tries to make you _think_ that primitives can have instance methods... 
+ +```python +>>> dir(8) +['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', +'__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', +... +'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', +...] + +>>> # Theoretically `8.__str__()` should exist, but: + +>>> 8.__str__() + File "<stdin>", line 1 + 8.__str__() + ^ +SyntaxError: invalid syntax + +>>> # It will run if we assign it first though: +>>> x = 8 +>>> x.__str__() +'8' +``` + +...but in practice it's a bit complicated. + +So while Python handles binding instance methods in a way similar to Rust, it's still not able to +run the example we started with. + +## Conclusion + +This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor +details like primitives leads to really cool effects. Primitives are optimized like C in how they +have a space-efficient memory layout, yet the language still has a lot of features I enjoy in Python +(like both instance and late binding). + +And when you put it together, there are areas where Rust does cool things nobody else can; as a +quirky feature of Rust's type system, `8.to_string()` is actually valid code. + +Now go forth and fool your friends into thinking you know assembly. This is all I've got. 
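As a small postscript, here is a sketch (not taken from any of the compiler explorer links above) of what "bound" versus "unbound" calls look like on a primitive itself rather than on `MyVal`. The helper name `render` is made up for illustration; the only thing assumed is that the standard `ToString` trait is in scope, which it is through the prelude.

```rust
/// Hypothetical helper: formats an `i32` in three equivalent ways.
fn render(n: i32) -> [String; 3] {
    [
        // Method-call syntax, same as the example the post opens with
        n.to_string(),
        // The same trait method, called "unbound" and handed the value
        ToString::to_string(&n),
        // Fully qualified form: explicitly pick the `ToString` impl for `i32`
        <i32 as ToString>::to_string(&n),
    ]
}

fn main() {
    let [bound, unbound, qualified] = render(8);

    // All three spellings go through the same function
    assert_eq!(bound, unbound);
    assert_eq!(unbound, qualified);
    println!("{}", bound);
}
```

The method-call form is just sugar for the other two, which is why a bare literal like `8` can be on the left of the dot at all.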
+ +[x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html +[java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html +[rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types +[rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html diff --git a/docusaurus.config.ts b/docusaurus.config.ts index b424075..1f21c19 100644 --- a/docusaurus.config.ts +++ b/docusaurus.config.ts @@ -80,7 +80,7 @@ const config: Config = { prism: { theme: prismThemes.oneLight, darkTheme: prismThemes.oneDark, - additionalLanguages: ['julia'] + additionalLanguages: ['java', 'julia', 'nasm'] }, } satisfies Preset.ThemeConfig, plugins: [require.resolve('docusaurus-lunr-search')],