mirror of https://github.com/bspeice/speice.io, synced 2025-10-31 17:40:28 -04:00
			
		
		
		
	First posts from speice.io
		
							
								
								
									
										38
									
								
								blog/2018-05-28-hello/_article.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
| @@ -0,0 +1,38 @@ | ||||
| --- | ||||
| layout: post | ||||
| title: "Hello!" | ||||
| description: "" | ||||
| category: | ||||
| tags: [] | ||||
| --- | ||||
|  | ||||
| I'll do what I can to keep this short; there are plenty of other things we both should be doing right | ||||
| now. | ||||
|  | ||||
| If you're here for the bread pics, and to marvel at some other culinary side projects, I've got you | ||||
| covered: | ||||
|  | ||||
|  | ||||
|  | ||||
| And no, I'm not posting pictures of earlier attempts that ended up turning into rocks in the oven. | ||||
|  | ||||
| Okay, just one: | ||||
|  | ||||
|  | ||||
|  | ||||
| If you're here for keeping up with the man Bradlee Speice, got plenty of that too. Plus some | ||||
| upcoming super-nerdy posts about how I'm changing the world. | ||||
|  | ||||
| And if you're not here for those things: don't have a lot for you, sorry. But you're welcome to let | ||||
| me know what needs to change. | ||||
|  | ||||
| I'm looking forward to making this a place to talk about what's going on in life, and I hope you'll | ||||
| stick it out with me. The best way to follow what's going on is on my [About](/about/) page, but if | ||||
| you want the joy of clicking links, here's a few good ones: | ||||
|  | ||||
| - Email (people still use this?): [bradlee@speice.io](mailto:bradlee@speice.io) | ||||
| - Mastodon (nerd Twitter): [@bradlee](https://mastodon.social/@bradlee) | ||||
| - Chat (RiotIM): [@bspeice:matrix.com](https://matrix.to/#/@bspeice:matrix.com) | ||||
| - The comments section (not for people with sanity intact): ↓↓↓ | ||||
|  | ||||
| Thanks, and keep it amazing. | ||||
							
								
								
									
										
											BIN
										
									
								
								blog/2018-05-28-hello/bread.jpg
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										
									
								
							
										
											Binary file not shown.
										
									
								
							| After Width: | Height: | Size: 840 KiB | 
							
								
								
									
										25
									
								
								blog/2018-05-28-hello/index.mdx
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
| @@ -0,0 +1,25 @@ | ||||
| --- | ||||
| slug: 2018/05/hello | ||||
| title: Hello! | ||||
| date: 2018-05-28 12:00:00 | ||||
| authors: [bspeice] | ||||
| tags: [] | ||||
| --- | ||||
|  | ||||
| I'll do what I can to keep this short; there are plenty of other things we both should be doing right | ||||
| now. | ||||
|  | ||||
| <!-- truncate --> | ||||
|  | ||||
| If you're here for the bread pics, and to marvel at some other culinary side projects, I've got you | ||||
| covered: | ||||
|  | ||||
|  | ||||
|  | ||||
| And no, I'm not posting pictures of earlier attempts that ended up turning into rocks in the oven. | ||||
|  | ||||
| Okay, just one: | ||||
|  | ||||
|  | ||||
|  | ||||
| Thanks, and keep it amazing. | ||||
							
								
								
									
										
											BIN
										
									
								
								blog/2018-05-28-hello/rocks.jpg
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										
									
								
							
										
											Binary file not shown.
										
									
								
							| After Width: | Height: | Size: 926 KiB | 
							
								
								
									
										177
									
								
								blog/2018-06-25-dateutil-parser-to-rust/_article.md
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
| @@ -0,0 +1,177 @@ | ||||
| --- | ||||
| layout: post | ||||
| title: "What I Learned: Porting Dateutil Parser to Rust" | ||||
| description: "" | ||||
| category: | ||||
| tags: [dtparse, rust] | ||||
| --- | ||||
|  | ||||
| Hi. I'm Bradlee. | ||||
|  | ||||
| I've mostly been a lurker in Rust for a while, making a couple small contributions here and there. | ||||
| So launching [dtparse](https://github.com/bspeice/dtparse) feels like a nice step towards becoming a | ||||
| functioning member of society. But not too much, because then, you know, people start asking you to | ||||
| pay bills, and ain't nobody got time for that. | ||||
|  | ||||
| But I built dtparse, and you can read about my thoughts on the process. Or don't. I won't tell you | ||||
| what to do with your life (but you should totally keep reading). | ||||
|  | ||||
| # Slow down, what? | ||||
|  | ||||
| OK, fine, I guess I should start with _why_ someone would do this. | ||||
|  | ||||
| [Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The | ||||
| standard library support for time in Python is kinda dope, but there are a lot of extras that go | ||||
| into making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html) | ||||
| module. `dateutil.parser` specifically is code to take all the super-weird time formats people come | ||||
| up with and turn them into something actually useful. | ||||
|  | ||||
| Date/time parsing, it turns out, is just like everything else involving | ||||
| [computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and | ||||
| [time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it | ||||
| feels like it shouldn't be that difficult to do, until you try to do it, and you realize that people | ||||
| suck and this is why | ||||
| [we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But | ||||
| alas, we'll try and make contemporary art out of the rubble and give it a pretentious name like | ||||
| _Time_. | ||||
|  | ||||
|  | ||||
|  | ||||
| > [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php) | ||||
|  | ||||
| What makes `dateutil.parser` great is that there's a single function with a single argument that | ||||
| drives what programmers interact with: | ||||
| [`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258). | ||||
| It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can | ||||
| possibly do to make sense of your input" value. It doesn't expect much of you. | ||||
|  | ||||
| [And now it's in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332) | ||||
|  | ||||
| # Lost in Translation | ||||
|  | ||||
| Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm | ||||
| admittedly hesitant to publish Python code that's trying to be Rust. Interestingly, Rust code can | ||||
| actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better | ||||
| experiences than | ||||
| [this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us) | ||||
| who attempted the same thing for D. These are the actual take-aways: | ||||
|  | ||||
| When transcribing code, **stay as close to the original library as possible**. I'm talking about | ||||
| using the same variable names, same access patterns, the whole shebang. It's way too easy to make a | ||||
| couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a verbatim | ||||
| reference for what your code should be means that you don't spend long debugging complicated | ||||
| logic; you're mostly looking for typos. | ||||
|  | ||||
| Also, **don't use nice Rust things like enums**. While | ||||
| [one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94), | ||||
| I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a | ||||
| boolean and I mixed up which was true, and which was false (side note: AM is false, PM is true). In | ||||
| general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate | ||||
| the same functionality. | ||||
|  | ||||
| **Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So | ||||
| when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out | ||||
| [he's not quite right](https://github.com/rust-lang/rfcs/pull/243), and I'm OK with that. And while | ||||
| `dateutil` is pretty well-behaved about not skipping multiple stack frames, | ||||
| [130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865) | ||||
| take a while to verify. | ||||
|  | ||||
| As another Python quirk, **be very careful about | ||||
| [long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**. | ||||
| I used to think that Python's whitespace was just there to get you to format your code correctly. I | ||||
| think that no longer. It's way too easy to close a block too early and have incredibly weird issues | ||||
| in the logic. Make sure you use an editor that displays indentation levels so you can keep things | ||||
| straight. | ||||
|  | ||||
| **Rust macros are not free.** I originally had the | ||||
| [main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217) | ||||
| wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. | ||||
| After | ||||
| [moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205) | ||||
| compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code | ||||
| to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually | ||||
| functions that need to be liberated, man. | ||||
|  | ||||
| Finally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison, | ||||
| see | ||||
| [this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476) | ||||
| and | ||||
| [the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629). | ||||
| I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be | ||||
| added through macros or syntax extensions. Either way, they're expressive, save typing, and are | ||||
| super-readable. Let's get more of that. | ||||
|  | ||||
| # Using a young language | ||||
|  | ||||
| Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On | ||||
| more than one occasion though, I've had issues navigating the Rust ecosystem. | ||||
|  | ||||
| What I'll call the "canonical library" is still being built. In Python, if you need datetime | ||||
| parsing, you use `dateutil`. If you want `decimal` types, it's already in the | ||||
| [standard library](https://docs.python.org/3.6/library/decimal.html). While I might've gotten away | ||||
| with `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to | ||||
| the original library as possible**. Thus began my quest to find a decimal library in Rust. What I | ||||
| quickly found was summarized in a comment: | ||||
|  | ||||
| > Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard. | ||||
| > | ||||
| > [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794) | ||||
|  | ||||
| In practice, this means that there are at least [4](https://crates.io/crates/bigdecimal) | ||||
| [different](https://crates.io/crates/rust_decimal) | ||||
| [implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate). | ||||
| And that's a lot of decisions to worry about when all I'm thinking is "why can't | ||||
| [calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing" and I'm forced to dig | ||||
| through a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916) | ||||
| [different](https://github.com/rust-lang/rfcs/issues/334) | ||||
| [threads](https://github.com/rust-num/num/issues/8) to figure out if the library I'm looking at is dead | ||||
| or just stable. | ||||
|  | ||||
| And even when the "canonical library" exists, there's no guarantee that it will be well-maintained. | ||||
| [Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just | ||||
| released version 0.4.4 like two days ago. Meanwhile, | ||||
| [chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though | ||||
| [there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I | ||||
| know relatively little about it, but it appears that most of the release process is automated; | ||||
| keeping that up to date should be a no-brainer. | ||||
|  | ||||
| ## Trial Maintenance Policy | ||||
|  | ||||
| Specifically given "maintenance" being an | ||||
| [oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/) | ||||
| issue, I'm going to try out the following policy to keep things moving on `dtparse`: | ||||
|  | ||||
| 1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure | ||||
|    nobody's blocking on me. | ||||
|  | ||||
| 2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the | ||||
|    contributor to check in after two weeks, and close the issue without resolution if I hear nothing | ||||
|    back after a month. | ||||
|  | ||||
| The second point I think has the potential to be a bit controversial, so I'm happy to receive | ||||
| feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm | ||||
| running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And | ||||
| second: I don't mind keeping those requests going indefinitely. I just want to try and balance | ||||
| keeping things moving with giving people the necessary time they need. | ||||
|  | ||||
| I should also note that I'm still getting some best practices in place - CONTRIBUTING and | ||||
| CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are | ||||
| perfect. | ||||
|  | ||||
| # Roadmap and Conclusion | ||||
|  | ||||
| So if I've now built a `dateutil`-compatible parser, we're done, right? Of course not! That's not | ||||
| nearly ambitious enough. | ||||
|  | ||||
| Ultimately, I'd love to have a library that's capable of parsing everything the Linux `date` command | ||||
| can do (and not `date` on OSX, because seriously, BSD coreutils are the worst). I know Rust has a | ||||
| coreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it | ||||
| doesn't bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime) | ||||
| could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each | ||||
| other? | ||||
|  | ||||
| All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month | ||||
| on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going | ||||
| to take it hard. | ||||
|  | ||||
| And in the meantime, I'm looking forward to building more. Onwards. | ||||
							
								
								
									
										
											BIN
										
									
								
								blog/2018-06-25-dateutil-parser-to-rust/gravel-mound.jpg
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
										
									
								
							
										
											Binary file not shown.
										
									
								
							| After Width: | Height: | Size: 165 KiB | 
							
								
								
									
										177
									
								
								blog/2018-06-25-dateutil-parser-to-rust/index.mdx
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
| @@ -0,0 +1,177 @@ | ||||
| --- | ||||
| slug: 2018/06/dateutil-parser-to-rust | ||||
| title: "What I Learned: Porting Dateutil Parser to Rust" | ||||
| date: 2018-06-25 12:00:00 | ||||
| authors: [bspeice] | ||||
| tags: [] | ||||
| --- | ||||
|  | ||||
| I've mostly been a lurker in Rust for a while, making a couple small contributions here and there. | ||||
| So launching [dtparse](https://github.com/bspeice/dtparse) feels like a nice step towards becoming a | ||||
| functioning member of society. But not too much, because then, you know, people start asking you to | ||||
| pay bills, and ain't nobody got time for that. | ||||
|  | ||||
| <!-- truncate --> | ||||
|  | ||||
| But I built dtparse, and you can read about my thoughts on the process. Or don't. I won't tell you | ||||
| what to do with your life (but you should totally keep reading). | ||||
|  | ||||
| ## Slow down, what? | ||||
|  | ||||
| OK, fine, I guess I should start with _why_ someone would do this. | ||||
|  | ||||
| [Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The | ||||
| standard library support for time in Python is kinda dope, but there are a lot of extras that go | ||||
| into making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html) | ||||
| module. `dateutil.parser` specifically is code to take all the super-weird time formats people come | ||||
| up with and turn them into something actually useful. | ||||
|  | ||||
| Date/time parsing, it turns out, is just like everything else involving | ||||
| [computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and | ||||
| [time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it | ||||
| feels like it shouldn't be that difficult to do, until you try to do it, and you realize that people | ||||
| suck and this is why | ||||
| [we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But | ||||
| alas, we'll try and make contemporary art out of the rubble and give it a pretentious name like | ||||
| _Time_. | ||||
|  | ||||
|  | ||||
|  | ||||
| > [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php) | ||||
|  | ||||
| What makes `dateutil.parser` great is that there's a single function with a single argument that | ||||
| drives what programmers interact with: | ||||
| [`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258). | ||||
| It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can | ||||
| possibly do to make sense of your input" value. It doesn't expect much of you. | ||||
|  | ||||
| [And now it's in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332) | ||||
|  | ||||
| ## Lost in Translation | ||||
|  | ||||
| Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm | ||||
| admittedly hesitant to publish Python code that's trying to be Rust. Interestingly, Rust code can | ||||
| actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better | ||||
| experiences than | ||||
| [this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us) | ||||
| who attempted the same thing for D. These are the actual take-aways: | ||||
|  | ||||
| When transcribing code, **stay as close to the original library as possible**. I'm talking about | ||||
| using the same variable names, same access patterns, the whole shebang. It's way too easy to make a | ||||
| couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a verbatim | ||||
| reference for what your code should be means that you don't spend long debugging complicated | ||||
| logic; you're mostly looking for typos. | ||||
|  | ||||
| Also, **don't use nice Rust things like enums**. While | ||||
| [one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94), | ||||
| I also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a | ||||
| boolean and I mixed up which was true, and which was false (side note: AM is false, PM is true). In | ||||
| general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate | ||||
| the same functionality. | ||||
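
To make the boolean hazard concrete, here is a minimal sketch of what a later cleanup pass might look like once the port already matches the original. The names `Meridiem`, `from_flag`, and `to_24_hour` are hypothetical and are not taken from `dtparse` or `dateutil`; the only thing carried over from the text is the convention that AM is false and PM is true.

```rust
/// Hypothetical enum standing in for the boolean AM/PM flag.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Meridiem {
    Am,
    Pm,
}

impl Meridiem {
    /// Convert from the boolean convention noted above: AM is false, PM is true.
    fn from_flag(is_pm: bool) -> Meridiem {
        if is_pm {
            Meridiem::Pm
        } else {
            Meridiem::Am
        }
    }
}

/// Convert a 12-hour clock reading to a 24-hour one.
fn to_24_hour(hour: u32, meridiem: Meridiem) -> u32 {
    match (hour, meridiem) {
        (12, Meridiem::Am) => 0,  // 12 AM is midnight
        (12, Meridiem::Pm) => 12, // 12 PM is noon
        (h, Meridiem::Am) => h,
        (h, Meridiem::Pm) => h + 12,
    }
}

fn main() {
    // The flag arrives as a bare boolean, as it would in a verbatim port.
    let flag = true;
    println!("{}", to_24_hour(3, Meridiem::from_flag(flag)));
}
```

The boolean is converted in exactly one place, so a mix-up can only happen there. That makes this a reasonable second-pass refactor, while the first pass can still mirror the original field for field.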
|  | ||||
| **Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So | ||||
| when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out | ||||
| [he's not quite right](https://github.com/rust-lang/rfcs/pull/243), and I'm OK with that. And while | ||||
| `dateutil` is pretty well-behaved about not skipping multiple stack frames, | ||||
| [130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865) | ||||
| take a while to verify. | ||||
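
For readers who haven't ported exception-heavy code before, the rough shape of the translation is that every statement inside a Python `try` block becomes a call returning `Result`, and the `?` operator hands the error back one stack frame at a time. The sketch below uses a hypothetical helper, `parse_hh_mm`, with `String` as a stand-in error type; it is not code from `dtparse`.

```rust
/// Hypothetical helper: parse "HH:MM" into (hour, minute).
/// Each `?` returns early from this function if the step before it failed.
fn parse_hh_mm(input: &str) -> Result<(u32, u32), String> {
    // In Python this would be a failed unpacking raising ValueError.
    let (hour_text, minute_text) = input
        .split_once(':')
        .ok_or_else(|| format!("no ':' in {:?}", input))?;

    // In Python these would be int() calls inside the try block.
    let hour = hour_text
        .parse::<u32>()
        .map_err(|e| format!("bad hour: {}", e))?;
    let minute = minute_text
        .parse::<u32>()
        .map_err(|e| format!("bad minute: {}", e))?;

    // Explicit range check in place of a raised exception.
    if hour > 23 || minute > 59 {
        return Err(format!("out of range: {}:{}", hour, minute));
    }
    Ok((hour, minute))
}

fn main() {
    match parse_hh_mm("10:36") {
        Ok((hour, minute)) => println!("parsed {:02}:{:02}", hour, minute),
        Err(message) => println!("failed: {}", message),
    }
}
```

Every exit point is visible in the source, which is most of what makes a long `try` block tedious to verify by hand.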
|  | ||||
| As another Python quirk, **be very careful about | ||||
| [long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**. | ||||
| I used to think that Python's whitespace was just there to get you to format your code correctly. I | ||||
| think that no longer. It's way too easy to close a block too early and have incredibly weird issues | ||||
| in the logic. Make sure you use an editor that displays indentation levels so you can keep things | ||||
| straight. | ||||
|  | ||||
| **Rust macros are not free.** I originally had the | ||||
| [main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217) | ||||
| wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile. | ||||
| After | ||||
| [moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205) | ||||
| compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code | ||||
| to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually | ||||
| functions that need to be liberated, man. | ||||
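
The compile-time gap follows from how `macro_rules!` works: the body is pasted in at every call site, so the compiler has to process it once per use, while a function body is compiled once. A minimal sketch of the two shapes, with hypothetical names (`check_parse_macro`, `check_parse_fn`) and a toy body standing in for the real test logic:

```rust
/// Each use of this macro expands to a fresh copy of the whole body.
macro_rules! check_parse_macro {
    ($input:expr, $expected:expr) => {{
        let parsed: i64 = $input.trim().parse().expect("not an integer");
        assert_eq!(parsed, $expected);
        parsed
    }};
}

/// The same body as a function is compiled once and called many times.
fn check_parse_fn(input: &str, expected: i64) -> i64 {
    let parsed: i64 = input.trim().parse().expect("not an integer");
    assert_eq!(parsed, expected);
    parsed
}

fn main() {
    // Two expansions: the compiler sees the macro body twice.
    check_parse_macro!(" 42 ", 42);
    check_parse_macro!("-7", -7);

    // Two calls: the compiler sees the function body once.
    check_parse_fn(" 42 ", 42);
    check_parse_fn("-7", -7);
}
```

With a three-line body the difference is invisible; with a body of around 150 lines used by around 100 tests, it is the difference described above.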
|  | ||||
| Finally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison, | ||||
| see | ||||
| [this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476) | ||||
| and | ||||
| [the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629). | ||||
| I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be | ||||
| added through macros or syntax extensions. Either way, they're expressive, save typing, and are | ||||
| super-readable. Let's get more of that. | ||||
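
For comparison, the closest thing stable Rust offers today is an iterator chain ending in `collect()`. The sketch below is a generic illustration and does not reproduce the linked `dateutil` or `dtparse` code; the Python one-liners in the comments and the function names are made up for the example.

```rust
use std::collections::HashMap;

/// Python: [x * x for x in values if x % 2 == 0]
fn even_squares(values: &[i32]) -> Vec<i32> {
    values
        .iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * x)
        .collect()
}

/// Python: {name: len(name) for name in names}
/// Note that `len()` here counts bytes, which matches Python only for ASCII.
fn name_lengths(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .map(|name| (name.to_string(), name.len()))
        .collect()
}

fn main() {
    println!("{:?}", even_squares(&[1, 2, 3, 4]));
    println!("{:?}", name_lengths(&["am", "pm", "utc"]).get("utc"));
}
```

The filter and the transformation are the same pieces as in a comprehension, just written in the opposite order and with more punctuation.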
|  | ||||
| ## Using a young language | ||||
|  | ||||
| Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On | ||||
| more than one occasion though, I've had issues navigating the Rust ecosystem. | ||||
|  | ||||
| What I'll call the "canonical library" is still being built. In Python, if you need datetime | ||||
| parsing, you use `dateutil`. If you want `decimal` types, it's already in the | ||||
| [standard library](https://docs.python.org/3.6/library/decimal.html). While I might've gotten away | ||||
| with `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to | ||||
| the original library as possible**. Thus began my quest to find a decimal library in Rust. What I | ||||
| quickly found was summarized in a comment: | ||||
|  | ||||
| > Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard. | ||||
| > | ||||
| > [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794) | ||||
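
As background on why `f64` is not a drop-in replacement for a decimal type: binary floating point cannot represent most decimal fractions exactly, so sums that look exact on paper are not. The sketch below shows the classic case next to a deliberately crude stand-in for a decimal (integers counting hundredths). Both function names are hypothetical, and the scaled-integer approach is only an illustration, not how any of the crates listed below work.

```rust
/// Sum a list of amounts using binary floating point.
fn sum_f64(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Crude stand-in for a decimal type: amounts stored as whole hundredths.
fn sum_hundredths(values: &[i64]) -> i64 {
    values.iter().sum()
}

fn main() {
    // 0.1 and 0.2 are both rounded when stored, and the rounding shows up in the sum.
    let float_total = sum_f64(&[0.1, 0.2]);
    println!("f64 sum equals 0.3: {}", float_total == 0.3);

    // 10 + 20 hundredths is exactly 30 hundredths.
    let exact_total = sum_hundredths(&[10, 20]);
    println!("hundredths sum equals 30: {}", exact_total == 30);
}
```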
|  | ||||
| In practice, this means that there are at least [4](https://crates.io/crates/bigdecimal) | ||||
| [different](https://crates.io/crates/rust_decimal) | ||||
| [implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate). | ||||
| And that's a lot of decisions to worry about when all I'm thinking is "why can't | ||||
| [calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing" and I'm forced to dig | ||||
| through a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916) | ||||
| [different](https://github.com/rust-lang/rfcs/issues/334) | ||||
| [threads](https://github.com/rust-num/num/issues/8) to figure out if the library I'm looking at is dead | ||||
| or just stable. | ||||
|  | ||||
| And even when the "canonical library" exists, there's no guarantee that it will be well-maintained. | ||||
| [Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just | ||||
| released version 0.4.4 like two days ago. Meanwhile, | ||||
| [chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though | ||||
| [there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I | ||||
| know relatively little about it, but it appears that most of the release process is automated; | ||||
| keeping that up to date should be a no-brainer. | ||||
|  | ||||
| ## Trial Maintenance Policy | ||||
|  | ||||
| Specifically given "maintenance" being an | ||||
| [oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/) | ||||
| issue, I'm going to try out the following policy to keep things moving on `dtparse`: | ||||
|  | ||||
| 1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure | ||||
|    nobody's blocking on me. | ||||
|  | ||||
| 2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the | ||||
|    contributor to check in after two weeks, and close the issue without resolution if I hear nothing | ||||
|    back after a month. | ||||
|  | ||||
| The second point I think has the potential to be a bit controversial, so I'm happy to receive | ||||
| feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm | ||||
| running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And | ||||
| second: I don't mind keeping those requests going indefinitely. I just want to try and balance | ||||
| keeping things moving with giving people the necessary time they need. | ||||
|  | ||||
| I should also note that I'm still getting some best practices in place - CONTRIBUTING and | ||||
| CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are | ||||
| perfect. | ||||
|  | ||||
| ## Roadmap and Conclusion | ||||
|  | ||||
| So if I've now built a `dateutil`-compatible parser, we're done, right? Of course not! That's not | ||||
| nearly ambitious enough. | ||||
|  | ||||
| Ultimately, I'd love to have a library that's capable of parsing everything the Linux `date` command | ||||
| can do (and not `date` on OSX, because seriously, BSD coreutils are the worst). I know Rust has a | ||||
| coreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it | ||||
| doesn't bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime) | ||||
| could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each | ||||
| other? | ||||
|  | ||||
| All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month | ||||
| on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going | ||||
| to take it hard. | ||||
|  | ||||
| And in the meantime, I'm looking forward to building more. Onwards. | ||||
							
								
								
									
										323
									
								
								blog/2018-09-01-primitives-in-rust-are-weird/index.mdx
									
									
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
| @@ -0,0 +1,323 @@ | ||||
| --- | ||||
| slug: 2018/09/primitives-in-rust-are-weird | ||||
| title: "Primitives in Rust are weird (and cool)" | ||||
| date: 2018-09-01 12:00:00 | ||||
| authors: [bspeice] | ||||
| tags: [] | ||||
| --- | ||||
|  | ||||
| I wrote a really small Rust program a while back because I was curious. I was 100% convinced it | ||||
| couldn't possibly run: | ||||
|  | ||||
| ```rust | ||||
| fn main() { | ||||
|     println!("{}", 8.to_string()) | ||||
| } | ||||
| ``` | ||||
|  | ||||
| And to my complete befuddlement, it compiled, ran, and produced a completely sensible output. | ||||
|  | ||||
| <!-- truncate --> | ||||
|  | ||||
| The reason I was so surprised has to do with how Rust treats a special category of things I'm going to | ||||
| call _primitives_. In the current version of the Rust book, you'll see them referred to as | ||||
| [scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], but | ||||
| we're going to stick with the name _primitive_ for the time being. Explaining why this program is so | ||||
| cool requires talking about a number of other programming languages, and keeping a consistent | ||||
| terminology makes things easier. | ||||
|  | ||||
| **You've been warned:** this is going to be a tedious post about a relatively minor issue that | ||||
| involves Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking | ||||
| about with assembly. | ||||
|  | ||||
| ## Defining primitives (Java) | ||||
|  | ||||
| The reason I'm using the name _primitive_ comes from how much of my life is Java right now. For the | ||||
| most part I like Java, but I digress. In Java, there's a special name for some specific types of | ||||
| values: | ||||
|  | ||||
| > ``` | ||||
| > boolean char    byte | ||||
| > short   int     long | ||||
| > float   double | ||||
| > ``` | ||||
|  | ||||
| They are referred to as [primitives][java_primitive]. And relative to the other bits of Java, | ||||
| they have two unique features. First, they don't have to worry about the | ||||
| [billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions); | ||||
| primitives in Java can never be `null`. Second: *they can't have instance methods*. | ||||
| Remember that Rust program from earlier? Java has no idea what to do with it: | ||||
|  | ||||
| ```java | ||||
| class Main { | ||||
|     public static void main(String[] args) { | ||||
|         int x = 8; | ||||
|         System.out.println(x.toString()); // Triggers a compiler error | ||||
|     } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| The error is: | ||||
|  | ||||
| ``` | ||||
| Main.java:5: error: int cannot be dereferenced | ||||
|         System.out.println(x.toString()); | ||||
|                             ^ | ||||
| 1 error | ||||
| ``` | ||||
|  | ||||
| Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html) | ||||
| and things that inherit from it are pointers under the hood, and we have to dereference them before | ||||
| the fields and methods they define can be used. In contrast, _primitive types are just values_ - | ||||
| there's nothing to be dereferenced. In memory, they're just a sequence of bits. | ||||
|  | ||||
| If we really want, we can turn the `int` into an | ||||
| [`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then dereference | ||||
| it, but it's a bit wasteful: | ||||
|  | ||||
| ```java | ||||
| class Main { | ||||
|     public static void main(String[] args) { | ||||
|         int x = 8; | ||||
|         Integer y = Integer.valueOf(x); | ||||
|         System.out.println(y.toString()); | ||||
|     } | ||||
| } | ||||
| ``` | ||||
|  | ||||
| This creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we | ||||
| dereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit | ||||
| differently, but we have to dig into the low-level details to see it in action. | ||||
|  | ||||
| ## Low Level Handling of Primitives (C) | ||||
|  | ||||
| We first need to build a foundation for reading and understanding the assembly code the final answer | ||||
| requires. Let's begin with showing how the `C` language (and your computer) thinks about "primitive" | ||||
| values in memory: | ||||
|  | ||||
| ```c | ||||
| void my_function(int num) {} | ||||
|  | ||||
| int main() { | ||||
|     int x = 8; | ||||
|     my_function(x); | ||||
| } | ||||
| ``` | ||||
|  | ||||
| The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off the | ||||
| assembly-level code that's generated: <small>whose output has been lightly | ||||
| edited</small> | ||||
|  | ||||
| ```nasm | ||||
| main: | ||||
|         push    rbp | ||||
|         mov     rbp, rsp | ||||
|         sub     rsp, 16 | ||||
|  | ||||
|         ; We assign the value `8` to `x` here | ||||
|         mov     DWORD PTR [rbp-4], 8 | ||||
|  | ||||
|         ; And copy the bits making up `x` to a location | ||||
|         ; `my_function` can access (`edi`) | ||||
|         mov     eax, DWORD PTR [rbp-4] | ||||
|         mov     edi, eax | ||||
|  | ||||
|         ; Call `my_function` and give it control | ||||
|         call    my_function | ||||
|  | ||||
|         mov     eax, 0 | ||||
|         leave | ||||
|         ret | ||||
|  | ||||
| my_function: | ||||
|         push    rbp | ||||
|         mov     rbp, rsp | ||||
|  | ||||
|         ; Copy the bits out of the pre-determined location (`edi`) | ||||
|         ; to somewhere we can use | ||||
|         mov     DWORD PTR [rbp-4], edi | ||||
|         nop | ||||
|  | ||||
|         pop     rbp | ||||
|         ret | ||||
| ``` | ||||
|  | ||||
| At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction; | ||||
| nothing crazy. But to show how similar Rust is, let's take a look at our program translated from C | ||||
| to Rust: | ||||
|  | ||||
| ```rust | ||||
| fn my_function(x: i32) {} | ||||
|  | ||||
| fn main() { | ||||
|     let x = 8; | ||||
|     my_function(x) | ||||
| } | ||||
| ``` | ||||
|  | ||||
| And the assembly generated when we stick it in the | ||||
| [compiler explorer](https://godbolt.org/z/cAlmk0): <small>again, lightly | ||||
| edited</small> | ||||
|  | ||||
| ```nasm | ||||
| example::main: | ||||
|   push rax | ||||
|  | ||||
|   ; Look familiar? We're copying bits to a location for `my_function` | ||||
|   ; The compiler just optimizes out holding `x` in memory | ||||
|   mov edi, 8 | ||||
|  | ||||
|   ; Call `my_function` and give it control | ||||
|   call example::my_function | ||||
|  | ||||
|   pop rax | ||||
|   ret | ||||
|  | ||||
| example::my_function: | ||||
|   sub rsp, 4 | ||||
|  | ||||
|   ; And copying those bits again, just like in C | ||||
|   mov dword ptr [rsp], edi | ||||
|  | ||||
|   add rsp, 4 | ||||
|   ret | ||||
| ``` | ||||
|  | ||||
| The generated Rust assembly is functionally pretty close to the C assembly: _When working with | ||||
| primitives, we're just dealing with bits in memory_. | ||||
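
To make the "just bits in memory" claim a bit more concrete, here's a minimal sketch that asks Rust for the size and raw bytes of an `i32`. The helper name `raw_bytes` is made up for illustration, and picking the little-endian view via `to_le_bytes` is an arbitrary choice so the output doesn't depend on the machine:

```rust
use std::mem::size_of;

/// Hypothetical helper: the raw bytes of an `i32`, least-significant byte first.
fn raw_bytes(value: i32) -> [u8; 4] {
    value.to_le_bytes()
}

fn main() {
    // An `i32` occupies exactly four bytes: no object header, no pointer.
    println!("size of i32: {} bytes", size_of::<i32>());

    // The value 8 is nothing more than the bit pattern 0x00000008.
    println!("bytes of 8: {:?}", raw_bytes(8));
}
```

There's no wrapper object anywhere in sight; the four bytes are the whole story.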
|  | ||||
| In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to | ||||
| dereference. So what exactly is going on with this `.to_string()` function call? | ||||
|  | ||||
| ## impl primitive (and Python) | ||||
|  | ||||
| Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this | ||||
| together: _Rust has implementations for its primitive types._ That's right, `impl` blocks aren't | ||||
| only for `struct`s and `trait`s; primitives get them too. Don't believe me? Check out | ||||
| [u32](https://doc.rust-lang.org/std/primitive.u32.html), | ||||
| [f64](https://doc.rust-lang.org/std/primitive.f64.html) and | ||||
| [char](https://doc.rust-lang.org/std/primitive.char.html) as examples. | ||||
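
As a rough sketch of what that looks like in practice: the first three calls below use methods the standard library defines on primitives, and the last one uses a trait we attach to `i32` ourselves. The `Describe` trait is a hypothetical name invented for this example, not something from the standard library:

```rust
// Hypothetical trait, defined here only for illustration.
trait Describe {
    fn describe(&self) -> String;
}

// Our own `impl` block targeting a primitive type.
impl Describe for i32 {
    fn describe(&self) -> String {
        format!("the integer {}", self)
    }
}

fn main() {
    // Methods the standard library provides on primitives.
    println!("{}", 2u32.pow(10));
    println!("{}", 9.0f64.sqrt());
    println!("{}", 'a'.is_alphabetic());

    // A method we added ourselves through a trait.
    println!("{}", 8i32.describe());
}
```

The literals carry explicit type suffixes (`2u32`, `9.0f64`, `8i32`) so the compiler knows exactly which primitive's method we mean.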
|  | ||||
| But the really interesting bit is how Rust turns those `impl` blocks into assembly. Let's break out | ||||
| the [compiler explorer](https://godbolt.org/z/6LBEwq) once again: | ||||
|  | ||||
| ```rust | ||||
| pub fn main() { | ||||
|     8.to_string(); | ||||
| } | ||||
| ``` | ||||
|  | ||||
| And the interesting bits in the assembly: <small>heavily trimmed down</small> | ||||
|  | ||||
| ```nasm | ||||
| example::main: | ||||
|   sub rsp, 24 | ||||
|   mov rdi, rsp | ||||
|   lea rax, [rip + .Lbyte_str.u] | ||||
|   mov rsi, rax | ||||
|  | ||||
|   ; Cool stuff right here | ||||
|   call <T as alloc::string::ToString>::to_string@PLT | ||||
|  | ||||
|   mov rdi, rsp | ||||
|   call core::ptr::drop_in_place | ||||
|   add rsp, 24 | ||||
|   ret | ||||
| ``` | ||||
|  | ||||
| Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling | ||||
| `to_string()` as a function that exists all on its own, and giving it the instance of `8`**. Instead | ||||
| of thinking of the value 8 as an instance of `i32` and then peeking in to find the location of the | ||||
| function we want to call (like Java), we have a function that exists outside of the instance and | ||||
| just give that function the value `8`. | ||||
|  | ||||
| This is an incredibly technical detail, but the interesting idea I had was this: _if `to_string()` | ||||
| is a static function, can I refer to the unbound function and give it an instance?_ | ||||
|  | ||||
| Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link because I | ||||
| seriously love this thing): | ||||
|  | ||||
| ```rust | ||||
| struct MyVal { | ||||
|     x: u32 | ||||
| } | ||||
|  | ||||
| impl MyVal { | ||||
|     fn to_string(&self) -> String { | ||||
|         self.x.to_string() | ||||
|     } | ||||
| } | ||||
|  | ||||
| pub fn main() { | ||||
|     let my_val = MyVal { x: 8 }; | ||||
|  | ||||
|     // THESE ARE THE SAME | ||||
|     my_val.to_string(); | ||||
|     MyVal::to_string(&my_val); | ||||
| } | ||||
| ``` | ||||
|  | ||||
| Rust is totally fine with "binding" the function call to the instance, and just as fine with calling | ||||
| it as a static function that takes the instance as its first argument. | ||||
|  | ||||
| MIND == BLOWN. | ||||
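
The same trick should work on the primitive itself, not just on a wrapper struct. Here's a minimal sketch of the different spellings; the helper `stringify_all` is a made-up name that only exists to show the unbound function being passed around as a value:

```rust
/// Hypothetical helper: formats every number by handing the unbound
/// `i32::to_string` function straight to `map`.
fn stringify_all(values: &[i32]) -> Vec<String> {
    values.iter().map(i32::to_string).collect()
}

fn main() {
    // Method-call syntax: the compiler locates `to_string` for us.
    let bound = 8.to_string();

    // Path syntax: name the function, then pass the value explicitly.
    let via_type = i32::to_string(&8);
    let via_trait = ToString::to_string(&8);
    let fully_qualified = <i32 as ToString>::to_string(&8);

    // All four spellings produce the same string.
    assert_eq!(bound, via_type);
    assert_eq!(bound, via_trait);
    assert_eq!(bound, fully_qualified);

    println!("{:?}", stringify_all(&[1, 2, 3]));
}
```

Every form resolves to the same function at compile time; the dot syntax is just the most convenient way to write it.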
|  | ||||
| Python does the same thing. I can call a method bound to its instance, or call it as an unbound | ||||
| function and hand it the instance myself: | ||||
|  | ||||
| ```python | ||||
| class MyClass(): | ||||
|     x = 24 | ||||
|  | ||||
|     def my_function(self): | ||||
|         print(self.x) | ||||
|  | ||||
| m = MyClass() | ||||
|  | ||||
| m.my_function() | ||||
| MyClass.my_function(m) | ||||
| ``` | ||||
|  | ||||
| And Python tries to make you _think_ that primitives can have instance methods... | ||||
|  | ||||
| ```python | ||||
| >>> dir(8) | ||||
| ['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__', | ||||
| '__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__', | ||||
| ... | ||||
| '__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', | ||||
| ...] | ||||
|  | ||||
| >>> # Theoretically `8.__str__()` should exist, but: | ||||
|  | ||||
| >>> 8.__str__() | ||||
|   File "<stdin>", line 1 | ||||
|     8.__str__() | ||||
|              ^ | ||||
| SyntaxError: invalid syntax | ||||
|  | ||||
| >>> # It will run if we assign it first though: | ||||
| >>> x = 8 | ||||
| >>> x.__str__() | ||||
| '8' | ||||
| ``` | ||||
|  | ||||
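| ...but in practice it's a bit complicated: Python reads `8.` as the start of a floating-point | ||||
| literal, so the method call never gets a chance to parse. | ||||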
|  | ||||
| So while Python handles binding instance methods in a way similar to Rust, it's still not able to | ||||
| run the example we started with. | ||||
|  | ||||
| ## Conclusion | ||||
|  | ||||
| This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor | ||||
| details like primitives leads to really cool effects. Primitives are optimized like C in how they | ||||
| have a space-efficient memory layout, yet the language still has a lot of features I enjoy in Python | ||||
| (like calling a method either bound to an instance or as a plain function that takes the instance). | ||||
|  | ||||
| And when you put it together, there are areas where Rust does cool things the other languages in | ||||
| this post can't; as a quirky feature of Rust's type system, `8.to_string()` is actually valid code. | ||||
|  | ||||
| Now go forth and fool your friends into thinking you know assembly. This is all I've got. | ||||
|  | ||||
| [x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html | ||||
| [java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html | ||||
| [rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types | ||||
| [rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html | ||||
| @ -80,7 +80,7 @@ const config: Config = { | ||||
|     prism: { | ||||
|       theme: prismThemes.oneLight, | ||||
|       darkTheme: prismThemes.oneDark, | ||||
|       additionalLanguages: ['julia'] | ||||
|       additionalLanguages: ['java', 'julia', 'nasm'] | ||||
|     }, | ||||
|   } satisfies Preset.ThemeConfig, | ||||
|   plugins: [require.resolve('docusaurus-lunr-search')], | ||||
|  | ||||