Mirror of https://github.com/bspeice/speice.io (synced 2024-12-22 08:38:09 -05:00)

First posts from speice.io

Commit 853bf5ebd0 (parent 835d79f294)
blog/2018-05-28-hello/_article.md
@ -0,0 +1,38 @@
---
layout: post
title: "Hello!"
description: ""
category:
tags: []
---

I'll do what I can to keep this short; there are plenty of other things we both should be doing right
now.

If you're here for the bread pics, and to marvel at some other culinary side projects, I've got you
covered:

![Saturday Bread]({{ "/assets/images/2018-05-28-bread.jpg" | absolute_url }})

And no, I'm not posting pictures of earlier attempts that ended up turning into rocks in the oven.

Okay, just one:

![Bread as rock]({{ "/assets/images/2018-05-28-rocks.jpg" | absolute_url }})

If you're here to keep up with the man Bradlee Speice, I've got plenty of that too. Plus some
upcoming super-nerdy posts about how I'm changing the world.

And if you're not here for those things: I don't have a lot for you, sorry. But you're welcome to let
me know what needs to change.

I'm looking forward to making this a place to talk about what's going on in life; I hope you'll
stick it out with me. The best way to follow what's going on is on my [About](/about/) page, but if
you want the joy of clicking links, here are a few good ones:

- Email (people still use this?): [bradlee@speice.io](mailto:bradlee@speice.io)
- Mastodon (nerd Twitter): [@bradlee](https://mastodon.social/@bradlee)
- Chat (RiotIM): [@bspeice:matrix.com](https://matrix.to/#/@bspeice:matrix.com)
- The comments section (not for people with sanity intact): ↓↓↓

Thanks, and keep it amazing.
blog/2018-05-28-hello/bread.jpg
Binary file, 840 KiB
blog/2018-05-28-hello/index.mdx
@ -0,0 +1,25 @@
---
slug: 2018/05/hello
title: Hello!
date: 2018-05-28 12:00:00
authors: [bspeice]
tags: []
---

I'll do what I can to keep this short; there are plenty of other things we both should be doing right
now.

<!-- truncate -->

If you're here for the bread pics, and to marvel at some other culinary side projects, I've got you
covered:

![Saturday Bread](./bread.jpg)

And no, I'm not posting pictures of earlier attempts that ended up turning into rocks in the oven.

Okay, just one:

![Bread as rock](./rocks.jpg)

Thanks, and keep it amazing.
blog/2018-05-28-hello/rocks.jpg
Binary file, 926 KiB
blog/2018-06-25-dateutil-parser-to-rust/_article.md
@ -0,0 +1,177 @@
---
layout: post
title: "What I Learned: Porting Dateutil Parser to Rust"
description: ""
category:
tags: [dtparse, rust]
---

Hi. I'm Bradlee.

I've mostly been a lurker in Rust for a while, making a couple small contributions here and there.
So launching [dtparse](https://github.com/bspeice/dtparse) feels like a nice step towards becoming a
functioning member of society. But not too much, because then, you know, people start asking you to
pay bills, and ain't nobody got time for that.

But I built dtparse, and you can read about my thoughts on the process. Or don't. I won't tell you
what to do with your life (but you should totally keep reading).

# Slow down, what?

OK, fine, I guess I should start with _why_ someone would do this.

[Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The
standard library support for time in Python is kinda dope, but there are a lot of extras that go
into making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html)
module. `dateutil.parser` specifically is the code that takes all the super-weird time formats people
come up with and turns them into something actually useful.

Date/time parsing, it turns out, is just like everything else involving
[computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and
[time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it
feels like it shouldn't be that difficult to do, until you try to do it and realize that people
suck and this is why
[we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But
alas, we'll try to make contemporary art out of the rubble and give it a pretentious name like
_Time_.

![A gravel mound](/assets/images/2018-06-25-gravel-mound.jpg)

> [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php)

What makes `dateutil.parser` great is that there's a single function with a single argument that
drives what programmers interact with:
[`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258).
It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can
possibly do to make sense of your input" value. It doesn't expect much of you.

[And now it's in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332)

# Lost in Translation

Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm
admittedly hesitant to publish Rust code that's trying to be Python. Interestingly, Rust code can
actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better
experiences than
[this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us)
who attempted the same thing for D. These are the actual takeaways:

When transcribing code, **stay as close to the original library as possible**. I'm talking about
using the same variable names, same access patterns, the whole shebang. It's way too easy to make a
couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a verbatim
reference for what your code should be means you don't spend long debugging complicated logic;
you're mostly looking for typos.

Also, **don't use nice Rust things like enums**. While
[one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94),
I also managed to shoot myself in the foot a couple of times because `dateutil` stores AM/PM as a
boolean and I mixed up which was true and which was false (side note: AM is false, PM is true). In
general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate
the same functionality.

**Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So
when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out
[he's not quite right](https://github.com/rust-lang/rfcs/pull/243), and I'm OK with that. And while
`dateutil` is pretty well-behaved about not skipping multiple stack frames,
[130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865)
take a while to verify.

As another Python quirk, **be very careful about
[long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**.
I used to think that Python's whitespace was just there to get you to format your code correctly. I
think that no longer. It's way too easy to close a block too early and have incredibly weird issues
in the logic. Make sure you use an editor that displays indentation levels so you can keep things
straight.

**Rust macros are not free.** I originally had the
[main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217)
wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile.
After
[moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205),
compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code
to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually
functions that need to be liberated, man.

Finally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison,
see
[this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476)
and
[the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629).
I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be
added through macros or syntax extensions. Either way, they're expressive, save typing, and are
super-readable. Let's get more of that.

# Using a young language

Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On
more than one occasion though, I've had issues navigating the Rust ecosystem.

What I'll call the "canonical library" is still being built. In Python, if you need datetime
parsing, you use `dateutil`. If you want `decimal` types, it's already in the
[standard library](https://docs.python.org/3.6/library/decimal.html). While I might've gotten away
with `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to
the original library as possible**. Thus began my quest to find a decimal library in Rust. What I
quickly found was summarized in a comment:

> Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard.
>
> [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794)

In practice, this means that there are at least [4](https://crates.io/crates/bigdecimal)
[different](https://crates.io/crates/rust_decimal)
[implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate).
And that's a lot of decisions to worry about when all I'm thinking is "why can't
[calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing" and I'm forced to dig
through a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916)
[different](https://github.com/rust-lang/rfcs/issues/334)
[threads](https://github.com/rust-num/num/issues/8) to figure out if the library I'm looking at is
dead or just stable.

And even when the "canonical library" exists, there's no guarantee that it will be well-maintained.
[Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just
released version 0.4.4 like two days ago. Meanwhile,
[chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though
[there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I
know relatively little about it, but it appears that most of the release process is automated;
keeping that up to date should be a no-brainer.

## Trial Maintenance Policy

Given that "maintenance" is an
[oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/)
issue, I'm going to try out the following policy to keep things moving on `dtparse`:

1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure
   nobody's blocking on me.

2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the
   contributor to check in after two weeks, and close the issue without resolution if I hear nothing
   back after a month.

I think the second point has the potential to be a bit controversial, so I'm happy to receive
feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm
running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And
second: I don't mind keeping those requests going indefinitely. I just want to try and balance
keeping things moving with giving people the time they need.

I should also note that I'm still getting some best practices in place: CONTRIBUTING and
CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are
perfect.

# Roadmap and Conclusion

So if I've now built a `dateutil`-compatible parser, we're done, right? Of course not! That's not
nearly ambitious enough.

Ultimately, I'd love to have a library that's capable of parsing everything the Linux `date` command
can do (and not `date` on OSX, because seriously, BSD coreutils are the worst). I know Rust has a
coreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it
doesn't bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime)
could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each
other?

All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month
on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going
to take it hard.

And in the meantime, I'm looking forward to building more. Onwards.
blog/2018-06-25-dateutil-parser-to-rust/gravel-mound.jpg
Binary file, 165 KiB
blog/2018-06-25-dateutil-parser-to-rust/index.mdx
@ -0,0 +1,177 @@
---
slug: 2018/06/dateutil-parser-to-rust
title: "What I Learned: Porting Dateutil Parser to Rust"
date: 2018-06-25 12:00:00
authors: [bspeice]
tags: []
---

I've mostly been a lurker in Rust for a while, making a couple small contributions here and there.
So launching [dtparse](https://github.com/bspeice/dtparse) feels like a nice step towards becoming a
functioning member of society. But not too much, because then, you know, people start asking you to
pay bills, and ain't nobody got time for that.

<!-- truncate -->

But I built dtparse, and you can read about my thoughts on the process. Or don't. I won't tell you
what to do with your life (but you should totally keep reading).

## Slow down, what?

OK, fine, I guess I should start with _why_ someone would do this.

[Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The
standard library support for time in Python is kinda dope, but there are a lot of extras that go
into making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html)
module. `dateutil.parser` specifically is the code that takes all the super-weird time formats people
come up with and turns them into something actually useful.

Date/time parsing, it turns out, is just like everything else involving
[computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and
[time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it
feels like it shouldn't be that difficult to do, until you try to do it and realize that people
suck and this is why
[we can't have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But
alas, we'll try to make contemporary art out of the rubble and give it a pretentious name like
_Time_.

![A gravel mound](./gravel-mound.jpg)

> [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php)

What makes `dateutil.parser` great is that there's a single function with a single argument that
drives what programmers interact with:
[`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258).
It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can
possibly do to make sense of your input" value. It doesn't expect much of you.

[And now it's in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332)

## Lost in Translation

Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm
admittedly hesitant to publish Rust code that's trying to be Python. Interestingly, Rust code can
actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better
experiences than
[this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us)
who attempted the same thing for D. These are the actual takeaways:

When transcribing code, **stay as close to the original library as possible**. I'm talking about
using the same variable names, same access patterns, the whole shebang. It's way too easy to make a
couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a verbatim
reference for what your code should be means you don't spend long debugging complicated logic;
you're mostly looking for typos.

Also, **don't use nice Rust things like enums**. While
[one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94),
I also managed to shoot myself in the foot a couple of times because `dateutil` stores AM/PM as a
boolean and I mixed up which was true and which was false (side note: AM is false, PM is true). In
general, writing nice code _should not be a first-pass priority_ when you're just trying to recreate
the same functionality.

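To make the AM/PM pitfall concrete, here is a minimal sketch of the kind of 12-hour to 24-hour adjustment a port ends up transcribing. It assumes a `pm: bool` flag that follows the convention above (AM is `false`, PM is `true`). The name `adjust_ampm` is hypothetical and is not dtparse's actual API:

```rust
/// Hypothetical helper: convert a 12-hour clock value to 24-hour time.
/// Convention assumed here: `pm == false` means AM, `pm == true` means PM.
fn adjust_ampm(hour: u32, pm: bool) -> u32 {
    if hour < 12 && pm {
        hour + 12 // 1 PM..11 PM -> 13..23
    } else if hour == 12 && !pm {
        0 // 12 AM is midnight
    } else {
        hour // 1 AM..11 AM and 12 PM are unchanged
    }
}
```

If the meaning of the flag gets swapped, every afternoon time silently becomes a morning one, and nothing fails to compile. That is the kind of bug an `enum AmPm { Am, Pm }` would rule out once the straight port is working.
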
**Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So
when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out
[he's not quite right](https://github.com/rust-lang/rfcs/pull/243), and I'm OK with that. And while
`dateutil` is pretty well-behaved about not skipping multiple stack frames,
[130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865)
take a while to verify.

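For contrast, here is a minimal sketch of how a Python `try`/`except ValueError` around `int(token)` tends to translate. The error becomes an ordinary return value, which is either propagated with `?` or handled on the spot with `match`. The names `parse_day` and `day_or_default` are hypothetical and chosen only for illustration:

```rust
use std::num::ParseIntError;

/// Hypothetical helper. Python would raise `ValueError` from `int(token)` and
/// let it unwind; here the failure is a value the caller has to deal with.
fn parse_day(token: &str) -> Result<u32, ParseIntError> {
    let day: u32 = token.trim().parse()?; // `?` hands the error back to the caller early
    Ok(day)
}

/// Roughly `try: return int(token) except ValueError: return default`.
fn day_or_default(token: &str, default: u32) -> u32 {
    match parse_day(token) {
        Ok(day) => day,
        Err(_) => default,
    }
}
```

The upside is that every place an error can escape shows up in a function signature. The downside, when porting a long `try` block, is that you have to find and spell out each of those places yourself.
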
As another Python quirk, **be very careful about
[long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**.
I used to think that Python's whitespace was just there to get you to format your code correctly. I
think that no longer. It's way too easy to close a block too early and have incredibly weird issues
in the logic. Make sure you use an editor that displays indentation levels so you can keep things
straight.

**Rust macros are not free.** I originally had the
[main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217)
wrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile.
After
[moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205),
compile times dropped down to ~5 seconds. Turns out 150 lines \* 100 tests = a lot of redundant code
to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually
functions that need to be liberated, man.

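Here is a minimal sketch of that refactor, using a toy stand-in for the real test body. The shared logic lives in one ordinary function, which is compiled once. The macro shrinks to a few lines that only stamp out `#[test]` wrappers. The names `parse_ymd`, `check_date` and `date_test!` are hypothetical, and none of this is dtparse's actual test code:

```rust
/// Hypothetical toy parser, for illustration only: "YYYY-MM-DD" -> (y, m, d).
fn parse_ymd(input: &str) -> Option<(i32, u32, u32)> {
    let mut parts = input.split('-');
    let year: i32 = parts.next()?.parse().ok()?;
    let month: u32 = parts.next()?.parse().ok()?;
    let day: u32 = parts.next()?.parse().ok()?;
    // Reject trailing components such as "2018-06-25-extra".
    if parts.next().is_some() {
        return None;
    }
    Some((year, month, day))
}

/// The shared test body: an ordinary function, so it is compiled once
/// no matter how many tests call it.
fn check_date(input: &str, expected: (i32, u32, u32)) {
    assert_eq!(parse_ymd(input), Some(expected), "input: {}", input);
}

/// A short macro that only generates the `#[test]` wrappers.
macro_rules! date_test {
    ($name:ident, $input:expr, $expected:expr) => {
        #[test]
        fn $name() {
            check_date($input, $expected);
        }
    };
}

date_test!(parses_release_date, "2018-06-25", (2018, 6, 25));
date_test!(parses_new_year, "2019-01-01", (2019, 1, 1));
```

Each `date_test!` invocation now expands to a three-line function rather than a full copy of the test body. That smaller expansion is where the compile-time savings come from.
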
Finally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison,
see
[this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476)
and
[the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629).
I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be
added through macros or syntax extensions. Either way, they're expressive, save typing, and are
super-readable. Let's get more of that.

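To give a rough sense of the gap, here is a generic example (not the dateutil code linked above). A Python list comprehension and a dict comprehension appear as doc comments, each next to a close Rust iterator-chain equivalent. The function names are hypothetical:

```rust
use std::collections::HashMap;

/// Python: `[t.lower() for t in tokens if t]`
fn lowercase_nonempty(tokens: &[&str]) -> Vec<String> {
    tokens
        .iter()
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Python: `{name.lower(): i for i, name in enumerate(names)}`
fn index_by_name(names: &[&str]) -> HashMap<String, usize> {
    names
        .iter()
        .enumerate()
        .map(|(i, name)| (name.to_lowercase(), i))
        .collect()
}
```

The Rust versions work and are reasonably clear, but they take several lines where each Python original fits in one expression.
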
## Using a young language

Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On
more than one occasion though, I've had issues navigating the Rust ecosystem.

What I'll call the "canonical library" is still being built. In Python, if you need datetime
parsing, you use `dateutil`. If you want `decimal` types, it's already in the
[standard library](https://docs.python.org/3.6/library/decimal.html). While I might've gotten away
with `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to
the original library as possible**. Thus began my quest to find a decimal library in Rust. What I
quickly found was summarized in a comment:

> Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard.
>
> [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794)

In practice, this means that there are at least [4](https://crates.io/crates/bigdecimal)
[different](https://crates.io/crates/rust_decimal)
[implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate).
And that's a lot of decisions to worry about when all I'm thinking is "why can't
[calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing" and I'm forced to dig
through a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916)
[different](https://github.com/rust-lang/rfcs/issues/334)
[threads](https://github.com/rust-num/num/issues/8) to figure out if the library I'm looking at is
dead or just stable.

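A note on the `f64` option: the usual worry is binary rounding (in `f64`, `0.1 + 0.2 != 0.3`). If all you need is whole seconds plus microseconds, one way to avoid both floats and a decimal crate is to split on the decimal point and parse integers. This is a minimal sketch under that assumption. `parse_seconds` is a hypothetical helper, not dtparse's actual code:

```rust
/// Hypothetical helper: split a seconds token such as "12.5" into
/// (whole seconds, microseconds) using integer parsing only, so binary
/// floating-point rounding never enters the picture. Returns `None` on bad input.
fn parse_seconds(token: &str) -> Option<(u32, u32)> {
    match token.split_once('.') {
        // No fractional part: "7" -> (7, 0).
        None => {
            let secs: u32 = token.parse().ok()?;
            Some((secs, 0))
        }
        Some((whole, frac)) => {
            // Only plain ASCII digits are allowed after the point.
            if !frac.chars().all(|c| c.is_ascii_digit()) {
                return None;
            }
            // Keep at most six fractional digits and right-pad with zeros,
            // so ".5" means 500000 microseconds, not 5.
            let mut digits: String = frac.chars().take(6).collect();
            while digits.len() < 6 {
                digits.push('0');
            }
            let secs: u32 = whole.parse().ok()?;
            let micros: u32 = digits.parse().ok()?;
            Some((secs, micros))
        }
    }
}
```

Note that the seventh fractional digit onward is truncated, not rounded. Whether that matches the behavior you're porting is worth checking against the original library.
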
And even when the "canonical library" exists, there's no guarantee that it will be well-maintained.
[Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just
released version 0.4.4 like two days ago. Meanwhile,
[chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though
[there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I
know relatively little about it, but it appears that most of the release process is automated;
keeping that up to date should be a no-brainer.

### Trial Maintenance Policy

Given that "maintenance" is an
[oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/)
issue, I'm going to try out the following policy to keep things moving on `dtparse`:

1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure
   nobody's blocking on me.

2. To keep issues/PRs needing _contributor_ feedback moving, I'm going to (kindly) ask the
   contributor to check in after two weeks, and close the issue without resolution if I hear nothing
   back after a month.

I think the second point has the potential to be a bit controversial, so I'm happy to receive
feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm
running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And
second: I don't mind keeping those requests going indefinitely. I just want to try and balance
keeping things moving with giving people the time they need.

I should also note that I'm still getting some best practices in place: CONTRIBUTING and
CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are
perfect.

## Roadmap and Conclusion

So if I've now built a `dateutil`-compatible parser, we're done, right? Of course not! That's not
nearly ambitious enough.

Ultimately, I'd love to have a library that's capable of parsing everything the Linux `date` command
can do (and not `date` on OSX, because seriously, BSD coreutils are the worst). I know Rust has a
coreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it
doesn't bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime)
could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each
other?

All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month
on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going
to take it hard.

And in the meantime, I'm looking forward to building more. Onwards.
blog/2018-09-01-primitives-in-rust-are-weird/index.mdx
@ -0,0 +1,323 @@
---
slug: 2018/09/primitives-in-rust-are-weird
title: "Primitives in Rust are weird (and cool)"
date: 2018-09-01 12:00:00
authors: [bspeice]
tags: []
---

I wrote a really small Rust program a while back because I was curious. I was 100% convinced it
couldn't possibly run:

```rust
fn main() {
    println!("{}", 8.to_string())
}
```

And to my complete befuddlement, it compiled, ran, and produced a completely sensible output.

<!-- truncate -->

The reason I was so surprised has to do with how Rust treats a special category of things I'm going to
call _primitives_. In the current version of the Rust book, you'll see them referred to as
[scalars][rust_scalar], and in older versions they'll be called [primitives][rust_primitive], but
we're going to stick with the name _primitive_ for the time being. Explaining why this program is so
cool requires talking about a number of other programming languages, and keeping the terminology
consistent makes things easier.

**You've been warned:** this is going to be a tedious post about a relatively minor issue that
involves Java, Python, C, and x86 Assembly. And also me pretending like I know what I'm talking
about with assembly.

## Defining primitives (Java)

The reason I'm using the name _primitive_ comes from how much of my life is Java right now. For the
most part I like Java, but I digress. In Java, there's a special name for some specific types of
values:

> ```
> boolean char byte
> short int long
> float double
> ```

They are referred to as [primitives][java_primitive]. And relative to the other bits of Java,
|
||||||
|
they have two unique features. First, they don't have to worry about the
|
||||||
|
[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions);
|
||||||
|
primitives in Java can never be `null`. Second: *they can't have instance methods*.
|
||||||
|
Remember that Rust program from earlier? Java has no idea what to do with it:
|
||||||
|
|
||||||
|
```java
class Main {
    public static void main(String[] args) {
        int x = 8;
        System.out.println(x.toString()); // Triggers a compiler error
    }
}
```

The error is:

```
Main.java:5: error: int cannot be dereferenced
        System.out.println(x.toString());
                            ^
1 error
```

Specifically, Java's [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html)
and things that inherit from it are accessed through references, which are pointers under the hood,
and we have to dereference them before the fields and methods they define can be used. In contrast,
_primitive types are just values_; there's nothing to be dereferenced. In memory, they're just a
sequence of bits.

If we really want, we can turn the `int` into an
[`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then dereference
it, but it's a bit wasteful:

```java
class Main {
    public static void main(String[] args) {
        int x = 8;
        Integer y = Integer.valueOf(x);
        System.out.println(y.toString());
    }
}
```

This creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we
dereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit
differently, but we have to dig into the low-level details to see it in action.

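For contrast, here is a minimal Rust sketch (not from the original post) of the closest thing to Java's `int`-to-`Integer` boxing. A `Box<i32>` is a pointer to a heap allocation, and you have to follow that pointer to reach the number. A plain `i32` is only its four bytes. The `unbox` helper is a hypothetical name used for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: reading the number out of a `Box` means following the pointer (`*`).
fn unbox(boxed: Box<i32>) -> i32 {
    *boxed
}

fn main() {
    // A plain `i32` is just its bits: exactly four bytes, with no header or pointer.
    let x: i32 = 8;
    assert_eq!(size_of::<i32>(), 4);

    // Boxing puts a copy of the value on the heap; the `Box` itself is a single pointer.
    let y: Box<i32> = Box::new(x);
    assert_eq!(size_of::<Box<i32>>(), size_of::<usize>());

    // Both can produce a string; `Box<T>` forwards `Display` to the value it points at.
    println!("{} {}", x.to_string(), y.to_string());

    // Getting the number back out of the box requires a dereference.
    assert_eq!(unbox(y), 8);
}
```
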
## Low Level Handling of Primitives (C)

We first need to build a foundation for reading and understanding the assembly code the final answer
requires. Let's begin by showing how the `C` language (and your computer) thinks about "primitive"
values in memory:

```c
void my_function(int num) {}

int main() {
    int x = 8;
    my_function(x);
}
```

The [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off the
assembly-level code that's generated: <small>whose output has been lightly
edited</small>

```nasm
main:
    push rbp
    mov rbp, rsp
    sub rsp, 16

    ; We assign the value `8` to `x` here
    mov DWORD PTR [rbp-4], 8

    ; And copy the bits making up `x` to a location
    ; `my_function` can access (`edi`)
    mov eax, DWORD PTR [rbp-4]
    mov edi, eax

    ; Call `my_function` and give it control
    call my_function

    mov eax, 0
    leave
    ret

my_function:
    push rbp
    mov rbp, rsp

    ; Copy the bits out of the pre-determined location (`edi`)
    ; to somewhere we can use
    mov DWORD PTR [rbp-4], edi
    nop

    pop rbp
    ret
```

At a really low level of memory, we're copying bits around using the [`mov`][x86_guide] instruction;
nothing crazy. But to show how similar Rust is, let's take a look at our program translated from C
to Rust:

```rust
fn my_function(x: i32) {}

fn main() {
    let x = 8;
    my_function(x)
}
```

And the assembly generated when we stick it in the
[compiler explorer](https://godbolt.org/z/cAlmk0): <small>again, lightly
edited</small>

```nasm
example::main:
    push rax

    ; Look familiar? We're copying bits to a location for `my_function`
    ; The compiler just optimizes out holding `x` in memory
    mov edi, 8

    ; Call `my_function` and give it control
    call example::my_function

    pop rax
    ret

example::my_function:
    sub rsp, 4

    ; And copying those bits again, just like in C
    mov dword ptr [rsp], edi

    add rsp, 4
    ret
```

The generated Rust assembly is functionally pretty close to the C assembly: _When working with
primitives, we're just dealing with bits in memory_.

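Here is a small sketch that makes "just bits" concrete, assuming a recent stable Rust. An `i32` round-trips through exactly four bytes with `to_le_bytes`/`from_le_bytes`. Because `i32` is `Copy`, passing it to a function copies those bits, much like `mov edi, 8` above, and leaves the caller's value untouched. `add_one` and `round_trip` are hypothetical helpers.

```rust
/// Hypothetical helper: takes its argument by value, so it receives a copy of the bits.
fn add_one(mut num: i32) -> i32 {
    num += 1; // modifies only the callee's copy
    num
}

/// Hypothetical helper: splits an `i32` into its raw little-endian bytes and rebuilds it.
fn round_trip(value: i32) -> ([u8; 4], i32) {
    let bytes = value.to_le_bytes();
    (bytes, i32::from_le_bytes(bytes))
}

fn main() {
    let x: i32 = 8;

    // The value is nothing more than four bytes.
    let (bytes, rebuilt) = round_trip(x);
    assert_eq!(bytes, [8, 0, 0, 0]);
    assert_eq!(rebuilt, x);

    // `i32` is `Copy`: the callee gets its own copy, and `x` is still usable afterward.
    assert_eq!(add_one(x), 9);
    assert_eq!(x, 8);

    println!("{:?} -> {}", bytes, rebuilt);
}
```
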
In Java we have to dereference a pointer to call its functions; in Rust, there's no pointer to
dereference. So what exactly is going on with this `.to_string()` function call?

## impl primitive (and Python)

Now it's time to <strike>reveal my trap card</strike> show the revelation that tied all this
together: _Rust has implementations for its primitive types._ That's right, `impl` blocks aren't
only for `struct`s and `trait`s; primitives get them too. Don't believe me? Check out
[u32](https://doc.rust-lang.org/std/primitive.u32.html),
[f64](https://doc.rust-lang.org/std/primitive.f64.html) and
[char](https://doc.rust-lang.org/std/primitive.char.html) as examples.

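To see those `impl` blocks in use, here is a small sketch that calls a few inherent methods documented on the pages linked above: `u32::pow`, `u32::count_ones`, `f64::sqrt` and `char::is_alphabetic`. Note the type suffixes such as `2u32`. An unsuffixed `2.pow(10)` is rejected because the compiler can't tell which integer type's `impl` to look in. `primitive_methods` is a hypothetical helper name.

```rust
/// Hypothetical helper: calls a few inherent methods the standard library defines on primitives.
fn primitive_methods() -> (u32, u32, f64, bool) {
    // From `impl u32 { ... }`; the `u32` suffix tells the compiler which impl to use.
    let kib = 2u32.pow(10);
    let bits = 0b1011u32.count_ones();

    // From `impl f64 { ... }`.
    let root = 16.0f64.sqrt();

    // From `impl char { ... }`.
    let is_letter = 'x'.is_alphabetic();

    // Each call can also be written as an associated function: `Type::method(value)`.
    assert_eq!(u32::pow(2, 10), kib);
    assert_eq!(f64::sqrt(16.0), root);

    (kib, bits, root, is_letter)
}

fn main() {
    let (kib, bits, root, is_letter) = primitive_methods();
    println!("{kib} {bits} {root} {is_letter}");
}
```
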
But the really interesting bit is how Rust turns those `impl` blocks into assembly. Let's break out
the [compiler explorer](https://godbolt.org/z/6LBEwq) once again:

```rust
pub fn main() {
    // The semicolon discards the `String`, so `main` still returns `()`
    8.to_string();
}
```

And the interesting bits in the assembly: <small>heavily trimmed down</small>

```nasm
example::main:
    sub rsp, 24
    mov rdi, rsp
    lea rax, [rip + .Lbyte_str.u]
    mov rsi, rax

    ; Cool stuff right here
    call <T as alloc::string::ToString>::to_string@PLT

    mov rdi, rsp
    call core::ptr::drop_in_place
    add rsp, 24
    ret
```

Now, this assembly is a bit more complicated, but here's the big revelation: **we're calling
`to_string()` as a function that exists all on its own, and giving it the instance of `8`**. Instead
of thinking of the value 8 as an instance of `i32` (the type an unsuffixed integer literal falls back
to) and then peeking in to find the location of the function we want to call (like Java), we have a
function that exists outside of the instance and just give that function the value `8`. Since
`to_string()` takes `&self`, the value actually arrives by reference: `rsi` holds the address loaded
from `.Lbyte_str.u`, while `rdi` points at the stack space where the resulting `String` is written.

This is an incredibly technical detail, but the interesting idea I had was this: _if `to_string()`
is a static function, can I refer to the unbound function and give it an instance?_

Better explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link because I
seriously love this thing):

```rust
struct MyVal {
    x: u32
}

impl MyVal {
    fn to_string(&self) -> String {
        self.x.to_string()
    }
}

pub fn main() {
    let my_val = MyVal { x: 8 };

    // THESE ARE THE SAME
    my_val.to_string();
    MyVal::to_string(&my_val);
}
```

Rust is totally fine with "binding" the function call to the instance, and also with calling it as a
static function.

MIND == BLOWN.

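Because the unbound form is just a function, Rust lets you use it anywhere a function is expected. The same goes for primitives through the `ToString` trait. Below is a small sketch. It repeats the `MyVal` type from the example above so the block compiles on its own. `three_ways` is a hypothetical helper that turns the primitive `8` into a `String` with method-call syntax, with the trait function called directly, and with fully qualified syntax.

```rust
struct MyVal {
    x: u32,
}

impl MyVal {
    fn to_string(&self) -> String {
        self.x.to_string()
    }
}

/// Hypothetical helper: the primitive `8` turned into a `String` three ways.
fn three_ways() -> [String; 3] {
    [
        // Method-call syntax: the compiler passes `&8` as `&self` for us.
        8.to_string(),
        // The trait method called as an ordinary function, reference passed explicitly.
        ToString::to_string(&8),
        // Fully qualified syntax naming both the type (`i32`) and the trait.
        <i32 as ToString>::to_string(&8),
    ]
}

fn main() {
    // An unbound method stored in a variable, like a plain function pointer.
    let f: fn(&MyVal) -> String = MyVal::to_string;
    assert_eq!(f(&MyVal { x: 8 }), "8");

    // Unbound trait methods can be passed straight to `map`; each `&u32`
    // yielded by the iterator becomes the `&self` argument.
    let strings: Vec<String> = [1u32, 2, 3].iter().map(ToString::to_string).collect();
    assert_eq!(strings, ["1", "2", "3"]);

    println!("{:?} {:?}", three_ways(), strings);
}
```
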
Python does the same thing: I can call functions bound to their instances, and I can also call them
as unbound functions where I pass in the instance myself:

```python
class MyClass():
    x = 24

    def my_function(self):
        print(self.x)

m = MyClass()

m.my_function()
MyClass.my_function(m)
```

And Python tries to make you _think_ that primitives can have instance methods...

```python
>>> dir(8)
['__abs__', '__add__', '__and__', '__class__', '__cmp__', '__coerce__',
'__delattr__', '__div__', '__divmod__', '__doc__', '__float__', '__floordiv__',
...
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__',
...]

>>> # Theoretically `8.__str__()` should exist, but:

>>> 8.__str__()
  File "<stdin>", line 1
    8.__str__()
            ^
SyntaxError: invalid syntax

>>> # It will run if we assign it first though:
>>> x = 8
>>> x.__str__()
'8'
```

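...but in practice it's a bit complicated. The tokenizer reads `8.` as the start of a float literal,
so `8.__str__()` is rejected before Python ever goes looking for a method. Writing `(8).__str__()`
or `8 .__str__()` (note the space) works fine.
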
So while Python handles binding instance methods in a way similar to Rust, it's still not able to
run the example we started with as written.

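Rust's lexer doesn't trip over this. Roughly, the Rust reference's grammar only lexes `8.` as a float literal when the dot isn't immediately followed by another `.`, an `_`, or an identifier character. That means `8.to_string()` is simply the integer `8` followed by a method call. A quick sketch of the cases side by side:

```rust
fn main() {
    // The integer literal `8`, then a method call: no float literal involved.
    assert_eq!(8.to_string(), "8");

    // Parentheses work too, like Python's `(8).__str__()`.
    assert_eq!((8).to_string(), "8");

    // Float literals need digits (or a suffix) after the dot before a method call.
    // `Display` for floats prints `8`, not `8.0`.
    assert_eq!(8.0_f64.to_string(), "8");
    assert_eq!(8.5.to_string(), "8.5");
}
```
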
## Conclusion

This was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor
details like primitives leads to really cool effects. Primitives are optimized like C in how they
have a space-efficient memory layout, yet the language still has a lot of features I enjoy in Python
(like calling a function either bound to an instance or unbound, with the instance passed in
explicitly).

And when you put it together, there are areas where Rust does cool things that none of the other
languages here can; as a quirky feature of Rust's type system, `8.to_string()` is actually valid
code.

Now go forth and fool your friends into thinking you know assembly. This is all I've got.

[x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
[java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
[rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types
[rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html

@@ -80,7 +80,7 @@ const config: Config = {
     prism: {
       theme: prismThemes.oneLight,
       darkTheme: prismThemes.oneDark,
-      additionalLanguages: ['julia']
+      additionalLanguages: ['java', 'julia', 'nasm']
     },
   } satisfies Preset.ThemeConfig,
   plugins: [require.resolve('docusaurus-lunr-search')],
|