<!doctype html><html lang=en dir=ltr class="blog-wrapper blog-post-page plugin-blog plugin-id-default" data-has-hydrated=false><meta charset=UTF-8><meta name=generator content="Docusaurus v3.6.0"><title data-rh=true>What I learned porting dateutil to Rust | The Old Speice Guy</title><meta data-rh=true name=viewport content="width=device-width,initial-scale=1.0"><meta data-rh=true name=twitter:card content=summary_large_image><meta data-rh=true property=og:url content=https://speice.io/2018/06/dateutil-parser-to-rust><meta data-rh=true property=og:locale content=en><meta data-rh=true name=docusaurus_locale content=en><meta data-rh=true name=docusaurus_tag content=default><meta data-rh=true name=docsearch:language content=en><meta data-rh=true name=docsearch:docusaurus_tag content=default><meta data-rh=true property=og:title content="What I learned porting dateutil to Rust | The Old Speice Guy"><meta data-rh=true name=description content="I've mostly been a lurker in Rust for a while, making a couple small contributions here and there."><meta data-rh=true property=og:description content="I've mostly been a lurker in Rust for a while, making a couple small contributions here and there."><meta data-rh=true property=og:type content=article><meta data-rh=true property=article:published_time content=2018-06-25T12:00:00.000Z><link data-rh=true rel=icon href=/img/favicon.ico><link data-rh=true rel=canonical href=https://speice.io/2018/06/dateutil-parser-to-rust><link data-rh=true rel=alternate href=https://speice.io/2018/06/dateutil-parser-to-rust hreflang=en><link data-rh=true rel=alternate href=https://speice.io/2018/06/dateutil-parser-to-rust hreflang=x-default><script data-rh=true type=application/ld+json>{"@context":"https://schema.org","@id":"https://speice.io/2018/06/dateutil-parser-to-rust","@type":"BlogPosting","author":{"@type":"Person","name":"Bradlee Speice"},"dateModified":"2024-11-10T01:23:31.000Z","datePublished":"2018-06-25T12:00:00.000Z","description":"I've 
mostly been a lurker in Rust for a while, making a couple small contributions here and there.","headline":"What I learned porting dateutil to Rust","isPartOf":{"@id":"https://speice.io/","@type":"Blog","name":"Blog"},"keywords":[],"mainEntityOfPage":"https://speice.io/2018/06/dateutil-parser-to-rust","name":"What I learned porting dateutil to Rust","url":"https://speice.io/2018/06/dateutil-parser-to-rust"}</script><link rel=alternate type=application/rss+xml href=/rss.xml title="The Old Speice Guy RSS Feed"><link rel=alternate type=application/atom+xml href=/atom.xml title="The Old Speice Guy Atom Feed"><link rel=stylesheet href=https://cdn.jsdelivr.net/npm/katex@0.13.24/dist/katex.min.css integrity=sha384-odtC+0UGzzFL/6PNoE8rX/SPcQDXBJ+uRepguP4QkPCm2LBxH3FA3y+fKSiJ+AmM crossorigin><link rel=stylesheet href=/assets/css/styles.ae6ff4a3.css><script src=/assets/js/runtime~main.751b419d.js defer></script><script src=/assets/js/main.62ce6156.js defer></script><body class=navigation-with-keyboard><script>!function(){var t,e=function(){try{return new URLSearchParams(window.location.search).get("docusaurus-theme")}catch(t){}}()||function(){try{return window.localStorage.getItem("theme")}catch(t){}}();t=null!==e?e:"light",document.documentElement.setAttribute("data-theme",t)}(),function(){try{for(var[t,e]of new URLSearchParams(window.location.search).entries())if(t.startsWith("docusaurus-data-")){var a=t.replace("docusaurus-data-","data-");document.documentElement.setAttribute(a,e)}}catch(t){}}()</script><div id=__docusaurus><div role=region aria-label="Skip to main content"><a class=skipToContent_fXgn href=#__docusaurus_skipToContent_fallback>Skip to main content</a></div><nav aria-label=Main class="navbar navbar--fixed-top"><div class=navbar__inner><div class=navbar__items><button aria-label="Toggle navigation bar" aria-expanded=false class="navbar__toggle clean-btn" type=button><svg width=30 height=30 viewBox="0 0 30 30" aria-hidden=true><path stroke=currentColor 
stroke-linecap=round stroke-miterlimit=10 stroke-width=2 d="M4 7h22M4 15h22M4 23h22"/></svg></button><a
So launching <a href=https://github.com/bspeice/dtparse target=_blank rel="noopener noreferrer">dtparse</a> feels like a nice step towards becoming a
functioning member of society. But not too much, because then, you know, people start asking you to
pay bills, and ain't nobody got time for that.</p>
<p>But I built dtparse, and you can read about my thoughts on the process. Or don't. I won't tell you
what to do with your life (but you should totally keep reading).</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=slow-down-what>Slow down, what?<a href=#slow-down-what class=hash-link aria-label="Direct link to Slow down, what?" title="Direct link to Slow down, what?"></a></h2>
<p>OK, fine, I guess I should start with <em>why</em> someone would do this.</p>
<p><a href=https://github.com/dateutil/dateutil target=_blank rel="noopener noreferrer">Dateutil</a> is a Python library for handling dates. The
standard library support for time in Python is kinda dope, but there are a lot of extras that go
into making it useful beyond just the <a href=https://docs.python.org/3.6/library/datetime.html target=_blank rel="noopener noreferrer">datetime</a>
module. <code>dateutil.parser</code> specifically is code to take all the super-weird time formats people come
up with and turn them into something actually useful.</p>
<p>Date/time parsing, it turns out, is just like everything else involving
<a href=https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time target=_blank rel="noopener noreferrer">computers</a> and
<a href=https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time target=_blank rel="noopener noreferrer">time</a>: it
feels like it shouldn't be that difficult to do, until you try to do it, and you realize that people
suck and this is why
<a href=https://zachholman.com/talk/utc-is-enough-for-everyone-right target=_blank rel="noopener noreferrer">we can't have nice things</a>. But
alas, we'll try and make contemporary art out of the rubble and give it a pretentious name like
<em>Time</em>.</p>
<p><img decoding=async loading=lazy alt="A gravel mound" src=/assets/images/gravel-mound-4afad8bdb1cd6b0e40dd2fd41adca36f.jpg width=800 height=374 class=img_ev3q></p>
<blockquote>
<p><a href=https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php target=_blank rel="noopener noreferrer">Time</a></p>
</blockquote>
<p>What makes <code>dateutil.parser</code> great is that there's a single function with a single argument that
drives what programmers interact with:
<a href=https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258 target=_blank rel="noopener noreferrer"><code>parse(timestr)</code></a>.
It takes in the time as a string, and gives you back a reasonable "look, this is the best anyone can
possibly do to make sense of your input" value. It doesn't expect much of you.</p>
<p><a href=https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332 target=_blank rel="noopener noreferrer">And now it's in Rust.</a></p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=lost-in-translation>Lost in Translation<a href=#lost-in-translation class=hash-link aria-label="Direct link to Lost in Translation" title="Direct link to Lost in Translation"></a></h2>
<p>Having worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I'm
admittedly hesitant to publish Python code that's trying to be Rust. Interestingly, Rust code can
actually do a great job of mimicking Python. It's certainly not idiomatic Rust, but I've had better
experiences than
<a href="https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us" target=_blank rel="noopener noreferrer">this guy</a>
who attempted the same thing for D. These are the actual take-aways:</p>
<p>When transcribing code, <strong>stay as close to the original library as possible</strong>. I'm talking about
using the same variable names, same access patterns, the whole shebang. It's way too easy to make a
couple of typos, and all of a sudden your code blows up in new and exciting ways. Having a verbatim
reference for what your code should be means you spend less time debugging complicated logic and
more time just looking for typos.</p>
<p>Also, <strong>don't use nice Rust things like enums</strong>. While
<a href=https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94 target=_blank rel="noopener noreferrer">one time it worked out OK for me</a>,
I also managed to shoot myself in the foot a couple times because <code>dateutil</code> stores AM/PM as a
boolean and I mixed up which was true, and which was false (side note: AM is false, PM is true). In
general, writing nice code <em>should not be a first-pass priority</em> when you're just trying to recreate
the same functionality.</p>
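<p>To make that trade-off concrete, here's a minimal sketch. Everything in it is hypothetical
(<code>Meridiem</code>, <code>to_24h_bool</code>, and <code>to_24h</code> are names made up for illustration, not what
<code>dtparse</code> or <code>dateutil</code> actually use). The boolean version mirrors the "AM is false, PM is true"
convention; the enum version is the nicer refactor you might save for a second pass.</p>

```rust
/// Hypothetical stand-in for an AM/PM flag; not dtparse's actual type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Meridiem {
    Am,
    Pm,
}

/// Boolean form, following the convention noted above: false = AM, true = PM.
/// Easy to transcribe from the original, easy to flip by accident.
fn to_24h_bool(hour: u32, is_pm: bool) -> u32 {
    match (hour, is_pm) {
        (12, false) => 0,  // 12 AM is midnight
        (12, true) => 12,  // 12 PM is noon
        (h, false) => h,
        (h, true) => h + 12,
    }
}

/// Enum form: same logic, but the call site says which half of the day it means.
fn to_24h(hour: u32, meridiem: Meridiem) -> u32 {
    match (hour, meridiem) {
        (12, Meridiem::Am) => 0,
        (12, Meridiem::Pm) => 12,
        (h, Meridiem::Am) => h,
        (h, Meridiem::Pm) => h + 12,
    }
}
```

<p>Compare <code>to_24h_bool(7, true)</code> with <code>to_24h(7, Meridiem::Pm)</code>: same answer, but only one of them
makes you go look up which value means PM.</p>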
<p><strong>Exceptions are a pain.</strong> Make peace with it. Python code is just allowed to skip stack frames. So
when a co-worker told me "Rust is getting try-catch syntax" I properly freaked out. Turns out
<a href=https://github.com/rust-lang/rfcs/pull/243 target=_blank rel="noopener noreferrer">he's not quite right</a>, and I'm OK with that. And while
<code>dateutil</code> is pretty well-behaved about not skipping multiple stack frames,
<a href=https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865 target=_blank rel="noopener noreferrer">130-line try-except blocks</a>
take a while to verify.</p>
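<p>For a rough idea of what the translation looks like, here's a small sketch using <code>Result</code> and the
<code>?</code> operator. The <code>ParseError</code> type and both functions are invented for this example and are not
<code>dtparse</code>'s real API. Where the Python would <code>raise</code>, the Rust returns an <code>Err</code>, and each
caller has to pass it up explicitly, one stack frame at a time.</p>

```rust
/// Hypothetical error type for illustration; not dtparse's actual error enum.
#[derive(Debug, PartialEq)]
enum ParseError {
    NotANumber(String),
    OutOfRange(u32),
}

/// Parse a numeric token and check it against an inclusive upper bound.
/// Where Python would raise ValueError, this returns Err instead.
fn parse_bounded(token: &str, max: u32) -> Result<u32, ParseError> {
    let value = token
        .parse::<u32>()
        .map_err(|_| ParseError::NotANumber(token.to_string()))?;
    if value > max {
        return Err(ParseError::OutOfRange(value));
    }
    Ok(value)
}

/// Parse "HH:MM". Each `?` hands a failure to the caller, so no frame
/// gets skipped silently the way an uncaught exception would skip it.
fn parse_hour_minute(text: &str) -> Result<(u32, u32), ParseError> {
    let mut parts = text.splitn(2, ':');
    let hour = parse_bounded(parts.next().unwrap_or(""), 23)?;
    let minute = parse_bounded(parts.next().unwrap_or(""), 59)?;
    Ok((hour, minute))
}
```

<p>The upside is that every place an error can escape is marked with a <code>?</code>, which makes a long
translated block a little easier to audit against the original.</p>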
<p>As another Python quirk, <strong>be very careful about
<a href=https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568 target=_blank rel="noopener noreferrer">long nested if-elif-else blocks</a></strong>.
I used to think that Python's whitespace was just there to get you to format your code correctly. I
think that no longer. It's way too easy to close a block too early and have incredibly weird issues
in the logic. Make sure you use an editor that displays indentation levels so you can keep things
straight.</p>
<p><strong>Rust macros are not free.</strong> I originally had the
<a href=https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217 target=_blank rel="noopener noreferrer">main test body</a>
wrapped up in a macro using <a href=https://github.com/PyO3/PyO3 target=_blank rel="noopener noreferrer">pyo3</a>. It took two minutes to compile.
After
<a href=https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205 target=_blank rel="noopener noreferrer">moving things to a function</a>
compile times dropped down to ~5 seconds. Turns out 150 lines * 100 tests = a lot of redundant code
to be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually
functions that need to be liberated, man.</p>
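<p>The shape of that fix, as a sketch: keep the macro down to a thin wrapper and put the real work in a
function that only gets compiled once. The names here (<code>leading_hour</code>, <code>assert_hour</code>,
<code>hour_test!</code>) are hypothetical and much smaller than the actual test body, which calls into Python.</p>

```rust
/// Hypothetical helper holding the real logic. It is compiled once,
/// no matter how many test cases end up calling it.
fn leading_hour(input: &str) -> Option<u32> {
    input.split(':').next()?.trim().parse::<u32>().ok()
}

/// Shared assertion body; this is the part that used to live inside the macro.
fn assert_hour(input: &str, expected: u32) {
    assert_eq!(leading_hour(input), Some(expected), "input: {}", input);
}

/// The macro only stamps out a named test wrapper per case, so each
/// expansion adds a few lines of code instead of the whole body.
macro_rules! hour_test {
    ($name:ident, $input:expr, $expected:expr) => {
        #[test]
        fn $name() {
            assert_hour($input, $expected);
        }
    };
}

hour_test!(morning, "09:15", 9);
hour_test!(evening, "21:40", 21);
```

<p>Each <code>hour_test!</code> line still gives you a separately named test, but the compiler only has to chew
through the logic in <code>assert_hour</code> one time.</p>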
<p>Finally, <strong>I really miss list comprehensions and dictionary comprehensions.</strong> As a quick comparison,
see
<a href=https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476 target=_blank rel="noopener noreferrer">this dateutil code</a>
and
<a href=https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629 target=_blank rel="noopener noreferrer">the implementation in Rust</a>.
I probably wrote it wrong, and I'm sorry. Ultimately though, I hope that these comprehensions can be
added through macros or syntax extensions. Either way, they're expressive, save typing, and are
super-readable. Let's get more of that.</p>
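<p>For reference, the closest thing today is an iterator chain. Here's a sketch with made-up function
names and data (not the linked code) showing rough Rust counterparts of a list comprehension and
a filtering dictionary comprehension.</p>

```rust
use std::collections::HashMap;

/// Rough counterpart of the Python list comprehension
/// `[x * 2 for x in values if x > 0]`.
fn doubled_positives(values: &[i32]) -> Vec<i32> {
    values.iter().filter(|&&x| x > 0).map(|&x| x * 2).collect()
}

/// Rough counterpart of the Python dict comprehension
/// `{key: val for key, val in pairs if val is not None}`.
fn keep_present(pairs: &[(char, Option<usize>)]) -> HashMap<char, usize> {
    pairs
        .iter()
        // Drop the None entries and unwrap the rest in one step.
        .filter_map(|&(key, val)| val.map(|v| (key, v)))
        .collect()
}
```

<p>It works, and <code>collect()</code> picking the container from the return type is neat, but it's still more
ceremony than the one-liner it replaces.</p>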
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=using-a-young-language>Using a young language<a href=#using-a-young-language class=hash-link aria-label="Direct link to Using a young language" title="Direct link to Using a young language"></a></h2>
<p>Now, Rust is exciting and new, which means that there's opportunity to make a substantive impact. On
more than one occasion though, I've had issues navigating the Rust ecosystem.</p>
<p>What I'll call the "canonical library" is still being built. In Python, if you need datetime
parsing, you use <code>dateutil</code>. If you want <code>decimal</code> types, it's already in the
<a href=https://docs.python.org/3.6/library/decimal.html target=_blank rel="noopener noreferrer">standard library</a>. While I might've gotten away
with <code>f64</code>, <code>dateutil</code> uses decimals, and I wanted to follow the principle of <strong>staying as close to
the original library as possible</strong>. Thus began my quest to find a decimal library in Rust. What I
quickly found was summarized in a comment:</p>
<blockquote>
<p>Writing a BigDecimal is easy. Writing a <em>good</em> BigDecimal is hard.</p>
<p><a href=https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794 target=_blank rel="noopener noreferrer">-cmr</a></p>
</blockquote>
<p>In practice, this means that there are at least <a href=https://crates.io/crates/bigdecimal target=_blank rel="noopener noreferrer">4</a>
<a href=https://crates.io/crates/rust_decimal target=_blank rel="noopener noreferrer">different</a>
<a href=https://crates.io/crates/decimal target=_blank rel="noopener noreferrer">implementations</a> <a href=https://crates.io/crates/decimate target=_blank rel="noopener noreferrer">available</a>.
And that's a lot of decisions to worry about when all I'm thinking is "why can't
<a href=https://en.wikipedia.org/wiki/Calendar_reform target=_blank rel="noopener noreferrer">calendar reform</a> be a thing" and I'm forced to dig
through a <a href=https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916 target=_blank rel="noopener noreferrer">couple</a>
<a href=https://github.com/rust-lang/rfcs/issues/334 target=_blank rel="noopener noreferrer">different</a>
<a href=https://github.com/rust-num/num/issues/8 target=_blank rel="noopener noreferrer">threads</a> to figure out if the library I'm looking at is dead
or just stable.</p>
<p>And even when the "canonical library" exists, there's no guarantee that it will be well-maintained.
<a href=https://github.com/chronotope/chrono target=_blank rel="noopener noreferrer">Chrono</a> is the <em>de facto</em> date/time library in Rust, and just
released version 0.4.4 like two days ago. Meanwhile,
<a href=https://github.com/chronotope/chrono-tz target=_blank rel="noopener noreferrer">chrono-tz</a> appears to be dead in the water even though
<a href=https://github.com/chronotope/chrono-tz/issues/19 target=_blank rel="noopener noreferrer">there are people happy to help maintain it</a>. I
know relatively little about it, but it appears that most of the release process is automated;
keeping that up to date should be a no-brainer.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=trial-maintenance-policy>Trial Maintenance Policy<a href=#trial-maintenance-policy class=hash-link aria-label="Direct link to Trial Maintenance Policy" title="Direct link to Trial Maintenance Policy"></a></h2>
<p>Given that "maintenance" is an
<a href=https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/ target=_blank rel="noopener noreferrer">oft-discussed</a>
issue, I'm going to try out the following policy to keep things moving on <code>dtparse</code>:</p>
<ol>
<li>
<p>Issues/PRs needing <em>maintainer</em> feedback will be updated at least weekly. I want to make sure
nobody's blocking on me.</p>
</li>
<li>
<p>To keep issues/PRs needing <em>contributor</em> feedback moving, I'm going to (kindly) ask the
contributor to check in after two weeks, and close the issue without resolution if I hear nothing
back after a month.</p>
</li>
</ol>
<p>The second point I think has the potential to be a bit controversial, so I'm happy to receive
feedback on that. And if a contributor responds with "hey, still working on it, had a kid and I'm
running on 30 seconds of sleep a night," then first: congratulations on sustaining human life. And
second: I don't mind keeping those requests going indefinitely. I just want to try and balance
keeping things moving with giving people the time they need.</p>
<p>I should also note that I'm still getting some best practices in place - CONTRIBUTING and
CONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are
perfect.</p>
<h2 class="anchor anchorWithStickyNavbar_LWe7" id=roadmap-and-conclusion>Roadmap and Conclusion<a href=#roadmap-and-conclusion class=hash-link aria-label="Direct link to Roadmap and Conclusion" title="Direct link to Roadmap and Conclusion"></a></h2>
<p>So if I've now built a <code>dateutil</code>-compatible parser, we're done, right? Of course not! That's not
nearly ambitious enough.</p>
<p>Ultimately, I'd love to have a library that's capable of parsing everything the Linux <code>date</code> command
can parse (and not <code>date</code> on OSX, because seriously, BSD coreutils are the worst). I know Rust has a
coreutils rewrite going on, and <code>dtparse</code> would potentially be an interesting candidate since it
doesn't bring in a lot of extra dependencies. <a href=https://crates.io/crates/humantime target=_blank rel="noopener noreferrer"><code>humantime</code></a>
could help pick up some of the (current) slack in dtparse, so maybe we can share and care with each
other?</p>
<p>All in all, I'm mostly hoping that nobody's already done this and I haven't spent a bit over a month
on redundant code. So if it exists, tell me. I need to know, but be nice about it, because I'm going
to take it hard.</p>
<p>And in the meantime, I'm looking forward to building more. Onwards.</div></article><nav class="pagination-nav docusaurus-mt-lg" aria-label="Blog post page navigation"><a class="pagination-nav__link pagination-nav__link--prev" href=/2018/05/hello><div class=pagination-nav__sublabel>Older post</div><div class=pagination-nav__label>Hello!</div></a><a class="pagination-nav__link pagination-nav__link--next" href=/2018/09/primitives-in-rust-are-weird><div class=pagination-nav__sublabel>Newer post</div><div class=pagination-nav__label>Primitives in Rust are weird (and cool)</div></a></nav></main><div class="col col--2"><div class="tableOfContents_bqdL thin-scrollbar"><ul class="table-of-contents table-of-contents__left-border"><li><a href=#slow-down-what class="table-of-contents__link toc-highlight">Slow down, what?</a><li><a href=#lost-in-translation class="table-of-contents__link toc-highlight">Lost in Translation</a><li><a href=#using-a-young-language class="table-of-contents__link toc-highlight">Using a young language</a><li><a href=#trial-maintenance-policy class="table-of-contents__link toc-highlight">Trial Maintenance Policy</a><li><a href=#roadmap-and-conclusion class="table-of-contents__link toc-highlight">Roadmap and Conclusion</a></ul></div></div></div></div></div><footer class=footer><div class="container container-fluid"><div class="footer__bottom text--center"><div class=footer__copyright>Copyright © 2024 Bradlee Speice</div></div></div></footer></div>