"use strict";(self.webpackChunkspeice_io=self.webpackChunkspeice_io||[]).push([["1225"],{6760:function(e){e.exports=JSON.parse('{"archive":{"blogPosts":[{"id":"2011/11/webpack-industrial-complex","metadata":{"permalink":"/2011/11/webpack-industrial-complex","source":"@site/blog/2022-11-20-webpack-industrial-complex/index.mdx","title":"The webpack industrial complex","description":"This started because I wanted to build a synthesizer. Setting a goal of \\"digital DX7\\" was ambitious, but I needed something unrelated to the day job. Beyond that, working with audio seemed like a good challenge. I enjoy performance-focused code, and performance problems in audio are conspicuous. Building a web project was an obvious choice because of the web audio API documentation and independence from a large Digital Audio Workstation (DAW).","date":"2022-11-20T12:00:00.000Z","tags":[],"readingTime":4.51,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2011/11/webpack-industrial-complex","title":"The webpack industrial complex","date":"2022-11-20T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731274898000,"nextItem":{"title":"Release the GIL","permalink":"/2019/12/release-the-gil"}},"content":"This started because I wanted to build a synthesizer. Setting a goal of \\"digital DX7\\" was ambitious, but I needed something unrelated to the day job. Beyond that, working with audio seemed like a good challenge. I enjoy performance-focused code, and performance problems in audio are conspicuous. Building a web project was an obvious choice because of the web audio API documentation and independence from a large Digital Audio Workstation (DAW).\\n\\nThe project was soon derailed trying to sort out technical issues unrelated to the original purpose. Finding a resolution was a frustrating journey, and it\'s still not clear whether those problems were my fault. As a result, I\'m writing this to try making sense of it, as a case study/reference material, and to salvage something from the process.\\n\\n\x3c!-- truncate --\x3e\\n\\n## Starting strong\\n\\nThe sole starting requirement was to write everything in TypeScript. Not because of project scale, but because guardrails help with unfamiliar territory. Keeping that in mind, the first question was: how does one start a new project? All I actually need is \\"compile TypeScript, show it in a browser.\\"\\n\\nCreate React App (CRA) came to the rescue and the rest of that evening was a joy. My TypeScript/JavaScript skills were rusty, but the online documentation was helpful. I had never understood the appeal of JSX (why put a DOM in JavaScript?) until it made connecting an `onEvent` handler and a function easy.\\n\\nSome quick dimensional analysis later and there was a sine wave oscillator playing A=440 through the speakers. I specifically remember thinking \\"modern browsers are magical.\\"\\n\\n## Continuing on\\n\\nNow comes the first mistake: I began to worry about \\"scale\\" before encountering an actual problem. Rather than rendering audio in the main thread, why not use audio worklets and render in a background thread instead?\\n\\nThe first sign something was amiss came from the TypeScript compiler errors showing the audio worklet API [was missing](https://github.com/microsoft/TypeScript/issues/28308). After searching out Github issues and (unsuccessfully) tweaking the `.tsconfig` settings, I settled on installing a package and moving on.\\n\\nThe next problem came from actually using the API. Worklets must load from separate \\"modules,\\" but it wasn\'t clear how to guarantee the worklet code stayed separate from the application. I saw recommendations to use `new URL(, import.meta.url)` and it worked! Well, kind of:\\n\\n![Browser error](./video_mp2t.png)\\n\\nThat file has the audio processor code, so why does it get served with `Content-Type: video/mp2t`?\\n\\n## Floundering about\\n\\nNow comes the second mistake: even though I didn\'t understand the error, I ignored recommendations to [just use JavaScript](https://hackernoon.com/implementing-audioworklets-with-react-8a80a470474) and stuck by the original TypeScript requirement.\\n\\nI tried different project structures. Moving the worklet code to a new folder didn\'t help, nor did setting up a monorepo and placing it in a new package.\\n\\nI tried three different CRA tools - `react-app-rewired`, `craco`, `customize-react-app` - but got the same problem. Each has varying levels of compatibility with recent CRA versions, so it wasn\'t clear if I had the right solution but implemented it incorrectly. After attempting to eject the application and panicking after seeing the configuration, I abandoned that as well.\\n\\nI tried changing the webpack configuration: using [new](https://github.com/webpack/webpack/issues/11543#issuecomment-917673256) [loaders](https://github.com/popelenkow/worker-url), setting [asset rules](https://github.com/webpack/webpack/discussions/14093#discussioncomment-1257149), even [changing how webpack detects worker resources](https://github.com/webpack/webpack/issues/11543#issuecomment-826897590). In hindsight, entry points may have been the answer. But because CRA actively resists attempts to change its webpack configuration, and I couldn\'t find audio worklet examples in any other framework, I gave up.\\n\\nI tried so many application frameworks. Next.js looked like a good candidate, but added its own [bespoke webpack complexity](https://github.com/vercel/next.js/issues/24907) to the existing confusion. Astro had the best \\"getting started\\" experience, but I refuse to install an IDE-specific plugin. I first used Deno while exploring Lume, but it couldn\'t import the audio worklet types (maybe because of module compatibility?). Each framework was unique in its own way (shout-out to SvelteKit) but I couldn\'t figure out how to make them work.\\n\\n## Learning and reflecting\\n\\nI ended up using Vite and vite-plugin-react-pages to handle both \\"build the app\\" and \\"bundle worklets,\\" but the specific tool choice isn\'t important. Instead, the focus should be on lessons learned.\\n\\nFor myself:\\n\\n- I\'m obsessed with tooling, to the point it can derail the original goal. While it comes from a good place (for example: \\"types are awesome\\"), it can get in the way of more important work\\n- I tend to reach for online resources right after seeing a new problem. While finding help online is often faster, spending time understanding the problem would have been more productive than cycling through (often outdated) blog posts\\n\\nFor the tools:\\n\\n- Resource bundling is great and solves a genuine challenge. I\'ve heard too many horror stories of developers writing modules by hand to believe this is unnecessary complexity\\n- Webpack is a build system and modern frameworks are deeply dependent on it (hence the \\"webpack industrial complex\\"). While this often saves users from unnecessary complexity, there\'s no path forward if something breaks\\n- There\'s little ability to mix and match tools across frameworks. Next.js and Gatsby let users extend webpack, but because each framework adds its own modules, changes aren\'t portable. After spending a week looking at webpack, I had an example running with parcel in thirty minutes, but couldn\'t integrate it\\n\\nIn the end, learning new systems is fun, but a focus on tools that \\"just work\\" can leave users out in the cold if they break down."},{"id":"2019/12/release-the-gil","metadata":{"permalink":"/2019/12/release-the-gil","source":"@site/blog/2019-12-14-release-the-gil/index.mdx","title":"Release the GIL","description":"Complaining about the Global Interpreter Lock","date":"2019-12-14T12:00:00.000Z","tags":[],"readingTime":8.58,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/12/release-the-gil","title":"Release the GIL","date":"2019-12-14T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731207983000,"prevItem":{"title":"The webpack industrial complex","permalink":"/2011/11/webpack-industrial-complex"},"nextItem":{"title":"Binary format shootout","permalink":"/2019/09/binary-format-shootout"}},"content":"Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock)\\n(GIL) seems like a rite of passage for Python developers. It\'s easy to criticize a design decision\\nmade before multi-core CPU\'s were widely available, but the fact that it\'s still around indicates\\nthat it generally works [Good](https://wiki.c2.com/?PrematureOptimization)\\n[Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective\\nworkarounds; it\'s not hard to start a\\n[new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to\\nsynchronize code running in parallel.\\n\\nStill, wouldn\'t it be nice to have more than a single active interpreter thread? In an age of\\nasynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of\\nboth Python\'s productivity and the modern CPU\'s parallel capabilities.\\n\\n\x3c!-- truncate --\x3e\\n\\nPresented below are two strategies for releasing the GIL\'s icy grip without giving up on what makes\\nPython a nice language to start with. Bear in mind: these are just the tools, no claim is made about\\nwhether it\'s a good idea to use them. Very often, unlocking the GIL is an\\n[XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the\\nGIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at\\nthe expense of project complexity; messing with the GIL is ultimately messing with Python\'s memory\\nmodel.\\n\\n```python\\n%load_ext Cython\\nfrom numba import jit\\n\\nN = 1_000_000_000\\n```\\n\\n## Cython\\n\\nPut simply, [Cython](https://cython.org/) is a programming language that looks a lot like Python,\\ngets [transpiled](https://en.wikipedia.org/wiki/Source-to-source_compiler) to C/C++, and integrates\\nwell with the [CPython](https://en.wikipedia.org/wiki/CPython) API. It\'s great for building Python\\nwrappers to C and C++ libraries, writing optimized code for numerical processing, and tons more. And\\nwhen it comes to managing the GIL, there are two special features:\\n\\n- The `nogil`\\n [function annotation](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#declaring-a-function-as-callable-without-the-gil)\\n asserts that a Cython function is safe to use without the GIL, and compilation will fail if it\\n interacts with Python in an unsafe manner\\n- The `with nogil`\\n [context manager](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#releasing-the-gil)\\n explicitly unlocks the CPython GIL while active\\n\\nWhenever Cython code runs inside a `with nogil` block on a separate thread, the Python interpreter\\nis unblocked and allowed to continue work elsewhere. We\'ll define a \\"busy work\\" function that\\ndemonstrates this principle in action:\\n\\n```python\\n%%cython\\n\\n# Annotating a function with `nogil` indicates only that it is safe\\n# to call in a `with nogil` block. It *does not* release the GIL.\\ncdef unsigned long fibonacci(unsigned long n) nogil:\\n if n <= 1:\\n return n\\n\\n cdef unsigned long a = 0, b = 1, c = 0\\n\\n c = a + b\\n for _i in range(2, n):\\n a = b\\n b = c\\n c = a + b\\n\\n return c\\n\\n\\ndef cython_nogil(unsigned long n):\\n # Explicitly release the GIL while running `fibonacci`\\n with nogil:\\n value = fibonacci(n)\\n\\n return value\\n\\n\\ndef cython_gil(unsigned long n):\\n # Because the GIL is not explicitly released, it implicitly\\n # remains acquired when running the `fibonacci` function\\n return fibonacci(n)\\n```\\n\\nFirst, let\'s time how long it takes Cython to calculate the billionth Fibonacci number:\\n\\n```python\\n%%time\\n_ = cython_gil(N);\\n```\\n\\n>

\\n> CPU times: user 365 ms, sys: 0 ns, total: 365 ms\\n> Wall time: 372 ms\\n>

\\n\\n```python\\n%%time\\n_ = cython_nogil(N);\\n```\\n\\n>

\\n> CPU times: user 381 ms, sys: 0 ns, total: 381 ms\\n> Wall time: 388 ms\\n>

\\n\\nBoth versions (with and without GIL) take effectively the same amount of time to run. Even when\\nrunning this calculation in parallel on separate threads, it is expected that the run time will\\ndouble because only one thread can be active at a time:\\n\\n```python\\n%%time\\nfrom threading import Thread\\n\\n# Create the two threads to run on\\nt1 = Thread(target=cython_gil, args=[N])\\nt2 = Thread(target=cython_gil, args=[N])\\n# Start the threads\\nt1.start(); t2.start()\\n# Wait for the threads to finish\\nt1.join(); t2.join()\\n```\\n\\n>

\\n> CPU times: user 641 ms, sys: 5.62 ms, total: 647 ms\\n> Wall time: 645 ms\\n>

\\n\\nHowever, if the first thread releases the GIL, the second thread is free to acquire it and run in\\nparallel:\\n\\n```python\\n%%time\\n\\nt1 = Thread(target=cython_nogil, args=[N])\\nt2 = Thread(target=cython_gil, args=[N])\\nt1.start(); t2.start()\\nt1.join(); t2.join()\\n```\\n\\n>

\\n> CPU times: user 717 ms, sys: 372 \xb5s, total: 718 ms\\n> Wall time: 358 ms\\n>

\\n\\nBecause `user` time represents the sum of processing time on all threads, it doesn\'t change much.\\nThe [\\"wall time\\"](https://en.wikipedia.org/wiki/Elapsed_real_time) has been cut roughly in half\\nbecause each function is running simultaneously.\\n\\nKeep in mind that the **order in which threads are started** makes a difference!\\n\\n```python\\n%%time\\n\\n# Note that the GIL-locked version is started first\\nt1 = Thread(target=cython_gil, args=[N])\\nt2 = Thread(target=cython_nogil, args=[N])\\nt1.start(); t2.start()\\nt1.join(); t2.join()\\n```\\n\\n>

\\n> CPU times: user 667 ms, sys: 0 ns, total: 667 ms\\n> Wall time: 672 ms\\n>

\\n\\nEven though the second thread releases the GIL while running, it can\'t start until the first has\\ncompleted. Thus, the overall runtime is effectively the same as running two GIL-locked threads.\\n\\nFinally, be aware that attempting to unlock the GIL from a thread that doesn\'t own it will crash the\\n**interpreter**, not just the thread attempting the unlock:\\n\\n```python\\n%%cython\\n\\ncdef int cython_recurse(int n) nogil:\\n if n <= 0:\\n return 0\\n\\n with nogil:\\n return cython_recurse(n - 1)\\n\\ncython_recurse(2)\\n```\\n\\n>

\\n> Fatal Python error: PyEval_SaveThread: NULL tstate\\n> \\n> Thread 0x00007f499effd700 (most recent call first):\\n>   File \\"/home/bspeice/.virtualenvs/release-the-gil/lib/python3.7/site-packages/ipykernel/parentpoller.py\\", line 39 in run\\n>   File \\"/usr/lib/python3.7/threading.py\\", line 926 in _bootstrap_inner\\n>   File \\"/usr/lib/python3.7/threading.py\\", line 890 in _bootstrap\\n>

\\n\\nIn practice, avoiding this issue is simple. First, `nogil` functions probably shouldn\'t contain\\n`with nogil` blocks. Second, Cython can\\n[conditionally acquire/release](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#conditional-acquiring-releasing-the-gil)\\nthe GIL, so these conditions can be used to synchronize access. Finally, Cython\'s documentation for\\n[external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#acquiring-and-releasing-the-gil)\\ncontains more detail on how to safely manage the GIL.\\n\\nTo conclude: use Cython\'s `nogil` annotation to assert that functions are safe for calling when the\\nGIL is unlocked, and `with nogil` to actually unlock the GIL and run those functions.\\n\\n## Numba\\n\\nLike Cython, [Numba](https://numba.pydata.org/) is a \\"compiled Python.\\" Where Cython works by\\ncompiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_\\nat runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function\\nfirst compiles it to machine code before running. Calling the function a second time re-uses that\\nmachine code unless the argument types have changed.\\n\\nNumba works best when a `nopython=True` argument is added to the `@jit` decorator; functions\\ncompiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode\\navoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the\\n`@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython`\\nare separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to\\nrelease the lock, the GIL will remain locked if `nogil=False` (the default).\\n\\nLet\'s repeat the same experiment, this time using Numba instead of Cython:\\n\\n```python\\n# The `int` type annotation is only for humans and is ignored\\n# by Numba.\\n@jit(nopython=True, nogil=True)\\ndef numba_nogil(n: int) -> int:\\n if n <= 1:\\n return n\\n\\n a = 0\\n b = 1\\n\\n c = a + b\\n for _i in range(2, n):\\n a = b\\n b = c\\n c = a + b\\n\\n return c\\n\\n\\n# Run using `nopython` mode to receive a performance boost,\\n# but GIL remains locked due to `nogil=False` by default.\\n@jit(nopython=True)\\ndef numba_gil(n: int) -> int:\\n if n <= 1:\\n return n\\n\\n a = 0\\n b = 1\\n\\n c = a + b\\n for _i in range(2, n):\\n a = b\\n b = c\\n c = a + b\\n\\n return c\\n\\n\\n# Call each function once to force compilation; we don\'t want\\n# the timing statistics to include how long it takes to compile.\\nnumba_nogil(N)\\nnumba_gil(N);\\n```\\n\\nWe\'ll perform the same tests as above; first, figure out how long it takes the function to run:\\n\\n```python\\n%%time\\n_ = numba_gil(N)\\n```\\n\\n>

\\n> CPU times: user 253 ms, sys: 258 \xb5s, total: 253 ms\\n> Wall time: 251 ms\\n>

\\n\\n\\nAside: it\'s not immediately clear why Numba takes ~20% less time to run than Cython for code that should be\\neffectively identical after compilation.\\n\\n\\nWhen running two GIL-locked threads, the result (as expected) takes around twice as long to compute:\\n\\n```python\\n%%time\\nt1 = Thread(target=numba_gil, args=[N])\\nt2 = Thread(target=numba_gil, args=[N])\\nt1.start(); t2.start()\\nt1.join(); t2.join()\\n```\\n\\n>

\\n> CPU times: user 541 ms, sys: 3.96 ms, total: 545 ms\\n> Wall time: 541 ms\\n>

\\n\\nBut if the GIL-unlocking thread starts first, both threads run in parallel:\\n\\n```python\\n%%time\\nt1 = Thread(target=numba_nogil, args=[N])\\nt2 = Thread(target=numba_gil, args=[N])\\nt1.start(); t2.start()\\nt1.join(); t2.join()\\n```\\n\\n>

\\n> CPU times: user 551 ms, sys: 7.77 ms, total: 559 ms\\n> Wall time: 279 ms\\n>

\\n\\nJust like Cython, starting the GIL-locked thread first leads to poor performance:\\n\\n```python\\n%%time\\nt1 = Thread(target=numba_gil, args=[N])\\nt2 = Thread(target=numba_nogil, args=[N])\\nt1.start(); t2.start()\\nt1.join(); t2.join()\\n```\\n\\n>

\\n> CPU times: user 524 ms, sys: 0 ns, total: 524 ms\\n> Wall time: 522 ms\\n>

\\n\\nFinally, unlike Cython, Numba will unlock the GIL if and only if it is currently acquired;\\nrecursively calling `@jit(nogil=True)` functions is perfectly safe:\\n\\n```python\\nfrom numba import jit\\n\\n@jit(nopython=True, nogil=True)\\ndef numba_recurse(n: int) -> int:\\n if n <= 0:\\n return 0\\n\\n return numba_recurse(n - 1)\\n\\nnumba_recurse(2);\\n```\\n\\n## Conclusion\\n\\nBefore finishing, it\'s important to address pain points that will show up if these techniques are\\nused in a more realistic project:\\n\\nFirst, code running in a GIL-free context will likely also need non-trivial data structures;\\nGIL-free functions aren\'t useful if they\'re constantly interacting with Python objects whose access\\nrequires the GIL. Cython provides\\n[extension types](http://docs.cython.org/en/latest/src/tutorial/cdef_classes.html) and Numba\\nprovides a [`@jitclass`](https://numba.pydata.org/numba-doc/dev/user/jitclass.html) decorator to\\naddress this need.\\n\\nSecond, building and distributing applications that make use of Cython/Numba can be complicated.\\nCython packages require running the compiler, (potentially) linking/packaging external dependencies,\\nand distributing a binary wheel. Numba is generally simpler because the code being distributed is\\npure Python, but can be tricky since errors aren\'t detected until runtime.\\n\\nFinally, while unlocking the GIL is often a solution in search of a problem, both Cython and Numba\\nprovide tools to directly manage the GIL when appropriate. This enables true parallelism (not just\\n[concurrency](https://stackoverflow.com/a/1050257)) that is impossible in vanilla Python."},{"id":"2019/09/binary-format-shootout","metadata":{"permalink":"/2019/09/binary-format-shootout","source":"@site/blog/2019-09-28-binary-format-shootout/index.mdx","title":"Binary format shootout","description":"I\'ve found that in many personal projects,","date":"2019-09-28T12:00:00.000Z","tags":[],"readingTime":8.37,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/09/binary-format-shootout","title":"Binary format shootout","date":"2019-09-28T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731207983000,"prevItem":{"title":"Release the GIL","permalink":"/2019/12/release-the-gil"},"nextItem":{"title":"On building high performance systems","permalink":"/2019/06/high-performance-systems"}},"content":"I\'ve found that in many personal projects,\\n[analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis) is particularly deadly.\\nMaking good decisions in the beginning avoids pain and suffering later; if extra research prevents\\nfuture problems, I\'m happy to continue ~~procrastinating~~ researching indefinitely.\\n\\nSo let\'s say you\'re in need of a binary serialization format. Data will be going over the network,\\nnot just in memory, so having a schema document and code generation is a must. Performance is\\ncrucial, so formats that support zero-copy de/serialization are given priority. And the more\\nlanguages supported, the better; I use Rust, but can\'t predict what other languages this could\\ninteract with.\\n\\nGiven these requirements, the candidates I could find were:\\n\\n\x3c!-- truncate --\x3e\\n\\n1. [Cap\'n Proto](https://capnproto.org/) has been around the longest, and is the most established\\n2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler\\n encoding\\n3. [Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) has the simplest\\n encoding, but the Rust implementation is unmaintained\\n\\nAny one of these will satisfy the project requirements: easy to transmit over a network, reasonably\\nfast, and polyglot support. But how do you actually pick one? It\'s impossible to know what issues\\nwill follow that choice, so I tend to avoid commitment until the last possible moment.\\n\\nStill, a choice must be made. Instead of worrying about which is \\"the best,\\" I decided to build a\\nsmall proof-of-concept system in each format and pit them against each other. All code can be found\\nin the [repository](https://github.com/speice-io/marketdata-shootout) for this post.\\n\\nWe\'ll discuss more in detail, but a quick preview of the results:\\n\\n- Cap\'n Proto: Theoretically performs incredibly well, the implementation had issues\\n- Flatbuffers: Has some quirks, but largely lived up to its \\"zero-copy\\" promises\\n- SBE: Best median and worst-case performance, but the message structure has a limited feature set\\n\\n## Prologue: Binary Parsing with Nom\\n\\nOur benchmark system will be a simple data processor; given depth-of-book market data from\\n[IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema\\nformat, read it back, and calculate total size of stock traded and the lowest/highest quoted prices.\\nThis test isn\'t complex, but is representative of the project I need a binary format for.\\n\\nBut before we make it to that point, we have to actually read in the market data. To do so, I\'m\\nusing a library called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and\\nbrought some big changes, so this was an opportunity to build a non-trivial program and get\\nfamiliar.\\n\\nIf you don\'t already know about `nom`, it\'s a \\"parser generator\\". By combining different smaller\\nparsers, you can assemble a parser to handle complex structures without writing tedious code by\\nhand. For example, when parsing\\n[PCAP files](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3):\\n\\n```\\n 0 1 2 3\\n 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1\\n +---------------------------------------------------------------+\\n 0 | Block Type = 0x00000006 |\\n +---------------------------------------------------------------+\\n 4 | Block Total Length |\\n +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\\n 8 | Interface ID |\\n +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\\n12 | Timestamp (High) |\\n +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\\n16 | Timestamp (Low) |\\n +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\\n20 | Captured Len |\\n +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\\n24 | Packet Len |\\n +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+\\n | Packet Data |\\n | ... |\\n```\\n\\n...you can build a parser in `nom` that looks like\\n[this](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/parsers.rs#L59-L93):\\n\\n```rust\\nconst ENHANCED_PACKET: [u8; 4] = [0x06, 0x00, 0x00, 0x00];\\npub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> {\\n let (\\n remaining,\\n (\\n block_type,\\n block_len,\\n interface_id,\\n timestamp_high,\\n timestamp_low,\\n captured_len,\\n packet_len,\\n ),\\n ) = tuple((\\n tag(ENHANCED_PACKET),\\n le_u32,\\n le_u32,\\n le_u32,\\n le_u32,\\n le_u32,\\n le_u32,\\n ))(input)?;\\n\\n let (remaining, packet_data) = take(captured_len)(remaining)?;\\n Ok((remaining, packet_data))\\n}\\n```\\n\\nWhile this example isn\'t too interesting, more complex formats (like IEX market data) are where\\n[`nom` really shines](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs).\\n\\nUltimately, because the `nom` code in this shootout was the same for all formats, we\'re not too\\ninterested in its performance. Still, it\'s worth mentioning that building the market data parser was\\nactually fun; I didn\'t have to write tons of boring code by hand.\\n\\n## Cap\'n Proto\\n\\nNow it\'s time to get into the meaty part of the story. Cap\'n Proto was the first format I tried\\nbecause of how long it has supported Rust (thanks to [dwrensha](https://github.com/dwrensha) for\\nmaintaining the Rust port since\\n[2014!](https://github.com/capnproto/capnproto-rust/releases/tag/rustc-0.10)). However, I had a ton\\nof performance concerns once I started using it.\\n\\nTo serialize new messages, Cap\'n Proto uses a \\"builder\\" object. This builder allocates memory on the\\nheap to hold the message content, but because builders\\n[can\'t be re-used](https://github.com/capnproto/capnproto-rust/issues/111), we have to allocate a\\nnew buffer for every single message. I was able to work around this with a\\n[special builder](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51)\\nthat could re-use the buffer, but it required reading through Cap\'n Proto\'s\\n[benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156)\\nto find an example, and used\\n[`std::mem::transmute`](https://doc.rust-lang.org/std/mem/fn.transmute.html) to bypass Rust\'s borrow\\nchecker.\\n\\nThe process of reading messages was better, but still had issues. Cap\'n Proto has two message\\nencodings: a [\\"packed\\"](https://capnproto.org/encoding.html#packing) representation, and an\\n\\"unpacked\\" version. When reading \\"packed\\" messages, we need a buffer to unpack the message into\\nbefore we can use it; Cap\'n Proto allocates a new buffer for each message we unpack, and I wasn\'t\\nable to figure out a way around that. In contrast, the unpacked message format should be where Cap\'n\\nProto shines; its main selling point is that there\'s [no decoding step](https://capnproto.org/).\\nHowever, accomplishing zero-copy deserialization required code in the private API\\n([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)), and we allocate a vector on\\nevery read for the segment table.\\n\\nIn the end, I put in significant work to make Cap\'n Proto as fast as possible, but there were too\\nmany issues for me to feel comfortable using it long-term.\\n\\n## Flatbuffers\\n\\nThis is the new kid on the block. After a\\n[first attempt](https://github.com/google/flatbuffers/pull/3894) didn\'t pan out, official support\\nwas [recently launched](https://github.com/google/flatbuffers/pull/4898). Flatbuffers intends to\\naddress the same problems as Cap\'n Proto: high-performance, polyglot, binary messaging. The\\ndifference is that Flatbuffers claims to have a simpler wire format and\\n[more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html).\\n\\nOn the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is\\nnice, and unlike Cap\'n Proto, parsing messages was actually zero-copy and zero-allocation. However,\\nthere were still some issues.\\n\\nFirst, Flatbuffers (at least in Rust) can\'t handle nested vectors. This is a problem for formats\\nlike the following:\\n\\n```\\ntable Message {\\n symbol: string;\\n}\\ntable MultiMessage {\\n messages:[Message];\\n}\\n```\\n\\nWe want to create a `MultiMessage` which contains a vector of `Message`, and each `Message` itself\\ncontains a vector (the `string` type). I was able to work around this by\\n[caching `Message` elements](https://github.com/speice-io/marketdata-shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83)\\nin a `SmallVec` before building the final `MultiMessage`, but it was a painful process that I\\nbelieve contributed to poor serialization performance.\\n\\nSecond, streaming support in Flatbuffers seems to be something of an\\n[afterthought](https://github.com/google/flatbuffers/issues/3898). Where Cap\'n Proto in Rust handles\\nreading messages from a stream as part of the API, Flatbuffers just sticks a `u32` at the front of\\neach message to indicate the size. Not specifically a problem, but calculating message size without\\nthat tag is nigh on impossible.\\n\\nUltimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform\\nwell.\\n\\n## Simple Binary Encoding\\n\\nSupport for SBE was added by the author of one of my favorite\\n[Rust blog posts](https://web.archive.org/web/20190427124806/https://polysync.io/blog/session-types-for-hearty-codecs/).\\nI\'ve [talked previously](/2019/06/high-performance-systems) about how important\\nvariance is in high-performance systems, so it was encouraging to read about a format that\\n[directly addressed](https://github.com/real-logic/simple-binary-encoding/wiki/Why-Low-Latency) my\\nconcerns. SBE has by far the simplest binary format, but it does make some tradeoffs.\\n\\nBoth Cap\'n Proto and Flatbuffers use [message offsets](https://capnproto.org/encoding.html#structs)\\nto handle variable-length data, [unions](https://capnproto.org/language.html#unions), and various\\nother features. In contrast, messages in SBE are essentially\\n[just structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml);\\nvariable-length data is supported, but there\'s no union type.\\n\\nAs mentioned in the beginning, the Rust port of SBE works well, but is\\n[essentially unmaintained](https://users.rust-lang.org/t/zero-cost-abstraction-frontier-no-copy-low-allocation-ordered-decoding/11515/9).\\nHowever, if you don\'t need union types, and can accept that schemas are XML documents, it\'s still\\nworth using. SBE\'s implementation had the best streaming support of all formats I tested, and\\ndoesn\'t trigger allocation during de/serialization.\\n\\n## Results\\n\\nAfter building a test harness\\n[for](https://github.com/speice-io/marketdata-shootout/blob/master/src/capnp_runner.rs)\\n[each](https://github.com/speice-io/marketdata-shootout/blob/master/src/flatbuffers_runner.rs)\\n[format](https://github.com/speice-io/marketdata-shootout/blob/master/src/sbe_runner.rs), it was\\ntime to actually take them for a spin. I used\\n[this script](https://github.com/speice-io/marketdata-shootout/blob/master/run_shootout.sh) to run\\nthe benchmarks, and the raw results are\\n[here](https://github.com/speice-io/marketdata-shootout/blob/master/shootout.csv). All data reported\\nbelow is the average of 10 runs on a single day of IEX data. Results were validated to make sure\\nthat each format parsed the data correctly.\\n\\n### Serialization\\n\\nThis test measures, on a\\n[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L268-L272),\\nhow long it takes to serialize the IEX message into the desired format and write to a pre-allocated\\nbuffer.\\n\\n| Schema | Median | 99th Pctl | 99.9th Pctl | Total |\\n| :------------------- | :----- | :-------- | :---------- | :----- |\\n| Cap\'n Proto Packed | 413ns | 1751ns | 2943ns | 14.80s |\\n| Cap\'n Proto Unpacked | 273ns | 1828ns | 2836ns | 10.65s |\\n| Flatbuffers | 355ns | 2185ns | 3497ns | 14.31s |\\n| SBE | 91ns | 1535ns | 2423ns | 3.91s |\\n\\n### Deserialization\\n\\nThis test measures, on a\\n[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L294-L298),\\nhow long it takes to read the previously-serialized message and perform some basic aggregation. The\\naggregation code is the same for each format, so any performance differences are due solely to the\\nformat implementation.\\n\\n| Schema | Median | 99th Pctl | 99.9th Pctl | Total |\\n| :------------------- | :----- | :-------- | :---------- | :----- |\\n| Cap\'n Proto Packed | 539ns | 1216ns | 2599ns | 18.92s |\\n| Cap\'n Proto Unpacked | 366ns | 737ns | 1583ns | 12.32s |\\n| Flatbuffers | 173ns | 421ns | 1007ns | 6.00s |\\n| SBE | 116ns | 286ns | 659ns | 4.05s |\\n\\n## Conclusion\\n\\nBuilding a benchmark turned out to be incredibly helpful in making a decision; because a \\"union\\"\\ntype isn\'t important to me, I can be confident that SBE best addresses my needs.\\n\\nWhile SBE was the fastest in terms of both median and worst-case performance, its worst case\\nperformance was proportionately far higher than any other format. It seems to be that\\nde/serialization time scales with message size, but I\'ll need to do some more research to understand\\nwhat exactly is going on."},{"id":"2019/06/high-performance-systems","metadata":{"permalink":"/2019/06/high-performance-systems","source":"@site/blog/2019-06-31-high-performance-systems/index.mdx","title":"On building high performance systems","description":"Prior to working in the trading industry, my assumption was that High Frequency Trading (HFT) is","date":"2019-07-01T12:00:00.000Z","tags":[],"readingTime":12.175,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/06/high-performance-systems","title":"On building high performance systems","date":"2019-07-01T12:00:00.000Z","last_updated":{"date":"2019-09-21T12:00:00.000Z"},"authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731207625000,"prevItem":{"title":"Binary format shootout","permalink":"/2019/09/binary-format-shootout"},"nextItem":{"title":"Making bread","permalink":"/2019/05/making-bread"}},"content":"Prior to working in the trading industry, my assumption was that High Frequency Trading (HFT) is\\nmade up of people who have access to secret techniques mortal developers could only dream of. There\\nhad to be some secret art that could only be learned if one had an appropriately tragic backstory.\\n\\n\x3c!-- truncate --\x3e\\n\\n![Kung Fu fight](./kung-fu.webp)\\n\\n> How I assumed HFT people learn their secret techniques\\n\\nHow else do you explain people working on systems that complete the round trip of market data in to\\norders out (a.k.a. tick-to-trade) consistently within\\n[750-800 nanoseconds](https://stackoverflow.com/a/22082528/1454178)? In roughly the time it takes a\\ncomputer to access\\n[main memory 8 times](https://people.eecs.berkeley.edu/~rcs/research/interactive_latency.html),\\ntrading systems are capable of reading the market data packets, deciding what orders to send, doing\\nrisk checks, creating new packets for exchange-specific protocols, and putting those packets on the\\nwire.\\n\\nHaving now worked in the trading industry, I can confirm the developers aren\'t super-human; I\'ve\\nmade some simple mistakes at the very least. Instead, what shows up in public discussions is that\\nphilosophy, not technique, separates high-performance systems from everything else.\\nPerformance-critical systems don\'t rely on \\"this one cool C++ optimization trick\\" to make code fast\\n(though micro-optimizations have their place); there\'s a lot more to worry about than just the code\\nwritten for the project.\\n\\nThe framework I\'d propose is this: **If you want to build high-performance systems, focus first on\\nreducing performance variance** (reducing the gap between the fastest and slowest runs of the same\\ncode), **and only look at average latency once variance is at an acceptable level**.\\n\\nDon\'t get me wrong, I\'m a much happier person when things are fast. Computer goes from booting in 20\\nseconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes\\na full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a\\nweek is the same in each situation, but you\'re painfully aware of that minute when it happens. When\\nit comes to code, the principal is the same: speeding up a function by an average of 10 milliseconds\\ndoesn\'t mean much if there\'s a 100ms difference between your fastest and slowest runs. When\\nperformance matters, you need to respond quickly _every time_, not just in aggregate.\\nHigh-performance systems should first optimize for time variance. Once you\'re consistent at the time\\nscale you care about, then focus on improving average time.\\n\\nThis focus on variance shows up all the time in industry too (emphasis added in all quotes below):\\n\\n- In [marketing materials](https://business.nasdaq.com/market-tech/marketplaces/trading) for\\n NASDAQ\'s matching engine, the most performance-sensitive component of the exchange, dependability\\n is highlighted in addition to instantaneous metrics:\\n\\n > Able to **consistently sustain** an order rate of over 100,000 orders per second at sub-40\\n > microsecond average latency\\n\\n- The [Aeron](https://github.com/real-logic/aeron) message bus has this to say about performance:\\n\\n > Performance is the key focus. Aeron is designed to be the highest throughput with the lowest and\\n > **most predictable latency possible** of any messaging system\\n\\n- The company PolySync, which is working on autonomous vehicles,\\n [mentions why](https://polysync.io/blog/session-types-for-hearty-codecs/) they picked their\\n specific messaging format:\\n\\n > In general, high performance is almost always desirable for serialization. But in the world of\\n > autonomous vehicles, **steady timing performance is even more important** than peak throughput.\\n > This is because safe operation is sensitive to timing outliers. Nobody wants the system that\\n > decides when to slam on the brakes to occasionally take 100 times longer than usual to encode\\n > its commands.\\n\\n- [Solarflare](https://solarflare.com/), which makes highly-specialized network hardware, points out\\n variance (jitter) as a big concern for\\n [electronic trading](https://solarflare.com/electronic-trading/):\\n > The high stakes world of electronic trading, investment banks, market makers, hedge funds and\\n > exchanges demand the **lowest possible latency and jitter** while utilizing the highest\\n > bandwidth and return on their investment.\\n\\nAnd to further clarify: we\'re not discussing _total run-time_, but variance of total run-time. There\\nare situations where it\'s not reasonably possible to make things faster, and you\'d much rather be\\nconsistent. For example, trading firms use\\n[wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because\\nthe speed of light through air is faster than through fiber-optic cables. There\'s still at _absolute\\nminimum_ a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between,\\nsay,\\n[Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago).\\nIf a trading system in Chicago calls the function for \\"send order to Tokyo\\" and waits to see if a\\ntrade occurs, there\'s a physical limit to how long that will take. In this situation, the focus is\\non keeping variance of _additional processing_ to a minimum, since speed of light is the limiting\\nfactor.\\n\\nSo how does one go about looking for and eliminating performance variance? To tell the truth, I\\ndon\'t think a systematic answer or flow-chart exists. There\'s no substitute for (A) building a deep\\nunderstanding of the entire technology stack, and (B) actually measuring system performance (though\\n(C) watching a lot of [CppCon](https://www.youtube.com/channel/UCMlGfpWw-RUdWX_JbLCukXg) videos for\\ninspiration never hurt). Even then, every project cares about performance to a different degree; you\\nmay need to build an entire\\n[replica production system](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=3015) to\\naccurately benchmark at nanosecond precision, or you may be content to simply\\n[avoid garbage collection](https://www.youtube.com/watch?v=BD9cRbxWQx8&feature=youtu.be&t=1335) in\\nyour Java code.\\n\\nEven though everyone has different needs, there are still common things to look for when trying to\\nisolate and eliminate variance. In no particular order, these are my focus areas when thinking about\\nhigh-performance systems:\\n\\n**Update 2019-09-21**: Added notes on `isolcpus` and `systemd` affinity.\\n\\n## Language-specific\\n\\n**Garbage Collection**: How often does garbage collection happen? When is it triggered? What are the\\nimpacts?\\n\\n- [In Python](https://rushter.com/blog/python-garbage-collector/), individual objects are collected\\n if the reference count reaches 0, and each generation is collected if\\n `num_alloc - num_dealloc > gc_threshold` whenever an allocation happens. The GIL is acquired for\\n the duration of generational collection.\\n- Java has\\n [many](https://docs.oracle.com/en/java/javase/12/gctuning/parallel-collector1.html#GUID-DCDD6E46-0406-41D1-AB49-FB96A50EB9CE)\\n [different](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector.html#GUID-ED3AB6D3-FD9B-4447-9EDF-983ED2F7A573)\\n [collection](https://docs.oracle.com/en/java/javase/12/gctuning/garbage-first-garbage-collector-tuning.html#GUID-90E30ACA-8040-432E-B3A0-1E0440AB556A)\\n [algorithms](https://docs.oracle.com/en/java/javase/12/gctuning/z-garbage-collector1.html#GUID-A5A42691-095E-47BA-B6DC-FB4E5FAA43D0)\\n to choose from, each with different characteristics. The default algorithms (Parallel GC in Java\\n 8, G1 in Java 9) freeze the JVM while collecting, while more recent algorithms\\n ([ZGC](https://wiki.openjdk.java.net/display/zgc) and\\n [Shenandoah](https://wiki.openjdk.java.net/display/shenandoah)) are designed to keep \\"stop the\\n world\\" to a minimum by doing collection work in parallel.\\n\\n**Allocation**: Every language has a different way of interacting with \\"heap\\" memory, but the\\nprinciple is the same: running the allocator to allocate/deallocate memory takes time that can often\\nbe put to better use. Understanding when your language interacts with the allocator is crucial, and\\nnot always obvious. For example: C++ and Rust don\'t allocate heap memory for iterators, but Java\\ndoes (meaning potential GC pauses). Take time to understand heap behavior (I made a\\n[a guide for Rust](/2019/02/understanding-allocations-in-rust)), and look into alternative\\nallocators ([jemalloc](http://jemalloc.net/),\\n[tcmalloc](https://gperftools.github.io/gperftools/tcmalloc.html)) that might run faster than the\\noperating system default.\\n\\n**Data Layout**: How your data is arranged in memory matters;\\n[data-oriented design](https://www.youtube.com/watch?v=yy8jQgmhbAU) and\\n[cache locality](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=1185) can have huge\\nimpacts on performance. The C family of languages (C, value types in C#, C++) and Rust all have\\nguarantees about the shape every object takes in memory that others (e.g. Java and Python) can\'t\\nmake. [Cachegrind](http://valgrind.org/docs/manual/cg-manual.html) and kernel\\n[perf](https://perf.wiki.kernel.org/index.php/Main_Page) counters are both great for understanding\\nhow performance relates to memory layout.\\n\\n**Just-In-Time Compilation**: Languages that are compiled on the fly (LuaJIT, C#, Java, PyPy) are\\ngreat because they optimize your program for how it\'s actually being used, rather than how a\\ncompiler expects it to be used. However, there\'s a variance problem if the program stops executing\\nwhile waiting for translation from VM bytecode to native code. As a remedy, many languages support\\nahead-of-time compilation in addition to the JIT versions\\n([CoreRT](https://github.com/dotnet/corert) in C# and [GraalVM](https://www.graalvm.org/) in Java).\\nOn the other hand, LLVM supports\\n[Profile Guided Optimization](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization),\\nwhich theoretically brings JIT benefits to non-JIT languages. Finally, be careful to avoid comparing\\napples and oranges during benchmarks; you don\'t want your code to suddenly speed up because the JIT\\ncompiler kicked in.\\n\\n**Programming Tricks**: These won\'t make or break performance, but can be useful in specific\\ncircumstances. For example, C++ can use\\n[templates instead of branches](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=1206)\\nin critical sections.\\n\\n## Kernel\\n\\nCode you wrote is almost certainly not the _only_ code running on your hardware. There are many ways\\nthe operating system interacts with your program, from interrupts to system calls, that are\\nimportant to watch for. These are written from a Linux perspective, but Windows does typically have\\nequivalent functionality.\\n\\n**Scheduling**: The kernel is normally free to schedule any process on any core, so it\'s important\\nto reserve CPU cores exclusively for the important programs. There are a few parts to this: first,\\nlimit the CPU cores that non-critical processes are allowed to run on by excluding cores from\\nscheduling\\n([`isolcpus`](https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html)\\nkernel command-line option), or by setting the `init` process CPU affinity\\n([`systemd` example](https://access.redhat.com/solutions/2884991)). Second, set critical processes\\nto run on the isolated cores by setting the\\n[processor affinity](https://en.wikipedia.org/wiki/Processor_affinity) using\\n[taskset](https://linux.die.net/man/1/taskset). Finally, use\\n[`NO_HZ`](https://github.com/torvalds/linux/blob/master/Documentation/timers/NO_HZ.txt) or\\n[`chrt`](https://linux.die.net/man/1/chrt) to disable scheduling interrupts. Turning off\\nhyper-threading is also likely beneficial.\\n\\n**System calls**: Reading from a UNIX socket? Writing to a file? In addition to not knowing how long\\nthe I/O operation takes, these all trigger expensive\\n[system calls (syscalls)](https://en.wikipedia.org/wiki/System_call). To handle these, the CPU must\\n[context switch](https://en.wikipedia.org/wiki/Context_switch) to the kernel, let the kernel\\noperation complete, then context switch back to your program. We\'d rather keep these\\n[to a minimum](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript) (see\\ntimestamp 18:20). [Strace](https://linux.die.net/man/1/strace) is your friend for understanding when\\nand where syscalls happen.\\n\\n**Signal Handling**: Far less likely to be an issue, but signals do trigger a context switch if your\\ncode has a handler registered. This will be highly dependent on the application, but you can\\n[block signals](https://www.linuxprogrammingblog.com/all-about-linux-signals?page=show#Blocking_signals)\\nif it\'s an issue.\\n\\n**Interrupts**: System interrupts are how devices connected to your computer notify the CPU that\\nsomething has happened. The CPU will then choose a processor core to pause and context switch to the\\nOS to handle the interrupt. Make sure that\\n[SMP affinity](http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux) is\\nset so that interrupts are handled on a CPU core not running the program you care about.\\n\\n**[NUMA](https://www.kernel.org/doc/html/latest/vm/numa.html)**: While NUMA is good at making\\nmulti-cell systems transparent, there are variance implications; if the kernel moves a process\\nacross nodes, future memory accesses must wait for the controller on the original node. Use\\n[numactl](https://linux.die.net/man/8/numactl) to handle memory-/cpu-cell pinning so this doesn\'t\\nhappen.\\n\\n## Hardware\\n\\n**CPU Pipelining/Speculation**: Speculative execution in modern processors gave us vulnerabilities\\nlike Spectre, but it also gave us performance improvements like\\n[branch prediction](https://stackoverflow.com/a/11227902/1454178). And if the CPU mis-speculates\\nyour code, there\'s variance associated with rewind and replay. While the compiler knows a lot about\\nhow your CPU [pipelines instructions](https://youtu.be/nAbCKa0FzjQ?t=4467), code can be\\n[structured to help](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=755) the branch\\npredictor.\\n\\n**Paging**: For most systems, virtual memory is incredible. Applications live in their own worlds,\\nand the CPU/[MMU](https://en.wikipedia.org/wiki/Memory_management_unit) figures out the details.\\nHowever, there\'s a variance penalty associated with memory paging and caching; if you access more\\nmemory pages than the [TLB](https://en.wikipedia.org/wiki/Translation_lookaside_buffer) can store,\\nyou\'ll have to wait for the page walk. Kernel perf tools are necessary to figure out if this is an\\nissue, but using [huge pages](https://blog.pythian.com/performance-tuning-hugepages-in-linux/) can\\nreduce TLB burdens. Alternately, running applications in a hypervisor like\\n[Jailhouse](https://github.com/siemens/jailhouse) allows one to skip virtual memory entirely, but\\nthis is probably more work than the benefits are worth.\\n\\n**Network Interfaces**: When more than one computer is involved, variance can go up dramatically.\\nTuning kernel\\n[network parameters](https://github.com/leandromoreira/linux-network-performance-parameters) may be\\nhelpful, but modern systems more frequently opt to skip the kernel altogether with a technique\\ncalled [kernel bypass](https://blog.cloudflare.com/kernel-bypass/). This typically requires\\nspecialized hardware and [drivers](https://www.openonload.org/), but even industries like\\n[telecom](https://www.bbc.co.uk/rd/blog/2018-04-high-speed-networking-open-source-kernel-bypass) are\\nfinding the benefits.\\n\\n## Networks\\n\\n**Routing**: There\'s a reason financial firms are willing to pay\\n[millions of euros](https://sniperinmahwah.wordpress.com/2019/03/26/4-les-moeres-english-version/)\\nfor rights to a small plot of land - having a straight-line connection from point A to point B means\\nthe path their data takes is the shortest possible. In contrast, there are currently 6 computers in\\nbetween me and Google, but that may change at any moment if my ISP realizes a\\n[more efficient route](https://en.wikipedia.org/wiki/Border_Gateway_Protocol) is available. Whether\\nit\'s using\\n[research-quality equipment](https://sniperinmahwah.wordpress.com/2018/05/07/shortwave-trading-part-i-the-west-chicago-tower-mystery/)\\nfor shortwave radio, or just making sure there\'s no data inadvertently going between data centers,\\nrouting matters.\\n\\n**Protocol**: TCP as a network protocol is awesome: guaranteed and in-order delivery, flow control,\\nand congestion control all built in. But these attributes make the most sense when networking\\ninfrastructure is lossy; for systems that expect nearly all packets to be delivered correctly, the\\nsetup handshaking and packet acknowledgment are just overhead. Using UDP (unicast or multicast) may\\nmake sense in these contexts as it avoids the chatter needed to track connection state, and\\n[gap-fill](https://iextrading.com/docs/IEX%20Transport%20Specification.pdf)\\n[strategies](http://www.nasdaqtrader.com/content/technicalsupport/specifications/dataproducts/moldudp64.pdf)\\ncan handle the rest.\\n\\n**Switching**: Many routers/switches handle packets using \\"store-and-forward\\" behavior: wait for the\\nwhole packet, validate checksums, and then send to the next device. In variance terms, the time\\nneeded to move data between two nodes is proportional to the size of that data; the switch must\\n\\"store\\" all data before it can calculate checksums and \\"forward\\" to the next node. With\\n[\\"cut-through\\"](https://www.networkworld.com/article/2241573/latency-and-jitter--cut-through-design-pays-off-for-arista--blade.html)\\ndesigns, switches will begin forwarding data as soon as they know where the destination is,\\nchecksums be damned. This means there\'s a fixed cost (at the switch) for network traffic, no matter\\nthe size.\\n\\n## Final Thoughts\\n\\nHigh-performance systems, regardless of industry, are not magical. They do require extreme precision\\nand attention to detail, but they\'re designed, built, and operated by regular people, using a lot of\\ntools that are publicly available. Interested in seeing how context switching affects performance of\\nyour benchmarks? `taskset` should be installed in all modern Linux distributions, and can be used to\\nmake sure the OS never migrates your process. Curious how often garbage collection triggers during a\\ncrucial operation? Your language of choice will typically expose details of its operations\\n([Python](https://docs.python.org/3/library/gc.html),\\n[Java](https://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html#DebuggingOptions)).\\nWant to know how hard your program is stressing the TLB? Use `perf record` and look for\\n`dtlb_load_misses.miss_causes_a_walk`.\\n\\nTwo final guiding questions, then: first, before attempting to apply some of the technology above to\\nyour own systems, can you first identify\\n[where/when you care](http://wiki.c2.com/?PrematureOptimization) about \\"high-performance\\"? As an\\nexample, if parts of a system rely on humans pushing buttons, CPU pinning won\'t have any measurable\\neffect. Humans are already far too slow to react in time. Second, if you\'re using benchmarks, are\\nthey being designed in a way that\'s actually helpful? Tools like\\n[Criterion](http://www.serpentine.com/criterion/) (also in\\n[Rust](https://github.com/bheisler/criterion.rs)) and Google\'s\\n[Benchmark](https://github.com/google/benchmark) output not only average run time, but variance as\\nwell; your benchmarking environment is subject to the same concerns your production environment is.\\n\\nFinally, I believe high-performance systems are a matter of philosophy, not necessarily technique.\\nRigorous focus on variance is the first step, and there are plenty of ways to measure and mitigate\\nit; once that\'s at an acceptable level, then optimize for speed."},{"id":"2019/05/making-bread","metadata":{"permalink":"/2019/05/making-bread","source":"@site/blog/2019-05-03-making-bread/index.mdx","title":"Making bread","description":"Having recently started my \\"gardening leave\\" between positions, I have some more personal time","date":"2019-05-03T12:00:00.000Z","tags":[],"readingTime":1.61,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/05/making-bread","title":"Making bread","date":"2019-05-03T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731207625000,"prevItem":{"title":"On building high performance systems","permalink":"/2019/06/high-performance-systems"},"nextItem":{"title":"Allocations in Rust: Summary","permalink":"/2019/02/summary"}},"content":"Having recently started my \\"gardening leave\\" between positions, I have some more personal time\\navailable. I\'m planning to stay productive, contributing to some open-source projects, but it also\\noccurred to me that despite [talking about](https://speice.io/2018/05/hello.html) bread pics, this\\nblog has been purely technical. Maybe I\'ll change the site title from \\"The Old Speice Guy\\" to \\"Bites\\nand Bytes\\"?\\n\\n\x3c!-- truncate --\x3e\\n\\nEither way, I\'m baking a little bit again, and figured it was worth taking a quick break to focus on\\nsome lighter material. I recently learned two critically important lessons: first, the temperature\\nof the dough when you put the yeast in makes a huge difference.\\n\\nPreviously, when I wasn\'t paying attention to dough temperature:\\n\\n![Whole weat dough](./whole-wheat-not-rising.jpg)\\n\\nCompared with what happens when I put the dough in the microwave for a defrost cycle because the\\nwater I used wasn\'t warm enough:\\n\\n![White dough](./white-dough-rising-before-fold.jpg)\\n\\nI mean, just look at the bubbles!\\n\\n![White dough with bubbles](./white-dough-rising-after-fold.jpg)\\n\\nAfter shaping the dough, I\'ve got two loaves ready:\\n\\n![Shaped loaves](./shaped-loaves.jpg)\\n\\nNow, the recipe normally calls for a Dutch Oven to bake the bread because it keeps the dough from\\ndrying out in the oven. Because I don\'t own a Dutch Oven, I typically put a casserole dish on the\\nbottom rack and fill it with water so there\'s still some moisture in the oven. This time, I forgot\\nto add the water and learned my second lesson: never add room-temperature water to a glass dish\\nthat\'s currently at 500 degrees.\\n\\n![Shattered glass dish](./shattered-glass.jpg)\\n\\nNeedless to say, trying to pull out sharp glass from an incredibly hot oven is not what I expected\\nto be doing during my garden leave.\\n\\nIn the end, the bread crust wasn\'t great, but the bread itself turned out pretty alright:\\n\\n![Baked bread](./final-product.jpg)\\n\\nI\'ve been writing a lot more during this break, so I\'m looking forward to sharing that in the\\nfuture. In the mean-time, I\'m planning on making a sandwich."},{"id":"2019/02/summary","metadata":{"permalink":"/2019/02/summary","source":"@site/blog/2019-02-09-summary/index.mdx","title":"Allocations in Rust: Summary","description":"While there\'s a lot of interesting detail captured in this series, it\'s often helpful to have a","date":"2019-02-09T12:00:00.000Z","tags":[],"readingTime":1.095,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/02/summary","title":"Allocations in Rust: Summary","date":"2019-02-09T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731204300000,"prevItem":{"title":"Making bread","permalink":"/2019/05/making-bread"},"nextItem":{"title":"Allocations in Rust: Compiler optimizations","permalink":"/2019/02/08/compiler-optimizations"}},"content":"While there\'s a lot of interesting detail captured in this series, it\'s often helpful to have a\\ndocument that answers some \\"yes/no\\" questions. You may not care about what an `Iterator` looks like\\nin assembly, you just need to know whether it allocates an object on the heap or not. And while Rust\\nwill prioritize the fastest behavior it can, here are the rules for each memory type:\\n\\n\x3c!-- truncate --\x3e\\n\\n**Global Allocation**:\\n\\n- `const` is a fixed value; the compiler is allowed to copy it wherever useful.\\n- `static` is a fixed reference; the compiler will guarantee it is unique.\\n\\n**Stack Allocation**:\\n\\n- Everything not using a smart pointer will be allocated on the stack.\\n- Structs, enums, iterators, arrays, and closures are all stack allocated.\\n- Cell types (`RefCell`) behave like smart pointers, but are stack-allocated.\\n- Inlining (`#[inline]`) will not affect allocation behavior for better or worse.\\n- Types that are marked `Copy` are guaranteed to have their contents stack-allocated.\\n\\n\\n**Heap Allocation**:\\n\\n- Smart pointers (`Box`, `Rc`, `Mutex`, etc.) allocate their contents in heap memory.\\n- Collections (`HashMap`, `Vec`, `String`, etc.) allocate their contents in heap memory.\\n- Some smart pointers in the standard library have counterparts in other crates that don\'t need heap\\n memory. If possible, use those.\\n\\n![Container Sizes in Rust](./container-size.svg)\\n\\n-- [Raph Levien](https://docs.google.com/presentation/d/1q-c7UAyrUlM-eZyTo1pd8SZ0qwA_wYxmPZVOQkoDmH4/edit?usp=sharing)"},{"id":"/2019/02/08/compiler-optimizations","metadata":{"permalink":"/2019/02/08/compiler-optimizations","source":"@site/blog/2019-02-08-compiler-optimizations/index.mdx","title":"Allocations in Rust: Compiler optimizations","description":"A lot. The answer is a lot.","date":"2019-02-08T12:00:00.000Z","tags":[],"readingTime":3.695,"hasTruncateMarker":true,"authors":[],"frontMatter":{"title":"Allocations in Rust: Compiler optimizations","description":"A lot. The answer is a lot.","date":"2019-02-08T12:00:00.000Z","last_updated":{"date":"2019-02-10T12:00:00.000Z"},"tags":[]},"unlisted":false,"lastUpdatedAt":1731204300000,"prevItem":{"title":"Allocations in Rust: Summary","permalink":"/2019/02/summary"},"nextItem":{"title":"Allocations in Rust: Dynamic memory","permalink":"/2019/02/a-heaping-helping"}},"content":"Up to this point, we\'ve been discussing memory usage in the Rust language by focusing on simple\\nrules that are mostly right for small chunks of code. We\'ve spent time showing how those rules work\\nthemselves out in practice, and become familiar with reading the assembly code needed to see each\\nmemory type (global, stack, heap) in action.\\n\\nThroughout the series so far, we\'ve put a handicap on the code. In the name of consistent and\\nunderstandable results, we\'ve asked the compiler to pretty please leave the training wheels on. Now\\nis the time where we throw out all the rules and take off the kid gloves. As it turns out, both the\\nRust compiler and the LLVM optimizers are incredibly sophisticated, and we\'ll step back and let them\\ndo their job.\\n\\n\x3c!-- truncate --\x3e\\n\\nSimilar to\\n[\\"What Has My Compiler Done For Me Lately?\\"](https://www.youtube.com/watch?v=bSkpMdDe4g4), we\'re\\nfocusing on interesting things the Rust language (and LLVM!) can do with memory management. We\'ll\\nstill be looking at assembly code to understand what\'s going on, but it\'s important to mention\\nagain: **please use automated tools like [alloc-counter](https://crates.io/crates/alloc_counter) to\\ndouble-check memory behavior if it\'s something you care about**. It\'s far too easy to mis-read\\nassembly in large code sections, you should always verify behavior if you care about memory usage.\\n\\nThe guiding principal as we move forward is this: _optimizing compilers won\'t produce worse programs\\nthan we started with._ There won\'t be any situations where stack allocations get moved to heap\\nallocations. There will, however, be an opera of optimization.\\n\\n**Update 2019-02-10**: When debugging a\\n[related issue](https://gitlab.com/sio4/code/alloc-counter/issues/1), it was discovered that the\\noriginal code worked because LLVM optimized out the entire function, rather than just the allocation\\nsegments. The code has been updated with proper use of\\n[`read_volatile`](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html), and a previous section\\non vector capacity has been removed.\\n\\n## The Case of the Disappearing Box\\n\\nOur first optimization comes when LLVM can reason that the lifetime of an object is sufficiently\\nshort that heap allocations aren\'t necessary. In these cases, LLVM will move the allocation to the\\nstack instead! The way this interacts with `#[inline]` attributes is a bit opaque, but the important\\npart is that LLVM can sometimes do better than the baseline Rust language:\\n\\n```rust\\nuse std::alloc::{GlobalAlloc, Layout, System};\\nuse std::sync::atomic::{AtomicBool, Ordering};\\n\\npub fn cmp(x: u32) {\\n // Turn on panicking if we allocate on the heap\\n DO_PANIC.store(true, Ordering::SeqCst);\\n\\n // The compiler is able to see through the constant `Box`\\n // and directly compare `x` to 24 - assembly line 73\\n let y = Box::new(24);\\n let equals = x == *y;\\n\\n // This call to drop is eliminated\\n drop(y);\\n\\n // Need to mark the comparison result as volatile so that\\n // LLVM doesn\'t strip out all the code. If `y` is marked\\n // volatile instead, allocation will be forced.\\n unsafe { std::ptr::read_volatile(&equals) };\\n\\n // Turn off panicking, as there are some deallocations\\n // when we exit main.\\n DO_PANIC.store(false, Ordering::SeqCst);\\n}\\n\\nfn main() {\\n cmp(12)\\n}\\n\\n#[global_allocator]\\nstatic A: PanicAllocator = PanicAllocator;\\nstatic DO_PANIC: AtomicBool = AtomicBool::new(false);\\nstruct PanicAllocator;\\n\\nunsafe impl GlobalAlloc for PanicAllocator {\\n unsafe fn alloc(&self, layout: Layout) -> *mut u8 {\\n if DO_PANIC.load(Ordering::SeqCst) {\\n panic!(\\"Unexpected allocation.\\");\\n }\\n System.alloc(layout)\\n }\\n\\n unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {\\n if DO_PANIC.load(Ordering::SeqCst) {\\n panic!(\\"Unexpected deallocation.\\");\\n }\\n System.dealloc(ptr, layout);\\n }\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/BZ_Yp3)\\n\\n-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4a765f753183d5b919f62c71d2109d5d)\\n\\n## Dr. Array or: how I learned to love the optimizer\\n\\nFinally, this isn\'t so much about LLVM figuring out different memory behavior, but LLVM stripping\\nout code that doesn\'t do anything. Optimizations of this type have a lot of nuance to them; if\\nyou\'re not careful, they can make your benchmarks look\\n[impossibly good](https://www.youtube.com/watch?v=nXaxk27zwlk&feature=youtu.be&t=1199). In Rust, the\\n`black_box` function (implemented in both\\n[`libtest`](https://doc.rust-lang.org/1.1.0/test/fn.black_box.html) and\\n[`criterion`](https://docs.rs/criterion/0.2.10/criterion/fn.black_box.html)) will tell the compiler\\nto disable this kind of optimization. But if you let LLVM remove unnecessary code, you can end up\\nrunning programs that previously caused errors:\\n\\n```rust\\n#[derive(Default)]\\nstruct TwoFiftySix {\\n _a: [u64; 32]\\n}\\n\\n#[derive(Default)]\\nstruct EightK {\\n _a: [TwoFiftySix; 32]\\n}\\n\\n#[derive(Default)]\\nstruct TwoFiftySixK {\\n _a: [EightK; 32]\\n}\\n\\n#[derive(Default)]\\nstruct EightM {\\n _a: [TwoFiftySixK; 32]\\n}\\n\\npub fn main() {\\n // Normally this blows up because we can\'t reserve size on stack\\n // for the `EightM` struct. But because the compiler notices we\\n // never do anything with `_x`, it optimizes out the stack storage\\n // and the program completes successfully.\\n let _x = EightM::default();\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/daHn7P)\\n\\n-- [Rust Playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=4c253bf26072119896ab93c6ef064dc0)"},{"id":"2019/02/a-heaping-helping","metadata":{"permalink":"/2019/02/a-heaping-helping","source":"@site/blog/2019-02-07-a-heaping-helping/index.mdx","title":"Allocations in Rust: Dynamic memory","description":"Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), and","date":"2019-02-07T12:00:00.000Z","tags":[],"readingTime":5.86,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/02/a-heaping-helping","title":"Allocations in Rust: Dynamic memory","date":"2019-02-07T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731204300000,"prevItem":{"title":"Allocations in Rust: Compiler optimizations","permalink":"/2019/02/08/compiler-optimizations"},"nextItem":{"title":"Allocations in Rust: Fixed memory","permalink":"/2019/02/stacking-up"}},"content":"Managing dynamic memory is hard. Some languages assume users will do it themselves (C, C++), and\\nsome languages go to extreme lengths to protect users from themselves (Java, Python). In Rust, how\\nthe language uses dynamic memory (also referred to as the **heap**) is a system called _ownership_.\\nAnd as the docs mention, ownership\\n[is Rust\'s most unique feature](https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html).\\n\\nThe heap is used in two situations; when the compiler is unable to predict either the _total size of\\nmemory needed_, or _how long the memory is needed for_, it allocates space in the heap.\\n\\n\x3c!-- truncate --\x3e\\n\\nThis happens\\npretty frequently; if you want to download the Google home page, you won\'t know how large it is\\nuntil your program runs. And when you\'re finished with Google, we deallocate the memory so it can be\\nused to store other webpages. If you\'re interested in a slightly longer explanation of the heap,\\ncheck out\\n[The Stack and the Heap](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html#the-stack-and-the-heap)\\nin Rust\'s documentation.\\n\\nWe won\'t go into detail on how the heap is managed; the\\n[ownership documentation](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) does a\\nphenomenal job explaining both the \\"why\\" and \\"how\\" of memory management. Instead, we\'re going to\\nfocus on understanding \\"when\\" heap allocations occur in Rust.\\n\\nTo start off, take a guess for how many allocations happen in the program below:\\n\\n```rust\\nfn main() {}\\n```\\n\\nIt\'s obviously a trick question; while no heap allocations occur as a result of that code, the setup\\nneeded to call `main` does allocate on the heap. Here\'s a way to show it:\\n\\n```rust\\n#![feature(integer_atomics)]\\nuse std::alloc::{GlobalAlloc, Layout, System};\\nuse std::sync::atomic::{AtomicU64, Ordering};\\n\\nstatic ALLOCATION_COUNT: AtomicU64 = AtomicU64::new(0);\\n\\nstruct CountingAllocator;\\n\\nunsafe impl GlobalAlloc for CountingAllocator {\\n unsafe fn alloc(&self, layout: Layout) -> *mut u8 {\\n ALLOCATION_COUNT.fetch_add(1, Ordering::SeqCst);\\n System.alloc(layout)\\n }\\n\\n unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {\\n System.dealloc(ptr, layout);\\n }\\n}\\n\\n#[global_allocator]\\nstatic A: CountingAllocator = CountingAllocator;\\n\\nfn main() {\\n let x = ALLOCATION_COUNT.fetch_add(0, Ordering::SeqCst);\\n println!(\\"There were {} allocations before calling main!\\", x);\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=fb5060025ba79fc0f906b65a4ef8eb8e)\\n\\nAs of the time of writing, there are five allocations that happen before `main` is ever called.\\n\\nBut when we want to understand more practically where heap allocation happens, we\'ll follow this\\nguide:\\n\\n- Smart pointers hold their contents in the heap\\n- Collections are smart pointers for many objects at a time, and reallocate when they need to grow\\n\\nFinally, there are two \\"addendum\\" issues that are important to address when discussing Rust and the\\nheap:\\n\\n- Non-heap alternatives to many standard library types are available.\\n- Special allocators to track memory behavior should be used to benchmark code.\\n\\n## Smart pointers\\n\\nThe first thing to note are the \\"smart pointer\\" types. When you have data that must outlive the\\nscope in which it is declared, or your data is of unknown or dynamic size, you\'ll make use of these\\ntypes.\\n\\nThe term [smart pointer](https://en.wikipedia.org/wiki/Smart_pointer) comes from C++, and while it\'s\\nclosely linked to a general design pattern of\\n[\\"Resource Acquisition Is Initialization\\"](https://en.cppreference.com/w/cpp/language/raii), we\'ll\\nuse it here specifically to describe objects that are responsible for managing ownership of data\\nallocated on the heap. The smart pointers available in the `alloc` crate should look mostly\\nfamiliar:\\n\\n- [`Box`](https://doc.rust-lang.org/alloc/boxed/struct.Box.html)\\n- [`Rc`](https://doc.rust-lang.org/alloc/rc/struct.Rc.html)\\n- [`Arc`](https://doc.rust-lang.org/alloc/sync/struct.Arc.html)\\n- [`Cow`](https://doc.rust-lang.org/alloc/borrow/enum.Cow.html)\\n\\nThe [standard library](https://doc.rust-lang.org/std/) also defines some smart pointers to manage\\nheap objects, though more than can be covered here. Some examples are:\\n\\n- [`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html)\\n- [`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)\\n\\nFinally, there is one [\\"gotcha\\"](https://www.merriam-webster.com/dictionary/gotcha): **cell types**\\n(like [`RefCell`](https://doc.rust-lang.org/stable/core/cell/struct.RefCell.html)) look and behave\\nsimilarly, but **don\'t involve heap allocation**. The\\n[`core::cell` docs](https://doc.rust-lang.org/stable/core/cell/index.html) have more information.\\n\\nWhen a smart pointer is created, the data it is given is placed in heap memory and the location of\\nthat data is recorded in the smart pointer. Once the smart pointer has determined it\'s safe to\\ndeallocate that memory (when a `Box` has\\n[gone out of scope](https://doc.rust-lang.org/stable/std/boxed/index.html) or a reference count\\n[goes to zero](https://doc.rust-lang.org/alloc/rc/index.html)), the heap space is reclaimed. We can\\nprove these types use heap memory by looking at code:\\n\\n```rust\\nuse std::rc::Rc;\\nuse std::sync::Arc;\\nuse std::borrow::Cow;\\n\\npub fn my_box() {\\n // Drop at assembly line 1640\\n Box::new(0);\\n}\\n\\npub fn my_rc() {\\n // Drop at assembly line 1650\\n Rc::new(0);\\n}\\n\\npub fn my_arc() {\\n // Drop at assembly line 1660\\n Arc::new(0);\\n}\\n\\npub fn my_cow() {\\n // Drop at assembly line 1672\\n Cow::from(\\"drop\\");\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/4AMQug)\\n\\n## Collections\\n\\nCollection types use heap memory because their contents have dynamic size; they will request more\\nmemory [when needed](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve), and can\\n[release memory](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.shrink_to_fit) when it\'s\\nno longer necessary. This dynamic property forces Rust to heap allocate everything they contain. In\\na way, **collections are smart pointers for many objects at a time**. Common types that fall under\\nthis umbrella are [`Vec`](https://doc.rust-lang.org/stable/alloc/vec/struct.Vec.html),\\n[`HashMap`](https://doc.rust-lang.org/stable/std/collections/struct.HashMap.html), and\\n[`String`](https://doc.rust-lang.org/stable/alloc/string/struct.String.html) (not\\n[`str`](https://doc.rust-lang.org/std/primitive.str.html)).\\n\\nWhile collections store the objects they own in heap memory, _creating new collections will not\\nallocate on the heap_. This is a bit weird; if we call `Vec::new()`, the assembly shows a\\ncorresponding call to `real_drop_in_place`:\\n\\n```rust\\npub fn my_vec() {\\n // Drop in place at line 481\\n Vec::::new();\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/1WkNtC)\\n\\nBut because the vector has no elements to manage, no calls to the allocator will ever be dispatched:\\n\\n```rust\\nuse std::alloc::{GlobalAlloc, Layout, System};\\nuse std::sync::atomic::{AtomicBool, Ordering};\\n\\nfn main() {\\n // Turn on panicking if we allocate on the heap\\n DO_PANIC.store(true, Ordering::SeqCst);\\n\\n // Interesting bit happens here\\n let x: Vec = Vec::new();\\n drop(x);\\n\\n // Turn panicking back off, some deallocations occur\\n // after main as well.\\n DO_PANIC.store(false, Ordering::SeqCst);\\n}\\n\\n#[global_allocator]\\nstatic A: PanicAllocator = PanicAllocator;\\nstatic DO_PANIC: AtomicBool = AtomicBool::new(false);\\nstruct PanicAllocator;\\n\\nunsafe impl GlobalAlloc for PanicAllocator {\\n unsafe fn alloc(&self, layout: Layout) -> *mut u8 {\\n if DO_PANIC.load(Ordering::SeqCst) {\\n panic!(\\"Unexpected allocation.\\");\\n }\\n System.alloc(layout)\\n }\\n\\n unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {\\n if DO_PANIC.load(Ordering::SeqCst) {\\n panic!(\\"Unexpected deallocation.\\");\\n }\\n System.dealloc(ptr, layout);\\n }\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=831a297d176d015b1f9ace01ae416cc6)\\n\\nOther standard library types follow the same behavior; make sure to check out\\n[`HashMap::new()`](https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.new),\\nand [`String::new()`](https://doc.rust-lang.org/std/string/struct.String.html#method.new).\\n\\n## Heap Alternatives\\n\\nWhile it is a bit strange to speak of the stack after spending time with the heap, it\'s worth\\npointing out that some heap-allocated objects in Rust have stack-based counterparts provided by\\nother crates. If you have need of the functionality, but want to avoid allocating, there are\\ntypically alternatives available.\\n\\nWhen it comes to some standard library smart pointers\\n([`RwLock`](https://doc.rust-lang.org/std/sync/struct.RwLock.html) and\\n[`Mutex`](https://doc.rust-lang.org/std/sync/struct.Mutex.html)), stack-based alternatives are\\nprovided in crates like [parking_lot](https://crates.io/crates/parking_lot) and\\n[spin](https://crates.io/crates/spin). You can check out\\n[`lock_api::RwLock`](https://docs.rs/lock_api/0.1.5/lock_api/struct.RwLock.html),\\n[`lock_api::Mutex`](https://docs.rs/lock_api/0.1.5/lock_api/struct.Mutex.html), and\\n[`spin::Once`](https://mvdnes.github.io/rust-docs/spin-rs/spin/struct.Once.html) if you\'re in need\\nof synchronization primitives.\\n\\n[thread_id](https://crates.io/crates/thread-id) may be necessary if you\'re implementing an allocator\\nbecause [`thread::current().id()`](https://doc.rust-lang.org/std/thread/struct.ThreadId.html) uses a\\n[`thread_local!` structure](https://doc.rust-lang.org/stable/src/std/sys_common/thread_info.rs.html#17-36)\\nthat needs heap allocation.\\n\\n## Tracing Allocators\\n\\nWhen writing performance-sensitive code, there\'s no alternative to measuring your code. If you\\ndidn\'t write a benchmark,\\n[you don\'t care about it\'s performance](https://www.youtube.com/watch?v=2EWejmkKlxs&feature=youtu.be&t=263)\\nYou should never rely on your instincts when\\n[a microsecond is an eternity](https://www.youtube.com/watch?v=NH1Tta7purM).\\n\\nSimilarly, there\'s great work going on in Rust with allocators that keep track of what they\'re doing\\n(like [`alloc_counter`](https://crates.io/crates/alloc_counter)). When it comes to tracking heap\\nbehavior, it\'s easy to make mistakes; please write tests and make sure you have tools to guard\\nagainst future issues."},{"id":"2019/02/stacking-up","metadata":{"permalink":"/2019/02/stacking-up","source":"@site/blog/2019-02-06-stacking-up/index.mdx","title":"Allocations in Rust: Fixed memory","description":"const and static are perfectly fine, but it\'s relatively rare that we know at compile-time about","date":"2019-02-06T12:00:00.000Z","tags":[],"readingTime":15.165,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/02/stacking-up","title":"Allocations in Rust: Fixed memory","date":"2019-02-06T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731204300000,"prevItem":{"title":"Allocations in Rust: Dynamic memory","permalink":"/2019/02/a-heaping-helping"},"nextItem":{"title":"Allocations in Rust: Global memory","permalink":"/2019/02/the-whole-world"}},"content":"`const` and `static` are perfectly fine, but it\'s relatively rare that we know at compile-time about\\neither values or references that will be the same for the duration of our program. Put another way,\\nit\'s not often the case that either you or your compiler knows how much memory your entire program\\nwill ever need.\\n\\nHowever, there are still some optimizations the compiler can do if it knows how much memory\\nindividual functions will need. Specifically, the compiler can make use of \\"stack\\" memory (as\\nopposed to \\"heap\\" memory) which can be managed far faster in both the short- and long-term.\\n\\n\x3c!-- truncate --\x3e\\n\\nWhen requesting memory, the [`push` instruction](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html)\\ncan typically complete in [1 or 2 cycles](https://agner.org/optimize/instruction_tables.ods) (<1ns\\non modern CPUs). Contrast that to heap memory which requires an allocator (specialized\\nsoftware to track what memory is in use) to reserve space. When you\'re finished with stack memory,\\nthe `pop` instruction runs in 1-3 cycles, as opposed to an allocator needing to worry about memory\\nfragmentation and other issues with the heap. All sorts of incredibly sophisticated techniques have\\nbeen used to design allocators:\\n\\n- [Garbage Collection]()\\n strategies like [Tracing](https://en.wikipedia.org/wiki/Tracing_garbage_collection) (used in\\n [Java](https://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)) and\\n [Reference counting](https://en.wikipedia.org/wiki/Reference_counting) (used in\\n [Python](https://docs.python.org/3/extending/extending.html#reference-counts))\\n- Thread-local structures to prevent locking the allocator in\\n [tcmalloc](https://jamesgolick.com/2013/5/19/how-tcmalloc-works.html)\\n- Arena structures used in [jemalloc](http://jemalloc.net/), which\\n [until recently](https://blog.rust-lang.org/2019/01/17/Rust-1.32.0.html#jemalloc-is-removed-by-default)\\n was the primary allocator for Rust programs!\\n\\nBut no matter how fast your allocator is, the principle remains: the fastest allocator is the one\\nyou never use. As such, we\'re not going to discuss how exactly the\\n[`push` and `pop` instructions work](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html), but\\nwe\'ll focus instead on the conditions that enable the Rust compiler to use faster stack-based\\nallocation for variables.\\n\\nSo, **how do we know when Rust will or will not use stack allocation for objects we create?**\\nLooking at other languages, it\'s often easy to delineate between stack and heap. Managed memory\\nlanguages (Python, Java,\\n[C#](https://blogs.msdn.microsoft.com/ericlippert/2010/09/30/the-truth-about-value-types/)) place\\neverything on the heap. JIT compilers ([PyPy](https://www.pypy.org/),\\n[HotSpot](https://www.oracle.com/technetwork/java/javase/tech/index-jsp-136373.html)) may optimize\\nsome heap allocations away, but you should never assume it will happen. C makes things clear with\\ncalls to special functions (like [malloc(3)](https://linux.die.net/man/3/malloc)) needed to access\\nheap memory. Old C++ has the [`new`](https://stackoverflow.com/a/655086/1454178) keyword, though\\nmodern C++/C++11 is more complicated with [RAII](https://en.cppreference.com/w/cpp/language/raii).\\n\\nFor Rust, we can summarize as follows: **stack allocation will be used for everything that doesn\'t\\ninvolve \\"smart pointers\\" and collections**. We\'ll skip over a precise definition of the term \\"smart\\npointer\\" for now, and instead discuss what we should watch for to understand when stack and heap\\nmemory regions are used:\\n\\n1. Stack manipulation instructions (`push`, `pop`, and `add`/`sub` of the `rsp` register) indicate\\n allocation of stack memory:\\n\\n ```rust\\n pub fn stack_alloc(x: u32) -> u32 {\\n // Space for `y` is allocated by subtracting from `rsp`,\\n // and then populated\\n let y = [1u8, 2, 3, 4];\\n // Space for `y` is deallocated by adding back to `rsp`\\n x\\n }\\n ```\\n\\n -- [Compiler Explorer](https://godbolt.org/z/5WSgc9)\\n\\n2. Tracking when exactly heap allocation calls occur is difficult. It\'s typically easier to watch\\n for `call core::ptr::real_drop_in_place`, and infer that a heap allocation happened in the recent\\n past:\\n\\n ```rust\\n pub fn heap_alloc(x: usize) -> usize {\\n // Space for elements in a vector has to be allocated\\n // on the heap, and is then de-allocated once the\\n // vector goes out of scope\\n let y: Vec = Vec::with_capacity(x);\\n x\\n }\\n ```\\n\\n -- [Compiler Explorer](https://godbolt.org/z/epfgoQ) (`real_drop_in_place` happens on line 1317)\\n Note: While the\\n [`Drop` trait](https://doc.rust-lang.org/std/ops/trait.Drop.html) is\\n [called for stack-allocated objects](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=87edf374d8983816eb3d8cfeac657b46),\\n the Rust standard library only defines `Drop` implementations for types that involve heap\\n allocation.\\n\\n3. If you don\'t want to inspect the assembly, use a custom allocator that\'s able to track and alert\\n when heap allocations occur. Crates like\\n [`alloc_counter`](https://crates.io/crates/alloc_counter) are designed for exactly this purpose.\\n\\nWith all that in mind, let\'s talk about situations in which we\'re guaranteed to use stack memory:\\n\\n- Structs are created on the stack.\\n- Function arguments are passed on the stack, meaning the\\n [`#[inline]` attribute](https://doc.rust-lang.org/reference/attributes.html#inline-attribute) will\\n not change the memory region used.\\n- Enums and unions are stack-allocated.\\n- [Arrays](https://doc.rust-lang.org/std/primitive.array.html) are always stack-allocated.\\n- Closures capture their arguments on the stack.\\n- Generics will use stack allocation, even with dynamic dispatch.\\n- [`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html) types are guaranteed to be\\n stack-allocated, and copying them will be done in stack memory.\\n- [`Iterator`s](https://doc.rust-lang.org/std/iter/trait.Iterator.html) in the standard library are\\n stack-allocated even when iterating over heap-based collections.\\n\\n## Structs\\n\\nThe simplest case comes first. When creating vanilla `struct` objects, we use stack memory to hold\\ntheir contents:\\n\\n```rust\\nstruct Point {\\n x: u64,\\n y: u64,\\n}\\n\\nstruct Line {\\n a: Point,\\n b: Point,\\n}\\n\\npub fn make_line() {\\n // `origin` is stored in the first 16 bytes of memory\\n // starting at location `rsp`\\n let origin = Point { x: 0, y: 0 };\\n // `point` makes up the next 16 bytes of memory\\n let point = Point { x: 1, y: 2 };\\n\\n // When creating `ray`, we just move the content out of\\n // `origin` and `point` into the next 32 bytes of memory\\n let ray = Line { a: origin, b: point };\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/vri9BE)\\n\\nNote that while some extra-fancy instructions are used for memory manipulation in the assembly, the\\n`sub rsp, 64` instruction indicates we\'re still working with the stack.\\n\\n## Function arguments\\n\\nHave you ever wondered how functions communicate with each other? Like, once the variables are given\\nto you, everything\'s fine. But how do you \\"give\\" those variables to another function? How do you get\\nthe results back afterward? The answer: the compiler arranges memory and assembly instructions using\\na pre-determined [calling convention](http://llvm.org/docs/LangRef.html#calling-conventions). This\\nconvention governs the rules around where arguments needed by a function will be located (either in\\nmemory offsets relative to the stack pointer `rsp`, or in other registers), and where the results\\ncan be found once the function has finished. And when multiple languages agree on what the calling\\nconventions are, you can do things like having [Go call Rust code](https://blog.filippo.io/rustgo/)!\\n\\nPut simply: it\'s the compiler\'s job to figure out how to call other functions, and you can assume\\nthat the compiler is good at its job.\\n\\nWe can see this in action using a simple example:\\n\\n```rust\\nstruct Point {\\n x: i64,\\n y: i64,\\n}\\n\\n// We use integer division operations to keep\\n// the assembly clean, understanding the result\\n// isn\'t accurate.\\nfn distance(a: &Point, b: &Point) -> i64 {\\n // Immediately subtract from `rsp` the bytes needed\\n // to hold all the intermediate results - this is\\n // the stack allocation step\\n\\n // The compiler used the `rdi` and `rsi` registers\\n // to pass our arguments, so read them in\\n let x1 = a.x;\\n let x2 = b.x;\\n let y1 = a.y;\\n let y2 = b.y;\\n\\n // Do the actual math work\\n let x_pow = (x1 - x2) * (x1 - x2);\\n let y_pow = (y1 - y2) * (y1 - y2);\\n let squared = x_pow + y_pow;\\n squared / squared\\n\\n // Our final result will be stored in the `rax` register\\n // so that our caller knows where to retrieve it.\\n // Finally, add back to `rsp` the stack memory that is\\n // now ready to be used by other functions.\\n}\\n\\npub fn total_distance() {\\n let start = Point { x: 1, y: 2 };\\n let middle = Point { x: 3, y: 4 };\\n let end = Point { x: 5, y: 6 };\\n\\n let _dist_1 = distance(&start, &middle);\\n let _dist_2 = distance(&middle, &end);\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/Qmx4ST)\\n\\nAs a consequence of function arguments never using heap memory, we can also infer that functions\\nusing the `#[inline]` attributes also do not heap allocate. But better than inferring, we can look\\nat the assembly to prove it:\\n\\n```rust\\nstruct Point {\\n x: i64,\\n y: i64,\\n}\\n\\n// Note that there is no `distance` function in the assembly output,\\n// and the total line count goes from 229 with inlining off\\n// to 306 with inline on. Even still, no heap allocations occur.\\n#[inline(always)]\\nfn distance(a: &Point, b: &Point) -> i64 {\\n let x1 = a.x;\\n let x2 = b.x;\\n let y1 = a.y;\\n let y2 = b.y;\\n\\n let x_pow = (a.x - b.x) * (a.x - b.x);\\n let y_pow = (a.y - b.y) * (a.y - b.y);\\n let squared = x_pow + y_pow;\\n squared / squared\\n}\\n\\npub fn total_distance() {\\n let start = Point { x: 1, y: 2 };\\n let middle = Point { x: 3, y: 4 };\\n let end = Point { x: 5, y: 6 };\\n\\n let _dist_1 = distance(&start, &middle);\\n let _dist_2 = distance(&middle, &end);\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/30Sh66)\\n\\nFinally, passing by value (arguments with type\\n[`Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html)) and passing by reference (either\\nmoving ownership or passing a pointer) may have slightly different layouts in assembly, but will\\nstill use either stack memory or CPU registers:\\n\\n```rust\\npub struct Point {\\n x: i64,\\n y: i64,\\n}\\n\\n// Moving values\\npub fn distance_moved(a: Point, b: Point) -> i64 {\\n let x1 = a.x;\\n let x2 = b.x;\\n let y1 = a.y;\\n let y2 = b.y;\\n\\n let x_pow = (x1 - x2) * (x1 - x2);\\n let y_pow = (y1 - y2) * (y1 - y2);\\n let squared = x_pow + y_pow;\\n squared / squared\\n}\\n\\n// Borrowing values has two extra `mov` instructions on lines 21 and 22\\npub fn distance_borrowed(a: &Point, b: &Point) -> i64 {\\n let x1 = a.x;\\n let x2 = b.x;\\n let y1 = a.y;\\n let y2 = b.y;\\n\\n let x_pow = (x1 - x2) * (x1 - x2);\\n let y_pow = (y1 - y2) * (y1 - y2);\\n let squared = x_pow + y_pow;\\n squared / squared\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/06hGiv)\\n\\n## Enums\\n\\nIf you\'ve ever worried that wrapping your types in\\n[`Option`](https://doc.rust-lang.org/stable/core/option/enum.Option.html) or\\n[`Result`](https://doc.rust-lang.org/stable/core/result/enum.Result.html) would finally make them\\nlarge enough that Rust decides to use heap allocation instead, fear no longer: `enum` and union\\ntypes don\'t use heap allocation:\\n\\n```rust\\nenum MyEnum {\\n Small(u8),\\n Large(u64)\\n}\\n\\nstruct MyStruct {\\n x: MyEnum,\\n y: MyEnum,\\n}\\n\\npub fn enum_compare() {\\n let x = MyEnum::Small(0);\\n let y = MyEnum::Large(0);\\n\\n let z = MyStruct { x, y };\\n\\n let opt = Option::Some(z);\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/HK7zBx)\\n\\nBecause the size of an `enum` is the size of its largest element plus a flag, the compiler can\\npredict how much memory is used no matter which variant of an enum is currently stored in a\\nvariable. Thus, enums and unions have no need of heap allocation. There\'s unfortunately not a great\\nway to show this in assembly, so I\'ll instead point you to the\\n[`core::mem::size_of`](https://doc.rust-lang.org/stable/core/mem/fn.size_of.html#size-of-enums)\\ndocumentation.\\n\\n## Arrays\\n\\nThe array type is guaranteed to be stack allocated, which is why the array size must be declared.\\nInterestingly enough, this can be used to cause safe Rust programs to crash:\\n\\n```rust\\n// 256 bytes\\n#[derive(Default)]\\nstruct TwoFiftySix {\\n _a: [u64; 32]\\n}\\n\\n// 8 kilobytes\\n#[derive(Default)]\\nstruct EightK {\\n _a: [TwoFiftySix; 32]\\n}\\n\\n// 256 kilobytes\\n#[derive(Default)]\\nstruct TwoFiftySixK {\\n _a: [EightK; 32]\\n}\\n\\n// 8 megabytes - exceeds space typically provided for the stack,\\n// though the kernel can be instructed to allocate more.\\n// On Linux, you can check stack size using `ulimit -s`\\n#[derive(Default)]\\nstruct EightM {\\n _a: [TwoFiftySixK; 32]\\n}\\n\\nfn main() {\\n // Because we already have things in stack memory\\n // (like the current function call stack), allocating another\\n // eight megabytes of stack memory crashes the program\\n let _x = EightM::default();\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=587a6380a4914bcbcef4192c90c01dc4)\\n\\nThere aren\'t any security implications of this (no memory corruption occurs), but it\'s good to note\\nthat the Rust compiler won\'t move arrays into heap memory even if they can be reasonably expected to\\noverflow the stack.\\n\\n## Closures\\n\\nRules for how anonymous functions capture their arguments are typically language-specific. In Java,\\n[Lambda Expressions](https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html) are\\nactually objects created on the heap that capture local primitives by copying, and capture local\\nnon-primitives as (`final`) references.\\n[Python](https://docs.python.org/3.7/reference/expressions.html#lambda) and\\n[JavaScript](https://javascriptweblog.wordpress.com/2010/10/25/understanding-javascript-closures/)\\nboth bind _everything_ by reference normally, but Python can also\\n[capture values](https://stackoverflow.com/a/235764/1454178) and JavaScript has\\n[Arrow functions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/Arrow_functions).\\n\\nIn Rust, arguments to closures are the same as arguments to other functions; closures are simply\\nfunctions that don\'t have a declared name. Some weird ordering of the stack may be required to\\nhandle them, but it\'s the compiler\'s responsiblity to figure that out.\\n\\nEach example below has the same effect, but a different assembly implementation. In the simplest\\ncase, we immediately run a closure returned by another function. Because we don\'t store a reference\\nto the closure, the stack memory needed to store the captured values is contiguous:\\n\\n```rust\\nfn my_func() -> impl FnOnce() {\\n let x = 24;\\n // Note that this closure in assembly looks exactly like\\n // any other function; you even use the `call` instruction\\n // to start running it.\\n move || { x; }\\n}\\n\\npub fn immediate() {\\n my_func()();\\n my_func()();\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/mgJ2zl), 25 total assembly instructions\\n\\nIf we store a reference to the closure, the Rust compiler keeps values it needs in the stack memory\\nof the original function. Getting the details right is a bit harder, so the instruction count goes\\nup even though this code is functionally equivalent to our original example:\\n\\n```rust\\npub fn simple_reference() {\\n let x = my_func();\\n let y = my_func();\\n y();\\n x();\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/K_dj5n), 55 total assembly instructions\\n\\nEven things like variable order can make a difference in instruction count:\\n\\n```rust\\npub fn complex() {\\n let x = my_func();\\n let y = my_func();\\n x();\\n y();\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/p37qFl), 70 total assembly instructions\\n\\nIn every circumstance though, the compiler ensured that no heap allocations were necessary.\\n\\n## Generics\\n\\nTraits in Rust come in two broad forms: static dispatch (monomorphization, `impl Trait`) and dynamic\\ndispatch (trait objects, `dyn Trait`). While dynamic dispatch is often _associated_ with trait\\nobjects being stored in the heap, dynamic dispatch can be used with stack allocated objects as well:\\n\\n```rust\\ntrait GetInt {\\n fn get_int(&self) -> u64;\\n}\\n\\n// vtable stored at section L__unnamed_1\\nstruct WhyNotU8 {\\n x: u8\\n}\\nimpl GetInt for WhyNotU8 {\\n fn get_int(&self) -> u64 {\\n self.x as u64\\n }\\n}\\n\\n// vtable stored at section L__unnamed_2\\nstruct ActualU64 {\\n x: u64\\n}\\nimpl GetInt for ActualU64 {\\n fn get_int(&self) -> u64 {\\n self.x\\n }\\n}\\n\\n// `&dyn` declares that we want to use dynamic dispatch\\n// rather than monomorphization, so there is only one\\n// `retrieve_int` function that shows up in the final assembly.\\n// If we used generics, there would be one implementation of\\n// `retrieve_int` for each type that implements `GetInt`.\\npub fn retrieve_int(u: &dyn GetInt) {\\n // In the assembly, we just call an address given to us\\n // in the `rsi` register and hope that it was set up\\n // correctly when this function was invoked.\\n let x = u.get_int();\\n}\\n\\npub fn do_call() {\\n // Note that even though the vtable for `WhyNotU8` and\\n // `ActualU64` includes a pointer to\\n // `core::ptr::real_drop_in_place`, it is never invoked.\\n let a = WhyNotU8 { x: 0 };\\n let b = ActualU64 { x: 0 };\\n\\n retrieve_int(&a);\\n retrieve_int(&b);\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/u_yguS)\\n\\nIt\'s hard to imagine practical situations where dynamic dispatch would be used for objects that\\naren\'t heap allocated, but it technically can be done.\\n\\n## Copy types\\n\\nUnderstanding move semantics and copy semantics in Rust is weird at first. The Rust docs\\n[go into detail](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html) far better than can\\nbe addressed here, so I\'ll leave them to do the job. From a memory perspective though, their\\nguideline is reasonable:\\n[if your type can implemement `Copy`, it should](https://doc.rust-lang.org/stable/core/marker/trait.Copy.html#when-should-my-type-be-copy).\\nWhile there are potential speed tradeoffs to _benchmark_ when discussing `Copy` (move semantics for\\nstack objects vs. copying stack pointers vs. copying stack `struct`s), _it\'s impossible for `Copy`\\nto introduce a heap allocation_.\\n\\nBut why is this the case? Fundamentally, it\'s because the language controls what `Copy` means -\\n[\\"the behavior of `Copy` is not overloadable\\"](https://doc.rust-lang.org/std/marker/trait.Copy.html#whats-the-difference-between-copy-and-clone)\\nbecause it\'s a marker trait. From there we\'ll note that a type\\n[can implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#when-can-my-type-be-copy)\\nif (and only if) its components implement `Copy`, and that\\n[no heap-allocated types implement `Copy`](https://doc.rust-lang.org/std/marker/trait.Copy.html#implementors).\\nThus, assignments involving heap types are always move semantics, and new heap allocations won\'t\\noccur because of implicit operator behavior.\\n\\n```rust\\n#[derive(Clone)]\\nstruct Cloneable {\\n x: Box\\n}\\n\\n// error[E0204]: the trait `Copy` may not be implemented for this type\\n#[derive(Copy, Clone)]\\nstruct NotCopyable {\\n x: Box\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/VToRuK)\\n\\n## Iterators\\n\\nIn managed memory languages (like\\n[Java](https://www.youtube.com/watch?v=bSkpMdDe4g4&feature=youtu.be&t=357)), there\'s a subtle\\ndifference between these two code samples:\\n\\n```java\\npublic static int sum_for(List vals) {\\n long sum = 0;\\n // Regular for loop\\n for (int i = 0; i < vals.length; i++) {\\n sum += vals[i];\\n }\\n return sum;\\n}\\n\\npublic static int sum_foreach(List vals) {\\n long sum = 0;\\n // \\"Foreach\\" loop - uses iteration\\n for (Long l : vals) {\\n sum += l;\\n }\\n return sum;\\n}\\n```\\n\\nIn the `sum_for` function, nothing terribly interesting happens. In `sum_foreach`, an object of type\\n[`Iterator`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Iterator.html)\\nis allocated on the heap, and will eventually be garbage-collected. This isn\'t a great design;\\niterators are often transient objects that you need during a function and can discard once the\\nfunction ends. Sounds exactly like the issue stack-allocated objects address, no?\\n\\nIn Rust, iterators are allocated on the stack. The objects to iterate over are almost certainly in\\nheap memory, but the iterator itself\\n([`Iter`](https://doc.rust-lang.org/std/slice/struct.Iter.html)) doesn\'t need to use the heap. In\\neach of the examples below we iterate over a collection, but never use heap allocation:\\n\\n```rust\\nuse std::collections::HashMap;\\n// There\'s a lot of assembly generated, but if you search in the text,\\n// there are no references to `real_drop_in_place` anywhere.\\n\\npub fn sum_vec(x: &Vec) {\\n let mut s = 0;\\n // Basic iteration over vectors doesn\'t need allocation\\n for y in x {\\n s += y;\\n }\\n}\\n\\npub fn sum_enumerate(x: &Vec) {\\n let mut s = 0;\\n // More complex iterators are just fine too\\n for (_i, y) in x.iter().enumerate() {\\n s += y;\\n }\\n}\\n\\npub fn sum_hm(x: &HashMap) {\\n let mut s = 0;\\n // And it\'s not just Vec, all types will allocate the iterator\\n // on stack memory\\n for y in x.values() {\\n s += y;\\n }\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/FTT3CT)"},{"id":"2019/02/the-whole-world","metadata":{"permalink":"/2019/02/the-whole-world","source":"@site/blog/2019-02-05-the-whole-world/index.mdx","title":"Allocations in Rust: Global memory","description":"The first memory type we\'ll look at is pretty special: when Rust can prove that a value is fixed","date":"2019-02-05T12:00:00.000Z","tags":[],"readingTime":7.485,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/02/the-whole-world","title":"Allocations in Rust: Global memory","date":"2019-02-05T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731204300000,"prevItem":{"title":"Allocations in Rust: Fixed memory","permalink":"/2019/02/stacking-up"},"nextItem":{"title":"Allocations in Rust: Foreword","permalink":"/2019/02/understanding-allocations-in-rust"}},"content":"The first memory type we\'ll look at is pretty special: when Rust can prove that a _value_ is fixed\\nfor the life of a program (`const`), and when a _reference_ is unique for the life of a program\\n(`static` as a declaration, not\\n[`\'static`](https://doc.rust-lang.org/book/ch10-03-lifetime-syntax.html#the-static-lifetime) as a\\nlifetime), we can make use of global memory. This special section of data is embedded directly in\\nthe program binary so that variables are ready to go once the program loads; no additional\\ncomputation is necessary.\\n\\nUnderstanding the value/reference distinction is important for reasons we\'ll go into below, and\\nwhile the\\n[full specification](https://github.com/rust-lang/rfcs/blob/master/text/0246-const-vs-static.md) for\\nthese two keywords is available, we\'ll take a hands-on approach to the topic.\\n\\n\x3c!-- truncate --\x3e\\n\\n## `const` values\\n\\nWhen a _value_ is guaranteed to be unchanging in your program (where \\"value\\" may be scalars,\\n`struct`s, etc.), you can declare it `const`. This tells the compiler that it\'s safe to treat the\\nvalue as never changing, and enables some interesting optimizations; not only is there no\\ninitialization cost to creating the value (it is loaded at the same time as the executable parts of\\nyour program), but the compiler can also copy the value around if it speeds up the code.\\n\\nThe points we need to address when talking about `const` are:\\n\\n- `Const` values are stored in read-only memory - it\'s impossible to modify.\\n- Values resulting from calling a `const fn` are materialized at compile-time.\\n- The compiler may (or may not) copy `const` values wherever it chooses.\\n\\n### Read-Only\\n\\nThe first point is a bit strange - \\"read-only memory.\\"\\n[The Rust book](https://doc.rust-lang.org/book/ch03-01-variables-and-mutability.html#differences-between-variables-and-constants)\\nmentions in a couple places that using `mut` with constants is illegal, but it\'s also important to\\ndemonstrate just how immutable they are. _Typically_ in Rust you can use\\n[interior mutability](https://doc.rust-lang.org/book/ch15-05-interior-mutability.html) to modify\\nthings that aren\'t declared `mut`.\\n[`RefCell`](https://doc.rust-lang.org/std/cell/struct.RefCell.html) provides an example of this\\npattern in action:\\n\\n```rust\\nuse std::cell::RefCell;\\n\\nfn my_mutator(cell: &RefCell) {\\n // Even though we\'re given an immutable reference,\\n // the `replace` method allows us to modify the inner value.\\n cell.replace(14);\\n}\\n\\nfn main() {\\n let cell = RefCell::new(25);\\n // Prints out 25\\n println!(\\"Cell: {:?}\\", cell);\\n my_mutator(&cell);\\n // Prints out 14\\n println!(\\"Cell: {:?}\\", cell);\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e4bea1a718edaff4507944e825a54b2)\\n\\nWhen `const` is involved though, interior mutability is impossible:\\n\\n```rust\\nuse std::cell::RefCell;\\n\\nconst CELL: RefCell = RefCell::new(25);\\n\\nfn my_mutator(cell: &RefCell) {\\n cell.replace(14);\\n}\\n\\nfn main() {\\n // First line prints 25 as expected\\n println!(\\"Cell: {:?}\\", &CELL);\\n my_mutator(&CELL);\\n // Second line *still* prints 25\\n println!(\\"Cell: {:?}\\", &CELL);\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=88fe98110c33c1b3a51e341f48b8ae00)\\n\\nAnd a second example using [`Once`](https://doc.rust-lang.org/std/sync/struct.Once.html):\\n\\n```rust\\nuse std::sync::Once;\\n\\nconst SURPRISE: Once = Once::new();\\n\\nfn main() {\\n // This is how `Once` is supposed to be used\\n SURPRISE.call_once(|| println!(\\"Initializing...\\"));\\n // Because `Once` is a `const` value, we never record it\\n // having been initialized the first time, and this closure\\n // will also execute.\\n SURPRISE.call_once(|| println!(\\"Initializing again???\\"));\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c3cc5979b5e5434eca0f9ec4a06ee0ed)\\n\\nWhen the\\n[`const` specification](https://github.com/rust-lang/rfcs/blob/26197104b7bb9a5a35db243d639aee6e46d35d75/text/0246-const-vs-static.md)\\nrefers to [\\"rvalues\\"](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf), this\\nbehavior is what they refer to. [Clippy](https://github.com/rust-lang/rust-clippy) will treat this\\nas an error, but it\'s still something to be aware of.\\n\\n### Initialization\\n\\nThe next thing to mention is that `const` values are loaded into memory _as part of your program\\nbinary_. Because of this, any `const` values declared in your program will be \\"realized\\" at\\ncompile-time; accessing them may trigger a main-memory lookup (with a fixed address, so your CPU may\\nbe able to prefetch the value), but that\'s it.\\n\\n```rust\\nuse std::cell::RefCell;\\n\\nconst CELL: RefCell = RefCell::new(24);\\n\\npub fn multiply(value: u32) -> u32 {\\n // CELL is stored at `.L__unnamed_1`\\n value * (*CELL.get_mut())\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/Th8boO)\\n\\nThe compiler creates one `RefCell`, uses it everywhere, and never needs to call the `RefCell::new`\\nfunction.\\n\\n### Copying\\n\\nIf it\'s helpful though, the compiler can choose to copy `const` values.\\n\\n```rust\\nconst FACTOR: u32 = 1000;\\n\\npub fn multiply(value: u32) -> u32 {\\n // See assembly line 4 for the `mov edi, 1000` instruction\\n value * FACTOR\\n}\\n\\npub fn multiply_twice(value: u32) -> u32 {\\n // See assembly lines 22 and 29 for `mov edi, 1000` instructions\\n value * FACTOR * FACTOR\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/ZtS54X)\\n\\nIn this example, the `FACTOR` value is turned into the `mov edi, 1000` instruction in both the\\n`multiply` and `multiply_twice` functions; the \\"1000\\" value is never \\"stored\\" anywhere, as it\'s\\nsmall enough to inline into the assembly instructions.\\n\\nFinally, getting the address of a `const` value is possible, but not guaranteed to be unique\\n(because the compiler can choose to copy values). I was unable to get non-unique pointers in my\\ntesting (even using different crates), but the specifications are clear enough: _don\'t rely on\\npointers to `const` values being consistent_. To be frank, caring about locations for `const` values\\nis almost certainly a code smell.\\n\\n## `static` values\\n\\nStatic variables are related to `const` variables, but take a slightly different approach. When we\\ndeclare that a _reference_ is unique for the life of a program, you have a `static` variable\\n(unrelated to the `\'static` lifetime). Because of the reference/value distinction with\\n`const`/`static`, static variables behave much more like typical \\"global\\" variables.\\n\\nBut to understand `static`, here\'s what we\'ll look at:\\n\\n- `static` variables are globally unique locations in memory.\\n- Like `const`, `static` variables are loaded at the same time as your program being read into\\n memory.\\n- All `static` variables must implement the\\n [`Sync`](https://doc.rust-lang.org/std/marker/trait.Sync.html) marker trait.\\n- Interior mutability is safe and acceptable when using `static` variables.\\n\\n### Memory Uniqueness\\n\\nThe single biggest difference between `const` and `static` is the guarantees provided about\\nuniqueness. Where `const` variables may or may not be copied in code, `static` variables are\\nguarantee to be unique. If we take a previous `const` example and change it to `static`, the\\ndifference should be clear:\\n\\n```rust\\nstatic FACTOR: u32 = 1000;\\n\\npub fn multiply(value: u32) -> u32 {\\n // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used\\n value * FACTOR\\n}\\n\\npub fn multiply_twice(value: u32) -> u32 {\\n // The assembly to `mul dword ptr [rip + example::FACTOR]` is how FACTOR gets used\\n value * FACTOR * FACTOR\\n}\\n```\\n\\n-- [Compiler Explorer](https://godbolt.org/z/uxmiRQ)\\n\\nWhere [previously](#copying) there were plenty of references to multiplying by 1000, the new\\nassembly refers to `FACTOR` as a named memory location instead. No initialization work needs to be\\ndone, but the compiler can no longer prove the value never changes during execution.\\n\\n### Initialization\\n\\nNext, let\'s talk about initialization. The simplest case is initializing static variables with\\neither scalar or struct notation:\\n\\n```rust\\n#[derive(Debug)]\\nstruct MyStruct {\\n x: u32\\n}\\n\\nstatic MY_STRUCT: MyStruct = MyStruct {\\n // You can even reference other statics\\n // declared later\\n x: MY_VAL\\n};\\n\\nstatic MY_VAL: u32 = 24;\\n\\nfn main() {\\n println!(\\"Static MyStruct: {:?}\\", MY_STRUCT);\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b538dbc46076f12db047af4f4403ee6e)\\n\\nThings can get a bit weirder when using `const fn` though. In most cases, it just works:\\n\\n```rust\\n#[derive(Debug)]\\nstruct MyStruct {\\n x: u32\\n}\\n\\nimpl MyStruct {\\n const fn new() -> MyStruct {\\n MyStruct { x: 24 }\\n }\\n}\\n\\nstatic MY_STRUCT: MyStruct = MyStruct::new();\\n\\nfn main() {\\n println!(\\"const fn Static MyStruct: {:?}\\", MY_STRUCT);\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8c796a6e7fc273c12115091b707b0255)\\n\\nHowever, there\'s a caveat: you\'re currently not allowed to use `const fn` to initialize static\\nvariables of types that aren\'t marked `Sync`. For example,\\n[`RefCell::new()`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#method.new) is a\\n`const fn`, but because\\n[`RefCell` isn\'t `Sync`](https://doc.rust-lang.org/std/cell/struct.RefCell.html#impl-Sync), you\'ll\\nget an error at compile time:\\n\\n```rust\\nuse std::cell::RefCell;\\n\\n// error[E0277]: `std::cell::RefCell` cannot be shared between threads safely\\nstatic MY_LOCK: RefCell = RefCell::new(0);\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=c76ef86e473d07117a1700e21fd45560)\\n\\nIt\'s likely that this will\\n[change in the future](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md) though.\\n\\n### The `Sync` marker\\n\\nWhich leads well to the next point: static variable types must implement the\\n[`Sync` marker](https://doc.rust-lang.org/std/marker/trait.Sync.html). Because they\'re globally\\nunique, it must be safe for you to access static variables from any thread at any time. Most\\n`struct` definitions automatically implement the `Sync` trait because they contain only elements\\nwhich themselves implement `Sync` (read more in the\\n[Nomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)). This is why earlier examples could\\nget away with initializing statics, even though we never included an `impl Sync for MyStruct` in the\\ncode. To demonstrate this property, Rust refuses to compile our earlier example if we add a\\nnon-`Sync` element to the `struct` definition:\\n\\n```rust\\nuse std::cell::RefCell;\\n\\nstruct MyStruct {\\n x: u32,\\n y: RefCell,\\n}\\n\\n// error[E0277]: `std::cell::RefCell` cannot be shared between threads safely\\nstatic MY_STRUCT: MyStruct = MyStruct {\\n x: 8,\\n y: RefCell::new(8)\\n};\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=40074d0248f056c296b662dbbff97cfc)\\n\\n### Interior mutability\\n\\nFinally, while `static mut` variables are allowed, mutating them is an `unsafe` operation. If we\\nwant to stay in `safe` Rust, we can use interior mutability to accomplish similar goals:\\n\\n```rust\\nuse std::sync::Once;\\n\\n// This example adapted from https://doc.rust-lang.org/std/sync/struct.Once.html#method.call_once\\nstatic INIT: Once = Once::new();\\n\\nfn main() {\\n // Note that while `INIT` is declared immutable, we\'re still allowed\\n // to mutate its interior\\n INIT.call_once(|| println!(\\"Initializing...\\"));\\n // This code won\'t panic, as the interior of INIT was modified\\n // as part of the previous `call_once`\\n INIT.call_once(|| panic!(\\"INIT was called twice!\\"));\\n}\\n```\\n\\n--\\n[Rust Playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3ba003a981a7ed7400240caadd384d59)"},{"id":"2019/02/understanding-allocations-in-rust","metadata":{"permalink":"/2019/02/understanding-allocations-in-rust","source":"@site/blog/2019-02-04-understanding-allocations-in-rust/index.mdx","title":"Allocations in Rust: Foreword","description":"There\'s an alchemy of distilling complex technical topics into articles and videos that change the","date":"2019-02-04T12:00:00.000Z","tags":[],"readingTime":3.785,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2019/02/understanding-allocations-in-rust","title":"Allocations in Rust: Foreword","date":"2019-02-04T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731204300000,"prevItem":{"title":"Allocations in Rust: Global memory","permalink":"/2019/02/the-whole-world"},"nextItem":{"title":"QADAPT - debug_assert! for allocations","permalink":"/2018/12/allocation-safety"}},"content":"There\'s an alchemy of distilling complex technical topics into articles and videos that change the\\nway programmers see the tools they interact with on a regular basis. I knew what a linker was, but\\nthere\'s a staggering amount of complexity in between\\n[the OS and `main()`](https://www.youtube.com/watch?v=dOfucXtyEsU). Rust programmers use the\\n[`Box`](https://doc.rust-lang.org/stable/std/boxed/struct.Box.html) type all the time, but there\'s a\\nrich history of the Rust language itself wrapped up in\\n[how special it is](https://manishearth.github.io/blog/2017/01/10/rust-tidbits-box-is-special/).\\n\\nIn a similar vein, this series attempts to look at code and understand how memory is used; the\\ncomplex choreography of operating system, compiler, and program that frees you to focus on\\nfunctionality far-flung from frivolous book-keeping. The Rust compiler relieves a great deal of the\\ncognitive burden associated with memory management, but we\'re going to step into its world for a\\nwhile.\\n\\nLet\'s learn a bit about memory in Rust.\\n\\n\x3c!-- truncate --\x3e\\n\\n---\\n\\nRust\'s three defining features of\\n[Performance, Reliability, and Productivity](https://www.rust-lang.org/) are all driven to a great\\ndegree by the how the Rust compiler understands memory usage. Unlike managed memory languages (Java,\\nPython), Rust\\n[doesn\'t really](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)\\ngarbage collect; instead, it uses an\\n[ownership](https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html) system to reason about\\nhow long objects will last in your program. In some cases, if the life of an object is fairly\\ntransient, Rust can make use of a very fast region called the \\"stack.\\" When that\'s not possible,\\nRust uses\\n[dynamic (heap) memory](https://en.wikipedia.org/wiki/Memory_management#Dynamic_memory_allocation)\\nand the ownership system to ensure you can\'t accidentally corrupt memory. It\'s not as fast, but it\\nis important to have available.\\n\\nThat said, there are specific situations in Rust where you\'d never need to worry about the\\nstack/heap distinction! If you:\\n\\n1. Never use `unsafe`\\n2. Never use `#![feature(alloc)]` or the [`alloc` crate](https://doc.rust-lang.org/alloc/index.html)\\n\\n...then it\'s not possible for you to use dynamic memory!\\n\\nFor some uses of Rust, typically embedded devices, these constraints are OK. They have very limited\\nmemory, and the program binary size itself may significantly affect what\'s available! There\'s no\\noperating system able to manage this\\n[\\"virtual memory\\"](https://en.wikipedia.org/wiki/Virtual_memory) thing, but that\'s not an issue\\nbecause there\'s only one running application. The\\n[embedonomicon](https://docs.rust-embedded.org/embedonomicon/preface.html) is ever in mind, and\\ninteracting with the \\"real world\\" through extra peripherals is accomplished by reading and writing\\nto [specific memory addresses](https://bob.cs.sonoma.edu/IntroCompOrg-RPi/sec-gpio-mem.html).\\n\\nMost Rust programs find these requirements overly burdensome though. C++ developers would struggle\\nwithout access to [`std::vector`](https://en.cppreference.com/w/cpp/container/vector) (except those\\nhardcore no-STL people), and Rust developers would struggle without\\n[`std::vec`](https://doc.rust-lang.org/std/vec/struct.Vec.html). But with the constraints above,\\n`std::vec` is actually a part of the\\n[`alloc` crate](https://doc.rust-lang.org/alloc/vec/struct.Vec.html), and thus off-limits. `Box`,\\n`Rc`, etc., are also unusable for the same reason.\\n\\nWhether writing code for embedded devices or not, the important thing in both situations is how much\\nyou know _before your application starts_ about what its memory usage will look like. In embedded\\ndevices, there\'s a small, fixed amount of memory to use. In a browser, you have no idea how large\\n[google.com](https://www.google.com)\'s home page is until you start trying to download it. The\\ncompiler uses this knowledge (or lack thereof) to optimize how memory is used; put simply, your code\\nruns faster when the compiler can guarantee exactly how much memory your program needs while it\'s\\nrunning. This series is all about understanding how the compiler reasons about your program, with an\\nemphasis on the implications for performance.\\n\\nNow let\'s address some conditions and caveats before going much further:\\n\\n- We\'ll focus on \\"safe\\" Rust only; `unsafe` lets you use platform-specific allocation API\'s\\n ([`malloc`](https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm)) that we\'ll\\n ignore.\\n- We\'ll assume a \\"debug\\" build of Rust code (what you get with `cargo run` and `cargo test`) and\\n address (pun intended) release mode at the end (`cargo run --release` and `cargo test --release`).\\n- All content will be run using Rust 1.32, as that\'s the highest currently supported in the\\n [Compiler Exporer](https://godbolt.org/). As such, we\'ll avoid upcoming innovations like\\n [compile-time evaluation of `static`](https://github.com/rust-lang/rfcs/blob/master/text/0911-const-fn.md)\\n that are available in nightly.\\n- Because of the nature of the content, being able to read assembly is helpful. We\'ll keep it\\n simple, but I [found](https://stackoverflow.com/a/4584131/1454178) a\\n [refresher](https://stackoverflow.com/a/26026278/1454178) on the `push` and `pop`\\n [instructions](http://www.cs.virginia.edu/~evans/cs216/guides/x86.html) was helpful while writing\\n this.\\n- I\'ve tried to be precise in saying only what I can prove using the tools (ASM, docs) that are\\n available, but if there\'s something said in error it will be corrected expeditiously. Please let\\n me know at [bradlee@speice.io](mailto:bradlee@speice.io)\\n\\nFinally, I\'ll do what I can to flag potential future changes but the Rust docs have a notice worth\\nrepeating:\\n\\n> Rust does not currently have a rigorously and formally defined memory model.\\n>\\n> -- [the docs](https://doc.rust-lang.org/std/ptr/fn.read_volatile.html)"},{"id":"2018/12/allocation-safety","metadata":{"permalink":"/2018/12/allocation-safety","source":"@site/blog/2018-12-15-allocation-safety/index.mdx","title":"QADAPT - debug_assert! for allocations","description":"I think it\'s part of the human condition to ignore perfectly good advice when it comes our way. A","date":"2018-12-15T12:00:00.000Z","tags":[],"readingTime":4.775,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/12/allocation-safety","title":"QADAPT - debug_assert! for allocations","date":"2018-12-15T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731204300000,"prevItem":{"title":"Allocations in Rust: Foreword","permalink":"/2019/02/understanding-allocations-in-rust"},"nextItem":{"title":"More \\"what companies really mean\\"","permalink":"/2018/12/what-small-business-really-means"}},"content":"I think it\'s part of the human condition to ignore perfectly good advice when it comes our way. A\\nbit over a month ago, I was dispensing sage wisdom for the ages:\\n\\n> I had a really great idea: build a custom allocator that allows you to track your own allocations.\\n> I gave it a shot, but learned very quickly: **never write your own allocator.**\\n>\\n> -- [me](/2018/10/case-study-optimization)\\n\\nI proceeded to ignore it, because we never really learn from our mistakes.\\n\\n\x3c!-- truncate --\x3e\\n\\nThere\'s another part of the human condition that derives joy from seeing things explode.\\n\\n\\n![Explosions](./watch-the-world-burn.webp)\\n\\n\\nAnd _that\'s_ the part I\'m going to focus on.\\n\\n## Why an Allocator?\\n\\nSo why, after complaining about allocators, would I still want to write one? There are three reasons\\nfor that:\\n\\n1. Allocation/dropping is slow\\n2. It\'s difficult to know exactly when Rust will allocate or drop, especially when using code that\\n you did not write\\n3. I want automated tools to verify behavior, instead of inspecting by hand\\n\\nWhen I say \\"slow,\\" it\'s important to define the terms. If you\'re writing web applications, you\'ll\\nspend orders of magnitude more time waiting for the database than you will the allocator. However,\\nthere\'s still plenty of code where micro- or nano-seconds matter; think\\n[finance](https://www.youtube.com/watch?v=NH1Tta7purM),\\n[real-time audio](https://www.reddit.com/r/rust/comments/9hg7yj/synthesizer_progress_update/e6c291f),\\n[self-driving cars](https://polysync.io/blog/session-types-for-hearty-codecs/), and\\n[networking](https://carllerche.github.io/bytes/bytes/index.html). In these situations it\'s simply\\nunacceptable for you to spend time doing things that are not your program, and waiting on the\\nallocator is not cool.\\n\\nAs I continue to learn Rust, it\'s difficult for me to predict where exactly allocations will happen.\\nSo, I propose we play a quick trivia game: **Does this code invoke the allocator?**\\n\\n### Example 1\\n\\n```rust\\nfn my_function() {\\n let v: Vec = Vec::new();\\n}\\n```\\n\\n**No**: Rust [knows how big](https://doc.rust-lang.org/std/mem/fn.size_of.html) the `Vec` type is,\\nand reserves a fixed amount of memory on the stack for the `v` vector. However, if we wanted to\\nreserve extra space (using `Vec::with_capacity`) the allocator would get invoked.\\n\\n### Example 2\\n\\n```rust\\nfn my_function() {\\n let v: Box> = Box::new(Vec::new());\\n}\\n```\\n\\n**Yes**: Because Boxes allow us to work with things that are of unknown size, it has to allocate on\\nthe heap. While the `Box` is unnecessary in this snippet (release builds will optimize out the\\nallocation), reserving heap space more generally is needed to pass a dynamically sized type to\\nanother function.\\n\\n### Example 3\\n\\n```rust\\nfn my_function(v: Vec) {\\n v.push(5);\\n}\\n```\\n\\n**Maybe**: Depending on whether the Vector we were given has space available, we may or may not\\nallocate. Especially when dealing with code that you did not author, it\'s difficult to verify that\\nthings behave as you expect them to.\\n\\n## Blowing Things Up\\n\\nSo, how exactly does QADAPT solve these problems? **Whenever an allocation or drop occurs in code\\nmarked allocation-safe, QADAPT triggers a thread panic.** We don\'t want to let the program continue\\nas if nothing strange happened, _we want things to explode_.\\n\\nHowever, you don\'t want code to panic in production because of circumstances you didn\'t predict.\\nJust like [`debug_assert!`](https://doc.rust-lang.org/std/macro.debug_assert.html), **QADAPT will\\nstrip out its own code when building in release mode to guarantee no panics and no performance\\nimpact.**\\n\\nFinally, there are three ways to have QADAPT check that your code will not invoke the allocator:\\n\\n### Using a procedural macro\\n\\nThe easiest method, watch an entire function for allocator invocation:\\n\\n```rust\\nuse qadapt::no_alloc;\\nuse qadapt::QADAPT;\\n\\n#[global_allocator]\\nstatic Q: QADAPT = QADAPT;\\n\\n#[no_alloc]\\nfn push_vec(v: &mut Vec) {\\n // This triggers a panic if v.len() == v.capacity()\\n v.push(5);\\n}\\n\\nfn main() {\\n let v = Vec::with_capacity(1);\\n\\n // This will *not* trigger a panic\\n push_vec(&v);\\n\\n // This *will* trigger a panic\\n push_vec(&v);\\n}\\n```\\n\\n### Using a regular macro\\n\\nFor times when you need more precision:\\n\\n```rust\\nuse qadapt::assert_no_alloc;\\nuse qadapt::QADAPT;\\n\\n#[global_allocator]\\nstatic Q: QADAPT = QADAPT;\\n\\nfn main() {\\n let v = Vec::with_capacity(1);\\n\\n // No allocations here, we already have space reserved\\n assert_no_alloc!(v.push(5));\\n\\n // Even though we remove an item, it doesn\'t trigger a drop\\n // because it\'s a scalar. If it were a `Box<_>` type,\\n // a drop would trigger.\\n assert_no_alloc!({\\n v.pop().unwrap();\\n });\\n}\\n```\\n\\n### Using function calls\\n\\nBoth the most precise and most tedious:\\n\\n```rust\\nuse qadapt::enter_protected;\\nuse qadapt::exit_protected;\\nuse qadapt::QADAPT;\\n\\n#[global_allocator]\\nstatic Q: QADAPT = QADAPT;\\n\\nfn main() {\\n // This triggers an allocation (on non-release builds)\\n let v = Vec::with_capacity(1);\\n\\n enter_protected();\\n // This does not trigger an allocation because we\'ve reserved size\\n v.push(0);\\n exit_protected();\\n\\n // This triggers an allocation because we ran out of size,\\n // but doesn\'t panic because we\'re no longer protected.\\n v.push(1);\\n}\\n```\\n\\n### Caveats\\n\\nIt\'s important to point out that QADAPT code is synchronous, so please be careful when mixing in\\nasynchronous functions:\\n\\n```rust\\nuse futures::future::Future;\\nuse futures::future::ok;\\n\\n#[no_alloc]\\nfn async_capacity() -> impl Future, Error=()> {\\n ok(12).and_then(|e| Ok(Vec::with_capacity(e)))\\n}\\n\\nfn main() {\\n // This doesn\'t trigger a panic because the `and_then` closure\\n // wasn\'t run during the function call.\\n async_capacity();\\n\\n // Still no panic\\n assert_no_alloc!(async_capacity());\\n\\n // This will panic because the allocation happens during `unwrap`\\n // in the `assert_no_alloc!` macro\\n assert_no_alloc!(async_capacity().poll().unwrap());\\n}\\n```\\n\\n## Conclusion\\n\\nWhile there\'s a lot more to writing high-performance code than managing your usage of the allocator,\\nit\'s critical that you do use the allocator correctly. QADAPT will verify that your code is doing\\nwhat you expect. It\'s usable even on stable Rust from version 1.31 onward, which isn\'t the case for\\nmost allocators. Version 1.0 was released today, and you can check it out over at\\n[crates.io](https://crates.io/crates/qadapt) or on [github](https://github.com/bspeice/qadapt).\\n\\nI\'m hoping to write more about high-performance Rust in the future, and I expect that QADAPT will\\nhelp guide that. If there are topics you\'re interested in, let me know in the comments below!\\n\\n[qadapt]: https://crates.io/crates/qadapt"},{"id":"2018/12/what-small-business-really-means","metadata":{"permalink":"/2018/12/what-small-business-really-means","source":"@site/blog/2018-12-04-what-small-business-really-means/index.mdx","title":"More \\"what companies really mean\\"","description":"I recently stumbled across a phenomenal small article entitled","date":"2018-12-04T12:00:00.000Z","tags":[],"readingTime":1.205,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/12/what-small-business-really-means","title":"More \\"what companies really mean\\"","date":"2018-12-04T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731190109000,"prevItem":{"title":"QADAPT - debug_assert! for allocations","permalink":"/2018/12/allocation-safety"},"nextItem":{"title":"A case study in heaptrack","permalink":"/2018/10/case-study-optimization"}},"content":"I recently stumbled across a phenomenal small article entitled\\n[What Startups Really Mean By \\"Why Should We Hire You?\\"](https://angel.co/blog/what-startups-really-mean-by-why-should-we-hire-you).\\nHaving been interviewed by smaller companies (though not exactly startups), the questions and\\nsubtexts are the same. There\'s often a question behind the question that you\'re actually trying to\\nanswer, and I wish I spotted the nuance earlier in my career.\\n\\nLet me also make note of one more question/euphemism I\'ve come across:\\n\\n\x3c!-- truncate --\x3e\\n\\n## How do you feel about production support?\\n\\n**Translation**: _We\'re a fairly small team, and when things break on an evening/weekend/Christmas\\nDay, can we call on you to be there?_\\n\\nI\'ve met decidedly few people in my life who truly enjoy the \\"ops\\" side of \\"devops\\". They\'re\\nincredibly good at taking an impossible problem, pre-existing knowledge of arcane arts, and turning\\nthat into a functioning system at the end. And if they all left for lunch, we probably wouldn\'t make\\nit out the door before the zombie apocalypse.\\n\\nLarger organizations (in my experience, 500+ person organizations) have the luxury of hiring people\\nwho either enjoy that, or play along nicely enough that our systems keep working.\\n\\nSmall teams have no such luck. If you\'re interviewing at a small company, especially as a \\"data\\nscientist\\" or other somesuch position, be aware that systems can and do spontaneously combust at the\\nmost inopportune moments.\\n\\n**Terrible-but-popular answers include**: _It\'s a part of the job, and I\'m happy to contribute._"},{"id":"2018/10/case-study-optimization","metadata":{"permalink":"/2018/10/case-study-optimization","source":"@site/blog/2018-10-08-case-study-optimization/index.mdx","title":"A case study in heaptrack","description":"I remember early in my career someone joking that:","date":"2018-10-08T12:00:00.000Z","tags":[],"readingTime":4.26,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/10/case-study-optimization","title":"A case study in heaptrack","date":"2018-10-08T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731189722000,"prevItem":{"title":"More \\"what companies really mean\\"","permalink":"/2018/12/what-small-business-really-means"},"nextItem":{"title":"Isomorphic desktop apps with Rust","permalink":"/2018/09/isomorphic-apps"}},"content":"I remember early in my career someone joking that:\\n\\n> Programmers have it too easy these days. They should learn to develop in low memory environments\\n> and be more efficient.\\n\\n...though it\'s not like the first code I wrote was for a\\n[graphing calculator](https://web.archive.org/web/20180924060530/https://education.ti.com/en/products/calculators/graphing-calculators/ti-84-plus-se)\\npacking a whole 24KB of RAM.\\n\\nBut the principle remains: be efficient with the resources you have, because\\n[what Intel giveth, Microsoft taketh away](http://exo-blog.blogspot.com/2007/09/what-intel-giveth-microsoft-taketh-away.html).\\n\\n\x3c!-- truncate --\x3e\\n\\nMy professional work is focused on this kind of efficiency; low-latency financial markets demand\\nthat you understand at a deep level _exactly_ what your code is doing. As I continue experimenting\\nwith Rust for personal projects, it\'s exciting to bring a utilitarian mindset with me: there\'s\\nflexibility for the times I pretend to have a garbage collector, and flexibility for the times that\\nI really care about how memory is used.\\n\\nThis post is a (small) case study in how I went from the former to the latter. And ultimately, it\'s\\nintended to be a starting toolkit to empower analysis of your own code.\\n\\n## Curiosity\\n\\nWhen I first started building the [dtparse] crate, my intention was to mirror as closely as possible\\nthe equivalent [Python library][dateutil]. Python, as you may know, is garbage collected. Very\\nrarely is memory usage considered in Python, and I likewise wasn\'t paying too much attention when\\n`dtparse` was first being built.\\n\\nThis lackadaisical approach to memory works well enough, and I\'m not planning on making `dtparse`\\nhyper-efficient. But every so often, I\'ve wondered: \\"what exactly is going on in memory?\\" With the\\nadvent of Rust 1.28 and the\\n[Global Allocator trait](https://doc.rust-lang.org/std/alloc/trait.GlobalAlloc.html), I had a really\\ngreat idea: _build a custom allocator that allows you to track your own allocations._ That way, you\\ncan do things like writing tests for both correct results and correct memory usage. I gave it a\\n[shot][qadapt], but learned very quickly: **never write your own allocator**. It went from \\"fun\\nweekend project\\" to \\"I have literally no idea what my computer is doing\\" at breakneck speed.\\n\\nInstead, I\'ll highlight a separate path I took to make sense of my memory usage: [heaptrack].\\n\\n## Turning on the System Allocator\\n\\nThis is the hardest part of the post. Because Rust uses\\n[its own allocator](https://github.com/rust-lang/rust/pull/27400#issue-41256384) by default,\\n`heaptrack` is unable to properly record unmodified Rust code. To remedy this, we\'ll make use of the\\n`#[global_allocator]` attribute.\\n\\nSpecifically, in `lib.rs` or `main.rs`, add this:\\n\\n```rust\\nuse std::alloc::System;\\n\\n#[global_allocator]\\nstatic GLOBAL: System = System;\\n```\\n\\n...and that\'s it. Everything else comes essentially for free.\\n\\n## Running heaptrack\\n\\nAssuming you\'ve installed heaptrack (Homebrew in Mac, package manager\\nin Linux, ??? in Windows), all that\'s left is to fire up your application:\\n\\n```\\nheaptrack my_application\\n```\\n\\nIt\'s that easy. After the program finishes, you\'ll see a file in your local directory with a name\\nlike `heaptrack.my_appplication.XXXX.gz`. If you load that up in `heaptrack_gui`, you\'ll see\\nsomething like this:\\n\\n![heaptrack](./heaptrack-before.png)\\n\\n---\\n\\nAnd even these pretty colors:\\n\\n![pretty colors](./heaptrack-flamegraph.png)\\n\\n## Reading Flamegraphs\\n\\nTo make sense of our memory usage, we\'re going to focus on that last picture - it\'s called a\\n[\\"flamegraph\\"](http://www.brendangregg.com/flamegraphs.html). These charts are typically used to\\nshow how much time your program spends executing each function, but they\'re used here to show how\\nmuch memory was allocated during those functions instead.\\n\\nFor example, we can see that all executions happened during the `main` function:\\n\\n![allocations in main](./heaptrack-main-colorized.png)\\n\\n...and within that, all allocations happened during `dtparse::parse`:\\n\\n![allocations in dtparse](./heaptrack-dtparse-colorized.png)\\n\\n...and within _that_, allocations happened in two different places:\\n\\n![allocations in parseinfo](./heaptrack-parseinfo-colorized.png)\\n\\nNow I apologize that it\'s hard to see, but there\'s one area specifically that stuck out as an issue:\\n**what the heck is the `Default` thing doing?**\\n\\n![pretty colors](./heaptrack-flamegraph-default.png)\\n\\n## Optimizing dtparse\\n\\nSee, I knew that there were some allocations during calls to `dtparse::parse`, but I was totally\\nwrong about where the bulk of allocations occurred in my program. Let me post the code and see if\\nyou can spot the mistake:\\n\\n```rust\\n/// Main entry point for using `dtparse`.\\npub fn parse(timestr: &str) -> ParseResult<(NaiveDateTime, Option)> {\\n let res = Parser::default().parse(\\n timestr, None, None, false, false,\\n None, false,\\n &HashMap::new(),\\n )?;\\n\\n Ok((res.0, res.1))\\n}\\n```\\n\\n> [dtparse](https://github.com/bspeice/dtparse/blob/4d7c5dd99572823fa4a390b483c38ab020a2172f/src/lib.rs#L1286)\\n\\n---\\n\\nBecause `Parser::parse` requires a mutable reference to itself, I have to create a new\\n`Parser::default` every time it receives a string. This is excessive! We\'d rather have an immutable\\nparser that can be re-used, and avoid allocating memory in the first place.\\n\\nArmed with that information, I put some time in to\\n[make the parser immutable](https://github.com/bspeice/dtparse/commit/741afa34517d6bc1155713bbc5d66905fea13fad#diff-b4aea3e418ccdb71239b96952d9cddb6).\\nNow that I can re-use the same parser over and over, the allocations disappear:\\n\\n![allocations cleaned up](./heaptrack-flamegraph-after.png)\\n\\nIn total, we went from requiring 2 MB of memory in\\n[version 1.0.2](https://crates.io/crates/dtparse/1.0.2):\\n\\n![memory before](./heaptrack-closeup.png)\\n\\nAll the way down to 300KB in [version 1.0.3](https://crates.io/crates/dtparse/1.0.3):\\n\\n![memory after](./heaptrack-closeup-after.png)\\n\\n## Conclusion\\n\\nIn the end, you don\'t need to write a custom allocator to be efficient with memory, great tools\\nalready exist to help you understand what your program is doing.\\n\\n**Use them.**\\n\\nGiven that [Moore\'s Law](https://en.wikipedia.org/wiki/Moore%27s_law) is\\n[dead](https://www.technologyreview.com/s/601441/moores-law-is-dead-now-what/), we\'ve all got to do\\nour part to take back what Microsoft stole.\\n\\n[dtparse]: https://crates.io/crates/dtparse\\n[dateutil]: https://github.com/dateutil/dateutil\\n[heaptrack]: https://github.com/KDE/heaptrack\\n[qadapt]: https://crates.io/crates/qadapt"},{"id":"2018/09/isomorphic-apps","metadata":{"permalink":"/2018/09/isomorphic-apps","source":"@site/blog/2018-09-15-isomorphic-apps/index.mdx","title":"Isomorphic desktop apps with Rust","description":"I both despise Javascript and am stunned by its success doing some really cool things. It\'s","date":"2018-09-15T12:00:00.000Z","tags":[],"readingTime":9.905,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/09/isomorphic-apps","title":"Isomorphic desktop apps with Rust","date":"2018-09-15T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731188450000,"prevItem":{"title":"A case study in heaptrack","permalink":"/2018/10/case-study-optimization"},"nextItem":{"title":"Primitives in Rust are weird (and cool)","permalink":"/2018/09/primitives-in-rust-are-weird"}},"content":"I both despise Javascript and am stunned by its success doing some really cool things. It\'s\\n[this duality](https://www.destroyallsoftware.com/talks/the-birth-and-death-of-javascript) that\'s\\nled me to a couple of (very) late nights over the past weeks trying to reconcile myself as I\\nbootstrap a simple desktop application.\\n\\n\x3c!-- truncate --\x3e\\n\\nSee, as much as\\n[Webassembly isn\'t trying to replace Javascript](https://webassembly.org/docs/faq/#is-webassembly-trying-to-replace-javascript),\\n**I want Javascript gone**. There are plenty of people who don\'t share my views, and they are\\nprobably nicer and more fun at parties. But I cringe every time \\"Webpack\\" is mentioned, and I think\\nit\'s hilarious that the\\n[language specification](https://ecma-international.org/publications/standards/Ecma-402.htm)\\ndramatically outpaces anyone\'s\\n[actual implementation](https://kangax.github.io/compat-table/es2016plus/). The answer to this\\nconundrum is of course to recompile code from newer versions of the language to older versions _of\\nthe same language_ before running. At least [Babel] is a nice tongue-in-cheek reference.\\n\\nYet for as much hate as [Electron] receives, it does a stunningly good job at solving a really hard\\nproblem: _how the hell do I put a button on the screen and react when the user clicks it_? GUI\\nprogramming is hard, straight up. But if browsers are already able to run everywhere, why don\'t we\\ntake advantage of someone else solving the hard problems for us? I don\'t like that I have to use\\nJavascript for it, but I really don\'t feel inclined to whip out good ol\' [wxWidgets].\\n\\nNow there are other native solutions ([libui-rs], [conrod], [oh hey wxWdidgets again!][wxrust]), but\\nthose also have their own issues with distribution, styling, etc. With Electron, I can\\n`yarn create electron-app my-app` and just get going, knowing that packaging/upgrades/etc. are built\\nin.\\n\\nMy question is: given recent innovations with WASM, _are we Electron yet_?\\n\\nNo, not really.\\n\\nInstead, **what would it take to get to a point where we can skip Javascript in Electron apps?**\\n\\n# Setting the Stage\\n\\nTruth is, WASM/Webassembly is a pretty new technology and I\'m a total beginner in this area. There\\nmay already be solutions to the issues I discuss, but I\'m totally unaware of them, so I\'m going to\\ntry and organize what I did manage to discover.\\n\\nI should also mention that the content and things I\'m talking about here are not intended to be\\nprescriptive, but more \\"if someone else is interested, what do we already know doesn\'t work?\\" _I\\nexpect everything in this post to be obsolete within two months._ Even over the course of writing\\nthis, [a separate blog post](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/) had\\nto be modified because [upstream changes](https://github.com/WebAssembly/binaryen/pull/1642) broke a\\n[Rust tool](https://github.com/rustwasm/wasm-bindgen/pull/787) the post tried to use. The post\\nultimately\\n[got updated](https://mnt.io/2018/08/28/from-rust-to-beyond-the-asm-js-galaxy/#comment-477), **but\\nall this happened within the span of a week.** Things are moving quickly.\\n\\nI\'ll also note that we\'re going to skip [asm.js] and [emscripten]. Truth be told, I couldn\'t get\\neither of these to output anything, and so I\'m just going to say\\n[here be dragons.](https://en.wikipedia.org/wiki/Here_be_dragons) Everything I\'m discussing here\\nuses the `wasm32-unknown-unknown` target.\\n\\nThe code that I _did_ get running is available\\n[over here](https://github.com/speice-io/isomorphic-rust). Feel free to use it as a starting point,\\nbut I\'m mostly including the link as a reference for the things that were attempted.\\n\\n# An Example Running Application\\n\\nSo, I did _technically_ get a running application:\\n\\n![Electron app using WASM](./electron-percy-wasm.png)\\n\\n...which you can also try out if you want:\\n\\n```sh\\ngit clone https://github.com/speice-io/isomorphic-rust.git\\ncd isomorphic_rust/percy\\nyarn install && yarn start\\n```\\n\\n...but I wouldn\'t really call it a \\"high quality\\" starting point to base future work on. It\'s mostly\\nthere to prove this is possible in the first place. And that\'s something to be proud of! There\'s a\\nhuge amount of engineering that went into showing a window with the text \\"It\'s alive!\\".\\n\\nThere\'s also a lot of usability issues that prevent me from recommending anyone try Electron and\\nWASM apps at the moment, and I think that\'s the more important thing to discuss.\\n\\n# Issue the First: Complicated Toolchains\\n\\nI quickly established that [wasm-bindgen] was necessary to \\"link\\" my Rust code to Javascript. At\\nthat point you\'ve got an Electron app that starts an HTML page which ultimately fetches your WASM\\nblob. To keep things simple, the goal was to package everything using [webpack] so that I could just\\nload a `bundle.js` file on the page. That decision was to be the last thing that kinda worked in\\nthis process.\\n\\nThe first issue\\n[I ran into](https://www.reddit.com/r/rust/comments/98lpun/unable_to_load_wasm_for_electron_application/)\\nwhile attempting to bundle everything via `webpack` is a detail in the WASM spec:\\n\\n> This function accepts a Response object, or a promise for one, and ... **[if > it] does not match\\n> the `application/wasm` MIME type**, the returned promise will be rejected with a TypeError;\\n>\\n> [WebAssembly - Additional Web Embedding API](https://webassembly.org/docs/web/#additional-web-embedding-api)\\n\\nSpecifically, if you try and load a WASM blob without the MIME type set, you\'ll get an error. On the\\nweb this isn\'t a huge issue, as the server can set MIME types when delivering the blob. With\\nElectron, you\'re resolving things with a `file://` URL and thus can\'t control the MIME type:\\n\\n![TypeError: Incorrect response MIME type. Expected \'application/wasm\'.](./incorrect-MIME-type.png)\\n\\nThere are a couple of solutions depending on how far into the deep end you care to venture:\\n\\n- Embed a static file server in your Electron application\\n- Use a [custom protocol](https://electronjs.org/docs/api/protocol) and custom protocol handler\\n- Host your WASM blob on a website that you resolve at runtime\\n\\nBut all these are pretty bad solutions and defeat the purpose of using WASM in the first place.\\nInstead, my workaround was to\\n[open a PR with `webpack`](https://github.com/webpack/webpack/issues/7918) and use regex to remove\\ncalls to `instantiateStreaming` in the\\n[build script](https://github.com/speice-io/isomorphic-rust/blob/master/percy/build.sh#L21-L25):\\n\\n```sh\\ncargo +nightly build --target=wasm32-unknown-unknown && \\\\\\n wasm-bindgen \\"$WASM_DIR/debug/$WASM_NAME.wasm\\" --out-dir \\"$APP_DIR\\" --no-typescript && \\\\\\n # Have to use --mode=development so we can patch out the call to instantiateStreaming\\n \\"$DIR/node_modules/webpack-cli/bin/cli.js\\" --mode=development \\"$APP_DIR/app_loader.js\\" -o \\"$APP_DIR/bundle.js\\" && \\\\\\n sed -i \'s/.*instantiateStreaming.*//g\' \\"$APP_DIR/bundle.js\\"\\n```\\n\\nOnce that lands, the\\n[build process](https://github.com/speice-io/isomorphic-rust/blob/master/percy_patched_webpack/build.sh#L24-L27)\\nbecomes much simpler:\\n\\n```sh\\n\\ncargo +nightly build --target=wasm32-unknown-unknown && \\\\\\n wasm-bindgen \\"$WASM_DIR/debug/$WASM_NAME.wasm\\" --out-dir \\"$APP_DIR\\" --no-typescript && \\\\\\n \\"$DIR/node_modules/webpack-cli/bin/cli.js\\" --mode=production \\"$APP_DIR/app_loader.js\\" -o \\"$APP_DIR/bundle.js\\"\\n```\\n\\nBut we\'re not done yet! After we compile Rust into WASM and link WASM to Javascript (via\\n`wasm-bindgen` and `webpack`), we still have to make an Electron app. For this purpose I used a\\nstarter app from [Electron Forge], and then a\\n[`prestart` script](https://github.com/speice-io/isomorphic-rust/blob/master/percy/package.json#L8)\\nto actually handle starting the application.\\n\\nThe\\n[final toolchain](https://github.com/speice-io/isomorphic-rust/blob/master/percy/package.json#L8)\\nlooks something like this:\\n\\n- `yarn start` triggers the `prestart` script\\n- `prestart` checks for missing tools (`wasm-bindgen-cli`, etc.) and then:\\n - Uses `cargo` to compile the Rust code into WASM\\n - Uses `wasm-bindgen` to link the WASM blob into a Javascript file with exported symbols\\n - Uses `webpack` to bundle the page start script with the Javascript we just generated\\n - Uses `babel` under the hood to compile the `wasm-bindgen` code down from ES6 into something\\n browser-compatible\\n- The `start` script runs an Electron Forge handler to do some sanity checks\\n- Electron actually starts\\n\\n...which is complicated. I think more work needs to be done to either build a high-quality starter\\napp that can manage these steps, or another tool that \\"just handles\\" the complexity of linking a\\ncompiled WASM file into something the Electron browser can run.\\n\\n# Issue the Second: WASM tools in Rust\\n\\nFor as much as I didn\'t enjoy the Javascript tooling needed to interface with Rust, the Rust-only\\nbits aren\'t any better at the moment. I get it, a lot of projects are just starting off, and that\\nleads to a fragmented ecosystem. Here\'s what I can recommend as a starting point:\\n\\nDon\'t check in your `Cargo.lock` files to version control. If there\'s a disagreement between the\\nversion of `wasm-bindgen-cli` you have installed and the `wasm-bindgen` you\'re compiling with in\\n`Cargo.lock`, you get a nasty error:\\n\\n```\\nit looks like the Rust project used to create this wasm file was linked against\\na different version of wasm-bindgen than this binary:\\n\\nrust wasm file: 0.2.21\\n this binary: 0.2.17\\n\\nCurrently the bindgen format is unstable enough that these two version must\\nexactly match, so it\'s required that these two version are kept in sync by\\neither updating the wasm-bindgen dependency or this binary.\\n```\\n\\nNot that I ever managed to run into this myself (_coughs nervously_).\\n\\nThere are two projects attempting to be \\"application frameworks\\": [percy] and [yew]. Between those,\\nI managed to get [two](https://github.com/speice-io/isomorphic-rust/tree/master/percy)\\n[examples](https://github.com/speice-io/isomorphic-rust/tree/master/percy_patched_webpack) running\\nusing `percy`, but was unable to get an\\n[example](https://github.com/speice-io/isomorphic-rust/tree/master/yew) running with `yew` because\\nof issues with \\"missing modules\\" during the `webpack` step:\\n\\n```sh\\nERROR in ./dist/electron_yew_wasm_bg.wasm\\nModule not found: Error: Can\'t resolve \'env\' in \'/home/bspeice/Development/isomorphic_rust/yew/dist\'\\n @ ./dist/electron_yew_wasm_bg.wasm\\n @ ./dist/electron_yew_wasm.js\\n @ ./dist/app.js\\n @ ./dist/app_loader.js\\n```\\n\\nIf you want to work with the browser APIs directly, your choices are [percy-webapis] or [stdweb] (or\\neventually [web-sys]). See above for my `percy` examples, but when I tried\\n[an example with `stdweb`](https://github.com/speice-io/isomorphic-rust/tree/master/stdweb), I was\\nunable to get it running:\\n\\n```sh\\nERROR in ./dist/stdweb_electron_bg.wasm\\nModule not found: Error: Can\'t resolve \'env\' in \'/home/bspeice/Development/isomorphic_rust/stdweb/dist\'\\n @ ./dist/stdweb_electron_bg.wasm\\n @ ./dist/stdweb_electron.js\\n @ ./dist/app_loader.js\\n```\\n\\nAt this point I\'m pretty convinced that `stdweb` is causing issues for `yew` as well, but can\'t\\nprove it.\\n\\nI did also get a [minimal example](https://github.com/speice-io/isomorphic-rust/tree/master/minimal)\\nrunning that doesn\'t depend on any tools besides `wasm-bindgen`. However, it requires manually\\nwriting \\"`extern C`\\" blocks for everything you need from the browser. Es no bueno.\\n\\nFinally, from a tools and platform view, there are two up-and-coming packages that should be\\nmentioned: [js-sys] and [web-sys]. Their purpose is to be fundamental building blocks that exposes\\nthe browser\'s APIs to Rust. If you\'re interested in building an app framework from scratch, these\\nshould give you the most flexibility. I didn\'t touch either in my research, though I expect them to\\nbe essential long-term.\\n\\nSo there\'s a lot in play from the Rust side of things, and it\'s just going to take some time to\\nfigure out what works and what doesn\'t.\\n\\n# Issue the Third: Known Unknowns\\n\\nAlright, so after I managed to get an application started, I stopped there. It was a good deal of\\neffort to chain together even a proof of concept, and at this point I\'d rather learn [Typescript]\\nthan keep trying to maintain an incredibly brittle pipeline. Blasphemy, I know...\\n\\nThe important point I want to make is that there\'s a lot unknown about how any of this holds up\\noutside proofs of concept. Things I didn\'t attempt:\\n\\n- Testing\\n- Packaging\\n- Updates\\n- Literally anything related to why I wanted to use Electron in the first place\\n\\n# What it Would Take\\n\\nMuch as I don\'t like Javascript, the tools are too shaky for me to recommend mixing Electron and\\nWASM at the moment. There\'s a lot of innovation happening, so who knows? Someone might have an\\napplication in production a couple months from now. But at the moment, I\'m personally going to stay\\naway.\\n\\nLet\'s finish with a wishlist then - here are the things that I think need to happen before\\nElectron/WASM/Rust can become a thing:\\n\\n- Webpack still needs some updates. The necessary work is in progress, but hasn\'t landed yet\\n ([#7983](https://github.com/webpack/webpack/pull/7983))\\n- Browser API libraries (`web-sys` and `stdweb`) need to make sure they can support running in\\n Electron (see module error above)\\n- Projects need to stabilize. There\'s talk of `stdweb` being turned into a Rust API\\n [on top of web-sys](https://github.com/rustwasm/team/issues/226#issuecomment-418475778), and percy\\n [moving to web-sys](https://github.com/chinedufn/percy/issues/24), both of which are big changes\\n- `wasm-bindgen` is great, but still in the \\"move fast and break things\\" phase\\n- A good \\"boilerplate\\" app would dramatically simplify the start-up costs;\\n [electron-react-boilerplate](https://github.com/chentsulin/electron-react-boilerplate) comes to\\n mind as a good project to imitate\\n- More blog posts/contributors! I think Electron + Rust could be cool, but I have no idea what I\'m\\n doing\\n\\n[wxwidgets]: https://wxwidgets.org/\\n[libui-rs]: https://github.com/LeoTindall/libui-rs/\\n[electron]: https://electronjs.org/\\n[babel]: https://babeljs.io/\\n[wxrust]: https://github.com/kenz-gelsoft/wxRust\\n[wasm-bindgen]: https://github.com/rustwasm/wasm-bindgen\\n[js-sys]: https://crates.io/crates/js-sys\\n[percy-webapis]: https://crates.io/crates/percy-webapis\\n[stdweb]: https://crates.io/crates/stdweb\\n[web-sys]: https://crates.io/crates/web-sys\\n[percy]: https://chinedufn.github.io/percy/\\n[virtual-dom-rs]: https://crates.io/crates/virtual-dom-rs\\n[yew]: https://github.com/DenisKolodin/yew\\n[react]: https://reactjs.org/\\n[elm]: http://elm-lang.org/\\n[asm.js]: http://asmjs.org/\\n[emscripten]: https://kripken.github.io/emscripten-site/\\n[typescript]: https://www.typescriptlang.org/\\n[electron forge]: https://electronforge.io/\\n[conrod]: https://github.com/PistonDevelopers/conrod\\n[webpack]: https://webpack.js.org/"},{"id":"2018/09/primitives-in-rust-are-weird","metadata":{"permalink":"/2018/09/primitives-in-rust-are-weird","source":"@site/blog/2018-09-01-primitives-in-rust-are-weird/index.mdx","title":"Primitives in Rust are weird (and cool)","description":"I wrote a really small Rust program a while back because I was curious. I was 100% convinced it","date":"2018-09-01T12:00:00.000Z","tags":[],"readingTime":6.945,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/09/primitives-in-rust-are-weird","title":"Primitives in Rust are weird (and cool)","date":"2018-09-01T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731187596000,"prevItem":{"title":"Isomorphic desktop apps with Rust","permalink":"/2018/09/isomorphic-apps"},"nextItem":{"title":"What I learned porting dateutil to Rust","permalink":"/2018/06/dateutil-parser-to-rust"}},"content":"I wrote a really small Rust program a while back because I was curious. I was 100% convinced it\\ncouldn\'t possibly run:\\n\\n```rust\\nfn main() {\\n println!(\\"{}\\", 8.to_string())\\n}\\n```\\n\\nAnd to my complete befuddlement, it compiled, ran, and produced a completely sensible output.\\n\\n\x3c!-- truncate --\x3e\\n\\nThe reason I was so surprised has to do with how Rust treats a special category of things I\'m going to\\ncall _primitives_. In the current version of the Rust book, you\'ll see them referred to as\\n[scalars][rust_scalar], and in older versions they\'ll be called [primitives][rust_primitive], but\\nwe\'re going to stick with the name _primitive_ for the time being. Explaining why this program is so\\ncool requires talking about a number of other programming languages, and keeping a consistent\\nterminology makes things easier.\\n\\n**You\'ve been warned:** this is going to be a tedious post about a relatively minor issue that\\ninvolves Java, Python, C, and x86 Assembly. And also me pretending like I know what I\'m talking\\nabout with assembly.\\n\\n## Defining primitives (Java)\\n\\nThe reason I\'m using the name _primitive_ comes from how much of my life is Java right now. For the most part I like Java, but I digress. In Java, there\'s a special\\nname for some specific types of values:\\n\\n> ```\\n> bool char byte\\n> short int long\\n> float double\\n> ```\\n\\nThey are referred to as [primitives][java_primitive]. And relative to the other bits of Java,\\nthey have two unique features. First, they don\'t have to worry about the\\n[billion-dollar mistake](https://en.wikipedia.org/wiki/Tony_Hoare#Apologies_and_retractions);\\nprimitives in Java can never be `null`. Second: *they can\'t have instance methods*.\\nRemember that Rust program from earlier? Java has no idea what to do with it:\\n\\n```java\\nclass Main {\\n public static void main(String[] args) {\\n int x = 8;\\n System.out.println(x.toString()); // Triggers a compiler error\\n }\\n}\\n````\\n\\nThe error is:\\n\\n```\\nMain.java:5: error: int cannot be dereferenced\\n System.out.println(x.toString());\\n ^\\n1 error\\n```\\n\\nSpecifically, Java\'s [`Object`](https://docs.oracle.com/javase/10/docs/api/java/lang/Object.html)\\nand things that inherit from it are pointers under the hood, and we have to dereference them before\\nthe fields and methods they define can be used. In contrast, _primitive types are just values_ -\\nthere\'s nothing to be dereferenced. In memory, they\'re just a sequence of bits.\\n\\nIf we really want, we can turn the `int` into an\\n[`Integer`](https://docs.oracle.com/javase/10/docs/api/java/lang/Integer.html) and then dereference\\nit, but it\'s a bit wasteful:\\n\\n```java\\nclass Main {\\n public static void main(String[] args) {\\n int x = 8;\\n Integer y = Integer.valueOf(x);\\n System.out.println(y.toString());\\n }\\n}\\n```\\n\\nThis creates the variable `y` of type `Integer` (which inherits `Object`), and at run time we\\ndereference `y` to locate the `toString()` function and call it. Rust obviously handles things a bit\\ndifferently, but we have to dig into the low-level details to see it in action.\\n\\n## Low Level Handling of Primitives (C)\\n\\nWe first need to build a foundation for reading and understanding the assembly code the final answer\\nrequires. Let\'s begin with showing how the `C` language (and your computer) thinks about \\"primitive\\"\\nvalues in memory:\\n\\n```c\\nvoid my_function(int num) {}\\n\\nint main() {\\n int x = 8;\\n my_function(x);\\n}\\n```\\n\\nThe [compiler explorer](https://godbolt.org/z/lgNYcc) gives us an easy way of showing off the\\nassembly-level code that\'s generated: whose output has been lightly\\nedited\\n\\n```nasm\\nmain:\\n push rbp\\n mov rbp, rsp\\n sub rsp, 16\\n\\n ; We assign the value `8` to `x` here\\n mov DWORD PTR [rbp-4], 8\\n\\n ; And copy the bits making up `x` to a location\\n ; `my_function` can access (`edi`)\\n mov eax, DWORD PTR [rbp-4]\\n mov edi, eax\\n\\n ; Call `my_function` and give it control\\n call my_function\\n\\n mov eax, 0\\n leave\\n ret\\n\\nmy_function:\\n push rbp\\n mov rbp, rsp\\n\\n ; Copy the bits out of the pre-determined location (`edi`)\\n ; to somewhere we can use\\n mov DWORD PTR [rbp-4], edi\\n nop\\n\\n pop rbp\\n ret\\n```\\n\\nAt a really low level of memory, we\'re copying bits around using the [`mov`][x86_guide] instruction;\\nnothing crazy. But to show how similar Rust is, let\'s take a look at our program translated from C\\nto Rust:\\n\\n```rust\\nfn my_function(x: i32) {}\\n\\nfn main() {\\n let x = 8;\\n my_function(x)\\n}\\n```\\n\\nAnd the assembly generated when we stick it in the\\n[compiler explorer](https://godbolt.org/z/cAlmk0): again, lightly\\nedited\\n\\n```nasm\\nexample::main:\\n push rax\\n\\n ; Look familiar? We\'re copying bits to a location for `my_function`\\n ; The compiler just optimizes out holding `x` in memory\\n mov edi, 8\\n\\n ; Call `my_function` and give it control\\n call example::my_function\\n\\n pop rax\\n ret\\n\\nexample::my_function:\\n sub rsp, 4\\n\\n ; And copying those bits again, just like in C\\n mov dword ptr [rsp], edi\\n\\n add rsp, 4\\n ret\\n```\\n\\nThe generated Rust assembly is functionally pretty close to the C assembly: _When working with\\nprimitives, we\'re just dealing with bits in memory_.\\n\\nIn Java we have to dereference a pointer to call its functions; in Rust, there\'s no pointer to\\ndereference. So what exactly is going on with this `.to_string()` function call?\\n\\n## impl primitive (and Python)\\n\\nNow it\'s time to ~~reveal my trap card~~ show the revelation that tied all this\\ntogether: _Rust has implementations for its primitive types._ That\'s right, `impl` blocks aren\'t\\nonly for `structs` and `traits`, primitives get them too. Don\'t believe me? Check out\\n[u32](https://doc.rust-lang.org/std/primitive.u32.html),\\n[f64](https://doc.rust-lang.org/std/primitive.f64.html) and\\n[char](https://doc.rust-lang.org/std/primitive.char.html) as examples.\\n\\nBut the really interesting bit is how Rust turns those `impl` blocks into assembly. Let\'s break out\\nthe [compiler explorer](https://godbolt.org/z/6LBEwq) once again:\\n\\n```rust\\npub fn main() {\\n 8.to_string()\\n}\\n```\\n\\nAnd the interesting bits in the assembly: heavily trimmed down\\n\\n```nasm\\nexample::main:\\n sub rsp, 24\\n mov rdi, rsp\\n lea rax, [rip + .Lbyte_str.u]\\n mov rsi, rax\\n\\n ; Cool stuff right here\\n call ::to_string@PLT\\n\\n mov rdi, rsp\\n call core::ptr::drop_in_place\\n add rsp, 24\\n ret\\n```\\n\\nNow, this assembly is a bit more complicated, but here\'s the big revelation: **we\'re calling\\n`to_string()` as a function that exists all on its own, and giving it the instance of `8`**. Instead\\nof thinking of the value 8 as an instance of `u32` and then peeking in to find the location of the\\nfunction we want to call (like Java), we have a function that exists outside of the instance and\\njust give that function the value `8`.\\n\\nThis is an incredibly technical detail, but the interesting idea I had was this: _if `to_string()`\\nis a static function, can I refer to the unbound function and give it an instance?_\\n\\nBetter explained in code (and a [compiler explorer](https://godbolt.org/z/fJY-gA) link because I\\nseriously love this thing):\\n\\n```rust\\nstruct MyVal {\\n x: u32\\n}\\n\\nimpl MyVal {\\n fn to_string(&self) -> String {\\n self.x.to_string()\\n }\\n}\\n\\npub fn main() {\\n let my_val = MyVal { x: 8 };\\n\\n // THESE ARE THE SAME\\n my_val.to_string();\\n MyVal::to_string(&my_val);\\n}\\n```\\n\\nRust is totally fine \\"binding\\" the function call to the instance, and also as a static.\\n\\nMIND == BLOWN.\\n\\nPython does the same thing where I can both call functions bound to their instances and also call as\\nan unbound function where I give it the instance:\\n\\n```python\\nclass MyClass():\\n x = 24\\n\\n def my_function(self):\\n print(self.x)\\n\\nm = MyClass()\\n\\nm.my_function()\\nMyClass.my_function(m)\\n```\\n\\nAnd Python tries to make you _think_ that primitives can have instance methods...\\n\\n```python\\n>>> dir(8)\\n[\'__abs__\', \'__add__\', \'__and__\', \'__class__\', \'__cmp__\', \'__coerce__\',\\n\'__delattr__\', \'__div__\', \'__divmod__\', \'__doc__\', \'__float__\', \'__floordiv__\',\\n...\\n\'__setattr__\', \'__sizeof__\', \'__str__\', \'__sub__\', \'__subclasshook__\', \'__truediv__\',\\n...]\\n\\n>>> # Theoretically `8.__str__()` should exist, but:\\n\\n>>> 8.__str__()\\n File \\"\\", line 1\\n 8.__str__()\\n ^\\nSyntaxError: invalid syntax\\n\\n>>> # It will run if we assign it first though:\\n>>> x = 8\\n>>> x.__str__()\\n\'8\'\\n```\\n\\n...but in practice it\'s a bit complicated.\\n\\nSo while Python handles binding instance methods in a way similar to Rust, it\'s still not able to\\nrun the example we started with.\\n\\n## Conclusion\\n\\nThis was a super-roundabout way of demonstrating it, but the way Rust handles incredibly minor\\ndetails like primitives leads to really cool effects. Primitives are optimized like C in how they\\nhave a space-efficient memory layout, yet the language still has a lot of features I enjoy in Python\\n(like both instance and late binding).\\n\\nAnd when you put it together, there are areas where Rust does cool things nobody else can; as a\\nquirky feature of Rust\'s type system, `8.to_string()` is actually valid code.\\n\\nNow go forth and fool your friends into thinking you know assembly. This is all I\'ve got.\\n\\n[x86_guide]: http://www.cs.virginia.edu/~evans/cs216/guides/x86.html\\n[java_primitive]: https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html\\n[rust_scalar]: https://doc.rust-lang.org/book/second-edition/ch03-02-data-types.html#scalar-types\\n[rust_primitive]: https://doc.rust-lang.org/book/first-edition/primitive-types.html"},{"id":"2018/06/dateutil-parser-to-rust","metadata":{"permalink":"/2018/06/dateutil-parser-to-rust","source":"@site/blog/2018-06-25-dateutil-parser-to-rust/index.mdx","title":"What I learned porting dateutil to Rust","description":"I\'ve mostly been a lurker in Rust for a while, making a couple small contributions here and there.","date":"2018-06-25T12:00:00.000Z","tags":[],"readingTime":6.99,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/06/dateutil-parser-to-rust","title":"What I learned porting dateutil to Rust","date":"2018-06-25T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731201811000,"prevItem":{"title":"Primitives in Rust are weird (and cool)","permalink":"/2018/09/primitives-in-rust-are-weird"},"nextItem":{"title":"Hello!","permalink":"/2018/05/hello"}},"content":"I\'ve mostly been a lurker in Rust for a while, making a couple small contributions here and there.\\nSo launching [dtparse](https://github.com/bspeice/dtparse) feels like nice step towards becoming a\\nfunctioning member of society. But not too much, because then you know people start asking you to\\npay bills, and ain\'t nobody got time for that.\\n\\n\x3c!-- truncate --\x3e\\n\\nBut I built dtparse, and you can read about my thoughts on the process. Or don\'t. I won\'t tell you\\nwhat to do with your life (but you should totally keep reading).\\n\\n## Slow down, what?\\n\\nOK, fine, I guess I should start with _why_ someone would do this.\\n\\n[Dateutil](https://github.com/dateutil/dateutil) is a Python library for handling dates. The\\nstandard library support for time in Python is kinda dope, but there are a lot of extras that go\\ninto making it useful beyond just the [datetime](https://docs.python.org/3.6/library/datetime.html)\\nmodule. `dateutil.parser` specifically is code to take all the super-weird time formats people come\\nup with and turn them into something actually useful.\\n\\nDate/time parsing, it turns out, is just like everything else involving\\n[computers](https://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time) and\\n[time](https://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time): it\\nfeels like it shouldn\'t be that difficult to do, until you try to do it, and you realize that people\\nsuck and this is why\\n[we can\'t we have nice things](https://zachholman.com/talk/utc-is-enough-for-everyone-right). But\\nalas, we\'ll try and make contemporary art out of the rubble and give it a pretentious name like\\n_Time_.\\n\\n![A gravel mound](./gravel-mound.jpg)\\n\\n> [Time](https://www.goodfreephotos.com/united-states/montana/elkhorn/remains-of-the-mining-operation-elkhorn.jpg.php)\\n\\nWhat makes `dateutil.parser` great is that there\'s single function with a single argument that\\ndrives what programmers interact with:\\n[`parse(timestr)`](https://github.com/dateutil/dateutil/blob/6dde5d6298cfb81a4c594a38439462799ed2aef2/dateutil/parser/_parser.py#L1258).\\nIt takes in the time as a string, and gives you back a reasonable \\"look, this is the best anyone can\\npossibly do to make sense of your input\\" value. It doesn\'t expect much of you.\\n\\n[And now it\'s in Rust.](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L1332)\\n\\n## Lost in Translation\\n\\nHaving worked at a bulge-bracket bank watching Java programmers try to be Python programmers, I\'m\\nadmittedly hesitant to publish Python code that\'s trying to be Rust. Interestingly, Rust code can\\nactually do a great job of mimicking Python. It\'s certainly not idiomatic Rust, but I\'ve had better\\nexperiences than\\n[this guy](https://webcache.googleusercontent.com/search?q=cache:wkYMpktJtnUJ:https://jackstouffer.com/blog/porting_dateutil.html+&cd=3&hl=en&ct=clnk&gl=us)\\nwho attempted the same thing for D. These are the actual take-aways:\\n\\nWhen transcribing code, **stay as close to the original library as possible**. I\'m talking about\\nusing the same variable names, same access patterns, the whole shebang. It\'s way too easy to make a\\ncouple of typos, and all of a sudden your code blows up in new and exciting ways. Having a reference\\nmanual for verbatim what your code should be means that you don\'t spend that long debugging\\ncomplicated logic, you\'re more looking for typos.\\n\\nAlso, **don\'t use nice Rust things like enums**. While\\n[one time it worked out OK for me](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L88-L94),\\nI also managed to shoot myself in the foot a couple times because `dateutil` stores AM/PM as a\\nboolean and I mixed up which was true, and which was false (side note: AM is false, PM is true). In\\ngeneral, writing nice code _should not be a first-pass priority_ when you\'re just trying to recreate\\nthe same functionality.\\n\\n**Exceptions are a pain.** Make peace with it. Python code is just allowed to skip stack frames. So\\nwhen a co-worker told me \\"Rust is getting try-catch syntax\\" I properly freaked out. Turns out\\n[he\'s not quite right](https://github.com/rust-lang/rfcs/pull/243), and I\'m OK with that. And while\\n`dateutil` is pretty well-behaved about not skipping multiple stack frames,\\n[130-line try-catch blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L730-L865)\\ntake a while to verify.\\n\\nAs another Python quirk, **be very careful about\\n[long nested if-elif-else blocks](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L494-L568)**.\\nI used to think that Python\'s whitespace was just there to get you to format your code correctly. I\\nthink that no longer. It\'s way too easy to close a block too early and have incredibly weird issues\\nin the logic. Make sure you use an editor that displays indentation levels so you can keep things\\nstraight.\\n\\n**Rust macros are not free.** I originally had the\\n[main test body](https://github.com/bspeice/dtparse/blob/b0e737f088eca8e83ab4244c6621a2797d247697/tests/compat.rs#L63-L217)\\nwrapped up in a macro using [pyo3](https://github.com/PyO3/PyO3). It took two minutes to compile.\\nAfter\\n[moving things to a function](https://github.com/bspeice/dtparse/blob/e017018295c670e4b6c6ee1cfff00dbb233db47d/tests/compat.rs#L76-L205)\\ncompile times dropped down to ~5 seconds. Turns out 150 lines \\\\* 100 tests = a lot of redundant code\\nto be compiled. My new rule of thumb is that any macros longer than 10-15 lines are actually\\nfunctions that need to be liberated, man.\\n\\nFinally, **I really miss list comprehensions and dictionary comprehensions.** As a quick comparison,\\nsee\\n[this dateutil code](https://github.com/dateutil/dateutil/blob/16561fc99361979e88cccbd135393b06b1af7e90/dateutil/parser/_parser.py#L476)\\nand\\n[the implementation in Rust](https://github.com/bspeice/dtparse/blob/7d565d3a78876dbebd9711c9720364fe9eba7915/src/lib.rs#L619-L629).\\nI probably wrote it wrong, and I\'m sorry. Ultimately though, I hope that these comprehensions can be\\nadded through macros or syntax extensions. Either way, they\'re expressive, save typing, and are\\nsuper-readable. Let\'s get more of that.\\n\\n## Using a young language\\n\\nNow, Rust is exciting and new, which means that there\'s opportunity to make a substantive impact. On\\nmore than one occasion though, I\'ve had issues navigating the Rust ecosystem.\\n\\nWhat I\'ll call the \\"canonical library\\" is still being built. In Python, if you need datetime\\nparsing, you use `dateutil`. If you want `decimal` types, it\'s already in the\\n[standard library](https://docs.python.org/3.6/library/decimal.html). While I might\'ve gotten away\\nwith `f64`, `dateutil` uses decimals, and I wanted to follow the principle of **staying as close to\\nthe original library as possible**. Thus began my quest to find a decimal library in Rust. What I\\nquickly found was summarized in a comment:\\n\\n> Writing a BigDecimal is easy. Writing a _good_ BigDecimal is hard.\\n>\\n> [-cmr](https://github.com/rust-lang/rust/issues/8937#issuecomment-34582794)\\n\\nIn practice, this means that there are at least [4](https://crates.io/crates/bigdecimal)\\n[different](https://crates.io/crates/rust_decimal)\\n[implementations](https://crates.io/crates/decimal) [available](https://crates.io/crates/decimate).\\nAnd that\'s a lot of decisions to worry about when all I\'m thinking is \\"why can\'t\\n[calendar reform](https://en.wikipedia.org/wiki/Calendar_reform) be a thing\\" and I\'m forced to dig\\nthrough a [couple](https://github.com/rust-lang/rust/issues/8937#issuecomment-31661916)\\n[different](https://github.com/rust-lang/rfcs/issues/334)\\n[threads](https://github.com/rust-num/num/issues/8) to figure out if the library I\'m look at is dead\\nor just stable.\\n\\nAnd even when the \\"canonical library\\" exists, there\'s no guarantees that it will be well-maintained.\\n[Chrono](https://github.com/chronotope/chrono) is the _de facto_ date/time library in Rust, and just\\nreleased version 0.4.4 like two days ago. Meanwhile,\\n[chrono-tz](https://github.com/chronotope/chrono-tz) appears to be dead in the water even though\\n[there are people happy to help maintain it](https://github.com/chronotope/chrono-tz/issues/19). I\\nknow relatively little about it, but it appears that most of the release process is automated;\\nkeeping that up to date should be a no-brainer.\\n\\n## Trial Maintenance Policy\\n\\nSpecifically given \\"maintenance\\" being an\\n[oft-discussed](https://www.reddit.com/r/rust/comments/48540g/thoughts_on_initiators_vs_maintainers/)\\nissue, I\'m going to try out the following policy to keep things moving on `dtparse`:\\n\\n1. Issues/PRs needing _maintainer_ feedback will be updated at least weekly. I want to make sure\\n nobody\'s blocking on me.\\n\\n2. To keep issues/PRs needing _contributor_ feedback moving, I\'m going to (kindly) ask the\\n contributor to check in after two weeks, and close the issue without resolution if I hear nothing\\n back after a month.\\n\\nThe second point I think has the potential to be a bit controversial, so I\'m happy to receive\\nfeedback on that. And if a contributor responds with \\"hey, still working on it, had a kid and I\'m\\nrunning on 30 seconds of sleep a night,\\" then first: congratulations on sustaining human life. And\\nsecond: I don\'t mind keeping those requests going indefinitely. I just want to try and balance\\nkeeping things moving with giving people the necessary time they need.\\n\\nI should also note that I\'m still getting some best practices in place - CONTRIBUTING and\\nCONTRIBUTORS files need to be added, as well as issue/PR templates. In progress. None of us are\\nperfect.\\n\\n## Roadmap and Conclusion\\n\\nSo if I\'ve now built a `dateutil`-compatible parser, we\'re done, right? Of course not! That\'s not\\nnearly ambitious enough.\\n\\nUltimately, I\'d love to have a library that\'s capable of parsing everything the Linux `date` command\\ncan do (and not `date` on OSX, because seriously, BSD coreutils are the worst). I know Rust has a\\ncoreutils rewrite going on, and `dtparse` would potentially be an interesting candidate since it\\ndoesn\'t bring in a lot of extra dependencies. [`humantime`](https://crates.io/crates/humantime)\\ncould help pick up some of the (current) slack in dtparse, so maybe we can share and care with each\\nother?\\n\\nAll in all, I\'m mostly hoping that nobody\'s already done this and I haven\'t spent a bit over a month\\non redundant code. So if it exists, tell me. I need to know, but be nice about it, because I\'m going\\nto take it hard.\\n\\nAnd in the mean time, I\'m looking forward to building more. Onwards."},{"id":"2018/05/hello","metadata":{"permalink":"/2018/05/hello","source":"@site/blog/2018-05-28-hello/index.mdx","title":"Hello!","description":"I\'ll do what I can to keep this short, there\'s plenty of other things we both should be doing right","date":"2018-05-28T12:00:00.000Z","tags":[],"readingTime":0.375,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/05/hello","title":"Hello!","date":"2018-05-28T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731187596000,"prevItem":{"title":"What I learned porting dateutil to Rust","permalink":"/2018/06/dateutil-parser-to-rust"},"nextItem":{"title":"Captain\'s Cookbook: Practical usage","permalink":"/2018/01/captains-cookbook-part-2"}},"content":"I\'ll do what I can to keep this short, there\'s plenty of other things we both should be doing right\\nnow.\\n\\n\x3c!-- truncate --\x3e\\n\\nIf you\'re here for the bread pics, and to marvel in some other culinary side projects, I\'ve got you\\ncovered:\\n\\n![Saturday Bread](./bread.jpg)\\n\\nAnd no, I\'m not posting pictures of earlier attempts that ended up turning into rocks in the oven.\\n\\nOkay, just one:\\n\\n![Bread as rock](./rocks.jpg)\\n\\nThanks, and keep it amazing."},{"id":"2018/01/captains-cookbook-part-2","metadata":{"permalink":"/2018/01/captains-cookbook-part-2","source":"@site/blog/2018-01-16-captains-cookbook-part-2/index.mdx","title":"Captain\'s Cookbook: Practical usage","description":"A look at more practical usages of Cap\'N Proto","date":"2018-01-16T13:00:00.000Z","tags":[],"readingTime":6.51,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/01/captains-cookbook-part-2","title":"Captain\'s Cookbook: Practical usage","date":"2018-01-16T13:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731201811000,"prevItem":{"title":"Hello!","permalink":"/2018/05/hello"},"nextItem":{"title":"Captain\'s Cookbook: Project setup","permalink":"/2018/01/captains-cookbook-part-1"}},"content":"A look at more practical usages of Cap\'N Proto\\n\\n\x3c!-- truncate --\x3e\\n\\n[Part 1](/2018/01/captains-cookbook-part-1) of this series took a look at a basic starting project\\nwith Cap\'N Proto. In this section, we\'re going to take the (admittedly basic) schema and look at how we can add a pretty\\nbasic feature - sending Cap\'N Proto messages between threads. It\'s nothing complex, but I want to make sure that there\'s\\nsome documentation surrounding practical usage of the library.\\n\\nAs a quick refresher, we build a Cap\'N Proto message and go through the serialization/deserialization steps\\n[here](https://github.com/bspeice/capnp_cookbook_1/blob/master/src/main.rs). Our current example is going to build on\\nthe code we wrote there; after the deserialization step, we\'ll try and send the `point_reader` to a separate thread\\nfor verification.\\n\\nI\'m going to walk through the attempts as I made them and my thinking throughout.\\nIf you want to skip to the final project, check out the code available [here](https://github.com/bspeice/capnp_cookbook_2)\\n\\n## Attempt 1: Move the reference\\n\\nAs a first attempt, we\'re going to try and let Rust move the reference. Our code will look something like:\\n\\n```rust\\nfn main() {\\n\\n // ...assume that we own a `buffer: Vec` containing the binary message content from\\n // somewhere else\\n\\n let deserialized = capnp::serialize::read_message(\\n &mut buffer.as_slice(),\\n capnp::message::ReaderOptions::new()\\n ).unwrap();\\n\\n let point_reader = deserialized.get_root::().unwrap();\\n\\n // By using `point_reader` inside the new thread, we\'re hoping that Rust can\\n // safely move the reference and invalidate the original thread\'s usage.\\n // Since the original thread doesn\'t use `point_reader` again, this should\\n // be safe, right?\\n let handle = std::thread:spawn(move || {\\n\\n assert_eq!(point_reader.get_x(), 12);\\n\\n assert_eq!(point_reader.get_y(), 14);\\n });\\n\\n handle.join().unwrap()\\n}\\n```\\n\\nWell, the Rust compiler doesn\'t really like this. We get four distinct errors back:\\n\\n```\\nerror[E0277]: the trait bound `*const u8: std::marker::Send` is not satisfied in `[closure@src/main.rs:31:37: 36:6 point_reader:point_capnp::point::Reader<\'_>]` \\n --\x3e src/main.rs:31:18 \\n | \\n31 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `*const u8` cannot be sent between threads safely \\n | \\n\\nerror[E0277]: the trait bound `*const capnp::private::layout::WirePointer: std::marker::Send` is not satisfied in `[closure@src/main.rs:31:37: 36:6 point_reader:point_capnp::point::Reader<\'_>]` \\n --\x3e src/main.rs:31:18 \\n | \\n31 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `*const capnp::private::layout::WirePointer` cannot be sent between threads safely \\n | \\n\\nerror[E0277]: the trait bound `capnp::private::arena::ReaderArena: std::marker::Sync` is not satisfied \\n --\x3e src/main.rs:31:18 \\n | \\n31 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `capnp::private::arena::ReaderArena` cannot be shared between threads safely \\n | \\n\\nerror[E0277]: the trait bound `*const std::vec::Vec>>: std::marker::Send` is not satisfied in `[closure@src/main.rs:31:37: 36:6 point_reader:point_capnp::point::Reader<\'_>]` \\n --\x3e src/main.rs:31:18 \\n | \\n31 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `*const std::vec::Vec>>` cannot be sent between threads safely \\n | \\n\\nerror: aborting due to 4 previous errors\\n```\\n\\nNote, I\'ve removed the help text for brevity, but suffice to say that these errors are intimidating.\\nPay attention to the text that keeps on getting repeated though: `XYZ cannot be sent between threads safely`.\\n\\nThis is a bit frustrating: we own the `buffer` from which all the content was derived, and we don\'t have any\\nunsafe accesses in our code. We guarantee that we wait for the child thread to stop first, so there\'s no possibility\\nof the pointer becoming invalid because the original thread exits before the child thread does. So why is Rust\\npreventing us from doing something that really should be legal?\\n\\nThis is what is known as [fighting the borrow checker](https://doc.rust-lang.org/1.8.0/book/references-and-borrowing.html).\\nLet our crusade begin.\\n\\n## Attempt 2: Put the `Reader` in a `Box`\\n\\nThe [`Box`](https://doc.rust-lang.org/std/boxed/struct.Box.html) type allows us to convert a pointer we have\\n(in our case the `point_reader`) into an \\"owned\\" value, which should be easier to send across threads.\\nOur next attempt looks something like this:\\n\\n```rust\\nfn main() {\\n\\n // ...assume that we own a `buffer: Vec` containing the binary message content\\n // from somewhere else\\n\\n let deserialized = capnp::serialize::read_message(\\n &mut buffer.as_slice(),\\n capnp::message::ReaderOptions::new()\\n ).unwrap();\\n\\n let point_reader = deserialized.get_root::().unwrap();\\n\\n let boxed_reader = Box::new(point_reader);\\n\\n // Now that the reader is `Box`ed, we\'ve proven ownership, and Rust can\\n // move the ownership to the new thread, right?\\n let handle = std::thread::spawn(move || {\\n\\n assert_eq!(boxed_reader.get_x(), 12);\\n\\n assert_eq!(boxed_reader.get_y(), 14);\\n });\\n\\n handle.join().unwrap();\\n}\\n```\\n\\nSpoiler alert: still doesn\'t work. Same errors still show up.\\n\\n```\\nerror[E0277]: the trait bound `*const u8: std::marker::Send` is not satisfied in `point_capnp::point::Reader<\'_>` \\n --\x3e src/main.rs:33:18 \\n | \\n33 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `*const u8` cannot be sent between threads safely \\n | \\n\\nerror[E0277]: the trait bound `*const capnp::private::layout::WirePointer: std::marker::Send` is not satisfied in `point_capnp::point::Reader<\'_>` \\n --\x3e src/main.rs:33:18 \\n | \\n33 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `*const capnp::private::layout::WirePointer` cannot be sent between threads safely \\n | \\n\\nerror[E0277]: the trait bound `capnp::private::arena::ReaderArena: std::marker::Sync` is not satisfied \\n --\x3e src/main.rs:33:18 \\n | \\n33 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `capnp::private::arena::ReaderArena` cannot be shared between threads safely \\n | \\n\\nerror[E0277]: the trait bound `*const std::vec::Vec>>: std::marker::Send` is not satisfied in `point_capnp::point::Reader<\'_>` \\n --\x3e src/main.rs:33:18 \\n | \\n33 | let handle = std::thread::spawn(move || { \\n | ^^^^^^^^^^^^^^^^^^ `*const std::vec::Vec>>` cannot be sent between threads safely \\n | \\n\\nerror: aborting due to 4 previous errors\\n```\\n\\nLet\'s be a little bit smarter about the exceptions this time though. What is that\\n[`std::marker::Send`](https://doc.rust-lang.org/std/marker/trait.Send.html) thing the compiler keeps telling us about?\\n\\nThe documentation is pretty clear; `Send` is used to denote:\\n\\n> Types that can be transferred across thread boundaries.\\n\\nIn our case, we are seeing the error messages for two reasons:\\n\\n1. Pointers (`*const u8`) are not safe to send across thread boundaries. While we\'re nice in our code\\nmaking sure that we wait on the child thread to finish before closing down, the Rust compiler can\'t make\\nthat assumption, and so complains that we\'re not using this in a safe manner.\\n\\n2. The `point_capnp::point::Reader` type is itself not safe to send across threads because it doesn\'t\\nimplement the `Send` trait. Which is to say, the things that make up a `Reader` are themselves not thread-safe,\\nso the `Reader` is also not thread-safe.\\n\\nSo, how are we to actually transfer a parsed Cap\'N Proto message between threads?\\n\\n## Attempt 3: The `TypedReader`\\n\\nThe `TypedReader` is a new API implemented in the Cap\'N Proto [Rust code](https://crates.io/crates/capnp/0.8.14).\\nWe\'re interested in it here for two reasons:\\n\\n1. It allows us to define an object where the _object_ owns the underlying data. In previous attempts,\\nthe current context owned the data, but the `Reader` itself had no such control.\\n\\n2. We can compose the `TypedReader` using objects that are safe to `Send` across threads, guaranteeing\\nthat we can transfer parsed messages across threads.\\n\\nThe actual type info for the [`TypedReader`](https://github.com/capnproto/capnproto-rust/blob/f0efc35d7e9bd8f97ca4fdeb7c57fd7ea348e303/src/message.rs#L181)\\nis a bit complex. And to be honest, I\'m still really not sure what the whole point of the\\n[`PhantomData`](https://doc.rust-lang.org/std/marker/struct.PhantomData.html) thing is either.\\nMy impression is that it lets us enforce type safety when we know what the underlying Cap\'N Proto\\nmessage represents. That is, technically the only thing we\'re storing is the untyped binary message;\\n`PhantomData` just enforces the principle that the binary represents some specific object that has been parsed.\\n\\nEither way, we can carefully construct something which is safe to move between threads:\\n\\n```rust\\nfn main() {\\n\\n // ...assume that we own a `buffer: Vec` containing the binary message content from somewhere else\\n\\n let deserialized = capnp::serialize::read_message(\\n &mut buffer.as_slice(),\\n capnp::message::ReaderOptions::new()\\n ).unwrap();\\n\\n let point_reader: capnp::message::TypedReader =\\n capnp::message::TypedReader::new(deserialized);\\n\\n // Because the point_reader is now working with OwnedSegments (which are owned vectors) and an Owned message\\n // (which is \'static lifetime), this is now safe\\n let handle = std::thread::spawn(move || {\\n\\n // The point_reader owns its data, and we use .get() to retrieve the actual point_capnp::point::Reader\\n // object from it\\n let point_root = point_reader.get().unwrap();\\n\\n assert_eq!(point_root.get_x(), 12);\\n\\n assert_eq!(point_root.get_y(), 14);\\n });\\n\\n handle.join().unwrap();\\n}\\n```\\n\\nAnd while we\'ve left Rust to do the dirty work of actually moving the `point_reader` into the new thread,\\nwe could also use things like [`mpsc` channels](https://doc.rust-lang.org/std/sync/mpsc/index.html) to achieve a similar effect.\\n\\nSo now we\'re able to define basic Cap\'N Proto messages, and send them all around our programs."},{"id":"2018/01/captains-cookbook-part-1","metadata":{"permalink":"/2018/01/captains-cookbook-part-1","source":"@site/blog/2018-01-16-captains-cookbok-part-1/index.mdx","title":"Captain\'s Cookbook: Project setup","description":"A basic introduction to getting started with Cap\'N Proto.","date":"2018-01-16T12:00:00.000Z","tags":[],"readingTime":7.555,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2018/01/captains-cookbook-part-1","title":"Captain\'s Cookbook: Project setup","date":"2018-01-16T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1731201811000,"prevItem":{"title":"Captain\'s Cookbook: Practical usage","permalink":"/2018/01/captains-cookbook-part-2"},"nextItem":{"title":"PCA audio compression","permalink":"/2016/11/pca-audio-compression"}},"content":"A basic introduction to getting started with Cap\'N Proto.\\n\\n\x3c!-- truncate --\x3e\\n\\nI\'ve been working a lot with [Cap\'N Proto](https://capnproto.org/) recently with Rust, but there\'s a real dearth of information\\non how to set up and get going quickly. In the interest of trying to get more people using this (because I think it\'s\\nfantastic), I\'m going to work through a couple of examples detailing what exactly should be done to get going.\\n\\nSo, what is Cap\'N Proto? It\'s a data serialization library. It has contemporaries with [Protobuf](https://developers.google.com/protocol-buffers/)\\nand [FlatBuffers](https://google.github.io/flatbuffers/), but is better compared with FlatBuffers. The whole point behind it\\nis to define a schema language and serialization format such that:\\n\\n1. Applications that do not share the same base programming language can communicate\\n2. The data and schema you use can naturally evolve over time as your needs change\\n\\nAccompanying this are typically code generators that take the schemas you define for your application and give you back\\ncode for different languages to get data to and from that schema.\\n\\nNow, what makes Cap\'N Proto different from, say, Protobuf, is that there is no serialization/deserialization step the same way\\nas is implemented with Protobuf. Instead, the idea is that the message itself can be loaded in memory and used directly there.\\n\\nWe\'re going to take a look at a series of progressively more complex projects that use Cap\'N Proto in an effort to provide some\\nexamples of what idiomatic usage looks like, and shorten the startup time needed to make use of this library in Rust projects.\\nIf you want to follow along, feel free. If not, I\'ve posted [the final result](https://github.com/bspeice/capnp_cookbook_1)\\nfor reference.\\n\\n## Step 1: Installing `capnp`\\n\\nThe `capnp` binary itself is needed for taking the schema files you write and turning them into a format that can be used by the\\ncode generation libraries. Don\'t ask me what that actually means, I just know that you need to make sure this is installed.\\n\\nI\'ll refer you to [Cap\'N Proto\'s installation instructions](https://capnproto.org/install.html) here. As a quick TLDR though:\\n\\n- Linux users will likely have a binary shipped by their package manager - On Ubuntu, `apt install capnproto` is enough\\n- OS X users can use [Homebrew](https://brew.sh/) as an easy install path. Just `brew install capnp`\\n- Windows users are a bit more complicated. If you\'re using [Chocolatey](https://chocolatey.org/), there\'s [a package](https://chocolatey.org/packages/capnproto/) available. If that doesn\'t work however, you need to download [a release zip](https://capnproto.org/capnproto-c++-win32-0.6.1.zip) and make sure that the `capnp.exe` binary is in your `%PATH%` environment variable\\n\\nThe way you know you\'re done with this step is if the following command works in your shell:\\n\\n```bash\\ncapnp id\\n```\\n\\n## Step 2: Starting a Cap\'N Proto Rust project\\n\\nAfter the `capnp` binary is set up, it\'s time to actually create our Rust project. Nothing terribly complex here, just a simple\\n\\n```bash\\nmkdir capnp_cookbook_1\\ncd capnp_cookbook_1\\ncargo init --bin\\n```\\n\\nWe\'ll put the following content into `Cargo.toml`:\\n\\n```\\n[package]\\nname = \\"capnp_cookbook_1\\"\\nversion = \\"0.1.0\\"\\nauthors = [\\"Bradlee Speice \\"]\\n\\n[build-dependencies]\\ncapnpc = \\"0.8\\" # 1\\n\\n[dependencies]\\ncapnp = \\"0.8\\" # 2\\n```\\n\\nThis sets up: \\n\\n1. The Rust code generator (CAPNProto Compiler)\\n2. The Cap\'N Proto runtime library (CAPNProto runtime)\\n\\nWe\'ve now got everything prepared that we need for writing a Cap\'N Proto project.\\n\\n## Step 3: Writing a basic schema\\n\\nWe\'re going to start with writing a pretty trivial data schema that we can extend later. This is just intended to make sure\\nyou get familiar with how to start from a basic project.\\n\\nFirst, we\'re going to create a top-level directory for storing the schema files in:\\n\\n```bash\\n# Assuming we\'re starting from the `capnp_cookbook_1` directory created earlier\\n\\nmkdir schema\\ncd schema\\n```\\n\\nNow, we\'re going to put the following content in `point.capnp`:\\n\\n```\\n@0xab555145c708dad2;\\n\\nstruct Point {\\n x @0 :Int32;\\n y @1 :Int32;\\n}\\n```\\n\\nPretty easy, we\'ve now got structure for an object we\'ll be able to quickly encode in a binary format.\\n\\n## Step 4: Setting up the build process\\n\\nNow it\'s time to actually set up the build process to make sure that Cap\'N Proto generates the Rust code we\'ll eventually be using.\\nThis is typically done through a `build.rs` file to invoke the schema compiler.\\n\\nIn the same folder as your `Cargo.toml` file, please put the following content in `build.rs`:\\n\\n```rust\\nextern crate capnpc;\\n\\nfn main() {\\n ::capnpc::CompilerCommand::new()\\n .src_prefix(\\"schema\\") // 1\\n .file(\\"schema/point.capnp\\") // 2\\n .run().expect(\\"compiling schema\\");\\n}\\n```\\n\\nThis sets up the protocol compiler (`capnpc` from earlier) to compile the schema we\'ve built so far.\\n\\n1. Because Cap\'N Proto schema files can re-use types specified in other files, the `src_prefix()` tells the compiler\\nwhere to look for those extra files at.\\n2. We specify the schema file we\'re including by hand. In a much larger project, you could presumably build the `CompilerCommand`\\ndynamically, but we won\'t worry too much about that one for now.\\n\\n## Step 5: Running the build\\n\\nIf you\'ve done everything correctly so far, you should be able to actually build the project and see the auto-generated code.\\nRun a `cargo build` command, and if you don\'t see `cargo` complaining, you\'re doing just fine!\\n\\nSo where exactly does the generated code go to? I think it\'s critically important for people to be able to see what the generated\\ncode looks like, because you need to understand what you\'re actually programming against. The short answer is: the generated code lives\\nsomewhere in the `target/` directory.\\n\\nThe long answer is that you\'re best off running a `find` command to get the actual file path:\\n\\n```bash\\n# Assuming we\'re running from the capnp_cookbook_1 project folder\\nfind . -name point_capnp.rs\\n```\\n\\nAlternately, if the `find` command isn\'t available, the path will look something like:\\n\\n```\\n./target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs\\n```\\n\\nSee if there are any paths in your target directory that look similar.\\n\\nNow, the file content looks pretty nasty. I\'ve included an example [here](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs)\\nif you aren\'t following along at home. There are a couple things I\'ll try and point out though so you can get an idea of how\\nthe schema we wrote for the \\"Point\\" message is tied to the generated code.\\n\\nFirst, the Cap\'N Proto library splits things up into `Builder` and `Reader` structs. These are best thought of the same way\\nRust separates `mut` from non-`mut` code. `Builder`s are `mut` versions of your message, and `Reader`s are immutable versions.\\n\\nFor example, the [`Builder` impl](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L90) for `point` defines [`get_x()`](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L105), [`set_x()`](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L109), [`get_y()`](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L113), and [`set_y()`](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L117) methods.\\nIn comparison, the [`Reader` impl](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L38) only defines [`get_x()`](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L47) and [`get_y()`](https://github.com/bspeice/capnp_cookbook_1/blob/master/target/debug/build/capnp_cookbook_1-c6e2990393c32fe6/out/point_capnp.rs#L51) methods.\\n\\nSo now we know that there are some `get` and `set` methods available for our `x` and `y` coordinates;\\nbut what do we actually do with those?\\n\\n## Step 6: Making a point\\n\\nSo we\'ve install Cap\'N Proto, gotten a project set up, and can generate schema code now. It\'s time to actually start building\\nCap\'N Proto messages! I\'m going to put the code you need here because it\'s small, and put some extra long comments inline. This code\\nshould go in [`src/main.rs`](https://github.com/bspeice/capnp_cookbook_1/blob/master/src/main.rs):\\n\\n```rust\\n// Note that we use `capnp` here, NOT `capnpc`\\nextern crate capnp;\\n\\n// We create a module here to define how we are to access the code\\n// being included.\\npub mod point_capnp {\\n // The environment variable OUT_DIR is set by Cargo, and\\n // is the location of all the code that was built as part\\n // of the codegen step.\\n // point_capnp.rs is the actual file to include\\n include!(concat!(env!(\\"OUT_DIR\\"), \\"/point_capnp.rs\\"));\\n}\\n\\nfn main() {\\n\\n // The process of building a Cap\'N Proto message is a bit tedious.\\n // We start by creating a generic Builder; it acts as the message\\n // container that we\'ll later be filling with content of our `Point`\\n let mut builder = capnp::message::Builder::new_default();\\n\\n // Because we need a mutable reference to the `builder` later,\\n // we fence off this part of the code to allow sequential mutable\\n // borrows. As I understand it, non-lexical lifetimes:\\n // https://github.com/rust-lang/rust-roadmap/issues/16\\n // will make this no longer necessary\\n {\\n // And now we can set up the actual message we\'re trying to create\\n let mut point_msg = builder.init_root::();\\n\\n // Stuff our message with some content\\n point_msg.set_x(12);\\n\\n point_msg.set_y(14);\\n }\\n\\n // It\'s now time to serialize our message to binary. Let\'s set up a buffer for that:\\n let mut buffer = Vec::new();\\n\\n // And actually fill that buffer with our data\\n capnp::serialize::write_message(&mut buffer, &builder).unwrap();\\n\\n // Finally, let\'s deserialize the data\\n let deserialized = capnp::serialize::read_message(\\n &mut buffer.as_slice(),\\n capnp::message::ReaderOptions::new()\\n ).unwrap();\\n\\n // `deserialized` is currently a generic reader; it understands\\n // the content of the message we gave it (i.e. that there are two\\n // int32 values) but doesn\'t really know what they represent (the Point).\\n // This is where we map the generic data back into our schema.\\n let point_reader = deserialized.get_root::().unwrap();\\n\\n // We can now get our x and y values back, and make sure they match\\n assert_eq!(point_reader.get_x(), 12);\\n assert_eq!(point_reader.get_y(), 14);\\n}\\n```\\n\\nAnd with that, we\'ve now got a functioning project. Here\'s the content I\'m planning to go over next as we build up\\nsome practical examples of Cap\'N Proto in action:"},{"id":"2016/11/pca-audio-compression","metadata":{"permalink":"/2016/11/pca-audio-compression","source":"@site/blog/2016-11-01-PCA-audio-compression/index.mdx","title":"PCA audio compression","description":"In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure.","date":"2016-11-01T12:00:00.000Z","tags":[],"readingTime":10.39,"hasTruncateMarker":true,"authors":[{"name":"Bradlee Speice","socials":{"github":"https://github.com/bspeice"},"key":"bspeice","page":null}],"frontMatter":{"slug":"2016/11/pca-audio-compression","title":"PCA audio compression","date":"2016-11-01T12:00:00.000Z","authors":["bspeice"],"tags":[]},"unlisted":false,"lastUpdatedAt":1730863976000,"prevItem":{"title":"Captain\'s Cookbook: Project setup","permalink":"/2018/01/captains-cookbook-part-1"},"nextItem":{"title":"A Rustic re-podcasting server","permalink":"/2016/10/rustic-repodcasting"}},"content":"In which I apply Machine Learning techniques to Digital Signal Processing to astounding failure.\\n\\n\x3c!-- truncate --\x3e\\n\\nTowards a new (and pretty poor) compression scheme\\n--------------------------------------------------\\n\\nI\'m going to be working with some audio data for a while as I get prepared for a term project this semester. I\'ll be working (with a partner) to design a system for separating voices from music. Given my total lack of experience with [Digital Signal Processing][1] I figured that now was as good a time as ever to work on a couple of fun projects that would get me back up to speed.\\n\\nThe first project I want to work on: Designing a new compression scheme for audio data.\\n\\nA Brief Introduction to Audio Compression\\n-----------------------------------------\\n\\nAudio files when uncompressed (files ending with `.wav`) are huge. Like, 10.5 Megabytes per minute huge. Storage is cheap these days, but that\'s still an incredible amount of data that we don\'t really need. Instead, we\'d like to compress that data so that it\'s not taking up so much space. There are broadly two ways to accomplish this:\\n\\n1. Lossless compression - Formats like [FLAC][2], [ALAC][3], and [Monkey\'s Audio (.ape)][4] all go down this route. The idea is that when you compress and uncompress a file, you get exactly the same as what you started with.\\n\\n2. Lossy compression - Formats like [MP3][5], [Ogg][6], and [AAC (`.m4a`)][7] are far more popular, but make a crucial tradeoff: We can reduce the file size even more during compression, but the decompressed file won\'t be the same.\\n\\nThere is a fundamental tradeoff at stake: Using lossy compression sacrifices some of the integrity of the resulting file to save on storage space. Most people (I personally believe it\'s everybody) can\'t hear the difference, so this is an acceptable tradeoff. You have files that take up a 10^th of the space, and nobody can tell there\'s a difference in audio quality.\\n\\nA PCA-based Compression Scheme\\n------------------------------\\n\\nWhat I want to try out is a [PCA][8] approach to encoding audio. The PCA technique comes from Machine Learning, where it is used for a process called [Dimensionality Reduction][9]. Put simply, the idea is the same as lossy compression: if we can find a way that represents the data well enough, we can save on space. There are a lot of theoretical concerns that lead me to believe this compression style will not end well, but I\'m interested to try it nonetheless.\\n\\nPCA works as follows: Given a dataset with a number of features, I find a way to approximate those original features using some \\"new features\\" that are statistically as close as possible to the original ones. This is comparable to a scheme like MP3: Given an original signal, I want to find a way of representing it that gets approximately close to what the original was. The difference is that PCA is designed for statistical data, and not signal data. But we won\'t let that stop us.\\n\\nThe idea is as follows: Given a signal, reshape it into 1024 columns by however many rows are needed (zero-padded if necessary). Run the PCA algorithm, and do dimensionality reduction with a couple different settings. The number of components I choose determines the quality: If I use 1024 components, I will essentially be using the original signal. If I use a smaller number of components, I start losing some of the data that was in the original file. This will give me an idea of whether it\'s possible to actually build an encoding scheme off of this, or whether I\'m wasting my time.\\n\\nRunning the Algorithm\\n---------------------\\n\\nThe audio I will be using comes from the song [Tabulasa][10], by [Broke for Free][11]. I\'ll be loading in the audio signal to Python and using [Scikit-Learn][12] to actually run the PCA algorithm.\\n\\nWe first need to convert the FLAC file I have to a WAV:\\n\\n[1]: https://en.wikipedia.org/wiki/Digital_signal_processing\\n[2]: https://en.wikipedia.org/wiki/FLAC\\n[3]: https://en.wikipedia.org/wiki/Apple_Lossless\\n[4]: https://en.wikipedia.org/wiki/Monkey%27s_Audio\\n[5]: https://en.wikipedia.org/wiki/MP3\\n[6]: https://en.wikipedia.org/wiki/Vorbis\\n[7]: https://en.wikipedia.org/wiki/Advanced_Audio_Coding\\n[8]: https://en.wikipedia.org/wiki/Principal_component_analysis\\n[9]: https://en.wikipedia.org/wiki/Dimensionality_reduction\\n[10]: https://brokeforfree.bandcamp.com/track/tabulasa\\n[11]: https://brokeforfree.bandcamp.com/album/xxvii\\n[12]: http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA\\n\\n\\n```python\\n!ffmpeg -hide_banner -loglevel panic -i \\"Broke For Free/XXVII/01 Tabulasa.flac\\" \\"Tabulasa.wav\\" -c wav\\n```\\n\\nThen, let\'s go ahead and load a small sample so you can hear what is going on.\\n\\n\\n```python\\nfrom IPython.display import Audio\\nfrom scipy.io import wavfile\\n\\nsamplerate, tabulasa = wavfile.read(\'Tabulasa.wav\')\\n\\nstart = samplerate * 14 # 10 seconds in\\nend = start + samplerate * 10 # 5 second duration\\nAudio(data=tabulasa[start:end, 0], rate=samplerate)\\n```\\n\\nimport wav1 from \\"./1.wav\\";\\n\\n

	Raw	PCA	PCA w/ BZ2
(1, 1)	69.054298	138.108597	16.431797
(1, 2)	69.054306	69.054306	32.981380
(1, 4)	69.054321	34.527161	16.715032
(4, 32)	69.054443	17.263611	8.481735
(16, 256)	69.054688	8.631836	4.274846
(32, 256)	69.054688	17.263672	8.542909
(64, 256)	69.054688	34.527344	17.097543
(128, 1024)	69.054688	17.263672	9.430644
(256, 1024)	69.054688	34.527344	18.870387
(512, 1024)	69.054688	69.054688	37.800940
(128, 2048)	69.062500	8.632812	6.185015
(256, 2048)	69.062500	17.265625	12.366942
(512, 2048)	69.062500	34.531250	24.736506
(1024, 2048)	69.062500	69.062500	49.517493

	Accuracy
count	111.000000
mean	0.905159
std	0.180602
min	0.043598
25%	0.937329
50%	0.959372
75%	0.960837
max	0.960837