Finish converting blog posts

Bradlee Speice 2024-11-09 22:06:23 -05:00
parent 8b3c967b2c
commit 4a060ddbe2
7 changed files with 870 additions and 56 deletions

View File

@@ -41,7 +41,7 @@ We'll discuss more in detail, but a quick preview of the results:
- Flatbuffers: Has some quirks, but largely lived up to its "zero-copy" promises
- SBE: Best median and worst-case performance, but the message structure has a limited feature set
# Prologue: Binary Parsing with Nom
## Prologue: Binary Parsing with Nom
Our benchmark system will be a simple data processor; given depth-of-book market data from
[IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema
@@ -119,7 +119,7 @@ Ultimately, because the `nom` code in this shootout was the same for all formats, we're not
interested in its performance. Still, it's worth mentioning that building the market data parser was
actually fun; I didn't have to write tons of boring code by hand.
# Part 1: Cap'n Proto
## Cap'n Proto
Now it's time to get into the meaty part of the story. Cap'n Proto was the first format I tried
because of how long it has supported Rust (thanks to [dwrensha](https://github.com/dwrensha) for
@@ -151,7 +151,7 @@ every read for the segment table.
In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too
many issues for me to feel comfortable using it long-term.
# Part 2: Flatbuffers
## Flatbuffers
This is the new kid on the block. After a
[first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out, official support
@@ -191,7 +191,7 @@ that tag is nigh on impossible.
Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform
well.
# Part 3: Simple Binary Encoding
## Simple Binary Encoding
Support for SBE was added by the author of one of my favorite
[Rust blog posts](https://web.archive.org/web/20190427124806/https://polysync.io/blog/session-types-for-hearty-codecs/).
@@ -212,7 +212,7 @@ However, if you don't need union types, and can accept that schemas are XML documents, it's
worth using. SBE's implementation had the best streaming support of all formats I tested, and
doesn't trigger allocation during de/serialization.
# Results
## Results
After building a test harness
[for](https://github.com/speice-io/marketdata-shootout/blob/master/src/capnp_runner.rs)
@@ -225,7 +225,7 @@ the benchmarks, and the raw results are
below is the average of 10 runs on a single day of IEX data. Results were validated to make sure
that each format parsed the data correctly.
## Serialization
### Serialization
This test measures, on a
[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L268-L272),
@@ -239,7 +239,7 @@ buffer.
| Flatbuffers | 355ns | 2185ns | 3497ns | 14.31s |
| SBE | 91ns | 1535ns | 2423ns | 3.91s |
## Deserialization
### Deserialization
This test measures, on a
[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L294-L298),
@@ -254,7 +254,7 @@ format implementation.
| Flatbuffers | 173ns | 421ns | 1007ns | 6.00s |
| SBE | 116ns | 286ns | 659ns | 4.05s |
# Conclusion
## Conclusion
Building a benchmark turned out to be incredibly helpful in making a decision; because a "union"
type isn't important to me, I can be confident that SBE best addresses my needs.

View File

@@ -0,0 +1,370 @@
---
layout: post
title: "Release the GIL"
description: "Strategies for Parallelism in Python"
category:
tags: [python]
---
Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock)
(GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision
made before multi-core CPUs were widely available, but the fact that it's still around indicates
that it generally works [Good](https://wiki.c2.com/?PrematureOptimization)
[Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective
workarounds; it's not hard to start a
[new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to
synchronize code running in parallel.
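For instance, a minimal sketch of that workaround using only the standard library (the `square_sum` worker and its arguments are illustrative, not from a real project):
```python
from multiprocessing import Process, Queue

def square_sum(n: int, results: Queue) -> None:
    # Each worker runs in its own process, with its own interpreter
    # and its own GIL
    results.put(sum(i * i for i in range(n)))

if __name__ == "__main__":
    results: Queue = Queue()
    workers = [Process(target=square_sum, args=(10_000_000, results)) for _ in range(2)]
    for w in workers:
        w.start()
    totals = [results.get() for _ in workers]
    for w in workers:
        w.join()
```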
Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of
asynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of
both Python's productivity and the modern CPU's parallel capabilities.
Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes
Python a nice language to start with. Bear in mind: these are just the tools; no claim is made about
whether it's a good idea to use them. Very often, unlocking the GIL is an
[XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the
GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at
the expense of project complexity; messing with the GIL is ultimately messing with Python's memory
model.
```python
%load_ext Cython
from numba import jit
N = 1_000_000_000
```
# Cython
Put simply, [Cython](https://cython.org/) is a programming language that looks a lot like Python,
gets [transpiled](https://en.wikipedia.org/wiki/Source-to-source_compiler) to C/C++, and integrates
well with the [CPython](https://en.wikipedia.org/wiki/CPython) API. It's great for building Python
wrappers to C and C++ libraries, writing optimized code for numerical processing, and tons more. And
when it comes to managing the GIL, there are two special features:
- The `nogil`
[function annotation](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#declaring-a-function-as-callable-without-the-gil)
asserts that a Cython function is safe to use without the GIL, and compilation will fail if it
interacts with Python in an unsafe manner
- The `with nogil`
[context manager](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#releasing-the-gil)
explicitly unlocks the CPython GIL while active
Whenever Cython code runs inside a `with nogil` block on a separate thread, the Python interpreter
is unblocked and allowed to continue work elsewhere. We'll define a "busy work" function that
demonstrates this principle in action:
```python
%%cython
# Annotating a function with `nogil` indicates only that it is safe
# to call in a `with nogil` block. It *does not* release the GIL.
cdef unsigned long fibonacci(unsigned long n) nogil:
    if n <= 1:
        return n

    cdef unsigned long a = 0, b = 1, c = 0
    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


def cython_nogil(unsigned long n):
    # Explicitly release the GIL while running `fibonacci`
    with nogil:
        value = fibonacci(n)
    return value


def cython_gil(unsigned long n):
    # Because the GIL is not explicitly released, it implicitly
    # remains acquired when running the `fibonacci` function
    return fibonacci(n)
```
First, let's time how long it takes Cython to calculate the billionth Fibonacci number:
```python
%%time
_ = cython_gil(N);
```
> <pre>
> CPU times: user 365 ms, sys: 0 ns, total: 365 ms
> Wall time: 372 ms
> </pre>
```python
%%time
_ = cython_nogil(N);
```
> <pre>
> CPU times: user 381 ms, sys: 0 ns, total: 381 ms
> Wall time: 388 ms
> </pre>
Both versions (with and without GIL) take effectively the same amount of time to run. Even when
running this calculation in parallel on separate threads, it is expected that the run time will
double because only one thread can be active at a time:
```python
%%time
from threading import Thread
# Create the two threads to run on
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
# Start the threads
t1.start(); t2.start()
# Wait for the threads to finish
t1.join(); t2.join()
```
> <pre>
> CPU times: user 641 ms, sys: 5.62 ms, total: 647 ms
> Wall time: 645 ms
> </pre>
However, if the first thread releases the GIL, the second thread is free to acquire it and run in
parallel:
```python
%%time
t1 = Thread(target=cython_nogil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 717 ms, sys: 372 µs, total: 718 ms
> Wall time: 358 ms
> </pre>
Because `user` time represents the sum of processing time on all threads, it doesn't change much.
The ["wall time"](https://en.wikipedia.org/wiki/Elapsed_real_time) has been cut roughly in half
because each function is running simultaneously.
Keep in mind that the **order in which threads are started** makes a difference!
```python
%%time
# Note that the GIL-locked version is started first
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 667 ms, sys: 0 ns, total: 667 ms
> Wall time: 672 ms
> </pre>
Even though the second thread releases the GIL while running, it can't start until the first has
completed. Thus, the overall runtime is effectively the same as running two GIL-locked threads.
Finally, be aware that attempting to unlock the GIL from a thread that doesn't own it will crash the
**interpreter**, not just the thread attempting the unlock:
```python
%%cython
cdef int cython_recurse(int n) nogil:
    if n <= 0:
        return 0

    with nogil:
        return cython_recurse(n - 1)

cython_recurse(2)
```
> <pre>
> Fatal Python error: PyEval_SaveThread: NULL tstate
>
> Thread 0x00007f499effd700 (most recent call first):
> File "/home/bspeice/.virtualenvs/release-the-gil/lib/python3.7/site-packages/ipykernel/parentpoller.py", line 39 in run
> File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
> File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
> </pre>
In practice, avoiding this issue is simple. First, `nogil` functions probably shouldn't contain
`with nogil` blocks. Second, Cython can
[conditionally acquire/release](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#conditional-acquiring-releasing-the-gil)
the GIL, so these conditions can be used to synchronize access. Finally, Cython's documentation for
[external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#acquiring-and-releasing-the-gil)
contains more detail on how to safely manage the GIL.
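As a sketch of the conditional form mentioned above (assuming a Cython version that supports `with nogil(<condition>)` as described in the linked documentation; `busy_sum` is illustrative):
```python
%%cython
cdef long busy_sum(long n) nogil:
    cdef long total = 0
    for i in range(n):
        total += i
    return total

def maybe_release(long n, bint release_gil):
    # Release the GIL only if the caller asks for it; with
    # `release_gil=False`, this behaves like an ordinary call
    with nogil(release_gil):
        value = busy_sum(n)
    return value
```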
To conclude: use Cython's `nogil` annotation to assert that functions are safe for calling when the
GIL is unlocked, and `with nogil` to actually unlock the GIL and run those functions.
# Numba
Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by
compiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_
at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function
first compiles it to machine code before running. Calling the function a second time re-uses that
machine code unless the argument types have changed.
Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions
compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode
avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the
`@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython`
are separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to
release the lock, the GIL will remain locked if `nogil=False` (the default).
Let's repeat the same experiment, this time using Numba instead of Cython:
```python
# The `int` type annotation is only for humans and is ignored
# by Numba.
@jit(nopython=True, nogil=True)
def numba_nogil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1
    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Run using `nopython` mode to receive a performance boost,
# but GIL remains locked due to `nogil=False` by default.
@jit(nopython=True)
def numba_gil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1
    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Call each function once to force compilation; we don't want
# the timing statistics to include how long it takes to compile.
numba_nogil(N)
numba_gil(N);
```
We'll perform the same tests as above; first, figure out how long it takes the function to run:
```python
%%time
_ = numba_gil(N)
```
> <pre>
> CPU times: user 253 ms, sys: 258 µs, total: 253 ms
> Wall time: 251 ms
> </pre>
<span style="font-size: .8em">
Aside: it's not immediately clear why Numba takes ~20% less time to run than Cython for code that should be
effectively identical after compilation.
</span>
When running two GIL-locked threads, the result (as expected) takes around twice as long to compute:
```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 541 ms, sys: 3.96 ms, total: 545 ms
> Wall time: 541 ms
> </pre>
But if the GIL-unlocking thread starts first, both threads run in parallel:
```python
%%time
t1 = Thread(target=numba_nogil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 551 ms, sys: 7.77 ms, total: 559 ms
> Wall time: 279 ms
> </pre>
Just like Cython, starting the GIL-locked thread first leads to poor performance:
```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 524 ms, sys: 0 ns, total: 524 ms
> Wall time: 522 ms
> </pre>
Finally, unlike Cython, Numba will unlock the GIL if and only if it is currently acquired;
recursively calling `@jit(nogil=True)` functions is perfectly safe:
```python
from numba import jit

@jit(nopython=True, nogil=True)
def numba_recurse(n: int) -> int:
    if n <= 0:
        return 0

    return numba_recurse(n - 1)

numba_recurse(2);
```
# Conclusion
Before finishing, it's important to address pain points that will show up if these techniques are
used in a more realistic project:
First, code running in a GIL-free context will likely also need non-trivial data structures;
GIL-free functions aren't useful if they're constantly interacting with Python objects whose access
requires the GIL. Cython provides
[extension types](http://docs.cython.org/en/latest/src/tutorial/cdef_classes.html) and Numba
provides a [`@jitclass`](https://numba.pydata.org/numba-doc/dev/user/jitclass.html) decorator to
address this need.
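For example, a minimal sketch of the Numba approach (note: depending on the Numba version, `jitclass` is imported from either `numba` or `numba.experimental`):
```python
from numba import int64
from numba.experimental import jitclass

# Field types must be declared up front so the class can be compiled
@jitclass([("total", int64), ("count", int64)])
class RunningMean:
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def mean(self):
        return self.total / self.count

acc = RunningMean()
acc.add(3)
acc.add(7)
acc.mean()  # 5.0
```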
Second, building and distributing applications that make use of Cython/Numba can be complicated.
Cython packages require running the compiler, (potentially) linking/packaging external dependencies,
and distributing a binary wheel. Numba is generally simpler because the code being distributed is
pure Python, but can be tricky since errors aren't detected until runtime.
Finally, while unlocking the GIL is often a solution in search of a problem, both Cython and Numba
provide tools to directly manage the GIL when appropriate. This enables true parallelism (not just
[concurrency](https://stackoverflow.com/a/1050257)) that is impossible in vanilla Python.

View File

@@ -0,0 +1,372 @@
---
slug: 2019/12/release-the-gil
title: Release the GIL
date: 2019-12-14 12:00:00
authors: [bspeice]
tags: []
---
Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock)
(GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision
made before multi-core CPUs were widely available, but the fact that it's still around indicates
that it generally works [Good](https://wiki.c2.com/?PrematureOptimization)
[Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective
workarounds; it's not hard to start a
[new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to
synchronize code running in parallel.
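For instance, a minimal sketch of that workaround using only the standard library (the `square_sum` worker and its arguments are illustrative, not from a real project):
```python
from multiprocessing import Process, Queue

def square_sum(n: int, results: Queue) -> None:
    # Each worker runs in its own process, with its own interpreter
    # and its own GIL
    results.put(sum(i * i for i in range(n)))

if __name__ == "__main__":
    results: Queue = Queue()
    workers = [Process(target=square_sum, args=(10_000_000, results)) for _ in range(2)]
    for w in workers:
        w.start()
    totals = [results.get() for _ in workers]
    for w in workers:
        w.join()
```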
Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of
asynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of
both Python's productivity and the modern CPU's parallel capabilities.
<!-- truncate -->
Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes
Python a nice language to start with. Bear in mind: these are just the tools; no claim is made about
whether it's a good idea to use them. Very often, unlocking the GIL is an
[XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the
GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at
the expense of project complexity; messing with the GIL is ultimately messing with Python's memory
model.
```python
%load_ext Cython
from numba import jit
N = 1_000_000_000
```
## Cython
Put simply, [Cython](https://cython.org/) is a programming language that looks a lot like Python,
gets [transpiled](https://en.wikipedia.org/wiki/Source-to-source_compiler) to C/C++, and integrates
well with the [CPython](https://en.wikipedia.org/wiki/CPython) API. It's great for building Python
wrappers to C and C++ libraries, writing optimized code for numerical processing, and tons more. And
when it comes to managing the GIL, there are two special features:
- The `nogil`
[function annotation](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#declaring-a-function-as-callable-without-the-gil)
asserts that a Cython function is safe to use without the GIL, and compilation will fail if it
interacts with Python in an unsafe manner
- The `with nogil`
[context manager](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#releasing-the-gil)
explicitly unlocks the CPython GIL while active
Whenever Cython code runs inside a `with nogil` block on a separate thread, the Python interpreter
is unblocked and allowed to continue work elsewhere. We'll define a "busy work" function that
demonstrates this principle in action:
```python
%%cython
# Annotating a function with `nogil` indicates only that it is safe
# to call in a `with nogil` block. It *does not* release the GIL.
cdef unsigned long fibonacci(unsigned long n) nogil:
    if n <= 1:
        return n

    cdef unsigned long a = 0, b = 1, c = 0
    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


def cython_nogil(unsigned long n):
    # Explicitly release the GIL while running `fibonacci`
    with nogil:
        value = fibonacci(n)
    return value


def cython_gil(unsigned long n):
    # Because the GIL is not explicitly released, it implicitly
    # remains acquired when running the `fibonacci` function
    return fibonacci(n)
```
First, let's time how long it takes Cython to calculate the billionth Fibonacci number:
```python
%%time
_ = cython_gil(N);
```
> <pre>
> CPU times: user 365 ms, sys: 0 ns, total: 365 ms
> Wall time: 372 ms
> </pre>
```python
%%time
_ = cython_nogil(N);
```
> <pre>
> CPU times: user 381 ms, sys: 0 ns, total: 381 ms
> Wall time: 388 ms
> </pre>
Both versions (with and without GIL) take effectively the same amount of time to run. Even when
running this calculation in parallel on separate threads, it is expected that the run time will
double because only one thread can be active at a time:
```python
%%time
from threading import Thread
# Create the two threads to run on
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
# Start the threads
t1.start(); t2.start()
# Wait for the threads to finish
t1.join(); t2.join()
```
> <pre>
> CPU times: user 641 ms, sys: 5.62 ms, total: 647 ms
> Wall time: 645 ms
> </pre>
However, if the first thread releases the GIL, the second thread is free to acquire it and run in
parallel:
```python
%%time
t1 = Thread(target=cython_nogil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 717 ms, sys: 372 µs, total: 718 ms
> Wall time: 358 ms
> </pre>
Because `user` time represents the sum of processing time on all threads, it doesn't change much.
The ["wall time"](https://en.wikipedia.org/wiki/Elapsed_real_time) has been cut roughly in half
because each function is running simultaneously.
Keep in mind that the **order in which threads are started** makes a difference!
```python
%%time
# Note that the GIL-locked version is started first
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 667 ms, sys: 0 ns, total: 667 ms
> Wall time: 672 ms
> </pre>
Even though the second thread releases the GIL while running, it can't start until the first has
completed. Thus, the overall runtime is effectively the same as running two GIL-locked threads.
Finally, be aware that attempting to unlock the GIL from a thread that doesn't own it will crash the
**interpreter**, not just the thread attempting the unlock:
```python
%%cython
cdef int cython_recurse(int n) nogil:
    if n <= 0:
        return 0

    with nogil:
        return cython_recurse(n - 1)

cython_recurse(2)
```
> <pre>
> Fatal Python error: PyEval_SaveThread: NULL tstate
>
> Thread 0x00007f499effd700 (most recent call first):
> File "/home/bspeice/.virtualenvs/release-the-gil/lib/python3.7/site-packages/ipykernel/parentpoller.py", line 39 in run
> File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
> File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
> </pre>
In practice, avoiding this issue is simple. First, `nogil` functions probably shouldn't contain
`with nogil` blocks. Second, Cython can
[conditionally acquire/release](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#conditional-acquiring-releasing-the-gil)
the GIL, so these conditions can be used to synchronize access. Finally, Cython's documentation for
[external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#acquiring-and-releasing-the-gil)
contains more detail on how to safely manage the GIL.
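As a sketch of the conditional form mentioned above (assuming a Cython version that supports `with nogil(<condition>)` as described in the linked documentation; `busy_sum` is illustrative):
```python
%%cython
cdef long busy_sum(long n) nogil:
    cdef long total = 0
    for i in range(n):
        total += i
    return total

def maybe_release(long n, bint release_gil):
    # Release the GIL only if the caller asks for it; with
    # `release_gil=False`, this behaves like an ordinary call
    with nogil(release_gil):
        value = busy_sum(n)
    return value
```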
To conclude: use Cython's `nogil` annotation to assert that functions are safe for calling when the
GIL is unlocked, and `with nogil` to actually unlock the GIL and run those functions.
## Numba
Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by
compiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_
at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function
first compiles it to machine code before running. Calling the function a second time re-uses that
machine code unless the argument types have changed.
Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions
compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode
avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the
`@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython`
are separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to
release the lock, the GIL will remain locked if `nogil=False` (the default).
Let's repeat the same experiment, this time using Numba instead of Cython:
```python
# The `int` type annotation is only for humans and is ignored
# by Numba.
@jit(nopython=True, nogil=True)
def numba_nogil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1
    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Run using `nopython` mode to receive a performance boost,
# but GIL remains locked due to `nogil=False` by default.
@jit(nopython=True)
def numba_gil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1
    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Call each function once to force compilation; we don't want
# the timing statistics to include how long it takes to compile.
numba_nogil(N)
numba_gil(N);
```
We'll perform the same tests as above; first, figure out how long it takes the function to run:
```python
%%time
_ = numba_gil(N)
```
> <pre>
> CPU times: user 253 ms, sys: 258 µs, total: 253 ms
> Wall time: 251 ms
> </pre>
<small>
Aside: it's not immediately clear why Numba takes ~20% less time to run than Cython for code that should be
effectively identical after compilation.
</small>
When running two GIL-locked threads, the result (as expected) takes around twice as long to compute:
```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 541 ms, sys: 3.96 ms, total: 545 ms
> Wall time: 541 ms
> </pre>
But if the GIL-unlocking thread starts first, both threads run in parallel:
```python
%%time
t1 = Thread(target=numba_nogil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 551 ms, sys: 7.77 ms, total: 559 ms
> Wall time: 279 ms
> </pre>
Just like Cython, starting the GIL-locked thread first leads to poor performance:
```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```
> <pre>
> CPU times: user 524 ms, sys: 0 ns, total: 524 ms
> Wall time: 522 ms
> </pre>
Finally, unlike Cython, Numba will unlock the GIL if and only if it is currently acquired;
recursively calling `@jit(nogil=True)` functions is perfectly safe:
```python
from numba import jit

@jit(nopython=True, nogil=True)
def numba_recurse(n: int) -> int:
    if n <= 0:
        return 0

    return numba_recurse(n - 1)

numba_recurse(2);
```
## Conclusion
Before finishing, it's important to address pain points that will show up if these techniques are
used in a more realistic project:
First, code running in a GIL-free context will likely also need non-trivial data structures;
GIL-free functions aren't useful if they're constantly interacting with Python objects whose access
requires the GIL. Cython provides
[extension types](http://docs.cython.org/en/latest/src/tutorial/cdef_classes.html) and Numba
provides a [`@jitclass`](https://numba.pydata.org/numba-doc/dev/user/jitclass.html) decorator to
address this need.
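For example, a minimal sketch of the Numba approach (note: depending on the Numba version, `jitclass` is imported from either `numba` or `numba.experimental`):
```python
from numba import int64
from numba.experimental import jitclass

# Field types must be declared up front so the class can be compiled
@jitclass([("total", int64), ("count", int64)])
class RunningMean:
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def mean(self):
        return self.total / self.count

acc = RunningMean()
acc.add(3)
acc.add(7)
acc.mean()  # 5.0
```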
Second, building and distributing applications that make use of Cython/Numba can be complicated.
Cython packages require running the compiler, (potentially) linking/packaging external dependencies,
and distributing a binary wheel. Numba is generally simpler because the code being distributed is
pure Python, but can be tricky since errors aren't detected until runtime.
Finally, while unlocking the GIL is often a solution in search of a problem, both Cython and Numba
provide tools to directly manage the GIL when appropriate. This enables true parallelism (not just
[concurrency](https://stackoverflow.com/a/1050257)) that is impossible in vanilla Python.

View File

@@ -1,48 +0,0 @@
---
slug: mdx-blog-post
title: MDX Blog Post With An Extraordinarily Long Title
date: 2021-08-02 10:00:00
authors: [bspeice]
tags: []
---
## title
Hello?
Blog posts support [Docusaurus Markdown features](https://docusaurus.io/docs/markdown-features), such as [MDX](https://mdxjs.com/).
<details>
<summary>Hello</summary>
Testing - {1 + 2}
</details>
:::tip
Use the power of React to create interactive blog posts.
:::
{/* truncate */}
For example, use JSX to create an interactive button:
```js
<button onClick={() => alert('button clicked!')}>Click me!</button>
<button onClick={() => alert('button clicked!')}>Click me!</button>
```
```cpp
class MyClass {
public:
MyClass() = default;
};
int main() {
auto x = 24;
}
```
<button onClick={() => alert('button clicked!')}>Click me!</button>

View File

@@ -0,0 +1,60 @@
---
layout: post
title: "The webpack industrial complex"
description: "Reflections on a new project"
category:
tags: [webpack, react, vite]
---
This started because I wanted to build a synthesizer. Setting a goal of "digital DX7" was ambitious, but I needed something unrelated to the day job. Beyond that, working with audio seemed like a good challenge. I enjoy performance-focused code, and performance problems in audio are conspicuous. Building a web project was an obvious choice because of the web audio API documentation and independence from a large Digital Audio Workstation (DAW).
The project was soon derailed trying to sort out technical issues unrelated to the original purpose. Finding a resolution was a frustrating journey, and it's still not clear whether those problems were my fault. As a result, I'm writing this to try making sense of it, as a case study/reference material, and to salvage something from the process.
## Starting strong
The sole starting requirement was to write everything in TypeScript. Not because of project scale, but because guardrails help with unfamiliar territory. Keeping that in mind, the first question was: how does one start a new project? All I actually need is "compile TypeScript, show it in a browser."
Create React App (CRA) came to the rescue and the rest of that evening was a joy. My TypeScript/JavaScript skills were rusty, but the online documentation was helpful. I had never understood the appeal of JSX (why put a DOM in JavaScript?) until it made connecting an `onEvent` handler and a function easy.
Some quick dimensional analysis later and there was a sine wave oscillator playing A=440 through the speakers. I specifically remember thinking "modern browsers are magical."
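That first version was little more than the standard Web Audio API; a sketch of the idea (not the actual project code):
```typescript
// A sine oscillator playing A=440 through the default output
const context = new AudioContext();
const oscillator = context.createOscillator();
oscillator.type = "sine";
oscillator.frequency.value = 440;
oscillator.connect(context.destination);
oscillator.start();
```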
## Continuing on
Now comes the first mistake: I began to worry about "scale" before encountering an actual problem. Rather than rendering audio in the main thread, why not use audio worklets and render in a background thread instead?
The first sign something was amiss came from the TypeScript compiler errors showing the audio worklet API [was missing](https://github.com/microsoft/TypeScript/issues/28308). After searching GitHub issues and (unsuccessfully) tweaking the `tsconfig.json` settings, I settled on installing a package and moving on.
The next problem came from actually using the API. Worklets must load from separate "modules," but it wasn't clear how to guarantee the worklet code stayed separate from the application. I saw recommendations to use `new URL(<local path>, import.meta.url)` and it worked! Well, kind of:
![Browser error](/assets/images/2022-11-20-video_mp2t.png)
That file has the audio processor code, so why does it get served with `Content-Type: video/mp2t`?
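For reference, the recommended pattern looks roughly like this (a sketch; the `processor.ts` module and `synth-processor` registration name are hypothetical):
```typescript
const context = new AudioContext();
// Bundlers are supposed to notice `new URL(..., import.meta.url)` and
// emit the worklet module as a separate asset
await context.audioWorklet.addModule(
  new URL("./processor.ts", import.meta.url)
);
const node = new AudioWorkletNode(context, "synth-processor");
node.connect(context.destination);
```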
## Floundering about
Now comes the second mistake: even though I didn't understand the error, I ignored recommendations to [just use JavaScript](https://hackernoon.com/implementing-audioworklets-with-react-8a80a470474) and stuck by the original TypeScript requirement.
I tried different project structures. Moving the worklet code to a new folder didn't help, nor did setting up a monorepo and placing it in a new package.
I tried three different CRA tools - `react-app-rewired`, `craco`, `customize-react-app` - but got the same problem. Each has varying levels of compatibility with recent CRA versions, so it wasn't clear if I had the right solution but implemented it incorrectly. After attempting to eject the application and panicking after seeing the configuration, I abandoned that as well.
I tried changing the webpack configuration: using [new](https://github.com/webpack/webpack/issues/11543#issuecomment-917673256) [loaders](https://github.com/popelenkow/worker-url), setting [asset rules](https://github.com/webpack/webpack/discussions/14093#discussioncomment-1257149), even [changing how webpack detects worker resources](https://github.com/webpack/webpack/issues/11543#issuecomment-826897590). In hindsight, entry points may have been the answer. But because CRA actively resists attempts to change its webpack configuration, and I couldn't find audio worklet examples in any other framework, I gave up.
I tried so many application frameworks. Next.js looked like a good candidate, but added its own [bespoke webpack complexity](https://github.com/vercel/next.js/issues/24907) to the existing confusion. Astro had the best "getting started" experience, but I refuse to install an IDE-specific plugin. I first used Deno while exploring Lume, but it couldn't import the audio worklet types (maybe because of module compatibility?). Each framework was unique in its own way (shout-out to SvelteKit) but I couldn't figure out how to make them work.
## Learning and reflecting
I ended up using Vite and vite-plugin-react-pages to handle both "build the app" and "bundle worklets," but the specific tool choice isn't important. Instead, the focus should be on lessons learned.
For myself:
- I'm obsessed with tooling, to the point it can derail the original goal. While it comes from a good place (for example: "types are awesome"), it can get in the way of more important work
- I tend to reach for online resources right after seeing a new problem. While finding help online is often faster, spending time understanding the problem would have been more productive than cycling through (often outdated) blog posts
For the tools:
- Resource bundling is great and solves a genuine challenge. I've heard too many horror stories of developers writing modules by hand to believe this is unnecessary complexity
- Webpack is a build system and modern frameworks are deeply dependent on it (hence the "webpack industrial complex"). While this often saves users from unnecessary complexity, there's no path forward if something breaks
- There's little ability to mix and match tools across frameworks. Next.js and Gatsby let users extend webpack, but because each framework adds its own modules, changes aren't portable. After spending a week looking at webpack, I had an example running with parcel in thirty minutes, but couldn't integrate it
In the end, learning new systems is fun, but a focus on tools that "just work" can leave users out in the cold if they break down.

View File

@@ -0,0 +1,60 @@
---
slug: 2022/11/webpack-industrial-complex
title: The webpack industrial complex
date: 2022-11-20 12:00:00
authors: [bspeice]
tags: []
---
This started because I wanted to build a synthesizer. Setting a goal of "digital DX7" was ambitious, but I needed something unrelated to the day job. Beyond that, working with audio seemed like a good challenge. I enjoy performance-focused code, and performance problems in audio are conspicuous. Building a web project was an obvious choice because of the web audio API documentation and independence from a large Digital Audio Workstation (DAW).
The project was soon derailed trying to sort out technical issues unrelated to the original purpose. Finding a resolution was a frustrating journey, and it's still not clear whether those problems were my fault. As a result, I'm writing this to try making sense of it, as a case study/reference material, and to salvage something from the process.
## Starting strong
The sole starting requirement was to write everything in TypeScript. Not because of project scale, but because guardrails help with unfamiliar territory. Keeping that in mind, the first question was: how does one start a new project? All I actually need is "compile TypeScript, show it in a browser."
Create React App (CRA) came to the rescue and the rest of that evening was a joy. My TypeScript/JavaScript skills were rusty, but the online documentation was helpful. I had never understood the appeal of JSX (why put a DOM in JavaScript?) until it made connecting an `onEvent` handler and a function easy.
Some quick dimensional analysis later and there was a sine wave oscillator playing A=440 through the speakers. I specifically remember thinking "modern browsers are magical."
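That first version was little more than the standard Web Audio API; a sketch of the idea (not the actual project code):
```typescript
// A sine oscillator playing A=440 through the default output
const context = new AudioContext();
const oscillator = context.createOscillator();
oscillator.type = "sine";
oscillator.frequency.value = 440;
oscillator.connect(context.destination);
oscillator.start();
```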
## Continuing on
Now comes the first mistake: I began to worry about "scale" before encountering an actual problem. Rather than rendering audio in the main thread, why not use audio worklets and render in a background thread instead?
The first sign something was amiss came from the TypeScript compiler errors showing the audio worklet API [was missing](https://github.com/microsoft/TypeScript/issues/28308). After searching GitHub issues and (unsuccessfully) tweaking the `tsconfig.json` settings, I settled on installing a package and moving on.
The next problem came from actually using the API. Worklets must load from separate "modules," but it wasn't clear how to guarantee the worklet code stayed separate from the application. I saw recommendations to use `new URL(<local path>, import.meta.url)` and it worked! Well, kind of:
![Browser error](./video_mp2t.png)
That file has the audio processor code, so why does it get served with `Content-Type: video/mp2t`?
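For reference, the recommended pattern looks roughly like this (a sketch; the `processor.ts` module and `synth-processor` registration name are hypothetical):
```typescript
const context = new AudioContext();
// Bundlers are supposed to notice `new URL(..., import.meta.url)` and
// emit the worklet module as a separate asset
await context.audioWorklet.addModule(
  new URL("./processor.ts", import.meta.url)
);
const node = new AudioWorkletNode(context, "synth-processor");
node.connect(context.destination);
```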
## Floundering about
Now comes the second mistake: even though I didn't understand the error, I ignored recommendations to [just use JavaScript](https://hackernoon.com/implementing-audioworklets-with-react-8a80a470474) and stuck by the original TypeScript requirement.
I tried different project structures. Moving the worklet code to a new folder didn't help, nor did setting up a monorepo and placing it in a new package.
I tried three different CRA tools - `react-app-rewired`, `craco`, `customize-react-app` - but got the same problem. Each has varying levels of compatibility with recent CRA versions, so it wasn't clear if I had the right solution but implemented it incorrectly. After attempting to eject the application and panicking after seeing the configuration, I abandoned that as well.
I tried changing the webpack configuration: using [new](https://github.com/webpack/webpack/issues/11543#issuecomment-917673256) [loaders](https://github.com/popelenkow/worker-url), setting [asset rules](https://github.com/webpack/webpack/discussions/14093#discussioncomment-1257149), even [changing how webpack detects worker resources](https://github.com/webpack/webpack/issues/11543#issuecomment-826897590). In hindsight, entry points may have been the answer. But because CRA actively resists attempts to change its webpack configuration, and I couldn't find audio worklet examples in any other framework, I gave up.
I tried so many application frameworks. Next.js looked like a good candidate, but added its own [bespoke webpack complexity](https://github.com/vercel/next.js/issues/24907) to the existing confusion. Astro had the best "getting started" experience, but I refuse to install an IDE-specific plugin. I first used Deno while exploring Lume, but it couldn't import the audio worklet types (maybe because of module compatibility?). Each framework was unique in its own way (shout-out to SvelteKit) but I couldn't figure out how to make them work.
## Learning and reflecting
I ended up using Vite and vite-plugin-react-pages to handle both "build the app" and "bundle worklets," but the specific tool choice isn't important. Instead, the focus should be on lessons learned.
For myself:
- I'm obsessed with tooling, to the point it can derail the original goal. While it comes from a good place (for example: "types are awesome"), it can get in the way of more important work
- I tend to reach for online resources right after seeing a new problem. While finding help online is often faster, spending time understanding the problem would have been more productive than cycling through (often outdated) blog posts
For the tools:
- Resource bundling is great and solves a genuine challenge. I've heard too many horror stories of developers writing modules by hand to believe this is unnecessary complexity
- Webpack is a build system and modern frameworks are deeply dependent on it (hence the "webpack industrial complex"). While this often saves users from unnecessary complexity, there's no path forward if something breaks
- There's little ability to mix and match tools across frameworks. Next.js and Gatsby let users extend webpack, but because each framework adds its own modules, changes aren't portable. After spending a week looking at webpack, I had an example running with parcel in thirty minutes, but couldn't integrate it
In the end, learning new systems is fun, but a focus on tools that "just work" can leave users out in the cold if they break down.

Binary file not shown (image added, 48 KiB)