mirror of https://github.com/bspeice/speice.io
synced 2024-12-22 16:48:10 -05:00

commit 4a060ddbe2 (parent 8b3c967b2c)
Finish converting blog posts
@@ -41,7 +41,7 @@ We'll discuss more in detail, but a quick preview of the results:
 - Flatbuffers: Has some quirks, but largely lived up to its "zero-copy" promises
 - SBE: Best median and worst-case performance, but the message structure has a limited feature set

-# Prologue: Binary Parsing with Nom
+## Prologue: Binary Parsing with Nom

 Our benchmark system will be a simple data processor; given depth-of-book market data from
 [IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema
@@ -119,7 +119,7 @@ Ultimately, because the `nom` code in this shootout was the same for all formats
 interested in its performance. Still, it's worth mentioning that building the market data parser was
 actually fun; I didn't have to write tons of boring code by hand.

-# Part 1: Cap'n Proto
+## Cap'n Proto

 Now it's time to get into the meaty part of the story. Cap'n Proto was the first format I tried
 because of how long it has supported Rust (thanks to [dwrensha](https://github.com/dwrensha) for
@@ -151,7 +151,7 @@ every read for the segment table.
 In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too
 many issues for me to feel comfortable using it long-term.

-# Part 2: Flatbuffers
+## Flatbuffers

 This is the new kid on the block. After a
 [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out, official support
@@ -191,7 +191,7 @@ that tag is nigh on impossible.
 Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform
 well.

-# Part 3: Simple Binary Encoding
+## Simple Binary Encoding

 Support for SBE was added by the author of one of my favorite
 [Rust blog posts](https://web.archive.org/web/20190427124806/https://polysync.io/blog/session-types-for-hearty-codecs/).
@@ -212,7 +212,7 @@ However, if you don't need union types, and can accept that schemas are XML docu
 worth using. SBE's implementation had the best streaming support of all formats I tested, and
 doesn't trigger allocation during de/serialization.

-# Results
+## Results

 After building a test harness
 [for](https://github.com/speice-io/marketdata-shootout/blob/master/src/capnp_runner.rs)
@@ -225,7 +225,7 @@ the benchmarks, and the raw results are
 below is the average of 10 runs on a single day of IEX data. Results were validated to make sure
 that each format parsed the data correctly.

-## Serialization
+### Serialization

 This test measures, on a
 [per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L268-L272),
@@ -239,7 +239,7 @@ buffer.
 | Flatbuffers | 355ns | 2185ns | 3497ns | 14.31s |
 | SBE | 91ns | 1535ns | 2423ns | 3.91s |

-## Deserialization
+### Deserialization

 This test measures, on a
 [per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L294-L298),
@@ -254,7 +254,7 @@ format implementation.
 | Flatbuffers | 173ns | 421ns | 1007ns | 6.00s |
 | SBE | 116ns | 286ns | 659ns | 4.05s |

-# Conclusion
+## Conclusion

 Building a benchmark turned out to be incredibly helpful in making a decision; because a "union"
 type isn't important to me, I can be confident that SBE best addresses my needs.
370 blog/2019-12-14-release-the-gil/_article.md Normal file
@@ -0,0 +1,370 @@
---
layout: post
title: "Release the GIL"
description: "Strategies for Parallelism in Python"
category:
tags: [python]
---

Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock)
(GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision
made before multi-core CPUs were widely available, but the fact that it's still around indicates
that it generally works [Good](https://wiki.c2.com/?PrematureOptimization)
[Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective
workarounds; it's not hard to start a
[new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to
synchronize code running in parallel.
Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of
asynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of
both Python's productivity and the modern CPU's parallel capabilities.

Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes
Python a nice language to start with. Bear in mind: these are just the tools; no claim is made about
whether it's a good idea to use them. Very often, unlocking the GIL is an
[XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the
GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at
the expense of project complexity; messing with the GIL is ultimately messing with Python's memory
model.
```python
%load_ext Cython
from numba import jit

N = 1_000_000_000
```
# Cython

Put simply, [Cython](https://cython.org/) is a programming language that looks a lot like Python,
gets [transpiled](https://en.wikipedia.org/wiki/Source-to-source_compiler) to C/C++, and integrates
well with the [CPython](https://en.wikipedia.org/wiki/CPython) API. It's great for building Python
wrappers to C and C++ libraries, writing optimized code for numerical processing, and tons more. And
when it comes to managing the GIL, there are two special features:

- The `nogil`
  [function annotation](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#declaring-a-function-as-callable-without-the-gil)
  asserts that a Cython function is safe to use without the GIL, and compilation will fail if it
  interacts with Python in an unsafe manner
- The `with nogil`
  [context manager](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#releasing-the-gil)
  explicitly unlocks the CPython GIL while active

Whenever Cython code runs inside a `with nogil` block on a separate thread, the Python interpreter
is unblocked and allowed to continue work elsewhere. We'll define a "busy work" function that
demonstrates this principle in action:
```python
%%cython

# Annotating a function with `nogil` indicates only that it is safe
# to call in a `with nogil` block. It *does not* release the GIL.
cdef unsigned long fibonacci(unsigned long n) nogil:
    if n <= 1:
        return n

    cdef unsigned long a = 0, b = 1, c = 0

    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


def cython_nogil(unsigned long n):
    # Explicitly release the GIL while running `fibonacci`
    with nogil:
        value = fibonacci(n)

    return value


def cython_gil(unsigned long n):
    # Because the GIL is not explicitly released, it implicitly
    # remains acquired when running the `fibonacci` function
    return fibonacci(n)
```
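For reference, the same busy-work loop in plain Python (`python_fibonacci` is a name introduced here for comparison; it computes the same values, but as ordinary interpreted code it can never release the GIL):

```python
def python_fibonacci(n: int) -> int:
    # Same iterative scheme as the Cython `fibonacci` above
    if n <= 1:
        return n

    a, b = 0, 1
    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c
```

Threads running this version always serialize on the GIL, which is the behavior the compiled `cython_nogil` variant is built to escape.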
First, let's time how long it takes Cython to calculate the billionth Fibonacci number:

```python
%%time
_ = cython_gil(N);
```

> <pre>
> CPU times: user 365 ms, sys: 0 ns, total: 365 ms
> Wall time: 372 ms
> </pre>

```python
%%time
_ = cython_nogil(N);
```

> <pre>
> CPU times: user 381 ms, sys: 0 ns, total: 381 ms
> Wall time: 388 ms
> </pre>
Both versions (with and without GIL) take effectively the same amount of time to run. Even when
running this calculation in parallel on separate threads, it is expected that the run time will
double because only one thread can be active at a time:

```python
%%time
from threading import Thread

# Create the two threads to run on
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
# Start the threads
t1.start(); t2.start()
# Wait for the threads to finish
t1.join(); t2.join()
```

> <pre>
> CPU times: user 641 ms, sys: 5.62 ms, total: 647 ms
> Wall time: 645 ms
> </pre>

However, if the first thread releases the GIL, the second thread is free to acquire it and run in
parallel:

```python
%%time

t1 = Thread(target=cython_nogil, args=[N])
t2 = Thread(target=cython_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 717 ms, sys: 372 µs, total: 718 ms
> Wall time: 358 ms
> </pre>

Because `user` time represents the sum of processing time on all threads, it doesn't change much.
The ["wall time"](https://en.wikipedia.org/wiki/Elapsed_real_time) has been cut roughly in half
because each function is running simultaneously.
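Outside of IPython's `%%time` magic, the same wall-versus-CPU distinction can be measured with the standard library (a sketch; the `timed` helper and its sample workload are ours, not part of the post):

```python
import time


def timed(fn):
    # perf_counter measures elapsed ("wall") time; process_time measures
    # CPU time consumed by this process, summed across all of its threads.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


wall, cpu = timed(lambda: sum(range(1_000_000)))
```

When threads truly run in parallel, CPU time stays roughly constant while wall time drops, mirroring the `%%time` output above.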
Keep in mind that the **order in which threads are started** makes a difference!

```python
%%time

# Note that the GIL-locked version is started first
t1 = Thread(target=cython_gil, args=[N])
t2 = Thread(target=cython_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 667 ms, sys: 0 ns, total: 667 ms
> Wall time: 672 ms
> </pre>

Even though the second thread releases the GIL while running, it can't start until the first has
completed. Thus, the overall runtime is effectively the same as running two GIL-locked threads.

Finally, be aware that attempting to unlock the GIL from a thread that doesn't own it will crash the
**interpreter**, not just the thread attempting the unlock:

```python
%%cython

cdef int cython_recurse(int n) nogil:
    if n <= 0:
        return 0

    with nogil:
        return cython_recurse(n - 1)

cython_recurse(2)
```

> <pre>
> Fatal Python error: PyEval_SaveThread: NULL tstate
>
> Thread 0x00007f499effd700 (most recent call first):
> File "/home/bspeice/.virtualenvs/release-the-gil/lib/python3.7/site-packages/ipykernel/parentpoller.py", line 39 in run
> File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
> File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
> </pre>
In practice, avoiding this issue is simple. First, `nogil` functions probably shouldn't contain
`with nogil` blocks. Second, Cython can
[conditionally acquire/release](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#conditional-acquiring-releasing-the-gil)
the GIL, so these conditions can be used to synchronize access. Finally, Cython's documentation for
[external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#acquiring-and-releasing-the-gil)
contains more detail on how to safely manage the GIL.

To conclude: use Cython's `nogil` annotation to assert that functions are safe for calling when the
GIL is unlocked, and `with nogil` to actually unlock the GIL and run those functions.
# Numba

Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by
compiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_
at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function
first compiles it to machine code before running. Calling the function a second time re-uses that
machine code unless the argument types have changed.

Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions
compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode
avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the
`@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython`
are separate arguments; while it is necessary for code to be compiled in `nopython` mode in order to
release the lock, the GIL will remain locked if `nogil=False` (the default).

Let's repeat the same experiment, this time using Numba instead of Cython:
```python
# The `int` type annotation is only for humans and is ignored
# by Numba.
@jit(nopython=True, nogil=True)
def numba_nogil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1

    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Run using `nopython` mode to receive a performance boost,
# but GIL remains locked due to `nogil=False` by default.
@jit(nopython=True)
def numba_gil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1

    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Call each function once to force compilation; we don't want
# the timing statistics to include how long it takes to compile.
numba_nogil(N)
numba_gil(N);
```
We'll perform the same tests as above; first, figure out how long it takes the function to run:

```python
%%time
_ = numba_gil(N)
```

> <pre>
> CPU times: user 253 ms, sys: 258 µs, total: 253 ms
> Wall time: 251 ms
> </pre>

<span style="font-size: .8em">
Aside: it's not immediately clear why Numba takes ~30% less time to run than Cython for code that should be
effectively identical after compilation.
</span>
When running two GIL-locked threads, the result (as expected) takes around twice as long to compute:

```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 541 ms, sys: 3.96 ms, total: 545 ms
> Wall time: 541 ms
> </pre>

But if the GIL-unlocking thread starts first, both threads run in parallel:

```python
%%time
t1 = Thread(target=numba_nogil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 551 ms, sys: 7.77 ms, total: 559 ms
> Wall time: 279 ms
> </pre>

Just like Cython, starting the GIL-locked thread first leads to poor performance:

```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 524 ms, sys: 0 ns, total: 524 ms
> Wall time: 522 ms
> </pre>

Finally, unlike Cython, Numba will unlock the GIL if and only if it is currently acquired;
recursively calling `@jit(nogil=True)` functions is perfectly safe:
```python
from numba import jit

@jit(nopython=True, nogil=True)
def numba_recurse(n: int) -> int:
    if n <= 0:
        return 0

    return numba_recurse(n - 1)

numba_recurse(2);
```
# Conclusion

Before finishing, it's important to address pain points that will show up if these techniques are
used in a more realistic project:

First, code running in a GIL-free context will likely also need non-trivial data structures;
GIL-free functions aren't useful if they're constantly interacting with Python objects whose access
requires the GIL. Cython provides
[extension types](http://docs.cython.org/en/latest/src/tutorial/cdef_classes.html) and Numba
provides a [`@jitclass`](https://numba.pydata.org/numba-doc/dev/user/jitclass.html) decorator to
address this need.

Second, building and distributing applications that make use of Cython/Numba can be complicated.
Cython packages require running the compiler, (potentially) linking/packaging external dependencies,
and distributing a binary wheel. Numba is generally simpler because the code being distributed is
pure Python, but it can be tricky since errors aren't detected until runtime.

Finally, while unlocking the GIL is often a solution in search of a problem, both Cython and Numba
provide tools to directly manage the GIL when appropriate. This enables true parallelism (not just
[concurrency](https://stackoverflow.com/a/1050257)) that is impossible in vanilla Python.
372 blog/2019-12-14-release-the-gil/index.mdx Normal file
@@ -0,0 +1,372 @@
---
slug: 2019/12/release-the-gil
title: Release the GIL
date: 2019-12-14 12:00:00
authors: [bspeice]
tags: []
---
||||||
|
|
||||||
|
Complaining about the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock)
|
||||||
|
(GIL) seems like a rite of passage for Python developers. It's easy to criticize a design decision
|
||||||
|
made before multi-core CPU's were widely available, but the fact that it's still around indicates
|
||||||
|
that it generally works [Good](https://wiki.c2.com/?PrematureOptimization)
|
||||||
|
[Enough](https://wiki.c2.com/?YouArentGonnaNeedIt). Besides, there are simple and effective
|
||||||
|
workarounds; it's not hard to start a
|
||||||
|
[new process](https://docs.python.org/3/library/multiprocessing.html) and use message passing to
|
||||||
|
synchronize code running in parallel.
|
||||||
|
|
||||||
|
Still, wouldn't it be nice to have more than a single active interpreter thread? In an age of
|
||||||
|
asynchronicity and _M:N_ threading, Python seems lacking. The ideal scenario is to take advantage of
|
||||||
|
both Python's productivity and the modern CPU's parallel capabilities.
|
||||||
|
|
||||||
|
<!-- truncate -->
|
||||||
|
|
||||||
|
Presented below are two strategies for releasing the GIL's icy grip without giving up on what makes
|
||||||
|
Python a nice language to start with. Bear in mind: these are just the tools, no claim is made about
|
||||||
|
whether it's a good idea to use them. Very often, unlocking the GIL is an
|
||||||
|
[XY problem](https://en.wikipedia.org/wiki/XY_problem); you want application performance, and the
|
||||||
|
GIL seems like an obvious bottleneck. Remember that any gains from running code in parallel come at
|
||||||
|
the expense of project complexity; messing with the GIL is ultimately messing with Python's memory
|
||||||
|
model.
|
||||||
|
|
||||||
|
```python
|
||||||
|
%load_ext Cython
|
||||||
|
from numba import jit
|
||||||
|
|
||||||
|
N = 1_000_000_000
|
||||||
|
```
|
||||||
|
|
||||||
|
## Cython
|
||||||
|
|
||||||
|
Put simply, [Cython](https://cython.org/) is a programming language that looks a lot like Python,
|
||||||
|
gets [transpiled](https://en.wikipedia.org/wiki/Source-to-source_compiler) to C/C++, and integrates
|
||||||
|
well with the [CPython](https://en.wikipedia.org/wiki/CPython) API. It's great for building Python
|
||||||
|
wrappers to C and C++ libraries, writing optimized code for numerical processing, and tons more. And
|
||||||
|
when it comes to managing the GIL, there are two special features:
|
||||||
|
|
||||||
|
- The `nogil`
|
||||||
|
[function annotation](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#declaring-a-function-as-callable-without-the-gil)
|
||||||
|
asserts that a Cython function is safe to use without the GIL, and compilation will fail if it
|
||||||
|
interacts with Python in an unsafe manner
|
||||||
|
- The `with nogil`
|
||||||
|
[context manager](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#releasing-the-gil)
|
||||||
|
explicitly unlocks the CPython GIL while active
|
||||||
|
|
||||||
|
Whenever Cython code runs inside a `with nogil` block on a separate thread, the Python interpreter
|
||||||
|
is unblocked and allowed to continue work elsewhere. We'll define a "busy work" function that
|
||||||
|
demonstrates this principle in action:
|
||||||
|
|
||||||
|
```python
|
||||||
|
%%cython
|
||||||
|
|
||||||
|
# Annotating a function with `nogil` indicates only that it is safe
|
||||||
|
# to call in a `with nogil` block. It *does not* release the GIL.
|
||||||
|
cdef unsigned long fibonacci(unsigned long n) nogil:
|
||||||
|
if n <= 1:
|
||||||
|
return n
|
||||||
|
|
||||||
|
cdef unsigned long a = 0, b = 1, c = 0
|
||||||
|
|
||||||
|
c = a + b
|
||||||
|
for _i in range(2, n):
|
||||||
|
a = b
|
||||||
|
b = c
|
||||||
|
c = a + b
|
||||||
|
|
||||||
|
return c
|
||||||
|
|
||||||
|
|
||||||
|
def cython_nogil(unsigned long n):
|
||||||
|
# Explicitly release the GIL while running `fibonacci`
|
||||||
|
with nogil:
|
||||||
|
value = fibonacci(n)
|
||||||
|
|
||||||
|
return value
|
||||||
|
|
||||||
|
|
||||||
|
def cython_gil(unsigned long n):
|
||||||
|
# Because the GIL is not explicitly released, it implicitly
|
||||||
|
# remains acquired when running the `fibonacci` function
|
||||||
|
return fibonacci(n)
|
||||||
|
```
|
||||||
|
|
||||||
|
First, let's time how long it takes Cython to calculate the billionth Fibonacci number:
|
||||||
|
|
||||||
|
```python
|
||||||
|
%%time
|
||||||
|
_ = cython_gil(N);
|
||||||
|
```
|
||||||
|
|
||||||
|
> <pre>
|
||||||
|
> CPU times: user 365 ms, sys: 0 ns, total: 365 ms
|
||||||
|
> Wall time: 372 ms
|
||||||
|
> </pre>
|
||||||
|
|
||||||
|
```python
|
||||||
|
%%time
|
||||||
|
_ = cython_nogil(N);
|
||||||
|
```
|
||||||
|
|
||||||
|
> <pre>
|
||||||
|
> CPU times: user 381 ms, sys: 0 ns, total: 381 ms
|
||||||
|
> Wall time: 388 ms
|
||||||
|
> </pre>
|
||||||
|
|
||||||
|
Both versions (with and without GIL) take effectively the same amount of time to run. Even when
|
||||||
|
running this calculation in parallel on separate threads, it is expected that the run time will
|
||||||
|
double because only one thread can be active at a time:
|
||||||
|
|
||||||
|
```python
|
||||||
|
%%time
|
||||||
|
from threading import Thread
|
||||||
|
|
||||||
|
# Create the two threads to run on
|
||||||
|
t1 = Thread(target=cython_gil, args=[N])
|
||||||
|
t2 = Thread(target=cython_gil, args=[N])
|
||||||
|
# Start the threads
|
||||||
|
t1.start(); t2.start()
|
||||||
|
# Wait for the threads to finish
|
||||||
|
t1.join(); t2.join()
|
||||||
|
```
|
||||||
|
|
||||||
|
> <pre>
|
||||||
|
> CPU times: user 641 ms, sys: 5.62 ms, total: 647 ms
|
||||||
|
> Wall time: 645 ms
|
||||||
|
> </pre>
|
||||||
|
|
||||||
|
However, if the first thread releases the GIL, the second thread is free to acquire it and run in
|
||||||
|
parallel:
|
||||||
|
|
||||||
|
```python
|
||||||
|
%%time
|
||||||
|
|
||||||
|
t1 = Thread(target=cython_nogil, args=[N])
|
||||||
|
t2 = Thread(target=cython_gil, args=[N])
|
||||||
|
t1.start(); t2.start()
|
||||||
|
t1.join(); t2.join()
|
||||||
|
```
|
||||||
|
|
||||||
|
> <pre>
|
||||||
|
> CPU times: user 717 ms, sys: 372 µs, total: 718 ms
|
||||||
|
> Wall time: 358 ms
|
||||||
|
> </pre>
|
||||||
|
|
||||||
|
Because `user` time represents the sum of processing time on all threads, it doesn't change much.
|
||||||
|
The ["wall time"](https://en.wikipedia.org/wiki/Elapsed_real_time) has been cut roughly in half
|
||||||
|
because each function is running simultaneously.
|
||||||
|
|
||||||
|
Keep in mind that the **order in which threads are started** makes a difference!
|
||||||
|
|
||||||
|
```python
|
||||||
|
%%time
|
||||||
|
|
||||||
|
# Note that the GIL-locked version is started first
|
||||||
|
t1 = Thread(target=cython_gil, args=[N])
|
||||||
|
t2 = Thread(target=cython_nogil, args=[N])
|
||||||
|
t1.start(); t2.start()
|
||||||
|
t1.join(); t2.join()
|
||||||
|
```
|
||||||
|
|
||||||
|
> <pre>
|
||||||
|
> CPU times: user 667 ms, sys: 0 ns, total: 667 ms
|
||||||
|
> Wall time: 672 ms
|
||||||
|
> </pre>
|
||||||
|
|
||||||
|
Even though the second thread releases the GIL while running, it can't start until the first has
|
||||||
|
completed. Thus, the overall runtime is effectively the same as running two GIL-locked threads.
|
||||||
|
|
||||||
|
Finally, be aware that attempting to unlock the GIL from a thread that doesn't own it will crash the
|
||||||
|
**interpreter**, not just the thread attempting the unlock:
|
||||||
|
|
||||||
|
```python
|
||||||
|
%%cython
|
||||||
|
|
||||||
|
cdef int cython_recurse(int n) nogil:
|
||||||
|
if n <= 0:
|
||||||
|
return 0
|
||||||
|
|
||||||
|
with nogil:
|
||||||
|
return cython_recurse(n - 1)
|
||||||
|
|
||||||
|
cython_recurse(2)
|
||||||
|
```
|
||||||
|
|
||||||
|
> <pre>
|
||||||
|
> Fatal Python error: PyEval_SaveThread: NULL tstate
|
||||||
|
>
|
||||||
|
> Thread 0x00007f499effd700 (most recent call first):
|
||||||
|
> File "/home/bspeice/.virtualenvs/release-the-gil/lib/python3.7/site-packages/ipykernel/parentpoller.py", line 39 in run
|
||||||
|
> File "/usr/lib/python3.7/threading.py", line 926 in _bootstrap_inner
|
||||||
|
> File "/usr/lib/python3.7/threading.py", line 890 in _bootstrap
|
||||||
|
> </pre>
|
||||||
|
|
||||||
|
In practice, avoiding this issue is simple. First, `nogil` functions probably shouldn't contain
|
||||||
|
`with nogil` blocks. Second, Cython can
|
||||||
|
[conditionally acquire/release](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#conditional-acquiring-releasing-the-gil)
|
||||||
|
the GIL, so these conditions can be used to synchronize access. Finally, Cython's documentation for
|
||||||
|
[external C code](https://cython.readthedocs.io/en/latest/src/userguide/external_C_code.html#acquiring-and-releasing-the-gil)
|
||||||
|
contains more detail on how to safely manage the GIL.
|
||||||
|
|
||||||
|
To conclude: use Cython's `nogil` annotation to assert that functions are safe for calling when the
|
||||||
|
GIL is unlocked, and `with nogil` to actually unlock the GIL and run those functions.
|
||||||
|
|
||||||
|
## Numba

Like Cython, [Numba](https://numba.pydata.org/) is a "compiled Python." Where Cython works by
compiling a Python-like language to C/C++, Numba compiles Python bytecode _directly to machine code_
at runtime. Behavior is controlled with a special `@jit` decorator; calling a decorated function
first compiles it to machine code before running. Calling the function a second time re-uses that
machine code unless the argument types have changed.

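The dispatch behavior described above can be sketched in plain Python. The toy decorator below is an illustration of the idea only (not Numba's actual machinery): it caches a "compiled" version per argument-type signature, and later calls with the same types reuse the cached result.

```python
# Toy sketch of type-specialized dispatch; not Numba internals.
def toy_jit(func):
    compiled = {}  # maps argument-type signatures to "compiled" versions

    def wrapper(*args):
        signature = tuple(type(a) for a in args)
        if signature not in compiled:
            # Stand-in for machine-code generation; a real JIT would
            # lower the function for this signature here.
            wrapper.compile_count += 1
            compiled[signature] = func
        return compiled[signature](*args)

    wrapper.compile_count = 0
    return wrapper

@toy_jit
def add(a, b):
    return a + b

add(1, 2)      # first (int, int) call: triggers a "compile"
add(3, 4)      # same signature: reuses the cached version
add(1.0, 2.0)  # new (float, float) signature: compiles again
```

After the three calls above, only two "compilations" have happened, mirroring Numba's one-compile-per-signature behavior.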
Numba works best when a `nopython=True` argument is added to the `@jit` decorator; functions
compiled in [`nopython`](http://numba.pydata.org/numba-doc/latest/user/jit.html?#nopython) mode
avoid the CPython API and have performance comparable to C. Further, adding `nogil=True` to the
`@jit` decorator unlocks the GIL while that function is running. Note that `nogil` and `nopython`
are separate arguments; while code must be compiled in `nopython` mode to release the lock, the
GIL will remain locked if `nogil=False` (the default).

Let's repeat the same experiment, this time using Numba instead of Cython:

```python
from numba import jit

# The `int` type annotation is only for humans and is ignored
# by Numba.
@jit(nopython=True, nogil=True)
def numba_nogil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1

    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Run using `nopython` mode to receive a performance boost,
# but the GIL remains locked due to `nogil=False` by default.
@jit(nopython=True)
def numba_gil(n: int) -> int:
    if n <= 1:
        return n

    a = 0
    b = 1

    c = a + b
    for _i in range(2, n):
        a = b
        b = c
        c = a + b

    return c


# Call each function once to force compilation; we don't want
# the timing statistics to include how long it takes to compile.
numba_nogil(N)
numba_gil(N);
```

We'll perform the same tests as above; first, figure out how long it takes the function to run:

```python
%%time
_ = numba_gil(N)
```

> <pre>
> CPU times: user 253 ms, sys: 258 µs, total: 253 ms
> Wall time: 251 ms
> </pre>

<small>
Aside: it's not immediately clear why Numba takes ~20% less time to run than Cython for code that should be
effectively identical after compilation.
</small>

When running two GIL-locked threads, the result (as expected) takes around twice as long to compute:

```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 541 ms, sys: 3.96 ms, total: 545 ms
> Wall time: 541 ms
> </pre>

But if the GIL-unlocking thread starts first, both threads run in parallel:

```python
%%time
t1 = Thread(target=numba_nogil, args=[N])
t2 = Thread(target=numba_gil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 551 ms, sys: 7.77 ms, total: 559 ms
> Wall time: 279 ms
> </pre>

Just like Cython, starting the GIL-locked thread first leads to poor performance:

```python
%%time
t1 = Thread(target=numba_gil, args=[N])
t2 = Thread(target=numba_nogil, args=[N])
t1.start(); t2.start()
t1.join(); t2.join()
```

> <pre>
> CPU times: user 524 ms, sys: 0 ns, total: 524 ms
> Wall time: 522 ms
> </pre>

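The ordering effects above come down to whether the GIL gets released at all. The same principle can be seen in pure Python (this is an illustration, nothing Numba-specific): `time.sleep` releases the GIL while waiting, so two sleeping threads overlap instead of running back-to-back.

```python
import time
from threading import Thread

def wait():
    # time.sleep releases the GIL for the duration of the sleep
    time.sleep(0.2)

threads = [Thread(target=wait) for _ in range(2)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Both threads wait concurrently, so total wall time is roughly 0.2s
# rather than the ~0.4s a serialized run would take.
print(f"elapsed: {elapsed:.2f}s")
```

Replace the sleep with a pure-Python busy loop and the wall time roughly doubles, because the interpreter never lets both threads compute at once.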
Finally, unlike Cython, Numba will unlock the GIL if and only if it is currently acquired;
recursively calling `@jit(nogil=True)` functions is perfectly safe:

```python
from numba import jit

@jit(nopython=True, nogil=True)
def numba_recurse(n: int) -> int:
    if n <= 0:
        return 0

    return numba_recurse(n - 1)

numba_recurse(2);
```

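Numba's "release only if currently held" behavior resembles reentrant locking. As a loose pure-Python analogy (not Numba internals), a `threading.RLock` tracks which thread owns it and tolerates nested acquisition, where a plain `Lock` would deadlock:

```python
from threading import Lock, RLock

def recurse(lock, n):
    # Re-acquires `lock` at every level of recursion
    with lock:
        if n <= 0:
            return 0
        return recurse(lock, n - 1)

# An RLock knows the current thread already holds it, so nested
# acquisition is safe -- akin to Numba's nested nogil calls.
recurse(RLock(), 2)

# recurse(Lock(), 2) would never return: a plain Lock can't be
# re-acquired by the thread that already holds it, which mirrors
# Cython's crash when unlocking a GIL the thread doesn't own.
```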
## Conclusion

Before finishing, it's important to address pain points that will show up if these techniques are
used in a more realistic project:

First, code running in a GIL-free context will likely also need non-trivial data structures;
GIL-free functions aren't useful if they're constantly interacting with Python objects whose access
requires the GIL. Cython provides
[extension types](http://docs.cython.org/en/latest/src/tutorial/cdef_classes.html) and Numba
provides a [`@jitclass`](https://numba.pydata.org/numba-doc/dev/user/jitclass.html) decorator to
address this need.

Second, building and distributing applications that make use of Cython/Numba can be complicated.
Cython packages require running the compiler, (potentially) linking/packaging external dependencies,
and distributing a binary wheel. Numba is generally simpler because the code being distributed is
pure Python, but can be tricky since errors aren't detected until runtime.

Finally, while unlocking the GIL is often a solution in search of a problem, both Cython and Numba
provide tools to directly manage the GIL when appropriate. This enables true parallelism (not just
[concurrency](https://stackoverflow.com/a/1050257)) that is impossible in vanilla Python.
---
slug: mdx-blog-post
title: MDX Blog Post With An Extraordinarily Long Title
date: 2021-08-02 10:00:00
authors: [bspeice]
tags: []
---

## title

Hello?

Blog posts support [Docusaurus Markdown features](https://docusaurus.io/docs/markdown-features), such as [MDX](https://mdxjs.com/).

<details>
<summary>Hello</summary>

Testing - {1 + 2}
</details>

:::tip

Use the power of React to create interactive blog posts.

:::

{/* truncate */}

For example, use JSX to create an interactive button:

```js
<button onClick={() => alert('button clicked!')}>Click me!</button>
```

```cpp
class MyClass {
public:
  MyClass() = default;
};

int main() {
  auto x = 24;
}
```

<button onClick={() => alert('button clicked!')}>Click me!</button>
---
layout: post
title: "The webpack industrial complex"
description: "Reflections on a new project"
category:
tags: [webpack, react, vite]
---

This started because I wanted to build a synthesizer. Setting a goal of "digital DX7" was ambitious, but I needed something unrelated to the day job. Beyond that, working with audio seemed like a good challenge. I enjoy performance-focused code, and performance problems in audio are conspicuous. Building a web project was an obvious choice because of the web audio API documentation and independence from a large Digital Audio Workstation (DAW).

The project was soon derailed trying to sort out technical issues unrelated to the original purpose. Finding a resolution was a frustrating journey, and it's still not clear whether those problems were my fault. As a result, I'm writing this to try making sense of it, as a case study/reference material, and to salvage something from the process.

## Starting strong

The sole starting requirement was to write everything in TypeScript. Not because of project scale, but because guardrails help with unfamiliar territory. Keeping that in mind, the first question was: how does one start a new project? All I actually need is "compile TypeScript, show it in a browser."

Create React App (CRA) came to the rescue and the rest of that evening was a joy. My TypeScript/JavaScript skills were rusty, but the online documentation was helpful. I had never understood the appeal of JSX (why put a DOM in JavaScript?) until it made connecting an `onEvent` handler and a function easy.

Some quick dimensional analysis later and there was a sine wave oscillator playing A=440 through the speakers. I specifically remember thinking "modern browsers are magical."

## Continuing on

Now comes the first mistake: I began to worry about "scale" before encountering an actual problem. Rather than rendering audio in the main thread, why not use audio worklets and render in a background thread instead?

The first sign something was amiss came from the TypeScript compiler errors showing the audio worklet API [was missing](https://github.com/microsoft/TypeScript/issues/28308). After searching out GitHub issues and (unsuccessfully) tweaking the `.tsconfig` settings, I settled on installing a package and moving on.

The next problem came from actually using the API. Worklets must load from separate "modules," but it wasn't clear how to guarantee the worklet code stayed separate from the application. I saw recommendations to use `new URL(<local path>, import.meta.url)` and it worked! Well, kind of:

![Browser error](/assets/images/2022-11-20-video_mp2t.png)

That file has the audio processor code, so why does it get served with `Content-Type: video/mp2t`?

## Floundering about

Now comes the second mistake: even though I didn't understand the error, I ignored recommendations to [just use JavaScript](https://hackernoon.com/implementing-audioworklets-with-react-8a80a470474) and stuck by the original TypeScript requirement.

I tried different project structures. Moving the worklet code to a new folder didn't help, nor did setting up a monorepo and placing it in a new package.

I tried three different CRA tools - `react-app-rewired`, `craco`, `customize-react-app` - but got the same problem. Each has varying levels of compatibility with recent CRA versions, so it wasn't clear if I had the right solution but implemented it incorrectly. After attempting to eject the application and panicking after seeing the configuration, I abandoned that as well.

I tried changing the webpack configuration: using [new](https://github.com/webpack/webpack/issues/11543#issuecomment-917673256) [loaders](https://github.com/popelenkow/worker-url), setting [asset rules](https://github.com/webpack/webpack/discussions/14093#discussioncomment-1257149), even [changing how webpack detects worker resources](https://github.com/webpack/webpack/issues/11543#issuecomment-826897590). In hindsight, entry points may have been the answer. But because CRA actively resists attempts to change its webpack configuration, and I couldn't find audio worklet examples in any other framework, I gave up.

I tried so many application frameworks. Next.js looked like a good candidate, but added its own [bespoke webpack complexity](https://github.com/vercel/next.js/issues/24907) to the existing confusion. Astro had the best "getting started" experience, but I refuse to install an IDE-specific plugin. I first used Deno while exploring Lume, but it couldn't import the audio worklet types (maybe because of module compatibility?). Each framework was unique in its own way (shout-out to SvelteKit) but I couldn't figure out how to make them work.

## Learning and reflecting

I ended up using Vite and vite-plugin-react-pages to handle both "build the app" and "bundle worklets," but the specific tool choice isn't important. Instead, the focus should be on lessons learned.

For myself:

- I'm obsessed with tooling, to the point it can derail the original goal. While it comes from a good place (for example: "types are awesome"), it can get in the way of more important work
- I tend to reach for online resources right after seeing a new problem. While finding help online is often faster, spending time understanding the problem would have been more productive than cycling through (often outdated) blog posts

For the tools:

- Resource bundling is great and solves a genuine challenge. I've heard too many horror stories of developers writing modules by hand to believe this is unnecessary complexity
- Webpack is a build system and modern frameworks are deeply dependent on it (hence the "webpack industrial complex"). While this often saves users from unnecessary complexity, there's no path forward if something breaks
- There's little ability to mix and match tools across frameworks. Next.js and Gatsby let users extend webpack, but because each framework adds its own modules, changes aren't portable. After spending a week looking at webpack, I had an example running with parcel in thirty minutes, but couldn't integrate it

In the end, learning new systems is fun, but a focus on tools that "just work" can leave users out in the cold if they break down.
60
blog/2022-11-20-webpack-industrial-complex/index.mdx
Normal file
60
blog/2022-11-20-webpack-industrial-complex/index.mdx
Normal file
@ -0,0 +1,60 @@
|
|||||||
|
---
|
||||||
|
slug: 2011/11/webpack-industrial-complex
|
||||||
|
title: The webpack industrial complex
|
||||||
|
date: 2022-11-20 12:00:00
|
||||||
|
authors: [bspeice]
|
||||||
|
tags: []
|
||||||
|
---
|
||||||
|
|
||||||
|
This started because I wanted to build a synthesizer. Setting a goal of "digital DX7" was ambitious, but I needed something unrelated to the day job. Beyond that, working with audio seemed like a good challenge. I enjoy performance-focused code, and performance problems in audio are conspicuous. Building a web project was an obvious choice because of the web audio API documentation and independence from a large Digital Audio Workstation (DAW).
|
||||||
|
|
||||||
|
The project was soon derailed trying to sort out technical issues unrelated to the original purpose. Finding a resolution was a frustrating journey, and it's still not clear whether those problems were my fault. As a result, I'm writing this to try making sense of it, as a case study/reference material, and to salvage something from the process.
|
||||||
|
|
||||||
|
## Starting strong
|
||||||
|
|
||||||
|
The sole starting requirement was to write everything in TypeScript. Not because of project scale, but because guardrails help with unfamiliar territory. Keeping that in mind, the first question was: how does one start a new project? All I actually need is "compile TypeScript, show it in a browser."
|
||||||
|
|
||||||
|
Create React App (CRA) came to the rescue and the rest of that evening was a joy. My TypeScript/JavaScript skills were rusty, but the online documentation was helpful. I had never understood the appeal of JSX (why put a DOM in JavaScript?) until it made connecting an `onEvent` handler and a function easy.
|
||||||
|
|
||||||
|
Some quick dimensional analysis later and there was a sine wave oscillator playing A=440 through the speakers. I specifically remember thinking "modern browsers are magical."
|
||||||
|
|
||||||
|
## Continuing on
|
||||||
|
|
||||||
|
Now comes the first mistake: I began to worry about "scale" before encountering an actual problem. Rather than rendering audio in the main thread, why not use audio worklets and render in a background thread instead?
|
||||||
|
|
||||||
|
The first sign something was amiss came from the TypeScript compiler errors showing the audio worklet API [was missing](https://github.com/microsoft/TypeScript/issues/28308). After searching out Github issues and (unsuccessfully) tweaking the `.tsconfig` settings, I settled on installing a package and moving on.
|
||||||
|
|
||||||
|
The next problem came from actually using the API. Worklets must load from separate "modules," but it wasn't clear how to guarantee the worklet code stayed separate from the application. I saw recommendations to use `new URL(<local path>, import.meta.url)` and it worked! Well, kind of:
|
||||||
|
|
||||||
|
![Browser error](./video_mp2t.png)
|
||||||
|
|
||||||
|
That file has the audio processor code, so why does it get served with `Content-Type: video/mp2t`?
|
||||||
|
|
||||||
|
## Floundering about
|
||||||
|
|
||||||
|
Now comes the second mistake: even though I didn't understand the error, I ignored recommendations to [just use JavaScript](https://hackernoon.com/implementing-audioworklets-with-react-8a80a470474) and stuck by the original TypeScript requirement.
|
||||||
|
|
||||||
|
I tried different project structures. Moving the worklet code to a new folder didn't help, nor did setting up a monorepo and placing it in a new package.
|
||||||
|
|
||||||
|
I tried three different CRA tools - `react-app-rewired`, `craco`, `customize-react-app` - but got the same problem. Each has varying levels of compatibility with recent CRA versions, so it wasn't clear if I had the right solution but implemented it incorrectly. After attempting to eject the application and panicking after seeing the configuration, I abandoned that as well.
|
||||||
|
|
||||||
|
I tried changing the webpack configuration: using [new](https://github.com/webpack/webpack/issues/11543#issuecomment-917673256) [loaders](https://github.com/popelenkow/worker-url), setting [asset rules](https://github.com/webpack/webpack/discussions/14093#discussioncomment-1257149), even [changing how webpack detects worker resources](https://github.com/webpack/webpack/issues/11543#issuecomment-826897590). In hindsight, entry points may have been the answer. But because CRA actively resists attempts to change its webpack configuration, and I couldn't find audio worklet examples in any other framework, I gave up.
|
||||||
|
|
||||||
|
I tried so many application frameworks. Next.js looked like a good candidate, but added its own [bespoke webpack complexity](https://github.com/vercel/next.js/issues/24907) to the existing confusion. Astro had the best "getting started" experience, but I refuse to install an IDE-specific plugin. I first used Deno while exploring Lume, but it couldn't import the audio worklet types (maybe because of module compatibility?). Each framework was unique in its own way (shout-out to SvelteKit) but I couldn't figure out how to make them work.
|
||||||
|
|
||||||
|
## Learning and reflecting
|
||||||
|
|
||||||
|
I ended up using Vite and vite-plugin-react-pages to handle both "build the app" and "bundle worklets," but the specific tool choice isn't important. Instead, the focus should be on lessons learned.
|
||||||
|
|
||||||
|
For myself:
|
||||||
|
|
||||||
|
- I'm obsessed with tooling, to the point it can derail the original goal. While it comes from a good place (for example: "types are awesome"), it can get in the way of more important work
|
||||||
|
- I tend to reach for online resources right after seeing a new problem. While finding help online is often faster, spending time understanding the problem would have been more productive than cycling through (often outdated) blog posts
|
||||||
|
|
||||||
|
For the tools:
|
||||||
|
|
||||||
|
- Resource bundling is great and solves a genuine challenge. I've heard too many horror stories of developers writing modules by hand to believe this is unnecessary complexity
|
||||||
|
- Webpack is a build system and modern frameworks are deeply dependent on it (hence the "webpack industrial complex"). While this often saves users from unnecessary complexity, there's no path forward if something breaks
|
||||||
|
- There's little ability to mix and match tools across frameworks. Next.js and Gatsby let users extend webpack, but because each framework adds its own modules, changes aren't portable. After spending a week looking at webpack, I had an example running with parcel in thirty minutes, but couldn't integrate it
|
||||||
|
|
||||||
|
In the end, learning new systems is fun, but a focus on tools that "just work" can leave users out in the cold if they break down.