Final draft, and publish

2025-07-28 11:05:07 -04:00 · 2019-07-31 16:17:15 -04:00
parent 3f561c959c
commit 4714448b1a
1 changed files with 7 additions and 9 deletions
--- a/_drafts/writing-fast-code.md
+++ b/_drafts/writing-fast-code.md
@ -18,11 +18,11 @@ Having now worked in the trading industry, I can confirm the developers aren't s

 The framework I'd propose is this: **If you want to build high-performance systems, focus first on reducing performance variance** (reducing the gap between the fastest and slowest runs of the same code), **and only look at average latency once variance is at an acceptable level**.

-Don't get me wrong, I'm a much happier person when things are fast. Computer goes from booting in 20 seconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes a full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a week is the same in each situation, but you're painfully aware of that minute when it happens. When it comes to code, the principal is the same: speeding up a function by an average 10 milliseconds doesn't mean much if there's a 100ms difference between your fastest and slowest runs. When performance matters, you need to respond quickly *every time*, not just in aggregate. High-performance systems should first optimize for time variance. Once you're consistent at the time scale you care about, then focus on improving overall time.
+Don't get me wrong, I'm a much happier person when things are fast. Computer goes from booting in 20 seconds down to 10 because I installed a solid-state drive? Awesome. But if every fifth day it takes a full minute to boot because of corrupted sectors? Not so great. Average speed over the course of a week is the same in each situation, but you're painfully aware of that minute when it happens. When it comes to code, the principal is the same: speeding up a function by an average of 10 milliseconds doesn't mean much if there's a 100ms difference between your fastest and slowest runs. When performance matters, you need to respond quickly *every time*, not just in aggregate. High-performance systems should first optimize for time variance. Once you're consistent at the time scale you care about, then focus on improving average  time.

-This focus on variance shows up all the time in public discussions (emphasis added in all quotes below):
+This focus on variance shows up all the time in industry too (emphasis added in all quotes below):

- Consistency shows up in [marketing materials](https://business.nasdaq.com/market-tech/marketplaces/trading) for NASDAQ's matching engine, the most performance-sensitive component of the exchange:
+- In [marketing materials](https://business.nasdaq.com/market-tech/marketplaces/trading) for NASDAQ's matching engine, the most performance-sensitive component of the exchange, dependability is highlighted in addition to instantaneous metrics:
    > Able to **consistently sustain** an order rate of over 100,000 orders per second at sub-40 microsecond average latency

 - The [Aeron](https://github.com/real-logic/aeron) message bus has this to say about performance:
@ -34,9 +34,9 @@ This focus on variance shows up all the time in public discussions (emphasis add
 - [Solarflare](https://solarflare.com/), which makes highly-specialized network hardware, points out variance (jitter) as a big concern for [electronic trading](https://solarflare.com/electronic-trading/):
    > The high stakes world of electronic trading, investment banks, market makers, hedge funds and exchanges demand the **lowest possible latency and jitter** while utilizing the highest bandwidth and return on their investment.

-And one more clarification: we're not discussing how to reduce *total run-time*, but its variance. For example, trading firms use [wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because the speed of light through air is faster than through fiber-optic cables. There's still at *absolute minimum* a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between, say, [Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago) (unless [neutrinos](https://arxiv.org/pdf/1203.2847.pdf) become [a thing](https://news.ycombinator.com/item?id=10979340)). If a trading system in Chicago calls the function for "send order to Tokyo" and waits for a result, there's a physical limit to how long that will take. In this situation, the focus is on keeping variance of additional processing to a minimum.
+And to further clarify: we're not discussing *total run-time*, but variance of total run-time. There are situations where it's not reasonably possible to make things faster, and you'd much rather be consistent. For example, trading firms use [wireless networks](https://sniperinmahwah.wordpress.com/2017/06/07/network-effects-part-i/) because the speed of light through air is faster than through fiber-optic cables. There's still at *absolute minimum* a [~33.76 millisecond](http://tinyurl.com/y2vd7tn8) delay required to send data between, say, [Chicago and Tokyo](https://www.theice.com/market-data/connectivity-and-feeds/wireless/tokyo-chicago). If a trading system in Chicago calls the function for "send order to Tokyo" and waits to see if a trade occurs, there's a physical limit to how long that will take. In this situation, the focus is on keeping variance of *additional processing* to a minimum, since speed of light is the limiting factor.

-So how exactly does one go about looking for and eliminating performance variance? To tell the truth, I don't think a systematic answer or flow-chart exists. There's no substitute for (A) building a deep understanding of the entire technology stack, and (B) actually measuring system performance (though (C) watching a lot of [CppCon](https://www.youtube.com/channel/UCMlGfpWw-RUdWX_JbLCukXg) videos never hurt). Even then, every project cares about performance to a different degree; you may need to build an entire [replica production system](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=3015) to accurately benchmark at nanosecond precision, or you may be content to simply [avoid garbage collection](https://www.youtube.com/watch?v=BD9cRbxWQx8&feature=youtu.be&t=1335) in your Java code.
+So how does one go about looking for and eliminating performance variance? To tell the truth, I don't think a systematic answer or flow-chart exists. There's no substitute for (A) building a deep understanding of the entire technology stack, and (B) actually measuring system performance (though (C) watching a lot of [CppCon](https://www.youtube.com/channel/UCMlGfpWw-RUdWX_JbLCukXg) videos for inspiration never hurt). Even then, every project cares about performance to a different degree; you may need to build an entire [replica production system](https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be&t=3015) to accurately benchmark at nanosecond precision, or you may be content to simply [avoid garbage collection](https://www.youtube.com/watch?v=BD9cRbxWQx8&feature=youtu.be&t=1335) in your Java code.

 Even though everyone has different needs, there are still common things to look for when trying to isolate and eliminate variance. In no particular order, these are my focus areas when thinking about high-performance systems:

@ -88,8 +88,6 @@ Code you wrote is almost certainly not the *only* code running on your hardware.

 High-performance systems, regardless of industry, are not magical. They do require extreme precision and attention to detail, but they're designed, built, and operated by regular people, using a lot of tools that are publicly available. Interested in seeing how context switching affects performance of your benchmarks? `taskset` should be installed in all modern Linux distributions, and can be used to make sure the OS never migrates your process. Curious how often garbage collection triggers during a crucial operation? Your language of choice will typically expose details of its operations ([Python](https://docs.python.org/3/library/gc.html), [Java](https://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html#DebuggingOptions)). Want to know how hard your program is stressing the TLB? Use `perf record` and look for `dtlb_load_misses.miss_causes_a_walk`.

-Two final caveats then. First, before attempting to apply some of the technology above to your own systems, can you first identify [where/when you care](http://wiki.c2.com/?PrematureOptimization) about "high-performance"? As an example, if parts of a system rely on humans pushing buttons, you likely don't need to worry CPU pinning. Humans are already far too slow to react in time.
+Two final guiding questions, then: first, before attempting to apply some of the technology above to your own systems, can you first identify [where/when you care](http://wiki.c2.com/?PrematureOptimization) about "high-performance"? As an example, if parts of a system rely on humans pushing buttons, CPU pinning won't have any measurable effect. Humans are already far too slow to react in time. Second, if you're using benchmarks, are they being designed in a way that's actually helpful? Tools like [Criterion](http://www.serpentine.com/criterion/) (also in [Rust](https://github.com/bheisler/criterion.rs)) and Google's [Benchmark](https://github.com/google/benchmark) output not only average run time, but variance as well; your benchmarking environment is subject to the same concerns your production environment is.

-Second, if you're using benchmarks, are they being designed in a way that's actually helpful? Tools like [Criterion](http://www.serpentine.com/criterion/) (also in [Rust](https://github.com/bheisler/criterion.rs)) and Google's [Benchmark](https://github.com/google/benchmark) output not only average run time, but variance as well; your benchmarking environment is subject to the same concerns your production environment is.
-
-Finally then, I believe high-performance systems are a matter of philosophy, not necessarily technique. Rigorous focus on variance is the first step, and there are plenty of ways to measure and mitigate it; once that's at an acceptable level, then optimize for speed.
+Finally, I believe high-performance systems are a matter of philosophy, not necessarily technique. Rigorous focus on variance is the first step, and there are plenty of ways to measure and mitigate it; once that's at an acceptable level, then optimize for speed.