From c7f94b742600e000cfddc063b6f9eed3b9eb7d62 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sun, 1 Sep 2019 23:56:43 -0400 Subject: [PATCH 01/12] Start work on binary format shootout --- _pages/about.md | 2 - _posts/2019-09-01-binary-shootout-part-0.md | 44 +++++++++++++++++++ _posts/2019-09-01-binary-shootout-part-1.md | 47 +++++++++++++++++++++ 3 files changed, 91 insertions(+), 2 deletions(-) create mode 100644 _posts/2019-09-01-binary-shootout-part-0.md create mode 100644 _posts/2019-09-01-binary-shootout-part-1.md diff --git a/_pages/about.md b/_pages/about.md index 782427d..699dc7d 100644 --- a/_pages/about.md +++ b/_pages/about.md @@ -10,5 +10,3 @@ Best ways to get in contact: - Email: [bradlee@speice.io](mailto:bradlee@speice.io) - LinkedIn: [bradleespeice](https://www.linkedin.com/in/bradleespeice/) -- Matrix (Chat): [@bspeice:matrix.com](https://matrix.to/#/@bspeice:matrix.com) -- Gitter (Chat): [bspeice](https://gitter.im/bspeice/Lobby) diff --git a/_posts/2019-09-01-binary-shootout-part-0.md b/_posts/2019-09-01-binary-shootout-part-0.md new file mode 100644 index 0000000..5e75b5a --- /dev/null +++ b/_posts/2019-09-01-binary-shootout-part-0.md @@ -0,0 +1,44 @@ +--- +layout: post +title: "Binary Format Shootout - Prologue: Nom" +description: "Making sense of binary streams" +category: +tags: [rust, binary-shootout] +--- + +I've been interested in using a binary protocol library for personal projects recently, +and found myself with a strong case of decision paralysis. Do I use +[Cap'n Proto](https://capnproto.org/), which has supported Rust the longest? +[Flatbuffers](https://google.github.io/flatbuffers) recently added support, +or I could take a look at [SBE](https://github.com/real-logic/simple-binary-encoding). +Or what about building something myself? A lot of these seem unnecessarily +complicated, when my personal use case is just providing views on top of +buffers with a relatively simple structure. 
+ +Even in my personal projects, I want the choices to be the best possible; +I hate the feeling of looking back at anything I've built and saying "I regret +that decision and I could have done better." So after agonizing over the choice +of protocol library for too long, I decided it would be worth building a test +to get a feel for each. It would give me a way to build a proof-of-concept +and become familiar with how each library worked, what the performance +characteristics were of each, and evaluate whether it was worth putting +in the effort of building yet another binary protocol library myself. + +To that end, this is the summation of research into the binary protocol +systems that currently support Rust. The goal isn't to recommend "the best," +but to understand each well enough to make an informed decision. + +My use case is as follows: ingest binary market data from +[IEX](https://iextrading.com/trading/market-data/) and turn it into +a format understandable by each library being tested. We'll later +write a simple program to analyze the data. + +Note: Market data is the use case here +simply because IEX makes the data freely available; no code or analysis +in this blog is related to my past or present work. + +But before we can run any analysis, we need to read in the files +supplied by IEX. To do that, we'll use a library in Rust +called [`nom`](https://docs.rs/nom/5.0.1/nom/). + +# Ingesting Market Data diff --git a/_posts/2019-09-01-binary-shootout-part-1.md b/_posts/2019-09-01-binary-shootout-part-1.md new file mode 100644 index 0000000..c62eb04 --- /dev/null +++ b/_posts/2019-09-01-binary-shootout-part-1.md @@ -0,0 +1,47 @@ +--- +layout: post +title: "new post" +description: "" +category: +tags: [] +--- + +# Designing the Test + +My use case is as follows: ingest binary market data from +[IEX](https://iextrading.com/trading/market-data/) and turn it into +a format understandable by each library being tested. 
Then we'll +write a simple program to find total trade volume per ticker, +and the highest and lowest bid/ask price per ticker as well. + +Note: Market data is the use case here +simply because IEX makes the data freely available; no code or analysis +in this blog is related to my past or present work. + +Now, the basic criteria used to evaluate each library: + +1) The library must have cross-language support, and treat Rust as a +first-class citizen. + +2) The schema must be able to evolve and add new fields. The information +I'm gathering now is fairly simple, but would evolve in the future. + +3) Performance is a priority; material performance differences +(time to de/serialize, memory usage) matter. + +Under those three criteria, we're excluding a lot of systems that +may make sense in other contexts: + +- [Bincode](https://github.com/servo/bincode) has great Rust support +and a simple wire format (message structure) but isn't usable from +other languages and doesn't deal well with schema evolution. + +- [Protocol Buffers](https://developers.google.com/protocol-buffers/) have +great cross-language support, but material performance issues compared +to other systems like FlatBuffers. + +- JSON/Msgpack are schema-less; while the wire format is simple, +having code generated from a schema is too nice to pass up. + +While each of these have a niche they perform well in, they're not +suited for the system under consideration. 
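As a concrete sketch of that analysis program: the struct fields below are illustrative stand-ins (not the actual IEX message layout), but the aggregation really is just a hash map keyed by ticker, independent of whichever serialization library produced the messages.

```rust
use std::collections::HashMap;

/// Illustrative message types; the field names here are mine, not IEX's.
pub struct Trade { pub symbol: String, pub size: u64 }
pub struct Quote { pub symbol: String, pub bid: u64, pub ask: u64 }

pub struct Stats { pub volume: u64, pub high_bid: u64, pub low_ask: u64 }

impl Default for Stats {
    fn default() -> Self {
        // low_ask starts at MAX so the first quote always lowers it.
        Stats { volume: 0, high_bid: 0, low_ask: u64::MAX }
    }
}

/// Total trade volume plus highest bid / lowest ask, per ticker.
pub fn aggregate(trades: &[Trade], quotes: &[Quote]) -> HashMap<String, Stats> {
    let mut stats: HashMap<String, Stats> = HashMap::new();
    for t in trades {
        stats.entry(t.symbol.clone()).or_default().volume += t.size;
    }
    for q in quotes {
        let s = stats.entry(q.symbol.clone()).or_default();
        s.high_bid = s.high_bid.max(q.bid);
        s.low_ask = s.low_ask.min(q.ask);
    }
    stats
}
```

The serialization format under test only changes how `Trade` and `Quote` get decoded; the aggregation itself stays identical across libraries, which is what makes the comparison fair.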
\ No newline at end of file From 20f041688708f395435188f9e104b56ccf4cf644 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Thu, 26 Sep 2019 23:24:39 -0400 Subject: [PATCH 02/12] Start a second pass on the article Also change the table formatting to actually be readable --- _posts/2019-09-01-binary-shootout-part-0.md | 175 ++++++++++++++++---- _sass/components/_article.scss | 4 + 2 files changed, 148 insertions(+), 31 deletions(-) diff --git a/_posts/2019-09-01-binary-shootout-part-0.md b/_posts/2019-09-01-binary-shootout-part-0.md index 5e75b5a..0574a6c 100644 --- a/_posts/2019-09-01-binary-shootout-part-0.md +++ b/_posts/2019-09-01-binary-shootout-part-0.md @@ -1,44 +1,157 @@ --- layout: post -title: "Binary Format Shootout - Prologue: Nom" +title: "Binary Format Shootout" description: "Making sense of binary streams" category: tags: [rust, binary-shootout] --- -I've been interested in using a binary protocol library for personal projects recently, -and found myself with a strong case of decision paralysis. Do I use -[Cap'n Proto](https://capnproto.org/), which has supported Rust the longest? -[Flatbuffers](https://google.github.io/flatbuffers) recently added support, -or I could take a look at [SBE](https://github.com/real-logic/simple-binary-encoding). -Or what about building something myself? A lot of these seem unnecessarily -complicated, when my personal use case is just providing views on top of -buffers with a relatively simple structure. +I've found that in many personal projects, [analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis) +is particularly deadly. There's nothing like having other options available to make you question your decisions. +There's a particular scenario that scares me: I'm a couple months into a project, only to realize that if I had +made a different choice at an earlier juncture, weeks of work could have been saved. 
If only an extra hour or +two of research had been conducted, everything would've turned out differently. -Even in my personal projects, I want the choices to be the best possible; -I hate the feeling of looking back at anything I've built and saying "I regret -that decision and I could have done better." So after agonizing over the choice -of protocol library for too long, I decided it would be worth building a test -to get a feel for each. It would give me a way to build a proof-of-concept -and become familiar with how each library worked, what the performance -characteristics were of each, and evaluate whether it was worth putting -in the effort of building yet another binary protocol library myself. +Let's say you're in need of a binary serialization schema for a project you're working on. Data will be going +over the network, not just in memory, so having a schema document is a must. Performance is important; +there's no reason to use Protocol Buffers when other projects support similar features at faster speed. +And it must be polyglot; Rust support needs to be there, but we can't predict what other languages this will +interact with. -To that end, this is the summation of research into the binary protocol -systems that currently support Rust. The goal isn't to recommend "the best," -but to understand each well enough to make an informed decision. +Given these requirements, the formats I could find were: -My use case is as follows: ingest binary market data from -[IEX](https://iextrading.com/trading/market-data/) and turn it into -a format understandable by each library being tested. We'll later -write a simple program to analyze the data. +1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and integrates well with all the build tools +2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler encoding +3. 
[Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) is being adopted by the + [High-performance financial](https://www.fixtrading.org/standards/sbe/) community, but the Rust implementation + is essentially unmaintained -Note: Market data is the use case here -simply because IEX makes the data freely available; no code or analysis -in this blog is related to my past or present work. +Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably fast, +and support multiple languages. But actually picking one to build a system on is intimidating; it's impossible +to know what issues that choice will lead to. -But before we can run any analysis, we need to read in the files -supplied by IEX. To do that, we'll use a library in Rust -called [`nom`](https://docs.rs/nom/5.0.1/nom/). +Still, a choice must be made. It's not particularly groundbreaking, but I decided to build a test system to help +understand how they all behave. -# Ingesting Market Data +# Prologue: Reading the Data + +Our benchmark will be a simple market data processor; given messages from [IEX](https://iextrading.com/trading/market-data/#deep), +serialize each message into the schema format, then read back each message to do some basic aggregation. + +But before we make it to that point, we have to read in the market data. To do so, I'm using a library +called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and brought some big changes, +so this was an opportunity to build a non-trivial program and see how it fared. + +If you're not familiar with `nom`, the idea is to build a binary data parser by combining different +mini-parsers. 
For example, if your data looks like +[this](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3): + +``` + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +---------------------------------------------------------------+ + 0 | Block Type = 0x00000006 | + +---------------------------------------------------------------+ + 4 | Block Total Length | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 8 | Interface ID | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +12 | Timestamp (High) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +16 | Timestamp (Low) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +20 | Captured Len | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +24 | Packet Len | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Packet Data | + | ... | +``` + +...you can build a parser in `nom` like +[this](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/parsers.rs#L59-L93): + +```rust +const ENHANCED_PACKET: [u8; 4] = [0x06, 0x00, 0x00, 0x00]; +pub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> { + let ( + remaining, + ( + block_type, + block_len, + interface_id, + timestamp_high, + timestamp_low, + captured_len, + packet_len, + ), + ) = tuple(( + tag(ENHANCED_PACKET), + le_u32, + le_u32, + le_u32, + le_u32, + le_u32, + le_u32, + ))(input)?; + + let (remaining, packet_data) = take(captured_len)(remaining)?; + Ok((remaining, packet_data)) +} +``` + +This demonstration isn't too interesting, but when more complex formats need to be parsed (like IEX market data), +[`nom` really shines](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs). 
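If it helps to see the idea without the library, here's a dependency-free sketch of the combinator pattern `nom` is built around. `PResult`, `tag`, `le_u32`, and `take` mirror `nom`'s concepts, but the implementations below are mine, not `nom`'s API:

```rust
/// A tiny stand-in for nom's `IResult`: a parser consumes the front of the
/// input and returns the remaining bytes plus the parsed value.
type PResult<'a, T> = Result<(&'a [u8], T), ()>;

/// Succeed only if the input starts with the expected byte sequence.
fn tag<'a>(expected: &[u8], input: &'a [u8]) -> PResult<'a, ()> {
    if input.starts_with(expected) {
        Ok((&input[expected.len()..], ()))
    } else {
        Err(())
    }
}

/// Parse a little-endian u32 from the front of the input.
fn le_u32(input: &[u8]) -> PResult<u32> {
    if input.len() < 4 {
        return Err(());
    }
    let (bytes, rest) = input.split_at(4);
    Ok((rest, u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])))
}

/// Slice out exactly `count` bytes.
fn take(count: usize, input: &[u8]) -> PResult<&[u8]> {
    if input.len() < count {
        return Err(());
    }
    let (taken, rest) = input.split_at(count);
    Ok((rest, taken))
}

/// Mini-parsers compose into bigger ones: match a block-type tag,
/// read a little-endian length, then slice out that many payload bytes.
fn length_prefixed_block(input: &[u8]) -> PResult<&[u8]> {
    let (rest, _) = tag(&[0x06, 0x00, 0x00, 0x00], input)?;
    let (rest, len) = le_u32(rest)?;
    take(len as usize, rest)
}
```

`nom` supplies these same building blocks (plus error handling, streaming support, and combinators like `tuple`) so you never have to write them by hand.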
+ +Ultimately, because `nom` was used to parse the IEX-format market data before serialization, we're not too interested +in its performance. However, it's worth mentioning how much easier this project was because I didn't have to write +all the boring code by hand. + +# Part 1: Cap'n Proto + +Now it's time to get into the meaty part of the story. Cap'n Proto was the first format I tried because of how long +it has supported Rust. It was a bit tricky to get the compiler installed, but once that was done, the +[schema document](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/marketdata.capnp) +wasn't hard to create. + +In practice, I had a ton of issues with Cap'n Proto. + +To serialize new messages, Cap'n Proto uses a "builder" object. This builder allocates memory on the heap to hold the message +content, but because builders [can't be re-used](https://github.com/capnproto/capnproto-rust/issues/111), we have to allocate +a new buffer for every single message. I was able to work around this and re-use memory with a +[special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51), +but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156) +to find an example usage and using `transmute` to bypass Rust's borrow checker. + +Reading messages was similarly problematic. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing) +version, and an unpacked version. When reading "packed" messages, we need to unpack the message before we can make use of it. +This allocates a new buffer for each message, and I wasn't able to find a way to get around this. Unpacked messages, however, +shouldn't require any allocation or decoding steps. 
In practice, because of a +[bounds check](https://github.com/capnproto/capnproto-rust/blob/master/capnp/src/serialize.rs#L60) on the payload size, +I had to [copy parts](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L255-L340) +of the Cap'n Proto API to read messages without allocation. + +In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too many issues for me to feel +comfortable making use of Cap'n Proto. + +# Final Results + +NOTE: Need to expand on this, but numbers reported below are from the IEX's 2019-09-03 data, took average over 10 runs. + +Serialization + +| | median | 99th Pctl | 99.9th Pctl | Total | +|----------------------|--------|-----------|-------------|--------| +| Cap'n Proto Packed | 413ns | 1751ns | 2943ns | 14.80s | +| Cap'n Proto Unpacked | 273ns | 1828ns | 2836ns | 10.65s | +| Flatbuffers | 355ns | 2185ns | 3497ns | 14.31s | +| SBE | 91ns | 1535ns | 2423ns | 3.91s | + +Deserialization + +| | median | 99th Pctl | 99.9th Pctl | Total | +|----------------------|--------|-----------|-------------|--------| +| Cap'n Proto Packed | 539ns | 1216ns | 2599ns | 18.92s | +| Cap'n Proto Unpacked | 366ns | 737ns | 1583ns | 12.32s | +| Flatbuffers | 173ns | 421ns | 1007ns | 6.00s | +| SBE | 116ns | 286ns | 659ns | 4.05s | diff --git a/_sass/components/_article.scss b/_sass/components/_article.scss index 33a0861..f6aae37 100644 --- a/_sass/components/_article.scss +++ b/_sass/components/_article.scss @@ -112,6 +112,10 @@ border-bottom-style: dotted; border-bottom-width: 1px; } + + td, th { + padding-right: 2em; + } } .c-article__footer { From e9794bc0c7bf563c694a062e57e6b6bd7aa93ba9 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Thu, 26 Sep 2019 23:25:42 -0400 Subject: [PATCH 03/12] Renaming and cleanup --- ...d => 2019-09-01-binary-format-shootout.md} | 2 +- _posts/2019-09-01-binary-shootout-part-1.md | 47 ------------------- 2 files 
changed, 1 insertion(+), 48 deletions(-) rename _posts/{2019-09-01-binary-shootout-part-0.md => 2019-09-01-binary-format-shootout.md} (99%) delete mode 100644 _posts/2019-09-01-binary-shootout-part-1.md diff --git a/_posts/2019-09-01-binary-shootout-part-0.md b/_posts/2019-09-01-binary-format-shootout.md similarity index 99% rename from _posts/2019-09-01-binary-shootout-part-0.md rename to _posts/2019-09-01-binary-format-shootout.md index 0574a6c..192eb03 100644 --- a/_posts/2019-09-01-binary-shootout-part-0.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -3,7 +3,7 @@ layout: post title: "Binary Format Shootout" description: "Making sense of binary streams" category: -tags: [rust, binary-shootout] +tags: [rust] --- I've found that in many personal projects, [analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis) diff --git a/_posts/2019-09-01-binary-shootout-part-1.md b/_posts/2019-09-01-binary-shootout-part-1.md deleted file mode 100644 index c62eb04..0000000 --- a/_posts/2019-09-01-binary-shootout-part-1.md +++ /dev/null @@ -1,47 +0,0 @@ ---- -layout: post -title: "new post" -description: "" -category: -tags: [] ---- - -# Designing the Test - -My use case is as follows: ingest binary market data from -[IEX](https://iextrading.com/trading/market-data/) and turn it into -a format understandable by each library being tested. Then we'll -write a simple program to find total trade volume per ticker, -and the highest and lowest bid/ask price per ticker as well. - -Note: Market data is the use case here -simply because IEX makes the data freely available; no code or analysis -in this blog is related to my past or present work. - -Now, the basic criteria used to evaluate each library: - -1) The library must have cross-language support, and treat Rust as a -first-class citizen. - -2) The schema must be able to evolve and add new fields. The information -I'm gathering now is fairly simple, but would evolve in the future. 
- -3) Performance is a priority; material performance differences -(time to de/serialize, memory usage) matter. - -Under those three criteria, we're excluding a lot of systems that -may make sense in other contexts: - -- [Bincode](https://github.com/servo/bincode) has great Rust support -and a simple wire format (message structure) but isn't usable from -other languages and doesn't deal well with schema evolution. - -- [Protocol Buffers](https://developers.google.com/protocol-buffers/) have -great cross-language support, but material performance issues compared -to other systems like FlatBuffers. - -- JSON/Msgpack are schema-less; while the wire format is simple, -having code generated from a schema is too nice to pass up. - -While each of these have a niche they perform well in, they're not -suited for the system under consideration. \ No newline at end of file From 388bd413d51d546b07a944d2439755ba29ed45ac Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Thu, 26 Sep 2019 23:35:53 -0400 Subject: [PATCH 04/12] Continue formatting work --- _posts/2019-09-01-binary-format-shootout.md | 8 ++++---- _sass/components/_article.scss | 8 ++++++++ 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index 192eb03..a38cd5d 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -140,8 +140,8 @@ NOTE: Need to expand on this, but numbers reported below are from the IEX's 2019 Serialization -| | median | 99th Pctl | 99.9th Pctl | Total | -|----------------------|--------|-----------|-------------|--------| +| Schema | Median | 99th Pctl | 99.9th Pctl | Total | +|:---------------------|:-------|:----------|:------------|:-------| | Cap'n Proto Packed | 413ns | 1751ns | 2943ns | 14.80s | | Cap'n Proto Unpacked | 273ns | 1828ns | 2836ns | 10.65s | | Flatbuffers | 355ns | 2185ns | 3497ns | 14.31s | @@ -149,8 +149,8 @@ 
Serialization Deserialization -| | median | 99th Pctl | 99.9th Pctl | Total | -|----------------------|--------|-----------|-------------|--------| +| Schema | Median | 99th Pctl | 99.9th Pctl | Total | +|:---------------------|:-------|:----------|:------------|:-------| | Cap'n Proto Packed | 539ns | 1216ns | 2599ns | 18.92s | | Cap'n Proto Unpacked | 366ns | 737ns | 1583ns | 12.32s | | Flatbuffers | 173ns | 421ns | 1007ns | 6.00s | diff --git a/_sass/components/_article.scss b/_sass/components/_article.scss index f6aae37..46979ca 100644 --- a/_sass/components/_article.scss +++ b/_sass/components/_article.scss @@ -113,8 +113,16 @@ border-bottom-width: 1px; } + table { + border-collapse: collapse; + border-style: hidden; + } + td, th { + padding-left: .1em; padding-right: 2em; + border-style: solid; + border-width: .1em; } } From 50df5f19c2fc530fbe5a10234cd3b6e0a69d7a59 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Fri, 27 Sep 2019 23:20:46 -0400 Subject: [PATCH 05/12] Writeup for Flatbuffers --- _posts/2019-09-01-binary-format-shootout.md | 50 ++++++++++++++++----- 1 file changed, 40 insertions(+), 10 deletions(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index a38cd5d..c7f4444 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -121,18 +121,48 @@ content, but because builders [can't be re-used](https://github.com/capnproto/ca a new buffer for every single message. I was able to work around this and re-use memory with a [special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51), but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156) -to find an example usage and using `transmute` to bypass Rust's borrow checker. 
+to find an example and using `transmute` to bypass Rust's borrow checker. -Reading messages was similarly problematic. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing) -version, and an unpacked version. When reading "packed" messages, we need to unpack the message before we can make use of it. -This allocates a new buffer for each message, and I wasn't able to find a way to get around this. Unpacked messages, however, -shouldn't require any allocation or decoding steps. In practice, because of a -[bounds check](https://github.com/capnproto/capnproto-rust/blob/master/capnp/src/serialize.rs#L60) on the payload size, -I had to [copy parts](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L255-L340) -of the Cap'n Proto API to read messages without allocation. +Reading messages is better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing) +version, and an unpacked version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it; +Cap'n Proto allocates a new buffer to unpack the message every time, and I wasn't able to figure out a way around that. +In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/). +However, accomplishing this required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)), +and we still allocate a vector on every read for the segment table. -In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too many issues for me to feel -comfortable making use of Cap'n Proto. +In the end, I put in significant work to make Cap'n Proto as fast as possible in the tests, but there were too many issues +for me to feel comfortable using it long-term. 
+ +# Part 2: Flatbuffers + +This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't work out, +official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers is intended to address +the same problems as Cap'n Proto; have a binary schema to describe the format that can be used from many languages. The difference +is that Flatbuffers claims to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html). + +On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice enough, and unlike +Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. There were some issues though. + +First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats like the following: + +```flatbuffers +table Message { + symbol: string; +} +table MultiMessage { + messages:[Message]; +} +``` + +We want to create a `MultiMessage` that contains a vector of `Message`, but each `Message` has a vector (the `string` type). +I was able to work around this by [caching `Message` elements](https://github.com/bspeice/speice.io-md_shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83) +in a `SmallVec` before building the final `MultiMessage`, but it was a painful process. + +Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898). +Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just puts a `u32` at the front of each +message to indicate the size. Not specifically a problem, but I would've rather seen message size integrated into the underlying format. + +Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it fast. 
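For what it's worth, the size-prefix framing itself is trivial to replicate by hand. A minimal sketch, assuming the prefix is a little-endian `u32` followed by the message bytes (which matches my reading of the format; the helper names here are mine, not the Flatbuffers API):

```rust
use std::io::{self, Read, Write};

/// Frame a message by prepending its length as a little-endian u32.
pub fn write_message<W: Write>(mut out: W, msg: &[u8]) -> io::Result<()> {
    out.write_all(&(msg.len() as u32).to_le_bytes())?;
    out.write_all(msg)
}

/// Read the next length-prefixed message; `None` means end of stream.
pub fn read_message<R: Read>(mut input: R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match input.read_exact(&mut len_buf) {
        Ok(()) => {}
        // EOF while reading a prefix is treated as a clean end of stream.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return Ok(None),
        Err(e) => return Err(e),
    }
    let len = u32::from_le_bytes(len_buf) as usize;
    let mut msg = vec![0u8; len];
    input.read_exact(&mut msg)?;
    Ok(Some(msg))
}
```

A production reader would also cap `len` before allocating, but the point stands: the framing is simple, I just would have preferred it be part of the format rather than bolted on.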
# Final Results From 5ce0515e2c176c270ef9d7f3539a7a945e26c7c5 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Fri, 27 Sep 2019 23:36:38 -0400 Subject: [PATCH 06/12] Wording cleanup --- _posts/2019-09-01-binary-format-shootout.md | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index c7f4444..2001966 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -22,9 +22,8 @@ Given these requirements, the formats I could find were: 1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and integrates well with all the build tools 2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler encoding -3. [Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) is being adopted by the - [High-performance financial](https://www.fixtrading.org/standards/sbe/) community, but the Rust implementation - is essentially unmaintained +3. [Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) has the simplest encoding, + but the Rust implementation is essentially unmaintained Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably fast, and support multiple languages. But actually picking one to build a system on is intimidating; it's impossible @@ -110,11 +109,9 @@ all the boring code by hand. # Part 1: Cap'n Proto Now it's time to get into the meaty part of the story. Cap'n Proto was the first format I tried because of how long -it has supported Rust. It was a bit tricky to get the compiler installed, but once that was done, the -[schema document](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/marketdata.capnp) -wasn't hard to create. - -In practice, I had a ton of issues with Cap'n Proto. 
+it has supported Rust (thanks to [David Renshaw](https://github.com/dwrensha) for maintaining the Rust port since
+[2014!](https://github.com/capnproto/capnproto-rust/releases/tag/rustc-0.10)). However, I had a ton of performance concerns
+actually using Cap'n Proto.
 
 To serialize new messages, Cap'n Proto uses a "builder" object. This builder allocates memory on the heap to hold the message
 content, but because builders [can't be re-used](https://github.com/capnproto/capnproto-rust/issues/111), we have to allocate
@@ -124,11 +121,11 @@ but it required reading through Cap'n Proto's [benchmarks](https://github.com/ca
 to find an example and using `transmute` to bypass Rust's borrow checker.
 
 Reading messages is better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing)
-version, and an unpacked version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it;
+version, and an "unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it;
 Cap'n Proto allocates a new buffer to unpack the message every time, and I wasn't able to figure out a way around that.
 In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/).
-However, accomplishing this required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)),
-and we still allocate a vector on every read for the segment table.
+However, accomplishing zero-copy deserialization required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)),
+and we still allocate a vector on every read for the segment table (not fixed at time of writing).
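For the curious, the segment table itself is tiny. Here's a sketch of parsing it without allocating, based on the stream framing described in Cap'n Proto's [encoding documentation](https://capnproto.org/encoding.html); this is my own code, not capnproto-rust's reader, which performs significantly more validation:

```rust
use std::convert::TryInto;

/// Parse a Cap'n Proto stream-framing segment table: a little-endian u32
/// holding (segment count - 1), then one u32 size per segment (in 8-byte
/// words), padded so the table ends on an 8-byte boundary.
/// Returns (bytes occupied by the table, total message size in words),
/// or `None` if the input is too short.
pub fn segment_table(input: &[u8]) -> Option<(usize, u64)> {
    let count = u32::from_le_bytes(input.get(..4)?.try_into().ok()?) as usize + 1;
    let mut total_words = 0u64;
    for i in 0..count {
        let start = 4 + i * 4;
        let size = u32::from_le_bytes(input.get(start..start + 4)?.try_into().ok()?);
        total_words += u64::from(size);
    }
    // Round the header up to the next 8-byte word boundary.
    let table_len = (4 + count * 4 + 7) / 8 * 8;
    Some((table_len, total_words))
}
```

Reading the sizes in place like this is exactly what avoids the per-message `Vec`: the table is consumed as a slice rather than collected.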
In the end, I put in significant work to make Cap'n Proto as fast as possible in the tests, but there were too many issues for me to feel comfortable using it long-term. @@ -162,7 +159,7 @@ Second, streaming support in Flatbuffers seems to be something of an [afterthoug Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just puts a `u32` at the front of each message to indicate the size. Not specifically a problem, but I would've rather seen message size integrated into the underlying format. -Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it fast. +Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform well. # Final Results From f50e65204c04c6787dc63666228956c4a8098281 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sat, 28 Sep 2019 00:18:20 -0400 Subject: [PATCH 07/12] Finish first draft --- _posts/2019-09-01-binary-format-shootout.md | 54 +++++++++++++++++++-- 1 file changed, 49 insertions(+), 5 deletions(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index 2001966..64b3ef3 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -1,7 +1,7 @@ --- layout: post title: "Binary Format Shootout" -description: "Making sense of binary streams" +description: "Cap'n Proto vs. Flatbuffers vs. SBE" category: tags: [rust] --- @@ -161,11 +161,40 @@ message to indicate the size. Not specifically a problem, but I would've rather Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform well. -# Final Results +# Part 3: Simple Binary Encoding -NOTE: Need to expand on this, but numbers reported below are from the IEX's 2019-09-03 data, took average over 10 runs. 
+Support for SBE was added by the author of one of my favorite +[Rust blog posts](https://web.archive.org/web/20190427124806/https://polysync.io/blog/session-types-for-hearty-codecs/). +I've [talked previously]({% post_url 2019-06-31-high-performance-systems %}) about how important variance is in +high-performance systems, so it was encouraging to read about a format that +[directly addressed](https://github.com/real-logic/simple-binary-encoding/wiki/Why-Low-Latency) my concerns. SBE has by far +the simplest binary format, but it does make some tradeoffs. -Serialization +Both Cap'n Proto and Flatbuffers use [pointers in their messages](https://capnproto.org/encoding.html#structs) to handle +variable-length data, [unions](https://capnproto.org/language.html#unions), and a couple other features. In contrast, +messages in SBE are essentially [primitive structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml); +variable-length data is supported, but there's no union type. + +As mentioned in the beginning, the Rust port of SBE is certainly usable, but is essentially unmaintained. However, if you +don't need union types, and can accept that schemas are XML documents, it's still worth using. + +# Results + +After building a test harness [for](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/capnp_runner.rs) +[each](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/flatbuffers_runner.rs) +[protocol](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/sbe_runner.rs), +it was time to actually take them for a spin. I used +[this script](https://github.com/bspeice/speice.io-md_shootout/blob/master/run_shootout.sh) to manage the test process, +and the raw results are [here](https://github.com/bspeice/speice.io-md_shootout/blob/master/shootout.csv). All data +reported below is the average of 10 runs over a single day of IEX data. 
Data checks were implemented to make sure +that each format achieved the same results. + +## Serialization + +Serialization measures on a +[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L268-L272) +how long it takes to convert the pre-parsed IEX message into the desired format +and write to a pre-allocated buffer. | Schema | Median | 99th Pctl | 99.9th Pctl | Total | |:---------------------|:-------|:----------|:------------|:-------| @@ -174,7 +203,13 @@ Serialization | Flatbuffers | 355ns | 2185ns | 3497ns | 14.31s | | SBE | 91ns | 1535ns | 2423ns | 3.91s | -Deserialization +## Deserialization + +Deserialization measures on a +[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L294-L298) +how long it takes to read the message encoded during deserialization and +perform some basic aggregation. The aggregation code is the same for each format, +so any performance differences are due solely to the format implementation. | Schema | Median | 99th Pctl | 99.9th Pctl | Total | |:---------------------|:-------|:----------|:------------|:-------| @@ -182,3 +217,12 @@ Deserialization | Cap'n Proto Unpacked | 366ns | 737ns | 1583ns | 12.32s | | Flatbuffers | 173ns | 421ns | 1007ns | 6.00s | | SBE | 116ns | 286ns | 659ns | 4.05s | + +# Conclusion + +Building a benchmark turned out to be incredibly helpful in making a decision; because a +"union" type isn't important to me, I'll be using SBE for my personal projects. + +And while SBE was the fastest in terms of both median and worst-case performance, its worst case +performance was proportionately far higher than any other format. Further research is necessary +to figure out why this is the case. But that's for another time. 
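For context on the latency columns in the tables above, here is a sketch of how a median and tail percentiles can be computed from raw per-message timings using the nearest-rank method. This is illustrative only; the benchmark's actual statistics code may differ.

```rust
// Nearest-rank percentile over an already-sorted slice of nanosecond timings.
fn percentile(sorted: &[u64], p: f64) -> u64 {
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.saturating_sub(1).min(sorted.len() - 1)]
}

fn main() {
    // Stand-in for per-message timings collected during a benchmark run.
    let mut timings_ns: Vec<u64> = (1..=1000).collect();
    timings_ns.sort_unstable();

    assert_eq!(percentile(&timings_ns, 50.0), 500);  // median
    assert_eq!(percentile(&timings_ns, 99.0), 990);  // 99th percentile
    assert_eq!(percentile(&timings_ns, 99.9), 999);  // 99.9th percentile
}
```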
From 84fa2f5fa05821c155d41ed1dedd8ede30f3b335 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sat, 28 Sep 2019 00:28:32 -0400 Subject: [PATCH 08/12] Add a TLDR --- _posts/2019-09-01-binary-format-shootout.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index 64b3ef3..40403ce 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -30,7 +30,14 @@ and support multiple languages. But actually picking one to build a system on is to know what issues that choice will lead to. Still, a choice must be made. It's not particularly groundbreaking, but I decided to build a test system to help -understand how they all behave. +understand how they all behave. All code can be found in the [repository](https://github.com/bspeice/speice.io-md_shootout). + +We'll discuss more in detail, but the TLDR: + +- Cap'n Proto can theoretically perform incredibly well, but the implementation had performance issues +- Flatbuffers had poor serialization performance, but more than made up for it during deserialiation +- SBE has the best median and worst-case performance, but the message structure doesn't support some + features that both Cap'n Proto and Flatbuffers have # Prologue: Reading the Data From ec490bfc9912d4301c29640bcaed893b29bc6e0f Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sat, 28 Sep 2019 12:55:46 -0400 Subject: [PATCH 09/12] First editing pass --- _posts/2019-09-01-binary-format-shootout.md | 86 +++++++++++---------- 1 file changed, 44 insertions(+), 42 deletions(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index 40403ce..0662fa5 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -7,18 +7,16 @@ tags: [rust] --- I've found that in many personal projects, [analysis 
paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis) -is particularly deadly. There's nothing like having other options available to make you question your decisions. -There's a particular scenario that scares me: I'm a couple months into a project, only to realize that if I had -made a different choice at an earlier juncture, weeks of work could have been saved. If only an extra hour or -two of research had been conducted, everything would've turned out differently. +is particularly deadly. Making good decisions at the start avoids pain and suffering down the line; +if doing extra research avoids problems in the future, I'm happy to continue researching indefinitely. -Let's say you're in need of a binary serialization schema for a project you're working on. Data will be going -over the network, not just in memory, so having a schema document is a must. Performance is important; +So let's say you're in need of a binary serialization schema for a project you're working on. Data will be going +over the network, not just in memory, so having a schema document and code generation is a must. Performance is important; there's no reason to use Protocol Buffers when other projects support similar features at faster speed. -And it must be polyglot; Rust support needs to be there, but we can't predict what other languages this will +And it must be polyglot; Rust support is a minimum, but we can't predict what other languages this will interact with. -Given these requirements, the formats I could find were: +Given these requirements, the candidates I could find were: 1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and integrates well with all the build tools 2. 
[Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler encoding @@ -26,31 +24,34 @@ Given these requirements, the formats I could find were: but the Rust implementation is essentially unmaintained Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably fast, -and support multiple languages. But actually picking one to build a system on is intimidating; it's impossible -to know what issues that choice will lead to. +and support multiple languages. But how do you actually pick one? It's impossible to know what issues that +choice will lead to, so you avoid commitment until the last possible moment. -Still, a choice must be made. It's not particularly groundbreaking, but I decided to build a test system to help -understand how they all behave. All code can be found in the [repository](https://github.com/bspeice/speice.io-md_shootout). +Still, a choice must be made. Instead of worrying about which is "the best," I decided to build a small +proof-of-concept system in each format and pit them against each other. All code can be found in the +[repository](https://github.com/bspeice/speice.io-md_shootout) for this project. 
-We'll discuss more in detail, but the TLDR: +We'll discuss more in detail, but a quick preview of the results: - Cap'n Proto can theoretically perform incredibly well, but the implementation had performance issues - Flatbuffers had poor serialization performance, but more than made up for it during deserialiation - SBE has the best median and worst-case performance, but the message structure doesn't support some - features that both Cap'n Proto and Flatbuffers have + features that both Cap'n Proto and Flatbuffers do # Prologue: Reading the Data -Our benchmark will be a simple market data processor; given messages from [IEX](https://iextrading.com/trading/market-data/#deep), -serialize each message into the schema format, then read back each message to do some basic aggregation. +Our benchmark system will be a simple market data processor; given messages from +[IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema format, +then read back the message to do some basic aggregation. This test isn't complex, but it is representative +of the project I need a binary format for. -But before we make it to that point, we have to read in the market data. To do so, I'm using a library +But before we make it to that point, we have to actually read in the market data. To do so, I'm using a library called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and brought some big changes, -so this was an opportunity to build a non-trivial program and see how it fared. +so this was an opportunity to build a non-trivial program and get familiar again. -If you're not familiar with `nom`, the idea is to build a binary data parser by combining different -mini-parsers. For example, if your data looks like -[this](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3): +If you don't already know about `nom`, it's a kind of "parser generator". 
By combining different +mini-parsers, you can parse more complex structures without writing all tedious code by hand. +For example, when parsing [PCAP files](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3): ``` 0 1 2 3 @@ -74,7 +75,7 @@ mini-parsers. For example, if your data looks like | ... | ``` -...you can build a parser in `nom` like +...you can build a parser in `nom` that looks like [this](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/parsers.rs#L59-L93): ```rust @@ -106,12 +107,11 @@ pub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> { } ``` -This demonstration isn't too interesting, but when more complex formats need to be parsed (like IEX market data), +This example isn't too interesting, but when more complex formats need to be parsed (like IEX market data), [`nom` really shines](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs). -Ultimately, because `nom` was used to parse the IEX-format market data before serialization, we're not too interested -in its performance. However, it's worth mentioning how much easier this project was because I didn't have to write -all the boring code by hand. +Ultimately, because the `nom` code in this shootout was used for all formats, we're not too interested in its performance. +Still, building the market data parser was actually fun because I didn't have to write all the boring code by hand. # Part 1: Cap'n Proto @@ -127,29 +127,29 @@ a new buffer for every single message. I was able to work around this and re-use but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156) to find an example and using `transmute` to bypass Rust's borrow checker. -Reading messages is better, but still had issues. 
Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing) -version, and an "unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it; -Cap'n Proto allocates a new buffer to unpack the message every time, and I wasn't able to figure out a way around that. +The process of reading messages was better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing) +representation, and an "unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it; +Cap'n Proto allocates a new buffer for each message we unpack, and I wasn't able to figure out a way around that. In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/). However, accomplishing zero-copy deserialization required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)), -and we still allocate a vector on every read for the segment table (not fixed at time of writing). +and we still allocate a vector on every read for the segment table. In the end, I put in significant work to make Cap'n Proto as fast as possible in the tests, but there were too many issues for me to feel comfortable using it long-term. # Part 2: Flatbuffers -This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't work out, +This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out, official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers is intended to address -the same problems as Cap'n Proto; have a binary schema to describe the format that can be used from many languages. 
The difference -is that Flatbuffers claims to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html). +the same problems as Cap'n Proto: high-performance, polyglot, binary messaging. The difference is that Flatbuffers claims +to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html). On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice enough, and unlike Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. There were some issues though. First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats like the following: -```flatbuffers +``` table Message { symbol: string; } @@ -164,7 +164,7 @@ in a `SmallVec` before building the final `MultiMessage`, but it was a painful p Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898). Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just puts a `u32` at the front of each -message to indicate the size. Not specifically a problem, but I would've rather seen message size integrated into the underlying format. +message to indicate the size. Not specifically a problem, but calculating message size without that size tag at the front is nigh on impossible. Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform well. @@ -178,12 +178,13 @@ high-performance systems, so it was encouraging to read about a format that the simplest binary format, but it does make some tradeoffs. Both Cap'n Proto and Flatbuffers use [pointers in their messages](https://capnproto.org/encoding.html#structs) to handle -variable-length data, [unions](https://capnproto.org/language.html#unions), and a couple other features. 
In contrast, -messages in SBE are essentially [primitive structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml); +variable-length data, [unions](https://capnproto.org/language.html#unions), and various other features. In contrast, +messages in SBE are essentially [just structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml); variable-length data is supported, but there's no union type. -As mentioned in the beginning, the Rust port of SBE is certainly usable, but is essentially unmaintained. However, if you -don't need union types, and can accept that schemas are XML documents, it's still worth using. +As mentioned in the beginning, the Rust port of SBE works well, but is essentially unmaintained. However, if you +don't need union types, and can accept that schemas are XML documents, it's still worth using. The Rust SBE implementation +had the best streaming support of any format I used, and doesn't trigger allocation during de/serialization. # Results @@ -230,6 +231,7 @@ so any performance differences are due solely to the format implementation. Building a benchmark turned out to be incredibly helpful in making a decision; because a "union" type isn't important to me, I'll be using SBE for my personal projects. -And while SBE was the fastest in terms of both median and worst-case performance, its worst case -performance was proportionately far higher than any other format. Further research is necessary -to figure out why this is the case. But that's for another time. +While SBE was the fastest in terms of both median and worst-case performance, its worst case +performance was proportionately far higher than any other format. It seems to be that deserialization +time scales with message size, but I'll need to do some more research to understand what exactly +is going on. 
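To illustrate why the "just structs" layout makes decoding simple, here is a hypothetical fixed-offset message in the spirit of SBE's encoding — not the real generated codec. Every fixed-size field lives at a known byte offset, so reading a field is plain slice indexing with no pointers or offsets to chase.

```rust
use std::convert::TryInto;

// Hypothetical fixed-layout "trade" message, SBE-style (illustrative only):
//   bytes 0..8  -> price (u64, little-endian)
//   bytes 8..12 -> size  (u32, little-endian)

fn encode(price: u64, size: u32, buf: &mut [u8; 12]) {
    buf[0..8].copy_from_slice(&price.to_le_bytes());
    buf[8..12].copy_from_slice(&size.to_le_bytes());
}

fn decode(buf: &[u8; 12]) -> (u64, u32) {
    // Fields are read directly from their fixed offsets: no decoding step,
    // no intermediate allocation.
    let price = u64::from_le_bytes(buf[0..8].try_into().unwrap());
    let size = u32::from_le_bytes(buf[8..12].try_into().unwrap());
    (price, size)
}

fn main() {
    let mut buf = [0u8; 12];
    encode(123_456, 100, &mut buf);
    assert_eq!(decode(&buf), (123_456, 100));
}
```

The cost of this simplicity is exactly the tradeoff described above: with no indirection in the format, features like unions have nowhere to live.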
From cf2cee23b1d90c307d03686014be17607893c5fa Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sat, 28 Sep 2019 13:25:52 -0400 Subject: [PATCH 10/12] Second editing pass --- _posts/2019-09-01-binary-format-shootout.md | 101 ++++++++++---------- 1 file changed, 50 insertions(+), 51 deletions(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index 0662fa5..96f3d85 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -7,50 +7,49 @@ tags: [rust] --- I've found that in many personal projects, [analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis) -is particularly deadly. Making good decisions at the start avoids pain and suffering down the line; -if doing extra research avoids problems in the future, I'm happy to continue researching indefinitely. +is particularly deadly. Making good decisions in the beginning avoids pain and suffering later; +if extra research prevents future problems, I'm happy to continue researching indefinitely. -So let's say you're in need of a binary serialization schema for a project you're working on. Data will be going -over the network, not just in memory, so having a schema document and code generation is a must. Performance is important; -there's no reason to use Protocol Buffers when other projects support similar features at faster speed. -And it must be polyglot; Rust support is a minimum, but we can't predict what other languages this will -interact with. +So let's say you're in need of a binary serialization schema. Data will be going over the network, not just in memory, +so having a schema document and code generation is a must. Performance is crucial; there's no reason to use Protocol Buffers +when other formats support similar features at faster speeds. And the more languages supported, the better; I use Rust, +but can't predict what other languages this will interact with. 
Given these requirements, the candidates I could find were: -1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and integrates well with all the build tools +1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and is the most established 2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler encoding 3. [Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) has the simplest encoding, - but the Rust implementation is essentially unmaintained + but the Rust implementation is [essentially unmaintained](https://users.rust-lang.org/t/zero-cost-abstraction-frontier-no-copy-low-allocation-ordered-decoding/11515/9) Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably fast, -and support multiple languages. But how do you actually pick one? It's impossible to know what issues that -choice will lead to, so you avoid commitment until the last possible moment. +and polyglot support. But how do you actually pick one? It's impossible to know what issues will follow that choice, +so I tend to avoid commitment until the last possible moment. Still, a choice must be made. Instead of worrying about which is "the best," I decided to build a small proof-of-concept system in each format and pit them against each other. All code can be found in the -[repository](https://github.com/bspeice/speice.io-md_shootout) for this project. +[repository](https://github.com/bspeice/speice.io-md_shootout) for this post. 
We'll discuss more in detail, but a quick preview of the results:

- Cap'n Proto can theoretically perform incredibly well, but the implementation had performance issues
-- Flatbuffers had poor serialization performance, but more than made up for it during deserialiation
-- SBE has the best median and worst-case performance, but the message structure doesn't support some
-  features that both Cap'n Proto and Flatbuffers do
+- Flatbuffers had some quirks, but largely lived up to its "zero-copy" promises
+- SBE has the best median and worst-case performance, but the message structure has a limited feature set
+  relative to Cap'n Proto and Flatbuffers

# Prologue: Reading the Data

-Our benchmark system will be a simple market data processor; given messages from
+Our benchmark system will be a simple data processor; given depth-of-book market data from
[IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema format,
-then read back the message to do some basic aggregation. This test isn't complex, but it is representative
+then read back the message for some basic aggregation. This test isn't complex, but is representative
of the project I need a binary format for.

But before we make it to that point, we have to actually read in the market data. To do so, I'm using a library
called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and brought some big changes,
so this was an opportunity to build a non-trivial program and get familiar again.

-If you don't already know about `nom`, it's a kind of "parser generator". By combining different
-mini-parsers, you can parse more complex structures without writing all tedious code by hand.
+If you don't already know about `nom`, it's a "parser combinator" library. By combining different smaller parsers,
+you can build a parser to handle more complex structures without writing all the tedious code by hand.
For example, when parsing [PCAP files](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3): ``` @@ -107,40 +106,41 @@ pub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> { } ``` -This example isn't too interesting, but when more complex formats need to be parsed (like IEX market data), +While this example isn't too interesting, more complex formats (like IEX market data) are where [`nom` really shines](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs). -Ultimately, because the `nom` code in this shootout was used for all formats, we're not too interested in its performance. -Still, building the market data parser was actually fun because I didn't have to write all the boring code by hand. +Ultimately, because the `nom` code in this shootout was the same for all formats, we're not too interested in its performance. +Still, it's worth mentioning that building the market data parser was actually fun; I didn't have to write all the boring code by hand. # Part 1: Cap'n Proto Now it's time to get into the meaty part of the story. Cap'n Proto was the first format I tried because of how long -it has supported Rust (thanks to [David Renshaw](https://github.com/dwrensha) for maintaining the Rust port since +it has supported Rust (thanks to [dwrensha](https://github.com/dwrensha) for maintaining the Rust port since [2014!](https://github.com/capnproto/capnproto-rust/releases/tag/rustc-0.10)). However, I had a ton of performance concerns -actually using of Cap'n Proto. +once I started using it. To serialize new messages, Cap'n Proto uses a "builder" object. This builder allocates memory on the heap to hold the message content, but because builders [can't be re-used](https://github.com/capnproto/capnproto-rust/issues/111), we have to allocate -a new buffer for every single message. 
I was able to work around this and re-use memory with a -[special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51), -but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156) -to find an example and using `transmute` to bypass Rust's borrow checker. +a new buffer for every single message. I was able to work around this with a +[special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51) +that could re-use the buffer, but it required reading through Cap'n Proto's +[benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156) +to find an example, and used [`std::mem::transmute`](https://doc.rust-lang.org/std/mem/fn.transmute.html) to bypass Rust's borrow checker. The process of reading messages was better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing) representation, and an "unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it; Cap'n Proto allocates a new buffer for each message we unpack, and I wasn't able to figure out a way around that. In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/). -However, accomplishing zero-copy deserialization required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)), +However, accomplishing zero-copy deserialization required code in the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)), and we still allocate a vector on every read for the segment table. 
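The buffer re-use issue described above can be illustrated generically — this is plain Rust, not the Cap'n Proto builder API. Clearing and re-filling one buffer keeps its capacity, so the allocation cost is paid once rather than once per message.

```rust
// Hypothetical illustration of buffer re-use across messages (not capnp's API):
// `clear()` drops the old contents but keeps the allocated capacity.

fn serialize_into(payload: &[u8], buf: &mut Vec<u8>) {
    buf.clear();
    buf.extend_from_slice(payload);
}

fn main() {
    let messages: Vec<Vec<u8>> = (0u8..10).map(|i| vec![i; 64]).collect();

    // One buffer re-used for every message, instead of a fresh Vec each time.
    let mut buf = Vec::with_capacity(64);
    for msg in &messages {
        serialize_into(msg, &mut buf);
        assert_eq!(buf.len(), 64);
    }
}
```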
-In the end, I put in significant work to make Cap'n Proto as fast as possible in the tests, but there were too many issues -for me to feel comfortable using it long-term. +In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too many issues for me to feel comfortable +using it long-term. # Part 2: Flatbuffers This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out, -official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers is intended to address +official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers intends to address the same problems as Cap'n Proto: high-performance, polyglot, binary messaging. The difference is that Flatbuffers claims to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html). @@ -158,13 +158,13 @@ table MultiMessage { } ``` -We want to create a `MultiMessage` that contains a vector of `Message`, but each `Message` has a vector (the `string` type). +We want to create a `MultiMessage` which contains a vector of `Message`, and each `Message` itself contains a vector (the `string` type). I was able to work around this by [caching `Message` elements](https://github.com/bspeice/speice.io-md_shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83) in a `SmallVec` before building the final `MultiMessage`, but it was a painful process. Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898). -Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just puts a `u32` at the front of each -message to indicate the size. Not specifically a problem, but calculating message size without that size tag at the front is nigh on impossible. 
+Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just sticks a `u32` at the front of each +message to indicate the size. Not specifically a problem, but calculating message size without that tag is nigh on impossible. Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform well. @@ -177,32 +177,31 @@ high-performance systems, so it was encouraging to read about a format that [directly addressed](https://github.com/real-logic/simple-binary-encoding/wiki/Why-Low-Latency) my concerns. SBE has by far the simplest binary format, but it does make some tradeoffs. -Both Cap'n Proto and Flatbuffers use [pointers in their messages](https://capnproto.org/encoding.html#structs) to handle +Both Cap'n Proto and Flatbuffers use [message offsets](https://capnproto.org/encoding.html#structs) to handle variable-length data, [unions](https://capnproto.org/language.html#unions), and various other features. In contrast, messages in SBE are essentially [just structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml); variable-length data is supported, but there's no union type. As mentioned in the beginning, the Rust port of SBE works well, but is essentially unmaintained. However, if you -don't need union types, and can accept that schemas are XML documents, it's still worth using. The Rust SBE implementation -had the best streaming support of any format I used, and doesn't trigger allocation during de/serialization. +don't need union types, and can accept that schemas are XML documents, it's still worth using. The implementation +had the best streaming support of all formats being tested, and doesn't trigger allocation during de/serialization. 
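The `u32`-prefix framing mentioned for Flatbuffers above is easy to sketch in isolation. The helper names here are hypothetical; this is the general pattern, not any library's API.

```rust
use std::io::{self, Read, Write};

// Length-prefixed framing: each frame is a little-endian u32 byte count
// followed by the payload. Without that prefix, a reader has no way to know
// where one message ends and the next begins.

fn write_frame<W: Write>(w: &mut W, msg: &[u8]) -> io::Result<()> {
    w.write_all(&(msg.len() as u32).to_le_bytes())?;
    w.write_all(msg)
}

fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_bytes = [0u8; 4];
    r.read_exact(&mut len_bytes)?;
    let mut msg = vec![0u8; u32::from_le_bytes(len_bytes) as usize];
    r.read_exact(&mut msg)?;
    Ok(msg)
}

fn main() -> io::Result<()> {
    let mut stream = Vec::new(); // stand-in for a socket or file
    write_frame(&mut stream, b"first")?;
    write_frame(&mut stream, b"second")?;

    let mut cursor = &stream[..];
    assert_eq!(read_frame(&mut cursor)?, b"first".to_vec());
    assert_eq!(read_frame(&mut cursor)?, b"second".to_vec());
    Ok(())
}
```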
# Results After building a test harness [for](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/capnp_runner.rs) [each](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/flatbuffers_runner.rs) -[protocol](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/sbe_runner.rs), +[format](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/sbe_runner.rs), it was time to actually take them for a spin. I used -[this script](https://github.com/bspeice/speice.io-md_shootout/blob/master/run_shootout.sh) to manage the test process, +[this script](https://github.com/bspeice/speice.io-md_shootout/blob/master/run_shootout.sh) to manage the benchmarking, and the raw results are [here](https://github.com/bspeice/speice.io-md_shootout/blob/master/shootout.csv). All data -reported below is the average of 10 runs over a single day of IEX data. Data checks were implemented to make sure -that each format achieved the same results. +reported below is the average of 10 runs over a single day of IEX data. Results were validated to make sure +that each format parsed the data correctly. ## Serialization -Serialization measures on a -[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L268-L272) -how long it takes to convert the pre-parsed IEX message into the desired format -and write to a pre-allocated buffer. +This test measures, on a +[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L268-L272), +how long it takes to serialize the IEX message into the desired format and write to a pre-allocated buffer. | Schema | Median | 99th Pctl | 99.9th Pctl | Total | |:---------------------|:-------|:----------|:------------|:-------| @@ -213,9 +212,9 @@ and write to a pre-allocated buffer. 
## Deserialization

-Deserialization measures on a
-[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L294-L298)
-how long it takes to read the message encoded during deserialization and
+This test measures, on a
+[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L294-L298),
+how long it takes to read the previously-serialized message and
perform some basic aggregation. The aggregation code is the same for each format,
so any performance differences are due solely to the format implementation.

@@ -229,9 +228,9 @@ so any performance differences are due solely to the format implementation.

# Conclusion

Building a benchmark turned out to be incredibly helpful in making a decision; because a
-"union" type isn't important to me, I'll be using SBE for my personal projects.
+"union" type isn't important to me, I can be confident that SBE best addresses my needs.

While SBE was the fastest in terms of both median and worst-case performance, its worst case
-performance was proportionately far higher than any other format. It seems to be that deserialization
+performance was proportionately far higher than any other format. It seems that de/serialization
time scales with message size, but I'll need to do some more research to understand what
exactly is going on.
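To make the median/99th-percentile vocabulary in the result tables concrete, here's a small Rust sketch of nearest-rank percentiles over a set of per-message timings. The nearest-rank method and the `percentile` helper are my own assumptions for illustration, not necessarily the calculation the benchmark script performs:

```rust
// Nearest-rank percentile: sort the per-message latencies (e.g. nanoseconds),
// then take the ceil(p/100 * n)-th smallest sample. Assumes `samples` is non-empty.
fn percentile(samples: &mut [u64], pct: f64) -> u64 {
    samples.sort_unstable();
    let rank = ((pct / 100.0) * samples.len() as f64).ceil() as usize;
    // Convert the 1-indexed rank to an index, clamped into bounds.
    samples[rank.saturating_sub(1).min(samples.len() - 1)]
}

fn main() {
    // 100 fake latencies: 1ns through 100ns.
    let mut latencies: Vec<u64> = (1..=100).collect();
    assert_eq!(percentile(&mut latencies, 50.0), 50);  // median
    assert_eq!(percentile(&mut latencies, 99.0), 99);  // 99th percentile
    assert_eq!(percentile(&mut latencies, 99.9), 100); // 99.9th rounds up to the max
    println!("ok");
}
```

Reporting tail percentiles alongside the median is what surfaces the worst-case behavior discussed in the conclusion; an average alone would hide it.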
From 0997ba4d9bd5a70c8e5e0f8bc6fde0e3d132980a Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sat, 28 Sep 2019 13:43:47 -0400 Subject: [PATCH 11/12] Final editing pass --- _posts/2019-09-01-binary-format-shootout.md | 50 ++++++++++----------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-01-binary-format-shootout.md index 96f3d85..5afc148 100644 --- a/_posts/2019-09-01-binary-format-shootout.md +++ b/_posts/2019-09-01-binary-format-shootout.md @@ -8,19 +8,19 @@ tags: [rust] I've found that in many personal projects, [analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis) is particularly deadly. Making good decisions in the beginning avoids pain and suffering later; -if extra research prevents future problems, I'm happy to continue researching indefinitely. +if extra research prevents future problems, I'm happy to continue ~~procrastinating~~ researching indefinitely. -So let's say you're in need of a binary serialization schema. Data will be going over the network, not just in memory, +So let's say you're in need of a binary serialization format. Data will be going over the network, not just in memory, so having a schema document and code generation is a must. Performance is crucial; there's no reason to use Protocol Buffers -when other formats support similar features at faster speeds. And the more languages supported, the better; I use Rust, -but can't predict what other languages this will interact with. +when other formats support similar features. And the more languages supported, the better; I use Rust, +but can't predict what other languages this could interact with. Given these requirements, the candidates I could find were: 1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and is the most established 2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler encoding 3. 
[Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) has the simplest encoding, - but the Rust implementation is [essentially unmaintained](https://users.rust-lang.org/t/zero-cost-abstraction-frontier-no-copy-low-allocation-ordered-decoding/11515/9) + but the Rust implementation is unmaintained Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably fast, and polyglot support. But how do you actually pick one? It's impossible to know what issues will follow that choice, @@ -32,24 +32,23 @@ proof-of-concept system in each format and pit them against each other. All code We'll discuss more in detail, but a quick preview of the results: -- Cap'n Proto can theoretically perform incredibly well, but the implementation had performance issues -- Flatbuffers had some quirks, but largely lived up to its "zero-copy" promises -- SBE has the best median and worst-case performance, but the message structure has a limited feature set - relative to Cap'n Proto and Flatbuffers +- Cap'n Proto: Theoretically performs incredibly well, the implementation had issues +- Flatbuffers: Has some quirks, but largely lived up to its "zero-copy" promises +- SBE: Best median and worst-case performance, but the message structure has a limited feature set -# Prologue: Reading the Data +# Prologue: Binary Parsing with Nom Our benchmark system will be a simple data processor; given depth-of-book market data from [IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema format, -then read back the message for some basic aggregation. This test isn't complex, but is representative -of the project I need a binary format for. +read it back, and calculate total size of stock traded and the lowest/highest quoted prices. This test +isn't complex, but is representative of the project I need a binary format for. But before we make it to that point, we have to actually read in the market data. 
To do so, I'm using a library called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and brought some big changes,
-so this was an opportunity to build a non-trivial program and get familiar again.
+so this was an opportunity to build a non-trivial program and get familiar with it.

If you don't already know about `nom`, it's a parser combinator library. By combining different smaller parsers,
-you can build a parser to handle more complex structures without writing all the tedious code by hand.
+you can assemble a parser to handle complex structures without writing tedious code by hand.
For example, when parsing [PCAP files](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3):

```
@@ -110,7 +109,7 @@ While this example isn't too interesting, more complex formats (like IEX market
[`nom` really shines](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs).
Ultimately, because the `nom` code in this shootout was the same for all formats, we're not too interested in its performance.
-Still, it's worth mentioning that building the market data parser was actually fun; I didn't have to write all the boring code by hand.
+Still, it's worth mentioning that building the market data parser was actually fun; I didn't have to write tons of boring code by hand.

# Part 1: Cap'n Proto

@@ -132,7 +131,7 @@ representation, and an "unpacked" version. When reading "packed" messages, we ne
Cap'n Proto allocates a new buffer for each message we unpack, and I wasn't able to figure out a way around that.
In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's
[no decoding step](https://capnproto.org/). However, accomplishing zero-copy deserialization required code in the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)),
-and we still allocate a vector on every read for the segment table.
+and we allocate a vector on every read for the segment table. In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too many issues for me to feel comfortable using it long-term. @@ -140,12 +139,12 @@ using it long-term. # Part 2: Flatbuffers This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out, -official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers intends to address +official support was [recently launched](https://github.com/google/flatbuffers/pull/4898). Flatbuffers intends to address the same problems as Cap'n Proto: high-performance, polyglot, binary messaging. The difference is that Flatbuffers claims to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html). -On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice enough, and unlike -Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. There were some issues though. +On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice, and unlike +Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. However, there were still some issues. First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats like the following: @@ -160,7 +159,7 @@ table MultiMessage { We want to create a `MultiMessage` which contains a vector of `Message`, and each `Message` itself contains a vector (the `string` type). I was able to work around this by [caching `Message` elements](https://github.com/bspeice/speice.io-md_shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83) -in a `SmallVec` before building the final `MultiMessage`, but it was a painful process. 
+in a `SmallVec` before building the final `MultiMessage`, but it was a painful process that I believe contributed to poor serialization performance. Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898). Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just sticks a `u32` at the front of each @@ -182,9 +181,10 @@ variable-length data, [unions](https://capnproto.org/language.html#unions), and messages in SBE are essentially [just structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml); variable-length data is supported, but there's no union type. -As mentioned in the beginning, the Rust port of SBE works well, but is essentially unmaintained. However, if you -don't need union types, and can accept that schemas are XML documents, it's still worth using. The implementation -had the best streaming support of all formats being tested, and doesn't trigger allocation during de/serialization. +As mentioned in the beginning, the Rust port of SBE works well, but is +[essentially unmaintained](https://users.rust-lang.org/t/zero-cost-abstraction-frontier-no-copy-low-allocation-ordered-decoding/11515/9). +However, if you don't need union types, and can accept that schemas are XML documents, it's still worth using. SBE's implementation +had the best streaming support of all formats I tested, and doesn't trigger allocation during de/serialization. # Results @@ -192,9 +192,9 @@ After building a test harness [for](https://github.com/bspeice/speice.io-md_shoo [each](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/flatbuffers_runner.rs) [format](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/sbe_runner.rs), it was time to actually take them for a spin. 
I used -[this script](https://github.com/bspeice/speice.io-md_shootout/blob/master/run_shootout.sh) to manage the benchmarking, +[this script](https://github.com/bspeice/speice.io-md_shootout/blob/master/run_shootout.sh) to run the benchmarks, and the raw results are [here](https://github.com/bspeice/speice.io-md_shootout/blob/master/shootout.csv). All data -reported below is the average of 10 runs over a single day of IEX data. Results were validated to make sure +reported below is the average of 10 runs on a single day of IEX data. Results were validated to make sure that each format parsed the data correctly. ## Serialization From 769514c25e4490aa8976d617fbbc9cc56ef7fe80 Mon Sep 17 00:00:00 2001 From: Bradlee Speice Date: Sat, 28 Sep 2019 13:44:06 -0400 Subject: [PATCH 12/12] Get ready for publication --- ...ry-format-shootout.md => 2019-09-28-binary-format-shootout.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename _posts/{2019-09-01-binary-format-shootout.md => 2019-09-28-binary-format-shootout.md} (100%) diff --git a/_posts/2019-09-01-binary-format-shootout.md b/_posts/2019-09-28-binary-format-shootout.md similarity index 100% rename from _posts/2019-09-01-binary-format-shootout.md rename to _posts/2019-09-28-binary-format-shootout.md
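As a closing aside to the streaming discussion earlier in the series: the `u32` size prefix that Flatbuffers sticks at the front of each message is straightforward to frame and de-frame by hand. This std-only sketch hand-rolls the scheme and assumes a little-endian prefix; it is an illustration of the technique, not the Flatbuffers API:

```rust
use std::convert::TryInto;

// Append a little-endian u32 length prefix followed by the payload.
fn frame(payload: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
}

// Split one complete frame off the front of `buf`, returning (message, rest),
// or None if the buffer doesn't yet hold a full frame.
fn deframe(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let len_bytes: [u8; 4] = buf.get(0..4)?.try_into().ok()?;
    let len = u32::from_le_bytes(len_bytes) as usize;
    let end = 4usize.checked_add(len)?;
    let msg = buf.get(4..end)?;
    Some((msg, &buf[end..]))
}

fn main() {
    let mut stream = Vec::new();
    frame(b"first", &mut stream);
    frame(b"second", &mut stream);
    let (m1, rest) = deframe(&stream).unwrap();
    let (m2, rest) = deframe(rest).unwrap();
    assert_eq!(m1, &b"first"[..]);
    assert_eq!(m2, &b"second"[..]);
    assert!(rest.is_empty());
    // A partial frame is reported as "not ready" rather than panicking:
    assert!(deframe(&stream[..3]).is_none());
    println!("ok");
}
```

Note that the prefix is the only delimiter: without that tag, nothing inside the message tells you where it ends, which is exactly why computing message size without it was described above as nigh on impossible.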