---
layout: post
title: "Binary Format Shootout"
description: "Cap'n Proto vs. Flatbuffers vs. SBE"
category: 
tags: [rust]
---

I've found that in many personal projects, [analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis)
is particularly deadly. Making good decisions in the beginning avoids pain and suffering later;
if extra research prevents future problems, I'm happy to continue researching indefinitely.

So let's say you need a binary serialization format. Data will be going over the network, not just staying in memory,
so a schema document and code generation are a must. Performance is crucial; there's no reason to use Protocol Buffers
when other formats support similar features at faster speeds. And the more languages supported, the better; I use Rust,
but can't predict what other languages this will interact with.

Given these requirements, the candidates I could find were:

1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and is the most established
2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler encoding
3. [Simple Binary Encoding](https://github.com/real-logic/simple-binary-encoding) has the simplest encoding,
   but the Rust implementation is [essentially unmaintained](https://users.rust-lang.org/t/zero-cost-abstraction-frontier-no-copy-low-allocation-ordered-decoding/11515/9)

Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably fast,
and polyglot support. But how do you actually pick one? It's impossible to know what issues will follow that choice,
so I tend to avoid commitment until the last possible moment.

Still, a choice must be made. Instead of worrying about which is "the best," I decided to build a small 
proof-of-concept system in each format and pit them against each other. All code can be found in the
[repository](https://github.com/bspeice/speice.io-md_shootout) for this post.

We'll go into more detail below, but here's a quick preview of the results:

- Cap'n Proto can theoretically perform incredibly well, but the implementation had performance issues
- Flatbuffers had some quirks, but largely lived up to its "zero-copy" promises
- SBE has the best median and worst-case performance, but the message structure has a limited feature set
  relative to Cap'n Proto and Flatbuffers

# Prologue: Reading the Data

Our benchmark system will be a simple data processor; given depth-of-book market data from
[IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema format,
then read back the message for some basic aggregation. This test isn't complex, but is representative
of the project I need a binary format for.

But before we make it to that point, we have to actually read in the market data. To do so, I'm using a library
called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and brought some big changes,
so this was an opportunity to build a non-trivial program and get familiar again.

If you don't already know about `nom`, it's a "parser combinator" library. By combining smaller parsers,
you can build a parser to handle more complex structures without writing all the tedious code by hand.
For example, when parsing [PCAP-NG files](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3):

```
   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +---------------------------------------------------------------+
 0 |                    Block Type = 0x00000006                    |
   +---------------------------------------------------------------+
 4 |                      Block Total Length                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 8 |                         Interface ID                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
12 |                        Timestamp (High)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
16 |                        Timestamp (Low)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
20 |                         Captured Len                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
24 |                          Packet Len                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Packet Data                          |
   |                              ...                              |
```

...you can build a parser in `nom` that looks like
[this](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/parsers.rs#L59-L93):

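```rust
// Imports assume nom 5.x and its `complete` (non-streaming) parsers.
use nom::bytes::complete::{tag, take};
use nom::number::complete::le_u32;
use nom::sequence::tuple;
use nom::IResult;

// Block type 0x00000006, stored little-endian.
const ENHANCED_PACKET: [u8; 4] = [0x06, 0x00, 0x00, 0x00];

// Parses one Enhanced Packet Block and returns the captured packet bytes.
pub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> {
    // One sub-parser per header field, in the same order as the diagram above.
    let (
        remaining,
        (
            block_type,
            block_len,
            interface_id,
            timestamp_high,
            timestamp_low,
            captured_len,
            packet_len,
        ),
    ) = tuple((
        tag(ENHANCED_PACKET),
        le_u32,
        le_u32,
        le_u32,
        le_u32,
        le_u32,
        le_u32,
    ))(input)?;

    // Only `captured_len` bytes of packet data are actually present in the file.
    let (remaining, packet_data) = take(captured_len)(remaining)?;
    Ok((remaining, packet_data))
}
```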

While this example isn't too interesting, more complex formats (like IEX market data) are where
[`nom` really shines](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs).

Ultimately, because the `nom` code in this shootout was the same for all formats, we're not too interested in its performance.
Still, it's worth mentioning that building the market data parser was actually fun; I didn't have to write all the boring code by hand.
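
For a sense of what "the boring code" looks like, here's a rough sketch of the same Enhanced Packet Block parser written against the standard library only. This is not code from the shootout repository; `read_le_u32` and `enhanced_packet_block_manual` are names made up for this illustration, and errors are collapsed into `None` rather than `nom`'s richer error type.

```rust
/// Block type for an Enhanced Packet Block, stored little-endian.
const ENHANCED_PACKET: [u8; 4] = [0x06, 0x00, 0x00, 0x00];

/// Hypothetical helper: read a little-endian `u32` at `offset`,
/// or return `None` if the input is too short.
fn read_le_u32(input: &[u8], offset: usize) -> Option<u32> {
    let bytes = input.get(offset..offset + 4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    Some(u32::from_le_bytes(raw))
}

/// Hand-rolled counterpart to the `nom` parser above.
/// Returns `(remaining, packet_data)` on success.
pub fn enhanced_packet_block_manual(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Offsets 0..4: the block type must match exactly.
    if input.get(0..4)? != &ENHANCED_PACKET[..] {
        return None;
    }

    // Offsets 4..28: six `u32` header fields; only `captured_len` is used here.
    let _block_len = read_le_u32(input, 4)?;
    let _interface_id = read_le_u32(input, 8)?;
    let _timestamp_high = read_le_u32(input, 12)?;
    let _timestamp_low = read_le_u32(input, 16)?;
    let captured_len = read_le_u32(input, 20)? as usize;
    let _packet_len = read_le_u32(input, 24)?;

    // The packet data starts right after the 28-byte header.
    let end = 28usize.checked_add(captured_len)?;
    let packet_data = input.get(28..end)?;
    Some((&input[end..], packet_data))
}
```

Every offset and bounds check has to be tracked by hand here, which is exactly the bookkeeping that `tuple`, `le_u32`, and `take` handle in the `nom` version.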

# Part 1: Cap'n Proto

Now it's time to get into the meaty part of the story. Cap'n Proto was the first format I tried because of how long
it has supported Rust (thanks to [dwrensha](https://github.com/dwrensha) for maintaining the Rust port since
[2014!](https://github.com/capnproto/capnproto-rust/releases/tag/rustc-0.10)). However, I had a ton of performance concerns
once I started using it.

To serialize new messages, Cap'n Proto uses a "builder" object. This builder allocates memory on the heap to hold the message
content, but because builders [can't be re-used](https://github.com/capnproto/capnproto-rust/issues/111), we have to allocate
a new buffer for every single message. I was able to work around this with a
[special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51)
that could re-use the buffer, but it required reading through Cap'n Proto's
[benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156)
to find an example, and used [`std::mem::transmute`](https://doc.rust-lang.org/std/mem/fn.transmute.html) to bypass Rust's borrow checker.
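
To illustrate why buffer re-use matters, here's a minimal sketch of the two allocation patterns using nothing but a `Vec<u8>`. It is not the Cap'n Proto builder API and doesn't need `transmute`; `Trade`, `serialize_fresh`, and `serialize_reuse` are hypothetical names chosen for the example.

```rust
/// Hypothetical message type, used only for illustration.
pub struct Trade {
    pub price: u64,
    pub size: u32,
}

/// Shared encoding logic: append the fields as little-endian bytes.
fn write_trade(buf: &mut Vec<u8>, trade: &Trade) {
    buf.extend_from_slice(&trade.price.to_le_bytes());
    buf.extend_from_slice(&trade.size.to_le_bytes());
}

/// Allocates a fresh buffer for every message, like a builder
/// that can't be re-used.
pub fn serialize_fresh(trade: &Trade) -> Vec<u8> {
    let mut buf = Vec::with_capacity(12);
    write_trade(&mut buf, trade);
    buf
}

/// Re-uses the caller's buffer: `clear` resets the length to zero
/// but keeps the existing heap allocation.
pub fn serialize_reuse(buf: &mut Vec<u8>, trade: &Trade) {
    buf.clear();
    write_trade(buf, trade);
}
```

With `serialize_reuse`, a loop over millions of messages touches the allocator once, when the buffer is first created, instead of once per message.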

The process of reading messages was better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing)
representation, and an "unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it;
Cap'n Proto allocates a new buffer for each message we unpack, and I wasn't able to figure out a way around that.
In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/).
However, accomplishing zero-copy deserialization required code in the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)),
and we still allocate a vector on every read for the segment table.
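
To see why packed messages need a scratch buffer at all, here's a simplified sketch of the core idea behind packing: each 8-byte word becomes a tag byte (bit `i` set when byte `i` is non-zero) followed by only the non-zero bytes. This deliberately omits the special run-length handling that the real encoding applies to the `0x00` and `0xff` tags, so it is not wire-compatible with Cap'n Proto; `pack_word` and `unpack_word` are hypothetical names.

```rust
/// Pack one 8-byte word: a tag byte whose bit `i` is set when byte `i`
/// is non-zero, followed by only the non-zero bytes.
/// Simplified: no special cases for tags 0x00 and 0xff.
pub fn pack_word(word: [u8; 8], out: &mut Vec<u8>) {
    let mut tag = 0u8;
    for (i, byte) in word.iter().enumerate() {
        if *byte != 0 {
            tag |= 1 << i;
        }
    }
    out.push(tag);
    out.extend(word.iter().copied().filter(|b| *b != 0));
}

/// Unpack one word from the front of `input`, returning the word and
/// the unread remainder. Returns `None` if `input` ends early.
pub fn unpack_word(input: &[u8]) -> Option<([u8; 8], &[u8])> {
    let (&tag, mut rest) = input.split_first()?;
    // The zero bytes are not in the input, so the full word has to be
    // rebuilt somewhere else before anyone can read it.
    let mut word = [0u8; 8];
    for i in 0..8 {
        if (tag & (1 << i)) != 0 {
            let (&byte, tail) = rest.split_first()?;
            word[i] = byte;
            rest = tail;
        }
    }
    Some((word, rest))
}
```

The unpacked word never exists in the input bytes, which is why a reader can't simply point into the packed buffer the way it can with the unpacked encoding.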

In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too many issues for me to feel comfortable
using it long-term.

# Part 2: Flatbuffers

This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out,
official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers intends to address
the same problems as Cap'n Proto: high-performance, polyglot, binary messaging. The difference is that Flatbuffers claims
to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html).

On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice enough, and unlike
Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. There were some issues though.

First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats like the following:

```
table Message {
  symbol: string;
}
table MultiMessage {
  messages:[Message];
}
```

We want to create a `MultiMessage` which contains a vector of `Message`, and each `Message` itself contains a vector (the `string` type).
I was able to work around this by [caching `Message` elements](https://github.com/bspeice/speice.io-md_shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83)
in a `SmallVec` before building the final `MultiMessage`, but it was a painful process.

Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898).
Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just sticks a `u32` at the front of each
message to indicate the size. That's not a problem in itself, but calculating message size without that prefix is nigh on impossible.
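
The size-prefix approach can be sketched with the standard library alone. This is an illustration of length-prefixed framing in general, not the Flatbuffers API: `write_size_prefixed` and `split_size_prefixed` are hypothetical names, and the prefix is assumed to be a little-endian `u32`, in keeping with Flatbuffers' little-endian wire format.

```rust
/// Append one message to `out`, preceded by its length as a
/// little-endian `u32` (assumed prefix format).
pub fn write_size_prefixed(out: &mut Vec<u8>, payload: &[u8]) {
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Split a byte stream of size-prefixed messages into payload slices.
/// Returns `None` if the stream ends in the middle of a message.
pub fn split_size_prefixed(mut stream: &[u8]) -> Option<Vec<&[u8]>> {
    let mut messages = Vec::new();
    while !stream.is_empty() {
        if stream.len() < 4 {
            return None;
        }
        let (prefix, rest) = stream.split_at(4);
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(prefix);
        let len = u32::from_le_bytes(len_bytes) as usize;

        if rest.len() < len {
            return None;
        }
        // The payload is borrowed from the stream, so nothing is copied.
        let (payload, tail) = rest.split_at(len);
        messages.push(payload);
        stream = tail;
    }
    Some(messages)
}
```

Without the prefix, the reader has no way to know where one message stops and the next begins, short of interpreting the message contents.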

Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform well.

# Part 3: Simple Binary Encoding

Support for SBE was added by the author of one of my favorite
[Rust blog posts](https://web.archive.org/web/20190427124806/https://polysync.io/blog/session-types-for-hearty-codecs/).
I've [talked previously]({% post_url 2019-06-31-high-performance-systems %}) about how important variance is in
high-performance systems, so it was encouraging to read about a format that
[directly addressed](https://github.com/real-logic/simple-binary-encoding/wiki/Why-Low-Latency) my concerns. SBE has by far
the simplest binary format, but it does make some tradeoffs.

Both Cap'n Proto and Flatbuffers use [message offsets](https://capnproto.org/encoding.html#structs) to handle
variable-length data, [unions](https://capnproto.org/language.html#unions), and various other features. In contrast,
messages in SBE are essentially [just structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml);
variable-length data is supported, but there's no union type.
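
As a rough picture of what "just structs" means, here's a sketch of a fixed-layout message where every field lives at a known offset. The `Quote` type, its 13-byte layout, and the little-endian byte order are all assumptions made for this example; none of it is generated SBE code or a real SBE schema.

```rust
/// Hypothetical fixed-size message; the layout is illustrative only.
#[derive(Debug, PartialEq)]
pub struct Quote {
    pub price: i64, // bytes 0..8
    pub size: u32,  // bytes 8..12
    pub side: u8,   // byte 12
}

/// Total encoded size: 8 + 4 + 1 bytes, with no padding.
pub const QUOTE_LEN: usize = 13;

impl Quote {
    /// Write each field at its fixed offset into a caller-provided
    /// buffer. No allocation and no offset table are needed.
    pub fn encode(&self, buf: &mut [u8; QUOTE_LEN]) {
        buf[0..8].copy_from_slice(&self.price.to_le_bytes());
        buf[8..12].copy_from_slice(&self.size.to_le_bytes());
        buf[12] = self.side;
    }

    /// Read each field back from the same fixed offsets.
    pub fn decode(buf: &[u8; QUOTE_LEN]) -> Quote {
        let mut price = [0u8; 8];
        price.copy_from_slice(&buf[0..8]);
        let mut size = [0u8; 4];
        size.copy_from_slice(&buf[8..12]);
        Quote {
            price: i64::from_le_bytes(price),
            size: u32::from_le_bytes(size),
            side: buf[12],
        }
    }
}
```

Because field positions never depend on the data, there's nothing to look up at read time; the tradeoff is that a layout like this has no natural place for a field that can be one of several types.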

As mentioned in the beginning, the Rust port of SBE works well, but is essentially unmaintained. However, if you
don't need union types, and can accept that schemas are XML documents, it's still worth using. The implementation
had the best streaming support of all formats being tested, and doesn't trigger allocation during de/serialization.

# Results

After building a test harness [for](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/capnp_runner.rs)
[each](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/flatbuffers_runner.rs)
[format](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/sbe_runner.rs),
it was time to actually take them for a spin. I used
[this script](https://github.com/bspeice/speice.io-md_shootout/blob/master/run_shootout.sh) to manage the benchmarking,
and the raw results are [here](https://github.com/bspeice/speice.io-md_shootout/blob/master/shootout.csv). All data
reported below is the average of 10 runs over a single day of IEX data. Results were validated to make sure
that each format parsed the data correctly.
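
For readers who want to reproduce this style of measurement, here's a minimal sketch of per-call timing and percentile calculation. It is not the shootout's actual harness; `time_each` and `percentile` are hypothetical names, and the nearest-rank percentile method is an assumption on my part. Percentiles are given in per-mille (500 for the median, 990 for the 99th, 999 for the 99.9th) to keep the arithmetic in integers.

```rust
use std::time::Instant;

/// Time each call of `work` individually and return the elapsed
/// nanoseconds for every call.
pub fn time_each<F: FnMut(usize)>(count: usize, mut work: F) -> Vec<u64> {
    (0..count)
        .map(|i| {
            let start = Instant::now();
            work(i);
            start.elapsed().as_nanos() as u64
        })
        .collect()
}

/// Nearest-rank percentile of an ascending-sorted, non-empty slice.
/// `per_mille` is the percentile times ten, e.g. 999 for the 99.9th.
pub fn percentile(sorted: &[u64], per_mille: usize) -> u64 {
    // rank = ceil(per_mille * n / 1000), using integer arithmetic.
    let rank = (per_mille * sorted.len() + 999) / 1000;
    sorted[rank.max(1) - 1]
}
```

Typical usage is to collect the samples with `time_each`, sort them once, and then call `percentile` for each column of interest.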

## Serialization

This test measures, on a
[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L268-L272),
how long it takes to serialize the IEX message into the desired format and write to a pre-allocated buffer.

| Schema               | Median | 99th Pctl | 99.9th Pctl | Total  |
|:---------------------|:-------|:----------|:------------|:-------|
| Cap'n Proto Packed   | 413ns  | 1751ns    | 2943ns      | 14.80s |
| Cap'n Proto Unpacked | 273ns  | 1828ns    | 2836ns      | 10.65s |
| Flatbuffers          | 355ns  | 2185ns    | 3497ns      | 14.31s |
| SBE                  | 91ns   | 1535ns    | 2423ns      | 3.91s  |

## Deserialization

This test measures, on a
[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L294-L298),
how long it takes to read the previously-serialized message and
perform some basic aggregation. The aggregation code is the same for each format,
so any performance differences are due solely to the format implementation.

| Schema               | Median | 99th Pctl | 99.9th Pctl | Total  |
|:---------------------|:-------|:----------|:------------|:-------|
| Cap'n Proto Packed   | 539ns  | 1216ns    | 2599ns      | 18.92s |
| Cap'n Proto Unpacked | 366ns  | 737ns     | 1583ns      | 12.32s |
| Flatbuffers          | 173ns  | 421ns     | 1007ns      | 6.00s  |
| SBE                  | 116ns  | 286ns     | 659ns       | 4.05s  |

# Conclusion

Building a benchmark turned out to be incredibly helpful in making a decision; because a
"union" type isn't important to me, I can be confident that SBE best addresses my needs.

While SBE was the fastest in terms of both median and worst-case performance, its worst-case
serialization time was proportionately far higher than any other format's: the 99.9th percentile was
roughly 27 times the median, versus roughly 7 to 10 times for the other formats. It seems that
de/serialization time scales with message size, but I'll need to do more research to understand
exactly what is going on.