First editing pass

timing
Bradlee Speice 2019-09-28 12:55:46 -04:00
parent 84fa2f5fa0
commit ec490bfc99
1 changed files with 44 additions and 42 deletions

View File

@ -7,18 +7,16 @@ tags: [rust]
---
I've found that in many personal projects, [analysis paralysis](https://en.wikipedia.org/wiki/Analysis_paralysis)
is particularly deadly. There's nothing like having other options available to make you question your decisions.
There's a particular scenario that scares me: I'm a couple months into a project, only to realize that if I had
made a different choice at an earlier juncture, weeks of work could have been saved. If only an extra hour or
two of research had been conducted, everything would've turned out differently.
is particularly deadly. Making good decisions at the start avoids pain and suffering down the line;
if doing extra research avoids problems in the future, I'm happy to continue researching indefinitely.
Let's say you're in need of a binary serialization schema for a project you're working on. Data will be going
over the network, not just in memory, so having a schema document is a must. Performance is important;
So let's say you're in need of a binary serialization schema for a project you're working on. Data will be going
over the network, not just in memory, so having a schema document and code generation is a must. Performance is important;
there's no reason to use Protocol Buffers when other projects support similar features at faster speed.
And it must be polyglot; Rust support needs to be there, but we can't predict what other languages this will
And it must be polyglot; Rust support is a minimum, but we can't predict what other languages this will
interact with.
Given these requirements, the formats I could find were:
Given these requirements, the candidates I could find were:
1. [Cap'n Proto](https://capnproto.org/) has been around the longest, and integrates well with all the build tools
2. [Flatbuffers](https://google.github.io/flatbuffers/) is the newest, and claims to have a simpler encoding
@ -26,31 +24,34 @@ Given these requirements, the formats I could find were:
but the Rust implementation is essentially unmaintained
Any one of these will satisfy the project requirements: easy to transmit over a network, reasonably fast,
and support multiple languages. But actually picking one to build a system on is intimidating; it's impossible
to know what issues that choice will lead to.
and support multiple languages. But how do you actually pick one? It's impossible to know what issues that
choice will lead to, so you avoid commitment until the last possible moment.
Still, a choice must be made. It's not particularly groundbreaking, but I decided to build a test system to help
understand how they all behave. All code can be found in the [repository](https://github.com/bspeice/speice.io-md_shootout).
Still, a choice must be made. Instead of worrying about which is "the best," I decided to build a small
proof-of-concept system in each format and pit them against each other. All code can be found in the
[repository](https://github.com/bspeice/speice.io-md_shootout) for this project.
We'll discuss more in detail, but the TLDR:
We'll discuss more in detail, but a quick preview of the results:
- Cap'n Proto can theoretically perform incredibly well, but the implementation had performance issues
- Flatbuffers had poor serialization performance, but more than made up for it during deserialiation
- SBE has the best median and worst-case performance, but the message structure doesn't support some
features that both Cap'n Proto and Flatbuffers have
features that both Cap'n Proto and Flatbuffers do
# Prologue: Reading the Data
Our benchmark will be a simple market data processor; given messages from [IEX](https://iextrading.com/trading/market-data/#deep),
serialize each message into the schema format, then read back each message to do some basic aggregation.
Our benchmark system will be a simple market data processor; given messages from
[IEX](https://iextrading.com/trading/market-data/#deep), serialize each message into the schema format,
then read back the message to do some basic aggregation. This test isn't complex, but it is representative
of the project I need a binary format for.
But before we make it to that point, we have to read in the market data. To do so, I'm using a library
But before we make it to that point, we have to actually read in the market data. To do so, I'm using a library
called [`nom`](https://github.com/Geal/nom). Version 5.0 was recently released and brought some big changes,
so this was an opportunity to build a non-trivial program and see how it fared.
so this was an opportunity to build a non-trivial program and get familiar again.
If you're not familiar with `nom`, the idea is to build a binary data parser by combining different
mini-parsers. For example, if your data looks like
[this](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3):
If you don't already know about `nom`, it's a kind of "parser generator". By combining different
mini-parsers, you can parse more complex structures without writing all tedious code by hand.
For example, when parsing [PCAP files](https://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html#rfc.section.3.3):
```
0 1 2 3
@ -74,7 +75,7 @@ mini-parsers. For example, if your data looks like
| ... |
```
...you can build a parser in `nom` like
...you can build a parser in `nom` that looks like
[this](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/parsers.rs#L59-L93):
```rust
@ -106,12 +107,11 @@ pub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> {
}
```
This demonstration isn't too interesting, but when more complex formats need to be parsed (like IEX market data),
This example isn't too interesting, but when more complex formats need to be parsed (like IEX market data),
[`nom` really shines](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs).
Ultimately, because `nom` was used to parse the IEX-format market data before serialization, we're not too interested
in its performance. However, it's worth mentioning how much easier this project was because I didn't have to write
all the boring code by hand.
Ultimately, because the `nom` code in this shootout was used for all formats, we're not too interested in its performance.
Still, building the market data parser was actually fun because I didn't have to write all the boring code by hand.
# Part 1: Cap'n Proto
@ -127,29 +127,29 @@ a new buffer for every single message. I was able to work around this and re-use
but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156)
to find an example and using `transmute` to bypass Rust's borrow checker.
Reading messages is better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing)
version, and an "unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it;
Cap'n Proto allocates a new buffer to unpack the message every time, and I wasn't able to figure out a way around that.
The process of reading messages was better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing)
representation, and an "unpacked" version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it;
Cap'n Proto allocates a new buffer for each message we unpack, and I wasn't able to figure out a way around that.
In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/).
However, accomplishing zero-copy deserialization required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)),
and we still allocate a vector on every read for the segment table (not fixed at time of writing).
and we still allocate a vector on every read for the segment table.
In the end, I put in significant work to make Cap'n Proto as fast as possible in the tests, but there were too many issues
for me to feel comfortable using it long-term.
# Part 2: Flatbuffers
This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't work out,
This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't pan out,
official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers is intended to address
the same problems as Cap'n Proto; have a binary schema to describe the format that can be used from many languages. The difference
is that Flatbuffers claims to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html).
the same problems as Cap'n Proto: high-performance, polyglot, binary messaging. The difference is that Flatbuffers claims
to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html).
On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice enough, and unlike
Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. There were some issues though.
First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats like the following:
```flatbuffers
```
table Message {
symbol: string;
}
@ -164,7 +164,7 @@ in a `SmallVec` before building the final `MultiMessage`, but it was a painful p
Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898).
Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just puts a `u32` at the front of each
message to indicate the size. Not specifically a problem, but I would've rather seen message size integrated into the underlying format.
message to indicate the size. Not specifically a problem, but calculating message size without that size tag at the front is nigh on impossible.
Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it perform well.
@ -178,12 +178,13 @@ high-performance systems, so it was encouraging to read about a format that
the simplest binary format, but it does make some tradeoffs.
Both Cap'n Proto and Flatbuffers use [pointers in their messages](https://capnproto.org/encoding.html#structs) to handle
variable-length data, [unions](https://capnproto.org/language.html#unions), and a couple other features. In contrast,
messages in SBE are essentially [primitive structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml);
variable-length data, [unions](https://capnproto.org/language.html#unions), and various other features. In contrast,
messages in SBE are essentially [just structs](https://github.com/real-logic/simple-binary-encoding/blob/master/sbe-samples/src/main/resources/example-schema.xml);
variable-length data is supported, but there's no union type.
As mentioned in the beginning, the Rust port of SBE is certainly usable, but is essentially unmaintained. However, if you
don't need union types, and can accept that schemas are XML documents, it's still worth using.
As mentioned in the beginning, the Rust port of SBE works well, but is essentially unmaintained. However, if you
don't need union types, and can accept that schemas are XML documents, it's still worth using. The Rust SBE implementation
had the best streaming support of any format I used, and doesn't trigger allocation during de/serialization.
# Results
@ -230,6 +231,7 @@ so any performance differences are due solely to the format implementation.
Building a benchmark turned out to be incredibly helpful in making a decision; because a
"union" type isn't important to me, I'll be using SBE for my personal projects.
And while SBE was the fastest in terms of both median and worst-case performance, its worst case
performance was proportionately far higher than any other format. Further research is necessary
to figure out why this is the case. But that's for another time.
While SBE was the fastest in terms of both median and worst-case performance, its worst case
performance was proportionately far higher than any other format. It seems to be that deserialization
time scales with message size, but I'll need to do some more research to understand what exactly
is going on.