Migrate links for marketdata shootout

Bradlee Speice 2020-03-21 17:01:11 -04:00
parent 75dce1863a
commit eccbc11cfe


@@ -28,7 +28,7 @@ so I tend to avoid commitment until the last possible moment.
 Still, a choice must be made. Instead of worrying about which is "the best," I decided to build a small
 proof-of-concept system in each format and pit them against each other. All code can be found in the
-[repository](https://github.com/bspeice/speice.io-md_shootout) for this post.
+[repository](https://github.com/speice-io/marketdata-shootout) for this post.
 
 We'll discuss more in detail, but a quick preview of the results:
@@ -74,7 +74,7 @@ For example, when parsing [PCAP files](https://www.winpcap.org/ntar/draft/PCAP-D
 ```
 
 ...you can build a parser in `nom` that looks like
-[this](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/parsers.rs#L59-L93):
+[this](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/parsers.rs#L59-L93):
 
 ```rust
 const ENHANCED_PACKET: [u8; 4] = [0x06, 0x00, 0x00, 0x00];
@@ -106,7 +106,7 @@ pub fn enhanced_packet_block(input: &[u8]) -> IResult<&[u8], &[u8]> {
 ```
 
 While this example isn't too interesting, more complex formats (like IEX market data) are where
-[`nom` really shines](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs).
+[`nom` really shines](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/iex.rs).
 
 Ultimately, because the `nom` code in this shootout was the same for all formats, we're not too interested in its performance.
 Still, it's worth mentioning that building the market data parser was actually fun; I didn't have to write tons of boring code by hand.
@@ -121,7 +121,7 @@ once I started using it.
 To serialize new messages, Cap'n Proto uses a "builder" object. This builder allocates memory on the heap to hold the message
 content, but because builders [can't be re-used](https://github.com/capnproto/capnproto-rust/issues/111), we have to allocate
 a new buffer for every single message. I was able to work around this with a
-[special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51)
+[special builder](https://github.com/speice-io/marketdata-shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51)
 that could re-use the buffer, but it required reading through Cap'n Proto's
 [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156)
 to find an example, and used [`std::mem::transmute`](https://doc.rust-lang.org/std/mem/fn.transmute.html) to bypass Rust's borrow checker.
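
To make the allocation issue concrete, here is a minimal sketch of the naive per-message path the hunk above describes. The `market_data_capnp::quote` module and its `symbol`/`price` fields are hypothetical stand-ins for whatever `capnpc` would generate from the real schema, not code from the post's repository:

```rust
use capnp::message::Builder;
use capnp::serialize;

// Naive path: construct a fresh Builder for every message, which means a
// fresh heap buffer per message. `market_data_capnp::quote` and its fields
// are hypothetical generated types used only for illustration.
fn serialize_quote(symbol: &str, price: u64, out: &mut Vec<u8>) -> capnp::Result<()> {
    let mut message = Builder::new_default(); // new allocation on every call
    {
        let mut quote = message.init_root::<market_data_capnp::quote::Builder>();
        quote.set_symbol(symbol);
        quote.set_price(price);
    }
    // Write the framed message into the caller's (re-usable) output buffer
    serialize::write_message(out, &message)?;
    Ok(())
}
```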
@@ -158,7 +158,7 @@ table MultiMessage {
 ```
 
 We want to create a `MultiMessage` which contains a vector of `Message`, and each `Message` itself contains a vector (the `string` type).
-I was able to work around this by [caching `Message` elements](https://github.com/bspeice/speice.io-md_shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83)
+I was able to work around this by [caching `Message` elements](https://github.com/speice-io/marketdata-shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83)
 in a `SmallVec` before building the final `MultiMessage`, but it was a painful process that I believe contributed to poor serialization performance.
 
 Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898).
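
For reference, the `SmallVec` workaround described in the hunk above looks roughly like the sketch below. `Message`, `MessageArgs`, `MultiMessage`, and `MultiMessageArgs` stand in for the code `flatc` would generate from the schema, and the `body` field name is an assumption:

```rust
use flatbuffers::{FlatBufferBuilder, WIPOffset};
use smallvec::SmallVec;

// Sketch of the workaround: because FlatBuffers can't have two vectors under
// construction at once, finish each inner `Message` first, cache its offset,
// then build the outer vector in a single pass. The generated types and the
// `body` field are hypothetical.
fn build_multi_message<'a>(
    builder: &mut FlatBufferBuilder<'a>,
    payloads: &[&str],
) -> WIPOffset<MultiMessage<'a>> {
    let mut cached: SmallVec<[WIPOffset<Message<'a>>; 32]> = SmallVec::new();
    for payload in payloads {
        let body = builder.create_string(payload);
        cached.push(Message::create(builder, &MessageArgs { body: Some(body) }));
    }
    let messages = builder.create_vector(&cached);
    MultiMessage::create(builder, &MultiMessageArgs { messages: Some(messages) })
}
```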
@@ -188,19 +188,19 @@ had the best streaming support of all formats I tested, and doesn't trigger allo
 # Results
 
-After building a test harness [for](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/capnp_runner.rs)
-[each](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/flatbuffers_runner.rs)
-[format](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/sbe_runner.rs),
+After building a test harness [for](https://github.com/speice-io/marketdata-shootout/blob/master/src/capnp_runner.rs)
+[each](https://github.com/speice-io/marketdata-shootout/blob/master/src/flatbuffers_runner.rs)
+[format](https://github.com/speice-io/marketdata-shootout/blob/master/src/sbe_runner.rs),
 it was time to actually take them for a spin. I used
-[this script](https://github.com/bspeice/speice.io-md_shootout/blob/master/run_shootout.sh) to run the benchmarks,
-and the raw results are [here](https://github.com/bspeice/speice.io-md_shootout/blob/master/shootout.csv). All data
+[this script](https://github.com/speice-io/marketdata-shootout/blob/master/run_shootout.sh) to run the benchmarks,
+and the raw results are [here](https://github.com/speice-io/marketdata-shootout/blob/master/shootout.csv). All data
 reported below is the average of 10 runs on a single day of IEX data. Results were validated to make sure
 that each format parsed the data correctly.
 
 ## Serialization
 
 This test measures, on a
-[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L268-L272),
+[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L268-L272),
 how long it takes to serialize the IEX message into the desired format and write to a pre-allocated buffer.
 
 | Schema | Median | 99th Pctl | 99.9th Pctl | Total |
@@ -213,7 +213,7 @@ how long it takes to serialize the IEX message into the desired format and write
 ## Deserialization
 
 This test measures, on a
-[per-message basis](https://github.com/bspeice/speice.io-md_shootout/blob/master/src/main.rs#L294-L298),
+[per-message basis](https://github.com/speice-io/marketdata-shootout/blob/master/src/main.rs#L294-L298),
 how long it takes to read the previously-serialized message and
 perform some basic aggregation. The aggregation code is the same for each format,
 so any performance differences are due solely to the format implementation.
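
As a rough illustration of the "per-message basis" measurement mentioned in both hunks above, a harness-style loop like the one below times each call individually so that median and tail percentiles can be reported rather than only a total. `ParsedMessage` and `serialize_into` are hypothetical placeholders for the format-specific code under test:

```rust
use std::time::Instant;

// Hypothetical per-message timing loop: record the latency of every call so
// median, 99th, and 99.9th percentile figures can be derived afterwards.
// `ParsedMessage` and `serialize_into` are placeholders, not the repo's API.
fn time_per_message(messages: &[ParsedMessage], buf: &mut Vec<u8>) -> Vec<u128> {
    let mut nanos: Vec<u128> = Vec::with_capacity(messages.len());
    for msg in messages {
        buf.clear(); // re-use one pre-allocated output buffer
        let start = Instant::now();
        serialize_into(msg, buf);
        nanos.push(start.elapsed().as_nanos());
    }
    nanos.sort_unstable();
    nanos // nanos[len / 2] is the median, nanos[len * 99 / 100] the 99th percentile
}
```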