Writeup for Flatbuffers

timing
Bradlee Speice 2019-09-27 23:20:46 -04:00
parent 388bd413d5
commit 50df5f19c2
1 changed files with 40 additions and 10 deletions

View File

@ -121,18 +121,48 @@ content, but because builders [can't be re-used](https://github.com/capnproto/ca
a new buffer for every single message. I was able to work around this and re-use memory with a a new buffer for every single message. I was able to work around this and re-use memory with a
[special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51), [special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51),
but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156) but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156)
to find an example usage and using `transmute` to bypass Rust's borrow checker. to find an example and using `transmute` to bypass Rust's borrow checker.
Reading messages was similarly problematic. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing) Reading messages is better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing)
version, and an unpacked version. When reading "packed" messages, we need to unpack the message before we can make use of it. version, and an unpacked version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it;
This allocates a new buffer for each message, and I wasn't able to find a way to get around this. Unpacked messages, however, Cap'n Proto allocates a new buffer to unpack the message every time, and I wasn't able to figure out a way around that.
shouldn't require any allocation or decoding steps. In practice, because of a In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/).
[bounds check](https://github.com/capnproto/capnproto-rust/blob/master/capnp/src/serialize.rs#L60) on the payload size, However, accomplishing this required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)),
I had to [copy parts](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L255-L340) and we still allocate a vector on every read for the segment table.
of the Cap'n Proto API to read messages without allocation.
In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too many issues for me to feel In the end, I put in significant work to make Cap'n Proto as fast as possible in the tests, but there were too many issues
comfortable making use of Cap'n Proto. for me to feel comfortable using it long-term.
# Part 2: Flatbuffers
This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't work out,
official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers is intended to address
the same problems as Cap'n Proto; have a binary schema to describe the format that can be used from many languages. The difference
is that Flatbuffers claims to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html).
On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice enough, and unlike
Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. There were some issues though.
First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats like the following:
```flatbuffers
table Message {
symbol: string;
}
table MultiMessage {
messages:[Message];
}
```
We want to create a `MultiMessage` that contains a vector of `Message`, but each `Message` has a vector (the `string` type).
I was able to work around this by [caching `Message` elements](https://github.com/bspeice/speice.io-md_shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83)
in a `SmallVec` before building the final `MultiMessage`, but it was a painful process.
Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898).
Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just puts a `u32` at the front of each
message to indicate the size. Not specifically a problem, but I would've rather seen message size integrated into the underlying format.
Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it fast.
# Final Results # Final Results