Writeup for Flatbuffers

This commit is contained in:
Bradlee Speice 2019-09-27 23:20:46 -04:00
parent 388bd413d5
commit 50df5f19c2

View File

@ -121,18 +121,48 @@ content, but because builders [can't be re-used](https://github.com/capnproto/ca
a new buffer for every single message. I was able to work around this and re-use memory with a
[special builder](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L17-L51),
but it required reading through Cap'n Proto's [benchmarks](https://github.com/capnproto/capnproto-rust/blob/master/benchmark/benchmark.rs#L124-L156)
to find an example usage and using `transmute` to bypass Rust's borrow checker.
to find an example and using `transmute` to bypass Rust's borrow checker.
Reading messages was similarly problematic. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing)
version, and an unpacked version. When reading "packed" messages, we need to unpack the message before we can make use of it.
This allocates a new buffer for each message, and I wasn't able to find a way to get around this. Unpacked messages, however,
shouldn't require any allocation or decoding steps. In practice, because of a
[bounds check](https://github.com/capnproto/capnproto-rust/blob/master/capnp/src/serialize.rs#L60) on the payload size,
I had to [copy parts](https://github.com/bspeice/speice.io-md_shootout/blob/369613843d39cfdc728e1003123bf87f79422497/src/capnp_runner.rs#L255-L340)
of the Cap'n Proto API to read messages without allocation.
Reading messages is better, but still had issues. Cap'n Proto has two message encodings: a ["packed"](https://capnproto.org/encoding.html#packing)
version, and an unpacked version. When reading "packed" messages, we need a buffer to unpack the message into before we can use it;
Cap'n Proto allocates a new buffer to unpack the message every time, and I wasn't able to figure out a way around that.
In contrast, the unpacked message format should be where Cap'n Proto shines; its main selling point is that there's [no decoding step](https://capnproto.org/).
However, accomplishing this required copying code from the private API ([since fixed](https://github.com/capnproto/capnproto-rust/issues/148)),
and we still allocate a vector on every read for the segment table.
In the end, I put in significant work to make Cap'n Proto as fast as possible, but there were too many issues for me to feel
comfortable making use of Cap'n Proto.
In the end, I put in significant work to make Cap'n Proto as fast as possible in the tests, but there were too many issues
for me to feel comfortable using it long-term.
# Part 2: Flatbuffers
This is the new kid on the block. After a [first attempt](https://github.com/google/flatbuffers/pull/3894) didn't work out,
official support was [recently added](https://github.com/google/flatbuffers/pull/4898). Flatbuffers is intended to address
the same problems as Cap'n Proto; have a binary schema to describe the format that can be used from many languages. The difference
is that Flatbuffers claims to have a simpler wire format and [more flexibility](https://google.github.io/flatbuffers/flatbuffers_benchmarks.html).
On the whole, I enjoyed using Flatbuffers; the [tooling](https://crates.io/crates/flatc-rust) is nice enough, and unlike
Cap'n Proto, parsing messages was actually zero-copy and zero-allocation. There were some issues though.
First, Flatbuffers (at least in Rust) can't handle nested vectors. This is a problem for formats like the following:
```flatbuffers
table Message {
symbol: string;
}
table MultiMessage {
messages:[Message];
}
```
We want to create a `MultiMessage` that contains a vector of `Message`, but each `Message` has a vector (the `string` type).
I was able to work around this by [caching `Message` elements](https://github.com/bspeice/speice.io-md_shootout/blob/e9d07d148bf36a211a6f86802b313c4918377d1b/src/flatbuffers_runner.rs#L83)
in a `SmallVec` before building the final `MultiMessage`, but it was a painful process.
Second, streaming support in Flatbuffers seems to be something of an [afterthought](https://github.com/google/flatbuffers/issues/3898).
Where Cap'n Proto in Rust handles reading messages from a stream as part of the API, Flatbuffers just puts a `u32` at the front of each
message to indicate the size. Not specifically a problem, but I would've rather seen message size integrated into the underlying format.
Ultimately, I enjoyed using Flatbuffers, and had to do significantly less work to make it fast.
# Final Results