Comparison of JSON, BULK, CBOR, MsgPack and BSON

Architecture

The main architectural differences are the following:

  • structure
    • the document in JSON is a single value that can be an atom or a data structure; data structures can be heterogeneous lists or mappings from names to values (and complex structures use nested mappings and arrays)
      • BSON forces the top-level value to be a mapping
    • the document in MsgPack and CBOR is a length-carrying data structure
    • a BULK stream is zero, one or several values (and complex structures use nested lists)
  • identifiers
    • JSON uses text names, effectively creating a tree of directories with named entries
      • true for mappings in BSON and MsgPack
    • CBOR and BULK don’t mandate the use of text names
      • a stream could be made almost entirely of data, except for a few marker bytes for nesting and primitive types, when a prior agreement on a schema exists
      • names in BULK are usually 136-bit collision-free words, represented in the stream as 16-bit markers

Now, why don’t JSON users already use numbers as identifiers for compactness? I see two reasons.

First, it’d make JSON data far less readable, although you could easily use a schema to transform any number-id JSON document to a text-id equivalent. But an error in the use of the schema in the document makes it impossible to render correctly (this is related to the next issue).

Second, and probably more important: naming collisions. You cannot store number-id JSON documents and keep the ability to insert one document within another, because their ids are likely to clash, yielding an ambiguous structure. Naming collisions are also possible with text, but they are far less likely and immensely easier to debug.
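The clash can be sketched in a few lines of Python. Everything here is hypothetical: the two schemas, their ids and the `render` helper are made up for illustration, and ids are kept as Python ints for brevity (in real JSON they would be string keys).

```python
# Hypothetical number-id schemas, assigned independently by two producers.
SCHEMA_A = {1: "status", 2: "data"}
SCHEMA_B = {1: "temperature", 2: "unit"}


def render(doc, schema):
    """Replace numeric ids with names from one schema, recursing into nested values."""
    if isinstance(doc, dict):
        return {schema[key]: render(value, schema) for key, value in doc.items()}
    if isinstance(doc, list):
        return [render(item, schema) for item in doc]
    return doc


# A document from producer B embedded inside a document from producer A.
inner = {1: 21.5, 2: "C"}
outer = {1: "ok", 2: inner}

# Nothing in the data says where schema A stops and schema B starts,
# so the inner document is silently mislabeled.
rendered = render(outer, SCHEMA_A)
print(rendered)
```

The inner temperature reading comes out labeled as `status` and `data`, and no error is raised anywhere, which is what makes this kind of collision hard to debug.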

It is important to note that naming collisions are possible and are bound to happen if you try to use JSON to store arbitrary complex aggregated data. But that is neither a capability nor an intended use case of JSON. JSON itself cannot store arbitrary binary data without some encoding like base64. BSON even limits the size of documents (to 4 GB).
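As a small illustration of the binary data point, here is a sketch using only Python's standard `json` and `base64` modules. The payload and the `"blob"` key are arbitrary choices for the example.

```python
import base64
import json

payload = bytes(range(256)) * 3  # 768 bytes of arbitrary binary data

# json.dumps rejects raw bytes outright.
try:
    json.dumps({"blob": payload})
    raw_ok = True
except TypeError:
    raw_ok = False

# Encoding as base64 text works, at the cost of 4 output characters
# for every 3 input bytes.
encoded = base64.b64encode(payload).decode("ascii")
document = json.dumps({"blob": encoded})
decoded = base64.b64decode(json.loads(document)["blob"])

print(raw_ok, len(payload), len(encoded), decoded == payload)
```

The round trip is lossless, but the 768-byte payload becomes 1024 characters of text before the surrounding JSON syntax is even counted.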

Contrary to JSON, BSON, MsgPack or CBOR, a BULK stream doesn’t need a schema to be transformed into a human-readable structure, because number-ids are unique (you just need a mapping between those numbers and readable names, but that mapping is of no concern to applications using the data for anything other than presenting the wire format to humans). So you actually have both compactness and the potential readability of JSON.
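A rough Python model of that presentation-only mapping might look like the following. It rests on assumptions that are mine, not taken from the BULK specification: that a 16-bit marker splits into a namespace byte and a name byte, that the stream has bound namespace byte `0x18` to the example UUID used later in this document, and that the `readable` helper and its fallback format are a reasonable way to display unknown names.

```python
import uuid

# Assumption: a 16-bit marker is a namespace byte followed by a name byte,
# and the stream has bound namespace byte 0x18 to this UUID.
EXAMPLE_NS = uuid.UUID("4C89D12F-1D71-4132-92E9-6335797659EE")
NAMESPACES = {0x18: EXAMPLE_NS}

# Presentation-only table: (namespace UUID, name byte) -> readable mnemonic.
MNEMONICS = {
    (EXAMPLE_NS, 0x00): "example:ok",
    (EXAMPLE_NS, 0x01): "example:error",
}


def readable(marker):
    """Turn a 16-bit marker into a display name, falling back to the raw unique name."""
    ns_byte, name_byte = marker >> 8, marker & 0xFF
    full_name = (NAMESPACES[ns_byte], name_byte)  # 128 + 8 = 136 bits
    return MNEMONICS.get(full_name, f"{full_name[0]}:{name_byte:#04x}")


print(readable(0x1801))
print(readable(0x1807))
```

Even the name missing from the table still has an unambiguous identity; only its label is absent. An application that never shows the wire format to a human can ignore `MNEMONICS` entirely.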

Feature grid

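|                        | JSON   | BULK      | CBOR     | MsgPack  | BSON     |
|------------------------+--------+-----------+----------+----------+----------|
| decentralized          | no     | yes       | no       | no       | no       |
| streamable             | no     | yes       | yes*     | no       | no       |
| file size limit        | no     | no        | no       | no       | 4G bytes |
| container size limit   | no     | no        | 4G items | 4G items | 4G bytes |
| smallest padding       | 1 byte | 1 byte    | 1 byte   | 1 byte   | 2 bytes  |
| most probable padding  | 0x20   | 0x00      | 0x00?    | 0x00?    | ?        |
| future extension slots | n/a    | unlimited | 254*     | 128      | 235      |
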
  • streamable means that the data can be sent before all of it is known, and that it can be parsed before the end of the stream is reached, without inspecting partially sent structures
    • CBOR was not initially streamable, but RFC 8742 adds a new parsing rule that makes it streamable
  • CBOR actually has 1.8e19 names available for extension, but only if the size of wire data is allowed to get bigger. Some names will take 3, 5 or 9 bytes and the first extensions will get the shorter names.
  • MsgPack and CBOR may use 0x00 as padding, but it represents the zero integer
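
The byte counts in the CBOR note follow from how CBOR encodes the number in an item head: values below 24 fit in the initial byte, and larger ones take 1, 2, 4 or 8 extra bytes. A minimal sketch, with `cbor_tag_head_size` being a helper name made up here:

```python
def cbor_tag_head_size(tag):
    """Bytes needed for the head of a CBOR tag (major type 6) with this number."""
    if not 0 <= tag < 2**64:
        raise ValueError("tag number must fit in 64 bits")
    if tag < 24:
        return 1  # number stored directly in the initial byte
    if tag < 2**8:
        return 2  # initial byte + 1-byte number
    if tag < 2**16:
        return 3  # initial byte + 2-byte number
    if tag < 2**32:
        return 5  # initial byte + 4-byte number
    return 9      # initial byte + 8-byte number


for tag in (1, 100, 1000, 100_000, 2**40):
    print(tag, cbor_tag_head_size(tag))

# Size of the whole tag number space, roughly 1.8e19.
print(f"{2**64:.1e}")
```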

Wire data

Let’s compare a few simple examples. The data in the table is the size of the example in bytes.

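| format     |   1 |  2 | 3      |
|------------+-----+----+--------|
| BULK       |  58 | 26 | 16n+8  |
| CBOR       | 100 | 49 | 27n+18 |
| MsgPack    | 106 | 49 | 33n+19 |
| Naive BULK | 123 | 52 | 36n+21 |
| JSON       | 145 | 60 | 40n+24 |
| BSON       | 178 | 65 | 47n+27 |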

JSON

1
{"status":"ok","data":[{"foo":"bar","baz":"quux","value":12.5},{"foo":"bar","baz":"quuux","value":0},{"foo":"fubar","baz":"quuuux","value":-40}]}
2
{"status":"error","code":735,"message":"delegation expired"}
3
{"status":"ok","data":[{"foo":"bar","baz":"quux","value":12.5} repeated n times]}
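
The JSON sizes for examples 1 and 2 can be reproduced with Python's standard `json` module, assuming the most compact separators and UTF-8 encoding. `compact_size` is a helper name introduced for this sketch.

```python
import json

example_1 = {"status": "ok", "data": [
    {"foo": "bar", "baz": "quux", "value": 12.5},
    {"foo": "bar", "baz": "quuux", "value": 0},
    {"foo": "fubar", "baz": "quuuux", "value": -40},
]}
example_2 = {"status": "error", "code": 735, "message": "delegation expired"}


def compact_size(value):
    """Size in bytes of the value serialized as JSON with no optional whitespace."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))


print(compact_size(example_1))  # 145
print(compact_size(example_2))  # 60
```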

BULK

1
( example:ok ( ( example:foobaz/f "bar" "quux" #[4] 0x41480000 ) ( example:foobaz/i "bar" "quuux" 0 ) ( example:foobaz/i "fubar" "quuuux" #[1] 0xD8 ) ) )
2
( example:error 735 "delegation expired" )
3
( example:ok ( bulk:prefix example:foobaz/f "bar" "quux" uw32 0x41480000 example:foobaz/f "bar" "quux" uw32 0x41480000 … ) )
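
The raw byte values in these examples can be checked with Python's `struct` module, assuming they are a big-endian IEEE 754 single-precision float and a two's complement 8-bit integer (which is how `0x41480000` and `0xD8` read against the JSON values 12.5 and -40).

```python
import struct

# 12.5 as a big-endian 32-bit float, -40 as a signed 8-bit integer.
float_bytes = struct.pack(">f", 12.5)
int_bytes = struct.pack(">b", -40)
print(float_bytes.hex(), int_bytes.hex())  # 41480000 d8

# And back again from the bytes shown in the BULK examples.
value_f = struct.unpack(">f", bytes.fromhex("41480000"))[0]
value_i = struct.unpack(">b", bytes.fromhex("D8"))[0]
print(value_f, value_i)  # 12.5 -40
```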

Profile

These examples are intended to be used with the following profile (this is not a schema, as it is not needed to parse the data):

( bulk:ns 0x1800 #[16] 0x4C89D12F-1D71-4132-92E9-6335797659EE )
( bulk:ns-mnemonic 0x1800 "example" )
( bulk:mnemonic/def 0x1800 "ok" "successful message containing data list" )
( bulk:mnemonic/def 0x1801 "error" "error message with code and message" )
( bulk:mnemonic/def 0x1802 "foobaz" "foobaz data item" )
( bulk:mnemonic/def 0x1803 "foobaz/i" "foobaz data item with int" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:signed-int ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/def 0x1804 "foobaz/f" "foobaz data item with float" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:binary-float ( bulk:arg 2 ) ) ) ) )
( bulk:arity 3 example:foobaz example:foobaz/i example:foobaz/f )

If the message format must tolerate fields being added or removed, this example could be used with the following profile, where the forms are evaluated into JSON-like maps:

( bulk:ns 0x1800 #[16] 0x4C89D12F-1D71-4132-92E9-6335797659EE )
( bulk:ns-mnemonic 0x1800 "example" )
( bulk:mnemonic/def 0x1800 "ok" "successful message containing data list" ( bulk:subst ( example:map "status" "ok" "data" ( bulk:arg 0 ) ) ) )
( bulk:mnemonic/def 0x1801 "error" "error message with code and message" ( bulk:subst ( example:map "status" "error" "code" ( bulk:arg 0 ) "message" ( bulk:arg 1 ) ) ) )
( bulk:mnemonic/def 0x1802 "foobaz" "foobaz data item" ( bulk:subst ( example:map "foo" ( bulk:arg 0 ) "baz" ( bulk:arg 1 ) "value" ( bulk:arg 2 ) ) ) )
( bulk:mnemonic/def 0x1803 "foobaz/i" "foobaz data item with int" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:signed-int ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/def 0x1804 "foobaz/f" "foobaz data item with float" ( bulk:subst ( example:foobaz ( bulk:arg 0 ) ( bulk:arg 1 ) ( bulk:binary-float ( bulk:arg 2 ) ) ) ) )
( bulk:mnemonic/def 0x1805 "map" "like JSON objects" )
( bulk:arity 3 example:foobaz example:foobaz/i example:foobaz/f )
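
To give an idea of what evaluating the forms into JSON-like maps amounts to, here is a rough Python model of the substitutions in this second profile. It is not a BULK evaluator: forms are written by hand as tuples of `(name, *args)`, numeric arguments are assumed to be already decoded from their bytes, and `TEMPLATES` and `evaluate` are names invented for the sketch.

```python
# Hypothetical Python model of the substitutions; not a BULK evaluator.
def foobaz(foo, baz, value):
    return {"foo": foo, "baz": baz, "value": value}


TEMPLATES = {
    "example:ok": lambda data: {"status": "ok", "data": data},
    "example:error": lambda code, message: {
        "status": "error", "code": code, "message": message,
    },
    "example:foobaz": foobaz,
    "example:foobaz/i": foobaz,  # value assumed already decoded as a signed int
    "example:foobaz/f": foobaz,  # value assumed already decoded as a float
}


def evaluate(form):
    """Expand tuples (name, *args) through TEMPLATES; lists and atoms pass through."""
    if isinstance(form, tuple):
        name, *args = form
        return TEMPLATES[name](*[evaluate(arg) for arg in args])
    if isinstance(form, list):
        return [evaluate(item) for item in form]
    return form


message_2 = ("example:error", 735, "delegation expired")
message_1 = ("example:ok", [
    ("example:foobaz/f", "bar", "quux", 12.5),
    ("example:foobaz/i", "bar", "quuux", 0),
    ("example:foobaz/i", "fubar", "quuuux", -40),
])
print(evaluate(message_2))
print(evaluate(message_1))
```

With the maps produced this way, a reader that only understands maps keeps working when a field is added to one of the templates, while the wire data stays as compact as in the first profile.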

Naive BULK

This makes no use of BULK’s ability to abstract common patterns; it just encodes JSON maps and arrays as-is.

1
( example:map "status" "ok" "data" ( ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) ( example:map "foo" "bar" "baz" "quuux" "value" 0 ) ( example:map "foo" "fubar" "baz" "quuuux" "value" -40 ) ) )
2
( example:map "status" "error" "code" 735 "message" "delegation expired" )
3
( example:map "status" "ok" "data" ( ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) ( example:map "foo" "bar" "baz" "quux" "value" ( bulk:binary-float #[4] 0x41480000 ) ) … ) )

BSON

https://euandreh.github.io/cl-BSON/api.html

MsgPack

https://msgpack.org/

CBOR

https://cbor.me/