From 11358dfbcfe62b32f6a8789f1ee38f95fe2bd87b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Mon, 12 Feb 2024 08:39:39 +0000 Subject: [PATCH 01/11] Initial EEP68 - JSON --- eeps/eep-0068.md | 369 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 369 insertions(+) create mode 100644 eeps/eep-0068.md diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md new file mode 100644 index 0000000..554aeea --- /dev/null +++ b/eeps/eep-0068.md @@ -0,0 +1,369 @@ + Author: Michał Muskała + Status: Draft + Type: Standards Track + Created: 12-02-2024 + Erlang-Version: + Post-History: +**** +# EEP 68: JSON library +---- + +## Abstract + +This EEP proposes introducing a module `json` to the Erlang standard +library with support for encoding and decoding [JSON][1] documents +from and to Erlang data structures. The main reason is to cover +a gap in the Erlang standard library with regards to such a vastly +popular and widespread data format. + +## Rationale + +JSON is commonly in many different use-cases: +* by web services as a lightweight and human-readable data interchange format; +* as a configuration language in static files; +* as data interchange format by developer tooling; +* and more. + +There are many existing JSON libraries for Erlang and other BEAM languages, +however adding such a support to standard library would offer unique benefits. +Most notably being able to use it in situations where leveraging third-party +libraries is complex or cumbersome -- such as stand-alone escripts or +fundamental tooling like a build system, or inside OTP itself. + +There have been previous attempts to bring JSON support into OTP, most notably +[EEP 18][EEP], which ultimately weren't adopted previously for various reasons. +However, I believe the time is right to revisit this subject with a fresh +take on an interface such support could take. 
+ +JSON is a well defined format specified in parallel in [RFC 8259][RFC] and +[ECMA 404][ECMA], however how this representation should be translated +into Erlang is not fully clear since the data structures don't present +a direct, 1:1 mapping. To help with this, this EEP proposes an interface +that presents both a convenient and "cannonical" simple API, as well +as an extensible and highly-customisable API with common underlying +implementation. + +This EEP proposes a JSON library which: +* should be easy to adopt in large codebases using one of the popular, + existing, open-source JSON libraries; +* will allow the existing open-source libraries with custom features + (like support for Elixir protocols) to become thin wrappers around + this library; +* will improve, or at least not regress, performance compared to + leading open-source JSON libraries. + +The proposed JSON library will provide: +* JSON encoding, allowing for single-pass encoding of custom data types –- + in particular, for Elixir, integrating with a protocol through a thin layer + (implemented outside of OTP); +* JSON decoding with some streaming support allowing to decode messages that + don't fully fit into memory; +* JSON decoding with support for decoding values split across separate + messages without fully concatenating them upfront; +* focus on high-performance encoding and decoding; +* full conformance to [RFC 8259][RFC] and [ECMA 404][ECMA] standards, + the decoder should pass the entire [JSONTestSuite][JSONTestSuite]; +* simple API for common use-cases with canonical data type mapping. 
+ +## Design choices + +### Data mapping + +We propose, in the "cannonical" API to map JSON data structues to +Erlang and back in the following way: + +| **Decoding from JSON** | **Erlang** | **Encoding into JSON** | +|------------------------|----------------------|------------------------| +| Number | integer() \| float() | Number | +| Boolean | true \| false | Boolean | +| Null | null | Null | +| String | binary() | String | +| | atom() | String | +| Array | list() | Array | +| Object | #{binary() => _} | Object | +| | #{atom() => _} | Object | +| | #{integer() => _} | Object | + +Erlang has generally a richer value system than JSON, therefore +there's generally more types that can be encoded into JSON, +even if they can never be produced directly by the decoder. + +However, with the flexible API, as demonstrated below, the user will +be able to customize the decoding & encoding routines to produce and +consume any Erlang term as necessary in the particular application. + +### Streaming vs value-based parser + +When it comes to data-structure parsers it's common to encounter two +types: ones that given the data produce a complete parsed value, +and others the same data produce a stream of events that can later +be processed to extract values. + +The first kind, which we'll call here value-based, is generally simpler, +usually more efficient, and more convient to use. The second one offers +unique advantages in specific use-cases: for example, where data +can't fully fit into memory. + +For the proposed `json` library this EEP suggests a hybrid approach. + +First, a simple, value-based API: + +```erlang +-type value() :: + integer() | + float() | + boolean() | + null | + binary() | + list(value()) | + #{binary() => value()}. + +-spec decode(binary()) -> value(). +``` + +Error handling is achieved through exceptions. 
The following errors +are possible: +```erlang +-type error() :: + unexpected_end | + {unexpected_sequence, binary()} | + {invalid_byte, byte()} +``` + +The exceptions might be enhanced through the [Error Info][ERRINFO] mechanism +with additional meta-data like byte offset where the error occured. + +For the advanced and customizable API, this EEP proposes a callback-based +API that the decoder will use to produce values from the data it parses. + +```erlang +-type from_binary_fun() :: fun((binary()) -> dynamic()). +-type array_start_fun() :: fun((Acc :: dynamic()) -> ArrayAcc :: dynamic()). +-type array_push_fun() :: fun((Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()). +-type array_finish_fun() :: fun((ArrayAcc :: dynamic()) -> dynamic()). +-type object_start_fun() :: fun((Acc :: dynamic()) -> ObjectAcc :: dynamic()). +-type object_push_fun() :: fun((Key :: dynamic(), Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()). +-type object_finish_fun() :: fun((ObjectAcc :: dynamic()) -> dynamic()). + +-type decoders() :: #{ + empty_array => term(), + array_start => array_start_fun(), + array_push => array_push_fun(), + array_finish => array_finish_fun(), + empty_object => term(), + object_start => object_start_fun(), + object_push => object_push_fun(), + object_finish => object_finish_fun(), + float => from_binary_fun(), + integer => from_binary_fun(), + string => from_binary_fun(), + null => term() +}. + +-spec decode(binary(), Acc :: dynamic(), decoders()) -> + {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()}. 
+``` + +This allows the user to fully customize the decoded format, including +features seen in open-source JSON libraries: +* decoding string keys as atoms; +* decoding objects as lists of pairs; +* decoding floats as custom structures with decimal precision; +* decoding `null` as another atom, in particular `undefined` or `nil`; +* using `binary:copy/1` on strings that will be retained in memory; +* decoding multiple JSON messages from a single binary blob; +* and more. + +Furthermore, this allows the user to only retain parts of the data structure +to achieve results similar to using a streaming SAX-like parser for data +that does't fully fit into memory. + +All the callbacks are optional and have a default value correspnding to the +"simple" API behaviour and using lists as accumulators. + +### Incomplete data parsing + +We propose a future enhancement to the full `decode/3` API, where +it can return an `{incomplete, continuation()}` value that can be used to +decode values split across multiple binary blobs (for example as received +from a TCP socket). + +```erlang +-spec decode_continue(binary(), continuation()) -> + {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()} | + {incomplete, continuation()}. +``` + +### Encoding API + +For encoding this EEP again proposes two separate sets of APIs. +A simple API using "cannonical" data types: + +```erlang +-type encode_value() :: + integer() | + float() | + boolean() | + null | + binary() | + atom() | + list(encode_value()) | + #{binary() | atom() | integer() => encode_value()}. + +-spec encode(encode_value()) -> iodata(). +``` + +And an advanced, callback-based API allowing for single-pass encoding +of custom data structures. This API is acompanied by a set of functions +facilitating the implementation of custom encoding callbacks. + +```erlang +-type encoder() :: fun((dynamic(), encoder()) -> iodata()). + +-spec encode(dynamic(), encoder()) -> iodata(). 
+ +-spec encode_value(dynamic(), encoder()) -> iodata(). +-spec encode_atom(atom(), encoder()) -> iodata(). +-spec encode_integer(integer()) -> iodata(). +-spec encode_float(float()) -> iodata(). +-spec encode_list(list(), encoder()) -> iodata(). +-spec encode_map(map(), encoder()) -> iodata(). +-spec encode_map_checked(map(), encoder()) -> iodata(). +-spec encode_key_value_list([{dynamic(), dynamic()}], encoder()) -> iodata(). +-spec encode_key_value_list_checked([{dynamic(), dynamic()}], encoder()) -> iodata(). +-spec encode_binary(binary()) -> iodata(). +-spec encode_binary_escape_all(binary()) -> iodata(). +``` + +The `encoder()` callback is invoked on every value during traversal. +The simple API specified above is equivalent to using the +`fun json:encode_value/2` function as the encoder. + +The `*_checked/2` variants of functions offer verifying the encoder +doesn't produce repeated keys. +The default `encode_binary/1` function will emit unescaped unicode values +as allowed by the specifications; however for compatibility reasons +we provide the optional `encode_binary_escape_all/1` function +that will always produce purely ASCII messages encoding all higher +unicode values with the `\u` escape sequences. + + +### Formatting and pretty-printing + +This EEP further proposes an additional API for formatting (and pretty-printing) +JSON messages. This API consists of transforming a textual JSON message into +a formatted JSON message. +This is the most flexible solution that orthogonally supports +formatting results of custom encoding functions like described above, +without adding the burden of complex formatting options in the middle of the +encoders. +Formatting isn't usually done in critical hot-paths of high-performance +services, thgerefore the overhead of a two-pass formatting is deemed acceptable. + +```erlang +-type format_option() :: #{ + indent => iodata(), + line_separator => iodata(), + after_colon => iodata() +}. +-spec format(iodata()) -> iodata(). 
+-spec format(iodata(), format_option()) -> iodata(). +``` + +## Reference Implementation + +[PR-8111][PR] Implements the `encode/1`, `encode/2`, `decode/1`, and `decode/3` +functions as proposed in this EEP. +The formatting API and the support for incomplete message decoding is left +as a follow-up taskk. + +## Appendix + +### Example of a decoding trace + +Given the following data: +```json +{"a": [[], {}, true, false, null, {"foo": "baz"}], "b": [1, 2.0, "three"]} +``` +the decoding APIs will be called with following arguments: +```erlang +object_start(Acc0) => Acc1 + string(<<"a">>) => Str1 + array_start(Acc1) => Acc2 + empty_array() => Arr1 + array_push(Acc2, Arr1) => Acc3 + empty_object() => Obj1 + array_push(Obj1, Acc3) => Acc4 + array_push(true, Acc4) => Acc5 + array_push(false, Acc5) => Acc6 + null() => Null + array_push(Null, Acc6) => Acc7 + object_start(Acc7) => Acc8 + string(<<"foo">>) => Str2 + string(<<"baz">>) => Str3 + object_push(Str2, Str3, Acc8) => Acc9 + object_finish(Acc9) => Obj2 + array_push(Obj2, Acc7) => Acc10 + array_finish(Acc10) => Arr1 + object_push(Arr1, Acc1) => Acc11 + string(<<"b">>) => Str4 + array_start(Acc11) => Acc12 + integer(<<"1">>) => Int1 + array_push(Int1, Acc12) => Acc13 + float(<<"2.0">>) => Float1 + array_push(Float1, Acc13) => Acc14 + string(<<"three">>) => Str5 + array_push(Str5, Acc14) => Acc15 + array_finish(Acc15) => Arr2 + object_push(Str4, Arr2, Acc11) => Acc16 +object_finish(Acc16) => Obj3 +% final decode/3 return +{Obj3, Acc16, <<"">>} +``` + +### Example of a custom encoder + +An example of a custom encoder that would support using a heuristic +to differentiate pais of object-like key-value lists from plain +lists of values could look as follows: +```erlang +custom_encode(Value) -> json:encode(Value, fun encoder/2). 
+ +encoder(null, _Encode) -> <<"\"null\"">>; +encoder(nil, _Encode) -> <<"null">>; +encoder([{_, _} | _] = Value, Encode) -> json:encode_key_value_list(Value, Encode); +encoder(Other, Encode) -> json:encode_value(Other, Encode). +``` + +Another encoder that supports using Elixir `nil` as Null and protocols for +further customisation could look as follows: +```erlang +encoder(nil, _Encode) -> <<"null">>; +encoder(null, _Encode) -> <<"\"null\"">>; +encoder(#{__struct__ => _} = Struct, Encode) -> 'Elixir.JSONProtocol':encode(Struct, Encode); +encoder(Other, Encode) -> json:encode_value(Other, Encode). +``` + +[1]: https://www.json.org/json-en.html + "Introducing JSON" + +[RFC]: https://datatracker.ietf.org/doc/html/rfc8259 + "The JavaScript Object Notation (JSON) Data Interchange Format" + +[ECMA]: https://ecma-international.org/publications-and-standards/standards/ecma-404/ + "The JSON data interchange syntax" + +[EEP]: https://github.com/erlang/eep/blob/master/eeps/eep-0018.md + "EEP 18: JSON bifs" + +[ERRINFO]: https://github.com/erlang/eep/blob/master/eeps/eep-0054.md + "EEP 54: Provide more information about errors" + +[JSONTestSuite]: https://github.com/nst/JSONTestSuite + +[PR]: https://github.com/erlang/otp/pull/8111 + +## Copyright + +This document is placed in the public domain or under the CC0-1.0-Universal +license, whichever is more permissive. 
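The key-value-list heuristic from the appendix can be exercised with a small wrapper module. This is only a sketch: the module name `eep68_encode_example` is hypothetical, and it assumes the `json` module proposed by this EEP is available (it shipped in Erlang/OTP 27).

```erlang
-module(eep68_encode_example).
-export([custom_encode/1]).

%% Hypothetical wrapper, not part of the EEP. Assumes the `json` module
%% proposed here is present (Erlang/OTP 27 or later).
custom_encode(Value) ->
    json:encode(Value, fun encoder/2).

%% Heuristic: a non-empty list whose first element is a 2-tuple is
%% encoded as a JSON object instead of a JSON array.
encoder([{_, _} | _] = Value, Encode) ->
    json:encode_key_value_list(Value, Encode);
%% Everything else falls back to the canonical mapping.
encoder(Other, Encode) ->
    json:encode_value(Other, Encode).
```

Calling `iolist_to_binary(eep68_encode_example:custom_encode([{a, []}, {b, 1}]))` should produce `<<"{\"a\":[],\"b\":1}">>`. Note the limit of the heuristic: an empty list carries no hint and is still encoded as `[]`, never as `{}`.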
From 4b4dff6c2e39b67db60caac34317d7722f9f6f49 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Mon, 12 Feb 2024 10:20:25 +0000 Subject: [PATCH 02/11] Update eeps/eep-0068.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: José Valim --- eeps/eep-0068.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index 554aeea..5788a4f 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -329,8 +329,6 @@ lists of values could look as follows: ```erlang custom_encode(Value) -> json:encode(Value, fun encoder/2). -encoder(null, _Encode) -> <<"\"null\"">>; -encoder(nil, _Encode) -> <<"null">>; encoder([{_, _} | _] = Value, Encode) -> json:encode_key_value_list(Value, Encode); encoder(Other, Encode) -> json:encode_value(Other, Encode). ``` From 108c489620e4d378f1b412f8be25d5f61f04a633 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Mon, 12 Feb 2024 10:20:59 +0000 Subject: [PATCH 03/11] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: José Valim --- eeps/eep-0068.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index 5788a4f..17d5e3c 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -39,7 +39,7 @@ JSON is a well defined format specified in parallel in [RFC 8259][RFC] and [ECMA 404][ECMA], however how this representation should be translated into Erlang is not fully clear since the data structures don't present a direct, 1:1 mapping. To help with this, this EEP proposes an interface -that presents both a convenient and "cannonical" simple API, as well +that presents both a convenient and "canonical" simple API, as well as an extensible and highly-customisable API with common underlying implementation. 
@@ -258,7 +258,7 @@ formatting results of custom encoding functions like described above, without adding the burden of complex formatting options in the middle of the encoders. Formatting isn't usually done in critical hot-paths of high-performance -services, thgerefore the overhead of a two-pass formatting is deemed acceptable. +services, therefore the overhead of a two-pass formatting is deemed acceptable. ```erlang -type format_option() :: #{ @@ -275,7 +275,7 @@ services, thgerefore the overhead of a two-pass formatting is deemed acceptable. [PR-8111][PR] Implements the `encode/1`, `encode/2`, `decode/1`, and `decode/3` functions as proposed in this EEP. The formatting API and the support for incomplete message decoding is left -as a follow-up taskk. +as a follow-up task. ## Appendix From 8b94fc5a96d5bde12591ae943b0c2791777c61fd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Mon, 12 Feb 2024 10:38:09 +0000 Subject: [PATCH 04/11] More review suggestions --- eeps/eep-0068.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index 17d5e3c..96f8e37 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -69,7 +69,7 @@ The proposed JSON library will provide: ### Data mapping -We propose, in the "cannonical" API to map JSON data structues to +We propose, in the "canonical" API to map JSON data structues to Erlang and back in the following way: | **Decoding from JSON** | **Erlang** | **Encoding into JSON** | @@ -92,6 +92,11 @@ However, with the flexible API, as demonstrated below, the user will be able to customize the decoding & encoding routines to produce and consume any Erlang term as necessary in the particular application. 
+**Note**: A decode-encode rountrip might not produce the same data, +even with custom decoders -- since JSON has such a limited data-type +options, compared to Erlang, some information will be commonly be lost, +for example, coercing all keys in maps to binaries. + ### Streaming vs value-based parser When it comes to data-structure parsers it's common to encounter two @@ -197,7 +202,7 @@ from a TCP socket). ### Encoding API For encoding this EEP again proposes two separate sets of APIs. -A simple API using "cannonical" data types: +A simple API using "canonical" data types: ```erlang -type encode_value() :: From 85eaa90297d86f184a6033d35d2801a5c3129e91 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Mon, 12 Feb 2024 11:00:30 +0000 Subject: [PATCH 05/11] Apply suggestions from code review Co-authored-by: Magnus Henoch --- eeps/eep-0068.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index 96f8e37..748938f 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -105,7 +105,7 @@ and others the same data produce a stream of events that can later be processed to extract values. The first kind, which we'll call here value-based, is generally simpler, -usually more efficient, and more convient to use. The second one offers +usually more efficient, and more convenient to use. The second one offers unique advantages in specific use-cases: for example, where data can't fully fit into memory. @@ -181,9 +181,9 @@ features seen in open-source JSON libraries: Furthermore, this allows the user to only retain parts of the data structure to achieve results similar to using a streaming SAX-like parser for data -that does't fully fit into memory. +that doesn't fully fit into memory. -All the callbacks are optional and have a default value correspnding to the +All the callbacks are optional and have a default value corresponding to the "simple" API behaviour and using lists as accumulators. 
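The customization described above can be sketched with two callbacks whose shape stays the same throughout this patch series: `null` and `object_push`. The module name `eep68_decode_example` and the `ok` placeholder accumulator are assumptions made for illustration, and the `json` module is assumed to be available as shipped in Erlang/OTP 27.

```erlang
-module(eep68_decode_example).
-export([decode_atoms/1]).

%% Hypothetical helper, not part of the EEP. Decodes JSON `null` as the
%% atom `undefined` and object keys as atoms; all other callbacks keep
%% their defaults.
decode_atoms(Json) when is_binary(Json) ->
    Decoders = #{
        null => undefined,
        %% binary_to_existing_atom/2 avoids creating new atoms from
        %% untrusted input; it raises badarg for unknown keys.
        object_push => fun(Key, Value, Acc) ->
            [{binary_to_existing_atom(Key, utf8), Value} | Acc]
        end
    },
    %% `ok` is a throwaway accumulator; decode/3 returns it unchanged
    %% along with the decoded value and any trailing bytes.
    json:decode(Json, ok, Decoders).
```

With this sketch, `eep68_decode_example:decode_atoms(<<"{\"a\": null}">>)` should return `{#{a => undefined}, ok, <<>>}`, since the default `object_finish` still builds a map from the accumulated pairs.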
### Incomplete data parsing From eadd22a54fcdf51f2b53eaf8a5a1e7041666750b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Mon, 12 Feb 2024 11:00:55 +0000 Subject: [PATCH 06/11] More typos --- eeps/eep-0068.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index 748938f..07b4e67 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -329,7 +329,7 @@ object_finish(Acc16) => Obj3 ### Example of a custom encoder An example of a custom encoder that would support using a heuristic -to differentiate pais of object-like key-value lists from plain +to differentiate pairs of object-like key-value lists from plain lists of values could look as follows: ```erlang custom_encode(Value) -> json:encode(Value, fun encoder/2). From 5b43d9a040c0ca1f63ba71786857ad375664e222 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Mon, 12 Feb 2024 11:59:55 +0000 Subject: [PATCH 07/11] More proofreading errors --- eeps/eep-0068.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index 07b4e67..b2d7ac2 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -18,7 +18,7 @@ popular and widespread data format. ## Rationale -JSON is commonly in many different use-cases: +JSON is commonly used in many different use-cases: * by web services as a lightweight and human-readable data interchange format; * as a configuration language in static files; * as data interchange format by developer tooling; @@ -219,7 +219,7 @@ A simple API using "canonical" data types: ``` And an advanced, callback-based API allowing for single-pass encoding -of custom data structures. This API is acompanied by a set of functions +of custom data structures. This API is accompanied by a set of functions facilitating the implementation of custom encoding callbacks. 
```erlang @@ -290,7 +290,7 @@ Given the following data: ```json {"a": [[], {}, true, false, null, {"foo": "baz"}], "b": [1, 2.0, "three"]} ``` -the decoding APIs will be called with following arguments: +the decoding APIs will be called with the following arguments: ```erlang object_start(Acc0) => Acc1 string(<<"a">>) => Str1 From 064402c0037c9194837f4e6f0a6d32e4102f726c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Tue, 20 Feb 2024 15:40:24 +0000 Subject: [PATCH 08/11] Update to _finish callback return new acc --- eeps/eep-0068.md | 45 ++++++++++++++++++++++++++++++--------------- 1 file changed, 30 insertions(+), 15 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index b2d7ac2..ec1868e 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -145,17 +145,15 @@ API that the decoder will use to produce values from the data it parses. -type from_binary_fun() :: fun((binary()) -> dynamic()). -type array_start_fun() :: fun((Acc :: dynamic()) -> ArrayAcc :: dynamic()). -type array_push_fun() :: fun((Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()). --type array_finish_fun() :: fun((ArrayAcc :: dynamic()) -> dynamic()). +-type array_finish_fun() :: fun((ArrayAcc :: dynamic(), OldAcc :: dynamic()) -> {dynamic(), Acc :: dynamic()}). -type object_start_fun() :: fun((Acc :: dynamic()) -> ObjectAcc :: dynamic()). -type object_push_fun() :: fun((Key :: dynamic(), Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()). --type object_finish_fun() :: fun((ObjectAcc :: dynamic()) -> dynamic()). +-type object_finish_fun() :: fun((ObjectAcc :: dynamic(), OldAcc :: dynamic()) -> {dynamic(), Acc :: dynamic()}). 
-type decoders() :: #{ - empty_array => term(), array_start => array_start_fun(), array_push => array_push_fun(), array_finish => array_finish_fun(), - empty_object => term(), object_start => object_start_fun(), object_push => object_push_fun(), object_finish => object_finish_fun(), @@ -183,8 +181,25 @@ Furthermore, this allows the user to only retain parts of the data structure to achieve results similar to using a streaming SAX-like parser for data that doesn't fully fit into memory. +The `array_finish` and `object_finish` callbacks are responsible for +restoring the accumulator to continue processing the parent object. +To simplify the case where accumulators are not connected, these +callbacks receive value of the accumulator that was passed to the +corresponding `_start` call. + All the callbacks are optional and have a default value corresponding to the -"simple" API behaviour and using lists as accumulators. +"simple" API behaviour, using lists as accumulators, in particular: + +* for `array_start`: `fun(_) -> [] end` +* for `array_push`: `fun(Elem, Acc) -> [Elem | Acc] end` +* for `array_finish`: `fun(Acc, OldAcc) -> {lists:reverse(Acc), OldAcc} end` +* for `object_start`: `fun(_) -> [] end` +* for `object_push`: `fun(Key, Value, Acc) -> [{Key, Value} | Acc] end` +* for `object_finish`: `fun(Acc, OldAcc) -> {maps:from_list(Acc), OldAcc} end` +* for `float`: `fun erlang:binary_to_float/1` +* for `integer`: `fun erlang:binary_to_integer/1` +* for `string`: `fun (Value) -> Value end` +* for `null`: the atom `null` ### Incomplete data parsing @@ -309,21 +324,21 @@ object_start(Acc0) => Acc1 object_push(Str2, Str3, Acc8) => Acc9 object_finish(Acc9) => Obj2 array_push(Obj2, Acc7) => Acc10 - array_finish(Acc10) => Arr1 - object_push(Arr1, Acc1) => Acc11 + array_finish(Acc10, Acc1) => {Arr1, Acc11} + object_push(Arr1, Acc11) => Acc12 string(<<"b">>) => Str4 - array_start(Acc11) => Acc12 + array_start(Acc12) => Acc13 integer(<<"1">>) => Int1 - array_push(Int1, 
Acc12) => Acc13 + array_push(Int1, Acc13) => Acc14 float(<<"2.0">>) => Float1 - array_push(Float1, Acc13) => Acc14 + array_push(Float1, Acc14) => Acc15 string(<<"three">>) => Str5 - array_push(Str5, Acc14) => Acc15 - array_finish(Acc15) => Arr2 - object_push(Str4, Arr2, Acc11) => Acc16 -object_finish(Acc16) => Obj3 + array_push(Str5, Acc15) => Acc16 + array_finish(Acc16, Acc12) => {Arr2, Acc17} + object_push(Str4, Arr2, Acc17) => Acc18 +object_finish(Acc18, Acc0) => {Obj3, Acc19} % final decode/3 return -{Obj3, Acc16, <<"">>} +{Obj3, Acc19, <<"">>} ``` ### Example of a custom encoder From 4c8e3cad8174591368f2bf411d5bb89e6f72427f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Tue, 20 Feb 2024 15:50:00 +0000 Subject: [PATCH 09/11] Formatting fixes --- eeps/eep-0068.md | 51 +++++++++++++++++++++++++++++++++++------------- 1 file changed, 37 insertions(+), 14 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index ec1868e..94b8753 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -5,10 +5,11 @@ Erlang-Version: Post-History: **** -# EEP 68: JSON library +EEP 68: JSON library ---- -## Abstract +Abstract +======== This EEP proposes introducing a module `json` to the Erlang standard library with support for encoding and decoding [JSON][1] documents @@ -16,9 +17,11 @@ from and to Erlang data structures. The main reason is to cover a gap in the Erlang standard library with regards to such a vastly popular and widespread data format. -## Rationale +Rationale +========= JSON is commonly used in many different use-cases: + * by web services as a lightweight and human-readable data interchange format; * as a configuration language in static files; * as data interchange format by developer tooling; @@ -44,6 +47,7 @@ as an extensible and highly-customisable API with common underlying implementation. 
This EEP proposes a JSON library which: + * should be easy to adopt in large codebases using one of the popular, existing, open-source JSON libraries; * will allow the existing open-source libraries with custom features @@ -53,6 +57,7 @@ This EEP proposes a JSON library which: leading open-source JSON libraries. The proposed JSON library will provide: + * JSON encoding, allowing for single-pass encoding of custom data types –- in particular, for Elixir, integrating with a protocol through a thin layer (implemented outside of OTP); @@ -65,9 +70,11 @@ The proposed JSON library will provide: the decoder should pass the entire [JSONTestSuite][JSONTestSuite]; * simple API for common use-cases with canonical data type mapping. -## Design choices +Design choices +============== -### Data mapping +Data mapping +------------ We propose, in the "canonical" API to map JSON data structues to Erlang and back in the following way: @@ -97,7 +104,8 @@ even with custom decoders -- since JSON has such a limited data-type options, compared to Erlang, some information will be commonly be lost, for example, coercing all keys in maps to binaries. -### Streaming vs value-based parser +Streaming vs value-based parser +------------------------------- When it comes to data-structure parsers it's common to encounter two types: ones that given the data produce a complete parsed value, @@ -128,6 +136,7 @@ First, a simple, value-based API: Error handling is achieved through exceptions. The following errors are possible: + ```erlang -type error() :: unexpected_end | @@ -169,6 +178,7 @@ API that the decoder will use to produce values from the data it parses. 
This allows the user to fully customize the decoded format, including features seen in open-source JSON libraries: + * decoding string keys as atoms; * decoding objects as lists of pairs; * decoding floats as custom structures with decimal precision; @@ -201,7 +211,8 @@ All the callbacks are optional and have a default value corresponding to the * for `string`: `fun (Value) -> Value end` * for `null`: the atom `null` -### Incomplete data parsing +Incomplete data parsing +----------------------- We propose a future enhancement to the full `decode/3` API, where it can return an `{incomplete, continuation()}` value that can be used to @@ -214,7 +225,8 @@ from a TCP socket). {incomplete, continuation()}. ``` -### Encoding API +Encoding API +------------ For encoding this EEP again proposes two separate sets of APIs. A simple API using "canonical" data types: @@ -268,7 +280,8 @@ that will always produce purely ASCII messages encoding all higher unicode values with the `\u` escape sequences. -### Formatting and pretty-printing +Formatting and pretty-printing +------------------------------ This EEP further proposes an additional API for formatting (and pretty-printing) JSON messages. This API consists of transforming a textual JSON message into @@ -290,22 +303,28 @@ services, therefore the overhead of a two-pass formatting is deemed acceptable. -spec format(iodata(), format_option()) -> iodata(). ``` -## Reference Implementation +Reference Implementation +======================== [PR-8111][PR] Implements the `encode/1`, `encode/2`, `decode/1`, and `decode/3` functions as proposed in this EEP. The formatting API and the support for incomplete message decoding is left as a follow-up task. 
-## Appendix +Appendix +======== -### Example of a decoding trace +Example of a decoding trace +--------------------------- Given the following data: + ```json {"a": [[], {}, true, false, null, {"foo": "baz"}], "b": [1, 2.0, "three"]} ``` + the decoding APIs will be called with the following arguments: + ```erlang object_start(Acc0) => Acc1 string(<<"a">>) => Str1 @@ -341,11 +360,13 @@ object_finish(Acc18, Acc0) => {Obj3, Acc19} {Obj3, Acc19, <<"">>} ``` -### Example of a custom encoder +Example of a custom encoder +--------------------------- An example of a custom encoder that would support using a heuristic to differentiate pairs of object-like key-value lists from plain lists of values could look as follows: + ```erlang custom_encode(Value) -> json:encode(Value, fun encoder/2). @@ -355,6 +376,7 @@ encoder(Other, Encode) -> json:encode_value(Other, Encode). Another encoder that supports using Elixir `nil` as Null and protocols for further customisation could look as follows: + ```erlang encoder(nil, _Encode) -> <<"null">>; encoder(null, _Encode) -> <<"\"null\"">>; @@ -381,7 +403,8 @@ encoder(Other, Encode) -> json:encode_value(Other, Encode). [PR]: https://github.com/erlang/otp/pull/8111 -## Copyright +Copyright +========= This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive. From adcbed240e16ef0c053a8b3695349afab5ad5d7f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?= Date: Tue, 20 Feb 2024 16:53:57 +0000 Subject: [PATCH 10/11] More formatting --- eeps/eep-0068.md | 243 +++++++++++++++++++++-------------------------- 1 file changed, 110 insertions(+), 133 deletions(-) diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md index 94b8753..9adb37e 100644 --- a/eeps/eep-0068.md +++ b/eeps/eep-0068.md @@ -121,28 +121,24 @@ For the proposed `json` library this EEP suggests a hybrid approach. 
 
 First, a simple, value-based API:
 
-```erlang
--type value() ::
-    integer() |
-    float() |
-    boolean() |
-    null |
-    binary() |
-    list(value()) |
-    #{binary() => value()}.
-
--spec decode(binary()) -> value().
-```
+    -type value() ::
+        integer() |
+        float() |
+        boolean() |
+        null |
+        binary() |
+        list(value()) |
+        #{binary() => value()}.
+
+    -spec decode(binary()) -> value().
 
 Error handling is achieved through exceptions. The following errors
 are possible:
 
-```erlang
--type error() ::
-    unexpected_end |
-    {unexpected_sequence, binary()} |
-    {invalid_byte, byte()}
-```
+    -type error() ::
+        unexpected_end |
+        {unexpected_sequence, binary()} |
+        {invalid_byte, byte()}
 
 The exceptions might be enhanced through the [Error Info][ERRINFO] mechanism
 with additional meta-data like byte offset where the error occured.
@@ -150,31 +146,29 @@ with additional meta-data like byte offset where the error occured.
 For the advanced and customizable API, this EEP proposes a callback-based
 API that the decoder will use to produce values from the data it parses.
 
-```erlang
--type from_binary_fun() :: fun((binary()) -> dynamic()).
--type array_start_fun() :: fun((Acc :: dynamic()) -> ArrayAcc :: dynamic()).
--type array_push_fun() :: fun((Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()).
--type array_finish_fun() :: fun((ArrayAcc :: dynamic(), OldAcc :: dynamic()) -> {dynamic(), Acc :: dynamic()}).
--type object_start_fun() :: fun((Acc :: dynamic()) -> ObjectAcc :: dynamic()).
--type object_push_fun() :: fun((Key :: dynamic(), Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()).
--type object_finish_fun() :: fun((ObjectAcc :: dynamic(), OldAcc :: dynamic()) -> {dynamic(), Acc :: dynamic()}).
-
--type decoders() :: #{
-    array_start => array_start_fun(),
-    array_push => array_push_fun(),
-    array_finish => array_finish_fun(),
-    object_start => object_start_fun(),
-    object_push => object_push_fun(),
-    object_finish => object_finish_fun(),
-    float => from_binary_fun(),
-    integer => from_binary_fun(),
-    string => from_binary_fun(),
-    null => term()
-}.
-
--spec decode(binary(), Acc :: dynamic(), decoders()) ->
-    {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()}.
-```
+    -type from_binary_fun() :: fun((binary()) -> dynamic()).
+    -type array_start_fun() :: fun((Acc :: dynamic()) -> ArrayAcc :: dynamic()).
+    -type array_push_fun() :: fun((Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()).
+    -type array_finish_fun() :: fun((ArrayAcc :: dynamic(), OldAcc :: dynamic()) -> {dynamic(), Acc :: dynamic()}).
+    -type object_start_fun() :: fun((Acc :: dynamic()) -> ObjectAcc :: dynamic()).
+    -type object_push_fun() :: fun((Key :: dynamic(), Value :: dynamic(), Acc :: dynamic()) -> NewAcc :: dynamic()).
+    -type object_finish_fun() :: fun((ObjectAcc :: dynamic(), OldAcc :: dynamic()) -> {dynamic(), Acc :: dynamic()}).
+
+    -type decoders() :: #{
+        array_start => array_start_fun(),
+        array_push => array_push_fun(),
+        array_finish => array_finish_fun(),
+        object_start => object_start_fun(),
+        object_push => object_push_fun(),
+        object_finish => object_finish_fun(),
+        float => from_binary_fun(),
+        integer => from_binary_fun(),
+        string => from_binary_fun(),
+        null => term()
+    }.
+
+    -spec decode(binary(), Acc :: dynamic(), decoders()) ->
+        {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()}.
 
 This allows the user to fully customize the decoded format, including
 features seen in open-source JSON libraries:
@@ -219,11 +213,9 @@ it can return an `{incomplete, continuation()}` value that can be used to
 decode values split across multiple binary blobs (for example as received
 from a TCP socket).
 
-```erlang
--spec decode_continue(binary(), continuation()) ->
-    {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()} |
-    {incomplete, continuation()}.
-```
+    -spec decode_continue(binary(), continuation()) ->
+        {Value :: dynamic(), FinalAcc :: dynamic(), Rest :: binary()} |
+        {incomplete, continuation()}.
 
 Encoding API
 ------------
@@ -231,41 +223,37 @@ Encoding API
 For encoding this EEP again proposes two separate sets of APIs.
 A simple API using "canonical" data types:
 
-```erlang
--type encode_value() ::
-    integer() |
-    float() |
-    boolean() |
-    null |
-    binary() |
-    atom() |
-    list(encode_value()) |
-    #{binary() | atom() | integer() => encode_value()}.
+    -type encode_value() ::
+        integer() |
+        float() |
+        boolean() |
+        null |
+        binary() |
+        atom() |
+        list(encode_value()) |
+        #{binary() | atom() | integer() => encode_value()}.
 
--spec encode(encode_value()) -> iodata().
-```
+    -spec encode(encode_value()) -> iodata().
 
 And an advanced, callback-based API allowing for single-pass encoding of
 custom data structures. This API is accompanied by a set of functions
 facilitating the implementation of custom encoding callbacks.
 
-```erlang
--type encoder() :: fun((dynamic(), encoder()) -> iodata()).
+    -type encoder() :: fun((dynamic(), encoder()) -> iodata()).
 
--spec encode(dynamic(), encoder()) -> iodata().
+    -spec encode(dynamic(), encoder()) -> iodata().
 
--spec encode_value(dynamic(), encoder()) -> iodata().
--spec encode_atom(atom(), encoder()) -> iodata().
--spec encode_integer(integer()) -> iodata().
--spec encode_float(float()) -> iodata().
--spec encode_list(list(), encoder()) -> iodata().
--spec encode_map(map(), encoder()) -> iodata().
--spec encode_map_checked(map(), encoder()) -> iodata().
--spec encode_key_value_list([{dynamic(), dynamic()}], encoder()) -> iodata().
--spec encode_key_value_list_checked([{dynamic(), dynamic()}], encoder()) -> iodata().
--spec encode_binary(binary()) -> iodata().
--spec encode_binary_escape_all(binary()) -> iodata().
-```
+    -spec encode_value(dynamic(), encoder()) -> iodata().
+    -spec encode_atom(atom(), encoder()) -> iodata().
+    -spec encode_integer(integer()) -> iodata().
+    -spec encode_float(float()) -> iodata().
+    -spec encode_list(list(), encoder()) -> iodata().
+    -spec encode_map(map(), encoder()) -> iodata().
+    -spec encode_map_checked(map(), encoder()) -> iodata().
+    -spec encode_key_value_list([{dynamic(), dynamic()}], encoder()) -> iodata().
+    -spec encode_key_value_list_checked([{dynamic(), dynamic()}], encoder()) -> iodata().
+    -spec encode_binary(binary()) -> iodata().
+    -spec encode_binary_escape_all(binary()) -> iodata().
 
 The `encoder()` callback is invoked on every value during traversal.
 The simple API specified above is equivalent to using the
@@ -279,7 +267,6 @@ we provide the optional `encode_binary_escape_all/1` function
 that will always produce purely ASCII messages encoding all higher unicode
 values with the `\u` escape sequences.
 
-
 Formatting and pretty-printing
 ------------------------------
 
@@ -293,15 +280,13 @@
 encoders. Formatting isn't usually done in critical hot-paths of high-performance
 services, therefore the overhead of a two-pass formatting is deemed acceptable.
 
-```erlang
--type format_option() :: #{
-    indent => iodata(),
-    line_separator => iodata(),
-    after_colon => iodata()
-}.
--spec format(iodata()) -> iodata().
--spec format(iodata(), format_option()) -> iodata().
-```
+    -type format_option() :: #{
+        indent => iodata(),
+        line_separator => iodata(),
+        after_colon => iodata()
+    }.
+    -spec format(iodata()) -> iodata().
+    -spec format(iodata(), format_option()) -> iodata().
 
 Reference Implementation
 ========================
@@ -319,46 +304,42 @@ Example of a decoding trace
 
 Given the following data:
 
-```json
-{"a": [[], {}, true, false, null, {"foo": "baz"}], "b": [1, 2.0, "three"]}
-```
+    {"a": [[], {}, true, false, null, {"foo": "baz"}], "b": [1, 2.0, "three"]}
 
 the decoding APIs will be called with the following arguments:
 
-```erlang
-object_start(Acc0) => Acc1
-  string(<<"a">>) => Str1
-  array_start(Acc1) => Acc2
-    empty_array() => Arr1
-    array_push(Acc2, Arr1) => Acc3
-    empty_object() => Obj1
-    array_push(Obj1, Acc3) => Acc4
-    array_push(true, Acc4) => Acc5
-    array_push(false, Acc5) => Acc6
-    null() => Null
-    array_push(Null, Acc6) => Acc7
-    object_start(Acc7) => Acc8
-      string(<<"foo">>) => Str2
-      string(<<"baz">>) => Str3
-      object_push(Str2, Str3, Acc8) => Acc9
-    object_finish(Acc9) => Obj2
-    array_push(Obj2, Acc7) => Acc10
-  array_finish(Acc10, Acc1) => {Arr1, Acc11}
-  object_push(Arr1, Acc11) => Acc12
-  string(<<"b">>) => Str4
-  array_start(Acc12) => Acc13
-    integer(<<"1">>) => Int1
-    array_push(Int1, Acc13) => Acc14
-    float(<<"2.0">>) => Float1
-    array_push(Float1, Acc14) => Acc15
-    string(<<"three">>) => Str5
-    array_push(Str5, Acc15) => Acc16
-  array_finish(Acc16, Acc12) => {Arr2, Acc17}
-  object_push(Str4, Arr2, Acc17) => Acc18
-object_finish(Acc18, Acc0) => {Obj3, Acc19}
-% final decode/3 return
-{Obj3, Acc19, <<"">>}
-```
+    object_start(Acc0) => Acc1
+      string(<<"a">>) => Str1
+      array_start(Acc1) => Acc2
+        empty_array() => Arr1
+        array_push(Acc2, Arr1) => Acc3
+        empty_object() => Obj1
+        array_push(Obj1, Acc3) => Acc4
+        array_push(true, Acc4) => Acc5
+        array_push(false, Acc5) => Acc6
+        null() => Null
+        array_push(Null, Acc6) => Acc7
+        object_start(Acc7) => Acc8
+          string(<<"foo">>) => Str2
+          string(<<"baz">>) => Str3
+          object_push(Str2, Str3, Acc8) => Acc9
+        object_finish(Acc9) => Obj2
+        array_push(Obj2, Acc7) => Acc10
+      array_finish(Acc10, Acc1) => {Arr1, Acc11}
+      object_push(Arr1, Acc11) => Acc12
+      string(<<"b">>) => Str4
+      array_start(Acc12) => Acc13
+        integer(<<"1">>) => Int1
+        array_push(Int1, Acc13) => Acc14
+        float(<<"2.0">>) => Float1
+        array_push(Float1, Acc14) => Acc15
+        string(<<"three">>) => Str5
+        array_push(Str5, Acc15) => Acc16
+      array_finish(Acc16, Acc12) => {Arr2, Acc17}
+      object_push(Str4, Arr2, Acc17) => Acc18
+    object_finish(Acc18, Acc0) => {Obj3, Acc19}
+    % final decode/3 return
+    {Obj3, Acc19, <<"">>}
 
 Example of a custom encoder
 ---------------------------
@@ -367,22 +348,18 @@ An example of a custom encoder that would support using a heuristic
 to differentiate pairs of object-like key-value lists from plain lists
 of values could look as follows:
 
-```erlang
-custom_encode(Value) -> json:encode(Value, fun encoder/2).
+    custom_encode(Value) -> json:encode(Value, fun encoder/2).
 
-encoder([{_, _} | _] = Value, Encode) -> json:encode_key_value_list(Value, Encode);
-encoder(Other, Encode) -> json:encode_value(Other, Encode).
-```
+    encoder([{_, _} | _] = Value, Encode) -> json:encode_key_value_list(Value, Encode);
+    encoder(Other, Encode) -> json:encode_value(Other, Encode).
 
 Another encoder that supports using Elixir `nil` as Null and protocols for
 further customisation could look as follows:
 
-```erlang
-encoder(nil, _Encode) -> <<"null">>;
-encoder(null, _Encode) -> <<"\"null\"">>;
-encoder(#{__struct__ => _} = Struct, Encode) -> 'Elixir.JSONProtocol':encode(Struct, Encode);
-encoder(Other, Encode) -> json:encode_value(Other, Encode).
-```
+    encoder(nil, _Encode) -> <<"null">>;
+    encoder(null, _Encode) -> <<"\"null\"">>;
+    encoder(#{__struct__ => _} = Struct, Encode) -> 'Elixir.JSONProtocol':encode(Struct, Encode);
+    encoder(Other, Encode) -> json:encode_value(Other, Encode).
 
 [1]: https://www.json.org/json-en.html
     "Introducing JSON"

From e57cdbd93ea60792f8e249f468cc3a4eeee1cb21 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Muska=C5=82a?=
Date: Thu, 22 Feb 2024 14:22:32 +0000
Subject: [PATCH 11/11] Update eeps/eep-0068.md

Co-authored-by: Lukas Larsson
---
 eeps/eep-0068.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/eeps/eep-0068.md b/eeps/eep-0068.md
index 9adb37e..3453a6a 100644
--- a/eeps/eep-0068.md
+++ b/eeps/eep-0068.md
@@ -141,7 +141,7 @@ are possible:
         {invalid_byte, byte()}
 
 The exceptions might be enhanced through the [Error Info][ERRINFO] mechanism
-with additional meta-data like byte offset where the error occured.
+with additional meta-data like byte offset where the error occurred.
 
 For the advanced and customizable API, this EEP proposes a callback-based
 API that the decoder will use to produce values from the data it parses.
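
Reviewer note: the callback-based decoder API that closes the hunk above can be illustrated with a small sketch. It is not part of the patch. It assumes an OTP release that ships a `json` module with the `decode/3` signature and the `object_start`, `object_push` and `object_finish` callback keys exactly as proposed in this EEP; the module name `json_pairs` and the function `decode_pairs/1` are hypothetical. The sketch covers the "decoding objects as lists of pairs" case: objects are returned as `{Key, Value}` lists in document order, and every other callback is left at its default.

```erlang
-module(json_pairs).
-export([decode_pairs/1]).

%% Hypothetical helper (not part of the EEP): decode a JSON document so that
%% objects come back as ordered lists of {Key, Value} pairs instead of maps.
-spec decode_pairs(binary()) -> term().
decode_pairs(Binary) ->
    Decoders = #{
        %% Start every object with an empty accumulator.
        object_start => fun(_Acc) -> [] end,
        %% Prepend each member; the list is reversed when the object closes.
        object_push => fun(Key, Value, Acc) -> [{Key, Value} | Acc] end,
        %% Return the finished object and restore the enclosing accumulator.
        object_finish => fun(Acc, OldAcc) -> {lists:reverse(Acc), OldAcc} end
    },
    %% The initial accumulator is not used by this sketch, so any term works.
    {Value, _FinalAcc, _Rest} = json:decode(Binary, ok, Decoders),
    Value.
```

Under those assumptions, `json_pairs:decode_pairs(<<"{\"b\": 1, \"a\": [true, null]}">>)` should return `[{<<"b">>, 1}, {<<"a">>, [true, null]}]`. Key order is kept because `object_finish` reverses the accumulated list rather than building a map.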