From 2eb7b9e7de67fc7152d3e15745929ef85dafcf38 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Tue, 8 Aug 2023 16:58:40 +0100 Subject: [PATCH 01/15] IPIP: CAR `meta` (content type parameter) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- src/http-gateways/trustless-gateway.md | 28 +++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 949e2b0bf..d4fedc634 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -80,7 +80,8 @@ Below response types SHOULD be supported: - [application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car) - Disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned, implementations MAY support optional CAR content type parameters - (:cite[ipip-0412]) and the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request). + (:cite[ipip-0412]), the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request) + and the optional [CAR metadata block](#car-meta-content-type-parameter). - [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) - A verifiable :cite[ipns-record] (multicodec `0x0300`). @@ -301,6 +302,31 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as the raw data is already present in the parent block that links to the identity CID. +## CAR `meta` (content type parameter) + +The `meta` parameter allows clients to request the server to include additional metadata about the +CAR to be included at the end of the response body. + +This parameter can be used with `version=1` only. + +When the parameter is not set, the server must not add any extra CAR blocks to the response. 
+ +The metadata block is a regular CAR block with the following properties: + +- CID specifies multicodec `car-metada` (0x04ff), see + [multicodec#334](https://github.com/multiformats/multicodec/pull/334). + +- The payload contains metadata encoded as DAG-CBOR. + +The metadata MUST include the following fields: + +- `len` - byte length of the CAR data (excluding the metadata block) +- `b3h` - Blake3 hash (checksum) of the CAR data (excluding the metadata block). +- `b3h_sig` - A signature over `` using server's Ed2559 identity. + - `len` is encoded as `varint`, + - `b3h` is encoded as 32 bytes, + - The effective query as executed by the gateway. This query is the request url - path and query string arguments. + ## CAR format parameters and determinism The default header and block order in a CAR format is not specified by IPLD specifications. From 3814b6a2246e21e07cf11973f81a451f85fc26cf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Tue, 8 Aug 2023 17:00:58 +0100 Subject: [PATCH 02/15] add IPIP/0431-gateway-car-trailer.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- IPIP/0431-gateway-car-trailer.md | 94 ++++++++++++++++++++++++++++++++ 1 file changed, 94 insertions(+) create mode 100644 IPIP/0431-gateway-car-trailer.md diff --git a/IPIP/0431-gateway-car-trailer.md b/IPIP/0431-gateway-car-trailer.md new file mode 100644 index 000000000..08e35cd0c --- /dev/null +++ b/IPIP/0431-gateway-car-trailer.md @@ -0,0 +1,94 @@ +--- +# IPIP number should match its pull request number. After you open a PR, +# please update title and update the filename to `ipip0000`. 
+title: "IPIP-0431: CAR metadata trailer in Gateway responses" +date: 2023-08-08 +ipip: proposal +editors: + - name: Miroslav Bajtoš +# relatedIssues: +# - link to issue +order: 0000 +tags: ['ipips', 'httpGateways'] +--- + +## Summary + +Define an optional enhancement of the CARv1 stream that allows a Gateway server to provide +additional metadata about the CARv1 response. Introduce a new content type that allows the client +and the server to signal or negotiate the inclusion of extra metadata. + +## Motivation + +SPARK is a Filecoin Station module that measures the reputation of Storage Providers by periodically +retrieving a random CID. Since both SPs and SPARK nodes are permissionless, and Proof of Retrieval +is an unsolved problem, we need a way to verify that a SPARK node retrieved the given CID from the +given SP. To enable that, we need the Trustless Gateway serving the retrieval request to include a +retrieval attestation after the entire response was sent to the client. + +We currently have no mechanism to signal that a CAR file transmission over HTTP completed +successfully. However, we need this in order to be able to use CARs as a way of serving streaming +responses for queries. One way of solving this problem is to append an extra block at the end of the +CAR stream with information that clients can use to check whether all CAR blocks have been received. + +## Detailed design + +CAR content type +(`[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)`) +already supports multiple parameters like `version` and `order`, which allows gateways to indicate +which CAR flavors is returned with the response. + +The proposed solution introduces a new parameters for the content type headers in HTTP requests +and responses: `meta`. + +When the content type parameter `meta` is set to `eof`, the Gateway will write one additional CAR +block with metadata to the response, after it sent all CAR blocks. 
+ +The metadata format is DAG-CBOR and open to extension. + +## Design rationale + +The proposal introduces a minimal change allowing Gateways and retrieval clients to explicitly opt +into receiving additional metadata block at the end of the CAR response stream. + +The metadata block is designed to be very flexible and able to support new use-cases that may arise +in the future. + +### User benefit + +- Clients of trustless gateways can use the fields from the metadata as an attestation that they +performed the retrieval from the given server. + +- The `len` field in the metadata block allows clients to verify whether they received all CAR +bytes. + +### Compatibility + +The new feature requires clients to explicitly ask the server to include the extra block, +therefore the change is fully backwards-compatible for all existing gateway clients. + +Gateways receiving requests for the new content type can ignore the `meta` parameter they don't +support and return back a response with one of the content types they support. This makes the +proposed change backwards-compatible for existing gateways too. + + +### Security + +The proposed specification change does not introduce any negative security implications. + +### Alternatives + +Instead of adding a new content type argument, we were considering sending the additional metadata +in HTTP response trailers. Unfortunately, HTTP trailers are not widely supported by the ecosystem. +Nginx proxy module discards them, browser `Fetch API` do not allow clients to access trailer +headers, neither does the Rust `reqwest` client. + +## Test fixtures + +TBD + +Using one CID, request the CAR data using various combinations of content type parameters. + +### Copyright + +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). 
From 1c0fbaa3cfeb23b98169f0c40eb2390a39640b62 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 8 Aug 2023 21:39:02 +0200 Subject: [PATCH 03/15] chore: editora fixes, prep for HTML preview --- IPIP/0431-gateway-car-trailer.md | 69 ++++++++++++++++++-------- src/http-gateways/trustless-gateway.md | 7 +-- 2 files changed, 52 insertions(+), 24 deletions(-) diff --git a/IPIP/0431-gateway-car-trailer.md b/IPIP/0431-gateway-car-trailer.md index 08e35cd0c..0bed2789b 100644 --- a/IPIP/0431-gateway-car-trailer.md +++ b/IPIP/0431-gateway-car-trailer.md @@ -1,15 +1,17 @@ --- -# IPIP number should match its pull request number. After you open a PR, -# please update title and update the filename to `ipip0000`. -title: "IPIP-0431: CAR metadata trailer in Gateway responses" +title: "IPIP-0431: Opt-in Extensible CAR Metadata on Trustless Gateway" date: 2023-08-08 ipip: proposal editors: - name: Miroslav Bajtoš -# relatedIssues: -# - link to issue -order: 0000 -tags: ['ipips', 'httpGateways'] + github: bajtos + affiliation: + name: Protocol Labs + url: https://protocol.ai/ +relatedIssues: + - https://github.com/filecoin-project/boost/issues/1597 +order: 431 +tags: ['ipips'] --- ## Summary @@ -26,25 +28,32 @@ is an unsolved problem, we need a way to verify that a SPARK node retrieved the given SP. To enable that, we need the Trustless Gateway serving the retrieval request to include a retrieval attestation after the entire response was sent to the client. -We currently have no mechanism to signal that a CAR file transmission over HTTP completed -successfully. However, we need this in order to be able to use CARs as a way of serving streaming +Aside from this specific use case, the IPFS Ecosystem at large has no reliable +mechanism to signal that a CAR file transmission over HTTP completed successfully. + +However, we need this in order to be able to use CARs as a way of serving streaming responses for queries. 
One way of solving this problem is to append an extra block at the end of the CAR stream with information that clients can use to check whether all CAR blocks have been received. ## Detailed design CAR content type -(`[application/vnd.ipld.car](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)`) -already supports multiple parameters like `version` and `order`, which allows gateways to indicate -which CAR flavors is returned with the response. +([`application/vnd.ipld.car`](https://www.iana.org/assignments/media-types/application/vnd.ipld.car)) +already supports optional parameters like `version` and `order`, which allows +HTTP client to opt-in via `Accept` header and Gateway to indicate via +`Content-Type` header which CAR flavor is returned with the response. -The proposed solution introduces a new parameters for the content type headers in HTTP requests +The proposed solution introduces a new parameter for the CAR content type in HTTP requests and responses: `meta`. -When the content type parameter `meta` is set to `eof`, the Gateway will write one additional CAR +When the CAR content type parameter `meta` is set to `eof`, the Gateway will write one additional CAR block with metadata to the response, after it sent all CAR blocks. -The metadata format is DAG-CBOR and open to extension. +The metadata format is DAG-CBOR and open to extension, allowing standardized +userland experimentation similar to the Extensible Data field from IPNS V2. + +See [CAR `meta` (content type parameter)](/http-gateways/trustless-gateway/#car-meta-content-type-parameter) +in Trustless Gateway specification for more details. ## Design rationale @@ -60,15 +69,15 @@ in the future. performed the retrieval from the given server. - The `len` field in the metadata block allows clients to verify whether they received all CAR -bytes. 
+bytes, which provides a backward-compatible solution for the [CARv1 streaming problem](https://github.com/ipfs/specs/pull/332) until new CAR version is introduced. ### Compatibility -The new feature requires clients to explicitly ask the server to include the extra block, +The new feature requires clients to explicitly ask the server to include the extra block via `Accept` header, therefore the change is fully backwards-compatible for all existing gateway clients. -Gateways receiving requests for the new content type can ignore the `meta` parameter they don't -support and return back a response with one of the content types they support. This makes the +Gateways receiving requests for the CAR content type can ignore the `meta` parameter they don't +support and return back a response with one of the CAR content types they support. This makes the proposed change backwards-compatible for existing gateways too. @@ -78,10 +87,28 @@ The proposed specification change does not introduce any negative security impli ### Alternatives +#### HTTP Trailers + Instead of adding a new content type argument, we were considering sending the additional metadata in HTTP response trailers. Unfortunately, HTTP trailers are not widely supported by the ecosystem. -Nginx proxy module discards them, browser `Fetch API` do not allow clients to access trailer -headers, neither does the Rust `reqwest` client. +Nginx proxy module discards them, [browser `Fetch API` does not allow JS clients to access trailer +headers](https://github.com/mdn/browser-compat-data/issues/14703), neither does the Rust `reqwest` client. + +#### New Content-Type + +We could introduce new `Content-Type: application/vnd.ipld.car-stream` and +create a specification of its wire format that wraps CARv1 and includes +additional DAG-CBOR manifest at the end. It would be effectively the same CAR +byte stream, but with different `Content-Type`. 
+ +Downsides of this solution: + +- maintenance cost, requires duplicating of all CAR-related tests and features +- opportunity cost, in creating new content type, we increase cognitive + overhead for everyone working with IPFS over HTTP +- no backward-compatible interop with existing tools and gateways that only + speak `application/vnd.ipld.car` +- distracts us away from working on things like large blocks and CARv3 ## Test fixtures diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index d4fedc634..946538b05 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -304,16 +304,17 @@ CID. ## CAR `meta` (content type parameter) -The `meta` parameter allows clients to request the server to include additional metadata about the +The `meta=eof` parameter allows clients to request the server to include additional metadata about the CAR to be included at the end of the response body. -This parameter can be used with `version=1` only. +This parameter SHOULD only be used with CAR `version=1`. +Values other than `eof` SHOULD be ignored. When the parameter is not set, the server must not add any extra CAR blocks to the response. The metadata block is a regular CAR block with the following properties: -- CID specifies multicodec `car-metada` (0x04ff), see +- CID specifies multicodec `car-metadata` (`0x04ff`), see [multicodec#334](https://github.com/multiformats/multicodec/pull/334). - The payload contains metadata encoded as DAG-CBOR. 
From a7e75d71cdcd29bbc70f904137eaca84f93bb1e0 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 8 Aug 2023 21:39:40 +0200 Subject: [PATCH 04/15] chore: enable HTML preview --- IPIP/0431-gateway-car-trailer.md => src/ipips/ipip-0431.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename IPIP/0431-gateway-car-trailer.md => src/ipips/ipip-0431.md (100%) diff --git a/IPIP/0431-gateway-car-trailer.md b/src/ipips/ipip-0431.md similarity index 100% rename from IPIP/0431-gateway-car-trailer.md rename to src/ipips/ipip-0431.md From 68715c4c20d3ffbf96707111468873e3293585e5 Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Tue, 8 Aug 2023 21:46:20 +0200 Subject: [PATCH 05/15] ipip-431: add upside to one of alternatives --- src/ipips/ipip-0431.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index 0bed2789b..74aed6f56 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -101,11 +101,15 @@ create a specification of its wire format that wraps CARv1 and includes additional DAG-CBOR manifest at the end. It would be effectively the same CAR byte stream, but with different `Content-Type`. 
+Upside of this solution: + +- does not require registering new codec, or sniffing the last DAG-CBOR block + Downsides of this solution: - maintenance cost, requires duplicating of all CAR-related tests and features -- opportunity cost, in creating new content type, we increase cognitive - overhead for everyone working with IPFS over HTTP +- ecosystem opportunity cost, in creating new content type, we increase + cognitive overhead for everyone working with IPFS over HTTP - no backward-compatible interop with existing tools and gateways that only speak `application/vnd.ipld.car` - distracts us away from working on things like large blocks and CARv3 From ed86a0fe2e340b24565dba678649d5a7b0341c3f Mon Sep 17 00:00:00 2001 From: Marcin Rataj Date: Thu, 10 Aug 2023 18:27:24 +0200 Subject: [PATCH 06/15] ipip-431: add CARv3 to Alternatives --- src/ipips/ipip-0431.md | 30 +++++++++++++++++++++++++----- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index 74aed6f56..f47fce654 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -96,14 +96,23 @@ headers](https://github.com/mdn/browser-compat-data/issues/14703), neither does #### New Content-Type -We could introduce new `Content-Type: application/vnd.ipld.car-stream` and -create a specification of its wire format that wraps CARv1 and includes -additional DAG-CBOR manifest at the end. It would be effectively the same CAR -byte stream, but with different `Content-Type`. +We could introduce a new content type that is not CARv3, but a thin envelope +around CARv1 with purpose of streaming over HTTP (e.g. `Content-Type: +application/vnd.ipld.car-stream`). + +It would have three fields: +- `car-stream-header` (optional DAG-CBOR) +- `car` (same as `application/vnd.ipld.car;version=1`) +- `car-stream-end` (optional DAG-CBOR) + +This will be enough to append DAG-CBOR manifest at the end of the stream. 
It +would be effectively the same CAR byte stream, but with different +`Content-Type`. Upside of this solution: -- does not require registering new codec, or sniffing the last DAG-CBOR block +- does not require registering new codec, or mixing data plane with control + plane, no sniffing the last DAG-CBOR block Downsides of this solution: @@ -114,6 +123,17 @@ Downsides of this solution: speak `application/vnd.ipld.car` - distracts us away from working on things like large blocks and CARv3 +#### Create CARv3 + +We could admit we've clearly hit limitation of what we can do with HTTP and CARv1 and CARv2 and stop abusing existing CARv1 by mixing data plane with control plane. + +Spend energy on creating CARv3 that solves the problems from "Motivation" section and more: +- optional index or key-value metadata before or after data +- native truncation detection and standardized error handling and passing during streaming +- support for things like [Large Blocks](https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093/) + +TODO: link to some public artifact about CARv3 + ## Test fixtures TBD From 5056bdee5bde6169809158a42f6ec5bec8f95429 Mon Sep 17 00:00:00 2001 From: patrickwoodhead Date: Tue, 3 Oct 2023 10:09:38 +0100 Subject: [PATCH 07/15] meta=eof+data update --- src/http-gateways/trustless-gateway.md | 35 +++++++++++++------------- src/ipips/ipip-0431.md | 29 ++++++++++++++++----- 2 files changed, 40 insertions(+), 24 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 946538b05..93eae3281 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -81,7 +81,7 @@ Below response types SHOULD be supported: - Disables IPLD/IPFS deserialization, requests a verifiable CAR stream to be returned, implementations MAY support optional CAR content type parameters (:cite[ipip-0412]), the explicit [CAR format signaling in HTTP Request](#car-format-signaling-in-request) - and 
the optional [CAR metadata block](#car-meta-content-type-parameter). + and the optional [metadata block](#meta-content-type-parameter). - [application/vnd.ipfs.ipns-record](https://www.iana.org/assignments/media-types/application/vnd.ipfs.ipns-record) - A verifiable :cite[ipns-record] (multicodec `0x0300`). @@ -302,31 +302,30 @@ of their presence in the DAG or the value assigned to the "dups" parameter, as the raw data is already present in the parent block that links to the identity CID. -## CAR `meta` (content type parameter) +## `meta` (content type parameter) -The `meta=eof` parameter allows clients to request the server to include additional metadata about the -CAR to be included at the end of the response body. +The `meta` parameter allows clients to request the server to include additional metadata about the CAR along with the response body. -This parameter SHOULD only be used with CAR `version=1`. -Values other than `eof` SHOULD be ignored. +The value of this parameter includes both the location where the metadata is given (e.g. `eof`) as well as the type of data received (e.g. `json`) separated by a `+`, to give a value such as `meta=eof+json` -When the parameter is not set, the server must not add any extra CAR blocks to the response. +When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with <0x00 byte> . -The metadata block is a regular CAR block with the following properties: +The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be of type `dag-json` (multicodec `0x0129`). -- CID specifies multicodec `car-metadata` (`0x04ff`), see - [multicodec#334](https://github.com/multiformats/multicodec/pull/334). +This parameter MUST only be used with CAR `version=1`. -- The payload contains metadata encoded as DAG-CBOR. 
+When the parameter is not set or does not equal `eof+json`, the server SHOULD not add any extra blocks to the response, neither the 0x00 byte nor any metadata. -The metadata MUST include the following fields: +When `meta=eof+json`, the dag-json object can include the following keys that SHOULD take values based on their corresponding definiton below. -- `len` - byte length of the CAR data (excluding the metadata block) -- `b3h` - Blake3 hash (checksum) of the CAR data (excluding the metadata block). -- `b3h_sig` - A signature over `` using server's Ed2559 identity. - - `len` is encoded as `varint`, - - `b3h` is encoded as 32 bytes, - - The effective query as executed by the gateway. This query is the request url - path and query string arguments. +- `car_bytes`: The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block) +- `data_bytes`: Total byte length of blocks (excluding the 0x00 byte and the metadata block, but including duplicates when present) +- `block_count`: Total number of blocks present in the CAR stream (excluding the 0x00 byte and the metadata block, but including duplicates when present) +- `car_cid`: A hash of the CAR stream giving a CIDv1 with 0x0202 codec +- `b3checksum`: A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block) +- `content_path`: The url path in the request as executed by the gateway +- `query_params`: The query string in the request as executed by the gateway +- `sig`: A signature, using the server's Ed2559 identity, over all other fields returned in the metadata block ## CAR format parameters and determinism diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index f47fce654..1c4746b7b 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -31,7 +31,7 @@ retrieval attestation after the entire response was sent to the client. 
Aside from this specific use case, the IPFS Ecosystem at large has no reliable mechanism to signal that a CAR file transmission over HTTP completed successfully. -However, we need this in order to be able to use CARs as a way of serving streaming +We need this in order to be able to use CARs as a way of serving streaming responses for queries. One way of solving this problem is to append an extra block at the end of the CAR stream with information that clients can use to check whether all CAR blocks have been received. @@ -46,11 +46,19 @@ HTTP client to opt-in via `Accept` header and Gateway to indicate via The proposed solution introduces a new parameter for the CAR content type in HTTP requests and responses: `meta`. -When the CAR content type parameter `meta` is set to `eof`, the Gateway will write one additional CAR -block with metadata to the response, after it sent all CAR blocks. +The `meta` parameter allows clients to request the server to include additional metadata about the CAR along with the response body. -The metadata format is DAG-CBOR and open to extension, allowing standardized -userland experimentation similar to the Extensible Data field from IPNS V2. +The value of this parameter includes both the location where the metadata is given (e.g. `eof`) as well as the type of data received (e.g. `json`) separated by a `+`, to give a value such as `meta=eof+json` + +When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with <0x00 byte> . + +The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be of type `dag-json` (multicodec `0x0129`). + +This parameter MUST only be used with CAR `version=1`. + +When the parameter is not set or does not equal `eof+json`, the server SHOULD not add any extra blocks to the response, neither the 0x00 byte nor any metadata. 
+ +This results in a example content type of `application/vnd.ipld.car;version=1;meta=eof+json` See [CAR `meta` (content type parameter)](/http-gateways/trustless-gateway/#car-meta-content-type-parameter) in Trustless Gateway specification for more details. @@ -68,9 +76,18 @@ in the future. - Clients of trustless gateways can use the fields from the metadata as an attestation that they performed the retrieval from the given server. -- The `len` field in the metadata block allows clients to verify whether they received all CAR +- For example, the metadata block could include a `car_bytes` field, the byte length of the CAR stream (excluding the metadata block). This would allow clients to verify whether they received all CAR bytes, which provides a backward-compatible solution for the [CARv1 streaming problem](https://github.com/ipfs/specs/pull/332) until new CAR version is introduced. +- As another example, the metadata block could include the `error` field. This would allow the server to pass back additional information about why the response is an error. + +- In the SPARK use case, retrieval clients would like to prove they have retrieved an entire file from a specific retrieval provder that has implemented the trustless gateway spec. The additional metadata block allows custom checksums and signatures to be passed along with the data, allowing the retrieval client to create a proof of correct retrieval. For SPARK, the metadata SHOULD include the following fields: + - `car_bytes` - total byte length of the CAR stream (excluding the meta block) + - `data_bytes` - total byte length of blocks (excluding the `meta` block, including duplicates when present) + - `block_count` - total number of blocks present in the CAR stream (excluding the `meta` block, including duplicates when present) + - `b3checksum` - A Blake3 hash (checksum) of the CAR data (excluding the metadata block). + - `sig` - A signature over the above fields using server's Ed2559 identity. 
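The `meta` value format described above (a location and a data type joined by `+`, carried as a parameter of the CAR content type) can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the proposal: the function names `parse_meta_param` and `wants_eof_json` are hypothetical, and the parser assumes a single media type with unquoted parameters, so it does not handle a full `Accept` list or quoted values.

```python
from typing import Optional, Tuple

CAR_MEDIA_TYPE = "application/vnd.ipld.car"


def parse_meta_param(content_type: str) -> Optional[Tuple[str, str]]:
    """Return (location, data_type) from a CAR content type, or None.

    Hypothetical helper: assumes one media type and unquoted parameters.
    """
    parts = [p.strip() for p in content_type.split(";")]
    if parts[0].lower() != CAR_MEDIA_TYPE:
        return None
    for param in parts[1:]:
        name, sep, value = param.partition("=")
        if sep and name.strip().lower() == "meta":
            # The value is "<location>+<data type>", e.g. "eof+json".
            location, plus, data_type = value.strip().partition("+")
            if plus and location and data_type:
                return (location, data_type)
            return None
    return None


def wants_eof_json(content_type: str) -> bool:
    """True only for the one combination the text above describes."""
    return parse_meta_param(content_type) == ("eof", "json")
```

For example, `wants_eof_json("application/vnd.ipld.car;version=1;meta=eof+json")` is true, while a content type without `meta`, or with any other value, yields false, matching the rule that the server adds nothing extra in that case.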
+
 ### Compatibility
 
 The new feature requires clients to explicitly ask the server to include the extra block via `Accept` header,
From 9170c29e207c0a0b070bb00004a24222d692f935 Mon Sep 17 00:00:00 2001
From: patrickwoodhead
Date: Wed, 4 Oct 2023 11:39:04 +0100
Subject: [PATCH 08/15] metadata schema update including json schema usage

---
 src/http-gateways/trustless-gateway.md | 63 ++++++++++++++++++++++----
 src/ipips/ipip-0431.md                 | 41 +++++++++++++----
 2 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md
index 93eae3281..9e59258bc 100644
--- a/src/http-gateways/trustless-gateway.md
+++ b/src/http-gateways/trustless-gateway.md
@@ -310,22 +310,65 @@ The value of this parameter includes both the location where the metadata is giv
 
 When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with <0x00 byte> .
 
-The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be of type `dag-json` (multicodec `0x0129`).
+The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be a JSON object.
 
 This parameter MUST only be used with CAR `version=1`.
 
 When the parameter is not set or does not equal `eof+json`, the server SHOULD not add any extra blocks to the response, neither the 0x00 byte nor any metadata.
 
-When `meta=eof+json`, the dag-json object can include the following keys that SHOULD take values based on their corresponding definiton below.
+When `meta=eof+json`, the JSON object SHOULD conform to the following [JSON schema](https://json-schema.org/).
+
+```json
+{
+  "properties": {
+    "description": "Properties of the response",
+    "type": "object"
+  },
+  "error": {
+    "description": "Error message",
+    "type": "string"
+  },
+  "sig": {
+    "description": "A signature, using the server's Ed25519 identity, over the metadata properties object",
+    "type": "string"
+  }
+}
+```
 
-- `car_bytes`: The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)
-- `data_bytes`: Total byte length of blocks (excluding the 0x00 byte and the metadata block, but including duplicates when present)
-- `block_count`: Total number of blocks present in the CAR stream (excluding the 0x00 byte and the metadata block, but including duplicates when present)
-- `car_cid`: A hash of the CAR stream giving a CIDv1 with 0x0202 codec
-- `b3checksum`: A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)
-- `content_path`: The url path in the request as executed by the gateway
-- `query_params`: The query string in the request as executed by the gateway
-- `sig`: A signature, using the server's Ed2559 identity, over all other fields returned in the metadata block
+The properties object can include any fields that the server would like to implement. The following properties fields are mentioned explicitly to reach a convention on their definition as they have existing use cases.
+
+```json
+{
+  "car_bytes": {
+    "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)",
+    "type": "integer"
+  },
+  "data_bytes": {
+    "description": "Total byte length of blocks (excluding the 0x00 byte and the metadata block, but including duplicates when present)",
+    "type": "integer"
+  },
+  "block_count": {
+    "description": "Total number of blocks present in the CAR stream (excluding the 0x00 byte and the metadata block, but including duplicates when present)",
+    "type": "integer"
+  },
+  "car_cid": {
+    "description": "A hash of the CAR stream giving a CIDv1 with 0x0202 codec",
+    "type": "string"
+  },
+  "b3checksum": {
+    "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)",
+    "type": "string"
+  },
+  "content_path": {
+    "description": "The url path in the request as executed by the gateway",
+    "type": "string"
+  },
+  "query_params": {
+    "description": "The query string in the request as executed by the gateway",
+    "type": "string"
+  }
+}
+```
 
 ## CAR format parameters and determinism
 
diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md
index 1c4746b7b..e1bad1ae3 100644
--- a/src/ipips/ipip-0431.md
+++ b/src/ipips/ipip-0431.md
@@ -79,14 +79,39 @@ performed the retrieval from the given server.
 
 - For example, the metadata block could include a `car_bytes` field, the byte length of the CAR stream (excluding the metadata block). This would allow clients to verify whether they received all CAR
 bytes, which provides a backward-compatible solution for the [CARv1 streaming problem](https://github.com/ipfs/specs/pull/332) until new CAR version is introduced.
-- As another example, the metadata block could include the `error` field. This would allow the server to pass back additional information about why the response is an error.
- -- In the SPARK use case, retrieval clients would like to prove they have retrieved an entire file from a specific retrieval provder that has implemented the trustless gateway spec. The additional metadata block allows custom checksums and signatures to be passed along with the data, allowing the retrieval client to create a proof of correct retrieval. For SPARK, the metadata SHOULD include the following fields: - - `car_bytes` - total byte length of the CAR stream (excluding the meta block) - - `data_bytes` - total byte length of blocks (excluding the `meta` block, including duplicates when present) - - `block_count` - total number of blocks present in the CAR stream (excluding the `meta` block, including duplicates when present) - - `b3checksum` - A Blake3 hash (checksum) of the CAR data (excluding the metadata block). - - `sig` - A signature over the above fields using server's Ed2559 identity. +- As another example, the metadata object includes the `error` field, allowing the server to pass back additional information about why the response is an error, such as why the CAR stream was incomplete. + +- In the SPARK use case, retrieval clients would like to prove they have retrieved an entire file from a specific retrieval provder that has implemented the trustless gateway spec. The additional metadata block allows checksums and signatures to be passed along with the data, allowing the retrieval client to create a proof of correct retrieval. 
For SPARK, the metadata properties object SHOULD include the following fields: + +```json +{ + "car_bytes": { + "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)", + "type": "integer" + }, + "data_bytes": { + "description": "Total byte length of blocks (excluding the 0x00 byte and the metadata block, but including duplicates when present)", + "type": "integer" + }, + "block_count": { + "description": "Total number of blocks present in the CAR stream (excluding the 0x00 byte and the metadata block, but including duplicates when present)", + "type": "integer" + }, + "b3checksum": { + "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", + "type": "string" + }, + "content_path": { + "description": "The url path in the request as executed by the gateway", + "type": "string" + }, + "query_params": { + "description": "The query string in the request as executed by the gateway", + "type": "string" + } +} +``` +The metadata `sig` field SHOULD also be populated, returning a signature, using the server's Ed2559 identity, over the metadata properties object. 
### Compatibility From 65ffcfc7541efab7d32c6adfceffbd9c148531ae Mon Sep 17 00:00:00 2001 From: patrickwoodhead <91056047+patrickwoodhead@users.noreply.github.com> Date: Wed, 4 Oct 2023 16:24:39 +0100 Subject: [PATCH 09/15] Update content path and query params to retrieval params MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Miroslav Bajtoš --- src/http-gateways/trustless-gateway.md | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 9e59258bc..637058ac4 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -359,13 +359,9 @@ The properties object can include any fields that the server would like to imple "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", "type": "string" }, - "content_path": { - "description": "The url path in the request as executed by the gateway", - "type": "string" - }, - "query_params": { - "description": "The query string in the request as executed by the gateway", - "type": "string" + "retrieval_params": { + "description": "Retrieval parameters describing what the client requested from the gateway", + "type": "object" } } ``` From 9d1b61f49a18712d1e34ccda598aef9225a91267 Mon Sep 17 00:00:00 2001 From: patrickwoodhead Date: Wed, 4 Oct 2023 16:37:48 +0100 Subject: [PATCH 10/15] fixes from Miros feedback --- src/http-gateways/trustless-gateway.md | 2 +- src/ipips/ipip-0431.md | 10 +--------- 2 files changed, 2 insertions(+), 10 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 637058ac4..a834cf175 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -344,7 +344,7 @@ The properties object can include any fields that the server would like to imple "type": "integer" }, 
"data_bytes": { - "description": "Total byte length of blocks (excluding the 0x00 byte and the metadata block, but including duplicates when present)", + "description": "Total byte length of the flat file before it was encoded into a CAR file", "type": "integer" }, "block_count": { diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index e1bad1ae3..481408845 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -52,7 +52,7 @@ The value of this parameter includes both the location where the metadata is giv When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with <0x00 byte> . -The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be of type `dag-json` (multicodec `0x0129`). +The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be a JSON object. This parameter MUST only be used with CAR `version=1`. @@ -89,14 +89,6 @@ bytes, which provides a backward-compatible solution for the [CARv1 streaming pr "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)", "type": "integer" }, - "data_bytes": { - "description": "Total byte length of blocks (excluding the 0x00 byte and the metadata block, but including duplicates when present)", - "type": "integer" - }, - "block_count": { - "description": "Total number of blocks present in the CAR stream (excluding the 0x00 byte and the metadata block, but including duplicates when present)", - "type": "integer" - }, "b3checksum": { "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", "type": "string" From 93c3c2825930a9b786b9b1652af6b0c5f8c85288 Mon Sep 17 00:00:00 2001 From: patrickwoodhead Date: Wed, 4 Oct 2023 16:45:59 +0100 Subject: [PATCH 11/15] json schema wrapper around top level object --- src/http-gateways/trustless-gateway.md | 24 
++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index a834cf175..61140e5f8 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -320,17 +320,21 @@ When `meta=eof+json`, the JSON object SHOULD conform to the following [JSON sche ```json { + "type": "object", "properties": { - "description": "Properties of the response" - "type": "object" - }, - "error": { - "description": "Error message" - "type": "string" - }, - "sig": { - "description": "A signature, using the server's Ed2559 identity, over the metadata properties object" - "type": "string" + "data": { + "description": "Properties of the response" + "type": "object" + }, + "error": { + "description": "Error message" + "type": "string" + }, + "sig": { + "description": "A signature, using the server's Ed2559 identity, over the metadata properties object" + "type": "string" + }, + "required": [] } } ``` From eacf51aa936ece3602c42d55193a7cd68dae2d89 Mon Sep 17 00:00:00 2001 From: patrickwoodhead Date: Fri, 13 Oct 2023 10:13:54 +0100 Subject: [PATCH 12/15] dag and car params --- src/http-gateways/trustless-gateway.md | 78 ++++++++++++++++++-------- src/ipips/ipip-0431.md | 60 +++++++++++++++----- 2 files changed, 99 insertions(+), 39 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 61140e5f8..1dcb73a92 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -339,34 +339,64 @@ When `meta=eof+json`, the JSON object SHOULD conform to the following [JSON sche } ``` -The properties object can include any fields that the server would like to implement. The following properties fields are mentioned explicitly to reach a convention on their definition as they have existing use cases. 
+The properties object can include any fields that the server would like to implement. The following JSON schema explicitly mentions certain properties fields in order to reach a convention on their definition as they have existing use cases. ```json { - "car_bytes": { - "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)", - "type": "integer" - }, - "data_bytes": { - "description": "Total byte length of the flat file before it was encoded into a CAR file", - "type": "integer" - }, - "block_count": { - "description": "Total number of blocks present in the CAR stream (excluding the 0x00 byte and the metadata block, but including duplicates when present)", - "type": "integer" - }, - "car_cid": { - "description": "A hash of the CAR stream giving a CIDv1 with 0x0202 codec", - "type": "string" - }, - "b3checksum": { - "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", - "type": "string" + "type": "object", + "properties": { + "car_bytes": { + "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)", + "type": "integer" + }, + "data_bytes": { + "description": "Total byte length of the flat file before it was encoded into a CAR file", + "type": "integer" + }, + "block_count": { + "description": "Total number of blocks present in the CAR stream (excluding the 0x00 byte and the metadata block, but including duplicates when present)", + "type": "integer" + }, + "car_cid": { + "description": "A hash of the CAR stream giving a CIDv1 with 0x0202 codec", + "type": "string" + }, + "b3checksum": { + "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", + "type": "string" + }, + "dag_params": { + "description": "A map with DAG params like dag-scope, entity-bytes from [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/)", + "type": "object", + "properties": { + 
"dag-scope": { + "description": "See [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/) for the definition", + "type": "string" + }, + "entity-bytes": { + "description": "See [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/) for the definition", + "type": "string" + } + }, + "required": [] + }, + "car_params": { + "description": "A map with CAR content type params like order and dups from [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/)", + "type": "object", + "properties": { + "order": { + "description": "See [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/) for the definition.", + "type": "string" + }, + "dups": { + "description": "See [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/) for the definition.", + "type": "string" + } + }, + "required": [] + } }, - "retrieval_params": { - "description": "Retrieval parameters describing what the client requested from the gateway", - "type": "object" - } + "required": [] } ``` diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index 481408845..8e232c784 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -85,21 +85,51 @@ bytes, which provides a backward-compatible solution for the [CARv1 streaming pr ```json { - "car_bytes": { - "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)", - "type": "integer" - }, - "b3checksum": { - "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", - "type": "string" - }, - "content_path": { - "description": "The url path in the request as executed by the gateway", - "type": "string" - }, - "query_params": { - "description": "The query string in the request as executed by the gateway", - "type": "string" + "type": "object", + "properties": { + "car_bytes": { + "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)", + "type": "integer" + }, + "b3checksum": { + "description": "A Blake3 hash 
(checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", + "type": "string" + }, + "content_path": { + "description": "The url path in the request as executed by the gateway", + "type": "string" + }, + "dag_params": { + "description": "A map with DAG params like dag-scope, entity-bytes from [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/)", + "type": "object", + "properties": { + "dag-scope": { + "description": "See [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/) for the definition", + "type": "string" + }, + "entity-bytes": { + "description": "See [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/) for the definition", + "type": "string" + } + }, + "required": [] + }, + "car_params": { + "description": "A map with CAR content type params like order and dups from [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/)", + "type": "object", + "properties": { + "order": { + "description": "See [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/) for the definition.", + "type": "string" + }, + "dups": { + "description": "See [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/) for the definition.", + "type": "string" + } + }, + "required": [] + }, + "required": ["car_bytes", "b3checksum", "content_path", "dag_params", "car_params"] } } ``` From 62fb207f2edcaed85e2a2ec9fd6cbf0765e3cc55 Mon Sep 17 00:00:00 2001 From: patrickwoodhead Date: Fri, 13 Oct 2023 13:08:18 +0100 Subject: [PATCH 13/15] more alternatvies from discussion added --- src/ipips/ipip-0431.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index 8e232c784..bba336216 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -196,7 +196,15 @@ Spend energy on creating CARv3 that solves the problems from "Motivation" sectio - native truncation detection and standardized error handling and passing during streaming - support for things like [Large 
Blocks](https://discuss.ipfs.tech/t/supporting-large-ipld-blocks/15093/) -TODO: link to some public artifact about CARv3 +TODO: link to some public artifact about CARv3 + +#### Create a new multicodec for this metadata block + +Initially, we proposed to create a new multicodec for this metadata block called `car-metadata`. This was ruled out due to some concerns that you can find documented [here](https://github.com/multiformats/multicodec/pull/334#issuecomment-1668086641). + +#### Using CBOR instead of JSON for the metadata block + +We could use CBOR instead of JSON for the metadata block. However, it was [decided](https://github.com/ipfs/specs/pull/431#issuecomment-1719634928) to opt for user readability over number of bytes since CBOR doesn't greatly reduce the number of bytes in a key-value map compared with JSON. ## Test fixtures From e1fc2963ff1c1ceee0348778743bab21f641fdd0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Wed, 18 Oct 2023 14:18:46 +0200 Subject: [PATCH 14/15] add Patrick as a co-editor MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- src/ipips/ipip-0431.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index bba336216..a2f96219f 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -8,6 +8,11 @@ editors: affiliation: name: Protocol Labs url: https://protocol.ai/ + - name: Patrick Woodhead + github: patrickwoodhead + affiliation: + name: Protocol Labs + url: https://protocol.ai/ relatedIssues: - https://github.com/filecoin-project/boost/issues/1597 order: 431 From 152f4a67feda0a695fddced5022c86e13550d447 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Wed, 18 Oct 2023 16:03:18 +0200 Subject: [PATCH 15/15] formatting cleanup, remove duplicate schema, describe attack vectors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- src/http-gateways/trustless-gateway.md | 20 ++++-- src/ipips/ipip-0431.md | 93 ++++++++++---------------- 2 files changed, 48 insertions(+), 65 deletions(-) diff --git a/src/http-gateways/trustless-gateway.md b/src/http-gateways/trustless-gateway.md index 1dcb73a92..9b7059eb5 100644 --- a/src/http-gateways/trustless-gateway.md +++ b/src/http-gateways/trustless-gateway.md @@ -308,7 +308,11 @@ The `meta` parameter allows clients to request the server to include additional The value of this parameter includes both the location where the metadata is given (e.g. `eof`) as well as the type of data received (e.g. `json`) separated by a `+`, to give a value such as `meta=eof+json` -When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with <0x00 byte> . +When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with the following response body: + +``` + <0x00 byte> +``` The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be a JSON object. @@ -323,16 +327,16 @@ When `meta=eof+json`, the JSON object SHOULD conform to the following [JSON sche "type": "object", "properties": { "data": { + "type": "object", "description": "Properties of the response" - "type": "object" }, "error": { + "type": "string", "description": "Error message" - "type": "string" }, "sig": { - "description": "A signature, using the server's Ed2559 identity, over the metadata properties object" - "type": "string" + "type": "string", + "description": "A signature, using the server's Ed25519 identity, over the `data` object serialized as JSON."
}, "required": [] } @@ -362,7 +366,11 @@ The properties object can include any fields that the server would like to imple "type": "string" }, "b3checksum": { - "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", + "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block). The value should be serialized as a multihash with multibase prefix, preferably using Base58 encoding.", + "type": "string" + }, + "content_path": { + "description": "The url path in the request as executed by the gateway, e.g. `/ipfs/bafy1234/cat.jpg`. The query string MUST BE stripped from the path.", "type": "string" }, "dag_params": { diff --git a/src/ipips/ipip-0431.md b/src/ipips/ipip-0431.md index a2f96219f..190486912 100644 --- a/src/ipips/ipip-0431.md +++ b/src/ipips/ipip-0431.md @@ -21,8 +21,8 @@ tags: ['ipips'] ## Summary -Define an optional enhancement of the CARv1 stream that allows a Gateway server to provide -additional metadata about the CARv1 response. Introduce a new content type that allows the client +Define an optional enhancement of the CARv1 response that allows a Gateway server to provide +additional metadata about the CARv1 stream. Introduce a new content type that allows the client and the server to signal or negotiate the inclusion of extra metadata. ## Motivation @@ -30,13 +30,13 @@ and the server to signal or negotiate the inclusion of extra metadata. SPARK is a Filecoin Station module that measures the reputation of Storage Providers by periodically retrieving a random CID. Since both SPs and SPARK nodes are permissionless, and Proof of Retrieval is an unsolved problem, we need a way to verify that a SPARK node retrieved the given CID from the -given SP. To enable that, we need the Trustless Gateway serving the retrieval request to include a +given SP. 
To enable that, we want the Trustless Gateway serving the retrieval request to include a retrieval attestation after the entire response was sent to the client. Aside from this specific use case, the IPFS Ecosystem at large has no reliable mechanism to signal that a CAR file transmission over HTTP completed successfully. -We need this in order to be able to use CARs as a way of serving streaming +We need such a signalling mechanism in order to be able to use CARs as a way of serving streaming responses for queries. One way of solving this problem is to append an extra block at the end of the CAR stream with information that clients can use to check whether all CAR blocks have been received. @@ -55,7 +55,11 @@ The `meta` parameter allows clients to request the server to include additional The value of this parameter includes both the location where the metadata is given (e.g. `eof`) as well as the type of data received (e.g. `json`) separated by a `+`, to give a value such as `meta=eof+json` -When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with <0x00 byte> . +When the location parameter is set to `eof`, which is currently the only supported value, the server SHOULD respond with the following response body: + +``` + <0x00 byte> +``` The only supported value for the data type parameter is `json`. This signifies that the metadata MUST be a JSON object. @@ -86,59 +90,9 @@ bytes, which provides a backward-compatible solution for the [CARv1 streaming pr - As another example, the metadata object includes the `error` field, allowing the server to pass back additional information about why the response is an error, such as why the CAR stream was incomplete. -- In the SPARK use case, retrieval clients would like to prove they have retrieved an entire file from a specific retrieval provder that has implemented the trustless gateway spec.
The additional metadata block allows checksums and signatures to be passed along with the data, allowing the retrieval client to create a proof of correct retrieval. For SPARK, the metadata properties object SHOULD include the following fields: - -```json -{ - "type": "object", - "properties": { - "car_bytes": { - "description": "The total byte length of the CAR stream (excluding the 0x00 byte and the metadata block)", - "type": "integer" - }, - "b3checksum": { - "description": "A Blake3 hash (checksum) of the CAR stream (excluding the 0x00 byte and the metadata block)", - "type": "string" - }, - "content_path": { - "description": "The url path in the request as executed by the gateway", - "type": "string" - }, - "dag_params": { - "description": "A map with DAG params like dag-scope, entity-bytes from [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/)", - "type": "object", - "properties": { - "dag-scope": { - "description": "See [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/) for the definition", - "type": "string" - }, - "entity-bytes": { - "description": "See [IPIP-402](https://specs.ipfs.tech/ipips/ipip-0402/) for the definition", - "type": "string" - } - }, - "required": [] - }, - "car_params": { - "description": "A map with CAR content type params like order and dups from [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/)", - "type": "object", - "properties": { - "order": { - "description": "See [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/) for the definition.", - "type": "string" - }, - "dups": { - "description": "See [IPIP-412](https://specs.ipfs.tech/ipips/ipip-0412/) for the definition.", - "type": "string" - } - }, - "required": [] - }, - "required": ["car_bytes", "b3checksum", "content_path", "dag_params", "car_params"] - } -} -``` -The metadata `sig` field SHOULD also be populated, returning a signature, using the server's Ed2559 identity, over the metadata properties object. 
+- In the SPARK use case, retrieval clients would like to prove they have retrieved an entire file from a specific retrieval provider that has implemented the trustless gateway spec. The additional metadata block allows checksums and signatures to be passed along with the data, allowing the retrieval client to create a proof of correct retrieval. + +- The metadata `sig` field SHOULD also be populated, returning a signature, using the server's Ed25519 identity, over the metadata properties object. This allows gateway clients to submit the metadata block as an attestation of retrieval that third parties can verify. ### Compatibility @@ -149,10 +103,31 @@ Gateways receiving requests for the CAR content type can ignore the `meta` param support and return back a response with one of the CAR content types they support. This makes the proposed change backwards-compatible for existing gateways too. +All metadata fields are optional to allow different applications to experiment with different metadata. Future IPIPs may standardize metadata fields that are observed to be widely used. ### Security -The proposed specification change does not introduce any negative security implications. +#### Zero-length-block insertion attacks + +The idea of using the zero-length block (a single byte `0x00`) to signal the end of the CARv1 stream has already been considered in the past. + +> CARv1 is nicely sectioned, such that each section has a specific length, you know when it ends. In the [ZeroLengthSectionAsEOF](https://pkg.go.dev/github.com/ipld/go-car/v3#ZeroLengthSectionAsEOF) mode, when it gets to a new section and reads a 0x00, i.e. zero length (sections are prefixed with a length varint), it treats that as the end of the CAR. So all it takes with this turned on is to attach a 0x00 to the end of a stream and you get your EOF.
+> +> The background for this is the power-of-two padding that is needed for a Filecoin sector — stick a CAR into the sector and fill it out with zeros but have no way of saying that the CAR is x-bytes long; hence the need for an EOF signal, which is this. + +However, introducing a `0x00` into CARv1 spec would create a security vulnerability: +- Tools and services not aware of these new semantics will happily accept a CARv1 payload containing zero-length blocks in the middle. +- Tools and services treating `0x00` as EOF will discard the remaining blocks in such a CARv1 file + after encountering the zero-length block. + + Our proposal avoids this attack vector: + - It does not change the current semantics of CARv1. Zero-length blocks remain invalid. + - Instead, we treat the response body as a new container format combining the CARv1 file with additional data. + - Clients must explicitly request this new container format. Existing clients not aware of the new metadata will not receive responses in the new format. + +#### Denial of Service attacks + +Computing the signature for the metadata block has a non-negligible performance cost. To mitigate DoS attacks, we designed the metadata to be highly cacheable. When a gateway receives two requests for the same content, it can return the same metadata block in both responses, including the signature. This allows gateway operators to deploy a traditional caching layer operating at the HTTP protocol level; the cache does not need to understand any specifics of IPFS and Trustless Gateway protocols. ### Alternatives