Add RTCRtpEncodedSource and explainer #198

guidou · 2024-02-08T13:22:58Z

Add an explainer for the upcoming RTCRtpSenderEncodedSource extension proposal

Preview | Diff

guidou · 2024-03-05T15:55:31Z

For easier reading: https://guidou.github.io/webrtc-extensions/#encoded-source-for-rtc-rtp-sender

aboba · 2024-03-07T15:26:55Z

It looks like this PR inadvertently removes Section 14 Event Summary. I fixed this.

guidou · 2024-03-14T10:08:09Z

It looks like this PR inadvertently removes Section 14 Event Summary. I fixed this.

Thanks!

youennf · 2024-03-14T15:36:43Z

My main thought is that this API requires to wait for all packets of a frame, which creates some latency. This solution does not allow to do what an SFU is doing with the same level of performance. It might be a good enough compromise. If we already consider supporting packet forwarding, maybe that is what we should do instead.

API wise, transferring seems harder than using what we have done for RTCRtpScriptTransform. We should consider the pros and cons of both approaches.
For instance, the current proposal is allowing to transfer to cross agent cluster dedicated workers, which is not allowed with RTCRtpScriptTransform. I see that more as an issue than a feature.
I also think that an API where you create the source where you use it (instead of creating in worker and then transferring) is slightly easier to use.
From an implementation point of view, it would also be simpler (at least in Safari) compared to supporting transferability.
From a spec point of view, it would also be simpler since we would not have to deal with when you can transfer and when you cannot.

It is not clear why RTCRtpSenderEncodedSource is not providing a WritableStream like done for RTCRtpScriptTransform.

guidou · 2024-03-18T13:48:17Z

My main thought is that this API requires to wait for all packets of a frame, which creates some latency. This solution does not allow to do what an SFU is doing with the same level of performance. It might be a good enough compromise. If we already consider supporting packet forwarding, maybe that is what we should do instead.

This is a good compromise for the use case of glitch-free forwarding with multiple input peer-connections with failover. In this case, frames provide a convenient abstraction to do failover quickly (no need for timeouts, just forward a frame from the first peer connection that provides it). It is also ergonomic for some other SFU-like operations where the outcome is frame-based (e.g., drop frames that don't satisfy a certain property).

There is also an effort going on for a packet-level API, although we have not heard yet about developers interested in that API for the use case of glitch-free forwarding using multiple input peer connections. There are other use cases driving the design of that API.

API wise, transferring seems harder than using what we have done for RTCRtpScriptTransform. We should consider the pros and cons of both approaches. For instance, the current proposal is allowing to transfer to cross agent cluster dedicated workers, which is not allowed with RTCRtpScriptTransform. I see that more as an issue than a feature. I also think that an API where you create the source where you use it (instead of creating in worker and then transferring) is slightly easier to use.

We actually only need to transfer to workers within the same agent cluster, so maybe a clarification is needed in the text of the proposed spec. However, I do agree with you that it is better to be able to create the source where you use it.

The main reasons the proposal exposes RTCRtpSenderEncodedSource only on DedicatedWorker are:

Your proposal on which this is text is based exposes it only on workers
In other similar APIs we have always had agreement about exposing on DedicatedWorker, and disagreement about exposing on Window. So, I thought exposing only on DedicatedWorker would lead to faster consensus, while the larger discussion about exposing things on Window could continue separately.

From an implementation point of view, it would also be simpler (at least in Safari) compared to supporting transferability From a spec point of view, it would also be simpler since we would not have to deal with when you can transfer and when you cannot.

I agree with you on this point. The idea was to deviate as little as possible from w3c/webrtc-encoded-transform#211 (comment) in order to make it easier to achieve consensus, but if you agree about exposing RTCRtpSenderEncodedSource on Window, then we can simplify this by removing transferrability.

It is not clear why RTCRtpSenderEncodedSource is not providing a WritableStream like done for RTCRtpScriptTransform.

I agree that a WritableStream would provide more flexibility here. Again, the reason was to minimize deviations from w3c/webrtc-encoded-transform#211 (comment)

If I understand correctly, your concerns can be summarized as follows:

The extra latency a frame-based API has compared with a potential packet-based API
RTCRtpSenderEncodedSource is exposed only on DedicatedWorker.
RTCRtpSenderEncodedSourceHandle is transferrable.
RTCRtpSenderEncodedSource does not expose a WritableStream to write frames.

We have already presented arguments in favor of the frame-based API and in previous discussions we have concluded that while latency may be a small disadvantage in some cases, frame-based has some clear advantages, in particular for the scenario of forwarding with glitch-free failover over multiple input peer connections.

The concerns about the spec text I think can be addressed as follows:

Expose RTCRtpSenderEncodedSource on Window.
Make RTCRtpSenderEncodedSourceHandle not transferrable, or maybe just remove it and use the source directly.
Expose a WritableStream instead of an enqueue() method.

WDYT?

youennf · 2024-03-18T17:40:57Z

I would summarise my concern as:

The potential redundancy and the extra latency a frame-based API has compared with a potential packet-based API
My other non blocking comment is API shape (keeping the feature set as it is, meaning let's keep it to dedicated worker in this version):
Evaluate transfer-based vs. script-transform-like API

As I commented on w3c/webrtc-encoded-transform#211 (comment), there are two different API proposals. The first one is well described in the explainer.
The second is a script transform like API, something like:

[Exposed=Window]
interface RTCRtpSenderEncodedSource {
    constructor(Worker worker, optional any options, optional sequence<object> transfer)
};
[Exposed=DedicatedWorker]
interface RTCRtpSenderEncodedSourceController {
    WritableStream writable; // Or enqueue method.
    // Need congestion API, error API and enqueue API
};
partial interface DedicatedWorkerGlobalScope {
    attribute EventHandler onrtcencodedsource;
}

From a user perspective, there is probably no difference.
There are probably some differences in terms of future-proofness, ease of use, ease of spec/dev...

jan-ivar · 2024-03-18T21:33:17Z

relay-explainer.md

+```
+// code in main.js file
+const worker = new Worker('worker.js');
+
+// Let recvPc1, recvPc2 be the receiving PCs. 
+recvPc{1|2}.ontrack = evt => {
+ evt.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverTransform" });
+};
+
+
+// Let relayPc be the PC used to relay frames to the next peer. 
+worker.onmessage = evt => {
+ relayPc.replaceTrack(evt.data);
+};
+```
+
+```
+// code in worker.js file
+async function relayFrames(reader, writer, encodedSource) {
+ if(!reader || !writer || !encodedSource){
+ return;
+ }
+ while (true) {
+ const {frame, done} = await reader.read();
+ if (done) return;
+
+ let newFrame = new RTCRtpEncodedVideoFrame(frame, getUnifiedMetadata(frame));
+ if(!isDuplicate(newFrame)) {
+ encodedSource.enqueue(newFrame);
+ }
+ // Put the original frame back in the receiver PC
+ writer.write(frame);
+ }
+}
+
+// Code to instantiate reader and writer from the RTPReceiver and RTPSender.
+onrtctransform = (event) => {
+ if (event.transformer.options.name == "receiverTransform") {
+ reader = event.transformer.readable;
+ writer = event.transformer.writable;
+ if (!encodedSource) {
+ encodedSource = new RTCRtpSenderEncodedSource();
+ postMessage(encodedSource.handle);
+ }
+ } else {
+ return;
+ }
+
+ relayFrames(reader, writer, encodedSource);
+};
+```


@guidou I think I found a couple of bugs and I'd prefer if we used .pipeTo. LMK if I got this right:

// main.js const worker = new Worker('worker.js'); // Let recvPc1, recvPc2 be the receiving PCs. recvPc1.ontrack = ({receiver}) => receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"}); recvPc2.ontrack = ({receiver}) => receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"}); // Let relayPc be the PC used to relay frames to the next peer. const [sender] = relayPc.getSenders(); worker.onmessage = async ({data}) => await sender.replaceTrack(data.handle); // worker.js let encodedSource; onrtctransform = async ({transformer: {readable, writable, options}}) => { if (options.name != "receiverTransform") return; if (!encodedSource) { encodedSource = new RTCRtpSenderEncodedSource(); postMessage({handle: encodedSource.handle}); } await readable.pipeThrough(new TransformStream({transform})).pipeTo(writable); function transform(frame, controller) { const newFrame = new RTCRtpEncodedVideoFrame(frame, getUnifiedMetadata(frame)); if (!isDuplicate(newFrame)) { encodedSource.enqueue(newFrame); } controller.enqueue(frame); } }

But how well will this will work? recvPc1 and recvPc2 are racing here to provide frames to a single outgoing relayPC? You mention redundancy as the reason, but it's odd seeing recvPc1, recvPc2 being merged into a single output. It looks like fan-in, not a fan-out. Is this important to emphasize in the example?

I think I found a couple of bugs and I'd prefer if we used .pipeTo. LMK if I got this right:
Yes, you got it right. I'll update the explainer to use this version.

But how well will this will work? recvPc1 and recvPc2 are racing here to provide frames to a single outgoing relayPC? You mention redundancy as the reason, but it's odd seeing recvPc1, recvPc2 being merged into a single output. It looks like fan-in, not a fan-out. Is this important to emphasize in the example?

Yes, I would call this fan-in and it is key for glitch-free failover, so it important to emphasize it.
I will also add fan-out to the example for completeness.

But what about the racing? onrtctransform is going to fire twice here and free-run pumping frames from recvPc1 and recvPc2, two independent peer connections, through the transform function in parallel. What syncs up the output from these two independent sources? What assures these encoded frames have any relation? isDuplicate?

For fail-over, what's wrong with replaceTrack?

Redo based on @youennf's proposed API:

// main.js const worker = new Worker('worker.js'); // Let recvPc1, recvPc2 be the receiving PCs. recvPc1.ontrack = ({receiver}) => receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"}); recvPc2.ontrack = ({receiver}) => receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"}); // Let relayPc be the PC used to relay frames to the next peer. const [sender] = relayPc.getSenders(); const encodedSource = new RTCRtpSenderEncodedSource(worker); await sender.replaceTrack(encodedSource); // worker.js let encodedWritable; onrtcencodedsource = ({controller: {writable}}) => encodedWritable = writable; onrtctransform = async ({transformer: {readable, writable, options}}) => { if (options.name != "receiverTransform") return; const [readable1, readable2] = readable.tee(); await Promise.all([ readable1.pipeTo(writable), readable2.pipeThrough(new TransformStream({transform})).pipeTo(encodedWritable) ]); function transform(frame, controller) { const newFrame = new RTCRtpEncodedVideoFrame(frame, getUnifiedMetadata(frame)); if (!isDuplicate(newFrame)) { controller.enqueue(newFrame); } } }

This doesn't solve my racing concerns, but I thought it might be helpful to show what the application code might look like with the different API shapes.

E.g. note this uses the tee() function which might run into whatwg/streams#1156 since unlike WebCodecs our encoded chunks are not immutable (should be fine in this case since there's no transform on readable1, but it's brittle).

jan-ivar · 2024-03-18T22:31:21Z

How does this stack up against just optimizing the obvious/inoptimal API? E.g.

// Let relayPc be the PC used to relay frames to the next peer.
const [sender] = relayPc.getSenders();
// Let recvPc be the receiving PC
recvPc.ontrack = ({receiver}) => await sender.replaceTrack(receiver.track);
sender.transform = new RTCRtpScriptTransform(worker, {});

Browsers could detect when the input is a receiver.track and forward received RTCEncodedVideoFrames directly on the sender transform's readable (skipping decode and re-encode) where JS can modify metadata.

guidou · 2024-03-20T12:29:07Z

I would summarise my concern as:

The potential redundancy and the extra latency a frame-based API has compared with a potential packet-based API

My position about this is that a frame-based API and a packet-based API offer different tradeoffs and therefore support different use cases.
For use cases where the decision is based on frame properties, the extra latency of the frame-based API is not a problem since you have to wait for the whole frame to make the decision. For the glitch-free failover case (a.k.a. fan-in) it is also good because it provides an ergonomic mechanism that increases reliability without introducing timeouts (other than the latency of waiting for a frame).
Also, an important practical consideration is that we have production-tested building blocks that make it easy to deploy a frame-based API, whereas a packet-based API is still in early stages of discussion with challenges that may still be unknown.

My other non blocking comment is API shape (keeping the feature set as it is, meaning let's keep it to dedicated worker in this version):

Evaluate transfer-based vs. script-transform-like API

As I commented on w3c/webrtc-encoded-transform#211 (comment), there are two different API proposals. The first one is well described in the explainer. The second is a script transform like API, something like:
[Exposed=Window]
interface RTCRtpSenderEncodedSource {
    constructor(Worker worker, optional any options, optional sequence<object> transfer)
};
[Exposed=DedicatedWorker]
interface RTCRtpSenderEncodedSourceController {
    WritableStream writable; // Or enqueue method.
    // Need congestion API, error API and enqueue API
};
partial interface DedicatedWorkerGlobalScope {
    attribute EventHandler onrtcencodedsource;
}
From a user perspective, there is probably no difference. There are probably some differences in terms of future-proofness, ease of use, ease of spec/dev...

Something I like about this alternative shape is the consistency with webrc-encoded-transform, which will give us the opportunity to evolve both APIs in a similar manner. For example, we can define the congestion API in the same manner for both. It also addresses your concern about transferability.
I'm fine with proceeding with this shape if you and @jan-ivar are also OK with it.

jan-ivar · 2024-03-20T20:33:55Z

I understand we got here incrementally, but I don't see why JS is needed when we can add first-class support for fanout. Is the source of an RTCRtpEncodedSource always RTCRtpReceiver? If so, why not skip the JS for this WebIDL:

Promise<undefined> replaceTrack((MediaStreamTrack or RTCRtpReceiver)? withTrackOrReceiver);

Apps could then declare encoded-level forwarding explicitly:

recvPc.ontrack = async ({receiver}) => await relayPc.getSenders()[0].replaceTrack(receiver);

The app can fail-over using replaceTrack to a different RTCRtpReceiver.

youennf · 2024-03-21T09:28:02Z

I understand we got here incrementally, but I don't see why JS is needed when we can add first-class support for fanout.

AIUI, the desire is for the web page to act a little bit like an SFU. I do not think the UA will be able to implement everything that a web page could do, for instance:

Handle the case where there is bandwidth limitation (the receiver can try to do some adaptation, say if the receiver is L1T2, it might only send L1T1, switch from source...).
Handle the case where the receiver gets redundant frames from multiple paths and will forward from one of this path on a per frame basis. I think this use case was presented but @guidou probably knows more.
Handle recovery cases, like packet loss or PLI/FIR (which of the multiple potential receivers to ask for the key frame, what to do when waiting...).

youennf · 2024-03-21T09:29:56Z

I'm fine with proceeding with this shape if you and @jan-ivar are also OK with it.

This API shape seems indeed to have some benefits over the transfer based API, I would tend to proceed with this one.

jan-ivar · 2024-03-21T15:00:47Z

AIUI, the desire is for the web page to act a little bit like an SFU.

Is the goal still to repurpose RTCRtpScriptTransform to run an SFU in JS?

For example, previous choices there like RTCEncodedVideoFrame being mutable and its serialization steps not copying the ArrayBuffer, was tailored to the simple frame modification use cases of encrypt/decrypt and add metadata.

How do we reconcile that SFU use cases seem better served by immutable RTCEncodedVideoFrames (per the tee problem? Does not a perfect fit mean not the right surface?

w3c/webrtc-encoded-transform#134 discusses creating one from another, but it's not clear how one would go from immutable to mutable without a copy.

dontcallmedom-bot · 2024-03-27T10:25:43Z

This issue was discussed in WebRTC March 26 2024 meeting – 26 March 2024 (RTCRtpEncodedSource)

alvestrand · 2024-04-04T12:10:09Z

We can argue that mutable RTCEncodedVideoFrame was a design mistake at the time.
However, the prospect of dealing with two variants of RTCEncodedVideoFrame, one mutable and one not, does not exactly fill me with joy.
We should ensure that the implementation of RTCEncodedVideoFrame is performant for the non-mutating case, and users will get the point when they need the performance.

guidou requested a review from alvestrand February 8, 2024 13:24

henbos self-requested a review February 8, 2024 15:39

guidou force-pushed the main branch 2 times, most recently from 0e50773 to d059b2b Compare March 5, 2024 14:12

guidou changed the title ~~Add relay-explainer.md~~ Add RTCRtpEncodedSource Mar 5, 2024

guidou changed the title ~~Add RTCRtpEncodedSource~~ Add RTCRtpEncodedSource and explainer Mar 5, 2024

guidou requested review from aboba, jan-ivar and youennf March 5, 2024 15:32

guidou removed the request for review from henbos March 18, 2024 13:55

jan-ivar reviewed Mar 18, 2024

View reviewed changes

guidou mentioned this pull request Apr 15, 2024

Constructors for RTC Encoded Frames with metadata WebKit/standards-positions#334

Closed

guidou closed this May 21, 2024

guidou force-pushed the main branch from 550617a to 28fd764 Compare May 21, 2024 09:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add RTCRtpEncodedSource and explainer #198

Add RTCRtpEncodedSource and explainer #198

guidou commented Feb 8, 2024 •

edited by pr-preview bot

Loading

guidou commented Mar 5, 2024

aboba commented Mar 7, 2024 •

edited

Loading

guidou commented Mar 14, 2024

youennf commented Mar 14, 2024

guidou commented Mar 18, 2024

youennf commented Mar 18, 2024

jan-ivar Mar 18, 2024

guidou Mar 20, 2024

jan-ivar Mar 20, 2024 •

edited

Loading

jan-ivar Mar 20, 2024

jan-ivar commented Mar 18, 2024 •

edited

Loading

guidou commented Mar 20, 2024 •

edited

Loading

jan-ivar commented Mar 20, 2024 •

edited

Loading

youennf commented Mar 21, 2024

youennf commented Mar 21, 2024

jan-ivar commented Mar 21, 2024

dontcallmedom-bot commented Mar 27, 2024

alvestrand commented Apr 4, 2024

Add RTCRtpEncodedSource and explainer #198

Add RTCRtpEncodedSource and explainer #198

Conversation

guidou commented Feb 8, 2024 • edited by pr-preview bot Loading

guidou commented Mar 5, 2024

aboba commented Mar 7, 2024 • edited Loading

guidou commented Mar 14, 2024

youennf commented Mar 14, 2024

guidou commented Mar 18, 2024

youennf commented Mar 18, 2024

jan-ivar Mar 18, 2024

Choose a reason for hiding this comment

guidou Mar 20, 2024

Choose a reason for hiding this comment

jan-ivar Mar 20, 2024 • edited Loading

Choose a reason for hiding this comment

jan-ivar Mar 20, 2024

Choose a reason for hiding this comment

jan-ivar commented Mar 18, 2024 • edited Loading

guidou commented Mar 20, 2024 • edited Loading

jan-ivar commented Mar 20, 2024 • edited Loading

youennf commented Mar 21, 2024

youennf commented Mar 21, 2024

jan-ivar commented Mar 21, 2024

dontcallmedom-bot commented Mar 27, 2024

alvestrand commented Apr 4, 2024

guidou commented Feb 8, 2024 •

edited by pr-preview bot

Loading

aboba commented Mar 7, 2024 •

edited

Loading

jan-ivar Mar 20, 2024 •

edited

Loading

jan-ivar commented Mar 18, 2024 •

edited

Loading

guidou commented Mar 20, 2024 •

edited

Loading

jan-ivar commented Mar 20, 2024 •

edited

Loading