Add RTCRtpEncodedSource and explainer #198
Conversation
For easier reading: https://guidou.github.io/webrtc-extensions/#encoded-source-for-rtc-rtp-sender
It looks like this PR inadvertently removes Section 14 (Event Summary). I fixed this.
Thanks!
My main thought is that this API requires waiting for all packets of a frame, which introduces some latency. This solution does not allow doing what an SFU does with the same level of performance. It might be a good-enough compromise, but if we already consider supporting packet forwarding, maybe that is what we should do instead. API-wise, transferring seems harder than using what we have done for RTCRtpScriptTransform; we should consider the pros and cons of both approaches. It is also not clear why RTCRtpSenderEncodedSource does not provide a WritableStream like RTCRtpScriptTransform does.
This is a good compromise for the use case of glitch-free forwarding with multiple input peer connections and failover. In this case, frames provide a convenient abstraction to do failover quickly (no need for timeouts; just forward a frame from the first peer connection that provides it). It is also ergonomic for some other SFU-like operations where the outcome is frame-based (e.g., drop frames that don't satisfy a certain property). There is also an ongoing effort on a packet-level API, although we have not yet heard from developers interested in that API for the use case of glitch-free forwarding using multiple input peer connections. Other use cases are driving the design of that API.
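The timeout-free selection described above could be sketched as follows. This is a minimal illustration, not part of the PR: it assumes each encoded frame carries an RTP timestamp that is identical across the redundant peer connections, and the class name is made up.

```javascript
// Hypothetical sketch: forward whichever copy of a frame arrives first,
// drop later duplicates. Assumes frames from both peer connections share
// the same rtpTimestamp for the same original frame.
class FirstArrivalSelector {
  constructor(historySize = 256) {
    this.seen = new Set();
    this.order = [];               // FIFO of timestamps, bounds memory use
    this.historySize = historySize;
  }
  // true => first copy seen, forward it; false => duplicate, drop it.
  accept(rtpTimestamp) {
    if (this.seen.has(rtpTimestamp)) return false;
    this.seen.add(rtpTimestamp);
    this.order.push(rtpTimestamp);
    if (this.order.length > this.historySize) {
      this.seen.delete(this.order.shift());
    }
    return true;
  }
}
```

An `isDuplicate(frame)` helper like the one used in the explainer example could then be expressed as `!selector.accept(...)` over whatever per-frame identifier the metadata exposes.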
We actually only need to transfer to workers within the same agent cluster, so maybe a clarification is needed in the text of the proposed spec. However, I do agree with you that it is better to be able to create the source where you use it. The main reasons the proposal exposes RTCRtpSenderEncodedSource only on DedicatedWorker are:
I agree with you on this point. The idea was to deviate as little as possible from w3c/webrtc-encoded-transform#211 (comment) in order to make it easier to achieve consensus, but if you agree about exposing RTCRtpSenderEncodedSource on Window, then we can simplify this by removing transferability.
I agree that a WritableStream would provide more flexibility here. Again, the reason was to minimize deviations from w3c/webrtc-encoded-transform#211 (comment). If I understand correctly, your concerns can be summarized as follows:
We have already presented arguments in favor of the frame-based API, and in previous discussions we concluded that while latency may be a small disadvantage in some cases, frame-based has some clear advantages, in particular for the scenario of forwarding with glitch-free failover over multiple input peer connections. I think the concerns about the spec text can be addressed as follows:
WDYT?
I would summarise my concern as:
As I commented on w3c/webrtc-encoded-transform#211 (comment), there are two different API proposals. The first one is well described in the explainer.
From a user perspective, there is probably no difference.
relay-explainer.md (Outdated)
```
// code in main.js file
const worker = new Worker('worker.js');

// Let recvPc1, recvPc2 be the receiving PCs.
recvPc{1|2}.ontrack = evt => {
  evt.receiver.transform = new RTCRtpScriptTransform(worker, { name: "receiverTransform" });
};

// Let relayPc be the PC used to relay frames to the next peer.
worker.onmessage = evt => {
  relayPc.replaceTrack(evt.data);
};
```

```
// code in worker.js file
async function relayFrames(reader, writer, encodedSource) {
  if (!reader || !writer || !encodedSource) {
    return;
  }
  while (true) {
    const {frame, done} = await reader.read();
    if (done) return;

    let newFrame = new RTCRtpEncodedVideoFrame(frame, getUnifiedMetadata(frame));
    if (!isDuplicate(newFrame)) {
      encodedSource.enqueue(newFrame);
    }
    // Put the original frame back in the receiver PC
    writer.write(frame);
  }
}

// Code to instantiate reader and writer from the RTPReceiver and RTPSender.
onrtctransform = (event) => {
  if (event.transformer.options.name == "receiverTransform") {
    reader = event.transformer.readable;
    writer = event.transformer.writable;
    if (!encodedSource) {
      encodedSource = new RTCRtpSenderEncodedSource();
      postMessage(encodedSource.handle);
    }
  } else {
    return;
  }

  relayFrames(reader, writer, encodedSource);
};
```
@guidou I think I found a couple of bugs and I'd prefer if we used .pipeTo. LMK if I got this right:
```
// main.js
const worker = new Worker('worker.js');
// Let recvPc1, recvPc2 be the receiving PCs.
recvPc1.ontrack = ({receiver}) =>
  receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"});
recvPc2.ontrack = ({receiver}) =>
  receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"});
// Let relayPc be the PC used to relay frames to the next peer.
const [sender] = relayPc.getSenders();
worker.onmessage = async ({data}) => await sender.replaceTrack(data.handle);
```

```
// worker.js
let encodedSource;
onrtctransform = async ({transformer: {readable, writable, options}}) => {
  if (options.name != "receiverTransform") return;
  if (!encodedSource) {
    encodedSource = new RTCRtpSenderEncodedSource();
    postMessage({handle: encodedSource.handle});
  }
  await readable.pipeThrough(new TransformStream({transform})).pipeTo(writable);
  function transform(frame, controller) {
    const newFrame = new RTCRtpEncodedVideoFrame(frame, getUnifiedMetadata(frame));
    if (!isDuplicate(newFrame)) {
      encodedSource.enqueue(newFrame);
    }
    controller.enqueue(frame);
  }
}
```
But how well will this work? recvPc1 and recvPc2 are racing here to provide frames to a single outgoing relayPc? You mention redundancy as the reason, but it's odd seeing recvPc1, recvPc2 being merged into a single output. It looks like fan-in, not fan-out. Is this important to emphasize in the example?
> I think I found a couple of bugs and I'd prefer if we used .pipeTo. LMK if I got this right:

Yes, you got it right. I'll update the explainer to use this version.

> But how well will this work? recvPc1 and recvPc2 are racing here to provide frames to a single outgoing relayPc? You mention redundancy as the reason, but it's odd seeing recvPc1, recvPc2 being merged into a single output. It looks like fan-in, not fan-out. Is this important to emphasize in the example?

Yes, I would call this fan-in, and it is key for glitch-free failover, so it is important to emphasize it.
I will also add fan-out to the example for completeness.
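For illustration, a fan-out variant might look roughly like this. This is a sketch against the proposed API shape, not the PR's own text; `numDownstream` and the `index` field in the posted message are assumptions, and the per-frame logic is factored into a plain function.

```javascript
// Hypothetical fan-out sketch (not part of the PR): one received stream
// relayed to several downstream senders, each backed by its own
// RTCRtpSenderEncodedSource. makeFrame stands in for the
// RTCRtpEncodedVideoFrame construction used in the explainer.
function fanOutFrame(frame, encodedSources, makeFrame) {
  for (const source of encodedSources) {
    source.enqueue(makeFrame(frame));  // one independent copy per branch
  }
}

// worker.js sketch wiring it up (names follow the proposal under
// discussion; numDownstream is an assumed configuration value):
async function relayWithFanOut({transformer: {readable, writable, options}},
                               numDownstream) {
  if (options.name != "receiverTransform") return;
  const encodedSources = [];
  for (let i = 0; i < numDownstream; i++) {
    const source = new RTCRtpSenderEncodedSource();
    encodedSources.push(source);
    postMessage({handle: source.handle, index: i});  // index: assumed field
  }
  await readable.pipeThrough(new TransformStream({
    transform(frame, controller) {
      fanOutFrame(frame, encodedSources,
                  f => new RTCRtpEncodedVideoFrame(f, getUnifiedMetadata(f)));
      controller.enqueue(frame);  // keep the local receive path intact
    }
  })).pipeTo(writable);
}
```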
But what about the racing? onrtctransform is going to fire twice here and free-run, pumping frames from recvPc1 and recvPc2, two independent peer connections, through the transform function in parallel. What syncs up the output from these two independent sources? What assures these encoded frames have any relation? isDuplicate?

For fail-over, what's wrong with replaceTrack?
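For concreteness, the replaceTrack alternative hinted at here could be sketched like this. Everything below is hypothetical: the failure signal (ICE connection state on the primary PC) is an assumption, and this path switches decoded tracks rather than forwarding encoded frames.

```javascript
// Hypothetical sketch of plain track-level failover with replaceTrack(),
// for comparison with the frame-based approach. Assumptions (not from the
// PR): ICE connection state on the primary PC is the failure signal, and
// the relay re-encodes the decoded track instead of forwarding encoded
// frames, so switching may glitch until the backup delivers a keyframe.
function setupTrackFailover(recvPc1, recvPc2, relayPc) {
  const [sender] = relayPc.getSenders();
  let backupTrack = null;
  recvPc1.ontrack = ({track}) => sender.replaceTrack(track);   // primary
  recvPc2.ontrack = ({track}) => { backupTrack = track; };     // standby
  recvPc1.oniceconnectionstatechange = () => {
    if (recvPc1.iceConnectionState === "failed" && backupTrack) {
      sender.replaceTrack(backupTrack);                        // fail over
    }
  };
}
```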
Redo based on @youennf's proposed API:
```
// main.js
const worker = new Worker('worker.js');
// Let recvPc1, recvPc2 be the receiving PCs.
recvPc1.ontrack = ({receiver}) =>
  receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"});
recvPc2.ontrack = ({receiver}) =>
  receiver.transform = new RTCRtpScriptTransform(worker, {name: "receiverTransform"});
// Let relayPc be the PC used to relay frames to the next peer.
const [sender] = relayPc.getSenders();
const encodedSource = new RTCRtpSenderEncodedSource(worker);
await sender.replaceTrack(encodedSource);
```

```
// worker.js
let encodedWritable;
onrtcencodedsource = ({controller: {writable}}) => encodedWritable = writable;
onrtctransform = async ({transformer: {readable, writable, options}}) => {
  if (options.name != "receiverTransform") return;
  const [readable1, readable2] = readable.tee();
  await Promise.all([
    readable1.pipeTo(writable),
    readable2.pipeThrough(new TransformStream({transform})).pipeTo(encodedWritable)
  ]);
  function transform(frame, controller) {
    const newFrame = new RTCRtpEncodedVideoFrame(frame, getUnifiedMetadata(frame));
    if (!isDuplicate(newFrame)) {
      controller.enqueue(newFrame);
    }
  }
}
```
This doesn't solve my racing concerns, but I thought it might be helpful to show what the application code might look like with the different API shapes. E.g. note this uses the tee() function, which might run into whatwg/streams#1156 since, unlike WebCodecs, our encoded chunks are not immutable (should be fine in this case since there's no transform on readable1, but it's brittle).
How does this stack up against just optimizing the obvious (if suboptimal) API? E.g.

```
// Let relayPc be the PC used to relay frames to the next peer.
const [sender] = relayPc.getSenders();
// Let recvPc be the receiving PC
recvPc.ontrack = async ({receiver}) => await sender.replaceTrack(receiver.track);
sender.transform = new RTCRtpScriptTransform(worker, {});
```

Browsers could detect when the input is a
My position about this is that a frame-based API and a packet-based API offer different tradeoffs and therefore support different use cases.
Something I like about this alternative shape is its consistency with webrtc-encoded-transform, which will give us the opportunity to evolve both APIs in a similar manner. For example, we can define the congestion API in the same manner for both. It also addresses your concern about transferability.
I understand we got here incrementally, but I don't see why JS is needed when we can add first-class support for fanout. Is the source of an

```
Promise<undefined> replaceTrack((MediaStreamTrack or RTCRtpReceiver)? withTrackOrReceiver);
```

Apps could then declare encoded-level forwarding explicitly:

```
recvPc.ontrack = async ({receiver}) => await relayPc.getSenders()[0].replaceTrack(receiver);
```

The app can fail-over using
AIUI, the desire is for the web page to act a little bit like an SFU. I do not think the UA will be able to implement everything that a web page could do, for instance:
This API shape does indeed seem to have some benefits over the transfer-based API; I would tend to proceed with this one.
Is the goal still to repurpose RTCRtpScriptTransform to run an SFU in JS? For example, previous choices there, like RTCEncodedVideoFrame being mutable and its serialization steps not copying the ArrayBuffer, were tailored to the simple frame-modification use cases of encrypt/decrypt and adding metadata. How do we reconcile that SFU use cases seem better served by immutable RTCEncodedVideoFrames (per the tee problem)? Does "not a perfect fit" mean "not the right surface"? w3c/webrtc-encoded-transform#134 discusses creating one frame from another, but it's not clear how one would go from immutable to mutable without a copy.
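The copy implied by going from an immutable to a mutable frame could be as small as cloning the payload buffer; a minimal sketch (the helper name is made up and is not part of any proposal):

```javascript
// Minimal sketch of the copy that an immutable-to-mutable conversion
// would imply: clone the encoded payload into a fresh ArrayBuffer so the
// new frame can be mutated without aliasing the original.
function copyFrameData(frameData /* ArrayBuffer */) {
  const copy = new ArrayBuffer(frameData.byteLength);
  new Uint8Array(copy).set(new Uint8Array(frameData));
  return copy;
}
```

The copy is proportional to the payload size per frame, which is the cost being weighed in the discussion above.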
This issue was discussed in the WebRTC meeting of 26 March 2024 (RTCRtpEncodedSource).
We can argue that mutable RTCEncodedVideoFrame was a design mistake at the time.
Add an explainer for the upcoming RTCRtpSenderEncodedSource extension proposal