SVC getCapabilities() is redundant with Media Capabilities query #49

Closed
youennf opened this issue Oct 19, 2021 · 53 comments
Labels
Needs Test Needs a Test

Comments

@youennf

youennf commented Oct 19, 2021

Given that Media Capabilities allows querying WebRTC scalability modes, it seems https://w3c.github.io/webrtc-svc/#rtcrtpcodeccapability* could be removed from the spec.

@alvestrand
Contributor

Can you give a reference to the Media Capabilities function that you think corresponds?

@alvestrand
Contributor

@aboba
Contributor

aboba commented Oct 19, 2021

It's not clear how Media Capabilities can be used to provide the same information as getCapabilities(). You have to test each mode and codec combination, providing the required attributes bitrate, framerate, width and height. However, in WebRTC the resolution, bitrate and framerate can be chosen dynamically. So what values should be provided to MC in order to generate the list of supported scalabilityMode values for a given codec?

See: https://webrtc.internaut.com/mc/
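A minimal sketch of the per-combination probing described above. It assumes the `encodingInfo()` call shape with `type: 'webrtc'` and a `scalabilityMode` member; the helper name `probeScalabilityModes`, the `mc` parameter (pass `navigator.mediaCapabilities` in a browser) and the `probe` values are illustrative assumptions, not something either spec prescribes.

```javascript
// Hypothetical helper: returns the subset of `modes` reported as supported for
// one codec at one fixed width/height/bitrate/framerate (the "probe").
// `mc` is expected to behave like navigator.mediaCapabilities.
async function probeScalabilityModes(mc, contentType, modes, probe) {
  const supported = [];
  for (const scalabilityMode of modes) {
    // One asynchronous query per (codec, mode) pair.
    const info = await mc.encodingInfo({
      type: 'webrtc',
      video: { contentType, scalabilityMode, ...probe },
    });
    if (info.supported) supported.push(scalabilityMode);
  }
  return supported;
}
```

In a browser this might be called as `probeScalabilityModes(navigator.mediaCapabilities, 'video/VP9', ['L1T2', 'L3T3'], { width: 1280, height: 720, bitrate: 1200000, framerate: 30 })`. The returned list only holds for that one probe, which is exactly the open question: a different resolution or bitrate may produce a different list.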

@youennf
Author

youennf commented Oct 19, 2021

> Media Capabilities doesn't provide the ability to quickly determine the supported scalability modes for each codec. You have to test each mode and codec combination with an API such as isConfigSupported().

This is by design, and an actual advantage in terms of User Agent ability to do privacy mitigation.
In practice, for web developers, I think it should not be a burden, or the whole MC model has an issue.

@aboba
Contributor

aboba commented Oct 19, 2021

Media Capabilities exposes considerable hardware info so it needs privacy mitigation, precisely to prevent iterating over every possible combination of codecs, hardware support, etc. Since SVC is rarely supported in hardware (WebCodecs is shipping with no SVC support in hardware currently), WebRTC-SVC doesn't have the same level of privacy concerns. Since getCapabilities() provides other related information such as header extension support (important for negotiating forwarding header extensions), the application needs to call getCapabilities() anyway, so it might as well get all the information in a single call.

@youennf
Author

youennf commented Oct 20, 2021

In general, we should just have one way of doing things.
I understand there might be legacy reasons for getCapabilities and MC redundancy, but WebRTC-SVC is a new spec.

getCapabilities has known issues with regard to codecs:

  • getCapabilities is synchronous while codec enumeration is usually asynchronous, see discussion about getCapabilities results changing after the first createOffer/createAnswer call
  • Applications might want to know about SVC referenceScaling support, which is only in MC.
  • Applications might want to know whether a given codec is power efficient or not, which is only in MC.

If MC is not good enough for WebRTC applications to easily and precisely select their preferred codec, we should try to improve MC directly instead of patching it with getCapabilities.

> WebRTC-SVC doesn't have the same level of privacy concerns.

Getting a list of codecs is a known fingerprinting surface, exposing scalability modes extends this fingerprinting surface.
We should treat it as such and apply https://w3c.github.io/fingerprinting-guidance/#api-minimization.

@alvestrand
Contributor

It's a tradeoff between convenience for the Web page developer vs fingerprinting surface. Nothing new here.

But if we connect these, we need to know that the interface works. In particular, I'm not clear on how one will know that a given codec in WebRTC corresponds 100% to a given configuration in MediaCapabilities. We know that it does correspond because we've set "webrtc" as the usage scenario, but is there a guaranteed way to say "I want webrtc to send with the codec that I just queried the capabilities for"?

ATM the only way I can think of is to call getCapabilities(), iterate through the codecs to find the suitable ones, and for each of them query MediaCapabilities with each acceptable scalability mode. That's complicated.
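Roughly, that procedure could look like the sketch below. `getCapabilities` stands in for `kind => RTCRtpSender.getCapabilities(kind)` and `mc` for `navigator.mediaCapabilities`; the function name, the list of skipped resiliency formats, and the use of the bare `mimeType` as `contentType` (ignoring `sdpFmtpLine`) are assumptions made for illustration. The last one is the very mapping whose correctness is in question here.

```javascript
// Hypothetical helper: for each video codec from getCapabilities(), ask
// Media Capabilities which of the app's acceptable modes it supports.
async function modesPerCodec(getCapabilities, mc, candidateModes, probe) {
  const result = new Map();
  const { codecs } = getCapabilities('video');
  // Assumption: RTX/RED/FEC entries are not real video codecs and are skipped.
  const skip = /\/(rtx|red|ulpfec|flexfec-03)$/i;
  for (const codec of codecs.filter((c) => !skip.test(c.mimeType))) {
    if (result.has(codec.mimeType)) continue; // same codec listed with several profiles
    const modes = [];
    for (const scalabilityMode of candidateModes) {
      const info = await mc.encodingInfo({
        type: 'webrtc',
        video: { contentType: codec.mimeType, scalabilityMode, ...probe },
      });
      if (info.supported) modes.push(scalabilityMode);
    }
    result.set(codec.mimeType, modes);
  }
  return result; // Map of mimeType -> supported modes for this probe
}
```

The number of asynchronous queries grows as codecs times modes, which gives a sense of why this is called complicated.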

@youennf
Author

youennf commented Oct 20, 2021

> but is there a guaranteed way to say "I want webrtc to send with the codec that I just queried the capabilities for"?

Agreed, there is a small impedance mismatch between the APIs, as a translation step is required to go from MC to setCodecPreferences via getCapabilities. This seems solvable.
One possibility is for MediaCapabilitiesInfo to directly provide a RTCRtpCodecCapability field when 'webrtc' is the provided MC type.

@aboba
Contributor

aboba commented Oct 20, 2021

WebRTC-SVC isn't a new specification. It was originally developed in the ORTC CG, so it predates much of the WebRTC 1.0 specification, including "Unified Plan" and many of the sections of WebRTC Extensions. Scalability mode info was added to MC for use by WebCodecs, not WebRTC.

So far, I haven't seen much MC usage by WebRTC developers. MC wasn't designed to be convenient for use in WebRTC and the additional information it provides isn't actionable in most cases.

referenceScaling isn't really an SVC issue. It also affects the ability to change resolution without a keyframe in a stream with no layers. For a WebRTC application not using SVC, it isn't actionable because the API doesn't have a way to tell the encoder to only make resolution changes at keyframe boundaries. "prefer-resolution" is just a hint. Since reference scaling is required for a VP8/VP9/AV1 decoder to be compliant, IMHO the lack of support is more of a "bug" than a "capability".

Similarly, "power efficiency" is more actionable in WebCodecs than WebRTC. WebRTC doesn't have the equivalent of WebCodecs HardwareAcceleration enum.

@fippo
Contributor

fippo commented Oct 21, 2021

> In general, we should just have one way of doing things.

The way for WebRTC applications to configure codec options is via getCapabilities and then setParameters, which is the model webrtc-svc is following.

Media Capabilities may inform that choice, but that is just icing on the cake.
@drkron may have a better view on how it plays together with WebRTC in Chrome (and the status, I only found an intent to experiment?)

@alvestrand
Contributor

Another interesting wrinkle is that the scalabilityModes is suggested as the way to feature-detect support for the feature: "Support can be detected by checking the presence of scalabilityModes values in the RTCRtpCodecCapability dictionary."
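For illustration, the feature detection that sentence describes could be sketched as follows. The function name is made up; it only inspects the object returned by `RTCRtpSender.getCapabilities('video')` and assumes an implementation that populates `scalabilityModes` as the quoted text describes.

```javascript
// Returns true if any codec entry carries a non-empty scalabilityModes array.
// `capabilities` is the value returned by RTCRtpSender.getCapabilities('video'),
// which can be null when the system has no capabilities for that kind.
function supportsScalabilityModes(capabilities) {
  if (!capabilities || !Array.isArray(capabilities.codecs)) return false;
  return capabilities.codecs.some(
    (codec) => Array.isArray(codec.scalabilityModes) && codec.scalabilityModes.length > 0
  );
}
```

If the field were removed from getCapabilities(), this check would always return false, so the spec would need another way to feature-detect.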

I think "we should have one way of doing things" is not always the right answer - remembering that the slogan of Perl was "There's more than one way to do it"... (and that Python was written partly in reaction to that attitude).

@chcunningham

For my part, I'm not opposed conceptually to having two ways. Maybe some developers will like MC for power info while others will like getCapabilities for the convenient single call. But, I do have some comments.

> Since SVC is rarely supported in hardware (WebCodecs is shipping with no SVC support in hardware currently), WebRTC-SVC doesn't have the same level of privacy concerns.

HW SVC support is coming. Having said that, I don't think it will be particularly privacy sensitive.

> getCapabilities is synchronous while codec enumeration is usually asynchronous, see discussion about getCapabilities results changing after the first createOffer/createAnswer call

Worth highlighting. As we initially implemented SVC hw encode for WebCodecs, we (@Djuffin) were reminded that querying the platform encoder capabilities at browser startup is too slow. Async solves the issue. WebRTC may not feel this pain as long as it always has software fallback w/ capabilities matching/exceeding that of the hardware path. Maybe that's a safe assumption?

> Applications might want to know about SVC referenceScaling support, which is only in MC.

Still under discussion. Might be abandoned.

> One possibility is for MediaCapabilitiesInfo to directly provide a RTCRtpCodecCapability field when 'webrtc' is the provided MC type.

From the MC side I'm open to adding this. We did a similar thing for MediaKeySystemAccess. I'm not very familiar with setCodecPreferences. Some sample code would help me evaluate effectiveness.

> scalability mode info was added to MC for use by WebCodecs, not WebRTC.

I think we crossed wires here. MC doesn't talk about WebCodecs at all. It really is intended for RTC use when type='webrtc'. But you're definitely right that no one is using it for this yet... Chrome hasn't shipped this part of MC. Hoping to do that soon.

> Similarly, "power efficiency" is more actionable in WebCodecs than WebRTC. WebRTC doesn't have the equivalent of WebCodecs HardwareAcceleration enum.

I imagine this being used as an input for codec selection. For instance, if your app's differentiating feature is battery-saving, maybe you use this to prioritize accelerated codecs during negotiation? Or, if you're a cloud gaming service concerned about power, maybe you use this to determine what stream type to send down. I'm not an RTC expert though - lmk if I've overlooked something.
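As a rough sketch of that idea: given codec capabilities paired with the Media Capabilities result obtained for each, power-efficient codecs could be moved to the front before the list is handed to `setCodecPreferences()`. The function name and the `{ codec, info }` pairing are assumptions for illustration; how to reliably pair a codec capability with an MC query is the open mapping question in this thread.

```javascript
// `entries` is an array of { codec, info } pairs, where `codec` is an
// RTCRtpCodecCapability-like object and `info` has the `supported` and
// `powerEfficient` booleans of a Media Capabilities result.
function orderByPowerEfficiency(entries) {
  return entries
    .filter(({ info }) => info.supported) // filter() copies, so the input is not mutated
    // Array.prototype.sort is stable, so the existing order is kept within each group.
    .sort((a, b) => Number(b.info.powerEfficient) - Number(a.info.powerEfficient))
    .map(({ codec }) => codec);
}
```

In a browser the result would be passed along the lines of `transceiver.setCodecPreferences(orderByPowerEfficiency(entries))`. Note that dropping unsupported entries also drops anything the app did not query (RTX, for instance), so a real application would likely need to add those back.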

@alvestrand
Contributor

The Meet team is quite heavily involved in the MediaCapabilities work. They will use it with WebRTC.
Tagging @drkron for reference.

@drkron

drkron commented Oct 28, 2021

> The Meet team is quite heavily involved in the MediaCapabilities work. They will use it with WebRTC. Tagging @drkron for reference.

That's correct. The current status is that the feature is implemented behind a runtime flag. The response should be correct with regard to supported and powerEfficient, and I'm currently working on the implementation for smooth. The goal is to land some code within the next few weeks and thereafter launch an origin trial.

@chcunningham

> getCapabilities is synchronous while codec enumeration is usually asynchronous, see discussion about getCapabilities results changing after the first createOffer/createAnswer call

> Worth highlighting. As we initially implemented SVC hw encode for WebCodecs, we (@Djuffin) were reminded that querying the platform encoder capabilities at browser startup is too slow. Async solves the issue. WebRTC may not feel this pain as long as it always has software fallback w/ capabilities matching/exceeding that of the hardware path. Maybe that's a safe assumption?

@henbos to comment on the above. I recall that you earlier had issues w/ the sync getCapabilities() API... so maybe it's already true that we don't always have equal sw fallback? Or maybe I've overlooked some subtlety.

@fippo
Contributor

fippo commented Oct 28, 2021

@aboba
Contributor

aboba commented Oct 28, 2021

"WebRTC may not feel this pain as long as it always has software fallback w/ capabilities matching/exceeding that of the hardware path. Maybe that's a safe assumption?"

[BA] WebRTC acknowledges the potential for hardware-only codecs in a few places. The basic model is laid out in Section 4.4.2:

"If a system has limited resources (e.g. a finite number of decoders), createOffer needs to return an offer that reflects the current state of the system, so that setLocalDescription will succeed when it attempts to acquire those resources. The session descriptions MUST remain usable by setLocalDescription without causing an error until at least the end of the fulfillment callback of the returned promise."

Since both createOffer() and createAnswer() are async APIs, it is possible to interrogate the hardware so as to reflect the "current state of the system".

In getCapabilities() (Section 5.2) there is a note stating that getCapabilities() should be consistent with the codec info returned in createOffer() and createAnswer(). In practice, if hardware codecs are discovered and returned by createOffer() or createAnswer() these will subsequently be reflected in getCapabilities(). If hardware resources are exhausted, the paragraph in Section 4.4.2 might be interpreted to imply that hardware-only codecs should be removed from createOffer(), createAnswer() and subsequent calls to getCapabilities(). Not sure if this happens in practice, though.

In setParameters(), it is acknowledged that a codec may only be supported in hardware, and that resources may not be available for some configurations, in which case a "hardware-encoder-unavailable" error will be returned. Not sure if this happens in practice, either.

Some codecs may be supported in both hardware and software. In that situation, the implementation will use software if hardware resources are not available and the codec info would remain in createOffer(), createAnswer() and getCapabilities() regardless of the "current state of the system". WebRTC doesn't have a way for the application to indicate "prefer-hardware" (e.g. don't include the codec in createOffer(), createAnswer() or getCapabilities() if hardware-acceleration isn't available).

@aboba aboba added the question Further information is requested label Nov 2, 2021
@chcunningham

chcunningham commented Nov 4, 2021

Thanks @aboba. My read of your and @fippo's comments:

  1. sw fallback is already not a given
  2. but WebRTC already has workarounds in place

The workarounds are quirky for sure, but scalabilityMode doesn't make that worse. How do folks feel about those quirks? Are they bad enough to motivate only extending MC w/ new signaling like scalabilityMode (effectively deprecating getCapabilities())?

@aboba
Contributor

aboba commented Nov 4, 2021

@chcunningham WebRTC has been implemented on devices with codecs only available in hardware (encoders or decoders or both). The workarounds in WebRTC are a bit clumsy (e.g. having to call createOffer() prior to getCapabilities()), and unpredictable (e.g. it is difficult for an application to guarantee access to resources). At the moment it is not a "hair on fire" situation, although it might become more urgent if 4K video and hardware-accelerated next generation codecs become widespread (e.g. AR/VR or super-realistic cloud gaming).
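The createOffer()-before-getCapabilities() workaround could be sketched roughly as below. `createPeerConnection` stands in for `() => new RTCPeerConnection()` and `getCapabilities` for `kind => RTCRtpSender.getCapabilities(kind)`; the function name is made up, and whether the returned capabilities actually differ after the offer is implementation-dependent.

```javascript
// Workaround sketch: run createOffer() on a throwaway connection so the
// browser has a chance to discover hardware codecs, then read getCapabilities().
async function capabilitiesAfterOffer(createPeerConnection, getCapabilities, kind = 'video') {
  const pc = createPeerConnection();
  try {
    pc.addTransceiver(kind);
    await pc.createOffer(); // the offer itself is discarded
    return getCapabilities(kind);
  } finally {
    pc.close(); // always release the throwaway connection
  }
}
```

In a browser: `capabilitiesAfterOffer(() => new RTCPeerConnection(), (kind) => RTCRtpSender.getCapabilities(kind))`. The extra connection and the asynchronous round trip are part of what makes the workaround clumsy.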

@alvestrand
Contributor

Going for a summary.....

  • getCapabilities() has troublesome aspects. One of these is fingerprinting; another one is its synchronous aspect.
  • the amount of new fingerprint surface exposed by SVC modes is limited
  • the interaction between setParameters and MediaCapabilities (if any) is not specified, implemented or tested. So it's not clear if this is a clean replacement (now, or in the near future).

My current preference is to take three actions:

  • Follow up on the issue of getCapabilities, suggesting that it can be replaced with MediaCapabilities API, and that it should be deprecated and eventually removed once the specification of MediaCapabilities API is good enough
  • Ask MediaCapabilities specification authors to add a clear description of how codec settings vetted through MediaCapabilities can be used in the preference-setting calls of the WebRTC API
  • Ship SVC as-is, including the getCapabilities extension

Would that be an acceptable course of action?

@aboba
Contributor

aboba commented Nov 10, 2021

Step 2 seems like a pre-requisite for Step 1. Since preference-setting calls are in part dependent on info available from SDP, this will involve both the WebRTC-PC and MediaCapabilities APIs. As a result, it might not fit neatly into the MediaCapabilities API specification. The investigation seems like a worthwhile exercise though - asking what would be required to polyfill an (improved) version of getCapabilities().

@chcunningham

For my part, I remain open to having both APIs. I defer to @drkron to drive investigations and spec updates for preference setting. My only concern with shipping getCapabilities() as-is is the unresolved discussions happening in MC and RTC around decoder capability signalling. While these may appear to be MC issues only, it would be regrettable to have MC and getCapabilities() diverge significantly while co-existing.

w3c/media-capabilities#182 (and RTC parallel #52)
w3c/media-capabilities#183 (and RTC parallel #48)

@aboba
Contributor

aboba commented Nov 11, 2021

@chcunningham Yes, we should figure out decoder capabilities, both for WebRTC-SVC and MC. I will label Issue 52 as CR-blocking.

@youennf
Author

youennf commented Nov 18, 2021

FWIW, I am not asking to deprecate the whole getCapabilities, but only the codec capabilities part of it, in favour of MC.
Given there is a course of action to make that happen, we should spend our time building MC and not building on getCapabilities codec support that we want to remove in a follow-up step.

> One possibility is for MediaCapabilitiesInfo to directly provide a RTCRtpCodecCapability field when 'webrtc' is the provided MC type.

> From the MC side I'm open to adding this. We did a similar thing for MediaKeySystemAccess. I'm not very familiar with setCodecPreferences. Some sample code would help me evaluate effectiveness.

How do people feel about this idea?
I can raise an issue on MC side to start the discussion in MC.

@alvestrand
Contributor

Youenn, would you be OK with shipping the spec as-is for now, with an intent to deprecate and remove codec capabilities when MediaCapabilities has a sufficient replacement, but using this until the MediaCapabilities interactions are completely specified?

Given the speed of spec-work, it seems that the alternative course (not adding it) would lead to quite a bit of delay before the feature can be made useful.

@alvestrand
Contributor

and to the MC expansion side: Yes, I'd be happy to see this raised, and very happy to see @youennf drive it!

@youennf
Author

youennf commented Nov 18, 2021

> Given the speed of spec-work, it seems that the alternative course (not adding it) would lead to quite a bit of delay before the feature can be made useful.

I filed w3c/media-capabilities#185.
Let's try to see whether we can do that spec work quickly enough.

@aboba aboba added CR-blocking and removed question Further information is requested labels Feb 16, 2022
@aboba
Contributor

aboba commented Mar 16, 2022

Related Issues: 192, 190, 187, 185

@aboba
Contributor

aboba commented Sep 7, 2022

Checked VP9 and AV1 on Chrome and MC and getCapabilities() results are out of sync.

For AV1, getCapabilities() reports support for "L1T2","L1T3","L2T1","L2T1h", "L2T1_KEY", "L2T2","L2T2_KEY","L2T2_KEY_SHIFT", "L3T1","L3T3","L3T3_KEY","S2T1". MC reports that for AV1, no modes are supported.

For VP9, getCapabilities() reports support for L1T2, L1T3, whereas MC reports support for L1T1, L1T2, L1T3, L2T1, L2T2, L2T3, L3T1, L3T3, L2T1h, S2T1, S2T3, S3T3, L2T2_KEY, L2T2_KEY_SHIFT, L2T3_KEY and L3T3_KEY.
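A comparison like this can be reduced to a small set difference. The sketch below is illustrative only (the function and property names are made up) and takes the two mode lists as plain arrays, however they were collected.

```javascript
// Reports which scalability modes appear in only one of the two sources.
function diffModes(fromGetCapabilities, fromMediaCapabilities) {
  const gc = new Set(fromGetCapabilities);
  const mc = new Set(fromMediaCapabilities);
  return {
    onlyGetCapabilities: [...gc].filter((mode) => !mc.has(mode)),
    onlyMediaCapabilities: [...mc].filter((mode) => !gc.has(mode)),
  };
}
```

For example, `diffModes(['L1T2', 'L1T3'], ['L1T1', 'L1T2', 'L1T3', 'L2T1'])` leaves `onlyGetCapabilities` empty and puts `'L1T1'` and `'L2T1'` in `onlyMediaCapabilities`. Both lists being empty would mean the two APIs agree.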

@drkron

drkron commented Sep 8, 2022

The VP9 difference is a known issue and is tracked in https://crbug.com/1299427 . The plan is to have the two in sync before the API to set scalability mode is shipped.
I'm surprised about the AV1 difference, I expected MC to report that it supports more or less all available formats. Will look into this.

@drkron

drkron commented Sep 8, 2022

I was not able to reproduce the missing support for scalability modes for AV1. I manually tested 5-6 modes using this code:

navigator.mediaCapabilities.encodingInfo({
  type: 'webrtc',
  video: {
    contentType: 'video/AV1',
    scalabilityMode: 'L2T1h',
    height: 720,
    width: 1280,
    framerate: 24,
    bitrate: 1216848,
  },
}).then(result => {
  console.log(result.supported);
  console.log(result.smooth);
  console.log(result.powerEfficient);
});

@bc-lee

bc-lee commented Oct 5, 2022

The mismatch between getCapabilities() and MediaCapabilities in Chrome seems to have been fixed.

It seems related to https://chromium-review.googlesource.com/c/chromium/src/+/3876268

To check the results of MediaCapabilities, I used https://webrtc.internaut.com/mc with some edits:

var modes = ["L1T1","L1T2","L1T3","L2T1","L2T1_KEY","L2T1h","L2T2","L2T2_KEY","L2T2_KEY_SHIFT",
             "L2T2h","L2T3","L2T3_KEY","L2T3_KEY_SHIFT","L2T3h","L3T1","L3T1_KEY","L3T1h","L3T2",
             "L3T2_KEY","L3T2_KEY_SHIFT","L3T2h","L3T3","L3T3_KEY","L3T3_KEY_SHIFT","L3T3h","S2T1",
             "S2T1h","S2T2","S2T2h","S2T3","S2T3h","S3T1","S3T1h","S3T2","S3T2h","S3T3","S3T3h"];

@Orphis
Contributor

Orphis commented Dec 8, 2022

Chrome's implementation of the webrtc-svc spec is getting closer to completion and while the implementation has support for both MediaCapabilities and getCapabilities(), I don't think that getCapabilities() adds much value for the SVC case and its extensions and mentions should be removed from the webrtc-svc specification.

I will submit a PR for review before the next interim to cover this.

Should we really want to have a mechanism to list all the existing modes known to the browser (to be queried later with MediaCapabilities) or an API similar to getCapabilities() in place, we could then discuss it in a new issue.

@aboba
Contributor

aboba commented Dec 9, 2022

In Media Capabilities, the bitrate and framerate are required to determine whether a configuration (including a scalabilityMode) is supported for a particular codec. Since the bitrate and framerate are not provided in addTransceiver or setParameters(), it has not been clear how an implementation could decide what SVC modes are legal without getCapabilities().

Currently, Section 4.2.1 says:

"If sendEncodings contains any encoding whose scalabilityMode value is not supported by any codec in RTCRtpSender.getCapabilities(kind).codecs, throw an OperationError."

Here is the language in Section 4.2.2:

"[WEBRTC] Section 5.2 describes validation of parameters within setParameters(). Add the following to the conditions under which the operation causes a promise rejected with an InvalidModificationError (step 4 within step 6):

Before initial negotiation has concluded, encodings contains any encoding whose scalabilityMode value is not supported by any codec in RTCRtpSender.getCapabilities(kind).codecs. After initial negotiation has concluded, encodings contains an encoding whose scalabilityMode value is not supported by the most preferred codec.
N is greater than 1, and encodings contains an encoding whose scalabilityMode value represents an "S mode"."

Do you have a proposal for how these two sections could use Media Capabilities instead of getCapabilities()?
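To make the Section 4.2.2 conditions concrete, here is a sketch of the check as the quoted text reads. It is not taken from any implementation: the function name is made up, `codecs` is assumed to be `RTCRtpSender.getCapabilities(kind).codecs` with `scalabilityModes` populated, `preferredCodec` is passed only after negotiation, and an "S mode" is assumed to be any mode whose name starts with "S".

```javascript
// Returns the error name the quoted text calls for, or null if the encodings pass.
function checkSetParametersModes(encodings, codecs, preferredCodec) {
  const modes = encodings.map((e) => e.scalabilityMode).filter((m) => m !== undefined);
  const supportedBy = (codec, mode) => (codec.scalabilityModes || []).includes(mode);
  // Before negotiation (no preferred codec): support by any codec is enough.
  // After negotiation: only the most preferred codec counts.
  const candidates = preferredCodec ? [preferredCodec] : codecs;
  const unsupported = modes.some((mode) => !candidates.some((c) => supportedBy(c, mode)));
  // N > 1 encodings must not be combined with an "S mode".
  const sModeWithSeveralEncodings =
    encodings.length > 1 && modes.some((mode) => mode.startsWith('S'));
  return unsupported || sModeWithSeveralEncodings ? 'InvalidModificationError' : null;
}
```

The addTransceiver() rule in Section 4.2.1 is the "any codec" branch with an OperationError instead. Both depend on a per-codec mode list that needs no bitrate or framerate, which is what a Media Capabilities based replacement would have to supply.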

@youennf
Author

youennf commented Dec 9, 2022

> Do you have a proposal for how these two sections could use Media Capabilities instead of getCapabilities()?

Bitrate and framerate vary somewhat in WebRTC, so it is unclear to me how validation is meaningful here for webrtc.
Maybe the MC spec should acknowledge this in the validation of the scalability mode when the encoding type is webrtc.
Or we could always have a power-user option in MC that webrtc-pc would use.
The spec could do something like:

  • if scalability mode is not supported for that codec, reject.
  • if scalability mode is not supported for that bitrate and framerate, and we are NOT in webrtc-pc mode (or type is NOT webrtc), reject.
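The two steps above can be sketched as a small predicate. The inputs are hypothetical booleans an implementation would compute internally; nothing here corresponds to an existing Media Capabilities API.

```javascript
// `codecSupportsMode`: the codec can produce this scalability mode at all.
// `configSupported`: it can do so at the given bitrate and framerate.
// `type`: the Media Capabilities encoding type ('webrtc' or 'record').
function acceptScalabilityMode({ codecSupportsMode, configSupported, type }) {
  if (!codecSupportsMode) return false; // step 1: rejected for every type
  if (!configSupported && type !== 'webrtc') return false; // step 2: not applied to webrtc
  return true;
}
```

With this shape, a webrtc query would only fail when the codec lacks the mode entirely, while a record query would still be held to the stated bitrate and framerate.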

@alvestrand
Contributor

I don't think there is a significant difference in terms of codec implementation between WebCodecs and WebRTC when it comes to bitrate and framerate. They both show some implementation limitation; when in use, the bitrate and framerate may be further restricted by downstream issues such as congestion, but it's still not possible to go over the implementation limitation.

So I don't see any benefit to treating webrtc mode differently here.

@youennf
Author

youennf commented Dec 9, 2022

We have two modes currently, webrtc and record.
Record is usually fixed frame rate and fixed target frame rate. It makes sense to provide the corresponding information.
For WebRTC, it is known it may vary a lot, so does it make sense to require such information?

In any case, I think it is editorial work to provide a private hook in MC to answer whether a particular codec supports a scalability mode irrespective of any bitrate/framerate.

WebCodecs interaction with MC is interesting. Either it should be its own mode, or we should consider that WebCodecs can be used to implement any of these modes. This might be worth an issue in MC.

@youennf
Author

youennf commented Dec 9, 2022

I filed w3c/media-capabilities#202

@henbos

henbos commented Jan 19, 2023

Was consensus ever reached here? It caught us by surprise, seems cumbersome, and there seems to be a risk of mismatch.

Was the possibility of introducing an async version of getCapabilities discussed?

E.g. Promise<RTCRtpCapabilities> capabilities() returning the same thing as getCapabilities PLUS codec specifics like supported SVC modes.

@fippo
Contributor

fippo commented Jan 19, 2023

See w3c/webrtc-extensions#100 (comment) for a per-transceiver async getCapabilities proposal (which would also solve header extensions and avoid reinventing the wheel for w3c/webrtc-extensions#137 et al)

@aboba
Contributor

aboba commented Jan 28, 2023

With the merger of PR #77 I believe this issue is resolved, except for WPT tests.

@opusonline

This comment was marked as off-topic.

@aboba
Contributor

aboba commented Mar 13, 2023

@opusonline Since your comments relate to the Media Capabilities API, I have moved them to w3c/media-capabilities#203
