Retrieving RTCRtpCodecCapability from MediaCapabilities when queried for webrtc #185

Open
youennf opened this issue Nov 18, 2021 · 26 comments · May be fixed by #186

@youennf

youennf commented Nov 18, 2021

As discussed in w3c/webrtc-svc#49, while Media Capabilities has a webrtc mode, it might be difficult to actually call setCodecPreferences (https://w3c.github.io/webrtc-pc/#dom-rtcrtptransceiver-setcodecpreferences) from Media Capabilities results.
The issue is that setCodecPreferences takes an RTCRtpCodecCapability value, which the web application currently has to either query from the WebRTC API after calling MC or rebuild itself.

To improve this, a straightforward approach would be for Media Capabilities to provide an RTCRtpCodecCapability field, say inside MediaCapabilitiesInfo, when the configuration type is webrtc.
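
For reference, the current workaround (query Media Capabilities, then look the codec up again through the WebRTC API) might look like the sketch below. `findCodecCapability` and `preferIfSupported` are hypothetical helpers, and the matching rule (case-insensitive MIME type, plus an exact `sdpFmtpLine` comparison when one is given) is an assumption, not something any spec defines.

```javascript
// Hypothetical helper: find the RTCRtpCodecCapability matching a MIME type
// (and, optionally, an exact sdpFmtpLine) in a getCapabilities() codec list.
function findCodecCapability(codecs, mimeType, sdpFmtpLine) {
  const match = codecs.find(
    (codec) =>
      codec.mimeType.toLowerCase() === mimeType.toLowerCase() &&
      (sdpFmtpLine === undefined || codec.sdpFmtpLine === sdpFmtpLine)
  );
  return match === undefined ? null : match;
}

// Browser-only usage: ask Media Capabilities first, then fetch the matching
// capability from the WebRTC API before calling setCodecPreferences().
async function preferIfSupported(transceiver, mediaConfig, kind, mimeType, sdpFmtpLine) {
  const info = await navigator.mediaCapabilities.decodingInfo(mediaConfig);
  if (!info.supported) return false;
  const { codecs } = RTCRtpReceiver.getCapabilities(kind);
  const codec = findCodecCapability(codecs, mimeType, sdpFmtpLine);
  if (codec === null) return false;
  transceiver.setCodecPreferences([codec]);
  return true;
}
```

The second lookup is the step the proposal would make unnecessary.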

@youennf
Author

youennf commented Nov 18, 2021

@chcunningham , you seemed ok with this approach in the webrtc thread.
Do you have any additional feedback?
Should we loop in additional people or is it reasonable to start writing a PR?

@drkron
Contributor

drkron commented Nov 19, 2021

I'm happy to see that there's an interest in using MediaCapabilities API for WebRTC.

I think that an example is helpful to understand how the API would be used. So the API is called something like this:

let mediaConfig = {
  type: 'webrtc',
  audio: {
    contentType: 'audio/opus',
    channels: '2',
    bitrate: 132266,
    samplerate: 48000
  },
  video: {
    contentType: 'video/VP9; profile-id=1',
    width: 1280,
    height: 720,
    bitrate: 1234567,
    framerate: '25'
  }
};

result = await navigator.mediaCapabilities.decodingInfo(mediaConfig);

and given the proposed PR, result would be something like this:

result = {
  supported: true,
  smooth: true,
  powerEfficient: false,
  webrtcCodec: {
    clockRate: 90000,
    mimeType: 'video/VP9',
    sdpFmtpLine: 'profile-id=0'
  }
}

result.webrtcCodec could next be used as input to setCodecPreferences() to select this as the preferred video codec for this transceiver.

Is this a correct understanding?

Do we need a corresponding entry for the audio codec?

I think that a drawback of the MediaCapabilities API in this context is that the user needs to know the available codecs since there's no way to get a list of all codecs. This may not be a problem in practice though since there are only a few codecs to choose from.
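
As a sketch of that last step: assuming the proposed `webrtcCodec` member exists on the result (it is only proposed in this thread, not a shipped API), the returned dictionary could be moved to the front of the codec list passed to `setCodecPreferences()`. `moveCodecFirst` and `preferDecodableCodec` are hypothetical helpers; the comparison uses `mimeType` and `sdpFmtpLine` only.

```javascript
// Hypothetical helper: return a copy of `codecs` with the entries matching
// `preferred` (same mimeType and sdpFmtpLine) moved to the front.
function moveCodecFirst(codecs, preferred) {
  const matches = (codec) =>
    codec.mimeType.toLowerCase() === preferred.mimeType.toLowerCase() &&
    codec.sdpFmtpLine === preferred.sdpFmtpLine;
  return [...codecs.filter(matches), ...codecs.filter((c) => !matches(c))];
}

// Browser-only usage; `result.webrtcCodec` is the field proposed above.
async function preferDecodableCodec(transceiver, mediaConfig) {
  const result = await navigator.mediaCapabilities.decodingInfo(mediaConfig);
  if (!result.supported || !result.webrtcCodec) return;
  const { codecs } = RTCRtpReceiver.getCapabilities('video');
  transceiver.setCodecPreferences(moveCodecFirst(codecs, result.webrtcCodec));
}
```

Keeping the remaining codecs in the list leaves room for fallback during negotiation; passing only the preferred codec would restrict the transceiver to it.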

@youennf
Author

youennf commented Nov 19, 2021

Is this a correct understanding?

Yes

Do we need a corresponding entry for the audio codec?

Oh right, I forgot about a combined query.
I guess we could go with webrtcCodec.audio/webrtcCodec.video or webrtcAudioCodec/webrtcVideoCodec.
Wdyt?

I think that a drawback of the MediaCapabilities API in this context is that the user needs to know the available codec

The use case here is mainly to select one particular codec or a small list of preferred codecs.
Agreed this does not work super well for the case of reordering all webrtc codecs but I am not sure how used/useful that is.

@drkron
Contributor

drkron commented Nov 22, 2021

Oh right, I forgot about a combined query. I guess we could go with webrtcCodec.audio/webrtcCodec.video or webrtcAudioCodec/webrtcVideoCodec. Wdyt?

I have a slight preference for webrtcCodec.audio/webrtcCodec.video since it more clearly groups the WebRTC specifics, but I don't have a strong opinion.

@chcunningham
Contributor

Should we loop in additional people or is it reasonable to start writing a PR?

Looking at the PR now

I have a slight preference for webrtcCodec.audio/webrtcCodec.video since it more clearly groups the WebRTC specifics, but I don't have a strong opinion.

+1

@chcunningham
Contributor

PR generally looks good, but I have one concern: how should we set clockrate if the system / codec supports multiple rates?

For example, I see getCapabilities() returns two entries with mimeType: "audio/ISAC". The first has clockRate: 16000 the second has clockRate: 32000.

The other members of RTCRtpCodecCapability can all be derived from the input configuration.

  • mimeType and sdpFmtpLine are found in our contentType string
  • channels is found in our channels string

Should we follow that model for clockRate, adding it to the input dictionary? Would it be reasonable to instead always take the top clock rate? Open to alternatives... my WebRTC familiarity isn't strong enough for me to make a firm suggestion.

@youennf
Author

youennf commented Nov 24, 2021

how should we set clockrate if the system / codec supports multiple rates?

Looking specifically at ISAC, it has wideband (16 kHz sample rate, 16 kHz clock rate) and super-wideband (32 kHz sample rate, 32 kHz clock rate).
I believe https://www.w3.org/TR/media-capabilities/#dom-audioconfiguration-samplerate could be used for selection.

If there are several matching potential codec configurations, the default one (in terms of getCapabilities list order) should probably be selected. In the ISAC case, that would mean wideband for Chrome.
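
A minimal sketch of that selection rule, assuming `samplerate` maps directly onto `clockRate` (true for the ISAC example, not established for every codec). `selectCodec` is a hypothetical helper, and `codecs` stands for the list returned by `getCapabilities()`, in its original order.

```javascript
// Hypothetical helper: pick the capability for `mimeType`. When `samplerate`
// is given, require a matching clockRate; otherwise fall back to the first
// matching entry in getCapabilities() order.
function selectCodec(codecs, mimeType, samplerate) {
  const candidates = codecs.filter(
    (codec) => codec.mimeType.toLowerCase() === mimeType.toLowerCase()
  );
  const chosen =
    samplerate === undefined
      ? candidates[0]
      : candidates.find((codec) => codec.clockRate === samplerate);
  return chosen === undefined ? null : chosen;
}
```

With the two ISAC entries mentioned earlier, omitting `samplerate` yields the 16000 entry (the first one listed), while `samplerate: 32000` yields the super-wideband entry.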

Should we follow that model for clockRate, adding it to the input dictionary?

That could be useful if we want to tackle the case of a codec with a sample rate but several clock rates.
I am not sure how useful that is in practice and would treat it as a separate issue.

@aboba, @alvestrand, thoughts?

@chcunningham
Contributor

@aboba @alvestrand friendly ping for thoughts.

I follow the idea. I'm not RTC savvy enough to say whether sample rate and clock rate can/should be tied in this way. Also, sample rate is not currently required for MC, so the defaulting scenario is potentially real. We could alternatively make sample rate required (just for RTC) if that is desirable.

mimeType and sdpFmtpLine are found in our contentType string

I want to revisit this. I note that the sdpFmtpLine can be pretty long in some cases... for ex:
"mimeType": "video/H264",
"sdpFmtpLine": "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f"

@drkron I know profile-level-id is covered by contentType. What about level-asymmetry-allowed and packetization-mode?

@drkron
Contributor

drkron commented Dec 9, 2021

The sdpFmtpLine is more or less the same as the parameters of the mime/media type so they are covered by contentType as well. Here's the full list of H264 parameters, https://datatracker.ietf.org/doc/html/rfc6184#section-8.1
but I think that it's only level-asymmetry-allowed, packetization-mode, and profile-level-id that are used in WebRTC at the moment.

@chcunningham
Contributor

Thanks @drkron. Any thoughts on the clock rate question?

@alvestrand

There's a 1-to-1 relationship between the triplet of the "m=" line, the "a=rtpmap" line and the "a=fmtp" line, and the MIME type (with parameters).

The "video" part of the MIME type lives on the m= line, the "h264" or "vp8" part of the MIME type lives on the a=rtpmap line, and the rest of the parameters live on the a=fmtp line. This is defined in the registration rules for RTP mime types (forgive me for not having the RFC number handy).

In addition, the clock rate and the number of channels (for audio) are supposed to be represented in the MIME type format as parameters.

For audio, the clock rate is an important parameter that the user needs to select for, so it needs to be part of the input parameters.
For video, it's always 90000, so nobody cares about it.
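
To make the mapping concrete, here is a rough sketch that rebuilds codec descriptions from the lines of one SDP media section. `codecsFromSdpSection` is a hypothetical helper; it assumes the lines are already split and trimmed, and that each `a=rtpmap` line precedes the `a=fmtp` line for the same payload type.

```javascript
// Sketch: the media kind comes from the m= line, the subtype, clock rate and
// (for audio) channel count from a=rtpmap, and the remaining parameters from
// a=fmtp. Results are keyed by payload type and returned in rtpmap order.
function codecsFromSdpSection(lines) {
  let kind = null;
  const byPayloadType = new Map();
  for (const line of lines) {
    let m;
    if ((m = /^m=(\w+) /.exec(line))) {
      kind = m[1];
    } else if ((m = /^a=rtpmap:(\d+) ([^/]+)\/(\d+)(?:\/(\d+))?$/.exec(line))) {
      const codec = { mimeType: `${kind}/${m[2]}`, clockRate: Number(m[3]) };
      if (m[4] !== undefined) codec.channels = Number(m[4]);
      byPayloadType.set(m[1], codec);
    } else if ((m = /^a=fmtp:(\d+) (.+)$/.exec(line))) {
      const codec = byPayloadType.get(m[1]);
      if (codec) codec.sdpFmtpLine = m[2];
    }
  }
  return [...byPayloadType.values()];
}
```

For an audio section containing `a=rtpmap:111 opus/48000/2` and `a=fmtp:111 minptime=10;useinbandfec=1`, this produces an entry with `mimeType: 'audio/opus'`, `clockRate: 48000`, `channels: 2` and the fmtp string as `sdpFmtpLine`.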

@chcunningham
Contributor

chcunningham commented Jan 13, 2022

Thanks @alvestrand. Let me try to combine your insights w/ my ISAC example. Relevant lines from my createOffer() call are

  • m=audio 9 UDP/TLS/RTP/SAVPF 111 63 103 104 9 0 8 110 112 113 126
  • a=rtpmap:103 ISAC/16000
  • a=rtpmap:104 ISAC/32000

so we have "audio/ISAC ...", but I'm not sure how best to include the clock rate since it is not part of the a=fmtp line (there is no such line for this codec). Would it be correct to pass "audio/ISAC/16000" as the MIME type for MediaCapabilities? Note that this has implications for other codecs; opus would then become "audio/opus/48000/2".

@alvestrand

alvestrand commented Jan 14, 2022 via email

@chcunningham
Contributor

Thanks @alvestrand. Do you think rate should be a strictly required part of the MIME type (at least for audio)? Earlier in the thread we talked about maybe defaulting to whatever would otherwise come first in terms of getCapabilities list order. If we return an RTCRtpCodecCapability as part of the output MediaCapabilitiesInfo (per Youenn's PR), callers could inspect the return value to know what we defaulted to.

@alvestrand

alvestrand commented Jan 17, 2022 via email

@chcunningham
Contributor

@drkron thoughts on the above? We currently aren't enforcing any requirements on having a rate. For example, one of the wpt tests expects supported=true for audio/ISAC

@drkron
Contributor

drkron commented Feb 1, 2022

My thought on this is that specifying channels and samplerate as explicit dictionary members of AudioConfiguration, instead of as MIME type parameters (although that is an option according to the specs), seems most in line with what's returned from RTCRtpReceiver.getCapabilities("audio"). The clockRate can probably be deduced from the MIME type and samplerate? If channels/samplerate have not been specified, it sounds good to me to use whatever comes first in terms of getCapabilities as the default value.

However, I think that @alvestrand and @youennf are the experts here so I wouldn't argue if they have a different opinion.

@chcunningham
Contributor

I follow. My leaning is to do whatever is easiest for API users. If RTC APIs typically break out components of samplerate etc, I agree it makes a compelling case for MC to do the same. I defer to RTC folks to build consensus on that. My main priorities are to ensure MC requires enough that inputs are clearly defined and meaning is always unambiguous.

@alvestrand

Either representing as part of the MIME parameters or as separate attributes works, technically. I have a weak preference for separate attributes.

@chcunningham
Contributor

I think we're mostly converged. Let's wrap this up by planning out how to amend @youennf's PR (#186).

Currently the PR has a few sentences like:

... set webrtc’s audio to a RTCRtpCodecCapability dictionary representing the supported audio configuration.

We should add some steps here to clarify how to construct the RTCRtpCodecCapability. The dictionary consists of

dictionary RTCRtpCodecCapability {
  required DOMString mimeType;
  required unsigned long clockRate;
  unsigned short channels;
  DOMString sdpFmtpLine;
};

Using our discussion above to map that from MediaCapabilities inputs, we have

  • webrtcCodec.audio.mimeType = the audio/* part of mcInput.audio.contentType
  • webrtcCodec.audio.clockRate = ...
    • if mcInput.audio.clockRate (new thing) is provided, assign mcInput.audio.clockRate
    • else if the MIME type in mcInput.audio.contentType is audio/PCMA or audio/PCMU, default to 8000
    • else throw TypeError (clockRate is required).
  • webrtcCodec.audio.channels = mcInput.audio.channels (string to int conversion here. throw TypeError for invalid values)
  • webrtcCodec.audio.sdpFmtpLine = mcInput.audio.contentType with leading audio/* removed
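
The steps above can be sketched roughly as follows. `audioCodecCapabilityFromConfig` is a hypothetical name, `audio.clockRate` is the proposed new input member, and the assumed `contentType` shape (`audio/<subtype>` optionally followed by `;`-separated parameters) is a guess, since the thread has not settled what callers pass in.

```javascript
// Sketch of the proposed mapping from a Media Capabilities audio
// configuration to an RTCRtpCodecCapability-shaped object.
function audioCodecCapabilityFromConfig(audio) {
  // Assumed input shape: "audio/opus; minptime=10; useinbandfec=1".
  const [mimeType, ...params] = audio.contentType.split(';').map((s) => s.trim());
  const capability = { mimeType };

  if (audio.clockRate !== undefined) {
    capability.clockRate = audio.clockRate;
  } else if (/^audio\/pcm[au]$/i.test(mimeType)) {
    capability.clockRate = 8000; // PCMA and PCMU default to 8000
  } else {
    throw new TypeError('clockRate is required');
  }

  // channels is a string in Media Capabilities; convert it to an integer.
  if (audio.channels !== undefined) {
    if (!/^\d+$/.test(audio.channels)) throw new TypeError('invalid channels');
    capability.channels = Number(audio.channels);
  }

  // Whatever follows the MIME type becomes the sdpFmtpLine.
  const fmtp = params.filter((p) => p !== '').join(';');
  if (fmtp !== '') capability.sdpFmtpLine = fmtp;
  return capability;
}
```

Under these assumptions, `{ contentType: 'audio/opus; minptime=10; useinbandfec=1', channels: '2', clockRate: 48000 }` maps to an `audio/opus` capability with `clockRate: 48000`, `channels: 2` and `sdpFmtpLine: 'minptime=10;useinbandfec=1'`.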

@drkron @alvestrand does this match your expectations? @youennf thoughts?

@aboba

aboba commented Apr 12, 2022

If the goal is to replace getCapabilities() entirely, you'd also need to return info on "codecs" like rtx, ulpfec, red, flexfec, etc.
To see what would be returned, look here.

@chcunningham
Contributor

@alvestrand @youennf - thoughts on the last 2 comments?

@youennf
Author

youennf commented Apr 29, 2022

If the goal is to replace getCapabilities() entirely, you'd also need to return info on "codecs" like rtx, ulpfec, red, flexfec, etc.

As discussed during the last WebRTC WG meeting, the goal is not to replace getCapabilities() entirely, just to cover the real media codecs that, for instance, WebTransport+WebCodecs applications could be interested in.

We should add some steps here to clarify how to construct the RTCRtpCodecCapability.

Sounds good.

@drkron @alvestrand does this match your expectations? @youennf thoughts?

Overall, this looks good.
@alvestrand mentions that clockRate would be a required member and I wonder whether we could have it as an optional parameter (if not provided, use default values as currently being done by getCapabilities).
Maybe we could leave clockRate to a follow-up PR?

I am also still unclear about whether we are fine/want defaulting rules or not.
Say I do not provide channels, or I provide just 'video/H264' or just 'video/H264' + profile-level-id: can I still get a valid capability dictionary?
Having default rules might be more web dev friendly so it is appealing to me.

If we anticipate WebTransport+WebCodecs RTC applications to use the 'webrtc' MC type (are we?), it seems we should not require these applications to pass parameters that would be RTP/SDP specific.

@alvestrand

I'm happy with chcunningham's proposed algorithm for constructing an RTCRtpCodecCapability.
I don't quite understand "webrtcCodec.audio.mimeType = the audio/* part of mcInput.audio.contentType" - could you give an example of the strings that would be used?

RTCRtpReceiver.getCapabilities('audio') currently returns (example):

mimeType: "audio/opus"; sdpFmtpLine: "minptime=10;useinbandfec=1"

What's passed in as mcInput.audio.contentType?

@aboba

aboba commented May 10, 2022

Discussion at April 2022 WEBRTC WG meeting is here. Summary is that there is little interest in having Media Capabilities return info on "fake" codecs (e.g. telephone-event, CN, FEC, RTX, RED).

@chrisn
Member

chrisn commented Aug 3, 2023

Minutes from April 12 2022 Media WG meeting: https://www.w3.org/2022/04/12-mediawg-minutes.html (preceded the WebRTC meeting)


6 participants