[YouTube] Support @owner format in uploader_id etc #31675

Merged 2 commits on Feb 24, 2023

dirkf (Contributor) commented Feb 24, 2023

Boilerplate: own code, bug fix

Description of your pull request and other information

According to #31530, YouTube has changed its metadata so that the owner URL has the format .../@ownerslug rather than .../user/user_id or .../channel/channel_id. Some, if not all, music videos retain the previous scheme.

This caused several problems:

  1. Owing to an extraction bug, failing to match the previous patterns crashed the extractor.
  2. After resolving the crash, various metadata items were not being extracted, and some study was needed to work out what values should be extracted.
  3. Resolving that changed the extracted values, so many existing tests failed.
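
The crash in item 1 is the familiar failure of calling `.group()` on a regex match that did not succeed. The following is a minimal sketch of that failure and a guarded alternative, not the extractor's actual code: the pattern and both function names are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical pattern covering only the older owner URL schemes.
OLD_OWNER_URL = re.compile(r'/(?:user|channel)/(?P<id>[^/?#&]+)')


def uploader_id_unguarded(owner_url):
    # Raises AttributeError for a /@handle URL, because re.search()
    # returns None and None has no .group() method.
    return OLD_OWNER_URL.search(owner_url).group('id')


def uploader_id_guarded(owner_url):
    # Returns None instead of raising when the pattern does not match.
    mobj = OLD_OWNER_URL.search(owner_url or '')
    return mobj.group('id') if mobj else None
```

With the guarded form, an unexpected URL format costs one missing metadata field rather than the whole extraction.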

Proposed final resolution from linked post:

  • uploader is the author text value
  • channel is the same
  • uploader_id becomes the @... value
  • channel_id is channelId
  • uploader_url is the author URL with /@...
  • channel_url is /channel/{channel_id} at least while that is valid.
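
The proposed mapping can be sketched as a small function. This is an illustration under stated assumptions rather than the code in this PR: `owner_fields` and its three parameters are hypothetical stand-ins for the author text, the `channelId` value and the author URL found in the page data, and the `https://www.youtube.com` host in `channel_url` is assumed.

```python
import re


def owner_fields(author, channel_id, owner_url):
    """Build the six owner-related fields from three input values.

    All three parameters are hypothetical inputs: the author text,
    the channelId value and the author (owner) URL.
    """
    # The handle is the '@...' path component of the owner URL, if any.
    mobj = re.search(r'/(@[^/?#&]+)', owner_url or '')
    handle = mobj.group(1) if mobj else None
    return {
        'uploader': author,
        'channel': author,
        # None for URLs in the older /user/ or /channel/ scheme; how
        # those should be handled is not covered by this sketch.
        'uploader_id': handle,
        'channel_id': channel_id,
        'uploader_url': owner_url,
        'channel_url': (
            'https://www.youtube.com/channel/%s' % channel_id
            if channel_id else None),
    }
```

For example, `owner_fields('Some Owner', 'UCxxxx', 'http://www.youtube.com/@someowner')` yields `'@someowner'` as `uploader_id` and a `/channel/UCxxxx` URL as `channel_url`.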

The linked post notes that these metadata items may be available in various parts of the YouTube webpage, with slightly different access paths depending on the page type.

Thus, this PR

  1. updates the YouTube extractor to handle the new format
  2. returns values as proposed in #31530 (comment)
  3. updates the tests affected by the changes
  4. solves a long-standing issue in the download test that prevented testing cases where the id changed, in order to support one of the tests.

Closes #31568
Resolves #31530

Signalled by regexp ID value, eg: `'id': r're:[\da-zA-Z_-]{8,}'`
* implement ytdl-org#31530 (comment)
* update affected tests
* misc clean-ups
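
The regexp ID value in the commit message above suggests that a download test can declare its expected `id` as a pattern rather than a fixed string. A rough sketch of that idea, assuming the `re:` prefix convention shown in the example; `id_matches` is a hypothetical name and not the test suite's actual helper:

```python
import re


def id_matches(expected_id, got_id):
    """Compare an extracted id with a test's expected 'id' value.

    Hypothetical helper: an expected value starting with 're:' is
    treated as a regular expression matched from the start of the id;
    any other value must be equal to the id.
    """
    if expected_id.startswith('re:'):
        return re.match(expected_id[len('re:'):], got_id) is not None
    return expected_id == got_id
```

A test whose id changes between runs could then use `'id': r're:[\da-zA-Z_-]{8,}'` and accept any id of at least 8 characters drawn from that set.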
@dirkf changed the title from "Newmaster" to "[YouTube] Support @owner format in uploader_id etc" on Feb 24, 2023

dirkf commented Feb 24, 2023

These tests pass locally but not in CI:

2023-02-24T03:41:39.4591969Z ======================================================================
2023-02-24T03:41:39.4592162Z FAIL: test_YoutubePlaylist_2 (test.test_download.TestDownload):
2023-02-24T03:41:39.4592438Z ----------------------------------------------------------------------
2023-02-24T03:41:39.4592564Z Traceback (most recent call last):
2023-02-24T03:41:39.4592944Z   File "/home/runner/work/youtube-dl/youtube-dl/test/test_download.py", line 192, in test_template
2023-02-24T03:41:39.4593104Z     len(res_dict['entries'])))
2023-02-24T03:41:39.4593493Z   File "/home/runner/work/youtube-dl/youtube-dl/test/helper.py", line 266, in assertGreaterEqual
2023-02-24T03:41:39.4593638Z     self.assertTrue(got >= expected, msg)
2023-02-24T03:41:39.4593974Z AssertionError: Expected at least 1 in playlist https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu, but got only 0
2023-02-24T03:41:39.4594254Z -------------------- >> begin captured stdout << ---------------------
2023-02-24T03:41:39.4594455Z [youtube:tab] PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu: Downloading webpage
2023-02-24T03:41:39.4594594Z [download] Downloading playlist: JODA15
2023-02-24T03:41:39.4594749Z [youtube:tab] playlist JODA15: Downloading 0 videos
2023-02-24T03:41:39.4594907Z [download] Finished downloading playlist: JODA15
2023-02-24T03:41:39.4594916Z 
2023-02-24T03:41:39.4595193Z --------------------- >> end captured stdout << ----------------------
2023-02-24T03:41:39.4595201Z 
2023-02-24T03:41:39.4595314Z ======================================================================
2023-02-24T03:41:39.4595497Z FAIL: test_YoutubeTab_20 (test.test_download.TestDownload):
2023-02-24T03:41:39.4595767Z ----------------------------------------------------------------------
2023-02-24T03:41:39.4596076Z Traceback (most recent call last):
2023-02-24T03:41:39.4596472Z   File "/home/runner/work/youtube-dl/youtube-dl/test/test_download.py", line 192, in test_template
2023-02-24T03:41:39.4596647Z     len(res_dict['entries'])))
2023-02-24T03:41:39.4597023Z   File "/home/runner/work/youtube-dl/youtube-dl/test/helper.py", line 266, in assertGreaterEqual
2023-02-24T03:41:39.4597169Z     self.assertTrue(got >= expected, msg)
2023-02-24T03:41:39.4597425Z AssertionError: Expected at least 350 in playlist https://www.youtube.com/hashtag/cctv9, but got only 339
2023-02-24T03:41:39.4597704Z -------------------- >> begin captured stdout << ---------------------
2023-02-24T03:41:39.4597826Z [youtube:tab] cctv9: Downloading webpage
2023-02-24T03:41:39.4597964Z [download] Downloading playlist: #cctv9
2023-02-24T03:41:39.4598090Z [youtube:tab] Downloading page 1
2023-02-24T03:41:39.4598211Z [youtube:tab] Downloading page 2
2023-02-24T03:41:39.4598333Z [youtube:tab] Downloading page 3
2023-02-24T03:41:39.4598459Z [youtube:tab] Downloading page 4
2023-02-24T03:41:39.4598579Z [youtube:tab] Downloading page 5
2023-02-24T03:41:39.4598720Z [youtube:tab] playlist #cctv9: Downloading 339 videos
2023-02-24T03:41:39.4598850Z [download] Downloading video 1 of 339
2023-02-24T03:41:39.4598979Z [download] Downloading video 2 of 339
...
2023-02-24T03:41:39.4650471Z [download] Downloading video 338 of 339
2023-02-24T03:41:39.4650595Z [download] Downloading video 339 of 339
2023-02-24T03:41:39.4650765Z [download] Finished downloading playlist: #cctv9
2023-02-24T03:41:39.4650935Z 
2023-02-24T03:41:39.4651433Z --------------------- >> end captured stdout << ----------------------
2023-02-24T03:41:39.4651442Z 
2023-02-24T03:41:39.4651570Z ======================================================================

Presumably there is some regional difference in the availability of the playlist items, but setting the X-Forwarded-For (XFF) header seems to have no effect.

@dirkf dirkf merged commit f7ce98a into ytdl-org:master Feb 24, 2023

dirkf commented Mar 4, 2023

> I don't really understand the difference between merging a commit into the master branch, and doing a release. Isn't that the same thing?

A release starts from a selected commit and then adds preparation (e.g. the changelog), a build step, and a deployment step that makes the built versions available.

See #31585.

@CeliaBlorville

Thank you @dirkf for the fix! Everything works fine now :)

dirkf commented Mar 9, 2023

> Tried to upgrade with brew the version there is from 2021

Read #31530.
