Youtube videos download is failing but scraper succeeds #285

Closed
benoit74 opened this issue Aug 2, 2024 · 2 comments · Fixed by #288

@benoit74
Collaborator

benoit74 commented Aug 2, 2024

Task: https://farm.openzim.org/pipeline/4b08e2a7-04e4-41ca-aad8-e9945225dd69

The task succeeded at the Zimfarm level, so the exit code is probably 0. However, the scraper gave up on downloading videos, so something is wrong here.

[youtube] U4N-17xJxaE: Downloading web player API JSON
[youtube] U4N-17xJxaE: Downloading web player API JSON
[youtube] U4N-17xJxaE: Downloading web player API JSON
[youtube] U4N-17xJxaE: Downloading m3u8 information
[youtube2zim::2024-08-01 07:27:41,717] ERROR:Video file for U4N-17xJxaE could not be downloaded
[youtube2zim::2024-08-01 07:27:41,718] DEBUG:ERROR: [youtube] U4N-17xJxaE: Requested format is not available. Use --list-formats for a list of available formats
[youtube2zim::2024-08-01 07:27:41,718] DEBUG:Attempting to download video file for 3Pvp9vCiVAU from cache...
[youtube] Extracting URL: 3Pvp9vCiVAU
[youtube] 3Pvp9vCiVAU: Downloading webpage
[youtube] 3Pvp9vCiVAU: Downloading ios player API JSON
[youtube] 3Pvp9vCiVAU: Downloading player 20dfca59
[youtube] 3Pvp9vCiVAU: Downloading web player API JSON
[youtube] 3Pvp9vCiVAU: Downloading web player API JSON
[youtube] 3Pvp9vCiVAU: Downloading web player API JSON
[youtube] 3Pvp9vCiVAU: Downloading web player API JSON
[youtube] 3Pvp9vCiVAU: Downloading m3u8 information
[youtube2zim::2024-08-01 07:27:43,036] ERROR:Video file for 3Pvp9vCiVAU could not be downloaded
[youtube2zim::2024-08-01 07:27:43,036] DEBUG:ERROR: [youtube] 3Pvp9vCiVAU: Requested format is not available. Use --list-formats for a list of available formats
[youtube2zim::2024-08-01 07:27:43,036] DEBUG:Attempting to download video file for zNl00mOSnJI from cache...
[youtube] Extracting URL: zNl00mOSnJI
[youtube] zNl00mOSnJI: Downloading webpage
[youtube] zNl00mOSnJI: Downloading ios player API JSON
[youtube] zNl00mOSnJI: Downloading player 20dfca59
[youtube] zNl00mOSnJI: Downloading m3u8 information
[youtube2zim::2024-08-01 07:27:45,290] ERROR:Video file for zNl00mOSnJI could not be downloaded
[youtube2zim::2024-08-01 07:27:45,290] DEBUG:ERROR: [youtube] zNl00mOSnJI: Requested format is not available. Use --list-formats for a list of available formats
[youtube2zim::2024-08-01 07:27:45,290] ERROR:82 video(s) failed to download: ['9fx0zWFjL98', 'GQLGHi41HIg', '2VhSRhRo0so', 'sZpNMcWKWqg', '8rm19ZlLWyg', 'wnnf4gqUOHg', 'MR9YoHxiYqY', 'P0gyQppy4fk', 'heBuRJpETD0', 'UW7Vayv2mnw', 'QU_So3rvfPM', 'YYJ4iUEUQ9U', 'DNLnuUMIvKk', 'h6KMiJ4t4n8', 'N45AMmGRDEI', '-at_QNdfnIY', 'OaWJUwtRIIY', 'eHb1a5Ym8J8', 'ZGqSm4rFbWQ', 'U5RyazBI840', 'pkXGPO2uYfA', 'jnJXOrffshg', 'f8RayI8y_pU', 'L5eBmLYTuZo', 'l4WNyCccfbE', 'bdJnU-l87UA', '-BgvPUCexvI', 'GCdCgcgUdIw', '7n8-buBegjI', '8pLplpyW5ss', '85YWZsNmRwc', '2dvW5WpCt7Y', '7HK9C_Bxv24', 'oyeyxNhZhKQ', 'h9bpD69ZO0c', 'aX-dgsZGkIk', 'vDyoyr8R8IQ', 'GchQHOv94-A', '9YPU-ixXOP8', 'f7srI-bJGHQ', 'rFB-2IC7t1M', '2v-GzU0o8KI', 'QlRXMmRay8U', 'L5sfIEjSjQQ', '4LCPZZYa6M8', 'jJSt1ZwCpCg', 'W2c7uReipVs', 'kAjEonHdrE8', 'qolj_xNJCPg', 'UoLlHpylqIM', 'CKASD4Y-t_c', 'OkhcfgvjfAI', '_ablpsK5C68', 'HHY0tGmXTug', 'bouOnyr2NFE', 'mLnM9H2CJXU', 'nQDnXt0nzBU', 'pb2hKk1r4PQ', 'sgzN6SJrkRs', 'bvVB1EMMmpQ', 'Vu5l3JBaTew', 'OzrFLqm-OSk', 'CwPxicCa61o', 'QtM4FuVyV2U', 'q8d9HhBuLqM', '9KMwVLz3ucc', '5xG7MHVnm_I', 'E1KKwSK2hm4', '_vfZJDV7bZA', '5Tjd9juIs94', 'tWs-gGThOoc', 'r4SyrpT73YM', '7eaQsdp6lXU', 't_dHkPN2WAk', '-LRF4ouWFfs', 'fIoJC0JwzBs', '13gUw6CYXD4', 'dXm4Ar3K3BY', '2JAYq7grAts', 'U4N-17xJxaE', '3Pvp9vCiVAU', 'zNl00mOSnJI']
[youtube2zim::2024-08-01 07:27:45,291] CRITICAL:More than half of videos failed. exiting
[youtube2zim::2024-08-01 07:27:45,291] ERROR:Interrupting process due to error: Too much videos failed to download
[youtube2zim::2024-08-01 07:27:45,291] ERROR:Too much videos failed to download
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/youtube2zim/scraper.py", line 413, in run
    raise OSError("Too much videos failed to download")
OSError: Too much videos failed to download
[youtube2zim::2024-08-01 07:27:45,291] INFO:Finishing ZIM file…
[youtube2zim::2024-08-01 07:27:45,291] INFO:removing temp folder
[youtube2zim::2024-08-01 07:27:45,293] INFO:all done!

I suspect this will be hard to reproduce, but a code analysis might help.

This happened on 3.0.0; I don't know whether it is linked.

@benoit74 benoit74 added the bug label Aug 2, 2024
@benoit74 benoit74 added this to the 3.0.1 milestone Aug 2, 2024
@dan-niles
Collaborator

The following line was moved into a try-except block during the refactoring in #262.

raise OSError("Too much videos failed to download")

When this OSError is raised, the following handler runs. It logs the exception but does not re-raise it, so the scraper exits as if everything had completed successfully:

except Exception as exc:
    # request Creator not to create a ZIM file on finish
    self.zim_file.can_finish = False
    logger.error(f"Interrupting process due to error: {exc}")
    logger.exception(exc)
finally:
    logger.info("Finishing ZIM file…")
    self.zim_file.finish()

@benoit74 Instead of raising the error, shall I replace it with sys.exit(1)?
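
For reference, a minimal sketch of how the two options behave in plain Python. `FakeCreator` and `run` are hypothetical stand-ins that only mimic the shape of the handler quoted above; they are not the scraper's or libzim's real API. The sketch assumes `sys.exit(1)` would be called from inside the same `try` block.

```python
import sys


class FakeCreator:
    """Hypothetical stand-in for the scraper's ZIM creator (not the real API)."""

    def __init__(self):
        self.can_finish = True
        self.finished = False

    def finish(self):
        self.finished = True


def run(creator, fail):
    """Mimic the try/except/finally shape quoted above; `fail` triggers the error."""
    try:
        fail()
    except Exception as exc:
        creator.can_finish = False
        print(f"Interrupting process due to error: {exc}")
    finally:
        creator.finish()


def raise_oserror():
    raise OSError("Too much videos failed to download")


def call_sys_exit():
    sys.exit(1)


# Case 1: the OSError is caught and not re-raised, so run() returns normally
# and a script ending here would exit with code 0.
swallowed = FakeCreator()
run(swallowed, raise_oserror)

# Case 2: SystemExit derives from BaseException, so `except Exception` is
# skipped, but `finally` still runs before the exit propagates.
exited = FakeCreator()
try:
    run(exited, call_sys_exit)
    exit_code = 0
except SystemExit as exc:
    exit_code = exc.code
```

In this sketch, `sys.exit(1)` does produce a non-zero exit code, but it leaves `can_finish` set to `True` because the `except Exception` branch never runs. Whether the real creator then cleans up correctly is not something this sketch can show.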

@dan-niles dan-niles self-assigned this Aug 4, 2024
@benoit74
Collaborator Author

benoit74 commented Aug 5, 2024

I'm concerned that sys.exit(1) would probably not properly clean up the temporary resources created by libzim.

To find the proper solution, I suggest the following:

  • raise any Exception somewhere in the try block, after having started the creator and added a few files to the ZIM
  • confirm how to handle this properly: no temporary libzim resources left over + an exit code other than 0 + logging of the exception that is sufficient to debug (stack trace + exception message)
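
The experiment described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the fix that was eventually merged: `FakeCreator`, `add_item`, the file paths, and the `fail` parameter are all hypothetical, and the check for leftover libzim temporary resources has to be done against the real creator, which this stand-in cannot reproduce.

```python
import logging

logger = logging.getLogger("sketch")


class FakeCreator:
    """Hypothetical stand-in for the ZIM creator; not the real libzim API."""

    def __init__(self):
        self.can_finish = True
        self.items = []
        self.finished = False

    def add_item(self, path):
        self.items.append(path)

    def finish(self):
        self.finished = True


def run(creator, fail=True):
    """Return a process exit code instead of silently swallowing the failure."""
    succeeded = False
    try:
        # Start adding content, then fail part-way through.
        creator.add_item("videos/a.webm")
        creator.add_item("videos/b.webm")
        if fail:
            raise OSError("simulated failure after a few files were added")
        succeeded = True
    except Exception as exc:
        # Ask the creator not to produce a ZIM, and log message + stack trace.
        creator.can_finish = False
        logger.error(f"Interrupting process due to error: {exc}")
        logger.exception(exc)
    finally:
        # Always give the creator a chance to release its resources.
        creator.finish()
    return 0 if succeeded else 1


creator = FakeCreator()
exit_code = run(creator)
# A real entry point would end with: sys.exit(exit_code)
```

Returning the exit code from `run` keeps the cleanup in `finally` while still letting the caller report failure; re-raising the exception after cleanup would be another way to get a non-zero exit code.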
