-
Notifications
You must be signed in to change notification settings - Fork 26.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Quicktour Audio] Improve && remove ffmpeg dependency #16723
[Quicktour Audio] Improve && remove ffmpeg dependency #16723
Conversation
@@ -123,14 +123,21 @@ A continuación, carga el dataset (ve 🤗 Datasets [Quick Start](https://huggin | |||
>>> dataset = datasets.load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT | |||
``` | |||
|
|||
Debemos asegurarnos de que la frecuencia de muestreo del conjunto de datos coincide con la frecuencia de muestreo con la que se entrenó `facebook/wav2vec2-base-960h`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@osanseviero could you take a look at my amazing Spanish skills here? haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, your Spanish was flawless 🚀 nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeepL really is not bad then 😅
The documentation is not available anymore as the PR was closed or merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"}, | ||
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"}, | ||
{'text': 'HOW DO I FURN A JOINA COUT'}] | ||
>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we select the four first indices then still pass a dataset? Using a list like this contradicts the text above (you can pass a whole dataset to a pileline).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the blocker for this is that the pipeline expects
- A raw
ndarray
(hered["array"]
) - A dict of
{"raw": ndarray, "sampling_rate": 16_000}
butdatasets
creates{"array": ndarray, "sampling_rate": 16_000}
speech_recognizer(dataset["audio"]["array"])
....
Should work no ? (since the sampling rate is already taken care of beforehand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dataset["audio"]
loads + resamples the whole dataset. For audio it's important to first slice the list and then call the columns.
We could use dataset.select(...)
to remedy this but it'd still not work, e.g.:
from datasets import load_dataset
ds = load_dataset("PolyAI/minds14", "en-US", split="train")
ds = ds.select(range(4))
ds["audio"]["array"] # <- gives TypeError: list indices must be integers or slices, not str
Think the best I can do here is to rewrite the text a bit, no?
cc @lhoestq @albertovilla @polinaeterna @anton-l here as well - think that's a common problem we have with ds["audio"]["array"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok for me to rewrite the text and say it can take list/array/dataset. But the previous example looked simpler :-p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree - @lhoestq @albertovilla do you have a nifty trick here by any chance to allow one to pass ds[:4]["audio"]["array"]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't do ds[:4]["audio"]["array"]
. Indeed in the general case, the sampling rate may not be the same for all samples: sampling_rate
is an optional parameter of Audio
. So the format of ds[:4]["audio"]
is a list of dicts {"array":..., "sampling_rate":..., "path":...}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can update the pipeline
to accept both raw
and array
. Seems like the path of least resistance here, wdyt ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the text and merging for now though
[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'}, | ||
{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"}, | ||
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"}, | ||
>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for improving Patrick!
What does this PR do?
Fixes #16563
As discussed in #16563 , it's not good if the official quicktour example depends on Quicktour. Let's rather let
datasets
handle the audio loading and resampling here. IMO, it's also important to directly showcase here how to resample the audio.Before submitting
Pull Request section?
to it if that's the case.
documentation guidelines, and
here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.