Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Quicktour Audio] Improve && remove ffmpeg dependency #16723

Merged

Conversation

patrickvonplaten
Copy link
Contributor

@patrickvonplaten patrickvonplaten commented Apr 12, 2022

What does this PR do?

Fixes #16563

As discussed in #16563 , it's not good if the official quicktour example depends on Quicktour. Let's rather let datasets handle the audio loading and resampling here. IMO, it's also important to directly showcase here how to resample the audio.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@@ -123,14 +123,21 @@ A continuación, carga el dataset (ve 🤗 Datasets [Quick Start](https://huggin
>>> dataset = datasets.load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
```

Debemos asegurarnos de que la frecuencia de muestreo del conjunto de datos coincide con la frecuencia de muestreo con la que se entrenó `facebook/wav2vec2-base-960h`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@osanseviero could you take a look at my amazing Spanish skills here? haha

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, your Spanish was flawless 🚀 nice!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeepL really is not bad then 😅

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 12, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
{'text': 'HOW DO I FURN A JOINA COUT'}]
>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we select the four first indices then still pass a dataset? Using a list like this contradicts the text above (you can pass a whole dataset to a pileline).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the blocker for this is that the pipeline expects

  • A raw ndarray (here d["array"])
  • A dict of {"raw": ndarray, "sampling_rate": 16_000} but datasets creates {"array": ndarray, "sampling_rate": 16_000}
speech_recognizer(dataset["audio"]["array"])
....

Should work no ? (since the sampling rate is already taken care of beforehand)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataset["audio"] loads + resamples the whole dataset. For audio it's important to first slice the list and then call the columns.

We could use dataset.select(...) to remedy this but it'd still not work, e.g.:

from datasets import load_dataset

ds = load_dataset("PolyAI/minds14", "en-US", split="train")
ds = ds.select(range(4))

ds["audio"]["array"] # <- gives TypeError: list indices must be integers or slices, not str

Think the best I can do here is to rewrite the text a bit, no?

cc @lhoestq @albertovilla @polinaeterna @anton-l here as well - think that's a common problem we have with ds["audio"]["array"]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for me to rewrite the text and say it can take list/array/dataset. But the previous example looked simpler :-p

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree - @lhoestq @albertovilla do you have a nifty trick here by any chance to allow one to pass ds[:4]["audio"]["array"] ?

Copy link
Member

@lhoestq lhoestq Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do ds[:4]["audio"]["array"]. Indeed in the general case, the sampling rate may not be the same for all samples: sampling_rate is an optional parameter of Audio. So the format of ds[:4]["audio"] is a list of dicts {"array":..., "sampling_rate":..., "path":...}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can update the pipeline to accept both raw and array. Seems like the path of least resistance here, wdyt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the text and merging for now though

[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
>>> raw_audio_waveforms = [d["array"] for d in dataset[:4]["audio"]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Member

@stevhliu stevhliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving Patrick!

@patrickvonplaten patrickvonplaten merged commit 9a2995e into huggingface:main Apr 18, 2022
@patrickvonplaten patrickvonplaten deleted the remove_ffmpeg_quigkhouc_ branch April 18, 2022 14:50
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
)

* [Quicktour Audio] Improve && remove ffmpeg dependency

* final fix

* final touches
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when running "Quick Tour" code snippets
7 participants