Add support for audio data to DataPack #582

Closed
mylibrar opened this issue Jan 7, 2022 · 0 comments · Fixed by #585
Labels
topic:audio, topic: data (Issue about data loader modules and data processing related), topic: infra (Core infrastructure related issues.)

Comments

mylibrar (Collaborator) commented Jan 7, 2022

Is your feature request related to a problem? Please describe.
We want DataPack to support multimedia data representations, including text and audio. This would allow DataPack to handle audio processing tasks (e.g., audio classification, audio separation, and segmentation) and would also enable multimedia use cases such as automatic speech recognition (ASR).

Describe the solution you'd like
Support an audio payload in DataPack as a numpy array, with sample_rate, span_unit, seq_length, and other related information added as metadata.

Currently DataPack uses DataPack._text to store text data, so maybe we can use DataPack._audio to store the audio payload. Correspondingly, we might need to add some basic operations on the audio data, such as DataPack.audio and DataPack.set_audio().
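
A minimal sketch of what this could look like, assuming a read-only `audio` property and a `set_audio()` setter that also takes the sample rate. `AudioPackSketch` is a hypothetical stand-in used only for illustration, not the actual DataPack class, and the `sample_rate` argument is an assumption about the eventual signature.

```python
from typing import Optional

import numpy as np


class AudioPackSketch:
    """Hypothetical stand-in for DataPack; not the real Forte class."""

    def __init__(self) -> None:
        self._text: str = ""
        # Audio payload stored next to the text payload.
        self._audio: Optional[np.ndarray] = None
        self._sample_rate: Optional[int] = None

    @property
    def audio(self) -> Optional[np.ndarray]:
        """Return the raw waveform, or None if no audio has been set."""
        return self._audio

    @property
    def sample_rate(self) -> Optional[int]:
        """Return the sample rate in Hz, or None if no audio has been set."""
        return self._sample_rate

    def set_audio(self, audio: np.ndarray, sample_rate: int) -> None:
        """Store the waveform together with its sample rate (assumed signature)."""
        if sample_rate <= 0:
            raise ValueError("sample_rate must be positive")
        self._audio = np.asarray(audio)
        self._sample_rate = sample_rate


# One second of silence at 16 kHz, mono.
pack = AudioPackSketch()
pack.set_audio(np.zeros(16000, dtype=np.float32), sample_rate=16000)
```

Keeping the sample rate next to the waveform in the setter is one possible design; it could equally live only in the metadata.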

The audio data can be stored as a numpy array, which is a commonly used data structure for raw waveforms. Python libraries such as librosa and soundfile can load audio files in different formats (wav, flac, mp3, etc.) into a numpy array along with the sample rate. We can add an AudioReader to wrap this loading operation.
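
To illustrate the loading step such a reader would wrap, here is a rough sketch that uses the standard-library `wave` module instead of librosa or soundfile, purely to stay dependency-light. It only handles 16-bit PCM WAV input. The function name `read_wav` and the `(num_frames, num_channels)` layout are assumptions, not part of any existing API.

```python
import io
import wave
from typing import Tuple

import numpy as np


def read_wav(source) -> Tuple[np.ndarray, int]:
    """Load a 16-bit PCM WAV file (path or file object) as (waveform, sample_rate)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav.getframerate()
        num_channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    # WAV stores little-endian samples, interleaved across channels.
    samples = np.frombuffer(frames, dtype="<i2")
    waveform = samples.reshape(-1, num_channels)
    return waveform, sample_rate


# Round trip: write a tiny mono WAV into memory, then read it back.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(np.arange(4, dtype="<i2").tobytes())
buffer.seek(0)
waveform, sample_rate = read_wav(buffer)
```

A real AudioReader would likely delegate to soundfile or librosa to cover flac, mp3, and other formats; the shape of the returned pair (array plus sample rate) is the part that carries over.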

We also need to store some metadata along with the waveform array. For example, the sample rate is the key piece of information that determines the unit of timestamps. Right now we have a span_unit for text, so maybe we should add a similar field for audio to handle different units (e.g., sample, segment, frame). Another example is the channel count, which specifies the number of channels in the audio (stereo or mono), though this can also be inferred from the shape of DataPack._audio. Other optional metadata includes bit depth, which indicates the resolution of each audio sample. All of this information can be stored in the Meta class.
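
As a sketch of how these fields might be grouped, the dataclass below collects the metadata named above and shows why the sample rate matters for timestamps. `AudioMetaSketch`, its field names, and its defaults are hypothetical; they are not the existing Meta class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioMetaSketch:
    """Hypothetical audio metadata container; not the real Meta class."""

    sample_rate: int                  # samples per second (Hz)
    span_unit: str = "sample"         # unit used by audio spans (assumed default)
    num_channels: int = 1             # 1 = mono, 2 = stereo
    bit_depth: Optional[int] = None   # optional resolution of each sample

    def samples_to_seconds(self, num_samples: int) -> float:
        """Convert a sample offset into a timestamp in seconds."""
        return num_samples / self.sample_rate

    def seconds_to_samples(self, seconds: float) -> int:
        """Convert a timestamp in seconds into the nearest sample offset."""
        return round(seconds * self.sample_rate)


meta = AudioMetaSketch(sample_rate=16000, bit_depth=16)
half_second = meta.samples_to_seconds(8000)
quarter_second_offset = meta.seconds_to_samples(0.25)
```

With a span unit of "sample", the same span covers a different duration at 8 kHz than at 16 kHz, which is why the sample rate has to travel with the waveform.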

mylibrar self-assigned this Jan 7, 2022
hunterhector added the topic: data, topic: infra, and topic:audio labels Jan 7, 2022