Add support for audio data to DataPack #582

mylibrar · 2022-01-07T20:06:40Z

Is your feature request related to a problem? Please describe.
We want DataPack to support multimedia data representations, including text and audio. This would allow DataPack to handle audio processing tasks (e.g., audio classification, audio separation and segmentation) and would enable multimedia use cases such as automatic speech recognition (ASR).

Describe the solution you'd like
Support an audio payload in DataPack as a numpy array, with sample_rate, span_unit, seq_length, and other related information added as metadata.

Currently DataPack uses DataPack._text to store text data, so we could use DataPack._audio to store the audio payload. Correspondingly, we might need to add some basic operations on the audio data, such as DataPack.audio and DataPack.set_audio().
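To make the proposal concrete, here is a minimal sketch of how the payload and its accessors might look. `AudioPackSketch` is a hypothetical stand-in, not the real DataPack class, and storing `sample_rate` directly on the object is an assumption made only for illustration.

```python
from typing import Optional

import numpy as np


class AudioPackSketch:
    """Hypothetical stand-in for DataPack; not the actual Forte class."""

    def __init__(self) -> None:
        self._text: str = ""
        # Audio payload mirrors _text: a private field with public accessors.
        self._audio: Optional[np.ndarray] = None
        # Assumption: the sample rate is kept next to the payload.
        self.sample_rate: Optional[int] = None

    @property
    def audio(self) -> Optional[np.ndarray]:
        """Return the raw waveform, or None if no audio has been set."""
        return self._audio

    def set_audio(self, audio: np.ndarray, sample_rate: int) -> None:
        """Store the waveform together with its sample rate."""
        if sample_rate <= 0:
            raise ValueError("sample_rate must be a positive integer")
        self._audio = np.asarray(audio)
        self.sample_rate = sample_rate
```

With this shape, `pack.set_audio(waveform, 16000)` followed by `pack.audio` would behave much like the existing text accessors.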

The audio data can be stored as a numpy array, a commonly used data structure for raw waveforms. Python libraries like Librosa and soundfile support loading audio files in different formats (wav, flac, mp3, etc.) into a numpy array along with the sample rate. We can add an AudioReader to wrap this loading operation.
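As a rough illustration of the loading step an AudioReader would wrap, the sketch below reads a 16-bit PCM WAV file into a numpy array and returns it with the sample rate. It uses the standard-library `wave` module instead of Librosa or soundfile so that it runs without extra dependencies; `read_wav` is a hypothetical helper name, and a real reader would more likely call `soundfile.read`, which also returns the data and sample rate as a pair.

```python
import wave

import numpy as np


def read_wav(path: str):
    """Load a 16-bit PCM WAV file as (float32 waveform, sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav.getframerate()
        n_channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    # Interpret the bytes as little-endian int16 and scale to [-1.0, 1.0).
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if n_channels > 1:
        # Interleaved frames become shape (num_samples, num_channels).
        samples = samples.reshape(-1, n_channels)
    return samples, sample_rate
```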

We also need to store some metadata along with the waveform array. For example, the sample rate is the key piece of information that determines the unit of timestamps. Right now we have a span_unit for text, so maybe we should add a similar field for audio to handle different units (e.g., sample, segment, frame). Another example is channel, which specifies the number of channels of the audio (stereo or mono), though this can also be inferred from the shape of DataPack._audio. Other optional metadata includes bit depth, which indicates the resolution of each audio sample. All of this information can be stored in the Meta class.
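A small sketch of how such metadata could be used follows. `AudioMeta`, `num_channels`, and `span_to_seconds` are hypothetical names, not part of the existing Meta class, and the code assumes a samples-first array layout of shape (num_samples, num_channels), as soundfile returns.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class AudioMeta:
    """Hypothetical audio metadata; field names follow the proposal above."""

    sample_rate: int
    span_unit: str = "sample"
    bit_depth: Optional[int] = None


def num_channels(audio: np.ndarray) -> int:
    """Infer the channel count from the array shape (samples-first layout)."""
    return 1 if audio.ndim == 1 else audio.shape[1]


def span_to_seconds(begin: int, end: int, meta: AudioMeta) -> Tuple[float, float]:
    """Convert a span given in samples to timestamps in seconds."""
    if meta.span_unit != "sample":
        raise ValueError("this sketch only converts sample-based spans")
    return begin / meta.sample_rate, end / meta.sample_rate
```

For instance, at a 16000 Hz sample rate, the span from sample 8000 to sample 24000 covers 0.5 s to 1.5 s, which is why the sample rate has to travel with the waveform.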


mylibrar self-assigned this Jan 7, 2022

hunterhector added the topic: data, topic: infra, and topic:audio labels Jan 7, 2022

mylibrar mentioned this issue Jan 8, 2022

Add audio support to DataPack #585

Merged

hunterhector closed this as completed in #585 Jan 12, 2022
