Add audio support to DataPack #585

mylibrar · 2022-01-08T04:25:19Z

This PR fixes #582.

Description of changes

Add new dependency: soundfile
- soundfile>=0.10.3 is added to setup.py and docs/requirements.txt.
- The CI workflow main.yml now runs sudo apt-get install -y libsndfile1-dev, which soundfile requires on Linux.
Update DataPack with audio support
- Add _audio for the payload and sample_rate for the metadata.
- Add the corresponding operations.
Add AudioReader to load audio files
- Users can provide a directory and a file extension; the reader walks through the directory and loads every file with the specified suffix.
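
The directory walk described for AudioReader can be sketched roughly as follows. This is a minimal illustration, not the code in forte/data/readers/audio_reader.py: the function name iter_audio_files and the default ".flac" extension are assumptions, and the decoding step is left as a comment so the sketch runs without libsndfile installed.

```python
import os
from typing import Iterator


def iter_audio_files(root: str, file_ext: str = ".flac") -> Iterator[str]:
    """Yield paths of files under `root` whose names end with `file_ext`.

    Hypothetical helper for illustration only; not part of Forte's API.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        # Sort names so the order within each directory is deterministic.
        for name in sorted(filenames):
            if name.endswith(file_ext):
                # A real reader would decode the file at this point, e.g.
                #   audio, sample_rate = soundfile.read(path)
                # and store both values on the data pack.
                yield os.path.join(dirpath, name)
```

Matching on the suffix keeps the reader independent of any particular audio format; whether a given format can actually be decoded depends on the libsndfile build.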

Test Conducted

A unit test for AudioReader is added. It builds and runs an audio processing pipeline for automatic speech recognition (ASR) in order to jointly test AudioReader and DataPack.

codecov · 2022-01-08T04:31:11Z

Codecov Report

Merging #585 (b004259) into master (818564c) will increase coverage by 0.08%.
The diff coverage is 93.42%.

@@            Coverage Diff             @@
##           master     #585      +/-   ##
==========================================
+ Coverage   79.78%   79.86%   +0.08%     
==========================================
  Files         227      229       +2     
  Lines       16163    16239      +76     
==========================================
+ Hits        12896    12970      +74     
- Misses       3267     3269       +2

Impacted Files	Coverage Δ
forte/data/readers/audio_reader.py	`88.00% <88.00%> (ø)`
tests/forte/data/readers/audio_reader_test.py	`94.87% <94.87%> (ø)`
forte/data/data_pack.py	`78.27% <100.00%> (+0.45%)`	⬆️
forte/data/readers/__init__.py	`100.00% <100.00%> (ø)`
forte/data/base_pack.py	`77.35% <0.00%> (+1.28%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 818564c...b004259.

hunterhector

Thanks for the PR, looks like it works OK.

Some other small suggestions:

- We now have quite a few data samples in the data_samples folder. Could you add a README.md in that folder? In this PR, let's add a description of what the audio_reader_test folder contains.
- Once we merge this PR, we will need to start documenting the feature. So from the start, let's consider adding a new markdown file in the docs folder to serve as a tutorial for the audio project. Then we can link to it from https:/asyml/forte/wiki and from the root README.md.

hunterhector · 2022-01-09T21:00:19Z

forte/data/readers/audio_reader.py

+"""
+import os
+from typing import Any, Iterator
+import soundfile


So this reader depends on soundfile, which means soundfile needs to be in our core requirements. Otherwise, if we do pip install forte and then from forte.data.readers.misc_reader import xxx, the import would fail (since this reader is imported in the __init__.py).

We need to think of a better way to place this reader. Any suggestions?
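
One common way to handle this, sketched here as an assumption rather than as what this PR ended up doing, is to defer the import of the optional package until the reader is actually constructed, and to raise an error that tells the user what to install. The names require_optional and LazyAudioReader and the install hint are hypothetical.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency on demand, with a helpful error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"`{module_name}` is required for this feature but is not "
            f"installed. {install_hint}"
        ) from err


class LazyAudioReader:
    """Hypothetical reader that imports soundfile only when it is built."""

    def __init__(self) -> None:
        # Importing here, not at module top level, means that importing
        # other readers from the same package does not need soundfile.
        self._soundfile = require_optional(
            "soundfile", "Try `pip install soundfile`."
        )
```

With this layout, only code that instantiates the audio reader pays for the extra dependency; the trade-off is that a missing package is reported at construction time instead of at import time.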

Add audio support to DataPack

05ffd30

mylibrar requested a review from hunterhector January 8, 2022 04:46

hunterhector reviewed Jan 9, 2022

Suqi Sun added 2 commits January 11, 2022 12:15

Fix PR comments

33cbfdf

Update soundfile dependency and docs

a5e90d7

hunterhector reviewed Jan 11, 2022

forte/data/readers/audio_reader.py (outdated, resolved)

hunterhector reviewed Jan 11, 2022

setup.py (resolved)

Suqi Sun added 2 commits January 11, 2022 20:03

Update README for extra requirements

a86c03f

Add soundfile to docs/requirements.txt

b7c919c

hunterhector reviewed Jan 12, 2022

README.md (resolved)

Suqi Sun added 2 commits January 12, 2022 13:46

Update README for extra requirements

1d00cb8

Add wikipedia extra req to README

b004259

hunterhector approved these changes Jan 12, 2022

hunterhector merged commit 0a16879 into asyml:master Jan 12, 2022

mylibrar deleted the localtest branch January 12, 2022 23:17
