Any suggestion to fine-tune with a small dataset? #72

Open
hahunavth opened this issue Oct 10, 2024 · 0 comments

hahunavth commented Oct 10, 2024

Hi,

I tried fine-tuning with a small, clean Vietnamese speech dataset (about 100 hours of audio) that I collected myself from YouTube. Here are a few audio demos. However, the results did not meet my expectations.

Here’s how I prepare data:

  • Clean dataset: I used the Vietnamese data mentioned above. I filtered out collected audio segments shorter than 3 seconds so that every clip covers sub_sample_length = 3.072 (see the filtering sketch after this list).
  • Noise dataset: I downloaded the DNS Interspeech 2020 noise data from here: DNS-Challenge noise data.
  • RIR dataset: I downloaded the dataset from the release page here: RIR dataset.
  • Test dataset: I used the test set from DNS-Challenge: Test set.
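
For reference, here is a minimal sketch of the duration filter I describe above. It assumes 16 kHz mono WAV clips and hypothetical directory names (`clean_raw/`, `clean_filtered/`); only clips long enough to cover sub_sample_length = 3.072 s are kept.

```python
# Duration filter sketch (paths and threshold are illustrative).
from pathlib import Path
import shutil

import soundfile as sf

SRC_DIR = Path("clean_raw")       # hypothetical: raw YouTube segments
DST_DIR = Path("clean_filtered")  # hypothetical: clips kept for training
MIN_SECONDS = 3.072               # must cover sub_sample_length

DST_DIR.mkdir(parents=True, exist_ok=True)

kept, dropped = 0, 0
for wav_path in sorted(SRC_DIR.glob("*.wav")):
    info = sf.info(wav_path)                  # reads the header only
    duration = info.frames / info.samplerate  # clip length in seconds
    if duration >= MIN_SECONDS:
        shutil.copy(wav_path, DST_DIR / wav_path.name)
        kept += 1
    else:
        dropped += 1

print(f"kept {kept} clips, dropped {dropped} shorter than {MIN_SECONDS}s")
```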

I used an RTX 3080 GPU with a batch size of 12 and gradient accumulation steps set to 3, starting the model from the fullsubnet_best_model_58epochs.tar checkpoint (a rough sketch of this setup follows).
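
To make the setup concrete, here is a minimal PyTorch sketch of what I mean by resuming from the checkpoint with gradient accumulation. The `FullSubNet` import path, constructor arguments, checkpoint key, `train_loader`, and `loss_fn` are assumptions for illustration; the repo's own trainer handles this differently.

```python
# Fine-tuning sketch (not the repo's trainer): resume from the released
# checkpoint and accumulate gradients so an effective batch of 12 x 3 = 36
# fits on a single RTX 3080. Names marked "assumed" are guesses.
import torch

from fullsubnet.model import FullSubNet  # assumed import path

device = torch.device("cuda")
model = FullSubNet().to(device)  # assumed default constructor args

ckpt = torch.load("fullsubnet_best_model_58epochs.tar", map_location=device)
state_dict = ckpt.get("model", ckpt)  # key name assumed; fall back to raw dict
model.load_state_dict(state_dict)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr illustrative
accum_steps = 3  # gradient accumulation as described above

model.train()
for step, (noisy, clean) in enumerate(train_loader):  # train_loader assumed
    noisy, clean = noisy.to(device), clean.to(device)
    loss = loss_fn(model(noisy), clean)  # loss_fn assumed (e.g. masked MSE)
    (loss / accum_steps).backward()      # scale so accumulation averages
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```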

I trained for 15 epochs. However, the loss decreased only in the first few epochs and then started increasing. When I ran inference on a few samples, I noticed that the fine-tuned model left more residual noise than the original pretrained checkpoint did.

Am I missing something in the fine-tuning process?
Do you have any advice for me?

Thank you!
