Any suggestion to fine-tune with a small dataset? #72

Open
hahunavth opened this issue Oct 10, 2024 · 0 comments

hahunavth commented Oct 10, 2024

Hi,

I tried fine-tuning with a small, clean Vietnamese speech dataset (about 100 hours of audio) that I collected myself from YouTube. Here are a few audio demos. However, the results did not meet my expectations.

Here’s how I prepare data:

  • Clean dataset: I used the Vietnamese data mentioned above. I filtered out collected audio segments shorter than 3 seconds so that every clip covers sub_sample_length = 3.072 (see the filtering sketch after this list).
  • Noise dataset: I downloaded the DNS Interspeech 2020 noise data from here: DNS-Challenge noise data.
  • RIR dataset: I downloaded the dataset from the release page here: RIR dataset.
  • Test dataset: I used the test set from DNS-Challenge: Test set.
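
For reference, here is a minimal sketch of the duration filter I describe above. It assumes 16 kHz mono WAV clips and hypothetical directory names (`clean_raw/`, `clean_filtered/`); only clips long enough to cover sub_sample_length = 3.072 s are kept.

```python
# Duration filter sketch (paths and threshold are illustrative).
from pathlib import Path
import shutil

import soundfile as sf

SRC_DIR = Path("clean_raw")       # hypothetical: raw YouTube segments
DST_DIR = Path("clean_filtered")  # hypothetical: clips kept for training
MIN_SECONDS = 3.072               # must cover sub_sample_length

DST_DIR.mkdir(parents=True, exist_ok=True)

kept, dropped = 0, 0
for wav_path in sorted(SRC_DIR.glob("*.wav")):
    info = sf.info(wav_path)                  # reads the header only
    duration = info.frames / info.samplerate  # clip length in seconds
    if duration >= MIN_SECONDS:
        shutil.copy(wav_path, DST_DIR / wav_path.name)
        kept += 1
    else:
        dropped += 1

print(f"kept {kept} clips, dropped {dropped} shorter than {MIN_SECONDS}s")
```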

I used an RTX 3080 GPU with a batch size of 12 and gradient accumulation steps set to 3, starting the model from the fullsubnet_best_model_58epochs.tar checkpoint (a rough sketch of this setup follows).
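
To make the setup concrete, here is a minimal PyTorch sketch of what I mean by resuming from the checkpoint with gradient accumulation. The `FullSubNet` import path, constructor arguments, checkpoint key, `train_loader`, and `loss_fn` are assumptions for illustration; the repo's own trainer handles this differently.

```python
# Fine-tuning sketch (not the repo's trainer): resume from the released
# checkpoint and accumulate gradients so an effective batch of 12 x 3 = 36
# fits on a single RTX 3080. Names marked "assumed" are guesses.
import torch

from fullsubnet.model import FullSubNet  # assumed import path

device = torch.device("cuda")
model = FullSubNet().to(device)  # assumed default constructor args

ckpt = torch.load("fullsubnet_best_model_58epochs.tar", map_location=device)
state_dict = ckpt.get("model", ckpt)  # key name assumed; fall back to raw dict
model.load_state_dict(state_dict)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr illustrative
accum_steps = 3  # gradient accumulation as described above

model.train()
for step, (noisy, clean) in enumerate(train_loader):  # train_loader assumed
    noisy, clean = noisy.to(device), clean.to(device)
    loss = loss_fn(model(noisy), clean)  # loss_fn assumed (e.g. masked MSE)
    (loss / accum_steps).backward()      # scale so accumulation averages
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```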

I trained for 15 epochs. However, the loss decreased only in the first few epochs and then started increasing. When I ran inference on a few samples, I noticed that the fine-tuned model left more residual noise than the original pretrained checkpoint did.

Am I missing something in the fine-tuning process?
Do you have any advice for me?

Thank you!
