FastPitch: Parallel text-to-speech with pitch prediction.

Łańcucki, Adrian

In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6588-6592. IEEE, 2021 [Arxiv].

What's Unique

FastPitch is a fully parallel text-to-speech model. It is based on FastSpeech and is conditioned on fundamental frequency (F0) contours.

How it works

It is inspired by FastSpeech and has two feed-forward Transformer (FFT) blocks: the first operates at the resolution of the input symbols, the second at the resolution of the output spectrogram frames.
Like FastSpeech, it uses a duration predictor; the target durations are obtained from a trained Tacotron 2 model.
Unlike FastSpeech, it does not require knowledge distillation for mel-spectrogram prediction.
Instead, it has a pitch prediction module that is trained jointly with the rest of the model, with the ground-truth pitch derived for each input symbol.
This is similar to FastSpeech 2, except that FastSpeech 2 predicts pitch for each spectrogram frame rather than for each input symbol, which makes it somewhat more costly.
The architecture diagram is as follows:

Source: Author

Ground truth for the pitch prediction per input symbol:

Source: Author
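As a rough illustration of how a frame-level pitch track could be reduced to one value per input symbol, the sketch below averages F0 over the frames aligned to each symbol. The function name `average_pitch_per_symbol`, the convention that `0.0` marks an unvoiced frame, and the choice to skip unvoiced frames when averaging are assumptions made for this example, not details taken from the note above.

```python
def average_pitch_per_symbol(frame_pitch, durations):
    """Collapse a frame-level F0 track to one value per input symbol.

    frame_pitch: F0 per spectrogram frame in Hz; 0.0 marks an unvoiced
                 frame (an assumption of this sketch).
    durations:   number of frames aligned to each input symbol.
    """
    if sum(durations) != len(frame_pitch):
        raise ValueError("durations must sum to the number of frames")

    symbol_pitch = []
    start = 0
    for d in durations:
        segment = frame_pitch[start:start + d]
        # Assumption: unvoiced frames are left out of the average.
        voiced = [f0 for f0 in segment if f0 > 0.0]
        # A symbol with no voiced frames gets 0.0.
        symbol_pitch.append(sum(voiced) / len(voiced) if voiced else 0.0)
        start += d
    return symbol_pitch


# Hypothetical 8-frame F0 track aligned to 3 input symbols.
frame_pitch = [0.0, 100.0, 110.0, 120.0, 0.0, 0.0, 200.0, 220.0]
durations = [3, 2, 3]
symbol_pitch = average_pitch_per_symbol(frame_pitch, durations)
print(symbol_pitch)  # [105.0, 120.0, 210.0]
```

The result has one pitch value per input symbol (3 here) instead of one per frame (8 here), which is what makes per-symbol prediction cheaper than per-frame prediction.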

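The pitch is embedded and added to the hidden representation from the first FFT block; the result is then upsampled by repeating each input symbol's vector according to its duration. During training the ground-truth pitch and durations are used; at inference they are replaced by the predicted values.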

$$
\begin{aligned}
\hat{\boldsymbol{d}} &= \operatorname{DurationPredictor}(\boldsymbol{h}), \quad \hat{\boldsymbol{p}} = \operatorname{PitchPredictor}(\boldsymbol{h}) \\
\boldsymbol{g} &= \boldsymbol{h} + \operatorname{PitchEmbedding}(\boldsymbol{p}) \\
\hat{\boldsymbol{y}} &= \operatorname{FFTr}\left([\underbrace{g_{1}, \ldots, g_{1}}_{d_{1}}, \ldots, \underbrace{g_{n}, \ldots, g_{n}}_{d_{n}}]\right)
\end{aligned}
$$
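The conditioning and upsampling steps in the equations above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `pitch_embedding` is a hypothetical linear projection standing in for the real embedding layer, `condition_and_upsample` is a name chosen for this example, and the second FFT block that consumes the upsampled sequence is left out.

```python
def pitch_embedding(pitch, weights):
    """Project a scalar pitch value to a vector.

    Hypothetical linear stand-in for the PitchEmbedding layer.
    """
    return [pitch * w for w in weights]


def condition_and_upsample(hidden, pitch, durations, weights):
    """Compute g_i = h_i + PitchEmbedding(p_i), then repeat each g_i d_i times."""
    if not (len(hidden) == len(pitch) == len(durations)):
        raise ValueError("hidden, pitch and durations must have equal length")

    upsampled = []
    for h_i, p_i, d_i in zip(hidden, pitch, durations):
        emb = pitch_embedding(p_i, weights)
        g_i = [h + e for h, e in zip(h_i, emb)]
        # Repeat the conditioned vector once per output frame of this symbol.
        upsampled.extend(list(g_i) for _ in range(d_i))
    return upsampled


# Hypothetical values: 3 input symbols with 2-dimensional hidden states.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pitch = [0.5, -1.0, 2.0]
durations = [2, 0, 3]  # a symbol with duration 0 yields no frames
weights = [1.0, 2.0]

frames = condition_and_upsample(hidden, pitch, durations, weights)
print(len(frames))  # 5
print(frames[0])    # [1.5, 3.0]
```

The output length equals the sum of the durations, so the sequence passed to the second FFT block is at the resolution of the output frames rather than of the input symbols.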

Inference is fast: the paper reports a real-time factor of over 900× for mel-spectrogram synthesis of a typical utterance.
