TACOTRON2_WAVERNN_PHONE_LJSPEECH

torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH

Phoneme-based text-to-speech (TTS) pipeline that combines Tacotron2, trained on LJSpeech [Ito and Johnson, 2017] for 1,500 epochs, with a WaveRNN vocoder trained on 8-bit depth waveforms of LJSpeech [Ito and Johnson, 2017] for 10,000 epochs.

The text processor encodes the input text as phonemes. It uses DeepPhonemizer to convert graphemes to phonemes. The model (en_us_cmudict_forward) was trained on CMUDict.
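The grapheme-to-phoneme step can be inspected on its own. The following is a minimal sketch, assuming a torchaudio release that still ships this bundle and that the `deep_phonemizer` package is installed. `decode_tokens` and `phonemize` are hypothetical helper names introduced here for illustration; they are not part of torchaudio.

```python
def decode_tokens(token_ids, length, symbols):
    """Map the first `length` token IDs back to their symbols."""
    return [symbols[int(i)] for i in token_ids[:length]]


def phonemize(text):
    """Return the phoneme symbols the bundle's text processor produces for `text`.

    Hypothetical helper: needs torch, torchaudio and deep_phonemizer, and
    fetches the DeepPhonemizer checkpoint the first time it runs.
    """
    # Imported lazily so that decode_tokens stays usable without torchaudio.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
    processor = bundle.get_text_processor()
    with torch.inference_mode():
        # tokens: (batch, max_len) integer IDs; lengths: (batch,) valid lengths
        tokens, lengths = processor(text)
    # processor.tokens is the symbol table that the IDs index into.
    return decode_tokens(tokens[0], int(lengths[0]), processor.tokens)
```

Calling `phonemize("Hello world!")` should return a list of phoneme and punctuation symbols rather than letters, which is what distinguishes this bundle from the character-based ones.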

You can find the training script for Tacotron2 here. The following parameters were used: win_length=1100, hop_length=275, n_fft=2048, mel_fmin=40, and mel_fmax=11025.
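To make these spectrogram settings concrete, the sketch below converts them into time and frequency terms. It assumes a sampling rate of 22,050 Hz, which is LJSpeech's native rate but is not stated in the text above. `frame_stats` is a hypothetical helper written for this illustration.

```python
# Parameters quoted in the text above.
WIN_LENGTH = 1100
HOP_LENGTH = 275
N_FFT = 2048
MEL_FMIN = 40
MEL_FMAX = 11025

# Assumption: LJSpeech audio at its native sampling rate.
SAMPLE_RATE = 22050


def frame_stats(sample_rate, win_length, hop_length, n_fft):
    """Derive basic STFT framing figures from the raw parameters."""
    return {
        "window_ms": 1000 * win_length / sample_rate,    # duration of one analysis window
        "hop_ms": 1000 * hop_length / sample_rate,       # time step between frames
        "frames_per_second": sample_rate / hop_length,   # spectrogram frame rate
        "overlap": 1 - hop_length / win_length,          # fraction shared by adjacent windows
        "n_freq_bins": n_fft // 2 + 1,                   # one-sided FFT bins before the mel filterbank
        "nyquist_hz": sample_rate / 2,                   # highest representable frequency
    }


stats = frame_stats(SAMPLE_RATE, WIN_LENGTH, HOP_LENGTH, N_FFT)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under that assumption, each window covers roughly 50 ms with 75% overlap between neighbours, and mel_fmax=11025 coincides with the Nyquist frequency, so the mel filterbank extends to the top of the representable band.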

You can find the training script for WaveRNN here.

Please refer to torchaudio.pipelines.Tacotron2TTSBundle for usage.
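As a rough illustration of that usage, the sketch below chains the three components of the bundle: text processor, Tacotron2, and vocoder. It assumes a torchaudio release that still ships this bundle, an installed `deep_phonemizer` package, and network access to download the pretrained weights. `synthesize` is a hypothetical wrapper name, not a torchaudio function.

```python
def synthesize(text, device="cpu"):
    """Generate speech for `text`; returns (waveform, sample_rate).

    Hypothetical wrapper around the bundle's documented accessors.
    The returned waveform has shape (1, time) and lives on the CPU.
    """
    # Imported lazily because these dependencies are heavy and optional.
    import torch
    import torchaudio

    bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
    processor = bundle.get_text_processor()        # text -> phoneme IDs
    tacotron2 = bundle.get_tacotron2().to(device)  # phoneme IDs -> mel spectrogram
    vocoder = bundle.get_vocoder().to(device)      # mel spectrogram -> waveform

    with torch.inference_mode():
        tokens, lengths = processor(text)
        tokens, lengths = tokens.to(device), lengths.to(device)
        spec, spec_lengths, _ = tacotron2.infer(tokens, lengths)
        waveforms, _ = vocoder(spec, spec_lengths)

    # Keep only the first item of the batch, with a leading channel dimension.
    return waveforms[0:1].cpu(), vocoder.sample_rate
```

The returned pair can be passed to `torchaudio.save` to write a WAV file. WaveRNN generates samples autoregressively, so synthesis on a CPU can take noticeably longer than the duration of the audio produced.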

Example - “Hello world! T T S stands for Text to Speech!”

Example - “The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired,”
