HiFiGANVocoderBundle
- class torchaudio.prototype.pipelines.HiFiGANVocoderBundle
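Data class that bundles the information needed to use a pretrained HiFiGANVocoder. This class provides interfaces for instantiating the pretrained model, along with the information needed to retrieve the pretrained weights and any additional data used with the model.
The torchaudio library instantiates objects of this class, each of which represents a different pretrained model. Client code should access pretrained models through these instances.
This bundle can convert mel spectrograms to waveforms and vice versa. A typical use case is a flow such as text -> mel spectrogram -> waveform, where an external component (for example, Tacotron2) generates the mel spectrogram from text. See the code examples below.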
- Example: Transform a synthetic mel spectrogram into audio.
>>> import torch
>>> import torchaudio
>>> # Since the HiFiGAN bundle is in prototypes, it needs to be imported explicitly
>>> from torchaudio.prototype.pipelines import HIFIGAN_VOCODER_V3_LJSPEECH as bundle
>>>
>>> # Load the HiFiGAN bundle
>>> vocoder = bundle.get_vocoder()
Downloading: "https://download.pytorch.org/torchaudio/models/hifigan_vocoder_v3_ljspeech.pth"
100%|████████████| 5.59M/5.59M [00:00<00:00, 18.7MB/s]
>>>
>>> # Generate synthetic mel spectrogram
>>> specgram = torch.sin(0.5 * torch.arange(start=0, end=100)).expand(bundle._vocoder_params["in_channels"], 100)
>>>
>>> # Transform mel spectrogram into audio
>>> waveform = vocoder(specgram)
>>> torchaudio.save('sample.wav', waveform, bundle.sample_rate)
- Example: Use together with Tacotron2 to convert text to audio.
>>> import torch
>>> import torchaudio
>>> # Since the HiFiGAN bundle is in prototypes, it needs to be imported explicitly
>>> from torchaudio.prototype.pipelines import HIFIGAN_VOCODER_V3_LJSPEECH as bundle_hifigan
>>>
>>> # Load Tacotron2 bundle
>>> bundle_tactron2 = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
>>> processor = bundle_tactron2.get_text_processor()
>>> tacotron2 = bundle_tactron2.get_tacotron2()
>>>
>>> # Use Tacotron2 to convert text to mel spectrogram
>>> text = "A quick brown fox jumped over a lazy dog"
>>> input, lengths = processor(text)
>>> specgram, lengths, _ = tacotron2.infer(input, lengths)
>>>
>>> # Load HiFiGAN bundle
>>> vocoder = bundle_hifigan.get_vocoder()
Downloading: "https://download.pytorch.org/torchaudio/models/hifigan_vocoder_v3_ljspeech.pth"
100%|████████████| 5.59M/5.59M [00:03<00:00, 1.55MB/s]
>>>
>>> # Use HiFiGAN to convert mel spectrogram to audio
>>> waveform = vocoder(specgram).squeeze(0)
>>> torchaudio.save('sample.wav', waveform, bundle_hifigan.sample_rate)
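As a sanity check on the output of either example, the expected waveform length can be estimated from the number of mel frames. The sketch below assumes the vocoder expands each frame by a fixed factor equal to the product of its upsampling rates. The rates (8, 8, 4) and the 22050 Hz sample rate are assumptions taken from the commonly published HiFiGAN V3 LJSpeech configuration, not values stated on this page, so compare them against `bundle.sample_rate` and the bundle's own parameters before relying on them. `expected_num_samples` is a hypothetical helper, not part of torchaudio.

```python
import math

# Assumed values for HiFiGAN V3 trained on LJSpeech (not stated on this page).
ASSUMED_UPSAMPLE_RATES = (8, 8, 4)
ASSUMED_SAMPLE_RATE = 22050


def expected_num_samples(num_frames, upsample_rates=ASSUMED_UPSAMPLE_RATES):
    """Estimate the waveform length produced from `num_frames` mel frames.

    Hypothetical helper: assumes every frame is expanded by the product of
    the upsampling rates.
    """
    return num_frames * math.prod(upsample_rates)


# 100 frames, matching the synthetic mel spectrogram in the first example.
num_samples = expected_num_samples(100)
duration_s = num_samples / ASSUMED_SAMPLE_RATE
print(num_samples, round(duration_s, 3))
```

If the saved file's length differs noticeably from this estimate, the assumed rates probably do not match the bundle in use.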
Properties
sample_rate
Methods
get_mel_transform
get_vocoder
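- HiFiGANVocoderBundle.get_vocoder(*, dl_kwargs=None) -> HiFiGANVocoder
Construct the HiFiGAN generator model, which can be used as a vocoder, and load the pretrained weights.
The weight file is downloaded from the internet and cached with torch.hub.load_state_dict_from_url().
- Parameters:
dl_kwargs (dictionary of keyword arguments) – Passed to torch.hub.load_state_dict_from_url().
- Returns:
A variation of HiFiGANVocoder.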
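The `dl_kwargs` argument follows a common forwarding pattern: an optional, keyword-only dictionary that is unpacked into the download call. The sketch below illustrates that pattern with stand-in functions so that it runs without network access. `fake_load_state_dict_from_url` and `get_vocoder_sketch` are hypothetical names rather than torchaudio APIs, and the real method may differ in detail. `model_dir` and `progress` are genuine parameters of `torch.hub.load_state_dict_from_url()`.

```python
def fake_load_state_dict_from_url(url, model_dir=None, progress=True):
    """Stand-in for the real download function; returns the arguments it received."""
    return {"url": url, "model_dir": model_dir, "progress": progress}


def get_vocoder_sketch(*, dl_kwargs=None):
    """Hypothetical illustration of forwarding `dl_kwargs` to the downloader."""
    # Treat None as "no extra options" to avoid a mutable default argument.
    dl_kwargs = {} if dl_kwargs is None else dl_kwargs
    url = "https://download.pytorch.org/torchaudio/models/hifigan_vocoder_v3_ljspeech.pth"
    return fake_load_state_dict_from_url(url, **dl_kwargs)


# Default call: the downloader sees only its own defaults.
default_call = get_vocoder_sketch()
# Custom call: the dictionary entries arrive as keyword arguments.
custom_call = get_vocoder_sketch(dl_kwargs={"model_dir": "./models", "progress": False})
print(default_call["model_dir"], custom_call["model_dir"])
```

With the real bundle, the corresponding call would be `bundle.get_vocoder(dl_kwargs={"model_dir": "./models", "progress": False})`, which should cache the weight file under `./models` and suppress the progress bar.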