Conformer
- class torchaudio.models.Conformer(input_dim: int, num_heads: int, ffn_dim: int, num_layers: int, depthwise_conv_kernel_size: int, dropout: float = 0.0, use_group_norm: bool = False, convolution_first: bool = False)
Conformer architecture introduced in "Conformer: Convolution-augmented Transformer for Speech Recognition" [Gulati et al., 2020].
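For orientation, the paper describes each Conformer block as a "macaron" stack: two half-step feed-forward modules sandwich multi-head self-attention and a convolution module, each wrapped in a residual connection, followed by a final layer norm. Here $x_i$ is the input to block $i$, and FFN, MHSA, and Conv stand for the feed-forward, multi-head self-attention, and convolution modules. This is the paper's formulation; torchaudio's layer may differ in details (for example, `convolution_first=True` moves the convolution module ahead of attention).

$$
\begin{aligned}
\tilde{x}_i &= x_i + \tfrac{1}{2}\,\mathrm{FFN}(x_i) \\
x'_i &= \tilde{x}_i + \mathrm{MHSA}(\tilde{x}_i) \\
x''_i &= x'_i + \mathrm{Conv}(x'_i) \\
y_i &= \mathrm{Layernorm}\!\left(x''_i + \tfrac{1}{2}\,\mathrm{FFN}(x''_i)\right)
\end{aligned}
$$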
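- Parameters:
input_dim (int) – input dimension.
num_heads (int) – number of attention heads in each Conformer layer.
ffn_dim (int) – hidden-layer dimension of the feed-forward networks.
num_layers (int) – number of Conformer layers to instantiate.
depthwise_conv_kernel_size (int) – kernel size of the depthwise convolution layer in each Conformer layer.
dropout (float, optional) – dropout probability. (Default: 0.0)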
use_group_norm (bool, optional) – use GroupNorm rather than BatchNorm1d in the convolution module. (Default: False)
convolution_first (bool, optional) – apply the convolution module ahead of the attention module. (Default: False)
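Two of these hyperparameters are constrained by the modules they feed. Multi-head attention splits `input_dim` evenly across `num_heads`, so `input_dim` must be divisible by `num_heads`. The depthwise convolution is meant to preserve the number of frames; with symmetric padding of `(k - 1) // 2` frames on each side, a stride-1 convolution keeps the length unchanged only when the kernel size `k` is odd (torchaudio appears to reject even kernel sizes for this reason). The sketch below encodes both rules; `check_conformer_config` and `same_padding_output_length` are hypothetical helpers, not part of torchaudio.

```python
def check_conformer_config(input_dim: int, num_heads: int, depthwise_conv_kernel_size: int) -> None:
    """Hypothetical pre-flight check for Conformer hyperparameters (not a torchaudio API)."""
    # Multi-head attention divides input_dim evenly among the heads.
    if input_dim % num_heads != 0:
        raise ValueError(
            f"input_dim ({input_dim}) must be divisible by num_heads ({num_heads})"
        )
    # Symmetric "same" padding keeps the frame count only for odd kernel sizes.
    if depthwise_conv_kernel_size % 2 == 0:
        raise ValueError(
            f"depthwise_conv_kernel_size ({depthwise_conv_kernel_size}) must be odd"
        )


def same_padding_output_length(num_frames: int, kernel_size: int) -> int:
    """Output length of a stride-1 convolution padded by (k - 1) // 2 frames on each side."""
    pad = (kernel_size - 1) // 2
    return num_frames + 2 * pad - kernel_size + 1


# The values used in the example that follows satisfy both rules.
check_conformer_config(input_dim=80, num_heads=4, depthwise_conv_kernel_size=31)
```

With `k = 31`, 15 frames of padding per side give `100 + 30 - 31 + 1 = 100` output frames for a 100-frame input; with `k = 32` the same padding yields only 99.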
Example
>>> import torch
>>> from torchaudio.models import Conformer
>>> input_dim = 80
>>> conformer = Conformer(
...     input_dim=input_dim,
...     num_heads=4,
...     ffn_dim=128,
...     num_layers=4,
...     depthwise_conv_kernel_size=31,
... )
>>> lengths = torch.randint(1, 400, (10,))  # (batch,)
>>> input = torch.rand(10, int(lengths.max()), input_dim)  # (batch, num_frames, input_dim)
>>> output, output_lengths = conformer(input, lengths)
Methods
forward
- Conformer.forward(input: Tensor, lengths: Tensor) → Tuple[Tensor, Tensor]
- Parameters:
input (torch.Tensor) – input frames, with shape (B, T, input_dim).
lengths (torch.Tensor) – with shape (B,); the i-th element is the number of valid frames for the i-th batch element in input.
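Because sequences in a batch are padded to a common length T, `lengths` is what tells the model which frames to ignore. A common way to turn it into a boolean key-padding mask, following the `torch.nn.MultiheadAttention` convention where `True` marks positions to ignore, is sketched below. `lengths_to_padding_mask` is a hypothetical helper; torchaudio's internal code may build its mask differently.

```python
import torch


def lengths_to_padding_mask(lengths: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: returns a (B, T_max) bool mask, True where a frame is padding."""
    max_length = int(lengths.max().item())
    # Frame indices 0..T_max-1 as a row, compared against each length as a column.
    frame_idx = torch.arange(max_length, device=lengths.device)
    return frame_idx.unsqueeze(0) >= lengths.unsqueeze(1)


lengths = torch.tensor([3, 1, 2])  # valid frames per batch element, shape (B,)
mask = lengths_to_padding_mask(lengths)
```

For these lengths, row 0 has no padding, row 1 masks its last two frames, and row 2 masks its last frame.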
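- Returns:
- (torch.Tensor, torch.Tensor)
- torch.Tensor
output frames, with shape (B, T, input_dim).
- torch.Tensor
output lengths, with shape (B,); the i-th element is the number of valid frames for the i-th batch element in the output frames.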