ctc_decoder

torchaudio.models.decoder.ctc_decoder(lexicon: Optional[str], tokens: Union[str, List[str]], lm: Optional[Union[str, CTCDecoderLM]] = None, lm_dict: Optional[str] = None, nbest: int = 1, beam_size: int = 50, beam_size_token: Optional[int] = None, beam_threshold: float = 50, lm_weight: float = 2, word_score: float = 0, unk_score: float = -inf, sil_score: float = 0, log_add: bool = False, blank_token: str = '-', sil_token: str = '|', unk_word: str = '<unk>') → CTCDecoder

Builds an instance of CTCDecoder.

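Parameters:

lexicon (str or None) – lexicon file containing the possible words and their corresponding spellings. Each line consists of a word followed by its space-separated spelling. If None, lexicon-free decoding is used.
tokens (str or List[str]) – file or list containing the valid tokens. If a file is used, the expected format is that tokens mapping to the same index appear on the same line.
lm (str, CTCDecoderLM, or None, optional) – path to a KenLM language model, a custom language model of type CTCDecoderLM, or None if no language model is used (Default: None)
lm_dict (str or None, optional) – file containing the dictionary used for the LM, one word per line, sorted by LM index. When decoding with a lexicon, the entries in lm_dict must also occur in the lexicon file. If None, the dictionary for the LM is built from the lexicon file. (Default: None)
nbest (int, optional) – number of best decodings to return (Default: 1)
beam_size (int, optional) – maximum number of hypotheses to keep after each decode step (Default: 50)
beam_size_token (int, optional) – maximum number of tokens to consider at each decode step. If None, it is set to the total number of tokens (Default: None)
beam_threshold (float, optional) – threshold for pruning hypotheses (Default: 50)
lm_weight (float, optional) – weight of the language model (Default: 2)
word_score (float, optional) – word insertion score (Default: 0)
unk_score (float, optional) – unknown word insertion score (Default: -inf)
sil_score (float, optional) – silence insertion score (Default: 0)
log_add (bool, optional) – whether to use logadd when merging hypotheses (Default: False)
blank_token (str, optional) – token corresponding to blank (Default: "-")
sil_token (str, optional) – token corresponding to silence (Default: "|")
unk_word (str, optional) – word corresponding to unknown (Default: "<unk>")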
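
As a rough illustration of the two file formats described for lexicon and tokens, the following sketch writes a tiny tokens file and lexicon file and reads the lexicon back. The three-word vocabulary, the file names, and the helpers `spell` and `read_lexicon` are made up for this example and are not part of torchaudio. Ending each spelling with the silence token "|" is an assumed convention here; check it against the token set of your own acoustic model.

```python
import os
import tempfile

# Hypothetical toy vocabulary; a real setup uses the acoustic model's tokens.
TOKENS = ["-", "|", "a", "c", "t"]  # blank, silence, then letters
WORDS = ["a", "at", "cat"]


def spell(word, sil_token="|"):
    """Return a space-separated spelling ending with the silence token
    (the trailing silence token is an assumption of this sketch)."""
    return " ".join(list(word) + [sil_token])


def read_lexicon(path):
    """Parse lines of the form 'word tok tok ...' into {word: [tok, ...]}."""
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                lexicon[parts[0]] = parts[1:]
    return lexicon


workdir = tempfile.mkdtemp()
tokens_path = os.path.join(workdir, "tokens.txt")
lexicon_path = os.path.join(workdir, "lexicon.txt")

# Tokens file: one line per index, so the line position is the token index.
with open(tokens_path, "w", encoding="utf-8") as f:
    f.write("\n".join(TOKENS) + "\n")

# Lexicon file: a word followed by its space-separated spelling.
with open(lexicon_path, "w", encoding="utf-8") as f:
    for word in WORDS:
        f.write(f"{word} {spell(word)}\n")

lexicon = read_lexicon(lexicon_path)

# Sanity check: every token used in a spelling must be a valid token.
unknown = {tok for toks in lexicon.values() for tok in toks} - set(TOKENS)
print(lexicon["cat"], "unknown tokens:", unknown)
```

Files written this way could then be passed as `lexicon=lexicon_path` and `tokens=tokens_path`.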

Returns:

decoder

Return type:

CTCDecoder

Example

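>>> from torchaudio.models.decoder import ctc_decoder
>>> # Build a lexicon-constrained decoder with a KenLM language model
>>> decoder = ctc_decoder(
...     lexicon="lexicon.txt",
...     tokens="tokens.txt",
...     lm="kenlm.bin",
... )
>>> results = decoder(emissions) # List of shape (B, nbest) of Hypotheses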
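
To make the roles of blank_token and sil_token more concrete, here is a minimal greedy (best-path) CTC decode in plain Python. This is not what ctc_decoder does internally (it performs a beam search, optionally with a lexicon and language model). The function `greedy_ctc_decode`, the helper `one_hot`, and the toy scores are all hypothetical and only illustrate how repeated tokens are collapsed, blanks are dropped, and the silence token separates words.

```python
from itertools import groupby


def greedy_ctc_decode(frame_scores, tokens, blank_token="-", sil_token="|"):
    """Best-path CTC decode: argmax per frame, collapse repeats, drop blanks.

    frame_scores: list of per-frame score lists, one score per token.
    tokens: list of token strings, indexed like the score columns.
    """
    # Pick the highest-scoring token index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Collapse consecutive repeats, then remove the blank token.
    collapsed = [idx for idx, _ in groupby(best)]
    chars = [tokens[i] for i in collapsed if tokens[i] != blank_token]
    # The silence token acts as the word boundary; skip empty pieces.
    return [w for w in "".join(chars).split(sil_token) if w]


def one_hot(index, size):
    """Toy score vector with a single winning token."""
    return [1.0 if i == index else 0.0 for i in range(size)]


TOKENS = ["-", "|", "a", "c", "t"]
# Frame-wise best path: c c - a t t | | a - t
path = [3, 3, 0, 2, 4, 4, 1, 1, 2, 0, 4]
frames = [one_hot(i, len(TOKENS)) for i in path]

words = greedy_ctc_decode(frames, TOKENS)
print(words)  # ['cat', 'at']
```

Note that a blank between two identical tokens keeps them distinct: the path `a - a` decodes to "aa", while `a a` collapses to "a".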

Tutorials using ctc_decoder

ASR Inference with CTC Decoder
