注意
单击 此处 下载完整的示例代码
AudioEffector 用法¶
作者: Moto Hira
本教程展示了如何使用 torchaudio.io.AudioEffector
将各种效果和编解码器应用于波形张量。
注意
本教程需要 FFmpeg 库。请参考 FFmpeg 依赖项 了解详细信息。
概述¶
AudioEffector
结合了 StreamWriter
和 StreamReader
提供的内存编码、解码和滤波功能。
下图说明了该过程。

import torch
import torchaudio
print(torch.__version__)
print(torchaudio.__version__)
2.6.0
2.6.0
from torchaudio.io import AudioEffector, CodecConfig
import matplotlib.pyplot as plt
from IPython.display import Audio
libavcodec (60, 3, 100)
libavdevice (60, 1, 100)
libavfilter (9, 3, 100)
libavformat (60, 3, 100)
libavutil (58, 2, 100)
用法¶
要使用 AudioEffector
,请使用 effect
和 format
实例化它,然后将波形传递给 apply()
或 stream()
方法。
effector = AudioEffector(effect=..., format=...,)
# Apply at once
applied = effector.apply(waveform, sample_rate)
apply
方法一次将效果和编解码器应用于整个波形。因此,如果输入波形很长,并且内存消耗是一个问题,则可以使用 stream
方法逐块处理。
# Apply chunk by chunk
for applied_chunk = effector.stream(waveform, sample_rate):
...
示例¶
图库¶
def show(effect, *, stereo=False):
wf = torch.cat([waveform] * 2, dim=1) if stereo else waveform
figsize = (6.4, 2.1 if stereo else 1.2)
effector = AudioEffector(effect=effect, pad_end=False)
result = effector.apply(wf, int(sr))
num_channels = result.size(1)
f, ax = plt.subplots(num_channels, 1, squeeze=False, figsize=figsize, sharex=True)
for i in range(num_channels):
ax[i][0].specgram(result[:, i], Fs=sr)
f.set_tight_layout(True)
return Audio(result.numpy().T, rate=sr)
原始音频¶
show(effect=None)

效果¶
tempo(速度)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#atempo
show("atempo=0.7")

show("atempo=1.8")

highpass(高通)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#highpass
show("highpass=frequency=1500")

lowpass(低通)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#lowpass
show("lowpass=frequency=1000")

allpass(全通)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#allpass
show("allpass")

bandpass(带通)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#bandpass
show("bandpass=frequency=3000")

bandreject(带阻)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#bandreject
show("bandreject=frequency=3000")

echo(回声)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#aecho
show("aecho=in_gain=0.8:out_gain=0.88:delays=6:decays=0.4")

show("aecho=in_gain=0.8:out_gain=0.88:delays=60:decays=0.4")

show("aecho=in_gain=0.8:out_gain=0.9:delays=1000:decays=0.3")

chorus(合唱)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#chorus
show("chorus=0.5:0.9:50|60|40:0.4|0.32|0.3:0.25|0.4|0.3:2|2.3|1.3")

fft filter(FFT 滤波器)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#afftfilt
# fmt: off
show(
"afftfilt="
"real='re * (1-clip(b * (b/nb), 0, 1))':"
"imag='im * (1-clip(b * (b/nb), 0, 1))'"
)

show(
"afftfilt="
"real='hypot(re,im) * sin(0)':"
"imag='hypot(re,im) * cos(0)':"
"win_size=512:"
"overlap=0.75"
)

show(
"afftfilt="
"real='hypot(re,im) * cos(2 * 3.14 * (random(0) * 2-1))':"
"imag='hypot(re,im) * sin(2 * 3.14 * (random(1) * 2-1))':"
"win_size=128:"
"overlap=0.8"
)
# fmt: on

vibrato(颤音)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#vibrato
show("vibrato=f=10:d=0.8")

/pytorch/audio/ci_env/lib/python3.10/site-packages/IPython/lib/display.py:188: RuntimeWarning: invalid value encountered in cast
return scaled.astype("<h").tobytes(), nchan
tremolo(颤音)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#tremolo
show("tremolo=f=8:d=0.8")

crystalizer(晶化器)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#crystalizer
show("crystalizer")

flanger(镶边)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#flanger
show("flanger")

phaser(相位器)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#aphaser
show("aphaser")

pulsator(脉冲器)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#apulsator
show("apulsator", stereo=True)

haas(哈斯效应)¶
https://ffmpeg.cpp.org.cn/ffmpeg-filters.html#haas
show("haas")

编解码器¶
def show_multi(configs):
results = []
for config in configs:
effector = AudioEffector(**config)
results.append(effector.apply(waveform, int(sr)))
num_configs = len(configs)
figsize = (6.4, 0.3 + num_configs * 0.9)
f, axes = plt.subplots(num_configs, 1, figsize=figsize, sharex=True)
for result, ax in zip(results, axes):
ax.specgram(result[:, 0], Fs=sr)
f.set_tight_layout(True)
return [Audio(r.numpy().T, rate=sr) for r in results]
ogg¶
results = show_multi(
[
{"format": "ogg"},
{"format": "ogg", "encoder": "vorbis"},
{"format": "ogg", "encoder": "opus"},
]
)

ogg - 默认编码器 (flac)¶
results[0]
ogg - vorbis¶
results[1]
ogg - opus¶
results[2]
mp3¶
https://trac.ffmpeg.org/wiki/Encode/MP3
results = show_multi(
[
{"format": "mp3"},
{"format": "mp3", "codec_config": CodecConfig(compression_level=1)},
{"format": "mp3", "codec_config": CodecConfig(compression_level=9)},
{"format": "mp3", "codec_config": CodecConfig(bit_rate=192_000)},
{"format": "mp3", "codec_config": CodecConfig(bit_rate=8_000)},
{"format": "mp3", "codec_config": CodecConfig(qscale=9)},
{"format": "mp3", "codec_config": CodecConfig(qscale=1)},
]
)

default(默认)¶
results[0]
compression_level=1(压缩级别=1)¶
results[1]
compression_level=9(压缩级别=9)¶
results[2]
bit_rate=192k(比特率=192k)¶
results[3]
bit_rate=8k(比特率=8k)¶
results[4]
qscale=9(质量等级=9)¶
results[5]
qscale=1(质量等级=1)¶
results[6]
标签: torchaudio.io
脚本总运行时间: ( 0 分钟 3.057 秒)