注意
点击 这里 下载完整的示例代码
加法合成¶
作者: Moto Hira
本教程是 振荡器和 ADSR 包络 的延续。
本教程展示了如何使用 TorchAudio 的 DSP 函数执行加法合成和减法合成。
加法合成通过组合多个波形来创建音色。减法合成通过应用滤波器来创建音色。
警告
本教程需要原型 DSP 特性,这些特性在 nightly 版本中可用。
请参考 https://pytorch.ac.cn/get-started/locally 获取安装 nightly 版本的说明。
import torch
import torchaudio
print(torch.__version__)
print(torchaudio.__version__)
2.5.0
2.5.0
概述¶
try:
from torchaudio.prototype.functional import adsr_envelope, extend_pitch, oscillator_bank
except ModuleNotFoundError:
print(
"Failed to import prototype DSP features. "
"Please install torchaudio nightly builds. "
"Please refer to https://pytorch.ac.cn/get-started/locally "
"for instructions to install a nightly build."
)
raise
import matplotlib.pyplot as plt
from IPython.display import Audio
创建多个频率音高¶
加法合成的核心是振荡器。我们通过对振荡器产生的多个波形求和来创建一个音色。
在 振荡器教程 中,我们使用了 oscillator_bank()
和 adsr_envelope()
来生成各种波形。
在本教程中,我们使用 extend_pitch()
从基频创建一个音色。
首先,我们定义一些常量和辅助函数,这些函数将在整个教程中使用。
PI = torch.pi
PI2 = 2 * torch.pi
F0 = 344.0 # fundamental frequency
DURATION = 1.1 # [seconds]
SAMPLE_RATE = 16_000 # [Hz]
NUM_FRAMES = int(DURATION * SAMPLE_RATE)
def plot(freq, amp, waveform, sample_rate, zoom=None, vol=0.1):
t = (torch.arange(waveform.size(0)) / sample_rate).numpy()
fig, axes = plt.subplots(4, 1, sharex=True)
axes[0].plot(t, freq.numpy())
axes[0].set(title=f"Oscillator bank (bank size: {amp.size(-1)})", ylabel="Frequency [Hz]", ylim=[-0.03, None])
axes[1].plot(t, amp.numpy())
axes[1].set(ylabel="Amplitude", ylim=[-0.03 if torch.all(amp >= 0.0) else None, None])
axes[2].plot(t, waveform)
axes[2].set(ylabel="Waveform")
axes[3].specgram(waveform, Fs=sample_rate)
axes[3].set(ylabel="Spectrogram", xlabel="Time [s]", xlim=[-0.01, t[-1] + 0.01])
for i in range(4):
axes[i].grid(True)
pos = axes[2].get_position()
fig.tight_layout()
if zoom is not None:
ax = fig.add_axes([pos.x0 + 0.02, pos.y0 + 0.03, pos.width / 2.5, pos.height / 2.0])
ax.plot(t, waveform)
ax.set(xlim=zoom, xticks=[], yticks=[])
waveform /= waveform.abs().max()
return Audio(vol * waveform, rate=sample_rate, normalize=False)
泛音¶
泛音是指频率是基频整数倍的频率成分。
我们看看如何生成合成器中常用的波形。即,
锯齿波
方波
三角波
锯齿波¶
锯齿波 可以表示为以下公式。它包含所有整数泛音,因此也常用于减法合成。
以下函数接受基频和振幅,并根据上述公式扩展音高。
现在合成一个波形
freq0 = torch.full((NUM_FRAMES, 1), F0)
amp0 = torch.ones((NUM_FRAMES, 1))
freq, amp, waveform = sawtooth_wave(freq0, amp0, int(SAMPLE_RATE / F0), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
可以使基频振荡,以基于锯齿波创建时变音调。
fm = 10 # rate at which the frequency oscillates [Hz]
f_dev = 0.1 * F0 # the degree of frequency oscillation [Hz]
phase = torch.linspace(0, fm * PI2 * DURATION, NUM_FRAMES)
freq0 = F0 + f_dev * torch.sin(phase).unsqueeze(-1)
freq, amp, waveform = sawtooth_wave(freq0, amp0, int(SAMPLE_RATE / F0), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
方波¶
方波 仅包含奇数整数泛音。
def square_wave(freq0, amp0, num_pitches, sample_rate):
mults = [2.0 * i + 1.0 for i in range(num_pitches)]
freq = extend_pitch(freq0, mults)
mults = [4 / (PI * (2.0 * i + 1.0)) for i in range(num_pitches)]
amp = extend_pitch(amp0, mults)
waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)
return freq, amp, waveform
freq0 = torch.full((NUM_FRAMES, 1), F0)
amp0 = torch.ones((NUM_FRAMES, 1))
freq, amp, waveform = square_wave(freq0, amp0, int(SAMPLE_RATE / F0 / 2), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
三角波¶
三角波 也仅包含奇数整数泛音。
def triangle_wave(freq0, amp0, num_pitches, sample_rate):
mults = [2.0 * i + 1.0 for i in range(num_pitches)]
freq = extend_pitch(freq0, mults)
c = 8 / (PI**2)
mults = [c * ((-1) ** i) / ((2.0 * i + 1.0) ** 2) for i in range(num_pitches)]
amp = extend_pitch(amp0, mults)
waveform = oscillator_bank(freq, amp, sample_rate=sample_rate)
return freq, amp, waveform
freq, amp, waveform = triangle_wave(freq0, amp0, int(SAMPLE_RATE / F0 / 2), SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))
/pytorch/audio/src/torchaudio/prototype/functional/_dsp.py:63: UserWarning: Some frequencies are above nyquist frequency. Setting the corresponding amplitude to zero. This might cause numerically unstable gradient.
warnings.warn(
非泛音分音¶
非泛音分音是指频率不是基频整数倍的频率。
它们对于重新创建逼真的声音或使合成结果更有趣至关重要。
钟声¶
https://computermusicresource.com/Simple.bell.tutorial.html
num_tones = 9
duration = 2.0
num_frames = int(SAMPLE_RATE * duration)
freq0 = torch.full((num_frames, 1), F0)
mults = [0.56, 0.92, 1.19, 1.71, 2, 2.74, 3.0, 3.76, 4.07]
freq = extend_pitch(freq0, mults)
amp = adsr_envelope(
num_frames=num_frames,
attack=0.002,
decay=0.998,
sustain=0.0,
release=0.0,
n_decay=2,
)
amp = torch.stack([amp * (0.5**i) for i in range(num_tones)], dim=-1)
waveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)
plot(freq, amp, waveform, SAMPLE_RATE, vol=0.4)
作为对比,以下是上述内容的泛音版本。只有频率值不同。泛音数量及其振幅相同。
参考资料¶
脚本总运行时间: ( 0 分钟 4.900 秒)