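Warning

TorchAudio's C++ API is a prototype feature. API/ABI backward compatibility is not guaranteed.

Note

The top-level namespace has been changed from torchaudio to torio. StreamReader has been renamed to StreamingMediaDecoder.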

torio::io::StreamingMediaDecoder¶

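StreamingMediaDecoder is the implementation used by its Python equivalent and provides a similar interface. When working with custom I/O, such as in-memory data, use the StreamingMediaDecoderCustomIO class.

Both classes define the same methods, so they are used in the same way.

Constructors¶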

StreamingMediaDecoder¶

class StreamingMediaDecoder¶

Fetch and decode audio/video streams chunk by chunk.

Subclassed by torio::io::StreamingMediaDecoderCustomIO

explicit torio::io::StreamingMediaDecoder::StreamingMediaDecoder(const std::string &src, const c10::optional<std::string> &format = c10::nullopt, const c10::optional<OptionDict> &option = c10::nullopt)¶

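Construct a media processor from a source URI.

Parameters:

src – URL of the source media, in a format FFmpeg can understand.
format – Specifies the format (such as mp4) or device (such as lavfi and avfoundation).
option – Custom options passed when initializing the format context (opening the source).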

StreamingMediaDecoderCustomIO¶

class StreamingMediaDecoderCustomIO : private detail::CustomInput, public torio::io::StreamingMediaDecoder¶

A subclass of StreamingMediaDecoder that works with a custom read function. It can be used to decode media from memory or from a custom object.

torio::io::StreamingMediaDecoderCustomIO::StreamingMediaDecoderCustomIO(void *opaque, const c10::optional<std::string> &format, int buffer_size, int (*read_packet)(void *opaque, uint8_t *buf, int buf_size), int64_t (*seek)(void *opaque, int64_t offset, int whence) = nullptr, const c10::optional<OptionDict> &option = c10::nullopt)¶

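Construct a StreamingMediaDecoder with custom read and seek functions.

Parameters:

opaque – Custom data used by the read_packet and seek functions.
format – Specifies the input format.
buffer_size – The size of the intermediate buffer that FFmpeg uses to pass data to the read_packet function.
read_packet – Custom read function that FFmpeg calls to read data from the source.
seek – Optional seek function used to seek within the source.
option – Custom options passed when initializing the format context.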
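As an illustration of the read_packet and seek signatures above, the following is a minimal sketch of callbacks that serve bytes from an in-memory buffer. The MemorySource struct, the function names, and the two constants are hypothetical and not part of torio. The constants are stand-ins so the sketch compiles without FFmpeg headers; real code would most likely return AVERROR_EOF at the end of the data and compare whence against AVSEEK_SIZE.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <vector>

// Hypothetical in-memory source, passed to the callbacks as `opaque`.
struct MemorySource {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Stand-ins (assumptions) for FFmpeg's AVERROR_EOF and AVSEEK_SIZE.
constexpr int kEndOfData = -1;
constexpr int kSeekSizeQuery = 0x10000;

// Copies up to buf_size bytes into buf and advances the read position.
int mem_read_packet(void* opaque, uint8_t* buf, int buf_size) {
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t remaining = static_cast<int64_t>(src->data.size()) - src->pos;
  if (remaining <= 0) return kEndOfData;
  const int n = static_cast<int>(std::min<int64_t>(remaining, buf_size));
  std::memcpy(buf, src->data.data() + src->pos, n);
  src->pos += n;
  return n;
}

// Moves the read position; answers a size query without moving it.
int64_t mem_seek(void* opaque, int64_t offset, int whence) {
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t size = static_cast<int64_t>(src->data.size());
  if (whence == kSeekSizeQuery) return size;
  int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = src->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -1;
  }
  if (target < 0 || target > size) return -1;
  src->pos = target;
  return target;
}
```

With callbacks of this shape, a pointer to the MemorySource would be passed as opaque, and mem_read_packet and mem_seek as the read_packet and seek arguments of the constructor.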

Query Methods¶

find_best_audio_stream¶

int64_t torio::io::StreamingMediaDecoder::find_best_audio_stream() const¶

Find a suitable audio stream using FFmpeg's heuristics.

Returns the index of the best stream (>=0) on success, and a negative value otherwise.

find_best_video_stream¶

int64_t torio::io::StreamingMediaDecoder::find_best_video_stream() const¶

Find a suitable video stream using FFmpeg's heuristics.

Returns the index of the best stream (>=0) on success, and a negative value otherwise.

get_metadata¶

OptionDict torio::io::StreamingMediaDecoder::get_metadata() const¶

Fetch the metadata of the source media.

num_src_streams¶

int64_t torio::io::StreamingMediaDecoder::num_src_streams() const¶

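Fetch the number of source streams found in the input media.

Source streams include not only audio/video streams but also subtitle and other streams.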

get_src_stream_info¶

SrcStreamInfo torio::io::StreamingMediaDecoder::get_src_stream_info(int i) const¶

Fetch information about the specified source stream.

The valid range for i is [0, num_src_streams()).

num_out_streams¶

int64_t torio::io::StreamingMediaDecoder::num_out_streams() const¶

Fetch the number of output streams defined by client code.

get_out_stream_info¶

OutputStreamInfo torio::io::StreamingMediaDecoder::get_out_stream_info(int i) const¶

Fetch information about the specified output stream.

The valid range for i is [0, num_out_streams()).

is_buffer_ready¶

bool torio::io::StreamingMediaDecoder::is_buffer_ready() const¶

Check whether the buffers of all the output streams hold enough decoded frames.

Configure Methods¶

add_audio_stream¶

void torio::io::StreamingMediaDecoder::add_audio_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt)¶

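Define an output audio stream.

Parameters:

i – The index of the source stream.
frames_per_chunk – Number of frames returned as a single chunk.
If the source stream is exhausted before frames_per_chunk frames are buffered, the chunk is returned as-is. A chunk may therefore contain fewer than frames_per_chunk frames.

Providing -1 disables chunking, in which case pop_chunks() returns all the buffered frames as a single chunk.
num_chunks – Internal buffer size.
When the number of buffered chunks exceeds this number, old chunks are dropped. For example, if frames_per_chunk is 5 and num_chunks is 3, the buffer holds at most 15 frames and older frames are dropped.

Providing -1 disables this behavior and retains all chunks.
filter_desc – Description of the filter graph applied to the source stream.
decoder – The name of the decoder to use. When provided, the specified decoder is used instead of the default one.
decoder_option – Options passed to the decoder.
To list the options of a decoder, use the ffmpeg -h decoder=<DECODER> command.

In addition to decoder-specific options, you can pass options related to multithreading. They take effect only if the decoder supports them. If neither is provided, StreamingMediaDecoder defaults to a single thread.
- "threads": The number of threads, or the value "0" to let FFmpeg decide based on its heuristics.
- "thread_type": Which multithreading method to use. Valid values are "frame" and "slice". Note that each decoder supports a different set of methods. If not provided, a default value is used.
  - "frame": Decode more than one frame at once. Each thread handles one frame. This increases decoding delay by one frame per thread.
  - "slice": Decode more than one part of a single frame at once.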
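The interaction between frames_per_chunk and num_chunks can be easier to see in a small model. The sketch below is not the decoder's actual buffer; ChunkBufferModel is a hypothetical stand-in that groups incoming frame indices into chunks and drops the oldest chunk once more than num_chunks are held. It does not model frames_per_chunk = -1.

```cpp
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical model of a bounded chunk buffer (illustration only).
struct ChunkBufferModel {
  int64_t frames_per_chunk;  // frames grouped into one chunk (must be > 0 here)
  int64_t num_chunks;        // maximum chunks retained; -1 retains all
  std::deque<std::vector<int64_t>> chunks;  // completed chunks, oldest first
  std::vector<int64_t> pending;             // frames of the chunk being filled

  void push_frame(int64_t frame_index) {
    pending.push_back(frame_index);
    if (static_cast<int64_t>(pending.size()) == frames_per_chunk) {
      chunks.push_back(std::move(pending));
      pending.clear();
      // Drop the oldest chunk once the buffer exceeds its capacity.
      if (num_chunks >= 0 &&
          static_cast<int64_t>(chunks.size()) > num_chunks) {
        chunks.pop_front();
      }
    }
  }
};
```

Under this model, pushing 20 frames with frames_per_chunk = 5 and num_chunks = 3 leaves three chunks, and the oldest retained frame is frame 5 because the chunk holding frames 0 to 4 was dropped.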

add_video_stream¶

void torio::io::StreamingMediaDecoder::add_video_stream(int64_t i, int64_t frames_per_chunk, int64_t num_chunks, const c10::optional<std::string> &filter_desc = c10::nullopt, const c10::optional<std::string> &decoder = c10::nullopt, const c10::optional<OptionDict> &decoder_option = c10::nullopt, const c10::optional<std::string> &hw_accel = c10::nullopt)¶

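Define an output video stream.

Parameters:

i, frames_per_chunk, num_chunks, filter_desc, decoder, decoder_option – See add_audio_stream().
hw_accel – Enable hardware acceleration.
When video is decoded on CUDA hardware (for example by specifying the "h264_cuvid" decoder), passing a CUDA device indicator to hw_accel (i.e. hw_accel="cuda:0") makes StreamingMediaDecoder place the resulting frames directly on the specified CUDA device as a CUDA tensor.

If hw_accel is not provided (c10::nullopt), the chunk is moved to CPU memory.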

remove_stream¶

void torio::io::StreamingMediaDecoder::remove_stream(int64_t i)¶

Remove an output stream.

Parameters:

i – The index of the output stream to remove. The valid range is [0, num_out_streams()).

Stream Methods¶

seek¶

void torio::io::StreamingMediaDecoder::seek(double timestamp, int64_t mode)¶

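Seek to the given timestamp.

Parameters:

timestamp – Target timestamp in seconds.
mode – Seek mode.
- 0: Keyframe mode. Seek to the nearest key frame before the given timestamp.
- 1: Any mode. Seek to any frame (including non-key frames) before the given timestamp.
- 2: Precise mode. First seek to the nearest key frame before the given timestamp, then decode frames until reaching the frame closest to the given timestamp.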
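To make "nearest key frame before the given timestamp" concrete, here is a small model that picks that key frame from a sorted list of key-frame timestamps. It is an illustration, not FFmpeg's seek implementation: the function name is hypothetical, and treating a key frame exactly at the timestamp as a match is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Returns the timestamp of the last key frame at or before `timestamp`,
// or -1.0 when no key frame precedes it.
// `keyframes` must be sorted in ascending order.
double nearest_keyframe_before(const std::vector<double>& keyframes,
                               double timestamp) {
  // First key frame strictly after the target.
  auto it = std::upper_bound(keyframes.begin(), keyframes.end(), timestamp);
  if (it == keyframes.begin()) return -1.0;
  return *(it - 1);
}
```

In precise mode, decoding would then continue from that key frame, with frames before the target being discarded.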

process_packet¶

int torio::io::StreamingMediaDecoder::process_packet()¶

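Demultiplex and process one packet.

Returns:

0: The packet was processed successfully and there are still packets left in the stream, so client code can call this method again.
1: The packet was processed successfully and the end of the file was reached. Client code should not call this method again.
<0: An error has occurred.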
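The return codes above suggest a simple driving loop. The sketch below takes a generic callable in place of process_packet() so that it compiles without torio; the drain name and the convention of returning -1 on error are assumptions made for this illustration.

```cpp
#include <functional>

// Calls `process_packet` until it reports EOF (1) or an error (<0).
// Returns the number of successfully processed packets, or -1 on error.
int drain(const std::function<int()>& process_packet) {
  int processed = 0;
  while (true) {
    const int code = process_packet();
    if (code < 0) return -1;          // an error has occurred
    ++processed;                      // codes 0 and 1 both mean success
    if (code == 1) return processed;  // EOF: do not call again
  }
}
```

In client code the callable could be a lambda that forwards to the decoder's process_packet(), with pop_chunks() called between iterations whenever is_buffer_ready() reports true.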

process_packet_block¶

int torio::io::StreamingMediaDecoder::process_packet_block(const double timeout, const double backoff)¶

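Similar to process_packet(), but it retries automatically if processing fails because a resource is temporarily unavailable.

This behavior is helpful when using device input, such as a microphone, where the buffer may be busy while samples are being acquired.

Parameters:

timeout – Timeout in milliseconds.
- >=0: Keep retrying until the given time has passed.
- <0: Keep retrying forever.
backoff – Time to wait before retrying, in milliseconds.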
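The timeout and backoff semantics can be sketched as a generic retry loop. This is not the library's implementation: the function name is hypothetical, kTryAgain is a stand-in for whatever code signals "resource temporarily unavailable", and returning that code on timeout is an assumption (the real method may report a timeout differently).

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Stand-in (assumption) for a "try again later" result code.
constexpr int kTryAgain = -11;

// Retries `attempt` while it returns kTryAgain.
// timeout_ms >= 0: give up once that much time has elapsed.
// timeout_ms <  0: retry forever.
int retry_with_backoff(const std::function<int()>& attempt,
                       double timeout_ms, double backoff_ms) {
  using Clock = std::chrono::steady_clock;
  using Millis = std::chrono::duration<double, std::milli>;
  const auto start = Clock::now();
  while (true) {
    const int code = attempt();
    if (code != kTryAgain) return code;  // success, EOF, or a hard error
    if (timeout_ms >= 0) {
      const Millis elapsed = Clock::now() - start;
      if (elapsed.count() > timeout_ms) return kTryAgain;  // timed out
    }
    std::this_thread::sleep_for(Millis(backoff_ms));  // wait before retrying
  }
}
```

A small backoff keeps latency low for live input at the cost of more wake-ups, while a larger one trades latency for less busy polling.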

process_all_packets¶

void torio::io::StreamingMediaDecoder::process_all_packets()¶

Process packets until EOF is reached.

fill_buffer¶

int torio::io::StreamingMediaDecoder::fill_buffer(const c10::optional<double> &timeout = c10::nullopt, const double backoff = 10.)¶

Process packets until every chunk buffer holds at least one chunk.

Parameters:

timeout – See process_packet_block().
backoff – See process_packet_block().

Retrieval Methods¶

pop_chunks¶

std::vector<c10::optional<Chunk>> torio::io::StreamingMediaDecoder::pop_chunks()¶

Pop one chunk from each output stream, if available.

Support Structures¶

Chunk¶

struct Chunk¶

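Stores decoded frames and metadata.

Public Members

torch::Tensor frames¶

Audio/video frames.

For audio, the shape is [time, num_channels], and the dtype depends on the output stream configuration.

For video, the shape is [time, channel, height, width], and the dtype is torch.uint8.

double pts¶

Presentation timestamp of the first frame, in seconds.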
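Because pts refers only to the first frame of a chunk, the timestamp of a later frame has to be derived. The helper below is a sketch that assumes a constant sample rate or frame rate within the chunk; the function name is hypothetical.

```cpp
#include <cstdint>

// Timestamp (seconds) of the frame at `frame_index` within a chunk,
// assuming frames are evenly spaced at `frames_per_second`.
double frame_timestamp(double chunk_pts, int64_t frame_index,
                       double frames_per_second) {
  return chunk_pts + static_cast<double>(frame_index) / frames_per_second;
}
```

For an audio chunk the rate would be the output stream's sample rate; for video it would be the frame rate. Variable-frame-rate video breaks the even-spacing assumption.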

SrcStreamInfo¶

struct SrcStreamInfo¶

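Information about a source stream found in the input media.

Common Members

AVMediaType media_type¶

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo: Introduce a custom enum and remove the FFmpeg dependency.

const char *codec_name = "N/A"¶

The name of the codec.

const char *codec_long_name = "N/A"¶

The name of the codec in a long, human-friendly form.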

const char *fmt_name = "N/A"¶

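For audio, it is the sample format.

Commonly found values are:

"u8", "u8p": 8-bit unsigned integer.
"s16", "s16p": 16-bit signed integer.
"s32", "s32p": 32-bit signed integer.
"s64", "s64p": 64-bit signed integer.
"flt", "fltp": 32-bit floating point.
"dbl", "dblp": 64-bit floating point.

For video, it is the color channel format.

Commonly found values include:

"gray8": grayscale
"rgb24": RGB
"bgr24": BGR
"yuv420p": YUV420p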
int64_t bit_rate = 0¶

Bit rate.

int64_t num_frames = 0¶

Number of frames.

Note

In some formats, this value is unreliable or unavailable.

int bits_per_sample = 0¶

Bits per sample.

OptionDict metadata = {}¶

Metadata.

For MP3 sources, this can include ID3 tags.

Example

{
  "title": "foo",
  "artist": "bar",
  "date": "2017"
}

Audio-specific Members

double sample_rate = 0¶

Sample rate.

int num_channels = 0¶

The number of channels.

Video-specific Members

int width = 0¶

Width.

int height = 0¶

Height.

double frame_rate = 0¶

Frame rate.

OutputStreamInfo¶

struct OutputStreamInfo¶

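Information about an output stream configured by user code.

Audio-specific Members

double sample_rate = -1¶

Sample rate.

int num_channels = -1¶

The number of channels.

Video-specific Members

int width = -1¶

Width.

int height = -1¶

Height.

AVRational frame_rate = {0, 1}¶

Frame rate.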

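Public Members

int source_index¶

The index of the input source stream.

AVMediaType media_type = AVMEDIA_TYPE_UNKNOWN¶

The stream media type.

Please refer to the FFmpeg documentation for the available values.

Todo: Introduce a custom enum and remove the FFmpeg dependency.

int format = -1¶

Media format: AVSampleFormat for audio, AVPixelFormat for video.

std::string filter_description = {}¶

Filter graph definition, such as "aresample=16000,aformat=sample_fmts=fltp".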
