VideoDecoder

class torchcodec.decoders.VideoDecoder(source: Union[str, Path, RawIOBase, BufferedReader, bytes, Tensor], *, stream_index: Optional[int] = None, dimension_order: Literal['NCHW', 'NHWC'] = 'NCHW', num_ffmpeg_threads: int = 1, device: Optional[Union[str, device]] = 'cpu', seek_mode: Literal['exact', 'approximate'] = 'exact')

A single-stream video decoder.

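Parameters:

source (str, pathlib.Path, bytes, torch.Tensor or file-like object) –
The source of the video:
- If str: a local path or a URL to a video file.
- If pathlib.Path: a path to a local video file.
- If bytes object or torch.Tensor: the raw encoded video data.
- If file-like object: the video data is read from the object on demand. The object must expose the methods read(self, size: int) -> bytes and seek(self, offset: int, whence: int) -> int. Read more in: Streaming data through file-like support.
stream_index (int, optional) – Specifies which stream in the video to decode frames from. Note that this index is absolute across all media types. If left unspecified, the best stream is used.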

dimension_order (str, optional) –

The dimension order of the decoded frames. This can be either "NCHW" (default) or "NHWC", where N is the batch size, C is the number of channels, H is the height, and W is the width of the frames.

Note

Frames are natively decoded in NHWC format by the underlying FFmpeg implementation. Converting those into NCHW format is a cheap no-copy operation that allows these frames to be transformed using the torchvision transforms (https://pytorch.ac.cn/vision/stable/transforms.html).

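num_ffmpeg_threads (int, optional) – The number of threads to use for decoding. Use 1 for single-threaded decoding, which may be best if you are running multiple instances of VideoDecoder in parallel. Use a higher number for multi-threaded decoding, which is best if you are running a single instance of VideoDecoder. Passing 0 lets FFmpeg decide on the number of threads. Default: 1.
device (str or torch.device, optional) – The device to use for decoding. Default: "cpu".
seek_mode (str, optional) – Determines whether frame access is "exact" or "approximate". Exact mode guarantees that requesting frame i always returns frame i, but doing so requires an initial scan of the file. Approximate mode is faster because it avoids scanning the file, but it is less accurate because it uses the file's metadata to calculate where frame i probably is. Default: "exact". Read more about this parameter in: Exact vs Approximate seek mode: Performance and accuracy comparison.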
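
The file-like requirement on source can be met by a very small class. The sketch below is one possible shape, not the library's own implementation: CountingReader and its bytes_read counter are hypothetical names for illustration, the wrapped bytes are a stand-in for real video data, and seek returns the new position as Python's io classes do.

```python
import io


class CountingReader:
    """Minimal file-like wrapper exposing only read() and seek()."""

    def __init__(self, raw: io.BufferedIOBase):
        self._raw = raw
        self.bytes_read = 0  # hypothetical counter, for illustration only

    def read(self, size: int) -> bytes:
        # Forward the read and record how many bytes were actually served.
        data = self._raw.read(size)
        self.bytes_read += len(data)
        return data

    def seek(self, offset: int, whence: int) -> int:
        # Forward the seek and return the new absolute position.
        return self._raw.seek(offset, whence)


# Stand-in bytes; a real caller would wrap an open video file or a network stream.
reader = CountingReader(io.BytesIO(b"0123456789"))
first = reader.read(4)
position = reader.seek(0, io.SEEK_SET)

# With torchcodec installed and real video data behind the wrapper, the object
# could then be passed as `source`, e.g. VideoDecoder(reader).
```

A wrapper like this is a convenient place to observe how much data is pulled on demand, since every read goes through it.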

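Variables:

metadata (VideoStreamMetadata) – The metadata of the video stream.
stream_index (int) – The index of the stream that this decoder retrieves frames from. If a stream index was provided at initialization, this is the same value. If it was left unspecified, this is the best stream.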
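
As a rough illustration of how the parameters and attributes above fit together, the sketch below opens a decoder and reads its two attributes. It assumes torchcodec is installed and that the caller supplies a path to a real video. open_decoder is a hypothetical helper name, and the argument values are arbitrary choices rather than recommendations.

```python
def open_decoder(path: str):
    """Open `path` with a VideoDecoder and return (decoder, first_frame)."""
    # Imported lazily so this snippet can be loaded without torchcodec installed.
    from torchcodec.decoders import VideoDecoder

    decoder = VideoDecoder(
        path,
        dimension_order="NHWC",   # keep FFmpeg's native layout
        num_ffmpeg_threads=0,     # let FFmpeg pick the thread count
        seek_mode="approximate",  # skip the initial scan of the file
    )

    # Attributes documented under "Variables" above.
    print(decoder.metadata)
    print(decoder.stream_index)

    # Indexing returns the frame at that index as a tensor.
    first_frame = decoder[0]
    return decoder, first_frame
```

Calling open_decoder("some_video.mp4") would need a real file at that path; the file name here is a placeholder.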

Examples using VideoDecoder:

Exact vs Approximate seek mode: Performance and accuracy comparison

Accelerated video decoding on GPUs with CUDA and NVDEC

Decoding a video with VideoDecoder

Streaming data through file-like support

How to sample video clips

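__getitem__(key: Union[Integral, slice]) → Tensor

Return the frame or frames at the given index or range, as a tensor.

Note

If you need to decode multiple frames, we recommend using the batch methods instead, since they are faster: get_frames_at(), get_frames_in_range(), get_frames_played_at(), and get_frames_played_in_range().

Parameters: key (int or slice) – The index or range of the frame(s) to retrieve.
Returns: The frame or frames at the given index or range.
Return type: torch.Tensor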

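get_frame_at(index: int) → Frame

Return a single frame at the given index.

Note

If you need to decode multiple frames, we recommend using the batch methods instead, since they are faster: get_frames_at(), get_frames_in_range(), get_frames_played_at(), and get_frames_played_in_range().

Parameters: index (int) – The index of the frame to retrieve.
Returns: The frame at the given index.
Return type: Frame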
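
The recommendation in the note can be made concrete by comparing the two call patterns. The helpers below are hypothetical names and only rely on the get_frame_at and get_frames_at methods documented on this page; `decoder` is assumed to be an already constructed VideoDecoder.

```python
def decode_one_by_one(decoder, indices):
    """Decode frames individually: simple, but one decoder call per frame."""
    return [decoder.get_frame_at(i) for i in indices]


def decode_as_batch(decoder, indices):
    """Decode all requested frames with a single batch call."""
    return decoder.get_frames_at(indices)
```

The first helper returns a Python list of Frame objects, while the second returns one FrameBatch, so downstream code may need to handle the two shapes differently.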

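get_frame_played_at(seconds: float) → Frame

Return a single frame played at the given timestamp in seconds.

Note

If you need to decode multiple frames, we recommend using the batch methods instead, since they are faster: get_frames_at(), get_frames_in_range(), get_frames_played_at(), and get_frames_played_in_range().

Parameters: seconds (float) – The time stamp in seconds when the frame is played.
Returns: The frame that is played at seconds.
Return type: Frame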
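
One way to picture "the frame played at a timestamp" is as a lookup over the frames' presentation timestamps (pts). The sketch below is a conceptual model, not the decoder's implementation: it assumes a frame is played from its own pts until the next frame's pts, it ignores the end of the stream, and index_played_at and the pts values are made up for illustration.

```python
from bisect import bisect_right


def index_played_at(pts_seconds: list[float], seconds: float) -> int:
    """Return the index of the last frame whose pts is <= `seconds`."""
    if not pts_seconds or seconds < pts_seconds[0]:
        raise ValueError("timestamp is before the first frame")
    # bisect_right counts the frames that start at or before `seconds`.
    return bisect_right(pts_seconds, seconds) - 1


# Made-up timestamps for a 25 fps stream (one frame every 0.04 s).
pts = [0.0, 0.04, 0.08, 0.12]

index_played_at(pts, 0.05)  # 1: frame 1 starts at 0.04 and is still on screen
index_played_at(pts, 0.04)  # 1: a timestamp equal to a pts selects that frame
```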

get_frames_at(indices: list[int]) → FrameBatch

Return the frames at the given indices.

Parameters: indices (list of int) – The indices of the frames to retrieve.
Returns: The frames at the given indices.
Return type: FrameBatch
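
A common reason to call get_frames_at is to sample a fixed number of frames spread across a video. The helper below is a hypothetical sketch that only builds the index list; the total frame count is assumed to be known to the caller (for example from the decoder's metadata, whose field names are not listed on this page).

```python
def evenly_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` indices spread from the first to the last frame."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    # Integer arithmetic keeps every index inside [0, last].
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


indices = evenly_spaced_indices(num_frames=10, num_samples=4)  # [0, 3, 6, 9]

# With a real decoder, the list would then be passed in one batch call:
# batch = decoder.get_frames_at(indices)
```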

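get_frames_in_range(start: int, stop: int, step: int = 1) → FrameBatch

Return multiple frames in the given index range.

Frames are in the range [start, stop).

Parameters:

start (int) – Index of the first frame to retrieve.
stop (int) – End of the index range (exclusive, as per Python conventions).
step (int, optional) – Step size between frames. Default: 1.

Returns:

The frames within the specified range.

Return type:

FrameBatch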
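
Because stop is exclusive, as in Python, the selected indices can be previewed with the built-in range. This is a sketch of the indexing convention only, assuming a non-negative start and a positive step; indices_in_range is a hypothetical helper, not part of torchcodec.

```python
def indices_in_range(start: int, stop: int, step: int = 1) -> list[int]:
    """List the frame indices covered by the half-open range [start, stop)."""
    return list(range(start, stop, step))


indices_in_range(0, 10, 3)  # [0, 3, 6, 9]
indices_in_range(5, 8)      # [5, 6, 7]; frame 8 is excluded

# With a real decoder, the corresponding call would be:
# batch = decoder.get_frames_in_range(start=0, stop=10, step=3)
```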

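get_frames_played_at(seconds: list[float]) → FrameBatch

Return the frames played at the given timestamps in seconds.

Parameters: seconds (list of float) – The timestamps in seconds when the frames are played.
Returns: The frames that are played at seconds.
Return type: FrameBatch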

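get_frames_played_in_range(start_seconds: float, stop_seconds: float) → FrameBatch

Return multiple frames in the given time range.

Frames are in the half-open range [start_seconds, stop_seconds). The pts (in seconds) of each returned frame lies inside this half-open range.

Parameters:

start_seconds (float) – Time, in seconds, of the start of the range.
stop_seconds (float) – Time, in seconds, of the end of the range. As the range is half-open, the end is excluded.

Returns:

The frames within the specified range.

Return type:

FrameBatch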
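
The half-open rule on pts can be expressed as a simple filter. The sketch below models the selection described above on a made-up list of timestamps; pts_in_range is a hypothetical helper and does not reflect how the decoder locates frames internally.

```python
def pts_in_range(
    pts_seconds: list[float], start_seconds: float, stop_seconds: float
) -> list[float]:
    """Keep the timestamps that fall in [start_seconds, stop_seconds)."""
    return [p for p in pts_seconds if start_seconds <= p < stop_seconds]


# Made-up timestamps, one frame every half second.
pts = [0.0, 0.5, 1.0, 1.5, 2.0]

selected = pts_in_range(pts, 0.5, 1.5)  # [0.5, 1.0]; the frame at 1.5 is excluded

# With a real decoder, the corresponding call would be:
# batch = decoder.get_frames_played_in_range(start_seconds=0.5, stop_seconds=1.5)
```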
