
Exact vs Approximate seek mode: Performance and accuracy comparison

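In this example, we will describe the seek_mode parameter of the VideoDecoder class. This parameter offers a trade-off between the speed of VideoDecoder creation and the seeking accuracy of the retrieved frames: in approximate mode, requesting the i-th frame does not necessarily return frame i.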
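To make the trade-off concrete, here is a toy model in plain Python. It is not TorchCodec code. It assumes, hypothetically, that approximate mode estimates the timestamp of frame i as i / average_fps, whereas exact mode can read the true timestamp from a per-frame index. The helper name and all timestamps are made up for illustration.

```python
import bisect


def frame_returned_for_index(frame_pts, i, average_fps):
    """Toy approximate seek (hypothetical helper).

    Estimate frame i's timestamp as i / average_fps, then return the index of
    the frame being displayed at that time, i.e. the last frame whose
    timestamp is <= the estimate. A toy exact seek would simply return i.
    """
    estimated_pts = i / average_fps
    return max(bisect.bisect_right(frame_pts, estimated_pts) - 1, 0)


# Constant frame rate: 10 frames at 25 fps, frame k starts at k / 25 seconds.
cfr_pts = [k / 25 for k in range(10)]
print([frame_returned_for_index(cfr_pts, i, 25.0) for i in range(10)])
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Variable frame rate (made-up timestamps): 50 fps for frames 0-4, then
# 12.5 fps, with a header that advertises an average of 20 fps.
vfr_pts = [0.00, 0.02, 0.04, 0.06, 0.08, 0.16, 0.24, 0.32, 0.40, 0.48]
print([frame_returned_for_index(vfr_pts, i, 20.0) for i in range(10)])
# [0, 2, 4, 4, 5, 6, 6, 7, 8, 8]
```

With a constant frame rate the estimate lands on the requested frame every time. With a variable frame rate, several requested indices (for example 1 and 3) come back as a different frame.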

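First, a bit of boilerplate: we'll download a short video from the web and use the ffmpeg CLI to repeat it 100 times. We'll end up with two videos: a short one of about 14 seconds and a long one of about 23 minutes. You can ignore this part and skip ahead to Performance: VideoDecoder creation below.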

import torch
import requests
import tempfile
from pathlib import Path
import shutil
import subprocess
from time import perf_counter_ns


# Video source: https://www.pexels.com/video/dog-eating-854132/
# License: CC0. Author: Coverr.
url = "https://videos.pexels.com/video-files/854132/854132-sd_640_360_25fps.mp4"
response = requests.get(url, headers={"User-Agent": ""})
if response.status_code != 200:
    raise RuntimeError(f"Failed to download video. {response.status_code = }.")

temp_dir = tempfile.mkdtemp()
short_video_path = Path(temp_dir) / "short_video.mp4"
with open(short_video_path, 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):  # default chunk_size is 1 byte
        f.write(chunk)

long_video_path = Path(temp_dir) / "long_video.mp4"
ffmpeg_command = [
    "ffmpeg",
    "-stream_loop", "99",  # repeat video 100 times
    "-i", f"{short_video_path}",
    "-c", "copy",
    f"{long_video_path}"
]
subprocess.run(ffmpeg_command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

from torchcodec.decoders import VideoDecoder
print(f"Short video duration: {VideoDecoder(short_video_path).metadata.duration_seconds} seconds")
print(f"Long video duration: {VideoDecoder(long_video_path).metadata.duration_seconds / 60} minutes")
Short video duration: 13.8 seconds
Long video duration: 23.0 minutes
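As a sanity check on the two printed durations: ffmpeg's -stream_loop 99 loops the input 99 additional times, so the long video holds 100 copies of the 13.8-second clip.

```python
# Values taken from the ffmpeg command and the printed output above.
stream_loop = 99               # extra loops requested with -stream_loop
short_duration_seconds = 13.8  # duration of the short video

copies = stream_loop + 1       # the original play-through plus the loops
long_duration_minutes = copies * short_duration_seconds / 60
print(f"{copies} copies -> {long_duration_minutes:.1f} minutes")
# 100 copies -> 23.0 minutes
```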

Performance: VideoDecoder creation

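Performance-wise, the seek_mode parameter mainly affects the creation of the VideoDecoder object. The longer the video, the larger the speedup from approximate mode.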

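def bench(f, average_over=50, warmup=2, **f_kwargs):
    # Call f(**f_kwargs) a few times first so one-off costs (e.g. cold caches) are excluded.
    for _ in range(warmup):
        f(**f_kwargs)

    # Time each call individually, in nanoseconds.
    times = []
    for _ in range(average_over):
        start = perf_counter_ns()
        f(**f_kwargs)
        end = perf_counter_ns()
        times.append(end - start)

    # Report the median and the standard deviation, in milliseconds.
    times = torch.tensor(times) * 1e-6  # ns to ms
    std = times.std().item()
    med = times.median().item()
    print(f"{med = :.2f}ms +- {std:.2f}")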


print("Creating a VideoDecoder object with seek_mode='exact' on a short video:")
bench(VideoDecoder, source=short_video_path, seek_mode="exact")
print("Creating a VideoDecoder object with seek_mode='approximate' on a short video:")
bench(VideoDecoder, source=short_video_path, seek_mode="approximate")
print()
print("Creating a VideoDecoder object with seek_mode='exact' on a long video:")
bench(VideoDecoder, source=long_video_path, seek_mode="exact")
print("Creating a VideoDecoder object with seek_mode='approximate' on a long video:")
bench(VideoDecoder, source=long_video_path, seek_mode="approximate")
Creating a VideoDecoder object with seek_mode='exact' on a short video:
med = 7.98ms +- 0.02
Creating a VideoDecoder object with seek_mode='approximate' on a short video:
med = 7.05ms +- 0.02

Creating a VideoDecoder object with seek_mode='exact' on a long video:
med = 108.56ms +- 1.42
Creating a VideoDecoder object with seek_mode='approximate' on a long video:
med = 10.43ms +- 0.02
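The medians above translate into the following speedups. The numbers are copied from this particular run, so expect different values on other machines and videos.

```python
# Median creation times in ms, copied from the output above.
creation_ms = {
    "short": {"exact": 7.98, "approximate": 7.05},
    "long": {"exact": 108.56, "approximate": 10.43},
}

for video, t in creation_ms.items():
    speedup = t["exact"] / t["approximate"]
    saved_ms = t["exact"] - t["approximate"]
    print(f"{video}: {speedup:.1f}x faster, {saved_ms:.2f} ms saved per decoder")
# short: 1.1x faster, 0.93 ms saved per decoder
# long: 10.4x faster, 98.13 ms saved per decoder
```

In this run, exact-mode creation grew about 13.6x going from the 13.8-second video to the 23-minute one, while approximate-mode creation stayed roughly flat (7.05 ms vs 10.43 ms).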

Performance: frame decoding and clip sampling

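Strictly speaking, the seek_mode parameter only affects the performance of VideoDecoder creation. It has no direct effect on the performance of frame decoding or sampling. However, because frame decoding and sampling typically involve creating a VideoDecoder object (one per video), seek_mode can still end up affecting the performance of decoders and samplers. For example: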

from torchcodec import samplers


def sample_clips(seek_mode):
    return samplers.clips_at_random_indices(
        decoder=VideoDecoder(
            source=long_video_path,
            seek_mode=seek_mode
        ),
        num_clips=5,
        num_frames_per_clip=2,
    )


print("Sampling clips with seek_mode='exact':")
bench(sample_clips, seek_mode="exact")
print("Sampling clips with seek_mode='approximate':")
bench(sample_clips, seek_mode="approximate")
Sampling clips with seek_mode='exact':
med = 272.07ms +- 34.39
Sampling clips with seek_mode='approximate':
med = 188.57ms +- 39.54
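Doing the same arithmetic on the sampling medians (again, numbers from this run only):

```python
# Median sampling times in ms, copied from the output above.
exact_ms = 272.07
approx_ms = 188.57

print(f"speedup: {exact_ms / approx_ms:.2f}x")          # speedup: 1.44x
print(f"saved per call: {exact_ms - approx_ms:.2f} ms")  # saved per call: 83.50 ms
```

Each call creates one decoder on the long video, so the roughly 83.5 ms saved per call is in the same range as the roughly 98 ms creation saving measured earlier. With standard deviations of 34 to 40 ms, that comparison is only approximate.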

Accuracy: Metadata and frame retrieval

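We've seen that seek_mode="approximate" can significantly speed up VideoDecoder creation. The price to pay is that seeking won't always be as accurate as with seek_mode="exact". It can also affect the accuracy of the metadata.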

However, in many cases you'll find no accuracy difference between the two modes, which means that seek_mode="approximate" is a net benefit:

print("Metadata of short video with seek_mode='exact':")
print(VideoDecoder(short_video_path, seek_mode="exact").metadata)
print("Metadata of short video with seek_mode='approximate':")
print(VideoDecoder(short_video_path, seek_mode="approximate").metadata)

exact_decoder = VideoDecoder(short_video_path, seek_mode="exact")
approx_decoder = VideoDecoder(short_video_path, seek_mode="approximate")
for i in range(len(exact_decoder)):
    torch.testing.assert_close(
        exact_decoder.get_frame_at(i).data,
        approx_decoder.get_frame_at(i).data,
        atol=0, rtol=0,
    )
print("Frame seeking is the same for this video!")
Metadata of short video with seek_mode='exact':
VideoStreamMetadata:
  duration_seconds_from_header: 13.8
  begin_stream_seconds_from_header: 0.0
  bit_rate: 505790.0
  codec: h264
  stream_index: 0
  begin_stream_seconds_from_content: 0.0
  end_stream_seconds_from_content: 13.8
  width: 640
  height: 360
  num_frames_from_header: 345
  num_frames_from_content: 345
  average_fps_from_header: 25.0
  duration_seconds: 13.8
  begin_stream_seconds: 0.0
  end_stream_seconds: 13.8
  num_frames: 345
  average_fps: 25.0

Metadata of short video with seek_mode='approximate':
VideoStreamMetadata:
  duration_seconds_from_header: 13.8
  begin_stream_seconds_from_header: 0.0
  bit_rate: 505790.0
  codec: h264
  stream_index: 0
  begin_stream_seconds_from_content: None
  end_stream_seconds_from_content: None
  width: 640
  height: 360
  num_frames_from_header: 345
  num_frames_from_content: None
  average_fps_from_header: 25.0
  duration_seconds: 13.8
  begin_stream_seconds: 0
  end_stream_seconds: 13.8
  num_frames: 345
  average_fps: 25.0

Frame seeking is the same for this video!
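One way to see exactly where the two metadata printouts differ is to diff them. The dictionaries below copy a subset of the printed fields by hand; they are plain dicts, not TorchCodec's VideoStreamMetadata objects.

```python
# Subset of the fields printed above, copied by hand.
exact_meta = {
    "duration_seconds_from_header": 13.8,
    "num_frames_from_header": 345,
    "begin_stream_seconds_from_content": 0.0,
    "end_stream_seconds_from_content": 13.8,
    "num_frames_from_content": 345,
    "duration_seconds": 13.8,
    "begin_stream_seconds": 0.0,
    "end_stream_seconds": 13.8,
    "num_frames": 345,
    "average_fps": 25.0,
}
approx_meta = {
    "duration_seconds_from_header": 13.8,
    "num_frames_from_header": 345,
    "begin_stream_seconds_from_content": None,
    "end_stream_seconds_from_content": None,
    "num_frames_from_content": None,
    "duration_seconds": 13.8,
    "begin_stream_seconds": 0,  # printed as 0 rather than 0.0; compares equal
    "end_stream_seconds": 13.8,
    "num_frames": 345,
    "average_fps": 25.0,
}

differing = sorted(k for k in exact_meta if exact_meta[k] != approx_meta[k])
print(differing)
# ['begin_stream_seconds_from_content', 'end_stream_seconds_from_content', 'num_frames_from_content']
```

Only the *_from_content fields differ: they come from the scan, which approximate mode skips. For this video, the combined fields such as num_frames and duration_seconds have the same values in both modes, consistent with them falling back to the header values when no scan was done.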

What is this doing under the hood?

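With seek_mode="exact", the VideoDecoder performs a scan when it is instantiated. The scan doesn't involve decoding, but it processes the entire file to infer more accurate metadata (such as duration) and to build an internal index of frames and key frames. This internal index can be more accurate than the one in the file's headers, which leads to more accurate seeking. Without the scan, TorchCodec relies only on the metadata contained in the file, which may not always be accurate.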
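The following is a conceptual sketch, in plain Python, of what such a scan produces; it is not TorchCodec's implementation. The Packet class, the scan and plan_seek functions, and all timestamps are hypothetical. Packets are listed in decode order, which with B-frames differs from presentation order, so the toy scan sorts timestamps to build the frame index.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical demuxed packet, reduced to what the toy scan needs."""
    pts_seconds: float
    is_key_frame: bool


def scan(packets):
    """Toy exact-mode scan: a single pass over all packets, no decoding."""
    frame_pts = sorted(p.pts_seconds for p in packets)  # presentation order
    key_frame_pts = sorted(p.pts_seconds for p in packets if p.is_key_frame)
    content_metadata = {
        "num_frames_from_content": len(frame_pts),
        "begin_stream_seconds_from_content": frame_pts[0],
    }
    return frame_pts, key_frame_pts, content_metadata


def plan_seek(frame_pts, key_frame_pts, i):
    """Return (where to start decoding, timestamp of frame i).

    Decoding has to start at a key frame, so pick the last key frame at or
    before frame i and decode forward from there. Assumes the first frame is
    a key frame.
    """
    target_pts = frame_pts[i]
    start_pts = key_frame_pts[bisect_right(key_frame_pts, target_pts) - 1]
    return start_pts, target_pts


# Decode order of a made-up I/P/B stream at 25 fps with two key frames.
packets = [
    Packet(0.00, True), Packet(0.12, False), Packet(0.04, False), Packet(0.08, False),
    Packet(0.16, True), Packet(0.28, False), Packet(0.20, False), Packet(0.24, False),
]
frame_pts, key_frame_pts, content_metadata = scan(packets)
print(frame_pts)      # [0.0, 0.04, 0.08, 0.12, 0.16, 0.2, 0.24, 0.28]
print(key_frame_pts)  # [0.0, 0.16]
print(plan_seek(frame_pts, key_frame_pts, 6))  # (0.16, 0.24)
```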

Which mode should I use?

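The general rule of thumb is as follows:

  • If you really care about the exactness of frame seeking, use "exact".

  • If you can sacrifice seeking exactness for speed, which is usually the case when doing clip sampling, use "approximate".

  • If your videos don't have a variable frame rate and their metadata is correct, then "approximate" mode is a net win: it will be just as accurate as "exact" mode while still being significantly faster.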
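The rule of thumb can be written down as a small decision helper. Both functions are hypothetical illustrations, not part of TorchCodec, and the constant-frame-rate check assumes you already have sorted frame timestamps for the video from whatever tool you use to inspect your files.

```python
def looks_constant_frame_rate(frame_pts, rel_tol=1e-3):
    """Hypothetical check: are all gaps between consecutive timestamps equal
    within rel_tol? frame_pts must be sorted and contain at least 2 entries."""
    gaps = [b - a for a, b in zip(frame_pts, frame_pts[1:])]
    return max(gaps) - min(gaps) <= rel_tol * max(gaps)


def choose_seek_mode(need_exact_seeking, constant_frame_rate, metadata_trusted):
    """Encode the rule of thumb above as a hypothetical helper."""
    if constant_frame_rate and metadata_trusted:
        return "approximate"  # expected to be as accurate as "exact", and faster
    if need_exact_seeking:
        return "exact"
    return "approximate"      # e.g. clip sampling that tolerates small offsets


print(choose_seek_mode(True, looks_constant_frame_rate([k / 25 for k in range(10)]), True))
# approximate
print(choose_seek_mode(True, looks_constant_frame_rate([0.0, 0.02, 0.04, 0.16, 0.24]), True))
# exact
```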

Total running time of the script: (0 minutes 33.718 seconds)

Gallery generated by Sphinx-Gallery
