How to write your own v2 transforms

This guide explains how to write transforms that are compatible with the torchvision transforms V2 API.

from typing import Any, Dict, List

import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

Just create an `nn.Module` and override the `forward` method

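In most cases, this is all you need, as long as you already know the input structure that your transform will expect. For example, if you are only doing image classification, your transform will typically accept a single image as input, or an (img, label) tuple. You can therefore hard-code your forward method to accept just that, e.g.: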

class MyCustomTransform(torch.nn.Module):
    def forward(self, img, label):
        # Do some transformations. This placeholder passes the inputs through unchanged.
        new_img, new_label = img, label
        return new_img, new_label

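Note

This means that if you have a custom transform that is already compatible with the V1 transforms (those in torchvision.transforms), it will still work with the V2 transforms without any change!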

We will illustrate this more completely below with a typical detection case, where the samples consist of images, bounding boxes and labels:

class MyCustomTransform(torch.nn.Module):
    def forward(self, img, bboxes, label):  # we assume inputs are always structured like this
        print(
            f"I'm transforming an image of shape {img.shape} "
            f"with bboxes = {bboxes}\n{label = }"
        )
        # Do some transformations. Here, we're just passing through the input
        return img, bboxes, label


transforms = v2.Compose([
    MyCustomTransform(),
    v2.RandomResizedCrop((224, 224), antialias=True),
    v2.RandomHorizontalFlip(p=1),
    v2.Normalize(mean=[0, 0, 0], std=[1, 1, 1])
])

H, W = 256, 256
img = torch.rand(3, H, W)
bboxes = tv_tensors.BoundingBoxes(
    torch.tensor([[0, 10, 10, 20], [50, 50, 70, 70]]),
    format="XYXY",
    canvas_size=(H, W)
)
label = 3

out_img, out_bboxes, out_label = transforms(img, bboxes, label)

I'm transforming an image of shape torch.Size([3, 256, 256]) with bboxes = BoundingBoxes([[ 0, 10, 10, 20],
               [50, 50, 70, 70]], format=BoundingBoxFormat.XYXY, canvas_size=(256, 256))
label = 3

print(f"Output image shape: {out_img.shape}\nout_bboxes = {out_bboxes}\n{out_label = }")

Output image shape: torch.Size([3, 224, 224])
out_bboxes = BoundingBoxes([[224,   8, 224,  17],
               [196,  44, 218,  62]], format=BoundingBoxFormat.XYXY, canvas_size=(224, 224))
out_label = 3

Note

While working with TVTensor classes in your code, make sure to familiarize yourself with this section: "I had a TVTensor but now I have a Tensor. Help!"

Supporting arbitrary input structures

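In the section above, we assumed that you already know the structure of your inputs and that you are fine with hard-coding this expected structure in your code. If you want your custom transforms to be as flexible as possible, this can be somewhat limiting.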

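A key feature of the built-in Torchvision V2 transforms is that they can accept arbitrary input structures and return the same structure as output (with transformed entries). For example, transforms can accept a single image, an (img, label) tuple, or an arbitrarily nested dictionary as input. Here is an example with the built-in transform RandomHorizontalFlip: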

structured_input = {
    "img": img,
    "annotations": (bboxes, label),
    "something that will be ignored": (1, "hello"),
    "another tensor that is ignored": torch.arange(10),
}
structured_output = v2.RandomHorizontalFlip(p=1)(structured_input)

assert isinstance(structured_output, dict)
assert structured_output["something that will be ignored"] == (1, "hello")
assert (structured_output["another tensor that is ignored"] == torch.arange(10)).all()
print(f"The input bboxes are:\n{structured_input['annotations'][0]}")
print(f"The transformed bboxes are:\n{structured_output['annotations'][0]}")

The input bboxes are:
BoundingBoxes([[ 0, 10, 10, 20],
               [50, 50, 70, 70]], format=BoundingBoxFormat.XYXY, canvas_size=(256, 256))
The transformed bboxes are:
BoundingBoxes([[246,  10, 256,  20],
               [186,  50, 206,  70]], format=BoundingBoxFormat.XYXY, canvas_size=(256, 256))

Basics: override the `transform()` method

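In order to support arbitrary inputs in your custom transform, you need to inherit from Transform and override the .transform() method (not the forward() method!). Below is a basic example: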

class MyCustomTransform(v2.Transform):
    def transform(self, inpt: Any, params: Dict[str, Any]):
        if type(inpt) == torch.Tensor:
            print(f"I'm transforming an image of shape {inpt.shape}")
            return inpt + 1  # dummy transformation
        elif isinstance(inpt, tv_tensors.BoundingBoxes):
            print(f"I'm transforming bounding boxes! {inpt.canvas_size = }")
            return tv_tensors.wrap(inpt + 100, like=inpt)  # dummy transformation


my_custom_transform = MyCustomTransform()
structured_output = my_custom_transform(structured_input)

assert isinstance(structured_output, dict)
assert structured_output["something that will be ignored"] == (1, "hello")
assert (structured_output["another tensor that is ignored"] == torch.arange(10)).all()
print(f"The input bboxes are:\n{structured_input['annotations'][0]}")
print(f"The transformed bboxes are:\n{structured_output['annotations'][0]}")

I'm transforming an image of shape torch.Size([3, 256, 256])
I'm transforming bounding boxes! inpt.canvas_size = (256, 256)
The input bboxes are:
BoundingBoxes([[ 0, 10, 10, 20],
               [50, 50, 70, 70]], format=BoundingBoxFormat.XYXY, canvas_size=(256, 256))
The transformed bboxes are:
BoundingBoxes([[100, 110, 110, 120],
               [150, 150, 170, 170]], format=BoundingBoxFormat.XYXY, canvas_size=(256, 256))

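An important thing to note is that when we call my_custom_transform on structured_input, the input is flattened and each individual part is then passed to transform(). That is, transform() receives the input image, then the bounding boxes, and so on. Within transform(), you can decide how to transform each input based on its type.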

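If you are curious why the other tensor (torch.arange()) did not get passed to transform(), see the note on how pure tensors are handled in the torchvision transforms documentation for more details.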

Advanced: the `make_params()` method

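The make_params() method is called internally before transform() is called on each input. It is typically used to generate random parameter values. In the example below, we use it to randomly apply the transformation with a probability of 0.5: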

class MyRandomTransform(MyCustomTransform):
    def __init__(self, p=0.5):
        self.p = p
        super().__init__()

    def make_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
        apply_transform = (torch.rand(size=(1,)) < self.p).item()
        params = dict(apply_transform=apply_transform)
        return params

    def transform(self, inpt: Any, params: Dict[str, Any]):
        if not params["apply_transform"]:
            print("Not transforming anything!")
            return inpt
        else:
            return super().transform(inpt, params)


my_random_transform = MyRandomTransform()

torch.manual_seed(0)
_ = my_random_transform(structured_input)  # transforms
_ = my_random_transform(structured_input)  # doesn't transform

I'm transforming an image of shape torch.Size([3, 256, 256])
I'm transforming bounding boxes! inpt.canvas_size = (256, 256)
Not transforming anything!
Not transforming anything!

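Note

It is important for such random parameter generation to happen within make_params() and not within transform(), so that for a given transform call, the same random draw applies to all the inputs in the same way. If we were to generate the random numbers within transform(), we would risk, for example, transforming the image while leaving the bounding boxes untransformed.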

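The make_params() method takes the list of all the inputs as its parameter (each element of this list will later be passed to transform()). You can use flat_inputs to, for example, figure out the dimensions of the input, using query_chw() or query_size().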

make_params() should return a dict (or really, any object you want), which will then be passed on to transform().
