MultiScaleRoIAlign

class torchvision.ops.MultiScaleRoIAlign(featmap_names: List[str], output_size: Union[int, Tuple[int], List[int]], sampling_ratio: int, *, canonical_scale: int = 224, canonical_level: int = 4)

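Multi-scale RoIAlign pooling, useful for detection with or without FPN.

It infers the pooling scale using the heuristic given in Eq. 1 of the Feature Pyramid Network paper. The keyword-only parameters canonical_scale and canonical_level correspond to 224 and k0=4 in Eq. 1, respectively: canonical_level is the pyramid level from which a region of interest of width x height = canonical_scale x canonical_scale is pooled.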
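As a rough illustration of the heuristic, the sketch below evaluates Eq. 1, k = floor(k0 + log2(sqrt(w * h) / canonical_scale)), in plain Python. The function name `fpn_level`, the clamping bounds `k_min` and `k_max`, and the small `eps` guard are assumptions made for this example; they are not part of the documented API.

```python
import math


def fpn_level(box, canonical_scale=224, canonical_level=4,
              k_min=2, k_max=5, eps=1e-6):
    """Map an (x1, y1, x2, y2) box to a pyramid level following FPN Eq. 1.

    k_min, k_max and eps are hypothetical parameters for this sketch.
    """
    x1, y1, x2, y2 = box
    # Side length of a square with the same area as the box.
    side = math.sqrt((x2 - x1) * (y2 - y1))
    # eps guards against floating-point rounding just below an integer.
    k = math.floor(canonical_level + math.log2(side / canonical_scale) + eps)
    # Clamp to the range of levels that actually exist in the pyramid.
    return max(k_min, min(k_max, k))


example_boxes = [
    (0, 0, 224, 224),  # canonical size, maps to canonical_level
    (0, 0, 112, 112),  # half the side length, one level finer
    (0, 0, 448, 448),  # double the side length, one level coarser
    (0, 0, 16, 16),    # very small box, clamped to k_min
]
levels = [fpn_level(b) for b in example_boxes]
print(levels)  # [4, 3, 5, 2]
```

Under these assumptions, larger boxes are pooled from coarser levels and smaller boxes from finer ones.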

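Parameters:

featmap_names (List[str]) – the names of the feature maps that will be used for the pooling.
output_size (Union[int, Tuple[int, int], List[int]]) – output size of the pooled region.
sampling_ratio (int) – sampling ratio for ROIAlign.
canonical_scale (int, optional) – canonical_scale for LevelMapper. Default: 224.
canonical_level (int, optional) – canonical_level for LevelMapper. Default: 4.

Examples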

>>> from collections import OrderedDict
>>> import torch
>>> import torchvision
>>> m = torchvision.ops.MultiScaleRoIAlign(['feat1', 'feat3'], 3, 2)
>>> i = OrderedDict()
>>> i['feat1'] = torch.rand(1, 5, 64, 64)
>>> i['feat2'] = torch.rand(1, 5, 32, 32)  # this feature won't be used in the pooling
>>> i['feat3'] = torch.rand(1, 5, 16, 16)
>>> # create some random bounding boxes
>>> boxes = torch.rand(6, 4) * 256; boxes[:, 2:] += boxes[:, :2]
>>> # original image size, before computing the feature maps
>>> image_sizes = [(512, 512)]
>>> output = m(i, [boxes], image_sizes)
>>> print(output.shape)
torch.Size([6, 5, 3, 3])

forward(x: Dict[str, Tensor], boxes: List[Tensor], image_shapes: List[Tuple[int, int]]) → Tensor

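Parameters:

x (OrderedDict[Tensor]) – feature maps for each level. They are assumed to all have the same number of channels, but they can have different sizes.
boxes (List[Tensor[N, 4]]) – boxes used to perform the pooling operation, in (x1, y1, x2, y2) format and expressed in the coordinate frame of the original image, not of the feature maps. The coordinates must satisfy 0 <= x1 < x2 and 0 <= y1 < y2.
image_shapes (List[Tuple[height, width]]) – the size of each image before it was fed to a CNN to obtain the feature maps. This makes it possible to infer the scale factor of each level to be pooled.

Returns:

result (Tensor)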
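To make the role of image_shapes concrete, the sketch below shows one way a per-level scale factor could be derived from a feature map size and the original image size, by rounding their ratio to the nearest power of two. The function `infer_scale` and the rounding rule are assumptions for illustration, not a description of the library's internals.

```python
import math


def infer_scale(feature_hw, image_hw):
    """Hypothetical helper: estimate the feature map scale as a power of two."""
    ratio = feature_hw[0] / image_hw[0]
    # Rounding in log space absorbs small size differences caused by padding.
    return 2.0 ** round(math.log2(ratio))


image_hw = (512, 512)
feature_sizes = {"feat1": (64, 64), "feat3": (16, 16)}

scales = {name: infer_scale(hw, image_hw) for name, hw in feature_sizes.items()}
print(scales)  # {'feat1': 0.125, 'feat3': 0.03125}
```

Box coordinates given in image space can then be multiplied by the scale of the chosen level to locate the region on that feature map.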
