FCN

import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet50', pretrained=True)
# or
# model = torch.hub.load('pytorch/vision:v0.10.0', 'fcn_resnet101', pretrained=True)
model.eval()

所有预训练模型都期望输入图像以相同的方式进行归一化，即形状为 (N, 3, H, W) 的 3 通道 RGB 图像的小批量，其中 N 是图像数量，H 和 W 预计至少为 224 像素。图像必须加载到 [0, 1] 范围，然后使用 mean = [0.485, 0.456, 0.406] 和 std = [0.229, 0.224, 0.225] 进行归一化。

模型返回一个 OrderedDict，其中包含两个张量，它们与输入张量具有相同的高度和宽度，但有 21 个类别。output['out'] 包含语义掩码，而 output['aux'] 包含每个像素的辅助损失值。在推理模式下，output['aux'] 没有用。因此，output['out'] 的形状为 (N, 21, H, W)。更多文档可以在这里找到。

# Download an example image from the pytorch website
import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/images/deeplab1.png", "deeplab1.png")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)

# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
input_image = input_image.convert("RGB")
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)['out'][0]
output_predictions = output.argmax(0)

这里的输出形状为 (21, H, W)，在每个位置，都有对应于每个类别预测的未归一化概率。要获得每个类别的最大预测，然后将其用于下游任务，可以执行 output_predictions = output.argmax(0)。

这里有一个小片段，用于绘制预测，其中每种颜色分配给每个类别（请参见左侧的可视化图像）。

# create a color pallette, selecting a color for each class
palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
colors = (colors % 255).numpy().astype("uint8")

# plot the semantic segmentation predictions of 21 classes in each color
r = Image.fromarray(output_predictions.byte().cpu().numpy()).resize(input_image.size)
r.putpalette(colors)

import matplotlib.pyplot as plt
plt.imshow(r)
# plt.show()

模型描述

FCN-ResNet 由全卷积网络模型构建，使用 ResNet-50 或 ResNet-101 主干。预训练模型已在 COCO train2017 的子集上进行训练，涵盖 Pascal VOC 数据集中存在的 20 个类别。

预训练模型在 COCO val2017 数据集上评估的准确性如下所示。

模型结构	平均 IOU	全局像素准确性
fcn_resnet50	60.5	91.4
fcn_resnet101	63.7	91.9

资源

用于语义分割的全卷积网络

带有 ResNet-50 和 ResNet-101 主干的全卷积网络模型

模型类型： 可脚本化 | 视觉

提交者： PyTorch 团队

在 GitHub 上查看 17.2k

在Google Collab上打开

打开模型演示