Custom Components

This guide walks through building a simple app and a custom component spec, then launching it via two different schedulers.

See the Quickstart Guide for installation and basic usage.

Hello World

Let's start by writing a simple "Hello World" Python app. This is just a normal Python program and can contain anything you'd like.

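Note

This example uses the Jupyter Notebook %%writefile magic to create local files for demonstration purposes. Under normal usage you would have these as standalone files.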

[1]:
%%writefile my_app.py

import sys
import argparse

def main(user: str) -> None:
    print(f"Hello, {user}!")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Hello world app"
    )
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    args = parser.parse_args(sys.argv[1:])

    main(args.user)
Overwriting my_app.py
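
Because `main` takes the user name as an ordinary argument, the app can be exercised without a real command line. The following is a minimal sketch under that assumption: it repeats the parser from `my_app.py` inside a hypothetical helper named `build_parser` (not part of the original app) and passes an explicit argument list instead of reading `sys.argv`.

```python
import argparse
import contextlib
import io
from typing import List


def main(user: str) -> None:
    print(f"Hello, {user}!")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: the same parser as in my_app.py, factored out for reuse.
    parser = argparse.ArgumentParser(description="Hello world app")
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    return parser


def run(argv: List[str]) -> str:
    # Parse an explicit argument list and capture what main() prints.
    args = build_parser().parse_args(argv)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        main(args.user)
    return buffer.getvalue()


greeting = run(["--user", "your name"])
print(greeting, end="")
```

Keeping the parser in its own function is only one possible layout; the original single-file script works the same way when launched by a scheduler.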

Now that we have an app, we can write the component file for it. The component function lets us reuse and share the app in a user-friendly way.

We can use this component from the torchx CLI or programmatically as part of a pipeline.

[2]:
%%writefile my_component.py

import torchx.specs as specs

def greet(user: str, image: str = "my_app:latest") -> specs.AppDef:
    return specs.AppDef(
        name="hello_world",
        roles=[
            specs.Role(
                name="greeter",
                image=image,
                entrypoint="python",
                args=[
                    "-m", "my_app",
                    "--user", user,
                ],
            )
        ],
    )
Overwriting my_component.py
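
To see what the `greeter` role asks a scheduler to execute, it can help to join its `entrypoint` and `args` into a single command line. The sketch below is an illustration built only on the standard library; `role_command` is a hypothetical helper, not part of TorchX, and it makes no claim about how any scheduler actually assembles its command.

```python
import shlex
from typing import List


def role_command(entrypoint: str, args: List[str]) -> str:
    # Hypothetical helper: quote each token so the result can be pasted into a shell.
    return shlex.join([entrypoint, *args])


user = "your name"
# Same entrypoint and args as the greeter role in my_component.py.
command = role_command("python", ["-m", "my_app", "--user", user])
print(command)
```

Note that the user name stays a single argument even though it contains a space, because `args` is a list rather than one string that gets split.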

We can execute our component via torchx run. The local_cwd scheduler executes the component relative to the current directory.

[3]:
%%sh
torchx run --scheduler local_cwd my_component.py:greet --user "your name"
torchx 2024-07-17 02:04:04 INFO     Tracker configurations: {}
torchx 2024-07-17 02:04:04 INFO     Log directory not set in scheduler cfg. Creating a temporary log dir that will be deleted on exit. To preserve log directory set the `log_dir` cfg option
torchx 2024-07-17 02:04:04 INFO     Log directory is: /tmp/torchx_b44hv08a
torchx 2024-07-17 02:04:04 INFO     Waiting for the app to finish...
greeter/0 Hello, your name!
torchx 2024-07-17 02:04:05 INFO     Job finished: SUCCEEDED
local_cwd://torchx/hello_world-l72k6xzs9nl7qc
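
The last line of the output above is the handle that identifies the launched job. Judging from that output, it appears to follow the form `scheduler://session/app_id`. The sketch below splits a handle on that assumption; `split_handle` and `Handle` are hypothetical names used for illustration and are not TorchX APIs. It uses plain string partitioning rather than `urllib.parse`, because the standard URL parser does not accept an underscore in the scheme.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    scheduler: str
    session: str
    app_id: str


def split_handle(handle: str) -> Handle:
    # Assumed layout: <scheduler>://<session>/<app_id>
    scheduler, sep, rest = handle.partition("://")
    if not sep:
        raise ValueError(f"not a handle: {handle!r}")
    session, sep, app_id = rest.partition("/")
    if not sep or not app_id:
        raise ValueError(f"missing app id: {handle!r}")
    return Handle(scheduler, session, app_id)


handle = split_handle("local_cwd://torchx/hello_world-l72k6xzs9nl7qc")
print(handle.scheduler, handle.session, handle.app_id)
```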

To run in other environments, we can build a Docker container so that the component can run in Docker-enabled environments such as Kubernetes, or via the local Docker scheduler.

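Note

This requires Docker to be installed and will not work in environments such as Google Colab. If you have not installed it yet, follow the install instructions at: https://docs.docker.net.cn/get-docker/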

[4]:
%%writefile Dockerfile.custom

FROM ghcr.io/pytorch/torchx:0.1.0rc1

ADD my_app.py .
Overwriting Dockerfile.custom

Once we have created the Dockerfile, we can build the Docker image.

[5]:
%%sh
docker build -t my_app:latest -f Dockerfile.custom .
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.custom
#1 transferring dockerfile: 158B done
#1 DONE 0.0s

#2 [internal] load metadata for ghcr.io/pytorch/torchx:0.1.0rc1
#2 DONE 0.4s

#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [1/2] FROM ghcr.io/pytorch/torchx:0.1.0rc1@sha256:a738949601d82e7f100fa1efeb8dde0c35ce44c66726cf38596f96d78dcd7ad3
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 484B done
#5 DONE 0.0s

#6 [2/2] ADD my_app.py .
#6 CACHED

#7 exporting to image
#7 exporting layers done
#7 writing image sha256:593705c4d39b8ee102d0f6f22e670c233a1d92b4ad03a000c6ce1e410d793c16 done
#7 naming to docker.io/library/my_app:latest done
#7 DONE 0.0s

We can then launch it on the local Docker scheduler (local_docker).

[6]:
%%sh
torchx run --scheduler local_docker my_component.py:greet --image "my_app:latest" --user "your name"
torchx 2024-07-17 02:04:06 INFO     Tracker configurations: {}
torchx 2024-07-17 02:04:06 INFO     Checking for changes in workspace `file:///home/ec2-user/torchx/docs/source`...
torchx 2024-07-17 02:04:06 INFO     To disable workspaces pass: --workspace="" from CLI or workspace=None programmatically.
torchx 2024-07-17 02:04:06 INFO     Workspace `file:///home/ec2-user/torchx/docs/source` resolved to filesystem path `/home/ec2-user/torchx/docs/source`
torchx 2024-07-17 02:04:07 WARNING  failed to pull image my_app:latest, falling back to local: 404 Client Error for http+docker://127.0.0.1/v1.44/images/create?tag=latest&fromImage=my_app: Not Found ("pull access denied for my_app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied")
torchx 2024-07-17 02:04:07 INFO     Building workspace docker image (this may take a while)...
torchx 2024-07-17 02:04:07 INFO     Step 1/4 : ARG IMAGE
torchx 2024-07-17 02:04:07 INFO     Step 2/4 : FROM $IMAGE
torchx 2024-07-17 02:04:07 INFO      ---> 593705c4d39b
torchx 2024-07-17 02:04:07 INFO     Step 3/4 : COPY . .
torchx 2024-07-17 02:04:07 INFO      ---> 0f94922fb1c1
torchx 2024-07-17 02:04:07 INFO     Step 4/4 : LABEL torchx.pytorch.org/version=0.7.0
torchx 2024-07-17 02:04:07 INFO      ---> Running in c97dda5daba1
torchx 2024-07-17 02:04:07 INFO      ---> Removed intermediate container c97dda5daba1
torchx 2024-07-17 02:04:07 INFO      ---> 2b9fac3b2ebb
torchx 2024-07-17 02:04:07 INFO     [Warning] One or more build-args [WORKSPACE] were not consumed
torchx 2024-07-17 02:04:07 INFO     Successfully built 2b9fac3b2ebb
torchx 2024-07-17 02:04:07 INFO     Built new image `sha256:2b9fac3b2ebbe5ad86995a981d772a7481555fa7e497aab8207c1aee1cb48a50` based on original image `my_app:latest` and changes in workspace `file:///home/ec2-user/torchx/docs/source` for role[0]=greeter.
torchx 2024-07-17 02:04:08 INFO     Waiting for the app to finish...
greeter/0 Hello, your name!
torchx 2024-07-17 02:04:09 INFO     Job finished: SUCCEEDED
local_docker://torchx/hello_world-nbt0qx5vk3cvzc

If you have a Kubernetes cluster, you can use the Kubernetes scheduler to launch this on the cluster instead of locally.

$ docker push my_app:latest
$ torchx run --scheduler kubernetes my_component.py:greet --image "my_app:latest" --user "your name"

Builtins

TorchX also provides a number of builtin components with premade images. You can discover them via:

[7]:
%%sh
torchx builtins
Found 11 builtin components:
  1. dist.ddp
  2. dist.spmd
  3. metrics.tensorboard
  4. serve.torchserve
  5. utils.binary
  6. utils.booth
  7. utils.copy
  8. utils.echo
  9. utils.python
 10. utils.sh
 11. utils.touch

You can use these the same way as any other component: from the CLI, from a pipeline, or programmatically.

[8]:
%%sh
torchx run utils.echo --msg "Hello :)"
torchx 2024-07-17 02:04:11 INFO     Tracker configurations: {}
torchx 2024-07-17 02:04:11 INFO     Checking for changes in workspace `file:///home/ec2-user/torchx/docs/source`...
torchx 2024-07-17 02:04:11 INFO     To disable workspaces pass: --workspace="" from CLI or workspace=None programmatically.
torchx 2024-07-17 02:04:11 INFO     Workspace `file:///home/ec2-user/torchx/docs/source` resolved to filesystem path `/home/ec2-user/torchx/docs/source`
torchx 2024-07-17 02:04:12 INFO     Building workspace docker image (this may take a while)...
torchx 2024-07-17 02:04:12 INFO     Step 1/4 : ARG IMAGE
torchx 2024-07-17 02:04:12 INFO     Step 2/4 : FROM $IMAGE
torchx 2024-07-17 02:04:12 INFO      ---> 2fd60971a176
torchx 2024-07-17 02:04:12 INFO     Step 3/4 : COPY . .
torchx 2024-07-17 02:04:12 INFO      ---> 89d40e4a8fb5
torchx 2024-07-17 02:04:12 INFO     Step 4/4 : LABEL torchx.pytorch.org/version=0.7.0
torchx 2024-07-17 02:04:12 INFO      ---> Running in e077d6aa7bf2
torchx 2024-07-17 02:04:12 INFO      ---> Removed intermediate container e077d6aa7bf2
torchx 2024-07-17 02:04:12 INFO      ---> 89cf730f8a5a
torchx 2024-07-17 02:04:12 INFO     [Warning] One or more build-args [WORKSPACE] were not consumed
torchx 2024-07-17 02:04:12 INFO     Successfully built 89cf730f8a5a
torchx 2024-07-17 02:04:12 INFO     Built new image `sha256:89cf730f8a5a91a01dc463800aa151aa87dd965a15b0e2f331e5ccc1cd4fe0b4` based on original image `ghcr.io/pytorch/torchx:0.7.0` and changes in workspace `file:///home/ec2-user/torchx/docs/source` for role[0]=echo.
torchx 2024-07-17 02:04:13 INFO     Waiting for the app to finish...
torchx 2024-07-17 02:04:13 INFO     Job finished: SUCCEEDED
echo/0 Hello :)
local_docker://torchx/echo-jdh7s6zhqzvczc
