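Custom Components
This guide shows how to build a simple application and a custom component spec, then launch it through two different schedulers.
See the Quickstart Guide for details on installation and basic usage.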
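Hello World
Let's start by writing a simple "Hello World" Python application. It is an ordinary Python program and can contain anything you like.
Note
This example uses the Jupyter Notebook %%writefile magic to create local files for demonstration purposes. In normal use these would be standalone files.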
[1]:
%%writefile my_app.py
import sys
import argparse


def main(user: str) -> None:
    print(f"Hello, {user}!")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Hello world app"
    )
    parser.add_argument(
        "--user",
        type=str,
        help="the person to greet",
        required=True,
    )
    args = parser.parse_args(sys.argv[1:])
    main(args.user)
Overwriting my_app.py
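Now that we have an application, we can write a component file for it. A component lets us reuse and share the application in a user-friendly way.
We can use this component from the torchx CLI or programmatically as part of a pipeline.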
[2]:
%%writefile my_component.py
import torchx.specs as specs


def greet(user: str, image: str = "my_app:latest") -> specs.AppDef:
    return specs.AppDef(
        name="hello_world",
        roles=[
            specs.Role(
                name="greeter",
                image=image,
                entrypoint="python",
                args=[
                    "-m", "my_app",
                    "--user", user,
                ],
            )
        ],
    )
Overwriting my_component.py
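One way to read the component above is that the role's entrypoint and args together describe the command to run inside the image. The sketch below illustrates this with a plain dataclass, RoleSketch, which is a hypothetical stand-in written for this example and is not the TorchX specs.Role class. It assumes the command is simply the entrypoint followed by the args; individual schedulers may assemble it differently.

```python
import shlex
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleSketch:
    """Hypothetical stand-in for a role; not the TorchX specs.Role class."""

    name: str
    image: str
    entrypoint: str
    args: List[str] = field(default_factory=list)

    def command(self) -> List[str]:
        # Assumption: the command is the entrypoint followed by the args.
        return [self.entrypoint, *self.args]


def greet_sketch(user: str, image: str = "my_app:latest") -> RoleSketch:
    # Mirrors the fields passed to the greeter role in my_component.py.
    return RoleSketch(
        name="greeter",
        image=image,
        entrypoint="python",
        args=["-m", "my_app", "--user", user],
    )


role = greet_sketch("your name")
# shlex.join quotes arguments that contain spaces, as a shell would require.
print(shlex.join(role.command()))
```

Because the user value is passed as its own list element, a name containing spaces stays a single argument and needs no manual quoting in the component.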
We can execute our component with torchx run. The local_cwd scheduler executes the component relative to the current directory.
[3]:
%%sh
torchx run --scheduler local_cwd my_component.py:greet --user "your name"
torchx 2024-07-17 02:04:04 INFO Tracker configurations: {}
torchx 2024-07-17 02:04:04 INFO Log directory not set in scheduler cfg. Creating a temporary log dir that will be deleted on exit. To preserve log directory set the `log_dir` cfg option
torchx 2024-07-17 02:04:04 INFO Log directory is: /tmp/torchx_b44hv08a
torchx 2024-07-17 02:04:04 INFO Waiting for the app to finish...
greeter/0 Hello, your name!
torchx 2024-07-17 02:04:05 INFO Job finished: SUCCEEDED
local_cwd://torchx/hello_world-l72k6xzs9nl7qc
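Running relative to the current directory matters here because the component's -m my_app argument only resolves when my_app.py sits in the working directory. The following is a rough standard-library approximation of that behavior, not TorchX's actual implementation: it writes a condensed copy of the app into a temporary directory and launches it with that directory as the working directory. The names APP_SOURCE and run_in_cwd are hypothetical and exist only for this sketch.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Condensed copy of my_app.py, used only for this illustration.
APP_SOURCE = '''
import argparse

parser = argparse.ArgumentParser(description="Hello world app")
parser.add_argument("--user", type=str, required=True)
args = parser.parse_args()
print(f"Hello, {args.user}!")
'''


def run_in_cwd(workdir: str, user: str) -> str:
    """Run `python -m my_app --user <user>` with workdir as the working directory."""
    result = subprocess.run(
        [sys.executable, "-m", "my_app", "--user", user],
        cwd=workdir,          # the module is looked up relative to this directory
        capture_output=True,
        text=True,
        check=True,           # raise if the app exits with a non-zero status
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "my_app.py").write_text(APP_SOURCE)
    output = run_in_cwd(workdir, "your name")
    print(output)
```

Launching the same command from a directory that does not contain my_app.py would fail with a module-not-found error, which is the main thing to check if a local_cwd run cannot locate the app.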
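To run in other environments, we can build a Docker container. The component can then run in Docker-enabled environments such as Kubernetes, or through the local Docker scheduler.
Note
This requires Docker to be installed and will not work in environments such as Google Colab. If you have not installed it yet, follow the installation instructions at: https://docs.docker.net.cn/get-docker/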
[4]:
%%writefile Dockerfile.custom
FROM ghcr.io/pytorch/torchx:0.1.0rc1
ADD my_app.py .
Overwriting Dockerfile.custom
Once the Dockerfile is created, we can build the Docker image.
[5]:
%%sh
docker build -t my_app:latest -f Dockerfile.custom .
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.custom
#1 transferring dockerfile: 158B done
#1 DONE 0.0s
#2 [internal] load metadata for ghcr.io/pytorch/torchx:0.1.0rc1
#2 DONE 0.4s
#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s
#4 [1/2] FROM ghcr.io/pytorch/torchx:0.1.0rc1@sha256:a738949601d82e7f100fa1efeb8dde0c35ce44c66726cf38596f96d78dcd7ad3
#4 DONE 0.0s
#5 [internal] load build context
#5 transferring context: 484B done
#5 DONE 0.0s
#6 [2/2] ADD my_app.py .
#6 CACHED
#7 exporting to image
#7 exporting layers done
#7 writing image sha256:593705c4d39b8ee102d0f6f22e670c233a1d92b4ad03a000c6ce1e410d793c16 done
#7 naming to docker.io/library/my_app:latest done
#7 DONE 0.0s
We can then launch it on the local Docker scheduler.
[6]:
%%sh
torchx run --scheduler local_docker my_component.py:greet --image "my_app:latest" --user "your name"
torchx 2024-07-17 02:04:06 INFO Tracker configurations: {}
torchx 2024-07-17 02:04:06 INFO Checking for changes in workspace `file:///home/ec2-user/torchx/docs/source`...
torchx 2024-07-17 02:04:06 INFO To disable workspaces pass: --workspace="" from CLI or workspace=None programmatically.
torchx 2024-07-17 02:04:06 INFO Workspace `file:///home/ec2-user/torchx/docs/source` resolved to filesystem path `/home/ec2-user/torchx/docs/source`
torchx 2024-07-17 02:04:07 WARNING failed to pull image my_app:latest, falling back to local: 404 Client Error for http+docker://127.0.0.1/v1.44/images/create?tag=latest&fromImage=my_app: Not Found ("pull access denied for my_app, repository does not exist or may require 'docker login': denied: requested access to the resource is denied")
torchx 2024-07-17 02:04:07 INFO Building workspace docker image (this may take a while)...
torchx 2024-07-17 02:04:07 INFO Step 1/4 : ARG IMAGE
torchx 2024-07-17 02:04:07 INFO Step 2/4 : FROM $IMAGE
torchx 2024-07-17 02:04:07 INFO ---> 593705c4d39b
torchx 2024-07-17 02:04:07 INFO Step 3/4 : COPY . .
torchx 2024-07-17 02:04:07 INFO ---> 0f94922fb1c1
torchx 2024-07-17 02:04:07 INFO Step 4/4 : LABEL torchx.pytorch.org/version=0.7.0
torchx 2024-07-17 02:04:07 INFO ---> Running in c97dda5daba1
torchx 2024-07-17 02:04:07 INFO ---> Removed intermediate container c97dda5daba1
torchx 2024-07-17 02:04:07 INFO ---> 2b9fac3b2ebb
torchx 2024-07-17 02:04:07 INFO [Warning] One or more build-args [WORKSPACE] were not consumed
torchx 2024-07-17 02:04:07 INFO Successfully built 2b9fac3b2ebb
torchx 2024-07-17 02:04:07 INFO Built new image `sha256:2b9fac3b2ebbe5ad86995a981d772a7481555fa7e497aab8207c1aee1cb48a50` based on original image `my_app:latest` and changes in workspace `file:///home/ec2-user/torchx/docs/source` for role[0]=greeter.
torchx 2024-07-17 02:04:08 INFO Waiting for the app to finish...
greeter/0 Hello, your name!
torchx 2024-07-17 02:04:09 INFO Job finished: SUCCEEDED
local_docker://torchx/hello_world-nbt0qx5vk3cvzc
If you have a Kubernetes cluster, you can use the Kubernetes scheduler to launch the component on the cluster instead of locally.
$ docker push my_app:latest
$ torchx run --scheduler kubernetes my_component.py:greet --image "my_app:latest" --user "your name"
Builtin Components
TorchX also provides a number of builtin components with premade images. You can list them with:
[7]:
%%sh
torchx builtins
Found 11 builtin components:
1. dist.ddp
2. dist.spmd
3. metrics.tensorboard
4. serve.torchserve
5. utils.binary
6. utils.booth
7. utils.copy
8. utils.echo
9. utils.python
10. utils.sh
11. utils.touch
You can use these like any other component: from the CLI, from a pipeline, or programmatically.
[8]:
%%sh
torchx run utils.echo --msg "Hello :)"
torchx 2024-07-17 02:04:11 INFO Tracker configurations: {}
torchx 2024-07-17 02:04:11 INFO Checking for changes in workspace `file:///home/ec2-user/torchx/docs/source`...
torchx 2024-07-17 02:04:11 INFO To disable workspaces pass: --workspace="" from CLI or workspace=None programmatically.
torchx 2024-07-17 02:04:11 INFO Workspace `file:///home/ec2-user/torchx/docs/source` resolved to filesystem path `/home/ec2-user/torchx/docs/source`
torchx 2024-07-17 02:04:12 INFO Building workspace docker image (this may take a while)...
torchx 2024-07-17 02:04:12 INFO Step 1/4 : ARG IMAGE
torchx 2024-07-17 02:04:12 INFO Step 2/4 : FROM $IMAGE
torchx 2024-07-17 02:04:12 INFO ---> 2fd60971a176
torchx 2024-07-17 02:04:12 INFO Step 3/4 : COPY . .
torchx 2024-07-17 02:04:12 INFO ---> 89d40e4a8fb5
torchx 2024-07-17 02:04:12 INFO Step 4/4 : LABEL torchx.pytorch.org/version=0.7.0
torchx 2024-07-17 02:04:12 INFO ---> Running in e077d6aa7bf2
torchx 2024-07-17 02:04:12 INFO ---> Removed intermediate container e077d6aa7bf2
torchx 2024-07-17 02:04:12 INFO ---> 89cf730f8a5a
torchx 2024-07-17 02:04:12 INFO [Warning] One or more build-args [WORKSPACE] were not consumed
torchx 2024-07-17 02:04:12 INFO Successfully built 89cf730f8a5a
torchx 2024-07-17 02:04:12 INFO Built new image `sha256:89cf730f8a5a91a01dc463800aa151aa87dd965a15b0e2f331e5ccc1cd4fe0b4` based on original image `ghcr.io/pytorch/torchx:0.7.0` and changes in workspace `file:///home/ec2-user/torchx/docs/source` for role[0]=echo.
torchx 2024-07-17 02:04:13 INFO Waiting for the app to finish...
torchx 2024-07-17 02:04:13 INFO Job finished: SUCCEEDED
echo/0 Hello :)
local_docker://torchx/echo-jdh7s6zhqzvczc