
⚠️ Notice: Limited Maintenance

This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.

Multi-Image Generation Streamlit App: Chaining Llama & Stable Diffusion using TorchServe, torch.compile & OpenVINO

This Multi-Image Generation Streamlit app is designed to generate multiple images from a single text prompt. Instead of using Stable Diffusion directly, the app chains Llama and Stable Diffusion to enhance the image generation process. Here is how it works:

Multi-Image Generation App Workflow
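
Conceptually, the chain takes one user prompt, asks Llama to expand it into several richer prompt variants, and then renders one image per variant with Stable Diffusion. Below is a minimal Python sketch of that flow, for illustration only: the actual app serves each stage as a TorchServe model accelerated with torch.compile/OpenVINO, and the instruction string here is a hypothetical example.

import torch
from transformers import pipeline
from diffusers import DiffusionPipeline

# Stage 1: the LLM expands one user prompt into several variants.
llm = pipeline("text-generation", model="meta-llama/Llama-3.2-3B-Instruct",
               torch_dtype=torch.bfloat16)
user_prompt = "a lighthouse at dawn"
instruction = ("Rewrite the following image prompt in 3 different, more detailed ways, "
               "one per line: " + user_prompt)
generated = llm(instruction, max_new_tokens=128, return_full_text=False)[0]["generated_text"]
prompts = [line.strip() for line in generated.splitlines() if line.strip()][:3]

# Stage 2: Stable Diffusion renders one image per generated prompt.
sd = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
images = [sd(prompt=p).images[0] for p in prompts]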

Quick Start Guide

Prerequisites:

  • Docker installed on your system

  • Hugging Face Token: Create a Hugging Face account and obtain a token with access to the meta-llama/Llama-3.2-3B-Instruct model; a quick way to verify access is sketched below.
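
A quick, optional way to confirm your token can access the gated Llama model before building (a minimal sketch using huggingface_hub; this check is not part of the example itself):

import os
from huggingface_hub import HfApi

# Raises an error (e.g. GatedRepoError) if the token cannot access the gated repo.
api = HfApi(token=os.environ["HUGGINGFACE_TOKEN"])
print(api.model_info("meta-llama/Llama-3.2-3B-Instruct").id)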

To launch the Multi-Image Generation App, follow these steps:

# 1: Set HF Token as Env variable
export HUGGINGFACE_TOKEN=<HUGGINGFACE_TOKEN>

# 2: Build Docker image for this Multi-Image Generation App
git clone https://github.com/pytorch/serve.git
cd serve
./examples/usecases/llm_diffusion_serving_app/docker/build_image.sh

# 3: Launch the streamlit app for server & client
# After the Docker build is successful, you will see a "docker run" command printed to the console.
# Run that "docker run" command to launch the Streamlit app for both the server and client.

Sample Docker build output:

ubuntu@ip-10-0-0-137:~/serve$ ./examples/usecases/llm_diffusion_serving_app/docker/build_image.sh
EXAMPLE_DIR: .//examples/usecases/llm_diffusion_serving_app/docker
ROOT_DIR: /home/ubuntu/serve
DOCKER_BUILDKIT=1 docker buildx build --platform=linux/amd64 --file .//examples/usecases/llm_diffusion_serving_app/docker/Dockerfile --build-arg BASE_IMAGE="pytorch/torchserve:latest-cpu" --build-arg EXAMPLE_DIR=".//examples/usecases/llm_diffusion_serving_app/docker" --build-arg HUGGINGFACE_TOKEN=hf_<token> --build-arg HTTP_PROXY= --build-arg HTTPS_PROXY= --build-arg NO_PROXY= -t "pytorch/torchserve:llm_diffusion_serving_app" .
[+] Building 1.4s (18/18) FINISHED                                                                                                                                                               docker:default
 => [internal] load .dockerignore                                                                                                                                                                          0.0s
 .
 .
 .
 => => naming to docker.io/pytorch/torchserve:llm_diffusion_serving_app                                                                                                                                    0.0s

Docker Build Successful !

............................ Next Steps ............................
--------------------------------------------------------------------
[Optional] Run the following command to benchmark Stable Diffusion:
--------------------------------------------------------------------

docker run --rm --platform linux/amd64 \
        --name llm_sd_app_bench \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        --entrypoint python \
        pytorch/torchserve:llm_diffusion_serving_app \
        /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -ni 3

-------------------------------------------------------------------
Run the following command to start the Multi-Image generation App:
-------------------------------------------------------------------

docker run --rm -it --platform linux/amd64 \
        --name llm_sd_app \
        -p 127.0.0.1:8080:8080 \
        -p 127.0.0.1:8081:8081 \
        -p 127.0.0.1:8082:8082 \
        -p 127.0.0.1:8084:8084 \
        -p 127.0.0.1:8085:8085 \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        -e MODEL_NAME_LLM=meta-llama/Llama-3.2-3B-Instruct \
        -e MODEL_NAME_SD=stabilityai/stable-diffusion-xl-base-1.0 \
        pytorch/torchserve:llm_diffusion_serving_app

Note: You can replace the model identifiers (MODEL_NAME_LLM, MODEL_NAME_SD) as needed.

What to Expect

After a successful build, and after launching the Docker container with the displayed docker run .. command, you can access two separate Streamlit apps:

  1. TorchServe Server App (running at http://localhost:8084) to start/stop TorchServe, load/register models, and scale workers up/down; a sketch of querying TorchServe's APIs directly follows this list.

  2. Client App (running at http://localhost:8085) where you can enter the prompt for image generation.
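
The container also publishes TorchServe's standard REST ports (8080 inference, 8081 management, 8082 metrics). Once the server app has started TorchServe and registered the models, you can query these APIs directly. A minimal sketch in Python (the endpoints are TorchServe's documented defaults; requests is an assumed dependency on your host):

import requests

# TorchServe's standard REST APIs, exposed on the ports published by docker run.
print(requests.get("http://localhost:8080/ping").json())    # inference API health check
print(requests.get("http://localhost:8081/models").json())  # management API: list registered models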

Note: You can also run a quick benchmark comparing Stable Diffusion performance in eager mode, with torch.compile using the inductor backend, and with the openvino backend. Review the docker run .. command for benchmarking, displayed after a successful build.
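
For reference, the three run modes correspond roughly to the following torch.compile calls (a minimal sketch, assuming compilation is applied to the pipeline's UNet as is common for diffusers pipelines; sd-benchmark.py's exact wiring may differ, and only one backend would be used per run):

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")

# eager: baseline, no torch.compile.

# tc_inductor: torch.compile with PyTorch's default inductor backend.
pipe.unet = torch.compile(pipe.unet, backend="inductor")

# tc_openvino: torch.compile with the OpenVINO backend (pip install openvino).
import openvino.torch  # noqa: F401 -- importing registers the "openvino" backend
pipe.unet = torch.compile(pipe.unet, backend="openvino")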

Sample output of launching the app:

ubuntu@ip-10-0-0-137:~/serve$ docker run --rm -it --platform linux/amd64 \
        --name llm_sd_app \
        -p 127.0.0.1:8080:8080 \
        -p 127.0.0.1:8081:8081 \
        -p 127.0.0.1:8082:8082 \
        -p 127.0.0.1:8084:8084 \
        -p 127.0.0.1:8085:8085 \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        -e MODEL_NAME_LLM=meta-llama/Llama-3.2-3B-Instruct \
        -e MODEL_NAME_SD=stabilityai/stable-diffusion-xl-base-1.0 \
        pytorch/torchserve:llm_diffusion_serving_app

Preparing meta-llama/Llama-3.2-1B-Instruct
/home/model-server/llm_diffusion_serving_app/llm /home/model-server/llm_diffusion_serving_app
Model meta-llama---Llama-3.2-1B-Instruct already downloaded.
Model archive for meta-llama---Llama-3.2-1B-Instruct exists.
/home/model-server/llm_diffusion_serving_app

Preparing stabilityai/stable-diffusion-xl-base-1.0
/home/model-server/llm_diffusion_serving_app/sd /home/model-server/llm_diffusion_serving_app
Model stabilityai/stable-diffusion-xl-base-1.0 already downloaded
Model archive for stabilityai---stable-diffusion-xl-base-1.0 exists.
/home/model-server/llm_diffusion_serving_app

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8085
  Network URL: http://123.11.0.2:8085
  External URL: http://123.123.12.34:8085


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8084
  Network URL: http://123.11.0.2:8084
  External URL: http://123.123.12.34:8084

Sample benchmark output for Stable Diffusion:

To benchmark Stable Diffusion, use sd-benchmark.py. See the sample console output below.

ubuntu@ip-10-0-0-137:~/serve$ docker run --rm --platform linux/amd64 \
        --name llm_sd_app_bench \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        --entrypoint python \
        pytorch/torchserve:llm_diffusion_serving_app \
        /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -ni 3
.
.
.

Hardware Info:
--------------------------------------------------------------------------------
cpu_model: Intel(R) Xeon(R) Platinum 8488C
cpu_count: 64
threads_per_core: 2
cores_per_socket: 32
socket_count: 1
total_memory: 247.71 GB

Software Versions:
--------------------------------------------------------------------------------
Python: 3.9.20
TorchServe: 0.12.0
OpenVINO: 2024.5.0
PyTorch: 2.5.1+cpu
Transformers: 4.46.3
Diffusers: 0.31.0

Benchmark Summary:
--------------------------------------------------------------------------------
+-------------+----------------+---------------------------+
| Run Mode    | Warm-up Time   | Average Time for 3 iter   |
+=============+================+===========================+
| eager       | 11.25 seconds  | 10.13 +/- 0.02 seconds    |
+-------------+----------------+---------------------------+
| tc_inductor | 85.40 seconds  | 8.85 +/- 0.03 seconds     |
+-------------+----------------+---------------------------+
| tc_openvino | 52.57 seconds  | 2.58 +/- 0.04 seconds     |
+-------------+----------------+---------------------------+

Results saved in directory: /home/model-server/model-store/benchmark_results_20241123_071103
Files in the /home/model-server/model-store/benchmark_results_20241123_071103 directory:
benchmark_results.json
image-eager-final.png
image-tc_inductor-final.png
image-tc_openvino-final.png

Results saved at /home/model-server/model-store/ which is a Docker container mount, corresponds to 'serve/model-store-local/' on the host machine.
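
On the host, the results are available under serve/model-store-local/. benchmark_results.json is plain JSON and can be inspected directly; a minimal sketch (its schema is not documented here, so just dump it):

import json

# Host-side path corresponding to the container's /home/model-server/model-store/.
path = "serve/model-store-local/benchmark_results_20241123_071103/benchmark_results.json"
with open(path) as f:
    results = json.load(f)
print(json.dumps(results, indent=2))  # inspect the dump; schema not documented here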

Sample benchmark output for Stable Diffusion with profiling:

To benchmark Stable Diffusion with profiling, use --run_profiling or -rp. See the sample console output below. Sample profiling benchmark output files are available in assets/benchmark_results_20241123_044407/.

ubuntu@ip-10-0-0-137:~/serve$ docker run --rm --platform linux/amd64 \
        --name llm_sd_app_bench \
        -v /home/ubuntu/serve/model-store-local:/home/model-server/model-store \
        --entrypoint python \
        pytorch/torchserve:llm_diffusion_serving_app \
        /home/model-server/llm_diffusion_serving_app/sd-benchmark.py -rp
.
.
.
Hardware Info:
--------------------------------------------------------------------------------
cpu_model: Intel(R) Xeon(R) Platinum 8488C
cpu_count: 64
threads_per_core: 2
cores_per_socket: 32
socket_count: 1
total_memory: 247.71 GB

Software Versions:
--------------------------------------------------------------------------------
Python: 3.9.20
TorchServe: 0.12.0
OpenVINO: 2024.5.0
PyTorch: 2.5.1+cpu
Transformers: 4.46.3
Diffusers: 0.31.0

Benchmark Summary:
--------------------------------------------------------------------------------
+-------------+----------------+---------------------------+
| Run Mode    | Warm-up Time   | Average Time for 1 iter   |
+=============+================+===========================+
| eager       | 9.33 seconds   | 8.57 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+
| tc_inductor | 81.11 seconds  | 7.20 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+
| tc_openvino | 50.76 seconds  | 1.72 +/- 0.00 seconds     |
+-------------+----------------+---------------------------+

Results saved in directory: /home/model-server/model-store/benchmark_results_20241123_071629
Files in the /home/model-server/model-store/benchmark_results_20241123_071629 directory:
benchmark_results.json
image-eager-final.png
image-tc_inductor-final.png
image-tc_openvino-final.png
profile-eager.txt
profile-tc_inductor.txt
profile-tc_openvino.txt

num_iter is set to 1 as run_profiling flag is enabled !

Results saved at /home/model-server/model-store/ which is a Docker container mount, corresponds to 'serve/model-store-local/' on the host machine.
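
The profile-*.txt files are per-mode profiler summaries. A minimal sketch of how such a text summary can be produced with torch.profiler (an assumption about the approach; sd-benchmark.py's actual implementation may differ, and profile_to_text is a hypothetical helper):

import torch
from torch.profiler import ProfilerActivity, profile

def profile_to_text(fn, out_path):
    # Run fn once under the profiler and write a key_averages table, e.g. profile-eager.txt.
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        fn()
    with open(out_path, "w") as f:
        f.write(prof.key_averages().table(sort_by="cpu_time_total", row_limit=25))

# Usage (hypothetical): profile_to_text(lambda: pipe(prompt), "profile-eager.txt")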

Multi-Image Generation App UI

App Workflow

Multi-Image Generation App Workflow Gif

App Screenshots

Server App Screenshot 1 Server App Screenshot 2 Server App Screenshot 3
Client App Screenshot 1 Client App Screenshot 2 Client App Screenshot 3
