运行 RL 所需的硬件资源
===============================

最后更新: 2025年6月25日。

相比常规训练，RL 需要更多资源，因此在训练前确定能成功运行所需的资源量是一项相对困难的任务。为了给更多人在处理不同模型和任务时选择资源提供参考点，本节主要介绍我们通过实验进行的软硬件环境需求。

然而，由于人力和设备资源的限制，我们也期望开源社区能做出更多贡献。在提交 PR 时，需要提供一个可添加到 `examples/tuning` 脚本的脚本。

我们需要两种类型的脚本：一种是可以使用**最小资源 (min)** 运行的配置，另一种是可以使用**推荐资源 (recommended)** 运行的配置。前者可以理解为应用了所有内存优化技术（例如 offload，gradient checkpointing）后可以运行的脚本。后者可以理解为在尽可能避免额外时间开销的操作（目标是最佳吞吐量）下可以运行的脚本。

在定义脚本名称时，请遵循此格式： ``[model]_[task]_[gpunums]_[device]_[train]_[infer].sh``。这将有效地提高脚本的可识别度。您可以将脚本放置在 ``examples/tuning/`` 目录下。

如果您恰好有已测试过的配置，欢迎您提交 PR 并附上来自 Wandb 或其他可验证证据的截图。

----------------------------------------

0.5B
~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    
    * - Tag
      - Model
      - Task
      - Resource
      - MaxBatch
      - Train
      - Infer
      - Link
      - Contributor
    * - MIN
      - Qwen2.5-0.5B
      - GRPO-LoRA
      - 1*H100
      - 116
      - fsdp
      - vllm0.8.3
      - `qwen2-0.5b_grpo-lora_1_h100_fsdp_vllm.sh <https://github.com/volcengine/verl/blob/main/examples/tuning/0.5b/qwen2-0.5b_grpo-lora_1_h100_fsdp_vllm.sh>`_
      - `SimonHuang <thelongestusernameofall@gmail.com>`_

1.5B
~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    
    * - Tag
      - Model
      - Task
      - Resource
      - MaxBatch
      - Train
      - Infer
      - Link
      - Contributor
    * - MIN
      - Qwen2.5-1.5B
      - GRPO-LoRA
      - 1*H100
      - 128
      - fsdp
      - vllm0.8.3
      - `qwen2-1.5b_grpo-lora_1_h100_fsdp_vllm.sh <https://github.com/volcengine/verl/blob/main/examples/tuning/1.5b/qwen2-1.5b_grpo-lora_1_h100_fsdp_vllm.sh>`_
      - `SimonHuang <thelongestusernameofall@gmail.com>`_

3B
~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    
    * - Tag
      - Model
      - Task
      - Resource
      - MaxBatch
      - Train
      - Infer
      - Link
      - Contributor
    * - MIN
      - Qwen2.5-3B
      - GRPO-LoRA
      - 1*H100
      - 62
      - fsdp
      - vllm0.8.3
      - `qwen2-3b_grpo-lora_1_h100_fsdp_vllm.sh <https://github.com/volcengine/verl/blob/main/examples/tuning/3b/qwen2-3b_grpo-lora_1_h100_fsdp_vllm.sh>`_
      - `SimonHuang <thelongestusernameofall@gmail.com>`_

7B
~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    
    * - Tag
      - Model
      - Task
      - Resource
      - MaxBatch
      - Train
      - Infer
      - Link
      - Contributor
    * - MIN
      - Qwen2-7B
      - GRPO
      - 2*H800
      - \
      - fsdp
      - vllm0.8.2
      - `qwen2-7b_grpo_2_h800_fsdp_vllm <https://github.com/volcengine/verl/blob/main/examples/tuning/7b/qwen2-7b_grpo_2_h800_fsdp_vllm.sh>`_
      - `Xiangyongan <xiangyongan@bytedance.com>`_
    * - MIN
      - Qwen2.5-7B
      - GRPO-LoRA
      - 1*H100
      - 16
      - fsdp
      - vllm0.8.3
      - `qwen2-7b_grpo-lora_1_h100_fsdp_vllm.sh <https://github.com/volcengine/verl/blob/main/examples/tuning/7b/qwen2-7b_grpo-lora_1_h100_fsdp_vllm.sh>`_
      - `SimonHuang <thelongestusernameofall@gmail.com>`_

14B
~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    
    * - Tag
      - Model
      - Task
      - Resource
      - MaxBatch
      - Train
      - Infer
      - Link
      - Contributor
    * - MIN
      - Qwen2-14B
      - GRPO
      - 4*H800
      - \
      - fsdp
      - vllm0.8.2
      - `qwen2-14b_grpo_4_h800_fsdp_vllm <https://github.com/volcengine/verl/blob/main/examples/tuning/14b/qwen2-14b_grpo_4_h800_fsdp_vllm.sh>`_
      - `Xiangyongan <xiangyongan@bytedance.com>`_
    * - MIN
      - Qwen2.5-14B
      - GRPO-LoRA
      - 2*H100
      - 116
      - fsdp
      - vllm0.8.3
      - `qwen2-14b_grpo-lora_2_h100_fsdp_vllm.sh <https://github.com/volcengine/verl/blob/main/examples/tuning/14b/qwen2-14b_grpo-lora_2_h100_fsdp_vllm.sh>`_
      - `SimonHuang <thelongestusernameofall@gmail.com>`_

32B
~~~

.. list-table::
    :widths: auto
    :header-rows: 1
    
    * - Tag
      - Model
      - Task
      - Resource
      - MaxBatch
      - Train
      - Infer
      - Link
      - Contributor
    * - MIN
      - Qwen2-32B
      - GRPO
      - 8*H20
      - \
      - megatron
      - vllm0.8.2
      - `qwen2-32b_grpo_8_h20_megatron_vllm <https://github.com/volcengine/verl/tree/main/examples/tuning/32b/qwen2_32B_grpo_8_h20_megatron_vllm.sh>`_
      - `Xiangyongan <xiangyongan@bytedance.com>`_
    * - MIN
      - Qwen2.5-32B
      - GRPO-LoRA
      - 4*H100
      - 180
      - fsdp
      - vllm0.8.3
      - `qwen2-32b_grpo-lora_4_h100_fsdp_vllm.sh <https://github.com/volcengine/verl/blob/main/examples/tuning/32b/qwen2-32b_grpo-lora_4_h100_fsdp_vllm.sh>`_
      - `SimonHuang <thelongestusernameofall@gmail.com>`_

70B
~~~

.. list-table::
    :widths: auto
    :header-rows: 1

    * - Tag
      - Model
      - Task
      - Resource
      - MaxBatch
      - Train
      - Infer
      - Link
      - Contributor
    * - MIN
      - Qwen2-70B
      - GRPO
      - 32*H20
      - \
      - fsdp
      - vllm0.8.2
      - `qwen2-70b_grpo_32_h20_fsdp_vllm <https://github.com/volcengine/verl/blob/main/examples/tuning/70b/qwen2-70b_grpo_32_h20_fsdp_vllm.sh>`_
      - `Xiangyongan <xiangyongan@bytedance.com>`_
    * - MIN
      - Qwen2-70B
      - GRPO
      - 32*H800
      - \
      - fsdp
      - vllm0.8.3
      - `qwen2-70b_grpo_32_h800_fsdp_vllm <https://github.com/volcengine/verl/blob/main/examples/tuning/70b/qwen2-70b_grpo_32_h800_fsdp_vllm.sh>`_
      - `Xiangyongan <xiangyongan@bytedance.com>`_
    * - MIN
      - Qwen2.5-72B
      - GRPO-LoRA
      - 8*H100
      - 176
      - fsdp
      - vllm0.8.3
      - `qwen2-72b_grpo-lora_8_h100_fsdp_vllm.sh <https://github.com/volcengine/verl/blob/main/examples/tuning/70b/qwen2-72b_grpo-lora_8_h100_fsdp_vllm.sh>`_
      - `SimonHuang <thelongestusernameofall@gmail.com>`_

405B
~~~~

.. table::
   :widths: auto

   ====== ====== ====== ======== ======== ====== ====== ======
   tag    model  task   resource MaxBatch train  infer  link
   ====== ====== ====== ======== ======== ====== ====== ======
   \      \      \        \        \      \      \
   ====== ====== ====== ======== ======== ====== ====== ======

671B
~~~~

.. table::
   :widths: auto

   ====== ====== ====== ======== ======== ====== ====== ======
   tag    model  task   resource MaxBatch train  infer  link
   ====== ====== ====== ======== ======== ====== ====== ======
   \      \      \        \        \      \      \
   ====== ====== ====== ======== ======== ====== ====== ======