Welcome to verl's documentation!
================================================

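verl is a flexible, efficient, and production-ready RL training framework designed for post-training of large language models (LLMs). It is an open-source implementation of the `HybridFlow <https://arxiv.org/pdf/2409.19256>`_ paper.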

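verl is flexible and easy to use with:

- **Easy extension of diverse RL algorithms**: The hybrid programming model combines the strengths of the single-controller and multi-controller paradigms, enabling flexible representation and efficient execution of complex post-training dataflows. Users can build RL dataflows in a few lines of code.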

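- **Seamless integration of existing LLM infra with modular APIs**: Decoupling computation and data dependencies enables seamless integration with existing LLM frameworks such as PyTorch FSDP, Megatron-LM, vLLM, and SGLang. Users can also easily extend verl to other LLM training and inference frameworks.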

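- **Flexible device mapping and parallelism**: Supports placing models onto different sets of GPUs for efficient resource utilization and scalability across cluster sizes.

- Ready integration with popular HuggingFace models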


verl is fast with:

- **State-of-the-art throughput**: By seamlessly integrating existing SOTA LLM training and inference frameworks, verl achieves high generation and training throughput.

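- **Efficient actor model resharding with 3D-HybridEngine**: Eliminates memory redundancy and significantly reduces communication overhead during transitions between the training and generation phases.

--------------------------------------------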

.. _Contents:

.. toctree::
   :maxdepth: 2
   :caption: Quickstart

   start/install
   start/quickstart
   start/multinode
   start/ray_debug_tutorial
   start/more_resources
   start/agentic_rl

.. toctree::
   :maxdepth: 2
   :caption: Programming Guide

   hybrid_flow
   single_controller

.. toctree::
   :maxdepth: 1
   :caption: Data Preparation

   preparation/prepare_data
   preparation/reward_function

.. toctree::
   :maxdepth: 2
   :caption: Configurations

   examples/config

.. toctree::
   :maxdepth: 1
   :caption: PPO Examples

   examples/ppo_code_architecture
   examples/gsm8k_example
   examples/multi_modal_example
   examples/skypilot_examples

.. toctree::
   :maxdepth: 1
   :caption: Algorithms

   algo/ppo.md
   algo/grpo.md
   algo/collabllm.md
   algo/dapo.md
   algo/spin.md
   algo/sppo.md
   algo/entropy.md
   algo/opo.md
   algo/baseline.md
   algo/gpg.md

.. toctree::
   :maxdepth: 1
   :caption: PPO Trainer and Workers

   workers/ray_trainer
   workers/fsdp_workers
   workers/megatron_workers
   workers/sglang_worker
   workers/model_engine

.. toctree::
   :maxdepth: 1
   :caption: Performance Tuning Guide

   perf/dpsk.md
   perf/perf_tuning
   README_vllm0.8.md
   perf/device_tuning
   perf/verl_profiler_system.md
   perf/nsight_profiling.md

.. toctree::
   :maxdepth: 1
   :caption: Adding New Models

   advance/fsdp_extension
   advance/megatron_extension

.. toctree::
   :maxdepth: 1
   :caption: Advanced Features

   advance/checkpoint
   advance/rope
   advance/ppo_lora.rst
   sglang_multiturn/multiturn.rst
   sglang_multiturn/interaction_system.rst
   advance/placement
   advance/dpo_extension
   examples/sandbox_fusion_example
   advance/rollout_trace.rst
   advance/rollout_skip.rst
   advance/rollout_is.md
   advance/one_step_off
   advance/agent_loop
   advance/reward_loop
   advance/fully_async
   data/transfer_queue.md

.. toctree::
   :maxdepth: 1
   :caption: Hardware Support

   amd_tutorial/amd_build_dockerfile_page.rst
   amd_tutorial/amd_vllm_page.rst
   ascend_tutorial/ascend_quick_start.rst
   ascend_tutorial/ascend_profiling_zh.rst
   ascend_tutorial/ascend_profiling_en.rst
   ascend_tutorial/ascend_sglang_quick_start.rst

.. toctree::
   :maxdepth: 1
   :caption: API References

   api/data
   api/single_controller.rst
   api/trainer.rst
   api/utils.rst


.. toctree::
   :maxdepth: 2
   :caption: FAQ

   faq/faq

.. toctree::
   :maxdepth: 1
   :caption: Development Notes

   sglang_multiturn/sandbox_fusion.rst

Contribution
-------------

verl is free software; you can redistribute it and/or modify it under the terms of the Apache License 2.0. We welcome contributions.
Join us on `GitHub <https://github.com/volcengine/verl>`_, `Slack <https://join.slack.com/t/verlgroup/shared_invite/zt-2w5p9o4c3-yy0x2Q56s_VlGLsJ93A6vA>`_, and `WeChat <https://raw.githubusercontent.com/eric-haibin-lin/verl-community/refs/heads/main/WeChat.JPG>`_ for discussions.

Contributions from the community are welcome! Please check out our `project roadmap <https://github.com/volcengine/verl/issues/710>`_ and `good first issues <https://github.com/volcengine/verl/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22>`_ to see where you can contribute.

Code Linting and Formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We use pre-commit to help improve code quality. To initialize pre-commit, run:

.. code-block:: bash

   pip install pre-commit
   pre-commit install

To resolve CI errors locally, you can also run pre-commit manually:

.. code-block:: bash

   pre-commit run

Adding CI Tests
^^^^^^^^^^^^^^^^^^^^^^^^

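If possible, please add CI tests for your new feature:

1. Find the most relevant workflow yml file, which usually corresponds to a ``hydra`` default config (e.g. ``ppo_trainer``, ``ppo_megatron_trainer``, ``sft_trainer``).
2. Add the related path patterns to the ``paths`` section if they are not already included.
3. Minimize the workload of the test scripts (see the existing scripts for examples).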

We are hiring! Send us an `email <mailto:haibin.lin@bytedance.com>`_ if you are interested in internship or full-time opportunities in MLSys, LLM reasoning, or multimodal alignment.