AdaptiveKLController
- class torchrl.data.AdaptiveKLController(*, init_kl_coef: float, target: float, horizon: int, model: Optional[Module] = None)
Adaptive KL controller, as described by Ziegler et al. in "Fine-Tuning Language Models from Human Preferences".
- Keyword Arguments:
Reference: Section 2.2 of https://arxiv.org/pdf/1909.08593.pdf#page=2 Source: https://github.com/openai/lm-human-preferences/blob/master/lm_human_preferences/train_policy.py
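
The update rule from the referenced paper and source can be sketched in plain Python. This is a minimal illustration and not TorchRL's implementation: the function name `adaptive_kl_update` and the `observed_kl` and `n_steps` arguments are hypothetical, and the clipping range of ±0.2 is an assumption taken from the linked lm-human-preferences code. The coefficient grows when the observed KL exceeds `target` and shrinks when it falls below it, with `horizon` controlling how fast it moves.

```python
def adaptive_kl_update(kl_coef, observed_kl, target, horizon, n_steps):
    """Return the next KL coefficient (hypothetical helper, not a TorchRL API).

    kl_coef:     current coefficient (starts at init_kl_coef)
    observed_kl: KL divergence measured on the latest batch
    target:      desired KL value
    horizon:     larger values give slower, smoother updates
    n_steps:     number of samples processed since the last update
    """
    # Proportional error, clipped to +/-0.2 (assumed from the linked source)
    # so that a single noisy batch cannot swing the coefficient too far.
    error = max(-0.2, min(0.2, observed_kl / target - 1.0))
    # Multiplicative update: the step size scales with n_steps / horizon.
    return kl_coef * (1.0 + error * n_steps / horizon)


kl_coef = 0.1
for observed_kl in (12.0, 9.0, 6.0, 3.0):
    kl_coef = adaptive_kl_update(
        kl_coef, observed_kl, target=6.0, horizon=10_000, n_steps=64
    )
    print(f"observed KL {observed_kl:>5.1f} -> kl_coef {kl_coef:.6f}")
```

Because the update is multiplicative, the coefficient stays positive as long as `init_kl_coef` is positive and `n_steps` is small relative to `horizon`.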