make_trainer
- torchrl.trainers.helpers.make_trainer(collector: DataCollectorBase, loss_module: LossModule, recorder: Optional[EnvBase] = None, target_net_updater: Optional[TargetNetUpdater] = None, policy_exploration: Optional[Union[TensorDictModuleWrapper, TensorDictModule]] = None, replay_buffer: Optional[ReplayBuffer] = None, logger: Optional[Logger] = None, cfg: DictConfig = None) → Trainer
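Creates a Trainer instance from its constituent components.
- Parameters:
collector (DataCollectorBase) – The collector used to gather training data.
loss_module (LossModule) – A TorchRL loss module.
recorder (EnvBase, optional) – An environment used for recording (evaluation). If None, the trainer trains the policy without testing it.
target_net_updater (TargetNetUpdater, optional) – The object that updates the target network.
policy_exploration (TensorDictModule or TensorDictModuleWrapper, optional) – The policy used for recording and for exploration updates. It should be kept in sync with the learned policy.
replay_buffer (ReplayBuffer, optional) – The replay buffer used to store the collected data.
logger (Logger, optional) – The logger used for logging.
cfg (DictConfig, optional) – A DictConfig containing the script arguments. If None, default arguments are used.
- Returns:
A trainer built from the input objects. The optimizer is built by this helper function using the provided cfg.
Examples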
>>> import torch
>>> import tempfile
>>> from torchrl.trainers.loggers import TensorboardLogger
>>> from torchrl.trainers import Trainer
>>> from torchrl.trainers.helpers import make_trainer
>>> from torchrl.envs import EnvCreator
>>> from torchrl.collectors.collectors import SyncDataCollector
>>> from torchrl.data import TensorDictReplayBuffer
>>> from torchrl.envs.libs.gym import GymEnv
>>> from torchrl.modules import TensorDictModuleWrapper, SafeModule, ValueOperator, EGreedyWrapper
>>> from torchrl.objectives.common import LossModule
>>> from torchrl.objectives.utils import TargetNetUpdater
>>> from torchrl.objectives import DDPGLoss
>>> env_maker = EnvCreator(lambda: GymEnv("Pendulum-v0"))
>>> env_proof = env_maker()
>>> obs_spec = env_proof.observation_spec
>>> action_spec = env_proof.action_spec
>>> net = torch.nn.Linear(env_proof.observation_spec.shape[-1], action_spec.shape[-1])
>>> net_value = torch.nn.Linear(env_proof.observation_spec.shape[-1], 1)  # for the purpose of testing
>>> policy = SafeModule(action_spec, net, in_keys=["observation"], out_keys=["action"])
>>> value = ValueOperator(net_value, in_keys=["observation"], out_keys=["state_action_value"])
>>> collector = SyncDataCollector(env_maker, policy, total_frames=100)
>>> loss_module = DDPGLoss(policy, value, gamma=0.99)
>>> recorder = env_proof
>>> target_net_updater = None
>>> policy_exploration = EGreedyWrapper(policy)
>>> replay_buffer = TensorDictReplayBuffer()
>>> dir = tempfile.gettempdir()
>>> logger = TensorboardLogger(exp_name=dir)
>>> trainer = make_trainer(collector, loss_module, recorder, target_net_updater, policy_exploration,
...     replay_buffer, logger)
>>> print(trainer)
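
The cfg fallback described above ("If None, default arguments are used") can be illustrated with a small standalone sketch. It uses only the Python standard library and is not TorchRL's implementation: the `SketchConfig` class, its field names and default values, and the `resolve_optimizer_settings` function are all hypothetical names introduced here to show the pattern of a helper that derives optimizer settings from an optional configuration object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the script arguments carried by `cfg`.

    The field names and defaults are assumptions made for this sketch only.
    """

    optimizer: str = "adam"
    lr: float = 3e-4
    weight_decay: float = 0.0


def resolve_optimizer_settings(cfg: Optional[SketchConfig] = None) -> dict:
    """Return optimizer settings, falling back to defaults when cfg is None."""
    if cfg is None:
        # No configuration supplied: use the default arguments.
        cfg = SketchConfig()
    supported = {"adam", "sgd", "adamax"}
    if cfg.optimizer not in supported:
        raise ValueError(f"unsupported optimizer: {cfg.optimizer!r}")
    return {
        "name": cfg.optimizer,
        "lr": cfg.lr,
        "weight_decay": cfg.weight_decay,
    }


# Without a config, the defaults apply.
default_settings = resolve_optimizer_settings()
# With a config, the supplied values override the defaults.
custom_settings = resolve_optimizer_settings(SketchConfig(optimizer="sgd", lr=1e-2))
print(default_settings)
print(custom_settings)
```

Resolving the defaults inside the helper keeps call sites short: a caller passes only the components it has and supplies a configuration object only when the defaults are not suitable.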