Spaces:
Sleeping
Sleeping
@startuml serial_main | |
header Serial Pipeline | |
title Serial Main | |
participant controller | |
participant env_manager | |
participant policy | |
participant learner | |
participant replay_buffer | |
participant collector | |
participant evaluator | |
participant commander | |
autonumber | |
controller -> env_manager: init collector and evaluator env_manager; set seed | |
controller -> policy: init policy | |
controller -> learner: init learner; set learn_mode policy | |
controller -> collector: init collector; set collect_mode policy; set env_manager | |
controller -> evaluator: init evaluator; set eval_mode policy; set env_manager | |
controller -> commander: init commander; set command_mode policy | |
controller -> replay_buffer: init replay_buffer | |
alt random collect before training starts | |
collector -> collector: reset policy to random one; generate random data | |
collector -> replay_buffer: push_data | |
collector -> collector: reset policy back to the original one | |
end | |
learner -> learner: call before_run hook | |
loop | |
commander -> commander: step | |
alt this iteration needs evaluation | |
evaluator -> evaluator: eval_performance | |
alt reach eval stop_value | |
learner -> learner: save checkpoint and exit | |
else episode_return is new highest | |
learner -> learner: save checkpoint | |
end | |
end | |
collector -> collector: generate data (steps or episodes) | |
collector -> replay_buffer: push_data | |
loop learner_train_iteration times | |
replay_buffer -> learner: sample_data | |
learner -> learner: train | |
alt replay replay_buffer use prioritization | |
learner -> replay_buffer: update with priority_info | |
end | |
end | |
alt on_policy training | |
replay_buffer -> replay_buffer: clear | |
end | |
end | |
learner -> learner: call after_run hook | |
controller -> replay_buffer: close replay_buffer | |
controller -> learner: close learner | |
controller -> collector: close collector | |
controller -> evaluator: close evaluator | |
@enduml | |