Latest Version: 7
test running
Latest Version: 4
Improved model structure
Latest Version: 0
add more info for forwarding
Latest Version: 0
The bot is encouraged to reduce its shanten count faster
Latest Version: 0
as it is shown..
Latest Version: 1
Using modified TRPO for training
Latest Version: 1
decoupled TRPO system with Residuals
Latest Version: 3
Using GRPO to improve PPO