Full code reference: https://gitee.com/chencib/ailib/blob/master/rl/ppo_cartpole.py
Execution results:
 
Partial training scores:
(sd) D:\Dev\traditional_nn\feiai\test\rl>python ppo_cartpole_v2_succeed.py
Ep:    0 | Reward:   23.0 | Running:   23.0
Ep:    1 | Reward:   12.0 | Running:   21.9
Ep:    2 | Reward:   31.0 | Running:   22.8
Ep:    3 | Reward:   25.0 | Running:   23.0
Ep:    4 | Reward:    9.0 | Running:   21.6
Ep:    5 | Reward:   20.0 | Running:   21.5
Ep:    6 | Reward:   20.0 | Running:   21.3
Ep:    7 | Reward:   28.0 | Running:   22.0
Ep:    8 | Reward:   32.0 | Running:   23.0
Ep:    9 | Reward:   18.0 | Running:   22.5
...
Ep:  990 | Reward:   15.0 | Running:   19.7
Ep:  991 | Reward:   19.0 | Running:   19.7
Ep:  992 | Reward:   20.0 | Running:   19.7
Ep:  993 | Reward:   24.0 | Running:   20.1
Ep:  994 | Reward:   16.0 | Running:   19.7
Ep:  995 | Reward:   20.0 | Running:   19.7
Ep:  996 | Reward:   19.0 | Running:   19.7
Ep:  997 | Reward:   26.0 | Running:   20.3
Ep:  998 | Reward:   13.0 | Running:   19.6
Ep:  999 | Reward:   11.0 | Running:   18.7
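
The Running column in the log above looks like an exponentially smoothed reward. The sketch below reproduces the first ten Running values from the first ten Reward values, assuming the first episode seeds the average and each later episode is blended in with a decay of 0.9. That decay is inferred from the printed numbers, not taken from the linked script, and the names `running_rewards` and `decay` are hypothetical.

```python
def running_rewards(rewards, decay=0.9):
    """Exponentially smoothed rewards.

    Assumption: the first reward seeds the average, and every later
    reward is mixed in as decay * previous + (1 - decay) * reward.
    """
    running = None
    smoothed = []
    for reward in rewards:
        if running is None:
            running = reward
        else:
            running = decay * running + (1 - decay) * reward
        smoothed.append(running)
    return smoothed


# Rewards of episodes 0-9, copied from the log above.
first_ten = [23.0, 12.0, 31.0, 25.0, 9.0, 20.0, 20.0, 28.0, 32.0, 18.0]

# Print in the same layout as the training log.
for ep, (reward, running) in enumerate(zip(first_ten, running_rewards(first_ten))):
    print(f"Ep: {ep:4d} | Reward: {reward:6.1f} | Running: {running:6.1f}")
```

For example, episode 1 gives 0.9 * 23.0 + 0.1 * 12.0 = 21.9, which matches the second line of the log. A smoothed value like this changes slowly, so it is easier to read a trend from than the raw per-episode reward.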