Documentation
Modifications to the usual RL setup that are tested in different experiments:
First, we test how parameter distributions influence the average performance on CartPole under system uncertainties:
In this case the type of distribution is not the decisive parameter; the spread of the distribution is. A performance improvement through learning over distributions is possible in the first place. Moreover, even when evaluated on just the standard parameters of the system, the behaviour learned over distributions is as good as that learned without distributions, which shows that learning over distributions benefits the RL problem without sacrificing nominal performance.
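The "spread, not type" observation above can be sketched as follows. This is a minimal illustration, not the actual experiment code: the nominal parameter values, the function name `sample_params`, and the relative-spread convention are assumptions for the example.

```python
import random

# Hypothetical nominal CartPole parameters, used only for illustration.
NOMINAL = {"masspole": 0.1, "length": 0.5}

def sample_params(nominal, spread, kind="uniform", rng=random):
    """Sample system parameters around their nominal values.

    spread: relative half-width (uniform) or relative std dev (normal).
    kind:   the type of distribution; per the experiments, this choice
            matters less than the spread itself.
    """
    sampled = {}
    for name, value in nominal.items():
        if kind == "uniform":
            sampled[name] = rng.uniform(value * (1 - spread), value * (1 + spread))
        elif kind == "normal":
            sampled[name] = rng.gauss(value, value * spread)
        else:
            raise ValueError(f"unknown distribution kind: {kind}")
    return sampled

# At every episode reset, the environment would be re-created with
# freshly sampled parameters before the rollout starts.
rng = random.Random(0)
params = sample_params(NOMINAL, spread=0.2, kind="uniform", rng=rng)
```

With spread=0 this collapses to training on the standard parameters only, which is the baseline the distribution runs are compared against.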
Coming to the robustness benchmark, three different algorithms are tested in different environments by changing system parameters, adding noise, or performing transfer learning from pure system dynamics to a physics simulator.
System Variation:
Adding Noise:
Transfer Learning:
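As an illustration of the "Adding Noise" perturbation above, observation noise can be injected through a thin environment wrapper. This is a simplified sketch, not the benchmark code: the wrapper name and the reduced `reset()`/`step()` interface are assumptions for the example.

```python
import random

class NoisyObservationWrapper:
    """Adds zero-mean Gaussian noise to every observation component.

    The wrapped env is assumed to expose reset() -> obs and
    step(action) -> (obs, reward, done); this is a simplified
    stand-in for the benchmark environments.
    """

    def __init__(self, env, noise_std, rng=None):
        self.env = env
        self.noise_std = noise_std
        self.rng = rng or random.Random()

    def _noisy(self, obs):
        # Perturb each component independently with N(0, noise_std).
        return [o + self.rng.gauss(0.0, self.noise_std) for o in obs]

    def reset(self):
        return self._noisy(self.env.reset())

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return self._noisy(obs), reward, done
```

Sweeping `noise_std` over a range then gives one axis of the robustness evaluation; with `noise_std=0` the wrapper is transparent and recovers the clean baseline.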
All in all, the robustness evaluation shows that the state-of-the-art algorithm, a PPO used with domain randomization, still yields the best robustness results while maintaining good standard performance. Comparing all evaluation metrics and environments, robustness depends strongly on the environment, but overall the PPO-DR outperforms the standard PPO and the PPO with RMDPs in most cases. Furthermore, the types of distributions introduced in the learning process have no impact on the performance.