Abstract

Machine learning and especially reinforcement learning (RL) technologies have not yet matured to the point where they can replace control technologies in large parts of industrial and technological applications. The lack of robustness in RL algorithms is one of the issues that has hindered real-world application so far. Many disturbances, such as noise or manufacturing tolerances, cause significant problems in the real world. To become applicable, RL algorithms must become more robust against such disturbances. Moreover, objective measures have to be developed that allow the robustness of an algorithm to be evaluated.

This thesis focuses on the robustness of RL algorithms. In the first part, a new robust deep RL approach is introduced and evaluated: the state-of-the-art algorithm Proximal Policy Optimization (PPO) is applied to a problem defined as a robust Markov decision process (RMDP). To model such processes, the system is defined with uncertainties in its system parameters, and a worst-case assumption over the possible behavior under these parameter uncertainties is taken into account. Additionally, the influence of the type and spread of the uncertainty distributions is tested. The second part addresses the question of how to evaluate the robustness of RL algorithms. To this end, a robustness benchmark of the standard PPO and the new approach combining PPO with RMDPs is performed. To cover the different aspects of robustness in this work, several ways of evaluating robustness against different disturbances are considered: the influence of noise, changing environment parameters, and transfer learning serve as metrics for the robustness evaluation.
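The following is a minimal sketch of the worst-case idea behind combining PPO with an RMDP, not the thesis implementation: at every reset, the environment is switched to the parameter setting from a small, finite uncertainty set under which the current policy performs worst, so training always sees an approximate worst case. The attribute names (length, masspole) follow gym's CartPoleEnv; the wrapper class, the candidate grid, and the policy(obs) -> action interface are assumptions made for illustration.

```python
# Illustrative sketch (assumed, not the thesis code): a wrapper realising the
# worst-case assumption of a robust MDP for CartPole by picking, at every
# reset, the parameter setting from a finite uncertainty set under which the
# current policy performs worst.
import itertools
import gymnasium as gym


class WorstCaseParamWrapper(gym.Wrapper):
    def __init__(self, env, param_grid, policy, eval_episodes=1):
        super().__init__(env)
        # Finite candidate set approximating the continuous uncertainty set,
        # e.g. {"length": [0.4, 0.5, 0.6], "masspole": [0.05, 0.1, 0.15]}.
        keys, values = zip(*param_grid.items())
        self.candidates = [dict(zip(keys, v)) for v in itertools.product(*values)]
        self.policy = policy          # assumed interface: policy(obs) -> action
        self.eval_episodes = eval_episodes

    def _rollout_return(self, params):
        # Short evaluation rollout of the current policy under `params`.
        for name, value in params.items():
            setattr(self.env.unwrapped, name, value)
        total = 0.0
        for _ in range(self.eval_episodes):
            obs, _ = self.env.reset()
            done = False
            while not done:
                obs, reward, terminated, truncated, _ = self.env.step(self.policy(obs))
                total += reward
                done = terminated or truncated
        return total / self.eval_episodes

    def reset(self, **kwargs):
        # Worst-case choice: train on the parameters that currently hurt most.
        worst = min(self.candidates, key=self._rollout_return)
        for name, value in worst.items():
            setattr(self.env.unwrapped, name, value)
        return self.env.reset(**kwargs)
```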

The robustness evaluation of this thesis leads to the result that the state-of-the-art algorithm based on domain randomization still achieves the most robust performance. Comparing all evaluation metrics and environments, the robustness performance depends strongly on the individual environment. Furthermore, the type of distribution introduced in the learning process has no impact on the performance; the decisive parameter for obtaining more robust algorithms is the spread of the underlying distribution. The evaluation also shows that this parameter varies across environments and algorithms. In the last part, an outlook for future work on robustness based on these results is provided.


Documentation

Influences on the standard RL system that are tested in the different experiments:


First test: how distributions influence the average performance on CartPole with system uncertainties:

In this case, the type of distribution is not the important parameter; the spread of the distribution is. A performance improvement by learning over distributions is possible in the first place. Moreover, when evaluated with only the standard system parameters, the behaviour learned over distributions is as good as the one learned without distributions, which shows that learning over distributions provides a benefit for the RL problem.
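A minimal sketch of how such an experiment can be set up, assuming gym's CartPole-v1 and its length attribute; the wrapper, the parameter names, and the spread matching between the uniform and normal distribution are illustrative assumptions, not the original experiment code:

```python
# Assumed setup sketch for the first experiment: at every reset, CartPole's
# pole length is drawn either from a uniform or a normal distribution whose
# spread can be varied, so the effect of distribution type vs. spread on the
# learned policy can be compared.
import gymnasium as gym
import numpy as np

rng = np.random.default_rng(0)
NOMINAL_LENGTH = 0.5  # gym's default (half) pole length


def sample_length(dist: str, spread: float) -> float:
    if dist == "uniform":
        return rng.uniform(NOMINAL_LENGTH - spread, NOMINAL_LENGTH + spread)
    if dist == "normal":
        # std chosen so that both distributions have the same standard deviation
        return rng.normal(NOMINAL_LENGTH, spread / np.sqrt(3))
    raise ValueError(dist)


class RandomizedLengthWrapper(gym.Wrapper):
    def __init__(self, env, dist="uniform", spread=0.1):
        super().__init__(env)
        self.dist, self.spread = dist, spread

    def reset(self, **kwargs):
        # Re-draw the uncertain system parameter before every episode.
        self.env.unwrapped.length = max(0.05, sample_length(self.dist, self.spread))
        return self.env.reset(**kwargs)


env = RandomizedLengthWrapper(gym.make("CartPole-v1"), dist="normal", spread=0.2)
```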

Coming to the robustness benchmark, three different algorithms are tested in different environments by changing system parameters, adding noise, or performing transfer learning from pure system dynamics to a physics simulator (a minimal evaluation sketch follows the list below).

System Variation:

Adding Noise:

Transfer Learning:
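A hedged sketch of the two simulation-based checks, system variation and added observation noise, for an already trained policy; the function names (episode_return, robustness_sweep) and the chosen parameter and noise grids are illustrative assumptions. The transfer-learning check requires the physics simulator and is omitted here.

```python
# Illustrative robustness evaluation sketch (assumed, not the benchmark code):
# sweep over system parameters and observation-noise levels and record the
# average return of a fixed policy.
import gymnasium as gym
import numpy as np


def episode_return(env, policy, obs_noise_std=0.0, rng=None):
    # One episode with Gaussian noise added to the observations fed to the policy.
    rng = rng or np.random.default_rng()
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        noisy_obs = obs + rng.normal(0.0, obs_noise_std, size=obs.shape)
        obs, reward, terminated, truncated, _ = env.step(policy(noisy_obs))
        total += reward
        done = terminated or truncated
    return total


def robustness_sweep(policy, lengths=(0.25, 0.5, 0.75, 1.0), noise_levels=(0.0, 0.05, 0.1)):
    env = gym.make("CartPole-v1")
    results = {}
    for length in lengths:                      # system variation
        env.unwrapped.length = length
        for noise in noise_levels:              # added observation noise
            returns = [episode_return(env, policy, noise) for _ in range(10)]
            results[(length, noise)] = float(np.mean(returns))
    return results
```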

All in all, the robustness evaluation leads to the result that the state-of-the-art algorithm, a PPO used with domain randomization (PPO-DR), still gives the best robustness results while maintaining a good nominal performance. Comparing all evaluation metrics and environments, the robustness performance depends strongly on the individual environment, but overall the PPO-DR outperforms the standard PPO and the PPO with RMDPs in most cases. Furthermore, the type of distribution introduced in the learning process has no impact on the performance.
