Abstract
Recently, deep reinforcement learning has achieved remarkable success in solving complex tasks. By employing function approximation based on neural networks, representations are learned that generalize over large state-action spaces and handle sophisticated, high-dimensional problems. However, the dimensionality of the state-action space grows further when multiple agents interact and learn simultaneously. Moreover, each agent faces a moving target problem, since the policies of the other agents change during training. This renders the learning problem inherently difficult, and methods in the multi-agent domain therefore differ from those in the single-agent domain.
This thesis is concerned with the presence of multiple agents and examines the resulting challenges in the context of reinforcement learning. The first contribution of this thesis is a comprehensive overview of recent methods in multi-agent deep reinforcement learning. The overview is divided into a structural analysis of common architectures, an investigation of emergent agent behaviors, and a discussion of general problems that arise in reinforcement learning.
As the second contribution, the state-of-the-art algorithm Proximal Policy Optimization is adapted to the multi-agent domain. Two approaches for training multiple agents are introduced. On the one hand, a master controller observes the states of all agents and computes the optimal action for each agent. On the other hand, a parameter sharing approach is applied in which all agents use the same policy network and pool their experience. It is shown that parameter sharing can speed up the learning process and constitutes a scalable method for training multiple agents that pursue a common goal. Furthermore, the emergence of different agent behaviors is analyzed and evaluated in the two-dimensional Fruit Catcher Game. It is demonstrated that the reward scheme shapes the behavior of the agents such that they learn either to collaborate or to compete against each other.
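To make the parameter sharing idea concrete, the following minimal sketch (written in PyTorch; the names SharedPolicy and ppo_update are illustrative assumptions and do not refer to the implementation used in this thesis) shows how the transitions of all agents can be stacked along the batch dimension and fed into a single clipped PPO update on one shared network.

    import torch
    import torch.nn as nn

    # Parameter sharing: every agent acts with the SAME policy network,
    # and the transitions of all agents are pooled into one PPO update.
    class SharedPolicy(nn.Module):
        def __init__(self, obs_dim, n_actions):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                      nn.Linear(64, 64), nn.Tanh())
            self.pi = nn.Linear(64, n_actions)   # action logits
            self.v = nn.Linear(64, 1)            # state-value estimate

        def forward(self, obs):
            h = self.body(obs)
            return self.pi(h), self.v(h).squeeze(-1)

    def ppo_update(policy, optimizer, obs, actions, old_logp,
                   advantages, returns, clip_eps=0.2, value_coef=0.5):
        # One clipped PPO step on experience pooled from all agents.
        logits, values = policy(obs)
        dist = torch.distributions.Categorical(logits=logits)
        logp = dist.log_prob(actions)
        ratio = torch.exp(logp - old_logp)                 # pi_new / pi_old
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        value_loss = ((values - returns) ** 2).mean()
        loss = policy_loss + value_coef * value_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage: observations of N agents are stacked along the batch dimension,
    # so each gradient step benefits from the experience of every agent.
    policy = SharedPolicy(obs_dim=8, n_actions=4)
    optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
    obs = torch.randn(32, 8)              # pooled transitions of all agents
    actions = torch.randint(0, 4, (32,))
    old_logp = torch.zeros(32)            # placeholder log-probs for the sketch
    advantages = torch.randn(32)
    returns = torch.randn(32)
    ppo_update(policy, optimizer, obs, actions, old_logp, advantages, returns)

Because every agent contributes to the same gradient estimate, adding agents increases the amount of experience per update rather than the number of networks to train, which is what makes the approach scale.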
Finally, the parameter sharing method is carried over to the real world. The agents are trained in simulation to push a box, and their policies are subsequently transferred to the physical world. This illustrates that behavior learned in simulation can be transferred to the real world and that multiple agents are able to coordinate on a physical task while being trained only in a virtual environment.