Game, or "Game playing" , is the classic problem of artificial intelligence. The game whose outcome could be objectively determined not only provides a perfect test platform for the artificial intelligence algorithm, but also make it possible to make computer players and human players to play against. According to different classification methods, the game can be divided into single and double (or multiplayer) games, board games and video games, collaborative games and war games, etc. Double or more people's games are often called Game, and Game can be divided into perfect information games (such as chess and chess) and incomplete information games (such as poker and military chess.) Early artificial intelligence algorithms are often dependent on the search algorithm, which in simple, limited search space (Such as Tic-Tac-Toe) is very effective.With the increase in the difficulty of the search, more complex games tend to use the combination of search and machine learning algorithms, especially Reinforcement learning algorithm.
Reinforcement learning is not deep learning. It is an important branch of machine learning that focuses on sequential decision making [1-2], and the most advanced algorithms in many games are based on reinforcement learning. Before AlphaGo, the best Go programs (such as Crazy Stone and Zen) combined reinforcement learning with Monte Carlo tree search (MCTS). Compared with supervised and unsupervised learning, reinforcement learning focuses on two problems unique to this learning setting: the trade-off between exploration and exploitation, and temporal credit assignment. A reinforcement learning algorithm needs to answer two basic questions:
- How to evaluate a policy
- How to find an optimal policy for the problem.
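Concretely, these two questions correspond to policy evaluation and policy optimization. A standard way to state them, following the textbook formulation in [1] (the notation here is generic and not tied to any particular game), is through the Bellman equations:

```latex
% Policy evaluation: the value of state s under a fixed policy \pi
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s,a)\,\bigl[ r(s,a,s') + \gamma V^{\pi}(s') \bigr]

% Policy optimization: the optimal value function satisfies the Bellman
% optimality equation, and an optimal policy acts greedily with respect to it
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s,a)\,\bigl[ r(s,a,s') + \gamma V^{*}(s') \bigr]
```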
In games, deep learning is most often applied by helping reinforcement learning solve these two problems better (as shown in Figure 1).
Fig. 1 A convolutional neural network learns a mapping from game screens to game policy
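As a concrete illustration of the kind of mapping in Figure 1, the sketch below (written with PyTorch; the layer sizes and action count are illustrative assumptions, not taken from the text) builds a small convolutional network that maps a stack of game frames to a probability distribution over actions:

```python
import torch
import torch.nn as nn

class PolicyCNN(nn.Module):
    """Maps a stack of game frames to a distribution over actions.
    Layer sizes are illustrative, not taken from the original text."""

    def __init__(self, n_actions: int, in_frames: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, in_frames, 84, 84) -> probabilities over actions
        logits = self.head(self.conv(frames))
        return torch.softmax(logits, dim=-1)

# Example: one batch of 84x84 frame stacks mapped to a policy over 6 actions
policy = PolicyCNN(n_actions=6)
probs = policy(torch.zeros(1, 4, 84, 84))
```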
Deep Reinforcement Learning
The traditional study of reinforcement learning focuses on tabular representations or linear function approximations. For the large-scale, complex sequential decision problems found in practice, simple tabular representations and linear approximations are not enough. Deep learning can provide end-to-end, non-linear function approximation [1], so that reinforcement learning can solve more complex problems (such as how to represent the state of the board in Go). On the other hand, to handle the partially observable Markov decision processes (POMDPs) [3-5] that are common in games, a reinforcement learning algorithm needs to process sequences of actions and observations effectively (for example, by summarizing the history of actions and observations into a state representation) in order to find the optimal policy. Deep learning has proven successful at sequence learning in areas such as audio, video, and text, so it is also used to learn the state representation in POMDPs. Reinforcement learning enhanced with deep learning is known as deep reinforcement learning; a review of deep reinforcement learning can be found in [6].
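As a minimal sketch of how a recurrent network can play this role (in the spirit of the deep recurrent Q-learning of [3]; the dimensions and layer choices here are illustrative assumptions), the hidden state of an LSTM summarizes the observation history and feeds a Q-value head:

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    """Summarizes a history of observations into a hidden state and
    predicts action values from it (dimensions are illustrative)."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); the LSTM hidden state plays the
        # role of the learned POMDP state representation
        z = torch.relu(self.encoder(obs_seq))
        out, _ = self.rnn(z)
        return self.q_head(out[:, -1])  # Q-values at the last time step

# Example: a batch of 10-step observation histories over 16 features, 4 actions
qnet = RecurrentQNet(obs_dim=16, n_actions=4)
q_values = qnet(torch.zeros(2, 10, 16))
```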
Such tabular representations can be applied when the numbers of game states and actions are not too large. When the number of possible game states and actions is large, the key problem for reinforcement learning is not the space required to store such a large table, but the time and data required to fill in its target values correctly. The real challenge is therefore generalization: how to obtain the values of possible actions in unexperienced parts of the state space when only a limited set of game states has been experienced. The generalization method commonly used in reinforcement learning is function approximation. Function approximation takes some input-to-output mappings sampled from an (unknown) objective function and attempts to generalize them over the whole domain, constructing an approximation of the entire objective function. For example, in linear function approximation, the action-value function is represented as:
Q(s,a;\theta) = \theta^{T}\varphi(s,a)    (2)
where \theta is a learnable parameter vector and \varphi(\cdot) is a feature function defined on state-action pairs. Linear function approximation methods extract hand-designed features from the most recent game frames and use a linear combination of these features to represent and learn the value function. Bellemare et al. at the University of Alberta were the first to apply a SARSA algorithm with linear function approximation on ALE, together with the following four generic, hand-designed feature sets (a sketch of the corresponding SARSA update follows the list below):
- First, the screen is divided into blocks by a grid; for each block, a vector indicates whether each color is present, and the set of these vectors forms the BASIC feature set of the current game screen.
- Based on the BASIC feature set, pairwise combinations of basic features are added, forming the BASS feature set.
- The objects on the screen are extracted and classified by clustering, and their positions and velocities are inferred from information between frames. The types, positions, and velocities of all objects on the screen form the DISCO feature set of the current game screen.
- Locality-sensitive hashing is applied to the game screen, and the resulting low-dimensional representation is used as the LSH feature set [7].
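The following is a minimal sketch of the linear SARSA update built on such features; the feature function `phi` here is a hypothetical stand-in for BASIC/BASS/DISCO/LSH, which would require the actual ALE screen:

```python
import numpy as np

def linear_sarsa_update(theta, phi, s, a, r, s_next, a_next,
                        alpha=0.01, gamma=0.99):
    """One SARSA step with the linear value function Q(s,a;theta) = theta^T phi(s,a).
    `phi` is a placeholder for a hand-designed feature function such as BASIC or BASS."""
    q_sa = theta @ phi(s, a)
    q_next = theta @ phi(s_next, a_next)
    td_error = r + gamma * q_next - q_sa   # temporal-difference error
    return theta + alpha * td_error * phi(s, a)

# Toy usage with a hypothetical deterministic 8-dimensional feature function
phi = lambda s, a: np.cos(np.arange(8) * (s + 3 * a + 1))
theta = np.zeros(8)
theta = linear_sarsa_update(theta, phi, s=0, a=1, r=1.0, s_next=0, a_next=2)
```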
In follow-up work, they proposed Contingency Awareness, which adds features to the original feature set indicating which parts of the screen are directly affected by the player's input. Bellemare et al. later proposed a further extension of the feature set using the tug-of-war sketch. In general, however, reinforcement learning algorithms based on linear value function approximation remain far weaker than human players. In addition, designing feature functions by hand is not an easy task.
Wang et al. at DeepMind proposed a new "dueling" neural network architecture that separates the value function of the game state from the state-dependent action values, so that the valuations of individual actions are no longer independent. Letting the actions share a more general state-value function helps reduce the estimation bias for different actions.
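A minimal sketch of the dueling idea (layer sizes are illustrative assumptions; the combination of the state value with mean-subtracted advantages follows the standard dueling formulation) looks like this:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Separate state-value and advantage streams combined into Q-values.
    Layer sizes are illustrative, not those of the original paper."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_stream = nn.Linear(hidden, 1)               # V(s)
        self.advantage_stream = nn.Linear(hidden, n_actions)   # A(s, a)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.features(obs)
        v = self.value_stream(z)
        a = self.advantage_stream(z)
        # Q(s,a) = V(s) + A(s,a) - mean_a A(s,a): all actions share V(s)
        return v + a - a.mean(dim=-1, keepdim=True)

# Example: Q-values for a batch of two 16-dimensional observations, 6 actions
qnet = DuelingQNet(obs_dim=16, n_actions=6)
q = qnet(torch.zeros(2, 16))
```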
Conclusion
Game playing has always been an important branch of artificial intelligence. The success of deep learning in other areas has brought unprecedented inspiration to game-playing artificial intelligence. Computer Go, for example, progressed from early rule-based algorithms, to heuristic position-evaluation algorithms, to Monte Carlo tree search and reinforcement learning, and finally achieved a qualitative leap through deep reinforcement learning. In the near future, we will see deep learning algorithms and ideas applied to more and more games. Future research should gradually move from classic games toward more complex, incomplete-information multiplayer games, especially video games. Deep learning in games still faces many open problems, such as how to generalize acquired knowledge to new problems, how to make more effective use of expert knowledge, how to learn new knowledge during play, and how to adapt to an opponent's changing strategy. We also expect to see successful game-playing algorithms applied across a wide range of industries, such as health care and education, to truly improve the lives of ordinary people.
References
- Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
- Kaelbling L P, Littman M L, Moore A W. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 1996, 4: 237−285
- Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: Proceedings of the 2015 AAAI Fall Symposium Series. Arlington, Virginia: AAAI, 2015.
- Bakker B, Zhumatiy V, Gruener G, Schmidhuber J. A robot that reinforcement-learns to identify and memorize important previous observations. In: Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems. Manno-Lugano, Switzerland: IEEE, 2003
- Wierstra D, Förster A, Peters J, Schmidhuber J. Recurrent policy gradients. Logic Journal of IGPL, 2010, 18(5): 620−634
- Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks, 2015, 61: 85−117
- Bellemare M, Naddaf Y, Veness J, Bowling M. The arcade learning environment: an evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013, 47: 253−279
- Clark C, Storkey A. Training deep convolutional neural networks to play go. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2015.
- Maddison C J, Huang A, Sutskever I, Silver D. Move evaluation in Go using deep convolutional neural networks. In: Proceedings of the 2014 International Conference on Learning Representations. Rimrock Resort Hotel, Banff, Canada: ICLR, 2014.
- Tian Y D, Zhu Y. Better computer Go player with neural network and long-term prediction. In: Proceedings of the 2016 International Conference on Learning Representations. Caribe Hilton, San Juan, Puerto Rico: ICLR, 2016.