PPSN XII - LNCS 7491-7492 CD-ROM

Reinforcement Learning with N-tuples on the Game Connect-4

Markus Thill, Patrick Koch, and Wolfgang Konen

Department of Computer Science, Cologne University of Applied Sciences, 51643, Gummersbach, Germany
patrick.koch@fh-koeln.de
wolfgang.konen@fh-koeln.de

Abstract. Learning complex game functions is still a difficult task. We apply temporal difference learning (TDL), a well-known variant of reinforcement learning, in combination with n-tuple networks to the game Connect-4. Our agent is trained purely by self-play. It is able, for the first time, to consistently beat the optimal-playing Minimax agent (in game situations where a win is possible). The n-tuple network induces a powerful feature space: it is not necessary to hand-design specific features, because the agent learns to select the right ones. We believe that the n-tuple network is an important ingredient of the overall success, and we identify several aspects that are relevant for achieving high-quality results. The architecture is sufficiently general to be applied to similar reinforcement learning tasks as well.

Keywords: Machine learning, reinforcement learning, TDL, self-play, n-tuple systems, feature generation, board games
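
To make the idea of an n-tuple value function trained by temporal differences concrete, the following is a minimal sketch and not the authors' implementation. Everything in it is an illustrative assumption: the class name NTupleNetwork, the three-state cell encoding, the randomly sampled cell positions, the tanh output, and the learning rate. Each n-tuple reads a fixed set of board cells, interprets their contents as an index into its own weight table, and the board value is the squashed sum of the selected weights. A TD-style update then moves that value toward a target such as a game outcome or the value of the successor position.

```python
import math
import random

ROWS, COLS = 6, 7        # standard Connect-4 board
STATES_PER_CELL = 3      # assumption: 0 = empty, 1 = first player, 2 = second player


class NTupleNetwork:
    """Value function built from n-tuple lookup tables (hypothetical sketch)."""

    def __init__(self, tuples):
        # Each tuple is a list of cell indices into the flattened board.
        self.tuples = [list(t) for t in tuples]
        # One weight table per tuple, one entry per possible cell pattern.
        self.tables = [[0.0] * (STATES_PER_CELL ** len(t)) for t in self.tuples]

    def _index(self, board, cells):
        # Read the sampled cells as a base-3 number that addresses the table.
        idx = 0
        for c in cells:
            idx = idx * STATES_PER_CELL + board[c]
        return idx

    def value(self, board):
        # Sum the active weight of every tuple and squash into (-1, 1).
        total = sum(tab[self._index(board, t)]
                    for t, tab in zip(self.tuples, self.tables))
        return math.tanh(total)

    def td_update(self, board, target, alpha=0.01):
        # Gradient step that moves value(board) toward the given target.
        v = self.value(board)
        step = alpha * (target - v) * (1.0 - v * v)  # (1 - v^2) is the tanh derivative
        for t, tab in zip(self.tuples, self.tables):
            tab[self._index(board, t)] += step
        return v


def random_tuples(count, length, rng):
    # Assumption: tuples are plain random cell samples without repetition.
    return [rng.sample(range(ROWS * COLS), length) for _ in range(count)]


# Usage: repeatedly push the value of one position toward a win signal (+1).
rng = random.Random(0)
net = NTupleNetwork(random_tuples(count=10, length=4, rng=rng))
board = [0] * (ROWS * COLS)
board[3] = 1                      # first player has a piece in the centre column
before = net.value(board)         # zero-initialised weights give a value of 0.0
for _ in range(200):
    net.td_update(board, target=1.0, alpha=0.05)
after = net.value(board)          # now closer to +1 than before
```

In a self-play setting the target would typically be the reward plus the value of the next position rather than a fixed constant; the fixed target here only keeps the example short.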

LNCS 7491, p. 184 ff.
