01) Rainbow: Combining Improvements in Deep Reinforcement Learning. Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver. 2017.
02) Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg. 2017.
03) Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis. 2017.
04) An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning. Richard S. Sutton, A. Rupam Mahmood, Martha White. 2015.
05) Safe and Efficient Off-Policy Reinforcement Learning. Rémi Munos, Tom Stepleton, Anna Harutyunyan, Marc G. Bellemare. 2016.
06) Convergence of Stochastic Iterative Dynamic Programming Algorithms. Tommi Jaakkola, Michael I. Jordan, Satinder P. Singh. 1993.
07) Deep Reinforcement Learning from Human Preferences. Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei. 2017.
08) Neuro-Dynamic Programming, Chapter 7. Dimitri Bertsekas, John Tsitsiklis. 1996.
09) Montezuma’s Revenge Solved by Go-Explore, a New Algorithm for Hard-Exploration Problems (Sets Records on Pitfall, Too). Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune. 2018.
10) From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning. Rémi Munos. 2014.