Limits of Q-learning