Google DeepMind's Deep Q-Learning for Atari Breakout



Google DeepMind created an artificial intelligence program that uses deep reinforcement learning to play Atari games and improve itself to a superhuman level. It is capable of playing many Atari games and combines deep artificial neural networks with reinforcement learning. Shortly after the initial results with the algorithm were presented, Google acquired the company for several hundred million dollars, hence the name Google DeepMind. Enjoy the footage, and let me know if you have any questions about deep learning!
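
For readers curious how the deep neural network and the reinforcement learning part fit together, here is a minimal sketch of the core deep Q-learning update. It is not the original Lua/Torch code linked below: a small fully connected PyTorch network stands in for the paper's convolutional network over stacked frames, random tensors stand in for a minibatch drawn from replay memory, and all sizes and hyperparameters are illustrative only.

# Hedged sketch of the DQN update rule (illustrative sizes, random stand-in data).
import torch
import torch.nn as nn

n_actions, state_dim, gamma = 4, 84 * 84, 0.99

# Online network and a periodically synced target network, as in the Nature paper.
q_net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)

# A fake minibatch "sampled from replay memory" (random data, for illustration only).
states = torch.rand(32, state_dim)
actions = torch.randint(0, n_actions, (32,))
rewards = torch.rand(32)
next_states = torch.rand(32, state_dim)
done = torch.zeros(32)

# Bellman target: r + gamma * max_a' Q_target(s', a'), cut off at episode ends.
with torch.no_grad():
    target = rewards + gamma * (1 - done) * target_net(next_states).max(dim=1).values

# Q(s, a) for the actions actually taken, regressed toward the target.
q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
loss = nn.functional.mse_loss(q_sa, target)

optimizer.zero_grad()
loss.backward()
optimizer.step()

The target network that is only synced periodically is the trick from the Nature paper that keeps the regression target from chasing its own tail while the online network is being trained.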

______________________

Recommended for you:
1. How DeepMind's AlphaGo Defeated Lee Sedol - https://www.youtube.com/watch?v=a-ovvd_ZrmA&index=58&list=PLujxSBD-JXgnqDD1n-V30pKtp6Q886x7e
2. How DeepMind Conquered Go With Deep Learning (AlphaGo) - https://www.youtube.com/watch?v=IFmj5M5Q5jg&index=42&list=PLujxSBD-JXgnqDD1n-V30pKtp6Q886x7e
3. Google DeepMind's Deep Q-Learning & Superhuman Atari Gameplays -

If you would like to see more content like this, subscribe here: http://www.youtube.com/subscription_center?add_user=keeroyz

- The original DeepMind code: https://sites.google.com/a/deepmind.com/dqn/

- Ilya Kuzovkin's fork with visualization:
https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner

- This patch fixes the visualization when reloading a pre-trained network. The window will appear after the first evaluation batch is done (typically a few minutes in):
http://cg.tuwien.ac.at/~zsolnai/wp/wp-content/uploads/2015/03/train_agent.patch

- This configuration file runs Ilya Kuzovkin's version with less than 1 GB of VRAM:
http://cg.tuwien.ac.at/~zsolnai/wp/wp-content/uploads/2015/03/run_gpu

- The original Nature paper on this deep learning technique is available here:
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html

- And some mirrors that are not behind a paywall:
http://www.cs.swarthmore.edu/~meeden/cs63/s15/nature15b.pdf
http://diyhpl.us/~nmz787/pdf/Human-level_control_through_deep_reinforcement_learning.pdf

Web → https://cg.tuwien.ac.at/~zsolnai/
Twitter → https://twitter.com/karoly_zsolnai

44 comments
  1. Do you think that within the decade Q-learning could manage to figure out how to play Super Mario Bros. on the NES with only visual input? It would have to learn the concept of lives and fail states. Some things could come naturally: if it got to the first castle, it knows that it needs to move to the right to progress, and certain actions can give you score. It would get to Bowser; the sprite is moving, so it might be an enemy, or it could be a platform. But you die when you touch it, so it determines that this is a hazard that is mobile. Up to this point it has figured out that stationary hazards, like reaching the bottom of the screen, can't be killed with fireballs, but a mobile hazard can. So it shoots it with fireballs, maybe dying once or twice to the fire before realizing that you can't jump on that. So it either avoids the enemy by jumping over it or going around it, or blasts it with fireballs. Once the enemy is clear, it will continue to navigate to the right, and it sees the score going up from the extra time. Probably way harder to do than that, but it could be feasible. Something like Zelda? Maybe later.

  2. I’m sure this is obvious, but how do you program an AI to have an open goal like “as many points as possible”?

    Does it just note everything that happened in achieving a higher score and attempt to replicate that with minor changes, to leave open the possibility of a better one?

    Does it figure out how the game actually works (such as needing to bounce the thing back) and avoid missing it, or is this a brute force approach where it reaches that end through trial and error? (See the sketch after the comments.)

    I find these things to be so interesting but very confusing lol

  3. One important point with this is that when researchers moved the "paddle" up a pixel the AI couldn't play the game at all even though it was at superhuman master level. So it was not able to abstract to something that was basically the exact same. This is an example of a hypersmart computer that lacks the common sense of a mouse.

  4. I remember as a kid my brothers and I were struggling over the same level on a video game. We had all taken a shot at it for an entire day and, frustrated, we went to bed. We woke up the next morning and immediately powered on the PlayStation and took our controllers. Just as we were ready to sit on the couch and start playing, we suddenly realized that the player was moving without our controlling it. Confused, we looked at one another. I said, "I'm not controlling it, are you?" All of us agreed that none of us were in control. Our confusion slowly turned to awe as we watched the level completed with an exactness and expertise never seen before. Our awe quickly turned to glee and we began shouting triumphantly at the screen, "Go computer! Kick their butts!" and cheering on the A.I., haha. It won the level, and that will forever stay in our minds as a glorious day, when the computer decided to look fondly upon us and give us kids a second chance 🙂

  5. If AI can accomplish all intellectual tasks, the only field left to us human beings is to develop spiritual values and moral virtues: courage, wisdom, justice, temperance.

  6. The thing is though, does it really "see" that it has tunneled through and bounced the ball off the back, or did the network simply NOT select against that behavior of tunneling? To test its understanding of delayed gratification, you'd have to introduce a consequence for tunneling that the AI "sees" is worth accepting anyway.

  7. What would REALLY be astonishing is if learning algorithms, which can learn to play games like Mario (an NP problem), could learn to solve NP problems and tell us how, thus leading to a unification of, or differentiation between, P and NP problems in general. Amazing!

  8. Interesting, but one thing I wonder is why DeepMind ended up playing in such an effective way that it dug the tunnel, rather than just returning the ball every time it came down, which it could have done as well.

  9. "It realizes that digging a tunnel … is the most effective way…" Sorry, but in Breakout you can either miss the ball or reflect the ball, not control its direction. It is disappointing that the DeepMind gurus try to sell a chance event as an example of the deep insights reached by their learning algorithms. These were early days, I guess.
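
On the question in comment 2 about programming an open goal like "as many points as possible": the goal is encoded simply as the change in the game score, fed back to the agent as a reward, and the behavior improves through epsilon-greedy trial and error rather than through any understanding of the game. The sketch below is a hedged illustration using tabular Q-learning on a made-up five-state toy game instead of the pixel-based deep network from the video; every name and number in it is illustrative only.

# Hedged sketch: the "goal" is just the score change used as a reward; the agent
# learns by trial and error. Tabular Q-learning on a made-up toy game.
import random
from collections import defaultdict

N_STATES, ACTIONS = 5, [0, 1]           # toy "game": walk left (0) or right (1) along a chain
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1  # learning rate, discount, exploration rate
Q = defaultdict(float)                  # Q[(state, action)] -> estimated future score

def step(state, action):
    """Toy environment: reaching the right end of the chain gives one point."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0  # the score change is the only signal
    return nxt, reward, nxt == N_STATES - 1       # episode ends at the right end

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly take the best-known action, sometimes explore at random.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: (Q[(state, a)], random.random()))  # ties broken at random
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + discounted best future value.
        best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

# After training, the greedy policy in every non-terminal state is "move right".
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)])

The same reward-plus-exploration loop, scaled up with a deep network over raw pixels and a replay memory, is essentially what the Breakout agent in the video runs.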
