Google DeepMind's Deep Q-Learning for Atari Breakout



Google DeepMind created an artificial intelligence program using deep reinforcement learning that plays Atari games and improves itself to a superhuman level. It is capable of playing many Atari games and combines deep artificial neural networks with reinforcement learning. After the algorithm's initial results were demonstrated, Google acquired the company almost immediately for several hundred million dollars, hence the name Google DeepMind. Enjoy the footage, and let me know if you have any questions about deep learning!
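
To make the idea a bit more concrete, here is a minimal, self-contained sketch of the Q-learning update that sits at the heart of the method. This is not DeepMind's code (their original implementation is linked below); the toy corridor environment, its constants (N, GOAL, ALPHA, GAMMA, EPSILON), and the helper functions are invented purely for illustration. DeepMind's agent replaces the lookup table with a deep convolutional network that reads raw Atari pixels and adds experience replay and a target network, but the Bellman backup it is trained to approximate is the same idea shown here.

```python
import random
from collections import defaultdict

# Illustrative sketch only -- NOT DeepMind's DQN code.
# Toy environment (hypothetical): a 1-D corridor of N cells. The agent
# starts in the middle and receives a reward of +1 for reaching the right
# end, 0 otherwise. Actions: 0 = move left, 1 = move right.
N = 7
GOAL = N - 1
ACTIONS = (0, 1)

ALPHA = 0.1    # learning rate
GAMMA = 0.99   # discount factor for future reward
EPSILON = 0.1  # exploration rate for the epsilon-greedy policy

Q = defaultdict(float)  # Q[(state, action)] -> estimated discounted return

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, min(N - 1, state + (1 if action == 1 else -1)))
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def choose_action(state):
    """Epsilon-greedy: mostly exploit current Q estimates, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(500):
    state = N // 2
    done = False
    while not done:
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # Bellman update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        target = reward + (0.0 if done else GAMMA * best_next)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

# After training, the greedy policy from the start state to the goal is to
# move right (action 1) at every step.
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N // 2, GOAL)])
```

Everything the agent learns comes from the score signal alone, which is the point the video makes about the Atari agent: it is never told the rules, only rewarded for raising the score.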

______________________

Recommended for you:
1. How DeepMind's AlphaGo Defeated Lee Sedol - https://www.youtube.com/watch?v=a-ovvd_ZrmA&index=58&list=PLujxSBD-JXgnqDD1n-V30pKtp6Q886x7e
2. How DeepMind Conquered Go With Deep Learning (AlphaGo) - https://www.youtube.com/watch?v=IFmj5M5Q5jg&index=42&list=PLujxSBD-JXgnqDD1n-V30pKtp6Q886x7e
3. Google DeepMind's Deep Q-Learning and Superhuman Atari Gameplay -

If you would like to see more content like this, please subscribe: http://www.youtube.com/subscription_center?add_user=keeroyz

- The original DeepMind code: https://sites.google.com/a/deepmind.com/dqn/

- Ilya Kuzovkin's fork with visualization:
https://github.com/kuz/DeepMind-Atari-Deep-Q-Learner

- This patch fixes the visualization issue when reloading a pretrained network. The window appears after the first evaluation batch is done (typically a few minutes):
http://cg.tuwien.ac.at/~zsolnai/wp/wp-content/uploads/2015/03/train_agent.patch

- This configuration file runs Ilya Kuzovkin's version with less than 1 GB of VRAM:
http://cg.tuwien.ac.at/~zsolnai/wp/wp-content/uploads/2015/03/run_gpu

- The original Nature paper on this deep learning technique is available here:
http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html

- And some mirrors that are not behind a paywall:
http://www.cs.swarthmore.edu/~meeden/cs63/s15/nature15b.pdf
http://diyhpl.us/~nmz787/pdf/Human-level_control_through_deep_reinforcement_learning.pdf

Web → https://cg.tuwien.ac.at/~zsolnai/
Twitter → https://twitter.com/karoly_zsolnai

44 comments
  1. Do you think that within the decade Q-learning could manage to figure out how to play Super Mario Bros. on the NES with only visual input? It would have to learn the concept of lives and fail states. Some things could come naturally: if it got to the first castle it knows that it needs to move to the right to progress, and certain actions can give you score. It would get to Bowser; the sprite is moving, so it might be an enemy, or it could be a platform. But you die when you touch it, so it determines that this is a hazard that is mobile. It has figured out that stationary hazards, like reaching the bottom of the screen, can't be killed with fireballs, but a mobile hazard can, up to this point. So it shoots it with fireballs, maybe dying once or twice to the fire before realizing that you can't jump on that. So it either avoids the enemy by jumping over it or going around it, or blasts it with fireballs. Once the enemy is clear, it will continue to navigate to the right, and it sees the score going up from the extra time. Probably way harder to do than that, but it could be feasible. Something like Zelda? Maybe later.

  2. I'm sure this is obvious, but how do you program an AI to have an open goal like "as many points as possible"?

    Does it just note everything that happened in achieving a higher score and attempt to replicate that with minor changes to leave open the possibility of a better one?

    Does it figure out how the game actually works (such as needing to bounce the thing back) and avoid missing it, or is this a brute force approach where it reaches that end through trial and error?

    I find these things to be so interesting but very confusing lol

  3. One important point with this is that when researchers moved the "paddle" up a pixel the AI couldn't play the game at all even though it was at superhuman master level. So it was not able to abstract to something that was basically the exact same. This is an example of a hypersmart computer that lacks the common sense of a mouse.

  4. I remember as a kid my brothers and I were struggling over the same level on a video game. We had all taken a shot at it for an entire day and, frustrated, we went to bed. We woke up the next morning and immediately powered on the PlayStation and took our controllers. Just as we were ready to sit on the couch and move our controls, we suddenly realized that the player was moving without our controlling it. Confused, we looked at one another. I said, "I'm not controlling it, are you?" All of us agreed that none of us were in control. Our confusion slowly turned to awe as we watched the level completed with an exactness and expertise never seen before. Our awe quickly turned to glee and we began shouting triumphantly at the screen "Go computer! Kick their butts!" and cheering on the A.I. haha. It won the level, and it will forever stay in our minds as a glorious day, when the computer decided to look fondly upon us and give us kids a second chance 🙂

  5. If AI can accomplish all intellectual tasks, the only field left to us human beings is to develop spiritual values and moral virtues: courage, wisdom, justice, temperance.

  6. the thing is though, does it really "see" that it has tunneled through and bounced the ball off the back, or did the network simply NOT select against that behavior of tunneling? To test its understanding of delayed gratification, you'd have to introduce a consequence for tunneling that the AI "sees" is worth taking.

  7. What would REALLY be astonishing is this: if learning algorithms can learn to play games like Mario, which is an NP problem, they could learn to solve NP problems and tell us how, thus leading to a unification or differentiation of P and NP problems in general. Amazing!

  8. Interesting, but one thing I wonder is why DeepMind ended up playing in such an effective way that it dug the hole, rather than just returning every single ball, which it could have done as well.

  9. "It realizes that digging a tunnel … is the most effective way…" Sorry, but in Breakout, you can either miss the ball or reflect the ball, but not control its direction. Disappointing that the DeepMind gurus try selling a chance event as an example of the deep insights reached by their learning algorithms. These were early days, I guess.
