You wrote: "We then use those Q-Values to decide on our policy π (where a policy is simply each…
Piotr Niewiński
Shaked Zychlinski 🎗️
Nov 26, 2020
Incorrect :) What you get using DQN is exactly the probabilities to select each action. These probabilities are based on the predicted Q-Values.
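
To make "probabilities based on the predicted Q-Values" concrete, here is a minimal sketch of two common ways to map Q-values to action-selection probabilities. The comment does not say which mapping is meant, so the softmax and epsilon-greedy rules below, the function names, and the example Q-values are all assumptions for illustration only.

```python
import math

def softmax_policy(q_values, temperature=1.0):
    """Assumed mapping: Boltzmann (softmax) probabilities over Q-values."""
    max_q = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - max_q) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def epsilon_greedy_policy(q_values, epsilon=0.1):
    """Assumed mapping: epsilon-greedy probabilities over Q-values."""
    n = len(q_values)
    best = max(range(n), key=lambda i: q_values[i])  # index of the greedy action
    probs = [epsilon / n] * n  # exploration mass spread uniformly
    probs[best] += 1.0 - epsilon  # remaining mass goes to the greedy action
    return probs

# Hypothetical Q-values predicted by the network for three actions.
q_values = [1.0, 2.0, 0.5]
soft_probs = softmax_policy(q_values)
greedy_probs = epsilon_greedy_policy(q_values, epsilon=0.1)
```

In both cases the action with the highest predicted Q-value gets the highest selection probability, and the probabilities sum to 1.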