Sep 25, 2019
Hi Aydar, it's true that feeding the state-action pair as input lets the network score continuous actions, but you can't take the argmax over a continuous set of actions, because it's infinite and can't be enumerated.
Overall, standard Q-learning is only meant to deal with a fixed, finite set of possible actions.
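
To make that concrete, here is a minimal sketch of how the argmax is usually taken when Q gets the state and action as input. Everything in it is made up for illustration: `q_value` is a toy stand-in for a trained network, and `ACTIONS`, `greedy_action`, and `td_target` are hypothetical names. The point is that both the greedy choice and the update target loop over every action once, which only works when the action set is finite.

```python
# Toy stand-in for a learned Q(s, a) with state-action input (assumption:
# a real agent would use a trained network here). It peaks at action == state.
def q_value(state, action):
    return -(action - state) ** 2

# Fixed, finite action set (hypothetical values for illustration).
ACTIONS = [0, 1, 2, 3]

def greedy_action(state, actions=ACTIONS):
    # argmax by exhaustive enumeration: one Q evaluation per action.
    return max(actions, key=lambda a: q_value(state, a))

def td_target(reward, next_state, gamma=0.99, actions=ACTIONS):
    # The Q-learning target also needs a max over every action.
    return reward + gamma * max(q_value(next_state, a) for a in actions)

best = greedy_action(2)
target = td_target(1.0, next_state=3, gamma=0.5)
```

If `ACTIONS` were a continuous interval instead of a list, there would be nothing to loop over, so both functions would need some other way to find the maximum.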