I assume you refer to the Double DQN. It's just like any other network you train: you always start with random weights and optimize as you go. The same thing happens when you train a regular DQN: the initial Q values it spits out are worthless, but it gets better as it learns. The same thing happens here, except the target network is kept frozen and only synced with the online network every so often, so it lags behind on purpose.
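For concreteness, here is a minimal PyTorch-style sketch of that freezing scheme. The network architecture, sizes, and the `TARGET_SYNC_EVERY` interval are my own illustrative choices, not from the original discussion; the point is just that the target net starts as a copy of the randomly initialized online net and is only re-synced periodically.

```python
import copy
import torch.nn as nn

# Hypothetical Q-network; any architecture works here.
class QNet(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, x):
        return self.net(x)

online = QNet()                   # updated by gradient descent every step
target = copy.deepcopy(online)    # frozen copy; starts just as random
target.requires_grad_(False)      # never trained directly

TARGET_SYNC_EVERY = 1000  # assumed sync interval, in training steps

for step in range(10_000):
    # ... sample a batch, compute the TD target using `target`,
    # and take a gradient step on `online` here ...

    # Periodically copy the online weights into the frozen target net.
    if step % TARGET_SYNC_EVERY == 0:
        target.load_state_dict(online.state_dict())
```

So early on both nets produce garbage, just like a plain DQN would; training the online net (and occasionally syncing) is what makes the estimates improve over time.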