I assume you refer to the Double DQN. It's just like any other network you train: you always start with random weights and optimize as you go. The same thing happens when you train a regular DQN: the initial Q values it spits out are worthless, but it gets better as it learns. The same thing happens here, except the target network is kept frozen and only synced with the online network every so often, so it lags behind on purpose.
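For concreteness, here is a minimal PyTorch-style sketch of that freezing scheme. The network architecture, sizes, and the `TARGET_SYNC_EVERY` interval are my own illustrative choices, not from the original discussion; the point is just that the target net starts as a copy of the randomly initialized online net and is only re-synced periodically.

```python
import copy
import torch.nn as nn

# Hypothetical Q-network; any architecture works here.
class QNet(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, x):
        return self.net(x)

online = QNet()                   # updated by gradient descent every step
target = copy.deepcopy(online)    # frozen copy; starts just as random
target.requires_grad_(False)      # never trained directly

TARGET_SYNC_EVERY = 1000  # assumed sync interval, in training steps

for step in range(10_000):
    # ... sample a batch, compute the TD target using `target`,
    # and take a gradient step on `online` here ...

    # Periodically copy the online weights into the frozen target net.
    if step % TARGET_SYNC_EVERY == 0:
        target.load_state_dict(online.state_dict())
```

So early on both nets produce garbage, just like a plain DQN would; training the online net (and occasionally syncing) is what makes the estimates improve over time.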