Shaked Zychlinski ๐ŸŽ—๏ธ
1 min read · Jun 4, 2019


I assume you're referring to the Double DQN. It's just like any other network you train: you always start with random weights and optimize as you go. The same thing happens when you train a regular DQN; the initial Q-values it outputs are worthless, but it gets better as it learns. The same thing happens here, we just freeze the target network for longer than the online network.
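To make the freezing concrete, here is a minimal sketch (not actual DQN training code): both networks start from the same random weights, the online network updates every step, and the target network is only synced to it every `sync_every` steps. The weight arrays, the update rule, and the interval are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random initialization: both networks output garbage Q-values at first.
online_weights = rng.normal(size=4)
target_weights = online_weights.copy()

sync_every = 100   # hypothetical sync interval, in training steps
sync_steps = []

for step in range(1, 301):
    # Stand-in for a gradient update on the online network (every step).
    online_weights += 0.01 * rng.normal(size=4)
    # The target network stays frozen between syncs.
    if step % sync_every == 0:
        target_weights = online_weights.copy()
        sync_steps.append(step)
```

Between syncs the target network lags behind the online one on purpose: that lag is what keeps the bootstrapped targets stable.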
