
Dec 12, 2019

Hi Nabila, I’m not entirely sure I understand your question. What an MDN tries to do is indeed maximize the likelihood that the training data was sampled from the predicted distribution. Using the PDF here is somewhat of a “cheat”: what we really want to maximize is the probability of sampling the dataset, but we use the PDF for simplicity. The main difference is that a PDF value is a density rather than a probability, so it is not bounded above by 1.
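To make the point about the PDF concrete, here is a minimal sketch in plain Python. The function names `gaussian_pdf` and `mixture_nll` are made up for illustration, and the mixture parameters are fixed numbers here, whereas in an actual MDN they would be predicted by the network for each input.

```python
import math

def gaussian_pdf(y, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_nll(ys, weights, mus, sigmas):
    """Mean negative log-likelihood of ys under a Gaussian mixture.

    Assumption: one fixed set of mixture parameters for all samples,
    purely to keep the sketch short.
    """
    total = 0.0
    for y in ys:
        density = sum(
            w * gaussian_pdf(y, m, s)
            for w, m, s in zip(weights, mus, sigmas)
        )
        total -= math.log(density)
    return total / len(ys)

# A narrow Gaussian has a density well above 1 at its mean.
peak = gaussian_pdf(0.0, 0.0, 0.1)

# Because the density can exceed 1, the "negative log-likelihood"
# built from it can be negative.
nll = mixture_nll([0.0, 0.05], [1.0], [0.0], [0.1])
```

Minimizing this loss still pushes the predicted distribution toward the data. The values just cannot be read as probabilities.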

As for the other part of the question, I did not understand it. At no point have I asked my network to learn a mean of 0 and an STD of 1. The only constraint I impose is that the STD must be a positive number.
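For illustration, one common way to enforce such a positivity constraint is to pass the raw network output through a strictly positive function. The sketch below uses softplus. This is an assumed choice for the example and not necessarily the activation used in the post, and the names `softplus`, `positive_std` and `eps` are hypothetical.

```python
import math

def softplus(raw):
    """Numerically stable log(1 + exp(raw)); never negative."""
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))

def positive_std(raw, eps=1e-6):
    """Map an unconstrained raw output to a strictly positive STD.

    eps is a small floor (an assumption of this sketch) that keeps the
    result above zero even when softplus underflows to 0.0.
    """
    return softplus(raw) + eps

# Any real-valued raw output yields a valid, positive STD.
stds = [positive_std(r) for r in (-1000.0, -2.0, 0.0, 3.0)]
```

Note that nothing here pulls the STD toward 1 or the mean toward 0. The mapping only rules out non-positive values.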

Written by Shaked Zychlinski 🎗️
