Dear authors, thank you very much for your contribution. I know you have improved the code structure, but I am afraid some of the method details are still very hard for me to understand.
I thought I should ask here, for anyone else who has the same questions.
- Table 2 of the paper presents the recommender-system evaluation. If I understand correctly, you ignore the conversational part while performing these experiments, so that you can properly compare only the recommendation methods.
- In Table 3, you only evaluate the conversational part, ignoring the recommendation task. In this case, you calculate the perplexity of the ground-truth sentences, and some of them may include UNK tokens that might be predicted properly.
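To make this concern concrete, here is a minimal sketch (my own illustration, not your code) of how scoring an `<unk>` token like any ordinary token can lower the measured perplexity:

```python
import math

def perplexity(token_logprobs):
    # Perplexity of one ground-truth sentence, given the model's
    # per-token log-probabilities: exp of the mean negative log-prob.
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical numbers: a rare word was mapped to <unk>, and the model
# predicts <unk> confidently, so the sentence looks "easy" even though
# the original word was never actually modeled.
tokens   = ["i", "loved", "<unk>", "!"]
logprobs = [-1.2, -2.3, -0.4, -0.9]  # <unk> gets a high probability
print(perplexity(logprobs))          # ≈ 3.3
```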
- I do not understand what the Dist-N metric is. Is it the number of distinct n-grams divided by the total number of words produced by the model? In that case, I would expect it to be greater than one for n > 1, since there are far more possible distinct n-grams than distinct 1-grams (distinct single words).
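To make my question concrete, here is a minimal sketch of the definition I would expect (my own reading of the metric, which I believe follows Li et al., 2016): dividing the distinct n-grams by the total number of generated n-grams keeps the ratio in [0, 1] for every n.

```python
def dist_n(sentences, n):
    # Dist-N as I understand it: the number of distinct n-grams over
    # the total number of n-grams in all model outputs. Since the
    # distinct count can never exceed the total count, the ratio
    # stays in [0, 1] for every n.
    ngrams = [tuple(s[i:i + n])
              for s in sentences
              for i in range(len(s) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

outputs = [["i", "like", "this", "movie"], ["i", "like", "it"]]
print(dist_n(outputs, 1))  # 5 distinct / 7 total unigrams ≈ 0.71
print(dist_n(outputs, 2))  # 4 distinct / 5 total bigrams  = 0.80
```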
Regarding the big picture of the complete end-to-end model:
- Do you identify named entities in real time from the conversation, or do you have a dictionary of all the named entities mentioned in each utterance (similarly to the ReDial authors)?
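For reference, this is roughly what I mean by the dictionary approach (a hypothetical sketch, not your code): matching utterance spans against a precomputed list of entity names instead of running a NER model at conversation time.

```python
def find_entities(tokens, entity_dict):
    # Dictionary-based matching: tag every span of the utterance that
    # exactly matches a known entity name, trying longer spans first.
    # `entity_dict` maps surface forms, e.g. "the matrix", to entity ids.
    found, i = [], 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):  # longest match first
            span = " ".join(tokens[i:i + n])
            if span in entity_dict:
                found.append(entity_dict[span])
                i += n
                break
        else:
            i += 1
    return found

print(find_entities("i loved the matrix".split(), {"the matrix": 42}))  # [42]
```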
- Do you perform sentiment analysis and use it in your recommendation module, or do you ignore the sentiment regarding the entities and only use them as an ordered "bag of words"?
- If you do perform sentiment analysis at conversation time, do you feed it only the utterances that have been sent up to that point?
- You use the same switching technique as the ReDial authors for joining the conversational output space with the recommendation output space. Do any of your results (maybe Table 3) present a joint evaluation (recommendation and NLG tasks)? If so, when you evaluate the token of a mentioned movie, do you check whether that specific movie was predicted, or simply whether any movie was predicted, and count that as a correct NLG prediction?
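To clarify what I mean by the switching technique, here is a minimal sketch of one decoding step as I understand it from the ReDial paper (all names here are hypothetical, not your API):

```python
import torch

def decode_step(p_switch, dialog_logits, movie_logits):
    # One decoding step with a switch between output spaces.
    # p_switch:      scalar in [0, 1], the probability that this step
    #                emits a movie recommendation instead of a word.
    # dialog_logits: scores over the language vocabulary.
    # movie_logits:  scores from the recommender over the movie catalog.
    p_word  = (1 - p_switch) * torch.softmax(dialog_logits, dim=-1)
    p_movie = p_switch * torch.softmax(movie_logits, dim=-1)
    # A single distribution over [vocabulary + movies]; it sums to 1,
    # so the next token can be either an ordinary word or a movie id.
    return torch.cat([p_word, p_movie], dim=-1)
```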
- Does Figure 2 evaluate the recommendation performance of the full end-to-end model, or only the performance of the recommendation method? If it is about the full end-to-end model, does the predicted recommended item need to be at the same token position as the ground-truth one, or just mentioned anywhere in the generated response?
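The two evaluation protocols I have in mind for this last question could be sketched like this (my own illustration, with hypothetical helper names):

```python
def strict_match(generated, reference, movie_ids):
    # The recommendation counts only if the predicted movie occupies
    # the same token position as the ground-truth movie.
    # `movie_ids` is the set of all movie tokens in the vocabulary.
    return any(g == r and g in movie_ids
               for g, r in zip(generated, reference))

def loose_match(generated, reference, movie_ids):
    # The recommendation counts if the ground-truth movie is mentioned
    # anywhere in the generated response.
    return bool(set(reference) & movie_ids & set(generated))
```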
I hope my questions are not too much trouble, and that they will help more of us better understand your work.
Thank you in advance for your time!
Best Regards,
Nikos.