This is indeed nice work! Many thanks for making it open-source so quickly! I am the author of a similar repository on the CALVIN benchmark: GR1-Training. However, it takes more training time...
I have several questions about your paper:
- What history length do you use, i.e., how many historical frames are fed to the visual encoder (see the first sketch after this list)? GR1 uses a history length of 10. In my experiments, I found that the success rate was higher when the history length was longer. I guess CALVIN has enough data to train the network and avoid the common causal-confusion problem.
- Did you try training MDT using only the data with language annotations? Recent works like GR1 and 3D Diffuser Actor use only a small portion of the CALVIN data (only the language-annotated part). MDT's strength is training on a combination of data with and without language annotations (which is of course nice), but training it on only the language-annotated data may be a useful ablation.
- Did you try turning off diffusion, so that the GPT-Diffusion policy becomes a plain GPT policy (see the second sketch after this list)? A diffusion action head is sometimes unnecessary, as shown by BAKU, which is also trained on LIBERO-90. When you predict an action chunk, the multimodality of the action distribution is already somewhat reduced.
- Fig. 4 and Table IX seem to reach different conclusions, even though both are on the CALVIN ABCD->D task: Fig. 4 shows that the CLA loss matters more than the MGF loss, while Table IX shows that the MGF loss matters more.
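
To make the first question concrete, here is a minimal sketch (hypothetical PyTorch, not MDT's or GR1's actual code) of what I mean by history length: the last H frames are each passed through the visual encoder and the resulting per-frame embeddings are handed to the policy as a token sequence.

```python
import torch
import torch.nn as nn

class HistoryEncoder(nn.Module):
    """Encodes each of the last H frames independently, then returns
    one embedding per frame for the policy transformer to attend over.
    Names and shapes here are my assumption, purely for illustration."""

    def __init__(self, frame_encoder: nn.Module, history_len: int = 10):
        super().__init__()
        self.frame_encoder = frame_encoder  # e.g. a ResNet/ViT mapping image -> (D,)
        self.history_len = history_len      # H: GR1 uses 10

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, H, C, height, width) -- the observation history
        b, h = frames.shape[:2]
        assert h == self.history_len
        flat = frames.flatten(0, 1)          # (batch * H, C, height, width)
        tokens = self.frame_encoder(flat)    # (batch * H, D)
        return tokens.view(b, h, -1)         # (batch, H, D) history tokens
```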
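And a sketch of the ablation in the third question, assuming a deterministic MLP head that regresses the whole action chunk with a plain MSE loss in place of the denoising objective; the class and its names are mine, not from the MDT codebase.

```python
import torch
import torch.nn as nn

class ChunkRegressionHead(nn.Module):
    """Predicts an action chunk (chunk_len x act_dim) in one shot.
    Swapping this in for the diffusion head, with mse_loss(head(hidden),
    gt_chunk) as the objective, would give the 'plain GPT policy' variant;
    everything upstream of the head stays unchanged."""

    def __init__(self, embed_dim: int, act_dim: int, chunk_len: int = 10):
        super().__init__()
        self.chunk_len, self.act_dim = chunk_len, act_dim
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),
            nn.GELU(),
            nn.Linear(4 * embed_dim, chunk_len * act_dim),
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, embed_dim) summary token from the GPT backbone
        out = self.mlp(hidden)                        # (batch, chunk_len * act_dim)
        return out.view(-1, self.chunk_len, self.act_dim)
```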
Again, it's very nice work! I have starred it and will recommend it to others. Good luck!
Zhuoheng