
Nice work! And several questions about the paper #3

@StarCycle

Hi @omeryagmurlu @mbreuss,

This is indeed nice work! Many thanks for making it open-source so quickly! I am the author of a similar repository on the CALVIN benchmark: GR1-Training. However, it takes more training time...

I have several questions about your paper:

  • What history length do you use, i.e., how many historical frames are sent to the visual encoder? GR1 uses a history length of 10. In my experiments, I found that the success rate increased as the history length grew (see the frame-stacking sketch below for what I mean). I guess CALVIN has sufficient data to train the network and avoid the common causal confusion problem.
  • Did you try to train MDT using only the data with language annotations? Recent works like GR1 and 3D Diffuser Actor use only a small portion of the CALVIN data (just the language-annotated subset). Although MDT's strength is training on a combination of data with and without language annotations (which is of course nice), training it on only the language-annotated data may be a useful ablation; a filtering sketch is included below.
  • Did you try to turn off the diffusion and let the GPT-Diffusion policy be just a plain GPT policy? Sometimes a diffusion action head is not necessary, as shown by BAKU, which is also trained on LIBERO-90. When you are predicting an action chunk, the multimodality of the action distribution is already reduced (a little bit); see the deterministic-head sketch below.
  • Fig. 4 and Table IX seem to reach different conclusions, even though both are on the CALVIN ABCD->D task: Fig. 4 shows that the CLA loss is more important than the MGF loss on this task, while Table IX shows the MGF loss is more important.

[Screenshots: Fig. 4 and Table IX from the paper]
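
To make the first question concrete, here is a minimal sketch of what I mean by "history length": the last T frames are stacked and encoded jointly. Everything here (`FrameEncoder`, the tiny CNN backbone, the GRU fuser) is my own illustration, not MDT's or GR1's actual architecture.

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Encodes a window of T historical frames into one context vector."""
    def __init__(self, history_len: int = 10, embed_dim: int = 256):
        super().__init__()
        self.history_len = history_len
        # Tiny per-frame CNN; a real model would use a ViT / ResNet here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Fuse the T per-frame embeddings into a single history feature.
        self.fuse = nn.GRU(embed_dim, embed_dim, batch_first=True)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) with T == history_len
        b, t, c, h, w = frames.shape
        per_frame = self.backbone(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, last_hidden = self.fuse(per_frame)  # last_hidden: (1, B, embed_dim)
        return last_hidden.squeeze(0)          # (B, embed_dim)

enc = FrameEncoder(history_len=10)
obs = torch.randn(2, 10, 3, 128, 128)  # a batch of 10-frame histories
print(enc(obs).shape)                  # torch.Size([2, 256])
```

With T = 1 this degenerates to a single-frame policy; my question is just how large T is in MDT.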
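
For the second question, the ablation I have in mind simply filters training to the annotated windows. This is a rough sketch assuming the usual CALVIN layout, where `lang_annotations/auto_lang_ann.npy` maps (start, end) window indices to instructions; the exact paths and keys may differ across dataset versions, so treat them as assumptions.

```python
from pathlib import Path
import numpy as np

def lang_annotated_windows(split_dir: str):
    """Return [(start, end, instruction), ...] for annotated windows only."""
    ann_path = Path(split_dir) / "lang_annotations" / "auto_lang_ann.npy"
    ann = np.load(ann_path, allow_pickle=True).item()
    windows = ann["info"]["indx"]   # list of (start_idx, end_idx) pairs
    texts = ann["language"]["ann"]  # parallel list of instruction strings
    return [(s, e, t) for (s, e), t in zip(windows, texts)]

# Train on these windows only, instead of the full play data:
# for start, end, text in lang_annotated_windows("calvin_data/training"):
#     ...
```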
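
Finally, for the third question, the "no diffusion" variant I mean would swap the diffusion action head for a plain regression head that emits the whole action chunk in one forward pass and is trained with MSE. Again, the module names below are illustrative, not MDT's actual code.

```python
import torch
import torch.nn as nn

class DeterministicChunkHead(nn.Module):
    """Maps a transformer context vector to a (chunk_len, act_dim) action chunk."""
    def __init__(self, ctx_dim: int = 512, chunk_len: int = 10, act_dim: int = 7):
        super().__init__()
        self.chunk_len, self.act_dim = chunk_len, act_dim
        self.mlp = nn.Sequential(
            nn.Linear(ctx_dim, 512), nn.GELU(),
            nn.Linear(512, chunk_len * act_dim),
        )

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        # ctx: (B, ctx_dim) -> actions: (B, chunk_len, act_dim)
        return self.mlp(ctx).view(-1, self.chunk_len, self.act_dim)

head = DeterministicChunkHead()
ctx = torch.randn(4, 512)              # stand-in for the GPT backbone output
pred = head(ctx)
target = torch.randn(4, 10, 7)
loss = nn.functional.mse_loss(pred, target)  # single-mode regression loss
```

The obvious caveat is that MSE regresses toward the mean of a multimodal action distribution, and chunking only partially mitigates that, which is exactly why I am curious whether the diffusion head matters here.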

Again, it's very nice work! I starred it and would recommend it to others. Good luck!

Zhuoheng
