World Model for Language Model

Official code for "Language Models Meet World Models: Embodied Experiences Enhance Language Models" (NeurIPS 2023). Also check out our Twitter.

Setting Up

Install the dependencies with:

pip install -r requirements.txt

E2WM Benchmark

All constructed training data from embodied experiences and evaluation data are released in /data/train and /data/eval, respectively.

Data Statistics

Training

Task	Size
Plan Generation	1659
Activity Recognition	1659
Counting	1000
Object Path Tracking	1000

Evaluation

Task	Size
Plan Generation
- Vanilla Seen	175
- Vanilla Unseen	54
- Confusing Seen	135
- Confusing Unseen	43
Housework QA	261
Negation Housework QA	162
Activity Recognition QA	549
Activity Inference QA	262
Counting QA	194
Object Path Tracking	200
Object Location QA	200

Train & Eval

We compute the Fisher matrices on 20,000 examples sampled from the Pile validation set. You can download fisher-matrix-1.3B and fisher-matrix-6B from the Hugging Face model hub and put them under the fisher-matrix directory.
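As a rough illustration of what a diagonal Fisher matrix holds, the sketch below estimates one for a toy logistic-regression model in plain Python. It is not the code in compute_fisher_matrix.py: the toy model, the function names (diagonal_fisher, ewc_penalty), and the strength parameter are all assumptions made for this example. The "ewc" in the output directory name suggests the matrices feed an elastic weight consolidation (EWC) style penalty; the last function sketches that idea under that assumption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood_grad(weights, features, label):
    """Gradient of log p(label | features) for a toy logistic-regression model."""
    p = sigmoid(sum(w * x for w, x in zip(weights, features)))
    return [(label - p) * x for x in features]


def diagonal_fisher(weights, examples):
    """Empirical diagonal Fisher: mean squared per-example gradient, one value per parameter."""
    fisher = [0.0] * len(weights)
    for features, label in examples:
        grad = log_likelihood_grad(weights, features, label)
        for i, g in enumerate(grad):
            fisher[i] += g * g
    return [f / len(examples) for f in fisher]


def ewc_penalty(weights, anchor_weights, fisher, strength=1.0):
    """EWC-style quadratic penalty (assumed form): 0.5 * strength * sum_i F_i * (w_i - a_i)^2."""
    return 0.5 * strength * sum(
        f * (w - a) ** 2 for f, w, a in zip(fisher, weights, anchor_weights)
    )


if __name__ == "__main__":
    # Hypothetical toy data: (features, label) pairs.
    examples = [([1.0, 2.0], 1), ([1.0, 0.0], 0)]
    anchor = [0.0, 0.0]
    fisher = diagonal_fisher(anchor, examples)
    print("diagonal Fisher:", fisher)
    print("penalty:", ewc_penalty([1.0, 1.0], anchor, fisher, strength=2.0))
```

Parameters with a large Fisher value are the ones the penalty holds closest to their original values, which is why the estimate is computed on held-out general text rather than on the new training data.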

Then go to the scripts directory where you can find all the training and evaluation scripts:

cd scripts

We provide usage examples for GPT-J-6B below. To use GPT-Neo-1.3B instead, replace 6B in the script name with 1.3B.

Train

If you want to train GPT-J-6B, use:

sh run_6B.sh

This script trains GPT-J-6B on a single GPU.

If you want to do distributed training, first run this:

accelerate config --config_file accelerate_config.json

and follow the instructions to set up the config file. (We also provide a sample config file in scripts)

Then, you can simply run:

sh run_6B_multi_gpu.sh

Eval

To do evaluation on QA tasks and generation tasks, run

sh eval_qa_6B.sh

and

sh eval_gen_6B.sh

The results will be stored in output/ewc-lora-6B/qa-metric.txt and output/ewc-lora-6B/gen-metric.txt, respectively.

If you want to do distributed evaluation for generation tasks, modify eval_gen_6B.sh as follows:

Replace python eval_gen.py with accelerate launch --config_file accelerate_config.json eval_gen.py
Remove export CUDA_VISIBLE_DEVICES=0

This is the same change that turns run_6B.sh into run_6B_multi_gpu.sh.
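The two edits can also be applied programmatically. The helper below is a minimal sketch that rewrites the text of a launch script accordingly; the function name to_distributed and the sample script contents are hypothetical, since the real contents of eval_gen_6B.sh are not reproduced here.

```python
def to_distributed(script_text, config_file="accelerate_config.json"):
    """Return script_text with the two distributed-evaluation edits applied."""
    launcher = f"accelerate launch --config_file {config_file} eval_gen.py"
    kept = []
    for line in script_text.splitlines():
        # Edit 2: drop the single-GPU pin.
        if line.strip() == "export CUDA_VISIBLE_DEVICES=0":
            continue
        # Edit 1: swap the plain Python call for the accelerate launcher.
        kept.append(line.replace("python eval_gen.py", launcher))
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical stand-in for the script; the real file may differ.
    sample = 'export CUDA_VISIBLE_DEVICES=0\npython eval_gen.py "$@"\n'
    print(to_distributed(sample), end="")
```

Lines that match neither pattern pass through unchanged, so any arguments after eval_gen.py are preserved.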

Citation

@article{xiang2023language,
  title={Language Models Meet World Models: Embodied Experiences Enhance Language Models},
  author={Xiang, Jiannan and Tao, Tianhua and Gu, Yi and Shu, Tianmin and Wang, Zirui and Yang, Zichao and Hu, Zhiting},
  journal={arXiv preprint arXiv:2305.10626},
  year={2023}
}
