Zifu Wan, Yaqi Xie, Ce Zhang, Zhiqiu Lin, Zihan Wang, Simon Stepputtis, Deva Ramanan, Katia Sycara
Robotics Institute, Carnegie Mellon University
ACL 2025 Main
This dataset contains annotated images and language instructions for task-oriented part segmentation. The dataset is divided into three main folders: all, train, and test.
instructpart_part/InstructPart
βββ all
β βββ data_all.json
β βββ images
β βββ masks
β βββ visualize_overlap
βββ train
β βββ train200
β β βββ images
β β βββ masks
β β βββ data_train.json
β βββ train600
β β βββ images
β β βββ masks
β β βββ data_train.json
β βββ train1200
β β βββ images
β β βββ masks
β β βββ data_train.json
β βββ train1800
β βββ images
β βββ masks
β βββ data_train.json
βββ test
β βββ images
β βββ masks
β βββ data_test.json
βββ README.md
- Total samples: 2400
- Training samples: 1800
- The training set is further divided into 200, 600, 1200, 1800 samples. Specifically, starting with 200 samples, we gradually increase the number of training samples to 600, 1,200, and finally 1,800. Each increment includes all the previously used training samples.
- Testing samples: 600
[
{
"image_path": "image_name.jpg",
"part_list": [
{
"object": "object_name",
"part": "part_name",
"affordance": "affordance_name",
"action": "action_name",
"instruction": [
"human-written-instruction",
"GPT4-rewritten-instruction",
"part-of-object sentence",
"part-of-object-affordance sentence",
"part-of-object",
"part-of-object-affordance"
]
}
]
},
...
]def read_my_data(instruction_id):
val_dataset = "InstructPart|test"
base_image_dir = "xxx"
splits = val_dataset.split("|")
if len(splits) == 2:
ds, split = splits
if ds == "InstructPart":
images = glob.glob(
os.path.join(base_image_dir, "instructpart_part", ds, split, "images", "*.jpg")
)
masks_path = os.path.join(base_image_dir, "instructpart_part", ds, split, "masks")
json_path_reason_part = os.path.join(base_image_dir, "instructpart_part", ds, split, "data_" + split +".json")
reason_part_data = {}
with open(json_path_reason_part, 'r') as file:
data = json.load(file)
for entry in data:
part_list = entry["part_list"]
image_path = entry["image_path"]
for item in part_list:
object_name = item['object'].replace(" ", "_")
part_name = item['part'].replace(" ", "_")
# Generate filenames
base_filename = os.path.splitext(image_path)[0] # Filename without extension
object_part_filename = f"{base_filename}-{object_name}-{part_name}.jpg"
instruction = item["instruction"][instruction_id]
reason_part_data[object_part_filename] = instruction
return reason_part_data, images
instruction_ids = [
0, # human-annotated task instruction
1, # gpt-rewritten task instruction
2, # object-part task instruction
3, # object-part-affordance task instruction
4, # object-part name
5, # object-part-affordance name
]
# get image and instruction content
reason_part_data, images_paths = read_my_data(instruction_ids[i])
for idx in range(len(images_paths)):
image_path = images_paths[idx]
instruction = reason_part_data[os.path.basename(image_path)]
image = Image.open(image_path).convert("RGB")Our model is adapted from LISA. Besides, we appreciate the free-licensed images from Flickr website and AGD20K.
If you have any questions, please contact at [email protected].
If you find this code useful, please consider citing our work:
@inproceedings{
wan2024instructpart,
title={InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning},
author={Wan, Zifu and Xie, Yaqi and Zhang, Ce and Lin, Zhiqiu and Wang, Zihan and Stepputtis, Simon and Ramanan, Deva and Sycara, Katia},
booktitle={The 63rd Annual Meeting of the Association for Computational Linguistics},
year={2025},
url={https://openreview.net/forum?id=IMEr4XgJSZ}
}