This repository provides the data, tools, and code to download, explore, and utilize the TrackVerse dataset.
The TrackVerse dataset is a large-scale collection of 31.9 million object tracks, each capturing the motion and appearance of an object over time. These tracks are automatically extracted from YouTube videos using state-of-the-art object detection (Detic) and tracking (ByteTrack) algorithms. The dataset spans 1203 object categories from the LVIS ontology, ensuring a diverse and long-tailed distribution of object classes.
TrackVerse is designed to ensure object-centricity, class diversity, and rich object motions and states. Each track is enriched with metadata, including bounding boxes, timestamps, and prediction labels, making it a valuable resource for research in object-centric representation learning, video analysis, and robotics.
In our paper, we explore the use of TrackVerse for learning unsupervised image representations. By introducing natural temporal augmentations, i.e., viewing an object across time and motion, TrackVerse enables models to learn fine-grained, state-aware representations that are more sensitive to object transformations and behaviors (see the paper for details).
🎁 Bonus: Our fully automated object track collection pipeline can be easily scaled up without any manual annotation. You can also create your own customized dataset of object tracks using different vocabularies, source videos, or curation strategies.
- [Oct 2025] Our fully automated object track collection pipeline is now publicly released!
- [July 2025] TrackVerse dataset and download scripts are now publicly released!
- [June 2025] 🎉 Our paper TrackVerse has been accepted to ICCV 2025 🌺
Stay tuned for future updates and improvements!
TrackVerse is released as a collection of object track metadata stored in JSONL files, where each line represents a single track with the following fields:
Metadata keys:
- `track_id`: Unique ID for the track
- `track_ts`: Start and end timestamps of the track (seconds) in the original video
- `frame_ts`: Timestamps for each frame in the track (seconds) in the original video
- `frame_bboxes`: Bounding boxes `[x, y, width, height]` for each frame
- `yid`: YouTube video ID
- `track_mp4_filename`: Local filename of the track video
- `top10_label_ids`: Top-10 predicted class IDs
- `top10_label_names`: Top-10 predicted class names
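As a quick illustration, the snippet below reads tracks from one of these JSONL files. The filename is a placeholder, and the field accesses follow the keys listed above:

```python
import json

# Placeholder path; substitute the JSONL file for the subset you downloaded.
jsonl_path = "trackverse_subset.jsonl"

with open(jsonl_path, "r") as f:
    for line in f:
        track = json.loads(line)
        start_s, end_s = track["track_ts"]     # track span in the source video (seconds)
        x, y, w, h = track["frame_bboxes"][0]  # bounding box of the first frame
        print(track["track_id"], track["yid"], track["top10_label_names"][0])
```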
To support diverse research needs, we provide the full TrackVerse dataset, curated subsets at various scales to ensure more balanced class distributions, and a human-verified validation set for in-domain evaluation:
| Subset | #Tracks | Max Tracks per Class | Link |
|---|---|---|---|
| Full TrackVerse | 31.9M | --- | Coming soon. |
| 82K-CB100 | 82K | 100 | 🤗 Link |
| 184K-CB300 | 184K | 300 | 🤗 Link |
| 259K-CB500 | 259K | 500 | 🤗 Link |
| 392K-CB1000 | 392K | 1000 | 🤗 Link |
| 1121K-CB2500 | 1.1M | 2500 | 🤗 Link |
| 3778K-CB8000 | 3.8M | 8000 | 🤗 Link |
| Validation Set | 4188 | 6 | Link |
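As a minimal sketch, assuming each subset is published as a JSONL file on the Hugging Face Hub, the metadata can be fetched with `huggingface_hub`. The repo ID and filename below are hypothetical placeholders; use the actual values from the links above:

```python
from huggingface_hub import hf_hub_download

# Hypothetical repo ID and filename; replace with the values from the table above.
jsonl_path = hf_hub_download(
    repo_id="MMPLab/TrackVerse",  # placeholder dataset repo
    filename="82K-CB100.jsonl",   # placeholder subset file
    repo_type="dataset",
)
print("Downloaded metadata to", jsonl_path)
```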
For detailed instructions on extracting TrackVerse from the JSONL files, refer to the download guide.
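The download guide is the authoritative reference; as a rough sketch of what extraction involves, the snippet below fetches a source video with yt-dlp and cuts out one track's time span with ffmpeg. Both tools are assumed to be installed and on your PATH, and the helper name is hypothetical:

```python
import subprocess

def extract_track_clip(track: dict, out_dir: str = "tracks") -> str:
    """Hypothetical helper: fetch the source video and cut out one track's span."""
    src = f"{track['yid']}.mp4"
    # Download the full source video (assumes yt-dlp is installed).
    subprocess.run(
        ["yt-dlp", "-f", "mp4", "-o", src,
         f"https://www.youtube.com/watch?v={track['yid']}"],
        check=True,
    )
    start_s, end_s = track["track_ts"]
    out_path = f"{out_dir}/{track['track_mp4_filename']}"
    # Re-encode while cutting so the clip starts exactly at start_s.
    subprocess.run(
        ["ffmpeg", "-i", src, "-ss", str(start_s), "-to", str(end_s), out_path],
        check=True,
    )
    return out_path
```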
You can also create your own customized dataset of object tracks, for example by using a different vocabulary, different source videos, or a different curation strategy.
- Set Up the Environment: Refer to the install guidelines for detailed instructions.
- Clone the Repository:
  ```bash
  git clone --recurse-submodules https://github.com/MMPLab/TrackVerse.git
  ```
- Follow the Pipeline: Follow the detailed steps outlined in our pipeline documentation.
For support or inquiries, please open a GitHub issue. If you have questions about technical details or need further assistance, feel free to reach out to us directly.
All code and data in this repo are available under the MIT License for research purposes only.
Please consider giving a star ⭐ and citing our paper if you find this repo useful:
```bibtex
@InProceedings{Wei_2025_ICCV,
    author    = {Wei, Yibing and Church, Samuel and Suciu, Victor and Lin, Jinhong and Wu, Cheng-En and Morgado, Pedro},
    title     = {TrackVerse: A Large-Scale Object-Centric Video Dataset for Image-Level Representation Learning},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
    pages     = {11153-11163}
}
```