alphavideo.model.tubeTK
TubeTK is an end-to-end, one-stage training model for video multi-object tracking. This is the official implementation of the paper "TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model". Detailed training and inference scripts can be found here.
FUNCTION alphavideo.model.tubeTK(num_class=1, pretrain=True)
Build the TubeTK model for multi-object tracking.

Parameters:

- `num_class` (int): Number of object categories the model tracks. At present, `num_class=1` is typically used for pedestrian or car tracking. We only provide a pretrained model for `num_class=1`. By default, `num_class=1`.
- `pretrain` (bool): Whether to load weights pretrained on MOT16. We only provide a pretrained model for `num_class=1`. By default, `pretrain=True`.
Input:

- `img` (tensor): Input frames of the target video, with shape `(batch, channel, time, height, width)`; the example below uses `(1, 3, 8, 896, 1152)`.
- `img_meta` (list of dict): Meta data for the input frames, one dict per video in the batch. Each dict has the form `{'img_shape': [8, 1080, 1920], 'value_range': 1, 'pad_percent': [1, 1]}`, where `img_shape` is the original shape of the input clip before transformations such as resizing or padding (it is used to map predicted coordinates back to the original space); `value_range` is the value the model uses to represent coordinates, e.g. with `value_range=2` the bottom-right corner has coordinate `(2, 2)`; and `pad_percent` is the fraction of the padded frame occupied by real content, e.g. an image of original shape (80, 100) padded to (100, 100) for input should have `pad_percent=[1, 0.8]`.
- `gt_tubes` (list of tensor): Only needed for training, one tensor per video in the batch. Each tensor has shape `(n, 15)`, representing `n` Btubes, each expressed by 15 coordinates (see the example below and the detailed train and evaluation scripts for the exact coordinate layout).
- `gt_labels` (list of tensor): Only needed for training, one tensor per video in the batch. Each tensor has shape `(n, num_class)`, representing `n` one-hot labels.
- `return_loss` (bool): A flag controlling whether to train the model and return the loss (`True`) or run inference and return the Btube list (`False`).
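To make the `pad_percent` convention concrete, here is a small illustrative helper (not part of the alphavideo API) that computes it from the original and padded frame sizes, following the (80, 100) → (100, 100) example above:

```python
def pad_percent(orig_hw, padded_hw):
    """Fraction of the padded frame occupied by real content.

    Returned as [width_fraction, height_fraction], which matches the
    docs' example: (80, 100) padded to (100, 100) gives [1.0, 0.8].
    The ordering is inferred from that example and may need checking
    against the official scripts.
    """
    orig_h, orig_w = orig_hw
    pad_h, pad_w = padded_hw
    return [orig_w / pad_w, orig_h / pad_h]

print(pad_percent((80, 100), (100, 100)))  # [1.0, 0.8]
```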
Output:

- When `return_loss=True`: the output is a dict containing multiple losses: `dict(loss_cls=loss_cls, loss_reg=loss_reg, loss_centerness=loss_centerness, mid_iou_loss=mid_iou_loss)`.
- When `return_loss=False`: the output is a list of results, one element per video in the batch. Each result is itself a list `[tubes, labels, others]`: `tubes` is a list of Btubes with shape `[n, 15]`, just like the input `gt_tubes`; `labels` is a list of labels with shape `[n, num_class]`, just like the input `gt_labels`; `others` holds some intermediate results. For details, please see the detailed train and evaluation scripts here.
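Together, `value_range`, `img_shape`, and `pad_percent` determine how a predicted coordinate maps back to the original video resolution. The sketch below is a pure-Python illustration of that inverse mapping under the conventions described above; it assumes padding sits on the bottom/right edges and is not the library's own implementation (see the official scripts for the exact transform):

```python
def to_original_coords(x, y, img_meta):
    """Map a model-space point (x, y) back to original pixel coordinates.

    Assumes coordinates live in [0, value_range] over the padded frame
    and that padding is applied on the bottom/right (illustrative only).
    """
    vr = img_meta['value_range']
    _, orig_h, orig_w = img_meta['img_shape']   # [T, H, W]
    pad_w, pad_h = img_meta['pad_percent']      # assumed [w_frac, h_frac]
    # normalize to [0, 1], strip the padded margin, scale to pixels
    x_pix = x / vr / pad_w * orig_w
    y_pix = y / vr / pad_h * orig_h
    return x_pix, y_pix

meta = {'img_shape': [8, 1080, 1920], 'value_range': 1, 'pad_percent': [1, 1]}
print(to_original_coords(1, 1, meta))  # (1920.0, 1080.0)
```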
Example:

```python
import torch
import alphavideo

# model
model = alphavideo.model.tubeTK(pretrain=True)
print(model)

# input
images = torch.zeros((1, 3, 8, 896, 1152))
image_meta = [{'img_shape': [8, 1080, 1920],
               'value_range': 1,
               'pad_percent': [1, 1]}]
gt_tubes = [torch.tensor([[0, 0, 0.1, 0.1, 3, 2, 0, 0, 0, 0, -2, 0, 0, 0, 0]])]
gt_labels = [torch.ones((1, 1))]
results = model(images, image_meta, return_loss=False, gt_tubes=gt_tubes, gt_labels=gt_labels)
print(results)
```

Citation:

```
@inproceedings{pang2020tubeTK,
  title={TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model},
  author={Pang, Bo and Li, Yizhuo and Zhang, Yifan and Li, Muchen and Lu, Cewu},
  booktitle={CVPR},
  year={2020}
}
```