Add SSDlite architecture with MobileNetV3 backbones#3757
datumbox merged 27 commits into pytorch:master
Conversation
fmassa
left a comment
Looks great, thanks a lot Vasilis!
I have a couple of comments, let me know what you think
```python
# Enable [-1, 1] rescaling and reduced tail if no pretrained backbone is selected
rescaling = reduce_tail = not pretrained_backbone
```
This is a bit confusing, but I assume the [-1, 1] rescaling is necessary to get best results given the current settings?
That is correct. Rescaling was part of the changes needed to boost the accuracy by 1mAP.
```python
backbone = _mobilenet_extractor("mobilenet_v3_large", progress, pretrained_backbone, trainable_backbone_layers,
                                norm_layer, rescaling, _reduced_tail=reduce_tail, _width_mult=1.0)
```
```python
size = (320, 320)
```
This means that the size is hard-coded, and even if the user passes a different `size` via `**kwargs` in the constructor it won't be used?
What about doing something like `size = kwargs.get("size", (320, 320))` instead, so that users can customize the input size if they wish?
I chose to hardcode it because this is the ssdlite320 model, which uses a fixed 320x320 input size. The input size is much less flexible on SSD models compared to FasterRCNN, because they make a few strong assumptions about the input.
If someone wants to use a different size, it would be simpler to create the backbone, configure the DefaultBoxGenerator and then initialize the SSD directly with the config of their choice. Overall I felt that this approach would be simpler than trying to offer an API that covers all user needs.
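The override pattern suggested above can be sketched in plain Python. The function below is an illustrative stand-in for the constructor under discussion, not the actual torchvision implementation; it only shows how a `size` kwarg could default to 320x320 while remaining overridable:

```python
def ssdlite320_config(**kwargs):
    # Illustrative stand-in: keep 320x320 as the default input size,
    # but let callers override it via kwargs if they wish.
    size = kwargs.pop("size", (320, 320))
    return {"size": size, **kwargs}

print(ssdlite320_config()["size"])                  # (320, 320)
print(ssdlite320_config(size=(512, 512))["size"])   # (512, 512)
```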
```python
kwargs = {**defaults, **kwargs}
model = SSD(backbone, anchor_generator, size, num_classes,
            head=SSDLiteHead(out_channels, num_anchors, num_classes, norm_layer),
            image_mean=[0., 0., 0.], image_std=[1., 1., 1.], **kwargs)
```
Hum, interesting.
I would have expected that we could remove the rescaling part and instead change the mean/std here to `image_mean=0.5, image_std=0.5`, but I assume this wasn't done because padded regions would end up with a different value than you would have liked, is that correct?
Also, this probably means that even a pretrained backbone wouldn't give good results, because you are passing a non-default image mean/std.
In this case, might it be better to disable passing a pretrained backbone altogether?
Correct. Here I'm trying to stay as close to the canonical implementation as possible, which helped me close the gap in accuracy.
You are right that a pretrained backbone would need a different mean/std. Thankfully, because our setup trains end-to-end and uses extensive BN, the backbone adapts to the different input fairly quickly even when one uses pre-trained weights. In the end, since I trained for quite a few epochs, starting from random weights led to a better result (a common finding in similar setups).
Though it might indeed be simpler for the API to disable passing a pretrained backbone, that would make SSDlite's API different from every other model's. It would also create issues with our training scripts, which expect to be able to pass this parameter. To better address this remark, I will make the mean/std configurable.
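The equivalence discussed in this thread can be checked numerically. The `normalize` helper below is a hand-rolled sketch of per-channel input normalization, not the actual torchvision transform:

```python
def normalize(x, mean, std):
    # Sketch of per-channel normalization applied to pixels in [0, 1].
    return (x - mean) / std

# image_mean=0.5, image_std=0.5 would map [0, 1] pixels onto [-1, 1]:
assert normalize(0.0, 0.5, 0.5) == -1.0
assert normalize(1.0, 0.5, 0.5) == 1.0

# image_mean=0., image_std=1. (as passed in this PR) is a no-op,
# leaving the [-1, 1] rescaling to happen inside the backbone instead:
assert normalize(0.25, 0.0, 1.0) == 0.25
```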
fmassa
left a comment
Thanks for the answers Vasilis!
We chatted offline; let's get this PR merged and then follow up by creating a few issues to investigate some of the points I brought up.
Summary:
* Partial implementation of SSDlite.
* Add normal init and BN hyperparams.
* Refactor to keep JIT happy.
* Completed SSDlite.
* Fix lint.
* Update todos.
* Add expected file in repo.
* Use C4 expansion instead of C4 output.
* Change scales formula for Default Boxes.
* Add cosine annealing on trainer.
* Make T_max count epochs.
* Fix test and handle corner-case.
* Add support of width_mult.
* Add ssdlite presets.
* Change ReLU6, [-1, 1] rescaling, backbone init & no pretraining.
* Use _reduced_tail=True.
* Add sync BN support.
* Adding the best config along with its weights and documentation.
* Make mean/std configurable.
* Fix "not implemented for half" exception.

Reviewed By: cpuhrsch
Differential Revision: D28538769
fbshipit-source-id: df6c2e79b76e6d6297aa51ca0ff4535dc59eaf9b
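One of the summary items changes the scales formula for the default boxes. As a point of reference, the SSD paper spaces the per-feature-map scales linearly between a minimum and maximum ratio; the sketch below illustrates that formula, with `scale_min`/`scale_max` defaults that are illustrative and may differ from the values this PR ends up using:

```python
def ssd_scales(num_feature_maps, scale_min=0.2, scale_max=0.95):
    # Linearly spaced default-box scales, one per feature map,
    # following the formula from the SSD paper (defaults illustrative).
    if num_feature_maps == 1:
        return [scale_min]
    step = (scale_max - scale_min) / (num_feature_maps - 1)
    return [scale_min + step * k for k in range(num_feature_maps)]

print([round(s, 2) for s in ssd_scales(6)])  # [0.2, 0.35, 0.5, 0.65, 0.8, 0.95]
```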
```python
get_depth = lambda d: max(min_depth, int(d * width_mult))  # noqa: E731
extra = nn.ModuleList([
```
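The `get_depth` lambda in the snippet above is the standard MobileNet-style width-multiplier clamp: scale a channel count by `width_mult`, but never drop below a minimum depth. A self-contained sketch (the parameter defaults here are illustrative, not necessarily the PR's values):

```python
def get_depth(d, width_mult=1.0, min_depth=16):
    # Scale the channel count by the width multiplier,
    # clamping to min_depth so narrow models stay functional.
    return max(min_depth, int(d * width_mult))

print(get_depth(512, width_mult=0.5))  # 256
print(get_depth(24, width_mult=0.5))   # 16 (clamped to min_depth)
```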
@datumbox could you please help me figure this out? I cannot find any info about these extra layers in the papers. Where did you get them from?
I'm trying to create a modification of this model and am struggling to understand it; any help would be appreciated!
I want to reduce the number of encoder layers so the feature maps can detect small objects.
@datumbox Thank you for the quick reply! It's very helpful
@datumbox in section 6.3 of the MobileNetV3 paper, I only see info on connecting the C4 and C5 layers to the SSD head. There is nothing about these extra layers there.
Have you checked the reference code I sent? This comes from their official repo.
Yes, I see that in the TensorFlow implementation.
What I'm trying to understand is: if I reduce the depth at C4 (and thus the output stride, to target very small objects), how should I change the rest of the layers?
Sorry, it's been quite some time since I wrote the implementation. I think you will need to dig into the original research repo to get the details.
Resolves #1422, fixes #3757
This PR implements SSDlite with MobileNetV3 backbone as outlined in the papers [1] and [2].
Trained using the code committed at 8aa3f58. The current best pre-trained model was trained with (using the latest git hash):
Submitted batch job 40959060, 41037042, 41046786
Accuracy metrics at 4ca472e:
Validated with:
Speed benchmark:
0.09 sec per image on CPU