DSVT-P training on KITTI Dataset

Hi,

I tried to train a DSVT-Pillar model on the KITTI dataset. Below is my config:

    CLASS_NAMES: ['Car', 'Pedestrian', 'Cyclist']

    DATA_CONFIG: 
        _BASE_CONFIG_: cfgs/dataset_configs/kitti_dataset.yaml
        POINT_CLOUD_RANGE: [0, -40, -3, 70.4, 40, 1]
    
        DATA_AUGMENTOR:
            DISABLE_AUG_LIST: ['placeholder']
            AUG_CONFIG_LIST:
                - NAME: gt_sampling
                  USE_ROAD_PLANE: True
                  DB_INFO_PATH:
                      - kitti_dbinfos_train.pkl
                  PREPARE: {
                     filter_by_min_points: ['Car:5', 'Pedestrian:5', 'Cyclist:5'],
                     filter_by_difficulty: [-1],
                  }
    
                  SAMPLE_GROUPS: ['Car:15','Pedestrian:15', 'Cyclist:15']
                  NUM_POINT_FEATURES: 4
                  REMOVE_EXTRA_WIDTH: [0.0, 0.0, 0.0]
                  LIMIT_WHOLE_SCENE: True
    
                - NAME: random_world_flip
                  ALONG_AXIS_LIST: ['x','y']
    
                - NAME: random_world_rotation
                  WORLD_ROT_ANGLE: [-0.78539816, 0.78539816]
    
                - NAME: random_world_scaling
                  WORLD_SCALE_RANGE: [0.95, 1.05]
                - NAME: random_world_translation
                  NOISE_TRANSLATE_STD: [0.5, 0.5, 0.5]
        DATA_PROCESSOR:
          -   NAME: mask_points_and_boxes_outside_range
              REMOVE_OUTSIDE_BOXES: True
          -   NAME: shuffle_points
              SHUFFLE_ENABLED: {
                'train': True,
                'test': False
              }
          -   NAME: transform_points_to_voxels_placeholder
              VOXEL_SIZE: [ 0.1505, 0.1709, 4 ]
    
    MODEL:
      NAME: CenterPoint
    
      VFE:
        NAME: DynPillarVFE3D
        WITH_DISTANCE: False
        USE_ABSLOTE_XYZ: True
        USE_NORM: True
        NUM_FILTERS: [192, 192]
    
      BACKBONE_3D:
        NAME: DSVT
        INPUT_LAYER:
          sparse_shape: [468, 468, 1]
          downsample_stride: []
          d_model: [192]
          set_info: [[36, 4]]
          window_shape: [[12, 12, 1]]
          hybrid_factor: [2, 2, 1] # x, y, z
          shifts_list: [[[0, 0, 0], [6, 6, 0]]]
          normalize_pos: False
        
        block_name: ['DSVTBlock']
        set_info: [[36, 4]]
        d_model: [192]
        nhead: [8]
        dim_feedforward: [384]
        
        
        dropout: 0.0 
        activation: gelu
        reduction_type: 'attention'
        output_shape: [468, 468]
        conv_out_channel: 192
        # ues_checkpoint: True
    
      MAP_TO_BEV:
        NAME: PointPillarScatter3d
        INPUT_SHAPE: [468, 468, 1]
        NUM_BEV_FEATURES: 192
    
      BACKBONE_2D:
        NAME: BaseBEVResBackbone
        LAYER_NUMS: [ 1, 2, 2 ]
        LAYER_STRIDES: [ 1, 2, 2 ]
        NUM_FILTERS: [ 128, 128, 256 ]
        UPSAMPLE_STRIDES: [ 1, 2, 4 ]
        NUM_UPSAMPLE_FILTERS: [ 128, 128, 128 ]
    
      DENSE_HEAD:
        NAME: CenterHead
        CLASS_AGNOSTIC: False
    
        CLASS_NAMES_EACH_HEAD: [
          ['Car', 'Pedestrian', 'Cyclist']
        ]
    
        SHARED_CONV_CHANNEL: 64
        USE_BIAS_BEFORE_NORM: False
        NUM_HM_CONV: 2
    
        BN_EPS: 0.001
        BN_MOM: 0.01
        SEPARATE_HEAD_CFG:
          HEAD_ORDER: ['center', 'center_z', 'dim', 'rot']
          HEAD_DICT: {
            'center': {'out_channels': 2, 'num_conv': 2},
            'center_z': {'out_channels': 1, 'num_conv': 2},
            'dim': {'out_channels': 3, 'num_conv': 2},
            'rot': {'out_channels': 2, 'num_conv': 2},
            'iou': {'out_channels': 1, 'num_conv': 2},
          }
    
        TARGET_ASSIGNER_CONFIG:
          FEATURE_MAP_STRIDE: 1
          NUM_MAX_OBJS: 500
          GAUSSIAN_OVERLAP: 0.1
          MIN_RADIUS: 2
    
        IOU_REG_LOSS: True
    
        LOSS_CONFIG:
          LOSS_WEIGHTS: {
            'cls_weight': 1.0,
            'loc_weight': 2.0,
            'code_weights': [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
          }
    
        POST_PROCESSING:
          SCORE_THRESH: 0.5
          POST_CENTER_LIMIT_RANGE: [-80, -80, -10.0, 80, 80, 10.0]
          MAX_OBJ_PER_SAMPLE: 500
    
          USE_IOU_TO_RECTIFY_SCORE: True
          IOU_RECTIFIER: [0.68, 0.71, 0.65]
    
    
          NMS_CONFIG:
            # NMS_TYPE: multi_class_nms  # only for centerhead, use mmdet3d version nms
            # NMS_THRESH: [0.7, 0.6, 0.55]
            # NMS_PRE_MAXSIZE: [4096, 4096, 4096]
            # NMS_POST_MAXSIZE: [500, 500, 500]
            
            NMS_TYPE: nms_gpu 
            NMS_THRESH: 0.1
            NMS_PRE_MAXSIZE: 4096
            NMS_POST_MAXSIZE: 500
    
      POST_PROCESSING:
        RECALL_THRESH_LIST: [0.3, 0.5, 0.7]
    
        EVAL_METRIC: kitti
    
    OPTIMIZATION:
        BATCH_SIZE_PER_GPU: 1
        NUM_EPOCHS: 20
    
        OPTIMIZER: adam_onecycle
        LR: 0.001
        WEIGHT_DECAY: 0.01
        MOMENTUM: 0.9
    
        MOMS: [0.95, 0.85]
        PCT_START: 0.4
        DIV_FACTOR: 10
        DECAY_STEP_LIST: [35, 45]
        LR_DECAY: 0.1
        LR_CLIP: 0.0000001
    
        LR_WARMUP: False
        WARMUP_EPOCH: 1
        
        GRAD_NORM_CLIP: 10
        LOSS_SCALE_FP16: 32.0
    
    HOOK:
      DisableAugmentationHook:
        DISABLE_AUG_LIST: ['gt_sampling','random_world_flip','random_world_rotation','random_world_scaling', 'random_world_translation']
        NUM_LAST_EPOCHS: 1



I only modified the point cloud range to match the KITTI settings and the voxel size to match the default sparse shape [468, 468, 1], but I keep getting this error:

    RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
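
As a sanity check on the voxel size mentioned above, the arithmetic can be sketched as follows. This assumes the grid size is obtained by dividing the extent of the point cloud range by the voxel size and rounding to the nearest integer; `grid_size` is a hypothetical helper written for this post, not a function from the codebase.

```python
# Values copied from the config above.
point_cloud_range = [0, -40, -3, 70.4, 40, 1]   # [x_min, y_min, z_min, x_max, y_max, z_max]
voxel_size = [0.1505, 0.1709, 4]                # [dx, dy, dz]


def grid_size(pc_range, voxel):
    """Hypothetical helper: voxels per axis = round((max - min) / voxel_size).

    Assumption: the framework rounds to the nearest integer here.
    """
    return [round((pc_range[i + 3] - pc_range[i]) / voxel[i]) for i in range(3)]


print(grid_size(point_cloud_range, voxel_size))  # prints [468, 468, 1]
```

Under that assumption the grid works out to [468, 468, 1], so the voxel size itself seems consistent with `sparse_shape` and `INPUT_SHAPE`.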

I traced the error to the DynamicPillarVFE3D module, where batch_dict['points'] is often an empty tensor. However, when I use the default point cloud range from the Waymo settings, [-74.88, -74.88, -2, 74.88, 74.88, 4.0], the error disappears. Could you give me some guidance?
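
A minimal sketch of how a frame could end up with no points under the KITTI range but not under the Waymo range. This is only an illustration on synthetic data, not a confirmed diagnosis. It assumes that one of the augmentations can mirror the scene so that x becomes negative, and that points outside the range are then masked out; `count_in_range` and `flip_x` are hypothetical helpers, not functions from the codebase.

```python
import random

KITTI_RANGE = [0, -40, -3, 70.4, 40, 1]                 # x starts at 0 (front-facing only)
WAYMO_RANGE = [-74.88, -74.88, -2, 74.88, 74.88, 4.0]   # symmetric around the sensor


def count_in_range(points, pc_range):
    """Hypothetical helper: count points inside the axis-aligned range box."""
    x0, y0, z0, x1, y1, z1 = pc_range
    return sum(
        1 for x, y, z in points
        if x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1
    )


def flip_x(points):
    """Hypothetical helper: mirror the scene by negating x (assumed flip behaviour)."""
    return [(-x, y, z) for x, y, z in points]


random.seed(0)
# Synthetic front-facing points that lie inside both ranges before any flip.
points = [
    (random.uniform(1, 70), random.uniform(-39, 39), random.uniform(-1.9, 0.9))
    for _ in range(1000)
]

print("KITTI range, no flip:", count_in_range(points, KITTI_RANGE))
print("KITTI range, flipped:", count_in_range(flip_x(points), KITTI_RANGE))
print("Waymo range, flipped:", count_in_range(flip_x(points), WAYMO_RANGE))
```

In this toy setup the flipped frame keeps no points inside the KITTI range, while the symmetric Waymo range still contains all of them. That would be consistent with the behaviour described above, if the real pipeline flips the scene this way.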

Thank you!




