This paper synthesizes insights from recent research on 3D object detection and recognition in digital images. The surveyed literature showcases advancements in deep learning architectures, sensor fusion techniques, real-time processing, robustness to occlusion, and domain adaptation. Notably, the integration of point cloud data in deep learning models enhances accuracy, while sensor fusion improves reliability in diverse lighting conditions. Optimized real-time processing, multi-view systems, and domain adaptation methods address specific challenges, contributing to the field's progress. Standard metrics and benchmark evaluations validate the effectiveness of proposed methodologies, highlighting their potential for real-world applications.
Electronic Imaging, 2020
The performance of autonomous agents in both commercial and consumer applications increases along with their situational awareness. Tasks such as obstacle avoidance, agent-to-agent interaction, and path planning depend directly on the ability to convert sensor readings into scene understanding. Central to this is the ability to detect and recognize objects. Many object detection methodologies operate on a single modality such as vision or LiDAR. Camera-based object detection models benefit from an abundance of feature-rich information for classifying different types of objects. LiDAR-based object detection models use sparse point clouds, where each point carries an accurate 3D position on an object's surface. Camera-based methods lack accurate object-to-lens distance measurements, while LiDAR-based methods lack dense, feature-rich detail. By utilizing information from both camera and LiDAR sensors, advanced object detection and identification is possible. In this work, we introduce a deep learning framework that fuses these modalities to produce a robust real-time 3D bounding box object detection network. We present qualitative and quantitative analyses of the proposed fusion model on the popular KITTI dataset.
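The geometric step that camera-LiDAR fusion pipelines like this one share is projecting each LiDAR point into the image plane so that point and pixel features can be associated. A minimal sketch follows, assuming KITTI-style calibration inputs; the function name and argument layout are illustrative, not the paper's API.

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project LiDAR points (N, 3) into image pixel coordinates.

    points      : (N, 3) xyz in the LiDAR frame
    T_cam_lidar : (4, 4) rigid transform from the LiDAR to the camera frame
    K           : (3, 3) camera intrinsic matrix
    Returns (N, 2) pixel coordinates and a mask of points in front of the camera.
    """
    # Homogeneous coordinates, then into the camera frame.
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 0.1            # keep points ahead of the lens
    # Perspective projection through the intrinsics.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    return uv, in_front
```

Points that survive the mask can then be paired with image features at their pixel locations before being passed to a detection head.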
Object detection in 3D point clouds has recently been in high demand because of the spatial geometric structure of the data. Over the past decade, deep learning has achieved enormous success in object detection for 2D images, whereas understanding 3D point cloud data remains relatively immature. Nevertheless, deep learning architectures are emerging that can process unstructured point data and automatically learn to classify 3D sensed data. This survey explains the current deep learning approaches and their architectural designs. Finally, the performance, advantages, and disadvantages of the different approaches are presented, with a detailed discussion of where promising future research would be most valuable.
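A recurring idea in the architectures such surveys cover is to process raw, unordered points with a shared per-point network followed by a symmetric pooling step, which makes the output invariant to point order. Below is a minimal PointNet-style sketch in PyTorch; the class name and layer sizes are our assumptions, not any specific surveyed model.

```python
import torch
import torch.nn as nn

class PointNetLite(nn.Module):
    """Minimal PointNet-style classifier: a shared per-point MLP followed by
    a symmetric max-pool, making the network invariant to point ordering."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.mlp = nn.Sequential(           # applied to every point independently
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, pts):                 # pts: (batch, num_points, 3)
        feat = self.mlp(pts)                # (batch, num_points, 256)
        glob = feat.max(dim=1).values       # order-invariant global feature
        return self.head(glob)              # class logits

logits = PointNetLite()(torch.randn(2, 1024, 3))  # two clouds of 1024 points
```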
Intelligent Computing Methodologies
For camera-LiDAR-based three-dimensional (3D) object detection, image features offer rich texture descriptions while LiDAR features carry objects' 3D information. To fully fuse view-specific feature maps, this paper explores the two-directional fusion of arbitrary-size camera and LiDAR feature maps in the early feature extraction stage. Towards this target, a deep dense fusion 3D object detection framework (DDF3D) is proposed for autonomous driving. It is a two-stage, end-to-end learnable architecture that takes 2D images and raw LiDAR point clouds as inputs and fully fuses view-specific features to achieve high-precision oriented 3D detection. To fuse arbitrary-size features from different views, a multi-view resize layer (MVRL) is introduced. Extensive experiments on the KITTI benchmark suite show that the proposed approach outperforms most state-of-the-art multi-sensor-based methods on all three classes at moderate difficulty (3D/BEV): Car (75.60%/88.65%), Pedestrian (64.36%/66.98%), Cyclist (57.53%/57.30%). In particular, DDF3D greatly improves detection accuracy at hard difficulty in 2D detection, with 88.19% accuracy for the car class.
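The abstract does not spell out how the MVRL reconciles feature maps of different sizes. A common way to achieve the same effect is to resample one branch onto the other's spatial grid and concatenate channels; the sketch below takes that route, with the bilinear mode and the camera grid as target being our assumptions.

```python
import torch
import torch.nn.functional as F

def fuse_view_features(img_feat, lidar_feat):
    """Resize two view-specific feature maps to a common spatial size and
    concatenate them channel-wise (the role the paper assigns to its
    multi-view resize layer; details here are assumptions).

    img_feat   : (B, C1, H1, W1) camera-branch features
    lidar_feat : (B, C2, H2, W2) LiDAR-branch features
    """
    target = img_feat.shape[-2:]                         # use the camera grid
    lidar_resized = F.interpolate(lidar_feat, size=target,
                                  mode="bilinear", align_corners=False)
    return torch.cat([img_feat, lidar_resized], dim=1)   # (B, C1+C2, H1, W1)
```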
Journal of Physics: Conference Series, 2021
The purpose of this article is to detect 3D objects around an autonomous vehicle with high accuracy. The method proposes a Multi-View 3D (MV3D) framework which encodes the sparse 3D point cloud as a compact multi-view representation, taking the LiDAR bird's-eye view and RGB images as inputs and predicting oriented 3D bounding boxes. The network comprises two sub-networks: one for generating 3D object proposals and one for multi-view feature fusion. The article also covers a stereo-based 3D object detection approach that exploits sparse and dense, semantic and geometric information in stereo imagery. The Stereo R-CNN strategy extends Faster R-CNN to stereo inputs so that objects are simultaneously detected and associated across the left and right images. These feature maps are then combined and fed into a 3D proposal generator to produce accurate 3D proposals for vehicles. In the second stage, the refinement network further extends the features of the proposal regions and carries out classification, re...
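The multi-view encoding step MV3D relies on can be illustrated by rasterizing the sparse point cloud into a bird's-eye-view image with simple per-cell statistics. The ranges, resolution, and channel choices below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def lidar_to_bev(points, x_range=(0, 70), y_range=(-40, 40), res=0.1):
    """Rasterize a LiDAR cloud (N, 3) into a bird's-eye-view image with a
    max-height channel and a point-density channel, one simple way to build
    MV3D-style multi-view inputs."""
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]
    xi = ((pts[:, 0] - x_range[0]) / res).astype(int)   # forward axis -> rows
    yi = ((pts[:, 1] - y_range[0]) / res).astype(int)   # lateral axis -> cols
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((2, h, w), dtype=np.float32)
    np.maximum.at(bev[0], (xi, yi), pts[:, 2])   # max height per cell
    np.add.at(bev[1], (xi, yi), 1.0)             # point count per cell
    return bev
```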
IEEE 17th World Symposium on Applied Machine Intelligence and Informatics (SAMI 2019), 2019
The paper focuses on the problem of raw data fusion in neural-network-based 3D object detection architectures. Here we consider the case of autonomous driving with data from camera and LiDAR sensors. Understanding the vehicle surroundings is a crucial task in autonomous driving, since any subsequent action taken depends strongly on it. In this paper we present an alternative method of fusing camera image information with LiDAR point clouds at a close-to-raw level of abstraction. Our results suggest that our approach improves the average precision of 3D bounding box detection of cyclists (and possibly other objects) in sparse point clouds compared to the baseline architecture without low-level fusion. The proposed approach has been evaluated on the KITTI dataset, which contains driving scenes with corresponding camera and LiDAR data. The long-term goal of our research is to develop a neural network architecture for environment perception that fuses multi-sensor data at the earliest stages possible, thus leveraging the full benefits of possible inter-sensor synergies.
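The abstract leaves the exact fusion operator unspecified. One simple near-raw scheme is to decorate each LiDAR point with the image values it projects onto, so the network sees color and geometry together from the first layer; the sketch below assumes that scheme and reuses a projection step like the one sketched earlier.

```python
import numpy as np

def decorate_points_with_rgb(points, image, uv, valid):
    """Append the RGB value under each projected LiDAR point to that point,
    giving a (M, 6) 'decorated' cloud: one simple form of near-raw fusion.

    points : (N, 3) LiDAR xyz
    image  : (H, W, 3) camera image, uint8
    uv     : (N, 2) pixel coordinates from a LiDAR-to-image projection
    valid  : (N,) mask of points that project in front of the camera
    """
    h, w = image.shape[:2]
    u = np.clip(uv[:, 0].astype(int), 0, w - 1)
    v = np.clip(uv[:, 1].astype(int), 0, h - 1)
    rgb = image[v, u].astype(np.float32) / 255.0   # normalize colors
    decorated = np.hstack([points, rgb])
    return decorated[valid]                        # keep only visible points
```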
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019
In this paper, we present an extension to LaserNet, an efficient and state-of-the-art LiDAR based 3D object detector. We propose a method for fusing image data with the LiDAR data and show that this sensor fusion method improves the detection performance of the model especially at long ranges. The addition of image data is straightforward and does not require image labels. Furthermore, we expand the capabilities of the model to perform 3D semantic segmentation in addition to 3D object detection. On a large benchmark dataset, we demonstrate our approach achieves state-of-the-art performance on both object detection and semantic segmentation while maintaining a low runtime.
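The multi-task layout the abstract describes, one shared backbone feeding separate detection and segmentation heads over the same view, can be sketched as follows; the channel counts and layer depths are illustrative assumptions, not LaserNet's actual configuration.

```python
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Shared backbone with separate heads for per-pixel 3D box regression
    and per-pixel semantic segmentation over a range-view feature map."""
    def __init__(self, in_ch=64, num_classes=4, box_params=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(128, box_params, 1)   # box parameters per pixel
        self.seg_head = nn.Conv2d(128, num_classes, 1)  # semantic logits per pixel

    def forward(self, x):                 # x: (B, in_ch, H, W) view features
        feat = self.backbone(x)
        return self.det_head(feat), self.seg_head(feat)
```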
Entropy
The computer vision, graphics, and machine learning research communities have devoted significant attention to 3D object recognition (segmentation, detection, and classification). Deep learning approaches have lately emerged as the preferred method for 3D segmentation problems as a result of their outstanding performance in 2D computer vision, and many innovative approaches have been proposed and validated on multiple benchmark datasets. This study offers an in-depth assessment of the latest developments in deep learning-based 3D object recognition. We discuss the most well-known 3D object recognition models, along with evaluations of their distinctive qualities.
IEEE Access
Nowadays, computer vision with 3D object detection and 6D (degree-of-freedom) pose estimation is widely discussed and studied in the field. In the 3D object detection process, classifications are centered on the object's size, position, and orientation, while in 6D pose estimation, networks regress 3D translation and rotation vectors. Successful application of these strategies can have a huge impact on various machine learning-based applications, including autonomous vehicles, the robotics industry, and the augmented reality sector. Although extensive work has been done on 3D object detection with pose estimation from RGB images, the challenges have not been fully resolved. Our analysis provides a comprehensive review of contemporary techniques for complete 3D object detection and the recovery of an object's 6D pose. In this review paper, we discuss several sophisticated proposed methods for 3D object detection and 6D pose estimation, including popular datasets, evaluation metrics, and the challenges these methods face. Most importantly, this study makes an effort to offer some possible future directions in 3D object detection and 6D pose estimation. We take the autonomous vehicle as the running example for this detailed review. Finally, this review provides a complete overview of the latest deep learning-based research on 3D object detection and 6D pose estimation systems and compares some popular frameworks. To be concise, we propose a detailed summary of the state-of-the-art techniques of modern deep learning-based object detection and pose estimation models.
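When a network regresses the 6D pose as a 3D translation plus a 3D rotation vector, the rotation vector is typically converted to a rotation matrix with the Rodrigues formula. A small self-contained sketch:

```python
import numpy as np

def rotvec_to_matrix(rvec):
    """Convert a 3D rotation vector (axis * angle) to a rotation matrix via
    the Rodrigues formula, turning a network's predicted rotation vector
    into the rotation part of a 6D pose."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-8:
        return np.eye(3)                   # near-zero rotation
    k = rvec / theta                       # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])       # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# A full 6D pose combines the matrix R with the translation vector t:
# X_cam = R @ X_obj + t
```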
An accurate and robust perception system is key to understanding the driving environment of autonomous driving and robots. Autonomous driving needs 3D information about objects, including the object's location and pose, to understand the driving environment clearly. A camera sensor is widely used in autonomous driving because of its richness in color and texture, and its low price. The major problem with the camera is the lack of 3D information, which is necessary to understand the 3D driving environment. Additionally, scale change and occlusion make 3D object detection more challenging. Many deep learning-based methods, such as depth estimation, have been developed to solve the lack of 3D information. This survey presents the 3D bounding box encoding techniques, feature extraction techniques, and evaluation metrics of image-based 3D object detection. The image-based methods are categorized based on the technique used to estimate an image's depth information, and ...
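A concrete example of the box encodings such surveys categorize is the common 7-parameter form (center, dimensions, and yaw), which decodes into eight corner points as sketched below; the corner ordering is a convention chosen for illustration.

```python
import numpy as np

def box_to_corners(box):
    """Decode the common 7-parameter 3D box encoding (x, y, z, l, w, h, yaw)
    into its 8 corner points; many encodings regress exactly these seven
    numbers."""
    x, y, z, l, w, h, yaw = box
    # Axis-aligned corner offsets around the box center.
    dx = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * l / 2
    dy = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
    dz = np.array([-1, -1, -1, -1,  1,  1,  1,  1]) * h / 2
    # Rotate around the vertical axis, then translate to the box center.
    c, s = np.cos(yaw), np.sin(yaw)
    cx = c * dx - s * dy + x
    cy = s * dx + c * dy + y
    cz = dz + z
    return np.stack([cx, cy, cz], axis=1)   # (8, 3)
```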