Computer Vision Toolbox™ User's Guide
R2023b
Build and Deploy Visual SLAM Algorithm with ROS in MATLAB . . . . . . 1-25
Code Generation and Third-Party Examples
2
Code Generation for Monocular Visual Simultaneous Localization and Mapping . . . . . . . . 2-2
Detect Image Anomalies Using Pretrained ResNet-18 Feature Embeddings . . . . . . . . 3-141
Activity Recognition from Video and Optical Flow Data Using Deep Learning . . . . . . . . 3-212
Export YOLO v2 Object Detector to ONNX . . . . . . . . . . . . . . . . . . . . . . . 3-443
Find Image Rotation and Scale Using Automated Feature Matching . . . 4-25
Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-41
Traffic Warning Sign Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-24
Visual Tracking of Occluded and Unresolved Objects . . . . . . . . . . . . . . . 8-15
Create Faster R-CNN Object Detection Network . . . . . . . . . . . . . . . . . . 10-10
Labelers
11
Elements of Ground Truth Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Exported Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Custom Algorithm Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-46
Frame Navigation and Time Interval Settings . . . . . . . . . . . . . . . . . . . 11-102
Labeling Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-102
Polyline Drawing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-103
Polygon Drawing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-104
Zooming and Panning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-104
App Sessions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-104
Featured Examples
12
Localize and Read Multiple Barcodes in Image . . . . . . . . . . . . . . . . . . . . 12-2
Getting Started with Point Clouds Using Deep Learning . . . . . . . . . . . . . 14-3
Import Point Cloud Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-3
Augment Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-3
Encode Point Cloud Data to Image-like Format . . . . . . . . . . . . . . . . . . . . 14-4
Train a Deep Learning Classification Network with Encoded Point Cloud Data . . . . . . . . 14-4
Subtract Image Background by Using OpenCV in MATLAB . . . . . . . . . 15-19
Draw Shapes and Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
Rectangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
Line and Polyline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-8
Polygon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-10
Circle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17-10
Object Detection
19
Train Custom OCR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-2
Prepare Training Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-2
Train an OCR model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
Evaluate OCR training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-4
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-43
Getting Started with R-CNN, Fast R-CNN, and Faster R-CNN . . . . . . . . 19-89
Object Detection Using R-CNN Algorithms . . . . . . . . . . . . . . . . . . . . . . 19-89
Comparison of R-CNN Object Detectors . . . . . . . . . . . . . . . . . . . . . . . . 19-91
Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-91
Design an R-CNN, Fast R-CNN, and a Faster R-CNN Model . . . . . . . . . . 19-92
Label Training Data for Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . 19-93
Image Classification with Bag of Visual Words . . . . . . . . . . . . . . . . . . . 19-171
Step 1: Set Up Image Category Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 19-171
Step 2: Create Bag of Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-171
Step 3: Train an Image Classifier With Bag of Visual Words . . . . . . . . 19-172
Step 4: Classify an Image or Image Set . . . . . . . . . . . . . . . . . . . . . . . . 19-173
Fixed-Point Design
21
Fixed-Point Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-2
Fixed-Point Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-2
Benefits of Fixed-Point Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-2
Benefits of Fixed-Point Design with System Toolboxes Software . . . . . . . 21-2
Portable C Code Generation for Functions That Use OpenCV Library . . 22-4
Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22-4
Enhance Contrast of Grayscale Image Using Histogram Equalization . . . . . . . . 23-63
Perform Opening of Binary Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-141
Convert Data Type and Color Space of Image from RGB to HSV . . . . . 23-153
Camera Calibration and SfM Examples
Monocular Visual-Inertial SLAM
Simultaneous Localization and Mapping (SLAM) is a complex process that involves determining the
position and orientation of a sensor, such as a camera, in relation to its surroundings. It also entails
creating a map of the environment by identifying the 3D locations of various points. This example
demonstrates how to effectively perform SLAM by combining images captured by a monocular
camera with measurements obtained from an IMU sensor.
The method demonstrated in this example is inspired by ORB-SLAM3, which is a feature-based visual-inertial SLAM algorithm. The camera and IMU fusion is achieved using a factorGraph (Navigation Toolbox), which is a bipartite graph that consists of two types of nodes:
1 Variable Nodes: They represent the unknown random variables in an estimation problem such as
the position of a mobile robot or a 3D point.
2 Factor Nodes: They represent a function which quantifies the relationship between the random
variables. For example, a camera correspondence factor between 2D image points and 3D map
points or an IMU measurement factor between two positions.
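To make the two node types concrete, this minimal sketch (which assumes only the Navigation Toolbox factorGraph interface used later in this example, and uses an illustrative measurement value) creates two pose variable nodes and connects them with a single relative-pose factor node:
% Minimal factor graph: two SE(3) pose variable nodes linked by one factor node.
fg = factorGraph;
poseIds = generateNodeID(fg,2);   % unique IDs for two pose variable nodes
% Relative-pose factor between the two poses: 1 m translation along x and an
% identity rotation, in [dx dy dz qw qx qy qz] form (illustrative values only).
fRel = factorTwoPoseSE3(poseIds,Measurement=[1 0 0 1 0 0 0]);
addFactor(fg,fRel);
% Anchor the first pose and optimize; the solver then estimates the second pose.
fixNode(fg,poseIds(1));
optimize(fg,factorGraphSolverOptions);
secondPose = nodeState(fg,poseIds(2))   % returned as [x y z qw qx qy qz]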
Factor graphs are a popular way of designing algorithms for sensor-fusion-based SLAM because they provide an efficient and modular solution to SLAM problems and allow the graph to be easily updated when new measurements or constraints become available. Additionally, this type of model provides a flexible approach to incorporating different types of sensors and data, including visual, lidar, and inertial sensors, which makes it useful for a variety of SLAM applications.
This example illustrates how to construct a monocular visual-inertial SLAM pipeline using a factor
graph step by step.
Overview
• Initial IMU Bias Estimation (Optional): If your data sequence contains a sufficient number of
static frames, you can use them to estimate the IMU bias. This helps you initialize the factor graph
later.
• Map Initialization: Start by initializing the map of 3-D points from two video frames. The 3-D
points and relative camera pose are computed using triangulation based on 2-D ORB feature
correspondences.
• Camera and IMU alignment: Once the map is initialized, generate a set of camera poses using
Structure From Motion (SFM). Then use corresponding IMU measurements to align the camera
and IMU frames and to find the scale conversion between the IMU and camera odometries.
• Tracking: Once the camera and IMU data are aligned and scaled correctly, initialize the factor
graph. Then, for each new key frame, estimate the camera pose by matching features in the
current frame to features in the last key frame. Refine the estimated camera pose by tracking the
local map. Then, add the camera poses to the factor graph with their corresponding IMU
measurements.
• Local Mapping: Use the current frame to create new 3-D map points if it is identified as a key
frame. At this stage, the factor graph is optimized, and the camera poses and 3-D points are
adjusted using a set of local key frames.
• Loop Closure: Detect loops for each key frame by comparing the key frame against all previous
key frames using the bag-of-features approach. Once a loop closure is detected, add a new link to
the factor graph to reflect the connection found by the loop closure.
The data used in this example has been extracted from the Blackbird data set (NYC Subway Winter). It represents a sequence from a UAV flying around in a simulated subway environment. Download the MAT file, which contains the images, camera intrinsics, and IMU measurements.
uavData = helperDownloadSLAMData();
images = uavData.images;
intrinsics = uavData.intrinsics;
timeStamps = uavData.timeStamps;
Set up the noise parameters using the factorIMUParameters object. These values are typically
provided by the IMU manufacturer.
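The exact values depend on your sensor, so the following sketch uses placeholder numbers rather than values from this data set; substitute the noise densities from your IMU datasheet.
% Illustrative IMU noise parameters; replace these values with those from
% your IMU manufacturer's datasheet.
imuParams = factorIMUParameters( ...
    SampleRate=100, ...                  % IMU sample rate (Hz)
    GyroscopeNoise=1e-4*eye(3), ...      % gyroscope measurement noise covariance
    AccelerometerNoise=1e-3*eye(3), ...  % accelerometer measurement noise covariance
    GyroscopeBiasNoise=1e-8*eye(3), ...  % gyroscope bias process noise
    AccelerometerBiasNoise=1e-7*eye(3)); % accelerometer bias process noise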
A section of the input data sequence contains only static frames. These frames are useful for estimating the initial bias of the IMU sensor, which you can then use as a prior when you construct the factor graph. Accounting for the bias helps to reduce pose estimation errors because it can serve as an initial guess during the optimization. This step is optional, as it depends on the availability of static frames.
Map Initialization
Start by initializing the map, which holds the 3-D world points. This step is crucial and has a significant impact on the accuracy of the final SLAM result. Find the initial ORB feature point correspondences between a pair of images using the matchFeatures function. After the
correspondences are found, you can use one of these two geometric transformation models to establish the map initialization:
• Homography: If the scene is planar, a homography projective transformation is a better choice to describe the feature point correspondences.
• Fundamental Matrix: If the scene is non-planar, a fundamental matrix must be used instead.
The homography and the fundamental matrix can be computed using estgeotform2d and estimateFundamentalMatrix, respectively. The model that results in a smaller reprojection error is selected to estimate the relative rotation and translation between the two frames using estrelpose. Because the RGB images are taken by a monocular camera, which does not provide depth information, the relative translation can be recovered only up to a specific scale factor.
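As a hedged sketch of this model-selection step, assume matchedPoints1 and matchedPoints2 are the M-by-2 matched point locations from the two frames and intrinsics is the camera intrinsics object; for brevity, the sketch selects the model by inlier count rather than by the reprojection-error comparison described above.
% Estimate both candidate geometric models from the matched points.
[tformH,inliersH] = estgeotform2d(matchedPoints1,matchedPoints2,"projective");
[F,inliersF] = estimateFundamentalMatrix(matchedPoints1,matchedPoints2,Method="RANSAC");
% Recover the relative camera pose from the better-supported model.
if nnz(inliersH) > nnz(inliersF)
    relPose = estrelpose(tformH,intrinsics, ...
        matchedPoints1(inliersH,:),matchedPoints2(inliersH,:));
else
    relPose = estrelpose(F,intrinsics, ...
        matchedPoints1(inliersF,:),matchedPoints2(inliersF,:));
end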
Given the relative camera pose and the matched feature points in the two images, the 3-D locations of the matched points are determined using the triangulateMultiview function. A triangulated map point is valid when it is located in front of both cameras, when its reprojection error is low, and when the parallax of the two views of the point is sufficiently large.
% Set random seed for reproducibility
rng(0);
currFrameIdx = startFrameIdx + 1;
firstI = currI; % Preserve the first frame
isMapInitialized = false;
currFrameIdx = currFrameIdx + 1;
preMatchedPoints = prePoints(indexPairs(:,1),:);
currMatchedPoints = currPoints(indexPairs(:,2),:);
if ~isValid
continue
end
isMapInitialized = true;
disp(['Map initialized with frame ', num2str(startFrameIdx),' and frame ', num2str(currFrameIdx-1)])
end % End of map initialization loop
if isMapInitialized
% Show matched features
hfeature = showMatchedFeatures(firstI, currI, prePoints(indexPairs(:,1)), ...
currPoints(indexPairs(:, 2)), "Montage");
else
error('Unable to initialize the map.')
end
After the map is initialized using two frames, you can use imageviewset and worldpointset to
store the two key frames and the corresponding map points:
• imageviewset stores the key frames and their attributes, such as ORB descriptors, feature points, and camera poses, as well as connections between the key frames, such as feature point matches and relative camera poses. It also builds and updates a pose graph. The absolute camera poses and relative camera poses of odometry edges are stored as rigidtform3d objects. The relative camera poses of loop-closure edges are stored as affinetform3d objects.
• worldpointset stores 3-D positions of the map points and the 3-D to 2-D projection
correspondences, indicating which map points are observed in a key frame and which key frames
observe a map point. It also stores other attributes of map points, such as the mean view
direction, the representative ORB descriptors, and the range of distance at which the map point
can be observed.
% Add the first key frame. Place the camera associated with the first
% key frame at the origin, oriented along the Z-axis
preViewId = 1;
vSetKeyFrames = addView(vSetKeyFrames, preViewId, rigidtform3d, Points=prePoints,...
Features=preFeatures.Features);
% Add connection between the first and the second key frame
vSetKeyFrames = addConnection(vSetKeyFrames, preViewId, currViewId, relPose, Matches=indexPairs);
% Add image points corresponding to the map points in the first key frame
mapPointSet = addCorrespondences(mapPointSet, preViewId, newPointIdx, indexPairs(:,1));
% Add image points corresponding to the map points in the second key frame
mapPointSet = addCorrespondences(mapPointSet, currViewId, newPointIdx, indexPairs(:,2));
Loop detection in this example is performed using the bag-of-words approach. You can create a visual vocabulary, represented as a bagOfFeatures object, offline using the ORB descriptors extracted from a large set of images in the data set by calling:
bag = bagOfFeatures(imds,CustomExtractor=@helperORBFeatureExtractorFunction, ...
    TreeProperties=[3, 10],StrongestFeatures=1);
The loop closure process builds a database by encoding the incoming images into feature vectors that compactly represent each view.
Initialize the Factor Graph and Configure Trajectory and Map Visualization
Refine the initial reconstruction using a factorGraph (Navigation Toolbox), which optimizes both
camera poses and world points to minimize the overall reprojection errors.
In MATLAB, working with a factor graph involves managing a set of unique IDs for the different parts of the graph, including poses, 3-D points, and IMU measurements. By using these IDs, you can add constraints between the variable nodes in the factor graph, such as the corresponding 2-D image matches for a set of 3-D points, or gyroscope measurements between two sets of poses. This is accomplished using factor nodes. You can generate unique IDs using the generateNodeID (Navigation Toolbox) function.
To initialize the factor graph, two pose variable nodes representing the first and second poses are
combined with a factorCameraSE3AndPointXYZ (Navigation Toolbox) that represents the links
between 2D feature points and 3D map points. At this stage, IMU measurements cannot be added to
the graph yet, as the correct alignment and scaling of IMU data require several camera poses to be
collected first.
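A condensed sketch of that initialization step follows. It assumes poseNodeId is the ID of one of the two pose nodes, pointNodeIds holds one ID per triangulated 3-D map point observed in that view, imagePoints contains the corresponding 2-D feature locations, and K is the camera intrinsic matrix.
% One camera projection factor links a pose node to the 3-D point nodes it
% observes, with the 2-D image locations as the measurements.
fCam = factorCameraSE3AndPointXYZ( ...
    [repmat(poseNodeId,numel(pointNodeIds),1) pointNodeIds(:)], ...
    K,Measurement=imagePoints);
addFactor(fGraph,fCam);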
Once the initialization is successful, update the attributes of the map points including 3-D locations,
view direction, and depth range. You can use helperVisualizeMotionAndStructure to visualize the map
points and the camera locations.
showPlotLegend(mapPlot);
When using a monocular camera, pose estimates are obtained at an unknown scale and differ from the metric measurements collected by an IMU. To address this issue, use the accelerometer measurements to estimate a scale factor that scales the input camera poses to match the metric scale of the IMU measurements. In addition, you must align the reference frame of the input poses with the local navigation reference frame of the IMU so that the constant gravity acceleration captured by the accelerometer can be removed. The estimateGravityRotationAndPoseScale function estimates both the scale and the gravity rotation, which you can use to improve the accuracy of the sensor fusion.
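To make the correction concrete, this sketch applies an estimated gravity rotation and pose scale to a single camera pose, mirroring the helperTransformToNavigationFrame logic shown later in this chapter. It assumes gravityRotation and poseScale are outputs of estimateGravityRotationAndPoseScale and camPose is a rigidtform3d camera pose.
% gravityRotation rotates the gravity vector from the IMU local navigation
% frame into the initial camera frame, so its inverse aligns camera poses
% with the navigation frame; poseScale converts them to metric units.
T = gravityRotation.A' * camPose.A;   % rotate the pose into the navigation frame
T(1:3,4) = poseScale * T(1:3,4);      % rescale the translation
camPoseAligned = rigidtform3d(T);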
The frames used for this alignment and scaling should be collected with a set of criteria in mind:
• Map initialization should be as accurate as possible, because it greatly affects the camera poses. Extracting a large number of features may be necessary to obtain good accuracy.
• Camera poses should contain both static and moving frames, with the moving frames containing a
strong acceleration in the gravity axis direction.
• For efficiency, a sliding window should be used to store the poses. The window size should be kept
relatively small to reduce the amount of scale drift in the poses.
Once the estimation has converged, update the poses in the factor graph by rotating and scaling them correctly. Also update the factor graph by setting the bias prior using the previously estimated value. Use the factorIMU (Navigation Toolbox) object to link the corresponding IMU measurements to the previously stored poses. Finally, optimize the factor graph to adjust the poses and 3-D points according to the newly added constraints. Use factorGraphSolverOptions (Navigation Toolbox) to set up the optimization parameters for the factor graph.
% Main loop
isLastFrameKeyFrame = true;
isIMUInitialized = false;
isLoopClosed = false;
% Estimate the gravity rotation and pose scale that helps in transforming input camera poses to the
% local navigation reference frame of IMU
[gDir,scale] = helperAlignIMUCamera(camPoses,uavData,imuParams,...
keyFrameToFrame,currKeyFrameId);
if scale>0.3
% Scale estimation was successful
isIMUInitialized = true;
xyzPoints1 = mapPointSet.WorldPoints;
xyzPoints2 = nodeState(fGraph,nodeIDs(fGraph,NodeType="POINT_XYZ"));
% Transform and scale the input camera poses and XYZ points using the estimated gravity rotation and pose scale
[updatedCamPoses,updatedXYZPoints1,updatedXYZPoints2] = helperTransformToIMU(...
camPoses,xyzPoints1,xyzPoints2,gDir,scale); %two sets of points because we can't
velToNode = [velToNode;veloId'];
biasToNode = [biasToNode;biasId'];
initVelPrior = factorVelocity3Prior(velToNode(1));
initBiasPrior = factorIMUBiasPrior(biasToNode(1));
addFactor(fGraph,initVelPrior);
addFactor(fGraph,initBiasPrior);
for i=2:length(viewToNode)
imuMeasurements = helperExtractIMUMeasurements(uavData, keyFrameToFrame(i-1), keyFrameToFrame(i));
imuId = [viewToNode(i-1),velToNode(i-1),biasToNode(i-1), ...
viewToNode(i) ,velToNode(i) ,biasToNode(i)];
fIMU = factorIMU(imuId,imuMeasurements.gyro,imuMeasurements.accel,imuParams,SensorTransform=uavData.camToIMUTransform);
addFactor(fGraph,fIMU);
end
optimize(fGraph,fgso);
Tracking
To determine when to insert a new key frame, perform the tracking process for each frame using
these steps:
1 Extract ORB features for each new frame, and then match them (using matchFeatures) with features in the last key frame that have known corresponding 3-D map points.
2 Estimate the camera pose with the Perspective-n-Point algorithm using estworldpose.
3 Using the camera pose, project the map points observed by the last key frame into the current
frame and search for feature correspondences using matchFeaturesInRadius.
4 With the 3-D to 2-D correspondences in the current frame, refine the camera pose by performing a motion-only bundle adjustment using factorGraph. Construct a factor graph using the steps in the previous section, and then fix the 3-D point nodes before proceeding to the optimization step, as shown in the sketch after this list.
5 Project the local map points into the current frame to search for more feature correspondences
using matchFeaturesInRadius and refine the camera pose again using motion-only bundle
adjustment.
6 Determine if the current frame is a new key frame. If the current frame is a key frame, continue
to the Local Mapping process. Otherwise, start Tracking for the next frame.
If tracking is lost because too few feature points can be matched, try inserting new key frames more frequently.
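A minimal sketch of the motion-only refinement from step 4, reusing the fGraph and factorGraphSolverOptions variables of this example:
% Motion-only bundle adjustment: hold the 3-D landmark nodes fixed so that
% only the camera pose nodes are adjusted during optimization.
pointNodeIds = nodeIDs(fGraph,NodeType="POINT_XYZ");
fixNode(fGraph,pointNodeIds);               % freeze the landmark nodes
optimize(fGraph,factorGraphSolverOptions);
fixNode(fGraph,pointNodeIds,false);         % release them again afterward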
% Track the local map and check if the current frame is a key frame.
numSkipFrames = 20;
numPointsKeyFrame = 200;
[localKeyFrameIds, currPose, mapPointsIdx, featureIdx, isKeyFrame] = ...
helperTrackLocalMapVI(mapPointSet, vSetKeyFrames, mapPointsIdx, ...
featureIdx, currPose, currFeatures, currPoints, intrinsics, scaleFactor, numLevels, ...
isLastFrameKeyFrame, lastKeyFrameIdx, currFrameIdx, numSkipFrames, numPointsKeyFrame);
if ~isKeyFrame
isLastFrameKeyFrame = false;
currFrameIdx = currFrameIdx + 1;
continue
else
isLastFrameKeyFrame = true;
keyFrameToFrame = [keyFrameToFrame; currFrameIdx];
encodedImages = [encodedImages; encode(bofData.bag,currI,Verbose=false)];
keyTimeStamps = [keyTimeStamps; timeStamps.imageTimeStamps(currFrameIdx)];
loopCtr = loopCtr+1;
end
% Remove outlier map points that are observed in fewer than 3 key frames
[mapPointSet,pointToNode] = helperCullRecentMapPoints(mapPointSet, pointToNode, mapPointsIdx, newPointIdx);
Loop Closure
Loop candidates are recognized by comparing feature descriptors, which represent the encoded images using a pretrained bag of words. A detected loop candidate is validated when it is located near a previous pose and the distance between the feature descriptors associated with both poses is smaller than a set threshold.
When a valid loop candidate is found, use estgeotform3d to compute the relative pose between the
loop candidate frame and the current key frame. Then add the loop connection with the relative pose
to the factor graph.
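As a hedged sketch of that relative-pose computation, assume currXYZ and loopCandidateXYZ are the matched 3-D map point locations observed by the current key frame and by the loop candidate (the full example performs this step inside helperAddLoopConnectionsVI).
% Estimate the transformation that maps the current key frame map points onto
% the loop candidate map points. A similarity transformation also absorbs the
% residual scale drift of the monocular reconstruction.
[relPoseLoop,inlierIdx] = estgeotform3d(currXYZ,loopCandidateXYZ,"similarity");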
loopClosureCheck = 350; % Before this frame, loop closure does not occur.
% Check loop closure after some key frames have been created
if currKeyFrameId > loopClosureCheck && loopCtr>10
if ~isempty(potentialLoopCandidates)
[isDetected, validLoopCandidates] = helperCheckLoopClosureVI(encodedImages, potentialLoopCandidates);
if isDetected
% Add loop closure connections
[fGraph, isLoopClosed, mapPointSet, vSetKeyFrames] = helperAddLoopConnectionsVI(fGraph, ...
mapPointSet, vSetKeyFrames, validLoopCandidates, currKeyFrameId, ...
currFeatures, currPoints, loopEdgeNumMatches, intrinsics);
if isLoopClosed
loopCtr=0;
end
end
end
end
Local Mapping
Local mapping is performed for every key frame. When a new key frame is determined, add it and its corresponding IMU measurement to the factor graph, and update the mapPointSet and vSetKeyFrames objects. To ensure that mapPointSet contains as few outliers as possible, a valid map point must be observed in at least 3 key frames.
Create new map points by triangulating ORB feature points in the current key frame and its
connected key frames. For each unmatched feature point in the current key frame, search for a match
with other unmatched points in the connected key frames using matchFeatures. The local
optimization of the factor graph refines the pose of the current key frame, the poses of connected key
frames, and all the map points observed in these key frames.
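As a hedged sketch of the triangulation step for a single pair of connected key frames, assume matchedPoints1 and matchedPoints2 are newly matched, previously unmatched feature locations, pose1 and pose2 are the corresponding camera poses, and intrinsics is the camera intrinsics object.
% Build camera projection matrices from the intrinsics and the key frame poses.
camMatrix1 = cameraProjection(intrinsics,pose2extr(pose1));
camMatrix2 = cameraProjection(intrinsics,pose2extr(pose2));
% Triangulate candidate 3-D points and keep only well-conditioned ones.
[xyzNew,reprojErrors,validIdx] = triangulate(matchedPoints1,matchedPoints2, ...
    camMatrix1,camMatrix2);
isGood = validIdx & reprojErrors < 2;   % in front of both cameras, low error (threshold is illustrative)
xyzNew = xyzNew(isGood,:);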
allOptNodes=solInfo.OptimizedNodeIDs;
optPointNodes=setdiff(allOptNodes,viewToNode(refinedKeyFrameIds));
[~,mapPointIdx]=intersect(pointToNode,optPointNodes,'stable');
Global Optimization
Once all the frames have been processed, optimize the factor graph one last time, including all the key frames, 3-D points, and IMU measurements.
% Optimize the poses
vSetKeyFramesOptim = vSetKeyFrames;
fGraphOptim = fGraph;
optimize(fGraphOptim,fgso);
mapPointSet = helperUpdateGlobalMap(mapPointSet,vSetKeyFrames,vSetKeyFramesOptim);
% Update legend
showPlotLegend(mapPlot);
The downloaded data contains the gTruth array, which you can use to compare the obtained
trajectory with the ground truth to qualitatively evaluate the accuracy of the SLAM pipeline. You can
also calculate the root-mean-square-error (RMSE) and absolute-mean-error (AME) of trajectory
estimates.
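As a simple sketch of these metrics, assume estimatedXYZ and gtXYZ are N-by-3 matrices of time-aligned estimated and ground truth camera positions.
% Per-frame position errors between the estimated and ground truth trajectories.
posErrors = vecnorm(estimatedXYZ - gtXYZ,2,2);
rmseATE = sqrt(mean(posErrors.^2));   % root-mean-square error
ameATE = mean(abs(posErrors));        % absolute mean error
disp("Trajectory RMSE (m): " + rmseATE)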
In this example you learned how to utilize the factor graph to combine pose estimates from a camera
with IMU measurements to obtain more accurate results.
Supporting Functions
Other functions are included in separate files. For more information on these functions, please refer
to the “Monocular Visual Simultaneous Localization and Mapping” on page 1-122 example.
helperAddLoopConnectionsVI adds connections between the current keyframe and the valid loop
candidate.
helperCheckLoopClosureVI detects key frames that are loop candidates by retrieving visually
similar images from the database.
helperTrackLastKeyFrameVI estimates the current camera pose by tracking the last key frame.
helperTrackLocalMapVI refines the current camera pose by tracking the local map.
helperDetectAndExtractFeatures detects and extracts ORB features from the image.
helperUpdateGlobalMap updates the 3-D locations of map points after pose graph optimization.
helperAlignIMUCamera estimates the gravity rotation and pose scale to transform the input poses
to the local navigation reference frame of IMU using IMU measurements and factor graph
optimization.
helperTransformToIMU transforms and scales input poses and XYZ 3-D points according to
specified gravity direction and pose scale.
helperLocalFactorGraphOptimization optimizes the factor graph using a set of local key frames.
helperInitFactorGraph initializes the factor graph using the results obtained from SfM.
K = intrinsics.K;
camInfo = ((K(1,1)/1.5)^2)*eye(2);
refinedKeyFrameIds = vSetKeyFrames.Views.ViewId';
graphIds = [];
graphMes = [];
graphPts = [];
fGraph = factorGraph;
mapPointIdx = [];
pointIds = pointToNode(pointIndices);
poseIds = ones(length(pointIds),1)*viewToNode(viewId);
graphIds = [graphIds; [poseIds pointIds]];
ptsInView = vSetKeyFrames.Views.Points{viewId,1}.Location;
graphMes = [graphMes; ptsInView(featureIndices,:)];
optimize(fGraph,factorGraphSolverOptions);
fixNode(fGraph,ptsIDS(idx),false);
% Extract the results of the factor graph optimization and re-package them
poseIDs = nodeIDs(fGraph,NodeType="POSE_SE3");
fgposopt = nodeState(fGraph,poseIDs);
initPose = rigidtform3d(quat2rotm(fgposopt(1,4:7)),fgposopt(1,1:3));
refinedPose = rigidtform3d(quat2rotm(fgposopt(2,4:7)),fgposopt(2,1:3));
ViewId=[1;2];
AbsolutePose=[initPose;refinedPose];
refinedAbsPoses=table(ViewId,AbsolutePose);
ptsIDS = nodeIDs(fGraph,NodeType="POINT_XYZ");
refinedPoints = nodeState(fGraph,ptsIDS);
end
N = length(mapPointsIndices);
K = intrinsics.K;
camInfo = ((K(1,1)/1.5)^2)*eye(2);
viewId = vSetKeyFrames.Views.ViewId(end)+1;
viewsAbsPoses = vSetKeyFrames.Views.AbsolutePose;
for i = 1:numel(keyFramesIndices)
localKeyFrameId = keyFramesIndices(i);
[index3d, index2d] = findWorldPointsInView(mapPoints, localKeyFrameId);
[~, index1, index2] = intersect(index3d, mapPointsIndices, 'stable');
prePose = viewsAbsPoses(localKeyFrameId);
relPose = rigidtform3d(prePose.R' * cameraPose.R, ...
(cameraPose.Translation-prePose.Translation)*prePose.R);
if numel(index1) > 5
vSetKeyFrames = addConnection(vSetKeyFrames, localKeyFrameId, viewId, relPose, ...
Matches=[index2d(index1),featureIndices(index2)]);
ptsInView = vSetKeyFrames.Views.Points{viewId,1}.Location;
mesPts = ptsInView(featureIndices,:);
% If the camera poses have been scaled properly, then add the IMU
% factor to the factor graph
if initIMU
velId = generateNodeID(fGraph,1);
velToNode=[velToNode;velId];
biasId = generateNodeID(fGraph,1);
biasToNode=[biasToNode;biasId];
References
[1] Dellaert, Frank, and Michael Kaess. Factor Graphs for Robot Perception. Now Publishers, 2017.
[2] Campos, Carlos, et al. “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial,
and Multimap SLAM.” IEEE Transactions on Robotics, vol. 37, no. 6, Dec. 2021, pp. 1874–90.
DOI.org (Crossref), https://doi.org/10.1109/TRO.2021.3075644.
Code Generation for Monocular Visual Simultaneous Localization and Mapping
This example shows how to use MATLAB® Coder™ to generate C/C++ code for the visual simultaneous localization and mapping algorithm from the “Monocular Visual Simultaneous Localization and Mapping” on page 1-122 example.
Visual simultaneous localization and mapping (vSLAM) is the process of calculating the position and
the orientation of a camera, with respect to its surroundings, while simultaneously mapping the
environment.
This example shows how to process image data from a monocular camera to build a map of an indoor
environment and estimate the trajectory of the camera. The steps in this process are:
1 Package the visual SLAM algorithm from the “Monocular Visual Simultaneous Localization and
Mapping” on page 1-122 example into an entry-point function, helperMonoVisualSLAM.
2 Modify the helperMonoVisualSLAM function to support code generation.
3 Generate C/C++ code, and verify the results.
You can also integrate the generated code into external software for further testing.
Download Data
This example uses data from the TUM RGB-D benchmark [1] on page 1-24. The size of the data set is
1.38 GB. You can download the data set to a temporary folder using this code.
baseDownloadURL = "https://vision.in.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz";
dataFolder = fullfile(tempdir,"tum_rgbd_dataset",filesep);
options = weboptions(Timeout=Inf);
tgzFileName = dataFolder + "fr3_office.tgz";
folderExists = exist(dataFolder,"dir");
Entry-Point Function
To meet the requirements of MATLAB Coder, restructure the code from the “Monocular Visual
Simultaneous Localization and Mapping” on page 1-122 example into the entry-point function
helperMonoVisualSLAM. This function takes a cell array of images as an input and outputs 3-D
worldpointset, estimated camera poses, and frame indices.
The entry-point function performs these operations:
1 Initializes a map of 3-D points from the first two video frames, and then computes the 3-D points and relative camera pose using triangulation based on 2-D ORB feature correspondences.
2 For each new frame, estimates the camera pose by matching features in the current frame to
features in the previous key frame. The function refines the estimated camera pose by tracking
the local map.
3 If the function identifies the new frame as a key frame, the function uses the new frame to create
new 3-D points. In this step, the function uses bundle adjustment to minimize reprojection errors
in the estimated camera poses and 3-D points.
4 Detects loops in each key frame by comparing it with all previous key frames using the bag-of-
features approach. Once the function detects a loop closure, it optimizes the pose graph by
refining the camera poses of all the key frames.
As code generation does not support the imageDatastore object, read the images, convert them to
grayscale, and store them in a cell array.
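A sketch of that preprocessing step, assuming the image files extracted from the TUM sequence are in a folder named imageFolder:
% Read the RGB frames, convert them to grayscale, and collect them in a cell
% array, because code generation does not support imageDatastore.
imds = imageDatastore(imageFolder);
numImages = numel(imds.Files);
Images = cell(1,numImages);
for k = 1:numImages
    Images{k} = im2gray(readimage(imds,k));
end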
Use the codegen (MATLAB Coder) command to compile the helperVisualSLAMCodegen function into a MEX file. You can specify the -report option to
generate a compilation report that shows the original MATLAB code and the associated files created
during code generation. You can also create a temporary directory where MATLAB Coder can store
the generated files. Note that, by default, the generated MEX file has the same name as the original
MATLAB function with "_mex" appended as a suffix: helperVisualSLAMCodegen_mex.
Alternatively, you can use the -o option to specify the name of the MEX file.
For code generation, you must pass Images as an input to the helperVisualSLAMCodegen
function.
cpuConfig = coder.config("mex");
cpuConfig.TargetLang = "C++";
codegen -config cpuConfig helperVisualSLAMCodegen -args {Images}
Use the helperVisualSLAMCodegen_mex function to find the estimated and optimized camera poses based on the Images cell array.
monoSlamOut = helperVisualSLAMCodegen_mex(Images);
Plot the estimated trajectory and actual trajectory of the camera by specifying monoSlamOut as an
input argument to the helperVisualizeMonoSlam helper function.
% Clear up
clear helperMonoVisualSLAMCodegen_mex
Reference
[1] Sturm, Jürgen, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. “A
Benchmark for the Evaluation of RGB-D SLAM Systems.” In 2012 IEEE/RSJ International Conference
on Intelligent Robots and Systems, 573–80, 2012. https://doi.org/10.1109/IROS.2012.6385773.
Build and Deploy Visual SLAM Algorithm with ROS in MATLAB
In this example, you implement a visual simultaneous localization and mapping (SLAM) algorithm to
estimate the camera poses for the TUM RGB-D Benchmark [1] on page 1-29 dataset. You then
generate C++ code for the visual SLAM algorithm and deploy it as a ROS node to a remote device
using MATLAB®.
The remote device to which you want to deploy the code must have the following dependencies
installed:
• OpenCV 4.5.0 — For more information about downloading the OpenCV source and building it on
your remote device, see OpenCV linux installation.
• g2o library — Download the g2o source and build it on your remote device.
• Eigen3 library — Install eigen3 library using the command $ sudo apt install libeigen3-dev.
For this example, download a virtual machine (VM) using the instructions in the ROS Toolbox documentation, and then follow these steps.
masterIP = '192.168.192.135';
rosinit(masterIP,11311)
This example uses the TUM RGB-D Benchmark [1] on page 1-29 data set. Download the data set as a ROS bag file on the remote device.
$ wget https://cvg.cit.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household
This example uses the monovslam object to implement visual SLAM. For each new frame added using
its addFrame object function, the monovslam object extracts and tracks features to estimate camera
poses, identify key frames and compute the 3-D map points in the world frame. The monovslam
object also searches for loop closures using the bag-of-features algorithm and optimizes camera poses
using pose graph optimization, once a loop closure is detected. For more information on visual SLAM
pipeline, see “Monocular Visual Simultaneous Localization and Mapping” on page 1-122.
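Outside of the ROS node, the core loop around the monovslam object looks roughly like this sketch, which assumes intrinsics is the camera intrinsics and Images is a cell array of frames:
% Create the visual SLAM object and feed frames to it one at a time.
vslam = monovslam(intrinsics);
for k = 1:numel(Images)
    addFrame(vslam,Images{k});
    if hasNewKeyFrame(vslam)
        % Query the latest map points and key frame camera poses.
        worldPoints = mapPoints(vslam);
        camPoses = poses(vslam);
    end
end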
The helperROSVisualSLAM function implements the visual SLAM algorithm using these steps:
type("helperROSVisualSLAM.m")
function helperROSVisualSLAM()
% The helperROSVisualSLAM function implements the visual SLAM pipeline for
% deployment as a ROS node.
% Copyright 2023 The MathWorks, Inc.
while 1
if hasNewKeyFrame(vslam)
msg = rosmessage('std_msgs/Float64MultiArray', 'DataFormat', 'struct');
% Get map points and camera poses
worldPoints = mapPoints(vslam);
[camPoses] = poses(vslam);
msg.Layout.Dim(end).Size = uint32(poseSize);
Use MATLAB Coder™ to generate a ROS node for the visual SLAM algorithm defined by the helperROSVisualSLAM function. You can then deploy this node on the remote virtual machine.
Create a MATLAB Coder configuration object that uses "Robot Operating System (ROS)"
hardware. Before remote deployment, set these configuration parameters for the Linux virtual
machine. Note that, if you are deploying to a different remote machine, you must change these to the
appropriate parameters for your device.
Note: By default, the "Build and Load" build action deploys the node to the device, but does not
automatically run it. If you want the node to run immediately after code generation, use the "Build
and Run" build action, instead.
cfg = coder.config('exe');
cfg.Hardware = coder.hardware('Robot Operating System (ROS)');
cfg.Hardware.BuildAction = 'Build and load';
cfg.Hardware.CatkinWorkspace = '~/catkin_ws';
cfg.Hardware.RemoteDeviceAddress = '192.168.192.135';
cfg.Hardware.RemoteDeviceUsername = 'user';
cfg.Hardware.RemoteDevicePassword = 'password';
cfg.Hardware.DeployTo = 'Remote Device';
codegen helperROSVisualSLAM -args {} -config cfg -std:c++11
Configure Visualization
Use the helperVisualSLAMViewer object to create a viewer that visualizes map points along with
the camera trajectory and the current camera pose.
viewer = helperVisualSLAMViewer(zeros(0,3),rigidtform3d(eye(4)));
Create a ROS subscriber to visualize map points and camera poses published by the deployed visual
SLAM node. Assign helperVisualizePointsAndPoses function as a callback to be triggered
whenever the subscriber receives a message from the deployed node.
visualizeSub = rossubscriber('/visualizePoints', 'std_msgs/Float64MultiArray', @(varargin)helperVisualizePointsAndPoses(varargin{:},viewer));
On the Ubuntu desktop of the virtual machine, click the ROS Noetic Terminal icon. Source the
catkin workspace.
$ source ~/catkin_ws/devel/setup.bash
To help the deployed node access library dependencies, append /usr/local/lib path to the
environment variable, LD_LIBRARY_PATH.
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
Navigate to the source directory of the deployed helperrosvisualslam node on the remote device.
You must run the node from this directory because the bag of features file used by the deployed node
is present in this directory.
$ cd ~/catkin_ws/src/helperrosvisualslam/src/
Start playing the ROS bag file in a separate ROS Noetic Terminal.
Disconnect
Disconnect from ROS Network after the nodes have finished execution.
rosshutdown
References
[1] Sturm, Jürgen, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. "A
benchmark for the evaluation of RGB-D SLAM systems". In Proceedings of IEEE/RSJ International
Conference on Intelligent Robots and Systems, pp. 573-580, 2012
Helper Functions
Monocular Visual-Inertial Odometry Using Factor Graph
Monocular visual-inertial odometry estimates the position and orientation of the robot using camera and inertial measurement unit (IMU) sensor data. Camera-based state estimation is accurate during low-speed navigation. However, camera-based estimation faces challenges such as motion blur and track loss at higher speeds. Also, a monocular camera can estimate poses only at an arbitrary scale. In contrast, inertial navigation handles high-speed navigation easily and estimates poses at world scale. You can combine the advantages of both types of sensor data to achieve better accuracy using tightly coupled factor graph optimization. For good execution-time performance, only a small portion of the entire factor graph, containing only the most recent measurements, is optimized at every optimization step. This variant of factor graph optimization is popularly referred to as sliding window or partial graph optimization.
Overview
The visual-inertial system implemented in this example consists of a simplified version of the
monocular visual odometry front-end of the VINS [1 on page 1-53] algorithm and a factor graph
back-end.
The visual odometry front-end has responsibilities similar to standard structure from motion (SfM)
algorithms, such as oriented FAST and rotated BRIEF (ORB) and simultaneous localization and
mapping (SLAM). The visual odometry front-end detects and tracks key points across multiple
frames, estimates camera poses, and triangulates 3-D points using multi-view geometry. The factor
graph back-end jointly optimizes the estimated camera poses, 3-D points, IMU velocity, and bias.
Before fusing the camera and IMU measurements, you must align the camera and IMU to compute
the camera pose scale, gravity rotation, and initial IMU velocity and bias.
Set Up
This example uses the Blackbird data set (NYC Subway Winter) to demonstrate the visual-inertial
odometry workflow. Download the data set.
data = helperDownloadData();
rng(0)
• Sliding window size (windowSize) - Maximum number of recent frames or pose nodes to optimize in the factor graph. Usually, state estimation happens incrementally, meaning that the system sequentially processes the frame data, such as camera images and IMU measurements, at each time step to estimate the robot pose or state at that particular time step. The factor graph optimization can use all the sensor measurements and estimates up to the current frame to refine the state estimates. Using all the frame data in the factor graph at every optimization step produces more accurate solutions, but can be very computationally expensive. To improve execution-time performance, you can consider only a few recent frame measurements for optimization. This type of optimization is referred to as sliding window optimization or partial graph optimization.
• Frame rate at which to run the factor graph optimization (optimizationFrequency) - After processing a fixed number of frames, specified by optimizationFrequency, the factor graph optimization refines the pose estimates. Calling graph optimization after every frame produces more accurate estimates but increases execution time.
params = helperVIOParameters();
% Set to true if IMU data is available.
useIMU = true;
Initialize variables.
status = struct("firstFrame",true,"isMapInitialized",false,"isIMUAligned",false,"Mediandepth",false);
% Set to true to attempt camera-IMU alignment.
readyToAlignCameraAndIMU = false;
% Set initial scene median depth.
initialSceneMedianDepth = 4;
viewId = 0;
removedFrameIds = [];
allCameraTracks = cell(1,5000);
% Enable visualization.
vis = true;
showMatches = false;
if vis
% Figure variables.
axMatches = [];
axTraj = [];
axMap = [];
end
Set up factor graph for back-end tightly coupled factor graph optimization.
Create the point tracker to track key points across multiple frames.
fManager = helperFeaturePointManager(data.intrinsics,params,maxFrames,maxLandmarks);
% Set up the key point detector.
fManager.DetectorFunc = @(I)helperDetectKeyPoints(I);
vSet = imageviewset;
Specify the first and last frames to process from the data set. Then, process the first frame.
% In the first frame, detect new key points and initialize the tracker for
% future tracking.
status.firstFrame = false;
I = data.images{startFrameIdx};
if params.Equalize
% Enhance contrast if images are dark.
I = adapthisteq(I,NumTiles=params.NumTiles,ClipLimit=params.ClipLimit);
end
if params.Undistort
% Undistort if images contain perspective distortion.
I = undistortImage(I,data.intrinsics);
end
% Assign a unique view ID for each processed camera frame or image.
viewId = viewId + 1;
currPoints = createNewFeaturePoints(fManager,I);
updateSlidingWindow(fManager,I,currPoints,true(size(currPoints,1),1),viewId);
initialize(tracker, currPoints, I);
prevI = I;
firstI = I;
vSet = addView(vSet,viewId,rigidtform3d);
Image Preprocessing
• Equalize — Enhance the contrast of an image to correct for dim lighting, which can affect feature
extraction and tracking.
• Undistort — Correct for radial and tangential distortions that can impact state estimation.
Feature Tracking
To compute a camera frame pose, you must calculate 2D-2D correspondences (2-D image point tracks
across multiple frames). There are several ways to estimate 2-D feature points that see the same
landmark (key point tracks), but this example uses a Kalman tracker for tracking feature points in
multiple images.
Tracks are not all accurate and can contain outliers. Tracking performance also depends on the
Kalman tracker parameters, such as bidirectional error. Even in an ideal case, you can expect some
invalid tracks, such as those due to repetitive structures. As such, outlier rejection is a critical task in
feature tracking. To reject outliers from tracks, use fundamental matrix decomposition in the feature
point manager while updating the sliding window with the latest feature point tracks.
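A hedged sketch of the tracking loop, assuming the point tracker is a vision.PointTracker object, as the initialize and setPoints calls in this example suggest (the bidirectional error threshold is illustrative):
% Create and initialize the tracker with key points from the first frame.
tracker = vision.PointTracker(MaxBidirectionalError=1);
initialize(tracker,currPoints,firstImage);
% Track the same points into the next frame; validity flags mark lost points.
[trackedPoints,validity] = tracker(nextImage);
trackedPoints = trackedPoints(validity,:);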
Visualize the feature point tracks between the last key frame and current frame.
if status.isMapInitialized
svIds = getSlidingWindowIds(fManager);
if length(svIds) > 2
[matches1,matches2] = get2DCorrespondensesBetweenViews(fManager,svIds(end-2),viewId);
end
end
The accelerometer and gyroscope readings of the IMU data contain some bias and noise. To estimate
bias values, you must obtain accurate pose estimates between the first few frames. You can achieve
this by using SfM. SfM involves these major steps:
• When there is enough parallax between the first key frame and the current frame, estimate the
relative pose between the two, using 2D-2D correspondences (key point tracks across multiple
frames).
• Triangulate 3-D points using the world poses of key frames and 2D-2D correspondences.
• Track the 3-D points in the current frame, and compute the current frame pose using 3D-2D
correspondences.
if ~status.isMapInitialized
if windowState.FirstFewViews
% Accept the first few camera views.
vSet = addView(vSet,viewId,rigidtform3d);
elseif windowState.EnoughParallax
% Estimate relative pose between the first key frame in the
% window and the current frame.
svIds = getSlidingWindowIds(fManager);
[matches1,matches2] = get2DCorrespondensesBetweenViews(fManager,svIds(end-1),svIds(end));
valRel = false(size(matches1,1),1);
for k = 1:10
[F1,valRel1] = estimateFundamentalMatrix( ...
matches1,matches2,Method="RANSAC", ...
NumTrials=params.F_Iterations,DistanceThreshold=params.F_Threshold, ...
Confidence=params.F_Confidence);
if length(find(valRel)) < length(find(valRel1))
valRel = valRel1;
F = F1;
end
end
inlierPrePoints = matches1(valRel,:);
inlierCurrPoints = matches2(valRel,:);
relPose = estrelpose(F,data.intrinsics, ...
inlierPrePoints,inlierCurrPoints);
%vSet = addView(vSet,svIds(end-1),currPose);
vSet = addView(vSet,viewId,currPose);
status.isMapInitialized = true;
if vis
axisSFM = axes(figure); %#ok
showMatchedFeatures(firstI,I,matches1,matches2, ...
Parent=axisSFM);
title(axisSFM,"Enough Parallax Between Key Frames");
end
end
else
Camera-IMU Alignment
To optimize camera and IMU measurements, you must align them by bringing them to the same base
coordinate frame and scale. Alignment primarily consists of these major tasks:
• Compute the camera pose scale to make it similar to the IMU or world scale.
• Calculate the gravity rotation required to rotate gravity vector from local navigation reference
frame of IMU to initial camera reference frame. The inverse of this rotation aligns the z-axis of the
camera with the local navigation reference frame.
• Estimate the initial IMU bias.
if ~status.isIMUAligned && readyToAlignCameraAndIMU
svIds = getSlidingWindowIds(fManager);
% Because you have not yet computed the latest frame pose,
% use only the past few frames for alignment.
svIds = svIds(1:end-1);
If the alignment is successful, update the camera poses, 3-D points, and add IMU factors between the
initial frames in the current sliding window.
if status.isIMUAligned
% After alignment, add IMU factors to factor graph.
for k = 1:length(gyro)
nId = [ids.pose(svIds(k)),ids.vel(svIds(k)),ids.bias(svIds(k)), ...
ids.pose(svIds(k+1)),ids.vel(svIds(k+1)),ids.bias(svIds(k+1))];
fIMU = factorIMU(nId,gyro{k},accel{k},imuParams,SensorTransform=data.camToIMUTransform);
slidingWindowFactorGraph.addFactor(fIMU);
end
end
Estimate an initial guess for IMU bias by using factor graph optimization with the camera projection
and IMU factors.
fixNode(slidingWindowFactorGraph,ids.pose(svIds(1)));
soll = optimize(slidingWindowFactorGraph, ...
params.SolverOpts);
fixNode(slidingWindowFactorGraph,ids.pose(svIds(1)),false);
When IMU data is available, you can predict the world pose of the camera by integrating
accelerometer and gyroscope readings. Use factor graph optimization to further refine this
prediction.
imuGuess = false;
if status.isIMUAligned
% Extract gyro and accel reading between current image frame
% and last acquired image frame to create IMU factor.
svIds = getSlidingWindowIds(fManager);
svs = svIds((end-1):end);
[gyro,accel] = helperExtractIMUDataBetweenViews(data.gyroReadings, ...
data.accelReadings,data.timeStamps,allFrameIds(svs));
nodeID = [ids.pose(svs(1)) ...
ids.vel(svs(1)) ...
ids.bias(svs(1)) ...
ids.pose(svs(2)) ...
ids.vel(svs(2)) ...
ids.bias(svs(2))];
% Create the transformation required to transform a camera pose
% to the IMU base frame for the IMU residual computation.
fIMU = factorIMU(nodeID,gyro{1},accel{1},imuParams, ...
SensorTransform=data.camToIMUTransform);
[currPoints,pointIds,isTriangulated] = getKeyPointsInView(fManager,viewId);
cVal = true(size(currPoints,1),1);
cTrf = find(isTriangulated);
If no IMU prediction is available, then use 3D-2D correspondences to estimate the current view pose.
if ~imuGuess
x3D = getXYZPoints(fManager,pointIds(isTriangulated));
c2D = currPoints(isTriangulated,:);
ii = false(size(x3D,1),1);
currPose = rigidtform3d;
for k = 1:params.F_loop
[currPosel,iil] = estworldpose( ...
currPoints(isTriangulated,:),x3D, ...
data.intrinsics,MaxReprojectionError=params.F_Threshold,Confidence=params.F_Confidence, ...
MaxNumTrials=params.F_Iterations);
if length(find(ii)) < length(find(iil))
ii = iil;
currPose = currPosel;
end
end
cVal(cTrf(~ii)) = false;
else
Use the IMU predicted pose as an initial guess for motion-only bundle adjustment.
x3D = getXYZPoints(fManager,pointIds(isTriangulated));
c2D = currPoints(isTriangulated,:);
[currPose,velRefined,biasRefined,ii] = helperBundleAdjustmentMotion( ...
x3D,c2D,data.intrinsics,size(I),pp,pv,prevP,prevVel,prevBias,fIMU);
slidingWindowFactorGraph.nodeState( ...
ids.vel(viewId),velRefined);
slidingWindowFactorGraph.nodeState( ...
ids.bias(viewId),biasRefined);
cVal(cTrf(~ii)) = false;
end
setKeyPointValidityInView(fManager,viewId,cVal);
vSet = addView(vSet,viewId,currPose);
Add camera projection factors related to the 3-D point tracks of the current view.
obs2 = pointIds(isTriangulated);
obs2 = obs2(ii);
fCam = factorCameraSE3AndPointXYZ( ...
[ids.pose(viewId*ones(size(obs2))) ids.point3(obs2)], ...
data.intrinsics.K,Measurement=c2D(ii,:), ...
Information=cameraInformation);
allCameraTracks{viewId} = [viewId*ones(size(obs2)) obs2 fCam.Measurement];
slidingWindowFactorGraph.addFactor(fCam);
end
When using the latest 2D-2D correspondences for camera-world pose estimation, you must frequently
create new 3-D points.
if status.isMapInitialized
[newXYZ,newXYZID,newPointViews,newPointObs] = triangulateNew3DPoints(fManager,vSet);
Factor graph optimization reduces the error in trajectory or camera pose estimation. Various factors,
like inaccurate tracking and outliers, can contribute to estimation errors.
Graph optimization adjusts camera pose node estimates to satisfy various sensor measurement constraints, such as camera observations (the projection of a 3-D point onto the image frame, which generates a 2-D image point observation), IMU relative poses, and relative velocity changes. You can categorize the optimization based on the type of factors used. The two important categories are the following.
• Visual-inertial optimization — Along with camera measurements, add IMU measurements, like
gyroscope and accelerometer readings, to the graph by using factorIMU (Navigation Toolbox).
The visual-inertial factor graph system consists of the following node types connected using different
factors:
• Camera pose nodes at different timestamps - These are connected to other nodes using
camera projection and IMU factors. The camera pose node estimates are computed using SfM.
• 3-D point landmark nodes - These are connected to camera pose nodes using camera projection
factors. The landmark node estimates are computed using SfM.
• IMU velocity nodes at different timestamps - These are connected to other nodes using IMU
factors. The velocity node estimates are computed using IMU preintegration.
• IMU bias nodes at different timestamps - These are connected to other nodes using IMU factors. The bias node estimates are usually a result of the factor graph optimization; they are unknown before the optimization.
Each timestep contains a pose node, a velocity node, and a bias node. Each of these nodes in one time
step is connected to the pose, velocity and bias nodes of another time step using a factorIMU object.
When the UAV observes feature points using the camera, a factorCameraSE3AndPointXYZ object
relates the observed feature points to the pose node of the observed time step. This process repeats
for each new time step of the UAV flight.
Update the sliding window with the latest 3-D points and camera view pose.
newPointsTriangulated = false;
if ~isempty(newXYZ)
newPointsTriangulated = true;
% Store all new 3-D to 2-D correspondences.
for pIte = 1:length(newPointViews)
allCameraTracks{newPointViews(pIte)} = [allCameraTracks{newPointViews(pIte)}; newPointObs{pIte}];
end
obs = vertcat(newPointObs{:});
% Create camera projection factors with the latest
% 3-D point landmark observations in the current image.
fCam = factorCameraSE3AndPointXYZ([ids.pose(obs(:,1)) ...
ids.point3(obs(:,2))],data.intrinsics.K, ...
Measurement=obs(:,3:4), ...
Information=cameraInformation);
addFactor(slidingWindowFactorGraph,fCam);
end
Refine the estimated camera frame poses and 3-D points using factor graph optimization. Because the optimization is time consuming, it does not run after estimating the pose of every frame. You can control the frame frequency at which the optimization runs by using a parameter.
if helperDecideToRunGraphOptimization(curIdx,newPointsTriangulated,params)
% Use partial factor graph optimization with only the latest key
% frames, for performance.
nodeIdsToOptimize = ids.pose(slidingWindowViewIds);
xyzIds = getPointIdsInViews(fManager,slidingWindowViewIds);
Update the feature manager and view set with your optimization results.
if ~status.Mediandepth
status.Mediandepth = true;
xyz = slidingWindowFactorGraph.nodeState( ...
ids.point3(xyzIds));
medianDepth = median(vecnorm(xyz.'));
[posesUpdated,xyz] = helperTransformToNavigationFrame(helperUpdateCameraPoseTable(poses(vSet,slidingWindowViewIds), ...
slidingWindowFactorGraph.nodeState(ids.pose(slidingWindowViewIds))), ...
xyz,rigidtform3d,initialSceneMedianDepth/medianDepth);
% Set current camera pose node state guess.
slidingWindowFactorGraph.nodeState(ids.pose(slidingWindowViewIds), ...
helperCameraPoseTableToSE3Vector(posesUpdated));
% Add guess for newly triangulated 3-D points node states.
slidingWindowFactorGraph.nodeState( ...
ids.point3(xyzIds),xyz);
else
posesUpdated = helperUpdateCameraPoseTable(poses(vSet,slidingWindowViewIds), ...
slidingWindowFactorGraph.nodeState( ...
ids.pose(slidingWindowViewIds)));
end
Add new feature points to the tracker when the number of tracked points falls below the feature tracking threshold.
createNewFeaturePoints(fManager,I);
currPoints = getKeyPointsInView(fManager,viewId);
setPoints(tracker,currPoints);
prevPrevI = prevI;
prevI = I;
Plot all key frame camera poses and 3-D points. Observe the landmarks on features such as the
ceiling, floor, and pillars.
helperPlotCameraPosesAndLandmarks(axMap,fManager,vSet,removedFrameIds);
• Absolute trajectory error (ATE) - Root mean squared error (RMSE) between the computed camera locations and the ground truth camera locations.
• Scale error - Percentage difference between the computed median scale and the original scale.
addedFrameIds = allFrameIds(vSet.Views.ViewId);
axf = axes(figure);
helperPlotAgainstGroundTruth(vSet,data.gTruth,data.camToIMUTransform, ...
addedFrameIds,axf,removedFrameIds);
Evaluate the tracking accuracy based on the root mean square error (RMSE) and the median scale error.
helperComputeErrorAgainstGroundTruth(data.gTruth,vSet,allFrameIds,removedFrameIds,data.camToIMUTr
Supporting Functions
This section details the short helper functions included in this example. Larger helper functions are
included in separate files.
helperVisualizeTrajectory updates the trajectory plot with the latest data stored in the view set
and the feature manager.
helperPlotAgainstGroundTruth plots the estimated trajectory and the ground truth trajectory for
visual comparison.
helperGenerateNodeID generates unique factor graph node IDs for a fixed number of camera view
poses, IMU velocities, IMU biases, and 3-D point nodes.
helperUpdateCameraPoseTable updates the pose table with the latest estimated N-by-7 SE(3) poses.
function cameraPoseTableUpdated = helperUpdateCameraPoseTable(cameraPoseTable,cameraPoses)
cameraPoseTableUpdated = cameraPoseTable;
R = quat2rotm(cameraPoses(:,4:7));
for k = 1:size(cameraPoses,1)
cameraPoseTableUpdated.AbsolutePose(k).Translation = cameraPoses(k,1:3);
cameraPoseTableUpdated.AbsolutePose(k).R = R(:,:,k);
end
end
% If the current frame belongs to the initial set of frames, then run graph
% optimization every frame, because the initial SfM is still running.
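The comment above belongs to the frame-gating logic of helperDecideToRunGraphOptimization. A minimal sketch of such gating logic is shown below; the function name and the parameter fields numInitialFrames and optimizationFrequency are illustrative assumptions, not the example's actual implementation.
function tf = exampleDecideToRunGraphOptimization(curIdx,newPointsTriangulated,params)
% Hedged sketch of frame-gated graph optimization. The params field names
% are assumptions for illustration.
if curIdx <= params.numInitialFrames
    % Initial SfM stage: optimize every frame.
    tf = true;
else
    % Afterwards, optimize only every few frames, or when new 3-D points
    % were triangulated.
    tf = newPointsTriangulated || mod(curIdx,params.optimizationFrequency) == 0;
end
end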
helperTransformToNavigationFrame transforms and scales the input poses and XYZ 3-D points to the
local navigation reference frame of the IMU using the gravity rotation and pose scale.
function [posesUpdated,xyzUpdated] = helperTransformToNavigationFrame(poses,xyz,gRot,poseScale)
posesUpdated = poses;
% Input gravity rotation transforms the gravity vector from local
% navigation reference frame to initial camera pose reference frame.
% The inverse of this transforms the poses from camera reference frame
% to local navigation reference frame.
Ai = gRot.A';
for k = 1:length(poses.AbsolutePose)
T = Ai*poses.AbsolutePose(k).A;
T(1:3,4) = poseScale*T(1:3,4);
posesUpdated.AbsolutePose(k) = rigidtform3d(T);
end
% Transform points from initial camera pose reference frame to
% local navigation reference frame of IMU.
xyzUpdated = poseScale*gRot.transformPointsInverse(xyz);
end
len = length(frameIds);
gyro = cell(1,len-1);
accel = cell(1,len-1);
for k = 2:len
% Assumes the IMU data is time-synchronized with the camera data. Compute
% indices of accelerometer readings between consecutive view IDs.
[~,ind1] = min(abs(timeStamps.imuTimeStamps - timeStamps.imageTimeStamps(frameIds(k-1))));
[~,ind2] = min(abs(timeStamps.imuTimeStamps - timeStamps.imageTimeStamps(frameIds(k))));
imuIndBetweenFrames = ind1:(ind2-1);
% Extract the data at the computed indices and store in a cell.
gyro{k-1} = gyroReadings(imuIndBetweenFrames,:);
accel{k-1} = accelReadings(imuIndBetweenFrames,:);
end
end
function helperPlotCameraPosesAndLandmarks(axisHandle,fManager,vSet,removedFrameIds,plotCams)
% helperPlotCameraPosesAndLandmarks plots the key frame camera poses and
% triangulated 3-D point landmarks.
if nargin < 5
pcshow(xyzPoints(indToPlot,:),Parent=axisHandle,Projection="orthographic");
hold(axisHandle,"on")
if plotCams
c = table(camPoses.AbsolutePose,VariableNames={'AbsolutePose'});
plotCamera(c,Parent=axisHandle,Size=0.25);
title(axisHandle,"Initial Structure from Motion")
else
traj = vertcat(camPoses.AbsolutePose.Translation);
plot3(traj(:,1),traj(:,2),traj(:,3),"r-",Parent=axisHandle);
view(axisHandle,27.28,-2.81)
title(axisHandle,"Estimated Trajectory and Landmarks")
end
hold off
drawnow
end
helperDownloadData downloads the data set from the specified URL to the specified output folder.
vioDataTarFile = matlab.internal.examples.downloadSupportFile(...
'shared_nav_vision/data','BlackbirdVIOData.tar');
vioData = load(fullfile(outputFolder,"BlackbirdVIOData","data.mat"));
end
References
[1] Qin, Tong, Peiliang Li, and Shaojie Shen. “VINS-Mono: A Robust and Versatile Monocular Visual-
Inertial State Estimator.” IEEE Transactions on Robotics 34, no. 4 (August 2018): 1004–20.
https://doi.org/10.1109/TRO.2018.2853729
[2] Antonini, Amado, Winter Guerra, Varun Murali, Thomas Sayre-McCord, and Sertac Karaman. “The
Blackbird Dataset: A Large-Scale Dataset for UAV Perception in Aggressive Flight.” In Proceedings of
the 2018 International Symposium on Experimental Robotics, edited by Jing Xiao, Torsten Kröger, and
Oussama Khatib, 11:130–39. Cham: Springer International Publishing, 2020.
https://doi.org/10.1007/978-3-030-33950-0_12
Visual SLAM with an RGB-D Camera
Visual simultaneous localization and mapping (vSLAM), refers to the process of calculating the
position and orientation of a camera with respect to its surroundings, while simultaneously mapping
the environment.
You can perform vSLAM using a monocular camera. However, the depth cannot be accurately
calculated, and the scale of the estimated trajectory is unknown and drifts over time. In addition,
because the initial map cannot be triangulated from the first frame, you must use multiple views of a
monocular camera to produce it. A better, more reliable solution is to use an RGB-D camera, which
provides a pair of images consisting of one RGB color image and one depth image.
This example shows how to process RGB-D image data to build a map of an indoor environment and
estimate the trajectory of the camera. The example uses a version of the ORB-SLAM2 [1] algorithm,
which is feature-based and supports RGB-D cameras.
The pipeline for RGB-D vSLAM is very similar to the monocular vSLAM pipeline in the “Monocular
Visual Simultaneous Localization and Mapping” on page 1-122 example. The major difference is that
in the Map Initialization stage, the 3-D map points are created from a pair of images consisting of
one color image and one depth image instead of two frames of color images.
• Map Initialization: The initial 3-D world points can be constructed by extracting ORB feature
points from the color image and then computing their 3-D world locations from the depth image.
The color image is stored as the first key frame.
• Tracking: Once a map is initialized, the pose of the camera is estimated for each new RGB-D
image by matching features in the color image to features in the last key frame.
• Local Mapping: If the current color image is identified as a key frame, new 3-D map points are
computed from the depth image. At this stage, bundle adjustment is used to minimize reprojection
errors by adjusting the camera pose and 3-D points.
• Loop Closure: Loops are detected for each key frame by comparing it against all previous key
frames using the bag-of-features approach. Once a loop closure is detected, the pose graph is
optimized to refine the camera poses of all the key frames.
The data used in this example is from the TUM RGB-D benchmark [2]. You can download the data to a
temporary folder using a web browser or by running the following code:
baseDownloadURL = "https://cvg.cit.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_offi
dataFolder = fullfile(tempdir, 'tum_rgbd_dataset', filesep);
options = weboptions('Timeout', Inf);
tgzFileName = [dataFolder, 'fr3_office.tgz'];
folderExists = exist(dataFolder, 'dir');
Downloading fr3_office.tgz (1.38 GB). This download can take a few minutes.
Create two imageDatastore objects to store the color and depth images, respectively.
imgFolderColor = [imageFolder,'rgb/'];
imgFolderDepth = [imageFolder,'depth/'];
imdsColor = imageDatastore(imgFolderColor);
imdsDepth = imageDatastore(imgFolderDepth);
Note that the color and depth images are not synchronized in this data set. Therefore, you must
associate each color image with a depth image based on the time stamps.
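For reference, a minimal sketch of one way to perform this association is shown below. The variables rgbTimeStamps and depthTimeStamps and the tolerance value are assumptions for illustration; the example uses its own helper for this step.
% Hedged sketch: pair each color image with the depth image whose time
% stamp is closest, within a tolerance. Variable names are illustrative.
maxTimeDiff = 1/30;                        % assumed tolerance in seconds
depthIdxForColor = zeros(numel(rgbTimeStamps),1);
for k = 1:numel(rgbTimeStamps)
    [minDiff,idx] = min(abs(depthTimeStamps - rgbTimeStamps(k)));
    if minDiff < maxTimeDiff
        depthIdxForColor(k) = idx;         % depth frame matched to color frame k
    end
end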
Map Initialization
The pipeline starts by initializing the map that holds 3-D world points. This step is crucial and has a
significant impact on the accuracy of the final SLAM result. Initial ORB feature points are extracted
from the first color image using helperDetectAndExtractFeatures on page 1-65. Their
corresponding 3-D world locations can be computed from the pixel coordinates of the feature points
and the depth value using helperReconstructFromRGBD on page 1-65.
initialPose = rigidtform3d();
[xyzPoints, validIndex] = helperReconstructFromRGBD(currPoints, currIdepth, intrinsics, initialPo
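The core computation performed by such a reconstruction can be sketched as follows. This is not the helper's actual code; it assumes that depthFactor converts raw depth units to meters and that the feature locations can be rounded to pixel indices.
% Hedged sketch: back-project feature pixels using their depth values.
uv = round(currPoints.Location);                          % [u v] pixel coordinates
z = double(currIdepth(sub2ind(size(currIdepth),uv(:,2),uv(:,1))))/depthFactor;
x = (uv(:,1) - intrinsics.PrincipalPoint(1)).*z/intrinsics.FocalLength(1);
y = (uv(:,2) - intrinsics.PrincipalPoint(2)).*z/intrinsics.FocalLength(2);
% Transform the points from the camera frame of the initial pose to the
% world frame.
xyzWorld = transformPointsForward(initialPose,[x y z]);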
Loop detection is performed using the bag-of-words approach. A visual vocabulary, represented as a
bagOfFeatures object, is created offline with the ORB descriptors extracted from a large set of
images in the dataset by calling:
bag = bagOfFeatures(imds,CustomExtractor=@helperORBFeatureExtractorFunction,
TreeProperties=[5, 10], StrongestFeatures=1);
After the map is initialized using the first pair of color and depth images, you can use imageviewset
and worldpointset to store the first key frame and the corresponding map points:
% Show legend
showLegend(mapPlot);
Tracking
The tracking process is performed using every RGB-D image and determines when to insert a new
key frame.
currFrameIdx = 2;
isLoopClosed = false;
1 ORB features are extracted for each new color image and then matched (using matchFeatures)
with features in the last key frame that have known corresponding 3-D map points.
2 Estimate the camera pose with the Perspective-n-Point (PnP) algorithm, which estimates the pose
of a calibrated camera given a set of 3-D points and their corresponding 2-D projections, using
estworldpose.
3 Given the camera pose, project the map points observed by the last key frame into the current
frame and search for feature correspondences using matchFeaturesInRadius.
4 With 3-D to 2-D correspondences in the current frame, refine the camera pose by performing a
motion-only bundle adjustment using bundleAdjustmentMotion.
5 Project the local map points into the current frame to search for more feature correspondences
using matchFeaturesInRadius and refine the camera pose again using
bundleAdjustmentMotion.
6 The last step of tracking is to decide if the current frame should be a new key frame. A frame is a
key frame if both of the following conditions are satisfied:
• At least 20 frames have passed since the last key frame or the current frame tracks fewer than
100 map points or 25% of points tracked by the reference key frame.
• The map points tracked by the current frame are fewer than 90% of points tracked by the
reference key frame.
If the current frame is to become a key frame, continue to the Local Mapping process. Otherwise,
start Tracking for the next frame.
% Main loop
isLastFrameKeyFrame = true;
while ~isLoopClosed && currFrameIdx < numel(imdsColor.Files)
% Track the local map and check if the current frame is a key frame.
% A frame is a key frame if both of the following conditions are satisfied:
%
% 1. At least 20 frames have passed since the last key frame or the
% current frame tracks fewer than 100 map points.
% 2. The map points tracked by the current frame are fewer than 90% of
% points tracked by the reference key frame.
%
% localKeyFrameIds: ViewId of the connected key frames of the current frame
numSkipFrames = 20;
numPointsKeyFrame = 100;
[localKeyFrameIds, currPose, trackedMapPointsIdx, trackedFeatureIdx, isKeyFrame] = ...
helperTrackLocalMap(mapPointSet, vSetKeyFrames, trackedMapPointsIdx, ...
trackedFeatureIdx, currPose, currFeatures, currPoints, intrinsics, scaleFactor, numLevels
isLastFrameKeyFrame, lastKeyFrameIdx, currFrameIdx, numSkipFrames, numPointsKeyFrame);
if ~isKeyFrame
currFrameIdx = currFrameIdx + 1;
isLastFrameKeyFrame = false;
continue
else
% Reconstruct the 3-D world positions from the current color and depth images
[xyzPoints, validIndex] = helperReconstructFromRGBD(currPoints, currIdepth, ...
intrinsics, currPose, depthFactor);
Local Mapping
Local mapping is performed for every key frame. When a new key frame is determined, add it to the
key frames and update the attributes of the map points observed by the new key frame. To ensure
that mapPointSet contains as few outliers as possible, a valid map point must be observed in at least
3 key frames.
New map points are created by triangulating ORB feature points in the current key frame and its
connected key frames. For each unmatched feature point in the current key frame, search for a match
with other unmatched points in the connected key frames using matchFeatures. The local bundle
adjustment refines the pose of the current key frame, the poses of connected key frames, and all the
map points observed in these key frames.
% Remove outlier map points that are observed in fewer than 3 key frames
if currKeyFrameId == 2
triangulatedMapPointsIdx = [];
end
Loop Closure
The loop closure detection step takes the current key frame processed by the local mapping process
and tries to detect and close the loop. Loop candidates are identified by querying images in the
database that are visually similar to the current key frame using evaluateImageRetrieval. A
candidate key frame is valid if it is not connected to the last key frame and three of its neighbor key
frames are loop candidates.
When a valid loop candidate is found, use estgeotform3d to compute the relative pose between the
loop candidate frame and the current key frame. The relative pose represents a 3-D rigid
transformation stored in a rigidtform3d object. Then add the loop connection with the relative
pose and update mapPointSet and vSetKeyFrames.
% Check loop closure after some key frames have been created
if currKeyFrameId > 20
if isDetected
% Add loop closure connections
maxDistance = 0.1;
[isLoopClosed, mapPointSet, vSetKeyFrames] = helperAddLoopConnectionsStereo(...
mapPointSet, vSetKeyFrames, validLoopCandidates, currKeyFrameId, ...
currFeatures, currPoints, loopEdgeNumMatches, maxDistance);
end
end
Finally, apply pose graph optimization over the essential graph in vSetKeyFrames to correct the
drift. The essential graph is created internally by removing connections with fewer than
minNumMatches matches in the covisibility graph. After pose graph optimization, update the 3-D
locations of the map points using the optimized poses.
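A minimal sketch of this step is shown below; the argument order of helperUpdateGlobalMap is inferred from its listing in Supporting Functions and may differ.
% Hedged sketch: optimize the pose graph over the essential graph, then
% update the map points and extract the optimized key frame poses.
vSetKeyFramesOptim = optimizePoses(vSetKeyFrames,minNumMatches);
mapPointSet = helperUpdateGlobalMap(mapPointSet,vSetKeyFrames,vSetKeyFramesOptim);
optimizedPoses = poses(vSetKeyFramesOptim);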
plotOptimizedTrajectory(mapPlot, optimizedPoses)
% Update legend
showLegend(mapPlot);
You can compare the optimized camera trajectory with the ground truth to evaluate the accuracy. The
downloaded data contains a groundtruth.txt file that stores the ground truth camera pose of each
frame. The data has been saved in the form of a MAT-file. You can also calculate the root mean square
error (RMSE) of the trajectory estimates.
% Show legend
showLegend(mapPlot);
Given the refined camera poses, you can reproject all the valid image points in the associated depth
images back to the 3-D space to perform dense reconstruction.
for i = 1: numel(addedFramesIdx)
Icolor = readimage(imdsColor, addedFramesIdx(i));
Idepth = readimage(imdsDepth, addedFramesIdx(i));
figure
pcshow(pointCloudsAll,VerticalAxis="y", VerticalAxisDir="down");
xlabel('X')
ylabel('Y')
zlabel('Z')
Supporting Functions
Short helper functions are listed below. Larger functions are included in separate files.
% Input handling
dataLines = [4, Inf];
helperDetectAndExtractFeatures detects and extracts ORB features from the image.
function [features, validPoints] = helperDetectAndExtractFeatures(Irgb, scaleFactor, numLevels)
numPoints = 1000;
% Extract features
[features, validPoints] = extractFeatures(Igray, points);
end
helperUpdateGlobalMap updates the 3-D locations of map points after pose graph optimization.
posesOld = vSetKeyFrames.Views.AbsolutePose;
posesNew = vSetKeyFramesOptim.Views.AbsolutePose;
positionsOld = mapPointSet.WorldPoints;
positionsNew = positionsOld;
indices = 1:mapPointSet.Count;
% Update world location of each map point based on the new absolute pose of
% the corresponding major view
for i = 1: mapPointSet.Count
majorViewIds = mapPointSet.RepresentativeViewId(i);
tform = rigidtform3d(posesNew(majorViewIds).A/posesOld(majorViewIds).A);
positionsNew(i, :) = transformPointsForward(tform, positionsOld(i, :));
end
mapPointSet = updateWorldPoints(mapPointSet, indices, positionsNew);
end
References
[1] Mur-Artal, Raul, and Juan D. Tardos. “ORB-SLAM2: An Open-Source SLAM System for Monocular,
Stereo, and RGB-D Cameras.” IEEE Transactions on Robotics 33, no. 5 (October 2017): 1255–62.
[2] Sturm, Jürgen, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. “A
Benchmark for the Evaluation of RGB-D SLAM Systems.” In 2012 IEEE/RSJ International Conference
on Intelligent Robots and Systems, 573–80, 2012.
Import Stereo Camera Parameters from ROS
The ROS camera calibration package estimates stereo camera parameters using the OpenCV camera
calibration tools [1]. After calibrating a stereo camera in ROS, you can export its camera parameters
to an INI file using the camera calibration parser. To use the calibrated stereo camera with Computer
Vision Toolbox™ functions, such as rectifyStereoImages, you must read the camera parameters
from the INI file and convert them into a stereoParameters object using
stereoParametersFromOpenCV.
Read the stereo camera parameters stored in stereoParams.ini using the helper function
helperReadINI.
stereoParamsINI = helperReadINI("stereoParams.ini");
The baseline parameters of a stereo camera describe the relative translation and rotation of the two
cameras in the stereo camera pair. The relative rotation and translation of camera 2 with respect to
camera 1 is required to create the stereoParameters object using
stereoParametersFromOpenCV. You can compute these from the rectification and projection
matrices read from the ROS INI file [2].
Extract the translation of camera 2 relative to camera 1 from the last column of the projection matrix.
translationOfCamera2 = cameraParams2.projection(:,end);
The rotation of camera 2 relative to camera 1, R21, is derived from the rectification matrices of the
stereo pair, R1 and R2. The rectification matrices are the rotation matrices that align the camera
coordinate systems to the ideal stereo image plane such that epipolar lines in both stereo images are
parallel. Compute the rotation of camera 2 relative to camera 1 as R21 = R2*R1^T.
rotationOfCamera2 = cameraParams2.rectification*cameraParams1.rectification';
Extract the intrinsic matrices and distortion coefficients of the two cameras from the stereoParams
structure.
intrinsicMatrix1 = cameraParams1.camera_matrix;
intrinsicMatrix2 = cameraParams2.camera_matrix;
distortionCoefficients1 = cameraParams1.distortion;
distortionCoefficients2 = cameraParams2.distortion;
Obtain the image size from the image field of the stereoParams structure.
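The remaining steps can be sketched as below: read the image size from the image section of the INI structure and assemble the stereoParameters object. The image field names height and width are assumptions based on the helperReadINI description.
% Hedged sketch: assemble the stereoParameters object from the values
% extracted above. The image field names are assumed.
imageSize = [stereoParamsINI.image.height,stereoParamsINI.image.width];
stereoParamsObj = stereoParametersFromOpenCV(intrinsicMatrix1, ...
    distortionCoefficients1,intrinsicMatrix2,distortionCoefficients2, ...
    rotationOfCamera2,translationOfCamera2,imageSize);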
Use the imported stereo parameters with rectifyStereoImages to rectify an image pair captured
using the calibrated stereo camera.
Supporting Functions
helperReadINI
The helperReadINI function reads the camera parameters from its input INI file that has been
exported from ROS.
function cameraParams = helperReadINI(filename)
% helperReadINI reads a ROS INI file, filename, and returns a structure with
% these fields: image, <camera_name1>, <camera_name2>. image is a
% structure describing the height and width of the image captured by the
% cameras of the stereo pair. The fields <camera_name1> and <camera_name2>
% are structures named after the camera names present in the INI file, and they contain
% these fields: camera_matrix, distortion, rectification_matrix,
% and projection_matrix. These fields are stored in the INI file with each
% field name on one line and its values on the following lines.
f = fopen(filename,"r");
sectionName = '';
while ~feof(f)
% Read line from file.
line = fgetl(f);
if isempty(line) || line(1)=='#'
% Skip empty line and comments.
continue
elseif line(1) == '[' && line(end) == ']'
% Identify section names and continue reading.
sectionName = line(2:end-1);
sectionName = strrep(sectionName,'/','_');
continue
end
if isempty(line)
% An empty line indicates the end of the value data.
break
elseif line(1)=='#'
% Skip comment lines.
continue
end
line = str2num(line); %#ok
value = [value; line]; %#ok
end
fclose(f);
end
References
[1] http://wiki.ros.org/camera_calibration
[2] http://docs.ros.org/en/melodic/api/sensor_msgs/html/msg/CameraInfo.html
Import Camera Intrinsic Parameters from ROS
The ROS camera calibration package estimates camera intrinsic parameters using the OpenCV
camera calibration tools [1]. After calibrating a camera in ROS, you can export its intrinsic
parameters to a YAML file using the camera calibration parser in ROS. To use the calibrated camera
with Computer Vision Toolbox™ functions, such as undistortImage, you must read the camera
parameters from the YAML file and then convert them into a cameraIntrinsics object using
cameraIntrinsicsFromOpenCV.
Read the camera parameters stored in cameraParams.yaml using the helper function
helperReadYAML.
intrinsicsParams = helperReadYAML('cameraParams.yaml');
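The parameters that cameraIntrinsicsFromOpenCV expects can then be extracted from the returned structure. A minimal sketch, using the field names described in helperReadYAML:
% Hedged sketch: extract the values expected by cameraIntrinsicsFromOpenCV
% from the structure returned by helperReadYAML.
intrinsicMatrix = intrinsicsParams.camera_matrix;
distortionCoefficients = intrinsicsParams.distortion_coefficients;
imageSize = [intrinsicsParams.image_height,intrinsicsParams.image_width];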
intrinsicsObj = cameraIntrinsicsFromOpenCV(intrinsicMatrix,distortionCoefficients,imageSize);
Undistort Image
Use the imported camera intrinsics with undistortImage to undistort an image captured using the
calibrated camera.
Supporting Functions
helperReadYAML
The helperReadYAML function reads the monocular camera parameters from the input YAML file
that was exported from ROS.
function cameraParams = helperReadYAML(filename)
% helperReadYAML reads a ROS YAML file, filename, and returns a structure
% with these fields: image_width, image_height, camera_name,
% camera_matrix, distortion_model, distortion_coefficients,
% rectification_matrix, and projection_matrix. Each of these fields is stored
% in the YAML file on its own line, separated from its value by a colon.
f = fopen(filename,'r');
stringFields = {'camera_name','distortion_model'};
while ~feof(f)
[name,value,isEmptyLine] = helperReadYAMLLine(f);
if isEmptyLine
continue
end
if ~isempty(value)
% Convert all values to numbers except for known string
% fields.
if ~any(contains(name, stringFields))
value = str2num(value); %#ok
end
else
% An empty value in ROS YAML files indicates a matrix in
% upcoming lines. Read the matrix from the upcoming lines.
value = helperReadYAMLMatrix(f);
end
end
fclose(f);
end
helperReadYAMLMatrix
The helperReadYAMLMatrix function reads the rows, columns and data fields of a matrix in the
ROS YAML file.
function matrix = helperReadYAMLMatrix(f)
% helperReadYAMLMatrix reads a matrix from the ROS YAML file. A matrix in
% a ROS YAML file has three fields: rows, cols, and data. rows and cols
% describe the matrix size. data is a contiguous array of the matrix
% elements in row-major order. This helper function assumes the presence
% of all three fields of a matrix to return the correct matrix.
numRows = 0;
numCols = 0;
data = [];
if isEmptyLine
continue
end
switch name
case 'rows'
numRows = str2num(value); %#ok
case 'cols'
numCols = str2num(value); %#ok
case 'data'
data = str2num(value); %#ok
if numel(data) == numRows*numCols
% Reshape the matrix using row-major order.
matrix = reshape(data,[numCols numRows])';
end
end
helperReadYAMLLine
if isempty(line) || line(1)=='#'
% Empty line or comment.
name = '';
value = '';
isEmptyLine = true;
else
% Split the line to get name and value.
c = strsplit(line,':');
assert(length(c)==2,'Unexpected file format')
name = c{1};
value = strtrim(c{2}); % Trim surrounding whitespace.
isEmptyLine = false;
end
end
References
[1] http://wiki.ros.org/camera_calibration
Develop Visual SLAM Algorithm Using Unreal Engine Simulation
This example shows how to develop a visual Simultaneous Localization and Mapping (SLAM)
algorithm using image data obtained from the Unreal Engine® simulation environment.
Visual SLAM is the process of calculating the position and orientation of a camera with respect to its
surroundings while simultaneously mapping the environment. Developing a visual SLAM algorithm
and evaluating its performance in varying conditions is a challenging task. One of the biggest
challenges is generating the ground truth of the camera sensor, especially in outdoor environments.
The use of simulation enables testing under a variety of scenarios and camera configurations while
providing precise ground truth.
This example demonstrates the use of Unreal Engine simulation to develop a visual SLAM algorithm
for either a monocular or a stereo camera in a parking scenario. For more information about the
implementation of the visual SLAM pipelines, see the “Monocular Visual Simultaneous Localization
and Mapping” on page 1-122 example and the “Stereo Visual Simultaneous Localization and
Mapping” on page 1-153 example.
Use the Simulation 3D Scene Configuration block to set up the simulation environment. Select the
built-in Large Parking Lot scene, which contains several parked vehicles. The visual SLAM algorithm
matches features across consecutive images. To increase the number of potential feature matches,
you can use the Parked Vehicles subsystem to add more parked vehicles to the scene. To specify the
parking poses of the vehicles, use the helperAddParkedVehicle function. If you select a more
natural scene, the presence of additional vehicles is not necessary. Natural scenes usually have
enough texture and feature variety suitable for feature matching.
You can follow the “Select Waypoints for Unreal Engine Simulation” (Automated Driving Toolbox)
example to interactively select a sequence of parking locations. You can use the same approach to
select a sequence of waypoints and generate a reference trajectory for the ego vehicle. This example
uses a recorded reference trajectory and parked vehicle locations.
ylim([10 60])
hScene.Position = [100, 100, 1000, 500]; % Resize figure
legend
hold off
modelName = 'GenerateImageDataOfParkingLot';
open_system(modelName);
helperAddParkedVehicles(modelName, parkedPoses);
Set up the ego vehicle moving along the specified reference path by using the Simulation 3D Vehicle
with Ground Following block. The Camera Variant Subsystem contains two configurations of camera
sensors: monocular and stereo. In both configurations, the camera is mounted on the vehicle roof
center. You can use the Camera Calibrator or Stereo Camera Calibrator app to estimate intrinsics of
the actual camera that you want to simulate. This example shows the monocular camera workflow
first followed by the stereo camera workflow.
% Camera intrinsics
focalLength = [700, 700]; % specified in units of pixels
principalPoint = [600, 180]; % in pixels [x, y]
Run the simulation to visualize and record sensor data. Use the Video Viewer block to visualize the
image output from the camera sensor. Use the To Workspace block to record the ground truth
location and orientation of the camera sensor.
close(hScene)
if ~ispc
error("Unreal Engine Simulation is supported only on Microsoft" + char(174) + " Windows" + ch
end
% Run simulation
simOut = sim(modelName);
Use the images to evaluate the monocular visual SLAM algorithm. The function helperVisualSLAM
implements the monocular ORB-SLAM pipeline:
• Map Initialization: ORB-SLAM starts by initializing the map of 3-D points from two images. Use
estrelpose to compute the relative pose based on 2-D ORB feature correspondences and
triangulate to compute the 3-D map points. The two frames are stored in an imageviewset
object as key frames. The 3-D map points and their correspondences to the key frames are stored
in a worldpointset object.
• Tracking: Once a map is initialized, for each new image, the function
helperTrackLastKeyFrame estimates the camera pose by matching features in the current
frame to features in the last key frame. The function helperTrackLocalMap refines the
estimated camera pose by tracking the local map.
• Local Mapping: The current frame is used to create new 3-D map points if it is identified as a key
frame. At this stage, bundleAdjustment is used to minimize reprojection errors by adjusting the
camera pose and 3-D points.
• Loop Closure: Loops are detected for each key frame by comparing it against all previous key
frames using the bag-of-features approach. Once a loop closure is detected, the pose graph is
optimized to refine the camera poses of all the key frames using the optimizePoseGraph
(Navigation Toolbox) function.
For the implementation details of the algorithm, see the “Monocular Visual Simultaneous Localization
and Mapping” on page 1-122 example.
[mapPlot, optimizedPoses, addedFramesIdx] = helperVisualSLAM(imds, intrinsics);
You can evaluate the optimized camera trajectory against the ground truth obtained from the
simulation. Since the images are generated from a monocular camera, the trajectory of the camera
can only be recovered up to an unknown scale factor. You can approximately compute the scale factor
from the ground truth, thus simulating what you would normally obtain from an external sensor.
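One way to approximate the scale factor is sketched below. It assumes that gTruth is an array of rigidtform3d ground truth camera poses aligned with the recorded frames; the example computes scaledTrajectory with its own helper, which may differ.
% Hedged sketch: estimate the monocular scale by comparing the median
% segment lengths of the two trajectories, then scale the estimate.
estimatedLoc = vertcat(optimizedPoses.AbsolutePose.Translation);
groundTruthLoc = vertcat(gTruth(addedFramesIdx).Translation);
scaleFactor = median(vecnorm(diff(groundTruthLoc),2,2))/ ...
    median(vecnorm(diff(estimatedLoc),2,2));
scaledTrajectory = estimatedLoc*scaleFactor;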
% Show legend
showLegend(mapPlot);
You can also calculate the root mean square error (RMSE) of trajectory estimates.
helperEstimateTrajectoryError(gTruth(addedFramesIdx), scaledTrajectory);
In a monocular visual SLAM algorithm, depth cannot be accurately determined using a single
camera. The scale of the map and of the estimated trajectory is unknown and drifts over time.
Additionally, because map points often cannot be triangulated from the first frame, bootstrapping the
system requires multiple views to produce an initial map. Using a stereo camera solves these
problems and provides a more reliable visual SLAM solution. The function
helperVisualSLAMStereo implements the stereo visual SLAM pipeline. The key difference from
the monocular pipeline is that at the map initialization stage, the stereo pipeline creates 3-D map
points from a pair of stereo images of the same frame, instead of creating them from two images of
different frames. For the implementation details of the algorithm, see the “Stereo Visual
Simultaneous Localization and Mapping” on page 1-153 example.
% Run simulation
simOut = sim(modelName);
snapnow;
% Show legend
showLegend(mapPlot);
Compared with the monocular visual SLAM algorithm, the stereo visual SLAM algorithm produces a
more accurate estimation of the camera trajectory.
Given the refined camera poses, you can perform dense reconstruction from the stereo images
corresponding to the key frames.
close_system(modelName, 0);
close all
Supporting Functions
files = dir(dataFolder);
if numel(files) < 3
numFrames = numel(simOut.images.Time);
for i = 3:numFrames % Ignore the first two frames
img = squeeze(simOut.images.Data(:,:,:,i));
imwrite(img, [dataFolder, sprintf('%04d', i-2), '.png'])
end
end
files = dir(dataFolderLeft);
if numel(files) < 3
numFrames = numel(simOut.imagesLeft.Time);
for i = 3:numFrames % Ignore the first two frames
imgLeft = squeeze(simOut.imagesLeft.Data(:,:,:,i));
imwrite(imgLeft, [dataFolderLeft, sprintf('%04d', i-2), '.png'])
imgRight = squeeze(simOut.imagesRight.Data(:,:,:,i));
imwrite(imgRight, [dataFolderRight, sprintf('%04d', i-2), '.png'])
end
end
for i = 1: numel(addedFramesIdx)
I1 = readimage(imdsLeft, addedFramesIdx(i));
I2 = readimage(imdsRight, addedFramesIdx(i));
disparityMap = disparitySGM(im2gray(I1), im2gray(I2), DisparityRange=[0, maxDisparity],Unique
xyzPoints = reconstructScene(disparityMap, reprojectionMatrix);
% Ignore the upper half of the images which mainly show the sky
xyzPoints(1:100, :, :) = NaN;
currPose = optimizedPoses.AbsolutePose(i);
xyzPoints = transformPointsForward(currPose, xyzPoints);
ptCloud = pointCloud(xyzPoints, Color=colors);
ptClouds(i) = pcdownsample(ptCloud, random=0.2);
end
Visual Localization in a Parking Lot
This example shows how to develop a visual localization system using synthetic image data from the
Unreal Engine® simulation environment.
It is a challenging task to obtain ground truth for evaluating the performance of a localization
algorithm in different conditions. Virtual simulation in different scenarios is a cost-effective method to
obtain the ground truth in comparison with more expensive approaches such as using high-precision
inertial navigation systems or differential GPS. The use of simulation enables testing under a variety
of scenarios and sensor configurations. It also enables rapid algorithm development and provides
precise ground truth.
This example uses the Unreal Engine simulation environment from Epic Games® to develop and
evaluate a visual localization algorithm in a parking lot scenario.
Overview
Visual localization is the process of estimating the camera pose for a captured image relative to a
visual representation of a known scene. It is a key technology for applications such as augmented
reality, robotics, and automated driving. Compared with the visual SLAM workflow described in
“Implement Visual SLAM in MATLAB” on page 13-8, visual localization assumes that a map of the
environment is known and does not require 3-D reconstruction or loop closure detection. The visual
localization pipeline includes the following steps:
• Map Loading: Load the pre-built 3-D map containing the world point positions and the 3-D to 2-D
correspondences between the map points and the key frames. Additionally, for each key frame,
load the feature descriptors corresponding to the 3-D map points.
• Global Initialization: Extract features from the first image frame and match them with the
features corresponding to all the 3-D map points. After getting the 3-D to 2-D correspondences,
estimate the camera pose of the first frame in the world coordinate system by solving a
Perspective-n-Point (PnP) problem. Refine the pose using motion-only bundle adjustment (see the
sketch after this list). The key frame that shares the most covisible 3-D map points with the first
frame is identified as the reference key frame.
• Tracking: Once the first frame is localized, for each new frame, match features in the new frame
with features in the reference key frame that have known 3-D world points. Estimate and refine
the camera pose using the same approach as in Global Initialization step. The camera pose can be
further refined by tracking the features associated with nearby key frames.
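The sketch below illustrates the global initialization step referenced in the list above. The variables firstFrame, mapFeatures, and intrinsics are illustrative assumptions rather than the names used inside the Helper Visual Localization block.
% Hedged sketch of global initialization: match the first frame against the
% map feature descriptors, solve PnP, and refine the pose.
Igray = im2gray(firstFrame);
points = detectORBFeatures(Igray);
[features,points] = extractFeatures(Igray,points);
pairs = matchFeatures(features,mapFeatures);   % mapFeatures: descriptors of the 3-D map points
imagePts = points(pairs(:,1)).Location;
worldPts = mapPointSet.WorldPoints(pairs(:,2),:);
camPose = estworldpose(imagePts,worldPts,intrinsics);   % Perspective-n-Point
camPose = bundleAdjustmentMotion(worldPts,imagePts,camPose,intrinsics);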
Create Scene
Guiding a vehicle into a parking spot is a challenging maneuver that relies on accurate localization.
The VisualLocalizationInAParkingLot model simulates a visual localization system in the
parking lot scenario used in the “Develop Visual SLAM Algorithm Using Unreal Engine Simulation”
(Automated Driving Toolbox) example.
• The Simulation 3D Scene Configuration (Automated Driving Toolbox) block sets up the Large
Parking Lot scene. The Parked Vehicles subsystem adds parked cars into the parking lot.
• The Simulation 3D Vehicle with Ground Following (Automated Driving Toolbox) block controls the
motion of the ego vehicle.
• The Simulation 3D Camera (Automated Driving Toolbox) block models a monocular camera fixed at
the center of the vehicle's roof. You can use the Camera Calibrator app to estimate intrinsics of
the actual camera that you want to simulate.
• The Helper Visual Localization MATLAB System block implements the visual localization
algorithm. The initial camera pose with respect to the map is estimated using the
helperGlobalInitialization function. The subsequent camera poses are estimated using the
helperTrackingRefKeyFrame function and refined using the helperTrackLocalKeyFrames
function. This block also provides a visualization of the estimated camera trajectory in the pre-
built map. You can specify the pre-built map data and the camera intrinsic parameters in the block
dialog.
The pre-built map data is generated using the stereo camera in the “Develop Visual SLAM Algorithm
Using Unreal Engine Simulation” (Automated Driving Toolbox) example. The data consists of three
objects that are commonly used to manage image and map data for visual SLAM:
• vSetKeyFrame: an imageviewset object storing the camera poses of key frames and the
associated feature points for each 3-D map point in mapPointSet.
• mapPointSet: a worldpointset object storing the 3-D map point locations and the
correspondences between the 3-D points and 2-D feature points across key frames. The 3-D map
points provide a sparse representation of the environment.
You can follow the “Select Waypoints for Unreal Engine Simulation” (Automated Driving Toolbox)
example to select a sequence of waypoints and generate a reference trajectory for the ego vehicle.
This example uses a recorded reference trajectory.
Run Simulation
Run the simulation and visualize the estimated camera trajectory in the pre-built map. The white
points represent the tracked 3-D map points in the current frame. You can compare the estimated
trajectory with the ground truth provided by the Simulation 3D Camera block to evaluate the
localization accuracy.
if ~ispc
error("Unreal Engine Simulation is supported only on Microsoft" + char(174) + " Windows" + ch
end
% Run simulation
sim(modelName);
Conclusion
With this setup, you can rapidly iterate over different scenarios, sensor configurations, or reference
trajectories and refine the visual localization algorithm before moving to real-world testing.
• To select a different scenario, use the Simulation 3D Scene Configuration (Automated Driving
Toolbox) block. Choose from the existing prebuilt scenes or create a custom scene in the Unreal®
Editor.
• To create a different reference trajectory, use the helperSelectSceneWaypoints tool, as shown
in the “Select Waypoints for Unreal Engine Simulation” (Automated Driving Toolbox) example.
• To alter the sensor configuration, use the Simulation 3D Camera (Automated Driving
Toolbox) block. The Mounting tab provides options for specifying different sensor mounting
placements. The Parameters tab provides options for modifying sensor parameters such as
detection range, field of view, and resolution. You can also use the Simulation 3D Fisheye Camera
(Automated Driving Toolbox) block which provides a larger field of view.
Stereo Visual SLAM for UAV Navigation in 3D Simulation
Visual SLAM is the process of calculating the position and orientation of a camera with respect to its
surroundings while simultaneously mapping the environment. Developing a visual SLAM algorithm
and evaluating its performance in varying conditions is a challenging task. One of the biggest
challenges is generating the ground truth of the camera sensor, especially in outdoor environments.
The use of simulation enables testing under a variety of scenarios and camera configurations while
providing precise ground truth.
This example demonstrates the use of Unreal Engine® simulation to develop a visual SLAM
algorithm for a UAV equipped with a stereo camera in a city block scenario. For more information
about the implementation of the visual SLAM pipeline for a stereo camera [1] on page 1-99, see the
“Stereo Visual Simultaneous Localization and Mapping” on page 1-153 example.
First, set up a scenario in the simulation environment that can be used to test the visual SLAM
algorithm. Use a scene depicting a typical city block with a UAV as the vehicle under test.
Next, select a trajectory for the UAV to follow in the scene. You can follow the “Select Waypoints for
Unreal Engine Simulation” (Automated Driving Toolbox) example to interactively select a sequence of
waypoints and then use the helperSelectSceneWaypoints function to generate a reference
trajectory for the UAV. This example uses a recorded reference trajectory as shown below:
The UAVVisualSLAMIn3DSimulation Simulink® model is configured with the US City Block scene
using the Simulation 3D Scene Configuration (UAV Toolbox) block. The model places a UAV on the
scene using the Simulation 3D UAV Vehicle (UAV Toolbox) block. A stereo camera consisting of two
Simulation 3D Camera (UAV Toolbox) blocks is attached to the UAV. In the dialog box of the
Simulation 3D Camera (UAV Toolbox) block, use the Mounting tab to adjust the placement of the
camera. Use the Parameters tab to configure properties of the camera to simulate different cameras.
To estimate the intrinsics of the stereo camera that you want to simulate, use the Stereo Camera
Calibrator app, as described in “Using the Stereo Camera Calibrator App” on page 18-35.
modelName = 'UAVVisualSLAMIn3DSimulation';
open_system(modelName);
The Helper Stereo Visual SLAM System block implements the stereo visual SLAM pipeline, consisting
of the following steps:
• Map Initialization: The pipeline starts by initializing the map of 3-D points from a pair of images
generated from the stereo camera using the disparity map. The left image is stored as the first key
frame.
• Tracking: Once a map is initialized, for each new stereo pair, the pose of the camera is estimated
by matching features in the left image to features in the last key frame. The estimated camera
pose is refined by tracking the local map.
• Local Mapping: If the current left image is identified as a key frame, new 3-D map points are
computed from the disparity of the stereo pair. At this stage, bundle adjustment is used to
minimize reprojection errors by adjusting the camera pose and 3-D points.
• Loop Closure: Loops are detected for each key frame by comparing it against all previous key
frames using the bag-of-features approach. Once a loop closure is detected, the pose graph is
optimized to refine the camera poses of all the key frames.
For the implementation details of the algorithm, see the “Stereo Visual Simultaneous Localization and
Mapping” on page 1-153 example.
Simulate the model and visualize the results. The Video Viewer block displays the stereo image
output. The Point Cloud Player displays the reconstructed 3-D map with the estimated camera
trajectory.
if ~ispc
error("Unreal Engine Simulation is supported only on Microsoft" + char(174) + " Windows" + ch
end
% Run simulation
sim(modelName);
close_system(modelName);
References
[1] Mur-Artal, Raul, and Juan D. Tardós. "ORB-SLAM2: An open-source SLAM system for monocular,
stereo, and RGB-D cameras." IEEE Transactions on Robotics 33, no. 5 (2017): 1255-1262.
Camera Calibration Using AprilTag Markers
AprilTags are widely used as visual markers for applications in object detection and localization, and
as a target for camera calibration [1]. AprilTags are like QR codes, but are designed to encode less
data and can therefore be decoded faster, which is useful, for example, in real-time robotics
applications. The advantages of using AprilTags as a calibration pattern include a greater number of
detected feature points and consistent, repeatable detections. This example uses the readAprilTag
function to detect and localize AprilTags in a calibration pattern. The readAprilTag function
supports all official tag families. The example also uses additional Computer Vision Toolbox™
functions to perform end-to-end camera calibration. The default checkerboard pattern is replaced by a
grid of evenly spaced AprilTags. For an example of using a checkerboard pattern for calibration, refer
to “Using the Single Camera Calibrator App” on page 18-22.
This example shows how to calibrate a camera using AprilTags programmatically and by using the
Camera Calibrator app:
Pre-generated tags for all the supported families can be downloaded from the AprilRobotics
apriltag-imgs repository using a web browser or by running the following code:
downloadURL = "https://github.com/AprilRobotics/apriltag-imgs/archive/master.zip";
dataFolder = fullfile(tempdir,"apriltag-imgs",filesep);
options = weboptions('Timeout', Inf);
zipFileName = fullfile(dataFolder,"apriltag-imgs-master.zip");
folderExists = exist(dataFolder,"dir");
Extracting apriltag-imgs-master.zip...
The helperGenerateAprilTagPattern on page 1-112 function at the end of the example can be used to
generate a calibration target using the tag images for a specific arrangement of tags. The pattern
image is contained in calibPattern, which can be used to print the pattern (from MATLAB). The
example uses the tag36h11 family, which provides a reasonable trade-off between detection
performance and robustness to false-positive detections.
Using the readAprilTag function on this pattern results in detections with the corner locations of
the individual tags grouped together. The helperAprilTagToCheckerLocations on page 1-114 function
can be used to convert this arrangement to a column-major arrangement, such as a checkerboard.
% Read and localize the tags in the calibration pattern.
[tagIds, tagLocs] = readAprilTag(calibPattern,tagFamily);
• While the pattern is printed on paper in this example, consider printing it on a surface that
remains flat and is not subject to deformations due to moisture or other environmental factors.
• Since the calibration procedure assumes that the pattern is planar, any imperfections in the
pattern (e.g. an uneven surface) can reduce the accuracy of the calibration.
• The calibration procedure requires at least 2 images of the pattern but using between 10 and 20
images produces more accurate results.
• Capture a variety of images of the pattern such that the pattern fills most of the image, thus
covering the entire field of view. For example, to best capture the lens distortion, have images of
the pattern at all edges of the image frame.
• Make sure the pattern is completely visible in the captured images since images with partially
visible patterns will be rejected.
• For more information on preparing images of the calibration pattern, see “Prepare Camera and
Capture Images” on page 18-4.
The helperDetectAprilTagCorners on page 1-113 function, included at the end of the example, is used
to detect and localize the tags from the captured images and arrange them in a checkerboard fashion
to be used as key points in the calibration procedure.
The generated AprilTag pattern is such that the tags are in a checkerboard fashion, and so the world
coordinates for the corresponding image coordinates determined above (in imagePoints) can be
obtained using the generateCheckerboardPoints function.
Here, the size of the square is replaced by the size of the tag, and the size of the board is obtained
from the previous step. Measure the tag size between the outer black edges of one side of the tag.
With the image and world point correspondences, estimate the camera parameters using the
estimateCameraParameters function.
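A minimal sketch of these two steps, assuming imagePoints and boardSize come from the detection step above and tagSize is the measured tag width in world units:
% Hedged sketch: generate world points from the tag arrangement and
% estimate the camera parameters.
worldPoints = generateCheckerboardPoints(boardSize,tagSize);
[params,imagesUsed,estimationErrors] = estimateCameraParameters(imagePoints,worldPoints);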
Visualize the accuracy of the calibration and the extrinsic camera parameters. Show the planes of the
calibration pattern in the captured images.
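For example, assuming params holds the estimated cameraParameters object:
% Hedged sketch: inspect the extrinsics and the reprojection errors.
figure
showExtrinsics(params);
figure
showReprojectionErrors(params);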
Inspect the locations of the detected image points and the reprojected points, which were obtained
using the estimated camera parameters.
While this example uses AprilTag markers in the calibration pattern, the same workflow can be
extended to other planar patterns as well. The estimateCameraParameters function used to obtain
the camera parameters requires:
• imagePoints: Key points in the calibration pattern in image coordinates obtained from the
captured images.
• worldPoints: Corresponding world point coordinates of the key points in the calibration pattern.
Provided there is a way to obtain these key points, the rest of the calibration workflow remains the
same.
For convenience of use, the above workflow can also be integrated into the Camera Calibrator app.
The overall workflow remains the same and the steps are:
2. Import a custom pattern detector class for AprilTags. The detector must do the following:
• MATLAB Toolstrip: On the Apps tab, in the Image Processing and Computer Vision section,
click the Camera Calibrator icon.
• MATLAB command prompt: Enter cameraCalibrator.
On the Calibration tab, in the File section, click Add images, and then select From file. You can
add images from multiple folders by clicking Add images for each folder. This example reuses the
same images as above on page 1-102. You need at least 2 images for camera calibration. Once you add
images, the following UI appears:
The above UI shows a drop-down list for pattern selection. By default, the app does not include a
pattern detector for AprilTags. You can create a custom pattern detector class and then add it to the
list to use in the app. For more information on how to create a custom pattern, click the information
icon. A custom pattern detector class for AprilTags is provided in the
MyCustomAprilTagPatternDetector.m file. This class contains UI code for parameters needed by the
detector and functions for detecting and processing the custom AprilTag calibration pattern.
The example uses the configureUIComponents() function to configure the UI component and the
initializePropertyValues() function to initialize it. The helperDrawImageAxesLabels on page 1-114
function, included at the end of the example, is used to render the origin, X-axis, and Y-axis labels in
the calibration images displayed in the Camera Calibrator app dialog.
• detectPatternPoints() - Detects and localizes the AprilTags from the captured images and sorts
them for use as key points in the calibration procedure. This function is implemented using the
helperDetectAprilTagCorners on page 1-113 function, given at the end of the example.
• generateWorldPoints() - Computes world coordinates for the corresponding image coordinates
in the AprilTag pattern. This function is implemented using the helperGenerateAprilTagPattern on
page 1-112 function, given at the end of the example.
Import the custom pattern detector class by clicking the Import Pattern Detector button under the
Custom Pattern panel. Choose the class file MyCustomAprilTagPatternDetector.m. If there are
no errors in the class, then you see the following view:
For this example, all the fields in the Properties panel have correct values. But you can customize
these values to fit your needs. Note that Square Size represents the width of the tag in world units
and is also assumed to be equal to the spacing between each tag in the image.
Click OK and the Data Browser pane displays a list of images with IDs, as shown below:
These images will contain the detected pattern. To view an image, select it from the Data Browser
pane.
At this point, the camera calibration process is the same as described in “Using the Single Camera
Calibrator App” on page 18-22.
With the default calibration settings, click the Calibrate button on the Calibration tab. Visualize the
accuracy of the calibration by inspecting the Reprojection Errors pane and then visualize estimates
of the extrinsic camera parameters in the Camera-centric pane which shows the patterns positioned
with respect to the camera.
numTags = tagArragement(1)*tagArragement(2);
tagIds = zeros(1,numTags);
% Pad image with white boundaries (ensures the tags replace the black
% portions of the checkerboard).
tagSize = round(max(tagLoc(:,2)) - min(tagLoc(:,2)));
padSize = round(tagSize/2 - (size(Ires,2) - tagSize)/2);
Ires = padarray(Ires,[padSize,padSize],255);
I = readimage(imdsTags,idx + 2);
Igray = im2gray(I);
Ires = imresize(Igray,15,"nearest");
Ires = padarray(Ires,[padSize,padSize],255);
tagIds(idx) = readAprilTag(Ires,tagFamily);
end
% Reshape the tag images to ensure that they appear in column-major order
% (montage function places image in row-major order).
columnMajIdx = reshape(1:numTags,tagArragement)';
tagImages = tagImages(:,:,columnMajIdx(:));
end
[~,sortIdx] = sort(tagIds);
tagLocs = tagLocs(:,:,sortIdx);
end
end
numTagRows = tagArrangement(1);
numTagCols = tagArrangement(2);
numTags = numTagRows * numTagCols;
end
helperDrawImageAxesLabels renders the origin, X-axis and Y-axis labels in the calibration images
displayed in the calibrator app.
numBoardRows = boardSize(1)-1;
numBoardCols = boardSize(2)-1;
refPointIdx = find(~isnan(boardCoordsX(:,1)),2);
p2 = boardCoords(refPointIdx(2),1,:);
refPointIdx = find(~isnan(boardCoordsX(1,:)),2);
p3 = boardCoords(1,refPointIdx(2),:);
originLabel.Location = loc;
originLabel.Orientation = theta;
else
originLabel = struct;
end
% X-axis label
firstRowIdx = numBoardCols:-1:1;
refPointIdx13 = find(~isnan(boardCoordsX(1,firstRowIdx)),2);
refPointIdx13 = firstRowIdx(refPointIdx13);
p1 = boardCoords(1,refPointIdx13(1),:);
p3 = boardCoords(1,refPointIdx13(2),:);
refPointIdx2 = find(~isnan(boardCoordsX(:,refPointIdx13(1))),2);
p2 = boardCoords(refPointIdx2(2),refPointIdx13(1),:);
xLabel.Location = loc;
xLabel.Orientation = theta;
% Y-axis label
firstColIdx = numBoardRows:-1:1;
refPointIdx12 = find(~isnan(boardCoordsX(firstColIdx,1)),2);
refPointIdx12 = firstColIdx(refPointIdx12);
p1 = boardCoords(refPointIdx12(1),1,:);
p2 = boardCoords(refPointIdx12(2),1,:);
[loc,theta] = getAxesLabelPosition(p1,p2,p3);
yLabel.Location = loc;
yLabel.Orientation = theta;
%--------------------------------------------------------------
% p1+v
% \
% \ v1
% p1 ------ p2
% |
% v2 |
% |
% p3
function [loc,theta] = getAxesLabelPosition(p1,p2,p3)
v1 = p3 - p1;
theta = -atan2d(v1(2),v1(1));
v2 = p2 - p1;
v = -v1 - v2;
d = hypot(v(1),v(2));
minDist = 40;
if d < minDist
v = (v / d) * minDist;
end
loc = p1 + v;
end
%--------------------------------------------------------------
end
Reference
[1] E. Olson, "AprilTag: A robust and flexible visual fiducial system," 2011 IEEE International
Conference on Robotics and Automation, Shanghai, 2011, pp. 3400-3407, doi: 10.1109/
ICRA.2011.5979561.
Configure Monocular Fisheye Camera
This example shows how to convert a fisheye camera model to a pinhole model and construct a
corresponding monocular camera sensor simulation. In this example, you learn how to calibrate a
fisheye camera and configure a monoCamera (Automated Driving Toolbox) object.
Overview
To estimate the intrinsic parameters, use a checkerboard for camera calibration. Alternatively, to
better visualize the results, use the Camera Calibrator app. For a fisheye camera, it is useful to place
the checkerboard close to the camera, in order to capture large, noticeable distortion in the image.
% Gather a set of calibration images.
images = imageDatastore(fullfile(toolboxdir('vision'), 'visiondata', ...
'calibration', 'gopro'));
imageFileNames = images.Files;
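The calibration call itself is not shown in this excerpt. A minimal sketch of what it might look like, assuming a checkerboard square size of 29 millimeters (an illustrative value):
% Sketch: detect checkerboard corners and estimate fisheye intrinsics.
[imagePoints, boardSize] = detectCheckerboardPoints(imageFileNames);
squareSize = 29; % square size in millimeters (assumed value)
worldPoints = generateCheckerboardPoints(boardSize, squareSize);
I = imread(imageFileNames{1});
imageSize = [size(I, 1), size(I, 2)];
params = estimateFisheyeParameters(imagePoints, worldPoints, imageSize);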
To estimate the extrinsic parameters, use the same checkerboard to estimate the mounting position
of the camera in the vehicle coordinate system. The following step estimates the parameters from one
image. You can also take multiple checkerboard images to obtain multiple estimations, and average
the results.
% Load a different image of the same checkerboard, where the checkerboard
% is placed on the flat ground. Its X-axis is pointing to the right of the
% vehicle, and its Y-axis is pointing to the camera. The image includes
% noticeable distortion, such as along the wall next to the checkerboard.
% Undistort the image and extract the synthetic pinhole camera intrinsics.
[J1, camIntrinsics] = undistortFisheyeImage(I, params.Intrinsics, 'Output', 'full');
imshow(J1)
title('Undistorted Image');
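The steps that estimate the mounting position and construct the sensor are elided in this excerpt. A minimal sketch of how they might look, using the undistorted ground image J1 and the pinhole intrinsics camIntrinsics from above (the square size and the sensor variable name are assumptions for illustration):
% Sketch: estimate camera mounting parameters from the checkerboard on the
% ground, then construct the monocular camera sensor.
[imagePoints, boardSize] = detectCheckerboardPoints(J1);
squareSize = 0.029; % square size in meters (assumed value)
worldPoints = generateCheckerboardPoints(boardSize, squareSize);
patternOriginHeight = 0; % the checkerboard lies on the ground
[pitch, yaw, roll, height] = estimateMonoCameraParameters(camIntrinsics, ...
    imagePoints, worldPoints, patternOriginHeight);
sensor = monoCamera(camIntrinsics, height, Pitch=pitch, Yaw=yaw, Roll=roll);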
Now you can validate the monoCamera (Automated Driving Toolbox) by plotting a bird's-eye view.
% Define bird's-eye-view transformation parameters
distAheadOfSensor = 6; % in meters
spaceToOneSide = 2.5; % look 2.5 meters to the right and 2.5 meters to the left
bottomOffset = 0.2; % look 0.2 meters ahead of the sensor
outView = [bottomOffset, distAheadOfSensor, -spaceToOneSide, spaceToOneSide];
outImageSize = [NaN,1000]; % output image width in pixels
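The construction of the bird's-eye-view image is not shown in this excerpt. A sketch of the remaining steps, assuming the sensor object from the previous sketch and an illustrative annotation:
% Sketch: create the bird's-eye-view object, transform the undistorted
% image, and annotate it for display.
birdsEye = birdsEyeView(sensor, outView, outImageSize);
B = transformImage(birdsEye, J1);
imagePoint0 = vehicleToImage(birdsEye, [distAheadOfSensor, 0]);
annotatedB = insertText(B, imagePoint0, sprintf('%.1f m', distAheadOfSensor));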
figure
imshow(annotatedB)
title('Bird''s-Eye View')
The plot above shows that the camera measures distances accurately. Now you can use the
monocular camera for object and lane boundary detection. See the “Visual Perception Using
Monocular Camera” (Automated Driving Toolbox) example.
Monocular Visual Simultaneous Localization and Mapping
Visual simultaneous localization and mapping (vSLAM) refers to the process of calculating the
position and orientation of a camera with respect to its surroundings, while simultaneously mapping
the environment. The process uses only visual inputs from the camera. Applications for vSLAM
include augmented reality, robotics, and autonomous driving.
This example shows two implementations of processing image data from a monocular camera to build
a map of an indoor environment and estimate the trajectory of the camera:
1 Modular and Modifiable: The first implementation builds a visual SLAM pipeline step-by-step
using functions and classes mentioned in “Implement Visual SLAM in MATLAB” on page 13-8.
It is modular and designed to teach about the details of the vSLAM implementation loosely based
on the popular and reliable ORB-SLAM [1] on page 1-143 algorithm. The code is easily navigable,
allowing you to understand the algorithm and test how its parameters can impact the
performance of the system. This modular implementation is most suitable for experimentation
and modification to test your own ideas.
2 Performant and Deployable: The second implementation uses the monovslam class which
packages all implementation details together and creates a more practical solution in terms of
performance (execution speed) and deployability. You can also generate C/C++ code from
monovslam using MATLAB Coder and deploy the generated code to non-PC hardware.
To speed up computations, you can enable parallel computing from the “Computer Vision Toolbox
Preferences” dialog box. To open Computer Vision Toolbox™ preferences, on the Home tab, in the
Environment section, click Preferences. Then select Computer Vision Toolbox.
Glossary
• Key Frames: A subset of video frames that contain cues for localization and tracking. Two
consecutive key frames usually involve sufficient visual change.
• Map Points: A list of 3-D points that represent the map of the environment reconstructed from
the key frames.
• Covisibility Graph: A graph consisting of key frames as nodes. Two key frames are connected by
an edge if they share common map points. The weight of an edge is the number of shared map
points.
• Essential Graph: A subgraph of the covisibility graph containing only edges with high weight, that
is, edges with more shared map points.
• Place Recognition Database: A database used to recognize whether a place has been visited in
the past. The database stores the visual word-to-image mapping based on the input bag of
features. It is used to search for an image that is visually similar to a query image.
Overview of ORB-SLAM
• Map Initialization: ORB-SLAM starts by initializing the map of 3-D points from two video frames.
The 3-D points and relative camera pose are computed using triangulation based on 2-D ORB
feature correspondences.
• Tracking: Once a map is initialized, for each new frame, the camera pose is estimated by
matching features in the current frame to features in the last key frame. The estimated camera
pose is refined by tracking the local map.
• Local Mapping: The current frame is used to create new 3-D map points if it is identified as a key
frame. At this stage, bundle adjustment is used to minimize reprojection errors by adjusting the
camera pose and 3-D points.
• Loop Closure: Loops are detected for each key frame by comparing it against all previous key
frames using the bag-of-features approach. Once a loop closure is detected, the pose graph is
optimized to refine the camera poses of all the key frames.
The data used in this example are from the TUM RGB-D benchmark [2] on page 1-143. You can
download the data to a temporary directory using a web browser or by running the following code:
baseDownloadURL = "https://cvg.cit.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz";
dataFolder = fullfile(tempdir, 'tum_rgbd_dataset', filesep);
options = weboptions(Timeout=Inf);
tgzFileName = [dataFolder, 'fr3_office.tgz'];
folderExists = exist(dataFolder, "dir");
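The download itself is elided in this excerpt. A sketch of what it might look like:
% Sketch: download and extract the sequence only if it is not already on disk.
if ~folderExists
    mkdir(dataFolder)
    disp('Downloading fr3_office.tgz. This download can take a few minutes.')
    websave(tgzFileName, baseDownloadURL, options);
    disp('Extracting fr3_office.tgz ...')
    untar(tgzFileName, dataFolder);
end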
imageFolder = [dataFolder,'rgbd_dataset_freiburg3_long_office_household/rgb/'];
imds = imageDatastore(imageFolder);
Map Initialization
The ORB-SLAM pipeline starts by initializing the map that holds 3-D world points. This step is crucial
and has a significant impact on the accuracy of the final SLAM result. Initial ORB feature point
correspondences are found using matchFeatures between a pair of images. After the
correspondences are found, two geometric transformation models, a homography (which assumes a
planar scene) and a fundamental matrix (which assumes a non-planar scene), are used to establish the
map initialization.
The homography and the fundamental matrix can be computed using estgeotform2d and
estimateFundamentalMatrix, respectively. The model that results in a smaller reprojection error
is selected to estimate the relative rotation and translation between the two frames using
estrelpose. Since the RGB images are taken by a monocular camera, which does not provide depth
information, the relative translation can be recovered only up to an unknown scale factor.
Given the relative camera pose and the matched feature points in the two images, the 3-D locations of
the matched points are determined using triangulate function. A triangulated map point is valid
when it is located in front of both cameras, when its reprojection error is low, and when the
parallax of the two views of the point is sufficiently large.
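The model selection and triangulation steps are elided from the loop excerpt that follows. A sketch of what they might look like (variable names follow the surrounding fragments, and the reprojection-error-based model selection is omitted):
% Sketch: compute both candidate models, recover the relative pose, and
% triangulate the initial map points.
[tformH, inliersH] = estgeotform2d(preMatchedPoints, currMatchedPoints, "projective");
[F, inliersF] = estimateFundamentalMatrix(preMatchedPoints, currMatchedPoints);
% Assume here that the fundamental matrix gave the smaller reprojection error.
relPose = estrelpose(F, intrinsics, preMatchedPoints(inliersF), currMatchedPoints(inliersF));
camMatrix1 = cameraProjection(intrinsics, rigidtform3d);
camMatrix2 = cameraProjection(intrinsics, pose2extr(relPose));
xyzWorldPoints = triangulate(preMatchedPoints(inliersF), currMatchedPoints(inliersF), ...
    camMatrix1, camMatrix2);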
currFrameIdx = currFrameIdx + 1;
firstI = currI; % Preserve the first frame
isMapInitialized = false;
currFrameIdx = currFrameIdx + 1;
preMatchedPoints = prePoints(indexPairs(:,1),:);
currMatchedPoints = currPoints(indexPairs(:,2),:);
if ~isValid
continue
end
isMapInitialized = true;
if isMapInitialized
close(himage.Parent.Parent); % Close the previous figure
% Show matched features
hfeature = showMatchedFeatures(firstI, currI, prePoints(indexPairs(:,1)), ...
currPoints(indexPairs(:, 2)), "Montage");
else
error('Unable to initialize the map.')
end
After the map is initialized using two frames, you can use imageviewset and worldpointset to
store the two key frames and the corresponding map points:
• imageviewset stores the key frames and their attributes, such as ORB descriptors, feature
points and camera poses, and connections between the key frames, such as feature points
matches and relative camera poses. It also builds and updates a pose graph. The absolute camera
poses and relative camera poses of odometry edges are stored as rigidtform3d objects. The
relative camera poses of loop-closure edges are stored as affinetform3d objects.
• worldpointset stores 3-D positions of the map points and the 3-D into 2-D projection
correspondences: which map points are observed in a key frame and which key frames observe a
map point. It also stores other attributes of map points, such as the mean view direction, the
representative ORB descriptors, and the range of distance at which the map point can be
observed.
% Create an empty imageviewset object to store key frames
vSetKeyFrames = imageviewset;
% Add the first key frame. Place the camera associated with the first
% key frame at the origin, oriented along the Z-axis
preViewId = 1;
vSetKeyFrames = addView(vSetKeyFrames, preViewId, rigidtform3d, Points=prePoints,...
Features=preFeatures.Features);
% Add connection between the first and the second key frame
vSetKeyFrames = addConnection(vSetKeyFrames, preViewId, currViewId, relPose, Matches=indexPairs);
% Add image points corresponding to the map points in the first key frame
mapPointSet = addCorrespondences(mapPointSet, preViewId, newPointIdx, indexPairs(:,1));
% Add image points corresponding to the map points in the second key frame
mapPointSet = addCorrespondences(mapPointSet, currViewId, newPointIdx, indexPairs(:,2));
Loop detection is performed using the bags-of-words approach. A visual vocabulary represented as a
bagOfFeatures object is created offline with the ORB descriptors extracted from a large set of
images in the dataset by calling:
bag = bagOfFeatures(imds, CustomExtractor=@helperORBFeatureExtractorFunction, ...
    TreeProperties=[3, 10], StrongestFeatures=1);
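A place recognition database can then be built from this vocabulary. A minimal sketch (the choice of images added here, the two initial key frames, is an assumption):
% Sketch: create the place recognition database and add the two initial
% key frame images to it.
loopDatabase = invertedImageIndex(bag, SaveFeatureLocations=false);
addImages(loopDatabase, subset(imds, [1, currFrameIdx - 1]), Verbose=false);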
Refine the initial reconstruction using bundleAdjustment, which optimizes both camera poses and
world points to minimize the overall reprojection errors. After the refinement, the attributes of the
map points including 3-D locations, view direction, and depth range are updated. You can use
helperVisualizeMotionAndStructure to visualize the map points and the camera locations.
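The bundleAdjustment call itself is not shown in this excerpt. A sketch of what it might look like, with the triangulated points and key frame poses gathered from the objects defined above (variable names are assumptions):
% Sketch: refine the initial map with full bundle adjustment, keeping the
% first key frame fixed.
tracks = findTracks(vSetKeyFrames);
cameraPoses = poses(vSetKeyFrames);
[refinedPoints, refinedAbsPoses] = bundleAdjustment(xyzWorldPoints, tracks, ...
    cameraPoses, intrinsics, FixedViewIDs=1, PointsUndistorted=true);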
% Scale the map and the camera pose using the median depth of map points
medianDepth = median(vecnorm(refinedPoints.'));
refinedPoints = refinedPoints / medianDepth;
refinedAbsPoses.AbsolutePose(currViewId).Translation = ...
refinedAbsPoses.AbsolutePose(currViewId).Translation / medianDepth;
relPose.Translation = relPose.Translation/medianDepth;
% Show legend
showLegend(mapPlot);
Tracking
The tracking process is performed using every frame and determines when to insert a new key frame.
To simplify this example, we will terminate the tracking process once a loop closure is found.
isLoopClosed = false;
1 ORB features are extracted for each new frame and then matched (using matchFeatures), with
features in the last key frame that have known corresponding 3-D map points.
2 Estimate the camera pose with the Perspective-n-Point algorithm using estworldpose.
3 Given the camera pose, project the map points observed by the last key frame into the current
frame and search for feature correspondences using matchFeaturesInRadius.
4 With 3-D to 2-D correspondence in the current frame, refine the camera pose by performing a
motion-only bundle adjustment using bundleAdjustmentMotion.
5 Project the local map points into the current frame to search for more feature correspondences
using matchFeaturesInRadius and refine the camera pose again using
bundleAdjustmentMotion.
6 The last step of tracking is to decide if the current frame is a new key frame. If the current frame
is a key frame, continue to the Local Mapping process. Otherwise, start Tracking for the next
frame.
If tracking is lost because too few feature points could be matched, try inserting new
key frames more frequently.
% Main loop
isLastFrameKeyFrame = true;
while ~isLoopClosed && currFrameIdx < numel(imds.Files)
currI = readimage(imds, currFrameIdx);
% Track the local map and check if the current frame is a key frame.
% A frame is a key frame if both of the following conditions are satisfied:
%
% 1. At least 20 frames have passed since the last key frame or the
% current frame tracks fewer than 100 map points.
% 2. The map points tracked by the current frame are fewer than 90% of
% points tracked by the reference key frame.
%
% Tracking performance is sensitive to the value of numPointsKeyFrame.
% If tracking is lost, try a larger value.
%
% localKeyFrameIds: ViewId of the connected key frames of the current frame
numSkipFrames = 20;
numPointsKeyFrame = 80;
[localKeyFrameIds, currPose, mapPointsIdx, featureIdx, isKeyFrame] = ...
helperTrackLocalMap(mapPointSet, vSetKeyFrames, mapPointsIdx, ...
featureIdx, currPose, currFeatures, currPoints, intrinsics, scaleFactor, numLevels, ...
isLastFrameKeyFrame, lastKeyFrameIdx, currFrameIdx, numSkipFrames, numPointsKeyFrame);
if ~isKeyFrame
currFrameIdx = currFrameIdx + 1;
isLastFrameKeyFrame = false;
continue
else
isLastFrameKeyFrame = true;
end
Local Mapping
Local mapping is performed for every key frame. When a new key frame is determined, add it to the
key frames and update the attributes of the map points observed by the new key frame. To ensure
that mapPointSet contains as few outliers as possible, a valid map point must be observed in at least
3 key frames.
New map points are created by triangulating ORB feature points in the current key frame and its
connected key frames. For each unmatched feature point in the current key frame, search for a match
with other unmatched points in the connected key frames using matchFeatures. The local bundle
adjustment refines the pose of the current key frame, the poses of connected key frames, and all the
map points observed in these key frames.
% Remove outlier map points that are observed in fewer than 3 key frames
outlierIdx = setdiff(newPointIdx, mapPointsIdx);
if ~isempty(outlierIdx)
mapPointSet = removeWorldPoints(mapPointSet, outlierIdx);
end
Loop Closure
The loop closure detection step takes the current key frame processed by the local mapping process
and tries to detect and close the loop. Loop candidates are identified by querying images in the
database that are visually similar to the current key frame using evaluateImageRetrieval. A
candidate key frame is valid if it is not connected to the last key frame and three of its neighbor key
frames are loop candidates.
When a valid loop candidate is found, use estgeotform3d to compute the relative pose between the
loop candidate frame and the current key frame. The relative pose represents a 3-D similarity
transformation stored in an affinetform3d object. Then add the loop connection with the relative
pose and update mapPointSet and vSetKeyFrames.
% Check loop closure after some key frames have been created
if currKeyFrameId > 20
if isDetected
% Add loop closure connections
[isLoopClosed, mapPointSet, vSetKeyFrames] = helperAddLoopConnections(...
mapPointSet, vSetKeyFrames, validLoopCandidates, currKeyFrameId, ...
currFeatures, loopEdgeNumMatches);
end
end
Finally, a similarity pose graph optimization is performed over the essential graph in vSetKeyFrames
to correct the drift of camera poses. The essential graph is created internally by removing
connections with fewer than minNumMatches matches in the covisibility graph. After similarity pose
graph optimization, update the 3-D locations of the map points using the optimized poses and the
associated scales.
if isLoopClosed
% Optimize the poses
minNumMatches = 20;
vSetKeyFramesOptim = optimizePoses(vSetKeyFrames, minNumMatches, Tolerance=1e-16);
% Update legend
showLegend(mapPlot);
end
You can compare the optimized camera trajectory with the ground truth to evaluate the accuracy of
ORB-SLAM. The downloaded data contains a groundtruth.txt file that stores the ground truth of
camera pose of each frame. You can import the data using the helperImportGroundTruth
function:
gTruthFileName = [dataFolder,'rgbd_dataset_freiburg3_long_office_household/groundtruth.txt'];
In this example, the data has been saved in the form of a MAT-file. Once the ground truth is imported,
you can calculate the root-mean-square-error (RMSE) of trajectory estimates.
% Show legend
showLegend(mapPlot);
You can test the visual SLAM pipeline with a different dataset by tuning the following parameters:
• numPoints: For an image resolution of 480-by-640 pixels, set numPoints to 1000. For higher
resolutions, such as 720-by-1280, set it to 2000. Larger values require more time for feature
extraction.
• numSkipFrames: For a frame rate of 30 fps, set numSkipFrames to 20. For a slower frame rate,
set it to a smaller value. Increasing numSkipFrames improves the tracking speed, but may
result in lost tracking when the camera motion is fast.
The above sections demonstrated the core principles involved in building a visual SLAM pipeline step-
by-step. In this section, you will learn how to use the monovslam class to easily invoke the entire
pipeline in just a few lines of code. This implementation is significantly faster and is suitable for
deployment via C/C++ code generation on a host computer or an embedded platform.
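The creation of the monovslam object and the per-frame loop are elided from the fragments below, which show only the plotting. A minimal sketch of the elided steps (the intrinsic values are the ones used later in the Code Generation section):
% Sketch: create the vSLAM object and feed it every image in the datastore.
% The plotting fragments shown below run inside and after this loop.
intrinsics = cameraIntrinsics([535.4, 539.2], [320.1, 247.6], [480, 640]);
vslam = monovslam(intrinsics);
for frameIdx = 1:numel(imds.Files)
    addFrame(vslam, readimage(imds, frameIdx));
end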
if hasNewKeyFrame(vslam)
% Display 3-D map points and camera trajectory
plot(vslam);
end
end
% Plot intermediate results and wait until all images are processed
while ~isDone(vslam)
if hasNewKeyFrame(vslam)
plot(vslam);
end
end
xyzPoints = mapPoints(vslam);
camPoses = poses(vslam);
Code Generation
You can generate C++ code from the monovslam class suitable for deployment on a host computer or an
embedded platform that has all the third-party dependencies, including OpenCV and g2o [3] on page
1-143. For illustrative purposes, this section generates MEX code. To meet the
requirements of MATLAB Coder, the code has to be restructured to isolate the algorithm from the
visualization code. The algorithmic content was encapsulated in the helperMonoVisualSLAM
function, which takes an image as the input and outputs 3-D world points and camera poses as
matrices. Inside the function, a monovslam object is created and saved into a persistent variable
called vslam. Note that the helperMonoVisualSLAM function does not display the reconstructed 3-D
point cloud or the camera poses. The plot method of the monovslam class was not designed to
generate code because visualization is typically not deployed on embedded systems.
%#codegen
if isempty(vslam)
% Create a monovslam class to process the image data
focalLength = [535.4, 539.2]; % in units of pixels
principalPoint = [320.1, 247.6]; % in units of pixels
imageSize = [480, 640]; % in units of pixels
intrinsics = cameraIntrinsics(focalLength, principalPoint, imageSize);
vslam = monovslam(intrinsics);
end
xyzPoint = xyzPointsInternal;
You can compile the helperMonoVisualSLAM function into a MEX-file using the codegen command.
Note that the generated MEX-file has the same name as the original MATLAB file with _mex
appended, unless you use the -o option to specify the name of the executable.
compileTimeInputs = {coder.typeof(currI)};
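The codegen call itself might look like the following (a sketch; it compiles the helper described above into helperMonoVisualSLAM_mex):
codegen helperMonoVisualSLAM -args compileTimeInputs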
% Clear up
clear helperMonoVisualSLAM_mex
Supporting Functions
Short helper functions are included below. Larger functions are included in separate files.
helperAddLoopConnections adds connections between the current key frame and the valid loop
candidate.
helperCheckLoopClosure detects loop candidate key frames by retrieving visually similar images
from the database.
helperTrackLastKeyFrame estimates the current camera pose by tracking the last key frame.
helperTrackLocalMap refines the current camera pose by tracking the local map.
helperImportGroundTruth imports the camera pose ground truth from the downloaded data.
helperDetectAndExtractFeatures detects and extracts ORB features from the image.
function [features, validPoints] = helperDetectAndExtractFeatures(Irgb, ...
scaleFactor, numLevels, numPoints, varargin)
% Extract features
[features, validPoints] = extractFeatures(Igray, points);
end
inlierPoints1 = matchedPoints1(inliersLogicalIndex);
inlierPoints2 = matchedPoints2(inliersLogicalIndex);
inliersIndex = find(inliersLogicalIndex);
locations1 = inlierPoints1.Location;
locations2 = inlierPoints2.Location;
xy1In2 = transformPointsForward(H, locations1);
xy2In1 = transformPointsInverse(H, locations2);
error1in2 = sum((locations2 - xy1In2).^2, 2);
error2in1 = sum((locations1 - xy2In1).^2, 2);
outlierThreshold = 6;
inlierPoints1 = matchedPoints1(inliersLogicalIndex);
inlierPoints2 = matchedPoints2(inliersLogicalIndex);
inliersIndex = find(inliersLogicalIndex);
locations1 = inlierPoints1.Location;
locations2 = inlierPoints2.Location;
outlierThreshold = 4;
end
% Check parallax
isValid = all(cosAngle < cosd(minParallax) & cosAngle>0);
end
helperUpdateGlobalMap updates the 3-D locations of map points after pose graph optimization
% Update world location of each map point based on the new absolute pose of
% the corresponding major view
for i = indices
majorViewIds = mapPointSet.RepresentativeViewId(i);
poseNew = posesNew(majorViewIds).A;
tform = affinetform3d(poseNew/posesOld(majorViewIds).A);
positionsNew(i, :) = transformPointsForward(tform, positionsOld(i, :));
end
mapPointSet = updateWorldPoints(mapPointSet, indices, positionsNew);
end
References
[1] Mur-Artal, Raul, Jose Maria Martinez Montiel, and Juan D. Tardos. "ORB-SLAM: a versatile and
accurate monocular SLAM system." IEEE Transactions on Robotics 31, no. 5, pp. 1147-1163, 2015.
[2] Sturm, Jürgen, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. "A
benchmark for the evaluation of RGB-D SLAM systems". In Proceedings of IEEE/RSJ International
Conference on Intelligent Robots and Systems, pp. 573-580, 2012.
[3] Kümmerle, Rainer, Giorgio Grisetti, Hauke Strasdat, Kurt Konolige, and Wolfram Burgard. "g2o:
A general framework for graph optimization." In Proceedings of IEEE International Conference on
Robotics and Automation, pp. 3607-3613, 2011.
See Also
Related Examples
• “Stereo Visual Simultaneous Localization and Mapping” on page 1-153
Structure from Motion from Two Views
Structure from motion (SfM) is the process of estimating the 3-D structure of a scene from a set of 2-
D images. This example shows you how to estimate the poses of a calibrated camera from two
images, reconstruct the 3-D structure of the scene up to an unknown scale factor, and then recover
the actual scale factor by detecting an object of a known size.
Overview
This example shows how to reconstruct a 3-D scene from a pair of 2-D images taken with a camera
calibrated using the Camera Calibrator app. The algorithm consists of the following steps:
1 Match a sparse set of points between the two images. There are multiple ways of finding point
correspondences between two images. This example detects corners in the first image using the
detectMinEigenFeatures function, and tracks them into the second image using
vision.PointTracker. Alternatively you can use extractFeatures followed by
matchFeatures.
2 Estimate the essential matrix using estimateEssentialMatrix.
3 Compute the motion of the camera using the estrelpose function.
4 Match a dense set of points between the two images. Re-detect the points using
detectMinEigenFeatures with a reduced 'MinQuality' to get more points. Then track the
dense points into the second image using vision.PointTracker.
5 Determine the 3-D locations of the matched points using triangulate.
6 Detect an object of a known size. In this scene there is a globe, whose radius is known to be
10cm. Use pcfitsphere to find the globe in the point cloud.
7 Recover the actual scale, resulting in a metric reconstruction.
imageDir = fullfile(toolboxdir('vision'),'visiondata','upToScaleReconstructionImages');
images = imageDatastore(imageDir);
I1 = readimage(images, 1);
I2 = readimage(images, 2);
figure
imshowpair(I1, I2, 'montage');
title('Original Images');
This example uses the camera parameters calculated by the Camera Calibrator app. The parameters
are stored in the cameraIntrinsics object, and include the camera intrinsics and lens distortion
coefficients.
Lens distortion can affect the accuracy of the final reconstruction. You can remove the distortion from
each of the images using the undistortImage function. This process straightens the lines that are
bent by the radial distortion of the lens.
I1 = undistortImage(I1, intrinsics);
I2 = undistortImage(I2, intrinsics);
figure
imshowpair(I1, I2, 'montage');
title('Undistorted Images');
Detect good features to track. Increase 'MinQuality' to detect fewer points, which are then more
uniformly distributed throughout the image. If the motion of the camera is not very large, then
tracking using the KLT algorithm is a good way to establish point correspondences.
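The detection and tracking code is elided in this excerpt. A minimal sketch of what it might look like (the parameter values are illustrative):
% Sketch: detect corners in the first image and track them into the second
% image with the KLT tracker.
imagePoints1 = detectMinEigenFeatures(im2gray(I1), MinQuality=0.1);
tracker = vision.PointTracker(MaxBidirectionalError=1, NumPyramidLevels=5);
imagePoints1 = imagePoints1.Location;
initialize(tracker, imagePoints1, I1);
[imagePoints2, validIdx] = step(tracker, I2);
matchedPoints1 = imagePoints1(validIdx, :);
matchedPoints2 = imagePoints2(validIdx, :);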
% Visualize correspondences
figure
showMatchedFeatures(I1, I2, matchedPoints1, matchedPoints2);
title('Tracked Features');
Use the estimateEssentialMatrix function to compute the essential matrix and find the inlier
points that meet the epipolar constraint.
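A sketch of this step, assuming the matched points from the tracking sketch above (the confidence value is illustrative):
% Sketch: estimate the essential matrix and keep only the inlier matches.
[E, epipolarInliers] = estimateEssentialMatrix(matchedPoints1, matchedPoints2, ...
    intrinsics, Confidence=99.99);
inlierPoints1 = matchedPoints1(epipolarInliers, :);
inlierPoints2 = matchedPoints2(epipolarInliers, :);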
Compute the location and orientation of the second camera relative to the first one. Note that the
translation of relPose is a unit vector, because translation can only be computed up to scale.
relPose = estrelpose(E, intrinsics, inlierPoints1, inlierPoints2);
Re-detect points in the first image using lower 'MinQuality' to get more points. Track the new
points into the second image. Estimate the 3-D locations corresponding to the matched points using
the triangulate function, which implements the Direct Linear Transformation (DLT) algorithm [1].
Place the origin at the optical center of the camera corresponding to the first image.
% Detect dense feature points. Use an ROI to exclude points close to the
% image edges.
border = 30;
roi = [border, border, size(I1, 2)- 2*border, size(I1, 1)- 2*border];
imagePoints1 = detectMinEigenFeatures(im2gray(I1), ROI = roi, ...
MinQuality = 0.001);
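The dense tracking and triangulation steps are elided here. A sketch, reusing the KLT tracking pattern from above and placing the first camera at the origin:
% Sketch: track the dense points into the second image, then triangulate.
tracker = vision.PointTracker(MaxBidirectionalError=1, NumPyramidLevels=5);
imagePoints1 = imagePoints1.Location;
initialize(tracker, imagePoints1, I1);
[imagePoints2, validIdx] = step(tracker, I2);
matchedPoints1 = imagePoints1(validIdx, :);
matchedPoints2 = imagePoints2(validIdx, :);
% Camera projection matrices: camera 1 at the origin, camera 2 at relPose.
camMatrix1 = cameraProjection(intrinsics, rigidtform3d);
camMatrix2 = cameraProjection(intrinsics, pose2extr(relPose));
points3D = triangulate(matchedPoints1, matchedPoints2, camMatrix1, camMatrix2);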
Use the plotCamera function to visualize the locations and orientations of the camera, and the
pcshow function to visualize the point cloud.
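A sketch of this visualization, under the same naming assumptions as the previous sketches:
% Sketch: show both cameras and the up-to-scale reconstruction.
figure
plotCamera(AbsolutePose=rigidtform3d, Size=0.05, Label='1')
hold on
grid on
plotCamera(AbsolutePose=relPose, Size=0.05, Label='2')
pcshow(points3D, VerticalAxis='y', VerticalAxisDir='down', MarkerSize=45)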
Find the globe in the point cloud by fitting a sphere to the 3-D points using the pcfitsphere
function.
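A sketch of the sphere fit (the maximum point-to-sphere distance is an illustrative value):
% Sketch: fit a sphere to the dense points to locate the globe.
ptCloud = pointCloud(points3D);
globe = pcfitsphere(ptCloud, 0.1, MaxNumTrials=1000);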
The actual radius of the globe is 10cm. You can now determine the coordinates of the 3-D points in
centimeters.
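A sketch of the rescaling step that follows from the known 10 cm radius:
% Sketch: recover metric scale from the known globe radius (10 cm).
scaleFactor = 10 / globe.Radius;   % centimeters per reconstruction unit
points3D = points3D * scaleFactor;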
xlabel('x-axis (cm)');
ylabel('y-axis (cm)');
zlabel('z-axis (cm)')
title('Metric Reconstruction of the Scene');
Summary
This example showed you how to recover camera motion and reconstruct the 3-D structure of a scene
from two images taken with a calibrated camera.
References
[1] Hartley, Richard, and Andrew Zisserman. Multiple View Geometry in Computer Vision. Second
Edition. Cambridge, 2000.
Stereo Visual Simultaneous Localization and Mapping
Visual simultaneous localization and mapping (vSLAM) refers to the process of calculating the
position and orientation of a camera with respect to its surroundings, while simultaneously mapping
the environment. The process uses only visual inputs from the camera. Applications for vSLAM
include augmented reality, robotics, and autonomous driving.
vSLAM can be performed by using just a monocular camera. However, because depth cannot be
accurately calculated from a single camera, the scale of the map and of the estimated trajectory is
unknown and drifts over time. In addition, because the initial map cannot be triangulated from the
first frame alone, multiple views are required to bootstrap the system. Using a stereo camera solves
these problems and provides a more reliable vSLAM solution.
This example shows how to process image data from a stereo camera to build a map of an outdoor
environment and estimate the trajectory of the camera. The example uses a version of the ORB-SLAM2
[1] on page 1-166 algorithm, which is feature-based and supports stereo cameras.
The pipeline for stereo vSLAM is very similar to the monocular vSLAM pipeline in the “Monocular
Visual Simultaneous Localization and Mapping” on page 1-122 example. The major difference is that
in the Map Initialization stage, 3-D map points are created from the two images of the same stereo
pair instead of from two images of different frames.
• Map Initialization: The pipeline starts by initializing the map of 3-D points from a pair of stereo
images using the disparity map. The left image is stored as the first key frame.
• Tracking: Once a map is initialized, for each new stereo pair, the pose of the camera is estimated
by matching features in the left image to features in the last key frame. The estimated camera
pose is refined by tracking the local map.
• Local Mapping: If the current left image is identified as a key frame, new 3-D map points are
computed from the disparity of the stereo pair. At this stage, bundle adjustment is used to
minimize reprojection errors by adjusting the camera pose and 3-D points.
• Loop Closure: Loops are detected for each key frame by comparing it against all previous key
frames using the bag-of-features approach. Once a loop closure is detected, the pose graph is
optimized to refine the camera poses of all the key frames.
The data used in this example are from the UTIAS Long-Term Localization and Mapping Dataset
provided by the University of Toronto Institute for Aerospace Studies. You can download the data to a
directory using a web browser or by running the following code:
ftpObj = ftp('asrl3.utias.utoronto.ca');
tempFolder = fullfile(tempdir);
dataFolder = [tempFolder, '2020-vtr-dataset/UTIAS-In-The-Dark/'];
zipFileName = [dataFolder, 'run_000005.zip'];
folderExists = exist(dataFolder, 'dir');
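The download and extraction steps are elided here. A sketch of what they might look like (the remote path is an assumption based on the local folder layout above):
% Sketch: download and extract the run only if it is not already on disk.
if ~folderExists
    mkdir(dataFolder)
    mget(ftpObj, '/2020-vtr-dataset/UTIAS-In-The-Dark/run_000005.zip', tempFolder);
    unzip(zipFileName, dataFolder);
end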
imgFolderLeft = [dataFolder,'/images/left/'];
imgFolderRight = [dataFolder,'/images/right/'];
imdsLeft = imageDatastore(imgFolderLeft);
imdsRight = imageDatastore(imgFolderRight);
Map Initialization
The ORB-SLAM pipeline starts by initializing the map that holds 3-D world points. This step is crucial
and has a significant impact on the accuracy of the final SLAM result. Initial ORB feature point
correspondences are found using matchFeatures between two images of a stereo pair. The matched
pairs should satisfy the following constraints:
• The horizontal shift between the two corresponding feature points in the rectified stereo pair
image is less than the maximum disparity. You can determine the approximate maximum disparity
value from the stereo anaglyph of the stereo pair image. For more information, see “Choosing
Range of Disparity”.
• The vertical shift between the two corresponding feature points in the rectified stereo pair image
is less than a threshold.
• The scales of the matched features are nearly identical.
The 3-D world locations corresponding to the matched feature points are determined as follows:
• Use disparitySGM to compute the disparity map for each pair of stereo images by using the
semi-global matching (SGM) method.
• Use reconstructScene to compute the 3-D world point coordinates from the disparity map.
• Find the locations in the disparity map that correspond to the feature points and their 3-D world
locations.
% Set random seed for reproducibility
rng(0);
% Load the initial camera pose. The initial camera pose is derived based
% on the transformation between the camera and the vehicle:
% http://asrl.utias.utoronto.ca/datasets/2020-vtr-dataset/text_files/transform_camera_vehicle.txt
initialPoseData = load("initialPose.mat");
initialPose = initialPoseData.initialPose;
% In this example, the images are already undistorted and rectified. In a general workflow,
% uncomment the following code to undistort and rectify the images.
% currILeft = undistortImage(currILeft, intrinsics);
% currIRight = undistortImage(currIRight, intrinsics);
% stereoParams = stereoParameters(intrinsics, intrinsics, eye(3), [-baseline, 0 0]);
% [currILeft, currIRight] = rectifyStereoImages(currILeft, currIRight, stereoParams, OutputView="full");
% Detect and extract ORB features from the rectified stereo images
scaleFactor = 1.2;
numLevels = 8;
[currFeaturesLeft, currPointsLeft] = helperDetectAndExtractFeatures(im2gray(currILeft), scaleFactor, numLevels);
[currFeaturesRight, currPointsRight] = helperDetectAndExtractFeatures(im2gray(currIRight), scaleFactor, numLevels);
% Match feature points between the stereo images and get the 3-D world positions
disparityRange = [0 48]; % specified in pixels
[xyzPoints, matchedPairs] = helperReconstructFromStereo(im2gray(currILeft), im2gray(currIRight), ...
    currFeaturesLeft, currFeaturesRight, currPointsLeft, currPointsRight, reprojectionMatrix, initialPose, disparityRange);
After the map is initialized using the first stereo pair, you can use imageviewset and
worldpointset to store the first key frames and the corresponding map points:
% Show legend
showLegend(mapPlot);
Loop detection is performed using the bags-of-words approach. A visual vocabulary represented as a
bagOfFeatures object is created offline with the ORB descriptors extracted from a large set of
images in the dataset by calling:
bag = bagOfFeatures(imds, CustomExtractor=@helperORBFeatureExtractorFunction, ...
    TreeProperties=[3, 10], StrongestFeatures=1);
Tracking
The tracking process is performed using every stereo pair and determines when to insert a new key frame.
currFrameIdx = 2;
isLoopClosed = false;
1 ORB features are extracted for each new stereo pair of images and then matched (using
matchFeatures), with features in the last key frame that have known corresponding 3-D map
points.
2 Estimate the camera pose with the Perspective-n-Point algorithm using estworldpose.
3 Given the camera pose, project the map points observed by the last key frame into the current
frame and search for feature correspondences using matchFeaturesInRadius.
4 With 3-D to 2-D correspondences in the current frame, refine the camera pose by performing a
motion-only bundle adjustment using bundleAdjustmentMotion.
5 Project the local map points into the current frame to search for more feature correspondences
using matchFeaturesInRadius and refine the camera pose again using
bundleAdjustmentMotion.
6 The last step of tracking is to decide if the current frame should be a new key frame. A frame is a
key frame if both of the following conditions are satisfied:
• At least 5 frames have passed since the last key frame or the current frame tracks fewer than 80
map points.
• The map points tracked by the current frame are fewer than 90% of points tracked by the
reference key frame.
If the current frame is to become a key frame, continue to the Local Mapping process. Otherwise,
start Tracking for the next frame.
% Main loop
isLastFrameKeyFrame = true;
while ~isLoopClosed && currFrameIdx <= numel(imdsLeft.Files)
currILeftGray = im2gray(currILeft);
currIRightGray = im2gray(currIRight);
% Track the local map and check if the current frame is a key frame.
% localKeyFrameIds: ViewId of the connected key frames of the current frame
numSkipFrames = 5;
numPointsKeyFrame = 80;
[localKeyFrameIds, currPose, trackedMapPointsIdx, trackedFeatureIdx, isKeyFrame] = ...
helperTrackLocalMap(mapPointSet, vSetKeyFrames, trackedMapPointsIdx, ...
trackedFeatureIdx, currPose, currFeaturesLeft, currPointsLeft, intrinsics, scaleFactor, numLevels, ...
isLastFrameKeyFrame, lastKeyFrameIdx, currFrameIdx, numSkipFrames, numPointsKeyFrame);
% Match feature points between the stereo images and get the 3-D world positions
[xyzPoints, matchedPairs] = helperReconstructFromStereo(currILeftGray, currIRightGray, currFeaturesLeft, ...
    currFeaturesRight, currPointsLeft, currPointsRight, reprojectionMatrix, currPose, disparityRange);
Local Mapping
Local mapping is performed for every key frame. When a new key frame is determined, add it to the
key frames and update the attributes of the map points observed by the new key frame. To ensure
that mapPointSet contains as few outliers as possible, a valid map point must be observed in at least
3 key frames.
New map points are created by triangulating ORB feature points in the current key frame and its
connected key frames. For each unmatched feature point in the current key frame, search for a match
with other unmatched points in the connected key frames using matchFeatures. The local bundle
adjustment refines the pose of the current key frame, the poses of connected key frames, and all the
map points observed in these key frames.
% Remove outlier map points that are observed in fewer than 3 key frames
if currKeyFrameId == 2
triangulatedMapPointsIdx = [];
end
Loop Closure
The loop closure step takes the current key frame processed by the local mapping process and tries
to detect and close the loop. Loop candidates are identified by querying images in the database that
are visually similar to the current key frame using evaluateImageRetrieval. A candidate key
frame is valid if it is not connected to the last key frame and three of its neighbor key frames are loop
candidates.
When a valid loop closure candidate is found, compute the relative pose between the loop closure
candidate frame and the current key frame using estgeotform3d. Then add the loop connection
with the relative pose and update mapPointSet and vSetKeyFrames.
% Check loop closure after some key frames have been created
if currKeyFrameId > 50
Finally, apply pose graph optimization over the essential graph in vSetKeyFrames to correct the
drift. The essential graph is created internally by removing connections with fewer than
minNumMatches matches in the covisibility graph. After pose graph optimization, update the 3-D
locations of the map points using the optimized poses.
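The optimization call is not shown in this excerpt; a sketch, mirroring the monocular example above (the argument order of the helper is an assumption):
% Sketch: optimize the pose graph over the essential graph, then update
% the 3-D map point locations with the optimized poses.
minNumMatches = 20;
vSetKeyFramesOptim = optimizePoses(vSetKeyFrames, minNumMatches, Tolerance=1e-16);
mapPointSet = helperUpdateGlobalMap(mapPointSet, vSetKeyFrames, vSetKeyFramesOptim);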
% Update legend
showLegend(mapPlot);
You can compare the optimized camera trajectory with the ground truth to evaluate the accuracy of
the solution. The downloaded data contains a gps.txt file that stores the GPS location for each
frame. You can convert the GPS location from geographic to local Cartesian coordinates using
latlon2local (Automated Driving Toolbox) from Automated Driving Toolbox or geodetic2enu
(Mapping Toolbox) from Mapping Toolbox. In this example, you can simply load the converted GPS
data from an M-file.
% Show legend
showLegend(mapPlot);
Given the refined camera poses, you can perform dense reconstruction from the stereo images
corresponding to the key frames.
for i = 1: numel(addedFramesIdx)
ILeft = readimage(imdsLeft, addedFramesIdx(i));
IRight = readimage(imdsRight, addedFramesIdx(i));
% Ignore the upper half of the images which mainly show the sky
xyzPoints(1:floor(imageSize(1)/2), :, :) = NaN;
% Ignore the lower part of the images which show the vehicle
xyzPoints(imageSize(1)-50:end, :, :) = NaN;
% Remove world points that are too far away from the camera
validIndex = xyzPoints(:, 3) > 0 & xyzPoints(:, 3) < 100/reprojectionMatrix(4, 3);
xyzPoints = xyzPoints(validIndex, :);
colors = colors(validIndex, :);
Supporting Functions
Short helper functions are listed below. Larger functions are included in separate files.
helperDetectAndExtractFeatures detects and extracts ORB features from the image.
function [features, validPoints] = helperDetectAndExtractFeatures(Igray, scaleFactor, numLevels)
numPoints = 600;
% Extract features
[features, validPoints] = extractFeatures(Igray, points);
end
helperReconstructFromStereo reconstructs the scene from a stereo pair using the disparity map
function [xyzPoints, indexPairs] = helperReconstructFromStereo(I1, I2, ...
features1, features2, points1, points2, reprojectionMatrix, currPose, disparityRange)
% Compute disparity for all pixels in the left image. In practice, it is more
% common to compute disparity just for the matched feature points.
disparityMap = disparitySGM(I1, I2, DisparityRange=disparityRange);
xyzPointsAll = reconstructScene(disparityMap, reprojectionMatrix);
for i = 1:size(locations, 1)
point3d = squeeze(xyzPointsAll(locations(i,1), locations(i, 2), :))';
isPointValid = point3d(3) > 0 & point3d(3) < 200/reprojectionMatrix(4, 3);
if isPointValid
xyzPoints = [xyzPoints; point3d]; %#ok<*AGROW>
isPointFound(i) = true;
end
end
indexPairs = indexPairs(isPointFound, :);
xyzPoints = transformPointsForward(currPose, xyzPoints);
end
if ~isempty(outlierIdx)
mapPointSet = removeWorldPoints(mapPointSet, outlierIdx);
end
end
helperUpdateGlobalMap updates the 3-D locations of map points after pose graph optimization
posesOld = vSetKeyFrames.Views.AbsolutePose;
posesNew = vSetKeyFramesOptim.Views.AbsolutePose;
positionsOld = mapPointSet.WorldPoints;
positionsNew = positionsOld;
indices = 1:mapPointSet.Count;
% Update world location of each map point based on the new absolute pose of
References
[1] Mur-Artal, Raul, and Juan D. Tardós. "ORB-SLAM2: An open-source SLAM system for monocular,
stereo, and RGB-D cameras." IEEE Transactions on Robotics 33, no. 5 (2017): 1255-1262.
Evaluating the Accuracy of Single Camera Calibration
This example shows how to evaluate the accuracy of camera parameters estimated using the “Using
the Single Camera Calibrator App” on page 18-22 app or the estimateCameraParameters
function.
Overview
Camera calibration is the process of estimating parameters of the camera using images of a special
calibration pattern. The parameters include camera intrinsics, distortion coefficients, and camera
extrinsics. Once you calibrate a camera, there are several ways to evaluate the accuracy of the
estimated parameters:
• Plot the relative locations of the camera and the calibration pattern
• Calculate the reprojection errors
• Calculate the parameter estimation errors
Extrinsics
You can quickly discover obvious errors in your calibration by plotting relative locations of the
camera and the calibration pattern. Use the showExtrinsics function to either plot the locations of
the calibration pattern in the camera's coordinate system, or the locations of the camera in the
pattern's coordinate system. Look for obvious problems, such as the pattern being behind the
camera, or the camera being behind the pattern. Also check if a pattern is too far or too close to the
camera.
figure;
showExtrinsics(params, "CameraCentric");
figure;
showExtrinsics(params, "PatternCentric");
Reprojection Errors
Reprojection errors provide a qualitative measure of accuracy. A reprojection error is the distance
between a pattern keypoint detected in a calibration image, and a corresponding world point
projected into the same image. The showReprojectionErrors function provides a useful
visualization of the average reprojection error in each calibration image. If the overall mean
reprojection error is too high, consider excluding the images with the highest error and recalibrating.
figure;
showReprojectionErrors(params);
Estimation Errors
displayErrors(estimationErrors, params);
Intrinsics
----------
Focal length (pixels): [ 714.1886 +/- 3.3219 710.3785 +/- 4.0579 ]
Principal point (pixels):[ 563.6481 +/- 5.3967 355.7252 +/- 3.3036 ]
Radial distortion: [ -0.3536 +/- 0.0091 0.1730 +/- 0.0488 ]
Extrinsics
----------
Rotation vectors:
[ -0.6096 +/- 0.0054 -0.1789 +/- 0.0073 -0.3835 +/- 0.0024
[ -0.7283 +/- 0.0050 -0.0996 +/- 0.0072 0.1964 +/- 0.0027
[ -0.6722 +/- 0.0051 -0.1444 +/- 0.0074 -0.1329 +/- 0.0026
[ -0.5836 +/- 0.0056 -0.2901 +/- 0.0074 -0.5622 +/- 0.0025
Whether a particular reprojection or estimation error is acceptable depends on the precision
requirements of your application. However, if you have determined that your calibration
accuracy is unacceptable, there are several ways to improve it:
• Modify calibration settings. Try using 3 radial distortion coefficients, estimating the tangential
distortion, or estimating the skew.
• Take more calibration images. The pattern in the images must be in different 3D orientations, and
it should be positioned such that you have keypoints in all parts of the field of view. In particular, it
is very important to have keypoints close to the edges and the corners of the image in order to get
a better estimate of the distortion coefficients.
• Exclude images that have high reprojection errors and re-calibrate.
Summary
This example showed how to obtain and interpret camera calibration errors.
References
[1] Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 22(11):1330-1334, 2000.
Measuring Planar Objects with a Calibrated Camera
This example shows how to measure the diameter of coins in world units using a single calibrated
camera.
Overview
This example shows how to calibrate a camera, and then use it to measure the size of planar objects,
such as coins. An example application of this approach is measuring parts on a conveyor belt for
quality control.
Camera calibration is the process of estimating the parameters of the lens and the image sensor.
These parameters are needed to measure objects captured by the camera. This example shows how
to calibrate a camera programmatically. Alternatively, you can calibrate a camera using the “Using
the Single Camera Calibrator App” on page 18-22 app.
To calibrate the camera, we first need to take multiple images of a calibration pattern from different
angles. A typical calibration pattern is an asymmetric checkerboard, where one side contains an even
number of squares, both black and white, and the other contains an odd number of squares.
The pattern must be affixed to a flat surface, and it should be at approximately the same distance
from the camera as the objects you want to measure. The size of a square must be measured in world
units, for example millimeters, as precisely as possible. In this example we use 9 images of the
pattern, but in practice it is recommended to use 10 to 20 images for accurate calibration.
numImages = 9;
files = cell(1, numImages);
for i = 1:numImages
files{i} = fullfile(matlabroot, 'toolbox', 'vision', 'visiondata', ...
'calibration', 'slr', sprintf('image%d.jpg', i));
end
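The pattern detection and calibration calls are elided in this excerpt. A minimal sketch, assuming a 29 mm square size (an illustrative value):
% Sketch: detect the checkerboard, generate world points, and calibrate.
[imagePoints, boardSize] = detectCheckerboardPoints(files);
squareSize = 29; % square size in millimeters (assumed value)
worldPoints = generateCheckerboardPoints(boardSize, squareSize);
I = imread(files{1});
imageSize = [size(I, 1), size(I, 2)];
cameraParams = estimateCameraParameters(imagePoints, worldPoints, ImageSize=imageSize);
% Visualize the per-image mean reprojection errors.
showReprojectionErrors(cameraParams);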
The bar graph indicates the accuracy of the calibration. Each bar shows the mean reprojection error
for the corresponding calibration image. The reprojection errors are the distances between the
corner points detected in the image, and the corresponding ideal world points projected into the
image.
Load the image containing objects to be measured. This image includes the calibration pattern, and
the pattern is in the same plane as the objects you want to measure. In this example, both the pattern
and the coins are on the same table top.
Alternatively, you could use two separate images: one containing the pattern, and the other
containing the objects to be measured. Again, the objects and the pattern must be in the same plane.
Furthermore, images must be captured from exactly the same view point, meaning that the camera
must be fixed in place.
Use the cameraParameters object to remove lens distortion from the image. This is necessary for
accurate measurement.
% Since the lens introduced little distortion, use 'full' output view to illustrate that
% the image was undistorted. If we used the default 'same' option, it would be difficult
% to notice any difference when compared to the original image. Notice the small black borders.
[im, newOrigin] = undistortImage(imOrig, cameraParams, OutputView = "full");
figure; imshow(im, InitialMagnification = magnification);
title("Undistorted Image");
Note that this image exhibits very little lens distortion. The undistortion step is far more important if
you use a wide-angle lens, or a low-end webcam.
Segment Coins
In this case, the coins are colorful on white background. Use the saturation component of the HSV
representation of the image to segment them out.
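A sketch of the segmentation step, thresholding the saturation channel with Otsu's method:
% Sketch: segment the coins using the saturation channel of the HSV image.
imHSV = rgb2hsv(im);
saturation = imHSV(:, :, 2);
t = graythresh(saturation);
imCoin = saturation > t;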
Detect Coins
We can assume that the two largest connected components in the segmented image correspond to the
coins.
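The detection code is elided here. A sketch using vision.BlobAnalysis (the minimum blob area is an illustrative value; the variable imCoin comes from the segmentation sketch above):
% Sketch: find blobs, keep the two largest, and annotate them on the image.
blobAnalysis = vision.BlobAnalysis(AreaOutputPort=true, ...
    CentroidOutputPort=false, BoundingBoxOutputPort=true, ...
    MinimumBlobArea=200, ExcludeBorderBlobs=true);
[areas, boxes] = step(blobAnalysis, imCoin);
[~, idx] = sort(areas, "descend");
boxes = double(boxes(idx(1:2), :));
imDetectedCoins = insertObjectAnnotation(im, "rectangle", boxes, "Coin");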
figure; imshow(imDetectedCoins);
title("Detected Coins");
Compute Extrinsics
To map points in the image coordinates to points in the world coordinates we need to compute the
rotation and the translation of the camera relative to the calibration pattern. Note that the
estimateExtrinsics function assumes that there is no lens distortion. In this case imagePoints
have been detected in an image that has already been undistorted using undistortImage.
% Adjust the imagePoints so that they are expressed in the coordinate system
% used in the original image, before it was undistorted. This adjustment
% makes it compatible with the cameraParameters object computed for the original image.
imagePoints = imagePoints + newOrigin; % adds newOrigin to every row of imagePoints
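The checkerboard detection and the extrinsics estimation themselves are elided here; a sketch of the estimation call (worldPoints comes from the calibration step):
% Sketch: estimate the camera extrinsics relative to the pattern using the
% adjusted image points.
camExtrinsics = estimateExtrinsics(imagePoints, worldPoints, cameraParams.Intrinsics);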
To measure the first coin we convert the top-left and the top-right corners of the bounding box into
world coordinates. Then we compute the Euclidean distance between them in millimeters. Note that
the actual diameter of a US penny is 19.05 mm.
% Adjust upper left corners of bounding boxes for coordinate system shift
% caused by undistortImage with output view of 'full'. This would not be
% needed if the output was 'same'. The adjustment makes the points compatible
% with the cameraParameters of the original image.
boxes = boxes + [newOrigin, 0, 0]; % zero padding is added for width and height
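The measurement itself is elided here. A sketch using img2world2d and the extrinsics from the previous sketch (variable names are assumptions):
% Sketch: convert the top corners of the first bounding box to world
% coordinates and measure the diameter in millimeters.
box1 = boxes(1, :);
imagePoints1 = [box1(1:2); box1(1) + box1(3), box1(2)]; % top-left and top-right corners
worldPoints1 = img2world2d(imagePoints1, camExtrinsics, cameraParams.Intrinsics);
d = worldPoints1(2, :) - worldPoints1(1, :);
diameterInMillimeters = hypot(d(1), d(2));
fprintf('Measured diameter of one coin = %0.2f mm\n', diameterInMillimeters);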
Measure the second coin the same way as the first coin.
In addition to measuring the size of the coin, we can also measure how far away it is from the
camera.
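A sketch of the distance computation, using the camera extrinsics from above:
% Sketch: compute the camera location in pattern coordinates and its
% distance to the center of the first coin.
camPose = extr2pose(camExtrinsics);
cameraLocation = camPose.Translation;
center1 = worldPoints1(1, :) + d / 2;
distanceToCamera = norm([center1, 0] - cameraLocation);
fprintf('Distance from the camera to the first coin = %0.2f mm\n', distanceToCamera);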
Summary
This example showed how to use a calibrated camera to measure planar objects. Note that the
measurements were accurate to within 0.2 mm.
References
[1] Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 22(11):1330-1334, 2000.
Depth Estimation from Stereo Video
This example shows how to detect people and their distance to the camera from a video taken with a
calibrated stereo camera.
Load the stereoParameters object, which is the result of calibrating the camera using either the
stereoCameraCalibrator app or the estimateCameraParameters function.
videoFileLeft = 'handshake_left.avi';
videoFileRight = 'handshake_right.avi';
readerLeft = VideoReader(videoFileLeft);
readerRight = VideoReader(videoFileRight);
player = vision.VideoPlayer('Position', [20,200,740 560]);
The frames from the left and the right cameras must be rectified in order to compute disparity and
reconstruct the 3-D scene. Rectified images have horizontal epipolar lines, and are row-aligned. This
simplifies the computation of disparity by reducing the search space for matching points to one
dimension. Rectified images can also be combined into an anaglyph, which can be viewed using the
stereo red-cyan glasses to see the 3-D effect.
frameLeft = readFrame(readerLeft);
frameRight = readFrame(readerRight);
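The rectification call is elided here. A sketch of what it might look like:
% Sketch: rectify the stereo frames with the calibrated stereoParams; the
% reprojection matrix is used later for 3-D scene reconstruction.
[frameLeftRect, frameRightRect, reprojectionMatrix] = ...
    rectifyStereoImages(frameLeft, frameRight, stereoParams);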
figure;
imshow(stereoAnaglyph(frameLeftRect, frameRightRect));
title('Rectified Video Frames');
Compute Disparity
In rectified stereo images any pair of corresponding points are located on the same pixel row. For
each pixel in the left image compute the distance to the corresponding pixel in the right image. This
distance is called the disparity, and it is proportional to the distance of the corresponding world point
from the camera.
frameLeftGray = im2gray(frameLeftRect);
frameRightGray = im2gray(frameRightRect);
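The disparity computation is elided here. A sketch (the display range is illustrative):
% Sketch: compute the disparity map with semi-global matching and show it.
disparityMap = disparitySGM(frameLeftGray, frameRightGray);
figure;
imshow(disparityMap, [0, 64]);
title('Disparity Map');
colormap jet
colorbar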
Reconstruct the 3-D world coordinates of points corresponding to each pixel from the disparity map.
points3D = reconstructScene(disparityMap, reprojectionMatrix);
% Create the people detector object. Limit the minimum object size for
% speed.
peopleDetector = vision.PeopleDetector('MinSize', [166 83]);
% Detect people.
bboxes = peopleDetector.step(frameLeftGray);
Find the 3-D world coordinates of the centroid of each detected person and compute the distance
from the centroid to the camera in meters.
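A sketch of this step (it assumes the 3-D points from reconstructScene are in millimeters, as is typical when the stereo calibration used millimeter world units):
% Sketch: compute detection centroids, look up their 3-D locations, and
% annotate each detection with its distance in meters.
centroids = [round(bboxes(:, 1) + bboxes(:, 3) / 2), ...
    round(bboxes(:, 2) + bboxes(:, 4) / 2)];
centroidsIdx = sub2ind(size(disparityMap), centroids(:, 2), centroids(:, 1));
X = points3D(:, :, 1); Y = points3D(:, :, 2); Z = points3D(:, :, 3);
centroids3D = [X(centroidsIdx), Y(centroidsIdx), Z(centroidsIdx)];
dists = sqrt(sum(centroids3D .^ 2, 2)) / 1000; % meters, assuming millimeter units
labels = cell(1, numel(dists));
for i = 1:numel(dists)
    labels{i} = sprintf('%0.2f meters', dists(i));
end
dispFrame = insertObjectAnnotation(frameLeftRect, 'rectangle', bboxes, labels);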
Apply the steps described above to detect people and measure their distances to the camera in every
frame of the video.
while hasFrame(readerLeft) && hasFrame(readerRight)
% Read the frames.
frameLeft = readFrame(readerLeft);
frameRight = readFrame(readerRight);
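% The rectification step is elided in this excerpt; a sketch assuming the
% same stereoParams as above.
[frameLeftRect, frameRightRect] = ...
    rectifyStereoImages(frameLeft, frameRight, stereoParams);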
% Convert to grayscale.
frameLeftGray = im2gray(frameLeftRect);
frameRightGray = im2gray(frameRightRect);
% Compute disparity.
disparityMap = disparitySGM(frameLeftGray, frameRightGray);
% Detect people.
bboxes = peopleDetector.step(frameLeftGray);
if ~isempty(bboxes)
% Find the centroids of detected people.
centroids = [round(bboxes(:, 1) + bboxes(:, 3) / 2), ...
round(bboxes(:, 2) + bboxes(:, 4) / 2)];
% Clean up
release(player);
Summary
This example showed how to localize pedestrians in 3-D using a calibrated stereo camera.
References
[1] G. Bradski and A. Kaehler, "Learning OpenCV : Computer Vision with the OpenCV Library,"
O'Reilly, Sebastopol, CA, 2008.
[2] Dalal, N. and Triggs, B., Histograms of Oriented Gradients for Human Detection. CVPR 2005.
Structure from Motion from Multiple Views
Structure from motion (SfM) is the process of estimating the 3-D structure of a scene from a set of 2-
D views. It is used in many applications, such as robot navigation, autonomous driving, and
augmented reality. This example shows you how to estimate the poses of a calibrated camera from a
sequence of views, and reconstruct the 3-D structure of the scene up to an unknown scale factor.
Overview
This example shows how to reconstruct a 3-D scene from a sequence of 2-D views taken with a
camera calibrated using the Camera Calibrator. The example uses an imageviewset object to store
and manage the data associated with each view, such as the camera pose and the image points, as
well as matches between points from pairs of views.
The example uses the pairwise point matches to estimate the camera pose of the current view
relative to the previous view. It then links the pairwise matches into longer point tracks spanning
multiple views using the findTracks method of the imageviewset object. These tracks then serve
as inputs to multiview triangulation using the triangulateMultiview function and the refinement
of camera poses and the 3-D scene points using the bundleAdjustment function.
The example consists of two main parts: camera motion estimation and dense scene reconstruction.
In the first part, the example estimates the camera pose for each view using a sparse set of points
matched across the views. In the second part, the example iterates over the sequence of views again,
using vision.PointTracker to track a dense set of points across the views, to compute a dense 3-D
reconstruction of the scene.
Read the input image sequence.

imageDir = fullfile(toolboxdir('vision'), 'visiondata', ...
    'structureFromMotion');
imds = imageDatastore(imageDir);
Use an imageviewset object to store and manage the image points and the camera pose associated
with each view, as well as point matches between pairs of views. Once you populate an
imageviewset object, you can use it to find point tracks across multiple views and retrieve the
camera poses to be used by triangulateMultiview and bundleAdjustment functions.
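The code below uses feature points and descriptors from the first view (prevPoints and prevFeatures), which the full example computes before populating the view set. A brief sketch of that step, assuming the images cell array and the camera intrinsics from the earlier (omitted) setup code, with illustrative SURF parameters:

% Undistort the first image and detect features in it.
I = undistortImage(images{1}, intrinsics);
prevPoints = detectSURFFeatures(im2gray(I), MetricThreshold=500);
prevFeatures = extractFeatures(im2gray(I), prevPoints, Upright=true);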
% Create an empty imageviewset object to manage the data associated with each
% view.
vSet = imageviewset;
% Add the first view. Place the camera associated with the first view
% at the origin, oriented along the Z-axis.
viewId = 1;
vSet = addView(vSet, viewId, rigidtform3d, Points=prevPoints);
for i = 2:numel(images)
% Undistort the current image.
I = undistortImage(images{i}, intrinsics);
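% (Sketch of the feature detection and matching step that the full example
% performs at this point; parameter values are illustrative. The full
% example also estimates the absolute pose of the current view and adds the
% view to vSet before storing the connection below.)
currPoints = detectSURFFeatures(im2gray(I), MetricThreshold=500);
currFeatures = extractFeatures(im2gray(I), currPoints, Upright=true);
indexPairs = matchFeatures(prevFeatures, currFeatures, MaxRatio=0.7, Unique=true);
matchedPoints1 = prevPoints(indexPairs(:, 1));
matchedPoints2 = currPoints(indexPairs(:, 2));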
% Estimate the camera pose of current view relative to the previous view.
% The pose is computed up to scale, meaning that the distance between
% the cameras in the previous view and the current view is set to 1.
% This will be corrected by the bundle adjustment.
[relPose, inlierIdx] = helperEstimateRelativePose(...
matchedPoints1, matchedPoints2, intrinsics);
% Store the point matches between the previous and the current views.
vSet = addConnection(vSet, i-1, i, relPose, Matches=indexPairs(inlierIdx,:));
prevFeatures = currFeatures;
prevPoints = currPoints;
end
Go through the images again. This time, detect a dense set of corners and track them across all views
using vision.PointTracker.
% Read and undistort the first image
I = undistortImage(images{1}, intrinsics);
% Create the point tracker object to track the points across views.
tracker = vision.PointTracker(MaxBidirectionalError=1, NumPyramidLevels=6);
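The initialization and tracking code is abbreviated here. A brief sketch of the typical usage, with an illustrative corner-detection quality threshold:

% Detect a dense set of corners in the first image and initialize the tracker.
prevPoints = detectMinEigenFeatures(im2gray(I), MinQuality=0.001);
initialize(tracker, prevPoints.Location, I);

% Track the points through each subsequent view. The full example stores
% the tracked points in a second imageviewset for triangulation.
for i = 2:numel(images)
    I = undistortImage(images{i}, intrinsics);
    [currPoints, validIdx] = step(tracker, I);
end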
Display the dense 3-D reconstruction of the scene in a figure titled Dense Reconstruction.
References
[1] M.I.A. Lourakis and A.A. Argyros (2009). "SBA: A Software Package for Generic Sparse Bundle
Adjustment". ACM Transactions on Mathematical Software (ACM) 36 (1): 1-30.
[2] R. Hartley, A. Zisserman, "Multiple View Geometry in Computer Vision," Cambridge University
Press, 2003.
Uncalibrated Stereo Image Rectification
Stereo image rectification projects images onto a common image plane in such a way that the
corresponding points have the same row coordinates. This process is useful for stereo vision, because
the 2-D stereo correspondence problem is reduced to a 1-D problem. As an example, stereo image
rectification is often used as a preprocessing step for computing disparity or creating anaglyph
images. For more details, see the “Depth Estimation from Stereo Video” on page 1-181 example.
Read in two color images of the same scene, which were taken from different positions. Then, convert
them to grayscale. Colors are not required for the matching process.
I1 = imread("yellowstone_left.png");
I2 = imread("yellowstone_right.png");
% Convert to grayscale.
I1gray = im2gray(I1);
I2gray = im2gray(I2);
Display both images side by side. Then, display a color composite demonstrating the pixel-wise
differences between the images.
figure
imshowpair(I1,I2,"montage")
title("I1 (left); I2 (right)")
figure
imshow(stereoAnaglyph(I1,I2))
title("Composite Image (Red - Left Image, Cyan - Right Image)")
There is an obvious offset between the images in orientation and position. The goal of rectification is
to transform the images, aligning them such that corresponding points will appear on the same rows
in both images.
The rectification process requires a set of point correspondences between the two images. To
generate these correspondences, you will collect points of interest from both images, and then choose
potential matches between them. Use detectSURFFeatures to find blob-like features in both
images.
blobs1 = detectSURFFeatures(I1gray,MetricThreshold=2000);
blobs2 = detectSURFFeatures(I2gray,MetricThreshold=2000);
Visualize the location and scale of the thirty strongest SURF features in I1 and I2. Notice that not all
of the detected features can be matched because they were either not detected in both images or
because some of them were not present in one of the images due to camera motion.
figure
imshow(I1)
hold on
plot(selectStrongest(blobs1,30))
title("Thirty Strongest SURF Features In I1")
figure
imshow(I2)
hold on
plot(selectStrongest(blobs2,30))
title("Thirty Strongest SURF Features In I2")
Use the extractFeatures and matchFeatures functions to find putative point correspondences.
For each blob, compute the SURF feature vectors (descriptors).
[features1,validBlobs1] = extractFeatures(I1gray,blobs1);
[features2,validBlobs2] = extractFeatures(I2gray,blobs2);
Use the sum of absolute differences (SAD) metric to determine indices of matching features.
indexPairs = matchFeatures(features1,features2,Metric="SAD", ...
    MatchThreshold=5);

% Retrieve the locations of the matched points.
matchedPoints1 = validBlobs1(indexPairs(:,1),:);
matchedPoints2 = validBlobs2(indexPairs(:,2),:);
Show matching points on top of the composite image, which combines stereo images. Notice that
most of the matches are correct, but there are still some outliers.
figure
showMatchedFeatures(I1, I2, matchedPoints1, matchedPoints2)
legend("Putatively Matched Points In I1","Putatively Matched Points In I2")
The correctly matched points must satisfy epipolar constraints. This means that a point must lie on
the epipolar line determined by its corresponding point. You will use the
estimateFundamentalMatrix function to compute the fundamental matrix and find the inliers that
meet the epipolar constraint.
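The code for this step is abbreviated here. A sketch of the fundamental matrix estimation and inlier selection, with illustrative RANSAC parameters:

[fMatrix, epipolarInliers] = estimateFundamentalMatrix( ...
    matchedPoints1, matchedPoints2, Method="RANSAC", ...
    NumTrials=10000, DistanceThreshold=0.1, Confidence=99.99);

% Keep only the inlier points.
inlierPoints1 = matchedPoints1(epipolarInliers, :);
inlierPoints2 = matchedPoints2(epipolarInliers, :);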
figure
showMatchedFeatures(I1, I2, inlierPoints1, inlierPoints2)
legend("Inlier Points In I1","Inlier Points In I2")
Rectify the stereo images, and display them as a stereo anaglyph. You can use red-cyan stereo glasses
to see the 3D effect.
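The rectification code is abbreviated here. A minimal sketch using the inlier points from the previous step (the function choice reflects one possible approach for uncalibrated rectification):

[tform1, tform2] = estimateStereoRectification(fMatrix, ...
    inlierPoints1.Location, inlierPoints2.Location, size(I2));
[I1Rect, I2Rect] = rectifyStereoImages(I1, I2, tform1, tform2);

figure
imshow(stereoAnaglyph(I1Rect, I2Rect))
title("Rectified Stereo Images (Red - Left Image, Cyan - Right Image)")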
The parameters used in the above steps have been set to fit the two particular stereo images. To
process other images, you can use the cvexRectifyStereoImages function, which contains
additional logic to automatically adjust the rectification parameters. The image below shows the
result of processing a pair of images using this function.
cvexRectifyStereoImages("parkinglot_left.png","parkinglot_right.png");
References
[1] Trucco, E; Verri, A. "Introductory Techniques for 3-D Computer Vision." Prentice Hall, 1998.
[2] Hartley, R; Zisserman, A. "Multiple View Geometry in Computer Vision." Cambridge University
Press, 2003.
[3] Hartley, R. "In Defense of the Eight-Point Algorithm." IEEE® Transactions on Pattern Analysis and
Machine Intelligence, v.19 n.6, June 1997.
[4] Fischler, MA; Bolles, RC. "Random Sample Consensus: A Paradigm for Model Fitting with
Applications to Image Analysis and Automated Cartography." Comm. Of the ACM 24, June 1981.
Code Generation and Third-Party Examples
• “Code Generation for Monocular Visual Simultaneous Localization and Mapping ” on page 2-2
• “Code Generation for Object Detection by Using Single Shot Multibox Detector” on page 2-5
• “Code Generation for Object Detection by Using YOLO v2” on page 2-8
• “Introduction to Code Generation with Feature Matching and Registration” on page 2-12
• “Code Generation for Face Tracking with PackNGo” on page 2-19
• “Code Generation for Depth Estimation From Stereo Video” on page 2-27
• “Detect Face (Raspberry Pi2)” on page 2-32
• “Track Face (Raspberry Pi2)” on page 2-38
• “Video Display in a Custom User Interface” on page 2-44
• “Generate Code for Detecting Objects in Images by Using ACF Object Detector” on page 2-49
Code Generation for Monocular Visual Simultaneous Localization and Mapping
This example shows how to use the MATLAB® Coder™ to generate C/C++ code for the visual
simultaneous localization and mapping algorithm from the “Monocular Visual Simultaneous
Localization and Mapping” on page 1-122 example.
Visual simultaneous localization and mapping (vSLAM) is the process of calculating the position and
the orientation of a camera, with respect to its surroundings, while simultaneously mapping the
environment.
This example shows how to process image data from a monocular camera to build a map of an indoor
environment and estimate the trajectory of the camera. The steps in this process are:
1 Package the visual SLAM algorithm from the “Monocular Visual Simultaneous Localization and
Mapping” on page 1-122 example into an entry-point function, helperMonoVisualSLAM.
2 Modify the helperMonoVisualSLAM function to support code generation.
3 Generate C/C++ code, and verify the results.
You can also integrate the generated code into external software for further testing.
Download Data
This example uses data from the TUM RGB-D benchmark [1] on page 2-4. The size of the data set is
1.38 GB. You can download the data set to a temporary folder using this code.
baseDownloadURL = "https://vision.in.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_of
dataFolder = fullfile(tempdir,"tum_rgbd_dataset",filesep);
options = weboptions(Timeout=Inf);
tgzFileName = dataFolder + "fr3_office.tgz";
folderExists = exist(dataFolder,"dir");
Entry-Point Function
To meet the requirements of MATLAB Coder, restructure the code from the “Monocular Visual
Simultaneous Localization and Mapping” on page 1-122 example into the entry-point function
helperMonoVisualSLAM. This function takes a cell array of images as an input, and outputs a
worldpointset object of 3-D map points, the estimated camera poses, and the frame indices.
The helperMonoVisualSLAM function performs these operations:
1 Initializes a map of 3-D points from the first two video frames, and then computes the 3-D points
and relative camera pose using triangulation based on 2-D ORB feature correspondences.
2 For each new frame, estimates the camera pose by matching features in the current frame to
features in the previous key frame. The function refines the estimated camera pose by tracking
the local map.
3 If the function identifies the new frame as a key frame, the function uses the new frame to create
new 3-D points. In this step, the function uses bundle adjustment to minimize reprojection errors
in the estimated camera poses and 3-D points.
4 Detects loops in each key frame by comparing it with all previous key frames using the bag-of-
features approach. Once the function detects a loop closure, it optimizes the pose graph by
refining the camera poses of all the key frames.
As code generation does not support the imageDatastore object, read the images, convert them to
grayscale, and store them in a cell array.
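A minimal sketch of building that cell array, assuming the extracted image files are in imageFolder:

imds = imageDatastore(imageFolder);
numImages = numel(imds.Files);
Images = cell(1, numImages);
for i = 1:numImages
    Images{i} = im2gray(readimage(imds, i));
end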
Add the “Compilation Directive %#codegen” (MATLAB Coder) to the helperVisualSLAMCodegen
function, and use the codegen command to compile it into a MEX file. You can specify the -report option to
generate a compilation report that shows the original MATLAB code and the associated files created
during code generation. You can also create a temporary directory where MATLAB Coder can store
the generated files. Note that, by default, the generated MEX file has the same name as the original
MATLAB function with "_mex" appended as a suffix: helperVisualSLAMCodegen_mex.
Alternatively, you can use the -o option to specify the name of the MEX file.
For code generation, you must pass Images as an input to the helperVisualSLAMCodegen
function.
cpuConfig = coder.config("mex");
cpuConfig.TargetLang = "C++";
codegen -config cpuConfig helperVisualSLAMCodegen -args {Images}
Use the helperVisualSLAMCodegen_mex function to find the estimated and optimized camera
poses based on the Images cell array.
monoSlamOut = helperVisualSLAMCodegen_mex(Images);
Plot the estimated trajectory and actual trajectory of the camera by specifying monoSlamOut as an
input argument to the helperVisualizeMonoSlam helper function.
% Clean up
clear helperVisualSLAMCodegen_mex
Reference
[1] Sturm Jürgen, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. “A
Benchmark for the Evaluation of RGB-D SLAM Systems.” In 2012 IEEE/RSJ International Conference
on Intelligent Robots and Systems, 573–80, 2012. https://doi.org/10.1109/IROS.2012.6385773.
Code Generation for Object Detection by Using Single Shot Multibox Detector
This example shows how to generate CUDA® code for an SSD network (ssdObjectDetector object)
and take advantage of the NVIDIA® cuDNN and TensorRT libraries. An SSD network is based on a
feed-forward convolutional neural network that detects multiple objects within an image in a single
shot. An SSD network can be thought of as having two subnetworks: a feature extraction network
followed by a detection network.
This example generates code for the network trained in the Object Detection Using SSD Deep
Learning example from Computer Vision Toolbox™. For more information, see “Object Detection
Using SSD Deep Learning” on page 3-302. The Object Detection Using SSD Deep Learning example
uses ResNet-50 for feature extraction. The detection sub-network is a small CNN compared to the
feature extraction network and is composed of a few convolutional layers and layers specific to SSD.
Third-Party Prerequisites
Required
This example generates CUDA MEX and has the following third-party requirements.
• CUDA® enabled NVIDIA® GPU and compatible driver.
Optional
For non-MEX builds such as static, dynamic libraries or executables, this example has the following
additional requirements.
• NVIDIA toolkit.
• NVIDIA cuDNN library.
• Environment variables for the compilers and libraries. For more information, see “Third-Party
Hardware” (GPU Coder) and “Setting Up the Prerequisite Products” (GPU Coder).
Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries
necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
This example uses the ssdResNet50VehicleExample_20a MAT-file containing the pretrained SSD
network. This file is approximately 44 MB in size. Download the file from the MathWorks® website.
ssdNetFile = matlab.internal.examples.downloadSupportFile('vision/data','ssdResNet50VehicleExampl
The DAG network contains 180 layers including convolution, ReLU, and batch normalization layers,
anchor box, SSD merge, focal loss, and other layers. To display an interactive visualization of the
deep learning network architecture, use the analyzeNetwork (Deep Learning Toolbox) function.
load(ssdNetFile);
analyzeNetwork(detector.Network);
The ssdObj_detect.m entry-point function takes an image input and runs the detector on the image
using the deep learning network saved in the ssdResNet50VehicleExample_20a.mat file. The
function loads the network object from the ssdResNet50VehicleExample_20a.mat file into a
persistent variable ssdObj and reuses the persistent object on subsequent detection calls.
type('ssdObj_detect.m')
function outImg = ssdObj_detect(in,matFile)
persistent ssdObj;
if isempty(ssdObj)
    ssdObj = coder.loadDeepLearningNetwork(matFile);
end
% Pass in input
[bboxes,~,labels] = detect(ssdObj,in,'Threshold',0.5);
% Annotate the detections in the image.
outImg = insertObjectAnnotation(in,'rectangle',bboxes,cellstr(labels));
To generate CUDA code for the ssdObj_detect.m entry-point function, create a GPU code
configuration object for a MEX target and set the target language to C++. Use the
coder.DeepLearningConfig (GPU Coder) function to create a CuDNN deep learning configuration
object and assign it to the DeepLearningConfig property of the GPU code configuration object.
Run the codegen command specifying an input size of 300-by-300-by-3. This value corresponds to the
input layer size of SSD Network.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
inputArgs = {ones(300,300,3,'uint8'),coder.Constant(ssdNetFile)};
codegen -config cfg ssdObj_detect -args inputArgs -report
To test the generated MEX, the example uses a small vehicle data set that contains 295 images. Many
of these images come from the Caltech Cars 1999 and 2001 data sets, available at the Caltech
Research Data Repository website, created by Pietro Perona and used with permission.
Load the vehicle data set and randomly select 10 images to test the generated code.
unzip vehicleDatasetImages.zip
imageNames = dir(fullfile(pwd,'vehicleImages','*.jpg'));
imageNames = {imageNames.name}';
rng(0);
imageIndices = randi(length(imageNames),1,10);
Run the generated MEX function on the selected test images and display the detection results.
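A sketch of that test loop, resizing each image to the 300-by-300 network input size; file handling follows the snippet above, and the display assumes the entry-point function returns an annotated image:

for idx = imageIndices
    testImage = imread(fullfile(pwd,'vehicleImages',imageNames{idx}));
    in = imresize(testImage,[300 300]);
    out = ssdObj_detect_mex(in,ssdNetFile);
    imshow(out)
end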
References
[1] Liu, Wei, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng Yang Fu, and
Alexander C. Berg. "SSD: Single shot multibox detector." In 14th European Conference on Computer
Vision, ECCV 2016. Springer Verlag, 2016.
Code Generation for Object Detection by Using YOLO v2
This example shows how to generate CUDA® MEX for a you only look once (YOLO) v2 object
detector. A YOLO v2 object detection network is composed of two subnetworks: a feature extraction
network followed by a detection network. This example generates code for the network trained in the
Object Detection Using YOLO v2 Deep Learning example from Computer Vision Toolbox™. For more
information, see “Object Detection Using YOLO v2 Deep Learning” on page 3-468. You can modify
this example to generate CUDA® MEX for the network imported in the Import Pretrained ONNX
YOLO v2 Object Detector example from Computer Vision Toolbox™. For more information, see
“Import Pretrained ONNX YOLO v2 Object Detector” on page 3-436.
Third-Party Prerequisites
Required
This example generates CUDA MEX and has the following third-party requirements.
• CUDA® enabled NVIDIA® GPU and compatible driver.
Optional
For non-MEX builds such as static, dynamic libraries or executables, this example has the following
additional requirements.
• NVIDIA toolkit.
• NVIDIA cuDNN library.
• Environment variables for the compilers and libraries. For more information, see “Third-Party
Hardware” (GPU Coder) and “Setting Up the Prerequisite Products” (GPU Coder).
Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries
necessary for running this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
matFile = matlab.internal.examples.downloadSupportFile('vision/data','yolov2ResNet50VehicleExampl
vehicleDetector = load(matFile);
net = vehicleDetector.detector.Network
net =
DAGNetwork with properties:
InputNames: {'input_1'}
OutputNames: {'yolov2OutputLayer'}
The DAG network contains 150 layers including convolution, ReLU, and batch normalization layers
and the YOLO v2 transform and YOLO v2 output layers. To display an interactive visualization of the
deep learning network architecture, use the analyzeNetwork (Deep Learning Toolbox) function.
analyzeNetwork(net);
The yolov2_detect.m entry-point function takes an image input and runs the detector on the image
using the deep learning network saved in the yolov2ResNet50VehicleExample.mat file. The
function loads the network object from the yolov2ResNet50VehicleExample.mat file into a
persistent variable yolov2Obj and reuses the persistent object on subsequent detection calls.
type('yolov2_detect.m')
function outImg = yolov2_detect(in,matFile)
persistent yolov2Obj;
if isempty(yolov2Obj)
    yolov2Obj = coder.loadDeepLearningNetwork(matFile);
end
% Run the detector and annotate the detections in the image.
[bboxes,~,labels] = detect(yolov2Obj,in,'Threshold',0.5);
outImg = insertObjectAnnotation(in,'rectangle',bboxes,cellstr(labels));
To generate CUDA code for the entry-point function, create a GPU code configuration object for a
MEX target and set the target language to C++. Use the coder.DeepLearningConfig (GPU
Coder) function to create a CuDNN deep learning configuration object and assign it to the
DeepLearningConfig property of the GPU code configuration object. Run the codegen command
specifying an input size of 224-by-224-by-3. This value corresponds to the input layer size of YOLOv2.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.GenerateReport = true;
inputArgs = {ones(224,224,3,'uint8'),coder.Constant(matFile)};

codegen -config cfg yolov2_detect -args inputArgs
Set up the video file reader and read the input video. Create a video player to display the video and
the output detections.
videoFile = 'highway_lanechange.mp4';
videoFreader = vision.VideoFileReader(videoFile,'VideoOutputDataType','uint8');
depVideoPlayer = vision.DeployableVideoPlayer('Size','Custom','CustomSize',[640 480]);
Read the video input frame-by-frame and detect the vehicles in the video using the detector.
cont = ~isDone(videoFreader);
while cont
I = step(videoFreader);
in = imresize(I,[224,224]);
out = yolov2_detect_mex(in,matFile);
step(depVideoPlayer, out);
% Exit the loop if the video player figure window is closed
cont = ~isDone(videoFreader) && isOpen(depVideoPlayer);
end
References
[1] Redmon, Joseph, and Ali Farhadi. "YOLO9000: Better, Faster, Stronger." 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). IEEE, 2017.
Introduction to Code Generation with Feature Matching and Registration
This example shows how to use the MATLAB® Coder™ to generate C code for a MATLAB file. The
example explains how to modify the MATLAB code used by the “Find Image Rotation and Scale Using
Automated Feature Matching” on page 4-25 example so that it is supported for code generation. The
example highlights some of the general requirements for code generation, as well as some of the
specific actions you must take to prepare MATLAB code. Once the MATLAB code is ready for code
generation, you use the codegen (MATLAB Coder) command to generate a C-MEX function. Finally,
to verify results, the example shows you how to run the generated C-MEX function in MATLAB and
compare its output with the output of the MATLAB code.
To run this example, you must have access to a C compiler and you must configure it using the
'mex -setup' command. For more information, see “Get Started with MATLAB Coder” (MATLAB Coder).
Generated code can run inside the MATLAB environment as a C-MEX file, or outside the MATLAB
environment as a standalone executable or shared utility to be linked with another standalone
executable. For more details about setting code generation options, see the -config option of the
codegen (MATLAB Coder) command.
MEX Executables
This example generates a MEX executable to be run inside the MATLAB environment.
Generating a C-MEX executable to run inside of MATLAB can also be a great first step in a workflow
that ultimately leads to standalone code. The inputs and the outputs of the MEX-file are available for
inspection in the MATLAB environment, where visualization and other kinds of tools for verification
and analysis are readily available. You also have the choice of running individual commands either as
generated C code, or via the MATLAB engine. To run via MATLAB, declare relevant commands as
coder.extrinsic (MATLAB Coder), which means that the generated code will re-enter the
MATLAB environment when it needs to run that particular command. This is useful in cases where
either an isolated command does not yet have code generation support, or you wish to embed
certain commands that do not generate code (such as the plot command).
Standalone Executables
If deployment of code to another application is the goal, then a standalone executable will be
required. The first step is to configure MATLAB Coder appropriately. For example, one way to tell it
you want a standalone executable is to create a MATLAB Coder project using the MATLAB Coder IDE
and configure that project to generate a module or an executable. You can do so using the C/C++
static library or C/C++ executable options from the Build type widget on the Generate page. This IDE
is available by navigating as follows:
• Click the APPS tab.
• Scroll down to MATLAB Coder.
• In the MATLAB Coder Project dialog box, click OK.
Alternatively, you can create a code generation configuration object at the MATLAB command line:

a = coder.config('exe')

and pass that object to the codegen command. When you create a
standalone executable, you have to write your own main.c (or main.cpp). Note that when you create a
standalone executable, there are no ready-made utilities for importing or exporting data between the
executable and the MATLAB environment. One of the options is to use printf/fprintf to a file (in your
handwritten main.c) and then import data into MATLAB using 'load -ascii' with your file.
Break Out the Computational Part of the Algorithm into a Separate MATLAB Function
MATLAB Coder requires MATLAB code to be in the form of a function in order to generate C code.
Note that it is generally not necessary to generate C code for all of the MATLAB code in question. It is
often desirable to separate the code into the primary computational portion, from which C code
generation is desired, and a harness or driver, which does not need to generate C code - that code
will run in MATLAB. The harness may contain visualization and other verification aids that are not
actually part of the system under test. The code for the main algorithm of this example resides in a
function called visionRecovertformCodeGeneration_kernel.
Once the code has been re-architected as described above, you must check that the rest of the code
uses capabilities that are supported by MATLAB Coder. For a list of supported commands, see
MATLAB Coder “Functions and Objects Supported for C/C++ Code Generation” (MATLAB Coder).
For a list of supported language constructs, see “MATLAB Language Features Supported for C/C++
Code Generation” (MATLAB Coder).
It may be convenient to keep limited visualization, or some other capability that MATLAB Coder does
not support, in the function containing the main algorithm that we want to compile.
In these cases, you can declare these items 'extrinsic' (using coder.extrinsic). Such capability is only
possible when you generate the C code into a MATLAB MEX-file, and those functions will actually run
in interpreted MATLAB mode. If generating code for standalone use, extrinsic functions are either
ignored or they generate an error, depending on whether the code generation engine determines that
they affect the results. Thus the code must be properly architected so that the extrinsic functions do
not materially affect the code in question if a standalone executable is ultimately desired.
The original example uses showMatchedFeatures and imshowpair routines for visualization of the
results. These routines are extracted to a new function
featureMatchingVisualization_extrinsic. This function is declared extrinsic.
scaleRecovered = 0.702550
thetaRecovered = 29.761566
MATLAB Coder places the files it creates in a codegen folder. Note that the generated MEX-file has
the same name as the original MATLAB file with _mex appended, unless you use the -o option to
specify the name of the executable.
MATLAB Coder requires that you specify the properties of all the input parameters. One easy way to
do this is to define the input properties by example at the command-line using the -args option. For
more information see “Define Input Properties by Example at the Command Line” (MATLAB Coder).
Since the inputs to visionRecovertformCodeGeneration_kernel are a pair of images, we
define both of the inputs with the following properties:
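The code for this step is abbreviated here. A sketch of defining the compile-time inputs and generating the MEX, assuming the two input images are available in the variables original and distorted:

compileTimeInputs = {coder.typeof(original), coder.typeof(distorted)};
codegen visionRecovertformCodeGeneration_kernel.m -report -args compileTimeInputs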
scaleRecovered = 0.702448
thetaRecovered = 29.908159
Clean Up
clear visionRecovertformCodeGeneration_kernel_mex;
Recovered scale and theta for both MATLAB and CODEGEN, as shown above, are within reasonable
tolerance. Furthermore, the matched points are identical, as shown below:
isequal(matchedOriginalLocCG,matchedOriginalLoc)
isequal(matchedDistortedLocCG,matchedDistortedLoc)
ans =

  logical

   1

ans =

  logical

   1
Appendix
• featureMatchingVisualization_extrinsic
Code Generation for Face Tracking with PackNGo
This example shows how to generate code from “Face Detection and Tracking Using the KLT
Algorithm” on page 8-89 example with packNGo function. The packNGo (MATLAB Coder) function
packages all relevant files in a compressed zip file so you can relocate, unpack, and rebuild your
project in another development environment without MATLAB present. This example also shows how
to create a makefile for the packNGo content, rebuild the source files and finally run the standalone
executable outside MATLAB environment.
This example is a function with the main body at the top and helper routines in the form of “Nested
Functions” below.
function FaceTrackingKLTpackNGoExample()
To run this example, you must have access to a C++ compiler and you must configure it using the
'mex -setup c++' command. For more information, see “Choose a C++ Compiler”. If you deploy the
application on MATLAB host, use a C++ compiler that is compatible with the compiler used to build
OpenCV libraries. For more information, see “Portable C Code Generation for Functions That Use
OpenCV Library” on page 22-4.
Break Out the Computational Part of the Algorithm into a Separate MATLAB Function
MATLAB Coder requires MATLAB code to be in the form of a function in order to generate C code.
The code for the main algorithm of this example resides in a function called
FaceTrackingKLTpackNGo_kernel.m. This file is derived from “Face Detection and Tracking Using the
KLT Algorithm” on page 8-89. To learn how to modify the MATLAB code to make it compatible for
code generation, you can look at example “Introduction to Code Generation with Feature Matching
and Registration” on page 2-12.
fileName = 'FaceTrackingKLTpackNGo_kernel.m';
visiondemo_dir = pwd;
currentDir = pwd; % Store the current directory
fileName = fullfile(visiondemo_dir, fileName);
Create a code generation configuration object for EXE output, with a packNGo function call in the
post-code-generation stage.
codegenArgs = createCodegenArgs(visiondemo_dir);
currentPath = addpath(visiondemo_dir);
pathCleanup = onCleanup(@()path(currentPath));
cd(codegenOutDir);
dirChange = onCleanup(@()cd(currentDir));
Note that, instead of using the codegen command, you can open a dialog box and launch a code
generation project (see codegen (MATLAB Coder)). Use the post-code-generation command with the
packNGo function to create a zip file.
Unzip the zip file into a new folder. Note that the zip file contains source files, header files, libraries,
a MAT-file containing the build information object, and data files. The unzipPackageContents
function and other helper functions are included in the appendix.
zipFileLocation = codegenOutDir;
fprintf('-> Unzipping files ....\n');
unzipFolderLocation = unzipPackageContents(zipFileLocation);
Create the commands required to build the project and to run it.
fprintf('-> Creating ''Build Command'' and ''Run command'' ....\n');
[buildCommand, runCommand] = createBuildAndRunCommands(zipFileLocation,...
unzipFolderLocation,makefileName,fname);
cd(unzipFolderLocation);
system(runCommand);
The application can be deployed on another machine by copying the executable and the library files.
isPublishing = ~isempty(snapnow('get'));
if ~isPublishing % skip printing out directory to html page
fprintf('Executable and library files are located in the following folder:\n%s\n', unzipFolderL
fprintf('To re-execute run the following commands:\n');
fprintf('1. cd(''%s'')\n', unzipFolderLocation);
fprintf('2. system(''%s'')\n', runCommand);
end
cfg = coder.config('exe');
cfg.PostCodeGenCommand = 'packNGo(buildInfo,''packType'',''hierarchical'');';
cfg.CustomSource = mainCFile;
cfg.CustomInclude = folderForMainC;
cfg.EnableOpenMP = false;
end
unzipFolderLocationName = 'unzipPackNGo';
mkdir(unzipFolderLocationName);
assert(numel(zipFile)==1);
unzip(zipFile.name,unzipFolderLocationName);
for i=1:numel(zipFileInternal)
unzip(fullfile(unzipFolderLocationName,zipFileInternal(i).name), ...
unzipFolderLocationName);
end
unzipFolderLocation = fullfile(zipFileLocation,unzipFolderLocationName);
end
lastDir = cd(unzipFolderLocation);
dirCleanup = onCleanup(@()cd(lastDir));
% Get defines
horzcat_with_space = @(cellval)sprintf('%s ',cellval{:});
defs = horzcat_with_space(getDefines(binfo.buildInfo));
else
[~, cFiles] = system(['find ./ ' '-name ' '''*.c''']);
[~, cppFiles] = system(['find ./ ' '-name ' '''*.cpp''']);
end
for i = 1:length(cIndx)
if i == 1
startIdx = 1;
endIdx = cIndx(i);
else
startIdx = cIndx(i-1)+1;
endIdx = cIndx(i);
end
[~, b, ~] = fileparts(cFiles(startIdx:endIdx));
srcFilesC = [srcFilesC ' ' b '.c']; %#ok<AGROW>
end
for i = 1:length(cppIndx)
if i == 1
startIdx = 1;
endIdx = cppIndx(i);
else
startIdx = cppIndx(i-1)+1;
endIdx = cppIndx(i);
end
[~, b, ~] = fileparts(cppFiles(startIdx:endIdx));
srcFilesCPP = [srcFilesCPP ' ' b '.cpp']; %#ok<AGROW>
end
filecontent = char(fread(fid)');
fclose(fid);
newfilecontent = regexprep(filecontent,...
{'PASTE_ARCH','PASTE_EXT','PASTE_DEFINES','PASTE_SRCFILES', 'PASTE_MATLAB'},...
{ archDir, dllExt, defs, srcFiles, matlabDirName});
makefileName = 'Makefile';
mk_name = fullfile(unzipFolderLocation,makefileName);
if isunix
if( ismac )
[status,sysHeaderPath] = system( 'xcode-select -print-path' );
assert(status==0, ['Could not obtain a path to the system ' ...
'header files using ''xcode-select -print-path''' '']);
end
end
assert(~isempty(sdkPath), ...
sprintf('There is no sdk available in %s. Please check system environment.\n',s
newfilecontent = regexprep(newfilecontent,'PASTE_CC',ccCMD);
newfilecontent = regexprep(newfilecontent,'PASTE_CPP',cppCMD);
end
fid = fopen(mk_name,'w+');
fprintf(fid,'%s',newfilecontent);
fclose(fid);
end
if ismac
buildCommand = [' xcrun make -f ' makefileName];
runCommand = ['./' fileName ' "' fileName '"'];
elseif isunix
buildCommand = [' make -f ' makefileName];
runCommand = ['./' fileName ' "' fileName '"'];
else
% On PC we use the generated BAT files (there should be 2) to help
% build the generated code. These files are copied to the
% unzipFolderLocation where we can use them to build.
batFilename = [fileName '_rtw.bat'];
batFilelocation = fullfile(packageLocation,'codegen', ...
filesep,'exe',filesep,fileName);
batFileDestination = unzipFolderLocation;
oldMakefilename = makefileName;
copyfile(fullfile(batFileDestination,oldMakefilename),...
fullfile(batFileDestination,newMakefileName));
buildCommand = batFilename;
runCommand = [fileName '.exe' ' "' fileName '"'];
end
end
lastDir = cd(unzipFolderLocation);
dirCleanup = onCleanup(@()cd(lastDir));
if hadError
error (sysResults);
end
end
matlabDirName='';
for ij=1:length(dirLists)
thisDirName = dirLists(ij).name;
if (isfolder(thisDirName))
% subdirectory will have toolbox/vision
[subDir1, hasSubDir1] = hasSubdirectory(thisDirName, 'toolbox');
if hasSubDir1
[~, hasSubDir2] = hasSubdirectory(subDir1, 'vision');
if hasSubDir2
matlabDirName = thisDirName;
break;
end
end
end
end
end
subDir = '';
hasSubDir = false;
for ij=1:length(dirLists)
thisDirName = dirLists(ij).name;
thisDir = fullfile(dirName,thisDirName);
end
Code Generation for Depth Estimation From Stereo Video
This example shows how to use the MATLAB® Coder™ to generate C code for a MATLAB function,
which uses the stereoParameters object produced by Stereo Camera Calibrator app or the
estimateCameraParameters function. The example explains how to modify the MATLAB code in
the “Depth Estimation from Stereo Video” on page 1-181 example to support code generation.
Code Generation
You can learn about the basics of code generation using the MATLAB® Coder™ from the
“Introduction to Code Generation with Feature Matching and Registration” on page 2-12 example.
MATLAB Coder requires MATLAB code to be in the form of a function in order to generate C code.
Furthermore, the arguments of the function cannot be MATLAB objects.
This presents a problem for generating code from MATLAB code, which uses cameraParameters or
stereoParameters objects, which are typically created in advance during camera calibration. To
solve this problem, use the toStruct() method to convert the cameraParameters or the
stereoParameters object into a struct. The struct can then be passed into the generated code.
The restructured code for the main algorithm of “Depth Estimation from Stereo Video” on page 1-181
example resides in a function called depthEstimationFromStereoVideo_kernel.m. Note that
depthEstimationFromStereoVideo_kernel is a function that takes a struct created from a
stereoParameters object. Note also that it does not display the reconstructed 3-D point cloud,
because the showPointCloud function does not support code generation.
Load the stereoParameters object, which is the result of calibrating the camera using either the
stereoCameraCalibrator app or the estimateCameraParameters function.
% Convert the object into a struct, which can be passed into generated
% code.
stereoParamsStruct = toStruct(stereoParams);
On Macintosh, VideoReader does not support code generation for reading compressed video.
Uncompress the video files, and store them in the temporary directory.
MATLAB Coder requires that you specify the properties of all the input parameters. One easy way to
do this is to define the input properties by example at the command-line using the -args option. For
more information see “Define Input Properties by Example at the Command Line” (MATLAB Coder).
compileTimeInputs = {coder.typeof(stereoParamsStruct)};
% Generate code.
codegen depthEstimationFromStereoVideo_kernel -args compileTimeInputs;
Clean Up
clear depthEstimationFromStereoVideo_kernel_mex;
release(player);
Summary
This example showed how to generate C code from MATLAB code that takes a cameraParameters
or a stereoParameters object as input.
Detect Face (Raspberry Pi2)
This example shows how to use the MATLAB® Coder™ to generate C code from a MATLAB file and
deploy the application on an ARM target.
The example reads video frames from a webcam and detects faces in each of the frames using the
Viola-Jones face detection algorithm. The detected faces are displayed with bounding boxes. The
webcam function, from 'MATLAB Support Package for USB Webcams', and the VideoPlayer object,
from the Computer Vision System toolbox™, are used for the simulation on the MATLAB host. The
two functions do not support the ARM target, so OpenCV-based webcam reader and video viewer
functions are used for deployment.
The target must have OpenCV version 3.4.0 libraries (built with GTK) and a standard C++ compiler.
A Raspberry Pi 2 with Raspbian Stretch operating system was used for deployment. The example
should work on any ARM target.
This example is a function with the main body at the top and helper routines in the form of “Nested
Functions” below.
function FaceDetectionARMCodeGenerationExample()
To run this example, you must have access to a C++ compiler and you must configure it using the
'mex -setup c++' command. For more information, see “Choose a C++ Compiler”.
Break Out the Computational Part of the Algorithm into a Separate MATLAB Function
MATLAB Coder requires MATLAB code to be in the form of a function in order to generate C code.
The code for the main algorithm of this example resides in a function called
faceDetectionARMKernel.m. The function takes an image from a webcam, as the input. The function
outputs the image with a bounding box around the detected faces. The output image will be displayed
on video viewer window. To learn how to modify the MATLAB code to make it compatible for code
generation, you can look at example “Introduction to Code Generation with Feature Matching and
Registration” on page 2-12.
fileName = 'faceDetectionARMKernel.m';
For a standalone executable target, MATLAB Coder requires that you create a C file containing a
function named "main". This example uses the faceDetectionARMMain.c file. The main function in
this file performs the following tasks:

For simulation on the MATLAB host, the tasks performed in the faceDetectionARMMain.c file are
implemented in faceDetectionARMMain.m
For deployment on ARM, this example implements webcam reader functionality using OpenCV
functions. It also implements a video viewer using OpenCV functions. These OpenCV based utility
functions are implemented in the following files:
• helperOpenCVWebcam.hpp
• helperOpenCVWebcam.cpp
• helperOpenCVVideoViewer.cpp
• helperOpenCVVideoViewer.hpp
For simulation on MATLAB host, the example uses the webcam function from the 'MATLAB Support
Package for USB Webcams' and the VideoPlayer object from the Computer Vision System toolbox.
Run the simulation on the MATLAB host by typing faceDetectionARMMain at the MATLAB®
command line.
This example requires that you install OpenCV 3.4.0 libraries on your ARM target. The video viewer
requires that you build the highgui library in OpenCV with GTK for the ARM target.
Follow the steps to download and build OpenCV 3.4.0 on Raspberry Pi 2 with preinstalled Raspbian
Stretch. You must update your system firmware or install other developer tools and packages as
needed for your system configuration before you start building OpenCV.
• $ make
• $ sudo make install
For official deployment of the example, OpenCV libraries were installed in the following directory on
Raspberry Pi 2:
/usr/local/lib
/usr/local/include
Generate Code
Use build information stored in buildInfo.mat to create a zip folder using packNGo.
fprintf('-> Creating zip folder (it may take a few minutes) ....\n');
bInfo = load(fullfile('codegen','exe','faceDetectionARMKernel','buildInfo.mat'));
packNGo(bInfo.buildInfo, {'packType', 'hierarchical', ...
'fileName', 'faceDetectionARMKernel'});
% The generated zip folder is faceDetectionARMKernel.zip
-> Creating zip folder (it may take a few minutes) ....
Unzip faceDetectionARMKernel.zip into a folder named FaceDetectionARM. Unzip all files and
remove the .zip files.
packngoDir = hUnzipPackageContents();
Deployment on ARM
Transfer your project folder named FaceDetectionARM to your ARM target using your preferred file
transfer tool. Since the Raspberry Pi 2 (with Raspbian Stretch) already has an SSH server, you can
use SFTP to transfer files from host to target.
For official deployment of this example, the FileZilla SFTP Client was installed on the host machine
and the project folder was transferred from the host to the /home/pi/FaceDetectionARM folder on
Raspberry Pi.
Run the makefile to build the executable on ARM. For Raspberry Pi 2, (with Raspbian Stretch), open a
linux shell and cd to /home/pi/FaceDetectionARM. Build the executable using the following command:
make -f faceDetectionARMMakefile.mk

disp('Step-2: Build the executable on ARM using the shell command: make -f faceDetectionARMMakefile.mk');
Step-2: Build the executable on ARM using the shell command: make -f faceDetectionARMMakefile.mk
Run the executable generated in the above step. For Raspberry Pi 2, (with Raspbian Stretch), use the
following command in the shell window:
./faceDetectionARMKernel
Make sure that you are connected to the Raspberry Pi with a window manager, and not just through a
command line terminal to avoid errors related to GTK. This is necessary for the tracking window to
show up.
To close the video viewer while the executable is running on Raspberry Pi2, click on the video viewer
and press the escape key.
disp('Step-3: Run the executable on ARM using the shell command: ./faceDetectionARMKernel');
Step-3: Run the executable on ARM using the shell command: ./faceDetectionARMKernel
end
packngoDirName = 'FaceDetectionARM';
unzip(zipFile.name,packngoDirName);
for i=1:numel(zipFileInternal)
unzip(fullfile(packngoDirName,zipFileInternal(i).name), ...
packngoDirName);
end
% delete internal zip files
delete(fullfile(packngoDirName,'*.zip'));
packngoDir = packngoDirName;
end
for i=1:length(dirList)
thisDir = fullfile(packngoDir,dirList(i).name, 'toolbox', 'vision');
if isfolder(thisDir)
% rename the dir
movefile(fullfile(packngoDir,dirList(i).name), ...
fullfile(packngoDir,'matlab'));
break;
end
end
end
end
end
Track Face (Raspberry Pi2)
This example shows how to use the MATLAB® Coder™ to generate C code from a MATLAB file and
deploy the application on ARM target.
The example reads video frames from a webcam. It detects a face using Viola-Jones face detection
algorithm and tracks the face in a live video stream using the KLT algorithm. It finally displays the
frame with a bounding box and a set of markers around the face being tracked. The webcam function,
from 'MATLAB Support Package for USB Webcams', and the VideoPlayer object, from the Computer
Vision System toolbox™, are used for the simulation on the MATLAB host. The two functions do not
support the ARM target, so OpenCV-based webcam reader and video viewer functions are used for
deployment.
The target must have OpenCV version 3.4.0 libraries (built with GTK) and a standard C++ compiler.
A Raspberry Pi 2 with Raspbian Stretch operating system was used for deployment. The example
should work on any ARM target.
This example is a function with the main body at the top and helper routines in the form of “Nested
Functions” below.
function FaceTrackingARMCodeGenerationExample()
To run this example, you must have access to a C++ compiler and you must configure it using the
'mex -setup c++' command. For more information, see “Choose a C++ Compiler”.
Break Out the Computational Part of the Algorithm into a Separate MATLAB Function
MATLAB Coder requires MATLAB code to be in the form of a function in order to generate C code.
The code for the main algorithm of this example resides in a function called
faceTrackingARMKernel.m. The function takes an image from a webcam, as the input. The function
outputs the image with a bounding box and a set of markers around the face. The output image will
be displayed on video viewer window. To learn how to modify the MATLAB code to make it compatible
for code generation, you can look at example “Introduction to Code Generation with Feature
Matching and Registration” on page 2-12.
fileName = 'faceTrackingARMKernel.m';
For a standalone executable target, MATLAB Coder requires that you create a C file containing a
function named "main". This example uses the faceTrackingARMMain.c file. The main function in this
file performs the following tasks:

For simulation on the MATLAB host, the tasks performed in the faceTrackingARMMain.c file are
implemented in faceTrackingARMMain.m
For deployment on ARM, this example implements webcam reader functionality using OpenCV
functions. It also implements a video viewer using OpenCV functions. These OpenCV based utility
functions are implemented in the following files:
• helperOpenCVWebcam.hpp
• helperOpenCVWebcam.cpp
• helperOpenCVVideoViewer.cpp
• helperOpenCVVideoViewer.hpp
For simulation on MATLAB host, the example uses the webcam function from the 'MATLAB Support
Package for USB Webcams' and the VideoPlayer object from the Computer Vision System toolbox.
Run the simulation on the MATLAB host by typing faceTrackingARMMain at the MATLAB® command
line.
This example requires that you install OpenCV 3.4.0 libraries on your ARM target. The video viewer
requires that you build the highgui library in OpenCV with GTK for the ARM target.
Follow the steps to download and build OpenCV 3.4.0 on Raspberry Pi 2 with preinstalled Raspbian
Stretch. You must update your system firmware or install other developer tools and packages as
needed for your system configuration before you start building OpenCV.
• $ make
• $ sudo make install
For official deployment of the example, OpenCV libraries were installed in the following directory on
Raspberry Pi 2:
/usr/local/lib
/usr/local/include
Generate Code
Use build information stored in buildInfo.mat to create a zip folder using packNGo.
fprintf('-> Creating zip folder (it may take a few minutes) ....\n');
bInfo = load(fullfile('codegen','exe','faceTrackingARMKernel','buildInfo.mat'));
packNGo(bInfo.buildInfo, {'packType', 'hierarchical', ...
'fileName', 'faceTrackingARMKernel'});
% The generated zip folder is faceTrackingARMKernel.zip
-> Creating zip folder (it may take a few minutes) ....
Unzip faceTrackingARMKernel.zip into a folder named FaceTrackingARM. Unzip all files and remove
the .zip files.
packngoDir = hUnzipPackageContents();
Deployment on ARM
Transfer your project folder named FaceTrackingARM to your ARM target using your preferred file
transfer tool. Since the Raspberry Pi 2 (with Raspbian Stretch) already has an SSH server, you can
use SFTP to transfer files from host to target.
For official deployment of this example, the FileZilla SFTP Client was installed on the host machine
and the project folder was transferred from the host to the /home/pi/FaceTrackingARM folder on
Raspberry Pi.
Run the makefile to build the executable on ARM. For Raspberry Pi 2, (with Raspbian Stretch), open a
command line terminal and 'cd' to /home/pi/FaceTrackingARM. Build the executable using the
following command:
make -f faceTrackingARMMakefile.mk
disp('Step-2: Build the executable on ARM using the shell command: make -f faceTrackingARMMakefile.mk');
Step-2: Build the executable on ARM using the shell command: make -f faceTrackingARMMakefile.mk
Run the executable generated in the above step. For Raspberry Pi 2, (with Raspbian Stretch), use the
following command in the shell window:
./faceTrackingARMKernel
Make sure that you are connected to the Raspberry Pi with a window manager, and not just through a
command line terminal to avoid errors related to GTK. This is necessary for the tracking window to
show up.
To close the video viewer while the executable is running on Raspberry Pi2, click on the video viewer
and press the escape key.
disp('Step-3: Run the executable on ARM using the shell command: ./faceTrackingARMKernel');
Step-3: Run the executable on ARM using the shell command: ./faceTrackingARMKernel
mainCFile = 'faceTrackingARMMain.c';
end
packngoDirName = 'FaceTrackingARM';
unzip(zipFile.name,packngoDirName);
for i=1:numel(zipFileInternal)
unzip(fullfile(packngoDirName,zipFileInternal(i).name), ...
packngoDirName);
end
% delete internal zip files
delete(fullfile(packngoDirName,'*.zip'));
packngoDir = fullfile(packngoDirName);
end
end
Video Display in a Custom User Interface
This example shows how to display multiple video streams in a custom graphical user interface (GUI).
Overview
When working on a project involving video processing, we are often faced with creating a custom
user interface. It may be needed for the purpose of visualizing and/or demonstrating the effects of our
algorithms on the input video stream. This example illustrates how to create a figure window with
two axes to display two video streams. It also shows how to set up buttons and their corresponding
callbacks.
This example is written as a function with the main body at the top. The example also uses nested
functions and a separate helper function listed.
function VideoInCustomGUIExample()
videoSrc = VideoReader('vipmen.avi');
Create a figure window and two axes to display the input video and the processed video.
Now that the GUI is constructed, we can press the play button to trigger the main video processing
loop defined in the getAndProcessFrame function listed below.
Note that each video frame is centered in the axis box. If the axis size is bigger than the frame size,
the video frame borders are padded with the background color. If the axis size is smaller than the
frame size, scroll bars are added.
Create a figure window and two axes with titles to display two videos.
end
The axis is created on a uipanel container object. This allows more control over the layout of the GUI.
The video title is created using uicontrol.
% Create panel
hPanel = uipanel('parent',hFig,'Position',pos,'Units','Normalized');
% Create axis
hAxis = axes('position',[0 0 1 1],'Parent',hPanel);
hAxis.XTick = [];
hAxis.YTick = [];
hAxis.XColor = [1 1 1];
hAxis.YColor = [1 1 1];
% Set video title using uicontrol. uicontrol is used so that text
% can be positioned in the context of the figure, not the axis.
titlePos = [pos(1)+0.02 pos(2)+pos(3)+0.3 0.3 0.07];
uicontrol('style','text',...
'String', axisTitle,...
'Units','Normalized',...
'Parent',hFig,'Position', titlePos,...
'BackgroundColor',hFig.Color);
end
Insert Buttons
function insertButtons(hFig,hAxes,videoSrc)
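% (Sketch of the button setup; positions and strings are illustrative.)
% Play/Pause button. The callback starts, pauses, and resumes the video loop.
uicontrol(hFig,'unit','pixel','style','pushbutton','string','Start', ...
    'position',[10 10 75 25],'callback',{@playCallback,videoSrc,hAxes});
% Exit button. The callback closes the figure window.
uicontrol(hFig,'unit','pixel','style','pushbutton','string','Exit', ...
    'position',[100 10 50 25],'callback',{@exitCallback,hFig});
end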
This callback function rotates the input video frame and displays the original input and rotated frame
on two different axes. The helper function showFrameOnAxis, is responsible for displaying a frame
of the video on the user-defined axis.
function playCallback(hObject,~,videoSrc,hAxes)
try
% Check the status of play button
isTextStart = strcmp(hObject.String,'Start');
isTextCont = strcmp(hObject.String,'Continue');
if isTextStart
% Two cases: (1) starting first time, or (2) restarting
% Start from first frame
if ~hasFrame(videoSrc)
videoSrc.CurrentTime = 0.0;
end
end
if (isTextStart || isTextCont)
hObject.String = 'Pause';
else
hObject.String = 'Continue';
end
end
end
This function defines the main algorithm that is invoked when the play button is activated.
function exitCallback(~,~,hFig)
% Close the figure window
close(hFig);
end
end
Generate Code for Detecting Objects in Images by Using ACF Object Detector
This example shows how to generate code from a MATLAB® function that detects objects in images
by using an acfObjectDetector object. When you intend to generate code from your MATLAB
function that uses an acfObjectDetector object, you must create the object outside of the
MATLAB function. The example explains how to modify the MATLAB code in “Train Stop Sign
Detector Using ACF Object Detector” to support code generation.
To generate C Code, MATLAB Coder requires MATLAB code to be in the form of a function. The
arguments of the function cannot be MATLAB objects. This requirement presents a problem for
generating code from the MATLAB function that uses acfObjectDetector objects created outside
of the MATLAB function. To solve this problem, use the toStruct function to convert the
acfObjectDetector object into a structure and pass the structure to the MATLAB function.
To support code generation, this example restructures the code of an existing example (see “Train
Stop Sign Detector Using ACF Object Detector”) in a function called detectObjectsUsingACF,
which is present in the current working folder as a supporting file. The detectObjectsUsingACF
function takes an image as an input and loads the pretrained ACF stop sign detector.
type("detectObjectsUsingACF.m")
Select the ground truth for stop signs. The ground truth data is the set of known locations of stop
signs in the images.
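The code below assumes the stopSignsAndCars ground truth table is in the workspace. One way to load it, assuming the stopSignsAndCars.mat ground truth file used by the original training example, is:
% Load the ground truth table of stop sign locations (file name assumed).
load("stopSignsAndCars.mat")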
stopSigns = stopSignsAndCars(:,1:2);
Use the trainACFObjectDetector function to train the ACF detector. Turn off the training
progress output by setting Verbose=false.
detector = trainACFObjectDetector(stopSigns,NegativeSamplesFactor=2,Verbose=false);
Because you intend to generate code for the MATLAB function detectObjectsUsingACF, convert
the created detector into a structure.
detectorStruct = toStruct(detector);
save("detectorStruct.mat","detectorStruct");
Generate C-MEX code that you can run in the MATLAB environment. Use the codegen (MATLAB
Coder) command.
img = imread("stopSignTest.jpg");
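The codegen invocation itself does not appear in this listing. A typical call, which is an assumption and uses the loaded image to define the input type, is:
% Generate a C-MEX function from detectObjectsUsingACF (options may vary).
codegen detectObjectsUsingACF -args {img} -report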
Call the generated C-MEX function by passing the loaded image img as an input.
[bboxes,scores] = detectObjectsUsingACF_mex(img);
Display the detection results and insert the bounding boxes for objects into the image.
img = insertObjectAnnotation(img,"rectangle",bboxes,scores);
figure
imshow(img)
Clean Up
Release the system memory used to store the generated C-MEX file.
clear detectObjectsUsingACF_mex;
See Also
“Introduction to Code Generation with Feature Matching and Registration” on page 2-12 | “Generate
Code to Detect Edges on Images” (MATLAB Coder)
Deep Learning, Semantic Segmentation, and Detection Examples
3
Hand Pose Estimation Using HRNet Deep Learning
This example shows how to detect keypoints in a human hand and estimate hand pose using the
HRNet deep learning network.
Overview
Hand pose estimation detects and estimates the 2D pose and configuration of a human hand from an
image or a video. It identifies the position and orientation of the hand joints, such as the locations of
fingertips, knuckles, and the palm. The applications of hand pose estimation include virtual and
augmented reality, human-computer interaction, sign language recognition, gesture-based interfaces,
robotics, and medical diagnosis.
This example uses a High-Resolution Net (HRNet) [1 on page 3-15] deep learning network to detect
keypoints in a human hand. To learn more about the HRNet deep learning network, see “Getting
Started with HRNet” on page 19-86.
doTraining = false;
downloadFolder = tempdir;
pretrainedKeypointDetector = helperDownloadHandPoseKeypointDetector(downloadFolder);
I = imread("handPose.jpg");
Specify the bounding box location of the hand region in the form [x y w h]. The x and y values
specify the upper-left corner of the bounding box. w specifies the width of the box, which is its length
along the x-axis. h specifies the height of the box, which is its length along the y-axis.
Alternatively, you can get bounding box locations by training object detectors like
yolov3ObjectDetector and yolov4ObjectDetector to detect object locations.
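The bounding box value used in this example is not shown in the listing. As an illustration, with hypothetical coordinates that depend on your image, you might specify:
handBoundingBoxes = [100 100 300 300]; % hypothetical [x y w h] values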
Use the pretrained keypoint detector to detect the hand keypoints in the image.
[keypoints,scores,visibility] = detect(pretrainedKeypointDetector,I,handBoundingBoxes);
Visualize the detected keypoints. The image represents the detected hand keypoints as yellow dots,
and the keypoint connections using green lines.
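The visualization code is omitted from this listing. One way to render the detected keypoints, shown here without the connection lines because the keypoint connection definitions are introduced later in the example, is:
Iout = insertObjectKeypoints(I,keypoints,KeypointColor="yellow",KeypointSize=3);
figure
imshow(Iout)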
The remainder of this example shows how to configure the pretrained object keypoint detector using
a transfer learning approach, and train an HRNet deep learning network on a hand pose data set.
To illustrate the training procedure, this example uses a labeled data set that contains 2500 images
from the Large-Scale Multiview Hand Pose Dataset [2 on page 3-15]. Each image in the data set
contains a human hand with 21 annotated keypoints.
dataset = helperDownloadHandPoseDataset(downloadFolder);
data = load(dataset);
handPoseDataset = data.handPoseDataset;
The hand pose data table contains three columns. The first, second, and third columns contain the
image filenames, keypoint locations, and hand bounding boxes, respectively. Keypoints consist of N-
by-2 matrices, where N is the number of keypoints present in the hand. Each image contains only one
hand, which is one object. Therefore, each row represents one object in an image. If a custom data set
contains more than one object in an image, create a row of data for each object in that image.
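To produce the preview that follows, display the first few rows of the table (the display command itself is not part of the original listing):
handPoseDataset(1:4,:)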
ans=4×3 table
imageFilename keypoints boundingBoxes
__________________________ _____________ ___________________
% Add the full data path to the locally stored hand pose data folder.
handPoseDataset.imageFilename = fullfile(downloadFolder,"2DHandPoseDataAndGroundTruth","2DHandPos
Read the keypoint class names and keypoint connection information using the
helperHandPoseDatasetKeypointNames and helperKeypointConnection helper functions.
keypointClasses contains the categorical class labels for every hand keypoint.
keypointConnections contains the connectivity information between pairs of keypoints.
keypointClasses = helperHandPoseDatasetKeypointNames;
keypointConnections = helperKeypointConnection;
handPoseKeypointDetector = hrnetObjectKeypointDetector("human-full-body-w32",keypointClasses,Keyp
Use imageDatastore to create ImageDatastore objects for loading the image data.
handPoseImds = imageDatastore(handPoseDataset.imageFilename);
Use arrayDatastore to create ArrayDatastore objects for loading the ground truth keypoint
location data.
handPoseArrds = arrayDatastore(handPoseDataset(:,2));
Use boxLabelDatastore to create boxLabelDatastore objects for loading the bounding box
locations.
handPoseBlds = boxLabelDatastore(handPoseDataset(:,3));
The HRNet deep learning network has been trained on image patches that contain only one object in
each image. Use the transform function and the helperPreprocessCropped helper function to
preprocess the images in the datastore. Use the functions to crop image patches that contain the
object of interest and rescale the keypoints to the new image size. Then, store the preprocessed data
by using the writeall function. The function stores the image patches as JPEG files and the hand
keypoint data as a MAT file.
% Define the input size and number of keypoints to process.
inputSize = handPoseKeypointDetector.InputSize;
numKeypoints = size(handPoseKeypointDetector.KeyPointClasses,1);
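The transform and writeall calls are not shown in this listing. A possible sketch of the preprocessing step described above follows; the output folder name, the helperPreprocessCropped argument list, and the helperWritePatchData write function are assumptions rather than code from the original example:
% Combine the image, keypoint, and box datastores, then crop and rescale (sketch).
imagesPatchDataLocation = fullfile(downloadFolder,"handPosePatchData");
rawCds = combine(handPoseImds,handPoseArrds,handPoseBlds);
patchCds = transform(rawCds, ...
    @(data)helperPreprocessCropped(data,inputSize,numKeypoints));
% Write the image patches and keypoint MAT files to disk using a custom write function.
writeall(patchCds,imagesPatchDataLocation,WriteFcn=@helperWritePatchData);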
Load the data. Create an ImageDatastore object for the image patches and a FileDatastore
object for the keypoints.
handPosePatchImds = imageDatastore(fullfile(imagesPatchDataLocation,"imagePatches"));
handPoseKptfileds = fileDatastore(fullfile(imagesPatchDataLocation,"Keypoints"),"ReadFcn",@load,FileExtensions=".mat");
Split the data set into training, validation, and test sets. Select 80% of the data for training, 10% for
validation, and the rest for testing the trained detector.
rng(0);
numFiles = numel(handPosePatchImds.Files);
shuffledIndices = randperm(numFiles);
numTrain = round(0.8*numFiles);
trainingIdx = shuffledIndices(1:numTrain);
numVal = round(0.10*numFiles);
valIdx = shuffledIndices(numTrain+1:numTrain+numVal);
testIdx = shuffledIndices(numTrain+numVal+1:end);
Create ImageDatastore objects for the training, validation, and test sets.
trainingImages = handPosePatchImds.Files(trainingIdx);
valImages = handPosePatchImds.Files(valIdx);
testImages = handPosePatchImds.Files(testIdx);
imdsTrain = imageDatastore(trainingImages);
imdsValidation = imageDatastore(valImages);
imdsTest = imageDatastore(testImages);
trainingKeypoints = handPoseKptfileds.Files(trainingIdx);
valKeypoints = handPoseKptfileds.Files(valIdx);
testKeypoints = handPoseKptfileds.Files(testIdx);
fdsTrain = fileDatastore(trainingKeypoints,"ReadFcn",@load,FileExtensions=".mat");
fdsValidation = fileDatastore(valKeypoints,"ReadFcn",@load,FileExtensions=".mat");
fdsTest = fileDatastore(testKeypoints,"ReadFcn",@load,FileExtensions=".mat");
Create CombinedDatastore objects for training, validation, and test set by combining the
respective image datastore and file data store of each set.
trainingData = combine(imdsTrain,fdsTrain);
validationData = combine(imdsValidation,fdsValidation);
testData = combine(imdsTest,fdsTest);
Visualize the data set. Render the ground truth keypoints in yellow and the keypoint connections in
green.
data = read(trainingData);
I = data{1};
keypoints = data{2}.keypoint;
Iout = insertObjectKeypoints(I,keypoints, ...
Connections=keypointConnections, ...
ConnectionColor="green", ...
KeypointColor="yellow",KeypointSize=3,LineWidth=3);
figure
imshow(Iout)
Use the handPoseKeypointDetector object and the minibatchqueue (Deep Learning Toolbox)
function to train the HRNet deep learning network on the hand pose data set with a mini-batch size of
8. Decrease the mini-batch size if you run out of memory during training. Create mini-batch queues
for the training and validation data. The minibatchqueue function automatically detects whether a
GPU is available and uses it by default. If you do not have a compatible GPU, or prefer to train on a
CPU, you can specify the OutputEnvironment name-value argument as "cpu" when calling the
minibatchqueue function.
miniBatchSize = 8;
mbqTrain = minibatchqueue(trainingData,3, ...
MiniBatchSize=miniBatchSize, ...
MiniBatchFcn=@(images,keypoints)helperCreateBatchData(images,keypoints,handPoseKeypointDetector), ...
MiniBatchFormat=["SSCB","SSCB","SSCB"]);
• Set the number of epochs to 10. For larger data sets, you must train for a higher number of
epochs.
• Set the learning rate to 0.001.
numEpochs = 10;
initialLearnRate = 0.001;
Initialize the velocity, averageGrad, and averageSqGrad parameters for Adam optimization.
velocity = [];
averageGrad = [];
averageSqGrad = [];
numObservationsTrain = numel(imdsTrain.Files);
numIterationsPerEpoch = floor(numObservationsTrain/miniBatchSize);
numIterations = numEpochs*numIterationsPerEpoch;
Train the HRNet hand pose keypoint detector on the hand pose data. Observe the training progress
plot to monitor the training of the detector object in the custom training loop.
• Read data from the training minibatchqueue. If it does not have any more data, reset and
shuffle the minibatchqueue.
• Evaluate the model gradients using the dlfeval (Deep Learning Toolbox) function. The
modelGradients function, listed as a supporting function, returns the gradients of the loss with
respect to the learnable parameters in the network, the corresponding mini-batch loss, and the
state of the current batch.
• Update the detector parameters using the adamupdate (Deep Learning Toolbox) function.
• Update the state of non-learnable parameters of the detector.
• Update the training progress plot.
if doTraining
monitor = trainingProgressMonitor( ...
Metrics=["TrainingLoss","ValidationLoss"], ...
Info=["Epoch","Iteration","LearningRate"], ...
XLabel="Iteration");
groupSubPlot(monitor,"Loss",["TrainingLoss","ValidationLoss"])
iteration = 0;
monitor.Status = "Running";
reset(mbqTrain)
shuffle(mbqTrain)
if epoch >= 10
currentLR = initialLearnRate/100;
elseif epoch >= 7
currentLR = initialLearnRate/10;
else
currentLR = initialLearnRate;
end
[XTrain,YTrain,WTrain] = next(mbqTrain);
updateInfo(monitor, ...
LearningRate=currentLR, ...
Epoch=string(epoch) + " of " + string(numEpochs), ...
Iteration=string(iteration) + " of " + string(numIterations));
recordMetrics(monitor,iteration, ...
TrainingLoss=trainingLoss, ...
ValidationLoss=validationLoss)
monitor.Progress = 100*iteration/numIterations;
end
end
else
handPoseKeypointDetector = pretrainedKeypointDetector;
end
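The listing above is abridged; the epoch and iteration loops and the gradient update are not shown. A condensed sketch of the loop structure described by the preceding bullets follows. The modelGradients signature and the State property name are assumptions, and the learning-rate schedule, monitor updates, and validation-loss computation that appear in the abridged listing are omitted here for brevity:
for epoch = 1:numEpochs
    reset(mbqTrain)
    shuffle(mbqTrain)
    while hasdata(mbqTrain)
        iteration = iteration + 1;
        % Read a mini-batch of images, target heatmaps, and keypoint weights.
        [XTrain,YTrain,WTrain] = next(mbqTrain);
        % Evaluate the model gradients, loss, and state (signature assumed).
        [gradients,trainingLoss,state] = dlfeval(@modelGradients, ...
            handPoseKeypointDetector,XTrain,YTrain,WTrain);
        % Update the state of the non-learnable parameters (property name assumed).
        handPoseKeypointDetector.State = state;
        % Update the learnable parameters using the Adam optimizer.
        [handPoseKeypointDetector.Learnables,averageGrad,averageSqGrad] = ...
            adamupdate(handPoseKeypointDetector.Learnables,gradients, ...
            averageGrad,averageSqGrad,iteration,currentLR);
    end
end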
Evaluate the hand keypoint detection using the percentage of correct keypoints (PCK) metric [3 on
page 3-15]. The PCK metric measures the percentage of estimated keypoints that fall within a
certain radius of the ground truth keypoints. To compute the PCK metric, set a threshold to define
whether or not a predicted keypoint is accurate. The optimal distance threshold for comparing the
predicted and ground truth keypoints ranges from 0.1 to 0.3. If the distance between them is within
the threshold, you can consider the predicted keypoints accurate.
To compute the PCK metric, calculate the Euclidean distance between the predicted keypoints and
the ground truth keypoints, and then normalize the value by a specified distance. In the case of hand
keypoint detection, use the distance between the middle point and the lowest point of the middle
finger as the normalization factor.
testDataPCK = [];
reset(testData)
while testData.hasdata
data = read(testData);
I = data{1};
keypoint = data{2}.keypoint;
[height, width] = size(I,[1 2]);
bbox = [1 1 width height];
% Distance between the middle and the lower point of middle finger.
normalizationFactor = sqrt((keypoint(5,1)-keypoint(6,1))^2 + (keypoint(5,2)-keypoint(6,2))^2)
threshold = 0.3;
predictedKeypoints = detect(handPoseKeypointDetector,I,bbox);
pck = helperCalculatePCK(predictedKeypoints,keypoint,normalizationFactor,threshold);
testDataPCK = [testDataPCK;pck];
end
PCK = mean(testDataPCK);
disp("Average PCK on the hand pose test dataset is: " + PCK);
A PCK score of 0.9443 on the test data implies that 94.43% of the keypoints have been identified
correctly. To improve the results, you can add more data to the data set or use data augmentation. To
customize this example for your own data, you might need to reduce the learning rate if the
validation loss remains constant and the model does not converge.
Supporting Functions
[dlYPredOut,state] = forward(detector,dlX);
loss = helperCalculateLoss(dlYPredOut,dlW,dlY);
gradients = dlgradient(loss,detector.Learnables);
end
for k = 1:miniBatchSize
I = images{k};
keypoint = keypoints{k}.keypoint;
X(:,:,:,k) = single(rescale(I));
[heatmaps,weights] = helperGenerateHeatmap(single(keypoint),inputSize,outputSize);
Y(:,:,:,k) = single(heatmaps);
W(:,:,:,k) = repmat(permute(weights,[2 3 1]),outputSize(1:2));
end
end
helperPreprocessCropped — Crop the input images based on their bounding boxes, and
transform their corresponding keypoints based on the cropped image coordinates.
end
preprocessedData{2} = keypoint;
preprocessedData{3} = trainingData{3};
preprocessedData{4} = trainingData{4};
end
if size(keypoints,2) == 2
weights = ones(numKeypoints,1);
else
weights = keypoints(:,3);
end
tmpSize = sigma*3;
for k = 1:numKeypoints
muX = round(keypoints(k,1)/featStride(1) + 0.5);
muY = round(keypoints(k,2)/featStride(2) + 0.5);
upperLeft = [floor(muX - tmpSize) floor(muY - tmpSize)];
bottomRight = [floor(muX + tmpSize + 1),floor(muY + tmpSize + 1)];
if (upperLeft(1) >= heatmapSize(1) || upperLeft(2) >= heatmapSize(2) || ...
bottomRight(1) < 0 || bottomRight(2) < 0)
weights(k) = 0;
continue
end
sizeRegion = 2*tmpSize + 1;
[x,y] = meshgrid(1:sizeRegion,1:sizeRegion);
x0 = floor(sizeRegion/2);
y0 = x0;
g = exp(-((x - x0).^2 + (y - y0).^2) ./ (2*(sigma^2)));
gx = [max(0, -upperLeft(1)) min(bottomRight(1),heatmapSize(1))-upperLeft(1)-1] + 1;
helperGetAffineTransform — Calculate the affine transform based on the center and scale of the
image.
function transformMatrix = helperGetAffineTransform(center,scale,outputHeatMapSize,invAffineTrans
% center: Center of the bounding box [x y].
% scale: Scale of the bounding box, normalized by the scale factor, [width height].
% outputHeatMapSize: Size of the destination heatmaps.
% invAffineTransform (boolean): Option to invert the affine transform direction.
% (inv=False: src->dst or inv=True: dst->src).
% shift (0-100%): Shift translation ratio with regard to the width and height.
shift = [0 0];
srcPoint = [1 srcWidth*-0.5];
dstDir = double([1 dstWidth*-0.5]);
src = zeros(3,2);
dst = zeros(3,2);
src(1,:) = center + scaleTmp.*shift;
src(2,:) = center + srcPoint + scaleTmp.*shift;
dst(1,:) = [dstWidth*0.5 dstHeight*0.5];
dst(2,:) = [dstWidth*0.5 dstHeight*0.5] + dstDir;
src(3,:) = helperGetThirdPoint(src(1,:),src(2,:));
dst(3,:) = helperGetThirdPoint(dst(1,:),dst(2,:));
if invAffineTransform
transformMatrix = fitgeotform2d(dst,src,"affine");
else
transformMatrix = fitgeotform2d(src,dst,"affine");
end
end
helperGetThirdPoint — To calculate the affine matrix, you must have three pairs of points. This
function obtains the third point, given 2D points a and b. The function defines the third point by
rotating the vector a - b by 90 degrees anticlockwise, using point b as the rotation center.
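A sketch of this helper that follows the description above might be:
function thirdPoint = helperGetThirdPoint(a,b)
% Rotate the vector (a - b) by 90 degrees counterclockwise about point b.
direction = a - b;
thirdPoint = b + [-direction(2) direction(1)];
end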
Utility Functions
helperDownloadHandPoseDataset — Download the hand pose data set and ground truth labels.
helperCalculatePCK — Calculate the PCK of each predicted keypoint and corresponding ground
truth keypoint.
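Based on the description of the PCK metric earlier in this example, a minimal sketch of this helper could be:
function pck = helperCalculatePCK(predictedKeypoints,groundTruthKeypoints,normalizationFactor,threshold)
% Fraction of keypoints whose normalized Euclidean distance to the ground
% truth keypoint is within the threshold.
d = sqrt(sum((predictedKeypoints(:,1:2) - groundTruthKeypoints(:,1:2)).^2,2));
pck = mean((d./normalizationFactor) <= threshold);
end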
References
[1] Sun, Ke, Bin Xiao, Dong Liu, and Jingdong Wang. “Deep High-Resolution Representation Learning
for Human Pose Estimation.” In 2019 IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 5686–96. Long Beach, CA, USA: IEEE, 2019. https://doi.org/10.1109/
CVPR.2019.00584.
[2] Gomez-Donoso, Francisco, Sergio Orts-Escolano, and Miguel Cazorla. "Large-Scale Multiview 3D
Hand Pose Dataset." Image and Vision Computing 81 (2019): 25–33. https://doi.org/10.1016/
j.imavis.2018.12.001.
[3] Yang, Yi, and Deva Ramanan. "Articulated Human Detection with Flexible Mixtures of Parts." IEEE
Transactions on Pattern Analysis and Machine Intelligence 35, no. 12 (December 2013): 2878–90.
https://doi.org/10.1109/TPAMI.2012.261.
Recognize Seven-Segment Digits Using OCR
This example shows how to recognize seven-segment digits in an image by using optical character
recognition (OCR). In the example, you use the detectTextCRAFT function and region properties to
detect the seven-segment text regions in the image. Then, you use OCR to recognize the seven-
segment digits in the detected text regions.
Read Image
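The image-reading code is not shown in this listing. Assuming a test image file (the file name here is hypothetical), the step might be:
img = imread("sevenSegmentDisplay.jpg"); % hypothetical file name
figure
imshow(img)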
Detect text regions in the input image by using the detectTextCRAFT function. The
CharacterThreshold value is the region threshold to use for localizing each character in the
image. The LinkThreshold value is the affinity threshold that defines the score for grouping two
detected texts into a single instance. You can fine-tune the detection results by modifying the region
and affinity threshold values. Increase the value of the affinity threshold for more word-level and
character-level detections. For information about the effect of the affinity threshold on the detection
results, see the “Detect Characters by Modifying Affinity Threshold” example.
Set the value of the affinity threshold to 0.005. The default value for the region threshold is 0.4. The
output is a set of bounding boxes that localize the text regions in the input image. The bounding box
specifies the spatial coordinates of the detected text regions in the image and is a vector of form [x,
y, width, height]. The vector specifies the upper left corner and size of the detected region in
pixels.
bbox = detectTextCRAFT(img,LinkThreshold=0.005);
Draw the output bounding boxes on the image by using the insertShape function.
Iout = insertShape(img,"rectangle",bbox,LineWidth=4);
In the input image, the seven-segment text region occupies the maximum area. Use the area of the
detected bounding boxes to extract the seven-segment text region.
Compute the area of the bounding boxes and find the bounding box with maximum area.
bboxArea = bbox(:,3).*bbox(:,4);
[value,indx]= max(bboxArea);
Extract the text region with maximum bounding box area from the input image.
roi = bbox(indx,:);
extractedImg = img(roi(2):roi(2)+roi(4),roi(1):roi(1)+roi(3),:);
figure
imshow(extractedImg)
title('Extracted Seven-Segment Text Region')
Recognize the seven-segment digits in the detected text region by using the ocr function. Set the value
of the Model name-value argument to "seven-segment". The output is an ocrText object
containing information about the recognized text, the recognition confidence, and the location of the
text in the original image.
output = ocr(img,roi,Model="seven-segment")
output =
ocrText with properties:
Text: '810000...'
CharacterBoundingBoxes: [17x4 double]
CharacterConfidences: [17x1 single]
Words: {2x1 cell}
WordBoundingBoxes: [2x4 double]
WordConfidences: [2x1 single]
TextLines: {2x1 cell}
TextLineBoundingBoxes: [2x4 double]
Display the recognized seven-segment digits. Notice that OCR detects two bounding boxes enclosing
the text regions and recognizes the digits in each region.
disp([output.Words])
{'810000' }
{'0110555.'}
Iocr = insertObjectAnnotation(img,"Rectangle",output.WordBoundingBoxes,output.Words,LineWidth=4,F
figure
imshow(Iocr)
The main challenges in accurate recognition of the seven-segment digits are the segmentation of text
regions and the choice of the LayoutAnalysis name-value argument of ocr function.
As a preprocessing step, the ocr function performs binarization to segment the text regions from the
background. Due to the nature of the seven-segment text images, the binarized text regions have
disconnected pixels. If the distance between the pixels disconnected along the vertical direction is
large and the value of the LayoutAnalysis parameter is set to "block", the ocr function considers
the input image to have multiple lines of text. Then, the ocr function groups each line of text into a
region and recognizes the digits within each region. As a result, the recognition results might be
inaccurate. In such cases, you can improve the recognition accuracy by selecting a proper value for
the LayoutAnalysis parameter.
If the detected image region consists of only one line of seven-segment text, you can set the
LayoutAnalysis name-value argument to "word", "character", or "line" in order to obtain good
recognition results. For more details about how to select the value for LayoutAnalysis name-value
argument, see ocr.
The input image contains a group of seven-segment digits. To recognize all the digits in the group, set
the value of the LayoutAnalysis name-value argument to "word". Compute the OCR results.
output = ocr(img,roi,Model="seven-segment",LayoutAnalysis="word")
output =
ocrText with properties:
Text: '010555....'
CharacterBoundingBoxes: [9x4 double]
CharacterConfidences: [9x1 single]
Words: {'010555.'}
WordBoundingBoxes: [149 213 1057 365]
WordConfidences: 0.6762
TextLines: {'010555.'}
TextLineBoundingBoxes: [149 213 1057 365]
TextLineConfidences: 0.6762
disp([output.Words])
{'010555.'}
Draw the output bounding boxes on the image by using the insertObjectAnnotation function.
Display the recognition results. Notice that the seven-segment text region in the image is well
localized and the digits are recognized correctly.
Iocr = insertObjectAnnotation(img,"Rectangle",output.WordBoundingBoxes,output.Words,LineWidth=4,F
figure
imshow(Iocr)
Further Exploration
• If the detected text region consists of multiple lines of seven-segment text, set the
LayoutAnalysis name-value argument to "block" for optimal results.
• You can improve the recognition results by accurately localizing and segmenting the seven-
segment text regions in a given image. Though you can use the detectTextCRAFT function for
detecting the text regions, you will have to manually select the appropriate region threshold and
affinity threshold values for good detection results. Alternatively, you can use the Color
Thresholder or Image Segmenter apps to interactively segment the desired text regions in the
image.
• If the segmented region contains outliers, use morphological operations to preprocess the image
before performing OCR. For an example, see the Image Pre-processing and ROI-based Processing
Techniques demonstrated in the “Recognize Text Using Optical Character Recognition (OCR)” on
page 4-46 example. The Improve Recognition Results section in the “Automatically Detect and
Recognize Text Using Pretrained CRAFT Network and OCR” on page 4-14 example also
demonstrates image preprocessing techniques that you can use to improve recognition results if
the image contains multiple lines of text.
Train an OCR Model to Recognize Seven-Segment Digits
This example shows how to train an OCR model to recognize seven-segment digits, use quantization
to improve runtime performance, and evaluate text recognition accuracy. The Computer Vision
Toolbox™ provides several pretrained OCR models, including one for seven-segment digits. Training
an OCR model is necessary when a pretrained model is not effective for your application. This
example demonstrates the general procedure for training an OCR model using the YUVA EB dataset
[1].
Load Data
This example uses 119 images from the YUVA EB dataset. The dataset contains images of energy
meter displays with seven-segment numerals. These images were captured under challenging text
recognition conditions such as tilted positions, lens blur, and non-uniform lighting conditions. A small
dataset is useful for exploring the OCR training procedure, but in practice, more labeled images are
needed to train a robust OCR model.
datasetFiles = helperDownloadDataset;
The images in the dataset were annotated with bounding boxes containing the seven-segment digits,
and text labels were added to these bounding boxes as an attribute by using the Image Labeler app
(see “Get Started with the Image Labeler” on page 11-71). To learn more about labeling images for
OCR training, see “Train Custom OCR Model” on page 19-2. The labels were exported from the app
as a groundTruth object and saved in the 7SegmentGtruth.mat file.
ld = load("7SegmentGtruth.mat");
gTruth = ld.gTruth;
Create datastores that contain images, bounding boxes and text labels from the groundTruth object
using the ocrTrainingData function with the label and attribute names used during labeling.
labelName = "Text";
attributeName = "Digits";
[imds,boxds,txtds] = ocrTrainingData(gTruth,labelName,attributeName);
Analyze the ground truth text to verify that all characters of interest for training have observation
samples in the ground truth data. To verify this, find the character set of the ground truth data.
Read all ground truth text corresponding to each image and combine the text in each image.
allImagesText = txtds.readall;
allText = strjoin(vertcat(allImagesText{:}), "");
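The code that computes the character set is not shown above. One way to obtain it that is consistent with the characterSet and idx variables used below (the exact listing is an assumption) is:
% Unique characters in the ground truth text, and the index of each
% character occurrence into that set.
[characterSet,~,idx] = unique(char(allText));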
The ground truth data contains images of the 10 digits from 0-9 and the period symbol in the seven-
segment font.
In addition to verifying the ground truth character set, it is important to ensure that all characters
have equal representation in the dataset.
Count the occurrences of each of these characters in the ground truth data.
characterSet = cellstr(characterSet');
characterCount = accumarray(idx,1);
Tabulate the character count and sort the count in descending order.
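The table construction is omitted from the listing. A possible version, an assumption that is consistent with the output shown below, is:
characterCountTbl = table(characterSet,characterCount, ...
    VariableNames=["Character","Count"]);
characterCountTbl = sortrows(characterCountTbl,"Count","descend")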
characterCountTbl=11×2 table
Character Count
_________ _____
{'0'} 170
{'.'} 120
{'1'} 98
{'3'} 91
{'2'} 84
{'4'} 78
{'5'} 61
{'9'} 56
{'8'} 55
{'7'} 43
{'6'} 40
numCharacters = numel(characterSet);
figure
bar(1:numCharacters, characterCountTbl.Count)
xticks(1:numCharacters)
xticklabels(characterCountTbl.Character)
xlabel("Digits")
ylabel("Number of samples")
The characters '0' and '.' have the maximum number of occurrences, and the characters '7' and '6'
have the least. In text recognition applications, it is common to have such an imbalance in the
number of character samples because not all characters occur with the same frequency in
paragraphs of text. Dataset imbalance can result in an OCR model that performs poorly on
underrepresented characters. If the trained OCR model shows this behavior, you can balance the
dataset by oversampling the least frequently occurring characters.
cds = combine(imds,boxds,txtds);
Use 60% of the dataset for training and split the rest of the data evenly for validation and testing. The
following code randomly splits the data into training, validation, and test sets.
trainPercent = 60;
[cdsTrain, cdsVal, cdsTest, numTrain, numVal, numTest] = helperPartitionOCRData(cds, trainPercent);
The 60/20/20 split results in the following number of training, validation and test images:
outputDir = "OCRModel";
if ~exist(outputDir, "dir")
mkdir(outputDir);
end
checkpointsDir = "Checkpoints";
if ~exist(checkpointsDir, "dir")
mkdir(checkpointsDir);
end
Use the ocrTrainingOptions function to specify the following training options for OCR training.
Empirical analysis is required to determine the optimal training option values.
• ocrTrainingOptions uses the ADAM solver by default. Set the gradient decay factor for ADAM
optimization to 0.9.
• Use an initial learning rate of 20e-4.
• Set the maximum number of epochs for training to 15.
• Set the verbose frequency to 100 iterations.
• Specify the output directory.
• Specify the checkpoint path to enable saving checkpoints.
• Specify validation data to enable validation step during training.
• Set the validation frequency to 10 iterations.
ocrOptions = ocrTrainingOptions(GradientDecayFactor=0.9,...
InitialLearnRate=20e-4,...
MaxEpochs=15,...
VerboseFrequency=100,...
OutputLocation=outputDir,...
CheckpointPath=checkpointsDir,...
ValidationData=cdsVal,...
ValidationFrequency=10);
Train a new OCR model by fine-tuning the pretrained "english" model. The training will take about
8-9 minutes.
trainedModelName = "sevenSegmentModel";
baseModel = "english";
[trainedModel, trainingInfo] = trainOCR(cdsTrain, trainedModelName, baseModel, ocrOptions);
*************************************************************************
Starting OCR training
|================================================================================================
| Epoch | Iteration | Time Elapsed | Training Statistics | Validatio
| | | (hh:mm:ss) | RMSE | Character Error | Word Error | RMSE | Charact
|================================================================================================
| 1 | 1 | 00:02:02 | 18.73 | 100.00 | 100.00 | 0.00 | 0.
| 1 | 100 | 00:02:40 | 9.05 | 40.80 | 65.00 | 4.42 | 11
| 2 | 200 | 00:03:20 | 6.15 | 24.33 | 46.00 | 3.35 | 13
| 3 | 300 | 00:03:59 | 4.73 | 16.95 | 34.33 | 2.75 | 10
| 4 | 400 | 00:04:38 | 3.90 | 13.71 | 27.00 | 2.67 | 9.
| 5 | 500 | 00:05:16 | 3.36 | 11.33 | 22.40 | 5.59 | 27
| 6 | 600 | 00:05:54 | 3.04 | 10.03 | 19.67 | 2.44 | 11
| 7 | 700 | 00:06:32 | 2.72 | 8.69 | 17.29 | 3.14 | 11
| 8 | 800 | 00:07:11 | 2.45 | 7.63 | 15.25 | 2.39 | 9.
| 9 | 900 | 00:07:49 | 2.24 | 6.79 | 13.56 | 2.92 | 14
| 10 | 1000 | 00:08:27 | 2.07 | 6.11 | 12.20 | 2.31 |