MATLAB Programming Fundamentals - MathWorks
Programming Fundamentals
R2021b
How to Contact MathWorks
Phone: 508-647-7000
Language
Syntax Basics
1
Continue Long Statements on Multiple Lines . . . . . . . . . . . . . . . . . . . 1-2
Program Components
2
MATLAB Operators and Special Characters . . . . . . . . . . . . . . . . . . . . 2-2
Arithmetic Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Relational Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Logical Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Special Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
String and Character Formatting . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
Compatible Array Sizes for Basic Operations . . . . . . . . . . . . . . . . . . 2-25
Inputs with Compatible Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-25
Inputs with Incompatible Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
Fast Fourier Transform Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-84
Numeric Classes
4
Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Integer Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Creating Integer Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Arithmetic Operations on Integer Classes . . . . . . . . . . . . . . . . . . . . . 4-4
Largest and Smallest Values for Integer Classes . . . . . . . . . . . . . . . . 4-4
Single Precision Math . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-26
Why Does isempty("") Return 0? . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-61
Why Does Appending Strings Using Square Brackets Return Multiple
Strings? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-62
Line Plot with Durations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-37
Scatter Plot with Dates and Durations . . . . . . . . . . . . . . . . . . . . . . 7-39
Plots that Support Dates and Durations . . . . . . . . . . . . . . . . . . . . . 7-40
Categorical Arrays
8
Create Categorical Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
Tables
9
Create Tables and Assign Data to Them . . . . . . . . . . . . . . . . . . . . . . . 9-2
Timetables
10
Create Timetables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
Retime and Synchronize Timetable Variables Using Different
Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14
Structures
11
Structure Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Create Scalar Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Access Values in Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
Index into Nonscalar Structure Array . . . . . . . . . . . . . . . . . . . . . . . 11-4
Cell Arrays
12
What Is a Cell Array? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
Combine Cell Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-10
Function Handles
13
Create Function Handle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
What Is a Function Handle? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Creating Function Handles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-2
Anonymous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-3
Arrays of Function Handles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13-4
Saving and Loading Function Handles . . . . . . . . . . . . . . . . . . . . . . 13-4
Map Containers
14
Overview of Map Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-2
Modify Keys and Values in Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
Remove Keys and Values from Map . . . . . . . . . . . . . . . . . . . . . . . . 14-13
Modify Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-13
Modify Keys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-14
Modify Copy of Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14-14
Using Objects
16
Object Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2
Two Copy Behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2
Handle Object Copy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2
Value Object Copy Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-2
Handle Object Copy Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16-3
Testing for Handle or Value Class . . . . . . . . . . . . . . . . . . . . . . . . . . 16-5
Defining Your Own Classes
17
Scripts
18
Create Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18-2
Modify Figures in Live Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12
Explore Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-12
Update Code with Figure Changes . . . . . . . . . . . . . . . . . . . . . . . . 19-14
Add Formatting and Annotations . . . . . . . . . . . . . . . . . . . . . . . . . 19-14
Add and Modify Multiple Subplots . . . . . . . . . . . . . . . . . . . . . . . . 19-16
Save and Print Figure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-20
Create Examples Using the Live Editor . . . . . . . . . . . . . . . . . . . . . . 19-84
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19-89
Function Basics
20
Create Functions in Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-2
Syntax for Function Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-2
Contents of Functions and Files . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-3
End Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-4
Sharing Variables Between Parent and Nested Functions . . . . . . . 20-28
Using Handles to Store Function Parameters . . . . . . . . . . . . . . . . 20-29
Visibility of Nested Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20-31
Function Arguments
21
Find Number of Function Arguments . . . . . . . . . . . . . . . . . . . . . . . . 21-2
Parse Function Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21-13
HTML Markup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-18
LaTeX Markup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23-19
Check Code for Errors and Warnings Using the Code Analyzer . . . 24-5
Enable Continuous Code Checking . . . . . . . . . . . . . . . . . . . . . . . . . 24-5
View Code Analyzer Status for File . . . . . . . . . . . . . . . . . . . . . . . . . 24-5
View Code Analyzer Messages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-6
Fix Problems in Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24-7
Create a Code Analyzer Message Report . . . . . . . . . . . . . . . . . . . . . 24-8
Adjust Code Analyzer Message Indicators and Messages . . . . . . . . 24-9
Understand Code Containing Suppressed Messages . . . . . . . . . . . 24-11
Understand the Limitations of Code Analysis . . . . . . . . . . . . . . . . 24-12
Enable MATLAB Compiler Deployment Messages . . . . . . . . . . . . . 24-14
Change Code Based on Code Analyzer Messages . . . . . . . . . . . . . 24-30
Other Ways to Access Code Analyzer Messages . . . . . . . . . . . . . . 24-30
Programming Utilities
25
Identify Program Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-2
Simple Display of Program File Dependencies . . . . . . . . . . . . . . . . 25-2
Detailed Display of Program File Dependencies . . . . . . . . . . . . . . . 25-2
Dependencies Within a Folder . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25-2
Order of Argument Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-16
Avoiding Class and Size Conversions . . . . . . . . . . . . . . . . . . . . . . 26-17
nargin in Argument Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-19
Restrictions on Variable and Function Access . . . . . . . . . . . . . . . . 26-20
Debugging Arguments Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26-21
Software Development
Error Handling
27
Exception Handling in a MATLAB Application . . . . . . . . . . . . . . . . . 27-2
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-2
Getting an Exception at the Command Line . . . . . . . . . . . . . . . . . . 27-2
Getting an Exception in Your Program Code . . . . . . . . . . . . . . . . . . 27-3
Generating a New Exception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-3
Issue Warnings and Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-13
Issue Warnings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-13
Throw Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27-13
Add Run-Time Parameters to Your Warnings and Errors . . . . . . . . 27-14
Add Identifiers to Warnings and Errors . . . . . . . . . . . . . . . . . . . . . 27-14
Program Scheduling
28
Schedule Command Execution Using Timer . . . . . . . . . . . . . . . . . . . 28-2
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-2
Example: Displaying a Message . . . . . . . . . . . . . . . . . . . . . . . . . . . 28-2
Performance
29
Measure the Performance of Your Code . . . . . . . . . . . . . . . . . . . . . . 29-2
Overview of Performance Timing Functions . . . . . . . . . . . . . . . . . . 29-2
Time Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-2
Time Portions of Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-2
The cputime Function vs. tic/toc and timeit . . . . . . . . . . . . . . . . . . . 29-2
Tips for Measuring Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-3
Profile Multiple Statements in Command Window . . . . . . . . . . . . . 29-10
Profile an App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-11
Preallocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-16
Preallocating a Nondouble Matrix . . . . . . . . . . . . . . . . . . . . . . . . 29-16
Vectorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-18
Using Vectorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-18
Array Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-19
Logical Array Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-20
Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29-21
Ordering, Setting, and Counting Operations . . . . . . . . . . . . . . . . . 29-22
Functions Commonly Used in Vectorization . . . . . . . . . . . . . . . . . 29-23
Background Processing
30
Asynchronous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-2
Asynchronous Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-2
Background Workers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30-4
Memory Usage
31
Strategies for Efficient Use of Memory . . . . . . . . . . . . . . . . . . . . . . . 31-2
Use Appropriate Data Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31-2
Avoid Temporary Copies of Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 31-3
Reclaim Used Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31-4
Elements of the demos.xml File . . . . . . . . . . . . . . . . . . . . . . . . . . 32-30
Projects
33
Create Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-2
What Are Projects? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-2
Create Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-2
Open Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-2
Set up Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-3
Add Files to Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-5
Other Ways to Create Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-6
Upgrade Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-30
Run Upgrade Project Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33-30
Examine Upgrade Project Report . . . . . . . . . . . . . . . . . . . . . . . . . 33-31
Set Up SVN Source Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34-14
SVN Source Control Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34-14
Register Binary Files with SVN . . . . . . . . . . . . . . . . . . . . . . . . . . . 34-14
Standard Repository Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 34-17
Tag Versions of Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34-17
Enforce Locking Files Before Editing . . . . . . . . . . . . . . . . . . . . . . 34-17
Share a Subversion Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . 34-18
Integration with SVN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34-41
Integration with Other Source Control Tools . . . . . . . . . . . . . . . . . 34-42
Unit Testing
35
Write Test Using Live Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-3
Write Function-Based Unit Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-20
Create Test Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-20
Run the Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-22
Analyze the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-23
Define Parameters at Suite Creation Time . . . . . . . . . . . . . . . . . . . 35-82
Write Test for App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-155
Write Test That Uses App Testing and Mocking Frameworks . . . 35-159
Create App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-159
Test App With Manual Intervention . . . . . . . . . . . . . . . . . . . . . . . 35-160
Create Fully Automated Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-161
GitHub Actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-213
Jenkins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-214
Travis CI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-214
Other Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35-214
Hide Inactive Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36-26
Specify Inactive Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36-26
Complete Class Definition File with Inactive Properties Method . . . . . . 36-26
Analyze System Object Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36-58
View and Navigate System object Code . . . . . . . . . . . . . . . . . . . . . . . . 36-58
Example: Go to StepImpl Method Using Analyzer . . . . . . . . . . . . . . . . . 36-58
Create New System Objects for File Input and Output . . . . . . . . . . . . . 36-69
Language
1
Syntax Basics
The start and end quotation marks for a character vector must appear on the same line. For example,
this code returns an error, because each line contains only one quotation mark:
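A minimal sketch of the failing pattern and one way around it. The variable name and text are illustrative and not taken from the original example, and the failing statement is commented out so that the block runs:

```matlab
% Fails: the opening quotation mark has no closing mark on the same line.
% mytext = 'Long text can be split ...
%     across lines';

% Works: close each piece on its own line and concatenate with brackets.
mytext = ['Long text can be split ' ...
          'across lines'];
disp(mytext)
```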
An ellipsis outside a quoted character vector is equivalent to a space. For example,
x = [1.23...
4.56];
is the same as
x = [1.23 4.56];
Name=Value in Function Calls
Use the name=value syntax to help identify name-value arguments for functions and to clearly
distinguish names from values in lists of name-value arguments.
Most functions and methods support both syntaxes, but there are some limitations on where and how
the name=value syntax can be used:
• Mixing name,value and name=value syntaxes: The recommended practice is to use only one
syntax in any given function call. However, if you do mix name=value and name,value syntaxes
in a single call, all name=value arguments must appear after the name,value arguments. For
example, plot(x,y,"Color","red",LineWidth=2) is a valid combination, but
plot(x,y,Color="red","LineWidth",2) errors.
• Using positional arguments after name-value arguments: Some functions have positional
arguments that appear after name-value arguments. For example, this call to the verifyEqual
method uses the RelTol name-value argument, followed by a string input:
verifyEqual(testCase,1.5,2,"RelTol",0.1,...
"Difference exceeds relative tolerance.")
Using the name=value syntax (RelTol=0.1) causes the statement to error. In cases where a
positional argument follows name-value arguments, use the name,value syntax.
• Names that are invalid variable names: Name-value arguments with names that are invalid
MATLAB variable names cannot be used with the name=value syntax. See “Variable Names” on
page 1-5 for more information. For example, a name-value argument like "allow-empty",true errors
if passed as allow-empty=true. Use the name,value syntax in these cases.
Function authors do not need to code differently to support both the name,value and name=value
syntaxes. For information on authoring functions that accept name-value arguments, see “Name-Value
Arguments” on page 26-10.
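As a rough illustration of the two syntaxes side by side, the sketch below calls sort with its ComparisonMethod name-value argument. The choice of sort is an assumption made so that the example runs without graphics, and the name=value form assumes R2021a or later.

```matlab
v = [-3 1 2];

% Comma-separated name,value syntax.
a = sort(v,'ComparisonMethod','abs');

% Equivalent name=value syntax (assumes R2021a or later).
b = sort(v,ComparisonMethod='abs');

% Both calls sort by absolute value and return the same result.
disp(isequal(a,b))
```

Because the two forms are interchangeable here, picking one and using it consistently within a call follows the recommended practice described above.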
helpFile = which('help');
[helpPath,name,ext] = fileparts(helpFile);
The current workspace now contains three variables from fileparts: helpPath, name, and ext. In
this case, the variables are small. However, some functions return results that use much more
memory. If you do not need those variables, they waste space on your system.
If you do not use the tilde operator, you can request only the first N outputs of a function (where N is
less than or equal to the number of possible outputs) and ignore any remaining outputs. For example,
request only the first output, ignoring the second and third.
helpPath = fileparts(helpFile);
If you request more than one output, enclose the variable names in square brackets, []. The
following code ignores the output argument ext.
[helpPath,name] = fileparts(helpFile);
To ignore function outputs in any position in the argument list, use the tilde operator. For example,
ignore the first output using a tilde.
[~,name,ext] = fileparts(helpFile);
You can ignore any number of function outputs using the tilde operator. Separate consecutive tildes
with a comma. For example, this code ignores the first two output arguments.
[~,~,ext] = fileparts(helpFile);
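The same pattern applies to any function with multiple outputs. As a small sketch with an illustrative vector, the following keeps only the second output of max:

```matlab
v = [4 9 2 7];

% max returns the largest value first and its index second.
% The tilde discards the value and keeps only the index.
[~,idx] = max(v);
disp(idx)   % displays 2
```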
See Also
More About
• “Ignore Inputs in Function Definitions” on page 21-10
Variable Names
In this section...
“Valid Names” on page 1-5
“Conflicts with Function Names” on page 1-5
Valid Names
A valid variable name starts with a letter, followed by letters, digits, or underscores. MATLAB is case
sensitive, so A and a are not the same variable. The maximum length of a variable name is the value
that the namelengthmax command returns.
You cannot define variables with the same names as MATLAB keywords, such as if or end. For a
complete list, run the iskeyword command.
Check whether a proposed name is already in use with the exist or which function. exist returns
0 if there are no existing variables, functions, or other artifacts with the proposed name. For example:
exist checkname
ans =
0
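The checks described above can be combined before settling on a name. This is only a sketch; the candidate name rate_2 is hypothetical.

```matlab
candidate = 'rate_2';                  % hypothetical proposed name

isValid    = isvarname(candidate);     % true: starts with a letter, then letters, digits, underscores
isReserved = iskeyword(candidate);     % false: not a MATLAB keyword
inUse      = exist(candidate) ~= 0;    % a nonzero exist result means the name is already taken

badStart   = isvarname('2rate');       % false: starts with a digit
isKeyword  = iskeyword('end');         % true: end is reserved
maxLen     = namelengthmax;            % maximum allowed name length
```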
If you inadvertently create a variable with a name conflict, remove the variable from memory with the
clear function.
Another potential source of name conflicts occurs when you define a function that calls load or eval
(or similar functions) to add variables to the workspace. In some cases, load or eval adds variables
that have the same names as functions. Unless these variables are in the function workspace before
the call to load or eval, the MATLAB parser interprets the variable names as function names. For
more information, see:
See Also
clear | exist | iskeyword | namelengthmax | which | isvarname
In MATLAB code, use an exact match with regard to case for variables, files, and functions. For
example, if you have a variable, a, you cannot refer to that variable as A. It is a best practice to use
lowercase only when naming functions. This is especially useful when you use both Microsoft®
Windows® and UNIX® [1] platforms because their file systems behave differently with regard to case.
When you use the help function, the help displays some function names in all uppercase, for
example, PLOT, solely to distinguish the function name from the rest of the text. Some functions for
interfacing to Oracle® Java® software do use mixed case and the command-line help and the
documentation accurately reflect that.
Spaces
Blank spaces around operators such as -, :, and ( ), are optional, but they can improve readability.
For example, MATLAB interprets the following statements the same way.
y = sin (3 * pi) / 2
y=sin(3*pi)/2
However, blank spaces act as delimiters in horizontal concatenation. When defining row vectors, you
can use spaces and commas interchangeably to separate elements:
A = [1, 0 2, 3 3]
A =
1 0 2 3 3
Because of this flexibility, check to ensure that MATLAB stores the correct values. For example, the
statement [1 sin (pi) 3] produces a much different result than [1 sin(pi) 3] does.
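A short sketch of the difference. The try/catch wrapper is an addition so that the block runs to completion; it is not part of the original example.

```matlab
a = [1 sin(pi) 3];        % three elements; sin(pi) is evaluated as one call
disp(numel(a))            % displays 3

try
    % The space separates sin from (pi), so the brackets hold four
    % elements and sin is called with no input.
    b = [1 sin (pi) 3];
catch err
    disp(err.message)     % shows the error raised by calling sin with no input
end
```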
[1 sin (pi) 3]
[1 sin(pi) 3]
ans =
1. UNIX is a registered trademark of The Open Group in the United States and other countries.
Choose Command Syntax or Function Syntax
For introductory information on calling functions, see “Calling Functions”. For information related to
defining functions, see “Create Functions in Files” on page 20-2.
In function syntax, inputs can be data, variables, and even MATLAB expressions. If an input is data,
such as the numeric value 2 or the string array ["a" "b" "c"], MATLAB passes it to the function
as-is. If an input is a variable, MATLAB passes the value assigned to it. If an input is an expression,
like 2+2 or sin(2*pi), MATLAB evaluates it first and passes the result to the function. If the
function has outputs, you can assign them to variables, as in
[output1,...,outputM] = functionName(input1,...,inputN).
Command syntax is simpler but more limited. To use it, separate inputs with spaces rather than
commas, and do not enclose them in parentheses.
functionName input1 ... inputN
With command syntax, MATLAB passes all inputs as character vectors (that is, as if they were
enclosed in single quotation marks) and does not assign outputs to variables. To pass a data type
other than a character vector, use the function syntax. To pass a value that contains a space, you have
two options. One is to use function syntax. The other is to put single quotes around the value.
Otherwise, MATLAB treats the space as splitting your value into multiple inputs.
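A brief sketch of how a value containing a space behaves in each syntax. The text and the variable name msg are illustrative.

```matlab
disp 'two words'      % command syntax: quotes keep the space inside one input
disp('two words')     % the same call in function syntax

msg = 'two words';
disp(msg)             % function syntax passes the value stored in msg
disp msg              % command syntax passes the characters 'msg', not the value
```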
If a value is assigned to a variable, you must use function syntax to pass the value to the function.
Command syntax always passes inputs as character vectors and cannot pass variable values. For
example, create a variable and call the disp function with function syntax to pass the value of the
variable:
A = 123;
disp(A)
You cannot use command syntax to pass the value of A, because this call
disp A
is equivalent to
disp('A')
and returns
filename = 'accounts.txt';
A = int8(1:8);
B = A;
or
Some functions expect character vectors for variable names, such as save, load, clear, and whos.
For example,
whos -file durer.mat X
requests information about variable X in the example file durer.mat. This command is equivalent to
whos('-file','durer.mat','X')
ls ./d
This could be a call to the ls function with './d' as its argument. It also could represent element-
wise division on the array ls, using the variable d as the divisor.
If you issue this statement at the command line, MATLAB can access the current workspace and path
to determine whether ls and d are functions or variables. However, some components, such as the
Code Analyzer and the Editor/Debugger, operate without reference to the path or workspace. When
you are using those components, MATLAB uses syntactic rules to determine whether an expression is
a function call using command syntax.
In general, when MATLAB recognizes an identifier (which might name a function or a variable), it
analyzes the characters that follow the identifier to determine the type of expression, as follows:
• An equal sign (=) implies assignment. For example:
ls =d
• An open parenthesis after an identifier implies a function call. For example:
ls('./d')
• Space after an identifier, but not after a potential operator, implies a function call using command
syntax. For example:
ls ./d
• Spaces on both sides of a potential operator, or no spaces on either side of the operator, imply an
operation on variables. For example, these statements are equivalent:
ls ./ d
ls./d
Therefore, MATLAB treats the potentially ambiguous statement ls ./d as a call to the ls function
using command syntax.
The best practice is to avoid defining variable names that conflict with common functions, to prevent
any ambiguity.
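As a sketch of the spacing rules with names that do not collide with any function (a and d are hypothetical variables), both of the following statements are operations on variables:

```matlab
a = 8;
d = 2;

r1 = a ./ d;    % spaces on both sides of ./ : element-wise division
r2 = a./d;      % no spaces on either side: the same division

disp(isequal(r1,r2))
```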
See Also
“Calling Functions” | “Create Functions in Files” on page 20-2
Resolve Error: Undefined Function or Variable
Issue
You may encounter the following error message, or something similar, while working with functions
or variables in MATLAB:
These errors usually indicate that MATLAB cannot find a particular variable or MATLAB program file
in the current directory or on the search path.
Possible Solutions
Verify Spelling of Function or Variable Name
One of the most common causes is misspelling the function or variable name. Especially with longer
names or names containing similar characters (such as the letter l and numeral one), it is easy to
make mistakes and hard to detect them.
Often, when you misspell a MATLAB function, a suggested function name appears in the Command
Window. For example, this command fails because it includes an uppercase letter in the function
name:
accumArray
When this happens, press Enter to execute the suggested command or Esc to dismiss it.
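In a script, where no suggestion is offered, the misspelled call simply errors. The sketch below uses illustrative input data and catches the error so that the block can continue with the correct spelling:

```matlab
subs = [1; 2; 1];
vals = [10; 20; 30];

try
    % Misspelled on purpose: MATLAB function names are case sensitive.
    y = accumArray(subs,vals);
catch err
    disp(err.message)
end

% Correct spelling: sums vals by subscript, giving [40; 20].
y = accumarray(subs,vals);
```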
Object methods are typically called using function syntax, for instance method(object,inputs).
Alternatively, they can be called using dot notation, for instance object.method(inputs). One
common error is to mix these syntaxes. For instance, you might call the method using function syntax
but provide the inputs as you would in dot notation, leaving out the object: method(inputs).
To avoid this, when calling an object method, make sure you specify the object
first, either as the first input of function syntax or as the first identifier of dot notation.
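A small sketch of the two valid forms, using a containers.Map object purely as an illustrative stand-in for any object with methods:

```matlab
m = containers.Map({'a','b'},[1 2]);   % illustrative object

k1 = keys(m);      % function syntax: the object is the first input
k2 = m.keys();     % dot notation: the object comes before the dot

% Calling keys() on its own omits the object, so the method cannot be found.
disp(isequal(k1,k2))
```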
When you write a function, you establish its name when you write its function definition line. This
name should always match the name of the file you save it to. For example, if you create a function
named curveplot,
then you should name the file containing that function curveplot.m. If you create a pcode file for
the function, then name that file curveplot.p. In the case of conflicting function and file names, the
file name overrides the name given to the function. In this example, if you save the curveplot
function to a file named curveplotfunction.m, then attempts to invoke the function using the
function name will fail:
curveplot
Undefined function or variable 'curveplot'.
If you encounter this problem, change either the function name or file name so that they are the
same.
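The mismatch can be reproduced in a temporary folder. This is only a sketch: the function body (out = 2*x) and the use of a temporary folder are assumptions made for illustration.

```matlab
% Write a file whose name differs from the function name inside it.
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder,'curveplotfunction.m'),'w');
fprintf(fid,'function out = curveplot(x)\nout = 2*x;\nend\n');
fclose(fid);
addpath(folder);

% The file name, not the name on the definition line, is what MATLAB calls.
out = curveplotfunction(3);

% Clean up the temporary folder.
rmpath(folder);
rmdir(folder,'s');
```

Calling curveplot instead would fail, which is why keeping the two names identical avoids the problem.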
To locate the file that defines this function, use the MATLAB Find Files utility as follows:
1 On the Home tab, in the File section, click Find Files.
2 Under Find files named, enter *.m
3 Under Find files containing text, enter the function name.
4 Click the Find button.
If you are unable to use a built-in function from MATLAB or its toolboxes, make sure that the function
is installed and is the correct version.
If you do not know which toolbox contains the function you need, search for the function
documentation at https://www.mathworks.com/help. The toolbox name appears at the top of the
function reference page. Alternatively, for steps to identify toolboxes that a function depends on, see
“Identify Program Dependencies” on page 25-2.
Once you know which toolbox the function belongs to, use the ver function to see which toolboxes
are installed on the system from which you run MATLAB. The ver function displays a list of all
currently installed MathWorks® products. If you can locate the toolbox you need in the output
displayed by ver, then the toolbox is installed. If you cannot, you need to install it in order to use it.
For help with installing MathWorks products, see “Install License Manager on License Server”.
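The output of ver can also be checked programmatically. In this sketch the toolbox name is only an example; substitute the product you are looking for.

```matlab
v = ver;                          % structure array, one element per installed product
names = {v.Name};

toolboxName = 'Statistics and Machine Learning Toolbox';   % example name
isInstalled = any(strcmp(names,toolboxName));
fprintf('%s installed: %d\n',toolboxName,isInstalled);
```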
Tip If you have a custom file path, this step will delete it.
The MATLAB search path is a subset of all the folders in the file system. MATLAB uses the search
path to locate files used with MathWorks products efficiently. For more information, see “What Is the
MATLAB Search Path?”.
If the function you are attempting to use is part of a toolbox, then verify that the toolbox is available
using ver.
Because MATLAB stores the toolbox information in a cache file, you need to first update this cache
and then reset the path.
1 On the Home tab, in the Environment section, click Preferences.
A small dialog box opens warning that you will lose your current path settings if you proceed.
Select Yes if you decide to proceed.
Run ver to see if the toolbox is installed. If not, you may need to reinstall this toolbox to use this
function. For more information about installing a toolbox, see How do I install additional toolboxes
into an existing installation of MATLAB.
Once ver shows your toolbox, run the following command to see if you can find the function:
which -all <functionname>
replacing <functionname> with the name of the function. If MATLAB finds your function file, it
presents you with the path to it. You can add that file to the path using the addpath function. If it
does not, make sure the necessary toolbox is installed, and that it is the correct version.
If you are unable to use a built-in function from a MATLAB toolbox and have confirmed that the
toolbox is installed, make sure that you have an active license for that toolbox. Use license to
display currently active licenses. For additional support for managing licenses, see “Manage Your
Licenses”.
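The path and license checks above can be sketched in a few lines. The function name plot and the feature name MATLAB are placeholders; use the function and toolbox feature you are troubleshooting.

```matlab
fname = 'plot';                          % example function name

location   = which(fname);               % where MATLAB finds it; empty if not found
allMatches = which(fname,'-all');        % cell array of every match on the path

hasLicense = license('test','MATLAB');   % 1 if a license exists for the feature
license('inuse')                         % display licenses currently checked out
```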
2
Program Components
Arithmetic Operators
Symbol   Role                           More Information
+        Addition                       plus
+        Unary plus                     uplus
-        Subtraction                    minus
-        Unary minus                    uminus
.*       Element-wise multiplication    times
*        Matrix multiplication          mtimes
./       Element-wise right division    rdivide
/        Matrix right division          mrdivide
.\       Element-wise left division     ldivide
\        Matrix left division           mldivide
Relational Operators
Symbol   Role                            More Information
==       Equal to                        eq
~=       Not equal to                    ne
>        Greater than                    gt
>=       Greater than or equal to        ge
<        Less than                       lt
<=       Less than or equal to           le
Logical Operators
Symbol   Role                                        More Information
&        Find logical AND                            and
|        Find logical OR                             or
&&       Find logical AND (with short-circuiting)    Logical Operators: Short-Circuit && ||
MATLAB Operators and Special Characters
Special Characters
@ Name: At symbol
Uses:
Description: The @ symbol forms a handle to either the named function that follows the @
sign, or to the anonymous function that follows the @ sign. You can also use @ to call
superclass methods from subclasses.
Examples
fhandle = @myfun
disp@MySuper(obj)
Call the superclass constructor from a subclass using the object being constructed:
obj = obj@MySuper(arg1,arg2,...)
More Information:
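As a minimal sketch of the two kinds of handles described above, the following code creates a handle to a named function and a handle to an anonymous function. The variable names y1, sq, and y2 are hypothetical and chosen only for illustration.

```matlab
% Handle to a named function
fhandle = @sin;
y1 = fhandle(pi/2);      % same as calling sin(pi/2)

% Handle to an anonymous function (sq is a hypothetical name)
sq = @(x) x.^2;
y2 = sq([1 2 3]);        % squares each element
```

Both handles are called with the same parentheses syntax as an ordinary function.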
. Name: Period
Uses:
• Decimal point
• Element-wise operations
• Structure field access
• Object property or method specifier
Description: The period character separates the integral and fractional parts of a
number, such as 3.1415. MATLAB operators that contain a period always work element-
wise. The period character also enables you to access the fields in a structure, as well as
the properties and methods of an object.
Examples
Decimal point:
102.5543
Element-wise operations:
A.*B
A.^2
myStruct.f1
myObj.PropertyName
More Information
Description: Three or more periods at the end of a line continue the current command
on the next line. If three or more periods occur before the end of a line, then MATLAB
ignores the rest of the line and continues to the next line. This effectively makes a
comment out of anything on the current line that follows the three periods.
Examples
Break a character vector up on multiple lines and concatenate the lines together:
To comment out one line in a multiline command, use ... at the beginning of the line to
ensure that the command remains complete. If you use % to comment out a line it
produces an error:
y = 1 +...
2 +...
% 3 +...
4;
However, this code runs properly since the third line does not produce a gap in the
command:
y = 1 +...
2 +...
... 3 +...
4;
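The following sketch illustrates both uses of the ellipsis described above. The text pieces and the variable name s are hypothetical.

```matlab
% Continue a character vector across two lines; the pieces are
% concatenated inside the square brackets
s = ['first part, ' ...
     'second part'];

% A leading ... turns the third line into a comment without
% breaking the statement, so the "3 +" term is skipped
y = 1 + ...
    2 + ...
    ... 3 + ...
    4;
```

With the third term disabled, y is the sum of the remaining terms, 1 + 2 + 4.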
More Information
, Name: Comma
Uses: Separator
Examples
A = [12,13; 14,15]
Separate subscripts:
A(1,2)
[Y,I] = max(A,[],2)
More Information
• horzcat
: Name: Colon
Uses:
• Vector creation
• Indexing
• For-loop iteration
Description: Use the colon operator to create regularly spaced vectors, index into
arrays, and define the bounds of a for loop.
Examples
Create a vector:
x = 1:10
x = 1:3:19
A(:)
A = rand(3,4);
A(:) = 1:12;
A(2:5,3)
A(:,3)
x = 1;
for k = 1:25
x = x + x^2;
end
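To make the results of the colon examples concrete, this sketch records what each expression produces. It uses zeros instead of rand so that the values are reproducible; the names col3 and v are hypothetical.

```matlab
x = 1:3:19;        % start at 1, step by 3: 1 4 7 10 13 16 19

A = zeros(3,4);
A(:) = 1:12;       % fill all elements, column by column
col3 = A(:,3);     % all rows of the third column: [7; 8; 9]
v = A(:);          % all elements as a single 12-by-1 column
```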
More Information
• colon
• “Creating, Concatenating, and Expanding Matrices”
; Name: Semicolon
Uses:
Examples
A = [12,13; 14,15]
Y = max(A);
More Information
• vertcat
( ) Name: Parentheses
Uses:
• Operator precedence
• Function argument enclosure
• Indexing
Examples
Precedence of operations:
(A.*(B./C)) - D
plot(X,Y,'r*')
C = union(A,B)
Indexing:
A(3,:)
A(1,2)
A(1:5,1)
More Information
Uses:
• Array construction
• Array concatenation
• Empty matrix and array element deletion
• Multiple output argument assignment
Examples
X = [10 12 -3]
A = rand(3);
A = [A; 10 20 30]
A = []
A(:,1) = []
[C,iA,iB] = union(A,B)
More Information
Description: Use curly braces to construct a cell array, or to access the contents of a
particular cell in a cell array.
Examples
To construct a cell array, enclose all elements of the array in curly braces:
Index to a specific cell array element by enclosing all indices in curly braces:
A = C{4,7,2}
More Information
• “Cell Arrays”
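As a brief sketch of both uses of curly braces, the following code builds a small cell array and reads its contents. The cell contents are hypothetical. For contrast, the last lines show that indexing with parentheses returns a cell rather than its contents.

```matlab
% Construct a 2-by-2 cell array holding different data types
C = {42, 'text'; [1 2 3], {10, 20}};

n = C{1,1};           % contents of the cell: the number 42
inner = C{2,2}{2};    % index into a nested cell array: 20

% Parentheses keep the cell container
sub = C(1,1);         % 1-by-1 cell array containing 42
```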
% Name: Percent
Uses:
• Comment
• Conversion specifier
Description: The percent sign is most commonly used to indicate nonexecutable text
within the body of a program. This text is normally used to include comments in your
code.
Two percent signs, %%, serve as a cell delimiter as described in “Create and Run Sections
in Code” on page 18-5.
Examples
More Information
Description: The %{ and %} symbols enclose a block of comments that extend beyond
one line.
Note With the exception of whitespace characters, the %{ and %} operators must appear
alone on the lines that immediately precede and follow the block of help text. Do not
include any other text on these lines.
Examples
Enclose any multiline comments with percent followed by an opening or closing brace:
%{
The purpose of this routine is to compute
the value of ...
%}
More Information
Description: The exclamation point precedes operating system commands that you want
to execute from within MATLAB.
Examples
The exclamation point initiates a shell escape function. Such a function is to be performed
directly by the operating system:
!rmdir oldtests
More Information
Description: The question mark retrieves the meta.class object for a particular class
name. The ? operator works only with a class name, not an object.
Examples
?inputParser
More Information
• metaclass
'' Name: Single quotes
Description: Use single quotes to create character vectors that have class char.
Examples
More Information
Description: Use double quotes to create string scalars that have class string.
Examples
S = "Hello, world"
More Information
Uses: Separator
Description: Use the space character to separate row elements in an array constructor,
or the values returned by a function. In these contexts, the space character and comma
are equivalent.
Examples
Uses: Separator
Examples
~ Name: Tilde
Uses:
• Logical NOT
• Argument placeholder
Description: Use the tilde symbol to represent logical NOT or to suppress specific input
or output arguments.
Examples
A = eye(3);
~A
A = [1 -1; 0 1]
B = [1 -2; 3 2]
A~=B
[~,~,iB] = union(A,B)
More Information
• not
• “Ignore Inputs in Function Definitions” on page 21-10
• “Ignore Function Outputs” on page 1-4
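The two roles of the tilde can be sketched with a small vector. The variable names are hypothetical.

```matlab
v = [3 9 5];

% Argument placeholder: discard the maximum value, keep its index
[~, idx] = max(v);

% Logical NOT: invert the result of a comparison
mask = ~(v > 4);
```

Here idx is 2, the position of the largest element, and mask is true only for the first element.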
= Name: Equal sign
Uses: Assignment
Description: Use the equal sign to assign values to a variable. The syntax B = A stores
the elements of A in variable B.
Note The = character is for assignment, whereas the == character is for comparing the
elements in two arrays. See eq for more information.
Examples
Create a matrix A. Assign the values in A to a new variable, B. Lastly, assign a new value
to the first element in B.
A = [1 0; -1 0];
B = A;
B(1) = 200;
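A short check, written here as a sketch, confirms that changing B leaves A untouched and shows the difference between = and ==. The variable name same is hypothetical.

```matlab
A = [1 0; -1 0];
B = A;             % assignment: B holds its own copy of the values
B(1) = 200;        % modifies B only

same = (A == B);   % comparison: logical array, false where they differ
```

After this code runs, A(1) is still 1, and same is false only in the first position.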
Examples
More Information:
• “Subclass Syntax”
.? Name: Dot question mark
Description:
When using function argument validation, you can define the fields of the name-value
structure as the names of all writeable properties of the class.
Examples
Specify the field names of the propArgs structure as the writeable properties of the
matlab.graphics.primitive.Line class.
function f(propArgs)
arguments
propArgs.?matlab.graphics.primitive.Line
end
% Function code
...
end
More Information:
Use the special characters in this table to specify a folder path using a character vector or string.
Description: In addition to their use as mathematical operators, the slash and backslash
characters separate the elements of a path or folder. On Microsoft Windows based
systems, both slash and backslash have the same effect. On The Open Group UNIX based
systems, you must use slash only.
Examples
dir([matlabroot '\toolbox\matlab\elmat\shiftdim.m'])
dir([matlabroot '/toolbox/matlab/elmat/shiftdim.m'])
dir([matlabroot '/toolbox/matlab/elmat/shiftdim.m'])
.. Name: Dot dot
Description: Two dots in succession refer to the parent of the current folder. Use this
notation to specify folder paths relative to the current folder.
Examples
To go up two levels in the folder tree and down into the test folder, use:
cd ..\..\test
More Information
• cd
* Name: Asterisk
Description: In addition to being the symbol for matrix multiplication, the asterisk * is
used as a wildcard character.
Wildcards are generally used in file operations that act on multiple files or folders.
MATLAB matches all characters in the name exactly except for the wildcard character *,
which can match any one or more characters.
Examples
Locate all files with names that start with january_ and have a .mat file extension:
dir('january_*.mat')
@ Name: At symbol
Examples
\@myClass\get.m
More Information
Examples
+mypack
+mypack/pkfcn.m % a package function
+mypack/@myClass % class folder in a package
More Information
There are certain special characters that you cannot enter as ordinary text. Instead, you must use
unique character sequences to represent them. Use the symbols in this table to format strings and
character vectors on their own or in conjunction with formatting functions like compose, sprintf,
and error. For more information, see “Formatting Text” on page 6-24.
See Also
More About
• “Array vs. Matrix Operations” on page 2-20
• “Array Comparison with Relational Operators” on page 2-29
• “Compatible Array Sizes for Basic Operations” on page 2-25
• “Operator Precedence” on page 2-32
• “Find Array Elements That Meet a Condition” on page 5-2
• “Greek Letters and Special Characters in Chart Text”
Introduction
MATLAB has two different types of arithmetic operations: array operations and matrix operations.
You can use these arithmetic operations to perform numeric computations, for example, adding two
numbers, raising the elements of an array to a given power, or multiplying two matrices.
Matrix operations follow the rules of linear algebra. By contrast, array operations execute element by
element operations and support multidimensional arrays. The period character (.) distinguishes the
array operations from the matrix operations. However, since the matrix and array operations are the
same for addition and subtraction, the character pairs .+ and .- are unnecessary.
Array Operations
Array operations execute element by element operations on corresponding elements of vectors,
matrices, and multidimensional arrays. If the operands have the same size, then each element in the
first operand gets matched up with the element in the same location in the second operand. If the
operands have compatible sizes, then each input is implicitly expanded as needed to match the size of
the other. For more information, see “Compatible Array Sizes for Basic Operations” on page 2-25.
As a simple example, you can add two vectors with the same size.
A = [1 1 1]
A =
1 1 1
B = [1 2 3]
B =
1 2 3
A+B
ans =
2 3 4
If one operand is a scalar and the other is not, then MATLAB implicitly expands the scalar to be the
same size as the other operand. For example, you can compute the element-wise product of a scalar
and a matrix.
A = [1 2 3; 1 2 3]
A =
1 2 3
1 2 3
3.*A
ans =
3 6 9
3 6 9
Implicit expansion also works if you subtract a 1-by-3 vector from a 3-by-3 matrix because the two
sizes are compatible. When you perform the subtraction, the vector is implicitly expanded to become
a 3-by-3 matrix.
A = [1 1 1; 2 2 2; 3 3 3]
A =
1 1 1
2 2 2
3 3 3
m = [2 4 6]
m =
2 4 6
A - m
ans =
-1 -3 -5
0 -2 -4
1 -1 -3
A row vector and a column vector have compatible sizes. If you add a 1-by-3 vector to a 2-by-1 vector,
then each vector implicitly expands into a 2-by-3 matrix before MATLAB executes the element-wise
addition.
x = [1 2 3]
x =
1 2 3
y = [10; 15]
y =
10
15
x + y
ans =
11 12 13
16 17 18
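One way to picture implicit expansion is to replicate each vector explicitly with repmat and then add the results. This is only a sketch of the equivalent computation, not a description of how MATLAB implements it; the names viaExpansion and viaRepmat are hypothetical.

```matlab
x = [1 2 3];       % 1-by-3 row vector
y = [10; 15];      % 2-by-1 column vector

viaExpansion = x + y;                          % implicit expansion
viaRepmat = repmat(x,2,1) + repmat(y,1,3);     % explicit 2-by-3 copies
```

Both expressions produce the same 2-by-3 matrix.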
If the sizes of the two operands are incompatible, then you get an error.
A = [8 1 6; 3 5 7; 4 9 2]
A =
8 1 6
3 5 7
4 9 2
m = [2 4]
m =
2 4
A - m
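If you want code to continue after such an error, you can wrap the operation in a try/catch block. The following is a minimal sketch; the variable names failed and msg are hypothetical, and the exact error message depends on the MATLAB release.

```matlab
A = [8 1 6; 3 5 7; 4 9 2];
m = [2 4];              % 1-by-2: not compatible with a 3-by-3 matrix

failed = false;
msg = '';
try
    A - m;              % errors because the sizes are incompatible
catch err
    failed = true;
    msg = err.message;  % text of the size-mismatch error
end
```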
The following table provides a summary of arithmetic array operators in MATLAB. For function-
specific information, click the link to the function reference page in the last column.
Matrix Operations
Matrix operations follow the rules of linear algebra and are not compatible with multidimensional
arrays. The required size and shape of the inputs in relation to one another depends on the operation.
For nonscalar inputs, the matrix operators generally calculate different answers than their array
operator counterparts.
For example, if you use the matrix right division operator, /, to divide two matrices, the matrices
must have the same number of columns. But if you use the matrix multiplication operator, *, to
multiply two matrices, then the matrices must have a common inner dimension. That is, the number
of columns in the first input must be equal to the number of rows in the second input. The matrix
multiplication operator calculates the product of two matrices with the formula,
$$C(i,j) = \sum_{k=1}^{n} A(i,k)\,B(k,j).$$
A = [1 3;2 4]
A =
1 3
2 4
B = [3 0;1 5]
B =
3 0
1 5
A*B
ans =
6 15
10 20
The previous matrix product is not equal to the following element-wise product.
A.*B
ans =
3 0
2 20
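To connect the summation formula to the operator, the following sketch computes the matrix product with explicit loops and compares it with A*B. The loop version is for illustration only; the * operator is the idiomatic way to multiply matrices.

```matlab
A = [1 3; 2 4];
B = [3 0; 1 5];

n = size(A,2);                        % common inner dimension
C = zeros(size(A,1),size(B,2));
for i = 1:size(A,1)
    for j = 1:size(B,2)
        for k = 1:n
            % accumulate A(i,k)*B(k,j) over k
            C(i,j) = C(i,j) + A(i,k)*B(k,j);
        end
    end
end

matches = isequal(C,A*B);             % loop result equals A*B
```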
The following table provides a summary of matrix arithmetic operators in MATLAB. For function-
specific information, click the link to the function reference page in the last column.
See Also
More About
• “Compatible Array Sizes for Basic Operations” on page 2-25
• “MATLAB Operators and Special Characters” on page 2-2
• “Operator Precedence” on page 2-32
Compatible Array Sizes for Basic Operations
These are some combinations of scalars, vectors, and matrices that have compatible sizes:
• One input is a matrix, and the other is a column vector with the same number of rows.
Multidimensional Arrays
Every array in MATLAB has trailing dimensions of size 1. For multidimensional arrays, this means
that a 3-by-4 matrix is the same as a matrix of size 3-by-4-by-1-by-1-by-1. Examples of
multidimensional arrays with compatible sizes are:
• One input is a matrix, and the other is a 3-D array with the same number of rows and columns.
• One input is a matrix, and the other is a 3-D array. The dimensions are all either the same or one
of them is 1.
Empty Arrays
The rules are the same for empty arrays or arrays that have a dimension size of zero. The size of the
dimension that is not equal to 1 determines the size of the output. This means that dimensions with a
size of zero must be paired with a dimension of size 1 or 0 in the other array, and that the output has
a dimension size of 0.
A: 1-by-0
B: 3-by-1
Result: 3-by-0
A: 3-by-2
B: 4-by-2
• Two nonscalar row vectors with lengths that are not the same.
A: 1-by-3
B: 1-by-4
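The size rules above can be checked directly. This sketch adds a 1-by-0 array to a 3-by-1 array and then attempts an incompatible addition; the variable names are hypothetical.

```matlab
% Compatible: a dimension of size 0 pairs with a dimension of size 1
A = zeros(1,0);
B = zeros(3,1);
szEmpty = size(A + B);        % 3-by-0 result

% Incompatible: 3-by-2 and 4-by-2 differ in the first dimension
compatible = true;
try
    zeros(3,2) + zeros(4,2);
catch
    compatible = false;
end
```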
Examples
Subtract Vector from Matrix
To simplify vector-matrix operations, use implicit expansion with dimensional functions such as sum,
mean, min, and others.
For example, calculate the mean value of each column in a matrix, then subtract the mean value from
each element.
A = magic(3)
A =
8 1 6
3 5 7
4 9 2
C = mean(A)
C =
5 5 5
A - C
ans =
3 -4 1
-2 0 2
-1 4 -3
Row and column vectors have compatible sizes, and when you perform an operation on them the
result is a matrix.
For example, add a row and column vector. The result is the same as bsxfun(@plus,a,b).
a = [1 2 3 4]
a =
1 2 3 4
b = [5; 6; 7]
b =
5
6
7
a + b
ans =
6 7 8 9
7 8 9 10
8 9 10 11
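A quick sketch confirms the stated equivalence between implicit expansion and bsxfun for this example. The names s1 and s2 are hypothetical.

```matlab
a = [1 2 3 4];
b = [5; 6; 7];

s1 = a + b;                  % implicit expansion
s2 = bsxfun(@plus,a,b);      % explicit binary singleton expansion
```

Both results are the same 3-by-4 matrix.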
See Also
bsxfun
More About
• “Array vs. Matrix Operations” on page 2-20
• “MATLAB Operators and Special Characters” on page 2-2
Array Comparison with Relational Operators
Relational operators compare operands quantitatively, using operators like “less than”, “greater
than”, and “not equal to.” The result of a relational comparison is a logical array indicating the
locations where the relation is true.
Array Comparison
Numeric Arrays
The relational operators perform element-wise comparisons between two arrays. The arrays must
have compatible sizes to facilitate the operation. Arrays with compatible sizes are implicitly expanded
to be the same size during execution of the calculation. In the simplest cases, the two operands are
arrays of the same size, or one is a scalar. For more information, see “Compatible Array Sizes for
Basic Operations” on page 2-25.
For example, if you compare two matrices of the same size, then the result is a logical matrix of the
same size with elements indicating where the relation is true.
A = [2 4 6; 8 10 12]
A =
2 4 6
8 10 12
B = [5 5 5; 9 9 9]
B =
5 5 5
9 9 9
A < B
ans =
1 1 0
1 0 0
A > 7
ans =
0 0 0
1 1 1
If you compare a 1-by-N row vector to an M-by-1 column vector, then MATLAB expands each vector
into an M-by-N matrix before performing the comparison. The resulting matrix contains the
comparison result for each combination of elements in the vectors.
A = 1:3
A =
1 2 3
B = [2; 3]
B =
2
3
A >= B
ans =
0 1 1
0 0 1
Empty Arrays
The relational operators work with arrays for which any dimension has size zero, as long as both
arrays have compatible sizes. This means that if one array has a dimension size of zero, then the size
of the corresponding dimension in the other array must be 1 or zero, and the size of that dimension in
the output is zero.
A = ones(3,0);
B = ones(3,1);
A == B
ans =
3×0 empty logical array
However, expressions such as
A == []
return an error if A is not 0-by-0 or 1-by-1. This behavior is consistent with that of all other binary
operators, such as +, -, >, <, &, |, and so on.
Complex Numbers
• The operators >, <, >=, and <= use only the real part of the operands in performing comparisons.
• The operators == and ~= test both real and imaginary parts of the operands.
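The following sketch shows both rules with two complex numbers that share a real part but differ in their imaginary parts. The values and variable names are hypothetical.

```matlab
z1 = 4 + 7i;
z2 = 4 + 1i;

gtResult = z1 > z2;     % false: only the real parts (4 and 4) are compared
geResult = z1 >= z2;    % true: the real parts are equal
eqResult = z1 == z2;    % false: the imaginary parts differ
```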
Logic Statements
Use relational operators in conjunction with the logical operators A & B (AND), A | B (OR),
xor(A,B) (XOR), and ~A (NOT), to string together more complex logical statements.
For example, you can locate where negative elements occur in two arrays.
A = [2 -1; -3 10]
A =
2 -1
-3 10
B = [0 -2; -3 -1]
B =
0 -2
-3 -1
A<0 & B<0
ans =
0 1
1 0
For more examples, see “Find Array Elements That Meet a Condition” on page 5-2.
See Also
gt | lt | ge | le | eq | ne
More About
• “Array vs. Matrix Operations” on page 2-20
• “Compatible Array Sizes for Basic Operations” on page 2-25
• “MATLAB Operators and Special Characters” on page 2-2
Operator Precedence
You can build expressions that use any combination of arithmetic, relational, and logical operators.
Precedence levels determine the order in which MATLAB evaluates an expression. Within each
precedence level, operators have equal precedence and are evaluated from left to right. The
precedence rules for MATLAB operators are shown in this list, ordered from highest precedence level
to lowest precedence level:
1 Parentheses ()
2 Transpose (.'), power (.^), complex conjugate transpose ('), matrix power (^)
3 Power with unary minus (.^-), unary plus (.^+), or logical negation (.^~) as well as matrix
power with unary minus (^-), unary plus (^+), or logical negation (^~).
Note Although most operators work from left to right, the operators (^-), (.^-), (^+), (.^+),
(^~), and (.^~) work from second from the right to left. It is recommended that you use
parentheses to explicitly specify the intended precedence of statements containing these
operator combinations.
4 Unary plus (+), unary minus (-), logical negation (~)
5 Multiplication (.*), right division (./), left division (.\), matrix multiplication (*), matrix
right division (/), matrix left division (\)
6 Addition (+), subtraction (-)
7 Colon operator (:)
8 Less than (<), less than or equal to (<=), greater than (>), greater than or equal to (>=),
equal to (==), not equal to (~=)
9 Element-wise AND (&)
10 Element-wise OR (|)
11 Short-circuit AND (&&)
12 Short-circuit OR (||)
The same precedence rule holds true for the && and || operators.
C = (A./B).^2
C =
2.2500 81.0000 1.0000
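A few short expressions, offered as a sketch, show how the precedence levels in the list affect results. The variable names are hypothetical, and the Code Analyzer might suggest adding parentheses to some of these lines.

```matlab
p1 = -2^2;                    % power before unary minus: -(2^2)
p2 = 2^-1;                    % power with unary minus: 2^(-1)
p3 = 1:3+1;                   % addition before colon: 1:(3+1)
p4 = true | false & false;    % & before |: true | (false & false)
```

Parentheses make the intended grouping explicit, for example (-2)^2 evaluates to 4.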
See Also
More About
• “Array vs. Matrix Operations” on page 2-20
• “Compatible Array Sizes for Basic Operations” on page 2-25
• “Array Comparison with Relational Operators” on page 2-29
• “MATLAB Operators and Special Characters” on page 2-2
Use random points picked from the peaks function in the domain [ − 3, 3] × [ − 3, 3] as the data set.
Add a small amount of noise to the data.
xy = rand(10000,2)*6-3;
z = peaks(xy(:,1),xy(:,2)) + 0.5-rand(10000,1);
A = [xy z];
plot3(A(:,1), A(:,2), A(:,3), '.')
view(-28,32)
Find points that have similar x and y coordinates using uniquetol with these options:
• Specify ByRows as true, since the rows of A contain the point coordinates.
• Specify OutputAllIndices as true to return the indices for all points that are within tolerance
of each other.
• Specify DataScale as [1 1 Inf] to use an absolute tolerance for the x and y coordinates, while
ignoring the z-coordinate.
DS = [1 1 Inf];
[C,ia] = uniquetol(A, 0.3, 'ByRows', true, ...
'OutputAllIndices', true, 'DataScale', DS);
Average Similar Data Points Using a Tolerance
Average each group of points that are within tolerance (including the z-coordinates), producing a
reduced data set that still holds the general shape of the original data.
for k = 1:length(ia)
aveA(k,:) = mean(A(ia{k},:),1);
end
hold on
plot3(aveA(:,1), aveA(:,2), aveA(:,3), '.r', 'MarkerSize', 15)
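The same grouping and averaging steps can be traced by hand on a tiny data set. This sketch uses three hypothetical points, two of which have similar x and y coordinates, and preallocates the output before the loop.

```matlab
% Three hypothetical points [x y z]; rows 1 and 2 are close in x and y
A = [0 0 1; 0.1 0 3; 5 5 10];
DS = [1 1 Inf];               % absolute tolerance in x and y, ignore z

[C,ia] = uniquetol(A,0.3,'ByRows',true, ...
    'OutputAllIndices',true,'DataScale',DS);

aveA = zeros(numel(ia),size(A,2));   % one averaged row per group
for k = 1:numel(ia)
    aveA(k,:) = mean(A(ia{k},:),1);
end
```

Rows 1 and 2 fall into one group and are averaged, including their z-coordinates, while row 3 forms a group by itself.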
See Also
uniquetol
More About
• “Group Scattered Data Using a Tolerance” on page 2-36
Create a set of random 2-D points. Then create and plot a grid of equally spaced points on top of the
random data.
x = rand(10000,2);
[a,b] = meshgrid(0:0.1:1);
gridPoints = [a(:), b(:)];
plot(x(:,1), x(:,2), '.')
hold on
plot(gridPoints(:,1), gridPoints(:,2), 'xr', 'Markersize', 6)
Use ismembertol to locate the data points in x that are within tolerance of the grid points in
gridPoints. Use these options with ismembertol:
• Specify ByRows as true, since the point coordinates are in the rows of x.
• Specify OutputAllIndices as true to return all of the indices for rows in x that are within
tolerance of the corresponding row in gridPoints.
For each grid point, plot the points in x that are within tolerance of that grid point.
Group Scattered Data Using a Tolerance
figure
hold on
for k = 1:length(LocB)
plot(x(LocB{k},1), x(LocB{k},2), '.')
end
plot(gridPoints(:,1), gridPoints(:,2), 'xr', 'Markersize', 6)
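As a small, reproducible sketch of the ismembertol call with the options described above, the following code uses two grid points and four hypothetical data points. The tolerance of 0.05 and the DataScale value of 1 (which makes the tolerance absolute) are assumptions chosen for this illustration.

```matlab
gridPoints = [0 0; 1 1];
x = [0.01 0.02; 0.98 1.01; 0.5 0.5; 0.03 -0.01];   % hypothetical data

% For each row of gridPoints, find all rows of x within tolerance
[tf,LocB] = ismembertol(gridPoints,x,0.05,'ByRows',true, ...
    'OutputAllIndices',true,'DataScale',1);

nearFirst = sort(LocB{1}(:));    % rows of x near the grid point (0,0)
nearSecond = LocB{2}(:);         % rows of x near the grid point (1,1)
```

Because OutputAllIndices is true, LocB is a cell array with one cell of row indices per grid point.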
See Also
ismembertol
More About
• “Average Similar Data Points Using a Tolerance” on page 2-34
Bit-Wise Operations
This topic shows how to use bit-wise operations in MATLAB® to manipulate the bits of numbers.
Operating on bits is directly supported by most modern CPUs. In many cases, manipulating the bits of
a number in this way is quicker than performing arithmetic operations like division or multiplication.
Number Representations
Any number can be represented with bits (also known as binary digits). The binary, or base 2, form of
a number contains 1s and 0s to indicate which powers of 2 are present in the number. For example,
the 8-bit binary form of 7 is
00000111
A collection of 8 bits is also called 1 byte. In binary representations, the bits are counted from the
right to the left, so the first bit in this representation is a 1. This number represents 7 because
2^2 + 2^1 + 2^0 = 7.
When you type numbers into MATLAB, it assumes the numbers are double precision (a 64-bit binary
representation). However, you can also specify single-precision numbers (32-bit binary
representation) and integers (signed or unsigned, from 8 to 64 bits). For example, the most memory
efficient way to store the number 7 is with an 8-bit unsigned integer:
a = uint8(7)
a = uint8
7
You can even specify the binary form directly using the prefix 0b followed by the binary digits (for
more information, see “Hexadecimal and Binary Values” on page 6-55). MATLAB stores the number
in an integer format with the fewest number of bits. Instead of specifying all the bits, you need to
specify only the left-most 1 and all the digits to the right of it. The bits to the left of that bit are
trivially zero. So the number 7 is:
b = 0b111
b = uint8
7
MATLAB stores negative integers using two's complement. For example, consider the 8-bit signed
integer -8. To find the two's complement bit pattern for this number:
1 Start with the bit pattern of the positive version of the number, 8: 00001000.
2 Next, flip all of the bits: 11110111.
3 Finally, add 1 to the result: 11111000.
n = 0b11111000s8
n = int8
-8
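The three steps can also be reproduced with bit-wise functions. This sketch uses bitcmp to flip the bits and typecast to reinterpret the resulting pattern as a signed integer; the variable names are hypothetical.

```matlab
p = uint8(8);                     % step 1: 00001000
flipped = bitcmp(p);              % step 2: 11110111
pattern = flipped + 1;            % step 3: 11111000

bits = dec2bin(pattern);          % character vector of the bit pattern
n = typecast(pattern,'int8');     % same bits read as a signed integer
```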
MATLAB does not natively display the binary format of numbers. For that, you can use the dec2bin
function, which returns a character vector of binary digits for positive integers. Again, this function
returns only the digits that are not trivially zero.
dec2bin(b)
ans =
'111'
You can use bin2dec to switch between the two formats. For example, you can convert the binary
digits 10110101 to decimal format with the commands
data = [1 0 1 1 0 1 0 1];
dec = bin2dec(num2str(data))
dec = 181
The cast and typecast functions are also useful to switch among different data types. These
functions are similar, but they differ in how they treat the underlying storage of the number:
Because MATLAB does not display the digits of a binary number directly, you must pay attention to
data types when you work with bit-wise operations. Some functions return binary digits as a
character vector (dec2bin), some return the decimal number (bitand), and others return a vector of
the bits themselves (bitget).
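The following sketch contrasts cast and typecast on the same signed integer and checks the output types of the functions mentioned above. It relies on the documented behavior that cast converts the value (saturating at the limits of the new type), whereas typecast keeps the underlying bits. The variable names are hypothetical.

```matlab
n = int8(-8);                     % bit pattern 11111000

byValue = cast(n,'uint8');        % converts the value; saturates to 0
byBits = typecast(n,'uint8');     % keeps the bits; reads them as 248

digits = dec2bin(5);              % character vector '101'
masked = bitand(uint8(5),uint8(4));   % number of the same integer class
bitVec = bitget(uint8(5),3:-1:1);     % vector of individual bits
```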
MATLAB has several functions that enable you to perform logical operations on the bits of two equal-
length binary representations of numbers, known as bit masking:
• bitand — If both digits are 1, then the resulting digit is also a 1. Otherwise, the resulting digit is
0.
• bitor — If either digit is 1, then the resulting digit is also a 1. Otherwise, the resulting digit is 0.
• bitxor — If the digits are different, then the resulting digit is a 1. Otherwise, the resulting digit
is 0.
In addition to these functions, the bit-wise complement is available with bitcmp, but this is a unary
operation that flips the bits in only one number at a time.
One use of bit masking is to query the status of a particular bit. For example, if you use a bit-wise
AND operation with the binary number 00001000, you can query the status of the fourth bit. You can
then shift that bit to the first position so that MATLAB returns a 0 or 1 (the next section describes bit
shifting in more detail).
n = 0b10111001;
n4 = bitand(n,0b1000);
n4 = bitshift(n4,-3)
n4 = uint8
1
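For reference, this sketch applies each masking function to the same pair of hypothetical values, 12 (binary 1100) and 10 (binary 1010), and reads the fourth bit of 185 (binary 10111001) directly with bitget.

```matlab
a = uint8(12);                % 00001100
b = uint8(10);                % 00001010

andBits = bitand(a,b);        % 00001000
orBits = bitor(a,b);          % 00001110
xorBits = bitxor(a,b);        % 00000110
notBits = bitcmp(a);          % 11110011

bit4 = bitget(uint8(185),4);  % fourth bit of 10111001
```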
Bit-wise operations can have surprising applications. For example, consider the 8-bit binary
representation of the number n = 8:
00001000
8 is a power of 2, so its binary representation contains a single 1. Now consider the number
n − 1 = 7:
00000111
By subtracting 1, all of the bits starting at the right-most 1 are flipped. As a result, when n is a power
of 2, corresponding digits of n and n − 1 are always different, and the bit-wise AND returns zero.
n = 0b1000;
bitand(n,n-1)
ans = uint8
0
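However, when n is not a power of 2, its binary form contains more than one 1. Subtracting 1 flips
only the right-most 1 and the bits to its right, so the higher 1 bits are the same in n and n − 1,
and the bit-wise AND returns a nonzero number. For n = 5, the right-most 1 is the 2^0 bit, so n and
n − 1 have all the same bits except for the 2^0 bit.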
n = 0b101;
bitand(n,n-1)
ans = uint8
4
This operation suggests a simple function that operates on the bits of a given input number to check
whether the number is a power of 2:
function tf = isPowerOfTwo(n)
tf = n && ~bitand(n,n-1);
end
The use of the short-circuit AND operator && checks to make sure that n is not zero. If it is, then the
function does not need to calculate bitand(n,n-1) to know that the correct answer is false.
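To try the check on several values without creating a function file, you can write the same logic as an anonymous function. This is a sketch; the names isPow2, vals, and flags are hypothetical, and the inputs are assumed to be unsigned integer scalars.

```matlab
% Same logic as isPowerOfTwo, written as an anonymous function
isPow2 = @(n) n ~= 0 && ~bitand(n,n-1);

vals = uint8([0 1 5 8 64 96]);
flags = arrayfun(isPow2,vals);    % apply the check to each value
```

Only 1, 8, and 64 are reported as powers of 2. The value 0 is rejected by the first operand of &&.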
Shifting Bits
Because bit-wise logical operations compare corresponding bits in two numbers, it is useful to be able
to move the bits around to change which bits are compared. You can use bitshift to perform this
operation:
• bitshift(A,N) shifts the bits of A to the left by N digits. This is equivalent to multiplying A by
2^N.
• bitshift(A,-N) shifts the bits of A to the right by N digits. This is equivalent to dividing A by
2^N.
These operations are sometimes written A<<N (left shift) and A>>N (right shift), but MATLAB does not
use << and >> operators for this purpose.
When the bits of a number are shifted, some bits fall off the end of the number, and 0s or 1s are
introduced to fill in the newly created space. When you shift bits to the left, the bits are filled in on
the right; when you shift bits to the right, the bits are filled in on the left.
For example, if you shift the bits of the number 8 (binary: 1000) to the right by one digit, you get 4
(binary: 100).
n = 0b1000;
bitshift(n,-1)
ans = uint8
4
Similarly, if you shift the number 15 (binary: 1111) to the left by two digits, you get 60 (binary:
111100).
n = 0b1111;
bitshift(n,2)
ans = uint8
60
When you shift the bits of a negative number, bitshift preserves the sign bit. For example, if you
shift the signed integer -3 (binary: 11111101) to the right by 2 digits, you get -1 (binary: 11111111).
In these cases, bitshift fills in on the left with 1s rather than 0s.
n = 0b11111101s8;
bitshift(n,-2)
ans = int8
-1
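The following sketch collects the shifting cases and adds one where bits fall off the left end. In that case the result no longer equals the input multiplied by a power of 2. The variable names are hypothetical.

```matlab
left = bitshift(uint8(15),2);        % 00001111 -> 00111100
right = bitshift(uint8(8),-1);       % 00001000 -> 00000100

% The top bit falls off an 8-bit integer, so this is not 255*2
overflow = bitshift(uint8(255),1);   % 11111111 -> 11111110

% Right shift of a negative signed integer fills with 1s on the left
neg = bitshift(int8(-3),-2);         % 11111101 -> 11111111
```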
Writing Bits
You can use the bitset function to change the bits in a number. For example, change the first bit of
the number 8 to a 1 (which adds 1 to the number):
bitset(8,1)
ans = 9
By default, bitset sets the specified bit to on, or 1. You can optionally use the third input argument to specify the
bit value.
bitset does not change multiple bits at once, so you need to use a for loop to change multiple bits.
Therefore, the bits you change can be either consecutive or nonconsecutive. For example, change the
first two bits of the binary number 1000:
bits = [1 2];
c = 0b1000;
for k = 1:numel(bits)
c = bitset(c,bits(k));
end
dec2bin(c)
ans =
'1011'
Another common use of bitset is to convert a vector of binary digits into decimal format. For
example, use a loop to set the individual bits of the integer 11001101.
data = [1 1 0 0 1 1 0 1];
n = length(data);
dec = 0b0u8;
for k = 1:n
dec = bitset(dec,n+1-k,data(k));
end
dec
dec = uint8
205
dec2bin(dec)
ans =
'11001101'
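As a cross-check on the loop, the same conversion can be sketched arithmetically by weighting each digit with its power of 2. The names weights, decArith, and decBits are hypothetical.

```matlab
data = [1 1 0 0 1 1 0 1];
n = numel(data);

% Arithmetic route: the left-most digit has weight 2^(n-1)
weights = 2.^(n-1:-1:0);
decArith = sum(data .* weights);

% Bit-wise route, as in the loop above
decBits = uint8(0);
for k = 1:n
    decBits = bitset(decBits,n+1-k,data(k));
end
```

Both routes give 205.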
Another use of bit shifting is to isolate consecutive sections of bits. For example, read the last four
bits in the 16-bit number 0110000010100000. Recall that the last four bits are on the left of the
binary representation.
n = 0b0110000010100000;
dec2bin(bitshift(n,-12))
ans =
'110'
To isolate consecutive bits in the middle of the number, you can combine the use of bit shifting with
logical masking. For example, to extract the 13th and 14th bits, you can shift the bits to the right by
12 and then mask the resulting four bits with 0011. Because the inputs to bitand must be the same
integer data type, you can specify 0011 as an unsigned 16-bit integer with 0b11u16. Without the
u16 suffix, MATLAB stores the number as an unsigned 8-bit integer.
m = 0b11u16;
dec2bin(bitand(bitshift(n,-12),m))
ans =
'10'
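The shift-and-mask pattern generalizes to any run of consecutive bits. In this sketch, lowBit and width are hypothetical parameter names for the position of the lowest bit to extract and the number of bits in the field.

```matlab
n = uint16(hex2dec('60A0'));      % binary 0110000010100000

lowBit = 13;                      % lowest bit of the field (1-based)
width = 2;                        % number of bits in the field

mask = uint16(2^width - 1);       % width ones: 0011 for width = 2
field = bitand(bitshift(n,-(lowBit-1)),mask);

fieldBits = dec2bin(field);       % bits 14 and 13 of n
```

Setting lowBit to 13 and width to 2 reproduces the result '10' from the previous example.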
Another way to read consecutive bits is with bitget, which reads specified bits from a number. You
can use colon notation to specify several consecutive bits to read. For example, read the last 8 bits of
n.
bitget(n,16:-1:9)
0 1 1 0 0 0 0 0
You can also use bitget to read bits from a number when the bits are not next to each other. For
example, read the 5th, 8th, and 14th bits from n.
bitget(n,[14 8 5])
1 1 0
See Also
bitand | bitor | bitxor | bitget | bitset | bitshift | bitcmp
More About
• “Integers” on page 4-2
• “Perform Cyclic Redundancy Check” on page 2-44
• “Hexadecimal and Binary Values” on page 6-55
1101100111011010
To obtain the check value, divide this number by the polynomial x^3 + x^2 + x + 1. You can represent
this polynomial with its coefficients: 1111.
The division is performed in steps, and after each step the polynomial divisor is aligned with the
leftmost 1 in the number. Because the result of dividing by the four-term polynomial has three bits (in
general, dividing by a polynomial of length n + 1 produces a check value of length n), append 000 to
the number to calculate the remainder. At each step, the result uses the bit-wise XOR of the four
bits being operated on, and all other bits are unchanged.
1101100111011010 000
1111
----------------
0010100111011010 000
Each successive division operates on the result of the previous step, so the second division is
0010100111011010 000
  1111
----------------
0001010111011010 000
The division is completed once the dividend is all zeros. The complete division, including the above
two steps, is
1101100111011010 000
1111
0010100111011010 000
  1111
0001010111011010 000
   1111
0000101111011010 000
    1111
0000010011011010 000
     1111
0000001101011010 000
      1111
0000000010011010 000
        1111
0000000001101010 000
         1111
0000000000010010 000
           1111
0000000000001100 000
            1111
0000000000000011 000
              11 11
0000000000000000 110
The remainder bits, 110, are the check value for this message.
In MATLAB®, you can perform this same operation to obtain the check value using bit-wise
operations. First, define variables for the message and polynomial divisor. Use unsigned 32-bit
integers so that extra bits are available for the remainder.
message = 0b1101100111011010u32;
messageLength = 16;
divisor = 0b1111u32;
divisorDegree = 3;
Next, initialize the polynomial divisor. Use dec2bin to display the bits of the result.
divisor = bitshift(divisor,messageLength-divisorDegree-1);
dec2bin(divisor)
ans =
'1111000000000000'
Now, shift the divisor and message so that they have the correct number of bits (16 bits for the
message and 3 bits for the remainder).
divisor = bitshift(divisor,divisorDegree);
remainder = bitshift(message,divisorDegree);
dec2bin(divisor)
ans =
'1111000000000000000'
dec2bin(remainder)
ans =
'1101100111011010000'
Perform the division steps of the CRC using a for loop. The for loop always advances a single bit
each step, so include a check to see if the current digit is a 1. If the current digit is a 1, then the
division step is performed; otherwise, the loop advances a bit and continues.
for k = 1:messageLength
if bitget(remainder,messageLength+divisorDegree)
remainder = bitxor(remainder,divisor);
end
remainder = bitshift(remainder,1);
end
Shift the bits of the remainder to the right to get the check value for the operation.
CRC_check_value = bitshift(remainder,-messageLength);
dec2bin(CRC_check_value)
ans =
'110'
You can use the check value to verify the integrity of a message by repeating the same division
operation. However, instead of using a remainder of 000 to start, use the check value 110. If the
message is error free, then the result of the division will be zero.
Reset the remainder variable, and add the CRC check value to the remainder bits using a bit-wise OR.
Introduce an error into the message by flipping one of the bit values with bitset.
remainder = bitshift(message,divisorDegree);
remainder = bitor(remainder,CRC_check_value);
remainder = bitset(remainder,6);
dec2bin(remainder)
ans =
'1101100111011110110'
Perform the CRC division operation and then check if the result is zero.
for k = 1:messageLength
if bitget(remainder,messageLength+divisorDegree)
remainder = bitxor(remainder,divisor);
end
remainder = bitshift(remainder,1);
end
if remainder == 0
disp('Message is error free.')
else
disp('Message contains errors.')
end
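For comparison, the same check can be run on the message without the flipped bit. This is a minimal, self-contained sketch that reuses the message, polynomial, and check value 110 from this example; the names checkValue and isErrorFree are illustrative.

```matlab
message = 0b1101100111011010u32;
messageLength = 16;
divisorDegree = 3;
% Align the divisor 1111 with the top bits of the 19-bit working value
divisor = bitshift(0b1111u32, messageLength-1);
checkValue = 0b110u32;
% Append the check value to the message instead of 000
remainder = bitor(bitshift(message,divisorDegree), checkValue);
for k = 1:messageLength
    if bitget(remainder, messageLength+divisorDegree)
        remainder = bitxor(remainder, divisor);
    end
    remainder = bitshift(remainder,1);
end
isErrorFree = (remainder == 0)
```

With no bit flipped, the division leaves a zero remainder, so isErrorFree is true.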
References
[1] Sklar, Bernard. Digital Communications: Fundamentals and Applications. Englewood Cliffs, NJ:
Prentice Hall, 1988.
[2] Wicker, Stephen B. Error Control Systems for Digital Communication and Storage. Upper Saddle
River, NJ: Prentice Hall, 1995.
See Also
bitshift | bitxor
More About
• “Bit-Wise Operations” on page 2-38
• “Hexadecimal and Binary Values” on page 6-55
Conditional Statements
Conditional statements enable you to select at run time which block of code to execute. The simplest
conditional statement is an if statement. For example:
% If it is even, divide by 2
if rem(a, 2) == 0
disp('a is even')
b = a/2;
end
if statements can include alternate choices, using the optional keywords elseif or else. For
example:
a = randi(100, 1);
if a < 30
disp('small')
elseif a < 80
disp('medium')
else
disp('large')
end
Alternatively, when you want to test for equality against a set of known values, use a switch
statement. For example:
switch dayString
case 'Monday'
disp('Start of the work week')
case 'Tuesday'
disp('Day 2')
case 'Wednesday'
disp('Day 3')
case 'Thursday'
disp('Day 4')
case 'Friday'
disp('Last day of the work week')
otherwise
disp('Weekend!')
end
For both if and switch, MATLAB executes the code corresponding to the first true condition, and
then exits the code block. Each conditional statement requires the end keyword.
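A case can also list several values in a cell array, which keeps a switch short when several values share one branch. The following is a small sketch; the input value and the variable kind are hypothetical.

```matlab
dayString = 'Sunday';              % hypothetical input
switch dayString
    case {'Saturday','Sunday'}     % matches either value
        kind = 'weekend';
    case {'Monday','Tuesday','Wednesday','Thursday','Friday'}
        kind = 'weekday';
    otherwise
        kind = 'unknown';
end
disp(kind)
```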
In general, when you have many possible discrete, known values, switch statements are easier to
read than if statements. However, you cannot test for inequality between switch and case values.
For example, you cannot implement this type of condition with a switch:
if yourNumber < 0
disp('Negative')
elseif yourNumber > 0
disp('Positive')
else
disp('Zero')
end
See Also
if | switch | end | return
Loop Control Statements
• for statements loop a specific number of times, and keep track of each iteration with an
incrementing index variable.
x = ones(1,10);
for n = 2:6
x(n) = 2 * x(n - 1);
end
• while statements loop as long as a condition remains true.
For example, find the first integer n for which factorial(n) is at least 1e100 (a number with more than 100 digits):
n = 1;
nFactorial = 1;
while nFactorial < 1e100
n = n + 1;
nFactorial = nFactorial * n;
end
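The same search can be sketched without an explicit loop by using cumprod and find. The upper bound of 100 is an assumption, chosen only to be large enough for the threshold to be reached.

```matlab
factorials = cumprod(1:100);            % factorials(k) equals factorial(k)
n = find(factorials >= 1e100, 1)        % first index that reaches the threshold
```

The while loop is still the better fit when no safe upper bound is known in advance.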
It is a good idea to indent the loops for readability, especially when they are nested (that is, when one
loop contains another loop):
A = zeros(5,100);
for m = 1:5
for n = 1:100
A(m, n) = 1/(m + n - 1);
end
end
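For simple element-wise formulas like this one, the nested loops can often be replaced by a vectorized expression. The sketch below relies on implicit expansion of a column vector and a row vector (see "Compatible Array Sizes for Basic Operations"); the variable names rowIdx, colIdx, and B are illustrative.

```matlab
rowIdx = (1:5)';                        % 5-by-1 column of m values
colIdx = 1:100;                         % 1-by-100 row of n values
B = 1 ./ (rowIdx + colIdx - 1);         % expands to a 5-by-100 matrix
size(B)
```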
You can programmatically exit a loop using a break statement, or skip to the next iteration of a loop
using a continue statement. For example, count the number of lines in the help for the magic
function (that is, all comment lines until a blank line):
fid = fopen('magic.m','r');
count = 0;
while ~feof(fid)
line = fgetl(fid);
if isempty(line)
break
elseif ~strncmp(line,'%',1)
continue
end
count = count + 1;
end
fprintf('%d lines in MAGIC help\n',count);
fclose(fid);
Tip If you inadvertently create an infinite loop (a loop that never ends on its own), stop execution of
the loop by pressing Ctrl+C.
See Also
for | while | break | continue | end
Regular Expressions
In this section...
“What Is a Regular Expression?” on page 2-51
“Steps for Building Expressions” on page 2-52
“Operators and Characters” on page 2-55
This topic describes what regular expressions are and how to use them to search text. Regular
expressions are flexible and powerful, though they use complex syntax. An alternative to regular
expressions is a pattern (since R2020b), which is simpler to define and results in code that is easier
to read. For more information, see “Build Pattern Expressions” on page 6-40.
The character vector 'Joh?n\w*' is an example of a regular expression. It defines a pattern that
starts with the letters Jo, is optionally followed by the letter h (indicated by 'h?'), is then followed
by the letter n, and ends with any number of word characters, that is, characters that are alphabetic,
numeric, or underscore (indicated by '\w*'). This pattern matches, for example, Jon, John, Jonathan, and Johnny.
Regular expressions provide a unique way to search a volume of text for a particular subset of
characters within that text. Instead of looking for an exact character match as you would do with a
function like strfind, regular expressions give you the ability to look for a particular pattern of
characters. For example, a metric rate of speed can be written in several ways:
km/h
km/hr
km/hour
kilometers/hour
kilometers per hour
You could locate any of the above terms in your text by issuing five separate search commands:
strfind(text, 'km/h');
strfind(text, 'km/hour');
% etc.
To be more efficient, however, you can build a single phrase that applies to all of these search terms:
Translate this phrase into a regular expression (to be explained later in this section) and you have:
pattern = 'k(ilo)?m(eters)?(/|\sper\s)h(r|our)?';
Now locate one or more of the terms using just a single command:
ans =
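A minimal sketch of that single command, assuming a hypothetical sample sentence stored in txt:

```matlab
pattern = 'k(ilo)?m(eters)?(/|\sper\s)h(r|our)?';
% Hypothetical sample text containing two of the spellings
txt = 'Speed limits: 50 km/h in town, 100 kilometers per hour outside.';
found = regexp(txt, pattern, 'match')
```

The 'match' option returns the matched text itself rather than the starting indices, so found holds both spellings located by the one expression.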
There are four MATLAB functions that support searching and replacing characters using regular
expressions. The first three are similar in the input values they accept and the output values they
return. For details, see the function reference pages.
Function          Description
regexp            Match regular expression.
regexpi           Match regular expression, ignoring case.
regexprep         Replace part of text using regular expression.
regexptranslate   Translate text into regular expression.
When calling any of the first three functions, pass the text to be parsed and the regular expression in
the first two input arguments. When calling regexprep, pass an additional input that is an
expression that specifies a pattern for the replacement.
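The calling pattern of the four functions can be sketched as follows. The sample text and variable names are hypothetical and serve only to show the argument order.

```matlab
txt = 'The Quick brown fox';
startIdx  = regexp(txt, 'Quick');               % case-sensitive search
startIdxI = regexpi(txt, 'quick');              % same search, ignoring case
replaced  = regexprep(txt, 'brown', 'red');     % third input is the replacement
escaped   = regexptranslate('escape', '1.5');   % escape special characters
```

Here regexptranslate turns the dot in '1.5' into a literal dot, which is useful when search text comes from user input.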
This entails breaking up the text you want to search for into groups of like character types. These
character types could be a series of lowercase letters, a dollar sign followed by three numbers
and then a decimal point, etc.
2 Express each pattern as a regular expression on page 2-53
Use the metacharacters and operators described in this documentation to express each segment
of your search pattern as a regular expression. Then combine these expression segments into the
single expression to use in the search.
3 Call the appropriate search function on page 2-54
Pass the text you want to parse to one of the search functions, such as regexp or regexpi, or to
the text replacement function, regexprep.
The example shown in this section searches a record containing contact information belonging to a
group of five friends. This information includes each person's name, telephone number, place of
residence, and email address. The goal is to extract specific information from the text.
contacts = { ...
'Harry 287-625-7315 Columbus, OH [email protected]'; ...
'Janice 529-882-1759 Fresno, CA [email protected]'; ...
'Mike 793-136-0975 Richmond, VA [email protected]'; ...
'Nadine 648-427-9947 Tampa, FL [email protected]'; ...
'Jason 697-336-7728 Montrose, CO [email protected]'};
The first part of the example builds a regular expression that represents the format of a standard
email address. Using that expression, the example then searches the information for the email
address of one of the group of friends. Contact information for Janice is in row 2 of the contacts cell
array:
contacts{2}
ans =
A typical email address is made up of standard components: the user's account name, followed by an
@ sign, the name of the user's internet service provider (ISP), a dot (period), and the domain to which
the ISP belongs. The table below lists these components in the left column, and generalizes the
format of each component in the right column.
In this step, you translate the general formats derived in Step 1 into segments of a regular
expression. You then add these segments together to form the entire expression.
The table below shows the generalized format descriptions of each character pattern in the left-most
column. (This was carried forward from the right column of the table in Step 1.) The second column
shows the operators or metacharacters that represent the character pattern.
Assembling these patterns into one character vector gives you the complete expression:
email = '[a-z_]+@[a-z]+\.(com|net)';
In this step, you use the regular expression derived in Step 2 to match an email address for one of the
friends in the group. Use the regexp function to perform the search.
Here is the list of contact information shown earlier in this section. Each person's record occupies a
row of the contacts cell array:
contacts = { ...
'Harry 287-625-7315 Columbus, OH [email protected]'; ...
'Janice 529-882-1759 Fresno, CA [email protected]'; ...
'Mike 793-136-0975 Richmond, VA [email protected]'; ...
'Nadine 648-427-9947 Tampa, FL [email protected]'; ...
'Jason 697-336-7728 Montrose, CO [email protected]'};
This is the regular expression that represents an email address, as derived in Step 2:
email = '[a-z_]+@[a-z]+\.(com|net)';
Call the regexp function, passing row 2 of the contacts cell array and the email regular
expression. This returns the email address for Janice.
regexp(contacts{2}, email, 'match')
ans =
MATLAB parses a character vector from left to right, “consuming” the vector as it goes. If matching
characters are found, regexp records the location and resumes parsing the character vector, starting
just after the end of the most recent match.
Make the same call, but this time for the fifth person in the list:
regexp(contacts{5}, email, 'match')
ans =
You can also search for the email address of everyone in the list by using the entire cell array for the
input argument:
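A minimal sketch of the cell-array form follows. The two records use made-up addresses at example.com and example.net, because they are placeholders rather than the addresses from the original list.

```matlab
% Hypothetical records with placeholder email addresses
contacts = { ...
    'Janice 529-882-1759 Fresno, CA janice_s@example.com'; ...
    'Jason 697-336-7728 Montrose, CO jason_b@example.net'};
email = '[a-z_]+@[a-z]+\.(com|net)';
% 'once' returns the first match per record as a character vector
emails = regexp(contacts, email, 'match', 'once')
```

Without 'once', regexp returns one nested cell array per record, which is the more general form when a record can hold several addresses.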
Metacharacters
Metacharacters represent letters, letter ranges, digits, and space characters. Use them to construct a
generalized pattern of characters.
Character Representation
Operator   Description
\a         Alarm (beep)
\b         Backspace
\f         Form feed
\n         New line
\r         Carriage return
\t         Horizontal tab
\v         Vertical tab
\char      Any character with special meaning in regular expressions that you want to
           match literally (for example, use \\ to match a single backslash)
Quantifiers
Quantifiers specify the number of times a pattern must occur in the matching text.
{0,1} is equivalent to ?.
expr{m,}   At least m times consecutively. {0,} and {1,} are equivalent to * and +, respectively.
           Example: '<a href="\w{1,}\.html">' matches an <a> HTML tag when the file name
           contains one or more characters.
expr{n}    Exactly n times consecutively. Equivalent to {n,n}.
           Example: '\d{4}' matches four consecutive digits.
Quantifiers can appear in three modes, described in the following table. q represents any of the
quantifiers in the previous table.
'<tr><td><p>text</p></td>'
exprq?     Lazy expression: match as few characters as necessary.
           Example: Given the text '<tr><td><p>text</p></td>', the expression '</?t.*?>'
           ends each match at the first occurrence of the closing angle bracket (>):
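The difference between the greedy and lazy modes can be sketched on the same text. The variable names are illustrative.

```matlab
txt = '<tr><td><p>text</p></td>';
% Greedy: .* runs to the last closing angle bracket in the text
greedy = regexp(txt, '</?t.*>', 'match');
% Lazy: .*? stops at the first closing angle bracket after each start
lazy = regexp(txt, '</?t.*?>', 'match');
celldisp(greedy)
celldisp(lazy)
```

The greedy form returns the whole text as a single match, while the lazy form returns the three tags that begin with t.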
Grouping Operators
Grouping operators allow you to capture tokens, apply one operator to multiple elements, or disable
backtracking in a specific group.
Anchors
Anchors in the expression match the beginning or end of a character vector or word.
Lookaround Assertions
Lookaround assertions look for patterns that immediately precede or follow the intended match, but
are not part of the match.
The pointer remains at the current location, and characters that correspond to the test expression
are not captured or discarded. Therefore, lookahead assertions can match overlapping character
groups.
If you specify a lookahead assertion before an expression, the operation is equivalent to a logical AND.
For more information, see “Lookahead Assertions in Regular Expressions” on page 2-63.
Logical and conditional operators allow you to test the state of a given condition, and then use the
outcome to determine which pattern, if any, to match next. These operators support logical OR and if
or if/else conditions. (For AND conditions, see “Lookaround Assertions” on page 2-58.)
Conditions can be tokens on page 2-60, lookaround assertions on page 2-58, or dynamic expressions
on page 2-60 of the form (?@cmd). Dynamic expressions must return a logical or numeric value.
Token Operators
Tokens are portions of the matched text that you define by enclosing part of the regular expression in
parentheses. You can refer to a token by its sequence in the text (an ordinal token), or assign names
to tokens for easier code maintenance and readable output.
Note If an expression has nested parentheses, MATLAB captures tokens that correspond to the
outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB
creates a token for 'andrew' but not for 'y' or 'rew'.
Dynamic Expressions
Dynamic expressions allow you to execute a MATLAB command or a regular expression to determine
the text to match.
The parentheses that enclose dynamic expressions do not create a capturing group.
Within dynamic expressions, use the following operators to define replacement terms.
Comments
The comment operator enables you to insert comments into your code to make it more maintainable.
The text of the comment is ignored by MATLAB when matching against the input text.
Search Flags
Flag    Description
(?-i)   Match letter case (default for regexp and regexprep).
(?i)    Do not match letter case (default for regexpi).
(?s)    Match dot (.) in the pattern with any character (default).
(?-s)   Match dot in the pattern with any character that is not a newline character.
(?-m)   Match the ^ and $ metacharacters at the beginning and end of text (default).
(?m)    Match the ^ and $ metacharacters at the beginning and end of a line.
(?-x)   Include space characters and comments when matching (default).
(?x)    Ignore space characters and comments when matching. Use '\ ' and '\#' to
        match space and # characters.
The expression that the flag modifies can appear either after the parentheses, such as
(?i)\w*
or inside the parentheses and separated from the flag with a colon (:), such as
(?i:\w*)
The latter syntax allows you to change the behavior for part of a larger expression.
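The two placements of a flag can be compared in a short sketch. The sample text is hypothetical.

```matlab
txt = 'ABC abc';
% Flag before the expression: the whole pattern ignores case
wholeExpr = regexp(txt, '(?i)abc', 'match');
% Flag inside parentheses: only the letter a ignores case
partExpr = regexp(txt, '(?i:a)bc', 'match');
```

In the second call, bc must still appear in lowercase, so only the second word matches.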
See Also
regexp | regexpi | regexprep | regexptranslate | pattern
More About
• “Lookahead Assertions in Regular Expressions” on page 2-63
• “Tokens in Regular Expressions” on page 2-66
• “Dynamic Regular Expressions” on page 2-72
• “Search and Replace Text” on page 6-37
Lookahead Assertions in Regular Expressions
Lookahead Assertions
There are two types of lookaround assertions for regular expressions: lookahead and lookbehind. In
both cases, the assertion is a condition that must be satisfied to return a match to the expression.
A lookahead assertion has the form (?=test) and can appear anywhere in a regular expression.
MATLAB looks ahead of the current location in the text for the test condition. If MATLAB matches the
test condition, it continues processing the rest of the expression to find a match.
For example, look ahead in a character vector specifying a path to find the name of the folder that
contains a program file (in this case, fileread.m).
chr = which('fileread')
chr =
'matlabroot\toolbox\matlab\iofun\fileread.m'
regexp(chr,'\w+(?=\\\w+\.[mp])','match')
ans =
{'iofun'}
The match expression, \w+, searches for one or more alphanumeric or underscore characters. Each
time regexp finds a term that matches this condition, it looks ahead for a backslash (specified with
two backslashes, \\), followed by a file name (\w+) with an .m or .p extension (\.[mp]). The
regexp function returns the match that satisfies the lookahead condition, which is the folder name
iofun.
Overlapping Matches
Lookahead assertions do not consume any characters in the text. As a result, you can use them to find
overlapping character sequences.
For example, use lookahead to find every sequence of six nonwhitespace characters in a character
vector by matching initial characters that precede five additional characters:
chr = 'Locate several 6-char. phrases';
startIndex = regexpi(chr,'\S(?=\S{5})')
startIndex =
1 8 9 16 17 24 25
Without the lookahead operator, MATLAB parses a character vector from left to right, consuming the
vector as it goes. If matching characters are found, regexp records the location and resumes parsing
the character vector from the location of the most recent match. There is no overlapping of
characters in this process.
chr = 'Locate several 6-char. phrases';
startIndex = regexpi(chr,'\S{6}')
startIndex =
1 8 16 24
chr =
Merely searching for non-vowels ([^aeiou]) does not return the expected answer, as the output
includes capital letters, space characters, and punctuation:
c = regexp(chr,'[^aeiou]','match')
c =
Columns 1 through 14
{' '} {'N'} {'O'} {'R'} {'M'} {'E'} {'S'} {'T'} {' '} {'E'} {'s
Columns 15 through 28
{' '} {'t'} {'h'} {' '} {'m'} {'t'} {'r'} {'x'} {' '} {'2'} {'-
Columns 29 through 42
{'.'} {'↵'} {' '} {' '} {' '} {' '} {'N'} {'O'} {'R'} {'M'} {'E
Column 43
{'S'}
Try this again, using a lookahead operator to create the following AND condition:
c = regexp(chr,'(?=[a-z])[^aeiou]','match')
c =
{'s'} {'t'} {'m'} {'t'} {'t'} {'h'} {'m'} {'t'} {'r'} {'x'} {'n
Note that when using a lookahead operator to perform an AND, you need to place the match
expression expr after the test expression test:
(?=test)expr or (?!test)expr
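A small sketch of the AND form on a short, hypothetical text: the lookahead requires a lowercase letter, and the match expression then excludes vowels.

```matlab
txt = 'Norm X2 test';                  % hypothetical sample text
% Test first, then the expression: lowercase AND not a vowel
consonants = regexp(txt, '(?=[a-z])[^aeiou]', 'match');
```

Uppercase letters, digits, and spaces fail the lookahead test, so only the lowercase consonants remain.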
See Also
regexp | regexpi | regexprep
More About
• “Regular Expressions” on page 2-51
Introduction
Parentheses used in a regular expression not only group elements of that expression together, but
also designate any matches found for that group as tokens. You can use tokens to match other parts
of the same text. One advantage of using tokens is that they remember what they matched, so you
can recall and reuse matched text in the process of searching or replacing.
Each token in the expression is assigned a number, starting from 1, going from left to right. To make
a reference to a token later in the expression, refer to it using a backslash followed by the token
number. For example, when referencing a token generated by the third set of parentheses in the
expression, use \3.
As a simple example, if you wanted to search for identical sequential letters in a character array, you
could capture the first letter as a token and then search for a matching character immediately
afterwards. In the expression shown below, the (\S) phrase creates a token whenever regexp
matches any nonwhitespace character in the character array. The second part of the expression,
'\1', looks for a second instance of the same character immediately following the first.
poe = ['While I nodded, nearly napping, ' ...
'suddenly there came a tapping,'];
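[mat,tok,ext] = regexp(poe, '(\S)\1', 'match', ...
    'tokens', 'tokenExtents');
mat
mat =
{'dd'} {'pp'} {'dd'} {'pp'}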
The cell array tok contains cell arrays that each contain a token.
tok{:}
ans =
{'d'}
ans =
{'p'}
ans =
{'d'}
ans =
{'p'}
The cell array ext contains numeric arrays that each contain starting and ending indices for a token.
ext{:}
ans =
11 11
ans =
26 26
ans =
35 35
ans =
57 57
For another example, capture pairs of matching HTML tags (e.g., <a> and </a>) and the text
between them. The expression used for this example is
expr = '<(\w+).*?>.*?</\1>';
The first part of the expression, '<(\w+)', matches an opening angle bracket (<) followed by one or
more alphabetic, numeric, or underscore characters. The enclosing parentheses capture token
characters following the opening angle bracket.
The second part of the expression, '.*?>.*?', matches the remainder of this HTML tag (characters
up to the >), and any characters that may precede the next opening angle bracket.
The last part, '</\1>', matches all characters in the ending HTML tag. This tag is composed of the
sequence </tag>, where tag is whatever characters were captured as a token.
ans =
'<a name="752507"></a>'
ans =
'<b>Default</b>'
tok{:}
ans =
{'a'}
ans =
{'b'}
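The same expression can be tried on a short fragment. In this sketch, hstr is a hypothetical HTML string and not the text used for the output above.

```matlab
hstr = '<i>one</i> and <b>two</b>';    % hypothetical HTML fragment
expr = '<(\w+).*?>.*?</\1>';
[mat,tok] = regexp(hstr, expr, 'match', 'tokens');
mat{:}
tok{:}
```

Each match runs from an opening tag to the closing tag with the same name, and the corresponding token holds that tag name.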
Multiple Tokens
Here is an example of how tokens are assigned values. Suppose that you are going to search the
following text:
You choose to search the above text with the following search pattern:
and(y|rew)|(t)e(d)
This pattern has three parenthetical expressions that generate tokens. When you finally perform the
search, the following tokens are generated for each match.
Only the highest level parentheses are used. For example, if the search pattern and(y|rew) finds the
text andrew, token 1 is assigned the value rew. However, if the search pattern (and(y|rew)) is
used, token 1 is assigned the value andrew.
Unmatched Tokens
For those tokens specified in the regular expression that have no match in the text being evaluated,
regexp and regexpi return an empty character vector ('') as the token output, and an extent that
marks the position in the string where the token was expected.
The example shown here executes regexp on a character vector specifying the path returned from
the MATLAB tempdir function. The regular expression expr includes six token specifiers, one for
each piece of the path. The third specifier [a-z]+ has no match in the character vector because this
part of the path, Profiles, begins with an uppercase letter:
chr = tempdir
chr =
'C:\WINNT\Profiles\bpascal\LOCALS~1\Temp\'
When a token is not found in the text, regexp returns an empty character vector ('') as the token
and a numeric array with the token extent. The first number of the extent is the string index that
marks where the token was expected, and the second number of the extent is equal to one less than
the first.
In the case of this example, the empty token is the third specified in the expression, so the third token
returned is empty:
tok{:}
ans =
The third token extent returned in the variable ext has the starting index set to 10, which is where
the nonmatching term, Profiles, begins in the path. The ending extent index is set to one less than
the starting index, or 9:
ext{:}
ans =
1 2
4 8
10 9
19 25
27 34
36 39
second, $2, is 'Baker'. Note that regexprep returns the modified text, not a vector of starting
indices.
regexprep('Norma Jean Baker', '(\w+\s\w+)\s(\w+)', '$2, $1')
ans =
'Baker, Norma Jean'
Named Capture
If you use a lot of tokens in your expressions, it may be helpful to assign them names rather than
having to keep track of which token number is assigned to which token.
When referencing a named token within the expression, use the syntax \k<name> instead of the
numeric \1, \2, etc.:
poe = ['While I nodded, nearly napping, ' ...
'suddenly there came a tapping,'];
ans =
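A minimal sketch of a named back-reference on the same text. The token name ch is a hypothetical choice; any valid name works.

```matlab
poe = ['While I nodded, nearly napping, ' ...
    'suddenly there came a tapping,'];
% (?<ch>\S) names the captured character; \k<ch> refers back to it
doubled = regexp(poe, '(?<ch>\S)\k<ch>', 'match');
```

This finds the same doubled letters as the numbered form '(\S)\1'.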
Named tokens can also be useful in labeling the output from the MATLAB regular expression
functions. This is especially true when you are processing many pieces of text.
For example, parse different parts of street addresses from several character vectors. A short name is
assigned to each token in the expression:
chr1 = '134 Main Street, Boulder, CO, 14923';
chr2 = '26 Walnut Road, Topeka, KA, 25384';
chr3 = '847 Industrial Drive, Elizabeth, NJ, 73548';
p1 = '(?<adrs>\d+\s\S+\s(Road|Street|Avenue|Drive))';
p2 = '(?<city>[A-Z][a-z]+)';
p3 = '(?<state>[A-Z]{2})';
p4 = '(?<zip>\d{5})';
expr = [p1 ', ' p2 ', ' p3 ', ' p4];
As the following results demonstrate, you can make your output easier to work with by using named
tokens:
loc1 = regexp(chr1, expr, 'names')
loc1 =
loc2 =
loc3 =
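Because the 'names' option returns a structure, each named token can be read as a field. The sketch below loops over two of the addresses; joining the four patterns with ', ' is an assumption based on the comma-separated layout of the input text.

```matlab
addresses = {'134 Main Street, Boulder, CO, 14923', ...
             '847 Industrial Drive, Elizabeth, NJ, 73548'};
p1 = '(?<adrs>\d+\s\S+\s(Road|Street|Avenue|Drive))';
p2 = '(?<city>[A-Z][a-z]+)';
p3 = '(?<state>[A-Z]{2})';
p4 = '(?<zip>\d{5})';
expr = [p1 ', ' p2 ', ' p3 ', ' p4];   % assumed separator between the parts
cities = cell(size(addresses));
for k = 1:numel(addresses)
    loc = regexp(addresses{k}, expr, 'names');
    cities{k} = loc.city;              % read a token by its name
    fprintf('%s is in %s, %s\n', loc.adrs, loc.city, loc.state);
end
```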
See Also
regexp | regexpi | regexprep
More About
• “Regular Expressions” on page 2-51
In this section...
“Introduction” on page 2-72
“Dynamic Match Expressions — (??expr)” on page 2-73
“Commands That Modify the Match Expression — (??@cmd)” on page 2-73
“Commands That Serve a Functional Purpose — (?@cmd)” on page 2-74
“Commands in Replacement Expressions — ${cmd}” on page 2-76
Introduction
In a dynamic expression, you can make the pattern that you want regexp to match dependent on the
content of the input text. In this way, you can more closely match varying input patterns in the text
being parsed. You can also use dynamic expressions in replacement terms for use with the
regexprep function. This gives you the ability to adapt the replacement text to the parsed input.
You can include any number of dynamic expressions in the match_expr or replace_expr
arguments of these commands:
regexp(text, match_expr)
regexpi(text, match_expr)
regexprep(text, match_expr, replace_expr)
As an example of a dynamic expression, the following regexprep command correctly replaces the
term internationalization with its abbreviated form, i18n. However, to use it on a different
term such as globalization, you have to use a different replacement expression:
match_expr = '(^\w)(\w*)(\w$)';
replace_expr1 = '$118$3';
regexprep('internationalization', match_expr, replace_expr1)
ans =
'i18n'
replace_expr2 = '$111$3';
regexprep('globalization', match_expr, replace_expr2)
ans =
'g11n'
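With a dynamic expression, one replacement term handles both words, because
${num2str(length($2))} inserts the number of characters captured by the second token:
match_expr = '(^\w)(\w*)(\w$)';
replace_expr = '$1${num2str(length($2))}$3';
regexprep('internationalization', match_expr, replace_expr)
ans =
'i18n'
regexprep('globalization', match_expr, replace_expr)
ans =
'g11n'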
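Because the replacement is computed from each input, the same pair of expressions can be applied to several words at once. This is a small sketch; the third word, localization, is an added illustration.

```matlab
match_expr = '(^\w)(\w*)(\w$)';
replace_expr = '$1${num2str(length($2))}$3';
words = {'internationalization','globalization','localization'};
abbrev = regexprep(words, match_expr, replace_expr)
```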
When parsed, a dynamic expression must correspond to a complete, valid regular expression. In
addition, dynamic match expressions that use the backslash escape character (\) require two
backslashes: one for the initial parsing of the expression, and one for the complete match. The
parentheses that enclose dynamic expressions do not create a capturing group.
There are three forms of dynamic expressions that you can use in match expressions, and one form
for replacement expressions, as described in the following sections.
Here is an example of the type of expression that you can use with this operator:
chr = {'5XXXXX', '8XXXXXXXX', '1X'};
regexp(chr, '^(\d+)(??X{$1})$', 'match', 'once');
The purpose of this particular command is to locate a series of X characters in each of the character
vectors stored in the input cell array. Note however that the number of Xs varies in each character
vector. If the count did not vary, you could use the expression X{n} to indicate that you want to match
n of these characters. But, a constant value of n does not work in this case.
The solution used here is to capture the leading count number (e.g., the 5 in the first character vector
of the cell array) in a token, and then to use that count in a dynamic expression. The dynamic
expression in this example is (??X{$1}), where $1 is the value captured by the token \d+. The
operator {$1} makes a quantifier of that token value. Because the expression is dynamic, the same
pattern works on all three of the input vectors in the cell array. With the first input character vector,
regexp looks for five X characters; with the second, it looks for eight, and with the third, it looks for
just one:
regexp(chr, '^(\d+)(??X{$1})$', 'match', 'once')
ans =
{'5XXXXX'} {'8XXXXXXXX'} {'1X'}
For example, use the dynamic expression (??@fliplr($1)) to locate a palindrome, “Never Odd or
Even”, that has been embedded into a larger character vector.
First, create the input string. Make sure that all letters are lowercase, and remove all nonword
characters.
chr = lower(...
'Find the palindrome Never Odd or Even in this string');
chr = regexprep(chr,'\W*','')
chr =
'findthepalindromeneveroddoreveninthisstring'
Locate the palindrome within the character vector using the dynamic expression:
palindrome = regexp(chr, '(.{3,}).?(??@fliplr($1))', 'match')
palindrome =
{'neveroddoreven'}
The dynamic expression reverses the order of the letters that make up the character vector, and then
attempts to match as much of the reversed-order vector as possible. This requires a dynamic
expression because the value for $1 relies on the value of the token (.{3,}).
Dynamic expressions in MATLAB have access to the currently active workspace. This means that you
can change any of the functions or variables used in a dynamic expression just by changing variables
in the workspace. Repeat the last command of the example above, but this time define the function to
be called within the expression using a function handle stored in the base workspace:
fun = @fliplr;
palindrome = regexp(chr, '(.{3,}).?(??@fun($1))', 'match')
palindrome =
{'neveroddoreven'}
The following example parses a word for zero or more characters followed by two identical
characters followed again by zero or more characters:
regexp('mississippi', '\w*(\w)\1\w*', 'match')
ans =
{'mississippi'}
To track the exact steps that MATLAB takes in determining the match, the example inserts a short
script (?@disp($1)) in the expression to display the characters that finally constitute the match.
Because the example uses greedy quantifiers, MATLAB attempts to match as much of the character
vector as possible. So, even though MATLAB finds a match toward the beginning of the string, it
continues to look for more matches until it arrives at the very end of the string. From there, it backs
up through the letters i then p and the next p, stopping at that point because the match is finally
satisfied:
regexp('mississippi', '\w*(\w)(?@disp($1))\1\w*', 'match')
i
p
p
ans =
{'mississippi'}
Now try the same example again, this time making the first quantifier lazy (*?). Again, MATLAB
makes the same match:
regexp('mississippi', '\w*?(\w)\1\w*', 'match')
ans =
{'mississippi'}
But by inserting a dynamic script, you can see that this time, MATLAB has matched the text quite
differently. In this case, MATLAB uses the very first match it can find, and does not even consider the
rest of the text:
regexp('mississippi', '\w*?(\w)(?@disp($1))\1\w*', 'match')
m
i
s
ans =
{'mississippi'}
To demonstrate how versatile this type of dynamic expression can be, consider the next example that
progressively assembles a cell array as MATLAB iteratively parses the input text. The (?!) operator
found at the end of the expression is actually an empty lookahead operator, and forces a failure at
each iteration. This forced failure is necessary if you want to trace the steps that MATLAB is taking to
resolve the expression.
MATLAB makes a number of passes through the input text, each time trying another combination of
letters to see whether a better fit than the last match can be found. On any pass in which no match is
found, the test results in an empty character vector. The dynamic script (?@if(~isempty($&)))
serves to omit the empty character vectors from the matches cell array:
matches = {};
expr = ['(Euler\s)?(Cauchy\s)?(Boole)?(?@if(~isempty($&)),' ...
'matches{end+1}=$&;end)(?!)'];
regexp('Euler Cauchy Boole', expr, 'once');
matches
matches =
{'Euler Cauchy Bo…'} {'Euler Cauchy '} {'Euler '} {'Cauchy Boole'} {'Cauchy '}
The operators $& (or the equivalent $0), $`, and $' refer to that part of the input text that is
currently a match, all characters that precede the current match, and all characters to follow the
current match, respectively. These operators are sometimes useful when working with dynamic
expressions, particularly those that employ the (?@cmd) operator.
This example parses the input text looking for the letter g. At each iteration through the text, regexp
compares the current character with g, and not finding it, advances to the next character. The
example tracks the progress of the scan through the text by marking the current location being parsed
with a ^ character.
(The $` and $' operators capture that part of the text that precedes and follows the current parsing
location. You need two single-quotation marks ($'') to express the sequence $' when it appears
within text.)
chr = 'abcdefghij';
expr = '(?@disp(sprintf(''starting match: [%s^%s]'',$`,$'')))g';
regexp(chr, expr, 'once');
In the regexprep call shown here, the replacement pattern is '${convertMe($1,$2)}'. In this
case, the entire replacement pattern is a dynamic expression:
regexprep('This highway is 125 miles long', ...
'(\d+\.?\d*)\W(\w+)', '${convertMe($1,$2)}');
The dynamic expression tells MATLAB to execute a function named convertMe using the two tokens
(\d+\.?\d*) and (\w+), derived from the text being matched, as input arguments in the call to
convertMe. The replacement pattern requires a dynamic expression because the values of $1 and $2
are generated at runtime.
The following example defines the file named convertMe that converts measurements from imperial
units to metric.
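function valout = convertMe(valin, units)
% Convert an imperial measurement, passed in as text, to its metric equivalent.
switch units
    case 'inches'
        fun = @(in) in .* 2.54;
        uout = 'centimeters';
    case 'miles'
        fun = @(mi) mi .* 1.6093;
        uout = 'kilometers';
    case 'pounds'
        fun = @(lb) lb .* 0.4536;
        uout = 'kilograms';
    case 'pints'
        fun = @(pt) pt .* 0.4731;
        uout = 'litres';
    case 'ounces'
        fun = @(oz) oz .* 28.35;
        uout = 'grams';
    otherwise
        % Fail with a clear message instead of an undefined-variable error.
        error('convertMe:unknownUnit', 'Unsupported unit: %s', units);
end
% The token arrives as text; str2double converts it without evaluating code.
val = fun(str2double(valin));
valout = [num2str(val) ' ' uout];
end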
At the command line, call the convertMe function from regexprep, passing in values for the
quantity to be converted and name of the imperial unit:
regexprep('This highway is 125 miles long', ...
'(\d+\.?\d*)\W(\w+)', '${convertMe($1,$2)}')
ans =
As with the (??@ ) operator discussed in an earlier section, the ${ } operator has access to
variables in the currently active workspace. The following regexprep command uses the array A
defined in the base workspace:
A = magic(3)
A =
8 1 6
3 5 7
4 9 2
ans =
See Also
regexp | regexpi | regexprep
More About
• “Regular Expressions” on page 2-51
Comma-Separated Lists
In this section...
“What Is a Comma-Separated List?” on page 2-79
“Generating a Comma-Separated List” on page 2-79
“Assigning Output from a Comma-Separated List” on page 2-81
“Assigning to a Comma-Separated List” on page 2-81
“How to Use the Comma-Separated Lists” on page 2-82
“Fast Fourier Transform Example” on page 2-84
ans =
ans =
ans =
Such a list, by itself, is not very useful. But when used with large and more complex data structures
like MATLAB structures and cell arrays, the comma-separated list can enable you to simplify your
MATLAB code.
Extracting multiple elements from a cell array yields a comma-separated list. Given a 4-by-6 cell array
as shown here
C = cell(4,6);
for k = 1:24
    C{k} = k*2;
end
C
C =
C{:,5}
ans =
34
ans =
36
ans =
38
ans =
40
C{1,5},C{2,5},C{3,5},C{4,5}
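Because C{:,5} expands to four separate values, it can be used anywhere a comma-separated list is accepted. The following is a minimal sketch of that idea; the variable names v and n are illustrative only.

```matlab
C = cell(4,6);
for k = 1:24
    C{k} = k*2;          % linear indexing fills the cell array column by column
end

v = [C{:,5}];            % same as [C{1,5},C{2,5},C{3,5},C{4,5}]
n = numel({C{:,5}});     % wrap the list in braces to count its elements
```

Here v collects the fifth column into the row vector [34 36 38 40], matching the four values displayed above.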
For structures, extracting a field of the structure that exists across one of its dimensions yields a
comma-separated list.
Start by converting the cell array used above into a 4-by-1 MATLAB structure with six fields: f1
through f6. Read field f5 for all rows and MATLAB returns a comma-separated list:
S = cell2struct(C,{'f1','f2','f3','f4','f5','f6'},2);
S.f5
ans =
34
ans =
36
ans =
38
ans =
40
S(1).f5,S(2).f5,S(3).f5,S(4).f5
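The list produced by S.f5 can be gathered into an ordinary array in the same way as a list from a cell array. This sketch assumes only the cell array and field names used above; vals and total are illustrative names.

```matlab
C = cell(4,6);
for k = 1:24
    C{k} = k*2;
end
S = cell2struct(C, {'f1','f2','f3','f4','f5','f6'}, 2);   % 4-by-1 struct array

vals = [S.f5];           % same as [S(1).f5,S(2).f5,S(3).f5,S(4).f5]
total = sum(vals);       % operate on the gathered values as one vector
```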
C = cell(4,6);
for k = 1:24
    C{k} = k*2;
end
[c1,c2,c3,c4,c5,c6] = C{1,1:6};
c5
c5 =
34
If you specify fewer output variables than the number of outputs returned by the expression, MATLAB
assigns the first N outputs to those N variables, and then discards any remaining outputs. In this next
example, MATLAB assigns C{1,1:3} to the variables c1, c2, and c3, and then discards C{1,4:6}:
[c1,c2,c3] = C{1,1:6};
S = cell2struct(C,{'f1','f2','f3','f4','f5','f6'},2);
[sf1,sf2,sf3] = S.f5;
sf3
sf3 =
38
You also can use the deal function for this purpose.
This example uses deal to overwrite each element in a comma-separated list. First create a list.
ans =
31 7
ans =
3 78
ans =
10 20
ans =
14 12
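The commands that produce the output above are not shown here. One way to obtain the same values is sketched below; the variable names c and s and the field name field1 are assumptions, and only the four vectors come from the displayed output.

```matlab
% Create a two-element list, then overwrite both elements in one statement.
c = cell(1,2);
c{1} = [31 7];
c{2} = [3 78];
[c{:}] = deal([10 20], [14 12]);   % deal copies each input to one list element

% The same pattern applies to a field of a structure array.
s = struct('field1', {[31 7], [3 78]});   % 1-by-2 struct array
[s.field1] = deal([10 20], [14 12]);
```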
This example does the same as the one above, but with a comma-separated list of vectors in a
structure field:
ans =
31 7
ans =
3 78
ans =
10 20
ans =
14 12
The following sections provide examples of using comma-separated lists with cell arrays. Each of
these examples applies to MATLAB structures as well.
Constructing Arrays
You can use a comma-separated list to enter a series of elements when constructing a matrix or array.
Note what happens when you insert a list of elements as opposed to adding the cell itself.
When you specify a list of elements with C{:, 5}, MATLAB inserts the four individual elements:
A = {'Hello',C{:,5},magic(4)}
A =
When you specify the C cell itself, MATLAB inserts the entire cell array:
A = {'Hello',C,magic(4)}
A =
Displaying Arrays
ans =
Hello
ans =
ans =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Concatenation
Putting a comma-separated list inside square brackets extracts the specified elements from the list
and concatenates them:
A = [C{:,5:6}]
A =
34 36 38 40 42 44 46 48
When writing the code for a function call, you enter the input arguments as a list with each argument
separated by a comma. If you have these arguments stored in a structure or cell array, then you can
generate all or part of the argument list from the structure or cell array instead. This can be
especially useful when passing in variable numbers of arguments.
X = -pi:pi/10:pi;
Y = tan(sin(X)) - sin(tan(X));
C = cell(2,3);
C{1,1} = 'LineWidth';
C{2,1} = 2;
C{1,2} = 'MarkerEdgeColor';
C{2,2} = 'k';
C{1,3} = 'MarkerFaceColor';
C{2,3} = 'g';
figure
plot(X,Y,'--rs',C{:})
MATLAB functions can also return more than one value to the caller. These values are returned in a
list with each value separated by a comma. Instead of listing each return value, you can use a comma-
separated list with a structure or cell array. This becomes more useful for those functions that have
variable numbers of return values.
C = cell(1,3);
[C{:}] = fileparts('work/mytests/strArrays.mat')
C =
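As a hedged sketch of what this call leaves in C, the three outputs of fileparts (path, name, and extension) land in the three cells, and the filled cell array can then supply inputs to another call. The variable rebuilt and the use of strcat are illustrative additions.

```matlab
C = cell(1,3);
[C{:}] = fileparts('work/mytests/strArrays.mat');   % path, name, extension

% C{2:3} expands to two input arguments of strcat.
rebuilt = strcat(C{1}, '/', C{2:3});
```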
fftshift uses vectors of indices to perform the swap. For the vector shown above, the index [1 2
3 4 5 6] is rearranged to form a new index [4 5 6 1 2 3]. The function then uses this index
vector to reposition the elements. For a multidimensional array, fftshift must construct an index
vector for each dimension. A comma-separated list makes this task much simpler.
function y = fftshift(x)
numDims = ndims(x);
idx = cell(1,numDims);
for k = 1:numDims
    m = size(x,k);
    p = ceil(m/2);
    idx{k} = [p+1:m 1:p];
end
y = x(idx{:});
end
The function stores the index vectors in cell array idx. Building this cell array is relatively simple.
For each of the N dimensions, determine the size of that dimension and find the integer index nearest
the midpoint. Then, construct a vector that swaps the two halves of that dimension.
By using a cell array to store the index vectors and a comma-separated list for the indexing operation,
fftshift shifts arrays of any dimension using just a single operation: y = x(idx{:}). If you were
to use explicit indexing, you would need to write one if statement for each dimension you want the
function to handle:
if ndims(x) == 1
    y = x(index1);
elseif ndims(x) == 2
    y = x(index1,index2);
end
Another way to handle this without a comma-separated list would be to loop over each dimension,
converting one dimension at a time and moving data each time. With a comma-separated list, you
move the data just once. A comma-separated list makes it very easy to generalize the swapping
operation to an arbitrary number of dimensions.
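To see the index construction at work without shadowing the built-in fftshift, the same loop can be run inline on a small matrix. This is a sketch that assumes a 4-by-4 input; any array size should behave the same way.

```matlab
x = magic(4);
idx = cell(1, ndims(x));
for k = 1:ndims(x)
    m = size(x,k);
    p = ceil(m/2);
    idx{k} = [p+1:m 1:p];    % second half first, then first half
end
y = x(idx{:});               % for a matrix, the same as x(idx{1},idx{2})
```

For this input each index vector is [3 4 1 2], so the single indexing operation swaps the halves of both rows and columns at once.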
Alternatives to the eval Function
• MATLAB compiles code the first time you run it to enhance performance for future runs. However,
because code in an eval statement can change at run time, it is not compiled.
• Code within an eval statement can unexpectedly create or assign to a variable already in the
current workspace, overwriting existing data.
• Concatenated character vectors within an eval statement are often difficult to read. Other
language constructs can simplify the syntax in your code.
For many common uses of eval, there are preferred alternate approaches, as shown in the following
examples.
For example, create a cell array that contains 10 elements, where each element is a numeric array:
numArrays = 10;
A = cell(numArrays,1);
for n = 1:numArrays
    A{n} = magic(n);
end
Access the data in the cell array by indexing with curly braces. For example, display the fifth element
of A:
A{5}
ans =
17 24 1 8 15
23 5 7 14 16
4 6 13 20 22
10 12 19 21 3
11 18 25 2 9
The assignment statement A{n} = magic(n) is more elegant and efficient than this call to eval:
eval(['A', int2str(n),' = magic(n)']) % Not recommended
The best practice is to use function syntax, which allows you to pass variables as inputs. For example:
currentFile = 'myfile1.mat';
save(currentFile)
You can construct file names within a loop using the sprintf function (which is usually more
efficient than int2str), and then call the save function without eval. This code creates 10 files in
the current folder:
numFiles = 10;
for n = 1:numFiles
    randomData = rand(n);
    currentFile = sprintf('myfile%d.mat',n);
    save(currentFile,'randomData')
end
• Create function handles with the @ symbol or with the str2func function. For example, run a
function from a list stored in a cell array:
examples = {@odedemo,@sunspots,@fitdemo};
n = input('Select an example (1, 2, or 3): ');
examples{n}()
• Use the feval function. For example, call a plot function (such as plot, bar, or pie) with data
that you specify at run time:
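The code for the feval case is not shown here. A minimal non-interactive sketch follows; it assumes the function name arrives as text (for instance from an input prompt) and uses max instead of a plotting function so that it runs without opening a figure.

```matlab
funcName = 'max';                    % chosen at run time, for example from user input
data = [3 9 4];

viaFeval = feval(funcName, data);    % call the function by name
fh = str2func(funcName);             % or convert the name to a handle first
viaHandle = fh(data);
```

Both forms avoid building and evaluating a command string, so the data never has to be converted to text.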
If you enter weight at the input prompt, then you can find the minimum weight value with the
following command.
min(dataToUse)
ans =
90
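The structure behind this result is not shown here. The sketch below illustrates the dynamic field reference that makes it work; the structure name myData and all data values are assumptions chosen so that the minimum weight is 90, and fieldName stands in for text typed at the prompt.

```matlab
myData.height = [67 72 58];      % illustrative values
myData.weight = [140 205 90];

fieldName = 'weight';            % stands in for text typed at an input prompt
dataToUse = myData.(fieldName);  % dynamic field reference, no eval required
smallest = min(dataToUse);
```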
For an additional example, see “Generate Field Names from Variables” on page 11-10.
Error Handling
The preferred method for error handling in MATLAB is to use a try/catch statement. For example:
try
    B = A;
catch exception
    disp('A is undefined')
end
If your workspace does not contain variable A, then this code returns:
A is undefined
Previous versions of the documentation for the eval function include the syntax
eval(expression,catch_expr). If evaluating the expression input returns an error, then eval
evaluates catch_expr. However, an explicit try/catch is significantly clearer than an implicit
catch in an eval statement. Using the implicit catch is not recommended.
Classes (Data Types)
3
There are 16 fundamental classes in MATLAB. Each of these classes is in the form of a matrix or
array. With the exception of function handles, this matrix or array is a minimum of 0-by-0 in size and
can grow to an n-dimensional array of any size. A function handle is always scalar (1-by-1).
All of the fundamental MATLAB classes are shown in the diagram below:
Numeric classes in the MATLAB software include signed and unsigned integers, and single- and
double-precision floating-point numbers. By default, MATLAB stores all numeric values as double-
precision floating point. (You cannot change the default type and precision.) You can choose to store
any number, or array of numbers, as integers or as single-precision. Integer and single-precision
arrays offer more memory-efficient storage than double-precision.
All numeric types support basic array operations, such as subscripting, reshaping, and mathematical
operations.
You can create two-dimensional double and logical matrices using one of two storage formats:
full or sparse. For matrices with mostly zero-valued elements, a sparse matrix requires a fraction
of the storage space required for an equivalent full matrix. Sparse matrices invoke methods
especially tailored to solve sparse problems.
These classes require different amounts of storage, the smallest being a logical value or 8-bit
integer which requires only 1 byte. It is important to keep this minimum size in mind if you work on
data in files that were written using a precision smaller than 8 bits.
Fundamental MATLAB Classes
3 Overview of MATLAB Classes
See Also
More About
• “Valid Combinations of Unlike Classes” on page 15-2
4
Numeric Classes
Integers
In this section...
“Integer Classes” on page 4-2
“Creating Integer Data” on page 4-2
“Arithmetic Operations on Integer Classes” on page 4-4
“Largest and Smallest Values for Integer Classes” on page 4-4
Integer Classes
MATLAB has four signed and four unsigned integer classes. Signed types enable you to work with
negative integers as well as positive, but cannot represent as wide a range of numbers as the
unsigned types because one bit is used to designate a positive or negative sign for the number.
Unsigned types give you a wider range of numbers, but these numbers can only be zero or positive.
MATLAB supports 1-, 2-, 4-, and 8-byte storage for integer data. You can save memory and execution
time for your programs if you use the smallest integer type that accommodates your data. For
example, you do not need a 32-bit integer to store the value 100.
Here are the eight integer classes, the range of values you can store with each type, and the MATLAB
conversion function required to create that type:
For example, to store 325 as a 16-bit signed integer assigned to variable x, type
x = int16(325);
If the number being converted to an integer has a fractional part, MATLAB rounds to the nearest
integer. If the fractional part is exactly 0.5, then from the two equally nearby integers, MATLAB
chooses the one with the larger absolute value:
x = 325.499;
int16(x)
ans =
int16
325
x = x + .001;
int16(x)
ans =
int16
326
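The tie rule described above applies to negative values as well: halves round away from zero. The following sketch uses illustrative values and variable names to show this, with fix for comparison.

```matlab
pos = int16([2.4 2.5 3.5]);            % ties round away from zero
neg = int16([-2.4 -2.5 -3.5]);         % the same rule, mirrored for negatives
towardZero = int16(fix([2.5 -2.5]));   % fix discards the fractional part instead
```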
If you need to round a number using a rounding scheme other than the default, MATLAB provides
four rounding functions: round, fix, floor, and ceil. The fix function enables you to override the
default and round towards zero when there is a nonzero fractional part:
x = 325.9;
int16(fix(x))
ans =
int16
325
Arithmetic operations that involve both integers and floating-point always result in an integer data
type. MATLAB rounds the result, when necessary, according to the default rounding algorithm. The
example below yields an exact answer of 1426.75 which MATLAB then rounds to the next highest
integer:
int16(325) * 4.39
ans =
int16
1427
The integer conversion functions are also useful when converting other classes, such as strings, to
integers:
str = 'Hello World';
int8(str)
ans =
If you convert a NaN value into an integer class, the result is a value of 0 in that integer class. For
example,
int32(NaN)
ans =
int32
0
• Integers or integer arrays of the same integer data type. This yields a result that has the same
data type as the operands:
For all binary operations in which one operand is an array of integer data type (except 64-bit
integers) and the other is a scalar double, MATLAB computes the operation using element-wise
double-precision arithmetic, and then converts the result back to the original integer data type. For
binary operations involving a 64-bit integer array and a scalar double, MATLAB computes the
operation as if 80-bit extended-precision arithmetic were used, to prevent loss of precision.
You can also obtain these values with the intmax and intmin functions:
intmax('int8')
ans =
int8
127
intmin('int8')
ans =
int8
-128
If you convert a number that is larger than the maximum value of an integer data type to that type,
MATLAB sets it to the maximum value. Similarly, if you convert a number that is smaller than the
minimum value of the integer data type, MATLAB sets it to the minimum value. For example,
x = int8(300)
x =
int8
127
x = int8(-300)
x =
int8
-128
Also, when the result of an arithmetic operation involving integers exceeds the maximum (or
minimum) value of the data type, MATLAB sets it to the maximum (or minimum) value:
x = int8(100) * 3
x =
int8
127
x = int8(-100) * 3
x =
int8
-128
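Saturation applies to unsigned types in the same way, with zero as the lower limit. The sketch below uses illustrative values; comparing against intmax is one simple way to notice that a result may have been clipped.

```matlab
a = uint8(200) + uint8(100);        % 300 exceeds intmax('uint8'), so a saturates
b = uint8(5) - uint8(10);           % -5 is below intmin('uint8'), so b saturates
wide = uint16(200) + uint16(100);   % a wider type holds the true sum
hitLimit = (a == intmax('uint8'));  % true when the result sits at the limit
```

A result equal to the limit is not proof of overflow, since the exact answer could be the limit itself, but it is a useful warning sign.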
Floating-Point Numbers
In this section...
“Double-Precision Floating Point” on page 4-6
“Single-Precision Floating Point” on page 4-6
“Creating Floating-Point Data” on page 4-6
“Arithmetic Operations on Floating-Point Numbers” on page 4-8
“Largest and Smallest Values for Floating-Point Classes” on page 4-9
“Accuracy of Floating-Point Data” on page 4-10
“Avoiding Common Problems with Floating-Point Arithmetic” on page 4-11
Bits       Usage
63         Sign (0 = positive, 1 = negative)
62 to 52   Exponent, biased by 1023
51 to 0    Fraction f of the number 1.f

Bits       Usage
31         Sign (0 = positive, 1 = negative)
30 to 23   Exponent, biased by 127
22 to 0    Fraction f of the number 1.f
Because MATLAB stores numbers of type single using 32 bits, they require less memory than
numbers of type double, which use 64 bits. However, because they are stored with fewer bits,
numbers of type single are represented to less precision than numbers of type double.
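The 64-bit layout in the first table can be inspected directly. This sketch reinterprets a double as a uint64 with typecast and separates the three fields with bitshift and bitand; the sample value -6.5 and the variable names are illustrative, and the reconstruction formula covers normalized numbers only.

```matlab
x = -6.5;                                   % -1.625 * 2^2
bits = typecast(x, 'uint64');               % same 64 bits, viewed as an integer

signBit  = bitshift(bits, -63);                          % bit 63
expField = bitand(bitshift(bits, -52), uint64(2047));    % bits 62 to 52
frac     = bitand(bits, uint64(2^52 - 1));               % bits 51 to 0

% Rebuild the value from its fields: (-1)^sign * 1.f * 2^(exponent - 1023)
value = (-1)^double(signBit) * (1 + double(frac)/2^52) * 2^(double(expField) - 1023);
```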
Because the default numeric type for MATLAB is double, you can create a double with a simple
assignment statement:
x = 25.783;
The whos function shows that MATLAB has created a 1-by-1 array of type double for the value you
just stored in x:
whos x
Name Size Bytes Class
x 1x1 8 double
Use isfloat if you just want to verify that x is a floating-point number. This function returns logical
1 (true) if the input is a floating-point number, and logical 0 (false) otherwise:
isfloat(x)
ans =
logical
1
You can convert other numeric data, characters or strings, and logical data to double precision using
the MATLAB function, double. This example converts a signed integer to double-precision floating
point:
y = int64(-589324077574); % Create a 64-bit integer
x = double(y) % Convert to double
Because MATLAB stores numeric data as a double by default, you need to use the single
conversion function to create a single-precision number:
x = single(25.783);
The whos function returns the attributes of variable x in a structure. The bytes field of this structure
shows that when x is stored as a single, it requires just 4 bytes compared with the 8 bytes to store it
as a double:
xAttrib = whos('x');
xAttrib.bytes
ans =
4
You can convert other numeric data, characters or strings, and logical data to single precision using
the single function. This example converts a signed integer to single-precision floating point:
y = int64(-589324077574); % Create a 64-bit integer
x = single(y) % Convert to single
x =
single
-5.8932e+11
Double-Precision Operations
You can perform basic arithmetic operations with double and any of the following other classes.
When one or more operands is an integer (scalar or array), the double operand must be a scalar. The
result is of type double, except where noted otherwise:
This example performs arithmetic on data of types char and double. The result is of type double:
c = 'uppercase' - 32;
class(c)
ans =
double
char(c)
ans =
UPPERCASE
Single-Precision Operations
You can perform basic arithmetic operations with single and any of the following other classes. The
result is always single:
• single
• double
• char
• logical
In this example, 7.5 defaults to type double, and the result is of type single:
class(x)
ans =
single
The MATLAB functions realmax and realmin return the maximum and minimum values that you
can represent with the double data type:
ans =
The range for double is:
-1.79769e+308 to -2.22507e-308 and
2.22507e-308 to 1.79769e+308
Numbers larger than realmax or smaller than -realmax are assigned the values of positive and
negative infinity, respectively:
realmax + .0001e+308
ans =
Inf
-realmax - .0001e+308
ans =
-Inf
The MATLAB functions realmax and realmin, when called with the argument 'single', return the
maximum and minimum values that you can represent with the single data type:
ans =
The range for single is:
-3.40282e+38 to -1.17549e-38 and
1.17549e-38 to 3.40282e+38
Numbers larger than realmax('single') or smaller than -realmax('single') are assigned the
values of positive and negative infinity, respectively:
realmax('single') + .0001e+038
ans =
single
Inf
-realmax('single') - .0001e+038
ans =
single
-Inf
Double-Precision Accuracy
Because there are only a finite number of double-precision numbers, you cannot represent all
numbers in double-precision storage. On any computer, there is a small gap between each double-
precision number and the next larger double-precision number. You can determine the size of this
gap, which limits the precision of your results, using the eps function. For example, to find the
distance between 5 and the next larger double-precision number, enter
format long
eps(5)
ans =
8.881784197001252e-16
This tells you that there are no double-precision numbers between 5 and 5 + eps(5). If a double-
precision computation returns the answer 5, the result is only accurate to within eps(5).
The value of eps(x) depends on x. This example shows that, as x gets larger, so does eps(x):
eps(50)
ans =
7.105427357601002e-15
If you enter eps with no input argument, MATLAB returns the value of eps(1), the distance from 1
to the next larger double-precision number.
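The values shown above are exact powers of two, which follows from the binary format: within each interval [2^k, 2^(k+1)) the spacing is 2^(k-52). A short sketch with illustrative variable names confirms this and shows an increment smaller than the gap being lost.

```matlab
gap1  = eps;              % spacing at 1, the same as eps(1)
gap5  = eps(5);           % 5 lies in [4,8), so the spacing is 2^(2-52)
gap50 = eps(50);          % 50 lies in [32,64), so the spacing is 2^(5-52)

arePowersOfTwo = [gap1 == 2^-52, gap5 == 2^-50, gap50 == 2^-47];
absorbed = (5 + eps(5)/4 == 5);   % an increment well below the gap is lost
```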
Single-Precision Accuracy
Similarly, there are gaps between any two single-precision numbers. If x has type single, eps(x)
returns the distance between x and the next larger single-precision number. For example,
x = single(5);
eps(x)
returns
ans =
single
4.7684e-07
Note that this result is larger than eps(5). Because there are fewer single-precision numbers than
double-precision numbers, the gaps between the single-precision numbers are larger than the gaps
between double-precision numbers. This means that results in single-precision arithmetic are less
precise than in double-precision arithmetic.
For a number x of type double, eps(single(x)) gives you an upper bound for the amount that x is
rounded when you convert it from double to single. For example, when you convert the double-
precision number 3.14 to single, it is rounded by
double(single(3.14) - 3.14)
ans =
1.0490e-07
The amount that 3.14 is rounded is less than
eps(single(3.14))
ans =
single
2.3842e-07
The decimal number 4/3 is not exactly representable as a binary fraction. For this reason, the
following calculation does not give zero, but rather reveals the quantity eps.
e = 1 - 3*(4/3 - 1)
e =
2.2204e-16
Similarly, 0.1 is not exactly representable as a binary number. Thus, you get the following
nonintuitive behavior:
a = 0.0;
for i = 1:10
    a = a + 0.1;
end
a == 1
ans =
logical
0
There are gaps between floating-point numbers. As the numbers get larger, so do the gaps, as
evidenced by:
(2^53 + 1) - 2^53
ans =
0
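A common way to guard against both effects shown above, accumulated round-off and widening gaps, is to compare within a tolerance scaled by eps rather than testing for exact equality. In this sketch the factor of 4 is an arbitrary assumption; a suitable tolerance depends on the computation.

```matlab
a = 0.0;
for i = 1:10
    a = a + 0.1;
end

exactMatch = (a == 1);                 % false: round-off accumulated
withinTol  = abs(a - 1) <= 4*eps(1);   % true: equal to within a few gaps
gapAt2to53 = eps(2^53);                % the spacing at 2^53 is 2, so adding 1 is lost
```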
Since pi is not really π, it is not surprising that sin(pi) is not exactly zero:
sin(pi)
ans =
1.224646799147353e-16
When subtractions are performed with nearly equal operands, sometimes cancellation can occur
unexpectedly. The following is an example of a cancellation caused by swamping (loss of precision
that makes the addition insignificant).
sqrt(1e-16 + 1) - 1
ans =
0
Some functions in MATLAB, such as expm1 and log1p, may be used to compensate for the effects of
catastrophic cancellation.
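As a brief sketch of that compensation, compare the naive and dedicated forms for a very small argument. The value 1e-16 and the variable names are illustrative.

```matlab
x = 1e-16;

naiveLog  = log(1 + x);    % 1 + x rounds to 1, so the small term is lost
betterLog = log1p(x);      % computes log(1+x) without forming 1 + x

naiveExp  = exp(x) - 1;    % subtraction cancels nearly all significant digits
betterExp = expm1(x);      % computes exp(x)-1 directly
```

For arguments this small, both log1p(x) and expm1(x) stay close to x itself, whereas the naive logarithm returns zero.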
Round-off, cancellation, and other traits of floating-point arithmetic combine to produce startling
computations when solving the problems of linear algebra. MATLAB warns that the following matrix A
is ill-conditioned, and therefore the system Ax = b may be sensitive to small perturbations:
A = diag([2 eps]);
b = [2; eps];
y = A\b;
Warning: Matrix is close to singular or badly scaled.
Results may be inaccurate. RCOND = 1.110223e-16.
These are only a few of the examples showing how IEEE floating-point arithmetic affects
computations in MATLAB. Note that all computations performed in IEEE 754 arithmetic are affected;
this includes applications written in C or FORTRAN, as well as MATLAB.
Create Complex Numbers
The following statement shows one way of creating a complex value in MATLAB. The variable x is
assigned a complex number with a real part of 2 and an imaginary part of 3:
x = 2 + 3i;
Another way to create a complex number is using the complex function. This function combines two
numeric inputs into a complex output, making the first input real and the second imaginary:
x = rand(3) * 5;
y = rand(3) * -8;
z = complex(x, y)
z =
4.7842 -1.0921i 0.8648 -1.5931i 1.2616 -2.2753i
2.6130 -0.0941i 4.8987 -2.3898i 4.3787 -3.7538i
4.4007 -7.1512i 1.3572 -5.2915i 3.6865 -0.5182i
You can separate a complex number into its real and imaginary parts using the real and imag
functions:
zr = real(z)
zr =
4.7842 0.8648 1.2616
2.6130 4.8987 4.3787
4.4007 1.3572 3.6865
zi = imag(z)
zi =
-1.0921 -1.5931 -2.2753
-0.0941 -2.3898 -3.7538
-7.1512 -5.2915 -0.5182
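One practical difference between the two creation methods is worth a sketch: complex keeps a value complex even when the imaginary part is zero, whereas an arithmetic expression with a zero imaginary part yields a real result. The values and variable names below are illustrative.

```matlab
z = complex(3, 4);
parts = [real(z) imag(z) abs(z)];   % real part, imaginary part, magnitude

zc = complex(3, 0);                 % imaginary part is stored even though it is zero
keepsComplex = ~isreal(zc);         % true
droppedImag  = isreal(3 + 0i);      % true: the zero imaginary part is not kept
```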
Infinity and NaN
Infinity
MATLAB represents infinity by the special value Inf. Infinity results from operations like division by
zero and overflow, which lead to results too large to represent as conventional floating-point values.
MATLAB also provides a function called Inf that returns the IEEE arithmetic representation for
positive infinity as a double scalar value.
Several examples of statements that return positive or negative infinity in MATLAB are shown here.
x = 1/0
x =
Inf
x = 1.e1000
x =
Inf
x = exp(1000)
x =
Inf
x = log(0)
x =
-Inf
x = log(0);
isinf(x)
ans =
1
NaN
MATLAB represents values that are not real or complex numbers with a special value called NaN,
which stands for “Not a Number”. Expressions like 0/0 and inf/inf result in NaN, as do any
arithmetic operations involving a NaN:
x = 0/0
x =
NaN
x = NaN;
whos x
Name Size Bytes Class
x 1x1 8 double
The NaN function returns one of the IEEE arithmetic representations for NaN as a double scalar
value. The exact bit-wise hexadecimal representation of this NaN value is:
format hex
x = NaN
x =
fff8000000000000
Always use the isnan function to verify that the elements in an array are NaN:
isnan(x)
ans =
MATLAB preserves the “Not a Number” status of alternate NaN representations and treats all of the
different representations of NaN equivalently. However, in some special cases (perhaps due to
hardware limitations), MATLAB does not preserve the exact bit pattern of alternate NaN
representations throughout an entire calculation, and instead uses the canonical NaN bit pattern
defined above.
Because two NaNs are not equal to each other, relational operations involving NaN always return false,
except for a test for inequality (NaN ~= NaN):
NaN ~= NaN
ans =
1
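These comparison rules affect everyday code, so a short sketch may help. The array x is illustrative; isnan and isequaln are the functions to use when NaN values need to be located or treated as equal.

```matlab
x = [1 NaN 3];

eqSelf   = (NaN == NaN);      % false: NaN never compares equal
mask     = isnan(x);          % locate NaN elements reliably
sameStd  = isequal(x, x);     % false, because of the NaN element
sameNaN  = isequaln(x, x);    % true: isequaln treats NaN values as equal
cleaned  = x(~mask);          % keep only the non-NaN elements
```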
Command              Operation
whos x               Display the data type of x.
xType = class(x);    Assign the data type of x to a variable.
isnumeric(x)         Determine if x is a numeric type.
isa(x, 'integer')    Determine if x is the specified numeric type. (Examples for any
isa(x, 'uint64')     integer, unsigned 64-bit integer, any floating point, double precision,
isa(x, 'float')      and single precision are shown here.)
isa(x, 'double')
isa(x, 'single')
isreal(x)            Determine if x is real or complex.
isnan(x)             Determine if x is Not a Number (NaN).
isinf(x)             Determine if x is infinite.
isfinite(x)          Determine if x is finite.
Display Format for Numeric Values
x = 4/3
x =
1.3333
You can change the display in the Command Window or Editor using the format function.
format long
x
x =
1.333333333333333
Using the format function only sets the format for the current MATLAB session. To set the format for
subsequent sessions, click Preferences on the Home tab in the Environment section. Select
MATLAB > Command Window, and then choose a Numeric format option.
The display format only affects how numbers are displayed, not how they are stored in MATLAB.
See Also
format
Related Examples
• “Format Output”
Integer Arithmetic
This example shows how to perform arithmetic on integer data representing signals and images.
Load measurement datasets comprising signals from four instruments using 8-bit and 16-bit A-to-D
converters, resulting in data saved as int8, int16, and uint16. Time is stored as uint16.
load integersignal
% Look at variables
whos Signal1 Signal2 Signal3 Signal4 Time1
Plot Data
First we will plot two of the signals to see the signal ranges.
It is likely that these values would need to be scaled to calculate the actual physical value that the
signal represents, such as volts.
Process Data
We can perform standard arithmetic on integers such as +, -, *, and /. Let's say we wished to find the
sum of Signal1 and Signal2.
SumSig = Signal1 + Signal2; % Sum the integer signals
Now let's plot the sum signal and see where it saturates.
cla;
plot(Time1, SumSig);
hold on
Saturated = (SumSig == intmin('int8')) | (SumSig == intmax('int8')); % Find where it has saturated
plot(Time1(Saturated),SumSig(Saturated),'rd')
grid
hold off
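The saturation test used in this plot can be tried without the example data file. The sketch below substitutes two short, made-up int8 vectors for Signal1 and Signal2; the names s1, s2, sumSig, and saturated are illustrative.

```matlab
s1 = int8([100 -100 50 -20]);    % stand-ins for the loaded signals
s2 = int8([ 60  -90 10  20]);

sumSig = s1 + s2;                % integer addition saturates at the int8 limits
saturated = (sumSig == intmin('int8')) | (sumSig == intmax('int8'));
```

The first two sums, 160 and -190, fall outside the int8 range and are clipped to 127 and -128, so only those two elements are flagged.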
Here we see the images are 24-bit color, stored as three planes of uint8 data.
Display Images
cla;
image(street1); % Display image
axis equal
axis off
Scale an Image
We can scale the image by a double precision constant but keep the image stored as integers. For
example,
duller = 0.5 * street2; % Scale image with a double constant but create an integer
whos duller
subplot(1,2,1);
image(street2);
axis off equal tight
title('Original'); % Display image
subplot(1,2,2);
image(duller);
axis off equal tight
title('Duller'); % Display image
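Multiplying an integer array by a double scalar returns the integer class, with results rounded and saturated. The sketch below illustrates this on three hypothetical uint8 values (pix is an assumption for illustration, not the street2 image).

```matlab
% Hypothetical 1-by-3 uint8 "pixel" values (not the street2 image)
pix = uint8([10 101 255]);

% A double scalar times uint8 yields uint8; fractional results are rounded
half = 0.5 * pix;

% Scaling up saturates at intmax('uint8') = 255
brighter = 1.5 * pix;
```

The value 101 scales to 50.5 and is stored as 51, and any product above 255 is clipped to 255.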
We can add the two street images together and plot the ghostly result.
Single Precision Math
Ad = [1 2 0; 2 5 -1; 4 10 -1]
Ad = 3×3
1 2 0
2 5 -1
4 10 -1
A = single(Ad); % or A = cast(Ad,'single');
We can also create single precision zeros and ones with their respective functions.
n = 1000;
Z = zeros(n,1,'single');
O = ones(n,1,'single');
whos A Ad O Z n
  Name         Size            Bytes  Class     Attributes

  A            3x3                36  single
  Ad           3x3                72  double
  O         1000x1              4000  single
  Z         1000x1              4000  single
  n            1x1                 8  double
We can see that some of the variables are of type single and that the variable A (the single precision
version of Ad) takes half the number of bytes of memory to store because singles require just four
bytes (32-bits), whereas doubles require 8 bytes (64-bits).
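The memory comparison can also be checked programmatically. This is a small sketch that captures the whos output as a structure; the variable names infoA, infoAd, and ratio are introduced here for illustration only.

```matlab
Ad = [1 2 0; 2 5 -1; 4 10 -1];   % double precision by default
A  = single(Ad);                 % single precision copy

% whos returns a structure array when called with an output argument
infoAd = whos('Ad');
infoA  = whos('A');

% Ratio of storage used by the double and single versions
ratio = infoAd.bytes / infoA.bytes;
```

For this 3-by-3 matrix the double version uses 72 bytes and the single version 36, so the ratio is 2.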
B = A' % Matrix transpose
1 2 4
2 5 10
0 -1 -1
whos B
B 3x3 36 single
C = A * B % Matrix multiplication
5 12 24
12 30 59
24 59 117
C = A .* B % Elementwise arithmetic
1 4 0
4 25 -10
0 -10 1
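X = inv(A) % Matrix inverse
5 2 -2
-2 -1 1
0 -2 1
I = inv(A) * A % Confirm that the result is the identity matrix
1 0 0
0 1 0
0 0 1
I = A \ A % Matrix division, preferable to using inv
1 0 0
0 1 0
0 0 1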
E = eig(A) % Eigenvalues
3.7321
0.2679
1.0000
F = fft(A(:,1)) % FFT
7.0000 + 0.0000i
-2.0000 + 1.7321i
-2.0000 - 1.7321i
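S = svd(A) % Singular value decomposition
12.3171
0.5149
0.1577
P = round(poly(A)) % Characteristic polynomial of the matrix
1 -5 5 -1
R = roots(P) % Roots of the polynomial
3.7321
1.0000
0.2679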
R = conv(P,Q)
Now let's look at a function to compute enough terms in the Fibonacci sequence so the ratio is less
than the correct machine epsilon (eps) for datatype single or double.
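fibodemo('single') % Terms needed for single precision
ans = 19
fibodemo('double') % Terms needed for double precision
ans = 41
function nterms = fibodemo(dtype)
% Count the Fibonacci terms needed for the ratio of successive terms
% to match the golden mean to within eps for the class dtype.
fcurrent = ones(dtype);
fnext = fcurrent;
goldenMean = (ones(dtype)+sqrt(5))/2;
tol = eps(goldenMean);
nterms = 2;
while abs(fnext/fcurrent - goldenMean) >= tol
    nterms = nterms + 1;
    temp = fnext;
    fnext = fnext + fcurrent;
    fcurrent = temp;
end
end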
Notice that we initialize several of our variables, fcurrent, fnext, and goldenMean, with values
that are dependent on the input datatype, and the tolerance tol depends on that type as well. Single
precision requires that we calculate fewer terms than the equivalent double precision calculation.
Find Array Elements That Meet a Condition
To apply a single condition, start by creating a 5-by-5 matrix that contains random integers between 1
and 15. Reset the random number generator to the default state for reproducibility.
rng default
A = randi(15,5)
A = 5×5
13 2 3 3 10
14 5 15 7 1
2 9 15 14 13
14 15 8 12 15
10 15 13 15 11
Use the relational less than operator, <, to determine which elements of A are less than 9. Store the
result in B.
B = A < 9
0 1 1 1 0
0 1 0 1 1
1 0 0 0 0
0 0 1 0 0
0 0 0 0 0
The result is a logical matrix. Each value in B represents a logical 1 (true) or logical 0 (false) state
to indicate whether the corresponding element of A fulfills the condition A < 9. For example, A(1,1)
is 13, so B(1,1) is logical 0 (false). However, A(1,2) is 2, so B(1,2) is logical 1 (true).
Although B contains information about which elements in A are less than 9, it doesn’t tell you what
their values are. Rather than comparing the two matrices element by element, you can use B to index
into A.
A(B)
ans = 8×1
2
2
5
3
8
3
7
1
The result is a column vector of the elements in A that are less than 9. Since B is a logical matrix, this
operation is called logical indexing. In this case, the logical array being used as an index is the
same size as the other array, but this is not a requirement. For more information, see “Array
Indexing”.
Some problems require information about the locations of the array elements that meet a condition
rather than their actual values. In this example, you can use the find function to locate all of the
elements in A less than 9.
I = find(A < 9)
I = 8×1
3
6
7
11
14
16
17
22
The result is a column vector of linear indices. Each index describes the location of an element in A
that is less than 9, so in practice A(I) returns the same result as A(B). The difference is that A(B)
uses logical indexing, whereas A(I) uses linear indexing.
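The equivalence of the two indexing styles can be confirmed directly. The sketch below also converts the linear indices to row and column subscripts with ind2sub; the names sameValues, row, and col are introduced here for illustration.

```matlab
rng default
A = randi(15,5);

B = A < 9;          % logical mask, same size as A
I = find(B);        % linear indices of the true elements

% Logical and linear indexing select the same elements in the same order
sameValues = isequal(A(B), A(I));

% Convert the linear indices to row and column subscripts
[row,col] = ind2sub(size(A), I);
```

Logical indexing is usually the more direct choice when only the values are needed, while find is useful when the positions themselves matter.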
You can use the logical and, or, and not operators to apply any number of conditions to an array; the
number of conditions is not limited to one or two.
First, use the logical and operator, denoted &, to specify two conditions: the elements must be less
than 9 and greater than 2. Specify the conditions as a logical index to view the elements that
satisfy both conditions.
A(A<9 & A>2)
ans = 5×1
5
3
8
3
7
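The result is a list of the elements in A that satisfy both conditions. Be sure to specify each condition
with a separate statement connected by a logical operator. For example, you cannot specify the
conditions above by A(2<A<9). MATLAB evaluates 2<A<9 from left to right as (2<A)<9, and because
the logical result of 2<A contains only 0s and 1s, every element is less than 9 and the expression
selects all of A.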
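To see why a chained comparison does not express a range test, compare it with the explicit form on a small hypothetical vector (this three-element A is for illustration only).

```matlab
A = [1 5 12];

% Evaluated left to right as (2<A)<9: the logical values 0 and 1 are all < 9
chained = 2<A<9;

% The intended range test, with each condition stated separately
explicit = (2<A) & (A<9);
```

The chained form is true for every element, whereas the explicit form is true only for 5.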
Next, find the elements in A that are less than 9 and even numbered.
A(A<9 & ~mod(A,2))
ans = 3×1
2
2
8
The result is a list of all even elements in A that are less than 9. The use of the logical NOT operator,
~, converts the matrix mod(A,2) into a logical matrix, with a value of logical 1 (true) located where
an element is evenly divisible by 2.
Finally, find the elements in A that are less than 9 and even numbered and not equal to 2.
A(A<9 & ~mod(A,2) & A~=2)
ans = 8
The result, 8, is even, less than 9, and not equal to 2. It is the only element in A that satisfies all three
conditions.
Use the find function to get the index of the element equal to 8 that satisfies the conditions.
find(A<9 & ~mod(A,2) & A~=2)
ans = 14
Sometimes it is useful to simultaneously change the values of several existing array elements. Use
logical indexing with a simple assignment statement to replace the values in an array that meet a
condition.
Replace all values in A that are greater than 10 with the number 10.
A(A>10) = 10
A = 5×5
10 2 3 3 10
10 5 10 7 1
2 9 10 10 10
10 10 8 10 10
10 10 10 10 10
Next, replace all values in A that are not equal to 10 with a NaN value.
A(A~=10) = NaN
A = 5×5
10 NaN NaN NaN 10
10 NaN 10 NaN NaN
NaN NaN 10 10 10
10 10 NaN 10 10
10 10 10 10 10
Lastly, replace all of the NaN values in A with zeros and apply the logical NOT operator, ~A.
A(isnan(A)) = 0;
C = ~A
0 1 1 1 0
0 1 0 1 1
1 1 0 0 0
0 0 1 0 0
0 0 0 0 0
The resulting matrix has values of logical 1 (true) in place of the NaN values, and logical 0 (false)
in place of the 10s. The logical NOT operation, ~A, converts the numeric array into a logical array
such that A&C returns a matrix of logical 0 (false) values and A|C returns a matrix of logical 1
(true) values.
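The complement relationship between a numeric array and its logical negation can be checked on a small case. This sketch uses a hypothetical 2-by-2 array of 10s and 0s, and the 'all' option of any and all (available since R2018b).

```matlab
A = [10 0; 0 10];     % numeric array containing only 10s and 0s
C = ~A;               % logical complement: true where A is zero

% A&C is false everywhere because no element is both nonzero and zero
noOverlap = ~any(A & C, 'all');

% A|C is true everywhere because every element is one or the other
fullCover = all(A | C, 'all');
```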
See Also
nan | Logical Operators: Short Circuit | isnan | find | and | or | xor | not
Reduce Logical Arrays to Single Value
The any and all functions are natural extensions of the logical | (OR) and & (AND) operators,
respectively. However, rather than comparing just two elements, the any and all functions compare
all of the elements in a particular dimension of an array. It is as if all of those elements are connected
by & or | operators and the any or all functions evaluate the resulting long logical expressions.
Therefore, unlike the core logical operators, the any and all functions reduce the size of the array
dimension that they operate on so that it has size 1. This enables the reduction of many logical values
into a single logical condition.
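As a brief illustration of how the reduced dimension collapses to size 1, consider a hypothetical 2-by-3 logical array L (not part of the example that follows).

```matlab
L = logical([1 0 1; 0 0 1]);

anyPerColumn = any(L, 1);   % 1-by-3: OR down each column
allPerRow    = all(L, 2);   % 2-by-1: AND across each row
anyOverall   = any(L(:));   % scalar: OR over every element
```

The dimension argument selects which dimension is reduced; the other dimension keeps its size.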
First, create a matrix A that contains random integers between 1 and 25. Reset the random number
generator to the default state for reproducibility.
rng default
A = randi(25,5)
A = 5×5
21 3 4 4 17
23 7 25 11 1
4 14 24 23 22
23 24 13 20 24
16 25 21 24 17
Next, use the mod function along with the logical NOT operator, ~, to determine which elements in A
are even.
A = ~mod(A,2)
0 0 1 1 0
0 0 0 0 0
1 1 1 0 1
0 1 0 1 1
1 0 0 1 0
The resulting matrix has values of logical 1 (true) where an element is even, and logical 0
(false) where an element is odd.
Since the any and all functions reduce the dimension that they operate on to size 1, it normally
takes two applications of one of the functions to reduce a 2–D matrix into a single logical condition,
such as any(any(A)). However, if you use the notation A(:) to regard all of the elements of A as a
single column vector, you can use any(A(:)) to get the same logical information without nesting the
function calls.
any(A(:))
ans = logical
1
You can perform logical and relational comparisons within the function call to any or all. This makes
it easy to quickly test an array for a variety of properties.
all(~A(:))
ans = logical
0
Determine whether any main or super diagonal elements in A are even. Since the vectors returned by
diag(A) and diag(A,1) are not the same size, you first need to reduce each diagonal to a single
scalar logical condition before comparing them. You can use the short-circuit OR operator || to
perform the comparison, since if any elements in the first diagonal are even then the entire
expression evaluates to true regardless of what appears on the right-hand side of the operator.
any(diag(A)) || any(diag(A,1))
ans = logical
1
See Also
any | all | and | or | xor | Logical Operators: Short Circuit
Text in String and Character Arrays
You can store any 1-by-n sequence of characters as a string, using the string data type. Starting in
R2017a, enclose text in double quotes to create a string.
str = "Hello, world"
str =
"Hello, world"
Though the text "Hello, world" is 12 characters long, str itself is a 1-by-1 string, or string scalar.
You can use a string scalar to specify a file name, plot label, or any other piece of textual information.
n = strlength(str)
n = 12
If the text includes double quotes, use two double quotes within the definition.
str = "They said, ""Welcome!"" and waved."
str =
"They said, "Welcome!" and waved."
To add text to the end of a string, use the plus operator, +. If a variable can be converted to a string,
then plus converts it and appends it.
fahrenheit = 71;
celsius = (fahrenheit-32)/1.8;
tempText = "temperature is " + celsius + "C"
tempText =
"temperature is 21.6667C"
Starting in R2019a, you can also concatenate text using the append function.
tempText2 = append("Today's ",tempText)
tempText2 =
"Today's temperature is 21.6667C"
The string function can convert different types of inputs, such as numeric, datetime, duration, and
categorical values. For example, convert the output of pi to a string.
ps = string(pi)
ps =
"3.1416"
You can store multiple pieces of text in a string array. Each element of the array can contain a string
having a different number of characters, without padding.
str = ["Mercury","Gemini","Apollo";...
"Skylab","Skylab B","ISS"]
str is a 2-by-3 string array. You can find the lengths of the strings with the strlength function.
N = strlength(str)
N = 2×3
7 6 6
6 8 3
As of R2018b, string arrays are supported throughout MATLAB and MathWorks® products. Functions
that accept character arrays (and cell arrays of character vectors) as inputs also accept string arrays.
To store a 1-by-n sequence of characters as a character vector, using the char data type, enclose it in
single quotes.
chr = 'Hello, world'
chr =
'Hello, world'
The text 'Hello, world' is 12 characters long, and chr stores it as a 1-by-12 character vector.
whos chr
If the text includes single quotes, use two single quotes within the definition.
chr = 'They said, ''Welcome!'' and waved.'
chr =
'They said, 'Welcome!' and waved.'
Character vectors have two principal uses:
• To specify single pieces of text, such as file names and plot labels.
• To represent data that is encoded using characters. In such cases, you might need easy access to
individual characters.
seq = 'GCTAGAATCC';
You can access individual characters or subsets of characters by indexing, just as you would index
into a numeric array.
seq(4:6)
ans =
'AGA'
Concatenate character vectors with square brackets, just as you concatenate other types of arrays.
seq2 = [seq 'ATTAGAAACC']
seq2 =
'GCTAGAATCCATTAGAAACC'
Starting in R2019a, you also can concatenate text using append. The append function is
recommended because it treats string arrays, character vectors, and cell arrays of character vectors
consistently.
seq2 = append(seq,'ATTAGAAACC')
seq2 =
'GCTAGAATCCATTAGAAACC'
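One reason append behaves more consistently is that square brackets mean different things for the two text types. The following sketch contrasts them; the variable names c, s, a1, and a2 are introduced here for illustration.

```matlab
c  = ['GCTA' 'GAAT'];            % char inputs: one 1-by-8 character vector
s  = ["GCTA" "GAAT"];            % string inputs: a 1-by-2 string array
a1 = append('GCTA','GAAT');      % char inputs give a character vector
a2 = append("GCTA","GAAT");      % string inputs give a string scalar
```

With square brackets, character vectors are joined into one piece of text while strings become separate array elements. With append, both types are joined into one piece of text.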
MATLAB functions that accept string arrays as inputs also accept character vectors and cell arrays of
character vectors.
See Also
string | char | cellstr | strlength | plus | horzcat | append
Related Examples
• “Create String Arrays” on page 6-5
• “Analyze Text Data with String Arrays” on page 6-15
• “Frequently Asked Questions About String Arrays” on page 6-59
• “Update Your Code to Accept Strings” on page 6-64
• “Cell Arrays of Character Vectors” on page 6-12
Create String Arrays
MATLAB® provides string arrays to store pieces of text. Each element of a string array contains a
1-by-n sequence of characters. Create a string by enclosing text in double quotes.
str = "Hello, world"
str =
"Hello, world"
As an alternative, you can convert a character vector to a string using the string function. chr is a
1-by-17 character vector. str is a 1-by-1 string that has the same text as the character vector.
chr = 'Greetings, friend'
chr =
'Greetings, friend'
str = string(chr)
str =
"Greetings, friend"
Create a string array containing multiple strings using the [] operator. str is a 2-by-3 string array
that contains six strings.
str = ["Mercury","Gemini","Apollo";
"Skylab","Skylab B","ISS"]
Find the length of each string in str with the strlength function. Use strlength, not length, to
determine the number of characters in strings.
L = strlength(str)
L = 2×3
7 6 6
6 8 3
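The distinction between strlength and length is easy to overlook, so a short comparison may help. The names nChars, nLong, and nElems are introduced here for illustration.

```matlab
str = ["Mercury","Gemini","Apollo"; "Skylab","Skylab B","ISS"];

nChars = strlength(str);   % number of characters in each element
nLong  = length(str);      % size of the largest array dimension, not a character count
nElems = numel(str);       % number of strings in the array
```

For this 2-by-3 array, length returns 3 and numel returns 6, regardless of how many characters each string contains.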
As an alternative, you can convert a cell array of character vectors to a string array using the string
function. MATLAB displays strings in string arrays with double quotes, and displays character
vectors in cell arrays with single quotes.
C = {'Mercury','Venus','Earth'}
C = 1x3 cell
{'Mercury'} {'Venus'} {'Earth'}
str = string(C)
In addition to character vectors, you can convert numeric, datetime, duration, and categorical values
to strings using the string function.
ans =
"01-Sep-2021 16:04:06"
Also, you can read text from files into string arrays using the readtable, textscan, and fscanf
functions.
String arrays can contain both empty and missing values. An empty string contains zero characters.
When you display an empty string, the result is a pair of double quotes with nothing between them
(""). The missing string is the string equivalent to NaN for numeric arrays. It indicates where a string
array has missing values. When you display a missing string, the result is <missing>, with no
quotation marks.
Create an empty string array using the strings function. When you call strings with no
arguments, it returns an empty string. Note that the size of str is 1-by-1, not 0-by-0. However, str
contains zero characters.
str = strings
str =
""
Create an empty character vector using single quotes. Note that the size of chr is 0-by-0.
chr = ''
chr =
0×0 empty char array
Create a string array where every element is an empty string. You can preallocate a string array with
the strings function.
str = strings(2,3)
To create a missing string, convert a missing value using the string function. The missing string
displays as <missing>.
str = string(missing)
str =
<missing>
You can create a string array with both empty and missing strings. Use the ismissing function to
determine which elements are strings with missing values. Note that the empty string is not a
missing string.
str(1) = "";
str(2) = "Gemini";
str(3) = string(missing)
ismissing(str)
0 0 1
Compare a missing string to another string. The result is always 0 (false), even when you compare a
missing string to another missing string.
str = string(missing);
str == "Gemini"
ans = logical
0
str == string(missing)
ans = logical
0
String arrays support array operations such as indexing and reshaping. Use array indexing to access
the first row of str and all the columns.
str = ["Mercury","Gemini","Apollo";
"Skylab","Skylab B","ISS"];
str(1,:)
str(2,2)
ans =
"Skylab B"
Assign a new string outside the bounds of str. MATLAB expands the array and fills unallocated
elements with missing values.
str(3,4) = "Mir"
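The effect of the expansion can be inspected with size and ismissing. This is a small sketch; newSize and filled are names introduced here for illustration.

```matlab
str = ["Mercury","Gemini","Apollo"; "Skylab","Skylab B","ISS"];
str(3,4) = "Mir";            % assignment outside the original 2-by-3 bounds

newSize = size(str);         % the array grows to 3-by-4
filled  = ismissing(str);    % true for the elements MATLAB filled in
```

Of the 12 elements in the expanded array, 7 hold text and the remaining 5 are missing strings.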
You can index into a string array using curly braces, {}, to access characters directly. Use curly
braces when you need to access and modify characters within a string element. Indexing with curly
braces provides compatibility for code that could work with either string arrays or cell arrays of
character vectors. But whenever possible, use string functions to work with the characters in strings.
Access the second element in the second row with curly braces. chr is a character vector, not a
string.
str = ["Mercury","Gemini","Apollo";
"Skylab","Skylab B","ISS"];
chr = str{2,2}
chr =
'Skylab B'
Access the character vector and return the first three characters.
str{2,2}(1:3)
ans =
'Sky'
Find the space characters in a string and replace them with dashes. Use the isspace function to
inspect individual characters within the string. isspace returns a logical vector that contains a true
value wherever there is a space character. Finally, display the modified string element, str(2,2).
TF = isspace(str{2,2})
0 0 0 0 0 0 1 0
str{2,2}(TF) = "-";
str(2,2)
ans =
"Skylab-B"
Note that in this case, you can also replace spaces using the replace function, without resorting to
curly brace indexing.
replace(str(2,2)," ","-")
ans =
"Skylab-B"
Concatenate strings into a string array just as you would concatenate arrays of any other kind.
Transpose str1 and str2. Concatenate them and then vertically concatenate column headings onto
the string array. When you concatenate character vectors into a string array, the character vectors
are automatically converted to strings.
str1 = ["Mercury","Gemini","Apollo"];
str2 = ["Skylab","Skylab B","ISS"];
str1 = str1';
str2 = str2';
str = [str1 str2];
str = [["Mission:","Station:"] ; str]
To append text to strings, use the plus operator, +. The plus operator appends text to strings but
does not change the size of a string array.
Append a last name to an array of names. If you append a character vector to strings, then the
character vector is automatically converted to a string.
names = ["Mary";"John";"Elizabeth";"Paul";"Ann"];
names = names + ' Smith'
"Mary Smith"
"John Smith"
"Elizabeth Smith"
"Paul Smith"
"Ann Smith"
Append different last names. You can append text to a string array from a string array or from a cell
array of character vectors. When you add nonscalar arrays, they must be the same size.
names = ["Mary";"John";"Elizabeth";"Paul";"Ann"];
lastnames = ["Jones";"Adams";"Young";"Burns";"Spencer"];
names = names + " " + lastnames
Append a missing string. When you append a missing string with the plus operator, the output is a
missing string.
str1 = "Jones";
str2 = string(missing);
str1 + str2
ans =
<missing>
MATLAB provides a rich set of functions to work with string arrays. For example, you can use the
split, join, and sort functions to rearrange the string array names so that the names are in
alphabetical order by last name.
Split names on the space characters. Splitting changes names from a 5-by-1 string array to a 5-by-2
array.
names = ["Mary Jones";"John Adams";"Elizabeth Young";"Paul Burns";"Ann Spencer"];
names = split(names)
Switch the columns of names so that the last names are in the first column. Add a comma after each
last name.
names = [names(:,2) names(:,1)];
names(:,1) = names(:,1) + ','
"Jones," "Mary"
"Adams," "John"
"Young," "Elizabeth"
"Burns," "Paul"
"Spencer," "Ann"
Join the last and first names. The join function places a space character between the strings it joins.
After the join, names is a 5-by-1 string array.
names = join(names)
names = sort(names)
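The split, reorder, join, and sort steps can be combined into a compact, self-contained sketch. It uses a shorter hypothetical list of three names, and the variables parts and sorted are introduced here for illustration.

```matlab
names = ["Mary Jones";"John Adams";"Ann Spencer"];

parts  = split(names);                     % 3-by-2: first and last names
parts  = [parts(:,2) + ",", parts(:,1)];   % last name with comma, then first name
sorted = sort(join(parts));                % 3-by-1, ordered by last name
```

Because the last name now comes first in each string, an ordinary sort orders the list by last name.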
See Also
string | strings | strlength | ismissing | isspace | plus | split | join | sort
Related Examples
• “Analyze Text Data with String Arrays” on page 6-15
• “Search and Replace Text” on page 6-37
• “Compare Text” on page 6-32
• “Test for Empty Strings and Missing Values” on page 6-20
• “Frequently Asked Questions About String Arrays” on page 6-59
• “Update Your Code to Accept Strings” on page 6-64
Cell Arrays of Character Vectors
Note
• As of R2018b, the recommended way to store text is to use string arrays. If you create variables
that have the string data type, store them in string arrays, not cell arrays. For more information,
see “Text in String and Character Arrays” on page 6-2 and “Update Your Code to Accept Strings”
on page 6-64.
• While the phrase cell array of strings frequently has been used to describe such cell arrays, the
phrase is no longer accurate because such a cell array holds character vectors, not strings.
To create a cell array of character vectors, use curly braces, {}, just as you would to create any cell
array.
C = {'Li','Sanchez','Jones','Yang','Larson'}
C = 1x5 cell
{'Li'} {'Sanchez'} {'Jones'} {'Yang'} {'Larson'}
The character vectors in C can have different lengths because a cell array does not require that its
contents have the same size. To determine the lengths of the character vectors in C, use the
strlength function.
L = strlength(C)
L = 1×5
2 7 5 4 6
chr = C{1}
chr =
'Li'
C{1} = 'Yang'
C = 1x5 cell
{'Yang'} {'Sanchez'} {'Jones'} {'Yang'} {'Larson'}
To refer to a subset of cells, instead of their contents, index using smooth parentheses.
C(1:3)
While you can access the contents of cells by indexing, most functions that accept cell arrays as
inputs operate on the entire cell array. For example, you can use the strcmp function to compare the
contents of C to a character vector. strcmp returns 1 where there is a match and 0 otherwise.
TF = strcmp(C,'Yang')
1 0 0 1 0
num = sum(TF)
num = 2
Use TF as logical indices to return the matches in C. If you index using smooth parentheses, then the
output is a cell array containing only the matches.
M = C(TF)
M = 1x2 cell
{'Yang'} {'Yang'}
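The difference between parentheses and curly braces when retrieving matches can be sketched as follows. The names asCell and firstMatch are introduced here for illustration.

```matlab
C  = {'Yang','Sanchez','Jones','Yang','Larson'};
TF = strcmp(C,'Yang');         % logical 1 where the cell contents match

asCell     = C(TF);            % parentheses keep the result as a cell array
firstMatch = C{find(TF,1)};    % curly braces return the character vector itself
```

Use parentheses to pass a subset of cells to another function, and curly braces when the text itself is needed.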
You can convert cell arrays of character vectors to string arrays. To convert a cell array of character
vectors, use the string function.
C = {'Li','Sanchez','Jones','Yang','Larson'}
C = 1x5 cell
{'Li'} {'Sanchez'} {'Jones'} {'Yang'} {'Larson'}
str = string(C)
In fact, the string function converts any cell array, so long as all of the contents can be converted to
strings.
str2 = string(C2)
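The definition of C2 does not appear in the text above. As a minimal sketch, assume a hypothetical cell array that mixes numbers and text; the contents chosen here are an assumption for illustration.

```matlab
% Hypothetical mixed cell array (an assumption, not the original definition of C2)
C2 = {5, 10, 'some text'};

% Each cell is converted to a string, giving a 1-by-3 string array
str2 = string(C2);
```

The numeric cells become the strings "5" and "10", and the character vector becomes a string with the same text.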
See Also
cellstr | char | iscellstr | strcmp | string
More About
• “Text in String and Character Arrays” on page 6-2
• “Access Data in Cell Array” on page 12-5
• “Create String Arrays” on page 6-5
• “Update Your Code to Accept Strings” on page 6-64
• “Frequently Asked Questions About String Arrays” on page 6-59
Analyze Text Data with String Arrays
Read text from Shakespeare's Sonnets with the fileread function. fileread returns the text as a
1-by-100266 character vector.
sonnets = fileread('sonnets.txt');
sonnets(1:35)
ans =
'THE SONNETS
by William Shakespeare'
Convert the text to a string using the string function. Then, split it on newline characters using the
splitlines function. sonnets becomes a 2625-by-1 string array, where each string contains one
line from the poems. Display the first five lines of sonnets.
sonnets = string(sonnets);
sonnets = splitlines(sonnets);
sonnets(1:5)
To calculate the frequency of the words in sonnets, first clean it by removing empty strings and
punctuation marks. Then reshape it into a string array that contains individual words as elements.
Remove the strings with zero characters ("") from the string array. Compare each element of
sonnets to "", the empty string. Starting in R2017a, you can create strings, including an empty
string, using double quotes. TF is a logical vector that contains a true value wherever sonnets
contains a string with zero characters. Index into sonnets with TF and delete all strings with zero
characters.
TF = (sonnets == "");
sonnets(TF) = [];
sonnets(1:10)
Replace some punctuation marks with space characters. For example, replace periods, commas, and
semicolons. Keep apostrophes because they can be part of some words in the Sonnets, such as
light's.
p = [".","?","!",",",";",":"];
sonnets = replace(sonnets,p," ");
sonnets(1:10)
Strip leading and trailing space characters from each element of sonnets.
sonnets = strip(sonnets);
sonnets(1:10)
Split sonnets into a string array whose elements are individual words. You can use the split
function to split elements of a string array on whitespace characters, or on delimiters that you
specify. However, split requires that every element of a string array be divisible into an equal
number of new strings. The elements of sonnets have different numbers of spaces, and therefore are
not divisible into equal numbers of strings. To use the split function on sonnets, write a for-loop
that calls split on one element at a time.
Create the empty string array sonnetWords using the strings function. Write a for-loop that splits
each element of sonnets using the split function. Concatenate the output from split onto
sonnetWords. Each element of sonnetWords is an individual word from sonnets.
sonnetWords = strings(0);
for i = 1:length(sonnets)
    sonnetWords = [sonnetWords ; split(sonnets(i))];
end
Find the unique words in sonnetWords. Count them and sort them based on their frequency.
To count words that differ only by case as the same word, convert sonnetWords to lowercase. For
example, The and the count as the same word. Find the unique words using the unique function.
Then, count the number of times each unique word occurs using the histcounts function.
sonnetWords = lower(sonnetWords);
[words,~,idx] = unique(sonnetWords);
numOccurrences = histcounts(idx,numel(words));
Sort the words in sonnetWords by number of occurrences, from most to least common.
[rankOfOccurrences,rankIndex] = sort(numOccurrences,'descend');
wordsByFrequency = words(rankIndex);
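The same counting steps can be tried on a tiny input without the Sonnets file. The sketch below uses two hypothetical lines of text and swaps in accumarray for histcounts to tally the indices returned by unique; both the sample text and that substitution are assumptions made for illustration.

```matlab
% Hypothetical two-line text (not the Sonnets file)
txt = ["the cat and the hat"; "and the bat"];

% Split one element at a time, because the lines have different word counts
allWords = strings(0);
for k = 1:numel(txt)
    allWords = [allWords; split(txt(k))]; %#ok<AGROW>
end

% idx maps each word to its position in the sorted list of unique words
[words,~,idx] = unique(allWords);
counts = accumarray(idx,1);            % tally occurrences per unique word

% Rank the unique words from most to least common
[sortedCounts,order] = sort(counts,'descend');
wordsByFrequency = words(order);
```

In this input, "the" occurs three times and "and" twice, so they rank first and second.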
Plot the occurrences of words in the Sonnets from the most to least common words. Zipf's Law states
that the distribution of occurrences of words in a large body of text follows a power-law distribution.
loglog(rankOfOccurrences);
xlabel('Rank of word (most to least common)');
ylabel('Number of Occurrences');
Calculate the total number of occurrences of each word in sonnetWords. Calculate the number of
occurrences as a percentage of the total number of words, and calculate the cumulative percentage
from most to least common. Write the words and the basic statistics for them to a table.
numOccurrences = numOccurrences(rankIndex);
numOccurrences = numOccurrences';
numWords = length(sonnetWords);
T = table;
T.Words = wordsByFrequency;
T.NumOccurrences = numOccurrences;
T.PercentOfText = numOccurrences / numWords * 100.0;
T.CumulativePercentOfText = cumsum(numOccurrences) / numWords * 100.0;
T(1:10,:)
ans=10×4 table
Words NumOccurrences PercentOfText CumulativePercentOfText
______ ______________ _____________ _______________________
The most common word in the Sonnets, and, occurs 490 times. Together, the ten most common words
account for 20.163% of the text.
See Also
string | split | join | unique | replace | lower | splitlines | histcounts | strip | sort |
table
Related Examples
• “Create String Arrays” on page 6-5
• “Search and Replace Text” on page 6-37
• “Compare Text” on page 6-32
• “Test for Empty Strings and Missing Values” on page 6-20
Test for Empty Strings and Missing Values
You can test a string array for empty strings using the == operator.
Starting in R2017a, you can create an empty string using double quotes with nothing between them
(""). Note that the size of str is 1-by-1, not 0-by-0. However, str contains zero characters.
str = ""
str =
""
Create an empty character vector using single quotes. Note that the size of chr is 0-by-0. The
character array chr actually is an empty array, and not just an array with zero characters.
chr = ''
chr =
Create an array of empty strings using the strings function. Each element of the array is a string
with no characters.
str2 = strings(1,3)
if (str == "")
disp 'str has zero characters'
end
Do not use the isempty function to test for empty strings. A string with zero characters still has a
size of 1-by-1. However, you can test if a string array has at least one dimension with a size of zero
using the isempty function.
Create an empty string array using the strings function. To be an empty array, at least one
dimension must have a size of zero.
str = strings(0,3)
str =
0×3 empty string array
isempty(str)
ans = logical
1
Test a string array for empty strings. The == operator returns a logical array that is the same size as
the string array.
str = ["Mercury","","Apollo"]
str == ''
0 1 0
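The difference between an empty string and an empty array can be summarized in a short sketch. The variable names are introduced here for illustration.

```matlab
str = ["Mercury","","Apollo"];

byCompare = (str == "");             % elementwise test for zero-character strings
byLength  = (strlength(str) == 0);   % equivalent test using the character count

arrayIsEmpty  = isempty(str);        % false: the array is 1-by-3
scalarIsEmpty = isempty("");         % false: "" is a 1-by-1 string
noElements    = isempty(strings(0,3)); % true: one dimension has size zero
```

isempty answers a question about array size, whereas == "" and strlength answer a question about the text in each element.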
Strings always contain the empty string as a substring. In fact, the empty string is always at both the
start and the end of every string. Also, the empty string is always found between any two consecutive
characters in a string.
str = "Hello, world";
TF = contains(str,"")
TF = logical
1
TF = startsWith(str,"")
TF = logical
1
Count the number of characters in str. Then count the number of empty strings in str. The count
function counts empty strings at the beginning and end of str, and between each pair of characters.
Therefore if str has N characters, it also has N+1 empty strings.
str
str =
"Hello, world"
strlength(str)
ans = 12
count(str,"")
ans = 13
Replace a substring with the empty string. When you call replace with an empty string, it removes
the substring and replaces it with a string that has zero characters.
replace(str,"world","")
ans =
"Hello, "
Insert a substring after empty strings using the insertAfter function. Because there are empty
strings between each pair of characters, insertAfter inserts substrings between each pair.
insertAfter(str,"","-")
ans =
"-H-e-l-l-o-,- -w-o-r-l-d-"
In general, string functions that replace, erase, extract, or insert substrings allow you to specify
empty strings as the starts and ends of the substrings to modify. When you do so, these functions
operate on the start and end of the string, and between every pair of characters.
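A shorter string makes the pattern easier to follow. This sketch repeats the count, insertAfter, and replace calls on the hypothetical string "abc".

```matlab
str = "abc";

n       = count(str,"");             % N+1 empty strings for N characters
dashed  = insertAfter(str,"","-");   % insert at the start, end, and between characters
removed = replace(str,"b","");       % replacing with "" deletes the substring
```

With three characters there are four empty-string positions, so insertAfter adds four dashes.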
You can test a string array for missing values using the ismissing function. The missing string is the
string equivalent to NaN for numeric arrays. It indicates where a string array has missing values. The
missing string displays as <missing>.
To create a missing string, convert a missing value using the string function.
str = string(missing)
str =
<missing>
You can create a string array with both empty and missing strings. Use the ismissing function to
determine which elements are strings with missing values. Note that the empty string is not a
missing string.
str(1) = "";
str(2) = "Gemini";
str(3) = string(missing)
ismissing(str)
0 0 1
Compare str to a missing string. The comparison is always 0 (false), even when you compare a
missing string to another missing string.
str == string(missing)
0 0 0
To find missing strings, use the ismissing function. Do not use the == operator.
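The contrast between the two tests, and one common use of the result, can be sketched as follows. The name cleaned is introduced here for illustration.

```matlab
str = ["", "Gemini", string(missing)];

byEq        = (str == string(missing));   % always false, even for the missing element
byIsMissing = ismissing(str);             % true only for the missing element

% Keep only the elements that are not missing; the empty string is kept
cleaned = str(~byIsMissing);
```

Like NaN in numeric arrays, a missing string is not equal to anything, including itself, which is why a dedicated test function is needed.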
See Also
string | strings | strlength | ismissing | contains | startsWith | endsWith | erase |
extractBetween | extractBefore | extractAfter | insertAfter | insertBefore | replace |
replaceBetween | eraseBetween | eq | all | any
Related Examples
• “Create String Arrays” on page 6-5
• “Analyze Text Data with String Arrays” on page 6-15
• “Search and Replace Text” on page 6-37
• “Compare Text” on page 6-32
Formatting Text
To convert data to text and control its format, you can use formatting operators with common
conversion functions, such as num2str and sprintf. These operators control notation, alignment,
significant digits, and so on. They are similar to those used by the printf function in the C
programming language. Typical uses for formatted text include text for display and output files.
For example, %f converts floating-point values to text using fixed-point notation. Adjust the format by
adding information to the operator, such as %.2f to represent two digits after the decimal mark, or
%12f to represent 12 characters in the output, padding with spaces as needed.
A = pi*ones(1,3);
txt = sprintf('%f | %.2f | %12f', A)
txt =
'3.141593 | 3.14 |     3.141593'
You can combine operators with ordinary text and special characters in a format specifier. For
instance, \n inserts a newline character.
txt = sprintf('Displaying pi: \n %f \n %.2f \n %12f', A)
txt =
'Displaying pi:
 3.141593
 3.14
     3.141593'
Functions that support formatting operators are compose, num2str, sprintf, fprintf, and the
error handling functions assert, error, warning, and MException.
Conversion Character
The conversion character specifies the notation of the output. It consists of a single character and
appears last in the format specifier.
Specifier Description
c Single character.
d Decimal notation (signed).
e Exponential notation (using a lowercase e, as in 3.1415e+00).
E Exponential notation (using an uppercase E, as in 3.1415E+00).
f Fixed-point notation.
g The more compact of %e or %f. (Insignificant zeroes do not print.)
G Same as %g, but using an uppercase E.
o Octal notation (unsigned).
s Character vector or string array.
u Decimal notation (unsigned).
x Hexadecimal notation (unsigned, using lowercase letters a–f).
X Hexadecimal notation (unsigned, using uppercase letters A–F).
For example, format the number 46 using different conversion characters to display the number in
decimal, fixed-point, exponential, and hexadecimal formats.
A = 46*ones(1,4);
txt = sprintf('%d %f %e %X', A)
txt =
'46 46.000000 4.600000e+01 2E'
Subtype
The subtype field is a single alphabetic character that immediately precedes the conversion
character. Without the subtype field, the conversion characters %o, %x, %X, and %u treat input data as
integers. To treat input data as floating-point values instead and convert them to octal, decimal, or
hexadecimal representations, use one of the following subtype specifiers.
b The input data are double-precision floating-point values rather than unsigned integers. For
example, to print a double-precision value in hexadecimal, use a format like %bx.
t The input data are single-precision floating-point values rather than unsigned integers.
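As a small illustration of the b subtype, the following sketch prints the bit pattern of a double in hexadecimal and cross-checks it against num2hex. The variable names are chosen here for illustration only.

```matlab
% %bx prints the IEEE 754 bit pattern of a double as hexadecimal digits.
hexFromSprintf = sprintf('%bx', pi);
% num2hex returns the same kind of representation and serves as a cross-check.
hexFromNum2hex = num2hex(pi);
disp(hexFromSprintf)
disp(isequal(hexFromSprintf, hexFromNum2hex))
```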
Precision
The precision field in a formatting operator is a nonnegative integer that immediately follows a
period. For example, in the operator %7.3f, the precision is 3. For the %g operator, the precision
indicates the number of significant digits to display. For the %f, %e, and %E operators, the precision
indicates how many digits to display to the right of the decimal point.
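The call behind the output that follows is not included in this copy. Assuming the input value is pi*50 (about 157.0796), a call of this shape shows how the precision field affects %g and %f:

```matlab
x = pi*50;   % assumed input value, about 157.0796
% %g uses 6 significant digits by default, %.2g uses 2;
% %f uses 6 digits after the decimal point by default, %.2f uses 2.
txt = sprintf('%g %.2g %f %.2f', x, x, x, x)
```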
txt =
'157.08 1.6e+02 157.079633 157.08'
While you can specify the precision in a formatting operator for input text (for example, in the %s
operator), there is usually no reason to do so. If you specify the precision as p, and p is less than the
number of characters in the input text, then the output contains only the first p characters.
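A brief sketch of this truncation behavior, with variable names chosen for illustration:

```matlab
% Precision 3 is less than the 6 characters of the input, so text is cut.
short = sprintf('%.3s', 'MATLAB')
% Precision 10 exceeds the input length, so the text is unchanged.
same = sprintf('%.10s', 'MATLAB')
```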
Field Width
The field width in a formatting operator is a nonnegative integer that specifies the minimum number of
digits or characters in the output when formatting input values. For example, in the operator %7.3f, the
field width is 7.
Specify different field widths. To show the width for each output, use the | character. By default, the
output text is padded with space characters when the field width is greater than the number of
characters.
txt = sprintf('|%e|%15e|%f|%15f|', pi*50*ones(1,4))
txt =
'|1.570796e+02|   1.570796e+02|157.079633|     157.079633|'
When used on text input, the field width can determine whether to pad the output text with spaces. If
the field width is less than or equal to the number of characters in the input text, then it has no effect.
txt = sprintf('%30s', 'Pad left with spaces')
txt =
'          Pad left with spaces'
Flags
Optional flags control additional formatting of the output text. The table describes the characters you
can use as flags.
Right- and left-justify the output. The default behavior is to right-justify the output text.
txt = sprintf('right-justify: %12.2f\nleft-justify: %-12.2f',...
12.3, 12.3)
txt =
'right-justify:        12.30
left-justify: 12.30       '
Display a + sign for positive numbers. The default behavior is to omit the leading + sign for positive
numbers.
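The call for the output that follows is missing from this copy. A sketch of a call that would produce it, assuming the value 12.3 and a field width of 12 as in the justification example:

```matlab
% The + flag forces a leading sign on positive numbers.
txt = sprintf('no sign: %12.2f\nsign: %+12.2f', 12.3, 12.3)
```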
txt =
'no sign:        12.30
sign:       +12.30'
Pad to the left with spaces and zeroes. The default behavior is to pad with spaces.
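The call for the output that follows is also missing. A sketch that would produce it, assuming the value 5.2 and a field width of 12:

```matlab
% The 0 flag pads the field with zeroes instead of spaces.
txt = sprintf('Pad with spaces: %12.2f\nPad with zeroes: %012.2f', 5.2, 5.2)
```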
txt =
'Pad with spaces:         5.20
Pad with zeroes: 000000005.20'
Note You can specify more than one flag in a formatting operator.
Value Identifiers
By default, functions such as sprintf insert values from input arguments into the output text in
sequential order. To process the input arguments in a nonsequential order, specify the order using
numeric identifiers in the format specifier. Specify nonsequential arguments with an integer
immediately following the % sign, followed by a $ sign.
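The difference between the two orderings can be sketched as follows. The input text values are placeholders chosen for illustration.

```matlab
% Sequential order: arguments are consumed left to right.
seq = sprintf('%s %s %s', '1st', '2nd', '3rd')
% Numbered identifiers: n$ selects the n-th argument after the format.
rev = sprintf('%3$s %2$s %1$s', '1st', '2nd', '3rd')
```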
Special Characters
Special characters can be part of the output text. But because they cannot be entered as ordinary
text, they require specific character sequences to represent them. To insert special characters into
output text, use any of the character sequences in the table.
The figure illustrates how the field width and precision settings affect the output of the formatting
functions. In this figure, the zero following the % sign in the formatting operator means to add leading
zeroes to the output text rather than space characters.
• If the field width w is greater than p+1+n, where p is the precision and n is the number of digits in the whole part of the input value, then the whole part of the output value is padded to the
left with w-(p+1+n) additional characters. The additional characters are space characters unless
the formatting operator includes the 0 flag. In that case, the additional characters are zeroes.
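A worked instance of this rule, using an input value chosen for illustration: with n = 3 and p = 3, the value needs p+1+n = 7 characters, so a field width of 9 leaves 2 characters of padding.

```matlab
x = 123.45678;                     % n = 3 digits in the whole part
padSpaces = sprintf('%9.3f', x)    % two leading spaces
padZeroes = sprintf('%09.3f', x)   % two leading zeroes
```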
You can specify the field width and precision using values from a sequential argument list. Use an
asterisk (*) in place of the field width or precision fields of the formatting operator.
For example, format and display three numbers. In each case, use an asterisk to specify that the field
width or precision come from input arguments that follow the format specifier.
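The call for this example is not included in this copy. A sketch consistent with the output below, assuming the three inputs 123.45678, 16.42837, and pi:

```matlab
% Each asterisk takes its value from the argument list, in order.
txt = sprintf('%*f   %.*f   %*.*f', ...
    15, 123.45678, ...   % field width 15 for the first value
    3, 16.42837, ...     % precision 3 for the second value
    6, 4, pi)            % field width 6 and precision 4 for the third value
```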
txt =
'     123.456780   16.428   3.1416'
The table describes the effects of each formatting operator in the example.
You can mix the two styles. For example, get the field width from the following input argument and
the precision from the format specifier.
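A sketch of such a mixed call, assuming the input 123.45678 and a field width of 5 supplied as an argument:

```matlab
% Field width (5) comes from the argument list; precision (2) is in the specifier.
txt = sprintf('%*.2f', 5, 123.45678)
```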
txt =
'123.46'
You also can specify field width and precision as values from a nonsequential argument list, using an
alternate syntax shown in the figure. Within the formatting operator, specify the field width and
precision with asterisks that follow numbered identifiers and $ signs. Specify the values of the field
width and precision with input arguments that follow the format specifier.
For example, format and display three numbers. In each case, use a numbered identifier to specify
that the field width or precision come from input arguments that follow the format specifier.
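The call for this example is not included in this copy. A sketch consistent with the output below, assuming the same three inputs as in the sequential example, with the width and precision values listed after them:

```matlab
% n$ selects the value to format; *m$ takes the width or precision
% from the m-th argument after the format specifier.
txt = sprintf('%1$*4$f   %2$.*5$f   %3$*6$.*7$f', ...
    123.45678, 16.42837, pi, 15, 3, 6, 4)
```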
txt =
'     123.456780   16.428   3.1416'
The table describes the effect of each formatting operator in the example.
If your function call provides more input arguments than there are formatting operators in the format
specifier, then the operators are reused. However, only function calls that use sequential ordering
reuse formatting operators. You cannot reuse formatting operators when you use numbered
identifiers.
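A sketch of the difference, assuming four integer inputs and a single %d operator:

```matlab
% Sequential ordering: the single operator is reused for every argument.
reused = sprintf('%d', 1, 2, 3, 4)
% Numbered identifier: the operator is applied once, to the first argument.
single = sprintf('%1$d', 1, 2, 3, 4)
```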
Sequential ordering: ans = '1234'
Numbered identifiers: ans = '1'
If you use numbered identifiers when the input data is a vector or array, then the output does not
contain formatted data.
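One way to work around this restriction, offered here as a suggestion rather than something the text above prescribes, is to pass each element as a separate argument by expanding a cell array:

```matlab
v = [1.4 2.7 3.1];
% Sequential operators accept the vector directly.
seq = sprintf('%.4f %.4f %.4f', v)
% For numbered identifiers, pass each element as its own argument.
c = num2cell(v);
reordered = sprintf('%3$.4f %1$.4f %2$.4f', c{:})
```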
See Also
compose | sprintf | fprintf | num2str
Related Examples
• “Convert Text to Numeric Values” on page 6-49
• “Convert Numeric Values to Text” on page 6-45
Compare Text
Compare text in character arrays and string arrays in different ways. String arrays were introduced
in R2016b. You can compare string arrays and character vectors with relational operators and with
the strcmp function. You can sort string arrays using the sort function, just as you would sort
arrays of any other type. MATLAB® also provides functions to inspect characters in pieces of text.
For example, you can determine which characters in a character vector or string array are letters or
space characters.
You can compare string arrays for equality with the relational operators == and ~=. When you
compare string arrays, the output is a logical array that has 1 where the relation is true, and 0 where
it is not true.
Create two string scalars. Starting in R2017a, you can create strings using double quotes.
str1 = "Hello";
str2 = "World";
str1,str2
str1 =
"Hello"
str2 =
"World"
str1 == str2
ans = logical
0
str1 = ["Mercury","Gemini","Apollo";...
"Skylab","Skylab B","International Space Station"];
str2 = "Apollo";
str1 == str2
0 0 1
0 0 0
Compare a string array to a character vector. As long as one of the variables is a string array, you can
make the comparison.
chr = 'Gemini';
TF = (str1 == chr)
0 1 0
0 0 0
Index into str1 with TF to extract the string elements that matched Gemini. You can use logical
arrays to index into an array.
str1(TF)
ans =
"Gemini"
Compare for inequality using the ~= operator. Index into str1 to extract the elements that do not
match 'Gemini'.
TF = (str1 ~= chr)
1 0 1
1 1 1
str1(TF)
Compare two nonscalar string arrays. When you compare two nonscalar arrays, they must be the
same size.
str2 = ["Mercury","Mars","Apollo";...
"Jupiter","Saturn","Neptune"];
TF = (str1 == str2)
1 0 1
0 0 0
str1(TF)
You can also compare strings with the relational operators >, >=, <, and <=. Strings that start with
uppercase letters come before strings that start with lowercase letters. For example, the string
"ABC" is less than "abc". Digits and some punctuation marks also come before letters.
ans = logical
1
Compare a string array that contains names to another name with the > operator. The names
Sanchez, de Ponte, and Nash come after Matthews, because S, d, and N all are greater than M.
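The code for this comparison is missing from this copy. A sketch consistent with the result below, where "Jones" and "Crosby" are placeholder names and the other three come from the text:

```matlab
% "Jones" and "Crosby" are placeholders; both sort before "Matthews".
str = ["Sanchez","Jones","de Ponte","Crosby","Nash"];
TF = (str > "Matthews")
```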
1 0 1 0 1
str(TF)
You can sort string arrays. MATLAB® stores characters as Unicode® using the UTF-16 character
encoding scheme. Character and string arrays are sorted according to the UTF-16 code point order.
For the characters that are also the ASCII characters, this order means that uppercase letters come
before lowercase letters. Digits and some punctuation also come before letters.
sort(str)
Sort a 2-by-3 string array. The sort function sorts the elements in each column separately.
sort(str2)
To sort the elements in each row, sort str2 along the second dimension.
sort(str2,2)
You can compare character vectors and cell arrays of character vectors to each other. Use the
strcmp function to compare two character vectors, or strncmp to compare the first N characters.
You also can use strcmpi and strncmpi for case-insensitive comparisons.
Compare two character vectors with the strcmp function. chr1 and chr2 are not equal.
chr1 = 'hello';
chr2 = 'help';
TF = strcmp(chr1,chr2)
TF = logical
0
Note that the MATLAB strcmp differs from the C version of strcmp. The C version of strcmp
returns 0 when two character arrays are the same, not when they are different.
Compare the first two characters with the strncmp function. TF is 1 because both character vectors
start with the characters he.
TF = strncmp(chr1,chr2,2)
TF = logical
1
Compare two cell arrays of character vectors. strcmp returns a logical array that is the same size as
the cell arrays.
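The cell arrays for this example are not included in this copy. A sketch with placeholder contents, chosen so that only the first pair matches:

```matlab
% Placeholder contents; only the first elements are equal.
C1 = {'pizza'; 'chips'; 'candy'};
C2 = {'pizza'; 'chocolate'; 'pretzels'};
TF = strcmp(C1, C2)
```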
1
0
0
You can inspect the characters in string arrays or character arrays with the isstrprop, isletter,
and isspace functions.
Determine which characters in a character vector are space characters. isspace returns a logical
vector that is the same size as chr.
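The definition of chr is missing from this copy. A sketch assuming a phrase whose spacing agrees with the logical output shown below:

```matlab
% Assumed input text; spaces fall at positions 5, 11, 15, 21, and 27.
chr = 'Four score and seven years ago';
TF = isspace(chr);
spacePositions = find(TF)
```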
0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0
The isstrprop function can query characters for many different traits. isstrprop can determine
whether characters in a string or character vector are letters, alphanumeric characters, decimal or
hexadecimal digits, or punctuation characters.
Determine which characters in a string are punctuation marks. isstrprop returns a logical vector
whose length is equal to the number of characters in str.
str =
"A horse! A horse! My kingdom for a horse!"
isstrprop(str,"punct")
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
isstrprop(chr,"alpha")
1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 1 1
See Also
strcmp | sort | isstrprop | isletter | isspace | eq | ne | gt | ge | le | lt
Related Examples
• “Text in String and Character Arrays” on page 6-2
• “Create String Arrays” on page 6-5
• “Analyze Text Data with String Arrays” on page 6-15
• “Search and Replace Text” on page 6-37
• “Test for Empty Strings and Missing Values” on page 6-20
Search and Replace Text
Processing text data often involves finding and replacing substrings. There are several functions that
find text and return different information: some functions confirm that the text exists, while others
count occurrences, find starting indices, or extract substrings. These functions work on character
vectors and string scalars, such as "yes", as well as character and string arrays, such as
["yes","no";"abc","xyz"]. In addition, you can use patterns to define rules for searching, such as
one or more letter or digit characters.
To determine if text is present, use a function that returns logical values, like contains,
startsWith, or endsWith. Logical values of 1 correspond to true, and 0 corresponds to false.
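The definition of txt is missing from this copy. The value below is inferred from the counts, indices, and extracted text shown in the rest of this section, so treat it as an assumption:

```matlab
% Assumed value of txt, inferred from the later results.
txt = "she sells seashells by the seashore";
TF = contains(txt, "sea")
startsWithShe = startsWith(txt, "she")
endsWithShore = endsWith(txt, "shore")
```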
TF = logical
1
Calculate how many times the text occurs using the count function.
n = count(txt,"sea")
n = 2
To locate where the text occurs, use the strfind function, which returns starting indices.
idx = strfind(txt,"sea")
idx = 1×2
11 28
Find and extract text using extraction functions, such as extract, extractBetween,
extractBefore, or extractAfter.
mid = extractBetween(txt,"sea","shore")
mid =
"shells by the sea"
mid = extractBetween(txt,"sea","shore","Boundaries","inclusive")
mid =
"seashells by the seashore"
The search and replacement functions can also find text in multi-element arrays. For example, look
for color names in several song titles.
colors = ["Red","Yellow","Blue","Black","White"];
TF = contains(songs,colors)
1
0
1
To list the songs that contain color names, use the logical TF array as indices into the original songs
array. This technique is called logical indexing.
colorful = songs(TF)
Use the function replace to replace text in songs that matches elements of colors with the string
"Orange".
replace(songs,colors,"Orange")
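The definition of songs is missing from this copy. The steps above can be sketched end to end with hypothetical titles, chosen so that two of the three contain a color name:

```matlab
% Hypothetical song titles.
songs = ["Yellow Submarine"; "Penny Lane"; "Blackbird"];
colors = ["Red","Yellow","Blue","Black","White"];
TF = contains(songs, colors);                % true where any color name occurs
colorful = songs(TF)                         % logical indexing
renamed = replace(songs, colors, "Orange")   % replace every color name
```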
Match Patterns
Since R2020b
In addition to searching for literal text, like “sea” or “yellow”, you can search for text that matches a
pattern. There are many predefined patterns, such as digitsPattern to find numeric digits.
address = "123a Sesame Street, New York, NY 10128";
nums = extract(address,digitsPattern)
For additional precision in searches, you can combine patterns. For example, locate words that start
with the character “S”. Use a string to specify the “S” character, and lettersPattern to find
additional letters after that character.
pat = "S" + lettersPattern;
StartWithS = extract(address,pat)
StartWithS = 2×1 string
"Sesame"
"Street"
See Also
contains | extract | count | pattern | replace | strfind
Related Examples
• “Text in String and Character Arrays” on page 6-2
• “Build Pattern Expressions” on page 6-40
• “Test for Empty Strings and Missing Values” on page 6-20
• “Regular Expressions” on page 2-51
Build Pattern Expressions
Patterns are a tool to aid in searching for and modifying text. Similar to regular expressions, a
pattern defines rules for matching text. Patterns can be used with text-searching functions like
contains, matches, and extract to specify which portions of text these functions act on. You can
build a pattern expression in a way similar to how you would build a mathematical expression, using
pattern functions, operators, and literal text. Because building pattern expressions is open ended,
patterns can become quite complicated. Building patterns in steps and using functions like
maskedPattern and namedPattern can help organize complicated patterns.
The simplest pattern is built from a single pattern function. For example, lettersPattern matches
any letter characters. There are many pattern functions for matching different types of characters
and other features of text. A list of these functions can be found on the pattern reference page.
txt = "abc123def";
pat = lettersPattern;
extract(txt,pat)
Patterns combine with other patterns and literal text by using the plus (+) operator. This operator
appends patterns and text together in the order they are defined in the pattern expression. The
combined patterns only match text in the same order. In this example, "YYYY/MM/DD" is not a match
because a four-letter string must be at the end of the text.
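The code for this example is missing from this copy. A sketch of a pattern that behaves as described, assuming the pattern is two letters, a slash, two letters, a slash, and four letters:

```matlab
% Assumed pattern: letters and slashes must appear in exactly this order.
pat = lettersPattern(2) + "/" + lettersPattern(2) + "/" + lettersPattern(4);
txt = ["MM/DD/YYYY","YYYY/MM/DD"];
TF = contains(txt, pat)
```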
Patterns used with the or (|) operator specify that only one of the two specified patterns needs to
match a section of text. If neither pattern is able to match, then the pattern expression fails to match.
txt = "123abc";
pat = lettersPattern|digitsPattern;
extract(txt,pat)
Some pattern functions take patterns as their input and modify them in some way. For example,
optionalPattern makes a specified pattern match if possible, but the pattern is not required for a
successful match.
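A small sketch of this behavior, with inputs chosen for illustration: leading digits are matched when present, but text without them still matches.

```matlab
txt = ["123abc","abc"];
% Digits are optional, letters are required.
optionalDigits = optionalPattern(digitsPattern) + lettersPattern;
TFoptional = matches(txt, optionalDigits)
% Without optionalPattern, the digits are required.
requiredDigits = digitsPattern + lettersPattern;
TFrequired = matches(txt, requiredDigits)
```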
Boundary Patterns
Boundary patterns are a special type of pattern that do not match characters but rather match the
boundaries between a designated character type and other characters or the start or end of that
piece of text. For example, digitBoundary matches the boundaries between digit characters and
nondigit characters and between digit characters and the start or end of the text. It does not match
digit characters themselves. Boundary patterns are useful as delimiters for functions like split.
txt = "123abc";
pat = digitBoundary;
split(txt,pat)
Boundary patterns are special among patterns because they can be negated using the not (~)
operator. When negated in this way, boundary patterns match before or after characters that did not
satisfy the requirements above. For example, ~digitBoundary matches the boundary between:
• characters that are both digits
• characters that are both nondigits
• a nondigit character and the start or end of a piece of text
Use replace to mark the locations matched by ~digitBoundary with a "|" character.
txt = "123abc";
pat = ~digitBoundary;
replace(txt,pat,"|")
ans =
"1|2|3a|b|c|"
Sometimes a simple pattern is not sufficient to solve a problem and a more complicated pattern is
needed. As a pattern expression grows it can become difficult to understand what it is matching. One
way to simplify building a complicated pattern is building each part of the pattern separately and
then combining the parts together into a single pattern expression.
For instance, email addresses use the form local_part@domain.TLD. Each of the three identifiers
(local_part, domain, and TLD) must be a combination of digits, letters, and underscore characters. To
build the full pattern, start by defining a pattern for the identifiers. Build a pattern that matches one
letter or digit character or one underscore character.
identifier = asManyOfPattern(identCharacters,1);
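The definitions of identCharacters and of the first emailPattern are missing from this copy. A sketch of how they might look, under the assumption that an identifier character is one alphanumeric character or an underscore; the test addresses are hypothetical:

```matlab
% Assumed definition: one letter, digit, or underscore.
identCharacters = alphanumericsPattern(1) | "_";
% One or more identifier characters.
identifier = asManyOfPattern(identCharacters, 1);
% First attempt: identifier@identifier.identifier
emailPattern = identifier + "@" + identifier + "." + identifier;
% Hypothetical addresses used only to exercise the pattern.
testEmails = ["janedoe@example.com"
              "jane.doe@example.com"
              "jane@mail.example.com"];
TF = matches(testEmails, emailPattern)
```

Only the first hypothetical address matches, because this pattern does not allow periods inside the local part or the domain.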
Test the pattern by seeing how well it matches the following example emails.
exampleEmails = ["[email protected]"
"[email protected]"
"[email protected]"];
matches(exampleEmails,emailPattern)
1
0
0
The pattern fails to match several of the example emails even though all the emails are valid. Both the
local_part and domain can be made of a series of identifiers that are separated by periods. Use the
identifier pattern to build a pattern that is capable of matching a series of identifiers.
asManyOfPattern matches as many consecutive appearances of the specified pattern as possible,
but if there are none, the rest of the pattern can still match successfully.
Use this pattern to build a new emailPattern that can match all of the example emails.
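The revised definitions are missing from this copy. A sketch of one way to build them, assuming the same identifier pattern as before and hypothetical test addresses:

```matlab
identCharacters = alphanumericsPattern(1) | "_";
identifier = asManyOfPattern(identCharacters, 1);
% Zero or more "identifier." groups followed by a final identifier.
identifierSeries = asManyOfPattern(identifier + ".") + identifier;
% Assumed form of the revised pattern.
emailPattern = identifierSeries + "@" + identifierSeries + "." + identifier;
% Hypothetical addresses used only to exercise the pattern.
testEmails = ["janedoe@example.com"
              "jane.doe@example.com"
              "jane@mail.example.com"];
TF = matches(testEmails, emailPattern)
```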
1
1
1
Complex patterns can sometimes be difficult to read and interpret, especially by those you share
them with who are unfamiliar with the pattern's structure. For example, when displayed,
emailPattern is long and difficult to read.
emailPattern
emailPattern = pattern
Matching:
Part of the issue with the display is that there are many repetitions of the identifier pattern. If the
exact details of this pattern are not important to users of the pattern, then the display of the
identifier pattern can be concealed using maskedPattern. This function creates a new pattern
where the display of identifier is masked and the variable name, "identifier", is displayed
instead. Alternatively, you can specify a different name to be displayed. The details of patterns that
are masked in this way can be accessed by clicking "Show all details" in the displayed pattern.
identifier = maskedPattern(identifier);
identifierSeries = asManyOfPattern(identifier + ".") + identifier
identifierSeries = pattern
Matching:
Patterns can be further organized using the namedPattern function. namedPattern designates a
pattern as a named pattern that changes how the pattern is displayed when combined with other
patterns. Email addresses have several important portions, local_part@domain.TLD, which each have
their own matching rules. Create a named pattern for each section.
localPart = namedPattern(identifierSeries,"local_part");
Named patterns can be nested, to further delineate parts of a pattern. To nest a named pattern, build
a pattern using named patterns and then designate that pattern as a named pattern. For example,
domain.TLD can be divided into the domain, subdomains, and the top level domain (TLD). Create
named patterns for each part of domain.TLD.
subdomain = namedPattern(identifierSeries,"subdomain");
domainName = namedPattern(identifier,"domainName");
tld = namedPattern(identifier,"TLD");
Nest the named patterns for the components of domain underneath a single named pattern domain.
domain = optionalPattern(subdomain + ".") + ...
domainName + "." + ...
tld;
domain = namedPattern(domain);
Combine the patterns together into a single named pattern, emailPattern. In the display of
emailPattern you can see each named pattern and what they match as well as the information on
any nested named patterns.
emailPattern = localPart + "@" + domain
emailPattern = pattern
Matching:
You can access named patterns and nested named patterns by dot-indexing into a pattern. For
example, you can access the nested named pattern subdomain by dot-indexing from emailPattern
into domain and then dot-indexing again into subdomain.
emailPattern.domain.subdomain
ans = pattern
Matching:
Dot-assignment can be used to change named patterns without needing to rewrite the rest of the
pattern expression.
emailPattern.domain = "mathworks.com"
emailPattern = pattern
Matching:
See Also
pattern | string | regexp | contains | replace | extract
More About
• “Search and Replace Text” on page 6-37
• “Regular Expressions” on page 2-51
Convert Numeric Values to Text
Convert to Strings
To convert a number to a string that represents it, use the string function.
str = string(pi)
str =
"3.1416"
The string function converts a numeric array to a string array having the same size.
A = [256 pi 8.9e-3];
str = string(A)
You can specify the format of the output text using the compose function, which accepts format
specifiers for precision, field width, and exponential notation.
str = compose("%9.7f",pi)
str =
"3.1415927"
If the input is a numeric array, then compose returns a string array. Return a string array that
represents numbers using exponential notation.
A = [256 pi 8.9e-3];
str = compose("%5.2e",A)
Before R2016b, convert numbers to character vectors and concatenate characters in brackets, [].
The simplest way to combine text and numbers is to use the plus operator (+). This operator
automatically converts numeric values to strings when the other operands are strings.
For example, plot a sine wave. Calculate the frequency of the wave and add a string representing that
value in the title of the plot.
X = linspace(0,2*pi);
Y = sin(X);
plot(X,Y)
freq = 1/(2*pi);
str = "Sine Wave, Frequency = " + freq + " Hz"
str =
"Sine Wave, Frequency = 0.15915 Hz"
title(str)
Sometimes existing text is stored in character vectors or cell arrays of character vectors. However,
the plus operator also automatically converts those types of data to strings when another operand is
a string. To combine numeric values with those types of data, first convert the numeric values to
strings, and then use plus to combine the text.
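The call for the output that follows is missing from this copy. A sketch of such a combination, assuming the prefix is a character vector and the suffix is a cell array of character vectors:

```matlab
freq = 1/(2*pi);
% Character vector + string + cell array of character vectors.
% string(freq) makes the middle operand a string, so plus concatenates text.
str = 'Sine Wave, Frequency = ' + string(freq) + {' Hz'}
```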
str =
"Sine Wave, Frequency = 0.15915 Hz"
Character Codes
If your data contains integers that represent Unicode® values, use the char function to convert the
values to the corresponding characters. The output is a character vector or array.
u = [77 65 84 76 65 66];
c = char(u)
c =
'MATLAB'
Converting Unicode values also allows you to include special characters in text. For instance, the
Unicode value for the degree symbol is 176. To add char(176) to a string, use plus.
deg = char(176);
temp = 21;
str = "Temperature: " + temp + deg + "C"
str =
"Temperature: 21°C"
Before R2016b, use num2str to convert the numeric value to a character vector, and then
concatenate.
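The call for the output that follows is missing from this copy. A sketch of the character-vector approach, reusing deg and temp from the previous example:

```matlab
deg = char(176);
temp = 21;
% Convert the number to text, then concatenate character vectors in brackets.
str = ['Temperature: ' num2str(temp) deg 'C']
```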
str =
'Temperature: 21°C'
Since R2019b
You can represent hexadecimal and binary values in your code either using text or using literals. The
recommended way to represent them is to write them as literals. You can write hexadecimal and
binary literals using the 0x and 0b prefixes respectively. However, it can sometimes be useful to
represent such values as text, using the dec2hex or dec2bin functions.
For example, set a bit in a binary value. If you specify the binary value using a literal, then it is stored
as an integer. After setting one of the bits, display the new binary value as text using the dec2bin
function.
register = 0b10010110
register = uint8
150
register = bitset(register,5,0)
register = uint8
134
binStr = dec2bin(register)
binStr =
'10000110'
See Also
dec2bin | dec2hex | char | string | compose | plus
More About
• “Convert Text to Numeric Values” on page 6-49
• “Hexadecimal and Binary Values” on page 6-55
• “Convert Between Datetime Arrays, Numbers, and Text” on page 7-42
• “Formatting Text” on page 6-24
Convert Text to Numeric Values
You can convert string arrays, character vectors, and cell arrays of character vectors to numeric
values. Text can represent hexadecimal or binary values, though when you convert them to numbers
they are stored as decimal values. You can also convert text representing dates and time to
datetime or duration values, which can be treated like numeric values.
Double-Precision Values
The recommended way to convert text to double-precision values is to use the str2double function.
It can convert character vectors, string arrays, and cell arrays of character vectors.
For example, create a character vector using single quotes and convert it to the number it represents.
X = str2double('3.1416')
X = 3.1416
If the input argument is a string array or cell array of character vectors, then str2double converts
it to a numeric array having the same size. You can create strings using double quotes. (Strings have
the string data type, while character vectors have the char data type.)
str = ["2.718","3.1416";
"137","0.015"]
X = str2double(str)
X = 2×2
2.7180 3.1416
137.0000 0.0150
The str2double function can convert text that includes commas (as thousands separators) and
decimal points. For example, you can use str2double to convert the Balance variable in the table
below. Balance represents numbers as strings, using a comma as the thousands separator.
load balances
balances
balances=3×2 table
Customer Balance
_________ ___________
"Diaz" "13,790.00"
"Johnson" "2,456.10"
"Wu" "923.71"
T = balances;
T.Balance = str2double(T.Balance)
T=3×2 table
Customer Balance
_________ _______
"Diaz" 13790
"Johnson" 2456.1
"Wu" 923.71
While the str2num function can also convert text to numbers, it is not recommended. str2num uses
the eval function, which can cause unintended side effects when the text input includes a function
name. To avoid these issues, use str2double.
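As a self-contained illustration that does not need the balances data file, the sketch below applies str2double to the same text values as the Balance column and to text that names a function. The variable names are chosen for illustration.

```matlab
% Same text values as the Balance column, entered directly.
balanceText = ["13,790.00"; "2,456.10"; "923.71"];
balance = str2double(balanceText)
% Text that is not a number is returned as NaN; it is not evaluated as code.
notNumeric = str2double("pi")
```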
As an alternative, you can convert strings to double-precision values using the double function. If the
input is a string array, then double returns a numeric array that has the same size, just as
str2double does. However, if the input is a character vector, then double converts the individual
characters to numbers representing their Unicode values.
X = double("3.1416")
X = 3.1416
X = double('3.1416')
X = 1×6
51 46 49 52 49 54
This list summarizes the best practices for converting text to numeric values.
• To convert text to numeric values, use the str2double function. It treats string arrays, character
vectors, and cell arrays of character vectors consistently.
• You can also use the double function for string arrays. However, it treats character vectors
differently.
• Avoid str2num. It calls the eval function which can have unintended consequences.
You can represent hexadecimal and binary numbers as text or as literals. When you write them as
literals, you must use the 0x and 0b prefixes. When you represent them as text and then convert
them, you can use the prefixes, but they are not required.
D = 0x3FF
D = uint16
1023
Then convert text representing the same value by using the hex2dec function. It recognizes the
prefix but does not require it.
D = hex2dec('3FF')
D = 1023
D = hex2dec('0x3FF')
D = 1023
D = bin2dec('101010')
D = 42
D = bin2dec('0b101010')
D = 42
MATLAB provides the datetime and duration data types to store dates and times, and to treat
them as numeric values. To convert text representing dates and times, use the datetime and
duration functions.
Convert text representing a date to a datetime value. The datetime function recognizes many
common formats for dates and times.
C = '2019-09-20'
C =
'2019-09-20'
D = datetime(C)
D = datetime
20-Sep-2019
str = ["2019-01-31","2019-02-28","2019-03-31"]
D = datetime(str)
D = 1x3 datetime
31-Jan-2019 28-Feb-2019 31-Mar-2019
If you convert text to duration values, then use the hh:mm:ss or dd:hh:mm:ss formats.
D = duration('12:34:56')
D = duration
12:34:56
See Also
bin2dec | hex2dec | str2double | datetime | duration | double | table
More About
• “Convert Numeric Values to Text” on page 6-45
• “Convert Between Datetime Arrays, Numbers, and Text” on page 7-42
• “Hexadecimal and Binary Values” on page 6-55
• “Formatting Text” on page 6-24
• “Unicode and ASCII Values” on page 6-53
Unicode and ASCII Values
You can convert characters to integers that represent their Unicode code values. To convert a single
character or a character array, use any of these functions:
• double
• uint16, uint32, or uint64
The best practice is to use the double function. However, if you need to store the numeric values as
integers, use unsigned integers having at least 16 bits because MATLAB uses the UTF-16 encoding.
Convert a character vector to Unicode code values using the double function.
C = 'MATLAB'
C =
'MATLAB'
unicodeValues = double(C)
unicodeValues = 1×6
77 65 84 76 65 66
You cannot convert characters in a string array directly to Unicode code values. In particular, the
double function converts strings to the numbers they represent, just as the str2double function
does. If double cannot convert a string to a number, then it returns a NaN value.
str = "MATLAB";
double(str)
ans = NaN
To convert characters in a string, first convert the string to a character vector, or use curly braces to
extract the characters. Then convert the characters using a function such as double.
C = char(str);
unicodeValues = double(C)
unicodeValues = 1×6
77 65 84 76 65 66
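The curly-brace alternative mentioned above can be sketched as follows:

```matlab
str = "MATLAB";
% Curly braces return the characters of one string element as a char vector.
unicodeValues = double(str{1})
```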
You can convert Unicode values to characters using the char function.
D = [77 65 84 76 65 66]
D = 1×6
77 65 84 76 65 66
C = char(D)
C =
'MATLAB'
A typical use for char is to create characters you cannot type and append them to strings. For
example, create the character for the degree symbol and append it to a string. The Unicode code
value for the degree symbol is 176.
deg = char(176)
deg =
'°'
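The call that builds myLabel is missing from this copy. A sketch using the append function, which is one of several ways to join the pieces:

```matlab
deg = char(176);
% Join the string, the degree character, and the unit.
myLabel = append("Current temperature is 21", deg, "C")
```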
myLabel =
"Current temperature is 21°C"
For more information on Unicode, including mappings between characters and code values, see
Unicode.
See Also
char | double | single | string | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 |
uint64
More About
• “Convert Text to Numeric Values” on page 6-49
• “Convert Numeric Values to Text” on page 6-45
External Websites
• Unicode
Hexadecimal and Binary Values
MATLAB provides two ways to represent hexadecimal and binary values:
• As literals. Starting in R2019b, you can write hexadecimal and binary values as literals using an
appropriate prefix as notation. For example, 0x2A is a literal that specifies 42—and MATLAB
stores it as a number, not as text.
• As strings or character vectors. For example, the character vector '2A' represents the number 42
as a hexadecimal value. When you represent a hexadecimal or binary value using text, enclose it
in quotation marks. MATLAB stores this representation as text, not a number.
MATLAB provides several functions for converting numbers to and from their hexadecimal and binary
representations.
Hexadecimal literals start with a 0x or 0X prefix, while binary literals start with a 0b or 0B prefix.
MATLAB stores the number written with this notation as an integer. For example, these two literals
both represent the integer 42.
A = 0x2A
A = uint8
42
B = 0b101010
B = uint8
42
Do not use quotation marks when you write a number using this notation. Use 0-9, A-F, and a-f to
represent hexadecimal digits. Use 0 and 1 to represent binary digits.
By default, MATLAB stores the number as the smallest unsigned integer type that can accommodate
it. However, you can use an optional suffix to specify the type of integer that stores the value.
• To specify unsigned 8-, 16-, 32-, and 64-bit integer types, use the suffixes u8, u16, u32, and u64.
• To specify signed 8-, 16-, 32-, and 64-bit integer types, use the suffixes s8, s16, s32, and s64.
A = 0x2As32
A = int32
42
When you specify signed integer types, you can write literals that represent negative numbers.
Represent negative numbers in two's complement form. For example, specify a negative number with
a literal using the s8 suffix.
A = 0xFFs8
A = int8
-1
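The default type rule and the optional suffixes can be checked with the class function. This is a small sketch with illustrative variable names; it requires R2019b or later, when these literals were introduced.

```matlab
a = 0xFF;       % no suffix: smallest unsigned type that fits, uint8
b = 0x100;      % 256 needs more than 8 bits, so uint16
c = 0xFFu16;    % suffix forces uint16, value 255
d = 0xFFs8;     % signed 8-bit: all bits set is -1 in two's complement
e = 0b1000s8;   % binary literal with a signed suffix, value 8

classes = {class(a), class(b), class(c), class(d), class(e)};
```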
Since MATLAB stores these literals as numbers, you can use them in any context or function where
you use numeric arrays.
You can also convert integers to character vectors that represent them as hexadecimal or binary
values using the dec2hex and dec2bin functions. Convert an integer to hexadecimal.
hexStr = dec2hex(255)
hexStr =
'FF'
Convert an integer to binary.
binStr = dec2bin(16)
binStr =
'10000'
Since these functions produce text, use them when you need text that represents numeric values. For
example, you can append these values to a title or a plot label, or write them to a file that stores
numbers as their hexadecimal or binary representations.
The recommended way to convert an array of numbers to text is to use the compose function. This
function returns a string array having the same size as the input numeric array. To produce
hexadecimal format, use %X as the format specifier.
A = [255 16 12 1024 137]
A = 1×5
255 16 12 1024 137
hexStr = compose("%X",A)
hexStr = 1×5 string
"FF" "10" "C" "400" "89"
The dec2hex and dec2bin functions also convert arrays of numbers to text representing them as
hexadecimal or binary values. However, these functions return character arrays, where each row
represents a number from the input numeric array, padded with zeros as necessary.
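To make the difference in output types concrete, here is a short sketch that converts the same numeric array both ways. The values in A are illustrative.

```matlab
A = [255 16 12 1024 137];

hexStr = compose("%X",A);   % 1-by-5 string array, no padding
hexChr = dec2hex(A);        % 5-by-3 character array, rows padded with zeros

szStr = size(hexStr);       % [1 5]
szChr = size(hexChr);       % [5 3]
```

Because dec2hex returns one row per number, all rows must have the same width, which is why shorter values such as C are padded to 00C.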
To convert a binary value to hexadecimal, start with a binary literal, and convert it to text
representing its hexadecimal value. Since a literal is interpreted as a number, you can specify it
directly as the input argument to dec2hex.
D = 0b1111;
hexStr = dec2hex(D)
hexStr =
'F'
If you start with a hexadecimal literal, then you can convert it to text representing its binary value
using dec2bin.
D = 0x8F;
binStr = dec2bin(D)
binStr =
'10001111'
One typical use of binary numbers is to represent bits. For example, many devices have registers that
provide access to a collection of bits representing data in memory or the status of the device. When
working with such hardware you can use numbers in MATLAB to represent the value in a register.
Use binary values and bitwise operations to represent and access particular bits.
Create a number that represents an 8-bit register. It is convenient to start with binary representation,
but the number is stored as an integer.
register = 0b10010110
register = uint8
150
To get or set the values of particular bits, use bitwise operations. For example, use the bitand and
bitshift functions to get the value of the fifth bit. (Shift that bit to the first position so that
MATLAB returns a 0 or 1. In this example, the fifth bit is a 1.)
b5 = bitand(register,0b10000);
b5 = bitshift(b5,-4)
b5 = uint8
1
To set the fifth bit to 0, use the bitset function.
register = bitset(register,5,0)
register = uint8
134
Since register is an integer, use the dec2bin function to display all the bits in binary format.
binStr is a character vector, and represents the binary value without a leading 0b prefix.
binStr = dec2bin(register)
binStr =
'10000110'
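As an alternative sketch of the same register manipulation, the bitget function reads a bit directly, without a mask and shift. The variable names are illustrative, and the register value is the one used above.

```matlab
register = 0b10010110;                 % uint8 value 150

b5      = bitget(register,5);          % read bit 5 (bit 1 is the least significant)
cleared = bitset(register,5,0);        % clear bit 5: 134
toggled = bitxor(register,0b10000);    % flip bit 5: also 134 for this value

bits  = dec2bin(cleared,8);            % '10000110', padded to 8 digits
value = bin2dec(bits);                 % 134, converted back from text
```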
See Also
bin2dec | bitand | bitshift | bitset | dec2bin | dec2hex | hex2dec | sprintf | sscanf
More About
• “Convert Text to Numeric Values” on page 6-49
• “Convert Numeric Values to Text” on page 6-45
External Websites
• Two's Complement
Frequently Asked Questions About String Arrays
In most respects, string arrays behave like character vectors and cell arrays of character vectors.
However, there are a few key differences between string arrays and character arrays that can lead to
results you might not expect. For each of these differences, there is a recommended way to use
strings that leads to the expected result.
With command syntax, you separate inputs with spaces rather than commas, and you do not enclose
input arguments in parentheses. For example, you can use the cd function with command syntax to
change folders.
cd C:\Temp
The text C:\Temp is a character vector. In command form, all arguments are always character
vectors. If you have an argument, such as a folder name, that contains spaces, then specify it as one
input argument by enclosing it in single quotes.
cd 'C:\Program Files'
But if you specify the argument using double quotes, then cd throws an error.
cd "C:\Program Files"
Error using cd
Too many input arguments.
The error message can vary depending on the function that you use and the arguments that you
specify. For example, if you use the load function with command syntax and specify the argument
using double quotes, then load throws a different error.
load "myVariables.mat"
In command form, double quotes are treated as part of the literal text rather than as the string
construction operator. If you wrote the equivalent of cd "C:\Program Files" in functional form,
then it would look like a call to cd with two arguments.
cd('"C:\Program','Files"')
When specifying arguments as strings, use function syntax. All functions that support command
syntax also support function syntax. For example, you can use cd with function syntax and input
arguments that are double quoted strings.
cd("C:\Program Files")
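The same applies to other functions that support command syntax. As a sketch, the following saves and loads a variable using function syntax with double-quoted strings. The file name is hypothetical and is placed in the temporary folder so that the example cleans up after itself.

```matlab
% Hypothetical file name in the temporary folder.
fileName = fullfile(tempdir,"myVariables.mat");

x = 42;
save(fileName,"x")      % function syntax accepts double-quoted strings
clear x
S = load(fileName);     % S.x holds the saved value
delete(fileName)
```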
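When you have multiple strings, store them in a string array, not a cell array. Create a string array
using square brackets, not curly braces.
str = ["Venus","Earth","Mars"]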
Avoid using cell arrays of strings. When you use cell arrays, you give up the performance advantages
that come from using string arrays. And in fact, most functions do not accept cell arrays of strings as
input arguments, options, or values of name-value pairs. For example, if you specify a cell array of
strings as an input argument, then the contains function throws an error.
C = {"Venus","Earth","Mars"}
TF = contains(C,"Earth")
Instead, specify the argument as a string array.
str = ["Venus","Earth","Mars"];
TF = contains(str,"Earth");
Before R2016b, the term "cell array of strings" meant a cell array whose elements all contain
character vectors. But it is more precise to refer to such cell arrays as "cell arrays of character
vectors," to distinguish them from string arrays.
Cell arrays can contain variables having any data types, including strings. It is still possible to create
a cell array whose elements all contain strings. And if you already have specified cell arrays of
character vectors in your code, then replacing single quotes with double quotes might seem like a
simple update. However, it is not recommended that you create or use cell arrays of strings.
Create a character vector using single quotes. To determine its length, use the length function.
Because C is a vector, its length is equal to the number of characters. C is a 1-by-11 vector.
C = 'Hello world';
L = length(C)
L = 11
Create a string with the same characters, using double quotes. Though it stores 11 characters, str is
a 1-by-1 string array, or string scalar. If you call length on a string scalar, then the output argument is
1, no matter how many characters it stores.
str = "Hello world";
L = length(str)
L = 1
To determine the number of characters in a string, use the strlength function, introduced in
R2016b. For compatibility, strlength operates on character vectors as well. In both cases
strlength returns the number of characters.
L = strlength(C)
L = 11
L = strlength(str)
L = 11
You also can use strlength on string arrays containing multiple strings and on cell arrays of
character vectors.
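For instance, the following sketch (with illustrative planet names) contrasts length and strlength on a string array and on a cell array of character vectors:

```matlab
str = ["Mercury","Venus","Earth","Mars"];
n    = length(str);       % 4, the number of strings
lens = strlength(str);    % [7 5 5 4], the characters in each string

C = {'Mercury','Venus'};
lensC = strlength(C);     % [7 5]
```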
The length function returns the size of the longest dimension of an array. For a string array, length
returns the number of strings along the longest dimension of the array. It does not return the number
of characters within strings.
Why Does isempty("") Return 0?
An empty string ("") has no characters, so strlength returns 0 for it.
L = strlength("")
L = 0
However, an empty string is not an empty array. An empty string is a string scalar that happens to
have no characters.
sz = size("")
sz = 1×2
1 1
If you call isempty on an empty string, then it returns 0 (false) because the string is not an empty
array.
tf = isempty("")
tf = logical
0
However, if you call isempty on an empty character array, then it returns 1 (true). A character
array specified as an empty pair of single quotes, '', is a 0-by-0 character array.
tf = isempty('')
tf = logical
1
To test whether a piece of text has no characters, the best practice is to use the strlength function.
You can use the same call whether the input is a string scalar or a character vector.
str = "";
if strlength(str) == 0
disp('String has no text')
end
chr = '';
if strlength(chr) == 0
disp('Character vector has no text')
end
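When you use square brackets with strings, MATLAB concatenates them as elements of a string array
rather than joining their text. For example, if you concatenate two strings, then the result is a
1-by-2 string array.
str = ["Hello","World"]
str = 1×2 string
"Hello" "World"
However, if you concatenate two character vectors, then the result is a longer character vector.
chr = ['Hello','World']
chr = 'HelloWorld'
To append text to a string (or to the elements of a string array), use the plus operator instead of
square brackets.
str = "Hello" + "World"
str = "HelloWorld"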
As an alternative, you can use the strcat function. strcat appends text whether the input
arguments are strings or character vectors.
str = strcat("Hello","World")
str = "HelloWorld"
Whether you use square brackets, plus, or strcat, you can specify an arbitrary number of
arguments. Append a space character between Hello and World.
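A minimal sketch of the three approaches with a space inserted between the words follows. The variable names are illustrative.

```matlab
chr  = ['Hello',' ','World'];          % character vector 'Hello World'
str1 = "Hello" + " " + "World";        % string scalar "Hello World"
str2 = strcat("Hello"," ","World");    % string scalar "Hello World"

% Square brackets with strings create array elements instead of appending.
arr = ["Hello"," ","World"];           % 1-by-3 string array
```

Note that when its inputs are character arrays, strcat removes trailing white space from each input, so strcat('Hello ','World') returns 'HelloWorld'. With string inputs, as here, white space is preserved.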
See Also
string | strlength | contains | plus | strcat | sprintf | dir | cd | copyfile | load | length
| size | isempty
Related Examples
• “Create String Arrays” on page 6-5
• “Test for Empty Strings and Missing Values” on page 6-20
• “Compare Text” on page 6-32
• “Update Your Code to Accept Strings” on page 6-64
Update Your Code to Accept Strings
If you write code for other MATLAB users, then it is to your advantage to update your API to accept
string arrays, while maintaining backward compatibility with other text data types. String adoption
makes your code consistent with MathWorks products.
If your code has few dependencies, or if you are developing new code, then consider using string
arrays as your primary text data type for better performance. In that case, best practice is to write or
update your API to accept input arguments that are character vectors, cell arrays of character
vectors, or string arrays.
For the definitions of string array and other terms, see “Terminology for Character and String Arrays”
on page 6-70.
The following guidelines apply when you update existing code whose primary text data types are
character vectors or cell arrays of character vectors.
Functions
• If an input argument can be either a character vector or a cell array of character vectors, then
update your code so that the argument also can be a string array. For example, consider a
function that has an input argument you can specify as a character vector (using single
quotes). Best practice is to update the function so that the argument can be specified as either
a character vector or a string scalar (using double quotes).
• Accept strings as both names and values in name-value pair arguments.
• A cell array of string arrays has a string array in each cell. For example, {"hello","world"}
is a cell array of string arrays. While you can create such a cell array, it is not recommended
for storing text. The elements of a string array have the same data type and are stored
efficiently. If you store strings in a cell array, then you lose the advantages of using a string
array.
However, if your code accepts heterogeneous cell arrays as inputs, then consider accepting cell
arrays that contain strings. You can convert any strings in such a cell array to character
vectors.
• In general, do not change the output type.
• If your function returns a character vector or cell array of character vectors, then do not
change the output type, even if the function accepts string arrays as inputs. For example, the
fileread function accepts an input file name specified as either a character vector or a
string, but the function returns the file contents as a character vector. By keeping the output
type the same, you can maintain backward compatibility.
• Return the same data type when the function modifies input text.
• If your function modifies input text and returns the modified text as the output argument, then
the input and output arguments should have the same data type. For example, the lower
function accepts text as the input argument, converts it to all lowercase letters, and returns it.
If the input argument is a character vector, then lower returns a character vector. If the input
is a string array, then lower returns a string array.
• Consider adding a 'TextType' argument to import functions.
• If your function imports data from files, and at least some of that data can be text, then
consider adding an input argument that specifies whether to return text as a character array or
a string array. For example, the readtable function provides the 'TextType' name-value
pair argument. This argument specifies whether readtable returns a table with text in cell
arrays of character vectors or string arrays.
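As an illustration of the 'TextType' argument, this sketch writes a small table to a file and reads it back both ways. The file name, variable names, and data are hypothetical.

```matlab
% Hypothetical file and variable names, written to the temporary folder.
fileName = fullfile(tempdir,"planets_example.csv");
T = table(["Venus";"Earth";"Mars"],[2;3;4],'VariableNames',{'Name','Position'});
writetable(T,fileName)

Tchar = readtable(fileName);                      % text as a cell array of character vectors
Tstr  = readtable(fileName,'TextType','string');  % text as a string array
delete(fileName)
```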
Classes
• For string adoption, treat methods as though they are functions. Accept string arrays as input
arguments, and in general, do not change the data type of the output arguments, as described
in the previous section.
• Do not change the data types of properties.
• If a property is a character vector or a cell array of character vectors, then do not change its
type. When you access such a property, the value that is returned is still a character vector or a
cell array of character vectors.
As an alternative, you can add a new property that is a string, and make it dependent on the
old property to maintain compatibility.
• Set properties using string arrays.
• If you can set a property using a character vector or cell array of character vectors, then
update your class to set that property using a string array too. However, do not change the
data type of the property. Instead, convert the input string array to the data type of the
property, and then set the property.
• Add a string method.
• If your class already has a char and/or a cellstr method, then add a string method. If you
can represent an object of your class as a character vector or cell array of character vectors,
then represent it as a string array too.
The convertStringsToChars function provides a way to process all input arguments, converting
only those arguments that are string arrays. To enable your existing code to accept string arrays as
inputs, add a call to convertStringsToChars at the beginnings of your functions and methods.
For example, if you have defined a function myFunc that accepts three input arguments, process all
three inputs using convertStringsToChars. Leave the rest of your code unaltered.
function y = myFunc(a,b,c)
[a,b,c] = convertStringsToChars(a,b,c);
<line 1 of original code>
<line 2 of original code>
...
In this example, the arguments [a,b,c] overwrite the input arguments in place. If any input
argument is not a string array, then it is unaltered.
If myFunc accepts a variable number of input arguments, then process all the arguments specified by
varargin.
function y = myFunc(varargin)
[varargin{:}] = convertStringsToChars(varargin{:});
...
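To see what the conversion does to each kind of input, the following sketch applies convertStringsToChars to a cell array that stands in for varargin. The input values are illustrative.

```matlab
% Stand-in for varargin: a string scalar, a string array, a character
% vector, and a number.
args = {"alpha", ["x","y"], 'beta', 42};
[args{:}] = convertStringsToChars(args{:});

% args{1} is now the character vector 'alpha'.
% args{2} is now the cell array {'x','y'}.
% args{3} and args{4} are unaltered.
```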
Performance Considerations
The convertStringsToChars function is more efficient when converting one input argument. If
your function is performance sensitive, then you can convert input arguments one at a time, while
still leaving the rest of your code unaltered.
function y = myFunc(a,b,c)
a = convertStringsToChars(a);
b = convertStringsToChars(b);
c = convertStringsToChars(c);
...
The following guidelines apply to new code that uses string arrays as its primary text data type.
Functions
• If an input argument can be a string array, then also allow it to be a character vector or cell
array of character vectors.
• Accept character arrays as both names and values in name-value pair arguments.
• A cell array of string arrays has a string array in each cell. While you can create such a cell
array, it is not recommended for storing text. If your code uses strings as the primary text data
type, store multiple pieces of text in a string array, not a cell array of string arrays.
However, if your code accepts heterogeneous cell arrays as inputs, then consider accepting cell
arrays that contain strings.
• In general, return strings.
• If your function returns output arguments that are text, then return them as string arrays.
• Return the same data type when the function modifies input text.
• If your function modifies input text and returns the modified text as the output argument, then
the input and output arguments should have the same data type.
Classes
• Accept character vectors and cell arrays of character vectors as input arguments, as described
in the previous section. In general, return strings as outputs.
• Specify properties as string arrays.
• If a property contains text, then set the property using a string array. When you access the
property, return the value as a string array.
The convertCharsToStrings function provides a way to process all input arguments, converting
only those arguments that are character vectors or cell arrays of character vectors. To enable your
new code to accept these text data types as inputs, add a call to convertCharsToStrings at the
beginnings of your functions and methods.
For example, if you have defined a function myFunc that accepts three input arguments, process all
three inputs using convertCharsToStrings.
function y = myFunc(a,b,c)
[a,b,c] = convertCharsToStrings(a,b,c);
<line 1 of original code>
<line 2 of original code>
...
In this example, the arguments [a,b,c] overwrite the input arguments in place. If any input
argument is not a character vector or cell array of character vectors, then it is unaltered.
If myFunc accepts a variable number of input arguments, then process all the arguments specified by
varargin.
function y = myFunc(varargin)
[varargin{:}] = convertCharsToStrings(varargin{:});
...
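The effect on each kind of input can be sketched with a cell array that stands in for varargin. The input values are illustrative.

```matlab
% Stand-in for varargin: a character vector, a cell array of character
% vectors, a string scalar, and a number.
args = {'alpha', {'x','y'}, "beta", 42};
[args{:}] = convertCharsToStrings(args{:});

% args{1} is now the string scalar "alpha".
% args{2} is now the 1-by-2 string array ["x","y"].
% args{3} and args{4} are unaltered.
```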
Performance Considerations
The convertCharsToStrings function is more efficient when converting one input argument. If
your function is performance sensitive, then you can convert input arguments one at a time, while
still leaving the rest of your code unaltered.
function y = myFunc(a,b,c)
a = convertCharsToStrings(a);
b = convertCharsToStrings(b);
c = convertCharsToStrings(c);
...
If you must convert input arguments, then use the functions in this table.
Conversion                                          Function
String scalar to character vector                   char
String array to cell array of character vectors     cellstr
Character vector to string scalar                   string
Cell array of character vectors to string array     string
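Each row of the table corresponds to a single call, as this short sketch with illustrative values shows:

```matlab
c1 = char("Mars");                  % string scalar to character vector
c2 = cellstr(["Venus","Earth"]);    % string array to cell array of character vectors
s1 = string('Mars');                % character vector to string scalar
s2 = string({'Venus','Earth'});     % cell array of character vectors to string array
```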
An empty string is a string with no characters. MATLAB displays an empty string as a pair of double
quotes with nothing between them (""). However, an empty string is still a 1-by-1 string array. It is
not an empty array.
The recommended way to check whether a string is empty is to use the strlength function.
str = "";
tf = (strlength(str) ~= 0)
Note Do not use the isempty function to check for an empty string. An empty string has no
characters but is still a 1-by-1 string array.
The strlength function returns the length of each string in a string array. If the string must be a
string scalar, and also not empty, then check for both conditions.
If str could be either a character vector or string scalar, then you still can use strlength to
determine its length. strlength returns 0 if the input argument is an empty character vector ('').
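The checks described above can be sketched as follows. The variable names are illustrative.

```matlab
% A string scalar with no characters fails the combined check.
str = "";
tf1 = isStringScalar(str) && strlength(str) ~= 0;   % false

% A nonempty string scalar passes.
str = "Earth";
tf2 = isStringScalar(str) && strlength(str) ~= 0;   % true

% strlength also works on an empty character vector.
chr = '';
tf3 = strlength(chr) ~= 0;                          % false

% isempty does not detect an empty string, because "" is a 1-by-1 array.
tf4 = isempty("");                                  % false
```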
An empty string array is, in fact, an empty array—that is, an array that has at least one dimension
whose length is 0.
The recommended way to create an empty string array is to use the strings function, specifying 0
as at least one of the input arguments. The isempty function returns 1 when the input is an empty
string array.
str = strings(0);
tf = isempty(str)
The strlength function returns a numeric array that is the same size as the input string array. If the
input is an empty string array, then strlength returns an empty array.
str = strings(0);
L = strlength(str)
String arrays also can contain missing strings. The missing string is the string equivalent to NaN for
numeric arrays. It indicates where a string array has missing values. The missing string displays as
<missing>, with no quotation marks.
You can create missing strings using the missing function. The recommended way to check for
missing strings is to use the ismissing function.
str = string(missing);
tf = ismissing(str)
Note Do not check for missing strings by comparing a string to the missing string.
The missing string is not equal to itself, just as NaN is not equal to itself.
str = string(missing);
f = (str == missing)
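For a string array that contains a missing value, the difference between the two checks can be sketched as follows. The array contents are illustrative.

```matlab
str = ["Venus","","Mars"];
str(2) = missing;                 % the second element becomes <missing>

tfMissing = ismissing(str);       % [0 1 0]
tfEqual   = (str == missing);     % [0 0 0], because comparison with missing is never true
```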
See Also
char | cellstr | string | strings | convertStringsToChars | convertCharsToStrings |
isstring | isStringScalar | ischar | iscellstr | strlength | validateattributes |
convertContainedStringsToChars
More About
• “Create String Arrays” on page 6-5
• “Test for Empty Strings and Missing Values” on page 6-20
• “Compare Text” on page 6-32
• “Search and Replace Text” on page 6-37
• “Frequently Asked Questions About String Arrays” on page 6-59
7 Dates and Time
Represent Dates and Times in MATLAB
For example, create a MATLAB datetime array that represents two dates: June 28, 2014 at 6 a.m. and
June 28, 2014 at 7 a.m. Specify numeric values for the year, month, day, hour, minute, and second
components for the datetime.
t = datetime(2014,6,28,6:7,0,0)
t =
28-Jun-2014 06:00:00 28-Jun-2014 07:00:00
Change the value of a date or time component by assigning new values to the properties of the
datetime array. For example, change the day number of each datetime by assigning new values to the
Day property.
t.Day = 27:28
t =
27-Jun-2014 06:00:00 28-Jun-2014 07:00:00
Change the display format of the array by changing its Format property. The following format does
not display any time components. However, the values in the datetime array do not change.
t.Format = 'MMM dd, yyyy'
t =
Jun 27, 2014 Jun 28, 2014
If you subtract one datetime array from another, the result is a duration array in units of fixed
length.
t2 = datetime(2014,6,29,6,30,45)
t2 =
29-Jun-2014 06:30:45
d = t2 - t
d =
48:30:45 23:30:45
By default, a duration array displays in the format, hours:minutes:seconds. Change the display
format of the duration by changing its Format property. You can display the duration value with a
single unit, such as hours.
d.Format = 'h'
You can create a duration in a single unit using the seconds, minutes, hours, days, or years
functions. For example, create a duration of 2 days, where each day is exactly 24 hours.
d = days(2)
d =
2 days
You can create a calendar duration in a single unit of variable length. For example, one month can be
28, 29, 30, or 31 days long. Specify a calendar duration of 2 months.
L = calmonths(2)
L =
2mo
Use the caldays, calweeks, calquarters, and calyears functions to specify calendar durations
in other units.
Add a number of calendar months and calendar days. The number of days remains separate from the
number of months because the number of days in a month is not fixed, and cannot be determined
until you add the calendar duration to a specific datetime.
L = calmonths(2) + caldays(35)
L =
2mo 35d
t2 = t + calmonths(2) + caldays(35)
t2 =
Oct 01, 2014 Oct 02, 2014
t2 is also a datetime array.
whos t2
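In summary, there are several ways to represent dates and times, and MATLAB has a data type for
each approach:
• datetime represents points in time, such as June 28, 2014 at 6 a.m.
• duration represents lengths of time in fixed-length units, such as 2 days of exactly 24 hours each.
• calendarDuration represents lengths of time in variable-length calendar units, such as 2 months.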
The calendarDuration data type also accounts for daylight saving time changes and leap years,
so that 1 day might be more or less than 24 hours, and 1 year can have 365 or 366 days.
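The following sketch illustrates the daylight saving time case. It assumes the America/New_York time zone, where clocks moved forward one hour on March 9, 2014. The variable names are illustrative.

```matlab
t = datetime(2014,3,8,12,0,0,'TimeZone','America/New_York');

tCal   = t + caldays(1);   % next calendar day at the same clock time
tFixed = t + days(1);      % exactly 24 hours later

elapsedCal   = hours(tCal - t);      % 23, because the calendar day lost an hour
elapsedFixed = hours(tFixed - t);    % 24
```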
See Also
datetime | duration | calendarDuration
Specify Time Zones
You can specify a time zone when you create a datetime, using the 'TimeZone' name-value pair
argument. The time zone value 'local' specifies the system time zone. To display the time zone
offset for each datetime, include a time zone offset specifier such as 'Z' in the value for the
'Format' argument.
t = datetime(2014,3,8:9,6,0,0,'TimeZone','local',...
'Format','d-MMM-y HH:mm:ss Z')
t =
A different time zone offset is displayed depending on whether the datetime occurs during daylight
saving time.
You can modify the time zone of an existing datetime. For example, change the TimeZone property of
t using dot notation. You can specify the time zone value as the name of a time zone region in the
IANA Time Zone Database. A time zone region accounts for the current and historical rules for
standard and daylight offsets from UTC that are observed in that geographic region.
t.TimeZone = 'Asia/Shanghai'
t =
You also can specify the time zone value as a character vector of the form +HH:mm or -HH:mm, which
represents a time zone with a fixed offset from UTC that does not observe daylight saving time.
t.TimeZone = '+08:00'
t =
Operations on datetime arrays with time zones automatically account for time zone differences. For
example, create a datetime in a different time zone.
u = datetime(2014,3,9,6,0,0,'TimeZone','Europe/London',...
'Format','d-MMM-y HH:mm:ss Z')
u =
9-Mar-2014 06:00:00 +0000
dt = t - u
dt =
-19:00:00 04:00:00
When you perform operations involving datetime arrays, the arrays either must all have a time zone
associated with them, or they must all have no time zone.
See Also
datetime | timezones
Related Examples
• “Represent Dates and Times in MATLAB” on page 7-2
• “Convert Date and Time to Julian Date or POSIX Time” on page 7-7
Convert Date and Time to Julian Date or POSIX Time
While datetime arrays are not required to have a time zone, converting "unzoned" datetime values
to Julian dates or POSIX times can lead to unexpected results. To ensure the expected result, specify
the time zone before conversion.
You can specify a time zone for a datetime array, but you are not required to do so. In fact, by
default the datetime function creates an "unzoned" datetime array.
d = datetime('now')
d = datetime
01-Sep-2021 16:01:28
d is constructed from the local time on your machine and has no time zone associated with it. In many
contexts, you might assume that you can treat the times in an unzoned datetime array as local
times. However, the juliandate and posixtime functions treat the times in unzoned datetime
arrays as UTC times, not local times. To avoid any ambiguity, it is recommended that you avoid using
juliandate and posixtime on unzoned datetime arrays. For example, avoid using
posixtime(datetime('now')) in your code.
If your datetime array has values that do not represent UTC times, specify the time zone using the
TimeZone name-value pair argument so that juliandate and posixtime interpret the datetime
values correctly.
d = datetime('now','TimeZone','America/New_York')
d = datetime
01-Sep-2021 16:01:28
As an alternative, you can specify the TimeZone property after you create the array.
d.TimeZone = 'America/Los_Angeles'
d = datetime
01-Sep-2021 13:01:28
A Julian date is the number of days (including fractional days) since noon on November 24, 4714
BCE, in the proleptic Gregorian calendar, or January 1, 4713 BCE, in the proleptic Julian calendar. To
convert datetime arrays to Julian dates, use the juliandate function.
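Create a datetime array and specify its time zone.
DZ = datetime(2016,8:10,29,10,5,24,'TimeZone','America/New_York')
DZ = 1x3 datetime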
29-Aug-2016 10:05:24 29-Sep-2016 10:05:24 29-Oct-2016 10:05:24
format longG
JDZ = juliandate(DZ)
JDZ = 1×3
Create an unzoned copy of DZ. Convert D to the equivalent Julian dates. As D has no time zone,
juliandate treats the times as UTC times.
D = DZ;
D.TimeZone = '';
JD = juliandate(D)
JD = 1×3
Compare JDZ and JD. The differences are equal to the time zone offset between UTC and the
America/New_York time zone in fractional days.
JDZ - JD
ans = 1×3
The POSIX time is the number of seconds (including fractional seconds) elapsed since 00:00:00 1-
Jan-1970 UTC (Universal Coordinated Time), ignoring leap seconds. To convert datetime arrays to
POSIX times, use the posixtime function.
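Create a datetime array and specify its time zone.
DZ = datetime(2016,8:10,29,10,5,24,'TimeZone','America/New_York')
DZ = 1x3 datetime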
29-Aug-2016 10:05:24 29-Sep-2016 10:05:24 29-Oct-2016 10:05:24
PTZ = posixtime(DZ)
PTZ = 1×3
Create an unzoned copy of DZ. Convert D to the equivalent POSIX times. As D has no time zone,
posixtime treats the times as UTC times.
D = DZ;
D.TimeZone = '';
PT = posixtime(D)
PT = 1×3
Compare PTZ and PT. The differences are equal to the time zone offset between UTC and the
America/New_York time zone in seconds.
PTZ - PT
ans = 1×3
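The size of these offsets can be checked with a single datetime value. This sketch assumes the America/New_York time zone in August, when the offset from UTC is 4 hours. The variable names are illustrative.

```matlab
DZ = datetime(2016,8,29,10,5,24,'TimeZone','America/New_York');

D = DZ;
D.TimeZone = '';          % same clock time, no time zone

offsetSeconds = posixtime(DZ) - posixtime(D);      % 14400 seconds, or 4 hours
offsetDays    = juliandate(DZ) - juliandate(D);    % approximately 4/24 days

% Assigning a time zone to the unzoned array interprets its clock time in that zone.
D.TimeZone = 'America/New_York';
sameInstant = (posixtime(D) == posixtime(DZ));     % true
```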
See Also
datetime | timezones | posixtime | juliandate
Related Examples
• “Represent Dates and Times in MATLAB” on page 7-2
• “Specify Time Zones” on page 7-5
Set Date and Time Display Format
To set the display format of a datetime array, t, to the default format, type:
t.Format = 'default'
Changing the Format property does not change the values in the array, only their display. For
example, the following can be representations of the same datetime value (the latter two do not
display any time components):
The Format property of the datetime, duration, and calendarDuration data types accepts
different formats as inputs.
To change the default formats, see “Default datetime Format” on page 7-12.
Alternatively, you can use the letters A-Z and a-z to specify a custom date format. You can include
nonletter characters such as a hyphen, space, or colon to separate the fields. This table shows several
common display formats and examples of the formatted output for the date, Saturday, April 19, 2014
at 9:41:06 PM in New York City.
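As a sketch of how such formats behave, the following converts that date to text with a few custom formats using the char function. The specific formats chosen and the 'en_US' locale are assumptions made for this example so that the month name is in English.

```matlab
t = datetime(2014,4,19,21,41,6,'TimeZone','America/New_York');

s1 = char(t,'yyyy-MM-dd','en_US');                % '2014-04-19'
s2 = char(t,'dd/MM/yyyy','en_US');                % '19/04/2014'
s3 = char(t,'MMMM d, yyyy HH:mm:ss Z','en_US');   % 'April 19, 2014 21:41:06 -0400'
```

Assigning one of these formats to t.Format changes how t displays in the same way, without changing the stored value.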
For a complete list of valid symbolic identifiers, see the Format property for datetime arrays.
Note The letter identifiers that datetime accepts are different from those used by the datestr,
datenum, and datevec functions.
To specify the number of fractional digits displayed, use the format function.
To display a duration in the form of a digital timer, specify one of the following character vectors.
• 'dd:hh:mm:ss'
• 'hh:mm:ss'
• 'mm:ss'
• 'hh:mm'
You also can display up to nine fractional second digits by appending up to nine S characters. For
example, 'hh:mm:ss.SSS' displays the milliseconds of a duration value to 3 digits.
Changing the Format property does not change the values in the array, only their display.
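For example, this sketch displays one duration value with two digital timer formats. The duration value and variable names are illustrative.

```matlab
d = hours(1) + minutes(2) + seconds(3) + milliseconds(456);

d.Format = 'hh:mm:ss';
s1 = char(d);                  % '01:02:03'

d.Format = 'hh:mm:ss.SSS';
s2 = char(d);                  % '01:02:03.456'

totalSeconds = seconds(d);     % 3723.456, unchanged by the Format property
```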
This table describes the date and time components that the characters represent.
To specify the number of digits displayed for fractional seconds, use the format function.
Changing the Format property does not change the values in the array, only their display.
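You can set the default datetime display format by typing
datetime.setDefaultFormats('default',fmt)
where fmt is a character vector composed of the letters A-Z and a-z described for the Format
property of datetime arrays, above. For example,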
datetime.setDefaultFormats('default','yyyy-MM-dd hh:mm:ss')
sets the default datetime format to include a 4-digit year, 2-digit month number, 2-digit day number,
and hour, minute, and second values.
In addition, you can specify a default format for datetimes created without time components. For
example,
datetime.setDefaultFormats('defaultdate','yyyy-MM-dd')
sets the default date format to include a 4-digit year, 2-digit month number, and 2-digit day number.
To reset both the default format and the default date-only format to the factory defaults, type
datetime.setDefaultFormats('reset')
You also can set the default formats in the Preferences dialog box. For more information, see “Set
Command Window Preferences”.
See Also
datetime | duration | calendarDuration | format
Create a sequence of datetime values starting from November 1, 2013 and ending on November 5,
2013. The default step size is one calendar day.
t1 = datetime(2013,11,1,8,0,0);
t2 = datetime(2013,11,5,8,0,0);
t = t1:t2
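t = 1x5 datetime
Columns 1 through 3
   01-Nov-2013 08:00:00   02-Nov-2013 08:00:00   03-Nov-2013 08:00:00
Columns 4 through 5
   04-Nov-2013 08:00:00   05-Nov-2013 08:00:00
Specify a step size of two calendar days using the caldays function.
t = t1:caldays(2):t2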
t = 1x3 datetime
01-Nov-2013 08:00:00 03-Nov-2013 08:00:00 05-Nov-2013 08:00:00
Specify a step size in units other than days. Create a sequence of datetime values spaced 18 hours
apart.
t = t1:hours(18):t2
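t = 1x6 datetime
Columns 1 through 3
   01-Nov-2013 08:00:00   02-Nov-2013 02:00:00   02-Nov-2013 20:00:00
Columns 4 through 6
   03-Nov-2013 14:00:00   04-Nov-2013 08:00:00   05-Nov-2013 02:00:00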
Generate Sequence of Dates and Time
Use the years, days, minutes, and seconds functions to create datetime and duration sequences
using other fixed-length date and time units. Create a sequence of duration values between 0 and 3
minutes, incremented by 30 seconds.
d = 0:seconds(30):minutes(3)
d = 1x7 duration
0 sec 30 sec 60 sec 90 sec 120 sec 150 sec 180 sec
Assign a time zone to t1 and t2. In the America/New_York time zone, t1 now occurs just before a
daylight saving time change.
t1.TimeZone = 'America/New_York';
t2.TimeZone = 'America/New_York';
If you create the sequence using a step size of one calendar day, then the difference between
successive datetime values is not always 24 hours.
t = t1:t2;
dt = diff(t)
dt = 1x4 duration
24:00:00 25:00:00 24:00:00 24:00:00
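Create a sequence of datetime values spaced one fixed-length (24-hour) day apart by using the days
function.
t = t1:days(1):t2
t = 1x5 datetime
Columns 1 through 3
   01-Nov-2013 08:00:00   02-Nov-2013 08:00:00   03-Nov-2013 07:00:00
Columns 4 through 5
   04-Nov-2013 07:00:00   05-Nov-2013 07:00:00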
dt = diff(t)
dt = 1x4 duration
24:00:00 24:00:00 24:00:00 24:00:00
If you specify the step size as an integer, MATLAB interprets it as a number of 24-hour days.
t = t1:1:t2
t = 1x5 datetime
Columns 1 through 3
   01-Nov-2013 08:00:00   02-Nov-2013 08:00:00   03-Nov-2013 07:00:00
Columns 4 through 5
   04-Nov-2013 07:00:00   05-Nov-2013 07:00:00
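The three kinds of step size can be compared side by side. This is a sketch that reuses the endpoints from the example above; the variable names tCal, tFixed, and tNum are illustrative.

```matlab
% Endpoints in a zone where daylight saving time ends on November 3, 2013.
t1 = datetime(2013,11,1,8,0,0,'TimeZone','America/New_York');
t2 = datetime(2013,11,5,8,0,0,'TimeZone','America/New_York');

tCal   = t1:caldays(1):t2;   % calendar-day steps (same as the default t1:t2)
tFixed = t1:days(1):t2;      % fixed 24-hour steps
tNum   = t1:1:t2;            % numeric step, treated as 24-hour days

calSteps   = hours(diff(tCal));     % one step is 25 hours long
fixedSteps = hours(diff(tFixed));   % every step is 24 hours long
numericMatchesFixed = isequal(tFixed,tNum);
```

Calendar-day steps keep the clock time at 08:00, so one step spans 25 hours. Fixed-length steps keep the elapsed time constant, so the clock time shifts to 07:00.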
t1 = datetime(2013,11,1,8,0,0);
t = t1 + hours(0:2)
t = 1x3 datetime
01-Nov-2013 08:00:00 01-Nov-2013 09:00:00 01-Nov-2013 10:00:00
t = t1 + calmonths(1:5)
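t = 1x5 datetime
Columns 1 through 3
   01-Dec-2013 08:00:00   01-Jan-2014 08:00:00   01-Feb-2014 08:00:00
Columns 4 through 5
   01-Mar-2014 08:00:00   01-Apr-2014 08:00:00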
dt = caldiff(t)
dt = 1x4 calendarDuration
1mo 1mo 1mo 1mo
dt = caldiff(t,'days')
dt = 1x4 calendarDuration
31d 31d 28d 31d
Add a number of calendar months to the date, January 31, 2014, to create a sequence of dates that
fall on the last day of each month.
t = datetime(2014,1,31) + calmonths(0:11)
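t = 1x12 datetime
Columns 1 through 5
   31-Jan-2014   28-Feb-2014   31-Mar-2014   30-Apr-2014   31-May-2014
Columns 6 through 10
   30-Jun-2014   31-Jul-2014   31-Aug-2014   30-Sep-2014   31-Oct-2014
Columns 11 through 12
   30-Nov-2014   31-Dec-2014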
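One way to confirm that each value falls on the last day of its month is to compare the sequence with the output of dateshift. This is an illustrative check, not part of the original example; monthEnds and lastDays are assumed names.

```matlab
t = datetime(2014,1,31) + calmonths(0:11);

% Shift every value to the end of its own month and compare.
monthEnds = dateshift(t,'end','month');
allAtMonthEnd = all(t == monthEnds);

% Day-of-month numbers show how shorter months are handled.
lastDays = day(t);
```

Because each calendar month is added to the original date, January 31, the result lands on day 28, 30, or 31 as the target month allows.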
Create a sequence of five equally spaced dates between April 14, 2014 and August 4, 2014. First,
define the endpoints.
A = datetime(2014,04,14);
B = datetime(2014,08,04);
The third input to linspace specifies the number of linearly spaced points to generate between the
endpoints.
C = linspace(A,B,5)
C = 1x5 datetime
14-Apr-2014 12-May-2014 09-Jun-2014 07-Jul-2014 04-Aug-2014
Create a sequence of six equally spaced durations between 1 and 5.5 hours.
A = duration(1,0,0);
B = duration(5,30,0);
C = linspace(A,B,6)
C = 1x6 duration
01:00:00 01:54:00 02:48:00 03:42:00 04:36:00 05:30:00
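To check that linspace produced evenly spaced durations, you can inspect the differences between successive elements. The following sketch uses the same endpoints; stepMinutes is an assumed variable name.

```matlab
A = duration(1,0,0);
B = duration(5,30,0);
C = linspace(A,B,6);

% Six points give five equal steps of (5.5 - 1)/5 hours, or 54 minutes.
stepMinutes = minutes(diff(C));
```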
Generate a sequence of dates consisting of the next three occurrences of Monday. First, define
today's date.
t1 = datetime('today','Format','dd-MMM-yyyy eee')
t1 = datetime
01-Sep-2021 Wed
The first input to dateshift is always the datetime array from which you want to generate a
sequence. Specify 'dayofweek' as the second input to indicate that the datetime values in the
output sequence must fall on a specific day of the week. You can specify the day of the week either by
number or by name. For example, you can specify Monday either as 2 or 'Monday'.
t = dateshift(t1,'dayofweek',2,1:3)
t = 1x3 datetime
06-Sep-2021 Mon 13-Sep-2021 Mon 20-Sep-2021 Mon
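The example above depends on the current date. As a reproducible sketch, the following code starts from a fixed date (September 1, 2021, the Wednesday shown in the output above) and specifies Monday both by number and by name. The variable names are illustrative.

```matlab
t1 = datetime(2021,9,1);                           % a Wednesday

byNumber = dateshift(t1,'dayofweek',2,1:3);        % Monday as day number 2
byName   = dateshift(t1,'dayofweek','Monday',1:3); % Monday by name

% Day-of-week numbers of the results (1 = Sunday, 2 = Monday).
dayNumbers = day(byNumber,'dayofweek');
```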
Generate a sequence of start-of-month dates beginning with April 1, 2014. Specify 'start' as the
second input to dateshift to indicate that all datetime values in the output sequence should fall at
the start of a particular unit of time. The third input argument defines the unit of time, in this case,
month. The last input to dateshift can be an array of integer values that specifies how t1 should be
shifted. In this case, 0 corresponds to the start of the current month, and 4 corresponds to the start
of the fourth month from t1.
t1 = datetime(2014,04,01);
t = dateshift(t1,'start','month',0:4)
t = 1x5 datetime
01-Apr-2014 01-May-2014 01-Jun-2014 01-Jul-2014 01-Aug-2014
Generate a sequence of end-of-month dates beginning with April 2014.
t1 = datetime(2014,04,01);
t = dateshift(t1,'end','month',0:2)
t = 1x3 datetime
30-Apr-2014 31-May-2014 30-Jun-2014
dt = caldiff(t,'days')
dt = 1x2 calendarDuration
31d 30d
You can specify other units of time such as week, day, and hour.
t1 = datetime('now')
t1 = datetime
01-Sep-2021 13:42:35
t = dateshift(t1,'start','hour',0:4)
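t = 1x5 datetime
Columns 1 through 3
   01-Sep-2021 13:00:00   01-Sep-2021 14:00:00   01-Sep-2021 15:00:00
Columns 4 through 5
   01-Sep-2021 16:00:00   01-Sep-2021 17:00:00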
Generate a sequence of datetime values beginning with the previous hour. Negative integers in the
last input to dateshift correspond to datetime values earlier than t1.
t = dateshift(t1,'start','hour',-1:1)
t = 1x3 datetime
01-Sep-2021 12:00:00 01-Sep-2021 13:00:00 01-Sep-2021 14:00:00
See Also
dateshift | linspace
Create language-independent datetime values. That is, create datetime values that use month
numbers rather than month names, such as 01 instead of January. Avoid using day of week names.
For example, do this:
t = datetime('today','Format','yyyy-MM-dd')
t = datetime
2021-09-01
instead of this:
t = datetime('today','Format','eeee, dd-MMM-yyyy')
t = datetime
Wednesday, 01-Sep-2021
Display the hour using 24-hour clock notation rather than 12-hour clock notation. Use the 'HH'
identifiers when specifying the display format for datetime values.
For example, do this:
t = datetime('now','Format','HH:mm')
t = datetime
15:32
instead of this:
t = datetime('now','Format','hh:mm a')
t = datetime
03:32 PM
When specifying the display format for time zone information, use the Z or X identifiers instead of the
lowercase z to avoid the creation of time zone names that might not be recognized in other languages
or regions.
Share Code and Data Across Locales
t.TimeZone = 'America/New_York';
t.Format = 'dd-MM-yyyy Z'
t = datetime
01-09-2021 -0400
If you share files but not code, you do not need to write locale-independent code while you work in
MATLAB. However, when you write to a file, ensure that any text representing dates and times is
language-independent. Then, other MATLAB users can read the files easily without having to specify
a locale in which to interpret date and time data.
t = [datetime('today');datetime('tomorrow')]
t = 2x1 datetime
01-Sep-2021
02-Sep-2021
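To write dates for use in a particular locale, convert the datetime values to text and specify that
locale, for example with the cellstr function.
S = cellstr(t,'dd. MMMM yyyy','de_DE')
S = 2x1 cell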
{'01. September 2021'}
{'02. September 2021'}
S is a cell array of character vectors representing dates in German. You can export S to a text file to
use with systems in the de_DE locale.
• When reading text files using the textscan function, specify the file encoding when opening the
file with fopen. The encoding is the fourth input argument to fopen.
• When reading text files using the readtable function, use the FileEncoding name-value pair
argument to specify the character encoding associated with the file.
See Also
datetime | char | cellstr | readtable | textscan
Extract or Assign Date and Time Components of Datetime Array
t = 1x3 datetime
02-Sep-2021 10:44:38 03-Oct-2022 06:44:38 04-Nov-2023 02:44:38
Get the year values of each datetime in the array. Use dot notation to access the Year property of t.
t_years = t.Year
t_years = 1×3
2021 2022 2023
Get the month values of each datetime in t by accessing the Month property.
t_months = t.Month
t_months = 1×3
9 10 11
You can retrieve the day, hour, minute, and second components of each datetime in t by accessing the
Day, Hour, Minute, and Second properties, respectively.
Use the month function to get the month number for each datetime in t. Using functions is an
alternate way to retrieve specific date or time components of t.
m = month(t)
m = 1×3
9 10 11
Use the month function rather than the Month property to get the full month names of each datetime
in t.
m = month(t,'name')
m = 1x3 cell
{'September'} {'October'} {'November'}
You can retrieve the year, quarter, week, day, hour, minute, and second components of each datetime
in t using the year, quarter, week, day, hour, minute, and second functions, respectively.
w = week(t)
w = 1×3
36 41 44
Use the ymd function to get the year, month, and day values of t as three separate numeric arrays.
[y,m,d] = ymd(t)
y = 1×3
2021 2022 2023
m = 1×3
9 10 11
d = 1×3
2 3 4
Use the hms function to get the hour, minute, and second values of t as three separate numeric
arrays.
[h,m,s] = hms(t)
h = 1×3
10 6 2
m = 1×3
44 44 44
s = 1×3
Assign new values to components in an existing datetime array by modifying the properties of the
array. Use dot notation to access a specific property.
Change the year number of all datetime values in t to 2014. Use dot notation to modify the Year
property.
t.Year = 2014
t = 1x3 datetime
02-Sep-2014 10:44:38 03-Oct-2014 06:44:38 04-Nov-2014 02:44:38
Change the months of the three datetime values in t to January, February, and March, respectively.
You must specify the new value as a numeric array.
t.Month = [1,2,3]
t = 1x3 datetime
02-Jan-2014 10:44:38 03-Feb-2014 06:44:38 04-Mar-2014 02:44:38
Set the time zone of t by assigning a value to the TimeZone property.
t.TimeZone = 'Europe/Berlin';
Change the display format of t to display only the date, and not the time information.
t.Format = 'dd-MMM-yyyy'
t = 1x3 datetime
02-Jan-2014 03-Feb-2014 04-Mar-2014
If you assign values to a datetime component that are outside the conventional range, MATLAB®
normalizes the components. The conventional range for day of month numbers is from 1 to 31. Assign
day values that exceed this range.
t.Day = [-1 1 32]
t = 1x3 datetime
30-Dec-2013 01-Feb-2014 01-Apr-2014
The month and year numbers adjust so that all values remain within the conventional range for each
date component. In this case, January -1, 2014 converts to December 30, 2013.
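The normalization can be checked numerically with the ymd function. The following sketch rebuilds the three dates from the example above without time or time zone information, which is a simplifying assumption, and then assigns the out-of-range day values.

```matlab
% Dates from the example above, without time components.
t = [datetime(2014,1,2), datetime(2014,2,3), datetime(2014,3,4)];

% Day -1 of January, day 1 of February, and day 32 of March.
t.Day = [-1 1 32];

% The year and month adjust so that every component is in range.
[y,m,d] = ymd(t);
```

Day 0 of a month is the last day of the previous month, so day -1 of January 2014 is December 30, 2013.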
See Also
datetime | ymd | hms | week
Create a space-delimited text file named schedule.txt that contains the following (to create the
file, use any text editor, and copy and paste):
Date Name Time
10.03.2015 Joe 14:31
10.03.2015 Bob 15:33
11.03.2015 Bob 11:29
12.03.2015 Kim 12:09
12.03.2015 Joe 13:05
Read the file using the readtable function. Use the %D conversion specifier to read the first and
third columns of data as datetime values.
T = readtable('schedule.txt','Format','%{dd.MM.uuuu}D %s %{HH:mm}D','Delimiter',' ')
T =
Date Name Time
__________ _____ _____
10.03.2015 'Joe' 14:31
10.03.2015 'Bob' 15:33
11.03.2015 'Bob' 11:29
12.03.2015 'Kim' 12:09
12.03.2015 'Joe' 13:05
Change the display format for the T.Date and T.Time variables to view both date and time
information. Since the data in the first column of the file ("Date") have no time information, the time
of the resulting datetime values in T.Date defaults to midnight. Since the data in the third column of
the file ("Time") have no associated date, the date of the datetime values in T.Time defaults to the
current date.
T.Date.Format = 'dd.MM.uuuu HH:mm';
T.Time.Format = 'dd.MM.uuuu HH:mm';
T
T =
Date Name Time
________________ _____ ________________
10.03.2015 00:00 'Joe' 12.12.2014 14:31
10.03.2015 00:00 'Bob' 12.12.2014 15:33
11.03.2015 00:00 'Bob' 12.12.2014 11:29
12.03.2015 00:00 'Kim' 12.12.2014 12:09
12.03.2015 00:00 'Joe' 12.12.2014 13:05
Combine the date and time information from two different table variables by adding T.Date and the
time values in T.Time. Extract the time information from T.Time using the timeofday function.
myDatetime = T.Date + timeofday(T.Time)
myDatetime =
10.03.2015 14:31
10.03.2015 15:33
Combine Date and Time from Separate Variables
11.03.2015 11:29
12.03.2015 12:09
12.03.2015 13:05
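The same technique works without a file. As a minimal sketch, the following code builds the date and time columns directly (the values are a subset of the file contents above, and the variable names are illustrative) and combines them with timeofday.

```matlab
% Dates at midnight, and times attached to an unrelated placeholder date.
dates = datetime(2015,3,[10;11;12]);
times = datetime(2014,12,12,[14;11;12],[31;29;9],0);

% timeofday returns the elapsed time since midnight as a duration,
% so adding it to a date at midnight yields the combined datetime.
combined = dates + timeofday(times);

expected = datetime(2015,3,[10;11;12],[14;11;12],[31;29;9],0);
```

The placeholder date in times has no effect on the result, because timeofday discards it.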
See Also
readtable | timeofday
Create a datetime scalar. By default, datetime arrays are not associated with a time zone.
t1 = datetime('now')
t1 = datetime
01-Sep-2021 13:47:20
Add a sequence of fixed-length hours to the datetime.
t2 = t1 + hours(1:3)
t2 = 1x3 datetime
01-Sep-2021 14:47:20 01-Sep-2021 15:47:20 01-Sep-2021 16:47:20
Verify that the difference between each pair of datetime values in t2 is 1 hour.
dt = diff(t2)
dt = 1x2 duration
01:00:00 01:00:00
diff returns durations in terms of exact numbers of hours, minutes, and seconds.
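Subtract a sequence of minutes from the datetime to find past points in time.
t2 = t1 - minutes(20:10:40)
t2 = 1x3 datetime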
01-Sep-2021 13:27:20 01-Sep-2021 13:17:20 01-Sep-2021 13:07:20
Add a numeric array to a datetime array. MATLAB® treats each value in the numeric array as a
number of exact, 24-hour days.
t2 = t1 + [1:3]
t2 = 1x3 datetime
02-Sep-2021 13:47:20 03-Sep-2021 13:47:20 04-Sep-2021 13:47:20
If you work with datetime values in different time zones, or if you want to account for daylight saving
time changes, work with datetime arrays that are associated with time zones. Create a datetime
scalar representing March 8, 2014 in New York.
t1 = datetime(2014,3,8,0,0,0,'TimeZone','America/New_York')
Date and Time Arithmetic
t1 = datetime
08-Mar-2014
t2 = t1 + days(0:2)
t2 = 1x3 datetime
08-Mar-2014 00:00:00 09-Mar-2014 00:00:00 10-Mar-2014 01:00:00
Because a daylight saving time shift occurred on March 9, 2014, the third datetime in t2 does not
occur at midnight.
Verify that the difference between each pair of datetime values in t2 is 24 hours.
dt = diff(t2)
dt = 1x2 duration
24:00:00 24:00:00
You can add fixed-length durations in other units such as years, hours, minutes, and seconds by
adding the outputs of the years, hours, minutes, and seconds functions, respectively.
To account for daylight saving time changes, you should work with calendar durations instead of
durations. Calendar durations account for daylight saving time shifts when they are added to or
subtracted from datetime values.
t3 = t1 + caldays(0:2)
t3 = 1x3 datetime
08-Mar-2014 09-Mar-2014 10-Mar-2014
Observe that the difference between each pair of datetime values in t3 is not always 24 hours due to the
daylight saving time shift that occurred on March 9.
dt = diff(t3)
dt = 1x2 duration
24:00:00 23:00:00
t1 = datetime(2014,1,31)
t1 = datetime
31-Jan-2014
t2 = t1 + calmonths(1:4)
t2 = 1x4 datetime
28-Feb-2014 31-Mar-2014 30-Apr-2014 31-May-2014
Calculate the difference between each pair of datetime values in t2 in terms of a number of calendar
days using the caldiff function.
dt = caldiff(t2,'days')
dt = 1x3 calendarDuration
31d 30d 31d
The number of days between successive pairs of datetime values in t2 is not always the same
because different months consist of a different number of days.
t2 = t1 + calyears(0:4)
t2 = 1x5 datetime
31-Jan-2014 31-Jan-2015 31-Jan-2016 31-Jan-2017 31-Jan-2018
Calculate the difference between each pair of datetime values in t2 in terms of a number of calendar
days using the caldiff function.
dt = caldiff(t2,'days')
dt = 1x4 calendarDuration
365d 365d 366d 365d
The number of days between successive pairs of datetime values in t2 is not always the same
because 2016 is a leap year and has 366 days.
You can use the calquarters, calweeks, and caldays functions to create arrays of calendar
quarters, calendar weeks, or calendar days that you add to or subtract from datetime arrays.
Adding calendar durations is not commutative. When you add more than one calendarDuration
array to a datetime, MATLAB® adds them in the order in which they appear in the command.
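Add 3 calendar months followed by 30 calendar days to January 31, 2014.
t2 = datetime(2014,1,31) + calmonths(3) + caldays(30)
t2 = datetime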
30-May-2014
First add 30 calendar days to the same date, and then add 3 calendar months. The result is not the
same because when you add a calendar duration to a datetime, the number of days added depends on
the original date.
t2 = datetime(2014,1,31) + caldays(30) + calmonths(3)
t2 = datetime
02-Jun-2014
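The two orders of addition can be compared directly. This sketch reproduces the results shown above; t0, monthsFirst, and daysFirst are assumed variable names.

```matlab
t0 = datetime(2014,1,31);

% MATLAB evaluates the additions from left to right.
monthsFirst = t0 + calmonths(3) + caldays(30);   % Jan 31 -> Apr 30 -> May 30
daysFirst   = t0 + caldays(30) + calmonths(3);   % Jan 31 -> Mar 2  -> Jun 2

orderMatters = (monthsFirst ~= daysFirst);
```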
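Create two calendar durations and then find their sum.
d1 = calyears(1) + calmonths(2) + caldays(20)
d1 = calendarDuration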
1y 2mo 20d
d2 = calmonths(11) + caldays(23)
d2 = calendarDuration
11mo 23d
d = d1 + d2
d = calendarDuration
2y 1mo 43d
When you sum two or more calendar durations, a number of months greater than or equal to 12 rolls
over to a number of years. However, a large number of days does not roll over to a number of months,
because different months consist of different numbers of days.
Increase d by multiplying it by a factor of 2. Calendar duration values must be integers, so you can
multiply them only by integer values.
2*d
ans = calendarDuration
4y 2mo 86d
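To inspect the rollover behavior numerically, you can break a calendar duration into its components with the split function. The following is a sketch using the values from the example above; the output variable names are illustrative.

```matlab
d1 = calyears(1) + calmonths(2) + caldays(20);
d2 = calmonths(11) + caldays(23);
d  = d1 + d2;

% 13 months roll over to 1 year and 1 month; 43 days stay as days.
[yrs,mos,dys] = split(d,{'years','months','days'});

% Multiplying by an integer scales every component.
[yrs2,mos2,dys2] = split(2*d,{'years','months','days'});
```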
Subtract one datetime array from another to calculate elapsed time in terms of an exact number of
hours, minutes, and seconds.
Find the exact length of time between a sequence of datetime values and the start of the previous
day.
t2 = datetime('now') + caldays(1:3)
t2 = 1x3 datetime
02-Sep-2021 13:47:22 03-Sep-2021 13:47:22 04-Sep-2021 13:47:22
t1 = datetime('yesterday')
t1 = datetime
31-Aug-2021
dt = t2 - t1
dt = 1x3 duration
61:47:22 85:47:22 109:47:22
whos dt
dt 1x3 40 duration
View the elapsed durations in units of days by changing the Format property of dt.
dt.Format = 'd'
dt = 1x3 duration
2.5746 days 3.5746 days 4.5746 days
Scale the duration values by multiplying dt by a factor of 1.2. Because durations have an exact
length, you can multiply and divide them by fractional values.
dt2 = 1.2*dt
Use the between function to find the number of calendar years, months, and days elapsed between
two dates.
t1 = datetime('today')
t1 = datetime
01-Sep-2021
t2 = t1 + calmonths(0:2) + caldays(4)
t2 = 1x3 datetime
05-Sep-2021 05-Oct-2021 05-Nov-2021
dt = between(t1,t2)
dt = 1x3 calendarDuration
4d 1mo 4d 2mo 4d
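Because the example above depends on the current date, here is a reproducible sketch with fixed dates. It also requests the result in calendar days only, by passing 'days' as the third input to between. The variable names are illustrative.

```matlab
t1 = datetime(2021,9,1);
t2 = datetime(2021,10,5);

% Default: elapsed calendar years, months, days, and time.
dt = between(t1,t2);
[mos,dys] = split(dt,{'months','days'});   % 1 month and 4 days

% Same interval expressed only in calendar days.
dtDays = between(t1,t2,'days');
nDays = caldays(dtDays);                   % 30 days in September plus 4
```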
See Also
between | diff | caldiff
Compare Dates and Time
Compare two datetime arrays. The arrays must be the same size or one can be a scalar.
A = datetime(2013,07,26) + calyears(0:2:6)
A = 1x4 datetime
26-Jul-2013 26-Jul-2015 26-Jul-2017 26-Jul-2019
B = datetime(2014,06,01)
B = datetime
01-Jun-2014
A < B
1 0 0 0
The < operator returns logical 1 (true) where a datetime in A occurs before a datetime in B.
Compare a datetime array to text representing a date.
A >= '26-Sep-2014'
0 1 1 1
Comparisons of datetime arrays account for the time zone information of each array.
Compare September 1, 2014 at 4:00 p.m. in Los Angeles with 5:00 p.m. on the same day in New York.
A = datetime(2014,09,01,16,0,0,'TimeZone','America/Los_Angeles',...
'Format','dd-MMM-yyyy HH:mm:ss Z')
A = datetime
01-Sep-2014 16:00:00 -0700
B = datetime(2014,09,01,17,0,0,'TimeZone','America/New_York',...
'Format','dd-MMM-yyyy HH:mm:ss Z')
B = datetime
01-Sep-2014 17:00:00 -0400
A < B
ans = logical
0
4:00 p.m. in Los Angeles occurs after 5:00 p.m. on the same day in New York.
Compare Durations
A = duration([2,30,30;3,15,0])
A = 2x1 duration
02:30:30
03:15:00
B = duration([2,40,0;2,50,0])
B = 2x1 duration
02:40:00
02:50:00
A >= B
0
1
Compare a duration array to a numeric array. Elements in the numeric array are treated as a number
of fixed-length (24-hour) days.
1
0
Use the isbetween function to determine whether values in a datetime array lie within a closed
interval.
tlower = datetime(2014,08,01)
tlower = datetime
01-Aug-2014
tupper = datetime(2014,09,01)
tupper = datetime
01-Sep-2014
Create a datetime array and determine whether the values lie within the interval bounded by tlower
and tupper.
A = datetime(2014,08,21) + calweeks(0:2)
A = 1x3 datetime
21-Aug-2014 28-Aug-2014 04-Sep-2014
tf = isbetween(A,tlower,tupper)
1 1 0
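Because the interval is closed, values equal to either endpoint count as inside it. The following sketch illustrates this with an array that includes both endpoints (an assumption added for illustration) and compares the result with the equivalent relational expression.

```matlab
tlower = datetime(2014,8,1);
tupper = datetime(2014,9,1);

% Test array that includes both endpoints and one value outside.
A = [tlower, datetime(2014,8,21), tupper, datetime(2014,9,4)];

tf = isbetween(A,tlower,tupper);
tfManual = (A >= tlower) & (A <= tupper);   % same test with operators
```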
See Also
isbetween
More About
• “Array Comparison with Relational Operators” on page 2-29
Create t as a sequence of dates and create y as random data. Plot the vectors using the plot
function.
t = datetime(2014,6,28) + calweeks(0:9);
y = rand(1,10);
plot(t,y);
By default, plot chooses tick mark locations based on the range of data. When you zoom in and out
of a plot, the tick labels automatically adjust to the new axis limits.
Change the x-axis limits. Also, change the format for the tick labels along the x-axis. For a list of
formatting options, see the xtickformat function.
Plot Dates and Durations
Create t as seven linearly spaced duration values between 0 and 3 minutes. Create y as a vector of
random data. Plot the data.
t = 0:seconds(30):minutes(3);
y = rand(1,7);
plot(t,y);
View the x-axis limits. Since the duration tick labels are in terms of a single unit (seconds), the limits
are stored in terms of that unit.
xl = xlim
xl = 1x2 duration
-4.5 sec 184.5 sec
Change the format for the duration tick labels to display in the form of a digital timer that includes
more than one unit. For a list of formatting options, see the xtickformat function.
xtickformat('mm:ss')
View the x-axis limits again. Since the duration tick labels are now in terms of multiple units, the
limits are stored in units of 24-hour days.
xl = xlim
xl = 1x2 duration
-00:04 03:04
t = datetime('today') + caldays(1:100);
y = linspace(10,40,100) + 10*rand(1,100);
scatter(t,y)
bar barh
plot plot3
semilogx (x values must be numeric) semilogy (y values must be numeric)
stem stairs
scatter scatter3
area mesh
surf surface
fill fill3
line text
histogram
See Also
plot | datetime | xtickformat
Core Functions Supporting Date and Time Arrays
This table lists notable MATLAB functions that operate on datetime, duration, and
calendarDuration arrays in addition to other arrays.
Overview
datetime is the best data type for representing points in time. datetime values have flexible
display formats and up to nanosecond precision, and can account for time zones, daylight saving
time, and leap seconds. However, if you work with code authored in MATLAB R2014a or earlier, or if
you share code with others who use such a version, you might need to work with dates and time
stored in one of these three formats:
Example: 7.3510e+005
Date strings, vectors, and numbers can be stored as arrays of values. Store multiple date strings in a
cell array of character vectors, multiple date vectors in an m-by-6 matrix, and multiple serial date
numbers in a matrix.
You can convert any of these formats to a datetime array using the datetime function. If your
existing MATLAB code expects a serial date number or date vector, use the datenum or datevec
functions, respectively, to convert a datetime array to the expected data format. To convert a
datetime array to character vectors, use the char or cellstr functions.
Starting in R2016b, you also can convert a datetime array to a string array with the string
function.
Convert Between Datetime Arrays, Numbers, and Text
A date string includes characters that separate the fields, such as the hyphen, space, and colon used
here:
d = '23-Aug-2010 16:35:42'
Convert one or more date strings to a datetime array using the datetime function. For best
performance, specify the format of the input date strings as an input to datetime.
Note The specifiers that datetime uses to describe date and time formats differ from the specifiers
that the datestr, datevec, and datenum functions accept.
For a complete list of date and time format specifiers, see the Format property of the datetime data
type.
t = datetime(d,'InputFormat','dd-MMM-yyyy HH:mm:ss')
t =
datetime
23-Aug-2010 16:35:42
Although the date string, d, and the datetime scalar, t, look similar, they are not equal. View the
size and data type of each variable.
whos d t
d 1x20 40 char
t 1x1 17 datetime
Convert a datetime array to a character vector using char or cellstr. For example, convert the
current date and time to a timestamp to append to a file name.
t = datetime('now','Format','yyyy-MM-dd''T''HHmmss')
t =
datetime
2017-01-03T151105
S = char(t);
filename = ['myTest_',S]
filename =
'myTest_2017-01-03T151105'
Convert a string array. MATLAB displays strings in double quotes. For best performance, specify the
format of the input date strings as an input to datetime.
str = ["24-Oct-2016 11:58:17";...
       "19-Nov-2016 09:36:29";...
       "12-Dec-2016 10:09:06"]
str =
"24-Oct-2016 11:58:17"
"19-Nov-2016 09:36:29"
"12-Dec-2016 10:09:06"
t = datetime(str,'InputFormat','dd-MMM-yyyy HH:mm:ss')
t =
24-Oct-2016 11:58:17
19-Nov-2016 09:36:29
12-Dec-2016 10:09:06
t = datetime('25-Dec-2016 06:12:34');
str = string(t)
str =
"25-Dec-2016 06:12:34"
[2012 10 24 10 45 07]
Convert one or more date vectors to a datetime array using the datetime function:
t = datetime([2012 10 24 10 45 07])
t =
datetime
24-Oct-2012 10:45:07
Instead of using datevec to extract components of datetime values, use functions such as year,
month, and day:
y = year(t)
y =
2012
Alternatively, access the corresponding property, such as t.Year for year values:
y = t.Year
y =
2012
Serial time can represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75
serial days. So the character vector '31-Oct-2003, 6:00 PM' in MATLAB is date number
731885.75.
Convert one or more serial date numbers to a datetime array using the datetime function. Specify
the type of date number that is being converted:
t = datetime(731885.75,'ConvertFrom','datenum')
t =
datetime
31-Oct-2003 18:00:00
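The conversions in this section can be chained into a round trip. The following is a sketch only, with assumed variable names; it converts a datetime to a serial date number and a date vector, and then back to a datetime.

```matlab
t = datetime(2003,10,31,18,0,0);

n = datenum(t);     % serial date number; 6 p.m. is 0.75 of a day
v = datevec(t);     % date vector [year month day hour minute second]

% Convert both legacy representations back to datetime values.
tFromNum = datetime(n,'ConvertFrom','datenum');
tFromVec = datetime(v);
```

Serial date numbers are doubles, so a round trip through datenum can differ from the original value by a tiny rounding error for arbitrary times of day.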
t = datetime(2014,6,18) + calmonths(1:4)
t =
Subtract the origin value. For example, the origin value might be the starting day of an experiment.
dt = t - datetime(2014,7,1)
dt =
dt is a duration array. Convert dt to a double array of values in units of years, days, hours,
minutes, or seconds using the years, days, hours, minutes, or seconds function, respectively.
x = hours(dt)
x =
y = log(x)
y =
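As a compact sketch of the workflow above, the following code subtracts the origin and converts the resulting durations to plain numbers in two different units. The name xDays is an illustrative addition.

```matlab
t  = datetime(2014,6,18) + calmonths(1:4);   % July 18 through October 18
dt = t - datetime(2014,7,1);                 % elapsed time since the origin

x     = hours(dt);   % double array, in hours
xDays = days(dt);    % double array, in 24-hour days

y = log(x);          % ordinary numeric functions now apply
```

The dates have no time zone, so every day is exactly 24 hours long and the results are whole numbers of days.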
See Also
datetime | datenum | datevec | cellstr | char | string | duration
More About
• “Represent Dates and Times in MATLAB” on page 7-2
• “Extract or Assign Date and Time Components of Datetime Array” on page 7-23
• Proleptic Gregorian Calendar
Carryover in Date Vectors and Strings
In the following example, the month element has a value of 22. MATLAB increments the year value to
2010 and sets the month to October.
datestr([2009 22 03 00 00 00])
ans =
03-Oct-2010
The carrying forward of values also applies to time and day values in text representing dates and
times. For example, October 3, 2010 and September 33, 2010 are interpreted to be the same date,
and correspond to the same serial date number.
datenum('03-Oct-2010')
ans =
734414
datenum('33-Sep-2010')
ans =
734414
The following example takes the input month (07, or July), finds the last day of the previous month
(June 30), and subtracts the number of days in the field specifier (5 days) from that date to yield a
return date of June 25, 2010.
datestr([2010 07 -05 00 00 00])
ans =
25-Jun-2010
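The datetime function applies the same kind of carryover to numeric inputs that fall outside their conventional ranges. As a sketch, the following code repeats the three cases above with datetime instead of datestr and datenum; the variable names are illustrative.

```matlab
a = datetime(2009,22,3);    % month 22 of 2009 carries into October 2010
b = datetime(2010,9,33);    % September 33 carries into October 3
c = datetime(2010,7,-5);    % day -5 counts back from the end of June
```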
Note The best practice is to use datetime values to represent points in time rather than date
vectors. Unlike date vectors, datetime values display in a human-readable format, often avoiding
the need for conversion to text. If you need to convert a date vector to text, the best practice is to
first convert it to a datetime value, and then to convert the datetime value to text by using the
string or char functions. While you can convert date vectors to text directly by using the datestr
function, you might get unexpected results, as described in this section.
Because a date vector is a 1-by-6 row vector of numbers, the datestr function might interpret input
date vectors as vectors of serial date numbers and return unexpected output. Or it might interpret
vectors of serial date numbers as date vectors. This ambiguity exists because datestr has a
heuristic rule for interpreting a 1-by-6 row vector as either a date vector or a vector of six serial date
numbers. The same ambiguity applies to inputs that are m-by-6 numeric matrices, where each row
can be interpreted either as a date vector or as six serial date numbers.
For example, consider a date vector that includes the year 3000. This year is outside the range of
years that datestr interprets as elements of date vectors. Therefore, the input is interpreted as a 1-
by-6 vector of serial date numbers.
d = datestr([3000 11 05 10 32 56])
d =
'18-Mar-0008'
'11-Jan-0000'
'05-Jan-0000'
'10-Jan-0000'
'01-Feb-0000'
'25-Feb-0000'
Here datestr interprets 3000 as a serial date number, and converts it to the text '18-Mar-0008'
(the date that is 3000 days after 0-Jan-0000). Also, datestr converts the next five elements as
though they also were serial date numbers.
There are two methods for converting such a date vector to text.
• The recommended method is to convert the date vector to a datetime value. Then convert it
using the char, cellstr, or string function. The datetime function always treats 1-by-6
numeric vectors as date vectors.
dt = datetime([3000 11 05 10 32 56]);
ds = string(dt)
ds =
"05-Nov-3000 10:32:56"
• As an alternative, convert it to a serial date number using the datenum function. Then, convert
the date number to a character vector using datestr.
dn = datenum([3000 11 05 10 32 56]);
ds = datestr(dn)
Converting Date Vector Returns Unexpected Output
ds =
'05-Nov-3000 10:32:56'
When converting dates to text, datestr interprets input as either date vectors or serial date
numbers using a heuristic rule. Consider an m-by-6 matrix. The datestr function interprets the
matrix as m date vectors when:
If either condition is false, for any row, then datestr interprets the m-by-6 matrix as an m-by-6 matrix
of serial date numbers.
Usually, dates with years in the range 1700–2300 are interpreted as date vectors. However, datestr
might interpret rows with month, day, hour, minute, or second values outside their normal ranges as
serial date numbers. For example, datestr correctly interprets the following date vector for the year
2020:
d = datestr([2020 06 21 10 51 00])
d =
'21-Jun-2020 10:51:00'
But given a day value outside the typical range (1–31), datestr returns a date for each element of
the vector.
d = datestr([2020 06 2110 10 51 00])
d =
'12-Jul-0005'
'06-Jan-0000'
'10-Oct-0005'
'10-Jan-0000'
'20-Feb-0000'
'00-Jan-0000'
Again, the datetime function always treats numeric inputs as date vectors. In this case, it calculates
an appropriate date, interpreting 2110 as the 2110th day since the beginning of June 2020.
d = datetime([2020 06 2110 10 51 00])
d =
datetime
11-Mar-2026 10:51:00
• When you have a matrix of date vectors that datestr might interpret incorrectly as serial date
numbers, convert the matrix by using either the datetime or datenum functions. Then convert
those values to text.
• When you have a matrix of serial date numbers that datestr might interpret as date vectors, first
convert the matrix to a column vector. Then, use datestr to convert the column vector.
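Both recommendations can be sketched as follows. The date vector comes from the example earlier in this section; the serial date numbers are hypothetical values chosen for illustration.

```matlab
% Case 1: a date vector that datestr would misread as six serial date numbers.
dv = [3000 11 05 10 32 56];
viaDatetime = datetime(dv);           % always treated as a date vector
viaDatenum  = datestr(datenum(dv));   % convert to a date number first

% Case 2: six serial date numbers stored in a 1-by-6 row (hypothetical values).
serials = 730000:730005;
txt = datestr(serials(:));            % a column vector is never a date vector
```

Reshaping with serials(:) removes the ambiguity, because a column of six numbers cannot be read as a 1-by-6 date vector.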
See Also
datetime | datenum | datevec | char | string | datestr
More About
• “Represent Dates and Times in MATLAB” on page 7-2
• “Convert Between Datetime Arrays, Numbers, and Text” on page 7-42
8
Categorical Arrays
By default, categorical arrays contain categories that have no mathematical ordering. For example,
the discrete set of pet categories {'dog' 'cat' 'bird'} has no meaningful mathematical
ordering, so MATLAB® uses the alphabetical ordering {'bird' 'cat' 'dog'}. Ordinal categorical
arrays contain categories that have a meaningful mathematical ordering. For example, the discrete
set of size categories {'small', 'medium', 'large'} has the mathematical ordering small <
medium < large.
When you create categorical arrays from cell arrays of character vectors or string arrays, leading and
trailing spaces are removed. For example, if you specify the text {' cat' 'dog '} as categories, then
when you convert them to categories they become {'cat' 'dog'}.
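A short sketch of this behavior, using the text from the sentence above (the variable names are illustrative):

```matlab
% Input text with a leading space and a trailing space.
pets = categorical({' cat','dog '});

% The category names are stored without the surrounding spaces.
petNames = categories(pets);
```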
You can use the categorical function to create a categorical array from a numeric array, logical
array, string array, cell array of character vectors, or an existing categorical array.
Create a 1-by-11 cell array of character vectors containing state names from New England.
state = {'MA','ME','CT','VT','ME','NH','VT','MA','NH','CT','RI'};
Convert the cell array, state, to a categorical array that has no mathematical order.
state = categorical(state)
MA ME CT VT ME NH VT MA NH
Columns 10 through 11
CT RI
class(state)
ans =
'categorical'
categories(state)
Create Categorical Arrays
{'RI'}
{'VT'}
Create a 1-by-8 cell array of character vectors containing the sizes of eight objects.
AllSizes = {'medium','large','small','small','medium',...
'large','medium','small'};
The cell array, AllSizes, has three distinct values: 'large', 'medium', and 'small'. With the cell
array of character vectors, there is no convenient way to indicate that small < medium < large.
Convert the cell array, AllSizes, to an ordinal categorical array. Use valueset to specify the values
small, medium, and large, which define the categories. For an ordinal categorical array, the first
category specified is the smallest and the last category is the largest.
valueset = ["small","medium","large"];
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true)
Columns 7 through 8
medium small
class(sizeOrd)
ans =
'categorical'
The order of the values in the categorical array, sizeOrd, remains unchanged.
The categories are listed in the specified order to match the mathematical ordering small <
medium < large.
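Because the categories are ordered, relational operators and the min and max functions become meaningful. The following sketch rebuilds the array from the example and assumes the same eight values. The names isBigger, smallest, and largest are introduced here for illustration.

```matlab
AllSizes = {'medium','large','small','small','medium','large','medium','small'};
valueset = {'small','medium','large'};
sizeOrd = categorical(AllSizes,valueset,'Ordinal',true);

isBigger = sizeOrd > 'small';   % true where the element is medium or large
smallest = min(sizeOrd);        % first category that occurs in the data
largest = max(sizeOrd);         % last category that occurs in the data
```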
Use the discretize function to create a categorical array by binning the values of x. Put all values
between zero and 15 in the first bin, all the values between 15 and 35 in the second bin, and all the
values between 35 and 50 in the third bin. Each bin includes the left endpoint, but does not include
the right endpoint.
catnames = ["small","medium","large"];
binnedData = discretize(x,[0 15 35 50],'categorical',catnames);
binnedData is a 100-by-1 ordinal categorical array with three categories, such that small <
medium < large.
Use the summary function to print the number of elements in each category.
summary(binnedData)
small 30
medium 35
large 35
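The binning step can be run end to end with the sketch below. The definition of x is an assumption (100 random values between 0 and 50 with a fixed seed), so the counts that summary prints may differ from those shown above.

```matlab
rng('default')                  % assumption: fixed seed for repeatability
x = rand(100,1)*50;             % assumption: 100 values between 0 and 50

catnames = ["small","medium","large"];
binnedData = discretize(x,[0 15 35 50],'categorical',catnames);

tf = isordinal(binnedData);     % discretize returns an ordinal categorical array
summary(binnedData)
```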
Starting in R2016b, you can create string arrays with the string function and convert them to
categorical arrays.
str = ["Earth","Jupiter","Neptune","Jupiter","Mars","Earth"]
planets = categorical(str)
Add missing elements to str and convert it to a categorical array. Where str has missing values,
planets has undefined values.
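As a compact illustration of this behavior, the following sketch uses a shorter, hypothetical string array. Assigning past the end of the array fills the gap with missing values, which become undefined elements after conversion.

```matlab
str = ["Earth","Jupiter","Mars"];
str(5) = "Mars";                % element 4 is filled in as <missing>
planets = categorical(str);
tf = isundefined(planets);      % true only where str was missing
```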
str(8) = "Mars"
Columns 7 through 8
<missing> "Mars"
planets = categorical(str)
Columns 7 through 8
<undefined> Mars
See Also
categorical | categories | summary | discretize
Related Examples
• “Convert Text in Table Variables to Categorical” on page 8-6
• “Access Data Using Categorical Arrays” on page 8-24
• “Compare Categorical Array Elements” on page 8-16
More About
• “Advantages of Using Categorical Arrays” on page 8-34
• “Ordinal Categorical Arrays” on page 8-36
load patients
whos
Store the patient data from Age, Gender, Height, Weight, SelfAssessedHealthStatus, and
Location in a table. Use the unique identifiers in the variable LastName as row names.
T = table(Age,Gender,Height,Weight,...
SelfAssessedHealthStatus,Location,...
'RowNames',LastName);
Convert Table Variables from Cell Arrays of Character Vectors to Categorical Arrays
The cell arrays of character vectors, Gender and Location, contain discrete sets of unique values.
T.Gender = categorical(T.Gender);
T.Location = categorical(T.Location);
The variable, SelfAssessedHealthStatus, contains four unique values: Excellent, Fair, Good,
and Poor.
Convert SelfAssessedHealthStatus to an ordinal categorical array, such that the categories have
the mathematical ordering Poor < Fair < Good < Excellent.
T.SelfAssessedHealthStatus = categorical(T.SelfAssessedHealthStatus,...
{'Poor','Fair','Good','Excellent'},'Ordinal',true);
Print a Summary
View the data type, description, units, and other descriptive statistics for each variable by using
summary to summarize the table.
Convert Text in Table Variables to Categorical
format compact
summary(T)
Variables:
Age: 100x1 double
Values:
Min 25
Median 39
Max 50
Gender: 100x1 categorical
Values:
Female 53
Male 47
Height: 100x1 double
Values:
Min 60
Median 67
Max 72
Weight: 100x1 double
Values:
Min 111
Median 142.5
Max 202
SelfAssessedHealthStatus: 100x1 ordinal categorical
Values:
Poor 11
Fair 15
Good 40
Excellent 34
Location: 100x1 categorical
Values:
County General Hospital 39
St. Mary's Medical Center 24
VA Hospital 37
The table variables Gender, SelfAssessedHealthStatus, and Location are categorical arrays.
The summary contains the counts of the number of elements in each category. For example, the
summary indicates that 53 of the 100 patients are female and 47 are male.
Create a subtable, T1, containing the age, height, and weight of all female patients who were
observed at County General Hospital. You can easily create a logical vector based on the values in the
categorical arrays Gender and Location.
rows is a 100-by-1 logical vector with logical true (1) for the table rows where the gender is female
and the location is County General Hospital.
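One way to build such a logical vector is sketched below. The expression assigned to rows is an assumed reconstruction based on the description, and the table is rebuilt here so that the sketch runs on its own.

```matlab
load patients                   % sample data set that ships with MATLAB
T = table(Age,Gender,Height,Weight,SelfAssessedHealthStatus,Location,...
    'RowNames',LastName);
T.Gender = categorical(T.Gender);
T.Location = categorical(T.Location);

% Assumed form of the indexing step: combine two category tests with &.
rows = T.Location=='County General Hospital' & T.Gender=='Female';
numSelected = nnz(rows);        % number of table rows that satisfy both tests
```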
vars = {'Age','Height','Weight'};
T1 = T(rows,vars)
T1=19×3 table
Age Height Weight
___ ______ ______
Brown 49 64 119
Taylor 31 66 132
Anderson 45 68 128
Lee 44 66 146
Walker 28 65 123
Young 25 63 114
Campbell 37 65 135
Evans 39 62 121
Morris 43 64 135
Rivera 29 63 130
Richardson 30 67 141
Cox 28 66 111
Torres 45 70 137
Peterson 32 60 136
Ramirez 48 64 137
Bennett 35 64 131
⋮
T1 is a 19-by-3 table.
Since ordinal categorical arrays have a mathematical ordering for their categories, you can perform
element-wise comparisons of them with relational operations, such as greater than and less than.
Create a subtable, T2, of the gender, age, height, and weight of all patients who assessed their health
status as poor or fair.
rows = T.SelfAssessedHealthStatus<='Fair';
vars = {'Gender','Age','Height','Weight'};
T2 = T(rows,vars)
T2=26×4 table
Gender Age Height Weight
______ ___ ______ ______
Johnson Male 43 69 163
Jones Female 40 67 133
Thomas Female 42 66 137
Jackson Male 25 71 174
Garcia Female 27 69 131
Rodriguez Female 39 64 117
Lewis Female 41 62 137
Lee Female 44 66 146
Hall Male 25 70 189
Hernandez Male 36 68 166
Lopez Female 40 66 137
Gonzalez Female 35 66 118
Mitchell Male 39 71 164
T2 is a 26-by-4 table.
See Also
Related Examples
• “Create Tables and Assign Data to Them” on page 9-2
• “Create Categorical Arrays” on page 8-2
• “Access Data in Tables” on page 9-32
• “Access Data Using Categorical Arrays” on page 8-24
More About
• “Advantages of Using Categorical Arrays” on page 8-34
• “Ordinal Categorical Arrays” on page 8-36
load patients
whos
The workspace variable, Location, is a cell array of character vectors that contains the three unique
medical facilities where patients were observed.
To access and compare data more easily, convert Location to a categorical array.
Location = categorical(Location);
summary(Location)
39 patients were observed at County General Hospital, 24 at St. Mary's Medical Center, and 37 at the
VA Hospital.
Convert SelfAssessedHealthStatus to an ordinal categorical array, such that the categories have
the mathematical ordering Poor < Fair < Good < Excellent.
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus,...
{'Poor' 'Fair' 'Good' 'Excellent'},'Ordinal',true);
summary(SelfAssessedHealthStatus)
Plot Categorical Data
Poor 11
Fair 15
Good 40
Excellent 34
Plot Histogram
figure
histogram(SelfAssessedHealthStatus)
title('Self Assessed Health Status From 100 Patients')
The function histogram accepts the categorical array, SelfAssessedHealthStatus, and plots the
category counts for each of the four categories.
Create a histogram of the hospital location for only the patients who assessed their health as Fair or
Poor.
figure
histogram(Location(SelfAssessedHealthStatus<='Fair'))
title('Location of Patients in Fair or Poor Health')
figure
pie(SelfAssessedHealthStatus);
title('Self Assessed Health Status From 100 Patients')
The function pie accepts the categorical array, SelfAssessedHealthStatus, and plots a pie chart
of the four categories.
Create a Pareto chart from the category counts for each of the four categories of
SelfAssessedHealthStatus.
figure
A = countcats(SelfAssessedHealthStatus);
C = categories(SelfAssessedHealthStatus);
pareto(A,C);
title('Self Assessed Health Status From 100 Patients')
The first input argument to pareto must be a vector. If a categorical array is a matrix or
multidimensional array, reshape it into a vector before calling countcats and pareto.
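The reshaping step might look like the following sketch. The 3-by-2 array M is hypothetical. Without the reshape, countcats counts each column of a matrix separately.

```matlab
M = categorical({'a','b';'b','c';'a','a'});   % hypothetical 3-by-2 example

counts = countcats(M(:));       % M(:) reshapes the matrix into a column vector
names = categories(M);

figure('Visible','off')         % hidden figure; omit the option to display it
pareto(counts,names)
```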
Gender = categorical(Gender);
summary(Gender)
Female 53
Male 47
Gender is a 100-by-1 categorical array with two categories, Female and Male.
Use the categorical array, Gender, to access Weight and Height data for each gender separately.
X1 = Weight(Gender=='Female');
Y1 = Height(Gender=='Female');
X2 = Weight(Gender=='Male');
Y2 = Height(Gender=='Male');
X1 and Y1 are 53-by-1 numeric arrays containing data from the female patients.
X2 and Y2 are 47-by-1 numeric arrays containing data from the male patients.
Create a scatter plot of height vs. weight. Indicate data from the female patients with a circle and
data from the male patients with a cross.
figure
h1 = scatter(X1,Y1,'o');
hold on
h2 = scatter(X2,Y2,'x');
See Also
categorical | summary | countcats | histogram | pie | bar | rose | scatter
Related Examples
• “Access Data Using Categorical Arrays” on page 8-24
colors = categorical(C)
categories(colors)
Use the relational operator, eq (==), to compare the first and second rows of colors.
colors(1,:) == colors(2,:)
1 0 1 1
Only the values in the second column differ between the rows.
Compare the entire categorical array, colors, to the character vector 'blue' to find the location of
all blue values.
colors == 'blue'
1 0 0 1
1 0 0 1
There are four blue entries in colors, one in each corner of the array.
Compare Categorical Array Elements
Add a mathematical ordering to the categories in colors. Specify the category order that represents
the ordering of color spectrum, red < green < blue.
categories(colors)
Determine if elements in the first column of colors are greater than the elements in the second
column.
1
1
Both values in the first column, blue, are greater than the corresponding values in the second
column, red and green.
Find all the elements in colors that are less than 'blue'.
0 1 1 0
0 1 1 0
The function lt (<) indicates the location of all green and red values with 1.
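The comparisons in this example can be reproduced with the sketch below. The contents of C are an assumption chosen to be consistent with the outputs shown above, and the conversion to an ordinal array is one possible way to add the ordering red < green < blue.

```matlab
% Assumed input: blue in each corner, red and green in the second column.
C = {'blue' 'red' 'green' 'blue'; 'blue' 'green' 'green' 'blue'};
colors = categorical(C);

% Add a mathematical ordering by converting to an ordinal categorical array.
colors = categorical(colors,{'red','green','blue'},'Ordinal',true);

firstGreater = colors(:,1) > colors(:,2);   % compare column 1 with column 2
belowBlue = colors < 'blue';                % locate all red and green values
```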
See Also
categorical | categories
Related Examples
• “Access Data Using Categorical Arrays” on page 8-24
More About
• “Relational Operations”
• “Advantages of Using Categorical Arrays” on page 8-34
• “Ordinal Categorical Arrays” on page 8-36
Combine Categorical Arrays
rng('default')
A = randi(3,[25,1]);
A = categorical(A,1:3,{'milk' 'water' 'juice'});
A is a 25-by-1 categorical array with three distinct categories: milk, water, and juice.
summary(A)
milk 6
water 5
juice 14
Six students in classroom A prefer milk, five prefer water, and fourteen prefer juice.
B = randi(3,[28,1]);
B = categorical(B,1:3,{'milk' 'water' 'juice'});
summary(B)
milk 9
water 8
juice 11
Nine students in classroom B prefer milk, eight prefer water, and eleven prefer juice.
Concatenate the data from classrooms A and B into a single categorical array, Group1.
Group1 = [A;B];
summary(Group1)
milk 15
water 13
juice 25
Group1 is a 53-by-1 categorical array with three categories: milk, water, and juice.
Create a categorical array, Group2, containing data from 50 students who were given the additional
beverage option of soda.
Group2 = randi(4,[50,1]);
Group2 = categorical(Group2,1:4,{'juice' 'milk' 'soda' 'water'});
summary(Group2)
juice 12
milk 14
soda 10
water 14
Group2 is a 50-by-1 categorical array with four categories: juice, milk, soda, and water.
students = [Group1;Group2];
summary(students)
milk 29
water 27
juice 37
soda 10
Concatenation appends the categories exclusive to the second input, soda, to the end of the list of
categories from the first input, milk, water, juice, soda.
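The category order that results from concatenation can be checked on a small, deterministic case. The arrays first and second are hypothetical and stand in for Group1 and Group2.

```matlab
first = categorical({'milk','water','juice'},{'milk','water','juice'});
second = categorical({'soda','juice'});     % categories: juice, soda

combined = [first second];
catOrder = categories(combined);   % categories of first, then soda appended
```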
Use reordercats to change the order of the categories in the categorical array, students.
students = reordercats(students,{'juice','milk','water','soda'});
categories(students)
Use the function union to find the unique responses from Group1 and Group2.
C = union(Group1,Group2)
C = 4x1 categorical
milk
water
juice
soda
union returns the combined values from Group1 and Group2 with no repetitions. In this case, C is
equivalent to the categories of the concatenation, students.
All of the categorical arrays in this example were nonordinal. To combine ordinal categorical arrays,
they must have the same sets of categories including their order.
See Also
categorical | categories | summary | union | cat | horzcat | vertcat
Related Examples
• “Create Categorical Arrays” on page 8-2
• “Combine Categorical Arrays Using Multiplication” on page 8-22
• “Convert Text in Table Variables to Categorical” on page 8-6
• “Access Data Using Categorical Arrays” on page 8-24
More About
• “Ordinal Categorical Arrays” on page 8-36
Combine two categorical arrays using times. The input arrays must have the same number of
elements, but can have different numbers of categories.
A = categorical({'blue','red','green'});
B = categorical({'+','-','+'});
C = A.*B
C = 1x3 categorical
blue + red - green +
Show the categories of C. The categories are all the ordered pairs that can be created from the
categories of A and B, also known as the Cartesian product.
categories(C)
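One quick way to confirm the Cartesian-product behavior is to count categories, as in this sketch. The names nA, nB, and nC are introduced here for illustration.

```matlab
A = categorical({'blue','red','green'});
B = categorical({'+','-','+'});
C = A.*B;

nA = numel(categories(A));      % three colors
nB = numel(categories(B));      % two signs
nC = numel(categories(C));      % every color paired with every sign
```

Note that C has more categories than elements, because pairs such as blue with - are categories even though no element uses them.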
D = B.*A
D = 1x3 categorical
+ blue - red + green
categories(D)
Combine Categorical Arrays Using Multiplication
Combine two categorical arrays. If either A or B has an undefined element, the corresponding
element of C is undefined.
A = categorical({'blue','red','green','black'});
B = categorical({'+','-','+','-'});
A = removecats(A,{'black'});
C = A.*B
C = 1x4 categorical
blue + red - green + <undefined>
Combine two ordinal categorical arrays. C is an ordinal categorical array only if A and B are both
ordinal. The ordering of the categories of C follows from the orderings of the input categorical arrays.
A = categorical({'blue','red','green'},{'green','red','blue'},'Ordinal',true);
B = categorical({'+','-','+'},'Ordinal',true);
C = A.*B;
categories(C)
See Also
categorical | categories | summary | times
Related Examples
• “Create Categorical Arrays” on page 8-2
• “Combine Categorical Arrays” on page 8-19
• “Access Data Using Categorical Arrays” on page 8-24
More About
• “Ordinal Categorical Arrays” on page 8-36
In this section...
“Select Data By Category” on page 8-24
“Common Ways to Access Data Using Categorical Arrays” on page 8-24
• Select elements from particular categories. For categorical arrays, use the relational operators
== or ~= to select data that is in, or not in, a particular category. To select data in a particular
group of categories, use the ismember function.
For ordinal categorical arrays, use inequalities >, >=, <, or <= to find data in categories above or
below a particular category.
• Delete data that is in a particular category. Use logical operators to include or exclude data
from particular categories.
• Find elements that are not in a defined category. Categorical arrays indicate which elements
do not belong to a defined category by <undefined>. Use the isundefined function to find
observations without a defined value.
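The three access patterns in the list above can be sketched on a small, hypothetical array before you apply them to the patient data.

```matlab
loc = categorical({'North','South','East','North','South'});   % hypothetical data

inNorth = loc == 'North';                      % in a particular category
notNorth = loc ~= 'North';                     % not in that category
northOrEast = ismember(loc,{'North','East'});  % in a group of categories

loc = removecats(loc,'South');   % elements of a removed category become undefined
noCategory = isundefined(loc);   % find elements without a defined category
```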
load patients
whos
Access Data Using Categorical Arrays
Gender and Location contain data that belong in categories. Each cell array contains character
vectors taken from a small set of unique values (indicating two genders and three locations
respectively). Convert Gender and Location to categorical arrays.
Gender = categorical(Gender);
Location = categorical(Location);
For categorical arrays, you can use the relational operators == and ~= to find the data that is in, or not
in, a particular category.
Determine if there are any patients observed at the location, 'Rampart General Hospital'.
ans = logical
0
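A check of this kind can be written by wrapping the comparison in any. The exact expression below is an assumption based on the description and output above.

```matlab
load patients
Location = categorical(Location);

% The name is not a category of Location, so every comparison is false.
tf = any(Location=='Rampart General Hospital');
```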
You can use ismember to find data in a particular group of categories. Create a logical vector for the
patients observed at County General Hospital or VA Hospital.
VA_CountyGenIndex = ...
ismember(Location,{'County General Hospital','VA Hospital'});
VA_CountyGenIndex is a 100-by-1 logical array containing logical true (1) for each element in the
categorical array Location that is a member of the category County General Hospital or VA
Hospital. The output, VA_CountyGenIndex, contains 76 nonzero elements.
Use the logical vector, VA_CountyGenIndex, to select the LastName of the patients observed at
either County General Hospital or VA Hospital.
VA_CountyGenPatients = LastName(VA_CountyGenIndex);
Use the summary function to print a summary containing the category names and the number of
elements in each category.
summary(Location)
Location is a 100-by-1 categorical array with three categories. County General Hospital
occurs in 39 elements, St. Mary's Medical Center in 24 elements, and VA Hospital in 37
elements.
Female 53
Male 47
Gender is a 100-by-1 categorical array with two categories. Female occurs in 53 elements and Male
occurs in 47 elements.
Use the relational operator == to access the age of only the female patients. Then plot a histogram of this
data.
figure()
histogram(Age(Gender=='Female'))
title('Age of Female Patients')
You can use logical operators to include or exclude data from particular categories. Delete all patients
observed at VA Hospital from the workspace variables, Age and Location.
Age = Age(Location~='VA Hospital');
Location = Location(Location~='VA Hospital');
Now, Age is a 63-by-1 numeric array, and Location is a 63-by-1 categorical array.
List the categories of Location, as well as the number of elements in each category.
summary(Location)
The patients observed at VA Hospital are deleted from Location, but VA Hospital is still a
category.
Use the removecats function to remove VA Hospital from the categories of Location.
categories(Location)
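The difference between deleting elements and removing a category can be seen in this self-contained sketch. The removecats call is an assumed form that matches the description above.

```matlab
load patients
Location = categorical(Location);

Location = Location(Location~='VA Hospital');    % the category remains, with no elements
numBefore = numel(categories(Location));

Location = removecats(Location,'VA Hospital');   % now remove the category itself
catNames = categories(Location);
```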
Delete Element
You can delete elements by indexing. For example, you can remove the first element of Location by
selecting the rest of the elements with Location(2:end). However, an easier way to delete
elements is to use [].
Location(1) = [];
summary(Location)
Location is a 62-by-1 categorical array that has two categories. Deleting the first element has no
effect on other elements from the same category and does not delete the category itself.
Location(1:8)
After removing the category, County General Hospital, elements that previously belonged to
that category no longer belong to any category defined for Location. Categorical arrays denote
these elements as undefined.
Use the function isundefined to find observations that do not belong to any category.
undefinedIndex = isundefined(Location);
undefinedIndex is a 62-by-1 logical array containing logical true (1) for all undefined
elements in Location.
Use the summary function to print the number of undefined elements in Location.
summary(Location)
The first element of Location belongs to the category, St. Mary's Medical Center. Set the first
element to be undefined so that it no longer belongs to any category.
Location(1) = '<undefined>';
summary(Location)
You can make selected elements undefined without removing a category or changing the categories
of other elements. Set elements to be undefined to indicate elements with values that are unknown.
You can use undefined elements to preallocate the size of a categorical array for better performance.
Create a categorical array that has elements with known locations only.
definedIndex = ~isundefined(Location);
newLocation = Location(definedIndex);
summary(newLocation)
Expand the size of newLocation so that it is a 200-by-1 categorical array. Set the last new element
to be undefined. All of the other new elements also are set to be undefined. The 23 original
elements keep the values they had.
newLocation(200) = '<undefined>';
summary(newLocation)
newLocation has room for values you plan to store in the array later.
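The same preallocation pattern on a small, hypothetical array shows how the new elements are filled and later overwritten.

```matlab
c = categorical({'a','b','a'});       % hypothetical data with three elements

c(10) = '<undefined>';                % grow to 10 elements; new ones are undefined
numUndefined = nnz(isundefined(c));   % elements 4 through 10

c(4) = 'b';                           % store a value in preallocated space later
```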
See Also
categorical | categories | summary | any | histogram | removecats | isundefined
Related Examples
• “Create Categorical Arrays” on page 8-2
• “Convert Text in Table Variables to Categorical” on page 8-6
• “Plot Categorical Data” on page 8-10
• “Compare Categorical Array Elements” on page 8-16
• “Work with Protected Categorical Arrays” on page 8-30
More About
• “Advantages of Using Categorical Arrays” on page 8-34
• “Ordinal Categorical Arrays” on page 8-36
When you create a categorical array with the categorical function, you have the option of
specifying whether or not the categories are protected. Ordinal categorical arrays always have
protected categories, but you also can create a nonordinal categorical array that is protected using
the 'Protected',true name-value pair argument.
When you assign values that are not in the list of categories of an unprotected array, the array updates automatically
so that its list of categories includes the new values. Similarly, you can combine (nonordinal)
categorical arrays that have different categories. The categories in the result include the categories
from both arrays.
When you assign new values to a protected categorical array, the values must belong to one of the
existing categories. Similarly, you can only combine protected arrays that have the same categories.
• If you want to combine two nonordinal categorical arrays that have protected categories, they
must have the same categories, but the order does not matter. The resulting categorical array
uses the category order from the first array.
• If you want to combine two ordinal categorical arrays (which always have protected categories), they
must have the same categories, including their order.
To add new categories to the array, you must use the function addcats.
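The following sketch shows the protected behavior on a nonordinal array. The array p and the flag assignedDirectly are hypothetical names used only for illustration.

```matlab
p = categorical({'S','M','S'},{'S','M','L'},'Protected',true);   % hypothetical data

try
    p(1) = 'XL';                % not an existing category, so this assignment errors
    assignedDirectly = true;
catch
    assignedDirectly = false;
end

p = addcats(p,'XL');            % nonordinal array: the new category goes at the end
p(1) = 'XL';                    % the assignment now succeeds
```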
Create a categorical array containing the sizes of 10 objects. Use the names small, medium, and
large for the values 'S', 'M', and 'L'.
A = categorical({'M';'L';'S';'S';'M';'L';'M';'L';'M';'S'},...
{'S','M','L'},{'small','medium','large'},'Ordinal',true)
A = 10x1 categorical
medium
large
small
small
medium
large
medium
large
medium
small
Work with Protected Categorical Arrays
When you create an ordinal categorical array, the categories are always protected.
Use the isprotected function to verify that the categories of A are protected.
tf = isprotected(A)
tf = logical
1
If you try to assign a new value that does not belong to one of the existing categories, then MATLAB®
returns an error. For example, you cannot assign the value 'xlarge' to the categorical array, as in
the expression A(2) = 'xlarge', because xlarge is not a category of A. Instead, MATLAB®
returns the error:
To add a new category for xlarge, use the addcats function. Since A is ordinal, you must specify the
order for the new category.
A = addcats(A,'xlarge','After','large');
A(2) = 'xlarge'
A = 10x1 categorical
medium
xlarge
small
small
medium
large
medium
large
medium
small
A is now a 10-by-1 categorical array with four categories, such that small < medium < large <
xlarge.
Create another ordinal categorical array, B, containing the sizes of five items.
B = categorical([2;1;1;2;2],1:2,{'xsmall','small'},'Ordinal',true)
B = 5x1 categorical
small
xsmall
xsmall
small
small
B is a 5-by-1 categorical array with two categories such that xsmall < small.
To combine two ordinal categorical arrays (which always have protected categories), they must have
the same categories and the categories must be in the same order.
A = addcats(A,'xsmall','Before','small');
categories(A)
B = addcats(B,{'medium','large','xlarge'},'After','small');
categories(B)
The categories of A and B are now the same including their order.
C = [A;B]
C = 15x1 categorical
medium
xlarge
small
small
medium
large
medium
large
medium
small
small
xsmall
xsmall
small
small
categories(C)
C is a 15-by-1 ordinal categorical array with five categories, such that xsmall < small < medium
< large < xlarge.
See Also
categorical | categories | summary | isprotected | isordinal | addcats
Related Examples
• “Create Categorical Arrays” on page 8-2
• “Convert Text in Table Variables to Categorical” on page 8-6
• “Access Data Using Categorical Arrays” on page 8-24
• “Combine Categorical Arrays” on page 8-19
• “Combine Categorical Arrays Using Multiplication” on page 8-22
More About
• “Ordinal Categorical Arrays” on page 8-36
In this section...
“Natural Representation of Categorical Data” on page 8-34
“Mathematical Ordering for Character Vectors” on page 8-34
“Reduce Memory Requirements” on page 8-34
An ordering other than alphabetical order is not possible with character arrays or cell arrays of
character vectors. Thus, inequality comparisons, such as greater than and less than, are not possible. With
categorical arrays, you can use relational operations to test for equality and perform element-wise
comparisons that have a meaningful mathematical ordering.
state = [repmat({'MA'},25,1);repmat({'NY'},25,1);...
repmat({'CA'},50,1);...
repmat({'MA'},25,1);repmat({'NY'},25,1)];
whos state
Advantages of Using Categorical Arrays
The variable state is a cell array of character vectors requiring 17,400 bytes of memory.
state = categorical(state);
categories(state)
whos state
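To compare the two representations programmatically, you can capture the output of whos, as in this sketch. The names stateCell, stateCat, infoCell, and infoCat are introduced here, and the exact byte counts can vary by release.

```matlab
stateCell = [repmat({'MA'},25,1); repmat({'NY'},25,1); ...
    repmat({'CA'},50,1); ...
    repmat({'MA'},25,1); repmat({'NY'},25,1)];
stateCat = categorical(stateCell);

infoCell = whos('stateCell');   % structure with a bytes field
infoCat = whos('stateCat');
fprintf('cell: %d bytes, categorical: %d bytes\n',infoCell.bytes,infoCat.bytes)
```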
See Also
categorical | categories
Related Examples
• “Create Categorical Arrays” on page 8-2
• “Convert Text in Table Variables to Categorical” on page 8-6
• “Compare Categorical Array Elements” on page 8-16
• “Access Data Using Categorical Arrays” on page 8-24
More About
• “Ordinal Categorical Arrays” on page 8-36
Order of Categories
categorical is a data type to store data with values from a finite set of discrete categories, which
can have a natural order. You can specify and rearrange the order of categories in all categorical
arrays. However, you can treat only ordinal categorical arrays as having a mathematical ordering to
their categories. Use an ordinal categorical array if you want to use the functions min, max, or
relational operations, such as greater than and less than.
The discrete set of pet categories {'dog' 'cat' 'bird'} has no meaningful mathematical
ordering. You are free to use any category order and the meaning of the associated data does not
change. For example, pets = categorical({'bird','cat','dog','dog','cat'}) creates a
categorical array and the categories are listed in alphabetical order, {'bird' 'cat' 'dog'}. You
can choose to specify or change the order of the categories to {'dog' 'cat' 'bird'} and the
meaning of the data does not change.
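A short sketch of this point: reordering the categories of a nonordinal array changes the category list but not the values of the elements. The variables before and after are introduced here for the comparison.

```matlab
pets = categorical({'bird','cat','dog','dog','cat'});
before = cellstr(pets);          % element values as text

pets = reordercats(pets,{'dog','cat','bird'});
after = cellstr(pets);           % same element values, new category order
newOrder = categories(pets);
```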
Ordinal categorical arrays contain categories that have a meaningful mathematical ordering. For
example, the discrete set of size categories {'small', 'medium', 'large'} has the
mathematical ordering small < medium < large. The first category listed is the smallest and the
last category is the largest. The order of the categories in an ordinal categorical array affects the
result from relational comparisons of ordinal categorical arrays.
Create an ordinal categorical array, sizes, from a cell array of character vectors, A. Use valueset,
specified as a vector of unique values, to define the categories for sizes.
sizes = categorical(A,valueset,'Ordinal',true)
sizes is a 3-by-2 ordinal categorical array with three categories such that small < medium <
large. The order of the values in valueset becomes the order of the categories of sizes.
Ordinal Categorical Arrays
Create an equivalent categorical array from an array of integers. Use the values 1, 2, and 3 to define
the categories small, medium, and large, respectively.
A2 = [2 3; 1 2; 3 1];
valueset = 1:3;
catnames = {'small','medium','large'};
sizes2 = categorical(A2,valueset,catnames,'Ordinal',true)
isequal(sizes,sizes2)
ans = logical
1
sizes and sizes2 are equivalent categorical arrays with the same ordering of categories.
Create a nonordinal categorical array from the cell array of character vectors, A.
sizes3 = categorical(A)
isordinal(sizes3)
ans = logical
0
Convert sizes3 to an ordinal categorical array, such that small < medium < large.
sizes3 = categorical(sizes3,{'small','medium','large'},'Ordinal',true);
sizes3 is now a 3-by-2 ordinal categorical array equivalent to sizes and sizes2.
See Also
categorical | categories | isordinal | isequal
Related Examples
• “Create Categorical Arrays” on page 8-2
• “Convert Text in Table Variables to Categorical” on page 8-6
• “Compare Categorical Array Elements” on page 8-16
• “Access Data Using Categorical Arrays” on page 8-24
More About
• “Advantages of Using Categorical Arrays” on page 8-34
Core Functions Supporting Categorical Arrays
The following table lists notable MATLAB functions that operate on categorical arrays in addition to
other arrays.
9
Tables
In MATLAB®, you can create tables and assign data to them in several ways.
The way you choose depends on the nature of your data and how you plan to use tables in your code.
You can create a table from arrays by using the table function. For example, create a small table
with data for five patients.
First, create six column-oriented arrays of data. These arrays have five rows because there are five
patients. (Most of these arrays are 5-by-1 column vectors, while BloodPressure is a 5-by-2 matrix.)
LastName = ["Sanchez";"Johnson";"Zhang";"Diaz";"Brown"];
Age = [38;43;38;40;49];
Smoker = [true;false;true;false;true];
Height = [71;69;64;67;64];
Weight = [176;163;131;133;119];
BloodPressure = [124 93; 109 77; 125 83; 117 75; 122 80];
Now create a table, patients, as a container for the data. In this call to the table function, the
input arguments use the workspace variable names for the names of the variables in patients.
patients = table(LastName,Age,Smoker,Height,Weight,BloodPressure)
patients=5×6 table
LastName Age Smoker Height Weight BloodPressure
_________ ___ ______ ______ ______ _____________
Create Tables and Assign Data to Them
The table is a 5-by-6 table because it has six variables. As the BloodPressure variable shows, a
table variable itself can have multiple columns. This example shows why tables have rows and
variables, not rows and columns.
Once you have created a table, you can add a new variable at any time by using dot notation. Dot
notation refers to table variables by name, T.varname, where T is the table and varname is the
variable name. This notation is similar to the notation you use to access and assign data to the fields
of a structure.
For example, add a BMI variable to patients. Calculate body mass index, or BMI, using the values in
patients.Weight and patients.Height. Assign the BMI values to a new table variable.
patients.BMI = (patients.Weight*0.453592)./(patients.Height*0.0254).^2
patients=5×7 table
LastName Age Smoker Height Weight BloodPressure BMI
_________ ___ ______ ______ ______ _____________ ______
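The BMI calculation converts pounds to kilograms and inches to meters, then divides the mass by the square of the height. The sketch below splits the one-line formula into steps on a reduced table. The intermediate names kg and m are introduced here for readability.

```matlab
Height = [71;69;64;67;64];          % inches
Weight = [176;163;131;133;119];     % pounds
patients = table(Height,Weight);

kg = patients.Weight*0.453592;      % pounds to kilograms
m = patients.Height*0.0254;         % inches to meters
patients.BMI = kg./m.^2;            % dot notation adds BMI as a third variable
```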
Another way to create a table is to start with an empty table and assign variables to it. For example,
re-create the table of patient data, but this time assign variables using dot notation.
patients2 = table
patients2 =
Next, create a copy of the patient data by assigning variables. Table variable names do not have to
match array names, as shown by the Name and BP table variables.
patients2.Name = LastName;
patients2.Age = Age;
patients2.Smoker = Smoker;
patients2.Height = Height;
patients2.Weight = Weight;
patients2.BP = BloodPressure
patients2=5×6 table
Name Age Smoker Height Weight BP
_________ ___ ______ ______ ______ __________
9 Tables
Sometimes you know the sizes and data types of the data that you want to store in a table, but you
plan to assign the data later. Perhaps you plan to add only a few rows at a time. In that case,
preallocating space in the table and then assigning values to empty rows can be more efficient.
For example, to preallocate space for a table to contain time and temperature readings at different
stations, use the table function. Instead of supplying input arrays, specify the sizes and data types of
the table variables. To give them names, specify the 'VariableNames' argument. Preallocation fills
table variables with default values that are appropriate for their data types.
sz = [4 3];
varTypes = ["double","datetime","string"];
varNames = ["Temperature","Time","Station"];
temps = table('Size',sz,'VariableTypes',varTypes,'VariableNames',varNames)
temps=4×3 table
Temperature Time Station
___________ ____ _________
0 NaT <missing>
0 NaT <missing>
0 NaT <missing>
0 NaT <missing>
One way to assign or add a row to a table is to assign a cell array to a row. If the cell array is a row
vector and its elements match the data types of their respective variables, then the assignment
converts the cell array to a table row. However, you can assign only one row at a time using cell
arrays. Assign values to the first two rows.
temps(1,:) = {75,datetime('now'),"S1"};
temps(2,:) = {68,datetime('now')+1,"S2"}
temps=4×3 table
Temperature Time Station
___________ ____________________ _________
As an alternative, you can assign rows from a smaller table into a larger table. With this method, you
can assign one or more rows at a time.
temps(3:4,:) = table([63;72],[datetime('now')+2;datetime('now')+3],["S3";"S4"])
temps=4×3 table
Temperature Time Station
___________ ____________________ _______
You can use either syntax to increase the size of a table by assigning rows beyond the end of the
table. If necessary, missing rows are filled in with default values.
temps(6,:) = {62,datetime('now')+6,"S6"}
temps=6×3 table
Temperature Time Station
___________ ____________________ _________
You can convert variables that have other data types to tables. Cell arrays and structures are other
types of containers that can store arrays that have different data types. So you can convert cell arrays
and structures to tables. You can also convert an array to a table where the table variables contain
columns of values from the array. To convert these kinds of variables, use the array2table,
cell2table, or struct2table functions.
For example, convert an array to a table by using array2table. Arrays do not have column names,
so the table has default variable names.
A = randi(3,3)
A = 3×3
3 3 1
3 2 2
1 1 3
a2t = array2table(A)
a2t=3×3 table
A1 A2 A3
__ __ __
3 3 1
3 2 2
1 1 3
You can provide your own table variable names by using the "VariableNames" name-value
argument.
a2t = array2table(A,"VariableNames",["First","Second","Third"])
a2t=3×3 table
First Second Third
3 3 1
3 2 2
1 1 3
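The cell2table and struct2table functions mentioned above work along the same lines. The following is a minimal sketch, assuming small made-up inputs (the variables C, S, c2t, and s2t are hypothetical and not part of the example above).

```matlab
% Hypothetical sample data, not taken from the example above
C = {'Sanchez',38,true; 'Johnson',43,false};

% Each column of the cell array becomes one table variable
c2t = cell2table(C,"VariableNames",["LastName","Age","Smoker"]);

% Each field of a scalar structure becomes one table variable,
% provided that all fields have the same number of rows
S.LastName = ["Sanchez";"Johnson"];
S.Age = [38;43];
s2t = struct2table(S);
```

In this sketch, the text column of the cell array becomes a cell array of character vectors in c2t, while the string field of S stays a string array in s2t.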
It is common to have a large quantity of tabular data in a file such as a CSV (comma-separated value)
file or an Excel® spreadsheet. To read such data into a table, use the readtable function.
For example, the CSV file outages.csv is a sample file that is distributed with MATLAB. The file
contains data for a set of electrical power outages. The first line of outages.csv has column names.
The rest of the file has comma-separated data values for each outage. The first few lines are shown
here.
Region,OutageTime,Loss,Customers,RestorationTime,Cause
SouthWest,2002-02-01 12:18,458.9772218,1820159.482,2002-02-07 16:50,winter storm
SouthEast,2003-01-23 00:49,530.1399497,212035.3001,,winter storm
SouthEast,2003-02-07 21:15,289.4035493,142938.6282,2003-02-17 08:14,winter storm
West,2004-04-06 05:44,434.8053524,340371.0338,2004-04-06 06:10,equipment fault
MidWest,2002-03-16 06:18,186.4367788,212754.055,2002-03-18 23:23,severe storm
...
To read outages.csv and store the data in a table, you can use readtable. It reads numeric
values, dates and times, and strings into table variables that have appropriate data types. Here, Loss
and Customers are numeric arrays. The OutageTime and RestorationTime variables are
datetime arrays because readtable recognizes the date and time formats of the text in those
columns of the input file. To read the rest of the text data into string arrays, specify the "TextType"
name-value argument.
outages = readtable("outages.csv","TextType","string")
outages=1468×6 table
Region OutageTime Loss Customers RestorationTime Cause
___________ ________________ ______ __________ ________________ ______________
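To see the effect of the "TextType" argument without relying on the sample file, you can round-trip a small table through a temporary CSV file. This is only a sketch: the file name comes from tempname, and the two-row table is made up for illustration.

```matlab
% Hypothetical two-row file written to a temporary location
fname = [tempname '.csv'];
T = table(["SouthWest";"West"],[458.98;434.81], ...
    'VariableNames',{'Region','Loss'});
writetable(T,fname);

% Read the file back; text columns become string arrays
R = readtable(fname,"TextType","string");
class(R.Region)   % string, because of the TextType option
class(R.Loss)     % double
delete(fname);    % clean up the temporary file
```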
Finally, you can interactively preview and import data from spreadsheets or delimited text files by
using the Import Tool. There are two ways to open the Import Tool.
• MATLAB Toolstrip: On the Home tab, in the Variable section, click Import Data.
• MATLAB command prompt: Enter uiimport(filename), where filename is the name of a text
or spreadsheet file.
For example, open the outages.csv sample file by using uiimport and which to get the path to
the file.
uiimport(which("outages.csv"))
The Import Tool shows you a preview of the six columns from outages.csv, which you can then import as a table.
See Also
readtable | table | array2table | cell2table | struct2table | Import Tool
Related Examples
• “Access Data in Tables” on page 9-32
• “Add and Delete Table Rows” on page 9-9
Add and Delete Table Rows
load patients
T = table(LastName,Gender,Age,Height,Weight,Smoker,Systolic,Diastolic);
size(T)
ans = 1×2
100 8
Read data on more patients from a comma-delimited file, morePatients.csv, into a table, T2. Then,
append the rows from T2 to the end of the table, T.
T2 = readtable('morePatients.csv');
Tnew = [T;T2];
size(Tnew)
ans = 1×2
104 8
The table Tnew has 104 rows. In order to vertically concatenate two tables, both tables must have the
same number of variables, with the same variable names. If the variable names are different, you can
directly assign new rows in a table to rows from another table. For example, T(end+1:end+4,:) =
T2.
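The difference between concatenation and row assignment can be sketched with two small tables whose variable names differ. The tables and their values here are hypothetical and stand in for T and T2.

```matlab
% Hypothetical tables with matching variable types but different names
T  = table([38;43],[71;69],'VariableNames',{'Age','Height'});
T2 = table([40;49],[67;64],'VariableNames',{'Years','Inches'});

% [T;T2] would produce an error here because the variable names differ.
% Assigning by position works, and the result keeps the names from T.
T(end+1:end+height(T2),:) = T2;
size(T)
```

After the assignment, T has four rows and still uses the variable names Age and Height.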
To append new rows stored in a cell array, vertically concatenate the cell array onto the end of the
table. You can concatenate directly from a cell array when it has the right number of columns and the
contents of its cells can be concatenated onto the corresponding table variables.
cellPatients = {'Edwards','Male',42,70,158,0,116,83;
'Falk','Female',28,62,125,1,120,71};
Tnew = [Tnew;cellPatients];
size(Tnew)
ans = 1×2
106 8
You also can convert a cell array to a table using the cell2table function.
You also can append new rows stored in a structure. Convert the structure to a table, and then
concatenate the tables.
structPatients(1,1).LastName = 'George';
structPatients(1,1).Gender = 'Male';
structPatients(1,1).Age = 45;
structPatients(1,1).Height = 76;
structPatients(1,1).Weight = 182;
structPatients(1,1).Smoker = 1;
structPatients(1,1).Systolic = 132;
structPatients(1,1).Diastolic = 85;
structPatients(2,1).LastName = 'Hadley';
structPatients(2,1).Gender = 'Female';
structPatients(2,1).Age = 29;
structPatients(2,1).Height = 58;
structPatients(2,1).Weight = 120;
structPatients(2,1).Smoker = 0;
structPatients(2,1).Systolic = 112;
structPatients(2,1).Diastolic = 70;
Tnew = [Tnew;struct2table(structPatients)];
size(Tnew)
ans = 1×2
108 8
To omit any rows in a table that are duplicated, use the unique function.
Tnew = unique(Tnew);
size(Tnew)
ans = 1×2
106 8
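To delete rows by row number, assign the empty matrix, [], to those rows. Delete rows 18, 20, and 21 from the table.
Tnew([18,20,21],:) = [];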
size(Tnew)
ans = 1×2
103 8
First, specify the variable of identifiers, LastName, as row names. Then, delete the variable,
LastName, from Tnew. Finally, use the row name to index and delete rows.
Tnew.Properties.RowNames = Tnew.LastName;
Tnew.LastName = [];
Tnew('Smith',:) = [];
size(Tnew)
ans = 1×2
102 7
The table now has one fewer row and one fewer variable.
You also can search for observations in the table. For example, delete rows for any patients under the
age of 30.
ans = 1×2
85 7
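The code for this last step does not appear above. One way to do it, sketched here on a small hypothetical table (Tsmall stands in for Tnew), is to build a logical index from the Age variable and assign the empty matrix to the matching rows.

```matlab
% Hypothetical five-patient table standing in for Tnew
Tsmall = table([38;25;43;29;49],[71;69;64;67;64], ...
    'VariableNames',{'Age','Height'}, ...
    'RowNames',{'Sanchez','Johnson','Zhang','Diaz','Brown'});

toDelete = Tsmall.Age < 30;   % true for patients under the age of 30
Tsmall(toDelete,:) = [];      % delete the matching rows
size(Tsmall)
```

Two of the five hypothetical patients are under 30, so three rows remain.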
See Also
table | readtable | array2table | cell2table | struct2table
Related Examples
• “Add, Delete, and Rearrange Table Variables” on page 9-12
• “Clean Messy and Missing Data in Tables” on page 9-19
Add, Delete, and Rearrange Table Variables
You also can modify table variables using the Variables Editor.
Load arrays of sample data from the patients MAT-file. Display the names and sizes of the variables
loaded into the workspace.
load patients
whos -file patients
Create two tables. Create one table, T, with information collected from a patient questionnaire and
create another table, T2, with data measured from patients. Each table has 100 rows.
T = table(Age,Gender,Smoker);
T2 = table(Height,Weight,Systolic,Diastolic);
head(T,5)
ans=5×3 table
Age Gender Smoker
___ __________ ______
38 {'Male' } true
43 {'Male' } false
38 {'Female'} false
40 {'Female'} false
49 {'Female'} false
head(T2,5)
ans=5×4 table
Height Weight Systolic Diastolic
71 176 124 93
69 163 109 77
64 131 125 83
67 133 117 75
64 119 122 80
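Combine the two tables by horizontally concatenating them, and display the first five rows of the result.
T = [T T2];
head(T,5)
ans=5×7 table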
Age Gender Smoker Height Weight Systolic Diastolic
___ __________ ______ ______ ______ ________ _________
If the tables that you are horizontally concatenating have row names, horzcat concatenates the
tables by matching the row names. Therefore, the tables must use the same row names, but the row
order does not matter.
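As an illustration of matching by row names, the following sketch concatenates two small hypothetical tables (A, B, and C are made-up names) whose row names agree but appear in a different order.

```matlab
% Hypothetical tables with the same row names in different orders
A = table([1;2;3],'VariableNames',{'X'},'RowNames',{'a','b','c'});
B = table([30;10;20],'VariableNames',{'Y'},'RowNames',{'c','a','b'});

% Rows of B are matched to rows of A by name, not by position
C = [A B]
```

In the result, the values of Y are paired with the rows of A that have the same names, and the rows follow the order in A.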
Add the names of patients from the workspace variable LastName before the first table variable in T.
You can specify any location in the table using the name of a variable near the new location. Use
quotation marks to refer to the names of table variables. However, do not use quotation marks for
input arguments that are workspace variables.
T = addvars(T,LastName,'Before','Age');
head(T,5)
ans=5×8 table
LastName Age Gender Smoker Height Weight Systolic Diastolic
____________ ___ __________ ______ ______ ______ ________ _________
You also can specify locations in a table using numbers. For example, the equivalent syntax using a
number to specify location is T = addvars(T,LastName,'Before',1).
An alternative way to add new table variables is to use dot syntax. When you use dot syntax, you
always add the new variable as the last table variable. You can add a variable that has any data type,
as long as it has the same number of rows as the table.
Create a new variable for blood pressure as a horizontal concatenation of the two variables
Systolic and Diastolic. Add it to T.
T.BloodPressure = [T.Systolic T.Diastolic];
head(T,5)
ans=5×9 table
LastName Age Gender Smoker Height Weight Systolic Diastolic B
____________ ___ __________ ______ ______ ______ ________ _________ _
T now has 9 variables and 100 rows. A table variable can have multiple columns. So although
BloodPressure has two columns, it is one table variable.
Add a new variable, BMI, in the table T, that contains the body mass index for each patient. BMI is a
function of height and weight. When you calculate BMI, you can refer to the Weight and Height
variables that are in T.
T.BMI = (T.Weight*0.453592)./(T.Height*0.0254).^2;
The operators ./ and .^ in the calculation of BMI indicate element-wise division and exponentiation,
respectively.
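As a check on the formula, the calculation can be worked through for a single patient and then repeated for a column of values. This sketch uses the height and weight values that appear in the first two rows of the sample data; the workspace variable names are chosen for illustration.

```matlab
% Height (inches) and weight (pounds) for one patient
weightLb = 176;
heightIn = 71;
weightKg = weightLb*0.453592;   % pounds to kilograms
heightM  = heightIn*0.0254;     % inches to meters
bmi = weightKg/heightM^2        % approximately 24.55

% With column vectors, ./ and .^ apply the same formula to each row
w = [176;163];
h = [71;69];
bmiAll = (w*0.453592)./(h*0.0254).^2
```

For scalars, / and ^ give the same result as ./ and .^. The element-wise operators matter only when the operands are arrays.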
head(T,5)
ans=5×10 table
LastName Age Gender Smoker Height Weight Systolic Diastolic B
____________ ___ __________ ______ ______ ______ ________ _________ _
Move the table variable BMI using the movevars function, so that it is after the variable Weight.
When you specify table variables by name, use quotation marks.
T = movevars(T,'BMI','After','Weight');
head(T,5)
ans=5×10 table
LastName Age Gender Smoker Height Weight BMI Systolic Dias
____________ ___ __________ ______ ______ ______ ______ ________ ____
You also can specify locations in a table using numbers. For example, the equivalent syntax using a
number to specify location is T = movevars(T,'BMI','After',6). It is often more convenient to
refer to variables by name.
As an alternative, you can move table variables by indexing. You can index into a table using the same
syntax you use for indexing into a matrix.
T = T(:,[1:7 10 8 9]);
head(T,5)
ans=5×10 table
LastName Age Gender Smoker Height Weight BMI BloodPressure
____________ ___ __________ ______ ______ ______ ______ _____________
In a table with many variables, it is often more convenient to use the movevars function.
Delete Variables
To delete table variables, use the removevars function. Delete the Systolic and Diastolic table
variables.
T = removevars(T,{'Systolic','Diastolic'});
head(T,5)
ans=5×8 table
LastName Age Gender Smoker Height Weight BMI BloodPressure
____________ ___ __________ ______ ______ ______ ______ _____________
As an alternative, you can delete variables using dot syntax and the empty matrix, []. Remove the
Age variable from the table.
T.Age = [];
head(T,5)
ans=5×7 table
LastName Gender Smoker Height Weight BMI BloodPressure
____________ __________ ______ ______ ______ ______ _____________
You also can delete variables using indexing and the empty matrix, []. Remove the Gender variable
from the table.
T(:,'Gender') = [];
head(T,5)
ans=5×6 table
LastName Smoker Height Weight BMI BloodPressure
____________ ______ ______ ______ ______ _____________
To split multicolumn table variables into variables that each have one column, use the splitvars
function. Split the variable BloodPressure into two variables.
T = splitvars(T,'BloodPressure','NewVariableNames',{'Systolic','Diastolic'});
head(T,5)
ans=5×7 table
LastName Smoker Height Weight BMI Systolic Diastolic
____________ ______ ______ ______ ______ ________ _________
Similarly, you can group related table variables together in one variable, using the mergevars
function. Combine Systolic and Diastolic back into one variable, and name it BP.
T = mergevars(T,{'Systolic','Diastolic'},'NewVariableName','BP');
head(T,5)
ans=5×6 table
LastName Smoker Height Weight BMI BP
____________ ______ ______ ______ ______ __________
You can reorient the rows of a table or timetable, so that they become the variables in the output
table, using the rows2vars function. However, if the table has multicolumn variables, then you must
split them before you can call rows2vars.
Reorient the rows of T. Specify that the names of the patients in T are the names of table variables in
the output table. The first variable of T3 contains the names of the variables of T. Each remaining
variable of T3 contains the data from the corresponding row of T.
T = splitvars(T,'BP','NewVariableNames',{'Systolic','Diastolic'});
T3 = rows2vars(T,'VariableNamesSource','LastName');
T3(:,1:5)
ans=6×5 table
OriginalVariableNames Smith Johnson Williams Jones
_____________________ ______ _______ ________ ______
{'Smoker' } 1 0 0 0
{'Height' } 71 69 64 67
{'Weight' } 176 163 131 133
{'BMI' } 24.547 24.071 22.486 20.831
{'Systolic' } 124 109 125 117
{'Diastolic'} 93 77 83 75
You can use dot syntax with T3 to access patient data as an array. However, if the row values of an
input table cannot be concatenated, then the variables of the output table are cell arrays.
T3.Smith
ans = 6×1
1.0000
71.0000
176.0000
24.5467
124.0000
93.0000
See Also
table | addvars | movevars | removevars | splitvars | mergevars | inner2outer |
rows2vars
Related Examples
• “Add and Delete Table Rows” on page 9-9
• “Clean Messy and Missing Data in Tables” on page 9-19
• “Modify Units, Descriptions, and Table Variable Names” on page 9-24
Clean Messy and Missing Data in Tables
Load sample data from a comma-separated text file, messy.csv. The file contains several different
missing data indicators, including the period (.), NA, -99, and empty entries.
To specify the character vectors to treat as empty values, use the 'TreatAsEmpty' name-value pair
argument with the readtable function. (Use the disp function to display all 21 rows, even when
running this example as a live script.)
T = readtable('messy.csv','TreatAsEmpty',{'.','NA'});
disp(T)
A B C D E
________ ____ __________ ____ ____
{'afe1'} 3 {'yes' } 3 3
{'egh3'} NaN {'no' } 7 7
{'wth4'} 3 {'yes' } 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes' } 5 5
{'jre3'} 34.6 {'yes' } 34.6 34.6
{'wen9'} 234 {'yes' } 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes' } 5 5
{'wnk3'} 245 {'yes' } 245 245
{'abk6'} 563 {0x0 char} 563 563
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes' } 23 23
{'wba3'} NaN {'yes' } NaN 14
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'poj2'} -99 {'yes' } -99 -99
{'bas8'} 23 {'no' } 23 23
{'gry5'} NaN {'yes' } NaN 21
T is a table with 21 rows and five variables. 'TreatAsEmpty' only applies to numeric columns in the
file and cannot handle numeric values specified as text, such as '-99'.
Summarize Table
View the data type, description, units, and other descriptive statistics for each variable by creating a
table summary using the summary function.
summary(T)
Variables:
B: 21x1 double
Values:
Min -99
Median 14
Max 563
NumMissing 3
D: 21x1 double
Values:
Min -99
Median 7
Max 563
NumMissing 2
E: 21x1 double
Values:
Min -99
Median 14
Max 563
When you import data from a file, the default is for readtable to read any variables with
nonnumeric elements as a cell array of character vectors.
Display the subset of rows from the table, T, that have at least one missing value.
A B C D E
________ ___ __________ ___ ___
readtable replaced '.' and 'NA' with NaN in the numeric variables, B, D, and E.
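One way to find the rows that have missing values is the ismissing function, which is listed under See Also. The following is a minimal sketch on a small hypothetical table (Tm is a made-up name, with a few values in the style of messy.csv), not the exact code for this example.

```matlab
% Hypothetical table with one missing value in each of two rows
Tm = table({'afe1';'egh3';'abk6'},[3;NaN;563],{'yes';'no';''}, ...
    'VariableNames',{'A','B','C'});

TF = ismissing(Tm);                 % logical array, same size as Tm
rowsWithMissing = Tm(any(TF,2),:)   % rows with at least one missing value
```

By default, ismissing treats NaN in numeric variables and empty character vectors in cell arrays of text as missing. A code such as -99 is not recognized unless you specify it as an additional indicator.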
Clean the data so that the missing values indicated by code -99 have the standard MATLAB®
numeric missing value indicator, NaN.
T = standardizeMissing(T,-99);
disp(T)
A B C D E
________ ____ __________ ____ ____
{'afe1'} 3 {'yes' } 3 3
{'egh3'} NaN {'no' } 7 7
{'wth4'} 3 {'yes' } 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes' } 5 5
{'jre3'} 34.6 {'yes' } 34.6 34.6
{'wen9'} 234 {'yes' } 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes' } 5 5
{'wnk3'} 245 {'yes' } 245 245
{'abk6'} 563 {0x0 char} 563 563
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes' } 23 23
{'wba3'} NaN {'yes' } NaN 14
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'poj2'} NaN {'yes' } NaN NaN
{'bas8'} 23 {'no' } 23 23
{'gry5'} NaN {'yes' } NaN 21
Create a new table, T2, and replace missing values with values from previous rows of the table.
fillmissing provides a number of ways to fill in missing values.
T2 = fillmissing(T,'previous');
disp(T2)
A B C D E
________ ____ _______ ____ ____
{'afe1'} 3 {'yes'} 3 3
{'egh3'} 3 {'no' } 7 7
{'wth4'} 3 {'yes'} 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes'} 5 5
{'jre3'} 34.6 {'yes'} 34.6 34.6
{'wen9'} 234 {'yes'} 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes'} 5 5
{'wnk3'} 245 {'yes'} 245 245
{'abk6'} 563 {'yes'} 563 563
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes'} 23 23
{'wba3'} 23 {'yes'} 23 14
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'poj2'} 22 {'yes'} 22 22
{'bas8'} 23 {'no' } 23 23
{'gry5'} 23 {'yes'} 23 21
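To compare a few of the fill methods, it can help to apply them to a short numeric vector. This sketch uses a made-up vector, x, and three of the methods that fillmissing supports.

```matlab
% Hypothetical vector with two missing values
x = [1 NaN 3 NaN 7];

p = fillmissing(x,'previous')     % 1 1 3 3 7
l = fillmissing(x,'linear')       % 1 2 3 5 7
c = fillmissing(x,'constant',0)   % 1 0 3 0 7
```

The 'previous' method copies the last nonmissing value forward, 'linear' interpolates between neighboring nonmissing values, and 'constant' substitutes the value you specify.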
Create a new table, T3, that contains only the rows from T without missing values. T3 has only 16
rows.
T3 = rmmissing(T);
disp(T3)
A B C D E
________ ____ _______ ____ ____
{'afe1'} 3 {'yes'} 3 3
{'wth4'} 3 {'yes'} 3 3
{'atn2'} 23 {'no' } 23 23
{'arg1'} 5 {'yes'} 5 5
{'jre3'} 34.6 {'yes'} 34.6 34.6
{'wen9'} 234 {'yes'} 234 234
{'ple2'} 2 {'no' } 2 2
{'dbo8'} 5 {'no' } 5 5
{'oii4'} 5 {'yes'} 5 5
{'wnk3'} 245 {'yes'} 245 245
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
{'oks9'} 23 {'yes'} 23 23
{'pkn4'} 2 {'no' } 2 2
{'adw3'} 22 {'no' } 22 22
{'bas8'} 23 {'no' } 23 23
Organize Data
Sort the rows of T2 in descending order by C, and then in ascending order by A. Assign the sorted table to T3.
T3 = sortrows(T2,{'C','A'},{'descend','ascend'});
disp(T3)
A B C D E
________ ____ _______ ____ ____
{'egh3'} 3 {'no' } 7 7
{'pkn4'} 2 {'no' } 2 2
{'ple2'} 2 {'no' } 2 2
{'pnj5'} 463 {'no' } 463 463
{'wnn3'} 6 {'no' } 6 6
In C, the rows are grouped first by 'yes', followed by 'no'. Then in A, the rows are listed
alphabetically.
Reorder the table variables so that A and C are next to each other.
T3 = T3(:,{'A','C','B','D','E'});
disp(T3)
A C B D E
________ _______ ____ ____ ____
See Also
readtable | summary | ismissing | sortrows | standardizeMissing | rmmissing |
fillmissing
Related Examples
• “Add and Delete Table Rows” on page 9-9
• “Add, Delete, and Rearrange Table Variables” on page 9-12
• “Modify Units, Descriptions, and Table Variable Names” on page 9-24
• “Access Data in Tables” on page 9-32
• “Missing Data in MATLAB”
• “Data Cleaning and Calculations in Tables” on page 9-66
• “Grouped Calculations in Tables and Timetables” on page 9-86
Modify Units, Descriptions, and Table Variable Names
load patients
BloodPressure = [Systolic Diastolic];
T = table(Gender,Age,Height,Weight,Smoker,BloodPressure);
T(1:5,:)
ans=5×6 table
Gender Age Height Weight Smoker BloodPressure
__________ ___ ______ ______ ______ _____________
Specify units for each variable in the table by modifying the table property, VariableUnits. Specify
the variable units as a cell array of character vectors.
An individual empty character vector within the cell array indicates that the corresponding variable
does not have units.
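The assignment itself is not shown above. A sketch of what it might look like follows, using a hypothetical two-row table, Tu, with the same six variables. The unit labels are taken from the summary output in this example, while the placement of the empty entries is an assumption.

```matlab
% Hypothetical two-row table with the same variables as T
Gender = {'Male';'Female'};
Age = [38;43];
Height = [71;69];
Weight = [176;163];
Smoker = [true;false];
BloodPressure = [124 93; 109 77];
Tu = table(Gender,Age,Height,Weight,Smoker,BloodPressure);

% One entry per variable; '' means the variable has no units
Tu.Properties.VariableUnits = {'','Yrs','In','Lbs','',''};
Tu.Properties.VariableUnits
```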
Add a variable description for the variable, BloodPressure. Assign a single character vector to the
element of the cell array containing the description for BloodPressure.
T.Properties.VariableDescriptions{'BloodPressure'} = 'Systolic/Diastolic';
You can use the variable name, 'BloodPressure', or the numeric index of the variable, 6, to index
into the cell array of character vectors containing the variable descriptions.
View the data type, description, units, and other descriptive statistics for each variable by using
summary to summarize the table.
summary(T)
Variables:
Properties:
Units: Yrs
Values:
Min 25
Median 39
Max 50
Properties:
Units: In
Values:
Min 60
Median 67
Max 72
Properties:
Units: Lbs
Values:
Min 111
Median 142.5
Max 202
Values:
True 34
False 66
Properties:
Description: Systolic/Diastolic
Values:
Column 1 Column 2
________ ________
Min 109 68
Median 122 81.5
Max 138 99
The BloodPressure variable has a description, and the Age, Height, and Weight variables have units.
Change the variable name for the first variable from Gender to Sex.
T.Properties.VariableNames{'Gender'} = 'Sex';
T(1:5,:)
ans=5×6 table
Sex Age Height Weight Smoker BloodPressure
__________ ___ ______ ______ ______ _____________
In addition to properties for variable units, descriptions and names, there are table properties for row
and dimension names, a table description, and user data.
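These other properties are set in the same way, through T.Properties. The following sketch uses a small hypothetical table, Tp, and made-up property values.

```matlab
% Hypothetical two-row table
Tp = table([38;43],[71;69],'VariableNames',{'Age','Height'});

Tp.Properties.Description = 'Hypothetical patient sample';
Tp.Properties.RowNames = {'Sanchez','Johnson'};
Tp.Properties.DimensionNames = {'Patient','Data'};
Tp.Properties.UserData = struct('Source','made-up values');
Tp.Properties
```

UserData can hold any kind of data, so a structure is a convenient way to attach several pieces of information at once.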
See Also
readtable | table | array2table | cell2table | struct2table | summary
Related Examples
• “Add, Delete, and Rearrange Table Variables” on page 9-12
• “Access Data in Tables” on page 9-32
Add Custom Properties to Tables and Timetables
All tables and timetables have properties that contain metadata about them or their variables. You
can access these properties through the T.Properties object, where T is the name of the table or
timetable. For example, T.Properties.VariableNames returns a cell array containing the names
of the variables of T.
The properties you access through T.Properties are part of the definitions of the table and
timetable data types. You cannot add or remove these predefined properties. But starting in
R2018b, you can add and remove your own custom properties, by modifying the
T.Properties.CustomProperties object of a table or timetable.
Add Properties
Read power outage data into a table. Sort it using the first variable that contains dates and times,
OutageTime. Then display the first three rows.
T = readtable('outages.csv');
T = sortrows(T,'OutageTime');
head(T,3)
ans=3×6 table
Region OutageTime Loss Customers RestorationTime Cause
_____________ ________________ ______ __________ ________________ ____________
Display its properties. These are the properties that all tables have in common. Note that there is also
a CustomProperties object, but that by default it has no properties.
T.Properties
ans =
TableProperties with properties:
Description: ''
UserData: []
DimensionNames: {'Row' 'Variables'}
VariableNames: {1x6 cell}
VariableDescriptions: {}
VariableUnits: {}
VariableContinuity: []
RowNames: {}
CustomProperties: No custom properties are set.
Use addprop and rmprop to modify CustomProperties.
To add custom properties, use the addprop function. Specify the names of the properties. For each
property, also specify whether it has metadata for the whole table (similar to the Description
property) or for its variables (similar to the VariableNames property). If the property has variable
metadata, then its value must be a vector whose length is equal to the number of variables.
Add custom properties that contain an output file name, file type, and indicators of which variables to
plot. Best practice is to assign the input table as the output argument of addprop, so that the custom
properties are part of the same table. Specify that the output file name and file type are table
metadata using the 'table' option. Specify that the plot indicators are variable metadata using the
'variable' option.
T = addprop(T,{'OutputFileName','OutputFileType','ToPlot'}, ...
{'table','table','variable'});
T.Properties
ans =
TableProperties with properties:
Description: ''
UserData: []
DimensionNames: {'Row' 'Variables'}
VariableNames: {1x6 cell}
VariableDescriptions: {}
VariableUnits: {}
VariableContinuity: []
RowNames: {}
When you add custom properties using addprop, their values are empty arrays by default. You can
set and access the values of the custom properties using dot syntax.
Set the output file name and type. These properties contain metadata for the table. Then assign a
logical array to the ToPlot property. This property contains metadata for the variables. In this
example, the elements of the value of the ToPlot property are true for each variable to be included
in a plot, and false for each variable to be excluded.
T.Properties.CustomProperties.OutputFileName = 'outageResults';
T.Properties.CustomProperties.OutputFileType = '.mat';
T.Properties.CustomProperties.ToPlot = [false false true true true false];
T.Properties
ans =
TableProperties with properties:
Description: ''
UserData: []
DimensionNames: {'Row' 'Variables'}
VariableNames: {1x6 cell}
VariableDescriptions: {}
VariableUnits: {}
VariableContinuity: []
RowNames: {}
ToPlot: [0 0 1 1 1 0]
Plot variables from T in a stacked plot using the stackedplot function. To plot only the Loss,
Customers, and RestorationTime values, use the ToPlot custom property as the second input
argument.
stackedplot(T,T.Properties.CustomProperties.ToPlot);
When you move or delete table variables, both the predefined and custom properties are reordered so
that their values correspond to the same variables. In this example, the values of the ToPlot custom
property stay aligned with the variables marked for plotting, just as the values of the
VariableNames predefined property stay aligned.
T.Customers = [];
T.Properties
ans =
TableProperties with properties:
Description: ''
UserData: []
DimensionNames: {'Row' 'Variables'}
VariableNames: {1x5 cell}
VariableDescriptions: {}
VariableUnits: {}
VariableContinuity: []
RowNames: {}
Convert the table to a timetable, using the outage times as row times. Move Region to the end of the
table, and RestorationTime before the first variable, using the movevars function. Note that the
properties are reordered appropriately. The RestorationTime and Loss variables still have
indicators for inclusion in a plot.
T = table2timetable(T);
T = movevars(T,'Region','After','Cause');
T = movevars(T,'RestorationTime','Before',1);
T.Properties
ans =
TimetableProperties with properties:
Description: ''
UserData: []
DimensionNames: {'OutageTime' 'Variables'}
VariableNames: {'RestorationTime' 'Loss' 'Cause' 'Region'}
VariableDescriptions: {}
VariableUnits: {}
VariableContinuity: []
RowTimes: [1468x1 datetime]
StartTime: 2002-02-01 12:18
SampleRate: NaN
TimeStep: NaN
Remove Properties
You can remove any or all of the custom properties of a table using the rmprop function. However,
you cannot use it to remove predefined properties from T.Properties, because those properties are
part of the definition of the table data type.
Remove the OutputFileName and OutputFileType custom properties. Display the remaining table
properties.
T = rmprop(T,{'OutputFileName','OutputFileType'});
T.Properties
ans =
TimetableProperties with properties:
Description: ''
UserData: []
See Also
readtable | table | head | addprop | table2timetable | movevars | rmprop | sortrows |
stackedplot
Related Examples
• “Modify Units, Descriptions, and Table Variable Names” on page 9-24
• “Access Data in Tables” on page 9-32
• “Add, Delete, and Rearrange Table Variables” on page 9-12
Access Data in Tables
A table is a container that stores column-oriented data in variables. Table variables can have different
data types and sizes as long as all variables have the same number of rows. Table variables have
names, just as the fields of a structure have names. The rows of a table can have names, but row
names are not required. To access table data, index into the rows and variables using either their
names or numeric indices.
• Smooth parentheses, (), return a table that has selected rows and variables.
• Dot notation returns the contents of a variable as an array.
• Curly braces, {}, return an array concatenated from the contents of selected rows and
variables.
You can specify rows and variables by name, numeric index, or data type. Starting in R2019b,
variable names and row names can include any characters, including spaces and non-ASCII
characters. Also, they can start with any characters, not just letters. Variable and row names do not
have to be valid MATLAB identifiers (as determined by the isvarname function).
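The three kinds of indexing listed above can be compared side by side on a small table. This is a minimal sketch with a hypothetical three-row table, Ti, and made-up result names.

```matlab
% Hypothetical three-row table with row names
Age = [38;43;38];
Height = [71;69;64];
Ti = table(Age,Height,'RowNames',{'Smith','Johnson','Williams'});

sub  = Ti(1:2,:);                  % parentheses: a 2-by-2 table
ages = Ti.Age;                     % dot notation: a 3-by-1 double array
vals = Ti{1:2,{'Age','Height'}};   % curly braces: a 2-by-2 double array
row  = Ti('Williams',:);           % select a row by name: a 1-by-2 table
```

Parentheses always preserve the table container, while dot notation and curly braces extract the underlying data.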
load patients
whos
Create a table and populate it with the Age, Gender, Height, Weight, and Smoker workspace
variables. Use the unique identifiers in LastName as row names. T is a 100-by-5 table. (When you
specify row names, they do not count as a table variable.)
T = table(Age,Gender,Height,Weight,Smoker,...
'RowNames',LastName)
T=100×5 table
Age Gender Height Weight Smoker
___ __________ ______ ______ ______
Create a subtable containing the first five rows and all the variables from T. To specify the desired
rows and variables, use numeric indices within parentheses. This type of indexing is similar to
indexing into numeric arrays.
T1 = T(1:5,:)
T1=5×5 table
Age Gender Height Weight Smoker
___ __________ ______ ______ ______
T1 is a 5-by-5 table.
In addition to numeric indices, you can use row or variable names inside the parentheses. (In this
case, using row indices and a colon is more compact than using row or variable names.)
Select all the data for the patients with the last names 'Williams' and 'Brown'. Since T has row
names that are the last names of patients, index into T using row names.
T2 = T({'Williams','Brown'},:)
T2=2×5 table
Age Gender Height Weight Smoker
___ __________ ______ ______ ______
T2 is a 2-by-5 table.
You also can select variables by name. Create a table that has only the first five rows of T and the
Height and Weight variables. Display it.
T3 = T(1:5,{'Height','Weight'})
T3=5×2 table
Height Weight
______ ______
Smith 71 176
Johnson 69 163
Williams 64 131
Jones 67 133
Brown 64 119
Table variable names do not have to be valid MATLAB identifiers. They can include spaces and non-
ASCII characters, and can start with any character.
Add a variable whose name includes spaces and a dash to T. Then index into T using variable names.
T = addvars(T,SelfAssessedHealthStatus,'NewVariableNames','Self-Assessed Health Status');
T(1:5,{'Age','Smoker','Self-Assessed Health Status'})
ans=5×3 table
Age Smoker Self-Assessed Health Status
___ ______ ___________________________
Instead of specifying variables using names or numbers, you can create a data type subscript that
matches all variables having the same data type.
S = vartype('numeric')
S =
table vartype subscript:
Create a table that has only the numeric variables, and only the first five rows, from T.
T4 = T(1:5,S)
T4=5×3 table
Age Height Weight
___ ______ ______
Smith 38 71 176
Johnson 43 69 163
Williams 38 64 131
Jones 40 67 133
Brown 49 64 119
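The data type subscript can be sketched on a small stand-in table. The values below are illustrative, not the full patients data. Note that a logical variable is not matched by the 'numeric' type.

```matlab
% Small stand-in table with numeric, cell, and logical variables.
T = table([38;43;38],{'Male';'Male';'Female'},[71;69;64],[true;false;false], ...
    'VariableNames',{'Age','Gender','Height','Smoker'});

S = vartype('numeric');            % subscript that matches numeric variables
Tnum = T(:,S);                     % keeps Age and Height only
Tlog = T(:,vartype('logical'));    % keeps Smoker only
```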
load patients
T = table(Age,Gender,Height,Weight,Smoker,...
'RowNames',LastName);
To extract data from one variable, use dot notation. Extract numeric values from the variable Weight.
Then plot a histogram of those values.
histogram(T.Weight)
title('Patient Weight')
You can index into an array or a table using an array of logical indices. Typically, you use a logical
expression that determines which values in a table variable meet a condition. The result of the
expression is an array of logical indices.
For example, create logical indices matching patients whose age is less than 40.
rows = T.Age < 40
1
0
1
0
0
0
1
0
1
1
⋮
To extract heights for patients whose age is less than 40, index into the Height variable using rows.
There are 56 patients younger than 40.
T.Height(rows)
ans = 56×1
71
64
64
68
66
71
72
65
69
69
⋮
You can index into a table with logical indices. Display the rows of T for the patients who are younger
than 40.
T(rows,:)
ans=56×5 table
Age Gender Height Weight Smoker
___ __________ ______ ______ ______
You can match multiple conditions with one logical expression. Display the rows for smoking patients
younger than 40.
T(rows & T.Smoker==true,:)
ans=18×5 table
Age Gender Height Weight Smoker
___ __________ ______ ______ ______
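The logical indexing steps above can be sketched on five rows. This is a self-contained illustration with a reduced table, so the counts differ from the 100-row example.

```matlab
% Five-row stand-in for the patients table.
Age = [38;43;38;40;49];
Height = [71;69;64;67;64];
Smoker = [true;false;false;false;false];
T = table(Age,Height,Smoker, ...
    'RowNames',{'Smith','Johnson','Williams','Jones','Brown'});

rows = T.Age < 40;                    % logical column vector, one element per row
youngHeights = T.Height(rows);        % index into the contents of one variable
youngSmokers = T(rows & T.Smoker,:);  % combine conditions and keep a table
```

Here rows selects Smith and Williams, and only Smith also satisfies the Smoker condition.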
When you index using dot notation, there are two ways to specify a variable:
• By name, without quotation marks. For example, T.Date specifies a variable named 'Date'.
• By an expression, where the expression is enclosed by parentheses after the dot. For example,
T.('Start Date') specifies a variable named 'Start Date'.
Use the first syntax when a table variable name also happens to be a valid MATLAB® identifier. (A
valid identifier starts with a letter and includes only letters, digits, and underscores.)
For example, create a table from the patients MAT-file. Then use dot notation to access the
contents of table variables.
load patients
T = table(Age,Gender,Height,Weight,Smoker,...
'RowNames',LastName);
To specify a variable by position in the table, use a number. Age is the first variable in T, so use the
number 1 to specify its position.
T.(1)
ans = 100×1
38
43
38
40
49
46
33
40
28
31
⋮
To specify a variable by name, you can enclose it in quotation marks. Since 'Age' is a valid identifier,
you can specify it using either T.Age or T.('Age').
T.('Age')
ans = 100×1
38
43
38
40
49
46
33
40
28
31
⋮
You can specify table variable names that are not valid MATLAB identifiers. Variable names can
include spaces and non-ASCII characters, and can start with any character. However, when you use
dot notation to access a table variable with such a name, you must specify it using parentheses.
ans=5×6 table
Age Gender Height Weight Smoker Self-Assessed Health Status
___ __________ ______ ______ ______ ___________________________
Access the new table variable using dot notation. Display the first five elements.
You also can use the output of a function as a variable name. Delete the
T.('Self-Assessed Health Status') variable. Then replace it with a variable whose name includes
today's date.
ans=5×6 table
Age Gender Height Weight Smoker 01-Sep-2021 Self Report
___ __________ ______ ______ ______ _______________________
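A minimal sketch of the dot-notation steps described above, using a three-row stand-in table. The status values are hypothetical, and the way the date-based name is built is one possible approach, not necessarily the code behind the output shown.

```matlab
T = table([38;43;38],[71;69;64],'VariableNames',{'Age','Height'});
status = {'Excellent';'Fair';'Good'};        % hypothetical values

% Add and read a variable whose name is not a valid identifier.
T.('Self-Assessed Health Status') = status;
C = T.('Self-Assessed Health Status');

first = T.(1);                               % by position: same as T.Age
same = isequal(T.Age,T.('Age'));             % both syntaxes give one array

% Delete the variable, then add one whose name comes from a function call.
T.('Self-Assessed Health Status') = [];
newName = string(datetime('today')) + " Self Report";
T.(newName) = status;
```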
Create a table from numeric and logical arrays from the patients file.
load patients
T = table(Age,Height,Weight,Smoker,...
'RowNames',LastName);
Extract data from multiple variables in T. Unlike dot notation, indexing with curly braces can extract
values from multiple table variables and concatenate them into one array.
Extract the height and weight for the first five patients. Use numeric indices to select the first five
rows, and variable names to select the variables Height and Weight.
A = T{1:5,{'Height','Weight'}}
A = 5×2
71 176
69 163
64 131
67 133
64 119
If you specify one variable name, then curly brace indexing results in the same array you can get with
dot notation. However, you must specify both rows and variables when you use curly brace indexing.
For example, the syntaxes T.Height and T{:,'Height'} return the same array.
If all the table variables have data types that allow them to be concatenated together, then you can
use the T.Variables syntax to put all the table data into an array. This syntax is equivalent to
T{:,:} where the colons indicate all rows and all variables.
A2 = T.Variables
A2 = 100×4
38 71 176 1
43 69 163 0
38 64 131 0
40 67 133 0
49 64 119 0
46 68 142 0
33 64 142 1
40 68 180 0
28 68 183 0
31 66 132 0
⋮
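The relationship between curly braces, dot notation, and T.Variables can be checked on a small table. The sketch below uses three illustrative rows.

```matlab
T = table([38;43;38],[71;69;64],[176;163;131],[true;false;false], ...
    'VariableNames',{'Age','Height','Weight','Smoker'});

A = T{1:2,{'Height','Weight'}};   % 2-by-2 numeric array from two variables
h1 = T.Height;                    % dot notation: one variable
h2 = T{:,'Height'};               % curly braces: rows and variables required
A2 = T.Variables;                 % all data in one array, like T{:,:}
```

Because Smoker is logical and the other variables are double, the concatenated array A2 is double.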
See Also
table | histogram | addvars | vartype
Related Examples
• “Advantages of Using Tables” on page 9-56
• “Create Tables and Assign Data to Them” on page 9-2
• “Modify Units, Descriptions, and Table Variable Names” on page 9-24
• “Calculations on Tables” on page 9-45
• “Data Cleaning and Calculations in Tables” on page 9-66
• “Grouped Calculations in Tables and Timetables” on page 9-86
• “Find Array Elements That Meet a Condition” on page 5-2
Calculations on Tables
This example shows how to perform calculations on tables.
The functions rowfun and varfun each apply a specified function to a table, yet many other
functions require numeric or homogeneous arrays as input arguments. You can extract data from
individual variables using dot indexing or from one or more variables using curly braces. The
extracted data is then an array that you can use as input to other functions. Starting in R2018a, you
also can use the groupsummary function for calculations on groups of data in a table.
Read data from a comma-separated text file, testScores.csv, into a table using the readtable
function. testScores.csv contains test scores for several students. Use the student names in the
first column of the text file as row names in the table.
T = readtable('testScores.csv','ReadRowNames',true)
T=10×4 table
Gender Test1 Test2 Test3
__________ _____ _____ _____
HOWARD {'male' } 90 87 93
WARD {'male' } 87 85 83
TORRES {'male' } 86 85 88
PETERSON {'female'} 75 80 72
GRAY {'female'} 89 86 87
RAMIREZ {'female'} 96 92 98
JAMES {'male' } 78 75 77
WATSON {'female'} 91 94 92
BROOKS {'female'} 86 83 85
KELLY {'male' } 79 76 82
View the data type, description, units, and other descriptive statistics for each variable by using the
summary function to summarize the table.
summary(T)
Variables:
    Gender: 10×1 cell array of character vectors
    Test1: 10×1 double
        Values:
            Min       75
            Median    86.5
            Max       96
    Test2: 10×1 double
        Values:
            Min       75
            Median    85
            Max       94
    Test3: 10×1 double
        Values:
            Min       72
            Median    86
            Max       98
The summary contains the minimum, median, and maximum score for each test.
Extract the data from the second, third, and fourth variables using curly braces, {}, find the average
of each row, and store it in a new variable, TestAvg.
T.TestAvg = mean(T{:,2:end},2)
T=10×5 table
Gender Test1 Test2 Test3 TestAvg
__________ _____ _____ _____ _______
HOWARD {'male' } 90 87 93 90
WARD {'male' } 87 85 83 85
TORRES {'male' } 86 85 88 86.333
PETERSON {'female'} 75 80 72 75.667
GRAY {'female'} 89 86 87 87.333
RAMIREZ {'female'} 96 92 98 95.333
JAMES {'male' } 78 75 77 76.667
WATSON {'female'} 91 94 92 92.333
BROOKS {'female'} 86 83 85 84.667
KELLY {'male' } 79 76 82 79
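The row-average step can be reproduced without the CSV file. This sketch rebuilds three of the rows shown above by hand and selects the test variables by name.

```matlab
Gender = {'male';'female';'female'};
Test1 = [90;75;96];
Test2 = [87;80;92];
Test3 = [93;72;98];
T = table(Gender,Test1,Test2,Test3, ...
    'RowNames',{'HOWARD','PETERSON','RAMIREZ'});

% Extract the three test variables as a matrix and average across columns.
T.TestAvg = mean(T{:,{'Test1','Test2','Test3'}},2);
```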
Alternatively, you can use the variable names, T{:,{'Test1','Test2','Test3'}}, or the variable
indices, T{:,2:4}, to select the subset of data.
Compute the mean and maximum of TestAvg by gender of the students. First, compute the means by
using the varfun function.
varfun(@mean,T,'InputVariables','TestAvg',...
'GroupingVariables','Gender')
ans=2×3 table
Gender GroupCount mean_TestAvg
__________ __________ ____________
{'female'} 5 87.067
{'male' } 5 83.4
Starting in R2018a, you also can use the groupsummary function to perform computations on groups
of data in a table. Compute the maximum values of TestAvg for each group of students using
groupsummary.
groupsummary(T,'Gender','max','TestAvg')
ans=2×3 table
Gender GroupCount max_TestAvg
__________ __________ ___________
{'female'} 5 95.333
{'male' } 5 90
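For comparison, here is a small sketch that runs both grouped calculations on made-up scores. The output variable names follow the prefix pattern visible in the results above.

```matlab
Gender = {'male';'female';'female';'male'};
TestAvg = [90;75;95;82];            % hypothetical averages
T = table(Gender,TestAvg);

% Grouped mean with varfun; groups are ordered female, then male.
means = varfun(@mean,T,'InputVariables','TestAvg', ...
    'GroupingVariables','Gender');

% Grouped maximum with groupsummary (R2018a or later).
maxes = groupsummary(T,'Gender','max','TestAvg');
```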
The maximum score for each test is 100. Use curly braces to extract the data from the table and
convert the test scores to a 25 point scale.
T{:,2:end} = T{:,2:end}*25/100
T=10×5 table
Gender Test1 Test2 Test3 TestAvg
__________ _____ _____ _____ _______
Change the variable name from TestAvg to Final.
T.Properties.VariableNames{end} = 'Final'
T=10×5 table
Gender Test1 Test2 Test3 Final
__________ _____ _____ _____ ______
See Also
table | summary | rowfun | varfun | findgroups | splitapply | groupsummary
Related Examples
• “Access Data in Tables” on page 9-32
• “Split Table Data Variables and Apply Functions” on page 9-52
• “Data Cleaning and Calculations in Tables” on page 9-66
• “Grouped Calculations in Tables and Timetables” on page 9-86
Split Data into Groups and Calculate Statistics
Split the patients into nonsmokers and smokers using the Smoker variable. Calculate the mean
weight for each group.
[G,smoker] = findgroups(Smoker);
meanWeight = splitapply(@mean,Weight,G)
meanWeight = 2×1
149.9091
161.9412
The findgroups function returns G, a vector of group numbers created from Smoker. The
splitapply function uses G to split Weight into two groups. splitapply applies the mean function
to each group and concatenates the mean weights into a vector.
findgroups returns a vector of group identifiers as the second output argument. The group
identifiers are logical values because Smoker contains logical values. The patients in the first group
are nonsmokers, and the patients in the second group are smokers.
smoker
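The split-apply workflow can be traced by hand on five values. In this sketch (illustrative data), false sorts before true, so group 1 is the nonsmokers.

```matlab
Smoker = [true;false;false;true;false];
Weight = [176;163;131;180;119];

[G,smoker] = findgroups(Smoker);           % G holds group numbers 1 and 2
meanWeight = splitapply(@mean,Weight,G);   % one mean per group
```

Here G is [2;1;1;2;1], smoker is [false;true], and the second element of meanWeight is the smokers' mean, 178.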
Split the patient weights by both gender and smoking status, and calculate the mean weights.
G = findgroups(Gender,Smoker);
meanWeight = splitapply(@mean,Weight,G)
meanWeight = 4×1
130.3250
130.9231
180.0385
181.1429
The unique combinations across Gender and Smoker identify four groups of patients: female
nonsmokers, female smokers, male nonsmokers, and male smokers. Summarize the four groups and
their mean weights in a table.
[G,gender,smoker] = findgroups(Gender,Smoker);
T = table(gender,smoker,meanWeight)
T=4×3 table
gender smoker meanWeight
______ ______ __________
T.gender contains categorical values, and T.smoker contains logical values. The data types of these
table variables match the data types of Gender and Smoker respectively.
Calculate body mass index (BMI) for the four groups of patients. Define a function that takes Height
and Weight as its two input arguments, and that calculates BMI.
BMI = 4×1
21.6721
21.6686
26.5775
26.4584
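One way to write such a function is sketched below. It assumes heights in inches and weights in pounds, so it uses the standard conversion factor 703. The function handle name meanBMIfcn and the data are illustrative and may differ from the code that produced the output above.

```matlab
Gender = {'Male';'Female';'Female';'Male'};
Height = [71;64;66;69];        % inches (hypothetical)
Weight = [176;131;140;163];    % pounds (hypothetical)

G = findgroups(Gender);        % Female is group 1, Male is group 2

% splitapply passes one argument per data variable to the function.
meanBMIfcn = @(h,w) mean((w ./ (h.^2)) * 703);
BMI = splitapply(meanBMIfcn,Height,Weight,G);
```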
Calculate the fraction of patients who report their health as either Poor or Fair. First, use
splitapply to count the number of patients in each group: female nonsmokers, female smokers,
male nonsmokers, and male smokers. Then, count only those patients who report their health as
either Poor or Fair, using logical indexing on S and G. From these two sets of counts, calculate the
fraction for each group.
[G,gender,smoker] = findgroups(Gender,Smoker);
S = SelfAssessedHealthStatus;
I = ismember(S,{'Poor','Fair'});
numPatients = splitapply(@numel,S,G);
numPF = splitapply(@numel,S(I),G(I));
numPF./numPatients
ans = 4×1
0.2500
0.3846
0.3077
0.1429
Compare the standard deviation in Diastolic readings of those patients who report Poor or Fair
health, and those patients who report Good or Excellent health.
stdDiastolicPF = splitapply(@std,Diastolic(I),G(I));
stdDiastolicGE = splitapply(@std,Diastolic(~I),G(~I));
Collect results in a table. For these patients, the female nonsmokers who report Poor or Fair health
show the widest variation in blood pressure readings.
T = table(gender,smoker,numPatients,numPF,stdDiastolicPF,stdDiastolicGE,BMI)
T=4×7 table
gender smoker numPatients numPF stdDiastolicPF stdDiastolicGE BMI
______ ______ ___________ _____ ______________ ______________ ______
See Also
findgroups | splitapply
Related Examples
• “Grouping Variables To Split Data” on page 9-61
• “Split Table Data Variables and Apply Functions” on page 9-52
• “Data Cleaning and Calculations in Tables” on page 9-66
Split Table Data Variables and Apply Functions
The sample file, outages.csv, contains data representing electric utility outages in the United
States. The file contains six columns: Region, OutageTime, Loss, Customers, RestorationTime,
and Cause. Read outages.csv into a table.
T = readtable('outages.csv');
Convert Region and Cause to categorical arrays, and OutageTime and RestorationTime to
datetime arrays. Display the first five rows.
T.Region = categorical(T.Region);
T.Cause = categorical(T.Cause);
T.OutageTime = datetime(T.OutageTime);
T.RestorationTime = datetime(T.RestorationTime);
T(1:5,:)
ans=5×6 table
Region OutageTime Loss Customers RestorationTime Cause
_________ ________________ ______ __________ ________________ _______________
Determine the greatest power loss due to a power outage in each region. The findgroups function
returns G, a vector of group numbers created from T.Region. The splitapply function uses G to
split T.Loss into five groups, corresponding to the five regions. splitapply applies the max
function to each group and concatenates the maximum power losses into a vector.
G = findgroups(T.Region);
maxLoss = splitapply(@max,T.Loss,G)
maxLoss = 5×1
10^4 ×
2.3141
2.3418
0.8767
0.2796
1.6659
Calculate the maximum power loss due to a power outage by cause. To specify that Cause is the
grouping variable, use table indexing. Create a table that contains the maximum power losses and
their causes.
T1 = T(:,'Cause');
[G,powerLosses] = findgroups(T1);
powerLosses.maxLoss = splitapply(@max,T.Loss,G)
powerLosses=10×2 table
Cause maxLoss
________________ _______
attack 582.63
earthquake 258.18
energy emergency 11638
equipment fault 16659
fire 872.96
severe storm 8767.3
thunder storm 23418
unknown 23141
wind 2796
winter storm 2883.7
powerLosses is a table because T1 is a table. You can append the maximum losses as another table
variable.
Calculate the maximum power loss by cause in each region. To specify that Region and Cause are
the grouping variables, use table indexing. Create a table that contains the maximum power losses
and display the first 15 rows.
T1 = T(:,{'Region','Cause'});
[G,powerLosses] = findgroups(T1);
powerLosses.maxLoss = splitapply(@max,T.Loss,G);
powerLosses(1:15,:)
ans=15×3 table
Region Cause maxLoss
_________ ________________ _______
MidWest attack 0
MidWest energy emergency 2378.7
MidWest equipment fault 903.28
MidWest severe storm 6808.7
MidWest thunder storm 15128
MidWest unknown 23141
MidWest wind 2053.8
MidWest winter storm 669.25
NorthEast attack 405.62
NorthEast earthquake 0
NorthEast energy emergency 11638
NorthEast equipment fault 794.36
NorthEast fire 872.96
NorthEast severe storm 6002.4
NorthEast thunder storm 23418
Determine power-outage impact on customers by cause and region. Because T.Customers contains NaN
values, wrap sum in an anonymous function to use the 'omitnan' input argument.
osumFcn = @(x)(sum(x,'omitnan'));
powerLosses.totalCustomers = splitapply(osumFcn,T.Customers,G);
powerLosses(1:15,:)
ans=15×4 table
Region Cause maxLoss totalCustomers
_________ ________________ _______ ______________
MidWest attack 0 0
MidWest energy emergency 2378.7 6.3363e+05
MidWest equipment fault 903.28 1.7822e+05
MidWest severe storm 6808.7 1.3511e+07
MidWest thunder storm 15128 4.2563e+06
MidWest unknown 23141 3.9505e+06
MidWest wind 2053.8 1.8796e+06
MidWest winter storm 669.25 4.8887e+06
NorthEast attack 405.62 2181.8
NorthEast earthquake 0 0
NorthEast energy emergency 11638 1.4391e+05
NorthEast equipment fault 794.36 3.9961e+05
NorthEast fire 872.96 6.1292e+05
NorthEast severe storm 6002.4 2.7905e+07
NorthEast thunder storm 23418 2.1885e+07
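The effect of the anonymous function can be seen on a few made-up values. This sketch compares plain sum with the 'omitnan' wrapper.

```matlab
Region = categorical({'West';'West';'East';'East'});
Customers = [100;NaN;250;50];          % hypothetical counts with a missing value

G = findgroups(Region);                % East is group 1, West is group 2

osumFcn = @(x) sum(x,'omitnan');       % wrapper that ignores NaN values
total = splitapply(osumFcn,Customers,G);
plain = splitapply(@sum,Customers,G);  % NaN propagates into the West total
```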
Determine the mean durations of all U.S. power outages in hours. Add the mean durations of power
outages to powerLosses. Because T.RestorationTime has NaT values, omit the resulting NaN
values when calculating the mean durations.
D = T.RestorationTime - T.OutageTime;
H = hours(D);
omeanFcn = @(x)(mean(x,'omitnan'));
powerLosses.meanOutage = splitapply(omeanFcn,H,G);
powerLosses(1:15,:)
ans=15×5 table
Region Cause maxLoss totalCustomers meanOutage
_________ ________________ _______ ______________ __________
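To illustrate how a NaT restoration time turns into a NaN duration, here is a small sketch with three hypothetical outages and hand-assigned group numbers.

```matlab
OutageTime = datetime(2021,1,[1;2;3],8,0,0);
RestorationTime = [OutageTime(1) + hours(5); NaT; OutageTime(3) + hours(7)];

D = RestorationTime - OutageTime;   % duration array; NaT yields a NaN duration
H = hours(D);                       % numeric hours: [5; NaN; 7]

G = [1;1;2];                        % hypothetical group numbers
omeanFcn = @(x) mean(x,'omitnan');
meanOutage = splitapply(omeanFcn,H,G);
```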
See Also
findgroups | splitapply | rowfun | varfun
Related Examples
• “Access Data in Tables” on page 9-32
• “Calculations on Tables” on page 9-45
• “Grouping Variables To Split Data” on page 9-61
• “Split Data into Groups and Calculate Statistics” on page 9-49
• “Data Cleaning and Calculations in Tables” on page 9-66
• “Grouped Calculations in Tables and Timetables” on page 9-86
Advantages of Using Tables
You can use the table data type to collect mixed-type data and metadata properties, such as variable
names, row names, descriptions, and variable units, in a single container. Tables are suitable for
column-oriented or tabular data that is often stored as columns in a text file or in a spreadsheet. For
example, you can use a table to store experimental data, with rows representing different
observations and columns representing different measured variables.
Tables consist of rows and column-oriented variables. Each variable in a table can have a different
data type and a different size, but each variable must have the same number of rows.
Then, combine the workspace variables Systolic and Diastolic into a single BloodPressure
variable, and convert the workspace variable Gender from a cell array of character vectors to a
categorical array.
BloodPressure = [Systolic Diastolic];
Gender = categorical(Gender);
whos('Gender','Age','Smoker','BloodPressure')
The variables Age, BloodPressure, Gender, and Smoker have varying data types and are
candidates to store in a table since they all have the same number of rows, 100.
Now, create a table from the variables and display the first five rows.
T = table(Gender,Age,Smoker,BloodPressure);
T(1:5,:)
ans=5×4 table
Gender Age Smoker BloodPressure
______ ___ ______ _____________
The table displays in a tabular format with the variable names at the top.
Each variable in a table is a single data type. If you add a new row to the table, MATLAB® forces
consistency of the data type between the new data and the corresponding table variables. For
example, if you try to add information for a new patient where the first column contains the patient's
age instead of gender, MATLAB returns an error.
The error occurs because MATLAB® cannot assign numeric data, 37, to the categorical array,
Gender.
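The type check can be observed with a try/catch block. This is a sketch on a two-variable stand-in table; the exact error message is not reproduced here because it can vary by release.

```matlab
Gender = categorical({'Male';'Female'});
Age = [38;43];
T = table(Gender,Age);

try
    T(end+1,:) = {37,{'Female'}};   % numeric value where Gender is expected
    msg = '';
catch err
    msg = err.message;              % the assignment fails and T is unchanged
end

% A row whose values match the variable types is accepted.
T(end+1,:) = {categorical("Female"),37};
```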
For comparison of tables with structures, consider the structure array, StructArray, that is
equivalent to the table, T.
StructArray = table2struct(T)
Structure arrays organize records using named fields. Each field's value can have a different data
type or size. Now, display the named fields for the first element of StructArray.
StructArray(1)
Fields in a structure array are analogous to variables in a table. However, unlike with tables, you
cannot enforce homogeneity within a field. For example, some values of the Gender field can be
categorical array elements, Male or Female, others can be character vectors, 'Male' or
'Female', and others can be integers, 0 or 1.
Now consider the same data stored in a scalar structure, with four fields each containing one variable
from the table.
ScalarStruct = struct(...
'Gender',{Gender},...
'Age',Age,...
'Smoker',Smoker,...
'BloodPressure',BloodPressure)
Unlike with tables, you cannot enforce that the data is rectangular. For example, the field
ScalarStruct.Age can be a different length than the other fields.
A table allows you to maintain the rectangular structure (like a structure array) and enforce
homogeneity of variables (like fields in a scalar structure). Although cell arrays do not have named
fields, they have many of the same disadvantages as structure arrays and scalar structures. If you
have rectangular data that is homogeneous in each variable, consider using a table. Then you can use
numeric or named indexing, and you can use table properties to store metadata.
You can index into a table using parentheses, curly braces, or dot indexing. Parentheses allow you to
select a subset of the data in a table and preserve the table container. Curly braces and dot indexing
allow you to extract data from a table. Within each table indexing method, you can specify the rows
or variables to access by name or by numeric index.
Consider the sample table from above. Each row in the table, T, represents a different patient. The
workspace variable, LastName, contains unique identifiers for the 100 rows. Add row names to the
table by setting the RowNames property to LastName and display the first five rows of the updated
table.
T.Properties.RowNames = LastName;
T(1:5,:)
ans=5×4 table
Gender Age Smoker BloodPressure
______ ___ ______ _____________
In addition to labeling the data, you can use row and variable names to access data in the table. For
example, use named indexing to display the age and blood pressure of the patients Williams and
Brown.
T({'Williams','Brown'},{'Age','BloodPressure'})
ans=2×2 table
Age BloodPressure
___ _____________
Williams 38 125 83
Brown 49 122 80
Now, use numeric indexing to return an equivalent subtable. Return the third and fifth rows from the
second and fourth variables.
T(3:2:5,2:2:4)
ans=2×2 table
Age BloodPressure
___ _____________
Williams 38 125 83
Brown 49 122 80
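The equivalence of the two indexing styles can be verified directly. This sketch rebuilds the first five rows by hand; values other than those displayed above are illustrative.

```matlab
LastName = {'Smith';'Johnson';'Williams';'Jones';'Brown'};
Gender = categorical({'Male';'Male';'Female';'Female';'Female'});
Age = [38;43;38;40;49];
Smoker = [true;false;false;false;false];
BloodPressure = [124 93;109 77;125 83;117 75;122 80];
T = table(Gender,Age,Smoker,BloodPressure,'RowNames',LastName);

byName = T({'Williams','Brown'},{'Age','BloodPressure'});
byIndex = T(3:2:5,2:2:4);          % rows 3 and 5, variables 2 and 4
same = isequal(byName,byIndex);    % true: both select the same subtable
```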
With cell arrays or structures, you do not have the same flexibility to use named or numeric indexing.
• With a cell array, you must use strcmp to find desired named data, and then you can index into
the array.
• With a scalar structure or structure array, it is not possible to refer to a field by number.
Furthermore, with a scalar structure, you cannot easily select a subset of variables or a subset of
observations. With a structure array, you can select a subset of observations, but you cannot select
a subset of variables.
• With a table, you can access data by named index or by numeric index. Furthermore, you can
easily select a subset of variables and a subset of rows.
For more information on table indexing, see “Access Data in Tables” on page 9-32.
In addition to storing data, tables have properties to store metadata, such as variable names, row
names, descriptions, and variable units. You can access a property using T.Properties.PropName,
where T is the name of the table and PropName is one of the table properties.
For example, add a table description, variable descriptions, and variable units for Age.
T.Properties.VariableDescriptions = ...
{'Male or Female' ...
'' ...
'true or false' ...
'Systolic/Diastolic'};
T.Properties.VariableUnits{'Age'} = 'Yrs';
Individual empty character vectors within the cell array for VariableDescriptions indicate that
the corresponding variable does not have a description. For more information, see the Properties
section of table.
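A compact sketch of setting and reading metadata on a two-variable table follows. The description text is a placeholder chosen for this illustration.

```matlab
T = table(categorical({'Male';'Female'}),[38;43], ...
    'VariableNames',{'Gender','Age'});

T.Properties.Description = 'Example patient data';       % placeholder text
T.Properties.VariableDescriptions = {'Male or Female',''};
T.Properties.VariableUnits{'Age'} = 'Yrs';               % index the property by variable name

units = T.Properties.VariableUnits;   % variables without units hold ''
```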
summary(T)
Variables:
    Gender: 100×1 categorical
        Properties:
            Description:  Male or Female
        Values:
            Female    53
            Male      47
    Age: 100×1 double
        Properties:
            Units:  Yrs
        Values:
            Min       25
            Median    39
            Max       50
    Smoker: 100×1 logical
        Properties:
            Description:  true or false
        Values:
            True     34
            False    66
    BloodPressure: 100×2 double
        Properties:
            Description:  Systolic/Diastolic
        Values:
                      Column 1    Column 2
                      ________    ________
            Min       109         68
            Median    122         81.5
            Max       138         99
Structures and cell arrays do not have properties for storing metadata.
See Also
table | summary
Related Examples
• “Create Tables and Assign Data to Them” on page 9-2
• “Modify Units, Descriptions, and Table Variable Names” on page 9-24
• “Access Data in Tables” on page 9-32
• “Data Cleaning and Calculations in Tables” on page 9-66
• “Grouped Calculations in Tables and Timetables” on page 9-86
Grouping Variables To Split Data
Grouping Variables
Grouping variables are variables used to group, or categorize, observations—that is, data values in
other variables. A grouping variable can be any of these data types:
• Numeric, logical, categorical, datetime, or duration vector
• Cell array of character vectors
• Table, with table variables of any data type in this list
Data variables are the variables that contain observations. A grouping variable must have a value
corresponding to each value in the data variables. Data values belong to the same group when the
corresponding values in the grouping variable are the same.
This table shows examples of data variables, grouping variables, and the groups that you can create
when you split the data variables using the grouping variables.
You can give groups of data meaningful names when you use cell arrays of character vectors or
categorical arrays as grouping variables. A categorical array is an efficient and flexible choice of
grouping variable.
Group Definition
Typically, there are as many groups as there are unique values in the grouping variable. (A
categorical array also can include categories that are not represented in the data.) The groups and
the order of the groups depend on the data type of the grouping variable.
• For numeric, logical, datetime, or duration vectors, or cell arrays of character vectors, the
groups correspond to the unique values sorted in ascending order.
• For categorical arrays, the groups correspond to the unique values observed in the array, sorted in
the order returned by the categories function.
The findgroups function can accept multiple grouping variables, for example G =
findgroups(A1,A2). You also can include multiple grouping variables in a table, for example T =
table(A1,A2); G = findgroups(T). The findgroups function defines groups by the unique
combinations of values across corresponding elements of the grouping variables. findgroups
decides the order by the order of the first grouping variable, and then by the order of the second
grouping variable, and so on. For example, if A1 = {'a','a','b','b'} and A2 = [0 1 0 0],
then the unique values across the grouping variables are 'a' 0, 'a' 1, and 'b' 0, defining three
groups.
The findgroups function returns a vector of group numbers that define groups based on the unique
values in the grouping variables. splitapply uses the group numbers to split the data into groups
efficiently before applying a function.
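The A1 and A2 example above can be run as written. In this sketch, the extra outputs id1 and id2 are names chosen here to hold the values that define each group.

```matlab
A1 = {'a','a','b','b'};
A2 = [0 1 0 0];

% Groups are the unique combinations ('a',0), ('a',1), and ('b',0).
[G,id1,id2] = findgroups(A1,A2);
numGroups = max(G);    % 3
```

The last two elements share the combination 'b' and 0, so they receive the same group number.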
See Also
findgroups | splitapply | rowfun | varfun
Related Examples
• “Access Data in Tables” on page 9-32
• “Split Table Data Variables and Apply Functions” on page 9-52
• “Split Data into Groups and Calculate Statistics” on page 9-49
• “Data Cleaning and Calculations in Tables” on page 9-66
• “Grouped Calculations in Tables and Timetables” on page 9-86
Changes to DimensionNames Property in R2016b
T =
Number Party
______ __________
Display its properties, including the dimension names. The default values of the dimension names are
'Row' and 'Variables'.
T.Properties
ans =
Description: ''
UserData: []
DimensionNames: {'Row' 'Variables'}
VariableNames: {'Number' 'Party'}
VariableDescriptions: {}
VariableUnits: {}
RowNames: {5×1 cell}
Starting in R2016b, you can assign new names to the dimension names, and use them to access table
data. Dimension names must be valid MATLAB identifiers, and must not be one of the reserved
names, 'Properties', 'RowNames', or 'VariableNames'.
Assign a new name to the first dimension name, and use it to access the row names of the table.
T.Properties.DimensionNames{1} = 'Name';
T.Name
ans =
'Van Buren'
'Arthur'
'Fillmore'
'Garfield'
'Polk'
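A self-contained sketch of renaming the first dimension and using it to read the row names follows. The two-row table is a stand-in with illustrative values.

```matlab
T = table([8;21],{'Democratic';'Republican'}, ...
    'VariableNames',{'Number','Party'}, ...
    'RowNames',{'Van Buren','Arthur'});

T.Properties.DimensionNames{1} = 'Name';   % rename the row dimension
names = T.Name;                            % row names as a cell array
dims = T.Properties.DimensionNames;        % {'Name','Variables'}
```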
Create a new table variable called Name. When you create the variable, the table modifies its first
dimension name to prevent a conflict. The updated dimension name becomes Name_1.
T =
T.Properties.DimensionNames
ans =
'Name_1' 'Data'
Similarly, if you assign a dimension name that is not a valid MATLAB identifier, the name is modified.
ans =
'LastName' 'Data'
In R2016b, tables raise warnings when dimension names are not valid identifiers, or conflict with
variable names or reserved names, so that you can continue to work with code and tables created
with previous releases. If you encounter these warnings, it is recommended that you update your
code to avoid them.
Data Cleaning and Calculations in Tables
Because tables and timetables are containers, working with them is somewhat different than working
with ordinary numeric arrays. The example shows how to use different tabular subscripting modes,
how these modes differ, and the advantages and disadvantages of each mode for different situations.
It also shows how to access and assign data, apply transformation and summary functions, convert
table variables to different data types, and plot results.
The Ames Housing Data used in this example comes from residential real estate data for the town of
Ames, Iowa, in the United States. You can download the original data from an XLS (Excel®
Workbook) spreadsheet. The data description is available as a text file. (Used with permission of the
copyright holder. Please contact the copyright holder if you wish to publish or redistribute this data.)
The best way to import a spreadsheet into MATLAB is to use the readtable function, or for data that
include timestamps, the readtimetable function. While the Ames Housing Data includes the sale
month and year for each house, the month and year are stored in separate columns. In this case, it is
simpler to use readtable.
Read the housing data. With readtable you can read data directly from a URL. Store all text data
from the spreadsheet as string arrays in the output table. Also, when readtable reads column
headers from a file, it uses them as table variable names and transforms them into valid MATLAB
identifiers. To preserve the original names, use the 'VariableNamingRule' name-value argument.
housing = readtable("http://jse.amstat.org/v19n3/decock/AmesHousing.xls","TextType","string");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names
Display housing. The table has one variable for each of the 82 columns in the spreadsheet.
housing
housing=2930×82 table
Order PID MSSubClass MSZoning LotFrontage LotArea Street Alley
_____ ____________ __________ ________ ___________ _______ ______ _____
The spreadsheet has some column headers with spaces and other column headers that start with
numbers. Column headers become variable names in the output table. By default, readtable
standardizes names with spaces by using camel case, and standardizes names beginning with
numbers by prepending them with 'x'. Although a table can have variable names with spaces and
other non-alphanumeric characters in them, the standardization makes working with table variable
names more natural. Before standardizing names, readtable saves the original column headers in
housing.Properties.VariableDescriptions.
housing.Properties.VariableDescriptions
In this example, the original variable names are not needed. To delete them, assign an empty cell
array to the VariableDescriptions property.
housing.Properties.VariableDescriptions = {};
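The naming behavior can be tried without downloading the spreadsheet. This sketch writes a tiny temporary CSV file (the file name and contents are made up) and reads it with and without the 'VariableNamingRule' argument, which requires R2020a or later. The default read issues the same kind of warning shown above.

```matlab
% Write a small CSV file whose headers are not valid identifiers.
fname = [tempname '.csv'];
fid = fopen(fname,'w');
fprintf(fid,'Lot Area,1st Flr SF\n8450,856\n9600,1262\n');
fclose(fid);

modified = readtable(fname);    % headers converted to valid identifiers
preserved = readtable(fname,'VariableNamingRule','preserve');
delete(fname);

originalHeaders = modified.Properties.VariableDescriptions;
```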
You can remove the Order variable because it is a row index and not needed. To remove one variable
from the table, assign an empty array, [], to the variable, just as you delete rows or columns from a
matrix.
housing.Order = [];
There are 81 variables left in the table. For a complete analysis of the housing prices, most of the
variables are probably important. But for this example, only a much smaller subset is needed. To
delete the unwanted variables one-by-one is tedious. The removevars function can delete them all at
once, but in this case there is an easier way. First list the variables that you want to keep. Then use
subscripting to select them and delete the others. Selecting variables by name is often much easier
than figuring out their numeric indices.
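The keep-list approach might look like the following sketch. The table T and the variable names in it are hypothetical and much smaller than the housing table. The sketch also compares the result with the equivalent call to removevars.

```matlab
% Hypothetical table with four variables.
T = table([1;2;3],["a";"b";"c"],[10;20;30],[true;false;true], ...
    'VariableNames',["ID" "Label" "Area" "Flag"]);

% List the variables to keep, then select them by name.
keepVars = ["ID" "Area"];
kept = T(:,keepVars);

% Equivalent result: name the variables to delete instead.
removed = removevars(T,["Label" "Flag"]);
sameResult = isequal(kept,removed);
```

When the keep list is short and the table is wide, selecting by name is usually less typing than listing everything to delete.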
housing=2930×25 table
PID MSSubClass LotFrontage LotArea Neighborhood BldgType OverallCo
____________ __________ ___________ _______ ____________ ________ _________
9 Tables
Two of the variable names are not very clear. Rename those variables with better names by using the
VariableNames property.
housing.Properties.VariableNames(["GrLivArea" "LowQualFinSF"]) = ["TotalAboveGroundLivingArea" "L
There are two other variable names, starting with 'x', that look odd. Another way to rename them is
to use the renamevars function. If you use renamevars, assign the output to the original table.
Otherwise the update is lost.
housing = renamevars(housing,["x1stFlrSF" "x2ndFlrSF"],["FirstFlrArea" "SecondFlrArea"]);
Six of the variables are string arrays. Conceptually they all contain categorical data: discrete,
nonnumeric values drawn from a small fixed set of possible values or categories. It is almost always a
good idea to convert that kind of data to categorical arrays. You can use the
detectImportOptions function to control the data types of the data you read with readtable. But
instead of starting over, you can convert these table variables to have the categorical data type.
For example, convert the Neighborhood variable to a categorical array.
housing.Neighborhood = categorical(housing.Neighborhood);
This assignment overwrites, or replaces, the existing text variable Neighborhood in the table with a
new categorical variable. Replacement is what enables the assignment to change the data type. In
contrast, this assignment, using indexing:
housing.Neighborhood(:) = categorical(housing.Neighborhood)
assigns values into the existing text variable, element by element, rather than replacing the variable.
In that case housing.Neighborhood remains a string array. This behavior is consistent with the
behavior of ordinary workspace variables. Assignment by indexing into an array does not change the
type of the array. For example, if you index into an array of integers and assign a floating-point value
to an element, the value is rounded to the nearest integer and stored as an integer.
x = uint32([1 2 3]);
x(2) = 2.2 % converted to 2, as a uint32
1 2 3
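The same distinction can be sketched on a table variable. This example uses a hypothetical one-variable table and numeric types (double and single) as a stand-in for the text and categorical types discussed above.

```matlab
% Hypothetical table; X starts out as double.
T = table([1.5;2.5;3.5],'VariableNames',"X");

% Indexed assignment writes values into the existing variable,
% so the values are converted to the existing type.
T.X(:) = single(T.X);
classAfterIndexing = class(T.X);

% Assigning to the whole variable replaces it, so the type changes.
T.X = single(T.X);
classAfterReplacing = class(T.X);
```

Here classAfterIndexing is 'double' and classAfterReplacing is 'single'.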
Assignment with dot notation is one way to convert the type of a variable in a table. The
convertvars function is another way and has two benefits. First, it avoids any confusion about
overwriting as opposed to assignment into a variable. The convertvars function always overwrites
existing variables and converts their type. Second, convertvars can operate on more than one
variable at a time. There are several more text variables in housing to be converted to the
categorical data type. Changing them one at a time would get tedious, but convertvars can
convert more than one variable in one command.
housing = convertvars(housing,["BldgType" "Foundation"],"categorical");
It is not necessary to explicitly list the variables by name or position in the table. You can find all the
table variables that are string arrays and convert them to categorical variables. To specify table
variables that are string arrays, use the function handle @isstring when calling convertvars.
housing = convertvars(housing,@isstring,"categorical");
In both cases, assign the output of convertvars back to the original table. Otherwise, the update is
lost.
Sometimes, converting all text variables to categorical is too much. For example, if the current
homeowners' names were present in the data, then it would not make sense to store them in a
categorical variable. Homeowners' names do not define housing categories. You might keep their
names in a string array instead.
As another example, the CentralAir variable is one of the variables that was converted to
categorical. But because its categories are just Y and N, it might make more sense to consider it a
logical variable.
summary(housing.CentralAir)
N 196
Y 2734
The logical data type (like all the integer types) does not allow missing values (analogous to NaN),
while categorical does. The CentralAir variable happens to have no missing data values. You
can use either logical or categorical as the data type for CentralAir.
any(ismissing(housing.CentralAir))
ans = logical
0
Convert the data type to logical, with true corresponding to Y, using dot notation to overwrite the
existing categorical variable with the new logical one.
housing.CentralAir = (housing.CentralAir == "Y");
housing=2930×25 table
PID MSSubClass LotFrontage LotArea Neighborhood BldgType OverallCond
__________ __________ ___________ _______ ____________ ________ ___________
All the text data has been converted to categorical variables. But there are still a few things to
clean up.
The OverallCond variable was read in as a numeric array, but its values are all drawn from the
integers 1-10. You can leave these values as numeric data, but you can also think of them as ordinal
categorical data. When a categorical array is ordinal, its categories have a specified order. For
example, the categories 10 and 5 can be compared (10 > 5, because a house whose condition is
rated as a 10 is theoretically nicer than one rated 5), but for these comparisons, there is no numeric
meaning to 10 - 5. To avoid unintentionally treating OverallCond as numeric data, convert it to an
ordinal categorical array, which still enables relational comparisons but prevents arithmetic
operations. The category names 1, 2, and so on are easy to interpret and are acceptable.
housing.OverallCond = categorical(housing.OverallCond,1:10,"Ordinal",true);
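A short sketch of what the ordinal conversion allows and prevents, using three made-up ratings rather than the housing data:

```matlab
% Hypothetical condition ratings drawn from the integers 1-10.
rating = categorical([5 10 3],1:10,"Ordinal",true);

% Relational comparisons are defined for ordinal categorical arrays.
isBetter = rating(2) > rating(1);

% Arithmetic is not defined, so the subtraction throws an error.
try
    difference = rating(2) - rating(1);
    arithmeticWorked = true;
catch
    arithmeticWorked = false;
end
```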
Similarly, the MSSubClass variable consisted of numeric codes in the original spreadsheet. You can
think of those values as being categorical data. Because there is no mathematical order to these
particular codes, the categories are nonordinal (or nominal). In this case, readtable read those
values in as text to preserve leading zeroes in the codes. MSSubClass was then converted to
categorical data.
While MSSubClass has the data type that you want, you might find it difficult to interpret the codes
as categories of houses. The file that describes the Ames Housing Data contains the definitions of the
numeric codes. Giving these categories readable names can help you understand the data. To make it
clear which names go with which numbers, specify both the categories (code) and their names
(subclass) in another call to the categorical function.
code = ["020" "030" "040" "045" "050" "060" "070" "075" "080" "085" "090" "120" "150" "160" "180"
subclass = ["1-STORY 1946 & NEWER ALL STYLES" ...
"1-STORY 1945 & OLDER" ...
"1-STORY W/FINISHED ATTIC ALL AGES" ...
"1-1/2 STORY - UNFINISHED ALL AGES" ...
"1-1/2 STORY FINISHED ALL AGES" ...
"2-STORY 1946 & NEWER" ...
"2-STORY 1945 & OLDER" ...
"2-1/2 STORY ALL AGES" ...
"SPLIT OR MULTI-LEVEL" ...
"SPLIT FOYER" ...
"DUPLEX - ALL STYLES AND AGES" ...
"1-STORY PUD (Planned Unit Development) - 1946 & NEWER" ...
"1-1/2 STORY PUD - ALL AGES" ...
"2-STORY PUD - 1946 & NEWER" ...
"PUD - MULTILEVEL - INCL SPLIT LEV/FOYER" ...
"2 FAMILY CONVERSION - ALL STYLES AND AGES"];
housing.MSSubClass = categorical(housing.MSSubClass,code,subclass);
The category names for the BldgType variable are not obvious. As with MSSubClass, more
descriptive names can help you understand the building categories. To display the number of houses
in each building category, use the summary function.
summary(housing.BldgType)
1Fam 2425
2fmCon 62
Duplex 109
Twnhs 101
TwnhsE 233
With only five categories, you can safely list the new category names in the right order without
specifying the old names. To rename categories, use the renamecats function.
types = ["Single-family Detached" "Two-family Conversion" "Duplex" "Townhouse End Unit" "Townhous
housing.BldgType = renamecats(housing.BldgType,types);
The GarageType variable includes the category NA, standing for Not Applicable. In GarageType, NA
means that the house does not have a garage. But it is too easy to confuse NA with a missing value. A
true missing value means it cannot be determined if a house has a garage. But in this housing data, it
is always known if a house has a garage. Change that one category name to make its meaning clearer.
housing.GarageType = renamecats(housing.GarageType,"NA","None");
Finally, the PID variable was read in as a string array. While its values were numeric, some of them
had leading zeroes. The readtable function preserved this information by storing the values as
strings. Then the call to convertvars converted the PID variable to a categorical array. PID
stores identification numbers that are unique. Identification numbers are assigned as needed and do
not come from a fixed set of values. There is no particular advantage in storing them in a
categorical variable. If every identification number is a category, then adding a new identification
number means adding a new category to PID. It might be more convenient to convert PID back to a
string array. To convert values to strings, use the string function.
housing.PID = string(housing.PID);
housing
housing=2930×25 table
PID MSSubClass LotFrontage LotAr
____________ _____________________________________________________ ___________ _____
The table has separate variables for the month and year of sale. It is more convenient if those
variables are combined in one datetime variable. Assignment by using dot notation is a good way to
add a new variable at the right edge of a table. Add the date of sale as a new variable.
Now delete the two original variables. It is easier to list the variables by name and use removevars.
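These two steps might be sketched as follows on a small hypothetical table. The names MoSold and YrSold for the month and year variables are assumptions made for illustration. The name LastSoldDate matches the variable referred to later in this example. Setting the day to 1 is also an assumption, because only the month and year are recorded.

```matlab
% Hypothetical table with separate month and year variables.
T = table([5;11],[2010;2007],[215000;105000], ...
    'VariableNames',["MoSold" "YrSold" "SalePrice"]);

% Combine year and month into one datetime variable, added at the right edge.
T.LastSoldDate = datetime(T.YrSold,T.MoSold,1);

% Remove the two original variables by name.
T = removevars(T,["MoSold" "YrSold"]);
```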
housing=2930×24 table
PID MSSubClass LotFrontage LotAr
____________ _____________________________________________________ ___________ _____
Explore the data by making some simple plots. Many basic plotting commands do not accept tables as
input arguments. But you can use dot notation to pass one or more table variables into a plotting
function. You are taking arrays out of the table and passing them as input arguments to a plotting
function.
For example, make a scatter plot of the sale prices of houses in the table as a function of the years in
which they were built.
scatter(housing.YearBuilt,housing.SalePrice,20,"filled");
A log transformation of the prices might show a simpler relationship between year and price. Also,
you can show more information in the scatter plot by using the living area of the houses to color the
markers. The living areas have a long right tail, so it is also useful to show a log transformation of the
areas. To transform the two table variables, wrap them in calls to the log function. Then make
another scatter plot.
logSalePrice = log(housing.SalePrice);
logLivingArea = log(housing.TotalAboveGroundLivingArea);
scatter(housing.YearBuilt,logSalePrice,20,logLivingArea,"filled");
Any large, complex data set collected over a long period of time might contain some errors. Check for
errors in the housing data. Dates in the data are a good place to start. First compare YearBuilt to
YearRemod_Add.
checkRows = housing.YearBuilt > housing.YearRemod_Add;
housing(checkRows,:)
ans=1×24 table
PID MSSubClass LotFrontage LotArea Neighborhood
____________ _______________________________ ___________ _______ ____________
It is not possible for remodeling to have been done in 2001 if the house itself was built in 2002. If you
assume that the YearBuilt value is known to be the error (an assumption that needs to be
confirmed), you can use dot notation to assign 2001 as the year in which this house was built.
housing.YearBuilt(checkRows) = 2001;
ans=2×24 table
PID MSSubClass LotFrontage LotArea Neighborhood
There is another issue. These two houses were sold in late 2007, as shown in the LastSoldDate
variable. But the corresponding value in YearBuilt is 2008. It might be that for these houses, the
years in YearBuilt were recorded in early 2008 (another assumption needing confirmation). Update
the YearBuilt variable, this time by using dot notation to assign to two rows.
housing.YearBuilt(checkRows) = 2007;
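One way such a check could be written is to compare the year of the sale date against the year built. The sketch below assumes a hypothetical three-row table and uses the year function to extract the year from the datetime values. It is an illustration of the pattern, not necessarily the exact comparison used for the housing data.

```matlab
% Hypothetical rows: the last two were sold before their recorded build year.
T = table([2006;2008;2008],datetime([2007;2007;2007],[6;10;11],1), ...
    'VariableNames',["YearBuilt" "LastSoldDate"]);

% Flag rows where the sale year precedes the build year.
checkRows = year(T.LastSoldDate) < T.YearBuilt;

% Assign a corrected year to all flagged rows at once.
T.YearBuilt(checkRows) = 2007;
```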
The next step in cleaning the data is to check for missing data in the numeric and categorical
variables. The one logical variable in housing does not support missing values. The ismissing
function indicates which elements of the table have missing values.
missingElements = ismissing(housing)
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
⋮
The ismissing function returns a logical matrix that is the same size as the table. Summing the
columns of that matrix gives the number of missing values in each of the variables of the table.
numMissing = sum(missingElements,1)
numMissing = 1×24
0 0 490 0 0 0 0 0 0 0 0 0 0 0 0 0
Only three of the variables have missing data, but without the variable names it is not easy to tell
which variables they are. One way to tell is to index into the VariableNames property of the table to
find the names that correspond to the variables with missing values.
housing.Properties.VariableNames(numMissing > 0)
Deciding what to do about missing data is a challenge. If the data is missing at random, and there are
only a few missing values, one strategy is to remove those rows from the table. The four missing
basement bath values (NaNs, in this case) occur in only two rows. You can remove those two rows by
using the rmmissing function.
ans=2×24 table
PID MSSubClass LotFrontage LotArea Neighborhood
____________ _______________________________ ___________ _______ ____________
This call to rmmissing removes only the rows that have missing values in BsmtFullBath and
BsmtHalfBath. The 490 rows with missing LotFrontage values are still in the table. You can
remove these 490 rows but doing so deletes more than 16% of the data. You also can fill these
missing values with the mean frontage value by using the fillmissing function, but that is not
practical for this data. For variables that form a time series, fillmissing also supports filling
variables with interpolated values or moving-window smoothed values. LotFrontage is not a time
series. The data in this variable is a cross-sectional data set.
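The following sketch illustrates both options on a hypothetical four-row table. It restricts rmmissing to two variables with the "DataVariables" name-value argument, and then fills the remaining missing value with the mean by using fillmissing. The values are made up, and mean imputation is shown only to illustrate the call.

```matlab
% Hypothetical data: row 2 lacks LotFrontage, row 3 lacks both basement counts.
T = table([60;NaN;80;70],[1;0;NaN;1],[0;0;NaN;1], ...
    'VariableNames',["LotFrontage" "BsmtFullBath" "BsmtHalfBath"]);

% Remove only the rows with missing values in the two basement variables.
T = rmmissing(T,"DataVariables",["BsmtFullBath" "BsmtHalfBath"]);
stillMissing = sum(ismissing(T.LotFrontage));

% Fill the remaining missing frontage with the mean of the known values.
meanFrontage = mean(T.LotFrontage,"omitnan");
filled = fillmissing(T.LotFrontage,"constant",meanFrontage);
```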
One commonly used strategy for filling in missing values in cross-sectional data is to create a
regression model to predict the missing values in a row from the non-missing data in that row. A
simple scatter plot indicates that there is a log-log relationship between the area of a lot and its
frontage. That relationship suggests a model.
loglog(housing.LotArea,housing.LotFrontage,'o')
You can use that log-log relationship to fill in the missing LotFrontage values by regressing the
values on LotArea.
missingValues = ismissing(housing.LotFrontage);
beta = polyfit(log(housing.LotArea(~missingValues)),log(housing.LotFrontage(~missingValues)),1);
housing.LotFrontage(missingValues) = exp(polyval(beta,log(housing.LotArea(missingValues))));
You can use dot notation to work on data in a table when you use functions such as polyfit and
polyval that accept numeric vectors but not tables. You can think of a table as a container that is
designed to hold data having different types. Functions such as polyfit that are specifically for
numeric inputs do not work on a table because a table often contains nonnumeric data. Even when a
table contains only numeric data, it is still a container. The functions must be applied to the contents
of the table. Use dot notation to access table variables.
Add the imputed missing values that you calculated with polyfit and polyval to the scatter plot. A
simple imputation scheme might not be sufficient in a real analysis of this data, but it illustrates how
to visualize and make computations on numeric data in a table.
hold on
loglog(housing.LotArea(missingValues),housing.LotFrontage(missingValues),'rx')
hold off
Dot notation has been convenient for operations such as converting an existing table variable, adding
a new variable, assigning values, plotting, and applying functions like polyval to a table variable.
Dot notation is also convenient for arithmetic operations on table variables. For example, convert the
LotFrontage variable from feet to meters.
housing=2928×24 table
PID MSSubClass LotFrontage LotAr
____________ _____________________________________________________ ___________ _____
Using dot notation means that the multiplication is applied not to the housing table, which cannot
be done because tables are containers, but rather to its LotFrontage variable, which is a numeric
vector. With dot notation, you extracted LotFrontage from the table and put the modified version
back in.
Another way to access the contents of a table is to subscript into it by using curly braces, just as you
use curly braces to extract the contents of a cell array. You can use curly brace subscripting to refer
to and operate on data in a table by extracting and reinserting contents. For example, convert
LotFrontage back to feet by using curly brace subscripting.
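As a sketch of both conversions, the code below converts a hypothetical LotFrontage variable to meters with dot notation and back to feet with curly brace subscripting. It uses the exact factor of 0.3048 meters per foot.

```matlab
% Hypothetical frontage values in feet.
T = table([100;50],'VariableNames',"LotFrontage");

% Dot notation: extract the variable, scale it, and put it back.
T.LotFrontage = 0.3048 * T.LotFrontage;
inMeters = T.LotFrontage;

% Curly braces: the same extract-and-reinsert pattern, back to feet.
T{:,"LotFrontage"} = T{:,"LotFrontage"} / 0.3048;
```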
Dot notation and brace subscripting are different syntaxes for the same kinds of operations. They
both work on the contents of a table. Also, they both enable you to specify a table variable and a
subset of its rows.
housing.LotArea(1:2)
ans = 2×1
31770
11622
housing{1:2,"LotArea"}
ans = 2×1
31770
11622
While both syntaxes work on the contents of a table, there are two subtle differences to consider.
First, a limitation of curly brace subscripting is that it assigns into the contents of a table rather than
replacing a variable. For example, this assignment does not change the data type of the
LotFrontage variable in the way that an assignment using dot notation does. The call to the single
function on the right side of the assignment creates an array having the single data type. But by
subscripting into housing with curly braces, you assign values from that array into the existing table
variable. And the data type of LotFrontage is double. The values from the right side are converted
back to double by this assignment.
housing{:,"LotFrontage"} = single(housing{:,"LotFrontage"});
Second, a benefit of curly brace subscripting is that, unlike dot notation, it uses the familiar two-
dimensional subscripting syntax. This syntax enables you to refer to more than one variable at a time
and also to a subset of rows. For example, there are five variables whose units are square feet.
Converting these variables to square meters one at a time is tedious. To apply the multiplication to all
five variables at once, use curly brace subscripting.
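A minimal sketch of that multi-variable conversion follows. It uses a hypothetical table with three area variables instead of five. The name areaVars matches the variable used in the code later in this example.

```matlab
% Hypothetical table; areas are in square feet.
T = table(["a";"b"],[31770;11622],[1656;896],[0;0], ...
    'VariableNames',["PID" "LotArea" "FirstFlrArea" "SecondFlrArea"]);

% Name the area variables once, then convert them all to square meters.
areaVars = ["LotArea" "FirstFlrArea" "SecondFlrArea"];
T{:,areaVars} = 0.3048^2 * T{:,areaVars};
```

The braces extract one numeric matrix with a column per named variable, the multiplication scales every element, and the assignment writes the columns back into their variables.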
A common mistake is to use parenthesis subscripting instead of braces to operate on the contents of a
table. While some functions, such as ismissing or varfun, do accept a table as their input, many
numeric operations, including arithmetic, do not. For example, this assignment using parentheses
results in an error. The try-catch block catches the error and displays it.
try
housing(:,areaVars) = 0.3048^2 * housing(:,areaVars);
catch ME
disp(ME.message)
end
A third way to do calculations on numeric variables in a table is to use the varfun function. Like
curly brace subscripting, varfun can operate on all or only some of the variables in a table. Unlike
curly braces, varfun operates on each table variable separately. By default, varfun returns another
table containing a variable for each separate result.
Sometimes the operation that you want to apply is an existing function. To pass the function as an
argument to varfun, use a function handle. For example, use the round function to round data in the
variables specified by areaVars.
roundedAreaTable = varfun(@round,housing,"InputVariables",areaVars)
roundedAreaTable=2928×5 table
round_LotArea round_FirstFlrArea round_SecondFlrArea round_LowQualFinishedArea ro
_____________ __________________ ___________________ _________________________ __
2952 154 0 0
1080 83 0 0
1325 123 0 0
1037 196 0 0
1285 86 65 0
927 86 63 0
457 124 0 0
465 119 0 0
501 150 0 0
697 96 72 0
929 71 83 0
741 110 0 0
781 73 63 0
945 125 0 0
634 140 0 0
4971 157 148 0
⋮
If there is no function that does exactly what you want, you can also write an anonymous function to
do it.
sqMeters2sqFeet = @(x) x / 0.3048^2;
areaTable = varfun(sqMeters2sqFeet,housing,"InputVariables",areaVars)
areaTable=2928×5 table
Fun_LotArea Fun_FirstFlrArea Fun_SecondFlrArea Fun_LowQualFinishedArea Fun_TotalA
___________ ________________ _________________ _______________________ __________
31770 1656 0 0
11622 896 0 0
14267 1329 0 0
11160 2110 0 0
13830 928 701 0
9978 926 678 0
4920 1338 0 0
5005 1280 0 0
5389 1616 0 0
7500 1028 776 0
10000 763 892 0
7980 1187 0 0
8402 789 676 0
10176 1341 0 0
6820 1502 0 0
53504 1690 1589 0
⋮
Because that result is a table, it can be assigned back into the original table with parenthesis
subscripting.
housing(:,areaVars) = areaTable;
housing.Properties.VariableUnits(areaVars) = "ft^2";
housing{:,areaVars} = areaTable{:,:};
The two assignments have the same effect. The assignment with parentheses assigns one table to
another. The assignment with curly braces explicitly assigns values to the content of the table. The
left and right sides of that assignment are numeric matrices. Because curly brace subscripting
extracts and reinserts data, it is a convenient way to modify data in place. Contents-to-contents
assignment can operate on only one data type at a time, while table-to-table assignment can move
data of different types. For example, this assignment results in an error because it involves mixed
numeric and categorical data in brace subscripting.
try
housing{:,["LotFrontage" "OverallCond"]} = normalize(housing{:,["LotFrontage" "OverallCond"]}
catch ME
disp(ME.message)
end
Because varfun returns a table, assignment using parenthesis subscripting cannot change the type
of any table variables. For example, this assignment does not convert any variables from the double
to single data type.
housing(:,areaVars) = varfun(@single,housing,"InputVariables",areaVars);
To convert the data types of table variables, use convertvars, as previously shown.
Because curly brace subscripting extracts the variables from a table as one matrix having one data
type, you can use it to perform row operations across numeric variables in a table. For example, a
check on the data is to compare the individual square footage variables against
TotalAboveGroundLivingArea. Extract the former by using curly braces. Then compare their row
sums to TotalAboveGroundLivingArea, extracted by using dot notation.
area = housing{:,["FirstFlrArea" "SecondFlrArea" "LowQualFinishedArea"]}
area = 2928×3
1656 0 0
896 0 0
1329 0 0
2110 0 0
928 701 0
926 678 0
1338 0 0
1280 0 0
1616 0 0
1028 776 0
⋮
isequal(sum(area,2), housing.TotalAboveGroundLivingArea)
ans = logical
1
The square footage data is consistent. Another example is to compute the total number of bathrooms
in each house by extracting the four different bathroom counts and adding them up across each row.
bathCountVars = ["BsmtHalfBath" "HalfBath" "BsmtFullBath" "FullBath"];
bathCounts = housing{:,bathCountVars}
bathCounts = 2928×4
0 0 1 1
0 0 0 1
0 1 0 1
0 1 1 2
0 1 0 2
0 1 0 2
0 0 1 2
0 0 0 2
0 0 1 2
0 1 0 2
⋮
It might seem natural to add these counts across each row:
sum(housing{:,bathCountVars},2);
but that sum is not correct. Half-baths count only half as much as full bathrooms. A trend in real
estate listings is to account for multiple half-baths by counting them after the decimal point. Matrix
multiplication makes that operation one line.
Replace those four variables with TotalBaths, rather than adding a new variable at the end of the
table. Begin this replacement by using addvars to add TotalBaths next to the existing variables.
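The weighted count and the call to addvars might be sketched as follows on a hypothetical two-row table. The weight of 0.1 for each half bath is inferred from the convention of counting half baths after the decimal point. The 'After' position is one plausible choice for placing TotalBaths next to the existing counts.

```matlab
bathCountVars = ["BsmtHalfBath" "HalfBath" "BsmtFullBath" "FullBath"];

% Hypothetical bathroom counts and prices.
T = table([0;1],[1;1],[1;0],[2;1],[200000;150000], ...
    'VariableNames',[bathCountVars "SalePrice"]);

% One weight per column of the extracted matrix: half baths count 0.1 each.
weights = [0.1; 0.1; 1; 1];
TotalBaths = T{:,bathCountVars} * weights;

% Insert the new variable immediately after FullBath.
T = addvars(T,TotalBaths,'After',"FullBath");
```

The first row has one half bath and three full baths, so its total is 3.1.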
There is a mistake in one row of the data. A townhouse built in 2007 probably does not have four half
baths and no full baths.
groupcounts(housing,"TotalBaths")
ans=17×3 table
TotalBaths GroupCount Percent
__________ __________ ________
0.4 1 0.034153
1 442 15.096
1.1 293 10.007
1.2 20 0.68306
1.3 2 0.068306
2 890 30.396
2.1 558 19.057
2.2 29 0.99044
3 349 11.919
3.1 288 9.8361
3.2 6 0.20492
3.3 1 0.034153
4 25 0.85383
4.1 16 0.54645
4.2 3 0.10246
6 2 0.068306
⋮
ans=1×25 table
PID MSSubClass LotFrontage LotAr
____________ _____________________________________________________ ___________ _____
"0528228275" 1-STORY PUD (Planned Unit Development) - 1946 & NEWER 53 3922
The BsmtHalfBath count should be two full bathrooms. The bathroom counts are all numeric. The
assignment with braces updates all three values across that row.
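A sketch of this kind of row-wise correction follows. The table, the choice of which three variables to update, and the corrected values are all hypothetical. They illustrate the syntax rather than reproduce the actual fix.

```matlab
% Hypothetical counts; the first row has four half baths and no full baths.
T = table([4;0],[0;1],[0;1],[0.4;2], ...
    'VariableNames',["BsmtHalfBath" "BsmtFullBath" "FullBath" "TotalBaths"]);

% Locate the suspicious row (with a tolerance, because the total is a double).
row = abs(T.TotalBaths - 0.4) < 1e-10;

% Assign one numeric row vector across three variables in that row.
T{row,["BsmtHalfBath" "FullBath" "TotalBaths"]} = [0 2 2];
```

This works because all of the selected variables are numeric, so the right side can be one numeric row vector.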
housing = removevars(housing,bathCountVars)
housing=2928×21 table
PID MSSubClass LotFrontage LotAr
____________ _____________________________________________________ ___________ _____
Unlike curly braces, varfun operates on each variable in a table separately. For that reason, varfun
cannot do row operations. The related function rowfun can do row operations. It is often simpler and
faster to use curly brace subscripting for row operations.
In previous sections, the operations on numeric data in the table were transformations that replace
the original values. Many other important operations are reductions whose results are scalars. For
example, calculate the median price of the values in SalePrice.
median(housing.SalePrice)
ans = 160000
The median function works column-wise on matrices, so you can also calculate the medians of several
variables at once. Use curly brace subscripting to extract four numeric variables as one numeric
matrix. Then calculate the medians of the columns of the matrix.
median(housing{:,["LotFrontage", "LotArea" "TotalAboveGroundLivingArea" "SalePrice"]})
ans = 1×4
10^5 ×
This operation does not attach variable names or any other table metadata to the result. As an
alternative, you can use varfun to apply median to each variable in the table. With varfun, the
result is another table that contains separate numeric results and preserves the names.
varfun(@median,housing,"InputVariables",["LotFrontage", "LotArea" "TotalAboveGroundLivingArea" "S
ans=1×4 table
median_LotFrontage median_LotArea median_TotalAboveGroundLivingArea median_SalePrice
These two ways to get the medians are equivalent. There is a trade-off between having the variable
names preserved in another table and having the results in one numeric row vector. The way you pick
depends on what you plan to do with the result.
Using curly braces when calculating the medians has another drawback. Curly braces require
compatible data types for all the variables. That is, the data you extract from the variables must have
data types that allow them to be concatenated into one matrix. Ordinal categorical data can also
have median values. Because categorical and numeric arrays cannot be concatenated, this
operation results in an error.
But because varfun operates on each variable in the table separately, there is no requirement that
the variables have the same data type or compatible types allowing concatenation. The only
requirement is that all the variables must support the function that is applied. To calculate the
medians of ordinal categorical variables and numeric variables in one function call, use varfun.
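The contrast between the two approaches can be sketched on a hypothetical three-row table that mixes numeric variables with an ordinal categorical variable. The try-catch block records whether the curly brace extraction succeeds.

```matlab
% Hypothetical table with two numeric variables and one ordinal categorical.
T = table([60;80;70],[8450;9600;11250], ...
    categorical([5;7;6],1:10,"Ordinal",true), ...
    'VariableNames',["LotFrontage" "LotArea" "OverallCond"]);
vars = ["LotFrontage" "LotArea" "OverallCond"];

% Curly braces must concatenate the variables into one array first.
try
    m = median(T{:,vars});
    braceWorked = true;
catch
    braceWorked = false;
end

% varfun applies median to each variable separately.
medians = varfun(@median,T,"InputVariables",vars);
```

In the result, median_OverallCond is still an ordinal categorical value, while the other two medians are numeric.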
ans=1×5 table
median_LotFrontage median_LotArea median_OverallCond median_TotalAboveGroundLivingAr
__________________ ______________ __________________ _______________________________
See Also
categorical | table | readtable | varfun | renamevars | convertvars | summary |
ismissing | rmmissing | datetime | removevars | addvars | groupcounts
Related Examples
• “Access Data in Tables” on page 9-32
• “Clean Messy and Missing Data in Tables” on page 9-19
• “Add and Delete Table Rows” on page 9-9
• “Add, Delete, and Rearrange Table Variables” on page 9-12
• “Modify Units, Descriptions, and Table Variable Names” on page 9-24
• “Create Timetables” on page 10-2
• “Resample and Aggregate Data in Timetable” on page 10-5
• “Grouped Calculations in Tables and Timetables” on page 9-86
This example shows how to import nitrogen dioxide (NO2) data from the US Environmental
Protection Agency (EPA) into a table and do grouped calculations on this data. NO2 is one of the
Criteria Air Pollutants regulated under the US Clean Air Act. It is toxic by itself and is also a key
component of photochemical smog that results in ground-level ozone production. NO2 is produced
through high-temperature processes that can split nitrogen and oxygen gases and enable them to
recombine. Natural processes such as lightning contribute NO2 to the atmosphere, but so do human
activities such as combustion in automobile engines and power plants, as well as biomass burning. The
concentration of NO2 in the atmosphere is also influenced by the photochemical cycling between NO
and NO2, atmospheric transport, and ultimately oxidation to nitric acid, causing acid rain. Different
processes contribute NO2 to the atmosphere on different timescales, leading to daily (diurnal),
weekly, and annual cycles in its atmospheric concentration. Time-series analysis of such data relies
heavily on grouped calculations to examine different periodic behavior or to average the data over
time to smooth out high-frequency variability and reveal long-term trends.
The example first shows how to do preliminary data cleaning, including conversion of the table to a
timetable. Then it shows simple ways to group the data by one grouping variable and calculate annual
mean NO2 concentrations. It also shows how to group the NO2 data by two grouping variables
together, time and location, enabling calculations that find locations exceeding EPA standards at
various times. You can also group the NO2 data by time period to look for daily or yearly cycles.
Finally it shows how to apply a function that requires inputs from multiple table variables to find the
times at which the maximum NO2 concentrations occurred at each site.
First, import NO2 data from the Air Quality System (AQS) database maintained by the EPA. This data
consists of hourly measurements of NO2 concentrations from outdoor monitors across the United
States, Puerto Rico, and the U.S. Virgin Islands. It is stored as a set of zipped spreadsheets, one for
each year starting with 1980.
Download hourly NO2 measurements for the years 1985–1989. You can download and unzip the
compressed spreadsheets by using the unzip function. The result is a set of files in your current folder
with names such as hourly_42602_1985.csv. Here, 42602 is an EPA code for NO2. (Data from the
US Environmental Protection Agency. Air Quality System Data Mart available via
https://www.epa.gov/airdata. Accessed July 15, 2021.)
yrs = string(1985:1989);
urls = "https://aqs.epa.gov/aqsweb/airdata/hourly_42602_" + yrs + ".zip";
fnames = strings(numel(yrs),1);
for ii = 1:numel(yrs)
fnames(ii) = unzip(urls(ii));
end
fnames
"hourly_42602_1986.csv"
"hourly_42602_1987.csv"
"hourly_42602_1988.csv"
"hourly_42602_1989.csv"
Import data from the spreadsheets into a table. Start by creating an empty table. Then import data
from the spreadsheets, one by one, by using the readtable function and adding it to the table.
Create import options that help specify how readtable imports tabular data. To create import
options based on the contents of the spreadsheets, use the detectImportOptions function. Read
all the text data into table variables that store strings. You can also assign data types to individual
table variables. To specify that the TimeGMT and TimeLocal table variables store times as duration
arrays, use the setvaropts function.
NO2data = table;
opts = detectImportOptions(fnames(1),"TextType","string");
opts = setvaropts(opts,["TimeGMT","TimeLocal"],"Type","duration","InputFormat","hh:mm");
Import data from the spreadsheets by using the readtable function. You can vertically concatenate
the tables you read in so that all the data is in one large table.
The spreadsheets have column names, such as "Time GMT", that you cannot use as MATLAB
identifiers. As the warning messages indicate, readtable converts these names into table variable
names that are valid MATLAB identifiers, such as TimeGMT. When a table variable name is also a
valid MATLAB identifier, it is easier to access the variable by using dot notation, as in
NO2data.TimeGMT.
for ii = 1:numel(yrs)
NO2data = [NO2data; readtable(fnames(ii),opts)];
end
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names
Display NO2data. It has 24 variables storing NO2 sample measurements, site locations, state names,
times, and many other pieces of information.
NO2data
NO2data=11294497×24 table
StateCode CountyCode SiteNum ParameterCode POC Latitude Longitude Datum
_________ __________ _______ _____________ ___ ________ _________ ______
Next, prepare NO2data for analysis by cleaning the data. Data cleaning is the process of detecting
and correcting (or removing) parts of the data set that are either corrupt, inaccurate, or irrelevant.
You can also convert table variables so that they have data types that can be more convenient for
analysis, such as categorical or datetime arrays.
For example, the table variable SampleMeasurement has measurements of NO2 concentration.
Concentrations below the method detection limit (MDL) are unreliable. To exclude them from
analysis, find the rows where SampleMeasurement is below the MDL. Set those elements to NaN.
NO2data.SampleMeasurement(NO2data.SampleMeasurement < NO2data.MDL) = NaN;
Create a table that contains only the subset of variables that are relevant to this example. You can use
table subscripting to create a table that has all rows (specified by a colon) and only those variables
that you name.
NO2data = NO2data(:,["DateLocal","TimeLocal","SampleMeasurement","StateName","CountyName","SiteNu
NO2data=11294497×8 table
DateLocal TimeLocal SampleMeasurement StateName CountyName SiteNum Latitud
__________ _________ _________________ _________ __________ _______ _______
Combine the local date and time into a single timestamp. The new Timestamp table variable is a
datetime array. Delete the DateLocal and TimeLocal variables because they are now redundant.
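The code for this step is not shown here. As a minimal sketch on toy data, assuming DateLocal was imported as a datetime array and TimeLocal as a duration array, the combination could look like the following. The table T and its values are hypothetical stand-ins for NO2data.

```matlab
% Toy data standing in for the DateLocal and TimeLocal variables (hypothetical values).
T = table(datetime(1985,1,[1;1;2]),hours([0;1;0]), ...
    'VariableNames',["DateLocal","TimeLocal"]);

% Adding a duration to a datetime produces a datetime.
T.Timestamp = T.DateLocal + T.TimeLocal;
T.Timestamp.Format = "yyyy-MM-dd HH:mm";

% Remove the variables that are now redundant.
T = removevars(T,["DateLocal","TimeLocal"]);
```

Assigning an empty array, as in T.DateLocal = [], is another way to delete a table variable.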
To categorize the data later, convert the StateName and CountyName variables to categorical
arrays, first erasing space characters from the names. There are fixed sets of state and county names
in the data, which makes it convenient to create categories based on them.
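The conversion code is likewise not shown here. A minimal sketch, assuming the names were imported as string arrays, is given below. The table T and its values are hypothetical.

```matlab
% Toy data standing in for the StateName and CountyName variables (hypothetical values).
T = table(["New York";"North Carolina";"New York"],["Kings";"Wake";"Kings"], ...
    'VariableNames',["StateName","CountyName"]);

% Erase space characters, then convert the strings to categorical arrays.
T.StateName = categorical(erase(T.StateName," "));
T.CountyName = categorical(erase(T.CountyName," "));

% List the categories created from the unique state names.
categories(T.StateName)
```

Each unique name becomes one category, so repeated names such as the two "New York" rows share a category.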
Rename the SampleMeasurement variable to MeasuredNO2. One way to rename table variables is
by using the VariableNames property of the table.
NO2data.Properties.VariableNames("SampleMeasurement") = "MeasuredNO2";
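Another way to rename a table variable is the renamevars function. The sketch below uses a hypothetical one-variable table T rather than NO2data.

```matlab
% Toy table with a single variable (hypothetical values).
T = table([12;15],'VariableNames',"SampleMeasurement");

% Rename the variable; the data is unchanged.
T = renamevars(T,"SampleMeasurement","MeasuredNO2");
```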
Convert NO2data to a timetable. The datetime values in Timestamp are now row times that label
the rows of the timetable. The dates and times of the original table were in separate variables. To put
data like this into a timetable, it is more convenient to import the data as a table, and then
combine the separate date and time variables into one datetime variable. Then convert the modified
table by using the table2timetable function.
NO2data = table2timetable(NO2data)
NO2data=11294497×6 timetable
Timestamp MeasuredNO2 StateName CountyName SiteNum Latitude Long
____________________ ___________ _________ __________ _______ ________ ____
Given the size of the timetable, it is obvious that there are many thousands of hourly measurements
in every state. One way to calculate the number of measurements for each state is to sum the number
of rows that have a particular state as a category. For example, calculate the number of
measurements for Alaska, and then for Arizona.
numAlaska = sum(NO2data.StateName=="Alaska")
numAlaska = 7071
numArizona = sum(NO2data.StateName=="Arizona")
numArizona = 142793
It is tedious to perform this calculation multiple times or to store intermediate results in many
variables or subtables. Instead, MATLAB provides functions that group data in tables and apply
functions to each group in-place. For example, use the groupcounts function to group the data in
NO2data by the states in StateName and count the rows in each group. Instead of calling sum many
times, call groupcounts once.
NO2counts = groupcounts(NO2data,"StateName")
NO2counts=42×3 table
StateName GroupCount Percent
__________________ __________ ________
To sort the results in a table or timetable, use the sortrows function. Sort NO2counts by its
GroupCount variable from highest to lowest value.
sortedNO2counts = sortrows(NO2counts,"GroupCount","descend")
sortedNO2counts=42×3 table
StateName GroupCount Percent
_____________ __________ _______
To calculate other statistics, use the groupsummary function. For example, find the maximum NO2
concentration measured in each state.
NO2max = groupsummary(NO2data,"StateName","max","MeasuredNO2");
sortedNO2max = sortrows(NO2max,"max_MeasuredNO2","descend")
sortedNO2max=42×3 table
StateName GroupCount max_MeasuredNO2
____________ __________ _______________
As an alternative, you can use the varfun function with the "GroupingVariables" name-value
argument for grouped calculations. But the groupsummary function is simpler and performs most of
the same grouped calculations as varfun.
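As a small illustration of the two approaches, the following sketch computes the same grouped maximum both ways. The table T and its values are hypothetical.

```matlab
% Toy data standing in for NO2data (hypothetical values).
T = table(categorical(["A";"A";"B"]),[10;30;20], ...
    'VariableNames',["StateName","MeasuredNO2"]);

% Grouped maximum by using varfun with a grouping variable.
viaVarfun = varfun(@max,T,"GroupingVariables","StateName", ...
    "InputVariables","MeasuredNO2");

% The same grouped maximum by using groupsummary.
viaSummary = groupsummary(T,"StateName","max","MeasuredNO2");
```

Both outputs contain a max_MeasuredNO2 variable with one row per state.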
Functions such as groupcounts, groupsummary, and varfun work equally well on tables and
timetables. But timetables also provide the retime and synchronize functions, which can perform
time-based calculations by using their row times. You can group timetable data by time and perform
calculations on data within the time periods. The retime function is the best option for such cases.
For example, group the data in NO2data into yearly time periods. Find the maximum NO2
concentration for each year.
yearlyMaxNO2 = retime(NO2data(:,"MeasuredNO2"),"yearly","max")
yearlyMaxNO2=5×1 timetable
Timestamp MeasuredNO2
___________ ___________
01-Jan-1985 407.3
01-Jan-1986 500
01-Jan-1987 497
01-Jan-1988 743.5
01-Jan-1989 462
This calculation is useful if you have one time series. In this case, the data in the MeasuredNO2
variable come from multiple sites. A more useful analysis is to group by both year and site.
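The US EPA has two National Ambient Air Quality Standards (NAAQS) for NO2. A location is not in
compliance with the NAAQS if either:
• The annual mean concentration of NO2 exceeds 53 ppb.
• The 98th percentile of the 1-hour daily maximum concentrations of NO2, averaged over 3 years,
exceeds 100 ppb.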
Analyze data in NO2data to find locations that are not in compliance with the first standard, where
the annual mean exceeded 53 ppb. There are three different ways to approach this analysis. What the
three approaches have in common is that you can group the data by both time and site to calculate
annual means by site.
To find sites that do not comply with the NAAQS, calculate the mean value for each site for each year.
While NO2data does not include unique identifiers for the sites, you can use state names, county
names, and site numbers together to uniquely identify air quality sites.
The row times of NO2data are datetime values. Extract their year components and add a new
variable to NO2data named Year. Calculate the annual means for each site by using groupsummary
with StateName, CountyName, SiteNum, and Year as grouping variables.
NO2data.Year = year(NO2data.Timestamp);
meanNO2bySite = groupsummary(NO2data,["StateName","CountyName","SiteNum","Year"],"mean","Measured
meanNO2bySite=1585×6 table
StateName CountyName SiteNum Year GroupCount mean_MeasuredNO2
_________ ______________ _______ ____ __________ ________________
To find the sites that have the highest mean NO2, sort the table.
sortedMeanNO2bySite = sortrows(meanNO2bySite,"mean_MeasuredNO2","descend")
sortedMeanNO2bySite=1585×6 table
StateName CountyName SiteNum Year GroupCount mean_MeasuredNO2
__________ __________ _______ ____ __________ ________________
You can create a table that includes only those sites exceeding 53 ppb by using logical indexing.
Create a logical vector that indicates the rows where mean_MeasuredNO2 is greater than 53. Use
that vector as a subscript to get matching rows from meanNO2bySite.
exceeded53ppb = meanNO2bySite.mean_MeasuredNO2 > 53;
sitesExceed53ppb = meanNO2bySite(exceeded53ppb,:)
sitesExceed53ppb=19×6 table
StateName CountyName SiteNum Year GroupCount mean_MeasuredNO2
__________ __________ _______ ____ __________ ________________
Sometimes pivoting, or rearranging statistics calculated from tabular data, makes it easier to see and
analyze results, particularly when you look at the relationship between two grouping variables. For
example, you can create a pivot table for the annual mean NO2 by site. By pivoting, you can create a
table where every site lists annual mean NO2 in its own table variable, showing the relationship
between year and site. In MATLAB, you can create pivot tables by using the stack and unstack
functions, which stack and unstack table variables into taller or wider formats.
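To make the idea concrete, here is a minimal sketch of pivoting on toy data. The table T, its variable names, and its values are hypothetical.

```matlab
% Tall table: one row per site per year (hypothetical values).
T = table(categorical(["A";"A";"B";"B"]),[1985;1986;1985;1986],[10;12;20;22], ...
    'VariableNames',["Site","Year","MeanNO2"]);

% Pivot to a wide table: one row per year, one variable per site.
% Year is the only remaining variable, so it acts as the grouping variable.
P = unstack(T,"MeanNO2","Site");

% Reverse the operation: stack the site variables back into one tall variable.
S = stack(P,["A","B"],"NewDataVariableName","MeanNO2","IndexVariableName","Site");
```

Here P has the variables Year, A, and B, while S returns to one row per site per year.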
A complication in this case is that NO2data has three grouping variables that together uniquely
identify sites: state name, county name, and site number. To create a pivot table, first combine these
three table variables into one variable. Convert StateName, CountyName, and SiteNum into strings
and add them together. Replace spaces and dashes with underscores, and erase periods and
parentheses. The names in SiteID are unique site identifiers.
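The code that builds the identifiers is not shown here. A minimal sketch of the steps just described, using hypothetical site values rather than the actual NO2data variables, might look like this:

```matlab
% Toy values standing in for the three site variables (hypothetical).
StateName = categorical(["Alaska";"Florida";"Missouri"]);
CountyName = categorical(["KenaiPeninsula";"Miami-Dade";"St.Louis(City)"]);
SiteNum = [1004;27;85];

% Convert each variable to strings and join them with underscores.
siteID = string(StateName) + "_" + string(CountyName) + "_" + string(SiteNum);

% Replace spaces and dashes with underscores, then erase periods and parentheses.
siteID = replace(siteID,[" ","-"],"_");
siteID = erase(siteID,[".","(",")"]);
```

The cleaned-up strings contain only letters, digits, and underscores, so they can later serve as table variable names.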
Add SiteID to NO2data as a new table variable. Calculate annual means by using groupsummary,
but this time use SiteID as a grouping variable.
NO2data.SiteID = categorical(siteID);
meanNO2bySiteID = groupsummary(NO2data,["SiteID","Year"],"mean","MeasuredNO2")
meanNO2bySiteID=1585×4 table
SiteID Year GroupCount mean_MeasuredNO2
__________________________ ____ __________ ________________
To create a pivot table, use the unstack function. Each unique site in the SiteID variable of
meanNO2bySiteID becomes the name of a separate table variable in the output,
pivotedMeanNO2bySiteID, and has the annual means associated with that site. This unstacking
operation is how you can create a pivot table in MATLAB.
pivotedMeanNO2bySiteID = unstack(meanNO2bySiteID,"mean_MeasuredNO2","SiteID","GroupingVariable","
pivotedMeanNO2bySiteID=5×443 table
Year Alaska_KenaiPeninsula_1004 Arizona_Apache_10 Arizona_Apache_11 Arizona_Apach
This representation of the annual means by site has an advantage and a disadvantage.
• It is easier to look at the short five-year time series for each site. After unstacking, each site has
its own variable in pivotedMeanNO2bySiteID. You can easily compare sites to each other.
• It is harder to sort and pick out the largest values across the whole pivoted table. After
unstacking, pivotedMeanNO2bySiteID has 443 variables. The stacked version,
meanNO2bySite, has only six variables.
To group data in NO2data by year and another grouping variable, it was necessary to add Year as an
additional variable. Also, the output from groupsummary is a table even when the input is a
timetable. But suppose you want to keep the results in a timetable instead. The retime function can
also produce annual summaries. But it can group data only by time. To group data by site and by year,
rearrange NO2data so that you can call retime on a timetable where the NO2 concentrations are
already grouped by site.
Group the raw data in NO2data by site by using the unstack function. The output timetable has a
separate variable for each site. This timetable looks similar to a pivot table. But instead of having
means or some other statistic, NO2bySite has all the raw data. It is just reorganized. For further
convenience, sort the rows of the timetable by their row times so that the earliest timestamps come
first.
NO2bySite = unstack(NO2data,"MeasuredNO2","SiteID","GroupingVariable","Timestamp");
NO2bySite = sortrows(NO2bySite)
NO2bySite=43824×442 timetable
Timestamp Alaska_KenaiPeninsula_1004 Arizona_Apache_10 Arizona_Apache_11
____________________ __________________________ _________________ _________________
In this format you can easily plot the raw data by using the stackedplot function. This plot shows
NO2 concentrations for each site as a function of time.
stackedplot(NO2bySite)
To create a timetable that is also a pivot table, use retime to calculate annual means.
meanNO2bySiteTT = retime(NO2bySite,"yearly","mean")
meanNO2bySiteTT=5×442 timetable
Timestamp Alaska_KenaiPeninsula_1004 Arizona_Apache_10 Arizona_Apache_11 Arizon
___________ __________________________ _________________ _________________ ______
stackedplot(meanNO2bySiteTT)
You might want to preserve information from the original timetable NO2data in this timetable of
results. For example, you might want to add the latitudes and longitudes of the sites to NO2bySite.
They were stored for each timestamp in NO2data. But to store them more compactly in this
timetable, add them as per-variable custom properties to NO2bySite.
LatLon = groupsummary(NO2data,"SiteID","mode",["Latitude","Longitude"]);
NO2bySite = addprop(NO2bySite,["Latitude","Longitude"],["variable","variable"]);
NO2bySite.Properties.CustomProperties.Latitude(string(LatLon.SiteID)) = LatLon.mode_Latitude';
NO2bySite.Properties.CustomProperties.Longitude(string(LatLon.SiteID)) = LatLon.mode_Longitude';
Calculating compliance with the second NAAQS standard for NO2 requires a sequence of grouped
calculations. By the second standard, a location is out of compliance if the 98th percentile of the
1-hour daily maximum concentrations of NO2, averaged over 3 years, exceeds 100 ppb.
Start with the hourly concentrations of NO2 by site. To find the daily maximum for each site, use the
retime function, specifying "max" as the method to find the maximum concentration for each day's
worth of data. Then find the 98th percentiles of the daily maximums in each year's worth of data,
calling retime a second time. To calculate percentiles, use the findPrctile supporting function
referred to in this example.
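The findPrctile supporting function is not reproduced in this section. As a stand-in for experimentation, a nearest-rank percentile that ignores NaN values could be sketched as follows. The name nearestRankPrctile and the nearest-rank definition are assumptions; the example's own function may compute percentiles differently, for instance by interpolating.

```matlab
function y = nearestRankPrctile(x,p)
% nearestRankPrctile  Hypothetical stand-in for a percentile helper.
% Returns the p-th percentile of x by the nearest-rank method, ignoring NaNs.
x = sort(x(~isnan(x)));
if isempty(x)
    % No valid measurements in this group.
    y = NaN;
else
    % Rank of the percentile, clamped so that it is at least 1.
    k = max(1,ceil(p*numel(x)/100));
    y = x(k);
end
end
```

Saved in its own file, this function could be passed to retime in the same way, as in @(x)nearestRankPrctile(x,98).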
dailyMax = retime(NO2bySite,"daily","max");
yearlyP98 = retime(dailyMax,"yearly",@(x)findPrctile(x,98))
yearlyP98=5×442 timetable
Timestamp Alaska_KenaiPeninsula_1004 Arizona_Apache_10 Arizona_Apache_11 Arizon
01-Jan-1985 NaN 27 29
01-Jan-1986 NaN 13 14
Next, calculate a moving mean for each site, specifying a three-year window for the moving mean. The
smoothdata function enables you to apply the movmean function to each variable in yearlyP98.
moving3yearAvg = smoothdata(yearlyP98,"movmean",[years(3) 0])
moving3yearAvg=5×442 timetable
Timestamp Alaska_KenaiPeninsula_1004 Arizona_Apache_10 Arizona_Apache_11 Arizon
___________ __________________________ _________________ _________________ ______
01-Jan-1985 NaN 27 29
01-Jan-1986 NaN 20 21.5
Display sites that are out of compliance. First specify a time range starting in 1987, the first year for
which the moving three-year window has three full years of data.
full3years = timerange("1987-01-01","1989-01-01","closed")
full3years =
timetable timerange subscript: