Supercharged Python - Take Your Code To The Next Level


From the Library of Vineeth Babu

Supercharged Python



Supercharged Python

Brian Overland
John Bennett

Boston • Columbus • New York • San Francisco • Amsterdam • Cape Town


Dubai • London • Madrid • Milan • Munich • Paris • Montreal • Toronto • Delhi • Mexico City
São Paulo • Sydney • Hong Kong • Seoul • Singapore • Taipei • Tokyo



Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and the publisher was
aware of a trademark claim, the designations have been printed with initial capital letters or
in all capitals.
The authors and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or
omissions. No liability is assumed for incidental or consequential damages in connection with or
arising out of the use of the information or programs contained herein.
For information about buying this title in bulk quantities, or for special sales opportunities
(which may include electronic versions; custom cover designs; and content particular to your
business, training goals, marketing focus, or branding interests), please contact our corporate
sales department at corpsales@[Link] or (800) 382-3419.
For government sales inquiries, please contact governmentsales@[Link].
For questions about sales outside the U.S., please contact intlcs@[Link].
Visit us on the Web: [Link]/aw
Library of Congress Control Number: 2019936408
Copyright © 2019 Pearson Education, Inc.
Cover illustration: Open Studio/Shutterstock
All rights reserved. This publication is protected by copyright, and permission must be obtained
from the publisher prior to any prohibited reproduction, storage in a retrieval system, or
transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise.
For information regarding permissions, request forms and the appropriate contacts within the
Pearson Education Global Rights & Permissions Department, please visit [Link]/permissions/.
ISBN-13: 978-0-13-515994-1
ISBN-10: 0-13-515994-6
1 19



To my beautiful and brilliant mother, Betty P. M. Overland. . . .
All the world is mad except for me and thee. Stay a little.
—Brian

To my parents, who did so much to shape who I am.


—John



Contents
Preface xxiii
What Makes Python Special? xxiii
Paths to Learning: Where Do I Start? xxiv
Clarity and Examples Are Everything xxiv
Learning Aids: Icons xxv
What You’ll Learn xxvi
Have Fun xxvi

Acknowledgments xxvii

About the Authors xxix

Chapter 1 Review of the Fundamentals 1


1.1 Python Quick Start 1
1.2 Variables and Naming Names 4
1.3 Combined Assignment Operators 4
1.4 Summary of Python Arithmetic Operators 5
1.5 Elementary Data Types: Integer and Floating Point 6
1.6 Basic Input and Output 7
1.7 Function Definitions 9
1.8 The Python “if” Statement 11
1.9 The Python “while” Statement 12
1.10 A Couple of Cool Little Apps 14


1.11 Summary of Python Boolean Operators 15


1.12 Function Arguments and Return Values 16
1.13 The Forward Reference Problem 19
1.14 Python Strings 19
1.15 Python Lists (and a Cool Sorting App) 21
1.16 The “for” Statement and Ranges 23
1.17 Tuples 25
1.18 Dictionaries 26
1.19 Sets 28
1.20 Global and Local Variables 29
Summary 31
Review Questions 31
Suggested Problems 32

Chapter 2 Advanced String Capabilities 33


2.1 Strings Are Immutable 33
2.2 Numeric Conversions, Including Binary 34
2.3 String Operators (+, =, *, >, etc.) 36
2.4 Indexing and Slicing 39
2.5 Single-Character Functions (Character Codes) 42
2.6 Building Strings Using “join” 44
2.7 Important String Functions 46
2.8 Binary, Hex, and Octal Conversion Functions 47
2.9 Simple Boolean (“is”) Methods 48
2.10 Case Conversion Methods 49
2.11 Search-and-Replace Methods 50
2.12 Breaking Up Input Using “split” 53
2.13 Stripping 54
2.14 Justification Methods 55
Summary 56
Review Questions 57
Suggested Problems 57


Chapter 3 Advanced List Capabilities 59


3.1 Creating and Using Python Lists 59
3.2 Copying Lists Versus Copying List Variables 61
3.3 Indexing 61
3.3.1 Positive Indexes 62
3.3.2 Negative Indexes 63
3.3.3 Generating Index Numbers Using “enumerate” 63
3.4 Getting Data from Slices 64
3.5 Assigning into Slices 67
3.6 List Operators 67
3.7 Shallow Versus Deep Copying 69
3.8 List Functions 71
3.9 List Methods: Modifying a List 73
3.10 List Methods: Getting Information on Contents 75
3.11 List Methods: Reorganizing 75
3.12 Lists as Stacks: RPN Application 78
3.13 The “reduce” Function 81
3.14 Lambda Functions 83
3.15 List Comprehension 84
3.16 Dictionary and Set Comprehension 87
3.17 Passing Arguments Through a List 89
3.18 Multidimensional Lists 90
3.18.1 Unbalanced Matrixes 91
3.18.2 Creating Arbitrarily Large Matrixes 91
Summary 93
Review Questions 93
Suggested Problems 94

Chapter 4 Shortcuts, Command Line, and Packages 95


4.1 Overview 95
4.2 Twenty-Two Programming Shortcuts 95
4.2.1 Use Python Line Continuation as Needed 96
4.2.2 Use “for” Loops Intelligently 97
4.2.3 Understand Combined Operator Assignment (+= etc.) 98


4.2.4 Use Multiple Assignment 100


4.2.5 Use Tuple Assignment 101
4.2.6 Use Advanced Tuple Assignment 102
4.2.7 Use List and String “Multiplication” 104
4.2.8 Return Multiple Values 105
4.2.9 Use Loops and the “else” Keyword 106
4.2.10 Take Advantage of Boolean Values and “not” 107
4.2.11 Treat Strings as Lists of Characters 107
4.2.12 Eliminate Characters by Using “replace” 108
4.2.13 Don’t Write Unnecessary Loops 108
4.2.14 Use Chained Comparisons (n < x < m) 108
4.2.15 Simulate “switch” with a Table of Functions 109
4.2.16 Use the “is” Operator Correctly 110
4.2.17 Use One-Line “for” Loops 111
4.2.18 Squeeze Multiple Statements onto a Line 112
4.2.19 Write One-Line if/then/else Statements 112
4.2.20 Create Enum Values with “range” 113
4.2.21 Reduce the Inefficiency of the “print” Function Within IDLE 114
4.2.22 Place Underscores Inside Large Numbers 115
4.3 Running Python from the Command Line 115
4.3.1 Running on a Windows-Based System 115
4.3.2 Running on a Macintosh System 116
4.3.3 Using pip or pip3 to Download Packages 117
4.4 Writing and Using Doc Strings 117
4.5 Importing Packages 119
4.6 A Guided Tour of Python Packages 121
4.7 Functions as First-Class Objects 123
4.8 Variable-Length Argument Lists 125
4.8.1 The *args List 125
4.8.2 The “**kwargs” List 127
4.9 Decorators and Function Profilers 128
4.10 Generators 132
4.10.1 What’s an Iterator? 132
4.10.2 Introducing Generators 133
4.11 Accessing Command-Line Arguments 138
Summary 141
Questions for Review 142
Suggested Problems 142


Chapter 5 Formatting Text Precisely 145


5.1 Formatting with the Percent Sign Operator (%) 145
5.2 Percent Sign (%) Format Specifiers 147
5.3 Percent Sign (%) Variable-Length Print Fields 150
5.4 The Global “format” Function 152
5.5 Introduction to the “format” Method 156
5.6 Ordering by Position (Name or Number) 158
5.7 “Repr” Versus String Conversion 161
5.8 The “spec” Field of the “format” Function and Method 162
5.8.1 Print-Field Width 163
5.8.2 Text Justification: “fill” and “align” Characters 164
5.8.3 The “sign” Character 166
5.8.4 The Leading-Zero Character (0) 167
5.8.5 Thousands Place Separator 168
5.8.6 Controlling Precision 170
5.8.7 “Precision” Used with Strings (Truncation) 172
5.8.8 “Type” Specifiers 173
5.8.9 Displaying in Binary Radix 174
5.8.10 Displaying in Octal and Hex Radix 174
5.8.11 Displaying Percentages 175
5.8.12 Binary Radix Example 176
5.9 Variable-Size Fields 176
Summary 178
Review Questions 179
Suggested Problems 179

Chapter 6 Regular Expressions, Part I 181


6.1 Introduction to Regular Expressions 181
6.2 A Practical Example: Phone Numbers 183
6.3 Refining Matches 185
6.4 How Regular Expressions Work: Compiling Versus Running 188
6.5 Ignoring Case, and Other Function Flags 192
6.6 Regular Expressions: Basic Syntax Summary 193
6.6.1 Meta Characters 194
6.6.2 Character Sets 195


6.6.3 Pattern Quantifiers 197


6.6.4 Backtracking, Greedy, and Non-Greedy 199
6.7 A Practical Regular-Expression Example 200
6.8 Using the Match Object 203
6.9 Searching a String for Patterns 205
6.10 Iterative Searching (“findall”) 206
6.11 The “findall” Method and the Grouping Problem 208
6.12 Searching for Repeated Patterns 210
6.13 Replacing Text 211
Summary 213
Review Questions 213
Suggested Problems 214

Chapter 7 Regular Expressions, Part II 215


7.1 Summary of Advanced RegEx Grammar 215
7.2 Noncapture Groups 217
7.2.1 The Canonical Number Example 217
7.2.2 Fixing the Tagging Problem 218
7.3 Greedy Versus Non-Greedy Matching 219
7.4 The Look-Ahead Feature 224
7.5 Checking Multiple Patterns (Look-Ahead) 227
7.6 Negative Look-Ahead 229
7.7 Named Groups 231
7.8 The “[Link]” Function 234
7.9 The Scanner Class and the RPN Project 236
7.10 RPN: Doing Even More with Scanner 239
Summary 243
Review Questions 243
Suggested Problems 244

Chapter 8 Text and Binary Files 245


8.1 Two Kinds of Files: Text and Binary 245
8.1.1 Text Files 246
8.1.2 Binary Files 246

8.2 Approaches to Binary Files: A Summary 247
8.3 The File/Directory System 248
8.4 Handling File-Opening Exceptions 249
8.5 Using the “with” Keyword 252
8.6 Summary of Read/Write Operations 252
8.7 Text File Operations in Depth 254
8.8 Using the File Pointer (“seek”) 257
8.9 Reading Text into the RPN Project 258
8.9.1 The RPN Interpreter to Date 258
8.9.2 Reading RPN from a Text File 260
8.9.3 Adding an Assignment Operator to RPN 262
8.10 Direct Binary Read/Write 268
8.11 Converting Data to Fixed-Length Fields (“struct”) 269
8.11.1 Writing and Reading One Number at a Time 272
8.11.2 Writing and Reading Several Numbers at a Time 272
8.11.3 Writing and Reading a Fixed-Length String 273
8.11.4 Writing and Reading a Variable-Length String 274
8.11.5 Writing and Reading Strings and Numerics Together 275
8.11.6 Low-Level Details: Big Endian Versus Little Endian 276
8.12 Using the Pickling Package 278
8.13 Using the “shelve” Package 280
Summary 282
Review Questions 283
Suggested Problems 283

Chapter 9 Classes and Magic Methods 285


9.1 Classes and Objects: Basic Syntax 285
9.2 More About Instance Variables 287
9.3 The “_ _init_ _” and “_ _new_ _” Methods 288
9.4 Classes and the Forward Reference Problem 289
9.5 Methods Generally 290
9.6 Public and Private Variables and Methods 292
9.7 Inheritance 293
9.8 Multiple Inheritance 294
9.9 Magic Methods, Summarized 295


9.10 Magic Methods in Detail 297


9.10.1 String Representation in Python Classes 297
9.10.2 The Object Representation Methods 298
9.10.3 Comparison Methods 300
9.10.4 Arithmetic Operator Methods 304
9.10.5 Unary Arithmetic Methods 308
9.10.6 Reflection (Reverse-Order) Methods 310
9.10.7 In-Place Operator Methods 312
9.10.8 Conversion Methods 314
9.10.9 Collection Class Methods 316
9.10.10 Implementing “_ _iter_ _” and “_ _next_ _” 319
9.11 Supporting Multiple Argument Types 320
9.12 Setting and Getting Attributes Dynamically 322
Summary 323
Review Questions 324
Suggested Problems 325

Chapter 10 Decimal, Money, and Other Classes 327


10.1 Overview of Numeric Classes 327
10.2 Limitations of Floating-Point Format 328
10.3 Introducing the Decimal Class 329
10.4 Special Operations on Decimal Objects 332
10.5 A Decimal Class Application 335
10.6 Designing a Money Class 336
10.7 Writing the Basic Money Class (Containment) 337
10.8 Displaying Money Objects (“_ _str_ _”, “_ _repr_ _”) 338
10.9 Other Monetary Operations 339
10.10 Demo: A Money Calculator 342
10.11 Setting the Default Currency 345
10.12 Money and Inheritance 347
10.13 The Fraction Class 349
10.14 The Complex Class 353
Summary 357
Review Questions 357
Suggested Problems 358


Chapter 11 The Random and Math Packages 359


11.1 Overview of the Random Package 359
11.2 A Tour of Random Functions 360
11.3 Testing Random Behavior 361
11.4 A Random-Integer Game 363
11.5 Creating a Deck Object 365
11.6 Adding Pictograms to the Deck 368
11.7 Charting a Normal Distribution 370
11.8 Writing Your Own Random-Number Generator 374
11.8.1 Principles of Generating Random Numbers 374
11.8.2 A Sample Generator 374
11.9 Overview of the Math Package 376
11.10 A Tour of Math Package Functions 376
11.11 Using Special Values (pi) 377
11.12 Trig Functions: Height of a Tree 378
11.13 Logarithms: Number Guessing Revisited 381
11.13.1 How Logarithms Work 381
11.13.2 Applying a Logarithm to a Practical Problem 382
Summary 385
Review Questions 385
Suggested Problems 386

Chapter 12 The “numpy” (Numeric Python) Package 387


12.1 Overview of the “array,” “numpy,” and “matplotlib” Packages 387
12.1.1 The “array” Package 387
12.1.2 The “numpy” Package 387
12.1.3 The “[Link]” Package 388
12.1.4 The “matplotlib” Package 388
12.2 Using the “array” Package 388
12.3 Downloading and Importing “numpy” 390
12.4 Introduction to “numpy”: Sum 1 to 1 Million 391
12.5 Creating “numpy” Arrays 392
12.5.1 The “array” Function (Conversion to an Array) 394
12.5.2 The “arange” Function 396


12.5.3 The “linspace” Function 396


12.5.4 The “empty” Function 397
12.5.5 The “eye” Function 398
12.5.6 The “ones” Function 399
12.5.7 The “zeros” Function 400
12.5.8 The “full” Function 401
12.5.9 The “copy” Function 402
12.5.10 The “fromfunction” Function 403
12.6 Example: Creating a Multiplication Table 405
12.7 Batch Operations on “numpy” Arrays 406
12.8 Ordering a Slice of “numpy” 410
12.9 Multidimensional Slicing 412
12.10 Boolean Arrays: Mask Out That “numpy”! 415
12.11 “numpy” and the Sieve of Eratosthenes 417
12.12 Getting “numpy” Stats (Standard Deviation) 419
12.13 Getting Data on “numpy” Rows and Columns 424
Summary 429
Review Questions 429
Suggested Problems 430

Chapter 13 Advanced Uses of “numpy” 431


13.1 Advanced Math Operations with “numpy” 431
13.2 Downloading “matplotlib” 434
13.3 Plotting Lines with “numpy” and “matplotlib” 435
13.4 Plotting More Than One Line 441
13.5 Plotting Compound Interest 444
13.6 Creating Histograms with “matplotlib” 446
13.7 Circles and the Aspect Ratio 452
13.8 Creating Pie Charts 455
13.9 Doing Linear Algebra with “numpy” 456
13.9.1 The Dot Product 456
13.9.2 The Outer-Product Function 460
13.9.3 Other Linear Algebra Functions 462
13.10 Three-Dimensional Plotting 463
13.11 “numpy” Financial Applications 464

13.12 Adjusting Axes with “xticks” and “yticks” 467
13.13 “numpy” Mixed-Data Records 469
13.14 Reading and Writing “numpy” Data from Files 471
Summary 475
Review Questions 475
Suggested Problems 476

Chapter 14 Multiple Modules and the RPN Example 477


14.1 Overview of Modules in Python 477
14.2 Simple Two-Module Example 478
14.3 Variations on the “import” Statement 482
14.4 Using the “_ _all_ _” Symbol 484
14.5 Public and Private Module Variables 487
14.6 The Main Module and “_ _main_ _” 488
14.7 Gotcha! Problems with Mutual Importing 490
14.8 RPN Example: Breaking into Two Modules 493
14.9 RPN Example: Adding I/O Directives 496
14.10 Further Changes to the RPN Example 499
14.10.1 Adding Line-Number Checking 500
14.10.2 Adding Jump-If-Not-Zero 502
14.10.3 Greater-Than (>) and Get-Random-Number (!) 504
14.11 RPN: Putting It All Together 508
Summary 513
Review Questions 514
Suggested Problems 514

Chapter 15 Getting Financial Data off the Internet 517


15.1 Plan of This Chapter 517
15.2 Introducing the Pandas Package 518
15.3 “stock_load”: A Simple Data Reader 519
15.4 Producing a Simple Stock Chart 521
15.5 Adding a Title and Legend 524
15.6 Writing a “makeplot” Function (Refactoring) 525


15.7 Graphing Two Stocks Together 527


15.8 Variations: Graphing Other Data 530
15.9 Limiting the Time Period 534
15.10 Split Charts: Subplot the Volume 536
15.11 Adding a Moving-Average Line 538
15.12 Giving Choices to the User 540
Summary 544
Review Questions 545
Suggested Problems 545

Appendix A Python Operator Precedence Table 547

Appendix B Built-In Python Functions 549


abs(x) 550
all(iterable) 550
any(iterable) 550
ascii(obj) 551
bin(n) 551
bool(obj) 551
bytes(source, encoding) 552
callable(obj) 552
chr(n) 552
compile(cmd_str, filename, mode_str, flags=0, dont_inherit=False, optimize=-1) 553
complex(real=0, imag=0) 553
complex(complex_str) 554
delattr(obj, name_str) 555
dir([obj]) 555
divmod(a, b) 556
enumerate(iterable, start=0) 556
eval(expr_str [, globals [, locals]] ) 557
exec(object [, global [, locals]]) 558
filter(function, iterable) 558
float([x]) 559
format(obj, [format_spec]) 559
frozenset([iterable]) 560
getattr(obj, name_str [,default]) 560

globals() 560
hasattr(obj, name_str) 561
hash(obj) 561
help([obj]) 561
hex(n) 561
id(obj) 561
input([prompt_str]) 562
int(x, base=10) 562
int() 562
isinstance(obj, class) 562
issubclass(class1, class2) 563
iter(obj) 563
len(sequence) 564
list([iterable]) 564
locals() 565
map(function, iterable1 [, iterable2…]) 565
max(arg1 [, arg2]…) 566
max(iterable) 566
min(arg1 [, arg2]…) 566
min(iterable) 567
oct(n) 567
open(file_name_str, mode='rt') 567
ord(char_str) 568
pow(x, y [, z]) 569
print(objects, sep=' ', end='\n', file=[Link]) 569
range(n) 570
range(start, stop [, step]) 570
repr(obj) 570
reversed(iterable) 571
round(x [,ndigits]) 571
set([iterable]) 572
setattr(obj, name_str, value) 573
sorted(iterable [, key] [, reverse]) 573
str(obj='') 573
str(obj=b'' [, encoding='utf-8']) 574
sum(iterable [, start]) 574
super(type) 575
tuple([iterable]) 575
type(obj) 575
zip(*iterables) 575


Appendix C Set Methods 577


set_obj.add(obj) 577
set_obj.clear() 578
set_obj.copy() 578
set_obj.difference(other_set) 578
set_obj.difference_update(other_set) 578
set_obj.discard(obj) 579
set_obj.intersection(other_set) 579
set_obj.intersection_update(other_set) 579
set_obj.isdisjoint(other_set) 579
set_obj.issubset(other_set) 579
set_obj.issuperset(other_set) 580
set_obj.pop() 580
set_obj.remove(obj) 580
set_obj.symmetric_difference(other_set) 580
set_obj.symmetric_difference_update(other_set) 581
set_obj.union(other_set) 581
set_obj.union_update(other_set) 581

Appendix D Dictionary Methods 583


dict_obj.clear() 583
dict_obj.copy() 584
dict_obj.get(key_obj, default_val = None) 584
dict_obj.items() 585
dict_obj.keys() 585
dict_obj.pop(key [, default_value]) 585
dict_obj.popitem() 585
dict_obj.setdefault(key, default_value=None) 586
dict_obj.values() 586
dict_obj.update(sequence) 586

Appendix E Statement Reference 587


Variables and Assignments 587
Spacing Issues in Python 589
Alphabetical Statement Reference 590
assert Statement 590
break Statement 591
class Statement 591

continue Statement 593
def Statement 594
del Statement 594
elif Clause 595
else Clause 595
except Clause 595
for Statement 595
global Statement 596
if Statement 597
import Statement 598
nonlocal Statement 598
pass Statement 599
raise Statement 599
return Statement 599
try Statement 600
while Statement 602
with Statement 602
yield Statement 603

Index 605



Preface
Books on Python aimed at the absolute beginner have become a cottage
industry these days. Everyone and their dog, it seems, wants to chase the
Python.
We’re a little biased, but one book we especially recommend is Python
Without Fear. It takes you by the hand and explains the major features one
at a time. But what do you do after you know a little of the language but not
enough to call yourself an “expert”? How do you learn enough to get a job or
to write major applications?
That’s what this book is for: to be the second book you ever buy on Python
and possibly the last.

What Makes Python Special?


It’s safe to say that many people are attracted to Python because it looks
easier than C++. That may be (at least in the beginning), but underneath this
so-called easy language is a tool of great power, with many shortcuts and
software libraries called “packages” that, in some cases, do most of the work
for you. These let you create some really impressive software, outputting
beautiful graphs and manipulating large amounts of data.
For most people, it may take years to learn all the shortcuts and advanced
features. This book is written for people who want to get that knowledge now,
to get closer to being a Python expert much faster.


Paths to Learning: Where Do I Start?


This book offers different learning paths for different people.

◗ You’re rusty: If you’ve dabbled in Python but you’re a little rusty, you may
want to take a look at Chapter 1, “Review of the Fundamentals.” Otherwise,
you may want to skip Chapter 1 or only take a brief look at it.
◗ You know the basics but are still learning: Start with Chapters 2 and 3, which
survey the abilities of strings and lists. This survey includes some advanced abilities
of these data structures that people often miss the first time they learn Python.
◗ Your understanding of Python is strong, but you don’t know everything yet:
Start with Chapter 4, which lists 22 programming shortcuts unique to Python
that most people take a long time to learn fully.
◗ You want to master special features: You can start in an area of specialty. For
example, Chapters 5, 6, and 7 deal with text formatting and regular
expressions. The two chapters on regular expression syntax, Chapters 6 and 7, start
with the basics but then cover the finer points of this pattern-matching
technology. Other chapters deal with other specialties. For example, Chapter 8
describes the different ways of handling text and binary files.
◗ You want to learn advanced math and plotting software: If you want to do
plotting, financial, or scientific applications, start with Chapter 12, “The ‘numpy’
(Numeric Python) Package.” This is the basic package that underlies many of
the higher-level capabilities described in Chapters 13 through 15.

Clarity and Examples Are Everything


Even with advanced technology, our emphasis is on clarity, short examples,
more clarity, and more examples. We emphasize an interactive approach,
especially with the use of the IDLE environment, encouraging you to type in
statements and see what they do. Text in bold represents lines for you to type
in, or to be added or changed.
>>> print('Hello', 'my', 'world!')
Hello my world!
Several of the applications in this book are advanced pieces of software,
including a Deck object, a fully functional “RPN” language interpreter, and a
multifaceted stock-market program that presents the user with many choices.
With these applications, we start with simple examples in the beginning,
finally showing all the pieces in context. This approach differs from many
books, which give you dozens of functions all out of order, with no sense of
architecture. In this book, architecture is everything.
You can download examples from [Link]/books.

Learning Aids: Icons


This book makes generous use of tables for ease of reference, as well as
conceptual art (figures). Our experience is that while poorly conceived figures can
be a distraction, the best figures can be invaluable. A picture is worth a
thousand words. Sometimes, more.
We also believe that in discussing plotting and graphics software, there’s no
substitute for showing all the relevant screen shots.
The book itself uses a few important typographical devices. There are
three special icons used in the text.

Note: We sometimes use Notes to point out facts you’ll eventually want to know
but that diverge from the main discussion. You might want to skip over Notes the
first time you read a section, but it’s a good idea to go back later and read them.

Key Syntax: The Key Syntax Icon introduces general syntax displays, into which you supply
some or all of the elements. These elements are called “placeholders,” and they
appear in italics. Some of the syntax, especially keywords and punctuation,
is in bold and intended to be typed in as shown. Finally, square brackets,
when not in bold, indicate an optional item. For example:
set([iterable])
This syntax display implies that iterable is an iterable object (such as a
list or a generator object) that you supply. And it’s optional.
Square brackets, when in bold, are intended literally, to be typed in as
shown. For example:
list_name = [obj1, obj2, obj3, …]
Ellipses (…) indicate a language element that can be repeated any number
of times.
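As a brief illustration of how to read the set([iterable]) syntax display, here is a minimal sketch showing the call with and without its optional argument, followed by a list built with literal square brackets. The variable names are ours, chosen only for this example.

```python
# The optional placeholder omitted: set() with no argument gives an empty set.
empty = set()

# The placeholder supplied as a list: duplicates collapse into one element.
from_list = set([1, 2, 2, 3])

# The placeholder supplied as a generator object.
from_generator = set(n * n for n in range(4))

# Literal square brackets: these are typed exactly as shown to build a list.
list_name = [10, 20, 30]

print(len(empty))                        # 0
print(from_list == {1, 2, 3})            # True
print(from_generator == {0, 1, 4, 9})    # True
print(list_name)                         # [10, 20, 30]
```

The comparisons with set literals are used instead of printing the sets directly, because the display order of set elements should not be relied on.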

Performance Tip: Performance tips are like Notes in that they constitute a short digression
from the rest of the chapter. These tips address the question of how you
can improve software performance. If you’re interested in that topic, you’ll
want to pay special attention to these notes.


What You’ll Learn


The list of topics in this book that are not in Python Without Fear or other
“beginner” texts is a long one, but here is a partial list of some of the major
areas:

◗ List, set, and dictionary comprehension.


◗ Regular expressions and advanced formatting techniques; how to use them in
lexical analysis.
◗ Packages: the use of Python’s advanced numeric and plotting software. Also,
special types such as Decimal and Fraction.
◗ Mastering all the ways of using binary file operations in Python, as well as
text operations.
◗ How to use multiple modules in Python while avoiding the “gotchas.”
◗ Fine points of object-oriented programming, especially all the “magic
methods,” their quirks, their special features, and their uses.

Have Fun
When you master some or all of the techniques of this book, you should make
a delightful discovery: Python often enables you to do a great deal with a
relatively small amount of code. That’s why it’s dramatically increasing in
popularity every day. Because Python is not just a time-saving device, it’s fun to be
able to program this way . . . to see a few lines of code do so much.
We wish you the joy of that discovery.

Register your copy of Supercharged Python on the InformIT site for
convenient access to updates and/or corrections as they become available. To start
the registration process, go to [Link]/register and log in or create
an account. Enter the product ISBN (9780135159941) and click Submit.
Look on the Registered Products tab for an Access Bonus Content link
next to this product, and follow that link to access any available bonus
materials. If you would like to be notified of exclusive offers on new
editions and updates, please check the box to receive email from us.



Acknowledgments
From Brian
I want to thank my coauthor, John Bennett. This book is the result of close
collaboration between the two of us over half a year, in which John was there
every step of the way to contribute ideas, content, and sample code, so his
presence is there throughout the book. I also want to thank Greg Doench,
acquisitions editor, who was a driving force behind the concept, purpose, and
marketing of this book.
This book also had a wonderful supporting editorial team, including
Rachel Paul and Julie Nahil. But I want to especially thank copy editor Betsy
Hardinger, who showed exceptional competence, cooperation, and
professionalism in getting the book ready for publication.

From John
I want to thank my coauthor, Brian Overland, for inviting me to join him on
this book. This allows me to pass on many of the things I had to work hard to
find documentation for or figure out by brute-force experimentation.
Hopefully this will save readers a lot of work dealing with the problems I ran into.



About the Authors
Brian Overland started as a professional programmer back in his twenties,
but also worked as a computer science, English, and math tutor. He enjoys
picking up new languages, but his specialty is explaining them to others, as
well as using programming to do games, puzzles, simulations, and math
problems. Now he’s the author of over a dozen books on programming.
In his ten years at Microsoft he was a software tester, programmer/writer,
and manager, but his greatest achievement was in presenting Visual Basic 1.0,
as lead writer and overall documentation project lead. He believes that project
changed the world by getting people to develop for Windows, and one of the
keys to its success was showing it could be fun and easy.
He’s also a playwright and actor, which has come in handy as an instructor
in online classes. As a novelist, he’s twice been a finalist in the Pacific
Northwest Literary Contest but is still looking for a publisher.

John Bennett was a senior software engineer at Proximity Technology, Franklin
Electronic Publishing, and Microsoft Corporation. More recently, he’s
developed new programming languages using Python as a prototyping tool. He
holds nine U.S. patents, and his projects include a handheld spell checker and
East Asian handwriting recognition software.



1 Review of the Fundamentals
You and Python could be the start of a beautiful friendship. You may have
heard that Python is easy to use, that it makes you productive fast. It’s true.
You may also find that it’s fun. You can start programming without worrying
about elaborate setups or declarations.
Although this book was written primarily for people who’ve already had
an introduction to Python, this chapter can be your starting point to an excit-
ing new journey. To download Python, go to [Link].
If you’re familiar with all the basic concepts in Python, you can skip this
chapter. You might want to take a look at the global statement at the end of
this chapter, however, if you’re not familiar with it. Many people fail to under-
stand this keyword.

1.1 Python Quick Start


Start the Python interactive development environment (IDLE). At the prompt,
you can enter statements, which are executed; and expressions, which Python
evaluates and prints the value of.
You can follow along with this sample session, which shows input for
you to enter in bold. The nonbold characters represent text printed by the
environment.
>>> a = 10
>>> b = 20
>>> c = 30
>>> a + b + c
60
This “program” places the values 10, 20, and 30 into three variables and
adds them together. So far, so good, but not amazing.


If it helps you in the beginning, you can think of variables as storage loca-
tions into which to place values, even though that’s not precisely what Python
does.
What Python really does is make a, b, and c into names for the values 10,
20, and 30. By this we mean “names” in the ordinary sense of the word. These
names are looked up in a symbol table; they do not correspond to fixed places
in memory! The difference doesn’t matter now, but it will later, when we get
to functions and global variables. These statements, which create a, b, and c
as names, are assignments.
In any case, you can assign new values to a variable once it’s created. So
in the following example, it looks as if we’re incrementing a value stored in a
magical box (even though we’re really not doing that).
>>> n = 5
>>> n = n + 1
>>> n = n + 1
>>> n
7
What’s really going on is that we’re repeatedly reassigning n as a name for
an increasingly higher value. Each time, the old association is broken and n
refers to a new value.
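The following short sketch illustrates this name-binding behavior. The names n and m are chosen here only for illustration: a second name keeps referring to the old value after the first name is reassigned.

```python
# Two names can refer to the same value.
n = 5
m = n          # m is now another name for the value 5
n = n + 1      # n is rebound to a new value, 6; m is untouched

print(n)
print(m)
```

If variables were fixed storage boxes shared by both names, you might expect m to change as well; it does not.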
Assignments create variables, and you can’t use a variable name that hasn’t
yet been created. IDLE complains if you attempt the following:
>>> a = 5
>>> b = a + x # ERROR!
Because x has not yet been assigned a value, Python isn’t happy. The solu-
tion is to assign a value to x before it’s used on the right side of an assignment.
In the next example, referring to x no longer causes an error, because it’s been
assigned a value in the second line.
>>> a = 5
>>> x = 2.5
>>> b = a + x
>>> b
7.5
Python has no data declarations. Let us repeat that: There are no data dec-
larations. Instead, a variable is created by an assignment. There are some
other ways to create variables (function arguments and for loops), but for
the most part, a variable must appear on the left of an assignment before it
appears on the right.

You can run Python programs as scripts. From within IDLE, do the
following:

◗ From the File menu, choose New File.
◗ Enter the program text. For this next example, enter the following:
side1 = 5
side2 = 12
hyp = (side1 * side1 + side2 * side2) ** 0.5
print(hyp)

Then choose Run Module from the Run menu. When you’re prompted to
save the file, click OK and enter the program name as [Link]. The program
then runs and prints the results in the main IDLE window (or “shell”).
Alternatively, you could enter this program directly into the IDLE environ-
ment, one statement at a time, in which case the sample session should look
like this:
>>> side1 = 5
>>> side2 = 12
>>> hyp = (side1 * side1 + side2 * side2) ** 0.5
>>> hyp
13.0
Let’s step through this example a statement or two at a time. First, the val-
ues 5 and 12 are assigned to variables side1 and side2. Then the hypotenuse
of a right triangle is calculated by squaring both values, adding them together,
and taking the square root of the result. That’s what ** 0.5 does. It raises a
value to the power 0.5, which is the same as taking its square root.
(That last factoid is a tidbit you get from not falling asleep in algebra class.)
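As a quick check of that claim, this sketch compares ** 0.5 with two functions from the standard math module. The math module is not introduced in this section; it is used here only for comparison, and the variable names root_a, root_b, and root_c are illustrative.

```python
import math

side1 = 5
side2 = 12
sum_of_squares = side1 * side1 + side2 * side2   # 25 + 144

root_a = sum_of_squares ** 0.5        # Raise to the power 0.5.
root_b = math.sqrt(sum_of_squares)    # Standard-library square root.
root_c = math.hypot(side1, side2)     # Hypotenuse computed directly.

print(root_a, root_b, root_c)
```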
The answer printed by the program should be 13.0. It would be nice to
write a program that calculated the hypotenuse for any two values entered by
the user; but we’ll get that soon enough by examining the input statement.
Before moving on, you should know about Python comments. A comment is
text that’s ignored by Python itself, but you can use it to put in information help-
ful to yourself or other programmers who may need to maintain the program.
All text from a hash mark (#) to the end of the line is a comment.
For example:
side1 = 5 # Initialize one side.
side2 = 12 # Initialize the other.
hyp = (side1 * side1 + side2 * side2) ** 0.5
print(hyp) # Print results.


1.2 Variables and Naming Names


Although Python gives you some latitude in choosing variable names, there
are some rules.

◗ The first character must be a letter or an underscore (_), but the remaining
characters can be any combination of underscores, letters, and digits.
◗ However, names with leading underscores are intended to be private to a
class, and names starting with double underscores may have special meaning,
such as __init__ or __add__, so avoid using names that start with double
underscores.
◗ Avoid any name that is a keyword, such as if, else, elif, and, or, not,
class, while, break, continue, yield, import, and def.
◗ Also, although you can use capitals if you want (names are case-sensitive),
initial-all-capped names are generally reserved for special types, such as class
names. The universal Python convention is to stick to all-lowercase for most
variable names.

Within these rules, there is still a lot of leeway. For example, instead of
using boring names like a, b, and c, we can use i, thou, and a_jug_of_wine,
because it’s more fun (with apologies to Omar Khayyam).
i = 10
thou = 20
a_jug_of_wine = 30
loaf_of_bread = 40
inspiration = i + thou + a_jug_of_wine + loaf_of_bread
print(inspiration, 'percent good')
This prints the following:
100 percent good

1.3 Combined Assignment Operators


From the ideas in the previous section, you should be able to see that the fol-
lowing statements are valid.
n = 10 # n is a name for 10.
n = n + 1 # n is a name for 11.
n = n + 1 # n is a name for 12.

A statement such as n = n + 1 is extremely common, so much so that Python
offers a shortcut, just as C and C++ do. Python provides shortcut assignment
ops for many combinations of different operators within an assignment.
n = 0 # n must exist before being modified.
n += 1 # Equivalent to n = n + 1
n += 10 # Equivalent to n = n + 10
n *= 2 # Equivalent to n = n * 2
n -= 1 # Equivalent to n = n - 1
n /= 3 # Equivalent to n = n / 3
The effect of these statements is to start n at the value 0. Then they add 1
to n, then add 10, and then double that, resulting in the value 22, after which 1
is subtracted, producing 21. Finally, n is divided by 3, producing a final result
of n set to 7.0.

1.4 Summary of Python Arithmetic Operators


Table 1.1 summarizes Python arithmetic operators, shown by precedence,
alongside the corresponding shortcut (a combined assignment operation).

Table 1.1. Summary of Arithmetic Operators

SYNTAX    DESCRIPTION          ASSIGNMENT OP    PRECEDENCE
a ** b    Exponentiation       **=              1
a * b     Multiplication       *=               2
a / b     Division             /=               2
a // b    Ground division      //=              2
a % b     Remainder division   %=               2
a + b     Addition             +=               3
a - b     Subtraction          -=               3

Table 1.1 shows that exponentiation has a higher precedence than the mul-
tiplication, division, and remainder operations, which in turn have a higher
precedence than addition and subtraction.
Consequently, parentheses are required in the following statement to pro-
duce the desired result:
hypot = (a * a + b * b) ** 0.5
This statement adds a squared to b squared and then takes the square root
of the sum.
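To see why the parentheses matter, here is a small sketch with sample values (3 and 4 are chosen only for illustration). Without parentheses, ** binds only to the b on its left.

```python
a = 3
b = 4

with_parens = (a * a + b * b) ** 0.5     # (9 + 16) ** 0.5
without_parens = a * a + b * b ** 0.5    # 9 + 4 * (4 ** 0.5)

print(with_parens)
print(without_parens)
```

The first expression gives the hypotenuse; the second evaluates b ** 0.5 first, then multiplies, then adds, which is a different calculation.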


1.5 Elementary Data Types: Integer and Floating Point


Because Python has no data declarations, a variable’s type is whatever type
the associated data object is.
For example, the following assignment makes x a name for 5, which has int
type. This is the integer type, which is a number that has no decimal point.
x = 5 # x names an integer.
But after the following reassignment, x names a floating-point number,
thereby changing the variable’s type to float.
x = 7.3 # x names a floating-pt value.
As in other languages, putting a decimal point after a number gives it
floating-point type, even if the digit following the decimal point is 0:
x = 5.0
Python integers are “infinite integers,” in that Python supports arbitrarily
large integers, subject only to the physical limitations of the system. For
example, you can store 10 to the 100th power (a googol), so Python can handle this:
googol = 10 ** 100 # Raise 10 to the power of 100.
Integers store quantities precisely. Unlike floating-point values, they don’t
have rounding errors.
But system capacities ultimately impose limitations. A googolplex is 10
raised to the power of a googol (!). That’s too big even for Python. If every 0
were painted on a wooden cube one centimeter in length, the physical universe
would be far too small to contain a printout of the number.
(As for attempting to create a googolplex; well, as they say on television, “Don’t
try this at home.” You’ll have to hit Ctrl+C to stop Python from hanging. It’s
like when Captain Kirk said to the computer, “Calculate pi to the last digit.”)
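The contrast between exact integers and rounded floating-point values can be sketched as follows. The names big and as_float are illustrative, and the float function is used here to convert an integer to floating point.

```python
big = 10 ** 100                 # An exact 101-digit integer.
print(len(str(big)))            # Number of digits in the integer.
print(big + 1 - big)            # Integer arithmetic stays exact.

as_float = float(big)           # Floats keep only about 15 to 17 significant digits.
print(as_float + 1 == as_float) # Adding 1 is lost to rounding.
print(0.1 + 0.2 == 0.3)         # A classic floating-point rounding surprise.
```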

The way that Python interprets integer and floating-point division (/) depends
on the version of Python in use.

In Python 3.0, the rules for division are as follows:

◗ Division of any two numbers (integer and/or floating point) always results in a
floating-point result. For example:
4 / 2 # Result is 2.0
7 / 4 # Result is 1.75

◗ If you want to divide one integer by another and get an integer result, use
ground division (//), more commonly known as floor division. This also works
with floating-point values.
4 // 2 # Result is 2
7 // 4 # Result is 1
23 // 5 # Result is 4
8.0 // 2.5 # Result is 3.0
◗ You can get the remainder using remainder (or modulus) division.
23 % 5 # Result is 3

Note that in remainder division, a division is carried out first and the quo-
tient is thrown away. The result is whatever is left over after division. So 5
goes into 23 four times but results in a remainder of 3.
In Python 2.0, the rules are as follows:

◗ Division between two integers is automatically ground division, so the
remainder is thrown away:
7 / 2 # Result is 3 (in Python 2.0)
◗ To force a floating-point result, convert one of the operands to floating-point
format.
7 / 2.0 # Result is 3.5
7 / float(2) # Ditto
◗ Remember that you can always use modulus division (%) to get the remainder.

Python also supports a divmod function that returns quotient and remain-
der as a tuple (that is, an ordered group) of two values. For example:
quot, rem = divmod(23, 10)
The values returned in quot and rem, in this case, will be 2 and 3 after exe-
cution. This means that 10 divides into 23 two times and leaves a remainder
of 3.
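A brief sketch of how divmod relates to the // and % operators follows. The last line uses a negative operand, which this section does not discuss, to show that // rounds down rather than toward zero.

```python
a, b = 23, 10

quot, rem = divmod(a, b)        # Same as (a // b, a % b)
print(quot, rem)

# The quotient and remainder always reconstruct the original value.
print(quot * b + rem == a)

# With a negative operand, // rounds down (toward negative infinity).
print(-7 // 2, -7 % 2)
```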

1.6 Basic Input and Output


Earlier, in Section 1.1, we promised to show how to prompt the user for the
values used as inputs to a formula. Now we’re going to make good on that
promise. (You didn’t think we were lying, did you?)


The Python input function is an easy-to-use input mechanism that includes
an optional prompt. The text typed by the user is returned as a string.

In Python 2.0, the input function works differently: it instead evaluates the
string entered as if it were a Python expression. To achieve the same result as
the Python 3.0 input function, use the raw_input function in Python 2.0.

The input function prints the prompt string, if specified; then it returns
the string the user entered. The input string is returned as soon as the user
presses the Enter key; but no newline is appended.

input(prompt_string)
To store the string returned as a number, you need to convert to integer
(int) or floating-point (float) format. For example, to get an integer use this
code:
n = int(input('Enter integer here: '))
Or use this to get a floating-point number:
x = float(input('Enter floating pt value here: '))
The prompt is printed without an added space, so you typically need to
provide that space yourself.
Why is an int or float conversion necessary? Because the input function always
returns a string, such as “5.” Such a string is fine for many purposes, but you
cannot perform arithmetic on it without performing the conversion first.
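The sketch below shows what happens with and without the conversion, using a fixed string in place of a real call to input. The helper name to_number is hypothetical, and the try/except syntax it relies on is not covered in this section; treat it as one possible way to guard against nonnumeric input.

```python
text = '5'                    # What input() would return if the user typed 5.

print(text + text)            # String concatenation, not addition.
print(int(text) + int(text))  # Integer addition after conversion.

def to_number(s):
    """Return s converted to float, or None if s is not numeric."""
    try:
        return float(s)
    except ValueError:
        return None

print(to_number('2.5'))
print(to_number('abc'))
```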
Python 3.0 also supports a print function that—in its simplest form—
prints all its arguments in the order given, putting a space between each.

print(arguments)
Python 2.0 has a print statement that does the same thing but does not use
parentheses.
The print function has some special arguments that you specify by
name.

◗ sep=string specifies a separator string to be used instead of the default
separator, which is one space. This can be an empty string if you choose: sep=''.
◗ end=string specifies what, if anything, to print after the last argument is
printed. The default is a newline. If you don’t want a newline to be printed, set
this argument to an empty string or some other string, as in end=''.
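Here is a minimal sketch of sep and end in action. To make the exact output easy to inspect, the calls write to an in-memory buffer through the file argument of print, which is a real parameter but is not covered in this section; in ordinary use you would simply omit it.

```python
import io

buf = io.StringIO()     # In-memory text buffer standing in for the screen.

print(1, 2, 3, file=buf)                   # Default: spaces, then a newline.
print(1, 2, 3, sep='-', file=buf)          # Custom separator.
print('no newline', end='', file=buf)      # Suppress the newline.
print('!', file=buf)                       # Continues on the same line.

output = buf.getvalue()
print(output, end='')
```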

Given these elementary functions (input and print), you can create a
Python script that’s a complete program. For example, you can enter the
following statements into a text file and run it as a script.
side1 = float(input('Enter length of a side: '))
side2 = float(input('Enter another length: '))
hyp = ((side1 * side1) + (side2 * side2)) ** 0.5
print('Length of hypotenuse is:', hyp)

1.7 Function Definitions


Within the Python interactive development environment, you can more easily
enter a program if you first enter it as a function definition, such as main. Then
call that function. Python provides the def keyword for defining functions.
def main():
    side1 = float(input('Enter length of a side: '))
    side2 = float(input('Enter another length: '))
    hyp = (side1 * side1 + side2 * side2) ** 0.5
    print('Length of hypotenuse is: ', hyp)
Note that you must enter the first line as follows. The def keyword, paren-
theses, and colon (:) are strictly required.
def main():
If you enter this correctly from within IDLE, the environment automati-
cally indents the next lines for you. Maintain this indentation. If you enter the
function as part of a script, then you must choose an indentation scheme, and
it must be consistent. Indentation of four spaces is recommended when you
have a choice.

Note: Mixing tab characters with actual spaces can cause errors even though it
might not look wrong. So be careful with tabs!

Because there is no “begin block” and “end block” syntax, Python relies on
indentation to know where statement blocks begin and end. The critical rule
is this:

✱ Within any given block of code, the indentation of all statements (that is, at
the same level of nesting) must be the same.


For example, the following block is invalid and needs to be revised.


def main():
    side1 = float(input('Enter length of a side: '))
      side2 = float(input('Enter another length: '))
    hyp = (side1 * side1 + side2 * side2) ** 0.5
  print('Length of hypotenuse is: ', hyp)
If you have a nested block inside a nested block, the indentation of each
level must be consistent. Here’s an example:
def main():
    age = int(input('Enter your age: '))
    name = input('Enter your name: ')
    if age < 30:
        print('Hello', name)
        print('I see you are less than 30.')
        print('You are so young.')
The first three statements inside this function definition are all at the same level
of nesting; the last three statements are at a deeper level. But each is consistent.
Even though we haven't gotten to the if statement yet (we’re just about to),
you should be able to see that the flow of control in the next example is differ-
ent from the previous example.
def main():
    age = int(input('Enter your age: '))
    name = input('Enter your name: ')
    if age < 30:
        print('Hello', name)
    print('I see you are less than 30.')
    print('You are so young.')
Hopefully you can see the difference: In this version of the function, the last
two lines do not depend on your age being less than 30. That’s because Python
uses indentation to determine the flow of control.
Because the last two statements make sense only if the age is less than 30,
it’s reasonable to conclude that this version has a bug. The correction would
be to indent the last two statements so that they line up with the first print
statement.
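The corrected control flow can be sketched as follows. To make the behavior easy to check without typing input, this hypothetical variant takes age and name as arguments and builds the message as a string instead of printing each line directly; the function name greeting is an assumption made for this example.

```python
def greeting(age, name):
    message = ''
    if age < 30:
        # All three lines are indented under the if, so all depend on it.
        message += 'Hello ' + name + '\n'
        message += 'I see you are less than 30.\n'
        message += 'You are so young.\n'
    return message

print(greeting(25, 'Ann'), end='')   # All three lines appear.
print(greeting(45, 'Bob'), end='')   # Nothing is printed.
```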
After a function is defined, you can call that function—which means to
make it execute—by using the function name, followed by parentheses. (If you
don’t include the parentheses, you will not successfully execute the function!)
main()

So let’s review. To define a function, which means to create a kind of
mini-program unto itself, you enter the def statement and keep entering lines in
the function until you’re done, after which you enter a blank line. Then run
the function by typing its name followed by parentheses. Once a function is
defined, you can execute it as often as you want.
So the following sample session, in the IDLE environment, shows the process
of defining a function and calling it twice. For clarity, user input is in bold.
>>> def main():
        side1 = float(input('Enter length of a side: '))
        side2 = float(input('Enter another length: '))
        hyp = (side1 * side1 + side2 * side2) ** 0.5
        print('Length of hypotenuse is: ', hyp)

>>> main()
Enter length of a side: 3
Enter another length: 4
Length of hypotenuse is: 5.0
>>> main()
Enter length of a side: 30
Enter another length: 40
Length of hypotenuse is: 50.0
As you can see, once a function is defined, you can call it (causing it to exe-
cute) as many times as you like.
The Python philosophy is this: Because you should do this indentation any-
way, why shouldn’t Python rely on the indentation and thereby save you the
extra work of putting in curly braces? This is why Python doesn’t have any
“begin block” or “end block” syntax but relies on indentation.

1.8 The Python “if” Statement


As with all Python control structures, indentation matters in an if statement,
as does the colon at the end of the first line.
if a > b:
    print('a is greater than b')
    c = 10
The if statement has a variation that includes an optional else clause.
if a > b:
    print('a is greater than b')
    c = 10
else:
    print('a is not greater than b')
    c = -10
An if statement can also have any number of optional elif clauses.
Although the following example has statement blocks of one line each, they
can be larger.
age = int(input('Enter age: '))
if age < 13:
    print('You are a preteen.')
elif age < 20:
    print('You are a teenager.')
elif age <= 30:
    print('You are still young.')
else:
    print('You are one of the oldies.')
You cannot have empty statement blocks; to have a statement block that
does nothing, use the pass keyword.
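As a small sketch of pass in use, the hypothetical function below (the name describe and its categories are assumptions for illustration) leaves one branch deliberately empty. Because that branch does nothing, the function falls off the end and returns None for a negative age.

```python
def describe(age):
    if age < 0:
        pass                    # Placeholder: handle invalid ages later.
    elif age < 13:
        return 'preteen'
    elif age < 20:
        return 'teenager'
    else:
        return 'adult'

print(describe(10))
print(describe(15))
print(describe(-1))     # The pass branch does nothing, so None is returned.
```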
Here’s the syntax summary, in which square brackets indicate optional
items, and the ellipses indicate a part of the syntax that can be repeated any
number of times.
if condition:
    indented_statements
[ elif condition:
    indented_statements ]...
[ else:
    indented_statements ]

1.9 The Python “while” Statement


Python has a while statement with one basic structure. (There is no “do
while” version, although there is an optional else clause, as mentioned in
Chapter 4.)
This limitation helps keep the syntax simple. The while keyword creates
a loop, which tests a condition just as an if statement does. But after the
indented statements are executed, program control returns to the top of the
loop and the condition is tested again.

while condition:
    indented_statements

Here’s a simple example that prints all the numbers from 1 to 10.
n = 10 # This may be set to any positive integer.
i = 1
while i <= n:
    print(i, end=' ')
    i += 1
Let’s try entering these statements in a function. But this time, the function
takes an argument, n. Each time it’s executed, the function can take a differ-
ent value for n.
>>> def print_nums(n):
        i = 1
        while i <= n:
            print(i, end=' ')
            i += 1

>>> print_nums(3)
1 2 3
>>> print_nums(7)
1 2 3 4 5 6 7
>>> print_nums(8)
1 2 3 4 5 6 7 8
It should be clear how this function works. The variable i starts as 1, and
it’s increased by 1 each time the loop is executed. The loop is executed again
as long as i is equal to or less than n. When i exceeds n, the loop stops, and no
further values are printed.
Optionally, the break statement can be used to exit from the nearest
enclosing loop. And the continue statement can be used to continue to the
next iteration of the loop immediately (going to the top of the loop) but not
exiting as break does.

break
For example, you can use break to exit from an otherwise infinite loop.
True is a keyword that, like all words in Python, is case-sensitive. Capitaliza-
tion matters.
n = 10 # Set n to any positive integer.
i = 1
while True: # Always executes!
    print(i)
    if i >= n:
        break
    i += 1
Note the use of i += 1. If you’ve been paying attention, this means the
same as the following:
i = i + 1 # Add 1 to the current value and reassign.
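The continue statement mentioned earlier in this section can be sketched in the same style. In this illustrative loop (the variable odds and the use of str to convert a number to text are assumptions made for the example), even numbers are skipped and the odd numbers up to n are collected.

```python
n = 10
i = 0
odds = ''
while i < n:
    i += 1
    if i % 2 == 0:
        continue        # Skip even numbers; go back to the loop test.
    odds += str(i) + ' '
print(odds)
```

Note that i is incremented before continue is reached; incrementing after it would leave i unchanged on even values and the loop would never end.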

1.10 A Couple of Cool Little Apps


At this point, you may be wondering, what’s the use of all this syntax if it
doesn’t do anything? But if you’ve been following along, you already know
enough to do a good deal. This section shows two great little applications that
do something impressive . . . although we need to add a couple of features.
Here’s a function that prints the numbers of the famous Fibonacci sequence
that are less than n:
def pr_fibo(n):
    a, b = 1, 0
    while a < n:
        print(a, end=' ')
        a, b = a + b, a
You can make this a complete program by running it from within IDLE or
by adding these module-level lines below it:
n = int(input('Input n: '))
pr_fibo(n)
New features, by the way, are contained in these lines of the function
definition:
a, b = 1, 0
a, b = a + b, a
These two statements are examples of tuple assignment, which we return
to in later chapters. In essence, it enables a list of values to be used as inputs,
and a list of variables to be used as outputs, without one assignment interfer-
ing with the other. These assignments could have been written as
a = 1
b = 0
...
temp = a
a = a + b
b = temp

Simply put, a and b are initialized to 1 and 0, respectively. Then, later, a is
set to the total a + b, while simultaneously, b is set to the old value of a.
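A minimal sketch of tuple assignment follows, using illustrative variable names. The key point is that the entire right side is evaluated before any name on the left is reassigned, which is what makes a swap possible without a temporary variable.

```python
a, b = 1, 2
a, b = b, a         # Swap without a temporary variable.
print(a, b)

# The right side is evaluated completely before any name is reassigned.
x, y = 1, 0
x, y = x + y, x     # New x is 1 + 0; new y is the old x, 1.
x, y = x + y, x     # New x is 1 + 1; new y is the old x, 1.
print(x, y)
```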

The second app (try it yourself!) is a complete computer game. It secretly
selects a random number between 1 and 50 and then requires you, the player,
to try to find the answer through repeated guesses.
The program begins by using the random package; we present more infor-
mation about that package in Chapter 11. For now, enter the first two lines as
shown, knowing they will be explained later in the book.
from random import randint
n = randint(1, 50)
while True:
    ans = int(input('Enter a guess: '))
    if ans > n:
        print('Too high! Guess again. ')
    elif ans < n:
        print('Too low! Guess again. ')
    else:
        print('Congrats! You got it!')
        break
To run, enter all this in a Python script (choose New File from the File menu),
and then choose Run Module from the Run menu, as usual. Have fun.

1.11 Summary of Python Boolean Operators


The comparison operators and the not operator return the special value True
or False. The logic operators and and or use short-circuit logic and return
the value of one of their operands. Table 1.2 summarizes these operators.

Table 1.2. Python Boolean and Comparison Operators

OPERATOR   MEANING                    EVALUATES TO
==         Test for equality          True or False
!=         Test for inequality        True or False
>          Greater than               True or False
<          Less than                  True or False
>=         Greater than or equal to   True or False
<=         Less than or equal to      True or False
and        Logical “and”              Value of first or second operand
or         Logical “or”               Value of first or second operand
not        Logical “not”              True or False, reversing value of its single operand

All the operators in Table 1.2 are binary—that is, they take two oper-
ands—except not, which takes a single operand and reverses its logical value.
Here’s an example:
if not (age > 12 and age < 20):
    print('You are not a teenager.')
By the way, another way to write this—using a Python shortcut—is to write
the following:
if not (12 < age < 20):
    print('You are not a teenager.')
This is, as far as we know, a unique Python coding shortcut. In Python
3.0, at least, this example not only works but doesn’t even require parentheses
right after the if and not keywords, because logical not has low precedence
as an operator.
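The equivalence of the two forms can be checked with a short sketch. The function names is_teen_long and is_teen_short are hypothetical, and the sample ages are chosen at the boundaries of the range.

```python
def is_teen_long(age):
    return age > 12 and age < 20

def is_teen_short(age):
    return 12 < age < 20        # Chained comparison.

print(is_teen_long(12), is_teen_short(12))
print(is_teen_long(13), is_teen_short(13))
print(is_teen_long(19), is_teen_short(19))
print(is_teen_long(20), is_teen_short(20))

age = 15
print(not 12 < age < 20)        # No parentheses needed after not.
```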

1.12 Function Arguments and Return Values


Function syntax is flexible enough to support multiple arguments and multi-
ple return values.
def function_name(arguments):
    indented_statements
In this syntax, arguments is a list of argument names, separated by com-
mas if there’s more than one. Here’s the syntax of the return statement:
return value
You can also return multiple values:
return value, value ...
Finally, you can omit the return value. If you do, the effect is the same as the
statement return None.
return # Same effect as return None

Execution of a return statement causes immediate exit and return to the
caller of the function. Reaching the end of a function causes an implicit return,
returning None by default. (Therefore, using return at all is optional.)
Technically speaking, Python argument passing is closer to “pass by ref-
erence” than “pass by value”; however, it isn’t exactly either. When a value is
passed to a Python function, that function receives a reference to the named
data. However, whenever the function assigns a new value to the argument
variable, it breaks the connection to the original variable that was passed.
Therefore, the following function does not do what you might expect. It
does not change the value of the variable passed to it.
def double_it(n):
    n = n * 2

x = 10
double_it(x)
print(x) # x is still 10!
This may at first seem a limitation, because sometimes a programmer needs
to create multiple “out” parameters. However, you can do that in Python by
returning multiple values directly. The calling statement must expect the values.
def set_values():
    return 10, 20, 30
a, b, c = set_values()
The variables a, b, and c are set to 10, 20, and 30, respectively.
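Putting these points together, here is a sketch of a version of double_it that works by returning its result, along with a contrasting case. The function append_zero and the list it modifies are assumptions for illustration (lists are not covered in this section); they show that changing an object in place, as opposed to assigning a new value to the argument name, is visible to the caller.

```python
def double_it(n):
    return n * 2            # Return the new value instead of reassigning n.

x = 10
x = double_it(x)            # The caller rebinds x to the result.
print(x)

def append_zero(items):
    items.append(0)         # Changes the object itself; no reassignment.

nums = [1, 2]
append_zero(nums)
print(nums)
```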
Because Python has no concept of data declarations, an argument list in
Python is just a series of comma-separated names—except that each may
optionally be given a default value. Here is an example of a function definition
with two arguments but no default values:
def calc_hyp(a, b):
    hyp = (a * a + b * b) ** 0.5
    return hyp
These arguments are listed without type declaration; Python functions do
no type checking except the type checking you do yourself! (However, you can
check a variable’s type by using the type or isinstance function.)
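One possible way to add your own type checking is sketched below. This variant of calc_hyp is an assumption, not the version shown above: it uses isinstance to reject nonnumeric arguments and the raise statement, which is not covered in this section, to signal the problem.

```python
def calc_hyp(a, b):
    # Accept only integers and floating-point numbers.
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError('calc_hyp expects two numbers')
    return (a * a + b * b) ** 0.5

print(calc_hyp(3, 4))
print(type(calc_hyp(3, 4)))     # The result is a float.
print(isinstance('3', int))     # A string is not an integer.
```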
Although arguments have no type, they may be given default values.
The use of default values enables you to write a function in which not all
arguments have to be specified during every function call. A default argument
has the following form:
argument_name = default_value


For example, the following function prints a value multiple times, but the
default number of times is 1:
def print_nums(n, rep=1):
    i = 1
    while i <= rep:
        print(n)
        i += 1
Here, the default value of rep is 1; so if no value is given for the last argument,
it’s given the value 1. Therefore this function call prints the number 5 one time:
print_nums(5)
The output looks like this:
5

Note: Because the function just shown uses n as an argument name, it’s
natural to assume that n must be a number. However, because Python has no
variable or argument declarations, there’s nothing enforcing that; n could just
as easily be passed a string.
But there are repercussions to data types in Python. In this case, a problem
can arise if you pass a nonnumber to the second argument, rep. The value
passed here is repeatedly compared to a number, so this value, if given, needs
to be numeric. Otherwise, an exception, representing a runtime error, is raised.

Default arguments, if they appear in the function definition, must come
after all other arguments.
Another special feature is the use of named arguments. These should not
be confused with default values, which is a separate issue. Default arguments
are specified in a function definition. Named arguments are specified during
a function call.
Some examples should clarify. Normally, argument values are assigned to
arguments in the order given. For example, suppose a function is defined
to have three arguments:
def a_func(a, b, c):
    return (a + b) * c
But the following function call specifies c and b directly, leaving the first
argument to be assigned to a, by virtue of its position.
print(a_func(4, c = 3, b = 2))

The result of this function call is to print the value 18. The values 4, 3, and 2
are assigned out of order, so that a, b, and c, respectively, get 4, 2, and 3.
Named arguments, if used, must come at the end of the list of arguments.
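Default values and named arguments are separate features, but they combine naturally. The sketch below assumes a variant of a_func in which b and c have default values (these defaults are not part of the definition shown earlier).

```python
def a_func(a, b=2, c=3):
    return (a + b) * c

print(a_func(4))              # b and c take defaults: (4 + 2) * 3
print(a_func(4, c=10))        # Only c is overridden: (4 + 2) * 10
print(a_func(c=1, a=5, b=5))  # All named, in any order: (5 + 5) * 1
```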

1.13 The Forward Reference Problem


In most computer languages, there’s an annoying problem every programmer
has to deal with: the forward reference problem. The problem is this: In what
order do I define my functions?
It’s a problem because the general rule is that a function must exist before
you call it. In a way, it’s parallel to the rule for variables, which is that a vari-
able must exist before you use it to calculate a value.
So how do you ensure that every function exists—meaning it must be
defined—before you call it? And what if, God forbid, you have two functions
that need to call each other? The problem is easily solved if you follow two rules:

◗ Define all your functions before you call any of them.
◗ Then, at the very end of the source file, put in your first module-level function
call. (Module-level code is code that is outside any function.)

This solution works because a def statement creates a function as a callable
object but does not yet execute it. Therefore, if funcA calls funcB, you
can define funcA first, as long as funcB is also defined by the time you get
around to executing funcA.
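The two rules can be sketched with a pair of functions that call each other. The names is_even, is_odd, and main are assumptions made for this illustration, and the example is meant for small non-negative integers only.

```python
# is_even refers to is_odd before is_odd has been defined. That is
# fine, because the name is looked up only when is_even runs.
def is_even(n):
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    return False if n == 0 else is_even(n - 1)

def main():
    return is_even(10), is_odd(7)

# The first module-level call comes after all the definitions.
result = main()
print(result)
```

Moving the call to main above the definition of is_odd would raise a NameError, because is_odd would not yet exist when is_even tried to call it.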

1.14 Python Strings


Python has a text string class, str, which enables you to use characters of
printable text. The class has many built-in capabilities. If you want to get a list
of them, type the following into IDLE:
>>> help(str)
You can specify Python strings using a variety of quotation marks. The
only rule is that the opening and closing marks must match. Internally, the
quotation marks are not stored as part of the string itself. Which marks to use
is purely a coding convenience: choose whichever makes a given string easiest
to write.
s1 = 'This is a string.'
s2 = "This is also a string."
s3 = '''This is a special literal
quotation string.'''
The last form—using three consecutive quotation marks to delimit the
string—creates a literal quotation string. You can also repeat three double
quotation marks to achieve the same effect.
s3 = """This is a special literal
quotation string."""
If a string is delimited by single quotation marks, you can easily embed
double quotation marks.
s1 = 'Shakespeare wrote "To be or not to be."'
But if a string is delimited by double quotation marks, you can easily embed
single quotation marks.
s2 = "It's not true, it just ain't!"
You can print these two strings.
print(s1)
print(s2)
This produces the following:
Shakespeare wrote "To be or not to be."
It's not true, it just ain't!
The benefit of the literal quotation syntax is that it enables you to embed
both kinds of quotation marks, as well as embed newlines.
'''You can't get it at "Alice's Restaurant."'''
Alternatively, you can place embedded quotation marks into a string by
using the backslash (\) as an escape character.
s2 = 'It\'s not true, it just ain\'t!'
Chapter 2, “Advanced String Capabilities,” provides a nearly exhaustive
tour of string capabilities.
You can deconstruct strings in Python, just as you can in Basic or C, by
indexing individual characters, using indexes running from 0 to N–1, where N
is the length of the string. Here’s an example:
s = 'Hello'
s[0]

This produces

'H'
However, you cannot assign new values to characters within existing
strings, because Python strings are immutable: They cannot be changed.
How, then, can new strings be constructed? You do that by using a combi-
nation of concatenation and assignment. Here’s an example:
s1 = 'Abe'
s2 = 'Lincoln'
s1 = s1 + ' ' + s2
In this example, the string s1 started with the value 'Abe', but then it ends
up containing 'Abe Lincoln'.
This operation is permitted because a variable is only a name.
Therefore, you can “modify” a string through concatenation without actually
violating the immutability of strings. Why? It’s because each assignment cre-
ates a new association between the variable and the data. Here’s an example:
my_str = 'a'
my_str += 'b'
my_str += 'c'
The effect of these statements is to create the string 'abc' and to assign
it (or rather, reassign it) to the variable my_str. No string data was actually
modified, despite appearances. What’s really going on in this example is that
the name my_str is used and reused, to name an ever-larger string.
You can think of it this way: With every statement, a larger string is created
and then assigned to the name my_str.
In dealing with Python strings, there’s another important rule to keep in
mind: Indexing a string in Python produces a single character. In Python, a
single character is not a separate type (as it is in C or C++), but is merely a
string of length 1. The choice of quotation marks used has no effect on this
rule.
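A short sketch of these rules follows. The variable names are illustrative, and the slice s[1:] (everything after the first character) is used here ahead of its fuller treatment later in the book.

```python
s = 'Hello'
first = s[0]               # 'H', which is itself a string of length 1
kind = type(first)         # str: there is no separate character type

try:
    s[0] = 'J'             # Strings are immutable, so this fails.
    changed = True
except TypeError:
    changed = False

# Build a new string instead, and rebind the name to it.
s = 'J' + s[1:]
print(first, kind, changed, s)
```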

1.15 Python Lists (and a Cool Sorting App)


Python’s most frequently used collection class is called the list collection, and
it’s incredibly flexible and powerful.
Key Syntax

[ items ]


Here the square brackets are intended literally, and items is a list of zero or
more items, separated by commas if there is more than one. Here’s an example,
representing a series of high temperatures, in Fahrenheit, over a summer
weekend:
[78, 81, 81]
Lists can contain any kind of object (including other lists!) and, unlike C or
C++, Python lets you mix the types. For example, you can have lists of strings:
['John', 'Paul', 'George', 'Ringo' ]
And you can have lists that mix up the types:
['John', 9, 'Paul', 64 ]
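However, lists that mix types that cannot be compared with each other (such
as strings and integers) cannot be automatically sorted in Python 3; the
attempt raises a TypeError. Sorting is an important feature, so this matters.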
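The sorting limitation can be seen in a small sketch. The workaround at the end (sorting by each item's string form) is an assumption added for illustration, not something the text prescribes.

```python
names = ['John', 'Paul', 'George', 'Ringo']
names.sort()                     # Items of one type sort without trouble.

mixed = ['John', 9, 'Paul', 64]
try:
    mixed.sort()                 # str and int cannot be compared with <.
    sort_failed = False
except TypeError:
    sort_failed = True

# One possible workaround: compare the items by their string forms.
mixed_sorted = sorted(mixed, key=str)
print(names, sort_failed, mixed_sorted)
```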
Unlike some other Python collection classes (dictionaries and sets), order
is significant in a list, and duplicate values are allowed. But it’s the long list of
built-in capabilities (all covered in Chapter 3) that makes Python lists really
impressive. In this section we use two: append, which adds an element to a list
dynamically, and the aforementioned sort capability.
Here’s a slick little program that showcases the Python list-sorting capabil-
ity. Type the following into a Python script and run it.
a_list = []

while True:
    s = input('Enter name: ')
    if not s:
        break
    a_list.append(s)
a_list.sort()
print(a_list)
Wow, that’s incredibly short! But does it work? Here’s a sample session:
Enter name: John
Enter name: Paul
Enter name: George
Enter name: Ringo
Enter name: Brian
Enter name:
['Brian', 'George', 'John', 'Paul', 'Ringo']

See what happened? Brian (who was the manager, I believe) got added to
the group and now all are printed in alphabetical order.
This little program, you should see, prompts the user to enter one name at
a time; as each is entered, it’s added to the list through the append method.
Finally, when an empty string is entered, the loop breaks. After that, the
list is sorted and printed.

1.16 The “for” Statement and Ranges


When you look at the application in the previous section, you may wonder
whether there is a refined way, or at least a more flexible way, to print the con-
tents of a list. Yes, there is. In Python, that’s the central (although not exclu-
sive) purpose of the for statement: to iterate through a collection and perform
the same operation on each element.
One such use is to print each element. The last line of the application in the
previous section could be replaced by the following, giving you more control
over how to print the output.
for name in a_list:
    print(name)
Now the output is
Brian
George
John
Paul
Ringo
Key Syntax

for var in iterable:
    indented_statements

In this syntax, iterable is most often a collection, such as a list, but it
can also be a call to the range function, which produces a series of integers
to iterate through. (Strictly speaking, range returns a lazy sequence object
rather than a generator; you’ll learn more about generators in Chapter 4.)
Notice again the importance of indenting, as well as the colon (:).
Values are sent to a for loop in a way similar to function-argument pass-
ing. Consequently, assigning a value to a loop variable has no effect on the
original data.
my_lst = [10, 15, 25]
for thing in my_lst:
    thing *= 2


It may seem that this loop should double each element of my_lst, but it
does not. To process a list in this way, changing values in place, it’s necessary
to use indexing.
my_lst = [10, 15, 25]
for i in [0, 1, 2]:
    my_lst[i] *= 2
This has the intended effect: doubling each individual element of my_lst,
so that now the list data is [20, 30, 50].
To index into a list this way, you need to create a sequence of indexes of the
form
0, 1, 2, ... N-1
in which N is the length of the list. You can automate the production of such
sequences of indexes by using the range function. For example, to double
every element of a list of length 5, use this code:
my_lst = [100, 102, 50, 25, 72]
for i in range(5):
    my_lst[i] *= 2
This code fragment is not optimal because it hard-codes the length of the
list, that length being 5, into the code. Here is a better way to write this loop:
my_lst = [100, 102, 50, 25, 72]
for i in range(len(my_lst)):
    my_lst[i] *= 2
After this loop is executed, my_lst contains [200, 204, 100, 50, 144].
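As a hedged alternative sketch, the built-in enumerate function supplies each index together with its value, which avoids the range(len(...)) idiom. The text has not introduced enumerate at this point; it is shown here only for comparison.

```python
my_lst = [100, 102, 50, 25, 72]

# Rebinding the loop variable leaves the list untouched.
for thing in my_lst:
    thing *= 2
unchanged = list(my_lst)

# enumerate yields (index, value) pairs, so the list can be
# updated in place through the index.
for i, value in enumerate(my_lst):
    my_lst[i] = value * 2

print(unchanged, my_lst)
```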
The range function produces a sequence of integers as shown in Table 1.3,
depending on whether you specify one, two, or three arguments.

Table 1.3. Effects of the Range Function

SYNTAX                  EFFECT
range(end)              Produces a sequence beginning with 0, up to but not including end.
range(beg, end)         Produces a sequence beginning with beg, up to but not including end.
range(beg, end, step)   Produces a sequence beginning with beg, up to but not including end;
                        however, the elements are increased by the value of step each time.
                        If step is negative, then the range counts backward.
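The three forms of range can be checked by converting each result to a list. This is a small illustrative sketch; the variable names are arbitrary.

```python
one_arg = list(range(5))            # Starts at 0, stops before 5.
two_args = list(range(2, 6))        # Starts at 2, stops before 6.
with_step = list(range(0, 10, 3))   # Increases by 3 each time.
backward = list(range(5, 0, -1))    # Negative step counts backward.

print(one_arg, two_args, with_step, backward)
```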

Another use of range is to create a loop that iterates through a series of
integers. For example, the following loop calculates a factorial number.
n = int(input('Enter a positive integer: '))
prod = 1
for i in range(1, n + 1):
    prod *= i
print(prod)
This loop works because range(1, n + 1) produces integers beginning
with 1 up to but not including n + 1. This loop therefore has the effect of
doing the following calculation:
1 * 2 * 3 * ... * n
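One way to check the loop, sketched here under the assumption that it is wrapped in a function named factorial (a name not used in the text), is to compare it with math.factorial from the standard library.

```python
import math

def factorial(n):
    prod = 1
    for i in range(1, n + 1):   # Integers 1 through n, inclusive.
        prod *= i
    return prod

f5 = factorial(5)
# Compare against the standard library for the first few values.
matches = all(factorial(n) == math.factorial(n) for n in range(10))
print(f5, matches)
```

Note that when n is 0, range(1, 1) is empty, so the function returns 1, which agrees with the usual definition of 0 factorial.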

1.17 Tuples
The Python concept of tuple is closely related to that of lists; if anything, the
concept of tuple is even more fundamental. The following code returns a list
of integers:
def my_func():
    return [10, 20, 5]
This function returns values as a list.
my_lst = my_func()
But the following code, returning a simple series of values, actually returns
a tuple:
def a_func():
    return 10, 20, 5
It can be called as follows:
a, b, c = a_func()
Note that a tuple is a tuple even if it’s grouped within parentheses for clar-
ity’s sake.
    return (10, 20, 5)    # Parens have no effect in this case.
The basic properties of a tuple and a list are almost the same: Each is an
ordered collection, in which any number of repeated values are allowed.


However, unlike a list, a tuple is immutable; tuple values cannot be changed
in place. Tuples do not support all the methods or functions supported by lists;
in particular, tuples do not support any methods that modify the contents of
the tuple.
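The following sketch illustrates returning a tuple, unpacking it, and the effect of immutability. The function min_max and the variable names are hypothetical.

```python
def min_max(values):
    return min(values), max(values)    # A series of values is a tuple.

result = min_max([10, 20, 5])
lo, hi = result                        # Unpack into two variables.

try:
    result[0] = 0                      # Tuples cannot be changed in place.
    modified = True
except TypeError:
    modified = False

# append modifies a list in place, so tuples do not provide it.
has_append = hasattr(result, 'append')
print(result, lo, hi, modified, has_append)
```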

1.18 Dictionaries
A Python dictionary is a collection that contains a series of associations
between keys and values, stored as key-value pairs. Unlike lists, dictionaries
are specified with curly braces, not square brackets.
Key Syntax

{ key1: value1, key2: value2, ...}


In plain English, a dictionary is like a flat, two-column table in a database.
It lacks the advanced features of modern database management systems; it’s
only a table. But it can still serve as a rich data-storage object in your Python
programs.
The keys for a dictionary are a series of unique values; keys cannot be
duplicated. For each key there’s an associated data object, called a value. For
example, you can create a dictionary for grading a class of students as follows:
grade_dict = { 'Bob':3.9, 'Sue':3.9, 'Dick':2.5 }
This statement creates a dictionary with three entries—the strings “Bob,”
“Sue,” and “Dick”—which have the associated values 3.9, 3.9, and 2.5, respec-
tively. Note it’s perfectly fine to duplicate the value 3.9, because it’s not a key.
As usual, grade_dict is only a name, and you can give a dictionary almost
any name you want (as long as the name obeys the rules listed earlier). I’ve
chosen the name grade_dict, because it is suggestive of what this object is.
After a dictionary is created, you can always add a value through a state-
ment such as this:
grade_dict['Bill G'] = 4.0
This statement adds the key “Bill G” and associates it with the value 4.0.
That data is added to the dictionary named grade_dict. If the key “Bill G”
already exists, the statement is still valid; but it has the effect of replacing the
value associated with “Bill G” rather than adding Bill as a new entry.
You can print, or otherwise refer to, a value in the dictionary by using a
statement such as the following. Note what it does: It uses a string (“Bill G”)
as a key, a kind of index value, to find the data associated with that key.
print(grade_dict['Bill G']) # Print the value 4.0

Note that you can start with an empty dictionary and then add data to it.

grade_dict = { }
Additional rules apply to selecting types for use in dictionaries:

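◗ Keys need not all share the same type, but keys of incompatible types (such
as strings and integers) cannot be compared, so they cannot be sorted together
in Python 3. Using a single key type, or compatible types such as integers and
floating point, avoids this problem.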
◗ The key type should be immutable (data you cannot change “in place”).
Strings and tuples are immutable, but lists are not.
◗ Therefore, lists such as [1, 2] cannot be used for keys, but tuples, such as
(1, 2), can.
◗ The values may be of any type; however, it is often a good idea to use the same
type, if possible, for all the value objects.
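The rule about immutable keys can be sketched as follows. The dictionary name grid and its contents are assumptions made for illustration.

```python
grid = {}
grid[(1, 2)] = 'treasure'        # A tuple is immutable, so it can be a key.

try:
    grid[[1, 2]] = 'trap'        # A list is mutable; Python rejects it.
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Values, unlike keys, may be of any type, including lists.
grid[(3, 4)] = ['any', 'type', 'of', 'value']
print(grid, list_key_ok)
```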

There’s a caution you should keep in mind. If you attempt to get the value
for a particular key and if that key does not exist, Python raises a KeyError
exception. To avoid this, use the get method, which returns a fallback value
instead of raising an exception when the key is missing.
Key Syntax

dictionary.get(key [,default_value])
In this syntax, the square brackets indicate an optional item. If the key
exists, its corresponding value in the dictionary is returned. Otherwise, the
default_value is returned, if specified; or None is returned if there is no
such default value. This second argument enables you to write efficient histo-
gram code such as the following, which counts frequencies of words.
s = (input('Enter a string: ')).split()
wrd_counter = {}
for wrd in s:
    wrd_counter[wrd] = wrd_counter.get(wrd, 0) + 1
What this example does is the following: When it finds a new word, that
word is entered into the dictionary with the value 0 + 1, or just 1. If it finds an
existing word, that word frequency is returned by get, and then 1 is added to
it. So if a word is found, its frequency count is incremented by 1. If the word
is not found, it’s added to the dictionary with a starting count of 1. Which is
what we want.
In this example, the split method of the string class is used to divide a
string into a list of individual words. For more information on split, see Sec-
tion 2.12, “Breaking Up Input Using ‘split’.”
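Here is a non-interactive sketch of the same histogram technique, using a fixed sample sentence (an assumption made so that the result is predictable). It also compares the result with collections.Counter from the standard library, which the text does not cover at this point.

```python
from collections import Counter

text = 'the cat and the hat and the bat'
wrd_counter = {}
for wrd in text.split():
    # get returns 0 for a word not seen before, so no exception occurs.
    wrd_counter[wrd] = wrd_counter.get(wrd, 0) + 1

missing = wrd_counter.get('dog')             # No default: returns None.
missing_default = wrd_counter.get('dog', 0)  # Default supplied: returns 0.

# Counter builds the same table in a single call.
same_as_counter = wrd_counter == Counter(text.split())
print(wrd_counter, missing, missing_default, same_as_counter)
```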


1.19 Sets
Sets are similar to dictionaries, but they lack associated values. A set, in effect,
is only a set of unique keys, which has the effect of making a set different from
a list in the following ways:

◗ All its members must be unique. An attempt to add an existing value to a set is
simply ignored.
◗ All its members should be immutable, as with dictionary keys.
◗ Order is never significant.

For example, consider the following two set definitions:


b_set1 = { 'John', 'Paul', 'George', 'Pete' }
b_set2 = { 'John', 'George', 'Pete', 'Paul' }
These two sets are considered fully equal to each other, as are the following
two sets:
set1 = {1, 2, 3, 4, 5}
set2 = {5, 4, 3, 2, 1}
Once a set is created, you can manipulate contents by using the add and
remove methods. For example:
b_set1.remove('Pete')
b_set1.add('Ringo')
(Don’t you always feel sorry for Pete?)
Note that when creating a new set, you cannot simply use a pair of empty
curly braces, because that syntax is used to create empty dictionaries. Instead,
use the following syntax:
my_set = set()
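The points made so far about sets can be sketched in a few lines. The variable names are illustrative; discard is a standard set method that, unlike remove, does not raise an exception when the member is absent.

```python
maybe_dict = {}                   # Empty braces create a dictionary.
my_set = set()                    # This creates an empty set.
kinds = (type(maybe_dict).__name__, type(my_set).__name__)

b_set = {'John', 'Paul', 'George'}
b_set.add('John')                 # Already a member, so nothing changes.
size = len(b_set)

try:
    b_set.remove('Pete')          # remove raises KeyError if not found.
    removed = True
except KeyError:
    removed = False
b_set.discard('Pete')             # discard silently does nothing.

order_irrelevant = {1, 2, 3} == {3, 2, 1}
print(kinds, size, removed, order_irrelevant)
```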
Set collections also support the union and intersection methods, as well as
use of the following operators:
setA = {1, 2, 3, 4}
setB = {3, 4, 5}
setUnion = setA | setB # Assign {1, 2, 3, 4, 5}
setIntersect = setA & setB # Assign {3, 4}
setXOR = setA ^ setB # Assign {1, 2, 5}
setSub = setA - setB # Assign {1, 2}

In these examples, setUnion and setIntersect are the results of union
and intersection operations, respectively. setXOR is the result of an either/or
operation; it has all those elements that appear in one set or the other but not
both. setSub contains elements that are in the first set (setA in this case) but
not the second (setB).
Appendix C, “Set Methods,” lists all the methods supported by the set
class, along with examples for most of them.
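Each set operator has a corresponding method. The following sketch pairs them up and confirms that each pair gives the same result; the names pairs and all_match are arbitrary.

```python
setA = {1, 2, 3, 4}
setB = {3, 4, 5}

# Each tuple holds (result of method call, result of operator).
pairs = [
    (setA.union(setB), setA | setB),
    (setA.intersection(setB), setA & setB),
    (setA.symmetric_difference(setB), setA ^ setB),
    (setA.difference(setB), setA - setB),
]
all_match = all(by_method == by_op for by_method, by_op in pairs)
print(all_match)
```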

1.20 Global and Local Variables


Python variables can be global or local, just as in other languages. Some pro-
grammers discourage the use of global variables, but when you need them,
you need them.
What is a global variable? It’s a variable that retains its value between
function calls and is visible to all functions. So a change to my_global_var
made in one function is reflected in the value of my_global_var seen by another.
If a variable x is referred to within a function definition, then the local
version of x is used—provided such a variable exists at the local level. Other-
wise, a global version of the variable is used if it exists.
Local scope, as opposed to global, means that changes to the variable have
no effect on variables having the same name outside the function definition.
The variable in that case is private. But a global variable is visible everywhere.
For example, the following statements create two versions of count: a local
version and a global one. But by default, the function uses its own (local) ver-
sion of the variable.
count = 10
def funcA():
    count = 20
    print(count)    # Prints 20, a local value.

def funcB():
    print(count)    # Prints 10, the global value.
Do you see how this works? The first function in this example uses its
own local version of count, because such a variable was created within that
function.
But the second function, funcB, created no such variable. Therefore, it uses the
global version, which was created in the first line of the example (count = 10).
The difficulty occurs when you want to refer to a global version of a variable,
but you make it the target of an assignment statement. Python has no
concept of data declarations, so adding an assignment statement has the effect
of creating a new variable. And that’s a problem, because when it creates a
variable, then by default the variable will be local if it’s inside a function.
For example, suppose you have funcB change the value of count. You can
do so, but now funcB refers only to its own private copy of count. If you
were relying on the function to change the value of count recognized every-
where, you’re out of luck.
def funcB():
    count = 100     # count now is local, no effect
                    # on global version of count.
    print(count)    # Prints 100, the local value.
The solution is to use the global statement. This statement tells Python to
avoid using a local version of the variable; it therefore must refer to the global
version, assuming it exists. Here’s an example:
count = 10          # Variable created as global.

def my_func():
    global count
    count += 1

my_func()           # Call my_func.
print(count)        # Prints 11.
Now, calling my_func causes the value of count to be changed, and this
affects program code outside the function itself, as you can see. If my_func
had referred to a local copy of count, then it would have no effect on count
outside the function.
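It may help to see what happens when the global statement is left out. In the sketch below (function names are hypothetical), the statement count += 1 makes count local to the function, and because that local has no value yet, Python raises an UnboundLocalError.

```python
count = 10

def broken_increment():
    count += 1          # No global statement, so count is local here
                        # and is read before it has been assigned.

try:
    broken_increment()
    error_name = None
except UnboundLocalError as err:
    error_name = type(err).__name__

def working_increment():
    global count        # Refer to the module-level count.
    count += 1

working_increment()
print(error_name, count)
```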
The global statement itself does not create anything; you need an assign-
ment to do that. In the previous example, count was created in the statement
preceding the function definition.
Module-level code, which consists of all statements outside function and
class definitions, enables you to create global variables. But so does the fol-
lowing code, which—upon being executed—creates a variable foo if it does
not already exist.
def my_func():
    global foo
    foo = 5     # Create foo if it does not already
                # exist (as a global).
    print(foo)

Assuming foo does not already exist, the effect of this function is to create
foo and set it to 5. Because of the statement global foo, it cannot be created
as a local, and therefore foo is created as a global variable. This works
even though the assignment to foo is not part of module-level code.
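A brief sketch can confirm this behavior by looking the name up in the module's global namespace after the call. The names make_setting and setting are hypothetical; globals is a built-in function that returns the current global namespace as a dictionary.

```python
def make_setting():
    global setting      # 'setting' is a hypothetical name.
    setting = 5         # Created as a global when the function runs.

make_setting()
created = 'setting' in globals()
print(created, setting)
```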
In general, there is a golden rule about global and local variables in Python.
It’s simply this:

✱ If there’s any chance that a function might attempt to assign a value to a global
variable, use the global statement so that it’s not treated as local.

Chapter 1 Summary
Chapter 1 covers the fundamentals of Python except for class definitions,
advanced operations on collections, and specialized parts of the library such
as file operations. The information presented here is enough to write many
Python programs.
So congratulations! If you understand everything in this chapter, you are
already well on the way to becoming a fluent Python programmer. The next
couple of chapters plunge into the fine points of lists and strings, the two most
important kinds of collections.
Chapter 3 covers something called “comprehension” in Python (not
to be confused with artificial intelligence) and explains how comprehension
applies not only to lists but also to sets, dictionaries, and other collections. It
also shows you how to use lambda functions.

Chapter 1 Review Questions


1 Considering that there are no data declarations in Python, is it even theoreti-
cally possible to have uninitialized data?
2 In what sense are Python integers “infinite,” and in what sense are they not
infinite at all?
3 Is a class having infinite range even theoretically possible?
4 How exactly is indentation in Python more critical than in most other pro-
gramming languages?
5 The best policy is to use a completely consistent indentation scheme through-
out a Python program, but does Python give you some leeway? Exactly where
must indentation be consistent in a program? Where can it differ? Show exam-
ples if you can.


6 Explain precisely why tab characters can cause a problem with the indentations
used in a Python program (and thereby introduce syntax errors).
7 What is the advantage of having to rely so much on indentation in Python?
8 How many different values can a Python function return to the caller?
9 Recount this chapter’s solution to the forward reference problem for func-
tions. How can such an issue arise in the first place?
10 When you’re writing a Python text string, what, if anything, should guide
your choice of what kind of quotation marks to use (single, double, or triple)?
11 Name at least one way in which Python lists are different from arrays in other
languages, such as C, which are contiguously stored collections of a single
base type.

Chapter 1 Suggested Problems


1 Write a little program that asks for your name, age, and address, and then
prints all the information you just entered. However, instead of placing it in
a function called main, place it in a function called test_func. Then call
test_func to run it.
2 Write a program that gets the radius of a sphere, calculates the volume, and
then prints the answer. If necessary, look up the volume formula online.



2 Advanced String Capabilities
How does a computer communicate messages to humans? Through hand-
waving, smoke signals, or (as in sci-fi movies of the 1950s) a blinking red light?
No. Even programs that utilize voice or voice recognition (somewhat out-
side the scope of this book) depend on groups of printable characters called
text strings, or just strings. Every programmer needs to manage the art of
prompting for, searching, and printing these strings. Fortunately, Python
excels at this task.
Even if you’ve used Python text strings before, you’ll likely want to peruse
this chapter to make sure that you’re using all the built-in capabilities of
Python strings.

2.1 Strings Are Immutable


Data types in Python are either mutable (changeable) or immutable.
The advantage of mutable types is clear. The data can be changed “in
place,” meaning you don’t have to reconstruct an object from scratch every
time you make a change. Mutable types include lists, dictionaries, and sets.
The advantage of immutable types is less obvious but important. An
immutable type can be used as a key for a dictionary; such keys are frequently
strings. For example, you might have a ratings dictionary to list average rat-
ings from a group of critics.
movie_dict = { 'Star Bores': 5.0,
'The Oddfather': 4.5,
'Piranha: The Revenge': 2.0 }
Another advantage of immutable types is that because they cannot be
changed, their usage is optimized internally. Using tuples, for example, is
somewhat more efficient than using lists.


The limitation of immutable types is that such data cannot be changed in
place. The following statements, for example, are not valid.
my_str = 'hello, Dave, this is Hal.'
my_str[0] = 'H' # ERROR!
The second statement in this example is invalid because it attempts to take
the string created in the first statement and modify the data itself. As a result,
Python raises a TypeError exception.
But the following statements are valid.
my_str = 'hello'
my_str = 'Hello'
These statements are valid because each time, a completely new string is
created, and the name my_str is reassigned.
In Python, a variable is nothing more than a name, and it may be reused,
over and over. That’s why these last statements might seem to violate immu-
tability of strings but in fact do not. No existing string is altered in this last
example; rather, two different strings are created and the name my_str is
reused.
This behavior follows from the nature of assignment in Python and its lack
of data declarations. You can reuse a name as often as you want.
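The difference between changing a string and reusing a name can be sketched with a second name that refers to the original string. The name alias is an assumption made for this illustration.

```python
my_str = 'hello'
alias = my_str                  # A second name for the same string object.
same_before = alias is my_str

my_str = 'Hello'                # A new string is created; my_str is rebound.
same_after = alias is my_str

# The original string was never altered, as alias shows.
print(alias, my_str, same_before, same_after)
```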

2.2 Numeric Conversions, Including Binary


Type names in Python implicitly invoke type conversions wherever such con-
versions are supported.
Key Syntax

type(data_object)
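The action is to take the specified data_object and produce the result
after converting it to the specified type, if the appropriate conversion exists.
If the value cannot be converted (for example, int('abc')), Python raises a
ValueError exception; if the object's type is not supported at all, it raises
a TypeError.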
Here are some examples:
s = '45'
n = int(s)
x = float(s)
If you then print n and x, you get the following:
45
45.0
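Because a failed conversion raises an exception, code that converts user input often guards the call. The helper to_int below is a hypothetical sketch, not part of Python or of the text.

```python
def to_int(text, default=None):
    """Return int(text), or default if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

a = to_int('45')
b = to_int('4.5')       # int does not accept a decimal point in a string.
c = float('4.5')        # float does.
d = to_int('abc', 0)    # Falls back to the supplied default.
print(a, b, c, d)
```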

The int conversion, unlike most conversions, takes an optional second
argument. This argument enables you to convert a string to a number while
interpreting it in a different radix, such as binary. Here’s an example:
n = int('10001', 2) # Interpret in binary radix.
Printing n reveals it was assigned the decimal value 17.

Likewise, you can use other bases with the int conversion. The following
code uses octal (8) and hexadecimal (16) bases.
n1 = int('775', 8)
n2 = int('1E', 16)
print('775 octal and 1E hex:', n1, n2)
These statements print the following results:
775 octal and 1E hex: 509 30
We can therefore summarize the int conversion as taking an optional sec-
ond argument, which has a default value of 10, indicating decimal radix.
Key Syntax

int(data_object, radix=10)
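Going the other way, the built-in functions bin, oct, and hex produce string representations in those bases. The sketch below round-trips a few values. (In Python's own documentation the second parameter of int is named base; radix is this book's term for it.)

```python
n = int('10001', 2)         # Binary string to integer.
b = bin(n)                  # Integer back to a binary string.
o = oct(int('775', 8))
h = hex(int('1E', 16))

# A prefix such as '0b' is accepted when it matches the base given.
round_trip = int(b, 2)
digits_only = format(n, 'b')   # Binary digits without the prefix.
print(n, b, o, h, round_trip, digits_only)
```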
The int and float conversions are necessary when you get input from the
keyboard (usually by using the input function) or get input from a text
file, and you need to convert the digit characters into an actual numeric value.
A str conversion works in the opposite direction. It converts a number into
its string representation. In fact, it works on any type of data for which the
type defines a string representation.
Converting a number to a string enables you to do operations such as
counting the number of printable digits or counting the number of times a
specific digit occurs. For example, the following statements print the length of
the number 1007.
n = 1007
s = str(n) # Convert to '1007'
print('The length of', n, 'is', len(s), 'digits.')
This example prints the following output:
The length of 1007 is 4 digits.
There are other ways to get this same information. You could, for exam-
ple, use the mathematical operation that takes the base-10 logarithm. But
this example suggests what you can do by converting a number to its string
representation.
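The two approaches can be compared in a short sketch. The logarithm version assumes n is a positive integer, and floating-point rounding could make it unreliable for very large values, so treat it as an illustration only.

```python
import math

n = 1007
digits_by_str = len(str(n))                     # Count the characters.
digits_by_log = math.floor(math.log10(n)) + 1   # Assumes n > 0.
zeros = str(n).count('0')                       # Occurrences of the digit 0.
print(digits_by_str, digits_by_log, zeros)
```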


Note: Converting a number to its string representation is not the same as
converting characters to their ASCII or Unicode code points. That's a different
operation, and it must be done one character at a time by using the ord function.
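A small sketch shows the difference between the two operations. The list comprehension used here is covered later in the book; chr is the built-in inverse of ord.

```python
s = str(65)                        # The two-character string '65'.
codes = [ord(ch) for ch in s]      # Code points of '6' and '5'.
letter = chr(65)                   # The character whose code point is 65.
print(s, codes, letter)
```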

2.3 String Operators (+, =, *, >, etc.)


The string type, str, supports some of the same operators that numeric types
do, but interprets them differently. For example, addition (+) becomes string
concatenation when applied to strings rather than numbers.
Here’s an example of some valid string operators: assignment and test for
equality.
dog1_str = 'Rover' # Assignment
dog2_str = dog1_str # Create alias for.

dog1_str == dog2_str # True!


dog1_str == 'Rover' # True!
In this example, the second statement creates a reference, or alias, for the same
data that dog1_str refers to. (If, however, dog1_str is later assigned to new
data, dog2_str still refers to 'Rover'.) Because dog1_str and dog2_str
refer to the same data, the first test for equality must produce the value True.
But the second test for equality also returns True. As long as two strings
have the same content, they are considered equal. They do not necessarily
have to be aliases for the same data in memory.
All operator-based comparisons with Python strings are case-sensitive.
There are several ways to ignore case. You can convert both operands to
uppercase or both to lowercase (by using the upper or lower string method),
and that will work fine with strings that contain ASCII characters only.
However, if you’re working with strings that use the wider Unicode character
set, the safest way to do case-insensitive comparisons is to use the casefold
method, provided specifically for this purpose.
def compare_no_case(str1, str2):
    return str1.casefold() == str2.casefold()

print(compare_no_case('cat', 'CAT'))    # Prints True.
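To see why casefold is safer than lower outside of ASCII, consider the German sharp s (written here with the escape \u00df). This example is an illustration chosen for this sketch, not one taken from the text.

```python
word1 = 'Stra\u00dfe'      # 'Strasse' spelled with the sharp s
word2 = 'STRASSE'

# lower leaves the sharp s unchanged, so the strings still differ.
equal_with_lower = word1.lower() == word2.lower()

# casefold converts the sharp s to 'ss', so the strings match.
equal_with_casefold = word1.casefold() == word2.casefold()
print(equal_with_lower, equal_with_casefold)
```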


Table 2.1 lists the operators available with the str type.

Table 2.1. String Operators

OPERATOR SYNTAX     DESCRIPTION
name = str          Assigns the string data to the specified variable name.
str1 == str2        Returns True if str1 and str2 have the same contents. (As with all
                    comparison ops, this is case-sensitive.)
str1 != str2        Returns True if str1 and str2 have different contents.
str1 < str2         Returns True if str1 is earlier in alphabetical ordering than str2. For
                    example, 'abc' < 'def' returns True, but 'abc' < 'aaa' returns False.
                    (See the note about ordering.)
str1 > str2         Returns True if str1 is later in alphabetical ordering than str2. For
                    example, 'def' > 'abc' returns True, but 'def' > 'xyz' returns False.
str1 <= str2        Returns True if str1 is earlier than str2 in alphabetical ordering or if
                    the strings have the same content.
str1 >= str2        Returns True if str1 is later than str2 in alphabetical ordering or if
                    the strings have the same content.
str1 + str2         Produces the concatenation of the two strings, which is the result of
                    simply gluing str2 contents onto the end of str1. For example,
                    'Big' + 'Deal' produces the concatenated string 'BigDeal'.
str1 * n            Produces the result of a string concatenated onto itself n times, where
                    n is an integer. For example, 'Goo' * 3 produces 'GooGooGoo'.
n * str1            Same effect as str1 * n.
str1 in str2        Produces True if the substring str1, in its entirety, is contained in str2.
str1 not in str2    Produces True if the substring str1 is not contained in str2.
str is obj          Returns True if str and obj refer to the same object in memory; sometimes
                    necessary for comparisons to None or to an unknown object type.
str is not obj      Returns True if str and obj do not refer to the same object in memory.

Note: When strings are compared, Python uses a form of alphabetical order;
more specifically, it uses code point order, which looks at ASCII or Unicode
values of the characters. In this order, all uppercase ASCII letters precede all
lowercase letters, but otherwise letters compare alphabetically, as you'd
expect. Single-digit comparisons also work as you’d expect, so that '1' is less
than '2'. Multi-digit strings, however, are compared character by character,
so '10' is less than '9'.

The concatenation operator (+) for strings may be familiar, because it is
supported in many languages that have some kind of string class.


Concatenation does not automatically add a space between two words. You
have to do that yourself. But all strings, including literal strings such as ' ',
have the same type, str, so Python has no problem carrying out the following:
first = 'Will'
last = 'Shakespeare'
full_name = first + ' ' + last
print(full_name)
This example prints
Will Shakespeare
The string-multiplication operator (*) can be useful when you’re doing
character-oriented graphics and want to initialize a long line—a divider, for
example.
divider_str = '_' * 30
print(divider_str)
This prints the following:
______________________________
The result of this operation, '_' * 30, is a string made up of 30 underscores.

Performance Tip: There are other ways of creating a string containing 30
underscores in a row, but the multiplication operator (*) is the most concise
and is typically the most efficient.

Be careful not to abuse the is and is not operators. These operators test
for whether or not two values are the same object in memory. You could have
two string variables, for example, which both contain the value "cat". Test-
ing them for equality (==) will always return True in this situation, but obj1
is obj2 might not.
When should you use is or is not? You should use them primarily when
you’re comparing objects of different types, for which the appropriate test for
equality (==) might not be defined. One such case is testing to see whether
some value is equal to the special value None, which is unique and therefore
appropriate to test using is.
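The distinction can be sketched as follows. The function name describe is a hypothetical helper introduced only for this illustration. The sketch deliberately does not print the result of a is b, because whether two equal strings share one object depends on the Python implementation.

```python
def describe(value):
    """Hypothetical helper: return a short description of value."""
    if value is None:            # identity test; None is a unique object
        return 'no value'
    return 'value: ' + str(value)

a = 'cat'
b = ''.join(['c', 'a', 't'])     # built at run time, with equal contents

print(a == b)           # True: the contents match
print(describe(None))   # no value
print(describe(b))      # value: cat
```

Use == whenever the question is "do these strings hold the same text?" and reserve is for identity questions such as the None test.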


2.4 Indexing and Slicing


Two ways to extract data from strings are indexing and slicing:

◗ Indexing uses a number to refer to an individual character, according to its
place within the string.
◗ Slicing is an ability more distinctive to Python. It enables you to refer to an
entire substring of characters by using a compact syntax.

Lists support similar abilities, so Chapter 3, “Advanced List Capabilities,”
should look familiar. However, there are some differences. The biggest one is this:

✱ You cannot use indexing, slicing, or any other operation to change values of a
string “in place,” because strings are immutable.

You can use both positive (nonnegative) and negative indexes in any combi-
nation. Figure 2.1 illustrates how positive indexes run from 0 to N–1, where N
is the length of the string.
This figure also illustrates negative indexes, which run backward from –1
(indicating the last character) to –N.

  K    i    n    g         M    e    !
  0    1    2    3    4    5    6    7
 -8   -7   -6   -5   -4   -3   -2   -1
Figure 2.1. String indexing in Python
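The index positions shown in the figure can be checked directly. This is a small sketch; the variable names are chosen here for illustration.

```python
s = 'King Me!'
n = len(s)            # 8 characters, including the space

first = s[0]          # 'K'
last = s[-1]          # '!', the same character as s[n - 1]
also_first = s[-n]    # 'K', the same character as s[0]
middle = s[4]         # ' ', the space between the two words

print(first, last, also_first, repr(middle))
```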

Aside from immutability, there’s another difference between strings and
lists. Indexing a string always produces a one-character string, assuming the
index is valid. A one-character string has str type, just as a larger string does.
So, for example, suppose you index the first character of 'Hello'; the
result is the string 'H'. Although its length is 1, it’s still a string.
s = 'Hello'
ch = s[0]
print(type(ch))


This code, if executed, prints the following result, demonstrating that
ch, though it contains only one character, still has type str:
<class 'str'>
Python has no separate “character” type.
Slicing is a special ability shared by Python strings, lists, and tuples. Table
2.2 summarizes the syntax supported for slicing of strings, which produces
substrings. Remember that you can’t assign into a slice of a string.

Table 2.2. Slicing Syntax for Python Strings
SYNTAX                  GETS THIS SUBSTRING
string[beg:end]         All characters starting with beg, up to but not including end.
string[:end]            All characters from the beginning of the string up to but
                        not including end.
string[beg:]            All characters from beg forward to the end of the string.
string[:]               All characters in the string. (With a list, this syntax makes a
                        copy; because strings are immutable, Python is free to return
                        the original string object instead.)
string[beg:end:step]    All characters starting with beg, up to but not including
                        end, moving through the string step items at a time.

Suppose you want to remove the first and last characters from a
string. In this case, you’ll want to combine positive and negative indexes. Start
with a string that includes opening and closing double quotation marks.
king_str = '"Henry VIII"'
If you print this string directly, you get the following:
"Henry VIII"
But what if you want to print the string without the quotation marks? An
easy way to do that is by executing the following code:
new_str = king_str[1:-1]
print(new_str)
The output is now
Henry VIII
Figure 2.2 illustrates how this works. In slicing operations, the slice begins
with the first argument, up to but not including the second argument.

king_str[1:-1]

  0   1                                      -1
  "   H   e   n   r   y       V   I   I   I   "

Sliced section includes 1, up to but not including -1
Figure 2.2. String slicing example 1

Here’s another example. Suppose we’d like to extract the second word,
“Bad,” from the phrase “The Bad Dog.” As Figure 2.3 illustrates, the correct
slice would begin with index 4 and extend to all the characters up to but not
including index 7. The string could therefore be accessed as string[4:7].

string[4:7]

  0   1   2   3   4   5   6   7   8   9  10
  T   h   e       B   a   d       D   o   g

Sliced section includes 4, up to but not including 7
Figure 2.3. String slicing example 2

The rules for slicing have some interesting consequences.

◗ If both beg and end are nonnegative indexes, end - beg gives the maximum
length of the slice.
◗ To get a string containing the first N characters of a string, use string[:N].
◗ To get a string containing the last N characters of a string, use string[-N:].
◗ To get a string containing every character of the original, use string[:]. (With
a list, this syntax makes a complete copy.)
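These consequences can be sketched with the phrase from Figure 2.3. The variable names are chosen here for illustration.

```python
s = 'The Bad Dog'

word = s[4:7]          # 'Bad'; at most 7 - 4 = 3 characters
first_three = s[:3]    # 'The'
last_three = s[-3:]    # 'Dog'
everything = s[:]      # 'The Bad Dog'

print(word, first_three, last_three, everything, sep=' | ')
```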

Slicing permits a third, and optional, step argument. When positive, the
step argument specifies how many characters to move ahead at a time. A
step argument of 2 means “Get every other character.” A step argument of
3 means “Get every third character.” For example, the following statements
start with the second character in 'RoboCop' and then step through the string
two characters at a time.


a_str = 'RoboCop'
b_str = a_str[1::2] # Get every other character.
print(b_str)
This example prints the following:
ooo
Here’s another example. A step value of 3 means “Get every third charac-
ter.” This time the slice, by default, starts in the first position.
a_str = 'AbcDefGhiJklNop'
b_str = a_str[::3] # Get every third character.
print(b_str)
This example prints the following:
ADGJN
You can even use a negative step value, which causes the slicing to be per-
formed backward through the string. For example, the following function
returns the exact reverse of the string fed to it as an argument.
def reverse_str(s):
    return s[::-1]

print(reverse_str('Wow Bob wow!'))
print(reverse_str('Racecar'))
This example prints the following:
!wow boB woW
racecaR
When slicing, Python does not raise an exception for out-of-range indexes.
It simply gets as much input as it can. In some cases, that may result in an
empty string.
a_str = 'cat'
b_str = a_str[10:20] # b_str assigned an empty string.
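The contrast with indexing is worth seeing side by side. In this short sketch, slicing silently returns what it can, whereas indexing with the same out-of-range position raises IndexError.

```python
a_str = 'cat'

empty = a_str[10:20]     # '' (nothing in range, but no exception)
partial = a_str[1:20]    # 'at' (the slice stops at the end of the string)
print(repr(empty), repr(partial))

try:
    a_str[10]            # indexing, unlike slicing, is range-checked
except IndexError:
    print('Index 10 is out of range.')
```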

2.5 Single-Character Functions (Character Codes)


There are two functions intended to be used with strings of length 1. In effect,
these are single-character functions, even though they operate on strings.

Key Syntax
ord(str)    # Returns a numeric code
chr(n)      # Converts ASCII/Unicode to a one-char str.


The ord function expects a string argument but raises a TypeError exception
if the length of the string is not exactly 1. You can use this function to return
the ASCII or Unicode value corresponding to a character. For example, the
following statement confirms that the ASCII code for the letter A is decimal 65.
print(ord('A')) # Print 65.
The chr function is the inverse of the ord function. It takes a character code
and returns its ASCII or Unicode equivalent, as a string of length 1. Calling chr
with an argument of 65 should therefore produce a letter A, which it does.
print(chr(65)) # Print 'A'
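Because the two functions are inverses, a round trip through ord and chr returns the original character. The sketch below also shows simple arithmetic on character codes and the TypeError raised for a longer string; the sample characters are arbitrary.

```python
for ch in 'Az9':
    code = ord(ch)
    round_trip = chr(code)         # chr undoes ord
    print(ch, code, round_trip)

next_letter = chr(ord('A') + 1)    # code 66 is 'B'
print(next_letter)

try:
    ord('AB')                      # more than one character
except TypeError as err:
    print('TypeError:', err)
```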
The in and not in operators, although not limited to use with one-character
strings, often are used that way. For example, the following statements test
whether the first character of a string is a vowel:
s = 'elephant'
if s[0] in 'aeiou':
    print('First char. is a vowel.')
Conversely, you could write a consonant test.
s = 'Helephant'
if s[0] not in 'aeiou':
    print('First char. is a consonant.')
One obvious drawback is that these examples do not correctly work on
uppercase letters. Here’s one way to fix that:
if s[0] in 'aeiouAEIOU':
    print('First char. is a vowel.')
Alternatively, you can convert a character to uppercase before testing it;
that has the effect of creating a case-insensitive comparison.
s = 'elephant'
if s[0].upper() in 'AEIOU':
    print('First char. is a vowel.')
You can also use in and not in to test substrings that contain more than
one character. In that case, the entire substring must be found to produce True.
'bad' in 'a bad dog' # True!
Is there bad in a bad dog? Yes, there is.


Notice that the in operator, if tested, always responds as if all strings
include the empty string, '', which differs from the way lists work. Python
does not return True if you ask whether a list contains the empty list.
print('' in 'cat') # Prints True
print([] in [1, 2, 3]) # Prints False
Another area in which single-character operations are important is in the
area of for loops and iteration. If you iterate through a list, you get access to
each list element. But if you iterate through a string, you get individual charac-
ters: again, these are each strings of length 1 rather than objects of a separate
“character” type.
s = 'Cat'
for ch in s:
    print(ch + ', type:', type(ch))
This code prints the following:
C, type: <class 'str'>
a, type: <class 'str'>
t, type: <class 'str'>
Because each of these characters is a string of length 1, we can print the cor-
responding ASCII values:
s = 'Cat'
for ch in s:
    print(ord(ch), end=' ')
This example prints the following:
67 97 116

2.6 Building Strings Using “join”


Considering that strings are immutable, you might well ask the following
question: How do you construct or build new strings?
Once again, the special nature of Python assignment comes to the rescue.
For example, the following statements build the string “Big Bad John”:
a_str = 'Big '
a_str = a_str + 'Bad '
a_str = a_str + 'John'

These are perfectly valid statements. They reuse the name a_str, each time
assigning a new string to the name. The end result is to create the following
string:
'Big Bad John'
The following statements are also valid, and even if they seem to violate
immutability, they actually do not.
a_str = 'Big '
a_str += 'Bad '
a_str += 'John'
This technique, of using =, +, and += to build strings, is adequate for simple
cases involving a few objects. For example, you could build a string containing
all the letters of the alphabet as follows, using the ord and chr functions
introduced in Section 2.5, “Single-Character Functions (Character Codes).”
n = ord('A')
s = ''
for i in range(n, n + 26):
    s += chr(i)
This example has the virtue of brevity. But it causes Python to create
entirely new strings in memory, over and over again.
An alternative, which is slightly better, is to use the join method.
Key Syntax
separator_string.join(list)
This method joins together all the strings in list to form one large string.
If this list has more than one element, the text of separator_string is placed
between each consecutive pair of strings. An empty string is a valid separator
string; in that case, all the strings in the list are simply joined together.
Use of join is usually more efficient at run time than concatenation, although
you probably won’t see the difference in execution time unless there are a great
many elements.
n = ord('A')
a_lst = [ ]
for i in range(n, n + 26):
    a_lst.append(chr(i))
s = ''.join(a_lst)
The join method concatenates all the strings in a_lst, a list of strings,
into one large string. The separator string is empty in this case.
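The same result can be written more compactly by passing join a generator expression instead of a prebuilt list. This is a sketch of an alternative style, not the only way to write it; the names alphabet, numbers, and csv_line are chosen here for illustration.

```python
# Build the alphabet in one step with join and a generator expression.
n = ord('A')
alphabet = ''.join(chr(i) for i in range(n, n + 26))
print(alphabet)

# join accepts only strings, so other values must be converted first.
numbers = [1, 2, 3]
csv_line = ','.join(str(x) for x in numbers)
print(csv_line)
```

Passing the integers directly, as in ','.join(numbers), would raise a TypeError.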


Performance Tip: The advantage of join over simple concatenation can be seen
in large cases involving thousands of operations. The drawback of concatenation
in such cases is that Python has to create thousands of strings of increasing
size, which are used once and then thrown away, through “garbage
collection.” But garbage collection exacts a cost in execution time, assuming it
is run often enough to make a difference.

Here’s a case in which the approach of using join is superior: Suppose you
want to write a function that takes a list of names and prints them one at a
time, nicely separated by commas. Here’s the hard way to write the code:
def print_nice(a_lst):
    s = ''
    for item in a_lst:
        s += item + ', '
    if len(s) > 0:      # Get rid of trailing
                        # comma+space
        s = s[:-2]
    print(s)
Given this function definition, we can call it on a list of strings.
print_nice(['John', 'Paul', 'George', 'Ringo'])
This example prints the following:
John, Paul, George, Ringo
Here’s the version using the join method:
def print_nice(a_lst):
    print(', '.join(a_lst))
That’s quite a bit less code!

2.7 Important String Functions


Many of the “functions” described in this chapter are actually methods: member
functions of the class that are called with the “dot” syntax.
But in addition to methods, the Python language has some important
built-in functions that are implemented for use with the fundamental types
of the language. The ones listed here apply especially well to strings.

Key Syntax
input(prompt_str)   # Prompt user for input string.
len(str)            # Return num. of chars in str.
max(str)            # Return char with highest code val.
min(str)            # Return char with lowest code val.
reversed(str)       # Return iter with reversed str.
sorted(str)         # Return list with sorted str.
One of the most important functions is len, which can be used with any
of the standard collection classes to determine the number of elements. In
the case of strings, this function returns the number of characters. Here’s an
example:
dog1 = 'Jaxx'
dog2 = 'Cutie Pie'
print(dog1, 'has', len(dog1), 'letters.')
print(dog2, 'has', len(dog2), 'letters.')
This prints the following strings. Note that “Cutie Pie” is reported as having
nine letters because len counts the space as a character.
Jaxx has 4 letters.
Cutie Pie has 9 letters.
The reversed and sorted functions produce an iterator and a list, respec-
tively, rather than strings. However, the output from these data objects can
be converted back into strings by using the join method. Here’s an example:
a_str = ''.join(reversed('Wow,Bob,wow!'))
print(a_str)
b_str = ''.join(sorted('Wow,Bob,wow!'))
print(b_str)
This prints the following:
!wow,boB,woW
!,,BWbooowww
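The max and min functions follow the same code point ordering that sorted uses. The sketch below reuses the dog name from the len example; the key=str.lower argument is a standard option of sorted that this section does not otherwise cover, shown here as one way to sort without regard to case.

```python
name = 'Cutie Pie'

highest = max(name)    # 'u' has the highest code point in this string
lowest = min(name)     # ' ' (the space) has the lowest
print(repr(highest), repr(lowest))

# Case-insensitive sort: compare characters by their lowercase form.
mixed = ''.join(sorted('WowBob', key=str.lower))
print(mixed)
```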

2.8 Binary, Hex, and Octal Conversion Functions


In addition to the str conversion function, Python supports three functions
that take numeric input and produce a string result. Each of these functions
produces a digit string in the appropriate base (2, 16, and 8, corresponding to
binary, hexadecimal, and octal).


Key Syntax
bin(n)    # Returns a string containing n in binary:
          # For example, bin(15) -> '0b1111'
hex(n)    # Returns a string containing n in hex:
          # For example, hex(15) -> '0xf'
oct(n)    # Returns a string containing n in octal:
          # For example, oct(15) -> '0o17'
Here’s another example, this one showing how 10 decimal is printed in
binary, octal, and hexadecimal.
print(bin(10), oct(10), hex(10))
This prints the following:
0b1010 0o12 0xa
As you can see, these three functions automatically use the prefixes “0b,”
“0o,” and “0x.”
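Two related built-ins are useful alongside these functions, although this section does not cover them: format can produce the same digits without the prefix, and int can convert a digit string back to a number when given the base. A brief sketch:

```python
n = 10

# format omits the prefix that bin, oct, and hex add.
no_prefix = (format(n, 'b'), format(n, 'o'), format(n, 'x'))
print(no_prefix)

# int converts a digit string back to a number, given the base.
back = (int('1010', 2), int('12', 8), int('a', 16))
print(back)

# Prefixed strings are accepted when the base matches the prefix,
# or when the base is 0 (meaning "infer the base from the prefix").
from_prefixed = (int(bin(n), 2), int(hex(n), 0))
print(from_prefixed)
```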

2.9 Simple Boolean (“is”) Methods


These methods—all of which begin with the word “is” in their name—return
either True or False. They are often used with single-character strings but
can also be used on longer strings; in that case, they return True if and only
if every character in the string passes the test. Table 2.3 shows the Boolean
methods of strings.

Table 2.3. Boolean Methods of Strings
METHOD NAME/SYNTAX    RETURNS TRUE IF STRING PASSES THIS TEST
str.isalnum()         All characters are alphanumeric (a letter or digit), and there is at
                      least one character.
str.isalpha()         All characters are letters of the alphabet, and there is at least one
                      character.
str.isdecimal()       All characters are decimal digits, and there is at least one character.
                      Similar to isdigit but intended to be used with Unicode characters.
str.isdigit()         All characters are decimal digits, and there is at least one character.
str.isidentifier()    The string contains a valid Python identifier (symbolic) name. The
                      first character must be a letter or underscore; each other character
                      must be a letter, digit, or underscore.
str.islower()         All letters in the string are lowercase, and there is at least one
                      letter. (There may, however, be nonalphabetic characters.)
str.isprintable()     All characters in the string, if any, are printable characters. This
                      excludes special characters such as \n and \t.
str.isspace()         All characters in the string are “whitespace” characters, and there is
                      at least one character.
str.istitle()         Every word in the string is a valid title, and there is at least one
                      character. This requires that each word be capitalized and that no
                      uppercase letter appear anywhere but at the beginning of a word.
                      There may be whitespace and punctuation characters in between words.
str.isupper()         All letters in the string are uppercase, and there is at least one
                      letter. (There may, however, be nonalphabetic characters.)

These methods are valid for use with single-character strings as well as
longer strings. The following code illustrates the use of both.
h_str = 'Hello'
if h_str[0].isupper():
    print('First letter is uppercase.')
if h_str.isupper():
    print('All chars are uppercase.')
else:
    print('Not all chars are uppercase.')
This example prints the following:
First letter is uppercase.
Not all chars are uppercase.
This string would also pass the test for being a title, because the first letter
is uppercase and the rest are not.
if h_str.istitle():
    print('Qualifies as a title.')
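A quick way to get a feel for these tests is to run several of them over a handful of strings. The sample strings below are arbitrary choices for this sketch. Note two cases in particular: the empty string fails every test shown, and the superscript character '²' counts as a digit for isdigit but not for isdecimal.

```python
samples = ['Hello', 'HELLO!', 'abc123', '42', '²', ' \t\n', '']

# Map each sample string to the results of a few "is" methods.
results = {}
for s in samples:
    results[s] = {
        'isalpha': s.isalpha(),
        'isalnum': s.isalnum(),
        'isupper': s.isupper(),
        'isdigit': s.isdigit(),
        'isdecimal': s.isdecimal(),
        'isspace': s.isspace(),
    }
    print(repr(s), results[s])
```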

2.10 Case Conversion Methods


The methods in the previous section test for uppercase versus lowercase let-
ters. The methods in this section perform conversion to produce a new string.
Key Syntax
str.lower()       # Produce all-lowercase string
str.upper()       # Produce all-uppercase string
str.title()       # 'foo foo'.title() => 'Foo Foo'
str.swapcase()    # Upper to lower, and vice versa


The effects of the lower and upper methods are straightforward. The first
converts each uppercase letter in a string to a lowercase letter; the second does
the converse, converting each lowercase letter to an uppercase letter. Nonletter
characters are not altered but kept in the string as is.
The result, after conversion, is then returned as a new string. The original
string data, being immutable, isn’t changed “in place.” But the following state-
ments do what you’d expect.
my_str = "I'm Henry VIII, I am!"
new_str = my_str.upper()
my_str = new_str
The last two steps can be efficiently merged:
my_str = my_str.upper()
If you then print my_str, you get the following:
I'M HENRY VIII, I AM!
The swapcase method is used only rarely. The string it produces has an
uppercase letter where the source string had a lowercase letter, and vice versa.
For example, starting again from the original mixed-case string:
my_str = "I'm Henry VIII, I am!"
my_str = my_str.swapcase()
print(my_str)
This prints the following:
i'M hENRY viii, i AM!
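The title method deserves a brief illustration as well, because it treats any letter that follows a nonletter as the start of a new word. The sample phrases in this sketch are made up for the purpose.

```python
title_1 = 'the bad dog'.title()
title_2 = "they're here".title()    # the apostrophe starts a new "word"
print(title_1)    # The Bad Dog
print(title_2)    # They'Re Here

# For an ASCII string, applying swapcase twice restores the original.
s = "I'm Henry VIII, I am!"
restored = s.swapcase().swapcase()
print(restored == s)    # True
```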

2.11 Search-and-Replace Methods


The search-and-replace methods are among the most useful of the str class
methods. In this section, we first look at startswith and endswith, and
then present the other search-and-replace functions.
Key Syntax
str.startswith(substr)    # Return True if prefix found.
str.endswith(substr)      # Return True if suffix found.
One of the authors wrote an earlier book, Python Without Fear (Addison-
Wesley, 2018), which features a program that converts Roman numerals to
decimal. It has to check for certain combinations of letters at the beginning of
the input string—starting with any number of Roman numeral Ms.

while romstr.startswith('M'):
    amt += 1000             # Add 1,000 to running total.
    romstr = romstr[1:]     # Strip off first character.
The endswith method, conversely, looks for the presence of a target sub-
string as the suffix. For example:

me_str = 'John Bennett, PhD'
is_doc = me_str.endswith('PhD')
These methods, startswith and endswith, can be used on an empty
string without raising an error. If the substring is empty, the return value is
always True.
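Both methods also accept a tuple of strings, in which case they return True if any one of the alternatives matches. That feature is not described above, so treat the following as a supplementary sketch; the file names are made-up sample data.

```python
filenames = ['report.txt', 'photo.PNG', 'notes.md', 'logo.jpg']  # sample data

image_suffixes = ('.png', '.jpg', '.gif')
images = [name for name in filenames
          if name.lower().endswith(image_suffixes)]
print(images)

empty_prefix = 'abc'.startswith('')     # empty substring: always True
empty_string = ''.startswith('a')       # empty string: no error, just False
print(empty_prefix, empty_string)
```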
Now let’s look at other search-and-replace methods of Python strings.
Key Syntax
str.count(substr [, beg [, end]])
str.find(substr [, beg [, end]])
str.index()                         # Like find, but raises exception
str.rfind()                         # Like find, but starts from end
str.replace(old, new [, count])     # count is optional; limits
                                    # no. of replacements
In this syntax, the brackets are not intended literally but represent optional
items.
The count method reports the number of occurrences of a target substring.
Here’s how it works.
frank_str = 'doo be doo be doo...'

n = frank_str.count('doo')
print(n) # Print 3.
You can optionally use the start and end arguments (shown as beg and end in
the syntax summary) with this same method call.
print(frank_str.count('doo', 1)) # Print 2
print(frank_str.count('doo', 1, 10)) # Print 1
A start argument of 1 specifies that counting begins with the second char-
acter. If start and end are both used, then counting happens over a target
string beginning with start position up to but not including the end position.
These arguments are zero-based indexes, as usual.
If either or both of the arguments (beg, end) are out of range, the count
method does not raise an exception but works on as many characters as it can.


Similar rules apply to the find method. A simple call to this method finds
the first occurrence of the substring argument and returns the nonnegative
index of that instance; it returns –1 if the substring isn’t found.
frank_str = 'doo be doo be doo...'

print(frank_str.find('doo')) # Print 0
print(frank_str.find('doob')) # Print -1
If you want to find the positions of all occurrences of a substring, you can
call the find method in a loop, as in the following example.
frank_str = 'doo be doo be doo...'
n = -1
while True:
    n = frank_str.find('doo', n + 1)
    if n == -1:
        break
    print(n, end=' ')
This example prints every index at which an instance of 'doo' can be
found.
0 7 14
This example works by taking advantage of the start argument. After
each successful call to the find method, the initial searching position, n, is set
to the previous successful find index and then is adjusted upward by 1. This
guarantees that the next call to the find method must look for a new instance
of the substring.
If the find operation fails to find any occurrences, it returns a value of –1.
The index and rfind methods are almost identical to the find method,
with a few differences. The index method does not return –1 when it fails to
find an occurrence of the substring. Instead it raises a ValueError exception.
The rfind method searches for the last occurrence of the substring argu-
ment. By default, this method starts at the end and searches to the left. How-
ever, this does not mean it looks for a reverse of the substring. Instead, it
searches for a regular copy of the substring, and it returns the starting index
number of the last occurrence—that is, where the last occurrence starts.
frank_str = 'doo be doo be doo...'
print(frank_str.rfind('doo')) # Prints 14.
The example prints 14 because the rightmost occurrence of 'doo' starts in
zero-based position 14.
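The difference between index and find matters mostly in how failure is handled. In the sketch below, find_or_none is a hypothetical helper written for this illustration; it wraps index so that a missing substring yields None rather than an exception. The last line shows that rfind accepts the same optional range arguments as find.

```python
frank_str = 'doo be doo be doo...'

def find_or_none(text, target):
    """Hypothetical helper: like index, but returns None on failure."""
    try:
        return text.index(target)
    except ValueError:
        return None

hit = find_or_none(frank_str, 'be')        # first 'be' starts at index 4
miss = find_or_none(frank_str, 'doob')     # not present
last_be = frank_str.rfind('be')            # last 'be' starts at index 11
bounded = frank_str.rfind('doo', 0, 14)    # search only positions 0 to 13
print(hit, miss, last_be, bounded)
```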

Finally, the replace method replaces each and every occurrence of an old
substring with a new substring. This method, as usual, produces the resulting
string, because it cannot change the original string in place.
For example, let’s say we have a set of book titles but want to change the
spelling of the word “Grey” to “Gray.” Here’s an example:

title = '25 Hues of Grey'
new_title = title.replace('Grey', 'Gray')
Printing new_title produces this:
25 Hues of Gray
The next example illustrates how replace works on multiple occurrences
of the same substring.
title = 'Greyer Into Grey'
new_title = title.replace('Grey', 'Gray')
The new string is now
Grayer Into Gray
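The optional count argument listed in the syntax summary limits how many occurrences are replaced, working from the left. A short sketch using the same title:

```python
title = 'Greyer Into Grey'

first_only = title.replace('Grey', 'Gray', 1)   # replace one occurrence only
print(first_only)

unchanged = title.replace('Blue', 'Red')        # no match: text is unchanged
print(unchanged)
print(title)                                    # the original is never modified
```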

2.12 Breaking Up Input Using “split”


One of the most common programming tasks when dealing with character
input is tokenizing—breaking down a line of input into individual words,
phrases, and numbers. Python’s split method provides an easy and conve-
nient way to perform this task.
Key Syntax
input_str.split(delim_string=None)
The call to this method returns a list of substrings taken from input_str.
The delim_string specifies a string that serves as the delimiter; this
is a substring used to separate one token from another.
If delim_string is omitted or is None, then the behavior of split is to, in
effect, use any sequence of one or more whitespace characters (spaces, tabs,
and newlines) to distinguish one token from the next.
For example, the split method, using the default delimiter of whitespace,
can be used to break up a string containing several names.
stooge_list = 'Moe Larry Curly Shemp'.split()
The resulting list, if printed, is as follows:
['Moe', 'Larry', 'Curly', 'Shemp']


The behavior of split with a None or default argument uses any number
of white spaces in a row as the delimiter. Here’s an example:
stooge_list = 'Moe    Larry Curly  Shemp'.split()
If, however, a delimiter string is specified, it must be matched precisely to
recognize a divider between one token and the next.
stooge_list = 'Moe    Larry Curly  Shemp'.split(' ')
In this case, the split method recognizes an extra string—although it is
empty—wherever there’s an extra space. That might not be the behavior you
want. The example just shown would produce the following:
['Moe', '', '', '', 'Larry', 'Curly', '', 'Shemp']
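The two behaviors can be compared on a smaller string, which makes the empty tokens easier to count. The sample text is arbitrary.

```python
text = '  a  b  '    # two spaces before, between, and after the letters

default_tokens = text.split()        # runs of whitespace act as one delimiter
explicit_tokens = text.split(' ')    # every single space is a delimiter

print(default_tokens)
print(explicit_tokens)
```

With six spaces in the string, the explicit delimiter produces seven tokens, five of them empty. The default form also discards leading and trailing whitespace.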
Another common delimiter string is a comma, or possibly a comma com-
bined with a space. In the latter case, the delimiter string must be matched
exactly. Here’s an example:
stooge_list = 'Moe, Larry, Curly, Shemp'.split(', ')
In contrast, the following example uses a simple comma as delimiter. This
example causes the tokens to contain the extra spaces.
stooge_list = 'Moe, Larry, Curly, Shemp'.split(',')
The result in this case includes a leading space in the last three of the four
string elements:
['Moe', ' Larry', ' Curly', ' Shemp']
If you don’t want those leading spaces, an easy solution is to use stripping,
as shown next.

2.13 Stripping
Once you retrieve input from the user or from a text file, you may want to
place it in the correct format by stripping leading and trailing spaces. You
might also want to strip leading and trailing “0” digits or other characters.
The str class provides several methods to let you perform this stripping.
Key Syntax
str.strip(extra_chars=' ')     # Strip leading & trailing.
str.lstrip(extra_chars=' ')    # Strip leading chars.
str.rstrip(extra_chars=' ')    # Strip trailing chars.
Each of these method calls returns a new string with leading or trailing
characters (or both) stripped out.

The lstrip method strips only leading characters, and the rstrip method
strips only trailing characters, but otherwise all three methods perform the
same job. The strip method strips both leading and trailing characters.
With each method, if the extra_chars argument is specified, the method
strips all occurrences of each and every character in the extra_chars string.
For example, if extra_chars is '*+0', then the method strips all leading or
trailing asterisks (*) as well as all leading or trailing “0” digits and plus signs (+).
Internal instances of the character to be stripped are left alone. For exam-
ple, the following statement strips leading and trailing spaces but not the space
in the middle.
name_str = ' Will Shakes '
new_str = name_str.strip()
Figure 2.4 illustrates how this method call works.

Before strip():  ' Will Shakes '
After strip():   'Will Shakes'
Figure 2.4. Python stripping operations
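Stripping combines naturally with split when cleaning up delimited input. The following sketch uses made-up input with uneven spacing. When called with no argument, the strip methods remove all leading or trailing whitespace, including tabs and newlines.

```python
raw = '  Moe , Larry,Curly  ,  Shemp '    # sample input, unevenly spaced

names = [token.strip() for token in raw.split(',')]
print(names)

both_ends = '***+007+***'.strip('*+0')    # strips any of *, +, 0 from both ends
leading = '007'.lstrip('0')               # strips leading zeros only
trailing = 'x = 1;\n'.rstrip()            # strips the trailing newline
print(both_ends, leading, repr(trailing))
```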

2.14 Justification Methods


When you need to do sophisticated text formatting, you generally should use
the techniques described in Chapter 5, “Formatting Text Precisely.” However,
the str class itself comes with rudimentary techniques for justifying text:
either left justifying, right justifying, or centering text within a print field.
Key Syntax
str.ljust(width [, fillchar])     # Left justify
str.rjust(width [, fillchar])     # Right justify
str.center(width [, fillchar])    # Center the text.
digit_str.zfill(width)            # Pad with 0's.
In the syntax of these methods, each pair of square brackets indicates an
optional item not intended to be interpreted literally. These methods return a
string formatted as follows:

◗ The text of str is placed in a larger print field of size specified by width.
◗ If the string text is shorter than the specified width, the text is justified left,
right, or centered, as appropriate. If the text cannot be centered perfectly,
the center method puts the one extra fill character on one side; in the
example that follows, it goes on the right.


◗ The rest of the result is padded with the fill character. If this fill character is
not specified, then the default is a space.

Here’s an example:
new_str = 'Help!'.center(10, '#')
print(new_str)
This example prints
##Help!###
Another common fill character (other than a space) is the digit character
“0”. Number strings are typically right justified rather than left justified.
Here’s an example:
new_str = '750'.rjust(6, '0')
print(new_str)
This example prints
000750
The zfill method provides a shorter, more compact way of doing the
same thing: padding a string of digits with leading “0” characters.
s = '12'
print(s.zfill(7))     # Prints 0000012
But the zfill method is not just a shortcut for rjust; instead, with zfill,
the zero padding becomes part of the number itself, so the zeros are printed
between the number and the sign:
>>> '-3'.zfill(5)
'-0003'
>>> '-3'.rjust(5, '0')
'000-3'
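As a quick illustration of these methods working together, the following sketch lines up a small two-column listing. The item names, quantities, and field widths are made up for the example.

```python
# Hypothetical data: item names and quantities stored as strings.
rows = [('Apples', '7'), ('Kiwis', '120'), ('Figs', '-3')]

lines = []
for name, qty in rows:
    # Names go in a 10-character field padded with dots; quantities
    # are zero-filled to 5 characters, keeping any sign in front.
    lines.append(name.ljust(10, '.') + qty.zfill(5))

for line in lines:
    print(line)

# A string longer than the field width is returned unchanged.
print('Supercharged'.rjust(5))    # Prints Supercharged
```

The loop prints Apples....00007, Kiwis.....00120, and Figs......-0003, one per line.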

Chapter 2 Summary
The Python string type (str) is an exceptionally powerful data type, even in
comparison to strings in other languages. String methods include the abilities
to tokenize input (splitting); remove leading and trailing spaces (stripping);
convert to numeric formats; and print numeric expressions in any radix.
The built-in search abilities include methods for counting and finding sub-
strings (count, find, and index) as well as the ability to do text replacement.

And yet there’s a great deal more you can do with strings. Chapter 5, “For-
matting Text Precisely,” explores the fine points of using formatting charac-
ters as well as the format method for the sophisticated printing of output.
Chapter 6, “Regular Expressions, Part I” goes even farther in matching,
searching, and replacing text patterns, so that you can carry out flexible
searches by specifying patterns of any degree of complexity.

Chapter 2 Review Questions
1 Does assignment to an indexed character of a string violate Python’s immuta-
bility for strings?
2 Does string concatenation, using the += operator, violate Python’s immutabil-
ity for strings? Why or why not?
3 How many ways are there in Python to index a given character?
4 How, precisely, are indexing and slicing related?
5 What is the exact data type of an indexed character? What is the data type of a
substring produced from slicing?
6 In Python, what is the relationship between the string and character “types”?
7 Name at least two operators and one method that enable you to build a larger
string out of one or more smaller strings.
8 If you are going to use the index method to locate a substring, what is the
advantage of first testing the target string by using in or not in?
9 Which built-in string methods, and which operators, produce a simple Boolean
(true/false) result?

Chapter 2 Suggested Problems


1 Write a program that prompts for a string and counts the number of vowels
and consonants, printing the results. (Hint: use the in and not in operators
to reduce the amount of code you might otherwise have to write.)
2 Write a function that efficiently strips the first two characters of a string and
the last two characters of a string. Returning an empty string should be an
acceptable return value. Test this function with a series of different inputs.



3 Advanced List Capabilities
“I’ve got a little list . . . ”
—Gilbert and Sullivan, The Mikado

To paraphrase the Lord High Executioner in The Mikado, we’ve got a little
list. . . . Actually, in Python we’ve got quite a few of them. One of the foun-
dations of a strong programming language is the concept of arrays or lists—
objects that hold potentially large numbers of other objects, all held together
in a collection.
Python’s most basic collection class is the list, which does everything an
array does in other languages, but much more. This chapter explores the
basic, intermediate, and advanced features of Python lists.

3.1 Creating and Using Python Lists


Python has no data declarations. How, then, do you create collections such as
a list? You do so in the same way you create other data.

◗ Specify the data on the right side of an assignment. This is where a list is actu-
ally created, or built.
◗ On the left side, put a variable name, just as you would for any other assign-
ment, so that you have a way to refer to the list.

Variables have no type except through assignment. In theory, the same
variable could refer first to an integer and then to a list.
x = 5
x = [1, 2, 3]


But it’s much better to use a variable to represent only one type of data and
stick to it. We also recommend using suggestive variable names. For example,
it’s a good idea to use a “list” suffix when you give a name to list collections.
my_int_list = [5, -20, 5, -69]
Here’s a statement that creates a list of strings and names it beat_list:
beat_list = [ 'John', 'Paul', 'George', 'Ringo' ]
You can even create lists that mix numeric and string data.
mixed_list = [10, 'John', 5, 'Paul' ]
But you should mostly avoid mixing data types inside lists. In Python 3,
a list that mixes strings and numbers cannot be sorted with the sort method;
the attempt raises a TypeError exception. Integer and floating-point data,
however, can be freely mixed.
num_list = [3, 2, 17, 2.5]
num_list.sort() # Sorts into [2, 2.5, 3, 17]
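Here is a brief sketch of that restriction, reusing mixed_list from above. The try/except wrapper and the flag sort_failed are added only so that the example runs to completion.

```python
mixed_list = [10, 'John', 5, 'Paul']
try:
    mixed_list.sort()          # Python 3 cannot compare int with str.
    sort_failed = False
except TypeError as err:
    sort_failed = True
    print('Cannot sort:', err)

num_list = [3, 2, 17, 2.5]
num_list.sort()                # int and float compare without trouble.
print(num_list)                # Prints [2, 2.5, 3, 17]
```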
Another technique you can use for building a collection is to append one
element at a time to an empty list.
my_list = [] # Must do this before you append!
my_list.append(1)
my_list.append(2)
my_list.append(3)
These statements have the same effect as initializing a list all at once, as here:
my_list = [1, 2, 3]
You can also remove list items.
my_list.remove(1) # List is now [2, 3]
The result of this statement is to remove the first instance of an element
equal to 1. If there is no such value in the list, Python raises a ValueError
exception.
List order is meaningful, as are duplicate values. For example, to store a
series of judges’ ratings, you might use the following statement, which
indicates that three different judges all assigned the score 1.0, but the
third judge assigned 9.8.
the_scores = [1.0, 1.0, 9.8, 1.0]
The following statement removes only the first instance of 1.0.
the_scores.remove(1.0) # List now equals [1.0, 9.8, 1.0]


3.2 Copying Lists Versus Copying List Variables


In Python, variables are more like references in C++ than they are like “value”
variables. In practical terms, this means that copying from one collection to
another requires a little extra work.
What do you think the following does?
a_list = [2, 5, 10]
b_list = a_list

The first statement creates a list by building it on the right side of the assign-
ment (=). But the second statement in this example creates no data. It just does
the following action:
Make “b_list” an alias for whatever “a_list” refers to.
The variable b_list therefore becomes an alias for whatever a_list
refers to. Consequently, if changes are made to either variable, both reflect
that change.
b_list.append(100)
a_list.append(200)
b_list.append(1)
print(a_list) # This prints [2, 5, 10, 100, 200, 1]
If instead you want to create a separate copy of all the elements of a list, you
need to perform a member-by-member copy. The simplest way to do that is to
use slicing.
my_list = [1, 10, 5]
yr_list = my_list[:] # Perform member-by-member copy.
Now, because my_list and yr_list refer to separate copies of [1, 10, 5],
you can change one of the lists without changing the other.
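One way to see the difference between an alias and a copy is with the is operator, which tests whether two names refer to the same object. This is a minimal sketch; c_list is a name introduced here for the copy.

```python
a_list = [2, 5, 10]
b_list = a_list        # Alias: no new list is created.
c_list = a_list[:]     # Slice: a new list with the same elements.

print(b_list is a_list)    # Prints True
print(c_list is a_list)    # Prints False
print(c_list == a_list)    # Prints True (equal contents)

b_list.append(100)         # Changes the one list shared by a and b.
print(a_list)              # Prints [2, 5, 10, 100]
print(c_list)              # Prints [2, 5, 10]
```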

3.3 Indexing
Python supports both nonnegative and negative indexes.
The nonnegative indexes are zero-based, so in the following example,
my_list[0] refers to the first element. (Section 3.3.2 covers negative
indexes.)
my_list = [100, 500, 1000]
print(my_list[0]) # Print 100.


Because lists are mutable, they can be changed “in place” without creat-
ing an entirely new list. Consequently, you can change individual elements by
making one of those elements the target of an assignment—something you
can’t do with strings.
my_list[1] = 55 # Set second element to 55.

3.3.1 Positive Indexes


Positive (nonnegative) index numbers are like those used in other languages,
such as C++. Index 0 denotes the first element in the list, 1 denotes the sec-
ond, and so on. These indexes run from 0 to N–1, where N is the number of
elements.
For example, assume the following statement has been executed, creating
a list.
a_list = [100, 200, 300, 400, 500, 600]
These elements are indexed by the numbers 0 through 5, as shown in Figure 3.1.

0 1 2 3 4 5
100 200 300 400 500 600
Figure 3.1. Nonnegative indexes

The following examples use nonnegative indexes to access individual
elements.
print(a_list[0]) # Prints 100.
print(a_list[1]) # Prints 200.
print(a_list[2]) # Prints 300.
Although lists can grow without limit, an index number must be in range at
the time it’s used. Otherwise, Python raises an IndexError exception.

Performance Tip: Here, as elsewhere, we’ve used separate calls to the print
function because it’s convenient for illustration purposes. But remember that
repeated calls to print slow down your program, at least within IDLE. A
faster way to print these values is to use only one call to print.
print(a_list[0], a_list[1], a_list[2], sep='\n')


3.3.2 Negative Indexes


You can also refer to items in a list by using negative indexes, which refer to an
element by its distance from the end of the list.
An index value of –1 denotes the last element in a list, –2 denotes the
next-to-last element, and so on. The value –N denotes the first element in the
list. Negative indexes run from –1 to –N, in which N is the length of the list.
The list in the previous section can be indexed as illustrated in Figure 3.2.

–6 –5 –4 –3 –2 –1
100 200 300 400 500 600
Figure 3.2. Negative indexes

The following examples demonstrate negative indexing.


a_list = [100, 200, 300, 400, 500, 600]
print(a_list[-1]) # Prints 600.
print(a_list[-3]) # Prints 400.
Out-of-range negative indexes can raise an IndexError exception, just as
nonnegative indexes can.
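The range rules for both kinds of index can be sketched as follows. The helper name safe_get is hypothetical, introduced only for this illustration; it is not a built-in list feature.

```python
a_list = [100, 200, 300, 400, 500, 600]
n = len(a_list)

# For an in-range negative index i, a_list[i] is the same element
# as a_list[n + i].
print(a_list[-1] == a_list[n - 1])    # Prints True
print(a_list[-n] == a_list[0])        # Prints True

def safe_get(a_list, i, default=None):
    """Return a_list[i], or default if i is out of range."""
    try:
        return a_list[i]
    except IndexError:
        return default

print(safe_get(a_list, 5))     # Prints 600
print(safe_get(a_list, 6))     # Prints None (too high)
print(safe_get(a_list, -7))    # Prints None (too low)
```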

3.3.3 Generating Index Numbers Using “enumerate”


The “Pythonic” way is to avoid the range function except where it’s needed.
Here’s the correct way to write a loop that prints elements of a list:
a_list = ['Tom', 'Dick', 'Jane']

for s in a_list:
    print(s)
This prints the following:
Tom
Dick
Jane
This approach is more natural, and usually a little faster, than relying on
indexing, as in the following loop.
for i in range(len(a_list)):
    print(a_list[i])


But what if you want to list the items next to numbers? You can do that by
using index numbers (plus 1, if you want the indexing to be 1-based), but a
better technique is to use the enumerate function.
Key Syntax

enumerate(iter, start=0)
In this syntax, start is optional. Its default value is 0.
This function takes an iterable, such as a list, and produces another iter-
able, which is a series of tuples. Each of those tuples has the form
(num, item)
in which num is an integer in a series beginning with start. The following
statement shows an example, using a_list from the previous example and
starting the series at 1:
list(enumerate(a_list, 1))
This produces the following:
[(1, 'Tom'), (2, 'Dick'), (3, 'Jane')]
We can put this together with a for loop to produce the desired result.
for item_num, name_str in enumerate(a_list, 1):
    print(item_num, '. ', name_str, sep='')
This loop calls the enumerate function to produce tuples of the form (num,
item). Each iteration prints the number followed by a period (“.”) and an
element.
1. Tom
2. Dick
3. Jane

3.4 Getting Data from Slices


Whereas indexing refers to one element at a time, the technique of slicing pro-
duces a sublist from a specified range. The sublist can range in size from an
empty list to a new list having all the contents of the original list.
Table 3.1 shows the various ways you can use slicing.

Table 3.1. Slicing Lists in Python

SYNTAX                  PRODUCES THIS NEW LIST
list[beg:end]           All list elements starting with beg, up to but not
                        including end.
list[:end]              All elements from the beginning of the list, up to
                        but not including end.
list[beg:]              All elements from beg forward to the end of the list.
list[:]                 All elements in the list; this operation copies the
                        entire list, element by element.
list[beg:end:step]      All elements starting with beg, up to but not
                        including end; but movement through the list is step
                        items at a time. With this syntax, any or all of the
                        three values may be omitted. Each has a reasonable
                        default value; the default value of step is 1.

Here are some examples of list slicing:


a_list = [1, 2, 5, 10, 20, 30]

b_list = a_list[1:3] # Produces [2, 5]


c_list = a_list[4:] # Produces [20, 30]
These examples use positive indexing, in which index numbers run from
0 to N–1. You can just as easily use negative indexing to help specify a slice.
Here’s an example:
d_list = a_list[-4:-1] # Produces [5, 10, 20]
e_list = a_list[-1:] # Produces [30]
An important principle in either case is that the end argument specifies the
end of the slice as follows: Copy elements up to but not including the end
argument. Positive and negative index numbers can be mixed together.

Note: When Python carries out a slicing operation, which always includes at
least one colon (:) between the square brackets, the index specifications are
not required to be in range. Python copies as many elements as it can. If no
elements fall within the specified range, the result is simply an empty list.
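The tolerance of slicing for out-of-range values, in contrast to indexing, can be seen in a short sketch. The flag index_failed is added here only to record the outcome.

```python
a_list = [1, 2, 5, 10, 20, 30]

print(a_list[4:100])     # Prints [20, 30]
print(a_list[-100:2])    # Prints [1, 2]
print(a_list[10:20])     # Prints []
print(a_list[3:1])       # Prints [] (beg is past end)

# Indexing, by contrast, insists on an in-range value.
try:
    a_list[10]
    index_failed = False
except IndexError:
    index_failed = True
print(index_failed)      # Prints True
```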

Figure 3.3 shows an example of how slicing works. Remember that Python
selects elements starting with beg, up to but not including the element referred
to by end. Therefore, the slice a_list[2:5] copies the sublist [300, 400, 500].


a_list[2:5]

0 1 2 3 4 5
100 200 300 400 500 600

Sliced section includes 2, up to but not including 5
Figure 3.3. Slicing example

Finally, specifying a value for step, the third argument, can affect the data
produced. For example, a value of 2 causes Python to get every other element
from the range [2:5].
a_list = [100, 200, 300, 400, 500, 600]
b_list = a_list[2:5:2] # Produces [300, 500]
A negative step value reverses the direction in which list elements are
accessed. So a step value of –1 produces values in the slice by going backward
through the list one item at a time. A step value of –2 produces values in the
slice by going backward through the list two items at a time.
The following example starts with the last element and works backwards;
it therefore produces an exact copy of the list—with all elements reversed!
rev_list = a_list[::-1]
Here’s an example:
a_list = [100, 200, 300]
rev_list = a_list[::-1]
print(rev_list) # Prints [300, 200, 100]
The step argument can be positive or negative but cannot be 0. If step is
negative, then the defaults for the other values change as follows:

◗ The default value of beg becomes the last element in the list (indexed as –1).
◗ The default value of end becomes the beginning of the list, and the first
element is included in the slice.

Therefore, the slice expression [::-1] produces a reversal of the original
list.
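A few more step values, tried on the same six-element list, may make the pattern clearer. This is only a sketch; the flag step_failed is added to record what happens with a step of 0.

```python
a_list = [100, 200, 300, 400, 500, 600]

print(a_list[::2])       # Prints [100, 300, 500]
print(a_list[1::2])      # Prints [200, 400, 600]
print(a_list[::-2])      # Prints [600, 400, 200]
print(a_list[4:1:-1])    # Prints [500, 400, 300]

# A step of 0 is not allowed.
try:
    a_list[::0]
    step_failed = False
except ValueError:
    step_failed = True
print(step_failed)       # Prints True
```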


3.5 Assigning into Slices


Because lists are mutable, you can assign to elements in place. This extends to
slicing. Here’s an example:
my_list = [10, 20, 30, 40, 50, 60]
my_list[1:4] = [707, 777]
This example has the effect of deleting the range [20, 30, 40] and insert-
ing the list [707, 777] in its place. The resulting list is

[10, 707, 777, 50, 60]
You may even assign into a slice of length 0. The effect is to insert new
list items without deleting existing ones. Here’s an example:
my_list = [1, 2, 3, 4]
my_list[0:0] = [-50, -40]
print(my_list) # prints [-50, -40, 1, 2, 3, 4]
The following restrictions apply to this ability to assign into slices:

◗ When you assign to a slice of a list, the source of the assignment must be
another list or other iterable, even if it has zero or one element.
◗ If you include a step argument in the slice to be assigned to, the slice
assigned to and the sequence providing the data must have the same number of
elements. If step is not specified, the sizes do not need to match.
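The two restrictions above can be sketched as follows. The flags size_failed and type_failed are introduced here only to record which assignments Python rejects.

```python
my_list = [10, 20, 30, 40, 50, 60]

my_list[1:4] = [707]            # Replace three elements with one.
print(my_list)                  # Prints [10, 707, 50, 60]

my_list[1:2] = []               # Assigning an empty list deletes.
print(my_list)                  # Prints [10, 50, 60]

my_list[::2] = [-1, -2]         # Step slice: sizes match (2 and 2).
print(my_list)                  # Prints [-1, 50, -2]

try:
    my_list[::2] = [0, 0, 0]    # Step slice: 3 values for 2 slots.
    size_failed = False
except ValueError:
    size_failed = True

try:
    my_list[0:1] = 99           # Source is not an iterable.
    type_failed = False
except TypeError:
    type_failed = True

print(size_failed, type_failed)    # Prints True True
```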

3.6 List Operators


Table 3.2 summarizes the built-in operators applying to lists.

Table 3.2. List Operators in Python

OPERATOR/SYNTAX         DESCRIPTION
list1 + list2           Produces a new list containing the contents of both
                        list1 and list2 by performing concatenation.
list1 * n, or           Produces a list containing the contents of list1,
n * list1               repeated n times. For example, [0] * 3 produces
                        [0, 0, 0].
list[n]                 Indexing. See Section 3.3.
list[beg:end:step]      Slicing. See Section 3.4.
list1 = list2           Makes list1 into a name for whatever list2 refers to.
                        Consequently, list1 becomes an alias for list2.
list1 = list2[:]        Assigns list1 to a new list after performing a
                        member-by-member copy of list2. (See Section 3.4.)
list1 == list2          Produces True if list1 and list2 have equal contents,
                        after performing a member-by-member comparison.
list1 != list2          Produces False if list1 and list2 have equal contents;
                        True otherwise.
elem in list            Produces True if elem is an element of list.
elem not in list        Produces True if elem is not an element of list.
list1 < list2           Performs a member-by-member “less than” comparison.
list1 <= list2          Performs a member-by-member “less than or equal to”
                        comparison.
list1 > list2           Performs a member-by-member “greater than” comparison.
list1 >= list2          Performs a member-by-member “greater than or equal
                        to” comparison.
*list                   Replaces list with a series of individual, “unpacked”
                        values. The use of this operator with *args is
                        explained in Section 4.8, “Variable-Length Argument
                        Lists.”

The first two of these operators (+ and *) involve making copies of list
items. But these are shallow copies. (Section 3.7, “Shallow Versus Deep Copy-
ing,” discusses this issue in greater detail.) So far, shallow copying has worked
fine, but the issue will rear its head when we discuss multidimensional arrays
in Section 3.18.
Consider the following statements:
a_list = [1, 3, 5, 0, 2]
b_list = a_list # Make an alias.
c_list = a_list[:] # Member-by-member copy
After b_list is created, the variable name b_list is just an alias for
a_list. But the third statement in this example creates a new copy of the
data. If a_list is modified later, c_list retains the original contents.

The multiplication operator (*) is particularly useful when you’re working
with large lists. How do you create an array of size 1,000 and initialize all the
elements to zero? Here’s the most convenient way:
big_array = [0] * 1000
The test for equality (==) and test for inequality (!=) work on any lists;
the contents are compared, and all members must be equal for == to produce
True. But the inequality operators (<, >, and so on) require compatible data
types, supporting greater-than and less-than comparisons. And sorting is
possible between elements only if a < b is defined as well as b < a, as
explained in Section 9.10.3, “Comparison Methods.”
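The contrast between the equality tests and the ordering tests can be sketched with a few small lists. The flag order_failed is added only to record the outcome of the last comparison.

```python
# Equality compares contents, whatever the element types are.
print([1, 'a'] == [1, 'a'])     # Prints True
print([1, 'a'] == [1, 2])       # Prints False

# Ordering compares member by member, and the first pair of
# unequal elements decides the result.
print([1, 2, 3] < [1, 2, 4])    # Prints True
print([1, 2] < [1, 2, 0])       # Prints True (a prefix sorts first)

# Ordering needs compatible element types.
try:
    [1, 'a'] < [1, 2]
    order_failed = False
except TypeError:
    order_failed = True
print(order_failed)             # Prints True
```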
Neither an empty list nor the value None necessarily returns True when
applied to the in operator.
a = [1, 2, 3]
None in a # This produces False
[] in a # So does this.

b = [1, 2, 3, [], None]


None in b # This produces True
[] in b # So does this.
These results may seem surprising when you recall that '' in 'Fred' (in
which 'Fred' can be any string you want) produces True. In this particular
case, Python has different behavior for lists and strings.

3.7 Shallow Versus Deep Copying


The difference between shallow and deep copying is an important topic in
Python. First, let’s look at shallow copying. Given the following list assign-
ments, we’d expect a_list to be a separate copy from b_list, so that if
changes are made to b_list, then a_list would be unaffected.
a_list = [1, 2, [5, 10]]
b_list = a_list[:] # Member-by-member copy.
Now, let's modify b_list through indexing, setting each element to 0:
b_list[0] = 0
b_list[1] = 0
b_list[2][0] = 0
b_list[2][1] = 0


You’d probably expect none of these assignments to affect a_list, because
that’s a separate collection from b_list. But if you print a_list, here’s what
you get:
>>> print(a_list)
[1, 2, [0, 0]]
This may seem impossible, because a_list had the last element set to [5,
10]. Changes to b_list shouldn’t have any effect on the contents of a_list,
but now the latter’s last element is [0, 0]! What happened?
The member-by-member copy, carried out earlier, copied the values 1 and
2, followed by a reference to the list-within-a-list. Consequently, changes
made to b_list can affect a_list if they involve the second level.
Figure 3.4 illustrates the concept. Shallow copying makes new copies of
top-level data only.

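[Figure: a_list and b_list each hold their own copies of 1 and 2, but the third element of both refers to the same inner list.]
Figure 3.4. Shallow copying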

And now you can see the problem. A member-by-member copy was carried
out, but the list within the list was a reference, so both lists ended up referring
to the same data in the final position.
The solution is simple. You need to do a deep copy to get the expected
behavior. To get a deep copy, in which even embedded list items get copied,
import the copy module and use copy.deepcopy.
import copy

a_list = [1, 2, [5, 10]]
b_list = copy.deepcopy(a_list) # Create a DEEP COPY.
After these statements are executed, b_list becomes a new list completely
unconnected to a_list. The result is illustrated in Figure 3.5, in which each
list gets its own, separate copy of the list-within-a-list.


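[Figure: a_list and b_list each hold their own copies of 1 and 2, and each refers to its own separate inner list containing 5 and 10.]
Figure 3.5. Deep copying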

With deep copying, the depth of copying extends to every level. You could
have collections within collections to any level of complexity.
If changes are now made to b_list after it has been copied from a_list, they
will have no effect on a_list. The last element of a_list will remain
set to [5, 10] until changed directly. All this functionality is thanks to deep
copying.
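The two kinds of copy can be compared side by side in a short sketch. The names shallow and deep are chosen here for illustration; copy.deepcopy is the standard library function discussed above.

```python
import copy

a_list = [1, 2, [5, 10]]
shallow = a_list[:]               # Copies top-level items only.
deep = copy.deepcopy(a_list)      # Copies every level.

# The shallow copy shares the inner list; the deep copy does not.
print(shallow[2] is a_list[2])    # Prints True
print(deep[2] is a_list[2])       # Prints False

deep[2][0] = 0                    # Affects the deep copy alone.
print(a_list)                     # Prints [1, 2, [5, 10]]

shallow[2][0] = 0                 # Reaches the shared inner list.
print(a_list)                     # Prints [1, 2, [0, 10]]
```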

3.8 List Functions


When you work with lists, there are several Python functions you’ll find use-
ful: These include len, max, and min, as well as sorted, reversed, and sum.
These are functions, not methods. The main difference is that methods use
the dot (.) syntax; the other difference is that methods represent built-in abili-
ties, whereas the functions here implement abilities that are useful with collec-
tions generally. Admittedly, this is sometimes a very fine distinction.
Key Syntax

len(collection)          # Return length of the collection.
max(collection)          # Return the elem with maximum value.
min(collection)          # Return the elem with minimum value.
reversed(collection)     # Produce iter in reversed order.
sorted(collection)       # Produce list in sorted order.
sum(collection)          # Add up all the elements, which
                         # must be numeric.
The len function returns the number of elements in a collection. This
includes lists, strings, and other Python collection types. In the case of dictio-
naries, it returns the number of keys.


You’ll often use len when working with lists. For example, the following
loop doubles every item in a list. It’s necessary to use len to make this a gen-
eral solution.
for i in range(len(a_list)):
    a_list[i] *= 2
The max and min functions produce maximum and minimum elements,
respectively. These functions work only on lists that have elements with com-
patible types, such as all numeric elements or all string elements. In the case of
strings, alphabetical order (or rather, code point order) enables comparisons.
Here’s an example:
a_list = [100, -3, -5, 120]
print('Length of the list is', len(a_list))
print('Max and min are', max(a_list), min(a_list))
This prints the following:
Length of the list is 4
Max and min are 120 -5
The sorted and reversed functions are similar to the sort and reverse
methods, presented in Section 3.11. But whereas those methods reorganize a
list in place, these functions leave the original collection unchanged and
produce a new object.
These functions work on tuples and strings as well as lists, but the sorted
function always produces a list. Here’s an example:
a_tup = (30, 55, 15, 45)
print(sorted(a_tup)) # Print [15, 30, 45, 55]
The reversed function is unusual because it produces an iterable but not
a collection. In simple terms, this means you need a for loop to print it or else
use a list or tuple conversion. Here’s an example:
a_tup = (1, 3, 5, 0)
for i in reversed(a_tup):
    print(i, end=' ')
This prints
0 5 3 1
Alternatively, you can use the following:
print(tuple(reversed(a_tup)))

This produces
(0, 5, 3, 1)
Finally, there is the sum function, which is extremely convenient. You could
write a loop yourself to perform this function, but it’s nice not to have to do
so. The sum function is supported for those lists that are made up only of
numeric types, such as int and float.
One possible use is to quickly and easily figure the average for any list of
numbers. Here’s an example:

>>> num_list = [2.45, 1, -10, 55.5, 100.03, 40, -3]
>>> print('The average is ', sum(num_list) / len(num_list))
The average is 26.56857142857143
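To round out the functions in this section, here is a small sketch showing that sorted and reversed leave their argument alone. The names b_list, c_list, and r are illustrative.

```python
a_list = [30, 55, 15, 45]

b_list = sorted(a_list)            # New list; a_list is untouched.
print(a_list)                      # Prints [30, 55, 15, 45]
print(b_list)                      # Prints [15, 30, 45, 55]

r = reversed(a_list)               # An iterator, not a list.
c_list = list(r)                   # Convert to see the items.
print(c_list)                      # Prints [45, 15, 55, 30]

# sorted works on a string too, but still produces a list.
print(sorted('cab'))               # Prints ['a', 'b', 'c']
print(max('cab'), min('cab'))      # Prints c a
```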

3.9 List Methods: Modifying a List


The largest single group of list methods includes those that modify list data in
place rather than creating a new list.
Key Syntax

list.append(value)           # Append a value
list.clear()                 # Remove all contents
list.extend(iterable)        # Append a series of values
list.insert(index, value)    # At index, insert value
list.remove(value)           # Remove first instance of value
The append and extend methods have a similar purpose: to add data to
the end of a list. The difference is that the append method adds a single ele-
ment to the end of the list in question, whereas the extend method appends a
series of elements from a collection or iterable.
a_list = [1, 2, 3]

a_list.append(4)
a_list.extend([4]) # This has the same effect.

a_list.extend([4, 5, 6]) # Adds 3 elements to the list.


The insert method has a purpose similar to append. However, insert
places a value at the position indicated by the index argument; that is, the
method places the new value just before whichever element is specified by the
index argument.


If the index is out of range, the method places the new value at the end of
the list if the index is too high to be in range, and it inserts the new value at the
beginning of the list if the index is too low. Here’s an example:
a_list = [10, 20, 40] # Missing 30.
a_list.insert(2, 30 ) # At index 2 (third), insert 30.
print(a_list) # Prints [10, 20, 30, 40]
a_list.insert(100, 33)
print(a_list) # Prints [10, 20, 30, 40, 33]
a_list.insert(-100, 44)
print(a_list) # Prints [44, 10, 20, 30, 40, 33]
The remove method removes the first occurrence of the specified argument
from the list. There must be at least one occurrence of this value, or Python
raises a ValueError exception.
my_list = [15, 25, 15, 25]
my_list.remove(25)
print(my_list) # Prints [15, 15, 25]
You may want to use in, not in, or the count method to verify that a
value is in a list before attempting to remove it.
Here’s a practical example that combines these methods.
In competitive gymnastics, winners are determined by a panel of judges,
each of whom submits a score. The highest and lowest scores are thrown out,
and then the average of the remaining scores is taken. The following function
performs these tasks:
def eval_scores(a_list):
    a_list.remove(max(a_list))
    a_list.remove(min(a_list))
    return sum(a_list) / len(a_list)
Here’s a sample session. Suppose that the_scores contains the judges’
ratings.
the_scores = [8.5, 6.0, 8.5, 8.7, 9.9, 9.0]
The eval_scores function throws out the low and high values (6.0 and
9.9); then it calculates the average of the rest, producing 8.675.
print(eval_scores(the_scores))
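Note that eval_scores removes items from the very list passed to it, so the_scores is two elements shorter after the call. If that side effect is unwanted, one possible variant works on a sorted copy instead. The name eval_scores_copy is hypothetical, and the sketch assumes the list holds at least three scores.

```python
def eval_scores_copy(a_list):
    """Average the scores after dropping one highest and one lowest."""
    trimmed = sorted(a_list)[1:-1]    # New list minus both extremes.
    return sum(trimmed) / len(trimmed)

the_scores = [8.5, 6.0, 8.5, 8.7, 9.9, 9.0]
result = eval_scores_copy(the_scores)
print(round(result, 3))    # Prints 8.675
print(len(the_scores))     # Prints 6 (the original list is intact)
```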


3.10 List Methods: Getting Information on Contents


The next set of list methods returns information about a list. The first two
of these, count and index, do not alter contents and are also supported by
tuples.
Key Syntax

list.count(value)                    # Get no. of instances.
list.index(value[, beg [, end]])     # Get index of value.
list.pop([index])                    # Return and remove
                                     # indexed item: use
                                     # last by default.
In this syntax, brackets are not intended literally but instead indicate optional
items.
The count method returns the number of occurrences of the specified element.
It returns the number of matching items at the top level only. Here’s an example:
yr_list = [1, 2, 1, 1,[3, 4]]
print(yr_list.count(1)) # Prints 3
print(yr_list.count(2)) # Prints 1
print(yr_list.count(3)) # Prints 0
print(yr_list.count([3, 4])) # Prints 1
The index method returns the zero-based index of the first occurrence
of a specified value. You may optionally specify start and end indexes; the
searching happens in a subrange beginning with the start position, up to but
not including the end position. A ValueError exception is raised if the item is not found.
For example, the following call to the index method returns 3, signifying
the fourth element.
beat_list = ['John', 'Paul', 'George', 'Ringo']
print(beat_list.index('Ringo')) # Print 3.
But 3 is also printed if the list is defined as
beat_list = ['John', 'Paul', 'George', 'Ringo', 'Ringo']
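The three methods in this section can be tried together in a short sketch. Using -1 as a "not found" marker is a convention chosen for this example, not something the index method does by itself.

```python
beat_list = ['John', 'Paul', 'George', 'Ringo', 'Ringo']

# Test with in before calling index, to avoid a ValueError.
name = 'Pete'
if name in beat_list:
    pos = beat_list.index(name)
else:
    pos = -1                            # Marker meaning "not found".
print(pos)                              # Prints -1

# The optional beg argument finds a later occurrence.
first = beat_list.index('Ringo')
second = beat_list.index('Ringo', first + 1)
print(first, second)                    # Prints 3 4

# pop returns the item it removes: the last one by default.
last = beat_list.pop()
print(last, beat_list.pop(0))           # Prints Ringo John
print(beat_list)                        # Prints ['Paul', 'George', 'Ringo']
```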

3.11 List Methods: Reorganizing


The last two list methods in this chapter modify a list by changing the order of
the elements in place.
Key Syntax

list.sort([key=None] [, reverse=False])
list.reverse()     # Reverse existing order.


Each of these methods changes the ordering of all the elements in place.
In Python 3, the sort method requires all the elements of the list to have
compatible types, such as all strings or all numbers; the reverse method has
no such requirement. The sort method places all the elements in
lowest-to-highest order by default, or in highest-to-lowest order if reverse
is specified and set to True. If the list consists of strings, the strings
are placed in alphabetical (code point) order.
The following example program prompts the user for a series of strings,
until the user enters an empty string by pressing Enter without any other
input. The program then prints the strings in alphabetical order.
def main():
    my_list = []    # Start with empty list
    while True:
        s = input('Enter next name: ')
        if len(s) == 0:
            break
        my_list.append(s)
    my_list.sort()    # Place all elems in order.
    print('Here is the sorted list:')
    for a_word in my_list:
        print(a_word, end=' ')

main()
Here’s a sample session of this program, showing user input in bold.
Enter next name: John
Enter next name: Paul
Enter next name: George
Enter next name: Ringo
Enter next name: Brian
Enter next name:
Here is the sorted list:
Brian George John Paul Ringo
The sort method has some optional arguments. The first is the key argu-
ment, which by default is set to None. This argument, if specified, is a func-
tion (a callable) that’s run on each element to get that element’s key value.
Those keys are compared to determine the new order. So, for example, if a
three-member list produced key values of 15, 1, and 7, they would be sorted as
middle-last-first.
For example, suppose you want a list of strings to be ordered according to
case-insensitive comparisons. An easy way to do that is to write a function
that returns strings that are all uppercase, all lowercase, or converted with the
casefold method, which essentially performs the same action (converting to
all lowercase).
def ignore_case(s):
    return s.casefold()

a_list = [ 'john', 'paul', 'George', 'brian', 'Ringo' ]
b_list = a_list[:]
a_list.sort()
b_list.sort(key=ignore_case)
If you now print a_list and b_list in an IDLE session, you get the fol-
lowing results (with user input shown in bold):
>>> a_list
['George', 'Ringo', 'brian', 'john', 'paul']
>>> b_list
['brian', 'George', 'john', 'paul', 'Ringo']
Notice how a_list and b_list, which started with identical contents, are
sorted. The first was sorted by ordinary, case-sensitive comparisons, in which
all uppercase letters are “less than” compared to lowercase letters. The second
list was sorted by case-insensitive comparisons, pushing poor old 'Ringo' to
the end.
The second argument is the reverse argument, which by default is
False. If this argument is included and is True, elements are sorted in
high-to-low order.
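As a brief sketch of the two arguments working together, the following sorts the same names in case-insensitive, highest-to-lowest order. Passing the unbound method str.casefold as the key is an assumption made here for brevity; it plays the same role as the ignore_case function above.

```python
a_list = ['john', 'paul', 'George', 'brian', 'Ringo']

# Case-insensitive comparison, highest-to-lowest order.
a_list.sort(key=str.casefold, reverse=True)
print(a_list)    # Prints ['Ringo', 'paul', 'john', 'George', 'brian']

# sort works in place and returns None, so don't assign its result.
result = a_list.sort()
print(result)    # Prints None
print(a_list)    # Prints ['George', 'Ringo', 'brian', 'john', 'paul']
```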
The reverse method changes the ordering of the list, as you’d expect, but
without sorting anything. Here’s an example:
my_list = ['Brian', 'John', 'Paul', 'George', 'Ringo']
my_list.reverse()    # Reverse elems in place.
for a_word in my_list:
    print(a_word, end=' ')
Calling reverse simply flips the existing order: the last shall be
first, and the first shall be last. Now Ringo becomes the frontman.
Ringo George Paul John Brian

Note: The key argument, as just explained, is a good candidate for the
use of lambda functions, which are explained later in Section 3.14.

3.12 Lists as Stacks: RPN Application


The append and pop methods have a special use. You can use these methods
on a list as if the list were a stack mechanism, a last-in-first-out (LIFO) device.
Figure 3.6 illustrates the operation of a stack, using the visual image of a
stack of plates or numbered blocks. Notice how it functions as a last-in-first-
out mechanism.

Figure 3.6. Operation of a hypothetical stack (starting with 0 on the stack,
Push(10) and Push(20) place 10 and then 20 on top; Pop returns 20, and a
second Pop returns 10, leaving 0)

The push and pop functions on a traditional stack are replaced by the
append and pop methods of a Python list.
The key change that needs to be made—conceptually, at any rate—is to
think of operating on the last element to be added to the end of the list, rather
than to the literal top of a stack.
This end-of-the-list approach is functionally equivalent to a stack. Figure 3.7
illustrates 10 and 20 being pushed on, and then popped off, a list used as a
stack. The result is that the items are popped off in reverse order.

Figure 3.7. Stack operation with a Python list (starting from a list
containing 0: append(10) produces 0, 10; append(20) produces 0, 10, 20;
pop() returns 20; a second pop() returns 10)
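The sequence of operations described for Figure 3.7 can be reproduced in a few lines. The variable name stack is chosen here only for illustration.

```python
stack = [0]

stack.append(10)          # Push 10.
stack.append(20)          # Push 20.
print(stack)              # Prints [0, 10, 20]

top = stack.pop()         # Pop the most recently added item.
next_top = stack.pop()    # Pop the item added before that.
print(top, next_top)      # Prints 20 10
print(stack)              # Prints [0]
```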

One of the most useful demonstrations of a stack device is an interpreter for
the Reverse Polish Notation (RPN) language. We develop a sophisticated lan-
guage interpreter by the end of this book, but for now we start with a simple
calculator.
RPN is a postfix notation, in which an operator follows the two expressions
it operates on. Most languages use infix notation, in which the operator
appears between its operands. In postfix, the operands appear first and are
followed by the operator. For example, to add 7 and 3, you write the numbers
first and then write an addition sign (+).

7 3 +
This adds 7 to 3, which produces 10. Or, to multiply 10 by 5, producing 50,
you use this:
10 5 *
Then—and here is why RPN is so useful—you can put these two expres-
sions together in a clear, unambiguous way, without any need for parentheses:
10 5 * 7 3 + /
This expression is equivalent to the following standard notation, which
produces 5.0:
(10 * 5) / (7 + 3)
Here's another example:
1 2 / 3 4 / +
This example translates into (1/2) + (3/4) and therefore produces 1.25.
Here’s another example:
2 4 2 3 7 + + + *
This translates into
2 * (4 + (2 + (3 + 7)))
which evaluates to 32. The beauty of an RPN expression is that parentheses
are never needed. The best part is that the interpreter follows only a few sim-
ple rules:

◗ If the next item is a number, push it on the stack.


◗ If the next item is an operator, pop the top two items off the stack, apply the
operation, and then push the result.

Here’s the pseudocode for the application:


Pseudocode

Get an input string.
Split it into tokens and store in a list.
For each item in the list,
    If item is an operator,
        Pop stack into op2
        Pop stack into op1
        Carry out operation and push the result onto the stack.
    Else
        Push item onto the stack as a float value.
Pop stack and print the value.

Here’s the Python code that implements this program logic:


the_stack = []

def push(v):
    the_stack.append(v)

def pop():
    return the_stack.pop()

def main():
    s = input('Enter RPN string: ')
    a_list = s.split()
    for item in a_list:
        if item in '+-*/':
            op2 = pop()
            op1 = pop()
            if item == '+':
                push(op1 + op2)
            elif item == '-':
                push(op1 - op2)
            elif item == '*':
                push(op1 * op2)
            else:
                push(op1 / op2)
        else:
            push(float(item))
    print(pop())

main()
This application, although not long, could be more compact. We’ve included
dedicated push and pop functions operating on a global variable, the_stack.
A few lines could have been saved by using methods of the_stack directly.
op1 = the_stack.pop()
...
the_stack.append(op1 + op2)    # Push op1 + op2.
Revising the example so that it uses these methods directly is left as an exer-
cise. Note also that there is currently no error checking, such as checking to
make sure that the stack is at least two elements in length before an operation
is carried out. Error checking is also left as an exercise.

Performance Tip: The following tip saves you seven lines of code. Instead of
testing for each operator separately, you can use the eval function to take a
string containing a Python expression and evaluate it. You would then need
only one function call to carry out any arithmetic operation in this app.
push(eval(str(op1) + item + str(op2)))
Be careful, however, because the eval function can easily be misused. In
this application, it should be called only if the item is one of the four
operators: +, *, -, or /.
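One way to approach the two exercises mentioned above, while avoiding eval, is sketched below. It is an illustration under stated assumptions rather than the book's solution: the names eval_rpn and OPERATORS are hypothetical, the stack is a local list used directly through append and pop, and functions from the standard operator module stand in for the if/elif chain.

```python
import operator

# Table mapping each operator token to a two-argument function.
OPERATORS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def eval_rpn(s):
    """Evaluate an RPN string and return the result as a float."""
    stack = []
    for item in s.split():
        if item in OPERATORS:
            if len(stack) < 2:          # Error check before popping.
                raise ValueError('Too few operands for ' + item)
            op2 = stack.pop()
            op1 = stack.pop()
            stack.append(OPERATORS[item](op1, op2))
        else:
            stack.append(float(item))   # Raises ValueError for a bad token.
    if len(stack) != 1:
        raise ValueError('Malformed expression')
    return stack.pop()

print(eval_rpn('10 5 * 7 3 + /'))       # Prints 5.0
print(eval_rpn('1 2 / 3 4 / +'))        # Prints 1.25
print(eval_rpn('2 4 2 3 7 + + + *'))    # Prints 32.0
```

Because the token is looked up as a dictionary key, a string such as '+-' is not mistaken for an operator; it reaches the float conversion and is reported as an error.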

3.13 The “reduce” Function


One of the more interesting features of Python lists is the ability to use
customized functions to process all the elements of a list. This includes the
built-in map and filter functions. The map function produces a new series of
values by transforming all elements in a source list. The filter function
produces a series that is a subset of the source, based on a specified
condition (such as selecting positive numbers only). In Python 3, both return
iterators, which you can convert to lists by calling list.
However, list comprehension (discussed at length in Section 3.15, “List
Comprehension”) usually does a better job of what map and filter do.

But the functools package provides a reason to use list-processing
mini-functions. To use the functools package, begin by importing it.
import functools
You can then use the functools.reduce function to apply a function of
your choosing to operate on all the elements of a list.
Key Syntax

functools.reduce(function, list)
The action of reduce is to apply the specified function to each succes-
sive pair of neighboring elements in list, accumulating the result, passing it
along, and finally returning the overall answer. The function argument—a
callable—must itself take two arguments and produce a result. Assuming that
a list (or other sequence) has at least four elements, the effect is as follows.

◗ Take the first two elements as arguments to the function. Remember the result.
◗ Take the result from step 1 and the third element as arguments to the func-
tion. Remember this result.
◗ Take the result from step 2 and the fourth element as arguments to the
function.
◗ Continue to the end of the list in this manner.

The result is easy to understand in the case of addition and multiplication.


import functools

def add_func(a, b):
    return a + b

def mul_func(a, b):
    return a * b

n = 5
a_list = list(range(1, n + 1))

triangle_num = functools.reduce(add_func, a_list)
fact_num = functools.reduce(mul_func, a_list)
If you remember how the range function works, then you’ll see that a_list
is equal to the following sequence, as long as n is set to 5.
1, 2, 3, 4, 5

The example calculates the triangle number of n, which is the sum of all the
numbers in the sequence; and the factorial number of n, which is the product
of all the numbers in the sequence.
triangle_num = 1 + 2 + 3 + 4 + 5
fact_num = 1 * 2 * 3 * 4 * 5

Note: This result, producing triangle numbers by calculating a sum, is more
easily achieved by calling the sum function, as pointed out in Section 3.8,
“List Functions.”

Applying a subtraction function would be a strange thing to do in this
example, but legal. It would produce the following.
(((1 - 2) - 3) - 4) - 5
Likewise, applying a division function would produce the following:
(((1 / 2) / 3) / 4) / 5
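The left-to-right grouping shown above can be checked directly. This sketch assumes the standard operator module as a source of ready-made two-argument functions, and it also shows the optional third argument of functools.reduce, which supplies a starting value.

```python
import functools
import operator

a_list = [1, 2, 3, 4, 5]

# Subtraction is applied left to right: (((1 - 2) - 3) - 4) - 5
diff = functools.reduce(operator.sub, a_list)
print(diff)      # Prints -13

# An optional starting value makes reduce safe to call on an empty
# list; without it, an empty list raises a TypeError.
total = functools.reduce(operator.add, [], 0)
print(total)     # Prints 0
```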

3.14 Lambda Functions


When you operate on a list as shown in the previous section, you may want to
employ a simple function intended for a one-time use.
That’s what a lambda function is: a function that’s created on the fly, typi-
cally for one use. A lambda is a function that has no name, unless you choose
to assign it to a variable.
Key Syntax

lambda arguments: return_value


In this syntax, arguments consists of zero or more variable names to be used
as arguments to the function, separated by commas if there are more than one.
The result is a callable that can be either saved or used directly in an
expression accepting a callable. Here’s an example of saving a lambda by
giving it a name:
my_f = lambda x, y: x + y
Given this assignment, which makes my_f a name for this minifunction,
the name can now be used as a callable. Here’s an example:
sum1 = my_f(3, 7)
print(sum1) # Print 10.
sum2 = my_f(10, 15)
print(sum2) # Print 25.

But this usage, while interesting to note, is not usually how a lambda is
used. A more practical use is with the reduce function. For example, here’s
how to calculate the triangle number for 5:
t5 = functools.reduce(lambda x, y: x + y, [1,2,3,4,5])
Here’s how to calculate the factorial of 5:
f5 = functools.reduce(lambda x, y: x * y, [1,2,3,4,5])
Programs create data dynamically, at run time, and assign names to data
objects if you want to refer to them again. The same thing happens with func-
tions (callables); they are created at run time and are either assigned names—
if you want to refer to them again—or used anonymously, as in the last two
examples.
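Another common place for an anonymous function is the key argument of the sort method described in Section 3.11. The following sketch uses made-up sample data to show a lambda supplying the sort key.

```python
vals_list = [('pi', 3.14), ('phi', 1.618), ('e', 2.718)]

# Sort by the numeric value (the second item of each tuple).
vals_list.sort(key=lambda pair: pair[1])
print(vals_list)    # Prints [('phi', 1.618), ('e', 2.718), ('pi', 3.14)]

names = ['john', 'paul', 'George', 'brian', 'Ringo']

# Case-insensitive sort, with the key function written inline.
names.sort(key=lambda s: s.casefold())
print(names)        # Prints ['brian', 'George', 'john', 'paul', 'Ringo']
```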

3.15 List Comprehension


One of the most important features Python introduced with version 2.0 is list
comprehension. It provides a compact way of using for syntax to generate a
series of values from a list. It can also be applied to dictionaries, sets, and
other collections.
The simplest illustration of list comprehension creates a member-by-member
copy of a list.
The following statement uses slicing to create a copy:
b_list = a_list[:]
Here’s another way to get a member-by-member copy:
b_list = []
for i in a_list:
    b_list.append(i)
Code like this is so common that Python 2.0 introduced a compact way of
doing the same thing. (I’ve used spacing to make it easier to understand.)
b_list = [ i    for i in a_list ]
This example shows the two parts of the list-comprehension expression
clearly, but once you understand it, you’ll probably want to write it without
the extra spaces.
b_list = [i for i in a_list]

Here’s a variation. Suppose you want to create a list that contains the
squares of each of the elements in a_list:
b_list = [ ]
for i in a_list:
    b_list.append(i * i)
If a_list contains [1, 2, 3], then the result of these statements is to
create a list containing [1, 4, 9] and assign this list to the variable b_list.
The corresponding list-comprehension expression in this case is shown here:
b_list = [i * i for i in a_list]
Perhaps by now you can see the pattern. In this second example, the ele-
ments inside the square brackets can be broken down as follows:

◗ The value expression i * i, which is the value to be generated and placed in
the new list; i * i specifies that the square of each element should be put in
the new list.
◗ The for statement header, for i in a_list, supplies the series of values to
operate on. Therefore, the source of the values is a_list.

Figure 3.8 illustrates this list-comprehension syntax.

Figure 3.8. List comprehension (the loop that appends i * i for each i in
a_list, shown alongside the equivalent expression
b_list = [i * i for i in a_list])

Syntactically, list comprehension is a way of creating a list by using a value
expression, followed immediately by a for statement header that supplies a
sequence of data. Remember, however, that the for statement header is used
in the list-comprehension expression without its terminating colon (:).
Key Syntax

[ value for_statement_header ]

The for_statement_header can be taken from nested loops to any level.
Here is an example involving two such loops:
mult_list = [ ]
for i in range(3):
    for j in range(3):
        mult_list.append(i * j)
This nested loop produces the list [0, 0, 0, 0, 1, 2, 0, 2, 4]. This
loop is equivalent to the following list-comprehension statement:
mult_list = [i * j for i in range(3) for j in range(3)]
In this case, i * j is the value produced by each iteration of the loops, and
the rest of the line consists of the headers of the nested loops.
List comprehension has another, optional, feature. Syntactically, it’s placed
at the end of the expression but before the closing square bracket.
Key Syntax

[ value for_statement_header if_expression ]


As a simple example, suppose you want to select only the elements of a list
that are positive. If you wrote out the loop by hand, you could write it this
way:
my_list = [10, -10, -1, 12, -500, 13, 15, -3]

new_list = []
for i in my_list:
    if i > 0:
        new_list.append(i)
The result, in this case, is to place the values [10, 12, 13, 15] in new_list.
The following statement, using list comprehension, does the same thing:
new_list = [ i    for i in my_list    if i > 0 ]
The list-comprehension statement on the right, within the square brackets,
breaks down into three pieces in this case:

◗ The value expression i takes a value directly from the list.
◗ The for statement header, for i in my_list, supplies the sequence of
values to operate on.
◗ Finally, the if condition, if i > 0, selects which items get included.

Again, once you understand how this works, it’s customary to write it with-
out the extra spaces I used for clarity.
new_list = [i for i in my_list if i > 0]
The following example, in contrast, creates a list consisting only of negative
values.
my_list = [1, 2, -10, -500, 33, 21, -1]
neg_list = [i for i in my_list if i < 0 ]
The result in this case is to produce the following list and give it the name
neg_list:
[-10, -500, -1]
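The value expression and the if condition can be combined in a single comprehension. The sketch below, which reuses the same sample list, squares only the negative values and then compares the result with an equivalent written using the built-in map and filter functions; the variable names neg_squares and same are chosen here for illustration.

```python
my_list = [1, 2, -10, -500, 33, 21, -1]

# Transform and filter in one expression: squares of the negative values.
neg_squares = [i * i for i in my_list if i < 0]
print(neg_squares)            # Prints [100, 250000, 1]

# The same result using filter and map, converted to a list.
same = list(map(lambda i: i * i, filter(lambda i: i < 0, my_list)))
print(same == neg_squares)    # Prints True
```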

3.16 Dictionary and Set Comprehension


The principles of list comprehension extend to sets and dictionaries. It’s easi-
est to see this with sets, because a set is a simple collection of values in which
duplicates are ignored and order doesn’t matter.
For example, suppose we want to get only the positive values from a_list
and place them in a set rather than a list. You could write this using an ordi-
nary loop:
a_list = [5, 5, 5, -20, 2, -1, 2]
my_set = set( )
for i in a_list:
    if i > 0:
        my_set.add(i)
You can also do this through set comprehension, by using set braces (“curly
braces”) rather than square brackets.
my_set = {i for i in a_list if i > 0}
The result, in either case, is to create the set {5, 2} and assign it to the vari-
able my_set. There are no duplicate values. The elimination of duplicates
happens automatically because you’re producing a set.
Note here that set comprehension is being performed (creating a set),
because curly braces (“set braces”) are used instead of square brackets, which
would have created a list.

Alternatively, suppose you want to produce a similar set, but have it consist
of the squares of the positive values from a_list, resulting in {25, 4}. In
that case, you could use the following statement:
my_set = {i * i for i in a_list if i > 0}
Dictionary comprehension is a little more complicated, because in order to
work, it’s necessary to create a loop that generates key-value pairs, using this
syntax:
key : value
Suppose you have a list of tuples that you’d like to be the basis for a data
dictionary.
vals_list = [ ('pi', 3.14), ('phi', 1.618) ]
A dictionary could be created as follows:
my_dict = { i[0]: i[1] for i in vals_list }
Note the use of the colon (:) in the key-value expression, i[0] : i[1].
You can verify that a dictionary was successfully produced by referring to or
printing the following expression, which should produce the number 3.14:
my_dict['pi'] # Produces 3.14.
Here’s another example, which combines data from two lists into a dictio-
nary. It assumes that these two lists are the same length.
keys = ['Bob', 'Carol', 'Ted', 'Alice' ]
vals = [4.0, 4.0, 3.75, 3.9]
grade_dict = { keys[i]: vals[i] for i in range(len(keys)) }
This example creates a dictionary initialized as follows:
grade_dict = { 'Bob':4.0, 'Carol':4.0, 'Ted':3.75,
'Alice':3.9 }

Performance Tip: You can improve the performance of the code in this last
example by using the built-in zip function to merge the lists. The
comprehension then is as follows:
grade_dict = { key: val for key, val in zip(keys, vals)}
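When no transformation of the keys or values is needed, the pairs produced by zip can also be passed straight to the dict constructor. This is a minimal sketch of that alternative; it also shows that zip stops at the shorter of its inputs, which matters if the two lists are not the same length.

```python
keys = ['Bob', 'Carol', 'Ted', 'Alice']
vals = [4.0, 4.0, 3.75, 3.9]

grade_dict = dict(zip(keys, vals))
print(grade_dict['Ted'])    # Prints 3.75

# zip stops at the shorter input, so surplus items are silently dropped.
short = dict(zip(keys, [1, 2]))
print(short)                # Prints {'Bob': 1, 'Carol': 2}
```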

In summary, the following syntax produces a set:
Key Syntax

{ value for_statement_header optional_if_cond }

The following syntax produces a dictionary:
{ key : value for_statement_header optional_if_cond }
One of the cleverest ways to use dictionary comprehension is to invert a dic-
tionary. For example, you might want to take a phone book, in which a name
is used to look up a number, and invert it so that you can use a number to look
up a name.

idict = {v : k for k, v in phone_dict.items() }
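The items method of data dictionaries produces a series of (k, v) pairs (a
view object in Python 3), in which k is a key and v is a value. For each such
pair, the value expression v:k inverts the key-value relationship in producing
the new dictionary, idict. For the inversion to be lossless, the original
values must be unique and hashable; if two names share a number, only one of
them survives in idict.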
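A short sketch with made-up data shows the inversion, including what happens when two keys share a value. The contents of phone_dict and the name groups are hypothetical, chosen only for illustration.

```python
# Hypothetical phone book; Bob and Ted share a number.
phone_dict = {'Bob': '555-1111', 'Carol': '555-2222', 'Ted': '555-1111'}

idict = {v: k for k, v in phone_dict.items()}
print(idict)     # Prints {'555-1111': 'Ted', '555-2222': 'Carol'}

# To keep every name, collect the names for each number in a list.
groups = {}
for name, number in phone_dict.items():
    groups.setdefault(number, []).append(name)
print(groups)    # Prints {'555-1111': ['Bob', 'Ted'], '555-2222': ['Carol']}
```

In the simple inversion, the later entry ('Ted') overwrites the earlier one ('Bob') for the shared number.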

3.17 Passing Arguments Through a List


Argument values in Python are not exactly passed either by reference or by
value. Instead, Python arguments are passed as data-dictionary entries, in
which an argument name is associated with the value at the time of the func-
tion call.
In practical terms, this means that you cannot simply give a variable name
as an argument and write a function that modifies that variable.
double_it(n)
Let’s assume that when double_it executes, the value passed to n is 10.
The function receives the key-value pair n:10. But new assignments to n—
treated as if it were a local variable—have no effect on the value of n outside
the function, because such assignments would break the connection between
n and the data.
You can, however, pass a list to a function and write the function in such
a way that the function modifies some or all of the values in that list. This is
possible because lists (in contrast to strings and tuples) are mutable. Here’s an
example:
def set_list_vals(list_arg):
    list_arg[0] = 100
    list_arg[1] = 200
    list_arg[2] = 150

a_list = [0, 0, 0]
set_list_vals(a_list)
print(a_list)    # Prints [100, 200, 150]
This approach works because the values of the list are changed in place,
without creating a new list and requiring variable reassignment. But the fol-
lowing example fails to change the list passed to it.
def set_list_vals(list_arg):
    list_arg = [100, 200, 150]

a_list = [0, 0, 0]
set_list_vals(a_list)
print(a_list) # Prints [0, 0, 0]
With this approach, the values of the list, a_list, were not changed after
the function returned. What happened?
The answer is that the list argument, list_arg, was reassigned to refer to
a completely new list. The association between the variable list_arg and the
original data, [0, 0, 0], was broken.
However, slicing and indexing are different. Assigning into an indexed item
or a slice of a list does not change what the name refers to; it still refers to the
same list, but the first element of that list is modified.
my_list[0] = new_data # This really changes list data.
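Slice assignment offers a way to replace the entire contents of the list that was passed in, without rebinding the argument name. The following is a minimal sketch of that idea; the name alias is introduced here only to confirm that the same list object is involved.

```python
def set_list_vals(list_arg):
    # Slice assignment replaces the contents of the existing list
    # rather than making list_arg refer to a new list.
    list_arg[:] = [100, 200, 150]

a_list = [0, 0, 0]
alias = a_list              # A second name for the same list object.
set_list_vals(a_list)
print(a_list)               # Prints [100, 200, 150]
print(alias is a_list)      # Prints True
```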

3.18 Multidimensional Lists


List elements can themselves be lists. So you can write code like the following:
weird_list = [ [1, 2, 3], 'John', 'George' ]
But much more common is the true multidimensional list, or matrix. The
following assignment creates a 3 × 3 list and assigns it to the variable mat:
mat = [[10, 11, 12], [20, 21, 22], [25, 15, 15]]
The right side of this assignment creates three rows, and each has three
values:
[10, 11, 12],
[20, 21, 22],
[25, 15, 15]

You can index an individual element within this two-dimensional list as
follows:
list_name[row_index][column_index]
As usual, indexes in Python run from 0 to N–1, where N is the length of
the dimension. You can use negative indexes, as usual. Therefore, mat[1][2]
(second row, third column) produces the value 22.

Note: This chapter describes how to use the core Python language to create
multidimensional lists. Chapter 12 describes the use of the numpy package,
which enables the use of highly optimized routines for manipulating
multidimensional arrays, especially arrays (or matrixes) of numbers.

3.18.1 Unbalanced Matrixes


Although you’ll probably most often create matrixes that are rectangular, you
can use Python to create unbalanced matrixes. Here’s an example:
weird_mat = [[1, 2, 3, 4], [0, 5], [9, 8, 3]]
Program code can determine the exact size and shape of a Python matrix
through inspection. Taking the length of such a list (in this case, a matrix) gets
the number of elements at the top level. Here’s an example:
len(weird_mat) # Equal to 3.
This result tells you that there are three rows. You can then get the length
of each of these rows, within the matrix, as follows:
len(weird_mat[0]) # Equal to 4.
len(weird_mat[1]) # Equal to 2.
len(weird_mat[2]) # Equal to 3.
This process can be repeated to any depth.
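The inspection described above can be done for every row at once. In this sketch, the names row_lengths, is_rectangular, and total are chosen for illustration.

```python
weird_mat = [[1, 2, 3, 4], [0, 5], [9, 8, 3]]

# Length of each row, collected with a list comprehension.
row_lengths = [len(row) for row in weird_mat]
print(row_lengths)       # Prints [4, 2, 3]

# A matrix is rectangular if every row has the same length.
is_rectangular = len(set(row_lengths)) <= 1
print(is_rectangular)    # Prints False

# Total number of elements across all rows.
total = sum(row_lengths)
print(total)             # Prints 9
```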

3.18.2 Creating Arbitrarily Large Matrixes


Creating an arbitrarily large multidimensional list is a challenge in Python.
Fortunately, this section provides the simplest solution (other than using the
dedicated numpy package described in Chapter 12).
Remember, Python has no concept of data declaration. Therefore, Python
matrixes cannot be declared; they must be built.

It might seem that list multiplication would solve the problem. It does, in
the case of one-dimensional lists.
big_list = [0] * 100 # Create a list of 100 elements
# each initialized to 0.
This works so well, you might be tempted to just generalize to a second
dimension.
mat = [[0] * 100] * 200
But although this statement is legal, it doesn’t do what you want. The inner
expression, [0] * 100, creates a list of 100 elements. But the code repeats
that data 200 times—not by creating 200 separate rows but instead by creat-
ing 200 references to the same row.
The effect is to create 200 rows that aren’t separate. This is a shallow copy;
you get 200 redundant references to the same row. This is frustrating. The
way around it is to append each of the 200 rows one at a time, which you can
do in a for loop:
mat = [ ]
for i in range(200):
    mat.append([0] * 100)
In this example, mat starts out as an empty list, just like any other.
Each time through the loop, a row containing 100 zeros is appended. After
this loop is executed, mat will refer to a true two-dimensional matrix made up
of 20,000 fully independent cells. It can then be indexed as high as
mat[199][99]. Here’s an example:
mat[150][87] = 3.141592
As with other for loops that append data to a list, the previous example is a
great candidate for list comprehension.
mat = [ [0] * 100 for i in range(200) ]
The expression [0] * 100 is the value part of this list-comprehension
expression; it specifies a one-dimensional list (or “row”) that consists of 100
elements, each set to 0. This expression should not be placed in an additional
pair of brackets, by the way, or the effect would be to create an extra, and
unnecessary, level of indexing.
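The difference between the two approaches is easy to see on a small scale. This sketch uses a 2 × 3 matrix and the illustrative names bad and good; the is operator tests whether two rows are the same object.

```python
bad = [[0] * 3] * 2                   # Two references to one row.
bad[0][0] = 99
print(bad)                            # Prints [[99, 0, 0], [99, 0, 0]]
print(bad[0] is bad[1])               # Prints True

good = [[0] * 3 for _ in range(2)]    # Two separate rows.
good[0][0] = 99
print(good)                           # Prints [[99, 0, 0], [0, 0, 0]]
print(good[0] is good[1])             # Prints False
```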
The expression for i in range(200) causes Python to create, and
append, such a row . . . 200 times.
Key Syntax

matrix_name = [[init] * ncols for var in range(nrows)]

In this syntax display, init is the initial value you want to assign each element
to, and ncols and nrows are the number of columns and rows, respectively.
Because var isn’t important and need not be used again, you can replace it
with the trivial name “_” (just an underscore), which is basically a placeholder.
For example, to create a 30 × 25 matrix, you would use this statement:
mat2 = [ [0] * 25 for _ in range(30) ]
You can use this technique to build matrixes of even higher dimensions,
each time adding a level of list comprehension. Here is a 30 × 20 × 25 three-
dimensional list:

mat2 = [[ [0] * 25 for _ in range(20) ]
           for _ in range(30) ]
And here is a 10 × 10 × 10 × 10 four-dimensional list:
mat2 = [[[ [0] * 10 for _ in range(10) ]
            for _ in range(10) ]
               for _ in range(10) ]
You can build matrixes of higher dimensions still, but remember that as
dimensions increase, things get bigger—fast!

Chapter 3 Summary
This chapter has demonstrated just how powerful Python lists are. Many of
these same abilities are realized in functions and methods, such as len, count,
and index, which apply to other collection classes as well, including strings
and tuples.
However, because lists are mutable, there are some list capabilities not sup-
ported by those other types, such as sort and reverse, which alter list data
“in place.”
This chapter also introduced some exotic abilities, such as the use of functools
and lambda functions. It also explained techniques for creating
multidimensional lists. Chapter 12 provides more efficient alternatives for
that task; still, it’s useful to know how to create multidimensional lists
using the core language.

Chapter 3 Review Questions


1 Can you write a program, or a function, that uses both positive and negative
indexing? Is there any penalty for doing so?

2 What’s the most efficient way of creating a Python list that has 1,000 elements
to start with? Assume every element should be initialized to the same value.
3 How do you use slicing to get every other element of a list, while ignoring the
rest? (For example, you want to create a new list that has the first, third, fifth,
seventh, and so on element.)
4 Describe some of the differences between indexing and slicing.
5 What happens when one of the indexes used in a slicing expression is out of
range?
6 If you pass a list to a function, and if you want the function to be able to
change the values of the list—so that the list is different after the function
returns—what action should you avoid?
7 What is an unbalanced matrix?
8 Why does the creation of arbitrarily large matrixes require the use of either
list comprehension or a loop?

Chapter 3 Suggested Problems


1 Use the reduce list-processing function to help get the average of a randomly
chosen list of numbers. The correct answer should be no more than one or
two lines of code. Then calculate the deviation of each element by subtract-
ing each element from the average (also called the “mean”) and squaring each
result. Finally, return the resulting list.
2 Write a program that enables users to enter a list of numbers, in which the
list is any length they want. Then find the median value, not the average or
mean. The median value is the value that has just as many greater values as
lesser values, in comparison to the rest of the list. If you order the entire list
from lowest to highest, and if there are an even number of elements, then the
median would be the average of the two values in the middle.

4 Shortcuts, Command Line, and Packages
Master crafters need many things, but, above all, they need to master the
tools of the profession. This chapter introduces tools that, even if you’re a
fairly experienced Python programmer, you may not have yet learned. These
tools will make you more productive as well as increase the efficiency of your
programs.
So get ready to learn some new tips and tricks.

4.1 Overview
Python is unusually gifted with shortcuts and time-saving programming
techniques. This chapter begins with a discussion of twenty-two of these
techniques.
Another thing you can do to speed up certain programs is to take advantage
of the many packages that are available with Python. Some of these, such as
re (regular expressions), sys, random, and math, come with the standard
Python download, and all you have to do is to include an import statement.
Other packages can be downloaded quite easily with the right tools.

4.2 Twenty-Two Programming Shortcuts


This section lists the most common techniques for shortening and tightening
your Python code. Most of these are new in the book, although a few of them
have been introduced before and are presented in greater depth here.

◗ Use Python line continuation as needed.


◗ Use for loops intelligently.


◗ Understand combined operator assignment (+= etc.).


◗ Use multiple assignment.
◗ Use tuple assignment.
◗ Use advanced tuple assignment.
◗ Use list and string “multiplication.”
◗ Return multiple values.
◗ Use loops and the else keyword.
◗ Take advantage of Booleans and not.
◗ Treat strings as lists of characters.
◗ Eliminate characters by using replace.
◗ Don’t write unnecessary loops.
◗ Use chained comparisons.
◗ Simulate “switch” with a table of functions.
◗ Use the is operator correctly.
◗ Use one-line for loops.
◗ Squeeze multiple statements onto a line.
◗ Write one-line if/then/else statements.
◗ Create Enum values with range.
◗ Reduce the inefficiency of the print function within IDLE.
◗ Place underscores inside large numbers.

Let’s look at these ideas in detail.

4.2.1 Use Python Line Continuation as Needed


In Python, the normal statement terminator is just the end of a physical line
(although note the exceptions in Section 3.18). This makes programming eas-
ier, because you can naturally assume that statements are one per line.
But what if you need to write a statement longer than one physical line?
This dilemma can crop up in a number of ways. For example, you might have
a string to print that you can’t fit on one line. You could use literal quotations,
but line wraps, in that case, are translated as newlines, which you might not
want. The solution, first of all, is to recognize that literal strings
positioned next to other literal strings are automatically concatenated.
>>> my_str = 'I am Hen-er-y the Eighth,' ' I am!'
>>> print(my_str)
I am Hen-er-y the Eighth, I am!
If these substrings are too long to put on a single physical line, you have
a couple of choices. One is to use the line-continuation character, which is a
backslash (\).
my_str = 'I am Hen-er-y the Eighth,' \
' I am!'
Another technique is to observe that any open (and so far unmatched)
parenthesis, square bracket, or brace automatically causes continuation onto
the next physical line. Consequently, you can enter as long a statement as you
want, and a string of any length you want, without necessarily inserting
newlines.
my_str = ('I am Hen-er-y the Eighth, '
'I am! I am not just any Henry VIII, '
'I really am!')
This statement places all this text in one string. You can likewise use open
parentheses with other kinds of statements.
length_of_hypotenuse = ( (side1 * side1 + side2 * side2)
** 0.5 )
A statement is not considered complete until all open parentheses [(] have
been matched by closing parentheses [)]. The same is true for braces and
square brackets. As a result, this statement will automatically continue to the
next physical line.
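As a minimal sketch of the two continuation styles described above, the following compares a backslash continuation with implicit continuation inside parentheses and brackets. The variable names greeting_a, greeting_b, and primes are illustrative and do not come from the text.

```python
# Backslash continuation: the adjacent literals are concatenated.
greeting_a = 'I am Hen-er-y the Eighth,' \
             ' I am!'

# Implicit continuation: the open parenthesis keeps the statement going.
greeting_b = ('I am Hen-er-y the Eighth,'
              ' I am!')

# Square brackets continue a statement in the same way.
primes = [
    2, 3, 5, 7,
    11, 13, 17, 19,
]

print(greeting_a == greeting_b)
print(len(primes))
```

Many style guides prefer the parenthesized form, because a stray space after a backslash silently breaks the continuation.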

4.2.2 Use “for” Loops Intelligently


If you come from the C/C++ world, you may tend to overuse the range func-
tion to print members of a list. Here’s an example of the C way of writing a
for loop, using range and an indexing operation.
beat_list = ['John', 'Paul', 'George', 'Ringo']
for i in range(len(beat_list)):
    print(beat_list[i])


If you ever write code like this, you should try to break the habit as soon as
you can. It’s better to print the contents of a list or iterator directly.
beat_list = ['John', 'Paul', 'George', 'Ringo']
for guy in beat_list:
    print(guy)
Even if you need access to a loop variable, it’s better to use the enumerate
function to generate such numbers. Here’s an example:
beat_list = ['John', 'Paul', 'George', 'Ringo']
for i, name in enumerate(beat_list, 1):
    print(i, '. ', name, sep='')
This prints
1. John
2. Paul
3. George
4. Ringo
There are, of course, some cases in which it’s necessary to use indexing.
That happens most often when you are trying to change the contents of a list
in place.
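The in-place case mentioned above might look like the following sketch. The list name scores and the adjustment of 5 are made up for illustration. Assigning to the loop variable alone would not alter the list, so the code indexes into it, and enumerate still supplies the index.

```python
scores = [70, 85, 90]

# Rebinding the loop variable would leave the list unchanged,
# so assign through the index that enumerate provides.
for i, score in enumerate(scores):
    scores[i] = score + 5

print(scores)
```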

4.2.3 Understand Combined Operator Assignment (+= etc.)


The combined operator-assignment operators are introduced in Chapter 1
and so are reviewed only briefly here. Remember that assignment (=) can be
combined with any of the following operators: +, -, *, /, //, %, **, &, ^, |,
<<, >>.
The operators &, |, and ^ are bitwise “and,” “or,” and “exclusive or,”
respectively. The operators << and >> perform bit shifts to the left and to the
right.
This section covers some finer points of operator-assignment usage. First,
any assignment operator has low precedence and is carried out last.
Second, an assignment operator may or may not be in place, depending on
whether the type operated on is mutable. In place refers to operations that
work on existing data in memory rather than creating a completely new
object. Such operations are faster and more efficient.
Integers, floating-point numbers, and strings are immutable. Assignment
operators, used with these types, do not cause in-place assignment; they instead
must produce a completely new object, which is reassigned to the variable.
Here’s an example:

s1 = s2 = 'A string.'
s1 += '...with more stuff!'
print('s1:', s1)
print('s2:', s2)
The print function, in this case, produces the following output:
s1: A string...with more stuff!
s2: A string.
When s1 was assigned a new value, it did not change the string data in
place; it assigned a whole new string to s1. But s2 is a name that still refers to
the original string data. This is why s1 and s2 now contain different strings.
But lists are mutable, and therefore changes to lists can occur in place.
a_list = b_list = [10, 20]
a_list += [30, 40]
print('a_list:', a_list)
print('b_list:', b_list)
This code prints
a_list: [10, 20, 30, 40]
b_list: [10, 20, 30, 40]
In this case, the change was made to the list in place, so there was no need
to create a new list and reassign that list to the variable. Therefore, a_list
was not assigned to a new list, and b_list, a variable that refers to the same
data in memory, reflects the change as well.
In-place operations are almost always more efficient. In the case of lists,
Python reserves some extra space to grow when allocating a list in memory,
and that in turn permits append operations, as well as +=, to efficiently
grow lists. However, occasionally lists exceed the reserved space and must be
moved. Such memory management is seamless and has little or no impact on
program behavior.
Non-in-place operations are less efficient, because a new object must be
created. That’s why it’s advisable to use the join method to grow large strings
rather than use the += operator, especially if performance is important. Here’s
an example that builds a list of 26 characters and then uses the join method to
combine them into a single string.
str_list = []
n = ord('a')
for i in range(n, n + 26):
    str_list += chr(i)    # Extends the list in place by one character
alphabet_str = ''.join(str_list)


Figures 4.1 and 4.2 illustrate the difference between in-place operations
and non-in-place operations. In Figure 4.1, string data seems to be appended
onto an existing string, but what the operation really does is to create a new
string and then assign it to the variable—which now refers to a different place
in memory.

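[Figure: the variable S first refers to ‘Here’s a string’ (step 1); a new
string, ‘Here’s a string...with more!’, is created (step 2); S then refers to
the new string (step 3).]
Figure 4.1. Appending to a string (not in-place)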

But in Figure 4.2, list data is appended onto an existing list without the
need to create a new list and reassign the variable.

[Figure: a_list holds 10, 20, 30, 40. Step 1: create new list. Step 2: grow
the list in place.]
Figure 4.2. Appending to a list (in-place)

Here’s a summary:

◗ Combined assignment operators such as += cause in-place changes to data if
the object is mutable (such as a list); otherwise, a whole new object is assigned
to the variable on the left.
◗ In-place operations are faster and use space more efficiently, because they do
not force creation of a new object. In the case of lists, Python usually allocates
extra space so that the list can be grown more efficiently at run time.
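The summary above can be checked with a short sketch that uses the is operator on aliased names. The variable names are illustrative. The last case is an addition for contrast: writing c_list = c_list + [3] builds a new list and rebinds the name, unlike +=.

```python
# Strings are immutable: += creates a new object and rebinds s1 only.
s1 = s2 = 'abc'
s1 += 'def'
str_still_shared = s1 is s2          # False

# Lists are mutable: += grows the existing object, so both names see it.
a_list = b_list = [1, 2]
a_list += [3]
list_still_shared = a_list is b_list  # True

# Plain + followed by assignment is NOT in place, even for lists.
c_list = d_list = [1, 2]
c_list = c_list + [3]
rebound_shared = c_list is d_list     # False

print(str_still_shared, list_still_shared, rebound_shared)
```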

4.2.4 Use Multiple Assignment


Multiple assignment is one of the most commonly used coding shortcuts in
Python. You can, for example, create five different variables at once, assigning
them all the same value—in this case, 0:
a = b = c = d = e = 0

All five names now refer to the same object. Consequently, the following
expression evaluates to True:
a is b
This expression would no longer evaluate to True if either of these variables
was later assigned to a different object.
Even though this coding technique may look like it is borrowed from C and
C++, you should not assume that Python follows C syntax in most respects.
In Python, assignment is a statement; in C, it is an expression.
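Because multiple assignment binds every name to one object, it deserves some care with mutable values. The following sketch, with made-up names, contrasts one shared list with two separate lists.

```python
# Multiple assignment: both names refer to ONE list object.
x = y = []
x.append(1)          # y sees the change too

# Tuple assignment with two list displays: two separate lists.
p, q = [], []
p.append(1)          # q is unaffected

print(x, y, p, q)
```

With immutable values such as 0, the sharing is harmless, because any later change rebinds the name instead of altering the object.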

4.2.5 Use Tuple Assignment


Multiple assignment is useful when you want to assign a group of variables
the same initial value.
But what if you want to assign different values to different variables? For
example, suppose you want to assign 1 to a, and 0 to b. The obvious way to do
that is to use the following statements:
a = 1
b = 0
But through tuple assignment, you can combine these into a single
statement.
a, b = 1, 0
In this form of assignment, you have a series of variables on the left side of
the equals sign (=) and a series of values on the right. They must match in
number, with one exception: You can assign a tuple of any size to a single
variable (which itself now represents a tuple as a result of this operation).
a = 4, 8, 12 # a is now a tuple containing three values.
Tuple assignment can be used to write some passages of code more com-
pactly. Consider how compact a Fibonacci-generating function can be in
Python.
def fibo(n):
    a, b = 1, 0
    while a <= n:
        print(a, end=' ')
        a, b = a + b, a
In the last statement, the variable a gets a new value: a + b; the variable b
gets a new value—namely, the old value of a.


Most programming languages have no way to set a and b simultaneously.
Setting the value of a changes what gets put into b, and vice versa. So
normally, a temporary variable would be required. You could do that in Python,
if you wanted to:
temp = a # Preserve old value of a
a = a + b # Set new value of a
b = temp # Set b to old value of a
But with tuple assignment, there’s no need for a temporary variable.
a, b = a + b, a
Here’s an even simpler example of tuple assignment. Sometimes, it’s useful
to swap two values.
x, y = 1, 25
print(x, y) # prints 1 25
x, y = y, x
print(x, y) # prints 25 1
The interesting part of this example is the statement that performs the
swap:
x, y = y, x
In another language, such an action would require three separate state-
ments. But Python does not require this, because—as just shown—it can do
the swap all at once. Here is what another language would require you to do:
temp = x
x = y
y = temp
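Tuple assignment works because the entire right side is evaluated before any name on the left is rebound. As a sketch under that reading, here is a variant of the Fibonacci function that returns a list instead of printing (fibo_list is a hypothetical name), followed by a three-way rotation.

```python
def fibo_list(n):
    """Return the Fibonacci numbers that are <= n, as a list."""
    result = []
    a, b = 1, 0
    while a <= n:
        result.append(a)
        a, b = a + b, a    # right side is fully evaluated first
    return result

# Rotate three values without any temporary variable.
x, y, z = 1, 2, 3
x, y, z = y, z, x

print(fibo_list(10))
print(x, y, z)
```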

4.2.6 Use Advanced Tuple Assignment


Tuple assignment has some refined features. For example, you can unpack a
tuple to assign to multiple variables, as in the following example.
tup = 10, 20, 30
a, b, c = tup
print(a, b, c) # Prints 10 20 30
It’s important that the number of variables on the left matches the
size of the tuple on the right. The following statement would produce a
run-time error.
tup = 10, 20, 30
a, b = tup # Error: too many values to unpack
Another technique that’s occasionally useful is creating a tuple that has one
element. That would be easy to do with lists.
my_list = [3]
This is a list with one element, 3. But the same approach won’t work with
tuples.
my_tup = (3)
print(type(my_tup))
This call to print shows that my_tup, in this case, was assigned a simple
integer.
<class 'int'>
This is not what was wanted in this case. The parentheses were treated as a
no-op, as would any number of enclosing parentheses. But the following state-
ment produces a tuple with one element, although, to be fair, a tuple with just
one element isn’t used very often.
my_tup = (3,) # Assign tuple with one member, 3.
The use of an asterisk (*) provides a good deal of additional flexibility with
tuple assignment. You can use it to split off parts of a tuple and have one (and
only one) variable that becomes the default target for the remaining elements,
which are then put into a list. Some examples should make this clear.
a, *b = 2, 4, 6, 8
In this example, a gets the value 2, and b is assigned to a list:
2
[4, 6, 8]
You can place the asterisk next to any variable on the left, but in no case
more than one. The variable modified with the asterisk is assigned a list of
whatever elements are left over. Here’s an example:
a, *b, c = 10, 20, 30, 40, 50
In this case, a and c refer to 10 and 50, respectively, after this statement is
executed, and b is assigned the list [20, 30, 40].
You can, of course, place the asterisk next to a variable at the end.
big, bigger, *many = 100, 200, 300, 400, 500, 600


Printing these variables produces the following:


>>> print(big, bigger, many, sep='\n')
100
200
[300, 400, 500, 600]
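A few more cases of starred assignment are sketched below with made-up data. The last one shows that the starred variable receives an empty list when nothing is left over.

```python
record = ('alpha', 1, 'x', 'y', 'z')
name, number, *tags = record           # tags collects the remainder

first, *middle, last = [1, 2, 3, 4, 5]

head, *rest = ('only',)                # nothing left over for rest

print(name, number, tags)
print(first, middle, last)
print(head, rest)
```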

4.2.7 Use List and String “Multiplication”


Serious programs often deal with large data sets—for example, a collection of
10,000 integers all initialized to 0. In languages such as C and Java, the way to
do this is to first declare an array with a large dimension.
Because there are no data declarations in Python, the only way to create a
large list is to construct it on the right side of an assignment. But constructing
a super-long list by hand is impractical. Imagine trying to construct a super-
long list this way:
my_list = [0, 0, 0, 0, 0, 0, 0, 0...]
As you can imagine, entering 10,000 zeros into program code would be
very time-consuming! And it would make your hands ache.
Applying the multiplication operator provides a more practical solution:
my_list = [0] * 10000
This example creates a list of 10,000 integers, all initialized to 0.
Such operations are well optimized in Python, so that even within IDLE,
Python’s integrated development environment, they are handled quickly.
>>> my_list = [0] * 10000
>>> len(my_list)
10000
Note that the integer may be either the left or the right operand in such an
expression.
>>> my_list = 1999 * [12]
>>> len(my_list)
1999
You can also “multiply” longer lists. For example, the following list is 300
elements long. It consists of the numbers 1, 2, 3, repeated over and over.
>>> trip_list = [1, 2, 3] * 100
>>> len(trip_list)
300

The multiplication sign (*) does not work with dictionaries and sets, which
require unique keys. But it does work with the string class (str); for example,
you can create a string consisting of 40 underscores, which you might use for
display purposes:
divider_str = '_' * 40
Printing out this string produces the following:
________________________________________
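One caution applies when the list being multiplied contains other lists: multiplication copies references, not the inner lists themselves. The following sketch (the names rows, cols, shared, and independent are illustrative) shows the difference between multiplying a nested list and building the rows with a list comprehension.

```python
rows, cols = 3, 4

# Three references to the SAME inner list.
shared = [[0] * cols] * rows
shared[0][0] = 99                 # shows up in every row

# A list comprehension builds a separate inner list per row.
independent = [[0] * cols for _ in range(rows)]
independent[0][0] = 99            # affects only the first row

print(shared)
print(independent)
```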

4.2.8 Return Multiple Values


You can’t pass a simple variable to a Python function, change the value inside
the function, and expect the original variable to reflect the change. Here’s an
example:
def double_me(n):
    n *= 2

a = 10
double_me(a)
print(a) # Value of a did not get doubled!!
When n is assigned a new value, the association is broken between that
variable and the value that was passed. In effect, n is a local variable that is
now associated with a different place in memory. The variable passed to the
function is unaffected.
But you can always use a return value this way:
def double_me(n):
    return n * 2

a = 10
a = double_me(a)
print(a)
Therefore, to get an out parameter, just return a value. But what if you
want more than one out parameter?
In Python, you can return as many values as you want. For example, the
following function applies the quadratic formula and returns both roots.
def quad(a, b, c):
    determin = (b * b - 4 * a * c) ** .5   # Square root of the discriminant
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2
This function has three input arguments and two return values. In calling
the function, it’s important to receive both values:
x1, x2 = quad(1, -1, -1)
If you return multiple values to a single variable in this case, that variable
will store the values as a tuple. Here’s an example:
>>> x = quad(1, -1, -1)
>>> x
(1.618033988749895, -0.6180339887498949)
Note that this feature—returning multiple values—is actually an applica-
tion of the use of tuples in Python.
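Here is one more small sketch of the same idea. The function min_max_mean is hypothetical and simply bundles three results into one returned tuple, which the caller can unpack or keep whole.

```python
def min_max_mean(values):
    """Return the smallest value, the largest value, and the mean."""
    return min(values), max(values), sum(values) / len(values)

# Unpack the three results into three variables...
low, high, mean = min_max_mean([3, 1, 4, 1, 5])

# ...or keep them together as a single tuple.
stats = min_max_mean([3, 1, 4, 1, 5])

print(low, high, mean)
print(stats)
```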

4.2.9 Use Loops and the “else” Keyword


The else keyword is most frequently used in combination with the if key-
word. But in Python, it can also be used with try-except syntax and with
loops.
With loops, the else clause is executed if the loop has completed without
an early exit, such as break. This feature applies to both while loops and for
loops.
The following example tries to find a number, from 2 up to and including the
limit, max, that divides evenly into n. If no such divisor is found, it
reports that fact.
def find_divisor(n, max):
    for i in range(2, max + 1):
        if n % i == 0:
            print(i, 'divides evenly into', n)
            break
    else:
        print('No divisor found')
Here’s an example:
>>> find_divisor(49, 6)
No divisor found
>>> find_divisor(49, 7)
7 divides evenly into 49
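The same else clause works with while loops, as stated above. The following sketch uses a hypothetical function, index_of_first_negative, in which the else block runs only when the loop condition becomes false without a break.

```python
def index_of_first_negative(values):
    """Return the index of the first negative number, or -1 if none."""
    i = 0
    while i < len(values):
        if values[i] < 0:
            break              # early exit: the else clause is skipped
        i += 1
    else:
        return -1              # loop finished without break
    return i

print(index_of_first_negative([3, 5, -2, 7]))
print(index_of_first_negative([1, 2]))
```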


4.2.10 Take Advantage of Boolean Values and “not”


Every object in Python evaluates to True or False. For example, every empty
collection in Python evaluates to False if tested as a Boolean value; so does
the special value None. Here’s one way of testing whether a string has length
zero:
if len(my_str) == 0:
    break
However, you can instead test the string directly this way:
if not my_str:
    break
Here are the general guidelines for Boolean conversions.

◗ Nonempty collections and nonempty strings evaluate as True; so do nonzero
numeric values.
◗ Zero-length collections and zero-length strings evaluate to False; so does
any number equal to 0, as well as the special value None.
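These guidelines can be checked directly. The sketch below, with made-up sample values, filters a list by truth value and uses not to detect an empty collection.

```python
samples = [0, 0.0, '', [], {}, set(), None, 1, -2.5, 'a', [0], {'k': 1}]

# Keep only the values that count as true in a Boolean context.
truthy = [x for x in samples if x]
falsy_count = sum(1 for x in samples if not x)

def describe(items):
    """Report on a collection without calling len() in the test."""
    if not items:
        return 'empty'
    return f'{len(items)} item(s)'

print(truthy)
print(falsy_count)
print(describe([]), describe([1, 2]))
```

Note that [0] counts as true: it is a nonempty list, even though its only element is zero.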

4.2.11 Treat Strings as Lists of Characters


When you’re doing complicated operations on individual characters and
building a string, it’s sometimes more efficient to build a list of characters
(each being a string of length 1) and use list comprehension plus join to put it
all together.
For example, to test whether a string is a palindrome, it’s useful to omit all
punctuation and space characters and convert the rest of the string to either
all-uppercase or all-lowercase. List comprehension does this efficiently.
test_str = input('Enter test string: ')
a_list = [c.upper() for c in test_str if c.isalnum()]
print(a_list == a_list[::-1])
The second line in this example uses list comprehension, which was intro-
duced in Section 3.15, “List Comprehension.”
The third line in this example uses slicing to get the reverse of the list. Now
we can test whether test_str is a palindrome by comparing it to its own
reverse. These three lines of code have to be the shortest possible program for
testing whether a string is a palindrome. Talk about compaction!
Enter test string: A man, a plan, a canal, Panama!
True
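The same logic can be wrapped in a reusable function. This is a sketch that assumes the intent is to keep only letters and digits (tested here with str.isalnum) and to ignore case. The name is_palindrome is illustrative.

```python
def is_palindrome(text):
    """Return True if text reads the same both ways, ignoring
    case, spaces, and punctuation."""
    chars = [c.upper() for c in text if c.isalnum()]
    return chars == chars[::-1]

print(is_palindrome('A man, a plan, a canal, Panama!'))
print(is_palindrome('Python'))
```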


4.2.12 Eliminate Characters by Using “replace”


To quickly remove all instances of a particular character from a string, use
replace and specify the empty string as the replacement.
For example, a code sample in Chapter 10 asks users to enter strings that
represent fractions, such as “1/2”. But if the user puts extra spaces in, as in
“1 / 2”, this could cause a problem. Here’s some code that takes an input
string, s, and quickly rids it of all spaces wherever they are found (so it goes
beyond stripping):
s = s.replace(' ', '')
Using similar code, you can quickly get rid of all offending characters or
substrings in the same way—but only one at a time. Suppose, however, that
you want to get rid of all vowels in one pass. List comprehension, in that case,
comes to your aid.
a_list = [c for c in s if c not in 'aeiou']
s = ''.join(a_list)
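As an alternative to the list comprehension shown above (a different technique, not the one in the text), the built-in str.translate method can delete several characters in one pass. The third argument to str.maketrans names the characters to remove. The helper name strip_chars is hypothetical.

```python
def strip_chars(s, unwanted):
    """Return s with every character in unwanted removed."""
    return s.translate(str.maketrans('', '', unwanted))

no_vowels = strip_chars('Supercharged Python', 'aeiouAEIOU')
fraction = strip_chars('1 / 2', ' ')

print(no_vowels)
print(fraction)
```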

4.2.13 Don’t Write Unnecessary Loops


Make sure that you don’t overlook all of Python’s built-in abilities, especially
when you’re working with lists and strings. With most computer languages,
you’d probably have to write a loop to get the sum of all the numbers in a list.
But Python performs summation directly. For example, the following func-
tion calculates 1 + 2 + 3 … + N:
def calc_triangle_num(n):
    return sum(range(n+1))
Another way to use the sum function is to quickly get the average (the mean)
of any list of numbers.
def get_avg(a_list):
    return sum(a_list) / len(a_list)
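Several other built-in functions replace common loops in the same way. The sketch below uses max, any, all, and sum on a made-up list of temperatures.

```python
temps = [68, 71, 75, 64, 80]

hottest = max(temps)                           # instead of a "largest so far" loop
any_freezing = any(t <= 32 for t in temps)     # is at least one test true?
all_positive = all(t > 0 for t in temps)       # is every test true?
count_warm = sum(t >= 70 for t in temps)       # each True counts as 1

print(hottest, any_freezing, all_positive, count_warm)
```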

4.2.14 Use Chained Comparisons (n < x < m)


This is a slick little shortcut that can save you a bit of work now and then, as
well as making your code more readable.
It’s common to write if conditions such as the following:
if 0 < x and x < 100:
    print('x is in range.')
But in this case, you can save a few keystrokes by instead using this:
if 0 < x < 100: # Use 'chained' comparisons.
    print('x is in range.')
This ability potentially goes further. You can chain together any number of
comparisons, and you can include any of the standard comparison operators,
including ==, <, <=, >, and >=. The arrows don’t have to point in the same
direction, and the operators can be combined in any order. So you can do
things like this:
a, b, c = 5, 10, 15
if 0 < a <= c > b > 1:
    print('All these comparisons are true!')
    print('c is equal or greater than all the rest!')
You can even use this technique to test a series of variables for equality.
Here’s an example:
a = b = c = d = e = 100
if a == b == c == d == e:
    print('All the variables are equal to each other.')
For larger data sets, there are ways to achieve these results more efficiently.
Any list, no matter how large, can be tested to see whether all the elements are
equal this way:
if min(a_list) == max(a_list):
    print('All the elements are equal to each other.')
However, when you just want to test a few variables for equality or perform
a combination of comparisons on a single line, the techniques shown in this
section are a nice convenience with Python. Yay, Python!
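Chained comparisons read naturally inside small helper functions. The two functions below are hypothetical examples: one maps a score to a band, and the other tests that three values are strictly increasing.

```python
def grade_band(score):
    """Map a numeric score to a letter band."""
    if 90 <= score <= 100:
        return 'A'
    if 80 <= score < 90:
        return 'B'
    return 'other'

def is_strictly_increasing(a, b, c):
    # Equivalent to (a < b) and (b < c), with b evaluated only once.
    return a < b < c

print(grade_band(95), grade_band(85), grade_band(50))
print(is_strictly_increasing(1, 2, 3), is_strictly_increasing(1, 3, 2))
```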

4.2.15 Simulate “switch” with a Table of Functions


This next technique is nice because it can potentially save a number of lines of
code.
The program in Section 15.12 offers the user a menu of choices, prompts for an
integer, and then uses that integer to decide which of several functions to
call. The obvious way to implement this logic is with a series of if/elif
statements, because Python has no “switch” statement.
if n == 1:
    do_plot(stockdf)
elif n == 2:
    do_highlow_plot(stockdf)
elif n == 3:
    do_volume_subplot(stockdf)
elif n == 4:
    do_movingavg_plot(stockdf)
Code like this is verbose. It will work, but it’s longer than it needs to be.
But Python functions are objects, and they can be placed in a list just like any
other kind of objects. You can therefore get a reference to one of the functions
and call it.
fn = [do_plot, do_highlow_plot, do_volume_subplot,
      do_movingavg_plot][n-1]
fn(stockdf) # Call the function
For example, n-1 is evaluated, and if that value is 0 (that is, n is equal to 1),
the first function listed, do_plot, is executed.
This code creates a compact version of a C++ switch statement by calling
a different function depending on the value of n. (By the way, the value 0 is
excluded in this case, because that value is used to exit.)
You can create a more flexible control structure by using a dictionary com-
bined with functions. For example, suppose that “load,” “save,” “update,”
and “exit” are all menu functions. We might implement the equivalent of a
switch statement this way:
menu_dict = {'load':load_fn, 'save':save_fn,
             'exit':exit_fn, 'update':update_fn}
(menu_dict[selector])() # Call the function
Now the appropriate function will be called, depending on the string con-
tained in selector, which presumably contains 'load', 'save', 'update',
or 'exit'.
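A runnable sketch of the dictionary approach follows. The function bodies are placeholders invented for illustration, and the use of dict.get with a fallback function is an added design choice (not from the text) so that an unrecognized selector does not raise KeyError.

```python
# Placeholder menu functions; real ones would do actual work.
def load_fn():
    return 'loading'

def save_fn():
    return 'saving'

def unknown_fn():
    return 'unknown command'

menu_dict = {'load': load_fn, 'save': save_fn}

def dispatch(selector):
    """Look up the function for selector and call it."""
    return menu_dict.get(selector, unknown_fn)()

print(dispatch('load'))
print(dispatch('quit'))
```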

4.2.16 Use the “is” Operator Correctly


Python supports both a test-for-equality operator (==) and an is operator.
These tests sometimes return the same result, and sometimes they don’t. If
two strings have the same value, a test for equality always produces True.
a = 'cat'
b = 'cat'
a == b # This must produce True.
But the is operator isn’t guaranteed to produce True in string comparisons,
and it’s risky to rely upon. A constructed string isn’t guaranteed to
match another string if you use is rather than test-for-equality (==). For
example:
>>> s1 = 'I am what I am and that is all that I am.'
>>> s2 = 'I am what I am' + ' and that is all that I am.'
>>> s1 == s2
True
>>> s1 is s2
False
What this example demonstrates is that just because two strings have iden-
tical contents does not mean that they correspond to the same object in memory,
and therefore the is operator produces False.
If the is operator is unreliable in such cases, why is it in the language at all?
The answer is that Python has some unique objects, such as None, True, and
False. When you’re certain that you’re comparing a value to a unique object,
then the is keyword works reliably; moreover, it’s preferable in those
situations because such a comparison is more efficient.
a_value = my_function()
if a_value is None:
    pass # Take special action if None is returned.
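The difference between the two operators is easiest to see with lists, where separately created objects are never the same object. In the sketch below, find_first_even is a hypothetical function that returns None when nothing is found. Testing with is None matters there, because a legitimate result of 0 would fail a plain truth test.

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]      # equal contents, separate object
list_c = list_a         # another name for the same object

equal_values = list_a == list_b        # True
same_object_ab = list_a is list_b      # False
same_object_ac = list_a is list_c      # True

def find_first_even(values):
    """Return the first even number in values, or None."""
    for v in values:
        if v % 2 == 0:
            return v
    return None

result = find_first_even([1, 3, 5])
message = 'no even value' if result is None else f'found {result}'

print(equal_values, same_object_ab, same_object_ac)
print(message)
```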

4.2.17 Use One-Line “for” Loops


If a for loop is short enough, with only one statement inside the loop (that is,
the statement body), you can squeeze the entire for loop onto a single physical
line.
Key Syntax

for var in sequence: statement


Not all programmers favor this programming style. However, it’s useful
as a way of making your program more compact. For example, the following
one-line statement prints all the numbers from 0 to 9:
>>> for i in range(10): print(i, end=' ')

0 1 2 3 4 5 6 7 8 9
Notice that when you’re within IDLE, this for loop is like any other: You
need to type an extra blank line in order to terminate it.


4.2.18 Squeeze Multiple Statements onto a Line


If you have a lot of statements you want to squeeze onto the same line, you can
do it—if you’re determined and the statements are short enough.
The technique is to use a semicolon (;) to separate one statement on a phys-
ical line from another. Here’s an example:
>>> for i in range(5): n=i*2; m = 5; print(n+m, end=' ')

5 7 9 11 13
You can squeeze other kinds of loops onto a line in this way. Also, you don’t
have to use loops but can place any statements on a line that you can manage
to fit there.
>>> a = 1; b = 2; c = a + b; print(c)
3
At this point, some people may object, “But with those semicolons, this
looks like C code!” (Oh, no—anything but that!)
Maybe it does, but it saves space. Keep in mind that the semicolons are
statement separators and not terminators, as in the old Pascal language.

4.2.19 Write One-Line if/then/else Statements


This feature is also called an inline if conditional. Consider the following
if/else statement, which is not uncommon:
turn = 0
...
if turn % 2:
    cell = 'X'
else:
    cell = 'O'
The book Python Without Fear uses this program logic to help operate a
tic-tac-toe game. On alternate turns, the cell to be added was either an “X” or
an “O”. The turn counter, advanced by 1 each time, caused a switch back and
forth (a toggle) between the two players, “X” and “O.”
That book replaced the if/else block just shown with the more compact
version:
cell = 'X' if turn % 2 else 'O'

Key Syntax

true_expr if conditional else false_expr

If the conditional is true, then the true_expr is evaluated and returned;
otherwise the false_expr is evaluated and returned.
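The sketch below applies this syntax in two hypothetical functions. The second one relies on the fact that only the selected expression is evaluated, so the division is never attempted when b is zero.

```python
def mark_for_turn(turn):
    """Return 'X' on odd turns and 'O' on even turns."""
    return 'X' if turn % 2 else 'O'

marks = [mark_for_turn(t) for t in range(4)]

def safe_ratio(a, b):
    # a / b is evaluated only when b is nonzero.
    return a / b if b else 0.0

print(marks)
print(safe_ratio(1, 4), safe_ratio(1, 0))
```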

4.2.20 Create Enum Values with “range”


Many programmers like to use enumerated (or “enum”) types in place of
so-called magic numbers. For example, if you have a color_indicator variable,
in which the values 1 through 5 represent the values red, green, blue,
black, and white, the code becomes more readable if you can use the color
names instead of using the literal numbers 1 through 5.
You could make this possible by assigning a number to each variable name.
red = 0
blue = 1
green = 2
black = 3
white = 4
This works fine, but it would be nice to find a way to automate this code.
There is a simple trick in Python that allows you to do that, creating an enu-
meration. You can take advantage of multiple assignment along with use of
the range function:
red, blue, green, black, white = range(5)
The number passed to range in this case is the number of settings. Or, if
you want to start the numbering at 1 instead of 0, you can use the following:
red, blue, green, black, white = range(1, 6)

Note: For more sophisticated control over the creation and specification of
enumerated types, you can import and examine the enum package.
import enum
help(enum)
You can find information on this feature at
[Link]
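As a brief sketch of what the enum package offers, the class below uses IntEnum from the standard library. The class name Color and its members are illustrative. Unlike bare integers, the members carry their own names and can be looked up by value.

```python
from enum import IntEnum

class Color(IntEnum):
    RED = 1
    BLUE = 2
    GREEN = 3
    BLACK = 4
    WHITE = 5

color_indicator = Color.GREEN

print(color_indicator.name, int(color_indicator))
print(Color(4).name)
print([c.name for c in Color])
```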


4.2.21 Reduce the Inefficiency of the “print” Function Within IDLE
Within IDLE, calls to the print function are incredibly slow. If you run
programs from within the environment, you can speed up performance
dramatically by reducing the number of separate calls to print.
For example, suppose you want to print a 40 × 20 block of asterisks (*). The
slowest way to do this, by far, is to print each character individually. Within
IDLE, this code is painfully slow.
for i in range(20):
    for j in range(40):
        print('*', end='')
    print()
You can get much better performance by printing a full row of asterisks at
a time.
row_of_asterisks = '*' * 40
for i in range(20):
    print(row_of_asterisks)
But the best performance is achieved by revising the code so that it calls the
print function only once, after having assembled a large, multiline output
string.
row_of_asterisks = '*' * 40
s = ''
for i in range(20):
    s += row_of_asterisks + '\n'
print(s)
This example can be improved even further by using the join method of the
string class. This version is better because it appends to a list in place
rather than appending to a string, which must create a new string each time.
row_of_asterisks = '*' * 40
list_of_str = []
for i in range(20):
    list_of_str.append(row_of_asterisks)
print('\n'.join(list_of_str))
Better yet, here is a one-line version of the code!
print('\n'.join(['*' * 40] * 20))
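The same idea can be wrapped in a small reusable helper. This is only a sketch; the function name make_block and its parameters are hypothetical and do not come from the text.

```python
def make_block(rows, cols, ch='*'):
    """Return a rows-by-cols block of ch as one multiline string."""
    return '\n'.join([ch * cols] * rows)

block = make_block(20, 40)
print(block)    # a single call to print covers all 20 rows
```

Because the whole block is assembled before anything is printed, the cost of calling print is paid once, regardless of the number of rows.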


4.2.22 Place Underscores Inside Large Numbers


In programming, you sometimes have to deal with large numeric literals.
Here’s an example:
CEO_salary = 1500000
Such numbers are difficult to read in programming code. You might like to
use commas as separators, but commas are reserved for other purposes, such
as creating lists. Fortunately, Python provides another technique: You can use
underscores ( _) inside a numeric literal.
CEO_salary = 1_500_000
Subject to the following rules, the underscores can be placed anywhere
inside the number. The effect is for Python to read the number as if no
underscores were present.

◗ You can’t use two underscores in a row.
◗ You can’t use a leading or trailing underscore. If you use a leading underscore
(as in _1), the figure is treated as a variable name.
◗ You can use underscores on either side of a decimal point, as long as each
underscore sits between two digits.

This technique affects only how numbers appear in the code itself and not
how anything is printed. To print a number with thousands-place separators,
use the format function or method as described in Chapter 5, “Formatting
Text Precisely.”
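A short sketch of both halves of this point follows: underscores change only how the literal is written, and a separate formatting step is needed to show separators in output. The variable name price is made up for illustration.

```python
CEO_salary = 1_500_000
print(CEO_salary)                   # 1500000
print(CEO_salary == 1500000)        # True

price = 1_234.567_8                 # underscores on both sides of the point
print(price)                        # 1234.5678

# Separators in printed output come from formatting, not from the literal.
print(format(CEO_salary, ','))      # 1,500,000
print('{:,}'.format(CEO_salary))    # 1,500,000
```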

4.3 Running Python from the Command Line


If you’ve been running Python programs from within IDLE—either as com-
mands entered one at a time or as scripts—one way to improve execution
speed is to run programs from a command line instead; in particular, doing so
greatly speeds up the time it takes to execute calls to the print function.
Some of the quirks of command-line operation depend on which operating
system you’re using. This section covers the two most common operating sys-
tems: Windows and Macintosh.

4.3.1 Running on a Windows-Based System


Windows systems, unlike Macintosh, usually do not come with a version of
Python 2 preloaded, a practice that actually saves you a good deal of fuss as
long as you install Python 3 yourself.


To use Python from the command line, first start the DOS Box application
(called Command Prompt in current versions of Windows), which is present on
all Windows systems. Python should be easily available because it should be
placed in a directory that is part of the PATH setting. Checking this setting
is easy to do while you’re running a Windows DOS Box.
In Windows, you can also check the PATH setting by opening the Control
Panel, choosing System, and selecting the Advanced tab. Then click Environment
Variables.
You then should be able to run Python programs directly as long as the Python
directory is in your PATH. To run a program from the command line, enter
python and the name of the source file (the main module), including the .py
extension.
python [Link]

4.3.2 Running on a Macintosh System


Macintosh systems often come with a version of Python already installed;
unfortunately, on recent systems, the version is Python 2 and not Python 3.
To determine which version has been installed for command-line use, first
bring up the Terminal application on your Macintosh system. You may need
to first click the Launchpad icon.
You should find yourself in your default directory, whatever it is. You can
determine which command-line version of Python you have by using the fol-
lowing command:
python -V
If the version of Python is 2.x, you’ll get a message such as the following:
Python 2.7.10
But if you’ve downloaded some version of Python 3, you should have that
version of Python loaded as well. However, to run it, you’ll have to use the
command python3 rather than python.
If you do have python3 loaded, you can verify the exact version from the
command line as follows:
python3 -V
Python 3.7.0
For example, if the file [Link] is in the current directory, and you want to
run it as a Python 3 program, then use the following command:
python3 [Link]
The Python command (whether python or python3) has some useful variations.
If you enter it with -h, the “help” flag, you get a printout on all the
possible flags that you can use with the command, as well as relevant
environment variables.
python3 -h

4.3.3 Using pip or pip3 to Download Packages


Some of the packages in this book require that you download and install the
packages from the Internet before you use those packages. The first chapter
that requires that is Chapter 12, which introduces the numpy package.
All the packages mentioned in this book are completely free of charge (as
most packages for Python are). Even better, the pip utility—which is included
with the Python 3 download—goes out and finds the package that you name;
thus all you should need is an Internet connection!
On Windows-based systems, use the following command to download and
install a desired package.
pip install package_name
The package name, incidentally, uses no file extension:
pip install numpy
On Macintosh systems, you may need to use the pip3 utility, which is
downloaded with Python 3 when you install it on your computer. (You may also
have inherited a version of pip, but it will likely be out-of-date and unusable.)
pip3 install package_name
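After installing, one way to confirm from inside Python that a package can be found is sketched below. It relies on importlib.util.find_spec from the standard library; the helper name is_installed is hypothetical and not part of pip or of the text.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None

print(is_installed('math'))     # True: math ships with Python
print(is_installed('numpy'))    # True only once numpy has been installed
```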

4.4 Writing and Using Doc Strings


Python doc strings enable you to leverage the work you do writing comments
to get free online help. That help is then available to you while running IDLE,
as well as from the command line, when you use the pydoc utility.
You can write doc strings for both functions and classes. Although this
book has not yet introduced how to write classes, the principles are the same.
Here’s an example with a function, showcasing a doc string.
def quad(a, b, c):
    '''Quadratic Formula function.

    This function applies the Quadratic Formula
    to determine the roots of x in a quadratic
    equation of the form ax^2 + bx + c = 0.
    '''

    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2
When this doc string is entered in a function definition, you can get help
from within IDLE:
>>> help(quad)
Help on function quad in module __main__:

quad(a, b, c)
    Quadratic Formula function.

    This function applies the Quadratic Formula
    to determine the roots of x in a quadratic
    equation of the form ax^2 + bx + c = 0.
The mechanics of writing a doc string follow a number of rules.

◗ The doc string itself must immediately follow the heading of the function.
◗ It must be a literal string utilizing the triple-quote feature. (You can actually
use any style of quotation mark, but you need triple quotes if you want the
string to span multiple lines.)
◗ The doc string must also be aligned with the “level-1” indentation under the
function heading: For example, if the statements immediately under the func-
tion heading are indented four spaces, then the beginning of the doc string
must also be indented four spaces.
◗ Subsequent lines of the doc string may be indented as you choose, because
the string is a literal string. You can place the subsequent lines flush left or
continue the indentation you began with the doc string. In either case, Python
online help will line up the text in a helpful way.

This last point needs some clarification. The doc string shown in the previ-
ous example could have been written this way:
def quad(a, b, c):
    '''Quadratic Formula function.

This function applies the Quadratic Formula
to determine the roots of x in a quadratic
equation of the form ax^2 + bx + c = 0.
'''

    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2
You might expect this doc string to produce the desired behavior—to print
help text that lines up—and you’d be right. But you can also put in extra
spaces so that the lines also align within program code. It might seem this
shouldn’t work, but it does.
For stylistic reasons, programmers are encouraged to write the doc string
this way, in which the subsequent lines in the quote line up with the beginning
of the quoted string instead of starting flush left in column 1:
def quad(a, b, c):
    '''Quadratic Formula function.

    This function applies the Quadratic Formula
    to determine the roots of x in a quadratic
    equation of the form ax^2 + bx + c = 0.
    '''
As part of the stylistic guidelines, it’s recommended that you put in a brief
summary of the function, followed by a blank line, followed by a more detailed
description.
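The claim that both layouts produce the same help text can be checked with a small sketch. It uses inspect.getdoc from the standard library, on the assumption that it performs the same indentation cleanup the help system displays. The names quad_flush and quad_indented are hypothetical stand-ins for the two layouts, with shortened doc strings.

```python
import inspect

def quad_flush(a, b, c):
    '''Quadratic Formula function.

This function applies the Quadratic Formula
to determine the roots of x.
'''

def quad_indented(a, b, c):
    '''Quadratic Formula function.

    This function applies the Quadratic Formula
    to determine the roots of x.
    '''

# getdoc strips the common leading indentation, so both layouts
# yield the same cleaned-up text.
print(inspect.getdoc(quad_indented))
print(inspect.getdoc(quad_flush) == inspect.getdoc(quad_indented))   # True
```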
When running Python from the command line, you can use the pydoc util-
ity to get this same online help shown earlier. For example, you could get help
on the module named [Link]. The pydoc utility responds by printing a
help summary for every function. Note that the .py extension is not entered
as part of the module name in this case.
python -m pydoc queens

4.5 Importing Packages


Later sections in this chapter, as well as later chapters in the book, make use of
packages to extend the capabilities of the Python language.
A package is essentially a software library of objects and functions that
perform services. Packages come in two varieties:

◗ Packages included with the Python download itself. This includes math, random,
sys, os, time, datetime, and [Link]. These packages are especially conve-
nient, because no additional downloading is necessary.
◗ Packages you can download from the Internet.


The syntax shown here is the recommended way to import a package. There
are a few variations on this syntax, as we’ll show later.

import package_name
For example:
import math
Once a package is imported, you can, within IDLE, get help on its contents.
Here’s an example:
>>> import math
>>> help(math)
If you type these commands from within IDLE, you’ll see that the math
package supports a great many functions.
But with this approach, each of the functions needs to be qualified using
the dot (.) syntax. For example, one of the functions supported is sqrt (square
root), which takes an integer or floating-point input.
>>> [Link](2)
1.4142135623730951
You can use the math package, if you choose, to calculate the value of pi.
However, the math package also provides this number directly.
>>> [Link](1) * 4
3.141592653589793
>>> [Link]
3.141592653589793
Let’s look at one of the variations on the import statement.
import package_name [as new_name]
In this syntax, the brackets indicate that the as new_name clause is optional.
You can use it, if you choose, to give the package another name, or alias, that
is referred to in your source file.
This feature provides short names if the full package name is long. For
example, Chapter 13 introduces the [Link] package.
import [Link] as plt
Now, do you want to use the prefix [Link], or do you want
to prefix a function name with plt? Good. We thought so.
Python supports other forms of syntax for the import statement. With
both of these approaches, the need to use the package name and the dot syn-
tax is removed.

from package_name import symbol_name
from package_name import *
In the first form of this syntax, only the symbol_name gets imported, and
not the rest of the package. But the specified symbol (such as pi in this next
example) can then be referred to without qualification.
>>> from math import pi
>>> print(pi)
3.141592653589793
This approach imports only one symbol—or a series of symbols sepa-
rated by commas—but it enables the symbolic name to be used more directly.
To import an entire package, while also gaining the ability to refer to all its
objects and functions directly, use the last form of the syntax, which includes
an asterisk (*).
>>> from math import *
>>> print(pi)
3.141592653589793
>>> print(sqrt(2))
1.4142135623730951
The drawback of using this version of import is that with very large and
complex programs, it gets difficult to keep track of all the names you’re using,
and when you import packages without requiring a package-name qualifier,
name conflicts can arise.
So, unless you know what you’re doing or are importing a really small pack-
age, it’s more advisable to import specific symbols than use the asterisk (*).
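One concrete name conflict can be found without performing the asterisk import at all. The sketch below lists the public names in math and checks which of them would hide a built-in of the same name. It assumes that math defines no __all__ list, so that the asterisk form imports every name without a leading underscore; the variable names are illustrative.

```python
import builtins
import math

# Public names in math: roughly what "from math import *" would bring in.
star_names = [name for name in dir(math) if not name.startswith('_')]

# Of those, the names that would hide a built-in of the same name.
shadowed = [name for name in star_names if hasattr(builtins, name)]
print(shadowed)

print(pow(2, 3))         # built-in pow keeps integers: 8
print(math.pow(2, 3))    # math.pow always returns a float: 8.0
```

After an asterisk import from math, an unqualified call to pow would quietly start returning floats, which is the kind of surprise that importing specific symbols avoids.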

4.6 A Guided Tour of Python Packages


Thousands of other packages are available if you go to [Link], and they
are all free to use. The group of packages in Table 4.1 is among the most use-
ful of all packages available for use with Python, so you should be sure to look
them over.
The re, math, random, array, decimal, and fractions packages are all
included with the standard Python 3 download, so you don’t need to down-
load them separately.
The numpy, matplotlib, and pandas packages need to be installed separately
by using the pip or pip3 utility. Later chapters, starting with Chapter 12,
cover those packages in depth.


Table 4.1. Python Packages Covered in This Book

NAME TO IMPORT   DESCRIPTION

re               Regular-expression package. This package lets you create text
                 patterns that can match many different words, phrases, or
                 sentences. This pattern-specification language can do
                 sophisticated searches with high efficiency.
                 This package is so important that it’s explored in both
                 Chapters 6 and 7.

math             Math package. Contains helpful and standard math functions so
                 that you don’t have to write them yourself. These include
                 trigonometric, hyperbolic, exponential, and logarithmic
                 functions, as well as the constants e and pi.
                 This package is explored in Chapter 11.

random           A set of functions for producing pseudo-random values.
                 Pseudo-random numbers behave as if random—meaning, among
                 other things, it’s a practical impossibility for a user to
                 predict them.
                 This random-number generation package includes the ability to
                 produce random integers from a requested range, as well as
                 floating-point numbers and normal distributions. The latter
                 cluster around a mean value to form a “bell curve” of
                 frequencies.
                 This package is explored in Chapter 11.

decimal          This package supports the Decimal data type, which (unlike
                 the float type) enables you to represent dollars-and-cents
                 figures precisely without any possibility of rounding errors.
                 Decimal is often preferred for use in accounting and
                 financial applications.
                 This package is explored in Chapter 10.

fractions        This package supports the Fraction data type, which stores
                 any fractional number with absolute precision, provided it
                 can be represented as the ratio of two integers. So, for
                 example, this data type can represent the ratio 1/3
                 absolutely, something that neither the float nor Decimal type
                 can do without rounding errors.
                 This package is explored in Chapter 10.

array            This package supports the array class, which differs from
                 lists in that it holds raw data in contiguous storage. This
                 isn’t always faster, but sometimes it’s necessary to pack
                 your data into contiguous storage so as to interact with
                 other processes. However, the benefits of this package are
                 far exceeded by the numpy package, which gives you the same
                 ability, but much more.
                 This package is briefly covered in Chapter 12.

numpy            This package supports the numpy (numeric Python) class, which
                 in turn supports high-speed batch operations on one-, two-,
                 and higher-dimensional arrays. The class is useful not only
                 in itself, as a way of supercharging programs that handle
                 large amounts of data, but also as the basis for work with
                 other classes.
                 This package is explored in Chapters 12 and 13. numpy needs
                 to be installed with pip or pip3.

[Link]           Similar to random, but designed especially for use with
                 numpy, and ideally suited to situations in which you need to
                 generate a large quantity of random numbers quickly. In
                 head-to-head tests with the standard random class, the numpy
                 random class is several times faster when you need to create
                 an array of such numbers.
                 This package is also explored in Chapter 12.

[Link]           This package supports sophisticated plotting routines for
                 Python. Using these routines, you can create beautiful
                 looking charts and figures—even three-dimensional ones.
                 This package is explored in Chapter 13. It needs to be
                 installed with pip or pip3.

pandas           This package supports data frames, which are tables that can
                 hold a variety of information, as well as routines for going
                 out and grabbing information from the Internet and loading
                 it. Such information can then be combined with the numpy and
                 plotting routines to create impressive-looking graphs.
                 This package is explored in Chapter 15. It also needs to be
                 downloaded.

4.7 Functions as First-Class Objects


Another productivity tool—which may be useful in debugging, profiling, and
related tasks—is to treat Python functions as first-class objects. That means
taking advantage of how you can get information about a function at run
time. For example, suppose you’ve defined a function called avg.
def avg(a_list):
    '''This function finds the average val in a list.'''
    x = (sum(a_list) / len(a_list))
    print('The average is:', x)
    return x
The name avg is a symbolic name that refers to a function, which in Python
lingo is also a callable. There are a number of things you can do with avg,
such as verify its type, which is function. Here’s an example:
>>> type(avg)
<class 'function'>
We already know that avg names a function, so this is not new information.
But one of the interesting things you can do with an object is assign it to a
new name. You can also assign a different function altogether to the symbolic
name, avg.
def new_func(a_list):
    return (sum(a_list) / len(a_list))

old_avg = avg
avg = new_func
The symbolic name old_avg now refers to the older, and longer, function
we defined before. The symbolic name avg now refers to the newer function just
defined.
We can call old_avg just as we used to call avg.
>>> old_avg([4, 6])
The average is: 5.0
5.0
The next function shown (which we might loosely term a “metafunction,”
although it’s really quite ordinary) prints information about another function—
specifically, the function argument passed to it.
def func_info(func):
    print('Function name:', func.__name__)
    print('Function documentation:')
    help(func)
If we run this function on old_avg, which now refers to the first averaging
function defined in this section, we get this result:
Function name: avg
Function documentation:
Help on function avg in module __main__:

avg(a_list)
    This function finds the average val in a list.
We’re currently using the symbolic name old_avg to refer to the first func-
tion that was defined in this section. Notice that when we get the function’s
name, the information printed uses the name that the function was originally
defined with.
All of these operations will become important when we get to the topic of
“decorating” in Section 4.9, “Decorators and Function Profilers.”
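As a further sketch of what treating functions as objects allows, the following stores several functions in a dictionary and calls each one through its entry. The names operations, data, and alias are illustrative only, and avg is redefined here without the print call so that the example is self-contained.

```python
def avg(a_list):
    '''Return the average value in a list.'''
    return sum(a_list) / len(a_list)

# Functions are objects, so they can be stored in a dictionary...
operations = {'avg': avg, 'max': max, 'min': min}
data = [4, 6, 11]
for label, func in operations.items():
    print(label, func(data))

# ...and bound to a second name, while keeping their original identity.
alias = avg
print(alias is avg)       # True
print(alias.__name__)     # avg
print(alias.__doc__)      # Return the average value in a list.
```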


4.8 Variable-Length Argument Lists


One of the most versatile features of Python is the ability to access variable-
length argument lists. With this capability, your functions can, if you choose,
handle any number of arguments—much as the built-in print function does.
The variable-length argument ability extends to the use of named argu-
ments, also called “keyword arguments.”

4.8.1 The *args List


The *args syntax can be used to access argument lists of any length.
def func_name([ordinary_args,] *args):
    statements

The brackets are used in this case to show that *args may optionally be
preceded by any number of ordinary positional arguments, represented here
as ordinary_args. The use of such arguments is always optional.
In this syntax, the name args can actually be any symbolic name you want.
By convention, Python programs use the name args for this purpose.
The symbolic name args is then interpreted as a Python tuple, which you can
treat much like a list; you access its items by indexing it or using it in a
for loop. You can also take its length as needed. Here’s an example:
def my_var_func(*args):
    print('The number of args is', len(args))
    for item in args:
        print(item)
This function, my_var_func, can be used with argument lists of any length.
>>> my_var_func(10, 20, 30, 40)
The number of args is 4
10
20
30
40
A more useful function would be one that took any number of numeric
arguments and returned the average. Here’s an easy way to write that function.
def avg(*args):
    return sum(args)/len(args)


Now we can call the function with a different number of arguments each
time.
>>> avg(11, 22, 33)
22.0
>>> avg(1, 2)
1.5
The advantage of writing the function this way is that no brackets are
needed when you call this function. The arguments are interpreted as if they
were elements of a list, but you pass these arguments without list syntax.
What about the ordinary arguments we mentioned earlier? Additional
arguments, not included in the list *args, must either precede *args in the
argument list or be keyword arguments.
For example, let’s revisit the avg example. Suppose we want a separate
argument that specifies what units we’re using. Because units is not a key-
word argument, it must appear at the beginning of the list, in front of *args.
def avg(units, *args):
    print(sum(args)/len(args), units)
Here’s a sample use:
>>> avg('inches', 11, 22, 33)
22.0 inches
This function is valid because the ordinary argument, units, precedes the
argument list, *args.

Note Ë The asterisk (*) has a number of uses in Python. In this context, it’s
called the splat or the positional expansion operator. Its basic use is to rep-
resent an “unpacked list”; more specifically, it replaces a list with a simple
sequence of separate items.
The limitation on such an entity as *args is that there isn’t much you can
do with it. One thing you can do (which will be important in Section 4.9,
“Decorators and Function Profilers”) is pass it along to a function. Here’s an
example:
>>> ls = [1, 2, 3] # Ordinary (packed) list.
>>> print(*ls) # Print unpacked version
1 2 3
>>> print(ls) # Print packed (ordinary list).
[1, 2, 3]

The other thing you can do with *args or *ls is to pack it (or rather,
repack it) into a standard Python sequence; you do that by dropping the
asterisk. At that point, ls is an ordinary list and args is an ordinary tuple,
and each can be manipulated with all the standard abilities of its type.
Ç Note
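The packing and unpacking described in this note can be sketched as follows. The function names total and report are hypothetical; the point is only that the asterisk collects arguments in a definition and spreads them out again in a call.

```python
def total(*args):
    # Inside the function, args is a tuple of the positional arguments.
    print(type(args).__name__, args)
    return sum(args)

def report(label, *args):
    # Pass the collected arguments along by unpacking them again.
    return label + ': ' + str(total(*args))

nums = [1, 2, 3]
print(total(*nums))              # same call as total(1, 2, 3); prints 6
print(report('sum', 10, 20))     # sum: 30
```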

4.8.2 The “**kwargs” List


The more complete syntax supports keyword arguments, which are named
arguments during a function call. For example, in the following call to the
print function, the end and sep arguments are named.
print(10, 20, 30, end='.', sep=',')
The more complete function syntax recognizes both unnamed and named
arguments.

def func_name([ordinary_args,] *args, **kwargs):
    statements
As with the symbolic name args, the symbolic name kwargs can actually
be any name, but by convention, Python programmers use kwargs.
Within the function definition, kwargs refers to a dictionary in which each
key is a string containing the name of a named argument, and the corresponding
value is the argument value passed.
An example should clarify. Assume you define a function as follows:
def pr_named_vals(**kwargs):
    for k in kwargs:
        print(k, ':', kwargs[k])
This function cycles through the dictionary represented by kwargs, printing
both the key values (corresponding to argument names) and the correspond-
ing values, which have been passed to the arguments.
For example:
>>> pr_named_vals(a=10, b=20, c=30)
a : 10
b : 20
c : 30
A function definition may combine any number of named arguments,
referred to by kwargs, with any number of arguments that are not named,
referred to by args. The following example defines such a function and then
calls it.
def pr_vals_2(*args, **kwargs):
    for i in args:
        print(i)
    for k in kwargs:
        print(k, ':', kwargs[k])

pr_vals_2(1, 2, 3, -4, a=100, b=200)


This miniprogram, when run as a script, prints the following:
1
2
3
-4
a : 100
b : 200

Note Ë Although args and kwargs are collected into a tuple and a dictionary,
respectively, these symbols can be passed along to another function, as shown
in the next section.
Ç Note
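A minimal sketch of passing both collections along to another function follows. The names describe, forward, and options are hypothetical; the double asterisk in a call spreads a dictionary back out into named arguments.

```python
def describe(*args, **kwargs):
    # Return what was received so the caller can inspect it.
    return args, kwargs

def forward(*args, **kwargs):
    # Unpack both collections so the callee sees the original call.
    return describe(*args, **kwargs)

print(forward(1, 2, a=100))      # ((1, 2), {'a': 100})

# A dictionary can also supply named arguments to a built-in function.
options = {'sep': ',', 'end': '.\n'}
print(10, 20, 30, **options)     # same as print(10, 20, 30, sep=',', end='.\n')
```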

4.9 Decorators and Function Profilers


When you start refining your Python programs, one of the most useful things
to do is to time how fast individual functions run. You might want to know
how many seconds and fractions of a second elapse while your program exe-
cutes a function generating a thousand random numbers.
Decorated functions can profile the speed of your code, as well as provide
other information, because functions are first-class objects. Central to the
concept of decoration is a wrapper function, which does everything the origi-
nal function does but also adds other statements to be executed.
Here’s an example, illustrated by Figure 4.3. The decorator takes a func-
tion F1 as input and returns another function, F2, as output. This second
function, F2, is produced by including a call to F1 but adding other statements
as well. F2 is a wrapper function.


F1():           Original function, to be wrapped.
Decorator(f):   Create a wrapped version of f and return it.
F2():           Wrapped version of F1.

F1 = Decorator(F1)

F2 now replaces F1, so that the name F1 refers to the wrapped version, F2.

Figure 4.3. How decorators work (high-level view)

Here’s an example of a decorator function that takes a function as argu-
ment and wraps it by adding calls to the [Link] function. Note that time
is a package, and it must be imported before [Link] is called.
import time

def make_timer(func):
    def wrapper():
        t1 = [Link]()
        ret_val = func()
        t2 = [Link]()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper
There are several functions involved with this simple example (which, by
the way, is not yet complete!), so let’s review.

◗ There is a function to be given as input; let’s call this the original function (F1
in this case). We’d like to be able to input any function we want, and have it
decorated—that is, acquire some additional statements.
◗ The wrapper function is the result of adding these additional statements to
the original function. In this case, these added statements report the number
of seconds the original function took to execute.
◗ The decorator is the function that performs the work of creating the wrapper
function and returning it. The decorator is able to do this because it internally
uses the def keyword to define a new function.


◗ Ultimately, the wrapped version is intended to replace the original version, as
you’ll see in this section. This is done by reassigning the function name.

If you look at this decorator function, you should notice it has an important
omission: The arguments to the original function, func, are ignored. The wrap-
per function, as a result, will not correctly call func if arguments are involved.
The solution involves the *args and **kwargs language features, intro-
duced in the previous section. Here’s the full decorator:
import time

def make_timer(func):
    def wrapper(*args, **kwargs):
        t1 = [Link]()
        ret_val = func(*args, **kwargs)
        t2 = [Link]()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper
The new function, remember, will be wrapper. It is wrapper (or rather, the
function temporarily named wrapper) that will eventually be called in place
of func; this wrapper function therefore must be able to take any number of
arguments, including any number of keyword arguments. The correct action
is to pass along all these arguments to the original function, func. Here’s how:
ret_val = func(*args, **kwargs)
Returning a value is also handled here; the wrapper returns the same value
as func, as it should. What if func returns no value? That’s not a problem,
because Python functions return None by default. So the value None, in that
case, is simply passed along. (You don’t have to test for the existence of a
return value; there always is one!)
Having defined this decorator, make_timer, we can take any function and
produce a wrapped version of it. Then—and this is almost the final trick—
we reassign the function name so that it refers to the wrapped version of the
function.
def count_nums(n):
    for i in range(n):
        for j in range(1000):
            pass

count_nums = make_timer(count_nums)

The wrapper function produced by make_timer is defined as follows
(where the identifier func refers to the original count_nums function).
def wrapper(*args, **kwargs):
    t1 = [Link]()
    ret_val = func(*args, **kwargs)
    t2 = [Link]()
    print('Time elapsed was', t2 - t1)
    return ret_val
We now reassign the name count_nums so that it refers to this function—
wrapper—which will call the original count_nums function but also does
other things.
Confused yet? Admittedly, it’s a brain twister at first. But all that’s going on is
that (1) a more elaborate version of the original function is being created at run
time, and (2) this more elaborate version is what the name, count_nums, will
hereafter refer to. Python symbols can refer to any object, including functions
(callable objects). Therefore, we can reassign function names all we want.
count_nums = wrapper
Or, more accurately,
count_nums = make_timer(count_nums)
So now, when you run count_nums (which now refers to the wrapped ver-
sion of the function), you’ll get output like this, reporting execution time in
seconds.
>>> count_nums(33000)
Time elapsed was 1.063697338104248
The original version of count_nums did nothing except do some count-
ing; this wrapped version reports the passage of time in addition to calling the
original version of count_nums.
As a final step, Python provides a small but convenient bit of syntax to
automate the reassignment of the function name.
@decorator
def func(args):
    statements
This syntax is translated into the following:
def func(args):
    statements
func = decorator(func)


In either case, it’s assumed that decorator is a function that has already
been defined. This decorator must take a function as its argument and return
a wrapped version of the function. Assuming all this has been done correctly,
here’s a complete example utilizing the @ sign.
@make_timer
def count_nums(n):
    for i in range(n):
        for j in range(1000):
            pass
After this definition is executed by Python, count_nums can then be called,
and it will execute count_nums as defined, but it will also add (as part of the
wrapper) a print statement telling the number of elapsed seconds.
Remember that this part of the trick (the final trick, actually) is to get the
name count_nums to refer to the new version of count_nums, after the new
statements have been added through the process of decoration.
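Putting the pieces of this section together gives a short, self-contained sketch. The make_timer body follows the wrapper shown earlier. Two things here are assumptions added for illustration and are not part of the text above: the use of functools.wraps, which keeps the decorated function's original name and docstring, and a count_nums that returns a total so that the pass-through of the return value is visible.

```python
import functools
import time

def make_timer(func):
    """Return a wrapped version of func that reports elapsed time."""
    @functools.wraps(func)   # assumption: keep func's name and docstring
    def wrapper(*args, **kwargs):
        t1 = time.time()
        ret_val = func(*args, **kwargs)
        t2 = time.time()
        print('Time elapsed was', t2 - t1)
        return ret_val       # pass the original return value through
    return wrapper

@make_timer                  # same as: count_nums = make_timer(count_nums)
def count_nums(n):
    """Count n * 1000 iterations and return how many were performed."""
    total = 0
    for i in range(n):
        for j in range(1000):
            total += 1
    return total

result = count_nums(100)     # prints the elapsed time, then returns
print(result)
```

The elapsed time printed will vary from run to run and from machine to machine.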

4.10 Generators
There’s no subject in Python about which more confusion abounds than gen-
erators. It’s not a difficult feature once you understand it. Explaining it’s the
hard part.
But first, what does a generator do? The answer: It enables you to deal with
a sequence one element at a time.
Suppose you need to deal with a sequence of elements that would take a
long time to produce if you had to store it all in memory at the same time. For
example, you want to examine all the Fibonacci numbers up to 10 to the 50th
power. It would take a lot of time and space to calculate the entire sequence.
Or you may want to deal with an infinite sequence, such as all even numbers.
The advantage of a generator is that it enables you to deal with one member
of a sequence at a time. This creates a kind of “virtual sequence.”
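As a small preview of the idea, here is a sketch of an endless sequence of even numbers from which only the first five members are taken. The name make_all_evens_gen is hypothetical, and the yield syntax it relies on is explained in the sections that follow. The islice function from the standard itertools module is used here to stop after five values.

```python
from itertools import islice

def make_all_evens_gen():
    """Hypothetical generator function: yields 2, 4, 6, ... without end."""
    n = 2
    while True:
        yield n      # hand back one value, then pause until asked again
        n += 2

# Only five values are ever computed, even though the sequence is endless.
first_five = list(islice(make_all_evens_gen(), 5))
print(first_five)
```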

4.10.1 What’s an Iterator?


One of the central concepts in Python is that of iterator (sometimes confused
with iterable). An iterator is an object that produces a stream of values, one at
a time.

All lists can be iterated, but not all iterators are lists. There are many func-
tions, such as reversed, that produce iterators that are not lists. These cannot
be indexed or printed in a useful way, at least not directly. Here’s an example:
>>> iter1 = reversed([1, 2, 3, 4])
>>> print(iter1)
<list_reverseiterator object at 0x1111d7f28>
However, you can convert an iterator to a list and then print it, index it, or
slice it:
>>> print(list(iter1))
[4, 3, 2, 1]
Iterators in Python work with for statements. For example, because iter1
is an iterator, the following lines of code work perfectly well.

>>> iter1 = reversed([1, 2, 3, 4])
>>> for i in iter1:
        print(i, end=' ')

4 3 2 1
Iterators have state information; after reaching the end of its series, an iter-
ator is exhausted. If we used iter1 again without resetting it, it would produce
no more values.
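The exhaustion described above can be observed directly. The following sketch converts the same iterator to a list twice; the variable names first_pass and second_pass are chosen only for illustration.

```python
iter1 = reversed([1, 2, 3, 4])

first_pass = list(iter1)     # consumes every value the iterator has
second_pass = list(iter1)    # the iterator is exhausted, so nothing is left

print(first_pass)
print(second_pass)

# "Resetting" really means creating a fresh iterator.
iter1 = reversed([1, 2, 3, 4])
first_value = next(iter1)
print(first_value)
```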

4.10.2 Introducing Generators


A generator is one of the easiest ways to produce an iterator. But the generator
function is not itself an iterator. Here’s the basic procedure.

1. Write a generator function. You do this by using a yield statement anywhere
in the definition.
2. Call the function you completed in step 1 to get an iterator object.
3. The iterator created in step 2 is what yields values in response to the next
function. This object contains state information; to start over, you call the
generator function again to get a new iterator.

Figure 4.4 illustrates the process.


[Figure 4.4. Returning a generator from a function. Banner: “A generator
function is really a generator factory!” Left: the generator function,
make_gen(m), whose body sets n = 1 and then, while n < m, yields n and adds 1
to n. Right: the generator object (gen_obj) that the function returns, an
iterator object that holds state information. When next(gen_obj) is called, the
generator object yields n.]

Here’s what almost everybody gets wrong when trying to explain this pro-
cess: It looks as if the yield statement, placed in the generator function (the
thing on the left in Figure 4.4), is doing the yielding. That’s “sort of” true, but
it’s not really what’s going on.
The generator function defines the behavior of the iterator. But the iterator
object, the thing to its right in Figure 4.4, is what actually executes this behavior.
When you include one or more yield statements in a function, the func-
tion is no longer an ordinary Python function; yield describes a behavior in
which the function does not return a value but sends a value back to the caller
of next. State information is saved, so when next is called again, the iterator
advances to the next value in the series without starting over. This part, every-
one seems to understand.
But—and this is where people get confused—it isn’t the generator function
that performs these actions, even though that’s where the behavior is defined.
Fortunately, you don’t need to understand it; you just need to use it. Let’s start
with a function that prints even numbers from 2 to 10:
def print_evens():
    for n in range(2, 11, 2):
        print(n)
Now replace print(n) with the statement yield n. Doing so changes the
nature of what the function does. While we’re at it, let’s change the name to
make_evens_gen to have a more accurate description.

def make_evens_gen():
    for n in range(2, 11, 2):
        yield n
The first thing you might say is “This function no longer returns anything;
instead, it yields the value n, suspending its execution and saving its internal state.”
But this revised function, make_evens_gen, does indeed have a return
value! As shown in Figure 4.4, the value returned is not n; the return value is
an iterator object, also called a “generator object.” Look what happens if you
call make_evens_gen and examine the return value.
>>> make_evens_gen()
<generator object make_evens_gen at 0x1068bd410>
What did the function do? Yield a value for n? No! Instead, it returned an
iterator object, and that’s the object that yields a value. We can save the
iterator object (or generator object) and then pass it to next.
>>> my_gen = make_evens_gen()
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
Eventually, calling next exhausts the series, and a StopIteration excep-
tion is raised. But what if you want to reset the sequence of values to the begin-
ning? Easy. You can do that by calling make_evens_gen again, producing a
new instance of the iterator. This has the effect of starting over.
>>> my_gen = make_evens_gen() # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
>>> my_gen = make_evens_gen() # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
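The StopIteration behavior mentioned above can be seen by calling next until the series runs out. This sketch repeats the make_evens_gen definition so that it runs on its own. The second argument to next, a default returned in place of raising the exception, is a feature of the built-in next function that the text above does not cover.

```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n

my_gen = make_evens_gen()
values = []
while True:
    try:
        values.append(next(my_gen))
    except StopIteration:      # raised once the series is exhausted
        break
print(values)

# With a default, next returns it instead of raising StopIteration.
leftover = next(my_gen, None)
print(leftover)
```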


What happens if you call make_evens_gen every time? In that case, you
keep starting over, because each time you’re creating a new generator object.
This is most certainly not what you want.
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
Generators can be used in for statements, and that’s one of the most fre-
quent uses. For example, we can call make_evens_gen as follows:
for i in make_evens_gen():
    print(i, end=' ')
This block of code produces the result you’d expect:
2 4 6 8 10
But let’s take a look at what’s really happening. The for block calls make_
evens_gen one time. The result of the call is to get a generator object. That
object then provides the values in the for loop. The same effect is achieved by
the following code, which breaks the function call onto an earlier line.
>>> my_gen = make_evens_gen()
>>> for i in my_gen:
        print(i, end=' ')
Remember that my_gen is an iterator object. If you instead referred to
make_evens_gen directly, Python would raise an exception.
for i in make_evens_gen:    # ERROR! Not an iterable!
    print(i, end=' ')
Once you understand that the object returned by the generator function
is the generator object, also called the iterator, you can use it anywhere an
iterable or iterator is accepted in the syntax. For example, you can
convert a generator object to a list, as follows.
>>> my_gen = make_evens_gen()
>>> a_list = list(my_gen)
>>> a_list
[2, 4, 6, 8, 10]

>>> a_list = list(my_gen) # Oops! No reset!
>>> a_list
[]
The problem with the last few statements in this example is that each
time you iterate through a sequence using a generator object, the iteration is
exhausted and needs to be reset.
>>> my_gen = make_evens_gen() # Reset!
>>> a_list = list(my_gen)
>>> a_list
[2, 4, 6, 8, 10]
You can of course combine the function call and the list conversion. The
list itself is stable and (unlike a generator object) will retain its values.
>>> a_list = list(make_evens_gen())
>>> a_list
[2, 4, 6, 8, 10]
One of the most practical uses of an iterator is with the in and not in
keywords. We can, for example, write a generator function whose iterator
produces the Fibonacci numbers less than or equal to n.
def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a
The yield statement changes this function from an ordinary function to
a generator function, so it returns a generator object (iterator). We can now
determine whether a number is a Fibonacci by using the following test:
n = int(input('Enter number: '))
if n in make_fibo_gen(n):
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
This example works because the iterator produced does not yield an infinite
sequence, something that would cause a problem. Instead, the iterator termi-
nates if n is reached without being confirmed as a Fibonacci.
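To see why the test terminates, it may help to watch how the in keyword consumes a generator object. The sketch below repeats make_fibo_gen so that it is self-contained; the helper is_fibo is a hypothetical name introduced only for this illustration.

```python
def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a

def is_fibo(n):
    """Hypothetical helper: True if n is produced by make_fibo_gen(n)."""
    return n in make_fibo_gen(n)

# The generator stops once its values exceed n, so every test finishes.
fibs = [k for k in range(1, 30) if is_fibo(k)]
print(fibs)

# The in test stops at the first match and leaves the rest unconsumed.
gen = make_fibo_gen(100)
found = 5 in gen
after_match = next(gen)
print(found, after_match)
```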
Remember (and we state this one last time) that by putting yield into the
function make_fibo_gen, it becomes a generator function and it returns the
generator object we need. The previous example could have been written as
follows, so that the function call is made in a separate statement. The effect is
the same.
n = int(input('Enter number: '))
my_fibo_gen = make_fibo_gen(n)
if n in my_fibo_gen:
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
As always, remember that a generator function (which contains the yield
statement) is not a generator object at all, but rather a generator factory. This
is confusing, but you just have to get used to it. In any case, Figure 4.4 shows
what’s really going on, and you should refer to it often.
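The distinction between the factory and its product can also be checked in code. This is a minimal sketch using two functions from the standard inspect module, isgeneratorfunction and isgenerator; the variable names holding the results are chosen only for illustration.

```python
import inspect

def make_evens_gen():
    for n in range(2, 11, 2):
        yield n

my_gen = make_evens_gen()

# The function is a generator function (the factory) ...
func_is_factory = inspect.isgeneratorfunction(make_evens_gen)
# ... and the object it returns is the generator object (the iterator).
obj_is_generator = inspect.isgenerator(my_gen)

# The function itself cannot be iterated; only its return value can.
try:
    iter(make_evens_gen)
    func_is_iterable = True
except TypeError:
    func_is_iterable = False

print(func_is_factory, obj_is_generator, func_is_iterable)
```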

4.11 Accessing Command-Line Arguments


Running a program from the command line gives the program an extra
degree of flexibility. You can let the user specify command-line arguments;
these are optional arguments that give information directly to the program
on start-up. Alternatively, you can let the program prompt the user for the
information needed. But use of command-line arguments is typically more
efficient.
Command-line arguments are always stored in the form of strings. So—
just as with data returned by the input function—you may need to convert
this string data to numeric format.
To access command-line arguments from within a Python program, first
import the sys package.
import sys
You can then refer to the full set of command-line arguments, including the
program name itself, by referring to a list named argv.
argv        # If 'from sys import argv' was used
sys.argv    # If sys was imported with 'import sys'
In either case, argv refers to a list of command-line arguments, all stored
as strings. The first element in the list is always the name of the program
itself. That element is indexed as argv[0], because Python uses zero-based
indexing.

For example, suppose that you are running quad (a quadratic-equation
evaluator) and input the following command line:
python quad.py -1 -1 1
In this case, argv will be realized as a list of four strings.
Figure 4.5 illustrates how these strings are stored, emphasizing that the first
element, argv[0], refers to a string containing the program name.

len(argv) = 4

argv[0]      argv[1]   argv[2]   argv[3]
"quad.py"    "-1"      "-1"      "1"
(program name)

Figure 4.5. Command-line arguments and argv

In most cases, you’ll probably ignore the program name and focus on the
other arguments. For example, here is a program named [Link] that does
nothing but print all the arguments given to it, including the program name.
import sys
for thing in sys.argv:
    print(thing, end=' ')
Now suppose we enter this command line:
python [Link] arg1 arg2 arg3
The Terminal program (in Mac) or the DOS Box prints the following:
[Link] arg1 arg2 arg3
The following example gives a more sophisticated way to use these strings,
by converting them to floating-point format and passing the numbers to the
quad function.
import sys

def quad(a, b, c):
    '''Quadratic Formula function.'''

    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2

def main():
    '''Get argument values, convert, call quad.'''

    s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
    a, b, c = float(s1), float(s2), float(s3)
    x1, x2 = quad(a, b, c)
    print('x values: {}, {}.'.format(x1, x2))

main()
The interesting line here is this one:
s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
Again, the sys.argv list is zero-based, like any other Python list, but the
program name, referred to as sys.argv[0], typically isn’t used in the program
code. Presumably you already know what the name of your program is, so you
don’t need to look it up.
Of course, from within the program you can’t always be sure that argument
values were specified on the command line. If they were not specified, you
may want to provide an alternative, such as prompting the user for these same
values.
Remember that the length of the argument list is always N+1, where N
is the number of command-line arguments—beyond the program name, of
course.
Therefore, we could revise the previous example as follows:
import sys

def quad(a, b, c):
    '''Quadratic Formula function.'''

    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2

def main():
    '''Get argument values, convert, call quad.'''

    if len(sys.argv) > 3:
        s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
    else:
        s1 = input('Enter a: ')
        s2 = input('Enter b: ')
        s3 = input('Enter c: ')
    a, b, c = float(s1), float(s2), float(s3)
    x1, x2 = quad(a, b, c)
    print('x values: {}, {}.'.format(x1, x2))

main()
The key lines in this version are in the following if statement:
    if len(sys.argv) > 3:
        s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
    else:
        s1 = input('Enter a: ')
        s2 = input('Enter b: ')
        s3 = input('Enter c: ')
    a, b, c = float(s1), float(s2), float(s3)
If there are at least four elements in sys.argv (and therefore three
command-line arguments beyond the program name itself), the program uses
those strings. Otherwise, the program prompts for the values.
So, from the command line, you’ll be able to run the following:
python quad.py 1 -9 20
The program then prints these results:
x values: 5.0, 4.0.
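The program above still stops with an exception if an argument is not a valid number. One possible refinement, shown here only as a sketch, is to move the length check and the conversion into a helper that can be tested with any list of strings. The name parse_coefficients is hypothetical and is not part of the quad program.

```python
def parse_coefficients(argv):
    """Return (a, b, c) as floats, or None if argv has no usable values.

    Hypothetical helper: argv is expected to look like sys.argv, with the
    program name at index 0 and the three coefficients after it.
    """
    if len(argv) < 4:            # N arguments means a list of length N + 1
        return None
    try:
        return tuple(float(s) for s in argv[1:4])
    except ValueError:           # one of the strings was not a number
        return None

good = parse_coefficients(['quad.py', '1', '-9', '20'])
bad = parse_coefficients(['quad.py', '1', 'abc', '20'])
missing = parse_coefficients(['quad.py'])
print(good, bad, missing)

# In a real script you would call parse_coefficients(sys.argv) after
# 'import sys' and prompt the user whenever the result is None.
```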

Chapter 4 Summary
A large part of this chapter presented ways to improve your efficiency through
writing better and more efficient Python code. Beyond that, you can make your
Python programs run faster if you call the print function as rarely as possible
from within IDLE—or else run programs from the command line only.
A technique helpful in making your code more efficient is to profile it by
using the time and datetime packages to compute the relative speed of the
code, given different algorithms. Writing decorators is helpful in this respect,
because you can use them to profile function performance.

One of the best ways of supercharging your applications, in many cases, is
to use one of the many free packages available for use with Python. Some of
these are built in; others, like the numpy package, you’ll need to download.

Chapter 4 Questions for Review


1 Is an assignment operator such as += only a convenience? Can it actually result
in faster performance at run time?
2 In most computer languages, what is the minimum number of statements
you’d need to write instead of the Python statement a, b = a + b, a?
3 What’s the most efficient way to initialize a list of 100 integers to 0 in Python?
4 What’s the most efficient way of initializing a list of 99 integers with the pat-
tern 1, 2, 3 repeated? Show precisely how to do that, if possible.
5 If you’re running a Python program from within IDLE, describe how to most
efficiently print a multidimensional list.
6 Can you use list comprehension on a string? If so, how?
7 How can you get help on a user-written Python program from the command
line? From within IDLE?
8 Functions are said to be “first-class objects” in Python but not in most other
languages, such as C++ or Java. What is something you can do with a Python
function (callable object) that you cannot do in C or C++?
9 What’s the difference between a wrapper, a wrapped function, and a
decorator?
10 When a function is a generator function, what does it return, if anything?
11 From the standpoint of the Python language, what is the one change that
needs to be made to a function to turn it into a generator function?
12 Name at least one advantage of generators.

Chapter 4 Suggested Problems


1 Print a matrix of 20 × 20 stars or asterisks (*). From within IDLE,
demonstrate the slowest possible means of doing this task and the fastest possible
means. (Hint: Does the fastest way utilize string concatenation or the join
method?) Compare and contrast. Then use a decorator to profile the speeds of
the two ways of printing the asterisks.
2 Write a generator to print all the perfect squares of integers, up to a specified
limit. Then write a function to determine whether an integer argument is a
perfect square if it falls into this sequence—that is, if n is an integer argument,
the phrase n in square_iter(n) should yield True or False.

5 Formatting Text Precisely
When programming for business and professional use, you want to format
text to create beautiful-looking tables and presentations. In this area, Python
has an embarrassment of riches. It has several ways to modify and enhance the
printing of information in text-character form.
This chapter presents all three approaches in detail, beginning with the
string-formatting operator, %, which typically provides the quickest,
easiest solution. For the most complete control, you may want to use the format
function or format method, which support many options, even letting you
print large numbers with the thousands-place separator (,).

5.1 Formatting with the Percent Sign Operator (%)


Here’s a simple problem in formatting output. Suppose you want to print a
sentence in the following form, in which a, b, and c are currently equal to 25,
75, and 100, but they could have any values. You want to get the following
result by referring to the variables.
25 plus 75 equals 100.
This should be easy. But if you use the print function, it puts a space between
the number 100 and the dot (.), so you get the following:
25 plus 75 equals 100 .
What do you do about that unwanted space? The print function lets you turn
off the default placing of a space between print fields by setting the sep argument
to an empty string. But in that case, you have to put in all the spaces yourself.
print(a, ' plus ', b, ' equals ', c, '.', sep='')
This works, but it’s ugly.

A better approach is to use the str class formatting operator (%) to format
the output, using format specifiers like those used by the C-language “printf”
function. Here’s how you’d revise the example:
print('%d plus %d equals %d.' % (a, b, c))
Isn’t that better?
The expression (a, b, c) is actually a tuple containing three arguments,
each corresponding to a separate occurrence of %d within the format string.
The parentheses in (a, b, c) are strictly required—although they are not
required if there is only one argument.
>>> 'Here is a number: %d.' % 100
'Here is a number: 100.'
These elements can be broken up programmatically, of course. Here’s an
example:
n = 25 + 75
fmt_str = 'The sum is %d.'
print(fmt_str % n)
This example prints the following:
The sum is 100.
The string formatting operator, %, can appear in either of these two
versions.
format_str % value       # Single value
format_str % (values)    # One or more values
If there is more than one value argument, the arguments corresponding to
print fields (which are marked by a type character and a percent sign, %) must
be placed inside a tuple. Both of the following statements are valid:
print('n is %d' % n)
print('n is %d and m is %d' % (n, m))
The next example also works, because it organizes three numbers into a
tuple.
tup = 10, 20, 30
print('Answers are %d, %d, and %d.' % tup)
These statements print the following:
Answers are 10, 20, and 30.
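Because a tuple on the right side of % is always treated as the list of arguments, a tuple that you want to print as a single value needs to be wrapped in a one-element tuple. The following sketch illustrates this; the variable names are chosen only for this example.

```python
point = (3, 4)

# A two-element tuple supplies two arguments, but there is only one %s.
try:
    text = 'Point is %s' % point
except TypeError:
    text = None

# Wrapping the value in a one-element tuple passes it as a single argument.
wrapped = 'Point is %s' % (point,)

print(text)
print(wrapped)
```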


5.2 Percent Sign (%) Format Specifiers


The format specifier %d stands for decimal integer. It’s a common format, but
the formatting operator (%) works with other formats, as shown in Table 5.1.

Table 5.1. Percent-Sign Specifiers

SPECIFIER  MEANING                                                 EXAMPLE OF OUTPUT
%d         Decimal integer.                                        199
%i         Integer. Same meaning as %d.                            199
%s         Standard string representation of the input. This       Thomas
           field says, “Produce a string,” but it can be used
           to print the standard string representation of any
           data object. So this can actually be used with
           integers if you choose.
%r         Standard %r representation of the input, which is       'Bob'
           often the same as %s but uses the canonical
           representation of the object as it appears in Python
           code. (For more information, see Section 5.7,
           “‘Repr’ Versus String Conversion.”)
%x         Hexadecimal integer.                                    ff09a
%X         Same as %x, but letter digits A–F are uppercase.        FF09A
%o         Octal integer.                                          177
%u         Unsigned integer. (But note that this doesn’t           257
           reliably change signed integers into their unsigned
           equivalent, as you’d expect.)
%f         Floating-point number to be printed in fixed-point      3.1400
           format.
%F         Same as %f.                                             33.1400
%e         Floating-point number, printing exponent sign (e).      3.140000e+00
%E         Same as %e but uses uppercase E.                        3.140000E+00
%g         Floating point, using shortest canonical                7e-06
           representation.
%G         Same as %g but uses uppercase E if printing an          7E-06
           exponent.
%%         A literal percent sign (%).                             %

Here’s an example that uses the int conversion, along with hexadecimal
output, to add two hexadecimal numbers: e9 and 10.
h1 = int('e9', 16)
h2 = int('10', 16)
print('The result is %x.' % (h1 + h2))
The example prints
The result is f9.

Therefore, adding hexadecimal e9 and hexadecimal 10 produces hexadecimal
f9, which is correct.
The parentheses around h1 + h2 are necessary in this example. Otherwise,
the example creates a formatted string by using h1 as the data, and then
it attempts to concatenate that string with a number, h2, causing an error.
print('The result is %x.' % h1 + h2) # ERROR!
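The error arises because % binds more tightly than +, so the string is formatted first and the addition is attempted afterward. This can be confirmed with a short sketch that catches the resulting TypeError and then falls back to the parenthesized form.

```python
h1 = int('e9', 16)
h2 = int('10', 16)

try:
    # Parsed as ('The result is %x.' % h1) + h2, which is str + int.
    s = 'The result is %x.' % h1 + h2
    failed = False
except TypeError:
    failed = True
    s = 'The result is %x.' % (h1 + h2)   # add first, then format

print(failed)
print(s)
```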
When you’re printing a hexadecimal or octal number, the formatting
operator (%) puts no prefix in front of the number. If you want to print hexa-
decimal numbers with prefixes, you need to specify them yourself.
print('The result is 0x%x.' % (h1 + h2))
That statement prints the following:
The result is 0xf9.
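As an alternative to typing the prefix yourself, the formatting operator also accepts a # flag, which is not covered in the syntax summary of this section. It asks for the "alternate form" of the number, which for hexadecimal and octal output includes the prefix. A brief sketch:

```python
n = int('e9', 16) + int('10', 16)    # 249, or f9 in hexadecimal

hex_lower = '%#x' % n     # prefix 0x, lowercase digits
hex_upper = '%#X' % n     # prefix 0X, uppercase digits
octal = '%#o' % 8         # prefix 0o

print(hex_lower, hex_upper, octal)
```

Note that with %#X the prefix itself is printed in uppercase, which may or may not be what you want.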
Printing a substring (%s) inside a larger string is another common usage for
the formatting operator. Here’s an example:
s = 'We is %s, %s, & %s.' % ('Moe', 'Curly', 'Larry')
print(s)
This prints
We is Moe, Curly, & Larry.
The behavior of these formats can be altered by the use of width and pre-
cision numbers. Each print field has the format shown here, in which c rep-
resents one of the format characters in Table 5.1.
%[-][width][.precision]c
In this syntax, the square brackets indicate optional items and are not
intended literally. The minus sign (-) specifies left justification within the print
field. With the formatting operator (%), the default is right justification for all
data types.
But the following example uses left justification, which is not the default,
by including the minus sign (-) as part of the specifier.
>>> 'This is a number: %-6d.' % 255
'This is a number: 255   .'
As for the rest of the syntax, a format specifier can take any of the follow-
ing formats.
%c
%widthc
%width.precisionc
%.precisionc

In the case of string values, the text is placed into a print field of size width,
if specified. The substring is right justified (by default) and padded with
spaces. If the print field is smaller than the length of the substring, width is
ignored. The precision, if included, specifies a maximum size for the string,
which will be truncated if longer.
Here’s an example of the use of a 10-space print-field width.
print('My name is %10s.' % 'John')
This prints the following, including six spaces of padding.
My name is       John.
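The width, the minus sign, and the precision can be combined for strings. The following sketch puts each result inside square brackets so that the padding is visible; the sample name is chosen only for illustration.

```python
name = 'Johnathan'                 # nine characters

right = '[%12s]' % name            # right justified in 12 columns
left = '[%-12s]' % name            # left justified in 12 columns
cut = '[%.4s]' % name              # truncated to 4 characters
both = '[%8.4s]' % name            # truncated, then right justified in 8

for item in (right, left, cut, both):
    print(item)
```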
In the case of integers to be printed, the width number is interpreted in
the same way. But in addition, the precision specifies a smaller field, within
which the number is right justified and padded with leading zeros. Here’s an
example:
print('Amount is %10d.' % 25)
print('Amount is %.5d.' % 25)
print('Amount is %10.5d.' % 25)

These statements print
Amount is         25.
Amount is 00025.
Amount is      00025.
Finally, the width and precision fields control print-field width and pre-
cision in a floating-point number. The precision is the number of digits to the
right of the decimal point; this number contains trailing zeros if necessary.
Here’s an example:
print('result:%12.5f' % 3.14)
print('result:%12.5f' % 333.14)
These statements print the following:
result:     3.14000
result:   333.14000
In this case, the number 3.14 is padded with trailing zeros, because a pre-
cision of 5 digits was specified. When the precision field is smaller than the
precision of the value to be printed, the number is rounded up or down as
appropriate.
print('%.4f' % 3.141592)

This function call prints the following—in this case with 4 digits of preci-
sion, produced through rounding:
3.1416
Use of the %s and %r format characters enables you to work with any classes
of data. These specifiers result in the calling of one of the internal methods
from those classes supporting string representation of the class, as explained
in Chapter 9, “Classes and Magic Methods.”
In many cases, there’s no difference in effect between the %s and %r speci-
fiers. For example, either one, used with an int or float object, will result in
that number being translated into the string representation you’d expect.
You can see those results in the following IDLE session, in which user input
follows the >>> prompt.
>>> 'The number is %s.' % 10
'The number is 10.'
>>> 'The number is %r.' % 10
'The number is 10.'
From these examples, you can see that both the %s and the %r just print the
standard string representation of an integer.
In some cases, there is a difference between the string representation indi-
cated by %s and by %r. The latter is intended to get the canonical representa-
tion of the object as it appears in Python code.
One of the principal differences between the two forms of representation is
that the %r representation includes quotation marks around strings, whereas
%s does not.
>>> print('My name is %r.' % 'Sam')
My name is 'Sam'.
>>> print('My name is %s.' % 'Sam')
My name is Sam.
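The same difference shows up with classes you write yourself. In this sketch, the class Dog is hypothetical and exists only to show which method each specifier ends up calling: %s uses __str__, and %r uses __repr__.

```python
class Dog:
    """Hypothetical class used only to illustrate %s versus %r."""

    def __init__(self, name):
        self.name = name

    def __str__(self):                 # used by %s
        return self.name

    def __repr__(self):                # used by %r
        return 'Dog(%r)' % self.name

d = Dog('Rex')
s_form = 'My dog is %s.' % d
r_form = 'My dog is %r.' % d
print(s_form)
print(r_form)
```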

5.3 Percent Sign (%) Variable-Length Print Fields


After you’ve been using the format operator (%) for a while, you may won-
der whether there’s a way to create variable-length widths for print fields. For
example, you might want to print a table after determining the maximum
width needed, set this as the desired width (say, N = 6, where N is the maxi-
mum size needed), and then give every print field the same size.

Fortunately, the percent sign formatting (%) provides an easy way to do this.
To create a variable-width field, place an asterisk (*) where you’d normally
place an integer specifying a fixed width. Here’s an example:
>>> 'Here is a number: %*d' % (3, 6)
'Here is a number:   6'
Each asterisk used in this way creates the need for an extra argument. That
argument appears first, before the data object it’s being applied to. So the order
of the two arguments within the tuple is (1) print field width and (2) data to be
printed.
You can print other kinds of data, such as strings.
>>> 'Here is a number: %*s' % (3, 'VI')
'Here is a number:  VI'
Again, the first argument is the print-field width—in this case, 3. The sec-
ond argument is the data to be printed—in this case, the string 'VI'.
You can include multiple uses of a variable-width print field within a
format string. Remember that for each asterisk that appears in the format string,
there must be an additional argument. So if you want to format two such
data objects at once, you’d need to have four arguments altogether. Here’s an
example:
>>> 'Item 1: %*s, Item 2: %*s' % (8, 'Bob', 8, 'Suzanne')
'Item 1:      Bob, Item 2:  Suzanne'
The arguments, all placed in the tuple following the % operator (with
parentheses required, by the way), are 8, 'Bob', 8, and 'Suzanne'.
The meaning of these four arguments is as follows:

◗ The first print-field width is 8.
◗ The first data object to be printed is 'Bob' (that is, print the string as is).
◗ The second print-field width is 8.
◗ The second data object to be printed is 'Suzanne'.

As indicated earlier, the width can be a variable whose value is determined
at run time. Here’s an example:
>>> n = 8
>>> 'Item 1: %*s, Item 2: %*s' % (n, 'Bob', n, 'Suzanne')
'Item 1:      Bob, Item 2:  Suzanne'

All the arguments, including the field-width arguments (n in this example),
are placed in a tuple that follows the percent operator (%).
The variable-length width feature can be combined with other features. For
example, you can use the %r specifier instead of %s; this has no effect on num-
bers to be printed, but it causes strings to be printed with quotation marks.
>>> n = 9
>>> 'Item 1: %*r, Item 2: %*r' % (n, 'Bob', n, 'Suzanne')
"Item 1:     'Bob', Item 2: 'Suzanne'"
You can also create variable-length precision indicators. The general rule
with the format operator (%) is this:

✱ Where you’d normally put an integer as a formatting code, you can instead
place an asterisk (*); and for each such asterisk, you must place a correspond-
ing integer expression in the argument list.

For example, the following statement formats a number as if the specifier
were '%8.3f':
>>> '%*.*f' % (8, 3, 3.141592)
'   3.142'
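The table-printing scenario described at the start of this section can be sketched as follows. The sample data is invented for illustration, and the width n is computed at run time from the longest name. Note that the minus sign for left justification can be combined with the asterisk.

```python
names = ['Bob', 'Suzanne', 'Al']        # invented sample data
scores = [91.5, 78.25, 100]

n = max(len(name) for name in names)    # widest name sets the field width

lines = []
for name, score in zip(names, scores):
    # Arguments in order: width and data for %-*s, then width,
    # precision, and data for %*.*f.
    lines.append('%-*s %*.*f' % (n, name, 7, 2, score))

for line in lines:
    print(line)
```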

5.4 The Global “format” Function


Two closely related features of Python give you even greater control over
specifications. The global format function enables specification of one print
field. For example, it provides an easy way to add commas as thousands place
separators.
>>> big_n = 10 ** 12 # big_n is 10 to 12th power
>>> format(big_n, ',')
'1,000,000,000,000'
This is only a hint of what you can do with the format function. This section
provides only an introduction to this function’s capabilities. Section 5.8, “The
‘spec’ Field of the ‘format’ Function and Method,” describes other syntactic ele-
ments of a format specification (or spec) that you can use with this function.
The format function is closely related to the format method of the string
class (str).
When the format method processes a string, it analyzes format specifiers,
along with the data objects used as input. It carries out this analysis by calling
the global format function for each individual field.

The format function then calls the __format__ method for the data
object’s class, as explained in Chapter 9. This process has the virtue of letting
every type, including any new classes you might write, interact with all the
format specifier syntax, or choose to ignore it.
Figure 5.1 shows the flow of control between the various functions involved:
the format method of the string class, the global format function, and finally
the __format__ method within each class.
The class may or may not choose to handle this method directly. By default,
the __str__ method of that class is called if __format__ is not defined.
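To make this flow of control concrete, here is a minimal sketch of a class that supplies its own __format__ method. The Celsius class is hypothetical and exists only for illustration; it hands the spec to the float it stores and then appends a unit.

```python
class Celsius:
    """Hypothetical class, used only to illustrate __format__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # Delegate the spec to the float stored inside, then add a unit.
        return format(self.degrees, spec) + ' C'

t = Celsius(21.456)
print(format(t, '.1f'))            # global function: one print field
print('Now: {:.1f}'.format(t))     # format method: same spec after the colon
```

Both calls end up in Celsius.__format__, which is why they produce the same text for the temperature.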

[Figure 5.1. Flow of control between formatting routines: the format method
of the str class, which may have multiple print fields, calls the global
format function for each print field; that function in turn calls the
__format__ method for the class of the object being printed.]

Key Syntax

format(data, spec)
This function returns a string after evaluating the data and then formatting
according to the specification string, spec. The latter argument is a string
containing the specification for printing one item.
The syntax shown next provides a simplified view of spec grammar. It
omits some features such as the fill and align characters, as well as the use of 0
in right justifying and padding a number. For the complete syntax of spec,
see Section 5.8, “The ‘spec’ Field of the ‘format’ Function and Method.”
Key Syntax

[width][,][.precision][type]
In this syntax, the brackets are not intended literally but signify optional
items. Here is a summary of the meaning.


The function attempts to place the string representation of the data into a
print field at least width characters wide, justifying text if necessary by
padding with spaces. Numeric data is right justified by default; string data is
left justified by default.
The comma (,) indicates insertion of commas as thousands place separators.
This is legal only with numeric data; otherwise, an exception is raised.
The precision indicates the total number of digits to print with a
floating-point number, or, if the data is not numeric, a maximum length for
string data. It is not supported for use with integers. If the type is f, then
the precision indicates a fixed number of digits to print to the right of the
decimal point.
The type is sometimes a radix indicator, such as b or x (binary or
hexadecimal), but more often it is a floating-point specifier such as f, which
indicates fixed-point format, or e and g, as described later in Table 5.5.
Table 5.2 gives some examples of using this specification. You can figure
out most of the syntax by studying these examples.

Table 5.2. Sample Format Specifiers for the “format” Function

FORMAT
SPECIFICATION  MEANING
','            Displays thousands place separators as part of a number; for
               example, displaying 1000000 as 1,000,000.
'5'            Specifies a minimum print-field width of 5 characters. If the
               information to be displayed is smaller in size, it is justified
               by being padded. Numbers are right justified by default;
               strings are left justified by default.
'10'           Specifies a minimum print-field width of 10 characters. If the
               representation of the object is smaller than 10 characters, it
               is justified within a field that wide.
'10,'          Specifies a minimum print-field width of 10 characters and also
               displays thousands place separators.
'10.5'         Specifies a minimum print-field width of 10. If the data is a
               string, 5 characters is a print-field maximum and anything
               larger is truncated. If the data is floating point, the field
               displays at most 5 digits total to the left and right of the
               decimal point; rounding is performed, but if the display size
               still exceeds the space allowed, the number is displayed in
               exponential format, such as 3e+10. The precision field (5 in
               this case) is not valid for integers.
'8.4'          Same as above, but print-field width is 8 and precision is 4.
'10,.7'        Specifies a minimum print-field width of 10 and precision of 7
               (total number of digits to the left and right), and it displays
               thousands place separators.
'10.3f'        Fixed-point display. Uses a print-field width of 10 and
               displays exactly 3 digits to the right of the decimal point.
               Rounding up or down, or putting in trailing zeros, is performed
               as needed to make the number of digits come out exactly right.
'10.5f'        Uses a print-field width of 10 and displays exactly 5 digits to
               the right of the decimal point.
'.3f'          Displays exactly 3 digits to the right of the decimal point.
               There is no minimum width in this case.
'b'            Uses binary radix.
'6b'           Uses binary radix; right justifies numbers within a field of 6
               characters.
'x'            Uses hexadecimal radix.
'5x'           Uses hexadecimal radix; right justifies numbers within a field
               of 5 characters.
'o'            Uses octal radix.
'5o'           Uses octal radix; right justifies numbers within a field of 5
               characters.
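A quick way to get a feel for these specifiers is to apply several of them to the same value. The following sketch does that; the sample values 1234.5678 and 255 are arbitrary choices, not taken from the table.

```python
value = 1234.5678
for spec in (',', '12', '12,', '12.3f', '.3f'):
    # repr() adds quotation marks so that any padding is visible.
    print(repr(spec).ljust(8), repr(format(value, spec)))

n = 255
for spec in ('b', '6x', 'o'):
    # Radix specifiers apply to integers only.
    print(repr(spec).ljust(8), repr(format(n, spec)))
```

Running the loop side by side with Table 5.2 shows which part of each spec is responsible for padding, separators, rounding, or radix.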

The remainder of this section discusses the features in more detail, particu-
larly width and precision fields.

The thousands place separator is fairly self-explanatory but works only
with numbers. Python raises an exception if this specifier is used with data
that isn’t numeric.
You might use it to format a large number such as 150 million.
>>> n = 150000000
>>> print(format(n, ','))
150,000,000
The width specifier is used consistently, always specifying a minimum
print-field width. The string representation is padded (with spaces by
default) and uses a default of left justification for strings and right
justification for numbers. Both the padding character and justification can be
altered, however, as explained later in this chapter, in Section 5.8.2, “Text
Justification: ‘fill’ and ‘align’ Characters.”
Here are examples of justification, padding, and print fields. The single
quotation marks implicitly show the extent of the print fields. Remember that
numeric data (150 and 99, in this case) is right justified by default, but other
data is not.
>>> format('Bob', '10')
'Bob       '
>>> format('Suzie', '7')
'Suzie  '
>>> format(150, '8')
'     150'
>>> format(99, '5')
'   99'
The width is always a print-field minimum, not a maximum. A width field
does not cause truncation.
The precision specifier works differently, depending on the kind of data it’s
applied to. With string data, the precision is a print-field maximum, and it can
cause truncation. With floating-point fields, precision specifies the maximum
total number of digits to the left and right of the decimal point (not
counting the decimal point itself), and the value is rounded up or down as
needed. Here’s an example:
>>> format('Bobby K.', '6.3')
'Bob   '
>>> format(3.141592, '6.3')
'  3.14'
But if the f type specifier is also used, it specifies fixed-point display for-
mat, and that changes the rules for floating point. With fixed-point format,
the precision specifies the number of digits to the right of the decimal point,
unconditionally.
The format function uses rounding or padding with trailing zeros, as
needed, to achieve the fixed number of digits to the right of the decimal point.
Here’s an example:
>>> format(3.141592, '9.3f')
'    3.142'
>>> format(100.7, '9.3f')
'  100.700'
As you can see, the fixed-point format is useful for placing numbers in col-
umns in which the decimal point lines up nicely.
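The column effect is easy to see with a short loop. This is a minimal sketch; the list of prices is made-up sample data.

```python
prices = [3.5, 120.25, 0.75, 47.0]   # made-up sample data

for p in prices:
    # Width 10, exactly 2 digits to the right of the decimal point.
    print(format(p, '10.2f'))
```

Because every value gets the same width and the same number of digits after the decimal point, the decimal points fall in the same column on every line.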
As mentioned earlier, Section 5.8 discusses the complete syntax for spec,
which is used by both the global format function and the format method.

5.5 Introduction to the “format” Method


To get the most complete control over formatting, use the format method.
This technique contains all the power of the global format function but is
more flexible because of its ability to handle multiple print fields.

Let’s return to the example that started this chapter. Suppose you have
three integer variables (a, b, and c) and you want to print them in a sentence
that reads as follows:
25 plus 75 equals 100.
The format method provides a smooth, readable way to produce this print
string.
print('{} plus {} equals {}.'.format(25, 75, 100))
Each occurrence of {} in the format string is filled in with the string repre-
sentation of the corresponding argument.
Key Syntax

format_specifying_str.format(args)
Let’s break down the syntax a little. This expression passes through all
the text in format_specifying_str (or just “format string”), except where
there’s a print field. Print fields are denoted as “{}.” Within each print field,
the value of one of the args is printed.
If you want to print data objects and are not worried about the finer issues
of formatting, just use a pair of curly braces, {}, for each argument. Strings are
printed as strings, integers are printed as integers, and so on, for any type of
data. Here’s an example:
fss = '{} said, I want {} slices of {}.'

name = 'Pythagoras'
pi = 3.141592
print(fss.format(name, 2, pi))
This prints
Pythagoras said, I want 2 slices of 3.141592.
The arg values, of course, either can be constants or can be supplied by
variables (such as name and pi in this case).
Curly braces are special characters in this context. To print literal curly
braces, not interpreted as field delimiters, use {{ and }}. Here’s an example:
print('Set = {{{}, {}}}'.format(1, 2))
This prints
Set = {1, 2}


This example is a little hard to read, but the following may be clearer.
Remember that double open curly braces, {{, and double closed curly braces,
}}, cause a literal curly brace to be printed.
fss = 'Set = {{ {}, {}, {} }}'
print(fss.format(15, 35, 25))
This prints
Set = { 15, 35, 25 }
Of course, as long as you have room on a line, you can put everything
together:
print('Set = {{ {}, {}, {} }}'.format(15, 35, 25))
This prints the same output. Remember that each pair of braces defines
a print field and therefore causes an argument to be printed, but {{ and }}
cause printing of literal braces.
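Doubled braces come up most often when the output itself is supposed to contain braces. Here is a small sketch that builds a JSON-like string; the field names and values are invented for illustration.

```python
# {{ and }} produce literal braces; each {} is a print field.
template = '{{"name": "{}", "age": {}}}'

record = template.format('Ada', 36)
print(record)
```

Reading the template from left to right: the opening {{ becomes a single {, the two {} fields receive the arguments, and the closing }} becomes a single }.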

5.6 Ordering by Position (Name or Number)


Key Syntax

{ [position] [!r|s|a] [: spec ] }


In the syntax for print fields within a format string, the square brackets
are not intended literally but indicate optional items. With the second item,
the syntax indicates an exclamation mark followed by r, s, or a, but not more
than one of these; we look at that syntax in the next section.
The spec is a potentially complex series of formatting parameters. This
chapter focuses on spec beginning in Section 5.8 and explains all the possible
subfields.
One of the simplest applications of this syntax is to use a lone position
indicator.
{ position }
The position indicates which argument is being referred to by using
either a number or a name. Using a position indicator lets you refer to
arguments out of order.
The position indicator, in turn, is either an index number or a named
position:
pos_index | pos_name

We’ll consider each of these in turn. A position index is a number referring
to an item in the format method argument list according to its zero-based
index. A position name needs to be matched by named arguments, which we’ll
return to. First, let’s look at position indexes, because these are fairly easy to
understand.
The general rule about arguments to the format method is this:

✱ A call to the format method must have at least as many arguments as the
format-specification string has print fields, unless fields are repeated as shown
at the end of this section. But if more arguments than print fields appear, the
excess arguments (the last ones given) are ignored.

So, for example, consider the following print statement:


print('{}; {}; {}!'.format(10, 20, 30))
This prints
10; 20; 30!
You can use integer constants in the position field to print in reverse order.
These are zero-based indexes, so they are numbered 0, 1, and 2.
print('The items are {2}, {1}, {0}.'.format(10, 20, 30))
This statement prints
The items are 30, 20, 10.
You can also use zero-based index numbers to refer to excess arguments, that
is, when there are more arguments than print fields. Here’s an example:
fss = 'The items are {3}, {1}, {0}.'
print(fss.format(10, 20, 30, 40))
These statements print
The items are 40, 20, 10.
Note that referring to an out-of-range argument raises an error. In this
example there are four arguments, so they are indexed as 0, 1, 2, and 3. No
index number was an out-of-range reference in this case.
Print fields can also be matched to arguments according to argument
names. Here’s an example:
fss = 'a equals {a}, b equals {b}, c equals {c}.'
print(fss.format(a=10, c=100, b=50))
This example prints
a equals 10, b equals 50, c equals 100.
You can also use the positioning techniques to repeat values in your output.
Here’s an example:
print('{0}, {0}, {1}, {1}'.format(100, 200))
This example prints
100, 100, 200, 200
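Index numbers and names can also be combined in one format string, as long as the positional arguments come first in the call. The following is a minimal sketch; the name and score are invented sample data.

```python
# {0} is used twice; {points} is matched by the keyword argument.
fss = '{0} scored {points} points; well done, {0}!'

message = fss.format('Ada', points=97)
print(message)
```

Only two arguments are passed even though the string has three print fields, because the field {0} is repeated.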
Position ordering has an advanced feature that’s occasionally useful for cer-
tain applications. By changing the format string itself, you can change which
parts of an argument get selected for inclusion in the print string.
For example, {0[0]:} means “Select the first element of the first argument.”
{0[1]:} means “Select the second element of the first argument.” And so on.
Here’s a more complete example. Remember that zero-based indexing is
used, as usual.
>>> a_list = [100, 200, 300]
>>> '{0[1]:}, {0[2]:}'.format(a_list)
'200, 300'
This technique works with named positions as well.
>>> '{a[1]:}, {a[2]:}'.format(a=a_list)
'200, 300'
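Element selection is not limited to list indexes. As a sketch of two related forms, a print field can look up a dictionary key (written without quotation marks inside the field) or read an attribute by using a dot. The dictionary and the complex number here are arbitrary sample data.

```python
point = {'x': 3, 'y': 4}
z = 3 + 4j

# Dictionary keys appear inside the brackets without quotation marks.
by_key = 'x={0[x]}, y={0[y]}'.format(point)

# A dot selects an attribute of the argument.
by_attr = 'real={0.real}, imag={0.imag}'.format(z)

print(by_key)
print(by_attr)
```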
So what is the point of maintaining this control over position ordering?
Many applications will never need it, but it enables you to use a format string
to reorder data as needed. This is particularly useful, for example, when you’re
translating to another natural language and reordering may be mandated by
the language grammar.
One case of this might involve the Japanese language, as in this example:
if current_lang == 'JPN':
    fss = '{0}はいつ{2}の{1}と会うのだろうか?'
else:
    fss = "When will {0} meet {1} at {2}'s?"
print(fss.format('Fred', 'Sam', 'Joe'))
Depending on the value of current_lang, this may print the following:
When will Fred meet Sam at Joe's?

Or else it will print the following. Notice that the order of the names has
changed, in line with Japanese grammar.
FredはいつJoeのSamと会うのだろうか?

5.7 “Repr” Versus String Conversion


In Python, every type may have up to two different string representations.
This may seem like overkill, but occasionally it’s useful. It stems from Python
being an interpreted language.
This section discusses the difference between str and repr conversions.
However, all the information here is equally applicable to other uses of str
and repr, such as the %s and %r formatting specifiers.
When you apply a str conversion, that conversion returns the string equiv-
alent of the data exactly as it would be printed by the print function.
print(10)        # This prints 10.
print(str(10))   # So does this!
But for some types of data, there is a separate repr conversion that is not
the same as str. The repr conversion translates a data object into its canon-
ical representation in source code—that is, how it would look inside a Python
program.
Here’s an example:
print(repr(10)) # This ALSO prints 10.
In this case, there’s no difference in what gets printed. But there is a dif-
ference with strings. Strings are stored in memory without quotation marks;
such marks are delimiters that usually appear only in source code. Furthermore,
escape sequences such as \n (a newline) are translated into special characters
when they are stored; again \n is a source-code representation, not the actual
storage.
Take the following string, test_str:
test_str = 'Here is a \n newline! '
Printing this string directly causes the following to be displayed:
Here is a
 newline!
But applying repr to the string and then printing it produces a different
result, essentially saying, “Show the canonical source-code representation.”


This includes quotation marks, even though they are not part of the string
itself unless they’re embedded. But the repr function includes quotation
marks because they are part of what would appear in Python source code to
represent the string.
print(repr(test_str))
This statement prints
'Here is a \n newline! '
The %s and %r formatting specifiers, as well as the format method, enable
you to control which style of representation to use. Printing a string argument
without repr has the same effect as printing it directly. Here’s an example:
>>> print('{}'.format(test_str))
Here is a
 newline!
Using the !r modifier causes a repr version of the argument to be used—
that is, the repr conversion is applied to the data.
>>> print('{!r}'.format(test_str))
'Here is a \n newline! '
The use of !r is orthogonal with regard to position ordering. Either may
be used without interfering with the other. So can you see what the following
example does?
>>> print('{1!r} loves {0!r}'.format('Joanie', 'ChaCha'))
'ChaCha' loves 'Joanie'
The formatting characters inside the curly braces do two things in this case.
First, they use position indexes to reverse “Joanie loves ChaCha”; then the !r
format causes the two names to be printed with quotation marks, part of the
canonical representation within Python code.

Note: Where !s or !r would normally appear, you can also use !a, which is
similar to !r but escapes any non-ASCII characters, so that the result is an
ASCII-only string.
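The three conversion flags can be compared side by side. In this sketch each flag is checked against the built-in function it corresponds to (str, repr, and ascii); the sample string is arbitrary and was chosen because it contains both a non-ASCII character and a newline.

```python
s = 'café\n'   # arbitrary sample: one non-ASCII character, one newline

as_str = '{!s}'.format(s)     # same as str(s)
as_repr = '{!r}'.format(s)    # same as repr(s): quotation marks, \n shown
as_ascii = '{!a}'.format(s)   # same as ascii(s): non-ASCII is escaped too

print(as_str)
print(as_repr)
print(as_ascii)
```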

5.8 The “spec” Field of the “format” Function and Method


This section and all its subsections apply to both the global format function
and the format method. However, most of the examples in the remainder of
the chapter assume the use of the format method, which is why they show
spec in the context of a print field, {}, and a colon (:).
The syntax of the spec, the format specifier, is the most complex part of
format method grammar. Each part is optional, but if used, it must observe
the order shown. (The square brackets indicate that each of these items is
optional.)
Key Syntax

[[fill]align][sign][#][0][width][,][.prec][type]
The items here are mostly independent of each other. Python interprets
each item according to placement and context. For example, prec (precision)
appears right after a decimal point (.) if it appears at all.
When looking at the examples, remember that curly braces and colons are
used only when you use spec with the format method and not the global
format function. With the format function, you might include align, sign,
0, width, precision, and type specifiers, but no curly braces or colon.
Here’s an example:
s = format(32.3, '<+08.3f')
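To see that the function and the method accept the same spec, the following sketch formats one value both ways. The spec '>+9.3f' is an arbitrary choice made for this illustration.

```python
value = 32.3

# Global function: the spec is passed bare, with no braces and no colon.
via_function = format(value, '>+9.3f')

# format method: the same spec follows a colon inside a print field.
via_method = '{:>+9.3f}'.format(value)

print(repr(via_function))
print(repr(via_method))
```

The two results are identical; only the packaging of the spec differs.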

5.8.1 Print-Field Width
One of the commonly used items is print-field width, specified as an integer.
The text to be printed is displayed in a field of this size. If the text is shorter
than this width, it’s justified and the remaining positions are padded with
blank spaces by default.
Placement: As you can see from the syntax display, the width item is in
the middle of the spec syntax. When used with the format method, width
always follows a colon (:), as does the rest of the spec syntax.
The following example shows how width specification works on two num-
bers: 777 and 999. The example uses asterisks (*) to help illustrate where the
print fields begin and end, but otherwise these asterisks are just literal charac-
ters thrown in for the sake of illustration.
n1, n2 = 777, 999
print('**{:10}**{:2}**'.format(n1, n2))
This prints
**       777**999**
The numeral 777 is right justified within a large print field (10). This
is because, by default, numeric data is right justified and string data is left
justified.


The numeral 999 exceeds its print-field size (2) in length, so it is simply
printed as is. No truncation is performed.
Width specification is frequently useful with tables. For example, suppose
you want to print a table of integers, but you want them to line up.
   10
 2001
    2
   55
  144
 2525
 1984
It’s easy to print a table like this. Just use the format method with a print-
field width that’s wider than the longest number you expect. Because the data
is numeric, it’s right justified by default.
'{:5}'.format(n)
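Put in a loop, that print field produces the whole table. The sketch below also shows one possible refinement that goes slightly beyond the text: computing the width from the data and supplying it through a nested {} inside the spec, so the width does not have to be guessed in advance.

```python
numbers = [10, 2001, 2, 55, 144, 2525, 1984]

# One more than the longest number, so the column has a leading space.
width = max(len(str(n)) for n in numbers) + 1

for n in numbers:
    # The inner {} is replaced by width, giving the same effect as '{:5}'.
    print('{:{}}'.format(n, width))
```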
Print-field width is orthogonal with most of the other capabilities. The
“ChaCha loves Joanie” example from the previous section could be revised:
fss = '{1!r:10} loves {0!r:10}!!'
print(fss.format('Joanie', 'ChaCha'))
This prints
'ChaCha'   loves 'Joanie'  !!
The output here is similar to that of the earlier “ChaCha loves Joanie”
example but adds a print-field width of 10 for both arguments. Remember
that a width specification must appear to the right of the colon; otherwise it
would function as a position number.

5.8.2 Text Justification: “fill” and “align” Characters


The fill and align characters are optional, but the fill character can
appear only if the align character does.
Key Syntax

[[fill]align]
Placement: these items, if they appear within a print-field specification,
precede all other parts of the syntax, including width. Here’s an example con-
taining fill, align, and width:
{:->24}

The next example uses this specification in context:
print('{:->24}'.format('Hey Bill G, pick me!'))
This prints
----Hey Bill G, pick me!
Let’s examine each part of this print field, {:->24}. Here’s the breakdown.

◗ The colon (:) is the first item to appear inside the print-field spec when you’re
working with the format method (but not the global format function).
◗ After the colon, a fill and an align character appear. The minus sign (-) is
the fill character here, and the alignment is right justification (>).
◗ After fill and align are specified, the print-field width of 24 is given.

Because the argument to be printed ('Hey Bill G, pick me!') is 20 characters
in length but the print-field width is 24 characters, four copies of the fill
character, a minus sign in this case, are used for padding.
The fill character can be any character other than a curly brace. Note that
if you want to pad a number with zeros, you can alternatively use the '0'
specifier described in Section 5.8.4, “The Leading-Zero Character (0).”
The align character must be one of the four values listed in Table 5.3.

Table 5.3. “Align” Characters Used in Formatting

ALIGN
CHARACTER  MEANING
<          Left justify. This is the default for string data.
>          Right justify. This is the default for numbers.
^          Center the text in the middle of the print field. (This slightly
           favors left justification when the text can't be centered
           perfectly.)
=          Place all padding characters between the sign character (+ or –)
           and the number to be printed. This specification is valid only
           for numeric data.

A fill (or padding) character is recognized as such only if there is an align
character just after it (<, >, ^, or =).
print('{:>7}'.format('Tom'))     # Print '    Tom'
print('{:@>7}'.format('Lady'))   # Print '@@@Lady'
print('{:*>7}'.format('Bill'))   # Print '***Bill'

In the first of these examples, no fill character is specified, so a default
value of a blank space is used to pad the print field. In the second and third
cases, fill characters of an at sign (@) and an asterisk (*) are used.
If we were to instead use < to specify left justification, padding would
be placed on the right (although note that left justification is the default for
strings). So the previous examples would be revised:
print('{:<7}'.format('Tom'))     # Print 'Tom    '
print('{:@<7}'.format('Lady'))   # Print 'Lady@@@'
print('{:*<7}'.format('Bill'))   # Print 'Bill***'
The next few examples demonstrate the use of ^ to specify centering of the
data; padding appears on either side of the text.
fss = '{:^10}Jones'
print(fss.format('Tom'))     # Print '   Tom    Jones'
fss = '{:@^10}'
print(fss.format('Lady'))    # Print '@@@Lady@@@'
fss = '{:*^10}'
print(fss.format('Bill'))    # Print '***Bill***'
Finally, the next examples show the use of = to specify padding between a
sign character (+ or -) and numeric data. The second case uses a zero as a fill
character.
print('{:=8}'.format(-1250))    # Print '-   1250'
print('{:0=8}'.format(-1250))   # Print '-0001250'

Note: Remember (and sorry if we’re getting a little redundant about this), all the
examples for the spec grammar apply to the global format function as well.
But the format function, as opposed to the format method, does not use curly
braces to create multiple print fields. It works on only one print field at a time.
Here’s an example:
print(format('Lady', '@<7'))    # Print 'Lady@@@'
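One practical use of fill and align together is drawing a centered heading. The following is a sketch only; the banner function is hypothetical, and it relies on a nested {} to supply the width at run time.

```python
def banner(title, width=30):
    """Center title in a field of the given width, filled with '='."""
    # '=' is the fill character, '^' centers, and the inner {} is the width.
    return '{:=^{}}'.format(' ' + title + ' ', width)

print(banner('Report', 20))
print(banner('Totals'))
```

Because the equal sign is followed immediately by an align character (^), Python reads it as a fill character here rather than as the = alignment listed in Table 5.3.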

5.8.3 The “sign” Character


The sign character, which is usually a plus sign (+) if used at all, helps deter-
mine whether or not a plus or minus sign is printed in a numeric field.
Placement: The sign character comes after the fill and align charac-
ters, if included, but before other parts of spec. In particular, it precedes the
width. Table 5.4 lists the possible values for this character.

Table 5.4. “Sign” Characters for the “format” Method

CHARACTER      MEANING
+              Prints a plus sign (+) for nonnegative numbers; prints a minus
               sign (–) for negative numbers, as usual.
-              Prints a minus sign for negative numbers only. This is the
               default behavior.
(blank space)  Prints a blank space where a plus sign would go, for
               nonnegative numbers; prints a minus sign for negative numbers,
               as usual. This is useful for getting numbers to line up nicely
               in tables, whether or not a negative sign is present.

A simple example illustrates the use of the sign character.


print('results>{: },{:+},{:-}'.format(25, 25, 25))
This example prints
results> 25,+25,25

Notice how there’s an extra space in front of the first occurrence of 25,
even though it’s nonnegative; however, if the print fields had definite widths
assigned—which they do not in this case—that character would produce no
difference.
This next example applies the same formatting to three negative values (–25).
print('results>{: },{:+},{:-}'.format(-25, -25, -25))
This example prints the following output, illustrating that negative num-
bers are always printed with a minus sign.
results>-25,-25,-25
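The blank-space sign character is most useful when positive and negative values share a column and no width is specified. Here is a minimal sketch with made-up values.

```python
values = [25, -25, 10, -99]   # made-up sample data

for v in values:
    # A blank is printed where a plus sign would go, so digits line up.
    print('[{: }]'.format(v))
```

Each value occupies three characters inside the brackets: either a blank or a minus sign, followed by two digits.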

5.8.4 The Leading-Zero Character (0)


This character specifies that numbers are padded with the digit 0 rather than
with spaces. Although you can achieve similar effects by specifying align and
fill characters, this technique is slightly less verbose.
Placement: This character, if used, immediately precedes the width specifica-
tion. Essentially, it amounts to adding a leading-zero prefix (0) to the width itself.
For example, the following statement causes leading zeros to be printed
whenever the text to be displayed is smaller than the print-field width.
i, j = 125, 25156
print('{:07} {:010}.'.format(i, j))


This prints
0000125 0000025156.
Here’s another example:
print('{:08}'.format(375)) # This prints 00000375
The same results could have been achieved by using fill and align char-
acters, but because you can’t specify fill without also explicitly specifying
align, that approach is slightly more verbose.
fss = '{:0>7} {:0>10}'
Although these two approaches—specifying 0 as fill character and specify-
ing a leading zero—are often identical in effect, there are situations in which
the two cause different results. A fill character is not part of the number itself
and is therefore not affected by the comma, described in the next section.
There’s also interaction with the plus/minus sign. If you try the following,
you’ll see a difference in the location where the plus sign (+) gets printed.
print('{:0>+10} {:+010}'.format(25, 25))
This example prints
0000000+25 +000000025
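The same difference appears with negative numbers, where it matters more, because a fill character placed in front of a minus sign produces text that no longer reads as a number. The value -42 in this sketch is an arbitrary example.

```python
n = -42

with_fill = '{:0>6}'.format(n)    # 0 used as a fill character
with_zero = '{:06}'.format(n)     # leading-zero specifier

print(with_fill)
print(with_zero)
```

With the fill character, the zeros are placed to the left of the sign; with the leading-zero specifier, they are placed between the sign and the digits.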

5.8.5 Thousands Place Separator


One of the most convenient features of the format method is the ability to use
a thousands place separator with numeric output. How often have you seen
output like the following?
The US owes 21035786433031 dollars.
How much is this really? One’s eyes glaze over, which probably is a happy
result for most politicians. It just looks like “a big number.”
This number is much more readable if printed as follows—although it’s
still too large for most mortals to comprehend. But if you have a little numeric
aptitude, you’ll see that this is not just 21 million or 21 billion, but rather 21
trillion.
The US owes 21,035,786,433,031 dollars.
Placement: The comma follows the width specifier and precedes the precision
specifier, if it appears. The comma should be the last item other than
precision and type, if they appear.
You may want to refer to the syntax display at the beginning of Section 5.8.

The following examples use a {:,} print field. This is a simple specifica-
tion because it just involves a comma to the immediate right of the colon—all
inside a print field.
fss1 = 'The USA owes {:,} dollars.'
print(fss1.format(21000000000000))
fss2 = 'The sun is {:,} miles away.'
print(fss2.format(93000000))
These statements print
The USA owes 21,000,000,000,000 dollars.
The sun is 93,000,000 miles away.
The next example uses the comma in combination with fill and align
characters * and >, respectively. The width specifier is 12. Notice that the
comma (,) appears just after width; it’s the last item before the closing curly
brace.
n = 4500000
print('The amount on the check was ${:*>12,}'.format(n))

This example prints
The amount on the check was $***4,500,000
The print width of 12 includes room for the number that was printed,
including the commas (a total of nine characters); therefore, this example uses
three fill characters. The fill character in this case is an asterisk (*). The dollar
sign ($) is not part of this calculation because it is a literal character and is
printed as is.
If there is a leading-zero character as described in Section 5.8.4 (as opposed to
a 0 fill character), the zeros are also grouped with commas. Here’s an example:
print('The amount is {:011,}'.format(13000))
This example prints
The amount is 000,013,000
In this case, the leading zeros are grouped with commas, because all the
zeros are considered part of the number itself.
A print-field width of 12 (or any other multiple of 4) creates a conflict with
the comma, because a valid number cannot begin with a comma.
Therefore, Python adds an additional leading zero in that special case.
n = 13000
print('The amount is {:012,}'.format(n))


This prints
The amount is 0,000,013,000
But if 0 is specified as a fill character instead of as a leading zero, the zeros
are not considered part of the number and are not grouped with commas.
Note the placement of the 0 here relative to the right justify (>) sign. This time
it’s just to the left of this sign.
print('The amount is {:0>11,}'.format(n))
This prints
The amount is 0000013,000
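The difference between a leading zero and a 0 fill character can be checked side by side. The following is a minimal sketch that reuses the value 13000 from the examples above; the variable names and the final money example are chosen here purely for illustration.

```python
n = 13000

# Leading-zero form: the zeros count as part of the number and are grouped.
leading = '{:011,}'.format(n)

# Fill-character form: the zeros are padding only and are not grouped.
filled = '{:0>11,}'.format(n)

# The comma also combines with a precision and the f type specifier.
money = '{:,.2f}'.format(1234567.891)

print(leading)   # 000,013,000
print(filled)    # 0000013,000
print(money)     # 1,234,567.89
```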

5.8.6 Controlling Precision


The precision specifier is a number provided primarily for use with floating-
point values, although it can also be used with strings. It causes rounding and
truncation. The precision of a floating-point number is the maximum number of
digits to be printed, both to the right and to the left of the decimal point.
Precision can also be used, in the case of fixed-point format (which has an
f type specifier), to ensure that an exact number of digits are always printed
to the right of the decimal point, helping floating-point values to line up in
a table.
Placement: Precision is always a number to the immediate right of a decimal
point (.). It’s the last item in a spec field, with the exception of the one-letter
type specifier described in the next section.
Key Syntax

.precision
Here are some simple examples in which precision is used to limit the total
number of digits printed.
pi = 3.14159265
phi = 1.618
fss = '{:.2} + {:.2} = {:.2}'
print(fss.format(pi, phi, pi + phi))
These statements print the following results. Note that each number has
exactly two total digits:
3.1 + 1.6 = 4.8
This statement looks inaccurate, due to rounding errors. For each number,
only two digits total are printed. Printing three digits for each number yields
better results.

pi = 3.14159265
phi = 1.618
fss = '{:.3} + {:.3} = {:.3}'
print(fss.format(pi, phi, pi + phi))
This prints
3.14 + 1.62 = 4.76
The last digit to appear, in all cases of limited precision, is rounded as
appropriate.
If you want to use precision to print numbers in fixed-point format, com-
bine width and precision with an f type specifier at the end of the print
field. Here’s an example:
fss = ' {:10.3f}\n {:10.3f}'
print(fss.format(22.1, 1000.007))
This prints
     22.100
   1000.007
Notice how well things line up in this case. In this context (with the f type
specifier) the precision specifies not the total number of digits but the number
of digits just to the right of the decimal point—which are padded with trailing
zeros if needed.
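The two meanings of precision can be compared directly. The sketch below formats the same value with and without the f type specifier; the sample values are assumptions picked for illustration. Note that when the number has more integer digits than the precision allows, the form without a type specifier switches to exponent notation.

```python
x = 1234.5678

# No type specifier: precision is the total number of significant digits,
# so a value with more integer digits than that is shown in exponent form.
general = '{:.3}'.format(x)

# With the f type specifier: precision counts digits after the decimal point.
fixed = '{:.3f}'.format(x)

# Trailing zeros are added when the value has fewer decimal digits.
padded = '{:.3f}'.format(2.5)

print(general)  # 1.23e+03
print(fixed)    # 1234.568
print(padded)   # 2.500
```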
The example can be combined with other features, such as the thousands
separator, which comes after the width but before precision. Therefore, in this
example, each comma comes right after 10, the width specifier.
fss = ' {:10,.3f}\n {:10,.3f}'
print(fss.format(22333.1, 1000.007))
This example prints
 22,333.100
  1,000.007
The fixed-point format f, in combination with width and precision, is
useful for creating tables in which the numbers line up. Here’s an example:
fss = ' {:10.2f}'
for x in [22.7, 3.1415, 555.5, 29, 1010.013]:
    print(fss.format(x))
This example prints
      22.70
       3.14
     555.50
      29.00
    1010.01

5.8.7 “Precision” Used with Strings (Truncation)


When used with strings, the precision specifier potentially causes trunca-
tion. If the length of the string to be printed is greater than the precision,
the text is truncated. Here’s an example:
print('{:.5}'.format('Superannuated.')) # Prints 'Super'
print('{:.5}'.format('Excellent!')) # Prints 'Excel'
print('{:.5}'.format('Sam')) # Prints 'Sam'
In these examples, if the string to be printed is shorter than the precision,
there is no effect. But the next examples use a combination of fill character,
alignment, width, and precision.
fss = '{:*<6.6}'
Let’s break down what these symbols mean.

◗ The fill and align characters are * and <, respectively. The < symbol spec-
ifies left justification, so asterisks are used for padding on the right, if needed.
◗ The width character is 6, so any string shorter than 6 characters in length is
padded after being left justified.
◗ The precision (the character after the dot) is also 6, so any string longer
than 6 characters is truncated.

Let’s apply this format to several strings.


print(fss.format('Tom'))
print(fss.format('Mike'))
print(fss.format('Rodney'))
print(fss.format('Hannibal'))
print(fss.format('Mortimer'))
These statements could have easily been written by using the global format
function. Notice the similarities as well as the differences; the previous exam-
ples involved the format string '{:*<6.6}'.

print(format('Tom', '*<6.6'))
print(format('Mike', '*<6.6'))
print(format('Rodney', '*<6.6'))
print(format('Hannibal', '*<6.6'))
print(format('Mortimer', '*<6.6'))
In either case—that is, for either block of code just shown—the output is
Tom***
Mike**
Rodney
Hannib
Mortim
The width and precision need not be the same. For example, the follow-
ing format specifies a width of 5, so any string shorter than 5 is padded; but
the precision is 10, so any string longer than 10 is truncated.
fss = '{:*<5.10}'
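As a quick check of this format, the following sketch applies it to three strings: one shorter than the width, one between the width and the precision, and one longer than the precision. The sample strings and variable names are chosen here only for illustration.

```python
fss = '{:*<5.10}'

short_s = fss.format('Tom')             # shorter than the width: padded
medium_s = fss.format('Hannibal')       # between width and precision: unchanged
long_s = fss.format('Superannuated')    # longer than the precision: truncated

print(short_s)    # Tom**
print(medium_s)   # Hannibal
print(long_s)     # Superannua
```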

5.8.8 “Type” Specifiers
The last item in the spec syntax is the type specifier, which influences how
the data to be printed is interpreted. It’s limited to one character and has one
of the values listed in Table 5.5.
Placement: When the type specifier is used, it’s the very last item in the
spec syntax.

Table 5.5. “Type” Specifiers Recognized by the Format Method

TYPE CHARACTER   DESCRIPTION
b                Display number in binary.
c                Translate a number into its ASCII or Unicode character.
d                Display number in decimal format (the default).
e                Display a floating-point value using exponential format, with
                 lowercase e (for example, 12e+20).
E                Same as e, but display with an uppercase E (for example, 12E+20).
f or F           Display number in fixed-point format.
g                Use format e or f, whichever is shorter.
G                Same as g, but use uppercase E.
n                Use the local format for displaying numbers. For example, instead
                 of printing 1,200.34, the American format, use the European
                 format: 1.200,34.
o                Display integer in octal format (base 8).
x                Display integer in hexadecimal format, using lowercase letters to
                 represent digits greater than 9.
X                Same as x, but uses uppercase letters for hex digits.
%                Displays a number as a percentage: Multiply by 100 and then add a
                 percent sign (%).

The next five sections illustrate specific uses of the type specifier.
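Before looking at the individual cases, it may help to see several type specifiers from Table 5.5 applied in one place. This is a small sketch using the global format function; the sample values (255, 65, 1234.5, and 0.25) are assumptions chosen for illustration.

```python
n = 255

as_binary = format(n, 'b')        # base 2
as_octal = format(n, 'o')         # base 8
as_hex_lower = format(n, 'x')     # base 16, lowercase digits
as_hex_upper = format(n, 'X')     # base 16, uppercase digits
as_char = format(65, 'c')         # character with code 65
as_exp = format(1234.5, 'e')      # exponential format
as_fixed = format(1234.5, 'f')    # fixed-point format (six decimals by default)
as_percent = format(0.25, '%')    # multiplied by 100, percent sign appended

print(as_binary, as_octal, as_hex_lower, as_hex_upper)  # 11111111 377 ff FF
print(as_char)       # A
print(as_exp)        # 1.234500e+03
print(as_fixed)      # 1234.500000
print(as_percent)    # 25.000000%
```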

5.8.9 Displaying in Binary Radix


To print an integer in binary radix (base 2), use the b specifier. The result is a
series of 1’s and 0’s. For example, the following statement displays 5, 6, and 16
in binary radix:
print('{:b} {:b} {:b}'.format(5, 6, 16))
This prints the following:
101 110 10000
You can optionally use the # specifier to automatically put in radix pre-
fixes, such as 0b for binary. This formatting character is placed after the fill,
align, and sign characters if they appear but before the type specifier. (It
also precedes width and precision.) Here’s an example:
print('{:#b}'.format(7))
This prints
0b111
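The # specifier can be combined with a leading zero and a width, following the ordering just described. The sketch below is one way to see the effect; the widths 8 and 10 are arbitrary choices. Note that the width counts the 0b prefix as well as the digits.

```python
n = 7

plain = '{:b}'.format(n)            # digits only
prefixed = '{:#b}'.format(n)        # with the 0b radix prefix
zero_padded = '{:08b}'.format(n)    # eight binary digits, leading zeros
both = '{:#010b}'.format(n)         # prefix plus eight digits: width 10 in all

print(plain)        # 111
print(prefixed)     # 0b111
print(zero_padded)  # 00000111
print(both)         # 0b00000111
```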

5.8.10 Displaying in Octal and Hex Radix


The octal (base 8) and hexadecimal (base 16) radixes are specified by the o, x,
and X type specifiers. The last two specify lowercase and uppercase hexadeci-
mal, respectively, for digits greater than 9.

The following example illustrates how each format displays decimal 63:
print('{:o}, {:x}, {:X}'.format(63, 63, 63))
This could also be written as
print('{0:o}, {0:x}, {0:X}'.format(63))
In either case, this prints
77, 3f, 3F
Again, you can have the format method automatically insert a radix prefix
by using the # specifier, which is placed after the fill, align, and sign char-
acters if they appear. Here’s an example:
print('{0:#o}, {0:#x}, {0:#X}'.format(63))
This statement prints
0o77, 0x3f, 0X3F

5.8.11 Displaying Percentages
A common use of formatting is to turn a number into a percentage—for exam-
ple, displaying 0.5 as 50% and displaying 1.25 as 125%. You can perform that
task yourself, but the % type specifier automates the process.
The percent format character (%) multiplies the value by 100 and then
appends a percent sign. Here’s an example:
print('You own {:%} of the shares.'.format(.517))
This example prints
You own 51.700000% of the shares.
If a precision is used in combination with the % type specifier, the preci-
sion controls the number of digits to the right of the decimal point as usual—
but after first multiplying by 100. Here’s an example:
print('{:.2%} of {:.2%} of 40...'.format(0.231, 0.5))
This prints
23.10% of 50.00% of 40...
As with fixed-point format, if you want to print percentages so that they
line up nicely in a table, then specify both width and precision specifiers.
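Here is a minimal sketch of such a table. The list of values, the width of 8, and the precision of 1 are assumptions made for this illustration; any width large enough for the longest entry would work.

```python
shares = [0.517, 1.25, 0.05]

# Width 8, precision 1: one digit after the decimal point, right justified.
rows = ['{:8.1%}'.format(x) for x in shares]

for row in rows:
    print(row)
```

Each row is eight characters wide, so the percent signs line up in a single column.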


5.8.12 Binary Radix Example


The format method provides the tools to print numeric output in binary,
octal, or hex radix. You can combine that capability with int conversions to
create a binary calculator that uses both binary input and output—that is to
say, its input and output features strings of 1’s and 0’s.
This next example performs binary addition, displaying results in both
decimal and binary.
def calc_binary():
    print('Enter values in binary only!')
    b1 = int(input('Enter b1: '), 2)
    b2 = int(input('Enter b2: '), 2)
    print('Total is: {:#b}'.format(b1 + b2))
    print('{} + {} = {}'.format(b1, b2, b1 + b2))
Here’s a sample session with user input in bold.
>>> calc_binary()
Enter values in binary only!
Enter b1: 101
Enter b2: 1010
Total is: 0b1111
5 + 10 = 15
The key format-specification string is in the following statement:
print('Total is: {:#b}'.format(b1 + b2))
To the right of the colon are two characters: the pound sign (#), which
causes the radix symbol, 0b, to be printed; and the type specifier, b, which
causes the use of binary radix—that is, base 2.
'{:#b}'
The second output line uses simple print fields, which default to decimal
output.
'{} + {} = {}'
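The same conversion steps can be packaged without any keyboard input, which makes them easier to try out. The helper below, add_binary, is a hypothetical name introduced only for this sketch; it is not part of the example above.

```python
def add_binary(s1, s2):
    """Add two strings of binary digits; return the sum as a binary string."""
    total = int(s1, 2) + int(s2, 2)    # convert each string using radix 2
    return '{:b}'.format(total)        # format the sum in binary, no prefix

print(add_binary('101', '1010'))   # 1111
print(add_binary('1', '1'))        # 10
```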

5.9 Variable-Size Fields


Section 5.3 explained how to use variable-width print fields with the format-
ting operator (%). The format method provides the same, or more, flexibility.
You can leave any part of the specifier syntax open to be filled in later.

The general rule for variable fields within the format method is to place a
nested pair of curly braces, {}, within a print field, where you would ordinarily
put a fixed value. The method then scans the format string and performs a
substitution, replacing each nested {} minifield with the corresponding item
read from the argument list. Finally, the resulting format specification is
applied as usual.
>>> 'Here is a num: {:{}.{}}'.format(1.2345, 10, 4)
'Here is a num:      1.234'
This example works as if it were written as follows, with the numbers 10
and 4 substituting for the two inner sets of curly braces (so the previous exam-
ple has the same effect as this):
'Here is a num: {:10.4}'.format(1.2345)
The arguments in this case are integer expressions, so the variable-length
example could have been written with variable references:
a, b = 10, 4
'Here is a num: {:{}.{}}'.format(1.2345, a, b)

The way in which arguments are applied with this method is slightly differ-
ent from the way they work with the formatting operator (Section 5.3).
The difference is this: When you use the format method this way, the data
object comes first in the list of arguments; the expressions that alter format-
ting come immediately after. This is true even with multiple print fields. For
example:
>>> '{:{}} {:{}}!'.format('Hi', 3, 'there', 7)
'Hi  there  !'
Note that with this technique, strings are left justified by default.
The use of position numbers to clarify order is recommended. Use of these
numbers helps keep the meaning of the expressions clearer and more predictable.
The example just shown could well be revised so that it uses the following
expression:
>>> '{0:{1}} {2:{3}}!'.format('Hi', 3, 'there', 7)
'Hi  there  !'
The meaning of the format is easier to interpret with the position numbers.
By looking at the placement of the numbers in this example, you should be
able to see that position indexes 0 and 2 refer to the first and third arguments
to format, which are the data objects 'Hi' and 'there'. Meanwhile, position
indexes 1 and 3 (corresponding to the second and fourth arguments) refer to
the integer expressions 3 and 7, which become the print-field widths of the
respective fields.
Similarly, the following example shows the use of position indexes to display
the number 3.141592, using a print-field width of 8 and a fixed-point display of
3 digits to the right of the decimal point. Note that numbers are right justified
by default.
>>> 'Pi is approx. {0:{1}.{2}f}'.format(3.141592, 8, 3)
'Pi is approx.    3.142'
Remember that both 8 and 3, in this case, could be replaced by any inte-
ger expressions, including variables, which is really the whole point of this
feature.
>>> a, b = 8, 3
>>> 'Pi is approx. {0:{1}.{2}f}'.format(3.141592, a, b)
'Pi is approx.    3.142'
This example is equivalent to the following in its effects:
'Pi is approx. {0:8.3f}'.format(3.141592)
Field names are also very useful in this context, as a way of making the
intent of the formatting especially clear. Here’s an example:
>>> 'Pi is {pi:{fill}{align}{width}.{prec}f}'.format(
...     pi=3.141592, width=8, prec=3, fill='0', align='>')
'Pi is 0003.142'
Again, the values of the arguments can be filled in with numeric and string
variables, which in turn allow adjustment of these values during execution of
the code.
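One way to put this to work is to compute the width from the data before printing. The following is a sketch only: the list of values and the names width and prec are assumptions made for illustration, and the width is taken from the longest entry after formatting.

```python
values = [22.7, 3.1415, 555.5, 29, 1010.013]
prec = 2

# Find the widest entry once it has been formatted to the chosen precision.
width = max(len('{:.{p}f}'.format(v, p=prec)) for v in values)

# Fill in both width and precision from variables at run time.
lines = ['{0:{w}.{p}f}'.format(v, w=width, p=prec) for v in values]

for line in lines:
    print(line)
```

Because the width is derived from the values themselves, the column stays aligned even if larger numbers are added to the list.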

Chapter 5 Summary
The Python core language provides three techniques for formatting out-
put strings. One is to use the string-class formatting operator (%) on display
strings; these strings contain print-field specifiers similar to those used in the
C language, with “printf” functions.
The second technique involves the format function. This approach allows
you to specify not only things such as width and precision, but also thousands
place grouping and handling of percentages.
The third technique, the format method of the string class, builds on the
global format function but provides the most flexibility of all with multiple
print fields.

The next two chapters take text-handling capabilities to a higher level still
by utilizing the regular expression package.

Chapter 5 Review Questions


1 What, if any, are the advantages of using the first major technique—the
string-class format operator (%)?
2 What, if any, are the advantages of using the global format function?
3 What advantage does the format method of the string class have, if any, com-
pared to use of the global format function?
4 How exactly are these two techniques—format function and the format
method of the string class—related, if at all?
5 How, in turn, do these two techniques involve the __format__ methods of
individual classes, if at all?
6 What features of the format operator (%) do you need, at minimum, to print a
table that lines up floating-point numbers in a nice column?
7 What features of the format method do you need, at minimum, to print a
table that lines up floating-point numbers in a nice column?
8 Cite at least one example in which repr and str provide a different represen-
tation of a piece of data. Why does the repr version print more characters?
9 The format method enables you to specify a zero (0) as a fill character or as a
leading zero to numeric expressions. Is this entirely redundant syntax? Or can
you give at least one example in which the result might be different?
10 Of the three techniques—format operator (%), global format function, and
format method of the string class—which support the specification of
variable-length print fields?

Chapter 5 Suggested Problems


1 Write a hexadecimal calculator program that takes any number of hexadeci-
mal numbers—breaking only when the user enters an empty string—and then
outputs the sum, again, in hexadecimal numbers. (Hint: Remember that the
int conversion, as explained in Chapter 1, “Review of the Fundamentals,”
enables conversion of strings using hexadecimal radix.)


2 Write a two-dimensional array program that does the following: Take integer
input in the form of five rows of five columns each. Then, by looking at the
maximum print width needed by the entire set (that is, the number of digits in
the biggest number), determine the ideal print width for every cell in the table.
This should be a uniform width, but one that contains the largest entry in the
table. Use variable-length print fields to print this table.
3 Do the same application just described but for floating-point numbers. The
printing of the table should output all the numbers in nice-looking columns.



6 Regular Expressions, Part I
Increasingly, the most sophisticated computer software deals with patterns—
for example, speech patterns and the recognition of images. This chapter deals
with the former: how to recognize patterns of words and characters. Although
you can’t construct a human language translator with these techniques alone,
they are a start.
That’s what regular expressions are for. A regular expression is a pattern
you specify, using special characters to represent combinations of specified
characters, digits, and words. It amounts to learning a new language, but it’s
a relatively simple one, and once you learn it, this technology lets you do a
great deal in a small space (sometimes only a statement or two) that would
otherwise require many lines.

Note: Regular expression syntax has a variety of flavors. The Python regular-
expression package conforms to the Perl standard, which is an advanced and
flexible version.

6.1 Introduction to Regular Expressions


A regular expression can be as simple as a series of characters that match a
given word. For example, the following pattern matches the word “cat”; no
surprise there.
cat


But what if you wanted to match a larger set of words? For example, let’s
say you wanted to match the following combination of letters:

◗ Match a “c” character.
◗ Match any number of “a” characters, but at least one.
◗ Match a “t” character.

Here’s the regular expression that implements these criteria:
ca+t
With regular expressions (as with formatting specifiers in the previous chap-
ter), there’s a fundamental difference between literal and special characters.
Literal characters, such as “c” and “t” in this example, must be matched
exactly, or the result is failure to match. Most characters are literal characters,
and you should assume that a character is literal unless a special character
changes its meaning. All letters and digits are, by themselves, literal charac-
ters; in contrast, punctuation characters are usually special; they change the
meaning of nearby characters.
The plus sign (+) is a special character. It does not cause the regular-
expression processor to look for a plus sign. Instead, it forms a subexpression,
together with “a” that says, “Match one or more ‘a’ characters.”
The pattern ca+t therefore matches any of the following:
cat
caat
caaat
caaaat
What if you wanted to match an actual plus sign? In that case, you’d use
a backslash (\) to create an escape sequence. One of the functions of escape
sequences is to turn a special character back into a literal character.
So the following regular expression matches ca+t exactly:
ca\+t
Another important operator is the asterisk (*), which means
“zero or more occurrences of the preceding expression.” Therefore, the
expression ca*t matches any of the following:
ct
cat
caat
caaaaaat

Notably, this pattern matches “ct”. It’s important to keep in mind that the
asterisk is an expression modifier and should not be evaluated separately.
Instead, observe this rule.

✱ The asterisk (*) modifies the meaning of the expression immediately preced-
ing it, so the a, together with the *, matches zero or more “a” characters.

You can break this down syntactically, as shown in Figure 6.1. The literal
characters “c” and “t” each match a single character, but a* forms a unit that
says, “Match zero or more occurrences of ‘a’.”

ca*t
Match “c” exactly; match “t” exactly. The a* forms a unit that matches zero
or more “a” characters.
Figure 6.1. Parsing a simple expression

The plus sign (+), introduced earlier, works in a similar way. The plus sign,
together with the character or group that precedes it, means “Match one or
more instances of this expression.”
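The three patterns introduced so far can be tried against a short list of words. This sketch uses re.fullmatch, which is described in Section 6.3 and requires the whole string to match; the word list is an assumption made for illustration.

```python
import re

words = ['ct', 'cat', 'caaat', 'ca+t', 'dog']

# One or more "a" characters between "c" and "t".
plus_matches = [w for w in words if re.fullmatch(r'ca+t', w)]

# Zero or more "a" characters between "c" and "t".
star_matches = [w for w in words if re.fullmatch(r'ca*t', w)]

# An escaped plus sign matches a literal "+" character.
literal_plus = [w for w in words if re.fullmatch(r'ca\+t', w)]

print(plus_matches)   # ['cat', 'caaat']
print(star_matches)   # ['ct', 'cat', 'caaat']
print(literal_plus)   # ['ca+t']
```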

6.2 A Practical Example: Phone Numbers


Suppose you want to write a verification function for phone numbers. We
might think of the pattern as follows, in which # represents a digit:
###-###-####
With regular-expression syntax, you’d write the pattern this way:
\d\d\d-\d\d\d-\d\d\d\d
In this case, the backslash (\) continues to act as the escape character, but
its action here is not to make “d” a literal character but to create a special
meaning.
The subexpression \d means to match any single digit character. Another
way to express a digit character is to use the following subexpression:
[0-9]
However, \d is only two characters long instead of five and is therefore
more succinct.
Here’s a complete Python program that implements this regular-expression
pattern for verifying a telephone number.
import re
pattern = r'\d\d\d-\d\d\d-\d\d\d\d'

s = input('Enter tel. number: ')

if re.match(pattern, s):
    print('Number accepted.')
else:
    print('Incorrect format.')
The first thing the example does is import the regular-expression package.
This needs to be done only one time for each module (source file) that uses
regular-expression abilities.
import re
Next, the example specifies the regular-expression pattern, coded as a raw
string. With raw strings, Python itself does not translate any of the
characters; it does not translate \n as a newline, for example, or \b as a
backspace. Instead, all text in a raw string is passed directly along to the
regular-expression evaluator.
Key Syntax

r'string' or
r"string"
After prompting the user for input, the program then calls the match
function, which is qualified as re.match because it is imported from the re
package.
re.match(pattern, s)
If the pattern argument matches the target string (s in this case), the
function returns a match object; otherwise it returns the value None, which
converts to the Boolean value False.
You can therefore use the value returned as if it were a Boolean value. In a
condition, a match object is treated as True, and None is treated as False.
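The return values can be inspected directly. The following is a short sketch; the two target strings are assumptions chosen so that one matches and one does not.

```python
import re

pattern = r'\d\d\d-\d\d\d-\d\d\d\d'

hit = re.match(pattern, '555-123-5000')     # returns a match object
miss = re.match(pattern, 'not a number')    # returns None

hit_found = hit is not None
miss_found = miss is not None

print(hit_found, miss_found)     # True False
print(bool(hit), bool(miss))     # True False
```

Testing the result with "if re.match(...)" relies on exactly this conversion to a Boolean value.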

Note: If you forget to include r (the raw-string indicator), this particular
example still works, but your code will be more reliable if you always use the
r when specifying regular-expression patterns. Python string interpretation
does not work precisely the way C/C++ string interpretation does. In those
languages, every backslash is automatically treated with special meaning
unless you use a raw string. (Recent versions of C++ also support a raw-string
feature.) With Python, certain subexpressions, such as \n, have special
meaning. But otherwise, a backslash is accepted as a literal character.
Because Python sometimes interprets a backslash literally and sometimes
doesn’t, results can be unreliable and unpredictable, unless you get in the
habit of always using raw strings. Therefore, the safe policy is to always place
an r in front of regular-expression specification strings.
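The effect of the r prefix is easy to see by comparing string lengths. This is a minimal sketch; the variable names are chosen here only for illustration.

```python
plain_n = '\n'     # ordinary string: one character, a newline
raw_n = r'\n'      # raw string: two characters, a backslash and the letter n

plain_b = '\b'     # ordinary string: one character, a backspace
raw_b = r'\b'      # raw string: two characters, a backslash and the letter b

print(len(plain_n), len(raw_n))   # 1 2
print(len(plain_b), len(raw_b))   # 1 2

# A raw string is the same as an ordinary string with each backslash doubled.
print(r'\d\d\d' == '\\d\\d\\d')   # True
```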

6.3 Refining Matches


Although the phone-number example featured in the previous section works,
it has some limitations. The re.match function returns a “true” value any
time the pattern matches the beginning of the target string. It does not have
to match the entire string. So the code confirms a match for the following
phone number:
555-123-5000
But it also matches the following:
555-345-5000000

If you want to restrict positive results to exact matches—so that the entire
string has to match the pattern with nothing left over—you can add the spe-
cial character $, which means “end of string.” This character causes the match
to fail if any additional text is detected beyond the specified pattern.
pattern = r'\d\d\d-\d\d\d-\d\d\d\d$'
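The effect of the $ character can be confirmed by testing both versions of the pattern against the overlong number shown earlier. In this sketch, the names loose and strict are assumptions introduced for illustration.

```python
import re

loose = r'\d\d\d-\d\d\d-\d\d\d\d'      # matches at the start; extra text allowed
strict = r'\d\d\d-\d\d\d-\d\d\d\d$'    # the string must end after four digits

too_long = '555-345-5000000'

loose_ok = bool(re.match(loose, too_long))
strict_ok = bool(re.match(strict, too_long))
exact_ok = bool(re.match(strict, '555-123-5000'))

print(loose_ok)    # True
print(strict_ok)   # False
print(exact_ok)    # True
```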
There are other ways you might want to refine the regular-expression pat-
tern. For example, you might want to permit input matching either of the fol-
lowing formats:
555-123-5000
555 123 5000
To accommodate both these patterns, you need to create a character set,
which allows for more than one possible value in a particular position. For
example, the following expression says to match either an “a” or a “b”, but not
both:
[ab]


It’s possible to put many characters in a character set. But only one of
the characters will be matched at a time. For example, the following set
matches exactly one character: an “a”, “b”, “c”, or “d” in the next position.
[abcd]
Likewise, the following expression says that either a space or a minus sign
(–) can be matched—which is what we want in this case:
[ -]
In this context, the square brackets are the only special characters; the two
characters inside are literal, and exactly one of them will be matched. The
minus sign often has a special meaning within square brackets, but not when
it appears at the very beginning or end of the characters inside the brackets.
Here’s the full regular expression we need:
pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d$'
Now, putting everything together with the refined pattern we’ve come up
with in this section, here’s the complete example:
import re
pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d$'

s = input('Enter tel. number: ')

if re.match(pattern, s):
    print('Number accepted.')
else:
    print('Incorrect format.')
To review, here’s what the Python regular-expression evaluator does, given
this pattern.

◗ It attempts to match three digits: \d\d\d.
◗ It then reads the character set [ -] and attempts to match either a space or
a minus sign, but not both; that is, only one of these two characters will be
matched here.
◗ It attempts to match three more digits: \d\d\d.
◗ Again, it attempts to match a space or a minus sign.
◗ It attempts to match four more digits: \d\d\d\d.
◗ It must match an end-of-string, $. This means there cannot be any more input
in the target string after these last four digits are matched.

Another way to enforce an exact match, so that no trailing data is permitted,
is to use the re.fullmatch function instead of re.match. You could use
the following statements to match the telephone-number pattern; the use of
fullmatch makes the end-of-string character unnecessary in this case.
import re

pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d'

s = input('Enter tel. number: ')

if re.fullmatch(pattern, s):
    print('Number accepted.')
else:
    print('Incorrect format.')
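The same test can be wrapped in a function so that several inputs are checked without typing each one. The function name is_phone_number and the sample list below are hypothetical, introduced only for this sketch.

```python
import re

PHONE = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d'

def is_phone_number(s):
    """Return True if the whole string s fits the telephone-number pattern."""
    return re.fullmatch(PHONE, s) is not None

samples = ['555-123-5000', '555 123 5000', '555-123 5000',
           '555-345-5000000', '5551235000']
results = [is_phone_number(s) for s in samples]

for s, ok in zip(samples, results):
    print(s, ok)
```

Note that a number with mixed separators, such as 555-123 5000, is accepted, because each [ -] set is matched independently of the other.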
So far, this chapter has only scratched the surface of what regular-expression
syntax can do. Section 6.5 explains the syntax in greater detail. But in master-
ing this syntax, there are several principles to keep in mind.

◗ A number of characters have special meaning when placed in a regular-
expression pattern. It’s a good idea to become familiar with all of them. These
include most punctuation characters, such as + and *.
◗ Any characters that do not have special meaning to the Python regular-
expression interpreter are considered literal characters. The regular-expression
interpreter attempts to match these exactly.
◗ The backslash can be used to “escape” special characters, making them into
literal characters. The backslash can also add special meaning to certain ordinary
characters; for example, it causes \d to mean “any digit” rather than a “d”.

Admittedly, this might be a little confusing at first. If a character (such as *)
is special to begin with, escaping it (preceding it with a backslash) takes away
that special meaning. But in other cases, escaping a character gives it special
meaning.
Yes, both those things are true! But if you look at enough examples, it
should make sense.
Here’s a short program that tests for the validity of Social Security numbers.
It’s similar, but not identical, to that for checking the format of telephone
numbers. This pattern looks for three digits, a minus sign, two digits, another
minus sign, and then four digits.
import re

pattern = r'\d\d\d-\d\d-\d\d\d\d$'

s = input('Enter SSN: ')

if re.match(pattern, s):
    print('Number accepted.')
else:
    print('Incorrect format.')

6.4 How Regular Expressions Work: Compiling Versus Running


Regular expressions can seem like magic. But the implementation is a standard,
if relatively advanced, topic in computer science. The processing of
regular expressions takes two major steps.

◗ A regular expression pattern is analyzed and then compiled into a series of
data structures collectively called a state machine.
◗ The actual process of matching is considered “run time” for the regular-
expression evaluator, as opposed to “compile time.” During run time, the pro-
gram traverses the state machine as it looks for a match.

Unless you’re going to implement a regular-expression package yourself,
it’s not necessary to understand how to create these state machines, only what
they do. But it’s important to understand this dichotomy between compile
time and run time.
Let’s take another simple example. Just as the modifier + means “Match
one or more instances of the previous expression,” the modifier * means
“Match zero or more instances of the previous expression.” So consider this:
ca*b
This expression matches “cb” as well as “cab”, “caab”, “caaab”, and so
on. When this regular expression is compiled, it produces the state machine
shown in Figure 6.2.

[Diagram: state 1 (Start) --c--> state 2 --b--> state 3 (Done!); state 2 loops back to itself on a]
Figure 6.2. State machine for ca*b

The following list describes how the program traverses this state machine
to find a match at run time. Position 1 is the starting point.

◗ A character is read. If it’s a “c”, the machine goes to state 2. Reading any other
character causes failure.
◗ From state 2, either an “a” or a “b” can be read. If an “a” is read, the machine
stays in state 2. It can do this any number of times. If a “b” is read, the machine
transitions to state 3. Reading any other character causes failure.
◗ If the machine reaches state 3, it is finished, and success is reported.
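
The traversal just described can be sketched in a few lines of ordinary Python. This is only an illustration of the idea of walking a state machine; it is not how the re package is implemented, and the function name matches_ca_star_b is a hypothetical one chosen for this example.

```python
def matches_ca_star_b(s):
    """Traverse a hand-built state machine for ca*b (illustrative only)."""
    state = 1                      # State 1 is the starting point.
    for ch in s:
        if state == 1 and ch == 'c':
            state = 2              # "c" moves from state 1 to state 2.
        elif state == 2 and ch == 'a':
            state = 2              # "a" keeps the machine in state 2.
        elif state == 2 and ch == 'b':
            state = 3              # "b" moves to state 3.
            break                  # State 3: finished, success is reported.
        else:
            return False           # Any other character causes failure.
    return state == 3

print(matches_ca_star_b('cb'))      # True
print(matches_ca_star_b('caaab'))   # True
print(matches_ca_star_b('caxb'))    # False
print(matches_ca_star_b('ca'))      # False: input ended before state 3
```

Here the "compile" step was done by hand when the if/elif chain was written; the loop is the "run time" traversal.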

This state machine illustrates some basic principles, simple though it is. In
particular, a state machine has to be compiled and then later traversed at run
time.

Note: The state machine diagrams in this chapter assume DFAs (deterministic
finite automata), whereas Python actually uses NFAs (nondeterministic
finite automata). This makes no difference to you unless you’re implementing
a regular-expression evaluator, something you’ll likely never need to do.
So if that’s the case, you can ignore the difference between DFAs and NFAs!
You’re welcome.

Here’s what you need to know: If you’re going to use the same regular-expression
pattern multiple times, it’s a good idea to compile that pattern into
a regular-expression object and then use that object repeatedly. The re
package provides a function for this purpose called compile.

Key Syntax

regex_object_name = re.compile(pattern)

Here’s a full example using the compile function to create a regular expres-
sion object called reg1.
import re

reg1 = re.compile(r'ca*b$')   # Compile the pattern!

def test_item(s):
    if re.match(reg1, s):
        print(s, 'is a match.')
    else:
        print(s, 'is not a match!')

test_item('caab')
test_item('caaxxb')

This little program prints the following:


caab is a match.
caaxxb is not a match!
You could perform these tasks without precompiling a regular-expression
object. However, compiling can save execution time if you’re going to use the
same pattern more than once. Otherwise, Python may have to rebuild a state
machine multiple times when it could have been built only once.
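
A compiled regular-expression object also has its own match method, so it can be called directly instead of being passed to re.match. The sketch below assumes a hypothetical list of test strings named candidates and reuses one compiled object for all of them.

```python
import re

reg1 = re.compile(r'ca*b$')    # Compile once...

candidates = ['cb', 'cab', 'caaab', 'caaxxb', 'ab']

# ...then reuse the compiled object for every test string.
matched = [s for s in candidates if reg1.match(s)]
print(matched)    # ['cb', 'cab', 'caaab']
```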
As a point of comparison, Figure 6.3 shows a state machine that implements
the plus-sign (+), which means “one or more” rather than “zero or more.”

[Diagram: state 1 (Start) --c--> state 2 --a--> state 3 --b--> state 4 (Done!); state 3 loops back to itself on further a]
Figure 6.3. State machine for ca+b

Given this pattern, “cb” is not a successful match, but “cab”, “caab”, and
“caaab” are. This state machine requires the reading of at least one “a”. After
that, matching further “a” characters is optional, but it can match as many
instances of “a” in a row as it finds.

Another basic operator is the alternation operator (|), which means
“either-or.”
The following pattern matches an expression on either side of the bar. So
what exactly do you think the following means?
ax|yz
The alternation operator, |, has about the lowest precedence of any part of
the syntax. Therefore, this expression matches “ax” and “yz”, but not “axyz”.
If no parentheses are used, the expression is evaluated as if written this way:
(ax)|(yz)
Figure 6.4 shows the state machine that implements this expression.

[Diagram: state 1 (Start) --a--> state 2 --x--> state 4 (Done!);
state 1 --y--> state 3 --z--> state 4]
Figure 6.4. State machine for (ax)|(yz)

Now consider the following expression, which uses parentheses to change the
order of evaluation. With these parentheses, the alternation operator is
interpreted to mean “either x or y but not both.”
a(x|y)z
The parentheses and the | symbol are all special characters. Figure 6.5
illustrates the state machine that is compiled from the expression a(x|y)z.

[Diagram: state 1 (Start) --a--> state 2 --x or y--> state 3 --z--> state 4 (Done!)]
Figure 6.5. State machine for a(x|y)z

This behavior is the same as that for the following expression, which uses a
character set rather than alternation:
a[xy]z
Is there a difference between alternation and a character set? Yes: A character
set always matches one character of text (although it may be part of a more
complex pattern, of course). Alternation, in contrast, may involve groups longer
than a single character. For example, the following pattern matches either
“cat” or “dog” in its entirety, but not “catdog”:
cat|dog
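
The comparison between alternation and character sets can be checked with a short experiment. In this sketch, a $ and a pair of parentheses are added to each pattern (an assumption made here, not part of the patterns above) so that the whole target string has to match; without them, re.match would accept any string that merely begins with a match.

```python
import re

samples = ('axz', 'ayz', 'axyz', 'az')

# Single-character alternation and a character set accept the same strings.
by_alternation = [s for s in samples if re.match(r'a(x|y)z$', s)]
by_char_set = [s for s in samples if re.match(r'a[xy]z$', s)]
print(by_alternation == by_char_set)    # True

# Alternation can also choose between multi-character groups.
animal = re.compile(r'(cat|dog)$')
print(bool(animal.match('cat')))        # True
print(bool(animal.match('dog')))        # True
print(bool(animal.match('catdog')))     # False
```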

6.5 Ignoring Case, and Other Function Flags


When a regular-expression pattern is compiled or being interpreted directly
(through a call to a function such as re.match), you can combine a series of
regex flags to influence behavior. A commonly used flag is the re.IGNORECASE
flag. For example, the following code prints “Success.”
if re.match('m*ack', 'Mack the Knife', re.IGNORECASE):
    print('Success.')
The pattern 'm*ack' matches the word “Mack,” because the flag tells
Python to ignore the case of the letters. Watch out for Mack the Knife even if
he doesn’t know how to use uppercase!
The following does the same thing, because it uses the I abbreviation for
the IGNORECASE flag, so re.IGNORECASE and re.I mean the same thing.
if re.match('m*ack', 'Mack the Knife', re.I):
    print('Success.')
Flags may be combined using the bitwise OR operator (|). So you can
turn on both the I and DEBUG flags as follows:
if re.match('m*ack', 'Mack the Knife', re.I | re.DEBUG):
    print('Success.')
Table 6.1 summarizes the flags that can be used with regular-expression
searching, matching, compiling, and so on.

Table 6.1. Regular-Expression Flags
FLAG        ABBREVIATION  DESCRIPTION
ASCII       A             Assume ASCII settings.
IGNORECASE  I             All searches and matches are case-insensitive.
DEBUG       (none)        When the operation is carried out within IDLE,
                          debugging information is printed.
LOCALE      L             Causes matching of alphanumeric characters, word
                          boundaries, and digits to recognize LOCALE settings.
MULTILINE   M             Causes the special characters ^ and $ to match
                          beginnings and ends of lines as well as the beginning
                          and end of the string.
DOTALL      S             The dot operator (.) matches all characters,
                          including end of line (\n).
UNICODE     U             Causes matching of alphanumeric characters, word
                          boundaries, and digits to recognize characters that
                          UNICODE classifies as such.
VERBOSE     X             White space within patterns is ignored except when
                          part of a character class. This enables the writing
                          of prettier expressions in code.
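
A few of these flags are easiest to understand by trying them. The following sketch uses a made-up two-line string named text, and it uses re.search (which scans the whole string rather than only the beginning) for the first test; the variable names are assumptions chosen for illustration.

```python
import re

text = 'first line\nSecond line'

# IGNORECASE lets 'second' match 'Second'; MULTILINE lets ^ match after '\n'.
found = re.search(r'^second', text, re.I | re.M)
print(found.group())                    # Second

# Without DOTALL the dot stops at the newline; with it, the dot crosses lines.
one_line = re.match(r'.*', text).group()
all_lines = re.match(r'.*', text, re.S).group()
print(one_line)                         # first line
print(all_lines == text)                # True

# VERBOSE allows white space and comments inside the pattern itself.
ssn = re.compile(r'''
    \d\d\d -      # three digits and a hyphen
    \d\d   -      # two digits and a hyphen
    \d\d\d\d $    # four digits, then end of string
''', re.X)
print(bool(ssn.match('123-45-6789')))   # True
```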

6.6 Regular Expressions: Basic Syntax Summary


Learning regular-expression syntax is a little like learning a new language;
but once you learn it, you’ll be able to create patterns of endless variety. As
powerful as this language is, it can be broken down into a few major elements.

◗ Meta characters: These are tools for specifying either a specific character or
one of a number of characters, such as “any digit” or “any alphanumeric char-
acter.” Each of these characters matches one character at a time.
◗ Character sets: This part of the syntax also matches one character at a time—
in this case, giving a set of values from which to match.
◗ Expression quantifiers: These are operators that enable you to combine indi-
vidual characters, including wildcards, into patterns of expressions that can
be repeated any number of times.
◗ Groups: You can use parentheses to combine smaller expressions into larger
ones.

6.6.1 Meta Characters


Table 6.2 lists meta characters, including wildcards that can be matched by
any of a group, or range, of characters. For example, a dot (.) matches any one
character, subject to a few limitations.
These meta characters match exactly one character at a time. Section 6.6.3,
“Pattern Quantifiers,” shows how to match a variable number of characters. The
combination of wildcards, together with quantifiers, provides amazing flexibility.
Meta characters include not only those shown in the table but also the
standard escape characters: These include \t (tab), \n (newline), \r (carriage
return), \f (form feed), and \v (vertical tab).

Table 6.2. Regular-Expression Meta Characters


SPECIAL
CHARACTER NAME/DESCRIPTION
. Dot. Matches any one character except a newline. If the DOTALL flag is
enabled, it matches any character at all.
^ Caret. Matches the beginning of the string. If the MULTILINE flag
is enabled, it also matches beginning of lines (any character after a
newline).
$ Matches the end of a string. If the MULTILINE flag is enabled, it
matches the end of a line (the last character before a newline or end of
string).
\A Matches beginning of a string.
\b Word boundary. For example, r'ish\b' matches 'ish is' and
'ish)' but not 'ishmael'.
\B Nonword boundary. Matches only if a new word does not begin at this
point. For example, r'al\B' matches 'always' but not 'al '.
\d Any digit character. This includes the digit characters 0 through 9. If
the UNICODE flag is set, then Unicode characters classified as digits are
also included.
\s Any whitespace character; may be blank space or any of the following:
\t, \n, \r, \f, or \v. UNICODE and LOCALE flags may have an effect on
what is considered a whitespace character.
\S Any character that is not a white space, as defined just above.
\w Matches any alphanumeric character (letter or digit) or an underscore
(_). The UNICODE and LOCALE flags may have an effect on what charac-
ters are considered to be alphanumeric.
\W Matches any character that is not alphanumeric as described just
above.
\Z Matches the end of a string.

For example, the following regular-expression pattern matches any string
that begins with two digits:
r'\d\d'
The next example matches a string that consists of a two-digit string and
nothing else:
r'\d\d$'
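
These two patterns, along with a few of the meta characters from Table 6.2, can be tried out as follows. This is a small sketch; the target strings are invented for illustration, and re.search is used for the word-boundary tests so that the pattern may be found anywhere in the string.

```python
import re

print(bool(re.match(r'\d\d', '42 is the answer')))    # True: begins with two digits
print(bool(re.match(r'\d\d$', '42 is the answer')))   # False: text follows the digits
print(bool(re.match(r'\d\d$', '42')))                 # True

# \b marks a word boundary.
print(bool(re.search(r'ish\b', 'ish is')))     # True
print(bool(re.search(r'ish\b', 'ishmael')))    # False

# \w, \s, and the dot each match exactly one character.
print(bool(re.match(r'\w\s\w.$', 'a b!')))     # True
```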

6.6.2 Character Sets


The character-set syntax of Python regular expressions provides even finer
control over what character is to be matched next.
Key Syntax

[char_set]     // Match any one character in the set.
[^char_set]    // Match any one character NOT in the set.
You can specify character sets by listing characters directly, as well as by
ranges, covered a few paragraphs later. For example, the following expression
matches any vowel (except, of course, for “y”).
[aeiou]
For example, suppose you specify the following regular-expression pattern:
r'c[aeiou]t'
This matches any of the following:
cat
cet
cit
cot
cut
We can combine character sets with other operators, such as +, which retains
its usual meaning outside the square brackets. So consider
c[aeiou]+t
This matches any of the following, as well as many other possible strings:
cat
ciot
ciiaaet
caaauuuut
ceeit

Within a character set, the minus sign (-) specifies a range of characters
when it appears between two other characters. Otherwise, it is treated as a
literal character.
For example, the following range matches any character from lowercase
“a” to lowercase “n”:
[a-n]
This range therefore matches an “a”, “b”, “c”, up to an “l”, “m”, or “n”. If
the IGNORECASE flag is enabled, it also matches uppercase versions of these
letters.
The following matches any uppercase or lowercase letter, or digit. Unlike
“\w,” however, this character set does not match an underscore (_).
[A-Za-z0-9]
The following matches any hexadecimal digit: a digit from 0 to 9 or an
uppercase or lowercase letter in the range “A”, “B”, “C”, “D”, “E”, and “F”.
[A-Fa-f0-9]
Character sets observe some special rules.

◗ Almost all characters within square brackets ([ ]) lose their special meaning,
except where specifically mentioned here. Therefore, almost everything is
interpreted literally.
◗ A closing square bracket has special meaning, terminating the character set;
therefore, a closing bracket must be escaped with a backslash to be interpreted
literally: “\]”
◗ The minus sign (-) has special meaning unless it occurs at the very beginning
or end of the character set, in which case it is interpreted as a literal minus
sign. Likewise, a caret (^) has special meaning at the beginning of a character
set but not elsewhere.
◗ The backslash (\), even in this context, must be escaped to be represented lit-
erally. Use “\\” to represent a backslash.

For example, outside a character-set specification, the arithmetic operators
+ and * have special meaning. Yet they lose their meaning within square
brackets, so you can specify a range that matches any one of these characters:
[+*/-]
This range specification includes a minus sign (-), but it has no special
meaning because it appears at the end of the character set rather than in the
middle.

The following character-set specification uses a caret to match any char-
acter that is not one of the four operators +, *, /, or -. The caret has special
meaning here because it appears at the beginning.
[^+*/-]
But the following specification, which features the caret (^) in a different
position, matches any of five operators, ^, +, *, /, or -.
[+*^/-]
Therefore, the following Python code prints “Success!” when run.
import re
if re.match(r'[+*^/-]', '^'):
    print('Success!')
However, the following Python code does not print “Success!” because the
caret at the beginning of the character set reverses the meaning of the
character set.
import re
if re.match(r'[^+*^/-]', '^'):
    print('Success!')
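
The character-set rules can be verified by testing each candidate character in turn. The sketch below is only an illustration; the names ops and not_ops and the test string are assumptions made for this example.

```python
import re

ops = re.compile(r'[+*^/-]')        # any one of the five operators
not_ops = re.compile(r'[^+*/-]')    # any one character that is NOT + * / or -

print([ch for ch in '+-*/^x' if ops.match(ch)])       # ['+', '-', '*', '/', '^']
print([ch for ch in '+-*/^x' if not_ops.match(ch)])   # ['^', 'x']

# A hexadecimal-digit set combined with the + quantifier.
print(bool(re.match(r'[A-Fa-f0-9]+$', '7fE0')))   # True
print(bool(re.match(r'[A-Fa-f0-9]+$', '7fG0')))   # False
```

In the second set, the leading caret negates the set, so the caret character itself is accepted because it is not one of the four excluded operators.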

6.6.3 Pattern Quantifiers
All of the quantifiers in Table 6.3 are expression modifiers, and not expression
extenders. Section 6.6.4 discusses in detail the implications of “greedy”
matching.

Table 6.3. Regular-Expression Quantifiers (Greedy)


SYNTAX DESCRIPTION
expr* Modifies meaning of expression expr so that it matches zero or more
occurrences rather than one. For example, a* matches “a”, “aa”, and
“aaa”, as well as an empty string.
expr+ Modifies meaning of expression expr so that it matches one or more
occurrences rather than only one. For example, a+ matches “a”, “aa”,
and “aaa”.
expr? Modifies meaning of expression expr so that it matches zero or one
occurrence of expr. For example, a? matches “a” or an empty string.
expr1 | expr2 Alternation. Matches a single occurrence of expr1, or a single occur-
rence of expr2, but not both. For example, a|b matches “a” or “b”.
Note that the precedence of this operator is very low, so cat|dog
matches “cat” or “dog”.
expr{n} Modifies expression so that it matches exactly n occurrences of expr.
For example, a{3} matches “aaa”; but although sa{3}d matches
“saaad” it does not match “saaaaaad”.
expr{m,n} Matches a minimum of m occurrences of expr and a maximum of n.
For example, x{2,4}y matches “xxy”, “xxxy”, and “xxxxy” but not
“xxxxxxy” or “xy”.
expr{m,} Matches a minimum of m occurrences of expr with no upper limit
to how many can be matched. For example, x{3,} finds a match if it
can match the pattern “xxx” anywhere. But it will match more than
three if it can. Therefore zx{3,}y matches “zxxxxxy”.
expr{,n} Matches a minimum of zero, and a maximum of n, instances of the
expression expr. For example, ca{,2}t matches “ct”, “cat”, and
“caat” but not “caaat”.
(expr) Causes the regular-expression evaluator to look at all of expr as a
single group. There are two major purposes for doing so. First, a
quantifier applies to the expression immediately preceding it; but if
that expression is a group, the entire group is referred to. For exam-
ple, (ab)+ matches “ab”, “abab”, “ababab”, and so on.
Second, groups are significant because they can be referred to later,
both in matching and text replacement.
\n Refers to a group that has already previously matched; the reference
is to the text actually found at run time and not just a repeat of the
pattern itself. \1 refers to the first group, \2 refers to the second
group, and so on.

The next-to-last quantifier listed in Table 6.3 is the use of parentheses for
creating groups. Grouping can dramatically affect the meaning of a pattern.
Putting items in parentheses also creates tagged groups for later reference.
The use of the numeric quantifiers from Table 6.3 makes some expres-
sions easier to render, or at least more compact. For example, consider the
phone-number verification pattern introduced earlier.
r'\d\d\d-\d\d\d-\d\d\d\d'
This can be revised as
r'\d{3}-\d{3}-\d{4}'
This example saves a few keystrokes of typing, but other cases might save
quite a bit more. Using these features also creates code that is more readable
and easier to maintain.
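
As a quick check that the two forms are interchangeable, the sketch below runs both patterns against a few made-up phone numbers. The variable names long_form, short_form, and samples are assumptions used only for this illustration.

```python
import re

long_form = r'\d\d\d-\d\d\d-\d\d\d\d'
short_form = r'\d{3}-\d{3}-\d{4}'

samples = ['555-123-4567', '55-1234-567', '5551234567']

# Both patterns should give the same verdict on every sample.
same = all(bool(re.match(long_form, s)) == bool(re.match(short_form, s))
           for s in samples)
print(same)    # True

# {m,n} sets a minimum and a maximum repetition count.
in_range = [s for s in ('xy', 'xxy', 'xxxxy', 'xxxxxxy')
            if re.match(r'x{2,4}y', s)]
print(in_range)    # ['xxy', 'xxxxy']
```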

Parentheses have a great deal of significance beyond mere clarity. Their
most important role is in specifying groups, which in turn can affect how a
pattern is parsed. For example, consider the following two patterns:
pat1 = r'cab+'
pat2 = r'c(ab)+'
The first pattern matches any of the following strings, in which the “b” is
repeated.
cab
cabb
cabbb
cabbbb
But the second pattern—thanks to the virtues of grouping—matches any of
the following strings. These strings repeat “ab” rather than “b”.
cab
cabab
cababab
cabababab
In this case, grouping is highly significant. Figure 6.6 shows how the Python
regular-expression evaluator interprets the meaning of the pattern differently
because of the parentheses; specifically, it’s the group “ab” that is repeated.

c(ab)+
    c        Match “c” exactly.
    (ab)+    This forms a unit that matches one or more instances of “ab”.
Figure 6.6. Parsing a group in a regular expression
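
The effect of grouping can be confirmed by running both patterns against the same strings. In this sketch a $ is appended to each pattern (an addition made here so that the whole string must match), and the last two lines illustrate the \1 back-reference from Table 6.3 with an invented example.

```python
import re

pat1 = r'cab+$'      # the + applies only to "b"
pat2 = r'c(ab)+$'    # the + applies to the whole group "ab"

for s in ('cab', 'cabbb', 'cababab'):
    print(s, bool(re.match(pat1, s)), bool(re.match(pat2, s)))
# cab True True
# cabbb True False
# cababab False True

# \1 refers back to the text that group 1 actually matched.
print(bool(re.match(r'(\w+) \1$', 'hello hello')))   # True
print(bool(re.match(r'(\w+) \1$', 'hello world')))   # False
```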

6.6.4 Backtracking, Greedy, and Non-Greedy


Python regular expressions are flexible in many subtle ways. In particular, the
regular-expression evaluator will always favor a match over a nonmatch, even
if this requires a technique called backtracking.

Consider the following example.

import re
pat = r'c.*t'
if re.match(pat, 'cat'):
    print('Success!')
Ask yourself: Does the pattern c.*t match the target string, “cat”? It
should, shouldn’t it? Because “c” will match a “c”, “t” will match a “t”, and
the pattern “.*” says, “Match any number of characters.” So it should match
“cat”.
But wait a moment. If you take the “.*” pattern literally, shouldn’t it do the
following?

◗ Match the “c”.
◗ Match the general pattern “.*” by matching all the remaining characters,
namely “at”.
◗ The end of the string is then reached. The regular-expression evaluator tries
to match a “t” but it can’t, because it’s now at the end of the string. The result?
It looks like failure.

Fortunately, the regular-expression evaluator is more sophisticated than
that. Having failed to match the string, it will backtrack and try matching
fewer characters based on “.*”; after backtracking one character, it finds that
it does match the target string, “cat”.
The point is that regular-expression syntax is flexible and correctly matches
any pattern it can legally match, even if it has to use backtracking.
A related issue is that of greedy versus non-greedy quantifiers. All types of
pattern specification in Python regular expressions follow the Golden Rule:
Report a match if one is possible, even if you have to backtrack. But within
that rule, sometimes multiple results are possible. “Greedy versus non-greedy”
is an issue of which string to select when more than one is possible.
Chapter 7, “Regular Expressions, Part II,” covers that issue in depth, listing
the non-greedy quantifiers.
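
The following sketch shows both ideas at work: backtracking lets the pattern succeed on “cat”, and greedy matching picks the longest result when more than one is possible. The second target string is an invented example.

```python
import re

# ".*" first takes everything, then backs up until the final "t" can match.
short_match = re.match(r'c.*t', 'cat')
print(short_match.group())     # cat

# With more than one possible match, the greedy quantifier keeps the longest.
long_match = re.match(r'c.*t', 'cat in a hat!')
print(long_match.group())      # cat in a hat
```

In the second case, “cat” alone would also satisfy the pattern, but the greedy “.*” gives back only as many characters as it must.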

6.7 A Practical Regular-Expression Example


This section uses the elements shown earlier in a practical example. Suppose
you’re given the task of writing software that verifies whether a password is
strong enough.

We’re not talking about password encryption. That’s a different topic. But
before a password is accepted, you could test whether it has sufficient strength.
Some time ago, in the Wild West of software development, any word at
least one character in size might be accepted. Such passwords proved easy to
crack. Nowadays, only difficult-to-crack passwords are accepted. Otherwise
the user is automatically prompted to reenter. Here are some typical criteria:

◗ Each and every character must be an uppercase or lowercase letter, digit, or
underscore (_), or one of the following punctuation characters: @, #, $, %, ^,
&, *, or !.
◗ The minimum length is eight characters total.
◗ It must contain at least one letter.
◗ It must contain at least one digit.
◗ It must contain one of the accepted punctuation characters.

Now let’s say you’re employed to write these tests. If you use regular expres-
sions, this job will be easy for you—a delicious piece of cake.
The following verification function performs the necessary tests. We can
implement the five rules by using four patterns and performing re.match
with each.

import re

pat1 = r'(\w|[@#$%^&*!]){8,}$'   # Only permitted characters, at least eight
pat2 = r'.*\d'                    # At least one digit
pat3 = r'.*[a-zA-Z]'              # At least one letter
pat4 = r'.*[@#$%^&*!]'            # At least one permitted punctuation character

def verify_passwd(s):
    b = (re.match(pat1, s) and re.match(pat2, s) and
         re.match(pat3, s) and re.match(pat4, s))
    return bool(b)
The verify_passwd function applies four different match criteria to a
target string, s. The re.match function is called with each of four different
patterns, pat1 through pat4. If all four matches succeed, the result is “true.”
The first pattern accepts any character that is a letter, digit, or underscore
or a character in the set @#$%^&*! . . . and then it requires a match of
eight or more of such characters.

The \w meta character means “Match any alphanumeric character.” So
when the expression inside parentheses is put together, it means “Match an
alphanumeric character or one of the punctuation characters listed.”
(\w|[@#$%^&*!]){8,}
Let’s break this down a little bit. Inside the parentheses, we find this
expression:
\w|[@#$%^&*!]
Alternation is used here, indicated by the vertical bar, |. This subpattern
says, “Match \w or match a character in the set [@#$%^&*!].”
The characters within the square brackets lose the special meaning that
they would otherwise have outside the brackets. Therefore, everything inside
the range specification is treated literally rather than as a special character.
Putting this all together, the subexpression says, “Match either an alphanu-
meric character (\w), or match one of the punctuation characters listed.” The
next part of the pattern, {8,}, says to do this at least eight times.
Therefore, we match eight or more characters, in which each is alphanumeric
or one of the punctuation characters shown.
Finally, there is an end-of-string indicator, $. Consequently, there cannot
be, for example, any trailing spaces. Appending an end-of-line symbol, $,
requires the string to terminate after reading the last character.
(\w|[@#$%^&*!]){8,}$
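
To see the effect of this first pattern on its own, it can be run against a few sample strings. This is a small sketch; the candidate passwords are invented for illustration and the dictionary name results is an assumption.

```python
import re

pat1 = r'(\w|[@#$%^&*!]){8,}$'

candidates = ('abc123!@', 'abc123! ', 'abc12!', 'abc 123!@#')

# Record whether each candidate passes the character and length test.
results = {c: bool(re.match(pat1, c)) for c in candidates}
for candidate, ok in results.items():
    print(repr(candidate), ok)
# 'abc123!@' True
# 'abc123! ' False   (trailing space)
# 'abc12!' False     (fewer than eight characters)
# 'abc 123!@#' False (embedded space)
```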
The rest of the tests imple