Supercharged Python - Take Your Code To The Next Level
Brian Overland
John Bennett
From the Library of Vineeth Babu
◗ You’re rusty: If you’ve dabbled in Python but you’re a little rusty, you may
want to take a look at Chapter 1, “Review of the Fundamentals.” Otherwise,
you may want to skip Chapter 1 or only take a brief look at it.
◗ You know the basics but are still learning: Start with Chapters 2 and 3, which sur-
vey the abilities of strings and lists. This survey includes some advanced abilities
of these data structures that people often miss the first time they learn Python.
◗ Your understanding of Python is strong, but you don’t know everything yet:
Start with Chapter 4, which lists 22 programming shortcuts unique to Python
that most people take a long time to fully learn.
◗ You want to master special features: You can start in an area of specialty. For
example, Chapters 5, 6, and 7 deal with text formatting and regular expres-
sions. The two chapters on regular expression syntax, Chapters 6 and 7, start
with the basics but then cover the finer points of this pattern-matching tech-
nology. Other chapters deal with other specialties. For example, Chapter 8
describes the different ways of handling text and binary files.
◗ You want to learn advanced math and plotting software: If you want to do
plotting, financial, or scientific applications, start with Chapter 12, “The ‘numpy’
(Numeric Python) Package.” This is the basic package that provides an under-
lying basis for many higher-level capabilities described in Chapters 13 through 15.
Note: We sometimes use Notes to point out facts you’ll eventually want to know
but that diverge from the main discussion. You might want to skip over Notes the
first time you read a section, but it’s a good idea to go back later and read them.
The Key Syntax Icon introduces general syntax displays, into which you supply
some or all of the elements. These elements are called “placeholders,” and they
appear in italics. Some of the syntax, especially keywords and punctuation, is
in bold and intended to be typed in as shown. Finally, square brackets,
when not in bold, indicate an optional item. For example:
set([iterable])
This syntax display implies that iterable is an iterable object (such as a
list or a generator object) that you supply. And it’s optional.
Square brackets, when in bold, are intended literally, to be typed in as
shown. For example:
list_name = [obj1, obj2, obj3, …]
Ellipses (…) indicate a language element that can be repeated any number
of times.
Performance Tip: Performance tips are like Notes in that they constitute a short
digression from the rest of the chapter. These tips address the question of how you
can improve software performance. If you’re interested in that topic, you’ll
want to pay special attention to these notes.
Have Fun
When you master some or all of the techniques of this book, you should make
a delightful discovery: Python often enables you to do a great deal with a rel-
atively small amount of code. That’s why it’s dramatically increasing in popu-
larity every day. Because Python is not just a time-saving device, it’s fun to be
able to program this way . . . to see a few lines of code do so much.
We wish you the joy of that discovery.
Register your copy of Supercharged Python on the InformIT site for conve-
nient access to updates and/or corrections as they become available. To start
the registration process, go to [Link]/register and log in or create
an account. Enter the product ISBN (9780135159941) and click Submit.
Look on the Registered Products tab for an Access Bonus Content link
next to this product, and follow that link to access any available bonus
materials. If you would like to be notified of exclusive offers on new edi-
tions and updates, please check the box to receive email from us.
From John
I want to thank my coauthor, Brian Overland, for inviting me to join him on
this book. This allows me to pass on many of the things I had to work hard to
find documentation for or figure out by brute-force experimentation. Hope-
fully this will save readers a lot of work dealing with the problems I ran into.
chapter. You might want to take a look at the global statement at the end of
this chapter, however, if you’re not familiar with it. Many people fail to under-
stand this keyword.
If it helps you in the beginning, you can think of variables as storage loca-
tions into which to place values, even though that’s not precisely what Python
does.
What Python really does is make a, b, and c into names for the values 10,
20, and 30. By this we mean “names” in the ordinary sense of the word. These
names are looked up in a symbol table; they do not correspond to fixed places
in memory! The difference doesn’t matter now, but it will later, when we get
to functions and global variables. These statements, which create a, b, and c
as names, are assignments.
In any case, you can assign new values to a variable once it’s created. So
in the following example, it looks as if we’re incrementing a value stored in a
magical box (even though we’re really not doing that).
>>> n = 5
>>> n = n + 1
>>> n = n + 1
>>> n
7
What’s really going on is that we’re repeatedly reassigning n as a name for
an increasingly higher value. Each time, the old association is broken and n
refers to a new value.
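The rebinding just described can be observed directly. The following is a minimal sketch, not taken from the text; the names first and second are illustrative assumptions used only to hold on to the old and new values.

```python
# Each assignment rebinds the name n to a different integer object;
# the integer 5 itself is never changed.
n = 5
first = n          # 'first' is a second name for the object 5
n = n + 1          # n now names a new object, 6
second = n

print(first, second)     # first still names 5
print(first is second)   # False: the two names refer to distinct objects
```

Because first keeps its association with 5 after n is reassigned, it is clear that the assignment changed what n refers to rather than changing the value itself.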
Assignments create variables, and you can’t use a variable name that hasn’t
yet been created. IDLE complains if you attempt the following:
>>> a = 5
>>> b = a + x # ERROR!
Because x has not yet been assigned a value, Python isn’t happy. The solu-
tion is to assign a value to x before it’s used on the right side of an assignment.
In the next example, referring to x no longer causes an error, because it’s been
assigned a value in the second line.
>>> a = 5
>>> x = 2.5
>>> b = a + x
>>> b
7.5
Python has no data declarations. Let us repeat that: There are no data dec-
larations. Instead, a variable is created by an assignment. There are some
other ways to create variables (function arguments and for loops), but for
the most part, a variable must appear on the left of an assignment before it
appears on the right.
following:
Then choose Run Module from the Run menu. When you’re prompted to
save the file, click OK and enter the program name as [Link]. The program
then runs and prints the results in the main IDLE window (or “shell”).
Alternatively, you could enter this program directly into the IDLE environ-
ment, one statement at a time, in which case the sample session should look
like this:
>>> side1 = 5
>>> side2 = 12
>>> hyp = (side1 * side1 + side2 * side2) ** 0.5
>>> hyp
13.0
Let’s step through this example a statement or two at a time. First, the val-
ues 5 and 12 are assigned to variables side1 and side2. Then the hypotenuse
of a right triangle is calculated by squaring both values, adding them together,
and taking the square root of the result. That’s what ** 0.5 does. It raises a
value to the power 0.5, which is the same as taking its square root.
(That last factoid is a tidbit you get from not falling asleep in algebra class.)
The answer printed by the program should be 13.0. It would be nice to
write a program that calculated the hypotenuse for any two values entered by
the user; but we’ll get that soon enough by examining the input statement.
Before moving on, you should know about Python comments. A comment is
text that’s ignored by Python itself, but you can use it to put in information help-
ful to yourself or other programmers who may need to maintain the program.
All text from a hashtag (#) to the end of the line is a comment. This is text
ignored by Python itself that still may be helpful for human readability’s sake.
For example:
side1 = 5 # Initialize one side.
side2 = 12 # Initialize the other.
hyp = (side1 * side1 + side2 * side2) ** 0.5
print(hyp) # Print results.
◗ The first character must be a letter or an underscore (_), but the remaining
characters can be any combination of underscores, letters, and digits.
◗ However, names with leading underscores are intended to be private to a
class, and names starting with double underscores may have special meaning,
such as __init__ or __add__, so avoid using names that start with double
underscores.
◗ Avoid any name that is a keyword, such as if, else, elif, and, or, not,
class, while, break, continue, yield, import, and def.
◗ Also, although you can use capitals if you want (names are case-sensitive),
initial-all-capped names are generally reserved for special types, such as class
names. The universal Python convention is to stick to all-lowercase for most
variable names.
Within these rules, there is still a lot of leeway. For example, instead of
using boring names like a, b, and c, we can use i, thou, and a_jug_of_wine,
because it’s more fun (with apologies to Omar Khayyam).
i = 10
thou = 20
a_jug_of_wine = 30
loaf_of_bread = 40
inspiration = i + thou + a_jug_of_wine + loaf_of_bread
print(inspiration, 'percent good')
This prints the following:
100 percent good
offers a shortcut, just as C and C++ do. Python provides shortcut assignment
ops for many combinations of different operators within an assignment.
n = 0 # n must exist before being modified.
n += 1 # Equivalent to n = n + 1
n += 10 # Equivalent to n = n + 10
n *= 2 # Equivalent to n = n * 2
n -= 1 # Equivalent to n = n - 1
n /= 3 # Equivalent to n = n / 3
The effect of these statements is to start n at the value 0. Then they add 1
to n, then add 10, and then double that, resulting in the value 22, after which 1
is subtracted, producing 21. Finally, n is divided by 3, producing a final result
of n set to 7.0.
Table 1.1 shows that exponentiation has a higher precedence than the mul-
tiplication, division, and remainder operations, which in turn have a higher
precedence than addition and subtraction.
Consequently, parentheses are required in the following statement to pro-
duce the desired result:
hypot = (a * a + b * b) ** 0.5
This statement adds a squared to b squared and then takes the square root
of the sum.
The way that Python interprets integer and floating-point division (/) depends
on the version of Python in use.
◗ Division of any two numbers (integer and/or floating point) always results in a
floating-point result. For example:
4 / 2 # Result is 2.0
7 / 4 # Result is 1.75
ground division (//). This also works with floating-point values.
4 // 2 # Result is 2
7 // 4 # Result is 1
23 // 5 # Result is 4
8.0 // 2.5 # Result is 3.0
◗ You can get the remainder using remainder (or modulus) division.
23 % 5 # Result is 3
Note that in remainder division, a division is carried out first and the quo-
tient is thrown away. The result is whatever is left over after division. So 5
goes into 23 four times but results in a remainder of 3.
In Python 2.0, the rules are as follows:
Python also supports a divmod function that returns quotient and remain-
der as a tuple (that is, an ordered group) of two values. For example:
quot, rem = divmod(23, 10)
The values returned in quot and rem, in this case, will be 2 and 3 after exe-
cution. This means that 10 divides into 23 two times and leaves a remainder
of 3.
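The division operators discussed in this section can be compared side by side. The sketch below assumes Python 3.0 or later; the variable names are illustrative. Note that the // operator, called ground division here, is named floor division in the standard Python documentation. The last two lines go slightly beyond the text to show how negative operands behave.

```python
# Division operators applied to the values used in the text.
true_quot = 23 / 5           # true division always yields a float
floor_quot = 23 // 5         # integer (floor) division
remainder = 23 % 5           # remainder (modulus)
quot, rem = divmod(23, 5)    # both results at once, as a tuple

print(true_quot, floor_quot, remainder, quot, rem)

# With a negative operand, // rounds toward negative infinity,
# and % takes the sign of the divisor.
neg_quot = -7 // 2
neg_rem = -7 % 2
print(neg_quot, neg_rem)
```

In every case, quot * divisor + rem reproduces the original number, which is why -7 // 2 is -4 and -7 % 2 is 1.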
In Python 2.0, the input function works differently: it instead evaluates the
string entered as if it were a Python expression. To achieve the same result as
the Python 3.0 input function, use the raw_input function in Python 2.0.
The input function prints the prompt string, if specified; then it returns
the string the user entered. The input string is returned as soon as the user
presses the Enter key; but no newline is appended.
Key Syntax
input(prompt_string)
To store the string returned as a number, you need to convert to integer
(int) or floating-point (float) format. For example, to get an integer use this
code:
n = int(input('Enter integer here: '))
Or use this to get a floating-point number:
x = float(input('Enter floating pt value here: '))
The prompt is printed without an added space, so you typically need to
provide that space yourself.
Why is an int or float conversion necessary? The input function always
returns a string, such as “5”, even when the user types digits. Such a string
is fine for many purposes, but you cannot perform arithmetic on it without
performing the conversion first.
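Because input always returns a string, a conversion can fail if the user types something that is not a number. The following is a small sketch of one way to handle that; the helper name to_number is hypothetical and does not come from the text, and the strings passed to it stand in for what input would return.

```python
def to_number(text):
    """Return text converted to int if possible, else float, else None."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass        # Try the next conversion.
    return None

print(to_number('5') + 1)      # Arithmetic works after conversion.
print(to_number('2.5') * 2)
print(to_number('five'))       # Not a number, so None is returned.
```

In a real program you would call it as to_number(input('Enter a number: ')) and prompt again if the result is None.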
Python 3.0 also supports a print function that—in its simplest form—
prints all its arguments in the order given, putting a space between each.
Key Syntax
print(arguments)
Python 2.0 has a print statement that does the same thing but does not use
parentheses.
The print function has some special arguments that can be entered by
using the name.
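The list of special arguments is not reproduced in this excerpt, so the following sketch shows three that the built-in print function accepts by name in Python 3.0 and later: sep, end, and file. The variable names buffer and captured are illustrative.

```python
import io

print('a', 'b', 'c')               # Default: one space between arguments.
print('a', 'b', 'c', sep='-')      # Custom separator.
print('no newline', end='')        # Suppress the trailing newline.
print()                            # Finish the line.

# Send output to a string buffer instead of the screen to inspect it exactly.
buffer = io.StringIO()
print(1, 2, 3, sep=', ', end='!\n', file=buffer)
captured = buffer.getvalue()
print(repr(captured))
```

Each of these arguments must be given by name, after any values to be printed.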
Python script that’s a complete program. For example, you can enter the fol-
lowing statements into a text file and run it as a script.
side1 = float(input('Enter length of a side: '))
side2 = float(input('Enter another length: '))
hyp = ((side1 * side1) + (side2 * side2)) ** 0.5
print('Length of hypotenuse is:', hyp)
Note: Mixing tab characters with actual spaces can cause errors even though it
might not look wrong. So be careful with tabs!
Because there is no “begin block” and “end block” syntax, Python relies on
indentation to know where statement blocks begin and end. The critical rule
is this:
✱ Within any given block of code, the indentation of all statements (that is, at
the same level of nesting) must be the same.
the function until you’re done—after which, enter a blank line. Then run
the function by typing its name followed by parentheses. Once a function is
defined, you can execute it as often as you want.
So the following sample session, in the IDLE environment, shows the process
of defining a function and calling it twice. For clarity, user input is in bold.
>>> def main():
        side1 = float(input('Enter length of a side: '))
        side2 = float(input('Enter another length: '))
        hyp = (side1 * side1 + side2 * side2) ** 0.5
        print('Length of hypotenuse is:', hyp)
>>> main()
Enter length of a side: 3
Enter another length: 4
Length of hypotenuse is: 5.0
>>> main()
Enter length of a side: 30
Enter another length: 40
Length of hypotenuse is: 50.0
As you can see, once a function is defined, you can call it (causing it to exe-
cute) as many times as you like.
The Python philosophy is this: Because you should do this indentation any-
way, why shouldn’t Python rely on the indentation and thereby save you the
extra work of putting in curly braces? This is why Python doesn’t have any
“begin block” or “end block” syntax but relies on indentation.
else:
    print('a is not greater than b')
    c = -10
An if statement can also have any number of optional elif clauses.
Although the following example has statement blocks of one line each, they
can be larger.
age = int(input('Enter age: '))
if age < 13:
    print('You are a preteen.')
elif age < 20:
    print('You are a teenager.')
elif age <= 30:
    print('You are still young.')
else:
    print('You are one of the oldies.')
You cannot have empty statement blocks; to have a statement block that
does nothing, use the pass keyword.
Here’s the syntax summary, in which square brackets indicate optional
items, and the ellipses indicate a part of the syntax that can be repeated any
number of times.
Key Syntax
if condition:
    indented_statements
[ elif condition:
    indented_statements ]...
[ else:
    indented_statements ]

while condition:
    indented_statements
n = 10    # This may be set to any positive integer.
i = 1
while i <= n:
    print(i, end=' ')
    i += 1
Let’s try entering these statements in a function. But this time, the function
takes an argument, n. Each time it’s executed, the function can take a differ-
ent value for n.
>>> def print_nums(n):
        i = 1
        while i <= n:
            print(i, end=' ')
            i += 1
>>> print_nums(3)
1 2 3
>>> print_nums(7)
1 2 3 4 5 6 7
>>> print_nums(8)
1 2 3 4 5 6 7 8
It should be clear how this function works. The variable i starts as 1, and
it’s increased by 1 each time the loop is executed. The loop is executed again
as long as i is equal to or less than n. When i exceeds n, the loop stops, and no
further values are printed.
Optionally, the break statement can be used to exit from the nearest
enclosing loop. And the continue statement can be used to continue to the
next iteration of the loop immediately (going to the top of the loop) but not
exiting as break does.
Key Syntax
break
For example, you can use break to exit from an otherwise infinite loop.
True is a keyword that, like all words in Python, is case-sensitive. Capitaliza-
tion matters.
n = 10    # Set n to any positive integer.
i = 1
while True:    # Always executes!
    print(i)
    if i >= n:
        break
    i += 1
Note the use of i += 1. If you’ve been paying attention, this means the
same as the following:
i = i + 1 # Add 1 to the current value and reassign.
The second app (try it yourself!) is a complete computer game. It secretly
selects a random number between 1 and 50 and then requires you, the player,
to try to find the answer through repeated guesses.
The program begins by using the random package; we present more infor-
mation about that package in Chapter 11. For now, enter the first two lines as
shown, knowing they will be explained later in the book.
from random import randint
n = randint(1, 50)
while True:
    ans = int(input('Enter a guess: '))
    if ans > n:
        print('Too high! Guess again. ')
    elif ans < n:
        print('Too low! Guess again. ')
    else:
        print('Congrats! You got it!')
        break
To run, enter all this in a Python script (choose New from the File menu),
and then choose Run Module from the Run menu, as usual. Have fun.
All the operators in Table 1.2 are binary—that is, they take two oper-
ands—except not, which takes a single operand and reverses its logical value.
Here’s an example:
if not (age > 12 and age < 20):
    print('You are not a teenager.')
By the way, another way to write this—using a Python shortcut—is to write
the following:
if not (12 < age < 20):
    print('You are not a teenager.')
This is, as far as we know, a unique Python coding shortcut. In Python
3.0, at least, this example not only works but doesn’t even require parentheses
right after the if and not keywords, because logical not has low precedence
as an operator.
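To confirm that the chained comparison and the spelled-out version agree, the following sketch tests both at the boundary values. The function name is_teenager is an illustrative assumption, not part of the text.

```python
def is_teenager(age):
    # 12 < age < 20 is evaluated as (12 < age) and (age < 20).
    return 12 < age < 20

for age in (12, 13, 19, 20):
    chained = is_teenager(age)
    spelled_out = age > 12 and age < 20
    print(age, chained, spelled_out)

# Because not has lower precedence than the comparisons,
# the parentheses after not can be dropped.
age = 25
print(not 12 < age < 20)
```

The two forms produce the same result for every value; the chained form simply avoids repeating age.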
def function_name(arguments):
    indented_statements
In this syntax, arguments is a list of argument names, separated by com-
mas if there’s more than one. Here’s the syntax of the return statement:
return value
You can also return multiple values:
return value, value ...
Finally, you can omit the return value. If you do, the effect is the same as the
statement return None.
return # Same effect as return None
caller of the function. Reaching the end of a function causes an implicit return—
returning None by default. (Therefore, using return at all is optional.)
Technically speaking, Python argument passing is closer to “pass by ref-
erence” than “pass by value”; however, it isn’t exactly either. When a value is
passed to a Python function, that function receives a reference to the named
data. However, whenever the function assigns a new value to the argument
variable, it breaks the connection to the original variable that was passed.
Therefore, the following function does not do what you might expect. It
does not change the value of the variable passed to it.
def double_it(n):
    n = n * 2
x = 10
double_it(x)
print(x) # x is still 10!
This may at first seem a limitation, because sometimes a programmer needs
to create multiple “out” parameters. However, you can do that in Python by
returning multiple values directly. The calling statement must expect the values.
def set_values():
    return 10, 20, 30
a, b, c = set_values()
The variables a, b, and c are set to 10, 20, and 30, respectively.
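The earlier point about argument passing, that a function receives a reference but loses the connection when it assigns to the argument name, can be illustrated with a mutable object such as a list. This is a sketch under that reading; the function names rebind and mutate are hypothetical.

```python
def rebind(items):
    items = items + [99]    # Creates a new list; the caller's list is untouched.

def mutate(items):
    items.append(99)        # Changes the shared list object in place.

data = [1, 2, 3]
rebind(data)
print(data)                 # Unchanged by rebind.
mutate(data)
print(data)                 # Now ends with 99.
```

Assignment inside the function only changes what the local name refers to, whereas a method call such as append acts on the object that both the caller and the function can see.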
Because Python has no concept of data declarations, an argument list in
Python is just a series of comma-separated names—except that each may
optionally be given a default value. Here is an example of a function definition
with two arguments but no default values:
def calc_hyp(a, b):
    hyp = (a * a + b * b) ** 0.5
    return hyp
These arguments are listed without type declaration; Python functions do
no type checking except the type checking you do yourself! (However, you can
check a variable’s type by using the type or isinstance function.)
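As a sketch of the do-it-yourself type checking mentioned above, the version of calc_hyp below rejects nonnumeric arguments by using isinstance. The error message and the decision to raise TypeError are illustrative choices, not something the text prescribes.

```python
def calc_hyp(a, b):
    # Reject nonnumeric arguments explicitly; Python will not do it for us.
    for value in (a, b):
        if not isinstance(value, (int, float)):
            raise TypeError('calc_hyp expects numbers, got '
                            + type(value).__name__)
    return (a * a + b * b) ** 0.5

print(calc_hyp(3, 4))
try:
    calc_hyp('3', 4)
except TypeError as err:
    print('Rejected:', err)
```

Passing a tuple of types to isinstance checks for any of them in one call.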
Although arguments have no type, they may be given default values.
The use of default values enables you to write a function in which not all
arguments have to be specified during every function call. A default argument
has the following form:
argument_name = default_value
For example, the following function prints a value multiple times, but the
default number of times is 1:
def print_nums(n, rep=1):
    i = 1
    while i <= rep:
        print(n)
        i += 1
Here, the default value of rep is 1; so if no value is given for the last argument,
it’s given the value 1. Therefore this function call prints the number 5 one time:
print_nums(5)
The output looks like this:
5
Note: Because the function just shown uses n as an argument name, it’s natural
to assume that n must be a number. However, because Python has no
variable or argument declarations, there’s nothing enforcing that; n could just
as easily be passed a string.
But there are repercussions to data types in Python. In this case, a problem
can arise if you pass a nonnumber to the second argument, rep. The value
passed here is repeatedly compared to a number, so this value, if given, needs
to be numeric. Otherwise, an exception, representing a runtime error, is raised.
Named arguments, if used, must come at the end of the list of arguments.
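The rule about named arguments can be sketched as follows. The helper repeat_text and its parameters are hypothetical, introduced only to show positional and named arguments together.

```python
def repeat_text(text, rep=1, sep=' '):
    # Hypothetical helper: join rep copies of text, separated by sep.
    return sep.join([text] * rep)

print(repeat_text('ab'))                   # Positional only; defaults used.
print(repeat_text('ab', 3))                # Second argument by position.
print(repeat_text('ab', rep=3, sep='-'))   # Named arguments come last.
print(repeat_text('ab', sep='+', rep=2))   # Named arguments in any order.

# Placing a positional argument after a named one is a syntax error:
#     repeat_text(rep=3, 'ab')
```

Once an argument is given by name, every argument after it in the call must also be named.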
'H'
However, you cannot assign new values to characters within existing
strings, because Python strings are immutable: They cannot be changed.
How, then, can new strings be constructed? You do that by using a combi-
nation of concatenation and assignment. Here’s an example:
s1 = 'Abe'
s2 = 'Lincoln'
s1 = s1 + ' ' + s2
In this example, the string s1 started with the value 'Abe', but then it ends
up containing 'Abe Lincoln'.
This operation is permitted because a variable is only a name.
Therefore, you can “modify” a string through concatenation without actually
violating the immutability of strings. Why? It’s because each assignment cre-
ates a new association between the variable and the data. Here’s an example:
my_str = 'a'
my_str += 'b'
my_str += 'c'
The effect of these statements is to create the string 'abc' and to assign
it (or rather, reassign it) to the variable my_str. No string data was actually
modified, despite appearances. What’s really going on in this example is that
the name my_str is used and reused, to name an ever-larger string.
You can think of it this way: With every statement, a larger string is created
and then assigned to the name my_str.
In dealing with Python strings, there’s another important rule to keep in
mind: Indexing a string in Python produces a single character. In Python, a
single character is not a separate type (as it is in C or C++), but is merely a
string of length 1. The choice of quotation marks used has no effect on this
rule.
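Both rules, immutability and single-character indexing, can be checked in a few lines. This is a minimal sketch; the string 'Hello' and the variable names are illustrative.

```python
s = 'Hello'
first = s[0]               # Indexing yields a string of length 1.
print(first, type(first).__name__, len(first))

try:
    s[0] = 'J'             # Strings are immutable, so this fails.
except TypeError as err:
    print('TypeError:', err)

s = 'J' + s[1:]            # Build a new string and rebind the name.
print(s)
```

The final assignment does not alter the original string; it creates a new one and makes s a name for it.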
[ items ]
Here the square brackets are intended literally, and items is a list of zero or
more items, separated by commas if there are more than one. Here’s an exam-
ple, representing a series of high temperatures, in Fahrenheit, over a summer
weekend:
[78, 81, 81]
Lists can contain any kind of object (including other lists!) and, unlike C or
C++, Python lets you mix the types. For example, you can have lists of strings:
['John', 'Paul', 'George', 'Ringo' ]
And you can have lists that mix up the types:
['John', 9, 'Paul', 64 ]
However, lists that have mixed types cannot be automatically sorted in
Python 3.0, and sorting is an important feature.
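The sorting restriction is easy to demonstrate. In the sketch below, the workaround of sorting by each item's string form is our own suggestion rather than something from the text, and it gives a textual ordering, not a numeric one.

```python
mixed = ['John', 9, 'Paul', 64]
try:
    mixed.sort()
except TypeError as err:
    print('Cannot sort:', err)

# One possible workaround: compare the items by their string form.
as_text = sorted(mixed, key=str)
print(as_text)
```

Whether such a workaround makes sense depends on the data; keeping each list to a single type usually avoids the question.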
Unlike some other Python collection classes (dictionaries and sets), order
is significant in a list, and duplicate values are allowed. But it’s the long list of
built-in capabilities (all covered in Chapter 3) that makes Python lists really
impressive. In this section we use two: append, which adds an element to a list
dynamically, and the aforementioned sort capability.
Here’s a slick little program that showcases the Python list-sorting capabil-
ity. Type the following into a Python script and run it.
a_list = []
while True:
    s = input('Enter name: ')
    if not s:
        break
    a_list.append(s)
a_list.sort()
print(a_list)
Wow, that’s incredibly short! But does it work? Here’s a sample session:
Enter name: John
Enter name: Paul
Enter name: George
Enter name: Ringo
Enter name: Brian
Enter name:
['Brian', 'George', 'John', 'Paul', 'Ringo']
the group and now all are printed in alphabetical order.
This little program, you should see, prompts the user to enter one name at
a time; as each is entered, it’s added to the list through the append method.
Finally, when an empty string is entered, the loop breaks. After that, the
list is sorted and printed.
It may seem that this loop should double each element of my_lst, but it
does not. To process a list in this way, changing values in place, it’s necessary
to use indexing.
my_lst = [10, 15, 25]
for i in [0, 1, 2]:
    my_lst[i] *= 2
This has the intended effect: doubling each individual element of my_lst,
so that now the list data is [20, 30, 50].
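The loop that fails to double the elements is not reproduced in this excerpt. The sketch below assumes it iterated over the elements directly, and contrasts that with the indexed version; the loop variable name item is an assumption.

```python
my_lst = [10, 15, 25]
for item in my_lst:
    item *= 2          # Rebinds the loop variable only; the list is unchanged.
print(my_lst)

for i in [0, 1, 2]:
    my_lst[i] *= 2     # Assigns through the index, so the list changes.
print(my_lst)
```

In the first loop, item is just a name for each value in turn, so reassigning it has no effect on the list.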
To index into a list this way, you need to create a sequence of indexes of the
form
0, 1, 2, ... N-1
in which N is the length of the list. You can automate the production of such
sequences of indexes by using the range function. For example, to double
every element of a list of length 5, use this code:
my_lst = [100, 102, 50, 25, 72]
for i in range(5):
    my_lst[i] *= 2
This code fragment is not optimal because it hard-codes the length of the
list, that length being 5, into the code. Here is a better way to write this loop:
my_lst = [100, 102, 50, 25, 72]
for i in range(len(my_lst)):
    my_lst[i] *= 2
After this loop is executed, my_lst contains [200, 204, 100, 50, 144].
The range function produces a sequence of integers as shown in Table 1.3,
depending on whether you specify one, two, or three arguments.
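Table 1.3 is not reproduced in this excerpt, so the following sketch shows the three forms of range directly. Wrapping each call in list is done only to make the sequence visible; the variable names are illustrative.

```python
one_arg = list(range(5))           # 0 up to but not including 5
two_args = list(range(1, 6))       # start at 1, stop before 6
three_args = list(range(0, 10, 3)) # start, stop, step
countdown = list(range(5, 0, -1))  # a negative step counts down

print(one_arg)
print(two_args)
print(three_args)
print(countdown)
```

In every form, the stop value itself is excluded from the sequence.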
integers. For example, the following loop calculates a factorial number.
n = int(input('Enter a positive integer: '))
prod = 1
for i in range(1, n + 1):
    prod *= i
print(prod)
This loop works because range(1, n + 1) produces integers beginning
with 1 up to but not including n + 1. This loop therefore has the effect of
doing the following calculation:
1 * 2 * 3 * ... n
1.17 Tuples
The Python concept of tuple is closely related to that of lists; if anything, the
concept of tuple is even more fundamental. The following code returns a list
of integers:
def my_func():
    return [10, 20, 5]
This function returns values as a list.
my_lst = my_func()
But the following code, returning a simple series of values, actually returns
a tuple:
def a_func():
    return 10, 20, 5
It can be called as follows:
a, b, c = a_func()
Note that a tuple is a tuple even if it’s grouped within parentheses for clar-
ity’s sake.
return (10, 20, 5)    # Parens have no effect in this case.
The basic properties of a tuple and a list are almost the same: Each is an
ordered collection, in which any number of repeated values are allowed.
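A short sketch confirms that a series of returned values really is a tuple, and that it can be indexed and measured like a list. The variable name result is illustrative.

```python
def a_func():
    return 10, 20, 5

result = a_func()
print(type(result).__name__, result)

a, b, c = result                 # Unpack into three names.
print(a, b, c)
print(result[0], len(result))    # Tuples support indexing and len, like lists.
```

Unpacking requires the number of names on the left to match the number of values in the tuple.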
1.18 Dictionaries
A Python dictionary is a collection that contains a series of associations
between key-value pairs. Unlike lists, dictionaries are specified with curly
braces, not square brackets.
grade_dict = { }
Additional rules apply to selecting types for use in dictionaries:
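◗ Keys of different types are allowed, but it is usually best for all the keys
to share the same type, or at least a compatible type, such as integers and
floating point. In Python version 3.0, keys of incompatible types (such as
strings and integers) cannot be compared, so they cannot be sorted together.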
◗ The key type should be immutable (data you cannot change “in place”).
Strings and tuples are immutable, but lists are not.
◗ Therefore, lists such as [1, 2] cannot be used for keys, but tuples, such as
(1, 2), can.
◗ The values may be of any type; however, it is often a good idea to use the same
type, if possible, for all the value objects.
There’s a caution you should keep in mind. If you attempt to get the value
for a particular key and if that key does not exist, Python raises an exception.
To avoid this, use the get method, which returns a default value instead of
raising an exception when the key is missing.
Key Syntax
[Link](key [,default_value])
In this syntax, the square brackets indicate an optional item. If the key
exists, its corresponding value in the dictionary is returned. Otherwise, the
default_value is returned, if specified; or None is returned if there is no
such default value. This second argument enables you to write efficient histo-
gram code such as the following, which counts frequencies of words.
s = (input('Enter a string: ')).split()
wrd_counter = {}
for wrd in s:
    wrd_counter[wrd] = wrd_counter.get(wrd, 0) + 1
What this example does is the following: When it finds a new word, that
word is entered into the dictionary with the value 0 + 1, or just 1. If it finds an
existing word, that word frequency is returned by get, and then 1 is added to
it. So if a word is found, its frequency count is incremented by 1. If the word
is not found, it’s added to the dictionary with a starting count of 1. Which is
what we want.
In this example, the split method of the string class is used to divide a
string into a list of individual words. For more information on split, see Sec-
tion 2.12, “Breaking Up Input Using ‘split’.”
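The following sketch runs the same counting loop on a fixed string instead of keyboard input, and then contrasts get with direct indexing for a missing key. The sample sentence and the key 'dog' are illustrative.

```python
text = 'the cat and the hat and the bat'
wrd_counter = {}
for wrd in text.split():
    wrd_counter[wrd] = wrd_counter.get(wrd, 0) + 1
print(wrd_counter)

print(wrd_counter.get('dog'))       # Missing key, no default: None.
print(wrd_counter.get('dog', 0))    # Missing key with a default value.
try:
    wrd_counter['dog']              # Direct indexing raises KeyError.
except KeyError as err:
    print('KeyError:', err)
```

Note that get never adds the missing key to the dictionary; only the assignment in the loop does that.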
1.19 Sets
Sets are similar to dictionaries, but they lack associated values. A set, in effect,
is only a set of unique keys, which has the effect of making a set different from
a list in the following ways:
◗ All its members must be unique. An attempt to add an existing value to a set is
simply ignored.
◗ All its members should be immutable, as with dictionary keys.
◗ Order is never significant.
operation; it has all those elements that appear in one set or the other but not
both. setSub contains elements that are in the first set (setA in this case) but
not the second (setB).
Appendix C, “Set Methods,” lists all the methods supported by the set
class, along with examples for most of them.
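The example these sentences describe is not reproduced in this excerpt. The sketch below reconstructs the idea using the names setA, setB, and setSub from the text; the set contents and the names setUnion, setIntersect, and setXor are assumptions.

```python
setA = {1, 2, 3, 4}
setB = {3, 4, 5, 6}

setUnion = setA | setB        # Elements in either set.
setIntersect = setA & setB    # Elements in both sets.
setXor = setA ^ setB          # In one set or the other but not both.
setSub = setA - setB          # In setA but not in setB.

print(setUnion, setIntersect, setXor, setSub)

setA.add(1)                   # Adding an existing member is simply ignored.
print(len(setA))
```

Because order is not significant in a set, the printed order of the elements may differ from the order in which they were written.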
def funcB():
    print(count)    # Prints 10, the global value.
Do you see how this works? The first function in this example uses its
own local version of count, because such a variable was created within that
function.
But the second function, funcB, created no such variable. Therefore, it uses the
global version, which was created in the first line of the example (count = 10).
The difficulty occurs when you want to refer to a global version of a vari-
able, but you make it the target of an assignment statement. Python has no
def my_func():
    global count
    count += 1
global foo—and therefore foo is created as a global variable. This works
even though the assignment to foo is not part of module-level code.
In general, there is a golden rule about global and local variables in Python.
It’s simply this:
✱ If there’s any chance that a function might attempt to assign a value to a global
variable, use the global statement so that it’s not treated as local.
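The golden rule can be seen in action with two nearly identical functions. This is a minimal sketch intended to run at module level; the function names broken and fixed are hypothetical.

```python
count = 10

def broken():
    count += 1     # No global statement, so count is treated as local.

def fixed():
    global count
    count += 1     # Updates the module-level variable.

try:
    broken()
except UnboundLocalError as err:
    print('UnboundLocalError:', err)

fixed()
print(count)
```

The assignment inside broken makes count local to that function, so the attempt to read its current value fails before anything is assigned.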
Chapter 1 Summary
Chapter 1 covers the fundamentals of Python except for class definitions,
advanced operations on collections, and specialized parts of the library such
as file operations. The information presented here is enough to write many
Python programs.
So congratulations! If you understand everything in this chapter, you are
already well on the way to becoming a fluent Python programmer. The next
couple of chapters plunge into the fine points of lists and strings, the two most
important kinds of collections.
Chapter 3 covers something called “comprehension” in Python (not
to be confused with artificial intelligence) and explains how comprehension
applies not only to lists but also to sets, dictionaries, and other collections. It
also shows you how to use lambda functions.
6 Explain precisely why tab characters can cause a problem with the indentations
used in a Python program (and thereby introduce syntax errors).
7 What is the advantage of having to rely so much on indentation in Python?
8 How many different values can a Python function return to the caller?
9 Recount this chapter’s solution to the forward reference problem for func-
tions. How can such an issue arise in the first place?
10 When you’re writing a Python text string, what, if anything, should guide
your choice of what kind of quotation marks to use (single, double, or triple)?
11 Name at least one way in which Python lists are different from arrays in other
languages, such as C, which are contiguously stored collections of a single
base type.
type(data_object)
The action is to take the specified data_object and produce the result
after converting it to the specified type, if the appropriate conversion exists.
If not, Python raises an exception: typically ValueError for a string that
cannot be parsed, or TypeError for an unsupported argument type.
Here are some examples:
s = '45'
n = int(s)
x = float(s)
If you then print n and x, you get the following:
45
45.0
Likewise, you can use other bases with the int conversion. The following
code uses octal (8) and hexadecimal (16) bases.
n1 = int('775', 8)
n2 = int('1E', 16)
print('775 octal and 1E hex:', n1, n2)
These statements print the following results:
775 octal and 1E hex: 509 30
We can therefore summarize the int conversion as taking an optional sec-
ond argument, which has a default value of 10, indicating decimal radix.
Key Syntax
int(data_object, radix=10)
The int and float conversions are necessary when you get input from the
keyboard (usually by using the input function) or from a text file, and you
need to convert the digit characters into an actual numeric value.
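Because text from the keyboard or a file may not be a valid number, a conversion is often wrapped in exception handling. The following is a minimal sketch; the helper name to_int is hypothetical, not part of Python.

```python
def to_int(text, radix=10):
    """Return text converted to an int, or None if it is not valid."""
    try:
        return int(text, radix)
    except ValueError:
        return None

print(to_int('45'))        # 45
print(to_int('1E', 16))    # 30
print(to_int('775', 8))    # 509
print(to_int('4x5'))       # None
```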
A str conversion works in the opposite direction. It converts a number into
its string representation. In fact, it works on any type of data for which the
type defines a string representation.
Converting a number to a string enables you to do operations such as
counting the number of printable digits or counting the number of times a
specific digit occurs. For example, the following statements print the length of
the number 1007.
n = 1007
s = str(n) # Convert to '1007'
print('The length of', n, 'is', len(s), 'digits.')
This example prints the following output:
The length of 1007 is 4 digits.
There are other ways to get this same information. You could, for exam-
ple, use the mathematical operation that takes the base-10 logarithm. But
this example suggests what you can do by converting a number to its string
representation.
Note Ë Converting a number to its string representation is not the same as con-
verting a number to its ASCII or Unicode number. That's a different opera-
tion, and it must be done one character at a time by using the ord function.
Ç Note
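To make the distinction concrete, this short sketch converts a number to its string representation, counts one of its digits, and then gets the character code of each digit separately by calling ord. The variable names are chosen only for illustration.

```python
n = 1007
s = str(n)                       # '1007'
zero_count = s.count('0')        # How many times the digit 0 occurs
codes = [ord(ch) for ch in s]    # One character code per digit

print(zero_count)    # 2
print(codes)         # [49, 48, 48, 55]
```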
str1 != str2 Returns True if str1 and str2 have different contents.
str1 < str2 Returns True if str1 is earlier in alphabetical ordering than str2. For
example, 'abc' < 'def' returns True, but 'abc' < 'aaa' returns False.
(See the note about ordering.)
str1 > str2 Returns True if str1 is later in alphabetical ordering than str2. For example,
'def' > 'abc' returns True, but 'def' > 'xyz' returns False.
str1 <= str2 Returns True if str1 is earlier than str2 in alphabetical ordering or if the
strings have the same content.
str1 >= str2 Returns True if str1 is later than str2 in alphabetical ordering or if the
strings have the same content.
str1 + str2 Produces the concatenation of the two strings, which is the result of simply
gluing str2 contents onto the end of str1. For example, 'Big' + 'Deal'
produces the concatenated string 'BigDeal'.
str1 * n Produces the result of a string concatenated onto itself n times, where n is an
integer. For example, 'Goo' * 3 produces 'GooGooGoo'.
n * str1 Same effect as str1 * n.
str1 in str2 Produces True if the substring str1, in its entirety, is contained in str2.
str1 not in str2 Produces True if the substring str1 is not contained in str2.
str is obj Returns True if str and obj refer to the same object in memory; sometimes
necessary for comparisons to None or to an unknown object type.
str is not obj Returns True if str and obj do not refer to the same object in memory.
Note Ë When strings are compared, Python uses a form of alphabetical order;
more specifically, it uses code point order, which looks at ASCII or Unicode
values of the characters. In this order, all uppercase letters precede all lower-
case letters, but otherwise letters involve alphabetical comparisons, as you'd
expect. Digit comparisons also work as you’d expect, so that '1' is less than '2'.
Ç Note
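The following sketch shows a few consequences of code point ordering. The word list is made up for the example, and passing key=str.lower to sorted is one common way, among others, to get a case-insensitive ordering.

```python
print('Zebra' < 'apple')    # True: 'Z' (90) precedes 'a' (97)
print('abc' < 'abd')        # True
print('10' < '9')           # True: strings are not compared as numbers

words = ['pear', 'Zebra', 'apple']
print(sorted(words))                   # ['Zebra', 'apple', 'pear']
print(sorted(words, key=str.lower))    # ['apple', 'pear', 'Zebra']
```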
Concatenation does not automatically add a space between two words. You
have to do that yourself. But all strings, including literal strings such as ' ',
have the same type, str, so Python has no problem carrying out the following:
first = 'Will'
last = 'Shakespeare'
full_name = first + ' ' + last
print(full_name)
This example prints
Will Shakespeare
The string-multiplication operator (*) can be useful when you’re doing
character-oriented graphics and want to initialize a long line—a divider, for
example.
divider_str = '_' * 30
print(divider_str)
This prints the following:
______________________________
The result of this operation, '_' * 30, is a string made up of 30 underscores.
Be careful not to abuse the is and is not operators. These operators test
for whether or not two values are the same object in memory. You could have
two string variables, for example, which both contain the value "cat". Test-
ing them for equality (==) will always return True in this situation, but obj1
is obj2 might not.
When should you use is or is not? You should use them primarily when
you’re comparing objects of different types, for which the appropriate test for
equality (==) might not be defined. One such case is testing to see whether
some value is equal to the special value None, which is unique and therefore
appropriate to test using is.
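Here is a minimal sketch of both points. The function name describe is hypothetical. The second half builds a string at run time so that two variables hold equal contents; whether they are the same object in memory depends on the Python implementation, so only the equality test is shown with a result.

```python
def describe(value=None):
    # None is a single, unique object, so an identity test is appropriate.
    if value is None:
        return 'no value given'
    return 'value: ' + str(value)

print(describe())     # no value given
print(describe(0))    # value: 0

a = 'cat'
b = ''.join(['c', 'a', 't'])    # Same contents, built at run time
print(a == b)                   # True
# a is b may be True or False; do not rely on it.
```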
place within the string.
◗ Slicing is an ability more distinctive of Python. It enables you to refer to an
entire substring of characters by using a compact syntax.
✱ You cannot use indexing, slicing, or any other operation to change values of a
string “in place,” because strings are immutable.
You can use both positive (nonnegative) and negative indexes in any combi-
nation. Figure 2.1 illustrates how positive indexes run from 0 to N–1, where N
is the length of the string.
This figure also illustrates negative indexes, which run backward from –1
(indicating the last character) to –N.
K i n g M e !
0 1 2 3 4 5 6 7
K i n g M e !
–8 –7 –6 –5 –4 –3 –2 –1
Figure 2.1. String indexing in Python
Suppose you want to remove the beginning and last characters from a
string. In this case, you’ll want to combine positive and negative indexes. Start
with a string that includes opening and closing double quotation marks.
king_str = '"Henry VIII"'
If you print this string directly, you get the following:
"Henry VIII"
But what if you want to print the string without the quotation marks? An
easy way to do that is by executing the following code:
new_str = king_str[1:-1]
print(new_str)
The output is now
Henry VIII
Figure 2.2 illustrates how this works. In slicing operations, the slice begins
with the first argument, up to but not including the second argument.
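[Figure: the string "Henry VIII", including its quotation marks, with index 0
at the opening quotation mark, index 1 at H, and index –1 at the closing
quotation mark. The sliced section includes 1, up to but not including –1.]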
Figure 2.2. String slicing example 1
Here’s another example. Suppose we’d like to extract the second word,
“Bad,” from the phrase “The Bad Dog.” As Figure 2.3 illustrates, the correct
slice would begin with index 4 and extend to all the characters up to but not
including index 7. The string could therefore be accessed as string[4:7].
string[4:7]
0 1 2 3 4 5 6 7 8 9 10
T h e B a d D o g
◗ If both beg and end are positive indexes, end - beg gives the maximum length
of the slice.
◗ To get a string containing the first N characters of a string, use string[:N].
◗ To get a string containing the last N characters of a string, use string[-N:].
◗ To cause a complete copy of the string to be made, use string[:].
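The rules in this list can be tried out with a short sketch. It reuses the phrase from the earlier example; the variable name s is arbitrary. Note that an out-of-range end index simply shortens the slice, which is why the difference between the two indexes is only a maximum length.

```python
s = 'The Bad Dog'

print(s[4:7])         # Bad
print(len(s[4:7]))    # 3, which is 7 - 4
print(s[:3])          # The
print(s[-3:])         # Dog
print(s[4:100])       # Bad Dog
print(s[:] == s)      # True
```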
Slicing permits a third, and optional, step argument. When positive, the
step argument specifies how many characters to move ahead at a time. A
step argument of 2 means “Get every other character.” A step argument of
3 means “Get every third character.” For example, the following statements
start with the second character in 'RoboCop' and then step through the string
two characters at a time.
a_str = 'RoboCop'
b_str = a_str[1::2] # Get every other character.
print(b_str)
This example prints the following:
ooo
Here’s another example. A step value of 3 means “Get every third charac-
ter.” This time the slice, by default, starts in the first position.
a_str = 'AbcDefGhiJklNop'
b_str = a_str[::3] # Get every third character.
print(b_str)
This example prints the following:
ADGJN
You can even use a negative step value, which causes the slicing to be per-
formed backward through the string. For example, the following function
returns the exact reverse of the string fed to it as an argument.
def reverse_str(s):
return s[::-1]
lowing example confirms that the ASCII code for the letter A is decimal 65.
print(ord('A')) # Print 65.
The chr function is the inverse of the ord function. It takes a character code
and returns its ASCII or Unicode equivalent, as a string of length 1. Calling chr
with an argument of 65 should therefore print a letter A, which it does.
print(chr(65)) # Print 'A'
The in and not in operators, although not limited to use with one-character
strings, often are used that way. For example, the following statements test
whether the first character of a string is a vowel:
s = 'elephant'
if s[0] in 'aeiou':
print('First char. is a vowel.')
Conversely, you could write a consonant test.
s = 'Helephant'
if s[0] not in 'aeiou':
print('First char. is a consonant.')
One obvious drawback is that these examples do not work correctly on
uppercase letters. Here’s one way to fix that:
if s[0] in 'aeiouAEIOU':
print('First char. is a vowel.')
Alternatively, you can convert a character to uppercase before testing it;
that has the effect of creating a case-insensitive comparison.
s = 'elephant'
if s[0].upper() in 'AEIOU':
print('First char. is a vowel.')
You can also use in and not in to test substrings that contain more than
one character. In that case, the entire substring must be found to produce True.
'bad' in 'a bad dog' # True!
Is there bad in a bad dog? Yes, there is.
immutability, they actually do not.
a_str = 'Big '
a_str += 'Bad '
a_str += 'John'
This technique, of using =, +, and += to build strings, is adequate for simple
cases involving a few objects. For example, you could build a string contain-
ing all the letters of the alphabet as follows, using the ord and chr functions
introduced in Section 2.5, “Single-Character Operations (Character Codes).”
n = ord('A')
s = ''
for i in range(n, n + 26):
s += chr(i)
This example has the virtue of brevity. But it causes Python to create
entirely new strings in memory, over and over again.
An alternative, which is slightly better, is to use the join method.
Key Syntax
separator_string.join(list)
This method joins together all the strings in list to form one large string.
If this list has more than one element, the text of separator_string is placed
between each consecutive pair of strings. An empty string is a valid separator
string; in that case, all the strings in the list are simply joined together.
Use of join is usually more efficient at run time than concatenation, although
you probably won’t see the difference in execution time unless there are a great
many elements.
n = ord('A')
a_lst = [ ]
for i in range(n, n + 26):
a_lst.append(chr(i))
s = ''.join(a_lst)
The join method concatenates all the strings in a_lst, a list of strings,
into one large string. The separator string is empty in this case.
Performance Tip Ë The advantage of join over simple concatenation can be seen
in large cases involving thousands of operations. The drawback of concatenation
in such cases is that Python has to create thousands of strings of increasing
size, which are used once and then thrown away, through “garbage
collection.” But garbage collection exacts a cost in execution time, assuming it
is run often enough to make a difference.
Ç Performance Tip
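If you want to measure the difference yourself, the standard timeit module offers one way to do it. The following is only a sketch: the function names build_concat and build_join and the sizes used are arbitrary choices, and the timings vary by machine and Python version. Some implementations optimize repeated += on strings, so the gap may be small.

```python
import timeit

def build_concat(n):
    s = ''
    for i in range(n):
        s += 'x'              # May create a new string each time
    return s

def build_join(n):
    parts = []
    for i in range(n):
        parts.append('x')     # Collect the pieces first
    return ''.join(parts)     # Build the final string once

# Both approaches should produce identical results.
print(build_concat(1000) == build_join(1000))

print('concat:', timeit.timeit(lambda: build_concat(10000), number=20))
print('join:  ', timeit.timeit(lambda: build_join(10000), number=20))
```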
Here’s a case in which the approach of using join is superior: Suppose you
want to write a function that takes a list of names and prints them one at a
time, nicely separated by commas. Here’s the hard way to write the code:
def print_nice(a_lst):
s = ''
for item in a_lst:
s += item + ', '
if len(s) > 0: # Get rid of trailing
# comma+space
s = s[:-2]
print(s)
Given this function definition, we can call it on a list of strings.
print_nice(['John', 'Paul', 'George', 'Ringo'])
This example prints the following:
John, Paul, George, Ringo
Here’s the version using the join method:
def print_nice(a_lst):
print(', '.join(a_lst))
That’s quite a bit less code!
One of the most important functions is len, which can be used with any
of the standard collection classes to determine the number of elements. In
the case of strings, this function returns the number of characters. Here’s an
example:
dog1 = 'Jaxx'
dog2 = 'Cutie Pie'
print(dog1, 'has', len(dog1), 'letters.')
print(dog2, 'has', len(dog2), 'letters.')
This prints the following strings. Note that “Cutie Pie” has nine letters
because it counts the space.
Jaxx has 4 letters.
Cutie Pie has 9 letters.
The reversed and sorted functions produce an iterator and a list, respec-
tively, rather than strings. However, the output from these data objects can
be converted back into strings by using the join method. Here’s an example:
a_str = ''.join(reversed('Wow,Bob,wow!'))
print(a_str)
b_str = ''.join(sorted('Wow,Bob,wow!'))
print(b_str)
This prints the following:
!wow,boB,woW
!,,BWbooowww
ter. This requires that each word be capitalized and that no uppercase
letter appear anywhere but at the beginning of a word. There may be
whitespace and punctuation characters in between words.
str.isupper() All letters in the string are uppercase, and there is at least one letter.
(There may, however, be nonalphabetic characters.)
These functions are valid for use with single-character strings as well as
longer strings. The following code illustrates the use of both.
h_str = 'Hello'
if h_str[0].isupper():
print('First letter is uppercase.')
if h_str.isupper():
print('All chars are uppercase.')
else:
print('Not all chars are uppercase.')
This example prints the following:
First letter is uppercase.
Not all chars are uppercase.
This string would also pass the test for being a title, because the first letter
is uppercase and the rest are not.
if h_str.istitle():
print('Qualifies as a title.')
The effects of the lower and upper methods are straightforward. The first
converts each uppercase letter in a string to a lowercase letter; the second does
the converse, converting each lowercase letter to an uppercase letter. Nonletter
characters are not altered but kept in the string as is.
The result, after conversion, is then returned as a new string. The original
string data, being immutable, isn’t changed “in place.” But the following state-
ments do what you’d expect.
my_str = "I'm Henry VIII, I am!"
new_str = my_str.upper()
my_str = new_str
The last two steps can be efficiently merged:
my_str = my_str.upper()
If you then print my_str, you get the following:
I'M HENRY VIII, I AM!
The swapcase method is used only rarely. The string it produces has an
uppercase letter where the source string had a lowercase letter, and vice versa.
For example:
my_str = "I'm Henry VIII, I am!"
my_str = my_str.swapcase()
print(my_str)
This prints the following:
i'M hENRY viii, i AM!
me_str = 'John Bennett, PhD'
is_doc = me_str.endswith('PhD')
These methods, startswith and endswith, can be used on an empty
string without raising an error. If the substring is empty, the return value is
always True.
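A short sketch of these edge cases follows, reusing me_str from the example above.

```python
me_str = 'John Bennett, PhD'

print(me_str.startswith('John'))    # True
print(me_str.endswith('MD'))        # False
print(me_str.endswith(''))          # True: empty substring
print(''.startswith('John'))        # False: empty target, no error
print(''.startswith(''))            # True
```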
Now let’s look at other search-and-replace methods of Python strings.
Key Syntax
n = frank_str.count('doo')
print(n) # Print 3.
You can optionally use the start and end arguments with this same
method call.
print(frank_str.count('doo', 1)) # Print 2
print(frank_str.count('doo', 1, 10)) # Print 1
A start argument of 1 specifies that counting begins with the second char-
acter. If start and end are both used, then counting happens over a target
string beginning with start position up to but not including the end position.
These arguments are zero-based indexes, as usual.
If either or both of the arguments (start, end) are out of range, the count
method does not raise an exception but works on as many characters as it can.
Similar rules apply to the find method. A simple call to this method finds
the first occurrence of the substring argument and returns the nonnegative
index of that instance; it returns –1 if the substring isn’t found.
frank_str = 'doo be doo be doo...'
print(frank_str.find('doo')) # Print 0
print(frank_str.find('doob')) # Print -1
If you want to find the positions of all occurrences of a substring, you can
call the find method in a loop, as in the following example.
frank_str = 'doo be doo be doo...'
n = -1
while True:
n = frank_str.find('doo', n + 1)
if n == -1:
break
print(n, end=' ')
This example prints every index at which an instance of 'doo' can be
found.
0 7 14
This example works by taking advantage of the start argument. After
each successful call to the find method, the initial searching position, n, is set
to the previous successful find index and then is adjusted upward by 1. This
guarantees that the next call to the find method must look for a new instance
of the substring.
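The same loop can be packaged as a reusable function that returns the positions rather than printing them. This is a sketch only; find_all is a hypothetical helper name, not a built-in string method. Because the search restarts one position past the previous match, overlapping occurrences are reported as well.

```python
def find_all(text, sub):
    """Return a list of every index at which sub starts in text."""
    positions = []
    n = text.find(sub)
    while n != -1:
        positions.append(n)
        n = text.find(sub, n + 1)    # Resume just past the last match
    return positions

print(find_all('doo be doo be doo...', 'doo'))    # [0, 7, 14]
print(find_all('aaaa', 'aa'))                     # [0, 1, 2]
print(find_all('doo be doo', 'xyz'))              # []
```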
If the find operation fails to find any occurrences, it returns a value of –1.
The index and rfind methods are almost identical to the find method,
with a few differences. The index method does not return –1 when it fails to
find an occurrence of the substring. Instead it raises a ValueError exception.
The rfind method searches for the last occurrence of the substring argu-
ment. By default, this method starts at the end and searches to the left. How-
ever, this does not mean it looks for a reverse of the substring. Instead, it
searches for a regular copy of the substring, and it returns the starting index
number of the last occurrence—that is, where the last occurrence starts.
frank_str = 'doo be doo be doo...'
print(frank_str.rfind('doo')) # Prints 14.
The example prints 14 because the rightmost occurrence of 'doo' starts in
zero-based position 14.
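The differences among find, index, and rfind can be seen side by side in this brief sketch. The last line assumes that rfind accepts the same optional start and end arguments as find, which limits the search to that range.

```python
frank_str = 'doo be doo be doo...'

print(frank_str.find('xyz'))    # -1
try:
    frank_str.index('xyz')
except ValueError:
    print('index raised ValueError')

print(frank_str.rfind('doo'))           # 14
print(frank_str.rfind('doo', 0, 14))    # 7
```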
title = '25 Hues of Grey'
new_title = title.replace('Grey', 'Gray')
Printing new_title produces this:
25 Hues of Gray
The next example illustrates how replace works on multiple occurrences
of the same substring.
title = 'Greyer Into Grey'
new_title = title.replace('Grey', 'Gray')
The new string is now
Grayer Into Gray
input_str.split(delim_string=None)
The call to this method returns a list of substrings taken from input_str.
The delim_string specifies a string that serves as the delimiter; this
is a substring used to separate one token from another.
If delim_string is omitted or is None, then the behavior of split is to, in
effect, use any sequence of one or more whitespace characters (spaces, tabs,
and newlines) to distinguish one token from the next.
For example, the split method, using the default whitespace delimiter,
can be used to break up a string containing several names.
stooge_list = 'Moe Larry Curly Shemp'.split()
The resulting list, if printed, is as follows:
['Moe', 'Larry', 'Curly', 'Shemp']
The behavior of split with a None or default argument uses any number
of white spaces in a row as the delimiter. Here’s an example:
stooge_list = 'Moe    Larry Curly  Shemp'.split()
If, however, a delimiter string is specified, it must be matched precisely to
recognize a divider between one token and the next.
stooge_list = 'Moe    Larry Curly  Shemp'.split(' ')
In this case, the split method recognizes an extra string—although it is
empty—wherever there’s an extra space. That might not be the behavior you
want. The example just shown would produce the following:
['Moe', '', '', '', 'Larry', 'Curly', '', 'Shemp']
Another common delimiter string is a comma, or possibly a comma com-
bined with a space. In the latter case, the delimiter string must be matched
exactly. Here’s an example:
stooge_list = 'Moe, Larry, Curly, Shemp'.split(', ')
In contrast, the following example uses a simple comma as delimiter. This
example causes the tokens to contain the extra spaces.
stooge_list = 'Moe, Larry, Curly, Shemp'.split(',')
The result in this case includes a leading space in the last three of the four
string elements:
['Moe', ' Larry', ' Curly', ' Shemp']
If you don’t want those leading spaces, an easy solution is to use stripping,
as shown next.
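One way to combine the two operations is to split on a bare comma and then strip each token. The sketch below uses a list comprehension for brevity; the input string and the name raw are made up for the example.

```python
raw = 'Moe, Larry , Curly,Shemp'

# Split on commas, then strip spaces from both ends of each token.
stooge_list = [name.strip() for name in raw.split(',')]
print(stooge_list)    # ['Moe', 'Larry', 'Curly', 'Shemp']
```

This tolerates inconsistent spacing around the commas, which an exact delimiter such as ', ' would not.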
2.13 Stripping
Once you retrieve input from the user or from a text file, you may want to
place it in the correct format by stripping leading and trailing spaces. You
might also want to strip leading and trailing “0” digits or other characters.
The str class provides several methods to let you perform this stripping.
Key Syntax
trailing asterisks (*) as well as all leading or trailing “0” digits and plus signs (+).
Internal instances of the character to be stripped are left alone. For exam-
ple, the following statement strips leading and trailing spaces but not the space
in the middle.
name_str = ' Will Shakes '
new_str = name_str.strip()
Figure 2.4 illustrates how this method call works.
W i l l S h a k e s
W i l l S h a k e s
Figure 2.4. Python stripping operations
◗ The text of str is placed in a larger print field of size specified by width.
◗ If the string text is shorter than the specified length, the text is justified left,
right, or centered, as appropriate. The center method slightly favors left jus-
tification if it cannot be centered perfectly.
◗ The rest of the result is padded with the fill character. If this fill character is
not specified, then the default value is a white space.
Here’s an example:
new_str = 'Help!'.center(10, '#')
print(new_str)
This example prints
##Help!###
Another common fill character (other than a space) is the digit character
“0”. Number strings are typically right justified rather than left justified.
Here’s an example:
new_str = '750'.rjust(6, '0')
print(new_str)
This example prints
000750
The zfill method provides a shorter, more compact way of doing the
same thing: padding a string of digits with leading “0” characters.
s = '12'
print(s.zfill(7))
But the zfill method is not just a shortcut for rjust; instead, with zfill,
the zero padding becomes part of the number itself, so the zeros are printed
between the number and the sign:
>>> '-3'.zfill(5)
'-0003'
>>> '-3'.rjust(5, '0')
'000-3'
Chapter 2 Summary
The Python string type (str) is an exceptionally powerful data type, even in
comparison to strings in other languages. String methods include the abilities
to tokenize input (splitting); remove leading and trailing spaces (stripping);
convert to numeric formats; and print numeric expressions in any radix.
The built-in search abilities include methods for counting and finding sub-
strings (count, find, and index) as well as the ability to do text replacement.
Chapter 2 Review Questions
1 Does assignment to an indexed character of a string violate Python’s immuta-
bility for strings?
2 Does string concatenation, using the += operator, violate Python’s immutabil-
ity for strings? Why or why not?
3 How many ways are there in Python to index a given character?
4 How, precisely, are indexing and slicing related?
5 What is the exact data type of an indexed character? What is the data type of a
substring produced from slicing?
6 In Python, what is the relationship between the string and character “types”?
7 Name at least two operators and one method that enable you to build a larger
string out of one or more smaller strings.
8 If you are going to use the index method to locate a substring, what is the
advantage of first testing the target string by using in or not in?
9 Which built-in string methods, and which operators, produce a simple
Boolean (true/false) result?
To paraphrase the Lord High Executioner in The Mikado, we’ve got a little
list. . . . Actually, in Python we’ve got quite a few of them. One of the foun-
dations of a strong programming language is the concept of arrays or lists—
objects that hold potentially large numbers of other objects, all held together
in a collection.
Python’s most basic collection class is the list, which does everything an
array does in other languages, but much more. This chapter explores the
basic, intermediate, and advanced features of Python lists.
◗ Specify the data on the right side of an assignment. This is where a list is actu-
ally created, or built.
◗ On the left side, put a variable name, just as you would for any other assign-
ment, so that you have a way to refer to the list.
But it’s much better to use a variable to represent only one type of data and
stick to it. We also recommend using suggestive variable names. For example,
it’s a good idea to use a “list” suffix when you give a name to list collections.
my_int_list = [5, -20, 5, -69]
Here’s a statement that creates a list of strings and names it beat_list:
beat_list = [ 'John', 'Paul', 'George', 'Ringo' ]
You can even create lists that mix numeric and string data.
mixed_list = [10, 'John', 5, 'Paul' ]
But you should mostly avoid mixing data types inside lists. In Python 3,
mixing types that cannot be compared with each other, such as integers and
strings, prevents you from using the sort method on the list. Integer and
floating-point data, however, can be freely mixed.
num_list = [3, 2, 17, 2.5]
num_list.sort() # Sorts into [2, 2.5, 3, 17]
Another technique you can use for building a collection is to append one
element at a time to an empty list.
my_list = [] # Must do this before you append!
my_list.append(1)
my_list.append(2)
my_list.append(3)
These statements have the same effect as initializing a list all at once, as here:
my_list = [1, 2, 3]
You can also remove list items.
my_list.remove(1) # List is now [2, 3]
The result of this statement is to remove the first instance of an element
equal to 1. If there is no such value in the list, Python raises a ValueError
exception.
List order is meaningful, as are duplicate values. For example, to store a
series of judge’s ratings, you might use the following statement, which indi-
cates that three different judges all assigned the score 1.0, but the third judge
assigned 9.8.
the_scores = [1.0, 1.0, 9.8, 1.0]
The following statement removes only the first instance of 1.0.
the_scores.remove(1.0) # List now equals [1.0, 9.8, 1.0]
The first statement creates a list by building it on the right side of the assign-
ment (=). But the second statement in this example creates no data. It just does
the following action:
Make “b_list” an alias for whatever “a_list” refers to.
The variable b_list therefore becomes an alias for whatever a_list
refers to. Consequently, if changes are made to either variable, both reflect
that change.
b_list.append(100)
a_list.append(200)
b_list.append(1)
print(a_list) # This prints [2, 5, 10, 100, 200, 1]
If instead you want to create a separate copy of all the elements of a list, you
need to perform a member-by-member copy. The simplest way to do that is to
use slicing.
my_list = [1, 10, 5]
yr_list = my_list[:] # Perform member-by-member copy.
Now, because my_list and yr_list refer to separate copies of [1, 10, 5],
you can change one of the lists without changing the other.
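The difference between an alias and a copy can be checked directly with the is operator. In this sketch, the starting values are assumed for illustration.

```python
a_list = [2, 5, 10]
b_list = a_list       # Alias: both names refer to one list
c_list = a_list[:]    # Separate copy of the elements

a_list.append(100)

print(b_list)              # [2, 5, 10, 100]
print(c_list)              # [2, 5, 10]
print(b_list is a_list)    # True
print(c_list is a_list)    # False
```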
3.3 Indexing
Python supports both nonnegative and negative indexes.
The nonnegative indexes are zero-based, so in the following example,
my_list[0] refers to the first element. (Section 3.3.2 covers negative
indexes.)
my_list = [100, 500, 1000]
print(my_list[0]) # Print 100.
Because lists are mutable, they can be changed “in place” without creat-
ing an entirely new list. Consequently, you can change individual elements by
making one of those elements the target of an assignment—something you
can’t do with strings.
my_list[1] = 55 # Set second element to 55.
0 1 2 3 4 5
100 200 300 400 500 600
Figure 3.1. Nonnegative indexes
Performance Tip Ë Here, as elsewhere, we’ve used separate calls to the print
function because it’s convenient for illustration purposes. But remember that
repeated calls to print slow down your program, at least within IDLE. A
faster way to print these values is to use only one call to print.
print(a_list[0], a_list[1], a_list[2], sep='\n')
Ç Performance Tip
–6 –5 –4 –3 –2 –1
100 200 300 400 500 600
Figure 3.2. Negative indexes
for s in a_list:
print(s)
This prints the following:
Tom
Dick
Jane
This approach is more natural than relying on indexing, as in the following
loop, which does extra work on each iteration and is typically slower.
for i in range(len(a_list)):
print(a_list[i])
But what if you want to list the items next to numbers? You can do that by
using index numbers (plus 1, if you want the indexing to be 1-based), but a
better technique is to use the enumerate function.
Key Syntax
enumerate(iter, start=0)
In this syntax, start is optional. Its default value is 0.
This function takes an iterable, such as a list, and produces another iter-
able, which is a series of tuples. Each of those tuples has the form
(num, item)
in which num is an integer in a series beginning with start. The following
statement shows an example, using a_list from the previous example and
starting the series at 1:
list(enumerate(a_list, 1))
This produces the following:
[(1, 'Tom'), (2, 'Dick'), (3, 'Jane')]
We can put this together with a for loop to produce the desired result.
for item_num, name_str in enumerate(a_list, 1):
print(item_num, '. ', name_str, sep='')
This loop calls the enumerate function to produce tuples of the form (num,
item). Each iteration prints the number followed by a period (“.”) and an
element.
1. Tom
2. Dick
3. Jane
list[beg: end: step] All elements starting with beg, up to but not including end;
but movement through the list is step items at a time.
With this syntax, any or all of the three values may be omit-
ted. Each has a reasonable default value; the default value of
step is 1.
Note Ë When Python carries out a slicing operation, which always includes at
least one colon (:) between the square brackets, the index specifications are
not required to be in range. Python copies as many elements as it can. If it fails
to copy any elements at all, the result is simply an empty list.
Ç Note
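The following sketch contrasts slicing, which tolerates out-of-range values, with plain indexing, which does not. The list contents are arbitrary.

```python
a_list = [100, 200, 300]

print(a_list[1:50])     # [200, 300]
print(a_list[10:20])    # []

try:
    a_list[10]          # Indexing, unlike slicing, must be in range.
except IndexError:
    print('a_list[10] raised IndexError')
```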
Figure 3.3 shows an example of how slicing works. Remember that Python
selects elements starting with beg, up to but not including the element referred
to by end. Therefore, the slice a_list[2:5] copies the sublist [300, 400, 500].
a_list[2:5]
0 1 2 3 4 5
100 200 300 400 500 600
Finally, specifying a value for step, the third argument, can affect the data
produced. For example, a value of 2 causes Python to get every other element
from the range [2:5].
a_list = [100, 200, 300, 400, 500, 600]
b_list = a_list[2:5:2] # Produces [300, 500]
A negative step value reverses the direction in which list elements are
accessed. So a step value of –1 produces values in the slice by going backward
through the list one item at a time. A step value of –2 produces values in the
slice by going backward through the list two items at a time.
The following example starts with the last element and works backwards;
it therefore produces an exact copy of the list—with all elements reversed!
rev_list = a_list[::-1]
Here’s an example:
a_list = [100, 200, 300]
rev_list = a_list[::-1]
print(rev_list) # Prints [300, 200, 100]
The step argument can be positive or negative but cannot be 0. If step is
negative, then the defaults for the other values change as follows:
◗ The default value of beg becomes the last element in the list (indexed as –1).
◗ The default value of end becomes the beginning of the list.
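Here’s a brief sketch of how those defaults play out with a negative step. The variable names are illustrative only.

```python
a_list = [100, 200, 300, 400, 500, 600]

# beg defaults to the last element; end defaults to the beginning.
every_other_back = a_list[::-2]     # [600, 400, 200]

# Start at index 3 and walk back to the beginning.
from_index_3_back = a_list[3::-1]   # [400, 300, 200, 100]

# Start at the last element and stop before reaching index 2.
down_to_index_2 = a_list[:2:-1]     # [600, 500, 400]
```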
[10, 707, 777, 50, 60]
You may even assign into a position of length 0. The effect is to insert new
list items without deleting existing ones. Here’s an example:
my_list = [1, 2, 3, 4]
my_list[0:0] = [-50, -40]
print(my_list) # prints [-50, -40, 1, 2, 3, 4]
The following restrictions apply to this ability to assign into slices:
◗ When you assign to a slice of a list, the source of the assignment must be
another list or collection, even if it has zero or one element.
◗ If you include a step argument in the slice to be assigned to, the sizes of the
two collections—the slice assigned to and the sequence providing the data—
must match in size. If step is not specified, the sizes do not need to match.
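The two restrictions can be seen in action in the following sketch. The list contents and the flag names mismatch_allowed and scalar_allowed are made up for this illustration.

```python
my_list = [10, 20, 30, 40, 50, 60]

# Without a step, the sizes may differ: two items are replaced by three.
my_list[1:3] = [707, 777, 787]
# my_list is now [10, 707, 777, 787, 40, 50, 60]

# With a step, the sizes must match exactly.
my_list[::2] = [0, 0, 0, 0]        # four targets, four values
# my_list is now [0, 707, 0, 787, 0, 50, 0]

try:
    my_list[::2] = [1, 2]          # four targets, two values
    mismatch_allowed = True
except ValueError:
    mismatch_allowed = False

# The source must be a collection, even for a single item.
try:
    my_list[0:1] = 5               # an int is not iterable
    scalar_allowed = True
except TypeError:
    scalar_allowed = False
```

Both failed assignments leave the list untouched.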
The first two of these operators (+ and *) involve making copies of list
items. But these are shallow copies. (Section 3.7, “Shallow Versus Deep Copy-
ing,” discusses this issue in greater detail.) So far, shallow copying has worked
fine, but the issue will rear its head when we discuss multidimensional arrays
in Section 3.18.
Consider the following statements:
a_list = [1, 3, 5, 0, 2]
b_list = a_list # Make an alias.
c_list = a_list[:] # Member-by-member copy
After b_list is created, the variable name b_list is just an alias for
a_list. But the third statement in this example creates a new copy of the
data. If a_list is modified later, c_list retains the original order.
Section 9.10.3, “Comparison Methods.”
Neither an empty list nor the value None necessarily returns True when
applied to the in operator.
a = [1, 2, 3]
None in a # This produces False
[] in a # So does this.
And now you can see the problem. A member-by-member copy was carried
out, but the list within the list was a reference, so both lists ended up referring
to the same data in the final position.
The solution is simple. You need to do a deep copy to get the expected
behavior. To get a deep copy, in which even embedded list items get copied,
import the copy package and use copy.deepcopy.
import copy
Figure 3.5. Deep copying
With deep copying, the depth of copying extends to every level. You could
have collections within collections to any level of complexity.
If changes are now made to b_list after being copied to a_list, they will
have no further effect on a_list. The last element of a_list will remain
set to [5,10] until changed directly. All this functionality is thanks to deep
copying.
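To see the difference side by side, consider this small sketch. It builds its own sample list, and the names shallow and deep are ours.

```python
import copy

a_list = [1, 2, [5, 10]]
shallow = a_list[:]              # copies the top level only
deep = copy.deepcopy(a_list)     # copies the nested list too

shallow[2][0] = 99               # also changes a_list[2]
deep[2][1] = -1                  # affects only the deep copy
```

After these statements, a_list is [1, 2, [99, 10]], because the shallow copy shares the inner list, whereas the change made through deep never reaches a_list.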
You’ll often use len when working with lists. For example, the following
loop doubles every item in a list. It’s necessary to use len to make this a gen-
eral solution.
for i in range(len(a_list)):
    a_list[i] *= 2
The max and min functions produce maximum and minimum elements,
respectively. These functions work only on lists that have elements with com-
patible types, such as all numeric elements or all string elements. In the case of
strings, alphabetical order (or rather, code point order) enables comparisons.
Here’s an example:
a_list = [100, -3, -5, 120]
print('Length of the list is', len(a_list))
print('Max and min are', max(a_list), min(a_list))
This prints the following:
Length of the list is 4
Max and min are 120 -5
The sorted and reversed functions are similar to the sort and reverse
methods, presented in Section 3.11. But whereas those methods reorganize a
list in place, these functions produce new lists.
These functions work on tuples and strings as well as lists, but the sorted
function always produces a list. Here’s an example:
a_tup = (30, 55, 15, 45)
print(sorted(a_tup)) # Print [15, 30, 45, 55]
The reversed function is unusual because it produces an iterable but not
a collection. In simple terms, this means you need a for loop to print it or else
use a list or tuple conversion. Here’s an example:
a_tup = (1, 3, 5, 0)
for i in reversed(a_tup):
    print(i, end=' ')
This prints
0 5 3 1
Alternatively, you can use the following:
print(tuple(reversed(a_tup)))
>>> num_list = [2.45, 1, -10, 55.5, 100.03, 40, -3]
>>> print('The average is ', sum(num_list) / len(num_list))
The average is 26.56857142857143
a_list.append(4)
a_list.extend([4]) # This has the same effect.
If the index is out of range, no exception is raised. An index that is too high
places the new value at the end of the list, and an index that is too low places
it at the beginning of the list. Here’s an example:
a_list = [10, 20, 40] # Missing 30.
a_list.insert(2, 30 ) # At index 2 (third), insert 30.
print(a_list) # Prints [10, 20, 30, 40]
a_list.insert(100, 33)
print(a_list) # Prints [10, 20, 30, 40, 33]
a_list.insert(-100, 44)
print(a_list) # Prints [44, 10, 20, 30, 40, 33]
The remove method removes the first occurrence of the specified argument
from the list. There must be at least one occurrence of this value, or Python
raises a ValueError exception.
my_list = [15, 25, 15, 25]
my_list.remove(25)
print(my_list) # Prints [15, 15, 25]
You may want to use in, not in, or the count method to verify that a
value is in a list before attempting to remove it.
Here’s a practical example that combines these methods.
In competitive gymnastics, winners are determined by a panel of judges,
each of whom submits a score. The highest and lowest scores are thrown out,
and then the average of the remaining scores is taken. The following function
performs these tasks:
def eval_scores(a_list):
    a_list.remove(max(a_list))
    a_list.remove(min(a_list))
    return sum(a_list) / len(a_list)
Here’s a sample session. Suppose that the_scores contains the judges’
ratings.
the_scores = [8.5, 6.0, 8.5, 8.7, 9.9, 9.0]
The eval_scores function throws out the low and high values (6.0 and
9.9); then it calculates the average of the rest, producing 8.675.
print(eval_scores(the_scores))
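Note that eval_scores removes items from the list passed to it, so the_scores is two elements shorter after the call. If that side effect is unwanted, one possible variant works on a sorted copy instead. The name eval_scores_copy is hypothetical, and the sketch assumes there are at least three scores.

```python
def eval_scores_copy(scores):
    """Average the scores after dropping one highest and one lowest.

    Hypothetical variant of eval_scores. It works on a sorted copy, so
    the caller's list is left unchanged. Assumes at least three scores.
    """
    trimmed = sorted(scores)[1:-1]   # drop the first and last of the copy
    return sum(trimmed) / len(trimmed)

the_scores = [8.5, 6.0, 8.5, 8.7, 9.9, 9.0]
average = eval_scores_copy(the_scores)
```

Here the_scores still holds all six ratings afterward.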
# indexed item: use
# last by default.
In this syntax, brackets are not intended literally but instead indicate optional
items.
The count method returns the number of occurrences of the specified element.
It returns the number of matching items at the top level only. Here’s an example:
yr_list = [1, 2, 1, 1,[3, 4]]
print(yr_list.count(1)) # Prints 3
print(yr_list.count(2)) # Prints 1
print(yr_list.count(3)) # Prints 0
print(yr_list.count([3, 4])) # Prints 1
The index method returns the zero-based index of the first occurrence
of a specified value. You may optionally specify start and end indexes; the
searching happens in a subrange beginning with the start position, up to but
not including the end position. An exception is raised if the item is not found.
For example, the following call to the index method returns 3, signifying
the fourth element.
beat_list = ['John', 'Paul', 'George', 'Ringo']
print(beat_list.index('Ringo')) # Print 3.
But 3 is also printed if the list is defined as
beat_list = ['John', 'Paul', 'George', 'Ringo', 'Ringo']
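The optional start and end arguments let you find later occurrences. Here’s a sketch; the names first, second, and found are ours.

```python
beat_list = ['John', 'Paul', 'George', 'Ringo', 'Ringo']

first = beat_list.index('Ringo')               # first occurrence
second = beat_list.index('Ringo', first + 1)   # search resumes after it

# Restricting the search to indexes 0 through 2 finds nothing.
try:
    beat_list.index('Ringo', 0, 3)
    found = True
except ValueError:
    found = False
```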
list.sort([key=None] [, reverse=False])
list.reverse() # Reverse existing order.
Each of these methods changes the ordering of all the elements in place.
In Python 3.0, all the elements of the list—in the case of either method—
must have compatible types, such as all strings or all numbers. The sort
method places all the elements in lowest-to-highest order by default—or by
highest-to-lowest if reverse is specified and set to True. If the list consists of
strings, the strings are placed in alphabetical (code point) order.
The following example program prompts the user for a series of strings,
until the user enters an empty string by pressing Enter without any other
input. The program then prints the strings in alphabetical order.
def main():
    my_list = [] # Start with empty list
    while True:
        s = input('Enter next name: ')
        if len(s) == 0:
            break
        my_list.append(s)
    my_list.sort() # Place all elems in order.
    print('Here is the sorted list:')
    for a_word in my_list:
        print(a_word, end=' ')
main()
Here’s a sample session of this program, showing user input in bold.
Enter next name: John
Enter next name: Paul
Enter next name: George
Enter next name: Ringo
Enter next name: Brian
Enter next name:
Here is the sorted list:
Brian George John Paul Ringo
The sort method has some optional arguments. The first is the key argu-
ment, which by default is set to None. This argument, if specified, is a func-
tion (a callable) that’s run on each element to get that element’s key value.
Those keys are compared to determine the new order. So, for example, if a
three-member list produced key values of 15, 1, and 7, they would be sorted as
middle-last-first.
For example, suppose you want a list of strings to be ordered according to
case-insensitive comparisons. An easy way to do that is to write a function
b_list.sort(key=ignore_case)
If you now print a_list and b_list in an IDLE session, you get the fol-
lowing results (with user input shown in bold):
>>> a_list
['George', 'Ringo', 'brian', 'john', 'paul']
>>> b_list
['brian', 'George', 'john', 'paul', 'Ringo']
Notice how a_list and b_list, which started with identical contents, are
sorted. The first was sorted by ordinary, case-sensitive comparisons, in which
all uppercase letters are “less than” compared to lowercase letters. The second
list was sorted by case-insensitive comparisons, pushing poor old 'Ringo' to
the end.
The second argument is the reverse argument, which by default is
False. If this argument is included and is True, elements are sorted in high-
to-low order.
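The two arguments can be combined. The following sketch uses the built-in str.lower method as the key function; the list contents and the name case_sensitive_desc are ours.

```python
names = ['john', 'Ringo', 'George', 'paul', 'brian']

# Case-sensitive, high to low: lowercase names come first.
names.sort(reverse=True)
case_sensitive_desc = names[:]      # keep a copy of this ordering

# Case-insensitive, high to low.
names.sort(key=str.lower, reverse=True)
```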
The reverse method changes the ordering of the list, as you’d expect, but
without sorting anything. Here’s an example:
my_list = ['Brian', 'George', 'John', 'Paul', 'Ringo']
my_list.reverse() # Reverse elems in place.
for a_word in my_list:
    print(a_word, end=' ')
Because this list starts out in alphabetical order, calling reverse has the
effect of producing a reverse sort: the last shall be first, and the first shall
be last. Now Ringo becomes the frontman.
Ringo Paul John George Brian
Note Ë Using the key argument, as just explained, is a good candidate for the
use of lambda functions, as explained later in Section 3.14.
Ç Note
[Figure: a traditional stack; Push(10), Push(20), Pop -> 20, Pop -> 10]
The push and pop functions on a traditional stack are replaced by the
append and pop methods of a Python list.
The key change that needs to be made—conceptually, at any rate—is to
think of operating on the last element to be added to the end of the list, rather
than to the literal top of a stack.
This end-of-the-list approach is functionally equivalent to a stack. Figure 3.7
illustrates 10 and 20 being pushed on, and then popped off, a list used as a
stack. The result is that the items are popped off in reverse order.
0 10        append(10)
0 10 20     append(20)
0 10 20     pop() -> 20
0 10        pop() -> 10
7 3 +
This adds 7 to 3, which produces 10. Or, to multiply 10 by 5, producing 50,
you use this:
10 5 *
Then—and here is why RPN is so useful—you can put these two expres-
sions together in a clear, unambiguous way, without any need for parentheses:
10 5 * 7 3 + /
This expression is equivalent to the following standard notation, which
produces 5.0:
(10 * 5) / (7 + 3)
Here's another example:
1 2 / 3 4 / +
This example translates into (1/2) + (3/4) and therefore produces 1.25.
Here’s another example:
2 4 2 3 7 + + + *
This translates into
2 * (4 + (2 + (3 + 7)))
which evaluates to 32. The beauty of an RPN expression is that parentheses
are never needed. The best part is that the interpreter follows only a few sim-
ple rules:
the_stack = []                    # Global list used as the stack.
def push(v):
    the_stack.append(v)
def pop():
    return the_stack.pop()
def main():
    s = input('Enter RPN string: ')
    a_list = s.split()
    for item in a_list:
        if item in '+-*/':
            op2 = pop()
            op1 = pop()
            if item == '+':
                push(op1 + op2)
            elif item == '-':
                push(op1 - op2)
            elif item == '*':
                push(op1 * op2)
            else:
                push(op1 / op2)
        else:
            push(float(item))     # Operand: push its numeric value.
    print(pop())                  # The result is left on the stack.
main()
This application, although not long, could be more compact. We’ve included
dedicated push and pop functions operating on a global variable, the_stack.
A few lines could have been saved by using methods of the_stack directly.
op1 = the_stack.pop()
...
the_stack.append(op1 + op2) # Push op1 + op2.
Revising the example so that it uses these methods directly is left as an exer-
cise. Note also that there is currently no error checking, such as checking to
make sure that the stack is at least two elements in length before an operation
is carried out. Error checking is also left as an exercise.
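One possible approach to both exercises is sketched below. It is not the only solution: the function name eval_rpn, the choice to take the string as an argument rather than call input, and the error messages are all our own assumptions.

```python
def eval_rpn(text):
    """Evaluate a space-separated RPN string and return a float.

    Uses list methods directly and raises ValueError on malformed
    input. Division by zero is not handled specially.
    """
    stack = []
    for item in text.split():
        if item in ('+', '-', '*', '/'):
            if len(stack) < 2:
                raise ValueError('not enough operands for ' + item)
            op2 = stack.pop()
            op1 = stack.pop()
            if item == '+':
                stack.append(op1 + op2)
            elif item == '-':
                stack.append(op1 - op2)
            elif item == '*':
                stack.append(op1 * op2)
            else:
                stack.append(op1 / op2)
        else:
            stack.append(float(item))   # float raises ValueError on junk
    if len(stack) != 1:
        raise ValueError('expression is incomplete')
    return stack.pop()
```

For example, eval_rpn('10 5 * 7 3 + /') returns 5.0, while eval_rpn('1 +') raises ValueError instead of failing inside pop.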
Performance Tip Ë The following tip saves you seven lines of code. Instead of
testing for each operator separately, you can use the eval function to take a
Python command string and execute it. You would then need only one function call
to carry out any arithmetic operation in this app.
push(eval(str(op1) + item + str(op2)))
Be careful, however, because the eval function can easily be misused. In
this application, it should be called only if the item is one of the four
operators: +, *, -, or /.
Ç Performance Tip
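If you would rather avoid eval altogether, a different technique gives the same one-call convenience: a dictionary that maps each operator to a function from the standard operator module. This is an alternative to the tip above, not a restatement of it, and the names OPS and apply_op are hypothetical.

```python
import operator

# Map each operator token to the function that implements it.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def apply_op(item, op1, op2):
    """Apply the operator named by item; raises KeyError if unknown."""
    return OPS[item](op1, op2)
```

In the RPN app, the whole if/elif chain could then become push(apply_op(item, op1, op2)), and no string is ever executed as code.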
functools.reduce(function, list)
The action of reduce is to apply the specified function to each succes-
sive pair of neighboring elements in list, accumulating the result, passing it
along, and finally returning the overall answer. The function argument—a
callable—must itself take two arguments and produce a result. Assuming that
a list (or other sequence) has at least four elements, the effect is as follows.
◗ Take the first two elements as arguments to the function. Remember the result.
◗ Take the result from step 1 and the third element as arguments to the func-
tion. Remember this result.
◗ Take the result from step 2 and the fourth element as arguments to the
function.
◗ Continue to the end of the list in this manner.
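The steps above can be sketched as an ordinary loop. The function my_reduce is a hand-written stand-in that we introduce only to show the mechanics; it assumes a nonempty sequence and ignores the optional initial-value argument that the real functools.reduce accepts.

```python
import functools

def my_reduce(function, seq):
    """Hand-written equivalent of functools.reduce for a nonempty seq."""
    it = iter(seq)
    result = next(it)                     # start with the first element
    for item in it:
        result = function(result, item)   # fold in the next element
    return result

def add(x, y):
    return x + y

a_list = [1, 2, 3, 4, 5]
by_hand = my_reduce(add, a_list)              # ((((1 + 2) + 3) + 4) + 5)
by_library = functools.reduce(add, a_list)
```

Both calls produce the same total.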
n = 5
a_list = list(range(1, n + 1))
Ç Note
But this usage, while interesting to note, is not usually how a lambda is
used. A more practical use is with the reduce function. For example, here’s
how to calculate the triangle number for 5:
t5 = functools.reduce(lambda x, y: x + y, [1,2,3,4,5])
Here’s how to calculate the factorial of 5:
f5 = functools.reduce(lambda x, y: x * y, [1,2,3,4,5])
Programs create data dynamically, at run time, and assign names to data
objects if you want to refer to them again. The same thing happens with func-
tions (callables); they are created at run time and are either assigned names—
if you want to refer to them again—or used anonymously, as in the last two
examples.
b_list = [i * i for i in a_list]
Perhaps by now you can see the pattern. In this second example, the ele-
ments inside the square brackets can be broken down as follows:
b_list = [ ]
for i in a_list:
    b_list.append(i * i)
[ value for_statement_header ]
new_list = []
for i in my_list:
    if i > 0:
        new_list.append(i)
The result, in this case, is to place the values [10, 12, 13, 15] in new_list.
The following statement, using list comprehension, does the same thing:
new_list = [i for i in my_list if i > 0 ]
The list-comprehension statement on the right, within the square brackets,
breaks down into three pieces in this case:
neg_list:
[-10, -500, -1]
Alternatively, suppose you want to produce the same set, but have it consist
of the squares of positive values from a_list, resulting in {25, 4}. In that
case, you could use the following statement:
my_set = {i * i for i in a_list if i > 0}
Dictionary comprehension is a little more complicated, because in order to
work, it’s necessary to create a loop that generates key-value pairs, using this
syntax:
key : value
Suppose you have a list of tuples that you’d like to be the basis for a data
dictionary.
vals_list = [ ('pi', 3.14), ('phi', 1.618) ]
A dictionary could be created as follows:
my_dict = { i[0]: i[1] for i in vals_list }
Note the use of the colon (:) in the key-value expression, i[0] : i[1].
You can verify that a dictionary was successfully produced by referring to or
printing the following expression, which should produce the number 3.14:
my_dict['pi'] # Produces 3.14.
Here’s another example, which combines data from two lists into a dictio-
nary. It assumes that these two lists are the same length.
keys = ['Bob', 'Carol', 'Ted', 'Alice' ]
vals = [4.0, 4.0, 3.75, 3.9]
grade_dict = { keys[i]: vals[i] for i in range(len(keys)) }
This example creates a dictionary initialized as follows:
grade_dict = { 'Bob':4.0, 'Carol':4.0, 'Ted':3.75,
'Alice':3.9 }
Performance Tip Ë You can improve the performance of the code in this last
example by using the built-in zip function to merge the lists. The comprehension
then is as follows:
grade_dict = { key: val for key, val in zip(keys, vals)}
Ç Performance Tip
idict = {v : k for k, v in phone_dict.items() }
The items method of data dictionaries produces a view of k, v pairs (an
iterable, not a true list, in Python 3), in which k is a key and v is a value.
For each such pair, the value expression v:k inverts the key-value relationship
in producing the new dictionary, idict.
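Here’s a runnable sketch of the inversion. The contents of phone_dict are made-up sample data, and the second part illustrates a caveat worth knowing: if two keys share a value, only one of them survives in the inverted dictionary.

```python
# Hypothetical sample data; the numbers are invented for illustration.
phone_dict = {'Bob': '555-1111', 'Carol': '555-2222'}

idict = {v: k for k, v in phone_dict.items()}
# idict now maps each number back to a name.

# Caveat: when values repeat, the later key-value pair wins.
dup = {'a': 1, 'b': 1}
inverted_dup = {v: k for k, v in dup.items()}
```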
a_list = [0, 0, 0]
set_list_vals(a_list)
print(a_list) # Prints [100, 200, 150]
This approach works because the values of the list are changed in place,
without creating a new list and requiring variable reassignment. But the fol-
lowing example fails to change the list passed to it.
def set_list_vals(list_arg):
    list_arg = [100, 200, 150]
a_list = [0, 0, 0]
set_list_vals(a_list)
print(a_list) # Prints [0, 0, 0]
With this approach, the values of the list, a_list, were not changed after
the function returned. What happened?
The answer is that the list argument, list_arg, was reassigned to refer to
a completely new list. The association between the variable list_arg and the
original data, [0, 0, 0], was broken.
However, slicing and indexing are different. Assigning into an indexed item
or a slice of a list does not change what the name refers to; it still refers to the
same list, but the first element of that list is modified.
my_list[0] = new_data # This really changes list data.
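This suggests one way to repair the failing version of set_list_vals: assign into a slice that covers the whole list. This is a sketch of that idea, reusing the function name from the example above.

```python
def set_list_vals(list_arg):
    # Slice assignment replaces the contents of the existing list,
    # so the caller sees the change.
    list_arg[:] = [100, 200, 150]

a_list = [0, 0, 0]
set_list_vals(a_list)
```

After the call, a_list contains [100, 200, 150], because list_arg was never rebound to a different list.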
Note Ë This chapter describes how to use the core Python language to create
multidimensional lists. Chapter 12 describes the use of the numpy package,
which enables the use of highly optimized routines for manipulating multidi-
mensional arrays, especially arrays (or matrixes) of numbers.
Ç Note
It might seem that list multiplication would solve the problem. It does, in
the case of one-dimensional lists.
big_list = [0] * 100 # Create a list of 100 elements
# each initialized to 0.
This works so well, you might be tempted to just generalize to a second
dimension.
mat = [[0] * 100] * 200
But although this statement is legal, it doesn’t do what you want. The inner
expression, [0] * 100, creates a list of 100 elements. But the code repeats
that data 200 times—not by creating 200 separate rows but instead by creat-
ing 200 references to the same row.
The effect is to create 200 rows that aren’t separate. This is a shallow copy;
you get 200 redundant references to the same row. This is frustrating. The
way around it is to append each of the 200 rows one at a time, which you can
do in a for loop:
mat = [ ]
for i in range(200):
    mat.append([0] * 100)
In this example, mat starts out as an empty list, just like any other.
Each time through the loop, a row containing 100 zeros is appended. After
this loop is executed, mat will refer to a true two-dimensional matrix made up
of 20,000 fully independent cells. It can then be indexed as high as
mat[199][99]. Here’s an example:
mat[150][87] = 3.141592
As with other for loops that append data to a list, the previous example is a
great candidate for list comprehension.
mat = [ [0] * 100 for i in range(200) ]
The expression [0] * 100 is the value part of this list-comprehension
expression; it specifies a one-dimensional list (or “row”) that consists of 100
elements, each set to 0. This expression should not be placed in an additional
pair of brackets, by the way, or the effect would be to create an extra, and
unnecessary, level of indexing.
The expression for i in range(200) causes Python to create, and
append, such a row . . . 200 times.
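You can confirm the difference between shared and independent rows with a small experiment. The sizes here are deliberately tiny, and the names shared and separate are ours.

```python
rows, cols = 3, 4     # small sizes chosen for illustration

shared = [[0] * cols] * rows                   # one row, referenced 3 times
separate = [[0] * cols for _ in range(rows)]   # three independent rows

shared[0][0] = 9      # shows up in every "row"
separate[0][0] = 9    # changes the first row only
```

In shared, every row reports 9 in its first column; in separate, only the first row does.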
mat2 = [[ [0] * 25 for _ in range(20) ]
for _ in range(30) ]
And here is a 10 × 10 × 10 × 10 four-dimensional list:
mat2 = [[[ [0] * 10 for _ in range(10) ]
for _ in range(10) ]
for _ in range(10) ]
You can build matrixes of higher dimensions still, but remember that as
dimensions increase, things get bigger—fast!
Chapter 3 Summary
This chapter has demonstrated just how powerful Python lists are. Many of
these same abilities are realized in functions, such as len, count, and index,
which apply to other collection classes as well, including strings and tuples.
However, because lists are mutable, there are some list capabilities not sup-
ported by those other types, such as sort and reverse, which alter list data
“in place.”
This chapter also introduced some exotic abilities, such as the use of functools
and lambda functions. It also explained techniques for creating multidimensional
lists. Chapter 12 provides more efficient alternatives for that task; still, it’s
useful to know how to create multidimensional lists using the core language.
2 What’s the most efficient way of creating a Python list that has 1,000 elements
to start with? Assume every element should be initialized to the same value.
3 How do you use slicing to get every other element of a list, while ignoring the
rest? (For example, you want to create a new list that has the first, third, fifth,
seventh, and so on element.)
4 Describe some of the differences between indexing and slicing.
5 What happens when one of the indexes used in a slicing expression is out of
range?
6 If you pass a list to a function, and if you want the function to be able to
change the values of the list—so that the list is different after the function
returns—what action should you avoid?
7 What is an unbalanced matrix?
8 Why does the creation of arbitrarily large matrixes require the use of either
list comprehension or a loop?
4.1 Overview
Python is unusually gifted with shortcuts and time-saving programming
techniques. This chapter begins with a discussion of twenty-two of these
techniques.
Another thing you can do to speed up certain programs is to take advantage
of the many packages that are available with Python. Some of these, such as
re (regular expressions), sys, random, and math, come with the standard
Python download, and all you have to do is include an import statement.
Other packages can be downloaded quite easily with the right tools.
the next physical line. Consequently, you can enter as long a statement as you
want—and you can enter a string of any length you want—without necessar-
ily inserting newlines.
my_str = ('I am Hen-er-y the Eighth, '
'I am! I am not just any Henry VIII, '
'I really am!')
This statement places all this text in one string. You can likewise use open
parentheses with other kinds of statements.
length_of_hypotenuse = ( (side1 * side1 + side2 * side2)
** 0.5 )
A statement is not considered complete until all open parentheses [(] have
been matched by closing parentheses [)]. The same is true for braces and
square brackets. As a result, this statement will automatically continue to the
next physical line.
If you ever write code like this, you should try to break the habit as soon as
you can. It’s better to print the contents of a list or iterator directly.
beat_list = ['John', 'Paul', 'George', 'Ringo']
for guy in beat_list:
    print(guy)
Even if you need access to a loop variable, it’s better to use the enumerate
function to generate such numbers. Here’s an example:
beat_list = ['John', 'Paul', 'George', 'Ringo']
for i, name in enumerate(beat_list, 1):
    print(i, '. ', name, sep='')
This prints
1. John
2. Paul
3. George
4. Ringo
There are, of course, some cases in which it’s necessary to use indexing.
That happens most often when you are trying to change the contents of a list
in place.
a_list += [30, 40]
print('a_list:', a_list)
print('b_list:', b_list)
This code prints
a_list: [10, 20, 30, 40]
b_list: [10, 20, 30, 40]
In this case, the change was made to the list in place, so there was no need
to create a new list and reassign that list to the variable. Therefore, a_list
was not assigned to a new list, and b_list, a variable that refers to the same
data in memory, reflects the change as well.
In-place operations are almost always more efficient. In the case of lists,
Python reserves some extra space to grow when allocating a list in memory,
and that in turn permits append operations, as well as +=, to efficiently
grow lists. However, occasionally lists exceed the reserved space and must be
moved. Such memory management is seamless and has little or no impact on
program behavior.
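The is operator makes the distinction visible. In this sketch, which uses flag names of our own choosing, += keeps both variables pointing at the same list, whereas + followed by assignment rebinds one of them.

```python
a_list = [10, 20]
b_list = a_list                 # both names refer to one list

a_list += [30, 40]              # in place: still the same object
same_after_iadd = a_list is b_list

a_list = a_list + [50]          # not in place: a_list is rebound
same_after_add = a_list is b_list
```

After the last statement, b_list still holds [10, 20, 30, 40], while a_list refers to a new five-element list.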
Non-in-place operations are less efficient, because a new object must be cre-
ated. That’s why it’s advisable to use the join method to grow large strings
rather than use the += operator, especially if performance is important. Here’s an
example using the join method to create a list and join 26 characters together.
str_list = []
n = ord('a')
for i in range(n, n + 26):
    str_list += chr(i)
alphabet_str = ''.join(str_list)
Figures 4.1 and 4.2 illustrate the difference between in-place operations
and non-in-place operations. In Figure 4.1, string data seems to be appended
onto an existing string, but what the operation really does is to create a new
string and then assign it to the variable—which now refers to a different place
in memory.
[Figure 4.1: S refers to ‘Here’s a string’; appending creates a new string.]
But in Figure 4.2, list data is appended onto an existing list without the
need to create a new list and reassign the variable.
[a_list: 10 20 30 40. Step 1: create new list. Step 2: grow the list in place.]
Figure 4.2. Appending to a list (in-place)
Here’s a summary:
example, suppose you want to assign 1 to a, and 0 to b. The obvious way to do
that is to use the following statements:
a = 1
b = 0
But through tuple assignment, you can combine these into a single
statement.
a, b = 1, 0
In this form of assignment, you have a series of variables on the left side of
the equals sign (=) and a series of values on the right. They must match in
number, with one exception: You can assign a tuple of any size to a single
variable (which itself now represents a tuple as a result of this operation).
a = 4, 8, 12 # a is now a tuple containing three values.
Tuple assignment can be used to write some passages of code more com-
pactly. Consider how compact a Fibonacci-generating function can be in
Python.
def fibo(n):
    a, b = 1, 0
    while a <= n:
        print(a, end=' ')
        a, b = a + b, a
In the last statement, the variable a gets a new value: a + b; the variable b
gets a new value—namely, the old value of a.
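This works because the entire right side is evaluated before any variable on the left is assigned. The same property lets you swap two variables without a temporary. The sketch below shows the swap along with a list-returning variant of fibo; the name fibo_list is ours, introduced so that the result can be inspected rather than printed.

```python
a, b = 1, 0
a, b = b, a       # swap without a temporary variable

def fibo_list(n):
    """Return the Fibonacci numbers up to n as a list."""
    result = []
    a, b = 1, 0
    while a <= n:
        result.append(a)
        a, b = a + b, a     # right side is evaluated first
    return result

fib_to_10 = fibo_list(10)
```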
<class 'int'>
This is not what was wanted in this case. The parentheses were treated as a
no-op, as would any number of enclosing parentheses. But the following state-
ment produces a tuple with one element, although, to be fair, a tuple with just
one element isn’t used very often.
my_tup = (3,) # Assign tuple with one member, 3.
The use of an asterisk (*) provides a good deal of additional flexibility with
tuple assignment. You can use it to split off parts of a tuple and have one (and
only one) variable that becomes the default target for the remaining elements,
which are then put into a list. Some examples should make this clear.
a, *b = 2, 4, 6, 8
In this example, a gets the value 2, and b is assigned to a list:
2
[4, 6, 8]
You can place the asterisk next to any variable on the left, but in no case
more than one. The variable modified with the asterisk is assigned a list of
whatever elements are left over. Here’s an example:
a, *b, c = 10, 20, 30, 40, 50
In this case, a and c refer to 10 and 50, respectively, after this statement is
executed, and b is assigned the list [20, 30, 40].
You can, of course, place the asterisk next to a variable at the end.
big, bigger, *many = 100, 200, 300, 400, 500, 600
example:
def double_me(n):
    n *= 2
a = 10
double_me(a)
print(a) # Value of a did not get doubled!!
When n is assigned a new value, the association is broken between that
variable and the value that was passed. In effect, n is a local variable that is
now associated with a different place in memory. The variable passed to the
function is unaffected.
But you can always use a return value this way:
def double_me(n):
    return n * 2
a = 10
a = double_me(a)
print(a)
Therefore, to get an out parameter, just return a value. But what if you
want more than one out parameter?
In Python, you can return as many values as you want. For example, the
following function solves a quadratic equation by returning its two roots.
def quad(a, b, c):
    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2
This function has three input arguments and two output values. In calling
the function, it’s important to receive both return values:
x1, x2 = quad(1, -1, -1)
If you return multiple values to a single variable in this case, that variable
will store the values as a tuple. Here’s an example:
>>> x = quad(1, -1, -1)
>>> x
(1.618033988749895, -0.6180339887498949)
Note that this feature—returning multiple values—is actually an applica-
tion of the use of tuples in Python.
◗ Nonempty collections and nonempty strings evaluate as True; so do nonzero
numeric values.
◗ Zero-length collections and zero-length strings evaluate to False; so does
any number equal to 0, as well as the special value None.
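A quick way to explore these rules is to run a handful of values through an if test. The helper name describe is ours.

```python
def describe(value):
    """Return 'truthy' or 'falsy' according to Python's rules."""
    return 'truthy' if value else 'falsy'

samples = ([], [0], '', ' ', 0, 0.0, -1, None)
results = [describe(v) for v in samples]
```

Note that [0] and ' ' are truthy: a list containing a zero is still nonempty, and a string containing one space still has length 1.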
Here’s an example:
a = b = c = d = e = 100
if a == b == c == d == e:
    print('All the variables are equal to each other.')
For larger data sets, there are ways to achieve these results more efficiently.
Any list, no matter how large, can be tested to see whether all the elements are
equal this way:
if min(a_list) == max(a_list):
    print('All the elements are equal to each other.')
However, when you just want to test a few variables for equality or perform
a combination of comparisons on a single line, the techniques shown in this
section are a nice convenience with Python. Yay, Python!
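Here are a few more chained comparisons, followed by a small helper. One caveat on the min/max test shown earlier: min and max raise ValueError on an empty list. The helper all_equal is a hypothetical alternative that sidesteps that case by treating an empty list as all-equal.

```python
x = 5
in_range = 0 < x < 10         # same as (0 < x) and (x < 10)
ascending = 1 < 2 < 3 < 4
mixed = 1 < 3 > 2             # (1 < 3) and (3 > 2)

def all_equal(a_list):
    """True if every element equals the first; True for an empty list."""
    return all(item == a_list[0] for item in a_list)
```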
elif n == 3:
    do_volume_subplot(stockdf)
elif n == 4:
    do_movingavg_plot(stockdf)
Code like this is verbose. It will work, but it’s longer than it needs to be.
But Python functions are objects, and they can be placed in a list just like any
other kind of objects. You can therefore get a reference to one of the functions
and call it.
fn = [do_plot, do_highlow_plot, do_volume_subplot,
do_movingavg_plot][n-1]
fn(stockdf) # Call the function
For example, n-1 is evaluated, and if that value is 0 (that is, n is equal to 1),
the first function listed, do_plot, is executed.
This code creates a compact version of a C++ switch statement by calling
a different function depending on the value of n. (By the way, the value 0 is
excluded in this case, because that value is used to exit.)
You can create a more flexible control structure by using a dictionary com-
bined with functions. For example, suppose that “load,” “save,” “update,”
and “exit” are all menu functions. We might implement the equivalent of a
switch statement this way:
menu_dict = {'load':load_fn, 'save':save_fn,
'exit':exit_fn, 'update':update_fn}
(menu_dict[selector])() # Call the function
Now the appropriate function will be called, depending on the string con-
tained in selector, which presumably contains 'load', 'save', 'update',
or 'exit'.
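If selector might hold a string that isn’t in the dictionary, the direct lookup raises KeyError. One way to supply the equivalent of a default case is the dictionary get method. In this sketch, the function bodies are trivial stand-ins, and unknown_fn and run are names we made up.

```python
# Hypothetical stand-ins for the menu functions named in the text.
def load_fn():
    return 'loading'

def save_fn():
    return 'saving'

def unknown_fn():
    return 'unknown command'

menu_dict = {'load': load_fn, 'save': save_fn}

def run(selector):
    # get returns the fallback function when selector is not a key.
    return menu_dict.get(selector, unknown_fn)()
```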
False. When you’re certain that you’re comparing a value to a unique object,
then the is keyword works reliably; moreover, it’s preferable in those situa-
tions because such a comparison is more efficient.
a_value = my_function()
if a_value is None:
    # Take special action if None is returned.
    pass
0 1 2 3 4 5 6 7 8 9
Notice that when you’re within IDLE, this for loop is like any other: You
need to type an extra blank line in order to terminate it.
5 7 9 11 13
You can squeeze other kinds of loops onto a line in this way. Also, you don’t
have to use loops but can place any statements on a line that you can manage
to fit there.
>>> a = 1; b = 2; c = a + b; print(c)
3
At this point, some people may object, “But with those semicolons, this
looks like C code!” (Oh, no—anything but that!)
Maybe it does, but it saves space. Keep in mind that the semicolons are
statement separators and not terminators, as in the old Pascal language.
blue = 1
green = 2
black = 3
white = 4
This works fine, but it would be nice to find a way to automate this code.
There is a simple trick in Python that allows you to do that, creating an enu-
meration. You can take advantage of multiple assignment along with use of
the range function:
red, blue, green, black, white = range(5)
The number passed to range in this case is the number of settings. Or, if
you want to start the numbering at 1 instead of 0, you can use the following:
red, blue, green, black, white = range(1, 6)
Note Ë For more sophisticated control over the creation and specification of
enumerated types, you can import and examine the enum package.
import enum
help(enum)
You can find information on this feature at
[Link]
Ç Note
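For comparison, here is a hedged sketch of the same color settings written both ways: with the range trick and with the IntEnum class from the standard enum module. The class name Color is an illustrative choice, not something defined in the text.

```python
from enum import IntEnum

# The range trick: plain integers bound to names.
red, blue, green, black, white = range(1, 6)

# An IntEnum gives the same values plus readable names.
class Color(IntEnum):
    RED = 1
    BLUE = 2
    GREEN = 3
    BLACK = 4
    WHITE = 5

print(green)                  # 3
print(Color.GREEN.name)       # GREEN
print(Color.GREEN == green)   # True
```

The range version is shorter; the enum version keeps the name attached to the value, which helps when printing or debugging.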
scores were present. This technique involves several rules.
This technique affects only how numbers appear in the code itself and not
how anything is printed. To print a number with thousands-place separators,
use the format function or method as described in Chapter 5, “Formatting
Text Precisely.”
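A quick sketch of the distinction, assuming the technique under discussion is the use of underscores inside numeric literals (available in Python 3.6 and later). The variable name population is arbitrary.

```python
# Underscores are ignored by Python; they only aid readability in source.
population = 7_000_000_000
print(population)                 # 7000000000
print(population == 7000000000)   # True

# To show separators in output, format the number instead.
print(format(population, ','))    # 7,000,000,000
```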
To use Python from the command line, first start the DOS Box applica-
tion, which is present as a major application on all Windows systems. Python
should be easily available because it should be placed in a directory that is part
of the PATH setting. Checking this setting is easy to do while you’re running
a Windows DOS Box.
In Windows, you can also check the PATH setting by opening the Control
Panel, choosing Systems, and selecting the Advanced tab. Then click Environment
Variables.
You then should be able to run Python programs directly as long as they’re
in your PATH. To run a program from the command line, enter python and
the name of the source file (the main module), including the .py extension.
python [Link]
On Windows-based systems, use the following command to download and
install a desired package.
pip install package_name
The package name, incidentally, uses no file extension:
pip install numpy
On Macintosh systems, you may need to use the pip3 utility, which is
downloaded with Python 3 when you install it on your computer. (You may also
have inherited a version of pip, but it will likely be out-of-date and unusable.)
pip3 install package_name
determin = (b * b - 4 * a * c) ** .5
x1 = (-b + determin) / (2 * a)
x2 = (-b - determin) / (2 * a)
return x1, x2
When this doc string is entered in a function definition, you can get help
from within IDLE:
>>> help(quad)
Help on function quad in module __main__:
quad(a, b, c)
    Quadratic Formula function.
◗ The doc string itself must immediately follow the heading of the function.
◗ It must be a literal string utilizing the triple-quote feature. (You can actually
use any style of quotation mark, but you need triple quotes if you want the
string to span multiple lines.)
◗ The doc string must also be aligned with the “level-1” indentation under the
function heading: For example, if the statements immediately under the func-
tion heading are indented four spaces, then the beginning of the doc string
must also be indented four spaces.
◗ Subsequent lines of the doc string may be indented as you choose, because
the string is a literal string. You can place the subsequent lines flush left or
continue the indentation you began with the doc string. In either case, Python
online help will line up the text in a helpful way.
This last point needs some clarification. The doc string shown in the previ-
ous example could have been written this way:
def quad(a, b, c):
    '''Quadratic Formula function.

    This function applies the Quadratic Formula
    to determine the roots of x in a quadratic
    equation of the form ax^2 + bx + c = 0.
    '''
As part of the stylistic guidelines, it’s recommended that you put in a brief
summary of the function, followed by a blank line, followed by more detailed
description.
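Putting these guidelines together, a complete version of quad might look like the following sketch. The function body is taken from the fragment shown earlier in this section; reading the text back through the __doc__ attribute is an addition for illustration.

```python
def quad(a, b, c):
    '''Quadratic Formula function.

    This function applies the Quadratic Formula
    to determine the roots of x in a quadratic
    equation of the form ax^2 + bx + c = 0.
    '''
    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2

# The first line of the doc string serves as the summary.
summary = quad.__doc__.splitlines()[0]
print(summary)            # Quadratic Formula function.
print(quad(1, -9, 20))    # (5.0, 4.0)
```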
When running Python from the command line, you can use the pydoc utility
to get the same online help shown earlier. For example, you could get help
on the module named queens.py. The pydoc utility responds by printing a
help summary for every function. Note that the .py extension is not entered as
part of the module name in this case.
python -m pydoc queens
◗ Packages included with the Python download itself. This includes math, random,
sys, os, time, datetime, and [Link]. These packages are especially conve-
nient, because no additional downloading is necessary.
◗ Packages you can download from the Internet.
import package_name
For example:
import math
Once a package is imported, you can, within IDLE, get help on its contents.
Here’s an example:
>>> import math
>>> help(math)
If you type these commands from within IDLE, you’ll see that the math
package supports a great many functions.
But with this approach, each of the functions needs to be qualified using
the dot (.) syntax. For example, one of the functions supported is sqrt (square
root), which takes an integer or floating-point input.
>>> math.sqrt(2)
1.4142135623730951
You can use the math package, if you choose, to calculate the value of pi.
However, the math package also provides this number directly.
>>> math.atan(1) * 4
3.141592653589793
>>> math.pi
3.141592653589793
Let’s look at one of the variations on the import statement.
an asterisk (*).
>>> from math import *
>>> print(pi)
3.141592653589793
>>> print(sqrt(2))
1.4142135623730951
The drawback of using this version of import is that with very large and
complex programs, it gets difficult to keep track of all the names you’re using,
and when you import packages without requiring a package-name qualifier,
name conflicts can arise.
So, unless you know what you’re doing or are importing a really small pack-
age, it’s more advisable to import specific symbols than use the asterisk (*).
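Importing specific symbols, as recommended here, might look like this short sketch. Only pi and sqrt are brought into the namespace; the attempt to call cos is included purely to show that other names stay out.

```python
# Import only the names needed; everything else stays out of the namespace.
from math import pi, sqrt

print(sqrt(2))   # 1.4142135623730951
print(pi)        # 3.141592653589793

# A name that was not imported is simply undefined here.
try:
    cos(0)
except NameError:
    print('cos was not imported')
```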
and plotting routines to create impressive-looking graphs.
This package is explored in Chapter 15. It also needs to be downloaded.
new name. You can also assign a different function altogether to the symbolic
name, avg.
def new_func(a_list):
    return (sum(a_list) / len(a_list))
old_avg = avg
avg = new_func
The symbolic name old_avg now refers to the older, and longer, function
we defined before. The symbolic name avg now refers to the newer function just
defined. We can call old_avg just as we used to call avg.
>>> old_avg([4, 6])
The average is 5.0
5.0
The next function shown (which we might loosely term a “metafunction,”
although it’s really quite ordinary) prints information about another function—
specifically, the function argument passed to it.
def func_info(func):
    print('Function name:', func.__name__)
    print('Function documentation:')
    help(func)
If we run this function on old_avg, which has been assigned to our first
averaging function at the beginning of this section, we get this result:
Function name: avg
Function documentation:
Help on function avg in module __main__:
avg(a_list)
    This function finds the average val in a list.
We’re currently using the symbolic name old_avg to refer to the first func-
tion that was defined in this section. Notice that when we get the function’s
name, the information printed uses the name that the function was originally
defined with.
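The name behavior described here can be checked directly. In this sketch the first avg function is a simplified stand-in, since the original definition from the start of the section is not reproduced here; only its name, doc string, and printed message follow the text.

```python
def avg(a_list):
    '''This function finds the average val in a list.'''
    x = sum(a_list) / len(a_list)
    print('The average is', x)
    return x

def new_func(a_list):
    return sum(a_list) / len(a_list)

old_avg = avg      # Second name for the original function
avg = new_func     # Rebind avg to the newer function

print(old_avg.__name__)   # avg
print(avg.__name__)       # new_func
print(old_avg is avg)     # False
```

The __name__ attribute records the name used in the def statement, not whichever variable currently refers to the function.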
All of these operations will become important when we get to the topic of
“decorating” in Section 4.9, “Decorators and Function Profilers.”
The brackets are used in this case to show that *args may optionally be
preceded by any number of ordinary positional arguments, represented here
as ordinary_args. The use of such arguments is always optional.
In this syntax, the name args can actually be any symbolic name you want.
By convention, Python programs use the name args for this purpose.
The symbolic name args is then interpreted as a Python tuple like any other;
you access its items by indexing it or using it in a for loop. You can also take
its length as needed. Here’s an example:
def my_var_func(*args):
    print('The number of args is', len(args))
    for item in args:
        print(item)
This function, my_var_func, can be used with argument lists of any length.
>>> my_var_func(10, 20, 30, 40)
The number of args is 4
10
20
30
40
A more useful function would be one that took any number of numeric
arguments and returned the average. Here’s an easy way to write that function.
def avg(*args):
    return sum(args)/len(args)
Now we can call the function with a different number of arguments each
time.
>>> avg(11, 22, 33)
22.0
>>> avg(1, 2)
1.5
The advantage of writing the function this way is that no brackets are
needed when you call this function. The arguments are interpreted as if they
were elements of a list, but you pass these arguments without list syntax.
What about the ordinary arguments we mentioned earlier? Additional
arguments, not included in the list *args, must either precede *args in the
argument list or be keyword arguments.
For example, let’s revisit the avg example. Suppose we want a separate
argument that specifies what units we’re using. Because units is not a key-
word argument, it must appear at the beginning of the list, in front of *args.
def avg(units, *args):
    print(sum(args)/len(args), units)
Here’s a sample use:
>>> avg('inches', 11, 22, 33)
22.0 inches
This function is valid because the ordinary argument, units, precedes the
argument list, *args.
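The other option mentioned above, placing the extra argument after *args so that it must be passed by keyword, might be sketched as follows. The default value 'inches' and the choice to return a string rather than print are assumptions made for this example.

```python
def avg(*args, units='inches'):
    # units comes after *args, so it can be given only by keyword.
    return '{} {}'.format(sum(args) / len(args), units)

print(avg(11, 22, 33))          # 22.0 inches
print(avg(1, 2, units='cm'))    # 1.5 cm
```

With this form, every positional value is collected into args, so units can never be filled in by accident.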
Note Ë The asterisk (*) has a number of uses in Python. In this context, it’s
called the splat or the positional expansion operator. Its basic use is to rep-
resent an “unpacked list”; more specifically, it replaces a list with a simple
sequence of separate items.
The limitation on such an entity as *args is that there isn’t much you can
do with it. One thing you can do (which will be important in Section 4.9,
“Decorators and Function Profilers”) is pass it along to a function. Here’s an
example:
>>> ls = [1, 2, 3] # Ordinary (packed) list.
>>> print(*ls) # Print unpacked version
1 2 3
>>> print(ls) # Print packed (ordinary list).
[1, 2, 3]
arguments.
The following example defines such a function and then calls it.
def pr_vals_2(*args, **kwargs):
    for i in args:
        print(i)
    for k in kwargs:
        print(k, ':', kwargs[k])
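A call to this function might look like the following sketch. The argument values are arbitrary, and the return statement is an addition (not part of the function as given) so that the types of args and kwargs can be inspected.

```python
def pr_vals_2(*args, **kwargs):
    for i in args:
        print(i)
    for k in kwargs:
        print(k, ':', kwargs[k])
    # Returned only so the packed types can be inspected below.
    return type(args), type(kwargs)

types = pr_vals_2(1, 2, 3, a=100, b=200)
# Prints:
# 1
# 2
# 3
# a : 100
# b : 200
print(types == (tuple, dict))   # True
```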
Note Ë Although args and kwargs are packed into a tuple and a dictionary,
respectively, these symbols can be passed along to another function, as shown
in the next section.
Ç Note
F1 = Decorator(F1)
Here’s an example of a decorator function that takes a function as its argument
and wraps it by adding calls to the time.time function. Note that time is a
package, and it must be imported before time.time is called.
import time
def make_timer(func):
    def wrapper():
        t1 = time.time()
        ret_val = func()
        t2 = time.time()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper
There are several functions involved with this simple example (which, by
the way, is not yet complete!), so let’s review.
◗ There is a function to be given as input; let’s call this the original function (F1
in this case). We’d like to be able to input any function we want, and have it
decorated—that is, acquire some additional statements.
◗ The wrapper function is the result of adding these additional statements to
the original function. In this case, these added statements report the number
of seconds the original function took to execute.
◗ The decorator is the function that performs the work of creating the wrapper
function and returning it. The decorator is able to do this because it internally
uses the def keyword to define a new function.
If you look at this decorator function, you should notice it has an important
omission: The arguments to the original function, func, are ignored. The wrap-
per function, as a result, will not correctly call func if arguments are involved.
The solution involves the *args and **kwargs language features, intro-
duced in the previous section. Here’s the full decorator:
import time
def make_timer(func):
    def wrapper(*args, **kwargs):
        t1 = time.time()
        ret_val = func(*args, **kwargs)
        t2 = time.time()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper
The new function, remember, will be wrapper. It is wrapper (or rather, the
function temporarily named wrapper) that will eventually be called in place
of func; this wrapper function therefore must be able to take any number of
arguments, including any number of keyword arguments. The correct action
is to pass along all these arguments to the original function, func. Here’s how:
ret_val = func(*args, **kwargs)
Returning a value is also handled here; the wrapper returns the same value
as func, as it should. What if func returns no value? That’s not a problem,
because Python functions return None by default. So the value None, in that
case, is simply passed along. (You don’t have to test for the existence of a
return value; there always is one!)
Having defined this decorator, make_timer, we can take any function and
produce a wrapped version of it. Then—and this is almost the final trick—
we reassign the function name so that it refers to the wrapped version of the
function.
def count_nums(n):
    for i in range(n):
        for j in range(1000):
            pass
count_nums = make_timer(count_nums)
time, and (2) this more elaborate version is what the name, count_nums, will
hereafter refer to. Python symbols can refer to any object, including functions
(callable objects). Therefore, we can reassign function names all we want.
count_nums = wrapper
Or, more accurately,
count_nums = make_timer(count_nums)
So now, when you run count_nums (which now refers to the wrapped ver-
sion of the function), you’ll get output like this, reporting execution time in
seconds.
>>> count_nums(33000)
Time elapsed was 1.063697338104248
The original version of count_nums did nothing except do some count-
ing; this wrapped version reports the passage of time in addition to calling the
original version of count_nums.
As a final step, Python provides a small but convenient bit of syntax to
automate the reassignment of the function name.
@decorator
def func(args):
    statements
This syntax is translated into the following:
def func(args):
    statements
func = decorator(func)
In either case, it’s assumed that decorator is a function that has already
been defined. This decorator must take a function as its argument and return
a wrapped version of the function. Assuming all this has been done correctly,
here’s a complete example utilizing the @ sign.
@make_timer
def count_nums(n):
    for i in range(n):
        for j in range(1000):
            pass
After this definition is executed by Python, count_nums can then be called,
and it will execute count_nums as defined, but it will also add (as part of the
wrapper) a print statement telling the number of elapsed seconds.
Remember that this part of the trick (the final trick, actually) is to get the
name count_nums to refer to the new version of count_nums, after the new
statements have been added through the process of decoration.
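One refinement worth knowing, although it goes beyond what this section covers: because the wrapper replaces the original function, the decorated function's __name__ becomes 'wrapper'. The following sketch uses functools.wraps from the standard library to preserve the original name, and time.perf_counter in place of time.time. Both choices are assumptions of this example, not the version developed above, and count_nums is simplified so that it returns a value.

```python
import functools
import time

def make_timer(func):
    @functools.wraps(func)          # Copy func's name and doc string
    def wrapper(*args, **kwargs):
        t1 = time.perf_counter()
        ret_val = func(*args, **kwargs)
        t2 = time.perf_counter()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper

@make_timer
def count_nums(n):
    total = 0
    for i in range(n):
        total += 1
    return total

result = count_nums(1000)     # Also prints the elapsed time
print(result)                 # 1000
print(count_nums.__name__)    # count_nums
```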
4.10 Generators
There’s no subject in Python about which more confusion abounds than gen-
erators. It’s not a difficult feature once you understand it. Explaining it’s the
hard part.
But first, what does a generator do? The answer: It enables you to deal with
a sequence one element at a time.
Suppose you need to deal with a sequence of elements that would take a
long time to produce if you had to store it all in memory at the same time. For
example, you want to examine all the Fibonacci numbers up to 10 to the 50th
power. It would take a lot of time and space to calculate the entire sequence.
Or you may want to deal with an infinite sequence, such as all even numbers.
The advantage of a generator is that it enables you to deal with one member
of a sequence at a time. This creates a kind of “virtual sequence.”
>>> iter1 = reversed([1, 2, 3, 4])
>>> for i in iter1:
        print(i, end=' ')
4 3 2 1
Iterators have state information; after reaching the end of its series, an iter-
ator is exhausted. If we used iter1 again without resetting it, it would produce
no more values.
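Exhaustion is easy to observe. In this small sketch, list is used to drain the iterator twice; the variable names first_pass and second_pass are illustrative.

```python
iter1 = reversed([1, 2, 3, 4])

first_pass = list(iter1)     # Consumes every value
second_pass = list(iter1)    # Nothing left to produce

print(first_pass)    # [4, 3, 2, 1]
print(second_pass)   # []

# "Resetting" means creating a fresh iterator.
iter1 = reversed([1, 2, 3, 4])
print(next(iter1))   # 4
```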
Here’s what almost everybody gets wrong when trying to explain this pro-
cess: It looks as if the yield statement, placed in the generator function (the
thing on the left in Figure 4.4), is doing the yielding. That’s “sort of” true, but
it’s not really what’s going on.
The generator function defines the behavior of the iterator. But the iterator
object, the thing to its right in Figure 4.4, is what actually executes this behavior.
When you include one or more yield statements in a function, the func-
tion is no longer an ordinary Python function; yield describes a behavior in
which the function does not return a value but sends a value back to the caller
of next. State information is saved, so when next is called again, the iterator
advances to the next value in the series without starting over. This part, every-
one seems to understand.
But—and this is where people get confused—it isn’t the generator function
that performs these actions, even though that’s where the behavior is defined.
Fortunately, you don’t need to understand it; you just need to use it. Let’s start
with a function that prints even numbers from 2 to 10:
def print_evens():
    for n in range(2, 11, 2):
        print(n)
Now replace print(n) with the statement yield n. Doing so changes the
nature of what the function does. While we’re at it, let’s change the name to
make_evens_gen to have a more accurate description.
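Making those two changes to print_evens gives the following generator function. This listing is reconstructed from the description above, and the calls after it are a short sketch of how the resulting object behaves.

```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n

# Calling the function runs none of its body; it returns a generator object.
gen = make_evens_gen()
print(next(gen))    # 2
print(next(gen))    # 4
print(list(gen))    # [6, 8, 10]
```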
iterator object, and that’s the object that yields a value. We can save the itera-
tor object (or generator object) and then pass it to next.
>>> my_gen = make_evens_gen()
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
Eventually, calling next exhausts the series, and a StopIteration excep-
tion is raised. But what if you want to reset the sequence of values to the begin-
ning? Easy. You can do that by calling make_evens_gen again, producing a
new instance of the iterator. This has the effect of starting over.
>>> my_gen = make_evens_gen() # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
>>> my_gen = make_evens_gen() # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
What happens if you call make_evens_gen every time? In that case, you
keep starting over, because each time you’re creating a new generator object.
This is most certainly not what you want.
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
Generators can be used in for statements, and that’s one of the most fre-
quent uses. For example, we can call make_evens_gen as follows:
for i in make_evens_gen():
    print(i, end=' ')
This block of code produces the result you’d expect:
2 4 6 8 10
But let’s take a look at what’s really happening. The for block calls
make_evens_gen one time. The result of the call is to get a generator object.
That object then provides the values in the for loop. The same effect is achieved
by the following code, which breaks the function call onto an earlier line.
>>> my_gen = make_evens_gen()
>>> for i in my_gen:
        print(i, end=' ')
Remember that my_gen is an iterator object. If you instead referred to
make_evens_gen directly, Python would raise an exception.
for i in make_evens_gen:    # ERROR! Not an iterable!
    print(i, end=' ')
Once you understand that the object returned by the generator function
is the generator object, also called the iterator, you can use it anywhere an
iterable or iterator is accepted in the syntax. For example, you can convert
a generator object to a list, as follows.
>>> my_gen = make_evens_gen()
>>> a_list = list(my_gen)
>>> a_list
[2, 4, 6, 8, 10]
>>> a_list = list(make_evens_gen())
>>> a_list
[2, 4, 6, 8, 10]
One of the most practical uses of an iterator is with the in and not in
keywords. We can, for example, generate an iterator that produces Fibonacci
numbers up to and including N, but not larger than N.
def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a
The yield statement changes this function from an ordinary function to
a generator function, so it returns a generator object (iterator). We can now
determine whether a number is a Fibonacci by using the following test:
n = int(input('Enter number: '))
if n in make_fibo_gen(n):
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
This example works because the iterator produced does not yield an infinite
sequence, something that would cause a problem. Instead, the iterator stops
as soon as its next value would exceed n, so the test always finishes even when
n is not a Fibonacci.
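The same test can be wrapped in a function so that it can be tried without keyboard input. The helper name is_fibo is hypothetical; make_fibo_gen is the generator function shown above.

```python
def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a

def is_fibo(n):
    # The generator stops once its values exceed n, so the
    # in test always finishes.
    return n in make_fibo_gen(n)

print(list(make_fibo_gen(30)))   # [1, 2, 3, 5, 8, 13, 21]
print(is_fibo(21))               # True
print(is_fibo(22))               # False
```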
Remember—and we state this one last time—by putting yield into the
function make_fibo_gen, it becomes a generator function and it returns the
generator object we need. The previous example could have been written as
follows, so that the function call is made in a separate statement. The effect is
the same.
n = int(input('Enter number: '))
my_fibo_gen = make_fibo_gen(n)
if n in my_fibo_gen:
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
As always, remember that a generator function (which contains the yield
statement) is not a generator object at all, but rather a generator factory. This
is confusing, but you just have to get used to it. In any case, Figure 4.4 shows
what’s really going on, and you should refer to it often.
Figure 4.5. Command-line arguments and argv (figure labels: program name; len(argv) = 4)
In most cases, you’ll probably ignore the program name and focus on the
other arguments. For example, here is a program named [Link] that does
nothing but print all the arguments given to it, including the program name.
import sys
for thing in sys.argv:
    print(thing, end=' ')
Now suppose we enter this command line:
python [Link] arg1 arg2 arg3
The Terminal program (in Mac) or the DOS Box prints the following:
[Link] arg1 arg2 arg3
The following example gives a more sophisticated way to use these strings,
by converting them to floating-point format and passing the numbers to the
quad function.
import sys
determin = (b * b - 4 * a * c) ** .5
x1 = (-b + determin) / (2 * a)
x2 = (-b - determin) / (2 * a)
return x1, x2
def main():
'''Get argument values, convert, call quad.'''
main()
The interesting line here is this one:
s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
Again, the sys.argv list is zero-based, like any other Python list, but the
program name, referred to as sys.argv[0], typically isn’t used in the program
code. Presumably you already know what the name of your program is, so you
don’t need to look it up.
Of course, from within the program you can’t always be sure that argument
values were specified on the command line. If they were not specified, you
may want to provide an alternative, such as prompting the user for these same
values.
Remember that the length of the argument list is always N+1, where N
is the number of command-line arguments—beyond the program name, of
course.
Therefore, we could revise the previous example as follows:
import sys
determin = (b * b - 4 * a * c) ** .5
x1 = (-b + determin) / (2 * a)
x2 = (-b - determin) / (2 * a)
return x1, x2
def main():
'''Get argument values, convert, call quad.'''
main()
The key lines in this version are in the following if statement:
if len(sys.argv) > 3:
    s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
else:
    s1 = input('Enter a: ')
    s2 = input('Enter b: ')
    s3 = input('Enter c: ')
a, b, c = float(s1), float(s2), float(s3)
If there are at least four elements in sys.argv (and therefore three
command-line arguments beyond the program name itself), the program uses
those strings. Otherwise, the program prompts for the values.
So, from the command line, you’ll be able to run the following:
python [Link] 1 -9 20
The program then prints these results:
x values: 4.0 5.0
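Pulling these pieces together, the argument-handling logic might be sketched as below. The helper get_coefficients is hypothetical, and the listing passes a fixed list in place of sys.argv so that it runs without a command line; the placeholder program name 'prog.py' is likewise an assumption. The text does not show the full body of main, so this is not a reconstruction of it, and the roots are shown in the order quad returns them.

```python
def quad(a, b, c):
    '''Quadratic Formula function.'''
    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2

def get_coefficients(argv):
    '''Take a, b, c from argv if present; otherwise prompt.'''
    if len(argv) > 3:
        s1, s2, s3 = argv[1], argv[2], argv[3]
    else:
        s1 = input('Enter a: ')
        s2 = input('Enter b: ')
        s3 = input('Enter c: ')
    return float(s1), float(s2), float(s3)

# A real program would pass sys.argv here; a fixed list keeps
# the sketch self-contained.
args = ['prog.py', '1', '-9', '20']
a, b, c = get_coefficients(args)
roots = quad(a, b, c)
print(roots)    # (5.0, 4.0)
```

Taking the argument list as a parameter, rather than reading sys.argv inside the function, makes the logic easy to try from IDLE.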
Chapter 4 Summary
A large part of this chapter presented ways to improve your efficiency through
writing better and more efficient Python code. Beyond that, you can make your
Python programs run faster if you call the print function as rarely as possible
from within IDLE—or else run programs from the command line only.
A technique helpful in making your code more efficient is to profile it by
using the time and datetime packages to compute the relative speed of the
code, given different algorithms. Writing decorators is helpful in this respect,
because you can use them to profile function performance.
A better approach is to use the str class formatting operator (%) to format
the output, using format specifiers like those used by the C-language “printf”
function. Here’s how you’d revise the example:
print('%d plus %d equals %d.' % (a, b, c))
Isn’t that better?
The expression (a, b, c) is actually a tuple containing three arguments,
each corresponding to a separate occurrence of %d within the format string.
The parentheses in (a, b, c) are strictly required—although they are not
required if there is only one argument.
>>> 'Here is a number: %d.' % 100
'Here is a number: 100.'
These elements can be broken up programmatically, of course. Here’s an
example:
n = 25 + 75
fmt_str = 'The sum is %d.'
print(fmt_str % n)
This example prints the following:
The sum is 100.
The string formatting operator, %, can appear in either of these two
versions.
%x Hexadecimal integer. ff09a
%X Same as %x, but letter digits A–F are uppercase. FF09A
%o Octal integer. 177
%u Unsigned integer. (But note that this doesn’t reliably change signed
integers into their unsigned equivalent, as you’d expect.) 257
%f Floating-point number to be printed in fixed-point format 3.1400
%F Same as %f. 33.1400
%e Floating-point number, printing exponent sign (e). 3.140000e+00
%E Same as %e but uses uppercase E. 3.140000E+00
%g Floating point, using shortest canonical representation. 7e-06
%G Same as %g but uses uppercase E if printing an exponent. 7E-06
%% A literal percent sign (%). %
Here’s an example that uses the int conversion, along with hexadecimal
output, to add two hexadecimal numbers: e9 and 10.
h1 = int('e9', 16)
h2 = int('10', 16)
print('The result is %x.' % (h1 + h2))
The example prints
The result is f9.
%[-][width][.precision]c
In this syntax, the square brackets indicate optional items and are not
intended literally. The minus sign (-) specifies left justification within the print
field. With this technology, the default is right justification for all data types.
But the following example uses left justification, which is not the default,
by including the minus sign (-) as part of the specifier.
>>> 'This is a number: %-6d.' % 255
'This is a number: 255   .'
As for the rest of the syntax, a format specifier can take any of the follow-
ing formats.
%c
%widthc
%width.precisionc
%.precisionc
These statements print
Amount is 25.
Amount is 00025.
Amount is 00025.
Finally, the width and precision fields control print-field width and pre-
cision in a floating-point number. The precision is the number of digits to the
right of the decimal point; this number contains trailing zeros if necessary.
Here’s an example:
print('result:%12.5f' % 3.14)
print('result:%12.5f' % 333.14)
These statements print the following:
result:     3.14000
result:   333.14000
In this case, the number 3.14 is padded with trailing zeros, because a pre-
cision of 5 digits was specified. When the precision field is smaller than the
precision of the value to be printed, the number is rounded up or down as
appropriate.
print('%.4f' % 3.141592)
This function call prints the following—in this case with 4 digits of preci-
sion, produced through rounding:
3.1416
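The interaction of width, precision, and the minus sign can be seen side by side in this short sketch. The values and the square brackets around the second field are chosen only to make the padding visible.

```python
# Width 12, precision 5: right justified, trailing zeros added.
s1 = 'result:%12.5f' % 3.14
print(s1)      # result:     3.14000

# A leading minus sign left-justifies within the field.
s2 = '[%-8.2f]' % 3.14159
print(s2)      # [3.14    ]

# Precision alone rounds to the given number of decimal places.
s3 = '%.4f' % 3.141592
print(s3)      # 3.1416
```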
Use of the %s and %r format characters enables you to work with any classes
of data. These specifiers result in the calling of one of the internal methods
from those classes supporting string representation of the class, as explained
in Chapter 9, “Classes and Magic Methods.”
In many cases, there’s no difference in effect between the %s and %r speci-
fiers. For example, either one, used with an int or float object, will result in
that number being translated into the string representation you’d expect.
You can see those results in the following IDLE session, in which user input
is in bold.
>>> 'The number is %s.' % 10
'The number is 10.'
>>> 'The number is %r.' % 10
'The number is 10.'
From these examples, you can see that both the %s and the %r just print the
standard string representation of an integer.
In some cases, there is a difference between the string representation indi-
cated by %s and by %r. The latter is intended to get the canonical representa-
tion of the object as it appears in Python code.
One of the principal differences between the two forms of representation is
that the %r representation includes quotation marks around strings, whereas
%s does not.
>>> print('My name is %r.' % 'Sam')
My name is 'Sam'.
>>> print('My name is %s.' % 'Sam')
My name is Sam.
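For a user-defined class, the two specifiers call different methods: %s uses __str__ and %r uses __repr__. The Point class below is a hypothetical example, not one from the text.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        # Used by %s and by print
        return '({}, {})'.format(self.x, self.y)

    def __repr__(self):
        # Used by %r; conventionally looks like Python code
        return 'Point({}, {})'.format(self.x, self.y)

pt = Point(3, 4)
print('As %%s: %s' % pt)    # As %s: (3, 4)
print('As %%r: %r' % pt)    # As %r: Point(3, 4)
```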
there must be an additional argument. So if you want to format two such
data objects at once, you’d need to have four arguments altogether. Here’s an
example:
>>> 'Item 1: %*s, Item 2: %*s' % (8, 'Bob', 8, 'Suzanne')
'Item 1:      Bob, Item 2:  Suzanne'
The arguments—all placed in the tuple following the argument (with
parentheses required, by the way)—are 8, 'Bob', 8, and 'Suzanne'.
The meaning of these four arguments is as follows:
✱ Where you’d normally put an integer as a formatting code, you can instead
place an asterisk (*); and for each such asterisk, you must place a correspond-
ing integer expression in the argument list.
Figure 5.1. Flow of control between formatting routines (figure labels: class of object being printed; for each print field; may have multiple print fields)
format(data, spec)
This function returns a string after evaluating the data and then formatting
according to the specification string, spec. The latter argument is a string
containing the specification for printing one item.
The syntax shown next provides a simplified view of spec grammar. It
omits some features such as the fill and align characters, as well as the use of 0
in right justifying and padding a number. To see the complete syntax of spec,
see Section 5.8, “The ‘spec’ Field of the ‘format’ Method.”
[width][,][.precision][type]
In this syntax, the brackets are not intended literally but signify optional
items. Here is a summary of the meaning.
The function attempts to place the string representation of the data into a
print field of width size, justifying text if necessary by padding with spaces.
Numeric data is right justified by default; string data is left justified by default.
The comma (,) indicates insertion of commas as thousands place separa-
tors. This is legal only with numeric data; otherwise, an exception is raised.
The precision indicates the total number of digits to print with a float-
ing-point number, or, if the data is not numeric, a maximum length for string
data. It is not supported for use with integers. If the type_char is f, then the
precision indicates a fixed number of digits to print to the right of the decimal
point.
The type_char is sometimes a radix indicator, such as b or x (binary or
hexadecimal), but more often it is a floating-point specifier such as f, which
indicates fixed-point format, or e and g, as described later in Table 5.5.
Table 5.2 gives some examples of using this specification. You can figure
out most of the syntax by studying these examples.
The remainder of this section discusses the features in more detail, particu-
larly width and precision fields.
The thousands place separator is fairly self-explanatory but works only
with numbers. Python raises an exception if this specifier is used with data
that isn’t numeric.
You might use it to format a large number such as 150 million.
>>> n = 150000000
>>> print(format(n, ','))
150,000,000
The width character is used consistently, always specifying a minimum
print-field width. The string representation is padded—with spaces by
default—and uses a default of left justification for strings and right justifica-
tion for numbers. Both the padding character and justification can be altered,
however, as explained later in this chapter, in Section 5.8.2, “Text Justifica-
tion: ‘fill’ and ‘align’ Characters.”
Here are examples of justification, padding, and print fields. The single
quotation marks implicitly show the extent of the print fields. Remember that
numeric data (150 and 99, in this case) are right justified by default, but other
data is not.
>>> format('Bob', '10')
'Bob       '
>>> format('Suzie', '7')
'Suzie  '
format_specifying_str.format(args)
Let’s break down the syntax a little. This expression passes through all
the text in format_specifying_str (or just “format string”), except where
there’s a print field. Print fields are denoted as “{}.” Within each print field,
the value of one of the args is printed.
If you want to print data objects and are not worried about the finer issues
of formatting, just use a pair of curly braces, {}, for each argument. Strings are
printed as strings, integers are printed as integers, and so on, for any type of
data. Here’s an example:
fss = '{} said, I want {} slices of {}.'
name = 'Pythagoras'
pi = 3.141592
print(fss.format(name, 2, pi))
This prints
Pythagoras said, I want 2 slices of 3.141592.
The arg values, of course, either can be constants or can be supplied by
variables (such as name and pi in this case).
Curly braces are special characters in this context. To print literal curly
braces, not interpreted as field delimiters, use {{ and }}. Here’s an example:
print('Set = {{{}, {}}}'.format(1, 2))
This prints
Set = {1, 2}
This example is a little hard to read, but the following may be clearer.
Remember that double open curly braces, {{, and double closed curly braces,
}}, cause a literal curly brace to be printed.
fss = 'Set = {{ {}, {}, {} }}'
print(fss.format(15, 35, 25))
This prints
Set = { 15, 35, 25 }
Of course, as long as you have room on a line, you can put everything
together:
print('Set = {{ {}, {}, {} }}'.format(15, 35, 25))
This prints the same output. Remember that each pair of braces defines
a print field and therefore causes an argument to be printed, but {{ and }}
cause printing of literal braces.
✱ A call to the format method must have at least as many arguments as the
format-specification string has print fields, unless fields are repeated as shown
at the end of this section. But if more arguments than print fields appear, the
excess arguments (the last ones given) are ignored.
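The rule about argument counts can be checked with a short sketch. The strings used here are illustrative; the point is only that excess arguments are silently ignored, whereas too few arguments raise an exception (IndexError in current Python versions).

```python
# Exactly as many arguments as print fields.
print('{} and {}'.format('a', 'b'))         # a and b

# One argument too many: the last one is ignored.
print('{} and {}'.format('a', 'b', 'c'))    # a and b

# One argument too few: Python raises IndexError.
try:
    print('{} and {}'.format('a'))
except IndexError as err:
    print('IndexError:', err)
```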
These are zero-based indexes, so they are numbered 0, 1, and 2.
print('The items are {2}, {1}, {0}.'.format(10, 20, 30))
This statement prints
The items are 30, 20, 10.
You can also use zero-based index numbers to refer to excess arguments, in
which there are more arguments than print fields. Here’s an example:
fss = 'The items are {3}, {1}, {0}.'
print(fss.format(10, 20, 30, 40))
These statements print
The items are 40, 20, 10.
Note that referring to an out-of-range argument raises an error. In this
example there are four arguments, so they are indexed as 0, 1, 2, and 3. No
index number was an out-of-range reference in this case.
Print fields can also be matched to arguments according to argument
names. Here’s an example:
fss = 'a equals {a}, b equals {b}, c equals {c}.'
print(fss.format(a=10, c=100, b=50))
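Because named print fields are matched by name, the order of the keyword arguments does not matter. The following sketch shows this, along with one common variation: unpacking a dictionary with **. The names fss2 and data are hypothetical, chosen only for this illustration.

```python
fss2 = '{name} is {age} years old.'

# Keyword order is irrelevant; fields are matched by name.
print(fss2.format(name='Ada', age=36))      # Ada is 36 years old.
print(fss2.format(age=36, name='Ada'))      # Ada is 36 years old.

# A dictionary can supply the names by unpacking it with **.
data = {'name': 'Ada', 'age': 36}
print(fss2.format(**data))                  # Ada is 36 years old.
```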
print(str(10)) # So does this!
But for some types of data, there is a separate repr conversion that is not
the same as str. The repr conversion translates a data object into its canon-
ical representation in source code—that is, how it would look inside a Python
program.
Here’s an example:
print(repr(10)) # This ALSO prints 10.
In this case, there’s no difference in what gets printed. But there is a dif-
ference with strings. Strings are stored in memory without quotation marks;
such marks are delimiters that usually appear only in source code. Furthermore,
escape sequences such as \n (a newline) are translated into special characters
when they are stored; again \n is a source-code representation, not the actual
storage.
Take the following string, test_str:
test_str = 'Here is a \n newline! '
Printing this string directly causes the following to be displayed:
Here is a
newline!
But applying repr to the string and then printing it produces a different
result, essentially saying, “Show the canonical source-code representation.”
This includes quotation marks, even though they are not part of the string
itself unless they’re embedded. But the repr function includes quotation
marks because they are part of what would appear in Python source code to
represent the string.
print(repr(test_str))
This statement prints
'Here is a \n newline! '
The %s and %r formatting specifiers, as well as the format method, enable
you to control which style of representation to use. Printing a string argument
without repr has the same effect as printing it directly. Here’s an example:
>>> print('{}'.format(test_str))
Here is a
newline!
Using the !r modifier causes a repr version of the argument to be used—
that is, the repr conversion is applied to the data.
>>> print('{!r}'.format(test_str))
'Here is a \n newline! '
The use of !r is orthogonal with regard to position ordering. Either may
be used without interfering with the other. So can you see what the following
example does?
>>> print('{1!r} loves {0!r}'.format('Joanie', 'ChaCha'))
'ChaCha' loves 'Joanie'
The formatting characters inside the curly braces do two things in this case.
First, they use position indexes to reverse “Joanie loves ChaCha”; then the !r
format causes the two names to be printed with quotation marks, part of the
canonical representation within Python code.
Note: Where !s or !r would normally appear, you can also use !a, which is
similar to !r but escapes any non-ASCII characters, so that the result is an
ASCII-only string.
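The three conversion modifiers can be compared side by side. This is a small sketch with an illustrative string containing one non-ASCII character; !s applies str, !r applies repr, and !a applies the built-in ascii function.

```python
word = 'café'

print('{!s}'.format(word))    # café
print('{!r}'.format(word))    # 'café'
print('{!a}'.format(word))    # 'caf\xe9'

# The modifiers correspond to the built-in conversion functions.
print('{!s}'.format(word) == str(word))      # True
print('{!r}'.format(word) == repr(word))     # True
print('{!a}'.format(word) == ascii(word))    # True
```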
[[fill]align][sign][#][0][width][,][.prec][type]
The items here are mostly independent of each other. Python interprets
each item according to placement and context. For example, prec (precision)
appears right after a decimal point (.) if it appears at all.
When looking at the examples, remember that curly braces and colons are
used only when you use spec with the format method and not the global
format function. With the format function, you might include align, sign,
0, width, precision, and type specifiers, but no curly braces or colon.
Here’s an example:
s = format(32.3, '<+08.3f')
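To make the relationship concrete, here is a short sketch showing that the same spec string gives the same result whether it is passed to the global format function or placed after a colon inside a print field. The variable names are illustrative.

```python
spec = '<+08.3f'

s1 = format(32.3, spec)                    # global function: no braces, no colon
s2 = '{:<+08.3f}'.format(32.3)             # format method: braces and colon
s3 = ('{:' + spec + '}').format(32.3)      # same print field, built from spec

print(s1 == s2 == s3)                      # True
```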
5.8.1 Print-Field Width
One of the commonly used items is print-field width, specified as an integer.
The text to be printed is displayed in a field of this size. If the text is shorter
than this width, it’s justified and the remaining positions are padded with
spaces by default.
Placement: As you can see from the syntax display, the width item is in
the middle of the spec syntax. When used with the format method, width
always follows a colon (:), as does the rest of the spec syntax.
The following example shows how width specification works on two num-
bers: 777 and 999. The example uses asterisks (*) to help illustrate where the
print fields begin and end, but otherwise these asterisks are just literal charac-
ters thrown in for the sake of illustration.
n1, n2 = 777, 999
print('**{:10}**{:2}**'.format(n1, n2))
This prints
**       777**999**
The numeral 777 is right justified within a large print field (10). This
is because, by default, numeric data is right justified and string data is left
justified.
The numeral 999 exceeds its print-field size (2) in length, so it is simply
printed as is. No truncation is performed.
Width specification is frequently useful with tables. For example, suppose
you want to print a table of integers, but you want them to line up.
   10
 2001
    2
   55
  144
 2525
 1984
It’s easy to print a table like this. Just use the format method with a print-
field width that’s wider than the longest number you expect. Because the data
is numeric, it’s right justified by default.
'{:5}'.format(n)
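Put in a loop, the print field just shown lines the integers up in a column. This is a minimal sketch; the list name numbers and the helper list lines are assumptions made for the illustration.

```python
numbers = [10, 2001, 2, 55, 144, 2525, 1984]

# Each number is right justified in a field five characters wide.
lines = ['{:5}'.format(n) for n in numbers]
for line in lines:
    print(line)
```

If the largest value is not known in advance, the width could instead be computed from the data, for example with max(len(str(n)) for n in numbers).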
Print-field width is orthogonal with most of the other capabilities. The
“ChaCha loves Joanie” example from the previous section could be revised:
fss = '{1!r:10} loves {0!r:10}!!'
print(fss.format('Joanie', 'ChaCha'))
This prints
'ChaCha'   loves 'Joanie'  !!
The output here is similar to that of the earlier “ChaCha loves Joanie”
example but adds a print-field width of 10 for both arguments. Remember
that a width specification must appear to the right of the colon; otherwise it
would function as a position number.
[[fill]align]
Placement: these items, if they appear within a print-field specification,
precede all other parts of the syntax, including width. Here’s an example con-
taining fill, align, and width:
{:->24}
◗ The colon (:) is the first item to appear inside the print-field spec when you’re
working with the format method (but not the global format function).
◗ After the colon, a fill and an align character appear. The minus sign (-) is
the fill character here, and the alignment is right justification (>).
◗ After fill and align are specified, the print-field width of 24 is given.
Because the argument to be printed ('Hey Bill G, pick me!') is 20 characters
in length but the print-field width is 24 characters, four copies of the fill
character, a minus sign in this case, are used for padding.
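The following sketch applies that print field to the 20-character string and, for comparison, swaps in the left-justify (<) and center (^) align characters with the same fill and width.

```python
s = 'Hey Bill G, pick me!'          # 20 characters

print('{:->24}'.format(s))          # ----Hey Bill G, pick me!
print('{:-<24}'.format(s))          # Hey Bill G, pick me!----
print('{:-^24}'.format(s))          # --Hey Bill G, pick me!--
```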
The fill character can be any character other than a curly brace. Note that
if you want to pad a number with zeros, you can alternatively use the '0' speci-
fier described in Section 5.8.4, “The Leading Zero Character (0).”
The align character must be one of the four values listed in Table 5.3.
Note: Remember (and sorry if we’re getting a little redundant about this), all the
examples for the spec grammar apply to the global format function as well.
But the format function, as opposed to the format method, does not use curly
braces to create multiple print fields. It works on only one print field at a time.
Here’s an example:
print(format('Lady', '@<7')) # Print 'Lady@@@'
Notice how there’s an extra space in front of the first occurrence of 25,
even though it’s nonnegative; however, if the print fields had definite widths
assigned—which they do not in this case—that character would produce no
difference.
This next example applies the same formatting to three negative values (-25).
print('results>{: },{:+},{:-}'.format(-25, -25, -25))
This example prints the following output, illustrating that negative num-
bers are always printed with a minus sign.
results>-25,-25,-25
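Running the same format string on positive and negative values side by side makes the effect of the three sign characters easier to see. A brief sketch, in which fmt is an illustrative name:

```python
fmt = 'results>{: },{:+},{:-}'

# Space: blank for nonnegative values; +: always show a sign; -: sign only if negative.
print(fmt.format(25, 25, 25))       # results> 25,+25,25
print(fmt.format(-25, -25, -25))    # results>-25,-25,-25
```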
This prints
0000125 0000025156.
Here’s another example:
print('{:08}'.format(375)) # This prints 00000375
The same results could have been achieved by using fill and align char-
acters, but because you can’t specify fill without also explicitly specifying
align, that approach is slightly more verbose.
fss = '{:0>7} {:0>10}'
Although these two approaches—specifying 0 as fill character and specify-
ing a leading zero—are often identical in effect, there are situations in which
the two cause different results. A fill character is not part of the number itself
and is therefore not affected by the comma, described in the next section.
There’s also interaction with the plus/minus sign. If you try the following,
you’ll see a difference in the location where the plus sign (+) gets printed.
print('{:0>+10} {:+010}'.format(25, 25))
This example prints
0000000+25 +000000025
This example prints
The amount on the check was $***4,500,000
The print width of 12 includes room for the number that was printed,
including the commas (a total of nine characters); therefore, this example uses
three fill characters. The fill character in this case is an asterisk (*). The dollar
sign ($) is not part of this calculation because it is a literal character and is
printed as is.
If there is a leading-zero character as described in Section 5.8.4 (as opposed to
a 0 fill character), the zeros are also grouped with commas. Here’s an example:
print('The amount is {:011,}'.format(13000))
This example prints
The amount is 000,013,000
In this case, the leading zeros are grouped with commas, because all the
zeros are considered part of the number itself.
A print-field size of 12 (or any other multiple of 4) creates a conflict with
the comma, because an initial comma cannot be part of a valid number.
Therefore, Python adds an additional leading zero in that special case.
n = 13000
print('The amount is {:012,}'.format(n))
This prints
The amount is 0,000,013,000
But if 0 is specified as a fill character instead of as a leading zero, the zeros
are not considered part of the number and are not grouped with commas.
Note the placement of the 0 here relative to the right justify (>) sign. This time
it’s just to the left of this sign.
print('The amount is {:0>11,}'.format(n))
This prints
The amount is 0000013,000
.precision
Here are some simple examples in which precision is used to limit the total
number of digits printed.
pi = 3.14159265
phi = 1.618
     22.100
   1000.007
Notice how well things line up in this case. In this context (with the f type
specifier) the precision specifies not the total number of digits but the number
of digits just to the right of the decimal point—which are padded with trailing
zeros if needed.
The example can be combined with other features, such as the thousands
separator, which comes after the width but before precision. Therefore, in this
example, each comma comes right after 10, the width specifier.
fss = ' {:10,.3f}\n {:10,.3f}'
print(fss.format(22333.1, 1000.007))
This example prints
 22,333.100
  1,000.007
The fixed-point format f, in combination with width and precision, is
useful for creating tables in which the numbers line up. Here’s an example:
fss = ' {:10.2f}'
for x in [22.7, 3.1415, 555.5, 29, 1010.013]:
    print(fss.format(x))
◗ The fill and align characters are * and <, respectively. The < symbol spec-
ifies left justification, so asterisks are used for padding on the right, if needed.
◗ The width character is 6, so any string shorter than 6 characters in length is
padded after being left justified.
◗ The precision (the character after the dot) is also 6, so any string longer
than 6 characters is truncated.
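The combination described in these bullets, fill *, align <, width 6, and precision 6, can be sketched as follows. The sample names are illustrative; strings shorter than six characters are padded, and longer ones are truncated.

```python
fss = '{:*<6.6}'

print(fss.format('Tom'))         # Tom***
print(fss.format('Mike'))        # Mike**
print(fss.format('Rodney'))      # Rodney
print(fss.format('Hannibal'))    # Hannib
```

Every result is exactly six characters long, which is what makes this combination useful for fixed-width columns of text.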
5.8.8 “Type” Specifiers
The last item in the spec syntax is the type specifier, which influences how
the data to be printed is interpreted. It’s limited to one character and has one
of the values listed in Table 5.5.
Placement: When the type specifier is used, it’s the very last item in the
spec syntax.
The next five sections illustrate specific uses of the type specifier.
5.8.11 Displaying Percentages
A common use of formatting is to turn a number into a percentage—for exam-
ple, displaying 0.5 as 50% and displaying 1.25 as 125%. You can perform that
task yourself, but the % type specifier automates the process.
The percent format character (%) multiplies the value by 100 and then
appends a percent sign. Here’s an example:
print('You own {:%} of the shares.'.format(.517))
This example prints
You own 51.700000% of the shares.
If a precision is used in combination with the % type specifier, the preci-
sion controls the number of digits to the right of the decimal point as usual—
but after first multiplying by 100. Here’s an example:
print('{:.2%} of {:.2%} of 40...'.format(0.231, 0.5))
This prints
23.10% of 50.00% of 40...
As with fixed-point format, if you want to print percentages so that they
line up nicely in a table, then specify both width and precision specifiers.
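As a short sketch of that suggestion, the following loop prints percentages in a field eight characters wide with two digits after the decimal point. The values and the names shares and rows are assumptions made for the illustration.

```python
shares = [0.231, 0.5, 1.25]

rows = ['{:8.2%}'.format(x) for x in shares]
for row in rows:
    print(row)
# Prints:
#   23.10%
#   50.00%
#  125.00%
```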
The way in which arguments are applied with this method is slightly differ-
ent from the way they work with the formatting operator (Section 5.3).
The difference is this: When you use the format method this way, the data
object comes first in the list of arguments; the expressions that alter format-
ting come immediately after. This is true even with multiple print fields. For
example:
>>> '{:{}} {:{}}!'.format('Hi', 3, 'there', 7)
'Hi  there  !'
Note that with this technology, strings are left justified by default.
The use of position numbers to clarify order is recommended. Use of these
numbers helps keep the meaning of the expressions clearer and more predictable.
The example just shown could well be revised so that it uses the following
expression:
>>> '{0:{1}} {2:{3}}!'.format('Hi', 3, 'there', 7)
'Hi  there  !'
The meaning of the format is easier to interpret with the position numbers.
By looking at the placement of the numbers in this example, you should be
able to see that position indexes 0 and 2 refer to the data objects (the first and
third arguments to format), whereas indexes 1 and 3 refer to the arguments
that supply their print-field widths.
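A typical reason to use a variable-length print field is that the width is not known until run time. The sketch below, which uses illustrative names, computes the width from the data and passes it as the second argument.

```python
words = ['Hi', 'there', 'everybody']
width = max(len(w) for w in words)    # widest entry: 9

for w in words:
    # Index 0 is the data object; index 1 supplies the width.
    print('[{0:{1}}] [{0:>{1}}]'.format(w, width))
# Prints:
# [Hi       ] [       Hi]
# [there    ] [    there]
# [everybody] [everybody]
```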
Chapter 5 Summary
The Python core language provides three techniques for formatting out-
put strings. One is to use the string-class formatting operator (%) on display
strings; these strings contain print-field specifiers similar to those used in the
C language, with “printf” functions.
The second technique involves the format function. This approach allows
you to specify not only things such as width and precision, but also thousands
place grouping and handling of percentages.
The third technique, the format method of the string class, builds on the
global format function but provides the most flexibility of all with multiple
print fields.
table that lines up floating-point numbers in a nice column?
7 What features of the format method do you need, at minimum, to print a
table that lines up floating-point numbers in a nice column?
8 Cite at least one example in which repr and str provide a different represen-
tation of a piece of data. Why does the repr version print more characters?
9 The format method enables you to specify a zero (0) as a fill character or as a
leading zero to numeric expressions. Is this entirely redundant syntax? Or can
you give at least one example in which the result might be different?
10 Of the three techniques—format operator (%), global format function, and
format method of the string class—which support the specification of
variable-length print fields?
2 Write a two-dimensional array program that does the following: Take integer
input in the form of five rows of five columns each. Then, by looking at the
maximum print width needed by the entire set (that is, the number of digits in
the biggest number), determine the ideal print width for every cell in the table.
This should be a uniform width, but one that contains the largest entry in the
table. Use variable-length print fields to print this table.
3 Do the same application just described but for floating-point numbers. The
printing of the table should output all the numbers in nice-looking columns.
Note: Regular expression syntax has a variety of flavors. The Python
regular-expression package conforms to the Perl standard, which is an
advanced and flexible version.
But what if you wanted to match a larger set of words? For example, let’s
say you wanted to match the following combination of letters:
✱ The asterisk (*) modifies the meaning of the expression immediately preced-
ing it, so the a, together with the *, matches zero or more “a” characters.
You can break this down syntactically, as shown in Figure 6.1. The literal
characters “c” and “t” each match a single character, but a* forms a unit that
says, “Match zero or more occurrences of ‘a’.”
[Figure 6.1. Syntactic breakdown of ca*t. Surviving label: Match “c” exactly.]
The plus sign (+), introduced earlier, works in a similar way. The plus sign,
together with the character or group that precedes it, means “Match one or
more instances of this expression.”
r'string' or
r"string"
After prompting the user for input, the program then calls the match
function, which is qualified as re.match because it is imported from the re
package.
re.match(pattern, s)
If the pattern argument matches the target string (s in this case), the func-
tion returns a match object; otherwise it returns the value None, which con-
verts to the Boolean value False.
You can therefore use the value returned as if it were a Boolean value. If a
match is found, the match object is treated as True; otherwise, None is
treated as False.
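A brief sketch of that behavior follows. It assumes the hyphenated phone-number pattern used in this section; the test strings are illustrative.

```python
import re

pattern = r'\d\d\d-\d\d\d-\d\d\d\d'

m = re.match(pattern, '555-123-5000')
print(bool(m))          # True: a match object was returned
print(m.group())        # 555-123-5000

miss = re.match(pattern, 'no digits here')
print(miss)             # None
print(bool(miss))       # False

# Because of this, the result can be tested directly in an if statement.
if re.match(pattern, '555-123-5000'):
    print('Number accepted.')
```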
Note: If you forget to include r (the raw-string indicator), this particular
example still works, but your code will be more reliable if you always use the r when
specifying regular-expression patterns. Python string interpretation does not
work precisely the way C/C++ string interpretation does. In those languages,
If you want to restrict positive results to exact matches—so that the entire
string has to match the pattern with nothing left over—you can add the spe-
cial character $, which means “end of string.” This character causes the match
to fail if any additional text is detected beyond the specified pattern.
pattern = r'\d\d\d-\d\d\d-\d\d\d\d$'
There are other ways you might want to refine the regular-expression pat-
tern. For example, you might want to permit input matching either of the fol-
lowing formats:
555-123-5000
555 123 5000
To accommodate both these patterns, you need to create a character set,
which allows for more than one possible value in a particular position. For
example, the following expression says to match either an “a” or a “b”, but not
both:
[ab]
It’s possible to put many characters in a character set. But only one of
the characters will be matched at a time. For example, the following range
matches exactly one character: an “a”, “b”, “c”, or “d” in the next position.
[abcd]
Likewise, the following expression says that either a space or a minus sign
(-) can be matched, which is what we want in this case:
[ -]
In this context, the square brackets are the only special characters; the two
characters inside are literal, and exactly one of them will be matched. The
minus sign often has a special meaning within square brackets, but not when
it appears at the very beginning or end of the characters inside the brackets.
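The following sketch checks this behavior on a deliberately small pattern: one digit, one separator from the set, one digit. It also contrasts a minus sign used as a range with one placed first so that it is taken literally. The name sep is illustrative.

```python
import re

sep = r'\d[ -]\d'
print(bool(re.match(sep, '5-1')))       # True: minus sign accepted
print(bool(re.match(sep, '5 1')))       # True: space accepted
print(bool(re.match(sep, '5.1')))       # False: dot is not in the set

# Between two characters the minus sign forms a range; at the front it is literal.
print(bool(re.match(r'[a-c]', 'b')))    # True: b lies in the range a to c
print(bool(re.match(r'[-ac]', 'b')))    # False: the set holds only -, a, and c
print(bool(re.match(r'[-ac]', '-')))    # True
```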
Here’s the full regular expression we need:
pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d$'
Now, putting everything together with the refined pattern we’ve come up
with in this section, here’s the complete example:
import re
pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d$'
expression pattern. It’s a good idea to become familiar with all of them. These
include most punctuation characters, such as + and *.
◗ Any characters that do not have special meaning to the Python regular-
expression interpreter are considered literal characters. The regular-expression
interpreter attempts to match these exactly.
◗ The backslash can be used to “escape” special characters, making them into
literal characters. The backslash can also add special meaning to certain ordinary
characters—for example, causing \d to mean “any digit” rather than a “d”.
numbers. This pattern looks for three digits, a minus sign, two digits, another
minus sign, and then four digits.
import re
pattern = r'\d\d\d-\d\d-\d\d\d\d$'
(Start) 1 --c--> 2 --b--> 3 (Done!)
State 2 loops back to itself on “a”.
Figure 6.2. State machine for ca*b
The following list describes how the program traverses this state machine
to find a match at run time. Position 1 is the starting point.
◗ A character is read. If it’s a “c”, the machine goes to state 2. Reading any other
character causes failure.
◗ From state 2, either an “a” or a “b” can be read. If an “a” is read, the machine
stays in state 2. It can do this any number of times. If a “b” is read, the machine
transitions to state 3. Reading any other character causes failure.
◗ If the machine reaches state 3, it is finished, and success is reported.
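The traversal just listed can be written out by hand. The function below is only a sketch of the three-state machine for ca*b, not a description of how the re package is implemented; the function name is hypothetical. The loop at the end compares its answers with re.match.

```python
import re

def match_ca_star_b(s):
    """Traverse the three-state machine for ca*b by hand."""
    state = 1
    for ch in s:
        if state == 1:
            if ch == 'c':
                state = 2          # read "c": go to state 2
            else:
                return False       # any other character causes failure
        elif state == 2:
            if ch == 'a':
                state = 2          # read "a": stay in state 2
            elif ch == 'b':
                state = 3          # read "b": go to state 3
            else:
                return False
        if state == 3:
            return True            # state 3 reached: success
    return False                   # ran out of input before state 3

for s in ['cb', 'cab', 'caab', 'caaxxb', 'c', 'b', '']:
    print(repr(s), match_ca_star_b(s), bool(re.match(r'ca*b', s)))
```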
This state machine illustrates some basic principles, simple though it is. In
particular, a state machine has to be compiled and then later traversed at run
time.
Note: The state machine diagrams in this chapter assume DFAs (deterministic
finite automata), whereas Python actually uses NFAs (nondeterministic
finite automata). This makes no difference to you unless you’re implementing
a regular-expression evaluator, something you’ll likely never need to do.
So if that’s the case, you can ignore the difference between DFAs and NFAs!
You’re welcome.
Here’s what you need to know: If you’re going to use the same
regular-expression pattern multiple times, it’s a good idea to compile that pattern into
a regular-expression object and then use that object repeatedly. The re
package provides a function for this purpose called compile.
regex_object_name = re.compile(pattern)
Here’s a full example using the compile function to create a regular expres-
sion object called reg1.
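import re

# The line defining reg1 is missing here; the pattern ca*b (as in Figure 6.2) is assumed.
reg1 = re.compile(r'ca*b')

def test_item(s):
    if re.match(reg1, s):
        print(s, 'is a match.')
    else:
        print(s, 'is not a match!')

test_item('caab')
test_item('caaxxb')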
(Start) 1 --c--> 2 --a--> 3 --b--> 4 (Done!)
State 3 loops back to itself on “a”.
Figure 6.3. State machine for ca+b
Given this pattern, “cb” is not a successful match, but “cab”, “caab”, and
“caaab” are. This state machine requires the reading of at least one “a”. After
that, matching further “a” characters is optional, but it can match as many
instances of “a” in a row as it finds.
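These four cases are easy to confirm with a compiled pattern. The sketch below uses the match method of the compiled object; the name reg_plus is illustrative.

```python
import re

reg_plus = re.compile(r'ca+b')

for s in ['cb', 'cab', 'caab', 'caaab']:
    print(s, bool(reg_plus.match(s)))
# Prints:
# cb False
# cab True
# caab True
# caaab True
```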
(Start) 1 --a--> 2 --x--> 4 (Done!)
(Start) 1 --y--> 3 --z--> 4 (Done!)
Figure 6.4. State machine for (ax)|(yz)
order of evaluation. With these parentheses, the alternation operator is
interpreted to mean “either x or y but not both.”
a(x|y)z
The parentheses and the | symbol are all special characters. Figure 6.5
illustrates the state machine that is compiled from the expression a(x|y)z.
(Start) 1 --a--> 2 --x or y--> 3 --z--> 4 (Done!)
Figure 6.5. State machine for a(x|y)z
This behavior is the same as that for the following expression, which uses a
character set rather than alternation:
a[xy]z
Is there a difference between alternation and a character set? Yes: A character
set always matches one character of text (although it may be part of a more
complex pattern, of course). Alternation, in contrast, may involve groups
longer than a single character. For example, the following pattern matches either
“cat” or “dog” in its entirety, but not “catdog”:
cat|dog
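The comparison can be sketched with re.fullmatch, which succeeds only if the entire string matches the pattern. The choice of fullmatch rather than match is an assumption made here so that “in its entirety” is enforced without adding $ to each pattern.

```python
import re

# The group and the character set accept exactly the same strings.
for s in ['axz', 'ayz', 'az', 'axyz']:
    with_group = bool(re.fullmatch(r'a(x|y)z', s))
    with_set = bool(re.fullmatch(r'a[xy]z', s))
    print(s, with_group, with_set)

# The alternatives can be longer than one character.
for s in ['cat', 'dog', 'catdog']:
    print(s, bool(re.fullmatch(r'cat|dog', s)))
# The second loop prints:
# cat True
# dog True
# catdog False
```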
powerful as this language is, it can be broken down into a few major elements.
◗ Meta characters: These are tools for specifying either a specific character or
one of a number of characters, such as “any digit” or “any alphanumeric char-
acter.” Each of these characters matches one character at a time.
◗ Character sets: This part of the syntax also matches one character at a time—
in this case, giving a set of values from which to match.
◗ Expression quantifiers: These are operators that enable you to combine indi-
vidual characters, including wildcards, into patterns of expressions that can
be repeated any number of times.
◗ Groups: You can use parentheses to combine smaller expressions into larger
ones.
r'c[aeiou]t'
This matches any of the following:
cat
cet
cit
cot
cut
We can combine character sets with other operators, such as +, which retains
its usual meaning outside the square brackets. So consider
c[aeiou]+t
This matches any of the following, as well as many other possible strings:
cat
ciot
ciiaaet
caaauuuut
ceeit
Within a character set, the minus sign (-) specifies a range of characters
when it appears between two other characters. Otherwise, it is treated as a
literal character.
For example, the following range matches any character from lowercase
“a” to lowercase “n”:
[a-n]
This range therefore matches an “a”, “b”, “c”, up to an “l”, “m”, or “n”. If
the IGNORECASE flag is enabled, it also matches uppercase versions of these
letters.
The following matches any uppercase or lowercase letter, or digit. Unlike
“\w,” however, this character set does not match an underscore (_).
[A-Za-z0-9]
The following matches any hexadecimal digit: a digit from 0 to 9 or an
uppercase or lowercase letter in the range “A”, “B”, “C”, “D”, “E”, and “F”.
[A-Fa-f0-9]
Character sets observe some special rules.
◗ Almost all characters within square brackets ([ ]) lose their special meaning,
except where specifically mentioned here. Therefore, almost everything is
interpreted literally.
◗ A closing square bracket has special meaning, terminating the character set;
therefore, a closing bracket must be escaped with a backslash to be interpreted
literally: “\]”
◗ The minus sign (-) has special meaning unless it occurs at the very beginning
or end of the character set, in which case it is interpreted as a literal minus
sign. Likewise, a caret (^) has special meaning at the beginning of a character
set but not elsewhere.
◗ The backslash (\), even in this context, must be escaped to be represented lit-
erally. Use “\\” to represent a backslash.
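Each of these rules can be checked with a one-line test. The sketch below uses re.fullmatch on single-character strings; the specific characters chosen are illustrative, and the caret at the start of a set is shown in its usual role of negating the set.

```python
import re

# Punctuation such as + * . is literal inside square brackets.
print(bool(re.fullmatch(r'[+*.]', '*')))     # True

# A closing bracket must be escaped to be taken literally.
print(bool(re.fullmatch(r'[\]]', ']')))      # True

# A caret is special only at the beginning, where it negates the set.
print(bool(re.fullmatch(r'[a^]', '^')))      # True: literal caret
print(bool(re.fullmatch(r'[^a]', 'b')))      # True: anything except "a"
print(bool(re.fullmatch(r'[^a]', 'a')))      # False

# A backslash is represented as \\ even inside a character set.
print(bool(re.fullmatch(r'[\\]', '\\')))     # True
```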
6.6.3 Pattern Quantifiers
All of the quantifiers in Table 6.3 are expression modifiers, and not expression
extenders. Section 6.6.4 discusses in detail what the implications of “greedy”
matching are.
The next-to-last quantifier listed in Table 6.3 is the use of parentheses for
creating groups. Grouping can dramatically affect the meaning of a pattern.
Putting items in parentheses also creates tagged groups for later reference.
The use of the numeric quantifiers from Table 6.3 makes some expres-
sions easier to render, or at least more compact. For example, consider the
phone-number verification pattern introduced earlier.
r'\d\d\d-\d\d\d-\d\d\d\d'
This can be revised as
r'\d{3}-\d{3}-\d{4}'
This example saves a few keystrokes of typing, but other cases might save
quite a bit more. Using these features also creates code that is more readable
and easier to maintain.
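As a quick check that the two spellings behave alike, the sketch below runs both patterns over a few illustrative strings and compares the results. The names long_pat and short_pat are hypothetical.

```python
import re

long_pat = r'\d\d\d-\d\d\d-\d\d\d\d'
short_pat = r'\d{3}-\d{3}-\d{4}'

for s in ['555-123-5000', '55-123-5000', '555-1234-500']:
    same = bool(re.match(long_pat, s)) == bool(re.match(short_pat, s))
    print(s, bool(re.match(short_pat, s)), same)
# Prints:
# 555-123-5000 True True
# 55-123-5000 False True
# 555-1234-500 False True
```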
because of the parentheses; specifically, it’s the group “ab” that is repeated.
c(ab)+
Match “c” exactly.
Now let’s say you’re employed to write these tests. If you use regular expres-
sions, this job will be easy for you—a delicious piece of cake.
The following verification function performs the necessary tests. We can
implement the five rules by using four patterns and performing re.match
with each.
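import re

pat1 = r'(\w|[@#$%^&*!]){8,}$'   # eight or more allowed characters, nothing else
pat2 = r'.*\d'                   # at least one digit
pat3 = r'.*[a-zA-Z]'             # at least one letter
pat4 = r'.*[@#$%^&*!]'           # at least one of the allowed punctuation characters

def verify_passwd(s):
    b = (re.match(pat1, s) and re.match(pat2, s) and
         re.match(pat3, s) and re.match(pat4, s))
    return bool(b)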
The verify_passwd function applies four different match criteria to a
target string, s. The re.match function is called with each of four different
patterns, pat1 through pat4. If all four matches succeed, the result is “true.”
The first pattern accepts any character that is a letter, digit, or underscore
or a character in the set @#$%^&*! . . . and then it requires a match of
eight or more of such characters.
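A few trial calls show how the four patterns combine. The sketch below repeats the function so that it runs on its own; it assumes the fourth pattern uses the same punctuation set as the first, and it uses all() in place of the chained and expressions. The sample passwords are illustrative.

```python
import re

pat1 = r'(\w|[@#$%^&*!]){8,}$'   # only allowed characters, at least eight
pat2 = r'.*\d'                   # contains a digit
pat3 = r'.*[a-zA-Z]'             # contains a letter
pat4 = r'.*[@#$%^&*!]'           # contains punctuation (set assumed to match pat1)

def verify_passwd(s):
    # True only if every one of the four patterns matches.
    return all(re.match(p, s) for p in (pat1, pat2, pat3, pat4))

for pw in ['abc123!!', 'abc12345', 'abcdefg!', 'ab1!', 'abc 123!!']:
    print(repr(pw), verify_passwd(pw))
# Prints:
# 'abc123!!' True
# 'abc12345' False   (no punctuation)
# 'abcdefg!' False   (no digit)
# 'ab1!' False       (too short)
# 'abc 123!!' False  (space is not an allowed character)
```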