Learning Python - Fourth Edition
FOURTH EDITION
Learning Python
Mark Lutz
Editor: Julie Steele
Production Editor: Sumita Mukherji
Copyeditor: Rachel Head
Production Services: Newgen North America
Indexer: John Bickelhaupt
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
March 1999: First Edition.
December 2003: Second Edition.
October 2007: Third Edition.
September 2009: Fourth Edition.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Learning Python, the image of a wood rat, and related trade dress are trademarks of O'Reilly Media, Inc. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
Table of Contents
Preface

Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Embedding Calls
Frozen Binary Executables
Text Editor Launch Options
Still Other Launch Options
Future Possibilities?
Which Option Should I Use?
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
Test Your Knowledge: Part I Exercises

User-Defined Classes
And Everything Else
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
7. Strings
String Literals
Single- and Double-Quoted Strings Are the Same
Escape Sequences Represent Special Bytes
Raw Strings Suppress Escapes
Triple Quotes Code Multiline Block Strings
Strings in Action
Basic Operations
Indexing and Slicing
String Conversion Tools
Changing Strings
String Methods
String Method Examples: Changing Strings
String Method Examples: Parsing Text
Other Common String Methods in Action
The Original string Module (Gone in 3.0)
String Formatting Expressions
Advanced String Formatting Expressions
Dictionary-Based String Formatting Expressions
String Formatting Method Calls
The Basics
Adding Keys, Attributes, and Offsets
Adding Specific Formatting
Comparison to the % Formatting Expression
Why the New Format Method?
General Type Categories
Types Share Operation Sets by Categories
Mutable Types Can Be Changed In-Place
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
More Dictionary Methods
A Languages Table
Dictionary Usage Notes
Other Ways to Make Dictionaries
Dictionary Changes in Python 3.0
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

A Tale of Two ifs
What Python Adds
What Python Removes
Why Indentation Syntax?
A Few Special Cases
A Quick Example: Interactive Loops
A Simple Interactive Loop
Doing Math on User Inputs
Handling Errors by Testing Inputs
Handling Errors with try Statements
Nesting Code Three Levels Deep
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Truth Tests
The if/else Ternary Expression
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Dictionary View Iterators
Other Iterator Topics
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Timing Module Alternatives
Other Suggestions
Function Gotchas
Local Names Are Detected Statically
Defaults and Mutable Objects
Functions Without returns
Enclosing Scope Loop Variables
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
Test Your Knowledge: Part IV Exercises
Part V. Modules
21. Modules: The Big Picture
Why Use Modules?
Python Program Architecture
How to Structure a Program
Imports and Attributes
Standard Library Modules
How Imports Work
1. Find It
2. Compile It (Maybe)
3. Run It
The Module Search Path
Configuring the Search Path
Search Path Variations
The [Link] List
Module File Selection
Advanced Module Selection Concepts
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
Cross-File Name Changes
import and from Equivalence
Potential Pitfalls of the from Statement
Module Namespaces
Files Generate Namespaces
Attribute Name Qualification
Imports Versus Scopes
Namespace Nesting
Reloading Modules
reload Basics
reload Example
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Modules Are Objects: Metaprograms
Importing Modules by Name String
Transitive Module Reloads
Module Design Concepts
Module Gotchas
Statement Order Matters in Top-Level Code
from Copies Names but Doesn't Link
from * Can Obscure the Meaning of Variables
reload May Not Impact from Imports
reload, from, and Interactive Testing
Recursive from Imports May Not Work
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
Test Your Knowledge: Part V Exercises
Classes Versus Dictionaries
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Example
Methods
Method Example
Calling Superclass Constructors
Other Method Call Possibilities
Inheritance
Attribute Tree Construction
Specializing Inherited Methods
Class Interface Techniques
Abstract Superclasses
Python 2.6 and 3.0 Abstract Superclasses
Namespaces: The Whole Story
Simple Names: Global Unless Assigned
Attribute Names: Object Namespaces
The Zen of Python Namespaces: Assignments Classify Names
Namespace Dictionaries
Namespace Links
Documentation Strings Revisited
Classes Versus Modules
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

The 2.6 __cmp__ Method (Removed in 3.0)
Boolean Tests: __bool__ and __len__
Object Destruction: __del__
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Why the Special Methods?
Static Methods in 2.6 and 3.0
Static Method Alternatives
Using Static and Class Methods
Counting Instances with Static Methods
Counting Instances with Class Methods
Decorators and Metaclasses: Part 1
Function Decorator Basics
A First Function Decorator Example
Class Decorators and Metaclasses
For More Details
Class Gotchas
Changing Class Attributes Can Have Side Effects
Changing Mutable Class Attributes Can Have Side Effects, Too
Multiple Inheritance: Order Matters
Methods, Classes, and Nested Scopes
Delegation-Based Classes in 3.0: __getattr__ and built-ins
Overwrapping-itis
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
Test Your Knowledge: Part VI Exercises
Example: Default Behavior
Example: Catching Built-in Exceptions
The try/finally Statement
Example: Coding Termination Actions with try/finally
Unified try/except/finally
Unified try Statement Syntax
Combining finally and except by Nesting
Unified try Example
The raise Statement
Propagating Exceptions with raise
Python 3.0 Exception Chaining: raise from
The assert Statement
Example: Trapping Constraints (but Not Errors!)
with/as Context Managers
Basic Usage
The Context Management Protocol
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Functions Can Signal Conditions with raise
Closing Files and Server Connections
Debugging with Outer try Statements
Running In-Process Tests
More on sys.exc_info
Exception Design Tips and Gotchas
What Should Be Wrapped
Catching Too Much: Avoid Empty except and Exception
Catching Too Little: Use Class-Based Categories
Core Language Summary
The Python Toolset
Development Tools for Larger Projects
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
Test Your Knowledge: Part VII Exercises

Using Text and Binary Files
Text File Basics
Text and Binary Modes in 3.0
Type and Content Mismatches
Using Unicode Files
Reading and Writing Unicode in 3.0
Handling the BOM in 3.0
Unicode Files in 2.6
Other String Tool Changes in 3.0
The re Pattern Matching Module
The struct Binary Data Module
The pickle Object Serialization Module
XML Parsing Tools
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Using __getattribute__ to Validate
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers

Decorator Arguments Versus Function Annotations
Other Applications: Type Testing (If You Insist!)
Chapter Summary
Test Your Knowledge: Quiz
Test Your Knowledge: Answers
Preface
This book provides an introduction to the Python programming language. Python is a popular open source programming language used for both standalone programs and scripting applications in a wide variety of domains. It is free, portable, powerful, and remarkably easy and fun to use. Programmers from every corner of the software industry have found Python's focus on developer productivity and software quality to be a strategic advantage in projects both large and small.

Whether you are new to programming or are a professional developer, this book's goal is to bring you quickly up to speed on the fundamentals of the core Python language. After reading this book, you will know enough about Python to apply it in whatever application domains you choose to explore.

By design, this book is a tutorial that focuses on the core Python language itself, rather than specific applications of it. As such, it's intended to serve as the first in a two-volume set:

• Learning Python, this book, teaches Python itself.
• Programming Python, among others, shows what you can do with Python after you've learned it.

That is, applications-focused books such as Programming Python pick up where this book leaves off, exploring Python's role in common domains such as the Web, graphical user interfaces (GUIs), and databases. In addition, the book Python Pocket Reference provides additional reference materials not included here, and it is designed to supplement this book.

Because of this book's foundations focus, though, it is able to present Python fundamentals with more depth than many programmers see when first learning the language. And because it's based upon a three-day Python training class with quizzes and exercises throughout, this book serves as a self-paced introduction to the language.
Many popular Python libraries and tools will likely be available for Python 3.0 by the time you read these words, especially given the file I/O performance improvements expected in the upcoming 3.1 release. If you are using a system based on Python 2.X, however, you'll find that this book addresses your concerns, too, and will help you migrate to 3.0 in the future. By proxy, this edition addresses other Python version 2 and 3 releases as well, though some older version 2.X code may not be able to run all the examples here. Although class decorators are available in both Python 2.6 and 3.0, for example, you cannot use them in an older Python 2.X that did not yet have this feature. See Tables P-1 and P-2 later in this Preface for summaries of 2.6 and 3.0 changes.
Shortly before going to press, this book was also augmented with notes about prominent extensions in the upcoming Python 3.1 release: comma separators and automatic field numbering in string format method calls, multiple context manager syntax in with statements, new methods for numbers, and so on. Because Python 3.1 was targeted primarily at optimization, this book applies directly to this new release as well. In fact, because Python 3.1 supersedes 3.0, and because the latest Python is usually the best Python to fetch and use anyhow, in this book the term "Python 3.0" generally refers to the language variations introduced by Python 3.0 but that are present in the entire 3.X line.
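As a rough illustration of the 3.1 extensions just listed, the following sketch shows each one in a line or two. The variable names and sample values are made up for this example, and int.bit_length stands in here as one of the new number methods; run it under Python 3.1 or later.

```python
import io

# Comma separators in format method calls (3.1 and later)
with_commas = '{0:,}'.format(1234567)

# Automatic field numbering: empty {} fields are numbered left to right
auto_numbered = '{} and {}'.format('spam', 'eggs')

# Multiple context managers in a single with statement
with io.StringIO('first') as f1, io.StringIO('second') as f2:
    both = (f1.read(), f2.read())

# One of the new number methods: bits needed to represent an integer
bits = (255).bit_length()

print(with_commas, auto_numbered, both, bits)
```

In-memory io.StringIO objects are used in place of real files only so that the example runs without touching the filesystem.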
New Chapters
Although the main purpose of this edition is to update the examples and material from the preceding edition for 3.0 and 2.6, I've also added five new chapters to address new topics and add context:

• Chapter 27 is a new class tutorial, using a more realistic example to explore the basics of Python object-oriented programming (OOP).
• Chapter 36 provides details on Unicode and byte strings and outlines string and file differences between 3.0 and 2.6.
• Chapter 37 collects managed attribute tools such as properties and provides new coverage of descriptors.
• Chapter 38 presents function and class decorators and works through comprehensive examples.
• Chapter 39 covers metaclasses and compares and contrasts them with decorators.

The first of these chapters provides a gradual, step-by-step tutorial for using classes and OOP in Python. It's based upon a live demonstration I have been using in recent years in the training classes I teach, but has been honed here for use in a book. The chapter is designed to show OOP in a more realistic context than earlier examples and to illustrate how class concepts come together into larger, working programs. I hope it works as well here as it has in live classes.

The last four of these new chapters are collected in a new final part of the book, "Advanced Topics." Although these are technically core language topics, not every Python programmer needs to delve into the details of Unicode text or metaclasses. Because of this, these four chapters have been separated out into this new part, and are officially optional reading. The details of Unicode and binary data strings, for example, have been moved to this final part because most programmers use simple ASCII strings and don't need to know about these topics. Similarly, decorators and metaclasses are specialist topics that are usually of more interest to API builders than application programmers.

If you do use such tools, though, or use code that does, these new advanced topic chapters should help you master the basics. In addition, these chapters' examples include case studies that tie core language concepts together, and they are more substantial than those in most of the rest of the book. Because this new part is optional reading, it has end-of-chapter quizzes but no end-of-part exercises.
edition is somewhat more advanced, because Python is somewhat more advanced. As for Python 3.0 itself, though, you're probably better off discovering most of this book's changes for yourself, rather than reading about them further in this Preface.
Extension
Exception chaining in 3.0: raise e2 from e1
Reserved word changes in 2.6 and 3.0
New-style class cutover in 3.0
Property decorators in 2.6 and 3.0: @property
Descriptor use in 2.6 and 3.0
Metaclass use in 2.6 and 3.0
Abstract base classes support in 2.6 and 3.0
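Two of the extensions named above, exception chaining with raise ... from and the @property decorator, are compact enough to sketch briefly. The Temperature class and parse_reading function below are hypothetical names invented for this illustration; they are not taken from the book's examples.

```python
class Temperature:
    """Hypothetical class illustrating the @property decorator."""

    def __init__(self, celsius):
        self._celsius = celsius

    @property
    def fahrenheit(self):
        # Computed on each attribute fetch; no call parentheses needed
        return self._celsius * 9 / 5 + 32


def parse_reading(text):
    """Convert text to a float, chaining the original error on failure."""
    try:
        return float(text)
    except ValueError as e1:
        # 3.0 exception chaining: e1 is recorded as the cause of the new error
        raise RuntimeError('bad sensor reading') from e1


boiling = Temperature(100).fahrenheit

try:
    parse_reading('n/a')
except RuntimeError as e2:
    cause = e2.__cause__      # the original ValueError

print(boiling, repr(cause))
```

The chained exception keeps the original ValueError available in __cause__, so an error report can show both the low-level failure and the higher-level one.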
Removed                   Replacement
reload(M)                 [Link](M) (or exec)
apply(f, ps, ks)          f(*ps, **ks)
`X`                       repr(X)
X <> Y                    X != Y
long                      int
9999L                     9999
D.has_key(K)              K in D (or [Link](key) != None)
raw_input                 input
old input                 eval(input())
xrange                    range
file                      open (and io module classes)
[Link]                    X.__next__, called by next(X)
X.__getslice__            X.__getitem__ passed a slice object
X.__setslice__            X.__setitem__ passed a slice object
reduce                    [Link] (or loop code)
execfile(filename)        exec(open(filename).read())
exec open(filename)       exec(open(filename).read())
0777                      0o777
print x, y                print(x, y)
Removed                         Replacement                                       Covered in chapter(s)
print >> F, x, y                print(x, y, file=F)                               11
print x, y,                     print(x, y, end=' ')                              11
u'ccc'                          'ccc'                                             7, 36
'bbb' for byte strings          b'bbb'                                            7, 9, 36
raise E, V                      raise E(V)                                        32, 33, 34
except E, X:                    except E as X:                                    32, 33, 34
def f((a, b)):                  def f(x): (a, b) = x                              11, 18, 20
[Link]                          for line in file: (or X=iter(file))               13, 14
[Link](), etc. as lists         list([Link]()) (dictionary views)                 8, 14
map(), range(), etc. as lists   list(map()), list(range()) (built-ins)            14
map(None, ...)                  zip (or manual code to pad results)               13, 20
X=[Link](); [Link]()            sorted(D) (or list([Link]()))                     4, 8, 14
cmp(x, y)                       (x > y) - (x < y)                                 29
X.__cmp__(y)                    __lt__, __gt__, __eq__, etc.                      29
X.__nonzero__                   X.__bool__                                        29
X.__hex__, X.__oct__            X.__index__                                       29
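Removed                         Replacement                                       Covered in chapter(s)
__getattr__ run by built-ins    Redefine __X__ methods in wrapper classes         30, 37, 38
-t, -tt command-line switches   Inconsistent tabs/spaces use is always an error   10, 12
from ... *, within a function   May only appear at the top level of a file        22
import mod, in same package     from . import mod, package-relative form          23
class MyException:              class MyException(Exception):                     34
String-based exceptions         Class-based exceptions (also required in 2.6)     32, 33, 34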
Removed                         Replacement
String module functions         String object methods
Unbound methods                 Functions (staticmethod to call via instance)
Mixed type comparisons, sorts   Nonnumeric mixed type comparisons are errors
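To make a few rows of the preceding table concrete, here is a small sketch of the 3.X replacement forms, with the removed 2.X spellings noted in comments. The sample data and variable names are invented for illustration, and functools.reduce is used on the assumption that it is the library home of the relocated reduce.

```python
import functools
import io

# print is a function in 3.X; file= replaces the 2.X "print >> F, x, y" form
buffer = io.StringIO()
print('spam', 'eggs', file=buffer)
printed = buffer.getvalue()

# "raise E(V)" and "except E as X:" replace "raise E, V" and "except E, X:"
try:
    raise ValueError('bad value')
except ValueError as exc:
    message = str(exc)

# Dictionary keys() returns a view in 3.X; use sorted() to get an ordered list
table = {'b': 2, 'a': 1}
ordered_keys = sorted(table)

# cmp(x, y) is gone; this expression yields the same -1, 0, or 1 result
def compare(x, y):
    return (x > y) - (x < y)

# reduce is no longer a built-in; map and range results are wrapped in list()
total = functools.reduce(lambda a, b: a + b, [1, 2, 3, 4])
squares = list(map(lambda n: n * n, range(4)))

# Octal literals use the 0o prefix in 3.X (0777 is a syntax error)
mode = 0o777

print(repr(printed), message, ordered_keys, compare(1, 2), total, squares, mode)
```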
There are additional changes in Python 3.0 that are not listed in this table, simply because they don't affect this book. Changes in the standard library, for instance, might have a larger impact on applications-focused books like Programming Python than they do here; although most standard library functionality is still present, Python 3.0 takes further liberties with renaming modules, grouping them into packages, and so on. For a more comprehensive list of changes in 3.0, see the "What's New in Python 3.0" document in Python's standard manual set.

If you are migrating from Python 2.X to Python 3.X, be sure to also see the 2to3 automatic code conversion script that is available with Python 3.0. It can't translate everything, but it does a reasonable job of converting the majority of 2.X code to run under 3.X. As I write this, a new 3to2 back-conversion project is also underway to translate Python 3.X code to run in 2.X environments. Either tool may prove useful if you must maintain code for both Python lines; see the Web for details.

Because this fourth edition is mostly a fairly straightforward update for 3.0 with a handful of new chapters, and because it's only been two years since the prior edition was published, the rest of this Preface is taken from the prior edition with only minor updating.
• The new B if A else C conditional expression (Chapter 19)
• with/as context managers (Chapter 33)
• try/except/finally unification (Chapter 33)
• Relative import syntax (Chapter 23)
• Generator expressions (Chapter 20)
• New generator function features (Chapter 20)
• Function decorators (Chapter 31)
• The set object type (Chapter 5)
• New built-in functions: sorted, sum, any, all, enumerate (Chapters 13 and 14)
• The decimal fixed-precision object type (Chapter 5)
• Files, list comprehensions, and iterators (Chapters 14 and 20)
• New development tools: Eclipse, distutils, unittest and doctest, IDLE enhancements, Shedskin, and so on (Chapters 2 and 35)
Smaller language changes (for instance, the widespread use of True and False; the new sys.exc_info for fetching exception details; and the demise of string-based exceptions, string methods, and the apply and reduce built-ins) are discussed throughout the book. The third edition also expanded coverage of some of the features that were new in the second edition, including three-limit slices and the arbitrary arguments call syntax that subsumed apply.
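Several of the language features named in this list and the paragraph after it fit in a line or two each. The following is only a sampler under made-up names and data, not an excerpt from the chapters cited; each feature is covered properly in its own chapter.

```python
from decimal import Decimal

# Conditional expression: B if A else C
label = 'even' if 10 % 2 == 0 else 'odd'

# Generator expression passed directly to the sum built-in
total = sum(n * n for n in range(5))

# The set object type: intersection of the characters in two strings
common = set('spam') & set('ham')

# sorted, any, and enumerate built-ins
names = sorted(['spam', 'eggs', 'ham'])
has_long_name = any(len(name) > 3 for name in names)
numbered = list(enumerate(names))

# The decimal type avoids binary floating-point rounding surprises
exact = Decimal('0.1') + Decimal('0.2')

# Three-limit slice: every second character
every_other = 'abcdefg'[::2]

# Arbitrary-arguments call syntax, which subsumed the old apply built-in
def add(a, b, c=0):
    return a + b + c

result = add(*(1, 2), **{'c': 3})

print(label, total, common, numbered, exact, every_other, result)
```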
Many additions and changes were made with Python beginners in mind, and some topics were moved to appear at the places where they proved simplest to digest in training classes. List comprehensions and iterators, for example, now make their initial appearance in conjunction with the for loop statement, instead of later with functional tools.
Coverage of many original core language topics also was substantially expanded in the third edition, with new discussions and examples added. Because this text has become something of a de facto standard resource for learning the core Python language, the presentation was made more complete and augmented with new use cases throughout. In addition, a new set of Python tips and tricks, gleaned from 10 years of teaching classes and 15 years of using Python for real work, was incorporated, and the exercises were updated and expanded to reflect current Python best practices, new language features, and common beginners' mistakes witnessed firsthand in classes. Overall, the core language coverage was expanded.
stuck: variables change out from under them, mutable default arguments mutate inexplicably, and so on. The goal here is instead to provide a solid grounding in Python fundamentals, so that even the unusual cases will make sense when they crop up.

This scope is deliberate. By restricting our gaze to language fundamentals, we can investigate them here in more satisfying depth. Other texts, described ahead, pick up where this book leaves off and provide a more complete look at application-level topics and additional reference materials. The purpose of the book you are reading now is solely to teach Python itself so that you can apply it to whatever domain you happen to work in.
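The two surprises mentioned at the start of this discussion, variables that change out from under you and mutable default arguments that mutate, can be sketched in a few lines. The function and variable names here are hypothetical and chosen only for illustration; the None-default idiom shown is one common workaround, not the only one.

```python
# Shared references: two names, one list object
original = [1, 2, 3]
alias = original           # no copy is made here
alias.append(4)            # the change is visible through both names

# Mutable default: the list is created once, when the def statement runs
def saver_shared(item, items=[]):
    items.append(item)
    return items

first = saver_shared(1)
second = saver_shared(2)   # same list object as first, now grown

# Common workaround: use None as the default and build a fresh list per call
def saver_fresh(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

fresh = saver_fresh(2)

print(original, second, first is second, fresh)
```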
sometimes omits the small details that are readily available in reference manuals. Because of that, this book is probably best described as an introduction and a steppingstone to more advanced and complete texts.

For example, we won't talk much about Python/C integration, a complex topic that is nevertheless central to many Python-based systems. We also won't talk much about Python's history or development processes. And popular Python applications such as GUIs, system tools, and network scripting get only a short glance, if they are mentioned at all. Naturally, this scope misses some of the big picture.

By and large, Python is about raising the quality bar a few notches in the scripting world. Some of its ideas require more context than can be provided here, and I'd be remiss if I didn't recommend further study after you finish this book. I hope that most readers of this book will eventually go on to gain a more complete understanding of application-level programming from other texts.

Because of its beginner's focus, Learning Python is designed to be naturally complemented by O'Reilly's other Python books. For instance, Programming Python, another book I authored, provides larger and more complete examples, along with tutorials on application programming techniques, and was explicitly designed to be a follow-up text to the one you are reading now. Roughly, the current editions of Learning Python and Programming Python reflect the two halves of their author's training materials: the core language, and application programming. In addition, O'Reilly's Python Pocket Reference serves as a quick reference supplement for looking up some of the finer details skipped here.

Other follow-up books can also provide references, additional examples, or details about using Python in specific domains such as the Web and GUIs. For instance, O'Reilly's Python in a Nutshell and Sams's Python Essential Reference serve as useful references, and O'Reilly's Python Cookbook offers a library of self-contained examples for people already familiar with application programming techniques. Because reading books is such a subjective experience, I encourage you to browse on your own to find advanced texts that suit your needs. Regardless of which books you choose, though, keep in mind that the rest of the Python story requires studying examples that are more realistic than there is space for here.

Having said that, I think you'll find this book to be a good first text on Python, despite its limited scope (and perhaps because of it). You'll learn everything you need to get started writing useful standalone Python programs and scripts. By the time you've finished this book, you will have learned not only the language itself, but also how to apply it well to your day-to-day tasks. And you'll be equipped to tackle more advanced topics and examples as they come your way.
Part III, Statements and Syntax
The next part moves on to introduce Python's statements: the code you type to create and process objects in Python. It also presents Python's general syntax model. Although this part focuses on syntax, it also introduces some related tools, such as the PyDoc system, and explores coding alternatives.

Part IV, Functions
This part begins our look at Python's higher-level program structure tools. Functions turn out to be a simple way to package code for reuse and avoid code redundancy. In this part, we will explore Python's scoping rules, argument-passing techniques, and more.

Part V, Modules
Python modules let you organize statements and functions into larger components, and this part illustrates how to create, use, and reload modules. We'll also look at some more advanced topics here, such as module packages, module reloading, and the __name__ variable.

Part VI, Classes and OOP
Here, we explore Python's object-oriented programming tool, the class, an optional but powerful way to structure code for customization and reuse. As you'll see, classes mostly reuse ideas we will have covered by this point in the book, and OOP in Python is mostly about looking up names in linked objects. As you'll also see, OOP is optional in Python, but it can shave development time substantially, especially for long-term strategic project development.

Part VII, Exceptions and Tools
We conclude the language fundamentals coverage in this text with a look at Python's exception handling model and statements, plus a brief overview of development tools that will become more useful when you start writing larger programs (debugging and testing tools, for instance). Although exceptions are a fairly lightweight tool, this part appears after the discussion of classes because exceptions should now all be classes.

Part VIII, Advanced Topics (new in the fourth edition)
In the final part, we explore some advanced topics. Here, we study Unicode and byte strings, managed attribute tools like properties and descriptors, function and class decorators, and metaclasses. These chapters are all optional reading, because not all programmers need to understand the subjects they address. On the other hand, readers who must process internationalized text or binary data, or are responsible for developing APIs for other programmers to use, should find something of interest in this part.

Part IX, Appendixes
The book wraps up with a pair of appendixes that give platform-specific tips for using Python on various computers (Appendix A) and provide solutions to the end-of-part exercises (Appendix B). Solutions to end-of-chapter quizzes appear in the chapters themselves.
Note that the index and table of contents can be used to hunt for details, but there are no reference appendixes in this book (this book is a tutorial, not a reference). As mentioned earlier, you can consult Python Pocket Reference, as well as other books, and the free Python reference manuals maintained at [Link] for syntax and built-in tool details.
Book Updates
Improvements happen (and so do mis^H^H^H typos). Updates, supplements, and corrections for this book will be maintained (or referenced) on the Web at one of the following sites:

[Link] (O'Reilly's web page for the book)
[Link] (the author's site)
[Link] (the author's web page for the book)

The last of these three URLs points to a web page for this book where I will post updates, but be sure to search the Web if this link becomes invalid. If I could become more clairvoyant, I would, but the Web changes faster than printed books.
writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Learning Python, Fourth Edition, by Mark Lutz. Copyright 2009 Mark Lutz, 978-0-596-15806-4."

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@[Link].
Font Conventions
This book uses the following typographical conventions:

Italic
Used for email addresses, URLs, filenames, pathnames, and emphasizing new terms when they are first introduced
Constant width
Used for the contents of files and the output from commands, and to designate modules, methods, statements, and commands
Constant width bold
Used in code sections to show commands or text that would be typed by the user, and, occasionally, to highlight portions of code
Constant width italic
Notes specific to this book: In this book's examples, the % character at the start of a system command line stands for the system's prompt, whatever that may be on your machine (e.g., C:\Python30> in a DOS window). Don't type the % character (or the system prompt it sometimes stands for) yourself. Similarly, in interpreter interaction listings, do not type the >>> and ... characters shown at the start of lines; these are prompts that Python displays. Type just the text after these prompts. To help you remember this, user inputs are shown in bold font in this book. Also, you normally don't need to type text that starts with a # in listings; as you'll learn, these are comments, not executable code.
How to Contact Us
Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We will also maintain a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

[Link]
To comment or ask technical questions about this book, send email to:

bookquestions@[Link]

For more information about our books, conferences, Resource Centers, and the O'Reilly Network, see our website at:

[Link]

For book updates, be sure to also see the other links mentioned earlier in this Preface.
Acknowledgments
As I write this fourth edition of this book in 2009, I can't help but be in a sort of "mission accomplished" state of mind. I have now been using and promoting Python for 17 years, and have been teaching it for 12 years. Despite the passage of time and events, I am still constantly amazed at how successful Python has been over the years. It has grown in ways that most of us could not possibly have imagined in 1992. So, at the risk of sounding like a hopelessly self-absorbed author, you'll have to pardon a few words of reminiscing, congratulations, and thanks here.

It's been the proverbial long and winding road. Looking back today, when I first discovered Python in 1992, I had no idea what an impact it would have on the next 17 years of my life. Two years after writing the first edition of Programming Python in 1995, I began traveling around the country and the world teaching Python to beginners and experts. Since finishing the first edition of Learning Python in 1999, I've been an independent Python trainer and writer, thanks largely to Python's exponential growth in popularity.

As I write these words in mid-2009, I have written 12 Python books (4 editions of 3). I have also been teaching Python for more than a decade; have taught some 225 Python training sessions in the U.S., Europe, Canada, and Mexico; and have met over 3,000 students along the way. Besides racking up frequent flyer miles, these classes helped me refine this text as well as my other Python books. Over the years, teaching honed the books, and vice versa. In fact, the book you're reading is derived almost entirely from my classes.

Because of this, I'd like to thank all the students who have participated in my courses during the last 12 years. Along with changes in Python itself, your feedback played a huge role in shaping this text. (There's nothing quite as instructive as watching 3,000 students repeat the same beginners' mistakes!)
This edition owes its changes primarily to classes held after 2003, though every class held since 1997 has in some way helped refine this book. I'd especially like to single out clients who hosted classes in Dublin, Mexico City, Barcelona, London, Edmonton, and Puerto Rico; better perks would be hard to imagine. I'd also like to express my gratitude to everyone who played a part in producing this book. To the editors who worked on this project: Julie Steele on this edition, Tatiana
Apandi on the prior edition, and many others on earlier editions. To Doug Hellmann and Jesse Noller for taking part in the technical review of this book. And to O'Reilly for giving me a chance to work on those 12 book projects; it's been net fun (and only feels a little like the movie Groundhog Day).
I want to thank my original coauthor David Ascher as well for his work on the first two editions of this book. David contributed the "Outer Layers" part in prior editions, which we unfortunately had to trim to make room for new core language materials in the third edition. To compensate, I added a handful of more advanced programs as a self-study final exercise in the third edition, and added both new advanced examples and a new complete part for advanced topics in the fourth edition. Also see the prior notes in this Preface about follow-up application-level texts you may want to consult once you've learned the fundamentals here.
For creating such an enjoyable and useful language, I owe additional thanks to Guido van Rossum and the rest of the Python community. Like most open source systems, Python is the product of many heroic efforts. After 17 years of programming Python, I still find it to be seriously fun. It's been my privilege to watch Python grow from a new kid on the scripting languages block to a widely used tool, deployed in some fashion by almost every organization writing software. That has been an exciting endeavor to be a part of, and I'd like to thank and congratulate the entire Python community for a job well done.
I also want to thank my original editor at O'Reilly, the late Frank Willison. This book was largely Frank's idea, and it reflects the contagious vision he had. In looking back, Frank had a profound impact on both my own career and that of Python itself. It is not an exaggeration to say that Frank was responsible for much of the fun and success of Python when it was new. We still miss him.
Finally, a few personal notes of thanks.
To OQO for the best toys so far (while they lasted). To the late Carl Sagan for inspiring an 18-year-old kid from Wisconsin. To my Mom, for courage. And to all the large corporations I've come across over the years, for reminding me how lucky I have been to be self-employed for the last decade!
To my children, Mike, Sammy, and Roxy, for whatever futures you will choose to make. You were children when I began with Python, and you seem to have somehow grown up along the way; I'm proud of you. Life may compel us down paths all our own, but there will always be a path home.
And most of all, to Vera, my best friend, my girlfriend, and my wife. The best day of my life was the day I finally found you. I don't know what the next 50 years hold, but I do know that I want to spend all of them holding you.
Mark Lutz
Sarasota, Florida
July 2009
PART I
Getting Started
CHAPTER 1
If you've bought this book, you may already know what Python is and why it's an important tool to learn. If you don't, you probably won't be sold on Python until you've learned the language by reading the rest of this book and have done a project or two. But before we jump into details, the first few pages of this book will briefly introduce some of the main reasons behind Python's popularity. To begin sculpting a definition of Python, this chapter takes the form of a question-and-answer session, which poses some of the most common questions asked by beginners.
less to debug, and less to maintain after the fact. Python programs also run immediately, without the lengthy compile and link steps required by some other tools, further boosting programmer speed.
Program portability
Most Python programs run unchanged on all major computer platforms. Porting Python code between Linux and Windows, for example, is usually just a matter of copying a script's code between machines. Moreover, Python offers multiple options for coding portable graphical user interfaces, database access programs, web-based systems, and more. Even operating system interfaces, including program launches and directory processing, are as portable in Python as they can possibly be.
Support libraries
Python comes with a large collection of prebuilt and portable functionality, known as the standard library. This library supports an array of application-level programming tasks, from text pattern matching to network scripting. In addition, Python can be extended with both homegrown libraries and a vast collection of third-party application support software. Python's third-party domain offers tools for website construction, numeric programming, serial port access, game development, and much more. The NumPy extension, for instance, has been described as a free and more powerful equivalent to the Matlab numeric programming system.
Component integration
Python scripts can easily communicate with other parts of an application, using a variety of integration mechanisms. Such integrations allow Python to be used as a product customization and extension tool. Today, Python code can invoke C and C++ libraries, can be called from C and C++ programs, can integrate with Java and .NET components, can communicate over frameworks such as COM, can interface with devices over serial ports, and can interact over networks with interfaces like SOAP, XML-RPC, and CORBA. It is not a standalone tool.
Enjoyment
Because of Python's ease of use and built-in toolset, it can make the act of programming more pleasure than chore. Although this may be an intangible benefit, its effect on productivity is an important asset.
Of these factors, the first two (quality and productivity) are probably the most compelling benefits to most Python users.
Software Quality
By design, Python implements a deliberately simple and readable syntax and a highly coherent programming model. As a slogan at a recent Python conference attests, the net result is that Python seems to "fit your brain"; that is, features of the language interact in consistent and limited ways and follow naturally from a small set of core
concepts. This makes the language easier to learn, understand, and remember. In practice, Python programmers do not need to constantly refer to manuals when reading or writing code; it's a consistently designed system that many find yields surprisingly regular-looking code. By philosophy, Python adopts a somewhat minimalist approach. This means that although there are usually multiple ways to accomplish a coding task, there is usually just one obvious way, a few less obvious alternatives, and a small set of coherent interactions everywhere in the language. Moreover, Python doesn't make arbitrary decisions for you; when interactions are ambiguous, explicit intervention is preferred over "magic." In the Python way of thinking, explicit is better than implicit, and simple is better than complex.* Beyond such design themes, Python includes tools such as modules and OOP that naturally promote code reusability. And because Python is focused on quality, so too, naturally, are Python programmers.
Developer Productivity
During the great Internet boom of the mid-to-late 1990s, it was difficult to find enough programmers to implement software projects; developers were asked to implement systems as fast as the Internet evolved. Today, in an era of layoffs and economic recession, the picture has shifted. Programming staffs are often now asked to accomplish the same tasks with even fewer people. In both of these scenarios, Python has shined as a tool that allows programmers to get more done with less effort. It is deliberately optimized for speed of development: its simple syntax, dynamic typing, lack of compile steps, and built-in toolset allow programmers to develop programs in a fraction of the time needed when using some other tools. The net effect is that Python typically boosts developer productivity many times beyond the levels supported by traditional languages. That's good news in both boom and bust times, and everywhere the software industry goes in between.
* For a more complete look at the Python philosophy, type the command import this at any Python interactive prompt (you'll see how in Chapter 2). This invokes an "Easter egg" hidden in Python: a collection of design principles underlying Python. The acronym EIBTI is now fashionable jargon for the "explicit is better than implicit" rule.
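As a rough sketch of what the footnote describes, the code below imports the module and then picks out the two principles quoted in this section. It assumes the standard CPython implementation, whose this module happens to keep its text ROT13-encoded in an attribute named s; that attribute is an implementation detail rather than a documented interface, and the names principles and quoted are invented here for illustration.

```python
import codecs
import this  # the first import in a session prints the principles as a side effect

# Assumption: CPython's `this` module stores its text ROT13-encoded in `this.s`.
principles = codecs.decode(this.s, "rot13").splitlines()

# Keep only the two principles quoted in the text above.
quoted = [line for line in principles if line.startswith(("Explicit", "Simple"))]
print(quoted)
```

At the interactive prompt, typing import this alone is all that is needed; the decoding step is here only so that a script can inspect the text.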
preference for "script" to describe a simpler top-level file and "program" to refer to a more sophisticated multifile application. Because the term "scripting language" has so many different meanings to different observers, some would prefer that it not be applied to Python at all. In fact, people tend to make three very different associations, some of which are more useful than others, when they hear Python labeled as such:
Shell tools
Sometimes when people hear Python described as a scripting language, they think it means that Python is a tool for coding operating-system-oriented scripts. Such programs are often launched from console command lines and perform tasks such as processing text files and launching other programs. Python programs can and do serve such roles, but this is just one of dozens of common Python application domains. It is not just a better shell-script language.
Control language
To others, scripting refers to a "glue" layer used to control and direct (i.e., script) other application components. Python programs are indeed often deployed in the context of larger applications. For instance, to test hardware devices, Python programs may call out to components that give low-level access to a device. Similarly, programs may run bits of Python code at strategic points to support end-user product customization without the need to ship and recompile the entire system's source code. Python's simplicity makes it a naturally flexible control tool. Technically, though, this is also just a common Python role; many (perhaps most) Python programmers code standalone scripts without ever using or knowing about any integrated components. It is not just a control language.
Ease of use
Probably the best way to think of the term "scripting language" is that it refers to a simple language used for quickly coding tasks. This is especially true when the term is applied to Python, which allows much faster program development than compiled languages like C++. Its rapid development cycle fosters an exploratory, incremental mode of programming that has to be experienced to be appreciated. Don't be fooled, though: Python is not just for simple tasks. Rather, it makes tasks simple by its ease of use and flexibility. Python has a simple feature set, but it allows programs to scale up in sophistication as needed. Because of that, it is commonly used for quick tactical tasks and longer-term strategic development.
So, is Python a scripting language or not? It depends on whom you ask. In general, the term "scripting" is probably best used to describe the rapid and flexible mode of development that Python supports, rather than a particular application domain.
with Linux distributions, Macintosh computers, and some products and hardware, further clouding the user-base picture. In general, though, Python enjoys a large user base and a very active developer community. Because Python has been around for some 19 years and has been widely used, it is also very stable and robust. Besides being employed by individual users, Python is also being applied in real revenue-generating products by real companies. For instance:
Google makes extensive use of Python in its web search systems, and employs Python's creator.
The YouTube video sharing service is largely written in Python.
The popular BitTorrent peer-to-peer file sharing system is a Python program.
Google's popular App Engine web development framework uses Python as its application language.
EVE Online, a Massively Multiplayer Online Game (MMOG), makes extensive use of Python.
Maya, a powerful integrated 3D modeling and animation system, provides a Python scripting API.
Intel, Cisco, Hewlett-Packard, Seagate, Qualcomm, and IBM use Python for hardware testing.
Industrial Light & Magic, Pixar, and others use Python in the production of animated movies.
JPMorgan Chase, UBS, Getco, and Citadel apply Python for financial market forecasting.
NASA, Los Alamos, Fermilab, JPL, and others use Python for scientific programming tasks.
iRobot uses Python to develop commercial robotic devices.
ESRI uses Python as an end-user customization tool for its popular GIS mapping products.
The NSA uses Python for cryptography and intelligence analysis.
The IronPort email server product uses more than 1 million lines of Python code to do its job.
The One Laptop Per Child (OLPC) project builds its user interface and activity model in Python.
And so on. Probably the only common thread amongst the companies using Python today is that Python is used all over the map, in terms of application domains. Its general-purpose nature makes it applicable to almost all fields, not just one. In fact, it's safe to say that virtually every substantial organization writing software is using Python, whether for short-term tactical tasks, such as testing and administration, or for long-term strategic product development. Python has proven to work well in both modes.
For more details on companies using Python today, see Python's website at [Link] .[Link].
Systems Programming
Python's built-in interfaces to operating-system services make it ideal for writing portable, maintainable system-administration tools and utilities (sometimes called shell tools). Python programs can search files and directory trees, launch other programs, do parallel processing with processes and threads, and so on. Python's standard library comes with POSIX bindings and support for all the usual OS tools: environment variables, files, sockets, pipes, processes, multiple threads, regular expression pattern matching, command-line arguments, standard stream interfaces, shell-command launchers, filename expansion, and more. In addition, the bulk of Python's system interfaces are designed to be portable; for example, a script that copies directory trees typically runs unchanged on all major Python platforms. The Stackless Python system, used by EVE Online, also offers advanced solutions to multiprocessing requirements.
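To make a few of these tools concrete, here is a minimal sketch that reads an environment variable, expands filenames, searches a directory tree, and launches another program. It assumes a recent Python 3 (subprocess.run is newer than the releases this edition covers), and the DEMO_GREETING variable and the file names are made up for the example.

```python
import glob
import os
import subprocess
import sys
import tempfile

# Environment variables: read one with a fallback (DEMO_GREETING is a made-up name).
greeting = os.environ.get("DEMO_GREETING", "hello")

with tempfile.TemporaryDirectory() as root:
    # Build a small throwaway directory tree to search.
    os.makedirs(os.path.join(root, "sub"))
    for relpath in ("a.txt", "b.py", os.path.join("sub", "c.txt")):
        with open(os.path.join(root, relpath), "w") as f:
            f.write(greeting + "\n")

    # Filename expansion: match names in a single directory.
    pattern = os.path.join(root, "*.txt")
    top_level_txt = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # Directory-tree search: visit every subdirectory below the root.
    all_txt = sorted(
        name
        for dirpath, dirnames, filenames in os.walk(root)
        for name in filenames
        if name.endswith(".txt")
    )

# Program launch: run a second Python interpreter and capture its standard output.
result = subprocess.run(
    [sys.executable, "-c", "print('spawned')"],
    capture_output=True, text=True, check=True,
)
child_output = result.stdout.strip()

print(top_level_txt, all_txt, child_output)
```

Nothing in the sketch names a specific operating system, which is the portability point made above: paths are assembled with os.path.join, and the child program is found through sys.executable.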
GUIs
Python's simplicity and rapid turnaround also make it a good match for graphical user interface programming. Python comes with a standard object-oriented interface to the Tk GUI API called tkinter (Tkinter in 2.6) that allows Python programs to implement portable GUIs with a native look and feel. Python/tkinter GUIs run unchanged on Microsoft Windows, X Windows (on Unix and Linux), and the Mac OS (both Classic and OS X). A free extension package, PMW, adds advanced widgets to the tkinter toolkit. In addition, the wxPython GUI API, based on a C++ library, offers an alternative toolkit for constructing portable GUIs in Python.
Higher-level toolkits such as PythonCard and Dabo are built on top of base APIs such as wxPython and tkinter. With the proper library, you can also use GUI support in other toolkits in Python, such as Qt with PyQt, GTK with PyGTK, MFC with PyWin32, .NET with IronPython, and Swing with Jython (the Java version of Python, described in Chapter 2) or JPype. For applications that run in web browsers or have simple interface requirements, both Jython and Python web frameworks and server-side CGI scripts, described in the next section, provide additional user interface options.
Internet Scripting
Python comes with standard Internet modules that allow Python programs to perform a wide variety of networking tasks, in client and server modes. Scripts can communicate over sockets; extract form information sent to server-side CGI scripts; transfer files by FTP; parse, generate, and analyze XML files; send, receive, compose, and parse email; fetch web pages by URLs; parse the HTML and XML of fetched web pages; communicate over XML-RPC, SOAP, and Telnet; and more. Python's libraries make these tasks remarkably simple. In addition, a large collection of third-party tools are available on the Web for doing Internet programming in Python. For instance, the HTMLGen system generates HTML files from Python class-based descriptions, the mod_python package runs Python efficiently within the Apache web server and supports server-side templating with its Python Server Pages, and the Jython system provides for seamless Python/Java integration and supports coding of server-side applets that run on clients. In addition, full-blown web development framework packages for Python, such as Django, TurboGears, web2py, Pylons, Zope, and WebWare, support quick construction of full-featured and production-quality websites with Python. Many of these include features such as object-relational mappers, a Model/View/Controller architecture, server-side scripting and templating, and AJAX support, to provide complete and enterprise-level web development solutions.
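A few of the standard-library tasks listed above can be tried without a network connection. The following sketch, written for a recent Python 3, parses a snippet of XML, composes an email message and parses it back, and splits a URL into its parts; the addresses, the URL, and the XML text are placeholders invented for the example.

```python
from email import message_from_string
from email.message import EmailMessage
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Parse XML text and pull out the content of each <book> element.
doc = ET.fromstring("<books><book year='2009'>Learning Python</book></books>")
titles = [book.text for book in doc.findall("book")]

# Compose an email message, then parse its text form back into an object.
msg = EmailMessage()
msg["From"] = "author@example.com"
msg["To"] = "reader@example.com"
msg["Subject"] = "Greetings"
msg.set_content("Spam and eggs")
parsed = message_from_string(msg.as_string())
subject = parsed["Subject"]

# Split a URL into its components without fetching it.
parts = urlparse("http://www.example.com/index.html?lang=py")

print(titles, subject, parts.netloc, parts.path, parts.query)
```

Actually sending the message or fetching the page would use other standard modules, such as smtplib and urllib.request, and requires a reachable server.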
Component Integration
We discussed the component integration role earlier when describing Python as a control language. Python's ability to be extended by and embedded in C and C++ systems makes it useful as a flexible glue language for scripting the behavior of other systems and components. For instance, integrating a C library into Python enables Python to test and launch the library's components, and embedding Python in a product enables onsite customizations to be coded without having to recompile the entire product (or ship its source code at all).
Tools such as the SWIG and SIP code generators can automate much of the work needed to link compiled components into Python for use in scripts, and the Cython system allows coders to mix Python and C-like code. Larger frameworks, such as Python's COM support on Windows, the Jython Java-based implementation, the IronPython .NET-based implementation, and various CORBA toolkits for Python, provide alternative ways to script components. On Windows, for example, Python scripts can use frameworks to script Word and Excel.
Database Programming
For traditional database demands, there are Python interfaces to all commonly used relational database systems: Sybase, Oracle, Informix, ODBC, MySQL, PostgreSQL, SQLite, and more. The Python world has also defined a portable database API for accessing SQL database systems from Python scripts, which looks the same on a variety of underlying database systems. For instance, because the vendor interfaces implement the portable API, a script written to work with the free MySQL system will work largely unchanged on other systems (such as Oracle); all you have to do is replace the underlying vendor interface. Python's standard pickle module provides a simple object persistence system: it allows programs to easily save and restore entire Python objects to files and file-like objects. On the Web, you'll also find a third-party open source system named ZODB that provides a complete object-oriented database system for Python scripts, and others (such as SQLObject and SQLAlchemy) that map relational tables onto Python's class model. Furthermore, as of Python 2.5, the in-process SQLite embedded SQL database engine is a standard part of Python itself.
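As an illustration of both ideas, the sketch below uses the standard sqlite3 module, which follows the portable database API, with an in-memory database, and then saves and restores an object with pickle through a file-like object. The table, column, and variable names are invented for the example.

```python
import io
import pickle
import sqlite3

# Portable database API: connect, get a cursor, execute SQL, fetch rows.
connection = sqlite3.connect(":memory:")  # an in-memory database; no file is created
cursor = connection.cursor()
cursor.execute("CREATE TABLE people (name TEXT, job TEXT)")
cursor.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Bob", "dev"), ("Sue", "mgr")],
)
connection.commit()
cursor.execute("SELECT name FROM people WHERE job = ?", ("mgr",))
managers = [row[0] for row in cursor.fetchall()]
connection.close()

# Object persistence: save a nested object to a file-like object, then restore it.
record = {"name": "Bob", "jobs": ["dev", "mgr"], "age": 40.5}
stream = io.BytesIO()
pickle.dump(record, stream)
stream.seek(0)
restored = pickle.load(stream)

print(managers, restored)
```

With another vendor's interface module, the connect call and the parameter placeholder style may differ, but the cursor, execute, and fetch steps follow the same pattern.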
Rapid Prototyping
To Python programs, components written in Python and C look the same. Because of this, it's possible to prototype systems in Python initially, and then move selected components to a compiled language such as C or C++ for delivery. Unlike some prototyping tools, Python doesn't require a complete rewrite once the prototype has solidified. Parts of the system that don't require the efficiency of a language such as C++ can remain coded in Python for ease of maintenance and use.
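The first sentence above can be seen in a small way with the standard library itself. In this sketch, py_hypot is a hypothetical prototype written in Python, while math.hypot is coded in C in the standard CPython interpreter (an assumption about that implementation); the calling code treats the two identically.

```python
import math  # coded in C in the standard CPython interpreter


def py_hypot(x, y):
    """A pure-Python prototype of the same computation."""
    return (x * x + y * y) ** 0.5


# Client code calls either version the same way, so one can replace the other.
results = [func(3, 4) for func in (py_hypot, math.hypot)]
print(results)
```

Replacing the prototype with the compiled version changes only which function is bound to the name, not the code that calls it.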
animation, 3D visualization, parallel processing, and so on. The popular SciPy and ScientificPython extensions, for example, provide additional libraries of scientific programming tools and use NumPy code.
The PSF (Python Software Foundation), a formal nonprofit group, organizes conferences and deals with intellectual property issues. Numerous Python conferences are held around the world; O'Reilly's OSCON and the PSF's PyCon are the largest. The former of these addresses multiple open source projects, and the latter is a Python-only event that has experienced strong growth in recent years. Attendance at PyCon 2008 nearly doubled from the prior year, growing from 586 attendees in 2007 to over 1,000 in 2008. This was on the heels of a 40% attendance increase in 2007, from 410 in 2006. PyCon 2009 had 943 attendees, a slight decrease from 2008, but a still very strong showing during a global recession.
It's Object-Oriented
Python is an object-oriented language, from the ground up. Its class model supports advanced notions such as polymorphism, operator overloading, and multiple inheritance; yet, in the context of Python's simple syntax and typing, OOP is remarkably easy to apply. In fact, if you don't understand these terms, you'll find they are much easier to learn with Python than with just about any other OOP language available. Besides serving as a powerful code structuring and reuse device, Python's OOP nature makes it ideal as a scripting tool for object-oriented systems languages such as C++ and Java. For example, with the appropriate glue code, Python programs can subclass (specialize) classes implemented in C++, Java, and C#. Of equal significance, OOP is an option in Python; you can go far without having to become an object guru all at once. Much like C++, Python supports both procedural and object-oriented programming modes. Its object-oriented tools can be applied if and when constraints allow. This is especially useful in tactical development modes, which preclude design phases.
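For readers who have not met these terms, the short sketch below shows each one in miniature. The classes Vector, Walker, Swimmer, and Duck and the function describe are invented for this illustration; classes are covered in depth later in the book.

```python
class Vector:
    """Operator overloading: instances respond to + and ==."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Vector(%r, %r)" % (self.x, self.y)


class Walker:
    def move(self):
        return "walk"


class Swimmer:
    def move(self):
        return "swim"

    def dive(self):
        return "dive"


class Duck(Walker, Swimmer):
    """Multiple inheritance: names are searched in Walker first, then Swimmer."""


def describe(thing):
    # Polymorphism: any object with a move method works here.
    return thing.move()


total = Vector(1, 2) + Vector(3, 4)
moves = [describe(obj) for obj in (Walker(), Swimmer(), Duck())]
print(total, moves, Duck().dive())
```

Note that describe never checks the type of its argument; what the call does depends entirely on the object passed in.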
It's Free
Python is completely free to use and distribute. As with other open source software, such as Tcl, Perl, Linux, and Apache, you can fetch the entire Python system's source code for free on the Internet. There are no restrictions on copying it, embedding it in your systems, or shipping it with your products. In fact, you can even sell Python's source code, if you are so inclined.
But don't get the wrong idea: "free" doesn't mean "unsupported." On the contrary, the Python online community responds to user queries with a speed that most commercial software help desks would do well to try to emulate. Moreover, because Python comes with complete source code, it empowers developers, leading to the creation of a large team of implementation experts. Although studying or changing a programming language's implementation isn't everyone's idea of fun, it's comforting to know that you can do so if you need to. You're not dependent on the whims of a commercial vendor; the ultimate documentation source is at your disposal. As mentioned earlier, Python development is performed by a community that largely coordinates its efforts over the Internet. It consists of Python's creator (Guido van Rossum, the officially anointed Benevolent Dictator for Life, or BDFL, of Python) plus a supporting cast of thousands. Language changes must follow a formal enhancement procedure and be scrutinized by both other developers and the BDFL. Happily, this tends to make Python more conservative with changes than some other languages.
It's Portable
The standard implementation of Python is written in portable ANSI C, and it compiles and runs on virtually every major platform currently in use. For example, Python programs run today on everything from PDAs to supercomputers. As a partial list, Python is available on:
Linux and Unix systems
Microsoft Windows and DOS (all modern flavors)
Mac OS (both OS X and Classic)
BeOS, OS/2, VMS, and QNX
Real-time systems such as VxWorks
Cray supercomputers and IBM mainframes
PDAs running Palm OS, PocketPC, and Linux
Cell phones running Symbian OS and Windows Mobile
Gaming consoles and iPods
And more
Like the language interpreter itself, the standard library modules that ship with Python are implemented to be as portable across platform boundaries as possible. Further, Python programs are automatically compiled to portable byte code, which runs the same on any platform with a compatible version of Python installed (more on this in the next chapter).
What that means is that Python programs using the core language and standard libraries run the same on Linux, Windows, and most other systems with a Python interpreter. Most Python ports also contain platform-specific extensions (e.g., COM support on Windows), but the core Python language and libraries work the same everywhere. As mentioned earlier, Python also includes an interface to the Tk GUI toolkit called tkinter (Tkinter in 2.6), which allows Python programs to implement full-featured graphical user interfaces that run on all major GUI platforms without program changes.
It's Powerful
From a features perspective, Python is something of a hybrid. Its toolset places it between traditional scripting languages (such as Tcl, Scheme, and Perl) and systems development languages (such as C, C++, and Java). Python provides all the simplicity and ease of use of a scripting language, along with more advanced software-engineering tools typically found in compiled languages. Unlike some scripting languages, this combination makes Python useful for large-scale development projects. As a preview, here are some of the main things you'll find in Python's toolbox:
Dynamic typing
Python keeps track of the kinds of objects your program uses when it runs; it doesn't require complicated type and size declarations in your code. In fact, as you'll see in Chapter 6, there is no such thing as a type or variable declaration anywhere in Python. Because Python code does not constrain data types, it is also usually automatically applicable to a whole range of objects.
Automatic memory management
Python automatically allocates objects and reclaims ("garbage collects") them when they are no longer used, and most can grow and shrink on demand. As you'll learn, Python keeps track of low-level memory details so you don't have to.
Programming-in-the-large support
For building larger systems, Python includes tools such as modules, classes, and exceptions. These tools allow you to organize systems into components, use OOP to reuse and customize code, and handle events and errors gracefully.
Built-in object types
Python provides commonly used data structures such as lists, dictionaries, and strings as intrinsic parts of the language; as you'll see, they're both flexible and easy to use. For instance, built-in objects can grow and shrink on demand, can be arbitrarily nested to represent complex information, and more.
Built-in tools
To process all those object types, Python comes with powerful and standard operations, including concatenation (joining collections), slicing (extracting sections), sorting, mapping, and more.
Library utilities
For more specific tasks, Python also comes with a large collection of precoded library tools that support everything from regular expression matching to networking. Once you learn the language itself, Python's library tools are where much of the application-level action occurs.
Third-party utilities
Because Python is open source, developers are encouraged to contribute precoded tools that support tasks beyond those supported by its built-ins; on the Web, you'll find free support for COM, imaging, CORBA ORBs, XML, database access, and much more.
Despite the array of tools in Python, it retains a remarkably simple syntax and design. The result is a powerful programming tool with all the usability of a scripting language.
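Several of the items in this list can be previewed in a few lines. The sketch below is only a taste of what later chapters cover in detail, and the names times and record are invented for the example.

```python
def times(x, y):
    # Dynamic typing: no declarations; works for any objects that support *.
    return x * y


product = times(2, 4)        # numbers are multiplied
repeated = times("Ni", 3)    # strings are repeated

# Built-in object types: nested freely, and able to grow on demand.
record = {"name": "Bob", "jobs": ["dev", "mgr"]}
record["jobs"].append("janitor")

# Built-in tools: concatenation, slicing, sorting, and mapping.
joined = [1, 2] + [3, 4]
middle = "spam"[1:3]
ordered = sorted(["c", "a", "b"])
doubled = list(map(lambda n: n * 2, [1, 2, 3]))

print(product, repeated, record, joined, middle, ordered, doubled)
```

The same times function serves both numbers and strings because the meaning of * is decided by the objects being processed, not by a declaration in the code.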
It's Mixable
Python programs can easily be "glued" to components written in other languages in a variety of ways. For example, Python's C API lets C programs call and be called by Python programs flexibly. That means you can add functionality to the Python system as needed, and use Python programs within other environments or systems. Mixing Python with libraries coded in languages such as C or C++, for instance, makes it an easy-to-use frontend language and customization tool. As mentioned earlier, this also makes Python good at rapid prototyping; systems may be implemented in Python first, to leverage its speed of development, and later moved to C for delivery, one piece at a time, according to performance demands.
Is more powerful than Tcl. Python's support for "programming in the large" makes it applicable to the development of larger systems.
Has a cleaner syntax and simpler design than Perl, which makes it more readable and maintainable and helps reduce program bugs.
Is simpler and easier to use than Java. Python is a scripting language, but Java inherits much of the complexity and syntax of systems languages such as C++.
Is simpler and easier to use than C++, but it doesn't often compete with C++; as a scripting language, Python typically serves different roles.
Is both more powerful and more cross-platform than Visual Basic. Its open source nature also means it is not controlled by a single company.
Is more readable and general-purpose than PHP. Python is sometimes used to construct websites, but it's also widely used in nearly every other computer domain, from robotics to movie animation.
Is more mature and has a more readable syntax than Ruby. Unlike Ruby and Java, OOP is an option in Python; Python does not impose OOP on users or projects to which it may not apply.
Has the dynamic flavor of languages like SmallTalk and Lisp, but also has a simple, traditional syntax accessible to developers as well as end users of customizable systems.
Especially for programs that do more than scan text files, and that might have to be read in the future by others (or by you!), many people find that Python fits the bill better than any other scripting or programming language available today. Furthermore, unless your application requires peak performance, Python is often a viable alternative to systems development languages such as C, C++, and Java: Python code will be much less difficult to write, debug, and maintain. Of course, your author has been a card-carrying Python evangelist since 1992, so take these comments as you may. They do, however, reflect the common experience of many developers who have taken time to explore what Python has to offer.
Chapter Summary
And that concludes the hype portion of this book. In this chapter, we've explored some of the reasons that people pick Python for their programming tasks. We've also seen how it is applied and looked at a representative sample of who is using it today. My goal is to teach Python, though, not to sell it. The best way to judge a language is to see it in action, so the rest of this book focuses entirely on the language details we've glossed over here. The next two chapters begin our technical introduction to the language. In them, we'll explore ways to run Python programs, peek at Python's byte code execution model, and introduce the basics of module files for saving code. The goal will be to give you
just enough information to run the examples and exercises in the rest of the book. You won't really start programming per se until Chapter 4, but make sure you have a handle on the startup details before moving on.
linked-in C code in the interpreter. If speed is critical, compiled extensions are available for number-crunching parts of an application.
4. You can use Python for nearly anything you can do with a computer, from website development and gaming to robotics and spacecraft control.
5. import this triggers an Easter egg inside Python that displays some of the design philosophies underlying the language. You'll learn how to run this statement in the next chapter.
6. "Spam" is a reference from a famous Monty Python skit in which people trying to order food in a cafeteria are drowned out by a chorus of Vikings singing about spam. Oh, and it's also a common variable name in Python scripts....
7. Blue. No, yellow!
But as anyone who has done any substantial code maintenance should be able to attest, freedom of expression is great for art, but lousy for engineering. In engineering, we need a minimal feature set and predictability. In engineering, freedom of expression can lead to maintenance nightmares. As more than one Perl user has confided to me, the result of too much freedom is often code that is much easier to rewrite from scratch than to modify. Consider this: when people create a painting or a sculpture, they do so for themselves for purely aesthetic purposes. The possibility of someone else having to change that painting or sculpture later does not enter into it. This is a critical difference between art and engineering. When people write software, they are not writing it for themselves. In fact, they are not even writing primarily for the computer. Rather, good programmers know that code is written for the next human being who has to read it in order to maintain or reuse it. If that person cannot understand the code, it's all but useless in a realistic development scenario. This is where many people find that Python most clearly differentiates itself from scripting languages like Perl. Because Python's syntax model almost forces users to write readable code, Python programs lend themselves more directly to the full software development cycle. And because Python emphasizes ideas such as limited interactions, code uniformity and regularity, and feature consistency, it more directly fosters code that can be used long after it is first written. In the long run, Python's focus on code quality in itself boosts programmer productivity, as well as programmer satisfaction. Python programmers can be creative, too, of course, and as we'll see, the language does offer multiple solutions for some tasks. At its core, though, Python encourages good engineering in ways that other scripting languages often do not.
At least, that's the common consensus among many people who have adopted Python. You should always judge such claims for yourself, of course, by learning what Python has to offer. To help you get started, let's move on to the next chapter.
CHAPTER 2
This chapter and the next take a quick look at program execution: how you launch code, and how Python runs it. In this chapter, we'll study the Python interpreter. Chapter 3 will then show you how to get your own programs up and running. Startup details are inherently platform-specific, and some of the material in these two chapters may not apply to the platform you work on, so you should feel free to skip parts not relevant to your intended use. Likewise, more advanced readers who have used similar tools in the past and prefer to get to the meat of the language quickly may want to file some of this chapter away for future reference. For the rest of you, let's learn how to run some code.
• Windows users fetch and run a self-installing executable file that puts Python on their machines. Simply double-click and say Yes or Next at all prompts.
• Linux and Mac OS X users probably already have a usable Python preinstalled on their computers; it's a standard component on these platforms today.
• Some Linux and Mac OS X users (and most Unix users) compile Python from its full source code distribution package.
• Linux users can also find RPM files, and Mac OS X users can find various Mac-specific installation packages.
• Other platforms have installation techniques relevant to those platforms. For instance, Python is available on cell phones, game consoles, and iPods, but installation details vary widely.
Python itself may be fetched from the downloads page on the website, [Link] .[Link]. It may also be found through various other distribution channels. Keep in mind that you should always check to see whether Python is already present before installing it. If you're working on Windows, you'll usually find Python in the Start menu, as captured in Figure 2-1 (these menu options are discussed in the next chapter). On Unix and Linux, Python probably lives in your /usr directory tree. Because installation details are so platform-specific, we'll finesse the rest of this story here. For more details on the installation process, consult Appendix A. For the purposes of this chapter and the next, I'll assume that you've got Python ready to go.
Program Execution
What it means to write and run a Python script depends on whether you look at these tasks as a programmer, or as a Python interpreter. Both views offer important perspectives on Python programming.
This file contains two Python print statements, which simply print a string (the text in quotes) and a numeric expression result (2 to the power 100) to the output stream. Don't worry about the syntax of this code yet; for this chapter, we're interested only in getting it to run. I'll explain the print statement, and why you can raise 2 to the power 100 in Python without overflowing, in the next parts of this book.
Figure 2-1. When installed on Windows, this is how Python shows up in your Start button menu. This can vary a bit from release to release, but IDLE starts a development GUI, and Python starts a simple interactive session. Also here are the standard manuals and the PyDoc documentation engine (Module Docs).
You can create such a file of statements with any text editor you like. By convention, Python program files are given names that end in .py; technically, this naming scheme is required only for files that are imported, as shown later in this book, but most Python files have .py names for consistency. After you've typed these statements into a text file, you must tell Python to execute the file, which simply means to run all the statements in the file from top to bottom, one after another. As you'll see in the next chapter, you can launch Python program files
by shell command lines, by clicking their icons, from within IDEs, and with other standard techniques. If all goes well, when you execute the file, you'll see the results of the two print statements show up somewhere on your computer (by default, usually in the same window you were in when you ran the program):
hello world
1267650600228229401496703205376
For example, here's what happened when I ran this script from a DOS command line on a Windows laptop (typically called a Command Prompt window, found in the Accessories program menu), to make sure it didn't have any silly typos:
C:\temp> python [Link]
hello world
1267650600228229401496703205376
We've just run a Python script that prints a string and a number. We probably won't win any programming awards with this code, but it's enough to capture the basics of program execution.
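The listing for the file described above does not appear alongside this discussion, so here is a minimal sketch that is consistent with the description and with the output shown. The variable names are assumptions added for illustration; the original file may simply have printed the two values directly.

```python
# Hypothetical reconstruction of a two-statement script like the one described.
# The names greeting and big_number are illustrative only.
greeting = 'hello world'   # a string: the text in quotes
big_number = 2 ** 100      # Python integers grow as needed, so this does not overflow

print(greeting)            # prints the string
print(big_number)          # prints the numeric expression's result
```

Saved in a text file with a .py name, these lines can be run by any of the launching techniques mentioned above.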
Python's View
The brief description in the prior section is fairly standard for scripting languages, and it's usually all that most Python programmers need to know. You type code into text files, and you run those files through the interpreter. Under the hood, though, a bit more happens when you tell Python to "go." Although knowledge of Python internals is not strictly required for Python programming, a basic understanding of the runtime structure of Python can help you grasp the bigger picture of program execution. When you instruct Python to run your script, there are a few steps that Python carries out before your code actually starts crunching away. Specifically, it's first compiled to something called "byte code" and then routed to something called a "virtual machine."
programs alongside the corresponding source code files (that is, in the same directories). Python saves byte code like this as a startup speed optimization. The next time you run your program, Python will load the .pyc files and skip the compilation step, as long as you haven't changed your source code since the byte code was last saved. Python automatically checks the timestamps of source and byte code files to know when it must recompile: if you resave your source code, byte code is automatically re-created the next time your program is run. If Python cannot write the byte code files to your machine, your program still works; the byte code is generated in memory and simply discarded on program exit.* However, because .pyc files speed startup time, you'll want to make sure they are written for larger programs. Byte code files are also one way to ship Python programs: Python is happy to run a program if all it can find are .pyc files, even if the original .py source files are absent. (See "Frozen Binaries" on page 32 for another shipping option.)
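Python normally writes byte code files on its own, but the step can also be requested explicitly, which makes it easier to see. The following is a small sketch using the standard library's py_compile module; the file name demo_module.py is hypothetical. Where the .pyc file lands depends on the Python release: the text above describes it being saved next to the source, while later Python 3 releases place it in a __pycache__ subdirectory, so the sketch inspects the returned path instead of assuming a location.

```python
import os
import py_compile
import tempfile

# Work in a throwaway directory so nothing is left behind afterward.
with tempfile.TemporaryDirectory() as workdir:
    source_path = os.path.join(workdir, 'demo_module.py')  # hypothetical file name
    with open(source_path, 'w') as source_file:
        source_file.write("print('hello world')\n")

    # Compile the source file to byte code; the call returns the path
    # of the byte code file it wrote.
    bytecode_path = py_compile.compile(source_path)
    bytecode_exists = os.path.exists(bytecode_path)
    bytecode_suffix = os.path.splitext(bytecode_path)[1]

print(bytecode_suffix)   # extension of the saved byte code file
print(bytecode_exists)   # whether the file was really written
```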
Performance implications
Readers with a background in fully compiled languages such as C and C++ might notice a few differences in the Python model. For one thing, there is usually no build or make step in Python work: code runs immediately after it is written. For another, Python byte code is not binary machine code (e.g., instructions for an Intel chip). Byte code is a Python-specific representation.
* And, strictly speaking, byte code is saved only for files that are imported, not for the top-level file of a program. We'll explore imports in Chapter 3, and again in Part V. Byte code is also never saved for code typed at the interactive prompt, which is described in Chapter 3.
Figure 2-2. Python's traditional runtime execution model: source code you type is translated to byte code, which is then run by the Python Virtual Machine. Your code is automatically compiled, but then it is interpreted.
This is why some Python code may not run as fast as C or C++ code, as described in Chapter 1: the PVM loop, not the CPU chip, still must interpret the byte code, and byte code instructions require more work than CPU instructions. On the other hand, unlike in classic interpreters, there is still an internal compile step: Python does not need to reanalyze and reparse each source statement repeatedly. The net effect is that pure Python code runs at speeds somewhere between those of a traditional compiled language and a traditional interpreted language. See Chapter 1 for more on Python performance tradeoffs.
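To see that byte code is a Python-specific instruction stream and not machine code, you can inspect it with the standard library's dis module. This is only a sketch: the function power_of_two is a hypothetical example, and the exact instruction names differ from one Python release to another, so no particular listing is promised here.

```python
import dis

def power_of_two(exponent):   # hypothetical function, used only for illustration
    return 2 ** exponent

# The compiled byte code is stored on the function's code object as raw bytes.
raw_bytecode = power_of_two.__code__.co_code

# dis decodes those bytes into the named instructions the PVM loop interprets.
instruction_names = [instr.opname for instr in dis.get_instructions(power_of_two)]

print(type(raw_bytecode).__name__)
print(instruction_names)
```

Each name in the list is one step for the virtual machine, which is why a single line of source can cost more work than a handful of CPU instructions.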
Development implications
Another ramification of Python's execution model is that there is really no distinction between the development and execution environments. That is, the systems that compile and execute your source code are really one and the same. This similarity may have a bit more significance to readers with a background in traditional compiled languages, but in Python, the compiler is always present at runtime and is part of the system that runs programs. This makes for a much more rapid development cycle. There is no need to precompile and link before execution may begin; simply type and run the code. This also adds a much more dynamic flavor to the language: it is possible, and often very convenient, for Python programs to construct and execute other Python programs at runtime. The eval and exec built-ins, for instance, accept and run strings containing Python program code. This structure is also why Python lends itself to product customization: because Python code can be changed on the fly, users can modify the Python parts of a system onsite without needing to have or compile the entire system's code. At a more fundamental level, keep in mind that all we really have in Python is runtime: there is no initial compile-time phase at all, and everything happens as the program is running. This even includes operations such as the creation of functions and classes and the linkage of modules. Such events occur before execution in more static languages, but happen as programs execute in Python. As we'll see, the net effect makes for a much more dynamic programming experience than that to which some readers may be accustomed.
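As a brief sketch of the eval and exec built-ins mentioned above, the following builds program text as strings and runs it while the program is executing. The function name double and the namespace dictionary are assumptions made for illustration. Strings from untrusted sources should not be run this way, because they can contain any Python code.

```python
# Build program text at runtime, then run it with the built-ins named above.
expression_text = '2 ** 8'
result = eval(expression_text)   # evaluate a string holding an expression

# A string holding statements; double is a hypothetical function name.
program_text = "def double(x):\n    return x * 2\n"
namespace = {}                   # names created by the code are stored here
exec(program_text, namespace)    # the def statement runs now, at runtime

print(result)                    # 256
print(namespace['double'](21))   # 42
```

Note that the function double did not exist until the exec call ran, which illustrates the point that function creation is itself a runtime event.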
CPython
The original, and standard, implementation of Python is usually called CPython, when you want to contrast it with the other two. Its name comes from the fact that it is coded in portable ANSI C language code. This is the Python that you fetch from [Link] .[Link], get with the ActivePython distribution, and have automatically on most Linux and Mac OS X machines. If you've found a preinstalled version of Python on your machine, it's probably CPython, unless your company is using Python in very specialized ways. Unless you want to script Java or .NET applications with Python, you probably want to use the standard CPython system. Because it is the reference implementation of the language, it tends to run the fastest, be the most complete, and be more robust than the alternative systems. Figure 2-2 reflects CPython's runtime architecture.
Jython
The Jython system (originally known as JPython) is an alternative implementation of the Python language, targeted for integration with the Java programming language. Jython consists of Java classes that compile Python source code to Java byte code and then route the resulting byte code to the Java Virtual Machine (JVM). Programmers still code Python statements in .py text files as usual; the Jython system essentially just replaces the rightmost two bubbles in Figure 2-2 with Java-based equivalents. Jython's goal is to allow Python code to script Java applications, much as CPython allows Python to script C and C++ components. Its integration with Java is remarkably seamless. Because Python code is translated to Java byte code, it looks and feels like a true Java program at runtime. Jython scripts can serve as web applets and servlets, build Java-based GUIs, and so on. Moreover, Jython includes integration support that allows
Python code to import and use Java classes as though they were coded in Python. Because Jython is slower and less robust than CPython, though, it is usually seen as a tool of interest primarily to Java developers looking for a scripting language to be a frontend to Java code.
IronPython
A third implementation of Python, and newer than both CPython and Jython, IronPython is designed to allow Python programs to integrate with applications coded to work with Microsoft's .NET Framework for Windows, as well as the Mono open source equivalent for Linux. .NET and its C# programming language runtime system are designed to be a language-neutral object communication layer, in the spirit of Microsoft's earlier COM model. IronPython allows Python programs to act as both client and server components, accessible from other .NET languages. By implementation, IronPython is very much like Jython (and, in fact, was developed by the same creator): it replaces the last two bubbles in Figure 2-2 with equivalents for execution in the .NET environment. Also, like Jython, IronPython has a special focus: it is primarily of interest to developers integrating Python with .NET components. Because it is being developed by Microsoft, though, IronPython might also be able to leverage some important optimization tools for better performance. IronPython's scope is still evolving as I write this; for more details, consult the Python online resources or search the Web.
Jython and IronPython are completely independent implementations of Python that compile Python source for different runtime architectures. It is also possible to access Java and .NET software from standard CPython programs: JPype and Python for .NET systems, for example, allow CPython code to call out to Java and .NET components.
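If you are unsure which of these implementations is running your code, the standard library can report it. The sketch below uses platform.python_implementation, which returns a name such as 'CPython', 'Jython', or 'IronPython'; which one you see depends entirely on the interpreter you launched, so no specific output is promised.

```python
import platform
import sys

# Name of the implementation running this code, e.g. 'CPython'.
implementation = platform.python_implementation()

# Version of the Python language that the implementation provides.
version = platform.python_version()

print(implementation, version)
print(sys.version)   # the longer build description shown at interpreter startup
```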
translation without requiring changes to the code or a separate compilation step during development. Roughly, while your program runs, Psyco collects information about the kinds of objects being passed around; that information can be used to generate highly efficient machine code tailored for those object types. Once generated, the machine code then replaces the corresponding part of the original byte code to speed your program's overall execution. The net effect is that, with Psyco, your program becomes much quicker over time and as it is running. In ideal cases, some Python code may become as fast as compiled C code under Psyco. Because this translation from byte code happens at program runtime, Psyco is generally known as a just-in-time (JIT) compiler. Psyco is actually a bit different from the JIT compilers some readers may have seen for the Java language, though. Really, Psyco is a specializing JIT compiler: it generates machine code tailored to the data types that your program actually uses. For example, if a part of your program uses different data types at different times, Psyco may generate a different version of machine code to support each different type combination. Psyco has been shown to speed Python code dramatically. According to its web page, Psyco provides "2x to 100x speed-ups, typically 4x, with an unmodified Python interpreter and unmodified source code, just a dynamically loadable C extension module." Of equal significance, the largest speedups are realized for algorithmic code written in pure Python, exactly the sort of code you might normally migrate to C to optimize. With Psyco, such migrations become even less important. Psyco is not yet a standard part of Python; you will have to fetch and install it separately. It is also still something of a research project, so you'll have to track its evolution online.
In fact, at this writing, although Psyco can still be fetched and installed by itself, it appears that much of the system may eventually be absorbed into the newer "PyPy" project, an attempt to reimplement Python's PVM in Python code, to better support optimizations like Psyco. Perhaps the largest downside of Psyco is that it currently only generates machine code for Intel x86 architecture chips, though this includes Windows and Linux boxes and recent Macs. For more details on the Psyco extension, and other JIT efforts that may arise, consult [Link]; you can also check out Psyco's home page, which currently resides at [Link].
Initial results, though, show that it has the potential to outperform both standard Python and the Psyco extension in terms of execution speed, and it is a promising project. Search the Web for details on the project's current status.
Frozen Binaries
Sometimes when people ask for a "real" Python compiler, what they're really seeking is simply a way to generate standalone binary executables from their Python programs. This is more a packaging and shipping idea than an execution-flow concept, but it's somewhat related. With the help of third-party tools that you can fetch off the Web, it is possible to turn your Python programs into true executables, known as frozen binaries in the Python world. Frozen binaries bundle together the byte code of your program files, along with the PVM (interpreter) and any Python support files your program needs, into a single package. There are some variations on this theme, but the end result can be a single binary executable program (e.g., an .exe file on Windows) that can easily be shipped to customers. In Figure 2-2, it is as though the byte code and PVM are merged into a single component: a frozen binary file. Today, three primary systems are capable of generating frozen binaries: py2exe (for Windows), PyInstaller (which is similar to py2exe but also works on Linux and Unix and is capable of generating self-installing binaries), and freeze (the original). You may have to fetch these tools separately from Python itself, but they are available free of charge. They are also constantly evolving, so consult [Link] or your favorite web search engine for more on these tools. To give you an idea of the scope of these systems, py2exe can freeze standalone programs that use the tkinter, PMW, wxPython, and PyGTK GUI libraries; programs that use the pygame game programming toolkit; win32com client programs; and more. Frozen binaries are not the same as the output of a true compiler: they run byte code through a virtual machine. Hence, apart from a possible startup improvement, frozen binaries run at the same speed as the original source files. Frozen binaries are not small (they contain a PVM), but by current standards they are not unusually large either.
Because Python is embedded in the frozen binary, though, it does not have to be installed on the receiving end to run your program. Moreover, because your code is embedded in the frozen binary, it is more effectively hidden from recipients. This single file-packaging scheme is especially appealing to developers of commercial software. For instance, a Python-coded user interface program based on the tkinter toolkit can be frozen into an executable file and shipped as a self-contained program on a CD or on the Web. End users do not need to install (or even have to know about) Python to run the shipped program.
Future Possibilities?
Finally, note that the runtime execution model sketched here is really an artifact of the current implementation of Python, not of the language itself. For instance, it's not impossible that a full, traditional compiler for translating Python source code to machine code may appear during the shelf life of this book (although one has not in nearly two decades!). New byte code formats and implementation variants may also be adopted in the future. For instance:
• The Parrot project aims to provide a common byte code format, virtual machine, and optimization techniques for a variety of programming languages (see http:// [Link]). Python's own PVM runs Python code more efficiently than Parrot, but it's unclear how Parrot will evolve.
• The PyPy project is an attempt to reimplement the PVM in Python itself to enable new implementation techniques. Its goal is to produce a fast and flexible implementation of Python.
• The Google-sponsored Unladen Swallow project aims to make standard Python faster by a factor of at least 5, and fast enough to replace the C language in many contexts. It is an optimization branch of CPython, intended to be fully compatible and significantly faster. This project also hopes to remove the Python multithreading Global Interpreter Lock (GIL), which prevents pure Python threads from truly overlapping in time. This is currently an emerging project being developed as open source by Google engineers; it is initially targeting Python 2.6, though 3.0 may acquire its changes too. Search Google for up-to-date details.
Although such future implementation schemes may alter the runtime structure of Python somewhat, it seems likely that the byte code compiler will still be the standard for
some time to come. The portability and runtime flexibility of byte code are important features of many Python systems. Moreover, adding type constraint declarations to support static compilation would break the flexibility, conciseness, simplicity, and overall spirit of Python coding. Due to Python's highly dynamic nature, any future implementation will likely retain many artifacts of the current PVM.
Chapter Summary
This chapter introduced the execution model of Python (how Python runs your programs) and explored some common variations on that model (just-in-time compilers and the like). Although you don't really need to come to grips with Python internals to write Python scripts, a passing acquaintance with this chapter's topics will help you truly understand how your programs run once you start coding them. In the next chapter, you'll start actually running some code of your own. First, though, here's the usual chapter quiz.
CHAPTER 3
OK, it's time to start running some code. Now that you have a handle on program execution, you're finally ready to start some real Python programming. At this point, I'll assume that you have Python installed on your computer; if not, see the prior chapter and Appendix A for installation and configuration hints. There are a variety of ways to tell Python to execute the code you type. This chapter discusses all the program launching techniques in common use today. Along the way, you'll learn how to type code interactively and how to save it in files to be run with system command lines, icon clicks, module imports and reloads, exec calls, menu options in GUIs such as IDLE, and more. If you just want to find out how to run a Python program quickly, you may be tempted to read the parts of this chapter that pertain only to your platform and move on to Chapter 4. But don't skip the material on module imports, as that's essential to understanding Python's program architecture. I also encourage you to at least skim the sections on IDLE and other IDEs, so you'll know what tools are available for when you start developing more sophisticated Python programs.
% python
Python 3.0.1 (r301:69561, Feb 13 2009, [Link]) [MSC v.1500 32 bit (Intel)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>>
Typing the word "python" at your system shell prompt like this begins an interactive Python session; the "%" character at the start of this listing stands for a generic system prompt in this book; it's not input that you type yourself. The notion of a system shell prompt is generic, but exactly how you access it varies by platform:
• On Windows, you can type python in a DOS console window (a.k.a. the Command Prompt, usually found in the Accessories section of the Start→Programs menu) or in the Start→Run... dialog box.
• On Unix, Linux, and Mac OS X, you might type this command in a shell or terminal window (e.g., in an xterm or console running a shell such as ksh or csh).
• Other systems may use similar or platform-specific devices. On handheld devices, for example, you generally click the Python icon in the home or application window to launch an interactive session.
If you have not set your shell's PATH environment variable to include Python's install directory, you may need to replace the word "python" with the full path to the Python executable on your machine. On Unix, Linux, and similar, /usr/local/bin/python or /usr/bin/python will often suffice. On Windows, try typing C:\Python30\python (for version 3.0):
C:\misc> c:\python30\python
Python 3.0.1 (r301:69561, Feb 13 2009, [Link]) [MSC v.1500 32 bit (Intel)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>>
Alternatively, you can run a change-directory command to go to Python's install directory before typing "python"; try the cd c:\python30 command on Windows, for example:
C:\misc> cd C:\Python30
C:\Python30> python
Python 3.0.1 (r301:69561, Feb 13 2009, [Link]) [MSC v.1500 32 bit (Intel)] ...
Type "help", "copyright", "credits" or "license" for more information.
>>>
On Windows, besides typing python in a shell window, you can also begin similar interactive sessions by starting IDLE's main window (discussed later) or by selecting the "Python (command line)" menu option from the Start button menu for Python, as shown in Figure 2-1 back in Chapter 2. Both spawn a Python interactive prompt with equivalent functionality; typing a shell command isn't necessary.
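Once you have an interactive session open by any of these routes, Python itself can tell you the full path of the executable that is running, which is handy if you later need to type that path at a shell prompt or add its directory to PATH. The following is a rough sketch using sys.executable and the PATH environment variable; the plain string comparison against PATH entries is a simplification and may miss matches that differ only in letter case or trailing separators.

```python
import os
import sys

# Full path of the running interpreter; it may be empty if Python cannot determine it.
interpreter_path = sys.executable or ''
install_dir = os.path.dirname(interpreter_path)

# Directories the system shell searches when you type a bare command such as python.
path_dirs = os.environ.get('PATH', '').split(os.pathsep)
on_path = install_dir in path_dirs   # simple check; see the caveat above

print(interpreter_path)
print(on_path)
```

If the second line printed is False, typing the full path shown on the first line is one way to launch the same interpreter from a shell.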
Again, you don't need to worry about the details of the print statements shown here yet; we'll start digging into syntax in the next chapter. In short, they print a Python string and an integer, as shown by the output lines that appear after each >>> input line (2 ** 8 means 2 raised to the power 8 in Python). When coding interactively like this, you can type as many Python commands as you like; each is run immediately after it's entered. Moreover, because the interactive session automatically prints the results of expressions you type, you don't usually need to say "print" explicitly at this prompt:
>>> lumberjack = 'okay'
>>> lumberjack
'okay'
>>> 2 ** 8
256
>>>
%
Here, the first line saves a value by assigning it to a variable, and the last two lines typed are expressions (lumberjack and 2 ** 8); their results are displayed automatically. To exit an interactive session like this one and return to your system shell prompt, type Ctrl-D on Unix-like machines; on MS-DOS and Windows systems, type Ctrl-Z to exit. In the IDLE GUI discussed later, either type Ctrl-D or simply close the window. Now, we didn't do much in this session's code; we just typed some Python print and assignment statements, along with a few expressions, which we'll study in detail later. The main thing to notice is that the interpreter executes the code entered on each line immediately, when the Enter key is pressed.
For example, when we typed the first print statement at the >>> prompt, the output (a Python string) was echoed back right away. There was no need to create a source-code file, and no need to run the code through a compiler and linker first, as you'd normally do when using a language such as C or C++. As you'll see in later chapters, you can also run multiline statements at the interactive prompt; such a statement runs immediately after you've entered all of its lines and pressed Enter twice to add a blank line.
Experimenting
Because code is executed immediately, the interactive prompt is a perfect place to experiment with the language and will be used often in this book to demonstrate smaller examples. In fact, this is the first rule of thumb to remember: if you're ever in doubt about how a piece of Python code works, fire up the interactive command line and try it out to see what happens. For instance, suppose you're reading a Python program's code and you come across an expression like 'Spam!' * 8 whose meaning you don't understand. At this point, you can spend 10 minutes wading through manuals and books to try to figure out what the code does, or you can simply run it interactively:
>>> 'Spam!' * 8                                   <== Learning by trying
'Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!'
The immediate feedback you receive at the interactive prompt is often the quickest way to deduce what a piece of code does. Here, it's clear that it does string repetition: in Python * means multiply for numbers, but repeat for strings; it's like concatenating a string to itself repeatedly (more on strings in Chapter 4). Chances are good that you won't break anything by experimenting this way, at least, not yet. To do real damage, like deleting files and running shell commands, you must really try, by importing modules explicitly (you also need to know more about Python's system interfaces in general before you will become that dangerous!). Straight Python code is almost always safe to run. For instance, watch what happens when you make a mistake at the interactive prompt:
>>> X
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'X' is not defined
In Python, using a variable before it has been assigned a value is always an error (otherwise, if names were filled in with defaults, some errors might go undetected). We'll learn more about that later; the important point here is that you don't crash Python or your computer when you make a mistake this way. Instead, you get a meaningful error message pointing out the mistake and the line of code that made it, and you can continue on in your session or script. In fact, once you get comfortable with Python, its error messages may often provide as much debugging support as you'll need (you'll read more on debugging in the sidebar "Debugging Python Code" on page 67).
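The two experiments in this section can also be written as a short script, which may help if you prefer to rerun them from a file. This is a sketch only; the name undefined_name_for_demo is hypothetical and deliberately never assigned, and the try statement it uses is covered much later in the book.

```python
# * repeats a string but multiplies numbers.
repeated = 'Spam!' * 8
product = 4 * 8
same_as_concat = repeated == 'Spam!' + 'Spam!' * 7   # repetition acts like repeated concatenation

# Using a name before assigning it raises NameError; the program can go on afterward.
try:
    undefined_name_for_demo      # hypothetical name, never assigned
except NameError as error:
    message = str(error)

print(repeated)
print(product, same_as_concat)
print(message)                   # the same text shown on the last line of a traceback
```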
Testing
Besides serving as a tool for experimenting while you're learning the language, the interactive interpreter is also an ideal place to test code you've written in files. You can import your module files interactively and run tests on the tools they define by typing calls at the interactive prompt. For instance, the following tests a function in a precoded module that ships with Python in its standard library (it prints the name of the directory you're currently working in), but you can do the same once you start writing module files of your own:
>>> import os
>>> [Link]()                                      <== Testing on the fly
'c:\\Python30'
More generally, the interactive prompt is a place to test program components, regardless of their source: you can import and test functions and classes in your Python files, type calls to linked-in C functions, exercise Java classes under Jython, and more. Partly because of its interactive nature, Python supports an experimental and exploratory programming style you'll find convenient when getting started.
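As one concrete instance of testing a library tool this way, the standard library's os.getcwd function returns the name of the current working directory. The sketch below calls it and checks the result; treating it as the same call used in the session above is an assumption, and the directory you see will differ from machine to machine.

```python
import os

# Ask the os module for the directory this code is running in.
current_dir = os.getcwd()

print(current_dir)                  # a path string; the value depends on where you run this
print(os.path.isdir(current_dir))   # sanity check that the result names a real directory
```

Typed at the interactive prompt, the first call alone is enough, because the result is echoed automatically.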
print statements are required only in files. Because the interactive interpreter automatically prints the results of expressions, you do not need to type complete print statements interactively. This is a nice feature, but it tends to confuse users when they move on to writing code in files: within a code file, you must use print statements to see your output because expression results are not automatically echoed. Remember, you must say print in files, but not interactively.
Don't indent at the interactive prompt (yet). When typing Python programs, either interactively or into a text file, be sure to start all your unnested statements in column 1 (that is, all the way to the left). If you don't, Python may print a SyntaxError message, because blank space to the left of your code is taken to be indentation that groups nested statements. Until Chapter 10, all statements you write will be unnested, so this includes everything for now. This seems to be a recurring confusion in introductory Python classes. Remember, a leading space generates an error message.
Watch out for prompt changes for compound statements. We won't meet compound (multiline) statements until Chapter 4, and not in earnest until Chapter 10, but as a preview, you should know that when typing lines 2 and beyond of a compound statement interactively, the prompt may change. In the simple shell window interface, the interactive prompt changes to ... instead of >>> for lines 2 and beyond; in the IDLE interface, lines after the first are automatically indented. You'll see why this matters in Chapter 10. For now, if you happen to come across a ... prompt or a blank line when entering your code, it probably means that you've somehow confused interactive Python into thinking you're typing a multiline statement. Try hitting the Enter key or a Ctrl-C combination to get back to the main prompt. The >>> and ... prompt strings can also be changed (they are available in the built-in module sys), but I'll assume they have not been in the book's example listings.
Terminate compound statements at the interactive prompt with a blank line. At the interactive prompt, inserting a blank line (by hitting the Enter key at the start of a line) is necessary to tell interactive Python that you're done typing the multiline statement. That is, you must press Enter twice to make a compound statement run. By contrast, blank lines are not required in files and are simply ignored if present. If you don't press Enter twice at the end of a compound statement when working interactively, you'll appear to be stuck in a limbo state, because the interactive interpreter will do nothing at all; it's waiting for you to press Enter again!
The interactive prompt runs one statement at a time. At the interactive prompt, you must run one statement to completion before typing another. This is natural for simple statements, because pressing the Enter key runs the statement entered. For compound statements, though, remember that you must submit a blank line to terminate the statement and make it run before you can type the next statement.
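The difference between automatic echoes at the prompt and silent expressions in files can be sketched with the built-in compile function, which accepts a mode argument: 'single' is the mode meant for interactive statements, and 'exec' is the mode meant for code read from a file. This is only an illustration of the behavior described above, not how you would normally run code; the helper name run and the label '<demo>' are made up for this example.

```python
import contextlib
import io

source = "2 ** 10"                   # a bare expression, with no print call

def run(mode):
    """Compile source in the given mode, run it, and return whatever it printed."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(compile(source, '<demo>', mode), {})
    return captured.getvalue()

interactive_output = run('single')   # interactive mode: expression results are echoed
file_output = run('exec')            # file mode: expression results are discarded

print(repr(interactive_output))
print(repr(file_output))
```

The first result holds the echoed value, while the second is empty, which is why a file needs an explicit print to show anything.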
You don't need the blank line after compound statements in a script file, though; this is required only at the interactive prompt. In a file, blank lines are not required and are simply ignored when present; at the interactive prompt, they terminate multiline statements. Also bear in mind that the interactive prompt runs just one statement at a time: you must press Enter twice to run a loop or other multiline statement before you can type the next statement:
>>> for x in 'spam':
...     print(x)                   # Need to press Enter twice before a new statement
... print('done')
  File "<stdin>", line 3
    print('done')
        ^
SyntaxError: invalid syntax
This means you can't cut and paste multiple lines of code into the interactive prompt, unless the code includes blank lines after each compound statement. Such code is better run in a file, the topic of the next section.
To save programs permanently, you need to write your code in files, which are usually known as modules. Modules are simply text files containing Python statements. Once coded, you can ask the Python interpreter to execute the statements in such a file any number of times, and in a variety of ways: by system command lines, by file icon clicks, by options in the IDLE user interface, and more. Regardless of how it is run, Python executes all the code in a module file from top to bottom each time you run the file.
Terminology in this domain can vary somewhat. For instance, module files are often referred to as programs in Python; that is, a program is considered to be a series of precoded statements stored in a file for repeated execution. Module files that are run directly are also sometimes called scripts, an informal term usually meaning a top-level program file. Some reserve the term "module" for a file imported from another file. (More on the meaning of "top-level" and imports in a few moments.) Whatever you call them, the next few sections explore ways to run code typed into module files.
In this section, you'll learn how to run files in the most basic way: by listing their names in a python command line entered at your computer's system prompt. Though it might seem primitive to some, for many programmers a system shell command-line window, together with a text editor window, constitutes as much of an integrated development environment as they will ever need.
A First Script
Let's get started. Open your favorite text editor (e.g., vi, Notepad, or the IDLE editor), and type the following statements into a new text file named script1.py:
# A first Python script
import sys                         # Load a library module
print(sys.platform)
print(2 ** 100)                    # Raise 2 to a power
x = 'Spam!'
print(x * 8)                       # String repetition
This file is our first official Python script (not counting the two-liner in Chapter 2). You shouldn't worry too much about this file's code, but as a brief description, this file:
• Imports a Python module (libraries of additional tools), to fetch the name of the platform
• Runs three print function calls, to display the script's results
• Uses a variable named x, created when it's assigned, to hold onto a string object
• Applies various object operations that we'll begin studying in the next chapter
The sys.platform here is just a string that identifies the kind of computer you're working on; it lives in a standard Python module called sys, which you must import to load (again, more on imports later).
For color, I've also added some formal Python comments here: the text after the # characters. Comments can show up on lines by themselves, or to the right of code on a line. The text after a # is simply ignored as a human-readable comment and is not considered part of the statement's syntax. If you're copying this code, you can ignore the comments as well. In this book, we usually use a different formatting style to make comments more visually distinctive, but they'll appear as normal text in your code.
Again, don't focus on the syntax of the code in this file for now; we'll learn about all of it later. The main point to notice is that you've typed this code into a file, rather than at the interactive prompt. In the process, you've coded a fully functional Python script.
Notice that the module file is called script1.py. As for all top-level files, it could also be called simply script, but files of code you want to import into a client have to end with a .py suffix. We'll study imports later in this chapter. Because you may want to import them in the future, it's a good idea to use .py suffixes for most Python files that you code. Also, some text editors detect Python files by their .py suffix; if the suffix is not present, you may not get features like syntax colorization and automatic indentation.
Again, you can type such a system shell command in whatever your system provides for command-line entry: a Windows Command Prompt window, an xterm window, or similar. Remember to replace python with a full directory path, as before, if your PATH setting is not configured. If all works as planned, this shell command makes Python run the code in this file line by line, and you will see the output of the script's three print statements: the name of the underlying platform, 2 raised to the power 100, and the result of the same string repetition expression we saw earlier (again, more on the last two of these in Chapter 4). If all didn't work as planned, you'll get an error message; make sure you've entered the code in your file exactly as shown, and try again. We'll talk about debugging options in the sidebar "Debugging Python Code" on page 67, but at this point in the book your best bet is probably rote imitation.
Because this scheme uses shell command lines to start Python programs, all the usual shell syntax applies. For instance, you can route the output of a Python script to a file to save it for later use or inspection by using special shell syntax:
% python [Link] > [Link]
In this case, the three output lines shown in the prior run are stored in the file [Link] instead of being printed. This is generally known as stream redirection; it works for input and output text and is available on Windows and Unix-like systems. It also has little to do with Python (Python simply supports it), so we will skip further details on shell redirection syntax here. If you are working on a Windows platform, this example works the same, but the system prompt is normally different:
C:\Python30> python script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
As usual, be sure to type the full path to Python if you haven't set your PATH environment variable to include this path or run a change-directory command to go to the path:
D:\temp> C:\python30\python script1.py
win32
1267650600228229401496703205376
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
On all recent versions of Windows, you can also type just the name of your script, and omit the name of Python itself. Because newer Windows systems use the Windows Registry to find a program with which to run a file, you don't need to name python on the command line explicitly to run a .py file. The prior command, for example, could be simplified to this on most Windows machines:
D:\temp> script1.py
Finally, remember to give the full path to your script file if it lives in a different directory from the one in which you are working. For example, the following system command line, run from D:\other, assumes Python is in your system path but runs a file located elsewhere:
D:\other> python c:\code\[Link]
If your PATH doesn't include Python's directory, and neither Python nor your script file is in the directory you're working in, use full paths for both:
D:\other> C:\Python30\python c:\code\[Link]
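The command lines above can also be mimicked from Python itself, which may help make clear what the shell is doing. The following sketch writes a copy of the first script to a temporary directory, launches it with the running interpreter (sys.executable), and sends its output to a file much as the > redirection does. The file name saved_output.txt and the temporary directory are assumptions made for this example, not names used elsewhere in the chapter.

```python
import os
import subprocess
import sys
import tempfile

# The statements of the first script, written out to a throwaway file
script_text = (
    "import sys\n"
    "print(sys.platform)\n"
    "print(2 ** 100)\n"
    "x = 'Spam!'\n"
    "print(x * 8)\n"
)
workdir = tempfile.mkdtemp()
script_path = os.path.join(workdir, 'script1.py')
saved_path = os.path.join(workdir, 'saved_output.txt')   # hypothetical output file

with open(script_path, 'w') as script_file:
    script_file.write(script_text)

# Roughly what "python script1.py > saved_output.txt" does at a shell prompt:
# full paths are given for both the interpreter and the script
with open(saved_path, 'w') as output_file:
    subprocess.run([sys.executable, script_path], stdout=output_file, check=True)

with open(saved_path) as output_file:
    lines = output_file.read().splitlines()
print(lines)
```

Because full paths are used for both the interpreter and the script, this works no matter what the current directory or PATH setting is.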
Beware of automatic extensions on Windows. If you use the Notepad program to code program files on Windows, be careful to pick the type All Files when it comes time to save your file, and give the file a .py suffix explicitly. Otherwise, Notepad will save your file with a .txt extension (e.g., as script1.py.txt), making it difficult to run in some launching schemes. Worse, Windows hides file extensions by default, so unless you have changed your view options you may not even notice that you've coded a text file and not a Python file. The file's icon may give this away: if it doesn't have a snake on it, you may have trouble. Uncolored code in IDLE and files that open to edit instead of run when clicked are other symptoms of this problem. Microsoft Word similarly adds a .doc extension by default; much worse, it adds formatting characters that are not legal Python syntax. As a rule of thumb, always pick All Files when saving under Windows, or use a more programmer-friendly text editor such as IDLE. IDLE does not even add a .py suffix automatically, a feature programmers tend to like, but users do not.
Use file extensions and directory paths at system prompts, but not for imports. Don't forget to type the full name of your file in system command lines; that is, use python script1.py rather than python script1. By contrast, Python's import statements, which we'll meet later in this chapter, omit both the .py file suffix and the directory path (e.g., import script1). This may seem trivial, but confusing these two is a common mistake. At the system prompt, you are in a system shell, not Python, so Python's module file search rules do not apply. Because of that, you must include both the .py extension and, if necessary, the full directory path leading to the file you wish to run. For instance, to run a file that resides in a different directory from the one in which you are working, you would typically list its full path (e.g., python d:\tests\spam.py). Within Python code, however, you can just say import spam and rely on the Python module search path to locate your file, as described later.
Use print statements in files. Yes, we've already been over this, but it is such a common mistake that it's worth repeating at least once here. Unlike in interactive coding, you generally must use print statements to see output from program files. If you don't see any output, make sure you've said print in your file. Again, though, print statements are not required in an interactive session, since Python automatically echoes expression results; prints don't hurt here, but are superfluous extra typing.
The special line at the top of the file tells the system where the Python interpreter lives. Technically, the first line is a Python comment. As mentioned earlier, all comments in Python programs start with a # and span to the end of the line; they are a place to insert extra information for human readers of your code. But when a comment such as the first line in this file appears, it's special because the operating system uses it to find an interpreter for running the program code in the rest of the file.
Also, note that this file is called simply brian, without the .py suffix used for the module file earlier. Adding a .py to the name wouldn't hurt (and might help you remember that this is a Python program file), but because you don't plan on letting other modules import the code in this file, the name of the file is irrelevant. If you give the file executable privileges with a chmod +x brian shell command, you can run it from the operating system shell as though it were a binary program:
% brian
The Bright Side of Life...
A note for Windows users: the method described here is a Unix trick, and it may not work on your platform. Not to worry; just use the basic command-line technique explored earlier. List the file's name on an explicit python command line:*
* As we discussed when exploring command lines, modern Windows versions also let you type just the name of a .py file at the system command line; they use the Registry to determine that the file should be opened with Python (e.g., typing [Link] is equivalent to typing python [Link]). This command-line mode is similar in spirit to the Unix #!, though it is system-wide on Windows, not per-file. Note that some programs may actually interpret and use a first #! line on Windows much like on Unix, but the DOS system shell on Windows simply ignores it.
In this case, you don't need the special #! comment at the top (although Python just ignores it if it's present), and the file doesn't need to be given executable privileges. In fact, if you want to run files portably between Unix and Microsoft Windows, your life will probably be simpler if you always use the basic command-line approach, not Unix-style scripts, to launch programs.
When coded this way, the env program locates the Python interpreter according to your system search path settings (i.e., in most Unix shells, by looking in all the directories listed in the PATH environment variable). This scheme can be more portable, as you don't need to hardcode a Python install path in the first line of all your scripts. Provided you have access to env everywhere, your scripts will run no matter where Python lives on your system; you need only change the PATH environment variable settings across platforms, not the first line in all your scripts. Of course, this assumes that env lives in the same place everywhere (on some machines, it may be in /sbin, /bin, or elsewhere); if not, all portability bets are off!
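A minimal sketch of a script coded with the env lookup described above follows. The location /usr/bin/env is a common one but, as just noted, is an assumption about your system, and on some machines the interpreter is installed under the name python3 rather than python. The variable name message is made up for this example.

```python
#!/usr/bin/env python
# To Python, the first line is only a comment. On Unix-like systems, the shell
# reads it and asks env to find a program named python on the PATH search path.
message = 'The Bright Side ' + 'of Life...'
print(message)
```

Saved in a file and marked executable with chmod +x, a script like this can be started by typing its name at a Unix shell prompt.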
# A first Python script
import sys
print(sys.platform)
print(2 ** 100)
x = 'Spam!'
print(x * 8)
As we've seen, you can always run this file from a system command line:
C:\misc> c:\python30\python script1.py
win32
1267650600228229401496703205376
However, icon clicks allow you to run the file without any typing at all. If you find this file's icon (for instance, by selecting Computer, or My Computer in XP, in your Start menu and working your way down on the C drive on Windows), you will get the file explorer picture captured in Figure 3-1 (Windows Vista is being used here). Python source files show up with white backgrounds on Windows, and byte code files show up with black backgrounds. You will normally want to click (or otherwise run) the source code file, in order to pick up your most recent changes. To launch the file here, simply click on the icon for script1.py.
Figure 3-1. On Windows, Python program files show up as icons in file explorer windows and can automatically be run with a double-click of the mouse (though you might not see printed output or error messages this way).
In general, input reads the next line of standard input, waiting if there is none yet available. The net effect in this context will be to pause the script, thereby keeping the output window shown in Figure 3-2 open until you press the Enter key.
Figure 3-2. When you click a program's icon on Windows, you will be able to see its printed output if you include an input call at the very end of the script. But you only need to do so in this context!
Now that I've shown you this trick, keep in mind that it is usually only required for Windows, and then only if your script prints text and exits and only if you will launch the script by clicking its file icon. You should add this call to the bottom of your top-level files if and only if all of these three conditions apply. There is no reason to add this call in any other contexts (unless you're unreasonably fond of pressing your computer's Enter key!). That may sound obvious, but it's another common mistake in live classes.
Before we move ahead, note that the input call applied here is the input counterpart of using the print statement for outputs. It is the simplest way to read user input, and it is more general than this example implies. For instance, input:
• Optionally accepts a string that will be printed as a prompt (e.g., input('Press Enter to exit'))
• Returns to your script a line of text read as a string (e.g., nextinput = input())
• Supports input stream redirections at the system shell level (e.g., python [Link] < [Link]), just as the print statement does for output
We'll use input in more advanced ways later in this text; for instance, Chapter 10 will apply it in an interactive loop.
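The three properties of input listed above can be sketched in one short example. To keep it runnable without a person at the keyboard, the sketch temporarily swaps sys.stdin for an in-memory text stream, which stands in for a shell-level input redirection; the helper name ask and the sample text are assumptions made for illustration.

```python
import io
import sys

def ask(prompt):
    """Show prompt, then return the next line of standard input as a string."""
    return input(prompt)

# Stand-in for a shell redirection such as "< somefile": input reads from this text
saved_stdin = sys.stdin
sys.stdin = io.StringIO('42\n')
try:
    reply = ask('Enter a number: ')
finally:
    sys.stdin = saved_stdin        # always restore the real input stream

print(repr(reply))                 # the line comes back as text, without the newline
print(int(reply) + 1)              # convert explicitly if a number is needed
```

Note that the value returned is always a string; converting it is up to your code.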
Version skew note: If you are working in Python 2.6 or earlier, use raw_input() instead of input() in this code. The former was renamed to the latter in Python 3.0. Technically, 2.6 has an input too, but it also evaluates strings as though they are program code typed into a script, and so will not work in this context (an empty string is an error). Python 3.0's input (and 2.6's raw_input) simply returns the entered text as a string, unevaluated. To simulate 2.6's input in 3.0, use eval(input()).
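If the same file has to run under both lines of Python, one possible way to paper over the rename is sketched here. The name read_line is hypothetical; it simply ends up bound to whichever built-in returns the entered text unevaluated.

```python
# Pick the unevaluated line reader for the Python version running this code
try:
    read_line = raw_input          # Python 2.X: raw_input returns the text as typed
except NameError:
    read_line = input              # Python 3.X: raw_input is gone; input does this job

# Usage at the end of a clicked script would then be, for example:
# read_line('Press Enter to exit')
```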
It is also possible to completely suppress the pop-up DOS console window for clicked files on Windows. Files whose names end in a .pyw extension will display only windows constructed by your script, not the default DOS console window. .pyw files are simply .py source files that have this special operational behavior on Windows. They are mostly used for Python-coded user interfaces that build windows of their own, often in conjunction with various techniques for saving printed output and errors to files.
Because of these limitations, it is probably best to view icon clicks as a way to launch programs after they have been debugged or have been instrumented to write their output to a file. Especially when starting out, use other techniques, such as system command lines and IDLE (discussed further in the section "The IDLE User Interface" on page 58), so that you can see generated error messages and view your normal output without resorting to coding tricks. When we discuss exceptions later in this book, you'll also learn that it is possible to intercept and recover from errors so that they do not terminate your programs. Watch for the discussion of the try statement later in this book for an alternative way to keep the console window from closing on errors.
This works, but only once per session (really, process) by default. After the first import, later imports do nothing, even if you change and save the module's source file again in another window:
>>> import script1
>>> import script1
This is by design; imports are too expensive an operation to repeat more than once per file, per program run. As you'll learn in Chapter 21, imports must find files, compile them to byte code, and run the code. If you really want to force Python to run the file again in the same session without stopping and restarting the session, you need to instead call the reload function available in the imp standard library module (this function is also a simple built-in in Python 2.6, but not in 3.0):
>>> from imp import reload         # Must load from module in 3.0
>>> reload(script1)
win32
65536
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
<module 'script1' from 'script1.py'>
>>>
The from statement here simply copies a name out of a module (more on this soon). The reload function itself loads and runs the current version of your file's code, picking up changes if you've changed and saved it in another window. This allows you to edit and pick up new code on the fly within the current Python interactive session. In this session, for example, the second print statement in script1.py was changed in another window to print 2 ** 16 between the time of the first import and the reload call.
The reload function expects the name of an already loaded module object, so you have to have successfully imported a module once before you reload it. Notice that reload also expects parentheses around the module object name, whereas import does not. reload is a function that is called, and import is a statement. That's why you must pass the module name to reload as an argument in parentheses, and that's why you get back an extra output line when reloading. The last output line is just the display representation of the reload call's return value, a Python module object. We'll learn more about using functions in general in Chapter 16.
Version skew note: Python 3.0 moved the reload built-in function to the imp standard library module. It still reloads files as before, but you must import it in order to use it. In 3.0, run an import imp and use imp.reload(M), or run a from imp import reload and use reload(M), as shown here. We'll discuss import and from statements in the next section, and more formally later in this book. If you are working in Python 2.6 (or 2.X in general), reload is available as a built-in function, so no import is required. In Python 2.6, reload is available in both forms (built-in and module function) to aid the transition to 3.0. In other words, reloading is still available in 3.0, but an extra line of code is required to fetch the reload call. The move in 3.0 was likely motivated in part by some well-known issues involving reload and from statements that we'll encounter in the next section. In short, names loaded with a from are not directly updated by a reload, but names accessed with an import statement are. If your names don't seem to change after a reload, try using import and module.attribute name references instead.
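The import-once rule and the reload behavior described above can be sketched end to end. One assumption to flag loudly: this example calls reload from the importlib module, where later Python 3 releases provide it, rather than from imp as in the 3.0 listings in this chapter. The module name reload_demo_mod and the temporary directory are made up for the example.

```python
import importlib
import os
import sys
import tempfile

workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, 'reload_demo_mod.py')   # hypothetical module file

def write_module(text):
    """Save new source code for the demo module, as an editor window would."""
    with open(module_path, 'w') as module_file:
        module_file.write(text)

sys.dont_write_bytecode = True     # skip byte code files so each load reads the source
sys.path.insert(0, workdir)        # let imports find files in the temporary directory
importlib.invalidate_caches()

write_module("title = 'first'\n")
import reload_demo_mod             # first import: finds, compiles, and runs the file
from reload_demo_mod import title  # copies the name title out of the module

write_module("title = 'second version'\n")
import reload_demo_mod             # later imports do nothing, even though the file changed
before_reload = reload_demo_mod.title

importlib.reload(reload_demo_mod)  # reruns the file's current code in the same module
after_reload = reload_demo_mod.title

print(before_reload)               # still the value from the first import
print(after_reload)                # the value from the edited file
print(title)                       # the name copied by from is not updated by the reload
```

The last line shows the from caveat mentioned above: only the attribute fetched through the module name reflects the reloaded code.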
This may be one of the world's simplest Python modules (it contains a single assignment statement), but it's enough to illustrate the point. When this file is imported, its code is run to generate the module's attribute. The assignment statement creates a module attribute named title.
You can access this module's title attribute in other components in two different ways. First, you can load the module as a whole with an import statement, and then qualify the module name with the attribute name to fetch it:
% python                           # Start Python
>>> import myfile                  # Run file; load module as a whole
>>> print(myfile.title)            # Use its attribute names: '.' to qualify
The Meaning of Life
In general, the dot expression syntax object.attribute lets you fetch any attribute attached to any object, and this is a very common operation in Python code. Here, we've used it to access the string variable title inside the module myfile; in other words, myfile.title. Alternatively, you can fetch (really, copy) names out of a module with from statements:
% python                           # Start Python
>>> from myfile import title       # Run file; copy its names
>>> print(title)                   # Use name directly: no need to qualify
The Meaning of Life
As you'll see in more detail later, from is just like an import, with an extra assignment to names in the importing component. Technically, from copies a module's attributes, such that they become simple variables in the recipient; thus, you can simply refer to the imported string this time as title (a variable) instead of myfile.title (an attribute reference).
Whether you use import or from to invoke an import operation, the statements in the module file myfile.py are executed, and the importing component (here, the interactive prompt) gains access to names assigned at the top level of the file. There's only one such name in this simple example (the variable title, assigned to a string), but the concept will be more useful when you start defining objects such as functions and classes in your modules: such objects become reusable software components that can be accessed by name from one or more client modules.
In practice, module files usually define more than one name to be used in and outside the files. Here's an example that defines three:
a = 'dead'                         # Define three attributes
b = 'parrot'                       # Exported to other files
c = 'sketch'
print(a, b, c)                     # Also used in this file
This file, threenames.py, assigns three variables, and so generates three attributes for the outside world. It also uses its own three variables in a print statement, as we see when we run this as a top-level file:
Notice that import and from both list the name of the module file as simply myfile without its .py suffix. As you'll learn in Part V, when Python looks for the actual file, it knows to include the suffix in its search procedure. Again, you must include the .py suffix in system shell command lines, but not in import statements.
All of this file's code runs as usual the first time it is imported elsewhere (by either an import or from). Clients of this file that use import get a module with attributes, while clients that use from get copies of the file's names:
% python
>>> import threenames              # Grab the whole module
dead parrot sketch
>>>
>>> threenames.b, threenames.c
('parrot', 'sketch')
>>>
>>> from threenames import a, b, c
>>> b, c
('parrot', 'sketch')
The results here are printed in parentheses because they are really tuples (a kind of object covered in the next part of this book); you can safely ignore them for now. Once you start coding modules with multiple names like this, the built-in dir function starts to come in handy: you can use it to fetch a list of the names available inside a module. The following returns a Python list of strings (we'll start studying lists in the next chapter):
>>> dir(threenames)
['__builtins__', '__doc__', '__file__', '__name__', '__package__', 'a', 'b', 'c']
I ran this on Python 3.0 and 2.6; older Pythons may return fewer names. When the dir function is called with the name of an imported module passed in parentheses like this, it returns all the attributes inside that module. Some of the names it returns are names you get "for free": names with leading and trailing double underscores are built-in names that are always predefined by Python and that have special meaning to the interpreter. The variables our code defined by assignment (a, b, and c) show up last in the dir result.
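To separate the names your own code assigned from the double-underscore names Python adds, the dir result can be filtered. The sketch below avoids needing a file on disk by building a throwaway module object with types.ModuleType and running three assignments in it; that construction, and the name threenames_demo, are assumptions made for the example rather than how modules are normally created.

```python
import types

# A stand-in for an imported module that assigns three names at its top level
demo = types.ModuleType('threenames_demo')
exec("a = 'dead'\nb = 'parrot'\nc = 'sketch'", demo.__dict__)

all_names = dir(demo)                                    # includes Python's own names
own_names = [name for name in all_names if not name.startswith('__')]

print(all_names)
print(own_names)
```

The exact set of double-underscore names varies between Python versions, as noted above, but the filtered list holds only the names assigned by the module's code.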
In fact, as you'll see, modules are one of a handful of ways that Python goes to great lengths to package your variables into compartments to avoid name clashes. We'll discuss modules and other namespace constructs (including classes and function scopes) further later in the book. For now, modules will come in handy as a way to run your code many times without having to retype it.
import versus from: I should point out that the from statement in a sense defeats the namespace partitioning purpose of modules: because the from copies variables from one file to another, it can cause same-named variables in the importing file to be overwritten (and won't warn you if it does). This essentially collapses namespaces together, at least in terms of the copied variables. Because of this, some recommend using import instead of from. I won't go that far, though; not only does from involve less typing, but its purported problem is rarely an issue in practice. Besides, this is something you control by listing the variables you want in the from; as long as you understand that they'll be assigned values, this is no more dangerous than coding assignment statements, another feature you'll probably want to use!
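The overwriting behavior described above is easy to see with a standard library module. In this sketch, the variable pi and its first value are made up for illustration; math is the real standard library module.

```python
import math

pi = 'apple'                       # a variable already in use in this file
from math import pi                # from assigns pi here, silently replacing 'apple'
print(pi)

pi = 'apple'                       # with import, the two names stay separate
print(pi, math.pi)                 # the module's value is reached through math.pi
```

Listing names explicitly in the from, as done here, at least makes it visible which variables will be assigned.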
For now, if you must import, try to keep all your files in the directory you are working in to avoid complications. That said, imports and reloads have proven to be a popular testing technique in Python classes, and you may prefer using this approach too. As usual, though, if you find yourself running into a wall, stop running into a wall!
The exec call has an effect similar to an import, but it doesn't technically import the module: by default, each time you call exec this way it runs the file anew, as though you had pasted it in at the place where exec is called. Because of that, exec does not require module reloads after file changes; it skips the normal module import logic. On the downside, because it works as if pasting code into the place where it is called, exec, like the from statement mentioned earlier, has the potential to silently overwrite variables you may currently be using. For example, our script1.py assigns to a variable named x. If that name is also being used in the place where exec is called, the name's value is replaced:
>>> x = 999
>>> exec(open('script1.py').read())    # Code run in this namespace by default
...same output...
>>> x                                  # Its assignments can overwrite names here
'Spam!'
If you're burning with curiosity, the short story is that Python searches for imported modules in every directory listed in sys.path, a Python list of directory name strings in the sys module, which is initialized from a PYTHONPATH environment variable, plus a set of standard directories. If you want to import from a directory other than the one you are working in, that directory must generally be listed in your PYTHONPATH setting. For more details, see Chapter 21.
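To see the search path on your own machine, a short sketch like the following can be run as is. The output differs from system to system, so none is shown; the variable names are made up for the example.

```python
import os
import sys

# Each entry is a directory (or archive) that imports will search, in order
for directory in sys.path:
    print(repr(directory))

# The environment setting that contributes extra directories, if it is set at all
pythonpath_setting = os.environ.get('PYTHONPATH')
print('PYTHONPATH setting:', pythonpath_setting)
```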
By contrast, the basic import statement runs the file only once per process, and it makes the file a separate module namespace so that its assignments will not change variables in your scope. The price you pay for the namespace partitioning of modules is the need to reload after changes.
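The contrast between running code in the caller's namespace and in a separate one can be sketched with exec's optional namespace argument. Here plain dictionaries play the role of the two scopes, and a one-line string stands in for the text read from a script file; these stand-ins, and all the names used, are assumptions made to keep the example self-contained.

```python
code = "x = 'Spam!'"               # stands in for the text of a script file

# Case 1: the code runs directly in the namespace that already uses x
shared_scope = {'x': 999}
exec(code, shared_scope)
print(shared_scope['x'])           # the earlier value was silently replaced

# Case 2: the code runs in its own namespace, loosely as an imported module's does
caller_scope = {'x': 999}
separate_scope = {}
exec(code, separate_scope)
print(caller_scope['x'], separate_scope['x'])   # the caller's x is untouched
```

The second case is a rough picture of the namespace partitioning that imports provide automatically.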
Version skew note: Python 2.6 also includes an execfile('[Link]') built-in function, in addition to allowing the form exec(open('[Link]')), which both automatically read the file's content. Both of these are equivalent to the exec(open('[Link]').read()) form, which is more complex but runs in both 2.6 and 3.0. Unfortunately, neither of these two simpler 2.6 forms is available in 3.0, which means you must understand both files and their read methods to fully understand this technique today (alas, this seems to be a case of aesthetics trouncing practicality in 3.0). In fact, the exec form in 3.0 involves so much typing that the best advice may simply be not to do it; it's usually best to launch files by typing system shell command lines or by using the IDLE menu options described in the next section. For more on the 3.0 exec form, see Chapter 9.
IDLE Basics
Let's jump right into an example. IDLE is easy to start under Windows: it has an entry in the Start button menu for Python (see Figure 2-1, shown previously), and it can also be selected by right-clicking on a Python program icon. On some Unix-like systems, you may need to launch IDLE's top-level script from a command line, or by clicking on the icon for the [Link] or [Link] file located in the idlelib subdirectory of Python's Lib directory. On Windows, IDLE is a Python script that currently lives in C:\Python30\Lib\idlelib (or C:\Python26\Lib\idlelib in Python 2.6).#

Figure 3-3 shows the scene after starting IDLE on Windows. The Python shell window that opens initially is the main window, which runs an interactive session (notice the >>> prompt). This works like all interactive sessions (code you type here is run immediately after you type it) and serves as a testing tool.

IDLE is officially a corruption of IDE, but it's really named in honor of Monty Python member Eric Idle.
Figure 3-3. The main Python shell window of the IDLE development GUI, shown here running on Windows. Use the File menu to begin (New Window) or change (Open...) a source file; use the text edit window's Run menu to run the code in that window (Run Module).
#IDLE is a Python program that uses the standard library's tkinter GUI toolkit (a.k.a. Tkinter in Python 2.6) to build the IDLE GUI. This makes IDLE portable, but it also means that you'll need to have tkinter support in your Python to use IDLE. The Windows version of Python has this by default, but some Linux and Unix users may need to install the appropriate tkinter support (a yum tkinter command may suffice on some Linux distributions, but see the installation hints in Appendix A for details). Mac OS X may have everything you need preinstalled, too; look for an idle command or script on your machine.
IDLE uses familiar menus with keyboard shortcuts for most of its operations. To make (or edit) a source code file under IDLE, open a text edit window: in the main window, select the File pull-down menu, and pick New Window (or Open... to open a text edit window displaying an existing file for editing). Although it may not show up fully in this book's graphics, IDLE uses syntax-directed colorization for the code typed in both the main window and all text edit windows: keywords are one color, literals are another, and so on. This helps give you a better picture of the components in your code (and can even help you spot mistakes: run-on strings are all one color, for example).

To run a file of code that you are editing in IDLE, select the file's text edit window, open that window's Run pull-down menu, and choose the Run Module option listed there (or use the equivalent keyboard shortcut, given in the menu). Python will let you know that you need to save your file first if you've changed it since it was opened or last saved and forgot to save your changes, a common mistake when you're knee deep in coding.

When run this way, the output of your script and any error messages it may generate show up back in the main interactive window (the Python shell window). In Figure 3-3, for example, the three lines after the RESTART line near the middle of the window reflect an execution of our [Link] file opened in a separate edit window. The RESTART message tells us that the user-code process was restarted to run the edited script and serves to separate script output (it does not appear if IDLE is started without a user-code subprocess; more on this mode in a moment).
IDLE hint of the day: If you want to repeat prior commands in IDLE's main interactive window, you can use the Alt-P key combination to scroll backward through the command history, and Alt-N to scroll forward (on some Macs, try Ctrl-P and Ctrl-N instead). Your prior commands will be recalled and displayed, and may be edited and rerun. You can also recall commands by positioning the cursor on them, or use cut-and-paste operations, but these techniques tend to involve more work. Outside IDLE, you may be able to recall commands in an interactive session with the arrow keys on Windows.
Using IDLE
IDLE is free, easy to use, portable, and automatically available on most platforms. I generally recommend it to Python newcomers because it sugarcoats some of the details and does not assume prior experience with system command lines. However, it is somewhat limited compared to more advanced commercial IDEs. To help you avoid some common pitfalls, here is a list of issues that IDLE beginners should bear in mind:

You must add .py explicitly when saving your files. I mentioned this when talking about files in general, but it's a common IDLE stumbling block, especially for Windows users. IDLE does not automatically add a .py extension to filenames when files are saved. Be careful to type the .py extension yourself when saving a file for the first time. If you don't, while you will be able to run your file from IDLE (and system command lines), you will not be able to import it either interactively or from other modules.

Run scripts by selecting Run→Run Module in text edit windows, not by interactive imports and reloads. Earlier in this chapter, we saw that it's possible to run a file by importing it interactively. However, this scheme can grow complex because it requires you to manually reload files after changes. By contrast, using the Run→Run Module menu option in IDLE always runs the most current version of your file, just like running it using a system shell command line. IDLE also prompts you to save your file first, if needed (another common mistake outside IDLE).

You need to reload only modules being tested interactively. Like system shell command lines, IDLE's Run→Run Module menu option always runs the current version of both the top-level file and any modules it imports. Because of this, Run→Run Module eliminates common confusions surrounding imports. You only need to reload modules that you are importing and testing interactively in IDLE. If you choose to use the import and reload technique instead of Run→Run Module, remember that you can use the Alt-P/Alt-N key combinations to recall prior commands.

You can customize IDLE. To change the text fonts and colors in IDLE, select the Configure option in the Options menu of any IDLE window. You can also customize key combination actions, indentation settings, and more; see IDLE's Help pull-down menu for more hints.

There is currently no clear-screen option in IDLE. This seems to be a frequent request (perhaps because it's an option available in similar IDEs), and it might be added eventually. Today, though, there is no way to clear the interactive window's text.
If you want the window's text to go away, you can either press and hold the Enter key, or type a Python loop to print a series of blank lines (nobody really uses the latter technique, of course, but it sounds more high-tech than pressing the Enter key!).

tkinter GUI and threaded programs may not work well with IDLE. Because IDLE is a Python/tkinter program, it can hang if you use it to run certain types of advanced Python/tkinter programs. This has become less of an issue in more recent versions of IDLE that run user code in one process and the IDLE GUI itself in another, but some programs (especially those that use multithreading) might still hang the GUI. Your code may not exhibit such problems, but as a rule of thumb, it's always safe to use IDLE to edit GUI programs but launch them using other options, such as icon clicks or system command lines. When in doubt, if your code fails in IDLE, try it outside the GUI.
If connection errors arise, try starting IDLE in single-process mode. Because IDLE requires communication between its separate user and GUI processes, it can sometimes have trouble starting up on certain platforms (notably, it fails to start occasionally on some Windows machines, due to firewall software that blocks connections). If you run into such connection errors, it's always possible to start IDLE with a system command line that forces it to run in single-process mode without a user-code subprocess and therefore avoids communication issues: its -n command-line flag forces this mode. On Windows, for example, start a Command Prompt window and run the system command line [Link] -n from within the directory C:\Python30\Lib\idlelib (cd there first if needed).

Beware of some IDLE usability features. IDLE does much to make life easier for beginners, but some of its tricks won't apply outside the IDLE GUI. For instance, IDLE runs your scripts in its own interactive namespace, so variables in your code show up automatically in the IDLE interactive session; you don't always need to run import commands to access names at the top level of files you've already run. This can be handy, but it can also be confusing, because outside the IDLE environment names must always be imported from files to be used. IDLE also automatically both changes to the directory of a file just run and adds that directory to the module import search path, a handy feature that allows you to import files there without search path settings, but also something that won't work the same when you run files outside IDLE. It's OK to use such features, but don't forget that they are IDLE behavior, not Python behavior.
intuitive GUI interactions, you should experiment with the system live to get a feel for its other tools.
Other IDEs
Because IDLE is free, portable, and a standard part of Python, it's a nice first development tool to become familiar with if you want to use an IDE at all. Again, I recommend that you use IDLE for this book's exercises if you're just starting out, unless you are already familiar with and prefer a command-line-based development mode. There are, however, a handful of alternative IDEs for Python developers, some of which are substantially more powerful and robust than IDLE. Here are some of the most commonly used IDEs:

Eclipse and PyDev
Eclipse is an advanced open source IDE GUI. Originally developed as a Java IDE, Eclipse also supports Python development when you install the PyDev (or a similar) plug-in. Eclipse is a popular and powerful option for Python development, and it goes well beyond IDLE's feature set. It includes support for code completion, syntax highlighting, syntax analysis, refactoring, debugging, and more. Its downsides are that it is a large system to install and may require shareware extensions for some features (this may vary over time). Still, when you are ready to graduate from IDLE, the Eclipse/PyDev combination is worth your attention.

Komodo
A full-featured development environment GUI for Python (and other languages), Komodo includes standard syntax-coloring, text-editing, debugging, and other features. In addition, Komodo offers many advanced features that IDLE does not, including project files, source-control integration, regular-expression debugging, and a drag-and-drop GUI builder that generates Python/tkinter code to implement the GUIs you design interactively. At this writing, Komodo is not free; it is available at [Link].

NetBeans IDE for Python
NetBeans is a powerful open-source development environment GUI with support for many advanced features for Python developers: code completion, automatic indentation and code colorization, editor hints, code folding, refactoring, debugging, code coverage and testing, projects, and more.
It may be used to develop both CPython and Jython code. Like Eclipse, NetBeans requires installation steps beyond those of the included IDLE GUI, but it is seen by many as more than worth the effort. Search the Web for the latest information and links.

PythonWin
PythonWin is a free Windows-only IDE for Python that ships as part of ActiveState's ActivePython distribution (and may also be fetched separately from http://[Link] resources). It is roughly like IDLE, with a handful of useful Windows-specific extensions added; for example, PythonWin has support for COM objects. Today, IDLE is probably more advanced than PythonWin (for instance, IDLE's dual-process architecture often prevents it from hanging). However, PythonWin still offers tools for Windows developers that IDLE does not. See http://[Link] for more information.

Others
There are roughly half a dozen other widely used IDEs that I'm aware of (including the commercial Wing IDE and PythonCard) but do not have space to do justice to here, and more will probably appear over time. In fact, almost every programmer-friendly text editor has some sort of support for Python development these days, whether it be preinstalled or fetched separately. Emacs and Vim, for instance, have substantial Python support. I won't try to document all such options here; for more information, see the resources available at [Link] or search the Web for "Python IDE." You might also try running a web search for "Python editors"; today, this leads you to a wiki page that maintains information about many IDE and text-editor options for Python programming.
Embedding Calls
In some specialized domains, Python code may be run automatically by an enclosing system. In such cases, we say that the Python programs are embedded in (i.e., run by) another program. The Python code itself may be entered into a text file, stored in a database, fetched from an HTML page, parsed from an XML document, and so on. But from an operational perspective, another system (not you) may tell Python to run the code you've created. Such an embedded execution mode is commonly used to support end-user customization: a game program, for instance, might allow for play modifications by running user-accessible embedded Python code at strategic points in time. Users can modify this type of system by providing or changing Python code. Because Python code is interpreted, there is no need to recompile the entire system to incorporate the change (see Chapter 2 for more on how Python code is run).
In this mode, the enclosing system that runs your code might be written in C, C++, or even Java when the Jython system is used. As an example, it's possible to create and run strings of Python code from a C program by calling functions in the Python runtime API (a set of services exported by the libraries created when Python is compiled on your machine):
#include <Python.h>
...
Py_Initialize();                                     // This is C, not Python
PyRun_SimpleString("x = 'brave ' + 'sir robin'");    // But it runs Python code
In this C code snippet, a program coded in the C language embeds the Python interpreter by linking in its libraries, and passes it a Python assignment statement string to run. C programs may also gain access to Python modules and objects and process or execute them using other Python API tools. This book isn't about Python/C integration, but you should be aware that, depending on how your organization plans to use Python, you may or may not be the one who actually starts the Python programs you create. Regardless, you can usually still use the interactive and file-based launching techniques described here to test code in isolation from those enclosing systems that may eventually use it.*
* See Programming Python (O'Reilly) for more details on embedding Python in C/C++. The embedding API can call Python functions directly, load modules, and more. Also, note that the Jython system allows Java programs to invoke Python code using a Java-based API (a Python interpreter class).
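The C snippet above needs a C compiler and Python's development headers to build. As a rough, Python-only sketch of the same idea (an enclosing program running user-supplied code at a strategic point), consider the following. Everything in it is hypothetical: the "game", the hook name adjust_score, and the helper functions are invented for illustration, and the built-in exec stands in for the embedding API.

```python
# Hypothetical host program: a tiny "game" that runs user-supplied
# Python code at one strategic point (scoring).

USER_CODE = """
def adjust_score(score):
    return score * 2      # a play modification supplied by the user
"""

def load_customization(source):
    """Run the user's code in its own namespace and return that namespace."""
    namespace = {}
    exec(source, namespace)
    return namespace

def play_round(base_score, hooks):
    # Call the user's hook if one was defined; otherwise keep the default.
    adjust = hooks.get("adjust_score", lambda score: score)
    return adjust(base_score)

hooks = load_customization(USER_CODE)
print(play_round(10, hooks))      # 20 with the customization above
print(play_round(10, {}))         # 10 with no customization
```

Changing the text in USER_CODE changes the program's behavior without touching, or recompiling, the host code. A real system would also need to decide how far it trusts user-supplied code, which this sketch ignores.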
Future Possibilities?
This chapter reflects current practice, but much of the material is both platform- and time-specific. Indeed, many of the execution and launch details presented arose during the shelf life of this book's various editions. As with program execution options, it's not impossible that new program launch options may arise over time. New operating systems, and new versions of existing systems, may also provide execution techniques beyond those outlined here. In general, because Python keeps pace with such changes, you should be able to launch Python programs in whatever way makes sense for the machines you use, both now and in the future, be that by drawing on tablet PCs or PDAs, grabbing icons in a virtual reality, or shouting a script's name over your coworkers' conversations. Implementation changes may also impact launch schemes somewhat (e.g., a full compiler could produce normal executables that are launched much like frozen binaries today). If I knew what the future truly held, though, I would probably be talking to a stockbroker instead of writing these words!
way of universal guidelines; in general, whatever environment you like to use will be the best for you to use.
standalone debugger with advanced debugging support and cross-platform GUI and console interfaces. These options will become more important as we start writing larger scripts. Probably the best news on the debugging front, though, is that errors are detected and reported in Python, rather than passing silently or crashing the system altogether. In fact, errors themselves are a well-defined mechanism known as exceptions, which you can catch and process (more on exceptions in Part VII). Making mistakes is never fun, of course, but speaking as someone who recalls when debugging meant getting out a hex calculator and poring over piles of memory dump printouts, Python's debugging support makes errors much less painful than they might otherwise be.
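As a minimal sketch of what catching and processing an exception can look like (the function name safe_divide is made up for this illustration, and the statement syntax is covered properly in Part VII):

```python
def safe_divide(a, b):
    """Return a / b, or None if the division is not possible."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        # The error arrives as an object that can be inspected or reported.
        print("caught:", exc)
        return None

print(safe_divide(1, 2))    # a normal result
print(safe_divide(1, 0))    # the error is caught; None is returned instead
```

Without the try statement, the second call would end with Python's default error message rather than a return value.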
Chapter Summary
In this chapter, we've looked at common ways to launch Python programs: by running code typed interactively, and by running code stored in files with system command lines, file-icon clicks, module imports, exec calls, and IDE GUIs such as IDLE. We've covered a lot of pragmatic startup territory here. This chapter's goal was to equip you with enough information to enable you to start writing some code, which you'll do in the next part of the book. There, we will start exploring the Python language itself, beginning with its core data types. First, though, take the usual chapter quiz to exercise what you've learned here. Because this is the last chapter in this part of the book, it's followed with a set of more complete exercises that test your mastery of this entire part's topics. For help with the latter set of problems, or just for a refresher, be sure to turn to Appendix B after you've given the exercises a try.
a script's variables are automatically imported to the interactive scope in IDLE, for instance, but not by Python in general.
8. A namespace is just a package of variables (i.e., names). It takes the form of an object with attributes in Python. Each module file is automatically a namespace, that is, a package of variables reflecting the assignments made at the top level of the file. Namespaces help avoid name collisions in Python programs: because each module file is a self-contained namespace, files must explicitly import other files in order to use their names.
4. Scripts. If your platform supports it, add the #! line to the top of your [Link] module file, give the file executable privileges, and run it directly as an executable. What does the first line need to contain? #! usually only has meaning on Unix, Linux, and Unix-like platforms such as Mac OS X; if you're working on Windows, instead try running your file by listing just its name in a DOS console window without the word "python" before it (this works on recent versions of Windows), or via the Start→Run... dialog box.
5. Errors and debugging. Experiment with typing mathematical expressions and assignments at the Python interactive command line. Along the way, type the expressions 2 ** 500 and 1 / 0, and reference an undefined variable name as we did in this chapter. What happens? You may not know it yet, but when you make a mistake, you're doing exception processing (a topic we'll explore in depth in Part VII). As you'll learn there, you are technically triggering what's known as the default exception handler: logic that prints a standard error message. If you do not catch an error, the default handler does and prints the standard error message in response. Exceptions are also bound up with the notion of debugging in Python. When you're first starting out, Python's default error messages on exceptions will probably provide as much error-handling support as you need: they give the cause of the error, as well as showing the lines in your code that were active when the error occurred. For more about debugging, see the sidebar "Debugging Python Code" on page 67.
6. Breaks and cycles. At the Python command line, type:
L = [1, 2]          # Make a 2-item list
L.append(L)         # Append L as a single item to itself
L                   # Print L
What happens? In all recent versions of Python, you'll see a strange output that we'll describe in the solutions appendix, and which will make more sense when we study references in the next part of the book. If you're using a Python version older than 1.5.1, a Ctrl-C key combination will probably help on most platforms. Why do you think your version of Python responds the way it does for this code?
If you do have a Python older than Release 1.5.1 (a hopefully rare scenario today!), make sure your machine can stop a program with a Ctrl-C key combination of some sort before running this test, or you may be waiting a long time.
7. Documentation. Spend at least 17 minutes browsing the Python library and language manuals before moving on to get a feel for the available tools in the standard library and the structure of the documentation set. It takes at least this long to become familiar with the locations of major topics in the manual set; once you've done this, it's easy to find what you need. You can find this manual via the Python Start button entry on Windows, in the Python Docs option on the Help pull-down menu in IDLE, or online at [Link]. I'll also have a few more words to say about the manuals and other documentation sources available (including PyDoc and the help function) in Chapter 15. If you still have time, go explore the Python website, as well as its PyPI third-party extension repository. Especially check out the [Link] documentation and search pages; they can be crucial resources.
PART II
CHAPTER 4
This chapter begins our tour of the Python language. In an informal sense, in Python, we do things with stuff. "Things" take the form of operations like addition and concatenation, and "stuff" refers to the objects on which we perform those operations. In this part of the book, our focus is on that stuff, and the things our programs can do with it.

Somewhat more formally, in Python, data takes the form of objects: either built-in objects that Python provides, or objects we create using Python or external language tools such as C extension libraries. Although we'll firm up this definition later, objects are essentially just pieces of memory, with values and sets of associated operations. Because objects are the most fundamental notion in Python programming, we'll start this chapter with a survey of Python's built-in object types.

By way of introduction, however, let's first establish a clear picture of how this chapter fits into the overall Python picture. From a more concrete perspective, Python programs can be decomposed into modules, statements, expressions, and objects, as follows:

1. Programs are composed of modules.
2. Modules contain statements.
3. Statements contain expressions.
4. Expressions create and process objects.

The discussion of modules in Chapter 3 introduced the highest level of this hierarchy. This part's chapters begin at the bottom, exploring both built-in objects and the expressions you can code to use them.
Booleans, types, None
Functions, modules, classes (Part IV, Part V, Part VI)
Compiled code, stack tracebacks (Part IV, Part VII)
Table 4-1 isn't really complete, because everything we process in Python programs is a kind of object. For instance, when we perform text pattern matching in Python, we create pattern objects, and when we perform network scripting, we use socket objects. These other kinds of objects are generally created by importing and using modules and have behavior all their own. As we'll see in later parts of the book, program units such as functions, modules, and classes are objects in Python too: they are created with statements and expressions such as def, class, import, and lambda and may be passed around scripts freely, stored within other objects, and so on. Python also provides a set of implementation-related types such as compiled code objects, which are generally of interest to tool builders more than application developers; these are also discussed in later parts of this text.

We usually call the other object types in Table 4-1 core data types, though, because they are effectively built into the Python language; that is, there is specific expression syntax for generating most of them. For instance, when you run the following code:
>>> 'spam'
* In this book, the term literal simply means an expression whose syntax generates an object (sometimes also called a constant). Note that the term "constant" does not imply objects or variables that can never be changed (i.e., this term is unrelated to C++'s const or Python's "immutable," a topic explored in the section "Immutability" on page 82).
you are, technically speaking, running a literal expression that generates and returns a new string object. There is specific Python language syntax to make this object. Similarly, an expression wrapped in square brackets makes a list, one in curly braces makes a dictionary, and so on. Even though, as we'll see, there are no type declarations in Python, the syntax of the expressions you run determines the types of objects you create and use. In fact, object-generation expressions like those in Table 4-1 are generally where types originate in the Python language.

Just as importantly, once you create an object, you bind its operation set for all time: you can perform only string operations on a string and list operations on a list. As you'll learn, Python is dynamically typed (it keeps track of types for you automatically instead of requiring declaration code), but it is also strongly typed (you can perform on an object only operations that are valid for its type).

Functionally, the object types in Table 4-1 are more general and powerful than what you may be accustomed to. For instance, you'll find that lists and dictionaries alone are powerful data representation tools that obviate most of the work you do to support collections and searching in lower-level languages. In short, lists provide ordered collections of other objects, while dictionaries store objects by key; both lists and dictionaries may be nested, can grow and shrink on demand, and may contain objects of any type.

We'll study each of the object types in Table 4-1 in detail in upcoming chapters. Before digging into the details, though, let's begin by taking a quick look at Python's core objects in action. The rest of this chapter provides a preview of the operations we'll explore in more depth in the chapters that follow. Don't expect to find the full story here; the goal of this chapter is just to whet your appetite and introduce some key ideas. Still, the best way to get started is to get started, so let's jump right into some real code.
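As a first taste, the following short script (a sketch written for this discussion, with made-up variable names) shows the three ideas from the preceding paragraphs: literal syntax decides an object's type, an object supports only the operations of its type, and a variable name is free to refer to objects of different types over time.

```python
# Literal syntax alone decides which type of object is created.
for obj in ['spam', [1, 2, 3], {'a': 1}, (1, 2), 3.14, 42]:
    print(type(obj).__name__, repr(obj))

# Strong typing: an operation that a type does not support fails.
try:
    'spam' + 1
except TypeError as exc:
    print('TypeError:', exc)

# Dynamic typing: the same name may refer to objects of different types.
thing = 'spam'
thing = [1, 2, 3]
thing.append(4)         # A list operation, valid now that thing is a list
print(thing)
```

The try statement used here to trap the error is an exception-handling tool covered later in the book; it appears only so that the script can continue after the invalid operation.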
Numbers
If you've done any programming or scripting in the past, some of the object types in Table 4-1 will probably seem familiar. Even if you haven't, numbers are fairly straightforward. Python's core objects set includes the usual suspects: integers (numbers without a fractional part), floating-point numbers (roughly, numbers with a decimal point in them), and more exotic numeric types (complex numbers with imaginary parts, fixed-precision decimals, rational fractions with numerator and denominator, and full-featured sets).

Although it offers some fancier options, Python's basic number types are, well, basic. Numbers in Python support the normal mathematical operations. For instance, the plus sign (+) performs addition, a star (*) is used for multiplication, and two stars (**) are used for exponentiation:
>>> 123 + 222
345
>>> 1.5 * 4
6.0
>>> 2 ** 100
1267650600228229401496703205376
Notice the last result here: Python 3.0's integer type automatically provides extra precision for large numbers like this when needed (in 2.6, a separate long integer type handles numbers too large for the normal integer type in similar ways). You can, for instance, compute 2 to the power 1,000,000 as an integer in Python, but you probably shouldn't try to print the result: with more than 300,000 digits, you may be waiting awhile!
>>> len(str(2 ** 1000000))        # How many digits in a really BIG number?
301030
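A note for readers on newer interpreters: to my knowledge, Python releases from 3.11 on limit by default how many digits an integer-to-string conversion may produce, so the str call above may raise a ValueError there unless that limit is raised. The digit count itself can be obtained without building the string at all, since the number of decimal digits in 2 ** n is floor(n * log10(2)) + 1. A small sketch, with a helper name invented for this example:

```python
import math

def digits_in_power_of_two(n):
    """Count the decimal digits of 2 ** n without building the digit string."""
    return math.floor(n * math.log10(2)) + 1

print(digits_in_power_of_two(100))        # 31
print(digits_in_power_of_two(1000000))    # 301030

# Cross-check against the string form for a size that is safe to convert.
assert digits_in_power_of_two(100) == len(str(2 ** 100))
```

Because the helper relies on floating-point arithmetic, treat it as an estimate that happens to be exact for sizes like these, not as a guarantee for arbitrarily large exponents.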
Once you start experimenting with floating-point numbers, you're likely to stumble across something that may look a bit odd at first glance:
>>> 3.1415 * 2                    # repr: as code
6.2830000000000004
>>> print(3.1415 * 2)             # str: user-friendly
6.283
The first result isn't a bug; it's a display issue. It turns out that there are two ways to print every object: with full precision (as in the first result shown here), and in a user-friendly form (as in the second). Formally, the first form is known as an object's as-code repr, and the second is its user-friendly str. The difference can matter when we step up to using classes; for now, if something looks odd, try showing it with a call to the print built-in.

Besides expressions, there are a handful of useful numeric modules that ship with Python; modules are just packages of additional tools that we import to use:
>>> import math
>>> math.pi
3.1415926535897931
>>> math.sqrt(85)
9.2195444572928871
The math module contains more advanced numeric tools as functions, while the random module performs random number generation and random selections (here, from a Python list, introduced later in this chapter):
>>> import random
>>> random.random()
0.59268735266273953
>>> random.choice([1, 2, 3, 4])
1
Python also includes more exotic numeric objects (such as complex, fixed-precision, and rational numbers, as well as sets and Booleans), and the third-party open source extension domain has even more (e.g., matrixes and vectors). We'll defer discussion of these types until later in the book.

So far, we've been using Python much like a simple calculator; to do better justice to its built-in types, let's move on to explore strings.
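Before leaving numbers, the two display forms mentioned earlier in this section can also be requested explicitly with the built-in repr and str functions. The sketch below assumes nothing beyond those built-ins plus format. On later Python versions the first two lines may print the same digits for this particular value, which is why the 17-digit form is requested explicitly as well.

```python
value = 3.1415 * 2

# repr: the as-code form; str: the user-friendly form.
print(repr(value))
print(str(value))

# Python 3.0 and 2.6 echoed floats with 17 significant digits; an explicit
# format request shows that many digits on any version.
print(format(value, '.17g'))

# For strings, the two forms differ on every version: repr keeps the quotes.
print(repr('spam'), str('spam'))
```

The string case is the easier one to remember: the as-code form is what you would type to recreate the object, quotes included.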
Strings
Strings are used to record textual information as well as arbitrary collections of bytes. They are our first example of what we call a sequence in Python; that is, a positionally ordered collection of other objects. Sequences maintain a left-to-right order among the items they contain: their items are stored and fetched by their relative position. Strictly speaking, strings are sequences of one-character strings; other types of sequences include lists and tuples, covered later.
Sequence Operations
As sequences, strings support operations that assume a positional ordering among items. For example, if we have a four-character string, we can verify its length with the built-in len function and fetch its components with indexing expressions:
>>> S = 'Spam'
>>> len(S)               # Length
4
>>> S[0]                 # The first item in S, indexing by zero-based position
'S'
>>> S[1]                 # The second item from the left
'p'
In Python, indexes are coded as offsets from the front, and so start from 0: the first item is at index 0, the second is at index 1, and so on.

Notice how we assign the string to a variable named S here. We'll go into detail on how this works later (especially in Chapter 6), but Python variables never need to be declared ahead of time. A variable is created when you assign it a value, may be assigned any type of object, and is replaced with its value when it shows up in an expression. It must also have been previously assigned by the time you use its value. For the purposes of this chapter, it's enough to know that we need to assign an object to a variable in order to save it for later use.

In Python, we can also index backward, from the end: positive indexes count from the left, and negative indexes count back from the right:
>>> S[-1]                # The last item from the end in S
'm'
>>> S[-2]                # The second to last item from the end
'a'
Formally, a negative index is simply added to the string's size, so the following two operations are equivalent (though the first is easier to code and less easy to get wrong):
>>> S[-1]                # The last item in S
'm'
>>> S[len(S)-1]          # Negative indexing, the hard way
'm'
Notice that we can use an arbitrary expression in the square brackets, not just a hardcoded number literal: anywhere that Python expects a value, we can use a literal, a variable, or any expression. Python's syntax is completely general this way.
In addition to simple positional indexing, sequences also support a more general form of indexing known as slicing, which is a way to extract an entire section (slice) in a single step. For example:
>>> S                    # A 4-character string
'Spam'
>>> S[1:3]               # Slice of S from offsets 1 through 2 (not 3)
'pa'
Probably the easiest way to think of slices is that they are a way to extract an entire column from a string in a single step. Their general form, X[I:J], means "give me everything in X from offset I up to but not including offset J." The result is returned in a new object. The second of the preceding operations, for instance, gives us all the characters in string S from offsets 1 through 2 (that is, 3 - 1) as a new string. The effect is to slice or "parse out" the two characters in the middle.
In a slice, the left bound defaults to zero, and the right bound defaults to the length of the sequence being sliced. This leads to some common usage variations:
>>> S[1:]                # Everything past the first (1:len(S))
'pam'
>>> S                    # S itself hasn't changed
'Spam'
>>> S[0:3]               # Everything but the last
'Spa'
>>> S[:3]                # Same as S[0:3]
'Spa'
>>> S[:-1]               # Everything but the last again, but simpler (0:-1)
'Spa'
>>> S[:]                 # All of S as a top-level copy (0:len(S))
'Spam'
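The slice defaults described above can be checked directly. The following is only a small sketch, assuming S is the same four-character string used so far; it restates each shorthand in its spelled-out form and confirms that the two agree:

```python
# Assumption: S is the same four-character string used in the examples above.
S = 'Spam'

# Omitted bounds default to 0 on the left and len(S) on the right.
assert S[1:] == S[1:len(S)] == 'pam'
assert S[:3] == S[0:3] == 'Spa'

# A negative bound is an offset from the end: -1 stands for len(S) - 1.
assert S[:-1] == S[0:len(S) - 1] == 'Spa'
assert S[-1] == S[len(S) - 1] == 'm'

# A full slice produces an equal string; S itself is never changed by slicing.
whole = S[:]
print(whole == S)
```

If any of the assert lines were wrong, Python would stop with an AssertionError, so a silent run is the confirmation here.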
Note how negative offsets can be used to give bounds for slices, too, and how the last operation effectively copies the entire string. As you'll learn later, there is no reason to copy a string, but this form can be useful for sequences like lists.
Finally, as sequences, strings also support concatenation with the plus sign (joining two strings into a new string) and repetition (making a new string by repeating another):
>>> S
'Spam'
>>> S + 'xyz'            # Concatenation
'Spamxyz'
Strings | 81
Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that we'll call polymorphism later in the book. In sum, the meaning of an operation depends on the objects being operated on. As you'll see when we study dynamic typing, this polymorphism property accounts for much of the conciseness and flexibility of Python code. Because types aren't constrained, a Python-coded operation can normally work on many different types of objects automatically, as long as they support a compatible interface (like the + operation here). This turns out to be a huge idea in Python; you'll learn more about it later on our tour.
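To make the polymorphism idea a bit more concrete, here is a minimal sketch. The function name combine is hypothetical, invented only for this illustration; the point is that one piece of code using + works unchanged on numbers, strings, and lists, and that * likewise means repetition for sequences:

```python
def combine(x, y):
    # Hypothetical helper: its meaning depends entirely on the objects passed in.
    return x + y

total = combine(2, 3)              # Numbers: addition
text = combine('Spam', 'xyz')      # Strings: concatenation
items = combine([1, 2], [3])       # Lists: concatenation too

# Repetition follows the same pattern for sequences.
repeated = 'Ni!' * 3
doubled = [0, 1] * 2

print(total, text, items, repeated, doubled)
```

Nothing in combine names a type; it works for any pair of objects that support + with each other.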
Immutability
Notice that in the prior examples, we were not changing the original string with any of the operations we ran on it. Every string operation is defined to produce a new string as its result, because strings are immutable in Python: they cannot be changed in-place after they are created. For example, you can't change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name. Because Python cleans up old objects as you go (as you'll see later), this isn't as inefficient as it may sound:
>>> S
'Spam'
>>> S[0] = 'z'           # Immutable objects cannot be changed
...error text omitted...
TypeError: 'str' object does not support item assignment
>>> S = 'z' + S[1:]      # But we can run expressions to make new objects
>>> S
'zpam'
Every object in Python is classified as either immutable (unchangeable) or not. In terms of the core types, numbers, strings, and tuples are immutable; lists and dictionaries are not (they can be changed in-place freely). Among other things, immutability can be used to guarantee that an object remains constant throughout your program.
Type-Specific Methods
Every string operation we've studied so far is really a sequence operation; that is, these operations will work on other sequences in Python as well, including lists and tuples. In addition to generic sequence operations, though, strings also have operations all their own, available as methods: functions attached to the object, which are triggered with a call expression.
82 | Chapter 4:Introducing Python Object Types
For example, the string find method is the basic substring search operation (it returns the offset of the passed-in substring, or -1 if it is not present), and the string replace method performs global searches and replacements:
>>> S.find('pa')             # Find the offset of a substring
1
>>> S
'Spam'
>>> S.replace('pa', 'XYZ')
'SXYZm'
>>> S
'Spam'
Again, despite the names of these string methods, we are not changing the original strings here, but creating new strings as the results. Because strings are immutable, we have to do it this way. String methods are the first line of text-processing tools in Python. Other methods split a string into substrings on a delimiter (handy as a simple form of parsing), perform case conversions, test the content of the string (digits, letters, and so on), and strip whitespace characters off the ends of the string:
>>> line = 'aaa,bbb,ccccc,dd'
>>> line.split(',')          # Split on a delimiter into a list of substrings
['aaa', 'bbb', 'ccccc', 'dd']
>>> S = 'spam'
>>> S.upper()                # Upper- and lowercase conversions
'SPAM'
>>> S.isalpha()              # Content tests: isalpha, isdigit, etc.
True
>>> line = 'aaa,bbb,ccccc,dd\n'
>>> line = line.rstrip()     # Remove whitespace characters on the right side
>>> line
'aaa,bbb,ccccc,dd'
Strings also support an advanced substitution operation known as formatting, available as both an expression (the original) and a string method call (new in 2.6 and 3.0):
>>> '%s, eggs, and %s' % ('spam', 'SPAM!')           # Formatting expression (all)
'spam, eggs, and SPAM!'
>>> '{0}, eggs, and {1}'.format('spam', 'SPAM!')     # Formatting method (2.6, 3.0)
'spam, eggs, and SPAM!'
One note here: although sequence operations are generic, methods are not. Although some types share some method names, string method operations generally work only on strings, and nothing else. As a rule of thumb, Python's toolset is layered: generic operations that span multiple types show up as built-in functions or expressions (e.g., len(X), X[0]), but type-specific operations are method calls (e.g., aString.upper()). Finding the tools you need among all these categories will become more natural as you use Python more, but the next section gives a few tips you can use right now.
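The layering just described can be seen in a short sketch. The variable names below are made up for illustration; the code applies the generic len and indexing operations to a string, a list, and a tuple, and then tries the string-only upper method on each:

```python
# Three different sequence types holding the same characters.
sequences = ['spam', ['s', 'p', 'a', 'm'], ('s', 'p', 'a', 'm')]

# Generic operations work on all three.
lengths = [len(seq) for seq in sequences]
firsts = [seq[0] for seq in sequences]

# The upper method is type-specific: only the string has it.
upper_results = []
for seq in sequences:
    try:
        upper_results.append(seq.upper())
    except AttributeError:
        upper_results.append(None)      # Lists and tuples have no upper method

print(lengths, firsts, upper_results)
```

The try statement used here to catch the error is covered later in the book; for now it simply records None where the method does not exist.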
Getting Help
The methods introduced in the prior section are a representative, but small, sample of what is available for string objects. In general, this book is not exhaustive in its look at object methods. For more details, you can always call the built-in dir function, which returns a list of all the attributes available for a given object. Because methods are function attributes, they will show up in this list. Assuming S is still the string, here are its attributes on Python 3.0 (Python 2.6 varies slightly):
>>> dir(S)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__',
'__subclasshook__', '_formatter_field_name_split', '_formatter_parser',
'capitalize', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find',
'format', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier',
'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join',
'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
You probably won't care about the names with underscores in this list until later in the book, when we study operator overloading in classes; they represent the implementation of the string object and are available to support customization. In general, leading and trailing double underscores is the naming pattern Python uses for implementation details. The names without the underscores in this list are the callable methods on string objects.
The dir function simply gives the methods' names. To ask what they do, you can pass them to the help function:
>>> help(S.replace)
Help on built-in function replace:

replace(...)
    S.replace(old, new[, count]) -> str

    Return a copy of S with all occurrences of substring
    old replaced by new.  If the optional argument count is
    given, only the first count occurrences are replaced.
help is one of a handful of interfaces to a system of code that ships with Python known as PyDoc, a tool for extracting documentation from objects. Later in the book, you'll see that PyDoc can also render its reports in HTML format. You can also ask for help on an entire string (e.g., help(S)), but you may get more help than you want to see, i.e., information about every string method. It's generally better to ask about a specific method.
For more details, you can also consult Python's standard library reference manual or commercially published reference books, but dir and help are the first line of documentation in Python.
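Because dir returns an ordinary list of strings, it can be processed like any other list. As a rough sketch (the names public and summary are made up here), the following keeps only the names that do not start with an underscore and pulls out the documentation string that help displays for one method:

```python
S = 'Spam'

# Keep only the names that do not begin with an underscore.
public = [name for name in dir(S) if not name.startswith('_')]

# Assumption: the text that help shows is also stored on the method's __doc__ attribute.
summary = S.replace.__doc__

print(public[:5])
print(summary)
```

The exact list and wording vary between Python releases, so treat the printed output as informational rather than fixed.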
Python allows strings to be enclosed in single or double quote characters (they mean the same thing). It also allows multiline string literals enclosed in triple quotes (single or double). When this form is used, all the lines are concatenated together, and end-of-line characters are added where line breaks appear. This is a minor syntactic convenience, but it's useful for embedding things like HTML and XML code in a Python script:
>>> msg = """
aaaaaaaaaaaaa
bbb'''bbbbbbbbbb""bbbbbbb'bbbb
cccccccccccccc"""
>>> msg
'\naaaaaaaaaaaaa\nbbb\'\'\'bbbbbbbbbb""bbbbbbb\'bbbb\ncccccccccccccc'
Python also supports a raw string literal that turns off the backslash escape mechanism (such string literals start with the letter r), as well as Unicode string support that supports internationalization. In 3.0, the basic str string type handles Unicode too (which makes sense, given that ASCII text is a simple kind of Unicode), and a bytes type represents raw byte strings; in 2.6, Unicode is a separate type, and str handles both 8-bit strings and binary data. Files are also changed in 3.0 to return and accept str for text and bytes for binary data. We'll meet all these special string forms in later chapters.
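As a small taste of these forms, the sketch below compares a normal and a raw string literal and converts text to bytes and back. It assumes Python 3.X behavior, and the path text is a made-up example, not a real file:

```python
# Hypothetical Windows-style path: in a normal literal, \n and \t are escapes.
normal = 'C:\new\text.dat'
raw = r'C:\new\text.dat'        # Raw string: backslashes are kept literally

print(len(normal), len(raw))    # The raw form is longer: no escapes collapsed

# In 3.X, str holds text and bytes holds raw byte values.
data = 'spam'.encode('ascii')   # str -> bytes
text = data.decode('ascii')     # bytes -> str

print(data, text, data[0])      # Indexing bytes in 3.X yields an integer
```

The length difference comes from each escape sequence in the normal literal collapsing to a single character (a newline and a tab).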
Pattern Matching
One point worth noting before we move on is that none of the string object's methods support pattern-based text processing. Text pattern matching is an advanced tool outside this book's scope, but readers with backgrounds in other scripting languages may be interested to know that to do pattern matching in Python, we import a module called re. This module has analogous calls for searching, splitting, and replacement, but because we can use patterns to specify substrings, we can be much more general:
>>> import re
>>> match = re.match('Hello[ \t]*(.*)world', 'Hello    Python world')
>>> match.group(1)
'Python '
This example searches for a substring that begins with the word Hello, followed by zero or more tabs or spaces, followed by arbitrary characters to be saved as a matched group, terminated by the word world. If such a substring is found, portions of the substring matched by parts of the pattern enclosed in parentheses are available as groups. The following pattern, for example, picks out three groups separated by slashes:
>>> match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
>>> match.groups()
('usr', 'home', 'lumberjack')
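The re module's splitting and replacement calls mentioned earlier can be sketched in the same spirit. The sample strings here are invented for illustration; re.split breaks a string on any run of separator characters, and re.sub replaces every match of a pattern:

```python
import re

# Made-up input with mixed delimiters: commas, semicolons, and spaces.
line = 'aaa, bbb;ccc  ddd'
fields = re.split('[,; ]+', line)           # Split on one or more separators

# Collapse each run of spaces and tabs into a single space.
messy = 'Hello \t  Python   world'
clean = re.sub(r'[ \t]+', ' ', messy)

print(fields)
print(clean)
```

Compare this with the string split method shown earlier, which can split only on one fixed delimiter string at a time.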
Pattern matching is a fairly advanced text-processing tool by itself, but there is also support in Python for even more advanced language processing, including natural language processing. I've already said enough about strings for this tutorial, though, so let's move on to the next type.
Lists
The Python list object is the most general sequence provided by the language. Lists are positionally ordered collections of arbitrarily typed objects, and they have no fixed size. They are also mutable: unlike strings, lists can be modified in-place by assignment to offsets as well as a variety of list method calls.
Sequence Operations
Because they are sequences, lists support all the sequence operations we discussed for strings; the only difference is that the results are usually lists instead of strings. For instance, given a three-item list:
>>> L = [123, 'spam', 1.23]          # A list of three different-type objects
>>> len(L)                           # Number of items in the list
3
Type-Specific Operations
Python's lists are related to arrays in other languages, but they tend to be more powerful. For one thing, they have no fixed type constraint: the list we just looked at, for example, contains three objects of completely different types (an integer, a string, and a floating-point number). Further, lists have no fixed size. That is, they can grow and shrink on demand, in response to list-specific operations:
>>> L.append('NI')                   # Growing: add object at end of list
>>> L
[123, 'spam', 1.23, 'NI']
>>> L.pop(2)                         # Shrinking: delete an item in the middle
1.23
>>> L                                # "del L[2]" deletes from a list too
[123, 'spam', 'NI']
Here, the list append method expands the list's size and inserts an item at the end; the pop method (or an equivalent del statement) then removes an item at a given offset, causing the list to shrink. Other list methods insert an item at an arbitrary position (insert), remove a given item by value (remove), and so on. Because lists are mutable, most list methods also change the list object in-place, instead of creating a new one:
>>> M = ['bb', 'aa', 'cc']
>>> M.sort()
>>> M
['aa', 'bb', 'cc']
>>> M.reverse()
>>> M
['cc', 'bb', 'aa']
The list sort method here, for example, orders the list in ascending fashion by default, and reverse reverses it. In both cases, the methods modify the list directly.
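One consequence of in-place change is easy to trip over, so here is a brief sketch of it. Because sort modifies the list itself, it returns None rather than a new list; the sorted built-in function, covered later in the book, is shown alongside for contrast because it leaves its argument alone:

```python
M = ['bb', 'aa', 'cc']
result = M.sort()            # Sorts M in-place; the call itself returns None

N = ['bb', 'aa', 'cc']
ordered = sorted(N)          # Builds a new sorted list; N is unchanged

print(result, M)
print(ordered, N)
```

A common mistake is to write M = M.sort(), which replaces the list with None.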
Bounds Checking
Although lists have no fixed size, Python still doesn't allow us to reference items that are not present. Indexing off the end of a list is always a mistake, but so is assigning off the end:
>>> L
[123, 'spam', 'NI']
>>> L[99]
...error text omitted...
IndexError: list index out of range
>>> L[99] = 1
...error text omitted...
IndexError: list assignment index out of range
This is intentional, as it's usually an error to try to assign off the end of a list (and a particularly nasty one in the C language, which doesn't do as much error checking as Python). Rather than silently growing the list in response, Python reports an error. To grow a list, we call list methods such as append instead.
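The difference between the failing assignment and the supported ways to grow a list can be sketched as follows. The flag name caught is made up for this example, and the try statement used to trap the error is covered later in the book:

```python
L = [123, 'spam', 'NI']

caught = False
try:
    L[99] = 1                # Assigning off the end is an error...
except IndexError:
    caught = True            # ...and the list is left unchanged

L.append(1)                  # Growing by one item at the end
L.extend([2, 3])             # Growing by several items at once

print(caught, L)
```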
Nesting
One nice feature of Python's core data types is that they support arbitrary nesting: we can nest them in any combination, and as deeply as we like (for example, we can have a list that contains a dictionary, which contains another list, and so on). One immediate application of this feature is to represent matrixes, or "multidimensional arrays" in Python. A list with nested lists will do the job for basic applications:
>>> M = [[1, 2, 3],          # A 3 × 3 matrix, as nested lists
         [4, 5, 6],          # Code can span lines if bracketed
         [7, 8, 9]]
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Here, we've coded a list that contains three other lists. The effect is to represent a 3 × 3 matrix of numbers. Such a structure can be accessed in a variety of ways:
>>> M[1]                     # Get row 2
[4, 5, 6]
>>> M[1][2]                  # Get row 2, then get item 3 within the row
6
The first operation here fetches the entire second row, and the second grabs the third item within that row. Stringing together index operations takes us deeper and deeper into our nested-object structure.
This matrix structure works for small-scale tasks, but for more serious number crunching you will probably want to use one of the numeric extensions to Python, such as the open source NumPy system. Such tools can store and process large matrixes much more efficiently than our nested list structure. NumPy has been said to turn Python into the equivalent of a free and more powerful version of the Matlab system, and organizations such as NASA, Los Alamos, and JPMorgan Chase use this tool for scientific and financial tasks. Search the Web for more details.
Comprehensions
In addition to sequence operations and list methods, Python includes a more advanced operation known as a list comprehension expression, which turns out to be a powerful way to process structures like our matrix. Suppose, for instance, that we need to extract the second column of our sample matrix. It's easy to grab rows by simple indexing because the matrix is stored by rows, but it's almost as easy to get a column with a list comprehension:
>>> col2 = [row[1] for row in M]             # Collect the items in column 2
>>> col2
[2, 5, 8]
>>> M
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
List comprehensions derive from set notation; they are a way to build a new list by running an expression on each item in a sequence, one at a time, from left to right. List comprehensions are coded in square brackets (to tip you off to the fact that they make a list) and are composed of an expression and a looping construct that share a variable name (row, here). The preceding list comprehension means basically what it says: "Give me row[1] for each row in matrix M, in a new list." The result is a new list containing column 2 of the matrix.
List comprehensions can be more complex in practice:
>>> [row[1] + 1 for row in M]                    # Add 1 to each item in column 2
[3, 6, 9]
>>> [row[1] for row in M if row[1] % 2 == 0]     # Filter out odd items
[2, 8]
The first operation here, for instance, adds 1 to each item as it is collected, and the second uses an if clause to filter odd numbers out of the result using the % modulus expression (remainder of division). List comprehensions make new lists of results, but they can be used to iterate over any iterable object. Here, for instance, we use list comprehensions to step over a hardcoded list of coordinates and a string:
>>> diag = [M[i][i] for i in [0, 1, 2]]      # Collect a diagonal from matrix
>>> diag
[1, 5, 9]
>>> doubles = [c * 2 for c in 'spam']
>>> doubles
['ss', 'pp', 'aa', 'mm']
List comprehensions, and relatives like the map and filter built-in functions, are a bit too involved for me to say more about them here. The main point of this brief introduction is to illustrate that Python includes both simple and advanced tools in its arsenal. List comprehensions are an optional feature, but they tend to be handy in practice and often provide a substantial processing speed advantage. They also work on any type that is a sequence in Python, as well as some types that are not. You'll hear much more about them later in this book.
As a preview, though, you'll find that in recent Pythons, comprehension syntax in parentheses can also be used to create generators that produce results on demand (the sum built-in, for instance, sums items in a sequence):
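A minimal sketch of such a generator, assuming M is still the 3 × 3 matrix coded earlier (the names G, first, second, and rest are chosen just for this example). Each call to the next built-in asks the generator for one more row sum, and nothing is computed until it is requested:

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

G = (sum(row) for row in M)      # Parentheses: a generator, not a list

first = next(G)                  # Sum of the first row, computed on demand
second = next(G)                 # Sum of the second row
rest = list(G)                   # Whatever remains: here, the third row's sum

print(first, second, rest)
```

Once a generator has produced all its results it is exhausted, which is why rest holds only the last sum.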
The map built-in can do similar work, by generating the results of running items through a function. Wrapping it in list forces it to return all its values in Python 3.0:
>>> list(map(sum, M))                        # Map sum over items in M
[6, 15, 24]
In Python 3.0, comprehension syntax can also be used to create sets and dictionaries:
>>> {sum(row) for row in M}                  # Create a set of row sums
{24, 6, 15}
>>> {i : sum(M[i]) for i in range(3)}        # Creates key/value table of row sums
{0: 6, 1: 15, 2: 24}
In fact, lists, sets, and dictionaries can all be built with comprehensions in 3.0:
>>> [ord(x) for x in 'spaam']                # List of character ordinals
[115, 112, 97, 97, 109]
>>> {ord(x) for x in 'spaam'}                # Sets remove duplicates
{112, 97, 115, 109}
>>> {x: ord(x) for x in 'spaam'}             # Dictionary keys are unique
{'a': 97, 'p': 112, 's': 115, 'm': 109}
To understand objects like generators, sets, and dictionaries, though, we must move ahead.
Dictionaries
Python dictionaries are something completely different (Monty Python reference intended): they are not sequences at all, but are instead known as mappings. Mappings are also collections of other objects, but they store objects by key instead of by relative position. In fact, mappings don't maintain any reliable left-to-right order; they simply map keys to associated values. Dictionaries, the only mapping type in Python's core objects set, are also mutable: they may be changed in-place and can grow and shrink on demand, like lists.
Mapping Operations
When written as literals, dictionaries are coded in curly braces and consist of a series of "key: value" pairs. Dictionaries are useful anytime we need to associate a set of values with keys (to describe the properties of something, for instance). As an example, consider the following three-item dictionary (with keys "food," "quantity," and "color"):
>>> D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}
We can index this dictionary by key to fetch and change the keys' associated values. The dictionary index operation uses the same syntax as that used for sequences, but the item in the square brackets is a key, not a relative position:
>>> D['food']                # Fetch value of key 'food'
'Spam'
>>> D['quantity'] += 1       # Add 1 to 'quantity' value
>>> D
{'food': 'Spam', 'color': 'pink', 'quantity': 5}
Although the curly-braces literal form does see use, it is perhaps more common to see dictionaries built up in different ways. The following code, for example, starts with an empty dictionary and fills it out one key at a time. Unlike out-of-bounds assignments in lists, which are forbidden, assignments to new dictionary keys create those keys:
>>> D = {}
>>> D['name'] = 'Bob'        # Create keys by assignment
>>> D['job'] = 'dev'
>>> D['age'] = 40
>>> D
{'age': 40, 'job': 'dev', 'name': 'Bob'}
>>> print(D['name'])
Bob
Here, we're effectively using dictionary keys as field names in a record that describes someone. In other applications, dictionaries can also be used to replace searching operations: indexing a dictionary by key is often the fastest way to code a search in Python.
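To illustrate the search point, here is a small sketch with a hypothetical price table (the names, numbers, and function names are all made up). The first function searches a list of pairs by stepping through it; the second gets the same answer with a single keyed lookup:

```python
# Hypothetical data, invented for this illustration only.
prices = {'spam': 1.25, 'eggs': 0.99, 'ham': 2.50}
pairs = list(prices.items())         # The same data as a list of (key, value) pairs

def find_by_scan(name):
    # Manual search: inspect each pair until the name matches.
    for key, value in pairs:
        if key == name:
            return value
    return None

def find_by_key(name):
    # Keyed search: the dictionary locates the entry directly.
    return prices.get(name)

print(find_by_scan('ham'), find_by_key('ham'))
print(find_by_scan('toast'), find_by_key('toast'))
```

Both functions return the same results, but the keyed version is shorter and does not slow down as the table grows the way a linear scan does.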
Nesting Revisited
In the prior example, we used a dictionary to describe a hypothetical person, with three keys. Suppose, though, that the information is more complex. Perhaps we need to record a first name and a last name, along with multiple job titles. This leads to another application of Python's object nesting in action. The following dictionary, coded all at once as a literal, captures more structured information:
>>> rec = {'name': {'first': 'Bob', 'last': 'Smith'},
           'job':  ['dev', 'mgr'],
           'age':  40.5}
Here, we again have a three-key dictionary at the top (keys name, job, and age), but the values have become more complex: a nested dictionary for the name to support multiple parts, and a nested list for the job to support multiple roles and future expansion. We can access the components of this structure much as we did for our matrix earlier, but this time some of our indexes are dictionary keys, not list offsets:
>>> rec['name']                          # 'name' is a nested dictionary
{'last': 'Smith', 'first': 'Bob'}
>>> rec['name']['last']                  # Index the nested dictionary
'Smith'
>>> rec['job']                           # 'job' is a nested list
['dev', 'mgr']
>>> rec['job'][-1]                       # Index the nested list
'mgr'
>>> rec['job'].append('janitor')         # Expand Bob's job description in-place
>>> rec
{'age': 40.5, 'job': ['dev', 'mgr', 'janitor'], 'name': {'last': 'Smith', 'first': 'Bob'}}
Notice how the last operation here expands the nested job list. Because the job list is a separate piece of memory from the dictionary that contains it, it can grow and shrink freely (object memory layout will be discussed further later in this book).
The real reason for showing you this example is to demonstrate the flexibility of Python's core data types. As you can see, nesting allows us to build up complex information structures directly and easily. Building a similar structure in a low-level language like C would be tedious and require much more code: we would have to lay out and declare structures and arrays, fill out values, link everything together, and so on. In Python, this is all automatic: running the expression creates the entire nested object structure for us. In fact, this is one of the main benefits of scripting languages like Python.
Just as importantly, in a lower-level language we would have to be careful to clean up all of the object's space when we no longer need it. In Python, when we lose the last reference to the object (by assigning its variable to something else, for example), all of the memory space occupied by that object's structure is automatically cleaned up for us:
>>> rec = 0 # Now the object's space is reclaimed
Technically speaking, Python has a feature known as garbage collection that cleans up unused memory as your program runs and frees you from having to manage such details in your code. In Python, the space is reclaimed immediately, as soon as the last reference to an object is removed. We'll study how this works later in this book; for now, it's enough to know that you can use objects freely, without worrying about creating their space or cleaning up as you go.
Keep in mind that the rec record we just created really could be a database record, when we employ Python's object persistence system, an easy way to store native Python objects in files or access-by-key databases. We won't go into details here, but watch for discussion of Python's pickle and shelve modules later in this book.
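As a brief preview of the pickle module mentioned here, the sketch below converts a nested record like rec to a byte string and back again. It works in memory to stay self-contained; the module's dump and load functions do the same job with files. The names data and restored are chosen just for this example:

```python
import pickle

rec = {'name': {'first': 'Bob', 'last': 'Smith'},
       'job':  ['dev', 'mgr'],
       'age':  40.5}

data = pickle.dumps(rec)         # Serialize the whole nested object to bytes
restored = pickle.loads(data)    # Rebuild an equivalent object from the bytes

print(restored == rec)           # Same value...
print(restored is rec)           # ...but a distinct object
```

Note that the nested dictionary and list come back intact without any extra code for the nesting.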
What do we do, though, if we do need to impose an ordering on a dictionary's items? One common solution is to grab a list of keys with the dictionary keys method, sort that with the list sort method, and then step through the result with a Python for loop (be sure to press the Enter key twice after coding the for loop below; as explained in Chapter 3, an empty line means "go" at the interactive prompt, and the prompt changes to "..." on some interfaces):
>>> Ks = list(D.keys())              # Unordered keys list
>>> Ks                               # A list in 2.6, "view" in 3.0: use list()
['a', 'c', 'b']
>>> Ks.sort()                        # Sorted keys list
>>> Ks
['a', 'b', 'c']
>>> for key in Ks:
        print(key, '=>', D[key])

a => 1
b => 2
c => 3
This is a three-step process, although, as we'll see in later chapters, in recent versions of Python it can be done in one step with the newer sorted built-in function. The sorted call returns the result and sorts a variety of object types, in this case sorting dictionary keys automatically:
>>> D
{'a': 1, 'c': 3, 'b': 2}
>>> for key in sorted(D):
        print(key, '=>', D[key])

a => 1
b => 2
c => 3
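Because sorted returns a new list, it can be combined with other tools we have already met. The following sketch assumes the same three-key dictionary; it collects the formatted lines with a list comprehension and, as a variation, orders the keys by their values using the key and reverse arguments of sorted:

```python
D = {'a': 1, 'c': 3, 'b': 2}

# One formatted line per key, in sorted-key order.
lines = ['%s => %s' % (key, D[key]) for key in sorted(D)]

# Variation: order the keys by their associated values, largest first.
by_value = sorted(D, key=D.get, reverse=True)

print(lines)
print(by_value)
```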
Besides showcasing dictionaries, this use case serves to introduce the Python for loop. The for loop is a simple and efficient way to step through all the items in a sequence and run a block of code for each item in turn. A user-defined loop variable (key, here) is used to reference the current item each time through. The net effect in our example is to print the unordered dictionary's keys and values, in sorted-key order.
The for loop, and its more general cousin the while loop, are the main ways we code repetitive tasks as statements in our scripts. Really, though, the for loop (like its relative the list comprehension, which we met earlier) is a sequence operation. It works on any object that is a sequence and, like the list comprehension, even on some things that are not. Here, for example, it is stepping across the characters in a string, printing the uppercase version of each as it goes:
>>> for c in 'spam':
        print(c.upper())

S
P
A
M
Python's while loop is a more general sort of looping tool, not limited to stepping across sequences:
>>> x = 4
>>> while x > 0:
        print('spam!' * x)
        x -= 1

spam!spam!spam!spam!
spam!spam!spam!
spam!spam!
spam!
We'll discuss looping statements, syntax, and tools in depth later in the book.
This also means that any list comprehension expression, such as this one, which computes the squares of a list of numbers:
>>> squares = [x ** 2 for x in [1, 2, 3, 4, 5]]
>>> squares
[1, 4, 9, 16, 25]
can always be coded as an equivalent for loop that builds the result list manually by appending as it goes:
>>> squares = []
>>> for x in [1, 2, 3, 4, 5]:            # This is what a list comprehension does
        squares.append(x ** 2)           # Both run the iteration protocol internally

>>> squares
[1, 4, 9, 16, 25]
The list comprehension, though, and related functional programming tools like map and filter, will generally run faster than a for loop today (perhaps even twice as fast), a property that could matter in your programs for large data sets. Having said that, though, I should point out that performance measures are tricky business in Python because it optimizes so much, and performance can vary from release to release.
A major rule of thumb in Python is to code for simplicity and readability first and worry about performance later, after your program is working, and after you've proved that there is a genuine performance concern. More often than not, your code will be quick enough as it is. If you do need to tweak code for performance, though, Python includes tools to help you out, including the time and timeit modules and the profile module. You'll find more on these later in this book, and in the Python manuals.
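For readers who want to measure rather than guess, here is one way the timeit module might be used to compare the two codings. This is only a sketch: the function names, the input size, and the repetition count are arbitrary choices for this example, and the timings it prints will differ from machine to machine and release to release:

```python
import timeit

def with_loop():
    # Build the result manually, appending as we go.
    squares = []
    for x in range(100):
        squares.append(x ** 2)
    return squares

def with_comprehension():
    # The equivalent list comprehension.
    return [x ** 2 for x in range(100)]

# Time 1,000 calls of each function; results are in seconds.
loop_time = timeit.timeit(with_loop, number=1000)
comp_time = timeit.timeit(with_comprehension, number=1000)

print('loop:          %.4f seconds' % loop_time)
print('comprehension: %.4f seconds' % comp_time)
```

Whichever version wins on your machine, both functions return the same list, so the choice between them can usually be made on readability.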
This is what we want: it's usually a programming error to fetch something that isn't really there. But in some generic programs, we can't always know what keys will be present when we write our code. How do we handle such cases and avoid errors? One trick is to test ahead of time. The dictionary in membership expression allows us to query the existence of a key and branch on the result with a Python if statement (as with the for, be sure to press Enter twice to run the if interactively here):
>>> 'f' in D
False
>>> if not 'f' in D:
        print('missing')

missing
I'll have much more to say about the if statement and statement syntax in general later in this book, but the form we're using here is straightforward: it consists of the word if, followed by an expression that is interpreted as a true or false result, followed by a block of code to run if the test is true. In its full form, the if statement can also have an else clause for a default case, and one or more elif (else if) clauses for other tests. It's the main selection tool in Python, and it's the way we code logic in our scripts.
Still, there are other ways to create dictionaries and avoid accessing nonexistent keys: the get method (a conditional index with a default); the Python 2.X has_key method (which is no longer available in 3.0); the try statement (a tool we'll first meet in Chapter 10 that catches and recovers from exceptions altogether); and the if/else expression (essentially, an if statement squeezed onto a single line). Here are a few examples:
>>> value = D.get('x', 0)                    # Index but with a default
>>> value
0
>>> value = D['x'] if 'x' in D else 0        # if/else expression form
>>> value
0
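The try statement alternative mentioned above is not shown in these examples, so here is a minimal sketch placing all three approaches side by side. It assumes a small three-key dictionary like the one used earlier; the names v1, v2, and v3 are made up for the comparison:

```python
D = {'a': 1, 'b': 2, 'c': 3}

# 1. The get method: index with a default.
v1 = D.get('x', 0)

# 2. The try statement: attempt the fetch and recover if the key is missing.
try:
    v2 = D['x']
except KeyError:
    v2 = 0

# 3. The if/else expression: test membership first.
v3 = D['x'] if 'x' in D else 0

print(v1, v2, v3)
```

All three produce the same default here; which one reads best depends on how much else needs to happen when the key is absent.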
We'll save the details on such alternatives until a later chapter. For now, let's move on to tuples.
Tuples
The tuple object (pronounced "toople" or "tuhple," depending on who you ask) is roughly like a list that cannot be changed: tuples are sequences, like lists, but they are immutable, like strings. Syntactically, they are coded in parentheses instead of square brackets, and they support arbitrary types, arbitrary nesting, and the usual sequence operations:
>>> T = (1, 2, 3, 4)         # A 4-item tuple
>>> len(T)                   # Length
4
>>> T + (5, 6)               # Concatenation
(1, 2, 3, 4, 5, 6)
>>> T[0]                     # Indexing, slicing, and more
1
Tuples also have two type-specific callable methods in Python 3.0, but not nearly as many as lists:
>>> T.index(4)               # Tuple methods: 4 appears at offset 3
3
>>> T.count(4)               # 4 appears once
1
The primary distinction for tuples is that they cannot be changed once created. That is, they are immutable sequences:
>>> T[0] = 2                       # Tuples are immutable
...error text omitted...
TypeError: 'tuple' object does not support item assignment
Like lists and dictionaries, tuples support mixed types and nesting, but they don't grow and shrink because they are immutable:
>>> T = ('spam', 3.0, [11, 22, 33])
>>> T[1]
3.0
>>> T[2][1]
22
>>> T.append(4)
AttributeError: 'tuple' object has no attribute 'append'
Why Tuples?
So, why have a type that is like a list, but supports fewer operations? Frankly, tuples are not generally used as often as lists in practice, but their immutability is the whole point. If you pass a collection of objects around your program as a list, it can be changed anywhere; if you use a tuple, it cannot. That is, tuples provide a sort of integrity constraint that is convenient in programs larger than those we'll write here. We'll talk more about tuples later in the book. For now, though, let's jump ahead to our last major core type: the file.
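Before we do, the integrity point made above can be seen in a few lines. This is a sketch only: the function name careless and the sample data are invented for the illustration.

```python
def careless(collection):
    # Hypothetical function that tries to change its argument in place.
    collection[0] = 'changed'

as_list = ['spam', 'eggs']
as_tuple = ('spam', 'eggs')

careless(as_list)                  # The caller's list is silently modified

try:
    careless(as_tuple)             # The tuple refuses the change
    tuple_protected = False
except TypeError:
    tuple_protected = True

print(as_list)                     # ['changed', 'eggs']
print(as_tuple, tuple_protected)   # ('spam', 'eggs') True
```

The list version changes behind the caller's back, while the tuple version fails loudly at the point of the attempted change.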
Files
File objects are Python code's main interface to external files on your computer. Files are a core type, but they're something of an oddball: there is no specific literal syntax for creating them. Rather, to create a file object, you call the built-in open function, passing in an external filename and a processing mode as strings. For example, to create a text output file, you would pass in its name and the 'w' processing mode string to write data:
>>> f = open('[Link]', 'w')        # Make a new file in output mode
>>> f.write('Hello\n')             # Write strings of bytes to it
6
>>> f.write('world\n')             # Returns number of bytes written in Python 3.0
6
>>> f.close()                      # Close to flush output buffers to disk
This creates a file in the current directory and writes text to it (the filename can be a full directory path if you need to access a file elsewhere on your computer). To read back what you just wrote, reopen the file in 'r' processing mode, for reading text input; this is the default if you omit the mode in the call. Then read the file's content into a string, and display it. A file's contents are always a string in your script, regardless of the type of data the file contains:
>>> f = open('[Link]')             # 'r' is the default processing mode
>>> text = f.read()                # Read entire file into a string
>>> text
'Hello\nworld\n'
>>> print(text)
Hello
world

>>> text.split()
['Hello', 'world']
Other file object methods support additional features we don't have time to cover here. For instance, file objects provide more ways of reading and writing (read accepts an optional byte size, readline reads one line at a time, and so on), as well as other tools (seek moves to a new file position). As we'll see later, though, the best way to read a file today is to not read it at all: files provide an iterator that automatically reads line by line in for loops and other contexts. We'll meet the full set of file methods later in this book, but if you want a quick preview now, run a dir call on any open file and a help on any of the method names that come back:
>>> dir(f)
[ ...many names omitted...
'buffer', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty',
'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline',
'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write',
'writelines']

>>> help(f.seek)
...try it and see...
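To make the file iterator and the other reading methods mentioned above a bit more concrete, here is a small sketch. It assumes Python 3.X, writes to a temporary directory so nothing in your working directory is touched, and uses a made-up filename (demo.txt); the with statement used here simply closes each file automatically.

```python
import os
import tempfile

# Write a small text file in a temporary directory (path is arbitrary).
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('Hello\n')
    f.write('world\n')

# Iterate over the file object directly: one line per loop iteration.
lines = []
with open(path) as f:
    for line in f:
        lines.append(line.rstrip('\n'))

# Other reading tools named in the text.
with open(path) as f:
    first = f.readline()           # One line, including its newline
    f.seek(0)                      # Back to the start of the file
    chunk = f.read(3)              # At most 3 characters in text mode

print(lines, repr(first), repr(chunk))
```

The for loop never loads the whole file at once, which is why it is usually preferred for large files.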
Later in the book, we'll also see that files in Python 3.0 draw a sharp distinction between text and binary data. Text files represent content as strings and perform Unicode encoding and decoding automatically, while binary files represent content as a special bytes string type and allow you to access file content unaltered:
>>> data = open('[Link]', 'rb').read()     # Open binary file
>>> data                                   # bytes string holds binary data
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8]
b'spam'
Although you won't generally need to care about this distinction if you deal only with ASCII text, Python 3.0's strings and files are an asset if you deal with internationalized applications or byte-oriented data.
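The text/binary split described above can be sketched as follows. This assumes Python 3.X; the filenames, the temporary directory, and the choice of UTF-8 as the encoding are all assumptions made for the example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                    # Arbitrary temporary directory
bin_path = os.path.join(folder, 'demo.bin')
txt_path = os.path.join(folder, 'demo.txt')

# Binary mode: bytes objects are written and read back unaltered.
with open(bin_path, 'wb') as f:
    f.write(b'\x00\x00\x00\x07spam\x00\x08')
with open(bin_path, 'rb') as f:
    data = f.read()

# Text mode: str objects are encoded on write and decoded on read.
with open(txt_path, 'w', encoding='utf-8') as f:
    f.write('sp\xe4m')                         # 4 characters, one non-ASCII
with open(txt_path, 'rb') as f:
    raw = f.read()                             # The encoded bytes on disk
with open(txt_path, encoding='utf-8') as f:
    text = f.read()                            # Decoded back to a str

print(data[4:8], raw, len(raw), len(text))     # b'spam' b'sp\xc3\xa4m' 5 4
```

The same four-character string occupies five bytes on disk, which is exactly the kind of difference that text mode hides and binary mode exposes.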
In addition, Python recently grew a few new numeric types: decimal numbers (fixed-precision floating-point numbers) and fraction numbers (rational numbers with both a numerator and a denominator). Both can be used to work around the limitations and inherent inaccuracies of floating-point math:
>>> 1 / 3                                  # Floating-point (use .0 in Python 2.6)
0.33333333333333331
>>> (2/3) + (1/2)
1.1666666666666665

>>> import decimal                         # Decimals: fixed precision
>>> d = decimal.Decimal('3.141')
>>> d + 1
Decimal('4.141')

>>> decimal.getcontext().prec = 2
>>> decimal.Decimal('1.00') / decimal.Decimal('3.00')
Decimal('0.33')

>>> from fractions import Fraction         # Fractions: numerator+denominator
>>> f = Fraction(2, 3)
>>> f + 1
Fraction(5, 3)
>>> f + Fraction(1, 2)
Fraction(7, 6)
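As a further illustration of the floating-point inaccuracy these types work around, the following sketch adds the same two values three ways. The variable names are invented for the example, and the default decimal context is assumed.

```python
from decimal import Decimal
from fractions import Fraction

float_sum = 0.1 + 0.2                               # Binary floating-point
decimal_sum = Decimal('0.1') + Decimal('0.2')       # Decimal arithmetic
fraction_sum = Fraction(1, 10) + Fraction(2, 10)    # Exact rational arithmetic

print(float_sum == 0.3)                    # False
print(decimal_sum == Decimal('0.3'))       # True
print(fraction_sum == Fraction(3, 10))     # True
```

Note that the decimals are created from strings; building them from floats would carry the float's inexact value along.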
Python also comes with Booleans (with predefined True and False objects that are essentially just the integers 1 and 0 with custom display logic), and it has long supported a special placeholder object called None commonly used to initialize names and objects:
>>> 1 > 2, 1 < 2                   # Booleans
(False, True)
>>> bool('spam')
True

>>> X = None                       # None placeholder
>>> print(X)
None
>>> L = [None] * 100               # Initialize a list of 100 Nones
>>> L
[None, None, None, None, None, None, None, None, None, None, None, None,
None, None, None, None, None, None, None, None, ...a list of 100 Nones...]
Besides allowing you to explore your objects interactively, the practical application of this is that it allows code to check the types of the objects it processes. In fact, there are at least three ways to do so in a Python script:
>>> if type(L) == type([]):        # Type testing, if you must...
        print('yes')

yes
>>> if type(L) == list:
        print('yes')

yes
>>> if isinstance(L, list):        # Object-oriented tests
        print('yes')

yes
Now that I've shown you all these ways to do type testing, however, I am required by law to tell you that doing so is almost always the wrong thing to do in a Python program (and often a sign of an ex-C programmer first starting to use Python!). The reason why won't become completely clear until later in the book, when we start writing larger code units such as functions, but it's a (perhaps the) core Python concept. By checking for specific types in your code, you effectively break its flexibility: you limit it to working on just one type. Without such tests, your code may be able to work on a whole range of types. This is related to the idea of polymorphism mentioned earlier, and it stems from Python's lack of type declarations. As you'll learn, in Python, we code to object interfaces (operations supported), not to types. Not caring about specific types means that code is automatically applicable to many of them: any object with a compatible interface will work, regardless of its specific type. Although type checking is supported (and even required, in some rare cases), you'll see that it's not usually the "Pythonic" way of thinking. In fact, you'll find that polymorphism is probably the key idea behind using Python well.
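A short sketch may make the point about interfaces versus types more tangible. Both function names below are made up for the illustration; the first codes to an interface (anything that supports * 2), and the second adds the kind of type test the text warns against.

```python
def double(x):
    # No type test: works for any object that supports x * 2.
    return x * 2

def double_ints_only(x):
    # Type-tested variant: rejects everything except integers.
    if not isinstance(x, int):
        raise TypeError('integers only')
    return x * 2

results = [double(4), double(1.5), double('spam'), double([1, 2])]
print(results)                     # [8, 3.0, 'spamspam', [1, 2, 1, 2]]

try:
    double_ints_only('spam')
    rejected = False
except TypeError:
    rejected = True
print(rejected)                    # True
```

The untested version works on four types without any extra code; the tested version has thrown that flexibility away.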
User-Defined Classes
We'll study object-oriented programming in Python (an optional but powerful feature of the language that cuts development time by supporting programming by customization) in depth later in this book. In abstract terms, though, classes define new types of objects that extend the core set, so they merit a passing glance here. Say, for example, that you wish to have a type of object that models employees. Although there is no such specific core type in Python, the following user-defined class might fit the bill:
>>> class Worker:
        def __init__(self, name, pay):          # Initialize when created
            self.name = name                    # self is the new object
            self.pay = pay
        def lastName(self):
            return self.name.split()[-1]
        def giveRaise(self, percent):
            self.pay *= (1.0 + percent)
This class defines a new kind of object that will have name and pay attributes (sometimes called state information), as well as two bits of behavior coded as functions (normally called methods). Calling the class like a function generates instances of our new type, and the class's methods automatically receive the instance being processed by a given method call (in the self argument):
>>> bob = Worker('Bob Smith', 50000)        # Make two instances
>>> sue = Worker('Sue Jones', 60000)        # Each has name and pay attrs
>>> bob.lastName()                          # Call method: bob is self
'Smith'
>>> sue.lastName()                          # sue is the self subject
'Jones'
>>> sue.giveRaise(.10)                      # Updates sue's pay
>>> sue.pay
66000.0
The implied self object is why we call this an object-oriented model: there is always an implied subject in functions within a class. In a sense, though, the class-based type simply builds on and uses core types: a user-defined Worker object here, for example, is just a collection of a string and a number (name and pay, respectively), plus functions for processing those two built-in objects. The larger story of classes is that their inheritance mechanism supports software hierarchies that lend themselves to customization by extension. We extend software by writing new classes, not by changing what already works. You should also know that classes are an optional feature of Python, and simpler built-in types such as lists and dictionaries are often better tools than user-coded classes. This is all well beyond the bounds of our introductory object-type tutorial, though, so consider this just a preview; for full disclosure on user-defined types coded with classes, you'll have to read on to Part VI.
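As a brief taste of the customization-by-extension idea mentioned above, here is one way a subclass might specialize Worker without changing it. This is a sketch under assumptions: the Manager class, its bonus argument, and the sample names are hypothetical and serve only to show the mechanism.

```python
class Worker:
    def __init__(self, name, pay):
        self.name = name
        self.pay = pay
    def lastName(self):
        return self.name.split()[-1]
    def giveRaise(self, percent):
        self.pay *= (1.0 + percent)

class Manager(Worker):                           # Hypothetical subclass
    def giveRaise(self, percent, bonus=0.05):
        # Reuse the inherited logic, adding an assumed extra bonus.
        Worker.giveRaise(self, percent + bonus)

bob = Worker('Bob Smith', 50000)
ann = Manager('Ann Lee', 50000)
bob.giveRaise(0.10)
ann.giveRaise(0.10)

print(bob.lastName(), round(bob.pay, 2))         # Smith 55000.0
print(ann.lastName(), round(ann.pay, 2))         # Lee 57500.0
```

Manager inherits lastName unchanged and replaces only giveRaise; the original Worker code is never edited.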
Moreover, keep in mind that the objects we've met here are objects, but not necessarily object-oriented, a concept that usually requires inheritance and the Python class statement, which we'll meet again later in this book. Still, Python's core objects are the workhorses of almost every Python script you're likely to meet, and they usually are the basis of larger noncore types.
Chapter Summary
And that's a wrap for our concise data type tour. This chapter has offered a brief introduction to Python's core object types and the sorts of operations we can apply to them. We've studied generic operations that work on many object types (sequence operations such as indexing and slicing, for example), as well as type-specific operations available as method calls (for instance, string splits and list appends). We've also defined some key terms, such as immutability, sequences, and polymorphism. Along the way, we've seen that Python's core object types are more flexible and powerful than what is available in lower-level languages such as C. For instance, Python's lists and dictionaries obviate most of the work you do to support collections and searching in lower-level languages. Lists are ordered collections of other objects, and dictionaries are collections of other objects that are indexed by key instead of by position. Both dictionaries and lists may be nested, can grow and shrink on demand, and may contain objects of any type. Moreover, their space is automatically cleaned up as you go. I've skipped most of the details here in order to provide a quick tour, so you shouldn't expect all of this chapter to have made sense yet. In the next few chapters, we'll start to dig deeper, filling in details of Python's core object types that were omitted here so you can gain a more complete understanding. We'll start off in the next chapter with an in-depth look at Python numbers. First, though, another quiz to review.
5. What does "mapping" mean, and which core type is a mapping?
6. What is "polymorphism," and why should you care?
CHAPTER 5
Numeric Types
This chapter begins our in-depth tour of the Python language. In Python, data takes the form of objects: either built-in objects that Python provides, or objects we create using Python tools and other languages such as C. In fact, objects are the basis of every Python program you will ever write. Because they are the most fundamental notion in Python programming, objects are also our first focus in this book. In the preceding chapter, we took a quick pass over Python's core object types. Although essential terms were introduced in that chapter, we avoided covering too many specifics in the interest of space. Here, we'll begin a more careful second look at data type concepts, to fill in details we glossed over earlier. Let's get started by exploring our first data type category: Python's numeric types.
Rational fraction numbers
Sets
Booleans
Unlimited integer precision
A variety of numeric built-ins and modules
This chapter starts with basic numbers and fundamentals, then moves on to explore the other tools in this list. Before we jump into code, though, the next few sections get us started with a brief overview of how we write and process numbers in our scripts.
Numeric Literals
Among its basic types, Python provides integers (positive and negative whole numbers) and floating-point numbers (numbers with a fractional part, sometimes called "floats" for economy). Python also allows us to write integers using hexadecimal, octal, and binary literals; offers a complex number type; and allows integers to have unlimited precision (they can grow to have as many digits as your memory space allows). Table 5-1 shows what Python's numeric types look like when written out in a program, as literals.
Table 5-1. Basic numeric literals

Literal                                  Interpretation
1234, 24, 0, 99999999999999              Integers (unlimited size)
1.23, 1., 3.14e-10, 4E210, 4.0e+210      Floating-point numbers
0177, 0x9ff, 0b101010                    Octal, hex, and binary literals in 2.6
0o177, 0x9ff, 0b101010                   Octal, hex, and binary literals in 3.0
3+4j, 3.0+4.0j, 3J                       Complex number literals
In general, Python's numeric type literals are straightforward to write, but a few coding concepts are worth highlighting here:
Integer and floating-point literals
Integers are written as strings of decimal digits. Floating-point numbers have a decimal point and/or an optional signed exponent introduced by an e or E and followed by an optional sign. If you write a number with a decimal point or exponent, Python makes it a floating-point object and uses floating-point (not integer) math when the object is used in an expression. Floating-point numbers are implemented as C "doubles," and therefore get as much precision as the C compiler used to build the Python interpreter gives to doubles.
Integers in Python 2.6: normal and long
In Python 2.6 there are two integer types, normal (32 bits) and long (unlimited precision), and an integer may end in an l or L to force it to become a long integer. Because integers are automatically converted to long integers when their values overflow 32 bits, you never need to type the letter L yourself: Python automatically converts up to long integer when extra precision is needed.
Integers in Python 3.0: a single type
In Python 3.0, the normal and long integer types have been merged: there is only integer, which automatically supports the unlimited precision of Python 2.6's separate long integer type. Because of this, integers can no longer be coded with a trailing l or L, and integers never print with this character either. Apart from this, most programs are unaffected by this change, unless they do type testing that checks for 2.6 long integers.
Hexadecimal, octal, and binary literals
Integers may be coded in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2). Hexadecimals start with a leading 0x or 0X, followed by a string of hexadecimal digits (0-9 and A-F). Hex digits may be coded in lower- or uppercase. Octal literals start with a leading 0o or 0O (zero and lower- or uppercase letter "o"), followed by a string of digits (0-7). In 2.6 and earlier, octal literals can also be coded with just a leading 0, but not in 3.0 (this original octal form is too easily confused with decimal, and is replaced by the new 0o format). Binary literals, new in 2.6 and 3.0, begin with a leading 0b or 0B, followed by binary digits (0-1). Note that all of these literals produce integer objects in program code; they are just alternative syntaxes for specifying values. The built-in calls hex(I), oct(I), and bin(I) convert an integer to its representation string in these three bases, and int(str, base) converts a runtime string to an integer per a given base.
Complex numbers
Python complex literals are written as realpart+imaginarypart, where the imaginarypart is terminated with a j or J. The realpart is technically optional, so the imaginarypart may appear on its own. Internally, complex numbers are implemented as pairs of floating-point numbers, but all numeric operations perform complex math when applied to complex numbers. Complex numbers may also be created with the complex(real, imag) built-in call.
Coding other numeric types
As we'll see later in this chapter, there are additional, more advanced number types not included in Table 5-1. Some of these are created by calling functions in imported modules (e.g., decimals and fractions), and others have literal syntax all their own (e.g., sets).
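The literal forms and conversion calls described in these notes can be tried together. The sketch below assumes Python 3.X syntax (the 0o octal form and the 0o prefix in oct results); the specific values are arbitrary.

```python
# Four spellings of the same integer value.
values = [255, 0xFF, 0o377, 0b11111111]

# Integer to string in each base, and back again with int(str, base).
as_hex, as_oct, as_bin = hex(255), oct(255), bin(255)
round_trip = [int('ff', 16), int('377', 8), int('11111111', 2)]

# Unlimited integer precision: no overflow, just more digits.
big = 2 ** 100

# Complex numbers: literal form and the complex(real, imag) call.
c = 3 + 4j
same = complex(3, 4)

print(values)                      # [255, 255, 255, 255]
print(as_hex, as_oct, as_bin)      # 0xff 0o377 0b11111111
print(big)                         # 1267650600228229401496703205376
print(c == same, c.real, c.imag)   # True 3.0 4.0
```

Whatever base a literal is written in, the resulting object is an ordinary integer; the base matters only in source code and in display strings.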
Description
Generator function send protocol
Anonymous function generation
Ternary selection (x is evaluated only if y is true)
Logical OR (y is evaluated only if x is false)
Logical AND (y is evaluated only if x is true)
Logical negation
Membership (iterables, sets); Object identity tests; Magnitude comparison, set subset and superset; Value equality operators
Bitwise OR, set union
Bitwise XOR, set symmetric difference
Bitwise AND, set intersection
Shift x left or right by y bits
Addition, concatenation; Subtraction, set difference
Multiplication, repetition; Remainder, format; Division: true and floor
Negation, identity
Bitwise NOT (inversion)
Power (exponentiation)
Indexing (sequence, mapping, others)
Slicing
Call (function, method, class, other callable)
Attribute reference
Tuple, expression, generator expression
List, list comprehension
Dictionary, set, set and dictionary comprehensions
Since this book addresses both Python 2.6 and 3.0, here are some notes about version differences and recent additions related to the operators in Table 5-2:
In Python 2.6, value inequality can be written as either X != Y or X <> Y. In Python 3.0, the latter of these options is removed because it is redundant. In either version, best practice is to use X != Y for all value inequality tests.
In Python 2.6, a backquotes expression `X` works the same as repr(X) and converts objects to display strings. Due to its obscurity, this expression is removed in Python 3.0; use the more readable str and repr built-in functions, described in "Numeric Display Formats" on page 115.
The X // Y floor division expression always truncates fractional remainders in both Python 2.6 and 3.0. The X / Y expression performs true division in 3.0 (retaining remainders) and classic division in 2.6 (truncating for integers). See "Division: Classic, Floor, and True" on page 117.
The syntax [...] is used for both list literals and list comprehension expressions. The latter of these performs an implied loop and collects expression results in a new list. See Chapters 4, 14, and 20 for examples.
The syntax (...) is used for tuples and expressions, as well as generator expressions, a form of list comprehension that produces results on demand instead of building a result list. See Chapters 4 and 20 for examples. The parentheses may sometimes be omitted in all three constructs.
The syntax {...} is used for dictionary literals, and in Python 3.0 for set literals and both dictionary and set comprehensions. See the set coverage in this chapter and Chapters 4, 8, 14, and 20 for examples.
The yield and ternary if/else selection expressions are available in Python 2.5 and later. The former returns send(...) arguments in generators; the latter is shorthand for a multiline if statement. yield requires parentheses if not alone on the right side of an assignment statement.
Comparison operators may be chained: X < Y < Z produces the same result as X < Y and Y < Z. See "Comparisons: Normal and Chained" on page 116 for details.
In recent Pythons, the slice expression X[I:J:K] is equivalent to indexing with a slice object: X[slice(I, J, K)].
In Python 2.X, magnitude comparisons of mixed types (converting numbers to a common type, and ordering other mixed types according to the type name) are allowed. In Python 3.0, nonnumeric mixed-type magnitude comparisons are not allowed and raise exceptions; this includes sorts by proxy.
Magnitude comparisons for dictionaries are also no longer supported in Python 3.0 (though equality tests are); comparing sorted(D.items()) is one possible replacement.
We'll see most of the operators in Table 5-2 in action later; first, though, we need to take a quick look at the ways these operators may be combined in expressions.
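A few of the notes above are easy to check directly. The following sketch assumes Python 3.X (set literals and the removal of dictionary magnitude comparisons); all names and sample values are made up for the illustration.

```python
X = 'spam and eggs'

# Slice expression versus indexing with an explicit slice object.
same_slice = X[1:10:2] == X[slice(1, 10, 2)]

# The same operator symbols apply to integers and to sets.
bit_or = 0b0101 | 0b0011             # Bitwise OR on integers
set_union = {1, 2} | {2, 3}          # Union on sets
set_common = {1, 2} & {2, 3}         # Intersection on sets

# Dictionaries do not support < or > in 3.X; compare sorted items instead.
D1 = {'a': 1, 'b': 2}
D2 = {'a': 1, 'b': 3}
d1_smaller = sorted(D1.items()) < sorted(D2.items())

print(same_slice, bit_or)            # True 7
print(set_union, set_common)         # {1, 2, 3} {2}
print(d1_smaller)                    # True
```

The operator symbol is the same in each pair; the types of the operands decide which operation actually runs.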
So, how does Python know which operation to perform first? The answer to this question lies in operator precedence. When you write an expression with more than one operator, Python groups its parts according to what are called precedence rules, and this grouping determines the order in which the expression's parts are computed. Table 5-2 is ordered by operator precedence: operators lower in the table have higher precedence, and so bind more tightly in mixed expressions. Operators in the same row in Table 5-2 generally group from left to right when combined (except for exponentiation, which groups right to left, and comparisons, which chain left to right). For example, if you write X + Y * Z, Python evaluates the multiplication first (Y * Z), then adds that result to X because * has higher precedence (is lower in the table) than +. Similarly, in this section's original example, both multiplications (A * B and C * D) will happen before their results are added.
In the first case, + is applied to X and Y first, because this subexpression is wrapped in parentheses. In the second case, the * is performed first (just as if there were no parentheses at all). Generally speaking, adding parentheses in large expressions is a good idea: it not only forces the evaluation order you want, but also aids readability.
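The grouping rules just described can be verified with concrete numbers. The values assigned to X, Y, and Z here are arbitrary choices for the sketch.

```python
X, Y, Z = 2, 3, 4

no_parens = X + Y * Z          # * binds tighter: 2 + (3 * 4)
forced = (X + Y) * Z           # Parentheses override precedence: (2 + 3) * 4
redundant = X + (Y * Z)        # Same grouping Python would choose anyway

# Exponentiation groups right to left: 2 ** (3 ** 2), not (2 ** 3) ** 2.
power = 2 ** 3 ** 2

print(no_parens, forced, redundant, power)   # 14 20 14 512
```

The redundant parentheses in the third expression change nothing for Python, but they spare a human reader from recalling the precedence table.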
But this leads to another question: what type is the result, integer or floating-point? The answer is simple, especially if you've used almost any other language before: in mixed-type numeric expressions, Python first converts operands up to the type of the most complicated operand, and then performs the math on same-type operands. This behavior is similar to type conversions in the C language. Python ranks the complexity of numeric types like so: integers are simpler than floating-point numbers, which are simpler than complex numbers. So, when an integer is mixed with a floating point, as in the preceding example, the integer is converted up to a floating-point value first, and floating-point math yields the floating-point result. Similarly, any mixed-type expression where one operand is a complex number results in the other operand being converted up to a complex number, and the expression yields a complex result. (In Python 2.6, normal integers are also converted to long integers whenever their values are too large to fit in a normal integer; in 3.0, integers subsume longs entirely.) You can force the issue by calling built-in functions to convert types manually:
>>> int(3.1415)                # Truncates float to integer
3
>>> float(3)                   # Converts integer to float
3.0
However, you won't usually need to do this: because Python automatically converts up to the more complex type within an expression, the results are normally what you want. Also, keep in mind that all these mixed-type conversions apply only when mixing numeric types (e.g., an integer and a floating-point) in an expression, including those using numeric and comparison operators. In general, Python does not convert across any other type boundaries automatically. Adding a string to an integer, for example, results in an error, unless you manually convert one or the other; watch for an example when we meet strings in Chapter 7.
In Python 2.6, nonnumeric mixed types can be compared, but no conversions are performed (mixed types compare according to a fixed but arbitrary rule). In 3.0, nonnumeric mixed-type comparisons are not allowed and raise exceptions.
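The conversion rules above, and their limits, can be sketched as follows. Python 3.X behavior is assumed for the final comparison test, and the variable names are invented for the example.

```python
mixed = 40 + 3.14                       # Integer converted up to float
most_complex = 1 + 2.0 + 3j             # Everything converted up to complex
print(type(mixed).__name__)             # float
print(type(most_complex).__name__)      # complex

# No automatic conversion across other type boundaries.
try:
    'spam' + 42
except TypeError:
    converted = 'spam' + str(42)        # Manual conversion is required
print(converted)                        # spam42

# Nonnumeric mixed-type magnitude comparisons raise errors in 3.X.
try:
    'spam' < 42
    comparison_allowed = True
except TypeError:
    comparison_allowed = False
print(comparison_allowed)               # False
```

Only the numeric types are promoted automatically; everything else must be converted by hand before it can be mixed.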
For example, the + operator performs addition when applied to numbers but performs concatenation when applied to sequence objects such as strings and lists. In fact, + can mean anything at all when applied to objects you define with classes. As we saw in the prior chapter, this property is usually called polymorphism, a term indicating that the meaning of an operation depends on the type of the objects being operated on. We'll revisit this concept when we explore functions in Chapter 16, because it becomes a much more obvious feature in that context.
Numbers in Action
On to the code! Probably the best way to understand numeric objects and expressions is to see them in action, so let's start up the interactive command line and try some basic but illustrative operations (see Chapter 3 for pointers if you need help starting an interactive session).
In other words, these assignments cause the variables a and b to spring into existence automatically:
% python
>>> a = 3                      # Name created
>>> b = 4
I've also used a comment here. Recall that in Python code, text after a # mark and continuing to the end of the line is considered to be a comment and is ignored. Comments are a way to write human-readable documentation for your code. Because code you type interactively is temporary, you won't normally write comments in this context, but I've added them to some of this book's examples to help explain the code.* In the next part of the book, we'll meet a related feature (documentation strings) that attaches the text of your comments to objects.
* If you're working along, you don't need to type any of the comment text from the # through to the end of the line; comments are simply ignored by Python and not required parts of the statements we're running.
Now, let's use our new integer objects in some expressions. At this point, the values of a and b are still 3 and 4, respectively. Variables like these are replaced with their values whenever they're used inside an expression, and the expression results are echoed back immediately when working interactively:
>>> a + 1, a - 1               # Addition (3 + 1), subtraction (3 - 1)
(4, 2)
>>> b * 3, b / 2               # Multiplication (4 * 3), division (4 / 2)
(12, 2.0)
>>> a % 2, b ** 2              # Modulus (remainder), power (4 ** 2)
(1, 16)
>>> 2 + 4.0, 2.0 ** b          # Mixed-type conversions
(6.0, 16.0)
Technically, the results being echoed back here are tuples of two values because the lines typed at the prompt contain two expressions separated by commas; that's why the results are displayed in parentheses (more on tuples later). Note that the expressions work because the variables a and b within them have been assigned values. If you use a different variable that has never been assigned, Python reports an error rather than filling in some default value:
>>> c * 2
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'c' is not defined
You don't need to predeclare variables in Python, but they must have been assigned at least once before you can use them. In practice, this means you have to initialize counters to zero before you can add to them, initialize lists to an empty list before you can append to them, and so on. Here are two slightly larger expressions to illustrate operator grouping and more about conversions:
>>> b / 2 + a                  # Same as ((4 / 2) + 3)
5.0
>>> print(b / (2.0 + a))       # Same as (4 / (2.0 + 3))
0.8
In the first expression, there are no parentheses, so Python automatically groups the components according to its precedence rules: because / is lower in Table 5-2 than +, it binds more tightly and so is evaluated first. The result is as if the expression had been organized with parentheses as shown in the comment to the right of the code. Also, notice that all the numbers are integers in the first expression. Because of that, Python 2.6 performs integer division and addition and will give a result of 5, whereas Python 3.0 performs true division with remainders and gives the result shown. If you want integer division in 3.0, code this as b // 2 + a (more on division in a moment). In the second expression, parentheses are added around the + part to force Python to evaluate it first (i.e., before the /). We also made one of the operands floating-point by adding a decimal point: 2.0. Because of the mixed types, Python converts the integer
referenced by a to a floating-point value (3.0) before performing the +. If all the numbers in this expression were integers, integer division (4 / 5) would yield the truncated integer 0 in Python 2.6 but the floating-point 0.8 in Python 3.0 (again, stay tuned for division details).
The full story behind this odd result has to do with the limitations of floating-point hardware and its inability to exactly represent some values in a limited number of bits. Because computer architecture is well beyond this book's scope, though, we'll finesse this by saying that all of the digits in the first output are really there in your computer's floating-point hardware; it's just that you're not accustomed to seeing them. In fact, this is really just a display issue: the interactive prompt's automatic result echo shows more digits than the print statement. If you don't want to see all the digits, use print; as the sidebar "str and repr Display Formats" on page 116 will explain, you'll get a user-friendly display. Note, however, that not all values have so many digits to display:
>>> 1 / 2.0
0.5
and that there are more ways to display the bits of a number inside your computer than using print and automatic echoes:
>>> num = 1 / 3.0
>>> num                        # Echoes
0.33333333333333331
>>> print(num)                 # print rounds
0.333333333333

>>> '%e' % num                 # String formatting expression
'3.333333e-001'
>>> '%4.2f' % num              # Alternative floating-point format
'0.33'
>>> '{0:4.2f}'.format(num)     # String formatting method (Python 2.6 and 3.0)
'0.33'
The last three of these expressions employ string formatting, a tool that allows for format flexibility, which we will explore in the upcoming chapter on strings (Chapter 7). Its results are strings that are typically printed to displays or reports.
Both of these convert arbitrary objects to their string representations: repr (and the default interactive echo) produces results that look as though they were code; str (and the print operation) converts to a typically more user-friendly format if available. Some objects have both: a str for general use, and a repr with extra details. This notion will resurface when we study both strings and operator overloading in classes, and you'll find more on these built-ins in general later in the book. Besides providing print strings for arbitrary objects, the str built-in is also the name of the string data type and may be called with an encoding name to decode a Unicode string from a byte string. We'll study the latter advanced role in Chapter 36 of this book.
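To illustrate the two display forms, here is a small sketch in Python 3.X. The Point class is hypothetical and exists only to show an object that defines both formats; the decoding call at the end uses ASCII as an assumed encoding.

```python
class Point:                           # Hypothetical class, for illustration
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):                # As-code form, used by the echo
        return 'Point(%r, %r)' % (self.x, self.y)
    def __str__(self):                 # User-friendly form, used by print
        return '(%s, %s)' % (self.x, self.y)

p = Point(1, 2)
as_code = repr(p)
friendly = str(p)

text = 'spam\n'
text_code = repr(text)                 # Keeps the quotes and the escape
decoded = str(b'spam', 'ascii')        # str with an encoding name decodes bytes

print(as_code)                         # Point(1, 2)
print(friendly)                        # (1, 2)
print(text_code)                       # 'spam\n'
print(decoded)                         # spam
```

When a class defines only one of the two methods, Python falls back on defaults, a detail covered with operator overloading later in the book.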
Notice again how mixed types are allowed in numeric expressions (only); in the second test here, Python compares values in terms of the more complex type, float. Interestingly, Python also allows us to chain multiple comparisons together to perform range tests. Chained comparisons are a sort of shorthand for larger Boolean expressions. In short, Python lets us string together magnitude comparison tests to code chained comparisons such as range tests. The expression (A < B < C), for instance, tests whether B is between A and C; it is equivalent to the Boolean test (A < B and B < C) but is easier on the eyes (and the keyboard). For example, assume the following assignments:
The following two expressions have identical effects, but the first is shorter to type, and it may run slightly faster since Python needs to evaluate Y only once:
>>> X < Y < Z                  # Chained comparisons: range tests
True
>>> X < Y and Y < Z
True
The same equivalence holds for false results, and arbitrary chain lengths are allowed:
>>> X < Y > Z
False
>>> X < Y and Y > Z
False
>>> 1 < 2 < 3.0 < 4
True
>>> 1 > 2 > 3.0 > 4
False
You can use other comparisons in chained tests, but the resulting expressions can become nonintuitive unless you evaluate them the way Python does. The following, for instance, is false just because 1 is not equal to 2:
>>> 1 == 2 < 3             # Same as: 1 == 2 and 2 < 3
False                      # Not same as: False < 3 (which means 0 < 3, which is true)
Python does not compare the 1 == 2 False result to 3; this would technically mean the same as 0 < 3, which would be True (as we'll see later in this chapter, True and False are just customized 1 and 0).
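To see the "evaluate the middle operand only once" point in action, here is a minimal sketch. The helper function middle and the list calls are hypothetical names invented for this illustration; they simply record how many times the middle operand is computed.

```python
calls = []

def middle():
    """Hypothetical helper: record each call, then return 4."""
    calls.append('called')
    return 4

# Chained form: middle() is evaluated a single time
chained = 2 < middle() < 6
chained_calls = len(calls)

# Expanded form: middle() appears twice, so it runs twice when the first test passes
del calls[:]
expanded = 2 < middle() and middle() < 6
expanded_calls = len(calls)

# Mixed operators pair up left to right: (1 == 2) and (2 < 3)
mixed = 1 == 2 < 3

print(chained, chained_calls)      # True 1
print(expanded, expanded_calls)    # True 2
print(mixed)                       # False
```

For simple names like X, Y, and Z the difference is negligible; it matters mostly when the middle operand is a costly call or has side effects.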
Classic and true division. In Python 2.6 and earlier, this operator performs classic division, truncating results for integers and keeping remainders for floating-point numbers. In Python 3.0, it performs true division, always keeping remainders regardless of types.
X // Y
Floor division. Added in Python 2.2 and available in both Python 2.6 and 3.0, this operator always truncates fractional remainders down to their floor, regardless of types.
True division was added to address the fact that the results of the original classic division model are dependent on operand types, and so can be difficult to anticipate in a dynamically typed language like Python. Classic division was removed in 3.0 because of this constraint; the / and // operators implement true and floor division in 3.0. In sum:
In 3.0, the / now always performs true division, returning a float result that includes any remainder, regardless of operand types. The // performs floor division, which truncates the remainder and returns an integer for integer operands or a float if any operand is a float.
In 2.6, the / does classic division, performing truncating integer division if both operands are integers and float division (keeping remainders) otherwise. The // does floor division and works as it does in 3.0, performing truncating division for integers and floor division for floats.
Here are the two operators at work in 3.0 and 2.6:
C:\misc> C:\Python30\python
>>>
>>> 10 / 4            # Differs in 3.0: keeps remainder
2.5
>>> 10 // 4           # Same in 3.0: truncates remainder
2
>>> 10 / 4.0          # Same in 3.0: keeps remainder
2.5
>>> 10 // 4.0         # Same in 3.0: truncates to floor
2.0

C:\misc> C:\Python26\python
>>>
>>> 10 / 4
2
>>> 10 // 4
2
>>> 10 / 4.0
2.5
>>> 10 // 4.0
2.0
Notice that the data type of the result for // is still dependent on the operand types in 3.0: if either is a float, the result is a float; otherwise, it is an integer. Although this may seem similar to the type-dependent behavior of / in 2.X that motivated its change in 3.0, the type of the return value is much less critical than differences in the return value itself. Moreover, because // was provided in part as a backward-compatibility tool for programs that rely on truncating integer division (and this is more common than you might expect), it must return integers for integers.
Alternatively, you can enable 3.0 / division in 2.6 with a __future__ import, rather than forcing it with float conversions:
C:\misc> C:\Python26\python
>>> from __future__ import division      # Enable 3.0 "/" behavior
>>> 10 / 4
2.5
>>> 10 // 4
2
Strictly speaking, the // operator floors rather than truncates. For positive results the two are the same, so the result looks truncated; for negative results, flooring rounds down to the next-lower integer, away from zero (-2.5 becomes -3, not -2). Here's the case for 3.0:
C:\misc> c:\python30\python
>>> 5 / 2, 5 / -2
(2.5, -2.5)
>>> 5 // 2, 5 // -2         # Truncates to floor: rounds to first lower integer
(2, -3)                     # 2.5 becomes 2, -2.5 becomes -3
>>> 5 / 2.0, 5 / -2.0
(2.5, -2.5)
If you really want truncation regardless of sign, you can always run a float division result through math.trunc, regardless of Python version (also see the round built-in for related functionality):
C:\misc> c:\python30\python
>>> import math
>>> 5 / -2                      # Keep remainder
-2.5
>>> 5 // -2                     # Floor below result
-3
>>> math.trunc(5 / -2)          # Truncate instead of floor
-2

C:\misc> c:\python26\python
>>> import math
>>> 5 / float(-2)
-2.5
>>> 5 / -2, 5 // -2
(-3, -3)
>>> math.trunc(5 / float(-2))
-2
>>> (5 // 2), (5 // 2.0), (5 // -2.0), (5 // -2)
(2, 2.0, -3.0, -3)
>>> (9 / 3), (9.0 / 3), (9 // 3), (9 // 3.0)
(3, 3.0, 3, 3.0)
Although results have yet to come in, it's possible that the nontruncating behavior of / in 3.0 may break a significant number of programs. Perhaps because of a C language legacy, many programmers rely on division truncation for integers and will have to learn to use // in such contexts instead. Watch for a simple prime number while loop example in Chapter 13, and a corresponding exercise at the end of Part IV that illustrates the sort of code that may be impacted by this / change. Also stay tuned for more on the special from command used in this section; it's discussed further in Chapter 24.
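The three behaviors described in this section can be compared side by side. The following is a small sketch that assumes Python 3.X division semantics (in 2.6, add the __future__ import shown earlier); the function name divide_all_ways is made up for this illustration.

```python
import math

def divide_all_ways(a, b):
    """Return the (true, floor, truncated) quotients of a and b."""
    true_div = a / b                 # 3.X: always keeps the remainder
    floor_div = a // b               # Rounds down, toward negative infinity
    trunc_div = math.trunc(a / b)    # Drops the fraction, rounding toward zero
    return true_div, floor_div, trunc_div

positive = divide_all_ways(5, 2)
negative = divide_all_ways(5, -2)

print(positive)    # (2.5, 2, 2)
print(negative)    # (-2.5, -3, -2)
```

Floor and truncation agree for the positive result and differ only for the negative one, which is why code ported from 2.X that relied on / for integers usually needs // rather than math.trunc.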
Integer Precision
Division may differ slightly across Python releases, but it's still fairly standard. Here's something a bit more exotic. As mentioned earlier, Python 3.0 integers support unlimited size:
>>> 999999999999999999999999999999 + 1
1000000000000000000000000000000
Python 2.6 has a separate type for long integers, but it automatically converts any number too large to store in a normal integer to this type. Hence, you don't need to code any special syntax to use longs, and the only way you can tell that you're using 2.6 longs is that they print with a trailing L:
>>> 999999999999999999999999999999 + 1
1000000000000000000000000000000L
Unlimited-precision integers are a convenient built-in tool. For instance, you can use them to count the U.S. national debt in pennies in Python directly (if you are so inclined, and have enough memory on your computer for this year's budget!). They are also why we were able to raise 2 to such large powers in the examples in Chapter 3. Here are the 3.0 and 2.6 cases:
>>> 2 ** 200
1606938044258990275541962092341162602522202993782792835301376

>>> 2 ** 200
1606938044258990275541962092341162602522202993782792835301376L
Because Python must do extra work to support their extended precision, integer math is usually substantially slower than normal when numbers grow large. However, if you need the precision, the fact that it's built in for you to use will likely outweigh its performance penalty.
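To make the precision point concrete, the following sketch contrasts integer and floating-point arithmetic at the same magnitude. It assumes Python 3.X (in 2.6 the integer results would simply be longs); the variable names are arbitrary.

```python
big = 2 ** 200
digits = len(str(big))                 # Number of decimal digits in the result

# Integer math stays exact, however many digits it needs
exact = (big + 1) - big

# A float of the same size has no room left to record the added 1
lossy = (float(big) + 1) - float(big)

print(digits)    # 61
print(exact)     # 1
print(lossy)     # 0.0
```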
Complex Numbers
Although less widely used than the types we've been exploring thus far, complex numbers are a distinct core object type in Python. If you know what they are, you know why they are useful; if not, consider this section optional reading.
Complex numbers are represented as two floating-point numbers (the real and imaginary parts) and are coded by adding a j or J suffix to the imaginary part. We can also write complex numbers with a nonzero real part by adding the two parts with a +. For example, the complex number with a real part of 2 and an imaginary part of 3 is written 2 + 3j. Here are some examples of complex math at work:
>>> 1j * 1J
(-1+0j)
>>> 2 + 1j * 3
(2+3j)
>>> (2 + 1j) * 3
(6+3j)
Complex numbers also allow us to extract their parts as attributes, support all the usual mathematical expressions, and may be processed with tools in the standard cmath module (the complex version of the standard math module). Complex numbers typically find roles in engineering-oriented programs. Because they are advanced tools, check Python's language reference manual for additional details.
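As a brief sketch of the attribute access and cmath tools just mentioned (the variable names here are arbitrary, and only standard attributes and functions are used):

```python
import cmath

z = 2 + 3j
parts = (z.real, z.imag)        # Both parts are stored as floats
conj = z.conjugate()            # Same real part, negated imaginary part
product = z * conj              # 2*2 + 3*3, with a zero imaginary part
size = abs(3 + 4j)              # Magnitude: distance from the origin
root = cmath.sqrt(-1)           # math.sqrt(-1) would raise ValueError instead

print(parts)      # (2.0, 3.0)
print(product)    # (13+0j)
print(size)       # 5.0
print(root)       # 1j
```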
Here, the octal value 0o377, the hex value 0xFF, and the binary value 0b11111111 are all decimal 255. Python prints in decimal (base 10) by default but provides built-in functions that allow you to convert integers to digit strings in other bases:
>>> oct(64), hex(64), bin(64)            # 2.6 display; 3.0 shows '0o100' for oct
('0100', '0x40', '0b1000000')
The oct function converts decimal to octal, hex to hexadecimal, and bin to binary. To go the other way, the built-in int function converts a string of digits to an integer, and an optional second argument lets you specify the numeric base:
>>> int('64'), int('100', 8), int('40', 16), int('1000000', 2)
(64, 64, 64, 64)
>>> int('0x40', 16), int('0b1000000', 2)    # Literals okay too
(64, 64)
The eval function, which you'll meet later in this book, treats strings as though they were Python code. Therefore, it has a similar effect, but it usually runs more slowly: it actually compiles and runs the string as a piece of a program, and it assumes you can trust the source of the string being run (a clever user might be able to submit a string that deletes files on your machine!):
>>> eval('64'), eval('0o100'), eval('0x40'), eval('0b1000000')
(64, 64, 64, 64)
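If the strings come from a source you cannot fully trust, there are ways to get the same conversions without running arbitrary code. The sketch below is one possible approach, not something this chapter depends on: it uses int with a base of 0 (which infers the base from the literal's prefix) and the standard library's ast.literal_eval, which accepts only literals. The sample strings are invented for illustration, and 3.X-style octal text is assumed.

```python
import ast

texts = ['64', '0o100', '0x40', '0b1000000']

# Base 0 tells int to infer the base from the prefix, as in program code
by_int = [int(text, 0) for text in texts]

# literal_eval parses literals only; it never calls functions
by_literal = [ast.literal_eval(text) for text in texts]

try:
    ast.literal_eval("__import__('os').getcwd()")
    accepted = True
except ValueError:
    accepted = False                # Function calls are rejected

print(by_int)        # [64, 64, 64, 64]
print(by_literal)    # [64, 64, 64, 64]
print(accepted)      # False
```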
Finally, you can also convert integers to octal and hexadecimal strings with string formatting method calls and expressions:
>>> '{0:o}, {1:x}, {2:b}'.format(64, 64, 64)
'100, 40, 1000000'
>>> '%o, %x, %X' % (64, 255, 255)
'100, ff, FF'
String formatting is covered in more detail in Chapter 7. Two notes before moving on. First, Python 2.6 users should remember that you can code octals with simply a leading zero, the original octal format in Python:
>>> 0o1, 0o20, 0o377         # New octal format in 2.6 (same as 3.0)
(1, 16, 255)
>>> 01, 020, 0377            # Old octal literals in 2.6 (and earlier)
(1, 16, 255)
In 3.0, the syntax in the second of these examples generates an error. Even though it's not an error in 2.6, be careful not to begin a string of digits with a leading zero unless you really mean to code an octal value. Python 2.6 will treat it as base 8, which may not work as you'd expect: 010 is always decimal 8 in 2.6, not decimal 10 (despite what you may or may not think!). This, along with symmetry with the hex and binary forms, is why the octal format was changed in 3.0: you must use 0o010 in 3.0, and probably should in 2.6.
Secondly, note that these literals can produce arbitrarily long integers. The following, for instance, creates an integer with hex notation and then displays it first in decimal and then in octal and binary with converters:
>>> X = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFF
>>> X
5192296858534827628530496329220095L
>>> oct(X)
Speaking of binary digits, the next section shows tools for processing individual bits.
Bitwise Operations
Besides the normal numeric operations (addition, subtraction, and so on), Python supports most of the numeric expressions available in the C language. This includes operators that treat integers as strings of binary bits. For instance, here they are at work performing bitwise shift and Boolean operations:
>>> x = 1        # 0001
>>> x << 2       # Shift left 2 bits: 0100
4
>>> x | 2        # Bitwise OR: 0011
3
>>> x & 1        # Bitwise AND: 0001
1
In the first expression, a binary 1 (in base 2, 0001) is shifted left two slots to create a binary 4 (0100). The last two operations perform a binary OR (0001|0010 = 0011) and a binary AND (0001&0001 = 0001). Such bit-masking operations allow us to encode multiple flags and other values within a single integer. This is one area where the binary and hexadecimal number support in Python 2.6 and 3.0 becomes especially useful: it allows us to code and inspect numbers by bit-strings:
>>> X = 0b0001               # Binary literals
>>> X << 2                   # Shift left
4
>>> bin(X << 2)              # Binary digits string
'0b100'
>>> bin(X | 0b010)           # Bitwise OR
'0b11'
>>> bin(X & 0b1)             # Bitwise AND
'0b1'

>>> X = 0xFF                 # Hex literals
>>> bin(X)
'0b11111111'
>>> X ^ 0b10101010           # Bitwise XOR
85
>>> bin(X ^ 0b10101010)
'0b1010101'
>>> int('1010101', 2)        # String to int per base
85
>>> hex(85)                  # Hex digit string
'0x55'
We won't go into much more detail on bit-twiddling here. It's supported if you need it, and it comes in handy if your Python code must deal with things like network packets or packed binary data produced by a C program. Be aware, though, that bitwise operations are often not as important in a high-level language such as Python as they are in a low-level language such as C. As a rule of thumb, if you find yourself wanting to flip bits in Python, you should think about which language you're really coding. In general, there are often better ways to encode information in Python than bit strings.
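As an illustration of both halves of that advice, here is a sketch of the flag-encoding technique mentioned earlier, followed by a set-based alternative. The flag names READ, WRITE, and EXEC are hypothetical, chosen only for this example, and the set literal assumes Python 3.X (use the set built-in in 2.6).

```python
# Hypothetical permission flags, one bit each
READ, WRITE, EXEC = 0b001, 0b010, 0b100

perms = READ | WRITE                 # Set two flags: 0b011
can_write = bool(perms & WRITE)      # Test a flag with AND
can_exec = bool(perms & EXEC)
perms = perms & ~WRITE               # Clear a flag: 0b001
perms = perms ^ EXEC                 # Toggle a flag: 0b101

# The same information as a set of names, often easier to read in Python
named = {'read', 'write'}
named.discard('write')
named.add('exec')

print(can_write, can_exec)    # True False
print(bin(perms))             # 0b101
print(sorted(named))          # ['exec', 'read']
```

The integer version is compact and matches what C code or a binary file format may expect; the set version trades a little space for names that explain themselves.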
In the upcoming Python 3.1 release, the integer bit_length method also allows you to query the number of bits required to represent a number's value in binary. The same effect can often be achieved by subtracting 2 from the length of the bin string using the len built-in function we met in Chapter 4, though it may be less efficient:
>>> X = 99
>>> bin(X), X.bit_length()
('0b1100011', 7)
>>> bin(256), (256).bit_length()
('0b100000000', 9)
>>> len(bin(256)) - 2
9
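The word "often" matters here. The following sketch wraps the bin-string technique in a hypothetical helper named bits_needed and checks where it does and does not agree with bit_length; it assumes a Python in which bit_length is available (3.1 or later).

```python
def bits_needed(n):
    """Hypothetical helper: bit count taken from the bin string."""
    return len(bin(n)) - 2           # Subtract 2 for the '0b' prefix

# The two techniques agree for every positive integer tried here
same = all(bits_needed(n) == n.bit_length() for n in range(1, 1000))

# They differ for zero, whose bin string still shows one digit
zero_case = (bits_needed(0), (0).bit_length())

# And for negatives, whose bin string carries a leading minus sign
negative_case = (bits_needed(-5), (-5).bit_length())

print(same)             # True
print(zero_case)        # (1, 0)
print(negative_case)    # (4, 3)
```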
The sum function shown here works on a sequence of numbers, and min and max accept either a sequence or individual arguments. There are a variety of ways to drop the decimal digits of floating-point numbers. We met truncation and floor earlier; we can also round, both numerically and for display purposes:
>>> math.floor(2.567), math.floor(-2.567)          # Floor (next-lower integer)
(2, -3)
>>> math.trunc(2.567), math.trunc(-2.567)          # Truncate (drop decimal digits)
(2, -2)
>>> int(2.567), int(-2.567)                        # Truncate (integer conversion)
(2, -2)
>>> round(2.567), round(2.467), round(2.567, 2)    # Round (Python 3.0 version)
(3, 2, 2.5699999999999998)
>>> '%.1f' % 2.567, '{0:.2f}'.format(2.567)        # Round for display (Chapter 7)
('2.6', '2.57')
As we saw earlier, the last of these produces strings that we would usually print and supports a variety of formatting options. As also described earlier, the second to last test here will output (3, 2, 2.57) if we wrap it in a print call to request a more user-friendly display. The last two lines still differ, though: round rounds a floating-point number but still yields a floating-point number in memory, whereas string formatting produces a string and doesn't yield a modified number:
>>> (1 / 3), round(1 / 3, 2), ('%.2f' % (1 / 3))
(0.33333333333333331, 0.33000000000000002, '0.33')
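The differences among these tools are easiest to see on a negative value, where floor and truncation part ways. The following is a short sketch with arbitrary variable names; it records the type of each result rather than relying on how a particular Python release displays floats.

```python
import math

value = -2.567

floored = math.floor(value)      # Next-lower integer: away from zero here
truncated = math.trunc(value)    # Drop decimal digits: toward zero
rounded = round(value, 2)        # Still a float object in memory
shown = '%.2f' % value           # A new string; value itself is unchanged

print(floored, truncated)        # -3 -2
print(type(rounded).__name__)    # float
print(shown)                     # -2.57
```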
Interestingly, there are three ways to compute square roots in Python: using a module function, an expression, or a built-in function (if you're interested in performance, we will revisit these in an exercise and its solution at the end of Part IV, to see which runs quicker):
>>> import math
>>> math.sqrt(144)              # Module
12.0
>>> 144 ** .5                   # Expression
12.0
>>> pow(144, .5)                # Built-in
12.0

>>> math.sqrt(1234567890)       # Larger numbers
35136.418286444619
>>> 1234567890 ** .5
35136.418286444619
>>> pow(1234567890, .5)
35136.418286444619
Notice that standard library modules such as math must be imported, but built-in functions such as abs and round are always available without imports. In other words, modules are external components, but built-in functions live in an implied namespace that Python automatically searches to find names used in your program. This namespace corresponds to the module called builtins in Python 3.0 (__builtin__ in 2.6). There is much more about name resolution in the function and module parts of this book; for now, when you hear "module," think "import."
The standard library random module must be imported as well. This module provides tools for picking a random floating-point number between 0 and 1, selecting a random integer between two numbers, choosing an item at random from a sequence, and more:
>>> import random
>>> random.random()
0.44694718823781876
>>> random.random()
0.28970426439292829

>>> random.randint(1, 10)
5
>>> random.randint(1, 10)
4

>>> random.choice(['Life of Brian', 'Holy Grail', 'Meaning of Life'])
'Life of Brian'
>>> random.choice(['Life of Brian', 'Holy Grail', 'Meaning of Life'])
'Holy Grail'
The random module can be useful for shuffling cards in games, picking images at random in a slideshow GUI, performing statistical simulations, and much more. For more details, see Python's library manual.
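Because results vary from run to run, random code can be awkward to test. One common approach, sketched below, is to create a private generator with a fixed seed; the seed value 42 and the variable names are arbitrary choices for this illustration, and the exact values produced are deliberately not shown because they may differ across Python releases.

```python
import random

rng = random.Random(42)            # Private generator with an arbitrary seed

fraction = rng.random()            # Float in the range 0.0 <= x < 1.0
roll = rng.randint(1, 10)          # Integer from 1 through 10, inclusive
films = ['Life of Brian', 'Holy Grail', 'Meaning of Life']
pick = rng.choice(films)           # One item from the sequence

deck = list(range(10))
rng.shuffle(deck)                  # Reorders the list in place

# A second generator with the same seed replays the same sequence
replay = random.Random(42).random()

print(fraction == replay)                  # True
print(sorted(deck) == list(range(10)))     # True
```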
Decimal Type
Python 2.4 introduced a new core numeric type: the decimal object, formally known as Decimal. Syntactically, decimals are created by calling a function within an imported module, rather than running a literal expression. Functionally, decimals are like floating-point numbers, but they have a fixed number of decimal digits. Hence, decimals are fixed-precision floating-point values.
For example, with decimals, we can have a floating-point value that always retains just two decimal digits. Furthermore, we can specify how to round or truncate the extra decimal digits beyond the object's cutoff. Although it generally incurs a small performance penalty compared to the normal floating-point type, the decimal type is well suited to representing fixed-precision quantities like sums of money and can help you achieve better numeric accuracy.
The basics
The last point merits elaboration. As you may or may not already know, floating-point math is less than exact, because of the limited space used to store values. For example, the following should yield zero, but it does not. The result is close to zero, but there are not enough bits to be precise here:
>>> 0.1 + 0.1 + 0.1 - 0.3
5.5511151231257827e-17
Printing the result to produce the user-friendly display format doesn't completely help either, because the hardware related to floating-point math is inherently limited in terms of accuracy:
>>> print(0.1 + 0.1 + 0.1 - 0.3)
5.55111512313e-17
As shown here, we can make decimal objects by calling the Decimal constructor function in the decimal module and passing in strings that have the desired number of decimal digits for the resulting object (we can use the str function to convert floating-point values to strings if needed). When decimals of different precision are mixed in expressions, Python converts up to the largest number of decimal digits automatically:
>>> Decimal('0.1') + Decimal('0.10') + Decimal('0.10') - Decimal('0.30')
Decimal('0.00')
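Here is the same comparison as a self-contained sketch, import included. The variable names are arbitrary; results are compared by value and by string so that the outcome does not depend on how a given Python release echoes floats.

```python
from decimal import Decimal

# Floating-point: close to zero, but not exactly zero
float_sum = 0.1 + 0.1 + 0.1 - 0.3

# Decimals built from strings: exactly zero
decimal_sum = Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')

# Mixed precisions: the result carries the larger number of decimal digits
mixed_sum = Decimal('0.1') + Decimal('0.10') + Decimal('0.10') - Decimal('0.30')

print(float_sum == 0)     # False
print(decimal_sum)        # 0.0
print(mixed_sum)          # 0.00
```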
In Python 3.1 (to be released after this book's publication), it's also possible to create a decimal object from a floating-point object, with a call of the form decimal.Decimal.from_float(1.25). The conversion is exact but can sometimes yield a large number of digits.
This is especially useful for monetary applications, where cents are represented as two decimal digits. Decimals are essentially an alternative to manual rounding and string formatting in this context:
>>> 1999 + 1.33
2000.3299999999999
>>>
>>> decimal.getcontext().prec = 2
>>> pay = decimal.Decimal(str(1999 + 1.33))
>>> pay
Decimal('2000.33')
Though useful, this statement requires much more background knowledge than you've obtained at this point; watch for coverage of the with statement in Chapter 33. Because use of the decimal type is still relatively rare in practice, I'll defer to Python's standard library manuals and interactive help for more details. And because decimals address some of the same floating-point accuracy issues as the fraction type, let's move on to the next section to see how the two compare.
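For readers who want a preview anyway, the following is a minimal sketch of a temporary precision change using the decimal module's localcontext function with the with statement. It is one way to scope a precision setting, offered under the assumption of default context settings outside the block; the quantize call at the end is an additional standard Decimal method for fixing the number of digits after the decimal point, and all variable names are arbitrary.

```python
import decimal

third = decimal.Decimal(1) / decimal.Decimal(3)       # Default precision

with decimal.localcontext() as ctx:
    ctx.prec = 2                                      # Applies inside this block only
    short_third = decimal.Decimal(1) / decimal.Decimal(3)

after = decimal.Decimal(1) / decimal.Decimal(3)       # Prior precision is back

# Round a monetary amount to exactly two decimal places
pay = decimal.Decimal(str(1999 + 1.33)).quantize(decimal.Decimal('0.01'))

print(short_third)       # 0.33
print(after == third)    # True
print(pay)               # 2000.33
```

Unlike assigning to decimal.getcontext().prec, which changes precision for all later decimal math in the calling thread, the block form restores the earlier setting automatically on exit.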
Fraction Type
Python 2.6 and 3.0 debut a new numeric type, Fraction, which implements a rational number object. It essentially keeps both a numerator and a denominator explicitly, so as to avoid some of the inaccuracies and limitations of floating-point math.
The basics
Fraction is a sort of cousin to the existing Decimal fixed-precision type described in the prior section, as both can be used to control numerical accuracy by fixing decimal digits and specifying rounding or truncation policies. It's also used in similar ways: like Decimal, Fraction resides in a module; import its constructor and pass in a numerator and a denominator.
Fraction objects can also be created from floating-point number strings, much like decimals:
>>> Fraction('.25')
Fraction(1, 4)
>>> Fraction('1.25')
Fraction(5, 4)
>>>
>>> Fraction('.25') + Fraction('1.25')
Fraction(3, 2)
Numeric accuracy
Notice that this is different from floating-point-type math, which is constrained by the underlying limitations of floating-point hardware. To compare, here are the same operations run with floating-point objects, and notes on their limited accuracy:
>>> a = 1 / 3.0                # Only as accurate as floating-point hardware
>>> b = 4 / 6.0                # Can lose precision over calculations
>>> a
0.33333333333333331
>>> b
0.66666666666666663
>>> a + b
1.0
>>> a - b
-0.33333333333333331
>>> a * b
0.22222222222222221
This floating-point limitation is especially apparent for values that cannot be represented accurately given their limited number of bits in memory. Both Fraction and Decimal provide ways to get exact results, albeit at the cost of some speed. For instance, in the following example (repeated from the prior section), floating-point numbers do not accurately give the zero answer expected, but both of the other types do:
>>> 0.1 + 0.1 + 0.1 - 0.3      # This should be zero (close, but not exact)
5.5511151231257827e-17

>>> from fractions import Fraction
>>> Fraction(1, 10) + Fraction(1, 10) + Fraction(1, 10) - Fraction(3, 10)
Fraction(0, 1)

>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3')
Decimal('0.0')
Moreover, fractions and decimals both allow more intuitive and accurate results than floating points sometimes can, in different ways (by using rational representation and by limiting precision):
>>> 1 / 3                      # Use 3.0 in Python 2.6 for true "/"
0.33333333333333331
>>> Fraction(1, 3)             # Numeric accuracy
Fraction(1, 3)
In fact, fractions both retain accuracy and automatically simplify results. Continuing the preceding interaction:
>>> (1 / 3) + (6 / 12)         # Use ".0" in Python 2.6 for true "/"
0.83333333333333326
>>> Fraction(6, 12)            # Automatically simplified
Fraction(1, 2)
>>> Fraction(1, 3) + Fraction(6, 12)
Fraction(5, 6)
>>> decimal.Decimal(str(1/3)) + decimal.Decimal(str(6/12))
Decimal('0.83')
>>> 1000.0 / 1234567890
8.1000000737100011e-07
>>> Fraction(1000, 1234567890)
Fraction(100, 123456789)
float accepts a Fraction as an argument. Trace through the following interaction to see how this pans out (the * in the second test is special syntax that expands a tuple into individual arguments; more on this when we study function argument passing in Chapter 18):
>>> (2.5).as_integer_ratio()               # float object method
(5, 2)
>>> f = 2.5
>>> z = Fraction(*f.as_integer_ratio())    # Convert float -> fraction: two args
>>> z                                      # Same as Fraction(5, 2)
Fraction(5, 2)
>>> x                                      # x from prior interaction
Fraction(1, 3)
>>> x + z
Fraction(17, 6)                            # 5/2 + 1/3 = 15/6 + 2/6
>>> float(x)                               # Convert fraction -> float
0.33333333333333331
>>> float(z)
2.5
>>> float(x + z)
2.8333333333333335
>>> 17 / 6
2.8333333333333335
>>> Fraction.from_float(1.75)
Fraction(7, 4)
>>> Fraction(*(1.75).as_integer_ratio())
Fraction(7, 4)
Finally, some type mixing is allowed in expressions, though Fraction must sometimes be manually propagated to retain accuracy. Study the following interaction to see how this works:
>>> x
Fraction(1, 3)
>>> x + 2                      # Fraction + int -> Fraction
Fraction(7, 3)
>>> x + 2.0                    # Fraction + float -> float
2.3333333333333335
>>> x + (1./3)                 # Fraction + float -> float
0.66666666666666663
>>> x + (4./3)
1.6666666666666665
>>> x + Fraction(4, 3)
Fraction(5, 3)
Caveat: although you can convert from floating-point to fraction, in some cases there is an unavoidable precision loss when you do so, because the number is inaccurate in its original floating-point form. When needed, you can simplify such results by limiting the maximum denominator value:
>>> x
Fraction(1, 3)
>>> a = x + Fraction(*(4.0 / 3).as_integer_ratio())
>>> a
Fraction(22517998136852479, 13510798882111488)
>>> 22517998136852479 / 13510798882111488.     # 5 / 3 (or close to it!)
1.6666666666666667
>>> a.limit_denominator(10)                    # Simplify to closest fraction
Fraction(5, 3)
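The conversions and the caveat in this section can be collected into one self-contained sketch. Variable names are arbitrary; the denominator limit of 10 simply mirrors the interaction above.

```python
from fractions import Fraction

exact = Fraction(1, 3) + Fraction(6, 12)             # Simplified automatically
from_text = Fraction('1.25')                         # From a number string
from_float = Fraction(*(2.5).as_integer_ratio())     # 2.5 is exact in binary

# 4.0 / 3 is already inexact as a float, so the error carries into the sum
noisy = Fraction(1, 3) + Fraction(*(4.0 / 3).as_integer_ratio())
tidy = noisy.limit_denominator(10)                   # Closest simple fraction

print(exact)                      # 5/6
print(from_text, from_float)      # 5/4 5/2
print(noisy == Fraction(5, 3))    # False
print(tidy)                       # 5/3
```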
For more details on the Fraction type, experiment further on your own and consult the Python 2.6 and 3.0 library manuals and other documentation.
Sets
Python 2.4 also introduced a new collection type, the set: an unordered collection of unique and immutable objects that supports operations corresponding to mathematical set theory. By definition, an item appears only once in a set, no matter how many times it is added. As such, sets have a variety of applications, especially in numeric and database-focused work.
Because sets are collections of other objects, they share some behavior with objects such as lists and dictionaries that are outside the scope of this chapter. For example, sets are iterable, can grow and shrink on demand, and may contain a variety of object types. As we'll see, a set acts much like the keys of a valueless dictionary, but it supports extra operations.
However, because sets are unordered and do not map keys to values, they are neither sequence nor mapping types; they are a type category unto themselves. Moreover, because sets are fundamentally mathematical in nature (and for many readers, may seem more academic and be used much less often than more pervasive objects like dictionaries), we'll explore the basic utility of Python's set objects here.
You get back a set object, which contains all the items in the object passed in (notice that sets do not have a positional ordering, and so are not sequences):
>>> x
set(['a', 'c', 'b', 'e', 'd'])           # 2.6 display format
Sets made this way support the common mathematical set operations with expression operators. Note that we can't perform these expressions on plain sequences; we must create sets from them in order to apply these tools:
>>> 'e' in x                              # Membership
True
>>> x - y                                 # Difference
set(['a', 'c', 'e'])
>>> x | y                                 # Union
set(['a', 'c', 'b', 'e', 'd', 'y', 'x', 'z'])
>>> x & y                                 # Intersection
set(['b', 'd'])
>>> x ^ y                                 # Symmetric difference (XOR)
set(['a', 'c', 'e', 'y', 'x', 'z'])
>>> x > y, x < y                          # Superset, subset
(False, False)
In addition to expressions, the set object provides methods that correspond to these operations and more, and that support set changes: the set add method inserts one item, update is an in-place union, and remove deletes an item by value (run a dir call on any set instance or the set type name to see all the available methods). Assuming x and y are still as they were in the prior interaction:
>>> z = x.intersection(y)                 # Same as x & y
>>> z
set(['b', 'd'])
>>> z.add('SPAM')                         # Insert one item
>>> z
set(['b', 'd', 'SPAM'])
>>> z.update(set(['X', 'Y']))             # Merge: in-place union
>>> z
set(['Y', 'X', 'b', 'd', 'SPAM'])
>>> z.remove('b')                         # Delete one item
>>> z
set(['Y', 'X', 'd', 'SPAM'])
As iterable containers, sets can also be used in operations such as len, for loops, and list comprehensions. Because they are unordered, though, they don't support sequence operations like indexing and slicing:
>>> for item in set('abc'): print(item * 3)
...
aaa
ccc
bbb
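Since the order in which a set yields its items is not something to rely on, code that needs repeatable output usually sorts first. The sketch below shows this, along with what happens on an indexing attempt; the variable names are arbitrary.

```python
letters = set('abc')

count = len(letters)                                # Sets support len
tripled = sorted(item * 3 for item in letters)      # Sort for a repeatable order

try:
    letters[0]                                      # Indexing is a sequence operation
    indexable = True
except TypeError:
    indexable = False

print(count)        # 3
print(tripled)      # ['aaa', 'bbb', 'ccc']
print(indexable)    # False
```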
Finally, although the set expressions shown earlier generally require two sets, their method-based counterparts can often work with any iterable type as well:
>>> S = set([1, 2, 3])
>>> S | set([3, 4])            # Expressions require both to be sets
set([1, 2, 3, 4])
>>> S | [3, 4]
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> S.union([3, 4])            # But their methods allow any iterable
set([1, 2, 3, 4])
>>> S.intersection((1, 3, 5))
set([1, 3])
>>> S.issubset(range(-5, 5))
True
For more details on set operations, see Python's library reference manual or a reference book. Although set operations can be coded manually in Python with other types, like lists and dictionaries (and often were in the past), Python's built-in sets use efficient algorithms and implementation techniques to provide quick and standard operation.
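To give a feel for what "coded manually" means, here is a rough sketch of an intersection written with lists, next to the built-in equivalent. The function intersect_lists is a hypothetical helper written for this comparison, not a library tool.

```python
def intersect_lists(seq1, seq2):
    """Manual intersection: scans seq2 once for each item in seq1."""
    result = []
    for item in seq1:
        if item in seq2 and item not in result:
            result.append(item)
    return result

manual = intersect_lists('abcde', 'bdxyz')
builtin = set('abcde') & set('bdxyz')

print(manual)                   # ['b', 'd']
print(set(manual) == builtin)   # True
```

The manual version keeps the order of its first argument, but each in test on a list or string is a linear scan; set membership is based on hashing, which is a large part of why the built-in type is quicker on big collections.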
This syntax makes sense, given that sets are essentially like valueless dictionaries: because they are unordered, unique, and immutable, a set's items behave much like a dictionary's keys. This operational similarity is even more striking given that dictionary key lists in 3.0 are view objects, which support set-like behavior such as intersections and unions (see Chapter 8 for more on dictionary view objects).
In fact, regardless of how a set is made, 3.0 displays it using the new literal format. The set built-in is still required in 3.0 to create empty sets and to build sets from existing iterable objects (short of using set comprehensions, discussed later in this chapter), but the new literal is convenient for initializing sets of known structure:
C:\Misc> c:\python30\python
>>> set([1, 2, 3, 4])            # Built-in: same as in 2.6
{1, 2, 3, 4}
>>> set('spam')                  # Add all items in an iterable
{'a', 'p', 's', 'm'}

>>> {1, 2, 3, 4}                 # Set literals: new in 3.0
{1, 2, 3, 4}
>>> S = {'s', 'p', 'a', 'm'}
>>> S.add('alot')
>>> S
{'a', 'p', 's', 'm', 'alot'}
All the set processing operations discussed in the prior section work the same in 3.0, but the result sets print differently:
>>> S1 = {1, 2, 3, 4}
>>> S1 & {1, 3}                  # Intersection
{1, 3}
>>> {1, 5, 3, 6} | S1            # Union
{1, 2, 3, 4, 5, 6}
>>> S1 - {1, 3, 4}               # Difference
{2}
>>> S1 > {1, 3}                  # Superset
True
Note that {} is still a dictionary in Python. Empty sets must be created with the set built-in, and print the same way:
>>> S1 - {1, 2, 3, 4}            # Empty sets print differently
set()
>>> type({})                     # Because {} is an empty dictionary
<class 'dict'>

>>> S = set()                    # Initialize an empty set
>>> S.add(1.23)
>>> S
{1.23}
As in Python 2.6, sets created with 3.0 literals support the same methods, some of which allow general iterable operands that expressions do not:
>>> {1, 2, 3} | {3, 4}
{1, 2, 3, 4}
>>> {1, 2, 3} | [3, 4]
TypeError: unsupported operand type(s) for |: 'set' and 'list'

>>> {1, 2, 3}.union([3, 4])
{1, 2, 3, 4}
>>> {1, 2, 3}.union({3, 4})
{1, 2, 3, 4}
>>> {1, 2, 3}.union(set([3, 4]))
{1, 2, 3, 4}
>>> {1, 2, 3}.intersection((1, 3, 5))
{1, 3}
>>> {1, 2, 3}.issubset(range(-5, 5))
True
only contain immutable (a.k.a hashable) object types. Hence, lists and dictionaries cannot be embedded in sets, but tuples can if you need to store compound values. Tuples compare by their full values when used in set operations:
>>> S
{1.23}
>>> S.add([1, 2, 3])                   # Only immutable objects work in a set
TypeError: unhashable type: 'list'
>>> S.add({'a':1})
TypeError: unhashable type: 'dict'
>>> S.add((1, 2, 3))
>>> S                                  # No list or dict, but tuple okay
{1.23, (1, 2, 3)}
>>> S | {(4, 5, 6), (1, 2, 3)}         # Union: same as S.union(...)
{1.23, (4, 5, 6), (1, 2, 3)}
>>> (1, 2, 3) in S                     # Membership: by complete values
True
>>> (1, 4, 3) in S
False
Tuples in a set, for instance, might be used to represent dates, records, IP addresses, and so on (more on tuples later in this part of the book). Sets themselves are mutable too, and so cannot be nested in other sets directly; if you need to store a set inside another set, the frozenset built-in call works just like set but creates an immutable set that cannot change and thus can be embedded in other sets.
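The nesting rule for sets can be sketched briefly as follows. The variable names are arbitrary, and the set literals assume Python 3.X (use the set built-in in 2.6).

```python
inner = frozenset([1, 2])
outer = {inner, frozenset([3])}        # Frozen sets may be nested in a set

try:
    outer.add({4, 5})                  # A plain set is mutable, so unhashable
    nested_plain = True
except TypeError:
    nested_plain = False

found = frozenset([2, 1]) in outer     # Membership is by value, not item order
can_grow = hasattr(inner, 'add')       # Frozen sets have no in-place changes

print(nested_plain)    # False
print(found)           # True
print(can_grow)        # False
```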
In this expression, the loop is coded on the right, and the collection expression is coded on the left (x ** 2). As for list comprehensions, we get back pretty much what this expression says: Give me a new set containing X squared, for every X in a list. Comprehensions can also iterate across other kinds of objects, such as strings (the first of the following examples illustrates the comprehension-based way to make a set from an existing iterable):
>>> {x for x in 'spam'}               # Same as: set('spam')
{'a', 'p', 's', 'm'}
>>> {c * 4 for c in 'spam'}           # Set of collected expression results
{'ssss', 'aaaa', 'pppp', 'mmmm'}
>>> {c * 4 for c in 'spamham'}
{'ssss', 'aaaa', 'hhhh', 'pppp', 'mmmm'}

>>> S = {c * 4 for c in 'spam'}
>>> S | {'mmmm', 'xxxx'}
{'ssss', 'aaaa', 'pppp', 'mmmm', 'xxxx'}
>>> S & {'mmmm', 'xxxx'}
{'mmmm'}
Because the rest of the comprehensions story relies upon underlying concepts we're not yet prepared to address, we'll postpone further details until later in this book. In Chapter 8, we'll meet a first cousin in 3.0, the dictionary comprehension, and I'll have much more to say about all comprehensions (list, set, dictionary, and generator) later, especially in Chapters 14 and 20. As we'll learn later, all comprehensions, including sets, support additional syntax not shown here, including nested loops and if tests, which can be difficult to understand until you've had a chance to study larger statements.
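As a small preview only, the sketch below shows a set comprehension next to a loop that builds the same set by hand, plus one if test of the kind mentioned above. It assumes Python 3.X, and the names used are arbitrary; full coverage waits for the later chapters.

```python
# A set comprehension...
squares = {x ** 2 for x in [1, 2, 3, 4]}

# ...builds the same result as this loop, coded by hand
manual = set()
for x in [1, 2, 3, 4]:
    manual.add(x ** 2)

# An if test at the end filters the items collected
even_squares = {x ** 2 for x in [1, 2, 3, 4] if x % 2 == 0}

print(squares == manual)       # True
print(sorted(squares))         # [1, 4, 9, 16]
print(sorted(even_squares))    # [4, 16]
```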
Why sets?
Set operations have a variety of common uses, some more practical than mathematical. For example, because items are stored only once in a set, sets can be used to filter duplicates out of other collections. Simply convert the collection to a set, and then convert it back again (because sets are iterable, they work in the list call here):
>>> L = [1, 2, 1, 3, 2, 4, 5]
>>> set(L)
{1, 2, 3, 4, 5}
>>> L = list(set(L))                      # Remove duplicates
>>> L
[1, 2, 3, 4, 5]
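One caveat is worth keeping in mind: a set makes no promise about the order of its items, so the round trip through set may hand the list back in a different order than it started with. The following is a minimal sketch of an order-preserving alternative. The helper name unique_in_order is hypothetical (it is not a built-in and does not come from this book), and the sketch assumes that all items are hashable.

```python
def unique_in_order(items):
    """Return the distinct items of any iterable, in first-seen order."""
    seen = set()                  # Items already emitted (assumes hashable items)
    result = []
    for item in items:
        if item not in seen:      # Set membership tests are fast
            seen.add(item)
            result.append(item)
    return result

L = [1, 2, 1, 3, 2, 4, 5]
print(unique_in_order(L))              # [1, 2, 3, 4, 5]
print(unique_in_order('spamham'))      # ['s', 'p', 'a', 'm', 'h']
```

The set still does the real work here; the list simply records the order in which new items turn up.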
Sets can also be used to keep track of where you've already been when traversing a graph or other cyclic structure. For example, the transitive module reloader and inheritance tree lister examples we'll study in Chapters 24 and 30, respectively, must keep track of items visited to avoid loops. Although recording states visited as keys in a dictionary is efficient, sets offer an alternative that's essentially equivalent (and may be more or less intuitive, depending on who you ask).
Finally, sets are also convenient when dealing with large data sets (database query results, for example): the intersection of two sets contains objects in common to both categories, and the union contains all items in either set. To illustrate, here's a somewhat more realistic example of set operations at work, applied to lists of people in a hypothetical company, using 3.0 set literals (use set in 2.6):
>>> engineers = {'bob', 'sue', 'ann', 'vic'}
>>> managers = {'tom', 'sue'}
>>> 'bob' in engineers                    # Is bob an engineer?
True
>>> engineers & managers                  # Who is both engineer and manager?
{'sue'}
>>> engineers | managers                  # All people in either category
{'vic', 'sue', 'tom', 'bob', 'ann'}
>>> engineers - managers                  # Engineers who are not managers
{'vic', 'bob', 'ann'}
>>> managers - engineers                  # Managers who are not engineers
{'tom'}
>>> engineers > managers                  # Are all managers engineers? (superset)
False
>>> {'bob', 'sue'} < engineers            # Are both engineers? (subset)
True
>>> (managers | engineers) > managers     # All people is a superset of managers
True
>>> managers ^ engineers                  # Who is in one but not both?
{'vic', 'bob', 'ann', 'tom'}
>>> (managers | engineers) - (managers ^ engineers)     # Intersection!
{'sue'}
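The "where have I already been" role of sets mentioned earlier can be sketched in a few lines as well. This is only an illustration under stated assumptions, not the reloader or tree lister code from later chapters: the function name reachable and the dictionary-of-lists graph layout are both hypothetical choices made for this example.

```python
def reachable(graph, start):
    """Return the set of nodes reachable from start, even if the graph has cycles."""
    visited = set()                 # Nodes we have already processed
    pending = [start]               # Nodes still waiting to be processed
    while pending:
        node = pending.pop()
        if node in visited:         # Already been here: skip it to avoid looping
            continue
        visited.add(node)
        pending.extend(graph.get(node, ()))
    return visited

# A hypothetical graph: a -> b -> c -> a forms a cycle, and d points into it
graph = {'a': ['b'], 'b': ['c'], 'c': ['a'], 'd': ['a']}
print(sorted(reachable(graph, 'a')))    # ['a', 'b', 'c']
```

Without the visited set, the cycle from c back to a would keep this loop running forever.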
You can find more details on set operations in the Python library manual and some mathematical and relational database theory texts. Also stay tuned for Chapter 8's revival of some of the set operations we've seen here, in the context of dictionary view objects in Python 3.0.
Booleans
Some argue that the Python Boolean type, bool, is numeric in nature because its two values, True and False, are just customized versions of the integers 1 and 0 that print themselves differently. Although that's all most programmers need to know, let's explore this type in a bit more detail.
More formally, Python today has an explicit Boolean data type called bool, with the values True and False available as new preassigned built-in names. Internally, the names True and False are instances of bool, which is in turn just a subclass (in the object-oriented sense) of the built-in integer type int. True and False behave exactly like the integers 1 and 0, except that they have customized printing logic: they print themselves as the words True and False, instead of the digits 1 and 0. bool accomplishes this by redefining str and repr string formats for its two objects.
Because of this customization, the output of Boolean expressions typed at the interactive prompt prints as the words True and False instead of the older and less obvious 1 and 0. In addition, Booleans make truth values more explicit. For instance, an infinite loop can now be coded as while True: instead of the less intuitive while 1:. Similarly, flags can be initialized more clearly with flag = False. We'll discuss these statements further in Part III.
Again, though, for all other practical purposes, you can treat True and False as though they are predefined variables set to integer 1 and 0. Most programmers used to preassign True and False to 1 and 0 anyway; the bool type simply makes this standard. Its implementation can lead to curious results, though. Because True is just the integer 1 with a custom display format, True + 4 yields 5 in Python:
>>> type(True)
<class 'bool'>
>>> isinstance(True, int)
True
>>> True == 1                             # Same value
True
>>> True is 1                             # But different object: see the next chapter
False
>>> True or False                         # Same as: 1 or 0
True
>>> True + 4                              # (Hmmm)
5
Since you probably won't come across an expression like the last of these in real Python code, you can safely ignore its deeper metaphysical implications.... We'll revisit Booleans in Chapter 9 (to define Python's notion of truth) and again in Chapter 12 (to see how Boolean operators like and and or work).
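If you're curious where the integer nature of bool does occasionally surface, here is a small sketch, assuming Python 3.X. The variable names are made up for the example, and none of this is a recommendation; counting and indexing with Booleans is legal, but explicit code is usually clearer.

```python
flags = [True, False, True, True]       # Hypothetical list of test results

count = sum(flags)                      # Each True adds 1, each False adds 0
print(count)                            # 3

print(issubclass(bool, int))            # True: bool is a subclass of int
print(repr(True), int(True))            # True 1: same value, different display

answer = ['no', 'yes'][True]            # True is used as index 1 here
print(answer)                           # yes
```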
Numeric Extensions
Finally, although Python core numeric types offer plenty of power for most applications, there is a large library of third-party open source extensions available to address more focused needs. Because numeric programming is a popular domain for Python, you'll find a wealth of advanced tools. For example, if you need to do serious number crunching, an optional extension for Python called NumPy (Numeric Python) provides advanced numeric programming tools, such as a matrix data type, vector processing, and sophisticated computation libraries. Hardcore scientific programming groups at places like Los Alamos and NASA use Python with NumPy to implement the sorts of tasks they previously coded in C++, FORTRAN, or Matlab. The combination of Python and NumPy is often compared to a free, more flexible version of Matlab: you get NumPy's performance, plus the Python language and its libraries. Because it's so advanced, we won't talk further about NumPy in this book. You can find additional support for advanced numeric programming in Python, including graphics and plotting tools, statistics libraries, and the popular SciPy package at Python's PyPI site, or by searching the Web. Also note that NumPy is currently an optional extension; it doesn't come with Python and must be installed separately.
Chapter Summary
This chapter has taken a tour of Python's numeric object types and the operations we can apply to them. Along the way, we met the standard integer and floating-point types, as well as some more exotic and less commonly used types such as complex numbers, fractions, and sets. We also explored Python's expression syntax, type conversions, bitwise operations, and various literal forms for coding numbers in scripts. Later in this part of the book, I'll fill in some details about the next object type, the string. In the next chapter, however, we'll take some time to explore the mechanics of variable assignment in more detail than we have here. This turns out to be perhaps the most fundamental idea in Python, so make sure you check out the next chapter before moving on. First, though, it's time to take the usual chapter quiz.
expression X ** 2 or the built-in function pow(X, 2). Either of these last two can also compute the square root when given a power of 0.5 (e.g., X ** .5).
5. The result will be a floating-point number: the integers are converted up to floating point, the most complex type in the expression, and floating-point math is used to evaluate it.
6. The int(N) and math.trunc(N) functions truncate, and the round(N, digits) function rounds. We can also compute the floor with math.floor(N) and round for display with string formatting operations.
7. The float(I) function converts an integer to a floating point; mixing an integer with a floating point within an expression will result in a conversion as well. In some sense, Python 3.0 / division converts too: it always returns a floating-point result that includes the remainder, even if both operands are integers.
8. The oct(I) and hex(I) built-in functions return the octal and hexadecimal string forms for an integer. The bin(I) call also returns a number's binary digits string in Python 2.6 and 3.0. The % string formatting expression and format string method also provide targets for some such conversions.
9. The int(S, base) function can be used to convert from octal, hexadecimal, or binary strings to normal integers (pass in 8, 16, or 2 for the base). The eval(S) function can be used for this purpose too, but it's more expensive to run and can have security issues. Note that integers are always stored in binary in computer memory; these are just display string format conversions.
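The conversion tools named in these answers are easy to try side by side. The following is a short sketch, assuming Python 3.X (where / always performs true division); the variable names are chosen just for this example.

```python
import math

truncated = (int(3.99), math.trunc(-3.99))     # Both drop the fractional part
print(truncated)                               # (3, -3)
print(math.floor(-3.99))                       # -4: floor rounds toward negative infinity
print(round(3.14159, 2))                       # 3.14

print(7 / 2, 7 // 2)                           # 3.5 3: true versus floor division
print(4 ** .5, pow(3, 2))                      # 2.0 9

digits = (oct(64), hex(255), bin(5))           # Integer to digit strings
print(digits)                                  # ('0o100', '0xff', '0b101')
print('%o %x' % (64, 255), format(5, 'b'))     # 100 ff 101

numbers = (int('100', 8), int('ff', 16), int('101', 2))   # Digit strings to integers
print(numbers)                                 # (64, 255, 5)
```

Notice the difference between truncating and flooring for negative numbers: int and math.trunc move toward zero, while math.floor moves down.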
CHAPTER 6
In the prior chapter, we began exploring Python's core object types in depth with a look at Python numbers. We'll resume our object type tour in the next chapter, but before we move on, it's important that you get a handle on what may be the most fundamental idea in Python programming and is certainly the basis of much of both the conciseness and flexibility of the Python language: dynamic typing, and the polymorphism it yields. As you'll see here and later in this book, in Python, we do not declare the specific types of the objects our scripts use. In fact, programs should not even care about specific types; in exchange, they are naturally applicable in more contexts than we can sometimes even plan ahead for. Because dynamic typing is the root of this flexibility, let's take a brief look at the model here.
at least conceptually, Python will perform three distinct steps to carry out the request. These steps reflect the operation of all assignments in the Python language:
1. Create an object to represent the value 3.
2. Create the variable a, if it does not yet exist.
3. Link the variable a to the new object 3.
The net result will be a structure inside Python that resembles Figure 6-1. As sketched, variables and objects are stored in different parts of memory and are associated by links (the link is shown as a pointer in the figure). Variables always link to objects and never to other variables, but larger objects may link to other objects (for instance, a list object has links to the objects it contains).
Figure 6-1. Names and objects after running the assignment a = 3. Variable a becomes a reference to the object 3. Internally, the variable is really a pointer to the object's memory space created by running the literal expression 3.
These links from variables to objects are called references in Python; that is, a reference is a kind of association, implemented as a pointer in memory.* Whenever the variables are later used (i.e., referenced), Python automatically follows the variable-to-object links. This is all simpler than the terminology may imply. In concrete terms:
• Variables are entries in a system table, with spaces for links to objects.
• Objects are pieces of allocated memory, with enough space to represent the values for which they stand.
• References are automatically followed pointers from variables to objects.
At least conceptually, each time you generate a new value in your script by running an expression, Python creates a new object (i.e., a chunk of memory) to represent that value. Internally, as an optimization, Python caches and reuses certain kinds of unchangeable objects, such as small integers and strings (each 0 is not really a new piece of memory; more on this caching behavior later). But, from a logical perspective, it works as though each expression's result value is a distinct object and each object is a distinct piece of memory.
Technically speaking, objects have more structure than just enough space to represent their values. Each object also has two standard header fields: a type designator used to mark the type of the object, and a reference counter used to determine when it's OK to reclaim the object. To understand how these two header fields factor into the model, we need to move on.
* Readers with a background in C may find Python references similar to C pointers (memory addresses). In fact, references are implemented as pointers, and they often serve the same roles, especially with objects that can be changed in-place (more on this later). However, because references are always automatically dereferenced when used, you can never actually do anything useful with a reference itself; this is a feature that eliminates a vast category of C bugs. You can think of Python references as C void* pointers, which are automatically followed whenever used.
This isn't typical Python code, but it does work: a starts out as an integer, then becomes a string, and finally becomes a floating-point number. This example tends to look especially odd to ex-C programmers, as it appears as though the type of a changes from integer to string when we say a = 'spam'. However, that's not really what's happening. In Python, things work more simply. Names have no types; as stated earlier, types live with objects, not names. In the preceding listing, we've simply changed a to reference different objects. Because variables have no type, we haven't actually changed the type of the variable a; we've simply made the variable reference a different type of object. In fact, again, all we can ever say about a variable in Python is that it references a particular object at a particular point in time.
Objects, on the other hand, know what type they are: each object contains a header field that tags the object with its type. The integer object 3, for example, will contain the value 3, plus a designator that tells Python that the object is an integer (strictly speaking, a pointer to an object called int, the name of the integer type). The type designator of the 'spam' string object points to the string type (called str) instead. Because objects know their types, variables don't have to.
To recap, types are associated with objects in Python, not with variables. In typical code, a given variable usually will reference just one kind of object. Because this isn't a requirement, though, you'll find that Python code tends to be much more flexible than you may be accustomed to: if you use Python well, your code might work on many types automatically.
I mentioned that objects have two header fields, a type designator and a reference counter. To understand the latter of these, we need to move on and take a brief look at what happens at the end of an object's life.
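To watch types travel with objects rather than with names, you can ask each object for its type as a name is rebound. The following is a minimal sketch; the helper names show_type and double are hypothetical and exist only for this illustration.

```python
def show_type(obj):
    """Return the name of the type recorded in the object itself."""
    return type(obj).__name__

a = 3
kinds = [show_type(a)]            # The name a currently references an int object
a = 'spam'
kinds.append(show_type(a))        # Same name, now a str object
a = 1.23
kinds.append(show_type(a))        # Same name, now a float object
print(kinds)                      # ['int', 'str', 'float']

def double(x):
    return x * 2                  # Works for any object that supports *

print(double(3), double('spam'), double([1, 2]))    # 6 spamspam [1, 2, 1, 2]
```

The double function never mentions a type, which is why one definition serves numbers, strings, and lists alike.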
The answer is that in Python, whenever a name is assigned to a new object, the space held by the prior object is reclaimed (if it is not referenced by any other name or object). This automatic reclamation of objects' space is known as garbage collection. To illustrate, consider the following example, which sets the name x to a different object on each assignment:
x = 42
x = 'shrubbery'                   # Reclaim 42 now (unless referenced elsewhere)
x = 3.1415                        # Reclaim 'shrubbery' now
x = [1, 2, 3]                     # Reclaim 3.1415 now
First, notice that x is set to a different type of object each time. Again, though this is not really the case, the effect is as though the type of x is changing over time. Remember, in Python types live with objects, not names. Because names are just generic references to objects, this sort of code works naturally.
Second, notice that references to objects are discarded along the way. Each time x is assigned to a new object, Python reclaims the prior object's space. For instance, when it is assigned the string 'shrubbery', the object 42 is immediately reclaimed (assuming it is not referenced anywhere else); that is, the object's space is automatically thrown back into the free space pool, to be reused for a future object.
Internally, Python accomplishes this feat by keeping a counter in every object that keeps track of the number of references currently pointing to that object. As soon as (and exactly when) this counter drops to zero, the object's memory space is automatically reclaimed. In the preceding listing, we're assuming that each time x is assigned to a new object, the prior object's reference counter drops to zero, causing it to be reclaimed.
The most immediately tangible benefit of garbage collection is that it means you can use objects liberally without ever needing to free up space in your script. Python will clean up unused space for you as your program runs. In practice, this eliminates a substantial amount of bookkeeping code required in lower-level languages such as C and C++.
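Reclamation is normally invisible, but the standard library's weakref module offers one way to observe it. The sketch below assumes the standard CPython implementation, where an object is reclaimed as soon as its reference count reaches zero; other implementations may reclaim it later. The Record class is hypothetical and is used because simple built-in objects such as integers and strings cannot be weakly referenced.

```python
import weakref

class Record:
    """Hypothetical class: instances of user-defined classes allow weak references."""

x = Record()
probe = weakref.ref(x)            # A weak reference does not add to the reference count
alive_before = probe() is not None
print(alive_before)               # True: x still references the object

x = 'shrubbery'                   # Rebind x: the Record loses its last normal reference
alive_after = probe() is not None
print(alive_after)                # False in CPython: the object has been reclaimed
```

Calling the weak reference returns the object while it exists and None once it is gone, so the second test shows that the rebinding alone was enough to free the object.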
Technically speaking, Python's garbage collection is based mainly upon reference counters, as described here; however, it also has a component that detects and reclaims objects with cyclic references in time. This component can be disabled if you're sure that your code doesn't create cycles, but it is enabled by default. Because references are implemented as pointers, it's possible for an object to reference itself, or reference another object that does. For example, exercise 3 at the end of Part I and its solution in Appendix B show how to create a cycle by embedding a reference to a list within itself. The same phenomenon can occur for assignments to attributes of objects created from user-defined classes. Though relatively rare, because the reference counts for such objects never drop to zero, they must be treated specially. For more details on Python's cycle detector, see the documentation for the gc module in Python's library manual. Also note that this description of Python's garbage collector applies to the standard CPython only; Jython and IronPython may use different schemes, though the net effect in all is similar: unused space is reclaimed for you automatically.
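A cycle created through an attribute of a class instance can be sketched as follows. This again assumes standard CPython, and the Node class is hypothetical. The automatic cycle detector is paused for the duration of the demo only so that the first test is predictable; gc.disable, gc.collect, and gc.enable are real calls in the standard gc module.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only for this demonstration."""

gc.disable()                      # Pause the automatic cycle detector for a predictable demo
node = Node()
node.me = node                    # The object now references itself: a cycle
probe = weakref.ref(node)         # Lets us check whether the object still exists

del node                          # Remove the only name; the self-reference remains
alive_after_del = probe() is not None
gc.collect()                      # Run the cycle detector explicitly
alive_after_collect = probe() is not None
gc.enable()                       # Restore the default setting

print(alive_after_del, alive_after_collect)     # True False in CPython
```

Because the object's own attribute keeps its reference count above zero, removing the name is not enough; only the cycle detector reclaims it.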
Shared References
So far, we've seen what happens as a single variable is assigned references to objects. Now let's introduce another variable into our interaction and watch what happens to its names and objects:
>>> a = 3
>>> b = a
Typing these two statements generates the scene captured in Figure 6-2. The second line causes Python to create the variable b; the variable a is being used and not assigned here, so it is replaced with the object it references (3), and b is made to reference that object. The net effect is that the variables a and b wind up referencing the same object (that is, pointing to the same chunk of memory). This scenario, with multiple names referencing the same object, is called a shared reference in Python.
Figure 6-2. Names and objects after next running the assignment b = a. Variable b becomes a reference to the object 3. Internally, the variable is really a pointer to the object's memory space created by running the literal expression 3.
As with all Python assignments, this statement simply makes a new object to represent the string value 'spam' and sets a to reference this new object. It does not, however, change the value of b; b still references the original object, the integer 3. The resulting reference structure is shown in Figure 6-3. The same sort of thing would happen if we changed b to 'spam' instead: the assignment would change only b, not a. This behavior also occurs if there are no type differences at all. For example, consider these three statements:
>>> a = 3
>>> b = a
>>> a = a + 2
Figure 6-3. Names and objects after finally running the assignment a = 'spam'. Variable a references the new object (i.e., piece of memory) created by running the literal expression 'spam', but variable b still refers to the original object 3. Because this assignment is not an in-place change to the object 3, it changes only variable a, not b.
In this sequence, the same events transpire. Python makes the variable a reference the object 3 and makes b reference the same object as a, as in Figure 6-2; as before, the last assignment then sets a to a completely different object (in this case, the integer 5, which is the result of the + expression). It does not change b as a side effect. In fact, there is no way to ever overwrite the value of the object 3: as introduced in Chapter 4, integers are immutable and thus can never be changed in-place.
One way to think of this is that, unlike in some languages, in Python variables are always pointers to objects, not labels of changeable memory areas: setting a variable to a new value does not alter the original object, but rather causes the variable to reference an entirely different object. The net effect is that assignment to a variable can impact only the single variable being assigned. When mutable objects and in-place changes enter the equation, though, the picture changes somewhat; to see how, let's move on.
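You can confirm this picture with the is operator, which reports whether two names reference the very same object. The following is a minimal sketch of the three-statement sequence; the names shared_before and shared_after are added here just to record the results.

```python
a = 3
b = a                             # b references the same object as a
shared_before = a is b
print(shared_before)              # True: one object, two names

a = a + 2                         # Makes a new object 5 and rebinds only the name a
shared_after = a is b
print(a, b)                       # 5 3: b still references the original object
print(shared_after)               # False: the names now reference different objects
```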
L1 here is a list containing the objects 2, 3, and 4. Items inside a list are accessed by their positions, so L1[0] refers to object 2, the first item in the list L1. Of course, lists are also objects in their own right, just like integers and strings. After running the two prior assignments, L1 and L2 reference the same object, just like a and b in the prior example (see Figure 6-2). Now say that, as before, we extend this interaction to say the following:
>>> L1 = 24
This assignment simply sets L1 to a different object; L2 still references the original list. If we change this statement's syntax slightly, however, it has a radically different effect:
>>> L1 = [2, 3, 4]                        # A mutable object
>>> L2 = L1                               # Make a reference to the same object
>>> L1[0] = 24                            # An in-place change
>>> L1                                    # L1 is different
[24, 3, 4]
>>> L2                                    # But so is L2!
[24, 3, 4]
Really, we haven't changed L1 itself here; we've changed a component of the object that L1 references. This sort of change overwrites part of the list object in-place. Because the list object is shared by (referenced from) other variables, though, an in-place change like this doesn't only affect L1; that is, you must be aware that when you make such changes, they can impact other parts of your program. In this example, the effect shows up in L2 as well because it references the same object as L1. Again, we haven't actually changed L2, either, but its value will appear different because it has been overwritten.
This behavior is usually what you want, but you should be aware of how it works, so that it's expected. It's also just the default: if you don't want such behavior, you can request that Python copy objects instead of making references. There are a variety of ways to copy a list, including using the built-in list function and the standard library copy module. Perhaps the most common way is to slice from start to finish (see Chapters 4 and 7 for more on slicing):
>>> L1 = [2, 3, 4]
>>> L2 = L1[:]                            # Make a copy of L1
>>> L1[0] = 24
>>> L1
[24, 3, 4]
>>> L2                                    # L2 is not changed
[2, 3, 4]
Here, the change made through L1 is not reflected in L2 because L2 references a copy of the object L1 references; that is, the two variables point to different pieces of memory.
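The other list-copying techniques mentioned earlier behave the same way as the slice. The following sketch compares them side by side, assuming Python 3.3 or later for the list copy method (which postdates the Python versions this book covers); the names copies and results are made up for the example.

```python
import copy

L1 = [2, 3, 4]
copies = {
    'slice': L1[:],               # Slice from start to finish
    'list': list(L1),             # Built-in list function
    'copy': copy.copy(L1),        # Standard library copy module
    'method': L1.copy(),          # List method (Python 3.3 and later)
}

L1[0] = 24                        # An in-place change to the original list only
results = {name: (c, c is L1) for name, c in copies.items()}
for name in sorted(results):
    print(name, results[name])    # Each line shows [2, 3, 4] and False
```

All four make a new top-level list, so none of the copies sees the change made through L1.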
Note that this slicing technique won't work on the other major mutable core types, dictionaries and sets, because they are not sequences; to copy a dictionary or set, instead use their X.copy() method call. Also, note that the standard library copy module has a call for copying any object type generically, as well as a call for copying nested object structures (a dictionary with nested lists, for example):
import copy
X = copy.copy(Y)                  # Make top-level "shallow" copy of any object Y
X = copy.deepcopy(Y)              # Make deep copy of any object Y: copy all nested parts
We'll explore lists and dictionaries in more depth, and revisit the concept of shared references and copies, in Chapters 8 and 9. For now, keep in mind that objects that can be changed in-place (that is, mutable objects) are always open to these kinds of effects. In Python, this includes lists, dictionaries, and some objects defined with class statements. If this is not the desired behavior, you can simply copy your objects as needed.
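The difference between a top-level copy and a deep copy only shows up when objects are nested. Here is a small sketch using a hypothetical record, a dictionary with a nested list; the names Y, shallow, and deep are illustrative only.

```python
import copy

Y = {'name': 'bob', 'jobs': ['dev', 'mgr']}     # Hypothetical nested structure
shallow = copy.copy(Y)            # New dictionary, but it shares the nested list
deep = copy.deepcopy(Y)           # New dictionary and a new nested list

Y['jobs'].append('cto')           # In-place change to the nested list
Y['name'] = 'sue'                 # Top-level change to the original only

print(shallow)                    # {'name': 'bob', 'jobs': ['dev', 'mgr', 'cto']}
print(deep)                       # {'name': 'bob', 'jobs': ['dev', 'mgr']}
print(Y.copy()['jobs'] is Y['jobs'])            # True: the copy method is shallow too
```

The shallow copy is shielded from the top-level change to name, but not from the in-place change to the shared list; only the deep copy is fully independent.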
Because Python caches and reuses small integers and small strings, as mentioned earlier, the object 42 here is probably not literally reclaimed; instead, it will likely remain in a system table to be reused the next time you generate a 42 in your code. Most kinds of objects, though, are reclaimed immediately when they are no longer referenced; for those that are not, the caching mechanism is irrelevant to your code.
For instance, because of Python's reference model, there are two different ways to check for equality in a Python program. Let's create a shared reference to demonstrate:
>>> L = [1, 2, 3]
>>> M = L                                 # M and L reference the same object
>>> L == M                                # Same value
True
>>> L is M                                # Same object
True
The first technique here, the == operator, tests whether the two referenced objects have the same values; this is the method almost always used for equality checks in Python. The second method, the is operator, instead tests for object identity: it returns True only if both names point to the exact same object, so it is a much stronger form of equality testing.
Really, is simply compares the pointers that implement references, and it serves as a way to detect shared references in your code if needed. It returns False if the names point to equivalent but different objects, as is the case when we run two different literal expressions:
>>> L = [1, 2, 3]
>>> M = [1, 2, 3]                         # M and L reference different objects
>>> L == M                                # Same values
True
>>> L is M                                # Different objects
False
Now, watch what happens when we perform the same operations on small numbers:
>>> X = 42
>>> Y = 42                                # Should be two different objects
>>> X == Y
True
>>> X is Y                                # Same object anyhow: caching at work!
True
In this interaction, X and Y should be == (same value), but not is (same object) because we ran two different literal expressions. Because small integers and strings are cached and reused, though, is tells us they reference the same single object.
In fact, if you really want to look under the hood, you can always ask Python how many references there are to an object: the getrefcount function in the standard sys module returns the object's reference count. When I ask about the integer object 1 in the IDLE GUI, for instance, it reports 837 reuses of this same object (most of which are in IDLE's system code, not mine):
>>> import sys
>>> sys.getrefcount(1)                    # 837 pointers to this shared piece of memory
837
This object caching and reuse is irrelevant to your code (unless you run the is check!). Because you cannot change numbers or strings in-place, it doesn't matter how many references there are to the same object. Still, this behavior reflects one of the many ways Python optimizes its model for execution speed.
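Reference counts are more meaningful for objects you create yourself than for cached ones like the integer 1. The following sketch, which assumes standard CPython, looks only at how the count changes when a second name is added, because the absolute numbers vary with the Python version and with where the code runs. The names data, alias, and clone are hypothetical.

```python
import sys

data = ['spam', 'eggs']           # A new list object that only we reference
before = sys.getrefcount(data)
alias = data                      # A shared reference: one more pointer to the same list
after = sys.getrefcount(data)
print(after - before)             # 1 in standard CPython: one new reference

clone = list(data)                # A separate object with the same value
print(alias == data, alias is data)     # True True: same value, same object
print(clone == data, clone is data)     # True False: same value, different object
```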
contexts. As you'll see, it works the same in assignment statements, function arguments, for loop variables, module imports, class attributes, and more. The good news is that there is just one assignment model in Python; once you get a handle on dynamic typing, you'll find that it works the same everywhere in the language.
At the most practical level, dynamic typing means there is less code for you to write. Just as importantly, though, dynamic typing is also the root of Python's polymorphism, a concept we introduced in Chapter 4 and will revisit again later in this book. Because we do not constrain types in Python code, it is highly flexible. As you'll see, when used well, dynamic typing and the polymorphism it provides produce code that automatically adapts to new requirements as your systems evolve.
Chapter Summary
This chapter took a deeper look at Python's dynamic typing model, that is, the way that Python keeps track of object types for us automatically, rather than requiring us to code declaration statements in our scripts. Along the way, we learned how variables and objects are associated by references in Python; we also explored the idea of garbage collection, learned how shared references to objects can affect multiple variables, and saw how references impact the notion of equality in Python.
Because there is just one assignment model in Python, and because assignment pops up everywhere in the language, it's important that you have a handle on the model before moving on. The following quiz should help you review some of this chapter's ideas. After that, we'll resume our object tour in the next chapter, with strings.
CHAPTER 7
Strings
The next major type on our built-in object tour is the Python string, an ordered collection of characters used to store and represent text-based information. We looked briefly at strings in Chapter 4. Here, we will revisit them in more depth, filling in some of the details we skipped then.
From a functional perspective, strings can be used to represent just about anything that can be encoded as text: symbols and words (e.g., your name), contents of text files loaded into memory, Internet addresses, Python programs, and so on. They can also be used to hold the absolute binary values of bytes, and multibyte Unicode text used in internationalized programs.
You may have used strings in other languages, too. Python's strings serve the same role as character arrays in languages such as C, but they are a somewhat higher-level tool than arrays. Unlike in C, in Python, strings come with a powerful set of processing tools. Also unlike languages such as C, Python has no distinct type for individual characters; instead, you just use one-character strings.
Strictly speaking, Python strings are categorized as immutable sequences, meaning that the characters they contain have a left-to-right positional order and that they cannot be changed in-place. In fact, strings are the first representative of the larger class of objects called sequences that we will study here. Pay special attention to the sequence operations introduced in this chapter, because they will work the same on other sequence types we'll explore later, such as lists and tuples.
Table 7-1 previews common string literals and operations we will discuss in this chapter. Empty strings are written as a pair of quotation marks (single or double) with nothing in between, and there are a variety of ways to code strings. For processing, strings support expression operations such as concatenation (combining strings), slicing (extracting sections), indexing (fetching by offset), and so on. Besides expressions, Python also provides a set of string methods that implement common string-specific tasks, as well as modules for more advanced text-processing tasks such as pattern matching. We'll explore all of these later in the chapter.
Interpretation
Empty string
Double quotes, same as single
Escape sequences
Triple-quoted block strings
Raw strings
Byte strings in 3.0 (Chapter 36)
Unicode strings in 2.6 only (Chapter 36)
Concatenate, repeat
Index, slice, length
String formatting expression
String formatting method in 2.6 and 3.0
String method calls: search, remove whitespace, replacement, split on delimiter, content test, case conversion, end test, delimiter join, Unicode encoding, etc.
Iteration, membership
Beyond the core set of string tools in Table 7-1, Python also supports more advanced pattern-based string processing with the standard library's re (regular expression) module, introduced in Chapter 4, and even higher-level text processing tools such as XML parsers, discussed briefly in Chapter 36. This book's scope, though, is focused on the fundamentals represented by Table 7-1.
To cover the basics, this chapter begins with an overview of string literal forms and string expressions, then moves on to look at more advanced tools such as string methods and formatting. Python comes with many string tools, and we won't look at them all here; the complete story is chronicled in the Python library manual. Our goal here is to explore enough commonly used tools to give you a representative sample; methods we won't see in action here, for example, are largely analogous to those we will.
Content note: Technically speaking, this chapter tells only part of the string story in Python: the part most programmers need to know. It presents the basic str string type, which handles ASCII text and works the same regardless of which version of Python you use. That is, this chapter intentionally limits its scope to the string processing essentials that are used in most Python scripts.
From a more formal perspective, ASCII is a simple form of Unicode text. Python addresses the distinction between text and binary data by including distinct object types:
• In Python 3.0 there are three string types: str is used for Unicode text (ASCII or otherwise), bytes is used for binary data (including encoded text), and bytearray is a mutable variant of bytes.
• In Python 2.6, unicode strings represent wide Unicode text, and str strings handle both 8-bit text and binary data.
The bytearray type is also available as a back-port in 2.6, but not earlier, and it's not as closely bound to binary data as it is in 3.0. Because most programmers don't need to dig into the details of Unicode encodings or binary data formats, though, I've moved all such details to the Advanced Topics part of this book, in Chapter 36.
If you do need to deal with more advanced string concepts such as alternative character sets or packed binary data and files, see Chapter 36 after reading the material here. For now, we'll focus on the basic string type and its operations. As you'll find, the basics we'll study here also apply directly to the more advanced string types in Python's toolset.
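As a brief preview of the three 3.X string types just described, here is a minimal sketch, assuming Python 3.X and plain ASCII text; the variable names are chosen for the example, and the full story remains in Chapter 36.

```python
text = 'spam'                             # str: text
data = text.encode('ascii')               # bytes: the same characters as encoded binary data
print(type(text).__name__, type(data).__name__)     # str bytes
print(data)                               # b'spam'
print(data.decode('ascii') == text)       # True: decoding yields the text again

buf = bytearray(data)                     # bytearray: a mutable variant of bytes
buf[0] = ord('S')                         # In-place change, not allowed for str or bytes
print(buf)                                # bytearray(b'Spam')
```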
String Literals
By and large, strings are fairly easy to use in Python. Perhaps the most complicated thing about them is that there are so many ways to write them in your code:

Single quotes: 'spa"m'
Double quotes: "spa'm"
Triple quotes: '''... spam ...''', """... spam ..."""
Escape sequences: "s\tp\na\0m"
Raw strings: r"C:\new\[Link]"
Byte strings in 3.0 (see Chapter 36): b'sp\x01am'
Unicode strings in 2.6 only (see Chapter 36): u'eggs\u0020spam'

The single- and double-quoted forms are by far the most common; the others serve specialized roles, and we're postponing discussion of the last two advanced forms until Chapter 36. Let's take a quick look at all the other options in turn.
The reason for supporting both single and double quotes is that it allows you to embed a quote character of the other variety inside a string without escaping it with a backslash. You may embed a single quote character in a string enclosed in double quote characters, and vice versa:
>>> 'knight"s', "knight's"
('knight"s', "knight's")
Incidentally, Python automatically concatenates adjacent string literals in any expression, although it is almost as simple to add a + operator between them to invoke concatenation explicitly (as we'll see in Chapter 12, wrapping this form in parentheses also allows it to span multiple lines):
>>> title = "Meaning " 'of' " Life"        # Implicit concatenation
>>> title
'Meaning of Life'
Notice that adding commas between these strings would result in a tuple, not a string. Also notice in all of these outputs that Python prefers to print strings in single quotes, unless they embed one. You can also embed quotes by escaping them with backslashes:
>>> 'knight\'s', "knight\"s"
("knight's", 'knight"s')
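The quoting and concatenation rules above can be checked in a short script. This is only a minimal sketch, and the variable names are arbitrary:

```python
# An escaped quote and an embedded quote of the other kind give the same string.
same = 'knight\'s' == "knight's"

# Adjacent literals are concatenated automatically; wrapping them in
# parentheses lets the expression span several lines.
title = ("Meaning "
         'of'
         " Life")

# Commas between the literals build a tuple instead of a single string.
parts = "Meaning ", 'of', " Life"

print(same)
print(title)
print(parts)
```

The difference between title and parts is easy to miss when reading code, which is one reason some programmers prefer an explicit + for concatenation.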
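In a string literal, a backslash and the one or more characters that follow it are replaced with a single character in the resulting string object, which has the binary value specified by the escape sequence. For example, here is a five-character string that embeds a newline and a tab: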
>>> s = 'a\nb\tc'
The two characters \n stand for a single character: the byte containing the binary value of the newline character in your character set (usually, ASCII code 10). Similarly, the sequence \t is replaced with the tab character. The way this string looks when printed depends on how you print it. The interactive echo shows the special characters as escapes, but print interprets them instead:
>>> s
'a\nb\tc'
>>> print(s)
a
b       c
To be completely sure how many bytes are in this string, use the built-in len function; it returns the actual number of bytes in a string, regardless of how it is displayed:
>>> len(s)
5
This string is five bytes long: it contains an ASCII a byte, a newline byte, an ASCII b byte, and so on. Note that the original backslash characters are not really stored with the string in memory; they are used to tell Python to store special byte values in the string. For coding such special bytes, Python recognizes a full set of escape code sequences, listed in Table 7-2.
Table 7-2. String backslash characters

Escape         Meaning
\newline       Ignored (continuation line)
\\             Backslash (stores one \)
\'             Single quote (stores ')
\"             Double quote (stores ")
\a             Bell
\b             Backspace
\f             Formfeed
\n             Newline (linefeed)
\r             Carriage return
\t             Horizontal tab
\v             Vertical tab
\xhh           Character with hex value hh (exactly 2 digits)
\ooo           Character with octal value ooo (up to 3 digits)
\0             Null: binary 0 character (doesn't end string)
\N{ id }       Unicode database ID
\uhhhh         Unicode 16-bit hex
\Uhhhhhhhh     Unicode 32-bit hex (a)
\other         Not an escape (keeps both \ and other)

a. The \Uhhhh... escape sequence takes exactly eight hexadecimal digits (h); both \u and \U can be used only in Unicode string literals.
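A few rows of Table 7-2 can be verified directly. The sketch below assumes Python 3.x, where every str literal is a Unicode string and so accepts \u, \U, and \N{...}; in 2.6 those three work only in u'...' literals. The variable names are illustrative:

```python
# Assumes Python 3.x.
hex_a = '\x41'                        # hex escape for character code 65
octal_a = '\101'                      # the same character coded in octal
uni16 = '\u0041'                      # 16-bit Unicode escape
uni32 = '\U00000041'                  # 32-bit escape: exactly eight hex digits
named = '\N{LATIN CAPITAL LETTER A}'  # lookup by Unicode database name
joined = 'spam\
eggs'                                 # backslash at end of line is ignored

print(hex_a, octal_a, uni16, uni32, named, joined)
print(len('a\0b'))                    # the null character does not end the string
```

All five single-character escapes produce the same one-character string, which shows that the escape form is only a way of typing the value, not part of what is stored.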
Some escape sequences allow you to embed absolute binary values into the bytes of a string. For instance, here's a five-character string that embeds two binary zero bytes (coded as octal escapes of one digit):
>>> s = 'a\0b\0c'
>>> s
'a\x00b\x00c'
>>> len(s)
5
In Python, the zero (null) byte does not terminate a string the way it typically does in C. Instead, Python keeps both the string's length and text in memory. In fact, no character terminates a string in Python. Here's a string that is all absolute binary escape codes: a binary 1 and 2 (coded in octal), followed by a binary 3 (coded in hexadecimal):
>>> s = '\001\002\x03'
>>> s
'\x01\x02\x03'
>>> len(s)
3
Notice that Python displays nonprintable characters in hex, regardless of how they were specified. You can freely combine absolute value escapes and the more symbolic escape types in Table 7-2. The following string contains the characters spam, a tab and newline, and an absolute zero value byte coded in hex:
>>> S = "s\tp\na\x00m"
>>> S
's\tp\na\x00m'
>>> len(S)
7
>>> print(S)
s       p
a m
This becomes more important to know when you process binary data files in Python. Because their contents are represented as strings in your scripts, it's OK to process binary files that contain any sorts of binary byte values (more on files in Chapter 9).*
* If you need to care about binary data files, the chief distinction is that you open them in binary mode (using open mode flags with a b, such as 'rb', 'wb', and so on). In Python 3.0, binary file content is a bytes string, with an interface similar to that of normal strings; in 2.6, such content is a normal str string. See also the standard struct module introduced in Chapter 9, which can parse binary data loaded from a file, and the extended coverage of binary files and byte strings in Chapter 36.
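To make the footnote concrete, here is a hedged sketch of writing and reading a binary file in Python 3.x. The packed values, the format string, and the use of a temporary file are all assumptions made for illustration; Chapter 9 and Chapter 36 give the full story:

```python
# Assumes Python 3.x; the temporary file is a throwaway used only for this demo.
import os
import struct
import tempfile

payload = struct.pack('>i4sh', 7, b'spam', 8)   # 4 + 4 + 2 = 10 packed bytes

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'wb') as f:                 # 'wb': write in binary mode
        f.write(payload)
    with open(path, 'rb') as f:                 # 'rb': read in binary mode
        content = f.read()
finally:
    os.remove(path)

values = struct.unpack('>i4sh', content)        # parse the bytes back into objects
print(type(content).__name__, len(content), values)
```

In 3.x the data read back is a bytes object rather than a str, although it supports most of the same operations.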
Finally, as the last entry in Table 7-2 implies, if Python does not recognize the character after a \ as being a valid escape code, it simply keeps the backslash in the resulting string:
>>> x = "C:\py\code"          # Keeps \ literally
>>> x
'C:\\py\\code'
>>> len(x)
10
Unless you're able to commit all of Table 7-2 to memory, though, you probably shouldn't rely on this behavior. To code literal backslashes explicitly such that they are retained in your strings, double them up (\\ is an escape for one \) or use raw strings; the next section shows how.
Sometimes the special treatment of backslashes for introducing escapes can lead to trouble. Python newcomers often code a Windows filename as a normal string literal, as in open('C:\new\[Link]', 'w'), thinking that they will open a file called [Link] in the directory C:\new. The problem here is that \n is taken to stand for a newline character, and \t is replaced with a tab. In effect, the call tries to open a file named C:(newline)ew(tab)[Link], with usually less than stellar results.

This is just the sort of thing that raw strings are useful for. If the letter r (uppercase or lowercase) appears just before the opening quote of a string, it turns off the escape mechanism. The result is that Python retains your backslashes literally, exactly as you type them. Therefore, to fix the filename problem, just remember to add the letter r on Windows:
myfile = open(r'C:\new\[Link]', 'w')
Alternatively, because two backslashes are really an escape sequence for one backslash, you can keep your backslashes by simply doubling them up:
myfile = open('C:\\new\\[Link]', 'w')
In fact, Python itself sometimes uses this doubling scheme when it prints strings with embedded backslashes:
>>> path = r'C:\new\[Link]'
>>> path                          # Show as Python code
'C:\\new\\[Link]'
>>> print(path)                   # User-friendly format
C:\new\[Link]
>>> len(path)                     # String length
15

In classes, I've met people who have indeed committed most or all of this table to memory; I'd probably think that was really sick, but for the fact that I'm a member of the set, too.
As with numeric representation, the default format at the interactive prompt prints results as if they were code, and therefore escapes backslashes in the output. Printing the string with print provides a more user-friendly format that shows that there is actually only one backslash in each spot. To verify this is the case, you can check the result of the built-in len function, which returns the number of bytes in the string, independent of display formats. If you count the characters in the print(path) output, you'll see that there really is just 1 character per backslash, for a total of 15.

Besides directory paths on Windows, raw strings are also commonly used for regular expressions (text pattern matching, supported with the re module introduced in Chapter 4). Also note that Python scripts can usually use forward slashes in directory paths on Windows and Unix because Python tries to interpret paths portably (i.e., 'C:/new/[Link]' works when opening files, too). Raw strings are useful if you code paths using native Windows backslashes, though.
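The difference between normal, raw, and doubled-backslash literals can be sketched as follows. The path C:\new\table.txt is a hypothetical name chosen only because it contains both \n and \t; it is not taken from the text above, and the regular expression is likewise just an illustration:

```python
import re

# Hypothetical Windows path, chosen so that \n and \t both appear in it.
plain = 'C:\new\table.txt'       # \n and \t are taken as newline and tab
raw = r'C:\new\table.txt'        # raw string: backslashes are kept literally
doubled = 'C:\\new\\table.txt'   # doubled backslashes: same result as raw

print(repr(plain))               # escapes were interpreted
print(raw, len(raw))             # one character per backslash

# Raw strings are also common for regular expression patterns, where a
# literal backslash in the text must be written as \\ in the pattern.
match = re.match(r'C:\\(\w+)\\', raw)
print(match.group(1))
```

Without the r prefix, the pattern itself would need four backslashes for each literal one, which is why raw strings are the usual choice with the re module.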
Despite its role, even a raw string cannot end in a single backslash, because the backslash escapes the following quote character; you still must escape the surrounding quote character to embed it in the string. That is, r"...\" is not a valid string literal; a raw string cannot end in an odd number of backslashes. If you need to end a raw string with a single backslash, you can use two and slice off the second (r'1\nb\tc\\'[:-1]), tack one on manually (r'1\nb\tc' + '\\'), or skip the raw string syntax and just double up the backslashes in a normal string ('1\\nb\\tc\\'). All three of these forms create the same eight-character string containing three backslashes.
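The three workarounds just described can be compared directly. This short check uses the same literals as the paragraph above; only the variable names are invented:

```python
sliced = r'1\nb\tc\\'[:-1]     # code two trailing backslashes, slice off the second
added = r'1\nb\tc' + '\\'      # tack a single backslash on manually
escaped = '1\\nb\\tc\\'        # skip raw syntax and double every backslash

print(sliced == added == escaped)
print(len(sliced), sliced.count('\\'))
```

Each form yields the same eight-character string with three backslashes, so the choice among them is a matter of readability.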
The triple-quoted string assigned to mantra spans three lines (in some interfaces, the interactive prompt changes to ... on continuation lines; IDLE simply drops down one line). Python collects all the triple-quoted text into a single multiline string, with embedded newline characters (\n) at the places where your code has line breaks. Notice that, as in the literal, the second line in the result has a leading space, but the third does not; what you type is truly what you get. To see the string with the newlines interpreted, print it instead of echoing:
>>> print(mantra)
Always look
 on the bright
side of life.
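The statement that creates mantra does not appear above. A definition consistent with the description and with the printed output would look something like the following; treat it as a reconstruction rather than the original listing:

```python
# Reconstructed for illustration: three lines, with a leading space on the second.
mantra = """Always look
 on the bright
side of life."""

print(repr(mantra))    # echo form: line breaks appear as \n
print(mantra)          # print form: line breaks are interpreted
```

The repr form makes the two embedded newline characters and the leading space on the second line visible.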
Triple-quoted strings are useful any time you need multiline text in your program; for example, to embed multiline error messages or HTML or XML code in your source code files. You can embed such blocks directly in your scripts without resorting to external text files or explicit concatenation and newline characters.

Triple-quoted strings are also commonly used for documentation strings, which are string literals that are taken as comments when they appear at specific points in your file (more on these later in the book). These don't have to be triple-quoted blocks, but they usually are to allow for multiline comments.

Finally, triple-quoted strings are also sometimes used as a horribly hackish way to temporarily disable lines of code during development (OK, it's not really too horrible, and it's actually a fairly common practice). If you wish to turn off a few lines of code and run your script again, simply put three quotes above and below them, like this:
X = 1
"""
import os                            # Disable this code temporarily
print([Link]())
"""
Y = 2
I said this was hackish because Python really does make a string out of the lines of code disabled this way, but this is probably not significant in terms of performance. For large sections of code, it's also easier than manually adding hash marks before each line and later removing them. This is especially true if you are using a text editor that does not have support for editing Python code specifically. In Python, practicality often beats aesthetics.
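Both uses of triple-quoted blocks mentioned above, documentation strings and temporarily disabled code, can be seen in one small sketch. The function square is hypothetical and exists only to carry a docstring:

```python
def square(x):
    """
    Return the square of x.

    Hypothetical function, used only to show a multiline docstring.
    """
    return x * x

"""
print('this line is disabled')    # inside a string, so it never runs
"""

print(square.__doc__)             # the docstring is retained on the function
print(square(4))
```

The block between the function and the print calls is evaluated as a string expression and then discarded, which is why the statement inside it has no effect.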
Strings in Action
Once you've created a string with the literal expressions we just met, you will almost certainly want to do things with it. This section and the next two demonstrate string expressions, methods, and formatting, the first line of text-processing tools in the Python language.
Basic Operations
Let's begin by interacting with the Python interpreter to illustrate the basic string operations listed earlier in Table 7-1. Strings can be concatenated using the + operator and repeated using the * operator:
% python
>>> len('abc')              # Length: number of items
3
>>> 'abc' + 'def'           # Concatenation: a new string
'abcdef'
>>> 'Ni!' * 4               # Repetition: like "Ni!" + "Ni!" + ...
'Ni!Ni!Ni!Ni!'
Formally, adding two string objects creates a new string object, with the contents of its operands joined. Repetition is like adding a string to itself a number of times. In both cases, Python lets you create arbitrarily sized strings; there's no need to predeclare anything in Python, including the sizes of data structures. The len built-in function returns the length of a string (or any other object with a length).

Repetition may seem a bit obscure at first, but it comes in handy in a surprising number of contexts. For example, to print a line of 80 dashes, you can count up to 80, or let Python count for you:
>>> print('------- ...more... ---')      # 80 dashes, the hard way
>>> print('-' * 80)                      # 80 dashes, the easy way
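As one possible use of repetition together with concatenation, the sketch below frames a line of text between two rules of dashes. The helper name banner and its width parameter are invented for this example:

```python
def banner(text, width=40):
    """Hypothetical helper: frame text between two rules of dashes."""
    rule = '-' * width                         # repetition builds the separator
    return rule + '\n' + text + '\n' + rule    # concatenation makes a new string

result = banner('Ni!' * 4)
print(result)
print(len('-' * 80))                           # let Python do the counting
```

Because both operators return new string objects, the original operands are left unchanged and can be reused freely.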
Notice that operator overloading is at work here already: we're using the same + and * operators that perform addition and multiplication when using numbers. Python does the correct operation because it knows the types of the objects being added and multiplied. But be careful: the rules aren't quite as liberal as you might expect. For instance, Python doesn't allow you to mix numbers and strings in + expressions: 'abc'+9 raises an error instead of automatically converting 9 to a string.

As shown in the last row in Table 7-1, you can also iterate over strings in loops using for statements and test membership for both characters and substrings with the in expression operator, which is essentially a search. For substrings, in is much like the [Link]() method covere