📚 Python Basics & Beyond
Brief Overview
This note covers Python programming and was created from the Harvard CS50’s Introduction to Programming
with Python – Full University Course YouTube video. It’s a 958‑minute deep dive that walks you through the
fundamentals of the language, from simple statements to complex projects, so you can jump straight into coding
without missing a beat.
Key Points
Core Python syntax and data types
Building and using functions
Control flow with if‑else, loops, and comprehensions
Working with files, libraries, and unit tests
📚 Course Overview
CS50’s Introduction to Programming with Python teaches reading, writing, testing, and debugging Python
code.
Designed for learners with or without prior programming experience.
Covers core concepts: functions, variables, conditionals, loops, exceptions, libraries, unit tests, file
I/O, regular expressions, and object‑oriented programming.
🛠️ Development Environment
A text editor is any program that lets you write plain text; code is just text.
Visual Studio Code (VS Code) is a popular free editor with:
Syntax coloring (e.g., print appears in blue).
Integrated terminal for command‑line execution.
Debugging tools.
Any plain‑text editor (e.g., Notepad, Sublime Text) works; avoid word processors like Microsoft Word.
▶️ Running Python Programs
The Python interpreter reads a .py file, translates it into instructions the computer can execute, and runs them.
1. Create a file ending in .py (e.g., hello.py).
2. In the terminal, execute:
$ python hello.py
The $ symbol denotes the command prompt, not currency; you do not type it.
🧩 Functions
A function is a reusable action (verb) that performs a task.
Built‑in example: print() displays output.
Functions may have arguments (inputs) that influence behavior.
print("Hello, World")
Functions can produce side effects such as displaying text or playing audio.
📥 Input & Return Values
The input() function prompts the user and returns the entered string.
name = input("What’s your name? ")
The returned value is stored in a variable for later use.
🔀 Variables & Assignment
A variable is a named container that holds a value in memory.
Assignment uses a single = (assignment operator), copying the right‑hand value to the left‑hand name.
name = input("What’s your name? ")
print(name)
Omitting quotes around a variable name tells Python to use the variable’s value instead of the literal
text.
⚠️ Syntax Errors & Debugging
A syntax error occurs when code violates the language’s grammatical rules.
Example: missing a closing parenthesis.
print("Hello, World" # ← SyntaxError: '(' was never closed (older versions report: unexpected EOF while parsing)
The interpreter points out the location of the error; fixing the typo resolves the issue.
🗒️ Comments
Comments are notes for humans that the interpreter ignores.
In Python, a comment starts with #.
# This line explains the purpose of the following code
📊 Programming Paradigms Overview
Paradigm          Description                                                           Typical Use
Procedural        Write step‑by‑step functions that execute in order.                   Simple scripts, scripts that follow a linear flow
Object‑Oriented   Model real‑world entities as objects with attributes & methods.       Larger applications, code reuse via classes
Functional        Emphasize pure functions, immutability, and higher‑order functions.   Data transformation pipelines, concise logic
🧩 Additional Topics Mentioned
Conditionals – execute code based on true/false tests.
Loops – repeat actions a set number of times or while a condition holds.
Exceptions – handle runtime errors gracefully using try/except.
Libraries – reuse third‑party code to avoid reinventing the wheel.
Unit Tests – write tests that verify your own code works as intended.
File I/O – read from and write to files for persistent storage.
Regular Expressions – define patterns to validate or extract text data.
These concepts are introduced later in the course and build on the foundations covered above.
📝 Comments & Pseudo‑code
Comment – text preceded by # that the interpreter ignores; used to explain code to humans.
Single‑line comment example:
# ask user for their name
Multi‑line comment technique using triple quotes (strictly a string literal that Python evaluates and discards, not a true comment):
"""
This block serves as a comment.
It can span multiple lines.
"""
Pseudo‑code: informal, language‑agnostic description of program steps (e.g., “step 1: ask user for their
name”). Helpful for planning before writing actual code.
🔤 Strings, Concatenation & the + Operator
String – a sequence of characters (text) stored in a variable.
Concatenation joins two strings into one. When both operands are strings, the + operator performs
concatenation rather than numeric addition.
greeting = "Hello, " + name
Chaining more than two strings is possible but can become hard to read:
message = "Hi, " + first_name + " " + last_name + "!"
For readability, prefer other techniques (comma separation, sep, or f‑strings) when combining many
parts.
📤 Printing: Arguments, sep, & end
print() – built‑in function that outputs text to the console.
Positional arguments: values separated by commas are printed in order. Python inserts a default
separator (sep) of a single space between them.
print("Hello,", name) # output: Hello, David
sep parameter: changes the string inserted between multiple arguments.
print("Hello,", name, sep="---")
# output: Hello,---David
end parameter: defines what is printed after the last argument. Default is "\n" (newline). Setting end=""
keeps the cursor on the same line.
print("Hello, ", end="") # no newline
print(name) # continues on the same line
Combining sep and end offers fine‑grained control over output formatting.
🔧 Escape Characters
Escape character (\) – signals that the following character should be interpreted specially.
Common escapes:
\n – newline
\" – double‑quote inside a double‑quoted string
\' – single‑quote inside a single‑quoted string
\\ – literal backslash
print("She said, \"Hello!\"") # output includes the inner quotes
Useful for embedding quotes or other special symbols without terminating the string.
📦 Formatted (f‑) Strings
f‑string – string prefixed with f that allows expressions inside {} to be evaluated and inserted.
Simplifies embedding variable values directly within a string.
print(f"Hello, {name}")
VS Code highlights the {} sections, indicating they will be evaluated at runtime.
Preferred for readability over manual concatenation, especially with many variables.
🛠️ Practical Tips & Best Practices
Comment frequency – add a comment every few lines to clarify intent.
Use pseudo‑code when planning a program; convert steps to actual code afterward.
Choose the right string‑building method:
Simple concatenation (+) for two parts.
Comma‑separated print for quick output with automatic spacing.
sep/end for custom delimiters or line control.
f‑strings for clear, maintainable interpolation.
Escape quotes when the same quote character is needed inside the string, or switch between single
and double quotes.
🧹 String Methods for Cleaning Input
Method – a function that is attached to an object (e.g., a string) and is invoked with a dot (.).
strip() – removes whitespace from the beginning and end of a string.
name = name.strip()
capitalize() – makes the first character uppercase and the rest lowercase.
name = name.capitalize()
title() – capitalizes the first character of each word (useful for full names).
name = name.title()
lstrip() / rstrip() – remove whitespace only from the left or right side, respectively.
Method chaining
You can apply several methods in one expression; the result of the left‑most method becomes the input for the
next.
name = name.strip().title()
The assignment operator (=) copies the final return value back into name.
Chaining reduces line count but can make a line long; readability is a matter of style.
📂 Splitting a String
split(separator) – divides a string at each occurrence of separator and returns a list of substrings.
first, last = name.split(" ")
The left‑hand side can unpack the resulting list directly into separate variables.
Useful for extracting a first name from a full name entered by the user.
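A minimal sketch of cleaning and then splitting a name. The sample string stands in for user input, and first_name is a hypothetical helper, not something from the course. Note that unpacking into exactly two names raises ValueError unless the split produces exactly two parts, so the helper takes only the first word instead.

```python
# Hypothetical sample standing in for what a user might type.
name = "  david malan "

# Clean first, then split on the single space between the two words.
first, last = name.strip().title().split(" ")
print(f"hello, {first}")


def first_name(full_name):
    """Return the first word of a name, or "" if there is none."""
    parts = full_name.split()  # no argument: split on any run of whitespace
    return parts[0] if parts else ""


print(first_name("Hermione Jean Granger"))
```

Calling split() with no argument is more forgiving than split(" ") when the input contains extra spaces.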
🔢 Integers & Basic Arithmetic
int – a numeric type representing whole numbers (…, ‑2, ‑1, 0, 1, 2, …).
Common operators:
Operator  Meaning                             Example
+         addition                            3 + 4
-         subtraction                         5 - 2
*         multiplication                      6 * 7
/         division (float result)             8 / 2
%         modulo – remainder after division   8 % 3 yields 2
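The operators in the table can be tried directly. This short sketch uses illustrative variable names and shows in particular that / produces a float even when the division is exact.

```python
total = 3 + 4        # addition
difference = 5 - 2   # subtraction
product = 6 * 7      # multiplication
quotient = 8 / 2     # division always yields a float
remainder = 8 % 3    # modulo: what is left over after dividing 8 by 3

print(total, difference, product, quotient, remainder)  # 7 3 42 4.0 2
```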
🖥️ Interactive (REPL) Mode
REPL – “Read‑Eval‑Print Loop”; an interactive prompt where each line is executed immediately.
Launch with python (no file name).
Acts as a quick calculator: typing 1 + 2 returns 3.
Helpful for testing snippets without creating a script file.
🧮 Building a Simple Calculator
1. Prompt for input (still returns strings).
2. Convert each entry to an integer with int().
3. Perform arithmetic and print the result.
x = int(input("What is X? "))
y = int(input("What is Y? "))
print(x + y)
Nesting functions: int(input(...)) calls input first, then feeds its string result to int.
A temporary variable (e.g., z = x + y followed by print(z)) is optional; you can print the expression directly.
Style note
Removing a variable that is used only once (z) shortens the program and clarifies intent.
Over‑chaining (e.g., input(...).strip().title().split(" ")) can become hard to read; balance brevity with clarity.
📊 Comparing Two Approaches
Aspect        Separate statements                 One‑liner (chained)
Readability   Clear step‑by‑step flow             Concise but may be dense
Line count    More lines                          Fewer lines
Debugging     Easy to isolate each step           Harder to pinpoint errors
Typical use   Teaching, collaborative codebases   Scripts where brevity is prized
🛠️ Additional String Utilities Mentioned
lstrip() – removes leading whitespace only.
rstrip() – removes trailing whitespace only.
These can be combined similarly:
clean = raw.lstrip().rstrip()
📌 Key Takeaways
String methods (strip, capitalize, title, split) let you clean and reshape user input without extra
libraries.
Method chaining updates a variable in a single line, but watch line length for maintainability.
int() converts a numeric‑looking string to an integer, enabling true arithmetic.
The REPL is a fast sandbox for experimenting with expressions and functions.
Code style (separate vs. chained statements) is a trade‑off between readability and brevity; choose
the convention that best fits your team or project.
💡 Readability vs. Conciseness
Readability is the ease with which another programmer (or you later) can understand code.
Conciseness is expressing the same functionality in as few characters or lines as possible.
One‑liner example (compact but hard to read):
print(int(input("X? ")) + int(input("Y? ")))
Multi‑line version (clear separation of steps):
x = int(input("X? "))
y = int(input("Y? "))
print(x + y)
Trade‑off considerations
Fewer lines ⇒ less visual clutter, but more mental parsing (matching parentheses, tracking
temporary values).
More lines ⇒ easier debugging, clearer intent, lower chance of syntax errors.
Best practice: prioritize readability, especially when functions involve user input or multiple nested calls.
🔢 Floating‑Point Numbers
A float (floating‑point value) represents a real number with a decimal point, e.g., 3.14.
Convert user input to a float:
x = float(input("Enter first number: "))
y = float(input("Enter second number: "))
Example calculation:
print(x + y) # 1.2 + 3.4 → 4.6
Precision limit: floats cannot represent infinitely many decimal digits; they are stored with a fixed
amount of binary memory, leading to rounding errors after a certain length.
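The precision limit is easy to observe. The sketch below uses the classic 0.1 + 0.2 case and math.isclose from the standard library, which is one common way (not the only one) to compare floats with a tolerance.

```python
import math

total = 0.1 + 0.2
print(total)                      # 0.30000000000000004
print(total == 0.3)               # False: the stored binary values differ slightly
print(math.isclose(total, 0.3))   # True: equal within a small tolerance
```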
🔁 Rounding Numbers
The built‑in round() function rounds a numeric value to a desired precision.
Basic usage (nearest integer):
result = round(x + y) # no second argument → integer
Specifying decimal places (second optional argument):
rounded = round(x / y, 2) # rounds to two digits after the decimal point
Documentation convention: items in square brackets ([ ]) denote optional parameters, e.g., round(number[, ndigits]).
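A few illustrative calls, with variable names chosen for this sketch. One behavior worth knowing: when a value is exactly halfway between two integers, Python's round() picks the even one.

```python
two_places = round(3.14159, 2)   # keep two digits after the decimal point
ratio = round(2 / 3, 2)          # 0.666... becomes 0.67
whole = round(7.6)               # no second argument: result is an int

print(two_places, ratio, whole)  # 3.14 0.67 8
print(round(2.5), round(3.5))    # 2 4 (ties go to the nearest even integer)
```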
📊 Number Formatting with f‑Strings
f‑strings allow inline expression evaluation and formatting inside string literals.
Adding thousands separators (commas)
z = 1000
print(f"{z:,}") # → 1,000
Controlling decimal places (:.2f → two fractional digits)
value = 2 / 3
print(f"{value:.2f}") # → 0.67
Combining both features
big = 1234567.8912
print(f"{big:,.2f}") # → 1,234,567.89
Note: the format spec follows a colon after the expression inside the braces.
⚙️ Defining Your Own Functions
def introduces a new function definition.
Minimal function (no parameters)
def hello():
    print("Hello")
Calling the function
hello() # prints: Hello
NameError occurs when a function is called before it has been defined.
🔀 Parameters, Arguments, and Default Values
Parameter – variable listed in the function’s definition.
Argument – actual value supplied at the call site.
def greet(name):
    print(f"Hello, {name}")

greet("David") # argument "David" → parameter name
Default parameter values provide a fallback when no argument is passed.
def hello(to="world"):
    print(f"Hello, {to}")

hello() # → Hello, world
hello("Alice") # → Hello, Alice
Inside the function, the argument value is bound to the parameter name, so the caller's variable name
does not have to match the parameter name.
🛠️ Practical Tips
Prefer explicit variable names (x, y) over chaining multiple calls in one line for complex logic.
Indent consistently (four spaces) so Python can correctly associate statements with their function
bodies.
Use f‑strings for both interpolation and formatting; they replace older %‑style formatting and are
easier to read.
Round only when needed; keep the original float for further calculations to avoid cumulative rounding
errors.
Define reusable functions to avoid repeating the same code block (e.g., a custom hello function).
When adding new features (e.g., handling floats), adjust type conversion consistently (int → float)
to keep the program behavior predictable.
📂 Function Order & NameError
NameError – occurs when Python tries to use a name (variable or function) that hasn’t been defined yet.
Python reads a file top‑to‑bottom.
A function must be defined before it is called.
If a call appears earlier, the interpreter raises NameError: name 'hello' is not defined.
✅ Standard pattern
1. Define a main() function that contains the program’s primary flow.
2. Define helper functions (e.g., hello) anywhere below main.
3. Call main() at the very end of the file.
def main():
    name = input("What’s your name? ")
    hello(name)

def hello(person):
    print(f"Hello, {person}")

main() # triggers the program
This arrangement lets you keep the logical “top‑to‑bottom” reading order while still using functions
defined later.
📦 Scope
Scope – the region of a program where a variable name is valid and can be accessed.
A variable created inside a function (e.g., name inside main) is local to that function.
Other functions cannot see that variable unless it is passed as an argument.
def main():
    name = input("Your name: ")
    hello() # ❌ fails: hello cannot see main's local `name`

def hello():
    print(name) # NameError – `name` is undefined in this scope
Solution: pass the value as a parameter (hello(name)) or define the variable in a broader scope (e.g.,
global, though globals are discouraged).
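A minimal sketch of the parameter‑passing fix described above. The fixed string stands in for input(...) so the example runs unattended, and make_greeting is a hypothetical helper added only to make the result easy to check.

```python
def main():
    name = "Hermione"   # stands in for input("Your name: ")
    hello(name)         # pass the local value explicitly


def make_greeting(person):
    # `person` receives whatever value the caller passed in
    return f"hello, {person}"


def hello(person):
    print(make_greeting(person))


main()  # prints: hello, Hermione
```

The variable is called name in main and person in hello; the two names are independent because each lives in its own function's scope.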
🔁 Return Values
return – keyword that exits a function and sends a value back to the caller.
Functions that only print have a side effect but produce no usable value.
Using return lets other code capture the result for further computation.
def square(n):
    return n * n # value can be stored or printed elsewhere
The returned value can be passed directly to another function, e.g., print(square(x)).
📐 Building a Simple Calculator
Steps demonstrated
1. Prompt for a number and convert it to int.
2. Call a custom square() function to compute x².
3. Print the result.
def main():
    x = int(input("Enter X: "))
    print(f"x² is {square(x)}")

def square(n):
    return n * n

main()
Alternative exponentiation syntax
n ** 2 – raises n to the power of 2.
pow(n, 2) – built‑in function with the same effect.
All three approaches (n * n, n ** 2, pow(n, 2)) yield the same result; choose the one that best conveys intent.
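A quick check that the three spellings agree, using a few sample values chosen for illustration.

```python
def square(n):
    return n * n


# Compare the three approaches on integers, a negative number, and a float.
for n in (0, 3, -4, 2.5):
    assert square(n) == n ** 2 == pow(n, 2)

print(square(3), 3 ** 2, pow(3, 2))  # 9 9 9
```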
⚖️ Comparison Operators
Operator  Meaning                  Example
>         greater than             x > y
>=        greater than or equal    x >= y
<         less than                x < y
<=        less than or equal       x <= y
==        equality (comparison)    x == y
!=        inequality               x != y
A single = performs assignment, not comparison.
Two = (==) checks whether the left‑hand and right‑hand values are equal.
🧩 Conditional Statements
Basic if
if x < y:
    print("x is less than y")
The line ends with a colon (:).
The indented block (four spaces or one tab) runs only if the Boolean expression is True.
Adding elif (else‑if)
if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
elif is evaluated only when previous conditions were False.
Prevents unnecessary checks, making the flow mutually exclusive.
Final else
if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
else:
    print("x equals y")
else captures all remaining cases; no additional condition needed.
Guarantees that exactly one block runs.
Boolean expressions
Boolean expression – a statement that evaluates to True or False.
Used as the condition after if, elif, or while.
Example: x != y evaluates to True when x and y differ.
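A comparison is itself a value of type bool, so it can be stored and printed like any other value. A short sketch with sample numbers:

```python
x, y = 1, 2

is_less = x < y          # the comparison produces a bool
print(is_less)           # True
print(type(is_less))     # <class 'bool'>
print(x == y, x != y)    # False True
```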
📊 Visualizing Control Flow
Flowcharts illustrate how a program moves from start to finish.
Key symbols:
Oval – start/end.
Diamond – decision point (Boolean test).
Arrows – direction of execution based on true/false outcomes.
With if / elif / else, the flowchart narrows after a true branch, skipping later tests, which reduces the
number of questions the program asks.
🛠️ Best‑Practice Takeaways
Define all functions before they are invoked, or wrap the entry point in main() and call it last.
Pass needed data between functions via parameters to respect scope rules.
Use return when you need a function’s result for further processing; reserve print for user‑facing output.
Use == for comparison and = for assignment; Python rejects a bare = inside a condition with a SyntaxError.
Structure conditionals with if → elif → else to create clear, mutually exclusive branches and minimize
redundant checks.
These patterns keep Python code readable, maintainable, and free of runtime NameError or scope issues.
🔀 Simplifying Conditional Chains with else 🚦
else – the fallback block that runs when all preceding if/elif tests are false.
Replacing a final elif that only checks x == y with a single else removes the need for a third comparison.
The flowchart shrinks: fewer nodes → fewer arrows → reduced logical complexity.
Fewer lines of code lower the chance of mistakes and improve readability.
if x < y:
    print("x is less than y")
elif x > y:
    print("x is greater than y")
else: # x must equal y
    print("x is equal to y")
🧩 Combining Conditions with or and and ⚙️
Using or for “not equal”
or – logical operator that yields true if any operand is true.
if x < y or x > y:
    print("x is not equal to y")
else:
    print("x is equal to y")
Using and for range checks
and – logical operator that yields true only when both operands are true.
if 90 <= score <= 100: # chained comparison (same as 90 <= score and score <= 100)
    print("A")
elif 80 <= score < 90:
    print("B")
elif 70 <= score < 80:
    print("C")
elif 60 <= score < 70:
    print("D")
else:
    print("F")
Chained comparisons are a Pythonic shortcut; they evaluate left‑to‑right and stop early when a
condition fails.
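To illustrate that a chained comparison and its explicit and form agree, here is a small sketch. Both function names are hypothetical.

```python
def in_a_range(score):
    # Chained form: reads like the mathematical notation 90 <= score <= 100
    return 90 <= score <= 100


def in_a_range_explicit(score):
    # Equivalent spelled-out form using `and`
    return 90 <= score and score <= 100


# The two forms give the same answer at and around both boundaries.
for score in (89, 90, 95, 100, 101):
    assert in_a_range(score) == in_a_range_explicit(score)

print(in_a_range(95), in_a_range(101))  # True False
```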
📊 Equality vs. Inequality Checks 🔎
Operator Meaning Example
== equality test x == y
!= inequality test x != y
Using != lets us ask a single question instead of two separate < / > checks:
if x != y:
    print("x is not equal to y")
else:
    print("x is equal to y")
📈 Grading Program Example 🎓
Problem
Assign a letter grade based on a numeric score (0‑100).
Initial verbose version
if score >= 90 and score <= 100:
    print("A")
elif score >= 80 and score <= 89:
    print("B")
elif score >= 70 and score <= 79:
    print("C")
elif score >= 60 and score <= 69:
    print("D")
else:
    print("F")
Optimized version (fewer comparisons)
if score >= 90:
    print("A")
elif score >= 80:
    print("B")
elif score >= 70:
    print("C")
elif score >= 60:
    print("D")
else:
    print("F")
Once a higher bound fails, the next elif implicitly knows the score is below the previous threshold,
eliminating the upper‑bound check.
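Key: order the tests from the highest threshold down. In an if/elif chain only the first true branch runs, so a score of 95 prints A even though it also satisfies score >= 80; with separate if statements, several blocks would execute.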
Grade range table
Score range Grade
90 ≤ score ≤ 100 A
80 ≤ score < 90 B
70 ≤ score < 80 C
60 ≤ score < 70 D
otherwise F
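One way to make the grading logic reusable and easy to check against the table is to return the letter instead of printing it. The function name grade is an assumption for this sketch.

```python
def grade(score):
    """Map a numeric score to a letter grade (thresholds from the table above)."""
    if score >= 90:
        return "A"
    elif score >= 80:   # reached only when score < 90
        return "B"
    elif score >= 70:   # reached only when score < 80
        return "C"
    elif score >= 60:   # reached only when score < 70
        return "D"
    else:
        return "F"


for sample in (100, 89.5, 70, 60, 59):
    print(sample, grade(sample))
```

Because each branch relies only on a lower bound, a fractional score such as 89.5 still lands in B, whereas the verbose version with score <= 89 would let it fall through to F.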
🔁 Using Modulo (%) for Parity 🧮
% (modulo) – returns the remainder after integer division.
Even numbers have a remainder of 0 when divided by 2.
Odd numbers have a remainder of 1.
x = int(input("What is X? "))
if x % 2 == 0:
    print("even")
else:
    print("odd")
🛠️ Defining Helper Functions and Boolean Returns ✅
Boolean type
bool – a data type that can be only True or False (capitalized).
is_even helper function
def is_even(n):
    if n % 2 == 0:
        return True
    return False
Using the helper in main
def main():
    x = int(input("Enter X: "))
    if is_even(x):
        print("even")
    else:
        print("odd")

main()
Encapsulating the parity check makes the main logic clearer and promotes reuse.
📏 Syntax Essentials: Indentation & Colons
Indentation defines block structure in Python; missing or inconsistent indentation causes an
IndentationError.
Colon (:) terminates the header of if, elif, else, def, for, etc., signalling the start of an indented block.
Both are required; Python does not use curly braces for grouping statements.
🛠️ Creating Your Own Function: is_even
Function – a reusable block of code that can accept arguments and optionally return a value.
The custom is_even function determines whether a number is even and returns a Boolean (True or
False).
It can be used directly in conditionals, eliminating the need for explicit comparison each time.
def is_even(n):
    if n % 2 == 0:
        return True
    return False
Pythonic one‑liner (ternary expression)
def is_even(n):
    return True if n % 2 == 0 else False
Most concise form (direct Boolean expression)
def is_even(n):
    return n % 2 == 0
📚 Functions as Boolean Expressions in Conditionals
Boolean expression – any expression that evaluates to True or False.
Because is_even returns a Boolean, it can serve as the test in an if statement:
if is_even(x):
    print("even")
else:
    print("odd")
This keeps the conditional readable and abstracts the parity test into a reusable component.
❓ Common Questions About Arguments & Methods
Pass‑by‑reference / address – No. Python passes arguments by object reference; there is no
separate “address” (pointer) mechanism as in C or C++.
Using the dot operator on custom functions – You can only call methods (e.g., strip(), title()) on
objects that implement them.
A Boolean (True/False) has no string‑related methods, so is_even(...).strip() is invalid.
If a custom function returns a string, you may chain string methods on that result.
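A small sketch of the difference. clean_name is a hypothetical function that returns a string, so string methods can be chained onto its result, whereas the Boolean returned by is_even has no strip() method at all.

```python
def is_even(n):
    return n % 2 == 0


def clean_name(raw):
    """Hypothetical helper: returns a str, so str methods can follow the call."""
    return raw.strip()


print(clean_name("  harry potter ").title())   # Harry Potter

# A bool has no strip(), so is_even(4).strip() would raise AttributeError.
print(hasattr(is_even(4), "strip"))            # False
```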
🐍 Writing “Pythonic” Code
Pythonic – code that follows idiomatic Python conventions, often more concise and readable.
Replace a four‑line if/else block with a single return statement (as shown above).
Use English‑like syntax:
return True if n % 2 == 0 else False
The interpreter reads this almost as a natural language sentence: “return true if the remainder of n
divided by 2 is zero, otherwise false.”
🔀 Pattern Matching with match / case
match statement – Python’s structural pattern matching (introduced in Python 3.10), similar in spirit to switch in
other languages.
Basic syntax
match value:
    case pattern1:
        # block
    case pattern2:
        # block
    case _:
        # default block
The underscore (_) acts as a catch‑all pattern, analogous to else in an if chain.
🏠 Example: Determining a Hogwarts House
1️⃣ Using if / elif / else (initial version)
name = input("What’s your name? ")
if name == "Harry" or name == "Hermione" or name == "Ron":
    print("Gryffindor")
elif name == "Draco":
    print("Slytherin")
else:
    print("who?")
2️⃣ Consolidating with or
The three Gryffindor checks are combined into a single condition using the or operator (already shown
above).
3️⃣ Refactoring with match
name = input("What’s your name? ")
match name:
    case "Harry" | "Hermione" | "Ron":
        print("Gryffindor")
    case "Draco":
        print("Slytherin")
    case _:
        print("who?")
The vertical bar (|) groups multiple literal patterns into one case, reducing repetition.
The final case _: provides the default response for any unhandled name.
🔁 Loops: The while Statement
while loop – repeatedly executes a block as long as its Boolean condition remains True.
Motivating example (printing “meow” three times)
i = 3
while i != 0:
    print("meow")
    i = i - 1 # decrement to eventually break the loop
Without the decrement (i = i - 1), the condition would stay true forever, producing an infinite loop.
Controlling an accidental infinite loop
Press Ctrl + C in the terminal to interrupt execution.
⏹️ Counting Up vs. Counting Down
The same behavior can be expressed by counting upward:
i = 1
while i <= 3:
    print("meow")
    i = i + 1 # increment toward the terminating condition
Choose the direction that feels most natural; the loop logic remains identical.
🧭 Visualizing Loop Flow (flowchart reminder)
Oval – start/end points.
Diamond – Boolean test (i != 0 or i <= 3).
Arrows – indicate the path taken when the test is True (repeat) or False (exit).
Seeing the loop as a flowchart helps confirm that the variable (i) is being updated each iteration, guaranteeing
eventual termination.
🔄 Incrementing & Counting Basics
Assignment operator (=) copies the value on the right‑hand side to the name on the left.
In statements like i = i + 1 the right‑hand side is evaluated first, then the result overwrites i.
Convention: most programmers start counting at 0 instead of 1.
Benefits: aligns with zero‑based indexing used by lists and many APIs.
Example change: i = 0 rather than i = 1.
Shorthand increment: i += 1 does exactly the same as i = i + 1 but with fewer keystrokes.
i = 0
i += 1 # i is now 1
🐱 Loop Variants
⏳ While Loop Refresher
A while loop repeats its block as long as its Boolean condition stays true.
i = 3
while i != 0:
    print("meow")
    i = i - 1 # prevents an infinite loop
Off‑by‑one bugs often arise when the loop condition (<=, <, !=) doesn’t match the intended number of
iterations.
When counting up from 0, switching from <= 3 to < 3 fixes a bug where the cat meowed four times instead of three.
⚠️ Common Pitfalls
Symptom                                  Typical cause
Too many iterations                      Using <= when < is appropriate
Too few iterations (or none)             Wrong starting index (e.g., i = 1 instead of 0 with i < 3)
Endless execution (stop with Ctrl + C)   Forgetting to update the loop variable, or using while True without a break
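To make the off‑by‑one pitfall concrete, this sketch counts iterations instead of printing. Both function names are illustrative.

```python
def count_with_less_or_equal():
    """Start at 0 and loop while i <= 3: runs for i = 0, 1, 2, 3."""
    i = 0
    count = 0
    while i <= 3:
        count += 1
        i += 1
    return count


def count_with_less_than():
    """Start at 0 and loop while i < 3: runs for i = 0, 1, 2."""
    i = 0
    count = 0
    while i < 3:
        count += 1
        i += 1
    return count


print(count_with_less_or_equal())  # 4 (one meow too many)
print(count_with_less_than())      # 3
```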
🔁 For Loops & Lists
📦 Lists in Python
A list holds an ordered collection of values inside square brackets [].
students = ["Hermione", "Harry", "Ron"]
Lists are zero‑based: the first element is at index 0.
🔄 Iterating Over a List
for i in [0, 1, 2]:
    print("meow")
Python automatically assigns each element of the list to i on successive iterations.
📏 Using range() for Dynamic Lists
range(n) generates a sequence of numbers 0 … n‑1.
for i in range(3): # equivalent to [0, 1, 2]
    print("meow")
Scaling to large counts (e.g., a million) requires only range(1_000_000)—no manual enumeration.
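A brief sketch showing what range produces and that it scales without writing out a list by hand. The counter variable is illustrative.

```python
print(list(range(3)))        # [0, 1, 2]

count = 0
for _ in range(1_000_000):   # underscores in numeric literals are only for readability
    count += 1

print(count)                 # 1000000
```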
🟦 Underscore Placeholder
When the loop variable isn’t needed, the idiomatic placeholder is a single underscore _.
for _ in range(n):
    print("meow")
Signals to readers that the value is intentionally ignored.
📐 String Multiplication & Formatting
✖️ Multiplying Strings
The * operator repeats a string a given number of times.
print("meow " * 3) # → meow meow meow
Combining with newline characters produces separate lines.
⬇️ Escape Sequence \n
\n inserts a line break inside a string literal.
print("meow\n" * 3)
🛠️ Controlling print Endings
print(..., end="") replaces the default newline (\n) that follows each call.
print("meow\n" * 3, end="") # no extra blank line at the end
♾️ Infinite Loops & Input Validation
🔁 while True Pattern
while True: creates an intentionally infinite loop that must be exited with break.
while True:
    n = int(input("How many times? "))
    if n > 0:
        break # exit once a positive number is entered
    # otherwise loop repeats
⏭️ continue vs. break
Keyword    Effect
continue   Skip the remainder of the current iteration and start the next one.
break      Exit the innermost loop immediately.
Used together, they enforce “keep asking until the input satisfies a condition”.
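A hedged sketch of the two keywords working together. To keep it runnable without a keyboard, a list of strings stands in for successive user entries; first_positive is a hypothetical helper, and the try/except around int() is an addition that uses the exception handling mentioned earlier in these notes.

```python
def first_positive(entries):
    """Return the first positive integer found in `entries`, or None.

    `entries` is a list of strings standing in for what a user might type
    at successive prompts.
    """
    result = None
    for text in entries:
        try:
            n = int(text)
        except ValueError:
            continue      # not an integer: skip to the next entry
        if n > 0:
            result = n
            break         # valid value found: leave the loop
    return result


print(first_positive(["cat", "-3", "0", "3", "5"]))  # 3
```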
📦 Helper Function get_number
def get_number():
    while True:
        n = int(input("Enter a positive integer: "))
        if n > 0:
            return n # returns the validated value and ends the function
return hands a value back to the caller, allowing the surrounding code to use the validated number.
📦 Encapsulating Logic in Functions
🧩 Defining main and meow
def meow(times):
    for _ in range(times):
        print("meow")

def main():
    n = get_number()
    meow(n)

if __name__ == "__main__":
    main()
Side effect: print displays output.
Return value: get_number provides a usable integer to main.
🛠️ Design Considerations
Separate concerns: input validation (get_number) vs. output (meow).
Reuse: meow can be called with any positive integer.
📚 Indexing Lists
Indexing accesses an element by its position using square brackets: list[index].
first_student = students[0] # "Hermione"
second_student = students[1] # "Harry"
Negative indices count from the end (-1 → last element).
Attempting to access an index outside the list length raises IndexError.
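A short sketch of negative indexing and of what happens past the end of the list. The flag out_of_range exists only so the outcome can be checked.

```python
students = ["Hermione", "Harry", "Ron"]

print(students[-1])   # Ron (last element)
print(students[-3])   # Hermione (same as students[0] for a three-item list)

out_of_range = False
try:
    students[3]       # valid indices are 0, 1, 2
except IndexError as error:
    out_of_range = True
    print("IndexError:", error)
```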
📋 Zero‑Based Indexing
Zero‑based indexing – In Python, the first element of a list is at position 0, the second at 1, the third at 2, and
so on.
Accessing elements:
students = ["Hermione", "Harry", "Ron"]
print(students[0]) # Hermione
print(students[1]) # Harry
print(students[2]) # Ron
Remembering the offset becomes automatic with practice; avoid hard‑coding indices when the list size
may change.
🔁 Iterating Over a List Directly
For‑loop over an iterable – Python can loop over any iterable object (lists, strings, etc.) without explicit
counters.
Simple iteration:
for student in students:
    print(student)
The loop variable (student) receives each element in order, so the code works for any list length—no
need to know the size beforehand.
Choose descriptive loop variables; using _ as a placeholder is acceptable only when the value is truly
unused.
🔢 Using range and len for Index‑Based Loops
len() – Returns the number of items in a sequence (e.g., a list).
range(stop) – Generates a sequence of integers from 0 up to but not including stop.
Combining both lets you iterate by numeric index:
for i in range(len(students)):
    print(i, students[i])
Adding 1 to the index gives a human‑friendly rank:
for i in range(len(students)):
    print(i + 1, students[i])
This pattern is useful when the loop body needs the index itself (e.g., for numbering output).
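As an aside that goes slightly beyond these notes, Python's built‑in enumerate() yields index and element together, which is a common alternative to range(len(...)). A minimal sketch:

```python
students = ["Hermione", "Harry", "Ron"]

# start=1 produces human-friendly ranks instead of zero-based indices
for rank, student in enumerate(students, start=1):
    print(rank, student)

ranked = list(enumerate(students, start=1))
```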
📚 Dictionaries (Key‑Value Pairs)
Dictionary (dict) – A collection that maps keys to values using curly braces {}.
Basic creation and literal syntax:
houses = {
    "Hermione": "Gryffindor",
    "Harry": "Gryffindor",
    "Ron": "Gryffindor",
    "Draco": "Slytherin"
}
Accessing a value by key:
print(houses["Hermione"]) # Gryffindor
Keys can be any immutable type (strings are common); values can be any object.
The special constant None represents the absence of a value (used, for example, when a student has no
Patronus).
🔄 Looping Over Dictionaries
Iterating a dict – By default, a for loop over a dictionary yields its keys.
for student in houses:
    print(student, houses[student])
Output shows each key followed by its associated value.
If only the values are needed, use houses.values().
To iterate over both simultaneously, use houses.items() (not shown in the transcript but a natural
extension).
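A minimal sketch of values() and items() using the same houses dictionary. The counter variable is illustrative.

```python
houses = {
    "Hermione": "Gryffindor",
    "Harry": "Gryffindor",
    "Ron": "Gryffindor",
    "Draco": "Slytherin",
}

# values(): only the houses, without the student names
gryffindor_count = 0
for house in houses.values():
    if house == "Gryffindor":
        gryffindor_count += 1
print(gryffindor_count)  # 3

# items(): each key together with its value
for student, house in houses.items():
    print(student, house, sep=", ")
```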
📋 List of Dictionaries (Complex Records)
List of dictionaries – A common way to store multiple records where each record has the same set of fields.
students = [
    {"name": "Hermione", "house": "Gryffindor", "patronus": "Otter"},
    {"name": "Harry", "house": "Gryffindor", "patronus": "Stag"},
    {"name": "Ron", "house": "Gryffindor", "patronus": "Jack Russell Terrier"},
    {"name": "Draco", "house": "Slytherin", "patronus": None}
]
The outer square brackets denote a list; each inner pair of curly braces denotes a dictionary
representing one student.
All dictionaries share the same keys (name, house, patronus) but have unique values.
🛠️ Accessing Nested Data
Looping through the list and printing specific fields:
for student in students:
    print(student["name"], student["house"])
Including the patronus (a missing Patronus prints as the text None):
for student in students:
    print(student["name"], student["house"], student["patronus"])
Customizing output with sep and end (as introduced earlier) can produce comma‑separated lines:
for student in students:
    print(student["name"], student["house"], sep=", ")
📊 Summary Tables
Concept | Syntax | Typical Use
Zero‑based indexing | list[0], list[1] | Direct element access
For‑loop over list | for item in list: | Process each element without indices
Index‑based loop | for i in range(len(list)): | Need the numeric position
Dictionary literal | {key1: value1, key2: value2} | Map keys → values
Access dict value | dict[key] | Retrieve associated data
Iterate dict keys | for key in dict: | Work with identifiers
List of dicts | [{'k': v}, {'k': v}] | Store structured records
Access nested value | list[i]["key"] | Pull a field from a specific record
📚 Dictionaries & Efficient Lookup
Dictionary – a mutable collection that maps unique keys to values.
Internally implemented as a hash table, providing average‑case O(1) time for look‑ups, insertions, and
deletions, even when the dictionary contains thousands of entries.
Keys can be any immutable type (commonly strings).
Values can be any object, including other containers (e.g., strings, lists, dictionaries).
No need to sort for fast access; Python’s hash‑based design finds a key directly.
Sorting for display (optional)
sorted_names = sorted(students_dict) # alphabetical list of keys
for name in sorted_names:
    print(name, students_dict[name])
🧩 Accessing Nested Data
When a list holds dictionaries (records), each dictionary can store several related fields, such as a student’s name,
house, and Patronus.
students = [
    {"name": "Hermione", "house": "Gryffindor", "patronus": "Otter"},
    {"name": "Harry", "house": "Gryffindor", "patronus": "Stag"},
    {"name": "Ron", "house": "Gryffindor", "patronus": "Jack Russell Terrier"},
    {"name": "Draco", "house": "Slytherin", "patronus": None}
]
for s in students:
    print(s["name"], s["house"], s["patronus"])
Use the key inside square brackets to retrieve the associated value.
None indicates the absence of a Patronus.
🏗️ Building Reusable Functions (Abstraction)
Abstraction – hiding the internal details of a piece of code behind a clear, stable interface (function name &
parameters).
print_column – vertical stack of bricks
def print_column(height):
    for _ in range(height):
        print("#")
height determines how many # symbols are printed, one per line.
print_row – horizontal line of bricks
def print_row(width):
    print("#" * width)
Uses string multiplication to repeat # without an explicit loop.
print_square – 2‑D block of bricks
def print_square(size):
    for _ in range(size): # each iteration prints one row
        print("#" * size) # row of `size` bricks
Combines the ideas of print_row (horizontal) and an outer loop (vertical).
Alternative nested‑loop version
def print_square_nested(size):
    for i in range(size): # rows
        for j in range(size): # columns
            print("#", end="") # stay on same line
        print() # newline after each row
Demonstrates nested loops (i for rows, j for columns) and explicit control of line endings with end="".
🔁 Loops for Two‑Dimensional Structures
Outer loop → iterates over rows (top‑to‑bottom).
Inner loop → iterates over columns (left‑to‑right) within the current row.
The pattern mirrors how a printer or typewriter outputs text: line by line, each line left‑to‑right.
Loop level | Variable | Purpose
Outer | i | Select current row (0 … size‑1)
Inner | j | Print each column in that row
📐 Alternative Implementations & Formatting Tricks
String multiplication ("#" * n) replaces a simple inner loop for fixed‑character rows.
Controlling the line break with print(..., end="") allows multiple characters to appear on the same line
before a manual print() inserts the newline.
for _ in range(3):
    print("#", end="") # stays on same line
print() # final newline
🧭 Design Principles
Stable interface: callers use print_column(height) without caring whether the function uses a loop or
multiplication internally.
Implementation freedom: you can refactor the body (e.g., replace a loop with a one‑liner) as long as
the name, parameters, and return behavior stay the same.
Separation of concerns: main() orchestrates program flow, while helper functions (print_column,
print_row, print_square) handle specific rendering tasks.
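To make the "implementation freedom" point concrete, here is a hedged sketch with two interchangeable bodies for the same column-printing task. The function names print_column_loop and print_column_multiply, and the captured helper, are invented for this illustration and do not come from the course.

```python
import contextlib
import io


def print_column_loop(height):
    """Print `height` bricks, one per line, using an explicit loop."""
    for _ in range(height):
        print("#")


def print_column_multiply(height):
    """Same visible behavior, implemented with string multiplication."""
    print("#\n" * height, end="")


def captured(func, *args):
    """Run func(*args) and return everything it printed (helper for this sketch)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


# A caller cannot tell the two implementations apart from their output.
same_output = captured(print_column_loop, 3) == captured(print_column_multiply, 3)
print(same_output)
```

Because callers depend only on the name and the parameter, either body could sit behind print_column without any call site changing.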
⚡ Performance of Dictionaries
Look‑ups are constant‑time on average, making them suitable for large data sets (e.g., thousands of
student records).
No need to traverse the entire collection; the hash function jumps directly to the bucket containing the
key.
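A small sketch of key-based lookup, under stated assumptions: the dictionary below is synthetic (10,000 made-up IDs), and the names students_dict, has_key, value, and missing are chosen only for this example.

```python
# Hypothetical records: 10,000 students keyed by a made-up ID string.
students_dict = {f"student{n}": n % 4 for n in range(10_000)}

# Membership tests and lookups go by key; no loop over the entries is written.
has_key = "student9999" in students_dict
value = students_dict["student9999"]

# .get() returns a default instead of raising KeyError for a missing key.
missing = students_dict.get("nobody", None)

print(has_key, value, missing)
```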
🧪 Exception Handling Primer
Exception – an event that disrupts normal program flow, typically raised when an operation encounters an
unexpected condition.
Common runtime error demonstrated
x = int(input("Enter an integer: "))
If the user types a non‑numeric string (e.g., "cat"), Python raises a ValueError:
ValueError: invalid literal for int() with base 10: 'cat'
ValueError signals that a built‑in conversion function received an inappropriate value.
Defensive strategy (outline)
Validate input before conversion or wrap the conversion in a try/except block (full syntax not shown in
the transcript).
This protects the program from crashing due to unexpected user input.
🛡️ Defensive Programming & Error Handling
Defensive programming – writing code that anticipates and gracefully handles unexpected user actions or
malicious inputs.
Assume users may:
Ignore prompts.
Provide the wrong type (e.g., strings instead of numbers).
Attempt to crash the program.
🔧 try / except Syntax
try – attempts to execute a block of code that might raise an exception.
except – runs when a specified exception occurs within the preceding try block.
try:
    x = int(input("Enter an integer: "))
    print(f"x is {x}")
except ValueError:
    print("x is not an integer")
Indentation matters: only the indented lines belong to the try.
Exception names are case‑sensitive (ValueError not valueerror).
📛 Common Exceptions
Exception | When it occurs | Typical cause in this context
ValueError | Raised by int() when the input cannot be parsed as an integer. | User types “cat”, “dog”, etc.
NameError | Raised when a variable is referenced before it exists in the current scope. | x never assigned because a ValueError interrupted the assignment.
NameError – indicates that the interpreter cannot find a definition for the given name at the point of use.
📊 Exception Flow & Variable Scope
The right‑hand side of an assignment (int(input(...))) is evaluated first.
If a ValueError is raised, the assignment never completes, leaving the variable undefined.
Consequently, code that later references the variable triggers a NameError.
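The sequence described above can be reproduced without a keyboard. In this sketch, int("cat") stands in for int(input(...)) with the user typing "cat"; the outcome variable is an assumption added so the result can be inspected.

```python
try:
    x = int("cat")  # raises ValueError before the assignment to x completes
except ValueError:
    print("x is not an integer")

try:
    print(f"x is {x}")  # x was never bound, so this line fails
    outcome = "no error"
except NameError:
    outcome = "NameError"

print(outcome)
```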
🔁 try / except / else
else – executes only when the try block finishes without raising an exception.
try:
    x = int(input("Enter an integer: "))
except ValueError:
    print("x is not an integer")
else:
    print(f"x is {x}")
The else clause replaces the need to repeat the successful‑case code after the try.
🔄 Looping for Robust Input
while True – creates an infinite loop that continues until break is encountered.
while True:
    try:
        x = int(input("Enter an integer: "))
    except ValueError:
        print("x is not an integer")
    else:
        break # exit loop once a valid integer is obtained
print(f"x is {x}")
The loop repeatedly prompts the user until a proper integer is supplied.
break immediately terminates the loop, allowing execution to continue after it.
📦 Abstracting Validation into a Function
Encapsulating the repeated pattern in a reusable function improves readability and reuse.
def get_int():
    while True:
        try:
            x = int(input("Enter an integer: "))
        except ValueError:
            print("x is not an integer")
        else:
            return x # return exits the function (and the loop)
The main program now reduces to a few lines:
def main():
    x = get_int()
    print(f"x is {x}")
main()
⚡ return vs break
break – exits the nearest enclosing loop only.
return – exits the current function and any surrounding loops, passing a value back to the caller.
In the get_int example, return x both ends the while loop and provides the integer to main().
📚 Key Takeaways
Use try / except to catch anticipated errors such as ValueError when converting user input.
Place the minimal code that can raise the exception inside the try block to avoid catching unrelated
errors.
The else clause runs only on successful execution of the try, keeping success‑path logic separate
from error handling.
Combine while True with break (or return) to repeatedly prompt until valid input is received.
Abstract common patterns (e.g., integer input) into functions for cleaner, reusable code.
🔧 Refactoring get_int for Conciseness
Refactor – restructure code to be shorter or clearer without changing its behavior.
Return directly from the try block instead of assigning to an intermediate variable:
def get_int():
    while True:
        try:
            return int(input("Enter X: "))
        except ValueError:
            pass # silently ignore the error and loop again
Eliminate unnecessary variables:
If a variable is defined only to be used once, replace it with the expression that produces its
value.
Example: x = int(input("X? ")) → return int(input("X? ")).
Trade‑off: fewer lines reduce the chance of typo‑related bugs, but the return point may be less obvious.
Choose consciously and be able to justify the style.
🚦 Silent Error Handling with pass
pass – a no‑op statement used when a block is syntactically required but no action is desired.
In a try/except pair, pass lets the program catch an exception without producing any output:
while True:
    try:
        x = int(input("Enter X: "))
        break
    except ValueError:
        pass # keep looping silently
Effect on callers: the caller receives no indication that an error occurred; the loop simply repeats the
prompt.
📐 Indentation & Control Flow
Indentation – whitespace that groups statements under a control structure (e.g., def, while, try).
Every indented block belongs to the statement directly above it.
Example hierarchy (4‑space indent levels):
def get_int():                  # level 0
    while True:                 # level 1
        try:                    # level 2
            x = ...             # level 3
        except ValueError:      # level 2
            pass                # level 3
Consistent indentation makes the logical flow explicit and improves readability for any future reader.
❓ Error Propagation & isnumeric
isnumeric() – a string method that returns True if all characters are numeric.
Using isnumeric() before conversion is an alternative to try/except:
s = input("Enter a number: ")
if s.isnumeric():
    x = int(s)
else:
    print("Not a number") # handle non‑numeric input
Pythonic preference: attempt the operation (try) and handle failure (except). Checking first (if …
isnumeric()) is perfectly valid when explicit validation is desired.
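The two approaches do not always agree, which is one reason try/except is often preferred. The sketch below compares them on a few sample strings; parses_as_int and the sample list are illustrative names, not part of the course code.

```python
def parses_as_int(text):
    """Return True if int(text) succeeds (the try/except approach)."""
    try:
        int(text)
    except ValueError:
        return False
    return True


samples = ["42", "-5", " 7 ", "²", "cat"]

# Map each sample to (isnumeric() result, whether int() accepts it).
comparison = {s: (s.isnumeric(), parses_as_int(s)) for s in samples}

for text, (numeric, parses) in comparison.items():
    print(repr(text), numeric, parses)
```

Here "-5" and " 7 " are rejected by isnumeric() although int() accepts them, while the superscript "²" passes isnumeric() but cannot be converted by int().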
🔁 Making get_int Reusable with a Prompt Parameter
Parameter – a variable listed in a function’s definition that receives an argument at call time.
Adding a prompt argument decouples the function from a hard‑coded message:
def get_int(prompt):
    while True:
        try:
            return int(input(prompt))
        except ValueError:
            pass
Call sites can now supply any prompt ("Enter age: ", "What is your score? "), enhancing reusability.
📦 Modules & Import Styles
Module – a file containing Python code (functions, classes, variables) that can be imported elsewhere.
Import form | Scope effect | Typical use
import random | All names stay under the random. namespace | Avoids name clashes; explicit source
from random import choice | choice appears directly in the local namespace | Shorter calls when name conflict is unlikely
Importing the whole module keeps the original namespace intact, preventing accidental overwriting of
functions with the same name.
Importing specific symbols reduces typing but requires vigilance about name collisions.
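The name-collision risk can be shown in a few lines. This is a deliberately contrived sketch: the local choice function and the imported_choice alias are assumptions made for the illustration.

```python
import random
from random import choice

imported_choice = choice  # keep a reference before the name is reused below


def choice(options):
    """A local function that happens to share a name with random.choice."""
    return options[0]


coin_sides = ["heads", "tails"]

# The module-qualified call is unambiguous even though a local `choice` exists.
qualified = random.choice(coin_sides)

# The bare name now refers to the local function, not the library's.
local = choice(coin_sides)

print(qualified, local)
```

After the local definition, the bare name choice no longer reaches the library function, whereas random.choice is unaffected.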
🎲 Random Module Overview
random – a standard library module that provides functions for generating pseudo‑random data.
Key functions discussed:
Function | Purpose | Typical call pattern
random.choice(seq) | Randomly select one element from a sequence (list, tuple, etc.) | choice(["heads", "tails"])
random.randint(a, b) | Return a random integer inclusive of both endpoints | randint(1, 10)
random.shuffle(seq) | Randomly reorder a mutable sequence in place | shuffle(cards)
🔀 Using random.choice
import random
coin = random.choice(["heads", "tails"])
print(coin)
The list ["heads", "tails"] is the sequence argument.
Each call yields one element with equal probability (50 % for two items).
🔢 Generating Random Integers with randint
import random
number = random.randint(1, 10) # 1 ≤ number ≤ 10
print(number)
The bounds are inclusive; both 1 and 10 can appear.
Useful for simple games, simulations, or any situation requiring an unbiased integer selection.
🔀 Shuffling Lists with random.shuffle
import random
cards = ["Jack", "Queen", "King"]
random.shuffle(cards) # modifies `cards` directly
for card in cards:
    print(card)
shuffle does not return a new list; it rearranges the original list in place.
Iterating afterward prints the cards in their new random order.
🧩 Practical Tips & Best Practices
Explicit returns (return int(input(...))) reduce variable churn and shrink the function body.
Use pass when you want to silently ignore caught exceptions; remember this hides the error from
callers.
Keep indentation consistent (4 spaces) to clearly delineate logical blocks.
Prefer import module for large libraries to protect against name collisions; switch to from module import
name for brevity when the risk is low.
When building reusable utilities, accept parameters (e.g., a prompt string) rather than hard‑coding
messages.
Remember that random.shuffle mutates its argument; if you need the original order later, work on a
copy (cards.copy()).
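The last tip can be sketched as follows; the variable names shuffled and result are illustrative.

```python
import random

cards = ["Jack", "Queen", "King"]

# Shuffle a copy so the original order is still available afterwards.
shuffled = cards.copy()
random.shuffle(shuffled)

# shuffle() itself returns None because it works in place.
result = random.shuffle(cards.copy())

print(cards, shuffled, result)
```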
🎲 Random Module – Simple Randomness Functions
The random module supplies easy‑to‑use functions that return values with equal probability.
Function | Purpose | Example Output | Typical Use
random.choice(seq) | Randomly selects one element from a sequence (list, tuple, etc.) | "heads" or "tails" | Simulating a coin toss
random.randint(a, b) | Returns a random integer inclusive of both a and b | 7 when called as randint(1,10) | Picking a number in a range
random.shuffle(seq) | In‑place rearranges the elements of a mutable sequence | ['queen', 'king', 'jack'] after shuffling ['jack','queen','king'] | Randomizing order of cards, names, etc.
These functions are user‑friendly and give each possible outcome the same chance.
To obtain biased probabilities you must write your own logic or use more advanced utilities (not
covered by these three functions).
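One of the "more advanced utilities" in the standard library is random.choices, which accepts relative weights. It is not one of the three functions covered in these notes, so treat the sketch below as a pointer rather than course material; the weights are arbitrary.

```python
import random

sides = ["heads", "tails"]

# Weights are relative; here "heads" is three times as likely as "tails".
biased_flips = random.choices(sides, weights=[3, 1], k=10)

# A weight of 0 means that option is never selected.
always_heads = random.choices(sides, weights=[1, 0], k=5)

print(biased_flips, always_heads)
```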
import random
cards = ["Jack", "Queen", "King"]
random.shuffle(cards) # cards now in a random order
for card in cards:
    print(card) # prints each card on its own line
📊 Statistics Module – Computing an Average
statistics.mean(iterable) returns the arithmetic mean of the supplied numeric values.
import statistics
average = statistics.mean([100, 90])
print(average) # → 95
The module must be imported separately from random.
Useful for quick analysis of grades, scores, or any numeric dataset.
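A slightly larger sketch, assuming a hypothetical list of grades. statistics.median is shown alongside mean as a related function from the same module; it is not mentioned in the notes.

```python
import statistics

grades = [100, 90, 80]  # hypothetical scores

average = statistics.mean(grades)
middle = statistics.median(grades)

# mean() needs at least one value; an empty list raises StatisticsError.
try:
    statistics.mean([])
    empty_allowed = True
except statistics.StatisticsError:
    empty_allowed = False

print(average, middle, empty_allowed)
```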
💻 Command‑Line Arguments (sys.argv)
sys.argv is a list containing every word typed after the python command, including the script name at index 0.
import sys
print(sys.argv) # e.g. ['name.py', 'David']
Zero‑based indexing: sys.argv[0] → script name, sys.argv[1] → first user‑supplied argument.
Supplying multiple words without quotes creates separate list elements; quoting groups them into a
single element.
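The effect of quoting can be previewed without leaving Python. This sketch uses the standard library's shlex.split, which approximates how a POSIX shell splits a command line into words; the actual splitting is done by your shell, so this is only an illustration.

```python
import shlex

# Without quotes, each word becomes its own list element.
unquoted = shlex.split("name.py David Malan")

# With quotes, the two words are grouped into a single element.
quoted = shlex.split('name.py "David Malan"')

print(unquoted)
print(quoted)
```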
🛡️ Common Pitfall – IndexError
Attempting to read sys.argv[1] when no argument was provided raises IndexError: list index out of range.
$ python name.py
Traceback (most recent call last):
  File "name.py", line 4, in <module>
    print(sys.argv[1])
IndexError: list index out of range
This is one of the most frequent mistakes when working with command‑line arguments.
🔧 Defensive Programming: Length Checks & Conditionals
Before accessing list elements, verify the length of sys.argv.
import sys
if len(sys.argv) < 2:
    print("Too few arguments")
elif len(sys.argv) > 2:
    print("Too many arguments")
else:
    print(f"Hello, my name is {sys.argv[1]}")
Condition | Meaning
len(sys.argv) < 2 | No user argument supplied
len(sys.argv) > 2 | More than one argument (e.g., extra words)
else | Exactly one argument – safe to use sys.argv[1]
This approach prevents IndexError without needing exception handling.
🚨 Exception Handling with try/except
When you prefer to catch the error rather than pre‑check, wrap the risky access in a try block.
import sys
try:
    print(f"Hello, my name is {sys.argv[1]}")
except IndexError:
    print("Too few arguments")
Only the code that might raise the exception should be inside try to avoid masking unrelated errors.
⏹️ Exiting Early Using sys.exit
sys.exit(message) prints message (if provided) and terminates the program immediately.
import sys
if len(sys.argv) < 2:
    sys.exit("Too few arguments")
if len(sys.argv) > 2:
    sys.exit("Too many arguments")
print(f"Hello, my name is {sys.argv[1]}")
After calling sys.exit, any code that follows is never executed, guaranteeing that later list indexing is
safe.
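Under the hood, sys.exit works by raising the SystemExit exception. The sketch below wraps the checks in a hypothetical check_args function that takes the argument list as a parameter, so the behavior can be observed without actually ending the program; catching SystemExit like this is for demonstration only.

```python
import sys


def check_args(argv):
    """Exit with a message unless exactly one user argument is present."""
    if len(argv) < 2:
        sys.exit("Too few arguments")
    if len(argv) > 2:
        sys.exit("Too many arguments")
    return argv[1]


# sys.exit raises SystemExit; catching it here is only to inspect it.
try:
    check_args(["name.py"])
except SystemExit as stop:
    exit_info = stop.code

name = check_args(["name.py", "David"])
print(exit_info, name)
```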
🧩 Processing Multiple Arguments
To handle an arbitrary number of names (or other data), iterate over sys.argv starting at index 1.
import sys
for name in sys.argv[1:]:
    print(f"Hello, my name is {name}")
The slice sys.argv[1:] skips the script name and yields only the user‑supplied words.
This pattern replaces the earlier “too many arguments” check when you intend to accept many values.
❓ Frequently Asked Points
Question | Answer
Can I adjust probabilities? | Not with random.choice, randint, or shuffle. You must write custom logic or use a weighted utility such as the standard library's random.choices.
Can I have multiple else clauses? | No. An if … elif … else chain may contain one else. You can have multiple elifs but only a single final else.
How do I pass a full name containing spaces? | Enclose the name in quotes at the command line: python name.py "David Malan". The entire quoted string becomes sys.argv[1].
Is it possible to access non‑contiguous arguments directly? | Yes. After importing sys, you can index any position, e.g., sys.argv[1] for the first name and sys.argv[6] for the sixth word, provided those indices exist.
What if I mix length checks and try/except? | Either method works; using length checks is often clearer for expected argument counts, while try/except is useful when the exact failure point is uncertain.
📚 Summary of Modules Covered
Module | Key Function(s) | Typical Scenario
random | choice, randint, shuffle | Simple games, shuffling decks, random selections
statistics | mean | Quick calculation of averages, grades, sensor data
sys | argv, exit | Command‑line argument handling, graceful termination
These tools together enable random behavior, basic data analysis, and flexible user input without interactive
prompts.
📂 Slicing sys.argv for Arguments
Slice – a subset of a list obtained by specifying a start and/or end index inside square brackets (list[start:end]).
The first element of sys.argv is the script name; user‑supplied arguments start at index 1.
To ignore the script name and keep only the arguments:
import sys
args = sys.argv[1:] # → list of all arguments after the script name
Omitting the end index (: with nothing after it) returns everything from the start position to the list’s
end.
Negative indices count from the right‑hand side. Using a negative end index excludes items from the
tail:
# Example: drop the last argument
args = sys.argv[1:-1]
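The slices above can be tried on an ordinary list. In this sketch, argv is a stand-in for sys.argv so the example runs without any command-line input; the values are illustrative.

```python
# A stand-in for sys.argv so the sketch runs without command-line input.
argv = ["name.py", "Hermione", "Harry", "Ron"]

all_args = argv[1:]         # everything after the script name
without_last = argv[1:-1]   # drop the script name and the final argument
last_two = argv[-2:]        # the final two items

print(all_args, without_last, last_two)
```

Slicing always produces a new list, so argv itself is left unchanged.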
📦 Python Packages & the pip Installer
Package – a reusable collection of modules, typically distributed as a folder; third‑party packages extend
Python’s standard library.
Packages are hosted on the Python Package Index (PyPI), searchable via the web or the command
line.
pip is the built‑in package manager used to install packages:
pip install cowsay
After installation, the package can be imported like any standard module.
Example: Using the cowsay Package
import sys
import cowsay
if len(sys.argv) == 2: # exactly one user argument
    message = "Hello, " + sys.argv[1]
    cowsay.cow(message) # cow says the message
The program checks that len(sys.argv) == 2 (script name + one name) before invoking cowsay.cow.
If the length check fails, the program does nothing (or could sys.exit with a message).
Comparison with Java
Aspect | Python (import package) | Java (import class)
Unit of reuse | module or package (folder of modules) | class (single file)
Installation | pip install (runtime) | Build‑tool (Maven/Gradle) or manual JAR
Namespace handling | package.module or from package import … | package.ClassName
🎯 Why Use Command‑Line Arguments?
Command‑line arguments – values supplied after the script name when invoking a program (python script.py
arg1 arg2).
Automation: Allows scripts to be run repeatedly with different inputs without interactive prompts.
Speed: Enables quick re‑execution via shell history (↑ key).
Consistency: Facilitates integration into larger workflows (e.g., Makefiles, CI pipelines).
Less user‑friendly for novices, but proficiency yields higher productivity.
🌐 Accessing Web APIs with requests
API (Application Programming Interface) – a defined set of endpoints and data formats that let programs
request services or data from a remote server.
The third‑party requests library simplifies HTTP interactions:
import requests
import sys
if len(sys.argv) != 2:
    sys.exit("Usage: python itunes.py ")
artist = sys.argv[1]
url = (
    "https://itunes.apple.com/search?"
    "entity=song&limit=1&term=" + artist
)
response = requests.get(url) # perform HTTP GET
print(response.json()) # raw JSON turned into a Python dict
response.json() automatically parses the returned JSON into native Python data structures (dicts, lists,
etc.).
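Concatenating the artist name directly into the URL can misbehave when the name contains spaces or symbols. A hedged alternative is to let the standard library encode the query string; build_search_url and BASE_URL below are names invented for this sketch, and no network request is made.

```python
from urllib.parse import urlencode

BASE_URL = "https://itunes.apple.com/search"  # endpoint quoted in these notes


def build_search_url(artist, limit=1):
    """Return a search URL with the query string safely encoded."""
    query = urlencode({"entity": "song", "limit": limit, "term": artist})
    return f"{BASE_URL}?{query}"


print(build_search_url("the beatles"))
```

The requests library can do the same encoding itself if the parameters are passed as a dictionary through the params argument of requests.get.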
📄 Working with JSON Data
JSON (JavaScript Object Notation) – a text‑based, language‑agnostic format for representing structured data
(objects ↔ dictionaries, arrays ↔ lists).
Python’s built‑in json module can pretty‑print and manipulate JSON:
import json
# pretty‑print with 2‑space indentation
print(json.dumps(response.json(), indent=2))
The formatted output makes nested structures easier to read:
Curly braces {} → dictionary (key/value pairs).
Square brackets [] → list (ordered collection).
Navigating Nested Results
The iTunes response contains a top‑level key "results" whose value is a list of song dictionaries.
To extract the first track’s name:
data = response.json()
first_song = data["results"][0] # first element of the list
track_name = first_song["trackName"]
print("Track:", track_name)
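The same navigation can be practiced offline. The JSON text below is a made-up sample shaped like the structure described above (a "results" list of dictionaries with a "trackName" key); a real response contains many more fields, and the song title is a placeholder.

```python
import json

# Hypothetical sample; not an actual API response.
sample_text = '{"resultCount": 1, "results": [{"trackName": "Example Song"}]}'

data = json.loads(sample_text)        # JSON text -> Python dict
first_song = data["results"][0]       # first element of the list
track_name = first_song["trackName"]  # value stored under the key

pretty = json.dumps(data, indent=2)   # back to text, indented for reading

print("Track:", track_name)
print(pretty)
```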
Storing the entire response for later use:
o = response # variable `o` references the full Response object
🛠️ Practical Tips & Best Practices
Slice early: args = sys.argv[1:] prevents accidental processing of the script name.
Validate length before accessing sys.argv[1] to avoid IndexError.
Use sys.exit(message) for clean termination when arguments are missing or malformed.
Prefer requests over urllib for readability and automatic JSON decoding.
Leverage json.dumps(..., indent=n) for debugging complex API responses.
Keep package installation declarative (e.g., a requirements.txt file) to reproduce environments.
🔄 Recap of Key Concepts
Slicing (list[start:end]) removes unwanted elements like the script name.
Negative indices enable slicing from the list’s end.
Packages extend core Python; install with pip.
Command‑line arguments streamline automation despite a steeper learning curve.
requests handles HTTP calls; response.json() converts JSON to native Python types.
json provides pretty‑printing and further manipulation of API data.
These tools together let you build flexible, reusable scripts that interact with external services and handle user
input efficiently.
📡 Working with a Real‑World API
API (Application Programming Interface) – a web service that returns data (often JSON) in response to
HTTP requests.
HTTP request with requests
import requests
response = requests.get(
    "https://itunes.apple.com/search?term=weezer&entity=song&limit=50"
)
data = response.json() # turn JSON text into Python objects
The top‑level JSON object contains a results key whose value is a list of song dictionaries.
Iterating over the list
for item in data["results"]:
    print(item["trackName"])
Changing the limit query parameter (e.g., from 1 to 50) returns more track names without altering the
loop logic.
Key names (results, trackName, etc.) are defined by iTunes; they cannot be renamed in the response.
You may, however, copy their values into variables with any name you prefer.
🚦 Controlling Program Flow
Construct | Typical Use | Effect
break | Inside loops | Immediately exits the nearest loop.
sys.exit() | Anywhere | Terminates the entire program; optional message can be supplied.
break cannot replace sys.exit() because break only affects loops, not the whole script.
📦 Installing & Exploring Third‑Party Packages
Use pip to install a package (e.g., cowsay) that provides ASCII art for various animals.
pip install cowsay
After installation:
import cowsay
cowsay.cow("Hello")
The package is primarily illustrative; real‑world projects will rely on many such libraries.
🛠️ Building Your Own Module
1. Create sayings.py
def hello(name):
    print(f"Hello, {name}")
def goodbye(name):
    print(f"Goodbye, {name}")
def main():
    hello("world")
    goodbye("world")
if __name__ == "__main__":
    main()
The conditional block (if __name__ == "__main__":) ensures main() runs only when the file is executed
directly, not when it is imported.
2. Importing a single function
import sys
from sayings import hello
if len(sys.argv) == 2:
    hello(sys.argv[1])
sys.argv[0] is the script name; sys.argv[1] holds the user‑supplied argument.
Attempting to import the module without the conditional would execute main() automatically, producing
unwanted output.
🧪 Intro to Unit Testing
Test file layout
By convention, name the test script test_ followed by the name of the module under test (e.g., test_calculator.py).
Import the function under test directly:
from calculator import square
Simple test function
def test_square():
    assert square(2) == 4
    assert square(3) == 9
assert evaluates a Boolean expression; if false, Python raises an AssertionError and displays the failing
line.
Handling failures with try/except (more user‑friendly)
def test_square():
    try:
        assert square(2) == 4
    except AssertionError:
        print("2² is not 4")
    try:
        assert square(3) == 9
    except AssertionError:
        print("3² is not 9")
This pattern adds extra code but yields clearer messages for end users.
Why multiple test cases matter
A buggy implementation like def square(n): return n + n passes the test for 2 (because 2+2 equals 4) but
fails for 3.
Adding edge cases (e.g., zero, negative numbers) helps expose hidden defects.
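The buggy implementation mentioned above can be run against several cases to see which ones expose it. This is a sketch: run_checks and the failures list are invented for the illustration, and the optional message after the comma in an assert statement is used to label each failure.

```python
def square(n):
    """Deliberately buggy: adds instead of multiplying."""
    return n + n


def run_checks():
    """Return the messages of any failed checks (empty list means all passed)."""
    failures = []
    for value, expected in [(2, 4), (3, 9), (0, 0), (-5, 25)]:
        try:
            # The text after the comma becomes the AssertionError message.
            assert square(value) == expected, f"square({value}) should be {expected}"
        except AssertionError as error:
            failures.append(str(error))
    return failures


failures = run_checks()
print(failures)
```

The inputs 2 and 0 pass by coincidence, which is why a test suite built only on them would miss the bug.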
Minimalist alternative using only assert
def test_square():
    assert square(2) == 4
    assert square(3) == 9
    assert square(0) == 0
    assert square(-5) == 25
Fewer lines, but failures produce raw traceback output.
📂 Organising Test Execution
Include the same if __name__ == "__main__": guard in the test file to allow manual runs:
def main():
    test_square()
if __name__ == "__main__":
    main()
Running python test_calculator.py will execute all assert statements; a silent run indicates all tests
passed.
🔁 Recap of Key Patterns
Pattern | Purpose
API request → response.json() → data["results"] | Retrieve and iterate over remote data.
if __name__ == "__main__": | Prevent automatic execution when a file is imported as a module.
from module import name | Import only the needed symbols, keeping the namespace clean.
sys.argv length check | Validate the correct number of command‑line arguments before use.
assert | Concise, automatic test that halts on failure.
try/except AssertionError | Provide custom error messages while still using assertions.
These constructs together enable robust scripts that interact with external services, expose reusable functionality
through modules, and verify correctness via lightweight automated tests.
🧪 Unit Testing with pytest
pytest – a third‑party testing framework that discovers test functions, runs them automatically, and reports
which passed or failed.
Installed via pip install pytest.
Follows naming conventions: files start with test_ and test functions start with test_.
Writing Simple Tests
assert – a statement that raises an AssertionError when its Boolean expression is False.
def test_square():
    assert square(2) == 4
    assert square(3) == 9
    assert square(-2) == 4
    assert square(-3) == 9
    assert square(0) == 0
Each assert checks a single input‑output pair.
No explicit try/except or print statements are required.
Running pytest
Execute from the terminal:
pytest test_calculator.py
Output symbols:
Symbol | Meaning
. | test passed
F | test failed
E | test raised an error (e.g., unexpected exception)
Interpreting Failure Output
A failing line shows the assertion that was false, e.g.:
> assert square(3) == 9
E AssertionError: assert 6 == 9
The left‑hand value (6) is the actual result returned by square(3).
The red F at the top signals that at least one test failed.
Organizing Tests for Better Clues
Splitting tests into focused functions gives more granular feedback:
def test_positive():
    assert square(2) == 4
    assert square(3) == 9
def test_negative():
    assert square(-2) == 4
    assert square(-3) == 9
def test_zero():
    assert square(0) == 0
Approach | Pros | Cons
One large test (test_square) | Fewer lines of code | First failure stops the rest; you see only one clue
Multiple focused tests | Each failure is reported; easier to locate the bug | Slightly more boilerplate (extra function names)
Testing Expected Exceptions
pytest.raises – a context manager that asserts a specific exception is raised.
import pytest
def test_type_error():
    with pytest.raises(TypeError):
        square("cat") # passing a string should raise TypeError
If square("cat") does not raise TypeError, the test fails.
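To see roughly what such a check amounts to, here is a plain-Python sketch that needs no pytest. The raises helper is hypothetical and far simpler than pytest.raises; it only reports whether the expected exception type occurred.

```python
def square(n):
    return n * n


def raises(exception_type, func, *args):
    """Return True if func(*args) raises exception_type (hypothetical helper)."""
    try:
        func(*args)
    except exception_type:
        return True
    return False


# Multiplying a string by a string is not defined, so TypeError is raised.
string_rejected = raises(TypeError, square, "cat")

# A valid number raises nothing, so the check reports False.
number_rejected = raises(TypeError, square, 3)

print(string_rejected, number_rejected)
```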
Why Separate Logic from main
square has a well‑defined input (a number) and output (its square).
User‑input handling lives in main, which can be tested separately or mocked.
Keeping pure functions small makes them directly testable without dealing with input() or print().
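One way to picture this separation is sketched below. The read and write parameters on main are an assumption of this example (the course's main calls input and print directly); they are included so main can be exercised without a keyboard.

```python
def square(n):
    """Pure logic: no input() or print(), so it is easy to test."""
    return n * n


def main(read=input, write=print):
    """I/O wrapper; read and write are injectable (an assumption of this sketch)."""
    x = int(read("What's x? "))
    write(f"x squared is {square(x)}")


# Exercise main() without a keyboard by passing stand-in functions.
outputs = []
main(read=lambda prompt: "5", write=outputs.append)
print(outputs)
```

In normal use, main() would be called with no arguments and would fall back to the real input and print.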
Choosing Representative Test Values
Positive numbers (e.g., 2, 3) – typical case.
Negative numbers (e.g., -2, -3) – verify sign‑independent behavior.
Zero – edge case where addition and multiplication both yield 0.
Large numbers – guard against overflow or performance issues (optional).
Invalid types (e.g., strings) – confirm proper exception handling; floats are valid inputs for n * n and can be tested as ordinary cases.
Test File Layout & Directories
Single file (test_calculator.py) works for small projects.
Folder of tests (tests/) allows many modules; running pytest at the project root discovers all files
matching test_*.py.
Comparison with check50
Feature | pytest | check50 (CS50)
Installation | pip install pytest | Built into CS50 environment
Language support | Pure Python | Primarily Python, some C
Custom assertions | assert, pytest.raises | Limited to predefined checks
Test discovery | Automatic by naming convention | Requires explicit check files
Output style | Symbols (. / F) + traceback | Human‑readable messages specific to CS50 assignments
🔧 Advanced Test Patterns
Loop‑Based Assertions (avoiding repetition)
def test_many():
    cases = [(2, 4), (3, 9), (-2, 4), (-3, 9), (0, 0)]
    for n, expected in cases:
        assert square(n) == expected
Uses a list of (input, expected) tuples; keeps the test concise while still checking multiple values.
Parameterized Tests (pytest feature)
@pytest.mark.parametrize – decorator that generates separate test cases from a list of arguments.
import pytest
@pytest.mark.parametrize("n,expected", [
    (2, 4), (3, 9), (-2, 4), (-3, 9), (0, 0)
])
def test_square_param(n, expected):
    assert square(n) == expected
Each tuple becomes an independent test, giving individual pass/fail reports.
📦 Testing Functions that Print
Functions that return values are straightforward to test.
For functions that only print, capture stdout with capsys (a built‑in pytest fixture):
def test_hello_output(capsys):
    hello("David")
    captured = capsys.readouterr()
    assert captured.out == "Hello, David\n"
Keeps the test pure without modifying the function itself.
🛡️ Defensive Testing Practices
Never rely on a single test; add more cases as the code evolves.
Test both happy paths (correct inputs) and error paths (invalid inputs).
Run the full test suite frequently (e.g., before each commit) to catch regressions early.
📚 Key Takeaways
pytest automates test discovery, execution, and reporting, removing the need for manual try/except
scaffolding.
Use assertions for simple input‑output checks; employ pytest.raises to verify proper exception
handling.
Separate pure logic from I/O code to make functions easily testable.
Organize tests into focused functions or parameterized suites to obtain clear, actionable failure
reports.
Incorporate edge‑case values (negative, zero, large, wrong type) to ensure robust code.
🧪 Testing Functions & Return Values
Side effect – an operation that changes state outside the function (e.g., printing to the console) without
returning a value.
Return value – data explicitly sent back to the caller with the return statement.
The original hello function printed its greeting (print) instead of returning it, so it implicitly returned None
and an assertion like assert hello("David") == "Hello, David" failed.
square demonstrated a proper return (return n * n).
🔧 Refactoring hello for Testability
def hello(name="world"):
    return f"Hello, {name}"
Now the caller decides whether to print:
print(hello("David"))
Tests can focus on the return value rather than the side effect.
✅ Writing Tests with pytest
Simple test for default argument:
def test_default():
    assert hello() == "Hello, world"
Test for a supplied argument:
def test_argument():
    assert hello("David") == "Hello, David"
Running pytest test_hello.py yields . for each passed test.
📐 Multiple Tests vs. One Large Test
Approach | Pros | Cons
Single test function with several asserts | Fewer functions | Harder to pinpoint which assertion failed
Separate test functions (test_default, test_argument) | Clear failure location | Slightly more boilerplate
🔁 Loops & Parameterization in Tests
A loop can iterate over many inputs inside a single test:
def test_multiple_names():
    for name in ["Hermione", "Harry", "Ron"]:
        assert hello(name) == f"Hello, {name}"
pytest also supports the @pytest.mark.parametrize decorator, which expresses the same idea while reporting
each input as a separate test.
📁 Organizing Tests in a Package
Create a tests/ folder (any name works).
Inside, place test_hello.py with the test functions.
Add an empty __init__.py to mark the folder as a package, enabling pytest discovery.
project/
│
├─ hello.py
└─ tests/
├─ __init__.py # makes `tests` a package
└─ test_hello.py
Run all tests with pytest tests – pytest recursively finds any file named test_*.py.
📂 File I/O Basics
File handle – the object returned by open() that provides methods like .write() and .close().
📖 Opening Files with open
file = open("names.txt", "w") # write mode (creates or overwrites)
Mode argument determines behavior:
Mode Effect
"w" Truncate file (or create) and write from the start
"a" Append to the end of the file (creates if missing)
"r" Read existing file (default)
✏️ Writing vs. Appending
Using "w" repeatedly overwrites previous content, leaving only the last name.
Switch to append mode ("a") to keep earlier entries:
with open("names.txt", "a") as file:
    file.write(name)  # writes without a newline
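The difference between the modes can be checked with a small experiment. This is only a sketch: it works in a temporary directory, and the file name demo.txt is a made‑up placeholder rather than anything from the lecture.

```python
import os
import tempfile

# A temporary directory keeps the experiment away from real files.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")  # hypothetical file name

    with open(path, "w") as file:  # "w" creates the file
        file.write("Hermione")
    with open(path, "w") as file:  # "w" again truncates, so "Hermione" is lost
        file.write("Harry")
    with open(path, "a") as file:  # "a" keeps "Harry" and adds to the end
        file.write("Ron")

    with open(path) as file:  # default mode "r"
        contents = file.read()

print(contents)  # HarryRon
```

The names run together because write() adds nothing on its own, which is the problem the next section addresses.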
📐 Adding Newlines
print() automatically appends \n; file.write() does not.
Include an explicit newline in the string:
file.write(f"{name}\n")
🛡️ Using the with Context Manager
with ensures the file is closed automatically, even if an error occurs:
with open("names.txt", "a") as file:
    file.write(f"{name}\n")
No explicit file.close() needed.
📋 Example: Collecting Multiple Names and Saving Them
1. Collect names (e.g., three times) and store in a list:
names = []
for _ in range(3):
    name = input("What's your name? ")
    names.append(name)
2. Sort and display (optional):
for name in sorted(names):
    print(f"Hello, {name}")
3. Persist to a file using the with‑append pattern:
with open("names.txt", "a") as file:
    for name in names:
        file.write(f"{name}\n")
Running the script multiple times adds each new batch of names to names.txt, each on its own line.
🧩 Key Takeaways
Testable design: prefer functions that return data rather than produce side effects.
pytest encourages small, focused test functions; separate tests make debugging easier.
Loops can streamline repetitive assertions, but keep individual tests simple.
Test organization: place tests in a dedicated folder with __init__.py for package recognition.
File I/O: use the correct mode ("w" vs. "a"), remember to add newlines when writing text, and employ
with for safe automatic closing.
These practices together support reliable, maintainable code and automated verification of program behavior.
📂 Reading Files with a with Statement
with statement – a context manager that automatically closes the file when the block ends, preventing
forgotten close() calls.
Open a file for reading (default mode is 'r'):
with open("names.txt") as file:
    # file is available inside this block
    ...
# file is closed automatically here
Reading all lines at once
lines = file.readlines() # returns a list of strings, each ending with '\n'
Iterating over the list
for line in lines:
    print("Hello,", line)
🛠️ Handling Newline Characters
The newline character (\n) stored in the file and the newline added by print() can combine to produce blank
lines.
Option 1: suppress print’s newline
print("Hello,", line, end="")
Option 2 (preferred): strip the trailing newline from each line
print("Hello,", line.rstrip())
rstrip() removes whitespace (including \n) from the right‑hand side of the string.
📈 Iterating Directly Over a File Object
Instead of loading the entire file into memory, you can loop over the file object itself:
with open("names.txt") as file:
    for line in file:  # yields one line per iteration
        print("Hello,", line.rstrip())
This approach is more memory‑efficient and reads lines lazily.
🔀 Sorting Data Read from a File
When output order matters (e.g., “Draco first, then Harry, Hermione, Ron”), you must:
1. Collect all lines into a list.
2. Sort the list.
3. Print the sorted result.
names = []  # empty list to accumulate
with open("names.txt") as file:
    for line in file:
        names.append(line.rstrip())  # strip newline before storing
for name in sorted(names):  # default ascending order
    print(f"Hello, {name}")
Reverse order (Z → A) uses the reverse keyword argument:
for name in sorted(names, reverse=True):
    print(f"Hello, {name}")
Sorting must occur after all names are in memory; otherwise the output would be printed before the list is
ordered.
📂 Working with CSV Files
CSV (Comma‑Separated Values) stores multiple fields on one line, separated by commas.
Sample students.csv
Hermione,Gryffindor
Harry,Gryffindor
Ron,Gryffindor
Draco,Slytherin
Reading and Parsing CSV Rows
with open("students.csv") as file:
    for line in file:
        row = line.strip().split(",")  # ['Hermione', 'Gryffindor']
        print(f"{row[0]} is in {row[1]}")
Unpacking for Clarity
If each line always yields exactly two columns, unpack directly:
with open("students.csv") as file:
    for line in file:
        name, house = line.strip().split(",")
        print(f"{name} is in {house}")
Unpacking eliminates the need for index notation (row[0], row[1]) and improves readability.
✏️ Modifying File Content
In‑place edits are not trivial. The usual pattern is:
1. Read the entire file into a list.
2. Apply changes to the list in memory (e.g., change Harry's house to "Slytherin").
3. Open the same file in write mode ('w') and write the updated list back.
Sketch of the steps:
Read lines → lines = file.readlines()
Modify with a loop or comprehension → new_lines = [...]
Write back → with open("students.csv", "w") as out: out.writelines(new_lines)
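The three steps can be spelled out as a runnable sketch. It assumes the goal is to move Harry to Slytherin and uses a throwaway copy of students.csv in a temporary directory, so the sample rows are stand‑ins rather than the lecture's actual file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "students.csv")

    # Sample data standing in for the real file.
    with open(path, "w") as file:
        file.write("Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\n")

    # 1. Read the entire file into a list of lines.
    with open(path) as file:
        lines = file.readlines()

    # 2. Modify the data in memory: change Harry's house.
    new_lines = []
    for line in lines:
        name, house = line.rstrip("\n").split(",")
        if name == "Harry":
            house = "Slytherin"
        new_lines.append(f"{name},{house}\n")

    # 3. Reopen in write mode (which truncates) and write everything back.
    with open(path, "w") as out:
        out.writelines(new_lines)

    with open(path) as file:
        updated = file.read()

print(updated)
```

Because step 3 truncates the file before writing, a crash between truncation and the final write would lose data; writing to a second file and renaming it is a common safer variant.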
📊 Additional File‑IO Operations
Task | Approach
Limit number of entries | Count lines (sum(1 for _ in file)) before adding new ones; exit with sys.exit() if limit exceeded.
Find a specific name | Loop through lines and use if line.strip().startswith("Harry"): to detect the target, then act (e.g., print a message).
Count total names | len(names) after collecting them in a list.
Reverse sorting | sorted(names, reverse=True).
Strip whitespace only when needed | Use strip() for both ends, rstrip() for trailing newline, lstrip() for leading whitespace.
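Two of the rows above, counting lines and finding a name, might look like this in practice. The sketch uses io.StringIO as a stand‑in for an open text file, and MAX_NAMES is a hypothetical limit chosen only for illustration.

```python
import io

MAX_NAMES = 5  # hypothetical limit, not from the lecture

# io.StringIO behaves like an open text file, so the sketch runs anywhere.
file = io.StringIO("Hermione\nHarry\nRon\nDraco\n")

count = sum(1 for _ in file)   # iterating once counts the lines
room_left = count < MAX_NAMES  # a real program might call sys.exit() when this is False

file.seek(0)  # rewind, because the first pass consumed the file
found = [line.strip() for line in file if line.strip().startswith("Harry")]

print(count, room_left, found)
```

The seek(0) call matters: after the counting pass the file position is at the end, so a second loop without rewinding would see no lines.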
📚 Key Functions & Methods Summary
Function / Method | Purpose | Typical Usage
open(filename, mode) | Open a file; default mode 'r'. | with open("names.txt") as f:
file.readlines() | Return a list of all lines (including \n). | lines = f.readlines()
for line in file: | Iterate lazily over each line. | for line in f:
str.strip() | Remove leading & trailing whitespace. | clean = line.strip()
str.rstrip() | Remove trailing whitespace (e.g., \n). | clean = line.rstrip()
str.split(sep) | Split a string into a list using sep (comma for CSV). | parts = line.split(",")
list.append(item) | Add item to the end of a list. | names.append(name)
sorted(iterable, reverse=False) | Return a new sorted list; reverse toggles descending order. | sorted_names = sorted(names, reverse=True)
with (context manager) | Ensure resources (like files) are properly closed. | with open(...) as f:
These constructs together enable robust reading, cleaning, sorting, and (when needed) updating of plain‑text and
CSV data files.
📜 Constructing Sentences from CSV and Initial Sorting
A naïve approach builds the output string while reading each line, appends the string to a list, and finally sorts
that list.
The list students holds complete English sentences such as "Harry is in Gryffindor".
Sorting works because the sentences start with the student’s name, but the sort key is the whole
sentence rather than the name itself.
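A minimal sketch of that naïve approach, assuming the four sample rows from students.csv shown earlier; io.StringIO stands in for the opened file so the example is self‑contained.

```python
import io

# Stand-in for open("students.csv"); the rows mirror the sample file.
file = io.StringIO(
    "Hermione,Gryffindor\nHarry,Gryffindor\nRon,Gryffindor\nDraco,Slytherin\n"
)

students = []
for line in file:
    name, house = line.rstrip().split(",")
    students.append(f"{name} is in {house}")  # the sentence is built immediately

# sorted() compares whole sentences; the order looks right only because
# every sentence happens to begin with the student's name.
ordered = sorted(students)
for sentence in ordered:
    print(sentence)
```

Once the sentence is built, the name and house can no longer be sorted or filtered independently, which motivates the dictionary‑based version.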
📚 Dictionaries for Structured Data
Dictionary – a mutable collection that maps keys to values.
Instead of concatenating strings early, store each record as a dictionary:
students = []  # list that will contain dictionaries
for line in file:
    name, house = line.strip().split(",")
    student = {"name": name, "house": house}
    students.append(student)
This keeps the name and house separate, enabling flexible processing later.
🔀 Sorting a List of Dictionaries
The built‑in sorted() function can sort complex objects when supplied with a key argument that extracts the
desired comparison value.
Defining a key‑extracting function
def get_name(student):
    return student["name"]
for s in sorted(students, key=get_name):
    print(f"{s['name']} is in {s['house']}")
sorted(..., key=get_name) calls get_name for each dictionary, using the returned name for alphabetical
ordering.
Reversing the order
for s in sorted(students, key=get_name, reverse=True):
    ...
Setting reverse=True flips the sorted sequence.
Sorting by a different field
def get_house(student):
    return student["house"]
Replacing key=get_name with key=get_house orders the output by house instead of by name.
✨ Concise Dictionary Creation
Python allows dictionary literals with keys and values in a single expression.
student = {"name": name, "house": house}
This replaces the three‑step pattern of creating an empty dict and then assigning each key.
🛠️ Using Lambda (Anonymous) Functions as Keys
Lambda – an unnamed, inline function defined with the lambda keyword.
sorted(students, key=lambda s: s["name"])
The lambda receives a dictionary s and returns s["name"].
Useful when the key function is needed only once, avoiding a separate def block.
Approach When to prefer
Named function (def get_name) Reuse in multiple places or when logic is complex
Lambda One‑off simple extraction (e.g., a single field)
🏡 Switching Context: From House to Home
Variable names were updated from house to home to reflect new data (e.g., “Number 4 Privet Drive”).
The same dictionary‑based logic applies; only the key names change.
⚠️ Pitfall: Splitting on commas when data contains commas
Changing a home address to "Number 4, Privet Drive" introduced an extra comma.
line.strip().split(",") then produced three pieces, causing “too many values to unpack” (ValueError).
Why the error occurs
When the CSV line contains a comma inside a field, naïve split(",") treats it as a delimiter, producing more
columns than expected.
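The failure is easy to reproduce. The row below is a made‑up sample in the same shape as the problem line, not a line taken from the lecture's file.

```python
line = "Harry,Number 4, Privet Drive\n"  # sample row with a comma inside the address

pieces = line.strip().split(",")
print(pieces)  # ['Harry', 'Number 4', ' Privet Drive']

message = ""
try:
    name, home = pieces  # three values cannot be unpacked into two names
except ValueError as error:
    message = str(error)
    print("ValueError:", message)
```

split(",") has no notion of fields or quoting; it only sees commas, so any comma in the data changes the number of pieces.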
📂 Leveraging the csv Module
The csv module handles quoted fields, embedded commas, and other edge cases automatically.
import csv
students = []
with open("students.csv", newline="") as file:
    reader = csv.reader(file)  # parses CSV using the default "excel" dialect
    for row in reader:  # each row is a list of fields
        name, home = row  # unpack directly (exactly two columns)
        students.append({"name": name, "home": home})
csv.reader respects quotes, so an address like "Number 4, Privet Drive" is treated as a single field.
Alternative unpacking style
for name, home in csv.reader(file):
    students.append({"name": name, "home": home})
Direct unpacking eliminates the need for index notation (row[0], row[1]).
📊 Sorting After Using csv.reader
for s in sorted(students, key=lambda d: d["name"]):
    print(f"{s['name']} is from {s['home']}")
The same sorted(..., key=...) pattern works with dictionaries built from the CSV reader.
📁 Reading & Writing Simultaneously
Reads and writes both happen at the current file position, which advances as data is consumed.
After reading a file to the end, the position sits at the end, so to overwrite the content you must open the file
in a mode that supports both reading and writing (r+) and seek back to the start.
Typical workflow:
import csv
with open("students.csv", "r+", newline="") as file:
    reader = csv.reader(file)
    data = list(reader)  # read everything into memory
    # ... modify `data` as needed ...
    file.seek(0)  # rewind to beginning
    writer = csv.writer(file)
    writer.writerows(data)  # overwrite with updated rows
    file.truncate()  # remove any leftover old content
The seek(0) call resets the pointer so that subsequent writes start at the file’s beginning.
truncate() ensures the file does not retain trailing data from the previous version.
📑 Key Takeaways
Store related attributes in dictionaries rather than concatenated strings.
Use sorted(..., key=func) to sort a list of dictionaries by any field.
Lambda provides a concise, anonymous key function when a full def is unnecessary.
When CSV fields may contain commas, rely on the csv module instead of manual split.
Reading and writing to the same file requires explicit pointer management (seek, truncate).
📂 Advanced CSV Reading & Writing
🗂️ Header Row & Dictionary Reader
Dictionary reader – a CSV iterator that returns each row as a dict keyed by the column names from the first line
of the file.
Include a header line in the CSV (e.g., name,home) so the reader can map values automatically.
Access fields by name instead of index, making the code resilient to column re‑ordering.
import csv
with open("students.csv") as f:
    reader = csv.DictReader(f)  # reads header and creates dicts
    for row in reader:
        name = row["name"]
        home = row["home"]
        # …process the values…
✍️ Writing CSV Files
1️⃣ Using csv.writer (list‑based)
Writer – writes rows as plain lists; the order of values must match the column order in the file.
import csv
with open("students.csv", "a", newline="") as f:  # append mode
    writer = csv.writer(f)
    writer.writerow([name, home])  # list → CSV line
newline="" prevents extra blank lines on Windows.
The library automatically quotes fields that contain commas (e.g., "Number 4, Privet Drive").
2️⃣ Using csv.DictWriter (key‑based)
DictWriter – writes rows from a dict; column order is supplied via fieldnames.
import csv
with open("students.csv", "a", newline="") as f:
    fieldnames = ["name", "home"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writerow({"name": name, "home": home})
The order of keys in the dictionary does not matter; fieldnames tells the writer which column each key
belongs to.
📐 Handling Commas & Quoting
When a field itself contains a comma, the CSV module surrounds the whole field with double quotes.
This ensures that subsequent reads correctly split on the real column separator, not the embedded
comma.
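A quick round trip shows the quoting at work. This sketch writes to an in‑memory io.StringIO buffer instead of a real file; the row is sample data.

```python
import csv
import io

buffer = io.StringIO(newline="")  # in-memory stand-in for a file opened with newline=""
writer = csv.writer(buffer)
writer.writerow(["Harry", "Number 4, Privet Drive"])

text = buffer.getvalue()
print(repr(text))  # the address appears wrapped in double quotes

# Reading the same text back yields the original two fields.
row = next(csv.reader(io.StringIO(text, newline="")))
print(row)
```

Fields without commas, such as the name, are written unquoted under the default settings, so quoting only appears where it is needed.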
🔁 Adding Rows Incrementally (append mode)
Open the CSV with mode "a" to keep existing data and add new rows at the end.
No need to read‑modify‑write the whole file; each execution merely appends one new record.
🛡️ Defensive CSV Design
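Storing column names in the first row lets your program infer the structure at runtime.
If a collaborator re‑orders columns (e.g., home,name) or inserts a new column (e.g., house), code that
reads with DictReader continues to work without changes, because fields are looked up by name.
Writing is stricter: DictWriter's fieldnames must still match the file's columns, and by default it raises
ValueError for dictionary keys that are not listed in fieldnames.
This is an example of defensive programming: rely on explicit metadata (the header) rather than
implicit positional assumptions.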
📊 Best Practices for CSV Validation
Assume human‑edited CSVs may contain stray whitespace, missing columns, or extra commas.
Use the CSV module’s built‑in quoting and header handling to reduce parsing errors.
For stricter validation, combine a try/except around the reading loop or pre‑check the header row
against an expected list of field names.
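One possible shape for such a header pre‑check is sketched below. The function name load_students and the EXPECTED list are hypothetical, and io.StringIO stands in for real files.

```python
import csv
import io

EXPECTED = ["name", "home"]  # hypothetical list of required columns


def load_students(file):
    """Return rows as dicts, or raise ValueError if a required column is missing."""
    reader = csv.DictReader(file)
    header = reader.fieldnames or []
    missing = [field for field in EXPECTED if field not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Strip stray whitespace; treat a missing value (None) as an empty string.
    return [{field: (row[field] or "").strip() for field in EXPECTED} for row in reader]


# Columns are re-ordered and padded with spaces, yet the result is clean.
good = io.StringIO("home,name\n Number 4 Privet Drive ,Harry\n")
students = load_students(good)
print(students)

rejected = False
try:
    load_students(io.StringIO("name\nHarry\n"))  # no "home" column
except ValueError as error:
    rejected = True
    print("rejected:", error)
```

Checking the header once, before the loop, gives a single clear error instead of a KeyError partway through the file.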
🖥️ Command‑Line Arguments Revisited
📦 sys.argv & Slicing
sys.argv – list of strings representing the command‑line invocation; index 0 is the script name.
To ignore the script name and process only user‑supplied arguments, slice from index 1:
import sys
for arg in sys.argv[1:]:
    # handle each argument (e.g., file name)
    ...
This pattern is used when feeding multiple CSV filenames or image files to a program.
🔄 Using Arguments with CSV Programs
Prompt the user for a name and home, then write the pair to the CSV as shown in the csv.writer and
csv.DictWriter examples.
The same sys.argv slice can be used to let the user specify the target CSV file or to pass a list of image
files for later processing.
🎨 Image Processing with Pillow
📥 Importing Required Libraries
import sys
from PIL import Image
PIL (Pillow) provides high‑level image manipulation, including creation of animated GIFs.
📁 Building a List of Images from argv
images = []
for path in sys.argv[1:]:  # skip the script name
    img = Image.open(path)  # opens any supported image format
    images.append(img)
The list preserves the order in which the filenames were supplied on the command line.
📸 Creating an Animated GIF
images[0].save(
    "costumes.gif",  # output filename
    save_all=True,  # include all frames
    append_images=images[1:],  # subsequent frames
    duration=200,  # 200 ms per frame
    loop=0,  # 0 → infinite looping
)
save_all=True tells Pillow to treat the image as a multi‑frame sequence.
append_images receives the remaining frames; the first frame is the image on which save is called.
duration sets the pause between frames (in milliseconds).
loop=0 means the GIF repeats forever; any positive integer would limit the repetitions.
⚙️ Key Parameters Explained
Parameter Meaning
save_all=True Preserve every image in the list as a frame.
append_images= List of additional frames after the first.
duration= Milliseconds each frame is displayed.
loop= Number of animation cycles (0 = infinite).
📂 Handling File Names & Slices
sys.argv[0] is the script (costumes.py).
Using sys.argv[1:] cleanly extracts only the image filenames, regardless of how many are supplied.
🛠️ General File‑IO Takeaways
📄 Text vs. Binary Files
CSV and plain‑text files are human‑readable; each line ends with a newline (\n).
Image files (GIF, PNG, JPEG) are binary; Pillow abstracts opening, modifying, and saving them without
manual byte handling.
📋 Defensive Practices for Human‑Edited Data
Rely on header rows and dictionary‑based I/O to guard against column re‑ordering.
Use the CSV module’s quoting mechanisms to safely store fields that contain delimiters.
When writing, open files in append mode ("a") to avoid unintentionally overwriting existing data.
For binary assets (images, audio, video), let a dedicated library (e.g., Pillow) manage file opening and
closing; you only need to call its high‑level API.
📧 Simple Email Validation – “@” Check
Definition – A valid email address must contain at least one at sign (@).
Prompt the user and remove surrounding whitespace:
email = input("What’s your email? ").strip()
Minimal test:
if '@' in email:
    print("valid")
else:
    print("invalid")
Limitation – Accepts strings such as "@", "@.", or "username@", which are not real email addresses.
🔧 Adding a Dot Check
Goal – Ensure the domain part contains a period (.).
if '@' in email and '.' in email:
    print("valid")
else:
    print("invalid")
Still over‑permissive: "@.", "username@.", or "@domain.com" all pass.
🧩 Splitting the Address
str.split(sep) returns a list of substrings separated by sep.
username, domain = email.split('@')
Truthy check: non‑empty strings evaluate to True.
if username and '.' in domain:
    print("valid")
else:
    print("invalid")
Improves readability and rejects an empty username, but still accepts "username@." (the domain is just a dot).
The unpacking also raises ValueError when the input contains no @ or more than one.
🎯 Restricting to “.edu” Domains
str.endswith(suffix) returns True if the string ends with suffix.
if username and domain.endswith('.edu'):
    print("valid")
else:
    print("invalid")
Now the address must end with “.edu”, yet "username@.edu" (nothing before the dot) is still accepted.
🧪 Introducing the re Module
Regular expression (regex) – a pattern that describes a set of strings.
Import the library:
import re
Basic use with re.search(pattern, string) (returns a match object when the pattern is found).
if re.search('@', email):
    print("valid")
else:
    print("invalid")
This mirrors the earlier in test but opens the door to richer patterns.
📚 Core Regex Metacharacters
Symbol Meaning
. Any character except a newline
* Zero or more repetitions of the preceding token
+ One or more repetitions
? Zero or one repetition
{m} Exactly m repetitions
{m,n} Between m and n repetitions (inclusive)
\. Literal dot (escaped)
Note – * is greedy and may match an empty string; + guarantees at least one character.
📈 Building a First Regex for Email
Desired structure: something @ something.
Using . for “any character” and * for “zero or more”:
pattern = r".*@.*"
* permits empty parts, so "@" still matches.
✅ Switching to + for Required Characters
Replace * with + to enforce at least one character on each side:
pattern = r".+@.+"
Now "@" fails, while "user@domain" succeeds.
🔁 Emulating + with Only *
If + were unavailable, the equivalent expression is ..* (a dot for the first required character, followed by .* for the
rest).
pattern = r"..*@..*"
Works but is less readable than using +.
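The equivalence can be checked directly by running both patterns over the same inputs; the test strings are arbitrary examples.

```python
import re

plus_form = r".+@.+"
star_form = r"..*@..*"  # same meaning, written without +

agreement = []
for text in ["@", "a@", "@b", "a@b", "user@domain"]:
    with_plus = bool(re.search(plus_form, text))
    with_star = bool(re.search(star_form, text))
    agreement.append(with_plus == with_star)
    print(f"{text!r}: {with_plus} {with_star}")
```

Both columns agree on every input: only strings with at least one character on each side of the @ match.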
🤖 How re.search Works – Finite State Machine
Finite State Machine (FSM) – a conceptual model with states and transitions used by the regex engine.
Start state → consumes characters that satisfy the first token (.).
Transition on encountering @ moves to the next state.
Accept (final) state is reached only after the entire pattern is satisfied.
If the engine cannot follow a transition for a given input character, the match fails and the address is
deemed invalid.
🚨 Common Pitfall: Unescaped Dot
In a pattern like .+@.+.edu, the final . is interpreted as “any character”, so "user@domain?edu" or
"user@domainxedu" would match.
To require a literal period before “edu”, escape the dot with a backslash (\.).
pattern = r".+@.+\.edu"
This now matches addresses that contain an @, at least one character after it, and then a literal
“.edu”; anchoring with $ is what forces “.edu” to come last.
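The effect of the escape can be seen by running an escaped and an unescaped pattern side by side. The addresses are illustrative inputs, and the names loose and strict are just labels for this sketch.

```python
import re

loose = r".+@.+.edu"    # unescaped dot: any character may precede "edu"
strict = r".+@.+\.edu"  # escaped dot: a literal period is required

outcomes = {}
for address in ["user@domain.edu", "user@domain?edu", "user@domainxedu"]:
    outcomes[address] = (
        bool(re.search(loose, address)),
        bool(re.search(strict, address)),
    )
    print(address, outcomes[address])
```

The loose pattern accepts all three inputs, while the strict one accepts only the address with a real period before “edu”.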
📄 Final Practical Regex for “.edu” Emails
import re
email = input("What’s your email? ").strip()
if re.search(r".+@.+\.edu$", email):
    print("valid")
else:
    print("invalid")
.+ before @ guarantees a non‑empty username.
.+ after @ guarantees a non‑empty domain before the final “.edu”.
\.edu$ forces the string to terminate with the literal “.edu”.
🛡️ Summary of Validation Steps
Step | Technique | Why it Helps
Trim whitespace | str.strip() | Removes accidental leading/trailing spaces
Basic presence | '@' in email | Guarantees the essential separator
Split & check | email.split('@') | Isolates username and domain
Domain suffix | domain.endswith('.edu') | Limits to academic addresses
Regex pattern | re.search(r".+@.+\.edu$", email) | Enforces non‑empty parts and literal “.edu”
Escape metacharacters | \. | Prevents the dot from matching any character
These incremental refinements illustrate how moving from simple string checks to a concise regular expression
yields a more robust email validator while keeping the code readable.
🔧 Escape Characters in Regular Expressions
Escape character (\) – tells the regex engine to treat the following character literally instead of as a special
meta‑character.
Example: \. matches a literal period rather than “any character”.
Needed for characters such as . * + ? ^ $ [ ] ( ) { } | \.
📐 Raw Strings for Regex Patterns
Raw string (r"…") – a Python string literal that leaves backslashes untouched, passing them straight to re.
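Without r, Python processes backslash escapes first: "\b" becomes a backspace character instead of the regex word boundary, and unrecognized escapes such as "\." trigger a warning in recent Python versions.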
Recommended habit: always prefix regex patterns with r to avoid accidental escape processing.
import re
email = input("Email? ").strip()
if re.search(r".+@.+\.edu$", email):
    print("valid")
✅ Anchors: Start (^) and End ($)
Start anchor (^) – asserts that the match must begin at the start of the string.
End anchor ($) – asserts that the match must end at the end of the string (or just before a single trailing newline).
Using both forces an exact‑match pattern rather than a substring search.
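A short experiment makes the difference between substring search and anchored matching visible. The inputs are invented for illustration.

```python
import re

address = "user@example.edu.evil.com"  # illustrative input

anywhere = bool(re.search(r"\.edu", address))  # ".edu" appears somewhere in the string
at_end = bool(re.search(r"\.edu$", address))   # the string actually ends with ".com"

exact = bool(re.search(r"^cat$", "cat"))           # the whole string is "cat"
inside = bool(re.search(r"^cat$", "concatenate"))  # "cat" is only a substring

print(anywhere, at_end, exact, inside)  # True False True False
```

Note that anchors constrain position only; a pattern made of .+ on both sides still accepts almost anything between ^ and $.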
🔤 Literal Dot and Escaping
Literal dot (\.) – matches a period character.
Without escaping, . means “any single character”.
In an email validator, \.edu ensures the domain ends with “.edu” exactly.
📊 Quantifiers and Alternatives
Quantifier | Meaning | Example
* | zero or more of the preceding token | a* matches the empty string, a, aa, …
+ | one or more | a+ matches a, aa, … (not empty)
{m} | exactly m repetitions | a{3} matches aaa
{m,} | m or more | a{2,} matches aa, aaa, …
{m,n} | between m and n (inclusive) | a{2,4} matches aa, aaa, aaaa
? | zero or one (makes preceding token optional) | colou?r matches color or colour
The + quantifier is often preferred for “at least one” because it is concise.
🔁 Character Classes and Negation
Character class ([ … ]) – matches any one character inside the brackets.
Negated class ([^ … ]) – matches any character not listed.
# any character except '@'
[^@]+
Used to restrict the username and domain parts of an email so they cannot contain additional @
symbols.
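As a sketch, a pattern built from [^@]+ rejects addresses with a second @. The pattern combines the negated class with the anchors and escaped dot covered above, and the sample addresses are invented.

```python
import re

pattern = r"^[^@]+@[^@]+\.edu$"  # neither side of the "@" may contain another "@"

checks = {}
for address in ["user@example.edu", "user@@example.edu", "us@er@example.edu"]:
    checks[address] = bool(re.search(pattern, address))
    print(address, checks[address])
```

Only the first address passes; in the other two, the engine meets an @ where the negated class requires some other character.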
📚 Shorthand Character Classes
Shorthand Meaning
\d any decimal digit (0‑9)
\D any non‑digit
\w any “word” character (letters, digits, underscore)
\W any non‑word character
\s any whitespace (space, tab, newline)
\S any non‑whitespace
\w+ is a compact way to require one or more alphanumeric/underscore characters.
📂 Using Regex in Everyday Tools
Google Forms: under “Response validation” you can select “Regular expression” to enforce patterns
such as email addresses.
VS Code Find/Replace: check “Use Regular Expression” to search for patterns instead of literal text.
Office 365 / VS Code: largely the same regex syntax applies, allowing powerful bulk edits, though each tool's regex dialect differs in small details.
🧩 Grouping and Alternation
Parentheses (( … )) – create a group, affecting quantifier scope and enabling back‑references.
Alternation (|) – logical OR inside a group.
# accept .edu or .org
\.(edu|org)$
Grouping also clarifies precedence when mixing quantifiers and anchors.
🛠️ Practical Email‑Validation Regex
A robust pattern that:
Starts at the beginning of the string (^)
Allows only word characters before the @ (\w+)
Allows only word characters after the @ (\w+)
Requires a literal dot followed by “edu” (\.edu)
Ends at the string’s end ($)
pattern = r'^\w+@\w+\.edu$'
Adding ^ and $ eliminates false positives such as “My email is user@example.edu.”
Replacing the explicit [a‑zA‑Z0‑9_] sets with \w shortens the expression without losing meaning.
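Trying the pattern on a few inputs shows both what it catches and where it is stricter than real addresses require. The sample strings are invented for this sketch.

```python
import re

pattern = r"^\w+@\w+\.edu$"

samples = [
    "user@example.edu",              # accepted
    "My email is user@example.edu",  # rejected: extra words before the address
    "user@example.com",              # rejected: wrong suffix
    "first.last@example.edu",        # rejected: \w does not match "."
]
results = {sample: bool(re.search(pattern, sample)) for sample in samples}
for sample, ok in results.items():
    print(f"{sample!r}: {'valid' if ok else 'invalid'}")
```

The last case is a real limitation rather than a feature: legitimate usernames containing dots are refused until the pattern is extended.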
📌 Key Takeaways
Escape special characters with \ to match them literally.
Prefix regex strings with r to avoid Python’s own escape processing.
Use ^ and $ to enforce whole‑string matches.
Character classes ([ … ]) and their negated form ([^ … ]) give fine‑grained control over allowed
characters.
Shorthand classes (\d, \w, \s) make common patterns concise.
Quantifiers (*, +, {m,n}) define how many times a token may appear.
Parentheses and | enable grouping and alternation, useful for multiple acceptable suffixes.
Regex capabilities extend beyond code: they are usable in Google Forms, VS Code, and other
productivity tools.
🧩 Grouping & Alternation in Regular Expressions
Grouping – parentheses () collect several tokens into a single unit that can be quantified or referenced later.
Alternation – the vertical bar | means “either the expression on the left or the expression on the right”.
pattern = r"(cat|dog)s?" # matches "cat", "cats", "dog", "dogs"
The quantifier that follows the closing parenthesis applies to the whole group.
Grouping is also required when you want to apply a quantifier to a sequence that contains an
alternation.
␣ Including Spaces
\s matches any whitespace character (space, tab, newline).
When a literal space is needed, it can be placed directly inside a character class:
pattern = r"[A-Za-z0-9_ ]+" # one or more word characters **or** a space
Using \s is broader (captures tabs, newlines) while a literal space is the narrowest match.
🔡 The \w Shortcut
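\w is roughly [A-Za-z0-9_] (letters, digits, underscore); for str patterns in Python 3 it also matches non‑ASCII letters and digits unless re.ASCII is set.
Does not match a period (.) or a space.
Consequently, an address with an extra dot, such as user@cs50.harvard.edu, is rejected unless the pattern is extended.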
📏 Case Sensitivity & the re.IGNORECASE Flag
By default, regex matching is case‑sensitive.
The third argument to re.search (and other re functions) can be a flag that modifies the search behavior.
import re
email = "USER@EXAMPLE.EDU"  # illustrative uppercase address
if re.search(r".+@.+\.edu$", email, re.IGNORECASE):
    print("valid")
re.IGNORECASE makes [A‑Z] and [a‑z] indistinguishable.
Other useful flags (mentioned briefly):
re.MULTILINE – ^ and $ match the start/end of each line.
re.DOTALL – . also matches newline characters.
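Since these two flags are only mentioned in passing, a small demonstration may help. The two‑line sample text is made up, and re.findall (which returns all matches as a list) is used here purely to display the results.

```python
import re

text = "first line\nsecond line"

# Default: ^ anchors only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
# re.MULTILINE: ^ also matches right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# Default: . stops at a newline, so the pattern cannot span two lines.
default_span = re.search(r"first.+second", text)
# re.DOTALL: . matches newlines too.
dotall_span = re.search(r"first.+second", text, re.DOTALL)

print(default_starts, multiline_starts)
print(default_span is None, dotall_span is not None)
```

Flags can be combined with |, for example re.MULTILINE | re.IGNORECASE.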
🏷️ Making Parts Optional with ?
? – zero or one occurrence of the preceding token (makes it optional).
To allow an optional sub‑domain such as cs50. before harvard.edu:
pattern = r".+@(?:[A-Za-z0-9_]+\.)?harvard\.edu$"
(?: … ) creates a non‑capturing group; the trailing ? says the whole group may be absent.
If the group were omitted, only the dot would become optional, which is not the intended behavior.
⭐ Quantifiers at a Glance
Symbol Meaning Typical use
* zero or more flexible repetitions
+ one or more require at least one
? zero or one make a segment optional
{m,n} between m and n precise range control
📚 re Function Variants
Function | Start position | End position | Need for ^/$
re.search | anywhere in the string | anywhere | explicit anchors required
re.match | always at the beginning | anywhere | ^ optional
re.fullmatch | always at the beginning | always at the end | both ^ and $ optional
Using re.fullmatch eliminates the need for explicit start/end anchors when you want an exact‑match pattern.
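The three functions can be compared on the same unanchored pattern. The inputs below are illustrative.

```python
import re

pattern = r"\w+@\w+\.edu"    # deliberately written without ^ or $
text = "user@example.edu!!"  # illustrative input with trailing junk

search_mid = bool(re.search(pattern, "Email: " + text))  # match may start anywhere
match_mid = bool(re.match(pattern, "Email: " + text))    # must start at index 0
match_start = bool(re.match(pattern, text))              # trailing junk is ignored
full_junk = bool(re.fullmatch(pattern, text))            # whole string must match
full_clean = bool(re.fullmatch(pattern, "user@example.edu"))

print(search_mid, match_mid, match_start, full_junk, full_clean)
# True False True False True
```

re.match is the easy one to misread: it anchors the start only, so it still needs $ when an exact match is intended.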
🛠️ Raw Strings for Regex
Prefixing a string with r tells Python not to treat backslashes as escape characters.
pattern = r"\d+\.\d+" # matches a decimal number like 3.14
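Without the r prefix, "\d" is an unrecognized escape: Python currently keeps the backslash but emits a warning, and escapes it does recognize (such as "\b" for backspace) silently change the pattern.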
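The practical risk shows up with escapes that Python does recognize. The sketch below uses \b, which means backspace in an ordinary string but word boundary in a regex; the sample sentence is invented.

```python
import re

# In a normal string "\b" is one character (backspace); in a raw string it
# stays as two characters, backslash + b, which re reads as a word boundary.
normal_len = len("\b")
raw_len = len(r"\b")

raw_hit = bool(re.search(r"\bcat\b", "a cat sat"))    # word-boundary match
normal_hit = bool(re.search("\bcat\b", "a cat sat"))  # searches for backspace characters

# Doubling every backslash is the non-raw equivalent of a raw pattern.
same = "\\d+\\.\\d+" == r"\d+\.\d+"

print(normal_len, raw_len, raw_hit, normal_hit, same)  # 1 2 True False True
```

Raw strings avoid both the silent change of meaning and the need for doubled backslashes.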
📧 Extending Email Validation
Typical building steps (referencing earlier sections on \w, \s, and case handling):
1. Base username – one or more word characters: \w+
2. Optional dot in username – (?:\.\w+)?
3. Domain – optional sub‑domain, then required domain, then TLD:
email_pattern = r"""
    ^                 # start of string
    \w+(?:\.\w+)?     # username, optional dot segment
    @
    (?:\w+\.)?        # optional subdomain like cs50.
    harvard\.edu$     # fixed domain and TLD
"""
if re.search(email_pattern, address, re.IGNORECASE | re.VERBOSE):
    print("valid")
re.VERBOSE (or re.X) lets the pattern span multiple lines and ignore whitespace, improving readability.
The ? after the sub‑domain group makes the whole “subdomain.” segment optional.
🧹 Cleaning Names with split
When users enter names in “Last, First” order, a quick fix is:
name = input("What’s your name? ").strip()
if "," in name:
    last, first = name.split(", ")
    name = f"{first} {last}"
print(f"Hello, {name}")
strip() removes surrounding whitespace.
split(", ") returns exactly two parts only when the input contains a single ", "; the unpacking (last, first) works only in that case.
⚠️ Fragile Splitting & Error Handling
If the input lacks a comma or the expected space after the comma (one element), or contains several ", " separators
(three or more elements), the unpacking raises a ValueError.
try:
    last, first = name.split(", ")
except ValueError:
    # fallback: treat the whole input as a single name
    first, last = name, ""
Guarding against malformed data prevents the program from crashing when processing large CSVs
collected from forms.
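As an alternative to try/except, the splitting can be written so that it never raises. The sketch below swaps in str.partition, which always returns three parts; the helper name reformat_name is hypothetical and not from the course.

```python
def reformat_name(raw):
    """Turn 'Last, First' into 'First Last'; return other input stripped but unchanged."""
    name = raw.strip()
    # partition never raises: it returns (before, separator, after),
    # with empty separator and after when no comma is present
    last, comma, first = name.partition(",")
    if comma and first.strip():
        return f"{first.strip()} {last.strip()}"
    return name


print(reformat_name("Malan, David"))  # David Malan
print(reformat_name("Malan,David"))   # David Malan
print(reformat_name("David Malan"))   # David Malan
```

Because the pieces are stripped individually, the helper tolerates a missing or repeated space after the comma.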
🔄 Real‑World Data Cleaning
Manual spreadsheet cleanup becomes impractical with hundreds or thousands of rows.
A small Python script that reads a CSV (using csv.DictReader), applies the above name‑reformat logic,
and writes back with csv.DictWriter scales much better.
import csv
with open("raw.csv", newline="") as src, open("clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        raw_name = row["name"].strip()
        if "," in raw_name:
            try:
                last, first = raw_name.split(", ")
                row["name"] = f"{first} {last}"
            except ValueError:
                pass  # malformed name: leave the row unchanged
        writer.writerow(row)
The same pattern can be reused for other fields (e.g., phone numbers, addresses) by adjusting the regex
or split logic.
📌 Key Takeaways
Parentheses let you treat multiple tokens as a single unit; combined with | they express “A or B”.
\s vs. literal space: choose based on whether you want to accept tabs or newlines.
\w does not include . or spaces; extend the character class or add explicit alternatives when needed.
Flags (re.IGNORECASE, re.MULTILINE, re.DOTALL) modify matching behavior without altering the
pattern itself.
The question‑mark ? makes the preceding group optional; use non‑capturing groups (?: … ) when
you only need grouping for quantifiers.
Prefer raw strings (r"…") for regex patterns to avoid double‑escaping.
For exact matches, re.fullmatch removes the need for ^ and $.
When cleaning user‑supplied strings, always anticipate malformed input and handle exceptions (e.g.,
ValueError from split).
These concepts extend the earlier foundation of basic regex syntax and provide a robust toolkit for real‑world
validation and data‑cleaning tasks.
📐 Optional Whitespace in Regex
? – quantifier meaning “zero or one” of the preceding token.
* – quantifier meaning “zero or more” of the preceding token.
When the token is a literal space, you can write the quantifier directly after it (a space followed by ? or *); parentheses are unnecessary.
Example pattern that allows an optional space after a comma:
pattern = r', ?' # comma followed by 0 or 1 space
To tolerate any amount of whitespace (including none) use \s* (zero‑or‑more whitespace characters).
pattern = r',\s*' # comma followed by 0 or more whitespace
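The two variants can be compared on a few inputs. This is a small sketch; the sample names and the surrounding \w+ parts are assumptions added so the patterns can be tested with re.fullmatch.

```python
import re

samples = ["Malan,David", "Malan, David", "Malan,   David"]

# For each sample: (accepted by ", ?", accepted by ",\s*")
results = {
    text: (
        re.fullmatch(r"\w+, ?\w+", text) is not None,
        re.fullmatch(r"\w+,\s*\w+", text) is not None,
    )
    for text in samples
}

for text, (at_most_one, any_amount) in results.items():
    print(repr(text), at_most_one, any_amount)
```

Only the \s* version accepts the input with three spaces after the comma.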
🔍 Capturing Groups with re.search
Capturing group – a sub‑pattern surrounded by parentheses () whose matched text is returned to the caller.
re.search(pattern, string) returns a match object when the pattern is found.
The match object provides:
Method What it returns
match.group(0) The entire matched substring
match.group(1) Text captured by the first pair of parentheses
match.group(2) Text captured by the second pair, and so on
match.groups() Tuple of all captured groups (excluding group 0)
Non‑capturing groups are written (?:…) and are ignored by the group‑numbering scheme.
pattern = r'^(.+),\s*(.+)$' # capture “last” before the comma and “first” after it
match = re.search(pattern, name)
🐍 Accessing Captured Values
if match:
    last = match.group(1)   # left of the comma
    first = match.group(2)  # right of the comma
    name = f"{first} {last}"
Why groups start at 1: group 0 is reserved for the whole match; the first set of parentheses becomes
group 1, the second becomes group 2, etc.
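The group numbering can be checked on a fixed string. This is a minimal sketch with a hard-coded sample name instead of user input.

```python
import re

match = re.search(r"^(.+),\s*(.+)$", "Malan, David")

whole = match.group(0)    # the entire matched text
last = match.group(1)     # first pair of parentheses
first = match.group(2)    # second pair of parentheses
both = match.groups()     # all captures, without group 0

print(whole)  # Malan, David
print(last)   # Malan
print(first)  # David
print(both)   # ('Malan', 'David')
```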
🐾 Assignment Expressions (Walrus Operator :=)
Walrus operator – allows assignment to a variable inside an expression (e.g., an if condition).
if (matches := re.search(r'^(.+),\s*(.+)$', name)):
    name = f"{matches.group(2)} {matches.group(1)}"
Removes the need for a separate line that first assigns matches and then tests its truthiness.
📏 Refining Whitespace Handling
To accept zero, one, or many spaces after a comma:
pattern = r'^(.+),\s*(.+)$' # \s* = any amount of whitespace
If you only want to allow none or a single space, keep the original ?.
When users accidentally type multiple spaces, \s* prevents the match from failing and keeps the
captured names clean.
🔁 Substituting with re.sub
re.sub(pattern, repl, string) – returns a new string where every non‑overlapping match of pattern is replaced
by repl.
Useful for stripping a known prefix (e.g., the Twitter URL) from user input.
import re
url = input("Enter Twitter URL: ").strip()
username = re.sub(r'^https?://(?:www\.)?twitter\.com/', '', url)
print(f"username: {username}")
The pattern explained:
Component Meaning
^ Anchor to the start of the string
https? http followed optionally by s
:// Literal colon‑slash‑slash
(?:www\.)? Optional non‑capturing www.
twitter\.com/ Literal domain (dot escaped) and trailing slash
Because re.sub works with regular expressions, the same call handles http://… and https://…, with or
without the www. prefix.
✂️ String Method removeprefix
str.removeprefix(prefix) – returns a copy of the string with prefix stripped if it is present; otherwise returns
the original string unchanged.
url = "https://twitter.com/davidj"
username = url.removeprefix("https://twitter.com/")
Limitation: works only for an exact prefix; it cannot handle optional www. or http vs. https. Regular
expressions (re.sub) are therefore more flexible for URL cleaning.
🐦 Extracting a Twitter Username – Step‑by‑Step
1. Prompt & trim
url = input("Enter Twitter URL: ").strip()
2. Remove protocol, optional subdomain, and domain using re.sub (see pattern above).
3. Result is the raw username; print or store as needed.
4. Edge cases to consider (handled by the regex):
http:// vs. https://
Presence or absence of www.
Trailing slash or query parameters (they will remain after the username unless further
stripped).
🛠️ Escaping Metacharacters
In a regex, characters like ., *, ?, +, ^, $, [ and ] have special meaning.
To match them literally, prepend a backslash \.
Always use raw strings (r'…') so Python does not treat the backslash as an escape character.
pattern = r'twitter\.com' # matches the literal text "twitter.com"
📋 Quantifier Cheat‑Sheet
Quantifier Meaning Typical use
? 0 or 1 of preceding token Optional space ( ?)
* 0 or more Any amount of whitespace (\s*)
+ 1 or more One or more characters (.+)
{m} Exactly m repetitions Exactly three digits (\d{3})
{m,} m or more At least two letters ([A-Za-z]{2,})
{m,n} Between m and n 1‑3 word characters (\w{1,3})
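The counted quantifiers from the table can be exercised with re.fullmatch. In this sketch each pattern is paired with one string that should match and one that should not; the sample strings are assumptions.

```python
import re

# pattern -> (string that should match, string that should not)
checks = {
    r"\d{3}": ("123", "12"),          # exactly three digits
    r"[A-Za-z]{2,}": ("ab", "a"),     # at least two letters
    r"\w{1,3}": ("abc", "abcd"),      # one to three word characters
}

outcomes = {
    pattern: (
        re.fullmatch(pattern, good) is not None,
        re.fullmatch(pattern, bad) is not None,
    )
    for pattern, (good, bad) in checks.items()
}

for pattern, outcome in outcomes.items():
    print(pattern, outcome)  # each line ends with (True, False)
```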
📌 Key Takeaways
Use ? or * after a literal space to make whitespace optional; no parentheses needed.
Parentheses in a regex capture matched substrings; retrieve them with .group(n) or .groups().
The walrus operator (:=) lets you assign and test a match in a single if statement, keeping code
concise.
re.sub is the go‑to tool for find‑and‑replace when the pattern may vary (e.g., different URL
protocols).
Always escape literal dots (\.) in domain names; raw strings (r'…') prevent accidental Python escapes.
str.removeprefix is handy for fixed prefixes but falls short when the prefix can appear in several forms
—regular expressions fill that gap.
🔧 Making the Protocol Optional in Regex
Optional quantifier (?) – matches zero or one of the preceding token.
To accept both http and https you can make the trailing s optional:
pattern = r'https?://'
The ? applies only to the character immediately before it, so only the s becomes optional while the rest
of the string remains required.
🌐 Handling an Optional www. Subdomain
Grouping (()) – treats the enclosed tokens as a single unit.
To make the whole www. segment optional you must group it before applying ?:
pattern = r'(www\.)?'
Without the parentheses, www\.? would make only the literal dot optional, allowing inputs such as
wwwtwitter.com while still requiring the www.
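A quick sketch of the difference between the grouped and ungrouped forms. The anchors and the twitter\.com suffix are added here only so the two patterns can be tested on complete strings.

```python
import re

ungrouped = r"^www\.?twitter\.com$"    # ? applies to the dot only
grouped = r"^(www\.)?twitter\.com$"    # ? applies to the whole "www." unit

# The ungrouped form accepts a missing dot but still demands "www"
ungrouped_accepts_no_dot = re.search(ungrouped, "wwwtwitter.com") is not None
ungrouped_accepts_bare = re.search(ungrouped, "twitter.com") is not None

# The grouped form accepts the bare domain and rejects the missing dot
grouped_accepts_no_dot = re.search(grouped, "wwwtwitter.com") is not None
grouped_accepts_bare = re.search(grouped, "twitter.com") is not None

print(ungrouped_accepts_no_dot, ungrouped_accepts_bare)  # True False
print(grouped_accepts_no_dot, grouped_accepts_bare)      # False True
```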
🧩 Combining Optional Groups
You can combine the protocol and the subdomain in a single pattern:
pattern = r'(https?://)?(www\.)?'
Each pair of parentheses creates a separate capturing group; the ? after each closing parenthesis makes
that entire group optional. The two groups sit side by side rather than one inside the other.
🛠️ Alternation (|) vs. Optional (?)
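Use case Syntax Effect
Either A or B A|B Matches A or B
Make A optional A? Matches A or nothing
For protocols you could write (?:http|https), but https? is shorter and clearer. The group is needed because | has the lowest precedence: http|https:// would mean “http” or “https://”.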
📦 Non‑capturing Groups (?: … )
Non‑capturing group – groups tokens without creating a numbered capture.
Useful when you need grouping for quantifiers but do not want the group in match.group():
pattern = r'(?:www\.)?'
This eliminates the extra capture that would otherwise shift the numbering of later groups.
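The effect on numbering can be seen by running both variants on the same URL. This is a minimal sketch; the sample URL and the \w+ username part are assumptions.

```python
import re

url = "https://www.twitter.com/davidj"

capturing = re.search(r"^(https?://)?(www\.)?twitter\.com/(\w+)", url)
non_capturing = re.search(r"^(?:https?://)?(?:www\.)?twitter\.com/(\w+)", url)

# With capturing groups the username is group 3
print(capturing.groups())      # ('https://', 'www.', 'davidj')
# With non-capturing groups the username becomes group 1
print(non_capturing.groups())  # ('davidj',)
```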
🪝 Incremental Regex Development
Build a regex step‑by‑step:
1. Write the simplest pattern that matches a known good input.
2. Add an optional element (?) and test again.
3. Introduce grouping only when needed for quantifiers or alternation.
Checking each stage prevents the “cryptic” errors that appear when a large expression is written all at
once.
🧪 Conditional Logic with re.search
re.sub returns the input unchanged when nothing matches, so it cannot tell you whether the input was a
valid URL; for validation you want to search first and act only on a successful match.
import re
url = input("Enter Twitter URL: ").strip()
match = re.search(r'^(https?://)?(www\.)?twitter\.com/([^/?#]+)', url,
                  re.IGNORECASE)
if match:
    username = match.group(3)  # third group: the text after twitter.com/
    print(f"username: {username}")
The ^ anchor forces the pattern to start at the beginning of the string, while the trailing ([^/?#]+)
captures the username up to the first slash, question mark, or hash.
🧭 Capturing vs. Non‑capturing
Construct | Captures? | Typical use
(…) | Yes (creates group(n)) | When you need the matched text later
(?: … ) | No | When you only need grouping for ?, *, +, or alternation (|)
Remember: group 0 is the entire match; the first set of parentheses becomes group 1, the second
group 2, etc.
📂 Validating a Twitter Username
Twitter permits only letters, digits, and underscores.
A precise character class captures that rule (pass re.IGNORECASE so that uppercase letters are accepted too):
username_pat = r'[a-z0-9_]+'
To allow an optional trailing slash, query string, or fragment while still extracting only the username:
pattern = rf'^(https?://)?(www\.)?twitter\.com/({username_pat})(?:[/?#].*)?$'
The non‑capturing (?:[/?#].*)? discards everything after the username.
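Putting the pieces together, one possible sketch of a reusable extractor follows. The function name extract_username and the use of re.compile are assumptions made for illustration; non-capturing groups are used for the protocol and www. so the username is group 1.

```python
import re

USERNAME = r"[a-z0-9_]+"
URL_PATTERN = re.compile(
    rf"^(?:https?://)?(?:www\.)?twitter\.com/({USERNAME})(?:[/?#].*)?$",
    re.IGNORECASE,
)


def extract_username(url):
    """Return the username from a Twitter URL, or None if the URL is not valid."""
    match = URL_PATTERN.search(url.strip())
    return match.group(1) if match else None


print(extract_username("https://twitter.com/davidj"))      # davidj
print(extract_username("twitter.com/davidj?lang=en"))      # davidj
print(extract_username("https://example.com/davidj"))      # None
```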
📚 Other re Functions
Function Typical purpose
re.sub(pattern, repl, string) Replace all matches with repl.
re.split(pattern, string) Split string at each match of pattern.
re.findall(pattern, string) Return a list of all non‑overlapping matches.
re.finditer(pattern, string) Iterate over match objects for large inputs.
These utilities let you clean, segment, or extract data beyond a single search.
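A brief sketch of three of these functions on small strings. The sample text is invented for illustration.

```python
import re

# re.split: cut on commas or semicolons, swallowing surrounding whitespace
parts = re.split(r"\s*[,;]\s*", "apples, pears;plums  , figs")
print(parts)  # ['apples', 'pears', 'plums', 'figs']

# re.findall: collect every run of digits as a list of strings
numbers = re.findall(r"\d+", "3 cats, 12 dogs, 7 birds")
print(numbers)  # ['3', '12', '7']

# re.finditer: yields match objects, so positions are available too
positions = [(m.group(0), m.start()) for m in re.finditer(r"\d+", "3 cats, 12 dogs")]
print(positions)  # [('3', 0), ('12', 8)]
```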
🧑‍💻 Incremental Code Example (with Walrus Operator)
import re
url = input("Twitter URL: ").strip()
if (m := re.search(r'^(https?://)?(www\.)?twitter\.com/([a-z0-9_]+)',
                   url, re.IGNORECASE)):
    print(f"username: {m.group(3)}")
The walrus operator (:=) assigns the result of re.search to m while testing its truthiness in the same if
statement, saving a line of code.
🧩 OOP Introduction – Why It Matters
Procedural style – code executes top‑to‑bottom, functions operate on primitive data.
Functional style – functions are first‑class, can be passed around, may be anonymous (lambda).
Object‑oriented style – groups data and behavior into objects (instances of classes), improving
organization for larger programs.
📁 Refactoring Functions for a Student Prompt
1. Procedural version – individual input calls in main.
2. Helper functions – get_name(), get_house() each return a single string.
3. Single responsibility – get_student() can gather both pieces of information at once.
def get_student():
    name = input("Name: ").strip()
    house = input("House: ").strip()
    return {"name": name, "house": house}
Returning a dictionary keeps related values together without creating a custom class.
📊 Returning Multiple Values
Python allows a function to return a tuple of values:
def get_student():
    return input("Name: "), input("House: ")
Call‑site unpacking:
name, house = get_student()
Either approach (dict or tuple) satisfies the need to pass multiple pieces of data back to the caller.
🛠️ Example Student Program Skeleton
def main():
    student = get_student()  # returns a dict
    print(f"{student['name']} from {student['house']}")
def get_student():
    return {
        "name": input("Name: ").strip(),
        "house": input("House: ").strip(),
    }
if __name__ == "__main__":
    main()
The if __name__ == "__main__": guard prevents main() from running when the file is imported as a
module.
📌 Key Takeaways
Use ? after a group to make the whole group optional.
Parentheses are required when the optional element contains more than one token.
(?: … ) creates a non‑capturing group, avoiding unwanted numbering of groups.
Incremental regex building and frequent testing reduce debugging time.
re.search combined with conditional logic provides reliable validation before acting on data.
The walrus operator can streamline assignment‑and‑test patterns.
For extracting structured input (e.g., a Twitter username), capture only the needed part and discard
the rest with a non‑capturing suffix.
When a function must return several related values, choose between a tuple (simple ordering) or a
dictionary (named fields).
The if __name__ == "__main__": idiom keeps scripts reusable as modules.
📦 Returning Multiple Values with Tuples
Tuple – an immutable ordered collection created by separating items with commas (e.g., name, house).
A function can return several pieces of data in a single statement:
def get_student():
    name = input("Name? ")
    house = input("House? ")
    return name, house  # returns one tuple containing two values
Unpacking lets you receive the individual elements directly:
student_name, student_house = get_student()
The variable names on the left side of the assignment may be any identifiers; using descriptive names
(student_name, student_house) improves readability.
🔀 Tuples vs. Lists
Aspect | Tuple | List
Mutability | Immutable – cannot change elements after creation. | Mutable – elements can be reassigned (list[0] = new_value).
Creation syntax | value1, value2 or (value1, value2) | [value1, value2]
Typical use | Return multiple values when they should stay constant. | Store collections that may need updating (e.g., correcting a house).
Attempting to modify a tuple raises a TypeError:
student = ("Padma", "Gryffindor")
student[1] = "Ravenclaw" # → TypeError: 'tuple' object does not support item assignment
Switching to a list resolves the error:
student = ["Padma", "Gryffindor"]
student[1] = "Ravenclaw" # works because lists are mutable
📂 Accessing Tuple (or List) Elements
Indexing – using square brackets [] with a zero‑based integer to retrieve an element.
student = ("Harry", "Gryffindor")
print(student[0]) # → Harry
print(student[1]) # → Gryffindor
The same syntax works for lists; the only difference is whether the collection can be altered later.
📚 Dictionaries for Named Access
Dictionary (dict) – a mutable mapping of keys to values, defined with curly braces {}.
Building a student record with explicit keys:
def get_student():
    student = {}
    student["name"] = input("Name? ")
    student["house"] = input("House? ")
    return student
Access uses the key inside square brackets:
s = get_student()
print(s["name"])
print(s["house"])
Common pitfall: reusing the outer quote character inside an f‑string, as in f"{s["name"]}", is a syntax
error before Python 3.12. Use single quotes for the keys inside a double‑quoted f‑string, or the reverse.
Dictionaries are mutable, so correcting Padma’s house is straightforward:
if s["name"] == "Padma":
    s["house"] = "Ravenclaw"
🛠️ Modifying Data – Practical Example
1. Using a tuple (immutable) – cannot change the house after creation.
2. Using a list (mutable) – allows student[1] = "Ravenclaw".
3. Using a dictionary (mutable with named fields) – allows student["house"] = "Ravenclaw" and is
self‑documenting.
Choosing the appropriate container reduces the chance of accidental bugs.
📦 Nested Structures
Both tuples and lists may contain other collections, enabling complex data models (e.g., a list of tuples,
a tuple of lists).
No special syntax is required; nesting follows the same bracket/parenthesis rules.
🏗️ Introducing Classes & Objects
Class – a blueprint for creating custom data types (objects) with named attributes.
Declare a simple Student class:
class Student:
    ...
Instantiate and populate an object:
def get_student():
    student = Student()  # creates a new Student object
    student.name = input("Name? ")
    student.house = input("House? ")
    return student
Access attributes with dot notation:
s = get_student()
print(s.name)
print(s.house)
Objects are mutable, so the same correction logic works:
if s.name == "Padma":
    s.house = "Ravenclaw"
The class itself currently contains no methods or additional behavior; it merely provides a named
container for related data.
🧩 Data‑Type Decision Guide
Need | Preferred container
Return a fixed pair of values, never to be altered | Tuple
Collection that may need element updates | List
Need explicit, self‑describing keys (e.g., name, house) | Dictionary
Want a dedicated, semantically meaningful type with potential methods | Class / Object
Choosing the right structure promotes readability and reduces bugs.
🏗️ Classes, Objects, and Instances
Class – a blueprint or mold that defines a new data type.
Object (or instance) – the concrete incarnation of a class; a specific entity created from the class template.
Defining a class gives you a custom data type (e.g., Student) that Python’s standard library does not
provide.
An object stores attributes (also called instance variables) that belong to that particular instance.
🛠️ Defining a Class and the __init__ Method
__init__ – the special “initializer” method that runs automatically when an object is created; it populates the
empty instance with data.
class Student:
    def __init__(self, name, house):
        self.name = name
        self.house = house
self is a conventionally‑named reference to the current object being initialized.
self.name and self.house become instance variables attached to that object.
Constructor Call (Object Creation)
A constructor call invokes the class as if it were a function, passing arguments that __init__ receives.
student = Student(name, house) # creates a new Student object
The call allocates memory for a fresh object, then immediately executes __init__ to set its attributes.
📐 Parameter Passing and Validation in __init__
By accepting parameters (name, house) in __init__ you can centralize validation before the object is
fully formed.
class Student:
    def __init__(self, name, house):
        if not name:
            raise ValueError("missing name")
        valid_houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
        if house not in valid_houses:
            raise ValueError("invalid house")
        self.name = name
        self.house = house
raise ValueError("…") signals an error that callers can catch with try/except.
Why Use raise Instead of print or sys.exit
print merely informs the user but lets the program continue with a possibly malformed object.
sys.exit aborts the entire program, which may be too drastic.
raise creates an exception object that can be handled locally, preserving program flow and allowing
cleanup.
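From the caller's side, handling the exception might look like the sketch below. The helper try_create and its message text are hypothetical; the class repeats the validation shown in this section so the example is self-contained.

```python
class Student:
    def __init__(self, name, house):
        if not name:
            raise ValueError("missing name")
        if house not in ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]:
            raise ValueError("invalid house")
        self.name = name
        self.house = house


def try_create(name, house):
    """Return a Student, or None (after reporting the problem) if validation fails."""
    try:
        return Student(name, house)
    except ValueError as error:
        # The program keeps running; only this one record is rejected
        print(f"could not create student: {error}")
        return None


good = try_create("Harry", "Gryffindor")
bad = try_create("Dudley", "Privet Drive")   # prints: could not create student: invalid house
nameless = try_create("", "Gryffindor")      # prints: could not create student: missing name
```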
📦 Encapsulation of Validation
Encapsulation – keeping data and the functions that operate on that data together inside a class.
Validation logic belongs in the class (e.g., inside __init__) rather than in separate helper functions.
This keeps all student‑related behavior in one place, simplifying maintenance as the codebase grows.
🧩 Extending a Class: More Attributes
To store additional pieces of information (e.g., first, middle, last name) you can expand the parameter
list and create matching instance variables.
class Student:
    def __init__(self, first, middle, last, house):
        self.first = first
        self.middle = middle
        self.last = last
        self.house = house
Alternatively, a single attribute like self.names = [first, middle, last] could hold a list, but separate,
descriptively‑named variables are usually clearer.
🧠 Memory Representation
Concept | What It Is | Where It Lives
Class definition | Source code that describes the blueprint (class Student: …). | Stored as code; no memory allocation for individual objects.
Object (instance) | A concrete chunk of memory containing the instance variables (self.name, self.house). | Allocated in RAM when the constructor is called.
Python’s interpreter automatically decides the exact memory address; the programmer interacts only
with the object reference (student).
🔄 Classes vs. Dictionaries
Feature | Class (custom data type) | Dictionary
Structure enforcement | Attributes defined in __init__; can validate types/values. | Keys added ad‑hoc; no built‑in validation.
Behavior (methods) | Can include functions that operate on the data (e.g., def greet(self): …). | Only data storage; behavior must be external.
Encapsulation | Validation and related logic live inside the class. | Validation must be written separately.
Mutability | Instances are mutable by default; can be made immutable with property tricks. | Dictionaries are mutable; cannot enforce immutability per key.
Readability | Clear attribute access (student.name). | Key access (student["name"]) can be less explicit.
⚙️ Mutable vs. Immutable Objects
By default, class instances are mutable – their attributes can be reassigned after creation.
You can design an immutable class (e.g., using @property without setters) to obtain the “best of both
worlds.”
📚 Key Takeaways
A class is a reusable blueprint; an object is a concrete instance created from that blueprint.
The __init__ method initializes an empty instance, using self to bind passed‑in values to instance
variables.
Validation (e.g., checking for a missing name or an invalid house) belongs inside __init__, and errors
are signaled with raise.
Encapsulation keeps data and its validation together, making code easier to extend and maintain.
Compared with dictionaries, classes provide structured enforcement, method support, and better
organization for growing programs.
📥 Multiple Input Strategies for Names
When gathering a person’s full name you can either prompt for each component separately (first, middle, last)
or request the entire name in one line and split it later.
Separate prompts give you clear validation for each part (e.g., ensure a middle name isn’t omitted).
A single prompt combined with str.split() or a regular expression is compact but must handle
variable‑length names (people can have two, three, or more parts).
In a class you can accept any number of arguments (including a list) if you need more flexibility later.
📦 Placing Classes in Their Own Modules
A class can live in a separate .py file (module) and be imported wherever it’s needed.
# student.py
class Student:
    def __init__(self, name, house):
        self.name = name
        self.house = house
# main.py
from student import Student
s = Student("Harry", "Gryffindor")
This promotes reusability across projects and keeps the main script tidy.
The same mechanism is used for third‑party libraries, so your own class can become a personal library.
⚙️ Optional Parameters & Default Values
To make an argument optional, give it a default value in the function (or method) signature.
def __init__(self, name, house, petronis=None):
    self.name = name
    self.house = house
    self.petronis = petronis
Callers may omit petronis; the attribute will be set to None.
Default values are evaluated once at definition time, so mutable defaults (e.g., list=[]) should be
avoided.
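The mutable-default pitfall is easiest to understand by watching it happen. In this sketch the function names and the spell strings are invented for illustration.

```python
def add_spell_shared(spell, spells=[]):
    """Pitfall: the default list is created once and reused by every call."""
    spells.append(spell)
    return spells


def add_spell_safe(spell, spells=None):
    """Usual fix: default to None and build a fresh list inside the function."""
    if spells is None:
        spells = []
    spells.append(spell)
    return spells


first = add_spell_shared("Lumos")
second = add_spell_shared("Accio")
print(second)           # ['Lumos', 'Accio']
print(first is second)  # True: both names refer to the one shared default list

print(add_spell_safe("Lumos"))  # ['Lumos']
print(add_spell_safe("Accio"))  # ['Accio']
```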
🛠️ Defining Your Own Exception Types
Custom exceptions let you signal domain‑specific errors.
class EricError(Exception):
    """Raised when an invalid house is supplied."""
    pass
Raise it with raise EricError("invalid house").
Users can catch it separately from built‑in exceptions, making error handling more expressive.
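A sketch of raising and catching the custom exception separately from built-in ones. The helper check_house and the HOUSES constant are assumptions added for the example.

```python
class EricError(Exception):
    """Raised when an invalid house is supplied."""


HOUSES = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]


def check_house(house):
    """Return the house unchanged, or raise EricError if it is not a known house."""
    if house not in HOUSES:
        raise EricError("invalid house")
    return house


try:
    check_house("Privet Drive")
except EricError as error:
    # Only the domain-specific error is handled here; other exceptions propagate
    message = str(error)

print(message)  # invalid house
```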
✨ Special Methods for String Representations
Python looks for special “dunder” methods to decide how an object should be displayed.
Method | Purpose | Audience
__str__(self) | Human‑readable description (used by print) | End users
__repr__(self) | Unambiguous developer‑focused representation (often ClassName(...)) | Developers/debuggers
class Student:
    def __str__(self):
        return f"{self.name} is in {self.house}"
    def __repr__(self):
        return f"Student(name={self.name!r}, house={self.house!r})"
If only __repr__ is defined, print falls back to it; defining both gives distinct outputs for users vs.
debugging.
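The two representations and the fallback rule can be checked directly. This is a minimal sketch; the second class, Hat, defines only __repr__ and is included purely to show the fallback.

```python
class Student:
    def __init__(self, name, house):
        self.name = name
        self.house = house

    def __str__(self):
        return f"{self.name} is in {self.house}"

    def __repr__(self):
        return f"Student(name={self.name!r}, house={self.house!r})"


class Hat:
    def __repr__(self):
        return "Hat()"


s = Student("Harry", "Gryffindor")
print(str(s))      # Harry is in Gryffindor
print(repr(s))     # Student(name='Harry', house='Gryffindor')
print([s])         # containers show the repr of their items
print(str(Hat()))  # Hat()  (no __str__, so __repr__ is used)
```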
🧙‍♂️ Adding Custom Behaviour: the charm Method
Methods that you invent inside a class become actions the objects can perform.
class Student:
    def charm(self):
        match self.petronis:
            case "Stag":
                return "🐎"
            case "Otter":
                return "🦦"
            case "Jack Russell Terrier":
                return "🐶"
            case _:
                return "✨"  # default “fizzle” effect
The method accesses self.petronis and returns an emoji (a Unicode character).
Emojis are ordinary strings; they can be placed in double quotes like any other text.
Usage from outside the class:
s = Student("Harry", "Gryffindor", "Stag")
print(s.charm()) # 🐎
Because the method is defined on the class, any Student instance can call it, regardless of where the call
occurs (e.g., inside main).
🏠 Properties, Getters, and Setters
A property wraps attribute access in custom logic, allowing validation on both read and write.
class Student:
    def __init__(self, name, house):
        self.name = name
        self.house = house  # assigns through the setter below, so validation runs
    @property
    def house(self):
        """Getter – returns the current house."""
        return self._house
    @house.setter
    def house(self, value):
        """Setter – validates before assignment."""
        valid = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
        if value not in valid:
            raise ValueError("invalid house")
        self._house = value  # “private” backing attribute
The leading underscore (_house) signals that the attribute should not be accessed directly.
Access syntax remains unchanged: s.house reads, s.house = "Slytherin" writes, but the setter runs
automatically.
Why Use Properties?
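Prevents ordinary attribute assignments from bypassing validation after object construction (code that writes to _house directly can still do so; the underscore is only a convention).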
Provides a clean, attribute‑style API while still enforcing rules.
🔐 Defensive Attribute Access
Even with validation in __init__, external code can later mutate attributes (e.g., student.house = "Privet Drive").
Properties guard against such mutations by centralising validation in the setter.
If you need an attribute to be read‑only, define only the getter and omit the setter.
class Student:
    @property
    def name(self):
        return self._name  # no corresponding setter → read‑only
Attempting s.name = "New" now raises an AttributeError.
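A runnable sketch of the read-only behaviour. The __init__ that fills the _name backing attribute is an assumption added so the class can be instantiated.

```python
class Student:
    def __init__(self, name):
        self._name = name  # backing attribute, set once

    @property
    def name(self):
        return self._name  # getter only: no setter is defined


s = Student("Harry")

try:
    s.name = "New"
    rejected = False
except AttributeError:
    # Assignment fails because the property has no setter
    rejected = True

print(s.name)    # Harry
print(rejected)  # True
```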
📚 Summary of Key Class Features
Feature | Typical Use | How It’s Declared
Separate module | Reuse across scripts | from student import Student
Optional argument | Allow missing data | def __init__(..., petronis=None)
Custom exception | Domain‑specific errors | class EricError(Exception): ...
__str__ | Friendly output | def __str__(self): ...
__repr__ | Debug representation | def __repr__(self): ...
Custom method | Domain actions (e.g., charm) | def charm(self): ...
Property getter | Controlled read | @property, then def house(self): ...
Property setter | Controlled write with validation | @house.setter, then def house(self, value): ...
These tools together give you expressive, safe, and reusable class designs—handling input flexibly, enforcing
invariants, and providing intuitive string representations for both users and developers.
🏠 Property Getters & Setters
Property – a managed attribute that runs a getter / setter function automatically when accessed or assigned.
How Python knows to use a setter
When the interpreter sees an assignment like self.house = new_value, the presence of a property
named house signals that the normal attribute should not be written directly.
Python therefore calls the method decorated with @house.setter instead of performing a plain
assignment.
Defining a getter
class Student:
    @property  # marks the following method as the getter for “house”
    def house(self):
        return self._house  # returns the “private” storage attribute
The getter receives only self.
It simply returns the underlying value (conventionally stored in an attribute whose name starts with an
underscore).
Defining a setter
class Student:
    ...  # the house getter shown above must be defined first
    @house.setter  # links this method to the same property name
    def house(self, value):  # two parameters: self and the new value
        if value not in ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]:
            raise ValueError("invalid house")
        self._house = value
The setter performs validation before storing the value.
Because both __init__ and any external code assign via self.house = …, the validation runs automatically
in every case.
Avoiding name collisions
An instance variable and a property cannot share the same identifier; otherwise Python cannot
distinguish between the stored value and the accessor method.
Convention: store the real data in an attribute prefixed with an underscore (e.g., _house) while the
public property remains house.
Centralizing error checking
By putting all validation logic in the setter, you eliminate duplicated checks in __init__ or elsewhere.
Any assignment—whether from the constructor, user input, or later code—passes through the same gate.
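The "single gate" idea can be demonstrated end to end. In this sketch the HOUSES constant and the is_rejected helper are hypothetical additions; the constructor assigns to self.house (not self._house) so that it passes through the setter.

```python
HOUSES = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]


class Student:
    def __init__(self, name, house):
        self.name = name
        self.house = house  # routed through the setter, so validation runs here

    @property
    def house(self):
        return self._house

    @house.setter
    def house(self, value):
        if value not in HOUSES:
            raise ValueError("invalid house")
        self._house = value


def is_rejected(action):
    """Return True if calling action() raises ValueError."""
    try:
        action()
    except ValueError:
        return True
    return False


harry = Student("Harry", "Gryffindor")
harry.house = "Slytherin"  # valid value, accepted by the setter

rejected_in_constructor = is_rejected(lambda: Student("Dudley", "Privet Drive"))
rejected_on_assignment = is_rejected(lambda: setattr(harry, "house", "Privet Drive"))

print(rejected_in_constructor, rejected_on_assignment)  # True True
print(harry.house)                                      # Slytherin
```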
Getter vs. Setter signatures
Method Parameters Typical purpose
Getter self Return the current value
Setter self, new_value Validate and store new_value
📛 Visibility Conventions in Python
Underscore prefix – a naming convention that signals “private” or “protected” intent to other programmers.
Prefix | Meaning | Enforcement
_var | Intended private; treat as implementation detail. | Honored by convention only.
__var | Name‑mangled to reduce accidental access; stronger hint of privacy. | Still accessible via _ClassName__var.
No underscore | Public; free to read/write. | No restrictions.
Python does not enforce access modifiers; the system relies on the honor code.
Developers are expected not to touch attributes that start with an underscore unless the class explicitly
exposes them.
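A small sketch showing that neither prefix is enforced. The class name Vault and its attribute values are invented for illustration.

```python
class Vault:
    def __init__(self):
        self._hint = "convention only"  # single underscore: plain attribute
        self.__secret = "mangled"       # double underscore: stored as _Vault__secret


v = Vault()

print(v._hint)                  # convention only (nothing stops the access)
print(hasattr(v, "__secret"))   # False: the unmangled name does not exist
print(v._Vault__secret)         # mangled (still reachable via the mangled name)
```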
🧩 Built‑in Types Are Classes
In Python, fundamental data structures such as integers, strings, lists, and dictionaries are instances of classes.
print(type(50))       # → <class 'int'>
print(type("Hello"))  # → <class 'str'>
print(type([]))       # → <class 'list'>
print(type({}))       # → <class 'dict'>
Each type call reveals the underlying class that implements the object’s behavior.
Methods like str.lower(), list.append(), and dict.get() are defined on these built‑in classes.
The community convention: user‑defined classes use CamelCase (e.g., Student), while built‑in classes
are lowercase (int, list).
📚 Class Methods
Class method – a function bound to the class itself rather than to an individual instance.
Declared with the @classmethod decorator.
Receives the class as its first argument, traditionally named cls, instead of self.
class Example:
    @classmethod
    def greet(cls, name):
        return f"Hello, {name} from {cls.__name__}"
Use a class method when the operation does not depend on instance data but still belongs logically to
the class (e.g., factory methods, utilities that need class‑level information).
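A factory method is one common use. The sketch below is an assumption-laden illustration: from_csv_row and the Prefect subclass are hypothetical names, not part of the course code.

```python
class Student:
    def __init__(self, name, house):
        self.name = name
        self.house = house

    @classmethod
    def from_csv_row(cls, row):
        """Hypothetical factory: build an instance from a 'name,house' string."""
        name, house = row.split(",")
        # cls is whichever class the method was called on
        return cls(name.strip(), house.strip())


class Prefect(Student):
    pass


harry = Student.from_csv_row("Harry, Gryffindor")
percy = Prefect.from_csv_row("Percy, Gryffindor")

print(harry.name, harry.house)  # Harry Gryffindor
print(type(percy).__name__)     # Prefect
```

Because the method receives cls rather than a hard-coded class name, subclasses inherit a factory that builds the right type.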
🎩 Example: Sorting Hat Class
The Sorting Hat demonstrates a simple class with a method that decides a Hogwarts house based on a
student’s name.
class Hat:
    def sort(self, student_name):
        # placeholder logic – actual sorting algorithm would go here
        return "Gryffindor" if student_name.startswith("H") else "Slytherin"
Instantiation follows the familiar pattern: hat = Hat().
The sort method is an instance method because it may eventually rely on instance‑specific data (e.g., a
list of known houses).
📌 Key Takeaways
Properties (@property and the matching setter decorator, e.g. @house.setter) give fine‑grained control
over attribute access, enabling validation and encapsulation without changing external code syntax.
Use an underscore prefix (_house) for the internal storage attribute to avoid name collisions with
the public property.
Python’s visibility is convention‑based; respect the underscore naming to maintain data integrity.
Built‑in types (int, str, list, dict) are themselves classes with methods; type() reveals this fact.
Class methods (@classmethod) operate at the class level and receive cls instead of self.
The Sorting Hat example shows how to scaffold a new class, define an instance method, and
instantiate the class for later use.
🧙‍♂️ Defining the Hat class – instance method
Instance method – a function defined inside a class that receives the current object as its first parameter (self).
The sort method must be declared with self and the programmer‑supplied argument (the student’s
name).
Example signature:
def sort(self, name):
    # placeholder implementation
    print(f"{name} is in some house")
Calling Hat().sort("Harry") automatically passes the newly created Hat instance as self.
🎲 Adding randomness to house selection
Storing the list of houses
An __init__ method can create an instance variable that holds the four Hogwarts houses:
def __init__(self):
    self.houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
Keeping the list in an instance variable makes it easy to change later (e.g., a fifth house).
Using the random library
To pick a house at random, import the standard library module and call random.choice:
import random
house = random.choice(self.houses)
print(f"{name} is in {house}")
Forgetting import random triggers a NameError because the name random is undefined.
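Putting the fragments above together, a complete instance-based Hat might look like the sketch below. Returning the chosen house from `sort` is an assumption added here so the result can be inspected; the notes only print it.

```python
import random


class Hat:
    def __init__(self):
        # Instance variable: every Hat object carries its own list.
        self.houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

    def sort(self, name):
        house = random.choice(self.houses)
        print(f"{name} is in {house}")
        return house  # assumption: returned only so callers can check the result


hat = Hat()
result = hat.sort("Harry")
```

Because the choice is random, the printed house differs from run to run.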
🏠 Refactoring to class variables & class methods
Why move the list to a class variable
When the same constant is needed by multiple methods, placing it at the class level avoids duplication
and makes future changes obvious.
class Hat:
    houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
All instances (and class methods) share a single copy of Hat.houses.
Defining a class method
Class method – a method that receives the class itself (cls) as its first argument, allowing access to class
variables without needing an object.
import random

class Hat:
    houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

    @classmethod
    def sort(cls, name):
        house = random.choice(cls.houses)
        print(f"{name} is in {house}")
@classmethod replaces the implicit self with cls.
The method can now be invoked without creating a Hat instance:
Hat.sort("Harry")
Design rationale – the singleton idea
In the Harry‑Potter world there is only one Sorting Hat; creating multiple Hat objects would be
conceptually odd.
Using a class method with a class variable models a singleton without extra boilerplate.
📦 Consolidating related functionality
Moving student‑creation logic into the Student class
Original design used a free‑standing get_student() function that prompted for input, built a Student
object, and returned it.
Refactor by turning that logic into a class method called get:
class Student:
    def __init__(self, name, house):
        self.name = name
        self.house = house

    @classmethod
    def get(cls):
        name = input("Name? ")
        house = input("House? ")
        return cls(name, house)
The rest of the program can now call Student.get() directly, eliminating the separate helper function.
Benefits
All student‑related code lives inside the Student class, improving cohesion.
No “chicken‑and‑egg” problem: the class method creates an instance of its own class (cls(name, house)).
🔄 Order of definitions in a script
Python reads a file top‑to‑bottom, but functions (including main) are not executed until they are called.
Placing main before class definitions is safe as long as the call to main() occurs after the class definitions
(typically guarded by if __name__ == "__main__":).
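The ordering rule can be checked with a small sketch: `main` refers to `Wizard` before the class is defined in the file, which works because the name is only looked up when `main()` actually runs. The class body here is a stripped-down stand-in, not the course's full version.

```python
def main():
    # Wizard is resolved when main() is called, not when main is defined.
    wizard = Wizard("Harry")
    print(wizard.name)
    return wizard


class Wizard:
    def __init__(self, name):
        self.name = name


if __name__ == "__main__":
    main()
```

Moving the `main()` call above the `class Wizard` block would instead raise a NameError.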
🧩 Instance vs. class variables & methods
| Scope | Variable location | Access pattern | Typical use |
| --- | --- | --- | --- |
| Instance variable | self.attribute (inside __init__) | Each object has its own copy | Per‑object state (e.g., a specific student’s house) |
| Class variable | ClassName.attribute or cls.attribute | Shared by all objects and class methods | Constants, lookup tables (e.g., the list of houses) |
| Instance method | Defined without decorator, first param self | Called on an object (obj.method()) | Operates on that object’s state |
| Class method | Decorated with @classmethod, first param cls | Called on the class (ClassName.method()) | Works with class‑level data, creates instances |
📚 Brief glimpse of other OOP tools
Static methods (@staticmethod) exist for functions that belong to a class’s namespace but neither need
self nor cls.
Inheritance was mentioned as a powerful OOP feature but not demonstrated in this section.
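To make the static-method bullet concrete, here is a small sketch contrasting it with a class method. Both helper names (`is_valid_name`, `is_house`) are hypothetical and chosen only for illustration.

```python
class Hat:
    houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

    @staticmethod
    def is_valid_name(name):
        # Needs neither self nor cls: it only inspects its argument.
        return bool(name.strip())

    @classmethod
    def is_house(cls, house):
        # Needs cls to reach the class variable.
        return house in cls.houses


print(Hat.is_valid_name("Harry"))  # True
print(Hat.is_valid_name("   "))    # False
print(Hat.is_house("Ravenclaw"))   # True
```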
🧬 Inheritance — Sharing Attributes & Behavior
Inheritance is an OOP mechanism that lets a class derive from another, automatically receiving its attributes
and methods.
The class that provides the shared features is called the superclass (or parent), while the derived class is the
subclass (or child).
Why inherit?
Eliminate duplicated code (e.g., the repeated self.name = name and error‑checking logic).
Centralise common validation in one place.
Model real‑world hierarchies (students → wizard, professor → wizard).
Refactoring with a common base class
class Wizard:
    def __init__(self, name):
        if not name:
            raise ValueError("missing name")
        self.name = name

    # …additional wizard‑wide behaviour…
The Wizard class now owns everything that is true for any wizard.
Subclassing Wizard
class Student(Wizard):
    def __init__(self, name, house):
        super().__init__(name)  # invoke Wizard.__init__
        self.house = house

    # …student‑specific behaviour…

class Professor(Wizard):
    def __init__(self, name, subject):
        super().__init__(name)  # invoke Wizard.__init__
        self.sub = subject

    # …professor‑specific behaviour…
super() returns a proxy for the next class in the method‑resolution order (here, the immediate superclass
Wizard), allowing us to call its __init__ without repeating code.
Each subclass adds only what is unique to it (house for Student, sub for Professor).
📚 Multi‑Level & Multiple Inheritance
Multi‑level inheritance occurs when a subclass derives from another subclass, forming a chain (e.g., Student →
Wizard → object).
All ancestors’ methods are available unless overridden.
object
└─ Wizard
├─ Student
└─ Professor
Overriding a method in a lower‑level class replaces the inherited version, but super() can still be used to
reach higher levels.
Multiple inheritance (brief)
Python supports multiple parents (class A(B, C): …).
The method‑resolution order (MRO) determines which parent’s attribute is used first.
The examples keep a single‑inheritance path for simplicity.
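A short sketch of how the MRO settles a conflict between two parents. The `Teacher` class is hypothetical and exists only to give `Professor` a second parent with a competing method.

```python
class Wizard:
    def describe(self):
        return "wizard"


class Teacher:  # hypothetical second parent
    def describe(self):
        return "teacher"


class Professor(Wizard, Teacher):
    pass


# Parents are searched left to right, so Wizard.describe wins.
print([cls.__name__ for cls in Professor.__mro__])
# ['Professor', 'Wizard', 'Teacher', 'object']
print(Professor().describe())  # wizard
```

Swapping the parent order to `class Professor(Teacher, Wizard)` would make `describe()` return "teacher" instead.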
🧩 Exception Hierarchy – Reusing Superclass Behaviour
Built‑in exceptions form a tree: BaseException → Exception → ValueError, etc.
Custom exceptions should also inherit from an existing exception class to gain standard behaviour.
class MyError(ValueError):
    """Raised when a wizard‑specific value is invalid."""
    pass
Catching a superclass (e.g., except Exception) handles all derived exceptions, useful when the exact
error type is unknown.
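The sketch below shows a handler written for the superclass (ValueError) catching the custom subclass. The `check_name` helper is a hypothetical stand-in for any validation code.

```python
class MyError(ValueError):
    """Raised when a wizard-specific value is invalid."""


def check_name(name):
    # Hypothetical validator used only to trigger the exception.
    if not name:
        raise MyError("missing name")
    return name


try:
    check_name("")
except ValueError as error:  # also catches MyError, because it is a subclass
    caught = error

print(type(caught).__name__, caught)  # MyError missing name
```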
➕ Operator Overloading – Customising +
Operator overloading lets a class define special methods (dunder methods) that replace the default behaviour
of operators.
+ maps to __add__(self, other).
The Vault example
A vault stores three coin denominations: galleons, sickles, knuts.
class Vault:
    def __init__(self, galleons=0, sickles=0, knuts=0):
        self.galleons = galleons
        self.sickles = sickles
        self.knuts = knuts

    def __str__(self):
        return f"{self.galleons} galleons, {self.sickles} sickles, {self.knuts} knuts"

    def __add__(self, other):
        total_g = self.galleons + other.galleons
        total_s = self.sickles + other.sickles
        total_k = self.knuts + other.knuts
        return Vault(total_g, total_s, total_k)
__str__ supplies a human‑readable representation used by print.
__add__ creates a new Vault whose coin counts are the element‑wise sums of the two operands.
Using the overloaded operator
potter = Vault(100, 50, 25)
weasley = Vault(25, 50, 100)
total = potter + weasley # invokes Vault.__add__
print(total) # → 125 galleons, 100 sickles, 125 knuts
No manual extraction of attributes is needed; the + operator now behaves exactly as a wizard would expect.
📏 Common Patterns When Overriding
| Scenario | Typical method | Reason |
| --- | --- | --- |
| Extending initialisation | def __init__(self, …): super().__init__(…) | Reuse parent setup, then add subclass specifics |
| Providing a printable form | def __str__(self): return … | print(obj) uses __str__ |
| Enabling arithmetic / combination | def __add__(self, other): … return NewType(…) | a + b calls __add__ |
| Custom exception | class MyError(SomeBuiltinError): … | Inherit standard error handling |
🔄 Hierarchical Design Takeaways
Identify shared data (e.g., name) → extract into a base class.
Call super().__init__ to ensure the base class performs its work.
Override only what changes; keep the rest inherited.
Leverage Python’s built‑in hierarchies (exceptions, containers) to avoid reinventing common logic.
Operator overloading gives domain‑specific syntax (vault1 + vault2) without sacrificing readability.
📦 Operator Overloading Recap
Operator overloading – the ability for a class to define special methods (e.g., __add__) that replace the default
behavior of Python’s built‑in operators when the operator is used with an instance of that class.
Any class can overload existing operators; Python does not allow creation of brand‑new operator
symbols (e.g., an emoji as an operator).
Python tries the left‑hand operand’s special method first; the right‑hand operand can be of any type, and
the method must decide what the operation means. If that method returns NotImplemented, Python falls
back to the right‑hand operand’s reflected method (e.g., __radd__).
Example from the discussion: adding a Vault object to a string (vault + "stir"). Python will call
Vault.__add__ with the string as other, even if the semantic meaning is nonsensical.
What can be overloaded
| Operator | Special method | Example usage |
| --- | --- | --- |
| + | __add__(self, other) | vault + other |
| += | __iadd__(self, other) | vault += 5 (in‑place, augmented assignment) |
| - | __sub__(self, other) | vault - other |
| * | __mul__(self, other) | vault * 2 |
| / | __truediv__(self, other) | vault / 3 |
| <, <=, >, >= | __lt__, __le__, __gt__, __ge__ | comparisons |
| ==, != | __eq__, __ne__ | equality checks |
| … | … | all operators shown on the “Special method names” slide are overloadable |
The full list of overloadable operators is documented under Special method names in the official Python
reference.
🧩 Special Method Names
Special method – a method whose name begins and ends with double underscores (__method__). Python calls
these automatically in response to built‑in operations.
Overloading is achieved by defining the appropriate special method inside a class.
Commonly used overloads for arithmetic and comparison were highlighted; virtually any symbol you
type in code (+, -, *, /, <, <=, +=, ==, etc.) maps to a special method you can implement.
🪝 Using Operator Overloading in Practice
class Vault:
    def __init__(self, galleons=0, sickles=0, knuts=0):
        self.galleons = galleons
        self.sickles = sickles
        self.knuts = knuts

    def __add__(self, other):
        # “other” may be another Vault, a number, or any type you decide.
        # Here we simply return a new Vault that combines the two.
        total_g = self.galleons + getattr(other, "galleons", 0)
        total_s = self.sickles + getattr(other, "sickles", 0)
        total_k = self.knuts + getattr(other, "knuts", 0)
        return Vault(total_g, total_s, total_k)

    def __str__(self):
        return f"{self.galleons}g {self.sickles}s {self.knuts}k"
Adding a Vault to a string (vault + "stir") is technically possible because __add__ receives the string as
other. The method must decide what to do: with the getattr defaults above, the string contributes 0 to
each total, whereas a stricter implementation could raise TypeError.
The key point: the left operand’s __add__ gets the first chance to decide whether the operation is
allowed; the right operand’s class is consulted (via __radd__) only if __add__ returns NotImplemented.
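A stricter alternative to the permissive getattr version is sketched below: `__add__` returns NotImplemented for operand types it does not understand, and Python then raises TypeError on its own. This is one common convention, not the approach taken in the lecture.

```python
class Vault:
    def __init__(self, galleons=0, sickles=0, knuts=0):
        self.galleons = galleons
        self.sickles = sickles
        self.knuts = knuts

    def __add__(self, other):
        if not isinstance(other, Vault):
            # Signal "unsupported"; Python tries other.__radd__, then raises TypeError.
            return NotImplemented
        return Vault(
            self.galleons + other.galleons,
            self.sickles + other.sickles,
            self.knuts + other.knuts,
        )


total = Vault(100, 50, 25) + Vault(25, 50, 100)
print(total.galleons, total.sickles, total.knuts)  # 125 100 125

try:
    Vault() + "stir"
except TypeError as error:
    message = str(error)
    print(message)
```

Returning NotImplemented (rather than raising directly) keeps the door open for the right-hand type to handle the operation itself.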
🛠️ Sets for Unique Collections
Set – an unordered collection type that automatically eliminates duplicate elements.
Using a set to extract unique Hogwarts houses
houses = set()                  # create an empty set
for student in students:        # `students` is a list of dicts
    houses.add(student["house"])
print(sorted(houses))           # → ['Gryffindor', 'Ravenclaw', 'Slytherin']
add inserts an element; duplicates are ignored automatically.
Membership testing works the same as with lists: if "Gryffindor" in houses:.
Sets are ideal when the only requirement is uniqueness; they are faster than the
“list‑and‑if‑not‑in‑append” pattern because the containment check is O(1) on average.
Comparison with the list‑based approach
| Approach | Code pattern | Duplicate handling | Ordering |
| --- | --- | --- | --- |
| List + manual check | if house not in houses: houses.append(house) | Explicit if test needed | Preserves insertion order (can later sort) |
| Set | houses.add(house) | Automatic | Unordered (must sort if needed) |
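The two approaches in the table can be compared side by side. The `students` data below is made up for the example.

```python
# Sample data, invented for illustration.
students = [
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
    {"name": "Padma", "house": "Ravenclaw"},
]

# List + manual check: keeps first-seen order.
houses_list = []
for student in students:
    if student["house"] not in houses_list:
        houses_list.append(student["house"])

# Set comprehension: duplicates dropped automatically, no order guarantee.
houses_set = {student["house"] for student in students}

print(houses_list)         # ['Gryffindor', 'Slytherin', 'Ravenclaw']
print(sorted(houses_set))  # ['Gryffindor', 'Ravenclaw', 'Slytherin']
```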
🌍 Global vs Local Scope
Global variable – a name defined at the top level of a module (outside any function). It can be read from any
function, but writing to it from inside a function requires the global keyword.
Local variable – a name created inside a function (including its parameters). Its lifetime ends when the function
returns.
UnboundLocalError explained
Occurs when Python detects an assignment to a name inside a function and also sees a reference to
that name before the assignment.
Example:
balance = 0  # module‑level (global) variable

def deposit(n):
    balance += n  # assignment makes `balance` local → UnboundLocalError
The interpreter treats balance as a local variable because of the += assignment, but the variable has not
been defined yet within the function’s scope.
Fix with global
balance = 0

def deposit(n):
    global balance
    balance += n
global balance tells Python that assignments refer to the module‑level variable.
Shadowing rule
If a name is defined both globally and locally (e.g., balance inside main and at module level), the local
definition shadows the global one for the duration of the function.
Best practice: avoid creating a local variable with the same name as a global to prevent accidental
shadowing.
Passing arguments does not mutate the original (for immutable types)
Doing def modify(x): x += 5 and calling modify(balance) will not change the caller’s balance: integers are
immutable, so x += 5 rebinds the local name x to a new int and leaves the caller’s variable untouched.
To truly update shared state without globals, encapsulate the data in a mutable object (e.g., a class
instance).
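The difference between rebinding a local name and mutating a shared object can be seen in a few lines. The function names are hypothetical.

```python
def modify(x):
    x += 5           # rebinds the local name x; the caller's int is untouched
    return x


def modify_list(items):
    items.append(5)  # mutates the same list object the caller holds


balance = 0
modify(balance)
print(balance)  # 0

history = []
modify_list(history)
print(history)  # [5]
```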
🏦 Refactoring with Object‑Oriented Design
Encapsulation – bundling data (attributes) and the functions that manipulate that data (methods) inside a
class, hiding the internal representation from outside code.
Class‑based bank account example
class Account:
    def __init__(self):
        self._balance = 0  # “private” by convention

    @property
    def balance(self):
        """Read‑only view of the account balance."""
        return self._balance

    def deposit(self, n):
        self._balance += n

    def withdraw(self, n):
        self._balance -= n
self is the instance on which the method operates.
_balance is a private‑style attribute (underscore prefix signals “do not touch directly”).
The @property decorator creates a getter so callers can read account.balance without invoking a
method, while still preventing external code from assigning to it directly.
Using the class
def main():
    acct = Account()
    print(acct.balance)  # → 0
    acct.deposit(100)
    acct.withdraw(50)
    print(acct.balance)  # → 50

if __name__ == "__main__":
    main()
All methods share the same instance variable _balance; no global keyword is needed.
This approach scales better than module‑level globals because the state is bound to a specific object,
allowing multiple independent accounts if desired.
Why prefer OOP over globals
Clarity: The source of state (acct._balance) is explicit.
Safety: No accidental shadowing; each instance controls its own data.
Extensibility: Additional behavior (e.g., transaction history) can be added as new methods without
affecting global logic.
📚 Additional Concepts Mentioned
Property – a managed attribute that can define custom getter, setter, and deleter behavior while
presenting a simple attribute‑style interface.
Underscore convention – leading underscore (_var) signals “internal use”; double underscore (__var)
triggers name‑mangling to avoid accidental clashes in subclasses.
Module‑level scope – globals are technically scoped to the module; importing the module gives other
code access to those names.
Operator overloading as a toolbox – while the course emphasized functions, variables, and control
flow, operator overloading demonstrates how Python’s data model lets you give custom objects familiar
syntax (e.g., +, -, []).
These notes capture the core ideas needed to understand operator overloading, set usage, scope rules, and the
transition from global‑variable scripts to clean, class‑based designs.
🏷️ Property Getters (Read‑Only)
Property – a special method that lets you access a value using attribute syntax while controlling how that value
is retrieved.
Defining only a getter (no setter) makes the attribute read‑only.
Typical pattern: store the real data in a private attribute (conventionally prefixed with an underscore)
and expose it via a property.
class Account:
    def __init__(self, balance: int):
        self._balance = balance  # “private” storage

    @property
    def balance(self) -> int:  # read‑only getter
        return self._balance
Attempting account.balance = 1_000 now raises an AttributeError because no setter exists.
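A quick check of the read-only behaviour, as a sketch. The exact wording of the AttributeError message varies between Python versions, so it is printed rather than assumed.

```python
class Account:
    def __init__(self, balance: int):
        self._balance = balance

    @property
    def balance(self) -> int:
        return self._balance


account = Account(100)
print(account.balance)  # 100

blocked = False
try:
    account.balance = 1_000  # no setter defined, so this fails
except AttributeError as error:
    blocked = True
    print("blocked:", error)
```

Note that `account._balance = 1_000` would still succeed; the underscore is a convention, not an enforcement mechanism.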
📏 Constants & Naming Conventions
Constant – a value that should never change after its initial definition.
Python does not enforce immutability; the convention is to write constant names in UPPERCASE.
Placing constants at the top of a module makes “magic numbers” easy to locate and modify.
| Bad practice | Better practice |
| --- | --- |
| for i in range(3): (hard‑coded 3) | MEOW_COUNT = 3, then for _ in range(MEOW_COUNT): ... |
| balance = 0 (changed everywhere) | BALANCE_START = 0 defined once, reused |
Even though the language cannot stop you from reassigning BALANCE_START, the uppercase naming signals
intent to fellow developers.
🐱 Cat Meow Example
Class constant for the number of meows
class Cat:
    MEOWS = 3  # class‑level constant (uppercase)

    def meow(self):
        for _ in range(Cat.MEOWS):
            print("meow")
Cat.MEOWS is accessed inside the method; it is shared by all Cat instances.
Instantiating and using the class:
my_cat = Cat()
my_cat.meow() # prints “meow” three times
Using an underscore for an unused loop variable
for _ in range(Cat.MEOWS):
    print("meow")
The underscore indicates the loop index is intentionally ignored, improving readability.
🧩 Type Hinting & Static Type Checking
Type hint – an annotation that expresses the expected type of a variable, function parameter, or return value.
Python remains dynamically typed; type hints are not enforced at runtime.
External tools such as mypy analyze these hints statically and report mismatches before execution.
Basic syntax
def meow(times: int) -> None:  # `times` must be an int; function returns nothing
    for _ in range(times):
        print("meow")
Catching a common mistake
number = input("How many times? ") # returns a string
meow(number) # ❌ type error: str passed where int expected
Fix – convert the input and annotate the variable:
number: int = int(input("How many times? "))
meow(number)
Running mypy meow.py on the unconverted version would report:
error: Argument 1 to "meow" has incompatible type "str"; expected "int"
After the conversion, mypy reports no issues.
Annotating a variable’s intended type
count: int = 5 # tells type checkers that `count` should stay an int
If later code assigns a non‑int to count, mypy will flag it.
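That hints are not enforced at runtime can be demonstrated directly. The `double` function is a hypothetical example: a type checker would reject the second call, yet Python runs it without complaint.

```python
def double(n: int) -> int:
    return n * 2


print(double(2))     # 4
print(double("ab"))  # abab (no runtime error despite the int hint)

# The hints are stored as plain metadata on the function object.
print(double.__annotations__)
```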
🔄 Return Values vs. Side Effects
Side effect – an operation (e.g., printing) that changes the program’s state without returning a value.
A function that only prints implicitly returns None.
def greet(name: str) -> None:
    print(f"Hello, {name}")
Assigning the result captures None:
msg = greet("Alice") # msg is None
print(msg) # prints "None"
Annotating the return type with -> None makes the intention explicit and helps static analysers detect misuse.
✨ String Multiplication (Operator Overloading)
The * operator is overloaded for strings: "text" * n produces the text repeated n times.
print("meow\n" * 3) # meow on three separate lines
This avoids explicit loops when the goal is simple repetition.
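One way to combine string repetition with the return-value idea from the previous section is sketched below: the function builds and returns the text, and the caller decides whether to print it. Passing `end=""` is a choice made here to avoid a trailing blank line.

```python
def meow(times: int) -> str:
    # Build the whole output with one multiplication instead of a loop.
    return "meow\n" * times


text = meow(3)
print(text, end="")  # end="" because text already ends with a newline
```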
📚 Docstrings (PEP 257)
Docstring – a triple‑quoted string placed as the first statement in a module, class, or function, describing its
purpose.
Unlike regular comments (#), docstrings are accessible at runtime via __doc__.
def meow(times: int) -> None:
    """
    Print the word "meow" `times` times, each on a new line.

    Parameters
    ----------
    times : int
        Number of repetitions.
    """
    for _ in range(times):
        print("meow")
Using docstrings standardizes documentation, enables tools like help() and automatic API generators, and fulfills
the community‑accepted PEP 257 format.
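Runtime access to a docstring can be sketched as follows. `inspect.getdoc` is a standard-library helper that returns the same text with indentation cleaned up.

```python
import inspect


def meow(times: int) -> None:
    """Print the word "meow" `times` times, each on a new line."""
    for _ in range(times):
        print("meow")


print(meow.__doc__)          # raw docstring stored on the function object
print(inspect.getdoc(meow))  # same text, normalised for display
```

A `#` comment placed in the same position would not be retrievable this way.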
📊 Summary Tables
Read‑Only Property vs. Regular Attribute
Feature Property with getter only Plain attribute
External assignment Raises AttributeError Allowed
Encapsulation Yes (controls access) No
Syntax for access obj.attr obj.attr
Type Hint + mypy Workflow
Step Action
1️⃣ Add type hints to code (param: type, -> return_type).
2️⃣ Install mypy (pip install mypy).
3️⃣ Run mypy file.py.
4️⃣ Fix reported mismatches (e.g., convert input to int).
5️⃣ Re‑run mypy until no errors remain.
These practices collectively promote defensive programming, readability, and maintainability in Python
projects.
📜 Docstrings & Documentation Conventions
Docstring – a string literal placed as the first statement in a module, class, or function, used to describe its
purpose and usage.
Enclosed in triple quotes (""" … """ or ''' … ''').
When Python encounters a docstring, tools can automatically extract it for generating documentation
(HTML, PDF, etc.).
Structured Docstring Format (reStructuredText)
| Section | Syntax example | Purpose |
| --- | --- | --- |
| Summary | """One‑line description of the function.""" | Brief overview, often the first line. |
| Blank line | (empty line) | Separates summary from details. |
| Parameters | :param n: number of times to meow, :type n: int | Describes each argument and its expected type. |
| Raises | :raises TypeError: if *n* is not an integer | Documents possible exceptions. |
| Returns | :return: a string containing *n* “meow” lines, :rtype: str | Explains the return value and its type. |
def meow(n):
    """
    Return “meow” repeated *n* times, one per line.

    :param n: number of times to meow
    :type n: int
    :raises TypeError: if *n* is not an integer
    :return: a newline‑separated string of meows
    :rtype: str
    """
    if not isinstance(n, int):
        raise TypeError("n must be an int")
    return "\n".join(["meow"] * n)
The above follows the reStructuredText (reST) convention, which many Python documentation
generators (e.g., Sphinx) understand.
Docstrings do not enforce types; they merely inform developers and enable automated tools.
Using Docstrings for Simple Testing
The standard‑library doctest module supports examples inside docstrings: write an example call and its
expected output; doctest executes the example and verifies the result.
def add(a, b):
    """
    Return the sum of *a* and *b*.

    >>> add(2, 3)
    5
    """
    return a + b
Running python -m doctest mymodule.py will report mismatches, helping catch bugs early.
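The same check can be run from inside Python. The sketch below uses `doctest.run_docstring_examples`, which executes the examples in one object's docstring and prints a report only when something fails; the second example in the docstring is an addition for illustration.

```python
import doctest


def add(a, b):
    """
    Return the sum of *a* and *b*.

    >>> add(2, 3)
    5
    >>> add("me", "ow")
    'meow'
    """
    return a + b


# Silent on success; prints a failure report if an example's output differs.
doctest.run_docstring_examples(add, {"add": add}, verbose=False)
```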
🖥️ Command‑Line Arguments with sys.argv
sys.argv – a list containing the command‑line arguments passed to a Python script; index 0 is the script name.
Basic Manual Parsing
import sys

if len(sys.argv) == 1:
    # No extra arguments → default one meow
    print("meow")
elif len(sys.argv) == 3 and sys.argv[1] == "-n":
    n = int(sys.argv[2])
    for _ in range(n):
        print("meow")
else:
    print("usage: meow.py [-n NUMBER]")
Handles three cases: no arguments, correct -n flag with a number, and an error fallback that prints a
usage hint.
Manual checks (len, equality tests, ordering) become cumbersome as the number of options grows.
⚙️ Argument Parsing with argparse
argparse – a standard‑library module that automates the parsing of command‑line options, handling defaults,
type conversion, help messages, and error reporting.
Typical Setup
import argparse

parser = argparse.ArgumentParser(description="Prints “meow” a specified number of times.")
parser.add_argument("-n", "--number",
                    type=int,
                    default=1,
                    help="number of times to meow")
args = parser.parse_args()

for _ in range(args.number):
    print("meow")
Key points:
ArgumentParser creates a parser object.
add_argument registers a flag (-n or --number) with:
type=int – automatic conversion; invalid values trigger a clear error message.
default=1 – used when the flag is omitted.
help – appears in the automatically generated usage output.
parse_args() reads sys.argv for you, regardless of the order of flags.
Generated Help & Usage
Running the script with -h or --help prints:
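usage: meow.py [-h] [-n NUMBER]

Prints “meow” a specified number of times.

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        number of times to meow

The exact layout depends on the Python version (for example, Python 3.10+ titles the section “options:”), and
the default value is shown only if the help string or formatter asks for it.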
The description supplied to ArgumentParser appears at the top.
If the user provides a non‑integer (e.g., -n dog), argparse aborts with a helpful error and the usage
reminder.
🔀 Flags, Switches, and Order‑Independent Parsing
Single‑letter flags use a single dash (-n).
Longer, more descriptive flags use a double dash (--number).
argparse accepts either form for the same option, and flags may be supplied in any order, e.g., meow.py --number 3 -v (assuming a -v flag has also been registered).
Adding more options (e.g., -a, -b) follows the same pattern without manual index bookkeeping.
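Order independence can be verified without a terminal by handing `parse_args` an explicit list. The `-v/--verbose` switch below is a hypothetical second option added only for this demonstration.

```python
import argparse

parser = argparse.ArgumentParser(description="Meow like a cat")
parser.add_argument("-n", "--number", type=int, default=1,
                    help="number of times to meow")
# Hypothetical extra switch, present only to show that order does not matter.
parser.add_argument("-v", "--verbose", action="store_true",
                    help="print extra detail")

# An explicit list stands in for real command-line input.
first = parser.parse_args(["--number", "3", "-v"])
second = parser.parse_args(["-v", "-n", "3"])
defaults = parser.parse_args([])

print(first.number, first.verbose)        # 3 True
print(first == second)                    # True
print(defaults.number, defaults.verbose)  # 1 False
```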
🧩 Unpacking Input & Ignoring Values
Unpacking – assigning multiple values returned from an iterable directly to separate variables.
Splitting a Full Name
full_name = input("What’s your name? ")
first, last = full_name.split(" ")
print(f"Hello, {first}")
The underscore (_) is a conventional placeholder for values that are required for unpacking but
intentionally unused:
first, _ = full_name.split(" ")
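Plain two-name unpacking raises ValueError when the input has one word or more than two. A more forgiving sketch uses star-unpacking to collect everything after the first word; the `split_name` helper is hypothetical.

```python
def split_name(full_name):
    # first takes the first word; *rest collects whatever remains (possibly nothing).
    first, *rest = full_name.split()
    return first, " ".join(rest)


print(split_name("Harry Potter"))        # ('Harry', 'Potter')
print(split_name("Harry James Potter"))  # ('Harry', 'James Potter')
print(split_name("Hagrid"))              # ('Hagrid', '')
```

An empty string still fails, because there is no first word to unpack.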
Unpacking a List into Function Arguments
def total(galleons, sickles, knuts):
    return (galleons * 17 + sickles) * 29 + knuts

coins = [100, 50, 25]  # [galleons, sickles, knuts]
print(total(*coins), "knuts")
The * operator unpacks the list so that each element is passed as a separate positional argument.
This technique scales when the number of arguments matches the list length, avoiding manual indexing.
📐 Converting Command‑Line Input to the Desired Types
When using sys.argv directly, explicit conversion is necessary:
n = int(sys.argv[2]) # may raise ValueError if the string is not numeric
With argparse, specifying type=int moves the conversion step into the library, and any conversion error
is handled automatically.
🛡️ Defensive Parsing & Error Messages
| Situation | Manual sys.argv handling | argparse handling |
| --- | --- | --- |
| Missing flag value | Must check length and raise custom error | Library prints “argument -n: expected one argument” |
| Non‑numeric value for an integer flag | int() raises ValueError; you must catch it | Library prints “argument -n: invalid int value: 'dog'” |
| Unexpected extra arguments | Must ignore or warn manually | Library reports “unrecognized arguments: …” |
Using argparse therefore reduces boilerplate and provides a consistent, user‑friendly interface.
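For comparison, the manual checks in the left-hand column might look like the sketch below. The `parse_count` helper and its "return None on bad input" convention are assumptions made for illustration; it takes an argv-style list so it can be exercised without a real command line.

```python
def parse_count(argv):
    """Return the -n value from an argv-style list, or None if the input is invalid."""
    if len(argv) == 1:
        return 1  # no flag given: default to one meow
    if len(argv) == 3 and argv[1] == "-n":
        try:
            return int(argv[2])
        except ValueError:
            return None  # non-numeric value such as "dog"
    return None  # missing value or unexpected extra arguments


print(parse_count(["meow.py"]))               # 1
print(parse_count(["meow.py", "-n", "3"]))    # 3
print(parse_count(["meow.py", "-n", "dog"]))  # None
print(parse_count(["meow.py", "-n"]))         # None
```

Every new option would add more branches here, which is the boilerplate argparse removes.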
📚 Summary of Docstring & CLI Practices
Write docstrings in a recognized format (reST) to enable automatic documentation generation.
Document parameters, types, raises, and returns inside the docstring.
For simple validation, include doctest examples that can be executed automatically.
Prefer argparse over manual sys.argv parsing for anything beyond a single flag; it supplies defaults,
type conversion, help text, and order‑independent processing.
Use unpacking (first, last = …) to split strings and argument unpacking (func(*list)) to pass
collections to functions cleanly.
Apply the underscore (_) as a placeholder for values that are required syntactically but not used.
📦 List Indexing & Positional Arguments
Positional argument – a value passed to a function based solely on its position in the call.
Lists are zero‑based; the first element is at index 0, the second at 1, etc.
To feed a function that expects three separate integers you can index the list explicitly:
coins = [100, 50, 25] # galleons, sickles, knuts
total(coins[0], coins[1], coins[2])
This works but becomes verbose and error‑prone as the number of parameters grows.
✨ Unpacking a Sequence with *
Unpacking – the operation that expands a single iterable (list, tuple, etc.) into individual positional arguments.
Prefix the iterable with a single asterisk (*) when calling the function:
coins = [100, 50, 25]
total(*coins) # equivalent to total(100, 50, 25)
* removes the need for explicit indexing and guarantees the order of elements matches the function’s
signature.
Works with any iterable, but only ordered collections (lists, tuples) guarantee which value reaches which parameter.
Error cases:
Too few elements → TypeError: missing X required positional arguments.
Too many elements → TypeError: takes Y positional arguments but Z were given.
Unpacking does not perform any arithmetic; it merely passes the existing values.
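The success case and both error cases can be reproduced in a few lines, using the same `total` function as earlier in these notes.

```python
def total(galleons, sickles, knuts):
    return (galleons * 17 + sickles) * 29 + knuts


print(total(*[100, 50, 25]))  # 50775

errors = []
for coins in ([100, 50], [100, 50, 25, 1]):
    try:
        total(*coins)
    except TypeError as error:
        errors.append(str(error))
        print(f"{len(coins)} values: {error}")
```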
🔁 Unpacking a Mapping with **
Mapping unpacking – using ** to expand a dictionary (or other mapping) into keyword arguments.
Dictionary keys must match the parameter names of the target function:
coins = {"galleons": 100, "sickles": 50, "knuts": 25}
total(**coins) # same as total(galleons=100, sickles=50, knuts=25)
Order is irrelevant because each key is paired with its name.
Limitations:
Sets are unordered: * accepts them but passes the values in arbitrary order, and ** rejects them because they are not mappings.
Adding an extra key that the function does not expect raises TypeError: unexpected keyword
argument.
Useful when the data structure naturally stores named values (e.g., a wallet represented as a dict).
⚙️ Keyword Arguments & Default Values
Keyword argument – a value passed by explicitly naming the parameter (name=value).
Allows arguments to be supplied in any order:
total(galleons=100, knuts=25, sickles=50)
If the function definition supplies defaults (e.g., def total(g=0, s=0, k=0):), callers may omit those
arguments entirely.
When unpacking a dictionary, default values are respected only if the key is missing; otherwise the
supplied value overrides the default.
📚 Variable‑Length Argument Lists
*args – Arbitrary Positional Arguments
*args – collects any number of extra positional arguments into a tuple named args.
def f(*args):
    print(args)  # diagnostic: shows all received values
The function can be called with zero or more positional arguments: f(), f(1, 2), f(1, 2, 3, 4).
Inside the function, args behaves like an immutable sequence; you can iterate, slice, or count its items.
**kwargs – Arbitrary Keyword Arguments
**kwargs – gathers excess named arguments into a dictionary called kwargs.
def g(**kwargs):
    print(kwargs)  # shows {'name': 'Harry', 'house': 'Gryffindor'}
Enables functions to accept any combination of named parameters without prior knowledge of their
names.
kwargs can be forwarded to another function that also accepts **kwargs.
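Forwarding can be sketched with a thin wrapper that inspects the keywords and passes them on unchanged. Both function names are hypothetical.

```python
def describe(name, house="unknown"):
    return f"{name} is in {house}"


def logged(**kwargs):
    # Inspect the collected keywords, then forward them with ** again.
    print("received:", kwargs)
    return describe(**kwargs)


print(logged(name="Harry", house="Gryffindor"))  # Harry is in Gryffindor
print(logged(name="Draco"))                      # Draco is in unknown
```

The wrapper never names the parameters itself, so `describe` can gain new keywords without `logged` changing.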
Combining Both
def h(*args, **kwargs):
    print(args, kwargs)
Positional arguments are collected into the args tuple; keyword arguments go into the kwargs dict.
Many built‑in functions use the * half of this pattern (e.g., print accepts any number of positional values).
🖨️ Built‑in Functions that Use *args / **kwargs
print has the signature print(*objects, sep=' ', end='\n', file=None, flush=False); it does not accept arbitrary **kwargs.
*objects lets you pass any number of values to be displayed.
sep and end are keyword arguments with default values; they can be overridden when calling
print.
print("Hello,", "world", sep="---", end="!")
# Output: Hello,---world!
The same * / ** pattern appears in standard‑library APIs such as str.format(*args, **kwargs) and the logging functions (e.g., logging.info(msg, *args, **kwargs)).
🧩 Practical Pitfalls & Error Messages
| Situation | Cause | Typical error |
| --- | --- | --- |
| Passing a list directly (total(coins)) | Function expects three separate integers, not a single list | TypeError: total() missing 2 required positional arguments |
| Unpacking a list with the wrong length | Mismatch between number of elements and parameters | TypeError: total() takes 3 positional arguments but 4 were given |
| Unpacking a dict with an extra key | Function has no parameter matching the extra key | TypeError: total() got an unexpected keyword argument 'pennies' |
| Using * on an unordered collection (set) | * accepts any iterable, but a set’s iteration order is arbitrary | No error is raised; values may land in the wrong parameters, so this is generally avoided |
When using variable‑length arguments, the responsibility to ensure the correct number and names of
arguments lies with the programmer.
🧪 Combining Unpacking with Default Parameters
def total(galleons=0, sickles=0, knuts=0):
    return galleons * 17 * 29 + sickles * 29 + knuts
With defaults, you can supply a partial list or dict:
coins = [100, 50] # missing knuts
print(total(*coins)) # knuts defaults to 0
wallet = {"galleons": 100, "knuts": 25}
print(total(**wallet)) # sickles defaults to 0
This flexibility reduces boiler‑plate while preserving safe fallbacks.
🗂️ Choosing the Right Data Structure for Argument Passing
| Structure | Best use | Unpacking support |
| --- | --- | --- |
| List / Tuple | Ordered collection where position matters | *seq |
| Dictionary | Named values, order optional (Python 3.7+ preserves insertion order) | **map |
| Set | Unordered, unique items only | * works but the order is arbitrary; not suitable for positional unpacking |
| Custom object | Encapsulates behavior plus data | Define __iter__ for *; define keys() and __getitem__ for ** |
Prefer a dictionary when argument names matter; choose a list/tuple when the function’s signature is
purely positional.
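To make the custom-object row concrete, here is a minimal sketch. Wallet is a hypothetical class invented for this example; it supports * through __iter__ and ** through keys() plus __getitem__.

```python
def total(galleons, sickles, knuts):
    return galleons * 17 * 29 + sickles * 29 + knuts


class Wallet:
    """Hypothetical class, used only to illustrate the unpacking protocols."""

    def __init__(self, galleons, sickles, knuts):
        self.galleons = galleons
        self.sickles = sickles
        self.knuts = knuts

    def __iter__(self):
        # Enables total(*wallet): values come out in parameter order.
        return iter((self.galleons, self.sickles, self.knuts))

    def keys(self):
        # Together with __getitem__, enables total(**wallet).
        return ["galleons", "sickles", "knuts"]

    def __getitem__(self, key):
        return getattr(self, key)


wallet = Wallet(100, 50, 25)
print(total(*wallet))          # positional unpacking
print(total(**wallet))         # keyword unpacking
print(total(**vars(wallet)))   # vars() exposes the instance __dict__
```

All three calls pass the same values, so they return the same result.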
📐 Summary of Unpacking Rules
*iterable expands any iterable into positional arguments; because position matters, ordered collections such as lists and tuples are the safe choice.
**mapping expands a dictionary (or any mapping) into keyword arguments.
The number of elements must match the function’s expected parameters, unless defaults are defined.
*args gathers excess positional arguments into a tuple; **kwargs gathers excess keyword arguments
into a dict.
* works with any iterable (list, tuple, range, etc.); ** works with any mapping that provides keys() and
__getitem__, and its keys must be strings.
These mechanisms provide concise, readable ways to forward collections of values into functions, replace
repetitive indexing, and build flexible APIs that accept variable numbers of inputs.
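The gathering and forwarding rules can be seen together in a short sketch. Both describe and log are hypothetical helpers written for this note; file= is a genuine print option, used here to capture the output.

```python
import io


def describe(*args, **kwargs):
    """Hypothetical helper: return what the two collectors gathered."""
    return args, kwargs


positional, named = describe(1, 2, 3, sep="-", end="!")
print(positional)  # (1, 2, 3)
print(named)       # {'sep': '-', 'end': '!'}


def log(*objects, **options):
    """Hypothetical wrapper: forward everything to print with a prefix."""
    print("[log]", *objects, **options)


buffer = io.StringIO()
log("a", "b", sep="-", file=buffer)  # options pass straight through to print
logged = buffer.getvalue()
print(repr(logged))  # '[log]-a-b\n'
```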
📦 Unpacking Arguments with *args
Unpacking – the process of expanding an iterable (e.g., a list or tuple) so that each element becomes a
separate positional argument.
The syntax *variable placed in a function call tells Python to unpack the iterable.
Inside a function definition, a leading asterisk creates a parameter that gathers any number of
positional arguments into a tuple.
def yell(*words):
    # `words` is a tuple containing all arguments passed to `yell`
    print(*words)  # unpack again when printing
Using *words makes yell behave like print: you can supply zero, one, or many arguments without
wrapping them in a list.
🪄 Applying *args to the yell Function
Original version required a list: yell(["CS50", "is", "awesome"]).
Refactored version:
def yell(*words):
    uppercase = map(str.upper, words)  # apply `upper` to each word
    print(*uppercase)  # unpack the resulting iterator
Benefits:
No need for the caller to build a list.
The function can accept any number of words, just like print.
📊 Mapping a Function with map
map(function, iterable) – returns an iterator that applies function to every item of iterable.
Example: convert each word to uppercase.
uppercase = map(str.upper, words)  # `str.upper` is passed as a function object, not called
map produces a lazy iterator; converting to a list (or unpacking) forces evaluation.
Works well when the transformation is simple and you want to keep the code short.
Approach | How the transformation is expressed | When to prefer
for loop | Explicit iteration, mutable list | Complex logic, side‑effects
map | Function reference applied automatically | Simple one‑liner transformations
List comprehension | Inline expression inside [] | Readability + one‑liner when the result is needed as a list
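The three approaches in the table give the same result. The sketch below runs them side by side on a made-up word list and also shows that a map iterator can be consumed only once.

```python
words = ["this", "is", "cs50"]  # illustrative input

# 1. for loop: explicit iteration and append
loop_result = []
for word in words:
    loop_result.append(word.upper())

# 2. map: lazy iterator, evaluated when converted to a list
mapped = map(str.upper, words)
map_result = list(mapped)
leftover = list(mapped)  # the iterator is already exhausted, so this is []

# 3. list comprehension: expression and loop in one line
comp_result = [word.upper() for word in words]

print(loop_result, map_result, comp_result, leftover)
```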
🧩 List Comprehensions
List comprehension – a compact syntax for building a new list by applying an expression to each item of an
existing iterable.
uppercase = [word.upper() for word in words]
Structure:
[expression for item in iterable]
The expression comes first and is evaluated once per item; the for clause follows it.
Produces a list directly, no need for an explicit append.
Conditional List Comprehension
You can filter items while building the list:
gryffindors = [student["name"] for student in students
               if student["house"] == "Gryffindor"]
The optional if clause appears after the for clause.
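For a runnable version of the filter, the sketch below supplies a small students list (the data is assumed for illustration) and compares the comprehension with the equivalent loop.

```python
students = [  # illustrative data
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]

# Comprehension with a trailing if clause
gryffindors = [s["name"] for s in students if s["house"] == "Gryffindor"]

# Equivalent explicit loop
gryffindors_loop = []
for s in students:
    if s["house"] == "Gryffindor":
        gryffindors_loop.append(s["name"])

print(sorted(gryffindors))  # ['Harry', 'Hermione', 'Ron']
```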
🛠️ Functional Tools: filter and lambda
filter
filter(function, iterable) – yields only those elements of iterable for which function returns a truthy value.
def is_gryffindor(student):
    return student["house"] == "Gryffindor"

gryffindors = list(filter(is_gryffindor, students))
Like map, filter returns an iterator; wrap with list() or unpack to realize the result.
lambda (anonymous function)
When a function is needed only once, a lambda can replace a named definition.
gryffindors = list(filter(lambda s: s["house"] == "Gryffindor", students))
This removes the extra def is_gryffindor(...): boilerplate.
Feature | Named function | lambda
Reusability | Yes (can be called elsewhere) | No (inline, one‑off)
Verbosity | More lines | Concise, single expression
Readability | Clear when logic is complex | Ideal for simple predicates
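A quick check that the named function and the lambda are interchangeable, using assumed sample data. The last lines show another common home for a lambda: the key argument of sorted.

```python
students = [  # illustrative data
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Harry", "house": "Gryffindor"},
    {"name": "Ron", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
]


def is_gryffindor(student):
    return student["house"] == "Gryffindor"


named = list(filter(is_gryffindor, students))
inline = list(filter(lambda s: s["house"] == "Gryffindor", students))

# A lambda also works as a sort key.
by_name = sorted(inline, key=lambda s: s["name"])
names = [s["name"] for s in by_name]
print(names)  # ['Harry', 'Hermione', 'Ron']
```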
📚 Dictionary Comprehensions
Dictionary comprehension – similar to a list comprehension but creates a dictionary ({key: value, …}) in a single
expression.
gryffindor_dict = {student["name"]: "Gryffindor"
                   for student in students}
The outer braces indicate a dictionary, while the inner expression supplies key: value pairs.
Useful when you need a mapping rather than a list of objects.
Comparing List vs. Dictionary Comprehensions
Goal | List comprehension | Dictionary comprehension
Collect values only | [value for ...] | –
Build a mapping (key → value) | – | {key: value for ...}
Preserve order of insertion (Python 3.7+) | Yes | Yes (plain dicts keep insertion order)
Duplicates | Allowed (each value is a separate entry) | A later duplicate key overwrites the earlier value
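The duplicate-handling difference is easy to miss, so here is a minimal sketch. The repeated name is invented purely to trigger the overwrite.

```python
students = [  # illustrative data with a deliberately repeated name
    {"name": "Hermione", "house": "Gryffindor"},
    {"name": "Draco", "house": "Slytherin"},
    {"name": "Hermione", "house": "Ravenclaw"},
]

# A list keeps every entry, duplicates included.
names = [s["name"] for s in students]

# A dict keeps one entry per key; the later value wins.
houses = {s["name"]: s["house"] for s in students}

print(names)   # ['Hermione', 'Draco', 'Hermione']
print(houses)  # {'Hermione': 'Ravenclaw', 'Draco': 'Slytherin'}
```

The overwritten key keeps its original position, which is why Hermione still comes first.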
🔢 Enumerating Sequences with enumerate
enumerate(iterable, start=0) – yields pairs (index, element) while iterating, optionally starting the index at a
different number.
for i, student in enumerate(students, start=1):
    print(i, student["name"])
Eliminates the need for range(len(...)) and manual indexing.
The start argument lets you display ranks that begin at 1 instead of 0.
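For comparison, the sketch below builds the same ranked lines twice: once with range(len(...)) and manual indexing, once with enumerate. The student names are sample data.

```python
students = ["Hermione", "Harry", "Ron"]  # illustrative data

# Manual indexing: works, but needs i + 1 and students[i]
manual = []
for i in range(len(students)):
    manual.append(f"{i + 1} {students[i]}")

# enumerate hands over the rank and the item together
ranked = [f"{rank} {name}" for rank, name in enumerate(students, start=1)]

print(ranked)  # ['1 Hermione', '2 Harry', '3 Ron']
```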
🐑 Counting Sheep – Generating a Sequence
The final example introduces a small program that generates a series of values based on user input.
n = int(input("How many sheep? "))
for i in range(1, n + 1):
    print(f"Sheep {i}")
Key points demonstrated:
input() returns a string; int() converts it to an integer.
range(start, stop) produces a lazy sequence of numbers; starting at 1 aligns with natural counting.
The loop prints each sheep number, illustrating how a loop over range can replace a manual
“count‑the‑sheep” mental loop.
📐 Summary of Core Techniques
Technique | Syntax Highlight | Primary Use
Unpacking arguments | *args / *iterable | Flexible function signatures & calls
map | map(func, iterable) | Apply a single function to every element
List comprehension | [expr for x in seq] | Build transformed lists concisely
Conditional list comprehension | [expr for x in seq if cond] | Filter while building
filter | filter(pred, iterable) | Keep only items satisfying a predicate
lambda | lambda args: expr | Inline anonymous functions
Dictionary comprehension | {k: v for x in seq} | Construct mappings on the fly
enumerate | enumerate(seq, start=1) | Access index + value simultaneously
Generator‑style loops | for i in range(start, stop) | Produce sequences without storing them
These constructs together enable concise, readable, and memory‑efficient Python code, reducing boilerplate
loops and making functional patterns (map/filter/lambda) readily available alongside Pythonic comprehensions.
🐑 Looping Over Sheep with range()
range(stop) produces an immutable sequence of integers from 0 up to stop - 1.
Counting sheep from zero:
for i in range(n):
    print(f"{i} sheep")
When n = 3 the output is
0 sheep
1 sheep
2 sheep
Increasing n (e.g., 10, 100, 10 000) simply adds more lines, but each iteration still calls print
exactly once.
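A short sketch of why range scales well: it computes its values on demand instead of storing them. The size comparison relies on sys.getsizeof, whose exact numbers are a CPython implementation detail.

```python
import sys

n = 3
lines = [f"{i} sheep" for i in range(n)]
print(lines)  # ['0 sheep', '1 sheep', '2 sheep']

huge = range(10_000_000)
print(len(huge))          # 10000000
print(huge[-1])           # 9999999
print(5_000_000 in huge)  # True

# The range object stays small because it never stores its numbers.
range_bytes = sys.getsizeof(huge)
list_bytes = sys.getsizeof(list(range(1000)))
print(range_bytes < list_bytes)
```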
🛠️ Refactoring Into a main() Function
main() – the conventional entry point that groups program logic, keeping the global scope clean.
Typical pattern:
def main():
    # program logic here
    ...

if __name__ == "__main__":
    main()
The if __name__ == "__main__": guard prevents main() from running when the file is imported as a
module.
Moving the sheep‑counting loop into main() does not change behaviour; it merely improves
organization.
📦 Helper Function for Sheep Generation
Initial helper that returns a single line
def sheep(i):
    return f"{i} sheep"
main() can now call sheep(i) and print the result, keeping the counting logic separate from the
string‑building logic.
Returning the whole flock as a list
def flock(n):
    lines = []  # empty list (named `lines` to avoid shadowing the function)
    for i in range(n):
        lines.append(f"{i} sheep")
    return lines
main() iterates over the returned list:
for s in flock(n):
    print(s)
This works for modest values of n but creates a full list in memory before any output appears.
⚠️ Memory & Performance Pitfalls
Input size | Behaviour | Reason
≤ 10 000 | All lines appear quickly | List fits comfortably in RAM
≈ 1 000 000 | Noticeable pause before the first line appears | Every string is built and stored before printing starts
≫ 1 000 000 | Program may hang with no output or be terminated by the OS | The list can outgrow available memory
Exact thresholds depend on the machine's memory and speed.
The issue stems from eager evaluation: the entire flock is constructed before any print runs.
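Eager evaluation can be made visible by counting how many strings exist before the first one is used. In this sketch, make_line, flock_list, flock_lazy, and the counter are all hypothetical names; map (covered earlier) stands in as the lazy alternative.

```python
counter = {"built": 0}  # tracks how many lines have been created


def make_line(i):
    counter["built"] += 1
    return f"{i} sheep"


def flock_list(n):
    """Eager: builds every line before returning."""
    lines = []
    for i in range(n):
        lines.append(make_line(i))
    return lines


def flock_lazy(n):
    """Lazy: map builds a line only when one is requested."""
    return map(make_line, range(n))


counter["built"] = 0
first_eager = flock_list(1000)[0]
built_eager = counter["built"]

counter["built"] = 0
first_lazy = next(flock_lazy(1000))
built_lazy = counter["built"]

print(built_eager, built_lazy)  # 1000 1
```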
🚀 Introducing Generators & yield
Generator – a special type of iterator produced by a function that contains the yield keyword.
yield – pauses function execution, returns a single value, and remembers the current state for the next iteration.
Generator version of sheep
def sheep_generator(n):
    for i in range(n):
        yield f"{i} sheep"
sheep_generator returns an iterator that produces one string at a time.
main() can consume it exactly as it did a list:
for s in sheep_generator(n):
    print(s)
Why yield solves the memory problem
Only one string lives in memory at any moment; the rest are created on demand.
CPU work is spread across iterations rather than concentrated in a massive list construction step.
Even with n = 1_000_000, the program continues to print lines without exhausting RAM.
🧩 How a Generator Works Under the Hood
1. Calling the generator function creates a generator object but does not execute the body.
2. The surrounding for loop calls the generator’s __next__() method.
3. Execution runs until it reaches yield, returns the yielded value, and suspends.
4. On the next __next__() call, execution resumes right after the yield, preserving local variables (i in this
case).
5. This cycle repeats until the function body finishes; Python then raises StopIteration, which the for loop
catches to end the iteration.
No explicit state‑management code is required; Python handles the “pause‑and‑resume” mechanics
automatically.
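The steps above can be traced by hand with the built-in next(), which calls __next__() for you. This is a minimal sketch of what a for loop does behind the scenes.

```python
def sheep_generator(n):
    for i in range(n):
        yield f"{i} sheep"


gen = sheep_generator(2)  # creates the generator; the body has not started

first = next(gen)   # runs to the first yield
second = next(gen)  # resumes after the yield, with i remembered

try:
    next(gen)       # the body finishes, so StopIteration is raised
    finished = False
except StopIteration:
    finished = True

print(first, second, finished)  # 0 sheep 1 sheep True
```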
🔄 Comparing List‑Based vs. Generator‑Based Designs
Aspect | List‑based flock | Generator sheep_generator
Memory usage | O(n) (stores all strings) | O(1) (stores only current string)
When output starts | After the entire list is built | Immediately on first yield
Scalability | Limited by available memory | Handles arbitrarily large n (limited by CPU time)
Code complexity | Slightly simpler (just append) | Requires yield but otherwise similar
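A rough way to see the memory row in practice is sys.getsizeof. Treat the numbers as indicative only: they are CPython-specific, and for the list they count the container but not the strings it holds.

```python
import sys


def flock(n):
    return [f"{i} sheep" for i in range(n)]


def sheep_generator(n):
    for i in range(n):
        yield f"{i} sheep"


n = 100_000
as_list = flock(n)
as_generator = sheep_generator(n)

list_bytes = sys.getsizeof(as_list)            # grows with n
generator_bytes = sys.getsizeof(as_generator)  # independent of n
print(list_bytes, generator_bytes)

# Both designs produce the same lines.
same = list(as_generator) == as_list
print(same)  # True
```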
⏹️ Controlling Long‑Running Generators
Even though a generator does not block on memory, printing millions of lines still takes time.
The user can interrupt the process with Ctrl + C, raising a KeyboardInterrupt and returning control to the
shell.
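If a traceback on Ctrl + C is unwelcome, the loop can catch KeyboardInterrupt and exit cleanly. The sketch below is one possible approach; print_until_interrupted and simulated_interrupt are hypothetical names, and the interrupt is raised artificially so the example runs without a keyboard.

```python
def print_until_interrupted(lines):
    """Print each line; stop cleanly if the user presses Ctrl + C."""
    printed = 0
    try:
        for line in lines:
            print(line)
            printed += 1
    except KeyboardInterrupt:
        print(f"Interrupted after {printed} lines")
    return printed


def simulated_interrupt():
    """Stand-in for a user pressing Ctrl + C after two lines."""
    yield "0 sheep"
    yield "1 sheep"
    raise KeyboardInterrupt


count = print_until_interrupted(simulated_interrupt())
print(count)  # 2
```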
📚 Integrating Generators with Existing Best Practices
Modularization: Keep the generator in its own helper function (sheep_generator).
Testing: Unit tests can consume a finite slice of the generator (e.g.,
list(itertools.islice(sheep_generator(1_000_000), 5))) to verify correctness without generating huge data sets.
Readability: The main() loop remains clean:
def main():
    n = int(input("How many sheep? "))
    for s in sheep_generator(n):
        print(s)
This preserves the earlier lessons on separating concerns, naming functions descriptively, and avoiding
large side‑effects in a single block of code.
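The testing idea can be sketched with itertools.islice. The test function names are hypothetical; they follow the test_ naming convention that pytest discovers, and they are also called directly here so the snippet runs on its own.

```python
from itertools import islice


def sheep_generator(n):
    for i in range(n):
        yield f"{i} sheep"


def test_first_lines():
    # Only three strings are ever created, despite the huge n.
    first_three = list(islice(sheep_generator(1_000_000_000), 3))
    assert first_three == ["0 sheep", "1 sheep", "2 sheep"]


def test_empty_flock():
    assert list(sheep_generator(0)) == []


test_first_lines()
test_empty_flock()
print("all tests passed")
```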
🎤 Quick Recap of Terminology
Iterator – an object that implements __iter__() and __next__(), producing successive values until exhausted.
Generator function – a function that contains yield; calling it returns a generator (an iterator).
State retention – the ability of a generator to remember the values of its local variables between yields.
yield vs. return – return ends the function entirely; yield produces a temporary value and keeps the function
alive for subsequent calls.
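To tie the terms together, the sketch below writes the same iterator by hand and contrasts yield with return. SheepIterator and first_sheep are hypothetical names; the class shows the state bookkeeping that a generator function does automatically.

```python
class SheepIterator:
    """Hypothetical hand-written equivalent of sheep_generator."""

    def __init__(self, n):
        self.n = n
        self.i = 0  # state we must track ourselves

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        line = f"{self.i} sheep"
        self.i += 1
        return line


def sheep_generator(n):
    for i in range(n):
        yield f"{i} sheep"  # suspends here, resumes on the next request


def first_sheep(n):
    for i in range(n):
        return f"{i} sheep"  # return ends the function on the first pass


print(list(SheepIterator(3)))    # ['0 sheep', '1 sheep', '2 sheep']
print(list(sheep_generator(3)))  # ['0 sheep', '1 sheep', '2 sheep']
print(first_sheep(3))            # 0 sheep
```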
🛠️ Example: Full Program Sketch
def sheep_generator(n):
    for i in range(n):
        yield f"{i} sheep"

def main():
    n = int(input("How many sheep? "))
    for s in sheep_generator(n):
        print(s)

if __name__ == "__main__":
    main()
This snippet incorporates the main guard, input conversion, generator, and iteration in a concise,
testable form.