Intro To Python FS2025
Léo Picard
[Link]@[Link]
What’s Python?
Learning curve
− your past projects can still help you when you’re stuck
• My goal is to show you the basics and help you become independent
Objectives
3. Scientific computing: Load datasets and work with them, plot data
Before we start
Sections
1 Set-up
Installation
Setting up your environment
2 Python essentials
Basics
Variables and data types
Operators and conditions
Loops
Functions
Exercises
3 Scientific computing
Accessing files
Packages
Loading a dataset
Summary statistics
Data manipulation
Plotting data
Exercise
Set-up
Installation
• As we want to use Python for scientific programming, you only have to install
"Anaconda": [Link]
→ Anaconda is a free distribution for Python which provides the core Python
package and the most popular scientific libraries
• We write and run code ("scripts") in files with the following extension:
[Link]
Definition
The IDE (Integrated Development Environment) is the software we use to write, run,
and debug Python scripts
• Heavy but efficient: for big projects and software engineering (e.g., VS Code)
• Spyder is split into different "panes" which are sections providing us with
information or access to certain features. The most important are:
− The editor
− The console
Python essentials
Basics
• Using the hash symbol (#), we write notes ("comments") directly in the code
• I use the symbol > at the start of a line to show the result on the console
Note: Most IDEs have a color scheme to distinguish different elements of code
Variables
• Using the assignment operator "=", we give them names and values
• Variables can hold different data types: numbers (integer, decimal, or complex),
text, booleans (True/False), or collections such as a tuple, a list, or even a dictionary!
• The variable explorer shows you the type of all variables you have created
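A minimal sketch of the data types listed above; the variable names and values are our own examples. type() reports the same information as the variable explorer:

```python
# Assign values of different data types to variables
number_1 = 42                     # integer (int)
pi_approx = 3.14                  # decimal number (float)
z = 2 + 3j                        # complex number (complex)
my_name = "Ada"                   # text (str)
is_student = True                 # boolean (bool)
coords = (46.95, 7.45)            # tuple
num_list = [1, 4, 7, 8]           # list
cantons = {"BS": "Basel Stadt"}   # dictionary (dict)

# Print the name of each value's type
for value in [number_1, pi_approx, z, my_name, is_student, coords, num_list, cantons]:
    print(type(value).__name__)
# prints: int, float, complex, str, bool, tuple, list, dict (one per line)
```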
Multiple assignment
# delete them
del number_1, my_name, num_list
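A short sketch of multiple assignment, assuming example values of our own: values on the right are matched to names on the left by position, and the same mechanism lets us swap two variables.

```python
# Assign several variables in one line (matched by position)
number_1, my_name, num_list = 10, "Ada", [1, 2, 3]
print(number_1, my_name, num_list)
# > 10 Ada [1, 2, 3]

# Give the same value to several variables
a = b = 0

# Swap two values without a temporary variable
a, b = 1, 2
a, b = b, a
print(a, b)
# > 2 1

# Delete variables we no longer need
del number_1, my_name, num_list
```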
Numbers
• There are two different types of data representing numbers
− Integers (int): whole numbers (0, 1, 2, 5001, -9999)
− Floats (float): decimal numbers (e.g., 3.14, -0.5)
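A small illustration (example values are ours) of how integers and floats behave together and how to convert between them:

```python
n_int = 5001      # integer (int)
n_float = 2.5     # decimal number (float)

print(type(n_int), type(n_float))
# > <class 'int'> <class 'float'>

# Mixing an int and a float gives a float
print(n_int + n_float)
# > 5003.5

# Convert between the two types (int() truncates, it does not round)
print(int(2.9), float(3))
# > 2 3.0
```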
Strings
• A string (str) is a series of characters
• Using f-strings, we can insert any variable's value into a string
print(sent)
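A minimal f-string sketch; the variables name, age and score are illustrative assumptions. Prefix the string with f and put expressions inside braces; an optional format specifier after ":" controls how the value is displayed.

```python
name = "Ada"
age = 36
score = 0.8734

# Variables inside {} are replaced by their values
sent = f"{name} is {age} years old"
print(sent)
# > Ada is 36 years old

# ":.2f" rounds the display to two decimals
print(f"Score: {score:.2f}")
# > Score: 0.87
```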
Booleans
• A boolean (bool) is a data type that has two possible values (True or False)
• But usually we get them from logical comparisons (e.g., 2 == 3 → False)
boolname = False
print(boolname)
> False
boolname = (5 ** 2 == 25)
print(boolname)
> True
Lists
• You can modify any element by accessing its index (position in the list)
Important
The index numbering in Python starts at 0, not 1 (sorry Matlab users!)
> 5
> [1, 4, 7, 8]
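A sketch of indexing that would produce the two outputs above, assuming a starting list of our choice (not necessarily the slide's original code):

```python
listname = [1, 5, 7, 8]

# Access an element by its index (the first element has index 0)
print(listname[1])
# > 5

# Modify an element by assigning to its index
listname[1] = 4
print(listname)
# > [1, 4, 7, 8]

# Negative indices count from the end
print(listname[-1])
# > 8
```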
Lists
Method Description
[Link](i) Add the item i at the end of the list
[Link](x,i) Insert the item i at the index x
[Link](x) Remove the item at position x and return it
[Link](x) Return a copy of the list
[Link]() Sort all the items in the list (increasing by default)
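The descriptions in the table appear to correspond to the standard list methods append(), insert(), pop(), copy() and sort(); a short sketch of each on an example list of our own:

```python
items = [3, 1, 2]

items.append(5)        # add 5 at the end -> [3, 1, 2, 5]
items.insert(0, 9)     # insert 9 at index 0 -> [9, 3, 1, 2, 5]
last = items.pop()     # remove and return the last item (5) -> [9, 3, 1, 2]
first = items.pop(0)   # remove and return the item at index 0 (9) -> [3, 1, 2]

backup = items.copy()  # independent copy: changing items won't change backup
items.sort()           # sort in place, increasing order -> [1, 2, 3]

print(items, backup, last, first)
# > [1, 2, 3] [3, 1, 2] 5 9
```

Note that sort() modifies the list in place and returns None, so b = a.sort() would leave b equal to None; sorted(a) returns a new sorted list instead.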
List operations
Example Outcome
a = [1,2]; [Link](3) > a = [1,2,3]
a = [4,1,5,3]; b = [Link]();
[Link](); [Link](reverse = True) > a = [1, 3, 4, 5]; b = [5, 4, 3, 1]
Slicing lists
colors = ["red", "green", "blue", "yellow"]; print(colors[1:3])
> ['green', 'blue']
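A few more slicing patterns on the same list: list[start:stop] keeps indices start up to stop - 1, either bound can be omitted, and a third number sets the step.

```python
colors = ["red", "green", "blue", "yellow"]

# Indices 1 and 2 (the stop index 3 is excluded)
print(colors[1:3])
# > ['green', 'blue']

# Omit start or stop to slice from the beginning or to the end
print(colors[:2], colors[2:])
# > ['red', 'green'] ['blue', 'yellow']

# Negative indices count from the end; [::2] takes every second item
print(colors[-2:], colors[::2])
# > ['blue', 'yellow'] ['red', 'blue']
```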
Dictionaries
dictname = {"BS": "Basel Stadt", "GE": "Geneva", "TI": "Ticino"}
print(dictname["BS"])
> Basel Stadt
dictname["ZG"] = "Zug"
print(dictname)
> {'BS': 'Basel Stadt', 'GE': 'Geneva', 'TI': 'Ticino', 'ZG': 'Zug'}
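A short sketch of common dictionary operations on the canton example: adding a key, reading a key safely with .get() (the key "VD" is our example of a missing key), and looping over key/value pairs. Since Python 3.7, dictionaries keep insertion order.

```python
dictname = {"BS": "Basel Stadt", "GE": "Geneva", "TI": "Ticino"}

# Add or update an entry by assigning to a key
dictname["ZG"] = "Zug"

# .get() returns a default instead of raising a KeyError for a missing key
print(dictname.get("VD", "unknown"))
# > unknown

# Loop over keys and values together
for code, canton in dictname.items():
    print(code, canton)
# > BS Basel Stadt
# > GE Geneva
# > TI Ticino
# > ZG Zug

print(list(dictname.keys()))
# > ['BS', 'GE', 'TI', 'ZG']
```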
Arithmetic Operators
Operator Name Example Result
+ Addition 10 + 20 30
- Subtraction 30 - 20 10
* Multiplication 2 * 5 10
/ Division 6 / 2 3.0
% Modulus 10 % 4 2
** Exponent 2 ** 3 8
// Floor division 9 // 4 2
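A small check of how floor division and modulus fit together (example numbers are ours): for any integers a and b, a == (a // b) * b + a % b, and // rounds down, which matters for negative numbers.

```python
a, b = 9, 4
print(a // b, a % b, (a // b) * b + a % b)
# > 2 1 9

# divmod() returns both results at once
print(divmod(9, 4))
# > (2, 1)

# With negative numbers, // rounds down (toward minus infinity)
print(-9 // 4, -9 % 4)
# > -3 3

# / always returns a float, even when the division is exact
print(6 / 2)
# > 3.0
```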
Comparison Operators
Logical Operations
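A minimal sketch of Python's comparison operators (==, !=, <, >, <=, >=) and logical operators (and, or, not), with example values of our own:

```python
x, y = 5, 10

# Comparison operators return booleans
print(x == y, x != y, x < y, x >= y)
# > False True True False

# and: True only if both sides are True
print(x < y and y < 20)
# > True
# or: True if at least one side is True
print(x > y or y == 10)
# > True
# not: flips True/False
print(not x < y)
# > False

# Comparisons can be chained
print(0 < x < y)
# > True
```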
Conditions
if y < x:
    print("y smaller than x")
else:
    print("y greater than or equal to x")
• The else block runs only if the condition is not satisfied (False)
• For more than two conditions, you can insert one or more elif ("else if") blocks before else
• Be careful with indentation: Python uses it to define which lines belong to each block
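The three-way version with elif can be sketched as follows; wrapping it in a function named compare (our own name) makes it easy to try several inputs:

```python
def compare(x, y):
    """Describe how y relates to x using if / elif / else."""
    if y < x:
        return "y smaller than x"
    elif y == x:   # checked only if the first condition is False
        return "y equal to x"
    else:          # runs only if all conditions above are False
        return "y greater than x"

print(compare(5, 3))
# > y smaller than x
print(compare(3, 3))
# > y equal to x
print(compare(3, 5))
# > y greater than x
```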
For loops
> 5
> 35
> 3
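A for loop runs its indented block once per item. The sketch below shows one loop that would print the outputs above (the list values are our assumption), plus the common range() pattern:

```python
# Loop over the items of a list
for value in [5, 35, 3]:
    print(value)
# > 5
# > 35
# > 3

# range(start, stop) generates the integers start, ..., stop - 1
total = 0
for i in range(1, 5):
    total += i      # 1 + 2 + 3 + 4
print(total)
# > 10
```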
List comprehension
# we can even add conditions
listname = [x for x in listname if x % 2 == 0]
print(listname)
• You need to name the current item (below, i) if you want to use its value
inside the loop
> 0 1.2
> 1 2.343
> 2 0.44
> [0, 2, 4, 6, 8]
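A self-contained sketch of both ideas on this slide, with example data of our own: a filtered list comprehension next to its explicit-loop equivalent, and enumerate(), which names the index i alongside each item.

```python
# A list comprehension builds a new list in one line
listname = list(range(10))
evens = [x for x in listname if x % 2 == 0]
print(evens)
# > [0, 2, 4, 6, 8]

# The same result with an explicit loop
evens_loop = []
for x in listname:
    if x % 2 == 0:
        evens_loop.append(x)

# enumerate() gives the index i together with each item
values = [1.2, 2.343, 0.44]
for i, value in enumerate(values):
    print(i, value)
# > 0 1.2
# > 1 2.343
# > 2 0.44
```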
While loops
i = 1
while i < 10:
    print(i)
    if i == 4:
        break
    i += 1  # equivalent to i = i + 1
> 1
> 2
> 3
> 4
Functions
Definition
A function bundles a specific task that is executed whenever we call its name:
− A set of instructions
Functions
def fib(n):
    """
    Print a Fibonacci series up to n
    """
    a, b = 0, 1
    while a < n:
        print(a, end=" ")
        a, b = b, a + b
fib(10)
> 0 1 1 2 3 5 8
Functions
Functions can return an output (keyword return), which we can store by assigning the result to a variable
def squared(array):
    # Find the square of each element in a list
    output = []
    for elem in array:
        elem_squared = elem ** 2
        output.append(elem_squared)
    return output
n = [2, 5, 10]
n_squared = squared(n)
print(n_squared)
> [4, 25, 100]
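A variant of squared() with a default argument; the parameter power is our own addition, not part of the original slide. Arguments with defaults can be omitted or overridden by keyword, and a list comprehension replaces the explicit loop.

```python
def squared(array, power=2):
    """Return a new list with each element raised to `power` (default 2)."""
    return [elem ** power for elem in array]

n = [2, 5, 10]
print(squared(n))            # uses the default power=2
# > [4, 25, 100]
print(squared(n, power=3))   # override the default by keyword
# > [8, 125, 1000]
```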
Lambda expressions
Writing a full function can be verbose for simple operations
Instead, we can use lambda expressions: (lambda x: operation)(value)
def simple_operation(x):
    x_new = x ** 2 - 1
    return x_new
print(simple_operation(10))
> 99
print((lambda x: x ** 2 - 1)(10))
> 99
3) Write a function that returns the squares of all odd (or all even) numbers between
0 and 20
Scientific computing
Paths
Definition
Your computer stores files in directories (folders), which can be accessed using
paths. Paths come in different formats depending on your operating system.
Windows C:\Users\username\Desktop
MacOS /Users/username/Desktop
Linux (Ubuntu) /home/username/Desktop
Paths
• Not sure which one it is? Just type pwd ("print working directory") in the
console
• Paths can be absolute or relative
− Absolute paths give the entire path to your destination
− Relative paths start from your current working directory
Note: ".." refers to the parent directory (i.e., for going one level up the tree)
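A minimal sketch using the standard-library pathlib and os modules; the folder "data" and file "speeches.csv" are hypothetical names. Building paths with "/" (or os.path.join) lets Python choose the right separator for each operating system.

```python
import os
from pathlib import Path

# Current working directory, i.e., what pwd prints
cwd = Path.cwd()
print(cwd)

# A relative path (hypothetical folder and file names)
data_file = Path("data") / "speeches.csv"
print(data_file.parts)
# > ('data', 'speeches.csv')

# Absolute version: the relative path appended to the working directory
print(cwd / data_file)

# The parent directory, one level up (what ".." refers to)
print(cwd.parent)

# The os module offers the same building blocks
print(os.path.join("data", "speeches.csv"))
```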
Packages
Definition
A package is a collection of modules (Python files) that we import into our code.
It contains ready-to-use functions that serve a specific purpose.
• First, search for the package name online and find the command to install it
− [Link]
− [Link]
Packages
• They come in different versions, which can conflict with each other
• They need to be installed in a folder where Python will look for them (listed in
sys.path); the "$PATH" variable tells your system where to find programs such as python, pip, or conda
Otherwise, here are nice tutorials on using Bash commands [LINK] and managing
the $PATH variable [LINK]
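To check where Python looks for packages and which version is installed, a sketch with the standard-library sys and importlib.metadata modules (Python 3.8+); numpy is only an example package name:

```python
import sys
from importlib import metadata

# Folders where Python looks for modules and packages, in search order
for folder in sys.path:
    print(folder)

# Which Python interpreter is running, and its version
print(sys.executable)
print(sys.version_info[:2])

# Installed version of a package (raises PackageNotFoundError if missing)
try:
    print(metadata.version("numpy"))
except metadata.PackageNotFoundError:
    print("numpy is not installed in this environment")
```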
Packages
Finally, we import a package into our code using the keyword import
import numpy
Packages
import numpy as np
from numpy import cos, pi
print(cos(pi))
> -1.0
Some examples
• NumPy: Basic package for scientific computing. Very fast with mathematical
and matrix operations. You can create "ndarrays" which are flexible, efficient
and also faster than lists.
• SciPy: Builds on NumPy with more advanced tools (e.g., linear algebra such as finding
the determinant or the inverse of a matrix and solving linear equations, optimization, statistics).
• Matplotlib: Plotting data, with complete control over the outline of graphs.
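A short sketch contrasting a list with a NumPy ndarray, then solving a small linear system with scipy.linalg; the matrix A and vector b are example values we chose so that the solution is x = [2, 3].

```python
import numpy as np
from scipy import linalg

# Lists: "*" repeats the list; NumPy arrays: "*" works element by element
print([1, 2, 3] * 2)
# > [1, 2, 3, 1, 2, 3]
arr = np.array([1, 2, 3])
print(arr * 2)
# > [2 4 6]

# Solve A x = b, i.e., 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)
print(x)               # approximately [2. 3.]
print(linalg.det(A))   # approximately 5.0
```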
Loading a dataset
• pandas is fast and efficient for up to a few gigabytes of data (rule of thumb: 16 GB of RAM
works well for datasets under 1 or 2 GB)
• If memory becomes scarce: look for alternatives like Dask, Modin, or Vaex
(many other packages exist)
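A minimal loading sketch with pandas. To keep it self-contained, a tiny in-memory CSV with fake data stands in for a real file; the column names are assumptions. pandas also offers readers such as pd.read_stata() and pd.read_excel() for other formats.

```python
import io
import pandas as pd

# Fake data standing in for a CSV file on disk
csv_text = """speaker,gender,metaphor_score
Smith,1,0.4
Jones,2,0.7
Brown,1,0.1
"""

# With a real file, pass its path instead, e.g. pd.read_csv("data/speeches.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
# > (3, 3)
print(df.head())   # first rows of the dataset
```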
Summary statistics
• Each column has its own data type, use [Link] in the console to see them
all at once
Mea Culpa
While I speak, I tend to use both Python and Stata notations (in parentheses)
Summary statistics
Function Description
[Link] Show all data types
df["metaphor_score"].mean() Display the mean of the variable
df["metaphor_score"].std() Display the standard deviation
df["metaphor_score"].max() Display the maximum value (and so on)
df["metaphor_score"].describe() Display count, mean, std, min, quartiles, and max
df["arg1"].value_counts() Tabulate all values and their frequencies
df["speaker"].unique() List the distinct values
Here are nice websites to translate Stata [LINK] and R [LINK] commands into
Python
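The functions from the table, run on a small fake DataFrame that reuses the column names above (the values are invented for illustration). duplicated() is added as a direct way to check for repeated values.

```python
import pandas as pd

# Fake dataset with the column names used in the table
df = pd.DataFrame({
    "speaker": ["Smith", "Jones", "Smith", "Brown"],
    "arg1": ["war", "journey", "war", "game"],
    "metaphor_score": [0.4, 0.7, 0.1, 0.6],
})

print(df.dtypes)                          # data type of each column
print(df["metaphor_score"].mean())        # mean
print(df["metaphor_score"].std())         # standard deviation (sample, ddof=1)
print(df["metaphor_score"].describe())    # count, mean, std, min, quartiles, max
print(df["arg1"].value_counts())          # frequency of each value
print(df["speaker"].unique())             # distinct values
print(df["speaker"].duplicated().any())   # does any speaker appear twice?
# > True
```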
Data manipulation
# Create a new metaphor column
df["metaphor"] = df["arg0"] + " " + df["arg1"]
Apply
You can apply rule-based data manipulation with the function apply()
df["gender_str"] = df.apply(lambda x: recode_gender(x["gender"]),
    axis=1)
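The slide does not show recode_gender(), so the version below is a hypothetical recoding rule (1 → "male", 2 → "female", else "unknown") on fake data. It shows the row-wise call from the slide and a simpler column-wise alternative.

```python
import pandas as pd

def recode_gender(code):
    """Hypothetical rule: 1 -> "male", 2 -> "female", anything else -> "unknown"."""
    if code == 1:
        return "male"
    elif code == 2:
        return "female"
    return "unknown"

df = pd.DataFrame({"speaker": ["Smith", "Jones", "Lee"], "gender": [1, 2, 9]})

# Row-wise apply, as on the slide: axis=1 passes one row at a time as x
df["gender_str"] = df.apply(lambda x: recode_gender(x["gender"]), axis=1)

# When only one column is involved, applying on that column is simpler
df["gender_str2"] = df["gender"].apply(recode_gender)

print(df)
```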
Append
You can append (stack) datasets that share the same columns with the function [Link]()
import os
import glob  # to store many file names
import pandas as pd
df = pd.DataFrame()  # creates an empty DataFrame
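A self-contained sketch of appending many files, assuming CSV files named speeches_*.csv (here two fake files written to a temporary folder). Instead of growing an empty DataFrame, it collects the pieces in a list and stacks them once with pd.concat(); DataFrame.append() was removed in pandas 2.0.

```python
import glob
import os
import tempfile

import pandas as pd

# Write two small fake CSV files with the same columns into a temporary folder
folder = tempfile.mkdtemp()
pd.DataFrame({"speaker": ["Smith"], "year": [2019]}).to_csv(
    os.path.join(folder, "speeches_2019.csv"), index=False)
pd.DataFrame({"speaker": ["Jones"], "year": [2020]}).to_csv(
    os.path.join(folder, "speeches_2020.csv"), index=False)

# glob lists every file matching the pattern; sort for a predictable order
files = sorted(glob.glob(os.path.join(folder, "speeches_*.csv")))

# Read each file, then stack the rows with pd.concat
frames = [pd.read_csv(f) for f in files]
df = pd.concat(frames, ignore_index=True)

print(df)
```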
Merge
You can also merge (join) additional information by matching rows on a key column (e.g., each speaker's political party)
Merge
− m:m = many-to-many
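A sketch of a many-to-one merge with pd.merge on fake data. validate="many_to_one" raises an error if the right table has duplicate keys, and indicator=True adds a _merge column similar to Stata's.

```python
import pandas as pd

# Fake speech-level data: several speeches per speaker
speeches = pd.DataFrame({
    "speaker": ["Smith", "Smith", "Jones", "Lee"],
    "metaphor_score": [0.4, 0.1, 0.7, 0.5],
})
# Fake speaker-level data: one row per speaker
parties = pd.DataFrame({
    "speaker": ["Smith", "Jones"],
    "party": ["Democrat", "Republican"],
})

# m:1 merge, keeping all speeches (how="left"); Lee gets no party
merged = pd.merge(speeches, parties, on="speaker", how="left",
                  validate="many_to_one", indicator=True)
print(merged)
```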
Collapse
print(df_collapsed)
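In pandas, a Stata-style collapse is usually written with groupby() followed by an aggregation. A sketch on fake data, summing a metaphor count by party and gender:

```python
import pandas as pd

df = pd.DataFrame({
    "party": ["Democrat", "Democrat", "Republican", "Republican", "Republican"],
    "gender": ["Men", "Women", "Men", "Men", "Women"],
    "metaphor": [3, 5, 2, 4, 1],
})

# Stata: collapse (sum) metaphor, by(party gender)
df_collapsed = df.groupby(["party", "gender"], as_index=False)["metaphor"].sum()
print(df_collapsed)
```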
Reshape
Finally, we can reshape (rearrange rows and columns of) the dataset
df_collapsed["statistic"] = "metaphor frequency"
print(df_wide)
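A sketch of reshaping with pivot() (long to wide) and melt() (wide back to long), on a fake collapsed dataset with one row per party and gender:

```python
import pandas as pd

# Long format: one row per party x gender (fake numbers)
df_collapsed = pd.DataFrame({
    "party": ["Democrat", "Democrat", "Republican", "Republican"],
    "gender": ["Men", "Women", "Men", "Women"],
    "metaphor": [3, 5, 6, 1],
})
df_collapsed["statistic"] = "metaphor frequency"

# Long -> wide (Stata: reshape wide): one row per party, one column per gender
df_wide = df_collapsed.pivot(index="party", columns="gender", values="metaphor")
print(df_wide)

# Wide -> long again (Stata: reshape long)
df_long = df_wide.reset_index().melt(id_vars="party", value_name="metaphor")
print(df_long)
```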
Plotting data
The easiest way to plot (visualize) data is using the Matplotlib package
[Link](x_vals, y_vals)
plt.ylabel("y-axis")
plt.xlabel("x-axis")
plt.savefig("plot_example.png")  # save as png
plt.savefig("plot_example.pdf")  # save as pdf
[Link]()
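A self-contained version of the snippet above, assuming made-up data (square roots of 0 to 10) and the non-interactive Agg backend so it also runs without a display. On your own machine you would typically call plt.show() instead of plt.close().

```python
import os

import matplotlib
matplotlib.use("Agg")  # render to files without opening a window
import matplotlib.pyplot as plt

x_vals = list(range(11))             # 0, 1, ..., 10
y_vals = [x ** 0.5 for x in x_vals]  # example data: square roots

plt.plot(x_vals, y_vals)
plt.ylabel("y-axis")
plt.xlabel("x-axis")
plt.savefig("plot_example.png")  # save as png
plt.savefig("plot_example.pdf")  # save as pdf
print(os.path.abspath("plot_example.png"))  # where the file was saved
plt.close()  # free the figure; interactively, plt.show() displays it instead
```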
Plotting data
[Figure: example line plot, y-axis versus x-axis (x from 0 to 10)]
Léo Picard Introduction to Python programming 63/79
Course outline Set-up Python essentials Scientific computing Asking for help Wrapping-up
Plotting data
Function Description
[Link]() Plot y versus x as lines and/or markers
[Link]() Set the label for the y-axis
[Link]() Set the label for the x-axis
[Link]() Method to get or set some axis properties
[Link]() Set a title for the axes
[Link]() A scatter plot of y vs x
[Link]() Make a bar plot
[Link]() Create a new figure
[Link]() Add a centered title to the figure
[Link]() Add a subplot to the current figure
[Link]() Display the figure
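A sketch combining several functions from the table in one figure with two panels. It uses the object-oriented methods (fig.add_subplot, ax.scatter, ax.bar, ax.set_title, fig.suptitle) that mirror the pyplot functions; all plotted values are fake.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without opening a window
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 3))   # create a new figure

ax1 = fig.add_subplot(1, 2, 1)     # 1 row, 2 columns, first panel
ax1.scatter([1, 2, 3, 4], [2, 1, 4, 3])
ax1.set_title("Scatter")

ax2 = fig.add_subplot(1, 2, 2)     # second panel
ax2.bar(["Democrat", "Republican"], [5, 7])   # fake values
ax2.set_title("Bar")
ax2.set_xlabel("Party")

fig.suptitle("Two panels in one figure")   # centered title for the whole figure
fig.savefig("subplots_example.png")
plt.close(fig)
```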
Plotting data
[Figure: bar chart of metaphor frequency by party (Democrat, Republican), for men and women]
Example
[Figure: horizontal bar chart of metaphor frequency (0 to 25) by U.S. state]
Asking for help
The documentation: Every package comes with documentation for each function,
containing information on:
• What the function does
• The full list of arguments, what they are, their default value
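The documentation is also available from inside Python. A minimal sketch using the built-in help() and a function's docstring, with math.log as an example function:

```python
import math

# help() prints the documentation: what the function does and its arguments
help(math.log)

# The same text is stored in the function's docstring
print(math.log.__doc__)

# In the Spyder (IPython) console, typing math.log? also displays it
```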
Search engines: Another way to find answers (tutorials, videos, short courses)
• Lots of content, but little of it applies to your specific question
Friends and university staff: sharing your questions with someone also helps:
• Your interlocutors may learn from your questions too
• "I tried something but my code doesn’t give me the expected result"
− Be careful of copy and pasting things online, review your code
− If not, troubleshoot your code: follow what it does line by line and verify that it
gives you what you want using a simple model (e.g., fake data)
− If the problem lies inside a loop, try to solve it outside of the loop
→ General rule: break the problem down: identify its source, make it
run on its own, then add it back to your code
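A sketch of the troubleshooting advice above, using a hypothetical function share_positive() and fake data whose correct answer we know in advance. If the check fails, one loop iteration can be run by hand, outside the loop.

```python
def share_positive(values):
    """Hypothetical example: share of values strictly greater than zero."""
    count = 0
    for v in values:
        if v > 0:
            count += 1
    return count / len(values)

# Fake data where the answer is known: 2 positive values out of 4
fake = [1, -2, 3, 0]
result = share_positive(fake)
print(result)
# > 0.5
assert result == 0.5, "unexpected result on fake data"

# If it were wrong, run one iteration of the loop by hand and inspect it
v = fake[1]
print(v, v > 0)
# > -2 False
```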
• Before asking a question, check whether it has already been answered
• Only then, ask your question as clearly and concisely as possible
− Focus on what you don’t know, skip all the details that you know how to do
− End your post by writing what the outcome should look like
• [Link]
building-a-packed-and-building-the-structure
• [Link]
how-to-skip-2-data-index-array-on-numpy
• Response times are unpredictable: you could get an answer within minutes,
but it can also take hours, even days... Sadly, you may also wait for nothing :-(
• People won’t always be nice to you (and there is no need to say "hi" and "thanks" in your post)
• People might misunderstand your question, or tell you why you shouldn’t do it
this way
• People might give you a solution that works for the example you’ve laid out to
them, but not on your entire dataset (incomplete representation of the data,
issues of scale...)
Wrapping-up
Questions, remarks?
[Link]@[Link]