
Glossary


COURSE 1: FOUNDATIONS OF DATA SCIENCE

MODULE 1
INTRODUCTION TO DATA SCIENCE CONCEPTS

What is data science vs. data analytics


Data science is an entire field dedicated to making data more useful. A data scientist is a professional who
uses raw data to develop new ways to model data and understand the unknown.
Data analytics is a subfield of the larger data science discipline. The aim of data analytics is to create
methods to capture, process, and organize data to uncover actionable insights for current problems. Analysts
focus on processing the information stored in existing datasets and establishing the best way to present this
data. Data analysts rely on statistics and data modeling to solve problems and offer recommendations that
can lead to immediate improvements.

The connections between data science and data analytics


Data science and data analytics share a fundamental goal: discover insights that can be used to lead an
organization to improve and grow. They are closely connected with information gathered through
interactions within the measurable world.

The data career space over time


1965
1985

2005
Present
Explore your data toolbox
Glossary terms from module 1
Data professional: Any individual who works with data and/or has data skills
Data science: The discipline of making data useful
Data stewardship: The practices of an organization that ensure that data is accessible, usable, and safe
Edge computing: A way of distributing computational tasks over a bunch of nearby processors (i.e.,
computers) that is good for speed and resiliency and does not depend on a single source of computational
power
Jupyter Notebook: An open-source web application used to create and share documents that contain live
code, equations, visualizations, and narrative text
Machine learning: The use and development of algorithms and statistical models to teach computer systems
to analyze patterns in data
Metrics: Methods and criteria used to evaluate data
Python: A general-purpose programming language
MODULE 2
THE IMPACT OF DATA TODAY

Data-driven careers drive modern business


Data professionals are highly valuable: They identify crucial data streams for business projects and
challenges, set future goals, and enable organizations to take meaningful actions by re-imagining processes
and improving operations.

2 categories of data careers: Technical and Strategic.

Technical data professionals:


Expertise in mathematics, statistics, and computing (Technical data professionals are expert data analysts,
machine learning engineers, and statisticians).
Transform raw data into useful information
Build models and make predictions.
Explore datasets

Strategic data professionals:


Interpret information that affects an organization's operations, finance, research and development.
Work aligns with business strategy
Business intelligence professionals and technical project managers are strategic data professionals. In their
roles, strategic data professionals maximize information to guide how a business works.

Data professionals help industries such as finance, health care, manufacturing, and agriculture:

• Finance: Assess risks, monitor market trends, reduce fraud, and create a more stable financial system.
• Health care: Process clinical data and support early detection and more precise diagnosis.
• Manufacturing: Predict when to perform preventative maintenance, maximize quality assurance and
defect tracking, and respond to logistical issues.
• Agriculture: Develop new approaches and improve harvesting technologies.

Where data makes a difference for the future


Industry: App-driven business (sharing economy service)
Overview: Facilitates users acquiring, providing, or sharing access to goods and services, often through
online or app-based communities
How data is used:
• Maintaining functioning mobile applications
• Delivering customized content based on user history, including discounts
• Using machine learning models to send notifications at key times or even locations

Industry: Automotive
Overview: Includes industries associated with the production, wholesaling, retailing, and maintenance of
motor vehicles
How data is used:
• Gaining greater control over their supply chains
• Improving production line performance, and designing new and more efficient vehicles
• Enhancing vehicle safety and new features

Industry: Cybersecurity
Overview: Protects networks, devices, and data from unauthorized access or criminal use, and the practice
of maintaining confidentiality, integrity, and availability of information
How data is used:
• Locating weak points within networks and systems using predictive analytics
• Defending against security attacks
• Detecting data breaches through logic, models, and data tools
• Improving the ability to identify attacks and respond to them with Artificial Intelligence (AI)

Industry: Digital marketing
Overview: Assists in advertising and promotional efforts of companies using the internet and online
technologies
How data is used:
• Translating customer interaction into actionable business data
• Predicting user behaviors to personalize content and offers
• Identifying patterns and trends that guide innovations
• Determining the return on investment (ROI) of marketing efforts

Industry: Energy
Overview: Includes companies that explore, produce, refine, market, store, and transport both renewable
and non-renewable energy resources
How data is used:
• Analyzing real-time data from power systems and monitoring devices
• Optimizing technologies, monitoring power grids, and predicting failures
• Preventing accidents and malfunctions

Industry: Gaming
Overview: Hosts an estimated 2.7 billion gamers worldwide, facilitating the interaction of players across
the globe
How data is used:
• Designing world-building and character creation systems
• Monitoring character engagement and how the environment reacts to player input
• Optimizing game-play by identifying potential new features or upgrades
• Regulating in-game purchases and fraud detection systems
• Personalizing marketing campaigns

Industry: Streaming media and entertainment
Overview: Provides access to live and recorded content on-demand, delivered via the internet to computers,
smart devices, and mobile devices
How data is used:
• Analyzing and monitoring user interactions to better understand customer sentiment
• Matching users with advertisers with real-time analytics
• Guiding future content decisions
• Personalizing marketing campaigns

Industry: Telecommunications
Overview: Primarily involves operating and providing access to facilities for the transmission of voice,
data, text, sound, and video
How data is used:
• Assisting the deployment, optimization, and predictive maintenance of telecommunications networks
• Optimizing pricing models
• Targeting advertisement and incentive campaigns, as well as detecting fraudulent activity
• Analyzing customer data to customize subscriber plans

Industry: Travel and tourism
Overview: Encompasses a variety of services from transportation, accommodations, attractions, booking,
and much more
How data is used:
• Marketing to individuals based on their previous travel or searched destinations
• Directing machine learning systems that can adjust a traveler’s itinerary based on set factors including
weather and availability
• Generating recommendations based on personal preferences and location-based discounts
• Managing reservations and processing transactions

Leverage data analysis in nonprofits


Nonprofit groups are created to further a social cause or provide benefit to the public.
Main purpose: Foster a collective, public or social advantage.

Open data is available to the public for free and includes guidance for navigating the datasets and
acknowledging the source.

A hackathon is an event where data professionals and programmers come together and collaborate on a
particular project.

Examples:

• A US charity uses census data to identify neighborhoods with the most school-age children in need of
bicycles.
• DataKind analyzes the cost of environmental cleanup in underserved communities.
• Hackathons develop tools for predicting extreme weather events and improving reading skills for
elementary school students.

Key points:

• Non-profits use data to guide decision-making, answer questions, and solve problems.
• Data can help non-profits identify areas of need and allocate resources effectively.
• Non-profits can access open data from public entities and government agencies.
• Data volunteers can contribute to projects that benefit communities around the world.
• Hackathons provide opportunities to collaborate on data-driven solutions for social good.

Explore: The data career neighborhood


Transportation
Ways data is collected:
• Traffic sensors and cameras
• GPS and mapping apps
• Ride sharing apps
How data professionals use the data:
• Create predictive analytics models
• Analyze the best way to get from one point to the next
• Determine the impact of development projects

Healthcare
Ways data is collected:
• Electronic health records
• Personal health tracking devices
• Clinical trial data
How data professionals use the data:
• Analyze medical imaging
• Predict genetic factors
• Create machine learning experiments to speed development of treatments

Finance
Ways data is collected:
• Real-time analytics of spending activity
• Apps for banking and investing
• Electronic payments
How data professionals use the data:
• Identify fraudulent behavior
• Facilitate payments
• Provide risk management solutions

Retail
Ways data is collected:
• Sales data (online or in store)
• Retailer apps
• Customer loyalty programs
How data professionals use the data:
• Recommend products
• Optimize inventory and pricing
• Market to individual customers

Restaurants
Ways data is collected:
• Refrigerator temperature monitoring
• Reservation apps for customers
• Marketing/promotion responses
How data professionals use the data:
• Monitor ingredients during storage
• Manage inventory and supply chain
• Anticipate staffing needs
• Gain feedback from customers

Utilities
Ways data is collected:
• Sensors in pipeline/equipment deliver real-time data
• Drone data
• Interactive meters
How data professionals use the data:
• Access usage data by both utility and customer
• Improve detection of risks within physical systems
• Reduce service interruptions

The top skills needed for a data career


Data professionals combine a knowledge about how to do practical tasks with an awareness of what makes
communication and collaboration successful.
Interpersonal skills: Traits that focus on communicating and building relationships.
Active listening: Allowing team members, bosses, and other collaborative stakeholders to share their own
points of view before offering responses.
Data cleaning: The process of formatting data and removing unwanted material.

Important ethical considerations for data professionals


Personally Identifiable Information (PII): Information that permits the identity of an individual to be inferred
by either direct or indirect means.
Examples: Biometric records, usernames, and Social Security or national identification numbers.
Aggregate information: Data from a significant number of users that has eliminated personal information.

A key thing to keep in mind is that data gathering is a task managed by humans, and that process can be
informed by different backgrounds, experiences, beliefs, and worldviews. These and other types of biases
can affect the way that data is communicated and how the results are shared, which in turn can have an
impact on business decisions.

Sample: A segment of a population that is representative of that entire population.

Critical data security and privacy principles


Data privacy means preserving a data subject’s information and activity any time a data transaction occurs.
This is also called information privacy or data protection.
Data privacy is concerned with the access, use, and collection of personal data. For the people whose data is
being collected, this means they have the right to:

• Be protected from unauthorized access to their private data
• Be free from inappropriate use of their data
• Inspect, update, or correct their data
• Give consent to data collection
• Legally access the data

Data anonymization is the process of protecting people's private or sensitive data by eliminating PII.
Data anonymization involves blanking, hashing, or masking personal information, often by using fixed-
length codes to represent data columns, or hiding data with altered values.
Data anonymization is used in just about every industry.
This data might include:

• Telephone numbers
• Names
• License plates and license numbers
• Social security numbers
• IP addresses
• Medical records
• Email addresses
• Photographs
• Account numbers
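
The three techniques named above (blanking, hashing, and masking) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production approach: the record, its field names, and the helper functions blank, hash_value, and mask are all hypothetical, and a plain SHA-256 digest is used only to show the idea of a fixed-length code.

```python
import hashlib


def blank(value):
    """Blanking: drop the value entirely."""
    return ""


def hash_value(value):
    """Hashing: replace the value with a fixed-length code (SHA-256 hex digest)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask(value, visible=4):
    """Masking: hide all but the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]


# Hypothetical record, invented for illustration only.
record = {
    "name": "Ada Example",
    "email": "ada@example.com",
    "account_number": "1234567890",
}

anonymized = {
    "name": blank(record["name"]),
    "email": hash_value(record["email"]),
    "account_number": mask(record["account_number"]),
}

print(anonymized["account_number"])  # ******7890
print(len(anonymized["email"]))      # 64
```

A plain hash of a short or guessable value can often be reversed by trying likely inputs, so real projects usually combine these techniques with other safeguards.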
The practices and principles of good data stewardship
Data stewardship:
Ensuring the quality, integrity, accessibility, and security of data.
Making data stewardship a normal part of your work habits benefits everyone who relies on your analysis.
Best practices for data stewardship include respecting privacy, being cautious of unintentional harm,
avoiding creating or reinforcing bias, considering inclusivity, and upholding high standards of scientific
excellence.

Data ethics:
Addressing ethical concerns such as bias, user data protection, and personally identifiable information (PII).
Seeking guidance and support from online communities of data professionals can be helpful when facing
ethical dilemmas.
The reading provides examples of ethical scenarios and conversations to illustrate these concepts.

The data professional career space


Data tasks and responsibilities depend on an organization's data, its team structure, and how it makes
use of insights and analytics.
Some organizations define responsibilities very specifically; others leave job tasks quite broad in
scope.
This program refers to the field as a career space.

The most common titles: Data Analyst and Data Scientist.

Data professional responsibilities:

• Look for patterns and trends within big datasets
• Uncover the stories inside data
• Help guide decision making
• Translate key information into visualizations

Roles that use data and analytical skills

• Data Engineer
• Insights or Analytics Team Manager
• Business Intelligence Engineer or Analyst

Data Engineer responsibilities

• Make data accessible
• Ensure the data ecosystem produces reliable results
• Deal with infrastructure for data across the enterprise
• Design, build, and maintain an organization's data infrastructure, and develop, construct, test, and
maintain architectures and large-scale processing systems

Insights or Analytics Team Manager responsibilities

• Supervise the analytical strategy of the organization
• Manage multiple groups

Business Intelligence Engineer or Analyst responsibilities

• Fill a highly strategic role focused on organizing information and making it accessible

Module 2 challenge
Question: A good sample is a segment of a population that is representative of what?
Answer: The entire population

Question: At a business, who is responsible for ensuring socially beneficial and inclusive practices, applying
scientific and ethical principles, and staying aware of possible bias?
Answer: All data professionals

Question: A data professional considers ways that their personal beliefs may inadvertently affect their work.
They establish processes to ensure they collect and communicate sensitive information impartially. What
does this scenario describe?
Answer: Avoiding subtle biases in data work

Glossary terms from module 2


Aggregate information: Data from a significant number of users that has eliminated personal information
Artificial intelligence (AI): Refers to computer systems able to perform tasks that normally require human
intelligence
Data anonymization: The process of protecting people's private or sensitive data by eliminating PII
Data stewardship: The practices of an organization that ensure that data is accessible, usable, and safe
Edge computing: A way of distributing computational tasks over a bunch of nearby processors (i.e.,
computers) that is good for speed and resiliency and does not depend on a single source of computational
power
Hackathon: An event where programmers and data professionals come together and work on a project
Nonprofit: A group organized for purposes other than generating profit; often aims to further a social cause
or provide a benefit to the public
Open data: Data that is available to the public and free to use, with guidance on how to navigate the
datasets and acknowledge the source
Personally identifiable information (PII): Information that permits the identity of an individual to be
inferred by either direct or indirect means
Sample: A segment of a population, often used to infer parameters of the whole population

Terms and definitions from the previous module


Data professional: Any individual who works with data and/or has data skills
Data science: The discipline of making data useful
Data stewardship: The practices of an organization that ensure that data is accessible, usable, and safe
Edge computing: A way of distributing computational tasks over a bunch of nearby processors (i.e.,
computers) that is good for speed and resiliency and does not depend on a single source of computational
power
Jupyter Notebook: An open-source web application used to create and share documents that contain live
code, equations, visualizations, and narrative text
Machine learning: The use and development of algorithms and statistical models to teach computer
systems to analyze patterns in data
Metrics: Methods and criteria used to evaluate data
Python: A general-purpose programming language
Tableau: A business intelligence and analytics platform that helps people visualize, understand, and make
decisions with data
MODULE 3
YOUR CAREER AS A DATA PROFESSIONAL

Terms and definitions from Course 1, Module 3


Active listening: Refers to allowing team members, leadership, and other collaborative stakeholders to
share their own points of view before offering responses
Analytics Team Manager: A data professional who supervises analytical strategy for an organization, often
managing multiple groups
Business Intelligence Analyst: (Refer to Business Intelligence Engineer)
Business Intelligence Engineer: A data professional who uses their knowledge of business trends and
databases to organize information and make it accessible; also referred to as a Business Intelligence Analyst
Chief Data Officer: An executive-level data professional who is responsible for the consistency, accuracy,
relevancy, interpretability, and reliability of the data a team provides
Data cleaning: The process of formatting data and removing unwanted material
Data Engineer: A data professional who makes data accessible, ensures data ecosystems offer reliable
results, and manages infrastructure for data across enterprises
Data Scientist: A data professional who works closely with analytics to provide meaningful insights that
help improve current business operations
Interpersonal skills: Traits that focus on communicating and building relationships
Mentor: Someone who shares knowledge, skills, and experience to help another grow both professionally
and personally
RACI chart: A visual that helps to define roles and responsibilities for individuals or teams to ensure work
gets done efficiently; lists who is responsible, accountable, consulted, and informed for project tasks

Terms and definitions from previous modules


Aggregate information: Data from a significant number of users that has eliminated personal information
Artificial intelligence (AI): Refers to computer systems able to perform tasks that normally require human
intelligence
Data anonymization: The process of protecting people's private or sensitive data by eliminating PII
Data professional: Any individual who works with data and/or has data skills
Data science: The discipline of making data useful
Data stewardship: The practices of an organization that ensure that data is accessible, usable, and safe
Edge computing: A way of distributing computational tasks over a bunch of nearby processors (i.e.,
computers) that is good for speed and resiliency and does not depend on a single source of computational
power
Hackathon: An event where programmers and data professionals come together and work on a project
Jupyter Notebook: An open-source web application used to create and share documents that contain live
code, equations, visualizations, and narrative text
Machine learning: The use and development of algorithms and statistical models to teach computer
systems to analyze patterns in data
Metrics: Methods and criteria used to evaluate data
Nonprofit: A group organized for purposes other than generating profit; often aims to further a social cause
or provide a benefit to the public
Open data: Data that is available to the public and free to use, with guidance on how to navigate the
datasets and acknowledge the source
Personally identifiable information (PII): Information that permits the identity of an individual to be
inferred by either direct or indirect means
Python: A general-purpose programming language
Sample: A segment of a population that is representative of the entire population
Tableau: A business intelligence and analytics platform that helps people visualize, understand, and make
decisions with data
MODULE 4
DATA APPLICATIONS AND WORKFLOW
COURSE 2: GET STARTED WITH PYTHON

MODULE 1
HELLO, PYTHON

Introduction to Python
Programming languages: The words and symbols that we use to write instructions for computers to follow.
What is Python? Python is a powerful and versatile coding language that has become popular among data
professionals, scientists, and web developers.
Why is Python so popular? It's easy to learn, versatile, and powerful. It also has a large and active
community of users who are willing to help and provide support.
You can use Python to analyze data, build websites, automate tasks, create games, and more.
In Python, a library is a reusable collection of code.

Discover more about Python


Python is a high-level programming language: This means it uses human-friendly syntax and resembles
spoken language, making it easier to learn and understand.
Python fundamentals: The lecture introduces basic concepts like printing to the console, performing
computations, assigning variables, and evaluating statements. In fact, the print function will output whatever
we enter in its parentheses.
Operators: Python uses various operators like + for addition, / for division, and ** for exponents.
Conditional statements: Python allows you to check conditions using if statements and perform actions
based on the outcome.
Loops: Python provides looping constructs like for loops to iterate over elements in a sequence and perform
actions on each element.
Functions: Functions are reusable blocks of code that can be called with arguments to perform specific tasks.
Libraries: Python has a rich library of built-in functions for common tasks, like sorting a list using the
sorted() function.
Python's potential: The lecture emphasizes the power and versatility of Python, highlighting its potential to
create complex algorithms and programs.
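
The fundamentals listed above can be seen together in one short script. This is a minimal sketch; the variable names and the double function are made up for illustration and do not come from the lecture.

```python
# Printing to the console: print() outputs whatever is inside its parentheses.
print("Hello, Python")

# Performing computations with operators.
total = 7 + 3        # addition -> 10
ratio = 7 / 2        # division always returns a float -> 3.5
power = 2 ** 5       # exponentiation -> 32

# Assigning a variable from an evaluated statement.
is_large = total > 5  # True

# Conditional statement: act only when the condition holds.
if is_large:
    print("total is greater than 5")

# for loop: visit each element of a sequence in turn.
squares = []
for number in [1, 2, 3]:
    squares.append(number ** 2)


# A reusable function that takes an argument.
def double(value):
    return value * 2


# Built-in function: sorted() returns a new sorted list.
ordered = sorted([3, 1, 2])

print(total, ratio, power, squares, double(4), ordered)
```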

Python versus other programming languages


Five considerations of programming languages:
Speed
There are many factors that contribute to the speed of a program’s execution, including compile time,
runtime, hardware, installed dependencies, and the efficiency of the code itself. In general, low-level
programming languages are faster, but they’re more difficult to learn and work with.

Approachability
Approachability refers to how easy it is for new learners to start using a language. Learning new
programming languages can be challenging depending on their syntax and overall structure. The syntax is
the structure of code words, symbols, placement, and punctuation. Semantics builds meaning into those
structures by using variables and objects. Additionally, those variables help add flexibility to the programs
and objects where data is housed.

Variables
Information in code is stored in variables. A variable is a named container which stores values in a reserved
location in the computer’s memory. The way a programming language uses variables will have an effect on
a system's core operations or kernel speed. Some languages use static variables to maintain a value
throughout the entire run of a program. Others approach variables as dynamic, allowing values to be
determined when a program is run. Some languages even allow declarative variables, which enable a
program to determine where a variable should be placed.

Data science focus


Programming languages have individual characteristics and can better serve different tasks in data analysis;
this means programmers often use them for specific data science tasks.

Programming paradigm
Programming languages can be object-oriented, functional, or imperative. Object-oriented programming
languages are modeled around data objects. Functional programming languages are modeled around
functions. Imperative languages are modeled around code statements that can alter the state of the program
itself.

Programming language comparisons:


Features by software: Python, R, Java, C++

Speed
Python: Slower
R: Depends on configuration and add-ons
Java: Faster
C++: Very fast

Approachability
Python: Easy to learn
R: Complex
Java: Easy to learn
C++: Complex

Variable
Python: Dynamic
R: Dynamic
Java: Static
C++: Declarative

Data science focus
Python: Machine learning and automated analysis
R: Exploratory data analysis and building extensive statistical libraries
Java: Used across projects with open-source assets
C++: Not as widely used but very powerful implementations

Programming paradigm
Python: Object-oriented
R: Functional language
Java: Object-oriented
C++: Multi-paradigm (imperative & object-oriented)

Jupyter Notebooks
Jupyter Notebook is a popular platform for data professionals.
Jupyter Notebook is an open-source web application for creating and sharing documents containing live
code, mathematical formulas, visualizations, and text.
Jupyter Notebook lets you collaborate on data projects and integrate code. Plus, it puts all of your output in
one document.

Object-oriented programming
What is object-oriented programming (OOP)?
A programming system based around objects, which contain both data and code that manipulates that data.
An object is an instance of a class. Think of it like a fundamental building block of Python.
Makes code more organized, accessible, and reusable.

Examples of OOP in Python:


Lists, functions, strings are all objects.
Dataframes are custom-built classes with attributes and methods.

The most important concepts in object-oriented programming:


A class: An object's data type that bundles data and functionality together.
Objects: Instances of classes, containing data and methods.
A method: A function that belongs to a class and typically performs an action or operation. Methods are
called with parentheses.
Dot notation: How to access the methods and attributes that belong to an instance of a class.
Attributes: Values associated with an object or class that are referenced by name using dot notation.
They don't use parentheses. Attributes are especially important for custom-built classes and more complex
data structures, like dataframes.
The core classes of Python are: Integers, Floats, Strings, Booleans, Lists, Dictionaries, Tuples, Sets,
Frozensets, Functions, Ranges, and None.
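
These ideas can be tried on Python's built-in objects without defining any class. The following is a small sketch; the variable names are arbitrary choices for illustration.

```python
greeting = "hello"          # a string object
numbers = [3, 1, 2]         # a list object
value = 3 + 4j              # a complex number object

# type() reports the class an object is an instance of.
print(type(greeting))       # <class 'str'>
print(type(numbers))        # <class 'list'>

# Methods are called with dot notation and parentheses.
shouted = greeting.upper()  # 'HELLO'
numbers.sort()              # sorts the list in place

# Attributes are read with dot notation and no parentheses.
real_part = value.real      # 3.0
imag_part = value.imag      # 4.0

# Functions are objects too, so they have attributes such as __name__.
print(len.__name__)         # len
```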

Benefits of OOP:
Modular code: Easier to understand and maintain.
Reusability: Code can be reused in different parts of a program.
Extensibility: New functionality can be added easily.

More about object-oriented programming


For example, if the class were Spaceship, then attributes might be:
name
kind
speed
tractor_beam

These attributes could be accessed by typing:


Spaceship.name
Spaceship.kind
Spaceship.speed
Spaceship.tractor_beam

Notice that these characteristics are accessed using only a dot.

On the other hand, methods of the Spaceship class might be:


warp()
tractor()

These methods could be used by typing:


Spaceship.warp()
Spaceship.tractor()
Notice that methods are followed by parentheses, and it’s possible for them to take arguments. For example,
Spaceship.warp(7) could change the speed of the ship to warp seven.

Defining classes with unique attributes and methods:
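
One way a Spaceship class with these attributes and methods might be defined is sketched below. Only the attribute and method names come from the text; the constructor parameters, default values, the behavior of warp and tractor, and the ship's name are assumptions made for illustration.

```python
class Spaceship:
    """Hypothetical class used only to illustrate attributes and methods."""

    def __init__(self, name, kind, speed=0, tractor_beam=False):
        # Attributes: data stored on each instance.
        self.name = name
        self.kind = kind
        self.speed = speed
        self.tractor_beam = tractor_beam

    def warp(self, warp_factor):
        """Method: set the ship's speed to the given warp factor."""
        self.speed = warp_factor

    def tractor(self):
        """Method: toggle the tractor beam on or off."""
        self.tractor_beam = not self.tractor_beam


# Create an instance (an object) of the class, then use dot notation on it.
ship = Spaceship(name="Mockingbird", kind="freighter")
ship.warp(7)
ship.tractor()
print(ship.name, ship.speed, ship.tractor_beam)  # Mockingbird 7 True
```

In this sketch the attributes and methods are accessed on ship, an instance of the class, which is the usual pattern when each object carries its own data.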


Variables and data types
Variables give meaning to code. Think about nouns in language; nouns are used to identify people, places,
or things in a sentence. Variables in Python are like nouns. Variables point to values. They're not the values
themselves.
Example: X = 3, X is the variable, and its stored value is 3.

A data type: An attribute that describes a piece of data based on its values, its programming language, or the
operations it can perform.

When assigning a new variable, it's helpful to answer these questions before you code:
What's the variable's name?
What's the variable's type?
What's the variable's starting value?

An assignment: The process of storing a value inside a variable.

An expression: A combination of numbers, symbols, or other variables that produce a result when evaluated.

Dynamic typing: Variables can point to objects of any data type.
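
Assignment, expressions, and dynamic typing can be seen in a few lines. This is a minimal sketch with arbitrary variable names.

```python
# Assignment: store the value 3 in the variable x.
x = 3

# Expression: values, operators, and variables that evaluate to a result.
result = x * 2 + 1              # 7

# Dynamic typing: the same name can later point to an object of another type.
first_type = type(x).__name__   # 'int'
x = "three"
second_type = type(x).__name__  # 'str'

print(result, first_type, second_type)  # 7 int str
```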


Create precise variable names
Naming conventions: Consistent guidelines that describe the content, creation date, and version of a file in
its name.

Naming restrictions: Rules built into the syntax of the language itself that must be followed.

A keyword: A special word that is reserved for a specific purpose and that can only be used for that purpose.

Variable naming conventions:


Don’t use reserved keywords such as or, in, if, and else.
Don’t reuse the names of built-in functions such as print and str; Python allows it, but the new variable
hides the built-in.

Naming restrictions for variables:


Can only include letters, numbers, and underscores.
Must start with a letter or an underscore.
Are case-sensitive.
Can’t include parentheses.

Example variable names:
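
The naming restrictions can also be checked programmatically. The sketch below uses the standard library's keyword module and the str.isidentifier() method; the candidate names and the is_valid_name helper are hypothetical and chosen only for illustration.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration.
candidates = [
    "student_name",  # valid
    "_total",        # valid: may start with an underscore
    "score2",        # valid: numbers allowed after the first character
    "2nd_score",     # invalid: starts with a number
    "first name",    # invalid: contains a space
    "total(1)",      # invalid: contains parentheses
    "if",            # invalid: reserved keyword
]


def is_valid_name(name):
    """True if `name` follows Python's syntax rules and is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in candidates:
    print(f"{name!r}: {is_valid_name(name)}")

# Names are case-sensitive: these are two different variables.
score = 10
Score = 20
```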

Explore Python syntax


The Language of Python
Variables: Represent data stored as strings, tuples, dictionaries, lists, and objects (note: future readings
explain these categories). Example: student_name
Keywords: Special words that are reserved for specific purposes and that can only be used for those
purposes.
Examples:
in
not
or
for
while
return

Operators: Symbols that perform operations on objects and values


Examples:
+ Addition
- Subtraction
* Multiplication
/ Division
** Exponentiation
% Modulo (returns the remainder after a division). Example: 10 % 3 = 1
// Floor division (divides the first operand by the second operand and rounds the result down to the nearest
integer). Example: 5 // 2 = 2
> Greater than (returns a Boolean of whether the left operand is greater than the right operand)
< Less than (returns a Boolean of whether the left operand is less than the right operand)
== Equality (returns a Boolean of whether the left operand is equal to the right operand)

Expressions: A combination of numbers, symbols, and variables to compute and return a result upon
evaluation. Example: [1, 2, 3] + [2, 4, 6]
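
The operators and the list expression above behave as follows when run. This is a short sketch with arbitrary variable names; note that + on two lists joins them rather than adding element by element.

```python
remainder = 10 % 3        # modulo -> 1
floored = 5 // 2          # floor division -> 2
negative_floor = -5 // 2  # rounds down, not toward zero -> -3
quotient = 5 / 2          # division -> 2.5
cubed = 2 ** 3            # exponentiation -> 8

is_greater = 5 > 2        # True
is_less = 5 < 2           # False
is_equal = 5 == 5.0       # True: compares values, not types

# With lists, + concatenates the two operands.
combined = [1, 2, 3] + [2, 4, 6]   # [1, 2, 3, 2, 4, 6]

print(remainder, floored, negative_floor, quotient, cubed)
print(is_greater, is_less, is_equal, combined)
```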

Functions: A group of related statements to perform a task and return a value


Example:
Conditional statements: Sections of code that direct program execution based on specified conditions.
Example:
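
As an illustration of both definitions, here is a minimal sketch of a function and a conditional statement. The function names to_celsius and describe, and the temperature thresholds, are hypothetical and chosen only for this example.

```python
def to_celsius(fahrenheit):
    """Function: convert a temperature from Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9


def describe(temperature_c):
    """Conditional statement: choose a label based on the temperature."""
    if temperature_c < 0:
        return "freezing"
    elif temperature_c < 25:
        return "mild"
    else:
        return "hot"


reading = to_celsius(212)   # 100.0
label = describe(reading)   # 'hot'
print(reading, label)
```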

Naming rules and conventions:


When assigning names to objects, programmers adhere to a set of rules and conventions which help to
standardize code and make it more accessible to everyone. Here are some naming rules and conventions that
you should know:

- Names cannot contain spaces.
- Names may be a mixture of upper and lower case characters.
- Names can’t start with a number but may contain numbers after the first character.
- Variable names and function names should be written in snake_case, which means that all letters are
lowercase and words are separated using an underscore.
- Descriptive names are better than cryptic abbreviations because they help other programmers (and
you) read and interpret your code. For example, student_name is better than sn. It may feel excessive
when you write it, but when you return to your code you’ll find it much easier to understand.

Tim Peters, a Python programmer, wrote this now-famous “poem” of guiding principles for coding in
Python:
The Zen of Python
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one—and preferably only one—obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Data types and conversions


Data types are the different categories of data that Python can work with.

Strings: Sequences of characters and punctuation that contain textual information. They are immutable,
meaning they cannot be changed after they are created.

Immutable data type: A data type in which the values can never be altered or updated.

Integer: A data type used to represent whole numbers without fractions.


Float data types: A data type that represents numbers that contain decimals.

Type function: You can use the type() function to determine the data type of a variable.

Example:
Implicit conversion: Python automatically converts one data type to another without user involvement.
Explicit conversion: Users convert the data type of an object to a required data type. You can use the int(),
float(), and str() functions to explicitly convert data types.
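A short sketch of type(), implicit conversion, and explicit conversion. The variable names and values are assumptions chosen for illustration.

```python
count = 7        # integer
price = 2.5      # float
label = "42"     # string

print(type(count))   # <class 'int'>
print(type(price))   # <class 'float'>

# Implicit conversion: adding an int and a float yields a float
total = count + price
print(type(total))   # <class 'float'>

# Explicit conversion with int(), float(), and str()
label_as_int = int(label)       # 42
count_as_float = float(count)   # 7.0
count_as_str = str(count)       # '7'
```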
Debugging: Debugging is the process of finding and fixing errors in your code.
Data community: There are many online resources and communities where you can find help with
debugging and other coding challenges.

Terms and definitions from Course 2, Module 1


Argument: Information given to a function in its parentheses
Assignment: The process of storing a value in a variable
Attribute: A value associated with an object or class which is referenced by name using dot notation
Cells: The modular code input and output fields into which Jupyter Notebooks are partitioned
Class: An object’s data type that bundles data and functionality together
Computer programming: The process of giving instructions to a computer to perform an action or set of
actions
Data type: An attribute that describes a piece of data based on its values, its programming language, or the
operations it can perform
Dot notation: How to access the methods and attributes that belong to an instance of a class
Dynamic typing: Variables that can point to objects of any data type
Explicit conversion: The process of converting a data type of an object to a required data type
Expression: A combination of numbers, symbols, or other variables that produce a result when evaluated
Float: A data type that represents numbers that contain decimals
Immutable data type: A data type in which the values can never be altered or updated
Implicit conversion: The process Python uses to automatically convert one data type to another without
user involvement
Integer: A data type used to represent whole numbers without fractions
Jupyter Notebook: An open-source web application for creating and sharing documents containing live
code, mathematical formulas, visualizations, and text
Keyword: A special word in a programming language that is reserved for a specific purpose and that can
only be used for that purpose
Markdown: A markup language that lets the user write formatted text in a coding environment or plain-text
editor
Method: A function that belongs to a class and typically performs an action or operation
Naming conventions: Consistent guidelines that describe the content, creation date, and version of a file in
its name
Naming restrictions: Rules built into the syntax of a programming language
Object: An instance of a class; a fundamental building block of Python
Object-oriented programming: A programming system that is based around objects which can contain
both data and code that manipulates that data
Programming languages: The words and symbols used to write instructions for computers to follow
String: A sequence of characters and punctuation that contains textual information
Syntax: The structure of code words, symbols, placement, and punctuation
Typecasting: Converting data from one type to another (see explicit conversion)
Variable: A named container which stores values in a reserved location in the computer’s memory
MODULE 2
FUNCTIONS AND CONDITIONAL STATEMENTS

Define functions and return values


A function is a body of reusable code for performing specific processes or tasks.

def: A keyword that defines a function at the start of the function block.

Ex:

Return: A reserved keyword in Python that makes a function do work to produce new results, which are
saved for later use.

Ex:

Reusability: Defining code once and using it many times without having to rewrite it.

Ex:
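A minimal sketch of def, return, and reusability together. The function name and the room sizes are hypothetical.

```python
def rectangle_area(width, height):
    """Calculate the area of a rectangle."""
    return width * height

# The function is defined once and reused with different inputs.
# Each returned value is saved in a variable for later use.
kitchen = rectangle_area(3, 4)
bedroom = rectangle_area(4, 5)
total_area = kitchen + bedroom
print(total_area)  # 32
```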

Write clean code


Modularity: The ability to write code in separate components that work together and that can be reused for
other programs.
Python’s reusability feature enables data professionals to define code once, then use it many times without
having to rewrite it.

Use comments to scaffold your code


Algorithm: A set of instructions for solving a problem, or accomplishing a task.

Example: The function below calculates how much grass seed is needed for a grass border around a square
fountain. The total area is the square of the full side length, (fountain_side + 2 * grass_width)**2.
Subtracting the fountain area from the total area leaves the area of the grass border, which is then
multiplied by the seeding rate.
def seed_calculator(fountain_side, grass_width):
    """
    Calculate number of kilograms of grass seed needed for
    a border around a square fountain.

    Parameters:
        fountain_side (num): length of 1 side of fountain in meters
        grass_width (num): width of grass border in meters

    Returns:
        seed (float): amount of seed (kg) needed for grass border
    """

    # Area of fountain
    fountain_area = fountain_side**2

    # Total area (fountain plus grass border)
    total_area = (fountain_side + 2 * grass_width)**2

    # Area of grass border
    grass_border_area = total_area - fountain_area

    # Amount of seed needed (35 g/sq.m), expressed in kg/sq.m
    seed_needed_per_area = 0.035
    seed = grass_border_area * seed_needed_per_area

    return seed

Docstring: A string at the beginning of a function's body, that summarizes the function's behavior, and
explains its arguments and return values.

Lines of code that begin with a hash symbol (#) serve as comments and don’t get executed.

Data professionals use comments for their Python code to provide helpful explanations, outline the steps of a
process, and document their work for teammates.

Reference guide: Functions


As you’ve been learning, functions are bodies of reusable code for performing specific processes or
tasks. They help you do more work with less code. Function examples include:
- A specific calculation or measurement, such as converting Fahrenheit to Celsius
- An inventory utility to iterate quantities and calculate the total cost of goods in stock
- Building a DataFrame from a series or dictionary data
- An application utility such as a spell checker
In this reading, you will learn how to define, build, and call functions.

Function syntax
Define functions using the following syntax and format:
1. Begin with the def keyword followed by the function’s name, then put its parameters/arguments in
parentheses, ending with a colon.
a) Python convention is to use snake_case (lowercase words separated by underscores) for function
names.
2. For important functions or functions whose purposes or operations are not very obvious, include a
docstring. Write the docstring between three opening and closing quotation marks.
a) The docstring should be in the form of a command (e.g., “Add two numbers” as opposed to “Adds
two numbers”).
b) The docstring should summarize the function’s behavior and explain its arguments and return
values.
c) The docstring should be indented four spaces from the definition statement.
3. Write the body of the function.
a) All code should be indented at least four spaces from the definition statement, but there can be
many levels of indentation depending on the complexity of the code.
4. Finally, use a return statement to return a value or a print statement to print something to the
console and complete the function. This line should also be indented four spaces.
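The steps above can be sketched with the Fahrenheit-to-Celsius conversion mentioned earlier. The function and parameter names are assumptions chosen for illustration.

```python
def fahrenheit_to_celsius(temp_f):
    """
    Convert a temperature from Fahrenheit to Celsius.

    Parameters:
        temp_f (num): temperature in degrees Fahrenheit

    Returns:
        temp_c (float): temperature in degrees Celsius
    """
    # Body of the function, indented four spaces
    temp_c = (temp_f - 32) * 5 / 9
    return temp_c

# Call the function by name and pass an argument in parentheses
boiling_c = fahrenheit_to_celsius(212)
print(boiling_c)  # 100.0
```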

return vs. print


Sometimes the difference between return statements and print statements isn’t clear to new learners
of Python. It’s important to understand what each action is and when to use it. Return statements
give you a result that you can use for something else. The result doesn’t have to be something that prints
when the function is run. Print statements print something to the console and nothing more. Think of
it like this: a return statement is like your brother going to the market and bringing you back a bag of
potatoes. A print statement is like your brother going to the market, coming home, and telling you
what kind of potatoes were for sale. With the return statement, you have some potatoes to cook.
With the print statement, you just know what potatoes are available, but you don’t have any
potatoes.
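A small sketch of the difference, using two hypothetical functions that do the same addition. One returns the result and the other only prints it.

```python
def return_sum(a, b):
    # Hands the result back to the caller
    return a + b

def print_sum(a, b):
    # Only displays the result; the function itself returns None
    print(a + b)

returned = return_sum(2, 3)
printed = print_sum(2, 3)   # prints 5

print(returned * 10)  # 50, the returned value can be reused
print(printed)        # None, there is nothing to reuse
```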

Functions vs. methods


Functions and methods are very similar, but there are a few key differences. Methods are a specific
type of function. They are functions that belong to a class. This means that you can use them—or
“call” them—by using dot notation.
Method example:
The split method is a function that belongs to the string class. By default, it splits a string on whitespace.
Standalone functions do not belong to a particular class and can often be used on multiple classes.
Function example:
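A brief sketch contrasting the two. The sample sentence is made up; split() and len() are built into Python.

```python
sentence = "Data professionals use Python"

# Method: split() belongs to the str class, so it is called with dot notation
words = sentence.split()
print(words)  # ['Data', 'professionals', 'use', 'Python']

# Standalone function: len() is not tied to one class
print(len(sentence))  # works on a string
print(len(words))     # works on a list
```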

Resources for more information


For more information on functions, consider the Python Reference Library, Data types, Functions,
Symbols
- Built-in functions:
enumerate()
isinstance()
dict()
type()
len()
set()
zip()
- Docstring conventions: PEP 257 guide to writing docstrings

Make comparisons using operators


Boolean data: Data that has only two possible values, usually True or False.

Comparators are operators that compare two values and produce Boolean values.
Ex:

Six comparators in Python:


Python Comparators        Symbols
Greater than              >
Greater than or equal to  >=
Less than                 <
Less than or equal to     <=
Equal to                  ==
Not equal to              !=

Ex:
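A minimal sketch that evaluates each of the six comparators. The values of x and y are arbitrary.

```python
x = 5
y = 8

# Each comparison produces a Boolean value
results = {
    "x > y": x > y,
    "x >= 5": x >= 5,
    "x < y": x < y,
    "y <= 5": y <= 5,
    "x == y": x == y,
    "x != y": x != y,
}

for expression, value in results.items():
    print(expression, "->", value)
```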

Logical operators: Operators that connect multiple statements together and perform more complex
comparisons.

Ex:

- and: The and operator needs both expressions to be true to return a True result.
Ex:

- or: The or operator returns True if either of the expressions is true, and False only when both
expressions are false.

Ex:

- not: The not operator inverts the value of the expression that follows it. If the expression is true,
the result is false, and vice versa.

Ex:
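A short sketch of the three logical operators. The variables age and has_ticket are hypothetical.

```python
age = 30
has_ticket = False

both = age >= 18 and has_ticket    # False: the second expression is False
either = age >= 18 or has_ticket   # True: the first expression is True
inverted = not has_ticket          # True: not flips False to True

print(both, either, inverted)  # False True True
```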

Notes:
- The single equals sign = is reserved for assignment statements. If you use a single equals sign to
make a comparison, Python will raise a SyntaxError.

- If you try to compare data types that aren’t compatible, like checking if a string is greater than an
integer, Python will raise a TypeError.
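Both errors can be observed safely with try and except. This is only an illustrative sketch. compile() is used so that the invalid line can be checked without stopping the script, and the flag names are assumptions.

```python
# A single = used as a comparison is a syntax error
try:
    compile("if x = 5:\n    pass", "<example>", "exec")
    syntax_error_raised = False
except SyntaxError:
    syntax_error_raised = True

# Comparing a string with an integer using > raises a TypeError
try:
    "10" > 5
    type_error_raised = False
except TypeError:
    type_error_raised = True

print(syntax_error_raised, type_error_raised)  # True True
```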

Arithmetic operators
Operation                              Operator   Example
Addition                               +          [IN] 5 + 2    [OUT] 7
Subtraction                            -          [IN] 5 - 2    [OUT] 3
Multiplication                         *          [IN] 5 * 2    [OUT] 10
Division                               /          [IN] 5 / 2    [OUT] 2.5
Modulo (the remainder of a division)   %          [IN] 5 % 2    [OUT] 1
Exponentiation                         **         [IN] 5 ** 2   [OUT] 25
Floor division (the number of times    //         [IN] 5 // 2   [OUT] 2
the denominator can fully go into
the numerator)

Use if, elif, else statements to make decisions


Branching: The ability of a program to alter its execution sequence.

if: A reserved keyword that sets up a condition in Python.

if statements, also known as conditional statements, work just like the word "if" in everyday life.

Ex:

else: A reserved keyword that executes when preceding conditions evaluate as False.
An else statement sets a piece of code to run only when the condition of the if statement is
False.

Ex:

Modulo: An operator that returns the remainder when one number is divided by another.
Ex:
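A minimal sketch that combines if, else, and the modulo operator. The function name is hypothetical.

```python
def even_or_odd(number):
    """Return 'even' or 'odd' for an integer."""
    # A remainder of zero after dividing by two means the number is even
    if number % 2 == 0:
        return "even"
    else:
        return "odd"

print(even_or_odd(10))  # even
print(even_or_odd(7))   # odd
```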

elif: A reserved keyword that executes subsequent conditions when the previous conditions are not
True.
The elif keyword allows for more than two possible conditions in the code, with no limit on the
number of comparison cases.

Ex:

Ex:

Ex:
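A sketch of elif used to sort values into more than two bins. The function name, thresholds, and labels are assumptions chosen for illustration.

```python
def bin_score(score):
    """Assign a score from 0 to 100 to a hypothetical bin."""
    if score >= 90:
        return "high"
    elif score >= 50:
        # Checked only when the first condition is False
        return "medium"
    else:
        # Runs only when every condition above is False
        return "low"

print(bin_score(95))  # high
print(bin_score(72))  # medium
print(bin_score(18))  # low
```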

Uses of Branching:
- Bin data based on its value
- Backup files
- Restrict login access

Reference guide: Conditional statements



Conditional statements are an essential part of programming. They allow you to control the flow of
information based on certain conditions. In Python, if, elif, and else statements are used to
implement conditional statements. Using conditional statements to branch program execution is a
core part of coding for most data professionals, so it’s important to understand how they work. This
reading is a reference guide to conditional statements.

Conditionals syntax
In earlier videos, you learned some built-in Python operators that allow you to compare values, and
some logical operators that you can use to combine values. You also learned how to use operators in
if – elif – else blocks.
The basic syntax of if – elif – else statements in Python is as follows:
if condition1:
    # block of code to execute if the condition evaluates to True

elif condition2:
    # block of code to execute if condition1 evaluates to False
    # and condition2 evaluates to True

else:
    # block of code to execute if BOTH condition1 and condition2
    # evaluate to False

Here, condition1 and condition2 are expressions that evaluate to either True or False. If the condition
in the if statement is true, then the block of code that follows is executed. Otherwise, it is skipped.
The elif statement stands for “else if,” and it is used to specify an alternative condition to check if the
first condition is false. You can have any number of elif statements in your code. If the preceding
condition is false and the elif condition is true, then the block of code that follows the elif statement
is executed.
The else statement is used to specify what code to execute if both the if statement and any
subsequent elif statements are false.
Here is an example that uses all three kinds of statements:
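An illustrative sketch, with a hypothetical temperature reading and made-up thresholds:

```python
temperature = 22  # hypothetical reading in degrees Celsius

if temperature > 30:
    advice = "Stay in the shade."
elif temperature > 15:
    advice = "Enjoy the weather."
else:
    advice = "Bring a jacket."

print(advice)  # Enjoy the weather.
```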
Some important things to note about conditional statements in Python:
- The elif and else statements are optional. You can have an if statement by itself.
- You can have multiple elif statements.
- You can only have one else statement, and only at the end of your logic block.
- Each condition must be an expression that evaluates to a Boolean value (True or False).
- Indentation matters! The code associated with each conditional statement must be indented below
it. The typical convention for data professionals is to indent four spaces. Indentation mistakes are
one of the most common causes of unexpected code behavior.

Terms and definitions from Course 2, Module 2


Algorithm: A set of instructions for solving a problem or accomplishing a task
Boolean: A data type that has only two possible values, usually true or false
Branching: The ability of a program to alter its execution sequence
Comparator: An operator that compares two values and produces Boolean values (True/False)
def: A keyword that defines a function at the start of the function block
Docstring: A string at the beginning of a function’s body that summarizes the function’s behavior
and explains its arguments and return values
elif: A reserved keyword that executes subsequent conditions when the previous conditions are not
True
else: A reserved keyword that executes when preceding conditions evaluate as False
Function: A body of reusable code for performing specific processes or tasks
if: A reserved keyword that sets up a condition in Python
Logical operator: An operator that connects multiple statements together and performs complex
comparisons
Modularity: The ability to write code in separate components that work together and that can be
reused for other programs
Modulo: An operator that returns the remainder when one number is divided by another
Refactoring: The process of restructuring code while maintaining its original functionality
return: A reserved keyword in Python that makes a function produce new results which are saved
for later use
Reusability: The capability to define code once and use it many times without having to rewrite it
Self-documenting code: Code written in a way that is readable and makes its purpose clear

Terms and definitions from the previous module


Argument: Information given to a function in its parentheses
Assignment: The process of storing a value in a variable
Attribute: A value associated with an object or class which is referenced by name using dot notation
Cells: The modular code input and output fields into which Jupyter Notebooks are partitioned
Class: An object’s data type that bundles data and functionality together
Computer programming: The process of giving instructions to a computer to perform an action or
set of actions
Data type: An attribute that describes a piece of data based on its values, its programming language,
or the operations it can perform
Dot notation: How to access the methods and attributes that belong to an instance of a class
Dynamic typing: Variables that can point to objects of any data type
Explicit conversion: The process of converting a data type of an object to a required data type
Expression: A combination of numbers, symbols, or other variables that produce a result when
evaluated
Float: A data type that represents numbers that contain decimals
Immutable data type: A data type in which the values can never be altered or updated
Implicit conversion: The process Python uses to automatically convert one data type to another
without user involvement
Integer: A data type used to represent whole numbers without fractions
Jupyter Notebook: An open-source web application for creating and sharing documents containing
live code, mathematical formulas, visualizations, and text
Keyword: A special word in a programming language that is reserved for a specific purpose and that
can only be used for that purpose
Markdown: A markup language that lets the user write formatted text in a coding environment or
plain-text editor
Method: A function that belongs to a class and typically performs an action or operation
Naming conventions: Consistent guidelines that describe the content, creation date, and version of a
file in its name
Naming restrictions: Rules built into the syntax of the language itself that must be followed
Object: An instance of a class; a fundamental building block of Python
Object-oriented programming: A programming system that is based around objects which can
contain both data and code that manipulates that data
Programming languages: The words and symbols used to write instructions for computers to
follow
String: A sequence of characters and punctuation that contains textual information
Syntax: The structure of code words, symbols, placement, and punctuation
Variable: A named container which stores values in a reserved location in the computer’s memory
MODULE 3
LOOPS AND STRINGS

Introduction to while loops


Loop: A block of code used to carry out iterations.

Iteration: The repeated execution of a set of statements, where one iteration is the single execution of
a block of code.

Iterable: An object that's looped, or iterated, over.

Data professionals use loops to automate repetitive tasks.


Data professionals use for loops and while loops to work with iterables.

while loop: A loop that instructs your computer to continuously execute your code based on the
value of a condition.

Ex:

Logical operators: and, or, not

The important thing to remember is that the condition used by the while loop needs to evaluate to
True or False.

break: A keyword that lets you escape a loop without triggering any else statement that follows it
in the loop.
Ex:
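A sketch of break skipping a loop's else block. The search task and the variable names are made up for illustration.

```python
# Hypothetical search: find the first multiple of 7 between 51 and 59
n = 51
while n < 60:
    if n % 7 == 0:
        found = n
        break        # leave the loop immediately; the else block is skipped
    n += 1
else:
    found = None     # runs only if the loop ends without hitting break

print(found)  # 56
```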

Loops, break, and continue statements


While loop syntax:
while condition:
    # block of code to execute while the condition evaluates to True
The condition is a Boolean expression that is evaluated at the beginning of each iteration of the loop.
If the condition is true, the code block executes. After the code block executes, the condition is
evaluated again. This process continues until the condition is false, at which point the loop
terminates and the program continues with the next statement after the loop.

Ex:
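A sketch consistent with the description that follows. It is a reconstruction from that description, not necessarily the original code.

```python
x = 1
while x < 100:
    print(x)
    x = x * 2
```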

In this example, x equals one when the loop begins. Because x is less than 100, the program prints
the value of x, then multiplies x by two. Then the condition is checked again, and because it is still
True, the code inside the loop executes again. This process continues until x becomes 128, at which
point the condition becomes False and the loop terminates.

Infinite loops: A loop whose condition never becomes False runs forever. To interrupt an infinite
loop in a Jupyter notebook:
1. Use the stop button in the menu at the top of the notebook.
2. Go to Kernel in the menu bar at the top of the notebook and select Interrupt from the drop-down
menu.
3. While in command mode, press i twice.

break & continue

Ex:
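A sketch of a loop with break that matches the step-by-step description below. It is reconstructed from that description, so the exact original code may differ.

```python
i = 0
x = 1
while x < 100:
    if i == 5:
        break
    print(i, x)
    x = x * 2
    i += 1
```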

In this example, there is a variable i that acts as a counter. For each iteration of the loop, the
program:
- Checks if x is less than 100.
- If it is, then the program checks if i equals five.
- If it does, the loop terminates because of the break statement. Otherwise, it prints the values of both
i and x, doubles the value of x, and increments the value of i by one.
- Repeats until x ≥ 100 or i = 5. In this case, the loop breaks when i becomes 5.
Ex:
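A sketch of a loop with continue that matches the step-by-step description below. It is reconstructed from that description, so the exact original code may differ.

```python
i = 0
while i < 10:
    if i % 3 != 0:
        print(i)
        i += 1
        continue
    i += 1
```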

This example is a loop that prints all the numbers from zero through 9 that are not divisible by three.
For each iteration of the loop, the program:
- Checks if i is less than 10.
- If it is, then the program uses the modulo operator to check if i is evenly divisible by three.
- If it is not, then the program prints i, increments the value of i by one, and then cycles back to the
beginning to check that i is less than 10. This happens because of the continue statement. The final i
+= 1 does not execute, thus avoiding a double incrementation of i.
- But if step 2 evaluates i as evenly divisible by three, nothing in the if block executes (so there’s no
print statement) and i is incremented by one.
- Repeats until i becomes 10.

Introduction to for loops


for loop: A piece of code that iterates over a sequence of values.

Ex:

range(): A Python function that returns a sequence of numbers starting from zero, increments by one
by default, and stops before the given number.
range function:
- A range of numbers will start with the value 0 by default.
- The sequence stops before the given value, so the last number generated is one less than that value.

Ex:
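A minimal sketch of a for loop over range(). The variable names are arbitrary.

```python
total = 0
for number in range(5):   # yields 0, 1, 2, 3, 4
    print(number)
    total += number

print(total)  # 10
```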

Loops with multiple range() parameters


Parameters of the range function:
- Start value
- Stop value
- Step value

Use for loops when there's a sequence of elements that you want to iterate over.

Ex:

Ex:
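A sketch of for loops that use the start, stop, and step values. The list names are made up for illustration.

```python
evens = []
for n in range(2, 11, 2):    # start=2, stop=11 (excluded), step=2
    evens.append(n)

countdown = []
for n in range(5, 0, -1):    # a negative step counts down
    countdown.append(n)

print(evens)      # [2, 4, 6, 8, 10]
print(countdown)  # [5, 4, 3, 2, 1]
```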

Booleans are a data type that represents one of two possible states: usually True or False.

for loops
for loops syntax:
A for loop is a control structure that allows you to execute a block of code the same number of times
as there are elements in an iterable sequence. You’ll learn more about iterable sequences later in this
course, but some examples of iterable data types include:

Strings: 'chimichurri'
Lists: [1, 2, 3, 4, 5, 6]
Tuples: (1, 2, 3, 4, 5)
Dictionaries: {'Name': 'Anita', 'Age': 77}
Sets: {1, 4, 14, 33}

The basic syntax of a for loop is as follows: for item in iterable_sequence:

The iterable_sequence variable can be any iterable data type, and item is a variable whose name is
arbitrary - you decide it. However, there are some conventions that you’ll encounter when naming
this variable. For example, if you’re iterating over characters in a string, you’ll frequently encounter
the variable char. If you’re iterating over a list of numbers, you’ll find n or num. It’s helpful to give
this variable a name so readers of your code understand what kind of information is being looped
over. So, for a variable called names that contains a list of people’s names, you might write: for
name in names:.

Ex:
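A sketch that fits the note below. The starting value and the list are assumptions, since only the variable name num is known from the text.

```python
num = 100                 # num exists before the loop begins
for num in [2, 4, 6]:     # each iteration reassigns num
    print(num * 10)

print(num)                # 6, the value from the final iteration
```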

Notice that num exists as a variable before the for loop begins. The for loop’s first iteration
reassigns its value with that of the first element in the sequence. This reassignment occurs with each
iteration of the loop. When the loop terminates, the variable persists, and it contains the value it had
after the final iteration of the loop.

The range() function


The for loop allows you to create a loop that performs exactly the number of iterations needed for
the data structure you’re looping over. In other words, whether your iterable sequence contains two,
1,000, or a million elements, you can use the same syntax and don’t have to specify the number of
iterations you want. However, sometimes you need to perform a task a set number of times, but you
don’t already have an iterable object to loop over. Or, sometimes you need to generate a known,
regular sequence of numbers. This is where the range() function is useful.

The range() function takes up to three arguments: start, stop, and step. Its output is an object
belonging to the range class. If you only include one argument, it will be interpreted as the stop
value. The start and step values by default will be zero and one, respectively. If you include two
arguments, they will be interpreted as the start and stop values (again, with step being one by
default). Note that the stop value is not included in the range that is returned.
Ex:
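A short sketch of how the arguments are interpreted. list() is used here only to display the numbers that each range object produces.

```python
numbers = range(5)
print(type(numbers))                 # <class 'range'>

one_arg = list(range(5))             # stop only: [0, 1, 2, 3, 4]
two_args = list(range(2, 6))         # start and stop: [2, 3, 4, 5]
three_args = list(range(0, 10, 3))   # start, stop, step: [0, 3, 6, 9]

print(one_arg, two_args, three_args)
```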

Work with strings
