PYTHON FOR DATA ENGINEERS (E-BOOK)

Contents
Who this book is for
Introduction to Python
Why Choose Python?
What Can Python Do?
History of Python
Python Versions
Features of Python
Python Installations
Indentation in Python
Comments in Python
Variables in Python
Numbers in Python
Strings in Python
Python Operators
Python Lists
Python Tuples
Python Sets
Python Dictionaries
Python Conditions and if else statements
Python For Loops
Python Dates
Python RegEx
Python string formatting
Python Functions
Python Generators
Python Lambda
Python Json
Python List Comprehension
Python PIP
Python Try Exception
Python File Handling
Python Class
Python Modules
Python Scope
Python Inheritance
Python Polymorphism
Resources on Internet for your reference
Reviews

Who this book is for


This book is for data analysts, ETL developers, and anyone looking to get started with, or transition to, the field of
data engineering, or to refresh their knowledge of data engineering using Python. It will also be useful for students
planning to build a career in data engineering and for IT professionals preparing for a transition.

Introduction to Python
Python is a widely-used, high-level programming language known for its versatility and simplicity. Created by Guido van
Rossum in 1991 and further developed by the Python Software Foundation, Python has become one of the most popular
programming languages today.

Designed with an emphasis on code readability, Python’s syntax allows developers to express ideas in fewer lines of code
compared to many other languages. This makes Python an excellent choice for rapid development and efficient system
integration.

As an interpreted, interactive, object-oriented scripting language, Python allows developers to work quickly and flexibly.
Its design prioritizes clarity, using English keywords where other languages might rely on punctuation, and featuring fewer
syntactical rules overall. This results in a language that is easy to read and understand, making it ideal for both beginners
and experienced developers.

Why Choose Python?


Python is an ideal programming language for beginners due to its simplicity and ease of learning. Its syntax is clear and
easy to read, making it more accessible compared to many other languages. With Python, you can create almost
anything, whether you're solving small problems or building complex applications.

What Can Python Do?


• Web Development: Python can be used on the server-side to create powerful web applications.

• Automation: It can automate workflows by integrating with various software systems.

• Database Connectivity: Python enables seamless connections to database systems, allowing for data
manipulation and retrieval.

• File Handling: It can read, modify, and manage files efficiently.

• Big Data & Complex Calculations: Python is widely used for handling large datasets and performing intricate
mathematical computations.

• Prototyping & Development: Whether for rapid prototyping or developing production-ready software, Python is
an excellent choice.

History of Python
• The Python programming language was conceived in the late 1980s, with its development beginning in 1989. It was
first released to the public on February 20, 1991. Python was created by Guido van Rossum at the
Centrum Wiskunde & Informatica (CWI) in the Netherlands.

• Python was inspired by the ABC programming language, which served as its predecessor. Since 2001, Python
has been managed by the Python Software Foundation (PSF), a non-profit organization dedicated to
supporting the language's development.

• For more information about Python and its community, visit the official website at www.python.org.
Python Versions
Python has gone through several major versions, each bringing new features, improvements, and changes. Here are the
key versions:

1. Python 1.x
The first major release of Python, Python 1.x, introduced the language's foundational features. The "x" represents
minor version updates (e.g., 1.0, 1.1, etc.). This version is outdated and no longer in use.

2. Python 2.x
Python 2.x, first released in 2000, brought significant changes and enhancements compared to Python 1.x.
The Python 2 series reached its end of life on January 1, 2020 and is now considered outdated.

3. Python 3.x
Python 3.x, first released in 2008, is the current and active version of the language. It is not backward
compatible with Python 2.x. The "x" here represents the minor releases in the series (e.g., 3.0, 3.1, 3.2, etc.).

Python Releases for Windows | Python.org

Python 3.x is the version recommended for use, as it includes modern features, optimizations, and support for the future
of Python development.

Features of Python
Python offers a wide range of features that make it a versatile and powerful programming language. Here are 11 key
features of Python:

1. Simple
Python's syntax is clear and easy to read, making it beginner-friendly and easy to learn.

2. Freeware and Open Source


Python is free to use and open-source, meaning anyone can contribute to its development.

3. Platform Independent
Python is cross-platform, allowing programs written in Python to run on different operating systems without
modification.

4. Dynamically Typed
Python determines the type of a variable at runtime, allowing for greater flexibility and ease of use.

5. Interpreted
Python code is executed by an interpreter with no separate compile step, so it can be run and tested one
statement at a time, which makes debugging easier and enhances the development process.

6. High-Level Programming Language


Python is abstracted from low-level machine operations, allowing developers to focus on solving problems
rather than managing memory.

7. Supports Both Functional and Object-Oriented Programming


Python allows for multiple programming paradigms, including functional programming and object-oriented
programming, offering great flexibility.

8. Robust (Strong)
Python is known for its strong error-handling and built-in exceptions, which make it reliable and less prone to
crashing.

9. Extensible
Python can be extended by integrating with other languages, such as C or C++, to optimize performance for
specific tasks.

10. Embedded
Python can be embedded within other applications, enabling scripting capabilities in different software
environments.

11. Supports Third-Party APIs (Modules)


Python has a rich ecosystem of third-party libraries and modules, such as numpy, pandas, matplotlib, scipy,
scikit-learn, seaborn, NLTK, and Keras, that extend its functionality for tasks like data analysis, visualization,
machine learning, and natural language processing.

Python Installations
You can install Python and VS Code on your machine by following the videos below, or you can create a free Databricks
account in the Community Edition for easy hands-on practice. (Use any one of these options.)

Create a Free Databricks Community Edition Account (2025) & Run First Notebook

How to install Python 3.13.2 on Windows 11

How to Run Python 3.13 in Visual Studio Code on Windows 10/11 [2024] | Run Sample Python Program

How to Install Python on Mac | Install Python on macOS

How to Set Up Python in Visual Studio Code on Mac | VSCode Python Development Basics On MacOS

Indentation in Python
In Python, indentation is critical because it defines the structure of the code. Unlike many other programming languages
that use braces {} to define blocks of code, Python uses indentation (spaces or tabs) to group statements.

Why Indentation is Important:

Defines Code Blocks: Indentation tells Python which code belongs to which block (loops, conditionals, functions, etc.).
Improves Readability: Proper indentation makes the code easier to read and understand.

Rules of Indentation:

Consistency: Always use the same number of spaces or tabs for indentation. The standard is 4 spaces per indentation level.
No Mixing Tabs and Spaces: Stick to one method of indentation. Mixing spaces and tabs will result in an error.

Example 1: If-Else Statement
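A minimal sketch of an if-else block; the variable age and the messages are illustrative:

```python
age = 20  # illustrative value

if age >= 18:
    message = "You are an adult."
    print(message)  # indented 4 spaces: belongs to the if block
else:
    message = "You are a minor."
    print(message)  # indented 4 spaces: belongs to the else block
```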

In this example:

The print() statements are indented by 4 spaces.


If the indentation were incorrect (e.g., missing or inconsistent), Python would raise an IndentationError.

Example 2: For Loop
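A short sketch of a loop body set off by indentation; the list name collected is illustrative:

```python
collected = []

for i in range(3):
    print(i)             # indented: runs on every iteration
    collected.append(i)  # indented: also part of the loop body

print("done")            # not indented: runs once, after the loop
```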

Here: The print(i) is indented to indicate that it belongs to the for loop.

Example 3: Function Definition
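A sketch of a function body defined by indentation; the argument value is illustrative:

```python
def greet(name):
    # Every line indented under def belongs to the function body.
    text = f"Hello, {name}!"
    print(text)
    return text


greeting = greet("Alice")
```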

The print() statement is indented to show that it is part of the greet function.
Example 4: Incorrect Indentation
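A sketch of the failure, assuming a hypothetical snippet whose print() is not indented. The faulty source is kept in a string and passed to the built-in compile() so that the error can be caught and displayed instead of stopping the script:

```python
# Faulty code held as text: the print() line is missing its indentation.
bad_source = "if True:\nprint('This line is not indented')\n"

try:
    compile(bad_source, "example.py", "exec")
    error_name = None
except IndentationError as exc:
    error_name = type(exc).__name__
    print(f"{error_name}: {exc.msg}")
```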

This will raise an IndentationError because the print() statement is not indented correctly.

Tip: Use your text editor or IDE's "Auto-Indent" feature to avoid errors and maintain consistent indentation.

Summary:

Indentation defines code blocks in Python.

Always use 4 spaces for indentation (avoid tabs).

Incorrect or inconsistent indentation will result in errors.

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization
and execution [File is available in resource section inside the course].

Comments in Python
✓ Comments can be used to explain Python code.
✓ Comments can be used to make the code more readable.
✓ Comments can be used to prevent execution when testing code.

Creating a Comment

Comments are hints that we add to our code to make it easier to understand.
A comment starts with a #, and Python will ignore it:

Comments can be placed at the end of a line, and Python will ignore the rest of the line:
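A minimal sketch of both placements, plus a comment used to disable a line while testing; the variable total is illustrative:

```python
# This whole line is a comment and is ignored by Python.
total = 2 + 3  # inline comment: everything after the # is ignored
# total = 100  (commented out, so this assignment never runs)
print(total)
```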

Multiline Comments

Unlike languages such as C++ and Java, Python doesn't have a dedicated method to write multi-line comments.
However, we can achieve the same effect by using the hash (#) symbol at the beginning of each line.

Since Python will ignore string literals that are not assigned to a variable, you can add a multiline string (triple
quotes) in your code, and place your comment inside it:
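Both approaches to multi-line comments, sketched with illustrative text:

```python
# This comment
# spans several lines,
# one hash symbol per line.

"""
This triple-quoted string is not assigned to a variable,
so Python evaluates it and then discards it.
"""
status = "still runs"
print(status)
```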

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Variables in Python
Variables are containers for storing data values.
A variable is an Identifier whose value can be changed or varied during program execution.

DATA REPRESENTATION IN PYTHON

In practice, programs need data (literals) to make effective decisions, and to work with that data it must be stored
in main memory (RAM). The data stored in main memory is classified primarily into 5 types:

1. Integer literals (s.no, e.no, ht.no, ac.no)
2. Floating-point literals (percent, total bill, etc.)
3. Boolean literals (True/False)
4. String literals (names, place names, product names, etc.)
5. Date values (DOB, DOJ, DOD, DOA, etc.); Python has no date literal syntax, so these are represented with the datetime module

Rules for Using variables OR Identifiers in Python

Rule-1: A variable name is a combination of letters, digits, and the underscore (_).

Rule-2: The first character of a variable name must be a letter or an underscore (_); it cannot be a digit.

Rule-3: No special symbols are allowed within a variable name except the underscore ( _ ).

Rule-4: Keywords cannot be used as variable names, because keywords are reserved words whose meaning cannot
be changed.
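Rules 1 to 4 can be checked programmatically. A minimal sketch using the standard str.isidentifier() method and keyword.iskeyword() function; the helper name is_valid_variable_name and the candidate names are illustrative:

```python
import keyword


def is_valid_variable_name(name):
    """Return True if name is a legal identifier and not a reserved keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


candidates = ["emp_no", "_total", "2nd_value", "total$", "class"]
results = {name: is_valid_variable_name(name) for name in candidates}
print(results)
```

Here "2nd_value" fails Rule-2, "total$" fails Rule-3, and "class" fails Rule-4.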

Rule-5: Variable names in Python are case-sensitive.

Names that differ only in letter case (for example age, Age, and AGE) are different variables.
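A short sketch of case sensitivity with three illustrative names:

```python
age = 25
Age = 30
AGE = 35

# Three separate variables, each holding its own value.
print(age, Age, AGE)
```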

Rule-6: Prefer short, descriptive (user-friendly) variable names; very long names are not recommended.

You can get the data type of a variable with the type() function.
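For instance, with three illustrative variables:

```python
x = 5
y = "John"
z = 3.14

print(type(x))  # the int class
print(type(y))  # the str class
print(type(z))  # the float class
```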

Many Values to Multiple Variables

Note: Make sure the number of variables matches the number of values, or else you will get an error.
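A sketch of multiple assignment, followed by the mismatch error the note warns about (caught here so the script keeps running); all names are illustrative:

```python
x, y, z = "Orange", "Banana", "Cherry"
print(x, y, z)

# Two names but three values: Python raises ValueError at run time.
try:
    a, b = 1, 2, 3
except ValueError as exc:
    mismatch_error = str(exc)
    print("ValueError:", mismatch_error)
```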

One Value to Multiple Variables
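A one-line sketch of chained assignment, using illustrative names:

```python
x = y = z = "Orange"  # all three names refer to the same value
print(x, y, z)
```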

Output Variables
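Variables are usually output with print(). A minimal sketch, assuming illustrative variables name and version:

```python
name = "Python"
version = 3

print(name)               # a single variable
print(name, version)      # commas accept mixed types and insert a space
line = name + " is fun"   # + joins strings only
print(line)
```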

GLOBAL VARIABLES

Variables that are created outside of a function (as in all of the examples in the previous pages) are known as global
variables.

Global variables can be used anywhere, both inside and outside of functions.
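A sketch of a function reading a global variable; the names language and describe are illustrative:

```python
language = "Python"  # created outside any function, so it is global


def describe():
    # The function can read the global variable directly.
    return "I am learning " + language


sentence = describe()
print(sentence)
```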

If you create a variable with the same name inside a function, this variable will be local, and can only be used inside the
function. The global variable with the same name will remain as it was, global and with the original value.
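A sketch of a local variable shadowing a global one; the names are illustrative:

```python
level = "global"


def show_level():
    level = "local"  # a new local variable; the global one is untouched
    return level


inside = show_level()
print(inside, level)
```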

The global Keyword

Normally, when you create a variable inside a function, that variable is local, and can only be used inside that function.
To create a global variable inside a function, you can use the global keyword.

To change the value of a global variable inside a function, refer to the variable by using the global keyword:
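Both uses of the global keyword in one sketch; counter, flag, and the function names are illustrative:

```python
counter = 0


def increment():
    global counter  # refer to the module-level variable
    counter = counter + 1


def create_flag():
    global flag     # creates a new global variable from inside a function
    flag = True


increment()
create_flag()
print(counter, flag)
```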

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Numbers in Python
Numbers are a fundamental data type in Python and play a crucial role in data engineering tasks. Python supports
multiple numeric types such as integers, floats, and complex numbers, which are essential for performing
mathematical operations, handling large datasets, and processing numerical data efficiently.

1. Types of Numbers in Python

Python provides the following numeric data types:

1.1 Integers (int)

• Whole numbers (positive, negative, or zero) without decimals.

• Example:

1.2 Floating-Point Numbers (float)

• Numbers with decimals or expressed in scientific notation.

• Example:

1.3 Complex Numbers (complex)

• Represented as a + bj, where j is the imaginary unit (√-1).

• Example:
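The three numeric types above, sketched with illustrative values:

```python
count = 42        # int
big = 2 ** 100    # ints grow as large as memory allows
price = 19.99     # float
tiny = 1.5e-3     # float written in scientific notation
z = 3 + 4j        # complex: real part 3, imaginary part 4

print(type(count), type(price), type(z))
print(z.real, z.imag, abs(z))
```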

2. Mathematical Operations with Numbers

Python provides basic arithmetic operators:
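A sketch of the arithmetic operators in a batching scenario; the scenario and names (rows, batch_size) are assumptions made for illustration:

```python
rows, batch_size = 1050, 100

total = rows + batch_size          # addition
difference = rows - batch_size     # subtraction
doubled = rows * 2                 # multiplication
average = rows / 4                 # true division always returns a float
full_batches = rows // batch_size  # floor division
leftover = rows % batch_size       # remainder
squared = batch_size ** 2          # exponentiation

print(full_batches, leftover, average)
```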

3. Built-in Numeric Functions

Python provides useful built-in functions for number operations:
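A few of the built-in numeric functions, applied to illustrative values:

```python
values = [3, -7, 2.5, 10]

magnitude = abs(-7)            # absolute value
rounded = round(3.14159, 2)    # round to 2 decimal places
power = pow(2, 5)              # same as 2 ** 5
smallest, largest = min(values), max(values)
total = sum(values)
quotient, remainder = divmod(17, 5)

print(magnitude, rounded, power, smallest, largest, total, quotient, remainder)
```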

4. Using the math Module

Python's math module provides advanced mathematical functions:

4.1 Common math Functions
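A short sketch of some frequently used math functions; the selection is illustrative:

```python
import math

root = math.sqrt(16)      # square root, returned as a float
up = math.ceil(4.2)       # round up to the nearest integer
down = math.floor(4.8)    # round down to the nearest integer
fact = math.factorial(5)  # 5 * 4 * 3 * 2 * 1

print(root, up, down, fact, math.pi)
```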

5. Random Number Generation (random module)

In data engineering, random numbers are useful for data sampling, testing, and simulations.
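A sketch of sampling with the random module. The fixed seed and the list rows are assumptions chosen so that repeated runs give the same results:

```python
import random

random.seed(42)  # fixed seed makes the run repeatable

rows = list(range(1, 101))        # stand-in for 100 record IDs
sample = random.sample(rows, 5)   # 5 distinct rows, without replacement
roll = random.randint(1, 6)       # integer from 1 to 6, inclusive
fraction = random.random()        # float in the range [0.0, 1.0)

print(sample, roll, fraction)
```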

6. Handling Large Numbers (decimal and fractions Modules)

6.1 decimal for Precision
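A minimal sketch contrasting binary floats with decimal.Decimal, using illustrative amounts:

```python
from decimal import Decimal

float_sum = 0.1 + 0.2                           # binary floating point
decimal_sum = Decimal("0.1") + Decimal("0.2")   # exact decimal arithmetic

print(float_sum == 0.3)               # False: rounding error in binary floats
print(decimal_sum == Decimal("0.3"))  # True
```

Decimal values are built from strings here so that no float rounding error is carried in.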

6.2 fractions for Exact Fractions
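A minimal sketch of exact rational arithmetic with fractions.Fraction; the values are illustrative:

```python
from fractions import Fraction

third = Fraction(1, 3)
total = third + Fraction(1, 6)  # results are reduced automatically

print(total)
```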

7. Best Practices for Handling Numbers in Data Engineering

✔ Use appropriate data types – Choose int, float, or decimal based on precision requirements.
✔ Avoid floating-point precision errors – Use decimal.Decimal for financial calculations.
✔ Optimize performance – Prefer built-in functions and standard-library modules (math, random) over manual implementations.
✔ Use vectorized operations – In Pandas & NumPy, prefer vectorized math operations for speed.

Strings in Python
Strings in Python are surrounded by either single quotation marks or double quotation marks.

'hello' is the same as "hello".

You can display a string literal with the print() function:
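For instance:

```python
print("Hello")
print('Hello')

same = ("hello" == 'hello')  # the quote style does not change the value
print(same)
```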

Quotes Inside Quotes

You can use quotes inside a string, as long as they don't match the quotes surrounding the string:
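A short sketch with illustrative sentences:

```python
a = "It's alright"             # single quote inside double quotes
b = 'He is called "Johnny"'    # double quotes inside single quotes

print(a)
print(b)
```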

Assign String to a Variable

Assigning a string to a variable is done with the variable name followed by an equal sign and the string:

Multiline Strings

You can assign a multiline string to a variable by using three quotes
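Both kinds of assignment, sketched with illustrative text:

```python
greeting = "Hello"  # single-line string

poem = """Roses are red,
violets are blue."""  # triple quotes keep the line break

print(greeting)
print(poem)
```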

Strings are Arrays

Strings in Python are sequences of Unicode characters, so their elements can be accessed by position, much like arrays in other languages.
However, Python does not have a character data type; a single character is simply a string with a length of 1.
Square brackets can be used to access elements of the string.
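A short indexing sketch with an illustrative string:

```python
word = "Hello, World!"

first = word[0]   # indexing starts at 0
second = word[1]

print(first, second, type(first))
```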

Looping Through a String

Since strings are sequences, we can loop through the characters in a string with a for loop.

String Length

To get the length of a string, use the len() function.
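Looping and len() together, sketched on illustrative strings:

```python
letters = []
for ch in "data":
    letters.append(ch)  # one character per iteration

length = len("Hello, World!")  # counts every character, including spaces

print(letters, length)
```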

Check String

To check if a certain phrase or character is present in a string, we can use the keyword in

Check if NOT

To check if a certain phrase or character is NOT present in a string, we can use the keyword not in.
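The in and not in checks, sketched on an illustrative sentence:

```python
text = "The best things in life are free!"

has_free = "free" in text
lacks_expensive = "expensive" not in text

if has_free:
    print("Yes, 'free' is present.")
if lacks_expensive:
    print("No, 'expensive' is NOT present.")
```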

Slicing

You can return a range of characters by using the slice syntax.

Specify the start index and the end index, separated by a colon, to return a part of the string.
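For instance, with an illustrative string:

```python
b = "Hello, World!"

part = b[2:5]  # characters at index 2, 3 and 4; the end index is excluded
print(part)
```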

Slice From the Start

By leaving out the start index, the range will start at the first character:

Slice To the End

By leaving out the end index, the range will go to the end:
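Both open-ended forms, sketched on an illustrative string:

```python
b = "Hello, World!"

from_start = b[:5]  # start index omitted: begins at the first character
to_end = b[2:]      # end index omitted: runs to the last character

print(from_start)
print(to_end)
```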

Negative Indexing

Use negative indexes to start the slice from the end of the string:
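A sketch with an illustrative string; -1 is the last character, so -5 is the fifth from the end:

```python
b = "Hello, World!"

tail_part = b[-5:-2]  # from the 5th-from-last up to, not including, the 2nd-from-last
print(tail_part)
```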

String Handling

On string data we can perform indexing and slicing operations, and we can also perform many other operations by
using the predefined methods of the str object.

capitalize()

This function returns a copy of the sentence with its first letter capitalized and all remaining letters converted to lower case.

Syntax: strobj.capitalize() (OR) strobj=strobj.capitalize()

title():

This function returns the title-case form of a given sentence, in which the first letter of every word is capitalized.

Syntax: s.title() (OR) s=s.title()
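capitalize() and title() side by side, on illustrative sentences:

```python
s = "python is FUN"
capitalized = s.capitalize()  # first letter upper, the rest lower

t = "python for data engineers"
titled = t.title()            # first letter of every word upper

print(capitalized)
print(titled)
print(s)  # the original string is unchanged: strings are immutable
```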

index()

This Function obtains Index of the specified Value.

If the specified value does not exist then we get ValueError.

Syntax: strobj.index(Value)

Syntax: indexvalue=strobj.index(value)
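A sketch of index() for a value that exists and one that does not; the strings are illustrative:

```python
s = "data engineering"

pos = s.index("eng")  # position where the first match begins

try:
    s.index("xyz")
    not_found = False
except ValueError:
    not_found = True  # raised because "xyz" does not occur in s

print(pos, not_found)
```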

upper()

It is used for converting string data into upper case.

Syntax:- strobj.upper() OR strobj=strobj.upper()

lower()

It is used for converting string data into lower case.

Syntax:- strobj.lower() OR strobj=strobj.lower()
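upper() and lower() applied to an illustrative string:

```python
role = "Data Engineer"

shouted = role.upper()
quiet = role.lower()

print(shouted, quiet)
```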

isupper()

This Function returns True provided the given str object data is purely Upper Case otherwise it returns False.

Syntax: strobj.isupper()

islower()

This Function returns True provided the given str object data is purely lower Case otherwise it returns False.

Syntax: strobj.islower()

isalpha()

This Function returns True provided str object contains Purely Alphabets otherwise returns False.

Syntax: strobj.isalpha()
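The three checks above on illustrative inputs. Note that isupper() and islower() look only at the letters, so digits and spaces do not affect their result:

```python
checks = {
    "HELLO.isupper": "HELLO".isupper(),
    "Hello.isupper": "Hello".isupper(),
    "HELLO 123.isupper": "HELLO 123".isupper(),  # non-letters are ignored
    "hello.islower": "hello".islower(),
    "Python.isalpha": "Python".isalpha(),
    "Python3.isalpha": "Python3".isalpha(),      # contains a digit
}

for label, value in checks.items():
    print(label, value)
```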

isdigit()

This Function returns True provided given str object contains purely digits otherwise returns False.

isalnum()

This function returns True provided the str object contains only alphabets, only numerics, or a mix of both
(alphanumerics); otherwise it returns False.
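isdigit() and isalnum() on illustrative inputs:

```python
digit_checks = {s: s.isdigit() for s in ["2024", "20.24", "-5"]}
alnum_checks = {s: s.isalnum() for s in ["abc123", "abc", "123", "abc 123", "abc_123"]}

print(digit_checks)
print(alnum_checks)
```

The decimal point, the minus sign, the space, and the underscore all cause a False result.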

isspace()

This Function returns True provided str obj contains purely space otherwise it returns False.

Syntax: strobj.isspace()
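A short sketch on illustrative inputs; an empty string returns False because it contains no characters at all:

```python
only_spaces = "   ".isspace()
tab_newline = "\t\n".isspace()   # tabs and newlines count as whitespace
with_letter = " a ".isspace()
empty = "".isspace()

print(only_spaces, tab_newline, with_letter, empty)
```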

split()

This function is used for splitting the given str object data into separate words based on the specified delimiter
( - _ # % ^ , ; etc.).

The default delimiter is whitespace.

The function returns the split data in the form of a list object.

Syntax: strobj.split("Delimiter")
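A sketch of split() with the default and with explicit delimiters; the sample strings are illustrative:

```python
words = "a b  c".split()              # default: any run of whitespace
date_parts = "2024-01-15".split("-")  # explicit delimiter
columns = "id,name,city".split(",")

print(words, date_parts, columns)
```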

join():

This function joins the string values of any iterable object into a single string, using the string it is called on as the separator.

Syntax: strobj.join(Iterableobject)
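A sketch of join() with a list and a tuple; the values are illustrative:

```python
date = "-".join(["2024", "01", "15"])      # separator placed between items
header = ", ".join(("id", "name", "city"))

print(date)
print(header)
```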

String Concatenation

To concatenate, or combine, two strings you can use the + operator.
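For instance, with two illustrative strings:

```python
a = "Hello"
b = "World"

c = a + " " + b  # add a space explicitly between the two parts
print(c)
```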

String Format

Strings and numbers cannot be combined directly with the + operator; attempting it raises a TypeError.
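A sketch of the failure and one way around it, using an illustrative variable age; the error is caught so the script keeps running:

```python
age = 36

try:
    txt = "My name is John, I am " + age  # str + int is not allowed
except TypeError as exc:
    error_text = str(exc)
    print("TypeError:", error_text)

# Converting the number with str() makes the concatenation valid.
fixed = "My name is John, I am " + str(age)
print(fixed)
```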

F-Strings

F-strings were introduced in Python 3.6 and are now the preferred way of formatting strings.

To specify a string as an f-string, simply put an f in front of the string literal, and add curly brackets {} as placeholders for
variables and other operations.
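A minimal f-string sketch with an illustrative variable:

```python
age = 36

txt = f"My name is John, I am {age}"  # {age} is replaced by the variable's value
print(txt)
```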

Placeholders and Modifiers

A placeholder can contain variables, operations, functions, and modifiers to format the value.

Note:

A modifier is included by adding a colon : followed by a legal formatting type, such as .2f, which means a fixed-point
number with 2 decimals.

A placeholder can also contain Python code, such as a math operation.
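A sketch of a modifier, an operation, and a method call inside placeholders; the names price and name are illustrative:

```python
price = 59
name = "john"

with_modifier = f"The price is {price:.2f} dollars"  # fixed point, 2 decimals
with_math = f"The total is {20 * price} dollars"     # expression evaluated in place
with_call = f"Customer: {name.upper()}"              # method call in a placeholder

print(with_modifier)
print(with_math)
print(with_call)
```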

Escape Character

To insert characters that are illegal in a string, use an escape character.

An escape character is a backslash \ followed by the character you want to insert.

\' Single Quote

\\ Backslash

\n New Line

\r Carriage Return

\t Tab

\b Backspace

\ooo Octal value

\xhh Hex value
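A few of the escape sequences listed above in use; the sample strings and the file path are illustrative:

```python
quoted = "We are the so-called \"Vikings\" from the north."
path = "C:\\data\\input.csv"   # each \\ produces one backslash
two_lines = "first\nsecond"    # \n starts a new line
columns = "id\tname"           # \t inserts a tab
letter_hex = "\x41"            # hex 41 is the letter A
letter_oct = "\101"            # octal 101 is also the letter A

print(quoted)
print(path)
print(two_lines)
print(columns)
print(letter_hex, letter_oct)
```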

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Operators
Python Arithmetic Operators

The purpose of arithmetic operators is to perform arithmetic operations such as addition, subtraction, and
multiplication.

When two or more variables or objects are connected by arithmetic operators, the result is called an arithmetic expression.

Python has 7 arithmetic operators: +, -, *, /, //, %, and **.
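All seven operators applied to two illustrative operands, plus one case that often surprises newcomers:

```python
x, y = 7, 2

print(x + y)    # addition
print(x - y)    # subtraction
print(x * y)    # multiplication
print(x / y)    # true division, always a float
print(x // y)   # floor division
print(x % y)    # modulus (remainder)
print(x ** y)   # exponentiation

negative_floor = -7 // 2  # floor division rounds toward negative infinity
print(negative_floor)
```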

Python Comparison Operators

The purpose of relational (comparison) operators is to compare two values.

When two or more variables or objects are connected by relational operators, the result is called a relational expression.

The result of a relational expression is always either True or False.

A relational expression is also called a test condition.

Python has 6 relational operators: ==, !=, >, <, >=, and <=.
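The six operators applied to two illustrative values:

```python
a, b = 10, 20

comparisons = {
    "a == b": a == b,
    "a != b": a != b,
    "a > b": a > b,
    "a < b": a < b,
    "a >= b": a >= b,
    "a <= b": a <= b,
}

for expression, result in comparisons.items():
    print(expression, result)
```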

Python Assignment Operators

The purpose of the assignment operator is to assign the value of the right-hand side (RHS) expression to the
left-hand side (LHS) variable.

The symbol for the assignment operator is a single equals sign ( = ).
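A sketch of plain assignment followed by the augmented forms (+=, -=, and so on), which are standard Python shorthand for combining an operation with assignment; the variable total is illustrative:

```python
total = 10    # plain assignment
total += 5    # same as total = total + 5
total -= 3    # same as total = total - 3
total *= 2    # same as total = total * 2
total //= 5   # same as total = total // 5
total **= 2   # same as total = total ** 2
total %= 7    # same as total = total % 7

print(total)
```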

Python Logical Operators

Logical operators are used to combine conditional statements:
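The three logical operators (and, or, not) on an illustrative value:

```python
x = 5

both = x > 3 and x < 10   # True only if both conditions hold
either = x < 3 or x > 4   # True if at least one condition holds
negated = not (x > 3)     # reverses the result

print(both, either, negated)
```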

Python Identity Operators

Identity operators are used to compare the objects, not if they are equal, but if they are actually the same object, with
the same memory location:
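A sketch contrasting equality (==) with identity (is, is not); the list names are illustrative:

```python
a = ["apple", "banana"]
b = ["apple", "banana"]  # equal content, but a separate object
c = a                    # a second name for the same object

equal = a == b
same_object = a is b
alias = a is c
different = a is not b

print(equal, same_object, alias, different)
```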

Python Membership Operators

Membership operators are used to test whether a value or sequence is present in an object:
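For instance, with an illustrative list:

```python
fruits = ["apple", "banana"]

present = "banana" in fruits
absent = "cherry" not in fruits

print(present, absent)
```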

Python Bitwise Operators

Bitwise operators are used to operate on the individual bits of (binary) numbers:
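The bitwise operators on two illustrative integers, with their binary forms in the comments:

```python
a, b = 6, 3  # binary 110 and 011

bit_and = a & b      # 010
bit_or = a | b       # 111
bit_xor = a ^ b      # 101
bit_not = ~a         # inverts every bit, giving -(a + 1)
shift_left = a << 1  # 1100
shift_right = a >> 1 # 011

print(bit_and, bit_or, bit_xor, bit_not, shift_left, shift_right)
```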

Operator Precedence

Operator precedence describes the order in which operations are performed.
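A few illustrative expressions in which precedence decides the result:

```python
mul_first = 2 + 3 * 4       # multiplication before addition
grouped = (2 + 3) * 4       # parentheses are evaluated first
right_assoc = 2 ** 3 ** 2   # ** groups right to left: 2 ** (3 ** 2)
neg_power = -2 ** 2         # ** binds tighter than unary minus: -(2 ** 2)
logic = not True or True    # not before or: (not True) or True

print(mul_first, grouped, right_assoc, neg_power, logic)
```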

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Lists
Lists are used to store multiple items in a single variable.

Lists are one of 4 built-in data types in Python used to store collections of data; the other three are tuples, sets, and dictionaries.
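Lists are written with square brackets. A minimal sketch with illustrative items:

```python
fruits = ["apple", "banana", "cherry"]
print(fruits)
```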

List Items

List items are ordered, changeable, and allow duplicate values.

List items are indexed, the first item has index [0], the second item has index [1] etc.

Ordered

When we say that lists are ordered, it means that the items have a defined order, and that order will not change.

If you add new items to a list, the new items will be placed at the end of the list.

Changeable

The list is changeable, meaning that we can change, add, and remove items in a list after it has been created.

Allow Duplicates

Since lists are indexed, lists can have items with the same value:

List Length

To determine how many items a list has, use the len() function:
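Duplicates and len() together, on an illustrative list:

```python
fruits = ["apple", "banana", "cherry", "apple", "cherry"]

size = len(fruits)  # duplicates are counted like any other item
print(fruits, size)
```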

List Items - Data Types

List items can be of any data type

type()

From Python's perspective, lists are defined as objects with the data type 'list':
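A sketch of lists holding different data types, checked with type(); the contents are illustrative:

```python
names = ["apple", "banana", "cherry"]  # all strings
numbers = [1, 5, 7, 9, 3]              # all integers
mixed = ["abc", 34, True, 40.5]        # several types in one list

print(type(names), type(numbers), type(mixed))
```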

The list() Constructor

It is also possible to use the list() constructor when creating a new list.
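list() accepts any iterable. A short sketch with illustrative inputs:

```python
from_tuple = list(("apple", "banana", "cherry"))  # note the double parentheses
from_string = list("abc")                         # one item per character
from_range = list(range(3))

print(from_tuple, from_string, from_range)
```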

Python - Access List Items

List items are indexed and you can access them by referring to the index number:
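For instance, with an illustrative list:

```python
fruits = ["apple", "banana", "cherry"]

second = fruits[1]  # the first item has index 0
print(second)
```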

Negative Indexing

Negative indexing means start from the end

-1 refers to the last item, -2 refers to the second last item etc.
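A short sketch on an illustrative list:

```python
fruits = ["apple", "banana", "cherry"]

last = fruits[-1]
second_last = fruits[-2]

print(last, second_last)
```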

Range of Indexes

You can specify a range of indexes by specifying where to start and where to end the range.

When specifying a range, the return value will be a new list with the specified items.
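A sketch of index ranges on an illustrative list; the start index is included and the end index is excluded:

```python
fruits = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]

middle = fruits[2:5]       # items at index 2, 3 and 4
first_four = fruits[:4]    # from the beginning up to index 3
from_cherry = fruits[2:]   # from index 2 to the end

print(middle)
print(first_four)
print(from_cherry)
```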

Range of Negative Indexes

Specify negative indexes if you want to start the search from the end of the list:
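For instance, on an illustrative list of seven items:

```python
fruits = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]

near_end = fruits[-4:-1]  # from the 4th-from-last up to, not including, the last
print(near_end)
```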

Check if Item Exists

To determine if a specified item is present in a list use the in keyword
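A short sketch with an illustrative list:

```python
fruits = ["apple", "banana", "cherry"]

found = "apple" in fruits
if found:
    print("Yes, 'apple' is in the fruits list")
```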

Python - Change List Items

To change the value of a specific item, refer to the index number:
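For instance:

```python
fruits = ["apple", "banana", "cherry"]

fruits[1] = "blackcurrant"  # replaces the item at index 1
print(fruits)
```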

Change a Range of Item Values

To change the value of items within a specific range, define a list with the new values, and refer to the range of index
numbers where you want to insert the new values:

If you insert more items than you replace, the new items will be inserted where you specified, and the remaining items
will move accordingly:

If you insert fewer items than you replace, the new items will be inserted where you specified, and the remaining items
will move accordingly:
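
A short sketch of changing a single item and replacing a range with more or fewer items. The values are illustrative.

```python
fruits = ["apple", "banana", "cherry"]

fruits[1] = "blackcurrant"         # change a single item by index
print(fruits)

fruits[1:2] = ["kiwi", "melon"]    # 1 item replaced by 2: the list grows
print(fruits)

fruits[1:3] = ["watermelon"]       # 2 items replaced by 1: the list shrinks
print(fruits)
```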

Python List insert() Method

The insert() method inserts the specified value at the specified position.

Python List append() Method

The append() method appends an element to the end of the list.

Python List extend() Method

The extend() method adds the specified list elements (or any iterable) to the end of the current list.

Python List remove() Method

The remove() method removes the first occurrence of the element with the specified value.
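
The four methods above, applied one after another to an illustrative list:

```python
fruits = ["apple", "banana", "cherry"]

fruits.insert(1, "orange")            # insert at position 1
fruits.append("kiwi")                 # add one element at the end
fruits.extend(("mango", "banana"))    # add every element of an iterable (here a tuple)
fruits.remove("banana")               # removes only the first "banana"

print(fruits)
```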

Python List pop() Method

The pop() method removes the element at the specified position and returns it. If no position is given, it removes the last element.

Python List clear() Method

The clear() method removes all the elements from a list.

Python List copy() Method

The copy() method returns a copy of the specified list.

Python List count() Method

The count() method returns the number of elements with the specified value.

Python List index() Method

The index() method returns the position of the first occurrence of the specified value.

Python List reverse() Method

The reverse() method reverses the order of the elements in the list. The list itself is changed.
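
A minimal sketch of pop(), clear(), copy(), count(), index() and reverse() on an illustrative list of numbers:

```python
numbers = [1, 4, 2, 9, 7, 8, 9, 3, 1]

removed = numbers.pop(1)        # removes and returns the item at index 1
last = numbers.pop()            # no index: removes and returns the last item
nines = numbers.count(9)        # how many times 9 appears
first_nine = numbers.index(9)   # position of the first 9

backup = numbers.copy()         # independent copy
numbers.reverse()               # reverses the list in place
backup.clear()                  # empties the copy only

print(removed, last, nines, first_nine)
print(numbers, backup)
```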

Python List sort() Method

The sort() method sorts the list ascending by default.

Customize Sort Function

You can also customize the sort order by using the keyword argument key = function. The function is called for each item and
returns a value (for example a number) that will be used to sort the list (the lowest value first):

Case Insensitive Sort

By default the sort() method is case sensitive, resulting in all capital letters being sorted before lower case letters:

Reverse Order

• What if you want to reverse the order of a list, regardless of the alphabet?

• The reverse() method reverses the current sorting order of the elements.

• Reverse the order of the list items:
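
The sorting behaviour described above can be sketched as follows. The helper distance_from_50 and the sample lists are hypothetical, chosen only for illustration.

```python
cars = ["Ford", "BMW", "Volvo"]
cars.sort()                   # ascending by default
print(cars)
cars.sort(reverse=True)       # descending
print(cars)

# Hypothetical key function: sort by how close each number is to 50
def distance_from_50(n):
    return abs(n - 50)

numbers = [100, 50, 65, 82, 23]
numbers.sort(key=distance_from_50)
print(numbers)

words = ["banana", "Orange", "Kiwi", "cherry"]
words.sort()                  # case sensitive: capital letters come first
print(words)
words.sort(key=str.lower)     # case insensitive
print(words)
words.reverse()               # reverses the current order, no sorting involved
print(words)
```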

Copy List

You cannot copy a list simply by typing list2 = list1, because: list2 will only be a reference to list1, and changes made in
list1 will automatically also be made in list2.

Use the copy() method

You can use the built-in List method copy() to copy a list

Use the list() constructor

Another way to make a copy is to use the built-in list() constructor.

Use the slice Operator

You can also make a copy of a list by using the : (slice) operator.
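
A small sketch contrasting a plain assignment (a second reference to the same list) with the three copying techniques. The variable names are illustrative.

```python
original = ["apple", "banana", "cherry"]

alias = original             # NOT a copy: both names refer to the same list
copy_a = original.copy()     # copy() method
copy_b = list(original)      # list() constructor
copy_c = original[:]         # slice operator

original.append("kiwi")

print(alias)     # the change is visible through the alias
print(copy_a)    # the copies are unaffected
```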

Python - Join Lists

• Join Two Lists


• There are several ways to join, or concatenate, two or more lists in Python.
• One of the easiest ways is by using the + operator.
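
A minimal sketch of joining lists. The + operator creates a new list, while extend() (shown as an alternative) changes the first list in place.

```python
list1 = ["a", "b", "c"]
list2 = [1, 2, 3]

joined = list1 + list2    # new list, list1 and list2 are unchanged
print(joined)

list1.extend(list2)       # adds the items of list2 to list1 itself
print(list1)
```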

Sort List Alphanumerically

List objects have a sort() method that will sort the list alphanumerically, ascending, by default:

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Tuples
Tuples are used to store multiple items in a single variable.

Tuple is one of 4 built-in data types in Python used to store collections of data

Tuple Items

Tuple items are ordered, unchangeable, and allow duplicate values.

Tuple items are indexed, the first item has index [0], the second item has index [1] etc.

Ordered

When we say that tuples are ordered, it means that the items have a defined order, and that order will not change.

Unchangeable

Tuples are unchangeable, meaning that we cannot change, add or remove items after the tuple has been created.

Allow Duplicates

Since tuples are indexed, they can have items with the same value:

Tuple Length

To determine how many items a tuple has, use the len() function:

Create Tuple With One Item

To create a tuple with only one item, you have to add a comma after the item, otherwise Python will not recognize it as a
tuple
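
A minimal sketch of the tuple properties above, including the one-item case. The values are illustrative.

```python
fruits = ("apple", "banana", "cherry", "apple")   # duplicates are allowed
print(len(fruits))

one_item = ("apple",)      # the trailing comma makes this a tuple
not_a_tuple = ("apple")    # without the comma this is just a string
print(type(one_item), type(not_a_tuple))

try:
    fruits[0] = "kiwi"     # tuples do not support item assignment
except TypeError as error:
    print("Cannot change a tuple:", error)
```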

Tuple Items - Data Types

Tuple items can be of any data type:

The tuple() Constructor

It is also possible to use the tuple() constructor to make a tuple.

Access Tuple Items

You can access tuple items by referring to the index number, inside square brackets:

Range of Indexes

You can specify a range of indexes by specifying where to start and where to end the range.

When specifying a range, the return value will be a new tuple with the specified items.

Check if Item Exists

To determine if a specified item is present in a tuple use the in keyword:
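
The sketch below, using illustrative values, combines the tuple() constructor with index access, a range of indexes, and a membership test.

```python
mixed = ("abc", 34, True, 40.5)     # items can be of any data type
fruits = tuple(["apple", "banana", "cherry", "orange", "kiwi"])   # tuple() constructor

print(fruits[1])      # second item
print(fruits[-1])     # last item
print(fruits[2:5])    # a new tuple with the items at index 2, 3 and 4

if "apple" in fruits:
    print("Yes, 'apple' is in the fruits tuple")
```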

Change Tuple Values

Once a tuple is created, you cannot change its values. Tuples are unchangeable, or immutable as it is also called.

But there is a workaround. You can convert the tuple into a list, change the list, and convert the list back into a tuple.

Remove Items

Tuples are unchangeable, so you cannot remove items from them, but you can use the same workaround as we used for
changing and adding tuple items:
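
The workaround can be sketched as follows: convert to a list, change the list, convert back. The values are illustrative.

```python
fruits = ("apple", "banana", "cherry")

as_list = list(fruits)       # tuple -> list
as_list[1] = "kiwi"          # change an item
as_list.append("orange")     # add an item
as_list.remove("apple")      # remove an item
fruits = tuple(as_list)      # list -> a new tuple

print(fruits)
```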

Unpacking a Tuple

When we create a tuple, we normally assign values to it. This is called "packing" a tuple. But in Python we are also
allowed to extract the values back into variables. This is called "unpacking":
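
A short sketch of packing and unpacking. The starred form at the end is standard Python syntax that goes slightly beyond the text above; it collects the leftover values into a list.

```python
fruits = ("apple", "banana", "cherry")    # packing
green, yellow, red = fruits               # unpacking: one variable per item
print(green, yellow, red)

# Assumption for illustration: fewer variables than values, using an asterisk
first, *rest = ("apple", "mango", "papaya", "cherry")
print(first, rest)
```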

Join Two Tuples

To join two or more tuples you can use the + operator:

Python Tuple count() Method

The count() method returns the number of times a specified value appears in the tuple.
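
Joining with + and counting with count(), sketched with illustrative tuples:

```python
letters = ("a", "b", "c")
numbers = (1, 2, 3)
joined = letters + numbers      # a new tuple containing all items
print(joined)

values = (1, 3, 7, 8, 7, 5, 4, 6, 8, 5)
fives = values.count(5)         # how many times 5 appears
print(fives)
```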

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Sets

Sets are used to store multiple items in a single variable.

Set is one of 4 built-in data types in Python used to store collections of data.

Sets are written with curly brackets.

Set Items

Set items are unordered, unchangeable, and do not allow duplicate values.

Unordered

Unordered means that the items in a set do not have a defined order.

Set items can appear in a different order every time you use them, and cannot be referred to by index or key

Unchangeable

Set items are unchangeable, meaning that we cannot change the items after the set has been created.

Once a set is created, you cannot change its items, but you can remove items and add new items.

Duplicates Not Allowed

Sets cannot have two items with the same value.

Get the Length of a Set

To determine how many items a set has, use the len() function.
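
A minimal sketch with an illustrative set. The duplicate value is silently dropped, and the printed order may differ from the order written in the code.

```python
fruits = {"apple", "banana", "cherry", "apple"}   # the second "apple" is ignored

print(fruits)         # order is not guaranteed
print(len(fruits))
```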

The set() Constructor

It is also possible to use the set() constructor to make a set.

Access Items

You cannot access items in a set by referring to an index or a key.

But you can loop through the set items using a for loop, or ask if a specified value is present in a set, by using the in
keyword.
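
The set() constructor, a for loop over the items, and a membership test, sketched with illustrative values:

```python
fruits = set(("apple", "banana", "cherry"))   # note the double round brackets

for fruit in fruits:      # items come out in no particular order
    print(fruit)

has_banana = "banana" in fruits
print(has_banana)
```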

Add Items

Once a set is created, you cannot change its items, but you can add new items.

To add one item to a set use the add() method.

Add Any Iterable

The object in the update() method does not have to be a set, it can be any iterable object (tuples, lists, dictionaries etc.)

Remove Item

To remove an item in a set, use the remove() or the discard() method. If the item does not exist, remove() raises an error, while discard() does not.
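
Adding and removing set items, sketched with illustrative values:

```python
fruits = {"apple", "banana", "cherry"}

fruits.add("orange")                  # add one item
fruits.update(["kiwi", "orange"])     # add items from any iterable (here a list)
fruits.remove("banana")               # raises KeyError if the item is missing
fruits.discard("papaya")              # no error, even though "papaya" is not in the set

print(fruits)
```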

Union

The union() method returns a new set with all items from both sets.

Join Multiple Sets

All the joining methods and operators can be used to join multiple sets.

When using a method, just add more sets in the parentheses, separated by commas:

Join a Set and a Tuple

The union() method allows you to join a set with other data types, like lists or tuples.

The result will be a set.

Update

The update() method inserts all items from one set into another.

The update() changes the original set, and does not return a new set.
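
A sketch of union() and update() with illustrative sets. The | operator is the operator form of union for two sets.

```python
letters = {"a", "b", "c"}
numbers = {1, 2, 3}
names = {"John", "Elena"}

joined = letters.union(numbers)             # new set; letters | numbers is equivalent
all_three = letters.union(numbers, names)   # several sets at once
with_tuple = letters.union((1, 2, 3))       # a set joined with a tuple gives a set

result = letters.update(numbers)            # changes letters in place and returns None
print(joined, all_three, with_tuple, letters)
```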

Intersection

Keep ONLY the duplicates

The intersection() method will return a new set, that only contains the items that are present in both sets.

The intersection_update()

This method will also keep ONLY the duplicates, but it will change the original set instead of returning a new set.
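
Both intersection variants, sketched with illustrative sets:

```python
set1 = {"apple", "banana", "cherry"}
set2 = {"google", "microsoft", "apple"}

common = set1.intersection(set2)    # new set; set1 & set2 is equivalent
print(common)

set1.intersection_update(set2)      # set1 itself now holds only the shared items
print(set1)
```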

Difference

The difference() method will return a new set that will contain only the items from the first set that are not present in the
other set.

The difference_update() Method

This will also keep the items from the first set that are not in the other set, but it will change the original set instead of
returning a new set.

Symmetric Differences

The symmetric_difference() method will keep only the elements that are NOT present in both sets.

The symmetric_difference_update() Method

This will also keep all but the duplicates, but it will change the original set instead of returning a new set.
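
The difference and symmetric difference methods, together with their in-place variants, can be sketched like this (illustrative sets; copies are used so each in-place call starts from the same data):

```python
a = {"apple", "banana", "cherry"}
b = {"google", "microsoft", "apple"}

only_in_a = a.difference(b)               # items of a that are not in b
in_one_only = a.symmetric_difference(b)   # items that are in exactly one of the sets

c = a.copy()
c.difference_update(b)                    # same result, but c itself is changed

d = a.copy()
d.symmetric_difference_update(b)          # same result, but d itself is changed

print(only_in_a, in_one_only)
```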

Python Set isdisjoint() Method

The isdisjoint() method returns True if none of the items are present in both sets, otherwise it returns False.

Python Set issubset() Method

The issubset() method returns True if all items in the set exist in the specified set, otherwise it returns False.

Python Set issuperset() Method

The issuperset() method returns True if all items in the specified set exist in the original set, otherwise it returns False.
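
A compact sketch of the three comparison methods, using illustrative sets:

```python
x = {"a", "b", "c"}
y = {"f", "e", "d", "c", "b", "a"}
z = {"google", "microsoft"}

print(x.isdisjoint(z))    # True: x and z share no items
print(x.isdisjoint(y))    # False: "a", "b" and "c" are in both
print(x.issubset(y))      # True: every item of x is in y
print(y.issuperset(x))    # True: y contains every item of x
```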

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Dictionaries
Dictionaries are used to store data values in key:value pairs.

A dictionary is a collection which is ordered*, changeable, and does not allow duplicate keys.

*As of Python version 3.7, dictionaries are ordered: they keep the order in which the items were inserted. In Python 3.6 and earlier, dictionaries are unordered.

Dictionary Items

Dictionary items are ordered, changeable, and do not allow duplicates.

Dictionary items are presented in key:value pairs, and can be referred to by using the key name.

Changeable

Dictionaries are changeable, meaning that we can change, add or remove items after the dictionary has been created.

Duplicates Not Allowed

Dictionaries cannot have two items with the same key:
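
A minimal sketch with an illustrative dictionary. When the same key is written twice, the later value overwrites the earlier one.

```python
car = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964,
    "year": 2020,    # duplicate key: this value replaces 1964
}

print(car["brand"])   # items are referred to by key name
print(car)
```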

Dictionary Length

To determine how many items a dictionary has, use the len() function:

Dictionary Items - Data Types

The values in dictionary items can be of any data type:

The dict() Constructor

It is also possible to use the dict() constructor to make a dictionary.

Python - Access Dictionary Items

You can access the items of a dictionary by referring to its key name, inside square brackets:
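
The sketch below covers length, mixed value types, the dict() constructor, and access by key. The get() method is a standard alternative to square brackets that is not described in the text above; it returns None instead of raising an error when the key is missing.

```python
car = {
    "brand": "Ford",                      # string
    "electric": False,                    # boolean
    "year": 1964,                         # integer
    "colors": ["red", "white", "blue"],   # list
}
print(len(car))

person = dict(name="John", age=36, country="Norway")   # dict() constructor
print(person)

print(car["brand"])        # access by key name
print(car.get("year"))     # same result with get()
print(car.get("owner"))    # missing key: get() returns None
```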

Python - Change Dictionary Items

You can change the value of a specific item by referring to its key name:

Update Dictionary

The update() method will update the dictionary with the items from the given argument.
The argument must be a dictionary, or an iterable object with key:value pairs.
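
Changing a value by key and with update(), sketched with illustrative data:

```python
car = {"brand": "Ford", "model": "Mustang", "year": 1964}

car["year"] = 2018                            # change one value by key
car.update({"year": 2020, "color": "red"})    # update from a dictionary (adds "color")
car.update([("model", "Bronco")])             # update from an iterable of key/value pairs

print(car)
```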

Python - Remove Dictionary Items

There are several methods to remove items from a dictionary:
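
A sketch of the usual ways to remove dictionary items, assuming these are the ones the original examples showed: pop(), popitem(), the del keyword, and clear().

```python
car = {"brand": "Ford", "model": "Mustang", "year": 1964, "color": "red"}

model = car.pop("model")      # removes the key and returns its value
last_item = car.popitem()     # removes and returns the last inserted item (Python 3.7+)
del car["year"]               # del removes the item with the given key
remaining = dict(car)         # snapshot before emptying
car.clear()                   # empties the dictionary

print(model, last_item, remaining, car)
```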

Python - Copy Dictionaries

Python - Nested Dictionaries

A dictionary can contain dictionaries. This is called nested dictionaries.

Python Dictionary fromkeys() Method

The fromkeys() method returns a dictionary with the specified keys and the specified value.

Python Dictionary setdefault() Method

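The setdefault() method returns the value of the item with the specified key. If the key does not exist, it inserts the key with the specified default value and returns that value.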
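
Nested dictionaries, fromkeys() and setdefault() sketched together. All names and values are illustrative.

```python
family = {
    "child1": {"name": "Emil", "year": 2004},
    "child2": {"name": "Tobias", "year": 2007},
}
print(family["child2"]["name"])      # outer key first, then inner key

keys = ("key1", "key2", "key3")
defaults = dict.fromkeys(keys, 0)    # every key gets the same value
print(defaults)

car = {"brand": "Ford", "model": "Mustang"}
model = car.setdefault("model", "Bronco")   # key exists: existing value is returned
color = car.setdefault("color", "white")    # key missing: inserted with the default
print(model, color, car)
```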

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Conditions and if else statements


Python supports the usual logical conditions from mathematics:

• Equals: a == b

• Not Equals: a != b

• Less than: a < b

• Less than or equal to: a <= b

• Greater than: a > b

• Greater than or equal to: a >= b

• These conditions can be used in several ways, most commonly in "if statements" and loops.

• An "if statement" is written by using the if keyword.

For example, take two variables, a and b, used in an if statement to test whether b is greater than a. As a is 599
and b is 600, we know that 600 is greater than 599, and so we print to the screen that "b is greater than a".
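
The description above corresponds to a sketch like this one, using the values 599 and 600 from the text:

```python
a = 599
b = 600

if b > a:
    print("b is greater than a")   # the indented line belongs to the if statement
```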

Indentation

Python relies on indentation (whitespace at the beginning of a line) to define scope in the code. Other programming
languages often use curly-brackets for this purpose.

Elif

The elif keyword is Python's way of saying "if the previous conditions were not true, then try this condition"

Else

The else keyword catches anything which isn't caught by the preceding conditions.
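
A small sketch of if, elif and else working together. The function compare is a hypothetical helper introduced only so the three branches can be tried with different inputs.

```python
def compare(a, b):
    if b > a:
        return "b is greater than a"
    elif a == b:                  # tried only if the first condition was false
        return "a and b are equal"
    else:                         # catches everything else
        return "a is greater than b"

print(compare(33, 200))
print(compare(33, 33))
print(compare(200, 33))
```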

Short Hand If

If you have only one statement to execute, you can put it on the same line as the if statement.

Short Hand If ... Else

If you have only one statement to execute, one for if, and one for else, you can put it all on the same line:
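
Both short-hand forms, sketched with illustrative values. The last line shows the same one-line if ... else used to pick a value rather than to print.

```python
a = 200
b = 33

if a > b: print("a is greater than b")    # short hand if

print("A") if a > b else print("B")       # short hand if ... else

larger = a if a > b else b                # the same form used as an expression
print(larger)
```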

And

The and keyword is a logical operator, and is used to combine conditional statements:

Or

The or keyword is a logical operator, and is used to combine conditional statements:

Not

The not keyword is a logical operator, and is used to reverse the result of the conditional statement:

Nested If

You can have if statements inside if statements, this is called nested if statements.

The pass Statement

if statements cannot be empty, but if you for some reason have an if statement with no content, put in the pass
statement to avoid getting an error.
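
The logical operators, a nested if, and pass can be sketched as follows (values are illustrative):

```python
a, b, c = 200, 33, 500

if a > b and c > a:
    print("Both conditions are True")
if a > b or a > c:
    print("At least one of the conditions is True")
if not a > b:
    print("a is NOT greater than b")    # not printed, because a > b is True

x = 41
if x > 10:
    print("Above ten,")
    if x > 20:                          # nested if
        print("and also above 20!")

if b > a:
    pass    # placeholder: an if statement without a body would be a syntax error
```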

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python For Loops


A for loop is used for iterating over a sequence (that is either a list, a tuple, a dictionary, a set, or a string).

This is less like the for keyword in other programming languages, and works more like an iterator method as found in
other object-oriented programming languages.

With the for loop we can execute a set of statements, once for each item in a list, tuple, set etc.

Looping Through a String

Even strings are iterable objects, they contain a sequence of characters:
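
A minimal sketch of looping over a list and over a string. The values are illustrative.

```python
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:       # runs once for each item
    print(fruit)

for letter in "banana":    # a string yields its characters one by one
    print(letter)
```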

The break Statement

With the break statement we can stop the loop before it has looped through all the items:

The continue Statement

With the continue statement we can stop the current iteration of the loop, and continue with the next:
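
The difference between break and continue, sketched on the same illustrative list. The two result lists are only there to make the effect visible.

```python
fruits = ["apple", "banana", "cherry"]

before_break = []
for fruit in fruits:
    if fruit == "banana":
        break                  # leave the loop entirely
    before_break.append(fruit)

without_banana = []
for fruit in fruits:
    if fruit == "banana":
        continue               # skip this item, carry on with the next
    without_banana.append(fruit)

print(before_break)
print(without_banana)
```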

The range() Function

To loop through a set of code a specified number of times, we can use the range() function. The range() function returns
a sequence of numbers, starting from 0 by default, incrementing by 1 (by default), and ending before a specified number.

Else in For Loop

The else keyword in a for loop specifies a block of code to be executed when the loop is finished. The else block is not executed if the loop is stopped by a break statement:
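
A sketch of range() with one, two and three arguments, followed by else in a for loop. The stop value is never included in the sequence.

```python
print(list(range(6)))          # 0 to 5
print(list(range(2, 6)))       # 2 to 5
print(list(range(2, 30, 3)))   # 2, 5, 8, ... in steps of 3, stopping before 30

finished = False
for x in range(3):
    print(x)
else:
    finished = True            # runs because the loop ended normally
    print("Finally finished!")

for x in range(6):
    if x == 3:
        break
else:
    print("Not printed: the loop was stopped by break")
```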

Nested Loops

A nested loop is a loop inside a loop.

The "inner loop" will be executed one time for each iteration of the "outer loop":

The pass Statement

for loops cannot be empty, but if you for some reason have a for loop with no content, put in the pass statement to
avoid getting an error.
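
A nested loop and an empty loop body, sketched with illustrative lists. The inner loop runs completely for every item of the outer loop.

```python
adjectives = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

pairs = []
for adjective in adjectives:      # outer loop
    for fruit in fruits:          # inner loop
        pairs.append(adjective + " " + fruit)
print(pairs)

for x in [0, 1, 2]:
    pass    # placeholder: a for loop without a body would be a syntax error
```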

The while Loop

With the while loop we can execute a set of statements as long as a condition is true.

The break Statement

With the break statement we can stop the loop even if the while condition is true:

The continue Statement

With the continue statement we can stop the current iteration, and continue with the next:

The else Statement

With the else statement we can run a block of code once when the condition no longer is true:
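
The while loop with else, and with continue and break, can be sketched as follows. The counter names and limits are illustrative.

```python
count = 1
while count < 6:
    print(count)
    count += 1      # without this line the loop would never end
else:
    print("count is no longer less than 6")

i = 0
printed = []
while i < 6:
    i += 1
    if i == 3:
        continue    # skip 3 and go on with the next iteration
    if i == 5:
        break       # stop the loop even though i < 6 is still true
    printed.append(i)
print(printed)
```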

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Dates in Python
A date in Python is not a data type of its own, but we can import a module named datetime to work with dates as date
objects.
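
A minimal sketch: import the datetime module and display the current date and time.

```python
import datetime

now = datetime.datetime.now()   # current local date and time
print(now)
```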

Date Output

When we execute the code from the example above the result will look like this (the exact value depends on when you run it):
2024-12-31 22:07:14.544191
The date contains year, month, day, hour, minute, second, and microsecond.
The datetime module has many methods to return information about the date object.
Here are a few examples, you will learn more about them later in this chapter:
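
For instance, the year and the name of the weekday can be read from a date object like this. The weekday name depends on the locale of the machine running the code.

```python
import datetime

x = datetime.datetime.now()

print(x.year)              # the year as a number
print(x.strftime("%A"))    # the full weekday name
```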

Creating Date Objects

To create a date, we can use the datetime() class (constructor) of the datetime module.
The datetime() class requires three parameters to create a date: year, month, day.
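
A short sketch with an illustrative date. The time parts (hour, minute, second, microsecond) are optional and default to 0.

```python
import datetime

x = datetime.datetime(2020, 5, 17)   # year, month, day
print(x)                             # 2020-05-17 00:00:00
```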

The strftime() Method

The datetime object has a method for formatting date objects into readable strings. The method is called strftime(), and
takes one parameter, format, to specify the format of the returned string:
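
A sketch of strftime() with a few common format codes and an illustrative date. Month and weekday names such as %B depend on the locale; the numeric codes do not.

```python
import datetime

x = datetime.datetime(2018, 6, 1, 14, 30)

print(x.strftime("%B"))          # full month name
print(x.strftime("%Y-%m-%d"))    # 2018-06-01
print(x.strftime("%H:%M"))       # 14:30
print(x.strftime("%d/%m/%y"))    # 01/06/18
```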

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check if a string contains the specified search pattern.

RegEx Module

Python has a built-in package called re, which can be used to work with Regular Expressions.

Import the re module:

RegEx in Python

When you have imported the re module, you can start using regular expressions:
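
A minimal sketch: import re and check whether an illustrative string starts with "The" and ends with "Spain".

```python
import re

txt = "The rain in Spain"
match = re.search("^The.*Spain$", txt)

if match:
    print("YES! We have a match!")
else:
    print("No match")
```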

The findall() Function

The findall() function returns a list containing all matches.
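
With the same illustrative string, findall() returns the matches in the order they are found, or an empty list when nothing matches:

```python
import re

txt = "The rain in Spain"

found = re.findall("ai", txt)
print(found)                     # ['ai', 'ai']

nothing = re.findall("Portugal", txt)
print(nothing)                   # []
```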

Metacharacters

Metacharacters are characters with a special meaning:

Special Sequences

A special sequence is a \ followed by one of the characters in the examples given below, and has a special meaning:

Sets

A set is a set of characters inside a pair of square brackets [] with a special meaning:
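
The sketch below shows a few metacharacters, special sequences and sets in action. The sample string and the selection of patterns are illustrative and do not reproduce the original reference tables.

```python
import re

txt = "The rain in Spain costs 59 dollars"

# Metacharacters
starts_with_the = re.findall(r"^The", txt)       # ^ : starts with
any_two_chars = re.findall(r"r..n", txt)         # . : any character
either = re.findall(r"Spain|Portugal", txt)      # | : either/or

# Special sequences
digits = re.findall(r"\d+", txt)                 # \d : a digit, + : one or more
word_with_s = re.findall(r"\bS\w+", txt)         # \b : word boundary, \w : word character

# Sets
single_digits = re.findall(r"[0-9]", txt)        # any one character from 0 to 9
not_letters = re.findall(r"[^a-zA-Z ]", txt)     # any character that is not a letter or a space

print(starts_with_the, any_two_chars, either)
print(digits, word_with_s)
print(single_digits, not_letters)
```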

RegEx Functions

The re module offers a set of functions that allows us to search a string for a match:

The search() Function

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:

Match Object

A Match Object is an object containing information about the search and the result.

Note: If there is no match, the value None will be returned, instead of the Match Object.

The Match object has properties and methods used to retrieve information about the search, and the result:

.span() returns a tuple containing the start-, and end positions of the match.

.string returns the string passed into the function.

.group() returns the part of the string where there was a match.
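
A sketch of search() and the Match object, using an illustrative string. Positions are counted from 0.

```python
import re

txt = "The rain in Spain"

first_space = re.search(r"\s", txt)    # only the first whitespace character is returned
print(first_space.start())             # 3

match = re.search(r"\bS\w+", txt)      # a word starting with an upper case "S"
print(match.span())                    # (12, 17)
print(match.string)                    # the string that was searched
print(match.group())                   # Spain

missing = re.search("Portugal", txt)
print(missing)                         # None, because there is no match
```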

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python string formatting


F-strings were introduced in Python 3.6 and are now the preferred way of formatting strings.

Before Python 3.6 we had to use the format() method.

F-Strings

An f-string allows you to format selected parts of a string.

To specify a string as an f-string, simply put an f in front of the string literal, like this:

Placeholders and Modifiers

To format values in an f-string, add placeholders {}. A placeholder can contain variables, operations, functions, and
modifiers to format the value.

A modifier is included by adding a colon : followed by a legal formatting type, like .2f, which means a fixed-point number
with 2 decimals.
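
A minimal sketch of an f-string with a plain placeholder and with the .2f modifier. The values are illustrative.

```python
age = 36
txt = f"My name is John, I am {age}"            # the f prefix marks an f-string
print(txt)

price = 59
formatted = f"The price is {price:.2f} dollars"   # .2f: two decimals
print(formatted)
```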

Perform Operations in F-Strings

You can perform Python operations inside the placeholders.

You can do math operations:

You can use conditional (if...else) expressions inside the placeholders:

Execute Functions in F-Strings

You can execute functions inside the placeholder:

The function does not have to be a built-in Python method, you can create your own functions and use them:
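
Operations, a conditional expression, a built-in method and a user-defined function inside placeholders. The function to_meters and all values are hypothetical examples.

```python
price = 59
tax = 0.25

total = f"The price is {price + price * tax} dollars"           # math operation
verdict = f"It is very {'Expensive' if price > 50 else 'Cheap'}"  # if...else expression

fruit = "apples"
shout = f"I love {fruit.upper()}"                                # built-in string method

def to_meters(feet):
    return feet * 0.3048

altitude = f"The plane is flying at {to_meters(30000):.0f} meters"   # own function plus modifier

print(total, verdict, shout, altitude, sep="\n")
```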

More Modifiers

There are several other modifiers that can be used to format values:
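
A few other modifiers from Python's format specification, shown as a sketch. This is a selection for illustration, not a reconstruction of the original table.

```python
print(f"{59000:,}")         # comma as thousands separator: 59,000
print(f"{0.25:.0%}")        # percentage: 25%
print(f"{5:b}")             # binary: 101
print(f"{255:x}")           # hexadecimal, lower case: ff
print(f"{7:03d}")           # zero-padded to 3 digits: 007
print(f"{'left':<8}|")      # left aligned within 8 characters
print(f"{'right':>8}|")     # right aligned within 8 characters
print(f"{'mid':^7}|")       # centered within 7 characters
```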

String format()

Before Python 3.6 we used the format() method to format strings. The format() method can still be used, but f-strings
are faster and the preferred way to format strings. The next examples demonstrate how to format strings
with the format() method. The format() method also uses curly brackets {} as placeholders, but the syntax is slightly
different:

Index Numbers

You can use index numbers (a number inside the curly brackets {0}) to be sure the values are placed in the correct
placeholders:

Named Indexes

You can also use named indexes by entering a name inside the curly brackets {carname}, but then you must use names
when you pass the parameter values txt.format(carname = "Ford"):
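
The format() variants above, sketched with illustrative values: empty placeholders, a modifier, index numbers, and named indexes.

```python
price = 49

plain = "The price is {} dollars".format(price)
two_decimals = "The price is {:.2f} dollars".format(price)

order = "I want {0} pieces of item number {1} for {2:.2f} dollars.".format(3, 567, price)
reused = "His name is {1}. {1} is {0} years old.".format(36, "John")   # an index can be reused

named = "I have a {carname}, it is a {model}.".format(carname="Ford", model="Mustang")

print(plain, two_decimals, order, reused, named, sep="\n")
```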

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Functions
• A function is a block of code which only runs when it is called.
• You can pass data, known as parameters, into a function.
• A function can return data as a result.

Creating a Function

In Python a function is defined using the def keyword:

Calling a Function

To call a function, use the function name followed by parentheses:

Arguments

• Information can be passed into functions as arguments.

• Arguments are specified after the function name, inside the parentheses. You can add as many arguments as
you want, just separate them with a comma.

• The following example has a function with one argument (fname). When the function is called, we pass along a
first name, which is used inside the function to print the full name:
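
Defining and calling a function, with and without an argument. The function names and the last name "Smith" are illustrative.

```python
def my_function():
    print("Hello from a function")

my_function()    # the function name followed by parentheses

def print_full_name(fname):
    # fname is filled in with the value passed by the caller
    print(fname + " Smith")

print_full_name("Emil")
print_full_name("Tobias")
```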

Parameters or Arguments?

• The terms parameter and argument can be used for the same thing: information that is passed into a function.

From a function's perspective:

• A parameter is the variable listed inside the parentheses in the function definition.

• An argument is the value that is sent to the function when it is called.

Number of Arguments

By default, a function must be called with the correct number of arguments. Meaning that if your function expects 2
arguments, you have to call the function with 2 arguments, not more, and not less.

Arbitrary Arguments, *args

• If you do not know how many arguments will be passed into your function, add a * before the parameter
name in the function definition.

• This way the function will receive a tuple of arguments, and can access the items accordingly.
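
A sketch of the argument-count rule and of *args. The function names and values are hypothetical.

```python
def full_name(fname, lname):
    return fname + " " + lname

print(full_name("Emil", "Smith"))     # two parameters, two arguments

try:
    full_name("Emil")                 # one argument too few
except TypeError as error:
    print("TypeError:", error)

def youngest_child(*kids):
    # kids is a tuple holding every positional argument
    return kids[-1]

print(youngest_child("Emil", "Tobias", "Linus"))
```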

Keyword Arguments

• You can also send arguments with the key = value syntax.

• This way the order of the arguments does not matter.

Arbitrary Keyword Arguments, **kwargs

• If you do not know how many keyword arguments will be passed into your function, add two asterisks (**)
before the parameter name in the function definition.
• This way the function will receive a dictionary of arguments, and can access the items accordingly:
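
Keyword arguments and **kwargs, sketched with hypothetical function names and values:

```python
def youngest_child(child1, child2, child3):
    return child3

# With key = value syntax the order of the arguments does not matter
print(youngest_child(child3="Linus", child1="Emil", child2="Tobias"))

def last_name(**kid):
    # kid is a dictionary holding every keyword argument
    return kid["lname"]

print(last_name(fname="Tobias", lname="Smith"))
```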

Default Parameter Value

• The following example shows how to use a default parameter value.


• If we call the function without argument, it uses the default value:
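
A minimal sketch of a default parameter value. The function name and the countries are illustrative.

```python
def origin(country="Norway"):
    return "I am from " + country

print(origin("Sweden"))   # the argument overrides the default
print(origin())           # no argument: the default value is used
```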

Passing a List as an Argument

• You can send any data types of argument to a function (string, number, list, dictionary etc.), and it will be
treated as the same data type inside the function.

• E.g. if you send a List as an argument, it will still be a List when it reaches the function:

Return Values

To let a function return a value, use the return statement:
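
Passing a list as an argument and returning a value, sketched with hypothetical functions:

```python
def print_each(food):
    # food is still a list inside the function
    for item in food:
        print(item)
    return len(food)

def times_five(x):
    return 5 * x

count = print_each(["apple", "banana", "cherry"])
print(count)
print(times_five(3))
```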

The pass Statement

Function definitions cannot be empty, but if for some reason you have a function definition with no content, put in the
pass statement to avoid getting an error.

Positional-Only Arguments

• You can specify that a function can have ONLY positional arguments, or ONLY keyword arguments.

• To specify that a function can have only positional arguments, add , / after the arguments:

Keyword-Only Arguments

To specify that a function can have only keyword arguments, add *, before the arguments:

Combine Positional-Only and Keyword-Only

• You can combine the two argument types in the same function.
• Any arguments before the / are positional-only, and any arguments after the * are keyword-only.
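
Positional-only, keyword-only, and combined parameters, sketched with hypothetical functions. The / syntax requires Python 3.8 or later.

```python
def positional_only(x, /):
    return x

def keyword_only(*, x):
    return x

def combined(a, b, /, *, c, d):
    return a + b + c + d

print(positional_only(3))
print(keyword_only(x=3))
print(combined(5, 6, c=7, d=8))

try:
    positional_only(x=3)      # a keyword is not allowed for a positional-only parameter
except TypeError as error:
    print("TypeError:", error)
```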

Recursion

• Python also accepts function recursion, which means a defined function can call itself.

• Recursion is a common mathematical and programming concept. It means that a function calls itself. This has
the benefit of meaning that you can loop through data to reach a result.

• The developer should be very careful with recursion as it can be quite easy to slip into writing a function which
never terminates, or one that uses excess amounts of memory or processor power. However, when written
correctly recursion can be a very efficient and mathematically-elegant approach to programming.

• In this example, tri_recursion() is a function that we have defined to call itself ("recurse"). We use the k variable
as the data, which decrements (-1) every time we recurse. The recursion ends when the condition is not greater
than 0 (i.e. when it is 0).

• For a new developer it can take some time to work out exactly how this works. The best way to find out is by testing
and modifying it.
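
The tri_recursion() function described above might look like the sketch below. It is reconstructed from the description (k is reduced by 1 on every call and the recursion stops at 0), so details may differ from the original example.

```python
def tri_recursion(k):
    if k > 0:
        result = k + tri_recursion(k - 1)   # the function calls itself with k - 1
        print(result)
    else:
        result = 0                          # recursion ends when k is 0
    return result

print("Recursion Example Results:")
tri_recursion(6)
```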

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Generators

• Python generators are an easy way of creating iterators. A generator produces values one at a time instead of returning the entire sequence at once.

• A generator function is a special type of function which returns an iterator object.

• In a generator function, a yield statement is used rather than a return statement to produce values.

• A generator function may still contain a return statement, but reaching it terminates the execution of the function and no further values are produced.

• The difference between yield and return is that once yield returns a value, the function is paused and control is transferred to the caller. Local variables and their states are remembered between successive calls. In the case of a return statement, the value is returned and the execution of the function is terminated.

• The iterator methods __iter__() and __next__() are implemented automatically for generators, so they work with iter(), next() and for loops.

• Simple generators can be easily created using generator expressions. Generator expressions create anonymous generators, much like lambda creates anonymous functions.

• The syntax of a generator expression is similar to that of a list comprehension; the only difference is that the square brackets are replaced with round parentheses. A list comprehension produces the entire list, while a generator expression produces one item at a time, which is more memory efficient.
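
A minimal sketch of a generator function and a generator expression. The function countdown is a hypothetical example.

```python
def countdown(n):
    while n > 0:
        yield n      # pause here and hand n to the caller
        n -= 1       # execution resumes here on the next request

gen = countdown(3)
first = next(gen)    # runs the function up to the first yield
second = next(gen)   # continues where it was paused
rest = list(gen)     # consumes whatever is left
print(first, second, rest)

squares_list = [x * x for x in range(5)]   # list comprehension: builds the whole list
squares_gen = (x * x for x in range(5))    # generator expression: one item at a time
total = sum(squares_gen)
print(squares_list, total)
```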

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python Lambda
• A lambda function is a small anonymous function.

• A lambda function can take any number of arguments, but can only have one expression.

• lambda arguments : expression

• The expression is executed and the result is returned:
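
Two small lambda functions as a sketch of the syntax. The names add_ten and multiply are illustrative.

```python
add_ten = lambda a: a + 10         # one argument
multiply = lambda a, b: a * b      # two arguments, still a single expression

print(add_ten(5))
print(multiply(5, 6))
```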

Why Use Lambda Functions?

• The power of lambda is better shown when you use them as an anonymous function inside another function.

• Say you have a function definition that takes one argument, and that argument will be multiplied with an
unknown number:
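
The idea above can be sketched with a function that returns a lambda. The name make_multiplier is hypothetical; n is the "unknown number" fixed when the lambda is created.

```python
def make_multiplier(n):
    return lambda a: a * n     # the lambda remembers n

doubler = make_multiplier(2)
tripler = make_multiplier(3)

print(doubler(11))
print(tripler(11))
```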

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Python JSON
JSON is a syntax for storing and exchanging data.

JSON is text, written with JavaScript object notation.

Convert from Python to JSON

If you have a Python object, you can convert it into a JSON string by using the json.dumps() method.

You can convert Python objects of the following types, into JSON strings:

dict, list, tuple, string, int, float, True, False, None
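
A minimal sketch of json.dumps() on an illustrative dictionary. Note how Python values are translated: True becomes true, None becomes null, and a tuple becomes a JSON array.

```python
import json

person = {
    "name": "John",
    "age": 30,
    "married": True,
    "children": ("Ann", "Billy"),
    "pets": None,
}

text = json.dumps(person)    # the result is a Python string containing JSON
print(text)
```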

Format the Result

The example above prints a JSON string, but it is not very easy to read, with no indentations and line breaks.
The json.dumps() method has parameters to make it easier to read the result:

You can also define the separators. The default value is (", ", ": ") when no indent is given, which means using a comma and a space to separate each
object, and a colon and a space to separate keys from values:

Order the Result

The json.dumps() method has parameters to order the keys in the result:
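
The indent, separators and sort_keys parameters, sketched with an illustrative dictionary:

```python
import json

person = {"name": "John", "age": 30, "city": "Oslo"}

pretty = json.dumps(person, indent=4)                  # line breaks and 4 spaces of indentation
compact = json.dumps(person, separators=(",", ":"))    # no spaces after separators
ordered = json.dumps(person, sort_keys=True)           # keys in alphabetical order

print(pretty)
print(compact)
print(ordered)
```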

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python List Comprehension


List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.

Example:

Given a list named demo, you want a new list containing only the items with the letter "c" in the name.

Without list comprehension you will have to write a for statement with a conditional test inside:
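A sketch of both approaches. The contents of the demo list are assumed here, since the original values are not shown:

```python
# Assumed sample values for the list called demo.
demo = ["spark", "python", "scala", "sql", "databricks"]

# Without list comprehension: a for statement with a conditional test inside.
newlist = []
for x in demo:
    if "c" in x:
        newlist.append(x)
print(newlist)  # ['scala', 'databricks']

# With list comprehension the same result fits on one line.
shorter = [x for x in demo if "c" in x]
print(shorter)  # ['scala', 'databricks']
```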

The Syntax

• newlist = [expression for item in iterable if condition == True]


• The return value is a new list, leaving the old list unchanged.

Condition

The condition is like a filter that only accepts the items that evaluate to True.

The condition if x != "apple" will return True for all elements other than "apple", making the new list contain all fruits
except "apple".

The condition is optional and can be omitted:
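A brief sketch with an assumed fruits list, first with the filter and then with the condition left out:

```python
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]

# Keep everything except "apple".
no_apple = [x for x in fruits if x != "apple"]
print(no_apple)  # ['banana', 'cherry', 'kiwi', 'mango']

# No condition: every item is accepted, giving a copy of the list.
everything = [x for x in fruits]
print(everything)
```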



Iterable

The iterable can be any iterable object, like a list, tuple, set etc.
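For instance, range() and a tuple both work as the iterable. The variable names below are illustrative:

```python
# range() is an iterable, so it can feed a comprehension directly.
first_ten = [x for x in range(10)]
under_five = [x for x in range(10) if x < 5]

# A tuple works the same way.
from_tuple = [x * 2 for x in (1, 2, 3)]

print(first_ten)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(under_five)  # [0, 1, 2, 3, 4]
print(from_tuple)  # [2, 4, 6]
```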

Expression

The expression is the current item in the iteration, but it is also the outcome, which you can manipulate before it ends
up as an item in the new list:
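Two hedged examples of manipulating the expression, using an assumed fruits list: one calls a string method, the other uses a conditional expression to swap a value:

```python
fruits = ["apple", "banana", "cherry"]

# Transform each item before it lands in the new list.
upper = [x.upper() for x in fruits]
print(upper)  # ['APPLE', 'BANANA', 'CHERRY']

# The expression can itself contain a condition: replace "banana" with "orange".
swapped = [x if x != "banana" else "orange" for x in fruits]
print(swapped)  # ['apple', 'orange', 'cherry']
```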


Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python PIP
PIP is a package manager for Python packages, or modules if you like. Note: If you have Python version 3.4 or later, PIP
is included by default.

What is a Package?

A package contains all the files you need for a module.

Modules are Python code libraries you can include in your project.

Check if PIP is Installed

Navigate your command line to the location of Python's script directory, and type the following:

Check PIP version:

C:\Users\Your Name\AppData\Local\Programs\Python\Python36-32\Scripts>pip --version

Download a Package

Downloading a package is very easy. Open the command line interface and tell PIP to download the package you want.
Navigate your command line to the location of Python's script directory, and type the following:

Download a package named "camelcase":

C:\Users\Your Name\AppData\Local\Programs\Python\Python36-32\Scripts>pip install camelcase

Using a Package

Once the package is installed, it is ready to use.

Import the "camelcase" package into your project.

Remove a Package

Use the uninstall command to remove a package:

Uninstall the package named "camelcase":

C:\Users\Your Name\AppData\Local\Programs\Python\Python36-32\Scripts>pip uninstall camelcase

The PIP Package Manager will ask you to confirm that you want to remove the camelcase package:

Uninstalling camelcase-0.2:
  Would remove:
    c:\users\Your Name\appdata\local\programs\python\python36-32\lib\site-packages\camelcase-0.2-py3.6.egg-info
    c:\users\Your Name\appdata\local\programs\python\python36-32\lib\site-packages\camelcase*
Proceed (y/n)?

Press y and the package will be removed.

List Packages

Use the list command to list all the packages installed on your system:

List installed packages:

C:\Users\Your Name\AppData\Local\Programs\Python\Python36-32\Scripts>pip list


Result:

Package          Version
camelcase        0.2
mysql-connector  2.1.6
pip              18.1
pymongo          3.6.1
setuptools       39.0.1

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python Try Exception


Exception Handling

When an error occurs, or exception as we call it, Python will normally stop and generate an error message.

These exceptions can be handled using the try statement:

The try block lets you test a block of code for errors.

The except block lets you handle the error.

The else block lets you execute code when there is no error.

The finally block lets you execute code regardless of the result of the try and except blocks.
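A minimal sketch of the try and except blocks working together. The helper parse_int is a hypothetical name used only for illustration:

```python
def parse_int(text):
    try:
        # int() raises ValueError when the text is not a whole number.
        return int(text)
    except ValueError:
        print(f"Could not convert {text!r} to an integer")
        return None

print(parse_int("42"))   # 42
print(parse_int("abc"))  # prints the message, then None
```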

Many Exceptions

You can define as many except blocks as you want, e.g. if you want to execute a special block of code for a special
kind of error:
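As a sketch, the hypothetical divide function below handles two different kinds of error with separate except blocks:

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        # Raised when b is 0.
        return "cannot divide by zero"
    except TypeError:
        # Raised when a or b is not a number.
        return "both values must be numbers"

print(divide(6, 3))    # 2.0
print(divide(6, 0))    # cannot divide by zero
print(divide(6, "x"))  # both values must be numbers
```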

Else

You can use the else keyword to define a block of code to be executed if no errors were raised:
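A short illustration of else, using a hypothetical describe function. The else block runs only when the try block finished without an error:

```python
def describe(text):
    try:
        number = int(text)
    except ValueError:
        return "not a number"
    else:
        # Reached only if int(text) succeeded.
        return f"parsed {number}"

print(describe("7"))    # parsed 7
print(describe("abc"))  # not a number
```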


Finally

The finally block, if specified, will be executed regardless of whether the try block raises an error or not.
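The sketch below records each step in a list so the order is visible. The names events and run are assumptions made for this example:

```python
events = []

def run(action):
    try:
        events.append("start")
        action()
    except ZeroDivisionError:
        events.append("error")
    finally:
        # Runs whether or not the try block raised an error.
        events.append("cleanup")

run(lambda: 1 / 0)   # raises inside the try block
run(lambda: 1 + 1)   # completes normally
print(events)
# ['start', 'error', 'cleanup', 'start', 'cleanup']
```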

Raise an exception

As a Python developer you can choose to throw an exception if a condition occurs.

To throw (or raise) an exception, use the raise keyword.
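A hedged example of raise. The function set_age and its validation rules are hypothetical:

```python
def set_age(age):
    if not isinstance(age, int):
        raise TypeError("Only integers are allowed")
    if age < 0:
        raise ValueError("Age cannot be negative")
    return age

# The caller can handle the raised exception like any other.
try:
    set_age(-1)
except ValueError as err:
    message = str(err)

print(message)  # Age cannot be negative
```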


Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python File Handling


Python File Open

File handling is an important part of any web application. Python has several functions for creating, reading, updating,
and deleting files.

File Handling

• The key function for working with files in Python is the open() function.

• The open() function takes two parameters: filename and mode.

• There are four different methods (modes) for opening a file:

• "r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist.

• "a" - Append - Opens a file for appending; creates the file if it does not exist.

• "w" - Write - Opens a file for writing and overwrites any existing content; creates the file if it does not exist.

• "x" - Create - Creates the specified file; raises an error if the file already exists.

• In addition, you can specify whether the file should be handled in text or binary mode:

• "t" - Text - Default value. Text mode.

• "b" - Binary - Binary mode (e.g. images).

Python File Open

Open a File on the Server

Assume we have the following file, located in the same folder as Python:

• To open the file, use the built-in open() function.

• The open() function returns a file object, which has a read() method for reading the content of the file:
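A minimal sketch of opening and reading a file. To keep it runnable anywhere, it first creates a small demofile.txt in a temporary folder; the file name and its contents are assumptions:

```python
import os
import tempfile

# Assumption: create a sample file so the example does not depend on an existing one.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demofile.txt")
with open(path, "w") as setup:
    setup.write("Hello! Welcome to demofile.txt\nThis file is for testing purposes.\nGood Luck!\n")

f = open(path, "r")   # "r" is the default mode and could be omitted
content = f.read()    # read the whole file as one string
f.close()
print(content)
```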


Read Only Parts of the File

By default the read() method returns the whole text, but you can also specify how many characters you want to return:

Read Lines

You can return one line by using the readline() method:
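The two reading techniques above can be sketched together. The sample file is created on the fly, and its name and contents are assumptions:

```python
import os
import tempfile

# Assumption: a small sample file created just for this example.
path = os.path.join(tempfile.mkdtemp(), "demofile.txt")
with open(path, "w") as setup:
    setup.write("Hello! Welcome to demofile.txt\nThis file is for testing purposes.\nGood Luck!\n")

f = open(path)
first_five = f.read(5)        # only the first 5 characters
rest_of_line = f.readline()   # the remainder of line 1, including the newline
second_line = f.readline()    # calling readline() again returns the next line
f.close()

print(first_five)             # Hello
print(second_line.strip())    # This file is for testing purposes.
```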


Close Files

It is a good practice to always close the file when you are done with it.
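One common way to make sure a file is closed is the with statement, which closes it automatically when the block ends. A sketch, using an assumed sample file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demofile.txt")
with open(path, "w") as setup:
    setup.write("Good Luck!\n")

# Explicit close.
f = open(path)
line = f.readline()
f.close()

# The with statement closes the file for you, even if an error occurs inside it.
with open(path) as g:
    same_line = g.readline()

print(f.closed, g.closed)  # True True
```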

Python File Write

Write to an Existing File

To write to an existing file, you must add a parameter to the open() function:

• "a" - Append - will append to the end of the file

• "w" - Write - will overwrite any existing content
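A short sketch contrasting the two modes. The file name and text are illustrative:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demofile.txt")
with open(path, "w") as f:
    f.write("first line\n")

# "a" keeps the existing content and adds to the end.
with open(path, "a") as f:
    f.write("appended line\n")
with open(path) as f:
    after_append = f.read()

# "w" throws away the existing content before writing.
with open(path, "w") as f:
    f.write("replaced\n")
with open(path) as f:
    after_write = f.read()

print(after_append)
print(after_write)
```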

Python Delete File

Delete a File

To delete a file, you must import the os module, and run its os.remove() function:


Check if File Exists

To avoid getting an error, you might want to check if the file exists before you try to delete it:
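A sketch of the check followed by os.remove(), using a temporary file created only for the example:

```python
import os
import tempfile

# Assumption: a throwaway file to delete.
path = os.path.join(tempfile.mkdtemp(), "demofile.txt")
with open(path, "w") as f:
    f.write("temporary\n")

if os.path.exists(path):
    os.remove(path)
else:
    print("The file does not exist")

print(os.path.exists(path))  # False
```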

Delete Folder

To delete an entire folder, use the os.rmdir() function. It only removes empty folders:
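A minimal example, assuming an empty temporary folder created for the purpose:

```python
import os
import tempfile

folder = tempfile.mkdtemp()   # a new, empty folder
os.rmdir(folder)              # raises OSError if the folder is not empty

print(os.path.isdir(folder))  # False
```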

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python Class

Python Classes/Objects

Python is an object oriented programming language.

Almost everything in Python is an object, with its properties and methods.

A Class is like an object constructor, or a "blueprint" for creating objects.

Create a Class

To create a class, use the keyword class:

Create Object

Now we can use the class named cleverstudies to create objects.
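A sketch of creating a class and an object from it. The class name cleverstudies comes from the text; the property x and its value are assumptions:

```python
class cleverstudies:
    x = 5   # a property shared by every object of the class

p1 = cleverstudies()   # create an object from the class
print(p1.x)            # 5
```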

The __init__() Function

The examples above are classes and objects in their simplest form, and are not really useful in real life applications.

To understand the meaning of classes we have to understand the built-in __init__() method.

All classes have a method called __init__(), which is always executed when the class is being instantiated.

Use the __init__() method to assign values to object properties, or to perform other operations that are necessary
when the object is being created:
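A minimal sketch, assuming a Person class with name and age properties:

```python
class Person:
    def __init__(self, name, age):
        # Runs automatically when Person(...) is called.
        self.name = name
        self.age = age

p1 = Person("John", 36)
print(p1.name)  # John
print(p1.age)   # 36
```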


Note: The __init__() method is called automatically every time the class is being used to create a new object.

The __str__() Function

The __str__() method controls what should be returned when the class object is represented as a string.

If the __str__() method is not set, the default string representation of the object is returned:
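The difference can be sketched with two hypothetical classes, one without and one with a string representation method:

```python
class Plain:
    def __init__(self, name):
        self.name = name

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"{self.name} ({self.age})"

print(Plain("John"))       # default form, e.g. <__main__.Plain object at 0x...>
print(Person("John", 36))  # John (36)
```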

Object Methods

Objects can also contain methods. Methods in objects are functions that belong to the object.

Let us create a method in the Person class:
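A sketch of an object method. The method name greet is an assumption:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        # A method is a function that belongs to the object.
        return "Hello, my name is " + self.name

p1 = Person("John", 36)
print(p1.greet())  # Hello, my name is John
```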


The self Parameter

The self parameter is a reference to the current instance of the class, and is used to access variables that belong to the
class.

It does not have to be named self, you can call it whatever you like, but it has to be the first parameter of any function in
the class:
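To illustrate, the sketch below uses the made-up names this and abc in place of self. It works, although self remains the convention:

```python
class Person:
    def __init__(this, name, age):
        this.name = name
        this.age = age

    def greet(abc):
        # abc plays the role of self here.
        return "Hello, my name is " + abc.name

p1 = Person("John", 36)
print(p1.greet())  # Hello, my name is John
```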

Modify Object Properties
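Properties on an existing object can be changed by assigning a new value. A minimal sketch, assuming a Person class like the one used earlier:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)
p1.age = 40      # modify the property on this object only
print(p1.age)    # 40
```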


Delete Object Properties

You can delete properties on objects by using the del keyword:

Delete Objects

You can delete objects by using the del keyword:
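Both uses of del can be sketched as follows, again with an assumed Person class:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)

del p1.age                     # delete one property
has_age = hasattr(p1, "age")   # False

del p1                         # delete the object itself
try:
    p1
    p1_exists = True
except NameError:
    # The name p1 no longer refers to anything.
    p1_exists = False

print(has_age, p1_exists)  # False False
```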

The pass Statement

Class definitions cannot be empty, but if you for some reason have a class definition with no content, put in the pass
statement to avoid getting an error.
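For example, with a placeholder class name:

```python
class Person:
    pass   # nothing defined yet, but the class is valid

p1 = Person()
print(type(p1).__name__)  # Person
```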

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python Modules

What is a Module?

Consider a module to be the same as a code library: a file containing a set of functions you want to include in your
application.

Create a Module

To create a module just save the code you want in a file with the file extension .py:

Use a Module

Now we can use the module we just created, by using the import statement:

Variables in Module

The module can contain functions, as already described, but also variables of all types (lists, dictionaries, objects,
etc.):
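The steps above can be sketched in one runnable script. Normally you would save the module as a .py file by hand; here the script writes a hypothetical mymodule.py into a temporary folder and adds that folder to sys.path so the import works anywhere. The module name, function, and dictionary are all assumptions:

```python
import importlib
import os
import sys
import tempfile

# Contents of the hypothetical module file mymodule.py.
module_source = '''
def greeting(name):
    return "Hello, " + name

person1 = {"name": "John", "age": 36, "country": "India"}
'''

folder = tempfile.mkdtemp()
with open(os.path.join(folder, "mymodule.py"), "w") as f:
    f.write(module_source)

sys.path.insert(0, folder)       # let Python find the new file
importlib.invalidate_caches()

import mymodule                  # use the module

print(mymodule.greeting("Jonathan"))  # Hello, Jonathan
print(mymodule.person1["age"])        # 36
```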


Naming a Module

You can name the module file whatever you like, but it must have the file extension .py


Re-naming a Module

You can create an alias when you import a module, by using the as keyword:
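A small illustration using the standard math module, chosen here only because it is always available:

```python
import math as m   # m is now an alias for the math module

print(m.sqrt(16))  # 4.0
```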

Built-in Modules

There are several built-in modules in Python, which you can import whenever you like.


Using the dir() Function

There is a built-in function, dir(), that lists all the function names (or variable names) in a module:
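A sketch using the built-in platform module as the example. The choice of module is an assumption:

```python
import platform

names = dir(platform)          # a list of all names defined in the module
print("system" in names)       # True
print(platform.system())       # the operating system name on this machine
```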

Import From Module

You can choose to import only parts from a module, by using the from keyword.
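For instance, importing only two names from the standard math module (used here as a stand-in for any module):

```python
from math import pi, sqrt

# Imported names are used directly, without the module prefix.
print(sqrt(25))      # 5.0
print(round(pi, 2))  # 3.14
```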

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python Scope
A variable is only available inside the region in which it is created. This is called scope.

Local Scope

A variable created inside a function belongs to the local scope of that function, and can only be used inside that
function.
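A minimal sketch of local scope. The function name myfunc and the value 300 are illustrative:

```python
def myfunc():
    x = 300   # x exists only inside myfunc
    return x

result = myfunc()
print(result)  # 300

try:
    x
    visible_outside = True
except NameError:
    # x is not defined outside the function.
    visible_outside = False

print(visible_outside)  # False
```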

Function Inside Function

As explained in the example above, the variable x is not available outside the function, but it is available for any function
inside the function:
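A sketch of a nested function reading a variable from the enclosing function. All names are assumptions:

```python
def myfunc():
    x = 300

    def myinnerfunc():
        # The inner function can read x from the enclosing function.
        return x

    return myinnerfunc()

print(myfunc())  # 300
```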

Global Scope

A variable created in the main body of the Python code is a global variable and belongs to the global scope.

Global variables are available from within any scope, global and local.
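For example, with illustrative names:

```python
x = 300   # created in the main body, so it is global

def myfunc():
    return x   # readable inside the function

print(myfunc())  # 300
print(x)         # 300
```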


Naming Variables

If you operate with the same variable name inside and outside of a function, Python will treat them as two separate
variables, one available in the global scope (outside the function) and one available in the local scope (inside the
function):
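A short sketch showing that the two variables stay separate. Names and values are illustrative:

```python
x = 300

def myfunc():
    x = 200   # a new local x; the outer x is untouched
    return x

inner = myfunc()
print(inner)  # 200
print(x)      # 300
```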

Global Keyword

If you need to create a global variable, but are stuck in the local scope, you can use the global keyword. The global
keyword makes the variable global.
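A minimal sketch of the global keyword, with assumed names:

```python
def myfunc():
    global x   # x now refers to the global variable
    x = 300

myfunc()
print(x)  # 300
```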


Nonlocal Keyword

The nonlocal keyword is used to work with variables inside nested functions. The nonlocal keyword makes the variable
belong to the outer function.
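A sketch with two hypothetical nested functions. Without nonlocal, the assignment in the inner function would create a new local variable instead:

```python
def myfunc1():
    x = "Jane"

    def myfunc2():
        nonlocal x   # refer to x in myfunc1, not a new local variable
        x = "hello"

    myfunc2()
    return x

print(myfunc1())  # hello
```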

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python Inheritance
Inheritance allows us to define a class that inherits all the methods and properties from another class.

Parent class is the class being inherited from, also called base class.

Child class is the class that inherits from another class, also called derived class.

Create a Parent Class

Any class can be a parent class, so the syntax is the same as creating any other class:
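A sketch of a parent class. The class name Person and its members are assumptions:

```python
class Person:
    def __init__(self, fname, lname):
        self.firstname = fname
        self.lastname = lname

    def fullname(self):
        return f"{self.firstname} {self.lastname}"

x = Person("John", "Doe")
print(x.fullname())  # John Doe
```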

Create a Child Class

To create a class that inherits the functionality from another class, send the parent class as a parameter when creating
the child class:
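A minimal sketch, assuming a Person parent and a Student child that adds nothing of its own:

```python
class Person:
    def __init__(self, fname, lname):
        self.firstname = fname
        self.lastname = lname

    def fullname(self):
        return f"{self.firstname} {self.lastname}"

class Student(Person):   # Person is passed as the parent class
    pass

s = Student("Mike", "Olsen")
print(s.fullname())  # Mike Olsen
```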


Add the __init__() Function

So far we have created a child class that inherits the properties and methods from its parent.
We want to add the __init__() method to the child class (instead of the pass keyword).
When you add __init__() to the child class, it overrides the parent's __init__(), so the parent's version is no longer
called automatically.

Note: The __init__() method is called automatically every time the class is being used to create a new object.
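One way to keep the parent's initialisation is to call it explicitly from the child. A sketch with assumed Person and Student classes:

```python
class Person:
    def __init__(self, fname, lname):
        self.firstname = fname
        self.lastname = lname

class Student(Person):
    def __init__(self, fname, lname):
        # Call the parent's initialiser explicitly so its properties are still set.
        Person.__init__(self, fname, lname)

s = Student("Mike", "Olsen")
print(s.firstname, s.lastname)  # Mike Olsen
```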


Use the super() Function

Python also has a super() function that gives the child class access to the methods and properties of its parent,
without having to name the parent class:
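A sketch using super(). The extra graduationyear property is an assumption added to show the child extending the parent:

```python
class Person:
    def __init__(self, fname, lname):
        self.firstname = fname
        self.lastname = lname

class Student(Person):
    def __init__(self, fname, lname, year):
        super().__init__(fname, lname)   # run the parent's initialiser
        self.graduationyear = year       # then add a child-only property

s = Student("Mike", "Olsen", 2019)
print(s.firstname, s.graduationyear)  # Mike 2019
```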


Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].


Python Polymorphism
The word "polymorphism" means "many forms", and in programming it refers to methods/functions/operators with the
same name that can be executed on many objects or classes.

Function Polymorphism

An example of a Python function that can be used on different objects is the len() function.

String

For strings len() returns the number of characters:
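A short sketch of len() on a string. The tuple and dictionary lines are added as an assumption to show the same function working on other types:

```python
text = "Hello World!"
print(len(text))  # 12 characters

# Assumed extras: the same function on other object types.
print(len(("apple", "banana", "cherry")))     # 3 items in the tuple
print(len({"brand": "Ford", "year": 1964}))   # 2 key/value pairs
```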


Class Polymorphism

Polymorphism is often used in Class methods, where we can have multiple classes with the same method name.

For example, say we have three classes: Car, Boat, and Plane, and they all have a method called move():
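The three classes can be sketched as below. The class and method names come from the text; the brand and model values and the returned strings are assumptions:

```python
class Car:
    def __init__(self, brand, model):
        self.brand = brand
        self.model = model

    def move(self):
        return "Drive!"

class Boat:
    def __init__(self, brand, model):
        self.brand = brand
        self.model = model

    def move(self):
        return "Sail!"

class Plane:
    def __init__(self, brand, model):
        self.brand = brand
        self.model = model

    def move(self):
        return "Fly!"

# The same call works on every object, whatever its class.
moves = [v.move() for v in (Car("Ford", "Mustang"), Boat("Ibiza", "Touring 20"), Plane("Boeing", "747"))]
print(moves)  # ['Drive!', 'Sail!', 'Fly!']
```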


Inheritance Class Polymorphism

What about classes with child classes with the same name? Can we use polymorphism there?

Yes. If we use the example above and make a parent class called Vehicle, and make Car, Boat, and Plane child classes of
Vehicle, the child classes inherit the Vehicle methods, but can override them:

Child classes inherit the properties and methods from the parent class.
In the example above you can see that the Car class is empty, but it inherits brand, model, and move() from Vehicle.
The Boat and Plane classes also inherit brand, model, and move() from Vehicle, but they both override the move()
method.
Because of polymorphism we can execute the same method for all classes.
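A sketch of the arrangement described above. The class names, brand, model, and move() come from the text; the sample values and returned strings are assumptions:

```python
class Vehicle:
    def __init__(self, brand, model):
        self.brand = brand
        self.model = model

    def move(self):
        return "Move!"

class Car(Vehicle):
    pass   # empty: inherits brand, model, and move() unchanged

class Boat(Vehicle):
    def move(self):   # overrides the inherited method
        return "Sail!"

class Plane(Vehicle):
    def move(self):   # overrides the inherited method
        return "Fly!"

vehicles = (Car("Ford", "Mustang"), Boat("Ibiza", "Touring 20"), Plane("Boeing", "747"))
moves = [v.move() for v in vehicles]
print(moves)  # ['Move!', 'Sail!', 'Fly!']
```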

Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].

Resources on Internet for your reference


[FREE] Python Tutorial (w3schools.com)

[FREE] 1. Introduction to Python (youtube.com)

[FREE] Python For Data Engineering 1 : Introduction To Python #Python #DataEngineering (youtube.com)

[FREE] Python Introduction & Features | Tutorial for Beginners | Lecture 1

[PAID] Python Programming for Absolute Beginners to Intermediate | Codebasics

Reviews
Please consider leaving a review! After reading and using this book, why not share your thoughts on the site where you
purchased it? Your unbiased opinion will help potential readers make informed decisions, give Clever Studies valuable
feedback on our products, and provide our authors with insights on their work. Thank you!
