Python Ebook PDF
Page 1 of 159
PYTHON FOR DATA ENGINEERS (E-BOOK)
Contents
Who this book is for
Introduction to Python
Why Choose Python?
What Can Python Do?
History of Python
Python Versions
Features of Python
Python Installations
Indentation in Python
Comments in Python
Variables in Python
Numbers in Python
Strings in Python
Python Operators
Python Lists
Python Tuples
Python Sets
Python Dictionaries
Python Conditions and if else statements
Python For Loops
Python Dates
Python RegEx
Python string formatting
Python Functions
Python Generators
Python Lambda
Python Json
Python List Comprehension
Python PIP
Python Try Exception
Python File Handling
Python Class
Python Modules
Python Scope
Python Inheritance
Introduction to Python
Python is a widely-used, high-level programming language known for its versatility and simplicity. Created by Guido van
Rossum in 1991 and further developed by the Python Software Foundation, Python has become one of the most popular
programming languages today.
Designed with an emphasis on code readability, Python’s syntax allows developers to express ideas in fewer lines of code
compared to many other languages. This makes Python an excellent choice for rapid development and efficient system
integration.
As an interpreted, interactive, object-oriented scripting language, Python allows developers to work quickly and flexibly.
Its design prioritizes clarity, using English keywords where other languages might rely on punctuation, and featuring fewer
syntactical rules overall. This results in a language that is easy to read and understand, making it ideal for both beginners
and experienced developers.
• Database Connectivity: Python enables seamless connections to database systems, allowing for data
manipulation and retrieval.
• Big Data & Complex Calculations: Python is widely used for handling large datasets and performing intricate
mathematical computations.
• Prototyping & Development: Whether for rapid prototyping or developing production-ready software, Python is
an excellent choice.
History of Python
• The Python programming language was conceived in the 1980s, with its development beginning in 1989. It was
officially released to the public on February 20, 1991. Python was created by Guido van Rossum at the
Centrum Wiskunde & Informatica (CWI) in the Netherlands.
• Python was inspired by the ABC programming language, which served as its predecessor. Today, Python is maintained
and managed by the Python Software Foundation (PSF), a non-profit organization founded in 2001 and dedicated to
supporting the language's development.
• For more information about Python and its community, visit the official website at www.python.org.
Python Versions
Python has gone through several major versions, each bringing new features, improvements, and changes. Here are the
key versions:
1. Python 1.x
The first major release of Python, Python 1.x, introduced the language's foundational features. The "x" represents
minor version updates (e.g., 1.0, 1.1, etc.). This version is outdated and no longer in use.
2. Python 2.x
Python 2.x (first released in 2000) brought significant changes and enhancements compared to Python 1.x. The
Python 2 series reached its end of life on January 1, 2020 and is now considered outdated.
3. Python 3.x
Python 3.x is the current and active version of the language. The "x" here represents various updates and
improvements across the series (e.g., 3.0, 3.1, 3.2, etc.). Python 3.0, released in December 2008, was intentionally
not backward compatible with Python 2.x, so Python 2 code often needs changes before it runs on Python 3.
Python 3.x is the version recommended for use, as it includes modern features, optimizations, and support for the future
of Python development.
Features of Python
Python offers a wide range of features that make it a versatile and powerful programming language. Here are 11 key
features of Python:
1. Simple
Python's syntax is clear and easy to read, making it beginner-friendly and easy to learn.
3. Platform Independent
Python is cross-platform, allowing programs written in Python to run on different operating systems without
modification.
4. Dynamically Typed
Python determines the type of a variable at runtime, allowing for greater flexibility and ease of use.
5. Interpreted
Python code is run directly by the interpreter without a separate compilation step (the interpreter compiles the
source to bytecode behind the scenes), which makes testing and debugging easier and speeds up development.
8. Robust (Strong)
Python is known for its strong error-handling and built-in exceptions, which make it reliable and less prone to
crashing.
9. Extensible
Python can be extended by integrating with other languages, such as C or C++, to optimize performance for
specific tasks.
10. Embedded
Python can be embedded within other applications, enabling scripting capabilities in different software
environments.
Python Installations
You can install Python and VS Code on your machine by following the videos below, or you can create a free Databricks
Community Edition account for easy hands-on practice (use any one of these options).
Create a Free Databricks Community Edition Account (2025) & Run First Notebook
How to Run Python 3.13 in Visual Studio Code on Windows 10/11 [2024] | Run Sample Python Program
How to Set Up Python in Visual Studio Code on Mac | VSCode Python Development Basics On MacOS
Indentation in Python
In Python, indentation is critical because it defines the structure of the code. Unlike many other programming languages
that use braces {} to define blocks of code, Python uses indentation (spaces or tabs) to group statements.
Defines Code Blocks: Indentation tells Python which code belongs to which block (loops, conditionals, functions, etc.).
Improves Readability: Proper indentation makes the code easier to read and understand.
Rules of Indentation:
Consistency: Always use the same number of spaces or tabs for each indentation level. The standard (PEP 8) is 4 spaces
per indentation level.
No Mixing Tabs and Spaces: Stick to one method of indentation. Mixing tabs and spaces inconsistently raises a TabError
(a kind of IndentationError) in Python 3.
In a for loop, the print(i) statement is indented to show that it belongs to the loop.
In a function such as greet, the print() statement is indented to show that it is part of the function body.
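A minimal sketch of the two cases described above. The loop range, the greet parameter, and the message text are illustrative assumptions:

```python
# The indented print(i) is the body of the for loop
for i in range(3):
    print(i)

# The indented print() is the body of the greet function
def greet(name):
    print(f"Hello, {name}!")

greet("Data Engineer")
```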
A statement that belongs inside a block but is not indented raises an IndentationError.
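The sketch below shows Python rejecting a function body that is not indented. It passes the broken source to the built-in compile() so the script itself keeps running. The snippet text is an illustrative assumption:

```python
# Source code whose function body is NOT indented
bad_source = "def greet():\nprint('Hello')\n"

caught = None
try:
    # compile() checks the syntax without running the code
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    caught = err
    print("IndentationError:", err.msg)
```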
Tip: Use your text editor or IDE's "Auto-Indent" feature to avoid errors and maintain consistent indentation.
Summary:
Please open the accompanying file in a Databricks notebook or Microsoft Visual Studio for better code visualization
and execution [the file is available in the resource section inside the course].
Comments in Python
✓ Comments can be used to explain Python code.
✓ Comments can be used to make the code more readable.
✓ Comments can be used to prevent execution when testing code.
Creating a Comment
Comments are hints that we add to our code to make it easier to understand. Python comments start with #, and Python
ignores them:
Comments can be placed at the end of a line, and Python will ignore the rest of the line:
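A short sketch of full-line and end-of-line comments, plus a comment used to prevent a line from executing. The variable name and text are illustrative:

```python
# This is a full-line comment; Python ignores it
message = "Hello, World!"  # End-of-line comment: Python ignores the rest of the line
print(message)

# print("This line is commented out, so it never runs")
```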
Multiline Comments
Unlike languages such as C++ and Java, Python doesn't have a dedicated syntax for multi-line comments.
However, we can achieve the same effect by using the hash (#) symbol at the beginning of each line.
Since Python evaluates and then discards string literals that are not assigned to a variable, you can also add a
multiline string (triple quotes) to your code and place your comment inside it:
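A minimal sketch of both approaches, using illustrative values. Note that a triple-quoted string placed as the first statement of a function or module becomes its docstring rather than an ignored string.

```python
# This is a comment
# written on
# more than one line
total = 10

"""
This string literal is not assigned to a variable,
so Python evaluates it and then discards it.
"""
total += 5
print(total)  # 15
```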
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Variables in Python
Variables are containers for storing data values.
A variable is an identifier whose value can be changed or varied during program execution.
In real-world programs, we need data (literals) to make effective decisions. To make decisions on data, we must first
store the data in main memory (RAM). Primarily, the data stored in main memory is classified into 5 types. They are:
Rule 1: A variable name is a combination of letters, digits, and the special symbol underscore (_).
Rule 2: The first character of a variable name must be either a letter or an underscore (_); it cannot be a digit.
Rule 3: No special symbols other than the underscore (_) are allowed within a variable name.
Rule 4: Keywords cannot be used as variable names, because keywords are reserved words and their meaning cannot be
changed.
Rule 6: It is recommended to use meaningful, user-friendly variable names, but avoid excessively long names.
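The rules above can be checked with the standard library: str.isidentifier() tests the naming rules, and keyword.iskeyword() tests whether a name is reserved. The candidate names below are illustrative:

```python
import keyword

# Valid names: letters, digits and underscores, not starting with a digit
customer_id = 101
_temp = "staging"
order2 = 3

for name in ["customer_id", "_temp", "2order", "order-id", "for"]:
    # "for" passes isidentifier() but is a keyword, so it still cannot be a variable name
    print(f"{name!r}: identifier={name.isidentifier()}, keyword={keyword.iskeyword(name)}")
```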
You can get the data type of a variable with the type() function.
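A minimal sketch with illustrative values:

```python
x = 5
y = "John"
z = 3.14
print(type(x))  # <class 'int'>
print(type(y))  # <class 'str'>
print(type(z))  # <class 'float'>
```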
Note: Make sure the number of variables matches the number of values, or else you will get an error.
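A sketch of assigning many values to multiple variables in one line, and of the error you get when the counts do not match. The values are illustrative:

```python
# Many values to multiple variables: the counts must match
x, y, z = "Orange", "Banana", "Cherry"

# One value to multiple variables
a = b = c = 0

try:
    p, q = 1, 2, 3  # three values but only two variables
except ValueError as err:
    print("ValueError:", err)
```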
Output Variables
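The print() function can output several variables at once, either separated by commas or joined with +. A minimal sketch with illustrative values:

```python
x = "Python"
y = "is"
z = "awesome"
print(x, y, z)                 # Python is awesome
print(x + " " + y + " " + z)   # same output, built with + concatenation
```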
GLOBAL VARIABLES
Variables that are created outside of a function (as in all of the examples in the previous pages) are known as global
variables.
Global variables can be used anywhere in the program, both inside functions and outside.
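A minimal sketch of a function reading a global variable. The names and values are illustrative:

```python
x = "awesome"  # global variable, created outside any function

def my_func():
    # x is not assigned inside the function, so Python reads the global x
    return "Python is " + x

print(my_func())  # Python is awesome
```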
If you create a variable with the same name inside a function, this variable will be local, and can only be used inside the
function. The global variable with the same name will remain as it was, global and with the original value.
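A sketch of a local variable that shadows a global one with the same name. The values are illustrative:

```python
x = "awesome"

def my_func():
    x = "fantastic"  # local variable; the global x is left untouched
    return "Python is " + x

print(my_func())         # Python is fantastic
print("Python is " + x)  # Python is awesome
```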
Normally, when you create a variable inside a function, that variable is local, and can only be used inside that function.
To create a global variable inside a function, you can use the global keyword.
To change the value of a global variable inside a function, refer to the variable by using the global keyword:
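A minimal sketch of the global keyword, used both to create a new global variable and to change an existing one. The function names are hypothetical:

```python
x = "awesome"

def make_global():
    global y   # y becomes a global variable
    y = "created inside a function"

def change_global():
    global x   # refer to the existing global x instead of creating a local one
    x = "fantastic"

make_global()
change_global()
print(y)  # created inside a function
print(x)  # fantastic
```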
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Numbers in Python
Numbers are a fundamental data type in Python and play a crucial role in data engineering tasks. Python supports
multiple numeric types such as integers, floats, and complex numbers, which are essential for performing
mathematical operations, handling large datasets, and processing numerical data efficiently.
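A sketch of the three built-in numeric types and simple conversions between them, using illustrative values:

```python
a = 42        # int: whole numbers of unlimited size
b = 3.75      # float: numbers with a decimal point
c = 2 + 3j    # complex: real part + imaginary part (written with j)

print(type(a), type(b), type(c))
print(c.real, c.imag)  # 2.0 3.0

# Converting between numeric types
print(float(a))  # 42.0
print(int(b))    # 3 (int() truncates toward zero)
```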
In data engineering, random numbers are useful for data sampling, testing, and simulations.
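A sketch of sampling and test-data generation with the standard random module. The record IDs are hypothetical, and the printed values depend on the seed, so no output is shown:

```python
import random

random.seed(42)  # a fixed seed makes the results reproducible between runs

row_ids = list(range(1, 101))           # hypothetical IDs of 100 records
sample_ids = random.sample(row_ids, 5)  # 5 distinct IDs, no repeats
test_value = random.randint(1, 10)      # integer between 1 and 10, inclusive
noise = random.random()                 # float in the range [0.0, 1.0)
print(sample_ids, test_value, noise)
```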
✔ Use appropriate data types – Choose int, float, or decimal.Decimal based on precision requirements.
✔ Avoid floating-point precision errors – Use decimal.Decimal for financial calculations.
✔ Optimize performance – Prefer standard-library modules (math, random) over manual implementations.
✔ Use vectorized operations – In pandas and NumPy, prefer vectorized math operations for speed.
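The floating-point precision issue mentioned above can be seen directly. This sketch contrasts float arithmetic with decimal.Decimal and shows math.isclose() for tolerant float comparison:

```python
from decimal import Decimal
import math

print(0.1 + 0.2)         # 0.30000000000000004 (binary floating-point rounding)
print(0.1 + 0.2 == 0.3)  # False

# Decimal values built from strings stay exact, which suits money amounts
print(Decimal("0.1") + Decimal("0.2"))  # 0.3

# For floats, compare with a tolerance instead of ==
print(math.isclose(0.1 + 0.2, 0.3))     # True
```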
Strings in Python
Strings in python are surrounded by either single quotation marks, or double quotation marks.
You can use quotes inside a string, as long as they don't match the quotes surrounding the string:
Assigning a string to a variable is done with the variable name followed by an equal sign and the string:
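A minimal sketch of quoting and assignment, with illustrative text:

```python
a = "Hello"
b = 'Hello'
print(a == b)  # True: single and double quotes create the same string

print("It's alright")           # single quote inside double quotes
print('He is called "Johnny"')  # double quotes inside single quotes
```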
Multiline Strings
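A multiline string is created with three quotes (""" or '''), and the line breaks are kept inside the string. A minimal sketch; the SQL text is an illustrative assumption:

```python
query = """SELECT id, name
FROM customers
WHERE active = 1"""
print(query)
print(query.count("\n"))  # 2 line breaks are stored in the string
```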
Strings in Python are sequences of Unicode characters (code points).
Python does not have a separate character data type; a single character is simply a string with a length of 1.
Square brackets can be used to access elements of the string.
Since strings are sequences, we can loop through the characters in a string with a for loop.
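A minimal sketch of indexing and looping, with illustrative text:

```python
a = "Hello, World!"
print(a[1])  # e (indexing starts at 0)

for ch in "data":
    print(ch)  # prints d, a, t, a on separate lines
```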
String Length
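The built-in len() function returns the number of characters in a string. A minimal sketch:

```python
a = "Hello, World!"
print(len(a))  # 13 (the comma, space, and ! are counted too)
```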
Check String
To check if a certain phrase or character is present in a string, we can use the keyword in
Check if NOT
To check if a certain phrase or character is NOT present in a string, we can use the keyword not in.
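A sketch of both membership checks, with illustrative text:

```python
txt = "The best things in life are free!"
print("free" in txt)           # True
print("expensive" not in txt)  # True

if "free" in txt:
    print("Yes, 'free' is present.")
```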
Slicing
Specify the start index and the end index, separated by a colon, to return a part of the string. The character at the
end index is not included.
By leaving out the start index, the range will start at the first character:
By leaving out the end index, the range will go to the end:
Negative Indexing
Use negative indexes to start the slice from the end of the string:
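A sketch of the slicing forms described above, with illustrative text:

```python
b = "Hello, World!"
print(b[2:5])    # llo   (index 2, 3, 4; index 5 is excluded)
print(b[:5])     # Hello (start index omitted: from the first character)
print(b[2:])     # llo, World! (end index omitted: to the end)
print(b[-5:-2])  # orl   (negative indexes count from the end)
```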
String Handling
On string data, we can perform indexing and slicing operations, and we can also perform many other operations by using
the predefined methods of the str object.
capitalize()
Returns a copy of the string with its first character in uppercase and the remaining characters in lowercase.
title():
Returns the title case of a given sentence: the first letter of every word is uppercase and the remaining letters are
lowercase.
index()
Returns the position (index) of the first occurrence of the given value, and raises a ValueError if the value is not found.
Syntax: indexvalue = strobj.index(value)
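A sketch of capitalize(), title(), and index() on an illustrative sentence:

```python
s = "python FOR data engineers"
print(s.capitalize())   # Python for data engineers
print(s.title())        # Python For Data Engineers
print(s.index("data"))  # 11

try:
    s.index("java")     # value not present
except ValueError:
    print("'java' not found")
```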
upper()
Returns a copy of the string with all characters converted to uppercase.
lower()
Returns a copy of the string with all characters converted to lowercase.
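A minimal sketch; characters without case, such as digits, are left unchanged:

```python
s = "Data Engineering 101"
print(s.upper())  # DATA ENGINEERING 101
print(s.lower())  # data engineering 101
```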
isupper()
Returns True if all cased characters (letters) in the string are uppercase and there is at least one cased character;
otherwise it returns False.
Syntax: strobj.isupper()
islower()
Returns True if all cased characters in the string are lowercase and there is at least one cased character; otherwise it
returns False.
Syntax: strobj.islower()
isalpha()
Returns True if the string is not empty and contains only alphabetic characters; otherwise it returns False.
Syntax: strobj.isalpha()
isdigit()
Returns True if the string is not empty and contains only digits; otherwise it returns False.
isalnum()
Returns True if the string is not empty and every character is either a letter or a digit; otherwise it returns False.
isspace()
Returns True if the string is not empty and contains only whitespace characters (spaces, tabs, newlines); otherwise it
returns False.
Syntax: strobj.isspace()
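A sketch of the is...() checks described above, with illustrative inputs:

```python
print("ETL".isupper())       # True
print("ETL 2025".isupper())  # True: digits and spaces have no case, so they are ignored
print("etl".islower())       # True
print("Python".isalpha())    # True
print("Python3".isalpha())   # False: '3' is not a letter
print("2025".isdigit())      # True
print("Python3".isalnum())   # True
print("Python 3".isalnum())  # False: the space is neither a letter nor a digit
print("   ".isspace())       # True
print("".isspace())          # False: an empty string returns False
```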
split()
Splits the string into a list of substrings using the specified delimiter (for example - _ # % ^ , ;). If no delimiter is
given, it splits on whitespace.
Syntax: strobj.split("delimiter")
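A sketch of parsing a comma-delimited record. The record layout is an illustrative assumption:

```python
record = "101,John,Sales,5000"
fields = record.split(",")
print(fields)  # ['101', 'John', 'Sales', '5000']

# With no argument, split() splits on runs of whitespace
print("data   engineering\tpipeline".split())  # ['data', 'engineering', 'pipeline']
```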
join():
Joins the items of an iterable (such as a list) of strings into a single string, placing strobj between the items.
Syntax: strobj.join(iterableobject)
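A minimal sketch with illustrative column names. Every item must be a string; joining a list that contains numbers raises a TypeError.

```python
columns = ["id", "name", "salary"]
print(",".join(columns))    # id,name,salary
print(" | ".join(columns))  # id | name | salary
```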
String Concatenation
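Strings are concatenated with the + operator. A minimal sketch with illustrative values:

```python
first = "Data"
second = "Engineer"
print(first + second)        # DataEngineer
print(first + " " + second)  # Data Engineer (add the space yourself)
```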
String Format
F-Strings
F-strings were introduced in Python 3.6 and are now the preferred way of formatting strings.
To specify a string as an f-string, simply put an f in front of the string literal, and add curly brackets {} as placeholders for
variables and other operations.
A placeholder can contain variables, operations, functions, and modifiers to format the value.
Note:
A modifier is included by adding a colon : followed by a legal formatting type, such as .2f, which means a fixed-point
number with 2 decimals.
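A sketch of f-string placeholders holding a variable, an operation, a method call, and a .2f modifier. The names and values are illustrative:

```python
name = "Pipeline A"
rows = 1520
price = 59
print(f"{name} loaded {rows} rows")  # Pipeline A loaded 1520 rows
print(f"Price: {price:.2f} dollars") # Price: 59.00 dollars (modifier .2f)
print(f"Total: {price * 3} dollars") # Total: 177 dollars (an operation)
print(f"Upper: {name.upper()}")      # Upper: PIPELINE A (a method call)
```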
Escape Character
\\ Backslash
\n New Line
\r Carriage Return
\t Tab
\b Backspace
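A sketch of three of the escape characters listed above, with illustrative text. The file path is hypothetical:

```python
print("Line one\nLine two")  # \n starts a new line
print("Name:\tAlice")        # \t inserts a tab

path = "C:\\data\\input.csv" # each \\ stores a single backslash
print(path)                  # C:\data\input.csv
```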
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Operators
Arithmetic Operators
The purpose of arithmetic operators is to perform arithmetic operations such as addition, subtraction, and
multiplication.
If two or more variables or objects are connected with arithmetic operators, we call it an arithmetic expression.
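A sketch of Python's arithmetic operators on illustrative values:

```python
a, b = 17, 5
print(a + b)   # 22   addition
print(a - b)   # 12   subtraction
print(a * b)   # 85   multiplication
print(a / b)   # 3.4  true division (always returns a float)
print(a // b)  # 3    floor division
print(a % b)   # 2    modulus (remainder)
print(a ** 2)  # 289  exponentiation
```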
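Relational Operators
If two or more variables or objects are connected with relational operators, we call it a relational expression.
In Python, there are 6 relational (comparison) operators: == (equal to), != (not equal to), > (greater than), < (less
than), >= (greater than or equal to), and <= (less than or equal to).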
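A sketch of the six relational operators; every relational expression evaluates to True or False. The values are illustrative:

```python
x, y = 10, 20
print(x == y)   # False
print(x != y)   # True
print(x > y)    # False
print(x < y)    # True
print(x >= 10)  # True
print(x <= 5)   # False
```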
Assignment Operators
The purpose of the assignment operator is to assign the right-hand side (RHS) value or expression value to the
left-hand side (LHS) variable.
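A sketch of = and some compound assignment operators, tracing one variable through each step:

```python
x = 5    # assign 5 to x
x += 3   # same as x = x + 3  -> 8
x -= 2   # same as x = x - 2  -> 6
x *= 4   # same as x = x * 4  -> 24
x //= 5  # same as x = x // 5 -> 4
print(x) # 4
```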
Identity operators (is and is not) are used to compare objects: not whether they are equal, but whether they are
actually the same object, with the same memory location:
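A sketch contrasting identity (is) with equality (==), using illustrative lists:

```python
a = ["etl", "elt"]
b = a               # b refers to the same list object as a
c = ["etl", "elt"]  # c is a different object with equal contents

print(a is b)      # True
print(a is c)      # False
print(a == c)      # True: equal values, different objects
print(a is not c)  # True
```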
Operator Precedence
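Operator precedence decides which part of an expression is evaluated first. A short sketch of common cases:

```python
print(2 + 3 * 4)    # 14: * is evaluated before +
print((2 + 3) * 4)  # 20: parentheses are evaluated first
print(2 ** 3 ** 2)  # 512: ** groups right to left, i.e. 2 ** 9
print(-2 ** 2)      # -4: ** binds tighter than unary minus
print(10 - 4 - 3)   # 3: - groups left to right
```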
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Lists
Lists are used to store multiple items in a single variable.
Lists are one of 4 built-in data types in Python used to store collections of data; the other 3 are Tuple, Set, and
Dictionary.
List Items
List items are indexed: the first item has index [0], the second item has index [1], and so on.
Ordered
When we say that lists are ordered, it means that the items have a defined order, and that order will not change.
If you add new items to a list, the new items will be placed at the end of the list.
Changeable
The list is changeable, meaning that we can change, add, and remove items in a list after it has been created.
Allow Duplicates
Since lists are indexed, lists can have items with the same value:
List Length
To determine how many items a list has, use the len() function:
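A sketch of the list properties above (indexed, ordered, changeable, duplicates allowed), with illustrative items:

```python
thislist = ["apple", "banana", "cherry", "apple"]
print(thislist[0])         # apple: the first item has index 0
print(len(thislist))       # 4: duplicates are allowed and counted
thislist.append("orange")  # new items are placed at the end
print(thislist[-1])        # orange
```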
type()
From Python's perspective, lists are defined as objects with the data type 'list':
It is also possible to use the list() constructor when creating a new list.
List items are indexed and you can access them by referring to the index number:
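A minimal sketch of type(), the list() constructor, and index access:

```python
mylist = ["apple", "banana", "cherry"]
print(type(mylist))  # <class 'list'>

# list() constructor: note the double round brackets (a tuple is passed in)
thislist = list(("apple", "banana", "cherry"))
print(thislist[1])   # banana
```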
Negative Indexing
-1 refers to the last item, -2 refers to the second last item etc.
Range of Indexes
You can specify a range of indexes by specifying where to start and where to end the range.
When specifying a range, the return value will be a new list with the specified items.
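A sketch of negative indexing and index ranges; as with strings, the item at the end index is not included:

```python
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[-1])   # mango
print(thislist[2:5])  # ['cherry', 'orange', 'kiwi']
print(thislist[:4])   # ['apple', 'banana', 'cherry', 'orange']
print(thislist[2:])   # ['cherry', 'orange', 'kiwi', 'melon', 'mango']
```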
Specify negative indexes if you want to start the search from the end of the list:
To change the value of items within a specific range, define a list with the new values, and refer to the range of index
numbers where you want to insert the new values:
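A sketch of a negative range and of replacing a range of items, with illustrative fruit names:

```python
thislist = ["apple", "banana", "cherry", "orange", "kiwi", "melon", "mango"]
print(thislist[-4:-1])  # ['orange', 'kiwi', 'melon']

fruits = ["apple", "banana", "cherry", "orange", "kiwi", "mango"]
fruits[1:3] = ["blackcurrant", "watermelon"]  # replace the items at index 1 and 2
print(fruits)  # ['apple', 'blackcurrant', 'watermelon', 'orange', 'kiwi', 'mango']
```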
If you insert more items than you replace, the new items will be inserted where you specified, and the remaining items
will move accordingly:
If you insert fewer items than you replace, the new items will be inserted where you specified, and the remaining items
will move accordingly:
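A sketch of both cases, replacing one item with two and two items with one:

```python
more = ["apple", "banana", "cherry"]
more[1:2] = ["blackcurrant", "watermelon"]  # replace 1 item with 2
print(more)   # ['apple', 'blackcurrant', 'watermelon', 'cherry']

fewer = ["apple", "banana", "cherry"]
fewer[1:3] = ["watermelon"]                 # replace 2 items with 1
print(fewer)  # ['apple', 'watermelon']
```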
The insert() method inserts the specified value at the specified position.
The extend() method adds the specified list elements (or any iterable) to the end of the current list.
The remove() method removes the first occurrence of the element with the specified value.
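A sketch of insert(), extend(), and remove() applied to one illustrative list:

```python
fruits = ["apple", "banana", "cherry"]
fruits.insert(1, "orange")         # insert at index 1
print(fruits)  # ['apple', 'orange', 'banana', 'cherry']

fruits.extend(("kiwi", "banana"))  # any iterable works, here a tuple
print(fruits)  # ['apple', 'orange', 'banana', 'cherry', 'kiwi', 'banana']

fruits.remove("banana")            # removes only the first "banana"
print(fruits)  # ['apple', 'orange', 'cherry', 'kiwi', 'banana']
```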
The count() method returns the number of elements with the specified value.
The index() method returns the position of the first occurrence of the specified value.
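A sketch of count() and index(), with illustrative values:

```python
points = [1, 4, 2, 9, 7, 8, 9, 3, 1]
print(points.count(9))  # 2

fruits = ["apple", "banana", "cherry", "banana"]
print(fruits.index("banana"))  # 1: position of the first occurrence
```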
You can also customize the sort by using the keyword argument key = function. The function is called on each item,
and the list is sorted by the values it returns (the lowest value first):
By default the sort() method is case sensitive, resulting in all capital letters being sorted before lowercase letters:
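A sketch of a custom key and of case-insensitive sorting. The function closeness_to_50 is a hypothetical example key:

```python
def closeness_to_50(n):
    # The key: distance from 50; a smaller distance sorts first
    return abs(n - 50)

numbers = [100, 50, 65, 82, 23]
numbers.sort(key=closeness_to_50)
print(numbers)  # [50, 65, 23, 82, 100]

fruits = ["banana", "Orange", "Kiwi", "cherry"]
fruits.sort()
print(fruits)   # ['Kiwi', 'Orange', 'banana', 'cherry']: capitals first
fruits.sort(key=str.lower)  # case-insensitive sort
print(fruits)   # ['banana', 'cherry', 'Kiwi', 'Orange']
```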
Reverse Order
• What if you want to reverse the order of a list, regardless of the alphabet?
• The reverse() method reverses the current order of the elements in place.
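A minimal sketch; reverse() flips the existing order and does not sort:

```python
fruits = ["banana", "Orange", "Kiwi", "cherry"]
fruits.reverse()
print(fruits)  # ['cherry', 'Kiwi', 'Orange', 'banana']
```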
Copy List
You cannot copy a list simply by typing list2 = list1, because: list2 will only be a reference to list1, and changes made in
list1 will automatically also be made in list2.
You can use the built-in List method copy() to copy a list
You can also make a copy of a list by using the : (slice) operator.
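A sketch contrasting a plain reference with copy() and the [:] slice. Both produce shallow copies: nested objects inside the list are still shared.

```python
list1 = ["apple", "banana", "cherry"]
ref = list1               # not a copy: both names point to the same list
by_method = list1.copy()
by_slice = list1[:]

list1.append("kiwi")
print(ref)        # ['apple', 'banana', 'cherry', 'kiwi']
print(by_method)  # ['apple', 'banana', 'cherry']
print(by_slice)   # ['apple', 'banana', 'cherry']
```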
Sort List Alphanumerically
List objects have a sort() method that will sort the list alphanumerically, ascending, by default:
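A sketch of the default ascending sort, plus reverse=True for descending order:

```python
fruits = ["orange", "mango", "kiwi", "pineapple", "banana"]
fruits.sort()
print(fruits)   # ['banana', 'kiwi', 'mango', 'orange', 'pineapple']

numbers = [100, 50, 65, 82, 23]
numbers.sort(reverse=True)  # descending order
print(numbers)  # [100, 82, 65, 50, 23]
```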
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Tuples
Tuples are used to store multiple items in a single variable.
Tuple is one of 4 built-in data types in Python used to store collections of data; the other 3 are List, Set, and
Dictionary.
Tuple Items
Tuple items are indexed: the first item has index [0], the second item has index [1], and so on.
Ordered
When we say that tuples are ordered, it means that the items have a defined order, and that order will not change.
Unchangeable
Tuples are unchangeable, meaning that we cannot change, add or remove items after the tuple has been created.
Allow Duplicates
Since tuples are indexed, they can have items with the same value:
Tuple Length
To determine how many items a tuple has, use the len() function:
To create a tuple with only one item, you have to add a comma after the item, otherwise Python will not recognize it as a
tuple
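A sketch of tuple length and the single-item tuple rule, with illustrative items:

```python
thistuple = ("apple", "banana", "cherry", "apple")
print(len(thistuple))   # 4: duplicates are allowed

one = ("apple",)        # the trailing comma makes this a tuple
not_tuple = ("apple")   # just a string inside parentheses
print(type(one))        # <class 'tuple'>
print(type(not_tuple))  # <class 'str'>
```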
You can access tuple items by referring to the index number, inside square brackets:
Range of Indexes
You can specify a range of indexes by specifying where to start and where to end the range.
When specifying a range, the return value will be a new tuple with the specified items.
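A sketch of index access and ranges on a tuple; slicing returns a new tuple:

```python
thistuple = ("apple", "banana", "cherry", "orange", "kiwi", "melon", "mango")
print(thistuple[1])    # banana
print(thistuple[-1])   # mango
print(thistuple[2:5])  # ('cherry', 'orange', 'kiwi')
```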
Once a tuple is created, you cannot change its values. Tuples are unchangeable, or immutable as it also is called.
But there is a workaround. You can convert the tuple into a list, change the list, and convert the list back into a tuple.
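A sketch of the list-conversion workaround, with illustrative items:

```python
x = ("apple", "banana", "cherry")
y = list(x)    # convert the tuple into a list
y[1] = "kiwi"  # change the list
x = tuple(y)   # convert the list back into a tuple
print(x)       # ('apple', 'kiwi', 'cherry')
```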
Remove Items
Tuples are unchangeable, so you cannot remove items from them, but you can use the same workaround as we used for
changing and adding tuple items:
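A sketch of removing an item through a temporary list:

```python
thistuple = ("apple", "banana", "cherry")
temp = list(thistuple)
temp.remove("apple")
thistuple = tuple(temp)
print(thistuple)  # ('banana', 'cherry')
```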
Unpacking a Tuple
When we create a tuple, we normally assign values to it. This is called "packing" a tuple. But, in Python, we are also
allowed to extract the values back into variables. This is called "unpacking":
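A sketch of packing and unpacking. The asterisk form, which collects the remaining values into a list, is a standard extension of the same idea:

```python
fruits = ("apple", "banana", "cherry")  # packing
green, yellow, red = fruits             # unpacking
print(green, yellow, red)               # apple banana cherry

first, *rest = fruits                   # * collects the remaining values
print(rest)                             # ['banana', 'cherry']
```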
The count() method returns the number of times a specified value appears in the tuple.
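A minimal sketch with illustrative values:

```python
thistuple = (1, 3, 7, 8, 7, 5, 4, 6, 8, 5)
print(thistuple.count(5))  # 2
```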
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Sets
Set is one of 4 built-in data types in Python used to store collections of data.
Set Items
Set items are unordered, unchangeable, and do not allow duplicate values.
Unordered
Unordered means that the items in a set do not have a defined order.
Set items can appear in a different order every time you use them, and cannot be referred to by index or key
Unchangeable
Set items are unchangeable, meaning that we cannot change the items after the set has been created.
Once a set is created, you cannot change its items, but you can remove items and add new items.
To determine how many items a set has, use the len() function.
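A sketch showing that duplicates are dropped and that len() counts unique items only:

```python
thisset = {"apple", "banana", "cherry", "apple"}
print(len(thisset))  # 3: the duplicate "apple" is stored only once
print(thisset)       # the printed order is not guaranteed
```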
Access Items
You cannot access set items by referring to an index or a key, but you can loop through the set items using a for loop,
or ask if a specified value is present in a set by using the in keyword.
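A minimal sketch with illustrative items:

```python
thisset = {"apple", "banana", "cherry"}
for fruit in thisset:
    print(fruit)            # the order may differ between runs

print("banana" in thisset)  # True
```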
Add Items
Once a set is created, you cannot change its items, but you can add new items using the add() method.
The object in the update() method does not have to be a set; it can be any iterable object (tuples, lists, dictionaries,
etc.).
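A sketch of add() for a single item and update() with a list, using illustrative items:

```python
thisset = {"apple", "banana", "cherry"}
thisset.add("orange")              # add one item
thisset.update(["kiwi", "mango"])  # add every item from a list
print(len(thisset))                # 6
```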
Remove Item
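A minimal sketch of removing set items with remove() and discard(). The set contents are illustrative:

```python
thisset = {"apple", "banana", "cherry"}
thisset.remove("banana")  # raises KeyError if the item is missing
thisset.discard("grape")  # does nothing if the item is missing
print(sorted(thisset))    # ['apple', 'cherry']
```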
Union
The union() method returns a new set with all items from both sets.
All the joining methods and operators can be used to join multiple sets.
When using a method, just add more sets in the parentheses, separated by commas:
The union() method allows you to join a set with other data types, like lists or tuples.
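A sketch of union() with several sets, the equivalent | operator (which requires sets on both sides), and union() with a list and a tuple:

```python
set1 = {"a", "b", "c"}
set2 = {1, 2, 3}
set3 = {"John", "Elena"}

all_items = set1.union(set2, set3)  # several sets, separated by commas
same_items = set1 | set2 | set3     # the | operator gives the same result
print(all_items == same_items)      # True

mixed = {"a", "b"}.union([1, 2], ("x",))  # union() also accepts lists and tuples
print(len(mixed))                   # 5
```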
Update
The update() method inserts all items from one set into another.
The update() changes the original set, and does not return a new set.
Intersection
The intersection() method will return a new set, that only contains the items that are present in both sets.
The intersection_update() method
This method will also keep ONLY the items that are present in both sets, but it will change the original set instead of
returning a new set.
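A sketch of intersection(), which returns a new set, and intersection_update(), which changes the original set:

```python
set1 = {"apple", "banana", "cherry"}
set2 = {"google", "microsoft", "apple"}

common = set1.intersection(set2)  # new set, set1 is unchanged
print(common)                     # {'apple'}

set1.intersection_update(set2)    # modifies set1 in place, returns None
print(set1)                       # {'apple'}
```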
Difference
The difference() method will return a new set that will contain only the items from the first set that are not present in the
other set.
The difference_update() method
This method will also keep the items from the first set that are not in the other set, but it will change the original set
instead of returning a new set.
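A sketch of difference() and difference_update(). sorted() is used only to make the printed order predictable:

```python
set1 = {"apple", "banana", "cherry"}
set2 = {"google", "microsoft", "apple"}

only_in_1 = set1.difference(set2)  # new set
print(sorted(only_in_1))           # ['banana', 'cherry']

set1.difference_update(set2)       # changes set1 in place
print(sorted(set1))                # ['banana', 'cherry']
```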
Symmetric Differences
The symmetric_difference() method will keep only the elements that are NOT present in both sets.
The symmetric_difference_update() method
This method will also keep only the items that are NOT present in both sets, but it will change the original set instead
of returning a new set.
The isdisjoint() method returns True if none of the items are present in both sets, otherwise it returns False.
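A sketch of symmetric_difference(), its in-place variant, and isdisjoint(), with illustrative items:

```python
set1 = {"apple", "banana", "cherry"}
set2 = {"google", "microsoft", "apple"}

print(sorted(set1.symmetric_difference(set2)))
# ['banana', 'cherry', 'google', 'microsoft']

set1.symmetric_difference_update(set2)  # changes set1 in place
print(len(set1))  # 4

print({"a", "b"}.isdisjoint({"c", "d"}))  # True: no common items
```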
The issubset() method returns True if all items in the set exist in the specified set, otherwise it returns False.
The issuperset() method returns True if all items in the specified set exist in the original set, otherwise it returns False.
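The three test methods above can be illustrated with a few small sets (the values are illustrative):

```python
x = {"a", "b", "c"}
y = {"f", "e", "d", "c", "b", "a"}
z = {"google", "microsoft"}

print(x.isdisjoint(z))   # True: x and z share no items
print(x.isdisjoint(y))   # False: x and y share "a", "b" and "c"
print(x.issubset(y))     # True: every item of x is in y (also x <= y)
print(y.issuperset(x))   # True: y contains every item of x (also y >= x)
```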
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Dictionaries
Dictionaries are used to store data values in key:value pairs.
As of Python version 3.7, dictionaries are ordered: they keep the order in which items were inserted. In Python 3.6 and earlier, the language does not guarantee any order.
Dictionary Items
Dictionary items are presented in key:value pairs, and can be referred to by using the key name.
Changeable
Dictionaries are changeable, meaning that we can change, add or remove items after the dictionary has been created.
Dictionary Length
To determine how many items a dictionary has, use the len() function:
You can access the items of a dictionary by referring to its key name, inside square brackets:
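A minimal sketch of creating a dictionary, counting its items and reading values by key. The car data is illustrative, and get() is an additional standard dictionary method not covered in the text above: it returns None instead of raising KeyError for a missing key.

```python
thisdict = {
    "brand": "Ford",
    "model": "Mustang",
    "year": 1964,
}

print(len(thisdict))           # 3
print(thisdict["model"])       # Mustang
print(thisdict.get("color"))   # None; thisdict["color"] would raise KeyError
print(list(thisdict.keys()))   # ['brand', 'model', 'year'] (insertion order)
```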
You can change the value of a specific item by referring to its key name:
Update Dictionary
The update() method will update the dictionary with the items from the given argument.
The argument must be a dictionary, or an iterable object with key:value pairs.
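The ways of changing values described above can be sketched like this (illustrative data); update() accepts a dictionary, an iterable of key/value pairs, or keyword arguments.

```python
thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

thisdict["year"] = 2018               # change an existing value by key
thisdict.update({"year": 2020})       # update from another dictionary
thisdict.update([("color", "red")])   # ...or from an iterable of key/value pairs
thisdict.update(doors=2)              # keyword arguments also work

print(thisdict)
```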
The fromkeys() method returns a dictionary with the specified keys and the specified value.
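The setdefault() method returns the value of the item with the specified key. If the key does not exist, it inserts the key with the specified value (None by default) and returns that value.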
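A minimal sketch of fromkeys() and setdefault() with illustrative keys. The last lines show a common pitfall: fromkeys() gives every key the same value object, so a mutable value such as a list is shared between all keys.

```python
keys = ("key1", "key2", "key3")
d = dict.fromkeys(keys, 0)      # {'key1': 0, 'key2': 0, 'key3': 0}

car = {"brand": "Ford", "model": "Mustang"}
print(car.setdefault("model", "Bronco"))   # Mustang: key exists, value unchanged
print(car.setdefault("color", "white"))    # white: key added with the default

# Pitfall: both keys refer to the SAME list object
shared = dict.fromkeys(["a", "b"], [])
shared["a"].append(1)
print(shared)                   # {'a': [1], 'b': [1]}
```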
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
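Python supports the usual logical conditions from mathematics:
• Equals: a == b
• Not Equals: a != b
• Less than: a < b
• Less than or equal to: a <= b
• Greater than: a > b
• Greater than or equal to: a >= b
These conditions can be used in several ways, most commonly in "if statements" and loops.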
In this example, we use two variables, a and b, as part of the if statement to test whether b is greater than a. As a is 599 and b is 600, and 600 is greater than 599, we print "b is greater than a" to the screen.
Indentation
Python relies on indentation (whitespace at the beginning of a line) to define scope in the code. Other programming
languages often use curly-brackets for this purpose.
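The if statement described above, with the values 599 and 600, can be sketched as follows. The indented line is the body of the if; removing that indentation would raise an IndentationError.

```python
a = 599
b = 600

# The indented block runs only when the condition is True
if b > a:
    print("b is greater than a")
```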
Elif
The elif keyword is Python's way of saying "if the previous conditions were not true, then try this condition"
Else
The else keyword catches anything which isn't caught by the preceding conditions.
Short Hand If
If you have only one statement to execute, you can put it on the same line as the if statement.
If you have only one statement to execute for the if and one for the else, you can put it all on the same line. This form is called a conditional expression (or ternary operator):
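A sketch of elif, else and the two short-hand forms, using illustrative values for a and b:

```python
a = 200
b = 33

if b > a:
    print("b is greater than a")
elif a == b:
    print("a and b are equal")
else:
    print("a is greater than b")   # this branch runs

# Short hand if: one statement on the same line
if a > b: print("a is greater than b")

# Short hand if...else (conditional expression)
label = "A" if a > b else "B"
print(label)   # A
```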
And
The and keyword is a logical operator, and is used to combine conditional statements:
Or
The or keyword is a logical operator, and is used to combine conditional statements:
Not
The not keyword is a logical operator, and is used to reverse the result of the conditional statement:
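The three logical operators can be compared side by side; the values are illustrative. Note that not has lower precedence than comparisons, so not a > b means not (a > b).

```python
a = 200
b = 33
c = 500

both = a > b and c > a     # True: both conditions are true
either = a > b or a > c    # True: at least one condition is true
negated = not a > b        # False: reverses the result of a > b

print(both, either, negated)   # True True False
```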
Nested If
You can have if statements inside if statements; these are called nested if statements.
if statements cannot be empty, but if you for some reason have an if statement with no content, put in the pass
statement to avoid getting an error.
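A minimal sketch of a nested if and of pass as a placeholder body; the value 41 is illustrative.

```python
x = 41

if x > 10:
    print("Above ten,")
    if x > 20:                 # nested: only checked when x > 10
        print("and also above 20!")
    else:
        print("but not above 20.")

if x > 100:
    pass                       # an empty block here would be a SyntaxError
```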
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
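A for loop is used for iterating over a sequence (a list, a tuple, a dictionary, a set, or a string).
This is less like the for keyword in other programming languages, and works more like an iterator method as found in other object-oriented programming languages.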
With the for loop we can execute a set of statements, once for each item in a list, tuple, set etc.
With the break statement we can stop the loop before it has looped through all the items:
With the continue statement we can stop the current iteration of the loop, and continue with the next:
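A sketch of a basic for loop plus break and continue. The results are collected into lists (illustrative names) so the effect of each statement is easy to see.

```python
fruits = ["apple", "banana", "cherry"]

for x in fruits:          # one pass per item
    print(x)

stopped = []
for x in fruits:
    if x == "banana":
        break             # leave the loop completely
    stopped.append(x)

skipped = []
for x in fruits:
    if x == "banana":
        continue          # skip "banana", carry on with the next item
    skipped.append(x)

print(stopped)   # ['apple']
print(skipped)   # ['apple', 'cherry']
```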
To loop through a set of code a specified number of times, we can use the range() function. The range() function returns a sequence of numbers, starting from 0 by default, incrementing by 1 by default, and stopping before a specified number.
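The behaviour of range() described above, including the optional start and step arguments, can be checked like this:

```python
default = list(range(6))           # [0, 1, 2, 3, 4, 5]: 6 is not included
custom_start = list(range(2, 6))   # [2, 3, 4, 5]
stepped = list(range(2, 30, 3))    # [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]

for x in range(3):   # runs the loop body exactly 3 times
    print(x)
```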
The else keyword in a for loop specifies a block of code to be executed when the loop is finished. The else block is skipped if the loop is stopped by a break statement:
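A minimal sketch showing when a for loop's else block runs and when it is skipped; the flag variables are illustrative.

```python
finished = False
for x in range(3):
    print(x)
else:
    finished = True                # runs: the loop reached its end
    print("Finally finished!")

stopped_early = True
for x in range(6):
    if x == 3:
        break
else:
    stopped_early = False          # skipped: the loop ended with break

print(finished, stopped_early)     # True True
```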
Nested Loops
The "inner loop" will be executed one time for each iteration of the "outer loop":
for loops cannot be empty, but if you for some reason have a for loop with no content, put in the pass statement to
avoid getting an error.
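A sketch of a nested loop and an empty loop body using pass (illustrative lists). The inner loop runs completely once per item of the outer loop, giving 3 x 3 = 9 combinations.

```python
adj = ["red", "big", "tasty"]
fruits = ["apple", "banana", "cherry"]

pairs = []
for x in adj:              # outer loop: 3 iterations
    for y in fruits:       # inner loop: runs fully for each x
        pairs.append(x + " " + y)

print(len(pairs))          # 9
print(pairs[0])            # red apple

for x in [0, 1, 2]:
    pass                   # an empty loop body needs pass
```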
With the while loop we can execute a set of statements as long as a condition is true.
With the break statement we can stop the loop even if the while condition is true:
With the continue statement we can stop the current iteration, and continue with the next:
With the else statement we can run a block of code once the condition is no longer true:
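The while loop with break, continue and else can be sketched as follows. The counter must be changed inside the loop; without the increment the condition would stay true forever.

```python
i = 1
while i < 6:
    print(i)
    if i == 3:
        break          # leaves the loop even though i < 6 is still true
    i += 1             # without this the loop would never end

j = 0
while j < 6:
    j += 1
    if j == 3:
        continue       # skip printing 3
    print(j)
else:
    print("j is no longer less than 6")   # runs once the condition is false
```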
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Dates in Python
A date in Python is not a data type of its own, but we can import a module named datetime to work with dates as date
objects.
Date Output
When we execute the code from the example above, the result will look similar to this (the exact value depends on when the code runs):
2024-12-31 22:07:14.544191
The date contains year, month, day, hour, minute, second, and microsecond.
The datetime module has many methods to return information about the date object.
Here are a few examples, you will learn more about them later in this chapter:
To create a date, we can use the datetime() class (constructor) of the datetime module.
The datetime() class requires three parameters to create a date: year, month, day.
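A minimal sketch of reading the current date and time and of building a specific date. Only year, month and day are required by datetime(); the time parts default to 0.

```python
import datetime

now = datetime.datetime.now()
print(now)                   # current date and time, e.g. 2024-12-31 22:07:14.544191
print(now.year)              # the year as an int
print(now.strftime("%A"))    # the weekday name

x = datetime.datetime(2020, 5, 17)
print(x)                     # 2020-05-17 00:00:00
```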
The datetime object has a method for formatting date objects into readable strings. The method is called strftime(), and
takes one parameter, format, to specify the format of the returned string:
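A few common strftime() format codes, applied to an illustrative date. Month and weekday names follow the active locale; the comments assume the default C (English) locale.

```python
import datetime

x = datetime.datetime(2018, 6, 1)

print(x.strftime("%B"))              # June (full month name)
print(x.strftime("%d/%m/%Y"))        # 01/06/2018
print(x.strftime("%Y-%m-%d %H:%M"))  # 2018-06-01 00:00
print(x.strftime("%a %b %d"))        # Fri Jun 01
```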
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check if a string contains the specified search pattern.
RegEx Module
Python has a built-in package called re, which can be used to work with Regular Expressions.
RegEx in Python
When you have imported the re module, you can start using regular expressions:
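A first regular expression: checking whether a string starts with "The" and ends with "Spain". The sample text is illustrative.

```python
import re

txt = "The rain in Spain"

# ^ means "starts with", $ means "ends with", .* means "any characters"
x = re.search("^The.*Spain$", txt)

if x:
    print("YES! We have a match!")
else:
    print("No match")
```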
Metacharacters
Metacharacters are characters with a special meaning:
Special Sequences
A special sequence is a \ followed by one of the characters in the examples given below, and has a special meaning:
Sets
A set is a set of characters inside a pair of square brackets [] with a special meaning:
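A small sketch using re.findall() to try out a metacharacter, two special sequences and a set. The sample text is illustrative; raw strings (r"...") keep Python from interpreting the backslashes itself.

```python
import re

txt = "The rain in Spain stays mainly in the plain, 2 times in 2024"

print(re.findall("ai", txt))         # every non-overlapping "ai"
print(re.findall(r"\d+", txt))       # \d = digit: ['2', '2024']
print(re.findall(r"\bS\w+", txt))    # words starting with "S": ['Spain']
print(re.findall("[arn]", "rain"))   # set: any of a, r, n -> ['r', 'a', 'n']
print(re.findall("pl.in", txt))      # . = any character: ['plain']
```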
RegEx Functions
The re module offers a set of functions that allow us to search a string for a match:
The search() function searches the string for a match, and returns a Match object if there is a match.
If there is more than one match, only the first occurrence of the match will be returned:
Match Object
A Match Object is an object containing information about the search and the result.
Note: If there is no match, the value None will be returned, instead of the Match Object.
The Match object has properties and methods used to retrieve information about the search, and the result:
.span() returns a tuple containing the start-, and end positions of the match.
.group() returns the part of the string where there was a match.
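A sketch of search() and the Match object, including the None result when nothing matches. The sample text is illustrative.

```python
import re

txt = "The rain in Spain"

# Only the first occurrence is returned: the first whitespace is at index 3
first_space = re.search(r"\s", txt)
print(first_space.start())   # 3

# \bS\w+ : a word that starts with an upper-case "S"
x = re.search(r"\bS\w+", txt)
print(x.span())              # (12, 17)
print(x.group())             # Spain
print(x.string)              # the string that was searched

# No match: search() returns None, so check before calling methods
y = re.search("Portugal", txt)
print(y)                     # None
```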
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
F-Strings
To specify a string as an f-string, simply put an f in front of the string literal, like this:
To format values in an f-string, add placeholders {}, a placeholder can contain variables, operations, functions, and
modifiers to format the value.
A modifier is included by adding a colon : followed by a legal formatting type, like .2f, which means a fixed-point number with 2 decimals.
The function does not have to be a built-in Python function; you can create your own functions and use them:
More Modifiers
There are several other modifiers that can be used to format values:
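A minimal sketch of f-string placeholders with modifiers, operations and function calls. The helper myconverter() is hypothetical (it converts feet to meters) and only illustrates calling your own function inside a placeholder.

```python
price = 59
txt = f"The price is {price:.2f} dollars"
print(txt)                                  # The price is 59.00 dollars

# Placeholders can contain operations...
print(f"The price is {20 * 59} dollars")    # The price is 1180 dollars

def myconverter(x):
    # hypothetical helper: feet to meters
    return x * 0.3048

# ...and calls to your own functions
print(f"The plane is flying at {myconverter(30000):.0f} meter altitude")

# A few other modifiers
print(f"{1234567:,}")      # 1,234,567 (thousands separator)
print(f"{'left':<8}|")     # left    | (left-aligned in 8 characters)
print(f"{0.25:.0%}")       # 25%
```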
String format()
Before Python 3.6 we used the format() method to format strings. The format() method can still be used, but f-strings are generally faster and are the preferred way to format strings. The following examples demonstrate how to format strings with the format() method. The format() method also uses curly brackets {} as placeholders, but the syntax is slightly different:
Index Numbers
You can use index numbers (a number inside the curly brackets {0}) to be sure the values are placed in the correct
placeholders:
Named Indexes
You can also use named indexes by entering a name inside the curly brackets {carname}, but then you must use names
when you pass the parameter values txt.format(carname = "Ford"):
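The format() variants described above (empty placeholders, modifiers, index numbers and named indexes) can be sketched with illustrative values:

```python
price = 49
txt = "The price is {} dollars"
print(txt.format(price))                              # The price is 49 dollars

# Modifiers go after a colon, as in f-strings
print("The price is {:.2f} dollars".format(price))   # The price is 49.00 dollars

# Index numbers decide which value goes into which placeholder
myorder = "I want {0} pieces of item number {1} for {2:.2f} dollars."
print(myorder.format(3, 567, 49))

# Named indexes must be passed as keyword arguments
named = "I have a {carname}, it is a {model}."
print(named.format(carname="Ford", model="Mustang"))
```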
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Functions
• A function is a block of code which only runs when it is called.
• You can pass data, known as parameters, into a function.
• A function can return data as a result.
Creating a Function
Calling a Function
Arguments
• Arguments are specified after the function name, inside the parentheses. You can add as many arguments as
you want, just separate them with a comma.
• The following example has a function with one argument (fname). When the function is called, we pass along a
first name, which is used inside the function to print the full name:
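A minimal sketch of creating and calling a function, and of passing one argument. The function greet() and the last name "Smith" are illustrative; greet() also returns the full name so the result can be reused.

```python
# Creating a function: def, a name, parentheses and a colon
def my_function():
    print("Hello from a function")

# Calling a function: its name followed by parentheses
my_function()

# fname is the parameter; "Emil" below is the argument
def greet(fname):
    full = fname + " Smith"   # "Smith" is only an illustrative last name
    print(full)
    return full

greet("Emil")   # prints: Emil Smith
```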
Parameters or Arguments?
• The terms parameter and argument can be used for the same thing: information that is passed into a function.
• A parameter is the variable listed inside the parentheses in the function definition.
• An argument is the value that is sent to the function when it is called.
Number of Arguments
By default, a function must be called with the correct number of arguments. Meaning that if your function expects 2
arguments, you have to call the function with 2 arguments, not more, and not less.
• If you do not know how many arguments will be passed into your function, add a * before the parameter name in the function definition.
• This way the function will receive a tuple of arguments, and can access the items accordingly.
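A sketch of arbitrary arguments (*args) next to a function with a fixed number of parameters; the names are illustrative.

```python
def my_function(*kids):
    # kids is a tuple holding every positional argument passed in
    print("The youngest child is " + kids[2])
    return kids

result = my_function("Emil", "Tobias", "Linus")   # The youngest child is Linus

# A fixed signature needs exactly the right number of arguments
def two_args(fname, lname):
    return fname + " " + lname

print(two_args("Emil", "Refsnes"))
# two_args("Emil") would raise TypeError (missing 1 required positional argument)
```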
Keyword Arguments
• You can also send arguments with the key = value syntax.
• If you do not know how many keyword arguments will be passed into your function, add two asterisks (**) before the parameter name in the function definition.
• This way the function will receive a dictionary of arguments, and can access the items accordingly:
• You can send any data type as an argument to a function (string, number, list, dictionary, etc.), and it will be treated as the same data type inside the function.
• E.g. if you send a List as an argument, it will still be a List when it reaches the function:
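Keyword arguments, arbitrary keyword arguments (**kwargs) and a list argument can be sketched as follows; the names and values are illustrative.

```python
def youngest(child3, child2, child1):
    return "The youngest child is " + child3

# With keyword arguments the order does not matter
print(youngest(child1="Emil", child2="Tobias", child3="Linus"))

def describe(**kid):
    # kid is a dict of all keyword arguments
    return "His last name is " + kid["lname"]

print(describe(fname="Tobias", lname="Smith"))

def print_items(food):
    # a list argument is still a list inside the function
    for x in food:
        print(x)
    return type(food)

print_items(["apple", "banana", "cherry"])
```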
Return Values
To let a function return a value, use the return statement.
Function definitions cannot be empty, but if you for some reason have a function definition with no content, put in the pass statement to avoid getting an error.
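A short sketch of return and of an empty function body using pass. A function that never executes return gives back None.

```python
def times_five(x):
    return 5 * x          # return sends a value back to the caller

print(times_five(3))      # 15
print(times_five(9))      # 45

def placeholder():
    pass                  # empty body: pass avoids a SyntaxError

print(placeholder())      # None
```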
Positional-Only Arguments
• You can specify that a function can have ONLY positional arguments, or ONLY keyword arguments.
• To specify that a function can have only positional arguments, add , / after the arguments:
Keyword-Only Arguments
To specify that a function can have only keyword arguments, add *, before the arguments:
• You can combine the two argument types in the same function.
• Any argument before the / is positional-only, and any argument after the *, is keyword-only.
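A minimal sketch of positional-only and keyword-only parameters. The / marker needs Python 3.8 or later.

```python
def pos_only(x, /):
    return x

def kw_only(*, x):
    return x

def combined(a, b, /, *, c, d):
    # a, b: positional-only; c, d: keyword-only
    return a + b + c + d

print(pos_only(3))                # OK; pos_only(x=3) raises TypeError
print(kw_only(x=3))               # OK; kw_only(3) raises TypeError
print(combined(5, 6, c=7, d=8))   # 26
```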
Recursion
• Python also accepts function recursion, which means a defined function can call itself.
• Recursion is a common mathematical and programming concept. It means that a function calls itself, which lets you loop through data to reach a result.
• The developer should be very careful with recursion as it can be quite easy to slip into writing a function which
never terminates, or one that uses excess amounts of memory or processor power. However, when written
correctly recursion can be a very efficient and mathematically-elegant approach to programming.
• In this example, tri_recursion() is a function that we have defined to call itself ("recurse"). We use the k variable as the data, which decrements (-1) every time we recurse. The recursion ends when k is no longer greater than 0 (i.e. when it is 0).
• To a new developer it can take some time to work out exactly how this works; the best way to find out is by testing and modifying it.
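A version of tri_recursion() consistent with the description above: each call adds k to the result of the call with k - 1, and the case k == 0 stops the recursion. This is a sketch, not necessarily the exact code in the course file.

```python
def tri_recursion(k):
    if k > 0:
        # recurse with k - 1, then add k to the result
        result = k + tri_recursion(k - 1)
        print(result)
    else:
        result = 0   # base case: stops the recursion
    return result

print("Recursion Example Results:")
tri_recursion(6)     # prints 1, 3, 6, 10, 15, 21 on separate lines
```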
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Generators
• Python generators are an easy way of creating iterators. A generator produces values one at a time instead of building the whole sequence in memory at once.
• A generator function can contain a return statement, but reaching return ends the generator: no more values are produced after it.
• The difference between yield and return is that once yield returns a value, the function is paused and control is transferred to the caller. Local variables and their states are remembered between successive calls. With the return statement, the value is returned and the function is terminated.
• Methods like __iter__() and __next__() are implemented automatically in a generator function, so a generator can be used with iter() and next().
• The syntax for a generator expression is similar to that of a list comprehension; the only difference is that square brackets are replaced with round parentheses. Also, a list comprehension produces the entire list at once, while a generator expression produces one item at a time, which is more memory efficient than a list comprehension.
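A minimal sketch of a generator function and a generator expression. The function count_up_to() is hypothetical; it yields 1 to n one value at a time and pauses after each yield.

```python
def count_up_to(n):
    # hypothetical example: yields 1, 2, ..., n
    i = 1
    while i <= n:
        yield i      # pause here and hand i to the caller
        i += 1       # resume from here on the next call
    return           # optional: ends the generator

gen = count_up_to(3)
print(next(gen))     # 1
print(next(gen))     # 2
print(list(gen))     # [3]: the remaining values

# Generator expression: round parentheses instead of square brackets
squares_list = [x * x for x in range(5)]   # builds the whole list now
squares_gen = (x * x for x in range(5))    # produces values on demand
print(sum(squares_gen))                    # 30
```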
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Lambda
• A lambda function is a small anonymous function.
• A lambda function can take any number of arguments, but can only have one expression.
• The power of lambda is best shown when you use it as an anonymous function inside another function.
• Say you have a function definition that takes one argument, and that argument will be multiplied with an
unknown number:
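A sketch of a simple lambda and of the pattern described above: a function that takes one argument n and returns a lambda that multiplies its own argument by n. The names are illustrative.

```python
x = lambda a, b: a * b
print(x(5, 6))           # 30

def myfunc(n):
    # returns a new function that multiplies its argument by n
    return lambda a: a * n

mydoubler = myfunc(2)
mytripler = myfunc(3)
print(mydoubler(11))     # 22
print(mytripler(11))     # 33
```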
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Json
JSON is a syntax for storing and exchanging data.
If you have a Python object, you can convert it into a JSON string by using the json.dumps() method.
You can convert Python objects of the following types into JSON strings: dict, list, tuple, str, int, float, True, False, and None.
The example above prints a JSON string, but it is not very easy to read, with no indentations and line breaks.
The json.dumps() method has parameters to make it easier to read the result:
You can also define the separators, default value is (", ", ": "), which means using a comma and a space to separate each
object, and a colon and a space to separate keys from values:
The json.dumps() method has a parameter, sort_keys, to order the keys in the result:
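A sketch of json.dumps() with the parameters described above, using illustrative data. It also uses json.loads(), which goes the other way (JSON string to Python object); note that a tuple comes back as a list.

```python
import json

x = {"name": "John", "age": 30, "married": True, "pets": None, "cars": ("Ford", "BMW")}

compact = json.dumps(x)
print(compact)
# {"name": "John", "age": 30, "married": true, "pets": null, "cars": ["Ford", "BMW"]}

# indent adds line breaks; separators and sort_keys change layout and key order
pretty = json.dumps(x, indent=4, separators=(". ", " = "), sort_keys=True)
print(pretty)

back = json.loads(compact)
print(back["cars"])   # ['Ford', 'BMW']
```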
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
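List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list.
Example:
Based on a list called demo, you want a new list containing only the items with the letter "c" in them.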
Without list comprehension you will have to write a for statement with a conditional test inside:
The Syntax
newlist = [expression for item in iterable if condition == True]
The condition is like a filter that only accepts the items that evaluate to True.
The condition if x != "apple" will return True for all elements other than "apple", making the new list contain all fruits
except "apple".
Iterable
The iterable can be any iterable object, like a list, tuple, set etc.
Expression
The expression is the current item in the iteration, but it is also the outcome, which you can manipulate before it ends
up like a list item in the new list:
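The comparison above, a for loop versus a list comprehension, can be sketched as follows. The contents of demo are hypothetical sample data.

```python
demo = ["apple", "banana", "cherry", "coconut", "mango"]   # hypothetical data

# Without list comprehension: a for loop with an if inside
newlist = []
for x in demo:
    if "c" in x:
        newlist.append(x)

# With list comprehension
newlist2 = [x for x in demo if "c" in x]
print(newlist2)   # ['cherry', 'coconut']

no_apple = [x for x in demo if x != "apple"]   # the condition filters items
upper = [x.upper() for x in demo]              # the expression changes each item
numbers = [x for x in range(10) if x < 5]      # any iterable works
```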
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python PIP
PIP is a package manager for Python packages, or modules if you like. Note: If you have Python version 3.4 or later, PIP
is included by default.
What is a Package?
A package contains all the files you need for a module. Modules are Python code libraries you can include in your project.
Check PIP version:
Navigate your command line to the location of Python's script directory, and type the following:
Download a Package
Downloading a package is very easy. Open the command line interface and tell PIP to download the package you want.
Navigate your command line to the location of Python's script directory, and type the following:
Using a Package
Remove a Package
The PIP Package Manager will ask you to confirm that you want to remove the camelcase package:
List Packages
Use the list command to list all the packages installed on your system:
Result:
Package          Version
camelcase        0.2
mysql-connector  2.1.6
pip              18.1
pymongo          3.6.1
setuptools       39.0.1
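PIP itself is used from the command line (pip --version, pip install camelcase, pip uninstall camelcase, pip list). As a swapped-in technique, not PIP itself, a rough in-Python equivalent of pip list and of checking one package's version can be sketched with the standard importlib.metadata module (Python 3.8+). The output depends entirely on your environment.

```python
from importlib.metadata import PackageNotFoundError, distributions, version

# Rough equivalent of "pip list": name and version of each installed distribution
installed = sorted(
    (str(dist.metadata["Name"]), str(dist.version)) for dist in distributions()
)
for name, ver in installed:
    print(f"{name:<20} {ver}")

# Rough equivalent of checking a single package's version
try:
    print("pip", version("pip"))
except PackageNotFoundError:
    print("pip is not installed in this environment")
```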
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
When an error occurs, or exception as we call it, Python will normally stop and generate an error message.
The try block lets you test a block of code for errors.
The else block lets you execute code when there is no error.
The finally block lets you execute code, regardless of the result of the try- and except blocks.
Many Exceptions
You can define as many except blocks as you want, e.g. if you want to execute a special block of code for a special kind of error:
Else
You can use the else keyword to define a block of code to be executed if no errors were raised:
Finally
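The finally block, if specified, will be executed regardless of whether the try block raises an error or not.
Raise an exception
As a Python developer you can choose to throw an exception if a condition occurs. To throw (or raise) an exception, use the raise keyword.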
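A minimal sketch combining try, several except blocks, else, finally and raise. The helper safe_divide() is hypothetical and exists only to show the order in which the blocks run.

```python
def safe_divide(a, b):
    # hypothetical helper showing try/except/else/finally
    try:
        result = a / b
    except ZeroDivisionError:
        print("You cannot divide by zero")
        result = None
    except TypeError:
        print("Both values must be numbers")
        result = None
    else:
        print("Nothing went wrong")               # only when no exception occurred
    finally:
        print("The 'try except' is finished")     # always runs
    return result

safe_divide(10, 2)
safe_divide(10, 0)

x = -1
try:
    if x < 0:
        raise ValueError("Sorry, no numbers below zero")   # throw your own error
except ValueError as err:
    print(err)
```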
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
File handling is an important part of any web application. Python has several functions for creating, reading, updating,
and deleting files.
File Handling
• The key function for working with files in Python is the open() function.
• "r" - Read - Default value. Opens a file for reading, error if the file does not exist.
• "a" - Append - Opens a file for appending, creates the file if it does not exist
• "w" - Write - Opens a file for writing, creates the file if it does not exist.
• "x" - Create - Creates the specified file, raises an error if the file exists.
• In addition you can specify if the file should be handled in text mode ("t", the default) or binary mode ("b", e.g. for images).
Assume we have the following file, located in the same folder as Python:
• The open() function returns a file object, which has a read() method for reading the content of the file:
By default the read() method returns the whole text, but you can also specify how many characters you want to return:
Read Lines
Close Files
It is a good practice to always close the file when you are done with it.
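A sketch of reading a file. The sample file demofile.txt and its contents are hypothetical; the code creates it in a temporary folder first so the example runs anywhere. A with block closes the file automatically, even if an error occurs.

```python
import os
import tempfile

# Create a hypothetical sample file in a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demofile.txt")
with open(path, "w") as f:
    f.write("Hello! Welcome to demofile.txt\nThis file is for testing purposes.\nGood Luck!\n")

f = open(path, "r")    # "r" is the default mode
print(f.read(5))       # Hello: only the first 5 characters
print(f.readline())    # the rest of the first line
f.close()              # always close the file when you are done

# A with block closes the file for you
with open(path) as f:
    lines = [line for line in f]   # iterate line by line
print(len(lines))      # 3
```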
• To write to an existing file, you must add a parameter to the open() function: "a" appends to the end of the file, and "w" overwrites any existing content.
Delete a File
• To delete a file, you must import the os module, and run its os.remove() function:
To avoid getting an error, you might want to check if the file exists before you try to delete it:
Delete Folder
To delete an entire folder, use the os.rmdir() method. You can only remove empty folders.
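A sketch of creating, appending to, overwriting and deleting a file, then removing its (now empty) folder. The file name demofile2.txt is hypothetical, and a temporary folder is used so nothing outside it is touched.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                      # temporary folder for the demo
path = os.path.join(folder, "demofile2.txt")     # hypothetical file name

with open(path, "x") as f:    # "x": create, error if it already exists
    f.write("First line\n")

with open(path, "a") as f:    # "a": append to the end
    f.write("Now the file has more content!\n")

with open(path, "w") as f:    # "w": overwrite everything
    f.write("Woops! I have deleted the content!\n")

with open(path) as f:
    content = f.read()
print(content)

if os.path.exists(path):      # check first to avoid an error
    os.remove(path)
else:
    print("The file does not exist")

os.rmdir(folder)              # only works on an empty folder
```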
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Class
Python Classes/Objects
Create a Class
Create Object
The examples above are classes and objects in their simplest form, and are not really useful in real life applications.
To understand the meaning of classes we have to understand the built-in __init__() method.
All classes have a method called __init__(), which is always executed when an object of the class is being created.
Use the __init__() method to assign values to object properties, or to do other operations that are necessary when the object is being created:
Note: The __init__() method is called automatically every time the class is being used to create a new object.
The __str__() method controls what should be returned when the class object is represented as a string.
If the __str__() method is not defined, the default string representation of the object is returned (something like <__main__.Person object at 0x...>):
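A minimal sketch of __init__() and __str__(), with an empty class for comparison. The Person data is illustrative.

```python
class Person:
    def __init__(self, name, age):
        # runs automatically whenever a Person object is created
        self.name = name
        self.age = age

    def __str__(self):
        # used by print() and str()
        return f"{self.name}({self.age})"

class Plain:
    pass

p1 = Person("John", 36)
print(p1.name)    # John
print(p1)         # John(36)
print(Plain())    # default form, e.g. <__main__.Plain object at 0x...>
```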
Object Methods
Objects can also contain methods. Methods in objects are functions that belong to the object.
The self parameter is a reference to the current instance of the class, and is used to access variables that belong to the
class.
It does not have to be named self, you can call it whatever you like, but it has to be the first parameter of any function in
the class:
Delete Objects
You can delete properties on objects, or the objects themselves, by using the del keyword.
class definitions cannot be empty, but if you for some reason have a class definition with no content, put in the pass
statement to avoid getting an error.
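A sketch of an object method, of a first parameter named something other than self (allowed, although self is the convention), of del and of an empty class with pass. The names are illustrative.

```python
class Person:
    def __init__(mysillyobject, name, age):
        # the first parameter does not have to be called self
        mysillyobject.name = name
        mysillyobject.age = age

    def myfunc(abc):
        return "Hello my name is " + abc.name

p1 = Person("John", 36)
print(p1.myfunc())   # Hello my name is John

p1.age = 40          # modify a property
del p1.age           # delete a property
del p1               # delete the object itself

class Empty:
    pass             # an empty class body needs pass
```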
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Modules
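What is a Module?
Consider a module to be the same as a code library: a file containing a set of functions you want to include in your application.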
Create a Module
To create a module just save the code you want in a file with the file extension .py:
Use a Module
Now we can use the module we just created, by using the import statement:
Variables in Module
The module can contain functions, as already described, but also variables of all types (lists, dictionaries, objects, etc.):
Naming a Module
You can name the module file whatever you like, but it must have the file extension .py
Re-naming a Module
You can create an alias when you import a module, by using the as keyword:
Built-in Modules
There are several built-in modules in Python, which you can import whenever you like.
There is a built-in function to list all the function names (or variable names) in a module. The dir() function:
You can choose to import only parts from a module, by using the from keyword.
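A sketch of built-in modules, an alias with as, dir(), and importing only selected names with from. The platform and math modules are used here as illustrative built-ins.

```python
import platform
import math as m               # alias with "as"
from math import sqrt, pi      # import only some names

print(platform.system())       # e.g. Linux, Windows or Darwin
print(m.floor(3.7))            # 3
print(sqrt(16))                # 4.0
print("pi" in dir(m))          # True: dir() lists a module's names
```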
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Scope
A variable is only available from inside the region in which it is created. This is called scope.
Local Scope
A variable created inside a function belongs to the local scope of that function, and can only be used inside that
function.
As explained in the example above, the variable x is not available outside the function, but it is available for any function
inside the function:
Global Scope
A variable created in the main body of the Python code is a global variable and belongs to the global scope.
Global variables are available from within any scope, global and local.
Naming Variables
If you operate with the same variable name inside and outside of a function, Python will treat them as two separate
variables, one available in the global scope (outside the function) and one available in the local scope (inside the
function):
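A sketch of local scope, an inner function reading its enclosing function's variable, and a global variable with the same name; the values are illustrative.

```python
x = 300                  # global variable

def myfunc():
    x = 200              # a separate local variable with the same name
    def myinnerfunc():
        return x         # the inner function sees myfunc's local x
    return myinnerfunc()

def read_global():
    return x             # no local x here, so the global x is used

print(myfunc())          # 200
print(read_global())     # 300
print(x)                 # 300: the global x was not changed
```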
Global Keyword
If you need to create a global variable, but are stuck in the local scope, you can use the global keyword. The global
keyword makes the variable global.
Nonlocal Keyword
The nonlocal keyword is used to work with variables inside nested functions. The nonlocal keyword makes the variable
belong to the outer function.
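A minimal sketch of the global and nonlocal keywords. Without global, the line counter += 1 inside the function would raise UnboundLocalError.

```python
def make_global():
    global y              # create y in the global scope
    y = 300

make_global()
print(y)                  # 300

counter = 0

def increment():
    global counter        # modify the global counter
    counter += 1

increment()
print(counter)            # 1

def outer():
    name = "Jane"
    def inner():
        nonlocal name     # refer to outer's name, not a new local
        name = "hello"
    inner()
    return name

print(outer())            # hello
```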
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Inheritance
Inheritance allows us to define a class that inherits all the methods and properties from another class.
Parent class is the class being inherited from, also called base class.
Child class is the class that inherits from another class, also called derived class
Any class can be a parent class, so the syntax is the same as creating any other class:
To create a class that inherits the functionality from another class, send the parent class as a parameter when creating
the child class:
So far we have created a child class that inherits the properties and methods from its parent.
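We want to add the __init__() method to the child class (instead of the pass keyword).
Note: The __init__() method is called automatically every time the class is being used to create a new object. When you add an __init__() method to the child class, it overrides the parent's __init__() method, so the child no longer runs the parent's version unless it calls it explicitly.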
Python also has a super() function that will make the child class inherit all the methods and properties from its parent:
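A sketch of a parent class, a child class with its own __init__() that calls the parent's through super(), and a new method in the child. The Person/Student names and data are illustrative.

```python
class Person:
    def __init__(self, fname, lname):
        self.firstname = fname
        self.lastname = lname

    def printname(self):
        print(self.firstname, self.lastname)

class Student(Person):
    def __init__(self, fname, lname, year):
        super().__init__(fname, lname)   # run the parent's __init__()
        self.graduationyear = year       # add a property of its own

    def welcome(self):
        return f"Welcome {self.firstname} {self.lastname} to the class of {self.graduationyear}"

x = Student("Mike", "Olsen", 2019)
x.printname()        # Mike Olsen (inherited method)
print(x.welcome())
```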
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
Python Polymorphism
The word "polymorphism" means "many forms", and in programming it refers to methods/functions/operators with the
same name that can be executed on many objects or classes.
Function Polymorphism
An example of a Python function that can be used on different objects is the len() function.
String
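The len() function is a simple case of function polymorphism: the same call works on a string, a tuple and a dictionary (illustrative values).

```python
print(len("Hello World!"))                          # 12: characters in a string
print(len(("apple", "banana", "cherry")))           # 3: items in a tuple
print(len({"brand": "Ford", "model": "Mustang"}))   # 2: keys in a dictionary
```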
Class Polymorphism
Polymorphism is often used in Class methods, where we can have multiple classes with the same method name.
For example, say we have three classes: Car, Boat, and Plane, and they all have a method called move():
What about child classes that use the same method names? Can we use polymorphism there?
Yes. If we use the example above and make a parent class called Vehicle, and make Car, Boat, and Plane child classes of Vehicle, the child classes inherit the Vehicle methods, but can override them:
Child classes inherit the properties and methods from the parent class.
In the example above you can see that the Car class is empty, but it inherits brand, model, and move() from Vehicle.
The Boat and Plane classes also inherit brand, model, and move() from Vehicle, but they both override the move()
method.
Because of polymorphism we can execute the same method for all classes.
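A version of the Vehicle example described above: Car inherits everything, while Boat and Plane override move(). The brand and model values are illustrative, and move() returns a string here so the result can be checked.

```python
class Vehicle:
    def __init__(self, brand, model):
        self.brand = brand
        self.model = model

    def move(self):
        return "Move!"

class Car(Vehicle):
    pass                    # inherits brand, model and move()

class Boat(Vehicle):
    def move(self):
        return "Sail!"      # overrides move()

class Plane(Vehicle):
    def move(self):
        return "Fly!"       # overrides move()

vehicles = [Car("Ford", "Mustang"), Boat("Ibiza", "Touring 20"), Plane("Boeing", "747")]
for v in vehicles:
    print(v.brand, v.model, v.move())   # same call, different behaviour
```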
Please open the file above in a Databricks notebook or Microsoft Visual Studio for better code visualization and
execution [File is available in resource section inside the course].
[FREE] Python For Data Engineering 1 : Introduction To Python #Python #DataEngineering (youtube.com)
Reviews
Please consider leaving a review! After reading and using this book, why not share your thoughts on the site where you
purchased it? Your unbiased opinion will help potential readers make informed decisions, give Clever Studies valuable
feedback on our products, and provide our authors with insights on their work. Thank you!