Python All Module Notes
Module -1
Python Basic Concepts and Programming
• Variables, Keywords
• Statements and Expressions
• Operators, Precedence and Associativity
• Data Types, Indentation, Comments
• Reading Input, Print Output
• Type Conversions, The type( ) Function and Is Operator
• Control Flow Statements
— The if Decision Control Flow Statement,
— The if…else Decision Control Flow Statement
— The if…elif…else Decision Control Statement
— Nested if Statement
— The while Loop
— The for Loop
— The continue and break Statements
• Built-In Functions, Commonly Used Modules
• Function Definition and Calling the Function
• The return Statement and void Function
• Scope and Lifetime of Variables
• Default Parameters, Keyword Arguments
• *args and **kwargs, Command Line Arguments
Introduction
• Easy-to-learn − Python has few keywords, simple structure, and a clearly defined syntax.
This allows the student to pick up the language quickly.
• Easy-to-read − Python code is more clearly defined and visible to the eyes.
• A broad standard library − Python's bulk of the library is very portable and cross-
platform compatible on UNIX, Windows, and Macintosh.
• Interactive Mode − Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
• Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
• Scalable − Python provides a better structure and support for large programs than other
scripting languages.
Python has a rich set of libraries for purposes such as large-scale data processing,
predictive analytics, and scientific computing. The required packages can be downloaded
as needed. The free, open-source distribution Anaconda simplifies package management and
deployment, so readers are advised to install Anaconda from the link given below rather
than a plain Python installation.
https://anaconda.org/anaconda/python
A successful installation of Anaconda provides Python at the command prompt, the default
editor IDLE, and a browser-based interactive computing environment known as Jupyter
Notebook.
A program written in a high-level programming language has to be translated into machine
language using a translator: (1) an interpreter or (2) a compiler.
Figure: The interpreter interprets the source program immediately, so every line of code can
generate its output immediately; the compiler translates the source program into executable
files (with extensions .exe, .dll).
Interpreter | Compiler
Translates the program one statement at a time. | Scans the entire program and translates it as a whole into machine code.
It takes less time to analyze the source code, but the overall execution time is slower. | It takes more time to analyze the source code, but the overall execution time is comparatively faster.
Continues translating the program until the first error is met, in which case it stops. Hence debugging is easy. | It generates error messages only after scanning the whole program. Hence debugging is comparatively hard.
Programming languages like Python and Ruby use interpreters. | Programming languages like C and C++ use compilers.
Writing a program
• A program can be written using a text editor.
• The Python instructions are written into a file, which is called a script. By convention,
Python scripts have names that end with .py .
• To execute the script, you have to tell the Python interpreter the name of the file. In a
command window, you would type python hello.py as follows:
$ cat hello.py
print('Hello world!')
$ python hello.py
Hello world!
The “$” is the operating system prompt, and the “cat hello.py” is showing us that the file
“hello.py” has a one-line Python program to print a string. We call the Python interpreter
and tell it to read its source code from the file “hello.py” instead of prompting us for lines
of Python code.
Program Execution
Compilation
The program is converted into byte code. Byte code is a fixed set of instructions that represent
arithmetic, comparison, memory operations, etc. It can run on any operating system and
hardware. The byte code instructions are stored in a .pyc file. The compiler creates a
directory named __pycache__ where it stores the .pyc file.
Interpreter
The next step involves the interpreter executing the byte code (.pyc file). This step is
necessary because the processor understands only machine code (binary code), not byte code.
The Python Virtual Machine (PVM) takes the operating system and processor of the computer into
account and translates each byte code instruction into the corresponding machine code. These
machine code instructions are then executed by the processor and the results are displayed.
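The two steps described above can be observed from inside Python itself. The following is a minimal sketch, assuming the built-in compile() function and the standard dis module are used for the demonstration; the variable names and the "<demo>" file label are chosen only for illustration. The exact byte code listing differs between Python versions, so no listing is shown here.

```python
import dis

source = "total = 2 + 3\n"

# compile() turns source text into a code object that holds byte code.
code_object = compile(source, "<demo>", "exec")

# The raw byte code is a bytes object; its contents vary by Python version.
raw_bytecode = code_object.co_code

# dis.dis() prints a human-readable listing of the byte code instructions.
dis.dis(code_object)

# exec() hands the code object to the interpreter, which runs it.
namespace = {}
exec(code_object, namespace)
print(namespace["total"])
```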
Flow Controls
There are some low-level conceptual patterns that we use to construct programs. These
constructs are not just for Python programs, they are part of every programming language
from machine language up to the high-level languages. They are listed as follows:
• Sequential execution: Perform statements one after another in the order they are
encountered in the script.
• Conditional execution: Check for certain conditions and then execute or skip a sequence
of statements. (Ex: statements with if-elif-else)
• Repeated execution: Perform some set of statements repeatedly, usually with some
variation. (Ex: statements with for, while loop)
• Reuse: Write a set of instructions once and give them a name and then reuse those
instructions as needed throughout your program. (Ex: statements in functions)
Identifiers
• A Python identifier is a name used to identify a variable, function, class, module or other
object. An identifier starts with a letter A to Z or a to z or an underscore (_) followed by zero
or more letters, underscores and digits (0 to 9).
• Python does not allow punctuation characters such as @, $, and % within identifiers. Python
is a case sensitive programming language. Thus, Manpower and manpower are two different
identifiers in Python.
• Here are naming conventions for Python identifiers
— Class names start with an uppercase letter. All other identifiers start with a lowercase
letter.
— Starting an identifier with a single leading underscore indicates that the identifier is
private.
— Starting an identifier with two leading underscores indicates a strongly private identifier.
— If the identifier also ends with two trailing underscores, the identifier is a language-
defined special name.
Variables, Keywords
Variables
• A variable is a named place in the memory where a programmer can store data and later
retrieve the data using the variable name.
• An assignment statement creates new variables and gives them values.
• In Python, a variable need not be declared with a specific type before its usage. Its type
is decided by the value assigned to it.
Values and types
A value is one of the basic things a program works with, like a letter or a number.
The following example uses integer, string, and float values:
>>> print(4)
4
>>> message = 'And now for something completely different'
>>> n = 17
>>> pi = 3.1415926535897931
Keywords
Keywords are a list of reserved words that have predefined meaning. Keywords are special
vocabulary and cannot be used by programmers as identifiers for variables, functions, constants
or any other names. Attempting to use a keyword as an identifier name will cause an
error. The following table shows the Python keywords.
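The keyword list and the identifier rules can also be checked from code. This is a small sketch, assuming the standard keyword module and the string method isidentifier(); the helper is_valid_name is a hypothetical name introduced here for illustration. The keyword list depends on the Python version that is running, so it is printed rather than listed.

```python
import keyword

# keyword.kwlist holds the reserved words of the running Python version.
print(keyword.kwlist)

def is_valid_name(name):
    """Return True if name can be used as an identifier (hypothetical helper)."""
    # isidentifier() checks the character rules; iskeyword() rules out reserved words.
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_name("manpower"))   # letters only
print(is_valid_name("_total2"))    # underscore, letters and a digit
print(is_valid_name("2total"))     # starts with a digit
print(is_valid_name("pay$"))       # contains a punctuation character
print(is_valid_name("while"))      # reserved word
```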
Statement
• A statement is a unit of code that the Python interpreter can execute.
• We have seen two kinds of statements:
— assignment statement: We assign a value to a variable using the assignment operator
(=). An assignment statement consists of an expression on the right-hand side and a
variable on the left-hand side to store the result. Python also has a special feature for
multiple assignment, where more than one variable can be initialized in a single
statement.
Ex: s = "google"
x = 20 + y
a, b, c = 2, "B", 3.5
— print statement: print is a function which takes a string or variable as an argument and
displays it on the screen.
Following are the examples of statements –
>>> x=5 #assignment statement
>>> x=5+3 #assignment statement
>>> print(x) #printing statement
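end: By default, the print function advances to the next line after printing. The optional
argument end changes what is printed at the end. For instance, in the following code the
first two calls print on separate lines, while the last two print on the same line:
Code:
print("A")
print("B")
print("C", end=" ")
print("E")
Output:
A
B
C E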
Expressions
An expression is a combination of values, variables, and operators. A value all by itself is
considered an expression, and so is a variable, so the following are all legal expressions. If you
type an expression in interactive mode, the interpreter evaluates it and displays the result:
>>> x=5
>>> x+1
6
• Operators are special symbols that represent computations like addition and multiplication.
The values the operator is applied to are called operands.
• Here is a list of arithmetic operators:
Operator | Meaning | Example
+ | Addition | Sum = a + b
- | Subtraction | Diff = a - b
* | Multiplication | Prod = a * b
/ | Division | a = 4, b = 3, div = a / b (div will get a value 1.3333333)
// | Floor division: returns only the integral part of the quotient after division | F = a // b; A = 4 // 3 (A will get a value 1)
% | Modulus: remainder after division | A = a % b (remainder after dividing a by b)
** | Exponent | E = x ** y (means x to the power of y)
• Relational or Comparison Operators: are used to check the relationship (like less than,
greater than etc) between two operands. These operators return a Boolean value either True
or False.
• Assignment Operators: Apart from simple assignment operator = which is used for assigning
values to variables, Python provides compound assignment operators.
• For example,
Statement | Compound statement
x = x + y | x += y
y = y // 2 | y //= 2
• Logical Operators: The logical operators and, or, not are used for combining or negating the
logical values of their operands. Each operand is evaluated as either True or False. The not
operator always returns a Boolean value; when the operands are Boolean values, the operators
and and or also return True or False.
— Multiplication and Division are the next priority. Out of these two operations, whichever
comes first in the expression is evaluated.
— Addition and Subtraction are the least priority. Out of these two operations, whichever
appears first in the expression is evaluated i.e., they are evaluated from left to right .
Example : x = 1 + 2 ** 3 / 4 * 5
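The example expression can be worked through step by step. The sketch below assumes the usual Python rule that exponentiation is evaluated before multiplication and division; the step variables are introduced only to make each stage visible.

```python
# Exponentiation binds tightest, then / and * from left to right, then +.
x = 1 + 2 ** 3 / 4 * 5

step1 = 2 ** 3        # 8
step2 = step1 / 4     # 2.0 (the / operator always gives a float)
step3 = step2 * 5     # 10.0
step4 = 1 + step3     # 11.0

print(x, step4)
```

Both values are 11.0, which shows that the single expression is evaluated in the same order as the separate steps.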
Data Types
Data types specify the type of data like numbers and characters to be stored and manipulated
within a program.
Basic data types of Python are
• Numbers
• Boolean
• Strings
• list
• tuple
• dictionary
• None
• Numbers
Integers, floating-point numbers and complex numbers fall under the Python numbers category.
They are defined as the int, float and complex classes in Python. Integers can be of any length;
they are limited only by the memory available. A floating-point number is accurate up to about
15 significant decimal digits. An integer and a floating-point number are distinguished by the
decimal point: 1 is an integer, 1.0 is a floating-point number. Complex numbers are written in
the form x + yj, where x is the real part and y is the imaginary part.
• Boolean
Booleans may not seem very useful at first, but they are essential when you start using
conditional statements. A Boolean value is either True or False. The Boolean values True and
False are treated as reserved words.
• Strings
A string consists of a sequence of characters, which can include letters, numbers,
and other types of characters. A string can also contain spaces. You can use single quotes or
double quotes to represent strings; a string written this way is called a string literal.
Multiline strings can be denoted using triple quotes, ''' or """. String literals are fixed
values, not variables, that you literally provide in your script.
For example,
>>> s = 'This is single quote string'
>>> s = "This is double quote string"
>>> s = '''This
is Multiline
string'''
• List
A list is formed (or created) by placing all the items (elements) inside square brackets [ ],
separated by commas. It can have any number of items, and they may or may not be of different
types (integer, float, string, etc.).
Example: List1 = [3, 8, 7.2, "Hello"]
• Tuple
A tuple is defined as an ordered collection of Python objects. The main difference between a
tuple and a list is that tuples are immutable, i.e. a tuple can't be modified after it is
created. It is represented by the tuple class. We can write tuples using parentheses ( ).
Example: Tuple = (25, 10, 12.5, "Hello")
• Dictionary
A dictionary is a collection of data values used to store data like a map. Unlike other data
types that hold only a single value as an element, a dictionary holds key-value pairs, which
lets a value be looked up efficiently by its key. Since Python 3.7, a dictionary also preserves
the order in which its keys were inserted. In the representation of a dictionary, each key is
separated from its value by a colon (:), whereas the key-value pairs are separated by commas.
Example: Dict1 = {1 : 'Hello' , 2 : 5.5, 3 : 'World' }
• None
None is another special data type in Python. None is frequently used to represent the absence
of a value. For example, >>> money = None
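The data types listed above can be inspected with the built-in type() function. This is a short sketch that reuses the example values from this section; the names samples, type_names, point and tuple_error are introduced here only for illustration. It also shows that assigning to an element of a tuple raises a TypeError, which is what immutability means in practice.

```python
samples = [
    17,                                  # int
    3.14,                                # float
    2 + 3j,                              # complex
    True,                                # bool
    "Hello",                             # str
    [3, 8, 7.2, "Hello"],                # list
    (25, 10, 12.5, "Hello"),             # tuple
    {1: "Hello", 2: 5.5, 3: "World"},    # dict
    None,                                # NoneType
]

# type(value).__name__ gives the class name of each value as a string.
type_names = [type(value).__name__ for value in samples]
for value, name in zip(samples, type_names):
    print(repr(value), "is of type", name)

# Tuples are immutable: item assignment raises TypeError.
point = (25, 10)
try:
    point[0] = 99
except TypeError as error:
    tuple_error = str(error)
    print("Cannot modify a tuple:", tuple_error)
```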
Indentation
• In Python, programs get structured through indentation (FIGURE below).
• Any statements written under another statement with the same indentation are interpreted to
belong to the same code block. If the next statement has less indentation,
then it marks the end of the previous code block.
• In other words, if a code block has to be deeply nested, then the nested statements need to be
indented further to the right. In the above diagram, Block 2 and Block 3 are nested under
Block 1. Usually, four spaces are used for indentation and are preferred over tabs.
Incorrect indentation will result in an IndentationError.
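A small program makes the block structure concrete. This is only a sketch: the variable marks and its value are hypothetical, and the block labels in the comments follow the Block 1 / Block 2 / Block 3 naming used above.

```python
marks = 72  # hypothetical value

if marks >= 40:                  # Block 1 starts here
    result = "pass"
    if marks >= 70:              # Block 2, nested inside Block 1
        grade = "distinction"
    else:                        # Block 3, also nested inside Block 1
        grade = "ordinary"
else:
    result = "fail"
    grade = "none"

print(result, grade)             # no indentation: outside every block
```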
Comments
• As programs get bigger and more complicated, they get more difficult to read. Formal
languages are dense, and it is often difficult to look at a piece of code and figure
out what it is doing, or why.
• For this reason, it is a good idea to add notes to your programs to explain in natural language
what the program is doing. These notes are called comments, and in Python they start with
the # symbol:
Ex1. #This is a single-line comment
Ex2. ''' This is a
multiline
comment '''
• Comments are most useful when they document non-obvious features of the code. It is
reasonable to assume that the reader can figure out what the code does; it is much more
useful to explain why.
Reading Input
Python provides a built-in function called input that gets input from the keyboard. When this
function is called, the program waits for the user input. When the user presses the Enter key,
the program resumes and input returns what the user typed as a string.
For example
>>> inp = input()
Welcome to world of python
>>> print(inp)
Welcome to world of python
• It is a good idea to have a prompt message telling the user about what to enter as a value.
You can pass that prompt message as an argument to input function.
>>>x=input('Please enter some text:\n')
Please enter some text:
Roopa
>>> print(x)
Roopa
• The sequence \n at the end of the prompt represents a newline, which is a special character
that causes a line break. That’s why the user’s input appears below the prompt.
• If you expect the user to type an integer, you can try to convert the return value to int using
the int() function:
Example1:
>>> prompt = 'How many days in a week?\n'
>>> days = input(prompt)
How many days in a week?
7
>>> type(days)
<class 'str'>
Example 2:
>>> x=int(input('enter number\n'))
enter number
12
>>> type(x)
<class 'int'>
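If the user types something that is not a number, int() raises a ValueError. The following sketch shows one way to handle that; the helper name to_int and its default parameter are hypothetical and are not part of these notes. It also shows the float() and str() conversions.

```python
def to_int(text, default=None):
    """Convert text to int; return default if text is not a whole number.

    Hypothetical helper, shown only for illustration.
    """
    try:
        return int(text)
    except ValueError:
        return default

days = to_int("7")          # a string of digits converts cleanly
bad = to_int("seven")       # not a number, so the default (None) comes back
price = float("12.5")       # string to float
label = str(42)             # number to string

print(days, bad, price, label)
print(type(days), type(price), type(label))
```

In a real program the text would come from input(); fixed strings are used here so that the example runs without a keyboard.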
Print Output
Format operator
• The format operator, % allows us to construct strings, replacing parts of the strings with the
data stored in variables.
• When applied to integers, % is the modulus operator. But when the first operand is a string,
% is the format operator.
• For example, the format sequence “%d” means that the operand should be formatted as an
integer (d stands for “decimal”):
Example 1:
>>> camels = 42
>>>'%d' % camels
'42'
• A format sequence can appear anywhere in the string, so you can embed a value in a
sentence:
Example 2 :
>>> camels = 42
>>> 'I have spotted %d camels.' % camels
'I have spotted 42 camels.'
• If there is more than one format sequence in the string, the second argument has to be a
tuple. Each format sequence is matched with an element of the tuple, in order.
• The following example uses “%d” to format an integer, “%g” to format a floating point number
, and “%s” to format a string:
Example 3:
>>> 'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')
'In 3 years I have spotted 0.1 camels.'
Format function
format(): This is one of the string formatting methods in Python 3, which allows multiple
substitutions and value formatting. This method lets us insert values into a string
through positional formatting.
Two types of parameters:
— positional_argument
— keyword_argument
• Positional argument: It can be an integer, a floating-point numeric constant, a string, a
character, or even a variable.
• Keyword argument: It is essentially a variable storing some value, which is passed as a
parameter.
Use the index numbers of the values to change the order that they appear in the string:
>>> print("Every {3} should know the use of {2} {1} programming and {0}".format("programmer", "Open", "Source", "Operating Systems"))
Every Operating Systems should know the use of Source Open programming and programmer
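The two kinds of parameters can be compared side by side. This is a minimal sketch; the values and the field names name and marks are hypothetical and chosen only for illustration.

```python
# Positional fields are filled in order; numbered fields pick an argument by index.
in_order = "{} scored {}".format("Roopa", 91)
by_index = "{1} scored {0}".format(91, "Roopa")

# Keyword fields are filled from the named arguments.
by_name = "{name} scored {marks}".format(name="Roopa", marks=91)

# A format specification after the colon controls presentation (two decimals here).
rounded = "Average: {:.2f}".format(88.4567)

print(in_order)
print(by_index)
print(by_name)
print(rounded)
```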
f-strings
Formatted strings or f-strings were introduced in Python 3.6. An f-string is a string literal
that is prefixed with "f". These strings may contain replacement fields, which are expressions
enclosed within curly braces {}. The expressions are replaced with their values.
Example:
>>> a = 10
>>> print(f"the value is {a}")
the value is 10
Is Operator
• If we run these assignment statements:
a = 'banana'
b = 'banana'
Figure: (a)
• We know that a and b both refer to a string, but we don't know whether they refer to the
same string. There are two possible states, shown in Figure (a).
• In one case, a and b refer to two different objects that have the same value. In the second
case, they refer to the same object. That is, a is an alias name for b and vice versa. In other
words, the two names refer to the same memory location.
• To check whether two variables refer to the same object, you can use the is operator.
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
• When two variables refer to the same object, they are called identical objects.
• When two variables refer to different objects that contain the same value, they are
known as equivalent objects. For example, if the user enters the same text for both strings:
>>> s1 = input("Enter a string:")
>>> s2 = input("Enter a string:")
>>> s1 is s2 #check s1 and s2 are identical
False
>>> s1 == s2 #check s1 and s2 are equivalent
True
• If two objects are identical, they are also equivalent, but if they are equivalent, they are not
necessarily identical.
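The difference between identical and equivalent objects is easiest to see with lists, because every list literal creates a new object. This is a small sketch; the variable names are chosen only for illustration.

```python
a = [1, 2, 3]
b = [1, 2, 3]    # a separate list with the same contents
c = a            # c is an alias: another name for the same list object

equivalent = (a == b)   # same value
identical = (a is b)    # same object?
alias = (c is a)        # same object?

print(equivalent, identical, alias)

# Changing the list through one alias is visible through the other name.
c.append(4)
print(a)
```

Here a and b are equivalent but not identical, whereas a and c are identical, so the append through c also changes a.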
A conditional statement gives the developer the ability to check conditions and change the
behaviour of the program accordingly. The simplest form is the if statement:
1) The if Decision Control Flow Statement
Syntax:
if condition:
    statement 1
    statement 2
    ……………..
    statement n
Example:
x = 1
if x > 0:
    print("positive number")
Output:
positive number
2) The if…else Decision Control Flow Statement
Syntax:
if condition:
    statements
else:
    statements
Example:
x = 6
if x % 2 == 0:
    print('x is even')
else:
    print('x is odd')
Figure: if-Then-Else Logic
3) The if…elif…else Decision Control Statement
Syntax:
if condition1:
    Statement
elif condition2:
    Statement
....................
elif condition_n:
    Statement
else:
    Statement
Example:
x = 1
y = 6
if x < y:
    print('x is less than y')
elif x > y:
    print('x is greater than y')
else:
    print('x and y are equal')
Output:
x is less than y
4) Nested if Statement
• The conditional statements can be nested. That is, one set of conditional statements can be
nested inside the other.
• Let us consider an example, the outer conditional statement contains two branches.
Example:
x = 3
y = 4
if x == y:
    print('x and y are equal')
else:
    if x < y:
        print('x is less than y')
    else:
        print('x is greater than y')
Output:
x is less than y
• Nested conditionals make the code difficult to read, even when the indentation is
proper. Hence, it is advisable to use logical operators such as and to simplify nested
conditionals.
• If Python detects that there is nothing to be gained by evaluating the rest of a logical
expression, it stops its evaluation and does not do the computations in the rest of the logical
expression. When the evaluation of a logical expression stops because the overall value is
already known, it is called short-circuiting the evaluation.
• However, if the first part of a logical expression using and results in True, then the second
part has to be evaluated to know the overall result. Short-circuiting not only saves
computation time, but also leads to a technique known as the guardian pattern.
Example 2
>>> x = 6
>>> y = 0
>>> x >= 2 and y != 0 and (x/y) > 2
False
Example 3
>>> x >= 2 and (x/y) > 2 and y != 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
• In the first logical expression, x >= 2 is False so the evaluation stops at the first condition
itself.
• In the second logical expression, x >= 2 is True but y != 0 is False so it never reaches the
condition (x/y) > 2.
• In the third logical expression, y != 0 is placed after the (x/y) > 2 condition so the
expression fails with an error.
• In the second expression, we say that y != 0 acts as a guard to ensure that we only execute
(x/y) if y is non-zero.
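The guardian pattern can be wrapped in a function so that all three cases are tried together. This is a sketch under stated assumptions: the function name ratio_exceeds_two is hypothetical, and the first call uses x = 1 as an assumed value for the case in which x >= 2 is False.

```python
def ratio_exceeds_two(x, y):
    """Return True when x >= 2 and x / y > 2 (hypothetical helper).

    The test y != 0 is the guard: it is placed before the division.
    """
    return x >= 2 and y != 0 and (x / y) > 2

first = ratio_exceeds_two(1, 0)    # stops at x >= 2
second = ratio_exceeds_two(6, 0)   # stops at the guard y != 0
third = ratio_exceeds_two(6, 2)    # every part is evaluated: 3.0 > 2

# Without the guard in front, the division is attempted and fails.
x, y = 6, 0
try:
    unguarded = x >= 2 and (x / y) > 2 and y != 0
except ZeroDivisionError:
    unguarded = "ZeroDivisionError"

print(first, second, third, unguarded)
```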
Iteration repeats the execution of a sequence of code. Iteration is useful for solving many
programming problems. Iteration and conditional execution form the basis for algorithm
construction.
In the above example, the variable i is initialized to 1. Then the condition i <= 5 is checked.
If the condition is true, the block of code containing the print statement print(i) and the
increment statement (i = i + 1) is executed. After these two lines, the condition is checked
again. The procedure continues till the condition becomes false, that is, when i becomes 6.
Now, the while loop is terminated and the next statement after the loop will be executed. Thus,
in this example, the loop body is executed five times.
Also notice that the variable i is initialized before starting the loop and is incremented
inside the loop. Such a variable, which changes its value on every iteration and controls the
total execution of the loop, is called an iteration variable or counter variable. If the counter
variable is not updated properly within the loop, then the loop may become an infinite loop.
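A loop matching this description can be sketched as follows. It is a reconstruction from the explanation above, not necessarily the exact program the notes refer to; the list printed is a hypothetical addition that only records what was displayed.

```python
i = 1                  # initialize the counter before the loop
printed = []           # hypothetical list, kept so the result can be inspected

while i <= 5:          # the condition is checked before every iteration
    print(i)
    printed.append(i)
    i = i + 1          # update the counter, otherwise the loop never ends

print("Loop finished, i =", i)
```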
• Here, the condition is always True, so the loop never terminates. Sometimes the
condition is written in such a way that it never becomes false, which prevents program
control from leaving the loop. This situation may arise either from a wrong
condition or from not updating the counter variable.
• To overcome this situation, the break statement is used. The break statement can be
used to break out of a for or while loop before the loop is finished.
• Here is a program that lets the user keep entering numbers. The user can stop
by entering a negative number.
while True:
    num = int(input('Enter a number: '))
    if num < 0:
        break
    print(num)
Output:
Enter a number: 23
23
Enter a number: 34
34
Enter a number: 56
56
Enter a number: -12
• In the above example, observe that the condition is kept inside the loop in such a way that
the loop terminates if the user input is a negative number. This means that the loop may
terminate after just one iteration (if the user gives a negative number the very first time) or
it may take thousands of iterations (if the user keeps giving only positive numbers as input).
Hence, the number of iterations here is unpredictable. But we are making sure that it will not
be an infinite loop, since the user can end it at any time.
All the lines are printed except the one that starts with the hash sign because when the
continue is executed, it ends the current iteration and jumps back to the while statement to
start the next iteration, thus skipping the print statement.
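The behaviour of continue can be tried without keyboard input by looping over a fixed list. This is a sketch only: the list lines stands in for what a user might type, and the word done is assumed here as the signal to stop.

```python
# Hypothetical input lines standing in for what a user might type.
lines = ["hello there", "# don't print this", "print this!", "done", "never reached"]

printed = []
index = 0
while index < len(lines):
    line = lines[index]
    index = index + 1         # update the counter before any continue
    if line.startswith("#"):
        continue              # skip the rest of this iteration
    if line == "done":
        break                 # leave the loop entirely
    print(line)
    printed.append(line)

print("Finished")
```

The counter is updated before the continue statement on purpose; if it were updated at the bottom of the loop, continue would skip the update and the loop would never end.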
Definite loops using for
• Sometimes we want to loop through a set of things such as a list of words, the lines in a file,
or a list of numbers. When we have a list of things to loop through, we can construct a
definite loop using a for statement.
• for statement loops through a known set of items so it runs through as many iterations as
there are items in the set.
• There are two versions in for loop:
— for loop with sequence
— for loop with range( ) function
— In the example, the variable friends is a list of three strings and the for loop goes through
the list and executes the body once for each of the three strings in the list.
— name is the iteration variable for the for loop. The variable name changes for each iteration
of the loop and controls when the for loop completes. The iteration variable steps
successively through the three strings stored in the friends variable.
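A for loop over a sequence, as described above, can be sketched like this. The three names in the list and the greeting text are hypothetical; only the variable names friends and name come from the explanation.

```python
friends = ["Joseph", "Glenn", "Sally"]   # hypothetical names

greetings = []
for name in friends:                     # name takes each string in turn
    greeting = "Happy New Year: " + name
    print(greeting)
    greetings.append(greeting)

print("Done!")
```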
• The for loop can be used to print (or extract) all the characters in a string as shown below:
for i in "Hello":
    print(i, end='\t')
Output:
H e l l o
The range() function takes up to three arguments: range(start, end, steps). The arguments start
and end indicate the starting and ending values of the sequence, where end is
excluded from the sequence (that is, the sequence goes up to end-1). The default value of start
is 0. The argument steps indicates the increment/decrement between the values of the sequence,
with a default value of 1. Hence, the arguments start and steps are optional. Let us consider a
few examples of the usage of the range() function.
EX:1 Program code to print the message multiple times
for i in range(3):
    print('Hello')
Output:
Hello
Hello
Hello
EX:2 Program code to print the numbers in sequence. Here the iteration variable i takes the
values from 0 to 4, excluding 5. In each iteration the value of i is printed.
for i in range(5):
    print(i, end="\t")
Output:
0 1 2 3 4
EX:3 Program code to count down from 5 to 1 using a negative step
for i in range(5, 0, -1):
    print(i)
print('Blast off!!')
Output:
5
4
3
2
1
Blast off!!
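The values produced by range() can be seen directly by converting the result to a list. This is a short sketch; the variable names are chosen only for illustration.

```python
default_start = list(range(5))           # start defaults to 0
with_start = list(range(2, 6))           # end value 6 is excluded
with_step = list(range(1, 10, 3))        # increment of 3
counting_down = list(range(5, 0, -1))    # negative step counts down

print(default_start)
print(with_start)
print(with_step)
print(counting_down)
```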
Functions
Function types
• Built in functions
• User defined functions
Built-in functions
• Python provides a number of important built in functions that we can use without needing
to provide the function definition.
• Built-in functions are ready to use functions.
• The general form of built-in functions: function_name(arguments)
An argument is an expression that appears between the parentheses of a function call; multiple
arguments are separated by commas.
Random numbers
• Most of the programs that we write take predefined input values and produce expected
output values. Such programs are said to be deterministic. Determinism is usually a good
thing, since we expect the same calculation to yield the same result.
• But this is not always the case: for some applications, we want the computer to be
unpredictable. Games are an obvious example, but there are many more.
• Making a program truly nondeterministic turns out to be not so easy, but there are ways to
make it at least seem nondeterministic. One of them is to use algorithms that generate
pseudorandom numbers.
• Python has a module called random, in which functions related to random numbers are
available.
• The function random() returns a random float between 0.0 and 1.0 (including 0.0 but not
1.0). Other functions in the module return random integers from a given range, such as 1 to 100.
• For example, a program can call random() in a loop to generate 5 random numbers, each
between 0.0 and 1.0 but not including 1.0.
• The function randint() takes the parameters low and high, and returns an integer between
low and high (including both).
>>> import random
>>> random.randint(5,10)
10
>>> random.randint(5,10)
6
>>> random.randint(5,10)
7
• To choose an element from a sequence at random, you can use choice():
>>> t = [1, 2, 3]
>>> random.choice(t)
2
>>> random.choice(t)
3
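A program that generates 5 random numbers with random(), as mentioned earlier in this section, can be sketched as follows. The seed value 42 and the list of colours are assumptions made for this illustration; seeding makes the pseudorandom sequence repeat from run to run, which is useful while testing.

```python
import random

random.seed(42)   # hypothetical seed, only to make the run repeatable

samples = []
for _ in range(5):
    value = random.random()      # float in the range 0.0 <= value < 1.0
    samples.append(value)
    print(value)

die = random.randint(1, 6)                         # both 1 and 6 are possible
colour = random.choice(["red", "green", "blue"])   # one element of the list
print(die, colour)
```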
• The first line in the function, def fname(arg_list), is known as the function header/definition.
The remaining lines constitute the function body.
• The function header is terminated by a colon and the function body must be indented.
• The function body ends where the indentation ends.
• Unlike a few other programming languages like C, C++ etc, there is no main() function or
specific location where a user-defined function has to be called.
• The programmer has to invoke (call) the function wherever required.
• Consider a simple example of user-defined function –
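A minimal user-defined function might look like the sketch below. The function name greet and its parameter are hypothetical; the comments point out the header, the body, and where the body ends.

```python
def greet(name):                  # function header: def, name, parameters, colon
    """Print a greeting (hypothetical example function)."""
    message = "Hello, " + name    # indented lines form the function body
    print(message)

# The indentation has ended, so these lines are outside the function.
greet("Roopa")
greet("Python")
```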
Function calls
• A function is a named sequence of instructions for performing a task.
• When we define a function we will give a valid name to it, and then specify the instructions
for performing required task. Then, whenever we want to do that task, a function is called by
its name.
Consider an example,
>>> type(33)
<class 'int'>
• Here, type function is called to know the datatype of the value. The expression in
parenthesis is called the argument of the function. The argument is a value or variable that
we are passing into the function as input to the function.
• It is common to say that a function “takes” an argument and “returns” a result. The result is
called the return value.
The return Statement and void Function
• A function that performs some task, but does not return any value to the calling function, is
known as a void function. The examples of user-defined functions considered till now are void
functions.
• A function which returns some result to the calling function after performing a task is
known as a fruitful function. The built-in functions like mathematical functions, random
number generating functions etc. that have been considered earlier are examples of fruitful
functions.
• One can write a user-defined function so as to return a value to the calling function as
shown in the following example.
— In the above example, the function addition() takes two arguments and returns their sum to
the receiving variable x.
— When a function returns something and the value is not received in some variable, the
return value will not be available later.
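The addition() function described above can be sketched as follows. This is a reconstruction from the description; the argument values are hypothetical.

```python
def addition(a, b):
    """Return the sum of a and b."""
    total = a + b
    return total         # hands the result back to the caller

x = addition(3, 4)       # the return value is received in x
print(x)

addition(10, 20)         # the return value is not stored, so it is lost
```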
• Built-in functions that yield results are fruitful functions.
>>> import math
>>> math.sqrt(2)
1.4142135623730951
• A void function might display something on the screen or have some other effect. Void
functions perform an action but they don't have a return value.
Consider an example
>>> result=print('python')
python
>>> print(result)
None
>>> print(type(None))
<class 'NoneType'>
Scope and Lifetime of Variables
• The scope of a variable determines the portion of the program where you can access a
particular identifier. There are two basic scopes of variables in Python:
— Global variables
— Local variables
• Local variables can be accessed only inside the function in which they are
declared, whereas global variables can be accessed throughout the program body by all
functions. When you call a function, the variables declared inside it are brought into scope.
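The difference between the two scopes can be demonstrated with a short program. This is a sketch; all names in it are hypothetical. The local variable exists only while the function is running, so using it outside the function raises a NameError.

```python
count = 10                        # global variable

def show_scope():
    local_total = count + 5       # a function can read a global variable
    count_copy = 99               # local variable, exists only during the call
    return local_total, count_copy

result = show_scope()
print(result)

try:
    print(count_copy)             # the local name is not visible here
except NameError:
    outside_access = "NameError"
    print("count_copy is not defined outside the function")
```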
Default Parameters
• For some functions, you may want to make some parameters optional and use default values
in case the user does not want to provide values for them. This is done with the help of
default argument values.
• Default argument values can be specified for parameters by appending to the parameter
name in the function definition the assignment operator ( = ) followed by the default value.
How It Works
The function named say is used to print a string as many times as specified. If we don't supply a
value, then by default, the string is printed just once. We achieve this by specifying a default
argument value of 1 to the parameter times . In the first usage of say , we supply only the string
and it prints the string once. In the second usage of say , we supply both the string and an
argument 5 stating that we want to say the string message 5 times.
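The say function described above can be sketched as follows. It is reconstructed from the explanation, so the exact strings passed in the two calls are assumptions.

```python
def say(message, times=1):
    # times has a default value of 1, so it may be omitted in the call
    print(message * times)

say('Hello')         # uses the default: printed once
say('World', 5)      # printed 5 times
```

Only parameters at the end of the parameter list can be given default values, because arguments are matched to parameters by position.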
Keyword Arguments
• If you have some functions with many parameters and you want to specify only some of
them, then you can give values for such parameters by naming them. These are called
keyword arguments: we use the name (keyword) instead of the position to specify the
arguments to the function.
• There are two advantages:
— One, using the function is easier since we do not need to worry about the order of the arguments.
— Two, we can give values to only those parameters we want to, provided that the other
parameters have default argument values.
def func(a, b=5, c=10):
    print('a is', a, 'and b is', b, 'and c is', c)

func(3, 7)
func(25, c=24)
func(c=50, a=100)
Output:
a is 3 and b is 7 and c is 10
a is 25 and b is 5 and c is 24
a is 100 and b is 5 and c is 50
How It Works
The function named func has one parameter without a default argument value, followed by two
parameters with default argument values. In the first usage, func(3, 7), the parameter a gets the
value 3, the parameter b gets the value 7, and c gets the default value of 10. In the second
usage, func(25, c=24), the parameter a gets the value 25 due to the position of the argument.
Then, the parameter c gets the value 24 due to naming, i.e. keyword arguments. The parameter b
gets the default value of 5. In the third usage, func(c=50, a=100), we use keyword arguments for
all specified values. Notice that we are specifying the value for parameter c before that for a even
though a is defined before c in the function definition.
Note: In the statement block of the function definition, * and ** are not used with args and kwargs;
the asterisks appear only in the parameter list.
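The note can be illustrated with a small sketch. The function name describe and the argument values are hypothetical:

```python
def describe(*args, **kwargs):
    # Inside the body, args is a tuple and kwargs is a dictionary;
    # they are used here without the asterisks.
    return len(args), sorted(kwargs)

counts = describe(1, 2, 3, name='roopa', dept='CS')
print(counts)       # (3, ['dept', 'name'])
```

Extra positional arguments are collected into args, and extra keyword arguments are collected into kwargs.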
sqrtcmdline.py
from sys import argv
from math import sqrt

if len(argv) < 3:
    print('Supply range of values')
else:
    for n in range(int(argv[1]), int(argv[2]) + 1):
        print(n, sqrt(n))
Output:
C:\Code>python sqrtcmdline.py 2 5
2 1.4142135623730951
3 1.7320508075688772
4 2.0
5 2.23606797749979
Module -2
Python Collection Objects, Classes
Strings
– Creating and Storing Strings,
– Basic String Operations,
– Accessing Characters in String by Index Number,
– String Slicing and Joining,
– String Methods,
– Formatting Strings,
Lists
– Creating Lists,
– Basic List Operations,
– Indexing and Slicing in Lists,
– Built-In Functions Used on Lists,
– List Methods
– Sets, Tuples and Dictionaries.
Files
– reading and writing files.
Class
– Class Definition
– Constructors
– Inheritance
– Overloading
Strings
>>> S='hello'
>>> Str="hello"
>>> M="""This is a multiline
String across two lines"""
• Sometimes we may want a string that contains a backslash without having it treated as an
escape character. Such strings are called raw strings. In Python, a raw string is
created by prefixing a string literal with 'r' or 'R'. A raw string treats the backslash (\) as a
literal character.
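A short sketch of the difference, using a made-up Windows-style path as the example value:

```python
normal = 'C:\new\table'     # \n and \t are interpreted as newline and tab
raw = r'C:\new\table'       # every backslash is kept as a literal character

print(len(normal))          # 10
print(len(raw))             # 12
print(raw)                  # C:\new\table
```

The two literals look the same in the source code, but the normal string is two characters shorter because each escape sequence becomes a single character.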
The in operator
• The in operator of Python is a Boolean operator which takes two string operands.
• It returns True if the first operand appears as a substring in the second operand; otherwise it
returns False.
Ex:1
if 'pa' in "roopa":
    print('Your string contains "pa".')
Ex:2
if ';' not in "roopa":
    print('Your string does not contain any semicolons.')
Ex:3 We can avoid writing longer code like this:
if t=='a' or t=='e' or t=='i' or t=='o' or t=='u':
Instead, we can write the same test with the in operator:
if t in 'aeiou':
String comparison
• Basic comparison operators like < (less than), > (greater than), == (equals) etc. can be
applied on string objects.
• Such comparison results in a Boolean value True or False.
• Internally, such comparison happens character by character, using the character codes (the
ASCII values, for English letters, digits and punctuation) of the respective characters.
• ASCII values for some common characters:
A – Z : 65 – 90
a – z : 97 – 122
0 – 9 : 48 – 57
Space : 32
Enter Key : 13
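The built-in function ord() returns the code of a character, which makes it possible to check the values listed above. The strings compared here are arbitrary examples:

```python
print(ord('A'), ord('a'), ord('0'), ord(' '))   # 65 97 48 32

upper_first = 'Zebra' < 'apple'    # True: 'Z' (90) is less than 'a' (97)
prefix_first = 'app' < 'apple'     # True: a prefix sorts before the longer string
same = 'hello' == 'hello'          # True
print(upper_first, prefix_first, same)
```

Because all uppercase letters have smaller codes than lowercase letters, an uppercase word sorts before a lowercase one.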
This loop traverses the string and displays each letter on a line by itself. The loop condition is
index < len(fruit), so when index is equal to the length of the string, the condition is false, and
the body of the loop is not executed. The last character accessed is the one with the index
len(fruit)-1, which is the last character in the string.
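The loop described in this paragraph is not reproduced in these notes. A sketch that matches the description, assuming the variable names fruit and index and an example string, could be:

```python
fruit = "grapes"
index = 0
letters = []                  # collected only so the result can be inspected
while index < len(fruit):     # false once index equals len(fruit)
    letter = fruit[index]
    print(letter)             # each letter on a line by itself
    letters.append(letter)
    index = index + 1
```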
• Another way to write a traversal is with a for loop:
fruit="grapes"
for char in fruit:
    print(char,end="\t")
Output:
g	r	a	p	e	s
Each time through the loop, the next character in the string is assigned to the variable char.
The loop continues until no characters are left.
Accessing Characters in String by Index Number
• We can get at any single character in a string using an index specified in square brackets.
The index of the first character is 0.
• Python supports negative indexing of string starting from the end of the string as shown
below:
character g o o d m o r n i n g
index -12 -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
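Positive and negative indexing can be compared on the same string. The variable name greeting is chosen only for illustration:

```python
greeting = "good morning"

print(greeting[0])      # g  (first character)
print(greeting[5])      # m
print(greeting[-1])     # g  (last character)
print(greeting[-12])    # g  (same character as greeting[0])
```

For a string of length n, the index -k refers to the same character as the index n-k.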
• Only a required number of characters can be extracted from a string using the colon (:)
symbol. This is called slicing, and its general form is string[start:end:step]
Where,
start : the position from where the slice starts
end : the position where it stops (the character at the end position is excluded)
step : also known as stride, the number of positions to advance after extracting each
character. The default value of stride is 1.
• If start is not mentioned, it means that the slice should start from the beginning of the string.
• If end is not mentioned, it indicates that the slice should run till the end of the string.
• If both are not mentioned, it indicates that the slice should run from the beginning till the
end of the string.
Examples: s = "abcdefghij"
index 0 1 2 3 4 5 6 7 8 9
characters a b c d e f g h i j
Reverse index -10 -9 -8 -7 -6 -5 -4 -3 -2 -1
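A few slices of this string, written as a sketch with the results shown in the comments:

```python
s = "abcdefghij"

print(s[2:5])     # cde         (index 2 up to, but not including, 5)
print(s[:3])      # abc         (start omitted)
print(s[7:])      # hij         (end omitted)
print(s[:])       # abcdefghij  (both omitted)
print(s[1:8:2])   # bdfh        (every second character)
print(s[-3:])     # hij         (last three characters)
print(s[::-1])    # jihgfedcba  (negative step reverses the string)
```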
From examples of this kind, one can understand the power of string slicing. Slicing is a
powerful tool of Python which simplifies many tasks pertaining to sequence types like
strings, lists and tuples.
String Methods
• Strings are an example of Python objects.
• An object contains both data (the actual string itself) and methods, which are effectively
functions that are built into the object and are available to any instance of the object.
• Python provides a rich set of built-in classes for various purposes. Each class is enriched
with a useful set of utility functions and variables that can be used by a Programmer.
• The built-in set of members of any class can be accessed using the dot operator as shown–
objName.memberMethod(arguments)
• The dot operator always binds the member name with the respective object name. This is very
essential because, there is a chance that more than one class has members with same name.
To avoid that conflict, almost all Object-oriented languages have been designed with this
common syntax of using dot operator.
• Python has a function called dir which lists the methods available for an object. The type
function shows the type of an object and the dir function shows the available methods.
— The methods are usually called using the object name. This is known as method
invocation. We say that a method is invoked using an object.
lstrip() : removes whitespace at the left side of the string
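The type function, the dir function and method invocation with the dot operator can be tried together. The string value below is an arbitrary example:

```python
stuff = '   Hello World  '

print(type(stuff))                 # <class 'str'>
print('lstrip' in dir(stuff))      # True: lstrip is one of the str methods

left_trimmed = stuff.lstrip()      # method invocation using the dot operator
print(repr(left_trimmed))          # 'Hello World  '
print(repr(stuff))                 # '   Hello World  ' (the original is unchanged)
```

String methods return a new string; the original string is never modified.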
Formatting Strings
Strings can be formatted in different ways, using :
• format operator
• f-string
• format( ) function
Format operator
• The format operator, % allows us to construct strings, replacing parts of the strings with the
data stored in variables.
• When applied to integers, % is the modulus operator. But when the first operand is a string,
% is the format operator.
• For example, the format sequence “%d” means that the operand should be formatted as an
integer (d stands for “decimal”):
Example 1:
>>> camels = 42
>>> '%d' % camels
'42'
• A format sequence can appear anywhere in the string, so you can embed a value in a
sentence:
Example 2 :
>>> camels = 42
>>> 'I have spotted %d camels.' % camels
'I have spotted 42 camels.'
• If there is more than one format sequence in the string, the second argument has to be a
tuple. Each format sequence is matched with an element of the tuple, in order.
• The following example uses “%d” to format an integer, “%g” to format a floating point number,
and “%s” to format a string:
Example 3:
>>> 'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')
'In 3 years I have spotted 0.1 camels.'
f-strings
Formatted strings, or f-strings, were introduced in Python 3.6. An f-string is a string literal that is
prefixed with “f”. These strings may contain replacement fields, which are expressions enclosed
within curly braces {}. The expressions are replaced with their values.
Example :
>>> a = 10
>>> print(f"the value is {a}")
the value is 10
Format function
format() : one of the string formatting methods in Python 3, which allows multiple
substitutions and value formatting. This method lets us insert values into a string
through positional formatting.
Two types of parameters:
— positional_argument
— keyword_argument
• Positional argument: It can be an integer, a floating point numeric constant, a string, a character
or even a variable.
• Keyword argument: It is essentially a variable storing some value, which is passed as a
parameter.
• Use the index numbers of the values to change the order in which they appear in the string:
>>> print("Every {3} should know the use of {2} {1} programming and {0}".format("programmer", "Open", "Source", "Operating Systems"))
Every Operating Systems should know the use of Source Open programming and programmer
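Keyword arguments work in a similar way, with names in the braces instead of index numbers. The field names name and marks below are hypothetical:

```python
# Empty braces are filled in order; named fields are filled by keyword.
by_position = "{} scored {}".format("roopa", 92)
by_keyword = "{name} scored {marks:.1f}".format(name="roopa", marks=92)

print(by_position)    # roopa scored 92
print(by_keyword)     # roopa scored 92.0
```

The part after the colon inside the braces is a format specification; here .1f asks for one digit after the decimal point.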
Lists
A list is a sequence
• A list is an ordered sequence of values.
• It is a data structure in Python. The values inside a list can be of any type (like
integers, floats, strings, lists, tuples, dictionaries etc.) and are called elements or
items.
• The elements of lists are enclosed within square brackets.
Creating Lists
There are various ways of creating list:
• Creating a simple list:
L = [1,2,3]
Use square brackets to indicate the start and end of the list, and separate the items
by commas.
• An empty list is the list equivalent of 0 or ''. The empty list [ ] can be created using the list
function or empty square brackets.
a = [ ]
l=list()
• Long lists: If you have a long list to enter, you can split it across several lines, like
below:
nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,]
• We can use eval(input( )) to allow the user to enter a list. Here is an example:
L = eval(input('Enter a list: '))
print('The first element is ', L[0])
Output:
Enter a list: [5,7,9]
The first element is  5
• The + operator concatenates lists. Given a = [10, 20, 30], the statement a = a + [1, 3, 5]
reassigns a to the new list [10, 20, 30, 1, 3, 5].
• The statement a += [10] then updates a to be the list [10, 20, 30, 1, 3, 5, 10].
• Similarly, the * operator repeats a list a given number of times:
lt=[0]*4
print(lt)
print([1, 2, 3] * 3)
print(["abc"]*5)
Output:
[0, 0, 0, 0]
[1, 2, 3, 1, 2, 3, 1, 2, 3]
['abc', 'abc', 'abc', 'abc', 'abc']
List Aliasing
• When one variable is assigned to another using the assignment operator, both of them
refer to the same object in memory. The association of a variable with an object is
called a reference. Two separately created lists with equal contents are equal, but they
are not the same object:
a = [10, 20, 30, 40]
b = [10, 20, 30, 40]
print('Is ', a, ' equal to ', b, '?', sep='', end=' ')
print(a == b)
print('Are ', a, ' and ', b, ' aliases?', sep='', end=' ')
print(a is b)
• The assignment statement (c=d) causes variables c and d to refer to the same list
object. We say that c and d are aliases. In other words, there are two references to the
same object in memory.
• An object with more than one reference has more than one name, hence we say that the
object is aliased. If the aliased object is mutable, changes made through one alias are
reflected in the other.
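The effect of aliasing can be sketched as follows. The variable e, a separate copy made with list(), is added here only for contrast:

```python
d = [10, 20, 30, 40]
c = d                  # c is an alias: a second reference to the same list
e = list(d)            # e is a separate copy with equal contents

c[0] = 99              # a change made through one alias...
print(d)               # [99, 20, 30, 40]  ...is visible through the other
print(c is d)          # True
print(e)               # [10, 20, 30, 40]  the copy is unaffected
print(e is d)          # False
```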
• heterolist.py demonstrates that lists may be heterogeneous; that is, a list can hold
elements of varying types. The elements of the list are accessed by their index.
collection=[24.2,4,'word',eval,19,-0.03,'end']
print(collection[0])
print(collection[1])
print(collection[2])
print(collection[3])
print(collection[4])
print(collection[5])
print(collection[6])
print(collection)
Output:
24.2
4
word
<built-in function eval>
19
-0.03
end
[24.2, 4, 'word', <built-in function eval>, 19, -0.03, 'end']
names=["roopa","shwetha","rajani"]
print(names[-2])
print(names[-1])
Output:
shwetha
rajani
• Accessing the elements within an inner list can be done by double-indexing. The inner
list is treated as a single element by the outer list. The first index indicates the position of
the inner list inside the outer list, and the second index means the position of a particular
value within the inner list.
ls=[[1,2],['EC','CS']]
print(ls[1][0])
Output:
EC
List slicing
• We can make a new list from a portion of an existing list using a technique known
as slicing. A list slice is an expression of the form lst[start:end:step], where start, end
and step have the same meaning as in string slicing.
lst = [10,20,30,40,50,60,70,80,90,100]
print(lst[:])
Output:
[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
• An end value greater than the length of the list is treated as the length of the list.
Ex: lst[4:100] # here 100 is treated as len(lst)
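Some further slices of a ten-element list, given as a sketch with the results in the comments:

```python
lst = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(lst[2:5])      # [30, 40, 50]
print(lst[:3])       # [10, 20, 30]
print(lst[-2:])      # [90, 100]
print(lst[::3])      # [10, 40, 70, 100]
print(lst[4:100])    # [50, 60, 70, 80, 90, 100]  (end is treated as len(lst))
```

A slice always produces a new list; the original list is left unchanged.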
• A slice operator on the left side of an assignment can update multiple elements:
t = ['a','b','c','d','e','f']
t[1:3] = ['x', 'y']
print(t)
Output:
['a', 'x', 'y', 'd', 'e', 'f']
The sum( ) function only works when the list elements are numbers. The other
functions (max(), len(), etc.) work with lists of strings and other types that are
comparable.
Ex: Program to read the data from the user and to compute the sum and average of
those numbers
In this program, we initially create an empty list. Then, we run an infinite while
loop. As every input from the keyboard is in the form of a string, we need to
convert x into float type and then append it to the list. When the keyboard input is the
string ‘done’, the loop is terminated. After the loop, we find the
average of those numbers with the help of the built-in functions sum() and len().
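The program itself is not shown in these notes. The sketch below follows the description. To keep it runnable without a keyboard, it assumes that the typed lines are simulated by a hypothetical iterator named typed_lines; a real program would call input() inside the loop instead.

```python
# typed_lines stands in for what the user would type. It is an assumption
# made so that the sketch runs unattended.
typed_lines = iter(['10', '20.5', '3', 'done'])

numlist = []                      # start with an empty list
while True:                       # infinite loop, ended by break
    x = next(typed_lines)         # real program: x = input('Enter a number: ')
    if x == 'done':
        break
    numlist.append(float(x))      # keyboard input is a string, so convert it

average = sum(numlist) / len(numlist)
print('Sum:', sum(numlist))
print('Average:', average)
```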
List Methods
There are several built-in methods in list class for various purposes. They are as follows:
append: The append() method adds a single item to the existing list. It doesn't return a new list;
rather it modifies the original list.
fruits = ['apple', 'banana', 'cherry']
fruits.append("orange")
print("fruits after appending:",fruits)
Output:
fruits after appending: ['apple', 'banana', 'cherry', 'orange']
extend: The extend() method takes a single argument (a list or any other iterable) and adds each
of its elements to the end of the list. The list is modified.
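The difference between extend() and append() is easiest to see side by side. The list names ls1 and ls2 are chosen only for illustration:

```python
ls1 = [1, 2, 3]
ls1.extend([4, 5])        # each element of the argument is added separately
print(ls1)                # [1, 2, 3, 4, 5]

ls2 = [1, 2, 3]
ls2.append([4, 5])        # append adds its argument as one single element
print(ls2)                # [1, 2, 3, [4, 5]]
```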
insert : Inserts a new element before the element at a given index. Increases the length of the
list by one. Modifies the list.
ls=[3,5,10]
ls.insert(1,"hi")
print("ls after inserting element:",ls)
Output:
ls after inserting element: [3, 'hi', 5, 10]
index : Returns the lowest index of a given element within the list. Produces an error if the
element does not appear in the list. Does not modify the list.
ls=[15, 4, 2, 10, 5, 3, 2, 6]
print(ls.index(2))
print(ls.index(2,3,7))    #finds the index of 2 from position 3 up to, but not including, position 7
Output:
2
6
reverse: Physically reverses the elements in the list. The list is modified.
ls=[4,3,1,6]
ls.reverse()
print("reversed list :" ,ls)
Output:
reversed list : [6, 1, 3, 4]
sort: Sorts the elements of the list in ascending order. The list is modified.
ls=[3,10,5, 16,-2]
ls.sort()
print("list in ascending order: ",ls)
ls.sort(reverse=True)
print("list in descending order: ",ls)
Output:
list in ascending order:  [-2, 3, 5, 10, 16]
list in descending order:  [16, 10, 5, 3, -2]
clear: This method removes all the elements in the list and makes the list empty.
ls=[1,2,3]
ls.clear()
print(ls)
Output:
[]
Deleting elements
There are several ways to delete elements from a list. Python provides a few built-in
ways of removing elements, as follows:
• pop() : By default, this method deletes the last element in the list; if an index is passed
as an argument, it removes the item at that index instead. pop() modifies the list and
returns the element that was removed.
• remove(): This method can be used when the index of the element is not known but its value is.
— Note that this method removes only the first occurrence of the specified value,
not all occurrences.
— Unlike pop(), remove() does not return the value that has been
deleted.
• del: This is a statement that can delete one or more items at a time, by index or by slice.
Here also, we do not get back the deleted items.
my_list = ['p','r','o','b','l','e','m']
del my_list[2]             #deletes the element at position 2
print(my_list)
del my_list[1:5]           #deletes the elements at positions 1 to 4
print(my_list)
del my_list                #deletes the entire list
print(my_list)
Output:
['p', 'r', 'b', 'l', 'e', 'm']
['p', 'm']
NameError: name 'my_list' is not defined

#Deleting all odd indexed elements of a list
t=['a', 'b', 'c', 'd', 'e']
del t[1::2]
print(t)
Output:
['a', 'c', 'e']
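The pop() and remove() methods described earlier in this section can be sketched in the same way. The list contents are an arbitrary example:

```python
nums = [10, 20, 30, 20, 40]

last = nums.pop()          # no index: removes and returns the last element
print(last)                # 40
first = nums.pop(0)        # removes and returns the element at index 0
print(first)               # 10
print(nums)                # [20, 30, 20]

returned = nums.remove(20) # removes only the first occurrence of 20
print(returned)            # None: remove() does not return the deleted value
print(nums)                # [30, 20]
```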
Sets
• Python provides a data structure that represents a mathematical set. As with mathematical
sets, we use curly braces { } in Python code to enclose the elements of a literal set.
• Python distinguishes between set literals and dictionary literals by the fact that all the items
in a dictionary are colon-connected (:) key-value pairs, while the elements in a set are simply
values.
• Unlike Python lists, sets are unordered and may contain no duplicate elements.
• We can make a set out of a list using the set conversion function:
>>> L = [10, 13, 10, 5, 6, 13, 2, 10, 5]
>>> S = set(L)
>>> S
{10, 2, 13, 5, 6}
As we can see, the element ordering is not preserved, and duplicate elements appear only
once in the set.
• Python set notation exhibits one important difference with mathematics: the expression { }
does not represent the empty set. In order to use the curly braces for a set, the set must
contain at least one element. The expression set( ) produces a set with no elements, and
thus represents the empty set.
• Python reserves the { } notation for empty dictionaries. Unlike in mathematics, all sets in
python must be finite. Python supports the standard mathematical set operations of
intersection, union, set difference, and symmetric difference. Table below shows the python
syntax for these operations.
Example : The following interactive sequence computes the union and intersection of two
sets and tests for set membership:
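Neither the table nor the interactive sequence appears in these notes. As a stand-in, the sketch below uses Python's set operators on two example sets, S and T, whose contents are assumptions:

```python
S = {2, 5, 7, 8, 9, 12}
T = {1, 5, 6, 7, 11, 12}

print(S | T)     # union: elements in S, in T, or in both
print(S & T)     # intersection: elements in both S and T
print(S - T)     # difference: elements in S but not in T
print(S ^ T)     # symmetric difference: elements in exactly one of the sets
print(7 in S)    # True
print(11 in S)   # False
```

The order in which the elements of a set are printed is not guaranteed, so the printed results may be arranged differently from run to run.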
Tuples
• Tuples are one of Python's simplest and most common collection types. A tuple is a
sequence of values much like a list.
• The values stored in a tuple can be any type, and they are indexed by integers.
Tuple creation
There are many ways to create a tuple. They are as follows:
Creation                         Output                            Comments
t = 'a','b','c','d','e'          ('a', 'b', 'c', 'd', 'e')         Syntactically, a tuple is a comma-separated list of values.
print(t)
t = ('a','b','c','d','e')        ('a', 'b', 'c', 'd', 'e')         Although not necessary, it is common to enclose tuples in parentheses.
print(t)
t1 = tuple()                     ()                                Creating an empty tuple
print(t1)
t2 = ()                          ()
print(t2)
t = tuple('lupins')              ('l', 'u', 'p', 'i', 'n', 's')    Passing arguments to the tuple function
print(t)
t = tuple(range(3))              (0, 1, 2)
print(t)
t1 = ('a',)                      <class 'tuple'>                   To create a tuple with a single element, you have to include the final comma.
print(type(t1))
t2 = ('a')                       <class 'str'>
print(type(t2))
t = (1, "Hello", 3.4)            (1, 'Hello', 3.4)                 Tuple with mixed datatypes
print(t)
t = ("mouse",[8,4,6],(1,2,3))    ('mouse', [8, 4, 6], (1, 2, 3))   Creating a nested tuple
print(t)
• We can use the index operator [ ] to access an item in a tuple, where the index starts from 0.
College =('r','n','s','i','t')
print(College[0])
print(College[4])
#accessing elements of nested tuple
n_tuple =("program", [8, 4, 6],(1, 2, 3))
print(n_tuple[0][3])
print(n_tuple[1][1])
Output:
r
t
g
4
In the above example, n_tuple is a tuple with mixed elements, i.e., a string, a list and a tuple.
Tuples are immutable
• One of the main differences between lists and tuples in Python is that tuples are immutable,
that is, one cannot add or modify items once the tuple is initialized.
For example:
>>> t = (1, 4, 9)
>>> t[0] = 2
TypeError: 'tuple' object does not support item assignment
• Similarly, tuples don't have .append and .extend methods as list does. Using += is possible,
but it changes the binding of the variable, and not the tuple itself:
>>> t = (1, 2)
>>> q = t
>>> t += (3, 4)
>>> t
(1, 2, 3, 4)
>>> q
(1, 2)
• Be careful when placing mutable objects, such as lists, inside tuples. This may lead to very
confusing outcomes when changing them. For example:
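One commonly cited case is sketched below; the tuple contents are an assumption. The list stored inside the tuple can be changed in place, and using += on it both changes the list and raises an error:

```python
t = (1, 2, [10, 20])

t[2].append(30)              # allowed: the list inside the tuple is mutable
print(t)                     # (1, 2, [10, 20, 30])

try:
    t[2] += [40]             # the list is extended in place first, then the
except TypeError as err:     # assignment back into the tuple raises TypeError
    message = str(err)
    print("TypeError:", message)

print(t)                     # (1, 2, [10, 20, 30, 40])
```

The tuple still refers to the same list object throughout; only the contents of that list have changed.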
• You can use the += operator to "append" to a tuple. This works by creating a new tuple with
the new element you "appended" and assigning it to the current variable; the old tuple is not
changed, but replaced.
Comparing tuples
• Tuples can be compared using operators like >, <, >=, == etc.
• Python starts by comparing the first element from each sequence. If they are equal,
it goes on to the next element, and so on, until it finds elements that differ.
Subsequent elements are not considered (even if they are really big).
Example                            Description
>>> (0, 1, 2) < (0, 3, 4)          Step 1: 0 == 0, so the next elements are compared
True                               Step 2: 1 < 3 is true; comparison stops at this stage
                                   Step 3: 2 < 4 is ignored
>>> (0, 1, 2000000) < (0, 3, 4)    Step 1: 0 == 0, so the next elements are compared
True                               Step 2: 1 < 3 is true; comparison stops at this stage
                                   Step 3: 2000000 < 4 is ignored
• When we use a relational operator on tuples containing non-comparable types, a TypeError
is raised.
>>>(1,'hi')<('hello','world')
TypeError: '<' not supported between instances of 'int' and 'str'
• The sort function works the same way. It sorts primarily by the first element, but in the case of a
tie, it sorts by the second element, and so on. This feature lends itself to a pattern known as
DSU (decorate, sort, undecorate):
—Decorate a sequence by building a list of tuples with one or more sort keys preceding
the elements from the sequence,
—Sort the list of tuples using the Python built-in sort, and
—Undecorate by extracting the sorted elements of the sequence.
Consider a program that sorts the words in a sentence from longest to shortest, which illustrates
the DSU pattern.
txt = 'this is an example for sorting tuple'
words = txt.split()
t = list()
for word in words:
    t.append((len(word), word))
print(t) #displays unsorted list
t.sort(reverse=True)
print(t) #displays sorted list
res = list()
for length, word in t:
    res.append(word)
print(res) #displays sorted list with only words but not length
Output:
[(4,'this'),(2,'is'),(2,'an'),(7,'example'),(3,'for'),(7,'sorting'),(5,'tuple')]
[(7,'sorting'),(7,'example'),(5,'tuple'),(4,'this'),(3,'for'),(2,'is'),(2,'an')]
['sorting', 'example', 'tuple', 'this', 'for', 'is', 'an']
— sort compares the first element, length, first, and only considers the second element to break
ties. The keyword argument reverse=True tells sort to go in decreasing order.
— The second loop traverses the list of tuples and builds a list of words in descending order of
length. The seven-character words are sorted in reverse alphabetical order, so “sorting”
appears before “example” in the list.
Ex:
>>> x, y, z = 1, 2, 3
>>> print(x) #prints 1
>>> print(y) #prints 2
>>> print(z) #prints 3
• When we use a tuple on the left side of the assignment statement, we omit the parentheses,
but the following is an equally valid syntax:
>>> (x, y, z) = (1, 2, 3)
• A particularly clever application of tuple assignment allows us to swap the values of two
variables in a single statement.
>>> a=10
>>> b=20
>>> a, b = b, a
>>> print(a, b) #prints 20 10
Both sides of this statement are tuples, but the left side is a tuple of variables; the right side
is a tuple of expressions. Each value on the right side is assigned to its respective variable
on the left side. All the expressions on the right side are evaluated before any of the
assignments.
• The number of variables on the left and the number of values on the right must be the same:
>>> a, b = 1, 2, 3
ValueError: too many values to unpack (expected 2)
• The symbol _ can be used as a disposable variable name if we want only few elements of a
tuple, acting as a placeholder:
>>>a = 1, 2, 3, 4
>>>_, x, y, _ = a
>>>print(x) #prints 2
>>>print(y) #prints 3
• Sometimes we may be interested in using only a few values of the tuple. This can be achieved
with a starred variable (a variable name with a * prefix), which acts as a catch-all
variable and holds multiple values of the tuple.
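A brief sketch of the catch-all variable, with variable names chosen only for illustration:

```python
first, *rest = (1, 2, 3, 4)
print(first)     # 1
print(rest)      # [2, 3, 4]  (the starred name always receives a list)

head, *middle, tail = 'a', 'b', 'c', 'd', 'e'
print(head, tail)    # a e
print(middle)        # ['b', 'c', 'd']
```

Only one starred variable is allowed on the left side of an assignment.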
• More generally, the right side can be any kind of sequence (string, list, or tuple). For example,
to split an email address into a user name and a domain, the code is as follows:
>>> addr = 'monty@python.org'
>>> uname, domain = addr.split('@')
>>> print(uname) #prints monty
>>> print(domain) #prints python.org
• As a dictionary may not display its contents in the required order, we can convert them to a
list, use sort() on the list, and then print in the required order.
• Since the list of tuples is a list, and tuples are comparable, we can sort the list
of tuples. Converting a dictionary to a list of tuples is a way for us to output the contents of a
dictionary sorted by key.
Dictionaries
• A dictionary is a set of key: value pairs, with the requirement that the keys are unique (within
one dictionary).
• A dictionary is a mapping between a set of indices (which are called keys) and a set of values.
Each key maps to a value. The association of a key and a value is called a key-value pair.
• Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed
by keys, which can be of any immutable type such as strings, numbers, or tuples (if they contain
only strings, numbers, or tuples).
• An empty pair of curly braces creates an empty dictionary.
d = { }
• The function dict also creates a new dictionary with no items.
empty_d = dict()
• Placing a comma-separated list of key:value pairs within the braces adds initial key:value
pairs to the dictionary.
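As a sketch, here is a small dictionary created with initial key:value pairs. The name eng2sp and its contents are taken from the examples used later in this section:

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

print(eng2sp)
print(type(eng2sp))      # <class 'dict'>
print(len(eng2sp))       # 3
```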
In versions before Python 3.7, the order of items in a dictionary could differ from the order of
creation; from Python 3.7 onward, insertion order is preserved. In either case, dictionary
members are accessed by key and not by integer position.
Accessing elements of dictionary
• To access an element within a dictionary, we use square brackets exactly as we would with a
list. In a dictionary every key has an associated value.
>>> print(eng2sp['two'])
dos
The key 'two' always maps to the value 'dos', so the order of the items doesn't matter. If the
key is not present in the dictionary, a KeyError is raised:
>>> print(eng2sp['four'])
KeyError: 'four'
Length of the dictionary
The len function works on dictionaries; it returns the number of key-value pairs:
>>> num_word = {1: 'one', 2: 'two', 3:'three'}
>>>len(num_word)
3
in operator with dictionaries
The in operator works on dictionaries to check whether something appears as a key in the
dictionary (but not the value).
>>>eng2sp ={'one':'uno','two':'dos','three':'tres'}
To see whether something appears as a value in a dictionary, you can use the method values,
which returns a collection of values, and then use the in operator:
— For dictionaries, Python uses an algorithm called a hash table that has a remarkable
property: the in operator takes about the same amount of time no matter how many
items are in the dictionary
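The two kinds of membership test can be sketched with the eng2sp dictionary defined above:

```python
eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

key_found = 'one' in eng2sp                 # True:  'one' is a key
value_as_key = 'uno' in eng2sp              # False: 'uno' is a value, not a key
value_found = 'uno' in eng2sp.values()      # True:  search the values explicitly
print(key_found, value_as_key, value_found)
```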
• Assume that we need to count the frequency of letters in a given string. There are
different ways to do it.
— Create 26 variables, one for each letter. Traverse the given string and increment the
corresponding counter when a letter is found.
— Create a list with 26 elements (all zero in the beginning) representing the letters.
Traverse the given string and increment the corresponding indexed position in the list when a
letter is found.
— Create a dictionary with characters as keys and counters as values. When we find a
character for the first time, we add the item to the dictionary. From the next time onwards, we
increment the value of the existing item. Each of the above methods performs the same task, but
the logic of implementation is different. Here, we will see the implementation using a dictionary.
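The program is not included in these notes. A sketch of the dictionary approach, using an assumed example string and the variable names s and d, could be:

```python
s = 'vishveshwarayya'      # example string (an assumption)
d = dict()
for ch in s:
    if ch not in d:
        d[ch] = 1          # first time we see this character
    else:
        d[ch] += 1         # seen before: increment its counter
print(d)
```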
It can be observed from the output that a dictionary is created here with characters as keys
and frequencies as values. Note that here we have computed a histogram of counters.
• A dictionary in Python has a method called get(), which takes a key and a default value as its two
arguments. If the key is found in the dictionary, get() returns the corresponding
value; otherwise it returns the default value. For example,
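The example is missing from these notes. The sketch below assumes a dictionary named month whose contents are hypothetical:

```python
month = {'jan': 1, 'feb': 2, 'mar': 3}     # hypothetical contents

found = month.get('jan', 'not found')
missing = month.get('october', 'not found')
print(found)      # 1
print(missing)    # not found
```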
In the above example, when get() is given 'jan' as its argument, it returns the
corresponding value 1, as 'jan' is found in the month dictionary. Whereas, when get() is used with
'october' as the key, the default value ‘not found’ (passed as the second argument) is returned.
• The function get() can be used effectively for calculating frequency of alphabets in a string.
Here is the modified version of the program –
word = 'vishveshwarayya'
d = dict()
for c in word:
    d[c] = d.get(c,0) + 1
print(d)
Output:
{'v':2, 'i':1, 's':2, 'h':2, 'e':1, 'w':1, 'a':3, 'r':1, 'y':2}
In the above program, for every character c in the given string, we try to retrieve a value. When
c is found in d, its value is retrieved, 1 is added to it, and the result is stored back in d. If c is
not found, 0 is taken as the default and then 1 is added to it.
Looping and dictionaries
• When a for-loop is applied on dictionaries, it traverses the keys of the dictionary. This loop
prints each key and the corresponding value:
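A loop of the kind described here might look like the sketch below. The dictionary name counts and its contents are assumptions made for illustration:

```python
counts = {'chuck': 1, 'annie': 42, 'jan': 100}
seen = []                       # collected only so the result can be inspected
for key in counts:              # the loop variable takes each key in turn
    print(key, counts[key])
    seen.append((key, counts[key]))
```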
• Sometimes we may want to access each key-value pair together from the dictionary; this can be
done by using the items() method as follows:
names ={'chuck':1,'annie':42,'jan':100}
for k, v in names.items():
    print(k, v)
Output:
chuck 1
annie 42
jan 100
If we want to print the keys in alphabetical order, first make a list of the keys in the dictionary
using the keys method available in dictionary objects, and then sort that list and loop through
the sorted list, looking up each key and printing out key-value pairs in sorted order as follows:
names ={'chuck':1, 'annie':42, 'jan':100}
lst = list(names.keys())
print(lst)
lst.sort()
print("Elements in alphabetical order:")
for key in lst:
    print(key, names[key])
Output:
['chuck', 'annie', 'jan']
Elements in alphabetical order:
annie 42
chuck 1
jan 100
FILES
— Binary files : These files can store text, images, video, audio, database files, etc.,
and contain the data in the form of bits.
Opening files
• To perform read or write operation on a file, first file must be opened.
• Opening the file communicates with the operating system, which knows where the data for
each file is stored.
• A file can be opened using a built-in function open( ).
• The syntax of open( ) function is as below :
fhand = open("filename", "mode")
Here,
filename -> is name of the file to be opened. This string may be just a name of the file, or it
may include pathname also. Pathname of the file is optional when the file is
stored in current working directory.
mode -> This string specifies an access mode to use the file i.e., for reading, writing,
appending etc.
fhand -> It is a reference to a file object, which acts as a handler for all further operations on
files.
• If mode is not specified , by default, open( ) uses mode ‘r’ for reading.
>>> fhand = open('sample.txt')
>>>print(fhand)
<_io.TextIOWrapper name='sample.txt' mode='r' encoding='cp1252'>
Note: In this example, we assume the file 'sample.txt' is stored in the same folder that you are in
when you start Python. Otherwise, the path of the file has to be passed.
fhand = open('c:/users/roopa/desktop/sample.txt')
• If the file does not exist, open will fail with a traceback and you will not get a handle to access
the contents of the file:
>>> fhand = open('fibo.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'fibo.txt'
• List of modes in which files can be opened are given below :
Mode   Meaning
r      Opens a file for reading. If the specified file does not exist in the specified path, or
       if you don't have permission, an error message will be displayed. This is the default
       mode of the open() function in Python.
w      Opens a file for writing. If the file does not exist, then a new file with the given name
       will be created and opened for writing. If the file already exists, then its content will
       be overwritten.
a      Opens a file for appending data. If the file exists, the new content will be appended at
       the end of the existing content. If no such file exists, it will be created and the new
       content will be written into it.
r+     Opens a file for reading and writing.
w+     Opens a file for both writing and reading. Overwrites the existing file if the file exists.
       If the file does not exist, creates a new file for reading and writing.
a+     Opens a file for both appending and reading. The file pointer is at the end of the file if
       the file exists. The file opens in the append mode. If the file does not exist, it creates
       a new file for reading and writing.
rb     Opens a file for reading only in binary format.
wb     Opens a file for writing only in binary format.
ab     Opens a file for appending only in binary format.
Reading files
• Once the specified file is opened successfully, the open( ) function provides a handle, which is a
reference to the file.
• There are several ways to read the contents of the file
1. using the file handle as the sequence in for loop.
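sample.txt
Python is a high level programming language
it is introduced by Guido van Rossum.
Python is easy to learn and simple to code an application
Print_count.py
fhand = open('sample.txt')
count = 0
for line in fhand:
    count = count + 1
    print('Line :', count, line)
print('Line Count:', count)
fhand.close()
Output:
Line : 1 Python is a high level programming language

Line : 2 it is introduced by Guido van Rossum.

Line : 3 Python is easy to learn and simple to code an application

Line Count: 3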
• In the above example, ‘for’ loop simply counts the number of lines in the file and prints them
out.
• When the file is read using a for loop in this manner, Python takes care of splitting the data
in the file into separate lines using the newline character. Python reads each line through
the newline and includes the newline as the last character in the line variable for each
iteration of the for loop.
• Notice that in the above output there is a blank line after each of the output lines. This is
because the newline character \n is part of the variable line in the loop, and the print()
function by default adds another newline at the end. To avoid this double spacing, we can
remove the newline character at the end of the variable line with the built-in string method
rstrip(), for example print('Line :', count, line.rstrip()).
• Because the for loop reads the data one line at a time, it can efficiently read and count the
lines in very large files without running out of main memory to store the data. The above
program can count the lines in any size file using very little memory since each line is read,
counted, and then discarded.
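The line-by-line loop with rstrip() can be sketched as follows. To keep the sketch self-contained it first writes its own copy of sample.txt into a temporary directory; that path is an assumption for illustration.

```python
import os
import tempfile

# Recreate a small sample.txt in a temporary directory (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fout:
    fout.write('Python is a high level programming language\n')
    fout.write('it is introduced by Guido van Rossum.\n')
    fout.write('Python is easy to learn and simple to code an application\n')

fhand = open(path)
count = 0
stripped = []                     # kept only so the cleaned lines can be inspected later
for line in fhand:
    count = count + 1
    line = line.rstrip()          # drop the trailing newline
    stripped.append(line)
    print('Line :', count, line)
fhand.close()
print('Line Count:', count)
```

Only one line is held in memory at a time inside the loop; the stripped list is an addition for demonstration and could be dropped for very large files.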
2. The second way of reading a text file loads the entire file into a string :
• If you know the file is relatively small compared to the size of your main memory, you can
read the whole file into one string using the read method on the file handle.
>>> fhand = open('sample.txt')
>>> content = fhand.read()
>>> print(len(content))     # count of characters
140
>>> print(content[:7])
Python
• In this example, the entire contents (all 140 characters) of the file sample.txt are read
directly into the variable content. We use string slicing to print out the first 7 characters of
the string data stored in variable content.
• When the file is read in this manner, all the characters including all of the lines and newline
characters are one big string in the variable content. It is a good idea to store the output of
read as a variable because each call to read exhausts the resource.
• We read the file name from the user and place it in a variable named fname and open that
file. Now we can run the program repeatedly on different files.
Writing files
• To write a file, you have to open it with mode “w” as a second parameter:
>>> fout = open('output.txt', 'w')
>>> print(fout)
<_io.TextIOWrapper name='output.txt' mode='w' encoding='cp1252'>
• If the file already exists, opening it in write mode clears out the old data and starts fresh, so
be careful! If the file doesn’t exist, a new one is created.
• The write method of the file handle object puts data into the file, returning the number of
characters written.
>>> line1 = "This here's the wattle,\n"
>>> fout.write(line1)
24
• We must make sure to manage the ends of lines as we write to the file by explicitly inserting
the newline character when we want to end a line. The print() function automatically
appends a newline, but the write method does not add the newline automatically.
>>> line2 = 'the emblem of our land.\n'
>>> fout.write(line2)
24
• When you are done writing, you have to close the file to make sure that the last bit of data is
physically written to the disk so it will not be lost if the power goes off.
>>> fout.close( )
• We could close the files which we open for read as well, but we can be a little sloppy if we are
only opening a few files since Python makes sure that all open files are closed when the
program ends. When we are writing files, we want to explicitly close the files so as to leave
nothing to chance.
• To avoid such chances, the with statement allows objects like files to be used in a way that
ensures they are always cleaned up promptly and correctly.
with open("test.txt", 'w') as f:
    f.write("my first file\n")
    f.write("This file\n\n")
    f.write("contains three lines\n")
These lines are written into the file "test.txt":
my first file
This file

contains three lines
Class definition
• We have used many of Python’s built-in types; now we are going to define a new type.
• A class is a user-defined data type which binds data and functions together into a single entity.
• A class is just a prototype (a logical entity or blueprint); defining it does not set aside
storage for the data of any object.
• An object is an instance of a class and it has physical existence. One can create any number
of objects for a class.
• A class can have a set of variables (also known as attributes, member variables) and member
functions (also known as methods).
• Class − A user-defined prototype for an object that defines a set of attributes that
characterize any object of the class. The attributes are data members and methods, accessed
via dot notation.
• A class can be defined with the following syntax:
class ClassName:
    'Optional class documentation string'
    class_suite
— Class can be created using keyword class
— The class has a documentation string, which can be accessed via ClassName.__doc__.
— The class_suite consists of all the component statements defining class members, data
attributes and functions.
• As an example, we will create a class called Point .
class Point:
    pass    # creating an empty class
Here, we are creating an empty class without any members by just using the keyword pass
within it.
• A class can also be created with only a documentation string, as follows:
class Point:
    """Represents a point in 2-D space."""
print(Point)
Output:
<class '__main__.Point'>
Because Point is defined at the top level, its “full name” is __main__.Point. The term __main__
indicates that the class Point is in the main scope of the current module.
• Creating a new object is called instantiation, and the object is an instance of the class.
A class can have any number of instances. An object of the class can be created as follows:
blank = Point()
print(blank)
Output:
<__main__.Point object at 0x03C72070>
Here blank is not the actual object; rather, it holds a reference to a Point object. When we
print an object, Python tells us which class it belongs to and where it is stored in memory.
The output shows that the object occupies memory at location 0x03C72070 (a hexadecimal
value).
Attributes
• An object can contain named elements known as attributes. One can assign values to
these attributes using the dot operator. For example, (0,0) represents the origin, and
coordinates (x,y) represent some point, so we can assign two attributes x and y to the
object blank of class Point as below:
>>> blank.x = 3.0
>>> blank.y = 4.0
A state diagram that shows an object and its attributes is called an object diagram.
The variable blank refers to a Point object, which contains two attributes. Each attribute
refers to a floating-point number.
• We can read the value of an attribute using the same syntax:
>>> blank.y # read the value of an attribute y
4.0
>>> x = blank.x # the attribute x of an object can be assigned to another variable
>>> x
3.0
The expression blank.x means, “Go to the object blank refers to and get the value of x”. In
the example, we assign that value to a variable named x. There is no conflict between the
variable x and the attribute x.
Class attribute − A variable that is shared by all instances of a class. They are common to
all the objects of that class. Class variables are defined within a class but outside any of the
class's methods. Class variables are not used as frequently as instance variables are.
Instance attribute − An attribute defined for an individual object. Attributes of one
instance are not available to another instance of the same class.
The following example demonstrates the usage of instance attributes and class attributes:
class Flower:
    '''flowers and their behaviour'''
    color = 'white' # class attribute shared by all instances
Here, the attribute usedIn is available only for the object rose, but not for lotus.
Thus, usedIn is an instance attribute and not a class attribute. We can use attributes with
dot notation as part of any expression.
>>> '(%g, %g)' % (blank.x, blank.y)
'(3, 4)'
>>> total = blank.x + blank.y
>>> total
7.0
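The Flower example can be sketched in full as follows. Only the class header, its docstring and the color attribute come from the notes; the objects rose and lotus and the attribute usedIn are named in the text, while the value 'bouquet' and the final reassignment are assumptions added for illustration.

```python
class Flower:
    '''Flowers and their behaviour.'''
    color = 'white'            # class attribute shared by all instances

rose = Flower()
lotus = Flower()

rose.usedIn = 'bouquet'        # instance attribute (illustrative value), exists only on rose

print(rose.color, lotus.color)     # both objects see the class attribute
print(rose.usedIn)
print(hasattr(lotus, 'usedIn'))    # lotus has no such attribute

rose.color = 'red'             # creates an instance attribute that hides the class one
print(rose.color, lotus.color, Flower.color)
```

Assigning through an instance, as in the last step, never changes the class attribute itself; it only adds an instance attribute with the same name.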
Example :
Program to create a class Point representing a point on coordinate system. Implement following
functions –
— A function read_point() to receive x and y attributes of a Point object as user input
— A function distance() which takes two objects of Point class as arguments and computes
the Euclidean distance between them.
import math

class Point:
    """Class Point representing a coordinate point."""

def read_point(p):
    p.x = float(input("x coordinate:"))
    p.y = float(input("y coordinate:"))

def print_point(p):
    print("(%g,%g)" % (p.x, p.y))

def distance(p1, p2):
    d = math.sqrt((p1.x - p2.x)**2 + (p1.y - p2.y)**2)
    return d

p1 = Point()                    # create first object
print("Enter First point:")
read_point(p1)                  # read x and y for p1
p2 = Point()                    # create second object
print("Enter Second point:")
read_point(p2)                  # read x and y for p2
print("Distance is:", distance(p1, p2))
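In the above program, we have used three functions which are not members of the class:
read_point(), print_point() and distance(). The formula for the Euclidean distance between
the points $(x_1, y_1)$ and $(x_2, y_2)$ is
$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
Since these functions do not belong to the class, they are called as normal functions without
dot notation.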
Constructor method
• Python uses a special method called a constructor method. Python allows you to define only
one constructor per class. Known as the __init__() method, it is conventionally the first
method defined in a class, and its syntax is
def __init__(self, parameter_1, parameter_2, ..., parameter_n):
    statement(s)
• The __init__() method defines and initializes the instance variables. It is invoked as soon as
an object of a class is instantiated.
• The __init__() method for a newly created object is automatically executed with all of its
parameters .
• The __init__() method is indeed a special method as other methods do not receive this
treatment. The parameters for __init__() method are initialized with the arguments that you
had passed during instantiation of the class object.
• Methods whose names begin and end with a double underscore (__) are called special methods
as they have special meaning. The number of arguments passed during the instantiation of the
class object should equal the number of parameters in the __init__() method (excluding the
self parameter).
• Example:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
p1 = Person("John", 36)
print(p1.name)
print(p1.age)
Inheritance
• Inheritance enables new classes to receive or inherit the variables and methods of existing
classes. Inheritance is a way to express a relationship between classes. If you want to build
a new class that is similar to one that already exists, then instead of creating the new class
from scratch you can reference the existing class and indicate what is different by
overriding some of its behavior or by adding some new functionality.
• A class that is used as the basis for inheritance is called a superclass or base class. A class
that inherits from a base class is called a subclass or derived class. The terms parent class
and child class are also acceptable terms to use respectively.
• A derived class inherits variables and methods from its base class while adding additional
variables and methods of its own. Inheritance easily enables reusing of existing code. Class
BaseClass, on the left, has one variable and one method.
• Class DerivedClass, on the right, is derived from BaseClass and contains an additional
variable and an additional method.
A polygon is a closed figure with 3 or more sides. Say, we have a class called Polygon
defined as follows.
    def area(self):
        return self.w * self.h
    def perimeter(self):
        return 2 * (self.w + self.h)
print("area of rectangle", r.area())
print("perimeter of rectangle", r.perimeter())
print("area of rectangle", s.area())
print("perimeter of rectangle", s.perimeter())
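The class definitions that the methods above belong to are not shown in these notes. A minimal runnable sketch of the same idea follows; the class names Rectangle and Square, the constructor parameters and the sizes used are all assumptions.

```python
class Rectangle:                    # hypothetical base class
    def __init__(self, w, h):
        self.w = w
        self.h = h

    def area(self):
        return self.w * self.h

    def perimeter(self):
        return 2 * (self.w + self.h)


class Square(Rectangle):            # derived class: inherits area() and perimeter()
    def __init__(self, side):
        super().__init__(side, side)    # a square is a rectangle with w == h


r = Rectangle(4, 5)
s = Square(3)
print("area of rectangle", r.area())
print("perimeter of rectangle", r.perimeter())
print("area of square", s.area())
print("perimeter of square", s.perimeter())
```

Square defines only its own constructor; every other behavior is reused from the base class, which is the code reuse that inheritance is meant to provide.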
Overloading
Operator overloading
• Normally, operators like +, -, / and * work with the built-in data types.
• Basic operators like +, -, * etc. can be overloaded. To overload an operator, one needs to write
a method within the user-defined class. The method should contain the code for what the
programmer wants the operator to do.
• Let us consider an example to overload + operator to add two Time objects by defining
__add__ method inside the class.
    def __add__(self, t2):
        total = Time()
        total.hour = self.hour + t2.hour
        total.minute = self.minute + t2.minute
        total.second = self.second + t2.second
        return total
t3 = t1 + t2 # When we apply the + operator to Time objects, Python invokes __add__.
→ When the statement t3 = t1 + t2 is used, it invokes the special method __add__() written
inside the class, because the internal meaning of this statement is t3 = t1.__add__(t2).
→ Here, t1 is the object invoking the method. Hence, self inside __add__() is a reference (alias)
to t1, and t2 is passed explicitly as the argument.
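A complete version of this idea might look like the sketch below. The constructor with default values and the carry of extra seconds and minutes are assumptions added here; the fragment above adds the fields without carrying.

```python
class Time:
    def __init__(self, hour=0, minute=0, second=0):   # defaults are an assumption
        self.hour = hour
        self.minute = minute
        self.second = second

    def __add__(self, t2):
        # Add field by field, then carry extra seconds into minutes
        # and extra minutes into hours.
        total = Time()
        seconds = self.second + t2.second
        minutes = self.minute + t2.minute + seconds // 60
        total.second = seconds % 60
        total.minute = minutes % 60
        total.hour = self.hour + t2.hour + minutes // 60
        return total

t1 = Time(9, 45, 30)
t2 = Time(1, 35, 40)
t3 = t1 + t2           # same as t1.__add__(t2)
print(t3.hour, t3.minute, t3.second)
```

Returning a new Time object rather than changing t1 keeps the + operator free of side effects, as it is for the built-in types.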
Python provides a special set of methods which have to be used for overloading operators.
The following table lists operators and their respective Python methods for overloading.
Example program
This program demonstrates defining a class and creating its objects, the __init__ method,
and operator overloading, by overloading the + operator through the __add__ method.
    def __str__(self):
        return "(%d,%d)" % (self.x, self.y)
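Only the __str__ method of the example program survives above. A minimal sketch of a complete program follows; the constructor defaults and the sample coordinates are assumptions.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __add__(self, other):
        # p1 + p2 builds a new Point from the coordinate sums
        return Point(self.x + other.x, self.y + other.y)

    def __str__(self):
        # used by print() and str() to display the point as an ordered pair
        return "(%d,%d)" % (self.x, self.y)

p1 = Point(1, 2)
p2 = Point(3, 4)
p3 = p1 + p2
print(p3)
```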
Question Bank
Q. No.  Questions
LISTS
1 What are lists? Lists are mutable. Justify the statement with examples.
2 How do you create an empty list? Give example.
3 Explain + and * operations on list objects with suitable examples.
4 Discuss various built-in methods in lists.
5 Implement a Python program using Lists to store and display the average of N integers accepted
from the user.
6 What are the different ways of deleting elements from a list? Discuss with suitable functions.
7 How do you convert a list into a string and vice-versa? Illustrate with examples.
8 Write a short note on: a) Parsing lines b) Object Aliasing
9 Write the differences between
a) sort() and sorted()
b) append() and extend()
c) join() and split()
10 When do we encounter TypeError, ValueError and IndexError?
11 What are identical and equivalent objects? How are they identified? give examples.
12 Discuss different ways of traversing a list.
1 How are tuples created in Python? Explain different ways of accessing and creating them.
2 Write a Python program to read all lines in a file accepted from the user and print all email
addresses contained in it. Assume the email addresses contain only non-white space characters.
3 List merits of dictionary over list.
4 Explain dictionaries. Demonstrate with a Python program.
5 Compare and contrast tuples with lists.
6 Define a dictionary type in Python. Give example.
7 Explain get() function in dictionary with suitable code snippet.
8 Discuss the dictionary methods keys() and items() with suitable programming examples.
9 Explain various steps to be followed while debugging a program that uses large datasets.
10 Briefly discuss key-value pair relationship in dictionaries.
11 Define a tuple. Give an example to illustrate creation of a tuple.
12 What are mutable and immutable objects? Give examples.
13 Explain List of Tuples and Tuple of Lists.
14 How do you create an empty tuple and a single element tuple?
15 Explain DSU pattern with respect to tuples. Give example
16 How do you create a tuple using a string and using a list? Explain with example.
17 Explain the concept of tuple comparison. How is tuple comparison implemented in the sort()
function? Discuss.
18 Write a short note on:
a) Tuple assignment
b) Dictionaries and Tuples
19 How tuples can be used as a key for dictionaries? Discuss with example.
20 Discuss pros and cons of various sequences like lists, strings and tuples.
21 Explain the following operations in tuples:
a) Sum of two tuples
b) Slicing operators
22 Discuss tuple assignment with an example. Explain how swapping can be done using tuple
assignment. Write a Python program to input two integers a and b, and swap those numbers.
Print both input and swapped numbers.
SETS
23 Explain how to create an empty set.
24 List the merits of sets over list. Demonstrate it with example.
25 Write a program to create an intersection, union, set difference, and symmetric
difference of sets.
26 Define class and object. Give an example for creating a class and an object of that class.
27 What are attributes? Explain with an example and respective object diagram.
28 Write a program to create a class called Rectangle with the help of a corner point, width and
height. Write the following functions and demonstrate their working:
a. To find and display center of rectangle
b. To display point as an ordered pair
c. To resize the rectangle
35 What is the difference between a pure function and a modifier? Write a Python program to find
the duration of an event, given its start and end times, by defining a class TIME.
36 Write a program to create a class called 'Time' to represent time in HH:MM:SS format.
Perform the following operations:
i. T3=T1+T2
ii. T4=T1+360
iii. T5=130+T1
38 Write a program to add two point objects by overloading the + operator. Overload __str__() to
display the point as an ordered pair.
39 Define polymorphism. Demonstrate polymorphism with a function that builds a histogram
counting the number of times each letter appears in a word and in a sentence.
40 Using datetime module write a program that gets the current date and prints the day of the week.
41 What does the keyword self in python mean? Explain with an example.
Module 3
Data Pre-processing and Data Wrangling
Topics Covered
Loading from CSV files, Accessing SQL databases. Cleansing Data with Python:
Stripping out extraneous information, Normalizing data and Formatting data.
Combining and Merging Data Sets – Reshaping and Pivoting – Data
Transformation – String Manipulation, Regular Expressions.
Pandas features a number of functions for reading tabular data as a DataFrame object. Table
below has a summary of all of them.
These functions are meant to convert text data into a DataFrame. The options for these
functions fall into a few categories:
• Indexing: can treat one or more columns as the index of the returned DataFrame, and
whether to get column names from the file, the user, or not at all.
• Type inference and data conversion: this includes user-defined value conversions and
custom lists of missing value markers.
• Datetime parsing: includes combining capability, such as combining date and time
information spread over multiple columns into a single column in the result.
• Unclean data issues: skipping rows or a footer, comments, or other minor things like
numeric data with thousands separated by commas.
Since ex1.csv is comma-delimited, we can use read_csv to read it into a DataFrame. If the file
uses any other delimiter, read_table can be used by specifying the delimiter.
Suppose we wanted the message column to be the index of the returned DataFrame. We can
either indicate we want the column at index 4 or named 'message' using the index_col
argument:
To form a hierarchical index from multiple columns, just pass a list of column numbers or
names:
The parser functions have many additional arguments to help you handle the wide variety of
exceptional file formats that occur. For example, you can skip the first, third, and fourth rows
of a file with skiprows:
• The na_values option can take either a list or a set of strings to be treated as missing values:
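The options described above can be tried out on a small in-memory file. In this sketch the CSV text is made up and only stands in for a file such as ex1.csv; a non-comma delimiter is handled here with the sep argument of read_csv.

```python
import io
import pandas as pd

# Stand-in for a file such as ex1.csv; the contents are made up for illustration.
csv_text = """a,b,c,d,message
1,2,3,4,hello
5,6,7,8,world
9,10,11,12,NA
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Use the 'message' column as the index of the returned DataFrame.
by_message = pd.read_csv(io.StringIO(csv_text), index_col='message')

# A different delimiter can be given with sep.
piped = pd.read_csv(io.StringIO(csv_text.replace(',', '|')), sep='|')

# Skip the first data row (file line 1; line 0 is the header).
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# Treat the string 'world' as a missing value in addition to the defaults.
with_na = pd.read_csv(io.StringIO(csv_text), na_values=['world'])
print(with_na['message'].isna().tolist())
```

Passing a list of column names or numbers to index_col, for example index_col=['a', 'b'], gives the hierarchical index mentioned above.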
• A database is a file that is organized for storing data. Most databases are organized
like a dictionary in the sense that they map from keys to values. The biggest
difference is that the database is on disk (or other permanent storage), so it persists
after the program ends. Because a database is stored on permanent storage, it can
store far more data than a dictionary, which is limited to the size of the memory in
the computer.
• Like a dictionary, database software is designed to keep the inserting and accessing
of data very fast, even for large amounts of data. Database software maintains its
performance by building indexes as data is added to the database to allow the
computer to jump quickly to a particular entry.
• There are many different database systems which are used for a wide variety of
purposes including: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, and SQLite.
• Although Python can be used to work with data in SQLite database files, many operations
can be done more conveniently using software called the Database Browser for SQLite, which
is freely available from:
http://sqlitebrowser.org/
• Using the browser you can easily create tables, insert data, edit data, or run simple
SQL queries on the data in the database.
Database concepts
• At first glance, a database looks like a spreadsheet consisting of multiple sheets.
The primary data structures in a database are tables, rows and columns.
• Each table may consist of n number of attributes and m number of tuples (or
records).
• Every tuple gives the information about one individual. Every cell (i, j) in the table
indicates value of jth attribute for ith tuple.
Consider the problem of storing details of students in a database table. The format may
look like –
            Roll No   Name     DOB          Marks
Student 1   1         Akshay   22/10/2001   82.5
Student 2   2         Arun     20/12/2000   81.3
...         ...       ...      ...          ...
Student m   ...       ...      ...          ...
Thus, table columns indicate the type of information to be stored, and table rows give the
record pertaining to each student. We can create one more table, say department,
consisting of attributes like dept_id, homephno and City. The two tables can then be related
through the Rollno stored in the student table and the dept_id stored in the department
table. Thus, there is a relationship between two tables in a single database. Software that
maintains proper relationships between multiple tables in a single database is known as a
Relational Database Management System (RDBMS).
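How a shared column relates two tables can be sketched with Python's sqlite3 module. The table layout below is an assumption for illustration: a dept_name column is invented, and dept_id is stored in the student table so that each student row points to one department row.

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # in-memory database, nothing is written to disk
cur = conn.cursor()

cur.execute('CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)')
cur.execute('CREATE TABLE student (rollno INTEGER PRIMARY KEY, name TEXT, '
            'marks REAL, dept_id INTEGER)')

cur.execute("INSERT INTO department VALUES (1, 'MCA')")
cur.executemany('INSERT INTO student VALUES (?, ?, ?, ?)',
                [(1, 'Akshay', 82.5, 1), (2, 'Arun', 81.3, 1)])
conn.commit()

# The dept_id column relates each student row to a department row.
cur.execute('SELECT student.name, department.dept_name '
            'FROM student JOIN department ON student.dept_id = department.dept_id '
            'ORDER BY student.rollno')
rows = cur.fetchall()
print(rows)
conn.close()
```

The JOIN is what lets one query combine columns from both tables, so the department name need not be repeated in every student record.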
import sqlite3
conn = sqlite3.connect('music.db') # open the database file music.db, creating it if needed
cur = conn.cursor()
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')
conn.close()
• The connect operation makes a “connection” to the database stored in the file
music.db in the current directory. If the file does not exist, it will be created.
• A cursor is like a file handle that we can use to perform operations on the data
stored in the database. Calling cursor() is very similar conceptually to calling
open() when dealing with text files.
• Once we have the cursor, we can begin to execute commands on the contents of
the database using the execute() method.
Example 1: Write a Python program to create a student table in the college database (with
attributes such as name, USN and marks). Perform insert, delete and retrieve operations on
the student table.
import sqlite3
conn = sqlite3.connect('college.db')
cur = conn.cursor()
print("Opened database successfully")
cur.execute('CREATE TABLE student(name TEXT, usn TEXT, marks INTEGER)')
print("Table created successfully")
cur.execute('INSERT INTO student(name, usn, marks) VALUES (?, ?, ?)', ('akshay', '1rn16mca16', 30))
cur.execute('INSERT INTO student(name, usn, marks) VALUES (?, ?, ?)', ('arun', '1rn16mca17', 65))
print('student')
cur.execute('SELECT name, usn, marks FROM student')
for row in cur:
    print(row)
cur.execute('DELETE FROM student WHERE marks < 40')   # removes the record of akshay
cur.execute('SELECT name, usn, marks FROM student')   # only the record of arun remains
conn.commit()
cur.close()
conn.close()
Output:
Opened database successfully
Table created successfully
student
('akshay', '1rn16mca16', 30)
('arun', '1rn16mca17', 65)
Example 2: Write Python code to create a database file (music.sqlite) and a table named
Tracks with two columns, title and plays. Also insert, display and delete the contents of the
table.
import sqlite3
conn = sqlite3.connect('music.sqlite')
cur = conn.cursor()
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')
cur.execute("INSERT INTO Tracks (title, plays) VALUES ('Thunderstruck', 200)")
cur.execute("INSERT INTO Tracks (title, plays) VALUES (?, ?)", ('My Way', 15))
conn.commit()
print('Tracks:')
cur.execute('DELETE FROM Tracks WHERE plays < 100')
cur.execute('SELECT title, plays FROM Tracks')
for row in cur:
    print(row)
conn.commit()   # save the deletion as well
cur.close()
conn.close()
Output
Tracks:
('Thunderstruck', 200)
Extraneous information refers to irrelevant or unnecessary data that can clutter a dataset and
make it difficult to analyze. This could include duplicate entries, empty fields, or irrelevant
columns. Stripping out this information involves removing it from the dataset, resulting in a more
concise and manageable dataset.
To strip out extraneous information in a Pandas DataFrame, you can use various methods and
functions provided by the library. Some commonly used methods include:
• dropna( ): This method removes rows with missing values (NaN or None) from the DataFrame.
You can specify the axis (0 for rows and 1 for columns) along which the rows or columns with
missing values should be dropped.
Example:
df = df.dropna()
#This will remove all rows that contain at least one missing value.
• drop( ): The drop() method in Pandas is used to remove rows or columns from a DataFrame.
It can be used to drop a single column or multiple columns at once.
df.drop(columns, axis=1, inplace=False)
Ex:
cars2 = cars_data.drop(['Doors','Weight'],axis='columns')
• drop_duplicates(): This method removes duplicate rows. The subset parameter specifies the
columns on which the duplicates should be checked.
• loc[ ] and iloc[ ]: These indexing methods allow you to select specific rows and columns from
the DataFrame. They are used to select only the relevant data and exclude the unwanted
information.
• Filtering: conditional statements can be used to filter the DataFrame and select only the
rows that meet certain criteria. This allows you to remove unwanted data based on specific
conditions.
Example:
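The methods listed above can be tried together on a small data set. In this sketch the DataFrame contents are made up; only the name cars_data and the Doors and Weight columns come from the drop() example.

```python
import numpy as np
import pandas as pd

# Small made-up data set; column names follow the cars_data example above.
cars_data = pd.DataFrame({
    'Model':  ['A', 'B', 'B', 'C'],
    'Doors':  [4, 2, 2, 4],
    'Weight': [1100.0, 950.0, 950.0, np.nan],
    'Price':  [13500, 9500, 9500, 18000],
})

no_missing = cars_data.dropna()                       # drops the row containing NaN
no_dupes = cars_data.drop_duplicates()                # drops the repeated 'B' row
cars2 = cars_data.drop(['Doors', 'Weight'], axis='columns')

first_two = cars_data.iloc[0:2, 0:2]                  # by position: rows 0-1, columns 0-1
model_price = cars_data.loc[:, ['Model', 'Price']]    # by label: all rows, two columns

cheap = cars_data[cars_data['Price'] < 10000]         # filtering with a condition
print(cheap)
```

Each call returns a new DataFrame and leaves cars_data unchanged, so the steps can be combined or reordered freely.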
Data normalization is the process of transforming data into a consistent format to facilitate
comparison and analysis. This may involve converting data to a common unit of measurement,
formatting dates and times consistently, or standardizing data formats. Normalization ensures
that data is comparable and can be easily processed and analysed.
Normalization is a crucial step in data preprocessing for machine learning tasks. One common
form, often called standardization, transforms numerical features to have a mean of 0 and a
standard deviation of 1. Such scaling puts all features on a comparable scale, enabling
efficient and accurate learning by machine learning algorithms.
We import the required libraries, NumPy and sklearn, taking the preprocessing module from
sklearn itself; this is why it is called the sklearn normalization method. We create a NumPy
array of differing integer values, then call the normalize function from preprocessing and
pass it numpy_array as a parameter. By default normalize rescales each row to unit length, so
for non-negative input every value in the result lies between 0 and 1.
We can also normalize a particular column of a dataset.
We import the pandas and sklearn libraries. We create a dummy CSV file and load it with the
pandas read_csv function, then print the loaded data. We read the particular column of the
CSV data using np.array and store the result in value_array. We then call the normalize
function from preprocessing and pass value_array as the parameter.
Method 3: Normalize without converting the columns to an array (using sklearn)
Method 2 showed how to normalize a particular column of a CSV file. Sometimes we need to
normalize the whole dataset; in that case we can normalize it column-wise by passing
axis = 0. With axis = 1, which is the default value, normalization is done row-wise.
Here we pass the whole CSV data along with the extra parameter axis = 0, which tells the
library that the user wants to normalize the whole dataset column-wise.
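The row-wise and column-wise behavior of normalize can be sketched as follows. The array and DataFrame values are made up, and the DataFrame only stands in for the CSV file described above.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

numpy_array = np.array([[3.0, 4.0]])
row_scaled = preprocessing.normalize(numpy_array)     # default: L2 norm, axis=1 (per row)
print(row_scaled)

# Made-up data standing in for the CSV file mentioned above.
df = pd.DataFrame({'a': [3.0, 4.0], 'b': [1.0, 1.0]})
col_scaled = preprocessing.normalize(df.to_numpy(), axis=0)   # rescale each column instead
print(pd.DataFrame(col_scaled, columns=df.columns))
```

In both cases every row (or column) ends up with length 1 under the L2 norm, which is a different operation from the min-max scaling described next.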
We call MinMaxScaler from the preprocessing module and create an object
(min_max_Scalar) for it. We do not pass any parameters because we want to normalize the
data between 0 and 1, which is the default; other ranges can be supplied, as described in the
next method.
We first read all the column names so that they can be used later to display the results. Then
we call fit_transform on the created object min_max_Scalar and pass the CSV data to it. We
get normalized results between 0 and 1.
sklearn also lets us change the range of the normalized values. By default, values are
normalized between 0 and 1, but the feature_range parameter sets the range according to our
requirements.
Here, we call MinMaxScaler from the preprocessing module and create an object
(min_max_Scalar) for it, this time passing the feature_range parameter set to (0, 2), so that
MinMaxScaler normalizes the data values between 0 and 2. We first read all the column names
so that they can be used later to display the results. Then we call fit_transform on the
created object min_max_Scalar and pass the CSV data to it. We get normalized results
between 0 and 2.
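Both uses of MinMaxScaler can be sketched on a small made-up DataFrame that stands in for the CSV data; the variable names are illustrative.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 30.0, 50.0]})   # made-up values

min_max_scaler = preprocessing.MinMaxScaler()              # default feature_range=(0, 1)
scaled_01 = pd.DataFrame(min_max_scaler.fit_transform(df), columns=df.columns)

wide_scaler = preprocessing.MinMaxScaler(feature_range=(0, 2))
scaled_02 = pd.DataFrame(wide_scaler.fit_transform(df), columns=df.columns)
print(scaled_01)
print(scaled_02)
```

fit_transform returns a plain NumPy array, which is why the column names are read from df and passed back in when the result is wrapped in a DataFrame.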
We can also normalize the data using pandas alone. Maximum absolute scaling divides each
value by the largest absolute value in its column, so the results lie between -1 and 1
(between 0 and 1 when the column has no negative values). We apply .abs() and .max() for this.
We take each column and divide its values by the maximum of their absolute values, obtained
with .abs() and .max(). We print the result and confirm that our data is normalized to this
range.
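Maximum absolute scaling with pandas can be sketched as follows; the DataFrame values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [-4.0, 2.0, 1.0], 'b': [10.0, 20.0, 40.0]})   # made-up values

df_max_scaled = df.copy()
for column in df_max_scaled.columns:
    # divide each column by its largest absolute value
    df_max_scaled[column] = df_max_scaled[column] / df_max_scaled[column].abs().max()
print(df_max_scaled)
```

Column a contains a negative value, so its scaled values reach -1; column b is non-negative and stays between 0 and 1.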
The next method is the z-score method. It converts the values to a common distribution: the
mean of each column is subtracted from every value in that column, and the result is divided
by the column's standard deviation. The normalized data has a mean of 0 and a standard
deviation of 1; most values fall within a few units of 0, but they are not confined to the
range -1 to 1.
We calculate the column's mean and subtract it from the column. Then we divide the column
values by the standard deviation and print the normalized data.
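The z-score calculation can be sketched in pandas as follows; the DataFrame values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 60.0]})   # made-up values

# subtract each column's mean, then divide by its standard deviation
df_z_scaled = (df - df.mean()) / df.std()   # pandas std() uses the sample formula (ddof=1)
print(df_z_scaled)
```

Column b shows that z-scores are not limited to a fixed interval: its largest value lies more than one standard deviation above the mean.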
One popular library is Scikit-Learn, which offers the StandardScaler class for normalization.
Here's an example of how to use StandardScaler to normalize a dataset:
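A minimal sketch of StandardScaler on made-up data follows. Note that StandardScaler divides by the population standard deviation, so its output can differ slightly from a pandas calculation based on .std().

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])   # made-up values

scaler = StandardScaler()
standardized = scaler.fit_transform(data)   # each column: subtract mean, divide by std
print(standardized)
print(standardized.mean(axis=0), standardized.std(axis=0))
```

The fitted scaler keeps the column means and scales, so the same transformation can later be applied to new data with scaler.transform().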
Formatting Data:
• Formatting data in Pandas involves transforming and presenting data in a structured and
readable manner. Pandas, a popular Python library for data analysis, offers various methods
and techniques to format data effectively.
• One of the key features of Pandas is its ability to handle different data types and structures.
It provides specific formatting options for each data type, ensuring that data is displayed in
a consistent and meaningful way. For example, numeric data can be formatted with specific
number of decimal places, currency symbols, or percentage signs. Date and time data can be
formatted in various formats, such as "dd/mm/yyyy" or "hh:mm:ss".
• Pandas also allows users to control alignment, making data easier to read and compare.
For instance, output methods such as to_string() accept a "justify" parameter, which takes
values such as "left", "right", or "center" and sets the alignment of the column headers.
Additionally, Pandas provides options to control the width of columns, ensuring that data is
presented in a visually appealing manner.
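The formatting ideas above can be sketched as follows. The column names, the values and the choice of a dollar sign are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2001-10-22', '2000-12-20']),
    'price': [1234.5, 99.0],
    'share': [0.256, 0.5],
})

formatted = pd.DataFrame({
    'date': df['date'].dt.strftime('%d/%m/%Y'),          # dd/mm/yyyy
    'price': df['price'].map('${:,.2f}'.format),         # currency, two decimal places
    'share': df['share'].map('{:.1%}'.format),           # percentage
})
print(formatted.to_string(index=False, justify='center'))
```

The formatted columns hold strings, so this step is best done last, after all numeric work on the data is finished.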
Data contained in pandas objects can be combined together in a number of built-in ways:
• pandas.merge connects rows in DataFrames based on one or more keys. This will be
familiar to users of SQL or other relational databases, as it implements database join
operations.
• combine_first instance method enables splicing together overlapping data to fill in missing
values in one object with values from another.
import pandas as pd
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'],'data2': range(3)})
• Merging df1 and df2 on the key column is a many-to-one merge; the data in df1 has multiple rows
labeled a and b, whereas df2 has only one row for each value in the key column.
Observe that the 'c' and 'd' values and associated data are missing from the result. By default
merge does an 'inner' join; the keys in the result are the intersection. The outer join takes the
union of the keys, combining the effect of applying both left and right joins.
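The inner and outer joins described above can be tried on the df1 and df2 frames defined earlier. This is a minimal sketch; the variable names inner and outer are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

inner = pd.merge(df1, df2, on='key')               # default how='inner': intersection of keys
outer = pd.merge(df1, df2, on='key', how='outer')  # union of keys, NaN where data is missing

print(inner)
print(outer)
```

In the inner result the keys 'c' and 'd' are absent; in the outer result they appear with NaN in the column that has no matching row.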
Many-to-many joins form the Cartesian product of the rows. Since there were 3 'b' rows in
the left DataFrame and 2 in the right one, there are 6 'b' rows in the result. The join method
only affects the distinct key values appearing in the result:
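A sketch of a many-to-many merge follows. The two frames are assumed for illustration (they are not shown in the notes) and are chosen so that the left one has three 'b' rows and the right one has two.

```python
import pandas as pd

# Assumed example frames: 3 'b' rows on the left, 2 on the right.
left = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})
right = pd.DataFrame({'key': ['a', 'b', 'a', 'b', 'd'], 'data2': range(5)})

inner = pd.merge(left, right, on='key', how='inner')
left_join = pd.merge(left, right, on='key', how='left')

print((inner['key'] == 'b').sum())        # 3 left rows x 2 right rows
print(sorted(left_join['key'].unique()))  # how='left' keeps 'c' but not 'd'
```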
Merging on Index
In some cases, the merge key or keys in a DataFrame will be found in its index. In this case, you
can pass left_index=True or right_index=True (or both) to indicate that the index should be used
as the merge key.
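As a minimal sketch of merging on an index, assume one frame carries the key as a column and the other carries it as its index (both frames and their names are hypothetical).

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7.0]}, index=['a', 'b'])

# Column 'key' on the left is matched against the index on the right.
merged = pd.merge(left1, right1, left_on='key', right_index=True)
print(merged)
```

The row with key 'c' is dropped because the default join is still an inner join.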
By default concat works along axis=0, producing another Series. If you pass axis=1, the result
will instead be a DataFrame (axis=1 is the columns):
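The effect of the axis argument can be seen with two small Series (the values and labels here are made up for illustration).

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])

stacked = pd.concat([s1, s2])               # axis=0: one longer Series
side_by_side = pd.concat([s1, s2], axis=1)  # axis=1: DataFrame, NaN where a label is missing

print(stacked)
print(side_by_side)
```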
There are a number of fundamental operations for rearranging tabular data. These are
alternately referred to as reshape or pivot operations.
• Using the stack method on this data pivots the columns into the rows, producing a
Series.
From a hierarchically-indexed Series, we can rearrange the data back into a DataFrame with
unstack.
• By default the innermost level is unstacked (same with stack). You can unstack a different
level by passing a level number or name:
• Unstacking might introduce missing data if all of the values in the level aren’t found in each
of the subgroups:
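The stack and unstack operations above can be sketched on a small DataFrame. The frame below is an assumption made for illustration; any frame with named row and column indexes would do.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

result = data.stack()               # columns pivoted into the rows: a Series with a MultiIndex
restored = result.unstack()         # innermost level ('number') moved back to the columns
by_state = result.unstack('state')  # unstack a different level by name

print(result)
print(by_state)
```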
• Data is frequently stored this way in relational databases like MySQL as a fixed schema allows
the number of distinct values in the item column to increase or decrease as data is added or
deleted in the table.
• The data may not be easy to work with in long format; it is preferred to have a DataFrame
containing one column per distinct item value indexed by timestamps in the date column.
ROOPA.H.M, Dept of MCA, RNSIT Page 24
Module 1 [22MCA31] Data Analytics using Python
The pivot() function is used to reshape a given DataFrame organized by given index / column
values. This function does not support data aggregation, multiple values will result in a
MultiIndex in the columns.
Syntax:
DataFrame.pivot(self, index=None, columns=None, values=None)
Example:
Suppose you had two value columns that you wanted to reshape simultaneously:
• By omitting the last argument, you obtain a DataFrame with hierarchical columns:
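A minimal sketch of pivot() on long-format data follows. The frame, its column names, and its values are hypothetical; value2 stands in for a second value column.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, item) pair.
long_df = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02'],
    'item': ['gdp', 'infl', 'gdp', 'infl'],
    'value': [2.1, 0.3, 2.4, 0.2],
    'value2': [10, 20, 30, 40],
})

# One column per distinct item, indexed by date.
wide = long_df.pivot(index='date', columns='item', values='value')

# Omitting values keeps both value columns under hierarchical column labels.
both = long_df.pivot(index='date', columns='item')

print(wide)
print(both)
```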
Data transformation
Data transformation is the process of converting raw data into a format that is suitable for
analysis and modeling. It's an essential step in data science and analytics workflows, helping to
unlock valuable insights and make informed decisions.
A few of the data transformation operations are:
• Removing Duplicates
• Replacing Values
• Renaming Axis Indexes
• Discretization and Binning
• Detecting and Filtering Outliers
• Permutation and Random Sampling
i) Removing duplicates
Duplicate rows may be found in a DataFrame using the duplicated method, which returns a
boolean Series indicating whether each row is a duplicate of an earlier row. Relatedly,
drop_duplicates returns a DataFrame containing only the rows for which duplicated is False.
import pandas as pd
data = pd.DataFrame(
    {'k1': ['one'] * 3 + ['two'] * 4,
     'k2': [1, 1, 2, 3, 3, 4, 4]})
data                     # the original DataFrame
data.duplicated()        # boolean Series: True for repeated rows
data.drop_duplicates()   # DataFrame without the repeated rows
• Suppose we wanted to find values in one of the columns exceeding one in magnitude:
• To select all rows having a value exceeding 1 or -1, we can use the any method on a
boolean DataFrame:
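The two filtering steps above can be sketched on a small hand-written frame (the values are made up so the result is easy to check by eye).

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [0.5, -3.2, 0.1, 0.9],
                     'b': [2.0, 0.4, -0.7, 0.3]})

# Values in one column exceeding 1 in magnitude.
col = data['a']
large_in_a = col[np.abs(col) > 1]

# Rows where any column holds such a value.
rows = data[(np.abs(data) > 1).any(axis=1)]

print(large_in_a)
print(rows)
```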
• Sometimes it is necessary to replace particular values, for example placeholder values that
stand for missing data, with other values or with NaN. This can be done using the replace
method. Let’s consider this Series:
• The -999 values might be sentinel values for missing data. To replace these with NA
values that pandas understands, we can use replace, producing a new Series:
data.replace(-999, np.nan)
• If we want to replace multiple values at once, you instead pass a list then the substitute
value:
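A minimal sketch of both forms of replace, assuming a Series in which -999 and -1000 are used as sentinel values (the data is made up for illustration).

```python
import numpy as np
import pandas as pd

data = pd.Series([1., -999., 2., -999., -1000., 3.])

single = data.replace(-999, np.nan)            # one sentinel value
several = data.replace([-999, -1000], np.nan)  # a list of values, one substitute

print(single)
print(several)
```

replace returns a new Series; the original data is left unchanged.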
Like values in a Series, axis labels can be similarly transformed by a function or mapping
of some form to produce new, differently labeled objects. The axes can also be modified in
place without creating a new data structure.
import pandas as pd
• To create a transformed version of a data set without modifying the original, a useful
method is rename:
data.rename(index=str.title, columns=str.upper)
• rename can be used in conjunction with a dict-like object providing new values for a subset
of the axis labels:
data.rename(index={'OHIO': 'INDIANA'},
columns={'three': 'peekaboo'})
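The rename calls above need a DataFrame named data. The sketch below assumes a small frame with state names as the index; because its index is not upper-cased, the dict key is 'Ohio' rather than 'OHIO'.

```python
import numpy as np
import pandas as pd

# Assumed example frame; labels are illustrative.
data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['Ohio', 'Colorado', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

titled = data.rename(index=str.title, columns=str.upper)   # apply functions to the labels
relabelled = data.rename(index={'Ohio': 'INDIANA'},        # map only a subset of labels
                         columns={'three': 'peekaboo'})

print(titled)
print(relabelled)
```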
• Continuous data is often discretized or otherwise separated into “bins” for analysis.
Suppose we have data about a group of people in a study, and we want to group them into
discrete age buckets:
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
Let’s divide these into bins of 18 to 25, 26 to 35, 36 to 60, and finally 61 and older. To do
so, we have to use cut, a function in pandas:
import pandas as pd
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
cats = pd.cut(ages, bins)
cats
• The object pandas returns is a special Categorical object. We can treat it like an array of
strings indicating the bin name; internally it contains a categories array indicating the distinct
category names along with a labeling for the ages data in the codes attribute (older pandas
versions called these levels and labels):
cats.codes
cats.categories
# an index holding the intervals (18, 25], (25, 35], (35, 60], (60, 100]
pd.Series(cats).value_counts()
Consistent with mathematical notation for intervals, a parenthesis means that the side is
open while the square bracket means it is closed (inclusive).
Permuting (randomly reordering) a Series or the rows in a DataFrame is easy to do using the
numpy.random.permutation function. Calling permutation with the length of the axis you
want to permute produces an array of integers indicating the new ordering:
df
sampler = np.random.permutation(5)
sampler
array([1, 0, 2, 3, 4])    (one possible result; the order is random)
That array can then be used in iloc-based indexing or the take function:
df.take(sampler)
Another type of transformation for statistical modeling or machine learning applications
is converting a categorical variable into a “dummy” or “indicator” matrix. If a column in a
DataFrame has k distinct values, you would derive a matrix or DataFrame containing k
columns containing all 1’s and 0’s. pandas has a get_dummies function for doing this,
though devising one yourself is not difficult. Let’s return to an earlier example
DataFrame:
pd.get_dummies(df['key'])
In some cases, you may want to add a prefix to the columns in the indicator DataFrame,
which can then be merged with the other data. get_dummies has a prefix argument for
doing just this:
dummies = pd.get_dummies(df['key'], prefix='key')
df_with_dummy = df[['data1']].join(dummies)
df_with_dummy
String Manipulation
Python has long been a popular data munging language in part due to its ease-of-use for
string and text processing. Most text operations are made simple with the string object’s built-
in methods. For more complex pattern matching and text manipulations, regular expressions
may be needed. pandas adds to the mix by enabling you to apply string and regular
expressions concisely on whole arrays of data, additionally handling the annoyance of missing
data.
Regular Expressions
RegEx Functions
The re module offers a set of functions that allows us to search a string for a match.
By using these functions we can search required pattern. They are as follows:
• match(): re.match() determine if the RE matches at the beginning of the string. The
method returns a match object if the search is successful. If not, it returns None.
import re
pattern = '^a...s$'
test_string = 'abyss'
result = re.match(pattern, test_string)
if result:
    print("Search successful.")
else:
    print("Search unsuccessful.")
Output:
Search successful.
• search(): The search( ) function searches the string for a match, and returns a Match
object if there is a match. If there is more than one match found, only the first
occurrence of the match will be returned.
import re
pattern = 'Tutorials'
line = 'Python Tutorials'
result = re.search(pattern, line)
print(result)
print(result.group())
Output:
<re.Match object; span=(7, 16), match='Tutorials'>
Tutorials
• findall(): Finds all non-overlapping substrings where the RE matches and returns them as a
list. The string is scanned from left to right, and matches are returned in the order found.
Unlike re.match(), which only checks the beginning of the string, and re.search(), which
stops at the first match, re.findall() returns every occurrence of the pattern.
import re
text = "The rain in Spain"
x = re.findall("ai", text)
print(x)
Output:
['ai', 'ai']
Special Sequences
A special sequence is a \ followed by one of the characters in the list below, and has a
special meaning. Sets, written in square brackets, match any one of the characters they list:
• \d : Matches any digit (0-9). Example pattern: "\d"
• \D : Matches any character that is NOT a digit. Example pattern: "\D"
• [0-9] : Returns a match for any digit between 0 and 9.
>>> text = "8 times before 11:45 AM"
>>> re.findall("[0-9]", text)
['8', '1', '1', '4', '5']
• [0-5][0-9] : Returns a match for any two-digit number from 00 to 59.
>>> text = "8 times before 11:45 AM"
>>> re.findall("[0-5][0-9]", text)
['11', '45']
• [+] : In sets, +, *, ., |, (), $, {} have no special meaning, so [+] means: return a match for
any + character in the string.
>>> text = "8 times before 11:45 AM"
>>> re.findall("[+]", text)
[]
EX:1 Search for lines that start with 'F', followed by 2 characters, followed by 'm:'
import re
hand = open('pattern.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^F..m:', line):
        print(line)
Output:
From: Bengaluru^560098
From:<[email protected]>
From: <[email protected]>
The regular expression F..m: would match any of the strings “From:”, “Fxxm:”, “F12m:”,
or “F!@m:” since the period characters in the regular expression match any character.
Ex:2 Search for lines that start with From and have an at sign
import re
hand = open('pattern.txt')
for line in hand:
    line = line.rstrip()
    if re.search('^From:.+@', line):
        print(line)
Output:
From:<[email protected]>
From: <[email protected]>
The search string ^From:.+@ will successfully match lines that start with “From:”,
followed by one or more characters (.+), followed by an at-sign.
Ex:1 Extract anything that looks like an email address from the line.
import re
s = 'A message from [email protected] to [email protected] about meeting @2PM'
lst = re.findall(r'\S+@\S+', s)
print(lst)
Output: ['[email protected]', '[email protected]']
Translating the regular expression, we are looking for substrings that have one or more
non-whitespace characters, followed by an at-sign, followed by one or more non-whitespace
characters.
Ex:3 Search for lines that have an at sign between characters. The characters
must be a letter or number.
import re
hand = open('pattern.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)
    if len(x) > 0:
        print(x)
Output:
['From:[email protected]']
['[email protected]']
Here, we are looking for substrings that start with a single lowercase letter, uppercase
letter, or number “[a-zA-Z0-9]”, followed by zero or more non-blank characters (\S*),
followed by an at-sign, followed by zero or more non-blank characters (\S*), followed by
an uppercase or lowercase letter. Note that we switched from + to * to indicate zero or
more non-blank characters since [a-zA-Z0-9] is already one non-blank character.
Remember that the * or + applies to the single character immediately to the left of the
plus or asterisk.
Consider a file file.txt with the following content:
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6178
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6961
X-DSPAM
done with the file content
Ex:1 Search for lines that start with 'X' followed by any non whitespace
characters and ':' followed by a space and any number. The number can
include a decimal.
import re
hand = open('file.txt')
for line in hand:
    line = line.rstrip()
    x = re.search('^X-.*: ([0-9.]+)', line)
    if x:
        print(x.group())
Output:
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6178
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6961
• If we want only the numbers from the above output, we could call split() on the
extracted string. However, it is better to refine the regular expression. To do so, we need
the help of parentheses.
Parentheses in a regular expression do not change what the pattern matches: search()
still matches the whole expression. But when we are using findall(), parentheses indicate
that while we want the whole expression to match, we are only interested in extracting
the portion of the substring that falls inside the parentheses.
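The difference between the whole match and the parenthesized part can be seen on a single line. This is a minimal sketch; the sample line is taken from the file content shown earlier.

```python
import re

line = 'X-DSPAM-Confidence: 0.8475'
pattern = r'^X-.*: ([0-9.]+)'

m = re.search(pattern, line)
whole = m.group()       # the entire matched text
number = m.group(1)     # only the part inside the parentheses
found = re.findall(pattern, line)  # findall returns just the captured parts

print(whole)
print(number)
print(found)
```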
Ex:2 Search for lines that start with 'X' followed by any non whitespace
characters and ':' followed by a space and any number. The number can include
a decimal. Then print the extracted number.
import re
hand = open('file.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall('^X.*: ([0-9.]+)', line)
    if len(x) > 0:
        print(x)
Output:
['0.8475']
['0.0000']
['0.6178']
['0.0000']
['0.6961']
• Let us consider another example, assume that the file contain lines of the form:
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
If we wanted to extract all of the revision numbers (the integer number at the end of
these lines) using the same technique as above, we could write the following program:
Ex:3 Search for lines that start with 'Details:' and contain 'rev=' followed by digits.
Then print the digits that were found.
import re
line = "Details:http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772"
x = re.findall('^Details:.*rev=([0-9]+)', line)
if len(x) > 0:
    print(x)
Output:
['39772']
In the above example, we are looking for lines that start with Details:, followed by
any number of characters (.*), followed by rev=, and then by one or more digits. We
want to find lines that match the entire expression but we only want to extract the
integer number at the end of the line, so we surround [0-9]+ with parentheses.
Note that the expression [0-9]+ is greedy: it keeps grabbing digits until it finds a
character that is not a digit, so it extracts the complete number however long it is.
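Greedy matching can be contrasted with its non-greedy counterpart. The sketch below uses a made-up string; the +? quantifier is not covered in these notes and is shown only for comparison.

```python
import re

text = 'rev=39772&x'

greedy = re.findall(r'rev=([0-9]+)', text)   # takes as many digits as possible
lazy = re.findall(r'rev=([0-9]+?)', text)    # +? stops after the first digit

print(greedy)
print(lazy)
```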
Escape character
Character like dot, plus, question mark, asterisk, dollar etc. are meta characters in
regular expressions. Sometimes, we need these characters themselves as a part of
matching string. Then, we need to escape them using a backslash. For example,
import re
x = 'We just received $10.00 for cookies.'
y = re.search(r'\$[0-9.]+', x)
print("matched string:", y.group())
Output:
matched string: $10.00
Here, we want to extract only the price $10.00. As, $ symbol is a metacharacter, we
need to use \ before it. So that, now $ is treated as a part of matching string, but not
as metacharacter.
Question Bank
Topics to be studied
• Submitting a form
• CSS Selectors.
Web Scraping
• The main purpose of web scraping is to collect and analyze data from
websites for various applications, such as research, business intelligence, or
creating datasets.
• Developers use tools and libraries like BeautifulSoup (for Python), Scrapy, or
Puppeteer to automate the process of fetching and parsing web data.
Python Libraries
• requests
• Beautiful Soup
• Selenium
Requests
import requests
# Specify the base URL
base_url = "https://jsonplaceholder.typicode.com"
# GET request
get_response = requests.get(f"{base_url}/posts/1")
print(f"GET Response:\n{get_response.json()}\n")
# POST request
new_post_data = {
'title': 'New Post',
'body': 'This is the body of the new post.',
'userId': 1
}
post_response = requests.post(f"{base_url}/posts", json=new_post_data)
print(f"POST Response:\n{post_response.json()}\n")
BeautifulSoup
• Used to extract tables, lists, and paragraphs; you can also apply filters to extract specific
information from web pages.
• BeautifulSoup does not fetch the web page for us, so we use requests to download it.
BeautifulSoup is installed with:
pip install beautifulsoup4
BeautifulSoup
print(type(soup))
Tag Object
• This object is usually used to extract a tag from the whole HTML document.
• Beautiful Soup is not an HTTP client which means to scrap online websites
you first have to download them using the requests module and then serve
them to Beautiful Soup for scraping.
• This object returns the first found tag if your document has multiple tags with the same name.
from bs4 import BeautifulSoup
# Initialize the object with an HTML page
soup = BeautifulSoup('''
<html>
<b>RNSIT</b>
<b> Knowx Innovations</b>
</html>
''', "html.parser")
# Get the tag
tag = soup.b
print(tag)
# Print the output
print(type(tag))
• The tag contains many methods and attributes. And two important features of a tag are
its name and attributes.
• Name:The name of the tag can be accessed through ‘.name’ as suffix.
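• Attributes: The key-value pairs written inside the opening tag (for example class, id, or href).
They can be read like a dictionary, e.g. tag['id'], or all at once through tag.attrs.
• A document may contain multi-valued attributes such as class; Beautiful Soup returns the value
of a multi-valued attribute as a list.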
• NavigableString Object: A string corresponds to a bit of text within a tag. Beautiful Soup uses
the NavigableString class to contain these bits of text
descendants generator
IMPORTANT POINTS
• BeautifulSoup provides several methods for searching for tags based on their contents,
such as find(), find_all(), and select().
• The find_all() method returns a list of all tags that match a given filter, while the find()
method returns the first tag that matches the filter.
• You can use the string keyword argument (called text in older versions of Beautiful Soup) to
search for tags that contain specific text.
Select method
• The select method allows you to apply CSS selectors to navigate and
extract data from the parsed document easily.
CSS Selector
• Id selector (#)
• Class selector (.)
• Universal Selector (*)
• Element Selector (tag)
• Grouping Selector(,)
• Id selector (#) :The ID selector targets a specific HTML element based on its unique
identifier attribute (id). An ID is intended to be unique within a webpage, so using the ID
selector allows you to style or apply CSS rules to a particular element with a specific ID.
#header {
color: blue;
font-size: 16px;
}
• Class selector (.) : The class selector is used to select and style HTML elements based on
their class attribute. Unlike IDs, multiple elements can share the same class, enabling
you to apply the same styles to multiple elements throughout the document.
.highlight {
background-color: yellow;
font-weight: bold;
}
• Universal Selector (*) :The universal selector selects all HTML elements on the webpage.
It can be used to apply styles or rules globally, affecting every element. However, it is
important to use the universal selector judiciously to avoid unintended consequences.
*{
margin: 0;
padding: 0;
}
• Element Selector (tag) : The element selector targets all instances of a specific HTML
element on the page. It allows you to apply styles universally to elements of the same
type, regardless of their class or ID.
p{
color: green;
font-size: 14px;
}
• Grouping Selector(,) : The grouping selector allows you to apply the same styles to
multiple selectors at once. Selectors are separated by commas, and the styles specified
will be applied to all the listed selectors.
h1, h2, h3 {
font-family: 'Arial', sans-serif;
color: #333;
}
• These selectors are fundamental to CSS and provide a powerful way to target and style
different elements on a webpage.
Creating a basic HTML page
<!DOCTYPE html>
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<div id="content">
<h1>Heading 1</h1>
<p class="paragraph">This is a sample paragraph.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
<li>Item 3</li>
</ul>
<a href="https://example.com">Visit Example</a>
</div>
</body>
</html>
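The find(), find_all(), and select() methods can be tried on a page like the one above. This is a minimal sketch that assumes beautifulsoup4 is installed; the variable names are illustrative, and the HTML is an abbreviated copy of the sample page.

```python
from bs4 import BeautifulSoup

html = """
<div id="content">
  <h1>Heading 1</h1>
  <p class="paragraph">This is a sample paragraph.</p>
  <ul><li>Item 1</li><li>Item 2</li><li>Item 3</li></ul>
  <a href="https://example.com">Visit Example</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_li = soup.find("li")            # first matching tag
all_li = soup.find_all("li")          # list of every matching tag
para = soup.select("p.paragraph")     # element selector combined with a class selector
items = soup.select("#content li")    # id selector, then descendant li elements
link = soup.select_one("a")           # first match for a CSS selector

print(first_li.get_text())
print([li.get_text() for li in all_li])
print(link.name, link["href"], link.string)
```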
Selenium
• Selenium is an open-source browser automation and testing tool; it is free to
download and use.
Webdriver
XPATH
CSS Selector
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
driver.maximize_window()
time.sleep(3)
# Navigate to the form page
driver.get('https://www.confirmtkt.com/pnr-status')
# Locate form elements
pnr_field = driver.find_element(By.NAME, "pnr")
submit_button = driver.find_element(By.CSS_SELECTOR, '.col-xs-4')
# Fill in form fields
pnr_field.send_keys('4358851774')
# Submit the form
submit_button.click()
X = 10000
In the special case that all variables are of the same type, much of this information is
redundant: it can be much more efficient to store data in a fixed-type array. The
difference between a dynamic-type list and a fixed-type (NumPy-style) array is
illustrated in Figure.
While Python’s array object provides efficient storage of array-based data, NumPy adds to
this efficient operations on that data.
NumPy is constrained to arrays that all contain the same type. If types do not match, NumPy will upcast if possible.
If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:
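Both behaviours can be checked in a couple of lines (the values are arbitrary).

```python
import numpy as np

mixed = np.array([3.14, 4, 2, 3])                    # integers are upcast to floating point
as_float = np.array([1, 2, 3, 4], dtype='float32')   # dtype set explicitly

print(mixed.dtype)
print(as_float.dtype)
```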
Write a Python program that creates an m x n integer array and prints its attributes using
NumPy.
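One possible solution is sketched below. The values m = 3 and n = 4 are assumptions chosen for illustration; the exact dtype printed depends on the platform.

```python
import numpy as np

m, n = 3, 4                          # assumed dimensions
x = np.arange(m * n).reshape(m, n)   # m x n integer array

print("ndim:    ", x.ndim)      # number of dimensions
print("shape:   ", x.shape)     # size of each dimension
print("size:    ", x.size)      # total number of elements
print("dtype:   ", x.dtype)     # data type of the elements
print("itemsize:", x.itemsize)  # bytes per element
print("nbytes:  ", x.nbytes)    # total bytes
```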
Output:
You can also modify values using any of the above index notation:
NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value
to an integer array, the value will be silently truncated.
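A short sketch of the silent truncation (the array values are arbitrary):

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])  # integer array
x1[0] = 3.14159                    # the fractional part is dropped without any warning

print(x1)
```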
Multidimensional subarrays
Reshaping of Arrays
Another useful type of operation is reshaping of arrays. The most flexible way of doing this
is with the reshape() method. For example, if you want to put the numbers 1 through 9 in a
3×3 grid, you can do the following:
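A minimal version of that reshape:

```python
import numpy as np

grid = np.arange(1, 10).reshape((3, 3))  # numbers 1 through 9 in a 3x3 grid
print(grid)
```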
• Note that for this to work, the size of the initial array must match the size of the
reshaped array.
• Where possible, the reshape method returns a no-copy view of the initial array, but with
noncontiguous memory buffers this is not always the case.
• Reshaping can be done with the reshape method, or more easily by making use of the
newaxis keyword within a slice operation.
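The sketch below turns a one-dimensional array into a row and a column, once with reshape and once with newaxis (the array is arbitrary).

```python
import numpy as np

x = np.array([1, 2, 3])

row_reshape = x.reshape((1, 3))   # row vector via reshape
row_newaxis = x[np.newaxis, :]    # row vector via newaxis
col_reshape = x.reshape((3, 1))   # column vector via reshape
col_newaxis = x[:, np.newaxis]    # column vector via newaxis

print(row_newaxis.shape, col_newaxis.shape)
print(np.shares_memory(x, row_reshape))  # the reshaped array is a view here
```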
For working with arrays of mixed dimensions, it can be clearer to use the np.vstack
(vertical stack) and np.hstack (horizontal stack) functions:
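A small sketch with made-up arrays:

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
y = np.array([[99],
              [99]])

v = np.vstack([x, grid])   # stack rows: shape (3, 3)
h = np.hstack([grid, y])   # stack columns: shape (2, 4)

print(v)
print(h)
```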
Splitting of arrays
• The opposite of concatenation is splitting, which is implemented by the functions np.split,
np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:
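The split functions can be sketched as follows (the arrays are arbitrary). N split points produce N + 1 subarrays.

```python
import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])    # split before index 3 and before index 5

grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])  # split rows
left, right = np.hsplit(grid, [2])   # split columns

print(x1, x2, x3)
print(upper.shape, left.shape)
```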
• Computation on NumPy arrays can be very fast, or it can be very slow. The key to making
it fast is to use vectorized operations, generally implemented through NumPy’s universal
functions (ufuncs).
• NumPy’s ufuncs can be used to make repeated calculations on array elements much
more efficient.
Each time the reciprocal is computed, Python first examines the object’s type and does a
dynamic lookup of the correct function to use for that type. If we were working in
compiled code instead, this type specification would be known before the code executes
and the result could be computed much more efficiently.
• For many types of operations, NumPy provides a convenient interface into this kind of
statically typed, compiled routine. This is known as a vectorized operation.
• This vectorized approach is designed to push the loop into the compiled layer that
underlies NumPy, leading to much faster execution.
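The loop and its vectorized equivalent can be compared directly. This is a sketch; the function name compute_reciprocals and the input values are assumptions made for illustration.

```python
import numpy as np

def compute_reciprocals(values):
    """Reciprocal of each element, computed one at a time in a Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(0)
values = rng.integers(1, 10, size=5)

looped = compute_reciprocals(values)  # type check and function lookup on every iteration
vectorized = 1.0 / values             # the loop runs inside NumPy's compiled code

print(looped)
print(vectorized)
```

Both give the same numbers; the difference is in how long they take on large arrays.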
• Looking at the execution time for our big array, we see that it completes orders of
magnitude faster than the Python loop:
Introducing UFuncs
• Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is
to quickly execute repeated operations on values in NumPy arrays.
Array arithmetic
• NumPy’s ufuncs feel very natural to use because they make use of Python’s native
arithmetic operators. The standard addition, subtraction, multiplication, and division can
all be used:
• There is also a unary ufunc for negation, a ** operator for exponentiation, and a % operator for modulus:
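A quick sketch of these operators on a small array:

```python
import numpy as np

x = np.arange(4)

print("x + 5  =", x + 5)
print("x - 5  =", x - 5)
print("x * 2  =", x * 2)
print("x / 2  =", x / 2)
print("-x     =", -x)
print("x ** 2 =", x ** 2)
print("x % 2  =", x % 2)

# Each operator is a wrapper around a ufunc; + corresponds to np.add.
same = np.array_equal(np.add(x, 2), x + 2)
```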
Absolute value
• The corresponding NumPy ufunc is np.absolute, which is also available under the alias
np.abs:
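For example, with arbitrary values:

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
a = np.absolute(x)
b = np.abs(x)                    # alias of np.absolute

z = np.array([3 - 4j])
magnitude = np.abs(z)            # for complex input, the magnitude is returned

print(a, b, magnitude)
```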
Trigonometric functions
• NumPy provides a large number of useful ufuncs, and some of the most useful for the
data scientist are the trigonometric functions.
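A short sketch using three angles between 0 and pi:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)   # 0, pi/2, pi

s = np.sin(theta)
c = np.cos(theta)

print("sin(theta) =", s)
print("cos(theta) =", c)
```

The values are computed to machine precision, so entries that should be exactly zero may print as very small numbers.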
Specifying output
• For large calculations, it is sometimes useful to be able to specify the array where the
result of the calculation will be stored. Rather than creating a temporary array, you can
use this to write computation results directly to the memory location where you’d
like them to be. For all ufuncs, you can do this using the out argument of the function:
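This also works with array views. For example, we can write the results of a computation to
every other element of a specified array by passing out=y[::2].
If we had instead written y[::2] = 2 ** x, this would have resulted in the creation of
a temporary array to hold the results of 2 ** x, followed by a second operation copying those
values into the y array.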
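A sketch of the out argument, first with a whole array and then with a view (the array names are illustrative):

```python
import numpy as np

x = np.arange(5)

y = np.empty(5)
np.multiply(x, 10, out=y)        # results written straight into y

z = np.zeros(10)
np.power(2, x, out=z[::2])       # results written into every other element of z

print(y)
print(z)
```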
Aggregates
• For binary ufuncs, there are some interesting aggregates that can be computed directly
from the object. For example, the reduce method of any ufunc reduces an array with that operation.
• A reduce method repeatedly applies a given operation to the elements of an array until
only a single result remains.
• For example, calling reduce on the add ufunc returns the sum of all elements in the
array:
Similarly, calling reduce on the multiply ufunc results in the product of all array elements:
Note that for these particular cases, there are dedicated NumPy functions to compute the results
(np.sum, np.prod, np.cumsum, np.cumprod)
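The aggregates above can be sketched on the numbers 1 to 5. The accumulate method, which keeps the intermediate results, is included because it corresponds to np.cumsum and np.cumprod.

```python
import numpy as np

x = np.arange(1, 6)

total = np.add.reduce(x)                 # sum of all elements
product = np.multiply.reduce(x)          # product of all elements
running_sum = np.add.accumulate(x)       # intermediate sums
running_prod = np.multiply.accumulate(x) # intermediate products

print(total, product)
print(running_sum, running_prod)
```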
Outer products
• Finally, any ufunc can compute the output of all pairs of two different inputs using the
outer method. This allows you, in one line, to do things like create a multiplication table:
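For instance, a 5 x 5 multiplication table:

```python
import numpy as np

x = np.arange(1, 6)
table = np.multiply.outer(x, x)   # table[i, j] = x[i] * x[j]

print(table)
```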
Broadcasting
Broadcasting in NumPy is a powerful mechanism that allows for the arithmetic operations on arrays of
different shapes and sizes, without explicitly creating additional copies of the data. It simplifies the
process of performing element-wise operations on arrays of different shapes, making code more
concise and efficient.
• Automatic Replication: When broadcasting, NumPy automatically replicates the smaller array along the
necessary dimensions to make it compatible with the larger array. This replication is done without actually
creating multiple copies of the data, which helps in saving memory.
Example:
Suppose you have a 2D array A of shape (3, 1) and a 1D array B of shape (3,). Broadcasting allows you to
add these arrays directly: NumPy treats B as having shape (1, 3), then stretches A along its second
dimension and B along its first, so both behave like (3, 3) arrays and the result has shape (3, 3).
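A sketch of this example with concrete (made-up) values:

```python
import numpy as np

A = np.array([[0], [10], [20]])   # shape (3, 1)
B = np.array([1, 2, 3])           # shape (3,)

C = A + B                         # both are broadcast to shape (3, 3)

print(C.shape)
print(C)
```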
Module 5
Visualization with Matplotlib and Seaborn
Matplotlib package
Matplotlib is a multiplatform data visualization library built on NumPy arrays. The matplotlib
package is the main graphing and plotting tool. The package is versatile and highly
configurable, supporting several graphing interfaces.
Matplotlib, together with NumPy and SciPy provides MATLAB-like graphing capabilities.
The benefits of using matplotlib in the context of data analysis and visualization are as follows:
• Integration with NumPy and SciPy (used for signal processing and numerical analysis) is
seamless.
• The package is highly customizable and configurable, catering to most people’s needs.
The package is quite extensive and allows embedding plots in a graphical user interface.
Other Advantages
• One of Matplotlib’s most important features is its ability to play well with many operating
systems and graphics backends. Matplotlib supports dozens of backends and output types,
which means you can count on it to work regardless of which operating system you are
using or which output format you wish. This cross-platform, everything-to-everyone
approach has been one of the great strengths of Matplotlib.
• It has led to a large userbase, which in turn has led to an active developer base and
Matplotlib’s powerful tools and ubiquity within the scientific Python world.
• The Pandas library itself provides wrappers around Matplotlib’s API. Even with wrappers
like these, it is still often useful to dive into Matplotlib’s syntax to adjust the final plot
output.
Plotting Graphs
This section details the building blocks of plotting graphs: the plot() function and how to
control it to generate the output we require.
The functionality of plot() is similar to that of MATLAB and GNU-Octave with some minor
differences, mostly due to the fact that Python has a different syntax from MATLAB and
GNU-Octave.
The vector y is passed as an input to plot(). As a result, plot() drew a graph of the vector y
using auto-incrementing integers for an x-axis. Which is to say that, if x-axis values are
not supplied, plot() will automatically generate one for you: plot(y) is equivalent to
plot(range(len(y)), y).
Note: If you don’t have a GUI installed with matplotlib, replace show() with
savefig('filename') and open the generated image file in an image viewer.
The call to function figure() generates a new figure to plot on, so we don’t overwrite the previous
figure.
• Let’s look at some more options. Next, we want to plot y as a function of t, but display only
markers, not lines. This is easily done:
To select a different marker, replace the character 'o' with another marker symbol.
Table below lists some popular choices; issuing help(plot) provides a full account of the
available markers.
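The plot() calls described above can be sketched as follows. The data, the Agg backend (so the script runs without a GUI), and the output file name are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # assumption: headless backend, so savefig() is used instead of show()
import matplotlib.pyplot as plt

y = [1, 4, 9, 16, 25]
t = [0.0, 0.5, 1.0, 1.5, 2.0]

plt.figure()
line, = plt.plot(y)              # no x values given: 0, 1, 2, ... are used

plt.figure()                     # new figure, so the previous one is not overwritten
markers, = plt.plot(t, y, 'o')   # 'o' draws circle markers only, no connecting line

out_path = os.path.join(tempfile.gettempdir(), "markers_only.png")
plt.savefig(out_path)
```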
Controlling Graph
For a graph to convey an idea clearly, the data is important, but it is not
everything. The grid and grid lines, combined with a proper selection of axis and labels,
present additional layers of information that add clarity and contribute to overall graph
presentation.
Now, let’s focus on controlling the figure by controlling the x-axis and y-axis behavior
and setting grid lines.
• Axis
• Grid and Ticks
• Subplots
• Erasing the Graph
Axis
The axis() function controls the behavior of the x-axis and y-axis ranges. If you do not supply a
parameter to axis(), the return value is a tuple in the form (xmin, xmax, ymin, ymax). You can
use axis() to set the new axis ranges by specifying new values: axis([xmin, xmax, ymin, ymax]).
If you’d like to set or retrieve only the x-axis values or y-axis values, do so by using the
functions xlim(xmin, xmax) or ylim(ymin, ymax), respectively.
The function axis() also accepts the following values: 'auto', 'equal', 'tight', 'scaled', and 'off'.
— The value 'auto'—the default behavior—allows plot() to select what it thinks are the best
values.
— The value 'equal' forces each x value to be the same length as each y value, which is
important if you’re trying to convey physical distances, such as in a GPS plot.
— The value 'tight' causes the axis to change so that the maximum and minimum values of
x and y both touch the edges of the graph.
— The value 'scaled' changes the x-axis and y-axis ranges so that x and y have both the
same length (i.e., aspect ratio of 1).
— Lastly, calling axis('off') removes the axis and labels.
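A minimal sketch of reading and setting the axis ranges, assuming a circle as the plotted data and a headless backend:

```python
import matplotlib
matplotlib.use("Agg")   # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

theta = np.linspace(0, 2 * np.pi, 100)

plt.figure()
plt.plot(np.cos(theta), np.sin(theta))

auto_limits = plt.axis()         # no argument: returns (xmin, xmax, ymin, ymax)
plt.axis([-2, 2, -1.5, 1.5])     # set new ranges
new_limits = plt.axis()

print(auto_limits)
print(new_limits)
```

Calling plt.axis('equal'), plt.axis('tight'), plt.axis('scaled'), or plt.axis('off') on the same figure applies the other behaviours listed above.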
Figure below shows the results of applying different axis values to this circle.
The function grid() draws a grid in the current figure. The grid is composed of a set of
horizontal and vertical dashed lines coinciding with the x ticks and y ticks. You can toggle
the grid by calling grid() or set it to be either visible or hidden by using grid(True) or
grid(False), respectively.
To control the ticks (and effectively change the grid lines, as well), use the functions xticks()
and yticks(). The functions behave similarly to axis() in that they return the current ticks if
no parameters are passed; you can also use these functions to set ticks once parameters
are provided. The functions take an array holding the tick values as numbers and an
optional tuple containing text labels. If the tuple of labels is not provided, the tick numbers
are used as labels.
Adding Text
There are several options to annotate your graph with text. You’ve already seen some, such as
using the xticks() and yticks() function.
The following functions will give you more control over text in a graph.
Title
The function title(str) sets str as the title of the graph, which appears above the plot area. The
function accepts the arguments listed in Table 6-5.
All alignments are based on the default location, which is centered above the graph. Thus,
setting ha='left' will print the title starting at the middle (horizontally) and extending to the
right. Similarly, setting ha='right' will print the title ending in the middle of the graph
(horizontally). The same applies for vertical alignment. Here’s an example of using the title()
function:
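A minimal sketch of such a call is given here; the data and the title string are made up for illustration, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 1, 2], [0, 1, 4])

# ha='left' anchors the left edge of the text at the default (centered) position,
# so the title starts at the middle of the graph and extends to the right.
title_obj = plt.title('Growth over time', ha='left', fontsize=14)
print(title_obj.get_text())
```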
The functions xlabel() and ylabel() are similar to title(), only they’re used to set the x-axis and y-axis
labels, respectively. Both functions accept the same text arguments.
>>> xlabel('time [seconds]')
Next on our list of text functions is legend(). The legend() function adds a legend box and
associates a plot with text:
The legend order associates the text with the plot. An alternative approach is to specify the
label argument with the plot() function call, and then issue a call to legend() with no
parameters:
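Both approaches can be sketched as below. The line data and the legend texts are illustrative only, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
linear = [v for v in x]
quadratic = [v ** 2 for v in x]

# Approach 1: pass the texts to legend(); their order follows the plot() calls.
plt.figure()
plt.plot(x, linear)
plt.plot(x, quadratic)
leg1 = plt.legend(['linear', 'quadratic'], loc='upper left')

# Approach 2: label each line in plot(), then call legend() with no parameters.
plt.figure()
plt.plot(x, linear, label='linear')
plt.plot(x, quadratic, label='quadratic')
leg2 = plt.legend()
```

The second approach keeps each text next to the plot() call it describes, so reordering the calls cannot mislabel a line.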
loc can take one of the following values: 'best', 'upper right', 'upper left', 'lower left', 'lower right',
'right', 'center left', 'center right', 'lower center', 'upper center', and 'center'. Instead of
strings, you can also use numbers: 'best' corresponds to 0, 'upper right' corresponds to 1, and 'center'
corresponds to 10. Using the value 'best' moves the legend to a spot less likely to hide data;
however, finding that spot can slow down rendering for plots with large amounts of data.
Text Rendering
The text(x, y, str) function accepts the coordinates x, y in graph (data) units and the string to print,
str, and renders the string at that position. You can modify the text alignment using the ha and va
arguments. The following will print text at location (0, 0):
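A minimal sketch of such a call follows; the plotted data, the string, and the alignment values are chosen only for illustration, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

plt.figure()
plt.plot([-1, 0, 1], [-1, 0, 1])

# Place the string at data coordinates (0, 0); ha and va control which part
# of the text box sits on that point.
txt = plt.text(0, 0, 'origin', ha='center', va='bottom')
print(txt.get_position())
```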
The function text() has many other arguments, such as rotation and fontsize.
Example:
The example script summarizes the functions we’ve discussed up to this point: plot() for
plotting; title(), xlabel(), ylabel(), and text() for text annotations; and xticks(), ylim(), and grid()
for grid control.
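Since the script itself is not reproduced here, the following is a small stand-in that uses the same functions. The sine data, the label strings, and the tick labels are assumptions made for illustration, as is the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

plt.figure()
plt.plot(x, np.sin(x))                                         # plotting
plt.title('Sine wave')                                         # text annotations
plt.xlabel('angle [radians]')
plt.ylabel('sin(angle)')
plt.text(np.pi, 0, 'zero crossing', rotation=45, fontsize=10)
plt.xticks([0, np.pi, 2 * np.pi], ['0', 'pi', '2 pi'])         # grid control
plt.ylim(-1.5, 1.5)
plt.grid(True)
```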
The object-oriented design of matplotlib includes two functions, getp() and setp(), that retrieve and
set a matplotlib object’s parameters, respectively. The benefit of using getp() and setp() is that automation is
easily achieved. Whenever plot() is called, matplotlib returns a list of the line objects (Line2D)
it created.
For example, you can use the getp() function to get the linestyle of a line object. You can use the
setp() function to set the linestyle of a line object.
Here is an example of how to use the getp() and setp() functions to get and set the linestyle of a
line object:
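The listing below is a minimal sketch of that idea; the x and y values are made up, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to see a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

lines = plt.plot(x, y)        # plot() returns a list of Line2D objects
line = lines[0]

plt.setp(line, linestyle='--')         # set the linestyle to dashed
style = plt.getp(line, 'linestyle')    # read the linestyle back
print(style)

plt.show()  # with the Agg backend this opens no window
```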
This code will create a line plot of the data in the x and y lists. The linestyle of the line object will
be set to dashed. The code will then print the linestyle to the console. Finally, the code will show
the plot.
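A second sketch uses methods of the Axes object for labels and limits. The data and the limit values are illustrative assumptions, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove to see a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

fig, ax = plt.subplots()
ax.plot(x, y)

ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('My Plot')

# Fix the limits explicitly, then read them back.
ax.set_xlim(0, 5)
ax.set_ylim(0, 40)
print(ax.get_xlim(), ax.get_ylim())

plt.show()  # with the Agg backend this opens no window
```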
This code will create a line plot of the data in the x and y lists. The x-axis label will be set to
"X-axis", the y-axis label will be set to "Y-axis", and the title of the plot will be set to "My Plot".
The code will then print the x-axis and y-axis limits to the console. Finally, the code will show
the plot.
You can use the Axes methods get_xlim() and get_ylim() to get the current x-axis and y-axis limits,
respectively. Likewise, the Axes methods set_xlim() and set_ylim() set the x-axis and y-axis
limits, respectively.
Patches
Drawing shapes requires some more care. matplotlib has objects that represent many
common shapes, referred to as patches. Some of these, like Rectangle and Circle are found
in matplotlib.pyplot, but the full set is located in matplotlib.patches.
To add a shape to a plot, create the patch object shp and add it to a subplot by calling
ax.add_patch(shp).
To work with patches, assign them to an already existing graph because, in a sense, patches
are “patched” on top of a figure. The table below gives a partial listing of available patches. In this
table, the notation xy indicates a list or tuple of (x, y) values.
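Adding patches to an existing plot can be sketched as follows. The shapes, positions, and colors are chosen only for illustration, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

fig, ax = plt.subplots()
ax.plot([0, 4], [0, 4])  # an existing graph to patch shapes onto

# For Circle, xy is the center; for Rectangle, xy is the lower-left corner.
circle = mpatches.Circle((1, 3), radius=0.5, color='orange')
rect = mpatches.Rectangle((2, 0.5), width=1.5, height=1, fill=False)

ax.add_patch(circle)
ax.add_patch(rect)
print(len(ax.patches))
```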
Import Libraries: Import the necessary libraries, including Seaborn and Matplotlib.
Load a Dataset: Seaborn comes with several built-in datasets for practice. You can load one using
the load_dataset function.
tips_data = sns.load_dataset("tips")
Customize Seaborn Styles: Seaborn comes with several built-in styles. You can set the style using
sns.set_style().
sns.set_style("whitegrid")
# Other styles include "darkgrid", "white", "dark", and "ticks"
Advanced Scatter Plots: Create a scatter plot with additional features like hue, size, and style.
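A minimal sketch of such a scatter plot follows. To keep it self-contained, it builds a tiny hypothetical table whose column names mimic the tips dataset; the values are made up. With the real dataset loaded, tips_data would be passed instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data (invented values, tips-like column names).
df = pd.DataFrame({
    'total_bill': [10.5, 23.0, 15.2, 30.1, 18.7, 25.4],
    'tip':        [1.5, 3.5, 2.0, 5.0, 3.0, 4.0],
    'time':       ['Lunch', 'Dinner', 'Lunch', 'Dinner', 'Lunch', 'Dinner'],
    'smoker':     ['No', 'Yes', 'No', 'No', 'Yes', 'No'],
    'size':       [2, 4, 2, 3, 2, 4],
})

# hue colors the points, size scales them, and style changes the marker.
ax = sns.scatterplot(data=df, x='total_bill', y='tip',
                     hue='time', size='size', style='smoker')
print(ax.get_xlabel(), ax.get_ylabel())
```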
Pair Plots for Multivariate Analysis: Visualize relationships between multiple variables with pair
plots.
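Heatmaps: Visualize the correlation between the numerical variables with a heatmap.
correlation_matrix = tips_data.corr(numeric_only=True)  # skip the categorical columns
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")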
Violin Plots: Visualize the distribution of a numerical variable for different categories.
FacetGrid for Customized Subplots: Use FacetGrid to create custom subplots based on
categorical variables.
Joint Plots: Combine scatter plots with histograms for bivariate analysis.
Question bank:
1. Explain how a simple line plot can be created using matplotlib. Show the adjustments
that can be made to the plot w.r.t. line colors.
The simplest of all plots is the visualization of a single function y = f(x). Here we will create
a simple line plot.
In Matplotlib, the figure (an instance of the class plt.Figure) can be thought of as a single
container that contains all the objects representing axes, graphics, text, and labels. The
axes (an instance of the class plt.Axes) is what we see above: a bounding box with ticks
and labels, which will eventually contain the plot elements that make up the visualization.
Alternatively, we can use the pyplot (plt) interface, which creates the figure and axes in the
background. Ex: plt.plot(x, np.sin(x))
To adjust the line color, pass the color keyword to plt.plot(). It accepts a string argument
representing virtually any imaginable color, and the
color can be specified in a variety of ways.
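A minimal sketch of a line plot with several color specifications follows. The sine curves are illustrative only, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig = plt.figure()   # the container for everything
ax = plt.axes()      # the bounding box with ticks and labels

ax.plot(x, np.sin(x - 0), color='blue')            # color by name
ax.plot(x, np.sin(x - 1), color='g')               # short color code
ax.plot(x, np.sin(x - 2), color='0.75')            # grayscale between 0 and 1
ax.plot(x, np.sin(x - 3), color='#FFDD44')         # hex code (RRGGBB)
ax.plot(x, np.sin(x - 4), color=(1.0, 0.2, 0.3))   # RGB tuple, values 0 to 1
```

If no color is given, Matplotlib cycles through a set of default colors for successive lines.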
MATLAB-style interface:
• Matplotlib was originally written as a Python alternative for MATLAB users, and much of its
syntax reflects that fact.
• The MATLAB-style tools are contained in the pyplot (plt) interface.
• The interface is stateful: it keeps track of the “current” figure and axes, where all plt
commands are applied. Once the second panel is created, going back and adding something
to the first is a bit complex.
Object-oriented interface:
• The object-oriented interface is available for these more complicated situations, and for
when we want more control over the figure.
• Rather than depending on some notion of an “active” figure or axes, in the object-oriented
interface the plotting functions are methods of explicit Figure and Axes objects.
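The difference between the two interfaces can be sketched with a two-panel figure. The sine and cosine data and the panel title are illustrative, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# MATLAB-style: every plt call acts on the "current" figure and axes.
plt.figure()
plt.subplot(2, 1, 1)        # (rows, columns, panel number)
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2)        # the second panel is now the current one
plt.plot(x, np.cos(x))

# Object-oriented: plotting functions are methods of explicit Axes objects.
fig, ax = plt.subplots(2)
ax[0].plot(x, np.sin(x))
ax[1].plot(x, np.cos(x))
ax[0].set_title('sine')     # going back to the first panel is direct
```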
3. Write the lines of code to create a simple histogram using the matplotlib library.
A simple histogram can be useful in understanding a dataset. The code below creates a
simple histogram.
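A minimal sketch, assuming normally distributed sample data generated with NumPy (the data, the seed, and the bin count are illustrative, as is the Agg backend):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)   # 1000 samples from a standard normal distribution

plt.figure()
# hist() returns the counts per bin, the bin edges, and the drawn bar patches.
counts, bin_edges, patches = plt.hist(data, bins=30)
print(len(bin_edges))
```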
4. What are the two ways to adjust the axis limits of a plot using Matplotlib? Explain with an example
for each.
Matplotlib does a decent job of choosing default axes limits for your plot, but sometimes
it’s nice to have finer control.
• using plt.axis()
The plt.axis() method allows you to set the x and y limits with a single call, by passing a
list that specifies [xmin, xmax, ymin, ymax].
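The sketch below sets the same limits in two ways. Only plt.axis() is named in the answer above; treating plt.xlim() and plt.ylim() as the other way is an assumption based on the functions described earlier in these notes. The data and the Agg backend are also illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Way 1 (assumed): set each axis separately with plt.xlim() and plt.ylim().
plt.figure()
plt.plot(x, np.sin(x))
plt.xlim(-1, 11)
plt.ylim(-1.5, 1.5)
limits_a = plt.axis()   # no arguments: returns (xmin, xmax, ymin, ymax)

# Way 2: set all four limits with a single plt.axis() call.
plt.figure()
plt.plot(x, np.sin(x))
plt.axis([-1, 11, -1.5, 1.5])
limits_b = plt.axis()

print(limits_a, limits_b)
```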
5. List the differences between the plot() and scatter() functions when drawing a scatter plot.
• The difference between the two functions is: with pyplot.plot() any property you apply
(color, shape, size of points) will be applied across all points whereas in pyplot.scatter() you
have more control in each point’s appearance. That is, in plt.scatter() you can have the color,
shape and size of each dot (datapoint) to vary based on another variable.
• While it doesn’t matter as much for small amounts of data, as datasets get larger than a
few thousand points, plt.plot can be noticeably more efficient than plt.scatter. The reason is
that plt.scatter has the capability to render a different size and/or color for each point, so
the renderer must do the extra work of constructing each point individually. In plt.plot, on
the other hand, the points are always essentially clones of each other, so the work of
determining the appearance of the points is done only once for the entire set of data.
• For large datasets, the difference between these two can lead to vastly different performance,
and for this reason, plt.plot should be preferred over plt.scatter for large datasets.
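The first difference can be sketched as follows. The random data, colors, and sizes are made up for illustration, and the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)
colors = rng.random(100)          # one color value per point
sizes = 1000 * rng.random(100)    # one marker size per point

fig, (ax1, ax2) = plt.subplots(1, 2)

# plot(): one marker style, applied identically to every point.
ax1.plot(x, y, 'o', color='black')

# scatter(): color and size can vary point by point.
points = ax2.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
```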
6. How can the default plot settings of Matplotlib be customized w.r.t. runtime configuration
and stylesheets? Explain with a suitable code snippet.
• Each time Matplotlib loads, it defines a runtime configuration (rc) containing the default
styles for every plot element we create.
• We can adjust this configuration at any time using the plt.rc convenience routine.
• To modify the rc parameters, we’ll start by saving a copy of the current rcParams
dictionary, so we can easily reset these changes in the current session:
IPython_default = plt.rcParams.copy()
• Now we can use the plt.rc function to change some of these settings:
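A minimal sketch of both mechanisms follows. The particular settings and the 'ggplot' stylesheet are illustrative choices. The sketch resets with plt.rcdefaults() rather than the saved copy, which restores Matplotlib's built-in defaults; the Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Runtime configuration: change defaults for all plots created afterwards.
plt.rc('lines', linewidth=2)
plt.rc('axes', facecolor='#E6E6E6', grid=True)
linewidth_set = plt.rcParams['lines.linewidth']
print(linewidth_set)

# Stylesheets: apply a named style only inside the with-block.
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

# Undo the rc changes by restoring Matplotlib's built-in defaults.
plt.rcdefaults()
```

Calling plt.style.use('ggplot') instead of the with-block would apply the stylesheet for the rest of the session.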
Seaborn vs Matplotlib
Let us assume
x=[10,20,30,45,60]
y=[0.5,0.2,0.5,0.3,0.5]
Matplotlib:
#to plot the graph
import matplotlib.pyplot as plt
plt.style.use('classic')
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left')
Seaborn:
#to plot the graph
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # apply the Seaborn default style to Matplotlib plots
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left')
8. List and describe the different categories of colormaps with suitable code snippets.
Three different categories of colormaps:
Sequential colormaps: These consist of one continuous sequence of colors (e.g., binary or
viridis).
Divergent colormaps: These usually contain two distinct colors, which show positive and
negative deviations from a mean (e.g., RdBu or PuOr).
Qualitative colormaps: These mix colors with no particular sequence (e.g., rainbow or jet).
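Applying colormaps can be sketched with imshow(). The image data is made up for illustration, RdBu and jet are taken from the examples above, and viridis is used as a sequential colormap, the category usually listed alongside these two. The Agg backend is an assumption so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Illustrative 2D data with both positive and negative values.
x = np.linspace(0, 10, 50)
I = np.sin(x) * np.cos(x[:, np.newaxis])

examples = {'sequential': 'viridis', 'divergent': 'RdBu', 'qualitative': 'jet'}

fig, axes = plt.subplots(1, 3)
images = {}
for ax, (category, name) in zip(axes, examples.items()):
    images[category] = ax.imshow(I, cmap=name)   # cmap selects the colormap
    ax.set_title(f'{category}: {name}')
```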
10. With a suitable example, describe how to draw histogram and KDE plots using
seaborn.
Often in statistical data visualization, all we want is to plot histograms and joint
distributions of variables.