
BCA SEM-5 PYTHON Unit-4

How to generate a virtual environment in Python?

OR

How to manage packages with pip (Python Package Index) in Python?

 Python applications will often use packages and modules that don’t come
as part of the standard library.
 Applications will sometimes need a specific version of a library.
 This means it may not be possible for one Python installation to meet the
requirements of every application.
 If application A needs version 1.0 of a particular module but application B
needs version 2.0, then the requirements are in conflict and installing
either version 1.0 or 2.0 will leave one application unable to run.
 The solution for this problem is to create a virtual environment, a self-
contained directory tree that contains a Python installation for a particular
version of Python, plus a number of additional packages.
 Different applications can then use different virtual environments.
 To resolve the earlier example of conflicting requirements, application A
can have its own virtual environment with version 1.0 installed while
application B has another virtual environment with version 2.0.
 If application B requires a library be upgraded to version 3.0, this will not
affect application A’s environment.
 To create a virtual environment, decide upon a directory where you want
to place it, and run the venv module as a script with the directory path:

python3 -m venv tutorial-env

 This will create the tutorial-env directory if it doesn’t exist, and also create
directories inside it containing a copy of the Python interpreter, the
standard library, and various supporting files.


 Once you’ve created a virtual environment, you may activate it.

tutorial-env\Scripts\activate
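
On macOS or Linux, the activation script lives in the environment's bin directory instead; assuming the same environment name as above:

source tutorial-env/bin/activate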

Managing Packages with pip

 You can install, upgrade, and remove packages using a program called
pip. By default pip will install packages from the Python Package Index.
 pip has a number of subcommands: “search”, “install”, “uninstall”,
“freeze”, etc.
 You can install the latest version of a package by specifying a
package’s name:

(tutorial-env) $ pip install novas

 You can also install a specific version of a package by giving the package name followed by == and the version number:

(tutorial-env) $ pip install requests==2.6.0

 pip list will display all of the packages installed in the virtual environment:

(tutorial-env) $ pip list


novas (3.1.1.3)
numpy (1.9.2)
pip (7.0.3)
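
 pip can also upgrade or remove packages and record the installed set. A few illustrative commands (standard pip subcommands; the package names simply reuse the examples above):

(tutorial-env) $ pip install --upgrade requests
(tutorial-env) $ pip uninstall novas
(tutorial-env) $ pip freeze > requirements.txt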

How to install the MySQL connector and verify the installation?

 To communicate with a MySQL database from Python, we need a program that acts as an interface to the MySQL database server.
 MySQLdb is an interface between the MySQL server and Python programs.

Before you can access MySQL databases using Python, you must install one (or more) of the following packages in Python.


 MySQL-python:

 This package contains the MySQLdb module, which is written in C. It is one of the most commonly used Python packages for MySQL.

 mysql-connector-python:

 This package contains the mysql.connector module, which is written entirely in Python.

 PyMySQL:

 This package contains the pymysql module, which is written entirely in Python. It is designed to be a drop-in replacement for the MySQL-python package.

 All three of these packages use Python's portable SQL database API.
 This means that if you switch from one module to another, you can reuse almost all of your existing code.

 Type the command for the package you want to install:

 To install the MySQL-python package, type the following command:

pip install MySQL-python

 To install the mysql-connector-python package, type the following command:

pip install mysql-connector-python

 To install the pymysql package, type the following command:

pip install pymysql


 Verifying the MySQLdb Interface Installation

 If the MySQLdb interface for Python has been installed successfully, we can see a new module named 'MySQLdb' added to the existing modules in the Python library.
 Go to the Python IDLE shell window and type the following at the Python prompt:
>>> help('modules')
 As a result, it will display all the available modules of Python.
 We should be able to locate the 'MySQLdb' module among them.
 This indicates that the MySQLdb interface for Python has been successfully installed.
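
 An alternative quick check is to import the driver directly. A minimal sketch, assuming the mysql-connector-python package from above is installed:

import mysql.connector
print(mysql.connector.__version__)

If the import succeeds and a version number is printed, the connector is installed correctly.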

Using MySQL from Python, explain select, insert and delete with examples.

 Python can be used in database applications.
 One of the most popular databases is MySQL.

 You need to know the following detail of the MySQL server to perform the
connection from Python.

 Username – i.e., the username that you use to work with the MySQL server. The default username for the MySQL database is root.

 Password – the password given by the user at the time of installing the MySQL database. If you are using root, you won't need the password.
 Host Name – the server name or IP address on which MySQL is running. If you are running on localhost, you can use localhost or its IP, i.e. 127.0.0.1.
 Database Name – the database to which you want to connect. Here we are using the database named 'Emp' because we have already created it for our example.

Methods which are used in database connectivity with MySQL:

1. connect(): This method is used for creating a connection to our database. It takes four arguments:
   1. Host (server) name
   2. Database user name
   3. Password
   4. Database name

2. cursor(): This method creates a cursor object that is capable of executing SQL queries on the database.

3. execute(): This method is used for executing an SQL query on the database. It takes an SQL query (as a string) as an argument.

4. fetchone(): This method retrieves the next row of a query result set and returns a single sequence, or None if no more rows are available.

5. close(): This method closes the database connection.

Creating a Table

 To create a table in MySQL, use the "CREATE TABLE" statement.
 Make sure you define the name of the database when you create the connection, as in the sketch below.
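
A minimal sketch using mysql-connector-python, assuming the 'Emp' database from this example already exists and using the same emp(eid, name, sal) columns as the examples that follow:

import mysql.connector as mydb

pydb = mydb.connect(host="localhost", user="root", passwd="", database="Emp")
pycursor = pydb.cursor()
pycursor.execute("create table emp(eid int primary key, name varchar(30), sal float)")
print("Table created successfully")
pydb.close()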


 Inserting Record

 To fill a table in MySQL, use the "INSERT INTO" statement.

import mysql.connector as mydb

pydb = mydb.connect(host="localhost", user="root", passwd="", database="Emp")
pycursor = pydb.cursor()
pycursor.execute("insert into emp(eid,name,sal) values(10,'dixita',2000)")
pydb.commit()
print(pycursor.rowcount, "Row Inserted Successfully")
pydb.close()

 Delete Record
 You can delete records from an existing table by using the "DELETE FROM" statement:

import mysql.connector as mydb

pydb = mydb.connect(host="localhost", user="root", passwd="", database="Emp")
pycursor = pydb.cursor()
pycursor.execute("delete from emp where eid=10")
pydb.commit()
print(pycursor.rowcount, "Row deleted Successfully")
pydb.close()

 Select From a Table

 To select from a table in MySQL, use the "SELECT" statement:

import mysql.connector as mydb

pydb = mydb.connect(host="localhost", user="root", passwd="", database="Emp")
pycursor = pydb.cursor()
pycursor.execute("SELECT * FROM emp")
res = pycursor.fetchall()
for x in res:
    print(x)
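
 The fetchone() method listed earlier can be used instead of fetchall() to read the result set one row at a time. A small sketch, reusing the connection and cursor from the example above:

pycursor.execute("SELECT * FROM emp")
row = pycursor.fetchone()
while row is not None:
    print(row)
    row = pycursor.fetchone()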

 Data Analysis With Pandas

The process of installing and configuring Pandas, including setting up a virtual environment,
involves several steps:
1. Install PIP (if not already present):
PIP is typically included with Python installations from version 3.4 onwards. To verify if PIP
is installed and to check its version, open your terminal or command prompt and execute:
pip --version
If PIP is not found, you may need to reinstall Python or follow platform-specific instructions
to install PIP.
2. Create a Virtual Environment:
Navigate to your desired project directory in the terminal or command prompt and create a
virtual environment using the venv module:
python -m venv my_pandas_env

Replace my_pandas_env with your preferred name for the virtual environment.
3. Activate the Virtual Environment:
On Windows:
my_pandas_env\Scripts\activate
On macOS/Linux:
source my_pandas_env/bin/activate

Once activated, the name of your virtual environment will typically appear in your terminal
prompt, indicating that you are working within it.
4. Install Pandas 2.0:
With the virtual environment active, use PIP to install Pandas. To specifically install version
2.0, execute:
pip install pandas==2.0.0
(Note: Replace 2.0.0 with the specific patch version if required, e.g., 2.0.3.)
5. Verify Installation:
To confirm that Pandas has been installed correctly within your virtual environment, open a
Python interpreter or create a Python script and execute the following:

import pandas as pd
print(pd.__version__)

This will print the installed version of Pandas, confirming a successful installation.
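
When you are done working in the environment, you can leave it with the deactivate command that activation provides:

deactivate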

 Data Structures in Pandas

Pandas is an open-source Python library used for working with relational or labeled data
in an easy and intuitive way. It provides powerful data structures and a wide range of
operations for manipulating numerical data and time series. Pandas also offers tools for
cleaning, processing and analyzing data efficiently. It is one of the most popular libraries
for data analysis in Python and primarily supports two core data structures:
 Series
 DataFrame
Series
A Series is a one-dimensional array-like object that can store any data type such as
integers, strings, floats, or even Python objects. It comes with labels (called an index).
Syntax
pandas.Series(data=None, index=None, dtype=None, name=None, copy=False)
Parameters:
 data: Array-like, dict or scalar – Input data.
 index (Optional): Labels for the axis.
 dtype (Optional): Data type of the Series.
 name (Optional): Name of the Series.
 copy (Bool): Copy data if True.
Returns: A pandas.Series object containing the provided data with an associated index.
Example 1: Series holding the char data type.

import pandas as pd

a = ['g', 'e', 'e', 'k', 's']
res = pd.Series(a)
print(res)

Output:

0    g
1    e
2    e
3    k
4    s
dtype: object

Explanation: We pass the list a into pd.Series(a), which converts it into a Series (a column-like structure) where each item gets a default index starting from 0, automatically assigned by Pandas.
Example 2: Series holding the Int data type.

import pandas as pd

a = [1, 2, 3, 4, 5]
res = pd.Series(a)
print(res)

Output:

0    1
1    2
2    3
3    4
4    5
dtype: int64

Explanation: We pass the list a into pd.Series(a), which converts it into a Series (a column-like structure) where each number gets a default index starting from 0, automatically assigned by Pandas.
Example 3: Series holding the dictionary.

import pandas as pd

a = {'Id': 1013, 'Name': 'Mohe', 'State': 'Manipur', 'Age': 24}
res = pd.Series(a)
print(res)

Output:

Id          1013
Name        Mohe
State    Manipur
Age           24
dtype: object

 Dataframe

A DataFrame is a two-dimensional, size-mutable and heterogeneous tabular data structure with labeled rows and columns, similar to a spreadsheet or SQL table. Each column in a DataFrame is a Pandas Series, allowing you to work with multiple types of data in one table.
Syntax:
pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Parameters:
 data: Various forms of input data (e.g., lists, dict, ndarray, Series, another DataFrame).
 index( Optional): labels for rows.
 columns(Optional): labels for columns.
 dtype(Optional): Optional data type for all columns.
 copy(Optional): Boolean; whether to copy data or not.
Returns: A pandas.DataFrame object representing a 2D labeled data structure.
Example 1: Creating a DataFrame from a list

import pandas as pd

a = ['Python', 'Pandas', 'Numpy']
df = pd.DataFrame(a, columns=['Tech'])
print(df)

Output:

     Tech
0  Python
1  Pandas
2   Numpy

Explanation: We pass the list a into pd.DataFrame(a, columns=['Tech']), which converts it into a DataFrame with a single column named 'Tech'. Each item becomes a row and Pandas automatically assigns a default integer index starting from 0.
Example 2: Creating a DataFrame from a dictionary

import pandas as pd

a = {
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]
}
res = pd.DataFrame(a)
print(res)

Output:

    Name  Age
0    Tom   20
1   Nick   21
2  Krish   19
3   Jack   18

Explanation: We pass the dictionary a into pd.DataFrame(a), which converts it into a DataFrame where the dictionary keys become column names and the values (lists) become the column data. Pandas assigns a default integer index starting from 0 for the rows.
Example 3: Selecting columns and rows in a DataFrame

import pandas as pd

a = {
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
    'Qualification': ['Msc', 'MA', 'MCA', 'Phd']
}
df = pd.DataFrame(a)
print(df[['Name', 'Qualification']])

Output:

     Name Qualification
0     Jai           Msc
1  Princi            MA
2  Gaurav           MCA
3    Anuj           Phd

Explanation: We create a DataFrame df from the dictionary a, then select and print only
the columns 'Name' and 'Qualification' by passing their names in a list to df[]. This returns
a new DataFrame with just those two columns.
Accessing columns and rows in a DataFrame
A DataFrame in Pandas is a 2D tabular structure where you can easily access and
manipulate data by selecting specific columns or rows. You can extract one or more
columns using column names and filter rows using labels or conditions.
Example 1: We can access one or more columns in a DataFrame using square brackets.

import pandas as pd

a = {
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
    'Age': [27, 24, 22, 32],
    'City': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj']
}
df = pd.DataFrame(a)

print(df['Name'])            # single column
print(df[['Name', 'City']])  # multiple columns

Output:

0       Jai
1    Princi
2    Gaurav
3      Anuj
Name: Name, dtype: object

     Name       City
0     Jai      Delhi
1  Princi     Kanpur
2  Gaurav  Allahabad
3    Anuj    Kannauj

Explanation:
 df['Name'] returns a Series containing values from the 'Name' column.
 df[['Name', 'City']] returns a new DataFrame containing only the specified columns.
Example 2: We can use .loc[] to access rows by index or filter them using conditions.

import pandas as pd

a = {
    'Name': ['Mohe', 'Shyni', 'Parul', 'Sam'],
    'ID': [12, 43, 54, 32],
    'City': ['Delhi', 'Kochi', 'Pune', 'Patna']
}
df = pd.DataFrame(a)
res = df.loc[df['Name'] == 'Mohe']
print(res)

Output:

   Name  ID   City
0  Mohe  12  Delhi

Explanation: df.loc[df['Name'] == 'Mohe'] filters and returns only the row(s) where the 'Name' column has the value 'Mohe'.

 Creating Data Frame with different sources

Creating a Pandas DataFrame in Python can be achieved from various data sources.
1. From an Excel Spreadsheet:
To create a DataFrame from an Excel file, the read_excel() function from the Pandas library
is used.

import pandas as pd

# Assuming 'data.xlsx' is your Excel file and 'Sheet1' is the sheet name
df_excel = pd.read_excel('data.xlsx', sheet_name='Sheet1')
print(df_excel)
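
Note: read_excel() depends on an external engine; for .xlsx files Pandas uses the openpyxl package, so it may need to be installed first:

pip install openpyxl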

2. From .csv Files:
To create a DataFrame from a CSV file, the read_csv() function from the Pandas library is used.

import pandas as pd

# Assuming 'data.csv' is your CSV file
df_csv = pd.read_csv('data.csv')
print(df_csv)

3. From a Python Dictionary:
A DataFrame can be created from a dictionary where keys represent column names and values are lists of data for each column.

import pandas as pd

data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris']
}
df_dict = pd.DataFrame(data_dict)
print(df_dict)

4. From a List of Lists or Tuples:
A DataFrame can be created from a list of lists or a list of tuples, where each inner list or tuple represents a row. Column names can be specified separately.

import pandas as pd

data_list = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'London'],
    ['Charlie', 35, 'Paris']
]
df_list = pd.DataFrame(data_list, columns=['Name', 'Age', 'City'])
print(df_list)

data_tuples = [
    ('David', 40, 'Tokyo'),
    ('Eve', 28, 'Berlin')
]
df_tuples = pd.DataFrame(data_tuples, columns=['Name', 'Age', 'City'])
print(df_tuples)

 Operations on Data frames

Once we create a data frame, we can do various operations on it. These operations
help us in analyzing the data or manipulating the data. The reader is advised to refer
to the list of all operations available in pandas at the following link:

https://pandas.pydata.org/docs/reference/

First we will create a data frame from a .csv file using read_csv() function as shown
below. This data frame will be the basis for our operations.

>>> df = pd.read_csv("f:\\python\\PANDAS\\empdata.csv")
>>> df
OUTPUT:
   empid           ename       sal         doj
0   1001      Ganesh Rao  10000.00  10-10-2000
1   1002      Anil Kumar  23000.50    3-3-2002
2   1003    Gaurav Gupta  18000.33    3-3-2002
3   1004    Hema Chandra  16500.50    3-3-2002
4   1005  Laxmi Prasanna  12000.75   10-8-2000
5   1006       Anant Nag   9999.99    9-9-1999

 Knowing number of rows and columns

 Using the .shape attribute:

 The shape attribute of a DataFrame returns a tuple representing its dimensions, where the first element is the number of rows and the second is the number of columns.
import pandas as pd

# Create a sample DataFrame
data = {'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]}
df = pd.DataFrame(data)

# Get the number of rows and columns
num_rows, num_cols = df.shape

print(f"Number of rows: {num_rows}")
print(f"Number of columns: {num_cols}")

 Retrieving Rows from Data Frame

In Python, when working with Pandas DataFrames, the head() method is used to
retrieve the first n rows of the DataFrame. By default, if no argument is provided, head()
returns the first 5 rows.

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'Age': [25, 30, 35, 28, 42, 50, 22],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami', 'Boston', 'Seattle']}
df = pd.DataFrame(data)

# Retrieve the first 5 rows (default)
first_five_rows = df.head()
print("First 5 rows:\n", first_five_rows)

# Retrieve the first 3 rows
first_three_rows = df.head(3)
print("\nFirst 3 rows:\n", first_three_rows)

 Call tail() on your DataFrame without any arguments to get the last five rows.

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'Age': [25, 30, 35, 28, 42, 50, 22],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami', 'Boston', 'Seattle']}
df = pd.DataFrame(data)

last_five_rows = df.tail()
print(last_five_rows)

last_three_rows = df.tail(3)
print(last_three_rows)

 Retrieving a Range of Rows

We can treat the data frame as an object and retrieve rows from it using slicing. For example, df[1:4] returns the rows at positions 1 through 3 (position 4 is excluded).
import pandas as pd

data = {'col1': [1, 2, 3, 4, 5, 6],
        'col2': ['A', 'B', 'C', 'D', 'E', 'F']}
df = pd.DataFrame(data)

# Select rows from index 1 (inclusive) to 4 (exclusive)
selected_rows = df[1:4]
print(selected_rows)

 To Retrieve Column Data


To get the column data, we can mention the column name as a subscript. For example, df.empid will display all employee id numbers. This can also be done using df['empid'].

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace'],
        'Age': [25, 30, 35, 28, 42, 50, 22],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami', 'Boston', 'Seattle']}
df = pd.DataFrame(data)
print(df)
print(df['Age'])

 Retrieving Data from Multiple Columns

To retrieve data from multiple columns, we can provide the list of column names as a subscript to the data frame object as df[[list of column names]]. For example, to display the employee ids and their names, we can write:
>>> df[['empid', 'ename']]

empid ename
0 1001 Ganesh Rao
1 1002 Anil Kumar
2 1003 Gaurav Gupta
3 1004 Hema Chandra
4 1005 Laxmi Prasanna
5 1006 Anant Nag

 Finding Maximum and Minimum Values


It is possible to find the highest value using the max() method and the least value using the min() method. These methods are applied to columns containing numerical data. For example, to know the highest salary and the least salary, we can write:
>>> df['sal'].max()
23000.5
>>> df['sal'].min()
9999.9899999999998

 Displaying Statistical Information


The describe() method displays very important information like the number of values, average, standard deviation, minimum, maximum and the 25%, 50% and 75% percentiles. This information is highly useful for statistical analysis.

>>> [Link]()
empid sal
count 6.000000 6.000000
mean 1003.500000 14917.011667
std 1.870829 5181.037711
min 1001.000000 9999.990000
25% 1002.250000 10500.187500
50% 1003.500000 14250.625000
75% 1004.750000 17625.372500
max 1006.000000 23000.500000

Performing Queries on Data


We can retrieve rows based on a query. The query should be given as a subscript in the data frame object. For example, to retrieve all the rows where salary is greater than Rs. 10000, we can write:
>>> df[df.sal > 10000]
OUTPUT:
   empid           ename       sal        doj
1   1002      Anil Kumar  23000.50   3-3-2002
2   1003    Gaurav Gupta  18000.33   3-3-2002
3   1004    Hema Chandra  16500.50   3-3-2002
4   1005  Laxmi Prasanna  12000.75  10-8-2000

To retrieve the row where salary is maximum, we can write:

>>> df[df.sal == df.sal.max()]

OUTPUT:

   empid       ename      sal       doj
1   1002  Anil Kumar  23000.5  3-3-2002

Suppose we want to show data from some columns based on a query; we can mention the list of columns and then the query as: df[[column names]][query]. For example, to display only id numbers and names where the salary is greater than Rs. 10000, we can write:
>>> df[['empid', 'ename']][df.sal > 10000]

OUTPUT:
empid ename
1 1002 Anil Kumar
2 1003 Gaurav Gupta
3 1004 Hema Chandra
4 1005 Laxmi Prasanna

 Knowing the index range


The first column is called index column and it is generated in the data frame
automatically. We can retrieve the index information using index attribute as:
>>> [Link]
RangeIndex(start=0, stop=6, step=1)

 Setting a column as index

We know the index column is automatically generated. If we do not want this column and
we want to set a column from our data as index column, that is possible using set_index()
method. The column with unique values can be set as index column. For example, to
make ‘empid’ column as index column for our data frame, we can write:
>>> df1 = df.set_index('empid')
The above statement creates another data frame ‘df1’ that uses ‘empid’ as index column.
We can verify this by displaying df1 as:
>>> df1
OUTPUT:

                ename       sal         doj
empid
1001       Ganesh Rao  10000.00  10-10-2000
1002       Anil Kumar  23000.50    3-3-2002
1003     Gaurav Gupta  18000.33    3-3-2002
1004     Hema Chandra  16500.50    3-3-2002
1005   Laxmi Prasanna  12000.75   10-8-2000
1006        Anant Nag   9999.99    9-9-1999
We can find the empid being used as index column in the new data frame ‘df1’ as above.
However, the original data frame ‘df’ in this case is not modified and it still uses
automatically generated index column. If we want to modify the original ‘df’ and set
empid as index column, we should add ‘inplace=True’ as shown below:
>>> df.set_index('empid', inplace=True)
>>> df
OUTPUT:

                ename       sal         doj
empid
1001       Ganesh Rao  10000.00  10-10-2000
1002       Anil Kumar  23000.50    3-3-2002
1003     Gaurav Gupta  18000.33    3-3-2002
1004     Hema Chandra  16500.50    3-3-2002
1005   Laxmi Prasanna  12000.75   10-8-2000
1006        Anant Nag   9999.99    9-9-1999
Once we set 'empid' as index, it is possible to locate the data of any employee by passing the employee id number to the loc attribute as:
>>> df.loc[1004]
OUTPUT:
ename    Hema Chandra
sal           16500.5
doj          3-3-2002
Name: 1004, dtype: object

Resetting the Index


To reset the index value from ‘empid’ back to auto-generated index number, we can use
reset_index() method with inplace=True option as:
>>> df.reset_index(inplace=True)
>>> df
OUTPUT:
empid ename sal doj
0 1001 Ganesh Rao 10000.00 10-10-2000
1 1002 Anil Kumar 23000.50 3-3-2002
2 1003 Gaurav Gupta 18000.33 3-3-2002
3 1004 Hema Chandra 16500.50 3-3-2002
4 1005 Laxmi Prasanna 12000.75 10-8-2000
5 1006 Anant Nag 9999.99 9-9-1999

 Sorting the data

To sort the data coming from a .csv file, first let us read the data from the file into a data
frame using read_csv() function as:
>>> df = pd.read_csv("f:\\python\\PANDAS\\empdata.csv", parse_dates=['doj'])
Here, we are loading the data from the empdata.csv file and telling pandas to treat 'doj' as a date-type field using the parse_dates option. Now let us display the data frame as:
>>> print(df)
OUTPUT:
empid ename sal doj
0 1001 Ganesh Rao 10000.00 2000-10-10
1 1002 Anil Kumar 23000.50 2002-03-03
2 1003 Gaurav Gupta 18000.33 2002-03-03
3 1004 Hema Chandra 16500.50 2002-03-03
4 1005 Laxmi Prasanna 12000.75 2000-08-10
5 1006 Anant Nag 9999.99 1999-09-09
To sort the rows on ‘doj’ column into ascending order, we can use sort_values() method
as:
>>> df1 = df.sort_values('doj')
>>> df1
OUTPUT:

empid ename sal doj


5 1006 Anant Nag 9999.99 1999-09-09
4 1005 Laxmi Prasanna 12000.75 2000-08-10
0 1001 Ganesh Rao 10000.00 2000-10-10
1 1002 Anil Kumar 23000.50 2002-03-03

2 1003 Gaurav Gupta 18000.33 2002-03-03
3 1004 Hema Chandra 16500.50 2002-03-03

To sort in descending order, we should use an additional option ‘ascending = False’ as:
>>> df1 = df.sort_values('doj', ascending=False)
Sorting on multiple columns is also possible using the 'by' option in the sort_values() method. For example, we may want to sort on 'doj' in descending order and, within that, on 'sal' in ascending order. That means when two employees have the same 'doj', their salaries will be sorted into ascending order.
>>> df1 = df.sort_values(by=['doj', 'sal'], ascending=[False, True])
>>> df1
OUTPUT:
empid ename sal doj
3 1004 Hema Chandra 16500.50 2002-03-03
2 1003 Gaurav Gupta 18000.33 2002-03-03
1 1002 Anil Kumar 23000.50 2002-03-03
0 1001 Ganesh Rao 10000.00 2000-10-10
4 1005 Laxmi Prasanna 12000.75 2000-08-10
5 1006 Anant Nag 9999.99 1999-09-09

 Handling missing data

In many cases, the data that we receive from various sources may not be perfect. That means there may be some missing data. For example, the empdata1.csv file used below contains data where the employee name is missing in one row and the salary and date of joining are missing in another row.
When we convert the data into a data frame, the missing data is represented by NaN (Not a
Number). NaN is a default marker for the missing value. Please observe the following
data frame:
>>> import pandas as pd
>>> df = pd.read_csv("f:\\python\\PANDAS\\empdata1.csv")
>>> df

OUTPUT:
   empid           ename       sal       doj
0   1001      Ganesh Rao  10000.00  10-10-00
1   1002      Anil Kumar  23000.50  03-03-02
2   1003             NaN  18000.33  03-03-02
3   1004    Hema Chandra       NaN       NaN
4   1005  Laxmi Prasanna  12000.75  10-08-00
5   1006       Anant Nag   9999.99  09-09-99

We can use fillna() method to replace the Na or NaN values by a specified value. For
example, to fill the NaN values by 0, we can use:
>>> df1 = df.fillna(0)
>>> df1

OUTPUT:
empid ename sal doj
0 1001 Ganesh Rao 10000.00 10-10-00
1 1002 Anil Kumar 23000.50 03-03-02
2 1003 0 18000.33 03-03-02
3 1004 Hema Chandra 0.00 0
4 1005 Laxmi Prasanna 12000.75 10-08-00
5 1006 Anant Nag 9999.99 09-09-99
But this is not so useful as it is filling any type of column with zero. We can fill each
column with a different value by passing the column names and the value to be used to
fill in the column. For example, to fill ‘ename’ column with ‘Name missing’, ‘sal’ with 0.0
and ‘doj’ with ’00-00-00’, we should supply these values as a dictionary to fillna() method
as shown below:
>>> df1 = df.fillna({'ename': 'Name missing', 'sal': 0.0, 'doj': '00-00-00'})
>>> df1
OUTPUT:
empid ename sal doj
0 1001 Ganesh Rao 10000.00 10-10-00
1 1002 Anil Kumar 23000.50 03-03-02
2 1003 Name missing 18000.33 03-03-02
3 1004 Hema Chandra 0.00 00-00-00
4 1005 Laxmi Prasanna 12000.75 10-08-00
5 1006 Anant Nag 9999.99 09-09-99
If we do not want the missing data and want to remove the rows having Na or NaN values, then we can use the dropna() method as:
>>> df1 = df.dropna()
>>> df1
OUTPUT:
empid ename sal doj
0 1001 Ganesh Rao 10000.00 10-10-00
1 1002 Anil Kumar 23000.50 03-03-02
4 1005 Laxmi Prasanna 12000.75 10-08-00
5 1006 Anant Nag 9999.99 09-09-99
In this way, filling the necessary data or eliminating the missing data is called ‘data
cleansing’.
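
Before choosing between fillna() and dropna(), it is often useful to count the missing values per column. The isnull() method combined with sum() does this; the counts below follow from the data frame shown above:

>>> df.isnull().sum()
empid    0
ename    1
sal      1
doj      1
dtype: int64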

 Merging and Joining DataFrames:

 Merging DataFrames using merge() Function


The merge() function in Python, part of the Pandas library, is used to combine two
DataFrames based on common columns or indices, similar to SQL JOIN operations.
Syntax:
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Key Parameters:
left: The left DataFrame to merge.
right: The right DataFrame to merge.

how: Specifies the type of merge (join):
'inner' (default): Returns only the rows where the merge key(s) are present in both DataFrames.
'left': Returns all rows from the left DataFrame, and matching rows from the right DataFrame. If no match, NaN is filled.
'right': Returns all rows from the right DataFrame, and matching rows from the left DataFrame. If no match, NaN is filled.
'outer': Returns all rows when there is a match in either the left or right DataFrame. If no match, NaN is filled.
on: Column or list of column names to join on. These must be present in both
DataFrames.
left_on: Column or list of column names from the left DataFrame to use as join keys.
right_on: Column or list of column names from the right DataFrame to use as join keys.
left_index: If True, use the index of the left DataFrame as its join key(s).
right_index: If True, use the index of the right DataFrame as its join key(s).
suffixes: A tuple of string suffixes to apply to overlapping column names (not join keys)
from the left and right DataFrames, respectively. Default is ('_x', '_y').

import pandas as pd

# Create two sample DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# Perform an inner merge on the 'key' column
merged_df_inner = pd.merge(df1, df2, on='key', how='inner')
print("Inner Merge:\n", merged_df_inner)

# Perform a left merge
merged_df_left = pd.merge(df1, df2, on='key', how='left')
print("\nLeft Merge:\n", merged_df_left)

 Joining DataFrames using join() Method


The join() function in Pandas is used to combine columns of two or more DataFrames
based on their indices or a specified column. It is particularly useful when you need
to merge DataFrames where one or both have meaningful indices that serve as the
key for joining.

import pandas as pd

# Create sample DataFrames
df1 = pd.DataFrame({'value1': [10, 20, 30]}, index=['A', 'B', 'C'])
df2 = pd.DataFrame({'value2': [100, 200]}, index=['B', 'C'])

# Perform a left join (default behavior)
joined_df = df1.join(df2)
print("Left Join (default):")
print(joined_df)

# Perform an inner join
inner_joined_df = df1.join(df2, how='inner')
print("\nInner Join:")
print(inner_joined_df)

# Perform an outer join
outer_joined_df = df1.join(df2, how='outer')
print("\nOuter Join:")
print(outer_joined_df)

# Perform a right join
right_joined_df = df1.join(df2, how='right')
print("\nRight Join:")
print(right_joined_df)

# Joining on a column instead of index (requires setting index on one DataFrame)
df3 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'data1': [1, 2, 3]})
df4 = pd.DataFrame({'key': ['Y', 'Z', 'W'], 'data2': [10, 20, 30]})

# Set 'key' as index for df4 to join on it
df4_indexed = df4.set_index('key')

# Join df3 with df4_indexed on the 'key' column of df3 and the index of df4_indexed
joined_on_column = df3.join(df4_indexed, on='key')
print("\nJoin on a specific column:")
print(joined_on_column)

 Grouping and Aggregating Data

Grouping and aggregating data in Pandas is a fundamental process for data analysis
and involves the "split-apply-combine" strategy. This strategy allows for the summarization
and transformation of data based on specific criteria.
1. Grouping Data:
The primary method for grouping data in Pandas is the groupby() function. This
function splits a DataFrame into groups based on the unique values within one or more
specified columns.
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'C', 'B'],
        'Value': [10, 15, 20, 25, 30]}
df = pd.DataFrame(data)

# Grouping by a single column
grouped_by_category = df.groupby('Category')

# Grouping by multiple columns (e.g., if you had another column 'Subcategory')
# grouped_by_multiple = df.groupby(['Category', 'Subcategory'])

The result of groupby() is a DataFrameGroupBy object, which is an intermediate object that holds the grouped data but does not yet display the aggregated results.

2. Aggregating Data:
After grouping, you can apply aggregation functions to each group to summarize the
data. Common aggregation functions include:
.sum(): Calculates the sum of values within each group.
.mean(): Calculates the average of values within each group.
.count(): Counts the number of non-null values within each group.
.min(): Finds the minimum value within each group.
.max(): Finds the maximum value within each group.
.median(): Calculates the median value within each group.
.std(): Calculates the standard deviation within each group.
.agg(): Allows applying multiple aggregation functions simultaneously or custom
functions.

import pandas as pd

data = {'Category': ['A', 'B', 'A', 'C', 'B'],
        'Value': [10, 15, 20, 25, 30]}
df = pd.DataFrame(data)

# Grouping by a single column
grouped_by_category = df.groupby('Category')

# Applying an aggregation function to the grouped object
sum_by_category = grouped_by_category['Value'].sum()
print("Sum by Category:\n", sum_by_category)

# Applying multiple aggregations using .agg()
multi_agg_by_category = grouped_by_category['Value'].agg(['sum', 'mean', 'count'])
print("\nMultiple Aggregations by Category:\n", multi_agg_by_category)

# Applying different aggregations to different columns (if applicable)
# df.groupby('Category').agg({'Value1': 'sum', 'Value2': 'mean'})
