
Data Science Lab-Manual Upto 3-Excercise

Uploaded by

Gayathri meena
Copyright © All Rights Reserved

Ex. No. : 01
Date :
Python Packages Installation
Aim
To download, install, and explore the features of the NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages.

What is a Python library?

A Python library is a collection of related modules. It bundles code that can be reused across different programs, which makes Python programming simpler and more convenient. We use libraries so that we do not have to rewrite code that is already available: a program imports the library and calls the functions it needs for a specific operation.
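As a small illustration of reusing library code, the sketch below imports the statistics module from the Python standard library instead of writing the calculations by hand. The marks list is an illustrative assumption, not data from this exercise.

```python
import statistics  # part of the Python standard library

marks = [80, 90, 85, 95, 92]  # illustrative values

# The library already implements these operations, so we only call them.
mean_mark = statistics.mean(marks)
median_mark = statistics.median(marks)

print("Mean:", mean_mark)
print("Median:", median_mark)
```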

What is NumPy?
NumPy is a general-purpose array-processing package that provides tools for handling n-dimensional arrays. It offers comprehensive mathematical functions and linear algebra routines, and it combines the flexibility of Python with the speed of well-optimized compiled C code. Its easy-to-use syntax makes it accessible and productive for programmers from any background.
Python is an open-source, object-oriented, interpreted language. One of the features that makes it a strong programming language is its package ecosystem: many external packages are available that you can install and use depending on your requirements. A Python package is a directory of Python modules. Each module is a script that can define functions, classes, or new types for a particular purpose. NumPy is one such package, created to ease array computation in Python.
This exercise explains how to download and install the NumPy package and how to use it in a Python environment on Mac, Windows, Ubuntu, and Fedora. The basics of the Python programming language are not covered here; beginners can find them in the Edureka blog.
Packages are typically installed using pip, the Package Installer for Python. You can view the details of published packages and download them from the Python Package Index (PyPI). pip is installed automatically when you install Python from python.org, and most integrated Python distributions include it as well. pip is the simplest way to download packages directly from PyPI from your command line.

NumPy Installation on Windows Operating System


Python is not installed by default on the Windows operating system. You can download the required version of Python from python.org. Once Python is installed successfully, open the Command Prompt and use pip to install NumPy.
Pre-requisites:
The only things you need to install NumPy on Windows are:
• Python
• pip or conda (depending on user preference)

Installing Numpy on Windows:


Installing NumPy
You can follow the steps outlined below and use the commands on most Linux, Mac, or Windows
systems. Any irregularities in commands are noted along with instructions on how to modify them to your
needs.
Step 1: Check Python Version
Before you can install NumPy, you need to know which Python version you have. This programming
language comes preinstalled on most operating systems (except Windows; you will need to install Python
on Windows manually).
Most likely, you have Python 2 or Python 3 installed, or even both versions. To check which version the python command runs, use the command below; on systems that provide both versions, python3 -V reports the Python 3 version:

$ python -V
The output should give you a version number.
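The same check can be made from inside Python. The sketch below reads the interpreter version from sys.version_info; the minimum version it compares against is an illustrative assumption, not a requirement stated by NumPy.

```python
import sys

# sys.version_info is a named tuple: (major, minor, micro, releaselevel, serial)
major, minor = sys.version_info.major, sys.version_info.minor
print(f"Running Python {major}.{minor}")

# The minimum version here is an illustrative assumption, not a NumPy requirement.
MIN_VERSION = (3, 0)
is_supported = sys.version_info[:2] >= MIN_VERSION
print("Meets minimum version:", is_supported)
```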

Step 2: Install Pip


The easiest way to install NumPy is by using Pip. Pip is a package manager for installing and managing Python software packages.
On many Linux distributions, Pip is not preinstalled alongside the system Python. Therefore, you need to set up the package manager that corresponds to the version of Python you have. If you have both versions of Python, install both Pip versions as well.
The commands below use the apt utility, since this walkthrough installs on Ubuntu.
Install Pip (for Python 2) by running:

$ sudo apt install python-pip

For Python 3, the corresponding Ubuntu package is python3-pip:

$ sudo apt install python3-pip

Step 3: Install NumPy


With Pip set up, you can use its command line for installing NumPy.
Install NumPy with Python 2 by typing:

$ pip install numpy


Pip downloads the NumPy package and notifies you it has been successfully installed.

Step 4: Verify NumPy Installation


Use the show command to verify whether NumPy is now part of your Python packages:

$ pip show numpy


Upgrading NumPy
If you already have NumPy and want to upgrade to the latest version, for Pip2 use the command:

$ pip install --upgrade numpy
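Besides pip show, an installation can also be checked from Python itself. The sketch below uses importlib.util.find_spec from the standard library; the helper name is_installed and the package list are illustrative assumptions. It only reports whether a module can be located and does not import it.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(package_name) is not None


# Import names of some packages covered in this exercise (illustrative list).
packages = ["numpy", "scipy", "pandas", "statsmodels"]
status = {name: is_installed(name) for name in packages}

for name, found in status.items():
    print(f"{name:12s} {'installed' if found else 'missing'}")
```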

SciPy
What is SciPy?
SciPy is an open-source Python library used for solving mathematical, scientific, engineering, and technical problems. It allows users to manipulate and visualize data using a wide range of high-level Python commands. SciPy is built on top of NumPy.
Why SciPy?
• SciPy contains a variety of sub-packages that help solve the most common problems in scientific computation.
• SciPy is one of the most widely used scientific libraries, alongside the GNU Scientific Library for C/C++ and MATLAB's toolboxes.
• It is easy to use and understand, and computationally fast.
• It operates on NumPy arrays.
This section shows how to install the main SciPy packages on Windows, Mac, or Linux. SciPy is a free and open-source Python library with packages optimized and developed for scientific and technical computing. If you have Python installed, you can use Python's standard pip package manager to install it from the Python Package Index.
Using the Python Package Index

Open the SciPy website in your internet browser. Type or paste https://www.scipy.org/ into the address bar, and press ↵ Enter or ⏎ Return on your keyboard.
Click the Install button on the home page. This button looks like a downward green arrow on the blue-and-white SciPy icon. You can find it near the upper-left corner of the page.
• This will open the SciPy installation details on a new page.

Make sure Python is installed on your computer. SciPy is an open-source Python library, and requires a basic Python distribution installed on your system.
• If you don't have Python installed, you can select one of the recommended distributions under the "Scientific Python Distributions" heading, and install it on your computer.
• If you're not sure how to install Python, look up detailed instructions on installing the core packages first.

Open your computer's command prompt or terminal. You can open the Command Prompt on Windows, Terminal on Mac, or your distribution's Terminal on Linux.

Type and run python -m pip install -U pip. This command makes sure the latest pip files are installed on your system to handle package management tasks.
• Press ↵ Enter or ⏎ Return to run the command.

Type and run pip install scipy in the command prompt. This uses the Python Package Index to install the core SciPy packages on your computer.
• You can also install other core packages such as NumPy and Matplotlib by using the pip install numpy and pip install matplotlib commands.
Pandas
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working
with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building
block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of
becoming the most powerful and flexible open source data analysis/manipulation tool available in any
language.
Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data.
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
• Any other form of observational / statistical data sets. The data need not be labeled at all to be
placed into a pandas data structure

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R's data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other third-party libraries.
Installation
Pandas DataFrames are among the most useful data structures available in any library. They are used in every data-intensive field, including but not limited to scientific computing, data science, and machine learning.
The library is not included with a regular install of Python. To use it, you must install Pandas separately.
How to Install Python Pandas on Windows?
Before you install Pandas, bear in mind that each Pandas release supports only certain Python versions; the release described here supports Python 3.7, 3.8, and 3.9, and newer releases require newer Python versions. Therefore, if you have not installed Python on your computer or have an older version of Python installed, you must install a version that supports Pandas.
Installing with pip
pip is a package installation manager that makes installing Python libraries and frameworks straightforward.
As long as you have a newer version of Python installed (Python 3.4 or later), pip will be installed on your computer along with Python by default.
However, if you're using an older version of Python, you will need to install pip on your computer before installing Pandas. The easiest way to do this is to upgrade to the latest version of Python available on https://www.python.org.
Step #1: Launch Command Prompt
Press the Windows key on your keyboard or click on the Start button to open the start menu. Type “cmd,”
and the Command Prompt app should appear as a listing in the start menu. Open up the command prompt
so you can install Pandas.

Step #2: Enter the Required Command


After you launch the command prompt, the next step in the process is to type in the required command to
initialize pip installation.
Enter the command “pip install pandas” on the terminal. This should launch the pip installer. The required
files will be downloaded, and Pandas will be ready to run on your computer.

Jupyter
The Jupyter Notebook is an open source web application that you can use to create and share documents
that contain live code, equations, visualizations, and text. Jupyter Notebook is maintained by the people at
Project Jupyter.

Jupyter Notebooks are a spin-off from the IPython project, which used to have an IPython Notebook project itself. The name Jupyter comes from the core programming languages that it supports: Julia, Python, and R. Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but there are currently over 100 other kernels that you can also use.
The Jupyter Notebook is not included with Python, so if you want to try it out, you will need to install Jupyter.
There are many distributions of the Python language. The most popular is CPython, the reference version of Python that you can get from python.org. It is also assumed that you are using Python 3.
Installation
With CPython, you can use pip, the tool that comes with Python, to install Jupyter Notebook like this:

$ pip install jupyter
Starting the Jupyter Notebook Server
Now that you have Jupyter installed, let’s learn how to use it. To get started, all you need to do is open up
your terminal application and go to a folder of your choice. I recommend using something like your
Documents folder to start out with and create a subfolder there called Notebooks or something else that is
easy to remember.
Then just go to that location in your terminal and run the following command:

$ jupyter notebook

This will start up Jupyter, and your default browser should start (or open a new tab) at the following URL: http://localhost:8888/tree

Your browser should now show the Notebook dashboard, listing the contents of the folder you started from.

Creating a Notebook

Now that you know how to start a Notebook server, you should probably learn how to create an actual
Notebook document.

All you need to do is click on the New button (upper right), and it will open up a list of choices. On my machine, I happen to have Python 2 and Python 3 installed, so I can create a Notebook that uses either of these. For simplicity's sake, let's choose Python 3.

A new, empty Notebook opens in the browser.

Naming

You will notice that at the top of the page is the word Untitled. This is the title for the page and the name
of your Notebook. Since that isn’t a very descriptive name, let’s change it!

Just move your mouse over the word Untitled and click on the text. You should now see an in-browser
dialog titled Rename Notebook. Let’s rename this one to Hello Jupyter:

Running Cells

A Notebook’s cell defaults to using code whenever you first create one, and that cell uses the kernel that
you chose when you started your Notebook.
In this case, you started yours with Python 3 as your kernel, so that means you can write Python code in
your code cells. Since your initial Notebook has only one empty cell in it, the Notebook can’t really do
anything.

Thus, to verify that everything is working as it should, you can add some Python code to the cell and try
running its contents.

Let’s try adding the following code to that cell:

print('Hello Jupyter!')
Running a cell means that you will execute the cell's contents. To execute a cell, you can just select the cell and click the Run button in the row of buttons along the top. If you prefer using your keyboard, you can just press Shift + Enter.

When the code above runs, the text Hello Jupyter! appears directly below the cell.

If you have multiple cells in your Notebook, and you run the cells in order, you can share your variables and imports across cells. This makes it easy to separate out your code into logical chunks without needing to reimport libraries or recreate variables or functions in every cell.

When you run a cell, you will notice that there are some square brackets next to the word In to the left of the cell. The square brackets fill in automatically with a number that indicates the order in which you ran the cells. For example, if you open a fresh Notebook and run the first cell at the top of the Notebook, the square brackets will fill with the number 1.

Statsmodels

Statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct. The package is released under the open source Modified BSD (3-clause) license. The online documentation is hosted at statsmodels.org.

As its name implies, statsmodels is a Python library built specifically for statistics. Statsmodels is built on top of NumPy, SciPy, and matplotlib, but it contains more advanced functions for statistical testing and modeling that you won't find in numerical libraries like NumPy or SciPy.

Some of the essential features of this package are:

1. It includes various linear regression models, such as ordinary least squares, generalized least squares, and weighted least squares.
2. It provides efficient functions for time series analysis.
3. It includes datasets for examples and testing.
4. Models based on survival analysis are also available.
5. A wide range of statistical tests is available.

Installing statsmodels

Let's have a look at the steps for installing statsmodels in Python:

1. Check the version of Python installed on your PC. There are two ways to check the version of Python in Windows:

o Using PowerShell
o Using Command Prompt

Using PowerShell

Follow the steps below to check the version of Python using PowerShell.

1. Press 'Win+R', or type 'Run' in the taskbar's search pane.
2. Type 'Powershell' and press 'Enter'.
3. A window named 'Windows PowerShell' will appear on your screen.
4. Type python --version and press 'Enter'.
5. The version is displayed on the next line.

Using Command Prompt

Type 'Command Prompt' in the taskbar's search pane and you'll see its icon. Click on it to open the command prompt. You can also click directly on its icon if it is pinned to the taskbar.

1. Wait until the 'Command Prompt' window is visible on your screen.
2. Type python --version and press 'Enter'.
3. The version installed on your system is displayed on the next line.

Once the Python version is confirmed, install statsmodels with pip in the same way as the other packages in this exercise: type pip install statsmodels and press 'Enter'.

Result:

Thus, the Python libraries NumPy, SciPy, Pandas, Jupyter, and Statsmodels are downloaded and installed successfully.

Ex. No. : 02
Date :

NumPy Arrays
Aim:

To write Python programs that demonstrate various NumPy array operations: attributes, indexing, slicing, reshaping, concatenation, and splitting.

Array Attributes
Aim:

To write a python program to generate an array and show its attributes like dimension, shape and
size.

Algorithm:
1. Start the python program.
2. Import the numpy library as np
3. Generate an array by using randint method.
4. For 2 and 3 dimensional arrays mention the number of rows and columns in size ( ).
5. Print the dimension, size, and shape of an array separately.
6. Stop the program.
Program:
import numpy as np

np.random.seed(0)  # fix the seed so the same arrays are generated on every run
x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array
print("x1 ndim: ", x1.ndim)
print("x2 shape:", x2.shape)
print("x3 size: ", x3.size)

Output:
x1 ndim:  1
x2 shape: (3, 4)
x3 size:  60
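Beyond ndim, shape, and size, NumPy arrays expose a few more attributes that describe how the data is stored. The sketch below is an illustrative addition: it builds a small array with an explicit int64 element type (an assumption chosen so that the byte counts are the same on every platform) and prints dtype, itemsize, and nbytes.

```python
import numpy as np

# 12 elements of type int64, arranged as 3 rows x 4 columns.
x = np.arange(12, dtype=np.int64).reshape(3, 4)

print("dtype:   ", x.dtype)     # element type
print("itemsize:", x.itemsize)  # bytes per element
print("nbytes:  ", x.nbytes)    # total bytes = size * itemsize
```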

Array Indexing
Aim:
To write a Python program that demonstrates array indexing.

Algorithm:
1. Start the python program.
2. Import the NumPy package as np.
3. Generate an array using random values.
4. Apply various indexing methods to identify elements in an array.
5. Apply negative indexing to refer to elements in an array from right to left.
6. Stop the program.

Program: Single dimensional array

import numpy as np

x1 = np.random.randint(10, size=6)  # no seed is set, so values differ between runs
print('array')
print(x1)
print('select an element from an array')
print(x1[0])    # first element
print('select an element by negative index')
print(x1[-3])   # third element from the end

Output:
array
[5 9 3 0 5 0]
select an element from an array
5
select an element by negative index
0

Program: Multi-dimensional array

import numpy as np

x1 = np.random.randint(12, size=(3, 4))  # no seed is set, so values differ between runs
print('array')
print(x1)
print('select an element from an array')
print(x1[1, 2])   # row index 1, column index 2

Output:
array
[[ 0 10  2 11]
 [10  7 11  2]
 [ 9  2  3 11]]
select an element from an array
11
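Indexing can also be combined with negative indices in several dimensions and used on the left-hand side of an assignment. The sketch below is an illustrative addition; the name grid and the replacement value 99 are assumptions, and the array values are copied from the sample output above so that the results are reproducible.

```python
import numpy as np

grid = np.array([[0, 10, 2, 11],
                 [10, 7, 11, 2],
                 [9, 2, 3, 11]])

last_element = grid[-1, -1]   # last row, last column
second_row = grid[1]          # a single index selects a whole row

grid[0, 0] = 99               # indexing on the left-hand side assigns in place

print(last_element)
print(second_row)
print(grid[0])
```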

Array Slicing
Aim:
To write a Python program that performs slicing operations on an array.
Algorithm:
1. Start the python program.
2. Import the NumPy library as np.
3. Generate an array.
4. Slice the array to select a given number of leading elements, the elements from a given index onward, or a range of middle elements.
5. Slice the array with a step between elements.
6. Stop the program.

Program: Slicing single dimensional array

import numpy as np

print('A sequence Array')
x = np.arange(10)
print(x)
print('Slice first 3 values')
print(x[:3])
print('Slice elements from index 7')
print(x[7:])
print('Slice middle values')
print(x[4:7])
print('Slice every second element')
print(x[::2])

Output:
A sequence Array
[0 1 2 3 4 5 6 7 8 9]
Slice first 3 values
[0 1 2]
Slice elements from index 7
[7 8 9]
Slice middle values
[4 5 6]
Slice every second element
[0 2 4 6 8]

Program: Slicing multi-dimensional array

import numpy as np

x = np.random.randint(10, size=(3, 4))  # no seed is set, so values differ between runs
print('Array', x)
print('Slice two rows and three columns')
print(x[:2, :3])
print('Slice three rows and two columns')
print(x[:3, ::2])   # all three rows, every second column
print('Slice a particular value')
print(x[:1, :1])

Output:
Array [[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]
Slice two rows and three columns
[[3 5 2]
 [7 6 8]]
Slice three rows and two columns
[[3 2]
 [7 8]
 [1 7]]
Slice a particular value
[[3]]
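One property of NumPy slicing worth checking in the lab is that a basic slice is a view onto the original data rather than a copy. The sketch below is an illustrative addition (the variable names are assumptions); it shows that writing through a slice changes the original array, while a slice followed by .copy() does not.

```python
import numpy as np

x = np.arange(10)

view = x[:3]                  # basic slicing returns a view, not a copy
view[0] = 100                 # this also changes x[0]

independent = x[3:6].copy()   # .copy() makes an independent array
independent[0] = -1           # x is left untouched

print(x)
print(np.shares_memory(x, view))
print(np.shares_memory(x, independent))
```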

Array Reshaping
Aim:
To write a python program to reshape the given array.
Algorithm:
1. Start the python program.
2. Import the NumPy library as np.
3. Generate an array.
4. Reshape the single dimensional array into multi-dimensional array.
5. Stop the program.

Program:
import numpy as np

x = np.arange(1, 10)   # values 1 to 9
print('Array', x)
x1 = x.reshape(3, 3)   # 9 elements arranged as 3 rows x 3 columns
print('Reshaped Array')
print(x1)

Output:

Array [1 2 3 4 5 6 7 8 9]
Reshaped Array
[[1 2 3]
 [4 5 6]
 [7 8 9]]
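Reshaping only works when the new shape holds the same number of elements. Two related idioms are sketched below as an illustrative addition: passing -1 so that NumPy infers one dimension, and np.newaxis to turn a one-dimensional array into a column. The 12-element array is an assumption chosen for the example.

```python
import numpy as np

x = np.arange(1, 13)          # 12 elements

auto_rows = x.reshape(-1, 4)  # -1 lets NumPy infer the row count (12 / 4 = 3)
column = x[:, np.newaxis]     # turn the 1-D array into a 12 x 1 column
flat = auto_rows.ravel()      # back to one dimension

print(auto_rows.shape)
print(column.shape)
print(flat.shape)
```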

Array Concatenation and Splitting
Aim:
To write a python program to concatenate two arrays into a single array and to split a single array into
small arrays.

Algorithm:
1. Start the python program.
2. Import the NumPy as np
3. Generate two arrays x and y.
4. Concatenate the x and y as z.
5. Print the z array.
6. Split the z into x1,x2. The splitting point is index 3
7. Print the split arrays.
8. Stop the program.

Program:
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 8, 9, 10])
print('X=', x)
print('Y=', y)
print('Array Concatenation')
z = np.concatenate([x, y])
print('Z=', z)
print('Array Splitting')
x1, x2 = np.split(z, [3])   # split before index 3
print('X1=', x1)
print('X2=', x2)

Output:
X= [1 2 3 4 5]
Y= [ 6  7  8  9 10]
Array Concatenation
Z= [ 1  2  3  4  5  6  7  8  9 10]
Array Splitting
X1= [1 2 3]
X2= [ 4  5  6  7  8  9 10]
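The same operations extend to two-dimensional arrays, where the axis argument decides whether rows or columns are joined. The sketch below is an illustrative addition with assumed arrays a and b; it concatenates along both axes and then splits the results back with np.vsplit and np.hsplit.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9],
              [10, 11, 12]])

rows = np.concatenate([a, b], axis=0)   # stack vertically, shape (4, 3)
cols = np.concatenate([a, b], axis=1)   # stack horizontally, shape (2, 6)

upper, lower = np.vsplit(rows, [2])     # split rows before index 2
left, right = np.hsplit(cols, [3])      # split columns before index 3

print(rows.shape, cols.shape)
print(np.array_equal(upper, a), np.array_equal(right, b))
```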

Result:
Thus, the programs for various array operations are written and executed successfully.

Ex.No. : 03
Date :
Pandas
Aim:
To write Python programs that perform various actions with Pandas Series and Data Frames.

Pandas – Series
Aim:

To write a Python program to create a Pandas Series and perform various actions on it.

Algorithm:
1. Start the python program.
2. Import the pandas library as pd.
3. Create the series using pd.Series() and perform various actions.
4. Stop the program.

Program:
Series with default index

import pandas as pd

x = pd.Series([10, 20, 30, 40, 50])
print(x)

Output:
0    10
1    20
2    30
3    40
4    50
dtype: int64

Series with explicit index

import pandas as pd

x = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(x)

Output:
a    10
b    20
c    30
d    40
e    50
dtype: int64

Series as a specialized dictionary, displaying a single value and a group of values

import pandas as pd

marks = {'Ram': 80,
         'Kasim': 90,
         'Faizal': 85,
         'Tamil': 95,
         'Sai': 92}
result = pd.Series(marks)   # dictionary keys become the index labels
print(result)
print('Display a single data in series')
print(result['Ram'])
print('Display a set of data in series')
print(result['Kasim':'Tamil'])   # label slice: both end labels are included

Output:
Ram       80
Kasim     90
Faizal    85
Tamil     95
Sai       92
dtype: int64
Display a single data in series
80
Display a set of data in series
Kasim     90
Faizal    85
Tamil     95
dtype: int64
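The label slice above includes its end label, which differs from ordinary Python slicing. The sketch below is an illustrative addition that contrasts label-based selection (.loc), position-based selection (.iloc), and a boolean mask on the same marks data; the variable names are assumptions.

```python
import pandas as pd

marks = pd.Series({'Ram': 80, 'Kasim': 90, 'Faizal': 85, 'Tamil': 95, 'Sai': 92})

by_label = marks.loc['Kasim':'Tamil']   # label slice: 'Tamil' is included
by_position = marks.iloc[1:3]           # position slice: position 3 is excluded
above_90 = marks[marks > 90]            # boolean mask keeps matching entries

print(by_label.index.tolist())
print(by_position.index.tolist())
print(above_90.index.tolist())
```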

Pandas – Data Frame

Aim:
To write the python program to create and perform various actions in Pandas Data Frame.

Algorithm:
1. Start the python program.
2. Import the pandas library as pd.
3. Create the data frame using pd.DataFrame() and perform various actions.
4. Stop the program.

Program: simple data frame

import pandas as pd

data = {"Roll No": [100, 101, 102],
        "Percentage": [95, 80, 75]}
df = pd.DataFrame(data)
print(df)

Output:
   Roll No  Percentage
0      100          95
1      101          80
2      102          75

Locate Row

import pandas as pd

data = {"Roll No": [100, 101, 102],
        "Percentage": [95, 80, 75]}
df = pd.DataFrame(data)
print(df.loc[1])   # row with index label 1

Output:
Roll No       101
Percentage     80
Name: 1, dtype: int64

Named Indexes

import pandas as pd

data = {"Roll No": [100, 101, 102],
        "Percentage": [95, 80, 75]}
df = pd.DataFrame(data, index=["AAA", "BBB", "CCC"])
print(df)
print('Locate a named index')
print(df.loc['BBB'])

Output:
     Roll No  Percentage
AAA      100          95
BBB      101          80
CCC      102          75
Locate a named index
Roll No       101
Percentage     80
Name: BBB, dtype: int64
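With named indexes, rows can be reached either by label or by position. The sketch below is an illustrative addition using the same data frame; it compares .loc with .iloc, reads a single cell, and filters rows by a condition. The threshold of 80 is an assumption chosen for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"Roll No": [100, 101, 102], "Percentage": [95, 80, 75]},
    index=["AAA", "BBB", "CCC"],
)

row_by_label = df.loc["BBB"]                 # row selected by index label
row_by_position = df.iloc[1]                 # the same row selected by position
single_value = df.loc["CCC", "Percentage"]   # one cell: row label, column name
high_scorers = df[df["Percentage"] >= 80]    # rows that satisfy a condition

print(row_by_label.equals(row_by_position))
print(single_value)
print(high_scorers.index.tolist())
```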

Checking for missing values using isnull() and notnull():

import pandas as pd
import numpy as np

# 'scores' is used instead of 'dict' so the built-in dict type is not shadowed
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)
print(df.isnull())   # True marks a missing value; notnull() gives the inverse

Output:
   First Score  Second Score  Third Score
0        False         False         True
1        False         False        False
2         True         False        False
3        False          True        False

Filling missing values using fillna()

import pandas as pd
import numpy as np

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)
print(df.fillna(0))   # replace every missing value with 0

Output:
   First Score  Second Score  Third Score
0        100.0          30.0          0.0
1         90.0          45.0         40.0
2          0.0          56.0         80.0
3         95.0           0.0         98.0

Drop null values

import pandas as pd
import numpy as np

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(scores)
print(df.dropna())   # keep only the rows that have no missing values

Output:
   First Score  Second Score  Third Score
1         90.0          45.0         40.0
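Filling with 0 or dropping whole rows are the simplest options. As an illustrative addition, the sketch below counts the missing values per column, fills each gap with its column mean, and drops only rows that fall below a minimum number of valid values. The threshold of 2 is an assumption chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

missing_per_column = df.isnull().sum()       # count of NaN values in each column
filled_with_mean = df.fillna(df.mean())      # replace NaN with that column's mean
rows_with_two_values = df.dropna(thresh=2)   # keep rows with at least 2 non-NaN values

print(missing_per_column)
print(filled_with_mean)
print(len(rows_with_two_values))
```

Here every row has at most one missing value, so dropna(thresh=2) keeps all four rows, unlike the plain dropna() above.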

String functions
Convert string values to upper case and lower case, and find their lengths

import pandas as pd
import numpy as np

s = pd.Series(['X', 'Y', 'Z', 'Aaba', 'Baca', np.nan, 'CABA', None, 'bird',
               'horse', 'dog'])
print("Original series:")
print(s)
print("\nConvert all string values to upper case:")
print(s.str.upper())
print("\nConvert all string values to lower case:")
print(s.str.lower())
print("\nLength of the string values:")
print(s.str.len())

Output:
Original series:
0         X
1         Y
2         Z
3      Aaba
4      Baca
5       NaN
6      CABA
7      None
8      bird
9     horse
10      dog
dtype: object

Convert all string values to upper case:
0         X
1         Y
2         Z
3      AABA
4      BACA
5       NaN
6      CABA
7      None
8      BIRD
9     HORSE
10      DOG
dtype: object

Convert all string values to lower case:
0         x
1         y
2         z
3      aaba
4      baca
5       NaN
6      caba
7      None
8      bird
9     horse
10      dog
dtype: object

Length of the string values:
0     1.0
1     1.0
2     1.0
3     4.0
4     4.0
5     NaN
6     4.0
7     NaN
8     4.0
9     5.0
10    3.0
dtype: float64

Remove whitespace, left-sided whitespace, and right-sided whitespace from the string values of a given pandas Index

import pandas as pd

color1 = pd.Index([' Green', 'Black ', ' Red ', 'White', ' Pink '])
print("Original series:")
print(color1)
print("\nRemove whitespace")
print(color1.str.strip())
print("\nRemove left sided whitespace")
print(color1.str.lstrip())
print("\nRemove Right sided whitespace")
print(color1.str.rstrip())

Output:
Original series:
Index([' Green', 'Black ', ' Red ', 'White', ' Pink '], dtype='object')

Remove whitespace
Index(['Green', 'Black', 'Red', 'White', 'Pink'], dtype='object')

Remove left sided whitespace
Index(['Green', 'Black ', 'Red ', 'White', 'Pink '], dtype='object')

Remove Right sided whitespace
Index([' Green', 'Black', ' Red', 'White', ' Pink'], dtype='object')

Pandas: joining and merging DataFrames along rows

import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
    'marks': [200, 210, 190, 222, 199]})

student_data2 = pd.DataFrame({
    'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
    'name': ['FFF', 'GGG', 'HHH', 'III', 'JJJ'],
    'marks': [201, 200, 198, 219, 201]})

print("Original DataFrames:")
print(student_data1)
print("-------------------------------------")
print(student_data2)
print("\nJoin the said two dataframes along rows:")
result_data = pd.concat([student_data1, student_data2])   # default axis=0 stacks rows
print(result_data)

Output:
Original DataFrames:
  student_id name  marks
0         S1  AAA    200
1         S2  BBB    210
2         S3  CCC    190
3         S4  DDD    222
4         S5  EEE    199
-------------------------------------
  student_id name  marks
0         S4  FFF    201
1         S5  GGG    200
2         S6  HHH    198
3         S7  III    219
4         S8  JJJ    201

Join the said two dataframes along rows:
  student_id name  marks
0         S1  AAA    200
1         S2  BBB    210
2         S3  CCC    190
3         S4  DDD    222
4         S5  EEE    199
0         S4  FFF    201
1         S5  GGG    200
2         S6  HHH    198
3         S7  III    219
4         S8  JJJ    201
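In the combined output the index values 0 to 4 appear twice, because pd.concat keeps each frame's original index by default. The sketch below is an illustrative addition with two small assumed frames; it shows how ignore_index=True produces a fresh, unique index.

```python
import pandas as pd

first = pd.DataFrame({'student_id': ['S1', 'S2'], 'marks': [200, 210]})
second = pd.DataFrame({'student_id': ['S3', 'S4'], 'marks': [190, 222]})

kept_index = pd.concat([first, second])                       # index: 0, 1, 0, 1
fresh_index = pd.concat([first, second], ignore_index=True)   # index: 0, 1, 2, 3

print(kept_index.index.tolist())
print(fresh_index.index.tolist())
```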

To join the two given data frames along columns

import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
    'marks': [200, 210, 190, 222, 199]})

student_data2 = pd.DataFrame({
    'student_id': ['S4', 'S5', 'S6', 'S7', 'S8'],
    'name': ['FFF', 'GGG', 'HHH', 'III', 'JJJ'],
    'marks': [201, 200, 198, 219, 201]})

print("Original DataFrames:")
print(student_data1)
print("-------------------------------------")
print(student_data2)
print("\nJoin the said two dataframes along columns:")
result_data = pd.concat([student_data1, student_data2], axis=1)   # axis=1 places frames side by side
print(result_data)

Output:
Original DataFrames:
  student_id name  marks
0         S1  AAA    200
1         S2  BBB    210
2         S3  CCC    190
3         S4  DDD    222
4         S5  EEE    199
-------------------------------------
  student_id name  marks
0         S4  FFF    201
1         S5  GGG    200
2         S6  HHH    198
3         S7  III    219
4         S8  JJJ    201

Join the said two dataframes along columns:
  student_id name  marks student_id name  marks
0         S1  AAA    200         S4  FFF    201
1         S2  BBB    210         S5  GGG    200
2         S3  CCC    190         S6  HHH    198
3         S4  DDD    222         S7  III    219
4         S5  EEE    199         S8  JJJ    201
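pd.concat with axis=1 lines rows up by index position, so it does not match students by their ID. When rows should be matched on a key column, pd.merge is the usual tool. The sketch below is an illustrative addition: exam_data is a hypothetical second table, and the inner and outer joins show which student IDs survive in each case.

```python
import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE']})

# Hypothetical second table, keyed by the same student_id column.
exam_data = pd.DataFrame({
    'student_id': ['S4', 'S5', 'S6'],
    'marks': [201, 200, 198]})

# Inner join: only IDs present in both tables.
inner = pd.merge(student_data1, exam_data, on='student_id', how='inner')
# Outer join: all IDs from either table, with NaN where data is missing.
outer = pd.merge(student_data1, exam_data, on='student_id', how='outer')

print(inner)
print(outer)
```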

To append rows to an existing Data Frame and display the combined data.

import pandas as pd

student_data1 = pd.DataFrame({
    'student_id': ['S1', 'S2', 'S3', 'S4', 'S5'],
    'name': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],
    'marks': [200, 210, 190, 222, 199]})

s6 = pd.Series(['S6', 'FFF', 205], index=['student_id', 'name', 'marks'])

print("Original DataFrames:")
print(student_data1)
print("\n New Row(s)")
print(s6)
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result.
# s6.to_frame().T turns the Series into a one-row DataFrame.
combined_data = pd.concat([student_data1, s6.to_frame().T], ignore_index=True)
print("\n Combined data:")
print(combined_data)

Output:

Original DataFrames:
  student_id name  marks
0         S1  AAA    200
1         S2  BBB    210
2         S3  CCC    190
3         S4  DDD    222
4         S5  EEE    199

 New Row(s)
student_id     S6
name          FFF
marks         205
dtype: object

 Combined data:
  student_id name marks
0         S1  AAA   200
1         S2  BBB   210
2         S3  CCC   190
3         S4  DDD   222
4         S5  EEE   199
5         S6  FFF   205

Result:
Thus, the programs to exhibit various Pandas features are written and executed successfully.
