Pandas

Pandas is especially useful when working with tabular data such as CSV files, Excel spreadsheets, SQL tables, and time-series data.

Uploaded by benti
Copyright © All Rights Reserved

Pandas

Handout (Approx. 5 Pages)

1. Introduction to Pandas
Pandas is a powerful open-source Python library used for data manipulation, analysis,
and cleaning. It provides high-level data structures and tools that make working with
structured data easy and efficient. Pandas is built on top of NumPy and is a core library in
data science, statistics, machine learning, and business analytics.

Pandas is especially useful when working with tabular data such as CSV files, Excel
spreadsheets, SQL tables, and time-series data.

2. Installing and Importing Pandas

2.1 Installation

Pandas can be installed using the Python package manager pip:

pip install pandas

2.2 Importing Pandas

The standard convention is to import Pandas with the alias pd:

import pandas as pd

This alias is used in almost all Pandas-based programs.

3. Core Data Structures in Pandas

3.1 Series

A Series is a one-dimensional labeled array capable of holding data of any type.

data = [10, 20, 30, 40]

series = pd.Series(data)
print(series)

Key features of Series:

• One-dimensional
• Indexed
• Supports different data types
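
The index labels of a Series can be set explicitly. The following is a minimal sketch; the labels 'a' to 'd' and the variable names are made up for illustration and do not come from the handout.

```python
import pandas as pd

# Illustrative values and labels (assumptions, not part of the handout)
scores = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

print(scores['b'])       # access by index label
print(scores.iloc[0])    # access by integer position
print(scores.dtype)      # all values share one dtype, here an integer dtype

# A Series can also hold mixed Python objects; the dtype then becomes object
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)
```

Label-based access is what distinguishes a Series from a plain Python list: the values stay attached to their labels when the data is filtered or reordered.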

3.2 DataFrame

A DataFrame is a two-dimensional data structure similar to a table in a database or an
Excel spreadsheet.

data = {
'Name': ['Ali', 'Sara', 'John'],
'Age': [20, 22, 21],
'Department': ['CS', 'IT', 'SE']
}

df = pd.DataFrame(data)
print(df)

DataFrames are the most commonly used Pandas structure.
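
As a rough sketch of how a DataFrame relates to a Series, the example below rebuilds the same small table and inspects its basic attributes. The variable name ages is an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, 22, 21],
    'Department': ['CS', 'IT', 'SE'],
})

print(df.shape)            # (number of rows, number of columns)
print(list(df.columns))    # column labels
print(df.dtypes)           # one dtype per column

# Each column of a DataFrame is itself a Series
ages = df['Age']
print(type(ages))
```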

4. Loading and Saving Data

4.1 Reading Data from Files

df = pd.read_csv('data.csv')       # 'data.csv' is a placeholder file name
df = pd.read_excel('data.xlsx')    # 'data.xlsx' is a placeholder file name

4.2 Writing Data to Files

df.to_csv('output.csv', index=False)       # 'output.csv' is a placeholder file name
df.to_excel('output.xlsx', index=False)    # 'output.xlsx' is a placeholder file name

Pandas also supports reading from and writing to SQL databases.
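
The write and read calls can be tried without touching the disk. The sketch below round-trips a small table through CSV text held in memory; using io.StringIO in place of a file path is a choice made here for illustration, not something the handout prescribes.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Sara'], 'Age': [20, 22]})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Without index=False, the row labels are written as an extra unnamed column, which then reappears as a data column when the file is read back.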


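5. Exploring and Inspecting Data

Common methods for exploring data:

print(df.head())        # first five rows
print(df.tail())        # last five rows
df.info()               # column names, data types, non-null counts (prints directly)
print(df.describe())    # summary statistics for numeric columns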

These methods help understand the structure, data types, and summary statistics of datasets.
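
A small sketch of what these inspection methods return, using the three-row table from Section 3.2. The variable names first_two, last_one, and stats are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, 22, 21],
    'Department': ['CS', 'IT', 'SE'],
})

first_two = df.head(2)    # first 2 rows
last_one = df.tail(1)     # last row
stats = df.describe()     # summary statistics for the numeric columns

print(first_two)
print(last_one)
print(stats.loc['mean', 'Age'])

df.info()                 # prints its report directly and returns None
```

Because info() prints its own output and returns None, wrapping it in print() adds a stray "None" line.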

6. Data Selection and Indexing

6.1 Selecting Columns

print(df['Name'])

6.2 Selecting Rows

print(df.loc[0])     # row with index label 0
print(df.iloc[1])    # row at position 1 (the second row)

6.3 Conditional Selection

print(df[df['Age'] > 20])
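
The difference between label-based and position-based row selection is easier to see when the index is not the default 0, 1, 2. The sketch below assumes hypothetical row labels 's1' to 's3' and also shows how conditions can be combined.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ali', 'Sara', 'John'],
     'Age': [20, 22, 21],
     'Department': ['CS', 'IT', 'SE']},
    index=['s1', 's2', 's3'],   # hypothetical row labels
)

by_label = df.loc['s2']       # row whose index label is 's2'
by_position = df.iloc[1]      # second row, counting from 0
print(by_label['Name'], by_position['Name'])

# Combine conditions with & (and) or | (or); each condition needs parentheses
older_non_se = df[(df['Age'] > 20) & (df['Department'] != 'SE')]
print(older_non_se)

# loc can filter rows and pick a column in one step
names = df.loc[df['Age'] > 20, 'Name']
print(names.tolist())
```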

7. Data Cleaning and Handling Missing Values

Real-world data often contains missing or incorrect values.

7.1 Checking for Missing Values

print(df.isnull().sum())    # number of missing values per column

7.2 Handling Missing Values

df.fillna(0, inplace=True)    # replace every missing value with 0
df.dropna(inplace=True)       # alternatively, drop rows that contain missing values

Data cleaning is one of the most important uses of Pandas.
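
Filling and dropping are usually alternatives, chosen per column. The following sketch uses made-up data with two missing entries and returns new DataFrames rather than modifying in place; filling Age with the column mean is one possible strategy, assumed here for illustration.

```python
import pandas as pd

# Illustrative data with two missing entries (None marks a missing value)
df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, None, 21],
    'Department': ['CS', 'IT', None],
})

missing_per_column = df.isnull().sum()
print(missing_per_column)

# Fill only the Age column, using the mean of the known ages
filled = df.fillna({'Age': df['Age'].mean()})

# Drop any row that still contains a missing value
cleaned = filled.dropna()
print(cleaned)
```

Leaving out inplace=True keeps the original df unchanged, which makes it easier to compare the data before and after cleaning.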


8. Data Analysis and Operations

8.1 Sorting Data

sorted_df = df.sort_values(by='Age', ascending=True)    # returns a new, sorted DataFrame

8.2 Grouping Data

df.groupby('Department')['Age'].mean()

8.3 Applying Functions

df['Age_plus_1'] = df['Age'].apply(lambda x: x + 1)
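
The three operations in this section can be sketched together on a slightly larger table. The data below is illustrative: a fourth row ('Mona') and repeated departments are assumptions added so that grouping produces more than one row per group.

```python
import pandas as pd

# Illustrative data; not the table from Section 3.2
df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John', 'Mona'],
    'Age': [20, 22, 21, 24],
    'Department': ['CS', 'IT', 'CS', 'IT'],
})

# sort_values returns a new DataFrame; the original row order is unchanged
oldest_first = df.sort_values(by='Age', ascending=False)
print(oldest_first['Name'].tolist())

# Several aggregations per group in one call
summary = df.groupby('Department')['Age'].agg(['mean', 'max', 'count'])
print(summary)

# For simple arithmetic, a vectorized expression matches the apply version
df['Age_plus_1'] = df['Age'] + 1
same = df['Age_plus_1'].equals(df['Age'].apply(lambda x: x + 1))
print(same)
```

For element-wise arithmetic the vectorized form is generally preferred over apply with a lambda, since it avoids calling a Python function once per row.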

9. Pandas with Databases and Other Libraries

Pandas integrates well with:

• NumPy – numerical computing
• Matplotlib / Seaborn – data visualization
• SQL databases – data storage and retrieval
• Scikit-learn – machine learning

Example: Reading data from MySQL

pd.read_sql(query, connection)
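
In the call above, query is an SQL string and connection is an open database connection. A self-contained sketch follows, with an in-memory SQLite database standing in for MySQL so that it runs without a server; the table name students and the query are assumptions for illustration. For MySQL itself, the connection would typically be created through SQLAlchemy with a MySQL driver, which is not shown here.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a real database server
connection = sqlite3.connect(':memory:')

students = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, 22, 21],
})
# 'students' is a hypothetical table name
students.to_sql('students', connection, index=False)

query = 'SELECT Name, Age FROM students WHERE Age > 20 ORDER BY Age'
result = pd.read_sql(query, connection)
print(result)

connection.close()
```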

10. Applications of Pandas

Pandas is widely used in:

• Data analysis and reporting
• Business intelligence
• Financial analysis
• Scientific research
• Machine learning preprocessing
• Database reporting systems
