Introduction To Pandas Data Structures


Department of Computer Science and Engineering

Data Analytics and Visualization
By Dr. S. Jagadeesh Kumar
Introduction to Pandas Data Structures (1)

✓ The fundamental data structures in pandas are the Series (one-dimensional) and the
DataFrame (two-dimensional), both designed to handle labeled and potentially
heterogeneous data efficiently.

Series:

✓ A Series is a one-dimensional, labeled array capable of holding data of any type: integers,
strings, floats, etc.

✓ It consists of two main components: the values and the index (labels).
✓ A Series is often compared to a single column in a spreadsheet or to a dictionary that maps
labels (keys) to values.

✓ Example: To create a Series with labeled indices 'a', 'b', and 'c'.
import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
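
The two components of a Series, its values and its index, can be inspected directly. The following is a minimal sketch that rebuilds the same Series; the variable names are illustrative only, and the dictionary-based constructor is shown as one common alternative rather than the only way to build a Series.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# The two components of a Series
labels = list(s.index)     # ['a', 'b', 'c']
values = s.to_numpy()      # NumPy array holding 10, 20, 30

# Label-based access, similar to a dictionary lookup
b_value = s['b']           # 20

# A Series can also be built from a dictionary; the keys become the index
s_from_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})
same = s.equals(s_from_dict)   # True: same labels and same values
```

Looking up `s['b']` by label rather than by position is what makes the spreadsheet-column and dictionary comparisons apt.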

DataFrame:

✓ A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data
structure with labeled rows and columns.

✓ Each column in a DataFrame is a Series; thus, different columns can store different data
types. You can think of a DataFrame as an entire spreadsheet or an SQL table where
columns represent variables and rows represent observations.

✓ Example: To create a table-like structure with 'Name' and 'Age' as columns.
import pandas as pd
data = {'Name': ['Amy', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
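
To illustrate that each column of a DataFrame is itself a Series with its own data type, the sketch below reuses the same `data` dictionary. The helper checks from `pd.api.types` are used here only for illustration; the exact dtype reported for the text column may differ between pandas versions, so the example tests for "numeric or not" rather than a specific dtype name.

```python
import pandas as pd

data = {'Name': ['Amy', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)

# Selecting a single column returns a Series
ages = df['Age']
is_series = isinstance(ages, pd.Series)                       # True

# Each column keeps its own data type
age_is_integer = pd.api.types.is_integer_dtype(df['Age'])     # True
name_is_numeric = pd.api.types.is_numeric_dtype(df['Name'])   # False

# shape is (rows, columns): 2 observations, 2 variables
n_rows, n_cols = df.shape
```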

✓ Both structures offer powerful indexing and flexible handling of missing data.
✓ DataFrames can be constructed from lists, dictionaries, Series, arrays, or even other
DataFrames, enabling easy input from diverse data sources.

✓ The labeled axes (rows via index, columns via columns) make data selection and
manipulation intuitive and robust.
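
As a rough illustration of the points above, the sketch below builds small DataFrames from three different inputs and then selects values by label and by position. All names (`df_rows`, `df_arr`, `df_ser`, the labels 'r1', 'x', and so on) are made up for this example.

```python
import numpy as np
import pandas as pd

# From a list of dictionaries (one dictionary per row)
df_rows = pd.DataFrame([{'Name': 'Amy', 'Age': 25}, {'Name': 'Bob', 'Age': 30}])

# From a NumPy array, supplying row and column labels explicitly
df_arr = pd.DataFrame(np.array([[1, 2], [3, 4]]),
                      index=['r1', 'r2'], columns=['x', 'y'])

# From a dictionary of Series; the Series index becomes the row index
df_ser = pd.DataFrame({'Age': pd.Series([25, 30], index=['Amy', 'Bob'])})

# Label-based selection with .loc: row label first, then column label
bob_age = df_ser.loc['Bob', 'Age']   # 30
y_r1 = df_arr.loc['r1', 'y']         # 2

# Position-based selection with .iloc: row 0, column 0
first_name = df_rows.iloc[0, 0]      # 'Amy'
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup explicit, which avoids ambiguity when the index itself contains integers.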
Key Characteristics and Functionality

✓ Labeled Data: Both Series and DataFrames leverage labels (indexes and column names) for
intuitive data access and manipulation.
✓ Missing Data Handling: Pandas uses NaN (Not a Number) as its default marker for missing
numeric data (None and NaT are also treated as missing). It provides isnull() and notnull()
for detecting missing values, and dropna() and fillna() for removing or replacing them.
✓ Integration: Pandas seamlessly integrates with other Python libraries like NumPy (which it is
built upon), Matplotlib for visualization, and scikit-learn for machine learning.
✓ Data Alignment: Arithmetic operations on Series or DataFrames align data automatically on
shared labels, even when the indices differ or only partly overlap; labels present in only one
operand produce missing values in the result.
✓ Flexibility: Pandas is well-suited for various data types, including tabular data, time series
data, and arbitrary matrix data, making it a versatile tool for diverse data analysis tasks.
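
Data alignment and missing-data handling can be seen together in one short sketch. The two Series below are invented for illustration and have only partly overlapping indices; adding them aligns on labels and leaves NaN where a label exists on one side only.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on labels; labels found in only one Series give NaN
total = s1 + s2                  # a=NaN, b=12.0, c=23.0, d=NaN

# Detect missing values
missing_mask = total.isnull()    # True for 'a' and 'd'

# Handle missing values
dropped = total.dropna()         # keeps only 'b' and 'c'
filled = total.fillna(0)         # replaces NaN with 0

# Alternatively, treat absent labels as 0 during the operation itself
total_fill = s1.add(s2, fill_value=0)   # a=1.0, b=12.0, c=23.0, d=30.0
```

Whether to drop, fill, or supply a `fill_value` depends on what a missing label means in the data at hand; none of the three is a default recommendation.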
Thank You
