Department of Computer Science and Engineering
Data Analytics and Visualization
By Dr.S.Jagadeesh Kumar
Introduction to Pandas Data Structures (1)
✓ The fundamental data structures in pandas are the Series (one-dimensional) and the
DataFrame (two-dimensional), both designed to handle labeled and potentially
heterogeneous data efficiently.
Series:
✓ A Series is a one-dimensional, labeled array capable of holding data of any type, such as
integers, strings, or floats.
✓ It consists of two main components: the values and the index (labels).
✓ A Series is often compared to a single column in a spreadsheet or to a dictionary that
maps labels to values.
Introduction to Pandas Data Structures (2)
✓ Example: To create a Series with the index labels 'a', 'b', and 'c':
import pandas as pd
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s)
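The two components of a Series described above (values and index) can be inspected directly. The following is a minimal sketch that reuses the same example data; the variable names (values, labels, b_value, from_dict) are illustrative only.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

values = s.values        # the underlying data as a NumPy array
labels = list(s.index)   # the index labels: ['a', 'b', 'c']
b_value = s['b']         # label-based access, similar to a dictionary lookup

# A Series can also be built from a dictionary: the keys become the index.
from_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})
same = s.equals(from_dict)  # True: same labels, same values

print(values, labels, b_value, same)
```

Building the Series from a dictionary yields the same object as passing the values and index separately, which is why the dictionary comparison is a useful mental model.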
DataFrame:
✓ A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data
structure with labeled rows and columns.
✓ Each column in a DataFrame is a Series; thus, different columns can store different data
types. You can think of a DataFrame as an entire spreadsheet or an SQL table where
columns represent variables and rows represent observations.
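To illustrate the point that each column is itself a Series with its own data type, here is a short sketch using the 'Name' and 'Age' columns from this lecture's example. The variable name age is illustrative only, and the exact dtype reported for the text column can differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Amy', 'Bob'], 'Age': [25, 30]})

# Selecting a single column returns a Series.
age = df['Age']
print(type(age))

# Each column carries its own dtype: text for 'Name', integer for 'Age'.
print(df.dtypes)
```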
Introduction to Pandas Data Structures (3)
✓ Example: To create a table-like structure with 'Name' and 'Age' as columns.
import pandas as pd
data = {'Name': ['Amy', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
✓ Both structures support label-based and position-based indexing (loc and iloc) and
flexible handling of missing data.
✓ DataFrames can be constructed from lists, dictionaries, Series, arrays, or even other
DataFrames, enabling easy input from diverse data sources.
✓ The labeled axes (rows via the index attribute, columns via the columns attribute) make
data selection and manipulation intuitive and robust.
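The construction options and label-based selection mentioned above can be sketched as follows. The row labels 'r1' and 'r2', the column labels 'x' and 'y', and all variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# From a list of dictionaries: each dictionary becomes one row.
from_records = pd.DataFrame([{'Name': 'Amy', 'Age': 25},
                             {'Name': 'Bob', 'Age': 30}])

# From a NumPy array, with explicit row and column labels.
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]),
                          index=['r1', 'r2'], columns=['x', 'y'])

# From a dictionary of Series: each Series becomes one column.
from_series = pd.DataFrame({
    'x': pd.Series([1, 3], index=['r1', 'r2']),
    'y': pd.Series([2, 4], index=['r1', 'r2']),
})

# Label-based selection: row 'r2', column 'y'.
cell = from_array.loc['r2', 'y']   # 4

# Position-based selection of the same cell (second row, second column).
same_cell = from_array.iloc[1, 1]  # 4

print(from_records)
print(cell, same_cell)
```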
Key Characteristics and Functionality
✓ Labeled Data: Both Series and DataFrames leverage labels (indexes and column names) for
intuitive data access and manipulation.
✓ Missing Data Handling: Pandas uses NaN (Not a Number) as the default marker for missing
numeric data (NaT for datetime values). It provides isnull() and notnull() (aliases of isna()
and notna()) for detecting missing values, and dropna() and fillna() for removing or
replacing them.
✓ Integration: Pandas works well with other Python libraries such as NumPy (on which it is
built), Matplotlib for visualization, and scikit-learn for machine learning.
✓ Data Alignment: Operations on Series or DataFrames with different or partially overlapping
indices align automatically on shared labels; a label present in only one operand produces
NaN in the result.
✓ Flexibility: Pandas is well-suited for various data types, including tabular data, time series
data, and arbitrary matrix data, making it a versatile tool for diverse data analysis tasks.
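Data alignment and missing data handling, as described in the characteristics above, often appear together: adding two Series with partially overlapping indices produces NaN wherever a label is missing from one side. The following is a minimal sketch; the two Series and all variable names are made up for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition aligns on labels: only 'b' and 'c' exist in both Series.
total = s1 + s2            # a: NaN, b: 12.0, c: 23.0, d: NaN

missing = total.isnull()   # True where the value is missing
dropped = total.dropna()   # keeps only 'b' and 'c'
filled = total.fillna(0)   # replaces NaN with 0

# Alternative: treat a missing operand as 0 during the addition itself.
filled_add = s1.add(s2, fill_value=0)   # a: 1.0, b: 12.0, c: 23.0, d: 30.0

print(total)
print(filled_add)
```

Note the difference between the last two approaches: fillna(0) replaces the missing result after the addition, while add(..., fill_value=0) substitutes 0 for the missing operand before adding, so the values for 'a' and 'd' are preserved.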
Thank You