Experiment 12
TITLE: Data Series and Data Frames using Pandas
THEORY:
Pandas is a popular library in Python for data manipulation and analysis. It provides
two main data structures: Series and DataFrame, which are built on top of NumPy
arrays. Here's an overview of each:
1. Series:
A Series is a one-dimensional labeled array capable of holding any data type (e.g.,
integers, floats, strings, etc.).
It is similar to a one-dimensional NumPy array, but with an associated index that labels
each element.
You can create a Series using the pd.Series() constructor, passing in a Python list,
NumPy array, or dictionary.
Series objects support vectorized operations and provide various attributes and methods
for data manipulation.
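The three ways of creating a Series described above can be sketched as follows. This is a minimal illustration, not part of the original experiment: the variable names and sample values are made up, and it assumes pandas is imported under the usual alias pd.

```python
import numpy as np
import pandas as pd

# From a Python list: pandas assigns a default integer index 0, 1, 2
from_list = pd.Series([10, 20, 30])

# From a NumPy array, with explicit labels supplied through `index`
from_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"x": 100, "y": 200})

# Vectorized operation: every element is multiplied without an explicit loop
doubled = from_list * 2

print(from_array["b"])      # label-based access to one element
print(doubled.tolist())
print(list(from_dict.index))
```

Supplying an index is optional; when it is omitted, positions 0, 1, 2, ... serve as the labels.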
2. DataFrame:
A DataFrame is a two-dimensional labeled data structure with columns of potentially
different data types.
It is similar to a table in a relational database or a spreadsheet in Excel.
You can create a DataFrame using the pd.DataFrame() constructor, passing in data as a
dictionary, NumPy array, or another DataFrame.
DataFrames provide labeled rows and columns, making it easy to access and manipulate
data.
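As a rough sketch of the DataFrame constructor described above, the example below builds a DataFrame from each of the three input types. The column names and values are invented for illustration, and pandas is assumed to be imported as pd.

```python
import numpy as np
import pandas as pd

# From a dictionary: each key becomes a column, each list holds that column's values
from_dict = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],   # text column
    "marks": [82, 74, 91],               # integer column
})

# From a 2-D NumPy array, with column labels supplied explicitly
from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["p", "q"])

# From another DataFrame; copy=True requests an independent copy of the data
from_frame = pd.DataFrame(from_dict, copy=True)

print(from_dict.shape)              # (number of rows, number of columns)
print(from_dict.loc[1, "name"])     # row label 1, column label "name"
print(from_array["q"].tolist())     # selecting one column returns a Series
```

Note that the two columns of from_dict hold different data types, which is what distinguishes a DataFrame from a plain two-dimensional NumPy array.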
Key Features:
• Indexing and Selection: You can access individual elements, rows, or columns
of a Series or DataFrame using various indexing methods (e.g., integer
indexing, label indexing, boolean indexing).
• Data Alignment: Series and DataFrame objects automatically align data based
on their labels, making it easy to perform operations on data with different
indexes.
• Missing Data Handling: Pandas provides functions for detecting, removing, and
filling missing values in Series and DataFrame objects (dropna(), fillna(),
isnull(), notnull()).
• Grouping and Aggregation: Pandas supports grouping data by one or more keys
and applying aggregation functions (e.g., sum, mean, count) to compute
summary statistics (groupby(), agg()).
• Merging and Joining: DataFrames can be merged or joined on one or more keys,
similar to SQL join operations (merge(), join()), or stacked along an axis
with concat().
• Reshaping and Pivoting: Pandas provides functions for reshaping and pivoting
data between long and wide formats (stack(), unstack(), pivot_table()).
• Time Series Handling: Pandas has built-in support for working with time series
data, including date/time indexing, resampling, and frequency conversion
(DatetimeIndex, resample()).
• Plotting and Visualization: Data in Series and DataFrame objects can be
visualized directly with the plot() method, which uses the Matplotlib library
as its default backend.
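Several of the key features listed above can be tried out in a few lines. The sketch below covers data alignment, missing data handling, boolean indexing, grouping, and merging; all data, column names, and variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Data alignment: values are matched by label, not by position
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
total = s1 + s2          # "a" and "d" appear in only one Series, so they become NaN

# Missing data handling
missing_mask = total.isnull()    # True where a value is missing
filled = total.fillna(0)         # replace missing values with 0
dropped = total.dropna()         # discard missing values

# Boolean indexing: keep only the elements that satisfy a condition
large = filled[filled > 20]

# Grouping and aggregation: total and average amount per region
sales = pd.DataFrame({
    "region": ["N", "S", "N", "S"],
    "amount": [100, 150, 200, 50],
})
per_region = sales.groupby("region")["amount"].agg(["sum", "mean"])

# Merging: attach a manager to each sale, like a SQL LEFT JOIN on "region"
managers = pd.DataFrame({"region": ["N", "S"], "manager": ["Kiran", "Leela"]})
merged = sales.merge(managers, on="region", how="left")

print(total)
print(per_region)
print(merged)
```

The NaN values produced by s1 + s2 show why alignment and missing data handling usually go together: an operation on differently indexed objects keeps every label and marks the unmatched ones as missing.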