Pandas Notes

Uploaded by santanusb9
Copyright © All Rights Reserved

Pandas – Detailed Notes

Introduction to Pandas
• Pandas is a fast, powerful, flexible, and easy-to-use open-source Python library for data analysis
and data manipulation.
• Built on top of NumPy.

Main Data Structures


1. Series – 1D labeled array capable of holding any data type (integers, strings, floating-point
numbers, Python objects, etc.).
2. DataFrame – 2D labeled data structure with columns of potentially different types (like a
spreadsheet or SQL table).
3. Index – Immutable sequence used for indexing and aligning data.
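Here is a minimal sketch of the three structures. The labels, column names and numbers are made up for illustration:

```python
import pandas as pd

# Series: 1D values paired with an index of labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="score")

# DataFrame: 2D table whose columns can hold different dtypes
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],
        "age": [28, 35, 42],
        "height_m": [1.65, 1.80, 1.75],
    },
    index=["r1", "r2", "r3"],
)

print(s["b"])      # look up a Series value by its label
print(df.dtypes)   # one dtype per column
print(df.index)    # the row labels are themselves an Index object

# Index objects are immutable: item assignment raises TypeError
try:
    df.index[0] = "r0"
except TypeError as exc:
    print("Index is immutable:", exc)
```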

Key Features
• Handling of missing data.
• Size mutability – columns can be inserted into and deleted from a DataFrame.
• Automatic and explicit data alignment.
• Powerful group-by functionality for split-apply-combine operations.
• High-performance merging and joining of data sets.
• Time series functionality.
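The sketch below illustrates automatic alignment and size mutability. The Series `a` and `b` and their labels are hypothetical examples:

```python
import pandas as pd

# Two Series whose labels only partly overlap
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Arithmetic aligns on labels, not positions; a label present in only
# one operand produces NaN (missing) in the result
total = a + b
print(total)

# Building a DataFrame from the Series aligns them on the union of labels
df = pd.DataFrame({"a": a, "b": b})

# Size mutability: insert a new column, then delete an existing one
df["sum"] = df["a"] + df["b"]
df = df.drop(columns=["b"])
print(df)
```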

Commonly Used Functions


• Data Loading: read_csv(), read_excel(), read_sql(), read_json(), read_html().
• Data Export: to_csv(), to_excel(), to_sql(), to_json().
• Data Selection: loc[], iloc[], at[], iat[].
• Data Cleaning: dropna(), fillna(), replace().
• Data Transformation: apply(), map(), astype().
• Grouping: groupby(), aggregate(), transform().
• Combining Data: merge(), join(), concat().
• Reshaping: pivot_table(), stack(), unstack(), melt().
• Descriptive Statistics: describe(), mean(), median(), std(), value_counts().
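The following rough illustration shows several of these functions working together. It loads a tiny, invented sales table from a CSV string, then merges, groups, reshapes and summarizes it. The data and the column names (`region`, `product`, `units`, `price`) are assumptions made for the example:

```python
import io

import pandas as pd

# Data loading: read_csv accepts any file-like object, here an in-memory string
csv_text = """region,product,units
North,A,10
North,B,5
South,A,7
South,A,3
South,B,8
"""
sales = pd.read_csv(io.StringIO(csv_text))
prices = pd.DataFrame({"product": ["A", "B"], "price": [2.5, 4.0]})

# Combining data: SQL-style left join on the shared "product" column
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

# Grouping: split by region, apply sum, combine into one Series
by_region = merged.groupby("region")["revenue"].sum()

# transform() returns a result aligned to the original rows
merged["region_total"] = merged.groupby("region")["revenue"].transform("sum")

# Reshaping: long -> wide with pivot_table, then wide -> long with melt
wide = merged.pivot_table(index="region", columns="product",
                          values="units", aggfunc="sum", fill_value=0)
long = wide.reset_index().melt(id_vars="region", var_name="product",
                               value_name="units")

# Descriptive statistics
print(by_region)
print(wide)
print(merged["units"].describe())
print(merged["product"].value_counts())
```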

Indexing and Selection


• Label-based selection with loc.
• Integer-position-based selection with iloc.
• Boolean indexing for filtering data.
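The sketch below contrasts the selection methods on a hypothetical four-row table. A label slice with `loc` includes its end label, while a position slice with `iloc` excludes its end position:

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune", "Rome"],
     "temp_c": [4, 19, 31, 15]},
    index=["a", "b", "c", "d"],
)

# loc: select by label; label slices include the end label
print(df.loc["b", "temp_c"])       # single value by row/column label
print(df.loc["b":"c", ["city"]])   # rows b through c, inclusive

# iloc: select by integer position; slices exclude the end position
print(df.iloc[0, 1])               # first row, second column
print(df.iloc[1:3])                # positions 1 and 2 only

# Boolean indexing: keep the rows where the condition is True
warm = df[df["temp_c"] > 15]
print(warm)

# at / iat: fast scalar access by label / by position
print(df.at["d", "city"], df.iat[3, 0])
```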

Handling Missing Data


• dropna(): Drop missing values.
• fillna(): Fill missing values with a specified value (use ffill() or bfill() to propagate neighboring values).
• interpolate(): Fill missing values using interpolation.
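Here are the three approaches on a toy Series with gaps. The values are made up. The sketch also shows `ffill()` carrying the previous value forward and a per-column `fillna()` on a small hypothetical DataFrame:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

dropped = s.dropna()         # remove missing entries (original labels are kept)
filled = s.fillna(0)         # replace missing entries with a constant
forward = s.ffill()          # propagate the last valid value forward
interp = s.interpolate()     # default method: linear interpolation

# fillna also accepts a dict mapping column name -> fill value
df = pd.DataFrame({"a": [1, None, 3], "b": [None, "x", None]})
clean = df.fillna({"a": df["a"].mean(), "b": "missing"})
print(clean)
```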

Time Series and Date Functionality


• Pandas has robust support for working with time series data, including date_range(),
to_datetime(), resample(), shifting with shift(), and rolling windows with rolling().
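Below is a minimal sketch of these tools on six invented daily observations. The start date, the frequency strings and the window size are arbitrary choices:

```python
import numpy as np
import pandas as pd

# Six consecutive daily timestamps with values 1.0 .. 6.0
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series(np.arange(1, 7, dtype=float), index=idx)

# Parse strings into datetime values
dates = pd.to_datetime(["2024-01-01", "2024-01-15"])

# Downsample into 2-day buckets, summing the values in each bucket
two_day = ts.resample("2D").sum()

# Shift values forward one period (the first value becomes NaN)
lagged = ts.shift(1)

# 3-day rolling mean (the first two windows are incomplete -> NaN)
rolling_mean = ts.rolling(window=3).mean()

print(two_day)
print(rolling_mean)
```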

Visualization
• DataFrame.plot() uses Matplotlib as its default plotting backend to plot data directly from pandas.

Integration with Other Libraries


• Interoperates with NumPy (arrays), Matplotlib (plotting), SQLAlchemy (SQL databases), openpyxl
(Excel files), PyArrow (Parquet and Arrow data), and more.
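One possible illustration of this interoperability follows. The sketch converts a column to a NumPy array and round-trips a DataFrame through an in-memory SQLite database using Python's built-in `sqlite3` module. pandas accepts a `sqlite3` connection directly, whereas most other databases would be reached through an SQLAlchemy engine. The table name `scores` and its data are hypothetical:

```python
import sqlite3

import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 0.8, 0.9]})

# NumPy: columns convert to arrays, and NumPy ufuncs apply element-wise
arr = df["score"].to_numpy()
logs = np.log(df["score"])   # the result is still a pandas Series

# SQL: write the DataFrame to a table, then query it back
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
back = pd.read_sql("SELECT id, score FROM scores WHERE score > 0.6", conn)
conn.close()

print(back)
```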

Summary
Pandas is the go-to library in Python for data manipulation, cleaning, analysis, and preparation. It
integrates with numerous libraries for visualization and file handling, making it an essential tool for
data scientists and analysts.
