Lecture - 2 Pandas

The document discusses handling missing data in pandas, including methods for inserting, dropping, and filling missing values, as well as performing calculations that account for these values. It also covers various functions for summarizing data, such as describe(), idxmin(), idxmax(), and value_counts(), along with string processing methods that automatically exclude missing values. Additionally, it explains merging, joining, and concatenating DataFrames, highlighting the similarities to SQL operations.

Uploaded by

Rupal Gayakwad
LECTURE -2

PANDAS
Working With Missing Data
Values considered “missing”
pandas uses different sentinel values to represent a missing value (also referred to as NA), depending on the data type.
numpy.nan is used for NumPy data types. The disadvantage of using NumPy data types is that the original data type will be coerced to np.float64 or object.
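A minimal sketch of the coercion described above, using made-up values: an integer Series that contains a missing value ends up stored as float64, because numpy.nan is itself a float.

```python
import numpy as np
import pandas as pd

# Without missing values the Series keeps an integer dtype
ints = pd.Series([1, 2, 3])

# Adding a missing value forces the data to float64 so NaN can be stored
with_na = pd.Series([1, 2, np.nan])

print(ints.dtype)     # an integer dtype such as int64
print(with_na.dtype)  # float64
```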
Inserting Missing Data

You can insert missing values by simply assigning to a Series or DataFrame. The missing value sentinel used will be chosen based on the dtype.
Example
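The original slide example is not available in this text. As a stand-in, here is a small sketch with illustrative data showing that the sentinel depends on the dtype: NaN for a float Series, NaT for a datetime Series.

```python
import numpy as np
import pandas as pd

# Float Series: assigning None stores NaN
ser = pd.Series([1.0, 2.0, 3.0])
ser.loc[0] = None

# Datetime Series: assigning np.nan stores NaT (not-a-time)
dates = pd.Series([pd.Timestamp("2021-01-01"), pd.Timestamp("2021-01-02")])
dates.iloc[0] = np.nan

print(ser)
print(dates)
```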
Calculations With Missing Data
Missing values propagate through arithmetic operations between pandas objects.
When summing data, NA values or empty data will be treated as zero.
When taking the product, NA values or empty data will be treated as 1.
Cumulative methods like cumsum() and cumprod() ignore NA values by default but preserve them in the result. This behavior can be changed with the skipna parameter.
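The rules above can be checked on a small example. The Series names and values here are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([10.0, 20.0, 30.0])

# Arithmetic: the missing value propagates to the result
total = a + b                    # 11.0, NaN, 33.0

# Reductions: NA is treated as 0 for sum and as 1 for prod
a_sum = a.sum()                  # 4.0
a_prod = a.prod()                # 3.0

# Empty data follows the same convention
empty = pd.Series([], dtype="float64")
empty_sum = empty.sum()          # 0.0
empty_prod = empty.prod()        # 1.0

# Cumulative methods skip NA but keep it in place in the result
cs = a.cumsum()                  # 1.0, NaN, 4.0

# With skipna=False, everything after the first NA becomes NA
cs_strict = a.cumsum(skipna=False)  # 1.0, NaN, NaN
```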
Dropping Missing Data
dropna() drops rows or columns with missing data.
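A short sketch of both directions, using a hypothetical two-column DataFrame. By default dropna() removes rows; passing axis=1 removes columns instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, 6.0]})

# Drop every row that contains at least one missing value
rows_kept = df.dropna()

# Drop every column that contains at least one missing value
cols_kept = df.dropna(axis=1)

print(rows_kept)
print(cols_kept)
```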
Filling Missing Data
Filling by value
fillna() replaces NA values with non-NA data.
Replace NA with a scalar value
Fill gaps forward or backward
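Both approaches can be sketched as follows, with illustrative data. The sketch uses the ffill() and bfill() methods for forward and backward filling, since recent pandas versions deprecate the method= argument of fillna().

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0])

# Replace NA with a scalar value
filled_zero = s.fillna(0)   # 0, 1, 0, 0, 4

# Fill gaps forward: carry the last valid value onward
filled_fwd = s.ffill()      # NaN, 1, 1, 1, 4

# Fill gaps backward: use the next valid value
filled_bwd = s.bfill()      # 1, 1, 4, 4, 4
```

Note that a forward fill leaves a leading NA untouched because there is no earlier value to carry forward.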
Boolean Indexing
Select rows where df.A is greater than 0.
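A minimal sketch of this selection, assuming a small DataFrame whose column names and values are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [-1.0, 2.0, -3.0, 4.0], "B": ["w", "x", "y", "z"]})

# df["A"] > 0 is a boolean Series; indexing with it keeps the True rows
positive = df[df["A"] > 0]

print(positive)
```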
Boolean Indexing

Using isin() method for filtering:
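As a sketch with hypothetical data, isin() builds a boolean mask that is True wherever a value belongs to the given list:

```python
import pandas as pd

df = pd.DataFrame({"A": [-1.0, 2.0, -3.0, 4.0], "B": ["w", "x", "y", "z"]})

# Keep only the rows whose B value is one of the listed labels
selected = df[df["B"].isin(["w", "z"])]

print(selected)
```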


Stats
Operations in general exclude missing data.
Calculate the mean value for each column:
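For instance, with an illustrative DataFrame containing one missing value, the mean of column A is computed over the two present values only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [2.0, 4.0, 6.0]})

# Mean of each column; the NaN in A is excluded from the calculation
col_means = df.mean()        # A: 2.0, B: 4.0

# Mean of each row instead
row_means = df.mean(axis=1)  # 1.5, 4.0, 4.5
```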
User Defined Functions
DataFrame.agg() and DataFrame.transform() apply a user-defined function that reduces or broadcasts its result, respectively.
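The difference between reducing and broadcasting can be sketched with two simple lambdas. The data and the functions are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 40.0]})

# agg(): each column is reduced to a single value (here, its range)
spread = df.agg(lambda col: col.max() - col.min())

# transform(): the result has the same shape as the input
doubled = df.transform(lambda col: col * 2)

print(spread)
print(doubled)
```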
Summarizing Data: describe
There is a convenient describe() function which computes a
variety of summary statistics about a Series or the columns of
a DataFrame (excluding NAs of course):
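A brief sketch on a Series with one missing value (illustrative data). The count entry confirms that the NA was excluded.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0])

# Summary statistics: count, mean, std, min, quartiles, max
summary = s.describe()

print(summary)
```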
Index Of Min/Max Values
The idxmin() and idxmax() functions on Series and DataFrame
compute the index labels with the minimum and maximum
corresponding values:
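A small sketch with labeled data (labels and values are made up). On a Series the result is a single label; on a DataFrame it is one label per column.

```python
import pandas as pd

s = pd.Series([3.0, 1.0, 2.0], index=["a", "b", "c"])
min_label = s.idxmin()   # "b"
max_label = s.idxmax()   # "a"

df = pd.DataFrame({"X": [3.0, 1.0, 2.0], "Y": [0.5, 0.7, 0.1]},
                  index=["a", "b", "c"])
col_min_labels = df.idxmin()   # X: "b", Y: "c"
```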
Value Counts (Histogramming) / Mode
The value_counts() Series method computes a histogram of a 1D
array of values. It can also be used as a function on regular
arrays:
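A sketch with illustrative integers. For a plain NumPy array, this example wraps the array in a Series first, on the assumption that a recent pandas version is in use, where the top-level function form is deprecated.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 1, 2, 1, 0])

# Count how often each distinct value occurs, most frequent first
counts = s.value_counts()

# The mode is the most frequent value (returned as a Series)
most_common = s.mode()

# Regular array: wrap it in a Series to use the same method
arr = np.array([5, 5, 7])
arr_counts = pd.Series(arr).value_counts()
```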
Value Counts (Histogramming) / Mode

The value_counts() method can be used to count combinations across multiple columns.
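A minimal sketch using DataFrame.value_counts() on a hypothetical two-column frame. The result is indexed by the distinct (a, b) combinations.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["x", "x", "y", "z"]})

# Count each distinct row, i.e. each combination of a and b
combo_counts = df.value_counts()

print(combo_counts)
```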
String Methods
Series and Index are equipped with a set of string processing
methods that make it easy to operate on each element of the
array. Perhaps most importantly, these methods exclude
missing/NA values automatically. These are accessed via the
str attribute and generally have names matching the
equivalent (scalar) built-in string methods:
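A short sketch with illustrative strings, one of which is missing. The missing entry stays missing instead of raising an error.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Apple", "banana", np.nan, "Cherry"])

# Element-wise equivalents of str.lower() and len()
lowered = s.str.lower()
lengths = s.str.len()

# Boolean test on each element; the missing entry is not True
starts_with_b = s.str.startswith("b")

print(lowered)
print(lengths)
```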
Merge, Join, Concatenate
• concat():
The concat() function concatenates an arbitrary number of Series or DataFrame objects along an axis while performing optional set logic (union or intersection) of the indexes on the other axes. Like numpy.concatenate, concat() takes a list or dict of homogeneously-typed objects and concatenates them.
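A basic sketch that stacks two small DataFrames along the row axis. The frames are illustrative; ignore_index=True is an optional choice here to get a fresh 0..n-1 index.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Stack rows; the original index labels (0, 1, 0, 1) are kept
stacked = pd.concat([df1, df2])

# Same data, but with a new consecutive index
stacked_reset = pd.concat([df1, df2], ignore_index=True)
```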
Merge, Join, Concatenate
• Joining logic of the resulting axis
The join keyword specifies how to handle axis values that don’t
exist in the first DataFrame.
join='outer' takes the union of all axis values
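The effect of the join keyword can be sketched by concatenating two frames side by side (axis=1) whose indexes only partly overlap. The data is made up; join='inner', which keeps the intersection, is shown for contrast.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
df4 = pd.DataFrame({"D": ["D1", "D2", "D3"]}, index=[1, 2, 3])

# Default join='outer': union of the row labels, gaps filled with NaN
outer = pd.concat([df1, df4], axis=1, join="outer")

# join='inner': only the row labels present in both frames
inner = pd.concat([df1, df4], axis=1, join="inner")
```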
Merge, Join, Concatenate

• merge():
merge() implements common SQL-style join operations, similar to those of relational databases.
one-to-one: joining two DataFrame objects on their indexes, which must contain unique values.
many-to-one: joining a unique index to one or more columns in a different DataFrame.
many-to-many: joining columns on columns.
EXAMPLE
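The original slide example is not available in this text. As a stand-in, here is a sketch that merges two hypothetical frames on a shared key column and varies the how argument, which corresponds to the SQL join type.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2"], "A": ["A0", "A1", "A2"]})
right = pd.DataFrame({"key": ["K0", "K1", "K3"], "B": ["B0", "B1", "B3"]})

# Default how='inner': only keys found in both frames (K0, K1)
inner = pd.merge(left, right, on="key")

# how='left': every key from the left frame; B is NaN for K2
left_join = pd.merge(left, right, on="key", how="left")

# how='outer': union of the keys from both frames
outer = pd.merge(left, right, on="key", how="outer")
```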
DataFrame.join()
DataFrame.join() combines the columns of multiple, potentially differently-indexed DataFrame objects into a single result DataFrame.
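A minimal sketch with two illustrative frames that share only some index labels. By default join() keeps the index of the calling frame (a left join).

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=["K0", "K1", "K2"])
right = pd.DataFrame({"B": ["B0", "B2", "B3"]}, index=["K0", "K2", "K3"])

# Left join on the index: K1 has no match, so B is NaN there
joined = left.join(right)

# Keep only the index labels present in both frames
joined_inner = left.join(right, how="inner")
```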
