
Pandas DataFrame Notes - 12pages-Pages-11

The document discusses handling missing and non-finite data in pandas, highlighting the use of constructs like np.nan and None to represent missing values. It also covers working with categorical data, including creating, ordering, and renaming categories, as well as converting between data types. Key operations such as filling missing values, dropping rows or columns with NaN, and managing infinite numbers are also addressed.

Uploaded by

Sàazón Kasula
Copyright
© © All Rights Reserved

Working with missing and non-finite data

Working with missing data
Pandas uses the not-a-number construct (np.nan and float('nan')) to indicate missing data. The Python None can arise in data as well. It is also treated as missing data, as is the pandas not-a-time construct (pd.NaT).

Missing data in a Series
# assumed throughout: import numpy as np; from pandas import Series
s = Series([8, None, float('nan'), np.nan])
# [8, NaN, NaN, NaN]
s.isnull()   # [False, True, True, True]
s.notnull()  # [True, False, False, False]
s.fillna(0)  # [8, 0, 0, 0]

Missing data in a DataFrame
df = df.dropna()           # drop all rows with NaN
df = df.dropna(axis=1)     # same for cols
df = df.dropna(how='all')  # drop rows that are all NaN
df = df.dropna(thresh=2)   # keep rows with 2+ non-NaN values
# only drop row if NaN in a specified col
df = df.dropna(subset=['col'])

Recoding missing data
df.fillna(0, inplace=True)  # np.nan -> 0
s = df['col'].fillna(0)     # np.nan -> 0
# whitespace-only cells -> np.nan
# (the looser pattern r'\s+' also matches cells that merely contain white space)
df = df.replace(r'^\s*$', np.nan, regex=True)

Non-finite numbers
With floating point numbers, pandas provides for positive and negative infinity.
s = Series([float('inf'), float('-inf'), np.inf, -np.inf])
Pandas treats integer comparisons with plus or minus infinity as expected.

Testing for finite numbers
(using the data from the previous example)
b = np.isfinite(s)

Working with Categorical Data

Categorical data
The pandas Series has an R factors-like data type for encoding categorical data.
s = Series(['a','b','a','c','b','d','a'], dtype='category')
df['B'] = df['A'].astype('category')
Note: the key here is to specify the "category" data type.
Note: the category labels are sorted on creation if they are sortable, but the categorical itself is unordered by default. See ordering below.

Convert back to the original data type
s = Series(['a','b','a','c','b','d','a'], dtype='category')
s = s.astype('string')  # or s.astype(str) for plain Python strings

Ordering, reordering and sorting
s = Series(list('abc'), dtype='category')
print(s.cat.ordered)
s = s.cat.reorder_categories(['b','c','a'])
s = s.sort_values()       # Series.sort() is no longer available
s = s.cat.as_unordered()  # replaces assignment to s.cat.ordered
Trap: a categorical must be ordered for min(), max() and order comparisons (<, >); sorting follows the category order in either case.

Renaming categories
s = Series(list('abc'), dtype='category')
s.cat.categories = [1, 2, 3]  # in place (pandas < 2.0 only)
s = s.cat.rename_categories([4,5,6])
# using a comprehension ...
s = s.cat.rename_categories(['Group ' + str(i) for i in s.cat.categories])
Trap: categories must be uniquely named

Adding new categories
s = s.cat.add_categories([4])

Removing categories
s = s.cat.remove_categories([4])
s = s.cat.remove_unused_categories()  # returns a new Series, not in place
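
The categorical operations in these notes can be tried end to end with a short script. What follows is a minimal sketch, assuming a recent pandas release; the sample values and variable names are illustrative choices rather than part of the original notes.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series(['a', 'b', 'a', 'c', 'b', 'd', 'a'], dtype='category')

labels_sorted = list(s.cat.categories)  # category labels as created
is_ordered = s.cat.ordered              # whether the categorical is ordered

# Reorder the labels and mark the categorical as ordered
s2 = s.cat.reorder_categories(['d', 'c', 'b', 'a'], ordered=True)
sorted_values = list(s2.sort_values())  # sorted by category order

# Rename, add, then remove categories; each call returns a new Series
renamed = s.cat.rename_categories(['Group ' + c for c in s.cat.categories])
with_extra = s.cat.add_categories(['e'])
trimmed = with_extra.cat.remove_unused_categories()

print(labels_sorted, is_ordered)
print(sorted_values)
print(list(renamed.cat.categories))
print(list(trimmed.cat.categories))
```

Each `.cat` method here returns a new Series, so the result has to be assigned; nothing is changed in place.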

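The missing-data operations covered in these notes can be compared side by side on a small frame. This is a sketch under stated assumptions: the column names `x`, `y`, `z` and the values are hypothetical, and the calls are written for a recent pandas release.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; column names and values are illustrative only
df = pd.DataFrame({
    'x': [1.0, np.nan, 3.0, np.nan],
    'y': [np.nan, np.nan, 6.0, 8.0],
    'z': ['a', ' ', 'c', 'd'],
})

rows_complete = df.dropna()       # rows with no NaN at all
rows_thresh = df.dropna(thresh=2) # rows with at least 2 non-NaN values
rows_x = df.dropna(subset=['x'])  # rows where column 'x' is present

# Turn whitespace-only strings into NaN, then count the gaps per column
cleaned = df.replace(r'^\s*$', np.nan, regex=True)
missing_per_column = cleaned.isnull().sum()

# Infinity is not counted as missing, so it is tested for separately
s = pd.Series([1.0, np.inf, -np.inf, np.nan])
finite_mask = np.isfinite(s)

print(rows_thresh)
print(missing_per_column)
print(finite_mask)
```

Note that `thresh` counts non-missing values per row, so `thresh=2` keeps any row with at least two of them.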
Version 30 April 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
