Select a slice of rows by label/index
Working with rows [inclusive-from : inclusive–to [ : step]]
df = df['a':'c'] # rows 'a' through 'c'
Get the row index and labels Trap: cannot work for integer labelled rows – see
idx = [Link] # get row index previous code snippet on integer position slicing.
label = [Link][0] # first row label
label = [Link][-1] # last row label Append a row of column totals to a DataFrame
l = [Link]() # get as a list # Option 1: use dictionary comprehension
a = [Link] # get as an array sums = {col: df[col].sum() for col in df}
sums_df = DataFrame(sums,index=['Total'])
Change the (row) index df = [Link](sums_df)
[Link] = idx # new ad hoc index
df = df.set_index('A') # col A new index # Option 2: All done with pandas
df = df.set_index(['A', 'B']) # MultiIndex df = [Link](DataFrame([Link](),
df = df.reset_index() # replace old w new columns=['Total']).T)
# note: old index stored as a col in df
[Link] = range(len(df)) # set with list Iterating over DataFrame rows
df = [Link](index=range(len(df))) for (index, row) in [Link](): # pass
df = df.set_index(keys=['r1','r2','etc']) Trap: row data type may be coerced.
[Link](index={'old':'new'}, inplace=True)
Sorting DataFrame rows values
Adding rows df = [Link]([Link][0],
df = original_df.append(more_rows_in_df) ascending=False)
Hint: convert row to a DataFrame and then append. [Link](['col1', 'col2'], inplace=True)
Both DataFrames should have same column labels.
Sort DataFrame by its row index
Dropping rows (by name) df.sort_index(inplace=True) # sort by row
df = [Link]('row_label') df = df.sort_index(ascending=False)
df = [Link](['row1','row2']) # multi-row
Random selection of rows
Boolean row selection by values in a column import random as r
df = df[df['col2'] >= 0.0] k = 20 # pick a number
df = df[(df['col3']>=1.0) | (df['col1']<0.0)] selection = [Link](range(len(df)), k)
df = df[df['col'].isin([1,2,5,7,11])] df_sample = [Link][selection, :] # get copy
df = df[~df['col'].isin([1,2,5,7,11])] Note: this randomly selected sample is not sorted
df = df[df['col'].[Link]('hello')]
Trap: bitwise "or", "and" “not; (ie. | & ~) co-opted to be Drop duplicates in the row index
Boolean operators on a Series of Boolean df['index'] = [Link] # 1 create new col
Trap: need parentheses around comparisons. df = df.drop_duplicates(cols='index',
take_last=True)# 2 use new col
Selecting rows using isin over multiple columns del df['index'] # 3 del the col
# fake up some data df.sort_index(inplace=True)# 4 tidy up
data = {1:[1,2,3], 2:[1,4,9], 3:[1,8,27]}
df = DataFrame(data) Test if two DataFrames have same row index
len(a)==len(b) and all([Link]==[Link])
# multi-column isin
lf = {1:[1, 3], 3:[8, 27]} # look for Get the integer position of a row or col index label
f = df[df[list(lf)].isin(lf).all(axis=1)] i = [Link].get_loc('row_label')
Trap: index.get_loc() returns an integer for a unique
Selecting rows using an index match. If not a unique match, may return a slice/mask.
idx = df[df['col'] >= 2].index
print([Link][idx]) Get integer position of rows that meet condition
a = [Link](df['col'] >= 2) #numpy array
Select a slice of rows by integer position
[inclusive-from : exclusive-to [: step]] Test if the row index values are unique/monotonic
start is 0; end is len(df)
if [Link].is_unique: pass # ...
df = df[:] # copy entire DataFrame b = [Link].is_monotonic_increasing
df = df[0:2] # rows 0 and 1 b = [Link].is_monotonic_decreasing
df = df[2:3] # row 2 (the third row)
df = df[-1:] # the last row
Find row index duplicates
df = df[:-1] # all but the last row
if [Link].has_duplicates:
df = df[::2] # every 2nd row (0 2 ..)
print([Link]())
Trap: a single integer without a colon is a column label
Note: also similar for column label duplicates.
for integer numbered columns.
Version 30 April 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
4