Pandas DataFrame Cheat Sheet Guide

Get a DataFrame from a SQL database
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://'
    + 'USER:PASSWORD@HOST/DATABASE')
df = pd.read_sql_table('table', engine)

Series of data
Series object: an ordered, one-dimensional array of
data with an index. All the data in a Series is of the
same data type. Series arithmetic is vectorised after
first aligning the Series index for each of the operands.
s1 = Series(range(0,4)) # -> 0, 1, 2, 3
s2 = Series(range(1,5)) # -> 1, 2, 3, 4
s3 = s1 + s2 # -> 1, 3, 5, 7

Get a DataFrame from a Python dictionary
# default --- assume data is in columns
df = DataFrame({
    'col0' : [1.0, 2.0, 3.0, 4.0],
    'col1' : [100, 200, 300, 400]
})
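A minimal sketch of the index alignment described above, runnable as-is (only pandas is assumed; the labelled Series `sa`/`sb` are made up for the example):

```python
import pandas as pd

# matching integer indexes: element-wise addition
s1 = pd.Series(range(0, 4))   # 0, 1, 2, 3
s2 = pd.Series(range(1, 5))   # 1, 2, 3, 4
s3 = s1 + s2                  # 1, 3, 5, 7

# mismatched indexes: labels present in only one
# operand produce NaN in the result
sa = pd.Series([10, 20], index=['a', 'b'])
sb = pd.Series([1, 2], index=['b', 'c'])
sc = sa + sb                  # a: NaN, b: 21.0, c: NaN
```

The alignment step means the result index is the union of the operand indexes, which is why the unmatched labels come back as NaN rather than raising an error.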
Version 15 January 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
Get a DataFrame from data in a Python dictionary
# --- use helper method for data in rows
df = DataFrame.from_dict({ # data by row
    # rows as python dictionaries
    'row0' : {'col0':0, 'col1':'A'},
    'row1' : {'col0':1, 'col1':'B'}
}, orient='index')

df = DataFrame.from_dict({ # data by row
    # rows as python lists
    'row0' : [1, 1+1j, 'A'],
    'row1' : [2, 2+2j, 'B']
}, orient='index')

Create play/fake data (useful for testing)
# --- simple - default integer indexes
df = DataFrame(np.random.rand(50,5))

# --- with a time-stamp row index:
df = DataFrame(np.random.rand(500,5))
df.index = pd.date_range('1/1/2005',
    periods=len(df), freq='M')

# --- with alphabetic row and col indexes
# and a "groupable" variable
import string
import random
r = 52 # note: min r is 1; max r is 52
c = 5
df = DataFrame(np.random.randn(r, c),
    columns = ['col'+str(i) for i in range(c)],
    index = list((string.ascii_uppercase +
        string.ascii_lowercase)[0:r]))
df['group'] = list(''.join(
    random.choice('abcde') for _ in range(r)))

Working with the whole DataFrame

Peek at the DataFrame contents/structure
df.info() # index & data types
dfh = df.head(i) # get first i rows
dft = df.tail(i) # get last i rows
dfs = df.describe() # summary stats cols
top_left_corner_df = df.iloc[:4, :4]

DataFrame non-indexing attributes
dfT = df.T # transpose rows and cols
l = df.axes # list of row and col indexes
(r_idx, c_idx) = df.axes # from above
s = df.dtypes # Series column data types
b = df.empty # True for empty DataFrame
i = df.ndim # number of axes (it is 2)
t = df.shape # (row-count, column-count)
i = df.size # row-count * column-count
a = df.values # get a numpy array for df

DataFrame utility methods
df = df.copy() # copy a DataFrame
df = df.rank() # rank each col (default)
df = df.sort_values(by=col)
df = df.sort_values(by=[col1, col2])
df = df.sort_index()
df = df.astype(dtype) # type conversion

DataFrame iteration methods
df.iteritems() # (col-index, Series) pairs
df.iterrows() # (row-index, Series) pairs
# example ... iterating over columns
for (name, series) in df.iteritems():
    print('Col name: ' + str(name))
    print('First value: ' +
        str(series.iloc[0]) + '\n')

Apply numpy mathematical functions to columns
df['log_data'] = np.log(df['col1'])
Note: many more numpy math functions exist.
Hint: Prefer pandas math over numpy where you can.
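A quick, runnable check of the vectorised numpy call above (the column name and values are illustrative only):

```python
import numpy as np
import pandas as pd

# values chosen so the expected logs are 0, 1, 2
df = pd.DataFrame({'col1': [1.0, np.e, np.e ** 2]})

# numpy ufuncs apply element-wise over the column
df['log_data'] = np.log(df['col1'])
```

The same result is available through pandas itself (e.g. `df['col1'].apply(np.log)`), but the direct ufunc call is the idiomatic vectorised form.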
Working with rows

Get the row index and labels
idx = df.index # get row index
label = df.index[0] # first row label
label = df.index[-1] # last row label
l = df.index.tolist() # get as a list
a = df.index.values # get as an array

Change the (row) index
df.index = idx # new ad hoc index
df = df.set_index('A') # col A new index
df = df.set_index(['A', 'B']) # MultiIndex
df = df.reset_index() # replace old w new
# note: old index stored as a col in df
df.index = range(len(df)) # set with list
df = df.reindex(index=range(len(df)))
df = df.set_index(keys=['r1','r2','etc'])
df.rename(index={'old':'new'},
    inplace=True)

Adding rows
df = original_df.append(more_rows_in_df)
Hint: convert row to a DataFrame and then append.
Both DataFrames should have same column labels.

Dropping rows (by name)
df = df.drop('row_label')
df = df.drop(['row1','row2']) # multi-row

Boolean row selection by values in a column
df = df[df['col2'] >= 0.0]
df = df[(df['col3']>=1.0) |
    (df['col1']<0.0)]
df = df[df['col'].isin([1,2,5,7,11])]
df = df[~df['col'].isin([1,2,5,7,11])]
df = df[df['col'].str.contains('hello')]
Trap: bitwise "or", "and" and "not" (ie. | & ~) are co-opted to be Boolean operators on a Series of Booleans.
Trap: need parentheses around comparisons.

Selecting rows using isin over multiple columns
# fake up some data
data = {1:[1,2,3], 2:[1,4,9], 3:[1,8,27]}
df = DataFrame(data)

# multi-column isin
lf = {1:[1, 3], 3:[8, 27]} # look for
f = df[df[list(lf)].isin(lf).all(axis=1)]

Selecting rows using an index
idx = df[df['col'] >= 2].index
print(df.loc[idx])

Select a slice of rows by integer position
[inclusive-from : exclusive-to [: step]]
default start is 0; default end is len(df)
df = df[:] # copy DataFrame
df = df[0:2] # rows 0 and 1
df = df[-1:] # the last row
df = df[2:3] # row 2 (the third row)
df = df[:-1] # all but the last row
df = df[::2] # every 2nd row (0 2 ..)
Trap: a single integer without a colon is a column label for integer numbered columns.

Select a slice of rows by label/index
[inclusive-from : inclusive-to [: step]]
df = df['a':'c'] # rows 'a' through 'c'
Trap: doesn't work on integer labelled rows

Append a row of column totals to a DataFrame
# Option 1: use dictionary comprehension
sums = {col: df[col].sum() for col in df}
sums_df = DataFrame(sums, index=['Total'])
df = df.append(sums_df)

# Option 2: All done with pandas
df = df.append(DataFrame(df.sum(),
    columns=['Total']).T)

Iterating over DataFrame rows
for (index, row) in df.iterrows(): # pass
Trap: row data type may be coerced.

Sorting DataFrame rows values
df = df.sort_values(df.columns[0],
    ascending=False)
df.sort_values(['col1', 'col2'],
    inplace=True)

Sort DataFrame by its row index
df.sort_index(inplace=True) # sort by row
df = df.sort_index(ascending=False)

Random selection of rows
import random as r
k = 20 # pick a number
selection = r.sample(range(len(df)), k)
df_sample = df.iloc[selection, :]
Note: this sample is not sorted

Drop duplicates in the row index
df['index'] = df.index # 1 create new col
df = df.drop_duplicates(subset='index',
    keep='last') # 2 use new col
del df['index'] # 3 del the col
df.sort_index(inplace=True) # 4 tidy up

Test if two DataFrames have same row index
len(a)==len(b) and all(a.index==b.index)

Get the integer position of a row or col index label
i = df.index.get_loc('row_label')
Trap: index.get_loc() returns an integer for a unique match. If not a unique match, it may return a slice or mask.

Get integer position of rows that meet condition
a = np.where(df['col'] >= 2) # numpy array

Test if the row index values are unique/monotonic
if df.index.is_unique: pass # ...
b = df.index.is_monotonic_increasing
b = df.index.is_monotonic_decreasing

Find row index duplicates
if df.index.has_duplicates:
    print(df.index.duplicated())
Note: also similar for column label duplicates.
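The parentheses trap in Boolean row selection is easy to demonstrate with a tiny sketch (column names are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({'col1': [-1.0, 0.5, 2.0],
                   'col3': [0.0, 1.5, 0.5]})

# parentheses are required around each comparison,
# because | binds more tightly than >= and <
sel = df[(df['col3'] >= 1.0) | (df['col1'] < 0.0)]
# rows 0 and 1 match; row 2 matches neither condition
```

Without the parentheses, Python parses `1.0 | df['col1']` first and raises a TypeError, which is why the Trap above insists on them.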
Working with cells

Selecting a cell by row and column labels
value = df.at['row', 'col']
value = df.loc['row', 'col']
value = df['col'].at['row'] # tricky
Note: .at[] fastest label based scalar lookup

Setting a cell by row and column labels
df.at['row', 'col'] = value
df.loc['row', 'col'] = value
df['col'].at['row'] = value # tricky

Selecting and slicing on labels
df = df.loc['row1':'row3', 'col1':'col3']
Note: the "to" on this slice is inclusive.

Setting a cross-section by labels
df.loc['A':'C', 'col1':'col3'] = np.nan
df.loc[1:2,'col1':'col2'] = np.zeros((2,2))
df.loc[1:2,'A':'C'] = othr.loc[1:2,'A':'C']
Remember: inclusive "to" in the slice

Selecting a cell by integer position
value = df.iat[9, 3] # [row, col]
value = df.iloc[0, 0] # [row, col]
value = df.iloc[len(df)-1,
    len(df.columns)-1]

Selecting a range of cells by int position
df = df.iloc[2:4, 2:4] # subset of the df
df = df.iloc[:5, :5] # top left corner
s = df.iloc[5, :] # returns row as Series
df = df.iloc[5:6, :] # returns row as row
Note: exclusive "to" – same as python list slicing.

Summary: selecting using the DataFrame index

Using the DataFrame index to select columns
s = df['col_label'] # returns Series
df = df[['col_label']] # return DataFrame
df = df[['L1', 'L2']] # select with list
df = df[index] # select with index
df = df[s] # select with Series
Note: the difference in return type in the first two examples above is based on argument type (scalar vs list).

Using the DataFrame index to select rows
df = df['from':'inc_to'] # label slice
df = df[3:7] # integer slice
df = df[df['col'] > 0.5] # Boolean Series
df = df.loc['label'] # single label
df = df.loc[container] # lab list/Series
df = df.loc['from':'to'] # inclusive slice
df = df.loc[bs] # Boolean Series
df = df.iloc[0] # single integer
df = df.iloc[container] # int list/Series
df = df.iloc[0:5] # exclusive slice
df = df.ix[x] # loc then iloc

Using the DataFrame index to select a cross-section
# r and c can be scalar, list, slice
df.loc[r, c] # label accessor (row, col)
df.iloc[r, c] # integer accessor
df.ix[r, c] # label access int fallback
df[c].iloc[r] # chained – also for .loc
Note: .ix is deprecated in recent pandas – prefer .loc or .iloc.

Using the DataFrame index to select a cell
# r and c must be label or integer
df.at[r, c] # fast scalar label accessor
df.iat[r, c] # fast scalar int accessor
df[c].iat[r] # chained – also for .at
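The inclusive-`.loc` versus exclusive-`.iloc` slicing rule above can be verified with a small sketch (the index labels and column are made up):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4]},
                  index=['a', 'b', 'c', 'd'])

by_label = df.loc['a':'c', :]   # label 'c' IS included
by_pos = df.iloc[0:2, :]        # position 2 is NOT included
```

So the label slice returns three rows while the integer slice returns two, even though both slices look superficially similar.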
Joining/Combining DataFrames

Three ways to join two DataFrames:
• merge (a database/SQL-like join operation)
• concat (stack side by side or one on top of the other)
• combine_first (splice the two together, choosing values from one over the other)

Merge on indexes
df_new = pd.merge(left=df1, right=df2,
    how='outer', left_index=True,
    right_index=True)
How: 'left', 'right', 'outer', 'inner'
How: outer=union/all; inner=intersection

Merge on columns
df_new = pd.merge(left=df1, right=df2,
    how='left', left_on='col1',
    right_on='col2')
Trap: When joining on columns, the indexes on the passed DataFrames are ignored.
Trap: many-to-many merges on a column can result in an explosion of associated data.

Join on indexes (another way of merging)
df_new = df1.join(other=df2, on='col1',
    how='outer')
df_new = df1.join(other=df2, on=['a','b'],
    how='outer')
Note: DataFrame.join() joins on indexes by default. DataFrame.merge() joins on common columns by default.

Simple concatenation is often the best
df = pd.concat([df1,df2], axis=0) # top/bottom
df = df1.append([df2, df3]) # top/bottom
df = pd.concat([df1,df2], axis=1) # left/right
Trap: can end up with duplicate rows or cols
Note: concat has an ignore_index parameter

Combine_first
df = df1.combine_first(other=df2)

# multi-combine with python reduce()
# (from functools import reduce in python 3)
df = reduce(lambda x, y:
    x.combine_first(y),
    [df1, df2, df3, df4, df5])
Uses the non-null values from df1. The index of the combined DataFrame will be the union of the indexes from df1 and df2.

Grouping
gb = df.groupby('cat') # by one column
gb = df.groupby(['c1','c2']) # by 2 cols
gb = df.groupby(level=0) # multi-index gb
gb = df.groupby(level=['a','b']) # mi gb
print(gb.groups)
Note: groupby() returns a pandas groupby object
Note: the groupby object attribute .groups contains a dictionary mapping of the groups.
Trap: NaN values in the group key are automatically dropped – there will never be a NA group.

Iterating groups – usually not needed
for name, group in gb:
    print(name)
    print(group)

Selecting a group
dfa = df.groupby('cat').get_group('a')
dfb = df.groupby('cat').get_group('b')

Applying an aggregating function
# apply to a column ...
s = df.groupby('cat')['col1'].sum()
s = df.groupby('cat')['col1'].agg(np.sum)
# apply to every column in DataFrame
s = df.groupby('cat').agg(np.sum)
df_summary = df.groupby('cat').describe()
df_row_1s = df.groupby('cat').head(1)
Note: aggregating functions reduce the dimension by one – they include: mean, sum, size, count, std, var, sem, describe, first, last, min, max

Applying multiple aggregating functions
gb = df.groupby('cat')
# apply multiple functions to one column
dfx = gb['col2'].agg([np.sum, np.mean])
# apply multiple fns to multiple cols
dfy = gb.agg({
    'cat': np.count_nonzero,
    'col1': [np.sum, np.mean, np.std],
    'col2': [np.min, np.max]
})
Note: gb['col2'] above is shorthand for df.groupby('cat')['col2'], without the need for regrouping.

Transforming functions
# transform to group z-scores, which have
# a group mean of 0, and a std dev of 1.
zscore = lambda x: (x-x.mean())/x.std()
dfz = df.groupby('cat').transform(zscore)
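A small end-to-end sketch of grouped aggregation and the z-score transform above (the 'cat' column and its values are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'a', 'b', 'b'],
                   'col1': [1.0, 3.0, 10.0, 30.0]})

# aggregation: one value per group
sums = df.groupby('cat')['col1'].sum()   # a -> 4.0, b -> 40.0

# transformation: same shape as the input,
# each value standardised within its own group
zscore = lambda x: (x - x.mean()) / x.std()
dfz = df.groupby('cat')['col1'].transform(zscore)
```

Note the shape difference: `sums` has one row per group, while `dfz` keeps one row per original row, which is the aggregate/transform distinction the section relies on.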
Group by a row index (non-hierarchical index)
df = df.set_index(keys='cat')
s = df.groupby(level=0)['col1'].sum()
dfg = df.groupby(level=0).sum()

Pivot Tables: working with long and wide data

These features work with and often create hierarchical or multi-level Indexes (the pandas MultiIndex is powerful and complex).

Pivot, unstack, stack and melt
Pivot tables move from long format to wide format data
# Let's start with data in long format
from StringIO import StringIO # python2.7
#from io import StringIO # python 3
data = """Date,Pollster,State,Party,Est
13/03/2014, Newspoll, NSW, red, 25
13/03/2014, Newspoll, NSW, blue, 28
13/03/2014, Newspoll, Vic, red, 24
13/03/2014, Newspoll, Vic, blue, 23
13/03/2014, Galaxy, NSW, red, 23
13/03/2014, Galaxy, NSW, blue, 24
13/03/2014, Galaxy, Vic, red, 26
13/03/2014, Galaxy, Vic, blue, 25
13/03/2014, Galaxy, Qld, red, 21
13/03/2014, Galaxy, Qld, blue, 27"""
df = pd.read_csv(StringIO(data),
    header=0, skipinitialspace=True)

# pivot to wide format on 'Party' column
# 1st: set up a MultiIndex for other cols
df1 = df.set_index(['Date', 'Pollster',
    'State'])
# 2nd: do the pivot
wide1 = df1.pivot(columns='Party')

# unstack to wide format on State / Party
# 1st: MultiIndex all but the Values col
df2 = df.set_index(['Date', 'Pollster',
    'State', 'Party'])
# 2nd: unstack a column to go wide on it
wide2 = df2.unstack('State')
wide3 = df2.unstack() # pop last index

# Use stack() to get back to long format
long1 = wide1.stack()
# Then use reset_index() to remove the
# MultiIndex.
long2 = long1.reset_index()

# Or melt() back to long format
# 1st: flatten the column index
wide1.columns = ['_'.join(col).strip()
    for col in wide1.columns.values]
# 2nd: remove the MultiIndex
wdf = wide1.reset_index()
# 3rd: melt away
long3 = pd.melt(wdf, value_vars=
    ['Est_blue', 'Est_red'],
    var_name='Party', id_vars=['Date',
    'Pollster', 'State'])

Note: See documentation, there are many arguments to these methods.

Working with dates, times and their indexes

Dates and time – points and spans
With its focus on time-series data, pandas has a suite of tools for managing dates and time: either as a point in time (a Timestamp) or as a span of time (a Period).
t = pd.Timestamp('2013-01-01')
t = pd.Timestamp('2013-01-01 21:15:06')
t = pd.Timestamp('2013-01-01 21:15:06.7')
p = pd.Period('2013-01-01', freq='M')
Note: Timestamps should be in the range of years 1678 to 2261. (Check pd.Timestamp.max and pd.Timestamp.min).

A Series of Timestamps or Periods
ts = ['2015-04-01 13:17:27',
    '2014-04-02 13:23:29']

# Series of Timestamps (good)
s = pd.to_datetime(pd.Series(ts))

# Series of Periods (often not so good)
s = pd.Series([pd.Period(x, freq='M')
    for x in ts])
s = pd.Series(
    pd.PeriodIndex(ts, freq='S'))
Note: While Periods make a very useful index, they may be less useful in a Series.

From non-standard strings to Timestamps
t = ['09:08:55.7654-JAN092002',
    '15:42:02.6589-FEB082016']
s = pd.Series(pd.to_datetime(t,
    format="%H:%M:%S.%f-%b%d%Y"))
Also: %B = full month name; %m = numeric month; %y = year without century; and more …

Dates and time – stamps and spans as indexes
An index of Timestamps is a DatetimeIndex.
An index of Periods is a PeriodIndex.
date_strs = ['2014-01-01', '2014-04-01',
    '2014-07-01', '2014-10-01']

dti = pd.DatetimeIndex(date_strs)

pid = pd.PeriodIndex(date_strs, freq='D')
pim = pd.PeriodIndex(date_strs, freq='M')
piq = pd.PeriodIndex(date_strs, freq='Q')

print(pid[1] - pid[0]) # 90 days
print(pim[1] - pim[0]) # 3 months
print(piq[1] - piq[0]) # 1 quarter

time_strs = ['2015-01-01 02:02:02.12345',
    '2015-01-01 03:03:03.67890']
pis = pd.PeriodIndex(time_strs, freq='U')

df.index = pd.period_range('2015-01',
    periods=len(df), freq='M')

dti = pd.to_datetime(['04-01-2012'],
    dayfirst=True) # Australian date format
pi = pd.period_range('1960-01-01',
    '2015-12-31', freq='M')

Hint: unless you are working in less than seconds, prefer PeriodIndex over DatetimeIndex.
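The point-versus-span distinction above can be sketched in a few lines (the dates are illustrative only):

```python
import pandas as pd

t = pd.Timestamp('2013-01-15')          # a point in time
p = pd.Period('2013-01-15', freq='M')   # the month containing it

# a monthly Period spans the whole of January 2013,
# so the Timestamp falls inside it
inside = p.start_time <= t <= p.end_time
```

This is also why the Period prints as '2013-01' rather than as a full date: the day-of-month is absorbed into the span once a monthly frequency is chosen.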
Period frequency constants (not a complete list)
Name              Description
U                 Microsecond
L                 Millisecond
S                 Second
T                 Minute
H                 Hour
D                 Calendar day
B                 Business day
W-{MON, TUE, …}   Week ending on …
MS                Calendar start of month
M                 Calendar end of month
QS-{JAN, FEB, …}  Quarter start with year starting (QS – December)
Q-{JAN, FEB, …}   Quarter end with year ending (Q – December)
AS-{JAN, FEB, …}  Year start (AS – December)
A-{JAN, FEB, …}   Year end (A – December)

Upsampling and downsampling
# upsample from quarterly to monthly
pi = pd.period_range('1960Q1',
    periods=220, freq='Q')
df = DataFrame(np.random.rand(len(pi),5),
    index=pi)
dfm = df.resample('M', convention='end')
# use ffill or bfill to fill with values

# downsample from monthly to quarterly
dfq = dfm.resample('Q', how='sum')

Time zones
t = ['2015-06-30 00:00:00',
    '2015-12-31 00:00:00']
dti = pd.to_datetime(t
    ).tz_localize('Australia/Canberra')
dti = dti.tz_convert('UTC')
ts = pd.Timestamp('now',
    tz='Europe/London')

# get a list of all time zones
import pytz
for tz in pytz.all_timezones:
    print(tz)
Note: by default, Timestamps are created without time zone information.

From DatetimeIndex to Python datetime objects
dti = pd.DatetimeIndex(pd.date_range(
    start='1/1/2011', periods=4, freq='M'))
s = Series([1,2,3,4], index=dti)
na = dti.to_pydatetime() # numpy array
na = s.index.to_pydatetime() # numpy array

From Timestamps to Python dates or times
df['date'] = [x.date() for x in df['TS']]
df['time'] = [x.time() for x in df['TS']]
Note: converts to datetime.date or datetime.time. But does not convert to datetime.datetime.

Row selection with a time-series index
# start with the play data above
idx = pd.period_range('2015-01',
    periods=len(df), freq='M')
df.index = idx
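A runnable sketch of the localise-then-convert pattern from the Time zones entry above (the timestamp is illustrative; `pytz`-style zone names are assumed to be available, as they ship with pandas' dependencies):

```python
import pandas as pd

# naive timestamps carry no zone information ...
t = ['2015-06-30 09:00:00']
dti = pd.to_datetime(t)

# ... so first localize, then convert
local = dti.tz_localize('Australia/Canberra')  # AEST, UTC+10 in June
utc = local.tz_convert('UTC')                  # 2015-06-29 23:00 UTC
```

`tz_localize` attaches a zone to naive timestamps without changing the wall-clock value, while `tz_convert` changes the wall-clock value to express the same instant in another zone.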
Line plot
df1 = df.cumsum()
ax = df1.plot()

# often followed by
fig = ax.figure
fig.set_size_inches(8, 3)
fig.tight_layout(pad=1)
fig.savefig('filename.png', dpi=125)
plt.close(fig)

Density plot
ax = df.plot.kde()
# followed by the same code as above

Box plot
ax = df.plot.box(vert=False)
ax = df.boxplot(column='c1', by='c2')

Scatter plot
ax = df.plot.scatter(x='A', y='C')

Histogram
ax = df['A'].plot.hist(bins=20)

Pie chart
s = pd.Series(data=[10, 20, 30],
    index=['dogs', 'cats', 'birds'])
ax = s.plot.pie(autopct='%.1f')
ax.set_title('Pie Chart')
ax.set_aspect(1) # make it round
ax.set_ylabel('') # remove default
fig = ax.figure
fig.set_size_inches(8, 3)

Multiple histograms
ax = df['A'].plot.hist(bins=25, alpha=0.5)
ax = df.plot.hist(bins=25, stacked=True)
fig = ax.figure
fig.savefig('filename.png', dpi=125)
plt.close(fig)
Change the range plotted
ax.set_xlim([-5, 5])
lower, upper = ax.get_ylim()
ax.set_ylim([lower-1, upper+1])

Add a footnote
# after the fig.tight_layout(pad=1)
fig.text(0.99, 0.01, 'Footnote',
    ha='right', va='bottom',
    fontsize='x-small',
    fontstyle='italic', color='#999999')

A line and bar on the same chart
In matplotlib, bar charts visualise categorical or discrete data. Line charts visualise continuous data. This makes it hard to get bars and lines on the same chart. Typically, combined charts either have too many labels, and/or the lines and bars are misaligned or missing. You need to trick matplotlib a bit … pandas makes this tricking easier.

# start with fake percentage growth data
s = pd.Series(np.random.normal(
    1.02, 0.015, 40))
s = s.cumprod()
dfg = (pd.concat([s / s.shift(1),
    s / s.shift(4)], axis=1) * 100) - 100
dfg.columns = ['Quarter', 'Annual']
dfg.index = pd.period_range('2010-Q1',
    periods=len(dfg), freq='Q')

# reindex with integers from 0; keep old
old = dfg.index
dfg.index = range(len(dfg))

# plot the line from pandas
ax = dfg['Annual'].plot(color='blue',
    label='Year/Year Growth')

# plot the bars from pandas
dfg['Quarter'].plot.bar(ax=ax,
    label='Q/Q Growth', width=0.8)

# relabel the x-axis more appropriately
ticks = dfg.index[((dfg.index+0)%4)==0]
labs = pd.Series(old[ticks]).astype(str)
ax.set_xticks(ticks)
ax.set_xticklabels(labs.str.replace('Q',
    '\nQ'), rotation=0)

# fix the range of the x-axis … skip 1st
ax.set_xlim([0.5,len(dfg)-0.5])

# add the legend
l = ax.legend(loc='best',
    fontsize='small')

# finish off and plot in the usual manner
ax.set_title('Fake Growth Data')
ax.set_xlabel('Quarter')
ax.set_ylabel('Per cent')

fig = ax.figure
fig.set_size_inches(8, 3)
fig.tight_layout(pad=1)
fig.savefig('filename.png', dpi=125)
plt.close(fig)

Working with missing and non-finite data

Working with missing data
Pandas uses the not-a-number construct (np.nan and float('nan')) to indicate missing data. The Python None can arise in data as well. It is also treated as missing data; as is the pandas not-a-time construct (pd.NaT).

Missing data in a Series
s = Series( [8,None,float('nan'),np.nan])
#[8, NaN, NaN, NaN]
s.isnull() #[False, True, True, True]
s.notnull() #[True, False, False, False]
s.fillna(0) #[8, 0, 0, 0]

Missing data in a DataFrame
df = df.dropna() # drop all rows with NaN
df = df.dropna(axis=1) # same for cols
df = df.dropna(how='all') # drop all-NaN rows
df = df.dropna(thresh=2) # drop 2+ NaN in r
# only drop row if NaN in a specified col
df = df[df['col'].notnull()]

Recoding missing data
df.fillna(0, inplace=True) # np.nan -> 0
s = df['col'].fillna(0) # np.nan -> 0
df = df.replace(r'\s+', np.nan,
    regex=True) # white space -> np.nan

Non-finite numbers
With floating point numbers, pandas provides for positive and negative infinity.
s = Series([float('inf'), float('-inf'),
    np.inf, -np.inf])
Pandas treats integer comparisons with plus or minus infinity as expected.

Testing for finite numbers
(using the data from the previous example)
b = np.isfinite(s)
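A short runnable sketch of the NaN handling described above (the values are illustrative):

```python
import numpy as np
import pandas as pd

# None, float('nan') and np.nan all become NaN
s = pd.Series([8, None, float('nan'), np.nan])

mask = s.isnull()      # [False, True, True, True]
filled = s.fillna(0)   # [8.0, 0.0, 0.0, 0.0]
```

Note that the presence of NaN forces the Series to a float dtype, which is why the filled result holds 8.0 rather than the integer 8.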
Working with Categorical Data

Renaming categories
s = Series(list('abc'), dtype='category')
s.cat.categories = [1, 2, 3] # in place
s = s.cat.rename_categories([4,5,6])
# using a comprehension ...
s.cat.categories = ['Group ' + str(i)
    for i in s.cat.categories]
Trap: categories must be uniquely named

Removing categories
s = s.cat.remove_categories([4])
s.cat.remove_unused_categories() #inplace
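A sketch of category renaming (labels are illustrative). Note that direct assignment to `.cat.categories`, shown above, is disallowed in newer pandas versions, so the rename method is the safer route:

```python
import pandas as pd

s = pd.Series(list('abca'), dtype='category')

# rename all three categories in one call;
# the underlying codes are unchanged
s2 = s.cat.rename_categories(
    ['Group 1', 'Group 2', 'Group 3'])
```

Renaming only relabels: the fourth element still shares a category with the first, since both were 'a' originally.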
Basic Statistics

Summary statistics
s = df['col1'].describe()
df1 = df.describe()

Value counts
s = df['col1'].value_counts()

Histogram binning
count, bins = np.histogram(df['col1'])
count, bins = np.histogram(df['col'],
    bins=5)
count, bins = np.histogram(df['col1'],
    bins=[-3,-2,-1,0,1,2,3,4])

Regression
import statsmodels.formula.api as sm
result = sm.ols(formula="col1 ~ col2 +
    col3", data=df).fit()
print(result.params)
print(result.summary())
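A runnable sketch of value counts and explicit-edge histogram binning (the column and values are made up for the example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3, 3, 3]})

# value_counts: one row per distinct value,
# sorted by frequency (3 -> 3, 1 -> 2, 2 -> 1)
counts = df['col1'].value_counts()

# np.histogram with explicit edges: bins are
# half-open [0,1), [1,2), [2,3) and closed [3,4]
hist, bins = np.histogram(df['col1'], bins=[0, 1, 2, 3, 4])
```

The half-open convention matters here: the three 3s land in the final bin only because numpy closes the last interval on the right.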
Cautionary note