Multiple histograms
- plot_utils.hist_multi(X, bins=10, fig=None, ax=None, figsize=None, dpi=100, nan_warning=False, showmeans=True, showmedians=False, vert=True, data_names=[], rot=45, name_ax_label=None, data_ax_label=None, sort_by=None, title=None, show_vals=True, show_pct_diff=False, baseline_data_index=0, legend_loc='best', show_counts_on_data_ax=True, **extra_kwargs)
Generate multiple histograms, one for each data set within X.
- Parameters:
X (pandas.DataFrame, pandas.Series, numpy.ndarray, dict, or list of lists) –
The data to be visualized. It can be of the following types:
- pandas.DataFrame:
Each column contains a set of data
- pandas.Series:
Contains only one set of data
- numpy.ndarray:
1D numpy array: only one set of data
2D numpy array: each column contains a set of data
Higher dimensional numpy array: not allowed
- dict:
Each key-value pair is one set of data
- list of lists:
Each sub-list is a data set
Note that the NaN values in the data are implicitly excluded.
bins (int or sequence or str) – If an integer is given, the whole range of data (i.e., all the numbers within X) is divided into bins segments. If a sequence or str is given, it is passed to the bins argument of matplotlib.pyplot.hist().
fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.
ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, a new axes will be created.
figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. The figure size of fig (if not None) will override this parameter.
dpi (float) – Figure resolution. The dpi of fig (if not None) will override this parameter.
nan_warning (bool) – Whether to show a warning if there are NaN values in the data.
showmeans (bool) – Whether to show the mean values of each data group.
showmedians (bool) – Whether to show the median values of each data group.
vert (bool) – Whether to show the “base” of the histograms as vertical.
data_names (list<str>, [], or None) – The names of each data set, to be shown as the axis tick label of each data set. If [] or None, it will be determined automatically. If X is a:
- numpy.ndarray:
data_names = [‘data_0’, ‘data_1’, ‘data_2’, …]
- pandas.Series:
data_names = X.name
- pandas.DataFrame:
data_names = list(X.columns)
- dict:
data_names = list(X.keys())
rot (float) – The rotation (in degrees) of the data_names when shown as the tick labels. If vert is False, rot has no effect.
name_ax_label (str) – The label of the “name axis”. (“Name axis” is the axis along which different histograms are presented.)
data_ax_label (str) – The label of the “data axis”. (“Data axis” is the axis along which the data values are presented.)
sort_by ({‘name’, ‘mean’, ‘median’, None}) – Option to sort the different data groups in X in the plot. None means no sorting, keeping the order as provided; ‘mean’ and ‘median’ mean sorting the histograms according to the mean/median values of each data group; ‘name’ means sorting the histograms according to the names of the groups.
title (str) – The title of the plot.
show_vals (bool) – Whether to show mean and/or median values along the mean/median bars. Only effective if showmeans and/or showmedians are turned on.
show_pct_diff (bool) – Whether to show percent difference of mean and/or median values between different data sets. Only effective when show_vals is set to True.
baseline_data_index (int) – Which data set is considered the “baseline” when showing percent differences.
legend_loc (str) – The location specification for the legend.
show_counts_on_data_ax (bool) – Whether to show counts beside the histograms.
**extra_kwargs (dict) – Other keyword arguments to be passed to matplotlib.pyplot.bar().
- Returns:
fig (matplotlib.figure.Figure) – The figure object being created or being passed into this function.
ax (matplotlib.axes._subplots.AxesSubplot) – The axes object being created or being passed into this function.
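The parameters above can be tried out with a small sketch like the following. It is illustrative only and rests on several assumptions: that the package is importable as plot_utils (as the signature suggests), that the values of a dict input may be plain lists of numbers, and that percent difference is computed as (value - baseline) / baseline * 100, which is one common definition; this reference does not state the library's exact formula. The data and the helper names (make_example_data, nan_free_means, pct_diff_from_baseline, plot_example) are made up for this example and are not part of the library.

```python
import math


def make_example_data():
    """Build three small, made-up data sets in the dict input form."""
    return {
        "control": [1.0, 2.0, 2.5, 3.0, float("nan")],  # contains one NaN
        "treat_a": [2.0, 3.0, 3.5, 4.0, 4.5],
        "treat_b": [0.5, 1.0, 1.5, 2.0, 2.5],
    }


def nan_free_means(data):
    """Mean of each data set after dropping NaN values.

    Mirrors the documented behaviour that NaN values are excluded;
    this is not the library's own code.
    """
    means = {}
    for name, values in data.items():
        kept = [v for v in values if not math.isnan(v)]
        means[name] = sum(kept) / len(kept)
    return means


def pct_diff_from_baseline(means, baseline_name):
    """Percent difference of each mean relative to a baseline mean.

    Assumed formula: (value - baseline) / baseline * 100.
    """
    base = means[baseline_name]
    return {name: (m - base) / base * 100.0 for name, m in means.items()}


def plot_example():
    """Call hist_multi with keyword arguments taken from the signature above."""
    import plot_utils as pu  # assumed import name; requires the package

    fig, ax = pu.hist_multi(
        make_example_data(),
        bins=5,                 # integer: whole data range split into 5 segments
        showmeans=True,
        show_vals=True,         # needed for show_pct_diff to take effect
        show_pct_diff=True,
        baseline_data_index=0,  # first data set ("control") as the baseline
        title="Example histograms",
    )
    return fig, ax
```

Calling plot_example() should return the (fig, ax) pair described under Returns, provided the package is installed. The two helper functions run without it and can be used to cross-check the mean values and percent differences annotated on the plot.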