Positive rate
- plot_utils.positive_rate(categorical_array, two_classes_array, fig=None, ax=None, figsize=None, dpi=100, barh=True, top_n=None, dropna=False, xlabel=None, ylabel=None, show_stats=True)
For each category in categorical_array, calculate the proportion of its entries that fall into class “1” (or True) in two_classes_array, and optionally show a figure. Also perform Pearson’s chi-squared test of independence between categorical_array and two_classes_array, and return the chi-squared statistic, p-value, and degrees of freedom.
- Parameters:
categorical_array (list, numpy.ndarray, or pandas.Series) – An array of categorical values.
two_classes_array (list, numpy.ndarray, or pandas.Series) – The target variable containing two classes. Each value in this parameter corresponds to the value in categorical_array at the same index, so it must have the same length as categorical_array. The second unique value in this parameter (in sorted order) is treated as the positive class (for example, “True” in [True, False, True], or “3” in [1, 1, 3, 3, 1]).
fig (matplotlib.figure.Figure or None) – Figure object. If None, a new figure will be created.
ax (matplotlib.axes._subplots.AxesSubplot or None) – Axes object. If None, new axes will be created.
figsize ((float, float)) – Figure size in inches, as a tuple of two numbers. If fig is not None, its figure size overrides this parameter.
dpi (float) – Figure resolution. If fig is not None, its dpi overrides this parameter.
barh (bool) – Whether to show the bars horizontally (otherwise, vertically).
top_n (int or None) – Only show the top_n categories (ranked by their positive rate) in the figure. Useful when there are too many categories. If None, show all categories.
dropna (bool) – If True, ignore entries (in both arrays) where at least one array has a missing value. If False, missing values are treated as a new category: “N/A”.
xlabel (str) – X-axis label.
ylabel (str) – Y-axis label.
show_stats (bool) – Whether to show the statistical test results (chi-squared statistic and p-value) on the figure.
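The parameter semantics above (choosing the positive class, handling dropna, and computing a per-category rate) can be sketched in plain Python. This is a minimal, hypothetical re-implementation, not the library's code: the names positive_rates and _is_missing are made up for illustration, None and float NaN are assumed to be the only missing markers, the target array is assumed to have no missing values when dropna=False, and plotting is omitted entirely.

```python
import math
from collections import Counter


def _is_missing(value):
    # Assumption: None and float NaN (numpy.nan is a float) count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def positive_rates(categorical_array, two_classes_array, dropna=False):
    """Hypothetical sketch: positive rate of each category, highest first."""
    if len(categorical_array) != len(two_classes_array):
        raise ValueError("Both arrays must have the same length.")

    pairs = list(zip(categorical_array, two_classes_array))
    if dropna:
        # Drop entries that have a missing value in at least one array.
        pairs = [(c, y) for c, y in pairs if not (_is_missing(c) or _is_missing(y))]
    else:
        # Missing categories become their own "N/A" category.
        # Assumption: the target array itself has no missing values here.
        pairs = [("N/A" if _is_missing(c) else c, y) for c, y in pairs]

    # The positive class is the second unique value in sorted order,
    # e.g. True in [True, False, True], or 3 in [1, 1, 3, 3, 1].
    classes = sorted({y for _, y in pairs})
    if len(classes) != 2:
        raise ValueError("two_classes_array must contain exactly two classes.")
    positive = classes[1]

    totals = Counter(c for c, _ in pairs)
    positives = Counter(c for c, y in pairs if y == positive)
    rates = {c: positives[c] / totals[c] for c in totals}
    # Rank by positive rate; slicing this ranking to top_n entries
    # would mimic which categories the figure shows.
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


print(positive_rates(["a", "b", "a", None, "b", "a"], [1, 0, 0, 1, 1, 1]))
# → {'N/A': 1.0, 'a': 0.6666666666666666, 'b': 0.5}
```

With dropna=True the None entry is discarded instead, leaving only the "a" and "b" rates.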
- Returns:
fig (matplotlib.figure.Figure) – The figure object created by, or passed into, this function.
ax (matplotlib.axes._subplots.AxesSubplot) – The axes object created by, or passed into, this function.
pos_rate (pandas.Series) – The positive rate of each category in categorical_array.
chi2_results (tuple of float) – A tuple in the order (chi-squared statistic, p-value, degrees of freedom).