Releases: facebookresearch/balance

0.18.0 (2026-03-24)

24 Mar 19:58

New Features

  • Implemented r_indicator() with validated sample-variance formula
    • Added a public r_indicator(sample_p, target_p) implementation in
      weighted_comparisons_stats using the documented Eq. 2.2.2 formulation
      over concatenated propensity vectors and explicit input-size validation.
    • Added validation for non-finite and out-of-range propensity values,
      and expanded unit coverage for formula correctness and edge cases.
    • Added BalanceDFWeights.r_indicator() as a convenience wrapper, so
      sample.weights().r_indicator() computes the r-indicator directly.
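
As a rough illustration of the computation described above, the sketch below assumes the common R-indicator definition R = 1 - 2·S(ρ), where S is the sample standard deviation (n - 1 denominator) of the concatenated propensities. The name `r_indicator_sketch` and the exact validation rules are illustrative assumptions, not the library's code.

```python
import math


def r_indicator_sketch(sample_p, target_p):
    """Illustrative R-indicator: 1 - 2 * sample std of concatenated propensities."""
    # Concatenate the two propensity vectors, as the release note describes.
    p = [float(x) for x in sample_p] + [float(x) for x in target_p]
    # Input-size validation: the sample variance needs at least two values.
    if len(p) < 2:
        raise ValueError("need at least two propensity values")
    # Reject non-finite and out-of-range propensities.
    if any(not math.isfinite(x) or x < 0.0 or x > 1.0 for x in p):
        raise ValueError("propensities must be finite and within [0, 1]")
    mean = sum(p) / len(p)
    variance = sum((x - mean) ** 2 for x in p) / (len(p) - 1)
    return 1.0 - 2.0 * math.sqrt(variance)
```

Under this definition, identical propensities give R = 1 (fully representative response), and more dispersed propensities give lower values.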

Deprecations

  • Sample.design_effect() is deprecated — use sample.weights().design_effect() instead.
    The method already exists on BalanceDFWeights; the Sample method now emits a
    DeprecationWarning and delegates. Will be removed in balance 0.19.0.
  • Sample.design_effect_prop() is deprecated — use sample.weights().design_effect_prop() instead.
    New method added to BalanceDFWeights. Will be removed in balance 0.19.0.
  • Sample.plot_weight_density() is deprecated — use sample.weights().plot() instead.
    Will be removed in balance 0.19.0.
  • Sample.covar_means() is deprecated — use sample.covars().mean() instead
    (with .rename(index={'self': 'adjusted'}).reindex([...]).T for the same format).
    Will be removed in balance 0.19.0.
  • Sample.outcome_sd_prop() is deprecated — use sample.outcomes().outcome_sd_prop() instead.
    New method added to BalanceDFOutcomes. Will be removed in balance 0.19.0.
  • Sample.outcome_variance_ratio() is deprecated — use sample.outcomes().outcome_variance_ratio() instead.
    New method added to BalanceDFOutcomes. Will be removed in balance 0.19.0.
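
The warn-and-delegate pattern behind these deprecations can be sketched as follows. `SampleSketch` and `WeightsView` are hypothetical stand-ins, not balance's classes, and the design effect is assumed here to be Kish's formula, n·Σw² / (Σw)².

```python
import warnings


class WeightsView:
    """Hypothetical stand-in for the object returned by sample.weights()."""

    def __init__(self, weights):
        self._weights = list(weights)

    def design_effect(self):
        # Kish's design effect (an assumption for this sketch).
        n = len(self._weights)
        total = sum(self._weights)
        return n * sum(w * w for w in self._weights) / (total * total)


class SampleSketch:
    """Hypothetical stand-in for Sample."""

    def __init__(self, weights):
        self._weights = list(weights)

    def weights(self):
        return WeightsView(self._weights)

    def design_effect(self):
        # Deprecated entry point: warn, then delegate to the new location.
        warnings.warn(
            "Sample.design_effect() is deprecated; "
            "use sample.weights().design_effect() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.weights().design_effect()
```

Callers that switch to the `weights()` accessor get the same value without the warning.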

LLM/GenAI

  • Added CLAUDE.md project context files for Claude Code users, covering architecture,
    build/test instructions (Meta and open-source), code conventions, and pre-submit checklist.
  • Updated .github/copilot-instructions.md review checklist to reduce duplication with
    CLAUDE.md and add missing conventions (MIT license header, from __future__ import annotations,
    factory pattern, seed fixing, deprecation style).

Bug Fixes

  • prepare_marginal_dist_for_raking / _realize_dicts_of_proportions: fixed memory explosion from LCM expansion
    • When proportions had high decimal precision or many covariates were passed,
      the LCM of the individual per-variable array lengths could reach tens of
      millions (or more), causing OOM crashes.
    • Both functions now accept a max_length parameter (default 10000). When
      the natural LCM exceeds max_length, the output is capped at max_length
      rows and counts are allocated via the Hare-Niemeyer (largest remainder)
      method, which guarantees the total stays exactly max_length with minimal
      rounding error per category.
    • A warning is logged whenever the cap is applied.
    • A new internal helper _hare_niemeyer_allocation implements the allocation logic.
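
For readers unfamiliar with the Hare-Niemeyer (largest remainder) method, here is a minimal standalone sketch. The function name and signature are illustrative and are not those of the internal `_hare_niemeyer_allocation` helper.

```python
import math


def largest_remainder_allocation(proportions, total):
    """Split `total` rows across categories so the counts sum exactly to `total`."""
    if total < 0:
        raise ValueError("total must be non-negative")
    weight_sum = sum(proportions)
    if weight_sum <= 0:
        raise ValueError("proportions must have a positive sum")
    # Ideal (fractional) share of each category.
    quotas = [p / weight_sum * total for p in proportions]
    # Every category first receives the integer part of its quota.
    counts = [math.floor(q) for q in quotas]
    leftover = total - sum(counts)
    # Remaining rows go to the largest fractional remainders;
    # ties are broken by original position to stay deterministic.
    order = sorted(range(len(quotas)), key=lambda i: (-(quotas[i] - counts[i]), i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

Because each category's count differs from its ideal quota by less than one row, the per-category rounding error stays small while the total is preserved exactly.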

Contributors

@neuralsorcerer, @talgalili

Full Changelog: 0.17.0...0.18.0

0.17.0 (2026-03-17)

17 Mar 14:01

Breaking Changes

  • CLI: unmentioned columns now go to ignore_columns instead of outcome_columns
    • Previously, when --outcome_columns was not explicitly set, all columns that
      were not the id, weight, or a covariate were automatically classified as
      outcome columns. Now those columns are placed into ignore_columns instead.
    • Columns that are explicitly mentioned — the id column, weight column,
      covariate columns, and outcome columns — are not ignored.
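
The new classification rule can be sketched as below. `classify_columns` is a hypothetical helper written for illustration; it is not part of the balance CLI.

```python
def classify_columns(all_columns, id_column, weight_column,
                     covariate_columns, outcome_columns=None):
    """Return outcome and ignored columns under the 0.17.0 rule (illustrative)."""
    outcomes = list(outcome_columns or [])
    # Columns that are explicitly mentioned are never ignored.
    mentioned = {id_column, weight_column, *covariate_columns, *outcomes}
    # Everything else is now ignored rather than treated as an outcome.
    ignored = [c for c in all_columns if c not in mentioned]
    return {"outcome_columns": outcomes, "ignore_columns": ignored}
```

With columns `id, weight, age, happiness, notes` and only `age` as a covariate, both `happiness` and `notes` end up in `ignore_columns` unless `happiness` is passed explicitly as an outcome.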

New Features

  • ASCII comparative histogram and plot improvements
    • Added ascii_comparative_hist for comparing multiple distributions against a
      baseline using inline visual indicators.
    • Comparative ASCII plots now order datasets as population → adjusted → sample.
    • ascii_plot_dist accepts a new comparative keyword (default True) to
      toggle between comparative and grouped-bar histograms for numeric variables.

Code Quality & Refactoring

  • Moved dataset loading implementations out of balance.datasets.__init__
    • Refactored load_sim_data, load_cbps_data, and load_data into
      balance.datasets.loading_data and re-exported them from
      balance.datasets to preserve the public API while keeping module
      responsibilities focused.

Documentation

  • ASCII plot documentation and tutorial examples
    • Added rendered text-plot examples to ASCII plot docstrings and documented
      library="balance" support. Updated balance_quickstart.ipynb with
      adjusted vs unadjusted ASCII plot examples.
  • Improved keep_columns documentation
    • Updated docstrings for has_keep_columns(), keep_columns(), and the
      --keep_columns argument to clarify that keep columns control which columns
      appear in the final output CSV. Keep columns that are not id, weight,
      covariate, or outcome columns will be placed into ignore_columns during
      processing but are still retained and available in the output.
  • Clarified _prepare_input_model_matrix argument docs
    • Updated docstrings in balance.utils.model_matrix with
      explicit descriptions for sample, target, variables, and add_na
      behavior when preparing model-matrix inputs.

Bug Fixes

  • Weight diagnostics now consistently accept DataFrame inputs
    • design_effect, nonparametric_skew, prop_above_and_below, and
      weighted_median_breakdown_point now explicitly normalize DataFrame inputs
      to their first column before computation, matching validation behavior and
      returning scalar/Series outputs consistently.
  • Model-matrix robustness improvements
    • _make_df_column_names_unique() now avoids suffix collisions when columns
      like a, a_1, and repeated a names appear together, renaming
      duplicates deterministically to prevent downstream clashes.
    • _prepare_input_model_matrix() now raises a deterministic ValueError
      when the input sample has zero rows, instead of relying on an assertion.
  • Stabilized prop_above_and_below() return paths
    • prop_above_and_below() now builds concatenated outputs only from present
      Series objects and returns None when both below and above are None,
      avoiding ambiguous concat inputs while preserving existing behavior for valid
      threshold sets.
  • Validated and normalized comma-separated CLI column arguments
    • CLI column-list arguments now trim surrounding whitespace and reject empty
      entries (for example, "id,,weight") with clear ValueError messages,
      preventing malformed column specifications from silently propagating.
    • Applied to --covariate_columns, --covariate_columns_for_diagnostics,
      --batch_columns, --keep_columns, and --outcome_columns parsing.
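
The trimming and empty-entry validation described for comma-separated column arguments might look roughly like this. `parse_column_list` is a hypothetical name, and the error message is illustrative rather than the CLI's actual wording.

```python
def parse_column_list(raw):
    """Split a comma-separated column argument, trimming whitespace."""
    parts = [part.strip() for part in raw.split(",")]
    # Reject empty entries such as the middle item in "id,,weight".
    if any(part == "" for part in parts):
        raise ValueError(f"Empty column name in column list: {raw!r}")
    return parts
```

For example, `parse_column_list(" id , weight ")` yields `["id", "weight"]`, while `"id,,weight"` raises a ValueError instead of silently producing an empty column name.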

Tests

  • Added end-to-end adjustment test with ASCII plot output and expanded ASCII plot edge-case coverage
    • TestAsciiPlotsAdjustmentEndToEnd runs the full adjustment pipeline and
      asserts exact expected ASCII output. Added tests for ascii_plot_dist with
      comparative=False and mixed categorical+numeric routing.
  • Expanded warning coverage for Sample.from_frame() ID inference
    • Added assertions that validate all three expected warnings are emitted when inferring an id column and default weights, including ID guessing, ID string casting, and automatic weight creation.
  • Expanded IPW helper and diagnostics test coverage
    • Added tests for link_transform() and calc_dev() to validate behavior
      for extreme probabilities and finite 10-fold deviance summaries.
    • Refactored diagnostics tests to use a shared IPW setup helper, added
      edge-case assertions for solver/penalty values, NaN coercion of non-scalar
      inputs, and now assert labels match fitted model parameters.
  • Expanded prop_above_and_below() edge-case coverage
    • Added focused tests for empty threshold iterables, mixed None threshold groups in dict mode, and explicit all-None threshold handling across return formats.
  • Added unit coverage for CLI I/O and empty-batch handling
    • Added focused tests for BalanceCLI.process_batch() empty-sample failure payloads, load_and_check_input() CSV loading paths, and write_outputs() delimiter-aware output writing for both adjusted and diagnostics files.

Contributors

@sahil350, @neuralsorcerer, @talgalili

Full Changelog: 0.16.0...0.17.0

0.16.0 (2026-02-09)

09 Feb 15:23

New Features

  • Outcome weight impact diagnostics
    • Added paired outcome-weight impact tests (y*w0 vs y*w1) with confidence intervals.
    • Exposed in BalanceDFOutcomes, Sample.diagnostics(), and the CLI via
      --weights_impact_on_outcome_method.
  • Pandas 3 support
    • Updated compatibility and tests for pandas 3.x
  • Categorical distribution metrics without one-hot encoding
    • KLD/EMD/CVMD/KS on BalanceDF.covars() now operate on raw categorical variables
      (with NA indicators) instead of one-hot encoded columns.
  • Misc
    • Raw-covariate adjustment for custom models
      • Sample.adjust() now supports fitting models on raw covariates (without a model matrix)
        for IPW via use_model_matrix=False. String, object, and boolean columns are converted
        to pandas Categorical dtype, allowing sklearn estimators with native categorical
        support (e.g., HistGradientBoostingClassifier with categorical_features="from_dtype")
        to handle them correctly. Requires scikit-learn >= 1.4 when categorical columns are
        present.
    • Validate weights include positive values
      • Added a guard in weight diagnostics to error when all weights are zero.
    • Support configurable ID column candidates
      • Sample.from_frame() and guess_id_column() now accept candidate ID column names
        when auto-detecting the ID column.
    • Formula support for BalanceDF model matrices
      • BalanceDF.model_matrix() now accepts a formula argument to build
        custom model matrices without precomputing them manually.

Bug Fixes

  • Removed deprecated setup build
    • Replaced deprecated setup.py with pyproject.toml build in CI to avoid build failure.
  • Hardened ID column candidate validation
    • guess_id_column() now ignores duplicate candidate names and validates that candidates are non-empty strings.
  • Hardened pandas 3 compatibility paths
    • Updated string/NA handling and discrete checks for pandas 3 dtypes, and refreshed tests to accept string-backed dtypes.

Packaging & Tests

  • Pandas 3.x compatibility
    • Expanded the pandas dependency range to allow pandas 3.x releases.
  • Direct util imports in tests
    • Refactored util test modules to import helpers directly from their modules instead of via balance_util.

Breaking Changes

  • Require positive weights for weight diagnostics that normalize or aggregate
    • design_effect, nonparametric_skew, prop_above_and_below, and
      weighted_median_breakdown_point now raise a ValueError when all weights
      are zero.
    • Migration: ensure your weights include at least one positive value
      before calling these diagnostics, or catch the ValueError if all-zero
      weights are possible in your workflow.
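
One way to follow the migration advice is to wrap diagnostics in a small guard. In this sketch, `normalize_weights_strict` is a stand-in diagnostic that mimics the new all-zero check, and `run_weight_diagnostic` is a hypothetical wrapper; neither is part of balance.

```python
def normalize_weights_strict(weights):
    """Stand-in diagnostic: normalize weights to sum to 1, rejecting all-zero input."""
    w = [float(x) for x in weights]
    # Normalizing divides by the total, so at least one positive weight is required.
    if not any(x > 0 for x in w):
        raise ValueError("weights must include at least one positive value")
    total = sum(w)
    return [x / total for x in w]


def run_weight_diagnostic(diagnostic, weights):
    """Call `diagnostic(weights)`; return None if the weights are rejected."""
    try:
        return diagnostic(weights)
    except ValueError:
        return None
```

Catching the ValueError is only appropriate when all-zero weights are an expected state in the workflow; otherwise letting the error surface is usually the safer choice.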

Contributors

@neuralsorcerer, @talgalili (with code/methodological review by @talsarig)

Full Changelog: 0.15.0...0.16.0

0.15.0 (2026-01-20)

20 Jan 10:51

New Features

  • Added EMD/CVMD/KS distribution diagnostics
    • BalanceDF now exposes Earth Mover's Distance (EMD), Cramér-von Mises distance (CVMD), and Kolmogorov-Smirnov (KS) statistics for comparing adjusted samples to targets.
    • These diagnostics support weighted or unweighted comparisons, apply discrete/continuous formulations, and respect aggregate_by_main_covar for one-hot categorical aggregation.
  • Exposed outcome columns selection in the CLI
    • Added --outcome_columns to choose which columns are treated as outcomes
      instead of defaulting to all non-id/weight/covariate columns. Remaining columns are moved to ignored_columns.
  • Improved missing data handling in poststratify()
    • poststratify() now accepts na_action to either drop rows with missing
      values or treat missing values as their own category during weighting.
    • Breaking change: the default behavior now fills missing values in
      poststratification variables with "__NaN__" and treats this as a distinct
      category during weighting. Previously, missing values were not handled
      explicitly, and their treatment depended on pandas groupby and merge
      defaults. To approximate the legacy behavior where missing values do not
      form their own category, pass na_action="drop" explicitly.
  • Added formula support for descriptive_stats model matrices
    • descriptive_stats() now accepts a formula argument that is always
      applied to the data (including numeric-only frames), letting callers
      control which terms and dummy variables are included in summary statistics.

Code Quality & Refactoring

  • Added a warning when the target is much larger than the sample
    • Sample.adjust() now warns when the target exceeds 100k rows and is at
      least 10x larger than the sample, highlighting that uncertainty is
      dominated by the sample (akin to a one-sample comparison).
  • Split util helpers into focused modules
    • Broke balance.util into balance.utils submodules for easier navigation.
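
The condition behind the new target-size warning in Sample.adjust() can be written as a small predicate. The function and parameter names below are hypothetical; only the two thresholds (more than 100k rows, at least 10x the sample) come from the note above.

```python
def target_much_larger(sample_rows, target_rows,
                       min_target_rows=100_000, ratio=10):
    """True when the target exceeds 100k rows and is at least 10x the sample."""
    return target_rows > min_target_rows and target_rows >= ratio * sample_rows
```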

Bug Fixes

  • Updated Sample.__str__() to format weight diagnostics like Sample.summary()
    • Weight diagnostics (design effect, effective sample size proportion, effective sample size)
      are now displayed on separate lines instead of comma-separated on one line.
    • Replaced "eff." abbreviations with full "effective" word for better readability.
    • Improves consistency with Sample.summary() output format.
  • Numerically stable CBPS probabilities
    • The CBPS helper now uses a stable logistic transform to avoid exponential
      overflow warnings during probability computation in constraint checks.
  • Silenced pandas observed default warning
    • Explicitly sets observed=False in weighted categorical KLD calculations
      to retain current behavior and avoid future pandas default changes.
  • Fixed plot_qq_categorical to respect the weighted parameter for target data
    • Previously, the target weights were always applied regardless of the
      weighted=False setting, causing inconsistent behavior between sample
      and target proportions in categorical QQ plots.
  • Restored CBPS tutorial plots
  • Clearer validation errors in adjustment helpers
    • trim_weights() now accepts list/tuple inputs and reports invalid types explicitly.
    • apply_transformations() raises clearer errors for invalid inputs and empty transformations.
  • Fixed model_matrix to drop NA rows when requested
    • model_matrix(add_na=False) now actually drops rows containing NA values while preserving categorical levels, matching the documented behavior.
    • Previously, add_na=False only logged a warning without dropping rows; code relying on the old behavior may now see fewer rows and should either handle missingness explicitly or use add_na=True.
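
For context on the "Numerically stable CBPS probabilities" fix above, a stable logistic transform is commonly written by branching on the sign of the input so that `exp` is only ever called on non-positive values. This is a generic sketch of that technique, not the CBPS helper itself.

```python
import math


def stable_logistic(x):
    """Compute 1 / (1 + exp(-x)) without overflowing for large |x|."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is below 1; rewrite the ratio to avoid exp(-x).
    z = math.exp(x)
    return z / (1.0 + z)
```

A naive `1 / (1 + math.exp(-x))` raises OverflowError for inputs such as `x = -1000`, whereas the branched form returns 0.0.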

Tests

  • Aligned formatting toolchain between Meta internal and GitHub CI
    • Added ["fbcode/core_stats/balance"] override to Meta's internal tools/lint/pyfmt/config.toml to use formatter = "black" and sorter = "usort".
    • This ensures both internal (pyfmt/arc lint) and external (GitHub Actions) environments use the same Black 25.1.0 formatter, eliminating formatting drift.
    • Updated CI workflow, pre-commit config, and requirements-fmt.txt to use black==25.1.0.
  • Added Pyre type checking to GitHub Actions via .pyre_configuration.external and a new pyre job in the workflow. Tests are excluded due to external typeshed stub differences; library code is fully type-checked.
  • Added test coverage workflow and badge to README via .github/workflows/coverage.yml. The workflow collects coverage using pytest-cov, generates HTML and XML reports, uploads them as artifacts, and displays coverage metrics. A coverage badge is now shown in README.md alongside other workflow badges.
  • Improved test coverage for edge cases and error handling paths
    • Added targeted tests for previously uncovered code paths across the library, addressing edge cases including empty inputs, verbose logging, error handling for invalid parameters, and boundary conditions in weighting methods (IPW, CBPS, rake).
    • Tests exercise defensive code paths that handle empty DataFrames, NaN convergence values, invalid model types, and non-convergence warnings.
  • Split test_util.py into focused test modules
    • Split the large test_util.py file (2325 lines) into 5 modular test files that mirror the balance/utils/ structure:
      • test_util_data_transformation.py - Tests for data transformation utilities
      • test_util_input_validation.py - Tests for input validation utilities
      • test_util_model_matrix.py - Tests for model matrix utilities
      • test_util_pandas_utils.py - Tests for pandas utilities (including high cardinality warnings)
      • test_util_logging_utils.py - Tests for logging utilities
    • This improves test organization and makes it easier to locate tests for specific utilities.

Contributors

@neuralsorcerer, @talgalili

Full Changelog: 0.14.0...0.15.0

0.14.0 (2025-12-14)

14 Dec 09:31

New Features

  • Enhanced adjusted sample summary output
    • Sample.__str__() now displays adjustment details (method, trimming
      parameters, design effect, effective sample size) when printing adjusted
      samples (#194, #57).
  • Richer Sample.summary() diagnostics
    • Adjusted sample summary now groups covariate diagnostics, reports design
      effect alongside ESSP/ESS, and surfaces weighted outcome means when
      available.
  • Warning of high-cardinality categorical features in .adjust()
    • Categorical features where ≥80% of values are unique are flagged before
      weight fitting to help identify problematic columns like user IDs
      (#195, #65).
  • Ignored column handling for Sample inputs
    • Sample.from_frame accepts ignore_columns for columns that should remain
      on the dataframe but be excluded from covariates and outcome statistics.
      Ignored columns appear in Sample.df and can be retrieved via
      Sample.ignored_columns().
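
The high-cardinality check mentioned above can be approximated as follows. This sketch reads "≥80% of values are unique" as the ratio of distinct values to non-missing values, which is one possible interpretation; the function name is hypothetical.

```python
def is_high_cardinality(values, threshold=0.8):
    """Flag a categorical column whose distinct-value ratio meets the threshold."""
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return False
    # Share of distinct values among the non-missing entries.
    return len(set(non_missing)) / len(non_missing) >= threshold
```

A column of user IDs, where nearly every value is distinct, would be flagged, while a typical category such as gender or region would not.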

Code Quality & Refactoring

  • Consolidated diagnostics helpers
    • Added _concat_metric_val_var() helper and balance.util._coerce_scalar
      for robust diagnostics row construction and scalar-to-float conversion.
    • Breaking change: Sample.diagnostics() for IPW now always emits
      iteration/intercept summaries plus hyperparameter settings.

Bug Fixes

  • Early validation of null weight inputs
    • Sample.from_frame now raises ValueError when weights contain None,
      NaN, or pd.NA values with count and preview of affected rows.
  • Percentile weight trimming across platforms
    • trim_weights() now computes thresholds via percentile quantiles with
      explicit clipping bounds for consistent behavior across Python/NumPy
      versions.
    • Breaking change: percentile-based clipping may shift by roughly one
      observation at typical limits.
  • IPW diagnostics improvements
    • Fixed multi_class reporting, normalized scalar hyperparameters to floats,
      removed deprecated penalty argument warnings, and deduplicated metric
      entries for stable counts across sklearn versions.

Tests

  • Added Windows and macOS CI testing support
    • Expanded GitHub Actions to run on ubuntu-latest, macos-latest, and
      windows-latest for Python 3.9-3.14.
    • Added tempfile_path() context manager for cross-platform temp file
      handling and configured matplotlib Agg backend via conftest.py.

Contributors

@neuralsorcerer, @talgalili, @wesleytlee

Full Changelog: 0.13.0...0.14.0

0.13.0 (2025-12-02)

02 Dec 16:49

New Features

  • Propensity modeling beyond static logistic regression
    • ipw() now accepts any sklearn classifier via the model argument,
      enabling the use of models like random forests and gradient boosting while
      preserving all existing trimming and diagnostic features. Dense-only
      estimators and models without linear coefficients are fully supported.
      Propensity probabilities are stabilized to avoid numerical issues.
    • Allow customization of logistic regression by passing a configured
      sklearn.linear_model.LogisticRegression instance through the
      model argument. Also, the CLI now accepts
      --ipw_logistic_regression_kwargs JSON to build that estimator directly for
      command-line workflows.
  • Covariate diagnostics
    • Added KL divergence calculations for covariate comparisons (numeric and
      one-hot categorical), exposed via BalanceDF.kld() alongside linked-sample
      aggregation support.
  • Weighting Methods
    • rake() and poststratify() now honour weight_trimming_mean_ratio and
      weight_trimming_percentile, trimming and renormalising weights through the
      enhanced trim_weights(..., target_sum_weights=...) API so the documented
      parameters work as expected (#147).

Documentation

  • Added comprehensive post-stratification tutorial notebook
    (balance_quickstart_poststratify.ipynb) (#141, #142, #143).
  • Expanded poststratify docstring with clear examples and improved statistical
    methods documentation
    (#141).
  • Added project badges to README for build status, Python version support, and
    release tracking
    (#145).
  • Added IPW quickstart tutorial showcasing default logistic regression and
    custom sklearn classifier usage in balance_quickstart.ipynb.
  • Shortened the welcome message shown when importing the package.

Code Quality & Refactoring

  • Raking algorithm refactor

    • Removed ipfn dependency and replaced with a vectorized NumPy
      implementation (_run_ipf_numpy) for iterative proportional fitting,
      resulting in significant performance improvements and eliminating external
      dependency (#135).
  • IPW method refactoring

    • Reduced Cyclomatic Complexity Number (CCN) by extracting repeated code
      patterns into reusable helper functions: _compute_deviance(),
      _compute_proportion_deviance(), _convert_to_dense_array().
    • Removed manual ASMD improvement calculation and now uses existing
      compute_asmd_improvement() from weighted_comparisons_stats.py
  • Type safety improvements

    • Migrated 32 Python files from # pyre-unsafe to # pyre-strict mode,
      covering core modules, statistics, weighting methods, datasets, and test
      files
    • Modernized type hints to PEP 604 syntax (X | Y instead of Union[X, Y])
      across 11 files for improved readability and Python 3.10+ alignment
    • Type alias definitions in typing.py retain Union syntax for Python 3.9
      compatibility
    • Enhanced plotting function type safety with TypedDict definitions and
      proper type narrowing
    • Replaced assert-based type narrowing with _verify_value_type() helper for
      better error messages and pyre-strict compliance
  • Renamed Balance*DF classes to BalanceDF*

    • BalanceCovarsDF to BalanceDFCovars
    • BalanceOutcomesDF to BalanceDFOutcomes
    • BalanceWeightsDF to BalanceDFWeights
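
To make the "Raking algorithm refactor" item above more concrete, here is a minimal two-dimensional iterative proportional fitting (IPF) loop. It is a plain-Python illustration of the technique under simplifying assumptions (two margins, strictly positive seed table); the library's `_run_ipf_numpy` is described as a vectorized NumPy implementation and is not reproduced here.

```python
def ipf_2d(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Rescale a 2-D table until its row and column sums match the targets."""
    fitted = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to match its target margin.
        for i, target in enumerate(row_targets):
            row_sum = sum(fitted[i])
            if row_sum > 0:
                fitted[i] = [v * target / row_sum for v in fitted[i]]
        # Step 2: scale each column to match its target margin.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            if col_sum > 0:
                for row in fitted:
                    row[j] *= target / col_sum
        # Columns are exact after step 2, so convergence is judged on the rows.
        max_err = max(abs(sum(fitted[i]) - t) for i, t in enumerate(row_targets))
        if max_err < tol:
            break
    return fitted
```

The row and column targets must share the same grand total for the procedure to converge to both margins.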

Bug Fixes

  • Utility Functions
    • Fixed quantize() to preserve column ordering and use proper TypeError
      exceptions (#133)
  • Statistical Functions
    • Fixed division by zero in asmd_improvement() when asmd_mean_before is
      zero, now returns 0.0 for 0% improvement
  • CLI & Infrastructure
    • Replaced deprecated argparse FileType with pathlib.Path
      (#134)
  • Weight Trimming
    • Fixed trim_weights() to consistently return pd.Series with
      dtype=np.float64 and preserve original index across both trimming methods
    • Fixed percentile-based winsorization edge case: _validate_limit() now
      automatically adjusts limits to prevent floating-point precision issues
      (#144)
    • Enhanced documentation for trim_weights() and _validate_limit() with
      clearer examples and explanations
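
The division-by-zero fix listed under "Statistical Functions" can be illustrated with a relative-improvement calculation. The formula (before - after) / before is assumed here for illustration, and `asmd_improvement_sketch` is a hypothetical name rather than the library function.

```python
def asmd_improvement_sketch(asmd_mean_before, asmd_mean_after):
    """Relative ASMD improvement, returning 0.0 when there was no imbalance to begin with."""
    # Guard the zero denominator: no prior imbalance means 0% improvement.
    if asmd_mean_before == 0:
        return 0.0
    return (asmd_mean_before - asmd_mean_after) / asmd_mean_before
```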

Tests

  • Enhanced test coverage for weight trimming with
    test_trim_weights_return_type_consistency and 11 comprehensive tests for
    _validate_limit() covering edge cases, error conditions, and boundary
    conditions

Contributors

@neuralsorcerer, @talgalili, @wesleytlee

Full Changelog: 0.12.1...0.13.0

0.12.1 (2025-11-03)

03 Nov 09:30

New Features

  • Added a welcome message when importing the package.

Welcome to balance (Version 0.12.1)!
An open-source Python package for balancing biased data samples.

📖 Documentation: https://import-balance.org/
🛠️ Get Help / Report Issues: https://github.com/facebookresearch/balance/issues/
📄 Citation:
Sarig, T., Galili, T., & Eilat, R. (2023).
balance - a Python package for balancing biased data samples.
https://arxiv.org/abs/2307.06024

Tip: You can access this information at any time with balance.help()

Contributors

@talgalili, @wesleytlee

Full Changelog: 0.12.0...0.12.1

0.12.0 (2025-10-14)

15 Oct 08:02

New Features

  • Support for Python 3.13 + 3.14
    • Updated setup.py and CI/CD integration to include Python 3.13 and 3.14.
    • Removed upper version constraints from numpy, pandas, scipy, and scikit-learn dependencies for Python 3.12+.

Contributors

@talgalili, @wesleytlee

Full Changelog: 0.11.0...0.12.0

0.11.0 (2025-09-24)

24 Sep 08:18

New Features

  • Python 3.12 support - Complete support for Python 3.12 alongside existing Python 3.9, 3.10, and 3.11 support (with CI/CD integration).
    • Implemented Python version-specific dependency constraints - Added conditional version ranges for numpy, pandas, scipy, and scikit-learn that vary based on Python version (e.g., numpy>=1.21.0,<2.0 for Python <3.12, numpy>=1.24.0,<2.1 for Python >=3.12)
    • Pandas compatibility improvements - Replaced value_counts(dropna=False) with groupby().size() in frequency table creation to avoid FutureWarning
    • Fixed various pandas deprecation warnings and improved DataFrame handling
  • Improved raking algorithm - Completely refactored rake weighting from DataFrame-based to array-based ipfn algorithm using multi-dimensional arrays and itertools for better performance and compatibility with latest Python versions. Variables are now automatically alphabetized to ensure consistent results regardless of input order.
  • poststratify method enhancement - New strict_matching parameter (default True) handles cases where sample cells are not present in target data. When False, it issues a warning and assigns weight 0 to uncovered samples.

Bug Fixes

  • Type annotations - Enhanced Pyre type hints throughout the codebase, particularly in utility functions
  • Sample class improvements - Fixed weight type assignment (ensuring float64 type), improved DataFrame manipulation with .infer_objects(copy=False) for pandas compatibility, and enhanced weight setting logic
  • Website dependencies - Updated various website dependencies including Docusaurus and related packages

Tests

Comprehensive test refactoring, including:

  • Enhanced test validation - Added detailed explanations of test methodologies and expected behaviors in docstrings
  • Improved test coverage - Tests now include edge cases like NaN handling, different data types, and error conditions
  • Improved test organization (more granular) across all test modules (test_stats_and_plots.py, test_balancedf.py, test_ipw.py, test_rake.py, test_cli.py, test_weighted_comparisons_plots.py, test_cbps.py, test_testutil.py, test_adjustment.py, test_util.py, test_sample.py)
  • Updated GitHub workflows to include Python 3.12 in build and test matrix
  • Fixed 261 "pandas deprecation" warnings.
  • Added type annotations - Converted test_balancedf.py to pyre-strict.

Documentation

  • GitHub issue template for support questions - Added structured template to help users ask questions about using the balance package

Contributors

@talgalili, @wesleytlee

Full Changelog: 0.10.0...0.11.0

0.10.0 (2025-01-06)

06 Jan 15:48

News

  • In this version, ipw transitioned to sklearn, which enables support for newer Python versions as well as Windows.
  • Updated Python and package compatibility. Balance is now compatible with Python 3.11 but no longer compatible with Python 3.8 due to typing errors. Balance is currently incompatible with Python 3.12 due to the removal of distutils.
  • Updated the license from GPL-v2 to MIT.

New Features

  • Dependency on glmnet has been removed, and the ipw method now uses sklearn.
  • ipw method uses logistic regression with L2-penalties instead of L1-penalties for computational reasons. The transition from glmnet to sklearn and use of L2-penalties will lead to slightly different generated weights compared to previous versions of Balance.
  • Unfortunately, the sklearn-based ipw method is generally slower than the previous version by 2-5x. Consider using the new arguments lambda_min, lambda_max, and num_lambdas for a more efficient search over the ipw penalization space.

Documentation

  • Added links to presentation given at ISA 2023.
  • Fixed misc typos.

Full Changelog: 0.9.0...0.10.0

Contributors

@wesleytlee, @talgalili, @SarigT