Releases · facebookresearch/balance

@neuralsorcerer

New Features

Implemented r_indicator() with validated sample-variance formula
- Added a public r_indicator(sample_p, target_p) implementation in
  weighted_comparisons_stats using the documented Eq. 2.2.2 formulation
  over concatenated propensity vectors and explicit input-size validation.
- Added validation for non-finite and out-of-range propensity values,
  and expanded unit coverage for formula correctness and edge cases.
- Added BalanceDFWeights.r_indicator() as a convenience wrapper, so
  sample.weights().r_indicator() computes the r-indicator directly.
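
As a rough illustration of the computation described above, the sketch below assumes the standard definition R = 1 - 2 * S(p), where S is the sample standard deviation (n - 1 denominator) over the concatenated propensity vectors. Whether this matches the library's Eq. 2.2.2 term for term is an assumption, and the name `r_indicator_sketch` is hypothetical, not the balance API.

```python
import math
import statistics


def r_indicator_sketch(sample_p, target_p):
    """Illustrative r-indicator: 1 - 2 * sample std of all propensities.

    Assumption: R = 1 - 2 * S(p) over the concatenated vectors; this is a
    sketch, not the balance implementation.
    """
    propensities = [float(p) for p in list(sample_p) + list(target_p)]
    if len(propensities) < 2:
        raise ValueError("need at least two propensity values")
    for p in propensities:
        # Reject NaN/inf and anything outside the closed interval [0, 1].
        if not math.isfinite(p) or not 0.0 <= p <= 1.0:
            raise ValueError(f"propensity out of range or non-finite: {p!r}")
    return 1.0 - 2.0 * statistics.stdev(propensities)
```

Under this definition, identical propensities give a value of 1, and more dispersed propensities give lower values.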

Deprecations

Sample.design_effect() is deprecated — use sample.weights().design_effect() instead.
The method already exists on BalanceDFWeights; the Sample method now emits a
DeprecationWarning and delegates. Will be removed in balance 0.19.0.
Sample.design_effect_prop() is deprecated — use sample.weights().design_effect_prop() instead.
New method added to BalanceDFWeights. Will be removed in balance 0.19.0.
Sample.plot_weight_density() is deprecated — use sample.weights().plot() instead.
Will be removed in balance 0.19.0.
Sample.covar_means() is deprecated — use sample.covars().mean() instead
(with .rename(index={'self': 'adjusted'}).reindex([...]).T for the same format).
Will be removed in balance 0.19.0.
Sample.outcome_sd_prop() is deprecated — use sample.outcomes().outcome_sd_prop() instead.
New method added to BalanceDFOutcomes. Will be removed in balance 0.19.0.
Sample.outcome_variance_ratio() is deprecated — use sample.outcomes().outcome_variance_ratio() instead.
New method added to BalanceDFOutcomes. Will be removed in balance 0.19.0.
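
The warn-and-delegate pattern used by these deprecations can be sketched as follows. `ToySample` and `ToyWeights` are hypothetical stand-ins, not balance classes, and the design-effect formula is assumed to be Kish's.

```python
import warnings


class ToyWeights:
    """Hypothetical stand-in for a weights view object."""

    def __init__(self, values):
        self._values = [float(v) for v in values]

    def design_effect(self):
        # Assumption: Kish's design effect, n * sum(w^2) / (sum(w))^2.
        n = len(self._values)
        total = sum(self._values)
        return n * sum(w * w for w in self._values) / (total * total)


class ToySample:
    """Hypothetical stand-in for a sample object that owns weights."""

    def __init__(self, weights):
        self._weights = ToyWeights(weights)

    def weights(self):
        return self._weights

    def design_effect(self):
        # Deprecated entry point: warn, then delegate to the new location.
        warnings.warn(
            "design_effect() is deprecated; use weights().design_effect()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.weights().design_effect()
```

Callers that move to the `weights()` form get the same value without the warning.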

LLM/GenAI

Added CLAUDE.md project context files for Claude Code users, covering architecture,
build/test instructions (Meta and open-source), code conventions, and pre-submit checklist.
Updated .github/copilot-instructions.md review checklist to reduce duplication with
CLAUDE.md and add missing conventions (MIT license header, from __future__ import annotations,
factory pattern, seed fixing, deprecation style).

Bug Fixes

prepare_marginal_dist_for_raking / _realize_dicts_of_proportions: fixed memory explosion from LCM expansion
- When proportions had high decimal precision or many covariates were passed,
  the LCM of the individual per-variable array lengths could reach tens of
  millions (or more), causing OOM crashes.
- Both functions now accept a max_length parameter (default 10000). When
  the natural LCM exceeds max_length, the output is capped at max_length
  rows and counts are allocated via the Hare-Niemeyer (largest remainder)
  method, which guarantees the total stays exactly max_length with minimal
  rounding error per category.
- A warning is logged whenever the cap is applied.
- A new internal helper _hare_niemeyer_allocation implements the allocation logic.
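
For readers unfamiliar with the largest remainder method, here is a minimal sketch of the allocation step. The function name and signature are hypothetical; this is not the library's `_hare_niemeyer_allocation`, only an illustration of the technique.

```python
import math


def largest_remainder_allocation(proportions, total):
    """Split `total` integer counts across categories by largest remainder."""
    if total < 0 or not proportions:
        raise ValueError("need a non-negative total and at least one proportion")
    scale = sum(proportions)
    if scale <= 0:
        raise ValueError("proportions must sum to a positive value")
    # Ideal (fractional) share of each category.
    quotas = [p / scale * total for p in proportions]
    # Start from the integer part of each share.
    counts = [math.floor(q) for q in quotas]
    leftover = total - sum(counts)
    # Hand the remaining units to the largest fractional parts;
    # ties go to the earlier category so the result is deterministic.
    order = sorted(range(len(quotas)), key=lambda i: (-(quotas[i] - counts[i]), i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

Because every category starts from the floor of its quota and at most one extra unit is added, each count differs from its ideal share by less than one, and the counts always sum to `total`.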

Contributors

@neuralsorcerer, @talgalili

Full Changelog: 0.17.0...0.18.0

@sahil350

Breaking Changes

CLI: unmentioned columns now go to ignore_columns instead of outcome_columns
- Previously, when --outcome_columns was not explicitly set, all columns that
  were not the id, weight, or a covariate were automatically classified as
  outcome columns. Now those columns are placed into ignore_columns instead.
- Columns that are explicitly mentioned — the id column, weight column,
  covariate columns, and outcome columns — are not ignored.

New Features

ASCII comparative histogram and plot improvements
- Added ascii_comparative_hist for comparing multiple distributions against a
  baseline using inline visual indicators (█, ▒, ▐, ░).
- Comparative ASCII plots now order datasets as population → adjusted → sample.
- ascii_plot_dist accepts a new comparative keyword (default True) to
  toggle between comparative and grouped-bar histograms for numeric variables.

Code Quality & Refactoring

Moved dataset loading implementations out of balance.datasets.__init__
- Refactored load_sim_data, load_cbps_data, and load_data into
  balance.datasets.loading_data and re-exported them from
  balance.datasets to preserve the public API while keeping module
  responsibilities focused.

Documentation

ASCII plot documentation and tutorial examples
- Added rendered text-plot examples to ASCII plot docstrings and documented
  library="balance" support. Updated balance_quickstart.ipynb with
  adjusted vs unadjusted ASCII plot examples.
Improved keep_columns documentation
- Updated docstrings for has_keep_columns(), keep_columns(), and the
  --keep_columns argument to clarify that keep columns control which columns
  appear in the final output CSV. Keep columns that are not id, weight,
  covariate, or outcome columns will be placed into ignore_columns during
  processing but are still retained and available in the output.
Clarified _prepare_input_model_matrix argument docs
- Updated docstrings in balance.utils.model_matrix with
  explicit descriptions for sample, target, variables, and add_na
  behavior when preparing model-matrix inputs.

Bug Fixes

Weight diagnostics now consistently accept DataFrame inputs
- design_effect, nonparametric_skew, prop_above_and_below, and
  weighted_median_breakdown_point now explicitly normalize DataFrame inputs
  to their first column before computation, matching validation behavior and
  returning scalar/Series outputs consistently.
Model-matrix robustness improvements
- _make_df_column_names_unique() now avoids suffix collisions when columns
  like a, a_1, and repeated a names appear together, renaming
  duplicates deterministically to prevent downstream clashes.
- _prepare_input_model_matrix() now raises a deterministic ValueError
  when the input sample has zero rows, instead of relying on an assertion.
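
One way to rename duplicates without colliding with existing names such as a_1 is sketched below. The function name is hypothetical and the exact suffix scheme used by `_make_df_column_names_unique()` may differ.

```python
def make_names_unique(names):
    """Rename repeated names with numeric suffixes that never collide."""
    # Every original name is off limits as a generated name.
    reserved = set(names)
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            # First occurrence keeps its name.
            seen.add(name)
            result.append(name)
            continue
        # Repeated name: find the smallest suffix that is still free.
        k = 1
        while f"{name}_{k}" in reserved:
            k += 1
        new_name = f"{name}_{k}"
        reserved.add(new_name)
        seen.add(new_name)
        result.append(new_name)
    return result
```

With this scheme the input a, a_1, a becomes a, a_1, a_2, regardless of where the existing a_1 appears in the list.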
Stabilized prop_above_and_below() return paths
- prop_above_and_below() now builds concatenated outputs only from present
  Series objects and returns None when both below and above are None,
  avoiding ambiguous concat inputs while preserving existing behavior for valid
  threshold sets.
Validated and normalized comma-separated CLI column arguments
- CLI column-list arguments now trim surrounding whitespace and reject empty
  entries (for example, "id,,weight") with clear ValueError messages,
  preventing malformed column specifications from silently propagating.
- Applied to --covariate_columns, --covariate_columns_for_diagnostics,
  --batch_columns, --keep_columns, and --outcome_columns parsing.
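
The trimming and empty-entry check can be sketched as follows. `parse_column_list` is a hypothetical helper, not the CLI's actual parser, and the error message is illustrative.

```python
def parse_column_list(raw, argument_name="columns"):
    """Split a comma-separated string into trimmed, non-empty column names."""
    parts = [part.strip() for part in raw.split(",")]
    # An empty entry (for example from "id,,weight") is rejected outright.
    if any(part == "" for part in parts):
        raise ValueError(
            f"{argument_name} contains an empty column name: {raw!r}"
        )
    return parts
```

For example, `parse_column_list(" id , weight ")` yields the two names without surrounding whitespace, while `parse_column_list("id,,weight")` raises.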

Tests

Added end-to-end adjustment test with ASCII plot output and expanded ASCII plot edge-case coverage
- TestAsciiPlotsAdjustmentEndToEnd runs the full adjustment pipeline and
  asserts exact expected ASCII output. Added tests for ascii_plot_dist with
  comparative=False and mixed categorical+numeric routing.
Expanded warning coverage for Sample.from_frame() ID inference
- Added assertions that validate all three expected warnings are emitted when inferring an id column and default weights, including ID guessing, ID string casting, and automatic weight creation.
Expanded IPW helper and diagnostics test coverage
- Added tests for link_transform() and calc_dev() to validate behavior
  for extreme probabilities and finite 10-fold deviance summaries.
- Refactored diagnostics tests to use a shared IPW setup helper, added
  edge-case assertions for solver/penalty values, NaN coercion of non-scalar
  inputs, and now assert labels match fitted model parameters.
Expanded prop_above_and_below() edge-case coverage
- Added focused tests for empty threshold iterables, mixed None threshold groups in dict mode, and explicit all-None threshold handling across return formats.
Added unit coverage for CLI I/O and empty-batch handling
- Added focused tests for BalanceCLI.process_batch() empty-sample failure payloads, load_and_check_input() CSV loading paths, and write_outputs() delimiter-aware output writing for both adjusted and diagnostics files.

Contributors

@sahil350, @neuralsorcerer, @talgalili

Full Changelog

0.16.0...0.17.0

@neuralsorcerer

New Features

Outcome weight impact diagnostics
- Added paired outcome-weight impact tests (y*w0 vs y*w1) with confidence intervals.
- Exposed in BalanceDFOutcomes, Sample.diagnostics(), and the CLI via
  --weights_impact_on_outcome_method.
Pandas 3 support
- Updated compatibility and tests for pandas 3.x
Categorical distribution metrics without one-hot encoding
- KLD/EMD/CVMD/KS on BalanceDF.covars() now operate on raw categorical variables
  (with NA indicators) instead of one-hot encoded columns.
Misc
- Raw-covariate adjustment for custom models
  - Sample.adjust() now supports fitting models on raw covariates (without a model matrix)
    for IPW via use_model_matrix=False. String, object, and boolean columns are converted
    to pandas Categorical dtype, allowing sklearn estimators with native categorical
    support (e.g., HistGradientBoostingClassifier with categorical_features="from_dtype")
    to handle them correctly. Requires scikit-learn >= 1.4 when categorical columns are
    present.
- Validate weights include positive values
  - Added a guard in weight diagnostics to error when all weights are zero.
- Support configurable ID column candidates
  - Sample.from_frame() and guess_id_column() now accept candidate ID column names
    when auto-detecting the ID column.
- Formula support for BalanceDF model matrices
  - BalanceDF.model_matrix() now accepts a formula argument to build
    custom model matrices without precomputing them manually.

Bug Fixes

Removed deprecated setup build
- Replaced deprecated setup.py with pyproject.toml build in CI to avoid build failure.
Hardened ID column candidate validation
- guess_id_column() now ignores duplicate candidate names and validates that candidates are non-empty strings.
Hardened pandas 3 compatibility paths
- Updated string/NA handling and discrete checks for pandas 3 dtypes, and refreshed tests to accept string-backed dtypes.

Packaging & Tests

Pandas 3.x compatibility
- Expanded the pandas dependency range to allow pandas 3.x releases.
Direct util imports in tests
- Refactored util test modules to import helpers directly from their modules instead of via balance_util.

Breaking Changes

Require positive weights for weight diagnostics that normalize or aggregate
- design_effect, nonparametric_skew, prop_above_and_below, and
  weighted_median_breakdown_point now raise a ValueError when all weights
  are zero.
- Migration: ensure your weights include at least one positive value
  before calling these diagnostics, or catch the ValueError if all-zero
  weights are possible in your workflow.
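
A minimal sketch of the guard and of the migration path, assuming Kish's design effect. `design_effect_checked` is a hypothetical name, not the balance function, and the error message is illustrative.

```python
def design_effect_checked(weights):
    """Kish's design effect with an explicit all-zero guard (sketch)."""
    w = [float(x) for x in weights]
    # Without at least one positive weight the ratio below is undefined.
    if not any(x > 0 for x in w):
        raise ValueError("weights must include at least one positive value")
    total = sum(w)
    return len(w) * sum(x * x for x in w) / (total * total)


def design_effect_or_none(weights):
    """Migration option: catch the error when all-zero weights can occur."""
    try:
        return design_effect_checked(weights)
    except ValueError:
        return None
```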

Contributors

@neuralsorcerer, @talgalili (with code/methodological review by @talsarig)

Full Changelog: 0.15.0...0.16.0

@neuralsorcerer

New Features

Added EMD/CVMD/KS distribution diagnostics
- BalanceDF now exposes Earth Mover's Distance (EMD), Cramér-von Mises distance (CVMD), and Kolmogorov-Smirnov (KS) statistics for comparing adjusted samples to targets.
- These diagnostics support weighted or unweighted comparisons, apply discrete/continuous formulations, and respect aggregate_by_main_covar for one-hot categorical aggregation.
Exposed outcome columns selection in the CLI
- Added --outcome_columns to choose which columns are treated as outcomes
  instead of defaulting to all non-id/weight/covariate columns. Remaining columns are moved to ignored_columns.
Improved missing data handling in poststratify()
- poststratify() now accepts na_action to either drop rows with missing
  values or treat missing values as their own category during weighting.
- Breaking change: the default behavior now fills missing values in
  poststratification variables with "__NaN__" and treats this as a distinct
  category during weighting. Previously, missing values were not handled
  explicitly, and their treatment depended on pandas groupby and merge
  defaults. To approximate the legacy behavior where missing values do not
  form their own category, pass na_action="drop" explicitly.
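
The difference between the two behaviors can be illustrated with a simplified, unit-weight post-stratification over a single cell variable. This is a sketch: missing values are represented by `None`, the option name "add_indicator" is an assumption (only "drop" is named above), and the real function operates on DataFrames.

```python
from collections import Counter

NA_LABEL = "__NaN__"


def poststratify_sketch(sample_cells, target_cells, na_action="add_indicator"):
    """Return one weight per kept sample row: target count / sample count."""

    def prepare(cells):
        if na_action == "drop":
            # Rows with a missing cell value are removed entirely.
            return [c for c in cells if c is not None]
        if na_action == "add_indicator":
            # Missing values become their own category.
            return [NA_LABEL if c is None else c for c in cells]
        raise ValueError(f"unknown na_action: {na_action!r}")

    sample = prepare(sample_cells)
    target = prepare(target_cells)
    sample_counts = Counter(sample)
    target_counts = Counter(target)
    return [target_counts[c] / sample_counts[c] for c in sample]
```

With the default, a sample row with a missing value is weighted up to the target's missing-value cell; with "drop", that row receives no weight because it is removed.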
Added formula support for descriptive_stats model matrices
- descriptive_stats() now accepts a formula argument that is always
  applied to the data (including numeric-only frames), letting callers
  control which terms and dummy variables are included in summary statistics.

Documentation

Documented the balance CLI
- Added full API docstrings for balance.cli and a new CLI tutorial notebook.
Created Balance CLI tutorial
- Added CLI command echoing, a load_data() example, and richer diagnostics exploration with metric/variable listings and a browsable diagnostics table. https://import-balance.org/docs/tutorials/balance_cli_tutorial/
Synchronized docstring examples with test cases
- Updated user-facing docstrings so the documented examples mirror tested inputs
  and outputs.

Code Quality & Refactoring

Added a warning when the target is much larger than the sample
- Sample.adjust() now warns when the target exceeds 100k rows and is at
  least 10x larger than the sample, highlighting that uncertainty is
  dominated by the sample (akin to a one-sample comparison).
Split util helpers into focused modules
- Broke balance.util into balance.utils submodules for easier navigation.

Bug Fixes

Updated Sample.__str__() to format weight diagnostics like Sample.summary()
- Weight diagnostics (design effect, effective sample size proportion, effective sample size)
  are now displayed on separate lines instead of comma-separated on one line.
- Replaced "eff." abbreviations with full "effective" word for better readability.
- Improves consistency with Sample.summary() output format.
Numerically stable CBPS probabilities
- The CBPS helper now uses a stable logistic transform to avoid exponential
  overflow warnings during probability computation in constraint checks.
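
A common way to make the logistic transform overflow-safe is to branch on the sign of the input so that `exp` is only ever called on non-positive values. The sketch below shows that technique; whether the CBPS helper uses this exact form is an assumption.

```python
import math


def stable_sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)) without overflow for large |x|."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as exp(x) / (1 + exp(x)); exp(x) is below 1.
    z = math.exp(x)
    return z / (1.0 + z)
```

The naive form would call `math.exp(1000)` for an input of -1000 and raise `OverflowError`; the branched form returns 0.0 instead.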
Silenced pandas observed default warning
- Explicitly sets observed=False in weighted categorical KLD calculations
  to retain current behavior and avoid future pandas default changes.
Fixed plot_qq_categorical to respect the weighted parameter for target data
- Previously, the target weights were always applied regardless of the
  weighted=False setting, causing inconsistent behavior between sample
  and target proportions in categorical QQ plots.
Restored CBPS tutorial plots
- Re-enabled scatter plots in the CBPS comparison tutorial notebook while
  avoiding GitHub Pages rendering errors and pandas colormap warnings. https://import-balance.org/docs/tutorials/comparing_cbps_in_r_vs_python_using_sim_data/
Clearer validation errors in adjustment helpers
- trim_weights() now accepts list/tuple inputs and reports invalid types explicitly.
- apply_transformations() raises clearer errors for invalid inputs and empty transformations.
Fixed model_matrix to drop NA rows when requested
- model_matrix(add_na=False) now actually drops rows containing NA values while preserving categorical levels, matching the documented behavior.
- Previously, add_na=False only logged a warning without dropping rows; code relying on the old behavior may now see fewer rows and should either handle missingness explicitly or use add_na=True.

Tests

Aligned formatting toolchain between Meta internal and GitHub CI
- Added ["fbcode/core_stats/balance"] override to Meta's internal tools/lint/pyfmt/config.toml to use formatter = "black" and sorter = "usort".
- This ensures both internal (pyfmt/arc lint) and external (GitHub Actions) environments use the same Black 25.1.0 formatter, eliminating formatting drift.
- Updated CI workflow, pre-commit config, and requirements-fmt.txt to use black==25.1.0.
Added Pyre type checking to GitHub Actions via .pyre_configuration.external and a new pyre job in the workflow. Tests are excluded due to external typeshed stub differences; library code is fully type-checked.
Added test coverage workflow and badge to README via .github/workflows/coverage.yml. The workflow collects coverage using pytest-cov, generates HTML and XML reports, uploads them as artifacts, and displays coverage metrics. A coverage badge is now shown in README.md alongside other workflow badges.
Improved test coverage for edge cases and error handling paths
- Added targeted tests for previously uncovered code paths across the library, addressing edge cases including empty inputs, verbose logging, error handling for invalid parameters, and boundary conditions in weighting methods (IPW, CBPS, rake).
- Tests exercise defensive code paths that handle empty DataFrames, NaN convergence values, invalid model types, and non-convergence warnings.
Split test_util.py into focused test modules
- Split the large test_util.py file (2325 lines) into 5 modular test files that mirror the balance/utils/ structure:
  - test_util_data_transformation.py - Tests for data transformation utilities
  - test_util_input_validation.py - Tests for input validation utilities
  - test_util_model_matrix.py - Tests for model matrix utilities
  - test_util_pandas_utils.py - Tests for pandas utilities (including high cardinality warnings)
  - test_util_logging_utils.py - Tests for logging utilities
- This improves test organization and makes it easier to locate tests for specific utilities.

Contributors

@neuralsorcerer, @talgalili

Full Changelog: 0.14.0...0.15.0

@neuralsorcerer

New Features

Enhanced adjusted sample summary output
- Sample.__str__() now displays adjustment details (method, trimming
  parameters, design effect, effective sample size) when printing adjusted
  samples (#194, #57).
Richer Sample.summary() diagnostics
- Adjusted sample summary now groups covariate diagnostics, reports design
  effect alongside ESSP/ESS, and surfaces weighted outcome means when
  available.
Warning of high-cardinality categorical features in .adjust()
- Categorical features where ≥80% of values are unique are flagged before
  weight fitting to help identify problematic columns like user IDs
  (#195, #65).
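
The check can be sketched as a ratio of distinct values to non-missing values. The function name is hypothetical and the handling of missing values here is an assumption; the library's rule may differ in detail.

```python
def is_high_cardinality(values, threshold=0.8):
    """Flag a column whose share of distinct values meets the threshold."""
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return False
    unique_ratio = len(set(non_missing)) / len(non_missing)
    return unique_ratio >= threshold
```

A column of user IDs has a ratio near 1 and is flagged, while a column with a handful of repeated levels is not.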
Ignored column handling for Sample inputs
- Sample.from_frame accepts ignore_columns for columns that should remain
  on the dataframe but be excluded from covariates and outcome statistics.
  Ignored columns appear in Sample.df and can be retrieved via
  Sample.ignored_columns().

Code Quality & Refactoring

Consolidated diagnostics helpers
- Added _concat_metric_val_var() helper and balance.util._coerce_scalar
  for robust diagnostics row construction and scalar-to-float conversion.
- Breaking change: Sample.diagnostics() for IPW now always emits
  iteration/intercept summaries plus hyperparameter settings.

Bug Fixes

Early validation of null weight inputs
- Sample.from_frame now raises ValueError when weights contain None,
  NaN, or pd.NA values with count and preview of affected rows.
Percentile weight trimming across platforms
- trim_weights() now computes thresholds via percentile quantiles with
  explicit clipping bounds for consistent behavior across Python/NumPy
  versions.
- Breaking change: percentile-based clipping may shift by roughly one
  observation at typical limits.
IPW diagnostics improvements
- Fixed multi_class reporting, normalized scalar hyperparameters to floats,
  removed deprecated penalty argument warnings, and deduplicated metric
  entries for stable counts across sklearn versions.

Tests

Added Windows and macOS CI testing support
- Expanded GitHub Actions to run on ubuntu-latest, macos-latest, and
  windows-latest for Python 3.9-3.14.
- Added tempfile_path() context manager for cross-platform temp file
  handling and configured matplotlib Agg backend via conftest.py.

Contributors

@neuralsorcerer, @talgalili, @wesleytlee

Full Changelog

0.13.0...0.14.0

@neuralsorcerer

New Features

Propensity modeling beyond static logistic regression
- ipw() now accepts any sklearn classifier via the model argument,
  enabling the use of models like random forests and gradient boosting while
  preserving all existing trimming and diagnostic features. Dense-only
  estimators and models without linear coefficients are fully supported.
  Propensity probabilities are stabilized to avoid numerical issues.
- Allow customization of logistic regression by passing a configured
  sklearn.linear_model.LogisticRegression instance through the
  model argument. Also, the CLI now accepts
  --ipw_logistic_regression_kwargs JSON to build that estimator directly for
  command-line workflows.
Covariate diagnostics
- Added KL divergence calculations for covariate comparisons (numeric and
  one-hot categorical), exposed via BalanceDF.kld() alongside linked-sample
  aggregation support.
Weighting Methods
- rake() and poststratify() now honour weight_trimming_mean_ratio and
  weight_trimming_percentile, trimming and renormalising weights through the
  enhanced trim_weights(..., target_sum_weights=...) API so the documented
  parameters work as expected
  (#147).

Documentation

Added comprehensive post-stratification tutorial notebook (balance_quickstart_poststratify.ipynb) (#141, #142, #143).
Expanded poststratify docstring with clear examples and improved statistical methods documentation (#141).
Added project badges to README for build status, Python version support, and release tracking (#145).
Added IPW quickstart tutorial showcasing default logistic regression and custom sklearn classifier usage (balance_quickstart.ipynb).
Shortened the welcome message shown when importing the package.

Code Quality & Refactoring

Raking algorithm refactor
- Removed ipfn dependency and replaced with a vectorized NumPy
  implementation (_run_ipf_numpy) for iterative proportional fitting,
  resulting in significant performance improvements and eliminating external
  dependency (#135).
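
For context, iterative proportional fitting alternately rescales a table so that its margins match target margins. The sketch below shows the idea on a two-way table using plain Python loops to stay dependency-free; it is not the library's vectorized `_run_ipf_numpy`, and the function name and stopping rule are assumptions.

```python
def ipf_2d(table, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Fit a 2-D table to row and column margins by alternating rescaling."""
    fitted = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Step 1: scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(fitted[i])
            if row_sum > 0:
                factor = target / row_sum
                fitted[i] = [v * factor for v in fitted[i]]
        # Step 2: scale each column to its target total.
        max_change = 0.0
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            if col_sum > 0:
                factor = target / col_sum
                max_change = max(max_change, abs(factor - 1.0))
                for row in fitted:
                    row[j] *= factor
        # If the columns needed no adjustment after the row step,
        # both sets of margins are matched and we can stop.
        if max_change < tol:
            break
    return fitted
```

Starting from a uniform 2x2 table with row targets 30 and 70 and column targets 40 and 60, the result is the independence table with cells 12, 18, 28, and 42.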
IPW method refactoring
- Reduced Cyclomatic Complexity Number (CCN) by extracting repeated code
  patterns into reusable helper functions: _compute_deviance(),
  _compute_proportion_deviance(), _convert_to_dense_array().
- Removed manual ASMD improvement calculation and now uses existing
  compute_asmd_improvement() from weighted_comparisons_stats.py
Type safety improvements
- Migrated 32 Python files from # pyre-unsafe to # pyre-strict mode,
  covering core modules, statistics, weighting methods, datasets, and test
  files
- Modernized type hints to PEP 604 syntax (X | Y instead of Union[X, Y])
  across 11 files for improved readability and Python 3.10+ alignment
- Type alias definitions in typing.py retain Union syntax for Python 3.9
  compatibility
- Enhanced plotting function type safety with TypedDict definitions and
  proper type narrowing
- Replaced assert-based type narrowing with _verify_value_type() helper for
  better error messages and pyre-strict compliance
Renamed Balance*DF classes to BalanceDF*
- BalanceCovarsDF to BalanceDFCovars
- BalanceOutcomesDF to BalanceDFOutcomes
- BalanceWeightsDF to BalanceDFWeights

Bug Fixes

Utility Functions
- Fixed quantize() to preserve column ordering and use proper TypeError
  exceptions (#133)
Statistical Functions
- Fixed division by zero in asmd_improvement() when asmd_mean_before is
  zero, now returns 0.0 for 0% improvement
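
The guard can be sketched as follows, assuming the improvement is defined as the relative reduction (before - after) / before. The function name is hypothetical and this is not the library's implementation.

```python
def asmd_improvement_sketch(asmd_mean_before, asmd_mean_after):
    """Relative reduction in mean ASMD, with a guard for a zero baseline."""
    if asmd_mean_before == 0:
        # Nothing to improve on: report 0% improvement instead of dividing by zero.
        return 0.0
    return (asmd_mean_before - asmd_mean_after) / asmd_mean_before
```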
CLI & Infrastructure
- Replaced deprecated argparse FileType with pathlib.Path
  (#134)
Weight Trimming
- Fixed trim_weights() to consistently return pd.Series with
  dtype=np.float64 and preserve original index across both trimming methods
- Fixed percentile-based winsorization edge case: _validate_limit() now
  automatically adjusts limits to prevent floating-point precision issues
  (#144)
- Enhanced documentation for trim_weights() and _validate_limit() with
  clearer examples and explanations

Tests

Enhanced test coverage for weight trimming with
test_trim_weights_return_type_consistency and 11 comprehensive tests for
_validate_limit() covering edge cases, error conditions, and boundary
conditions

Contributors

@neuralsorcerer, @talgalili, @wesleytlee

Full Changelog: 0.12.1...0.13.0

@talgalili

New Features

Added a welcome message when importing the package.

Welcome to balance (Version 0.12.1)!
An open-source Python package for balancing biased data samples.

📖 Documentation: https://import-balance.org/
🛠️ Get Help / Report Issues: https://github.com/facebookresearch/balance/issues/
📄 Citation:
Sarig, T., Galili, T., & Eilat, R. (2023).
balance - a Python package for balancing biased data samples.
https://arxiv.org/abs/2307.06024

Tip: You can access this information at any time with balance.help()

Documentation

Added 'CHANGELOG' to the docs website. https://import-balance.org/docs/docs/CHANGELOG/

Bug Fixes

Fixed plotly figures in all the tutorials. https://import-balance.org/docs/tutorials/

Contributors

@talgalili, @wesleytlee

Full Changelog: 0.12.0...0.12.1

@talgalili

New Features

Support for Python 3.13 + 3.14
- Updated setup.py and CI/CD integration to include Python 3.13 and 3.14.
- Removed upper version constraints from numpy, pandas, scipy, and scikit-learn dependencies for Python 3.12+.

Contributors

@talgalili, @wesleytlee

Full Changelog: 0.11.0...0.12.0

@talgalili

New Features

Python 3.12 support - Complete support for Python 3.12 alongside existing Python 3.9, 3.10, and 3.11 support (with CI/CD integration).
- Implemented Python version-specific dependency constraints - Added conditional version ranges for numpy, pandas, scipy, and scikit-learn that vary based on Python version (e.g., numpy>=1.21.0,<2.0 for Python <3.12, numpy>=1.24.0,<2.1 for Python >=3.12)
- Pandas compatibility improvements - Replaced value_counts(dropna=False) with groupby().size() in frequency table creation to avoid FutureWarning
- Fixed various pandas deprecation warnings and improved DataFrame handling
Improved raking algorithm - Completely refactored rake weighting from DataFrame-based to array-based ipfn algorithm using multi-dimensional arrays and itertools for better performance and compatibility with latest Python versions. Variables are now automatically alphabetized to ensure consistent results regardless of input order.
poststratify method enhancement - New strict_matching parameter (default True) handles cases where sample cells are not present in target data. When False, issues warning and assigns weight 0 to uncovered samples

Bug Fixes

Type annotations - Enhanced Pyre type hints throughout the codebase, particularly in utility functions
Sample class improvements - Fixed weight type assignment (ensuring float64 type), improved DataFrame manipulation with .infer_objects(copy=False) for pandas compatibility, and enhanced weight setting logic
Website dependencies - Updated various website dependencies including Docusaurus and related packages

Tests

Comprehensive test refactoring, including:

Enhanced test validation - Added detailed explanations of test methodologies and expected behaviors in docstrings
Improved test coverage - Tests now include edge cases like NaN handling, different data types, and error conditions
Improved test organization (more granular) across all test modules (test_stats_and_plots.py, test_balancedf.py, test_ipw.py, test_rake.py, test_cli.py, test_weighted_comparisons_plots.py, test_cbps.py, test_testutil.py, test_adjustment.py, test_util.py, test_sample.py)
Updated GitHub workflows to include Python 3.12 in build and test matrix
Fixed 261 "pandas deprecation" warnings.
Added type annotations - Converted test_balancedf.py to pyre-strict.

Documentation

GitHub issue template for support questions - Added structured template to help users ask questions about using the balance package

Contributors

@talgalili, @wesleytlee

Full Changelog

0.10.0...0.11.0

@wesleytlee

News

In this version, we transitioned ipw to use sklearn. This enables support for newer Python versions as well as Windows.
Updated Python and package compatibility. Balance is now compatible with Python 3.11, but no longer compatible with Python 3.8 due to typing errors. Balance is currently incompatible with Python 3.12 due to the removal of distutils.
Updated the license from GPL-v2 to MIT.

New Features

Dependency on glmnet has been removed, and the ipw method now uses sklearn.
ipw method uses logistic regression with L2-penalties instead of L1-penalties for computational reasons. The transition from glmnet to sklearn and use of L2-penalties will lead to slightly different generated weights compared to previous versions of Balance.
Unfortunately, the sklearn-based ipw method is generally slower than the previous version by 2-5x. Consider using the new arguments lambda_min, lambda_max, and num_lambdas for a more efficient search over the ipw penalization space.

Bug Fixes

Fix E721 flake8 issue (see: https://github.com/facebookresearch/balance/actions/runs/5704381365/job/15457952704)

Documentation

Added links to presentation given at ISA 2023.
Fixed misc typos.

Full Changelog

0.9.0...0.10.0

Contributors

@wesleytlee, @talgalili, @SarigT

Releases: facebookresearch/balance

0.18.0 (2026-03-24)
0.17.0 (2026-03-17)
0.16.0 (2026-02-09)
0.15.0 (2026-01-20)
0.14.0 (2025-12-14)
0.13.0 (2025-12-02)
0.12.1 (2025-11-03)
0.12.0 (2025-10-14)
0.11.0 (2025-09-24)
0.10.0 (2025-01-06)