Warn on high-cardinality NA features producing uniform ipw weights by neuralsorcerer · Pull Request #195 · facebookresearch/balance

neuralsorcerer · 2025-12-03T06:53:40Z

Adds detection of high-cardinality categorical columns with missing values before IPW fitting, and surfaces them in the warning raised when all weights come out identical, to flag potential causes. Also adds tests.

Closes #65: "[FEATURE] check (raise warning) if features are provided that lead to all equal weights"

Copilot

Pull request overview

This PR adds detection and warning functionality for high-cardinality categorical features with missing values that can cause uniform IPW weights, addressing issue #65. The implementation detects these problematic columns before model fitting and includes them in the warning message when all weights become identical.

Key changes:

Added high-cardinality detection logic that identifies categorical columns where ≥80% of non-NA values are unique
Enhanced the uniform weights warning to list potentially problematic high-cardinality columns
Added comprehensive test coverage for the new detection functionality
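To make the "≥80% of non-NA values are unique" criterion concrete, here is a minimal sketch of the ratio it describes. The helper name `unique_ratio` and the sample data are assumptions made for illustration; they are not taken from the PR's diff.

```python
import pandas as pd


def unique_ratio(col: pd.Series) -> float:
    """Share of non-NA values that are distinct (hypothetical helper)."""
    non_na = col.dropna()
    if non_na.empty:
        # A column with only missing values has no cardinality to measure.
        return 0.0
    return non_na.nunique() / len(non_na)


# Illustrative data: an id-like column and a low-cardinality column, each with one NA.
ids = pd.Series(["a", "b", "c", "d", None])     # 4 distinct out of 4 non-NA values
groups = pd.Series(["x", "x", "y", "y", None])  # 2 distinct out of 4 non-NA values

print(unique_ratio(ids) >= 0.8)     # True: would be flagged at the 0.8 threshold
print(unique_ratio(groups) >= 0.8)  # False: ratio is 0.5
```

Missing values are dropped before counting, so an NA never counts as its own category in this sketch.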

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
balance/weighting_methods/ipw.py	Implements high-cardinality detection logic and enhances warning message to include problematic columns when uniform weights are detected
tests/test_ipw.py	Adds three test cases covering high-cardinality detection for object dtype, categorical dtype, and low-cardinality scenarios

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

talgalili · 2025-12-03T07:22:31Z

Please fix lint :)

talgalili

Good progress!

Now that I'm looking at this closer, I think the way the 'issue' was written is not clear enough - so I'd like to clarify further.

Several comments:

All of this new logic should live in an external _function, so that ipw doesn't get more complex and the logic is easier to test and validate.
Magic numbers (0.8) should be arguments of that function, maybe with a way to set them through a package-wide option. That can be left as a TODO comment; just a thought. BTW, the number can probably be much lower, e.g. 50%. But sure, let's start with 80%.
The high-cardinality check should run in general, regardless of how the weights turn out (all equal to 1 or not). For example, if the user accidentally includes the 'id' column as a feature, or adds a column of user names or something similar, we want a warning so the user can see that this column is problematic and may not be relevant to the model (and should be either removed or bucketed).
The point is not NA features; it's features with high cardinality (which might also contain NAs).
It's worth adding the cardinality number itself, per column, to the warning message, and sorting the columns in the message from high to low cardinality (DESC).
All of these should be tested in the test plan.
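The requests above could be sketched roughly as follows. This is an illustration only, not the code that was merged: the function names, the warning wording, and the sample data are assumptions, and for simplicity the sketch treats every non-numeric column as categorical.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def find_high_cardinality_features(df: pd.DataFrame, threshold: float = 0.8):
    """Return (column, n_unique, ratio) tuples, highest cardinality first.

    Hypothetical helper: the name and return shape are assumptions.
    TODO: the threshold could later be read from a package-wide setting.
    """
    flagged = []
    for name in df.columns:
        col = df[name]
        if pd.api.types.is_numeric_dtype(col):
            continue  # simplification: only non-numeric columns are checked
        non_na = col.dropna()
        if non_na.empty:
            continue
        n_unique = non_na.nunique()
        ratio = n_unique / len(non_na)
        if ratio >= threshold:
            flagged.append((name, n_unique, ratio))
    # Sort descending by the number of unique values.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


def warn_on_high_cardinality(df: pd.DataFrame, threshold: float = 0.8):
    """Log one warning listing flagged columns with their cardinality."""
    flagged = find_high_cardinality_features(df, threshold)
    if flagged:
        details = ", ".join(
            f"{name} (unique={n_unique}, ratio={ratio:.2f})"
            for name, n_unique, ratio in flagged
        )
        logger.warning("High-cardinality features detected: %s", details)
    return flagged


# Illustrative data: an id column and a names column added by accident.
df = pd.DataFrame(
    {
        "user_id": ["u1", "u2", "u3", "u4", "u5"],
        "name": ["ann", "bob", "cat", "dan", None],
        "gender": ["f", "m", "f", "m", "f"],
        "age": [21, 34, 45, 52, 60],
    }
)
flagged = warn_on_high_cardinality(df)
```

Because the check is a standalone function that takes only the feature DataFrame, it can run before fitting regardless of how the weights turn out, and it can be unit-tested without fitting a model.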

THANKS!

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

talgalili

nits

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

meta-codesync · 2025-12-03T11:59:27Z

@talgalili has imported this pull request. If you are a Meta employee, you can view this in D88260566.

talgalili · 2025-12-03T11:59:31Z

LGTM.
I'll send it for internal review.

talgalili · 2025-12-03T14:06:56Z

@neuralsorcerer
FYI: While going over this again carefully, I noticed some things I'd like to change. I'll do it tomorrow, so this might take a day or two to land.

neuralsorcerer · 2025-12-03T14:08:22Z

Sure, thank you @talgalili

talgalili · 2025-12-04T05:00:04Z

FYI:
I finished updating this PR but due to an infra issue, it might take a day (or so?!) to get pushed back to github.

meta-codesync · 2025-12-04T07:37:05Z

@talgalili merged this pull request in 4900fa2.

Warn on high-cardinality NA features producing uniform ipw weights

8775f33

Copilot AI review requested due to automatic review settings December 3, 2025 06:53

meta-cla bot added the cla signed label Dec 3, 2025

Copilot started reviewing on behalf of neuralsorcerer December 3, 2025 06:54

Copilot finished reviewing on behalf of neuralsorcerer December 3, 2025 06:56

Copilot AI reviewed Dec 3, 2025

Implement suggestions

2782522

neuralsorcerer requested a review from Copilot December 3, 2025 07:11

Copilot started reviewing on behalf of neuralsorcerer December 3, 2025 07:12

Copilot finished reviewing on behalf of neuralsorcerer December 3, 2025 07:14

Copilot AI reviewed Dec 3, 2025

Implement suggestions

9fff1d0

Fix lints

408d2c5

talgalili requested changes Dec 3, 2025

Warn on high-cardinality categorical features

8483272

neuralsorcerer requested a review from talgalili December 3, 2025 10:20

talgalili requested changes Dec 3, 2025

talgalili requested a review from Copilot December 3, 2025 10:31

Copilot started reviewing on behalf of talgalili December 3, 2025 10:31

Copilot finished reviewing on behalf of talgalili December 3, 2025 10:34

Copilot AI reviewed Dec 3, 2025

Implement suggestions

475b797

neuralsorcerer requested a review from talgalili December 3, 2025 11:03

talgalili requested changes Dec 3, 2025

talgalili requested a review from Copilot December 3, 2025 11:18

Copilot started reviewing on behalf of talgalili December 3, 2025 11:18

Copilot finished reviewing on behalf of talgalili December 3, 2025 11:21

Copilot AI reviewed Dec 3, 2025

Implement suggestions

248bba9

neuralsorcerer requested a review from talgalili December 3, 2025 11:44

talgalili approved these changes Dec 3, 2025

meta-codesync bot closed this in 4900fa2 Dec 4, 2025

facebook-github-bot added the Merged label Dec 4, 2025

neuralsorcerer deleted the card branch December 4, 2025 09:35
