Skip to content

feat(Probability/ProbabilityMassFunction): total variation distance and MetricSpace instance#35826

Open
quangvdao wants to merge 5 commits intoleanprover-community:masterfrom
quangvdao:quangvdao/pmf-total-variation
Open

feat(Probability/ProbabilityMassFunction): total variation distance and MetricSpace instance#35826
quangvdao wants to merge 5 commits intoleanprover-community:masterfrom
quangvdao:quangvdao/pmf-total-variation

Conversation

@quangvdao
Copy link
Copy Markdown
Collaborator

Define total variation distance on probability mass functions, provide
a MetricSpace instance, and prove the data processing inequality.

The extended TV distance PMF.etvDist is defined in ℝ≥0∞ as
(1/2) ∑ x, absDiff (p x) (q x), using a new symmetric absolute
difference ENNReal.absDiff. The real-valued PMF.tvDist is its
toReal.

New definitions:

  • ENNReal.absDiff: symmetric absolute difference (a - b) + (b - a)
  • PMF.etvDist: extended total variation distance
  • PMF.tvDist: real-valued total variation distance
  • PMF.instMetricSpace: MetricSpace instance on PMF α

Main results:

  • ENNReal.absDiff_triangle: triangle inequality for absDiff
  • ENNReal.absDiff_tsum_le: subadditivity over infinite sums
  • PMF.etvDist_le_one: TV distance is at most 1
  • PMF.etvDist_map_le: data processing inequality (deterministic maps)
  • PMF.etvDist_bind_right_le: data processing inequality (Markov kernels)
  • PMF.etvDist_option_punit: closed form for binary distributions

Supersedes #33680 with a complete, sorry-free implementation.
See Zulip discussion:
https://leanprover.zulipchat.com/#narrow/channel/217875-Is-there-code-for-X.3F/topic/total.20variation.20distance.20between.20two.20PMFs

This PR was authored with the support of Claude (Cursor IDE).


This PR provides a complete formalization of total variation distance for PMFs, including:

  1. ENNReal.absDiff (Mathlib/Data/ENNReal/AbsDiff.lean): A symmetric absolute difference
    for ℝ≥0∞ with triangle inequality, connection to real absolute value, and algebraic properties.
    The tsum-related lemmas (absDiff_tsum_le, tsum_fiber) are placed in the TV distance file
    to avoid pulling topology imports into Data.ENNReal.

  2. PMF.etvDist / PMF.tvDist (Mathlib/Probability/ProbabilityMassFunction/TotalVariation.lean):
    Total variation distance with a MetricSpace instance, data processing inequality for both
    deterministic maps and Markov kernels, and a closed form for binary distributions.

Compared to #33680, this PR:

  • Has no sorrys
  • Defines the distance in ℝ≥0∞ first (following Mathlib convention for edist/dist)
  • Includes the data processing inequality
  • Has full documentation and passes all Mathlib linters

Made with Cursor

@github-actions github-actions bot added the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Feb 26, 2026
@quangvdao quangvdao force-pushed the quangvdao/pmf-total-variation branch from d4dd027 to 2ad4f9d Compare February 26, 2026 21:54
@github-actions github-actions bot removed the merge-conflict The PR has a merge conflict with master, and needs manual merging. (this label is managed by a bot) label Feb 26, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 26, 2026

PR summary 5e5be118ff

Import changes for modified files

No significant changes to the import graph

Import changes for all files
Files Import difference
Mathlib.Data.ENNReal.AbsDiff (new file) 719
Mathlib.Probability.ProbabilityMassFunction.TotalVariation (new file) 1504

Declarations diff

+ absDiff
+ absDiff_comm
+ absDiff_eq_zero
+ absDiff_le_add
+ absDiff_mul_right_le
+ absDiff_self
+ absDiff_toReal
+ absDiff_triangle
+ absDiff_tsub_tsub
+ absDiff_tsum_le
+ dist_eq_tvDist
+ edist_eq_etvDist
+ etvDist
+ etvDist_bind_right_le
+ etvDist_comm
+ etvDist_eq_zero_iff
+ etvDist_le_one
+ etvDist_map_le
+ etvDist_ne_top
+ etvDist_option_punit
+ etvDist_self
+ etvDist_triangle
+ instMetricSpace
+ map_apply_eq
+ pmf_none_eq
+ tsub_le_tsub_add_tsub
+ tsub_mul_le
+ tsum_fiber
+ tsum_tsub_le_tsum_tsub
+ tvDist
+ tvDist_bind_right_le
+ tvDist_comm
+ tvDist_def
+ tvDist_eq_zero_iff
+ tvDist_le_one
+ tvDist_map_le
+ tvDist_nonneg
+ tvDist_option_punit
+ tvDist_self
+ tvDist_triangle

You can run this locally as follows
## summary with just the declaration names:
./scripts/pr_summary/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/pr_summary/declarations_diff.sh long <optional_commit>

The doc-module for scripts/pr_summary/declarations_diff.sh contains some details about this script.


No changes to technical debt.

You can run this locally as

./scripts/reporting/technical-debt-metrics.sh pr_summary
  • The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
  • The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

@quangvdao quangvdao force-pushed the quangvdao/pmf-total-variation branch from 2ad4f9d to fdf603e Compare February 26, 2026 21:57

/-- Symmetric absolute difference in `ℝ≥0∞`, defined as `(a - b) + (b - a)`.
Satisfies `(absDiff a b).toReal = |a.toReal - b.toReal|` when both arguments are finite. -/
protected noncomputable def absDiff (a b : ℝ≥0∞) : ℝ≥0∞ := (a - b) + (b - a)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this satisfy the axioms of an EMetricSpace? Could this be renamed edist and turned into an instance of that?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should as it would create a diamond. From Opus:

Good observation — absDiff does satisfy the EMetricSpace axioms (all four are proved in the file). However, the induced metric topology disagrees with the existing order topology on ℝ≥0∞: under absDiff, finite values are at infinite distance from ⊤ (since absDiff n ⊤ = ∞), so sequences like n → ∞ don't converge, unlike in the order topology. Since EMetricSpace extends TopologicalSpace, this would create a diamond. The absDiff is intended as a pointwise building block for PMF.etvDist, and the MetricSpace instance lives on PMF α where the topologies are consistent.

@joneugster
Copy link
Copy Markdown
Contributor

t-measure-probability

@github-actions github-actions bot added the t-measure-probability Measure theory / Probability theory label Feb 27, 2026
Comment thread Mathlib/Probability/ProbabilityMassFunction/TotalVariation.lean Outdated
quangvdao and others added 4 commits March 7, 2026 04:56
- Update tsum_ite_eq call to new API (function + SummationFilter)
- Remove unused [DecidableEq β] from etvDist_map_le, tvDist_map_le types
- Use classical in proofs instead, per linter guidance

Made-with: Cursor
Both AbsDiff.lean and TotalVariation.lean need the `module` declaration
and `public import` to comply with the new module system.

Made-with: Cursor
Declarations in module files are private by default in Lean 4.29.
Add @[expose] public section to export all declarations.

Made-with: Cursor
Copy link
Copy Markdown
Collaborator

@BoltonBailey BoltonBailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good definition to have, and nice to get the data processing inequality too, thanks!

@EtienneC30
Copy link
Copy Markdown
Member

I don't think we should have this. Total variation should just be a special case of a more general definition for arbitrary measures (possibly based on MeasureTheory.VectorMeasure.variation). Moreover we are planning to move away from PMF to use directly sums of Dirac masses.

@grunweg grunweg added the LLM-generated PRs with substantial input from LLMs - review accordingly label Mar 16, 2026
@mathlib-triage mathlib-triage bot assigned EtienneC30 and unassigned RemyDegenne Apr 6, 2026
@EtienneC30 EtienneC30 removed their assignment Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

LLM-generated PRs with substantial input from LLMs - review accordingly t-measure-probability Measure theory / Probability theory

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants