
Conversation

@jeremiedbb (Member) commented Jan 11, 2023

The online updates of the sufficient statistics should be normalized by batch_size, since each update is an average over the batch. When fitting with a constant batch size this doesn't matter, but when using partial_fit on batches of different sizes it has an impact.
It's described in Mairal et al.'s original paper, Online Learning for Matrix Factorization and Sparse Coding, section 3.4.3 "Mini-batch extension" (page 9 of the PDF): https://www.jmlr.org/papers/volume11/mairal10a/mairal10a.pdf
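
For illustration, here is a minimal sketch of the normalized update (not the actual scikit-learn code; the names A, B, X_batch, code and beta are illustrative). The point is that each batch's contribution to the sufficient statistics is divided by its size, so it enters as a per-batch average.

import numpy as np

def update_inner_stats(A, B, X_batch, code, beta):
    # A: (n_components, n_components) and B: (n_features, n_components) accumulated statistics
    # X_batch: (batch_size, n_features) current batch; code: (batch_size, n_components) its sparse codes
    # beta: forgetting factor applied to the old statistics
    batch_size = X_batch.shape[0]
    # dividing by batch_size turns the sum over the batch into an average,
    # so batches of different sizes contribute consistently
    A = beta * A + code.T @ code / batch_size
    B = beta * B + X_batch.T @ code / batch_size
    return A, B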

I computed the final objective function for 100 randomly generated datasets, fitted using partial_fit on batches of various sizes; the objective function is consistently lower with this fix (code below).

from sklearn.decomposition import MiniBatchDictionaryLearning
import numpy as np
 
# batch boundaries: logarithmically spaced, so partial_fit is called on batches of varying sizes
a = np.hstack(([0], np.logspace(1, 3, num=10).astype(int)))
slices = [slice(a[k], a[k+1]) for k in range(len(a) - 1)]

objs = []
for seed in range(100):
    X = np.random.RandomState(seed).random_sample((1000, 100))
    dl = MiniBatchDictionaryLearning(n_components=15, max_iter=10, random_state=0)
    for sl in slices:
        dl.partial_fit(X[sl])
    # final objective on the full dataset: reconstruction error + l1 penalty on the codes
    code = dl.transform(X)
    obj = 0.5 * np.sum((X - code @ dl.components_) ** 2) + dl.alpha * np.sum(np.abs(code))
    objs.append(obj)

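# objs_main and objs_this_pr: the objs arrays collected by running the script above on main and on this branch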
(objs_main - objs_this_pr).min()
# 156.0638606704997
(objs_main - objs_this_pr).max()
#  930.3064058826549
# This corresponds to an improvement of the objective function between 1% and 7%.

It's not really possible to add a test for this, but I think the results above are convincing enough.

Comment on lines +2081 to +2082
>>> np.mean(X_transformed == 0) < 0.5
True
@jeremiedbb (Member, Author) commented:

I changed that because there were doctest failures in some jobs. The computed value was slightly under 0.39 in some jobs and slightly over 0.39 in others.

Since the purpose is to show that the matrix has some sparsity, I think this change is acceptable.

@glemaitre glemaitre self-requested a review January 13, 2023 18:59
@glemaitre (Member) left a comment


Yes, it makes sense looking at the algorithm. LGTM.

@ogrisel ogrisel added this to the 1.2.1 milestone Jan 19, 2023
@ogrisel ogrisel enabled auto-merge (squash) January 19, 2023 11:04
@ogrisel ogrisel merged commit cfd428a into scikit-learn:main Jan 19, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023