Skip to content

Optimize Lossguide#2835

Closed
Levachev wants to merge 39 commits into
catboost:masterfrom
Levachev:dev
Closed

Optimize Lossguide#2835
Levachev wants to merge 39 commits into
catboost:masterfrom
Levachev:dev

Conversation

@Levachev
Copy link
Copy Markdown
Contributor

Optimize Lossguide with subtract trick

@Levachev
Copy link
Copy Markdown
Contributor Author

Levachev commented Mar 21, 2025

Optimizing Lossguide with the subtract trick.

When testing, it was found that the result will be different(compared to the master version) if you do not set the parameters of CatBoostRegressor
random_strength=0,
bootstrap_type="No",
has_time=True
Otherwise, the result will be different due to the fact that Ctx->LearnProgress->Rand.GenRand() will be different when splitting the leaf

const TTrainingDataProviders& data,
const TFold& fold,
ui32 oneHotMaxSize
const TTrainingDataProviders& data,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please restore indentation -- once this is done, i will get back with more comments

@Levachev
Copy link
Copy Markdown
Contributor Author

@Evgueni-Petrov-aka-espetrov restore indentation complete

@a-holm
Copy link
Copy Markdown

a-holm commented Apr 4, 2025

Minor Nit: The PR description is very brief. Given this is a significant algorithmic optimization replacing the previous GreedyTensorSearchLossguide function entirely, could you add a sentence or two to the description briefly outlining the approach (e.g., reusing parent stats via shared pointers passed down during expansion)? This would improve context for future readers/maintainers looking at the PR history.

@Levachev
Copy link
Copy Markdown
Contributor Author

Levachev commented Apr 6, 2025

@a-holm
The most time-consuming part of the original Lossguide implementation is calculating the statistics separately for each child leaf. Due to the subtraction trick, we only calculate the statistics for the smaller node, and for the larger one we subtract the statistics of the smaller node from the statistics of the parent node.
I hope I have clarified the matter.
Link to subtract trick - https://everdark.github.io/k9/notebooks/ml/gradient_boosting/gbt.nb.html

@a-holm
Copy link
Copy Markdown

a-holm commented Apr 6, 2025

@Levachev very cool! Thank you

return;
}
auto candidatesContextsLeftLeaf = SelectFeaturesForScoring(data, {}, fold, ctx);
auto candidatesContextsRightLeaf = SelectFeaturesForScoring(data, {}, fold, ctx);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be identical to candidatesContextsLeftLeaf
could we use copy instead of calling SelectFeaturesForScoring?

}
};

const auto findBestCandidate = [&](TIndexType leftLeaf, TIndexType rightLeaf, const std::shared_ptr<TVector<TBucketStats>> &parent) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this a normal static function to improve readability

const TCandidateInfo* bestSplitCandidate = nullptr;
const double scoreBeforeSplit = CalcScoreWithoutSplit(leaf, *fold, *ctx);
SelectBestCandidate(data, *ctx, candidatesContexts, maxFeatureValueCount, *fold, scoreBeforeSplit, &bestScore, &bestSplitCandidate);
fold->DropEmptyCTRs();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does not optimized GreedyTensorSearchLossguide call DropEmptyCTRs anywhere?
it may free some memory

fold,
ctx);
const size_t maxFeatureValueCount = CalcMaxFeatureValueCount(*fold, candidatesContexts);
CheckInterrupted(); // check after long-lasting operation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does not optimized GreedyTensorSearchLossguide call CheckInterrupted anywhere?
this call is needed to ensure catboost can be interrupted here

Comment on lines +1730 to +1731
const bool needSplitLeftLeaf = leafDepth[leftLeaf] < ctx->Params.ObliviousTreeOptions->MaxDepth
&& leftLeafBoundsSize >= ctx->Params.ObliviousTreeOptions->MinDataInLeaf;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please store max depth and min data in leaf in some variables to avoid code duplication

*parent);

} else {
if(needSplitLeftLeaf) {
Copy link
Copy Markdown
Contributor

@Evgueni-Petrov-aka-espetrov Evgueni-Petrov-aka-espetrov Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need space after if -- please check everywhere

&rightLeafBestSplitCandidate,
&rightLeafBestSplit,
*parent);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove empty line

&rightLeafGain,
&rightLeafBestSplitCandidate,
&rightLeafBestSplit);
leftLeafStats = CalculateWithSubtractTrickNoParentQueue(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls insert empty line before leftLeafStats

&leftLeafBestSplitCandidate,
&leftLeafBestSplit,
*parent);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove empty line

TVector<TBucketStats> leftLeafStats;
TVector<TBucketStats> rightLeafStats;

if ((leftLeafBoundsSize <= rightLeafBoundsSize) && isSubtractTrickAllowed && needSplitLeftLeaf && needSplitRightLeaf) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please structure as follows for better readability

if (isSubtractTrickAllowed && needSplitLeftLeaf && needSplitRightLeaf) {
    if (leftLeafBoundsSize <= rightLeafBoundsSize) {
        // ...
    } else {
        // ...
    }
} else {
    if (needSplitLeftLeaf) {
        // ...
    }
    if (needSplitRightLeaf) {
        // ...
    }
}

@Levachev
Copy link
Copy Markdown
Contributor Author

Levachev commented Apr 28, 2025

I hereby agree to the terms of the CLA available at: link.

Comment on lines +1741 to +1742
auto leftLeafStatsPtr = MakeSimpleShared<TVector<TBucketStats>>(leftLeafStats);
auto rightLeafStatsPtr = MakeSimpleShared<TVector<TBucketStats>>(rightLeafStats);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid unnecessary copy here and in findBestCandidateRoot as follows

leftLeafStatsPtr = MakeSimpleShared<TVector<TBucketStats>>();
leftLeafStatsPtr->swap(leftLeafStats);

Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
Comment thread catboost/private/libs/algo/greedy_tensor_search.cpp Outdated
@Evgueni-Petrov-aka-espetrov
Copy link
Copy Markdown
Contributor

Evgueni-Petrov-aka-espetrov commented May 28, 2025

Since this optim reorders summation of derivatives, canonical outputs of pytest and python-package/ut/medium will change for Lossguide.
Model quality (final prediction error value) is preserved.
However, about 5-10% of predictions drift within 0.1% from their current canonical values.
I will update canonical data on Yandex side before final merge.

@Evgueni-Petrov-aka-espetrov
Copy link
Copy Markdown
Contributor

Shipped!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants