<algorithm>: clamp produces unoptimal assembly with branches #2334

@lhecker

std::clamp is implemented in terms of std::less here:

STL/stl/inc/algorithm

Lines 10178 to 10182 in 178b840

template <class _Ty>
_NODISCARD constexpr const _Ty& clamp(const _Ty& _Val, const _Ty& _Min_val, const _Ty& _Max_val) {
    // returns _Val constrained to [_Min_val, _Max_val]
    return _STD clamp(_Val, _Min_val, _Max_val, less{});
}

Unfortunately, this has the side effect of being harder for MSVC to optimize.
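
For reference, here is a minimal sketch of what a comparator-based clamp has to do. It follows the semantics the C++ standard specifies for `std::clamp` (return the lower bound if the value compares below it, the upper bound if the value compares above it, otherwise the value itself). It is not MSVC's actual source; the name `clamp_sketch` and the order of the two checks are assumptions made for illustration.

```cpp
#include <functional>  // std::less

// Hypothetical helper: mirrors the standard's specified behavior for the
// comparator overload of std::clamp. Not the MSVC STL implementation.
template <class T, class Compare>
constexpr const T& clamp_sketch(const T& val, const T& lo, const T& hi, Compare comp) {
    if (comp(hi, val)) {  // val is above the range
        return hi;
    }
    if (comp(val, lo)) {  // val is below the range
        return lo;
    }
    return val;           // val is already inside [lo, hi]
}

// Compile-time checks of the three cases (requires C++17).
static_assert(clamp_sketch(5, 0, 3, std::less<>{}) == 3);
static_assert(clamp_sketch(-1, 0, 3, std::less<>{}) == 0);
static_assert(clamp_sketch(2, 0, 3, std::less<>{}) == 2);
```

Written this way, the function has two early returns, so a branch-free result depends on the optimizer turning both of them into conditional moves.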

For T = int, std::clamp produces:

int std_clamp(int,int,int) PROC
        cmp     r8d, ecx
        jge     SHORT $LN6@std_clamp
        mov     eax, r8d
        ret     0
$LN6@std_clamp:
        cmp     ecx, edx
        cmovl   ecx, edx
        mov     eax, ecx
        ret     0
int std_clamp(int,int,int) ENDP

MSVC appears to have specific optimizations that compile ternaries into conditional moves (cmov), so an implementation based on std::min and std::max compiles to a version without branches:

template<class T>
constexpr const T& my_clamp(const T& v, const T& lo, const T& hi) {
    return std::max(lo, std::min(hi, v));
}

int min_max_clamp(int,int,int) PROC
        cmp     edx, r8d
        cmovl   r8d, edx
        cmp     ecx, r8d
        cmovl   ecx, r8d
        mov     eax, ecx
        ret     0
int min_max_clamp(int,int,int) ENDP
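
To reproduce the comparison, something like the following could be compiled with optimizations enabled and the generated assembly inspected. The wrapper names are taken from the symbols in the listings, but their bodies and parameter order are assumptions: the issue does not show the wrappers themselves, nor the compiler flags used.

```cpp
#include <algorithm>  // std::clamp, std::min, std::max

// Same body as my_clamp above, repeated so this snippet compiles on its own.
template <class T>
constexpr const T& my_clamp(const T& v, const T& lo, const T& hi) {
    return std::max(lo, std::min(hi, v));
}

// Non-template wrappers so each variant gets its own symbol in the output.
// Parameter order (v, lo, hi) is assumed, not taken from the issue.
int std_clamp(int v, int lo, int hi) {
    return std::clamp(v, lo, hi);
}

int min_max_clamp(int v, int lo, int hi) {
    return my_clamp(v, lo, hi);
}
```

For int and lo <= hi, both wrappers return the same value; only the generated code differs.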

This advantage holds even when the comparison for T is more complex, as it is for std::string.
Clang doesn't need such special treatment, which is why this could be considered a compiler optimization issue.

Labels: fixed (Something works now, yay!), performance (Must go faster)