clamp cmov codegen by AlexGuteniev · Pull Request #2336 · microsoft/STL

AlexGuteniev · 2021-11-12T18:54:15Z

clamp in terms of min / max is faster ✨

but let's specialize for std::less and integers/pointers, to avoid pessimizing cases where cmov is not applicable or comparison is expensive

Resolves #2334

Proved by benchmark that it worth doing for interegers, and for /fp:fast floats.

benchmark.cpp

#include <chrono>
#include <iostream>
#include <random>

template<class T>
const T& clamp_branchless(const T& v, const T& minv, const T& maxv) {
	const T& t = v < minv ? minv : v;
	const T& r = maxv < t ? maxv : t;
	return r;
}

template<class T>
const T& clamp_branchy(const T& v, const T& minv, const T& maxv) {
	if (maxv < v) {
		return maxv;
	}
	if (v < minv) {
		return minv;
	}
	return v;
}


template<typename T, std::size_t N>
void benchmark(T(&array)[N], T& result, const char* descr, int R = 1000000) {
	auto t1 = std::chrono::steady_clock::now();
	for (int i = 0; i < R; i++) {
		for (int j = 0; j < N; j++) {
			result += clamp_branchless<T>(array[j], 300, 700);
		}
	}
	auto t2 = std::chrono::steady_clock::now();
	for (int i = 0; i < R; i++) {
		for (int j = 0; j < N; j++) {
			result += clamp_branchy<T>(array[j], 300, 700);
		}
	}
	auto t3 = std::chrono::steady_clock::now();

	auto branchless_time = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
	auto branchy_time = std::chrono::duration_cast<std::chrono::duration<double>>(t3 - t2);

	std::cout << "Type: " << descr << " branchy: " << branchy_time.count() << " branchless: " << branchless_time.count() << "\n";
}

int    int_in[1000];
int    int_out;
double real_in[1000];
double real_out;

int main() {
	std::mt19937 gen(53342);
	std::uniform_int_distribution<int> int_dis(0, 1000);
	for (int& element : int_in) {
		element = int_dis(gen);
	}

	std::uniform_real_distribution<double> real_dis(0, 1000);
	for (double& element : real_in) {
		element = real_dis(gen);
	}

	benchmark(int_in, int_out, "ints");
	benchmark(real_in, real_out, "reals");
}

Benchmark results

Integer results are:
Type: ints branchy: 2.14611 branchless: 0.701921

For /fp:fast floating point results are:
Type: reals branchy: 1.01462 branchless: 1.01654

For /fp:precise floating point results are:
Type: reals branchy: 1.06364 branchless: 1.0455

Branchless clamp is slower when all inputs are in range:

Benchmark results

Integer results are:
Type: ints branchy: 0.600716 branchless: 0.721154

For /fp:fast floating point results are:
Type: reals branchy: 1.02346 branchless: 1.03637

For /fp:precise floating point results are:
Type: reals branchy: 1.02011 branchless: 1.14736

can't use min/max as can't forward _Pred to two places

Yes I Hate Ternary

pointer to member cannot be less, so can match any scalar

lhecker · 2021-11-14T22:23:20Z

@AlexGuteniev Ah thanks, I didn't see the benchmark.
To be entirely honest I believe that your benchmark has a flaw: It doesn't use the value after the write into the out array, due to which only the throughput is measured, but not the latency. In such cases the branchy version will win, thanks to the enormous pipeline depth of modern CPUs, which allows multiple iterations of the same loop to execute concurrently, hiding the latency of any branches. But I believe it's fairly likely that the result of functions like clamp are used right away more often than not, which makes latency just as important as throughput.

You can approximate the latency better, by changing out_* to be a single int or double and adding onto it.

With out_* being an array (input range 400-600):

Type: ints branchy: 0.423876 branchless: 0.598911
Type: reals branchy: 0.62541 branchless: 0.633118

With out_* being a scalar (input range 400-600):

Type: ints branchy: 0.623527 branchless: 0.526837
Type: reals branchy: 0.822351 branchless: 0.626051

With out_* being a scalar (input range 0-1000):

Type: ints branchy: 0.790643 branchless: 0.545034
Type: reals branchy: 0.845853 branchless: 0.623605

I find the consistency of the branchless benchmark results remarkable and I think it shows why they should be preferred whenever possible.

AlexGuteniev · 2021-11-15T06:09:44Z

@lhecker , thanks, I've made an edit to the benhmark

AlexGuteniev · 2021-11-16T08:11:47Z

But I believe it's fairly likely that the result of functions like clamp are used right away more often than not, which makes latency just as important as throughput.

There's separate std::ranges::clamp. Apparently for that one it is unlikely that results are used right away, as it should go through entire range before the next algorithm is invoked.

strega-nil-ms

Minor test notes, the code itself looks good

Co-authored-by: nicole mazzuca <[email protected]>

…ator

strega-nil-ms · 2022-05-26T22:37:12Z

I've done some benchmarking of my own; the following benchmark.cpp (modeled on yours, @AlexGuteniev):

benchmark.cpp

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>

#define DO_STRINGIZE(v) L"" #v
#define STRINGIZE(v)    DO_STRINGIZE(v)
#ifndef BRANCHY
#error "forgot to /D BRANCHY"
#endif

struct BenchmarkHeader {
    wchar_t const* branchy = STRINGIZE(BRANCHY);
    wchar_t const* processor;
    wchar_t const* processor_company;
};

struct BenchmarkSettings {
    int size_of_vector  = INT_MIN;
    int runs            = INT_MIN;
    int minimum         = INT_MIN;
    int maximum         = INT_MIN;
    int clamped_minimum = INT_MIN;
    int clamped_maximum = INT_MIN;
    int trials          = INT_MIN;
};

template <class T>
double benchmark(const T* const first, T const* const last, T& result, const BenchmarkHeader& header,
    const BenchmarkSettings& settings) {

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < settings.runs; ++i) {
        for (const T* it = first; it != last; ++it) {
            result += std::clamp<T>(*it, settings.clamped_minimum, settings.clamped_maximum);
        }
    }
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::duration<double>>(end - start).count() / settings.runs;
}

template <class T>
auto benchmark_type(const BenchmarkHeader& header, const BenchmarkSettings& settings, const wchar_t* type) {
    std::vector<T> input(settings.size_of_vector);
    T output = 0;

    std::wcout << header.branchy << ',' << header.processor << ',' << header.processor_company << ',' << type << ','
               << settings.size_of_vector << ',' << settings.runs << ',' << settings.minimum << ',' << settings.maximum
               << ',' << settings.clamped_minimum << ',' << settings.clamped_maximum;
    for (int i = 0; i < settings.trials; ++i) {
        {
            std::random_device rd;
            auto dist = ([&settings]() {
                if constexpr (std::is_integral_v<T>) {
                    return std::uniform_int_distribution<T>(settings.minimum, settings.maximum);
                } else {
                    return std::uniform_real_distribution<T>(settings.minimum, settings.maximum);
                }
            })();

            std::generate(std::begin(input), std::end(input), [&rd, &dist]() { return dist(rd); });
        }

        auto trial = benchmark(input.data(), input.data() + input.size(), output, header, settings);
        std::wcout << ',' << trial;
    }

    std::wcout << '\n';

    return output;
}

int wmain(const int argc, wchar_t** const argv) {
    static constexpr std::pair<const wchar_t*, const wchar_t*(BenchmarkHeader::*)> header_parameters[] = {
        {L"--processor", &BenchmarkHeader::processor},
        {L"--company", &BenchmarkHeader::processor_company},
    };
    static constexpr std::pair<const wchar_t*, int(BenchmarkSettings::*)> settings_parameters[] = {
        {L"--vector-size", &BenchmarkSettings::size_of_vector},
        {L"--runs", &BenchmarkSettings::runs},
        {L"--min", &BenchmarkSettings::minimum},
        {L"--max", &BenchmarkSettings::maximum},
        {L"--clamp-min", &BenchmarkSettings::clamped_minimum},
        {L"--clamp-max", &BenchmarkSettings::clamped_maximum},
        {L"--trials", &BenchmarkSettings::trials},
    };

    BenchmarkHeader header{};
    BenchmarkSettings settings{};

    for (int arg_i = 1; arg_i < argc; ++arg_i) {
        bool found = false;
        for (auto parm : header_parameters) {
            if (wcscmp(argv[arg_i], parm.first) == 0) {
                if (!(header.*parm.second)) {
                    ++arg_i;
                    header.*parm.second = argv[arg_i];
                } else {
                    std::wcerr << L"passed " << parm.first << L" multiple times\n";
                    return 1;
                }
                found = true;
            }
        }
        for (auto parm : settings_parameters) {
            if (wcscmp(argv[arg_i], parm.first) == 0) {
                if (settings.*parm.second == INT_MIN) {
                    ++arg_i;
                    settings.*parm.second = static_cast<int>(wcstol(argv[arg_i], nullptr, 10));
                } else {
                    std::wcerr << L"passed " << parm.first << L"multiple times\n";
                    return 1;
                }
                found = true;
            }
        }
        if (!found) {
            std::wcerr << L"unrecognized argument " << argv[arg_i] << L'\n';
            return 1;
        }
    }

    for (auto parm : header_parameters) {
        if (!(header.*parm.second)) {
            std::wcerr << L"argument " << parm.first << " not passed\n";
            return 1;
        }
    }
    for (auto parm : settings_parameters) {
        if (settings.*parm.second == INT_MIN) {
            std::wcerr << L"argument " << parm.first << " not passed\n";
            return 1;
        }
    }

    int trials = 5;

    std::wcout << L"Branch?,Processor,Processor Company,Type,N,Runs,Min,Max,Clamped Min,Clamped Max";
    for (int i = 0; i < settings.trials; ++i) {
        std::wcout << L',' << L"Trial " << (i + 1);
    }
    std::wcout << L'\n';

    auto s  = benchmark_type<short>(header, settings, L"short");
    auto i  = benchmark_type<int>(header, settings, L"int");
    auto l  = benchmark_type<long>(header, settings, L"long");
    auto ll = benchmark_type<long long>(header, settings, L"long long");
    auto f  = benchmark_type<float>(header, settings, L"float");
    auto d  = benchmark_type<double>(header, settings, L"double");

    std::wcout << L"\n\noutput:\n";
    std::wcout << L"short,int,long,long long,float,double\n";
    std::wcout << s << ',' << i << ',' << l << ',' << ll << ',' << f << ',' << d << '\n';
}

And got the following raw data:

raw data

Branch?,Processor,Processor Company,Type,N,Runs,Min,Max,Clamped Min,Clamped Max,Trial 1,Trial 2,Trial 3,Trial 4,Trial 5,Average
branchless,x64,intel,short,50,000,000,10,-1000,1000,-300,300,0.0307597,0.0307724,0.0309739,0.0309177,0.0315486,0.03099446
branchless,x64,intel,int,50,000,000,10,-1000,1000,-300,300,0.0261614,0.0262558,0.0266834,0.0265467,0.0261862,0.0263667
branchless,x64,intel,long,50,000,000,10,-1000,1000,-300,300,0.0325709,0.0323946,0.0322038,0.0322979,0.0326036,0.03241416
branchless,x64,intel,long long,50,000,000,10,-1000,1000,-300,300,0.0382545,0.038413,0.0378916,0.0382268,0.0380203,0.03816124
branchless,x64,intel,float,50,000,000,10,-1000,1000,-300,300,0.0581218,0.0568805,0.0568683,0.0571446,0.0570825,0.05721954
branchless,x64,intel,double,50,000,000,10,-1000,1000,-300,300,0.0627476,0.0614245,0.0614438,0.0616488,0.0618331,0.06181956
branchless,x64,intel,short,50,000,000,10,-1000,1000,-1000,1000,0.030247,0.0301774,0.0304275,0.0304509,0.0306174,0.03038404
branchless,x64,intel,int,50,000,000,10,-1000,1000,-1000,1000,0.026132,0.0263663,0.0261524,0.0260969,0.0262399,0.0261975
branchless,x64,intel,long,50,000,000,10,-1000,1000,-1000,1000,0.0356146,0.0322538,0.032345,0.0322297,0.0326115,0.03301092
branchless,x64,intel,long long,50,000,000,10,-1000,1000,-1000,1000,0.0382861,0.0439842,0.0382186,0.0385868,0.041021,0.04001934
branchless,x64,intel,float,50,000,000,10,-1000,1000,-1000,1000,0.0567044,0.0568411,0.0570752,0.05716,0.057088,0.05697374
branchless,x64,intel,double,50,000,000,10,-1000,1000,-1000,1000,0.0618878,0.0612712,0.061348,0.0615576,0.0614417,0.06150126
branchless,x64,intel,short,50,000,000,10,-1000,1000,0,1000,0.0304006,0.030353,0.0302131,0.0304818,0.0312518,0.03054006
branchless,x64,intel,int,50,000,000,10,-1000,1000,0,1000,0.0261579,0.0266862,0.0264275,0.0261135,0.0260856,0.02629414
branchless,x64,intel,long,50,000,000,10,-1000,1000,0,1000,0.0325495,0.032575,0.0324561,0.0323098,0.0326873,0.03251554
branchless,x64,intel,long long,50,000,000,10,-1000,1000,0,1000,0.0382069,0.0382973,0.0382747,0.0380733,0.0378058,0.0381316
branchless,x64,intel,float,50,000,000,10,-1000,1000,0,1000,0.0570075,0.0570859,0.0570695,0.0570084,0.0574737,0.057129
branchless,x64,intel,double,50,000,000,10,-1000,1000,0,1000,0.0619618,0.0612276,0.0611233,0.0630977,0.0614704,0.06177616
branchless,x64,intel,short,50,000,000,10,-1000,1000,-1000,0,0.0304815,0.030353,0.0316145,0.0302088,0.0302104,0.03057364
branchless,x64,intel,int,50,000,000,10,-1000,1000,-1000,0,0.0264647,0.0267136,0.0273937,0.0261126,0.0264696,0.02663084
branchless,x64,intel,long,50,000,000,10,-1000,1000,-1000,0,0.032258,0.0323605,0.0325388,0.0325769,0.036068,0.03316044
branchless,x64,intel,long long,50,000,000,10,-1000,1000,-1000,0,0.0380318,0.0399791,0.0381676,0.0379547,0.0378395,0.03839454
branchless,x64,intel,float,50,000,000,10,-1000,1000,-1000,0,0.0571292,0.0573249,0.0572185,0.0569971,0.0568268,0.0570993
branchless,x64,intel,double,50,000,000,10,-1000,1000,-1000,0,0.0635628,0.0614783,0.0615637,0.0612266,0.0615011,0.0618665
branch,x64,intel,short,50,000,000,10,-1000,1000,-300,300,0.131582,0.131115,0.131612,0.130697,0.130677,0.1311366
branch,x64,intel,int,50,000,000,10,-1000,1000,-300,300,0.141665,0.142679,0.141217,0.142365,0.142126,0.1420104
branch,x64,intel,long,50,000,000,10,-1000,1000,-300,300,0.138849,0.139846,0.138676,0.139194,0.139244,0.1391618
branch,x64,intel,long long,50,000,000,10,-1000,1000,-300,300,0.144001,0.147541,0.142996,0.141286,0.143063,0.1437774
branch,x64,intel,float,50,000,000,10,-1000,1000,-300,300,0.263469,0.263556,0.263109,0.263324,0.263803,0.2634522
branch,x64,intel,double,50,000,000,10,-1000,1000,-300,300,0.269243,0.268038,0.267364,0.268133,0.268234,0.2682024
branch,x64,intel,short,50,000,000,10,-1000,1000,-1000,1000,0.0240649,0.0238211,0.0239027,0.0240058,0.0244858,0.02405606
branch,x64,intel,int,50,000,000,10,-1000,1000,-1000,1000,0.0288381,0.026317,0.0260632,0.0259924,0.0262759,0.02669732
branch,x64,intel,long,50,000,000,10,-1000,1000,-1000,1000,0.0294482,0.0279469,0.0305684,0.03038,0.0291242,0.02949354
branch,x64,intel,long long,50,000,000,10,-1000,1000,-1000,1000,0.0328927,0.0343112,0.0333807,0.0329204,0.0335131,0.03340362
branch,x64,intel,float,50,000,000,10,-1000,1000,-1000,1000,0.0545511,0.0551195,0.0544184,0.0544971,0.0545425,0.05462572
branch,x64,intel,double,50,000,000,10,-1000,1000,-1000,1000,0.058983,0.0586856,0.0588727,0.0588711,0.0587617,0.05883482
branch,x64,intel,short,50,000,000,10,-1000,1000,0,1000,0.0238642,0.0239384,0.0238281,0.0239371,0.0242335,0.02396026
branch,x64,intel,int,50,000,000,10,-1000,1000,0,1000,0.026815,0.0263101,0.0292427,0.0269946,0.0263549,0.02714346
branch,x64,intel,long,50,000,000,10,-1000,1000,0,1000,0.0281906,0.0287167,0.0286152,0.0295415,0.0294489,0.02890258
branch,x64,intel,long long,50,000,000,10,-1000,1000,0,1000,0.0330974,0.0374702,0.0333578,0.0336516,0.0331444,0.03414428
branch,x64,intel,float,50,000,000,10,-1000,1000,0,1000,0.205144,0.205456,0.20791,0.203369,0.202325,0.2048408
branch,x64,intel,double,50,000,000,10,-1000,1000,0,1000,0.20852,0.207115,0.207549,0.207759,0.207784,0.2077454
branch,x64,intel,short,50,000,000,10,-1000,1000,-1000,0,0.152678,0.151846,0.152024,0.151943,0.152173,0.1521328
branch,x64,intel,int,50,000,000,10,-1000,1000,-1000,0,0.15962,0.159133,0.159042,0.158947,0.15809,0.1589664
branch,x64,intel,long,50,000,000,10,-1000,1000,-1000,0,0.1637,0.164116,0.163417,0.164494,0.166192,0.1643838
branch,x64,intel,long long,50,000,000,10,-1000,1000,-1000,0,0.161416,0.163301,0.160363,0.160596,0.160196,0.1611744
branch,x64,intel,float,50,000,000,10,-1000,1000,-1000,0,0.195259,0.196052,0.196406,0.195528,0.197289,0.1961068
branch,x64,intel,double,50,000,000,10,-1000,1000,-1000,0,0.198275,0.198893,0.198614,0.200076,0.198825,0.1989366

Which I have bashed into the following:

Type	CMin	CMax	Branch	Branchless	Branchless Faster
short	-1000	1000	0.02405606	0.03038404	79.17%
int	-1000	1000	0.02669732	0.0261975	101.91%
long	-1000	1000	0.02949354	0.03301092	89.34%
long long	-1000	1000	0.03340362	0.04001934	83.47%
float	-1000	1000	0.05462572	0.05697374	95.88%
double	-1000	1000	0.05883482	0.06150126	95.66%
short	0	1000	0.02396026	0.03054006	78.46%
int	0	1000	0.02714346	0.02629414	103.23%
long	0	1000	0.02890258	0.03251554	88.89%
long long	0	1000	0.03414428	0.0381316	89.54%
float	0	1000	0.2048408	0.057129	358.56%
double	0	1000	0.2077454	0.06177616	336.29%
short	-300	300	0.1311366	0.03099446	423.10%
int	-300	300	0.1420104	0.0263667	538.60%
long	-300	300	0.1391618	0.03241416	429.32%
long long	-300	300	0.1437774	0.03816124	376.76%
float	-300	300	0.2634522	0.05721954	460.42%
double	-300	300	0.2682024	0.06181956	433.85%
short	-1000	0	0.1521328	0.03057364	497.59%
int	-1000	0	0.1589664	0.02663084	596.93%
long	-1000	0	0.1643838	0.03316044	495.72%
long long	-1000	0	0.1611744	0.03839454	419.78%
float	-1000	0	0.1961068	0.0570993	343.45%
double	-1000	0	0.1989366	0.0618665	321.56%

You'll note that there are two cases where branchy wins - the "always in range" case, and the "always in the upper half of the range (for integers)" case. However, it only wins by <20% (normally 5-10%), and the branchless case wins by 3-5x in the cases it's better.

Also, branchless is extremely consistent. The standard deviation of branchy is 0.08, while the standard deviation of branchless is 0.013.

Given this, I am in favor of this change. Marking decision needed for the rest of the team to take a look

lhecker · 2022-05-27T18:33:06Z

@strega-nil-ms Thanks for these exhaustive benchmarks!

Coincidentally there was a Reddit post about cmov just recently and I really liked one comment in particular:

A correctly-predicted branch is slightly faster than a cmov. An incorrectly-predicted branch is incredibly slower than a cmov.

I found that very fitting and I'm glad to see this to be seemingly confirmed. 🙂

StephanTLavavej · 2022-05-27T22:52:27Z

FYI @strega-nil-ms, I pushed minor changes after you approved.

StephanTLavavej · 2022-06-01T03:35:54Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2022-06-02T00:21:28Z

🗜️ 🎉 🗜️

Co-authored-by: nicole mazzuca <[email protected]> Co-authored-by: Stephan T. Lavavej <[email protected]>

AlexGuteniev added 2 commits November 12, 2021 20:29

Try to resolve microsoft#2334

de276fc

min / max magix

d9de2d2

miscco reviewed Nov 12, 2021

View reviewed changes

Comment thread stl/inc/algorithm Outdated

AlexGuteniev added 10 commits November 12, 2021 20:59

accursed code

c040e87

even more accursed

0fa3a95

move only pred there

be1e3c7

and still use debug pred

d2818d1

forward!

531b802

still ternary. and temporary.

b845bf2

can't use min/max as can't forward _Pred to two places

still ternary. and temporary.

d0abcab

can't use min/max as can't forward _Pred to two places

Specialized version.

c4b5b8b

Yes I Hate Ternary

-newline

0d8b3bc

More types

409f4f5

AlexGuteniev marked this pull request as ready for review November 12, 2021 20:15

AlexGuteniev requested a review from a team as a code owner November 12, 2021 20:15

AlexGuteniev added 3 commits November 12, 2021 22:59

deeper check to cover more cases

da12431

still check bounds

9830d2d

remove_cvref_t is C++20

9afc554

StephanTLavavej added the performance Must go faster label Nov 12, 2021

StephanTLavavej requested changes Nov 12, 2021

View reviewed changes

Comment thread stl/inc/algorithm Outdated

Comment thread stl/inc/algorithm Outdated

Comment thread stl/inc/algorithm Outdated

Comment thread stl/inc/algorithm Outdated

Comment thread stl/inc/algorithm

Comment thread stl/inc/algorithm Outdated

AlexGuteniev added 2 commits November 13, 2021 13:43

Improvements

ded2f08

a test

330a34c

AlexGuteniev requested a review from StephanTLavavej November 13, 2021 13:59

AlexGuteniev added 7 commits November 13, 2021 16:05

shorter

87e4af0

wrong changelist

a128c35

remove one weird preprocessor trick

19cc183

-comment

25c7c51

branchless every float

6c08e0c

simplify

acad229

simpler!

ca5d919

pointer to member cannot be less, so can match any scalar

timsong-cpp reviewed Nov 14, 2021

View reviewed changes

Comment thread stl/inc/algorithm

_DEBUG_LT_PRED

1e4e5dd

StephanTLavavej self-assigned this Dec 1, 2021

AlexGuteniev marked this pull request as draft December 2, 2021 00:42

AlexGuteniev marked this pull request as ready for review December 3, 2021 11:33

strega-nil-ms suggested changes May 24, 2022

View reviewed changes

Comment thread tests/std/tests/GH_002334_branchless_clamp/test.cpp Outdated

Comment thread tests/std/tests/GH_002334_branchless_clamp/test.cpp Outdated

Comment thread tests/std/tests/GH_002334_branchless_clamp/test.cpp Outdated

AlexGuteniev and others added 2 commits May 25, 2022 08:13

Apply suggestions from code review

92b55c7

Co-authored-by: nicole mazzuca <[email protected]>

Merge remote-tracking branch 'upstream/main' into i_hate_ternary_oper…

cce919c

…ator

AlexGuteniev requested a review from strega-nil-ms May 25, 2022 19:44

StephanTLavavej assigned strega-nil-ms and unassigned StephanTLavavej May 25, 2022

strega-nil-ms added the decision needed We need to choose something before working on this label May 26, 2022

strega-nil-ms approved these changes May 26, 2022

View reviewed changes

StephanTLavavej unassigned strega-nil-ms May 27, 2022

StephanTLavavej reviewed May 27, 2022

View reviewed changes

Code review feedback.

cf41f2b

StephanTLavavej approved these changes May 27, 2022

View reviewed changes

StephanTLavavej self-assigned this Jun 1, 2022

StephanTLavavej removed the decision needed We need to choose something before working on this label Jun 2, 2022

StephanTLavavej merged commit ae710e7 into microsoft:main Jun 2, 2022

AlexGuteniev deleted the i_hate_ternary_operator branch June 2, 2022 05:32

fsb4000 pushed a commit to fsb4000/STL that referenced this pull request Aug 13, 2022

clamp cmov codegen (microsoft#2336)

6756834

Co-authored-by: nicole mazzuca <[email protected]> Co-authored-by: Stephan T. Lavavej <[email protected]>

Conversation

AlexGuteniev commented Nov 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

lhecker commented Nov 14, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

AlexGuteniev commented Nov 15, 2021

Uh oh!

AlexGuteniev commented Nov 16, 2021

Uh oh!

strega-nil-ms left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

strega-nil-ms commented May 26, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lhecker commented May 27, 2022

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

StephanTLavavej commented May 27, 2022

Uh oh!

StephanTLavavej commented Jun 1, 2022

Uh oh!

StephanTLavavej commented Jun 2, 2022

🗜️ 🎉 🗜️

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

AlexGuteniev commented Nov 12, 2021 •

edited

Loading

lhecker commented Nov 14, 2021 •

edited

Loading

strega-nil-ms commented May 26, 2022 •

edited

Loading