Skip to content

Conversation

@jsnajdr
Copy link
Member

@jsnajdr jsnajdr commented Nov 5, 2025

This PR optimizes the algorithm that adjusts the theme seed color if contrast requirements can't be met. This algorithm makes the seed color slightly lighter so that it contrasts sufficiently well with black, or slightly darker for contrast with white. And finds the minimal adjustment needed.

The results can be measured with the "background ramp" unit test:

npm run test:unit packages/theme/src/color-ramps/test/index.test.ts

I expanded the test to include a full matrix of colors. Before this PR, the timing is 3700ms, after this PR it's 3200ms, a 14% improvement.

The PR includes a lot of bugfixes and tweaks that I will comment on individually. I also have ideas to make it even faster, but I wanted to share what I already have.

@jsnajdr jsnajdr requested review from aduth and mirka November 5, 2025 11:54
@jsnajdr jsnajdr self-assigned this Nov 5, 2025
@jsnajdr jsnajdr added [Type] Enhancement A suggestion for improvement. [Package] Theme /packages/theme labels Nov 5, 2025
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on with the line additions in this file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we've expanded the tests, but it still feels a bit excessive. Are we drawing much value from the snapshots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that there is a lot of useful information in these tests, it's just that snapshots are not a good format for them, as they are too big and the useful information is hard to see. We should perform a set of checks on the output instead of just a snapshot diff.

Some valuable things you can see in the snapshot diff:

  • some colors now have warning: true instead of warning: false. I fixed some as they were caused by a bug in the algorithm, but there are a few remaining. I need to check if it's just a rounding error (overflowing by less than CONTRAST_EPSILON should be fine) or another bug. There should be expect calls that verify warning === false explicitly.
  • after algorithm change, colors often change. There is a harmless kind of change just by one point, like from #a6a6a6 to #a7a7a7. The algorithms are inherently approximate, they iterate towards the precise value and stop when reaching certain precision. On the other hand, I'd like to be able to flag big changes, these are unwanted. Not sure yet how to do that.
  • instead of snapshotting the precise colors, we should check if the generated ramps really satisfy the contrast requirements. We have a checkAccessibleCombinations function for that, but it's now used only in Storybook.
  • many colors have changed from something almost white or black, like #fefefe or #010000, to pure #fff and #000. That's a sign that the patch improved something 🙂 These colors are typically the extreme ones, they have the biggest contrast against the seed color from the entire ramp. fgSurface4 is one of them. And it's right that we managed to push them all the way to black/white. When the algorithm used to stop just one point short, it means that it did some rounding or whatever incorrectly.

So, I agree that the current snapshots are excessive, and would like to find a way to convert the above points to expect statements.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • some colors now have warning: true instead of warning: false. I fixed some as they were caused by a bug in the algorithm, but there are a few remaining. I need to check if it's just a rounding error (overflowing by less than CONTRAST_EPSILON should be fine) or another bug. There should be expect calls that verify warning === false explicitly.

This makes sense to me that we should be asserting directly. I'd personally worry that people wouldn't be paying attention to this in snapshot changes.

  • after algorithm change, colors often change. There is a harmless kind of change just by one point, like from #a6a6a6 to #a7a7a7. The algorithms are inherently approximate, they iterate towards the precise value and stop when reaching certain precision. On the other hand, I'd like to be able to flag big changes, these are unwanted. Not sure yet how to do that.

These changes should also manifest through updates in the prebuilt artifacts, right? I think we should also have a test that makes sure that those are in sync.

@github-actions
Copy link

github-actions bot commented Nov 5, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: jsnajdr <[email protected]>
Co-authored-by: aduth <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@jsnajdr jsnajdr force-pushed the optimize/theme-seed-adjustment branch from 0a042d6 to e39c78d Compare November 5, 2025 15:29
@github-actions
Copy link

github-actions bot commented Nov 5, 2025

Flaky tests detected in e136698.
Some tests passed with failed attempts. The failures may not be related to this commit but are still reported for visibility. See the documentation for more information.

🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/19130645907
📝 Reported issues:

} as const;

// Lightness precision we aim for. Approximately 1/256, resolution of an 8-bit number.
export const LIGHTNESS_EPSILON = 4e-3;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming this constant because it's always used to check contrast values. Contrast between two colors, or a contrast "deficit" of a color pair.

{
lightnessConstraint,
taperChromaOptions,
strict = true,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the strict param because it was never used. If true, it caused an exception to be thrown if a matching color can't be found. But we don't need that, we return a lot of information about why a match wasn't found: achieved, reached, deficit, ...

// Set the boundary based on the direction.
const mostContrastingL = direction === 'lighter' ? 1 : 0;
const mostContrastingColor = direction === 'lighter' ? WHITE : BLACK;
const highestContrast = getContrast( reference, mostContrastingColor );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this block few lines up because we use the highestContrast value to calculate the deficit return field.

// If seed has low contrast vs reference, adjusting seed has high impact
// If seed has high contrast vs reference, adjusting seed has low impact
const impactWeight = 1 / getContrast( seed, referenceColor );
const weightedDeficit = deficitVsTarget * impactWeight;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code, weighing the contrast by dividing by the seed-vs-reference contrast, seems wrong to me.

First, seed adjustment helps all constraints equally. If the seed luminance (Y) is halved, the contrast between reference and target approximately doubles. This is universal, no matter where the intermediate "reference" value is.

The right way to measure the deficit in "same units" is to do the opposite: multiply the deficit with seed-vs-reference contrast instead of dividing it. Then the result is always "contrast between seed and target", independent from the intermediate steps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deficit calculation has been completely moved to findColorMeetingRequirements.

let betterSeedL = UNSATISFIED_DIRECTION === 'lighter' ? 0 : 1;
let betterSeedL = MAX_DEFICIT_DIRECTION === 'lighter' ? 0 : 1;
let betterDeficit = -MAX_DEFICIT;
let betterReplaced = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same "Illinois bisection" as used in findColorMeetingRequirements. If we continue to use it (I'm not sure about it) I'll abstract it into a common function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about it

What are your reservations about it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I implemented another optimization that, when adjusting the seed, recomputes only the few "critical" colors instead of the entire ramp. Yesterday, when it was still only in my head, I wasn't yet sure if there will be really two bisection loops that are similar to each other. It turns out there are, and I no longer have the reservations I had.

Before merging this PR, I'll extract the "solve by bisection" function to an util and also address the other minor feedback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now a solveWithBisect helper that's used at two places.

betterSeedL = get( newSeed, [ OKLCH, 'l' ] );
// Only update toReturn when the ramp satisfies all constraints.
toReturn.ramp = iterationResults.rampResults;
} else if ( UNSATISFIED_DIRECTION !== mainDir ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mreishus recently pointed out that this comparison might be wrong. It's true, we want to compare:

iterationResults.MAX_DEFICIT_DIRECTION === MAX_DEFICIT_DIRECTION

The MAX_DEFICIT_DIRECTION (formerly UNSATISFIED_DIRECTION) value specifies in which direction did the first iteration fail to satisfy contrast constraint. And we want to lighten/darken the seed a little bit. But when doing the bisection, it can happen that whe move the seed too much in the opposite direction, and now it can't find contrastful colors in the opposite direction.

In that case, we want to ignore the deficit value because it's not related to our calculation. It's better hardcoded to -MAX_DEFICIT.

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 6, 2025

I pushed one more optimization that has a nice impact. When adjusting the seed, don't recalculate the entire ramp on each iteration, but calculate only the most offending color (typically fgSurface4 for background ramps) and the colors it depends on. After this most offending color reaches target contrast, the adjustment is done. We now calculate only 3 colors instead of 22.

The improvement is from the original 3700ms to just 1900ms, that's 50% 🚀

Copy link
Member

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the explanations, as a lot of this is hard to wrap my head around 😅 From my cursory understanding, this looks good to me, and I'm also judging by lack of any changes in the precompiled build artifacts that this is an effective refactoring.

} )
).toMatchSnapshot();
} );
}, 10000 );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to do something like it.each to avoid increasing the timeout because we're testing so much in the single test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's consider it.each in a follow-up. At this moment I don't want to do it because:

  1. We would lose the benchmark use case of the test
  2. I still hope that we can get way below the 1900ms time and remove the need for extra timeout completely. We're already quite well below the default 5s limit.

Comment on lines 180 to 181
if ( ! contrast.ignoreWhenAdjustingSeed ) {
if (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we combine these into a single if ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined into a single if.

Comment on lines 72 to 74
let MAX_DEFICIT = -Infinity;
let MAX_DEFICIT_DIRECTION: RampDirection = 'lighter';
let MAX_DEFICIT_STEP = 'none';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but maybe while we're already here: These are not constants (both in the fact that they're function specific and we reassign their value), so I don't think they should be capitalized like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, changed the names to normal maxDeficit-like ones.

};
}

// Return minimal set of steps that are needed to calculate `stepName` from the seed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: For this to be associated in autocomplete, it should be TSDoc-formatted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a full tsdoc block for the function.

Comment on lines 220 to 222
addDeps( stepConfig.contrast.reference );
if ( stepConfig.sameAsIfPossible ) {
addDeps( stepConfig.sameAsIfPossible );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible this could get into a loop? Should we also check in the condition above whether result already contains step?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't do a simple loop because we are recursing twice, once for contrast.reference and once for sameAsIfPossible. The loop would need to keep some backtracking stack. In practice the recursion is never very deep, at most 3 levels.

Should we also check in the condition above whether result already contains step?

result is a set, so adding step twice is already harmless. But I added additional check to prevent repeated recursion into an already handled step value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result is a set, so adding step twice is already harmless. But I added additional check to prevent repeated recursion into an already handled step value.

The recursion was what I was worried about, i should have said "infinite loop" to be clearer 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought you're asking about converting the function from recursion to a for loop 🙂 Loop is now prevented by checking if a given step is already in result. And our ramp configs don't have circular refs anyway 🙂

@jsnajdr jsnajdr force-pushed the optimize/theme-seed-adjustment branch from d2bb404 to 1559ddc Compare November 7, 2025 11:19
@jsnajdr jsnajdr enabled auto-merge (squash) November 7, 2025 11:38
@jsnajdr jsnajdr merged commit 728becd into trunk Nov 7, 2025
40 of 44 checks passed
@jsnajdr jsnajdr deleted the optimize/theme-seed-adjustment branch November 7, 2025 11:57
@github-actions github-actions bot added this to the Gutenberg 22.1 milestone Nov 7, 2025
Copy link
Member

@tyxla tyxla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible this broke some tests on trunk? Funny enough, they pass locally, they only fail in CI.

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 9, 2025

Is it possible this broke some tests on trunk? Funny enough, they pass locally, they only fail in CI.

Yes, it's very strange, I also don't see any failures locally. It's like if different machines did some floating-point calculations differently.

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 9, 2025

Disabling the tests in #73122.

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 10, 2025

This is the diff in the snapshot:

         "input": {
    -       "seedComputed": "#4cb3b3",
    +       "seedComputed": "#4db3b3",
            "seedOriginal": "#4db3b3",
    -       "seedUnchanged": false,
    +       "seedUnchanged": true,
          },

It shows that the actual generated colors are still the same. It's the seed color that's wrong.

The original seed color for this failing ramp is hsl(180deg 40% 50%). In RGB this is rgb(30% 70% 70%) and that's hex #4db3b3. The RGB coordinates are [0.3, 0.7, 0.7] exactly.

The "computed" seed color is the same color after some transformations, namely converting first to OKLCH and only then to RGB. After the extra OKLCH transformation the RGB coordinates are very slightly different:

[ 0.29999999999999943, 0.7000000000000003, 0.7000000000000001 ]

This 1e-14 difference is enough to change the hex string from #4db3b3 to #4cb3b3. A very unpleasant quirk in the formatting algorithm.

Note that the CI run was doing it right: it produced the same string for both cases. It's our local macs that do the "wrong" calculation.

I remember that @aduth and @mirka also ran into some floating-point issues when originally creating the theme package?

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 10, 2025

To make the snapshots stable, we'd need to stop using the hex strings and use the rgb(30% 70% 70%) format instead. Here the rounding would be much more friendly.

For the RGB hex string, the 0.3 value is multiplied 255 and the result is 76.5, exactly between 76 and 77. The 76.5 value is rounded up, but the 76.49999 one is rounded down.

@aduth
Copy link
Member

aduth commented Nov 10, 2025

A couple thoughts at a glance:

  • Is there any way we can round or otherwise normalize the math to avoid these floating point issues? For example, Color.js supports a precision argument on serialize'd values, or setting a default Color.defaults.precision
  • I sort-of wonder if we should even care about these slight differences, if the difference is largely imperceptible and not regressing the color into one which doesn't meet our expected constraints. In other words, whether we should have some tolerance for these differences

Changing to another format could make sense if it's an objectively better format to represent the colors accurately.

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 11, 2025

The precision argument is used only for color formats where the numbers are expressed as floating-point. For example, precision makes the difference between rgb(0.3 0.7 0.7) and rgb(0.314 0.716 0723).

For the hex format, the precision is determined by the format itself. It's 1/256 and can't be anything else. And the way the rounding is performed is very unstable. 76.5 is rounded up to 77 (4d), but subtracting just an infinitesimal number leads to 76.49999 and gets rounded down to 76 (4c). For floating point rounding, it can't happen that a tiny change at 1e-15 level produces a jump from 76 to 75.

For our purposes, where we generate colors algorithmically, the hex format with its coarse rounding is not very good. One hex step corresponds to quite a big contrast difference (1.008 to 1.016) and we need to take into account that this rounding is happening. When aiming for the standard-required 4.5 contrast, we need to be aware that it can be rounded to 4.48 and adjust the calculation accordingly.

Specifying the colors as floats is also more "natural" for the OS and the hardware. The calculations are done in floating point. Fancy HDR displays have 10 bits per channel, while hex RGB has just 8 bits.

@mirka
Copy link
Member

mirka commented Nov 14, 2025

I remember that @aduth and @mirka also ran into some floating-point issues when originally creating the theme package?

It's not a machine problem, but a Node version problem. Likely a V8 bug that manifests in Node 24 (according to spec, it's supposed to adhere to IEEE 754 and not differ across implementations). Probably reportable if we isolate a repro.

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 14, 2025

It's not a machine problem, but a Node version problem.

In this case, it's something else than Node.js version. Here is a minimal script that demonstrates the issue.

const { parse, serialize, to, OKLCH, sRGB, ColorSpace } = require( 'colorjs.io/fn' );

ColorSpace.register( OKLCH );
ColorSpace.register( sRGB );

const rgb = parse( 'rgb(70% 30% 30%)' );
const ok = to( rgb, OKLCH );
const backRgb = to( ok, sRGB );
console.log( 'rgb', rgb.coords );
console.log( 'backRgb', backRgb.coords );
console.log( 'formatted', serialize( rgb, { format: 'hex' } ), 'vs', serialize( backRgb, { format: 'hex' } ) );

After converting a color from RGB to OKLCH and back, the coordinates are different. And it has a big impact on the hex formatting. Because 0.4999999 rounds down to 0, while 0.5 rounds up to 1.

I tested this on Node.js 20, 22 and 24 and the output is always the same:

rgb [ 0.7, 0.3, 0.3 ]
backRgb [ 0.6999999999999996, 0.3000000000000007, 0.30000000000000016 ]
formatted #b34d4d vs #b24d4d

Yet the CI failures imply that sometimes we get a different result, namely instead of the 0.699... we get something 0.7-ish.

@mirka
Copy link
Member

mirka commented Nov 14, 2025

Is this different from the problem that prompted #73122? The CI failures on trunk after #73004 merged were all only failing on Node 24, so I assumed it was the same issue.

@jsnajdr
Copy link
Member Author

jsnajdr commented Nov 14, 2025

Is this different from the problem that prompted #73122?

No, it's exactly that problem. But the output is the same across Node.js versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[Package] Theme /packages/theme [Type] Enhancement A suggestion for improvement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants