Debugging Flaky Tests

Oh no, the tests are failing.

Wait, now they aren’t.

Wait, only when I re-run the test it works.

Wait, I just ran it from scratch and it worked this time!

Uuugh, every time I run the failing test in isolation it works. EVERY time. But when I run it in context it sometimes fails. Why?

Does this sound familiar? I hope not, but it probably does. It can be tempting to just push up the latest change and mash “re-run” on the CI button until a green check shows up, then say “oh yeah tests are flakey and unreliable”. But what can we do to actually solve this problem?

I ran into this problem today. Below I’ll drop an outline of the steps I took to find and resolve the problem.

1. Find the failure state

First thing to do is find a state where the failure happens. I’ve got a test that succeeds every time I run it in isolation, but fails occasionally when the suite is run. Running the whole suite is too slow for me to want to do repeatedly, but running the test alone never produces the error.

I’ve identified that my failure state is when the test is run in a group with other tests.

2. Decrease Iteration Time

Now, in order to find the reason for the test that fails occasionally, I want to find out how I can reduce the time it takes for me to find a failure. I said above that running the whole suite was too slow.

I try using the it.only modifier on just the failing test and ran the suite a bunch of times. No failures. So I add .only to the test immediately preceding the failing one. Then I begin re-running the suite again. Success! The test fails occasionally when its immediate neighbor is run first. We can now produce the failure with a fast iteration time.

FAIL  src/view_state/ActiveEntryViewState.test.ts > active entry view state > ignores update and shows an error if the model is missing for some reason

AssertionError: expected null to be 'entry-not-found' // Object.is equality

- Expected:
"entry-not-found"

+ Received:
null

3. Check for shared state

Often when tests succeed in isolation, but fail in a group, it’s because of some state that’s shared between the tests. In my case, both tests are working with a class called activeEntryViewState:

it.only("ignores update if a different entry has been selected", async () => {
  const { viewStates, activeEntryState, entry, classes } = await freshState();
  // ...other code
  const blocks = activeEntryState.blocksFromEntryModel;
  expect(blocks.length).toBeGreaterThan(0);
});

it.only("ignores update and shows an error if the model is missing for some reason", async () => {
  const { viewStates, activeEntryState, entry } = await freshState();
  // ...other code
  expect(activeEntryState.error).toBe("entry-not-found");
});

But, at the start of each test we I can see that a new test state is constructed inside the test scope. activeEntryState should be a different copy for each test. Let’s make sure of that using a trick I blogged about in another post that assigns a random number to each instance of an object in memory:

I add a few log lines to my tests, prefixed with “ZZZ” so that they’re easy to find and clean up later:

it.only("ignores update if a different entry has been selected", async () => {
  const { viewStates, activeEntryState, entry, classes } = await freshState();
  
  // ... other code
  logUniqueObjectID("ZZZ activeEntryState in test 1", activeEntryState);

  const blocks = activeEntryState.blocksFromEntryModel;
  expect(blocks.length).toBeGreaterThan(0);
});

it.only("ignores update and shows an error if the model is missing for some reason", async () => {
  const { viewStates, activeEntryState, entry } = await freshState();
  
  // ...other code
  logUniqueObjectID("ZZZ activeEntryState in test 2", activeEntryState);
  
  expect(activeEntryState.error).toBe("entry-not-found");
});

Then I run the tests and make sure my two activeEntryState bindings are actually pointing to different objects in memory:

ZZZ activeEntryState in test 1: 0b237c7d
ZZZ activeEntryState in test 2: 5a283579

Yep, different objects. So my problem probably isn’t shared state between tests. Time to look deeper.

4. Follow the Logic

I’ll take a deeper look at the logic of the assertions in the failing test, and then the functions involved in the tests operation. Here’s the function I’m calling on the activeEntryState to simulate my test condition, and the assertions that follow:

  // This is the test call we'll be looking into
  activeEntryState._for_test_forceEntryUpdate(null);

  // This assertion always passes
  await waitFor(() => {
    expect(activeEntryState._for_test_skippedLastUpdateBecause).toBe(
      "null_entry_model",
    );
  });

  // This assertion sometimes fails
  expect(activeEntryState.error).toBe("entry-not-found");

The activeEntryState has a function just for testing that modifies some state, and a property that’s there for the tests to assert on. The last assertion, on the error property is the one that’s failing. Sometimes it’s as expected, but sometimes I’m seeing null in that property instead of "entry-not-found". This is a helpful hint. I’ll look at the function under test on my activeEntryState class.

_for_test_forceEntryUpdate just calls a private function, gotNewEntry, under the hood. I’ll see what it does:

// ...
if (!entryModel) {
  this.error = "entry-not-found";
  this.loading = false;
  this._for_test_skippedLastUpdateBecause = "null_entry_model";
  console.warn(`ZZZ ${getObjectIdFor(this)} gotNewEntry null entry model`);
  return;
}
// ...

That’s the first thing to happen inside that function! I see both bits of state that I’m asserting on getting set, and then there’s a return after. How could the test possibly be failing?

But wait, before getting too worked up, I’ll keep looking.

const blocks = this.getBlocksFromEntry(entryModel, true);
if (blocks === "error") {
  // ...
} else {
  this.mostRecentBlocks = blocks;
  this.error = null;
  this.entryModel = entryModel;
  this.loading = false;
  const entryFeatureFlags = this.entryModel?.featureFlags;
  if (entryFeatureFlags) {
    this.allFeatureFlagsValid =
      validateEntryFeatureFlags(entryFeatureFlags);
  }
}

Aha! At the very end of this function, if there are blocks in the entry we set error = null, that’s what I’m seeing sometimes in my test! Is it possible that this function is getting called multiple times and changing its state before my test assertion runs?

ZZZ entrymodel is null? true
ZZZ null entry model
ZZZ entrymodel is null? false
ZZZ setting blocks

Waahooo! My hypothesis was right. Thus function is getting called twice during the test. There’s also a subtle trickiness to observe if I look at the assertion again:

// After a bit, we should see that the update from the database was ignored
await waitFor(() => {
  expect(activeEntryState._for_test_skippedLastUpdateBecause).toBe(
    "null_entry_model",
  );
});

expect(activeEntryState.error).toBe("entry-not-found");

One of my assertions is in a callback inside of waitFor, and the other is at the top level of the test. These two bits of state are set immediately, one right after the other. If we had put both checks into the waitFor, there wouldn’t be time for a second run of the function under test before the second assertion. But since the first assertion was in an async callback, the runtime has the freedom to process other work between the first assertion and the second assertion.

I’ll dive back into the class under test again and see if anything else is calling the function that we call to clear the entry state.

  private loadEntry(entryGID: GlobalEntryID) {
    this.resetEditor({ loading: true });

    this.unsubscribeFromEntryModel?.();
    this.unsubscribeFromEntryModel = this.entryStore.subToEntry(
      entryGID,
      this.gotNewEntry,
    );

gotNewEntry is the function we were examining before. This seems like a good lead. There’s code here that calls the function whenever a specific entry changes in the database. I’m not totally sure this is the culprit. So just to be sure, I add a call stack to one of my previous log lines:

ZZZ gotNewEntry entrymodel is null? false Error: 
    at ActiveEntryViewState.apply (src/view_state/ActiveEntryViewState.ts:222:7)
    at executeAction (node_modules/mobx/src/core/action.ts:70:19)
    at ActiveEntryViewState@51.gotNewEntry (node_modules/mobx/src/core/action.ts:50:16)
    at Object.next (src/data/stores/EntryStore.ts:72:9)
    at node_modules/dexie/src/live-query/live-query.ts:140:72
    at execInGlobalContext (node_modules/dexie/src/helpers/promise.js:735:13)
    at node_modules/dexie/src/live-query/live-query.ts:140:11

There’s the proof. A live query from Dexie.js is calling gotNewEntry in response to a database change.

5. Fix the Test

In this case, I don’t really mind that the tested function is being called more than once, as long as my checks happen at the right time and the state I care about is verified. The fix for this test is simple, I’ll move the assertion that occasionally breaks into the waitFor block alongside the assertion that always succeeds:

await waitFor(() => {
  expect(activeEntryState._for_test_skippedLastUpdateBecause).toBe(
    "null_entry_model",
  );
  expect(activeEntryState.error).toBe("entry-not-found");
  expect(activeEntryState.loading).toBe(false);
});

And now the tests pass every time ????

Claude Code Hooks for Fun and Focus
January 9, 2026

iOS 16+ List Row Separators
April 9, 2025

Persuasive Resumes
April 7, 2025

Claude Code Hooks for Fun and Focus
January 9, 2026

iOS 16+ List Row Separators
April 9, 2025

Persuasive Resumes
April 7, 2025

Debugging Flaky Tests
November 1, 2024

1. Find the failure state

2. Decrease Iteration Time

3. Check for shared state

4. Follow the Logic

5. Fix the Test

Leave a Reply Cancel reply

Read Next

Cannot find named export
November 12, 2024

Debugging a Module Resolution Problem in Node
November 22, 2024

AI’s Not Trustworthy, But It Is Testworthy
March 10, 2025

Persuasive Resumes
April 7, 2025

Debugging Flaky TestsNovember 1, 2024

1. Find the failure state

2. Decrease Iteration Time

3. Check for shared state

4. Follow the Logic

5. Fix the Test

Leave a Reply Cancel reply

Read Next

Debugging Flaky Tests
November 1, 2024