Fix duplicate word typos in comments and documentation by veeceey · Pull Request #4658 · HypothesisWorks/hypothesis

veeceey · 2026-02-08T15:39:11Z

This PR fixes several duplicate word typos found in comments and documentation:

Changes:

database.py: Fixed "the the" → "the" in GitHubArtifactDatabase comment
shrinker.py: Fixed "to to" → "to" in shrinker comment
changelog.rst: Fixed multiple instances of duplicate "the" and "a" in historical changelog entries

This combines fixes from PRs #4655, #4656, and #4657 as requested by the maintainer.

Combined fixes for: - database.py: "the the" -> "the" - shrinker.py: "to to" -> "to" - changelog.rst: Multiple instances of duplicate "the" and "a"

Zac-HD · 2026-02-08T17:27:43Z

Let's add a check to our ./build.sh lint rule with git grep -nP '\b([a-z]+)\s+\1\b' -- ':!*.svg' ':!*.png' ':!*.jpg' ':!*.sketch'. There are a few false alarms but I think rephrasing is okay; and while it'll miss cross-line issues I don't think that's worth the trouble.

It looks like the CI failures are due to a formatting issue and an unexpected Z3 build failing, tagging @Liam-DeVoe re #4646.

Add a `git grep` check for duplicate words (e.g. "the the") to the `./build.sh lint` rule, and fix a redundant-parentheses formatting issue in shrinker.py that was failing CI check-format. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Zac-HD · 2026-02-09T01:29:20Z

        sys.exit(1)

+    dupes = subprocess.run(


let's not skip the duplicates-check if we happen to violate this too; we can exit(1) just once at the end.

We'll also want to fix existing cases, by either deduplicating them or rephrasing as appropriate.

Done - both points addressed in 6ef9519:

Restructured to use a failed flag so both the dataclass check and the duplicate-word check run before a single exit(1) at the end.

All existing duplicate-word cases across the codebase have been fixed (deduplicated or rephrased).

One small deviation from the suggested pattern: I used ([a-z]{2,}) instead of ([a-z]+) to require at least 2-letter words. With ([a-z]+), there's a false positive in test_patching.py:37 where n n matches across an escaped newline in a string literal (",\n) followed by n=100). That case can't meaningfully be rephrased since it's test data representing actual code formatting. The {2,} minimum avoids single-letter false positives like this while still catching all real duplicate-word typos.

Restructure the duplicate-word check in lint() to set a failure flag instead of calling sys.exit(1) immediately, so all checks run before exiting. Fix all existing duplicate-word cases across the codebase by deduplicating or rephrasing as appropriate. Also require 2+ letter words in the duplicate-word pattern to avoid false positives from single-letter matches in escape sequences. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Liam-DeVoe · 2026-02-09T02:07:03Z

Personally I disagree with adding a lint rule for this because there's no way to locally disable it, and the computer saying that my personal style of writing must be changed to satisfy an arbitrary rule really puts me off.

AI is already good enough to catch this stuff (see: this pr), so let's just wait for such a tool to be available in CI and then run it every few months? Or wait for drive-by contributors? Idk this seems very not worth the CI rule to me.

Liam-DeVoe · 2026-02-09T02:44:47Z

I think we could whitelist the the, of of, etc. But I'm not a fan of disallowing very very, that that.

Zac-HD · 2026-02-09T09:05:45Z

Sure, I don't have very strong feelings about this. I'd lean towards adding a grep -v or a negative-lookbehind to exclude allowed-duplicate words, but if you want to take out the check entirely that's fine with me. @Liam-DeVoe tag me if you need an approve on your push to merge or something.

veeceey · 2026-02-09T20:37:21Z

Thanks for the discussion @Zac-HD @Liam-DeVoe! Given the concerns about false positives with legitimate duplicates ('that that', etc.) and no way to locally disable a grep-based check, I think the simplest approach is to just keep the typo fixes in this PR without adding an automated lint rule. Happy to add a check with exclusions if you'd prefer, but dropping it seems like the path of least resistance here.

veeceey · 2026-02-10T07:38:43Z

Based on the discussion above, it sounds like the consensus is to keep just the typo fixes and drop the automated lint rule. I'll push an update to remove the duplicate-word check from tooling/src/hypothesistooling/__main__.py while keeping all the typo corrections across the codebase.

Per discussion with Zac-HD and Liam-DeVoe, the automated duplicate-word grep check has too many false positives for legitimate constructs like "that that" and no way to locally disable it. Keeping just the typo fixes, dropping the lint rule. Co-Authored-By: Claude Opus 4.6 <[email protected]>

Liam-DeVoe

Let's keep the CI check with just the following unambiguously-incorrect double words whitelisted: the as a to because. That gets ~70% of this PR, and we'll just accept the rest as casualties.

I was tempted to use a blacklist, but these two diffs make me think a generic blacklist is not effective enough:

…ste changes Per Liam-DeVoe and Zac-HD's feedback: - Revert the "that that" -> "the" change in api-style.rst (intentional construct) - Re-add duplicate-word lint check, but only for unambiguously incorrect duplicates: the, as, a, to, because - Use deferred exit so all lint issues are reported before failing

Liam-DeVoe

Thanks Varun!

Fix duplicate word typos in comments and documentation

fc7d767

Combined fixes for: - database.py: "the the" -> "the" - shrinker.py: "to to" -> "to" - changelog.rst: Multiple instances of duplicate "the" and "a"

veeceey requested review from Liam-DeVoe and Zac-HD as code owners February 8, 2026 15:39

This was referenced Feb 8, 2026

Fix duplicate 'to' in shrinker.py comment #4656

Closed

Fix duplicate 'the' in database.py comment #4655

Closed

Fix duplicate words in changelog.rst #4657

Closed

Zac-HD reviewed Feb 9, 2026

View reviewed changes

Liam-DeVoe reviewed Feb 10, 2026

View reviewed changes

Comment thread guides/api-style.rst

veeceey and others added 4 commits February 10, 2026 02:50

revert more matters of taste

26fee6a

format

19d47ce

clean up

d5443c9

Liam-DeVoe enabled auto-merge February 11, 2026 03:54

Liam-DeVoe approved these changes Feb 11, 2026

View reviewed changes

Liam-DeVoe merged commit 2822e66 into HypothesisWorks:master Feb 11, 2026
78 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix duplicate word typos in comments and documentation#4658

Fix duplicate word typos in comments and documentation#4658
Liam-DeVoe merged 8 commits intoHypothesisWorks:masterfrom
veeceey:fix/typo-combined-fixes

veeceey commented Feb 8, 2026

Uh oh!

Zac-HD commented Feb 8, 2026

Uh oh!

Zac-HD Feb 9, 2026

Uh oh!

veeceey Feb 9, 2026

Uh oh!

Liam-DeVoe commented Feb 9, 2026

Uh oh!

Liam-DeVoe commented Feb 9, 2026

Uh oh!

Zac-HD commented Feb 9, 2026

Uh oh!

veeceey commented Feb 9, 2026

Uh oh!

veeceey commented Feb 10, 2026

Uh oh!

Liam-DeVoe left a comment

Uh oh!

Uh oh!

Liam-DeVoe left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

veeceey commented Feb 8, 2026

Uh oh!

Zac-HD commented Feb 8, 2026

Uh oh!

Zac-HD Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

veeceey Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Liam-DeVoe commented Feb 9, 2026

Uh oh!

Liam-DeVoe commented Feb 9, 2026

Uh oh!

Zac-HD commented Feb 9, 2026

Uh oh!

veeceey commented Feb 9, 2026

Uh oh!

veeceey commented Feb 10, 2026

Uh oh!

Liam-DeVoe left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Liam-DeVoe left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants