Skip to content

Fix duplicate word typos in comments and documentation#4658

Merged
Liam-DeVoe merged 8 commits intoHypothesisWorks:masterfrom
veeceey:fix/typo-combined-fixes
Feb 11, 2026
Merged

Fix duplicate word typos in comments and documentation#4658
Liam-DeVoe merged 8 commits intoHypothesisWorks:masterfrom
veeceey:fix/typo-combined-fixes

Conversation

@veeceey
Copy link
Copy Markdown
Contributor

@veeceey veeceey commented Feb 8, 2026

This PR fixes several duplicate word typos found in comments and documentation:

Changes:

  • database.py: Fixed "the the" → "the" in GitHubArtifactDatabase comment
  • shrinker.py: Fixed "to to" → "to" in shrinker comment
  • changelog.rst: Fixed multiple instances of duplicate "the" and "a" in historical changelog entries

This combines fixes from PRs #4655, #4656, and #4657 as requested by the maintainer.

Combined fixes for:
- database.py: "the the" -> "the"
- shrinker.py: "to to" -> "to"
- changelog.rst: Multiple instances of duplicate "the" and "a"
@Zac-HD
Copy link
Copy Markdown
Member

Zac-HD commented Feb 8, 2026

Let's add a check to our ./build.sh lint rule with git grep -nP '\b([a-z]+)\s+\1\b' -- ':!*.svg' ':!*.png' ':!*.jpg' ':!*.sketch'. There are a few false alarms but I think rephrasing is okay; and while it'll miss cross-line issues I don't think that's worth the trouble.

It looks like the CI failures are due to a formatting issue and an unexpected Z3 build failing, tagging @Liam-DeVoe re #4646.

Add a `git grep` check for duplicate words (e.g. "the the") to the
`./build.sh lint` rule, and fix a redundant-parentheses formatting
issue in shrinker.py that was failing CI check-format.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Comment on lines +95 to +97
sys.exit(1)

dupes = subprocess.run(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not skip the duplicates-check if we happen to violate this too; we can exit(1) just once at the end.

We'll also want to fix existing cases, by either deduplicating them or rephrasing as appropriate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - both points addressed in 6ef9519:

  1. Restructured to use a failed flag so both the dataclass check and the duplicate-word check run before a single exit(1) at the end.
  2. All existing duplicate-word cases across the codebase have been fixed (deduplicated or rephrased).

One small deviation from the suggested pattern: I used ([a-z]{2,}) instead of ([a-z]+) to require at least 2-letter words. With ([a-z]+), there's a false positive in test_patching.py:37 where n n matches across an escaped newline in a string literal (",\n) followed by n=100). That case can't meaningfully be rephrased since it's test data representing actual code formatting. The {2,} minimum avoids single-letter false positives like this while still catching all real duplicate-word typos.

Restructure the duplicate-word check in lint() to set a failure flag
instead of calling sys.exit(1) immediately, so all checks run before
exiting. Fix all existing duplicate-word cases across the codebase by
deduplicating or rephrasing as appropriate. Also require 2+ letter
words in the duplicate-word pattern to avoid false positives from
single-letter matches in escape sequences.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@Liam-DeVoe
Copy link
Copy Markdown
Member

Personally I disagree with adding a lint rule for this because there's no way to locally disable it, and the computer saying that my personal style of writing must be changed to satisfy an arbitrary rule really puts me off.

AI is already good enough to catch this stuff (see: this pr), so let's just wait for such a tool to be available in CI and then run it every few months? Or wait for drive-by contributors? Idk this seems very not worth the CI rule to me.

@Liam-DeVoe
Copy link
Copy Markdown
Member

I think we could whitelist the the, of of, etc. But I'm not a fan of disallowing very very, that that.

@Zac-HD
Copy link
Copy Markdown
Member

Zac-HD commented Feb 9, 2026

Sure, I don't have very strong feelings about this. I'd lean towards adding a grep -v or a negative-lookbehind to exclude allowed-duplicate words, but if you want to take out the check entirely that's fine with me. @Liam-DeVoe tag me if you need an approve on your push to merge or something.

@veeceey
Copy link
Copy Markdown
Contributor Author

veeceey commented Feb 9, 2026

Thanks for the discussion @Zac-HD @Liam-DeVoe! Given the concerns about false positives with legitimate duplicates ('that that', etc.) and no way to locally disable a grep-based check, I think the simplest approach is to just keep the typo fixes in this PR without adding an automated lint rule. Happy to add a check with exclusions if you'd prefer, but dropping it seems like the path of least resistance here.

@veeceey
Copy link
Copy Markdown
Contributor Author

veeceey commented Feb 10, 2026

Based on the discussion above, it sounds like the consensus is to keep just the typo fixes and drop the automated lint rule. I'll push an update to remove the duplicate-word check from tooling/src/hypothesistooling/__main__.py while keeping all the typo corrections across the codebase.

Per discussion with Zac-HD and Liam-DeVoe, the automated
duplicate-word grep check has too many false positives for
legitimate constructs like "that that" and no way to locally
disable it. Keeping just the typo fixes, dropping the lint rule.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Copy Markdown
Member

@Liam-DeVoe Liam-DeVoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep the CI check with just the following unambiguously-incorrect double words whitelisted: the as a to because. That gets ~70% of this PR, and we'll just accept the rest as casualties.

I was tempted to use a blacklist, but these two diffs make me think a generic blacklist is not effective enough:

image

Comment thread guides/api-style.rst
veeceey and others added 4 commits February 10, 2026 02:50
…ste changes

Per Liam-DeVoe and Zac-HD's feedback:
- Revert the "that that" -> "the" change in api-style.rst (intentional construct)
- Re-add duplicate-word lint check, but only for unambiguously incorrect
  duplicates: the, as, a, to, because
- Use deferred exit so all lint issues are reported before failing
@Liam-DeVoe Liam-DeVoe enabled auto-merge February 11, 2026 03:54
Copy link
Copy Markdown
Member

@Liam-DeVoe Liam-DeVoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Varun!

@Liam-DeVoe Liam-DeVoe merged commit 2822e66 into HypothesisWorks:master Feb 11, 2026
78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants