
@davidperezgar (Member) commented Aug 3, 2025

Implements language detection for readme.txt files to enforce the WordPress.org requirement that readmes be written in standard English.

Implementation

Uses the patrickschur/language-detection library to detect the language, with additional preprocessing:

Enhanced Detection Features

  • ✅ Strips HTML tags, code snippets, URLs, and email addresses before detection
  • ✅ Skips detection for content shorter than 30 characters to avoid false positives on short text
  • ✅ Requires a minimum confidence score before flagging content as non-English
  • ✅ Avoids false positives on technical content
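The preprocessing above can be sketched in Python. The PR's actual implementation is PHP (see the Language_Utils trait and PHPStan checks below), so this only illustrates the idea. The function names `clean_text` and `should_detect` are hypothetical, and the regular expressions are simplified assumptions rather than the patterns used in the PR. Only the 30-character minimum comes from the description.

```python
import re

MIN_LENGTH = 30  # minimum cleaned length, as stated in the PR description


def clean_text(text: str) -> str:
    """Hypothetical helper: remove non-prose content before language detection."""
    # Fenced code blocks and inline backtick code.
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    text = re.sub(r"`[^`]*`", " ", text)
    # <code>/<pre> elements including their contents, then any remaining HTML tags.
    text = re.sub(r"<(code|pre)\b[^>]*>.*?</\1>", " ", text,
                  flags=re.DOTALL | re.IGNORECASE)
    text = re.sub(r"<[^>]+>", " ", text)
    # URLs and email addresses (simplified patterns, an assumption).
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", " ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


def should_detect(text: str) -> bool:
    """Short text gets the benefit of the doubt and is not checked."""
    return len(clean_text(text)) >= MIN_LENGTH
```

For example, `clean_text("See <b>docs</b> at https://example.com or mail a@b.com")` leaves only `"See docs at or mail"`, which is too short to be checked.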

Check Locations & Severity

  • Short Description and Full Description: reported as ERROR (severity 7); both must be written in English
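Deciding whether to report a field can be sketched as a small Python function. The severity and the message text come from this PR; everything else is an assumption. The sketch assumes the detector returns a mapping of language codes to scores, similar in spirit to the scored results patrickschur/language-detection produces. The `check_field` name, the `Finding` type, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

CONFIDENCE_THRESHOLD = 0.5  # hypothetical value, not taken from the PR
SEVERITY = 7  # severity stated in the PR description
MESSAGE = ("The readme description contains unofficial language. "
           "It must be written in standard English.")


@dataclass
class Finding:
    field: str
    severity: int
    message: str


def check_field(field: str, scores: Mapping[str, float]) -> Optional[Finding]:
    """Return an ERROR finding only when a non-English language wins with high confidence."""
    if not scores:
        return None  # nothing detected: give the benefit of the doubt
    best_lang = max(scores, key=lambda lang: scores[lang])
    if best_lang == "en":
        return None  # English is the top result
    if scores[best_lang] < CONFIDENCE_THRESHOLD:
        return None  # top result is not confident enough to report
    return Finding(field, SEVERITY, MESSAGE)
```

The design choice is to report only when a non-English language both wins and clears the threshold. That keeps mixed technical content, whose scores tend to be spread out, from triggering an error.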

Testing

Added comprehensive test coverage:

  • Fixed an existing test case so it correctly distinguishes errors from warnings
  • Added a test for English content (positive case)
  • Added a test for edge cases (code, URLs, technical terms)
  • Created dedicated Language_Utils trait tests with 14 test cases
  • Created a test plugin for edge-case validation

All tests pass linting and PHPStan checks.

Example Output

For non-English content:

ERROR: The readme description contains unofficial language. It must be written in standard English.

Edge Cases Handled

✅ Code snippets and technical documentation
✅ URLs and email addresses
✅ Short descriptions (text under 30 characters gets the benefit of the doubt)
✅ HTML formatting
✅ Mixed technical/natural language content
✅ Common WordPress/plugin terminology

Fixes #1009

@davidperezgar linked an issue on Aug 3, 2025 that may be closed by this pull request
github-actions bot commented Aug 3, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: davidperezgar <[email protected]>
Co-authored-by: ernilambar <[email protected]>
Co-authored-by: frantorres <[email protected]>
Co-authored-by: swissspidy <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@davidperezgar self-assigned this on Aug 3, 2025
@ernilambar (Member)

Suggestions:

@davidperezgar merged commit b5c35c9 into trunk on Nov 30, 2025
25 checks passed
@davidperezgar deleted the 1009-require-users-to-write-readmetxt-in-english branch on November 30, 2025 08:38

Development

Successfully merging this pull request may close these issues.

Require users to write readme.txt in English

4 participants