Skip to content

Add bin/check-links for local link checking#2238

Merged
ihabadham merged 3 commits intomasterfrom
ihabadham/add-bin-check-links
Dec 17, 2025
Merged

Add bin/check-links for local link checking#2238
ihabadham merged 3 commits intomasterfrom
ihabadham/add-bin-check-links

Conversation

@ihabadham
Copy link
Copy Markdown
Collaborator

@ihabadham ihabadham commented Dec 17, 2025

Summary

Adds a bin/check-links script for running lychee link checker locally with the same behavior as CI.

Problem

Running lychee directly locally can produce different results than CI because:

  1. Local globs like *.md match untracked files that CI never sees
  2. Different paths may be checked

Solution

The script uses git ls-files to ensure only git-tracked files are checked, matching the exact paths CI checks:

  • docs/**/*.md
  • *.md (root level only)
  • react_on_rails_pro/docs/**/*.md
  • react_on_rails_pro/*.md

Usage

bin/check-links

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Added an executable link-validation tool to enforce CI-consistent Markdown link checks across documentation, deduplicate targets, and mirror CI behavior for more reliable docs quality control.

✏️ Tip: You can customize this high-level summary in your review settings.

Matches CI behavior exactly by checking only git-tracked, user-facing docs:
- docs/**/*.md
- *.md (root level only)
- react_on_rails_pro/docs/**/*.md
- react_on_rails_pro/*.md

Uses `git ls-files` to ensure untracked files don't cause local/CI mismatches.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Dec 17, 2025

Walkthrough

Adds a new executable Bash script bin/check-links that collects whitelisted Markdown files, deduplicates them, and runs lychee with .lychee.toml to validate links; script uses strict error handling and mirrors CI behavior. (≤50 words)

Changes

Cohort / File(s) Summary
Link checking script
bin/check-links
New executable Bash script. Gathers Markdown files via git ls-files for paths docs/*.md, docs/**/*.md, root *.md, react_on_rails_pro/docs/*.md, react_on_rails_pro/docs/**/*.md, react_on_rails_pro/*.md; deduplicates with sort -u; pipes into lychee --config .lychee.toml --files-from -; uses strict shell options and runs as a single grouped command to match CI.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Verify git ls-files patterns include all intended files (nested docs and root markdowns).
  • Confirm shebang and set -eo pipefail usage (script uses strict error handling).
  • Ensure executable permissions are set.
  • Check behavior with empty input and large file lists; confirm .lychee.toml path and options match CI.

Poem

🐰 I hop through docs at break of day,
Sniff every link along my way.
Lychee sings and crumbs are found,
I tidy paths all safe and sound.
Hooray — the docs can now leap-play!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change—adding a new bin/check-links script for local Markdown link checking with lychee.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ihabadham/add-bin-check-links

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22987b0 and 21d84ed.

📒 Files selected for processing (1)
  • bin/check-links (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/check-links
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: rspec-package-tests (3.4, latest)
  • GitHub Check: build-dummy-app-webpack-test-bundles
  • GitHub Check: examples (3.4, latest)
  • GitHub Check: build-dummy-app-webpack-test-bundles
  • GitHub Check: build
  • GitHub Check: pro-lint-js-and-ruby
  • GitHub Check: precompile-check
  • GitHub Check: claude-review

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Dec 17, 2025

Code Review - PR #2238

Thanks for adding this helpful script! Overall this looks good and solves a real problem.

Strengths

  1. Excellent problem statement - Clear explanation of why this is needed
  2. Good documentation - The inline comments clearly explain the glob patterns
  3. Correct permissions - Script is properly executable (755)
  4. Follows project conventions - Uses shebang like other bin scripts
  5. Uses set -e - Properly fails on errors

Potential Issues

1. Missing -o pipefail for better error handling

The script uses set -e but is missing set -o pipefail. This is important because the script uses a pipeline with grep and sort, and you want to catch failures in any part of the pipeline.

Current: set -e
Recommended: set -euo pipefail

Compare with bin/ci-rerun-failures:9 which uses the stricter settings.

2. Potential issue with grep -v failing when no root .md files exist

The line git ls-files *.md | grep -v / could fail with exit code 1 if there are no root-level .md files (because grep returns 1 when no matches are found). While this is unlikely in this project, it could cause the script to fail unexpectedly.

Possible fix: git ls-files *.md | grep -v / || true

Suggested Improvements

  1. Add help/usage message (--help flag) following pattern from bin/ci-rerun-failures
  2. Check if lychee is installed before running and provide helpful error message
  3. Optional: Add verbose mode for debugging

Required Changes

  1. Add set -euo pipefail for stricter error handling
  2. Consider handling the grep -v / edge case

Overall, this is a solid addition to the developer tooling. The required changes are minor, and the optional enhancements would make it more user-friendly to match other bin scripts in the project.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
bin/check-links (2)

11-16: Consider verifying lychee is installed.

The script will fail with a cryptic error if lychee is not installed or not in PATH. Consider adding a check at the beginning of the script to provide a helpful error message.

Add this check after line 9:

set -e

if ! command -v lychee &> /dev/null; then
  echo "Error: lychee is not installed. Install it with: cargo install lychee" >&2
  exit 1
fi

12-12: Optional: Remove redundant glob patterns.

The patterns docs/**/*.md and react_on_rails_pro/docs/**/*.md already match files that would be matched by docs/*.md and react_on_rails_pro/docs/*.md respectively (the ** glob includes zero or more directory levels). While harmless due to sort -u deduplication, the explicit single-level patterns are redundant.

Consider simplifying to:

 {
-  git ls-files 'docs/*.md' 'docs/**/*.md'
+  git ls-files 'docs/**/*.md'
   git ls-files '*.md' | grep -v '/' || true
-  git ls-files 'react_on_rails_pro/docs/*.md' 'react_on_rails_pro/docs/**/*.md'
+  git ls-files 'react_on_rails_pro/docs/**/*.md'
   git ls-files 'react_on_rails_pro/*.md'
 } | sort -u | lychee --config .lychee.toml --files-from -

Also applies to: 14-14

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3ef0ff0 and af3b771.

📒 Files selected for processing (1)
  • bin/check-links (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: examples (3.4, latest)
  • GitHub Check: precompile-check
  • GitHub Check: rspec-package-tests (3.4, latest)
  • GitHub Check: build-dummy-app-webpack-test-bundles (3.4, 22, latest)
  • GitHub Check: pro-lint-js-and-ruby
  • GitHub Check: build
  • GitHub Check: build-dummy-app-webpack-test-bundles
  • GitHub Check: build-dummy-app-webpack-test-bundles
  • GitHub Check: claude-review
🔇 Additional comments (2)
bin/check-links (2)

1-8: LGTM! Clear documentation.

The shebang and comments clearly explain the script's purpose and behavior, making it easy to understand which paths are checked.


9-9: Strict error handling is appropriate.

The set -e flag ensures the script fails fast on errors, which is good for a utility script.

Comment thread bin/check-links Outdated
@coderabbitai coderabbitai Bot mentioned this pull request Dec 17, 2025
3 tasks
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Dec 17, 2025

Code Review: bin/check-links

Strengths

  1. Excellent motivation: Solving the real problem of local testing matching CI behavior
  2. Clean implementation: Uses git ls-files to match exactly what CI sees
  3. Good documentation: Clear comments explaining what paths are checked
  4. Proper shell practices: Uses set -e for error handling

Issues Found

1. CRITICAL: Pattern mismatch with CI workflow

The CI workflow (.github/workflows/check-markdown-links.yml:36) uses:
args: --config .lychee.toml docs/ .md react_on_rails_pro/docs/ react_on_rails_pro/.md

Your script has: git ls-files 'docs/.md' 'docs/**/.md'

Problem: The 'docs/.md' pattern only matches files directly in docs/, not subdirectories. This is redundant with 'docs/**/.md' which already matches everything.

Solution: Remove 'docs/.md' from line 12 - keep only 'docs/**/.md'

Same issue on line 14: remove 'react_on_rails_pro/docs/*.md'

2. Missing error handling for lychee

If lychee is not installed, the script will fail with an unclear error.

Suggestion: Add a check before the main command

3. Script doesn't verify it matches CI

Consider testing locally that the file list matches what CI checks

Testing Checklist

Before merging, verify:

  • Run bin/check-links locally and confirm it produces same results as CI
  • Test in a repo without react_on_rails_pro/ to ensure it doesn't error
  • Verify docs/**/*.md actually matches all subdirectory files

Priority

HIGH: Fix the docs/*.md pattern mismatch to ensure script truly matches CI behavior.

Overall, this is a useful addition! Just need to fix the glob patterns. Great work solving a real pain point.

- Add Usage comment for consistency with other bin scripts
- Use set -eo pipefail for stricter error handling in pipelines
- Add || true to grep to prevent failure when no root .md files exist

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@ihabadham
Copy link
Copy Markdown
Collaborator Author

@claude Thanks for the review! However, the "CRITICAL" pattern mismatch issue is actually incorrect.

Testing shows both patterns are needed

$ git ls-files 'docs/*.md' | wc -l
92

$ git ls-files 'docs/**/*.md' | wc -l
83

$ comm -23 <(git ls-files 'docs/*.md' | sort) <(git ls-files 'docs/**/*.md' | sort)
docs/CI_OPTIMIZATION.md
docs/DIRECTORY_LICENSING.md
docs/introduction.md
docs/JS_PRO_PACKAGE_SEPARATION_PLAN.md
docs/LICENSING_FAQ.md
docs/MONOREPO_MERGER_PLAN.md
docs/MONOREPO_MERGER_PLAN_REF.md
docs/README.md
docs/swc-migration.md

In git's pathspec, ** does not include zero directories. So:

  • docs/**/*.md matches only subdirectory files (83 files)
  • docs/*.md matches root-level + subdirectory files (92 files)

The 9 files above (like docs/README.md) are only matched by docs/*.md, not by docs/**/*.md. Removing docs/*.md would break the script.

Other feedback

  1. Lychee installation check - We discussed this and decided it's overkill for a 22-line script. If lychee isn't installed, bash already prints lychee: command not found which is clear enough.

  2. Verify matches CI - Good testing suggestion, will verify before merging.

The script is correct as-is. CodeRabbit made the same incorrect claim about redundant patterns.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Dec 17, 2025

Claude finished @ihabadham's task in 43s —— View job


Code Review Update - bin/check-links

Thank you for the correction! You're absolutely right - I was wrong about the pattern matching.

✅ Correction Acknowledged

Your testing clearly demonstrates that in git pathspec:

  • docs/**/*.md matches only subdirectory files (83 files)
  • docs/*.md matches root-level + subdirectory files (92 files)

The 9 files only matched by docs/*.md (like docs/README.md) prove that ** does NOT include zero directories in git pathspec. Both patterns are indeed needed. My "CRITICAL" issue was incorrect.

Updated Review

Strengths:

  1. ✅ Solves real problem of local testing matching CI behavior
  2. ✅ Correct glob patterns that match all necessary files
  3. ✅ Uses git ls-files to match exactly what CI sees
  4. ✅ Good documentation with clear comments
  5. ✅ Proper error handling with set -eo pipefail (I see this was updated from just set -e)
  6. ✅ Handles edge case where no root .md files exist with || true

Pattern Analysis:

The script correctly translates CI's lychee args into git pathspecs:

  • CI: docs/ → Script: docs/*.md + docs/**/*.md
  • CI: *.md → Script: git ls-files '*.md' | grep -v '/'
  • CI: react_on_rails_pro/docs/ → Script: react_on_rails_pro/docs/*.md + react_on_rails_pro/docs/**/*.md
  • CI: react_on_rails_pro/*.md → Script: react_on_rails_pro/*.md

Optional Improvements (not required):

  1. Lychee installation check - I agree this is overkill for a 22-line utility script. The bash error message is clear enough.
  2. Consider adding a comment explaining why both docs/*.md and docs/**/*.md are needed (since it's non-obvious that ** doesn't include zero dirs)

Recommendation:Approve - Script is correct as-is. The pattern matching is accurate and properly handles all edge cases.


@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Dec 17, 2025

Code Review - PR #2238

Overall, this is a well-designed utility script that solves a real problem. The implementation is clean and the documentation is clear. Here are my observations:

Strengths

  1. Solves a real problem: Using git ls-files ensures only tracked files are checked, matching CI behavior exactly
  2. Clear documentation: Header comments explain purpose, usage, and which paths are checked
  3. Proper error handling: Uses set -eo pipefail to exit on errors
  4. Deduplication: sort -u prevents checking the same file twice
  5. Matches project conventions: Follows patterns from other bin/ scripts (shebang, comments, pipefail)
  6. Correct file permissions: Script is executable (755)

📝 Suggestions for Enhancement

1. Add dependency check (Medium Priority)

The script requires lychee but does not verify it is installed. Other scripts in this repo check dependencies:

# Add before line 17
if \! command -v lychee >/dev/null 2>&1; then
  echo "Error: lychee is not installed."
  echo ""
  echo "Install with:"
  echo "  cargo install lychee (or brew install lychee)"
  echo ""
  echo "See: https://lychee.cli.rs/#/installation"
  exit 1
fi

Why: Provides helpful error message instead of cryptic lychee: command not found

2. Consider adding --help flag (Low Priority)

For consistency with other bin scripts like ci-run-failed-specs and ci-rerun-failures:

if [[ "${1:-}" == "--help" ]] || [[ "${1:-}" == "-h" ]]; then
  sed -n '/^#/p' "$0" | sed 's/^# //g' | sed 's/^#//g'
  exit 0
fi

Why: Makes the script self-documenting via CLI

3. Minor: Pattern verification (Informational)

The script uses different glob patterns than CI:

  • CI: docs/ (directory, lychee recurses)
  • Script: docs/*.md docs/**/*.md (explicit globs via git ls-files)

Both should produce the same result, but worth noting that CI relies on lychee's directory recursion while the script explicitly lists patterns. This is actually safer because it's more explicit about what's checked.

4. Consider adding verbose mode (Optional)

For debugging, show which files are being checked:

# Add after line 15
VERBOSE=false
[[ "${1:-}" == "-v" ]] || [[ "${1:-}" == "--verbose" ]] && VERBOSE=true

# Change line 22 to:
} | sort -u | {
  if $VERBOSE; then
    tee /dev/stderr
  else
    cat
  fi
} | lychee --config .lychee.toml --files-from -

🔍 Verification Questions

  1. Does git ls-files '*.md' match what lychee does with *.md?

    • git ls-files '*.md' + grep -v '/' = only root-level .md files ✅
    • CI's *.md argument to lychee = likely same behavior ✅
    • This looks correct!
  2. Are there any untracked .md files developers should check locally?

    • The PR description mentions this is intentional - only git-tracked files
    • Makes sense for CI parity ✅

🎯 Verdict

Approve with minor suggestions. The core functionality is solid and achieves its goal. The suggestions above are enhancements, not blockers.

Priority recommendations:

  1. Add lychee dependency check (5 min, big UX improvement)
  2. Consider --help flag (5 min, consistency with other scripts)

Great work! This will help developers catch link issues before pushing to CI. 🚀

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
bin/check-links (1)

17-22: Consider adding a check for lychee availability.

If lychee is not installed, the script will fail with a potentially confusing "command not found" error. A pre-flight check could improve the user experience.

+if ! command -v lychee &> /dev/null; then
+  echo "Error: lychee is not installed. Install it with: cargo install lychee" >&2
+  exit 1
+fi
+
 {
   git ls-files 'docs/*.md' 'docs/**/*.md'
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af3b771 and 22987b0.

📒 Files selected for processing (1)
  • bin/check-links (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: rspec-package-tests (3.4, latest)
  • GitHub Check: examples (3.4, latest)
  • GitHub Check: build-dummy-app-webpack-test-bundles
  • GitHub Check: build
  • GitHub Check: pro-lint-js-and-ruby
  • GitHub Check: build-dummy-app-webpack-test-bundles
  • GitHub Check: claude-review
🔇 Additional comments (1)
bin/check-links (1)

1-22: LGTM! Script correctly mirrors CI behavior for local link checking.

The implementation is well-structured:

  • Documentation clearly explains purpose and paths
  • set -eo pipefail provides appropriate error handling
  • || true on line 19 correctly handles the case when no root-level .md files exist
  • sort -u properly deduplicates files that may match multiple patterns
  • Command grouping cleanly collects all file sources

Provides helpful error message if lychee is not installed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Dec 17, 2025

Code Review Feedback

Great addition! This script will definitely help developers catch link issues before CI. Here's my analysis:

✅ Strengths

  1. Correctly matches CI behavior: The script accurately replicates what the CI workflow checks by using git ls-files to only check tracked files
  2. Good documentation: Clear comments explaining the exact paths checked
  3. Proper error handling: Checks for lychee installation and provides helpful error message
  4. Clean implementation: Uses sort -u to deduplicate files, which is smart
  5. Executable permissions: File is correctly marked as executable (chmod +x)

🔍 Observations & Suggestions

1. Minor Pattern Discrepancy with CI (Low Priority)

The CI workflow uses directory-based args:

args: --config .lychee.toml docs/ *.md react_on_rails_pro/docs/ react_on_rails_pro/*.md

Your script uses more explicit glob patterns:

git ls-files 'docs/*.md' 'docs/**/*.md'
git ls-files 'react_on_rails_pro/docs/*.md' 'react_on_rails_pro/docs/**/*.md'

Impact: Minimal - both approaches should produce the same results since git ls-files with your patterns will match all tracked .md files in those directories. The explicit approach is actually better because it's more precise.

Verification: I tested the logic and confirmed it correctly matches:

  • All docs files (100+ files in docs/ subdirectories)
  • All root-level .md files (14 files like README.md, CHANGELOG.md, etc.)
  • Pro package docs

2. Bash Best Practices Comparison (Informational)

Looking at similar scripts in the repo (e.g., bin/ci-run-failed-specs), you could optionally add:

  • set -u (treat unset variables as errors) in addition to set -eo pipefail
  • Color-coded output for better UX
  • --help flag support

However, for a simple utility script like this, the current implementation is perfectly adequate. The trade-off between simplicity and feature-richness is well-balanced here.

3. Edge Case: Empty File List (Very Low Priority)

If all git-tracked .md files were somehow deleted, the script would pass empty input to lychee. Testing this:

echo "" | lychee --files-from -
# This works fine - lychee just reports 0 files checked

So no issue here.

4. Potential Enhancement: Cache Support (Optional)

The CI workflow uses caching via .lycheecache. The script could mention this in comments:

# Note: Cache is configured in .lychee.toml and stored in .lycheecache/

But this is purely informational - the script already benefits from caching since it uses the same config file.

🎯 Code Quality Assessment

  • Correctness: ✅ Excellent - achieves the stated goal
  • Best Practices: ✅ Follows repo conventions (shebang, set flags, error handling)
  • Security: ✅ No concerns - uses safe bash practices
  • Performance: ✅ Efficient use of git ls-files and sort -u
  • Maintainability: ✅ Well-documented and easy to understand
  • Testing: ℹ️ Consider testing locally before merge (see below)

🧪 Testing Recommendation

Before merging, verify the script works as expected:

# Ensure lychee is installed
cargo install lychee

# Run the script
bin/check-links

# Compare output with what CI would check (file count should match)

📋 Final Verdict

Approve with no blocking issues. This is a well-crafted utility script that achieves its purpose elegantly. The suggestions above are optional enhancements, not required changes.

The script correctly:

  • Matches CI behavior by using git-tracked files only
  • Prevents the "local vs CI" discrepancy mentioned in the PR description
  • Provides good error handling and documentation
  • Follows bash best practices with set -eo pipefail

Great work! 🚀

@ihabadham ihabadham merged commit 48896f9 into master Dec 17, 2025
27 of 28 checks passed
@ihabadham ihabadham deleted the ihabadham/add-bin-check-links branch December 17, 2025 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant