Grepped Ranking to a Semantic-Heuristic Search Grep#34

Merged
boyter merged 26 commits into master from stripper
Feb 19, 2026

@boyter boyter (Owner) commented Feb 18, 2026

This adds a new ranking algorithm that extends BM25 in the following ways:

  • Uses the output of scc to determine whether a match is in code or in a comment, and adjusts ranks accordingly, depending on user preference
  • Weights results by file complexity (from scc output) so that more complex logic ranks higher; the weighting is adjustable
  • Penalizes files with many lines of code but little complexity, which indicates a data file
  • Adds code-specific stop words so that common language keywords do not inflate a file's rank
  • Down-ranks apparent test files, depending on the user's search

It also adds options to control all of these through the interfaces, including the TUI and HTTP.
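As a rough illustration of the first point, here is a minimal sketch of how per-content-type weights could feed into a standard BM25 term score. It is not the code from this PR: `MatchCounts`, `Weights`, `weightedTF`, `bm25Term`, and all numeric weights are hypothetical names and values chosen for the example, and folding the weights into the term frequency is only one plausible design.

```go
package main

import (
	"fmt"
	"math"
)

// MatchCounts holds how often a term matched in each content type.
// Hypothetical type; the code/comment/string split follows the PR description.
type MatchCounts struct {
	Code, Comment, String int
}

// Weights are illustrative per-content-type multipliers (assumed values).
type Weights struct {
	Code, Comment, String float64
}

// weightedTF folds the per-type counts into one fractional term frequency.
func weightedTF(c MatchCounts, w Weights) float64 {
	return float64(c.Code)*w.Code + float64(c.Comment)*w.Comment + float64(c.String)*w.String
}

// bm25Term is the usual BM25 term score, accepting a fractional tf.
func bm25Term(tf, docLen, avgDocLen float64, totalDocs, docsWithTerm int) float64 {
	const k1, b = 1.2, 0.75
	n := float64(docsWithTerm)
	idf := math.Log(1 + (float64(totalDocs)-n+0.5)/(n+0.5))
	return idf * tf * (k1 + 1) / (tf + k1*(1-b+b*docLen/avgDocLen))
}

func main() {
	// One match in code, three in comments.
	counts := MatchCounts{Code: 1, Comment: 3}
	preferCode := Weights{Code: 1.0, Comment: 0.2, String: 0.5}
	preferComments := Weights{Code: 0.2, Comment: 1.0, String: 0.5}

	fmt.Printf("prefer code:     %.3f\n", bm25Term(weightedTF(counts, preferCode), 100, 100, 1000, 10))
	fmt.Printf("prefer comments: %.3f\n", bm25Term(weightedTF(counts, preferComments), 100, 100, 1000, 10))
}
```

With these assumed weights, the same file scores higher when the user prefers comments, because most of its matches sit in comments.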

@pr-insights pr-insights bot added L/complexity Low complexity S/size Small change labels Feb 18, 2026

Copilot AI left a comment

Pull request overview

This pull request introduces a comprehensive semantic-heuristic search ranking system that significantly enhances code search relevance by incorporating structural analysis of source code. The PR adds a new "structural" ranker that extends BM25 with code-aware features: weighting matches by whether they appear in code vs comments vs strings, applying complexity gravity to boost algorithmically dense files, penalizing low-information-density data files, dampening common language keywords, and adjusting test file rankings based on query intent.
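To make the "complexity gravity" and data-file penalty ideas concrete, the following sketch shows one way such multipliers could be combined with a base score. It is an assumption-laden illustration rather than the PR's implementation: only the mode names (brain/logic/default/low/off) come from this PR; `gravityStrength`, `gravityBoost`, `noisePenalty`, `adjust`, the thresholds, and the strengths are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// gravityStrength maps the PR's gravity mode names to illustrative strengths.
// The numeric values are assumptions made for this example.
var gravityStrength = map[string]float64{
	"brain":   2.5,
	"logic":   1.5,
	"default": 1.0,
	"low":     0.5,
	"off":     0.0,
}

// gravityBoost returns a multiplier >= 1 that grows with complexity per line.
// Unknown modes and empty files are left unboosted.
func gravityBoost(mode string, complexity, codeLines int) float64 {
	strength, ok := gravityStrength[mode]
	if !ok || codeLines <= 0 {
		return 1.0
	}
	density := float64(complexity) / float64(codeLines)
	return 1.0 + strength*math.Log1p(density)
}

// noisePenalty halves the score of large files with almost no complexity,
// which are likely data files. Both thresholds are assumed values.
func noisePenalty(complexity, codeLines int) float64 {
	const minLines = 1000
	const minDensity = 0.01
	if codeLines < minLines {
		return 1.0
	}
	if float64(complexity)/float64(codeLines) >= minDensity {
		return 1.0
	}
	return 0.5
}

// adjust applies both multipliers to a base (for example BM25) score.
func adjust(score float64, mode string, complexity, codeLines int) float64 {
	return score * gravityBoost(mode, complexity, codeLines) * noisePenalty(complexity, codeLines)
}

func main() {
	fmt.Printf("logic-heavy file, brain mode: %.3f\n", adjust(1.0, "brain", 50, 200))
	fmt.Printf("logic-heavy file, off mode:   %.3f\n", adjust(1.0, "off", 50, 200))
	fmt.Printf("large data file, default:     %.3f\n", adjust(1.0, "default", 0, 5000))
}
```

Using a logarithm keeps the boost from growing without bound for extremely dense files; a linear or capped boost would be an equally reasonable choice.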

Changes:

  • New structural ranking algorithm with configurable weights for code/comment/string matches
  • Complexity gravity system (brain/logic/default/low/off) to boost complex logic files
  • Noise penalty to demote large low-complexity data files (JSON, logs, minified JS)
  • Language-specific stopword dampening (e.g., "func" in Go, "def" in Python)
  • Test file detection with adaptive ranking based on query intent
  • Content-type filters (--only-code, --only-comments, --only-strings) for precise searching
  • Path filter with glob pattern support (path:/search/, file:*.go)
  • TUI function key controls (F1-F4) for live ranker/filter adjustment
  • HTTP server real-time search with JSON API endpoint
  • MCP server integration with new parameters
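Two of the changes listed above, stopword dampening and intent-aware test file ranking, can be sketched as simple multipliers. The sketch below is hypothetical: the keyword lists, the 0.3 and 0.5 factors, the filename heuristics, and the function names (`termWeight`, `looksLikeTest`, `queryWantsTests`, `testFactor`) are assumptions for illustration and are not taken from `pkg/ranker`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// stopwords lists a few keywords per language (illustrative, not exhaustive).
var stopwords = map[string]map[string]bool{
	"Go":     {"func": true, "return": true, "package": true},
	"Python": {"def": true, "return": true, "import": true},
}

// termWeight dampens terms that are keywords in the file's language.
// Looking up an unknown language yields a nil map, which safely reports false.
func termWeight(language, term string) float64 {
	if stopwords[language][term] {
		return 0.3 // assumed dampening factor
	}
	return 1.0
}

// looksLikeTest applies simple filename heuristics (assumed, not the PR's rules).
func looksLikeTest(path string) bool {
	lower := strings.ToLower(filepath.ToSlash(path))
	base := filepath.Base(lower)
	return strings.HasSuffix(base, "_test.go") ||
		strings.HasPrefix(base, "test_") ||
		strings.Contains(lower, "/tests/")
}

// queryWantsTests guesses intent: a query mentioning "test" keeps test files.
func queryWantsTests(query string) bool {
	return strings.Contains(strings.ToLower(query), "test")
}

// testFactor down-ranks test files unless the query appears to ask for them.
func testFactor(path, query string) float64 {
	if looksLikeTest(path) && !queryWantsTests(query) {
		return 0.5 // assumed down-rank factor
	}
	return 1.0
}

func main() {
	fmt.Println(termWeight("Go", "func"), termWeight("Go", "ranker"))
	fmt.Println(testFactor("pkg/ranker/ranker_test.go", "bm25 score"))
	fmt.Println(testFactor("pkg/ranker/ranker_test.go", "ranker test"))
}
```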

Reviewed changes

Copilot reviewed 27 out of 88 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| vendor/* | Formatting updates (whitespace, indentation, build tags) across multiple vendored libraries |
| pkg/ranker/ranker.go | Core structural ranker implementation with gravity, noise penalty, test dampening |
| pkg/ranker/stopwords.go | Language-specific keyword dampening system for 18 languages |
| pkg/ranker/*_test.go | Comprehensive test coverage for all new ranking features |
| pkg/search/executor.go | Path filter with glob support, post-evaluation metadata filters |
| pkg/search/parser.go, lexer.go | Path filter parsing with slash-in-value handling |
| pkg/search/*_test.go | 150+ new test cases for path/glob filters and metadata evaluation |
| search.go, search_filter_test.go | Content-type filtering with per-byte classification from scc |
| config.go | New configuration fields with intent-based parameter resolution |
| tui.go, tui_test.go | Function key controls for live ranker/filter cycling |
| http.go, asset/templates/* | Real-time search UI with JSON API and dropdown controls |
| mcp.go | Enhanced MCP tool descriptions with new parameters |
| main.go | CLI integration with mutual exclusivity checks and auto-selection logic |
| language.go | scc integration for per-byte content classification |
| console.go | Updated ranking call with new parameters |
| README.md | Comprehensive documentation of new features |
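The path filter with glob support (for example `path:/search/` and `file:*.go`) could behave roughly as in the sketch below. The semantics shown are an assumption, not the behavior of `pkg/search/executor.go`: patterns containing glob metacharacters are matched against the file's base name with the standard library's `path.Match`, and all other patterns are treated as substrings of the full path. `matchFilter` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchFilter reports whether filePath satisfies a path/file filter value.
// Assumed semantics: glob patterns match the base name, plain values match
// as substrings of the slash-separated path.
func matchFilter(pattern, filePath string) bool {
	if strings.ContainsAny(pattern, "*?[") {
		ok, err := path.Match(pattern, path.Base(filePath))
		// A malformed pattern (path.ErrBadPattern) is treated as no match.
		return err == nil && ok
	}
	return strings.Contains(filePath, pattern)
}

func main() {
	files := []string{"pkg/search/executor.go", "pkg/ranker/ranker.go", "README.md"}
	for _, f := range files {
		fmt.Printf("%-24s file:*.go=%v path:/search/=%v\n",
			f, matchFilter("*.go", f), matchFilter("/search/", f))
	}
}
```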

@boyter boyter merged commit f24bfb0 into master Feb 19, 2026
@boyter boyter deleted the stripper branch February 19, 2026 05:33