## Problem

`rtk grep` is line-oriented. On minified bundles (single-line JS, compressed JSON, log files without newlines) a single match line can be 100K+ characters. The current behavior truncates to `max_line_len` (default 80) and destroys the surrounding semantic context that an LLM needs to reason about the match.
Concretely, running `rtk grep 'fetch' bundle.min.js` on a 1 MB minified file either:

- truncates the matching line to 80 chars (losing the URL, payload, options object), or
- returns the entire 1 MB line if `--max-len` is raised, blowing the token budget.
Neither outcome is useful, and this is exactly the kind of token-waste RTK is meant to prevent.
## Proposal

Add a new subcommand `rtk extract` that finds regex hits and emits a configurable character window around each match — independent of line boundaries — with optional secondary-keyword filtering to drop irrelevant windows.

```
rtk extract <PATTERN> [PATH] [-w/--window N] [-b/--before N] [-a/--after N]
            [-r/--require KEYWORD]... [-i/--ignore-case]
            [-m/--max N] [--no-dedupe]
```
| Flag | Default | Purpose |
|------|---------|---------|
| `-w, --window` | 100 | Symmetric window (chars). |
| `-b, --before` / `-a, --after` | from `--window` | Asymmetric overrides. |
| `-r, --require` | none, repeatable | Window must contain ALL of these substrings. |
| `-i, --ignore-case` | false | Case-insensitive primary regex and `--require`. |
| `-m, --max` | 100 | Token-budget cap on emitted windows. |
| `--no-dedupe` | false | Disable collapsing of identical windows (`(xN)`). |
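The `-b`/`-a` semantics reduce to clamped character slicing around each regex hit. A minimal sketch of that core operation (function name and test string are illustrative only, not proposed API):

```python
import re

def window_bounds(text: str, match: re.Match, before: int = 100, after: int = 100):
    # Clamp an asymmetric character window around a regex hit to the
    # bounds of the text, independent of any line structure.
    start = max(0, match.start() - before)
    end = min(len(text), match.end() + after)
    return start, end

# One long "line" of 1022 chars with a single fetch() call in the middle.
text = "x" * 500 + 'fetch("/api/v1/login")' + "y" * 500
m = re.search(r'fetch\([^)]*\)', text)
print(window_bounds(text, m, before=80, after=80))        # (420, 602)
print(window_bounds(text, m, before=10**6, after=10**6))  # clamped to (0, 1022)
```

Clamping at the file bounds is what keeps the leading/trailing `...` markers honest: a window that starts at offset 0 simply has no left ellipsis.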
### Example

```
$ rtk extract 'fetch\([^)]{0,300}\)' bundle.min.js -w 80 -r '/api/v1/login'
bundle.min.js (2 matches)
@184523: ...auth.Ka.send(«fetch(\"/api/v1/login\",{method:\"PUT\",body:JSON.stringify(t)})»);return r...
@201117: ...refresh=()=>«fetch(\"/api/v1/login\",{method:\"POST\",body:e})»;...
1 file, 2 windows shown (of 14 matches, 12 filtered by --require)
```
## Why a new command instead of grep flags

Keeping `grep` line-oriented is valuable for the 99% case of normal source files. Folding `--window` and `--require` into `grep` would either change its default semantics (breaking) or add flags that only make sense in a different mental model. A separate command keeps both clean.
## Expected savings

On a typical minified bundle, a 1 MB match-line collapses to a ~300-char window — roughly 99.9% reduction. Even on multi-match cases capped at `--max=100`, output stays in the low-KB range vs. raw multi-MB. Comfortably above RTK's 60% bar.
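A back-of-the-envelope check of that estimate (assuming a 1 MiB line and a single ~300-char window):

```python
line_bytes = 1 * 1024 * 1024   # one 1 MiB minified match-line
window_chars = 300             # one extracted window
reduction = 1 - window_chars / line_bytes
print(f"{reduction:.2%}")      # 99.97%, i.e. "roughly 99.9%"
```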
## Prior art

I have a working Python proof-of-concept that uses the same model (`re.finditer` + character slicing + secondary substring filter), and it has proven to be the most useful pattern for analyzing reverse-engineered minified bundles. Porting it natively into RTK keeps the workflow inside the existing `rtk gain` tracking pipeline.
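The PoC's core loop is small. A minimal sketch under the same model (function and variable names are illustrative, not the PoC's actual API):

```python
import re
from collections import Counter

def extract_windows(text, pattern, window=100, require=(), ignore_case=False):
    # Emit a character window around each regex hit, keep only windows
    # containing every `require` substring, and collapse identical
    # windows into (window, count) pairs -- the "(xN)" dedupe behavior.
    flags = re.IGNORECASE if ignore_case else 0
    needles = [k.lower() for k in require] if ignore_case else list(require)
    hits = Counter()
    for m in re.finditer(pattern, text, flags):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        win = text[start:end]
        haystack = win.lower() if ignore_case else win
        if all(k in haystack for k in needles):
            hits[win] += 1
    return list(hits.items())

blob = 'a;fetch("/api/v1/login",{x});b;fetch("/health");c'
for win, n in extract_windows(blob, r'fetch\([^)]*\)', window=5,
                              require=("/api/v1/login",)):
    print(f"{win} (x{n})")   # the /health hit is filtered out
```

The Rust port would follow the same shape; only the windowing, filtering, and dedupe steps carry over, while offset reporting and the `--max` cap sit in the output layer.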
## Out of scope (deliberately deferred)

- Streaming for files > 5 MB (initial version reads the whole file; > 5 MB will fail loudly).
- Folding `--window` / `--require` back into `rtk grep`.
- Binary-file detection beyond the read-as-UTF-8 short-circuit.
I have an implementation ready and will open a PR shortly.