test(dialect): strengthen compression_stats and count_tokens assertions #150
mvalentsev wants to merge 1 commit into MemPalace:develop
Conversation
@bensig @proxysoul PR #147 changed the
Update: the core rename fix landed in main via @igorls' PR series, so this is no longer a CI blocker. Rebased on the merged fix, and the remaining diff is just two extra assertions — a strictness check on `original_tokens_est > summary_tokens_est` in `test_stats`, and `count_tokens` edge cases for the empty and single-word inputs.
web3guru888
left a comment
PR #150 — test(dialect): strengthen compression_stats and count_tokens assertions
This is a small, focused test hardening PR that adds meaningful coverage on top of the existing assertion suite. The context in the PR description is helpful: the ratio → size_ratio / compressed_chars → summary_chars rename landed via @igorls' series, and this PR bolts on the regression guards that should have accompanied that rename from the start.
What's done well:
The original_tokens_est > summary_tokens_est assertion in test_stats is the right catch. The risk it guards against — a token estimator that collapses to a constant (e.g. always returns 1 due to a bad max() guard or wrong heuristic multiplier) — would be completely invisible to the existing char-level assertion. It's a silent failure mode that could persist through several refactor cycles without this.
The three count_tokens edge cases ("", "one", "hello world") directly exercise the max(1, int(word_count * 1.3)) contract. The inline comment explaining the "hello world" → 2 calculation is a nice touch — it makes the expected value self-documenting for future readers rather than leaving them to mentally trace the heuristic. Not every test PR does this.
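For concreteness, here is a minimal stand-in for the heuristic the review describes; the real `Dialect.count_tokens` lives in the project and may differ beyond the stated `max(1, int(word_count * 1.3))` contract:

```python
def count_tokens(text: str) -> int:
    """Stand-in for the word-based estimator described in the review:
    roughly 1.3 tokens per whitespace-separated word, floored at 1."""
    word_count = len(text.split())
    return max(1, int(word_count * 1.3))

# The three edge cases the PR adds:
assert count_tokens("") == 1             # max(1, ...) floor on empty input
assert count_tokens("one") == 1          # int(1 * 1.3) == 1
assert count_tokens("hello world") == 2  # int(2 * 1.3) == 2
```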
Issues / concerns:
One minor flakiness risk: test_stats creates a compression pair and then asserts original_tokens_est > summary_tokens_est. If Dialect.compress() ever produces a summary longer than the original (degenerate case on very short inputs — e.g., a 2-word input that gets a 5-word summary header prepended), the assertion fails for the wrong reason. It might be worth using a clearly long fixture string (50+ words) so the compressed form is unambiguously shorter. The current fixture text is not shown in the diff, so I can't verify it's safe — worth double-checking.
The count_tokens("") == 1 case is correct per the max(1, ...) guard, but it implicitly documents that empty strings always return 1. If the semantics ever change (e.g., returning 0 for empty input becomes intentional), the test will catch it — but it might be surprising. A one-line comment explaining "empty string returns minimum 1 by design" would clarify the intent vs. a floor that was accidentally left in.
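The degenerate-case concern above can be made concrete with a toy model. Everything here is hypothetical — the header-prepending `compress` and `compression_stats` below are illustrative stand-ins, not the project's real implementation — but it shows why a 2-word fixture can invert the inequality while a 50+ word fixture makes it deterministic:

```python
def count_tokens(text: str) -> int:
    # Word-based estimator with the max(1, ...) floor described in the review.
    return max(1, int(len(text.split()) * 1.3))

def compress(text: str) -> str:
    # Hypothetical compressor: prepends a 5-word header, keeps the first 10 words.
    return "summary of the input text: " + " ".join(text.split()[:10])

def compression_stats(text: str) -> dict:
    summary = compress(text)
    return {
        "original_tokens_est": count_tokens(text),
        "summary_tokens_est": count_tokens(summary),
    }

# Degenerate 2-word input: the header makes the "summary" longer than the input,
# so the strict assertion would fail for the wrong reason.
short = compression_stats("two words")
assert short["summary_tokens_est"] > short["original_tokens_est"]

# A 60-word fixture makes the strict inequality unambiguous, not probabilistic.
fixture = " ".join(f"word{i}" for i in range(60))
stats = compression_stats(fixture)
assert stats["original_tokens_est"] > stats["summary_tokens_est"]
```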
Overall verdict:
LGTM — this is exactly the kind of regression guard that makes a codebase durable through refactors. The token estimator path is load-bearing for anything that uses compression_stats to make throttling or batching decisions, so having explicit assertions on it is worthwhile. The edge-case coverage on count_tokens is appropriate for a max(1, ...) guard. Minor suggestion: verify the fixture text in test_stats is long enough that summary_tokens_est < original_tokens_est is guaranteed, not probabilistic.
Reviewed by MemPalace-AGI — autonomous research system with perfect memory
Small but correct. The two new assertions are the right additions: the strict token-estimate inequality in `test_stats` and the `count_tokens` edge cases. The inline comment explaining the word-based heuristic ("hello world" → 2) makes the expected value self-documenting. LGTM.
Adds coverage on top of the honest-stats test fix that already landed in main:
- `test_stats` now also asserts `original_tokens_est` is strictly greater than `summary_tokens_est`, which catches a class of regressions where the token estimator flattens to a constant.
- `test_count_tokens` gains edge cases for the empty string and the single-word input. Both exercise the `max(1, ...)` guard in `Dialect.count_tokens`, so a future refactor that drops the guard fails loudly instead of silently returning 0.
closing — dialect test improvements landed via #569 (merged). thanks @mvalentsev!
What does this PR do?
The main honest-stats test fix (`ratio` -> `size_ratio`, `compressed_chars` -> `summary_chars`, `len // 3` -> word-based `count_tokens`) already landed in main through @igorls' PR series. This PR now just adds a couple of assertions on top of that fix so the regression surface is a bit smaller.
- `test_stats` also asserts `stats["original_tokens_est"] > stats["summary_tokens_est"]`. That catches a class of regressions where the token estimator flattens to a constant, which would otherwise slip through the existing char-level assertion.
- `test_count_tokens` gains edge cases for the empty string and the single-word input. Both exercise the `max(1, ...)` guard in `Dialect.count_tokens`, so a future refactor that drops the guard fails loudly instead of silently returning 0.
Originally opened before I saw that the core rename fix was already in flight. After rebasing on the merged fix, the remaining diff is a small additive coverage improvement.
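As a sketch of what the guard-related edge cases buy, compare a guarded estimator against a hypothetical refactor that drops the `max(1, ...)` floor (both functions are illustrative stand-ins, not the project's code):

```python
def count_tokens(text: str) -> int:
    # Current contract: floor at 1 so empty input never estimates to 0.
    return max(1, int(len(text.split()) * 1.3))

def count_tokens_no_guard(text: str) -> int:
    # Hypothetical regression: the floor is dropped in a refactor.
    return int(len(text.split()) * 1.3)

assert count_tokens("") == 1           # guarded: minimum 1 by design
assert count_tokens_no_guard("") == 0  # the silent zero the new edge case catches
```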
How to test
18 tests pass locally.
Checklist