Skip to content

feat(sql): rnd_symbol_zipf() function for generating skewed symbol distributions #6217

Merged
bluestreak01 merged 4 commits intomasterfrom
vi_zipf
Oct 3, 2025
Merged

feat(sql): rnd_symbol_zipf() function for generating skewed symbol distributions #6217
bluestreak01 merged 4 commits intomasterfrom
vi_zipf

Conversation

@bluestreak01
Copy link
Copy Markdown
Member

@bluestreak01 bluestreak01 commented Oct 2, 2025

Generates Zipf/Power Law Distribution of symbols from the list. The probability of symbol decays from left to right with given 'alpha' velocity.

Usage:

select rnd_symbol_zipf('AAPL', 'MSFT', 'GOOGL', 'TSLA', 'AMZN', 5.3) from long_sequence(100_000);

Also added weighted, but more verbose function:

select rnd_symbol_weighted('AAPL', 50, 'MSFT', 30, 'GOOGL', 15, 'TSLA', 5) from long_sequence(100_000);

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 2, 2025

Walkthrough

Adds a new Zipf-based random symbol function factory to the SQL engine and corresponding tests. Minor/no-op repository housekeeping change in .gitignore-like area reported. No changes to exported APIs beyond the new factory.

Changes

Cohort / File(s) Summary
New Zipf random symbol function
core/src/main/java/io/questdb/griffin/engine/functions/rnd/RndSymbolZipfFunctionFactory.java
Adds rnd_symbol_zipf(V) function factory. Validates constant symbol args and positive constant alpha. Builds cumulative Zipf distribution, samples per row using execution-context Rnd. Implements symbol table integration and plan rendering.
Tests for Zipf function
core/src/test/java/io/questdb/test/griffin/engine/functions/rnd/RndSymbolZipfFunctionFactoryTest.java
Adds unit tests covering basic behavior, distribution skew across various alpha values, two-symbol case, and validation errors (insufficient args, zero/negative alpha).
Repo housekeeping
.gitignore, compat/src/test/rust/scenarios/target/...
No-op/whitespace-level change affecting ignored/generated paths; no logic impact.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant SQL as SQL Parser/Planner
  participant FF as RndSymbolZipfFunctionFactory
  participant Ctx as SqlExecutionContext
  participant Func as Zipf Symbol Function
  participant Rnd as Rnd (seeded)

  SQL->>FF: newInstance(args=[symbols..., alpha], positions, config, Ctx)
  FF->>FF: Validate constants and alpha > 0
  FF->>Func: Construct Func(symbols, alpha, cumulative probs)
  Func->>Ctx: getRandom()
  Ctx-->>Func: Rnd instance
  Func->>Rnd: seed if needed
  SQL-->>Func: Function instance returned

  loop For each row evaluation
    SQL->>Func: getSymbol(row)
    Func->>Rnd: nextDouble()
    Func->>Func: linear search over CDF to pick index
    Func-->>SQL: selected symbol
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title succinctly highlights the primary change by naming the new rnd_symbol_zipf() SQL function and describing its purpose in generating skewed symbol distributions, which directly matches the changeset. No changes needed for the title.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run `@coderabbitai generate docstrings` to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed The pull request description clearly describes the new rnd_symbol_zipf function, its Zipf distribution behavior, and provides relevant usage examples, directly reflecting the changes introduced in the request.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch vi_zipf

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jerrinot
Copy link
Copy Markdown
Contributor

jerrinot commented Oct 2, 2025

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 2, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
core/src/main/java/io/questdb/griffin/engine/functions/rnd/RndSymbolZipfFunctionFactory.java (1)

183-191: Use an actual binary search for cumulative sampling

The comment promises a binary search, yet we linearly scan every symbol. With large symbol lists this turns the sampler into an O(n) hotspot. Switching to Arrays.binarySearch keeps the cumulative-array approach but restores logarithmic behaviour.

+import java.util.Arrays;
@@
-            for (int i = 0; i < count; i++) {
-                if (u <= cumulativeProbabilities[i]) {
-                    return i;
-                }
-            }
-
-            // Fallback to last element (should not happen due to cumulative[count-1] == 1.0)
-            return count - 1;
+            int idx = Arrays.binarySearch(cumulativeProbabilities, u);
+            if (idx >= 0) {
+                return idx;
+            }
+            idx = -idx - 1;
+            return idx < count ? idx : count - 1;
core/src/test/java/io/questdb/test/griffin/engine/functions/rnd/RndSymbolZipfFunctionFactoryTest.java (1)

90-96: Soften the high-alpha assertion to avoid brittle failures

These exact counts assume the RNG sequence never changes. A tiny tweak in Zipf normalisation (or RNG implementation) will make this test noisy. Consider asserting that '1AAPL' dominates and the remaining symbols collectively stay under a small threshold instead of pinning exact totals. Keeps the coverage while avoiding future flakes.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e783f08 and 52234d3.

📒 Files selected for processing (3)
  • .gitignore (1 hunks)
  • core/src/main/java/io/questdb/griffin/engine/functions/rnd/RndSymbolZipfFunctionFactory.java (1 hunks)
  • core/src/test/java/io/questdb/test/griffin/engine/functions/rnd/RndSymbolZipfFunctionFactoryTest.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (Hosted Running tests on windows-other)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start X64Zfs Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start X64Zfs Agent)
  • GitHub Check: New pull request (SelfHosted Griffin And Fuzz tests Start ARM Agent)
  • GitHub Check: New pull request (SelfHosted Other tests Start ARM Agent)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-cairo)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-fuzz)
  • GitHub Check: New pull request (Hosted Running tests with cover on linux-griffin)
  • GitHub Check: New pull request (Check Changes Check changes)

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 106 / 127 (83.46%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/engine/functions/rnd/RndSymbolZipfFunctionFactory.java 45 57 78.95%
🔵 io/questdb/griffin/engine/functions/rnd/RndSymbolWeightedFunctionFactory.java 61 70 87.14%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants