Sidequery

Migrator

Migrate existing SQL queries to semantic layer by analyzing queries and generating model definitions.

Migrates existing SQL queries to semantic layer by analyzing queries and generating model definitions.

Key features:

  1. Generate model definitions from existing queries
  2. Rewrite queries to use semantic layer syntax
  3. Measure what's missing from your semantic layer

Quick Start

Generate models from your existing queries:

sidemantic migrator --queries queries/ --generate-models output/

This analyzes SQL files in queries/ and generates model YAML files in output/models/.

Example output models/orders.yml:

model:
  name: orders
  table: orders
dimensions:
  - name: status
    sql: status
  - name: region
    sql: region
metrics:
  - name: sum_amount
    agg: sum
    sql: amount
  - name: count
    agg: count
    sql: "*"

Rewritten queries are generated automatically in output/rewritten_queries/ when you use --generate-models.

Original query:

SELECT status, region, SUM(amount), COUNT(*)
FROM orders
WHERE status = 'completed'
GROUP BY status, region

Rewritten query:

SELECT orders.status, orders.region, orders.sum_amount, orders.count
FROM orders
WHERE status = 'completed'

Supported Patterns

Basic Aggregations

Extracts dimensions from GROUP BY and metrics from aggregations:

SELECT
    status,
    SUM(amount) as revenue,
    COUNT(DISTINCT customer_id) as customers
FROM orders
GROUP BY status

Complex Expressions

Handles CASE, COALESCE, CAST, math, string functions:

SELECT
    UPPER(status) as status,
    SUM(CASE WHEN priority = 'high' THEN amount ELSE 0 END) as high_priority_revenue,
    SUM(quantity * price) as total_revenue
FROM orders
GROUP BY UPPER(status)

Extracts underlying columns (status, amount, quantity, price).

Time Dimensions

SELECT
    DATE_TRUNC('month', order_date) as month,
    EXTRACT(YEAR FROM order_date) as year,
    SUM(amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date), EXTRACT(YEAR FROM order_date)

Extracts order_date as time dimension. Rewrites to orders.order_date__month, orders.order_date__year.

Derived Metrics

SELECT
    status,
    SUM(revenue) / COUNT(*) as avg_revenue_per_order,
    SUM(amount - discount) as net_amount
FROM orders
GROUP BY status

Extracts base metrics and derived metrics separately:

metrics:
  - name: sum_revenue
    agg: sum
    sql: revenue
  - name: count
    agg: count
    sql: "*"
  - name: avg_revenue_per_order
    type: derived
    sql: "SUM(revenue) / COUNT(*)"

Joins and Relationships

SELECT
    c.region,
    COUNT(o.order_id) as order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.region

Extracts relationship: orders.customer_idcustomers.id (many_to_one).

# orders.yml
relationships:
  - name: customers
    type: many_to_one
    foreign_key: customer_id

Subqueries

Resolves subquery aliases to underlying tables:

SELECT
    sub.status,
    COUNT(*) as order_count
FROM (
    SELECT status, amount
    FROM orders
    WHERE amount > 100
) sub
GROUP BY sub.status

Extracts orders table and resolves sub.statusorders.status.

Window Functions and Cumulative Metrics

Detects window functions and generates cumulative metric definitions.

Running totals:

SELECT
    order_date,
    SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders
metrics:
  - name: running_total
    type: cumulative
    sql: "orders.sum_amount"

Rolling windows:

SELECT
    order_date,
    SUM(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as rolling_7day
FROM orders
metrics:
  - name: rolling_7day
    type: cumulative
    sql: "orders.sum_amount"
    window: "6 days"

Period-to-date:

SELECT
    order_date,
    SUM(amount) OVER (
        PARTITION BY DATE_TRUNC('month', order_date)
        ORDER BY order_date
    ) as mtd_revenue
FROM orders
metrics:
  - name: mtd_revenue
    type: cumulative
    sql: "orders.sum_amount"
    grain_to_date: "month"

What's extracted:

  • Aggregation window functions: SUM/AVG/COUNT/MIN/MAX with OVER()
  • Rolling windows with ROWS BETWEEN
  • Period-to-date with PARTITION BY DATE_TRUNC()

What's ignored:

  • Window-only functions: ROW_NUMBER(), RANK(), LAG(), LEAD()
  • Complex window calculations like COUNT(*) * 100.0 / SUM(COUNT(*)) OVER()

Base aggregations inside complex calculations are still extracted as regular metrics.

Other Patterns

CTEs: Extracts from underlying tables, preserves structure in rewrites

GROUP BY ordinals: GROUP BY 1, 2 resolved to columns

Implicit joins: FROM a, b WHERE a.id = b.fk extracts relationships

Self-joins: Generates single model

CLI Usage

Generate models from queries:

sidemantic migrator --queries queries/ --generate-models output/

Analyze query coverage against an existing semantic layer:

sidemantic migrator ./models --queries queries/ --verbose

When you use --generate-models, it creates:

  • Model YAML files in output/models/
  • Rewritten queries in output/rewritten_queries/

Options:

  • directory - Semantic layer directory to compare against (default: .)
  • --queries, -q - File or folder of SQL queries to analyze (required)
  • --generate-models, -g - Output directory for generated models + rewrites
  • --verbose, -v - Show detailed analysis

Migration Workflow

1. Analyze queries and generate models

sidemantic migrator --queries queries/ --generate-models models/generated --verbose

Review generated models in models/generated/models/ and rewritten queries in models/generated/rewritten_queries/.

2. Generate rewritten queries

# Rewritten queries are generated automatically when using --generate-models

The generated YAML uses a model: top-level and is intended as a starting point. Convert it to native models: format before loading into Sidemantic.

3. Test rewritten queries

Compare results between original and rewritten queries to ensure correctness.

4. Deploy models

# Convert/merge generated YAML into your native models directory

Load into your semantic layer and start using.

Troubleshooting

Low Coverage

Cause: Missing models, dimensions, or metrics.

Fix: Generate models from your queries first:

sidemantic migrator --queries queries/ --generate-models output/

Then load those models into your semantic layer before re-analyzing.

Dimensions Not Extracted

Cause: No GROUP BY clause.

-- ❌ No dimensions (no GROUP BY)
SELECT COUNT(*) FROM orders

-- ✅ Extracts status
SELECT status, COUNT(*) FROM orders GROUP BY status

Relationships Not Detected

Causes:

  • Non-equi joins
  • Complex conditions
  • Multiple join keys

Fix: Manually add relationships to generated models or use simple equality joins in queries.

Metric Names Don't Match

Analyzer generates names from agg_column:

  • SUM(amount)sum_amount
  • COUNT(*)count

Use aliases to control names:

SELECT SUM(amount) as total_revenue FROM orders  -- Generates "total_revenue"

Python API

Get coverage metrics from report:

report = analyzer.analyze_queries(queries)

print(f"Coverage: {report.coverage_percentage:.1f}%")
print(f"Missing models: {report.missing_models}")
print(f"Missing dimensions: {report.missing_dimensions}")

Provide database connection for better inference:

import duckdb

conn = duckdb.connect(":memory:")
analyzer = Migrator(layer, connection=conn)

Benefits: Resolves ambiguous columns, infers relationships from foreign keys, detects primary keys.