Migrator
Migrate existing SQL queries to semantic layer by analyzing queries and generating model definitions.
Migrates existing SQL queries to semantic layer by analyzing queries and generating model definitions.
Key features:
- Generate model definitions from existing queries
- Rewrite queries to use semantic layer syntax
- Measure what's missing from your semantic layer
Quick Start
Generate models from your existing queries:
sidemantic migrator --queries queries/ --generate-models output/
This analyzes SQL files in queries/ and generates model YAML files in output/models/.
Example output models/orders.yml:
model:
name: orders
table: orders
dimensions:
- name: status
sql: status
- name: region
sql: region
metrics:
- name: sum_amount
agg: sum
sql: amount
- name: count
agg: count
sql: "*"
Rewritten queries are generated automatically in output/rewritten_queries/ when you use --generate-models.
Original query:
SELECT status, region, SUM(amount), COUNT(*)
FROM orders
WHERE status = 'completed'
GROUP BY status, region
Rewritten query:
SELECT orders.status, orders.region, orders.sum_amount, orders.count
FROM orders
WHERE status = 'completed'
Supported Patterns
Basic Aggregations
Extracts dimensions from GROUP BY and metrics from aggregations:
SELECT
status,
SUM(amount) as revenue,
COUNT(DISTINCT customer_id) as customers
FROM orders
GROUP BY status
Complex Expressions
Handles CASE, COALESCE, CAST, math, string functions:
SELECT
UPPER(status) as status,
SUM(CASE WHEN priority = 'high' THEN amount ELSE 0 END) as high_priority_revenue,
SUM(quantity * price) as total_revenue
FROM orders
GROUP BY UPPER(status)
Extracts underlying columns (status, amount, quantity, price).
Time Dimensions
SELECT
DATE_TRUNC('month', order_date) as month,
EXTRACT(YEAR FROM order_date) as year,
SUM(amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date), EXTRACT(YEAR FROM order_date)
Extracts order_date as time dimension. Rewrites to orders.order_date__month, orders.order_date__year.
Derived Metrics
SELECT
status,
SUM(revenue) / COUNT(*) as avg_revenue_per_order,
SUM(amount - discount) as net_amount
FROM orders
GROUP BY status
Extracts base metrics and derived metrics separately:
metrics:
- name: sum_revenue
agg: sum
sql: revenue
- name: count
agg: count
sql: "*"
- name: avg_revenue_per_order
type: derived
sql: "SUM(revenue) / COUNT(*)"
Joins and Relationships
SELECT
c.region,
COUNT(o.order_id) as order_count
FROM customers c
JOIN orders o ON c.id = o.customer_id
GROUP BY c.region
Extracts relationship: orders.customer_id → customers.id (many_to_one).
# orders.yml
relationships:
- name: customers
type: many_to_one
foreign_key: customer_id
Subqueries
Resolves subquery aliases to underlying tables:
SELECT
sub.status,
COUNT(*) as order_count
FROM (
SELECT status, amount
FROM orders
WHERE amount > 100
) sub
GROUP BY sub.status
Extracts orders table and resolves sub.status → orders.status.
Window Functions and Cumulative Metrics
Detects window functions and generates cumulative metric definitions.
Running totals:
SELECT
order_date,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders
metrics:
- name: running_total
type: cumulative
sql: "orders.sum_amount"
Rolling windows:
SELECT
order_date,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_7day
FROM orders
metrics:
- name: rolling_7day
type: cumulative
sql: "orders.sum_amount"
window: "6 days"
Period-to-date:
SELECT
order_date,
SUM(amount) OVER (
PARTITION BY DATE_TRUNC('month', order_date)
ORDER BY order_date
) as mtd_revenue
FROM orders
metrics:
- name: mtd_revenue
type: cumulative
sql: "orders.sum_amount"
grain_to_date: "month"
What's extracted:
- Aggregation window functions:
SUM/AVG/COUNT/MIN/MAXwithOVER() - Rolling windows with
ROWS BETWEEN - Period-to-date with
PARTITION BY DATE_TRUNC()
What's ignored:
- Window-only functions:
ROW_NUMBER(),RANK(),LAG(),LEAD() - Complex window calculations like
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER()
Base aggregations inside complex calculations are still extracted as regular metrics.
Other Patterns
CTEs: Extracts from underlying tables, preserves structure in rewrites
GROUP BY ordinals: GROUP BY 1, 2 resolved to columns
Implicit joins: FROM a, b WHERE a.id = b.fk extracts relationships
Self-joins: Generates single model
CLI Usage
Generate models from queries:
sidemantic migrator --queries queries/ --generate-models output/
Analyze query coverage against an existing semantic layer:
sidemantic migrator ./models --queries queries/ --verbose
When you use --generate-models, it creates:
- Model YAML files in
output/models/ - Rewritten queries in
output/rewritten_queries/
Options:
directory- Semantic layer directory to compare against (default:.)--queries,-q- File or folder of SQL queries to analyze (required)--generate-models,-g- Output directory for generated models + rewrites--verbose,-v- Show detailed analysis
Migration Workflow
1. Analyze queries and generate models
sidemantic migrator --queries queries/ --generate-models models/generated --verbose
Review generated models in models/generated/models/ and rewritten queries in models/generated/rewritten_queries/.
2. Generate rewritten queries
# Rewritten queries are generated automatically when using --generate-models
The generated YAML uses a model: top-level and is intended as a starting point. Convert it to native models: format before loading into Sidemantic.
3. Test rewritten queries
Compare results between original and rewritten queries to ensure correctness.
4. Deploy models
# Convert/merge generated YAML into your native models directory
Load into your semantic layer and start using.
Troubleshooting
Low Coverage
Cause: Missing models, dimensions, or metrics.
Fix: Generate models from your queries first:
sidemantic migrator --queries queries/ --generate-models output/
Then load those models into your semantic layer before re-analyzing.
Dimensions Not Extracted
Cause: No GROUP BY clause.
-- ❌ No dimensions (no GROUP BY)
SELECT COUNT(*) FROM orders
-- ✅ Extracts status
SELECT status, COUNT(*) FROM orders GROUP BY status
Relationships Not Detected
Causes:
- Non-equi joins
- Complex conditions
- Multiple join keys
Fix: Manually add relationships to generated models or use simple equality joins in queries.
Metric Names Don't Match
Analyzer generates names from agg_column:
SUM(amount)→sum_amountCOUNT(*)→count
Use aliases to control names:
SELECT SUM(amount) as total_revenue FROM orders -- Generates "total_revenue"
Python API
Get coverage metrics from report:
report = analyzer.analyze_queries(queries)
print(f"Coverage: {report.coverage_percentage:.1f}%")
print(f"Missing models: {report.missing_models}")
print(f"Missing dimensions: {report.missing_dimensions}")
Provide database connection for better inference:
import duckdb
conn = duckdb.connect(":memory:")
analyzer = Migrator(layer, connection=conn)
Benefits: Resolves ambiguous columns, infers relationships from foreign keys, detects primary keys.