Advanced SQL Query Optimization Techniques
Beyond execution order, there are many more optimization techniques.
Here is a list of common SQL optimization strategies:
Index-Based Optimizations
1. Covering Indexes
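-- Query that reads only a few columns
SELECT customer_id, email, created_date
FROM customers
WHERE status = 'active';
-- Covering index: the query can be answered from the index alone, without a table lookup
-- (INCLUDE is SQL Server / PostgreSQL 11+ syntax)
CREATE INDEX idx_customers_covering ON customers (status) INCLUDE (customer_id, email, created_date);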
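One way to check whether an index actually covers a query is to read the plan. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in; the sample rows are made up, and because SQLite has no INCLUDE clause the extra columns are assumed to go into the index key instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, email TEXT, created_date TEXT, status TEXT)"
)
# Hypothetical sample data: alternating active / inactive customers.
conn.executemany(
    "INSERT INTO customers (email, created_date, status) VALUES (?, ?, ?)",
    [(f"user{i}@example.com", "2024-01-01", "active" if i % 2 else "inactive")
     for i in range(1000)],
)
# SQLite has no INCLUDE, so the extra columns are part of the index key.
# customer_id is the rowid, which SQLite stores in every index entry.
conn.execute(
    "CREATE INDEX idx_customers_covering ON customers (status, email, created_date)"
)


def query_plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


plan = query_plan(
    "SELECT customer_id, email, created_date FROM customers WHERE status = 'active'"
)
print(plan)
```

If the plan mentions a covering index, the table itself is never read for this query.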
2. Composite Index Order
-- Less optimal: the range column comes first, so only order_date narrows the index search
CREATE INDEX idx_order_date_status ON orders (order_date, status);
SELECT * FROM orders WHERE status = 'completed' AND order_date > '2024-01-01';
-- Better: equality column first, then the range column
CREATE INDEX idx_status_date ON orders (status, order_date);
SELECT * FROM orders WHERE status = 'completed' AND order_date > '2024-01-01';
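The effect of column order can be seen in a query plan. This is a minimal sketch using Python's sqlite3 module; the two tables (orders_a, orders_b) are hypothetical copies of the same schema so that each index is tested in isolation, and other databases print their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("orders_a", "orders_b"):
    conn.execute(
        f"CREATE TABLE {table} ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, status TEXT)"
    )
# Same columns, opposite order.
conn.execute("CREATE INDEX idx_a_date_status ON orders_a (order_date, status)")
conn.execute("CREATE INDEX idx_b_status_date ON orders_b (status, order_date)")


def query_plan(table):
    """Plan for the same equality + range filter against the given table."""
    sql = (
        f"EXPLAIN QUERY PLAN SELECT order_id FROM {table} "
        "WHERE status = 'completed' AND order_date > '2024-01-01'"
    )
    return " | ".join(row[3] for row in conn.execute(sql))


plan_date_first = query_plan("orders_a")
plan_status_first = query_plan("orders_b")
print(plan_date_first)
print(plan_status_first)
```

Only the index that leads with the equality column lets both conditions take part in the index search; with the range column first, status has to be checked entry by entry.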
3. Partial Indexes
-- Index only the rows that queries actually touch (PostgreSQL, SQLite; SQL Server calls these filtered indexes)
CREATE INDEX idx_active_customers ON customers (customer_id) WHERE status = 'active';
JOIN Optimizations
4. JOIN Order Optimization
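-- Large table first: can be slower when the database follows the written join order
SELECT * FROM large_table lt
JOIN small_table1 st1 ON lt.common_id = st1.common_id
JOIN small_table2 st2 ON lt.common_id = st2.common_id;
-- Start with the smallest result set (most cost-based optimizers reorder inner joins anyway, so confirm with EXPLAIN)
SELECT * FROM small_table1 st1
JOIN small_table2 st2 ON st1.common_id = st2.common_id
JOIN large_table lt ON lt.common_id = st1.common_id;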
5. IN vs EXISTS (Semi-Joins)
-- IN subquery: can be slower on older optimizers
SELECT * FROM customers c
WHERE c.customer_id IN (SELECT customer_id FROM orders WHERE amount > 1000);
-- EXISTS: can stop at the first match; many modern optimizers plan both forms as the same semi-join
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.amount > 1000);
Subquery Optimizations
6. Correlated vs Non-Correlated Subqueries
-- Less optimal - correlated subquery may run once per customer
SELECT c.customer_id,
(SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) as order_count
FROM customers c;
-- Better - one pass with LEFT JOIN and GROUP BY (LEFT JOIN keeps customers with no orders)
SELECT c.customer_id,
COUNT(o.customer_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id;
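When replacing a correlated subquery with a join, it is worth confirming that both forms return the same rows. The sketch below (Python's sqlite3, made-up data) compares the correlated count with a LEFT JOIN plus GROUP BY and with a plain inner join; the third customer has no orders on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customers VALUES (1), (2), (3);
INSERT INTO orders (customer_id) VALUES (1), (1), (2);
""")

# Correlated subquery: one count per customer row.
correlated = conn.execute("""
    SELECT c.customer_id,
           (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id)
    FROM customers c
    ORDER BY c.customer_id
""").fetchall()

# LEFT JOIN + GROUP BY: COUNT(o.order_id) ignores the NULLs of unmatched rows.
left_join = conn.execute("""
    SELECT c.customer_id, COUNT(o.order_id)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# Inner join: customers without orders disappear from the result.
inner_join = conn.execute("""
    SELECT c.customer_id, COUNT(*)
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

print(correlated, left_join, inner_join)
```

The inner join silently drops customers with zero orders, so it is not a drop-in replacement for the correlated count.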
7. CTE vs Subqueries for Readability
-- Complex nested subquery
SELECT * FROM (
SELECT customer_id, total_amount,
ROW_NUMBER() OVER (ORDER BY total_amount DESC) as rn
FROM (
SELECT customer_id, SUM(amount) as total_amount
FROM orders GROUP BY customer_id
) t
) ranked WHERE rn <= 10;
-- Better with CTE
WITH customer_totals AS (
SELECT customer_id, SUM(amount) as total_amount
FROM orders GROUP BY customer_id
),
ranked_customers AS (
SELECT customer_id, total_amount,
ROW_NUMBER() OVER (ORDER BY total_amount DESC) as rn
FROM customer_totals
)
SELECT * FROM ranked_customers WHERE rn <= 10;
Function and Expression Optimizations
8. Avoid Functions on Indexed Columns
-- Less optimal - function prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
-- Better - range condition uses index
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
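The difference shows up directly in the plan. A minimal sketch with Python's sqlite3 follows; SQLite has no YEAR() function, so strftime('%Y', ...) is swapped in as an assumed equivalent, and dates are stored as ISO text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT)")
conn.execute("CREATE INDEX idx_order_date ON orders (order_date)")


def query_plan(where_clause):
    """Plan for a lookup on orders with the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT order_id FROM orders WHERE " + where_clause
    return " | ".join(row[3] for row in conn.execute(sql))


# Function wrapped around the column: every entry must be evaluated.
plan_function = query_plan("strftime('%Y', order_date) = '2024'")
# Plain range on the column: the index can seek to the first matching entry.
plan_range = query_plan("order_date >= '2024-01-01' AND order_date < '2025-01-01'")

print(plan_function)
print(plan_range)
```

In SQLite's plan output, SCAN means every entry is visited, while SEARCH means the index is used to jump to the matching range.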
9. IN vs Multiple OR Conditions
-- Verbose - repeated OR conditions on the same column
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Computers' OR category = 'Mobile';
-- Better - IN is shorter and easier to maintain; most optimizers plan both forms the same way
SELECT * FROM products
WHERE category IN ('Electronics', 'Computers', 'Mobile');
Aggregate Function Optimizations
10. COUNT(*) vs COUNT(column)
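-- COUNT(column) counts only non-NULL values, so the column may have to be checked for each row
SELECT COUNT(customer_id) FROM customers WHERE status = 'active';
-- COUNT(*) counts rows; same result when customer_id is NOT NULL, and at least as fast
SELECT COUNT(*) FROM customers WHERE status = 'active';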
11. Conditional Aggregation
-- Less optimal - multiple queries
SELECT COUNT(*) FROM orders WHERE status = 'completed';
SELECT COUNT(*) FROM orders WHERE status = 'pending';
-- Better - single query
SELECT
COUNT(CASE WHEN status = 'completed' THEN 1 END) as completed_orders,
COUNT(CASE WHEN status = 'pending' THEN 1 END) as pending_orders
FROM orders;
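To confirm that the single-query form returns the same numbers as the separate queries, here is a small sketch using Python's sqlite3 with made-up order rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
# Hypothetical data: 3 completed, 2 pending, 1 cancelled.
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("completed",)] * 3 + [("pending",)] * 2 + [("cancelled",)],
)

# Two round trips, two passes over the table.
separate = tuple(
    conn.execute("SELECT COUNT(*) FROM orders WHERE status = ?", (status,)).fetchone()[0]
    for status in ("completed", "pending")
)

# One pass: CASE yields NULL for non-matching rows, and COUNT skips NULLs.
combined = conn.execute("""
    SELECT COUNT(CASE WHEN status = 'completed' THEN 1 END),
           COUNT(CASE WHEN status = 'pending' THEN 1 END)
    FROM orders
""").fetchone()

print(separate, combined)
```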
Data Type Optimizations
12. Appropriate Data Types
-- Less optimal - oversized data types
CREATE TABLE products (
id BIGINT, -- INT might suffice
price DECIMAL(20,4), -- DECIMAL(10,2) might suffice
status VARCHAR(255) -- CHAR(1) for single character status
);
-- Better - right-sized data types
CREATE TABLE products (
id INT,
price DECIMAL(10,2),
status CHAR(1)
);
UNION Optimizations
13. UNION vs UNION ALL
-- Less optimal - removes duplicates unnecessarily
SELECT customer_id FROM customers WHERE region = 'North'
UNION
SELECT customer_id FROM customers WHERE region = 'South';
-- Better - one scan with IN (same result when customer_id is unique)
SELECT customer_id FROM customers WHERE region IN ('North', 'South');
-- Or, if separate queries are needed and the branches cannot overlap
SELECT customer_id FROM customers WHERE region = 'North'
UNION ALL
SELECT customer_id FROM customers WHERE region = 'South';
Window Function Optimizations
14. Window Functions vs Self-Joins
-- Less optimal - correlated subquery acting as a self-join
SELECT o1.order_id, o1.amount,
(SELECT COUNT(*) + 1 FROM orders o2 WHERE o2.amount > o1.amount) as amount_rank
FROM orders o1;
-- Better - window function
SELECT order_id, amount,
RANK() OVER (ORDER BY amount DESC) as amount_rank
FROM orders;
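The two ranking approaches can be compared on a small data set. This sketch assumes Python's sqlite3 linked against SQLite 3.25 or newer (needed for window functions) and uses made-up amounts with one tie; the correlated form counts the rows with a larger amount and adds 1 so that it lines up with RANK().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
    [(1, 100.0), (2, 200.0), (3, 200.0), (4, 50.0)],
)

# Correlated subquery: for each row, count the rows with a larger amount.
subquery_ranks = conn.execute("""
    SELECT o1.order_id,
           (SELECT COUNT(*) + 1 FROM orders o2 WHERE o2.amount > o1.amount)
    FROM orders o1
    ORDER BY o1.order_id
""").fetchall()

# Window function: a single sorted pass assigns the ranks.
window_ranks = conn.execute("""
    SELECT order_id, RANK() OVER (ORDER BY amount DESC)
    FROM orders
    ORDER BY order_id
""").fetchall()

print(subquery_ranks)
print(window_ranks)
```

Tied amounts share a rank in both forms, and the next rank is skipped.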
Query Structure Optimizations
15. Avoid SELECT *
-- Less optimal
SELECT * FROM large_table WHERE condition;
-- Better - specify only needed columns
SELECT id, name, email FROM large_table WHERE condition;
16. LIMIT with ORDER BY Optimization
-- Less optimal - without an index, sorts the entire table to return 10 rows
SELECT * FROM large_table ORDER BY created_date DESC LIMIT 10;
-- Better - with appropriate index on created_date
CREATE INDEX idx_created_date_desc ON large_table (created_date DESC);
SELECT * FROM large_table ORDER BY created_date DESC LIMIT 10;
Advanced Techniques
17. Query Hints and Plan Control
-- Database-specific hints (Oracle-style example; SQL Server uses WITH (INDEX(...)) table hints instead)
SELECT /*+ INDEX(orders idx_order_date) */ *
FROM orders
WHERE order_date > '2024-01-01';
18. Batch Processing
-- Instead of processing millions of rows at once
UPDATE large_table SET status = 'processed' WHERE condition;
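-- Process in batches: repeat until no rows are updated
-- (skip rows already processed so each run picks up a new batch; MySQL does not allow LIMIT inside an IN subquery)
UPDATE large_table SET status = 'processed'
WHERE id IN (SELECT id FROM large_table WHERE condition AND status <> 'processed' LIMIT 1000);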
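A batch loop is usually driven from application code. The following is a minimal sketch with Python's sqlite3; the table contents, the batch size, and the stand-in filter status <> 'processed' are assumptions for illustration.

```python
import sqlite3

BATCH_SIZE = 1000  # assumed batch size; tune for your workload

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO large_table (status) VALUES (?)", [("new",)] * 2500
)
conn.commit()


def process_in_batches(connection, batch_size):
    """Update rows in chunks, committing after each chunk. Returns batch count."""
    batches = 0
    while True:
        cursor = connection.execute(
            "UPDATE large_table SET status = 'processed' "
            "WHERE id IN (SELECT id FROM large_table "
            "             WHERE status <> 'processed' LIMIT ?)",
            (batch_size,),
        )
        connection.commit()  # short transactions keep locks brief
        if cursor.rowcount == 0:  # nothing left to update
            break
        batches += 1
    return batches


batches_run = process_in_batches(conn, BATCH_SIZE)
remaining = conn.execute(
    "SELECT COUNT(*) FROM large_table WHERE status <> 'processed'"
).fetchone()[0]
print(batches_run, remaining)
```

The inner filter must exclude rows that are already done; otherwise every pass selects the same first batch and the loop never ends.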
19. Materialized Views
-- For frequently accessed complex aggregations
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT DATE_TRUNC('month', order_date) as month,
SUM(amount) as total_sales,
COUNT(*) as order_count
FROM orders
GROUP BY DATE_TRUNC('month', order_date);
20. Partitioning Strategies
-- Partition large tables by date
CREATE TABLE orders_partitioned (
order_id INT,
order_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (order_date);
Performance Monitoring
21. Query Execution Plan Analysis
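Use EXPLAIN PLAN (Oracle), EXPLAIN (MySQL, PostgreSQL), or EXPLAIN ANALYZE to see actual run times
Watch for full table scans on large tables, and for nested loops or hash joins over far more rows than expected
Check for missing indexes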
22. Statistics Maintenance
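Keep table statistics updated
Run ANALYZE TABLE (MySQL), ANALYZE (PostgreSQL), or UPDATE STATISTICS (SQL Server) after large data changes
Compare estimated and actual row counts in execution plans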
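Plan inspection and statistics collection can be tried on a small scale. This sketch uses Python's sqlite3, where the equivalents are EXPLAIN QUERY PLAN and ANALYZE (which fills the sqlite_stat1 table); the schema and data are made up, and other databases expose the same ideas through different commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(5000)],
)


def query_plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


sql = "SELECT amount FROM orders WHERE customer_id = 42"

plan_before = query_plan(sql)  # no usable index yet
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(sql)   # same query once the index exists

# Collect statistics so the planner can estimate row counts per index.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()

print(plan_before)
print(plan_after)
print(stats)
```

A full scan in the first plan that becomes an index search in the second is the typical signature of a missing index.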
Database-Specific Optimizations
23. Connection Pooling
Reuse database connections
Avoid connection overhead
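Pooling normally comes from the driver or framework, but the idea fits in a few lines. The ConnectionPool class below is a hypothetical illustration built on Python's queue and sqlite3 modules, not a production pool; note that each ":memory:" connection is its own separate database.

```python
import queue
import sqlite3


class ConnectionPool:
    """Tiny fixed-size pool: connections are opened once and handed out on demand."""

    def __init__(self, database, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a connection be used from another thread.
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    def acquire(self, timeout=None):
        """Block until a connection is free, then return it."""
        return self._pool.get(timeout=timeout)

    def release(self, connection):
        """Hand the connection back so another caller can reuse it."""
        self._pool.put(connection)


pool = ConnectionPool(":memory:", size=2)

first = pool.acquire()
value = first.execute("SELECT 1").fetchone()[0]
pool.release(first)

# Draining the pool returns the same two connection objects, not new ones.
second = pool.acquire()
third = pool.acquire()
reused = any(conn is first for conn in (second, third))
print(value, reused)
```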
24. Prepared Statements
-- Parse once, then execute many times with different parameters (MySQL syntax)
PREPARE stmt FROM 'SELECT * FROM customers WHERE customer_id = ?';
SET @id = 42;
EXECUTE stmt USING @id;
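From application code, the same effect usually comes from parameterized queries. In this sketch with Python's sqlite3, the SQL text stays constant and only the bound value changes, which lets the module's per-connection statement cache reuse the compiled statement; the table and helper name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO customers (customer_id, email) VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com")],
)

# One constant SQL string; values are bound through the ? placeholder.
LOOKUP_SQL = "SELECT email FROM customers WHERE customer_id = ?"


def find_email(customer_id):
    """Return the email for a customer, or None when the id is unknown."""
    row = conn.execute(LOOKUP_SQL, (customer_id,)).fetchone()
    return row[0] if row else None


emails = [find_email(customer_id) for customer_id in (1, 2, 3)]
print(emails)
```

Binding values instead of building SQL strings also protects against SQL injection.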
25. Bulk Operations
-- Instead of multiple INSERT statements
INSERT INTO my_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
-- Or use a bulk loader for large datasets (BULK INSERT in SQL Server, COPY in PostgreSQL, LOAD DATA in MySQL)
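From a client program, bulk loading typically means sending many rows per call inside one transaction. A minimal sketch with Python's sqlite3 follows; the items table and the row count are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")

rows = [(i, f"item-{i}") for i in range(1, 10001)]

# "with conn" wraps the whole batch in a single transaction and commits once,
# instead of paying the commit cost for every row.
with conn:
    conn.executemany("INSERT INTO items (id, label) VALUES (?, ?)", rows)

inserted = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(inserted)
```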
The key is to profile your queries, understand your data patterns, and apply the most
relevant optimizations based on your specific use case and database system.