SQL Saturday Baton Rouge 2025

SQL Saturday Baton Rouge is a great event hosted on LSU’s campus that has some of the best food, presenters, and attendees in the country! Had some incredible questions for my session and learned a ton throughout the conference. Thank you to all the sponsors and volunteers who make this event happen year after year (16 years running!). It’s one of the first events I ever spoke at (https://dataonwheels.wordpress.com/2022/08/23/sql-saturday-baton-rogue-recap/), and I’m so excited I was able to speak here again!

Here’s a link to the event where you can find other incredible speakers and sessions! Pictured below are some of my SML (Saturday Morning Learning) buddies – Dominick Raimato, Jason Romans, Kevin Pereira, and me! Thanks Sheila Romans for the great picture of us nerds!

My session – Mastering Microsoft Fabric Data Warehouses Tips & Tricks

Thanks to everyone who came out to my session and came with me on a journey to kill Pippin’s long-running query, create case insensitive warehouses, and monitor our warehouse and lakehouse queries on a single pane of glass. You guys were such an engaged audience and I loved being able to laugh and commiserate about our SQL experiences together!

You can find the presentation and resources here – https://github.com/Anytsirk12/DataOnWheels/tree/main/2025%20Presentations/2025%20SQL%20Saturday%20Baton%20Rouge

Jason Romans – How to Diagnose a Slow Power BI Report

Jason gave an awesome session about how to use external tools to diagnose, and even proactively solve, a slow Power BI report. Check out his session live if you get a chance – lots of great slides and demos! Below is the abstract from his session; be sure to follow the DAX Shepherd.
https://thedaxshepherd.com/presentations/

Users complain that a Power BI report is so slow that it is not worth using. A Power BI report may not perform well for many reasons, but how do you know where to start? In this session, I will show you some of the main tools I use to dive deep into the common and the not-so-common performance issues I run into. Starting from a slow report, we will learn how to diagnose where the problem is. Beginning with Performance Analyzer in Power BI, we can start to figure out where the issue is. Is it the visual that is slow, or are the DAX queries taking too long?

Next, we will use DAX Studio to tune the DAX queries that are being generated. Along the way, we will look beneath the visuals to see how Power BI is architected so we can better understand how to write better code from the start. We will examine Query Plans and Server Timings to evaluate which of our changes are more beneficial.

Lastly, we will examine methods for detecting and preventing performance issues before they occur. Using Tabular Editor, we will learn how Best Practice Analyzer can detect performance issues with your semantic models.

Kevin Feasel – Applying Forensic Accounting Techniques using SQL and Python

Below are my stream-of-consciousness notes from this session; it was a ton of fun to think through forensically investigating a SQL environment using Python. The session was packed with great examples and was an absolute blast to play detective. Kevin runs Curated SQL, which is an awesome site for getting a “curated” list of recent blogs with incredible insights from across the data community.
https://curatedsql.com/

Forensic accounting deals primarily with investigation and litigation. The goal is to connect the dots on potential fraud in a way that will stand up in court.

Step 1 – Summary Analysis

  1. Find largest table in terms of records or size
  2. For largest table, review data types and keys
  3. Look at foreign keys off that largest table and do the same analysis over and over.

You end up with a database model. That gives you a high-level view of the database.
There will be some python notebooks available in the future that will walk through how to explore a sample database.

Pandas has a method, df.describe(include='all'), that will give you stats on a DataFrame. Pretty neat way to easily get some info, especially on numeric values. Note – you should convert the date fields to dates within the DataFrame (they likely default to objects); then you’ll get similar stats for the date columns.
Pandas also has good ways to easily group, count, and sort (you can also do this in SQL), so you can easily see what the top artifacts are within certain categories. One example is the number of vendors within certain expense categories. For example, maybe only one vendor is selling cleaning supplies. There may be a reason for it, but in an auditing scenario, that is a good place to start looking for preferential treatment and fraud.
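Here’s a rough sketch of what that looks like in pandas – the column names and numbers are made up for illustration, not from the session:

```python
import pandas as pd

# Hypothetical expense data -- vendors, categories, and amounts are invented
df = pd.DataFrame({
    "vendor":   ["A", "A", "B", "C", "C", "C"],
    "category": ["Cleaning", "Office", "Office", "Office", "Travel", "Travel"],
    "amount":   [100.0, 250.0, 300.0, 125.0, 980.0, 410.0],
    "invoice_date": ["2024-01-05", "2024-01-20", "2024-02-02",
                     "2024-02-10", "2024-03-01", "2024-03-15"],
})

# Convert the date strings to real datetimes so describe() reports date stats
# instead of treating the column as a generic object
df["invoice_date"] = pd.to_datetime(df["invoice_date"])
print(df.describe(include="all"))

# Group, count, and sort to surface outliers: how many distinct vendors
# supply each expense category?
vendors_per_category = df.groupby("category")["vendor"].nunique().sort_values()
print(vendors_per_category)  # a category served by a single vendor is worth a closer look
```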

Step 2 – Growth Analysis

This looks at changes in ratios over time. Goal is to look for major changes which look to be far outside the norm. Typically this is performed against important additive measures like revenue, cost, number of customers, number of orders, and even active users as a percent of total users (a “fixed cap” ratio).

We can perform an eyeball test at this phase. We can also do comprehensive analysis using Kolmogorov–Smirnov (KS) and predictive power score (PPS) tests. The Seaborn library in Python allows you to easily create line plots and other visuals to see how additive measures perform over time. This allows you to investigate and gut-check growth over time. This can be particularly interesting if you expect a linear increase or decrease but see something more random.

MinMaxScaler inside the sklearn library (sklearn.preprocessing) can rescale everything to a range from 0 to 1, letting you really compare the growth of two very differently scaled items.
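A tiny sketch of that rescaling, with made-up revenue and order counts:

```python
from sklearn.preprocessing import MinMaxScaler

# Two measures on very different scales -- illustrative numbers only
revenue = [[120_000.0], [150_000.0], [300_000.0]]
order_count = [[40.0], [55.0], [70.0]]

scaler = MinMaxScaler()  # rescales each column to the 0-1 range

revenue_scaled = scaler.fit_transform(revenue)
orders_scaled = scaler.fit_transform(order_count)

# After scaling, both series run from 0 to 1, so their growth shapes
# can be compared directly on one chart
print(revenue_scaled.ravel())
print(orders_scaled.ravel())
```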

Gap analysis – sometimes the lack of value can be more interesting. Two meanings – review of controls and where things can go wrong & what data is missing.
Controls – do people have access to edit logs, who can do what? Does data change and do we know when it does? Are we tracking failures? Double-checking calculations on reports.
Gaps and Islands – gaps in data. Gaps are elements in a sequence which are missing. Islands are contiguous sets of data up to and not including the gap.
Accountants care about sequences that should be complete like receipt numbers, purchase orders, check numbers, invoice numbers, etc. Example check 1003 is missing, your accountant wants to see the voided check.

How to solve gaps and islands: use DENSE_RANK() to build an ordering, then calculate the difference between the actual value and its DENSE_RANK() – the difference is constant within an island, and it changes wherever a gap pops up.
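The session showed this in T-SQL, but the same rank-difference trick is easy to sketch in Python – the check numbers below are made up:

```python
# A check-number sequence with gaps: 1003 and 1007-1008 are missing
checks = [1001, 1002, 1004, 1005, 1006, 1009]

# Mimic the DENSE_RANK() trick: value minus its 1-based rank is constant
# within an island, so it works as a grouping key
islands = {}
for rank, value in enumerate(sorted(set(checks)), start=1):
    islands.setdefault(value - rank, []).append(value)

island_ranges = [(grp[0], grp[-1]) for grp in islands.values()]
print(island_ranges)  # [(1001, 1002), (1004, 1006), (1009, 1009)]

# Gaps fall between consecutive islands
gaps = [(a_end + 1, b_start - 1)
        for (_, a_end), (b_start, _) in zip(island_ranges, island_ranges[1:])]
print(gaps)  # [(1003, 1003), (1007, 1008)] -- the checks your accountant will ask about
```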

A warning with gaps – SQL will typically have gaps in IDENTITY values. This happens often with rollbacks, where an identity value gets burned. This also applies to SEQUENCE. Building a gapless sequence is a huge performance hit, which is why it’s not the default – it forces single-row insertion and a serializable isolation level.

Regression Analysis

A regression contains a dependent variable (the thing we want to explain). For example, with count of invoices per month, the count of invoices is the dependent variable. The dependent variable must be a continuous, countable feature, and there can be only one dependent variable in a regression (multivariate multiple regression is a thing, but it’s really a series of regressions). We have one or more independent variables that we think will explain the dependent variable (time, people, customer, location, etc.). We assume the independent variables drive the dependent variable but not each other. The data should ideally have a good amount of variety in the independent variables; a rule of thumb is 30–100 data points per combination of independent variables, and more data is better. We also assume that past behavior is similar to future behavior.

In Python, from sklearn.linear_model import LinearRegression and from sklearn.metrics import mean_squared_error make conducting a linear regression super easy to implement. You need to fit the model to see how the dependent variable is impacted by the provided independent variables. If you use statsmodels, sm_model.summary() will give you the OLS regression results. Be sure to use what you learned from your gut-check line charts. For example, if there’s a significant deviation from the line in a certain year, try building the linear regression for before and during the period where that significant difference occurs, and compare that to what you get after. This may also help you find the true outliers that led to the craziness overall.
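A minimal sklearn sketch of that workflow – the monthly invoice counts are synthetic, not session data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data: 24 months of invoice counts with a roughly linear trend
rng = np.random.default_rng(42)
months = np.arange(1, 25).reshape(-1, 1)               # independent variable
invoices = 50 + 3 * months.ravel() + rng.normal(0, 2, size=24)  # dependent variable

# Fit the model to see how the dependent variable responds to the independent one
model = LinearRegression().fit(months, invoices)
pred = model.predict(months)

print(model.coef_[0], model.intercept_)   # slope near 3, intercept near 50
print(mean_squared_error(invoices, pred))  # small residual error for this clean trend
```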

Cohort Analysis – looking at top-level aggregates or by a single slice might help identify problems. Differences should be able to be explained by domain experts.

  • Duplication of entries
  • Frequency of one thing versus another
  • Changes in frequency over time
  • Shape of the numeric distribution

Intuition on duplication

  • People don’t like to perform more work than necessary
  • When given an option between more work and less work, people will generally choose less work
  • Duplication is usually considered more work
  • If the low-work option is easy to understand, people are more likely to gravitate to it, ceteris paribus

These are not rules of nature, just intuitions.

Digit analysis – you should expect a uniform distribution of the last digit in a sequence. There are exceptions: budgets are usually rounded off and end with 5s or 0s, estimates in general are also rounded to 5 or 0, and less precise measurements often end in 0.

When people try to fake data, they tend to go with “more random” numbers like 2, 3, 7, and 8. Benford’s law – unlike the final digit, the first digit of a sequence often follows a power law, where 1 is more likely to occur than 2, and so on. It works best when the data spans 2–3 orders of magnitude or more. Pretty neat – definitely look up Benford’s law.
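A quick stdlib-only sketch of a Benford first-digit check – the amounts are a made-up geometric series, chosen only so the data spans a few orders of magnitude:

```python
import math
from collections import Counter

# Benford's expected first-digit frequencies: P(d) = log10(1 + 1/d)
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Illustrative amounts spanning several orders of magnitude (1.7 to ~5 million)
amounts = [round(1.7 ** i, 2) for i in range(1, 30)]

first_digits = Counter(str(a)[0] for a in amounts)
observed = {int(d): n / len(amounts) for d, n in first_digits.items()}

# Compare expected vs observed; big deviations (especially excess 2s, 3s, 7s, 8s)
# are a flag worth investigating
for d in range(1, 10):
    print(d, round(benford[d], 3), round(observed.get(d, 0.0), 3))
```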

The Python library mentioned here was Beautiful Soup (typically used for scraping web data to analyze). Pretty cool.

Keep records of your findings. As you dig, you will want to have that data readily available for future reference.

Jonathan Stewart – Building scalable data warehouses in Fabric

Jonathan walks through the basics of building a data warehouse that can easily grow without taking down capacity. He walks through best data integration options, star schema principles, and some great data modeling best practices. Column selection, JOIN, and aggregation optimization techniques were the focus.

V-Ordering – when should you do it? It’s enabled by default for read optimization on the data warehouse but off by default for lakehouses, to prioritize data timeliness. Consider it for:

  • Direct Lake Semantic models
  • If user experience is priority over data timeliness

Size Optimization Strategy = choose the smallest data type that supports your data requirements. Plan for scaling, but realistically. A column that is yes/no will never have more than 3 characters so it can be a VARCHAR(3).

His abstract:

Learn how to design and implement scalable data warehouses using Microsoft Fabric. This session will cover best practices for data storage, management, and retrieval, as well as the use of Fabric’s data warehousing features. Attendees will gain insights into creating efficient data pipelines, optimizing performance, and ensuring data consistency.

How-To Convert Excel Date & Time to SQL

Hey data friends! This one comes from my personal vault (aka the backlog of drafts I’ve been needing to write up) and is a really simple bit of code that I always forget how to write. So, to save us all some ChatGPT-ing, here’s my tried-and-true way of converting Excel date & time fields to a true date & time in SQL.

Let’s say you have the following Excel table (don’t worry these people are not real) and are loading it into SQL Server through an ETL process.

When the data gets to SQL, you realize that the call timestamp is coming in as a number! How do you get this to a proper date time field in SQL?

The most accurate way to get the date and time out of this field is to split the field on the period and calculate the date separately from the time. We will use a CTE to split the date and time fields then process the changes.

WITH separate_date_time AS (
	SELECT 
	*
	,SUBSTRING([Call Timestamp],1 --SUBSTRING lets us grab a portion of the text, starting at the first character and ending just before the '.'
		,CHARINDEX('.',[Call Timestamp])-1)  'CallDate' --CHARINDEX gives us the position of the '.' so we know where to end our substring selection
	,SUBSTRING([Call Timestamp],CHARINDEX('.',[Call Timestamp])   --for the time, we will start at the '.' and grab everything after it
		,LEN([Call Timestamp]))  'CallTime'
	FROM dbo.ExcelUploadDemo
)

SELECT 
ID 
,[Call Timestamp]
,CallDate
,CallTime
, DATEADD(DAY,CAST(CallDate AS INT), '1899-12-30') AS [CleanedCallDate] --Excel stores dates as days since 1899-12-30
, DATEADD(SECOND,ROUND(CAST([CallTime] AS FLOAT) * 86400, 0), CAST('00:00' AS TIME)) AS [CleanedCallTime] --the decimal portion is the fraction of a day; 86,400 seconds in a day
, DATEADD(SECOND,ROUND(CAST([CallTime] AS FLOAT) * 86400, 0)
	,DATEADD(DAY,CAST(CallDate AS INT), '1899-12-30')) AS [CleanedCallDateTime] --add the seconds directly onto the date; adding a DATETIME and a TIME together would error
FROM separate_date_time

Little note – you need to round because occasionally a :00 seconds value will otherwise come out as 59 seconds of the previous minute due to floating-point precision. Happy coding friends!
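If you happen to be doing the same cleanup in Python instead of SQL, pandas can apply the same 1899-12-30 origin trick – the serial values below are hypothetical, not from the demo table:

```python
import pandas as pd

# Excel stores timestamps as days since 1899-12-30 (the same offset the SQL
# script uses); the fraction after the decimal point is the time of day
serials = pd.Series([45123.0, 45123.5, 45123.75])  # hypothetical Call Timestamp values

timestamps = pd.to_datetime(serials, unit="D", origin="1899-12-30")

# Round to the nearest second to avoid the same :59 floating-point artifact
timestamps = timestamps.dt.round("s")
print(timestamps)
```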

T-SQL Tuesday #176: That Piece of Technical Advice for the Past Me

DIVERSIFY!

We devote a lot of time to mastering the technology that we are passionate about or use for our jobs. For example, I specialized in Microsoft business intelligence, which has a wide range of technologies indeed. But I was still relatively narrow in focus.

I would suggest that you broaden your knowledge and skills in related fields like security and networking. You do not need to master them, but you should have a solid understanding of the concepts. Learning related technologies enables you to communicate effectively and confidently with your colleagues about how to provide high-quality solutions to your customers or businesses.

Learn New Things

There are many ways to learn new skills. Microsoft Learn is a great place to start but not the only option. There are online training programs (Udemy, Pragmatic Works), YouTube channels (Tales from the Field, Data on Wheels), blogs (Simple Talk, Eric on Identity), events (SQL Saturday, Microsoft Ignite), and even books (Hands on Analysis Services, M365 Security). These are some ways you can learn.

Get Certified

Certifications require you to show that you have a solid grasp of the topic. You have to be able to reason through concepts and respond to questions about the technology and its use. While certifications do not make you an expert, they do indicate a good knowledge of the technology. Finishing the certification process also gives you confidence to engage in relevant discussions with other technologists about the subject. Another advantage is that you can include this in your resume.

Getting certified in your area of expertise should be a formality; getting certified in something different is a stretch!

Other Benefits

Here are some other reasons to diversify:

  • You will improve your troubleshooting skills.
  • You may discover a new favorite technology.
  • You could increase your value in the job market.
  • You have something new to share with the community.

Learn something new, share what you learn.

Dynamically Unpivot columns in SQL

Picture this: your data ingestion team has created a table that has the sales for each month-year split into different columns. At first glance, you may think, “What’s the big deal? Should be pretty easy, right? All I need to do is unpivot these columns in Power BI and I’m good to go.” So you go that route, and the report works for one month. Next month, you get an urgent email from your stakeholders saying they can’t see this month’s numbers. That’s when you realize that this table will grow with new columns every month, which means any report you make needs a schema refresh every single month. Unfortunately, Power BI will not grab new columns from a table once the report is published to the online service. The only way for Power Query to pick up the new columns is for you to open the report in Power BI Desktop, go to Power Query, and refresh the preview to get all the columns in that table.

If you’re like me, you may not have time (or the will) to manually download, fix, and reupload this report each month. But have no fear! SQL is here! The unpivot function in SQL can feel like a black box, designed for only black belts in SQL to use, but we will break it down together and create a dynamic unpivot script.

What does unpivot in SQL accomplish? Unpivot takes multiple columns and converts them to rows. The general syntax is as follows:

SELECT 
originalColumn1, originalColumnHeaders, originalValues
FROM 
(SELECT originalColumn1, originalColumn2, originalColumn3, originalColumn4 FROM dbo.Table) T 
UNPIVOT (originalValues FOR originalColumnHeaders IN (originalColumn2, originalColumn3, originalColumn4)) AS UP

Let’s break this down.

The first SELECT statement contains the column we don’t want to unpivot plus the two names of what we want our final columns to be. For example, using our table below, we want to keep the country column as is, but we want to unpivot those sales months and the sales values. So, we may name the originalColumnHeaders as SalesMonth and originalValues as Sales. You can use whatever you’d like, but it has to match what’s in the UNPIVOT statement.

The SELECT after FROM tells the query where to get data from. By default, SQL requires this to be hard-coded, but don’t worry, we have a way for that to be dynamic.

The UNPIVOT statement requires us to provide two new column names. The first is what you want to call the column holding the old values from inside those columns (in our case, Sales). The second, after FOR, is what you want to call the column holding the old column headers (in our case, SalesMonth).

The hard-coded version of our example would look like this:

SELECT 
country, SalesMonth, Sales
FROM 
(SELECT country, sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024 FROM dbo.Table) T 
UNPIVOT (Sales FOR SalesMonth IN (sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024)) AS UP

The result of this code looks pretty good! Now we can easily pull the month and year out from SalesMonth and have some data we can trend. But we still have that pesky hard coded problem.

To get around the hard coding, we have to do some SQL jujitsu. We will use a couple of variables that can be dynamically populated, then execute the SQL held in a variable using sp_executesql. This is a bit easier to explain with in-code notes, so feel free to look through this query and read the comments that go along with it for details on how it works.


--Use this script to create the table for the demo
/* --Comment this line out to run the create table code
DROP TABLE IF EXISTS dbo.PivotedSales
CREATE TABLE dbo.PivotedSales (
	country VARCHAR(250),
	sales_jan_2024 DECIMAL(17,3),
	sales_feb_2024 DECIMAL(17,3),
	sales_mar_2024 DECIMAL(17,3),
	sales_apr_2024 DECIMAL(17,3),
	sales_may_2024 DECIMAL(17,3)
);
INSERT INTO dbo.PivotedSales (country, sales_jan_2024, sales_feb_2024, sales_mar_2024, sales_apr_2024, sales_may_2024)
VALUES 
	('South Africa', 111.22, 222.33, 333.44, 444.55, 555.66),
	('Canada', 112.22, 227.33, 332.44, 400.55, 500.66),
	('United States', 113.22, 228.33, 330.44, 401.55, 501.66),
	('Mexico', 114.22, 229.33, 334.44, 404.55, 504.66),
	('Ireland', 115.22, 230.33, 335.44, 409.55, 509.66),
	('Germany', 116.22, 231.33, 336.44, 499.55, 599.66),
	('South Africa', 1011.22, 2022.33, 3303.44, 4044.55, 5505.66),
	('Canada', 1102.22, 2027.33, 3302.44, 4000.55, 5000.66),
	('United States', 1103.22, 2280.33, 3030.44, 4001.55, 5001.66),
	('Mexico', 1104.22, 2209.33, 3034.44, 4004.55, 5004.66),
	('Ireland', 1105.22, 2300.33, 3305.44, 4009.55, 5009.66),
	('Germany', 1106.22, 2310.33, 3036.44, 4909.55, 5909.66);

--*/

--Original table
SELECT * FROM PivotedSales


--Set up parameters
DECLARE @UnpivotColumns NVARCHAR(MAX), @FilterPrefix NVARCHAR(MAX) 
-- FilterPrefix is the prefix that all the columns we want to unpivot share. This allows us to dynamically grab any new columns as they get created
SET @FilterPrefix = 'sales%' --Note, you can adjust this to be a suffix or a middle string if needed by moving the % (wildcard)


--This section sets our @UnpivotColumns variable to a comma-separated list of the columns in our table that match the FilterPrefix
SELECT @UnpivotColumns = STRING_AGG(CONVERT(NVARCHAR(MAX), C.name),',')
FROM sys.columns C 
INNER JOIN sys.types T ON T.user_type_id = C.user_type_id
WHERE C.object_id = OBJECT_ID('dbo.PivotedSales') --this ensures we only get columns from our table
	AND C.name LIKE @FilterPrefix --this makes sure only columns matching the filter prefix are returned
	AND (T.name = 'decimal' OR T.name LIKE '%int%') --this ensures we only grab columns with a decimal or int type. You can adjust this if needed

SELECT @UnpivotColumns AS 'Unpivot Columns'


--This section creates a dynamic SQL statement using the comma separated list we just generated
DECLARE @ExecuteSQL NVARCHAR(MAX)
SET @ExecuteSQL = '
SELECT Country, SalesMonth, Sales
FROM 
	(SELECT Country, ' + @UnpivotColumns + ' FROM dbo.PivotedSales) P
	UNPIVOT
	(Sales FOR SalesMonth IN (' + @UnpivotColumns + ')) as S'
SELECT @ExecuteSQL AS 'Dynamic SQL Script' --this will show you the SQL statement we've generated


--Finally, we will use the system stored proc sp_executesql to execute our dynamic sql script
EXECUTE sp_executesql @ExecuteSQL

Now that is a lot of code, so be sure to look at each chunk and make sure you adjust it for your use case. Happy coding folks!
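Purely as an aside, if the data ever passes through Python, pandas sidesteps the hard-coding problem entirely because melt() picks up whatever columns exist at run time. The data below is illustrative, mirroring the demo table:

```python
import pandas as pd

# A small slice of the demo-style table -- values invented for illustration
df = pd.DataFrame({
    "country": ["Canada", "Mexico"],
    "sales_jan_2024": [112.22, 114.22],
    "sales_feb_2024": [227.33, 229.33],
})

# melt() unpivots every column that isn't listed in id_vars, so a new
# sales_mar_2024 column next month would be included with no code changes
unpivoted = df.melt(id_vars="country", var_name="SalesMonth", value_name="Sales")
print(unpivoted)
```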

Huge credit for this solution goes to Colin Fitzgerald and Miranda Lochner! Thank you both for sharing this with me!

SQL Saturday Atlanta 2024

Hey there happy coders! Last weekend I had the pleasure of speaking at the SQL Saturday Atlanta event in Georgia! It was an awesome time of seeing data friends and getting to make some new ones. If you live near a SQL Saturday event and are looking for a great way to learn new things, I can’t recommend SQL Saturdays enough. They are free and an excellent way to meet people who can help you face challenges in a new way. Below are my notes from the various sessions I attended, as well as the materials from my own session. Enjoy!

Link to the event – https://sqlsaturday.com/2024-04-20-sqlsaturday1072/#schedule

My session – Real-Time Analytics in Fabric

Thank you so much to everyone who came out to see my first ever session on Real-Time Analytics in Fabric! We had a couple of glitches in the Logic App creation, but had an excellent time troubleshooting together as a group. Please check out my GitHub for the slide deck as well as all the code used in the demonstration. Happy coding!

Link to my GitHub – https://github.com/Anytsirk12/DataOnWheels/tree/main/Real-Time%20Analytics%20in%20Fabric

Introduction to SQL Server Essential Concepts by Bradley Ball

Goal is to understand the core things about databases to enable deeper understanding.

ACID = atomicity, consistency, isolation, and durability.
Atomicity = either all of it commits or none of it commits. Consistency = the data must be in a consistent state before a transaction completes. Isolation = a transaction must operate independently from other transactions. Durability = the complex logging that is the transaction log; the log has all the commit history and ensures accurate data.

Transaction isolation levels – serializable, read committed (the SQL Server default), read uncommitted, repeatable read, and snapshot isolation. You can set some of these at the database level. Serializable = blocks anything trying to get data on the same page; you can set this at the transaction level. Read committed = I can’t read data while a transaction is currently occurring. Read uncommitted = a dirty read; it grabs data that isn’t committed. Repeatable read = nobody uses this lol. It’s a shared lock that holds for a longer period of time than the typical microsecond (a shared lock means everyone can read the data). Snapshot isolation = Oracle and Postgres use this; everything is an append and update. If I have 4 transactions, 1 update and 3 reads, usually the update would block the reads, but this redirects readers to a copy of the data in TempDB at the point in time they requested the read (aka before the update commits).

DMV = dynamic management view. Bradley used one that allows you to see active sessions in your database. We can see that a read query is blocked by an uncommitted transaction. We can see the wait_type = LCK_M_S and the blocking_session_id which is our uncommitted transaction. To get around this, he can run the read script and get a dirty read by setting the isolation level to read uncommitted. To unblock the original request, he can use ROLLBACK TRANSACTION to allow it to unlock that page of data.

How does SQL Server work on the inside? We have a relational engine and a storage engine. The user interacts with the SNI, which translates the request for the relational engine: User > SNI > Relational Engine [command parser > optimizer (skipped if the plan is already in the plan cache) > query executor] > Storage Engine [access methods (knows where all the data is) > buffer manager (checks the data cache; if the data isn’t found, it goes to disk and pulls it into the buffer pool data cache)]. This gets extremely complicated for other processes like in-memory OLTP. The SQLOS is what orchestrates all these items.

SQL OS – the operating system uses pre-emptive scheduling, while SQLOS uses cooperative scheduling (and accumulates wait stats to identify why something is running slower).

Locks, latches, waits. Locks are like a stop light (with row, page, and table escalation); locking enough rows escalates the lock to the page. Latches are what watch the locks – the watchmen; a latch is a lock for locks. Waits come from cooperative scheduling: if a query takes too long, it will willingly give up its place in line. That creates a signal wait, which signals there’s too much lined up.

SQL data hierarchy: records are rows. Records live on a data page (8 KB). Extents are 8 pages (64 KB); it’s faster to read extents than pages. Allocation bitmaps are 1s and 0s that signify which pages hold data, enabling even faster reads – a single 8 KB page can govern around 400 GB of data. IAM chains and allocation units allow quick navigation of pages. Every table is divided into in-row data, row-overflow data (rows larger than 8,060 bytes), and LOB data (large objects like VARCHAR(MAX) and images).

Allocation units are made of 3 types:
1. IN_ROW_DATA (also known as HoBTs or Heap or B-Trees)
2. LOB_DATA (also known as LOBs or large object data)
3. ROW_OVERFLOW_DATA (also known as SLOBs, small large object data)

Heaps vs tables. Oracle stores data as a heap, which is super fast to insert. In SQL Server, heaps can have bad read performance compared to clustered indexes, even though inserting new data is fast – it’s very situational. A table is either a heap or a clustered index; it cannot be both. But heaps can have non-clustered indexes.

A B-tree allows you to get to a record with fewer reads by following a logic tree (think: h is before j, so we don’t need to read records after j). Without a clustered index, a heap forces a 100% table scan. Adding the clustered index dropped that significantly – only 1 read instead of 8,000 reads.

Recovery models – full, bulk logged, and simple (on-prem). In the cloud, everything is full by default. Full means everything is backed up. Bulk logged means you can’t recover the bulk-loaded data, but you can rerun the input process. Simple means you can restore a snapshot, but you can’t do any point-in-time restore. Your choice will largely be determined by any SLAs you have.

Transaction log. This will constantly be overwritten. Your log should be at least 2.5 times as large as your largest clustered index. DBCC SQLPERF(logspace) will get you the space available for logs in the various databases. Selecting from the log is not recommended since it creates a lock and logs are always running, so don’t do this in prod lol. Rebuilding indexes will grow your transaction log massively. To free up space in the transaction log, you have to run a log backup operation, which is why those are super important.

Fun tip: when creating a table, you can put DEFAULT 'some value' at the end of a column definition to give it a default value when one is not provided. Pretty cool.

You can use filegroup or piecemeal restores to restore hot data much faster, then go back and restore older, cold data afterward. To restore, you must have zero locks on the database, and while restoring, the database is not online. Note: if you do a filegroup restore, you cannot query data that is in an unrestored filegroup, so queries like SELECT * will not work.

Tales from the field has a ton of YouTube videos on these subjects as well.

Lessons Learned using Fabric Data Factory dataflow by Belinda Allen

What are dataflows? Dataflows are a low-code interface tool for ingesting data from hundreds of data sources and transforming it using 300+ data transformations. The goal is to allow more people to manipulate and use data within your organization. At its heart, it’s Power Query.

Why start with dataflows as a citizen developer? It’s power query and you know that. It’s low-code data transformation. Excellent for migrating Power BI reports to Fabric.

Lots of great discussion about when it makes sense to use a dataflow gen2.

You can copy and paste power query from Power BI by going into the advanced editor OR you can hold shift and select all the queries you want then ctrl c then go to power query for a dataflow gen2 in the online service and hit ctrl v and it will populate with all your tables! Pretty neat. You can also make your relationships within the online portal.

DBA Back to Basics: Getting Started with Performance Tuning by John Sterrett

For the code visit: https://johnsterrett.com/blog-series/sql-server-performance-root-cause-analysis/

Goal of today’s session – arm anyone who is new to performance tuning with the processes and skills to solve common problems.

Basic query runtime. SQL Server has something called wait stats that tells you what caused a query to be slow. When you run a massive query, it will go into a suspended state while it waits on resources, which may require reading from disk instead of from the memory cache (T1). After that, you’re in a runnable state (T2). Finally, you get to run (T3).

Basic bottlenecks = memory, disk, CPU, network, locking blocking & deadlocks. Adding memory is typically the fastest way to improve performance.

Identify performance problems happening right now:

EXEC sp_WhoIsActive. This is an open-source script that gives you insight into who’s running what right now. You can get it from https://whoisactive.com. The cool thing is there are more ways to run it than just EXEC sp_WhoIsActive. Identify what’s consuming the most CPU from the CPU column. There are also some parameters you can use, like @sort_order: EXEC sp_WhoIsActive @sort_order = '[CPU] DESC', @get_task_info = 2. The task info parameter will give more information in a wait_info column. The best command is EXEC sp_WhoIsActive @help = 1, which provides ALL the documentation on what it does. Adam (the creator) also has a 30-day blog series on everything it can do for you! One option to make things run faster is to kill the process causing the issue lol.

How to handle blocking. You can do explicit transactions with BEGIN TRANSACTION, which will lock the table. At the end, you need to either COMMIT or ROLLBACK, or else that lock holds. SQL Server uses pessimistic locking by default, so it won’t let you read data that’s locked – it will simply wait and spin until the lock is removed. You can use EXEC sp_WhoIsActive @get_plans = 1 to get the execution plan. Be careful: the wait_info can be deceptive, since the thing that takes the most time may not be the problem – it may be blocked by something else, so check the blocking_session_id to be sure. Also check the status and open_tran_count to see if something is sleeping and not committed. Keep in mind that the sql_text will only show you the last thing that ran in that session. So if you run a SELECT in the same session (query window) as the original UPDATE script, it won’t be blocked and can run, and THAT query will show up in who is active, which can be super confusing. To resolve this issue, you can use ROLLBACK in that session to drop the UPDATE statement.

To find blocking queries use EXEC sp_whoIsActive @find_block_leaders = 1, @sort_order = '[blocked_session_count] DESC'.

Identifying top offenders over the long term:

There's a feature from SQL Server 2016 forward called Query Store, which persists performance data for you even after you restart the server. It's essentially a black box for SQL Server. Query Store is on by default in SQL Server 2022 and Azure SQL Database, and it's available in Express edition as well. Be sure to triple check it's on, because if you migrated servers it will keep the original settings from the old server. If you right-click on the database, you can navigate to Query Store and turn it on by setting Operation Mode (Requested) to Read Write. Out of the box is pretty good, but you can adjust how often it refreshes and how much history it keeps. To see if it's enabled, you should see Query Store as a folder under the database in SSMS.
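If you prefer T-SQL over the SSMS dialog, Query Store can be turned on and verified like this (substitute your own database name):

```sql
-- Equivalent to setting Operation Mode (Requested) to Read Write in SSMS
ALTER DATABASE [YourDatabase] SET QUERY_STORE = ON;
ALTER DATABASE [YourDatabase] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);

-- Triple check: confirm the actual state, not just the requested one
SELECT actual_state_desc, desired_state_desc
FROM sys.database_query_store_options;
```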

Under Query Store, you can select Top Resource Consuming Queries. There are lots of configuration options, including the time interval and which metric to rank by. SQL Server 2017 and newer also have a Query Wait Statistics report to see what was causing pain; it'll show you which queries were running in the blocking session. You won't get who ran the query from Query Store, but you can log sp_whoisactive output to a table on an automatic loop (very cool). That logging adds overhead on top of your database, so be mindful of it.
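You can also hit the Query Store catalog views directly instead of the SSMS report. A sketch of a top-CPU-consumers query (the metric and TOP count are my choices; swap in reads, duration, etc. as needed):

```sql
-- Top 10 queries by total CPU across the retained Query Store history
SELECT TOP 10
    qt.query_sql_text,
    SUM(rs.count_executions)                    AS executions,
    SUM(rs.avg_cpu_time * rs.count_executions)  AS total_cpu_time
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query      AS q  ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan       AS p  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY qt.query_sql_text
ORDER BY total_cpu_time DESC;
```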

Intro to execution plans:

Keep in mind, SQL Server's goal is to get you a "good enough" plan, not necessarily the best plan. Follow the thick lines – that's where the rows are flowing. The cost percentages show each operator's estimated share of the total plan cost (an estimate, not actual time).

Key lookups. A key lookup means a nonclustered index seek found your rows, but the index doesn't cover every column in your SELECT, so SQL Server goes back to the table (via a nested loops join) to fetch the missing columns for every row. More indexes can also make your SELECT statements worse if the optimizer picks an index that isn't the best one for your query.

Index tuning process.
1. Identify tables in query
2. Identify columns being selected
3. Identify filters (JOIN and WHERE)
4. Find total rows for each table in the query
5. Find selectivity (rows with filter/table rows)
6. Enable statistics io, time, and the actual execution plan
7. Run the query and document your findings
8. Review existing indexes for filters and columns selected
9. Add index for lowest selectivity adding the selected columns as included columns
10. Run the query again and document findings
11. Compare findings with baseline (step 7)
12. Repeat last 5 steps as needed

To see existing indexes, you can run sp_help 'tableName'. In the example, there's an index keyed on OnlineSalesKey, but that field is not used in our filter context (JOINs and WHERE statements) in the query. The order of fields in an index does matter, because the index is searched in that order.

Brent Ozar made a stored procedure you can use called sp_BlitzIndex that will give you a ton of info on the indexes for a table, including stats, usage, and compression. It also includes Create TSQL and Drop TSQL for each index so you can alter the table.
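A typical call, with hypothetical database and table names (sp_BlitzIndex ships in Brent Ozar's free First Responder Kit):

```sql
-- Detailed index diagnostics for one table
EXEC sp_BlitzIndex @DatabaseName = 'YourDatabase',
                   @SchemaName   = 'dbo',
                   @TableName    = 'Sales';
```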

To turn stats on, use SET STATISTICS IO, TIME ON at the beginning of the query. Be sure to also include the actual execution plan (the estimated plan doesn't always match what actually happened). Now we can benchmark. Afterwards, use SET STATISTICS IO OFF and SET STATISTICS TIME OFF. Then create a nonclustered index with our filter context columns.
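The benchmarking loop from the tuning process above, as a sketch with hypothetical table and column names:

```sql
-- 1) Baseline: capture reads and timings (also enable Actual Execution Plan in SSMS)
SET STATISTICS IO, TIME ON;

SELECT OrderDate, Amount       -- selected columns
FROM dbo.Sales                 -- hypothetical table
WHERE CustomerKey = 42;        -- filter context: lowest-selectivity column

-- 2) Nonclustered index on the filter column, with selected columns INCLUDEd
CREATE NONCLUSTERED INDEX IX_Sales_CustomerKey
ON dbo.Sales (CustomerKey)
INCLUDE (OrderDate, Amount);

-- 3) Re-run the query above and compare against the baseline, then turn stats off
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```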