Power BI, Excel, OneLake – Dreams Do Come True!

I can’t believe it’s finally here! A way to have Excel live in OneDrive and access it from Power BI nearly live! We can officially shortcut files into OneLake from both SharePoint and OneDrive! I am super excited about this feature, and I hope you are too. This feature plus User Data Functions allows us to not only have data from Excel in our reports but also keep it as fresh as needed. Imagine having budget allocations that you want to adjust right before or during a meeting. Now you can! You can edit a file in Excel and hit one button to see the new numbers in your report. In the past, we relied on 3rd party services or Power Apps licensing to accomplish this sort of experience. Now we can just use Excel, an old data friend.

Please note, THIS IS IN PREVIEW AND VERY NEW. I’ve included a ton of screenshots, but please be advised these may not be entirely reflective of the GA reality once this feature is released. My example uses a OneDrive folder, but you can easily do this with SharePoint as well! One caveat, you will need a Fabric capacity. This does work with a trial capacity.

  1. Creating the Shortcut in OneLake to a OneDrive Folder
  2. Connecting to the File in Power BI
  3. Creating a Refresh Schedule in Power BI Service
  4. Optional – creating a manual refresh button using Translytical Task Flows
  5. Additional Resources

Creating the Shortcut in OneLake to a OneDrive Folder

1 – Navigate to OneDrive online: https://onedrive.live.com/login

2 – Select the settings gear in the top right corner and select “OneDrive settings”.

3 – On the left-hand panel, select “More Settings”, then scroll all the way down to the Diagnostic Information. From there, copy the OneDrive web URL. This is what we will use in Fabric to make the shortcut.
NOTE – you will need to delete everything in that URL after the “_com” (in the screenshot below, that’s “_layouts/15/onedrive.aspx”).

4 – Navigate to a lakehouse in Fabric where you would like to access the content from.
5 – Hit the “Get data” drop-down, then select “New shortcut”.

6 – Choose the OneDrive option. As of this post, it is currently in Preview and says “OneDrive (Preview)” on the button.

7 – Create a new connection. The Site URL will be what you copied from the OneDrive settings. I recommend renaming the connection to something like “[Name]’s OneDrive” so it’s clear where the data comes from.

8 – Now that we have a connection, you can point the shortcut to any folder in your OneDrive! Please note, it will grab all child folders inside as well as files. Be mindful of what you actually want to share within OneLake. On the bright side, this means you only need one shortcut per folder hierarchy, which makes this much easier if you have files in multiple subfolders you want to share.

9 – Once you have a folder selected, it may give you an option to transform data (may just be me with CSVs and JSON files in my folders lol). You can skip this by hitting “Previous” then “Skip”, or “Revert Changes” at the top. If you want to transform your CSVs to delta tables, simply hit next. I haven’t played around with these auto transformations yet, so if you have notes let me know! Anything with “Auto” and “Preview” scares me, so I stay away until at least the “Preview” is gone haha. Also, be advised that currently this auto-transformation option does not work – your shortcut will simply disappear into the void. Really excited to see where this ends up in the future though!

10 – Hit “Create” and boom! You can now access files from your OneDrive inside OneLake! It may take a couple of browser refreshes and a few seconds, but then you can go to the Files section of your lakehouse and see the files/folders in your shortcut. NOTE – at first, the folder title will show up in your “Tables” section. That means it’s working. Refresh your browser (and give it a couple of minutes) and it will pop up in your Files section. It may flash a share warning at you; don’t worry about that, just give it a bit to load in the right section. The time it takes is directly proportional to the amount/size of the files you’re bringing in. If you look at my screenshot, you can see I have one file that’s over 4 GB. Probably not the best folder to pull in (I should have gone one level deeper to avoid that file, since all I want is in the CSVs folder), but I wanted to see if it could handle it.

If needed, hit the three dots next to your folder and manually move it to the Files section.

You can see the date modified matches what’s in your OneDrive! Now to test the syncing, let’s make a change to the dim_customer.csv.
Original view:

After making the change, it took about 40 seconds to sync from my laptop to OneDrive, and then it was available in Fabric almost instantly (by the time I moved to that tab and refreshed my page, it was there!).

Holy cow! This is a game changer! No longer do folks need to upload files manually using the OneLake explorer (very buggy in my experience). Now you can just shortcut it in and allow your ETL process to always grab the latest version that’s been shortcut to OneLake!!

So that brings us to the next phase, how does this work with Power BI? Can we finally have a “live” experience with Excel file data in Power BI?

Connecting to the File in Power BI

1 – Open up your Power BI file. This should work in both the Power BI Desktop and the web Power BI experience, but my demo will use the Desktop.

2 – Connect to the lakehouse with our files. To connect to the files within a lakehouse, we’ll have to use a custom query, since the main lakehouse connector only allows you to pull tables/views. Thankfully, a connector does already exist; it’s just not in our standard Get Data options. Create a blank query, then use the code below with your workspace and lakehouse IDs. To find the IDs, grab them from the URL (https://app.fabric.microsoft.com/groups/WORKSPACEID/lakehouses/LAKEHOUSEID?experience=fabric-developer).

let
    // Replace these with the IDs from your lakehouse URL
    workspaceID = "YOUR WORKSPACE ID",
    lakehouseID = "YOUR LAKEHOUSE ID",
    // Connect to OneLake, then drill into the target workspace and lakehouse
    Source = Lakehouse.Contents(null){[workspaceId = workspaceID]}[Data]{[lakehouseId = lakehouseID]}[Data],
    // Grab the Files section of the lakehouse (where shortcuts land)
    Files_Folder = Source{[Id = "Files", ItemKind = "Folder"]}[Data]
in
    Files_Folder
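If you'd rather script that ID-grabbing step than eyeball the URL, it can be sketched in a few lines of Python (a hypothetical helper for illustration – the only thing it assumes is the URL pattern shown above):

```python
import re

def extract_fabric_ids(url: str) -> dict:
    """Pull the workspace and lakehouse IDs out of a Fabric lakehouse URL.

    Expects the pattern .../groups/<workspaceId>/lakehouses/<lakehouseId>...
    """
    match = re.search(r"/groups/([^/]+)/lakehouses/([^/?]+)", url)
    if not match:
        raise ValueError("URL does not look like a Fabric lakehouse link")
    return {"workspaceID": match.group(1), "lakehouseID": match.group(2)}

# Example with the placeholder IDs from the URL above:
ids = extract_fabric_ids(
    "https://app.fabric.microsoft.com/groups/WORKSPACEID/lakehouses/LAKEHOUSEID?experience=fabric-developer"
)
print(ids)  # {'workspaceID': 'WORKSPACEID', 'lakehouseID': 'LAKEHOUSEID'}
```

Paste the two values it returns into the workspaceID and lakehouseID steps of the query.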

3 – Navigate to your file by clicking the link under “Data” (likely called “Folder”), then the link under “Content” (also probably called “Folder”), then the link under “Content” that ties to the file you want to open (mine was called “Binary”). Now interact with it like any other file import in Power Query.

CSVs are relatively simple, so I pulled in an Excel file for the screenshots below to show what it looks like if you have formatted tables in your file. It shows BOTH formatted tables and sheets! Pretty awesome!

Creating a Refresh Schedule in Power BI Service

Now all that is already pretty amazing. We no longer need to mess with crazy links from Excel and can access all our data from the same place – OneLake! But how do we refresh it?

1 – Navigate to the semantic model settings.

2 – Under “Data Source Credentials” there will be a new source called “Lakehouse”. Open the “Edit Credentials” link and authenticate using the credentials you want used for the refresh (this can be a service account if needed, but it MUST be an OAuth2 source). Right now, there’s no way to connect other than OAuth2. Not great if you want to use service principals, but service accounts are still an option.

If you don’t need “live” updates, you can stop here. However, if you’re as impatient as I am, then you’ll want a way to trigger a refresh of this data on the fly from within your report. Enter, translytical task flows.

Optional – creating a manual refresh button using Translytical Task Flows

1 – Create a User Data Function item in Fabric. It will be mostly blank, because all we really want is a way to trigger the report refresh. This function will accept a name parameter and return a little message alerting the user that the report refresh has been kicked off. Don’t forget to hit that “Publish” button in the top right corner to actually make this live! The publishing process can take a bit; be sure it finishes publishing before looking for it in Power BI.

Here’s the code I’ll be using:

import datetime
import fabric.functions as fn
import logging

udf = fn.UserDataFunctions()

@udf.function()
def refresh_report(name: str) -> str:
    logging.info('Python UDF trigger function processed a request.')

    return f"Welcome {name}! Your report refresh will kick off at {datetime.datetime.now()}!"

2 – Navigate to Power BI Desktop and add a blank button. You can add any button, but blank looks cleanest and makes it easiest to show people what we want them to do.

3 – Add a Data function action and ensure the “Refresh Report” toggle is on. This is the key functionality we are looking for.

4 – Create a measure to populate the current user’s name automatically.

User = USERNAME()

5 – Add that measure to the button by selecting the little fx option next to name.

6 – Configure your button with some text to let people know what it does.

7 – Publish and test! Enjoy your new button! Keep in mind, this will refresh the WHOLE model. My model is fairly small and quick, so not a huge deal. There’s not currently a way to have it only refresh one table using the UI, but if you want to make a more complex UDF notebook, you can have it refresh ONLY the table that’s been impacted. Talk about powerful.

I will make a follow up blog soon that will cover how to adapt this method to only refresh the table you need, so stay tuned and hit the subscribe button for a ping when new blogs are published!

UPDATE – I looked into ways to only refresh one table, but it requires using client secret credentials and cannot use semantic link, mssparkutils, or a large number of other libraries available in other notebooks in Fabric. I’m hoping this will change long-term, but for now please refer to this blog on how to use the REST API in a standard python notebook to refresh Power BI: https://medium.com/@arvind.g90/refresh-smarter-not-harder-power-bi-automation-with-rest-api-python-63923b37c9a6 .
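For context, the table-level refresh that blog walks through rides on the enhanced refresh flavor of the Power BI REST API, where the request body lists the specific tables to process. Here's a minimal sketch of building that request (the IDs and table name are placeholders, and you'd still need an AAD access token from your client secret credentials to actually send it):

```python
import json

def build_table_refresh_request(group_id: str, dataset_id: str, tables: list) -> tuple:
    """Build the URL and JSON body for an enhanced refresh limited to specific tables."""
    url = (
        f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}"
        f"/datasets/{dataset_id}/refreshes"
    )
    body = {
        "type": "full",  # process data and recalculate
        "objects": [{"table": t} for t in tables],  # limit the refresh scope
    }
    return url, json.dumps(body)

# Placeholder IDs – swap in your own workspace/semantic model IDs
url, body = build_table_refresh_request("GROUP_ID", "DATASET_ID", ["dim_customer"])
# POST this with an "Authorization: Bearer <token>" header, e.g. requests.post(url, data=body, ...)
```

With the `objects` list in place, only the named tables get processed instead of the whole model.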

Additional Resources

FabCon Vienna 2025 Recap

Holy cow FabCon Vienna was incredible! This recap is extremely delayed because I’ve run into some serious decision fatigue about what to include (and my amazing hubby got me a LEGO Death Star to build for my birthday/anniversary/Christmas). So much of it was beyond amazing and I’m extremely grateful for the opportunity to not only attend but speak! Because of that, I’ve decided to include a little mini section about the new releases I’m excited about as well as a little blurb about the hallway track (the best part of any conference in my opinion).

I made so many wonderful new friends and really really enjoyed seeing so many friends over such a short period of time! Big conferences like this always feel like a reunion where we get to pull more people into the family and all learn/get excited about features together. Thank you to the organizers for all the work they did to bring us all together and make such a big conference feel easy to navigate and enjoy.

My Session

Let’s set the stage – it’s the last day of the conference, second to last session, right after lunch. I was very mentally prepared for a half-empty room of food-coma folks whose brains have already been very fried with incredible content for the week. Imagine my surprise as the room filled up immediately after lunch! I have only been speaking since 2022, but I can honestly say that this was by far one of the top 3 attendee groups I’ve ever had. Thank you to all of you who came to learn, asked insightful questions, and actually took me up on my offer to chat after the session!

So what did we learn? We chatted about how to manage data warehouse builds inside of Fabric and covered some tips and tricks from the field. We discussed dealing with case sensitivity, hunting down and killing capacity killers, and how similar warehouses are to lakehouses under the hood. Interested in learning more? Feel free to check out my slide deck and resources from my github: https://github.com/Anytsirk12/DataOnWheels/tree/main/2025%20Presentations/2025%20FabCon%20Vienna

Things I’m Excited About

Want to see the full list? Check it out here: https://blog.fabric.microsoft.com/en-us/blog/september-2025-fabric-feature-summary/

  • MERGE in Data warehouse
  • Workspace collation setting
  • Notebooks can reach mirrored databases!
  • UDFs in DAX
  • Multi-tasking views (preview)
  • User data functions in Fabric GA
  • Materialized views in Lakehouses
  • Custom calendars in DAX
  • TMDL view
  • Variable library GA
  • Fabric MCP and CLI

The Hallway Track

If you’ve never been to a conference in person, this is where the real magic happens. Between sessions, the conversations that spark in passing over coffee cups, shared frustrations, wonderful eureka moments, and spontaneous problem-solving sessions were genuinely the highlight of the conference for me. There’s something special about the unplanned connections and deep dives that happen when you’re surrounded by people who love the same things you do. Those casual chats turned into new friendships, unexpected collaborations, and a dozen ideas I can’t wait to bring home and explore deeper. It’s the warm heartbeat of every great conference, and FabCon Vienna delivered it in the best possible way.

Are you looking for a way to justify the cost of conferences to a boss (or yourself)? The hallway track is where it’s at. For example, a good friend of mine was running into issues with Microsoft stating that people were overloading a REST API and constantly hitting the API limit, but he knew nobody was calling it (or at least not that much). We chatted, and I was able to build a notebook with him to grab and save the activity logs from Power BI into a lakehouse that he could share with Microsoft support as evidence that the call was only made x times a day. We ended up brainstorming throughout the conference, and we were able to prove out to Microsoft what he already knew. Plus, now he has a growing record of all the activities within the tenant he manages – such a powerful tool long-term!

That is just one example of many times I’ve seen or been a part of a major business issue getting resolved at a conference. That type of support from experts in the field who are extremely passionate is absolutely priceless. You also gain an army of folks that you can reach out to and who have materials you can reference if you run into issues in the future. This is how you build your village.

Data Warehouse CoreNote

Bogdan Crivat

Check out this announcement blog for all the details: https://blog.fabric.microsoft.com/en-us/blog/welcome-to-fabric-data-warehouse?ft=Announcements:category

New enterprise security features – private link, customer managed encryption keys, outbound access protection

New – MERGE, varchar(max), UI-based SQL audit logs, JSON file access using openrowset, workspace collation (yay!)

Coming in Oct – Identity!

Migration assistant is GA – migrate from SQL Server or Synapse to Fabric.

Alerts & actions – data driven alerts, monitor data quality, increase productivity (summarize failed data ingestion, long running queries, etc.).

Coming soon (very tbd) – AI functions, clustering, SQL pools, Fabric Functions, faster ingest BCP, and stats refresh

Question – lakehouses have been slow, and we enjoy using Python notebooks. Should we have used DWH? Would it use less capacity or speed things up? Answer – no. The reason we have two is because people have different skills. DWH is better at many concurrent users; Spark is better at data preparation.

No-Code, Low-code, Pro-Code: Unlocking Data Magic with Fabric Dataflows Gen2 by Cristian Angyal

“Small daily improvements over time lead to stunning results” – Robin Sharma

Dataflows Gen2 = Power Query. It fits within the ingest/prep step of the data lifecycle.

Demo = employee training program cost allocation & tracking. Multinational company runs monthly trainings and wants to see cost break downs by department, country, skill, by month over month. The marketing team loads all the data into Excel since they are familiar with it.

Dataflows are ideal for this use case because they can easily connect to Excel files and do a number of manipulations on this smaller dataset. The UI options to manipulate this file work, but they are not very flexible. If a column name changes or someone forgets a space, it will break what the UI just built.

We can make the solution much more robust and future-proof by using parameters for the folder paths. You can also directly split columns into rows, instead of splitting into columns and then unpivoting (see under “Advanced” settings in the split column UI). By default, split by delimiter will hard code the number of columns to create; be aware that this will not work well for user-entered data. So what about the column names? What if someone accidentally adds a space to the end of a column header? Let’s code for that. Instead of combining files from a folder, use “Create”, then create a new column with Excel.Workbook([Content], true), which will also allow us to see the data in the file. From there we slowly replace the hard coding. For example, the drill down into a file hard codes the sheet name; use {0} instead to go down to the first sheet no matter the name. Then we created a function that lets us pass in the custom column we made that contains the actual table. This gives us the option to load more of our steps into the function itself and apply them to any number of tables.

To get the column names, we can use the Table.ColumnNames() function, then plug those values into the next step instead of allowing Power Query to hard code them. List.Select(Table.ColumnNames(TableName), each _ <> "Fixed column name") will let you exclude columns that we don’t want to split in this process. List.Accumulate will allow you to take that list of column names and apply an “accumulator” to a table, where you can pass in a function to split each column in that list by a delimiter into new rows. Super cool! These are my kind of crazy notes – if this sounds epic but you can’t follow my notes, reach out to Cristian! He’s a Power Query wizard and loves helping.
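If the M is hard to picture, here's a rough Python analogue of that List.Accumulate pattern (a sketch with made-up data, not Cristian's actual code): take the list of dynamic column names and fold over it, exploding each delimited column into new rows.

```python
from functools import reduce

def split_column_to_rows(rows, column, delimiter=";"):
    """Explode one delimited column into multiple rows
    (like Power Query's split-by-delimiter into rows)."""
    out = []
    for row in rows:
        for part in str(row[column]).split(delimiter):
            out.append({**row, column: part.strip()})
    return out

data = [{"Department": "Sales", "Trainings": "Excel;Power BI"}]
fixed = {"Department"}  # columns we never split (what List.Select filters out)
dynamic = [c for c in data[0] if c not in fixed]

# List.Accumulate in M is roughly functools.reduce in Python
result = reduce(lambda rows, col: split_column_to_rows(rows, col), dynamic, data)
print(result)
# [{'Department': 'Sales', 'Trainings': 'Excel'},
#  {'Department': 'Sales', 'Trainings': 'Power BI'}]
```

The accumulator keeps re-applying the split function for every column in the list, so the logic survives users adding or renaming columns.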

PowerQuery.How is also an extremely valuable resource. It’s a custom GPT that Cristian created and opened to all: https://bit.ly/PQMagic

linktr.ee/cristiangyal

Semantic Model Optimization for Enterprise AI Enablement by Samson Truong

Goal of AI is to enable enterprise to answer business questions quickly, but it lacks business context and often runs into issues interpreting organizational knowledge like custom fiscal calendars, custom KPIs, and often returns generic answers.

Use star schemas for simplicity and performance. This allows Copilot to easily interpret business units and metrics. Creating explicit relationships (rather than relying on TREATAS or USERELATIONSHIP) allows Copilot to easily navigate your model. Strong relationships guide Copilot and DAX logic.

Well-structured DAX makes a difference! Use easy to explain DAX or add comments, name intuitively, and predefine key metrics. Additionally, add in some descriptions. You can even leverage AI to generate these and simply edit the more complex use cases. Copilot only reads the first 200 characters, so lead with that.

Meaningful hierarchies can allow Copilot to dive deeper and create drill downs in its responses.

Use data value standardization. For example, use High, Low, and Medium instead of High/Hi/1.
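That standardization step is easy to script before the data ever hits the model. A tiny sketch (the mapping values are illustrative, not from the session):

```python
# Map the messy variants users actually type to one canonical label
PRIORITY_MAP = {
    "high": "High", "hi": "High", "1": "High",
    "medium": "Medium", "med": "Medium", "2": "Medium",
    "low": "Low", "lo": "Low", "3": "Low",
}

def standardize(value: str) -> str:
    """Return the canonical label, or the original value if unknown."""
    return PRIORITY_MAP.get(str(value).strip().lower(), value)

print([standardize(v) for v in ["Hi", "1", "med", "LOW"]])
# ['High', 'High', 'Medium', 'Low']
```

A consistent set of labels means Copilot (and your users) only ever see one spelling of each value.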

Define and label KPIs. Avoid making users ask for common metrics.

RLS is EXTREMELY important in the world of AI. This allows copilot to be helpful without risking security.

Tools to optimize your model for AI = best practice analyzer, Prep data for AI, AI data schemas, verified answers, AI instructions. These are all baked into Fabric, which is great! Pretty cool to see the BPA run from a Python notebook in Fabric – it has a very clean output. There’s also a memory optimizer that can highlight columns that aren’t being used but take up a bunch of space.

Data Saturday Columbus 2025 Recap

This weekend was one of my favorite Data Saturday events, Data Saturday Columbus! If you’re in the area, I highly recommend it. Always some great speakers, amazing lunch, and super engaged folks hungry to learn. Thank you to everyone who came out for an 8:30 AM session on Decoding Fabric Licensing, the questions were great and I really loved all the conversations after the session! Also, hope you all enjoyed the peanut butter cookies 🙂

Event link: https://datasaturdays.com/Event/20250816-datasaturday0071

My presentation: https://github.com/Anytsirk12/DataOnWheels/tree/main/2025%20Presentations/2025%20Data%20Saturday%20Columbus

Lenore Flower, myself, Jason Romans from SML!
Sheila Romans, myself, Lenore Flower, Jason Romans, Tony Wilhelm

Power BI Directory: Enhancing Data Navigation and User Experience by Miranda Grant

Pretty awesome solution for a directory app that leverages Power Automate to manage requests for access and show folks what they have access to inside a Power BI environment.

The built-in access request is insufficient. Using Power Automate, it can pick up this email as a trigger, then send a custom response directing folks to instructions and details on how to get access to apps/reports, plus let them see what they already have access to.

Fun tip – workspace descriptions have no character limit, but app descriptions only allow 200 characters.

Build for your users who hate computers. The goal is to give them as many ways to navigate/understand as possible.

Effortless Data Transformation with Microsoft Fabric by Chris Schmidt

I really enjoyed this session on Real-Time Intelligence in Microsoft Fabric. What stood out to me was how simple it can be to transform and analyze data as it comes in without having to build out complicated pipelines or deal with fragile orchestration. Instead, Fabric can handle those transformations automatically, which feels like a big time-saver.

We also spent some time diving into KQL (Kusto Query Language). I had heard of it before, but this gave me a much clearer picture of how powerful it can be for real-time analytics. Seeing the practical examples and use cases across different industries helped me connect the dots and inspired me with some new ideas for my session for PASS about RTI!

Side Quest – Origami with Tabitha Shore

There were a few sessions late in the day I had planned to attend, but instead had an awesome side quest in what we call the “Hallway Track” with Tabitha, Peter Shore’s daughter. We’ve gotten to know each other over the past couple of years as she volunteers at Data Saturday Columbus annually. She was gracious enough to teach me how to make “little dudes”, the cutest animal that you can draw your own face on! We had a blast making mini ones and hiding them around the conference and I now have one in my office so I never forget how to make them 🙂 Highly recommend learning something that’s not necessarily technical whenever possible, you never know who you may get to know! Always good to remember that while technology changes every day, having solid human connections and friends will never go out of style.

DevUp 2025 Recap

Had an incredible time at DevUp in St. Louis this weekend! This conference gets better and better each year and draws in some incredible folks from all over the country. Huge shout out to the organizers for planning such a fun event and facilitating an attendee party where we all got to make some great new friends (and even see a few magic tricks!). I’m honored they asked me to give two sessions this year, thank you to everyone who attended and asked thought provoking questions that really took the sessions to a new level. Hope we can stay connected and learn more from each other in the future! Below are some notes from a few sessions I attended or heard about from other attendees that sounded amazing. I also included a link to my github with the slides and resources used in the sessions for future reference.

https://www.devupconf.org/sessions

Link to my github: https://github.com/Anytsirk12/DataOnWheels/tree/main/2025%20Presentations/2025%20DevUp

It was also awesome to hang out with another member of the P3 tribe, James Bartlett! We always have a blast and he had some awesome sessions, so check him out if you attend a conference he’s at! But also, check out the Drill Down podcast (https://www.linkedin.com/events/7285420982427693057/). Highly recommend tuning in to their bi-weekly pod for some top notch content and speakers 🙂

How to Navigate Change with Confidence by Alain Hunkins

alainhunkins.com/devup

Industrial Age: human resources
Digital Age: human beings

The goal is to crack the code of human behavior to help humans become great. The key is ritual.

Ritual = shared meaning + repetition + emotion

You manage processes and projects, you lead people.

Leading humans is all about collaboration, communication, and connection.

Your brain has a negativity bias.

Every behavior of a person is a lagging indicator of the behavior of their leader.

Trust = credibility (do what you say) + connection (show up real) + care (act like others matter, fake it until you make it if needed)

As leaders, build rituals around credibility, connection, and care. Small actions have big impact.

Ritual <> Routine (habitual action with no emotion or meaning)

Ritual design elements = purpose, cadence, format, leadership signal, test & adjust

3-Question Check-in is a great tool. How are you feeling? What is on your mind? How can I support you?

Psychological safety is key to a great team. What does that entail? Equal airtime and leaders modeling vulnerability.

End meetings by asking for receipts – what were the major action items?

How do we start? Try the 5-minute ritual challenge – pick a stupidly small ritual, scale it, and try it for 30 days. There are a lot of rituals you can implement. Which ritual is best? The one you actually use. Some options are in the agile leader toolkit: https://alainhunkins.com/devup_agile/

Build a Fabric Real-Time Intelligence Solution in a Day by Alex Powers

Very hands on workshop, highly recommend attending one of these if you get a chance! It was a 90 minute demo where you build your own RTI solution end-to-end. Pretty incredible!

Power BI Governance – All-In-One Solution for Impact Analysis & Backups by Chris Cusumano

Awesome solution to view your Power BI estate without being a tenant admin. Easily identify what will break if a measure changes down to the visual level. I’m super bummed I didn’t get to attend this one, but we chatted about it together after lunch and WOW it blew me away! Check it out and let me know what you think!

https://github.com/chris1642/Power-BI-Backup-Impact-Analysis-Governance-Solution

https://chris1642.github.io/Power_BI_Governance_Presentation/

Seamless, open and intelligent integrations with Microsoft Fabric by Alex Powers

Alex expertly walked through easy ways to utilize Fabric as an application developer. Honestly, it blew me away to see the demos of using eventhouses, translytical task flows, workflows, and AI features throughout Fabric as a backend for an application. This is definitely a space to watch as Fabric matures in a world of AI and agents.

Report Building at Light Speed – Wireframes & Themes by Mike Carlo

This is a session I desperately wanted to attend, but couldn’t make it since I was speaking at the same time. I’m hoping Mike posts some of the resources from this talk soon (keep an eye out on PowerBI.tips resources!) because just our discussions about this were incredible. If you’re looking to level up your report design, I cannot recommend it enough.

Here’s the abstract for more details:

As you think about building Power BI reports, there is so much time needed to stylize and create a stunning looking report. Wouldn’t it be nice to build reports faster with more precision?

This is a fun demo-heavy session about how to create wireframes of Power BI reports with themed properties. The world’s best theme generator has evolved to now build entire reports!

Come watch how easy it is to build reports using Wireframes.

Topics covered within this session
– Basics of the report building process
– Basics of creating Scrims “Backgrounds” for your report
– Rapid prototyping of reports using Wireframes

SQL Saturday Baton Rouge 2025

SQL Saturday Baton Rouge is a great event hosted on LSU’s campus that has some of the best food, presenters, and attendees in the country! Had some incredible questions for my session and learned a ton throughout the conference. Thank you to all the sponsors and volunteers who make this event happen year after year (16 years running!). It’s one of the first events I ever spoke at (https://dataonwheels.wordpress.com/2022/08/23/sql-saturday-baton-rogue-recap/), and I’m so excited I was able to speak here again!

Here’s a link to the event where you can find other incredible speakers and sessions! Pictured below are some of my SML (Saturday Morning Learning) buddies – Dominick Raimato, Jason Romans, Kevin Pereira, and me! Thanks Sheila Romans for the great picture of us nerds!

My session – Mastering Microsoft Fabric Data Warehouses Tips & Tricks

Thanks to everyone who came out to my session and came with me on a journey to kill Pippin’s long-running query, create case insensitive warehouses, and monitor our warehouse and lakehouse queries on a single pane of glass. You guys were such an engaged audience and I loved being able to laugh and commiserate about our SQL experiences together!

You can find the presentation and resources here – https://github.com/Anytsirk12/DataOnWheels/tree/main/2025%20Presentations/2025%20SQL%20Saturday%20Baton%20Rouge

Jason Romans – How to Diagnose a Slow Power BI Report

Jason gave an awesome session about how to use external tools to diagnose and even proactively solve a slow Power BI report. Check out his session live if you get a chance – lots of great slides and demos! Below is the abstract from his session; be sure to follow the DAX Shepherd.
https://thedaxshepherd.com/presentations/

Users complain that a Power BI report is so slow that it is not worth using. A Power BI report may not perform well for many reasons, but how do you know where to start? In this session, I will show you some of the main tools I use to dive deep into the common and the not-so-common performance issues I run into. Starting from a slow report, we will learn how to diagnose where the problem is. Beginning with Performance Analyzer in Power BI, we can start to figure out where the issue is. Is it the visual that is slow, or are the DAX queries taking too long?

Next, we will use DAX Studio to tune the DAX queries that are being generated. Along the way, we will look beneath the visuals to see how Power BI is architected so we can better understand how to write better code from the start. We will examine Query Plans and Server Timings to evaluate which of our changes are more beneficial.

Lastly, we will examine methods for detecting and preventing performance issues before they occur. Using Tabular Editor, we will learn how Best Practice Analyzer can detect performance issues with your semantic models.

Kevin Feasel – Applying Forensic Accounting Techniques using SQL and Python

Below are my stream-of-consciousness notes from this session; it was a ton of fun to think through forensically investigating a SQL environment using Python. The session was packed with great examples and was an absolute blast for playing detective. Kevin runs Curated SQL, which is an awesome site for getting a “curated” list of recent blogs with incredible insights from across the data community.
https://curatedsql.com/

Forensic accounting deals primarily with investigation and litigation. The goal is to connect the dots on potential fraud in a way that will stand up in court.

Step 1 – Summary Analysis

  1. Find the largest table in terms of records or size
  2. For that largest table, review data types and keys
  3. Look at the foreign keys off that largest table and repeat the same analysis over and over

You end up with a database model, which gives you a high-level view of the database.
There will be some Python notebooks available in the future that walk through how to explore a sample database.

Pandas has a command, df.describe(include='all'), that will give you stats on a dataframe. It’s a pretty neat way to easily get some info, especially on numeric values. Note – you should convert the date fields to actual dates within the dataframe (they will likely default to objects); then you’ll get similar stats for the date columns.
Pandas also has good ways to easily group, count, and sort (you can also do this in SQL) so you can quickly see the top artifacts within certain categories. One example is the number of vendors within certain expense categories. Maybe only one vendor is selling cleaning supplies. There may be a reason for it, but in an auditing scenario, that is a good place to start looking for preferential treatment and fraud.
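To make those two ideas concrete, here’s a minimal sketch with a made-up expense ledger (the column names, vendors, and amounts are all hypothetical, not from the session):

```python
import pandas as pd

# Hypothetical expense ledger -- vendors and amounts are illustrative only.
df = pd.DataFrame({
    "vendor":   ["Acme", "Acme", "Bubba Co", "Acme", "Delta LLC"],
    "category": ["Cleaning", "Cleaning", "Office", "Cleaning", "Office"],
    "amount":   [120.0, 95.5, 40.0, 110.0, 60.0],
    "paid_on":  ["2024-01-05", "2024-02-03", "2024-02-10", "2024-03-07", "2024-03-20"],
})

# Dates usually load as object dtype; convert them so describe() reports
# useful stats (min, max, etc.) for the date column too.
df["paid_on"] = pd.to_datetime(df["paid_on"])
print(df.describe(include="all"))

# How many distinct vendors bill each category? A category served by exactly
# one vendor is a good place to start looking for preferential treatment.
vendors_per_category = df.groupby("category")["vendor"].nunique().sort_values()
print(vendors_per_category)
```

Here only one vendor ever bills “Cleaning,” which is exactly the kind of pattern the session called out as a starting point for an audit.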

Step 2 – Growth Analysis

This looks at changes in ratios over time. The goal is to look for major changes that appear far outside the norm. Typically this is performed against important additive measures like revenue, cost, number of customers, number of orders, and even active users as a percent of total users (a “fixed cap” ratio).

We can perform an eyeball test at this phase, and we can also do more comprehensive analysis using KS (Kolmogorov–Smirnov) and PPS tests. The seaborn library in Python lets you easily create line plots and other visuals to see how additive measures perform over time. This allows you to investigate and gut-check growth over time, which can be particularly interesting if you expect a linear increase or decrease but see something more random.

MinMaxScaler, inside sklearn’s preprocessing module, can rescale everything to a range from 0 to 1 so you can truly compare the growth of two very differently scaled items.
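A quick sketch of that rescaling trick, using made-up revenue and order-count series that live on wildly different scales:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two measures on very different scales -- values are invented for illustration.
revenue = np.array([[1_000_000.0], [1_500_000.0], [3_000_000.0]])
orders  = np.array([[120.0], [150.0], [135.0]])

# MinMaxScaler maps each column onto the 0-1 range:
# scaled = (x - min) / (max - min)
revenue_scaled = MinMaxScaler().fit_transform(revenue).ravel()
orders_scaled  = MinMaxScaler().fit_transform(orders).ravel()

# Both series are now on 0-1, so their growth *shapes* compare directly.
print(revenue_scaled.tolist())  # [0.0, 0.25, 1.0]
print(orders_scaled.tolist())   # [0.0, 1.0, 0.5]
```

Plotting the two scaled series on one chart (e.g. with seaborn’s lineplot) makes a divergence in growth patterns obvious even when the raw magnitudes differ by four orders of magnitude.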

Gap analysis – sometimes the lack of a value can be more interesting. It has two meanings: a review of controls (where things can go wrong) and a look at what data is missing.
Controls – do people have access to edit logs? Who can do what? Does data change, and do we know when it does? Are we tracking failures? Double-check calculations on reports.
Gaps and Islands – gaps in the data. Gaps are elements in a sequence that are missing. Islands are contiguous sets of data up to, and not including, a gap.
Accountants care about sequences that should be complete, like receipt numbers, purchase orders, check numbers, and invoice numbers. For example, if check 1003 is missing, your accountant wants to see the voided check.

How do you solve gaps and islands? Use DENSE_RANK() to build an ordering, then calculate the difference between the actual value and the DENSE_RANK() to find where each gap pops up and each island gets created.
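The session used SQL’s DENSE_RANK() for this, but the same value-minus-rank trick works anywhere. Here’s a sketch in plain Python using the check-number example (check numbers are made up): over a sorted, de-duplicated sequence, DENSE_RANK() is just the 1-based position, so the difference stays constant within an island and jumps at each gap.

```python
from itertools import groupby

# Check numbers that should form an unbroken sequence; 1003 and 1006 are missing.
checks = [1001, 1002, 1004, 1005, 1007]

# (value - rank) is constant within an island and changes at every gap.
ordered = sorted(set(checks))
keyed = [(value - rank, value) for rank, value in enumerate(ordered, start=1)]

# Group on that constant to recover the islands...
islands = [[v for _, v in grp] for _, grp in groupby(keyed, key=lambda kv: kv[0])]
print(islands)  # [[1001, 1002], [1004, 1005], [1007]]

# ...and the gaps are everything between consecutive islands.
gaps = [(a[-1] + 1, b[0] - 1) for a, b in zip(islands, islands[1:])]
print(gaps)  # [(1003, 1003), (1006, 1006)]
```

Those gap ranges (1003 and 1006 here) are exactly the check numbers your accountant will want voided copies of.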

A warning with gaps – SQL will typically have gaps in IDENTITY columns. This happens often with rollbacks, where an identity value gets burned; the same applies to SEQUENCE. Building a gapless sequence is a huge hit on capacity, which is why it’s not the default: it forces single-row insertion and requires a serializable isolation level.

Regression Analysis

A regression contains a dependent variable, the thing we want to explain; for a count of invoices per month, the count of invoices is the dependent variable. The dependent variable must be a continuous, countable feature, and there can only be one dependent variable in a regression (multivariate multiple regression is a thing, but it’s really a series of regressions). We also have one or more independent variables that we think will explain the dependent variable (time, people, customer, location, etc.). We assume the independent variables drive the dependent variable but not each other. The data should ideally have a good amount of variety in the independent variables; a rule of thumb is 30-100 data points per combination of independent variables, and more data is better. We also assume that past behavior is similar to future behavior.

In Python, from sklearn.linear_model import LinearRegression and from sklearn.metrics import mean_squared_error make conducting a linear regression super easy to implement. You need to fit the model to see how the dependent variable is impacted by the provided independent variables; if you use statsmodels, sm_model.summary() will give you the OLS regression results. Be sure to use what you learned from your gut-check line charts. For example, if there’s a significant deviation from the line in a certain year, try building the linear regression for the periods before and during that significant difference and compare it to what you get after. This may also help you find the true outliers that led to the craziness overall.
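A minimal sketch of that fit-then-inspect loop, using an invented monthly invoice series (the numbers are made up and deliberately near-linear):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data: month index vs. invoice counts (invented, roughly linear).
months = np.arange(1, 13).reshape(-1, 1)
invoices = np.array([102, 108, 115, 119, 126, 131, 138, 142, 150, 155, 161, 168])

# Fit the model: how does the dependent variable (invoices)
# respond to the independent variable (month)?
model = LinearRegression().fit(months, invoices)
predicted = model.predict(months)

print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
print(f"MSE={mean_squared_error(invoices, predicted):.2f}")

# A residual far larger than the rest flags a month worth a closer look --
# the same outliers your gut-check line chart hinted at.
residuals = invoices - predicted
print(residuals.round(2))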

Cohort Analysis – looking at top-level aggregates, or slicing by a single dimension, can help identify problems. Differences should be explainable by domain experts.

  • Duplication of entries
  • Frequency of one thing versus another
  • Changes in frequency over time
  • Shape of the numeric distribution

Intuition on duplication

  • People don’t like to perform more work than necessary
  • When given an option between more work and less work, people will generally choose less work
  • Duplication is usually considered more work
  • If the low-work option is easy to understand, people are more likely to gravitate to it, ceteris paribus

These are not rules of nature, just intuitions.

Digit analysis – you should expect a uniform distribution of the last digit in a sequence. There are exceptions: budgets are usually rounded off and end with 5s or 0s, estimates in general are also rounded to 5 or 0, and less precise measurements often end in 0.

When people try to fake data, they tend to go with “more random” numbers like 2, 3, 7, and 8. Benford’s law – unlike the final digit, the first digit of a sequence often follows a power law: 1 is more likely to occur than 2, which is more likely than 3, and so on. It works best when the data spans 2-3 orders of magnitude or more. Pretty neat; definitely look up Benford’s law.
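Benford’s expected first-digit frequencies come straight from the formula P(d) = log10(1 + 1/d). Here’s a sketch comparing that expectation against a hypothetical (invented) list of invoice amounts:

```python
import math
from collections import Counter

# Benford's law: P(first digit = d) = log10(1 + 1/d). Digit 1 gets ~30.1%.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
print({d: round(p, 3) for d, p in benford.items()})  # 1 -> 0.301, 2 -> 0.176, ...

# Hypothetical invoice amounts spanning a few orders of magnitude (made up).
amounts = [12, 18, 105, 134, 19, 2450, 1320, 41, 163, 22, 7, 118, 29, 1044, 36]

first_digits = Counter(int(str(abs(a))[0]) for a in amounts)

n = len(amounts)
for d in range(1, 10):
    observed = first_digits.get(d, 0) / n
    print(f"digit {d}: observed {observed:.2f} vs Benford {benford[d]:.2f}")
```

With real data you’d want far more rows, and a chi-squared test rather than eyeballing, but large deviations from the Benford curve (especially a suspicious pile-up on 2, 3, 7, and 8) are a classic fraud flag.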

Python library is called beautiful soup for this. Pretty cool.

Keep records of your findings. As you dig, you will want to have that data readily available for future reference.

Jonathan Stewart – Building scalable data warehouses in Fabric

Jonathan walked through the basics of building a data warehouse that can easily grow without taking down capacity. He covered the best data integration options, star schema principles, and some great data modeling best practices, with a focus on column selection, JOIN, and aggregation optimization techniques.

V-Ordering – when should you use it? It’s enabled by default for read optimization on the data warehouse but off by default for lakehouses to prioritize data timeliness. Consider it for:

  • Direct Lake semantic models
  • Scenarios where user experience is a priority over data timeliness

Size optimization strategy: choose the smallest data type that supports your data requirements. Plan for scaling, but realistically. A column that only ever holds yes/no will never need more than 3 characters, so it can be a VARCHAR(3).

His abstract:

Learn how to design and implement scalable data warehouses using Microsoft Fabric. This session will cover best practices for data storage, management, and retrieval, as well as the use of Fabric’s data warehousing features. Attendees will gain insights into creating efficient data pipelines, optimizing performance, and ensuring data consistency.