Using Lakehouse data at scale with Power BI
Featuring Power BI Direct Lake mode!
Stijn Wynants
Benni De Jagere
Benni De Jagere
Senior Program Manager | Fabric Customer Advisory Team (Fabric CAT)
.be Member
@BenniDeJagere
/bennidejagere
#SayNoToPieCharts
Stijn Wynants
Senior Customer Engineer | FastTrack Engineering
.be Member
@SQLStijn
/stijn-wynants-ba528660/
/Stijn-wynants
Fabric Espresso
#OneMore?
Disclaimer: We’re not benchmarking
Session Objectives
Introduce Fabric and OneLake
Set the scene for Direct Lake
Take it for a spin.. ☺
Introducing Fabric
Microsoft Fabric
The unified data platform for the era of AI
[Diagram: Fabric workloads on OneLake: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, Data Activator, and Power BI]
OneLake
One Copy for all computes
Real separation of compute and storage
All the compute engines store their data automatically in OneLake
The data is stored in a single common format
Delta – Parquet, an open standards format, is the storage format for all tabular data in Analytics vNext
Once data is stored in the lake, it is directly accessible by all the engines without needing any import/export
All the compute engines have been fully optimized to work with Delta – Parquet as their native format
A shared universal security model is enforced across all the engines
[Diagram: compute services (Spark, T-SQL, Serverless, KQL, Analysis) over shared Delta – Parquet stores (Customer 360, Service Telemetry, Business KPIs, Finance) in OneLake]
“Direct Query Mode”
[Diagram: Power BI reports send DAX queries to the Analysis Services layer, which sends SQL queries to the Warehouse/Lakehouse; table scans run against the database files in storage]
Slow, but real time
“Import Mode”
[Diagram: data is imported from the Warehouse/Lakehouse into Analysis Services, which keeps a copy of the tables; DAX queries from Power BI reports scan that in-memory copy]
Latent & duplicative, but fast
“Direct Lake Mode”
[Diagram: DAX queries from Power BI reports go to Analysis Services, which scans the Parquet/Delta Lake files of the Warehouse/Lakehouse directly in OneLake; no copy of the tables, no SQL round trip]
Perfect!
Why Delta?
Why Delta (Parquet)?
Open Standard for data format
Column oriented, efficient data storage and retrieval
Efficient Data Compression and Encoding
Becoming the Industry Standard
Well suited for pruning (column, row group)
Thrives on bulk operations
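A toy sketch of why columnar layout plus per-row-group statistics makes pruning cheap. This is plain Python, not real Parquet internals; the `Value` column and the min/max stats layout are made up for illustration:

```python
# Two "row groups", each storing a column with its min/max statistics.
row_groups = [
    {"Value": [10, 15, 12], "stats": {"Value": (10, 15)}},
    {"Value": [90, 95, 99], "stats": {"Value": (90, 99)}},
]

def scan_value_gt(row_groups, threshold):
    """Skip whole row groups whose max can never satisfy the predicate."""
    hits, groups_read = [], 0
    for rg in row_groups:
        lo, hi = rg["stats"]["Value"]
        if hi <= threshold:          # prune: no row in this group qualifies
            continue
        groups_read += 1             # only now do we pay for the scan
        hits.extend(v for v in rg["Value"] if v > threshold)
    return hits, groups_read

hits, groups_read = scan_value_gt(row_groups, 50)
print(hits, groups_read)  # [90, 95, 99] 1 — only one row group was read
```

The same idea scales to real workloads: a filter on one column reads only that column's chunks, and only in the row groups whose statistics overlap the predicate.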
Inside Delta (Parquet)
Header:
RowGroup1:
StoreID: StoreA, StoreA, StoreA
DateTime: 2023-01-01, 2023-01-02, 2023-01-03
ProductID: SKU001, SKU001, SKU001
Value: 10, 15, 12
RowGroup2:
….
Footer:
Inside Delta (Parquet) – Dictionary IDs
Header:
RowGroup1:
StoreID: 1, 1, 1
DateTime: 1, 2, 3
ProductID: 1, 1, 1
Value: 1, 2, 3
RowGroup2:
….
Footer:
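The dictionary-ID slide above can be sketched in a few lines of plain Python. This is the general idea of Parquet-style dictionary encoding, not the actual file layout; note the slide shows 1-based IDs while this sketch uses 0-based:

```python
# Encode a column as (dictionary of distinct values, compact integer IDs).
def dictionary_encode(values):
    dictionary, ids = [], []
    index = {}                      # value -> dictionary position
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        ids.append(index[v])
    return dictionary, ids

store_ids = ["StoreA", "StoreA", "StoreA"]
dates = ["2023-01-01", "2023-01-02", "2023-01-03"]

print(dictionary_encode(store_ids))  # (['StoreA'], [0, 0, 0])
print(dictionary_encode(dates))
```

Low-cardinality columns like `StoreID` compress extremely well: one dictionary entry plus tiny repeated IDs, which downstream run-length encoding shrinks further.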
Introducing V-Ordering
Write-time optimization of Parquet files
Sorting, row group distribution, dictionary encoding, and compression (shuffling)
Complies with the open standard
Z-Order, compaction, vacuum, time travel, etc. are
orthogonal to V-Order
V-ordering in action
Microsoft Internal DB (162 tables)
CSV: 880 GB → Parquet: 268 GB → V-Order: 84 GB (x3.2 vs plain Parquet)
Reduced IO for workloads
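A quick check of the arithmetic behind the slide's numbers (the x3.2 matches the Parquet-to-V-Order step):

```python
# Sizes from the slide, in GB.
csv, parquet, v_order = 880, 268, 84

print(round(parquet / v_order, 1))  # 3.2  (V-Order vs plain Parquet)
print(round(csv / v_order, 1))      # 10.5 (V-Order vs the original CSV)
```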
V-ordering in our demo case
STOP! Demo time!
Using Direct Lake mode over a Lakehouse
DirectLake Mode
On start, no data is loaded in-memory
Column data is transcoded from Parquet files when queried
Multi-column tables can have a mix of transcoded (resident) and non-resident columns
Column data can get evicted over time
Direct Lake falls back to DirectQuery (SQL) for suitable sub-queries
“Framing” of the dataset determines what gets loaded from Delta Lake
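The resident/non-resident/evicted lifecycle above behaves like a column cache. A toy sketch (not the real engine, and `ColumnCache` is an invented name) using a least-recently-used policy for eviction:

```python
from collections import OrderedDict

class ColumnCache:
    """Columns become memory-resident only when a query touches them;
    under memory pressure the least recently used column is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity        # max resident columns
        self.resident = OrderedDict()   # column name -> transcoded data

    def get(self, name, load):
        if name in self.resident:                    # already resident
            self.resident.move_to_end(name)
        else:
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)    # evict LRU column
            self.resident[name] = load(name)         # "transcode" on demand
        return self.resident[name]

cache = ColumnCache(capacity=2)
load = lambda name: f"data({name})"
cache.get("Duration", load)
cache.get("Station", load)
cache.get("Bike", load)          # capacity hit: "Duration" is evicted
print(list(cache.resident))      # ['Station', 'Bike']
```

A model that only ever queries a few columns therefore stays cheap, regardless of how wide the underlying Delta table is.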
DQ Fallback
[Diagram: a DAX/MDX query hits the Direct Lake dataset; if no fallback is needed, Verti-Scan transcodes the required columns (Trips, Bike, Station, Duration) on demand from the Delta Lake Parquet files (Trips001.parquet, Trips002.parquet, Trips003.parquet, DimBike001.parquet); otherwise the query is sent out as a DirectQuery (DQ) trip]
What is framing?
A “point in time” way of tracking what data can be queried by Direct Lake
Why is this important?
Delta Lake data is transient for many reasons
ETL Process
Ingest data into Delta Lake tables
Transform as needed using your preferred tool
When ready, perform a Framing operation on the dataset
Framing is near instant and acts like a cursor
Determines the set of .parquet files to use/ignore for transcoding operations
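The cursor-like behavior can be sketched as a snapshot of the table's file list. This is a conceptual toy, not the real API; `frame` and the file names are illustrative:

```python
# The Delta table's current set of Parquet files.
delta_table_files = ["Trips001.parquet", "Trips002.parquet"]

def frame(table_files):
    """Near-instant: just pin a point-in-time list of files that
    transcoding operations are allowed to read."""
    return tuple(table_files)

frozen = frame(delta_table_files)

# New data lands in the Delta table after framing...
delta_table_files.append("Trips003.parquet")

print(frozen)             # still only the two files from framing time
print(delta_table_files)  # the table itself already has three
```

Until the next framing operation (refresh), queries see the pinned file set, which is what makes Direct Lake results stable while ETL keeps writing.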
Framing
[Diagram: source data lands in the Delta Lake (ADLS Parquet files) in batches 1,2,3 then 4,5,6 then 7,8,9; each Full Refresh (framing) advances the dataset's view, so after Full Refresh 3, EVALUATE 'Table' returns values 1 through 9]
STOP! Demo time!
Let’s look at Framing
Optimizing Delta for Direct Lake mode
Optimizing Delta for Direct Lake mode
• V-Order makes a big difference, as it’s tailored for Verti-Scan
• Direct Lake will work over Shortcuts to external data
Expect a performance impact, because reasons ..
• Direct Lake thrives on fewer, larger .parquet files
Physical structure will always be crucial
OPTIMIZE (bin-compaction) and VACUUM in the Data Engineering process will be key
Especially with streaming/small batch architectures, keep this in mind
• Principle of lean models will still apply
Only include what’s needed for the reports and datasets
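The bin-compaction idea behind OPTIMIZE can be sketched in plain Python (a conceptual toy, not Delta Lake's actual algorithm; the 1024 MB target is an illustrative default):

```python
# Group many small files into bins up to a target size, so the engine
# ends up scanning fewer, larger .parquet files.
def compact(file_sizes_mb, target_mb=1024):
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            bins.append(current)                 # close the full bin
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

# 40 small streaming files of 64 MB each -> a handful of ~1 GB files.
small_files = [64] * 40
bins = compact(small_files)
print(len(bins))  # 3 compacted files instead of 40
```

This is exactly why streaming and small-batch architectures need OPTIMIZE in the loop: left alone, they accumulate thousands of tiny files that hurt transcoding.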
Common Answers to Common Questions
“Greatest Hits”
• Delta doesn’t like spaces in object names ☺
• Delta Tables are a hard requirement for Direct Lake mode
Dataflows Gen2, Pipelines, Notebooks can create them for you in the lakehouse
• Web modelling is the only way to use DirectLake for now
• XMLA Read/Write is not yet supported
No External Tools, Calc Groups, ..
• DirectLake doesn’t have unique DAX limitations
DQ does ..
• No confirmed plans right now to support Apache Iceberg, HUDI, ..
• No, you can’t have Copilot yet
What does this mean for my data modelling?
Thanks, @KoVer!
Data should be transformed as far
upstream as possible, and as far
downstream as necessary.
Matthew Roche, 2021
(The purple haired sword aficionado)
https://ssbipolar.com/2021/05/31/roches-maxim
Resources
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-overview
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-analyze-qp
https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-pbi-reporting
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql
https://fabric.guru/power-bi-direct-lake-mode-frequently-asked-questions
https://www.fourmoo.com/2023/05/24/using-power-bi-directlake-in-microsoft-fabric/
Slides
https://github.com/BenniDeJagere/Presentations/{Year}/{YYYYMMDD}_{Event}
Thank you