Using Lakehouse data at scale with Power BI
Featuring Power BI Direct Lake mode!
Stijn Wynants
Benni De Jagere
Benni De Jagere
Senior Program Manager | Fabric Customer Advisory Team (Fabric CAT)
.be Member
@BenniDeJagere
/bennidejagere
#SayNoToPieCharts
Stijn Wynants
Senior Customer Engineer | FastTrack Engineering
.be Member
@SQLStijn
/stijn-wynants-ba528660/
/Stijn-wynants
Fabric Espresso
#OneMore?
Disclaimer: We’re not benchmarking
Session Objectives
Introduce Fabric and OneLake
Set the scene for Direct Lake
Take it for a spin.. ☺
Introducing Fabric
Microsoft Fabric
The unified data platform for the era of AI
[Diagram: Fabric workloads on OneLake: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real-Time Analytics, Data Activator, and Power BI]
OneLake
One Copy for all computes
Real separation of compute and storage
All the compute engines store their data automatically in OneLake
The data is stored in a single common format
Delta – Parquet, an open standards format, is the storage format for all tabular data in Analytics vNext
Once data is stored in the lake, it is directly accessible by all the engines without needing any import/export
All the compute engines have been fully optimized to work with Delta – Parquet as their native format
A shared universal security model is enforced across all the engines
[Diagram: compute services (Spark, T-SQL, Serverless, KQL, Analysis) over shared Delta – Parquet stores (Customer 360, Service Telemetry, Business KPIs, Finance) in OneLake]
“Direct Query Mode”
[Diagram: Power BI reports send DAX queries to the Analysis Services layer, which sends SQL queries to the Warehouse/Lakehouse; table scans run against the database files in storage]
Slow, but real time
“Import Mode”
[Diagram: data is imported from the Warehouse/Lakehouse into Analysis Services, which keeps a copy of the tables; DAX queries from Power BI reports scan that in-memory copy]
Latent & duplicative, but fast
“Direct Lake Mode”
[Diagram: DAX queries from Power BI reports go to Analysis Services, which scans the Parquet/Delta Lake files of the Warehouse/Lakehouse directly in OneLake; no copy of the tables, no SQL round trip]
Perfect!
Why Delta?
Why Delta (Parquet)?
Open Standard for data format
Column oriented, efficient data storage and retrieval
Efficient Data Compression and Encoding
Becoming the Industry Standard
Well suited for pruning (column, row group)
Thrives on bulk operations
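A toy sketch of why columnar layout plus per-row-group statistics makes pruning cheap. This is plain Python, not real Parquet internals; the `Value` column and the min/max stats layout are made up for illustration:

```python
# Two "row groups", each storing a column with its min/max statistics.
row_groups = [
    {"Value": [10, 15, 12], "stats": {"Value": (10, 15)}},
    {"Value": [90, 95, 99], "stats": {"Value": (90, 99)}},
]

def scan_value_gt(row_groups, threshold):
    """Skip whole row groups whose max can never satisfy the predicate."""
    hits, groups_read = [], 0
    for rg in row_groups:
        lo, hi = rg["stats"]["Value"]
        if hi <= threshold:          # prune: no row in this group qualifies
            continue
        groups_read += 1             # only now do we pay for the scan
        hits.extend(v for v in rg["Value"] if v > threshold)
    return hits, groups_read

hits, groups_read = scan_value_gt(row_groups, 50)
print(hits, groups_read)  # [90, 95, 99] 1 — only one row group was read
```

The same idea scales to real workloads: a filter on one column reads only that column's chunks, and only in the row groups whose statistics overlap the predicate.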
Inside Delta (Parquet)
Header:
RowGroup1:
StoreID: StoreA, StoreA, StoreA
DateTime: 2023-01-01, 2023-01-02, 2023-01-03
ProductID: SKU001, SKU001, SKU001
Value: 10, 15, 12
RowGroup2:
….
Footer:
Inside Delta (Parquet) – Dictionary IDs
Header:
RowGroup1:
StoreID: 1, 1, 1
DateTime: 1, 2, 3
ProductID: 1, 1, 1
Value: 1, 2, 3
RowGroup2:
….
Footer:
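The dictionary-ID slide above can be sketched in a few lines of plain Python. This is the general idea of Parquet-style dictionary encoding, not the actual file layout; note the slide shows 1-based IDs while this sketch uses 0-based:

```python
# Encode a column as (dictionary of distinct values, compact integer IDs).
def dictionary_encode(values):
    dictionary, ids = [], []
    index = {}                      # value -> dictionary position
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        ids.append(index[v])
    return dictionary, ids

store_ids = ["StoreA", "StoreA", "StoreA"]
dates = ["2023-01-01", "2023-01-02", "2023-01-03"]

print(dictionary_encode(store_ids))  # (['StoreA'], [0, 0, 0])
print(dictionary_encode(dates))
```

Low-cardinality columns like `StoreID` compress extremely well: one dictionary entry plus tiny repeated IDs, which downstream run-length encoding shrinks further.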
Introducing V-Ordering
Write-time optimization of Parquet files
Sorting, row group distribution, dictionary encoding, and compression (shuffling)
Complies with the open standard
Z-Order, compaction, vacuum, time travel, etc. are
orthogonal to V-Order
V-ordering in action
Microsoft Internal DB (162 tables)
CSV: 880 GB → Parquet: 268 GB → V-Order: 84 GB (x3.2 vs plain Parquet)
Reduced IO for workloads
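A quick check of the arithmetic behind the slide's numbers (the x3.2 matches the Parquet-to-V-Order step):

```python
# Sizes from the slide, in GB.
csv, parquet, v_order = 880, 268, 84

print(round(parquet / v_order, 1))  # 3.2  (V-Order vs plain Parquet)
print(round(csv / v_order, 1))      # 10.5 (V-Order vs the original CSV)
```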
V-ordering in our demo case
STOP! Demo time!
Using Direct Lake mode over a Lakehouse
DirectLake Mode
On start, no data is loaded in-memory
Column data is transcoded from Parquet files when queried
Multi-column tables can have a mix of transcoded (resident) and non-resident columns
Column data can get evicted over time
Direct Lake falls back to DirectQuery (SQL) for suitable sub-queries
“Framing” of the dataset determines what gets loaded from Delta Lake
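The resident/non-resident/evicted lifecycle above behaves like a column cache. A toy sketch (not the real engine, and `ColumnCache` is an invented name) using a least-recently-used policy for eviction:

```python
from collections import OrderedDict

class ColumnCache:
    """Columns become memory-resident only when a query touches them;
    under memory pressure the least recently used column is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity        # max resident columns
        self.resident = OrderedDict()   # column name -> transcoded data

    def get(self, name, load):
        if name in self.resident:                    # already resident
            self.resident.move_to_end(name)
        else:
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)    # evict LRU column
            self.resident[name] = load(name)         # "transcode" on demand
        return self.resident[name]

cache = ColumnCache(capacity=2)
load = lambda name: f"data({name})"
cache.get("Duration", load)
cache.get("Station", load)
cache.get("Bike", load)          # capacity hit: "Duration" is evicted
print(list(cache.resident))      # ['Station', 'Bike']
```

A model that only ever queries a few columns therefore stays cheap, regardless of how wide the underlying Delta table is.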
DQ Fallback
[Diagram: a DAX/MDX query hits the Direct Lake dataset; if no fallback is needed, Verti-Scan transcodes the required columns (Trips, Bike, Station, Duration) on demand from the Delta Lake Parquet files (Trips001.parquet, Trips002.parquet, Trips003.parquet, DimBike001.parquet); otherwise the query is sent out as a DirectQuery (DQ) trip]
What is framing?
A “point in time” way of tracking what data can be queried by Direct Lake
Why is this important?
Delta Lake data is transient for many reasons
ETL Process
Ingest data into Delta Lake tables
Transform as needed using your preferred tool
When ready, perform a Framing operation on the dataset
Framing is near instant and acts like a cursor
Determines the set of .parquet files to use/ignore for transcoding operations
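The cursor-like behavior can be sketched as a snapshot of the table's file list. This is a conceptual toy, not the real API; `frame` and the file names are illustrative:

```python
# The Delta table's current set of Parquet files.
delta_table_files = ["Trips001.parquet", "Trips002.parquet"]

def frame(table_files):
    """Near-instant: just pin a point-in-time list of files that
    transcoding operations are allowed to read."""
    return tuple(table_files)

frozen = frame(delta_table_files)

# New data lands in the Delta table after framing...
delta_table_files.append("Trips003.parquet")

print(frozen)             # still only the two files from framing time
print(delta_table_files)  # the table itself already has three
```

Until the next framing operation (refresh), queries see the pinned file set, which is what makes Direct Lake results stable while ETL keeps writing.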
Framing
[Diagram: source data lands in the Delta Lake (ADLS Parquet files) in batches 1,2,3 then 4,5,6 then 7,8,9; each Full Refresh (framing) advances the dataset's view, so after Full Refresh 3, EVALUATE 'Table' returns values 1 through 9]
STOP! Demo time!
Let’s look at Framing
Optimizing Delta for Direct Lake mode
Optimizing Delta for Direct Lake mode
• V-Order makes a big difference, as it’s tailored for Verti-Scan
• Direct Lake will work over Shortcuts to external data
Expect a performance impact, because reasons ..
• Direct Lake thrives on fewer, larger .parquet files
Physical structure will always be crucial
OPTIMIZE (bin-compaction) and VACUUM in the Data Engineering process will be key
Especially with streaming/small batch architectures, keep this in mind
• Principle of lean models will still apply
Only include what’s needed for the reports and datasets
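The bin-compaction idea behind OPTIMIZE can be sketched in plain Python (a conceptual toy, not Delta Lake's actual algorithm; the 1024 MB target is an illustrative default):

```python
# Group many small files into bins up to a target size, so the engine
# ends up scanning fewer, larger .parquet files.
def compact(file_sizes_mb, target_mb=1024):
    bins, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):
        if current and current_size + size > target_mb:
            bins.append(current)                 # close the full bin
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        bins.append(current)
    return bins

# 40 small streaming files of 64 MB each -> a handful of ~1 GB files.
small_files = [64] * 40
bins = compact(small_files)
print(len(bins))  # 3 compacted files instead of 40
```

This is exactly why streaming and small-batch architectures need OPTIMIZE in the loop: left alone, they accumulate thousands of tiny files that hurt transcoding.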
Common Answers to Common Questions
“Greatest Hits”
• Delta doesn’t like spaces in object names ☺
• Delta Tables are a hard requirement for Direct Lake mode
Dataflows Gen2, Pipelines, Notebooks can create them for you in the lakehouse
• Web modelling is the only way to use DirectLake for now
• XMLA Read/Write is not yet supported
No External Tools, Calc Groups, ..
• DirectLake doesn’t have unique DAX limitations
DQ does ..
• No confirmed plans right now to support Apache Iceberg, HUDI, ..
• No, you can’t have Copilot yet
What does this mean for my data modelling?
Thanks, @KoVer!
Data should be transformed as far
upstream as possible, and as far
downstream as necessary.
Matthew Roche, 2021
(The purple haired sword aficionado)
https://ssbipolar.com/2021/05/31/roches-maxim
Resources
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-overview
https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-analyze-qp
https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-pbi-reporting
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order?tabs=sparksql
https://fabric.guru/power-bi-direct-lake-mode-frequently-asked-questions
https://www.fourmoo.com/2023/05/24/using-power-bi-directlake-in-microsoft-fabric/
Slides
https://github.com/BenniDeJagere/Presentations/{Year}/{YYYYMMDD}_{Event}
Thank you