SQL Analytics on Databricks

Databricks Academy

©2025 Databricks Inc. — All rights reserved


Agenda
Modules in this Course    Time

Data Discovery            45 minutes
Data Importing            45 minutes
SQL Execution             70 minutes
Query Analysis            20 minutes



Lab Exercise Environment
Technical Details

● Your lab environment is provided by Vocareum.
● It will open in a new tab.
● It has been configured with the permissions and resources required to
accomplish the tasks outlined in the lab exercise.
● Third-party cookies must be enabled in your browser for Vocareum’s
user experience to work properly.
● Make sure to enable pop-ups!



Course Learning Objectives
● Discover data in Databricks using Unity Catalog.
● Manage data object ownership and permissions within Unity Catalog.
● Identify the processes for data ingestion in Databricks.
● Import and manage data files using Databricks SQL and file-based uploads.
● Execute data ingestion processes within the scope of the work done by
data analysts.
● Use Databricks SQL to query, transform, and manipulate datasets.
● Create tables and views as dynamic or materialized views.
● Analyze query performance by leveraging the available information on a
query in Databricks.
● Describe the key points that make up the recommended best practices for
executing SQL analytics on Databricks.

Data Discovery

SQL Analytics on Databricks



Agenda
Data Discovery                                      Time     Lecture  Demo  Lab

Using Unity Catalog as a Data Discovery Tool        15 mins     ✓
Understanding Data Object Ownership                 5 mins      ✓
Use Unity Catalog to Locate and Inspect Datasets    15 mins                  ✓



Learning Objectives
● Discover data in Databricks using Unity Catalog.
○ Explain the role of Unity Catalog in managing datasets and metadata.
○ Locate catalogs, schemas, tables, and views within the Catalog Explorer.
● Manage data object ownership and permissions within Unity Catalog.
○ Describe data object ownership models, permissions and governance in
Databricks.
○ Review the permissions settings and metadata available in Catalog Explorer.



Data Discovery
LECTURE

Using Unity
Catalog as a Data
Discovery Tool

Databricks Data Intelligence Platform
100% serverless
Disaster recovery, cost controls, enterprise security

● Databricks SQL: Data warehousing
● Workflows/DLT: Ingest, ETL, streaming
● Mosaic AI: Artificial intelligence
● AI/BI: Business intelligence

Lakehouse



Databricks Vocabulary



Databricks Vocabulary
Terms to learn and understand

• Cloud Service Provider
• Databricks Account
• Unity Catalog
• Metastore
• Catalog
• Schema
• Data objects
• Table
• View
• Volume
• Function
• Three-level namespace
• Principal
• Service Principal
• User
• Account Groups
• Workspace
• Compute
• Catalog Explorer
• Notebook
• Databricks SQL
• SQL Editor
• Databricks AI/BI
• Databricks Marketplace
• Delta Sharing

[Diagram: a Databricks Account on a Cloud Service Provider, containing a Metastore (Catalog > Schema > Table, Volume, View, etc.), Principals (Users, Service Principals, Account Groups), and a Workspace (Compute, Catalog Explorer, Notebooks, SQL Editor, Databricks AI/BI)]


Cloud Service Provider
Databricks Vocabulary

● Amazon Web Services
● Microsoft Azure
● Google Cloud Platform

[Diagram: a Workspace inside a Databricks Account, hosted on a Cloud Service Provider]


Databricks Account & Workspace
Databricks Vocabulary

Databricks Account Console Databricks Workspace



Unity Catalog and Governance
Databricks Vocabulary

Trust your data with lineage, monitoring, and observability.
One open governance model for all data & AI assets.

Security         Collaboration        Quality             Management
Access control   Discovery            Lineage             Cost controls
Auditing         Secure data sharing  Quality monitoring  Business semantics

Governed assets: Tables, AI Models, Files, Notebooks, Dashboards

[Diagram label: Traditional catalogs]


Unity Catalog Data Objects
Databricks Vocabulary

UC provides a unified governance solution for data and AI assets on Databricks.

Databricks Account
  (Unity) Catalog Metastore, with Access Controls, assigned to Databricks Workspaces
    Catalog
      Schema (database)
        Table
        View
        Volume
        Function
        Model


Catalog Explorer
Databricks Vocabulary
● UI-driven access control to simplify secure data permissioning
● Browse and understand data assets stored in your platform
● Data lineage: end-to-end table & column lineage


Unity Catalog Data Objects
Databricks Vocabulary

Catalog

[Diagram: Catalog > Schema > Table, View, Volume, Function, Model, with Catalog highlighted]


Unity Catalog Data Objects
Databricks Vocabulary

Schema (database)

[Diagram: Catalog > Schema > Table, View, Volume, Function, Model, with Schema highlighted]


Unity Catalog Data Objects
Databricks Vocabulary

Tables and Views

[Diagram: Catalog > Schema > Table, View, Volume, Function, Model, with Table and View highlighted]


Unity Catalog Data Objects
Databricks Vocabulary

Volumes
Store and access files in any format, including structured,
semi-structured, and unstructured data.

Examples: CSV, Parquet, TXT, XLSX and more

[Diagram: Catalog > Schema > Table, View, Volume, Function, Model, with Volume highlighted]


Types of Tables and Volumes
Data Objects

Managed Tables and Volumes
[Diagram: within the Databricks Account, Metastore > Catalog > Schema holds both the Table Metadata and the Data]

External Tables and Volumes
[Diagram: within the Databricks Account, Metastore > Catalog > Schema holds the Table Metadata; the Data sits in non-Databricks storage or third-party platforms]

[Both diagrams sit on a Cloud Service Provider layer]


Unity Catalog Data Objects
Databricks Vocabulary

SQL Functions
Saved logic that returns a value or set of values that can be shared and
governed

[Diagram: Catalog > Schema > Table, View, Volume, Function, Model, with Function highlighted]


Address for Data Objects
How to address your data objects in code

Unity Catalog Three-Level Namespace


catalog.schema.dataobjectname

SELECT * FROM catalog.schema.table
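
The three-level namespace can be illustrated with a small helper. This is a minimal sketch in Python, not a Databricks API: `ObjectName`, `parse_object_name`, and the object `main.sales.orders` are hypothetical names chosen for illustration, and the backtick quoting is a simplified take on how identifiers are commonly quoted in Databricks SQL.

```python
from typing import NamedTuple


class ObjectName(NamedTuple):
    """A fully qualified name: catalog.schema.dataobjectname."""
    catalog: str
    schema: str
    name: str

    def quoted(self) -> str:
        # Wrap each part in backticks; double any backtick inside a part.
        parts = (p.replace("`", "``") for p in self)
        return ".".join(f"`{p}`" for p in parts)


def parse_object_name(text: str) -> ObjectName:
    """Split 'catalog.schema.name' into its three levels.

    Simplification: assumes no part contains a dot.
    """
    parts = text.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.name, got {text!r}")
    return ObjectName(*parts)


# Hypothetical object used only for illustration.
orders = parse_object_name("main.sales.orders")
query = f"SELECT * FROM {orders.quoted()}"
```

A two-part name such as `sales.orders` is rejected here on purpose, to make the point that all three levels are part of the address.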



In the
Databricks
Workspace



SQL Editor & Notebooks
In the Workspace



AI/BI Dashboards and Genie Spaces
An all-in-one visualization and presentation environment paired with a tool
that allows you to ask questions of your data in natural language using your
terms.



Compute
In the Workspace

All-purpose compute
● General purpose compute for Notebooks

Job compute
● Designed specifically for use with automated Workflows

SQL warehouses
● Optimized for SQL query execution and data exploration

Serverless compute is available for all computational needs throughout the
platform.

Data Discovery Options
In the Workspace

Intelligent Search

Programmatic Listing

Catalog Explorer



Making Data Discovery Easier
Best practices and AI support



Deep Power BI & Tableau Integrations
Seamless catalog integration & data model sync

Power BI Integration

Publish UC datasets from the Databricks UI to Power BI Online, without
Power BI Desktop.

Sync entire schemas, including table relationships (PK/FK), to save time.

Tableau Integration

Easily explore Unity Catalog datasets in Tableau Online with a single click
from Data Explorer.


Databricks Marketplace



Databricks Marketplace
Open Marketplace for All Your Data, Analytics, and AI

Open exchange for all data products

• Datasets
• Notebooks
• Dashboards
• ML models
• Solution Accelerators

Powered by Delta Sharing

[Diagram: Databricks Marketplace at the center, surrounded by Data Files, Data Tables, Notebooks, Solution Accelerators, ML Models, and Dashboards]


Delta Sharing
Open data sharing and collaboration

● Avoid vendor lock-in with open source Delta Sharing for seamless data
sharing across clouds, regions, and platforms without replication
● Share more than just data: notebooks, ML models, dashboards
● Explore and monetize data products through an open marketplace
● Confidential collaboration on sensitive data with scalable clean rooms


Data Discovery with Unity Catalog
Easily access your data assets within the Databricks Workspace

Intelligent Search

Programmatic Listing

Catalog Explorer



Data Discovery
LECTURE

Understanding
Data Object
Ownership



Unity Catalog is a data
governance tool.



Best Practices: Principle of Least Privilege
Only have access to items when and for as long as you need access

Key Topics:
• Role-based Access Control
• Frequent monitoring and auditing
• Provide access on a “need-to-know” basis

[Diagram: access levels from more secure to less secure: Limited Access, Moderate Access, Broad Access]



Databricks Vocabulary



Three Types of Principals
Databricks Vocabulary

Users Service Principals Account Groups



Data Object Ownership
Governed by Unity Catalog



Types of Permissions on Non-Data Objects
Objects in Workspace Navigation

CAN READ / CAN VIEW
✓ Can view the object or read its results
✓ Can create a copy of the object

CAN RUN (all the above, plus)
✓ Can run the object to update results (e.g. queries and notebooks)

CAN EDIT (all the above, plus)
✓ Can make changes to the object’s content (e.g. code, comments)

CAN MANAGE (all the above, plus)
✓ Can modify permissions and access to the object
✓ Can delete the object
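
The "all the above, plus" pattern means each level includes every lower level. A minimal sketch of that cumulative ordering in Python follows; the `Permission` enum, `ABILITIES` table, and helper functions are hypothetical names for illustration and are not a Databricks API.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Workspace object permission levels, lowest to highest."""
    CAN_VIEW = 1    # also shown as CAN READ
    CAN_RUN = 2
    CAN_EDIT = 3
    CAN_MANAGE = 4


# Abilities introduced at each level, paraphrased from the slide.
ABILITIES = {
    Permission.CAN_VIEW: ["view the object or read its results", "create a copy"],
    Permission.CAN_RUN: ["run the object to update results"],
    Permission.CAN_EDIT: ["change the object's content"],
    Permission.CAN_MANAGE: ["modify permissions and access", "delete the object"],
}


def abilities_for(level: Permission) -> list[str]:
    """Collect the abilities of this level and every level below it."""
    result = []
    for p in Permission:  # iterates in definition (ascending) order
        if p <= level:
            result.extend(ABILITIES[p])
    return result


def allows(granted: Permission, required: Permission) -> bool:
    """A grant satisfies a requirement at the same or a lower level."""
    return granted >= required
```

For example, `allows(Permission.CAN_EDIT, Permission.CAN_RUN)` is true because CAN EDIT includes everything CAN RUN provides.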



Type of Permissions on Data Objects
Objects in the Catalog Explorer
Catalogs Schemas Data Objects



What does this mean for you?



Everyone has a role to play in security
Both platform and users work together

[Diagram: Unity Catalog compared with traditional catalogs. Pillars: Security, Collaboration, Quality, Insights. Capabilities: Access Control, Discovery, Lineage, Cost Controls, Auditing, Secure Open Data Sharing, Quality Monitoring, Business Semantics. Governed assets: Tables, AI Models, Files, Notebooks, Dashboards. Formats shown: Delta Lake, Parquet, Iceberg.]


Sharing and Grants
For objects that you own



Objects you don’t own
Permissions based viewing

● Can only view items you have permissions to
● Access requests to object owners
● Access provided by administrators and managers


Data Discovery
LAB EXERCISE

Use Unity Catalog
to Locate and
Inspect Datasets


Lab Exercise Goals
● What the learner is to accomplish within the lab environment as part of
completing the lab.
● Explain how it relates to the lecture or demonstration preceding the
exercise.
● Announce whether the lab will be completed with the instructor or if it’s
an independent exercise to be completed in the following X minutes.



Data Importing

SQL Analytics on Databricks



Agenda
Data Importing                                                 Time     Lecture  Demo  Lab

Ingesting Data into Databricks                                 5 mins      ✓
Uploading Data to Databricks Using the UI                      10 mins              ✓
Programmatic Exploration and Data Ingestion to Unity Catalog   15 mins              ✓
Import Data into Databricks                                    15 mins                    ✓


Learning Objectives
● Identify the processes for data ingestion in Databricks.
○ Explain how data ingestion is performed in Databricks.
○ Differentiate between batch and streaming data ingestion in Databricks.
● Import and manage data files using Databricks SQL and file-based
uploads.
○ Upload a small data file in Databricks.
● Execute data ingestion processes within the scope of the work done by
data analysts.
○ Use SQL to ingest data programmatically into Databricks.



Data Importing
LECTURE

Ingesting Data
into Databricks



Getting data into Databricks



Ingesting Data into Databricks
Delta Lake Overview

Delta Lake is an open-source protocol for reading and writing files to cloud
storage.

[Diagram: Data Sources > Ingestion > Databricks, backed by a Cloud Data Lake]


Ingesting Data into Databricks
Delta Lake Overview - Table Components
Files are stored within a directory.

DeltaTableName/
    …282e.snappy.parquet
    …432c.snappy.parquet
    …380c.snappy.parquet
    …402z.snappy.parquet
    _delta_log/
        …0000.json
        …0001.json
        …0002.json

Data is stored as Parquet files. The _delta_log directory holds the
transaction logs with metadata.

[Diagram: Delta Tables in Databricks, stored in a Cloud Data Lake]
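
To make the role of the transaction log concrete, the sketch below replays a heavily simplified log in Python. It assumes each commit file lists `add` and `remove` actions that name data files; the file names and commit contents are hypothetical, and real Delta logs also carry other action types (metadata, protocol, commit information) and checkpoints that are ignored here.

```python
import json
from typing import Optional

# Hypothetical, simplified commits. Each string stands in for one JSON file
# in _delta_log/, with one action per line.
COMMITS = [
    # version 0: initial write
    '{"add": {"path": "part-a.parquet"}}\n{"add": {"path": "part-b.parquet"}}',
    # version 1: append
    '{"add": {"path": "part-c.parquet"}}',
    # version 2: part-a is rewritten as part-d (for example, after an update)
    '{"remove": {"path": "part-a.parquet"}}\n{"add": {"path": "part-d.parquet"}}',
]


def live_files(commits: list[str], version: Optional[int] = None) -> set[str]:
    """Replay commits 0..version and return the data files in that snapshot."""
    last = len(commits) - 1 if version is None else version
    files: set[str] = set()
    for commit in commits[: last + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


latest = live_files(COMMITS)         # snapshot after all commits
as_of_v0 = live_files(COMMITS, 0)    # snapshot as of the first commit
```

Because the older Parquet files are still referenced by earlier commits, replaying only part of the log reconstructs an earlier snapshot, which is the idea behind reading a previous table version.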



Delta Lake Summary
Key Features and Benefits

● ACID Transactions
● Data Manipulation Language (DML)
● Time Travel
● Schema Evolution and Enforcement
● Many more!


Data Transformation
Medallion Architecture (Multi Hop)
[Diagram: Ingest (Batch, Streaming) > Data Processing and Transformation, with data quality levels Bronze (Raw Data Ingestion), Silver (Cleaned Data), and Gold (Curated Data), labeled Photon > Consumers (BI & Reporting, ML & AI, Streaming Analytics)]


Data Transformation
Medallion Architecture (Multi Hop)
Improve Data Quality
Incrementally improve the structure and quality of data as it flows through
each layer.

[Diagram: Ingest (Batch, Streaming) > Bronze > Silver > Gold > Consumers (BI & Reporting, ML & AI, Streaming Analytics)]


Data Transformation
Medallion Architecture (Multi Hop)
Bronze
● Dumping ground for raw data from external source systems
● Often with long retention (years)
● Data as it originally existed

Can remove personally identifiable information (PII) if required

[Diagram: Ingest (Batch, Streaming) > Bronze > Silver > Gold > Consumers (BI & Reporting, ML & AI, Streaming Analytics)]


Data Transformation
Medallion Architecture (Multi Hop)
Silver
● Filter, cleanse, join and enrich the bronze data
● Define structure and enforce or evolve schema
● Single source of truth

[Diagram: Ingest (Batch, Streaming) > Bronze > Silver > Gold > Consumers (BI & Reporting, ML & AI, Streaming Analytics)]


Data Transformation
Medallion Architecture (Multi Hop)
Gold
● Clean data, ready for consumption
● Can be business-level aggregates of the silver data
● Delivered downstream to users and applications

[Diagram: Ingest (Batch, Streaming) > Bronze > Silver > Gold > Consumers (BI & Reporting, ML & AI, Streaming Analytics)]


Data Transformation
Medallion Architecture (Multi Hop)
Delta Lake ACID Support
Enables inserts, deletes, updates, and merges throughout the data
transformation process.

Operations shown: INSERT, DELETE, MERGE, OVERWRITE, AGGREGATE

[Diagram: Ingest (Batch, Streaming) > Bronze > Silver > Gold > Consumers (BI & Reporting, ML & AI, Streaming Analytics)]
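
The Bronze, Silver, and Gold layers can be sketched end to end on a tiny dataset. The example below is an illustration only: it uses Python's built-in `sqlite3` module as a local stand-in for a SQL warehouse, and the table names, columns, and cleansing rules are hypothetical. Databricks SQL syntax and Delta tables differ from SQLite in many details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Bronze: raw records exactly as they arrived (all text, duplicates, bad rows).
conn.execute("CREATE TABLE bronze_orders (order_id TEXT, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?)",
    [
        ("1", "east", "10.50"),
        ("1", "east", "10.50"),    # duplicate delivery
        ("2", " West ", "20.00"),  # untidy region value
        ("3", "west", "n/a"),      # unparseable amount
        ("4", "east", "5.00"),
    ],
)

# Silver: filter, cleanse, de-duplicate, and enforce types.
conn.execute(
    """
    CREATE TABLE silver_orders AS
    SELECT DISTINCT
        CAST(order_id AS INTEGER) AS order_id,
        LOWER(TRIM(region))       AS region,
        CAST(amount AS REAL)      AS amount
    FROM bronze_orders
    WHERE amount GLOB '[0-9]*'
    """
)

# Gold: business-level aggregate, ready for consumers.
conn.execute(
    """
    CREATE TABLE gold_sales_by_region AS
    SELECT region, COUNT(*) AS orders, ROUND(SUM(amount), 2) AS total_amount
    FROM silver_orders
    GROUP BY region
    """
)

silver_count = conn.execute("SELECT COUNT(*) FROM silver_orders").fetchone()[0]
gold = conn.execute(
    "SELECT region, orders, total_amount FROM gold_sales_by_region ORDER BY region"
).fetchall()
```

The bronze table is never modified, so the raw data stays available if the cleansing rules for the silver layer change later.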



Data Transformation Reality
Real World Architecture is Typically More Complex

[Diagram: multiple data stream sources and a batch source feed a Data Lake (CSV, JSON, TXT…), which serves BI & Reporting, ML & AI, and Streaming Analytics]


Data Importing Methods
Common Importing Methods for Data Analysts

● File Upload UI
● FROM read_files (...)
● COPY INTO
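
One property worth understanding about batch loading with COPY INTO is that it is documented as idempotent: files that were already loaded are skipped when the command runs again. The sketch below models that behavior in plain Python. It is not Databricks code; the `Table` class, its `copy_into` method, and the file paths are hypothetical, and the in-memory strings stand in for files in cloud storage or a volume.

```python
import csv
import io

# Hypothetical source files, keyed by path.
SOURCE_FILES = {
    "landing/orders_01.csv": "order_id,amount\n1,10.5\n2,20.0\n",
    "landing/orders_02.csv": "order_id,amount\n3,5.0\n",
}


class Table:
    """Toy target table that remembers which files it has already loaded."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.loaded_files: set[str] = set()

    def copy_into(self, files: dict[str, str]) -> int:
        """Load only files not seen before; return the number of rows added."""
        added = 0
        for path in sorted(files):
            if path in self.loaded_files:
                continue  # already ingested, skip to stay idempotent
            new_rows = list(csv.DictReader(io.StringIO(files[path])))
            self.rows.extend(new_rows)
            self.loaded_files.add(path)
            added += len(new_rows)
        return added


orders = Table()
first_run = orders.copy_into(SOURCE_FILES)   # loads both files
second_run = orders.copy_into(SOURCE_FILES)  # nothing new to load
```

Rerunning the load after a new file lands picks up only that file, which is why a scheduled batch load does not duplicate earlier rows.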



Working with Data Engineering
Data Ingestion

Lakeflow
● Connect: Easily connect key data sources
● DLT Pipelines: Reliable data pipelines made easy
● Jobs: Unified orchestration for analytics and AI


Data Importing
DEMONSTRATION

Upload Data to
Databricks Using
the UI



High Level Steps
● List high level steps to be covered in the demonstration.
● Announce whether the demonstration is intended to be done as a
“follow-along” synchronous demonstration or if it’s an instruction only
demonstration to be completed entirely by the instructor and not the
audience.



Data Importing
DEMONSTRATION

Programmatic
Exploration and
Data Ingestion to
Unity Catalog

High Level Steps
● List high level steps to be covered in the demonstration.
● Announce whether the demonstration is intended to be done as a
“follow-along” synchronous demonstration or if it’s an instruction only
demonstration to be completed entirely by the instructor and not the
audience.



Data Importing
LAB EXERCISE

Import Data into
Databricks


Lab Exercise Goals
● What the learner is to accomplish within the lab environment as part of
completing the lab.
● Explain how it relates to the lecture or demonstration preceding the
exercise.
● Announce whether the lab will be completed with the instructor or if it’s
an independent exercise to be completed in the following X minutes.



SQL Execution

SQL Analytics on Databricks



Agenda
SQL Execution                                        Time     Lecture  Demo  Lab

Databricks SQL and Databricks SQL Warehouses         5 mins      ✓
The Unified SQL Editor                               10 mins              ✓
Manipulate and Transform Data with Databricks SQL    20 mins              ✓
Creating Views with Databricks SQL                   15 mins              ✓
Manipulate and Analyze a Table                       20 mins                    ✓


Learning Objectives
● Use Databricks SQL to query, transform, and manipulate datasets.
○ Explain the role of SQL Warehouses in query execution.
○ Describe the performance characteristics of SQL Warehouses.
○ Use the Unified SQL Editor to write and execute queries.
● Create tables and views as dynamic or materialized views.
○ Execute common SQL transformations (aggregations, joins, filtering) using the
Unified SQL Editor.
○ Create a materialized view.
○ Differentiate between dynamic and materialized views.



SQL Execution
LECTURE

Databricks SQL
and Databricks
SQL Warehouses



Databricks Data Intelligence Platform
100% serverless
Disaster recovery, cost controls, enterprise security

● Databricks SQL: Data warehousing
● Workflows/DLT: Ingest, ETL, streaming
● Mosaic AI: Artificial intelligence
● AI/BI: Business intelligence

Lakehouse


What exactly is Databricks SQL?



Databricks SQL supports intelligent data warehousing and analytics to meet
your needs.

AI-driven Optimizations
Automatic and optimized serverless compute that scales for all data volumes
and complex queries

Intelligent Experience
AI powered natural language interfaces for everyone, with built-in
understanding of your data and business

Unified Architecture
Ingest, Transform & Query all from a single platform; fully meets data
warehousing and business intelligence needs while providing full access to AI
capabilities

The Benefits of Databricks SQL
A full collection of tools for business intelligence and analytics

● Supported by Delta Lake, Databricks SQL provides a unified streaming and
batch processing environment.
● Databricks SQL is enterprise ready, meaning you can use it for production
workloads.
● Databricks SQL is simple to administer.

Features: ANSI SQL, Full Featured SQL Editor, Alerts and Schedules,
All-in-one Environment


Databricks SQL Warehouses



SQL Warehouses
Your compute for SQL analytics on Databricks

Each SQL Warehouse type has different performance capabilities.

Warehouse Type   Photon Engine   Predictive IO   Intelligent Workload Management (IWM)
Serverless       ✓               ✓               ✓
Pro              ✓               ✓
Classic          ✓


Serverless
Best price/performance

Instant and Elastic Compute
- Eliminate wait times and avoid over-provisioning

Minimal Management Overhead
- Databricks managed upgrades, patching, performance optimization and more

Overall leads to a Lower Total Cost of Ownership

SQL Execution
DEMONSTRATION

The Unified SQL
Editor


High Level Steps
● List high level steps to be covered in the demonstration.
● Announce whether the demonstration is intended to be done as a
“follow-along” synchronous demonstration or if it’s an instruction only
demonstration to be completed entirely by the instructor and not the
audience.



SQL Execution
DEMONSTRATION

Manipulate and
Transform Data
with Databricks
SQL

High Level Steps
● List high level steps to be covered in the demonstration.
● Announce whether the demonstration is intended to be done as a
“follow-along” synchronous demonstration or if it’s an instruction only
demonstration to be completed entirely by the instructor and not the
audience.



SQL Execution
DEMONSTRATION

Creating Views
with Databricks
SQL



View Type Reviews
Four key types of views

● Standard View
● Temporary View
● Materialized View (Precomputed)
● Dynamic View (Fine-grained Access Control)
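
The practical difference between a standard or temporary view and a precomputed (materialized) view shows up when the underlying data changes. The sketch below illustrates this with Python's built-in `sqlite3` module as a local stand-in. SQLite has no materialized views, so a plain table plays that role here, and the manual recompute only approximates a refresh; the table and view names are hypothetical, and Databricks SQL syntax and refresh behavior differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("east", 10.0), ("west", 20.0)])

TOTALS = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Standard view: a saved query that is re-evaluated on every read.
conn.execute(f"CREATE VIEW v_totals AS {TOTALS}")

# Temporary view: same behavior, but visible only to this connection.
conn.execute(f"CREATE TEMP VIEW tmp_totals AS {TOTALS}")

# Materialized view stand-in: results are computed once and stored.
conn.execute(f"CREATE TABLE mv_totals AS {TOTALS}")

# New data arrives after the views were created.
conn.execute("INSERT INTO sales VALUES ('east', 5.0)")


def east_total(relation: str) -> float:
    """Read the east total from the named view or table."""
    row = conn.execute(
        f"SELECT total FROM {relation} WHERE region = 'east'"
    ).fetchone()
    return row[0]


standard = east_total("v_totals")      # reflects the new row
temporary = east_total("tmp_totals")   # reflects the new row
precomputed = east_total("mv_totals")  # stale until recomputed

# Refresh the stand-in by recomputing its contents.
conn.execute("DELETE FROM mv_totals")
conn.execute(f"INSERT INTO mv_totals {TOTALS}")
refreshed = east_total("mv_totals")
```

Dynamic views are not covered by this sketch because their purpose is access control rather than when results are computed.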



High Level Steps
● List high level steps to be covered in the demonstration.
● Announce whether the demonstration is intended to be done as a
“follow-along” synchronous demonstration or if it’s an instruction only
demonstration to be completed entirely by the instructor and not the
audience.



SQL Execution
LAB EXERCISE

Manipulate and
Analyze a Table



Lab Exercise Goals
● What the learner is to accomplish within the lab environment as part of
completing the lab.
● Explain how it relates to the lecture or demonstration preceding the
exercise.
● Announce whether the lab will be completed with the instructor or if it’s
an independent exercise to be completed in the following X minutes.



Query Analysis

SQL Analytics on Databricks



Agenda
Query Analysis                                      Time     Lecture  Demo  Lab

Databricks Photon and Optimization in Databricks    5 mins      ✓
Query Insights                                      10 mins              ✓
Best Practices for SQL Analytics                    5 mins      ✓


Learning Objectives
● Analyze query performance by leveraging the available information on a
query in Databricks.
○ Explain how Databricks Photon improves query performance.
○ Use Query Insights to analyze the performance of a query in Databricks.
● Describe the key points that make up the recommended best practices
for executing SQL analytics on Databricks.



Query Optimization
LECTURE

Databricks
Photon and
Optimization in
Databricks

Databricks SQL Performance Improvement
June 2024 to October 2024

● ETL workloads are 9% more efficient on average
● Delivers 14% better performance for Business Intelligence (BI) workloads
● Exploratory data analysis is now 13% faster


Databricks Query Optimization
Built in features to support query optimization

Photon
Built-in Vectorized Query Engine

Predictive I/O
Suite of Features for Improving Selective Scan Operations in SQL Queries

Intelligent Workload Management (IWM)
Features for Enhancing Serverless SQL’s Query Processing Ability at Scale


Photon
World record achieving query engine with zero tuning or setup
Save on compute costs
• ETL customers are saving up to 40% on their
compute cost
Fast query performance
• Built for modern hardware with up to 12x better
price/perf compared to other cloud data
warehouses
No code changes
• Spark APIs that can do exploration, ETL, big
data, small data, low latency, high concurrency,
batch, and streaming
Broad language support
• Support for SQL, Python, Scala, R, and Java



Databricks SQL infused with AI in our engine

Intelligent Workload Management
Leverages machine learning to efficiently route queries and scale clusters to
maximize cost/performance

Automatic Data Layout
Eliminates knobs to optimize storage with ROI-based table maintenance
algorithms

Indexless Indexing
Predictive I/O delivers comparable performance without expensive
search-optimized indexes

[Charts, lower is better. Titles: "TPC-DCS 1TB - Query timings (minutes)" and "SQL Warehouse - Mixed Workloads". Bar groups: Baseline vs. New; Before vs. After Optimization; CDW vs. CDW w/ Search Index vs. DBSQL w/ Predictive I/O.]


Intelligent Workload Management

Uses machine learning to efficiently route queries and autoscale clusters
based on actual workloads

• Protects query latency by upscaling quickly when queueing occurs
• Reduces costs by minimizing always-on clusters and scaling down quickly.


Intelligent Workload Management
AI powered simplicity to maximize throughput & utilization while reducing
query latency
Mixed Workloads Query Latency (sec)
Lower is better



Databricks Query Profile
Exploring query execution details



Query Optimization
DEMONSTRATION

Query Insights



High Level Steps
● List high level steps to be covered in the demonstration.
● Announce whether the demonstration is intended to be done as a
“follow-along” synchronous demonstration or if it’s an instruction only
demonstration to be completed entirely by the instructor and not the
audience.



SQL Execution
LECTURE

Best Practices for
SQL Analytics


Getting the most out of Databricks
Best Practice for SQL Analytics on Databricks

● Managed Tables and Dynamic Views
● Optimize Table and Query Performance
● Metadata Management and Unity Catalog

[Diagram label: Databricks SQL Data Warehousing]


Getting the most out of Databricks
Best Practice for SQL Analytics on Databricks

Managed Tables and Dynamic Views
● Utilize managed tables over external tables
● Dynamic views provide row-level filtering, column masking, and more
● Review permissions regularly to ensure proper data access policy compliance

Optimize Table and Query Performance

Metadata Management and Unity Catalog
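
Row-level filtering and column masking, as mentioned for dynamic views, can be pictured with a short Python sketch. Everything here is hypothetical: the `dynamic_view` function, the group names, and the sample rows are for illustration only. In Databricks the same decisions would be expressed in the view's SQL definition based on the caller's identity or group membership, which this sketch does not attempt to reproduce.

```python
from typing import Iterable

# Hypothetical data and group names, used only for illustration.
CUSTOMERS = [
    {"name": "Ada", "region": "east", "email": "ada@example.com"},
    {"name": "Bo", "region": "west", "email": "bo@example.com"},
]


def dynamic_view(rows: Iterable[dict], groups: set[str]) -> list[dict]:
    """Return what a caller with the given group memberships may see."""
    visible = []
    for row in rows:
        # Row-level filtering: non-admins see only regions they belong to.
        if "admins" not in groups and f"region_{row['region']}" not in groups:
            continue
        shown = dict(row)
        # Column masking: hide the email unless the caller may read PII.
        if "pii_readers" not in groups:
            shown["email"] = "REDACTED"
        visible.append(shown)
    return visible


east_analyst = dynamic_view(CUSTOMERS, {"region_east"})
admin_view = dynamic_view(CUSTOMERS, {"admins", "pii_readers"})
```

The same underlying rows produce different results for different callers, which is what makes this a fine-grained access control mechanism rather than a separate copy of the data.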



AI-Assistant in Query Editor
• Databricks Assistant is an AI-based
assistant that enhances user productivity
by generating SQL queries, explaining
complex code, and fixing errors.

• It acts as a productivity-enhancing
companion, integrated with Unity Catalog,
to provide relevant and contextual insights
about users' entire data estate, improving
efficiency in query creation.

• Describe your task in English, and Databricks Assistant will automatically
handle SQL generation, code explanation, and error correction.


What additional best practices
would you recommend for
SQL Analytics on Databricks?



Summary and
Next Steps



Course Learning Objective Recap
● Discover data in Databricks using Unity Catalog.
● Manage data object ownership and permissions within Unity Catalog.
● Identify the processes for data ingestion in Databricks.
● Import and manage data files using Databricks SQL and file-based uploads.
● Execute data ingestion processes within the scope of the work done by
data analysts.
● Use Databricks SQL to query, transform, and manipulate datasets.
● Create tables and views as dynamic or materialized views.
● Analyze query performance by leveraging the available information on a
query in Databricks.
● Describe the key points that make up the recommended best practices for
executing SQL analytics on Databricks.

Earn a Databricks certification!
Certification helps you gain industry recognition, competitive
differentiation, greater productivity, and results.

• This course helps you prepare for the Databricks Certified Data Analyst
Associate exam
• Recommended Self-Paced Courses
• Get Started with SQL Analytics and BI on Databricks
• AI/BI for Data Analysts
• Please see the Databricks Academy for additional prep materials

For more information visit: databricks.com/learn/certification