Databricks Cluster Optimization Guide

The Catalyst Optimizer is used for performance tuning: it optimizes query plans.

Tungsten is used for memory management and CPU efficiency.


Cluster Types
All-purpose Cluster                        Job Cluster
1. Created manually                        1. Created by jobs
2. Persistent                              2. Terminated at the end of the job
3. Suitable for interactive workloads      3. Suitable for automated workloads
   and ad hoc jobs
4. Shared among many users                 4. Isolated just for the job
5. Expensive to run                        5. Cheaper to run

Creating a cluster typically takes 4-5 minutes. To speed this up, we can use cluster pools, which keep idle instances ready to use.
Single Node and Multi Node Cluster: A Single Node cluster has only one node, which is the driver node.
A Multi Node cluster has one driver node and multiple worker nodes.
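The difference between the two layouts, and the use of a pool, can be sketched as cluster specifications of the kind sent to the Databricks Clusters or Jobs API. This is a minimal sketch, not a definitive configuration: the field names (spark_version, node_type_id, num_workers, instance_pool_id, spark_conf, custom_tags) follow the Clusters API as I understand it, while the helper functions and all angle-bracket values are hypothetical placeholders to be replaced with values valid in your workspace.

```python
import json

# Hypothetical placeholders: replace with values from your own workspace.
SPARK_VERSION = "<runtime-version>"
NODE_TYPE_ID = "<node-type-id>"


def multi_node_spec(num_workers, pool_id=None):
    """Spec for one driver node plus `num_workers` worker nodes."""
    if num_workers < 1:
        raise ValueError("a Multi Node cluster needs at least one worker")
    spec = {"spark_version": SPARK_VERSION, "num_workers": num_workers}
    if pool_id is not None:
        # Assumption: when a pool is used, instances come from the pool,
        # so the node type is not set on the cluster itself.
        spec["instance_pool_id"] = pool_id
    else:
        spec["node_type_id"] = NODE_TYPE_ID
    return spec


def single_node_spec():
    """Spec for a driver-only cluster (zero workers)."""
    return {
        "spark_version": SPARK_VERSION,
        "node_type_id": NODE_TYPE_ID,
        "num_workers": 0,
        # Settings commonly used to mark a cluster as Single Node.
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


if __name__ == "__main__":
    print(json.dumps(multi_node_spec(4, pool_id="<instance-pool-id>"), indent=2))
    print(json.dumps(single_node_spec(), indent=2))
```

A job would typically carry such a spec as its new_cluster setting, so the cluster is created for the run and terminated afterwards.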

Access Modes: There are 4 types of Access Modes


Databricks Runtime: There are 4 types of runtimes
a) Databricks Runtime
b) Databricks Runtime ML
c) Photon Runtime
d) Databricks Runtime Light
Databricks Utilities
%fs – file system magic command
dbutils.fs.ls('/')
dbutils.help()
dbutils.fs.help()
dbutils.fs.help('ls')
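As an illustration of working with the result of dbutils.fs.ls, the sketch below summarizes a directory listing. It assumes each returned entry exposes name and size attributes, as the FileInfo objects in a Databricks notebook do. Because dbutils exists only inside Databricks, the FakeFs class is a hypothetical stand-in so the example runs anywhere; summarize is likewise an illustrative helper, not part of dbutils.

```python
from collections import namedtuple


def summarize(fs, path):
    """Return (entry names, total size in bytes) for the entries under `path`.

    `fs` is any object with an `ls(path)` method whose results have
    `name` and `size` attributes, for example `dbutils.fs` in a notebook.
    """
    entries = fs.ls(path)
    names = [entry.name for entry in entries]
    total_bytes = sum(entry.size for entry in entries)
    return names, total_bytes


# Hypothetical stand-in for dbutils.fs, used only to run this sketch locally.
FakeFileInfo = namedtuple("FakeFileInfo", ["path", "name", "size"])


class FakeFs:
    def ls(self, path):
        return [
            FakeFileInfo("dbfs:/tmp/a.csv", "a.csv", 120),
            FakeFileInfo("dbfs:/tmp/logs/", "logs/", 0),
        ]


if __name__ == "__main__":
    names, total_bytes = summarize(FakeFs(), "/tmp")
    print(names, total_bytes)
```

In a Databricks notebook the same helper would be called as summarize(dbutils.fs, '/').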

Unity Catalog: a unified solution for implementing data governance in the data lakehouse.
Data governance: the process of managing the availability, usability, integrity and security of the data present in an enterprise. It covers:
Data access control
Data lineage
Data audit
Data discoverability
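Data access control in Unity Catalog is usually expressed as SQL GRANT statements. The sketch below only builds the statement text, so it runs outside Databricks. The catalog, schema, table and group names are hypothetical, the helper functions are illustrative, and the sketch assumes that reading a table also requires USE CATALOG and USE SCHEMA on its parents.

```python
def quote(identifier):
    """Wrap an identifier in backticks, escaping embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def read_access_grants(catalog, schema, table, principal):
    """Build GRANT statements giving `principal` read access to one table."""
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    tbl = f"{sch}.{quote(table)}"
    who = quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT SELECT ON TABLE {tbl} TO {who}",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in read_access_grants("sales", "retail", "orders", "analysts"):
        print(statement)
```

In a notebook attached to a Unity Catalog enabled workspace, each statement could then be run with spark.sql(statement).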

Steps:
1. Create a Databricks workspace
2. Create an Azure Data Lake Storage Gen2 account
3. Create an Access Connector
4. Assign the Storage Blob Data Contributor role on the storage account to the Access Connector
5. Create the Unity Catalog metastore
6. Enable the Databricks workspace for Unity Catalog
The Access Connector is the link between the storage account and the Azure Databricks workspace.
