Data Engineering with Databricks Cookbook

This is the code repository for Data Engineering with Databricks Cookbook, published by Packt.

Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

What is this book about?

This book shows you how to use Apache Spark, Delta Lake, and Databricks to build data pipelines, manage and transform data, optimize performance, and more. Additionally, you’ll implement DataOps and DevOps practices, and orchestrate data workflows.

This book covers the following exciting features:

Perform data loading, ingestion, and processing with Apache Spark
Discover data transformation techniques and custom user-defined functions (UDFs) in Apache Spark
Manage and optimize Delta tables with Apache Spark and Delta Lake APIs
Use Spark Structured Streaming for real-time data processing
Optimize Apache Spark application and Delta table query performance
Implement DataOps and DevOps practices on Databricks
Orchestrate data pipelines with Delta Live Tables and Databricks Workflows
Implement data governance policies with Unity Catalog

If you feel this book is for you, get your copy today!

Instructions and Navigations

All of the code is organized into one folder per chapter, for example, Chapter01.

The code will look like the following:

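from pyspark.sql import SparkSession

# Connect to the standalone Spark master (host spark-master, port 7077)
spark = (SparkSession.builder
 .appName("read-csv-data")
 .master("spark://spark-master:7077")
 .config("spark.executor.memory", "512m")
 .getOrCreate())

# Only print error-level log messages to the console
spark.sparkContext.setLogLevel("ERROR")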

Who this book is for: This book is for data engineers, data scientists, and data practitioners who want to learn how to build efficient and scalable data pipelines using Apache Spark, Delta Lake, and Databricks. To get the most out of it, you should have basic knowledge of data architecture, SQL, and Python programming.

With the following software and hardware list, you can run all of the code files in the book (Chapters 1-11).

Software and Hardware List

Chapter	Software required	OS required
1-11	Docker Engine version 18.02.0+	Windows, Mac OS X, and Linux (any)
1-11	Docker Compose version 1.25.5+	Windows, Mac OS X, and Linux (any)
1-11	Docker Desktop	Windows, Mac OS X, and Linux (any)
1-11	Git	Windows, Mac OS X, and Linux (any)
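The minimum Docker versions in the table can be checked before starting the environment. The following is a minimal sketch, not part of the book's repository: it assumes the `docker` and `docker-compose` executables are on your PATH and parses the first MAJOR.MINOR.PATCH number from their `--version` output. Docker Desktop and Git have no minimum version listed, so they are not checked. Newer installations may provide Compose only as the `docker compose` plugin, in which case this sketch reports the standalone `docker-compose` command as missing.

```python
import re
import subprocess

# Minimum versions taken from the Software and Hardware List.
MINIMUM_VERSIONS = {
    "docker": (18, 2, 0),
    "docker-compose": (1, 25, 5),
}


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple from a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_text, minimum):
    # Tuple comparison orders versions component by component.
    return parse_version(version_text) >= minimum


def check_tool(tool):
    """Run `<tool> --version`; return False if missing, failing, or too old."""
    try:
        result = subprocess.run([tool, "--version"], capture_output=True,
                                text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
    return meets_minimum(result.stdout, MINIMUM_VERSIONS[tool])


if __name__ == "__main__":
    for tool in MINIMUM_VERSIONS:
        status = "OK" if check_tool(tool) else "missing or too old"
        print(f"{tool}: {status}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "9.0.0" sorting after "18.02.0".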

Related products

Business Intelligence with Databricks SQL [Packt] [Amazon]
Optimizing Databricks Workloads [Packt] [Amazon]

Get to Know the Author

Pulkit Chadha is a seasoned technologist with over 15 years of experience in data engineering. His proficiency in crafting and refining data pipelines has been instrumental in driving success across diverse sectors such as healthcare, media and entertainment, hi-tech, and manufacturing. Pulkit’s tailored data engineering solutions are designed to address the unique challenges and aspirations of each enterprise he collaborates with.

Folders and files

.vscode
Chapter01
Chapter02
Chapter03
Chapter04
Chapter05
Chapter06
Chapter07
Chapter08
Chapter09
Chapter10
Chapter11
data
diagrams/Chapter10
docker
.gitignore
LICENSE
README.md
build.sh
docker-compose.yml
