3 - Data Engineering With AWS - Second Edition

Uploaded by

sultanakpt2


10/24/23, 2:06 PM Data Engineering with AWS - Second Edition

Table of Contents

1. Data Engineering with AWS, Second Edition: Acquire the skills to design and build cloud-based data transformation pipelines like a pro using AWS
2. 1 An Introduction to Data Engineering
    1. Join our book community on Discord
    2. Technical requirements
    3. The rise of big data as a corporate asset
    4. The challenges of ever-growing datasets
    5. The role of the data engineer as a big data enabler
        1. Understanding the role of the data engineer
        2. Understanding the role of the data scientist
        3. Understanding the role of the data analyst
        4. Understanding other common data related roles
    6. The benefits of the cloud when building big data analytic solutions
    7. Hands-on – creating and accessing your AWS account
        1. Creating a new AWS account
        2. Accessing your AWS account
    8. Summary
3. 2 Data Management Architectures for Analytics
    1. Join our book community on Discord
    2. Technical requirements
    3. The evolution of data management for analytics
        1. Databases and data warehouses
        2. Dealing with big, unstructured data
        3. Cloud based solutions for big data analytics
    4. A deeper dive into data warehouse concepts and architecture
        1. Dimensional modelling in data warehouses
        2. Understanding the role of data marts
        3. Distributed storage and massively parallel processing
        4. Columnar data storage and efficient data compression
        5. Feeding data into the warehouse – ETL and ELT pipelines
        6. An overview of data lake architecture and concepts
https://learning.oreilly.com/library/view/data-engineering-with/9781804614426/nav.xhtml 1/7
        7. Data lake logical architecture
    5. Bringing together the best of data warehouses and data lakes
        1. The Data Lake House approach
    6. Hands-on – Using the AWS Command Line Interface (CLI) to create S3 buckets
        1. Accessing the AWS CLI
        2. Creating new Amazon S3 buckets
    7. Summary
4. 3 The AWS Data Engineer's Toolkit
    1. Join our book community on Discord
    2. Technical requirements
    3. AWS services for ingesting data
        1. Overview of Amazon Database Migration Service (DMS)
        2. Overview of Amazon Kinesis for streaming data ingestion
        3. Overview of Amazon MSK for streaming data ingestion
        4. Overview of Amazon AppFlow for ingesting data from SaaS services
        5. Overview of AWS Transfer Family for ingestion using FTP/SFTP protocols
        6. Overview of AWS DataSync for ingesting from on-premises storage
        7. Overview of the AWS Snow family of devices for large data transfers
        8. Overview of AWS Glue for data ingestion
    4. AWS services for transforming data
        1. Overview of AWS Lambda for light transformations
        2. Overview of AWS Glue for serverless data processing
        3. Overview of Amazon EMR for Hadoop ecosystem processing
    5. AWS services for orchestrating big data pipelines
        1. Overview of AWS Glue workflows for orchestrating Glue components
        2. Overview of AWS Step Functions for complex workflows
        3. Overview of Amazon Managed Workflows for Apache Airflow (MWAA)
    6. AWS services for consuming data
        1. Overview of Amazon Athena for SQL queries in the data lake
        2. Overview of Amazon Redshift and Redshift Spectrum for data warehousing and data lakehouse architectures
        3. Overview of Amazon QuickSight for visualizing data
    7. Hands-on – triggering an AWS Lambda function when a new file arrives in an S3 bucket
        1. Creating a Lambda layer containing the AWS SDK for Pandas library
        2. Creating an IAM policy and role for your Lambda function
        3. Creating a Lambda function
        4. Configuring our Lambda function to be triggered by an S3 upload
    8. Summary
5. 4 Data Governance, Security and Cataloging
    1. Join our book community on Discord
    2. Technical requirements
    3. The many different aspects of Data Governance
    4. Data security, access and privacy
        1. Common data regulatory requirements
        2. Core data protection concepts
        3. Personal data
        4. Encryption
        5. Anonymized data
        6. Pseudonymized data/tokenization
        7. Authentication
        8. Authorization
        9. Putting these concepts together
    5. Data quality, data profiling, and data lineage
        1. Data quality
        2. Data profiling
        3. Data lineage
    6. Business and technical data catalogs
        1. Implementing a data catalog to avoid creating a data swamp
        2. Business data catalog
        3. Technical data catalog
    7. AWS services that help with data governance
        1. The AWS Glue/Lake Formation technical data catalog
        2. AWS Glue DataBrew for profiling datasets
        3. AWS Glue for Data Quality
        4. AWS Key Management Service (KMS) for data encryption
        5. Amazon Macie for detecting PII data in Amazon S3 objects
        6. AWS Glue Studio Detect PII transform for detecting PII data in datasets
        7. Amazon GuardDuty for detecting threats in an AWS account
        8. AWS Identity and Access Management (IAM) service
        9. Using AWS Lake Formation to manage data lake access
    8. Hands-on – configuring Lake Formation permissions
        1. Creating a new user with IAM permissions
        2. Transitioning to managing fine-grained permissions with AWS Lake Formation
    9. Summary
6. 5 Architecting Data Engineering Pipelines
    1. Join our book community on Discord
    2. Technical requirements
    3. Approaching the data pipeline architecture
        1. Architecting houses and architecting pipelines
        2. Whiteboarding as an information-gathering tool
        3. Conducting a whiteboarding session
    4. Identifying data consumers and understanding their requirements
    5. Identifying data sources and ingesting data
    6. Identifying data transformations and optimizations
        1. File format optimizations
        2. Data standardization
        3. Data quality checks
        4. Data partitioning
        5. Data denormalization
        6. Data cataloging
        7. Whiteboarding data transformation
    7. Loading data into data marts
    8. Wrapping up the whiteboarding session
    9. Hands-on – architecting a sample pipeline
        1. Detailed notes from the project "Bright Light" whiteboarding meeting of GP Widgets, Inc
    10. Summary
7. 6 Ingesting Batch and Streaming Data
    1. Join our book community on Discord
    2. Technical requirements
    3. Understanding data sources
        1. Data variety
        2. Questions to ask
    4. Ingesting data from a relational database
        1. AWS Database Migration Service (DMS)
        2. AWS Glue
        3. Other ways to ingest data from a database
        4. Deciding on the best approach for ingesting from a database
    5. Ingesting streaming data
        1. Amazon Kinesis versus Amazon Managed Streaming for Kafka (MSK)
    6. Hands-on – ingesting data with AWS DMS
        1. Deploying MySQL and an EC2 data loader via CloudFormation
        2. Creating an IAM policy and role for DMS
        3. Configuring DMS settings and performing a full load from MySQL to S3
        4. Querying data with Amazon Athena
    7. Hands-on – ingesting streaming data
        1. Configuring Kinesis Data Firehose for streaming delivery to Amazon S3
        2. Configuring Amazon Kinesis Data Generator (KDG)
        3. Adding newly ingested data to the Glue Data Catalog
        4. Querying the data with Amazon Athena
    8. Summary
8. 7 Transforming Data to Optimize for Analytics
    1. Join our book community on Discord
    2. Technical requirements
    3. Overview of how transformations can create value
        1. Cooking, baking, and data transformations
        2. Transformations as part of a pipeline
    4. Types of data transformation tools
        1. Apache Spark
        2. Hadoop and MapReduce
        3. SQL
        4. GUI-based tools
    5. Common data preparation transformations
        1. Protecting PII data
        2. Optimizing the file format
        3. Optimizing with data partitioning
        4. Data cleansing
    6. Common business use case transformations
        1. Data denormalization
        2. Enriching data
        3. Pre-aggregating data
        4. Extracting metadata from unstructured data
    7. Working with change data capture (CDC) data
        1. Traditional approaches – data upserts and SQL views
        2. Modern approaches – Open Table Formats (OTF)
    8. Hands-on – joining datasets with AWS Glue Studio
        1. Creating a new data lake zone – the curated zone
        2. Creating a new IAM role for the Glue job
        3. Configuring a denormalization transform using AWS Glue Studio
        4. Finalizing the denormalization transform job to write to S3
        5. Create a transform job to join streaming and film data using AWS Glue Studio
    9. Summary
9. 8 Identifying and Enabling Data Consumers
    1. Join our book community on Discord
    2. Technical requirements
    3. Understanding the impact of data democratization
        1. A growing variety of data consumers
        2. How a data mesh helps data consumers
    4. Meeting the needs of business users with data visualization
        1. AWS tools for business users
    5. Meeting the needs of data analysts with structured reporting
        1. AWS tools for data analysts
    6. Meeting the needs of data scientists and ML models
        1. AWS tools used by data scientists to work with data
    7. Hands-on – creating data transformations with AWS Glue DataBrew
        1. Configuring new datasets for AWS Glue DataBrew
        2. Creating a new Glue DataBrew project
        3. Building your Glue DataBrew recipe
        4. Creating a Glue DataBrew job
    8. Summary
10. 10 Orchestrating the Data Pipeline
    1. Technical requirements
    2. Understanding the core concepts for pipeline orchestration
        1. What is a data pipeline, and how do you orchestrate it?
        2. How do you trigger a data pipeline to run?
        3. How do you handle the failures of a step in your pipeline?
    3. Examining the options for orchestrating pipelines in AWS
        1. AWS Data Pipeline (now in maintenance mode)
        2. AWS Glue Workflows to orchestrate Glue resources
        3. Apache Airflow as an open-source orchestration solution
        4. Pros and cons of using MWAA
        5. AWS Step Functions for a serverless orchestration solution
        6. Pros and cons of using AWS Step Functions
        7. Deciding on which data pipeline orchestration tool to use
    4. Hands-on – orchestrating a data pipeline using AWS Step Functions
        1. Creating new Lambda functions
        2. Creating an SNS topic and subscribing to an email address
        3. Creating a new Step Functions state machine
        4. Configuring our S3 bucket to send events to EventBridge
    5. Summary

1. Cover
2. Table of contents
