Data Pipelines Explained
00:00
Let's talk about data pipelines: what they are, and when and how they're used. I want to start with a simple idea. Most of us are fortunate enough to turn on the tap whenever we like and have fresh, clean water come out. But have you ever thought about how that water actually gets to you? Well, water starts out in our lakes, our oceans,
00:30
and even our rivers. But most of us probably wouldn't drink straight from a lake, right? We have to treat and transform this water into something that's safe for us to use. We do this using treatment facilities, and we get the water from where it is to where it needs to go using water pipelines.
01:00
Now, once that water has gotten from the source to the treatment plants, it's cleansed and made safe to use, and then it's sent out through even more pipelines to where we need it. We use it in a few different places: we need it for drinking water, we need it for cleaning,
01:30
and we also need it for agriculture. So we use even more pipelines to get this water to where it's needed. Okay, so as you can see, water pipelines take water from where it is to where it's needed. Now, we can start to think about data in organizations in a very similar way. Data in an organization starts out in
02:00
data lakes. It's in different databases that are part of different SaaS applications. Some applications are on-premises. And then we also have streaming data, which is kind of like our river here. This can be data that is coming in in real time; an example could be sensor data from factories, where data is being collected every second
02:30
and sent back up to our repositories. Just like our water sources, this data is dirty. It's contaminated, and it must be cleaned and transformed before it's useful in helping us make business decisions. So how do we do this work? We do it using not water pipelines, but data pipelines.
03:00
So when we talk about data pipelines, we have a few different processes that can help us handle the task of transforming and cleaning this data: we can use ETL, we can use data replication, and we can also use something called data virtualization.
03:30
Okay, so one of the most common processes is ETL, which stands for extract, transform, and load, and it does exactly what it sounds like. It extracts data from where it is; it transforms it by cleaning up mismatched data, taking care of missing values, getting rid of duplicate data, and making sure the right columns are there; and then it loads the result into
04:00
a landing repository for ready-to-use business data. An example of one of these repositories could be an enterprise data warehouse. Most of the time we use something called batch processing, which means that on a given schedule we pull data into our ETL tool and then load it to where it needs to be.
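To make that concrete, here's a minimal sketch of a batch ETL job in Python. The database files, the orders table, and the column names are all hypothetical stand-ins; a real pipeline would typically run inside a dedicated ETL tool or orchestrator, but the extract, transform, and load steps look the same.

```python
import sqlite3
import pandas as pd

# Hypothetical databases standing in for a real operational
# source system and an enterprise data warehouse.
SOURCE_DB = "operational.db"
WAREHOUSE_DB = "warehouse.db"

def extract() -> pd.DataFrame:
    # Extract: pull the raw data from where it lives.
    with sqlite3.connect(SOURCE_DB) as conn:
        return pd.read_sql("SELECT * FROM orders", conn)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Transform: drop duplicates, handle missing values, and make
    # sure the columns we expect are actually there.
    df = df.drop_duplicates()
    df["amount"] = df["amount"].fillna(0.0)
    expected = {"order_id", "customer_id", "amount"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df

def load(df: pd.DataFrame) -> None:
    # Load: write the cleaned data into the landing repository.
    with sqlite3.connect(WAREHOUSE_DB) as conn:
        df.to_sql("orders_clean", conn, if_exists="replace", index=False)

if __name__ == "__main__":
    # In batch processing this would run on a schedule,
    # e.g. from cron or an orchestrator, rather than by hand.
    load(transform(extract()))
```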
04:30
But we could also have stream ingestion, which supports the streaming data I mentioned earlier: it continuously takes data in, transforms it, and continuously loads it to where it needs to be.
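As a toy illustration of stream ingestion, here's a sketch where a Python generator stands in for a real message stream (say, the factory sensor feed from earlier); each reading is transformed and loaded the moment it arrives rather than on a schedule. All of the names here are made up for the example.

```python
import random
import time
from typing import Iterator, Optional

def sensor_stream() -> Iterator[dict]:
    # Hypothetical stand-in for a real message stream (e.g. a Kafka
    # topic) of factory sensor readings arriving every second.
    while True:
        yield {"sensor_id": 7, "temp_c": random.gauss(70, 5), "ts": time.time()}
        time.sleep(1)

def transform(reading: dict) -> Optional[dict]:
    # Drop physically implausible readings instead of loading dirty data.
    if not -40 <= reading["temp_c"] <= 150:
        return None
    return {**reading, "temp_c": round(reading["temp_c"], 1)}

def load(reading: dict) -> None:
    # Stand-in for a continuous write to the downstream repository.
    print("loading:", reading)

# Unlike a batch job, this loop runs continuously.
for raw in sensor_stream():
    clean = transform(raw)
    if clean is not None:
        load(clean)
```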
Okay, now another tool we might see is data replication. What this involves is continuously replicating and copying data into another repository before it's loaded or used by our use
05:00
case. So we could have a repository here in the middle that copies data from our source into it. Why would we do that? Well, one reason could be that the application or use case that needs this data requires a really high-performance back end, and it's possible that our source system can't support something like that. Another reason could be backup and
05:30
disaster recovery: in the situation where our source data goes offline for some reason, we still have this backup to keep running our business processes against.
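Here's a naive sketch of what continuous replication can look like, assuming a source table with a monotonically increasing id column; real replication tools usually capture changes from the database's transaction log, but the idea of keeping a second, queryable copy current is the same. The database and table names are hypothetical.

```python
import sqlite3

# Hypothetical databases: a primary source, and a replica that serves
# high-performance reads and acts as a backup if the source goes offline.
SOURCE_DB = "primary.db"
REPLICA_DB = "replica.db"

def replicate_new_rows() -> None:
    # Copy any rows the replica hasn't seen yet, keyed on the id column.
    with sqlite3.connect(SOURCE_DB) as src, sqlite3.connect(REPLICA_DB) as dst:
        dst.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, amount REAL)"
        )
        last_id = dst.execute("SELECT COALESCE(MAX(id), 0) FROM orders").fetchone()[0]
        rows = src.execute(
            "SELECT id, amount FROM orders WHERE id > ?", (last_id,)
        ).fetchall()
        dst.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)

# Run this continuously (or on a tight interval) so the replica stays current.
```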
Okay, so the last process I want to touch on is data virtualization. All of the methods I've described so far require you to copy data from where it is and move it into another repository. But what if we want to test out a new
06:00
data use case and don't want to go through a large data transformation project? In that case, we can use a technology called data virtualization to simply virtualize access to our data sources, querying them in real time, only when we need them, without copying anything over. Once we're happy with the outcome of our test use case, we can go back and build out formal data pipelines. So data virtualization technology allows
06:30
us to access all these disparate data sources without having to build out permanent data pipelines. Once we're satisfied with the results of our data virtualization project, we can build a formal data pipeline that can support the massive amounts of data that we need in a production use case.
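As a rough sketch of the idea (not any particular data virtualization product), here's a toy federated query in Python that reads two hypothetical sources in place at query time and joins them in memory, without copying anything into a new repository.

```python
import sqlite3
import pandas as pd

def query_virtual_view() -> pd.DataFrame:
    # Query both sources in place, at request time, and join the
    # results in memory -- nothing is copied into a new repository.
    with sqlite3.connect("crm.db") as conn:  # hypothetical on-prem database
        customers = pd.read_sql("SELECT customer_id, region FROM customers", conn)
    orders = pd.read_csv("exports/orders.csv")  # hypothetical SaaS export
    return orders.merge(customers, on="customer_id")

# If the test use case pans out, this ad hoc view would be replaced by
# a formal, permanent data pipeline built to production scale.
```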
07:00
Now, unfortunately, we haven't figured out a way to virtualize water, but we can definitely do it with the data in our organizations. Okay, so after we've used all these different processes to get data ready for analysis or for different applications, we can start using it. What are the different ways we can use this data? Well, we might need it for our business intelligence platforms, which
07:30
are needed for different types of reporting. We might also need it for machine learning use cases. Machine learning requires tons and tons of high-quality data, so we use these data pipeline tools to feed our machine learning algorithms: the clean data can be fed into our machine learning models to help us start making better and smarter decisions in our business.
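For instance, here's a minimal, hypothetical sketch of that last step: reading a cleaned table that a pipeline produced and feeding it straight into a model. The table and column names are invented for the example.

```python
import sqlite3
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Read a cleaned, pipeline-produced table (names are hypothetical).
with sqlite3.connect("warehouse.db") as conn:
    df = pd.read_sql("SELECT amount, churned FROM customer_features", conn)

model = LogisticRegression()
model.fit(df[["amount"]], df["churned"])  # clean features in, model out
```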
08:00
Okay, so as we can see, data pipelines take data from data producers and deliver it to data consumers. Thank you! If you have questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe.