Data Engineering Interview Stuff

Uploaded by

byrapaneni
Copyright
© All Rights Reserved

About Me

Sure, I’d be glad to. I’ve been working as a data engineer at SunPower since 2019.

- SunPower

- American Express GBT

In my downtime, I enjoy reading about ML, AI, tech, and robotics.

Data profiling
Data profiling is the process of analyzing and exploring data to understand how it’s structured, what it
contains, the relationships between data sets, and how it could potentially be used most effectively.
As such, data and analytics teams perform data profiling to better understand the condition and
the value of their data and to determine how best to transform it into an analytics-ready form.

Types of Data profiling


- Content discovery involves analyzing data rows for errors and systemic issues. For example,
this may involve reviewing a list of customers who don’t have valid email addresses.
- Structure discovery is necessary for making sure that data is formatted correctly and is
consistent throughout a database. Structure discovery might entail checking a list of addresses
for town names or zip codes, for example.
- Relationship discovery is used to analyze data in use and identify relationships across
spreadsheets or database tables. To illustrate, customer and order data are typically not stored
in the same table in a database. Following a transaction, the relationship between the two would
need to be discovered and linked for the data to have any value.
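
The three types above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the sample rows, the column names (customer_id, email, zip, order_id), the loose email pattern, and the 5-digit zip rule are all made up for the example and are not taken from any of the tools listed in these notes.

```python
import re

# Hypothetical sample tables; none of these rows come from the notes.
customers = [
    {"customer_id": 1, "email": "ana@example.com", "zip": "94105"},
    {"customer_id": 2, "email": "not-an-email", "zip": "9410"},
    {"customer_id": 3, "email": "", "zip": "10001"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
    {"order_id": 12, "customer_id": 7},  # no matching customer
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check
ZIP_RE = re.compile(r"^\d{5}$")  # assumption: 5-digit US zip codes


def content_discovery(rows):
    """Content discovery: ids of customers whose email fails the loose pattern."""
    return [r["customer_id"] for r in rows if not EMAIL_RE.match(r["email"])]


def structure_discovery(rows):
    """Structure discovery: ids of customers whose zip is not in the expected format."""
    return [r["customer_id"] for r in rows if not ZIP_RE.match(r["zip"])]


def relationship_discovery(customer_rows, order_rows):
    """Relationship discovery: order ids whose customer_id matches no customer row."""
    known = {r["customer_id"] for r in customer_rows}
    return [o["order_id"] for o in order_rows if o["customer_id"] not in known]


bad_emails = content_discovery(customers)
bad_zips = structure_discovery(customers)
orphan_orders = relationship_discovery(customers, orders)
```

In an interview answer, the point to draw out is that each check looks at a different level: individual values, column format, and links between tables.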

Data profiling Tools


- IBM InfoSphere
- SAP Business Objects Data Services
- Informatica Data Explorer
- Talend
- Melissa

Traditional SQL Database


- Schema on write

NoSQL Database
- Schema on read

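
A minimal sketch of the contrast, assuming the usual framing that a traditional SQL database enforces its schema when data is written, while NoSQL stores are commonly described as schema on read. It uses sqlite3 as a stand-in for a SQL database and a list of JSON strings as a stand-in for a document store. The table, the fields, and the read_users helper are hypothetical.

```python
import json
import sqlite3

# Schema on write: the table definition is enforced when a row is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (1, "ana@example.com"))

try:
    conn.execute("INSERT INTO users (id, email) VALUES (?, ?)", (2, None))
    write_rejected = False
except sqlite3.IntegrityError:
    write_rejected = True  # the NOT NULL constraint blocked the bad row

# Schema on read: raw documents are stored as-is; structure is applied when reading.
raw_store = [
    '{"id": 1, "email": "ana@example.com"}',
    '{"id": 2}',  # accepted at write time even though email is missing
]


def read_users(docs):
    """Apply the expected shape at read time, filling missing fields with None."""
    return [{"id": d.get("id"), "email": d.get("email")} for d in map(json.loads, docs)]


users = read_users(raw_store)
```

The trade-off this illustrates: schema on write catches bad rows early, while schema on read accepts anything and pushes the handling of missing fields to whoever reads the data.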
Hungarian Matching
Stable Marriage Problem

TR:
Missing attachment
Content Filtering:
- Features
Collaborative Filtering
-
Compliance Filtering
- Contractual agreement with the airlines
- Hotels

What do you want to be?


Traveler DNA to reduce the call time from 11.5 to 7.2 minutes. Each minute saved is worth $1 million.
History:
- Time of shopping
- IP
- Profile

- Stated information
- Implied information: language / currency / tax status
- Observed information

Recommendation System
What’s the next thing to buy?

What is Cohort Analysis?


- A tool to measure user engagement over time
- A subset of behavioral analytics
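
One common way to measure engagement over time is a retention table: group users by the month of their first activity (the cohort), then count how many are still active in each later month. The sketch below assumes that monthly definition. The event log and the cohort_retention helper are hypothetical and only meant to show the shape of the calculation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical activity log: (user_id, date of activity).
events = [
    ("u1", date(2024, 1, 5)), ("u1", date(2024, 2, 9)), ("u1", date(2024, 3, 1)),
    ("u2", date(2024, 1, 20)), ("u2", date(2024, 3, 15)),
    ("u3", date(2024, 2, 2)), ("u3", date(2024, 3, 30)),
]


def month_index(d):
    """Turn a date into a running month count so month offsets are easy to compute."""
    return d.year * 12 + d.month


def cohort_retention(events):
    """Map (cohort month, months since first activity) -> set of active users."""
    first_seen = {}
    for user, day in sorted(events, key=lambda e: e[1]):
        first_seen.setdefault(user, day)  # earliest activity defines the cohort
    table = defaultdict(set)
    for user, day in events:
        start = first_seen[user]
        cohort = (start.year, start.month)
        offset = month_index(day) - month_index(start)
        table[(cohort, offset)].add(user)
    return table


table = cohort_retention(events)
jan_size = len(table[((2024, 1), 0)])              # users who started in January
jan_month1 = len(table[((2024, 1), 1)]) / jan_size  # share active one month later
jan_month2 = len(table[((2024, 1), 2)]) / jan_size  # share active two months later
```

Reading across a cohort's row shows how engagement holds up as that group ages, which is what separates cohort analysis from a single overall active-user count.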
