GOOGLE DATA SCIENCE INTERVIEW QUESTIONS
WHAT ARE THE ASSUMPTIONS ON THE ERROR TERMS IN LINEAR REGRESSION?
Independence of Errors - The error terms should be
independent of each other. This means that there should be
no correlation between consecutive errors (no
autocorrelation). This assumption is often tested using the
Durbin-Watson test in time series data.
Homoscedasticity - The variance of the error terms should
remain constant across all levels of the independent
variables. If the variance of the errors increases or
decreases (heteroscedasticity), coefficient estimates become
inefficient and standard errors unreliable.
Normality of Errors - The error terms should be normally
distributed, especially for hypothesis testing (e.g., t-tests on
coefficients). This assumption is crucial when constructing
confidence intervals and p-values.
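In practice, all three assumptions can be checked on the residuals of a fitted model. Here is a minimal sketch using statsmodels and scipy; the synthetic X and y are placeholders for your own data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy.stats import shapiro

# Placeholder data; in practice X and y come from your own dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=200)

model = sm.OLS(y, sm.add_constant(X)).fit()
resid = model.resid

# Independence: a Durbin-Watson statistic near 2 suggests no autocorrelation.
print("Durbin-Watson:", durbin_watson(resid))

# Homoscedasticity: Breusch-Pagan test; a low p-value signals heteroscedasticity.
_, bp_pvalue, _, _ = het_breuschpagan(resid, model.model.exog)
print("Breusch-Pagan p-value:", bp_pvalue)

# Normality: Shapiro-Wilk test on the residuals; a low p-value signals non-normality.
print("Shapiro-Wilk p-value:", shapiro(resid).pvalue)
```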
WHAT IS THE FUNCTION OF P-VALUES IN HIGH-DIMENSIONAL LINEAR REGRESSION?
P-values are used to test the null hypothesis that a specific
regression coefficient (for a predictor) is zero. A low p-value
suggests that the predictor is statistically significant,
meaning it likely has an effect on the response variable.
In high-dimensional models, testing many predictors
increases the chance of false positives (Type I errors),
meaning some predictors might appear significant purely by
chance. Traditional p-values need to be adjusted (e.g.,
Bonferroni correction, FDR methods) to account for this.
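As a sketch, statsmodels provides both adjustments directly; the raw p-values below are purely illustrative:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Illustrative raw p-values from testing many coefficients.
pvals = np.array([0.001, 0.008, 0.039, 0.041, 0.20, 0.74])

# Bonferroni controls the family-wise error rate (conservative).
reject_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")

# Benjamini-Hochberg controls the false discovery rate (less conservative).
reject_fdr, p_fdr, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")

print("Bonferroni-adjusted:", p_bonf)
print("BH (FDR)-adjusted:  ", p_fdr)
```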
High-dimensional data often has strong multicollinearity,
meaning many predictors are highly correlated. This can
cause unstable estimates of regression coefficients, leading
to unreliable p-values, so highly correlated features should
be removed or consolidated before interpreting significance.
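One simple pre-filter is to drop one feature from each highly correlated pair. A minimal pandas sketch (the 0.9 threshold is an arbitrary choice):

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from each pair with |correlation| above threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```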
LET’S SAY YOU HAVE A CATEGORICAL VARIABLE
WITH THOUSANDS OF DISTINCT VALUES, HOW
WOULD YOU ENCODE IT?
Leave-One-Out Encoding - A variation of target encoding
that computes the target mean for each category but
excludes the current observation to avoid target leakage.
Pros: Reduces target leakage, works well with high-cardinality
features.
Cons: Computationally more expensive than simple target
encoding.
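A minimal pandas sketch of leave-one-out encoding (column names are illustrative):

```python
import pandas as pd

def leave_one_out_encode(df: pd.DataFrame, cat_col: str, target_col: str) -> pd.Series:
    """Encode each row as the mean target of all OTHER rows in its category."""
    grp = df.groupby(cat_col)[target_col]
    sums = grp.transform("sum")
    counts = grp.transform("count")
    # Subtract the current row's target so it never leaks into its own encoding.
    loo = (sums - df[target_col]) / (counts - 1)
    # Singleton categories have no other rows (0/0 -> NaN); fall back to the global mean.
    return loo.fillna(df[target_col].mean())
```

At scoring time, held-out rows are typically encoded with the plain per-category means computed on the training set, since their targets are unknown.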
Embedding-Based Encoding - For extremely high cardinality
categorical features, embedding-based approaches are
often effective. This technique learns a dense vector
representation for each category, typically using a neural
network trained on the downstream task.
Pros: Captures latent structure.
Cons: More complex to implement.
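A minimal PyTorch sketch, assuming the category values have already been mapped to integer ids (the sizes and model are illustrative):

```python
import torch
import torch.nn as nn

NUM_CATEGORIES, EMB_DIM = 50_000, 16  # illustrative sizes

class TabularModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Each integer category id maps to a learnable dense vector.
        self.embedding = nn.Embedding(NUM_CATEGORIES, EMB_DIM)
        self.head = nn.Linear(EMB_DIM, 1)

    def forward(self, category_ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embedding(category_ids))

model = TabularModel()
# A batch of 4 category ids; the embeddings are trained jointly with the task.
print(model(torch.tensor([3, 17, 42, 3])).shape)  # torch.Size([4, 1])
```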
DESCRIBE TO ME HOW PCA WORKS
PCA is a dimensionality reduction technique used when you
have correlated features or noisy data, or when you want to
visualize data in fewer dimensions.
To perform PCA, you normalize the features, compute the
covariance matrix (which indicates whether one variable
increases or decreases when another does), and find its
eigenvectors (the directions in which the data is most spread
out) and eigenvalues (the amount of variance along each
direction). Projecting the data onto the top eigenvectors then
gives the reduced representation, as in the sketch below.
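A minimal NumPy sketch of these steps, assuming rows are observations and columns are features:

```python
import numpy as np

def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project X (rows = observations) onto its top principal components."""
    # 1. Normalize each feature to zero mean and unit variance.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized features.
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigenvectors give the directions of greatest spread;
    #    eigenvalues give the variance captured along each one.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order, so sort descending.
    order = np.argsort(eigvals)[::-1][:n_components]
    # 4. Project the data onto the top components.
    return X_std @ eigvecs[:, order]
```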
PCA assumes variables are linearly related, so it cannot
capture non-linear relationships. Also, each new dimension is
a linear combination of the original features, so
interpretation becomes harder.
THANK YOU