
Datanest - Data Science Interview

This document provides information and questions for an interview for a data science internship at Datanest. It includes 10 problems covering data cleansing, data analysis, data storytelling, structured thinking, data solution implementation, computer vision, NLP, Bayesian statistics, frequentist statistics, and feature engineering. Candidates are asked to provide concise answers to the problems in slide format by the 1 July deadline. The interview assesses problem-solving skills and various data science research capacities.

Data Science Internship

Interview Preparation
Deadline: 1 July 2019
(Please answer in slide format and put your full name on the first slide of your answers.)
Notes
Deadline: 1 July 2019

The sooner you apply, the sooner you can be interviewed, but please prepare your answers carefully.
Table of Contents

1. Data Cleansing
2. Data Analysis
3. Data Storytelling
4. Structured Thinking
5. Data Solution Implementation
6. Optional Research Capacity : Computer Vision
7. Optional Research Capacity : NLP
8. Optional Research Capacity : Bayesian Statistics
9. Optional Research Capacity : Frequentist Statistics
10. Optional Research Capacity : Feature Engineering
Clue
At Datanest, we believe problem solving is the key to any data science activity.

Please solve each problem efficiently, and minimize the effort spent on creating synthetic data, code, visualizations, etc. Bring the simplest answer that you can defend effectively to both technical and non-technical audiences.

The problems are not hard, but they require you to be resourceful and to understand each problem well.

Numbering starts from “Dataset 2”, so there is no “Dataset 1” in this kit.


Optional
There are 5 research capacities that we assess:

1. Research Capacity : Computer Vision
2. Research Capacity : NLP
3. Research Capacity : Bayesian Statistics
4. Research Capacity : Frequentist Statistics
5. Research Capacity : Feature Engineering

You can work on all 5 of them, but your score is determined by the 3 topics with the highest scores.
Dataset 2

Here is a sample of the dataset:


Dataset 3
[Charts not reproduced: a left chart and a right chart; surviving axis label: “Number of Buyers”]
Dataset 4

Phone Number    Status
085674872274    Real
085612341234    Unreal
081243579357    Real
081328648738    Real
081122334455    Unreal
081234567890    Unreal
081726842689    Real
Problem 1: Data Cleansing

1. In Dataset 2, how would you transform the description column to make it easier to analyze?
2. If the `label` column is empty in 10 million rows, what would you do to fill in the missing data?
3. What would you do to deal with abbreviations and misspelled words?
4. How would you deal with imbalanced classes, outliers, and rare data?
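For questions 1 and 3, one common approach is to normalize the free-text description before analysis: lowercase it, tokenize it, expand known abbreviations from a lookup table, and snap misspelled tokens to the closest entry in a reference vocabulary. The sketch below uses only Python's standard library (`re` and `difflib.get_close_matches`). The abbreviation table, the vocabulary, and the 0.8 similarity cutoff are hypothetical placeholders, not values taken from Dataset 2.

```python
import re
from difflib import get_close_matches

# Hypothetical lookup tables; in practice these would be built from the data.
ABBREVIATIONS = {"gud": "good", "dlvry": "delivery", "pls": "please"}
VOCABULARY = ["good", "quality", "fast", "delivery", "please", "price"]


def normalize_description(text, abbreviations=ABBREVIATIONS,
                          vocabulary=VOCABULARY, cutoff=0.8):
    """Return a cleaned, space-joined version of a free-text description."""
    # Lowercase and keep only alphanumeric tokens (drops punctuation).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    cleaned = []
    for token in tokens:
        # Step 1: expand known abbreviations.
        token = abbreviations.get(token, token)
        # Step 2: if the token is still unknown, replace it with the closest
        # vocabulary word, but only when the similarity ratio >= cutoff.
        if token not in vocabulary:
            match = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
            if match:
                token = match[0]
        cleaned.append(token)
    return " ".join(cleaned)


print(normalize_description("Gud qualty, fast dlvry!!"))
# prints: good quality fast delivery
```

On millions of rows, the same tokens repeat heavily, so caching the per-token correction (for example with `functools.lru_cache`) is usually cheaper than calling `get_close_matches` on every occurrence.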
Problem 2: Data Analysis

1. What is the difference between bias and variance?
2. How do you know whether one machine learning algorithm is better than another in terms of accuracy, reliability, and scalability?
3. What is the difference between closed-form and non-closed-form solutions?
4. What is the difference between features, parameters, and variables?
5. What is the difference between survival analysis, time series analysis, classification, recommendation engines, and clustering in terms of input and output?
6. What are the differences between hold-out validation, cross-validation, and bootstrapping?
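To make question 6 concrete, the sketch below shows only how each scheme chooses training and evaluation indices; model fitting is left out. Hold-out uses one fixed split, k-fold cross-validation tests every row exactly once across k splits, and bootstrapping trains on a sample drawn with replacement and evaluates on the out-of-bag rows. The function names, the 20% test fraction, and the seeds are illustrative assumptions.

```python
import random


def holdout_split(n, test_fraction=0.2, seed=0):
    """Shuffle indices once and reserve a fixed fraction for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train, test)


def kfold_splits(n, k=5, seed=0):
    """Yield (train, test) index lists; every index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap_split(n, seed=0):
    """Sample n indices with replacement; the unsampled (out-of-bag) indices form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    oob = [i for i in range(n) if i not in chosen]
    return train, oob


train, test = holdout_split(10)
print(len(train), len(test))  # prints: 8 2
```

The trade-off shows up in the loops: hold-out fits one model, k-fold fits k models but averages over every row, and bootstrapping can be repeated many times to estimate the variance of a metric.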
Problem 3: Data Storytelling

1. Based on the left chart of Dataset 3 (Slide 7), how many people who came in May 2018 were still coming in July 2018?
2. What data do you need to make the charts in Dataset 3?
3. How would you create the left chart in Dataset 3?
4. How would you create the right chart in Dataset 3?
5. If we build a chart based on the left chart in the data, what chart would you need to make?
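The chart images are not preserved here, but question 1 reads like a cohort view, so the sketch below assumes the left chart groups buyers by the month of their first purchase and counts how many of them buy again in later months. Under that assumption, the minimum raw data is one row per purchase with a buyer identifier and a month. The purchase log is entirely hypothetical and is not taken from Dataset 3.

```python
from collections import defaultdict


def cohort_counts(purchases):
    """Count distinct buyers per (cohort month, activity month).

    `purchases` is an iterable of (buyer_id, "YYYY-MM") pairs; a buyer's
    cohort is the month of their first purchase.
    """
    purchases = list(purchases)  # iterated twice below
    first_month = {}
    for buyer, month in purchases:
        # "YYYY-MM" strings compare chronologically, so "<" finds the earliest.
        if buyer not in first_month or month < first_month[buyer]:
            first_month[buyer] = month
    active = defaultdict(set)
    for buyer, month in purchases:
        active[(first_month[buyer], month)].add(buyer)
    return {key: len(buyers) for key, buyers in active.items()}


# Hypothetical purchase log (not taken from Dataset 3).
log = [
    ("a", "2018-05"), ("b", "2018-05"), ("c", "2018-05"),
    ("a", "2018-06"), ("b", "2018-06"),
    ("a", "2018-07"), ("d", "2018-06"), ("d", "2018-07"),
]
counts = cohort_counts(log)
print(counts[("2018-05", "2018-07")])  # prints: 1 (only buyer "a" returns in July)
```

Laying these counts out with cohorts as rows and activity months as columns gives the usual cohort table; dividing each row by its first-month count turns it into retention rates.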
Problem 4: Structured Thinking

1. Based on Dataset 4, what pattern determines whether a number is real or unreal?
2. Write pseudocode to determine whether a number is real or unreal.

(Clue: you may write more than one version of the pseudocode.)
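One possible reading of the seven rows is that every “Unreal” number contains an artificial-looking digit pattern after the operator prefix: a repeated block (1234 1234), doubled digits (22 33 44 55), or a long ascending run (34567890). None of the “Real” numbers contain such a pattern. The Python sketch below encodes that hypothesis as three rules and can double as executable pseudocode. The 4-digit prefix length and the minimum run length of 4 are assumptions fitted to these seven rows only, so they would need validation on more data.

```python
def longest_ascending_run(digits):
    """Length of the longest run where each digit is the previous digit + 1 (mod 10, so 9 -> 0 counts)."""
    best = run = 1
    for prev, cur in zip(digits, digits[1:]):
        run = run + 1 if (int(cur) - int(prev)) % 10 == 1 else 1
        best = max(best, run)
    return best


def is_doubled(digits):
    """True if the digits come in identical pairs, e.g. 22334455."""
    return len(digits) % 2 == 0 and all(
        digits[i] == digits[i + 1] for i in range(0, len(digits), 2))


def has_repeated_halves(digits):
    """True if the first half equals the second half, e.g. 12341234."""
    half = len(digits) // 2
    return len(digits) % 2 == 0 and digits[:half] == digits[half:]


def classify(number, prefix_len=4, min_run=4):
    """Label a phone number "Real" or "Unreal" using the heuristic rules above."""
    # Assumption: the first 4 digits are the operator prefix and are ignored.
    subscriber = number[prefix_len:]
    if (longest_ascending_run(subscriber) >= min_run
            or is_doubled(subscriber)
            or has_repeated_halves(subscriber)):
        return "Unreal"
    return "Real"


DATASET_4 = {
    "085674872274": "Real",
    "085612341234": "Unreal",
    "081243579357": "Real",
    "081328648738": "Real",
    "081122334455": "Unreal",
    "081234567890": "Unreal",
    "081726842689": "Real",
}

for number, status in DATASET_4.items():
    print(number, classify(number), status)
```

All three rules agree with the seven labelled rows, but a rule set this small can easily overfit; descending runs or repeated single digits are natural candidates to test next.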


Problem 5: Data Solution Implementation
1. What are the differences between Business Intelligence and Data Science in terms of (a) business questions, (b) analytic characteristics, (c) analytic engagement processes, (d) data models, and (e) business view?
2. List the benefits that a data lake could bring to an organization's existing data warehousing environment, business analysts, and data scientists.
3. What issues are preventing companies from migrating to cloud solutions?
4. List the cultural changes that organizations must address if they want to become data driven and leverage big data to its maximum business potential, and describe what organizations need in order to address those challenges.
5. Select two outward-facing BI dashboards that can be checked daily or weekly (one example from the retail industry, one from the financial industry). What is the most important insight each should display?
Problem 6: Computer Vision
1. Describe the steps required to build a proper object detection engine.
2. What is the difference between Semantic Segmentation, Object Detection, Image Generation, and Pose Estimation in terms of input, output, and label?
3. In YOLO (https://pjreddie.com/darknet/yolo/), there are 5 types of loss function. Can you explain them?
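For question 3, the five terms usually cited come from the loss of the original YOLO (v1) paper by Redmon et al., reproduced below as a reference sketch. Here $S^2$ is the number of grid cells, $B$ the number of boxes predicted per cell, $\mathbb{1}_{ij}^{obj}$ indicates that box $j$ in cell $i$ is responsible for an object, $\mathbb{1}_{ij}^{noobj}$ is its complement, and each plain/hatted pair ($x, y, w, h, C, p(c)$) holds a predicted and a ground-truth value. The paper sets $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$. Later versions, including YOLOv3 on the linked Darknet page, replace some squared-error terms (for example, with binary cross-entropy for class predictions), so this is the v1 formulation only.

```latex
\begin{aligned}
\mathcal{L} ={} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 & + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
  \left[ \left( \sqrt{w_i} - \sqrt{\hat{w}_i} \right)^2 + \left( \sqrt{h_i} - \sqrt{\hat{h}_i} \right)^2 \right] \\
 & + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
 & + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
 & + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

Read line by line, the five terms are: box-center error, box-size error (square roots so that errors on small boxes weigh more), confidence error for boxes that contain an object, down-weighted confidence error for boxes that do not, and classification error for cells that contain an object.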
Problem 7: NLP
1. Explain the differences (pros and cons) between building a chatbot with NLTK, Seq2seq, and the Rasa framework.
2. What are the differences between TF-IDF, Cosine Similarity, and FastText in terms of text-based feature engineering?
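For question 2, it can help to see that TF-IDF produces feature vectors, whereas cosine similarity is a way of comparing two vectors rather than a feature in itself. The stdlib-only sketch below builds TF-IDF vectors with the common `tf * log(N / df)` weighting (one of several variants) and compares them with cosine similarity. FastText is not shown, because it learns dense subword embeddings from a trained model rather than from a formula. The example documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


docs = ["the cat sat", "the dog sat", "stock prices fell"]
vecs = tfidf_vectors(docs)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # about 0.214: shared "the", "sat"
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no shared terms
```

Because TF-IDF only matches exact tokens, "cat" and "kitten" count as unrelated; subword or embedding approaches such as FastText are typically used to close that gap.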
Problem 8: Bayesian Statistics
1. What are the differences between Bayesian and Frequentist statistics?
2. What types of Bayesian statistics are available in Ludwig (https://uber.github.io/ludwig/), and what are their inputs and outputs?
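For question 1, a small worked case can make the contrast concrete: estimating a conversion rate. The frequentist maximum-likelihood estimate is simply successes divided by trials. The Bayesian approach combines a prior with the data; with a Beta prior and binomial data, the posterior is again a Beta distribution (a conjugate update). The sketch below is generic Python and does not use or describe Ludwig's API. The uniform Beta(1, 1) prior and the 7-out-of-10 data are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + (trials - successes)


def compare_estimates(successes, trials, alpha=1.0, beta=1.0):
    """Return (frequentist MLE, Bayesian posterior mean) for a conversion rate."""
    mle = successes / trials
    post_a, post_b = beta_binomial_update(alpha, beta, successes, trials)
    posterior_mean = post_a / (post_a + post_b)
    return mle, posterior_mean


# Hypothetical data: 7 conversions out of 10 visitors, uniform Beta(1, 1) prior.
mle, post_mean = compare_estimates(7, 10)
print(mle, round(post_mean, 4))  # prints: 0.7 0.6667
```

With little data the prior pulls the Bayesian estimate toward 0.5; as the number of trials grows, the two estimates converge, which is one practical way to explain the difference to a non-technical audience.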
Problem 9: Frequentist Statistics
1. You have multiple ads to test in a campaign. Explain the steps of the experiment using (a) A/B testing and (b) a multi-armed bandit.
2. What is the difference between panel data analysis, longitudinal data analysis, multilevel statistical models, and structural equation modeling in terms of dataset requirements?
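For question 1, the key procedural difference is when traffic allocation changes. An A/B test splits traffic evenly for a fixed sample and evaluates once at the end, here with a two-sided two-proportion z-test. A multi-armed bandit re-allocates traffic after every impression. The sketch uses Thompson sampling, one bandit strategy among several (epsilon-greedy and UCB are common alternatives). The click counts and the "true" click rates in the simulation are hypothetical.

```python
import math
import random
from statistics import NormalDist


def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rates (A/B testing)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Simulate a Bernoulli multi-armed bandit; return how often each ad was shown."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Draw a plausible rate for each ad from its Beta(1 + s, 1 + f) posterior
        # and show the ad whose draw is highest.
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        arm = max(range(len(samples)), key=samples.__getitem__)
        pulls[arm] += 1
        # Simulated user response; in production this would be the observed click.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


z, p = two_proportion_ztest(200, 1000, 250, 1000)
print(round(z, 2), round(p, 4))
print(thompson_sampling([0.02, 0.05, 0.10]))
```

The A/B test gives a clean significance statement but shows weaker ads to half the traffic until it ends; the bandit shifts impressions toward the better ad while still learning, at the cost of a less straightforward statistical readout.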
Problem 10: Feature Engineering
1. What are the differences between LabelCount, Target, NaN, Polynomial, Consolidation, and Expansion encoding?
2. What are the differences between standard (Z), MinMax, root, and log scaling?
3. Please list 5 feature engineering techniques for address data.
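The sketch below shows the four scaling methods from question 2 and two of the encodings from question 1 as plain Python functions. `count_encode` is frequency (count) encoding, which is related to, but not necessarily identical to, the LabelCount encoding named in the question. `target_encode` replaces each category with its mean target, with an optional smoothing term toward the global mean (the smoothing form is one common choice, not the only one). All inputs are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def z_scale(xs):
    """Standard (Z) scaling: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in xs]


def minmax_scale(xs):
    """Rescale linearly to [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def root_scale(xs):
    """Square-root transform; compresses large values (requires xs >= 0)."""
    return [math.sqrt(x) for x in xs]


def log_scale(xs):
    """log(1 + x) transform; compresses heavy right tails (requires xs > -1)."""
    return [math.log1p(x) for x in xs]


def count_encode(categories):
    """Replace each category with how often it occurs."""
    counts = Counter(categories)
    return [counts[c] for c in categories]


def target_encode(categories, targets, smoothing=0.0):
    """Replace each category with the (optionally smoothed) mean target of that category."""
    global_mean = mean(targets)
    sums, counts = Counter(), Counter()
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    return [(sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
            for c in categories]


print(minmax_scale([1, 2, 3]))  # [0.0, 0.5, 1.0]
print(count_encode(["a", "b", "a"]))  # [2, 1, 2]
print(target_encode(["a", "b", "a"], [1, 0, 0]))  # [0.5, 0.0, 0.5]
```

Z and MinMax scaling are linear, so they change units but not the shape of a distribution; root and log transforms are non-linear and reduce skew. In practice, target encoding is computed on training folds only, to avoid leaking the label into the features.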
Closing
Great things happen to those who don't stop believing, trying, learning, and being grateful.
