In an IoT sensor network monitoring environmental temperatures, sensors often
fail during extreme weather conditions, so temperature readings are missing not at
random (MNAR). Which imputation method best addresses the bias from
these failures?
1. Fill missing readings with seasonal averages derived from historical sensor data.
2. Apply multiple imputation with MICE, estimating missing temperature values from
correlated variables.
3. Use forward-fill to propagate the previous reading across missing intervals.
4. Apply simple linear interpolation between neighbouring points to impute missing
sensor values.
5. Remove records with missing readings to maintain a complete dataset.
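For reference, a minimal sketch of MICE-style multiple imputation in Python, assuming scikit-learn's IterativeImputer as the chained-equations imputer and hypothetical sensor columns (temperature, humidity, pressure):

# Hedged sketch: MICE-style imputation with scikit-learn's IterativeImputer.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates the estimator)
from sklearn.impute import IterativeImputer

# Hypothetical readings: temperature is missing where sensors failed,
# but correlated variables (humidity, pressure) were still recorded.
df = pd.DataFrame({
    "temperature": [21.5, np.nan, 35.2, np.nan, 18.9],
    "humidity":    [0.62, 0.40, 0.31, 0.28, 0.71],
    "pressure":    [1013, 996, 990, 988, 1018],
})

imputer = IterativeImputer(random_state=0, sample_posterior=True)  # chained-equations style
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)
print(imputed)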
You oversee an ETL pipeline that extracts data from multiple source systems with
loosely controlled schemas. Occasionally, these sources introduce new columns or
alter data types unexpectedly, which can affect downstream processes. Which
design strategy best supports seamless schema evolution?
1. Adopt a metadata-driven ETL framework with dynamic schema detection.
2. Manually update transformation logic for each schema change.
3. Configure the pipeline to log errors and skip records with schema mismatches.
4. Freeze the schema and require all sources to conform.
5. Rebuild the ETL pipeline after every detected change.
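To make the metadata-driven idea concrete, here is a minimal sketch (hypothetical column metadata, not a specific framework) of dynamic schema detection: each incoming batch is compared against registered metadata and drift is surfaced instead of being hard-coded into transformation logic.

import pandas as pd

# Hypothetical registered schema for one source system.
expected_schema = {"order_id": "int64", "amount": "float64", "region": "object"}

def detect_schema_drift(batch: pd.DataFrame, expected: dict) -> dict:
    """Report new columns, missing columns, and dtype changes versus the registered schema."""
    actual = {col: str(dtype) for col, dtype in batch.dtypes.items()}
    return {
        "new_columns": sorted(set(actual) - set(expected)),
        "missing_columns": sorted(set(expected) - set(actual)),
        "dtype_changes": {c: (expected[c], actual[c])
                          for c in expected if c in actual and actual[c] != expected[c]},
    }

batch = pd.DataFrame({"order_id": [1, 2], "amount": ["19.99", "5.00"], "channel": ["web", "app"]})
print(detect_schema_drift(batch, expected_schema))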
Imagine you are given a DataFrame that contains separate columns for
'morning_sales', 'afternoon_sales', and 'evening_sales'. To analyze sales trends
over the course of the day, you need to transform this data into a long format
with two columns: one for the time period and another for sales figures. What is
an effective approach to perform this transformation in Pandas?
1. Reindex the DataFrame and perform string-based splitting on column names to
create new rows.
2. Use groupby() to aggregate the sales sums, thereby merging the different time
columns.
3. Concatenate separate DataFrames created from each sales column manually.
4. Use the melt() function to consolidate the time-specific columns into 'time' and 'sales'
columns.
5. Apply pivot_table() to rotate the data into a long format without specifying index
values.
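For reference, a minimal sketch of the melt() approach, using the column names from the question plus a hypothetical 'store' identifier column:

import pandas as pd

df = pd.DataFrame({
    "store": ["A", "B"],
    "morning_sales":   [120, 95],
    "afternoon_sales": [180, 140],
    "evening_sales":   [210, 160],
})

long_df = df.melt(
    id_vars="store",
    value_vars=["morning_sales", "afternoon_sales", "evening_sales"],
    var_name="time",
    value_name="sales",
)
# Optional: strip the "_sales" suffix so 'time' holds only the period label.
long_df["time"] = long_df["time"].str.replace("_sales", "", regex=False)
print(long_df)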
A company is running an A/B test on two distinct email campaigns to boost
customer conversions. Each recipient is randomly assigned to either Campaign A
or Campaign B, and the outcome for every recipient is recorded as a conversion
(yes/no). Which inferential statistical test should be used to determine if there is
a statistically significant difference in conversion rates between the two
campaigns?
1. Paired Samples t-test
2. One-way ANOVA
3. Independent Samples t-test
4. Two-proportion z-test
5. Chi-square test of independence
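For reference, a two-proportion z-test on conversion counts can be sketched as follows (made-up counts; assumes statsmodels is available, and note that for a 2x2 table the chi-square test of independence gives an equivalent result):

from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 95]     # converted recipients in Campaign A and Campaign B (made up)
recipients  = [1000, 1000]  # total recipients per campaign (made up)

stat, p_value = proportions_ztest(count=conversions, nobs=recipients)
print(f"z = {stat:.3f}, p = {p_value:.4f}")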
A retail chain analyzes daily customer spending, which shows a long right tail due
to occasional large purchases. To provide a robust summary of the data’s
dispersion, which statistic is most appropriate for capturing the spread of the
middle 50% of the data?
1. Interquartile Range (IQR)
2. Mean Absolute Deviation
3. Standard Deviation
4. Variance
5. Range
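For reference, a minimal sketch of computing the IQR on made-up spending values:

import numpy as np

spend = np.array([12, 15, 18, 20, 22, 25, 30, 35, 40, 250])  # long right tail from one large purchase
q1, q3 = np.percentile(spend, [25, 75])
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")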
A digital platform tests two homepage designs to see which one increases user
engagement time. The samples from the two versions are independent and
moderately large. Which hypothesis test is best for comparing the average
engagement times?
1. Mann-Whitney U test
2. Z-test
3. One-sample t-test
4. Independent t-test
5. Paired t-test
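For reference, an independent-samples t-test on simulated engagement times can be sketched as follows (Welch's variant, which does not assume equal variances):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
design_a = rng.normal(loc=4.2, scale=1.1, size=200)  # minutes on page, design A (simulated)
design_b = rng.normal(loc=4.5, scale=1.2, size=200)  # minutes on page, design B (simulated)

t_stat, p_value = stats.ttest_ind(design_a, design_b, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")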
A financial team wants to summarize the monthly operational expenses across
various departments, showcasing medians, variability, and potential outlier costs.
Which chart type would most effectively display these details in a compact
format?
1. Box plot
2. Pie chart
3. Bar chart
4. Line chart
5. Histogram
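For reference, a minimal sketch of a per-department box plot on simulated expenses (matplotlib assumed):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
departments = ["Finance", "IT", "Ops", "HR"]
expenses = [rng.lognormal(mean=10, sigma=0.4, size=60) for _ in departments]  # simulated

fig, ax = plt.subplots(figsize=(6, 4))
ax.boxplot(expenses)  # draws median, IQR box, whiskers, and outlier points per department
ax.set_xticks(range(1, len(departments) + 1))
ax.set_xticklabels(departments)
ax.set_ylabel("Monthly expense")
plt.tight_layout()
plt.show()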
A global product performance dashboard uses various visualization types like bar
charts, line graphs, and pie charts to compare metrics across different regions.
Users report that inconsistent numerical scales are causing confusion and
misinterpretation of trends. Which principle should be prioritized to resolve this
issue?
1. Adopt uniform color gradients across all charts
2. Utilize diverse chart styles for regional uniqueness
3. Introduce interactive options for scaling adjustments
4. Ensure consistent scale and axis configurations
5. Standardize font styles on all labels
You are managing a database that consists of two tables: Customers and Orders.
Not every customer has placed an order, but you need to display every customer
alongside any corresponding orders. Which join operation should you use in your
SQL query to make sure that all customers are listed, regardless of whether they
have an order record?
1. FULL OUTER JOIN
2. CROSS JOIN
3. INNER JOIN
4. RIGHT JOIN
5. LEFT JOIN
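For reference, the same left-join semantics can be sketched in pandas (hypothetical columns): every customer is kept, and order fields are NaN where no order exists.

import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Bo", "Cy"]})
orders = pd.DataFrame({"order_id": [10, 11], "customer_id": [1, 3], "total": [50.0, 20.0]})

result = customers.merge(orders, on="customer_id", how="left")
print(result)  # customer 2 appears with NaN order fields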
A marketing team observes that the distribution of customer spending is highly
skewed, leading standard outlier detection methods to incorrectly flag a
significant segment of high-value customers as outliers. Which strategy would
best adjust for the skewness when detecting outliers?
1. Leverage simple mean-based thresholds after temporarily excluding extreme values.
2. Apply a log transformation to normalize the distribution before using the IQR method.
3. Use z-score standardization and filter out values with scores above 3.
4. Utilize clustering techniques like k-means to isolate small clusters as outliers.
5. Increase the IQR threshold multiplier from 1.5 to 3 to capture more data points.
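For reference, a minimal sketch of applying a log transformation before the IQR rule, on simulated right-skewed spending:

import numpy as np

rng = np.random.default_rng(2)
spend = rng.lognormal(mean=4, sigma=1.0, size=1000)  # simulated right-skewed spending

log_spend = np.log1p(spend)                  # compress the long right tail
q1, q3 = np.percentile(log_spend, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = spend[(log_spend < lower) | (log_spend > upper)]
print(f"{outliers.size} of {spend.size} values flagged")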
While consolidating supplier records from various regions, you notice that
addresses use different abbreviations and formatting. Which approach best
addresses duplicate removal for records with such minor variations?
1. Apply exact matching on formatted data
2. Sort records and remove consecutive duplicates
3. Implement fuzzy matching with a set similarity threshold
4. Combine rule-based filters with manual review
5. Use supplier IDs for deduplication
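For reference, a minimal sketch of fuzzy matching with a set threshold, using only the standard library's difflib (dedicated libraries such as rapidfuzz offer faster, more robust scorers):

from difflib import SequenceMatcher

def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two address strings as duplicates when their similarity meets the threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

print(is_probable_duplicate("123 Main St., Springfield", "123 Main Street, Springfield"))  # True
print(is_probable_duplicate("123 Main St., Springfield", "45 Oak Ave, Shelbyville"))       # False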
Imagine you’re tasked with cleaning a survey dataset that includes participant
ages. Some entries are clearly invalid, falling outside the plausible range of 0 to
120. Which approach best validates these entries while retaining as much correct
data as possible?
1. Use a statistical outlier filter centered on the mean to remove values without using
preset thresholds.
2. Convert all age inputs to numbers and substitute extreme values with the overall
average.
3. Discard all age entries that fall outside the 0–120 range without further investigation.
4. Automatically adjust any age values outside the expected range to the nearest valid
boundary.
5. Apply predefined numeric boundaries to flag ages below 0 or above 120 and
manually review borderline cases.
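For reference, a minimal sketch (hypothetical survey columns) of flagging implausible ages for review rather than silently dropping or overwriting them:

import pandas as pd

survey = pd.DataFrame({"respondent": [1, 2, 3, 4], "age": ["34", "-2", "129", "61"]})

survey["age"] = pd.to_numeric(survey["age"], errors="coerce")
survey["age_flag"] = ~survey["age"].between(0, 120)   # True = outside the plausible 0-120 range
print(survey[survey["age_flag"]])                     # queue these rows for manual review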
You are working with a dataset of daily sales transactions that includes columns
for the date, region, and sales amount. To evaluate quarterly trends and total
sales per region, what is the most efficient method to group your data within a
pivot table?
1. Change the date formatting in the pivot table to display quarter numbers, which
Excel will then use to group the dates.
2. Utilize the pivot table’s 'Group Field' feature on the date field and select 'Quarters' so
that Excel aggregates the data automatically.
3. Manually add a new column calculating the quarter for each transaction and use that
column in the pivot table.
4. Sort the data by date before creating the pivot table to let Excel automatically detect
quarterly groupings.
5. Create separate pivot tables for each quarter by applying a date filter for each one.
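Outside Excel, the equivalent quarterly grouping can be sketched in pandas (hypothetical column names), which may help clarify what the pivot table's quarter grouping produces:

import pandas as pd

sales = pd.DataFrame({
    "date":   pd.to_datetime(["2024-01-15", "2024-02-03", "2024-04-20", "2024-05-11"]),
    "region": ["East", "West", "East", "West"],
    "amount": [200.0, 150.0, 320.0, 90.0],
})

sales["quarter"] = sales["date"].dt.to_period("Q")   # e.g. 2024Q1, 2024Q2
summary = sales.pivot_table(index="quarter", columns="region", values="amount", aggfunc="sum")
print(summary)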
Imagine you are compiling a quarterly sales report from a dataset that includes
the columns Region and Sales. You need to calculate the total sales where the
region is "North" and each sale exceeds $1,000. Which Excel formula structure
would best achieve this result?
1. =SUM(Sales*(Region="North")*(Sales>1000))
2. =SUMIF(Region, "North", Sales)
3. =SUMIFS(Sales, Region, "North", Sales, ">1000")
4. =SUMIF(Sales, ">1000") + SUMIF(Region, "North", Sales)
5. =SUMIFS(Sales, Sales, ">1000")
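The same conditional sum can be sketched outside Excel in pandas (hypothetical data), which may help clarify the two-criteria logic the formula needs to express:

import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "North"],
    "Sales":  [1500.0, 800.0, 2000.0, 2500.0],
})

total = df.loc[(df["Region"] == "North") & (df["Sales"] > 1000), "Sales"].sum()
print(total)  # 4000.0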
In your automated infrastructure deployment, a script that works flawlessly in the
staging environment repeatedly fails in a production Linux environment. After
some investigation, you find that subtle differences in environment variable
settings across these environments are affecting the script’s behavior. Which
measure would most effectively address this issue to ensure consistent script
execution?
1. Wrap the failing parts of the script with conditional checks that ignore discrepancies
in environment variables.
2. Implement a centralized configuration management system to enforce uniform
environment variable settings across all environments.
3. Manually adjust the environment variables on the production system each time
before deployment.
4. Hard-code the environment variables directly within the script for the production
deployment.
5. Increase the script’s error handling to bypass failures related to environment
variables.
Imagine you are writing a shell script to process a list of filenames that may
contain spaces. You have two filenames: "My File.txt" and "Another File.txt".
Which of the following looping approaches best ensures that each filename is
treated as a single item in your script?
1. files="My File.txt Another File.txt"; for file in $files; do echo "$file"; done
2. files=("My File.txt" "Another File.txt"); for file in "$files"; do echo "$file"; done
3. files=("My File.txt" "Another File.txt"); for file in "${files[@]}"; do echo "$file"; done
4. files="My File.txt Another File.txt"; IFS=$'\n'; for file in $files; do echo "$file"; done
5. for file in $(ls *.txt); do echo "$file"; done
Imagine you are responsible for allocating limited resources among several
ongoing projects in an organization. The historical performance data, current
trends, and future projection metrics vary significantly across these projects.
Time constraints limit the possibility of a deep dive into every detail, but you still
need to make a decision that balances risk and reward while considering long-
term benefits. How would you approach this decision?
1. Invest primarily in the project with the highest current performance numbers,
trusting that past success predicts future results.
2. Allocate resources equally across all projects to ensure fairness, assuming similar
potential across the board.
3. Focus on projects with a history of stable performance, believing that consistency will
guarantee future success even if growth prospects are limited.
4. Prioritize projects that receive favorable internal opinions, assuming political support
within the organization reflects quality potential.
5. Perform a cost-benefit analysis that integrates historical performance, current trends,
and future projections to identify which projects offer the best balance of risk and
reward.
Imagine you are assessing the impact of a new service launch. Sales figures show
a decline, but customer satisfaction surveys indicate strong positive feedback.
What would be the most effective step to reconcile these conflicting outcomes?
1. Investigate data collection methods from both sources to understand possible biases.
2. Consult with the marketing team to align survey interpretations with the sales
targets.
3. Immediately attribute the discrepancy to data entry errors and disregard one of the
reports.
4. Prioritize the positive customer feedback over the declining sales figures.
5. Conduct a detailed time-series analysis to check if external factors like seasonality
affected sales.
You are a data analyst working for a large US-based superstore. You have been
granted access to a historic sales database for the superstore that contains all
sales orders from 2014-10-01 until 2017-09-09 across multiple related tables, as
detailed in the following Entity Relationship Diagram (ERD). Note: there are
multiple rows in orders per orderId.
Your manager would like to see the orderId, customerId, and productId for the last
15 returned orders, based on the orderDate column. If two returned orders were
placed on the same day, sort by order value, showing the highest-value order first.
WITH order_totals AS (
    -- One row per order: header fields plus the total order value.
    SELECT orderId,
           MAX(customerId) AS customerId,
           MAX(orderDate) AS orderDate,
           SUM(value) AS order_value
    FROM orders
    GROUP BY orderId
), returned_orders AS (
    -- Keep only orders that appear in returns with returned = 1.
    SELECT t.orderId, t.customerId, t.orderDate, t.order_value, r.returned
    FROM order_totals AS t
    JOIN returns AS r
      ON t.orderId = r.orderId
    WHERE r.returned = 1
), ranked AS (
    -- Rank returned orders: most recent first, highest value first within a day.
    SELECT orderId, customerId, orderDate, order_value,
           ROW_NUMBER() OVER (ORDER BY orderDate DESC, order_value DESC) AS rn
    FROM returned_orders
)
-- Join back to orders to list every productId on the last 15 returned orders;
-- rn <= 15 already limits the result to those orders, so no LIMIT is applied
-- (orders has multiple product rows per orderId).
SELECT o.orderId, o.customerId, o.productId
FROM ranked AS r
JOIN orders AS o
  ON r.orderId = o.orderId
WHERE r.rn <= 15
ORDER BY r.orderDate DESC, r.order_value DESC;
You are given two tables: employees (with columns employee_id, name) and
orders (with columns order_id, employee_id, order_date). You need to generate a
report listing the employees who have processed more orders than the average
number of orders processed by all employees. Which SQL query correctly uses
subqueries to achieve this goal?
1. SELECT e.employee_id, e.name
   FROM employees e
   WHERE (SELECT COUNT(*)
          FROM orders o
          WHERE o.employee_id = e.employee_id)
       > (SELECT AVG(order_count)
          FROM (SELECT COUNT(*) AS order_count
                FROM orders
                GROUP BY employee_id));
2. SELECT e.employee_id, e.name
   FROM employees e
   WHERE (SELECT COUNT(*)
          FROM orders o
          WHERE o.employee_id = e.employee_id)
       > (SELECT AVG(COUNT(*))
          FROM orders
          GROUP BY employee_id);
3. SELECT e.employee_id, e.name
   FROM employees e
   JOIN (SELECT employee_id, COUNT(*) AS total
         FROM orders
         GROUP BY employee_id) t
     ON e.employee_id = t.employee_id
   WHERE t.total > (SELECT AVG(total) FROM orders);
4. SELECT e.employee_id, e.name
   FROM employees e
   WHERE COUNT(*) > (SELECT AVG(order_count)
                     FROM (SELECT COUNT(*) AS order_count
                           FROM orders
                           GROUP BY employee_id) AS avg_orders);
5. SELECT e.employee_id, e.name
   FROM employees e
   JOIN (SELECT employee_id, COUNT(*) AS order_count
         FROM orders
         GROUP BY employee_id) t
     ON e.employee_id = t.employee_id
   WHERE t.order_count > (SELECT COUNT(*) FROM orders)
                       / (SELECT COUNT(DISTINCT employee_id) FROM orders);