0% found this document useful (0 votes)
90 views24 pages

Amazon Data Analyst Interview Prep

Uploaded by

sabbanwar16
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
90 views24 pages

Amazon Data Analyst Interview Prep

Uploaded by

sabbanwar16
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 24

#Amazon Data Analyst Interview Questions for 1-3 years of experience role :-

A. SQL:

1. You have two tables: Employee and Department.


- Employee Table Columns: Employee_id, Employee_Name, Department_id, Salary
- Department Table Columns: Department_id, Department_Name, Location

Write an SQL query to find the name of the employee with the highest salary in each location.

2. You have two tables: Orders and Customers.


- Orders Table Columns: Order_id, Customer_id, Order_Date, Amount
- Customers Table Columns: Customer_id, Customer_Name, Join_Date

Write an SQL query to calculate the total order amount for each customer who joined in the current
year. The output should contain Customer_Name and the total amount.

B. Python:

1. Basic oral questions on NumPy (e.g., array creation, slicing, broadcasting) and Matplotlib (e.g., plot
types, customization).

2. Basic oral questions on pandas (like: groupby, loc/iloc, merge & join, etc.)

2. Write the code in NumPy and Pandas to replicate the functionality of your answer to the second SQL
question.

C. Leadership or Situational Questions:

(Based on the leadership principle of Bias for Action)

- Describe a situation where you had to make a quick decision with limited information. How did you
proceed, and what was the outcome?

(Based on the leadership principle of Dive Deep)

- Can you share an example of a project where you had to delve deeply into the data to uncover insights
or solve a problem? What steps did you take, and what were the results?

(Based on the leadership principle of Customer Obsession)

- Tell us about a time when you went above and beyond to meet a customer's needs or expectations.
How did you identify their requirements, and what actions did you take to deliver exceptional service?

D. Excel:

Questions on advanced functions like VLOOKUP, XLookup, SUMPRODUCT, INDIRECT, TEXT functions,
SUMIFS, COUNTIFS, LOOKUPS, INDEX & MATCH, AVERAGEIFS. Plus, some basic questions on pivot tables,
conditional formatting, data validation, and charts.
Amazon interview questions for Data Analyst 2024.

1. How would you retrieve the second highest salary from a table called Employees without using LIMIT
or TOP?
2. Write a query to display employees who joined in the same month but in different years from the
Employees table.
3. Given two tables, Orders and Customers, write a query to find all customers who placed more than
five orders in the last year.
4. How would you update the Department column in the Employees table based on a matching
EmployeeID in another table called Departments?
5. Write a SQL query to find the total sales for each product category, but exclude categories with total
sales less than a specific threshold (e.g., $10,000).
6. How would you create a Python function that reads a large CSV file in chunks and processes each
chunk for data cleaning?
7. Write a Python script that takes a list of employee records, filters out records where the salary is
below a certain value, and writes the filtered records to a new file.
8. How would you handle missing values in a dataset using pandas, and how would you decide which
method to use (e.g., mean imputation vs. forward fill)?
9. Given a list of numbers, write a Python program to group the numbers into ranges (e.g., 1-10, 11-20)
and count the number of elements in each range.
10. How would you connect to an SQL database using Python and fetch data into a pandas DataFrame
for analysis?
11. How would you use VLOOKUP or INDEX-MATCH to find a value in one Excel sheet and return
corresponding information from another sheet?
12. Write an Excel formula that calculates the weighted average of a set of numbers, where the weights
are stored in another column.
13. How would you create a dynamic Excel dashboard that allows users to filter data by multiple criteria
and display the results visually (e.g., via charts or pivot tables)?
14. Explain how you would use Excel Solver to optimize a product mix for maximizing profit under given
constraints.
15. How can you create a measure that calculates the running total of sales over time, and how would
you display it in a line chart?
16. How would you use Power Query to clean and transform a dataset, such as removing duplicates,
splitting columns, and filtering rows based on conditions?
17. Describe a scenario where you would need to use Merge Queries in Power Query, and how would
you do it?
18. How can you create a custom tooltip in Power BI to show additional information when a user hovers
over a visual?

SQL Interview Question for Data Analyst Role at HDFC Bank

Problem: You are given two tables - customers and transactions.

customers table: Contains customer information.


customer_id: Unique ID for each customer.
customer_name: Name of the customer.

transactions table: Contains transaction details.


transaction_id: Unique ID for each transaction.
customer_id: The ID of the customer who made the transaction.
transaction_amount: Amount of each transaction.
transaction_date: Date of the transaction.

Write an SQL query to find the top 5 customers who have made the highest total transactions in terms
of the amount. Display the customer’s name and their total transaction amount, sorted in descending
order of total transaction amount.

Schema:

customers(customer_id, customer_name)
transactions(transaction_id, customer_id, transaction_amount, transaction_date)

Solution:

SELECT c.customer_name,
SUM(t.transaction_amount) AS total_spent
FROM customers c
JOIN transactions t
ON c.customer_id = t.customer_id
GROUP BY c.customer_name
ORDER BY total_spent DESC
LIMIT 5;

Explanation:

- The query first joins the customers and transactions tables on the customer_id.
- Then, it calculates the total transaction amount (SUM) for each customer using GROUP BY on
customer_name.
- It orders the result by the total spent in descending order (ORDER BY total_spent DESC).
- Finally, it limits the result to the top 5 customers.

This type of SQL problem helps in analyzing transactional data, which is crucial for roles like Data
Analysts in banks to monitor customer behavior and financial trends.
Top Scenario-Based Questions & Answers for Data analyst

1. Scenario: You are managing a SQL database for an e-commerce platform. The "Products" table
includes a column for "ProductCategory," and the "Orders" table records each sale. Your task is to
identify the top 3 best-selling product categories.

Question:
Write a SQL query to find the top 3 best-selling product categories based on the number of orders.

Expected Answer:
SELECT ProductCategory, COUNT(*) AS TotalOrders
FROM Orders
JOIN Products ON Orders.ProductID = Products.ProductID
GROUP BY ProductCategory
ORDER BY TotalOrders DESC
LIMIT 3;

2. Scenario: You are working with a SQL database that stores sales data. The database has a table called
"Sales" that records the sale date and amount for each transaction. Your task is to calculate the average
monthly sales for the current year.

Question:
Write a SQL query to calculate the average monthly sales for the current year.

Expected Answer:
SELECT MONTH(SaleDate) AS SaleMonth, AVG(SaleAmount) AS AvgMonthlySales
FROM Sales
WHERE YEAR(SaleDate) = YEAR(GETDATE())
GROUP BY MONTH(SaleDate);

Power BI consultant Question asked in EY for Associate level

1. What are the different types of filters available in Power BI?


2. Explain the difference between calculated columns and measures in Power BI.
3. How can you improve the performance of Power BI reports and dashboards?
4. What is the purpose of relationships in Power BI, and how do you create them?
5. How do you use DAX for advanced calculations? Can you explain some commonly used DAX
functions?
6. How would you handle a many-to-many relationship in Power BI?
7. Explain the difference between Power Query and Power Pivot.
8. How do you handle data transformation in Power BI using Power Query Editor?
9. How do you deal with missing or incomplete data in Power BI?
10. How would you design an interactive dashboard for a client with a large dataset?
11. Can you walk me through your process for selecting the appropriate visual for a specific type of
data?
12. How do you implement drill-through functionality in Power BI?
13. How do you integrate Power BI with other Microsoft services like Azure or Excel?
14. Have you worked on Power BI service? How do you schedule data refreshes in Power BI Service?
15. Can you describe a challenging project you’ve worked on in Power BI and how you handled it?
16. How do you gather and understand business requirements for a Power BI project?
17. How do you explain technical concepts of Power BI to non-technical stakeholders?
18. How do you ensure data security and privacy when working with sensitive client data?
Good morning
Here are the data analyst interview questions along with comprehensive answers:

1. What is data analysis?


- Data analysis is the process of inspecting, cleansing, transforming, and modeling data to discover
useful information, inform conclusions, and support decision-making. It involves statistical techniques
and tools to analyze data sets and extract insights.

2. What are the key steps in the data analysis process?


- The key steps include:
1. Define the Problem: Understand the question you need to answer.
2. Collect Data: Gather relevant data from various sources.
3. Clean Data: Remove inaccuracies and handle missing values.
4. Analyze Data: Use statistical methods to explore the data.
5. Interpret Results: Draw conclusions based on the analysis.
6. Communicate Findings: Present your findings in a clear manner.

3. What tools and software do you use for data analysis?


- Common tools include Excel for basic data manipulation, SQL for database queries, Python and R for
statistical analysis, and visualization tools like Tableau or Power BI for presenting data insights.

4. Can you explain what SQL is and how it is used in data analysis?
- SQL (Structured Query Language) is a programming language designed for managing and
manipulating relational databases. It allows analysts to query data, perform operations like filtering and
aggregating, and retrieve specific information from large datasets.

5. What is the difference between structured and unstructured data?


- Structured data is organized in a predefined format, such as tables with rows and columns (e.g.,
databases). Unstructured data lacks a specific format and can include text, images, videos, and social
media posts, making it more challenging to analyze.

6. How do you handle missing data in a dataset?


- Missing data can be handled in several ways:
- Deletion: Remove rows with missing values (only if they are not significant).
- Imputation: Replace missing values with mean, median, or mode.
- Prediction: Use algorithms to predict and fill in missing values.

8. Explain the concept of data visualization and its importance.


- Data visualization is the graphical representation of information and data. It helps to communicate
complex data insights in a clear and effective manner, making it easier for stakeholders to understand
trends, patterns, and outliers in the data.

10. How do you ensure the accuracy and integrity of your data analysis?
- Use multiple data sources to cross-verify information.
- Implement data validation techniques during data collection.
- Regularly audit and review datasets for inconsistencies.
- Document my analysis process for transparency
#confidential interview questions for data analyst

SQL:
1. What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN?
2. How do you use GROUP BY and HAVING in SQL queries?
3. What is a CROSS JOIN, and how does it differ from other joins?
4. How can you optimize SQL queries for better performance?
5. Explain the difference between WHERE and HAVING clauses.

Excel:
1. How do you use Pivot Tables in Excel, and what are their key benefits?
2. Explain the VLOOKUP and HLOOKUP functions. How are they different?
3. How would you use Excel to find duplicate entries in a dataset?
4. What are Excel’s conditional formatting capabilities, and how do they work?
5. How do you handle large datasets in Excel without affecting performance?

Power BI:
1. What is the difference between a calculated column and a measure in Power BI?
2. How do you perform data modeling in Power BI, and why is it important?
3. Explain the importance of DAX (Data Analysis Expressions) in Power BI. Can you give an example?
4. How do you implement row-level security in Power BI?
5. What are the different types of visualizations available in Power BI, and when would you use each?

Python:
1. How would you use Python for data cleaning? Provide examples using pandas.
2. What are the common data structures in Python used for data manipulation (e.g., lists, dictionaries,
DataFrames)?
3. Explain the difference between loc[] and iloc[] in Pandas.
4. How can you handle missing values in a dataset using Python?
5. What is the difference between apply(), map(), and applymap() in Pandas?

Delloite Consulting Data Analyst Interview Experience (2 Years Experience)

1. Tell me about yourself and your roles & responsibilities at your current workplace.

2. How would you rate your proficiency in SQL on a scale of 1 to 10?

3. Can you write a query to find values in a text column that begin with a specific letter?

4. What is the purpose of a Common Table Expression (CTE) and how does it differ from subqueries?

5. Given a table with two columns (Name, Skills), how would you find candidates best suited for a Data
Analyst role that requires proficiency in SQL, Power BI, and Python?
6. We discussed scenario-based questions related to data visualization tools like Power BI:
a. What are some advantages of using Power BI in a large enterprise?
b. How would you handle report sharing and collaboration in Power BI Service?

7. Can you explain the difference between inner join and left join in SQL with an example?

Recent Interview Question for Data Analyst Role :-

Question 3)
What is the difference between a KPI (Key Performance Indicator) and a dimension?

KPI (Key Performance Indicator):

This is a measurable value that shows how well a company is achieving its key business objectives.
Think of KPIs as the important numbers that tell you how well you're doing. For example, if you're
running a sales team, a KPI might be "total sales revenue" or "number of new customers."

Dimension:

This is a characteristic or attribute that you can use to categorize and analyze data. Dimensions provide
context to your KPIs. For example, in the sales team scenario, dimensions might be "sales region,"
"product type," or "salesperson."

In summary, KPIs are the important metrics you're trying to improve, while dimensions are the
categories you use to break down and analyze those metrics.(simple text data)

Power BI Technical questions asked in Mphasis

1️⃣How would you implement a self-service BI solution using Power BI for a large enterprise client? What
considerations would you keep in mind?

2️⃣Explain the process of setting up and managing a Power BI Premium capacity. How does it differ from
shared capacity?

3️⃣How would you approach data governance and compliance in a Power BI implementation?

4️⃣Describe the process of implementing automated machine learning (Auto ML) in Power BI. What are its
limitations and best use cases?

5️⃣Explain the concept of composite models in Power BI. How can they be used to combine DirectQuery
and Import mode data sources?

6️⃣How would you implement a multi-language solution in Power BI for a global organization?

7️⃣Describe the process of creating and using Power BI dataflows for ETL processes. How do they compare
to traditional ETL tools?

8️⃣How would you approach performance tuning for a Power BI report that uses DirectQuery mode with a
large SQL Server database?

9️⃣Explain the concept of dynamic row-level security (RLS) in Power BI. How would you implement it for a
complex organizational hierarchy?

🔟How would you design a Power BI solution that integrates with Azure Synapse Analytics for big data
processing?

1️⃣1️⃣Describe the process of implementing paginated reports in Power BI. When would you choose
paginated reports over standard Power BI reports?

Hello Everyone,

Today I'm sharing the interview questions asked for Power BI role in Infosys company.

1. Explain about Power BI architecture.


2. What are the data sources that you have connected in Power BI project?
3. What are the data connectivity modes available in Power BI ? Explain about difference between each
of them.
4. What are the common techniques used in Power Query Editor?
5. What is incremental refresh and explain with an real time use case ?
6. What is Hierarchy in Power BI ?
7. Explain about the differences between measure and calculated column?
8. What are the types of views available in Power BI?
9. Explain about Gateway in Power BI and differences between standard gateway and personal
gateway?
10. How to remove weekends data from a sales table ?
11. What is the use of MATCH BY dax function in Power BI ?
12. When DAX query view is launched in Power BI ?
13. What is Microsoft Fabric ? Explain about it.
14. In which year, Power BI is released by Microsoft ?

Goldman Sachs senior data analyst interview asked questions

SQL

1 find avg of salaries department wise from table


2 Write a SQL query to see employee name and manager name using a self-join on 'employees' table
with columns 'emp_id', 'name', and 'manager_id'.
3 newest joinee for every department (solved using lead lag)

POWER BI

1. What does Filter context in DAX mean?


2. Explain how to implement Row-Level Security (RLS) in Power BI.
3. Describe different types of filters in Power BI.
4. Explain the difference between 'ALL' and 'ALLSELECTED' in DAX.
5. How do you calculate the total sales for a specific product using DAX?

PYTHON

1. Create a dictionary, add elements to it, modify an element, and then print the dictionary in
alphabetical order of keys.
2. Find unique values in a list of assorted numbers and print the count of how many times each value is
repeated.
3. Find and print duplicate values in a list of assorted numbers, along with the number of times each
value is repeated.

Apart from

Amazon SQL interview questions for data analyst (with answers):

Here are the top 5 interview questions from Amazon for your practice..

1. 𝐴𝑣𝑒𝑟𝑎𝑔𝑒 𝑟𝑒𝑣𝑖𝑒𝑤 𝑟𝑎𝑡𝑖𝑛𝑔𝑠 𝑝𝑒𝑟 𝑝𝑟𝑜𝑑𝑢𝑐𝑡 𝑝𝑒𝑟 𝑚𝑜𝑛𝑡ℎ

𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Given a table of product reviews, calculate the average review rating for each
product for every month. The data is in a 𝑟𝑒𝑣𝑖𝑒𝑤𝑠 table, which includes 𝑟𝑒𝑣𝑖𝑒𝑤_𝑖𝑑, 𝑢𝑠𝑒𝑟_𝑖𝑑,
𝑠𝑢𝑏𝑚𝑖𝑡_𝑑𝑎𝑡𝑒, 𝑝𝑟𝑜𝑑𝑢𝑐𝑡_𝑖𝑑 and 𝑠𝑡𝑎𝑟𝑠.

2. 𝑂𝑝𝑡𝑖𝑚𝑖𝑧𝑖𝑛𝑔 𝑎 𝑠𝑙𝑜𝑤 𝑞𝑢𝑒𝑟𝑦

𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Amazon handles massives datasets, and optimizing SQL queries is crucial for
performance. Discuss various methods to optimize a slow SQL query.

3. 𝑆𝑄𝐿 𝑐𝑜𝑛𝑠𝑡𝑟𝑎𝑖𝑛𝑡𝑠

𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Explain SQL constraints and provide examples of different types of constraints
used to enforce data integrity in databases.

4. 𝐻𝑖𝑔ℎ𝑒𝑠𝑡-𝐺𝑟𝑜𝑠𝑠𝑖𝑛𝑔-𝐼𝑡𝑒𝑚𝑠

𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Find the top two highest grossing-products in each category for the year 2022
from a table 𝑝𝑟𝑜𝑑𝑢𝑐𝑡_𝑠𝑝𝑒𝑛𝑑. The table contains 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦, 𝑝𝑟𝑜𝑑𝑢𝑐𝑡, 𝑢𝑠𝑒𝑟_𝑖𝑑, 𝑠𝑝𝑒𝑛𝑑 and
𝑡𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛_𝑑𝑎𝑡𝑒. The output should include the category, product and total spend.

5. 𝐷𝑖𝑓𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 () 𝑎𝑛𝑑 𝐷𝐸𝑁𝑆𝐸_()

𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Explain the difference between RANK() and DENSE_RANK() functions in SQL.

Power BI Technical Questions asked in LTIMindtree

⏩ How can you use Power BI dataflows to improve the overall architecture of your BI solution?

⏩ Explain how you would implement incremental refresh in Power BI. What are the requirements and
limitations?

⏩ Describe the process of creating and using calculation groups in Power BI. What are their advantages?

⏩ How would you approach version control and CI/CD for Power BI reports in an enterprise setting?

⏩ Explain the concept of query folding in Power Query. How can you ensure your transformations are
being folded back to the source?

DAX Scenarios:

⏩ Create a measure that shows the top N customers by sales, with all others grouped into an "Others"
category?

⏩ Create a measure that shows the year-over-year growth of a cumulative total?

⏩ Create a measure for a rolling average?

Microsoft Fabric Questions:

⏩ Describe the process of creating and using data warehouses in Microsoft Fabric. How does this
integrate with Power BI?

⏩ How does Microsoft Fabric handle data governance and security? Discuss features like data lineage
and access controls?

⏩ Explain the concept of OneLake in Microsoft Fabric. How does it simplify data management across
different analytics workloads?

⏩ How would you migrate an existing Power BI solution to leverage Microsoft Fabric capabilities?

⏩ Discuss the advantages and potential challenges of using Microsoft Fabric in an enterprise BI
ecosystem?
Interview questions asked for Data Analyst role in Cognizant

Question1) Explain the DAX measures you would use to create a rolling 12-month average in Power BI,
considering potential fiscal year differences?

1️⃣What’s the Rolling 12-Month Average?

It shows the average of something (like sales) for the past 12 months. Example: In December, you show
the average from January to December.

2️⃣Understanding DAX (Data Analysis Expressions)

In Power BI, we use DAX formulas to do calculations. Two important functions for this are:
DATESINPERIOD: Looks back 12 months from a specific date.
AVERAGEX: Calculates the average for those 12 months.

3️⃣The Basic Formula :-

Use this DAX measure to calculate a rolling 12-month average:-

Rolling 12-Month Average =


CALCULATE(
AVERAGEX(
DATESINPERIOD(
'Date'[Date], -- Your date column
MAX('Date'[Date]), -- Latest date in your data
-12, MONTH -- Look back 12 months
),
[YourMeasure] -- Thing you are calculating, like Sales or Revenue
)
)

4️⃣What Does It Do?

It looks at the last 12 months of data starting from the latest date. It calculates the average of what you
care about, like sales or revenue.

5️⃣Handling Fiscal Year Differences:-

Some companies start their fiscal year in a different month (e.g., April instead of January). If so, make
sure your calendar in Power BI reflects this: Create or use a custom calendar that starts in the correct
fiscal month (e.g., April). Adjust the start date in the formula to match your fiscal year.

Affine | Data Analyst | Experience: 5 years | Excel, SQL and Power BI | hashtag#Round1

Q1. Can you explain the significance of the dollar sign ($) in the VLOOKUP function?
Q2. How do you create a pivot table in Excel, and what is its importance?

Q3. What is the difference between calculated columns and measures in Power BI?

Q4. Could you describe the DAX formulas you used in your most recent project and their practical
applications?

Q5. What types of data sources have you connected to in Power Bl, and can you walk me through the
process?

Q6. What are the key differences between Import, Direct Query, and Live Connection modes in Power
BI?

Q7. What are ad hoc reporting requirements, and how do you approach them?

Q8. Can you explain a scenario where you ensured thorough testing while handling ad hoc requests?

Q9. What tools have you used when working on ad hoc reporting requirements?

Q10. What are the limitations or challenges of working with ad hoc reporting requests?

Q11. How would you write a SQL query to find the nth highest salary of an employee without using the
LIMIT clause?

Q12. Write a SQL query to retrieve even-numbered employee IDs from the employee table.

Q13. Why would you use the DENSE_RANK() function instead of RANK() when finding the nth highest
salary?

Q14. What is Row-Level Security (RLS) in Power BI, what are its types, and how do you implement it?

Q15. What strategies do you use to optimize a SQL query?

Q16. Can you explain the purpose of the CONCAT function in Excel and provide an example of its usage?

I am hiring you as a 𝗗𝗮𝘁𝗮 𝗔𝗻𝗮𝗹𝘆𝘀𝘁 if you can answer these 25 questions:

𝗦𝗤𝗟
- How do you write a query to find duplicate rows in a table?
- How would you perform a left join and filter out nulls in SQL?
- What is a window function in SQL, and how do you use it for ranking data?
- How do you calculate the cumulative sum for a column in SQL?
- What is the difference between UNION and UNION ALL in SQL?

𝗣𝘆𝘁𝗵𝗼𝗻
- How do you import a CSV file into a pandas DataFrame, and how would you handle missing data?
- How do you use list comprehensions to filter and transform data in Python?
- What are the differences between the apply() and map() functions in pandas?
- How do you visualize data using matplotlib or seaborn in Python?
- How do you write a function to calculate the correlation between two numerical columns in a pandas
DataFrame?

𝗘𝘅𝗰𝗲𝗹
- How would you use VLOOKUP or XLOOKUP to merge data between two Excel sheets?
- What is the difference between absolute and relative cell references, and when would you use each?
- How do you create a pivot table, and what types of data analysis can you perform with it?
- How would you use conditional formatting to highlight cells that meet certain criteria?
- How do you use the IF, AND, and OR functions together to create complex logical tests?

𝗣𝗼𝘄𝗲𝗿 𝗕𝗜
- How would you create and customize a calculated column in Power BI?
- What is the difference between a slicer and a filter in Power BI, and when would you use each?
- How do you create relationships between tables in Power BI, and how do they impact your data model?
- How would you set up row-level security (RLS) to control access to sensitive data in Power BI?
- What is the purpose of DAX functions like CALCULATE and FILTER, and how do you use them?

𝗧𝗮𝗯𝗹𝗲𝗮𝘂
- How do you create a calculated field in Tableau, and what types of calculations can you perform?
- What is a parameter in Tableau, and how can it be used to create interactive dashboards?
- How do you use a dual-axis chart in Tableau to show multiple measures in the same view?
- How would you optimize a Tableau dashboard for performance when working with large datasets?
- How do you create a custom date filter in Tableau to allow users to select specific date ranges?

Interview Questions for Power BI Developer Role: 1st Round & 2nd Round

1st Round

1. Self Introduction.
2. Discussion on my past project, followed by specific questions:
What was the size of your data model?
What were the major challenges in your project, and how did you
overcome them?
3. Which visuals did you use and why?
4. How did you optimize the performance of your reports?
5. Explain the CALCULATE function in Power BI.
6. Difference between SUMX and SUM in Power BI.
7. Explain the difference between Merge and Append with examples.
8. Difference between DATEPERIOD and DATEADD functions in Power BI.
9. What are Pivot and Unpivot? Provide examples.
10. Difference between a view and a stored procedure, with examples.
11. SQL Scenario: Write a query involving LEFT join and handling NULL values.
12. Example: Write a query to return all employees, including those with no associated departments,
from an Employee and Department table.
13. Explain the difference between a Star Schema and a Snowflake Schema.
14. What is a many-to-many relationship? How do you resolve it in Power BI?
15. Explain the concept of a calculated column and a measure.
16. What is the difference between a table and a matrix visual?
17. How would you handle performance issues in Power BI?
18. What are the different types of filters available in Power BI, and how do they impact data?

2nd Round

1. Rate yourself in SQL and Power BI.


2. Difference between UNION and UNION ALL.
3. Explain constraints in SQL with examples:
4. PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK, etc.
5. Difference between CROSS join and SELF join, with examples.
6. What is referential integrity?
7. What are incremental data and incremental refresh in Power BI?
8. How to handle a many-to-many relationship in data modeling.
9. Difference between a reference query and an SQL query in Power BI.
10. Difference between Power BI Pro and Power BI Premium licenses.
11. Explain the SUMMARIZE DAX function with an example.
12. What types of reports have you created in Power BI?
13. Example: Have you worked with paginated reports? How do they differ from standard reports?
14. What is normalization, and why is it important in database design?
15. What is the difference between an INNER join and an OUTER join in SQL?
16. Explain the difference between RANKX and ROWNUMBER in DAX.
17. How do you implement row-level security (RLS) in Power BI?
18. Explain the role of Power Query in the ETL process.
19. How do you create dynamic titles in Power BI reports?
20. What is DAX Studio, and how can it help optimize queries?
21. What is the ALLEXCEPT function in DAX?
22. Explain a scenario where you used bookmarks and drill-through in Power BI.
23. How would you manage large datasets in Power BI?

Interview Experience for Power BI Role:


Here are the questions I was asked during my recent interview, which might help you in your
preparation:

1st Round:

Brief self-introduction.
Discussion on my past project, followed by specific questions.
Explain the CALCULATE function in Power BI.
Difference between SUMX and SUM in Power BI.
Explain the difference between Merge and Append with examples.
Difference between DATEPERIOD and DATEADD functions in Power BI.
What are Pivot and Unpivot? Provide examples.
Difference between a view and a stored procedure, with examples.
SQL scenario: Write a query involving LEFT join and handling NULL values.

2nd Round:

Rate yourself in SQL and Power BI.


Difference between UNION and UNION ALL.
Explain constraints in SQL with examples.
Difference between CROSS join and SELF join, with examples.
What is referential integrity?
What are incremental data and incremental refresh in Power BI?
How to handle a many-to-many relationship in data modeling.
Difference between a reference query and an SQL query in Power BI.
Difference between Power BI Pro and Power BI Premium licenses.
Explain the SUMMARIZE DAX function with an example.
What types of reports have you created in Power BI?
These questions covered a range of topics that might help you in your own interview preparation.

#Myntra interview questions for Data Analyst 2024.


1. You have a dataset with missing values. How would you use a combination of Pandas and NumPy to
fill missing values based on the mean of the column?
2. How would you create a new column in a Pandas DataFrame by normalizing an existing numeric
column using NumPy's np.min() and np.max()?
3. Explain how to group a Pandas DataFrame by one column and apply a NumPy function, like np.std()
(standard deviation), to each group.
4. How can you convert a time-series column in a Pandas DataFrame to NumPy's datetime format for
faster time-based calculations?
5. How would you identify and remove outliers from a Pandas DataFrame using NumPy's Z-score method
(scipy.stats.zscore)?
6. How would you use NumPy's percentile() function to calculate specific quantiles for a numeric column
in a Pandas DataFrame?
7. How would you use NumPy's polyfit() function to perform linear regression on a dataset stored in a
Pandas DataFrame?
8. How can you use a combination of Pandas and NumPy to transform categorical data into dummy
variables (one-hot encoding)?
9. How would you use both Pandas and NumPy to split a dataset into training and testing sets based on
a random seed?
10. How can you apply NumPy's vectorize() function on a Pandas Series for better performance?
11. How would you optimize a Pandas DataFrame containing millions of rows by converting columns to
NumPy arrays? Explain the benefits in terms of memory and speed.
12. How can you perform complex mathematical operations, such as matrix multiplication, using NumPy
on a subset of a Pandas DataFrame?
13. Explain how you can use np.select() to perform conditional column operations in a Pandas
DataFrame.
14. How can you handle time series data in Pandas and use NumPy to perform statistical analysis like
rolling variance or covariance?
15. How can you integrate NumPy's random module (np.random) to generate random numbers and add
them as a new column in a Pandas DataFrame?
16. Explain how you would use Pandas' applymap() function combined with NumPy's vectorized
operations to transform all elements in a DataFrame.
17. How can you apply mathematical transformations (e.g., square root, logarithm) from NumPy to
specific columns in a Pandas DataFrame?
18. How would you efficiently perform element-wise operations between a Pandas DataFrame and a
NumPy array of different dimensions?
19. How can you use NumPy functions like np.linalg.inv() or np.linalg.det() for linear algebra operations
on numeric columns of a Pandas DataFrame?
20. Explain how you would compute the covariance matrix between multiple numeric columns of a
DataFrame using NumPy.
21. What are the key differences between a Pandas DataFrame and a NumPy array? When would you
use one over the other?
22. How can you convert a NumPy array into a Pandas DataFrame, and vice versa? Provide an example.

𝗥𝗼𝗹𝗲𝘀 𝗮𝗻𝗱 𝗥𝗲𝘀𝗽𝗼𝗻𝘀𝗶𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 𝗼𝗳 𝗗𝗮𝘁𝗮 𝗔𝗻𝗮𝗹𝘆𝘀𝘁


1. To convert raw data into interactive reports and dashboards.

2. To find meaningful insights and results of Data given.

3. Domain: Finance, Pharmacy, Manufacturing, Medical, Political, IT, Hotstar.

4. To take requirements from client and full fill it properly.

5. To follow the ETL Process and 3 compressive ways and deployment.

𝗗𝗮𝗶𝗹𝘆 𝗪𝗼𝗿𝗸 𝗟𝗶𝗳𝗲 𝗼𝗳 𝗗𝗮𝘁𝗮 𝗔𝗻𝗮𝗹𝘆𝘀𝘁

1. Stand up Meeting at Morning On Teams.

2. Discussion on three things: What you did yesterday, What today you will do and Sprint Work
Completion.

3. Then start working on Project along with BI Developer.

4. If doubt then check on YouTube, Google, Groups and Community.

5. Client Interaction in Week one time and Project explaining.

6. Our Team has 7-10 people in MNC.

𝗥𝗲𝗮𝗹 𝗧𝗶𝗺𝗲 𝗪𝗼𝗿𝗸 𝗮𝗻𝗱 𝗣𝗿𝗼𝗷𝗲𝗰𝘁𝘀

1. Tools we used are Power BI, Tableau and Qliksense.


Top 3 tools in World

2. Also we work on programming languages like SQL, Python and R.

3. Advanced ETL Tools like SSIS, Informatica and ADF are used.
hashtag#DataAnalyst

Recently asked Power BI interview question

How do you work with large datasets in Power BI?


𝐀𝐧𝐬𝐰𝐞𝐫:

➡ When dealing with large datasets in Power BI, the primary challenge is the size of the data, which can
affect performance, making the report slow to load and refresh.

➡ Managing and visualizing such a vast amount of data requires efficient handling to avoid timeouts and
performance degradation.

➡ One of the strategies I use is to upload a subset of the data into Power BI Desktop initially.

For example, if I have data spanning five years, I might start by uploading only six months of data. This
speeds up the development process on the desktop.

➡ Next, I use the Power Query Editor to filter and aggregate data. This includes removing unnecessary
columns, filtering rows to include only relevant data, and aggregating data at a higher level.

For instance, if detailed transaction data is not necessary, I might aggregate daily sales data to monthly
sales data before loading it into Power BI.

➡ For extremely large datasets, I use DirectQuery mode, which allows Power BI to directly query the
underlying data source without importing the data into the Power BI model.

➡ This keeps the Power BI model lightweight and leverages the processing power of the database
server. However, this requires a well-optimized database and efficient query performance at the source.

➡ Sometimes, I use a combination of Import and DirectQuery modes, known as composite models. This
approach allows for flexibility by importing critical, smaller tables into the Power BI model and using
DirectQuery for larger fact tables.

➡ I ensure that the data model is optimized by creating appropriate relationships and using measures
efficiently.

➡ Reducing the complexity of DAX calculations and ensuring that the model only includes necessary
tables and relationships helps maintain performance.

By employing these strategies, I can manage large datasets efficiently, ensuring that my Power BI
reports are responsive and performant.
I am sharing real interview questions asked in companies nowadays to help you prepare more
practically for interviews.

Deloitte data analyst interview experience (0-3 yr)(sql ,powerbi)

SQL: 5 questions
Q1: Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() functions using
example.

Q2 – Q4 use Table: employee (EmpID, ManagerID, JoinDate, Dept, Salary)

Q2: Find the nth highest salary from the Employee table.

Q3: You have an employee table with employee ID and manager ID. Find all employees under a specific
manager, including their subordinates at any level.

Q4: Write a query to find the cumulative salary of employees department-wise, who have joined
company in last 30 days.

Q5: Find the top 2 customers with the highest order amount for each product category, handling ties
appropriately. Table: customer (CustomerID, ProductCategory, OrderAmount)

Power BI: 3 questions


Q1: Explain step-by-step how will you create a sales dashboard from scratch.

Q2: Explain how you can optimize a slow Power BI report.

Q3: Explain Any 5 Chart Types and Their Uses in Representing Different Aspects of Data.

Behavioral: 2 Questions
Q1: Why you want to become a data analyst and why did you apply to this company?

Q2: Describe a time when you had to manage a difficult task with tight deadlines. How did you handle
it?

Flipkart Business Analyst Interview Experience (1-3 years):

SQL Questions:

1. What are window functions, and how do they differ from aggregate functions? Can you give a use
case?
2. Explain indexing. When would an index potentially reduce performance, and how would you approach
indexing strategy for a large dataset?
3. Write a query to retrieve customers who have made purchases in the last 30 days but did not
purchase anything in the previous 30 days.
4. Given a table of transactions, find the top 3 most purchased products for each category.
5. How would you identify duplicate records in a large dataset, and how would you remove only the
duplicates, retaining the first occurrence?

Guesstimates:

1. Estimate the number of online food delivery orders in a large metropolitan city over a month.
2. How many customer service calls would a telecom company receive daily for a customer base of 1
million?

Case Studies:

1. A sudden decrease in conversion rate is observed in a popular product category. How would you
investigate the cause and propose solutions?
2. Imagine the company is considering adding a new subscription model. How would you evaluate its
potential impact on customer lifetime value and revenue?

Managerial Questions:

1. Describe a time when you faced conflicting priorities on a project. How did you manage your workload
to meet deadlines?
2. How would you handle a disagreement within the team on an analytical approach?

Python Questions:

1. Write a Python function to find the longest consecutive sequence of unique numbers in a list.
2. If you’re working with a large dataset with missing values, what Python libraries would you use to
handle missing data, and why?

Technical Interview Questions for Power BI and SQL

1. Can you briefly tell me about yourself, the projects you have worked on, the technology you have
used, and your day-to-day responsibilities in your current project?
2. Suppose there is a large query with multiple joins and logic written within it, which used to execute in
10 to 15 minutes but has recently started taking more than 30 minutes. What would be your approach
to diagnose and resolve this issue? What would you look into within the query to identify what might
have changed?
3. Can you explain the types of indexes available in SQL and how they work? How many types of
indexes are supported by SQL?
4. What are DDL, DML and DQL commands in the SQL?
5. Suppose there is an organization with an employee table containing employee ID, employee name,
department, and salary. How would you find the 5th highest salary in that table?
6. Difference between Rank and Dense Rank function?
7. How do you calculate the number of years an employee has spent in the current organization using
DAX in Power BI?
8. Suppose you have two fact tables, Fact A and Fact B, and three dimension tables, Dimension A,
Dimension B, and Dimension C. Fact A is connected to all three dimension tables, and Fact B is also
connected to the same three dimension tables. How can you establish a relationship between Fact A and
Fact B?
9. Difference between Calculated Column and Measure?
10. Difference between Table and Matrix in Power BI?
11. If I have 2 tables in data modeling in power bi and relationship between them is many to many how i
can avoid that?
12. Explain me the scenario and please provide me the best optimized solution to handle this. Let's say
you have 100 excel files, each and every Excel file is meant for each and everyday sales data or order
data?
13. How many types of Active Relationship in Power BI?
14. Can you please tell me why we would used dateadd function Power BI?
15. Can you tell me the SUM and SUMX difference in Power BI?
16. Have you used custom visual charts?
17. What is difference between Direct Query and Live Connection?
18. Are you aware about Microsoft Fabric?
19. Major challenges you have faced in current Power BI Projects?
20. How you have used Power BI Service for what purpose?
21. How you deployed report from one environment to Another?
22. Can you please tell me workspace management?
23. What are the different versions of Power BI?
24. Have you ever faced a failure due to data type errors? Can you share any experiences related to
this?
25. Have you used Power Automate and Power Apps?
26. When you gathering the requirement from the client what is your go to approach?
27. Have you worked with data flows?
28. What kind of changes you are doing when once report back to your end from client?
29. What types of transformations have you applied in your reports or data models?
30. Can you name some important KPIs for the insurance and sales domains

PwC Data Analyst Interview Experience (1-3 years):

SQL Questions:

1. Write a query to find the cumulative revenue by month for each product category in a sales table.
2. How would you retrieve the top 5 products by sales volume, excluding any products that had zero
sales in the past 3 months?
3. Given a table of customer transactions, identify all customers who made purchases in two or more
consecutive months.
4. Write a query to calculate the retention rate of users on a monthly basis.
5. Find the nth highest salary from an employee table, where n is a parameter passed dynamically to
the query.
6. Explain how indexing works in SQL and how to decide which columns should be indexed for optimal
performance.
7. Describe the differences between LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN and when to use each
one in a complex query.
8. What is the difference between HAVING and WHERE clauses in SQL, and when would you use each?

Python Questions:

1. Write a Python function to find all unique pairs of integers in a list that sum up to a given target value.
2. Given a string, write a function to check if it’s a palindrome, ignoring spaces, punctuation, and case
sensitivity.
3. Explain the difference between deep copy and shallow copy in Python. When would you use each?
4. What are decorators in Python, and how do they work? Provide an example of a scenario where a
decorator would be useful.

Guesstimate Questions:
1. Estimate the number of smartphones sold in India annually.
2. How would you estimate the daily revenue generated by roadside tea stalls across India?

Microsoft Data Analyst Interview Experience(1-5 years) :


CTC - 35 LPA

SQL Questions:

1️⃣Most Popular Client ID:


Select the most popular client_id based on a count of the number of users who have at least 50% of
their events from the following list: ‘video call received’, ‘video call sent’, ‘voice call received’, ‘voice call
sent’.

2️⃣Desktop-Only Users:
Write a query that returns the company (customer_id column) with the highest number of users that use
desktop only.

3️⃣Bottom Companies by Mobile Usage:


Write a query that returns a list of the bottom 2 companies by mobile usage. Company is defined in the
customer_id column. Mobile usage is defined as the number of events registered on a client_id ==
'mobile'. Order the result by the number of events ascending. In the case where there are multiple
companies tied for the bottom ranks (rank 1 or 2), return all the companies. Output the customer_id and
number of events.

4️⃣Exclusive Users per Client:


Write a query that returns a number of users who are exclusive to only one client. Output the client_id
and number of exclusive users.

5️⃣Unique Users per Client per Month:


Write a query that returns the number of unique users per client per month.

6️⃣Monthly User Share (New vs. Existing):


Calculate the share of new and existing users for each month in the table. Output the month, share of
new users, and share of existing users as a ratio. New users are defined as users who started using
services in the current month (there is no usage history in previous months). Existing users are users
who used services in the current month but also used services in any previous month. Assume that the
dates are all from the year 2020.
Table Structure
• id: int64
• time_id: datetime64[ns]
• user_id: object
• customer_id: object
• client_id: object
• event_type: object
• event_id: int64

Hello everyone!

Power BI consultant Question asked in EY for Associate level

1. What are the different types of filters available in Power BI?


2. Explain the difference between calculated columns and measures in Power BI.
3. How can you improve the performance of Power BI reports and dashboards?
4. What is the purpose of relationships in Power BI, and how do you create them?
5. How do you use DAX for advanced calculations? Can you explain some commonly used DAX
functions?
6. How would you handle a many-to-many relationship in Power BI?
7. Explain the difference between Power Query and Power Pivot.
8. How do you handle data transformation in Power BI using Power Query Editor?
9. How do you deal with missing or incomplete data in Power BI?
10. How would you design an interactive dashboard for a client with a large dataset?
11. Can you walk me through your process for selecting the appropriate visual for a specific type of
data?
12. How do you implement drill-through functionality in Power BI?
13. How do you integrate Power BI with other Microsoft services like Azure or Excel?
14. Have you worked on Power BI service? How do you schedule data refreshes in Power BI Service?
15. Can you describe a challenging project you’ve worked on in Power BI and how you handled it?
16. How do you gather and understand business requirements for a Power BI project?
17. How do you explain technical concepts of Power BI to non-technical stakeholders?
18. How do you ensure data security and privacy when working with sensitive client data?

Microsoft 9 Data Analyst Interview Experience(1-5 years):

CTC-35 LPA

SQL Questions:

1 Most Popular Client ID:

Select the most popular client_id based on a count of the number of users who have at least 50% of
their events from the following list: 'video call received', 'video call sent', 'voice call received', 'voice call
sent'.
2 Desktop-Only Users:

Write a query that returns the company (customer_id column) with the highest number of users that use
desktop only.

3 Bottom Companies by Mobile Usage:

Write a query that returns a list of the bottom 2 companies by mobile usage. Company is defined in the
customer_id column. Mobile usage is defined as the number of events registered on a client_id ==
'mobile'. Order the result by the number of events ascending. In the case where there are multiple
companies tied for the bottom ranks (rank 1 or 2), return all the companies. Output the customer_id and
number of events.

4 Exclusive Users per Client:

Write a query that returns a number of users who are exclusive to only one client. Output the client_id
and number of exclusive

5 Unique Users per Client per Month: Write a query that returns the number of unique users per client
per month.

6 Monthly User Share (New vs. Existing): Calculate the share of new and existing users for each month
in the table. Output the month, share of new users, and share of existing users as a ratio. New users are
defined as users who started using services in the current month (there is no usage history in previous
months). Existing users are users who used services in the current month but also used services in any
previous month. Assume that the dates are all from the year 2020.

Table Structure

• id: int64

• time_id: datetime64[ns]

• user_id: object

• customer_id: object

• client_id: object

⚫ event_type: object

⚫ event_id: int64

Recent Interview Question for Data Analyst Role at Deloitte :-

Question 1)
What is the difference between a KPI (Key Performance Indicator) and a dimension?

KPI (Key Performance Indicator):


This is a measurable value that shows how well a company is achieving its key business objectives.
Think of KPIs as the important numbers that tell you how well you're doing. For example, if you're
running a sales team, a KPI might be "total sales revenue" or "number of new customers."

Dimension:

This is a characteristic or attribute that you can use to categorize and analyze data. Dimensions provide
context to your KPIs. For example, in the sales team scenario, dimensions might be "sales region,"
"product type," or "salesperson."

In summary, KPIs are the important metrics you're trying to improve, while dimensions are the
categories you use to break down and analyze those metrics.(simple text data)

You might also like