Amazon Data Analyst Interview Prep
Amazon Data Analyst Interview Prep
A. SQL:
Write an SQL query to find the name of the employee with the highest salary in each location.
Write an SQL query to calculate the total order amount for each customer who joined in the current
year. The output should contain Customer_Name and the total amount.
B. Python:
1. Basic oral questions on NumPy (e.g., array creation, slicing, broadcasting) and Matplotlib (e.g., plot
types, customization).
2. Basic oral questions on pandas (like: groupby, loc/iloc, merge & join, etc.)
2. Write the code in NumPy and Pandas to replicate the functionality of your answer to the second SQL
question.
- Describe a situation where you had to make a quick decision with limited information. How did you
proceed, and what was the outcome?
- Can you share an example of a project where you had to delve deeply into the data to uncover insights
or solve a problem? What steps did you take, and what were the results?
- Tell us about a time when you went above and beyond to meet a customer's needs or expectations.
How did you identify their requirements, and what actions did you take to deliver exceptional service?
D. Excel:
Questions on advanced functions like VLOOKUP, XLookup, SUMPRODUCT, INDIRECT, TEXT functions,
SUMIFS, COUNTIFS, LOOKUPS, INDEX & MATCH, AVERAGEIFS. Plus, some basic questions on pivot tables,
conditional formatting, data validation, and charts.
Amazon interview questions for Data Analyst 2024.
1. How would you retrieve the second highest salary from a table called Employees without using LIMIT
or TOP?
2. Write a query to display employees who joined in the same month but in different years from the
Employees table.
3. Given two tables, Orders and Customers, write a query to find all customers who placed more than
five orders in the last year.
4. How would you update the Department column in the Employees table based on a matching
EmployeeID in another table called Departments?
5. Write a SQL query to find the total sales for each product category, but exclude categories with total
sales less than a specific threshold (e.g., $10,000).
6. How would you create a Python function that reads a large CSV file in chunks and processes each
chunk for data cleaning?
7. Write a Python script that takes a list of employee records, filters out records where the salary is
below a certain value, and writes the filtered records to a new file.
8. How would you handle missing values in a dataset using pandas, and how would you decide which
method to use (e.g., mean imputation vs. forward fill)?
9. Given a list of numbers, write a Python program to group the numbers into ranges (e.g., 1-10, 11-20)
and count the number of elements in each range.
10. How would you connect to an SQL database using Python and fetch data into a pandas DataFrame
for analysis?
11. How would you use VLOOKUP or INDEX-MATCH to find a value in one Excel sheet and return
corresponding information from another sheet?
12. Write an Excel formula that calculates the weighted average of a set of numbers, where the weights
are stored in another column.
13. How would you create a dynamic Excel dashboard that allows users to filter data by multiple criteria
and display the results visually (e.g., via charts or pivot tables)?
14. Explain how you would use Excel Solver to optimize a product mix for maximizing profit under given
constraints.
15. How can you create a measure that calculates the running total of sales over time, and how would
you display it in a line chart?
16. How would you use Power Query to clean and transform a dataset, such as removing duplicates,
splitting columns, and filtering rows based on conditions?
17. Describe a scenario where you would need to use Merge Queries in Power Query, and how would
you do it?
18. How can you create a custom tooltip in Power BI to show additional information when a user hovers
over a visual?
Write an SQL query to find the top 5 customers who have made the highest total transactions in terms
of the amount. Display the customer’s name and their total transaction amount, sorted in descending
order of total transaction amount.
Schema:
customers(customer_id, customer_name)
transactions(transaction_id, customer_id, transaction_amount, transaction_date)
Solution:
SELECT c.customer_name,
SUM(t.transaction_amount) AS total_spent
FROM customers c
JOIN transactions t
ON c.customer_id = t.customer_id
GROUP BY c.customer_name
ORDER BY total_spent DESC
LIMIT 5;
Explanation:
- The query first joins the customers and transactions tables on the customer_id.
- Then, it calculates the total transaction amount (SUM) for each customer using GROUP BY on
customer_name.
- It orders the result by the total spent in descending order (ORDER BY total_spent DESC).
- Finally, it limits the result to the top 5 customers.
This type of SQL problem helps in analyzing transactional data, which is crucial for roles like Data
Analysts in banks to monitor customer behavior and financial trends.
Top Scenario-Based Questions & Answers for Data analyst
1. Scenario: You are managing a SQL database for an e-commerce platform. The "Products" table
includes a column for "ProductCategory," and the "Orders" table records each sale. Your task is to
identify the top 3 best-selling product categories.
Question:
Write a SQL query to find the top 3 best-selling product categories based on the number of orders.
Expected Answer:
SELECT ProductCategory, COUNT(*) AS TotalOrders
FROM Orders
JOIN Products ON Orders.ProductID = Products.ProductID
GROUP BY ProductCategory
ORDER BY TotalOrders DESC
LIMIT 3;
2. Scenario: You are working with a SQL database that stores sales data. The database has a table called
"Sales" that records the sale date and amount for each transaction. Your task is to calculate the average
monthly sales for the current year.
Question:
Write a SQL query to calculate the average monthly sales for the current year.
Expected Answer:
SELECT MONTH(SaleDate) AS SaleMonth, AVG(SaleAmount) AS AvgMonthlySales
FROM Sales
WHERE YEAR(SaleDate) = YEAR(GETDATE())
GROUP BY MONTH(SaleDate);
4. Can you explain what SQL is and how it is used in data analysis?
- SQL (Structured Query Language) is a programming language designed for managing and
manipulating relational databases. It allows analysts to query data, perform operations like filtering and
aggregating, and retrieve specific information from large datasets.
10. How do you ensure the accuracy and integrity of your data analysis?
- Use multiple data sources to cross-verify information.
- Implement data validation techniques during data collection.
- Regularly audit and review datasets for inconsistencies.
- Document my analysis process for transparency
#confidential interview questions for data analyst
SQL:
1. What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN?
2. How do you use GROUP BY and HAVING in SQL queries?
3. What is a CROSS JOIN, and how does it differ from other joins?
4. How can you optimize SQL queries for better performance?
5. Explain the difference between WHERE and HAVING clauses.
Excel:
1. How do you use Pivot Tables in Excel, and what are their key benefits?
2. Explain the VLOOKUP and HLOOKUP functions. How are they different?
3. How would you use Excel to find duplicate entries in a dataset?
4. What are Excel’s conditional formatting capabilities, and how do they work?
5. How do you handle large datasets in Excel without affecting performance?
Power BI:
1. What is the difference between a calculated column and a measure in Power BI?
2. How do you perform data modeling in Power BI, and why is it important?
3. Explain the importance of DAX (Data Analysis Expressions) in Power BI. Can you give an example?
4. How do you implement row-level security in Power BI?
5. What are the different types of visualizations available in Power BI, and when would you use each?
Python:
1. How would you use Python for data cleaning? Provide examples using pandas.
2. What are the common data structures in Python used for data manipulation (e.g., lists, dictionaries,
DataFrames)?
3. Explain the difference between loc[] and iloc[] in Pandas.
4. How can you handle missing values in a dataset using Python?
5. What is the difference between apply(), map(), and applymap() in Pandas?
1. Tell me about yourself and your roles & responsibilities at your current workplace.
3. Can you write a query to find values in a text column that begin with a specific letter?
4. What is the purpose of a Common Table Expression (CTE) and how does it differ from subqueries?
5. Given a table with two columns (Name, Skills), how would you find candidates best suited for a Data
Analyst role that requires proficiency in SQL, Power BI, and Python?
6. We discussed scenario-based questions related to data visualization tools like Power BI:
a. What are some advantages of using Power BI in a large enterprise?
b. How would you handle report sharing and collaboration in Power BI Service?
7. Can you explain the difference between inner join and left join in SQL with an example?
Question 3)
What is the difference between a KPI (Key Performance Indicator) and a dimension?
This is a measurable value that shows how well a company is achieving its key business objectives.
Think of KPIs as the important numbers that tell you how well you're doing. For example, if you're
running a sales team, a KPI might be "total sales revenue" or "number of new customers."
Dimension:
This is a characteristic or attribute that you can use to categorize and analyze data. Dimensions provide
context to your KPIs. For example, in the sales team scenario, dimensions might be "sales region,"
"product type," or "salesperson."
In summary, KPIs are the important metrics you're trying to improve, while dimensions are the
categories you use to break down and analyze those metrics.(simple text data)
1️⃣How would you implement a self-service BI solution using Power BI for a large enterprise client? What
considerations would you keep in mind?
2️⃣Explain the process of setting up and managing a Power BI Premium capacity. How does it differ from
shared capacity?
3️⃣How would you approach data governance and compliance in a Power BI implementation?
4️⃣Describe the process of implementing automated machine learning (Auto ML) in Power BI. What are its
limitations and best use cases?
5️⃣Explain the concept of composite models in Power BI. How can they be used to combine DirectQuery
and Import mode data sources?
6️⃣How would you implement a multi-language solution in Power BI for a global organization?
7️⃣Describe the process of creating and using Power BI dataflows for ETL processes. How do they compare
to traditional ETL tools?
8️⃣How would you approach performance tuning for a Power BI report that uses DirectQuery mode with a
large SQL Server database?
9️⃣Explain the concept of dynamic row-level security (RLS) in Power BI. How would you implement it for a
complex organizational hierarchy?
🔟How would you design a Power BI solution that integrates with Azure Synapse Analytics for big data
processing?
1️⃣1️⃣Describe the process of implementing paginated reports in Power BI. When would you choose
paginated reports over standard Power BI reports?
Hello Everyone,
Today I'm sharing the interview questions asked for Power BI role in Infosys company.
SQL
POWER BI
PYTHON
1. Create a dictionary, add elements to it, modify an element, and then print the dictionary in
alphabetical order of keys.
2. Find unique values in a list of assorted numbers and print the count of how many times each value is
repeated.
3. Find and print duplicate values in a list of assorted numbers, along with the number of times each
value is repeated.
Apart from
Here are the top 5 interview questions from Amazon for your practice..
𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Given a table of product reviews, calculate the average review rating for each
product for every month. The data is in a 𝑟𝑒𝑣𝑖𝑒𝑤𝑠 table, which includes 𝑟𝑒𝑣𝑖𝑒𝑤_𝑖𝑑, 𝑢𝑠𝑒𝑟_𝑖𝑑,
𝑠𝑢𝑏𝑚𝑖𝑡_𝑑𝑎𝑡𝑒, 𝑝𝑟𝑜𝑑𝑢𝑐𝑡_𝑖𝑑 and 𝑠𝑡𝑎𝑟𝑠.
𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Amazon handles massives datasets, and optimizing SQL queries is crucial for
performance. Discuss various methods to optimize a slow SQL query.
3. 𝑆𝑄𝐿 𝑐𝑜𝑛𝑠𝑡𝑟𝑎𝑖𝑛𝑡𝑠
𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Explain SQL constraints and provide examples of different types of constraints
used to enforce data integrity in databases.
4. 𝐻𝑖𝑔ℎ𝑒𝑠𝑡-𝐺𝑟𝑜𝑠𝑠𝑖𝑛𝑔-𝐼𝑡𝑒𝑚𝑠
𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Find the top two highest grossing-products in each category for the year 2022
from a table 𝑝𝑟𝑜𝑑𝑢𝑐𝑡_𝑠𝑝𝑒𝑛𝑑. The table contains 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑦, 𝑝𝑟𝑜𝑑𝑢𝑐𝑡, 𝑢𝑠𝑒𝑟_𝑖𝑑, 𝑠𝑝𝑒𝑛𝑑 and
𝑡𝑟𝑎𝑛𝑠𝑎𝑐𝑡𝑖𝑜𝑛_𝑑𝑎𝑡𝑒. The output should include the category, product and total spend.
𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭: Explain the difference between RANK() and DENSE_RANK() functions in SQL.
⏩ How can you use Power BI dataflows to improve the overall architecture of your BI solution?
⏩ Explain how you would implement incremental refresh in Power BI. What are the requirements and
limitations?
⏩ Describe the process of creating and using calculation groups in Power BI. What are their advantages?
⏩ How would you approach version control and CI/CD for Power BI reports in an enterprise setting?
⏩ Explain the concept of query folding in Power Query. How can you ensure your transformations are
being folded back to the source?
DAX Scenarios:
⏩ Create a measure that shows the top N customers by sales, with all others grouped into an "Others"
category?
⏩ Describe the process of creating and using data warehouses in Microsoft Fabric. How does this
integrate with Power BI?
⏩ How does Microsoft Fabric handle data governance and security? Discuss features like data lineage
and access controls?
⏩ Explain the concept of OneLake in Microsoft Fabric. How does it simplify data management across
different analytics workloads?
⏩ How would you migrate an existing Power BI solution to leverage Microsoft Fabric capabilities?
⏩ Discuss the advantages and potential challenges of using Microsoft Fabric in an enterprise BI
ecosystem?
Interview questions asked for Data Analyst role in Cognizant
Question1) Explain the DAX measures you would use to create a rolling 12-month average in Power BI,
considering potential fiscal year differences?
It shows the average of something (like sales) for the past 12 months. Example: In December, you show
the average from January to December.
In Power BI, we use DAX formulas to do calculations. Two important functions for this are:
DATESINPERIOD: Looks back 12 months from a specific date.
AVERAGEX: Calculates the average for those 12 months.
It looks at the last 12 months of data starting from the latest date. It calculates the average of what you
care about, like sales or revenue.
Some companies start their fiscal year in a different month (e.g., April instead of January). If so, make
sure your calendar in Power BI reflects this: Create or use a custom calendar that starts in the correct
fiscal month (e.g., April). Adjust the start date in the formula to match your fiscal year.
Affine | Data Analyst | Experience: 5 years | Excel, SQL and Power BI | hashtag#Round1
Q1. Can you explain the significance of the dollar sign ($) in the VLOOKUP function?
Q2. How do you create a pivot table in Excel, and what is its importance?
Q3. What is the difference between calculated columns and measures in Power BI?
Q4. Could you describe the DAX formulas you used in your most recent project and their practical
applications?
Q5. What types of data sources have you connected to in Power Bl, and can you walk me through the
process?
Q6. What are the key differences between Import, Direct Query, and Live Connection modes in Power
BI?
Q7. What are ad hoc reporting requirements, and how do you approach them?
Q8. Can you explain a scenario where you ensured thorough testing while handling ad hoc requests?
Q9. What tools have you used when working on ad hoc reporting requirements?
Q10. What are the limitations or challenges of working with ad hoc reporting requests?
Q11. How would you write a SQL query to find the nth highest salary of an employee without using the
LIMIT clause?
Q12. Write a SQL query to retrieve even-numbered employee IDs from the employee table.
Q13. Why would you use the DENSE_RANK() function instead of RANK() when finding the nth highest
salary?
Q14. What is Row-Level Security (RLS) in Power BI, what are its types, and how do you implement it?
Q16. Can you explain the purpose of the CONCAT function in Excel and provide an example of its usage?
𝗦𝗤𝗟
- How do you write a query to find duplicate rows in a table?
- How would you perform a left join and filter out nulls in SQL?
- What is a window function in SQL, and how do you use it for ranking data?
- How do you calculate the cumulative sum for a column in SQL?
- What is the difference between UNION and UNION ALL in SQL?
𝗣𝘆𝘁𝗵𝗼𝗻
- How do you import a CSV file into a pandas DataFrame, and how would you handle missing data?
- How do you use list comprehensions to filter and transform data in Python?
- What are the differences between the apply() and map() functions in pandas?
- How do you visualize data using matplotlib or seaborn in Python?
- How do you write a function to calculate the correlation between two numerical columns in a pandas
DataFrame?
𝗘𝘅𝗰𝗲𝗹
- How would you use VLOOKUP or XLOOKUP to merge data between two Excel sheets?
- What is the difference between absolute and relative cell references, and when would you use each?
- How do you create a pivot table, and what types of data analysis can you perform with it?
- How would you use conditional formatting to highlight cells that meet certain criteria?
- How do you use the IF, AND, and OR functions together to create complex logical tests?
𝗣𝗼𝘄𝗲𝗿 𝗕𝗜
- How would you create and customize a calculated column in Power BI?
- What is the difference between a slicer and a filter in Power BI, and when would you use each?
- How do you create relationships between tables in Power BI, and how do they impact your data model?
- How would you set up row-level security (RLS) to control access to sensitive data in Power BI?
- What is the purpose of DAX functions like CALCULATE and FILTER, and how do you use them?
𝗧𝗮𝗯𝗹𝗲𝗮𝘂
- How do you create a calculated field in Tableau, and what types of calculations can you perform?
- What is a parameter in Tableau, and how can it be used to create interactive dashboards?
- How do you use a dual-axis chart in Tableau to show multiple measures in the same view?
- How would you optimize a Tableau dashboard for performance when working with large datasets?
- How do you create a custom date filter in Tableau to allow users to select specific date ranges?
Interview Questions for Power BI Developer Role: 1st Round & 2nd Round
1st Round
1. Self Introduction.
2. Discussion on my past project, followed by specific questions:
What was the size of your data model?
What were the major challenges in your project, and how did you
overcome them?
3. Which visuals did you use and why?
4. How did you optimize the performance of your reports?
5. Explain the CALCULATE function in Power BI.
6. Difference between SUMX and SUM in Power BI.
7. Explain the difference between Merge and Append with examples.
8. Difference between DATEPERIOD and DATEADD functions in Power BI.
9. What are Pivot and Unpivot? Provide examples.
10. Difference between a view and a stored procedure, with examples.
11. SQL Scenario: Write a query involving LEFT join and handling NULL values.
12. Example: Write a query to return all employees, including those with no associated departments,
from an Employee and Department table.
13. Explain the difference between a Star Schema and a Snowflake Schema.
14. What is a many-to-many relationship? How do you resolve it in Power BI?
15. Explain the concept of a calculated column and a measure.
16. What is the difference between a table and a matrix visual?
17. How would you handle performance issues in Power BI?
18. What are the different types of filters available in Power BI, and how do they impact data?
2nd Round
1st Round:
Brief self-introduction.
Discussion on my past project, followed by specific questions.
Explain the CALCULATE function in Power BI.
Difference between SUMX and SUM in Power BI.
Explain the difference between Merge and Append with examples.
Difference between DATEPERIOD and DATEADD functions in Power BI.
What are Pivot and Unpivot? Provide examples.
Difference between a view and a stored procedure, with examples.
SQL scenario: Write a query involving LEFT join and handling NULL values.
2nd Round:
2. Discussion on three things: What you did yesterday, What today you will do and Sprint Work
Completion.
3. Advanced ETL Tools like SSIS, Informatica and ADF are used.
hashtag#DataAnalyst
➡ When dealing with large datasets in Power BI, the primary challenge is the size of the data, which can
affect performance, making the report slow to load and refresh.
➡ Managing and visualizing such a vast amount of data requires efficient handling to avoid timeouts and
performance degradation.
➡ One of the strategies I use is to upload a subset of the data into Power BI Desktop initially.
For example, if I have data spanning five years, I might start by uploading only six months of data. This
speeds up the development process on the desktop.
➡ Next, I use the Power Query Editor to filter and aggregate data. This includes removing unnecessary
columns, filtering rows to include only relevant data, and aggregating data at a higher level.
For instance, if detailed transaction data is not necessary, I might aggregate daily sales data to monthly
sales data before loading it into Power BI.
➡ For extremely large datasets, I use DirectQuery mode, which allows Power BI to directly query the
underlying data source without importing the data into the Power BI model.
➡ This keeps the Power BI model lightweight and leverages the processing power of the database
server. However, this requires a well-optimized database and efficient query performance at the source.
➡ Sometimes, I use a combination of Import and DirectQuery modes, known as composite models. This
approach allows for flexibility by importing critical, smaller tables into the Power BI model and using
DirectQuery for larger fact tables.
➡ I ensure that the data model is optimized by creating appropriate relationships and using measures
efficiently.
➡ Reducing the complexity of DAX calculations and ensuring that the model only includes necessary
tables and relationships helps maintain performance.
By employing these strategies, I can manage large datasets efficiently, ensuring that my Power BI
reports are responsive and performant.
I am sharing real interview questions asked in companies nowadays to help you prepare more
practically for interviews.
SQL: 5 questions
Q1: Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() functions using
example.
Q2: Find the nth highest salary from the Employee table.
Q3: You have an employee table with employee ID and manager ID. Find all employees under a specific
manager, including their subordinates at any level.
Q4: Write a query to find the cumulative salary of employees department-wise, who have joined
company in last 30 days.
Q5: Find the top 2 customers with the highest order amount for each product category, handling ties
appropriately. Table: customer (CustomerID, ProductCategory, OrderAmount)
Q3: Explain Any 5 Chart Types and Their Uses in Representing Different Aspects of Data.
Behavioral: 2 Questions
Q1: Why you want to become a data analyst and why did you apply to this company?
Q2: Describe a time when you had to manage a difficult task with tight deadlines. How did you handle
it?
SQL Questions:
1. What are window functions, and how do they differ from aggregate functions? Can you give a use
case?
2. Explain indexing. When would an index potentially reduce performance, and how would you approach
indexing strategy for a large dataset?
3. Write a query to retrieve customers who have made purchases in the last 30 days but did not
purchase anything in the previous 30 days.
4. Given a table of transactions, find the top 3 most purchased products for each category.
5. How would you identify duplicate records in a large dataset, and how would you remove only the
duplicates, retaining the first occurrence?
Guesstimates:
1. Estimate the number of online food delivery orders in a large metropolitan city over a month.
2. How many customer service calls would a telecom company receive daily for a customer base of 1
million?
Case Studies:
1. A sudden decrease in conversion rate is observed in a popular product category. How would you
investigate the cause and propose solutions?
2. Imagine the company is considering adding a new subscription model. How would you evaluate its
potential impact on customer lifetime value and revenue?
Managerial Questions:
1. Describe a time when you faced conflicting priorities on a project. How did you manage your workload
to meet deadlines?
2. How would you handle a disagreement within the team on an analytical approach?
Python Questions:
1. Write a Python function to find the longest consecutive sequence of unique numbers in a list.
2. If you’re working with a large dataset with missing values, what Python libraries would you use to
handle missing data, and why?
1. Can you briefly tell me about yourself, the projects you have worked on, the technology you have
used, and your day-to-day responsibilities in your current project?
2. Suppose there is a large query with multiple joins and logic written within it, which used to execute in
10 to 15 minutes but has recently started taking more than 30 minutes. What would be your approach
to diagnose and resolve this issue? What would you look into within the query to identify what might
have changed?
3. Can you explain the types of indexes available in SQL and how they work? How many types of
indexes are supported by SQL?
4. What are DDL, DML and DQL commands in the SQL?
5. Suppose there is an organization with an employee table containing employee ID, employee name,
department, and salary. How would you find the 5th highest salary in that table?
6. Difference between Rank and Dense Rank function?
7. How do you calculate the number of years an employee has spent in the current organization using
DAX in Power BI?
8. Suppose you have two fact tables, Fact A and Fact B, and three dimension tables, Dimension A,
Dimension B, and Dimension C. Fact A is connected to all three dimension tables, and Fact B is also
connected to the same three dimension tables. How can you establish a relationship between Fact A and
Fact B?
9. Difference between Calculated Column and Measure?
10. Difference between Table and Matrix in Power BI?
11. If I have 2 tables in data modeling in power bi and relationship between them is many to many how i
can avoid that?
12. Explain me the scenario and please provide me the best optimized solution to handle this. Let's say
you have 100 excel files, each and every Excel file is meant for each and everyday sales data or order
data?
13. How many types of Active Relationship in Power BI?
14. Can you please tell me why we would used dateadd function Power BI?
15. Can you tell me the SUM and SUMX difference in Power BI?
16. Have you used custom visual charts?
17. What is difference between Direct Query and Live Connection?
18. Are you aware about Microsoft Fabric?
19. Major challenges you have faced in current Power BI Projects?
20. How you have used Power BI Service for what purpose?
21. How you deployed report from one environment to Another?
22. Can you please tell me workspace management?
23. What are the different versions of Power BI?
24. Have you ever faced a failure due to data type errors? Can you share any experiences related to
this?
25. Have you used Power Automate and Power Apps?
26. When you gathering the requirement from the client what is your go to approach?
27. Have you worked with data flows?
28. What kind of changes you are doing when once report back to your end from client?
29. What types of transformations have you applied in your reports or data models?
30. Can you name some important KPIs for the insurance and sales domains
SQL Questions:
1. Write a query to find the cumulative revenue by month for each product category in a sales table.
2. How would you retrieve the top 5 products by sales volume, excluding any products that had zero
sales in the past 3 months?
3. Given a table of customer transactions, identify all customers who made purchases in two or more
consecutive months.
4. Write a query to calculate the retention rate of users on a monthly basis.
5. Find the nth highest salary from an employee table, where n is a parameter passed dynamically to
the query.
6. Explain how indexing works in SQL and how to decide which columns should be indexed for optimal
performance.
7. Describe the differences between LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN and when to use each
one in a complex query.
8. What is the difference between HAVING and WHERE clauses in SQL, and when would you use each?
Python Questions:
1. Write a Python function to find all unique pairs of integers in a list that sum up to a given target value.
2. Given a string, write a function to check if it’s a palindrome, ignoring spaces, punctuation, and case
sensitivity.
3. Explain the difference between deep copy and shallow copy in Python. When would you use each?
4. What are decorators in Python, and how do they work? Provide an example of a scenario where a
decorator would be useful.
Guesstimate Questions:
1. Estimate the number of smartphones sold in India annually.
2. How would you estimate the daily revenue generated by roadside tea stalls across India?
SQL Questions:
2️⃣Desktop-Only Users:
Write a query that returns the company (customer_id column) with the highest number of users that use
desktop only.
Hello everyone!
CTC-35 LPA
SQL Questions:
Select the most popular client_id based on a count of the number of users who have at least 50% of
their events from the following list: 'video call received', 'video call sent', 'voice call received', 'voice call
sent'.
2 Desktop-Only Users:
Write a query that returns the company (customer_id column) with the highest number of users that use
desktop only.
Write a query that returns a list of the bottom 2 companies by mobile usage. Company is defined in the
customer_id column. Mobile usage is defined as the number of events registered on a client_id ==
'mobile'. Order the result by the number of events ascending. In the case where there are multiple
companies tied for the bottom ranks (rank 1 or 2), return all the companies. Output the customer_id and
number of events.
Write a query that returns a number of users who are exclusive to only one client. Output the client_id
and number of exclusive
5 Unique Users per Client per Month: Write a query that returns the number of unique users per client
per month.
6 Monthly User Share (New vs. Existing): Calculate the share of new and existing users for each month
in the table. Output the month, share of new users, and share of existing users as a ratio. New users are
defined as users who started using services in the current month (there is no usage history in previous
months). Existing users are users who used services in the current month but also used services in any
previous month. Assume that the dates are all from the year 2020.
Table Structure
• id: int64
• time_id: datetime64[ns]
• user_id: object
• customer_id: object
• client_id: object
⚫ event_type: object
⚫ event_id: int64
Question 1)
What is the difference between a KPI (Key Performance Indicator) and a dimension?
Dimension:
This is a characteristic or attribute that you can use to categorize and analyze data. Dimensions provide
context to your KPIs. For example, in the sales team scenario, dimensions might be "sales region,"
"product type," or "salesperson."
In summary, KPIs are the important metrics you're trying to improve, while dimensions are the
categories you use to break down and analyze those metrics.(simple text data)