0% found this document useful (0 votes)
10 views5 pages

Interview Questions

The document outlines various SQL concepts, including the use of WHERE and HAVING clauses, CTEs, and joins, along with practical examples using sample data. It also discusses how to calculate pass percentages, create views, and rank test results using SQL and Python with pandas. Additionally, it touches on Power BI topics such as data sources, report development, and the differences between measures and calculated columns.

Uploaded by

gupta.ayushi2425
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views5 pages

Interview Questions

The document outlines various SQL concepts, including the use of WHERE and HAVING clauses, CTEs, and joins, along with practical examples using sample data. It also discusses how to calculate pass percentages, create views, and rank test results using SQL and Python with pandas. Additionally, it touches on Power BI topics such as data sources, report development, and the differences between measures and calculated columns.

Uploaded by

gupta.ayushi2425
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as TXT, PDF, TXT or read online on Scribd

Introduction

Technical skillset and projects worked on


Role and Responsibilities in current project

SQL
-----------------------

1. Difference WHERE and HAVING Clause


2. Use of CTE

Sample Data
---------------

Sno Test Station


1 A
2 B
3 C
4 D
5 A

Test Station Status


A Green
B Green
C Red

1. Give output of left join, inner join on above data


2. Which one will take less runtime

Sample Data - Sno is unique number of each device

Sno Measurement_Name Measure_value time_stamp


1 Left Speaker 10 10:00AM
1 Left Speaker 15 11:00 AM
1 Left Speaker 18 12:30 PM
2 Left Speaker 12 10:00AM
3 Left Speaker 16 11:00 AM
4 Left Speaker 15 12:30 PM
5 Left Speaker 10 10:00AM
5 Left Speaker 9 11:00 AM
5 Left Speaker 7 12:30 PM

1. Add a column "Test_result" which will contain values "Pass", "Fail" based on the
condition if Measure_Value>15 then "Pass"
Select [Link], s.Measurement_Name, s.Measure_value, s.time_stamp,
Case when s.Measure_value>15 then 'Pass' else 'Fail' end as Test_Result
from SampleData s

2. Fetch the last test result data for each device based on time_stamp.
With CTE as (
Select Sno, max(time_stamp) as last_timestamp
from SampleData
group by Sno
)
Select [Link], s.Measurement_Name, s.Measure_value, s.time_stamp,
Case when s.Measure_value>15 then 'Pass' else 'Fail' end as Test_Result
from SampleData s join CTE c on [Link] = [Link] and s.time_stamp = c.last_timestamp
3. Create a view with Pass percentage of each device.
CREATE VIEW PassPercentageView AS
SELECT
Sno,
round((SUM(CASE WHEN Measure_value > 15 THEN 1 ELSE 0 END) * 100.0 /
COUNT(*)),2) AS pass_percentage
FROM
SampleData
GROUP BY
Sno;

4. Use RANK function to create a view in desc order of time_stamp and assign rank
to each device
CREATE VIEW RankedTestResults AS
SELECT
Sno,
Measurement_Name,
Measure_value,
time_stamp,
case when Measure_value>15 then 'Pass'
else 'Fail'
end as Test_Result,
Rank() Over (partition by Sno Order by time_stamp desc) as Ranks
from SampleData

import pandas as pd

# Sample data as a list of dictionaries


data = {
'Sno': [1, 1, 1, 2, 3, 4, 5, 5, 5],
'Measurement_Name': ['Left Speaker'] * 9,
'Measure_value': [10, 15, 18, 12, 16, 15, 10, 9, 7],
'time_stamp': ['2024-10-11 [Link]', '2024-10-11 [Link]', '2024-10-11
[Link]',
'2024-10-11 [Link]', '2024-10-11 [Link]', '2024-10-11
[Link]',
'2024-10-11 [Link]', '2024-10-11 [Link]', '2024-10-11
[Link]']
}

# Convert the data to a pandas DataFrame


df = [Link](data)

# Convert time_stamp to datetime format


df['time_stamp'] = pd.to_datetime(df['time_stamp'])

-----------------------------------------------------------------------------------
------
# 1. Add a "Test_result" column
-----------------------------------------------------------------------------------
------
df['Test_result'] = df['Measure_value'].apply(lambda x: 'Pass' if x > 15 else
'Fail')
print("Data with Test_result column:")
print(df)
-----------------------------------------------------------------------------------
------
# 2. Fetch the last test result data for each device based on time_stamp
-----------------------------------------------------------------------------------
------
df_last_test = df.sort_values(by='time_stamp').groupby('Sno').last().reset_index()
print("\nLast test result for each device:")
print(df_last_test)

-----------------------------------------------------------------------------------
------
# 3. Calculate the pass percentage for each device
-----------------------------------------------------------------------------------
------
pass_percentage_df = [Link]('Sno').apply(
lambda x: round((x['Test_result'] == 'Pass').sum() * 100.0 / len(x), 2)
).reset_index(name='pass_percentage')
print("\nPass percentage for each device:")
print(pass_percentage_df)

-----------------------------------------------------------------------------------
------
# 4. Rank the test results for each device based on time_stamp
-----------------------------------------------------------------------------------
------
df['rank'] = [Link]('Sno')['time_stamp'].rank(ascending=False, method='dense')
print("\nRanked test results based on time_stamp:")
print(df)

-----------------------------------------------------------------------------------
-------------------------------------

Sample Data

Sno Measurement Name Measure value time stamp Date Test Station
11 Left Speaker 10 10:00AM 1-Jan A
11 Left Speaker 15 11:00 AM 1-Jan B
11 Left Speaker 18 12:30 PM 1-Jan A
12 Left Speaker 12 10:00AM 1-Jan B
13 Left Speaker 16 11:00 AM 1-Jan A
14 Left Speaker 15 12:30 PM 1-Jan A
15 Left Speaker 10 10:00AM 1-Jan A
15 Left Speaker 9 11:00 AM 1-Jan B
15 Left Speaker 7 12:30 PM 1-Jan B
1 Left Speaker 10 10:00AM 2-Jan A
1 Left Speaker 15 11:00 AM 2-Jan C
1 Left Speaker 18 12:30 PM 2-Jan G
2 Left Speaker 12 10:00AM 2-Jan A
3 Left Speaker 16 11:00 AM 2-Jan A
4 Left Speaker 15 12:30 PM 2-Jan D
5 Left Speaker 10 10:00AM 2-Jan H
5 Left Speaker 9 11:00 AM 2-Jan I
5 Left Speaker 7 12:30 PM 2-Jan B

1. Fetch test pass percentage of each test station


Select Test_Station,
sum(Case when Measure_value>15 then 1 else 0 end) *100/count(*) as
Test_Result_Percentage
from measure
group by Test_Station

2. Highest pass pecentage test station


SELECT
Test_Station,
ROUND(
(SUM(CASE WHEN Measure_value > 15 THEN 1 ELSE 0 END) * 100.0) / COUNT(*),
2
) AS pass_percentage
FROM
measurements
GROUP BY
Test_Station
ORDER BY
pass_percentage DESC
LIMIT 1; -- For MySQL

SELECT TOP 1
Test_Station,
ROUND(
(SUM(CASE WHEN Measure_value > 15 THEN 1 ELSE 0 END) * 100.0) /
NULLIF(COUNT(*), 0),
2
) AS pass_percentage
FROM
measurements
GROUP BY
Test_Station
ORDER BY
pass_percentage DESC;

With CTE as
(Select Test_Station,
sum(Case when Measure_value>15 then 1 else 0 end) *100/count(*) as
Test_Result_Percentage
from measure
group by Test_Station)

Select Test_Station, max(Test_Result_Percentage) from CTE


group by Test_Station

-----------------------------------------------------------------------------------
----------------

import pandas as pd

# Sample data
data = {
'Sno': [11, 11, 11, 12, 13, 14, 15, 15, 15, 1, 1, 1, 2, 3, 4, 5, 5, 5],
'Measurement Name': ['Left Speaker']*18,
'Measure value': [10, 15, 18, 12, 16, 15, 10, 9, 7, 10, 15, 18, 12, 16, 15, 10,
9, 7],
'time stamp': ['10:00AM', '11:00 AM', '12:30 PM', '10:00AM', '11:00 AM', '12:30
PM', '10:00AM', '11:00 AM', '12:30 PM',
'10:00AM', '11:00 AM', '12:30 PM', '10:00AM', '11:00 AM', '12:30
PM', '10:00AM', '11:00 AM', '12:30 PM'],
'Date': ['1-Jan']*9 + ['2-Jan']*9,
'Test Station': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'B', 'A', 'C', 'G',
'A', 'A', 'D', 'H', 'I', 'B']
}

# Creating the DataFrame


df = [Link](data)

df["test_pass_fail"] = df["Measure value"].apply(lambda x : "Pass" if x>15 else


"Fail")
[Link](5)

-----------------------------------------------------------------------------------
---------------------------
pass_percent_stat = [Link]('Test Station').agg(total_tests=('test_pass_fail',
'size'),
pass_tests=('test_pass_fail', lambda
x: (x == 'Pass').sum()))
pass_perecnt = pass_percent_stat['pass_tests']/pass_percent_stat['total_tests']*100
pass_percent_stat['pass_percentage'] = pass_perecnt
pass_percent_stat
-----------------------------------------------------------------------------------
------------------------------

pass_percentage_df = [Link]('Test Station').apply(


lambda x: round((x['test_pass_fail'] == 'Pass').sum() * 100.0 / len(x), 2)
).reset_index(name='pass_percentage')
print("\nPass percentage for each device:")
print(pass_percentage_df)

-----------------------------------------------------------------------------------
-----------------------------

pass_highest_percent = pass_percentage_df.sort_values(by='pass_percentage',
ascending=False)
pass_highest_percent.head(1)

-----------------------------------------------------------------------------------
---------------------------------
PYTHON
-------------
1. Implement above scenarios in python using Pandas
2. Some basic theoretical question on python

POWER BI
------------------
1. Various data sources and how can we connect those
2. How to develop a report from scratch (steps)
3. Difference between Measure and Calculated column
4. Which will take less time in loading Measure or Calculated Column

You might also like