Introduction
Technical skillset and projects worked on
Role and Responsibilities in current project
SQL
-----------------------
1. Difference WHERE and HAVING Clause
2. Use of CTE
Sample Data
---------------
Sno Test Station
1 A
2 B
3 C
4 D
5 A
Test Station Status
A Green
B Green
C Red
1. Give output of left join, inner join on above data
2. Which one will take less runtime
Sample Data - Sno is unique number of each device
Sno Measurement_Name Measure_value time_stamp
1 Left Speaker 10 10:00AM
1 Left Speaker 15 11:00 AM
1 Left Speaker 18 12:30 PM
2 Left Speaker 12 10:00AM
3 Left Speaker 16 11:00 AM
4 Left Speaker 15 12:30 PM
5 Left Speaker 10 10:00AM
5 Left Speaker 9 11:00 AM
5 Left Speaker 7 12:30 PM
1. Add a column "Test_result" which will contain values "Pass", "Fail" based on the
condition if Measure_Value>15 then "Pass"
Select [Link], s.Measurement_Name, s.Measure_value, s.time_stamp,
Case when s.Measure_value>15 then 'Pass' else 'Fail' end as Test_Result
from SampleData s
2. Fetch the last test result data for each device based on time_stamp.
With CTE as (
Select Sno, max(time_stamp) as last_timestamp
from SampleData
group by Sno
)
Select [Link], s.Measurement_Name, s.Measure_value, s.time_stamp,
Case when s.Measure_value>15 then 'Pass' else 'Fail' end as Test_Result
from SampleData s join CTE c on [Link] = [Link] and s.time_stamp = c.last_timestamp
3. Create a view with Pass percentage of each device.
CREATE VIEW PassPercentageView AS
SELECT
Sno,
round((SUM(CASE WHEN Measure_value > 15 THEN 1 ELSE 0 END) * 100.0 /
COUNT(*)),2) AS pass_percentage
FROM
SampleData
GROUP BY
Sno;
4. Use RANK function to create a view in desc order of time_stamp and assign rank
to each device
CREATE VIEW RankedTestResults AS
SELECT
Sno,
Measurement_Name,
Measure_value,
time_stamp,
case when Measure_value>15 then 'Pass'
else 'Fail'
end as Test_Result,
Rank() Over (partition by Sno Order by time_stamp desc) as Ranks
from SampleData
import pandas as pd
# Sample data as a list of dictionaries
data = {
'Sno': [1, 1, 1, 2, 3, 4, 5, 5, 5],
'Measurement_Name': ['Left Speaker'] * 9,
'Measure_value': [10, 15, 18, 12, 16, 15, 10, 9, 7],
'time_stamp': ['2024-10-11 [Link]', '2024-10-11 [Link]', '2024-10-11
[Link]',
'2024-10-11 [Link]', '2024-10-11 [Link]', '2024-10-11
[Link]',
'2024-10-11 [Link]', '2024-10-11 [Link]', '2024-10-11
[Link]']
}
# Convert the data to a pandas DataFrame
df = [Link](data)
# Convert time_stamp to datetime format
df['time_stamp'] = pd.to_datetime(df['time_stamp'])
-----------------------------------------------------------------------------------
------
# 1. Add a "Test_result" column
-----------------------------------------------------------------------------------
------
df['Test_result'] = df['Measure_value'].apply(lambda x: 'Pass' if x > 15 else
'Fail')
print("Data with Test_result column:")
print(df)
-----------------------------------------------------------------------------------
------
# 2. Fetch the last test result data for each device based on time_stamp
-----------------------------------------------------------------------------------
------
df_last_test = df.sort_values(by='time_stamp').groupby('Sno').last().reset_index()
print("\nLast test result for each device:")
print(df_last_test)
-----------------------------------------------------------------------------------
------
# 3. Calculate the pass percentage for each device
-----------------------------------------------------------------------------------
------
pass_percentage_df = [Link]('Sno').apply(
lambda x: round((x['Test_result'] == 'Pass').sum() * 100.0 / len(x), 2)
).reset_index(name='pass_percentage')
print("\nPass percentage for each device:")
print(pass_percentage_df)
-----------------------------------------------------------------------------------
------
# 4. Rank the test results for each device based on time_stamp
-----------------------------------------------------------------------------------
------
df['rank'] = [Link]('Sno')['time_stamp'].rank(ascending=False, method='dense')
print("\nRanked test results based on time_stamp:")
print(df)
-----------------------------------------------------------------------------------
-------------------------------------
Sample Data
Sno Measurement Name Measure value time stamp Date Test Station
11 Left Speaker 10 10:00AM 1-Jan A
11 Left Speaker 15 11:00 AM 1-Jan B
11 Left Speaker 18 12:30 PM 1-Jan A
12 Left Speaker 12 10:00AM 1-Jan B
13 Left Speaker 16 11:00 AM 1-Jan A
14 Left Speaker 15 12:30 PM 1-Jan A
15 Left Speaker 10 10:00AM 1-Jan A
15 Left Speaker 9 11:00 AM 1-Jan B
15 Left Speaker 7 12:30 PM 1-Jan B
1 Left Speaker 10 10:00AM 2-Jan A
1 Left Speaker 15 11:00 AM 2-Jan C
1 Left Speaker 18 12:30 PM 2-Jan G
2 Left Speaker 12 10:00AM 2-Jan A
3 Left Speaker 16 11:00 AM 2-Jan A
4 Left Speaker 15 12:30 PM 2-Jan D
5 Left Speaker 10 10:00AM 2-Jan H
5 Left Speaker 9 11:00 AM 2-Jan I
5 Left Speaker 7 12:30 PM 2-Jan B
1. Fetch test pass percentage of each test station
Select Test_Station,
sum(Case when Measure_value>15 then 1 else 0 end) *100/count(*) as
Test_Result_Percentage
from measure
group by Test_Station
2. Highest pass pecentage test station
SELECT
Test_Station,
ROUND(
(SUM(CASE WHEN Measure_value > 15 THEN 1 ELSE 0 END) * 100.0) / COUNT(*),
2
) AS pass_percentage
FROM
measurements
GROUP BY
Test_Station
ORDER BY
pass_percentage DESC
LIMIT 1; -- For MySQL
SELECT TOP 1
Test_Station,
ROUND(
(SUM(CASE WHEN Measure_value > 15 THEN 1 ELSE 0 END) * 100.0) /
NULLIF(COUNT(*), 0),
2
) AS pass_percentage
FROM
measurements
GROUP BY
Test_Station
ORDER BY
pass_percentage DESC;
With CTE as
(Select Test_Station,
sum(Case when Measure_value>15 then 1 else 0 end) *100/count(*) as
Test_Result_Percentage
from measure
group by Test_Station)
Select Test_Station, max(Test_Result_Percentage) from CTE
group by Test_Station
-----------------------------------------------------------------------------------
----------------
import pandas as pd
# Sample data
data = {
'Sno': [11, 11, 11, 12, 13, 14, 15, 15, 15, 1, 1, 1, 2, 3, 4, 5, 5, 5],
'Measurement Name': ['Left Speaker']*18,
'Measure value': [10, 15, 18, 12, 16, 15, 10, 9, 7, 10, 15, 18, 12, 16, 15, 10,
9, 7],
'time stamp': ['10:00AM', '11:00 AM', '12:30 PM', '10:00AM', '11:00 AM', '12:30
PM', '10:00AM', '11:00 AM', '12:30 PM',
'10:00AM', '11:00 AM', '12:30 PM', '10:00AM', '11:00 AM', '12:30
PM', '10:00AM', '11:00 AM', '12:30 PM'],
'Date': ['1-Jan']*9 + ['2-Jan']*9,
'Test Station': ['A', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'B', 'A', 'C', 'G',
'A', 'A', 'D', 'H', 'I', 'B']
}
# Creating the DataFrame
df = [Link](data)
df["test_pass_fail"] = df["Measure value"].apply(lambda x : "Pass" if x>15 else
"Fail")
[Link](5)
-----------------------------------------------------------------------------------
---------------------------
pass_percent_stat = [Link]('Test Station').agg(total_tests=('test_pass_fail',
'size'),
pass_tests=('test_pass_fail', lambda
x: (x == 'Pass').sum()))
pass_perecnt = pass_percent_stat['pass_tests']/pass_percent_stat['total_tests']*100
pass_percent_stat['pass_percentage'] = pass_perecnt
pass_percent_stat
-----------------------------------------------------------------------------------
------------------------------
pass_percentage_df = [Link]('Test Station').apply(
lambda x: round((x['test_pass_fail'] == 'Pass').sum() * 100.0 / len(x), 2)
).reset_index(name='pass_percentage')
print("\nPass percentage for each device:")
print(pass_percentage_df)
-----------------------------------------------------------------------------------
-----------------------------
pass_highest_percent = pass_percentage_df.sort_values(by='pass_percentage',
ascending=False)
pass_highest_percent.head(1)
-----------------------------------------------------------------------------------
---------------------------------
PYTHON
-------------
1. Implement above scenarios in python using Pandas
2. Some basic theoretical question on python
POWER BI
------------------
1. Various data sources and how can we connect those
2. How to develop a report from scratch (steps)
3. Difference between Measure and Calculated column
4. Which will take less time in loading Measure or Calculated Column