Questions and Answers
Q: You generate two NumPy arrays: a = np.array([2, 3, 1]) and b = np.array([[10], [20],
[30]]). When you perform np.dot(a, b), what result is returned?
A: A one-element 1-D array, array([110]) with shape (1,), rather than a true scalar
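The shapes explain the result. As a minimal sketch, a 1-D array of length 3 dotted with a (3, 1) column collapses the shared axis and leaves a 1-D array holding a single value:

```python
import numpy as np

a = np.array([2, 3, 1])            # shape (3,)
b = np.array([[10], [20], [30]])   # shape (3, 1)

result = np.dot(a, b)              # 2*10 + 3*20 + 1*30
print(result, result.shape)        # [110] (1,)
```

Calling result.item() or indexing result[0] extracts the single number if a plain scalar is needed.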
Q: Which SQL clause is used to specify the columns to be included in the result set?
A: SELECT
Q: What does the TRIM function do in Excel?
A: Removes leading and trailing spaces and collapses repeated spaces between words to a single space
Q: Which of the following functions will return the second-largest value in a data range?
A: LARGE(range, 2)
Q: You merge two datasets — one contains transaction IDs and amounts, and the other
contains customer info. After merging using pd.merge(), you notice a sharp increase in
the number of rows. What is the most likely cause?
A: Merge was done on non-unique keys
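A small illustration of the row blow-up, using made-up data in which the same customer_id appears twice in both tables. The table and column names are assumptions chosen for the example; the validate argument is an existing pd.merge option that raises an error when the key uniqueness you expect does not hold.

```python
import pandas as pd

# Hypothetical data: customer_id 1 appears twice in BOTH tables.
transactions = pd.DataFrame({
    "transaction_id": [101, 102, 103],
    "customer_id": [1, 1, 2],
    "amount": [50.0, 75.0, 20.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "name": ["Asha", "Asha (duplicate)", "Ben"],
})

# Each of the 2 transactions for customer 1 pairs with each of the
# 2 customer rows, so 3 input rows become 5 output rows.
merged = pd.merge(transactions, customers, on="customer_id")
print(len(transactions), len(merged))  # 3 5

# validate="many_to_one" raises MergeError if right-hand keys are not unique.
try:
    pd.merge(transactions, customers, on="customer_id", validate="many_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True
```

Checking customers["customer_id"].duplicated().any() before merging is another simple way to catch the problem early.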
Q: What does the function =SUBSTITUTE(A1, "old", "new") do?
A: Replaces every occurrence of the text "old" with "new" in A1 (the match is case-sensitive)
Q: Which Excel feature would you use to unpivot columns into rows for analysis?
A: Power Query
Q: You are designing a database for an online shopping platform. Which type of
database constraint should be used to ensure that the product price cannot be negative?
A: CHECK constraint
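A minimal sketch of a CHECK constraint, run against an in-memory SQLite database through Python's sqlite3 module. The products table and its columns are hypothetical; other database engines accept the same CHECK syntax but may report the violation differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      REAL NOT NULL CHECK (price >= 0)
    )
""")
conn.execute("INSERT INTO products (name, price) VALUES ('Notebook', 4.99)")

# A negative price violates the CHECK constraint and is rejected.
try:
    conn.execute("INSERT INTO products (name, price) VALUES ('Broken', -1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(rejected, row_count)  # True 1
```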
Q: If =INDEX(A1:C3, 2, 3) is used, what does it return?
A: The value in row 2, column 3 of the range A1:C3, which is cell C2
Q: What is the difference between LEFT JOIN and RIGHT JOIN?
A: LEFT JOIN returns all rows from the left table, and RIGHT JOIN returns all rows from the
right table.
Q: Which tool is best suited for splitting data into columns based on delimiters?
A: Text to Columns
Q: What does the formula =IFERROR(AVERAGE(A1:A10), 0) do?
A: Averages the range A1:A10 if there are no errors; otherwise, returns 0
Q: Which SQL operator is used for combining the result sets of two SELECT statements,
including all duplicate rows?
A: UNION ALL (plain UNION removes duplicate rows)
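The difference between the two set operators is easy to verify. The sketch below uses two hypothetical one-column tables in an in-memory SQLite database; the value 2 appears in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes duplicate rows; UNION ALL keeps every row.
union_rows = conn.execute(
    "SELECT x FROM a UNION SELECT x FROM b"
).fetchall()
union_all_rows = conn.execute(
    "SELECT x FROM a UNION ALL SELECT x FROM b"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```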
Q: Which of the following tools allows you to merge and transform large datasets in
Excel?
A: Power Query
Q: How would you create a dynamic range in Excel for data analysis?
A: Use a named range with OFFSET
Q: In the city of Normaltown, database normalization is the talk of the town. What is the
primary goal of normalization in a database?
A: To reduce data redundancy
Q: What is the purpose of the SQL ORDER BY clause?
A: Sort the result set
Q: Which of the following is NOT a valid aggregation in a PivotTable?
A: MEDIAN
Q: What is the result of a CROSS JOIN between tables A and B, both having 5 rows?
A: 25 rows
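A CROSS JOIN pairs every row of one table with every row of the other, so the row count is the product of the two sizes. A quick check with two hypothetical five-row tables in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
""")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(5)])
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(5)])

# Every row of a is paired with every row of b: 5 * 5 combinations.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM a CROSS JOIN b"
).fetchone()[0]
print(cross_count)  # 25
```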
Q: You’re optimizing performance in a machine learning pipeline using NumPy. You find
that repeated loops are slowing down preprocessing. Which approach provides the best
speed-up?
A: Use NumPy vectorized operations
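To make the contrast concrete, here is a sketch of the same min-max scaling step written as an explicit Python loop and as a single vectorized expression. The scaling task and function names are illustrative assumptions, not taken from any particular pipeline.

```python
import numpy as np

def scale_loop(x):
    """Min-max scale with an explicit Python loop (slow on large arrays)."""
    lo, hi = x.min(), x.max()
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = (x[i] - lo) / (hi - lo)
    return out

def scale_vectorized(x):
    """Same computation as one array expression evaluated in compiled code."""
    return (x - x.min()) / (x.max() - x.min())

values = np.arange(1_000, dtype=np.float64)
same_result = np.allclose(scale_loop(values), scale_vectorized(values))
print(same_result)  # True
```

Both functions return the same values; the vectorized form avoids per-element interpreter overhead, which is where the speed-up on large arrays comes from.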
Q: A startup analyzes website traffic data and models the number of page visits per user
using a Poisson distribution. During A/B testing, the control group’s mean page visits is
3, while the experimental group shows a significant deviation. To statistically test the
significance of this change, which test is most appropriate?
A: Poisson regression likelihood ratio test
Q: You are working with a dataset of customer complaints where the
Complaint_Description column contains free-text entries. Many entries have minor
spelling variations and case differences, making aggregation unreliable. Which method
best improves the data quality for analysis?
A: Case normalization and fuzzy string matching
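One possible sketch of this cleanup uses only the standard library: lowercase and collapse whitespace first, then snap each entry to the closest known category with difflib. The category list, the sample entries, and the 0.8 similarity cutoff are all assumptions for illustration; real data would need a tuned cutoff and probably a dedicated fuzzy-matching library.

```python
import difflib

# Hypothetical canonical complaint categories.
CANONICAL = ["billing error", "late delivery", "damaged product"]

def normalize(text, cutoff=0.8):
    """Lowercase, collapse whitespace, then map to the closest category."""
    cleaned = " ".join(text.lower().split())
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    # Entries with no sufficiently close category are left as cleaned text.
    return matches[0] if matches else cleaned

raw = ["Billing Error", "biling error", "LATE  delivery",
       "Damaged prodct", "rude staff"]
cleaned = [normalize(r) for r in raw]
print(cleaned)
```

After this step the first two entries aggregate under the same label, while "rude staff" is kept as its own value because nothing in the category list is close enough.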
Q: Given a DataFrame with a column Order_Amount containing [100, 200, 150, 180, 90,
5000], you want to compute the IQR-based upper threshold to detect outliers. What is
the computed threshold?
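A: 318.75, using pandas' default linear-interpolation quantiles (Q1 = 112.5, Q3 = 195, IQR = 82.5; threshold = Q3 + 1.5 × IQR)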
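The threshold depends on how the quartiles are computed. The sketch below uses pandas' default quantile method (linear interpolation); other quartile conventions give different values for such a small sample.

```python
import pandas as pd

orders = pd.DataFrame({"Order_Amount": [100, 200, 150, 180, 90, 5000]})

q1 = orders["Order_Amount"].quantile(0.25)   # 112.5
q3 = orders["Order_Amount"].quantile(0.75)   # 195.0
iqr = q3 - q1                                # 82.5
upper = q3 + 1.5 * iqr                       # 318.75

# Rows above the upper fence are flagged as outliers.
outliers = orders[orders["Order_Amount"] > upper]
print(q1, q3, iqr, upper)
print(outliers["Order_Amount"].tolist())     # [5000]
```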
Q: What is the purpose of using the INDIRECT function in Excel?
A: To return a reference based on text
Q: What does =TEXT(TODAY(), "YYYY-MM-DD") return?
A: Current date in the format YYYY-MM-DD
Q: What does the "Solver" add-in allow you to do?
A: Optimize a value by changing multiple input variables
Q: A business analyst uses linear regression to estimate the impact of Advertising_Spend
on Sales. The R² value is 0.97, but residual plots show clear curvature and non-constant
variance. What is the best next step?
A: Use a polynomial regression
Q: SELECT COUNT(DISTINCT employee_id) FROM employees; If the "employees" table
has 100 records, how many distinct employee IDs will be counted?
A: Depends on how many unique employee_ids there are, but if all are unique, then 100
Q: In a FULL OUTER JOIN, what happens if there is a match in one table but not in the
other?
A: Includes unmatched rows from both tables with NULL values
Q: Which SQL keyword is used to distinguish the two references to the same table in a
self-join?
A: AS, which assigns a table alias (ALIAS itself is not a SQL keyword)
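In a self-join the same table appears twice in the FROM clause, so each reference needs its own alias. A minimal sketch with a hypothetical employees table, where e stands for the employee row and m for the manager row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id         INTEGER PRIMARY KEY,
        name       TEXT,
        manager_id INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'Priya', NULL),
        (2, 'Sam',   1),
        (3, 'Lee',   1);
""")

# The aliases e and m let the query refer to the table in two roles.
pairs = conn.execute("""
    SELECT e.name, m.name
    FROM employees AS e
    JOIN employees AS m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()
print(pairs)  # [('Sam', 'Priya'), ('Lee', 'Priya')]
```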
Q: What does the TREND function do?
A: Fits a straight line to known data by least squares and returns values along it, which can be used to forecast future values
Q: Which formula would you use to return the row number of a value in a column?
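A: MATCH(value, range, 0), which returns the value's position within the range (equal to the row number when the range starts in row 1)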
Q: Which chart type is best for showing a correlation between two variables?
A: Scatter Plot
Q: In a telecom dataset, some Call_Duration entries are missing, but the Call_Type (e.g.,
local, STD, ISD) column is fully filled. You observe that STD calls generally have longer
durations. What is the most appropriate imputation method?
A: Median grouped by Call_Type
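A sketch of group-wise median imputation in pandas. The durations are made-up numbers; only the column names come from the question.

```python
import numpy as np
import pandas as pd

calls = pd.DataFrame({
    "Call_Type": ["local", "local", "local", "STD", "STD", "STD", "ISD", "ISD"],
    "Call_Duration": [2.0, 4.0, np.nan, 10.0, np.nan, 14.0, 20.0, np.nan],
})

# Median of the observed durations within each Call_Type,
# broadcast back to every row of that group.
group_median = calls.groupby("Call_Type")["Call_Duration"].transform("median")

# Fill only the missing entries with their own group's median.
calls["Call_Duration"] = calls["Call_Duration"].fillna(group_median)
print(calls["Call_Duration"].tolist())
# [2.0, 4.0, 3.0, 10.0, 12.0, 14.0, 20.0, 20.0]
```

Using the median rather than the mean keeps a few unusually long calls from pulling the imputed value upward.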
Q: You have a dataset where the feature Salary has a mean of Rs.70,000 and a standard
deviation of Rs.15,000. You want to identify values that fall beyond 2 standard
deviations from the mean. What is the lower and upper bound for outliers based on this
rule?
A: Rs.40,000 and Rs.100,000
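The arithmetic behind these bounds is mean ± 2 × standard deviation. A short sketch, with a few made-up salaries to show how the rule flags values:

```python
mean, std, k = 70_000, 15_000, 2

lower = mean - k * std   # 70,000 - 30,000
upper = mean + k * std   # 70,000 + 30,000
print(lower, upper)      # 40000 100000

# Hypothetical salaries; anything outside [lower, upper] is flagged.
salaries = [38_000, 55_000, 70_000, 99_000, 120_000]
flagged = [s for s in salaries if s < lower or s > upper]
print(flagged)           # [38000, 120000]
```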
Q: What is the purpose of the SQL GROUP BY clause?
A: Group rows based on common values in one or more columns
Q: If a dataset has duplicates, which feature is best to remove them?
A: Remove Duplicates in the Data tab
Q: In the realm of databases, normalization unveils a story. Witness this enigmatic
scenario: What is the primary goal of normalization in database design?
A: To eliminate data anomalies and improve data integrity
Q: You are building a dashboard to monitor sales across regions. Each region has highly
skewed sales data due to occasional large deals. A bar chart hides these outliers visually.
Which visualization is best suited to capture both spread and anomalies?
A: Box plot
Q: Which of the following is true about array formulas?
A: They process multiple values simultaneously
Q: You are designing a database for a library management system. The "Books" table has
an attribute named "ISBN" that should be unique for each book. Which type of database
constraint should you use to enforce this uniqueness?
A: UNIQUE constraint
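A minimal sketch of a UNIQUE constraint in SQLite, using a hypothetical books table and sample ISBN strings. The second insert reuses an ISBN and is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE books (
        book_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        isbn    TEXT NOT NULL UNIQUE
    )
""")
conn.execute(
    "INSERT INTO books (title, isbn) VALUES (?, ?)",
    ("First Book", "978-0-00-000000-2"),
)

# Reusing the same ISBN violates the UNIQUE constraint.
try:
    conn.execute(
        "INSERT INTO books (title, isbn) VALUES (?, ?)",
        ("Second Book", "978-0-00-000000-2"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(duplicate_rejected, book_count)  # True 1
```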
Q: Which SQL clause is used to limit the number of rows returned by a query?
A: LIMIT (MySQL, PostgreSQL, SQLite); SQL Server uses TOP, and standard SQL uses FETCH FIRST
Q: Which Excel feature allows you to create slicers for filtering data visually?
A: PivotTables
Q: Which SQL clause is used to filter rows from a table?
A: WHERE
Q: Which JOIN type returns only unmatched rows from the left table?
A: LEFT JOIN with a WHERE filter that keeps only rows where the right table's key IS NULL (an anti-join)
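This pattern is often called an anti-join. A sketch with hypothetical customers and orders tables, returning customers that have no orders:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# LEFT JOIN keeps every customer; rows without a matching order have
# NULL in the orders columns, and the WHERE clause keeps only those.
unmatched = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    WHERE o.id IS NULL
""").fetchall()
print(unmatched)  # [('Ben',)]
```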
Q: What is the purpose of a self-join in SQL?
A: To join a table with itself
Q: In SQL, what does the COUNT() function do?
A: Counts the number of rows in a result set; COUNT(*) counts all rows, while COUNT(column) counts only non-NULL values
Q: What is the purpose of the SQL DISTINCT keyword?
A: Remove duplicate records from a result set
Q: What does the SQL GROUP BY clause do when used in a query?
A: Group rows based on common values in one or more columns
Q: Which tool in Excel provides an advanced summary of relationships between
variables?
A: Analysis ToolPak