2-Hour Quiz: Python, Pandas, NumPy, Matplotlib, Plotly, and MySQL
(Advanced Practical)
Instructions:
Time: 2 hours
Answer all questions by writing functional code or queries.
Ensure your code is clear, efficient, and commented where necessary.
For coding questions, assume you have the necessary libraries imported (e.g., numpy , pandas , matplotlib.pyplot , plotly.express ).
For MySQL, assume you have a database connection.
Submit your answers in a single script or document with separate sections for each question.
Total: 100 points
Note : These questions are designed to be challenging. Use error handling where appropriate and optimize your solutions.
Section 1: Python Basics (20 minutes, 20 points)
Question 1.1 (8 points)
Write a Python function group_anagrams(words) that takes a list of strings and returns a list of lists, where each sublist contains words that are anagrams of each
other. For example, group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]) should return [["eat", "tea", "ate"], ["tan", "nat"],
["bat"]] . Use efficient sorting techniques.
# Example usage:
# print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# Output: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
Question 1.2 (7 points)
Write a Python program that processes a list of dictionaries representing transactions: transactions = [{"id": 1, "amount": 100, "type": "credit"},
{"id": 2, "amount": 50, "type": "debit"}, {"id": 3, "amount": 75, "type": "credit"}] . Calculate the net balance (sum of credits minus sum of
debits) and store it in net_balance . If any transaction has an invalid type (not "credit" or "debit"), raise a ValueError with an appropriate message.
Question 1.3 (5 points)
Write a Python program to generate a 5x5 matrix of numbers where each cell contains the product of its row and column indices (1-based indexing). For example, the
cell at row 2, column 3 should contain 2 * 3 = 6 . Store the matrix as a nested list in index_matrix .
# Expected output for index_matrix:
# [[1, 2, 3, 4, 5],
# [2, 4, 6, 8, 10],
# [3, 6, 9, 12, 15],
# [4, 8, 12, 16, 20],
# [5, 10, 15, 20, 25]]
Section 2: NumPy (20 minutes, 20 points)
Question 2.1 (8 points)
Create a NumPy array of shape (5, 5) filled with random floats between 0 and 1. Normalize the array so that each row sums to 1 (i.e., convert each row to a probability
distribution). Store the normalized array in normalized_arr . Handle potential division-by-zero cases.
import numpy as np
# Your code here
Question 2.2 (7 points)
Given a NumPy array arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]) , write code to:
Create a new array shifted_arr where each row is shifted one position to the right, with the last element wrapping around to the first position.
Compute the element-wise product of arr and shifted_arr , and store the sum of all elements in product_sum .
# Example for one row: [1, 2, 3, 4] -> [4, 1, 2, 3]
Question 2.3 (5 points)
Create two NumPy arrays: a = np.array([1, 2, 3]) and b = np.array([4, 5, 6]) . Compute the outer product of a and b , then flatten the result into a 1D
array sorted in ascending order. Store the result in outer_sorted .
Section 3: Pandas (25 minutes, 20 points)
Question 3.1 (8 points)
Assume a CSV file employees.csv with columns: emp_id , name , department , salary , hire_date . Write Pandas code to:
Load the CSV into a DataFrame.
Convert hire_date to datetime format.
Group by department and calculate the average salary and earliest hire date per department.
Save the grouped results to department_summary.csv .
import pandas as pd
# Assume employees.csv exists
# Your code here
Question 3.2 (7 points)
Create a Pandas DataFrame with 6 rows and 3 columns: product , category , sales . Populate with sample data (e.g., products like "Laptop", categories like
"Electronics"). Write code to:
Create a pivot table showing total sales per category.
Identify the category with the lowest total sales and store its name in lowest_category .
Handle missing values by filling them with 0.
import pandas as pd
# Your code here
Question 3.3 (5 points)
Given a DataFrame df with columns student , math_score , science_score , write code to:
Create a new column average_score as the mean of math_score and science_score .
Filter rows where average_score is in the top 25% of all average scores.
Store the filtered DataFrame in top_performers .
import pandas as pd
# Assume df exists
# Your code here
Section 4: Matplotlib and Plotly (25 minutes, 20 points)
Question 4.1 (8 points)
Using Matplotlib, create a subplot with 2 plots (side by side):
Left plot: A histogram of 1000 random samples from a normal distribution (mean=0, std=1).
Right plot: A boxplot of the same data.
Include a shared title: "Normal Distribution Analysis"
Label axes appropriately.
Save the figure as normal_analysis.png .
import matplotlib.pyplot as plt
import numpy as np
# Your code here
Question 4.2 (7 points)
Using Plotly, create an interactive 3D scatter plot for:
x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 17, 20]
z = [5, 8, 6, 12, 15]
Color points by z value and add a colorbar.
Add a title: "3D Scatter Analysis"
Display the plot (assume a Jupyter notebook environment).
import plotly.express as px
# Your code here
Question 4.3 (5 points)
Using Matplotlib, create a stacked bar chart for:
Categories: ["Q1", "Q2", "Q3", "Q4"]
Two datasets: product_A = [20, 25, 30, 35] and product_B = [15, 10, 20, 25]
Label axes: "Quarter" and "Sales"
Add a legend and title: "Quarterly Sales"
Display the plot.
import matplotlib.pyplot as plt
# Your code here
Section 5: MySQL (30 minutes, 20 points)
Question 5.1 (8 points)
Write a MySQL query to create two tables:
customers with columns: customer_id (int, primary key, auto-increment), name (varchar, 50), email (varchar, 100)
orders with columns: order_id (int, primary key, auto-increment), customer_id (int, foreign key referencing customers ), amount (decimal, 8,2) Insert 2
customers and 3 orders (at least one customer with multiple orders).
-- Your code here
Question 5.2 (7 points)
Assume tables products (columns: product_id , name , price ) and order_details (columns: order_id , product_id , quantity ). Write a MySQL query to:
Join the tables to calculate the total cost ( price * quantity ) per order.
Group by order_id and sum the total cost.
Filter for orders with total cost greater than 200.
Order by total cost in descending order.
-- Your code here
Question 5.3 (5 points)
Write a MySQL query to create a view called high_value_customers based on the customers and orders tables (from Question 5.1). The view should show:
customer_id , name , and total order amount for customers whose total orders exceed 500.
-- Your code here
Submission Instructions:
Combine all your code and queries into a single script or document.
Clearly label each question (e.g., "Question 1.1").
For Python/NumPy/Pandas/Matplotlib/Plotly, provide the code in Python blocks.
For MySQL, provide the queries in SQL blocks.
Ensure your code runs without errors assuming the required files/tables exist.
Include error handling where appropriate (e.g., invalid inputs, missing data).
Total Points: 100
Python: 20 points
NumPy: 20 points
Pandas: 20 points
Matplotlib/Plotly: 20 points
MySQL: 20 points
Challenge Note : These questions require efficient algorithms, careful data manipulation, and creative visualization. Test your solutions thoroughly!
Good luck!