ISOM 2600 Practice Final

Uploaded by

orange135654

ISOM 2600 Introduction to Business Analytics

Practice Exam
Topic 1:
Q1. What will be output for the following code?
import pandas as pd
s = pd.DataFrame(columns = ["age","gender"], index = range(5))
s["age"] = [22,25,43,4,5]
s["gender"] = ["F","M","M","F","M"]
s.shape[1]
A. 2
B. 4
C. 5
D. 10

Q2. What type of data is: arr = [(1,1), (2,2), (3,3)]?


A. Array of tuples
B. Tuples of lists
C. List of tuples
D. Invalid type

Q3. What will be the output of the following code?


A = {'a': 3, 'b': 1}
B = {'a': 3, 'b1': 1}
C = {'a1': 3, 'b': 2}
Df = pd.DataFrame([A,B,C])
print(Df.shape)
A. (2,3)
B. (3,2)
C. (3,4)
D. (4,3)
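
Note: as background for questions of this kind, the sketch below shows how pandas builds a DataFrame from a list of dictionaries. The data is hypothetical and deliberately different from the question above; the names `rows` and `frame` are illustrative only.

```python
import pandas as pd

# Hypothetical records: each dict becomes one row,
# and the columns are the union of all keys seen.
rows = [{"x": 1}, {"x": 2, "y": 3}, {"z": 4}, {"y": 5}]
frame = pd.DataFrame(rows)

# Keys missing from a given dict are filled with NaN in that row.
print(frame)
print(frame.shape)  # (number of dicts, number of distinct keys)
```

The number of rows equals the number of dictionaries, while the number of columns depends only on how many distinct keys appear.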
Q4. Consider a DataFrame df with columns ["A", "B", "C", "D"] and rows ["r1", "r2", "r3"].
How do you access the last two rows of df?
A. df.iloc[["r2","r3"]]
B. df.loc[["r2","r3"]]
C. df.iloc[1:2]
D. df.loc[1:2]
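
Note: a minimal sketch of label-based versus position-based selection, assuming a small hypothetical frame (`demo`, its labels, and its values are not from the exam).

```python
import pandas as pd

# Hypothetical frame; labels and values are illustrative only.
demo = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]},
    index=["w", "x", "y", "z"],
)

# .loc selects by label; a label slice includes both endpoints.
by_label = demo.loc["x":"y"]

# .iloc selects by integer position; the stop position is excluded.
by_position = demo.iloc[1:3]

print(by_label.equals(by_position))
```

Passing string labels to .iloc, or integer positions to .loc on a string-labelled index, raises an error rather than silently selecting rows.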

Q5: Which of the following statements is incorrect?


A. DataFrame has both a row and column index.
B. A Series is a two-dimensional labelled data structure.
C. DataFrame created from single Series has 1 column.
D. All the above are correct.

Q6: Suppose we are given the power consumption data of four power plants, abbreviated as
"NI", "PJME", "PJMW", and "PJM_load", and we want to combine the four plots in the
following layout:
PJMW PJM_load
NI PJME
The following lines of code are provided:
import matplotlib.pyplot as plt
axes1=plt.subplot(2,2,w)
axes1.hist(df["NI"])
axes2=plt.subplot(2,2,x)
axes2.hist(df["PJME"])
axes3=plt.subplot(2,2,y)
axes3.hist(df["PJMW"])
axes4=plt.subplot(2,2,z)
axes4.hist(df["PJM_load"])
Which of the following sets of values for (w,x,y,z) achieves the layout above?
A. (1,2,3,4)
B. (3,2,1,4)
C. (3,4,1,2)
D. (2,4,1,3)
Q7: Consider a DataFrame df and the following code.
from numpy import nan
import pandas as pd
s = pd.date_range('2016-12-31', '2017-01-09', freq = 'D')
df = pd.DataFrame({"Value":[4,6,6,nan,4,nan,6,7,nan,5]}, index = s)
df['Value'].bfill().median()
What is the expected output?
A. 4.5
B. 5
C. 5.5
D. 6
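
Note: the sketch below illustrates how backward and forward filling treat missing values, using a hypothetical series that is unrelated to the data in the question. The names `values`, `back`, and `forward` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps in the middle and at the end.
values = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 8.0, np.nan])

# bfill copies the NEXT valid value backwards; a trailing NaN stays NaN
# because nothing follows it.
back = values.bfill()

# ffill copies the PREVIOUS valid value forwards.
forward = values.ffill()

# median() skips any NaN that remains after filling.
print(back.tolist(), back.median())
print(forward.tolist(), forward.median())
```

Because the two fill directions produce different filled values, summary statistics computed afterwards can differ as well.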

Q8: Based on the code section in Q7, we then write the following script:
df['Wday'] = df.index.weekday
df['IsWday'] = [1 if (x < 6) & (x > 0) else 0 for x in df['Wday']]
df[(df['IsWday'] == 1)].ffill()['Value'].mode()
Suppose Dec 31st, 2016 is a Friday, and we start the week from Sunday. What is the expected output?
A. 7
B. 6
C. 5
D. 4

Q9: What is the output of the following code: pd.to_datetime("2021/12/29").month


A. Error
B. 12
C. 2021
D. 29
Q10: Write the output of the following:
S1 = pd.Series([14, 7, 9], index = range(1, 8, 3))
print(S1)
A.
1 14
8 7
3 9
B.
1 14
4 7
7 9
C.
1 14
4 7
8 9
D. None of the above

Q11: By default, the plot() function plots a __:


A. Histogram
B. Bar graph
C. Line chart
D. Pie chart

Q12: Which of the following pyplot functions is used to set the range of the x-axis?
A. xlim()
B. xlabel()
C. xvalues()
D. None of the above
Q13. The following code imports the financial data of AAPL.

Which of the following expressions calculates the daily rate of return on AAPL?
A. (df["Adj Close"] - df["Adj Close"].shift(1)) / df["Adj Close"].shift(1)
B. (df["Adj Close"] - df["Adj Close"].shift(1)) / df["Adj Close"]
C. (df["Adj Close"] - df["Adj Close"].shift(-1)) / df["Adj Close"].shift(-1)
D. (df["Adj Close"] - df["Adj Close"].shift(-1)) / df["Adj Close"]
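
Note: a minimal sketch of what shift() does to a price series and how a simple one-period return (change relative to the previous observation) follows from it. The prices are hypothetical, and the names `prices`, `previous`, and `simple_return` are illustrative, not taken from the exam's code.

```python
import pandas as pd

# Hypothetical adjusted closing prices on three consecutive days.
prices = pd.Series([100.0, 110.0, 99.0], name="Adj Close")

# shift(1) moves every value down one row, so each row sees the previous
# day's price; shift(-1) would instead expose the next day's price.
previous = prices.shift(1)

# Simple return: change since the previous day, relative to the previous day.
simple_return = (prices - previous) / previous
print(simple_return)
```

The first entry is NaN because the first day has no previous price to compare against.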

Q14: What is the output of the following code?


import pandas as pd
import numpy as np
dict = {'First Score':[100, 90, np.nan, 95],
'Second Score': [30, 45, 56, np.nan],
'Third Score':[np.nan, 40, 80, 98]}
df = pd.DataFrame(dict)
df.fillna(method ='bfill')
A. First Score Second Score Third Score
0 100.0 30.0 40.0
1 90.0 45.0 40.0
2 95.0 56.0 80.0
3 95.0 NaN 98.0
B. First Score Second Score Third Score
0 100.0 30.0 40.0
1 90.0 45.0 40.0
2 90.0 56.0 80.0
3 95.0 56.0 98.0
C. First Score Second Score Third Score
0 100.0 30.0 40.0
1 90.0 45.0 40.0
2 95.0 56.0 80.0
3 NaN NaN 98.0
D. First Score Second Score Third Score
0 100.0 30.0 40.0
1 90.0 45.0 40.0
2 NaN 56.0 80.0
3 95.0 56.0 98.0

Q15: What is the output of the following list function?


sampleList = [10, 20, 30, 40, 50]
sampleList.append(60)
print(sampleList)

sampleList.append(60)
print(sampleList)

sampleList.append(60)
print(sampleList)

A. [10, 20, 30, 40, 50, 60]


[10, 20, 30, 40, 50, 60]
[10, 20, 30, 40, 50, 60]

B. [10, 20, 30, 40, 50, 60]


[10, 20, 30, 40, 50, 60, 60]
[10, 20, 30, 40, 50, 60, 60]

C. [10, 20, 30, 40, 50, 60]


[10, 20, 30, 40, 50, 60]
[10, 20, 30, 40, 50, 60, 60]

D. [10, 20, 30, 40, 50, 60]


[10, 20, 30, 40, 50, 60, 60]
[10, 20, 30, 40, 50, 60, 60, 60]
Topic 2
1. The summary statistics of a dataset are produced by the describe() command. However, a student
accidentally erased some of the statistics, and he only knows two commands from the pandas
package: std() and var().

What command should we use if we want to reproduce the missing statistics?


A. data.std(ddof = 0)
B. data.std(ddof = 1)
C. data.var(ddof = 0)
D. data.var(ddof = 1)
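
Note: the sketch below shows how the ddof argument changes the divisor in pandas' std() and var(), using hypothetical data (`data` here is an illustrative series, not the exam's dataset).

```python
import pandas as pd

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# ddof=0 divides by n (population formula).
pop_var = data.var(ddof=0)
pop_std = data.std(ddof=0)

# ddof=1 divides by n - 1 (sample formula); this is the pandas default.
samp_var = data.var(ddof=1)
samp_std = data.std(ddof=1)

print(pop_var, pop_std)
print(samp_var, samp_std)

# The "std" row reported by describe() matches the default std().
print(data.describe()["std"])
```

The standard deviation is always the square root of the variance computed with the same ddof.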

2. A company proposed a new policy to improve its employees' performance. It first
divided the employees into experiment and control groups and recorded their performance
during the experiment. The data is stored as avg_exp_perf (experiment group) and
avg_con_perf (control group). A boxplot is then created to visualize the data, with a line
indicating the average performance of the experiment group:

What is the corresponding script for the above output, before the horizontal line is added?
A. plt.box(avg_exp_perf, labels = ["Experiment"])
plt.box(avg_con_perf, labels = ["Control"])
B. plt.boxplot(avg_exp_perf, labels = ["Experiment"])
plt.boxplot(avg_con_perf, labels = ["Control"])
C. plt.box([avg_exp_perf,avg_con_perf], labels = ["Experiment", "Control"])
D. plt.boxplot([avg_exp_perf,avg_con_perf], labels = ["Experiment", "Control"])
[Q3-4] A data set consisting of the sales data of a superstore is given, with its first few rows shown below:

A random sample of 100 is then drawn from the data. Then a bar chart is constructed to
summarize the counts of sub-category, in order to find the most popular product type. The
commands and output are given below:

3. What is the missing command in the first blank Q3?


A. sample(100)
B. resample(100)
C. randn(100)
D. random(100)

4. What is the missing command in the second blank Q4?


A. count()
B. get_value()
C. unique()
D. value_counts()
5. A data frame containing information about a population is created using the code below.

A simulation study is conducted by repeating the following steps 10,000 times:


(i) Draw a random sample of size 5 from the population with replacement.
(ii) Calculate the sample standard deviation and store it in a list.
Which of the following code snippets gives the desired result of the simulation study?
A.

B.

C.

D.

6. You are given the gender of HKUST students, stored in a dataframe X whose first five values are
[F, M, F, M, M]. To create a pie chart, which of the following commands should be used?
A. plt.pie(X)
B. plt.pie(X.value_counts())
C. plt.pie(X.count())
D. We should recode the variable as ‘0’ and ‘1’ first.
[7-8] A credit card company would like to study its customers' attributes, including their age and
yearly income (in $10,000). The data is stored in a dataframe "Customer".

The company wonders whether its customers' average yearly income is higher than $40,000.
The null and alternative hypotheses are
H0 : μ ≤ $40,000, H1 : μ > $40,000
The z-value is then computed by the following code:

from scipy.stats import norm

income = Customer['income']

X_bar = income.mean()

s = income.std()

n = income.shape[0]

7. How should the z statistic be calculated?


A. Z = (X_bar - 4)/(s/(n**0.5))
B. Z = (X_bar)/(s/(n**0.5))
C. Z = (X_bar - 4)/s
D. Z = (X_bar - 4)/(s/(n*0.5))

8. What is the correct command to find the p-value?


A. 1 - norm.cdf(Z)
B. 1 - norm.ppf(Z)
C. norm.cdf(Z)
D. norm.ppf(Z)
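
Note: a minimal sketch of an upper-tailed one-sample z-test computed from summary statistics. All numbers are hypothetical and are not the exam's data; the names `x_bar`, `mu_0`, `s`, `n`, `z`, and `p_value` are illustrative.

```python
from scipy.stats import norm

# Hypothetical summary statistics (assumed, not from the Customer data).
x_bar = 4.3   # sample mean
mu_0 = 4.0    # hypothesized mean under H0, in the same units as x_bar
s = 1.5       # sample standard deviation
n = 100       # sample size

# The test statistic divides by the standard error s / sqrt(n),
# not by s itself.
z = (x_bar - mu_0) / (s / n ** 0.5)

# For H1: mu > mu_0, the p-value is the area to the right of z.
# norm.sf(z) is the survival function, equal to 1 - norm.cdf(z).
p_value = norm.sf(z)
print(z, p_value)
```

norm.cdf maps a z-value to a probability, while norm.ppf goes the other way and maps a probability to a z-value.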
9. You are given a sample of data. In order to compute the 95% confidence interval of the
population mean, you calculate the sample statistics using the code below:

What is the missing code?


A. norm.ppf(0.05)
B. norm.ppf(0.95)
C. norm.ppf(0.025)
D. norm.ppf(0.975)
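
Note: a minimal sketch of a two-sided normal-based confidence interval for a mean, written for a general confidence level. The sample and the 90% level are hypothetical choices that differ from the question; names such as `sample`, `confidence`, and `z_crit` are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample (assumed for illustration only).
sample = np.array([4.1, 3.8, 5.2, 4.7, 4.0, 4.4, 3.9, 4.6, 5.0, 4.3])

confidence = 0.90          # assumed confidence level
alpha = 1 - confidence

# A two-sided interval leaves alpha / 2 in each tail, so the critical
# value is the quantile at 1 - alpha / 2.
z_crit = norm.ppf(1 - alpha / 2)

x_bar = sample.mean()
std_err = sample.std(ddof=1) / np.sqrt(sample.size)

lower = x_bar - z_crit * std_err
upper = x_bar + z_crit * std_err
print(z_crit, (lower, upper))
```

The same pattern applies to any confidence level: only the quantile passed to norm.ppf changes.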
Topic 3
The following outputs are used for Q1 to Q5.
The variable data stores a dataset as a DataFrame object in pandas. Here are the first five observations
of the dataset.

A constant term is added using the following Python code. The name of the new column is const.

Four different regression models are fitted using the following code.

Furthermore, we have the following output:


Q1. Which of the following gives the smallest value? (model.resid will return the residuals of the
model)

A. (model_1.resid**2).sum()
B. (model_2.resid**2).sum()
C. (model_3.resid**2).sum()
D. (model_4.resid**2).sum()

Q2. It is known that the variable X2 is obtained by squaring X1. The following code is run:

The following information is also provided:

Which of the following plots is most likely to be the output of this code?

A. B.

C. D.
Q3. The following code is run, and the output is also shown below.

Which of the following statements is correct?


A. The slope of X1 is 5.0803.
B. 17.8% of the variation in Y can be explained by the model.
C. 14.9% of the variation in Y can be explained by the model.
D. The adjusted R-squared is 0.178.
Q4. It is known that the variable X2 is obtained by squaring X1. The following code is run, and the
output is also shown below.

Which of the following outputs is most likely to be obtained from the code
plt.scatter(y_hat,model_1.resid)?

A. B.

C. D.
Q5. The following code is run, and the output is also shown below.

The following code is further run:

--- (*)
Which of the following is the correct output from running code marked (*)?
A. 28, 0.149
B. 28, 0.178
C. 30, 0.149
D. 30, 0.178
