Nishant Chahal RM
BATCH 2022-25
The success and final outcome of this RM Lab Practical File required a lot of guidance and
assistance from many people, and we are extremely fortunate to have received this throughout
the completion of our work. Whatever we have done is only due to such guidance and
assistance, and we will not forget to thank them. We respect and thank Dr. Anupama Sharma
for giving us the opportunity to do this project work and for providing all the support and
guidance that made it possible to complete the project on time. We are extremely grateful to
her for this support and guidance.
We are really grateful that we managed to complete this project within the time given by
Dr. Anupama Sharma. Last but not least, we would like to express our gratitude to our
friends and parents for their support and willingness to spend time with us.
INDEX
Modules Topics
Module 14 Q 21 ANOVA
MODULE-1
Step2- Click the Sample Files tab in the lower-left corner of the dialog.
Step3- Select the bankloan.sav data file and then click Open.
VARIABLE VIEW
MODULE-2
ADVANTAGES:
The advantages of using SPSS as a software package compared to others are:
• SPSS is a comprehensive statistical software package.
• Many complex statistical tests are available as built-in features.
• Interpretation of results is relatively easy.
• Data tables are displayed easily and quickly and can be expanded.
LIMITATIONS
• SPSS can be expensive to purchase for students.
• Fully exploiting all the available features usually requires additional training.
• The graph features are not as simple as those of Microsoft Excel.
MODULE-3
Q5- How to import data into SPSS and assign values to variables?
Step1- Create an Excel sheet with the data:-
Gender:- Male(1)
Female(2)
Marital Status:- Married(1)
Unmarried(2)
Education:- 12th Pass(1)
PG(2)
UG(3)
Experience:- 0-2 years(1)
3-5 years(2)
6-8 years(3)
9-10 years(4)
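The coding scheme above can be sketched in Python as a lookup from numeric codes to labels, which is roughly what the Values column in SPSS Variable View stores. This is only an illustration: the `decode` helper and the sample respondent are hypothetical and are not part of the lab data.

```python
# Value labels copied from the coding scheme above.
VALUE_LABELS = {
    "Gender": {1: "Male", 2: "Female"},
    "Marital Status": {1: "Married", 2: "Unmarried"},
    "Education": {1: "12th Pass", 2: "PG", 3: "UG"},
    "Experience": {1: "0-2 years", 2: "3-5 years", 3: "6-8 years", 4: "9-10 years"},
}


def decode(row):
    """Replace each numeric code in a row with its value label."""
    return {variable: VALUE_LABELS[variable][code] for variable, code in row.items()}


# Hypothetical respondent, for illustration only.
respondent = {"Gender": 2, "Marital Status": 1, "Education": 3, "Experience": 2}
decoded = decode(respondent)
print(decoded)
```

Keeping the numeric codes in the data and the labels in a separate lookup mirrors how the Excel sheet stores numbers while SPSS displays the labels.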
Median – the middle value of the data set when it is arranged in ascending order.
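As a small sketch of this definition, the function below sorts the values and picks the middle one, averaging the two middle values when the count is even. The score lists are hypothetical.

```python
import statistics


def median(values):
    """Middle value of the data once it is sorted in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical scores, for illustration only.
odd_scores = [7, 3, 9, 1, 5]
even_scores = [7, 3, 9, 1]
print(median(odd_scores), median(even_scores))
```

The standard library function `statistics.median` gives the same result and is the better choice in practice.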
A node is also known as a graph vertex. It is a point on which the graph is defined and may be
connected by graph edges. Each object in a graph is called a node.
Edge: An edge e is a link between two nodes. A link denotes movements between nodes. It has
a direction that is generally represented as an arrow. If an arrow is not used, it means the link
is bidirectional.
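The node and edge definitions above can be sketched as an adjacency structure. In this hypothetical example each edge is a tuple of (source, target, directed); an edge without an arrow is stored in both directions.

```python
def build_adjacency(edges):
    """Map each node to the set of nodes reachable over one edge."""
    adjacency = {}
    for source, target, directed in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())  # make sure the target is listed as a node
        if not directed:
            # No arrow means the link is bidirectional.
            adjacency[target].add(source)
    return adjacency


# Hypothetical graph: A -> B is directed, B - C is bidirectional.
edges = [("A", "B", True), ("B", "C", False)]
adjacency = build_adjacency(edges)
print(adjacency)
```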
1. Pie Chart:
A pie chart is a circular graphical representation used to display data in a way that illustrates
the proportion of various categories within a whole. The circle represents the whole data set,
and the slices of the pie represent the individual categories or segments. The size of each slice
is proportional to the quantity it represents. Pie charts are most effective when showcasing data
with a relatively small number of categories and when the categories are non-overlapping and
distinct. They help in conveying the distribution of parts within a whole at a glance.
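The statement that each slice is proportional to the quantity it represents can be sketched as simple arithmetic: a category's share of the total, multiplied by 360 degrees, gives the slice angle. The category counts below are hypothetical.

```python
def pie_slices(counts):
    """Return (proportion, angle in degrees) for each category."""
    total = sum(counts.values())
    return {
        category: (count / total, 360 * count / total)
        for category, count in counts.items()
    }


# Hypothetical counts of respondents by education level.
counts = {"UG": 10, "PG": 6, "12th Pass": 4}
slices = pie_slices(counts)
for category, (proportion, angle) in slices.items():
    print(f"{category}: {proportion:.0%} of the whole, {angle:.0f} degrees")
```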
2. Bar Graph:
A bar graph (also known as a bar chart or bar diagram) is a graphical representation of data
using rectangular bars of varying lengths. It's used to compare values across different categories
or groups. Each bar represents a category, and the length or height of the bar is proportional to
the value it represents. Bar graphs can be displayed either horizontally or vertically, with the
categories on one axis and the values on the other. They're useful for showing comparisons,
trends, and patterns within categorical data.
3. Histogram:
Q9- Define boxplot and also explain the significance of boxplot in research.
Ans) A boxplot displays the distribution of data in a standardized way using the five-number
summary: minimum, Q1 (first quartile), median, Q3 (third quartile) and maximum.
A box plot is a chart that shows data from a five-number summary, including one of the measures
of central tendency (the median). It does not show the distribution in as much detail as a stem
and leaf plot or histogram does. It is primarily used to indicate whether a distribution is
skewed and whether there are potential unusual observations (also called outliers) in a data
set. Box plots are also very useful when a large number of data sets are involved or compared.
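A minimal sketch of the five-number summary and of outlier flagging follows. It assumes the common 1.5 x IQR rule for outliers and the "inclusive" quartile method of Python's `statistics.quantiles`; SPSS may compute quartiles slightly differently, so values can differ a little from SPSS output. The data are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def iqr_outliers(values):
    """Values lying more than 1.5 * IQR outside the box (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical data with one unusually large observation.
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = five_number_summary(data)
outliers = iqr_outliers(data)
print(summary, outliers)
```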
SIGNIFICANCE
It is used to know:-
Q10- Explain single box plot and cluster box plot and also explain the path and steps.
Ans) A single box plot is a graph displaying data from one quantitative variable. It is also
known as a “box-and-whisker plot”. The box represents the middle 50% of observed values. The
bottom of the box is the first quartile (25th percentile) and the top of the box is the third
quartile (75th percentile).
When a box plot is designed for a data set with two or more categorical variables, one may
need to group/cluster some of the boxes by category. Such a grouped box plot is called a
clustered box plot.
Steps:
Definition: To describe a single categorical variable, we use frequency tables. To describe the
relationship between two categorical variables, we use a special type of table called a cross-
tabulation (or "crosstab" for short). In a cross-tabulation, the categories of one variable
determine the rows of the table, and the categories of the other variable determine the
columns. The cells of the table contain the number of times that a particular combination of
categories occurred. The "edges" (or "margins") of the table typically contain the total
number of observations for that category.
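The definition above can be sketched by counting how often each combination of categories occurs and then adding the row and column totals (the margins). The (gender, marital status) pairs are hypothetical.

```python
from collections import Counter


def crosstab(pairs):
    """Count each (row category, column category) combination plus the margins."""
    cells = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    return cells, row_totals, col_totals


# Hypothetical observations, for illustration only.
pairs = [
    ("Male", "Married"), ("Male", "Unmarried"), ("Female", "Married"),
    ("Female", "Married"), ("Male", "Married"),
]
cells, row_totals, col_totals = crosstab(pairs)
print(cells[("Male", "Married")], row_totals["Male"], col_totals["Married"])
```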
Sandhya Gupta wants to see the scatter plot of information availability (cause) in online
ordering (effect) by the respondents.
Step4- Insert crosstabs and put the rows and columns from the data
Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate). It’s a common tool for
describing simple relationships without making a statement about cause and effect.
Correlation is measured by the correlation coefficient. It is very easy to calculate the
correlation coefficient in SPSS. Before calculating the correlation in SPSS, we should have
some basic knowledge about correlation.
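As a sketch of what the correlation coefficient measures, the function below computes Pearson's r from its textbook definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The two rating lists are hypothetical and are not the lab data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 1-5 ratings from five respondents.
information = [1, 2, 3, 4, 5]
ordering = [1, 2, 2, 4, 5]
r = pearson_r(information, ordering)
print(round(r, 3))
```

A value close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.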
Assumptions:
• Related pairs
• Data should be ratio or interval in nature
• Scores of each variable should be normally distributed
• Linear relationship between the two variables
• Homoscedasticity- the variability in scores for one variable is approximately the same
at all values of the other variable.
Types of correlation:
Degree Of Correlation:
Q13- Steps for computing correlation, cleaning the correlation table and its interpretation.
Mr. X wants to see the scatter plot of information availability (cause) and online
ordering (effect) by the respondents. Also find Pearson’s correlation. The descriptives are below:
Information Availability    Online Ordering
Not Important               Never
Less Important              Occasionally
Important                   Considerably
Very Important              Almost always
Extremely Important         Always
Step 5 - Select the Pearson option in correlation coefficients. Click ok and then the output
window will open.
Step 6 -
a. For Scatter plot, go to Graph tab and select chart builder
b. Select Scatter/Dot from “Choose from” menu and double-click on second graph.
• The value of the Pearson correlation (0.801) lies between 0.75 and 1; hence there is a
high degree of positive correlation between online ordering frequency and information
availability.
• The scatter diagram also shows the dispersion of the data.
MODULE-9
Q16- How to compute the mean for each construct variable? Also explain the steps as shown
in the Data View window. Calculate the mean for each variable in the study data.
Step- Create a data file in Excel and import it in SPSS.
Step 3- Add values for OB1-3, MET1-6 and USE1-3 given below in the image.
Step4- Go to the Transform tab and select Compute Variable
Step5- Calculate the mean of the variable.
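What Compute Variable does in these steps can be sketched as a row-wise mean over the items of each construct. The item names OB1-3, MET1-6 and USE1-3 come from the steps above; the construct names OB, MET and USE and the single response row are assumptions made for illustration.

```python
# Items belonging to each construct (item names from the steps above).
CONSTRUCTS = {
    "OB": ["OB1", "OB2", "OB3"],
    "MET": [f"MET{i}" for i in range(1, 7)],
    "USE": ["USE1", "USE2", "USE3"],
}


def construct_means(response):
    """Mean of the item scores of each construct for one respondent."""
    means = {}
    for construct, items in CONSTRUCTS.items():
        scores = [response[item] for item in items]
        means[construct] = sum(scores) / len(scores)
    return means


# Hypothetical response row, for illustration only.
response = {
    "OB1": 4, "OB2": 5, "OB3": 3,
    "MET1": 2, "MET2": 3, "MET3": 4, "MET4": 3, "MET5": 2, "MET6": 4,
    "USE1": 5, "USE2": 4, "USE3": 3,
}
means = construct_means(response)
print(means)
```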
Sandhya wants to conduct research on internet use. She gathers 13 variables and 32
responses for each of the 13 variables, rated on the scale: 1 (never), 2 (occasionally),
3 (considerably), 4 (almost always), 5 (always). The details of the variables are shown below.
Name Label
❖ Steps
Step 3 - A dialog box will open. Add all the variables to know the reliability.
Step 4 - Click on Statistics and select the options shown in the dialog box.
Step 5 - Click continue to view results
❖ Interpretation
The value of Cronbach’s alpha should be .7 or more to report the data as reliable. As can
be seen, the value of Cronbach’s alpha is .732, which is above this threshold and
signifies that the data is reliable.
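For reference, Cronbach's alpha can be sketched from its standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response matrix below is hypothetical (rows are respondents, columns are items), so its alpha is not the .732 reported above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five people to three items.
rows = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(rows)
is_reliable = alpha >= 0.7  # threshold stated in the interpretation above
print(round(alpha, 3), is_reliable)
```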
MODULE-11
Q18- Defining Chi-square, its steps and understanding the significance of p-value
in chi-square. Understanding acceptance and rejection of null hypothesis based on
p-value.
Respondents were asked their gender and whether or not they are cigarette smokers. There
were 3 choices- smokers, past smokers and non-smokers. Suppose we want to test for an
association between gender (male and female) and smoking behaviour using the chi-square
test of independence.
Meaning --- The chi-square test for association is used when you want to check the
association between 2 categorical variables measured on a nominal scale. With 2
variables, the test can also be interpreted as determining whether the distribution of
one variable differs across the categories of the other. The test is also referred to as
the chi-square test of independence and is also known as the Pearson chi-square test. It
is used to determine whether there is a statistically significant difference between the
observed frequencies and the frequencies that would be expected under the null hypothesis.
Problem Statement- To identify the association between gender and smoking
behaviour.
❖ Hypothesis:
H0- There is no significant association between gender and smoking behaviour.
H1- There is a significant association between gender and smoking behaviour.
❖ STEPS
1) Create a data file showing gender and smoking behaviour in Excel
2) Import the Excel file into SPSS
5) Drag and drop smoking behaviour into the Row box and gender into the Column box
6) Click on Statistics and select Chi-square
As can be seen from the above table, the p value (0.006) is lower than the alpha value
(0.05). Hence H0 is rejected and H1 is supported.
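The calculation behind the test can be sketched as follows: expected counts come from the row and column totals, and the statistic sums (observed - expected)^2 / expected over all cells. The counts are hypothetical, not the lab data. For a 3 x 2 table the degrees of freedom are 2, and for exactly 2 degrees of freedom the p value has the closed form exp(-chi2 / 2); other table sizes need a chi-square table or a statistics library.

```python
import math

ALPHA = 0.05


def chi_square_independence(observed):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts. Rows: smokers, past smokers, non-smokers.
# Columns: male, female.
observed = [[30, 10], [10, 10], [10, 30]]
chi2, df = chi_square_independence(observed)
p_value = math.exp(-chi2 / 2)  # valid only because df == 2 here
reject_h0 = p_value < ALPHA
print(chi2, df, p_value, reject_h0)
```

When the p value is below alpha, H0 (no association) is rejected, which is the same decision rule used in the interpretation above.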
MODULE-12
Q19-
A. Define the t test and discuss the procedure for the one-sample t test in SPSS (cars, with
ethanol and without ethanol).
B. To understand the procedure for the repeated measures t test (dependent
samples/paired t test) in SPSS.
C. To know the procedure for the independent groups t test.
ANS- (A)
• T- test-
T-tests are used to determine whether there is a significant difference between 2 sets of
scores. A t test may be a one-sample, independent groups or repeated measures test.
• Basic assumptions of t test
1. Data should be at interval or ratio level of measure
2. Data should be randomly sampled.
3. Data should be numerical and represent samples from normally distributed populations.
The one-sample t test is used when you have data from a single sample of participants and
want to know whether the mean of the population from which the sample is drawn is the same
as a hypothesized mean.
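The statistic behind the one-sample test can be sketched as t = (sample mean - hypothesized mean) / (s / sqrt(n)) with n - 1 degrees of freedom. The kilometres-per-litre values and the hypothesized mean below are hypothetical; SPSS additionally reports the two-tailed significance, which this sketch does not compute.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, hypothesized_mean):
    """Return the t statistic and degrees of freedom for a one-sample t test."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t = (mean(sample) - hypothesized_mean) / standard_error
    return t, n - 1


# Hypothetical km-per-litre readings tested against a hypothesized mean of 12.
sample = [12, 14, 15, 13, 16]
t_stat, df = one_sample_t(sample, 12)
print(round(t_stat, 3), df)
```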
Q 1. Indian Oil has developed a formulation with increased use of ethanol in petroleum
products, which increases engine efficiency with less harmful emissions. 30 cars were
test driven with and without the ethanol, and the number of kilometers per litre was
recorded. The cars used for the tests had either automatic or manual transmission.
Steps:
Output :
Interpretation:
There are two tables. The table named One-Sample Statistics shows the mean and standard
deviation of the sample, along with the standard error of the mean.
In the next table, named One-Sample Test, if the absolute calculated value of t is greater
than the table value, we accept the alternate hypothesis; if it is less than the table
value, we accept the null hypothesis. In our working example, the two-tailed significance
value is less than .05 (p<.05), so the difference between the means is significant. The
output indicates that there is a significant difference in engine efficiency between the
previous and current trials. The cars in the current trial have higher engine efficiency
than those in the earlier trial, with t(29) = 4.597, p<.05.
(B).
What is the paired-samples t test in SPSS?
This test is used to ask whether two sets of values are random samples from the same or
different populations.
• If they are random samples from the same population, then any differences across
conditions or groups can be attributed to random sampling variability.
• If the two sets of values are random samples from different populations, then you
can attribute any difference between means across conditions to the independent
variable.
The paired t test is used when you have data from one group of participants, each of whom
provides two values under different levels of the independent variable.
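The paired test can be sketched as a one-sample test on the per-participant differences: t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom. The readings for five cars below are hypothetical and are not the 30-car lab data.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return the t statistic and degrees of freedom for paired samples."""
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / math.sqrt(n)
    return mean(differences) / standard_error, n - 1


# Hypothetical km-per-litre readings for the same five cars.
with_ethanol = [15, 16, 14, 17, 18]
without_ethanol = [14, 14, 13, 16, 16]
t_stat, df = paired_t(with_ethanol, without_ethanol)
print(round(t_stat, 3), df)
```

Because each car is measured twice, only the differences within each pair enter the calculation, which removes car-to-car variability from the comparison.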
STEPS:
Output:
• Interpretation
The first table, named Paired Samples Statistics, shows the statistics for both the with
ethanol and without ethanol conditions.
The next table, Paired Samples Correlations, shows a correlation value of .934, p<.05.
The last table, Paired Samples Test, shows that the two-tailed significance value is less
than .05 (p<.05), so the difference between the means is significant. The output indicates
that there is a significant difference in engine efficiency between the with ethanol and
without ethanol trials. The cars with the ethanol additive have higher engine efficiency
than those without ethanol, with t(29)= 3.753, p<.05.
(c).
• Steps for independent sample test-
Post hoc analysis involves hunting through the data for significant differences after the
overall test, which carries a risk of Type 1 errors. Post hoc tests are designed to protect
against Type 1 errors, given that all the possible comparisons are going to be made. These
tests are stricter than planned comparisons, so it is more difficult to obtain significance.
Some post hoc tests are:
1) Scheffe test – allows every possible comparison to be made but is tough on rejecting
the null hypothesis
2) Tukey test/honestly significant difference test – more lenient, but the types of
comparison that can be made are restricted. This chapter also shows the Tukey test.
Q) Vijender Gupta wants to compare the scores of CBSE students from four metro
cities of India, i.e. Delhi, Kolkata, Mumbai and Chennai. He obtained 20 participants by
random sampling from each of the four metro cities, collecting 80 responses in total. Also
note that this is an independent design, since the respondents are from different cities.
He made the following hypotheses:
Null Hypothesis: There is no significant difference in scores between the different metro
cities of India
Alternate Hypothesis: There is a significant difference in scores between the different
metro cities of India
CITY SCORE
1 400
1 450
1 499
1 480
1 495
1 300
1 350
1 356
1 269
1 298
1 299
1 599
1 466
1 591
1 502
1 598
1 548
1 459
1 489
1 499
2 389
2 398
2 399
2 599
2 598
2 457
2 498
2 400
2 300
2 369
2 368
2 348
2 499
2 475
2 489
2 498
2 399
2 398
2 378
2 498
3 488
3 469
3 425
3 450
3 399
3 385
3 358
3 299
3 298
3 389
3 398
3 349
3 358
3 498
3 452
3 411
3 398
3 379
3 295
3 250
4 450
4 400
4 450
4 428
4 398
4 359
4 360
4 302
4 310
4 295
4 259
4 301
4 322
4 365
4 389
4 378
4 345
4 498
4 489
4 456
❖ According to the test of homogeneity of variances, the significance value is 0.077
(p>0.05), which means the variances of the scores in the four cities do not differ
significantly. The assumption of homogeneity of variances is therefore met and the
ANOVA result can be interpreted.
❖ According to the ANOVA test, the significance value is 0.015 (p<0.05), which means
the alternative hypothesis, that there is a significant difference in scores between the
different metro cities of India, is accepted and the null hypothesis is rejected.
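The one-way ANOVA behind this output can be sketched directly from the CITY/SCORE table: the F statistic is the between-groups mean square divided by the within-groups mean square. The scores below are copied from the table. The critical value of about 2.73 for F(3, 76) at alpha = 0.05 is an approximate figure from standard F tables and is an assumption of this sketch; SPSS reports the exact significance instead.

```python
# Scores copied from the CITY/SCORE table (city codes 1-4, 20 scores each).
scores = {
    1: [400, 450, 499, 480, 495, 300, 350, 356, 269, 298,
        299, 599, 466, 591, 502, 598, 548, 459, 489, 499],
    2: [389, 398, 399, 599, 598, 457, 498, 400, 300, 369,
        368, 348, 499, 475, 489, 498, 399, 398, 378, 498],
    3: [488, 469, 425, 450, 399, 385, 358, 299, 298, 389,
        398, 349, 358, 498, 452, 411, 398, 379, 295, 250],
    4: [450, 400, 450, 428, 398, 359, 360, 302, 310, 295,
        259, 301, 322, 365, 389, 378, 345, 498, 489, 456],
}

F_CRITICAL = 2.73  # approximate table value for F(3, 76) at alpha = 0.05


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (value - m) ** 2 for group, m in zip(groups, group_means) for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova(list(scores.values()))
reject_h0 = f_stat > F_CRITICAL
print(round(f_stat, 3), df_between, df_within, reject_h0)
```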