
RM Project File

Uploaded by

Nespin

Sri Guru Tegh Bahadur Institute of Management and Information Technology

Research Methodology
PRACTICAL FILE

SUBMITTED TO: Mansi Ahuja (Assistant Professor)
SUBMITTED BY: Nishant Shrivastava
ENROLLMENT NO.: 07191101722
WORKSHEET 1
FREQUENCY DISTRIBUTION

Frequency distribution is a method of displaying the frequency (the number of times a particular value of a variable repeats in the data) of the different values of a variable in a dataset. It represents the counts of all outcomes of a variable in a sample. The frequency distribution of a variable can be represented in tabular as well as graphical form.

Frequency distribution is a very common and important method for analyzing the nominal (categorical) and ordinal (ranking) variables in a dataset. In most questionnaires, one section is dedicated to demographic profiles. The different categories of demographic profiles in a dataset are normally represented by a frequency distribution in tabular as well as graphical form.

Objective: To calculate the frequency distribution and present a bar chart of the education profiles of the workers.

The dataset of workers working in small- and medium-scale enterprises in a city of India is shown below in Table 1.1.

S No.  Gender  Age Group              Religion        Education
1      Male    less than 25 yrs. old  other religion  high school
2      Male    26-35 yrs.             muslim          below 10th
3      Male    36-45 yrs.             other religion  technical diploma
4      Male    45 yrs. above          hindu           intermediate

Table 1.1: Data of workers in small- and medium-scale enterprises


S No. Gender Age Group Religion Education S No. Gender Age Group Religion Education
1 1 1 3 2 26 1 5 3 2
2 1 4 2 1 27 1 1 1 2
3 1 3 3 4 28 1 5 2 2
4 1 3 1 3 29 1 1 2 4
5 2 4 1 1 30 1 5 2 2
6 1 4 1 1 31 1 2 3 5
7 2 2 1 1 32 1 3 2 1
8 1 2 3 1 33 2 2 2 2
9 1 2 2 1 34 1 5 2 1
10 2 2 2 2 35 2 5 1 2
11 1 3 1 2 36 2 5 2 3
12 1 3 1 3 37 2 2 3 4
13 1 4 1 4 38 2 5 2 3
14 2 1 2 3 39 1 3 3 3
15 1 5 2 2 40 1 5 2 2
16 2 2 2 2 41 1 2 1 1
17 1 1 1 5 42 1 2 3 1
18 1 5 1 5 43 1 3 2 1
19 1 5 2 2 44 1 5 2 5
20 1 2 2 5 45 1 2 1 2
21 2 5 2 2 46 2 5 2 3
22 1 2 2 1 47 2 2 1 2
23 1 2 3 1 48 1 3 3 4
24 1 2 1 5 49 1 4 2 4
25 2 5 2 5 50 2 1 1 1

The coding details of the different variables in the dataset are shown below in Table 1.2.

Table 1.2: Coding of the different variables in the dataset

Variable    Numeric codes
Gender      1 = male; 2 = female
Age group   1 = less than 25 yrs. old; 2 = 26-35 yrs.; 3 = 36-45 yrs.; 4 = 46-55 yrs.; 5 = 56 and above
Religion    1 = hindu; 2 = muslim; 3 = other religion
Education   1 = below 10th; 2 = high school; 3 = intermediate; 4 = technical diploma; 5 = degree level

SPSS Commands

STEP 1: Click Analyze → Descriptive Statistics → Frequencies.

Figure 1.1 Screenshot of frequency distribution (step 1)

STEP 2: Transfer the variable ‘education’ to the Variable(s) window.

Figure 1.2 Screenshot of frequency distribution (step 2)

STEP 3: Click ‘Charts’ and select the type of chart, for example a bar chart, as shown in Figure 1.3.

Figure 1.3 Screenshot of frequency distribution (step 3)

STEP 4: Finally click ‘Continue’ and then ‘OK’.

The final SPSS output in tabular form is shown below in Table 1.3.

Table 1.3: SPSS output of frequency distribution

Education of worker
                           Frequency  Percent  Valid Percent  Cumulative Percent
Valid  10th grade              14      28.0       28.0             28.0
       high school             16      32.0       32.0             60.0
       Intermediate             7      14.0       14.0             74.0
       technical diploma        6      12.0       12.0             86.0
       degree level             7      14.0       14.0            100.0
       Total                   50     100.0      100.0

The SPSS output in graphical form is shown below in Figure 1.5.

Figure 1.5 Bar chart of education of workers

Conclusion: The education levels of the 50 workers were tabulated. The numbers of workers with education below 10th grade, high school, intermediate, technical diploma and degree level are 14 (28%), 16 (32%), 7 (14%), 6 (12%) and 7 (14%) respectively, as reported in Table 1.3.
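
The frequency table can also be cross-checked outside SPSS. The following is a minimal Python sketch, assuming the education codes are typed in by hand from the last column of each half of Table 1.1; the names `education`, `labels` and `rows` are illustrative and not part of the SPSS workflow.

```python
from collections import Counter

# Education codes copied from Table 1.1 (assumed transcription of the coded data).
education = [
    2, 1, 4, 3, 1, 1, 1, 1, 1, 2, 2, 3, 4, 3, 2, 2, 5, 5, 2, 5, 2, 1, 1, 5, 5,  # workers 1-25
    2, 2, 2, 4, 2, 5, 1, 2, 1, 2, 3, 4, 3, 3, 2, 1, 1, 1, 5, 2, 3, 2, 4, 4, 1,  # workers 26-50
]

# Code-to-label mapping taken from Table 1.2.
labels = {1: "below 10th", 2: "high school", 3: "intermediate",
          4: "technical diploma", 5: "degree level"}

counts = Counter(education)   # frequency of each code
n = len(education)            # number of workers

rows = []
cumulative = 0.0
for code in sorted(labels):
    freq = counts[code]
    percent = 100 * freq / n
    cumulative += percent
    rows.append((labels[code], freq, percent, cumulative))
    print(f"{labels[code]:<18}{freq:>4}{percent:>8.1f}{cumulative:>8.1f}")
```

The printed frequencies agree with the Frequency column of Table 1.3, which is a quick way to confirm that the data were entered correctly.
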
WORKSHEET 2
Measures of Central Tendency

There are three main measures of central tendency. These are as follows:

• Arithmetic mean
• Median
• Mode

Let us discuss these three in detail.

Arithmetic Mean

The mean of a variable represents its average value. It can be calculated by using the following formula:

$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i}$$

Where $\bar{X}$ represents the mean, $X_i$ represents the value of the ith observation of the variable and $f_i$ represents its frequency.

One of the problems with the arithmetic mean is that it is highly sensitive to the presence of outliers in the data of the related variable. To avoid this problem, the trimmed mean of the variable can be estimated. The trimmed mean is the value of the mean of a variable after removing some extreme observations (e.g., 2.5 percent from both tails of the distribution) from the frequency distribution.

Median

The median is known as the ‘positional average’ of a variable. If we arrange the observations of a variable in ascending or descending order, the value of the observation that lies in the middle of the series is known as the median. The value of the median divides the observations of a variable into two equal halves. Half of the observations of the variable are higher than the median value and the other half are lower than the median value. The extensions of the median are quartiles, deciles and percentiles.
Mode

The mode of a variable is the observation with the highest frequency or the highest concentration of frequencies.

Objective: To calculate the mean, median, mode and quartiles of the monthly sales of a company.

The dataset of monthly sales figures (in crores) of an enterprise for 50 consecutive months is given in Table 2.1.

Table 2.1: Monthly sales figures of 50 consecutive months of an enterprise


Month Sales Month Sales Month Sales Month Sales
1 60 14 24 27 23 40 150
2 70 15 12 28 32 41 34
3 45 16 8 29 54 42 56
4 90 17 15 30 34 43 97
5 110 18 40 31 45 44 34
6 40 19 54 32 49 45 54
7 90 20 56 33 68 46 70
8 50 21 25 34 65 47 98
9 70 22 43 35 70 48 45
10 65 23 56 36 60 49 89
11 54 24 120 37 30 50 100
12 72 25 120 38 40
13 45 26 130 39 110

SPSS Commands
STEP 1: Click Analyze → Descriptive Statistics → Frequencies.

Figure 2.1 Screenshot of measures of central tendency (step 1)

STEP 2: Transfer the variable to the Variable(s) window and click ‘Statistics’ as shown in Figure 2.2.

Figure 2.2 Screenshot of measures of central tendency (step 2)

STEP 3: Select the options ‘Mean’, ‘Median’, ‘Mode’ and ‘Quartiles’, click ‘Continue’ and then ‘OK’ as shown in Figure 2.3.

Figure 2.3 Screenshot of measures of central tendency (step 3)

The SPSS output is shown in Table 2.2.

Table 2.2: SPSS output of measures of central tendency

Statistics
sales per month
N            Valid      50
             Missing     0
Mean                    61.42
Median                  55.00
Mode                    45a
Percentiles  25         40.00
             50         55.00
             75         76.25
a. Multiple modes exist. The smallest value is shown.
Conclusion: Table 2.2 represents the SPSS output.

Mean value of sales figure is 61.42.

Median value of sales figure is 55.00.

Mode value of sales figure is 45 (multiple modes exist; the smallest is shown).

Percentile (25) value is 40.00.

Percentile (50) value is 55.00.

Percentile (75) value is 76.25.
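
The figures in Table 2.2 can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that the 50 sales values are transcribed correctly from Table 2.1; the variable names are illustrative. The default `method="exclusive"` of `statistics.quantiles` interpolates at position (n + 1)p, which for this dataset gives the same quartiles as the SPSS output.

```python
import statistics

# Monthly sales (in crores) transcribed from Table 2.1.
sales = [
    60, 70, 45, 90, 110, 40, 90, 50, 70, 65, 54, 72, 45,    # months 1-13
    24, 12, 8, 15, 40, 54, 56, 25, 43, 56, 120, 120, 130,   # months 14-26
    23, 32, 54, 34, 45, 49, 68, 65, 70, 60, 30, 40, 110,    # months 27-39
    150, 34, 56, 97, 34, 54, 70, 98, 45, 89, 100,           # months 40-50
]

mean = statistics.mean(sales)
median = statistics.median(sales)
modes = sorted(statistics.multimode(sales))      # all values sharing the top frequency
q1, q2, q3 = statistics.quantiles(sales, n=4)    # quartiles, exclusive method

print(f"mean={mean:.2f} median={median:.2f} modes={modes}")
print(f"Q1={q1:.2f} Q2={q2:.2f} Q3={q3:.2f}")
```

The list of modes shows why SPSS prints the footnote "Multiple modes exist": 45, 54 and 70 each occur four times, and SPSS reports only the smallest.
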


WORKSHEET 3
Outlier Testing

Outliers are:

• The extreme observations lying in the extreme tails of the probability distribution of the variables.
• The observations with the highest residuals for a relation model (regression model).
• The observations that, if not included in the analysis, cause a significant difference in the result.

On the basis of the cases mentioned above, outliers can be divided into three different
types:
1. Extreme values or univariate outliers
2. Multivariate outliers
3. Influencers

Two popular methods of detecting outliers are:

1. Extreme values
2. Box plot

Objective: To detect if any outlier(s) is present in the given data.

The dataset of players is given in Table 3.1.

Table 3.1: Data of players

Sno. gender Age sports hours


1 1 1 1 2.0
2 1 2 2 3.0
3 2 1 3 4.0
4 1 3 4 4.5
5 1 4 5 2.5
6 2 1 2 3.0
7 1 2 2 2.5
8 1 3 3 5
9 2 2 5 2.0
10 2 2 5 1.0
11 1 2 5 1.0
12 2 2 4 1.5
13 2 3 4 1.5
14 2 1 1 3.5
15 1 2 3 1.5
16 1 3 1 2.0
17 1 3 1 2.0
18 1 3 5 2.5
19 2 1 2 3.5
20 1 1 4 3.0
21 2 2 3 3.0
22 1 2 5 13.0
23 1 2 1 4.0
24 2 3 2 2.0
25 1 3 2 3.0
26 1 2 2 4.5
27 2 3 2 4.0
28 1 2 4 4.0
29 1 3 3 5.0
30 1 1 1 1.0

SPSS Commands

STEP 1: Click Analyze → Descriptive Statistics → Explore.

Figure 3.1 Screenshot of outlier testing (step 1)

STEP 2: Send the ‘hours spent’ variable to the Dependent List and then click ‘Statistics’.

Figure 3.2 Screenshot of outlier testing (step 2)

STEP 3: Select ‘Outliers’ and click ‘Continue’ as shown in Figure 3.3.

Figure 3.3 Screenshot of outlier testing (step 3)

The required output is shown in Table 3.2 and the box plot diagram is shown in Figure 3.4.

Table 3.2: SPSS output of outlier testing

Extreme Values
hours spent for playing
                Case Number   Value
Highest   1         22        13.0
          2         29         5.0
          3          4         4.5
          4         26         4.5
          5         27         4.5
Lowest    1          8          .5
          2         30         1.0
          3         11         1.0
          4         10         1.0
          5         15         1.5a
a. Only a partial list of cases with the value 1.5 is shown in the table of lower extremes.

Figure 3.4 Screenshot of SPSS output: box plot diagram

Conclusion: Table 3.2 represents the SPSS output of outlier testing. It lists the five highest and five lowest values in the sportsman dataset. Case numbers 22, 29, 4, 26 and 27 have the highest values and case numbers 8, 30, 11, 10 and 15 have the lowest values. Figure 3.4 shows that case number 22 is an outlier.
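
The box plot rule can be sketched in Python as follows. This is an illustration, not the exact SPSS computation: it assumes the hours exactly as printed in Table 3.1, uses the common 1.5 × IQR fences, and takes quartiles from `statistics.quantiles`, whereas SPSS box plots are based on Tukey's hinges. For these data the two approaches flag the same case.

```python
import statistics

# Hours spent playing, transcribed from Table 3.1 (cases 1-30).
hours = [2.0, 3.0, 4.0, 4.5, 2.5, 3.0, 2.5, 5.0, 2.0, 1.0,
         1.0, 1.5, 1.5, 3.5, 1.5, 2.0, 2.0, 2.5, 3.5, 3.0,
         3.0, 13.0, 4.0, 2.0, 3.0, 4.5, 4.0, 4.0, 5.0, 1.0]

q1, _, q3 = statistics.quantiles(hours, n=4)   # lower and upper quartiles
iqr = q3 - q1                                  # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep (case number, value) pairs that fall outside the fences.
outliers = [(case, value) for case, value in enumerate(hours, start=1)
            if value < lower_fence or value > upper_fence]

print(f"Q1={q1}, Q3={q3}, fences=({lower_fence}, {upper_fence})")
print("Outliers:", outliers)
```

Only case 22 (13.0 hours) lies beyond the upper fence, which agrees with the box plot in Figure 3.4.
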
WORKSHEET 4.1
Test of Difference: One sample T-Test

In many situations, we come across claims made by marketers about their products. For example, a car manufacturer may claim that the average mileage of a car is, say, 199.9 kmpl, or a business school may claim that the average package offered to its students is Rs. 12 lakh per annum. A researcher may be interested in analyzing the truthfulness of these claims. For this analysis, the researcher needs to randomly pick a small sample from the population and compare its mean with the claimed population mean. The sample mean and the population mean may be different from each other. In order to test whether this difference is statistically significant, we should apply the one-sample t-test.

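The null hypothesis of the one-sample t-test is:

“Ho: There is no significant difference between sample mean and population mean.”

The t-statistic in the one-sample t-test can be estimated by using the following formula:

$$t = \frac{\bar{x} - \mu}{\sigma / \sqrt{N - 1}}$$

Where $\bar{x}$ = sample mean, $\mu$ = population mean, $\sigma$ = standard deviation of the sample (computed with divisor N) and N = sample size. The denominator is the standard error of the sample mean; it equals $s / \sqrt{N}$ when the standard deviation $s$ is computed with divisor N - 1, which is the ‘Std. Error Mean’ reported by SPSS.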

Objective: To find out whether there is a difference between the population mean ($\mu$) and the sample mean ($\bar{x}$).

Ho: There is no difference between population mean and sample mean.

H1: There is a difference between population mean and sample mean.

The dataset of weight lost (in kg) by 50 customers a month after joining the weight loss program is shown in Table 4.1.1.

Table 4.1.1: Data of weight lost by 50 customers a month after joining the weight loss program
S.No Weight lost S.No Weight lost S.No Weight lost
1 2 18 4 35 4
2 3 19 3 36 4
3 2 20 4 37 3
4 4 21 5 38 4
5 5 22 6 39 3
6 3 23 4 40 4
7 3 24 5 41 5
8 2 25 6 42 4
9 3 26 5 43 3
10 4 27 4 44 4
11 2 28 4 45 5
12 3 29 5 46 5
13 3 30 5 47 4
14 4 31 6 48 5
15 3 32 2 49 6
16 4 33 5 50 5
17 5 34 5

SPSS Commands

STEP 1: Click Analyze → Compare Means → One-Sample T Test.

Figure 4.1.1 Screenshot of one sample t-test (step 1)

STEP 2: Transfer the variable ‘weight loss’ to the Test Variable(s) window.

Figure 4.1.2 Screenshot of one sample t-test (step 2)

STEP 3: Click ‘Options’.

Figure 4.1.3 Screenshot of one sample t-test (step 3)

STEP 4: Click ‘Continue’ and then ‘OK’.

Figure 4.1.4 Screenshot of one sample t-test (step 4)

The final SPSS (Statistical Package for the Social Sciences) output in tabular form is shown below in Table 4.1.2 and Table 4.1.3 respectively.

Table 4.1.2
One-Sample Statistics
             N    Mean     Std. Deviation   Std. Error Mean
WeightLost   50   4.0200   1.11557          .15777

Table 4.1.3
One-Sample Test
Test Value = 0
             t        df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                                Lower      Upper
WeightLost   25.481   49   .000              4.02000           3.7030     4.3370

Since the p value is less than the significance level of 0.05, Ho is rejected.

Conclusion: The sample mean is 4.02 kg, which is less than the claimed population mean of 5 kg. The output in Table 4.1.3 was produced with Test Value = 0, so its t statistic of 25.481 (p value .000) only shows that the mean weight lost differs from zero. To test the company's claim, the test value has to be set to 5; with the figures in Table 4.1.2 this gives t = (4.02 - 5)/0.15777 ≈ -6.21 with 49 degrees of freedom, which is also significant at the 5% level. Hence, with 95% confidence, the null hypothesis of no difference between sample mean and population mean is rejected and it can be concluded that the sample mean is significantly different from the claimed population mean. Therefore, the company's statement about the weight loss of its customers is not supported by the sample.
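
The t statistic can be recomputed by hand to see the effect of the test value. The sketch below assumes the 50 weights are transcribed correctly from Table 4.1.1; `one_sample_t` is a hypothetical helper written for this illustration, and the critical value 2.01 is an approximate two-tailed 5% t-table value for 49 degrees of freedom, not something SPSS prints.

```python
import math
import statistics

# Weight lost (kg) transcribed from Table 4.1.1.
weight_lost = [
    2, 3, 2, 4, 5, 3, 3, 2, 3, 4, 2, 3, 3, 4, 3, 4, 5,   # customers 1-17
    4, 3, 4, 5, 6, 4, 5, 6, 5, 4, 4, 5, 5, 6, 2, 5, 5,   # customers 18-34
    4, 4, 3, 4, 3, 4, 5, 4, 3, 4, 5, 5, 4, 5, 6, 5,      # customers 35-50
]

def one_sample_t(data, test_value):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)          # sample SD with divisor N - 1
    std_error = sd / math.sqrt(n)        # 'Std. Error Mean' in SPSS
    return (mean - test_value) / std_error, n - 1

t_zero, df = one_sample_t(weight_lost, 0)    # what Table 4.1.3 reports
t_claim, _ = one_sample_t(weight_lost, 5)    # test against the claimed 5 kg

T_CRITICAL = 2.01  # assumed: approximate two-tailed 5% critical value, df = 49
print(f"t (test value 0) = {t_zero:.3f}, df = {df}")
print(f"t (test value 5) = {t_claim:.3f}, reject Ho: {abs(t_claim) > T_CRITICAL}")
```

In SPSS the same result is obtained by typing 5 in the ‘Test Value’ box of the One-Sample T Test dialog before clicking ‘OK’.
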
WORKSHEET 4.2
Test of Difference: Paired Sample T-Test

A paired sample t-test is also known as a repeated sample t-test because data (responses) are collected from the same respondents but at different time periods. A paired sample t-test should be used when we want to test the impact of an event or experiment on the variable under study. In this case, the data are collected from the same respondents before and after the event. After this, the means are compared. The null hypothesis of the paired samples t-test is that the means of the pre-sample and post-sample are equal. Some of the instances where the paired samples t-test can be applied are as follows:

a. Analyzing the effectiveness of a training program on the performance of employees of a business enterprise.
b. Analyzing the impact of a new advertisement on the sales of a product.
c. Analyzing the impact of a policy on the volatility in the stock market.
d. Analyzing the difference in the responses of the same group to two different treatments.

Objective: To find out whether there is a difference between the results before training and after training.
Ho: There is no difference between the results before training and after training.
H1: There is a difference between the results before training and after training.

The dataset of the performance of the 30 employees of a training program is shown in Table 4.2.1.

Table 4.2.1 Data of the performance of employees

Pre-training score  Post-training score  Pre-training score  Post-training score  Pre-training score  Post-training score
56.00 82.00 38.00 67.00 65.00 68.00
45.00 76.00 44.00 56.00 53.00 56.00
56.00 78.00 76.00 91.00 49.00 53.00
34.00 64.00 34.00 48.00 42.00 56.00
56.00 62.00 38.00 68.00 53.00 76.00
42.00 60.00 42.00 65.00 58.00 82.00
43.00 68.00 83.00 90.00 34.00 45.00
56.00 69.00 72.00 87.00 43.00 76.00
70.00 78.00 47.00 64.00 45.00 67.00
56.00 87.00 48.00 53.00 65.00 72.00

SPSS Commands
STEP 1: Click Analyze → Compare Means → Paired-Samples T Test.

Figure 4.2.1 Screenshot of paired sample t-test (step 1)

STEP 2: Click on the variable ‘pre-training score’. Then click on the post-training variable. Now, move the pair into the ‘Paired Variables’ box by clicking on the right arrow button. Finally click on ‘OK’ as shown in Figure 4.2.2.

Figure 4.2.2 Screenshot of paired sample t-test (step 2)

The final SPSS (Statistical Package for the Social Sciences) output in tabular form is shown below in Table 4.2.2, Table 4.2.3 and Table 4.2.4 respectively.

Table 4.2.2: SPSS Output of Paired Sample t-test

Paired Samples Statistics
                         Mean         N    Std. Deviation   Std. Error Mean
Pair 1  Before training  40050.0000   20   14837.62997      3317.79492
        After training   43200.0000   20   17334.48039      3876.10765

Table 4.2.3: SPSS Output of Paired Sample t-test

Paired Samples Correlations
                                          N    Correlation   Sig.
Pair 1  Before training & After training  20   .982          .000

Table 4.2.4: SPSS Output of Paired Sample t-test

Paired Samples Test
Pair 1: Before training - After training
Paired Differences
  Mean                                        -3150.000
  Std. Deviation                               3937.33813
  Std. Error Mean                              880.41557
  95% Confidence Interval of the Difference    Lower -4992.73097, Upper -1307.26903
t                                             -3.578
df                                             19
Sig. (2-tailed)                                .002

Conclusion: Paired sample statistics and paired sample correlations are shown in Tables 4.2.2 and 4.2.3 respectively. As shown in Table 4.2.2, the mean sales figure before the training program is 40050, whereas the mean sales figure after the training program is 43200. The sales figure increases after the training program. Table 4.2.3 gives the sample correlation coefficient between pre-test and post-test (0.982) and a test of significance of the correlation (p < .001).

Table 4.2.4 shows the result of the paired sample t-test. The null hypothesis of the paired sample t-test assumes that the pre-sample mean is the same as the post-sample mean. The p value of the t statistic (.002) is less than the 5% level of significance. Hence, the null hypothesis is rejected. Therefore it can be concluded that the training program is effective in increasing the sales figures of the company.
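
The paired t statistic is the one-sample t statistic of the pairwise differences. The sketch below applies this to the 30 pre-training and post-training scores printed in Table 4.2.1. Note that the SPSS tables above report sales figures for 20 cases, so the numbers produced here are not expected to match them; the variable names and the critical value 2.05 (approximate two-tailed 5% t-table value for 29 degrees of freedom) are assumptions made for the illustration.

```python
import math
import statistics

# Scores transcribed from Table 4.2.1, read column pair by column pair.
pre = [56, 45, 56, 34, 56, 42, 43, 56, 70, 56,
       38, 44, 76, 34, 38, 42, 83, 72, 47, 48,
       65, 53, 49, 42, 53, 58, 34, 43, 45, 65]
post = [82, 76, 78, 64, 62, 60, 68, 69, 78, 87,
        67, 56, 91, 48, 68, 65, 90, 87, 64, 53,
        68, 56, 53, 56, 76, 82, 45, 76, 67, 72]

# Work on the difference of each employee's two scores.
diffs = [after - before for before, after in zip(pre, post)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
std_error = statistics.stdev(diffs) / math.sqrt(n)   # SE of the mean difference
t_stat = mean_diff / std_error
df = n - 1

T_CRITICAL = 2.05  # assumed: approximate two-tailed 5% critical value, df = 29
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, df = {df}")
print("reject Ho:", abs(t_stat) > T_CRITICAL)
```

SPSS computes "Before - After", so its mean difference and t carry the opposite sign to the "post - pre" differences used here; the magnitude and the conclusion are unaffected.
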

WORKSHEET 4.3

Test of difference: Independent sample t-test

When we want to test the difference between two independent sample means, we use
independent-sample t-test. The independent samples may belong to the same population
or different population. Some of the instances in which the independent samples t-test can
be used are as follows:

1. Testing difference in the average level of performance between employees with the
MBA degree and employees without the MBA degree.

2. Testing difference in the average wages received by labor in two different industries.

3. Testing difference in the average monthly sales of the two firms.

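‘Ho: There is no significant difference between sample means of two independent groups.’

The t-statistic in the case of the independent-sample t-test can be calculated by using the following formula:

$$t_{\bar{X}_1 - \bar{X}_2} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}}$$

Where $N_1$ and $N_2$ are the sample sizes of the two independent samples, $\bar{X}_1$ and $\bar{X}_2$ are their means, and $s_1^2$ and $s_2^2$ are their sample variances.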

In SPSS, the independent samples t-test is conducted in two stages. At stage one, SPSS compares the variances of the two samples. The statistical method of comparing two sample variances is known as Levene's test of homogeneity of variance. The null hypothesis of this test is that the variances are equal, i.e., there is no significant difference between the sample variances of the two independent samples. On the basis of Levene's test, SPSS gives two values of the t-statistic, one in the row ‘Equal variances assumed’ and one in the row ‘Equal variances not assumed’. If Levene's test is not significant (p value above 0.05), the ‘Equal variances assumed’ row should be considered for the final analysis. If it is significant, the ‘Equal variances not assumed’ row, which adjusts the standard error and the degrees of freedom, should be considered instead.

Objective: To analyze the difference in the average performance of the employees of an enterprise in the age groups below and above 40 years of age.

Ho: There is no significant difference in the average performance of the employees for age groups below and above 40 years of age.

H1: There is a significant difference in the average performance of the employees for age groups below and above 40 years of age.

A researcher is interested in analyzing the difference in the average performance of employees of an enterprise across different demographic profiles. He divides employees on the basis of gender and their age group. The data is given below in Table 4.3.1.

SPSS Commands

STEP 1: Click Analyze → Compare Means → Independent-Samples T Test.

Figure 4.3.1 Screenshot of independent sample t-test (step 1)

STEP 2: Send the test variable ‘performance score’ to the Test Variable(s) window. Then send the ‘age’ variable to the Grouping Variable box and click ‘Define Groups’ as shown in Figure 4.3.2.

Figure 4.3.2 Screenshot of independent sample t-test (step 2)

STEP 3: Now define the cut point as 40. Next click ‘Continue’ as shown in Figure 4.3.3.

Figure 4.3.3 Screenshot of independent sample t-test (step 3)

The final SPSS (Statistical Package for the Social Sciences) output in tabular form is shown below in Table 4.3.2 and Table 4.3.3 respectively.

Table 4.3.2 SPSS Output of independent sample t-test

Group Statistics
                    Age     N    Mean    Std. Deviation   Std. Error Mean
Performance Score   >= 40   22   68.86   19.075           4.067
                    < 40    28   55.07   16.777           3.171

Table 4.3.3 SPSS Output of independent sample t-test

Independent Samples Test
Performance Score
                              Levene's Test for Equality of Variances   t-test for Equality of Means
                              F       Sig.    t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% Confidence Interval of the Difference
                                                                                                                           Lower    Upper
Equal variances assumed       1.408   .241    2.717   48       .009              13.792            5.077                   3.585    23.999
Equal variances not assumed                   2.675   42.170   .011              13.792            5.157                   3.387    24.197
Conclusion: As shown in Table 4.3.2, the average performance score of employees aged 40 years and above is 68.86 with a standard deviation of 19.075, and the average performance score of employees below 40 years of age is 55.07 with a standard deviation of 16.777. Table 4.3.3 shows the result of Levene's test, whose null hypothesis is that the sample variances are equal. Its significance value of 0.241 is greater than 0.05, so at the 5% level of significance the null hypothesis of equal variances is not rejected and the ‘Equal variances assumed’ row is used. In that row the t statistic is 2.717 with a p value of .009, which is less than the 5% level of significance. Hence, with 95% confidence, the null hypothesis of no significant difference in the average performance of the employees below and above 40 years of age is rejected.
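
Because the raw data of Table 4.3.1 are not reproduced in this file, the following sketch recomputes both t statistics from the summary figures in Table 4.3.2 only. It assumes those rounded means and standard deviations as inputs, so the results agree with Table 4.3.3 only up to rounding; all variable names are illustrative.

```python
import math

# Summary statistics taken from Table 4.3.2.
n1, mean1, sd1 = 22, 68.86, 19.075   # age >= 40
n2, mean2, sd2 = 28, 55.07, 16.777   # age < 40

# Equal variances assumed: pooled-variance t statistic.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = (mean1 - mean2) / se_pooled
df_pooled = n1 + n2 - 2

# Equal variances not assumed: separate variances with adjusted (Welch) df.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = (mean1 - mean2) / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

The first line corresponds to the ‘Equal variances assumed’ row and the second to the ‘Equal variances not assumed’ row of Table 4.3.3.
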

WORKSHEET 5
One-Way ANOVA
Concept of ANOVA

Independent-samples t-test can be applied to situations where there are only two
independent samples. In other words, we can use independent-samples t-tests for
comparing the means of two populations (such as males and females). When we have
more than two independent samples, t-test is inappropriate. The Analysis of Variance
(ANOVA) has an advantage over t-test when the researcher wants to compare the means
of a large number of populations (i.e., three or more). ANOVA is a parametric test that is
used to study the difference among more than two groups in the datasets. It helps in
explaining the amount of variation in the dataset. In a dataset, two main types of
variations can occur. One type of variation occurs due to chance and the other type of
variation occurs due to specific reasons. These variations are studied separately in
ANOVA to identify the actual cause of variation and help the researcher in taking
effective decisions.

In the case of more than two independent samples, the ANOVA test explains three types of variance. These are as follows:

• Total variance
• Between group variance
• Within group variance

The ANOVA test is based on the logic that if the between group variance is significantly greater than the within group variance, it indicates that the means of the different samples are significantly different.

There are two main types of ANOVA, namely, one-way ANOVA and two-way ANOVA. One-way ANOVA determines whether all the independent samples (groups) have the same group means or not. On the other hand, two-way ANOVA is used when you need to study the impact of two categorical variables on a scale variable.

Objective: To find out whether there is a difference between the salaries of graduates, post graduates and PhDs.
Ho: There is no difference between the salaries of graduates, post graduates and PhDs.
H1: There is a difference between the salaries of graduates, post graduates and PhDs.

Table 5.1 Data of salaries and qualification

Salary Qualification Code
65000.00 Postgraduate 2
60000.00 Postgraduate 2
45000.00 graduate 1
40000.00 Phd 3
35000.00 graduate 1
56000.00 Postgraduate 2
36000.00 Phd 3
45000.00 Phd 3
40000.00 Post graduate 2
35000.00 Graduate 1
56000.00 Phd 3
36000.00 Phd 3
25000.00 Graduate 1
23000.00 Graduate 1
40000.00 Graduate 1
35000.00 Post graduate 2
56000.00 Phd 3
36000.00 Post graduate 2
45000.00 Phd 3
40000.00 Graduate 1
35000.00 Post graduate 2
56000.00 Phd 3
37000.00 Post graduate 2
25000.00 Graduate 1
85000.00 Phd 3
32000.00 Post graduate 2
29000.00 Graduate 1
25000.00 Graduate 1

SPSS Commands
STEP 1: Click Analyze → Compare Means → One-Way ANOVA.

Figure 5.1 Screenshot of One-way ANOVA (step 1)

STEP 2: Transfer the variable ‘salary’ to the Dependent List window and the variable ‘qualification’ to the Factor window.

Figure 5.2 Screenshot of One-way ANOVA (step 2)

STEP 3: Select ‘Post Hoc’ and then click ‘Tukey’ as shown below in Figure 5.3.

Figure 5.3 Screenshot of One-way ANOVA (step 3)

STEP 4: Click ‘Options’ and select ‘Homogeneity of variance test’ and ‘Means plot’ as shown below in Figure 5.4.

Figure 5.4 Screenshot of One-way ANOVA (step 4)

The final SPSS (Statistical Package for the Social Sciences) output is shown below in Table 5.2, Table 5.3, Table 5.4, Table 5.5, Table 5.6 and Figure 5.5 respectively.

Table 5.2 SPSS Output of One-way ANOVA

Descriptives
Salary
                N    Mean         Std. Deviation   Std. Error    95% Confidence Interval for Mean   Minimum   Maximum
                                                                 Lower Bound    Upper Bound
Graduate        10   32200.0000    7828.72205      2475.65928    26599.6696     37800.3304         23000     45000
Post Graduate    9   44000.0000   12629.33094      4209.77698    34292.2369     53707.7631         32000     65000
Phd              9   50555.5556   15297.96646      5099.32215    38796.4976     62314.6135         36000     85000
Total           28   41892.8571   14082.66411      2661.37336    36432.1701     47353.5442         23000     85000

Table 5.3 SPSS Output of One-way ANOVA

Test of Homogeneity of Variances
Salary
Levene Statistic   df1   df2   Sig.
1.450              2     25    .254

Table 5.4 SPSS Output of One-way ANOVA

ANOVA
Salary
                 Sum of Squares    df   Mean Square     F       Sig.
Between Groups   1654856349.206     2   827428174.603   5.591   .010
Within Groups    3699822222.222    25   147992888.889
Total            5354678571.429    27
Table 5.5 SPSS Output of One-way ANOVA

Multiple Comparisons
Dependent Variable: Salary
Tukey HSD
(I) Qualification   (J) Qualification   Mean Difference (I-J)   Std. Error    Sig.   95% Confidence Interval
                                                                                     Lower Bound    Upper Bound
Graduate            Post Graduate       -11800.00000            5589.53873    .108   -25722.5914     2122.5914
                    Phd                 -18355.55556*           5589.53873    .008   -32278.1470    -4432.9641
Post Graduate       Graduate             11800.00000            5589.53873    .108    -2122.5914    25722.5914
                    Phd                  -6555.55556            5734.74573    .497   -20839.8330     7728.7219
Phd                 Graduate             18355.55556*           5589.53873    .008     4432.9641    32278.1470
                    Post Graduate         6555.55556            5734.74573    .497    -7728.7219    20839.8330
*. The mean difference is significant at the 0.05 level.

Table 5.6 SPSS Output of One-way ANOVA

Salary
Tukey HSD (a, b)
Qualification    N    Subset for alpha = 0.05
                      1             2
Graduate         10   32200.0000
Post Graduate     9   44000.0000    44000.0000
Phd               9                 50555.5556
Sig.                  .112          .486
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 9.310.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.

Figure 5.5 Screenshot of SPSS output in graphical form

Conclusion: Table 5.2 indicates that the average salary of graduates is 32200, of post-graduates is 44000 and of PhDs is 50555.56. This indicates that the average salary of PhDs is the highest and the average salary of graduates is the lowest. Table 5.3 represents Levene's test, whose null hypothesis is that all sample variances are the same. The significance value of 0.254 is greater than 0.05, so at the 5% level of significance this null hypothesis is not rejected. Homogeneity of variance is one of the desired conditions of the one-way ANOVA test. Table 5.4 represents the results of the F test in one-way ANOVA. As shown in Table 5.4, the p value of the F statistic (5.591) is .010, which is less than the 5% level of significance. Hence, with 95% confidence, the null hypothesis of equal group means is rejected. Thus it can be concluded that the average salaries of graduates, post-graduates and PhDs are not all the same. The Tukey HSD comparisons in Table 5.5 show that the significant difference lies between graduates and PhDs (p = .008), while graduates versus post-graduates (p = .108) and post-graduates versus PhDs (p = .497) do not differ significantly.
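
The F statistic in Table 5.4 follows directly from the between group and within group sums of squares. The sketch below recomputes it from the salaries in Table 5.1, grouped by qualification code; the dictionary `salaries` and the critical value 3.39 (approximate 5% F-table value for 2 and 25 degrees of freedom) are assumptions made for this illustration.

```python
import statistics

# Salaries from Table 5.1, grouped by qualification code (1, 2, 3).
salaries = {
    "Graduate": [45000, 35000, 35000, 25000, 23000, 40000, 40000, 25000, 29000, 25000],
    "Post Graduate": [65000, 60000, 56000, 40000, 35000, 36000, 35000, 37000, 32000],
    "PhD": [40000, 36000, 45000, 56000, 36000, 56000, 45000, 56000, 85000],
}

all_values = [x for group in salaries.values() for x in group]
grand_mean = statistics.mean(all_values)

# Between group variation: spread of the group means around the grand mean.
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                 for g in salaries.values())
# Within group variation: spread of the observations around their own group mean.
ss_within = sum((x - statistics.mean(g)) ** 2
                for g in salaries.values() for x in g)

df_between = len(salaries) - 1
df_within = len(all_values) - len(salaries)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRITICAL = 3.39  # assumed: approximate 5% critical value of F(2, 25)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}, reject Ho: {f_stat > F_CRITICAL}")
```

An F value above the critical value corresponds to a p value below 0.05, which is how the ‘Sig.’ column of Table 5.4 should be read.
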
WORKSHEET 6

Chi-Square Test

The chi-square test is one of the most popular non-parametric tests. It is used in two cases, which are as follows:
• To test the association between nominal variables in research.
• To test the difference between the expected and observed frequencies of an event.
The chi-square test compares the actual observed frequencies with the calculated expected frequencies of the different combinations of nominal variables. The difference between observed and expected frequencies indicates a possible association between the categorical variables. The chi-square statistic compares the observed count in each table cell with the count that would be expected for the row and column classifications under the assumption of no association. A negligible difference between observed and expected frequencies may indicate no association, whereas a big difference may indicate the possibility of association.
Objective: To analyze the association between education background and level of familiarity with the internet.
Ho: There is no significant association between education background and level of familiarity with the internet.
H1: There is a significant association between education background and level of familiarity with the internet.
Table 6.2 has the data collected from 100 internet users. The data consists of two nominal variables, ‘Level of familiarity with the internet’ and ‘Education Background.’ The details of the codes provided to the different sub-categories of these nominal variables are shown in Table 6.1.
Table 6.1 Codes provided to sub-categories

Codes for the variable ‘Level of familiarity with the internet’   Codes for the variable ‘Education Background’
1 = Low Familiarity                                               1 = Humanities
2 = Medium                                                        2 = Management
3 = High                                                          3 = Technology
                                                                  4 = IT

Table 6.2 Data of 100 internet users

S No.  Level of familiarity with the internet  Education background
1. 3.00 1.00
2. 2.00 3.00
3. 3.00 1.00
4. 3.00 1.00
5. 3.00 4.00
6. 3.00 4.00
7. 3.00 1.00
8. 3.00 1.00
9. 3.00 1.00
10. 3.00 3.00
11. 2.00 1.00
12. 1.00 1.00
13. 3.00 1.00
14. 3.00 1.00
15. 3.00 3.00
16. 2.00 4.00
17. 2.00 2.00
18. 2.00 4.00
19. 2.00 2.00
20. 2.00 4.00
21. 3.00 1.00
22. 3.00 1.00
23. 3.00 4.00
24. 3.00 1.00
25. 3.00 2.00
26. 3.00 2.00
27. 3.00 4.00
28. 3.00 3.00
29. 2.00 2.00
30. 3.00 1.00
31. 1.00 3.00
32. 3.00 2.00
33. 2.00 4.00
34. 3.00 2.00
35. 2.00 2.00
36. 1.00 2.00
37. 2.00 1.00
38. 2.00 4.00
39. 1.00 1.00
40. 2.00 3.00
41. 2.00 2.00
42. 1.00 1.00
43. 2.00 3.00
44. 2.00 4.00
45. 2.00 2.00
46. 3.00 1.00
47. 3.00 3.00
48. 2.00 2.00
49. 3.00 2.00
50. 2.00 2.00
51. 1.00 2.00
52. 2.00 2.00
53. 1.00 4.00
54. 3.00 2.00
55. 2.00 2.00
56. 2.00 4.00
57. 1.00 3.00
58. 1.00 4.00
59. 3.00 4.00
60. 3.00 1.00
61. 1.00 2.00
62. 1.00 2.00
63. 2.00 2.00
64. 1.00 2.00
65. 2.00 2.00
66. 1.00 2.00
67. 2.00 2.00
68. 2.00 3.00
69. 1.00 2.00
70. 3.00 1.00
71. 2.00 2.00
72. 2.00 3.00
73. 1.00 1.00
74. 2.00 2.00
75. 2.00 2.00
76. 2.00 1.00
77. 1.00 1.00
78. 2.00 3.00
79. 1.00 2.00
80. 1.00 1.00
81. 1.00 1.00
82. 1.00 3.00
83. 1.00 1.00
84. 1.00 1.00
85. 1.00 2.00
86. 1.00 1.00
87. 2.00 1.00
88. 1.00 2.00
89. 2.00 1.00
90. 2.00 4.00
91. 2.00 1.00
92. 1.00 3.00
93. 1.00 4.00
94. 2.00 1.00
95. 1.00 1.00
96. 1.00 3.00
97. 1.00 2.00
98. 1.00 1.00
99. 1.00 1.00
100. 1.00 3.00

SPSS Commands
STEP 1: Click Analyze → Descriptive Statistics → Crosstabs.

Figure 6.1 Screenshot of chi-square test (step 1)

STEP 2: Transfer ‘education background’ to the Row(s) window and ‘familiarity with the internet’ to the Column(s) window. Click ‘Statistics’ as shown in Figure 6.2.

Figure 6.2 Screenshot of chi-square test (step 2)

STEP 3: Select ‘Chi-square’ and ‘Phi and Cramer’s V’ and click ‘Continue’ as shown in Figure 6.3.

Figure 6.3 Screenshot of chi-square test (step 3)

STEP 4: Click on ‘Cells’, select ‘Observed’ and ‘Expected’ and click ‘Continue’ as shown in Figure 6.4.

Figure 6.4 Screenshot of chi-square test (step 4)

The final SPSS (Statistical Package for the Social Sciences) output in tabular form is shown below in Table 6.3, Table 6.4, Table 6.5 and Table 6.6 respectively.

Table 6.3 SPSS Output of Chi-square test

Case Processing Summary
                                                                 Cases
                                                                 Valid           Missing         Total
                                                                 N     Percent   N     Percent   N     Percent
Education background * level of familiarity with the internet   100   100.0%    0     0.0%      100   100.0%

Table 6.4 SPSS Output of Chi-square test

Education background * level of familiarity with the internet Cross tabulation

                                              Level of familiarity with the internet                    Total
                                              low familiarity   medium familiarity   high familiarity
Education    Humanities   Count               13                7                    15                 35
background                Expected Count      11.6              13.0                 10.5               35.0
             Management   Count               11                16                   6                  33
                          Expected Count      10.9              12.2                 9.9                33.0
             Technology   Count               6                 6                    4                  16
                          Expected Count      5.3               5.9                  4.8                16.0
             IT           Count               3                 8                    5                  16
                          Expected Count      5.3               5.9                  4.8                16.0
Total                     Count               33                37                   30                 100
                          Expected Count      33.0              37.0                 30.0               100.0
Table 6.5 SPSS Output of Chi-square test

Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             9.515a    6    .147
Likelihood Ratio               10.096    6    .121
Linear-by-Linear Association   .002      1    .963
N of Valid Cases               100
a. 2 cells (16.7%) have expected count less than 5. The minimum expected count is 4.80.

Table 6.6 SPSS Output of Chi-square test

Symmetric Measures
                                   Value   Approx. Sig.
Nominal by Nominal   Phi           .308    .147
                     Cramer's V    .218    .147
N of Valid Cases                   100

Conclusion: The p value of the Pearson chi-square statistic (.147) is more than the 5% level of significance, which indicates that the null hypothesis of no association between education background and level of familiarity with the internet cannot be rejected.
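
The Pearson chi-square value can be recomputed from the observed counts in Table 6.4 by summing (observed - expected)² / expected over all cells. The sketch below does this in plain Python; the dictionary `observed` and the critical value 12.59 (approximate 5% chi-square table value for 6 degrees of freedom) are assumptions made for the illustration.

```python
import math

# Observed counts from Table 6.4: columns are low, medium, high familiarity.
observed = {
    "Humanities": [13, 7, 15],
    "Management": [11, 16, 6],
    "Technology": [6, 6, 4],
    "IT": [3, 8, 5],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
n = sum(row_totals)

chi_square = 0.0
for r, row in enumerate(rows):
    for c, obs in enumerate(row):
        # Expected count under no association: row total x column total / N.
        expected = row_totals[r] * col_totals[c] / n
        chi_square += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)
cramers_v = math.sqrt(chi_square / (n * min(len(rows) - 1, len(col_totals) - 1)))

CHI2_CRITICAL = 12.59  # assumed: approximate 5% critical value, df = 6
print(f"chi-square = {chi_square:.3f}, df = {df}, Cramer's V = {cramers_v:.3f}")
print("reject Ho:", chi_square > CHI2_CRITICAL)
```

The statistic stays below the critical value, which matches the p value of .147 in Table 6.5 and the weak Cramer's V in Table 6.6.
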
