
SPSS Descriptive EDA Correlation

The document provides a comprehensive overview of descriptive statistics in SPSS, detailing key measures such as central tendency (mean, median, mode), dispersion (standard deviation, variance, range), and position (percentiles, quartiles). It also covers various graph types for data visualization, correlation analysis, exploratory data analysis (EDA), and data transformation techniques. The information is structured to guide users through SPSS paths for performing these analyses and understanding their practical relevance.

Uploaded by

shwetasingh153
Copyright
© All Rights Reserved

Descriptive Statistics in SPSS

Summarize and describe features of a dataset.


SPSS Path: Analyze > Descriptive Statistics > Frequencies / Descriptives / Explore

Key Measures:
• Central Tendency: Mean, Median, Mode

• Dispersion: Std. Deviation, Variance, Range

• Position: Percentiles, Quartiles

Use: Understand data distribution and variability.


Mean (Average)
Sum of all values divided by number of
values.
Formula: Mean = (Σx) / n

SPSS Path: Analyze > Descriptive Statistics > Descriptives

Practical Relevance:
• Represents the central value of the dataset.
• Useful in budgeting, income analysis, average
marks, etc.
• Sensitive to outliers.
• Example: Average monthly salary of employees.
Median
The middle value when data is arranged in
order.
SPSS Path: Analyze > Descriptive Statistics >
Frequencies (Check Median)

Practical Relevance:
• Useful when data has outliers or is skewed.
• Represents the 50th percentile.
• Median home price, median income are
common indicators.
• Example: Median house price avoids skew from
luxury properties.
Mode
Most frequently occurring value in a
dataset.
SPSS Path: Analyze > Descriptive Statistics >
Frequencies (Check Mode)
Practical Relevance:
• Best for categorical data (e.g., most preferred
brand).
• Can have more than one mode
(bimodal/multimodal).
• Example: Most common blood type in a
population.
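
The three measures of central tendency can be compared side by side outside SPSS. The following is a minimal sketch using Python's standard `statistics` module; the salary figures are hypothetical and include one deliberate outlier to show why the mean is sensitive to extreme values while the median is not.

```python
from statistics import mean, median, multimode

# Hypothetical monthly salaries; the last value is a deliberate outlier.
salaries = [30000, 32000, 32000, 35000, 38000, 250000]

avg = mean(salaries)         # sum of all values divided by the number of values
mid = median(salaries)       # middle value of the sorted data
modes = multimode(salaries)  # list of the most frequent value(s)

print(f"mean={avg:.2f} median={mid} modes={modes}")
```

The single large salary pulls the mean well above most of the values, while the median stays close to the typical employee. `multimode` returns a list, so bimodal or multimodal data yields more than one entry.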
Range
Difference between maximum and
minimum values.
Formula: Range = Max - Min

SPSS Path: Analyze > Descriptive Statistics > Descriptives

Practical Relevance:
• Quick measure of spread.
• Does not consider all data points.
• Example: Temperature range in a city over
a week.
Standard Deviation & Variance
Variance: Average of squared differences from the mean
(SPSS reports the sample variance, which divides by n - 1).
Std. Deviation: Square root of the variance.
SPSS Path: Analyze > Descriptive Statistics >
Descriptives
Practical Relevance:
• Shows consistency or variability in data.
• Used in quality control, risk analysis, etc.
• Example: Comparing consistency of students’
exam scores.
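
As a rough illustration of range, variance, and standard deviation, the sketch below compares two hypothetical sets of exam scores with the same mean. It assumes the sample form of the variance (dividing by n - 1), which is what `statistics.variance` computes; `pvariance` gives the population form (dividing by n) for comparison. The helper `value_range` is an illustrative name, not an SPSS function.

```python
from statistics import pvariance, stdev, variance

scores_a = [70, 72, 68, 71, 69]  # hypothetical: consistent class
scores_b = [50, 90, 60, 85, 65]  # hypothetical: same mean, more variable


def value_range(values):
    """Range = Max - Min."""
    return max(values) - min(values)


for name, scores in (("A", scores_a), ("B", scores_b)):
    print(
        name,
        "range:", value_range(scores),
        "sample variance:", variance(scores),        # divides by n - 1
        "population variance:", pvariance(scores),   # divides by n
        "std. deviation:", round(stdev(scores), 2),  # sqrt of sample variance
    )
```

Both classes average 70, but class B has a much larger range and standard deviation, which is the kind of consistency comparison described above.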
Percentiles & Quartiles
• Percentiles divide data into 100 equal parts.
• Quartiles divide data into four equal parts at three cut points
(Q1, Q2, Q3); Q2 is the median.

SPSS Path: Analyze > Descriptive Statistics > Frequencies (check percentiles)

Practical Relevance:
• Used in health indicators, test scores, etc.
• Identifies outliers and spread.
• Example: A score at the 90th percentile on an aptitude
test places the candidate in the top 10%.
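
Quartiles and percentiles can be sketched with `statistics.quantiles`, which returns the cut points between groups. The scores below are hypothetical. Interpolation rules differ between tools, so values computed this way may not match SPSS output exactly on every dataset.

```python
from statistics import quantiles

# Hypothetical aptitude test scores (11 values).
scores = [52, 55, 61, 64, 68, 70, 73, 77, 81, 85, 90]

# n=4 returns the three cut points that split the data into four groups.
q1, q2, q3 = quantiles(scores, n=4)

# n=100 returns 99 cut points; index 89 is the 90th percentile.
percentile_cuts = quantiles(scores, n=100)
p90 = percentile_cuts[89]

print(f"Q1={q1} Q2(median)={q2} Q3={q3} P90={p90}")
```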
Graphs in SPSS
Purpose: Visual representation of data distributions and trends.
SPSS Paths:

Graphs > Chart Builder

Graphs > Legacy Dialogs

Graph Types:
• Histogram: Frequency distribution

• Bar Chart: Category comparisons

• Pie Chart: Percentage share

• Boxplot: Spread and outliers

• Line Graph: Time series trends


Histogram
A graphical representation showing the
distribution of numerical data.
SPSS Path: Graphs > Legacy Dialogs > Histogram

Practical Relevance:
• Displays frequency of data within equal intervals.
• Helps visualize shape (normal, skewed, bimodal).
• Example: Exam score distribution across students.
• Usage: Detect skewness or distribution
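
The idea of counting values within equal intervals can be sketched in a few lines. This is an illustrative text histogram, not how SPSS draws its charts; the function name `histogram_counts`, the bin settings, and the exam scores are all assumptions made for the example.

```python
def histogram_counts(values, start, width, bins):
    """Count how many values fall in each of `bins` equal-width intervals.

    Interval i covers [start + i*width, start + (i+1)*width).
    Values outside the covered span are ignored.
    """
    counts = [0] * bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < bins:
            counts[idx] += 1
    return counts


# Hypothetical exam scores.
exam_scores = [42, 48, 55, 57, 61, 63, 64, 66, 68, 71, 72, 75, 79, 83, 91]
counts = histogram_counts(exam_scores, start=40, width=10, bins=6)

# Print one row of '#' marks per interval.
for i, c in enumerate(counts):
    low = 40 + 10 * i
    print(f"{low:>3}-{low + 9:<3} {'#' * c}")
```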


Bar Chart
Represents categorical data with rectangular bars.
SPSS Path: Graphs > Chart Builder > Bar

Practical Relevance:
• Useful for comparing quantities across different categories.
• Bars can be simple, clustered (grouped), or stacked.
• Example: Number of students across academic
departments.
• Usage: Show frequencies or percentages of categorical
data.
Pie Chart
Circular chart divided into sectors to
show proportions.
SPSS Path: Graphs > Chart Builder > Pie
Practical Relevance:
• Good for showing parts of a whole.
• Best when you have a few categories (ideally
<6).
• Example: Market share of brands in a region.
• Usage: Show contribution of categories to a
total.
Boxplot
Shows distribution based on the five-number summary
(min, Q1, median, Q3, max).
SPSS Path: Graphs > Legacy Dialogs > Boxplot

Practical Relevance:
• Identifies outliers, spread, and central
tendency.
• Useful for group comparisons.
• Example: Compare monthly expenses of male
vs. female students.
• Usage: Quick outlier detection and variability
assessment.
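
The five-number summary behind a boxplot, together with a common outlier rule, can be sketched as follows. The sketch assumes the widely used 1.5 x IQR fences and the default interpolation of `statistics.quantiles`; quartile conventions vary between tools, so boundary cases may differ from an SPSS boxplot. The expense figures and function names are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR beyond Q1 or Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical monthly expenses of students; one value is extreme.
expenses = [200, 220, 250, 260, 280, 300, 310, 330, 350, 400, 1200]

print(five_number_summary(expenses))
print(iqr_outliers(expenses))
```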
Line Graph
Connects data points in a sequence to
show trends over time.
SPSS Path: Graphs > Chart Builder > Line
Practical Relevance:
• Ideal for time series analysis.
• Reveals patterns such as seasonality or
growth.
• Example: Monthly sales of a product over a
year.
• Usage: Monitor changes and trends over
periods.
Comparison

Graph Type   Data Type              Best For                     Avoid When
Bar Chart    Categorical            Category comparisons         Too many categories
Pie Chart    Categorical (few)      Proportional distribution    Many categories
Histogram    Continuous             Distribution shape           Small data samples
Boxplot      Continuous + Grouping  Spread & outliers by group   Very few data points
Line Graph   Time series            Trend over time              Unordered categorical data
Correlation in SPSS
Definition: Measures the strength and direction of the relationship between two variables (linear for Pearson, monotonic for Spearman and Kendall).
SPSS Menu Path: Analyze > Correlate > Bivariate

Types: Pearson, Spearman, Kendall


Interpretation:
• Correlation coefficient (r) ranges from -1 to +1
• Positive, negative, or no correlation
• Significance test (p-value)
• Visual Tool: Graphs > Legacy Dialogs > Scatter/Dot
[Insert correlation matrix screenshot here]
Exploratory Data Analysis (EDA) in SPSS
Goal: Discover patterns, spot anomalies, test assumptions.
SPSS Tools:
• Analyze > Descriptive Statistics > Explore
• Graphs > Legacy Dialogs > Histogram / Boxplot
• Transform > Recode / Compute

Key Activities:
• Identify missing data and outliers
• Analyze skewness & kurtosis
• Summarize with visual and numerical insights
• EDA precedes hypothesis testing and modeling.
Pearson Correlation
Definition: Measures linear correlation between two continuous variables.
SPSS Path: Analyze > Correlate > Bivariate (select Pearson)

Range: -1 to +1, where +1 means perfect positive and -1 perfect negative correlation.

Practical Relevance:
• Used for normally distributed interval/ratio data.
• Example: Height and weight correlation.
• Usage: Assess strength/direction of relationships for prediction or
regression.
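
For readers who want to see what the coefficient measures, here is a minimal sketch of Pearson's r computed from its definition (covariation divided by the product of the variables' spreads). The function name `pearson_r` and the height and weight values are hypothetical; the sketch does not compute the p-value that SPSS reports alongside r.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (cm) and weights (kg).
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [50, 56, 61, 66, 77]

r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

A value close to +1 here indicates a strong positive linear relationship in this made-up sample.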
Spearman & Kendall Correlation
Definition: Non-parametric correlation methods.
SPSS Path: Analyze > Correlate > Bivariate (check Spearman/Kendall)

Practical Relevance:
• Spearman: Ranks data and works with monotonic relationships.
• Kendall: Based on concordant/discordant pairs; better for small samples.
• Example: Ranking of student preferences vs satisfaction scores.
• Usage: When data is ordinal or violates Pearson assumptions.
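
The two rank-based ideas above can be sketched directly: Spearman's rho as a Pearson correlation of ranks, and Kendall's tau from counts of concordant and discordant pairs. This sketch computes the simple tau-a variant, which has no correction for ties; statistical packages often report tau-b, which adjusts for ties, so results can differ when ties are present. All function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return _pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical: preference rank (1 = most preferred) vs satisfaction score.
preference_rank = [1, 2, 3, 4, 5]
satisfaction = [9, 7, 8, 4, 2]

print(spearman_rho(preference_rank, satisfaction))
print(kendall_tau_a(preference_rank, satisfaction))
```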
Scatter Plot
Definition: Graph showing relationship between two numeric variables.
SPSS Path: Graphs > Legacy Dialogs > Scatter/Dot

Practical Relevance:
• Visualize type and strength of correlation.
• Detect outliers or clusters.
• Example: Study time vs. exam scores.
• Usage: Understand linearity and suitability for regression.
Missing Values & Outliers
Definition: EDA aims to identify missing or extreme values.

SPSS Path: Analyze > Descriptive Statistics > Explore / Frequencies

Practical Relevance:
• Missing data can bias results.
• Outliers can distort summary statistics.
• Example: Blank survey responses; a salary 10 times the average.
• Usage: Decide on data imputation, exclusion, or transformation.
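
The decisions listed above (exclusion, imputation, flagging extreme values) can be sketched on a small list. This is an illustration only: `None` stands in for a blank survey response, the salaries are hypothetical, and the "at least 10 times the median" threshold is an arbitrary rule chosen to echo the example above, not an SPSS default.

```python
from statistics import mean, median

# Hypothetical survey salaries; None marks a blank response.
responses = [42000, None, 39000, 45000, None, 41000, 450000]

# Identify missing data.
missing_count = sum(1 for r in responses if r is None)

# Exclusion: keep only the cases with a valid value.
valid = [r for r in responses if r is not None]

# Flag extreme values relative to the median of valid cases.
typical = median(valid)
extreme = [r for r in valid if r >= 10 * typical]

# Imputation: replace missing values with the median of valid cases.
filled = [typical if r is None else r for r in responses]

print("missing:", missing_count)
print("extreme:", extreme)
print("mean with outlier:", mean(valid))
print("mean without outlier:", mean(r for r in valid if r not in extreme))
```

Comparing the two means shows how a single extreme salary distorts a summary statistic.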
Skewness & Kurtosis
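Skewness: Measure of asymmetry (0 for a symmetric distribution; positive for a longer right tail).
Kurtosis: Measure of tail heaviness, often described as peakedness (SPSS reports excess kurtosis, which is 0 for a normal distribution).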

SPSS Path: Analyze > Descriptive Statistics > Explore

Practical Relevance:
• Indicates deviation from normal distribution.
• Example: Right-skewed income distribution in urban areas.
• Usage: Choose statistical tests or transformation methods accordingly.
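
Skewness and kurtosis can be sketched from the central moments of the data. The version below uses the plain moment-based formulas; packages such as SPSS apply small-sample adjustments, so their output will differ somewhat, especially for small n. The function name and data are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness (m3 / m2^1.5) and excess kurtosis (m4 / m2^2 - 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

The symmetric list has skewness 0, while the list with one large value has positive skewness, matching the right-skewed income example above.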
Data Transformation in EDA
Definition: Modify data for better analysis (e.g., normalize, compute new
variables).
SPSS Path: Transform > Recode into Different Variables / Compute Variable

Practical Relevance:
• Makes data analysis-ready by fixing scale or encoding.
• Example: Recode education levels, compute BMI from height & weight.
• Usage: Prepare data for modeling or visualization.
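
The two examples above, recoding education levels and computing BMI, can be sketched as follows. The numeric codes, labels, and function names are assumptions made for illustration; in SPSS the equivalent steps would be done through the Recode and Compute Variable dialogs.

```python
# Hypothetical coding scheme for education levels.
EDUCATION_CODES = {"High School": 1, "Bachelor": 2, "Master": 3, "Doctorate": 4}


def recode_education(label):
    """Map a text label to its numeric code; None if the label is unrecognized."""
    return EDUCATION_CODES.get(label.strip().title())


def compute_bmi(weight_kg, height_m):
    """BMI = weight (kg) / height (m) squared."""
    return weight_kg / height_m ** 2


# Hypothetical records.
records = [
    {"education": "bachelor", "weight_kg": 70, "height_m": 1.75},
    {"education": "Master", "weight_kg": 58, "height_m": 1.62},
]

for rec in records:
    rec["education_code"] = recode_education(rec["education"])
    rec["bmi"] = round(compute_bmi(rec["weight_kg"], rec["height_m"]), 1)
    print(rec)
```

Storing the results under new keys mirrors "Recode into Different Variables": the original values stay intact alongside the transformed ones.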
