SAS Visual Analytics –
Explore Data
Exploring and preparing our data in SAS Viya is a crucial
step before building a machine learning model.
Here’s a comprehensive list of steps we can follow to help
us explore and understand our dataset’s capabilities and
limitations:
1. Data Import and Initial Inspection
Import Data: Load your dataset into SAS Viya.
Inspect Data: Use procedures like PROC CONTENTS to review the structure
and variable types of your data, and PROC MEANS to get summary statistics.
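To make the inspection step concrete, here is a minimal sketch in plain Python (not SAS code) of the kind of per-column report an inspection produces: the type of each variable and how many values are present. The sample records and the `describe_columns` helper are hypothetical and exist only for illustration.

```python
# Hypothetical sample records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "region": "North"},
    {"age": 45, "income": None, "region": "South"},
    {"age": 29, "income": 61000.0, "region": "North"},
]


def describe_columns(records):
    """Return {column: (type names of present values, count of present values)}."""
    summary = {}
    for column in records[0]:
        present = [r[column] for r in records if r[column] is not None]
        type_names = sorted({type(v).__name__ for v in present})
        summary[column] = ("/".join(type_names), len(present))
    return summary


column_info = describe_columns(rows)
for name, (kind, count) in column_info.items():
    print(f"{name}: type={kind}, non-missing={count} of {len(rows)}")
```

A column whose non-missing count is below the row count is a candidate for the cleaning step that follows.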
2. Data Cleaning
Handle Missing Values: Identify and address missing values using
techniques like imputation or removal.
Remove Duplicates: Ensure there are no duplicate records in your dataset.
Correct Errors: Look for and correct any data entry errors or inconsistencies.
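The cleaning steps above can be sketched in a few lines of plain Python (an illustration of the idea, not SAS code). The records, the `id` and `age` fields, and the choice of median imputation are assumptions made for this example; removal or another imputation method may suit your data better.

```python
from statistics import median

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 29},
    {"id": 3, "age": 29},  # exact duplicate of the previous record
]

# Remove exact duplicates, keeping the first occurrence of each record.
seen = set()
deduplicated = []
for record in records:
    key = (record["id"], record["age"])
    if key not in seen:
        seen.add(key)
        deduplicated.append(record)

# Impute missing ages with the median of the known ages.
known_ages = [r["age"] for r in deduplicated if r["age"] is not None]
fill_value = median(known_ages)
cleaned = [
    {**r, "age": fill_value if r["age"] is None else r["age"]}
    for r in deduplicated
]
print(cleaned)
```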
3. Data Transformation
Normalization/Standardization: Scale your data so that features measured
on large scales do not dominate scale-sensitive models.
Encoding Categorical Variables: Convert categorical variables into
numerical formats using one-hot encoding or label encoding.
Feature Engineering: Create new features that may be more predictive for
your model.
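As a rough illustration of two of the transformations above, the following plain Python sketch (not SAS code) standardizes a numeric feature to z-scores and one-hot encodes a categorical one. The `incomes` and `regions` values are made up for the example.

```python
from statistics import mean, pstdev

# Hypothetical feature values.
incomes = [40000.0, 50000.0, 60000.0]
regions = ["North", "South", "North"]

# Standardization: subtract the mean and divide by the standard deviation,
# so the feature ends up with mean 0 and standard deviation 1.
mu = mean(incomes)
sigma = pstdev(incomes)
standardized = [(value - mu) / sigma for value in incomes]

# One-hot encoding: one 0/1 indicator column per distinct category.
categories = sorted(set(regions))
one_hot = [[1 if region == c else 0 for c in categories] for region in regions]

print(standardized)
print(categories, one_hot)
```

Label encoding would instead map each category to a single integer, which is more compact but implies an ordering that the categories may not have.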
4. Exploratory Data Analysis (EDA)
Summary Statistics: Use PROC MEANS for descriptive statistics of numeric
variables and PROC FREQ for frequency counts of categorical variables.
Data Visualization: Create visualizations like histograms, box plots, scatter
plots, and correlation matrices to understand relationships and distributions.
Correlation Analysis: Identify relationships between variables using
correlation coefficients.
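The calculations behind these EDA steps are simple enough to sketch directly. The plain Python example below (an illustration, not SAS code) computes descriptive statistics, frequency counts, and a Pearson correlation coefficient on made-up values; the variable names and the `pearson` helper are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical observations.
ages = [25, 32, 47, 51, 38]
incomes = [30.0, 42.0, 58.0, 65.0, 45.0]
regions = ["North", "South", "North", "East", "North"]

# Descriptive statistics for a numeric variable.
age_summary = {"mean": mean(ages), "median": median(ages), "stdev": stdev(ages)}

# Frequency counts for a categorical variable.
region_counts = Counter(regions)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(sum_sq_x * sum_sq_y)


age_income_r = pearson(ages, incomes)
print(age_summary, region_counts, age_income_r)
```

A coefficient near 1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, although a nonlinear one may still exist.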
5. Data Reduction
Feature Selection: Use techniques like correlation analysis, mutual
information, or feature importance from preliminary models to select relevant
features.
Dimensionality Reduction: Apply methods like PCA (Principal Component
Analysis) to reduce the number of features while retaining most of the
variance.
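To show what "retaining most of the variance" means, here is a small worked sketch in plain Python (not SAS code) of PCA on just two features. With two features, the eigenvalues of the 2x2 covariance matrix have a closed form, and the largest eigenvalue divided by the total is the share of variance kept by the first principal component. The data values are made up for the example.

```python
from math import sqrt
from statistics import mean

# Two hypothetical, strongly related features.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def covariance(a, b):
    """Sample covariance; covariance(a, a) is the sample variance of a."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


var_x = covariance(x, x)
var_y = covariance(y, y)
cov_xy = covariance(x, y)

# Eigenvalues of the covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
half_trace = (var_x + var_y) / 2
spread = sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
eigenvalue_1 = half_trace + spread  # variance along the first component
eigenvalue_2 = half_trace - spread  # variance along the second component

explained_ratio = eigenvalue_1 / (eigenvalue_1 + eigenvalue_2)
print(f"Share of variance kept by the first component: {explained_ratio:.4f}")
```

Because the two features move almost in lockstep, a single component captures nearly all of the variance, which is the situation in which dropping a dimension costs little information.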
6. Data Splitting
Train-Test Split: Divide your data into training and testing sets to evaluate
your model’s performance.
Cross-Validation: Use cross-validation techniques to estimate how well your
model generalizes to unseen data.
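Both splitting strategies can be sketched in plain Python (an illustration of the mechanics, not SAS code). The function names, the 20% test fraction, and the fixed seed are assumptions chosen for this example.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the items and split off a test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n_rows, k):
    """Yield (training, validation) row indices for each of k folds.

    Rows are assigned to folds round-robin; shuffle the data first if
    its order carries meaning.
    """
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_rows, test_rows = train_test_split(range(10))
print("train:", train_rows, "test:", test_rows)

for training, validation in k_fold_indices(10, 5):
    print("validate on", validation, "train on", training)
```

In k-fold cross-validation every row is used for validation exactly once, so the averaged score depends less on one lucky or unlucky split.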
7. Data Sampling
Resampling Techniques: Apply stratified sampling to keep your training data
representative of the overall dataset, or bootstrapping to estimate how much
your results vary from sample to sample.
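The two techniques named above work differently, which the following plain Python sketch (not SAS code) tries to show. Stratified sampling draws the same fraction from each class, so class proportions are preserved; bootstrapping draws rows with replacement to produce a resample of the same size. The function names and the 80/20 churn data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw the same fraction from every label group, without replacement."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def bootstrap_sample(rows, seed=0):
    """Draw len(rows) rows with replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in rows]


# Hypothetical imbalanced dataset: 80 "no" rows and 20 "yes" rows.
data = [{"churn": "no"}] * 80 + [{"churn": "yes"}] * 20

stratified = stratified_sample(data, lambda row: row["churn"], fraction=0.25)
stratified_counts = Counter(row["churn"] for row in stratified)
resample = bootstrap_sample(data)
print(stratified_counts, len(resample))
```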
8. Data Exploration for Model Relevance
Feature Importance: Use preliminary models to identify which features are
most important for predicting the target variable.
Target Variable Analysis: Analyze the distribution and characteristics of
the target variable to understand its behavior.
9. Data Preparation for Modeling
Create Pipelines: Set up data preprocessing pipelines to automate the
transformation and cleaning steps.
Save Processed Data: Save the cleaned and transformed data for use in
model training.
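One simple way to think about a preprocessing pipeline is as an ordered list of steps, each taking rows in and returning rows out. The plain Python sketch below (not SAS code) applies two hypothetical steps and then serializes the result as CSV text; the step functions, field names, and in-memory buffer are assumptions for illustration, and in practice you would write to a file or table instead.

```python
import csv
import io


def drop_missing(rows):
    """Cleaning step: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_income_per_person(rows):
    """Feature engineering step: derive a new column from two existing ones."""
    return [
        {**r, "income_per_person": r["income"] / r["household_size"]}
        for r in rows
    ]


def run_pipeline(rows, steps):
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical raw data; None marks a missing value.
raw = [
    {"income": 60000.0, "household_size": 3},
    {"income": None, "household_size": 2},
    {"income": 45000.0, "household_size": 1},
]

prepared = run_pipeline(raw, [drop_missing, add_income_per_person])

# Save the processed data as CSV (to an in-memory buffer in this sketch).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(prepared[0].keys()))
writer.writeheader()
writer.writerows(prepared)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the steps in one list means the same sequence can be rerun unchanged on new data, which is the main reason to automate preprocessing.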
10. Documentation and Reporting
Document Steps: Keep detailed records of all data exploration and
preparation steps.
Generate Reports: Create reports summarizing your findings and the steps
taken to prepare the data.
By following these steps, you’ll ensure that your data is
well-prepared and relevant for building your first machine learning
model in SAS Viya.