
Predictive Modeling

Predictive modeling uses historical data and algorithms to forecast future outcomes, helping to make better decisions and identify potential problems early. The process involves classification and regression, with a focus on training and testing data to avoid overfitting. Python libraries like Pandas and Scikit-learn facilitate data handling and modeling, making predictive analytics more accessible.

Uploaded by

ckraig

Predictive Modeling: From Data to Decisions


#the basics

CHESTER ALLAN F. BAUTISTA, MIT


#How Does...?
Netflix/Spotify recommend content?

Email providers filter spam?

Universities identify students at risk of dropping out?

Social networks generate friend suggestions (“People You May Know”)?
#Predictive Modeling
Using historical data + algorithms/ML => predictions about future/unknown outcomes

Finding patterns in the past to forecast the future.
Descriptive and Diagnostic Analytics (What happened, and why?)

Predictive Analytics (What will happen?)


#Why Use?
Make Better Choices

Get what you want (personalization)

Save Time/Effort

Spot Problems Early


#Process
#Two Main Flavors of Prediction

Predicting Categories (Classification)

Predicting Numbers (Regression)
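
The two flavors can be sketched side by side. This is a minimal illustration, not a recommended setup: the study-hours data is invented, and the choice of a decision tree and a linear regression is an assumption made only to show that one model returns a category and the other a number.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Made-up data: hours studied by six students (one feature per row).
hours_studied = [[1], [2], [3], [4], [5], [6]]

# Classification target: a category.
passed = ["fail", "fail", "fail", "pass", "pass", "pass"]

# Regression target: a number (here exactly 47 + 5 * hours).
exam_score = [52, 57, 62, 67, 72, 77]

classifier = DecisionTreeClassifier(random_state=0).fit(hours_studied, passed)
regressor = LinearRegression().fit(hours_studied, exam_score)

new_student = [[5.5]]
label = classifier.predict(new_student)[0]   # a category such as "pass"
number = regressor.predict(new_student)[0]   # a number such as a score

print("Classification predicts a category:", label)
print("Regression predicts a number:", round(number, 1))
```

Same features, different kind of target: that is the whole difference between the two flavors.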


#Past Data (Features & Target)

Target: The thing you want to predict (e.g., 'Will Rain Tomorrow?').

Features: The pieces of past information used to make the prediction.

Model learns: How do Features relate to the Target?
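
In Pandas, separating the target from the features is usually a two-line step. The sketch below assumes a small weather table; the column names and every value in it are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical weather history; all values are made up.
weather = pd.DataFrame({
    "humidity_pct": [85, 40, 90, 55, 70],
    "pressure_hpa": [1002, 1020, 998, 1015, 1008],
    "cloud_cover_pct": [90, 10, 95, 30, 60],
    "rain_tomorrow": ["yes", "no", "yes", "no", "yes"],
})

# Target: the single column we want to predict.
y = weather["rain_tomorrow"]

# Features: every other column, used as input to the model.
X = weather.drop(columns=["rain_tomorrow"])

print("Feature columns:", list(X.columns))
print("Target values:", y.tolist())
```

By convention the feature table is called `X` and the target `y`, which is the naming Scikit-learn's own examples use.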
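#Don't Cheat! Training vs. Testing

Training Data: The practice questions with answers you study from.

Testing Data: A separate set of questions, held back during training, used for the actual exam. The model never sees these answers while it learns; they are used only to grade its predictions.

Why? To detect overfitting: a model that memorizes the training data can score well on it and still fail on new cases.

We need to know if the model works on new problems it hasn't seen!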
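
A small experiment makes the point concrete. This sketch assumes synthetic data from Scikit-learn's `make_classification` with deliberately noisy labels, and an unrestricted decision tree chosen because it memorizes easily; neither comes from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of labels flipped at random (noise).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3,
    flip_y=0.2, random_state=0,
)

# Hold back 25% of the rows as the "exam" the model never studies from.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A tree with no depth limit can memorize the training set, noise included.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"Accuracy on training data: {train_accuracy:.2f}")
print(f"Accuracy on unseen test data: {test_accuracy:.2f}")
```

The gap between the two scores is the symptom of overfitting. Grading only on the training data would hide it.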
#How Good Was the Guess? (Evaluation)

We need a score! How do we grade the model's test performance?

For Categories (Classification):
Accuracy: What percentage of predictions were correct? (Simple, but misleading when one category is rare: always guessing the common category already scores high.)

For Numbers (Regression):
Average Error (e.g., Mean Absolute Error, MAE): On average, how far off was the prediction from the real number? (Easy to understand.)
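
Both scores are one function call in Scikit-learn. The numbers below are invented for illustration: ten emails of which one is spam, and four temperature forecasts.

```python
from sklearn.metrics import accuracy_score, mean_absolute_error

# Classification: 10 emails, only 1 is spam (the rare category).
actual_labels = ["ham"] * 9 + ["spam"]
# A "lazy" model that always answers "ham" and never catches spam.
lazy_predictions = ["ham"] * 10

accuracy = accuracy_score(actual_labels, lazy_predictions)
print("Accuracy of the lazy model:", accuracy)  # 9 of 10 correct -> 0.9

# Regression: actual vs. predicted temperatures (made-up values).
actual_temps = [30.0, 28.0, 25.0, 31.0]
predicted_temps = [29.0, 30.0, 25.0, 28.0]

# Errors are 1, 2, 0 and 3 degrees, so the average is 1.5.
mae = mean_absolute_error(actual_temps, predicted_temps)
print("Mean absolute error:", mae)
```

The lazy model scores 90% accuracy without finding a single spam email, which is why accuracy alone is tricky when one category is rare.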
#What about Python?

Python has great tools (libraries) to help.

Pandas: For handling and preparing your data (the ingredients).

Scikit-learn (sklearn): The 'Swiss Army Knife' for predictive modeling!
Has tools for: Data Splitting (Train/Test), Prepping Data, Many Model Recipes (Classification/Regression), Evaluation Scores.
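
Put together, the four tools listed above fit in a few lines. This is one possible sketch, not the only way to do it: the iris dataset bundled with Scikit-learn, the `StandardScaler` prep step, and the logistic regression model are all assumptions picked to keep the example short.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data: a small flower dataset loaded as a Pandas DataFrame.
df = load_iris(as_frame=True).frame
X = df.drop(columns=["target"])   # features: four flower measurements
y = df["target"]                  # target: the species (0, 1 or 2)

# Data splitting: keep 20% of the rows aside for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Prepping + model recipe: scale the features, then classify.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluation score: accuracy on the unseen test rows.
predictions = model.predict(X_test)
test_accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Wrapping the prep step and the model in one pipeline means the scaler learns only from the training rows, so nothing about the test set leaks into training.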
#Recap?

Predictive Modeling = Using the Past -> Predict the Future.

Why? Better choices, personalization, efficiency.

Recipe: Goal -> Data -> Prep -> Model -> Train -> TEST!

Flavors: Categories (Classification) vs. Numbers (Regression).

Testing on unseen data is crucial (avoid overfitting).

Evaluate how good the predictions are (metrics like accuracy/average error).

Python (Pandas, Sklearn) helps make it happen.


#Gratitude?

Thank you!
