SYLLABUS
1. INTRODUCTION TO DATA SCIENCE
What is data science? Relation to data mining, machine learning, big data, and statistics
Motivating examples
Why is it interesting?
Several data science settings
Introduction to the WEKA tool
Practical information
2. GETTING TO KNOW YOUR DATA
From data to features
o Interactive group discussion
o Representing problems with matrices
o Representing problems with relations
o Example: Text with TF-IDF
Computing simple statistics
o Means, variances, standard deviations, weighted averaging, modes, quartiles
o Example: Political predictions
Simple visualizations
o Histograms
o Boxplots
o Scatterplots
o Time series
o Spatial data
Case studies
o X & Y examples
o Medical data
3. OVERVIEW OF TASKS & TECHNIQUES: PREDICTION
The prediction task
o Definition
o Examples
o Format of input / output data
Prediction algorithms
o Decision trees
o Rule learners
o Linear/logistic regression
o Nearest neighbour learning
o Support vector machines
Properties of prediction algorithms and practical exercises
Combining classifiers
4. EVALUATION AND METHODOLOGY OF DATA SCIENCE
Experimental setup
o Training, tuning, test data
o Holdout method, cross-validation, bootstrap method
Measuring performance of a model
o Accuracy, ROC curves, precision-recall curves
o Loss functions for regression
Interpretation of results
o Confidence interval for accuracy
o Hypothesis tests for comparing models and algorithms
5. DATA ENGINEERING
Attribute selection
o Filter methods
o Wrapper methods
Data discretization
o Unsupervised discretization
o Supervised discretization
Data transformations
o PCA and variants
Exercises
6. OVERVIEW OF TASKS & TECHNIQUES: PROBABILISTIC MODELS
Introduction
o Probabilities
o Naive Bayes
o Rule of Bayes and Conditional Independence
o Application to spam filtering
Bayesian Networks
o Graphical representation
o Independence and correlation
Temporal models
o Markov Chains
o Hidden Markov Models
7. OVERVIEW OF TASKS & TECHNIQUES: EXPLORATORY DATA MINING
Introduction to Exploratory Data Mining
Association discovery
o What is association discovery?
o What are the challenges?
o In detail: Apriori
Clustering
o What is clustering?
o What are the challenges?
o In detail: agglomerative clustering
o Hands-on: clustering in WEKA
8. CASE STUDIES IN DATA SCIENCE
Eve, the Pharmaceutical Robot Scientist: Data Science for Drug Discovery
Data science for sports analytics
Data science for sensor data (Introduction to challenge)
9. CHALLENGE
Introduction
Hands-on by participants
Discussion of results