Report Machine Learning 101 1 1

The document discusses the fundamentals of machine learning as an essential component of data science, highlighting its significance as a hybrid profession. It covers various learning types, data science tasks, and specific algorithms such as supervised and unsupervised learning, along with challenges faced in the field. The chapter aims to provide a foundational understanding of machine learning for aspiring data scientists.

Uploaded by

Shahriar Azizi
Copyright
© All Rights Reserved

DATA SCIENCE REPORT SERIES

MACHINE LEARNING 101


Patrick Boily1,2,3, Jen Schellinck2,4,5

Abstract
In October 2012, the Harvard Business Review published an article calling data science the “sexiest job
of the 21st century”, and declaring data scientists to be a “hybrid of data hacker, analyst, communicator,
and trusted adviser” [11]. Would-be data scientists are usually introduced to the field via machine learning
algorithms and applications, which we discuss briefly in this chapter.
Keywords
Machine learning, statistical learning, supervised learning, unsupervised learning, association rules mining, classification,
decision trees, clustering, k−means, issues and challenges.
Funding Acknowledgement
Parts of this chapter were funded by Carleton University’s Centre for Quantitative Analysis and Decision Support.
1 Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
2 Data Action Lab, Ottawa, Canada
3 Idlewyld Analytics and Consulting Services, Wakefield, Canada
4 Sysabee, Ottawa, Canada
5 Institute of Cognitive Science, Carleton University, Ottawa, Canada
Email: [email protected]

Contents

1 Introduction
2 Fundamentals
2.1 Learning Types
2.2 Data Science Tasks
3 Association Rules Mining
3.1 Causation and Correlation
3.2 Definitions
3.3 Generating Rules
3.4 Toy Example: Titanic Dataset
3.5 Case Study: Danish Medical Data
4 Supervised Learning and Classification
4.1 Classification Algorithms
4.2 Decision Trees
4.3 Performance Evaluation
4.4 Toy Example: Kyphosis Dataset
4.5 Case Study: Minnesota Tax Audits
5 Unsupervised Learning and Clustering
5.1 Clustering Algorithms
5.2 k−Means
5.3 Clustering Validation
5.4 Toy Example: Iris Dataset
5.5 Case Study: The Livehoods Project
6 Issues and Challenges
6.1 Bad Data
6.2 Overfitting
6.3 Appropriateness and Transferability
6.4 Pitfalls and Mistakes

1. Introduction

From Data to Wisdom

Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.
– attributed to Cliff Stoll, Nothing to Hide: Privacy in the 21st Century, 2006

One of the challenges of working in the data science (DS), machine learning (ML) and artificial intelligence (AI) fields is that nearly all quantitative work can be described with some combination of the terms DS/ML/AI (very often to a ridiculous extent).

Robinson [45] suggests that their relationships follow an inclusive hierarchical structure:

in a first stage, DS provides “insights” via visualization and (manual) inferential analysis;
in a second stage, ML yields “predictions” (or “advice”), while reducing the operator’s analytical, inferential and decisional workload (although it is still present to some extent), and
in the final stage, AI removes the need for oversight, allowing for automatic “actions” to be taken by a completely unattended system.

The goals of artificial intelligence are laudable in an academic setting, but in practice, we believe that stakehold-
