
Weakly Supervised Learning Overview

Weakly supervised learning trains models on newly generated data whose labels are incomplete, inexact, or inaccurate. Snorkel is a system that uses weak supervision sources such as heuristics and classifiers to label data and train models. As a demonstration, unlabeled Brexit tweets were weakly labeled and used to build a classifier that reached 78% accuracy, higher than one trained on a smaller set of manually labeled tweets.

An introduction to weakly supervised learning.

Best practices.
Kristina Khvatova

Software Developer
Softec S.p.A.

Master in Computer Science and Applied Mathematics


Saint-Petersburg State University

Master in Computer Science and Data Analysis


Milano-Bicocca University

https://www.linkedin.com/in/kristina-khvatova-a529b21
Overview
➢ Introduction to weak supervision
➢ Three types of weakly supervised learning:
● incomplete
● inexact
● inaccurate
➢ Snorkel
➢ Brexit tweet classification with weakly supervised learning
Problem Definition


Weak supervision

Weak supervision is the technique of building models based on newly generated data.
Types:
- incomplete
- inexact
- inaccurate


Incomplete weak supervision

● Active learning
● Semi-supervised learning

Incomplete weak supervision

Active learning

● High accuracy
● Low costs

Incomplete weak supervision

Active learning

● (a) High project cost and high precision (90%).
● (b) Lower project cost, but lower precision (70%).
● (c) The cost of querying labels is the same as in (b), but the precision is much higher (90%), the same as in (a).
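
One common way to get high precision from few queried labels is uncertainty sampling: ask the annotator only about the examples the current model is least sure of. The sketch below is a toy illustration, not the method behind the figure. The 1-D threshold classifier, the `oracle` function (a stand-in for a human annotator), and all data values are assumptions made for this example.

```python
def fit_threshold(labeled):
    """Fit a toy 1-D classifier: the decision threshold sits midway
    between the largest negative and the smallest positive example."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2.0


def most_uncertain(pool, threshold):
    """Uncertainty sampling: pick the unlabeled point closest to the boundary."""
    return min(pool, key=lambda x: abs(x - threshold))


def active_learning(labeled, pool, oracle, budget):
    """Query `budget` labels, refitting the model after each answer."""
    labeled, pool, queries = list(labeled), list(pool), []
    for _ in range(budget):
        threshold = fit_threshold(labeled)
        query = most_uncertain(pool, threshold)
        pool.remove(query)
        labeled.append((query, oracle(query)))  # pay for exactly one label
        queries.append(query)
    return queries, fit_threshold(labeled)


def oracle(x):
    """Hypothetical annotator; the true class boundary is assumed to be 6.2."""
    return 1 if x > 6.2 else 0


seed = [(0.0, 0), (10.0, 1)]             # two labeled examples to start
pool = [float(i) for i in range(1, 10)]  # nine unlabeled candidates
queries, threshold = active_learning(seed, pool, oracle, budget=3)
print(queries, threshold)
```

With a budget of three queries, the learner asks about points near the boundary and ends with a threshold close to the assumed true value, which is the cost/precision trade-off that case (c) describes.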
Incomplete weak supervision

Semi-supervised learning


Incomplete weak supervision

Semi-supervised learning
● Generative models
● Label propagation
● TSVM
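
Of the three approaches, label propagation is the easiest to show in a few lines. The sketch below is a minimal version under stated assumptions: the graph, the seed labels, and the function name `propagate_labels` are made up for illustration, and each unlabeled node simply takes the average score of its neighbors until the scores settle.

```python
def propagate_labels(neighbors, seeds, iterations=200):
    """Spread seed labels over a graph.

    neighbors: dict mapping node -> list of adjacent nodes
    seeds:     dict mapping labeled node -> score (1.0 = class A, 0.0 = class B)
    Returns a dict mapping every node to a score in [0, 1].
    """
    # Unlabeled nodes start at 0.5, i.e. no preference for either class.
    scores = {node: seeds.get(node, 0.5) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, adjacent in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # labeled nodes stay clamped
            else:
                updated[node] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# A chain of six points; only the two end points are labeled.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
scores = propagate_labels(chain, seeds={0: 1.0, 5: 0.0})
predicted = {node: int(score > 0.5) for node, score in scores.items()}
print(predicted)
```

Nodes closer to the positive seed end up with scores above 0.5, and nodes closer to the negative seed end up below it.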

Inexact weak supervision

Inexact weak supervision

“Is object localization for free? – Weakly-supervised learning with convolutional neural networks.” (CVPR2015)
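
With inexact supervision, only a coarse label is available (for example "this image contains a dog"), while the model scores many fine-grained regions. A common way to connect the two is to pool the per-region scores into a single image-level score. The sketch below illustrates that idea with global max pooling; the score map values and the function name `global_max_pool` are assumptions for this example, and in a real network the map would come from a convolutional layer.

```python
def global_max_pool(score_map):
    """Reduce a 2-D map of per-region scores to one image-level score.

    Returns (best_score, (row, col)). The position of the maximum gives a
    rough location for the object even though no location was ever labeled.
    """
    best_score, best_position = float("-inf"), None
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score > best_score:
                best_score, best_position = score, (r, c)
    return best_score, best_position


# Hypothetical per-region scores for one class on a 3x3 grid.
score_map = [
    [0.1, 0.2, 0.1],
    [0.3, 2.5, 0.4],
    [0.0, 0.2, 0.1],
]
image_score, location = global_max_pool(score_map)
print(image_score, location)
```

During training, only `image_score` would be compared with the image-level label; the location comes as a by-product.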
Inaccurate weak supervision

http://www.scholarpedia.org/article/Ensemble_learning
Snorkel: The System for Programmatically
Building and Managing Training Data
Snorkel is a system for programmatically building and managing training datasets to rapidly and flexibly fuel machine
learning models.

● Data Programming with DDLite: Putting Humans in a Different Part of the Loop (June 2016)
● Conversational agents at IBM: Bootstrapping Conversational Agents With Weak Supervision (AAAI 2019)
● Web content & event classification at Google: Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial
Scale (SIGMOD Industry 2019), and Google AI blog post
● Business intelligence at Intel: Osprey: Non-Programmer Weak Supervision of Imbalanced Extraction Problems (SIGMOD
DEEM 2019)
● Anti-semitic tweet classification w/ Snorkel + transfer learning.
● Clinical text classification: A clinical text classification paradigm using weak supervision and deep representation (BMC
MIDM 2019)
● Social media text mining: Deep Text Mining of Instagram Data without Strong Supervision (ICWI 2018)
● Cardiac MRI classification with Stanford Medicine: Weakly supervised classification of rare aortic valve malformations
using unlabeled cardiac MRI sequences (bioRxiv 2018)
● Medical image triaging at Stanford Radiology: Cross-Modal Data Programming for Medical Images (NeurIPS ML4H 2017)
● GWAS KBC with Stanford Genomics: A Machine-Compiled Database of Genome-Wide Association Studies (NeurIPS
ML4H 2016)

Weak Supervision Formulation

A high-level schematic of the basic weak supervision “pipeline”: We start with one or more weak supervision sources:
crowdsourced data, heuristic rules, distant supervision, and/or weak classifiers provided by a subject matter expert. The
core technical challenge is to unify and model these sources. The unified labels are then used to train the end model.
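
The "unify" step can be sketched with the simplest possible combiner: an unweighted vote that ignores sources which abstain and returns a probability per class. This is a stand-in chosen for brevity, not the model the pipeline actually uses (Snorkel, for instance, also estimates how accurate each source is). The `ABSTAIN` convention, the function name `soft_labels`, and the vote matrix are assumptions for this example.

```python
ABSTAIN = -1  # convention assumed here: a source that has no opinion votes -1


def soft_labels(votes, num_classes=2):
    """Combine one example's source votes into class probabilities.

    Returns None when every source abstains, so that the example can be
    left out of the training set for the end model.
    """
    counts = [0] * num_classes
    for vote in votes:
        if vote != ABSTAIN:
            counts[vote] += 1
    total = sum(counts)
    if total == 0:
        return None
    return [count / total for count in counts]


# Rows are examples, columns are three weak supervision sources.
label_matrix = [
    [1, 1, ABSTAIN],              # two sources agree on class 1
    [0, 1, 0],                    # sources disagree
    [ABSTAIN, ABSTAIN, ABSTAIN],  # nobody covers this example
    [0, ABSTAIN, 0],
]
training_targets = [soft_labels(row) for row in label_matrix]
print(training_targets)
```

The end model is then trained on the examples that received a target, using the probabilities as soft labels.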

https://dawn.cs.stanford.edu/2017/07/16/weak-supervision/
Snorkel: data programming
“Prime Minister Lee Hsien Loong and his wife Ho Ching leave a polling station after casting their votes in Singapore” (NYTimes.com)
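
In data programming, a heuristic is written as a small function that either votes for a label or abstains. The sketch below assumes the sentence above is used for a spouse-relation task (the slide text itself does not say so) and is written in plain Python rather than with Snorkel's own API. The function name, the label constants, and the keyword list are all hypothetical.

```python
import re

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1
SPOUSE_WORDS = re.compile(r"\b(wife|husband|spouse)\b", re.IGNORECASE)


def lf_spouse_word_between(sentence, person_a, person_b):
    """Vote POSITIVE if a spouse keyword appears between the two names.

    The function abstains instead of voting NEGATIVE when it finds nothing,
    because the absence of a keyword is weak evidence either way.
    """
    first, second = sorted(
        [(sentence.find(person_a), person_a), (sentence.find(person_b), person_b)]
    )
    if first[0] == -1:  # at least one name is missing from the sentence
        return ABSTAIN
    between = sentence[first[0] + len(first[1]):second[0]]
    return POSITIVE if SPOUSE_WORDS.search(between) else ABSTAIN


sentence = (
    "Prime Minister Lee Hsien Loong and his wife Ho Ching leave a polling "
    "station after casting their votes in Singapore"
)
vote = lf_spouse_word_between(sentence, "Lee Hsien Loong", "Ho Ching")
print(vote)
```

Here the text between the two names is " and his wife ", so the function votes POSITIVE.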

Demo: Step-By-Step Guide for Building a
Brexit Tweet Classifier

https://github.com/HazyResearch/snorkel

https://github.com/HazyResearch/metal

Demo: Step-By-Step Guide for Building a
Brexit Tweet Classifier

➔ Collect unlabeled data: 3184 tweets that contain #Brexit

➔ Label 500 examples: 250 ‘leave’, 250 ‘stay’

➔ Create 5 labeling functions (LFs) and apply them to the remaining 2684 unlabeled tweets

“Predicting Brexit: Classifying Agreement is Better than Sentiment and Pollsters”
Demo: Step-By-Step Guide for Building a
Brexit Tweet Classifier
Safer In #EU? No! No! No! Terrorists want the UK to STAY Remember 7/7 Paris #EUreferendum #VoteLeave

#Liverpool have broke the #Spanish dominance in Europe... #English #football says Yes We Belong in #Europe! #Stay #strongerin

[Figure columns: Tweet, Label functions]

@StrongerIn so if we stay in eu that means we get more zero hours contracts and employers can say 'we dont need to now, fuck off' #TakeControl #VoteLeave
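
The slides do not list the five LFs that were used, so the sketch below uses four hypothetical hashtag-based LFs to show how LFs turn raw tweets into a label matrix. The function names, the label constants, and the choice of hashtags are assumptions; the tweets are the three examples above, with the third one abbreviated.

```python
ABSTAIN, STAY, LEAVE = -1, 0, 1


def lf_voteleave(tweet):
    return LEAVE if "#voteleave" in tweet.lower() else ABSTAIN


def lf_takecontrol(tweet):
    return LEAVE if "#takecontrol" in tweet.lower() else ABSTAIN


def lf_strongerin(tweet):
    # Matches the hashtag only, not a mention of the @StrongerIn account.
    return STAY if "#strongerin" in tweet.lower() else ABSTAIN


def lf_stay_hashtag(tweet):
    # Token match, so the plain word "STAY" does not trigger this LF.
    return STAY if "#stay" in tweet.lower().split() else ABSTAIN


LFS = [lf_voteleave, lf_takecontrol, lf_strongerin, lf_stay_hashtag]

tweets = [
    "Safer In #EU? No! No! No! Terrorists want the UK to STAY "
    "Remember 7/7 Paris #EUreferendum #VoteLeave",
    "#Liverpool have broke the #Spanish dominance in Europe... #English "
    "#football says Yes We Belong in #Europe! #Stay #strongerin",
    # Abbreviated for this example:
    "@StrongerIn so if we stay in eu that means we get more zero hours "
    "contracts #TakeControl #VoteLeave",
]

# One row per tweet, one column per LF.
label_matrix = [[lf(tweet) for lf in LFS] for tweet in tweets]

# Coverage: the fraction of tweets on which each LF does not abstain.
coverage = {
    lf.__name__: sum(row[i] != ABSTAIN for row in label_matrix) / len(tweets)
    for i, lf in enumerate(LFS)
}
print(label_matrix)
print(coverage)
```

Each LF covers only part of the data, which is why several of them are combined before training the classifier.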

“Predicting Brexit: Classifying Agreement is Better than Sentiment and Pollsters”
Result: Brexit Tweet Classifier

● Tweet classifier trained on 500 labeled examples: LR accuracy 0.52
● Tweet classifier trained with Snorkel: LR accuracy 0.78

Summary
➢ Weak supervision
■ incomplete
■ inexact
■ inaccurate
➢ Snorkel and Snorkel MeTaL
➢ Demo application: Brexit Tweet Classifier
