0% found this document useful (0 votes)
100 views35 pages

CRF Tutorial for Sequence Prediction

This tutorial document covers conditional random fields (CRFs) for sequence prediction. It discusses how CRFs model the conditional distribution to predict sequences and are log-linear on feature functions. Parameter estimation is done by maximum likelihood, which is convex and solved through gradient-based methods. The document also generalizes CRFs to hidden CRFs, factorized linear models, and structured SVMs. Finally, it applies hidden CRFs to the problem of object recognition using a part-based model with spatial dependencies between image patches.

Uploaded by

muhammadkhahfi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
100 views35 pages

CRF Tutorial for Sequence Prediction

This tutorial document covers conditional random fields (CRFs) for sequence prediction. It discusses how CRFs model the conditional distribution to predict sequences and are log-linear on feature functions. Parameter estimation is done by maximum likelihood, which is convex and solved through gradient-based methods. The document also generalizes CRFs to hidden CRFs, factorized linear models, and structured SVMs. Finally, it applies hidden CRFs to the problem of object recognition using a part-based model with spatial dependencies between image patches.

Uploaded by

muhammadkhahfi
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Tutorial on Conditional Random Fields

for Sequence Prediction

Ariadna Quattoni
RoadMap

Sequence Prediction Problem

CRFs for Sequence Prediction

Generalizations of CRFs

Hidden Conditional Random Fields (HCRFs)

HCRFs for Object Recognition


RoadMap

Sequence Prediction Problem

CRFs for Sequence Prediction

Generalizations of CRFs

Hidden Conditional Random Fields (HCRFs)

HCRFs for Object Recognition


Sequence Prediction Problem

Example: Part-of-Speech Tagging

He reckons the current account deficit will narrow significantly

[PRP] [VB] [DT] [JJ] [NN] [NN] [MD] [VB] [RB]


Gesture Recognition

[HTF] [HTF] [HTF] [HOF] [HOF] [HOS]


RoadMap

Sequence Prediction Problem

CRFs for Sequence Prediction

Generalizations of CRFs

Hidden Conditional Random Fields (HCRFs)

HCRFs for Object Recognition


Conditional Random Fields:
Modelling the Conditional Distribution

Model the Conditional Distribution:

To predict a sequence compute:

Must be able to compute it efficiently.


Conditional Random Fields:
Feature Functions

Feature Functions:
Feature Functions

Express some characteristic of the empirical distribution


that we wish to hold in the model distribution
Conditional Random Fields:: Distribution

Label sequence modelled as a normalized product of


feature functions:

The model is log-linear on the Feature Functions


Parameter Estimation:Maximum Likelihood

IID training samples:

(negative) Conditional Log-Likelihood:


Parameter Estimation: Maximum Likelihood

Maximum Likelihood Estimation

Set optimal parameters to be:

This function is convex, i.e. no local minimums


Parameter Estimation:Optimization

Let:

Differentiating the log-likelihood with respect to parameter

Observed Mean Expected Feature


Feature Value Value Under
The Model
Parameter Estimation: Optimization

Generally, it is not possible to find and analytic solution to the


previous objective.

Iterative techniques, i.e. gradient based methods.


Maximum Entropy Interpretation

Notice that at the optimal solution of:

We must have that:

Finding max-entropy distribution that


Maximizing log-likelihood satisfies the set of constraints
defined by the feature functions
CRFs Inference
Given a model, i.e. parameter values

Can we compute the following efficiently?

Best Label
Sequence

Expected
Values

Both can be computed using dynamic programming.


RoadMap

Sequence Prediction Problem

CRFs for Sequence Prediction

Generalizations of CRFs

Hidden Conditional Random Fields (HCRFs)

HCRFs for Object Recognition


Generalization I: CRFs Beyond Sequences
Predicting Trees: Application Constituent Parsing

NP VP
PP

NP

D N V P D N

The boy smiled at the girl


Generalization II: Factorized Linear Models

To predict a sequence compute:

Linear Model

Objective: making accurate predictions on unseen data

The parameters of the linear model can be


optimized using other loss functions
Generalization II: Factorized Linear Models
Structured Hinge Loss

Let be the correct label sequence:

Structured SVM
RoadMap

Sequence Prediction Problem

CRFs for Sequence Prediction

Generalizations of CRFs

Hidden Conditional Random Fields (HCRFs)

HCRFs for Object Recognition


Hidden Conditional Random Fields

Sentiment Detection

This movie greatly appealed to me for many reasons - I loved it

+1 Positive Review

As dumb as history gets

-1 Negative Review
Hidden Conditional Random Fields
Object Recognition

+1 Car

A training sample
Hidden Conditional Random Fields
Model the conditional probability:

We introduce hidden variables:

Analogus to the standard CRF we define:

Maps a configuration to the reals.


Hidden Conditional Random Fields
Feature Functions
Parameter Estimation
Maximum Likelihood:

Find optimal parameters:

Iterative techniques, i.e. gradient based methods.


But now the function is not convex!!!

At test time make prediction:


Parameter Estimation
The derivative of the loss function
is given by:

The derivative can be expressed in terms of components:

that can be calculated using dynamic programming.


Similarly the argmax can also be computed efficiently.
RoadMap

Sequence Prediction Problem

CRFs for Sequence Prediction

Generalizations of CRFs

Hidden Conditional Random Fields (HCRFs)

HCRFs for Object Recognition


Application :: Object Recognition
SemiSupervised Part-based
Models
Motivation

Use a discriminative model.


Spatial dependencies between parts.
It is convenient to use an intermediate discrete hidden variable.
Potential of learning semantically-meaningful parts.
Framework for investigating which part structures emerge.
Graph Structure
Feature Functions

is a minimum spanning tree.


Weight (i, j)= distance between patches xi and xj

obtained with Lowes detector (textured regions)


SIFT features (describes the texture of the image region).
Patch description also includes relative location.
Viterbi Configuration
Learning Shape
Conclusions
Factorized Linear Models generalize linear prediction models to
the setting of structure prediction.

In standard linear prediction, finding the argmax and computing


gradients is trivial. In structure prediction it involves inference.

Factored representations allow for efficient inference algorithms


(most times based on dynamic programming)

Conditional Random Fields are an instance of this framework

Future Work
Better Algorithms for training HCRFs

You might also like