Lecture 1 – Probabilistic Graphical Models
Books to consult
These are referenced in lecture 1:
- Probabilistic graphical models by Daphne Koller and Nir Friedman
- Probabilistic Machine Learning by Kevin P. Murphy
- Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong
Probabilistic Graphical Models
PGMs allow us to:
- Deal with uncertainty in the data.
- Predict events in the world with an understanding of data and model uncertainty.
- Probabilistic Graphical Models will help us to do that.
Why PGM and not Deep Learning?
- Deep Learning has been very successful in image classification.
- Deep Neural Networks are often confident about their decisions, in some situations too much so and for no good reason. They come with no explanation, which introduces problems of troubling bias (sometimes exacerbated by poor data).
- Model confidence for a decision will help tackle some of the bias issues.
How to gain global insight based on local observations?
Key Idea:
- Represent the world as a collection of random variables X_1, …, X_n with joint distribution p(X_1, …, X_n).
- Learn the distribution from data.
- Perform “inference”, i.e., compute conditional distributions,
- e.g., the probability of X_i given observed values of some of the other variables.
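To make “inference” concrete, here is a minimal sketch (not from the lecture) that computes p(X_1 | X_3 = 1) by brute force from a toy joint table over three binary variables, filled with made-up probabilities:

import numpy as np

# Toy joint distribution p(X1, X2, X3) over three binary variables,
# stored as a full table with 2^3 = 8 entries (values are made up).
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2))
joint /= joint.sum()          # normalize so the table sums to 1

# Inference by enumeration: p(X1 | X3 = 1).
# 1. Fix X3 = 1 and sum out X2 to get the unnormalized p(X1, X3 = 1).
unnormalized = joint[:, :, 1].sum(axis=1)
# 2. Renormalize to obtain the conditional distribution over X1.
p_x1_given_x3 = unnormalized / unnormalized.sum()

print("p(X1=0 | X3=1) =", p_x1_given_x3[0])
print("p(X1=1 | X3=1) =", p_x1_given_x3[1])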
Reasoning under uncertainty
- Different kinds of uncertainty: partial knowledge / noise / modelling limitations / inherent
randomness.
- Uncertainty does not play a big role in “classical” AI or in many machine learning approaches.
- Probabilistic approaches enable us to do things that are not possible otherwise.
Types of Models
- Linear
- Probabilistic approaches
- Density approaches
Graphical Models:
- Graph – as in: a set of nodes (vertices) connected by edges
- To organize and represent knowledge and relations
- If we have random variables X_1, …, X_n with joint distribution p(X_1, …, X_n), and every random variable can only take 2 values, a complete table would have 2^n rows.
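To see how quickly such a table grows, a quick back-of-the-envelope check:

# Number of entries in a full joint table over n binary variables.
for n in (5, 10, 20, 30):
    print(f"n = {n:2d}: 2^{n} = {2**n:,} table entries")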
A graph:
- A graph is a data structure G = (V, E), consisting of a set of nodes / vertices V and a set of edges E.
- Graphs can be directed or undirected.
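A minimal sketch of how such a graph could be stored in code (adjacency lists; the class and variable names are illustrative, not from the lecture):

from collections import defaultdict

class Graph:
    """A simple graph G = (V, E) stored as adjacency lists."""

    def __init__(self, directed=True):
        self.directed = directed
        self.adj = defaultdict(set)   # node -> set of neighbouring nodes

    def add_edge(self, u, v):
        self.adj[u].add(v)
        if not self.directed:
            self.adj[v].add(u)        # undirected: store both directions

# Directed graph: edges have an orientation (e.g. X1 -> X2).
g = Graph(directed=True)
g.add_edge("X1", "X2")
g.add_edge("X2", "X3")
print(dict(g.adj))                    # {'X1': {'X2'}, 'X2': {'X3'}}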
Key challenges:
1. Represent the world as a collection of random variables X_1, …, X_n with joint distribution p(X_1, …, X_n).
- How can we compactly describe this joint distribution?
- Directed graphical models (Bayesian Networks)
- Undirected graphical models (Markov random fields, factor graphs)
2. Learn the distribution from data
- Maximum likelihood estimation, other estimation methods
- How much data do we need?
- How much computation does it take?
3. Perform “inference”, i.e., compute conditional distributions (a small end-to-end sketch of all three steps follows below).
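The sketch below ties the three steps together for a hypothetical chain-structured Bayesian network X1 -> X2 -> X3 with synthetic data; it illustrates the ideas rather than any code from the lecture:

import numpy as np

# Hypothetical chain-structured Bayesian network over binary variables:
#   X1 -> X2 -> X3,  so  p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x2).
# A full joint table needs 2^3 - 1 = 7 free parameters; this factorization
# needs only 1 + 2 + 2 = 5, and the gap grows exponentially with n.

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(1000, 3))   # synthetic observations of (X1, X2, X3)

# Step 2: maximum likelihood estimation = normalized counts.
p_x1 = np.bincount(data[:, 0], minlength=2) / len(data)

p_x2_given_x1 = np.zeros((2, 2))
p_x3_given_x2 = np.zeros((2, 2))
for parent in (0, 1):
    rows = data[data[:, 0] == parent]
    p_x2_given_x1[parent] = np.bincount(rows[:, 1], minlength=2) / len(rows)
    rows = data[data[:, 1] == parent]
    p_x3_given_x2[parent] = np.bincount(rows[:, 2], minlength=2) / len(rows)

# Step 1: the factorization defines the joint distribution compactly.
def joint(x1, x2, x3):
    return p_x1[x1] * p_x2_given_x1[x1, x2] * p_x3_given_x2[x2, x3]

# Step 3: inference by enumeration, e.g. p(X1 | X3 = 1).
unnorm = np.array([sum(joint(x1, x2, 1) for x2 in (0, 1)) for x1 in (0, 1)])
print("p(X1 | X3=1) =", unnorm / unnorm.sum())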
Application of Probabilities: Detecting generated text
It is possible for LLMs to watermark generated text.
- Without degrading text quality
- Without re-training the language model
- With an open-source approach, without publishing the language model.
The resulting text can be detected as “generated” with extremely high probability.
Here is the idea:
For every word (token) to be generated:
- Seed the RNG with the previous word (token).
- Create a new whitelist, a random 50% of the entire vocabulary.
- Only choose words from the whitelist.
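A toy sketch of this generation loop, with a made-up vocabulary and a simplified seeding scheme (uniform sampling stands in for a real language model):

import random

vocab = [f"word{i}" for i in range(1000)]   # toy vocabulary

def generate_watermarked(n_words, first_word="word0"):
    text = [first_word]
    for _ in range(n_words - 1):
        # Seed the RNG with the previous word (token).
        rng = random.Random(text[-1])
        # Whitelist = a random 50% of the vocabulary, fixed by that seed.
        whitelist = rng.sample(vocab, k=len(vocab) // 2)
        # A real LM would sample from its predicted next-word distribution
        # restricted to the whitelist; here we just pick uniformly.
        text.append(rng.choice(whitelist))
    return text

print(" ".join(generate_watermarked(10)))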
Generated text will contain only whitelisted words. For human-written text, the probability that any single word happens to be in the whitelist is 50%.
For N words, the probability that all of them are whitelisted by chance is 0.5^N.
A tweet of 25 words containing only whitelisted words is therefore generated with ≈ 99.999997% confidence.
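The same arithmetic, spelled out for a few text lengths (assuming an independent 50% whitelist per word):

# Probability that ALL N words of a human-written text happen to fall into
# the whitelists by chance, and the resulting confidence that an
# all-whitelisted text was generated.
for n_words in (5, 10, 25):
    p_chance = 0.5 ** n_words
    print(f"N = {n_words:2d}: chance = {p_chance:.2e}, "
          f"confidence generated = {(1 - p_chance) * 100:.6f}%")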
A watermark for language models
Actual approach is more sophisticated:
- No “strict” black-/whitelist; blacklisted words are avoided probabilistically.
- Can better deal with “low entropy” parts of the text (“Barack” -> “Obama”, almost always)
- Can then use smaller whitelist (e.g. 25%).
- False positives (human text flagged as generated) are less likely to happen.
“Synonym attacks” would need to replace impractically large parts of the generated text.
Assessment Question: