The Deep Learning Revolution (2010s)
2012: Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton achieve a significant breakthrough with the
AlexNet model, which wins the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a
large margin. AlexNet's success demonstrates the power of deep convolutional neural networks
(CNNs) and GPU acceleration for training deep models.
2013: Word2Vec, developed by Tomas Mikolov and colleagues at Google, introduces a new approach
to learning word embeddings, significantly advancing natural language processing (NLP) tasks.
2014: Ian Goodfellow and his colleagues introduce Generative Adversarial Networks (GANs), a
novel approach to generating realistic data through adversarial training.
2014: The Deep Q-Network (DQN), developed by DeepMind, achieves human-level performance in
playing Atari games, showcasing the potential of deep reinforcement learning.
2015: ResNet (Residual Networks), developed by Kaiming He and colleagues at Microsoft Research,
wins the ILSVRC by a significant margin. ResNet's introduction of skip connections helps in training
very deep networks.
Success in Handwriting Recognition
In 2009, Graves et al. outperformed all entries in an international Arabic handwriting recognition competition.
Success in Speech Recognition
Dahl et al. showed relative error reductions of 16.0% and 23.2% over the state-of-the-art system.
New Record on MNIST
Ciresan et al. set a new record on the MNIST dataset in 2010 using good old backpropagation on GPUs (GPUs enter the scene).
First Superhuman Visual Pattern Recognition
D. C. Ciresan et al. achieved a 0.56% error rate in the IJCNN Traffic Sign Recognition Competition.
Winning more Visual Recognition Challenges
From Cats to Convolutional Neural Networks
Hubel and Wiesel Experiment
Experimentally showed that each neuron has a fixed receptive field, i.e., a neuron fires only in response to a visual stimulus in a specific region of the visual field.
Neocognitron
Used for handwritten character recognition and pattern recognition (Fukushima et al.)
Convolutional Neural Network
Handwritten digit recognition using backpropagation over a convolutional neural network (LeCun et al.)
LeNet-5
Introduced the (now famous) MNIST dataset
(LeCun et al.)
An algorithm inspired by an experiment on
cats is today used to detect cats in videos :-)
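To make the local-receptive-field idea that runs from Hubel and Wiesel to CNNs concrete, here is a minimal sketch (plain NumPy, toy sizes and a made-up filter, not code from any of the cited papers): each output unit looks only at a small patch of the input, and the same weights are reused at every location.

import numpy as np

def conv2d_valid(image, kernel):
    # Slide `kernel` over `image` (no padding, stride 1).
    # Each output value depends only on a local patch of the input,
    # mirroring the "local receptive field" idea.
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]   # local receptive field
            out[i, j] = np.sum(patch * kernel)  # same weights at every location
    return out

image = np.random.rand(8, 8)                    # toy 8x8 "image"
edge_filter = np.array([[1.0, -1.0],            # toy 2x2 filter
                        [1.0, -1.0]])
print(conv2d_valid(image, edge_filter).shape)   # (7, 7) feature map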
Better Optimization Methods
Faster convergence and better accuracies (e.g., momentum-based and adaptive-learning-rate methods).
The Curious Case of Sequences
Sequences:
● They are everywhere.
● Time series, speech, music, text, video
● Each unit in a sequence interacts with the other units
● We need models that capture these interactions
Hopfield Network
Content-addressable memory systems for
storing and retrieving patterns.
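A minimal sketch of the content-addressable-memory idea (plain NumPy with a toy pattern; it uses synchronous sign updates as a simplification of Hopfield's asynchronous rule and is not code from the original paper): patterns are stored in the weights via a Hebbian rule, and a corrupted pattern is retrieved by repeatedly updating the units.

import numpy as np

def train_hopfield(patterns):
    # Hebbian weight matrix for +/-1 patterns; no self-connections.
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, steps=10):
    # Synchronous sign updates until the state (hopefully) settles.
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1, -1)
    return state

pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])   # one stored toy pattern
W = train_hopfield(pattern[None, :])
probe = pattern.copy()
probe[:2] *= -1                                    # corrupt two bits
print(recall(W, probe))                            # usually recovers `pattern`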
Jordan Network
The output of each time step is fed back to the next time step, thereby allowing interactions between time steps in the sequence.
Elman Network
The hidden state of each time step is fed to the next time step, thereby allowing interactions between time steps in the sequence.
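A minimal NumPy sketch of the two recurrence styles described above (toy dimensions and random weights, purely illustrative): an Elman step feeds the previous hidden state into the current step, while a Jordan step feeds back the previous output instead.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 3, 4, 2                      # toy sizes
W_xh = rng.normal(size=(d_h, d_in))             # input  -> hidden
W_hh = rng.normal(size=(d_h, d_h))              # hidden -> hidden (Elman recurrence)
W_yh = rng.normal(size=(d_h, d_out))            # output -> hidden (Jordan recurrence)
W_hy = rng.normal(size=(d_out, d_h))            # hidden -> output

def elman_step(x_t, h_prev):
    # Hidden state of the previous step feeds the current step.
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    return h_t, W_hy @ h_t

def jordan_step(x_t, y_prev):
    # Output of the previous step feeds the current step.
    h_t = np.tanh(W_xh @ x_t + W_yh @ y_prev)
    return h_t, W_hy @ h_t

xs = rng.normal(size=(5, d_in))                 # a length-5 toy sequence

h, y = np.zeros(d_h), np.zeros(d_out)
for x_t in xs:
    h, y = elman_step(x_t, h)                   # Elman: carry the hidden state forward
print("Elman final hidden state:", h)

h, y = np.zeros(d_h), np.zeros(d_out)
for x_t in xs:
    h, y = jordan_step(x_t, y)                  # Jordan: carry the output forward
print("Jordan final hidden state:", h)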
Drawbacks of RNNs
Hochreiter et al. and Bengio et al. showed the difficulty of training RNNs (the problem of exploding and vanishing gradients).
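A rough NumPy illustration of why this happens (toy numbers, not taken from either paper): backpropagation through time multiplies one recurrent Jacobian per time step, so gradient norms shrink or blow up roughly geometrically with sequence length, depending on the scale of the recurrent weights.

import numpy as np

rng = np.random.default_rng(0)

def grad_norm_through_time(scale, T=50, d=16):
    # Norm of a product of T recurrent Jacobians (linear RNN for simplicity).
    W = scale * rng.normal(size=(d, d)) / np.sqrt(d)   # recurrent weight matrix
    g = np.ones(d)                                     # gradient at the last time step
    for _ in range(T):
        g = W.T @ g                                    # one backprop-through-time step
    return np.linalg.norm(g)

print(grad_norm_through_time(scale=0.5))   # tiny  -> vanishing gradients
print(grad_norm_through_time(scale=2.0))   # huge  -> exploding gradients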
Long Short-Term Memory
Hochreiter and Schmidhuber showed that LSTMs can solve complex long-time-lag tasks that could never be solved before.
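To give a feel for why the architecture helps, here is a minimal LSTM step in NumPy (toy dimensions, random weights, biases omitted; a sketch of the now-standard gated formulation rather than the exact equations of the original paper): gates control what is written to and read from a cell state that is updated additively, giving gradients a more direct path across many time steps.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                                  # toy sizes
Wf, Wi, Wo, Wc = (rng.normal(size=(d_h, d_in + d_h)) for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    # One LSTM step (biases omitted for brevity).
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(Wf @ z)                           # forget gate
    i = sigmoid(Wi @ z)                           # input gate
    o = sigmoid(Wo @ z)                           # output gate
    c_tilde = np.tanh(Wc @ z)                     # candidate cell content
    c_t = f * c_prev + i * c_tilde                # additive cell-state update
    h_t = o * np.tanh(c_t)
    return h_t, c_t

h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):            # a length-5 toy sequence
    h, c = lstm_step(x_t, h, c)
print(h, c)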
Sequence to Sequence Learning
● Initial success in using RNNs/LSTMs for large-scale sequence-to-sequence learning problems
● Introduction of attention, which inspired a great deal of research over the following years
Attention Is All You Need: Transformers
● Introduced by Ashish Vaswani et
al. in 2017, transformers leverage
self-attention mechanisms to
process sequences more
effectively than traditional RNNs.
● A Breakthrough in Natural
Language Processing (NLP).
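A minimal NumPy sketch of the single-head scaled dot-product self-attention at the core of the transformer (random toy weights, no masking, no multiple heads): every position attends to every other position directly, rather than passing information step by step as in an RNN.

import numpy as np

rng = np.random.default_rng(0)
T, d_model, d_k = 5, 8, 4                          # toy sequence length and sizes
X = rng.normal(size=(T, d_model))                  # one toy input sequence
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))

def self_attention(X):
    # Single-head scaled dot-product self-attention.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d_k)                # all pairs of positions interact
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted mix of value vectors

print(self_attention(X).shape)                     # (5, 4): one output per position

The 1/sqrt(d_k) scaling keeps the dot products from growing with dimension, so the softmax does not saturate; the full transformer stacks several such heads with feed-forward layers and residual connections.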
The Deep Learning Revolution (2010s–2020s)
2018: OpenAI releases GPT (Generative Pre-trained Transformer), setting new benchmarks in NLP
tasks with its ability to generate coherent and contextually relevant text.
2020: OpenAI releases GPT-3, a language model with 175 billion parameters, pushing the boundaries
of what is possible with NLP and generating significant public interest and debate about the future of
AI.
2021: DeepMind's AlphaFold achieves a breakthrough in protein structure prediction, demonstrating
the impact of deep learning on scientific discovery.
2022: The DALL·E 2 and Stable Diffusion models showcase the ability of deep learning models to
generate high-quality images from textual descriptions, revolutionizing the field of generative art and
creative AI.
2023: Google Research introduces PaLM (Pathways Language Model), a large-scale language
model designed to improve understanding and reasoning across multiple languages and tasks.