ETH Zurich Talk - April 14, 2025

The document discusses significant trends in artificial intelligence (AI) and machine learning (ML), highlighting advancements in model architecture, training techniques, and hardware improvements over the past fifteen years. Key developments include the evolution of neural networks, the introduction of transformer models, and the use of self-supervised learning, which have collectively transformed AI capabilities. The document also emphasizes the importance of collaboration and open-source tools in shaping the future of AI technology.


Important Trends in AI:

How Did We Get Here, What Can We Do Now,
and How Can We Shape AI’s Future?
Jeff Dean, Chief Scientist, Google Research & Google DeepMind

@jeffdean.bsky.social and @JeffDean


ai.google/research/people/jeff

Presenting the work of many people at Google and elsewhere


Some observations
In recent years, ML has completely changed our expectations of
what is possible with computers

Increasing scale (compute, data, model size) delivers better results

Algorithmic and model architecture improvements have also delivered massive gains

The kinds of computations we want to run and the hardware on which we run them are changing dramatically
Fifteen Years of Machine Learning Advances

or

How Did Today’s Models Come To Be?


Key Building Block from Last Century: Neural Networks


Key building block: neural networks, made up of artificial neurons, loosely designed to
mimic how real neurons behave
Key Building Block from Last Century: Backpropagation


Backpropagation of errors gives an algorithm for how to update the weights of the whole neural network based on errors observed at the outputs of the model

Key building block: backpropagation of errors (using the chain rule) gives an effective algorithm for updating the weights of a neural network to minimize errors on training data
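As a rough illustration of the idea (not the original implementation), here is a minimal JAX sketch in which the chain rule is applied automatically by jax.grad to get the gradient of the loss with respect to every weight, which is then used to update the whole network; the tiny two-layer network, data, and learning rate are arbitrary choices for the example:

import jax
import jax.numpy as jnp

def predict(weights, x):
    # Two-layer network: hidden tanh layer followed by a linear output.
    h = jnp.tanh(x @ weights["w1"])
    return h @ weights["w2"]

def loss(weights, x, y):
    # Squared error between predictions and targets.
    return jnp.mean((predict(weights, x) - y) ** 2)

key = jax.random.PRNGKey(0)
weights = {
    "w1": jax.random.normal(key, (4, 8)) * 0.1,
    "w2": jax.random.normal(key, (8, 1)) * 0.1,
}
x = jnp.ones((16, 4))
y = jnp.zeros((16, 1))

# Backpropagation: jax.grad applies the chain rule through the network
# to get d(loss)/d(weights), which is used to update every weight.
grads = jax.grad(loss)(weights, x, y)
weights = jax.tree_util.tree_map(lambda w, g: w - 0.1 * g, weights, grads)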
2012: Scale Matters

Training a very large neural network (60X bigger than previous largest neural network) using
16,000 CPU cores gives major advances in quality
(~70% relative improvement in ImageNet 22K state-of-the-art)
Le et al., ICML 2012, arxiv.org/abs/1112.6209
2012: Distributed Training on Many Computers

[Diagrams: model parallelism and data parallelism]

Combining model parallelism and data parallelism for neural network training across
thousands of computers enables training of much larger (50-100X) neural networks than
previously possible

Large Scale Distributed Deep Networks, Dean et al., NeurIPS 2012,


research.google.com/archive/large_deep_networks_nips2012.pdf
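A hedged sketch of the data-parallelism half of this idea using present-day JAX (pmap), not the original DistBelief system: each device holds a replica of the weights, computes gradients on its own shard of the batch, and the gradients are averaged across devices before the update. The linear model, shapes, and learning rate are placeholders:

from functools import partial
import jax
import jax.numpy as jnp

def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

# Data parallelism: every device holds a full copy of the weights but
# sees only its own shard of the batch; gradients are averaged with pmean.
@partial(jax.pmap, axis_name="batch")
def train_step(w, x, y):
    grads = jax.grad(loss)(w, x, y)
    grads = jax.lax.pmean(grads, axis_name="batch")
    return w - 0.1 * grads

n = jax.local_device_count()
w = jnp.zeros((4, 1))
w_replicated = jnp.broadcast_to(w, (n,) + w.shape)   # one weight copy per device
x_shards = jnp.ones((n, 32, 4))                      # per-device batch shards
y_shards = jnp.zeros((n, 32, 1))
w_replicated = train_step(w_replicated, x_shards, y_shards)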
2013: Distributed Representations of Words Are Powerful
Word2Vec

Distributed representations of words are powerful:


(1) Nearby words in high dimensional space are related
cat, puma, tiger, … are all nearby

(2) Directions are meaningful


king – queen ~= man – woman
ICLR 2013 workshop, arxiv.org/abs/1301.3781. Follow-up appeared in NeurIPS 2013, arxiv.org/abs/1310.4546
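A toy illustration of these two properties with hand-made 3-D vectors (real word2vec embeddings are learned and have hundreds of dimensions; the numbers below are invented purely to show the geometry):

import jax.numpy as jnp

def cosine(a, b):
    return jnp.dot(a, b) / (jnp.linalg.norm(a) * jnp.linalg.norm(b))

# Hypothetical 3-D embeddings, chosen by hand just to illustrate the idea.
emb = {
    "king":  jnp.array([0.9, 0.8, 0.1]),
    "queen": jnp.array([0.9, 0.2, 0.1]),
    "man":   jnp.array([0.1, 0.8, 0.7]),
    "woman": jnp.array([0.1, 0.2, 0.7]),
    "cat":   jnp.array([0.2, 0.1, -0.9]),
}

# (1) Related words are nearby: cosine(king, queen) is much larger than cosine(king, cat).
print(cosine(emb["king"], emb["queen"]), cosine(emb["king"], emb["cat"]))

# (2) Directions are meaningful: the shared offset means king - queen ~= man - woman.
print(emb["king"] - emb["queen"])   # ~[0, 0.6, 0]
print(emb["man"] - emb["woman"])    # ~[0, 0.6, 0]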
2014: Models that Map One Sequence to Another are Powerful
Sequence to Sequence

Use a neural encoder over an input sequence to generate state, use that to
initialize state of a neural decoder. Scale up LSTMs and this works.

Appeared in NeurIPS 2014, arxiv.org/abs/1409.3215
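A minimal sketch of the encoder-decoder idea, using a plain tanh RNN cell instead of the paper's LSTMs and made-up sizes, just to show the state handoff from encoder to decoder:

import jax
import jax.numpy as jnp

def rnn_cell(params, state, x):
    # Simple tanh RNN cell (the paper used LSTMs; this keeps the sketch short).
    return jnp.tanh(x @ params["wx"] + state @ params["wh"])

def encode(params, inputs):
    state = jnp.zeros(params["wh"].shape[0])
    for x in inputs:                       # run encoder over the input sequence
        state = rnn_cell(params, state, x)
    return state                           # final state summarizes the input

def decode(params, out_proj, state, steps):
    outputs = []
    y = jnp.zeros(params["wx"].shape[0])   # start-token embedding (all zeros here)
    for _ in range(steps):                 # decoder state initialized from encoder state
        state = rnn_cell(params, state, y)
        y = state @ out_proj               # predict next-step representation
        outputs.append(y)
    return outputs

key = jax.random.PRNGKey(0)
emb, hidden = 4, 8
params = {"wx": jax.random.normal(key, (emb, hidden)) * 0.1,
          "wh": jax.random.normal(key, (hidden, hidden)) * 0.1}
out_proj = jax.random.normal(key, (hidden, emb)) * 0.1
inputs = [jnp.ones(emb) for _ in range(5)]
outputs = decode(params, out_proj, encode(params, inputs), steps=3)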


2015: Specialized Hardware for Neural Network Inference
Reduced precision is OK: about 1.2 (not 1.21042), about 0.6 (not 0.61127), about 0.7 (not 0.73989343)
Handful of specific operations (e.g. matrix multiply)

Tensor Processing Unit (TPU) v1: 2015, 92 teraops (inference only)

Specialization is much more efficient:


Compared to contemporary CPUs & GPUs:
TPU v1 is 15X-30X faster
TPU v1 is 30X-80X more energy efficient

Appeared in ISCA, 2017, arxiv.org/abs/1704.04760. Now most cited paper in ISCA’s 50 year history
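A small sketch of the reduced-precision point above (values like 1.2 instead of 1.21042 are fine for neural networks): a matrix multiply in bfloat16, a low-precision format TPUs support, stays very close to the float32 result. The sizes below are arbitrary:

import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
a = jax.random.normal(key, (256, 256))
b = jax.random.normal(jax.random.PRNGKey(1), (256, 256))

exact = a @ b                                               # float32 matmul
approx = a.astype(jnp.bfloat16) @ b.astype(jnp.bfloat16)    # reduced-precision matmul

# The relative error of the low-precision result is tiny, which is why
# specialized hardware can trade precision for speed and energy efficiency.
rel_err = jnp.abs(approx.astype(jnp.float32) - exact) / (jnp.abs(exact) + 1e-6)
print(float(rel_err.mean()))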
2016: Specialized Supercomputers for Neural Network Training

Connect thousands of chips together (TPU pods) with custom high-speed networks
to enable faster neural network training

TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for
Embeddings, Jouppi et al., ISCA 2023, arxiv.org/abs/2304.01433
Continual Hardware Performance Scaling

[Chart: peak performance across TPU generations: 11 petaflops → 1,126 petaflops → 42,522 petaflops]

blog.google/products/google-cloud/ironwood-tpu-age-of-inference/
Continual Hardware Improvements in Energy Efficiency

~30X energy efficiency improvement vs. TPU v2

Peak FP8 flops delivered per watt of thermal design power per chip package

blog.google/products/google-cloud/ironwood-tpu-age-of-inference/
Open source tools enable the whole community

pytorch.org
tensorflow.org

github.com/jax-ml/jax
2017: Transformer Model Architecture: Attention

Don’t try to force state into a single recurrent distributed representation. Instead, save all past representations and attend to them.

Attention is All You Need, Vaswani et al., NeurIPS 2017, arxiv.org/abs/1706.03762
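A minimal sketch of the core attention computation (single head, no masking or learned projections): every past representation is kept, and each query forms a weighted sum over all of them. The shapes below are arbitrary:

import jax
import jax.numpy as jnp

def attention(q, k, v):
    # q, k, v: [sequence_length, d] queries, keys, and values.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])    # similarity of each query to every saved key
    weights = jax.nn.softmax(scores, axis=-1)   # attention weights over all past representations
    return weights @ v                          # weighted sum of the saved values

x = jax.random.normal(jax.random.PRNGKey(0), (6, 16))   # 6 token representations, d=16
out = attention(x, x, x)                                 # self-attention over the sequence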


2017: Transformer Model Architecture: Attention

Figure from Scaling Laws for Neural Language Models, Kaplan et al., arxiv.org/abs/2001.08361

Higher accuracy w/ 10X-100X less compute and 10X smaller models!

Attention is All You Need, Vaswani et al., NeurIPS 2017, arxiv.org/abs/1706.03762


2018: Language Modeling At Scale With Self-Supervised Data

There’s lots of text in the world! Self-supervised learning on this text can
provide very large amounts of training data with the “right” answer known (“wrong
guess” is used to provide gradient descent loss training signal)

Self-supervised learning
on text with large models
is one of the major
reasons chat/language
models have gotten so
good

Language Models are Few-Shot Learners, Brown et al., NeurIPS, 2020, arxiv.org/abs/2005.14165
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., ACL 2019, arxiv.org/abs/1810.04805
2018: Language Modeling At Scale With Self-Supervised Data

There’s lots of text in the world! Self-supervised learning on this text can
provide very large amounts of training data with the “right” answer known (“wrong
guess” is used to provide gradient descent loss training signal)

Different kinds of training objectives:

Autoregressive (look at prefix, predict next word):
Zürich is ______
Zürich is the _______
Zürich is the largest _______

Fill-in-the-Blank (e.g. look in both directions, BERT):
Zürich ____ the largest ____ in ______.
Zürich is the ______ city ____ Switzerland.
…
Language Models are Few-Shot Learners, Brown et al., NeurIPS, 2020, arxiv.org/abs/2005.14165
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Devlin et al., ACL 2019, arxiv.org/abs/1810.04805
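A sketch of why self-supervision needs no labels: the text itself provides the "right" answer for the autoregressive objective, and the wrong guesses supply the cross-entropy training signal. The token ids and the stand-in "model" (a plain lookup table of logits) are placeholders, not a real language model:

import jax
import jax.numpy as jnp

# Toy example: the training text itself provides the labels.
tokens = jnp.array([5, 17, 42, 7, 99])     # e.g. token ids for a short sentence
inputs, targets = tokens[:-1], tokens[1:]  # predict each next token from its prefix

def cross_entropy(logits, targets):
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    return -jnp.mean(jnp.take_along_axis(log_probs, targets[:, None], axis=-1))

# Stand-in "model": any function mapping token ids to next-token logits would do.
vocab_size = 128
def model(params, inputs):
    return params[inputs]                  # here just an embedding-table lookup

params = jnp.zeros((vocab_size, vocab_size))
loss = cross_entropy(model(params, inputs), targets)   # gradient descent signal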
2021: Transformers for Vision

Visualization of
attention mechanism

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, Alexey Dosovitskiy et al., ICLR 2021,
arxiv.org/abs/2010.11929
2017: Sparse Models (e.g. Mixture of Experts) Outperform
Dense Models


Give model much larger capacity w/ lots of experts but only activate a few chosen experts per token:
(A) ~8X reduction in training compute cost for ~same accuracy, or
(B) major accuracy improvements for same training compute cost
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton and Jeff Dean.
ICLR 2017, arxiv.org/abs/1701.06538
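A simplified sketch of the routing idea (not the exact sparsely-gated layer from the paper): a router scores all experts per token, only the top-k experts run, and their outputs are mixed by the normalized router scores, so capacity grows with the number of experts while per-token compute stays roughly fixed. Shapes and sizes are illustrative:

import jax
import jax.numpy as jnp

def moe_layer(router_w, expert_ws, x, k=2):
    # x: [num_tokens, d]; expert_ws: [num_experts, d, d]; router_w: [d, num_experts]
    scores = x @ router_w                              # router score for every expert
    top_vals, top_idx = jax.lax.top_k(scores, k)       # pick k experts per token
    gates = jax.nn.softmax(top_vals, axis=-1)          # mix only the chosen experts
    out = jnp.zeros_like(x)
    for slot in range(k):
        idx = top_idx[:, slot]                         # chosen expert for each token
        # A real implementation dispatches tokens to experts; gathering weights
        # per token here keeps the sketch short while the math stays the same.
        expert_out = jnp.einsum("td,tde->te", x, expert_ws[idx])
        out = out + gates[:, slot:slot + 1] * expert_out
    return out

key = jax.random.PRNGKey(0)
d, num_experts, num_tokens = 16, 8, 4
x = jax.random.normal(key, (num_tokens, d))
router_w = jax.random.normal(key, (d, num_experts)) * 0.1
expert_ws = jax.random.normal(key, (num_experts, d, d)) * 0.1
y = moe_layer(router_w, expert_ws, x)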
Continued Research on Sparse Models

Gemini 1.5 Pro/Gemini 2.0/Gemini 2.5 use mixture-of-experts (MoE) architectures, building on a long line
of Google research efforts on sparse models:
● 2017: Shazeer et al., Outrageously large neural networks: The sparsely-gated mixture-of-experts layer.
ICLR 2017. arxiv.org/abs/1701.06538
● 2020: Lepikhin et al., GShard: Scaling giant models with conditional computation and automatic sharding.
ICLR 2021. arxiv.org/abs/2006.16668
● 2021: Carlos Riquelme et al., Scaling vision with sparse mixture of experts, NeurIPS 2021.
arxiv.org/abs/2106.05974
● 2021: Fedus et al., Switch transformers: Scaling to trillion parameter models with simple and efficient
sparsity. JMLR 2022. arxiv.org/abs/2101.03961
● 2022: Clark et al., Unified scaling laws for routed language models, ICML 2022. arxiv.org/abs/2202.01169
● 2022: Zoph et al., Designing effective sparse expert models. arxiv.org/abs/2202.08906
● 2023: Puigcerver et al., From Sparse to Soft Mixtures of Experts. arxiv.org/abs/2308.00951
● 2024: Obando-Ceron et al., Mixtures of Experts Unlock Parameter Scaling for Deep RL.
arxiv.org/abs/2402.08609
● 2024: Raposo et al., Mixture-of-Depths: Dynamically allocating compute in transformer-based language
models. arxiv.org/abs/2404.02258
● 2024: Douillard et al., DiPaCo: Distributed Path Composition. arxiv.org/abs/2403.10616
2018: Software abstractions for Distributed ML Computations
Example: Pathways

[Diagram: accelerators in multiple buildings across Region A and Region B]

Scalable software can simplify running large-scale computations


Pathways: Asynchronous Distributed Dataflow for ML, Barham et al., MLSys 2022: arxiv.org/abs/2203.12533
2018: Software abstractions for Distributed ML Computations

With JAX+Pathways, the entire training process is driven by a single Python process (the client) on one host

[Diagram: a single client driving accelerators in multiple buildings across Region A and Region B]

Scalable software can simplify running large-scale computations


Pathways: Asynchronous Distributed Dataflow for ML, Barham et al., MLSys 2022: arxiv.org/abs/2203.12533
Pathways: Now Available for Cloud Customers

Pathways enables a single JAX client to see and use many devices (e.g. 1 to 100,000 chips), even though these are distributed across many hosts and even many TPU pods

Pathways: Asynchronous Distributed Dataflow for ML, Barham et al., MLSys 2022: arxiv.org/abs/2203.12533
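A hedged sketch of this single-client style in present-day JAX: one Python process lists every attached device and shards a large array across them; under Pathways the same kind of client can address devices that span many hosts and pods (cluster setup details omitted here):

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# One Python client sees every attached accelerator; with Pathways this list
# can cover devices spread across many hosts and even many TPU pods.
devices = jax.devices()
mesh = Mesh(np.array(devices), ("data",))

# Shard a large array across all devices along its first dimension.
sharding = NamedSharding(mesh, PartitionSpec("data"))
x = jax.device_put(jnp.zeros((len(devices) * 1024, 1024)), sharding)

# jit-compiled computations on x run across every device it is sharded over.
y = jax.jit(lambda a: (a * 2.0).sum())(x)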
2022: “Thinking longer” at inference time is very useful
“Chain of Thought prompting” is one such technique

Chain of Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei, Xuezhi Wang, Dale
Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou, 2022, arxiv.org/abs/2201.11903
2022: “Thinking longer” at inference time is very useful
“Chain of Thought prompting” is one such technique

[Chart: solve rate (%) vs. model scale (billions of parameters)]

Prompting the model to “show its work” dramatically improves accuracy on reasoning tasks
Chain of Thought Prompting Elicits Reasoning in Large Language Models, Jason Wei, Xuezhi Wang, Dale
Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou, 2022, arxiv.org/abs/2201.11903
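To make the technique concrete, here is a sketch of a few-shot chain-of-thought prompt (the worked arithmetic example follows the style used in the paper; the generate function is a placeholder, not any specific API):

# Few-shot chain-of-thought prompt: the exemplar shows its reasoning steps,
# which encourages the model to "show its work" before giving the final answer.
COT_PROMPT = """Q: A cafeteria had 23 apples. They used 20 and bought 6 more. How many now?
A: They started with 23 apples. After using 20 they had 23 - 20 = 3.
   Buying 6 more gives 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many does he have?
A:"""

def generate(prompt: str) -> str:
    # Placeholder for a call to whichever LLM text-generation API is available.
    raise NotImplementedError

# answer = generate(COT_PROMPT)   # expected to reason step by step, ending in "The answer is 11."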
2014: Distillation: Use Powerful “Teacher” Models to Make
Smaller, Cheaper “Student” Models

“performed the Concerto for “ __?__


Real next word: “Violin”

Distillation: Use large high quality model as “teacher” when training smaller
“student” model
Rejected from NeurIPS 2014. Published in workshop & put on Arxiv: arxiv.org/abs/1503.02531. 24,000+ citations.
2014: Distillation: Use Powerful “Teacher” Models to Make
Smaller, Cheaper “Student” Models

Gives a much richer signal for training: try to get the student to match the “soft probability distribution” of the large model

“performed the Concerto for “ __?__


Real next word: “Violin”

Teacher model says: “Violin: 0.4, Piano: 0.2, Trumpet: 0.01, Airplane: 0.00000001”

Distillation: Use large high quality model as “teacher” when training smaller
“student” model
Rejected from NeurIPS 2014. Published in workshop & put on Arxiv: arxiv.org/abs/1503.02531. 24,000+ citations.
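A minimal sketch of a distillation loss under common assumptions (a temperature-softened teacher distribution mixed with the ordinary loss on the true next word; the temperature, mixing weight, and tiny vocabulary are illustrative, not the paper's exact settings):

import jax
import jax.numpy as jnp

def distillation_loss(student_logits, teacher_logits, true_token, temperature=2.0, alpha=0.5):
    # Soft targets: the teacher's full probability distribution, softened by a temperature.
    teacher_probs = jax.nn.softmax(teacher_logits / temperature)
    student_logp = jax.nn.log_softmax(student_logits / temperature)
    soft_loss = -jnp.sum(teacher_probs * student_logp)        # cross-entropy vs. soft targets

    # Hard target: the usual loss on the real next word ("Violin").
    hard_loss = -jax.nn.log_softmax(student_logits)[true_token]

    return alpha * soft_loss + (1 - alpha) * hard_loss

vocab = {"Violin": 0, "Piano": 1, "Trumpet": 2, "Airplane": 3}
teacher_logits = jnp.log(jnp.array([0.4, 0.2, 0.01, 1e-8]))   # teacher's soft distribution
student_logits = jnp.zeros(4)                                  # untrained student: uniform
loss = distillation_loss(student_logits, teacher_logits, vocab["Violin"])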
2022: Many Different Parallelism Schemes During Inference

The right choices for how to distribute inference computation are heavily influenced by things like batch size or latency constraints

Efficiently Scaling Transformer Inference, Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James
Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, Jeff Dean, arxiv.org/abs/2211.05102
2023: Speculative Decoding
Use small “drafter” model to predict next K tokens
● Then predict next K tokens in one shot with large model (more efficient: batch size K not 1)
● Advance generation by as many tokens as match in prefix of size K
● Guaranteed identical output distribution

[Diagram: standard decoding with the larger, slower model vs. speculative decoding with a faster drafter model plus the larger, slower model]
Fast Inference from Transformers via Speculative Decoding, Yaniv Leviathan, Matan Kalman & Yossi Matias,
ICML ‘23, arxiv.org/abs/2211.17192
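A sketch of the accept loop for the greedy case (the paper samples instead and proves the output distribution is unchanged; the two model callables are placeholders with the interfaces described in the comments):

def speculative_step(large_model, drafter, prefix, k=4):
    # drafter(tokens) -> one next token (cheap to call k times).
    # large_model(tokens) -> list "pred" with pred[j] = its next-token choice after
    # tokens[:j+1], for every position, computed in a single forward pass.

    # 1. Small drafter proposes k tokens autoregressively.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = drafter(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. One batched pass of the large model over prefix + draft verifies
    #    all k positions at once (effective batch size k, not 1).
    pred = large_model(prefix + draft)
    p = len(prefix)

    # 3. Advance by the matching prefix of the draft, plus one large-model token.
    out = list(prefix)
    for i in range(k):
        big = pred[p - 1 + i]          # large model's token after the output so far
        out.append(big)
        if big != draft[i]:            # first disagreement: keep the large model's token and stop
            return out
    out.append(pred[p - 1 + k])        # all k accepted: take one bonus token
    return out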
Innovations at Many Levels

Inference algorithms: Chain-of-Thought, Speculative Decoding, Inference-time compute scaling

Training algorithms: Unsupervised and Self-Supervised Learning, Distillation, SFT + RLxF, Asynchronous Training

Model architecture: Word2Vec, Seq2Seq, Transformers, MoEs, Visual Transformers

Software abstractions: DistBelief, Pathways

Hardware: TPUv1 → TPUv2 → TPUv3 → TPUv4 → TPUv5p → Trillium → Ironwood


Gemini:

Putting These Advances Together


Project started in Feb 2023
Many collaborators from Google DeepMind, Google Research, and rest of Google

Goal: Train the world’s best multimodal models and use them all across Google
Gemini 1.0: Dec 2023
Gemini 1.5: Feb 2024 (demonstrated 10M token context window, Flash model)
Gemini 2.0: Dec 2024 (2.0 Flash as good as 1.5 Pro, multimodal live streaming, …)
Gemini 2.0 Thinking: Jan 2025 (2.0 Flash Experimental Thinking)
Gemini 2.5: Mar 2025 (2.5 Pro released), Apr 2025 (“2.5 Flash coming soon”)
https://blog.google/technology/ai/google-gemini-ai https://g.co/gemini
Gemini: A Family of Highly Capable Multimodal Models, by the Gemini Team, arxiv.org/abs/2312.11805
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, by the Gemini Team, arxiv.org/abs/2403.05530
Gemini - Multimodal from the start


Gemini: A Family of Highly Capable Multimodal Models, by the Gemini Team, arxiv.org/abs/2312.11805
Gemini 1.5
Increased context length
Models can now handle up to 10 million
tokens, with external APIs now offering up
to 2 million tokens for text and/or video.

Clearer context
The information within the context window
is clearer, reducing hallucinations &
enabling in-context learning.

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context, by the Gemini Team, arxiv.org/abs/2403.05530
Gemini 2.0
(Like 1.0, 1.5, and 2.5) Builds on many of
the innovations I just described:

● TPUs
● Cross-datacenter training
● Pathways
● JAX
● Distributed representations of words
● Transformers
● Sparse Mixture of Experts
● Distillation
● + … many more innovations …

blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024
Gemini 2.5 Pro
Our most capable model (for now!)

blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
Gemini 2.5 Pro
Our most capable model (for now!) Leaderboard positions
● #1 LMSYS
● # LiveBench
● #1 Humanity’s Last Exam
● #1 SEAL
● #1 Artificial Analysis
● #1 Aider Polyglot
● #1 MathArena.ai
● #1 Mensa IQ test
● #1 Fiction.LiveBench
● #1 SimpleBench
● #1 Kagi leaderboard
● #2 WebDev Arena
● #4 LiveCodeBench
● #4 NYT Connections
● #2 Creative Writing
● #4 Vectara
● #1 Perfect Information Game

blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
Users Generally Enjoying Capabilities of Gemini 2.5 Pro
Long context abilities are very helpful (especially for code)
Pushing the Pareto Frontier of Optimal Quality/Price
Organizing a Large-Scale Scientific Effort
Like Gemini
Many Contributors in Many Different Areas
Gemini Structure & Ways of Working
Overall Leads, Program Management, Product Management

Model Development Areas: Pre-training, Post-training, On-device Models, …

Capabilities: Safety, Vision, Audio, Code, Agents, Internationalization, …

Core Areas: Data, Evals, Infrastructure, Codebase, Serving, Longer-term Research

Gemini Structure & Ways of Working
Many people in many locations:
~⅓ in San Francisco Bay Area
~⅓ in London
~⅓ in many other places:
NYC, Paris, Boston, Zürich, Bangalore, Tel Aviv, Seattle, …

Time zones are annoying!


● “Golden Hours” between California/West Coast and London/Europe
are important
Gemini Structure & Ways of Working
Lots and lots of large and small discussions and information sharing conducted via
Google Chat Spaces (I’m in 200+ such spaces)

RFCs (Request for Comment): semi-formal way of getting feedback, knowing what
others are working on, etc.

Leaderboards and common baselines enable data-driven decision making about how
to improve
● Multiple rounds of experimentation.
● Many experiments at small scale
● Advance smaller number of successful experiments to next scale
● Every so often (every few weeks), incorporate successful experiments
demonstrated at largest experimental scale into new candidate baseline
● Repeat
Training at Scale:
Silent Data Corruption errors (SDCs)

Despite best efforts, given the scale of ML systems and the size of ML training jobs, hardware errors can occur, and sometimes incorrect computations from one buggy chip can spread and infect the entire training system
Silent data corruption:

● Non-deterministically produces incorrect results, silently
● A challenging problem even when running largely independent computations
● Multiplicatively worse at scale with synchronous stochastic gradient descent
● Incorrect results can quickly spread across thousands of components of an ML supercomputer

Cores that Don't Count, Peter H. Hochschild, Paul Jack Turner, Jeffrey C. Mogul, Rama Krishna Govindaraju, Parthasarathy
Ranganathan, David E Culler, Amin Vahdat, HotOS 2021, research.google/pubs/cores-that-dont-count/
Metrics anomaly: anomaly due to SDC

[Plot: gradient norm over time, showing an anomaly due to an SDC]
Metrics anomaly: expected anomaly (no SDC)

[Plot: gradient norm over time, showing an anomaly with no SDC]
SDC with no metrics anomaly

[Plot: gradient norm over time; an SDC is detected with no anomaly. The step replay shows different values, but both values are in the normal range.]
ML Controller transparently handles Silent Data Corruption (SDC)

[Diagram: synchronous training workers each run an SDC checker, with a hot spare available. Normal training proceeds; when the checker identifies an SDC on a defective machine, training automatically moves to the hot spare and the defective machine is sent for repair.]
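As a rough illustration of the step-replay idea (not Google's actual controller): re-run an identical, deterministic training step on the suspect machine and on a hot spare; since both see the same weights and batch, any bit-level mismatch flags an SDC even when both results look "normal". The dispatch callables are placeholders for running the step on different machines:

import jax
import jax.numpy as jnp

def train_step(w, x, y):
    # Deterministic step: same weights + same batch must give identical gradients.
    return jax.grad(lambda w: jnp.mean((x @ w - y) ** 2))(w)

def replay_and_check(step_fn, w, x, y, run_on_suspect, run_on_spare):
    # run_on_suspect / run_on_spare: callables that execute step_fn on a given machine
    # (placeholders here; in a real system these would dispatch to different hosts).
    g_suspect = run_on_suspect(step_fn, w, x, y)
    g_spare = run_on_spare(step_fn, w, x, y)
    # Both gradient norms may be in the "normal" range yet differ bit-for-bit:
    # that mismatch is the SDC signal.
    return bool(jnp.all(g_suspect == g_spare))

# In this sketch both "machines" are the local process, so the check passes.
same = replay_and_check(train_step, jnp.zeros((4, 1)), jnp.ones((8, 4)), jnp.zeros((8, 1)),
                        run_on_suspect=lambda f, *a: f(*a),
                        run_on_spare=lambda f, *a: f(*a))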
What Can These Models Do?
Example
In-context learning: Kalamang translation

Kalamang is only spoken by ~130 people in eastern Indonesian Papua

With in-context info (the first part of chapter 1 of the language materials provided in the context), the model can translate as effectively as a human learner who has spent months on the same language materials
Example

Video of bookshelf
-> JSON
“The killer app of
Gemini 1.5 Pro is video.”

Simon Willison


Example

Video understanding
& summarization
In a table, please write
the sport, the
teams/athletes involved,
the year and a short
description of why each
of these moments in
sports are so iconic.
Example
Digitization of historical data

https://climatelabbook.substack.com/p/data-rescue-with-ai
Gemini 2.5 Pro example:
Code Generation via High Level Language
Inference-time compute gives us another dimension of compute for quality scaling

deepmind.google/technologies/gemini/flash-thinking/
Now That We Have These Powerful
Models, What Will This Mean?
Shaping AI's Impact on Billions of Lives

● Form team of senior computer scientists + rising stars in AI
○ From academia, big tech, and startups
● Propose what impact could be achieved given directed research & policy efforts on AI for public good
○ Rather than predict societal impact of AI given a laissez-faire approach
● Aim to shape AI’s upsides and dampen AI’s downsides
○ For high-, middle-, and low-income nations
● Audience: AI practitioners + policymakers + public
● Approach: Interview 24 experts in 7 fields
○ Employment, Education, Healthcare, Information, Media, Governance, and Science
○ e.g. Barack Obama, Sal Khan, John Jumper, Neal Stephenson, Dario Amodei, Bob Wachter, …
● Uncovered 5 guidelines for AI for public good

Team: Mariano-Florentino Cuéllar, Jeff Dean, John Hennessy, Finale Doshi-Velez, Andy Konwinski, Sanmi Koyejo, Pelonomi Moiloa, Emma Pierson, David Patterson
Shaping AI's Impact on Billions of Lives

“Shaping AI's Impact on Billions of Lives,” by Mariano-Florentino (Tino) Cuéllar, Jeff Dean, Finale Doshi-Velez, John Hennessy, Andy Konwinski, Sanmi Koyejo, Pelonomi Moiloa, Emma Pierson, and David Patterson, December 2024.
See ShapingAI.com and arxiv.org/abs/2412.02730
Humans and AI systems working as a team can
do more than either on their own

● AI systems focused on human productivity produce more positive benefits than those focused on human labor replacement
○ Increases human employability
○ Bonus: People can also be safeguards if AI veers off course in areas where it is not well trained
○ Bonus: People and AIs tend to make different mistakes, so collaboration of experts with AI can
also improve results
● Productivity focus helps both AI and people
succeed

Shaping AI's Impact on Billions of Lives, see ShapingAI.com and arxiv.org/abs/2412.02730


To increase employment, aim for productivity
improvements in fields that create more jobs

● Despite tremendous productivity gains in computing and passenger jets, the US in 2020 had 8 times more commercial airline pilots and 11 times more programmers than in 1970

● Demand for passenger travel and programming was elastic ⇒ more jobs
○ Goods with elastic demand are those where a decrease in price results in a large increase in the quantity acquired

● US agriculture demand is inelastic, so productivity gains ⇒ fewer jobs
○ From 20% of US workforce to 2% in one lifetime (1940 to 2020)

Shaping AI's Impact on Billions of Lives, see ShapingAI.com and arxiv.org/abs/2412.02730


What could the impact of near-term AI be in the next 5 years by following the guidelines?

● To give concrete targets for improving AI’s impact, propose milestones (“kilometerstones”) per field

● Rather than recognizing past achievements, offer significant inducement prizes that try to stimulate progress on these milestones
○ E.g., XPRIZE, Netflix, Kaggle, …

Shaping AI's Impact on Billions of Lives, see ShapingAI.com and arxiv.org/abs/2412.02730


Education AI Milestone: Worldwide Tutor

● A tutoring tool to accelerate general education for every child
○ In their language
○ In their culture
○ In their best learning style
● To help teachers with the challenge of supporting a range of student capability
○ Keeping high-achieving students engaged while
supporting those who struggle
● E.g., Rising Academies* in Africa
○ Improves student outcomes by one grade level relative
to students without it

* Henkel, Owen, Hannah Horne-Robinson, Nessie Kozhakhmetova, and Amanda Lee. “Effective and Scalable Math Support: Experimental Evidence on the
Impact of an AI-Math Tutor in Ghana.” In International Conference on Artificial Intelligence in Education, pp. 373-381. Cham: Springer Nature Switzerland, 2024.
Healthcare AI Milestone: Broad Medical AI

● Learns from many data modalities


○ Images, laboratory results, health records, genomics,
medical research, …
● Can help carry out diverse set of tasks
○ Bedside decision support
○ Interacting with patients after leaving hospital
○ Drafting radiology reports that describe both abnormalities
and relevant normal findings
■ While taking into account the patient’s history
● Can explain recommendations using written or
spoken text and images
● Milestone requires defining metrics and benchmarks
to measure progress

Shaping AI's Impact on Billions of Lives, see ShapingAI.com and arxiv.org/abs/2412.02730


Information AI Milestone:
Civic Discourse Platform

● Mediates conversations or attitudes to enhance public understanding and civic discourse
○ Move communities from polarization to pluralism
● AI system makes suggestions on how to rephrase
comments and questions more diplomatically*
● AI system to hold discussions with conspiracy
theorists**
● AI systems could help bring consensus on difficult
issues across whole populations***
* Argyle, Lisa, et al. “Leveraging AI for democratic discourse: Chat interventions can improve online political conversations at scale.” Proc. National Academy of
Sciences, vol. 120, no. 41, 2023.
** Costello, Thomas, Gordon Pennycook, and David Rand. “Durably reducing conspiracy beliefs through dialogues with AI.” Science, vol. 385, no. 6714, 2024, p. eadq1814.
*** Tsai, Lily and Alex Pentland. “Rediscovering the Pleasures of Pluralism: The Potential of Digitally Mediated Civic Participation,” The Digitalist Papers, 2024.
Science

● Advances in science via AI could be one of the largest impacts for public good
● Many examples:
○ AlphaFold for protein folding
○ Black hole visualization
○ Flood forecasting
○ Materials discovery
○ Neural net-based weather prediction
○ Airplane contrail reduction to reduce CO2e
○ Controlling plasma for nuclear fusion
○ …
● Most fields of science excited about AI

Shaping AI's Impact on Billions of Lives, see ShapingAI.com and arxiv.org/abs/2412.02730


Science AI Milestone:
Scientist’s AI Aide/Collaborator

● Accelerate the pace of science by improving the productivity of scientists
○ Help suggest interesting hypotheses and automate experiments
○ Identify important new relevant research, ideally customized to the individual to summarize what is new compared to what the scientist already knew

Early example: Google’s Co-Scientist work*

● Multi-agent scientific discovery system, showing inference-time compute scaling leads to better-rated hypotheses
* research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/
Shaping AI's Impact on Billions of Lives, see ShapingAI.com and arxiv.org/abs/2412.02730
Conclusions
● AI models and products are becoming incredibly
powerful and useful tools
○ Further research and innovation will continue this trend

● Will have dramatic impact in many diverse areas:
○ Healthcare, education, scientific research, media creation, misinformation, …

● Potentially makes deep expertise more available to many more people

● Done well, our AI-assisted future is bright!
