E3. AI Agents

The document discusses the capabilities and limitations of large language models (LLMs) like GPT-4 in performing real-world tasks, highlighting the significant gap between benchmark performance and practical applications. It emphasizes the need for AI agents to possess critical capabilities such as tool use, abstract reasoning, and up-to-date knowledge to effectively automate tasks with minimal human intervention. The introduction of WebArena is proposed as a realistic environment for evaluating AI agents, addressing the challenges in current evaluation methods.


Solving Real-World Tasks with AI Agents
Shuyan Zhou
Language Technologies Institute
Carnegie Mellon University
[email protected]
shuyanzhou.com
LLMs are useful, people are optimistic about the future

[Figure: $1.3T projected revenue from generative AI in 2032 (Bloomberg 2023); density plot of quality of work with vs. without GPT-4 access (Dell'Acqua et al., 2023); "Sparks of Artificial General Intelligence: Early Experiments with GPT-4" (Bubeck et al., Microsoft Research, 2023), whose abstract notes that LLMs "exhibit remarkable capabilities across a variety of domains and tasks"]
LLMs can assist humans in many self-contained tasks

"Write a data loader to read this csv file" → def data_loader ...

LLMs:
• Speed up a small part of a task
• Do not automate the tasks in an end-to-end fashion
The dream of AI is far more wild

My research goal: automate various tasks with minimal human intervention

[Diagram: AI agents act on an environment and receive feedback]

Example applications: perform scientific research (literature review, experiments, reproduce results), develop software, personalized health and wellness, finance and growth management
Questions to answer

• How good are strong LLMs (e.g., GPT-4)? How can we perform reliable evaluation?
• What are the fundamental gaps between LLMs and AI agents?
• How could we mitigate the gaps?
Talk Overview

How good are LLMs? → Evaluating AI agents
• Zhou* et al., WebArena, ICLR 2024
• Wang, Cuenca, Zhou et al., MCoNaLa, F-EACL 2023
• Wang, Zhou et al., ODEX, F-EMNLP 2023

Natural language has inherent limitations → Speaking AI's "language"
• Zhou et al., PaP, SUKI 2022
• Zhou* et al., PaL, ICML 2023
• Madaan, Zhou et al., CoCoGen, EMNLP 2022
• Zhang, Xu, Yang, Zhou et al., Crepe, F-EACL 2023

LLMs know up to a cutoff date → Learning new knowledge by reading
• Zhou et al., DocPrompting, ICLR 2023
• Zhou* et al., Hierarchical Procedural KB, ACL 2022
Significant gap in benchmarks vs real-world applications

[Figure: task-solving rate on MiniWoB++ (Liu et al., 2018), with human performance at 96.3% and a bar for GPT-4 alongside]

Example task: "Play my favorite music"
Another example task: "Assign this issue to myself"
Requirements for the agent evaluation

• Realistic, interactive environment
• Useful & complex tasks
• Reliable evaluation
• Easy extendability

Existing evaluations make trade-offs between them

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
WebArena fulfills all requirements without compromise

• Realistic, interactive environment with rich contents
• Useful & complex tasks
• Reliable evaluation
• Easy extendability

Example: "Invite Alexis to my agent repo" → agent: "Checking the members.." → "Alexis invited"

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
Example task in WebArena

Shop owner: "Find the customer who has spent the most money in my store over the past 56 days. Send the customer some flowers." (customer appreciation task)

• Identify the customer by examining the order history in the store portal
• Buy some flowers online and send them to the customer

Outcome-based evaluation:
• A new order with flowers
• Shipped to Alex Martin

812 long-horizon, realistic computer tasks

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
LLMs are the critical yet early step toward AI autonomy: they lack several critical capabilities to be AI agents

[Figure: WebArena task success rate (%). Human 78.2 vs. GPT-4 14.9, a huge gap; GPT-3.5, Mixtral, Gemini Pro, and Llama2 70B fall between 1.4 and 8.9. Open-source models struggle.]

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
LLMs lack critical capabilities to be AI agents: tool use

AI agents:
• Employ tools to enhance accuracy and expand capabilities

LLMs:
• Tool use is scarce in natural language corpora
• Standard LLM development does not consider tool use

Example failures: "Alex's total spend is 78.56 x 7 + 46.7 = 543.6"; "56 days ago is 5/20/2023"

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
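The arithmetic failure quoted on this slide is exactly the kind of step an agent can offload to a tool. A minimal sketch, using the slide's own numbers and delegating the computation to a Python interpreter (the correct result differs from the LLM's 543.6):

```python
# The slide's example arithmetic, delegated to an interpreter instead of the LLM.
# The quoted LLM answer was 543.6; the interpreter computes the true value.
total = 78.56 * 7 + 46.7
print(round(total, 2))  # 596.62
```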
LLMs lack critical capabilities to be AI agents: abstract reasoning

AI agents:
• Learn the common principles
• Maintain steady and reliable performance

LLMs:
• Inconsistent performance across conceptually similar tasks

Example: "Fork `metaseq`", "Fork `transformers`", "Fork all repos owned by Meta"

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
LLMs lack critical capabilities to be AI agents

"Find the customer who spent [...] Send the customer [...]" → "How can I find all orders?"

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
LLMs lack critical capabilities to be AI agents: up-to-date knowledge

AI agents:
• Up-to-date knowledge to deal with the evolving world ("How can I find all orders?")

LLMs:
• Knowledge of LLMs is limited by the training cutoff
• GPT-4 knowledge cutoff: Sep 2021; WebArena application version: Jan 2023

Zhou* et al., WebArena: A realistic web environment for building autonomous agents, ICLR 2024
Roadmap: tool use & abstract reasoning → speaking AI's "language"; up-to-date knowledge → learning by reading docs
Generating natural language for various tasks

"Alex Martin made three orders: $47.51 on 9/18/2023, $765.8 on 1/1/2024 and $35.4 on 1/9/2024. How much did he spend in my store in the last 56 days?"

"Today is 1/20/2024. I first subtract 20 days [...] The date 56 days ago is 12/20/2023 [...] Order 1 was placed on 9/18/2023, which is not within the last 56 days [...] 765.8 + 35.4 = $785.4" [Wei et al., Chain-of-thought]

Zhou et al., Procedures as programs: hierarchical control of situated agents through natural language, SUKI 2022
Natural language exhibits limitations in performing tasks

"Today is 1/20/2024, Alex made three orders: $47.51 on 9/18/2023, $765.8 on 1/1/2024, $35.4 on 1/9/2024. How much has he spent in the last 56 days?"

"Today is 1/20/2024. I first subtract 20 days [...] The date 56 days ago is 12/20/2024 [...] Order 1 was placed on 9/18/2023, which is not within the last 56 days [...] 765.8 + 35.4 = $785.4" [Wei et al., Chain-of-thought]

• Confines reasoning and solving within LLMs

Zhou et al., Procedures as programs: hierarchical control of situated agents through natural language, SUKI 2022
Natural language exhibits limitations in performing tasks

Change the question slightly: today is 2/13/2024 instead of 1/20/2024, and the window is 192 days instead of 56 days.

Original: "Today is 1/20/2024. I first subtract 20 days [...] The date 56 days ago is 12/20/2024 [...] Order 1 was placed on 9/18/2023, which is not within the last 56 days [...] 765.8 + 35.4 = $785.4"

Changed: "Today is 2/13/2024. I first subtract 13 days [...] The date 192 days ago is 8/5/2023 [...] Order 1 was placed on 9/18/2023, which is within the last 192 days [...] 47.51 + 765.8 + 35.4 …" [Wei et al., Chain-of-thought]

• Confines reasoning and solving within LLMs
• Expresses solutions at the example level

Zhou et al., Procedures as programs: hierarchical control of situated agents through natural language, SUKI 2022
Maybe AI agents should speak another "language", but what is that?
Solving various tasks by reasoning with programs (PaL)

"Today is 1/20/2024, Alex made three orders: $47.51 on 9/18/2023, $765.8 on 1/1/2024, $35.4 on 1/9/2024. How much has he spent in the last 56 days?"

Chain-of-thought [Wei et al.]:
[...]
The first order is $47.51
It was made on 9/18/2023
[...]
Now check if the first order was placed within the period
9/18/2023 is before the period, so it is not included
[...]
So the answer is $801.2

PaL:
[...]
order1_amount = 47.51
order_1_date = datetime(2023, 9, 18)
[...]
# check if order 1 is within the period
if order_1_date > start_date:
    alex_total_spend += order1_amount
[...]
>>> The total is $801.2

Zhou* et al., PaL: Program-aided language models, ICML 2023
Key design choices of PaL

"Today is 1/20/2024, Alex made three orders: $47.51 on 9/18/2023, $765.8 on 1/1/2024, $35.4 on 1/9/2024. How much has he spent in the last 56 days?"

Interleave between natural language and programming language (Python):

order1_amount = 47.51
order2_amount = 765.8
[...]
# check if order 1 is within 56 days
[...]

rather than bare code:

a = 47.51
b = 765.8
return float(a + b)

• Abundant [Chowdhery et al., PaLM; Mishra et al., Lila; Austin et al., Learning ...]
• Easily comprehensible

Zhou* et al., PaL: Program-aided language models, ICML 2023


Few-shot in-context learning with coding-proficient LLMs

In-context examples (Input 1 → Program 1, Input 2 → Program 2, ...):
• Manually create
• Select from a training set

Input: "Alex Martin made three orders: $47.51 on 9/18/2023, $765.8 on 1/1/2024 and $35.4 on 1/9/2024. How much did he spend in my store in the last 56 days?"

A coding-proficient LLM completes the program:
[...]
order1_amount = 47.51
order_1_date = ...
# check if [...]

Zhou* et al., PaL: Program-aided language models, ICML 2023
PaL offloads the solving to tools seamlessly

"Today is 1/20/2024 [...] How much has he spent in the last 56 days?"

from datetime import datetime, timedelta

today = datetime(2024, 1, 20)
# calculate 56 days ago
start_date = today - timedelta(days=56)
[...]
if order_1_date > start_date:
    [...]

alongside a condensed fragment without natural-language comments:

a = ..
b = ..
c = a - timedelta(days=56)

Task-solving accuracy (%) on date understanding (BIG-bench): CoT [Wei et al., 2022] 64.8; PaL 76.2; PaL w/ only PL 63.4
[Chowdhery et al., PaLM; Mishra et al., Lila; Austin et al., Learning ...]

Zhou* et al., PaL: Program-aided language models, ICML 2023
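The elided program above can be completed into a runnable script using the running example's dates and amounts (the list of orders and the variable names are illustrative):

```python
from datetime import datetime, timedelta

today = datetime(2024, 1, 20)
# calculate 56 days ago
start_date = today - timedelta(days=56)

# (date placed, amount) for each of Alex's orders from the running example
orders = [
    (datetime(2023, 9, 18), 47.51),
    (datetime(2024, 1, 1), 765.8),
    (datetime(2024, 1, 9), 35.4),
]

# keep only orders placed within the window, then sum
alex_total_spend = sum(amount for date, amount in orders if date > start_date)
print(round(alex_total_spend, 2))  # 801.2
```

The date arithmetic is handled by the `datetime` library rather than by the LLM, which is the offloading the slide describes.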
PaL > large language models + tools

"Alex made two orders within the last 56 days: one for $765.8 and another for $35.4. How much did he spend in total?"

CoT: "[...] the total of two orders is 765.8 + 35.8 [...]"

CoT + calculator [Schick et al., Toolformer]: "[...] the total of two orders is 765.8 + 35.8 <calculator(765.8+35.8)=801.6> 801.6 [...]"

PaL: order1_value = 765.8 [...]

Tool-augmented CoT still suffers from:
• Parsing failures
• Error propagation (the 35.8 transcription error above survives the calculator call)
• Limited toolset

Task-solving accuracy (%) on GSM8K: CoT [Wei et al., 2022] 63.1; PaL 72.0; CoT + Calculator 65.4

Zhou* et al., PaL: Program-aided language models, ICML 2023
Natural language performs example-level problem solving

"Today is 1/20/2024, Alex made three orders: $47.51 on 9/18/2023, $765.8 on 1/1/2024, $35.4 on 1/9/2024. How much has he spent in the last 56 days?"

Slight changes result in significant solution differences:

"Today is 1/20/2024. I first subtract 20 days [...] The date 56 days ago is 12/20/2024 [...] Order 1 was placed on 9/18/2023, which is not within the last 56 days [...] 765.8 + 35.4 = [...]"

"Today is 2/13/2024. I first subtract 13 days [...] The date 192 days ago is 8/5/2023 [...] Order 1 was placed on 9/18/2023, which is within the last 192 days [...] 47.51 + 765.8 + 35.4 …"

The solution is indirect.

Zhou* et al., PaL: Program-aided language models, ICML 2023
Programs encourage expressing "task templates"

today = datetime(2024, 1, 20)
start_date = today - timedelta(days=56)
[...]
if order_1_date > start_date:
    total += order_1_amount
[...]

today = datetime(2024, 2, 13)
start_date = today - timedelta(days=192)
[...]
if order_1_date > start_date:
    total += order_1_amount
[...]

Only the parameters change between the two variants; the PaL solution is direct.

Zhou* et al., PaL: Program-aided language models, ICML 2023
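The "task template" the two programs share can be made explicit as a parameterized function; only `today` and the window length differ between the variants (the helper name `total_spend` is mine, not from the slides):

```python
from datetime import datetime, timedelta

def total_spend(today, window_days, orders):
    """Sum the amounts of orders placed within the last `window_days` days."""
    start_date = today - timedelta(days=window_days)
    return sum(amount for date, amount in orders if date > start_date)

orders = [
    (datetime(2023, 9, 18), 47.51),
    (datetime(2024, 1, 1), 765.8),
    (datetime(2024, 1, 9), 35.4),
]

print(round(total_spend(datetime(2024, 1, 20), 56, orders), 2))   # 801.2
print(round(total_spend(datetime(2024, 2, 13), 192, orders), 2))  # 848.71
```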
Programs enhance LLMs in using in-context examples

Example tasks in Colored Objects:
• "What's the color of the right-most object?" (maintain an object attribute list)
• "What's the color of the object left to the goggles?" (spatial reasoning)

Datasets where different examples share common problem-solving strategies: Colored Objects, Penguins, Repeat Copy, Object Counting

Zhou* et al., PaL: Program-aided language models, ICML 2023
Programs enhance LLMs in using in-context examples

Task-solving accuracy (%), CoT vs. PaL:

Dataset           CoT    PaL
Colored Objects   86.3   95.1
Penguins          79.2   93.3
Repeat Copy       68.8   90.6
Object Counting   73.0   96.7

Datasets where different examples share common problem-solving strategies

Zhou* et al., PaL: Program-aided language models, ICML 2023


Bonus: Programs naturally encode structures

"Get Alex's total spend within 56 days"

Plan graph: identify the date 56 days ago → verify order 1's / 2's / 3's date → sum the qualified orders

class Graph:
    goal = "Get the total spend of Alex within 56 days"

    def __init__(self):
        identify_date_56_days_ago = Node()
        verify_order1_date = Node()
        [...]
        identify_date_56_days_ago.children = [
            verify_order1_date,
            verify_order2_date,
            verify_order3_date,
        ]

Generated by a coding-proficient model

Madaan, Zhou et al., Large language models of code are few-shot commonsense learners, EMNLP 2022
Zhang, Xu, Yang, Zhou et al., Causal Reasoning of Entities and Events in Procedural Texts, F-EACL 2023
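One way such a plan graph could be made executable is with a minimal `Node` carrying a name and children, plus a breadth-first walk to recover an execution order. The class shape and traversal are my assumptions; the slides only show fragments:

```python
from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

# the slide's plan as a small dependency graph
identify_date = Node("identify the date 56 days ago")
verifies = [Node(f"verify order {i}'s date") for i in (1, 2, 3)]
sum_orders = Node("sum the qualified orders")

identify_date.children = verifies
for v in verifies:
    v.children = [sum_orders]

# breadth-first traversal yields one valid execution order
order, queue, seen = [], deque([identify_date]), set()
while queue:
    node = queue.popleft()
    if id(node) in seen:
        continue
    seen.add(id(node))
    order.append(node.name)
    queue.extend(node.children)

print(order[0], "->", order[-1])
```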
Hypothesis 1: Corpus

• Pre-training corpora for code models contain procedural knowledge useful for these tasks, e.g., game engines

Code snippet taken from https://github.com/allenai/ScienceWorld/
Hypothesis 2: Training

class BakeACake:
    def __init__(self) -> None:
        self.find_recipe = Node()
        self.gather_ingredients = Node()
        self.mix_ingredients = Node()
        self.preheat_oven_at_375f = Node()
        self.put_cake_batter_into_oven = Node()
        self.take_cake_out_after_30_min = Node()

        self.find_recipe.children = [self.gather_ingredients, self.preheat_oven_at_375f]
        self.gather_ingredients.children = [self.mix_ingredients]
        self.mix_ingredients.children = [self.put_cake_batter_into_oven]
        self.preheat_oven_at_375f.children = [self.put_cake_batter_into_oven]
        self.put_cake_batter_into_oven.children = [self.take_cake_out_after_30_min]

Training on code makes the model better at procedures / long-range inference / connecting the dots

[Kim et al., 2023] Coding-proficient models show stronger performance on entity tracking
PaL brings a range of problems under one roof

Connecting PaL and follow-up work:
• Improve program generation quality: + multi-sample generation [Zhou et al., PaL], + more modularized planning [Jiang et al.], + execution feedback [Wang et al.; Sun et al.]
• Multi-modal tasks: + APIs for other modalities [Lu et al.; Stanic et al.]
• Sophisticated domain models: + finetune with program-aided solutions for specific domains (e.g., math) [Yue et al.; Xu et al.]
Takeaway: speak a general-purpose programming language with a coding-proficient model (addresses tool use and abstract reasoning)

Next: up-to-date knowledge → learning by reading docs
LLMs do not always have enough knowledge

"Find the customer who has spent the most money in my store over the past 56 days. Send the customer some flowers." → "How can I find all orders?"
Knowledge is limited by the training cutoff

"How can I find all orders?"

[Timeline: trained knowledge extends only to the knowledge cutoff; updated, new knowledge appears after it]
Humans adapt to new knowledge via reading

Direct demonstrations are not available for new knowledge
Study scenario: using new tools by reading tool docs

Bash commands (squeue, ls, ...): "List slurm jobs submitted by John"
Python APIs (mkdtemp, numpy, ...): "Make a temporary file to save the logs"

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
DocPrompting: Retrieval-then-generation

Query: "View slurm jobs submitted by John"

Docs for new commands:
• squeue is used to view job … by Slurm
• -u <user_list>, --user=<user_list>: specify the usernames …
• -i <seconds>, -- …
• -j, <job_id_list> …

Retriever selects the relevant docs → Generator produces: squeue -u john

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
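A toy stand-in for the retrieval step is ranking doc strings by token overlap with the query; DocPrompting trains a dense retriever instead, and the doc snippets here are abbreviated paraphrases of the slide's examples:

```python
docs = {
    "squeue": "squeue is used to view job information in the slurm scheduling queue "
              "-u --user specify the usernames for which to view jobs",
    "ls": "ls is used to list the information about files",
    "mkdtemp": "mkdtemp creates a temporary directory and returns its path",
}

def retrieve(query, k=1):
    """Rank docs by bag-of-words overlap with the query (stand-in for a trained retriever)."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda name: len(q & set(docs[name].split())), reverse=True)
    return ranked[:k]

print(retrieve("view slurm jobs submitted by john"))  # ['squeue']
```

The generator then conditions on the query plus the top-k retrieved docs.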
Contrastively training the doc retriever

We train the retriever in a contrastive fashion: the similarity score of a positive pair is maximized while that of in-batch negative pairs is minimized. For a positive pair $(n_i, d_i^+)$, each $(n_i, d_j^-)$ with $d_j^- \notin D_{n_i}^*$ forms a negative pair, and the loss is

$$\mathcal{L}_r = -\log \frac{\exp\big(\mathrm{sim}(\mathbf{h}_n, \mathbf{h}_{d_i^+})\big)}{\exp\big(\mathrm{sim}(\mathbf{h}_n, \mathbf{h}_{d_i^+})\big) + \sum_{d_j^- \in \mathcal{B} \setminus D_n^*} \exp\big(\mathrm{sim}(\mathbf{h}_n, \mathbf{h}_{d_j^-})\big)} \quad (3)$$

where $\mathrm{sim}(\cdot, \cdot)$ is cosine similarity.

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
[Figure: contrastive training example. The query "View slurm jobs submitted by John" is paired with its positive doc ("squeue is used to view job … by Slurm") against an in-batch negative ("ls is used to list the information …"); dropout is applied when encoding, as in SimCSE [Gao et al.]]

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
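Eq. (3)'s per-pair loss can be sketched in a few lines of pure Python: exponentiated cosine similarity of the positive pair, normalized by itself plus the in-batch negatives (no temperature term, matching the slide's formula):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retriever_loss(h_n, h_pos, h_negs):
    """-log( exp(sim(h_n, h_d+)) / (exp(sim(h_n, h_d+)) + sum_j exp(sim(h_n, h_dj-))) )"""
    pos = math.exp(cosine(h_n, h_pos))
    denom = pos + sum(math.exp(cosine(h_n, h)) for h in h_negs)
    return -math.log(pos / denom)

# identical positive, orthogonal negative: loss = -log(e / (e + 1)) ≈ 0.313
print(round(retriever_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]), 3))  # 0.313
```

In training, the embeddings come from the retriever's encoder and the negatives are the other in-batch docs outside $D_n^*$.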
Retrieve the k nearest documents

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
Learning to read the documents

Train the generator to maximize $\log p(c^* \mid n, \hat{d}_1, \hat{d}_2, \dots, \hat{d}_k)$: the reference code $c^*$ conditioned on the NL intent $n$ and the $k$ retrieved docs.

"View slurm jobs submitted by shuyanzh every 5 secs"

The retriever retrieves irrelevant information!
• squeue is used to view job … by slurm
• -u <user_list>, --user=<user_list>: specify the usernames …
• ls is used to list the information …. (irrelevant)

Generator: squeue -u john

The generator learns to ignore irrelevant information.

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
DocPrompting is applicable to various model architectures

• Decoder-only: concatenate [NL, doc 1, doc 2, doc 3] as the prefix, then generate the code
• Encoder-decoder (fusion-in-decoder [FiD, Izacard and Grave]): encode each (NL + doc i) pair separately, concatenate the encodings, and decode the code

Zhou, Alon, Xu, Wang, Jiang, Neubig, DocPrompting, ICLR 2023
DocPrompting allows models to adapt to unseen tools without explicit demonstrations

[Figure: bash command exact match (%) on held-out commands, with the retriever drawing on docs for those commands. Bars for CodeT5 220M (supervised), CodeT5 + DocPrompting, OpenAI Codex 175B, and Codex + in-doc retrieval; values shown: 2.18, 9.15, 8.94, 22.55, with CodeT5 + DocPrompting the strongest despite its small size]

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
DocPrompting allows models to adapt to unseen tools without explicit demonstrations

Execution-based evaluation for Python code generation (CoNaLa), pass@k (%), with docs for held-out Python APIs:

k                      1      10     50     100    200
CodeT5                 5.41   14.31  23.38  25.54  27.08
CodeT5 + DocPrompting  8.26   18.7   27.54  31.87  34.46

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
Docs ease the mapping between NL and code

[Figure: n-gram matching recall (%) for n = 1, 2 between NL ↔ Code, NL ↔ Docs, and (NL + Docs) ↔ Code; bar values 39, 24, 12, 8, 2, 0, with recall of code tokens highest when docs are added to the NL]

Zhou et al., DocPrompting: Generating code by retrieving the docs, ICLR 2023
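The recall metric above can be reproduced in spirit with a few lines: the fraction of the reference code's n-grams that appear in the NL intent alone versus NL plus docs. The token strings here are illustrative, not the paper's data:

```python
def ngram_recall(reference, source, n):
    """Fraction of the reference's n-grams that also appear in the source."""
    grams = lambda toks: {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}
    ref, src = grams(reference), grams(source)
    return len(ref & src) / len(ref) if ref else 0.0

code = "squeue -u john".split()
nl = "view slurm jobs submitted by john".split()
doc = "squeue is used to view jobs -u --user specify the usernames".split()

print(ngram_recall(code, nl, 1))        # 0.33...: only "john" overlaps
print(ngram_recall(code, nl + doc, 1))  # 1.0: the docs supply "squeue" and "-u"
```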
Up-to-date knowledge → learning by reading docs

• What: docs created by humans that explain the tool usage
• How: retrieval and doc-augmented generation

Human-written docs as learning resources:
• Theorem proving [Wu et al., LeanDojo]
• Proprietary code libraries [Zan et al., When]
• API use in products

+ Code document generation [Zhou et al., Generating Code Explanations with Controllability on Purpose]