Agentic Inference Systems

This project implements agentic systems for complex inference tasks, focusing on deep research and self-refining generation strategies.

Project Structure

The repository is organized around two agentic workflows and a set of reranking and evaluation tools:

1. Deep Research Agent (`deep_research_agent/`)

A comprehensive research agent capable of executing multi-step research tasks.

Core capabilities:
- Automated information gathering and synthesis
- Multi-step reasoning and planning
- Integration with external tools (Search, Browser)

Key components:
- react_agent.py: Implementation of the ReAct (Reasoning + Acting) paradigm.
- mcp_agents/: Model Context Protocol (MCP) agents for extensible tool use.
- graph/: Graph-based reasoning utilities.
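
As a rough illustration of the ReAct pattern named above, the sketch below alternates model output with tool calls until a final answer appears. It is not the code in react_agent.py: the `react_loop` function, the `Action: tool[input]` text format, the `fake_search` tool, and the scripted model are all hypothetical stand-ins chosen so the example runs without any API key.

```python
import re
from typing import Callable, Dict

# Hypothetical tool; a real agent would call a search or browser backend here.
def fake_search(query: str) -> str:
    return f"stub result for '{query}'"

TOOLS: Dict[str, Callable[[str], str]] = {"search": fake_search}

# Assumed text protocol: the model emits "Action: name[input]" or "Final Answer: ...".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def react_loop(llm: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate reasoning steps with tool calls until the model answers."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"

        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(step)
        if action:
            name, arg = action.group(1), action.group(2)
            tool = TOOLS.get(name)
            observation = tool(arg) if tool else f"unknown tool '{name}'"
            # Feed the tool result back so the next step can reason over it.
            transcript += f"Observation: {observation}\n"
    return "no answer within step budget"


def scripted_llm(prompt: str) -> str:
    """Stand-in for a real model call: acts once, then answers."""
    if "Observation:" not in prompt:
        return "Thought: I should look this up.\nAction: search[capital of France]"
    return "Thought: The observation is enough.\nFinal Answer: Paris"


answer = react_loop(scripted_llm, "What is the capital of France?")
print(answer)
```

The step budget (`max_steps`) is there so a model that never emits a final answer cannot loop forever.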

2. Self-Refining Agent (`self_refine/`)

An agentic system that iteratively improves its own outputs through self-correction.

Core capabilities:
- Self-evaluation of generated content
- Iterative refinement loops
- Performance analysis on benchmarks (MMLU, Graph tasks)

Key components:
- self_refine.py: Main logic for the self-refining loop.
- refine_modal.py: Modal integration for scalable execution.
- analyze_accuracy.py: Tools for evaluating refinement performance.
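
The generate, evaluate, refine cycle described above can be sketched as follows. This is a minimal illustration rather than the logic in self_refine.py: the `self_refine` signature, the three callables, and the toy arithmetic task are assumptions made so the example is self-contained.

```python
from typing import Callable, List, Tuple


def self_refine(
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    refine: Callable[[str, str, str], str],
    task: str,
    max_iters: int = 3,
) -> Tuple[str, List[str]]:
    """Draft an answer, then revise it until the critic accepts or the budget runs out."""
    draft = generate(task)
    history = [draft]
    for _ in range(max_iters):
        accepted, feedback = critique(task, draft)
        if accepted:
            break
        draft = refine(task, draft, feedback)
        history.append(draft)
    return draft, history


# Toy stand-ins for model calls (hypothetical; a real run would prompt an LLM).
def toy_generate(task: str) -> str:
    return "12"  # deliberately wrong first draft for "2 + 2 * 3"


def toy_critique(task: str, draft: str) -> Tuple[bool, str]:
    if draft == "8":
        return True, ""
    return False, "multiplication binds tighter than addition"


def toy_refine(task: str, draft: str, feedback: str) -> str:
    return "8"


final, history = self_refine(toy_generate, toy_critique, toy_refine, "2 + 2 * 3")
print(final, history)
```

Keeping the draft history makes it possible to measure, per iteration, whether refinement helped or hurt accuracy.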

3. Reranking & Evaluation (`rerank_outputs.py`)

Tools for evaluating and selecting the best generations from multiple candidates.

Implements various scoring mechanisms:
- Scalar Reward Models using Skywork/Reward-Llama
- Pairwise Reward Models using LLM-Blender (PairRM)
- MBR (Minimum Bayes Risk) decoding with BLEU and BERTScore
- Log-probability analysis using Qwen models
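
To make the MBR item concrete, the sketch below picks the candidate with the highest average similarity to the other candidates. It is an assumption-laden illustration, not the code in rerank_outputs.py: `mbr_select` is a hypothetical name, and a simple unigram F1 stands in for BLEU or BERTScore so that the example needs no external packages.

```python
from collections import Counter
from typing import Callable, List


def unigram_f1(hyp: str, ref: str) -> float:
    """Token-overlap F1; a lightweight stand-in for BLEU or BERTScore."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def mbr_select(
    candidates: List[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> int:
    """Return the index of the candidate with the highest mean utility
    when every other candidate is treated as a pseudo-reference."""
    best_idx, best_score = 0, float("-inf")
    for i, hyp in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        score = sum(utility(hyp, ref) for ref in others) / max(len(others), 1)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


candidates = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog ran in the park",
]
best = mbr_select(candidates)
print(candidates[best])
```

Swapping in a different `utility` function changes the metric without touching the selection logic, which is the usual reason to keep the two separate.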

Setup & Installation

Install Dependencies:
```
pip install -r requirements.txt
```

Environment Variables: Create a .env file with the following keys:

OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
# Add other provider keys as needed
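
Before launching a run, it can help to confirm that the keys are actually set. The following standard-library sketch parses `.env`-style text and reports empty or missing keys; `parse_env` and `missing_keys` are hypothetical helpers, and the repository itself may load the file differently (for example through a dotenv package).

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


# Sample content for illustration; in practice read it with open(".env").read().
sample = """OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=
# Add other provider keys as needed
"""
env = parse_env(sample)
missing = missing_keys(env, ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"])
print("missing:", missing)
```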

Usage

Running the Deep Research Agent

Configure the agent in react_agent.yaml inside the deep_research_agent directory, then run the following from the repository root:

python deep_research_agent/react_agent.py

Running Self-Refinement

To run the self-refinement experiments:

python self_refine/self_refine.py --task mmlu --model qwen3-4b

Reranking Outputs

To evaluate generated outputs using the reranking system:

python rerank_outputs.py

This processes all_results_processed.json and computes scores for every candidate.

Analysis

Use calculate_stats_reranking.py to generate statistical analysis and plots comparing different reranking strategies against gold-standard evaluations.
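
One simple way to compare strategies against gold labels is selection accuracy: the fraction of questions where a strategy's chosen candidate is marked correct. The sketch below illustrates that idea only; the `selection_accuracy` helper, the data layout, and the toy numbers are made up for illustration and do not reflect the script's real inputs or any measured results.

```python
from typing import Dict, List


def selection_accuracy(picks: List[int], gold_correct: List[List[bool]]) -> float:
    """Fraction of questions where the picked candidate is gold-correct."""
    hits = sum(gold_correct[q][idx] for q, idx in enumerate(picks))
    return hits / len(picks)


# Toy data: 3 questions with 3 candidates each; True marks a gold-correct candidate.
gold_correct = [
    [True, False, False],
    [False, False, True],
    [False, True, False],
]

# Candidate index chosen by each (hypothetically named) strategy per question.
picks_by_strategy: Dict[str, List[int]] = {
    "scalar_rm": [0, 2, 0],
    "mbr": [1, 1, 1],
}

report = {
    name: selection_accuracy(picks, gold_correct)
    for name, picks in picks_by_strategy.items()
}
for name, acc in report.items():
    print(f"{name}: {acc:.3f}")
```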
