@sdtblckgov sdtblckgov commented Apr 1, 2025

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior? (You can also link to an open issue here)

inspect currently has no aideml agent.

What is the new behavior?

inspect gets an aideml agent!

tl;dr: implements a version of https://github.com/WecoAI/aideml using inspect's new agents protocol.

Running the agent currently has a few requirements:

  • install igraph
  • your Dockerfile must install the pandas, genson, and humanize pip packages
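A quick way to confirm the requirements above is to check that each package is importable in the environment where it is needed. This is only a minimal sketch: it assumes the import names match the package names listed above, and `missing_modules` is a hypothetical helper, not part of this PR or of inspect.

```python
from importlib.util import find_spec

# Assumption: the import names are the same as the package names listed above.
REQUIRED_MODULES = ("igraph", "pandas", "genson", "humanize")


def missing_modules(names=REQUIRED_MODULES):
    """Return the entries of `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it wherever the corresponding requirement applies, for example inside the container built from your Dockerfile for the pip packages.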

This implementation is still fairly rough, so I'm happy to iterate on it to clean it up and document it better. I would like to make it more of a general framework, remove the need to install extra dependencies, and make the default prompts less ML-focused. Providing a default aide Docker image would be a nice bonus, similar to how the web browser tool currently works, but I do not know how to set that up.

Below is some example code documenting how to use the agent currently:

from pathlib import Path

from rich import print

from inspect_ai import Task, eval, task
from inspect_ai.agent import aide_agent
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact


@task
def hello_world():
    return Task(
        dataset=[
            Sample(
                input="Just reply with Hello World",
                target="Hello World",
            )
        ],
        solver=[
            # Machine-specific path from the author's setup; point data_dir at your own directory.
            aide_agent(data_dir=Path("/home/ubuntu/workspace/aideml-inspect/aide_inspect/assets/test")),
        ],
        scorer=exact(),
        sandbox="docker",
    )


if __name__ == "__main__":
    # eval() returns a list of EvalLog objects, one per task that was run.
    logs = eval(hello_world(), model="anthropic/claude-3-7-sonnet-20250219")
    print(logs[0].status)
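The example above hardcodes an absolute path for `data_dir`. One way to avoid that is to read the directory from the environment, sketched below. The `AIDE_DATA_DIR` variable and the `resolve_data_dir` helper are hypothetical names chosen for illustration; neither is part of this PR or of inspect.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical environment variable name; not defined by inspect or by this PR.
DATA_DIR_ENV = "AIDE_DATA_DIR"


def resolve_data_dir(default: Optional[Path] = None) -> Path:
    """Return the data directory from the environment, falling back to `default`."""
    raw = os.environ.get(DATA_DIR_ENV)
    if raw:
        path = Path(raw).expanduser()
    elif default is not None:
        path = default
    else:
        raise ValueError(f"Set {DATA_DIR_ENV} or pass a default data directory")
    # Fail early with a clear message instead of passing a bad path along.
    if not path.is_dir():
        raise FileNotFoundError(f"data_dir does not exist: {path}")
    return path.resolve()
```

The result could then be passed as `aide_agent(data_dir=resolve_data_dir())`, so the task definition no longer depends on one machine's directory layout.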

@jjallaire let me know what you'd need here for this to be mergeable!

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

no

Other information:

n/a

@sdtblckgov sdtblckgov requested a review from jjallaire April 1, 2025 15:58