This repository is the public artifact bundle for the second paper based on the ai-eval-forge package.
The bundle contains:

- `ai-eval-forge-mixed-check-regression-testing-preprint.pdf` for the submission-ready manuscript
- `paper.md` for the source draft
- `paper.bib` for the current references
- `assets/workflow-figure.svg` for the workflow figure
- `submission-metadata.json` for reusable submission metadata
- `ai-eval-forge-preprint-package.zip` for one-click uploads to preprint platforms
Large-model and agent teams often need faster regression checks than broad benchmark suites can provide. This paper presents AI Eval Forge, a zero-dependency evaluation harness for mixed-check regression testing across LLM and agent workflows. The tool supports exact-match, substring, regex, token-F1, JSON validity, JSON field equality, citation coverage, and bounded custom-expression checks in a compact case format that works with JSON or JSONL. The contribution is not a new benchmark. It is a small, inspectable evaluation layer that helps teams compare runs, catch regressions, and summarize pass rate, score, cost, and latency without standing up a heavy evaluation stack. The paper describes the harness design, check model, reporting format, and practical role of mixed-check cases in real workflow testing.
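For illustration, the TypeScript sketch below shows the mixed-check idea: several check types applied to one case and summarized as a pass rate. It is not the package's actual API; the type and function names (`Check`, `EvalCase`, `runCase`) are hypothetical, and only four of the eight check types are shown.

```ts
// Minimal sketch of mixed-check evaluation. NOT the
// @mukundakatta/ai-eval-forge API; names here are illustrative.

type Check =
  | { kind: "exact"; expected: string }
  | { kind: "substring"; expected: string }
  | { kind: "regex"; pattern: string }
  | { kind: "json_valid" };

interface EvalCase {
  id: string;
  output: string; // model/agent output under test
  checks: Check[]; // mixed checks applied to the same output
}

function runCheck(output: string, check: Check): boolean {
  switch (check.kind) {
    case "exact":
      return output === check.expected;
    case "substring":
      return output.includes(check.expected);
    case "regex":
      return new RegExp(check.pattern).test(output);
    case "json_valid":
      try {
        JSON.parse(output);
        return true;
      } catch {
        return false;
      }
  }
}

function runCase(c: EvalCase): { id: string; passRate: number } {
  const passed = c.checks.filter((ch) => runCheck(c.output, ch)).length;
  return { id: c.id, passRate: passed / c.checks.length };
}

// Example: one case mixing a substring check and a JSON-validity check.
console.log(
  runCase({
    id: "invoice-extraction-1",
    output: '{"total": 42.5}',
    checks: [{ kind: "substring", expected: "total" }, { kind: "json_valid" }],
  })
);
```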
- npm package: @mukundakatta/ai-eval-forge
- GitHub repository: MukundaKatta/ai-eval-forge-js
This bundle is prepared for:
- Zenodo
- OSF Preprints
- SSRN
Once the preprint is published, cite the versioned record.