Evaluate any Gooey.AI Workflow output against a dataset of inputs and "golden" (expert-created) desired answers. Score every row of any CSV, Google Sheet or Excel file with any LLM-as-Judge instruction prompt, then average every score in any column to generate automated evaluations.
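The row-by-row scoring described above can be sketched as follows. This is a minimal illustration, not Gooey.AI's implementation: the column names, the sample rows, and the `judge` function are all hypothetical, and `judge` is a simple exact-match stand-in for what would be an LLM-as-Judge call.

```python
import csv
import io
from statistics import mean

# Hypothetical dataset: each row pairs a workflow output with a golden answer.
CSV_TEXT = """question,golden_answer,workflow_output
What is 2+2?,4,4
Capital of France?,Paris,Lyon
"""

def judge(golden: str, output: str) -> float:
    """Stand-in for an LLM-as-Judge call (assumption): exact match scores 1.0, else 0.0."""
    return 1.0 if golden.strip().lower() == output.strip().lower() else 0.0

def evaluate(csv_text: str) -> tuple[list[dict], float]:
    """Score every row, then average the new 'score' column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["score"] = judge(row["golden_answer"], row["workflow_output"])
    return rows, mean(row["score"] for row in rows)

rows, average = evaluate(CSV_TEXT)
print(average)
```

In practice the judge would send the instruction prompt plus the row to the selected model and parse a numeric score from its reply; the averaging step stays the same.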
GPT-5.1 • OpenAI
Claude 4.5 Sonnet • Anthropic
Claude 4.1 Opus • Anthropic
Gemma 2 9B • Google
Gemini 3 Pro • Google
Gemini 2.5 Flash • Google
Gemini 2.5 Flash Lite • Google
Gemini 2 Flash Lite • Google
Gemini 2 Flash • Google
Pixtral Large 24/11 • Mistral
Mistral Large 24/11 • Mistral
Mistral Small 25/01 • Mistral
AgriLLM Qwen-3 30B • ai71
DeepSeek V3.2 • DeepSeek
GPT-5.2 • OpenAI
GPT-4o-mini • OpenAI
GPT-4o • OpenAI
o3-mini • OpenAI
o3 • OpenAI
GPT-5 • OpenAI
GPT-4.1 • OpenAI
GPT-5 Mini • OpenAI
GPT-4.1 Mini • OpenAI
SEA-LION v4 • AISingapore
GPT-5 Chat • OpenAI
GPT-5 Nano • OpenAI
o4-mini • OpenAI
Apertus 70B Instruct • SwissAI • in🇨🇭
Gemini 3 Flash • Google
Gemini 2.5 Pro • Google
GPT-4.1 Nano • OpenAI
o4-mini with Thinking • OpenAI
GPT-5.2 with Thinking • OpenAI
Kimi K2 Instruct • Moonshot AI
Llama 4 Maverick Instruct • Meta AI
Gemini 3.1 Pro Preview • Google
Claude 4.6 Opus • Anthropic
Kimi K2.5 • Moonshot AI
GLM-5 • Z.ai
mean
median
min
max
sum
cumsum
prod
cumprod
std
var
first
last
count
cumcount
nunique
rank
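The aggregation names listed above match common dataframe conventions. As a rough sketch of what several of them compute over one column of scores, here is a standard-library version. It assumes `std` and `var` are the sample statistics (divide by n - 1), which the page does not confirm; the `AGGREGATIONS` table and the `aggregate` helper are hypothetical names, and only a subset of the listed options is covered.

```python
import math
from itertools import accumulate
from statistics import mean, median, stdev, variance

# Hypothetical mapping from aggregation name to a function over a list of scores.
AGGREGATIONS = {
    "mean": mean,
    "median": median,
    "min": min,
    "max": max,
    "sum": sum,
    "prod": math.prod,
    "std": stdev,        # sample standard deviation (assumption)
    "var": variance,     # sample variance (assumption)
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
    "count": len,
    "nunique": lambda xs: len(set(xs)),
    "cumsum": lambda xs: list(accumulate(xs)),  # running total, one value per row
}

def aggregate(scores, name):
    """Apply the named aggregation to a column of scores."""
    return AGGREGATIONS[name](scores)

scores = [1, 3, 3, 5]
summary = {name: aggregate(scores, name)
           for name in ("mean", "median", "max", "nunique", "cumsum")}
print(summary)
```

Note that cumulative options such as `cumsum` return one value per row rather than a single summary number, so they behave differently from `mean` or `max` when used as an evaluation metric.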
Run cost = 9 credits
Which AI model actually works best for your needs? Upload your own data and evaluate any Gooey.AI workflow, LLM or AI model against any other. Great for large data sets, AI model evaluation, task automation, …
Create secure, multilingual AI agents for global impact. Includes support for 1500+ languages, photos, RAG, agentic tools and instant deployment to WhatsApp, Voice/SMS, Slack & Web. Works on top private & …
Transcribe mp3s, WhatsApp voice, YouTube videos in 1000+ languages with Meta's MMS/Seamless M4T, OpenAI's GPT-4o Audio LLM, Whisper v2/v3, Azure, Google, GhanaNLP, AI4Bharat & Bhashini ASR models. Optionally …
We've built the best Retrieval Augmented Generation (RAG) as-a-Service anywhere - now with page-level citations! Absorb tables, PDFs, docs, links, videos or audio clips and use our synthetic data maker to …