Most command-line tools weren’t designed with AI agents in mind. They were built for humans who can squint at irregular output, infer meaning from context, and forgive the occasional formatting inconsistency. When you hand these tools to an agent, that forgiveness evaporates. What looks like helpful verbose logging to a developer becomes an unparseable wall of noise to an LLM trying to extract a single boolean success indicator.
The gap between human-friendly and agent-friendly output is wider than it appears. A CLI that prints colorful status updates, progress bars, and helpful warnings is doing exactly what it should for interactive use. But those ANSI escape codes, those dynamically updating lines, those context-dependent messages—they turn into parsing nightmares the moment you try to wrap them in a script that needs to make decisions based on the results. The Twelve-Factor App methodology has something to say about this: treat logs as event streams, not formatted output. That wisdom applies doubly when your consumer is an agent.
The contract of clarity
Building a good wrapper starts with understanding what the underlying tool actually promises versus what it happens to do in practice. Read the exit codes. Not the documentation about exit codes—actually run the tool with bad input, missing files, network failures, and see what comes back. Does it return zero on partial success? Does it write errors to stdout instead of stderr? Does it change its output format based on whether stdout is a TTY? These aren’t edge cases; they’re the reality you’re papering over.
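This kind of empirical probing can be scripted. The sketch below runs a command with bad input and reports what actually comes back: the exit code, which stream the output landed on, and whether ANSI escapes leaked into stdout. The field names are just one possible convention, and the `ls` invocation is only a stand-in for whatever tool you're wrapping.

```python
import subprocess

def probe(cmd: list[str]) -> dict:
    """Run a command and report its observed behavior, not its documented one."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "exit_code": result.returncode,
        "stdout_bytes": len(result.stdout),
        "stderr_bytes": len(result.stderr),
        # ANSI escapes in captured stdout mean the tool doesn't check for a TTY.
        "ansi_in_stdout": "\x1b[" in result.stdout,
    }

# Probe a deliberately failing invocation and see what really happens.
print(probe(["ls", "/no/such/path"]))
```

Run this against every failure mode you can provoke, and keep the results; they are the real contract your wrapper has to honor.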
Your wrapper’s job is to normalize all of that into something predictable. That usually means capturing both stdout and stderr, checking the exit code, and then parsing what you got into a structure the agent can rely on. JSON is almost always the right answer here. Not because it’s trendy, but because it’s unambiguous. An agent reading JSON doesn’t have to guess whether a line starting with “Error:” is a fatal failure or a warning. It just checks the status field and knows.
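A minimal normalizing wrapper along these lines might look like the following. The envelope fields (`status`, `exit_code`, `stdout`, `stderr`) are an illustrative convention, not a standard; the point is that the agent always gets the same shape back, success or failure.

```python
import json
import subprocess

def run_wrapped(cmd: list[str], timeout: float = 60.0) -> str:
    """Run a command and normalize the result into a predictable JSON envelope."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        payload = {
            "status": "ok" if result.returncode == 0 else "error",
            "exit_code": result.returncode,
            "stdout": result.stdout,
            "stderr": result.stderr,
        }
    except subprocess.TimeoutExpired:
        # A hung tool becomes a structured error, not a stuck agent.
        payload = {"status": "error", "exit_code": None,
                   "stdout": "", "stderr": f"timed out after {timeout}s"}
    return json.dumps(payload)

print(run_wrapped(["echo", "hello"]))
```

Note the timeout: an agent that blocks forever on a hung subprocess is worse off than one that gets a clean error it can react to.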
The parsing itself is where most wrappers break. Regex is tempting but fragile. The moment the CLI author changes their punctuation or rewords a message, your pattern stops matching. Better to anchor on structural elements when possible: look for specific markers the tool guarantees, parse line-by-line with clear delimiters, or better yet, see if the tool has a machine-readable output mode you can enable. Many modern CLIs offer --json or --format=json flags precisely because their authors learned this lesson the hard way.
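One way to sketch that priority order: try the machine-readable mode first, and only fall back to structural line parsing when it isn't available. Here `tool` and its `--json` flag are hypothetical stand-ins, and the fallback assumes the tool guarantees `key: value` lines; anchor on whatever delimiter your tool actually promises.

```python
import json
import subprocess

def parse_key_values(text: str) -> dict:
    """Fallback parser: anchor on the `key: value` delimiter the tool
    guarantees, rather than regexing on message wording."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without the structural delimiter
            report[key.strip()] = value.strip()
    return report

def get_report(tool: str) -> dict:
    """Prefer the tool's machine-readable mode; fall back to line parsing."""
    result = subprocess.run([tool, "--json"], capture_output=True, text=True)
    if result.returncode == 0:
        try:
            return json.loads(result.stdout)
        except json.JSONDecodeError:
            pass  # flag accepted but output wasn't JSON; fall through
    result = subprocess.run([tool], capture_output=True, text=True)
    return parse_key_values(result.stdout)
```

The fallback is deliberately dumb: it keys on a structural character, not on English phrasing, so a reworded message can't silently break it.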
Error handling is the other half of the contract. When the CLI fails, your wrapper needs to surface enough information for the agent to decide what to do next. Not just “command failed”—that’s useless. Include the exit code, the last few lines of stderr, and ideally a categorization of the failure type. Was it a network timeout? A permission issue? Invalid input? Each category suggests a different recovery strategy, and agents can’t infer that from raw terminal output.
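A failure report built on that idea might look like this. The keyword lists are illustrative assumptions to be tuned per tool, and the category names (`network`, `permissions`, `bad_input`) are one possible vocabulary, not a standard taxonomy.

```python
def categorize_failure(exit_code: int, stderr: str) -> str:
    """Map a failure to a coarse category an agent can act on.
    exit_code is available for tool-specific conventions; the keyword
    matching here is a starting point, not an exhaustive classifier."""
    text = stderr.lower()
    if any(k in text for k in ("timed out", "timeout", "connection refused")):
        return "network"
    if any(k in text for k in ("permission denied", "operation not permitted")):
        return "permissions"
    if any(k in text for k in ("invalid", "unrecognized", "usage:")):
        return "bad_input"
    return "unknown"

def failure_report(exit_code: int, stderr: str, tail_lines: int = 5) -> dict:
    """Surface enough context for the agent to pick a recovery strategy."""
    return {
        "status": "error",
        "exit_code": exit_code,
        "category": categorize_failure(exit_code, stderr),
        # The tail of stderr usually holds the actual cause.
        "stderr_tail": stderr.splitlines()[-tail_lines:],
    }
```

An agent seeing `"category": "network"` can retry with backoff; seeing `"bad_input"`, it knows retrying the same command is pointless.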
The best wrappers are boring. They do one thing, they do it the same way every time, and they fail loudly when something goes wrong. No clever heuristics, no silent fallbacks, no “this usually works” logic. Predictability is the only feature that matters. When an agent calls your wrapper at 3am during a heartbeat routine with no human around to interpret creative error messages, boring becomes beautiful.