fix(nanogpt): implement XML tool call parsing for NanoGPT transformer#1211
Conversation
Code Review
This pull request introduces several significant updates across the backend, frontend, and documentation. Key additions include support for the Fireworks AI provider, the implementation of a "Compact" response endpoint for the OpenAI Responses API, and enhanced dashboard statistics with time-period filtering. The prompt system was expanded to support API Key-based activation conditions, and the model registry was updated with numerous new models and reasoning capability metadata. Additionally, the build system now supports FreeBSD, and a new axonhub-cli skill was added for GraphQL-based management. I have no feedback to provide as there were no review comments.
/gemini review
Code Review
This pull request introduces the xmltools package to parse XML-based tool calls from LLM responses, supporting both standard <use_tool> tags and an alternative format where tool names are used as tags. The NanoGPT model implementation is updated to utilize this parser during message conversion, and extensive integration tests are added. Feedback suggests improving the robustness of the XML parser to prevent data loss on malformed input and adding fallbacks for JSON marshaling failures in the alternative format parser.
if err != nil {
	break
}
The current error handling for decoder.Token() can lead to silent data loss. If an XML syntax error occurs, the loop breaks, and any content following the error is discarded. This could result in parts of the model's response being silently ignored, which is a significant issue.
To make the parser more robust against malformed XML from the LLM, you should handle syntax errors by treating the remainder of the content as plain text instead of dropping it. You can achieve this by recording the decoder's offset before attempting to read a token, and if an error occurs, append the rest of the input from that offset to the textBuilder.
You'll need to add offset := decoder.InputOffset() at the beginning of the for loop for this suggestion to work.
if err != nil {
	if err != io.EOF {
		// Malformed XML, treat the rest of the content as text.
		textBuilder.WriteString(content[offset:])
	}
	break
}

if jsonBytes, err := json.Marshal(v); err == nil {
	tool.Arguments[k] = string(jsonBytes)
}
In the default case of the switch statement, if json.Marshal(v) fails, the argument is silently dropped. While this is an edge case, it could lead to tool calls with missing arguments without any warning, making debugging difficult. For a robust parser, it's better to have a fallback mechanism or at least log the failure.
if jsonBytes, err := json.Marshal(v); err == nil {
	tool.Arguments[k] = string(jsonBytes)
} else {
	// Fallback for types that cannot be marshaled to JSON.
	tool.Arguments[k] = fmt.Sprintf("%v", v)
}

- Document error handling behavior in ToOpenAIMessage() docstring
- Fix race condition in tool call ID generation using content hash
- Support any valid JSON (not just objects) in alternative format
- Add fallback for unknown tool names with uppercase-starting tags
- Document content clearing logic behavior

All review issues from PR looplj#1211 have been addressed.
- Fix streaming aggregator index assignment bug (MEDIUM):
  - Check if tool call exists before incrementing nextToolIndex
  - Prevents non-sequential indices when mixing ID/no-ID chunks
- Document XML parser leniency (LOW):
  - Add comment explaining uppercase letter heuristic
  - Trade-off between compatibility and false positives
- Return errors from Parse() instead of swallowing (LOW):
  - Now returns error when standard parsing fails
  - Callers can make informed decisions about parsing results
- Increase hash length to reduce collision risk (Very Low):
  - Changed from 8 to 16 hex characters (32 to 64 bits)
  - Better safety margin for high-volume scenarios
- Add streaming documentation to model.go (LOW):
  - Document deterministic ID generation strategy
  - Explain streaming behavior and idempotency
…matching validation
…, JSON in content
…>Y</content></Write>
…rPattern single quotes
9ed9002 to c79e685
Had to re-work this multiple times to find a solution that largely works to my liking. Basically, NanoGPT proxies to multiple providers in a round-robin fashion. Each provider may use different quants, so model quality (and tool calling) can vary significantly, with many GLM 5 models using XML tool calling. The PR description notes that sometimes there are fully malformed tool calls, which would be a cat-and-mouse game to fix in axonhub, so I kept the implementation focused on just converting XML tool calls to native tool calls.
Thanks for the great idea.
Summary
This PR adds XML tool call parsing support for the NanoGPT transformer to handle multiple malformed XML formats that AI models output, converting them to OpenAI-compatible tool_calls.
Problem
NanoGPT models output tool calls in various inconsistent XML formats instead of the standard OpenAI format:
These formats were not being parsed, causing test failures and missing tool calls.
Solution
Implemented a regex-based XML parser in llm/transformer/nanogpt/xml_parser.go that:

- Pre-checks content for XML tool call indicators (MaybeHasXMLToolCalls)
- Normalizes common malformations (normalizeXML):
  - <Write attr="..."\n → <Write attr="...">
  - \n</Write_file> → </Write>
  - </use_use> → </use_tool>
  - <Write>content</use_tool> → <Write>content</Write>
- Parses multiple XML formats:
  - <Write file_path="X" content="Y"/>
  - <Write file_path="X">content</Write>
  - <Write>{"file_path":"X"}</use_tool>
  - <Write><file_path>X</file_path><content>Y</content></Write>
  - <Write_File>{...}</Write_File>
- Extracts tool name and arguments, converting to OpenAI format
Changes

- llm/transformer/nanogpt/model.go (+17 lines): Add XML parsing call in ToOpenAIMessage()
- llm/transformer/nanogpt/xml_parser.go (+356 lines): New regex-based XML parser

Key Features

- Bounded regex patterns ([^<] instead of [\s\S]*? where possible)
This PR intentionally does NOT handle the following issues, as they are considered model/provider bugs rather than parsing issues:

- Tool name concatenation: When the model generates malformed tag names like <globglobglobls> or <Write_file_path=...>, these are model output errors. The parser correctly rejects these as unknown tool names.
- Missing closing tags in content: If the model generates <Write>content without any closing tag, this is unrecoverable malformed XML. The parser handles some common cases (like unclosed opening tags) but cannot guess missing closing tags.
- Nested tool calls: The parser does not support tool calls nested inside other tool calls (e.g., <Write><use_tool>...</use_tool></Write>). This pattern has not been observed in production.
- CDATA sections: XML CDATA sections (<![CDATA[...]]>) are not supported, as they have not been observed in model outputs.
- XML namespaces: Namespaced tags like <tools:Write> are not supported, as they have not been observed.

The parser is designed to be lenient toward common malformations but does not attempt to fix fundamentally broken model output. Errors in model generation should be reported upstream to the model provider.