
feat: add reasoning_content and tool-calling support to ChatDataset #1644

Merged

akoumpa merged 8 commits into NVIDIA-NeMo:main from zeel2104:zeel2104/feat_reasoning_toolcalling_chatdataset on Apr 3, 2026


@zeel2104 (Contributor) commented Apr 1, 2026

What does this PR do?

Improve OpenAI message-format ChatDataset support for reasoning traces and tool calling by adding explicit validation, better loss masking for multi-turn assistant turns, optional reasoning-token masking, and updated docs/tests.
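To make the validation rules concrete, here is a minimal sketch of per-message normalization and checking in plain Python. `normalize_and_validate_message` is a hypothetical helper, not ChatDataset's actual code; it encodes only the rules this PR lists (null `content` becomes `""`, assistant `tool_calls` entries need `id`, `type`, and `function.{name,arguments}`, and tool messages need `tool_call_id`). Treating a null `reasoning_content` as absent and rejecting non-string values are assumptions of this sketch.

```python
from typing import Any


def normalize_and_validate_message(msg: dict[str, Any]) -> dict[str, Any]:
    """Return a normalized copy of one OpenAI-format chat message.

    Hypothetical helper: mirrors the rules described in this PR,
    not the ChatDataset source.
    """
    out = dict(msg)
    role = out.get("role")

    # content: null becomes "" rather than the literal string "None"
    if out.get("content") is None:
        out["content"] = ""

    if role == "assistant":
        # Assumption: a null reasoning_content is treated as absent,
        # and any non-string value is rejected.
        reasoning = out.get("reasoning_content")
        if reasoning is None:
            out.pop("reasoning_content", None)
        elif not isinstance(reasoning, str):
            raise ValueError("assistant reasoning_content must be a string")

        # Every tool call needs id, type, and function.{name,arguments}
        for i, call in enumerate(out.get("tool_calls") or []):
            fn = call.get("function") or {}
            missing = [k for k in ("id", "type") if not call.get(k)]
            missing += [f"function.{k}" for k in ("name", "arguments") if k not in fn]
            if missing:
                raise ValueError(f"tool_calls[{i}] is missing {', '.join(missing)}")

    # Tool results must point back at the call they answer
    if role == "tool" and not out.get("tool_call_id"):
        raise ValueError("tool message requires tool_call_id")

    return out
```

Raising on the first malformed message, instead of silently dropping fields, surfaces bad rows at preprocessing time rather than as confusing chat-template output later.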

Changelog

  • Add explicit normalization and validation for assistant reasoning_content
  • Normalize content: null to "" instead of "None" in chat message preprocessing
  • Validate that assistant tool_calls entries include id, type, and function.{name,arguments}
  • Validate that tool-role messages include tool_call_id
  • Fix the fallback masking used when the chat template has no {% generation %} block, so that all assistant turns are supervised in multi-turn tool-calling conversations
  • Add optional mask_reasoning_content support to exclude rendered reasoning traces from loss
  • Warn when dataset rows contain reasoning_content but the active chat template does not reference it
  • Add unit tests covering reasoning/tool-calling normalization, malformed tool payloads, multi-turn masking, and optional reasoning masking
  • Update docs/guides/dataset-overview.md with reasoning + tool-calling examples and chat-template requirements
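The masking fix and the optional `mask_reasoning_content` setting can be illustrated with a simplified sketch. It assumes (hypothetically) that the rendered conversation has already been split into role-tagged token segments; the real fallback path works on the chat template's tokenized output, which this sketch does not reproduce. `Segment` and `build_labels` are illustrative names, not ChatDataset internals. Masked positions use -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from dataclasses import dataclass

# Label value skipped by torch.nn.CrossEntropyLoss (its default ignore_index)
IGNORE_INDEX = -100


@dataclass
class Segment:
    """A run of token ids from the rendered chat, tagged by origin (hypothetical)."""
    role: str                   # "system", "user", "assistant", or "tool"
    token_ids: list[int]
    is_reasoning: bool = False  # True for tokens rendered from reasoning_content


def build_labels(
    segments: list[Segment], mask_reasoning_content: bool = False
) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels), supervising every assistant segment.

    Non-assistant tokens are always masked. With mask_reasoning_content=True,
    assistant reasoning tokens are masked as well.
    """
    input_ids: list[int] = []
    labels: list[int] = []
    for seg in segments:
        input_ids.extend(seg.token_ids)
        supervised = seg.role == "assistant" and not (
            mask_reasoning_content and seg.is_reasoning
        )
        labels.extend(seg.token_ids if supervised else [IGNORE_INDEX] * len(seg.token_ids))
    return input_ids, labels
```

In a tool-calling conversation (user, then an assistant tool call, then a tool result, then a final assistant answer), both assistant turns keep their token ids as labels and only the user and tool tokens become -100. Supervising only the last assistant turn would drop the tool-call turn from the loss entirely.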


@copy-pr-bot (Bot) commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@zeel2104 changed the title from "Support reasoning_content and tool calling in openai message format dataset" to "feat: add reasoning_content and tool-calling support to ChatDataset" on Apr 1, 2026
@jgerh (Contributor) left a comment


Completed tech pubs review of docs/guides/dataset-overview.md and provided a few copyedits.

6 comment threads on docs/guides/dataset-overview.md (Outdated)
Comment on lines 267 to 271
Appears to be duplicate content; check, and then delete.

Comment thread on docs/guides/dataset-overview.md (Outdated)
Comment on lines 280 to 284
Appears to be duplicate content; check, and then delete.

3 comment threads on docs/guides/dataset-overview.md (Outdated)
@zeel2104 (Contributor, Author) commented Apr 1, 2026

@jgerh
Applied the copyedits in docs/guides/dataset-overview.md, including the NeMo AutoModel naming fixes and removal of the duplicate sections. Thanks for the review.

@zeel2104 requested a review from ZhiyuLi-Nvidia as a code owner on April 1, 2026, 21:28
@akoumpa (Contributor) commented Apr 1, 2026

/ok to test c07d79e

@akoumpa (Contributor) commented Apr 1, 2026

/ok to test 909d307

Comment thread on examples/llm_finetune/qwen/qwen2_5_0p5b_instruct_fineproofs_chat.yaml (Outdated)
@akoumpa (Contributor) commented Apr 3, 2026

/ok to test 9b47e54

akoumpa previously approved these changes Apr 3, 2026

@akoumpa (Contributor) left a comment


Thank you @zeel2104 !

@akoumpa (Contributor) commented Apr 3, 2026

/ok to test 06200d0



Development

Successfully merging this pull request may close these issues:

  • Support reasoning_content and tool calling in openai message format dataset

4 participants