[MLOS-459] Support enriched evalmetric event submission #7503

Merged
gsvigruha merged 22 commits into master from gergely.svigruha/eval-data-model-new-fields
Feb 13, 2026

Conversation

@gsvigruha
Contributor

@gsvigruha gsvigruha commented Feb 12, 2026

What does this PR do?

Adds reasoning, metadata and assessment to the submitEvaluation endpoint.
Adds support for the JSON value type.
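
For illustration, an options object carrying the new fields might look like the sketch below. This is not taken from the PR diff: `label`, `metricType` and `value` are existing submitEvaluation options, while the `'json'` metric type string, the field shapes, and every concrete value are assumptions made for this example.

```javascript
// Illustrative only: an options object carrying the fields this PR describes.
// The 'json' metric type string and all concrete values are assumptions.
const evaluationOptions = {
  label: 'answer_quality', // hypothetical evaluation name
  metricType: 'json', // assumed spelling of the new JSON value type
  value: { relevance: 0.9, grounded: true }, // JSON-serializable object
  reasoning: 'The answer cites the retrieved document.', // free-text explanation
  assessment: 'pass', // 'pass' | 'fail', per the review discussion
  metadata: { judgeModel: 'example-judge' } // arbitrary key/value context
}

// With a real tracer this would be passed along the lines of:
//   tracer.llmobs.submitEvaluation(spanContext, evaluationOptions)
// Here we only confirm the object survives JSON serialization.
const roundTripped = JSON.parse(JSON.stringify(evaluationOptions))
```

The call is left as a comment so the snippet runs without dd-trace installed.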

Motivation

Close feature gaps with the Python SDK.
Customer FR: https://datadoghq.atlassian.net/browse/MLOS-459

Test

@github-actions
Contributor

github-actions bot commented Feb 12, 2026

Overall package size

Self size: 4.61 MB
Deduped: 5.45 MB
No deduping: 5.45 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| import-in-the-middle | 2.0.6 | 81.92 kB | 813.08 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@codecov

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 0% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.18%. Comparing base (c589ad4) to head (27c9b5a).
⚠️ Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|------|---------|-----------|
| packages/dd-trace/src/llmobs/sdk.js | 0.00% | 21 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7503      +/-   ##
==========================================
+ Coverage   80.13%   80.18%   +0.04%     
==========================================
  Files         730      731       +1     
  Lines       31104    31212     +108     
==========================================
+ Hits        24926    25026     +100     
- Misses       6178     6186       +8     
Flag Coverage Δ
aiguard-macos 39.00% <0.00%> (-0.18%) ⬇️
aiguard-ubuntu 39.12% <0.00%> (-0.18%) ⬇️
aiguard-windows 38.86% <0.00%> (-0.18%) ⬇️
apm-capabilities-tracing-macos 48.76% <0.00%> (-0.04%) ⬇️
apm-capabilities-tracing-ubuntu 48.79% <0.00%> (-0.04%) ⬇️
apm-capabilities-tracing-windows 48.49% <0.00%> (-0.04%) ⬇️
apm-integrations-child-process 38.57% <0.00%> (-0.17%) ⬇️
apm-integrations-couchbase-18 37.47% <0.00%> (-0.17%) ⬇️
apm-integrations-couchbase-eol 37.96% <0.00%> (-0.02%) ⬇️
apm-integrations-oracledb 38.00% <0.00%> (-0.17%) ⬇️
appsec-express 55.37% <0.00%> (-0.16%) ⬇️
appsec-fastify 51.99% <0.00%> (-0.14%) ⬇️
appsec-graphql 52.33% <0.00%> (-0.14%) ⬇️
appsec-kafka 44.57% <0.00%> (-0.21%) ⬇️
appsec-ldapjs 44.32% <0.00%> (-0.15%) ⬇️
appsec-lodash 44.00% <0.00%> (-0.15%) ⬇️
appsec-macos 58.43% <0.00%> (-0.14%) ⬇️
appsec-mongodb-core 49.24% <0.00%> (-0.15%) ⬇️
appsec-mongoose 49.93% <0.00%> (-0.14%) ⬇️
appsec-mysql 51.33% <0.00%> (-0.14%) ⬇️
appsec-node-serialize 43.51% <0.00%> (-0.15%) ⬇️
appsec-passport 48.08% <0.00%> (-0.17%) ⬇️
appsec-postgres 51.08% <0.00%> (-0.14%) ⬇️
appsec-sourcing 42.86% <0.00%> (-0.15%) ⬇️
appsec-template 43.68% <0.00%> (-0.15%) ⬇️
appsec-ubuntu 58.51% <0.00%> (-0.14%) ⬇️
appsec-windows 58.30% <0.00%> (-0.14%) ⬇️
instrumentations-instrumentation-bluebird 32.27% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-body-parser 40.78% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-child_process 37.88% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-cookie-parser 34.49% <0.00%> (-0.16%) ⬇️
instrumentations-instrumentation-express 34.83% <0.00%> (-0.16%) ⬇️
instrumentations-instrumentation-express-mongo-sanitize 34.63% <0.00%> (-0.16%) ⬇️
instrumentations-instrumentation-express-session 40.40% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-fs 31.87% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-generic-pool 30.19% <ø> (ø)
instrumentations-instrumentation-http 39.59% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-knex 32.27% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-mongoose 33.62% <0.00%> (-0.16%) ⬇️
instrumentations-instrumentation-multer 40.52% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-mysql2 38.37% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-passport 44.39% <0.00%> (-0.16%) ⬇️
instrumentations-instrumentation-passport-http 44.04% <0.00%> (-0.16%) ⬇️
instrumentations-instrumentation-passport-local 44.60% <0.00%> (-0.16%) ⬇️
instrumentations-instrumentation-pg 37.78% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-promise 32.19% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-promise-js 32.20% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-q 32.24% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-url 32.16% <0.00%> (-0.17%) ⬇️
instrumentations-instrumentation-when 32.21% <0.00%> (-0.17%) ⬇️
llmobs-ai 41.55% <0.00%> (-0.03%) ⬇️
llmobs-anthropic 40.60% <0.00%> (-0.16%) ⬇️
llmobs-bedrock 39.49% <0.00%> (-0.15%) ⬇️
llmobs-google-genai 40.10% <0.00%> (-0.16%) ⬇️
llmobs-langchain 39.64% <0.00%> (-0.13%) ⬇️
llmobs-openai 44.44% <0.00%> (-0.16%) ⬇️
llmobs-vertex-ai 40.31% <0.00%> (-0.23%) ⬇️
platform-core 29.71% <ø> (ø)
platform-esbuild 32.89% <ø> (ø)
platform-instrumentations-misc 40.53% <ø> (ø)
platform-shimmer 36.14% <ø> (ø)
platform-unit-guardrails 31.27% <ø> (ø)
plugins-azure-event-hubs 24.02% <ø> (ø)
plugins-azure-service-bus 23.42% <ø> (ø)
plugins-bullmq 43.70% <0.00%> (-0.18%) ⬇️
plugins-cassandra 38.04% <0.00%> (-0.31%) ⬇️
plugins-cookie 25.08% <ø> (ø)
plugins-cookie-parser 24.87% <ø> (ø)
plugins-crypto 24.72% <ø> (ø)
plugins-dd-trace-api 38.42% <0.00%> (-0.18%) ⬇️
plugins-express-mongo-sanitize 25.04% <ø> (ø)
plugins-express-session 24.83% <ø> (ø)
plugins-fastify 42.51% <0.00%> (-0.17%) ⬇️
plugins-fetch 38.57% <0.00%> (-0.16%) ⬇️
plugins-fs 38.67% <0.00%> (-0.18%) ⬇️
plugins-generic-pool 24.06% <ø> (ø)
plugins-google-cloud-pubsub 45.72% <0.00%> (-0.14%) ⬇️
plugins-grpc 41.25% <0.00%> (-0.17%) ⬇️
plugins-handlebars 25.08% <ø> (ø)
plugins-hapi 40.42% <0.00%> (-0.32%) ⬇️
plugins-hono 40.68% <0.00%> (-0.17%) ⬇️
plugins-ioredis 38.47% <0.00%> (-0.17%) ⬇️
plugins-knex 24.80% <ø> (ø)
plugins-ldapjs 22.61% <ø> (ø)
plugins-light-my-request 24.48% <ø> (ø)
plugins-limitd-client 32.56% <0.00%> (-0.17%) ⬇️
plugins-lodash 24.13% <ø> (ø)
plugins-mariadb 39.58% <0.00%> (?)
plugins-memcached 38.21% <0.00%> (-0.18%) ⬇️
plugins-microgateway-core 39.44% <0.00%> (-0.17%) ⬇️
plugins-moleculer 40.81% <0.00%> (-0.17%) ⬇️
plugins-mongodb 39.45% <0.00%> (-0.17%) ⬇️
plugins-mongodb-core 39.08% <0.00%> (-0.17%) ⬇️
plugins-mongoose 39.13% <0.00%> (-0.17%) ⬇️
plugins-multer 24.83% <ø> (ø)
plugins-mysql 39.22% <0.00%> (-0.15%) ⬇️
plugins-mysql2 39.32% <0.00%> (-0.17%) ⬇️
plugins-node-serialize 25.12% <ø> (ø)
plugins-opensearch 37.87% <0.00%> (-0.31%) ⬇️
plugins-passport-http 24.91% <ø> (ø)
plugins-postgres 35.73% <0.00%> (-0.14%) ⬇️
plugins-process 24.72% <ø> (ø)
plugins-pug 25.08% <ø> (ø)
plugins-redis 38.95% <0.00%> (-0.18%) ⬇️
plugins-router 43.43% <0.00%> (-0.03%) ⬇️
plugins-sequelize 23.66% <ø> (ø)
plugins-test-and-upstream-amqp10 38.55% <0.00%> (-0.02%) ⬇️
plugins-test-and-upstream-amqplib 43.89% <0.00%> (-0.18%) ⬇️
plugins-test-and-upstream-apollo 39.27% <0.00%> (-0.15%) ⬇️
plugins-test-and-upstream-avsc 38.81% <0.00%> (-0.18%) ⬇️
plugins-test-and-upstream-bunyan 33.87% <0.00%> (-0.17%) ⬇️
plugins-test-and-upstream-connect 41.10% <0.00%> (-0.17%) ⬇️
plugins-test-and-upstream-graphql 40.22% <0.00%> (-0.17%) ⬇️
plugins-test-and-upstream-koa 40.67% <0.00%> (-0.17%) ⬇️
plugins-test-and-upstream-protobufjs 39.05% <0.00%> (-0.18%) ⬇️
plugins-test-and-upstream-rhea 44.14% <0.00%> (-0.18%) ⬇️
plugins-undici 39.37% <0.00%> (-0.16%) ⬇️
plugins-url 24.72% <ø> (ø)
plugins-valkey 38.13% <0.00%> (-0.14%) ⬇️
plugins-vm 24.72% <ø> (ø)
plugins-winston 34.25% <0.00%> (-0.16%) ⬇️
plugins-ws 42.22% <0.00%> (-0.17%) ⬇️
profiling-macos 39.97% <0.00%> (-0.17%) ⬇️
profiling-ubuntu 40.10% <0.00%> (-0.17%) ⬇️
profiling-windows 41.34% <0.00%> (-0.17%) ⬇️
serverless-azure-functions-client 23.75% <ø> (ø)
serverless-azure-functions-eventhubs 23.75% <ø> (ø)
serverless-azure-functions-servicebus 23.75% <ø> (ø)

@pr-commenter

pr-commenter bot commented Feb 12, 2026

Benchmarks

Benchmark execution time: 2026-02-13 15:48:25

Comparing candidate commit 27c9b5a in PR branch gergely.svigruha/eval-data-model-new-fields with baseline commit c589ad4 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 231 metrics, 29 unstable metrics.

@gsvigruha gsvigruha changed the title from "Gergely.svigruha/eval data model new fields" to "[MLOS-459] Support enriched evalmetric event submission" Feb 12, 2026
@gsvigruha gsvigruha marked this pull request as ready for review February 12, 2026 03:55
@gsvigruha gsvigruha requested a review from a team as a code owner February 12, 2026 03:55
@sabrenner sabrenner self-assigned this Feb 12, 2026
Collaborator

@sabrenner sabrenner left a comment

noting for transparency on the pr that we talked briefly offline that we might also have to include an update to use the v2 endpoint of the evaluation metrics api. this should not have an impact on the existing api here

@gsvigruha gsvigruha requested a review from a team as a code owner February 12, 2026 17:52
Collaborator

@sabrenner sabrenner left a comment

just a couple more js-specific comments! thanks for attaching manual tests in the pr description 🙇

Comment on lines +444 to +452
if (reasoning !== undefined) {
  payload.reasoning = reasoning
}
if (metadata !== undefined) {
  payload.metadata = metadata
}
if (assessment !== undefined) {
  payload.assessment = assessment
}
Collaborator

these will allow null values through (in js, null !== undefined). we can do this instead

Suggested change
- if (reasoning !== undefined) {
-   payload.reasoning = reasoning
- }
- if (metadata !== undefined) {
-   payload.metadata = metadata
- }
- if (assessment !== undefined) {
-   payload.assessment = assessment
- }
+ if (reasoning != null) {
+   payload.reasoning = reasoning
+ }
+ if (metadata != null) {
+   payload.metadata = metadata
+ }
+ if (assessment != null) {
+   payload.assessment = assessment
+ }

as null == undefined
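
The difference between the two guards can be sketched in isolation. This is a minimal illustration rather than the code in sdk.js; `buildPayload`, its `strict` flag, and the sample input are hypothetical names introduced only for this example.

```javascript
// Hypothetical helper, not part of dd-trace: copies optional fields onto a payload.
function buildPayload ({ reasoning, metadata, assessment } = {}, strict = false) {
  const payload = {}
  // strict === true mimics the original `!== undefined` guard;
  // otherwise the loose `!= null` guard from the suggestion is used.
  const isSet = strict
    ? (v) => v !== undefined // lets null through
    : (v) => v != null // false only for null and undefined
  if (isSet(reasoning)) payload.reasoning = reasoning
  if (isSet(metadata)) payload.metadata = metadata
  if (isSet(assessment)) payload.assessment = assessment
  return payload
}

const input = { reasoning: null, metadata: { judge: 'example' } }
const strictResult = buildPayload(input, true) // keeps reasoning: null
const looseResult = buildPayload(input, false) // drops reasoning entirely
```

With the strict guard the `null` reasoning ends up in the payload; with the loose guard it is omitted, while falsy-but-meaningful values such as `''` or `0` would still be kept.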

Contributor Author

thanks! i can never wrap my head around TS null vs undefined and == vs ===

sabrenner previously approved these changes Feb 13, 2026
Collaborator

@sabrenner sabrenner left a comment

looks great! unless the diff is bugged on my end it looks like we still have

assessment?: string

instead of 'pass' | 'fail' in the index.d.ts but not a blocker
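
A type union in index.d.ts only constrains TypeScript callers, so a plain JavaScript caller could still pass any string. A runtime counterpart could be sketched as below; whether sdk.js performs such a check is not shown in this thread, and `checkAssessment` and `ALLOWED_ASSESSMENTS` are hypothetical names used only here.

```javascript
// Hypothetical runtime check mirroring the 'pass' | 'fail' union discussed above.
const ALLOWED_ASSESSMENTS = new Set(['pass', 'fail'])

function checkAssessment (assessment) {
  // The field is optional, so null and undefined are both treated as "not provided".
  if (assessment == null) return undefined
  if (!ALLOWED_ASSESSMENTS.has(assessment)) {
    throw new TypeError(
      `assessment must be 'pass' or 'fail', got ${JSON.stringify(assessment)}`
    )
  }
  return assessment
}
```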

@gsvigruha
Contributor Author

I added 'pass' | 'fail' to reasoning 🤦 thanks for the catch

@gsvigruha gsvigruha merged commit 4ed9593 into master Feb 13, 2026
919 of 924 checks passed
@gsvigruha gsvigruha deleted the gergely.svigruha/eval-data-model-new-fields branch February 13, 2026 16:32
dd-octo-sts bot pushed a commit that referenced this pull request Feb 14, 2026
* Add reasoning, assessment and metadata

* more guards

* nit

* fix syntax

* some unit tests

* more tests

* fix lint

* undefined

* address comments

* partial revert

* revert metadata

* pass / fail

* fix test

* fix test

* json

* token

* doh

* fix message

* fix doc

* fixes

* fix
@dd-octo-sts dd-octo-sts bot mentioned this pull request Feb 14, 2026
juan-fernandez pushed a commit that referenced this pull request Feb 18, 2026