Area(s)
area:new
Is your change request related to a problem? Please describe.
Some attributes may be very large, such as:
- "http.request.body.content"
- "http.response.body.content"
- "gen_ai.prompt"
- "gen_ai.completion"
- "exception.stacktrace"
In order to provide a very tight SLO and consistent performance, an operations backend might impose strict limits on the size of payloads/content/attributes. Even so, it may still be desirable to index most of the properties/attributes while cross-linking to an alternative storage solution (with a looser SLO) for the full, large data.
In the OTel Collector Contrib repository, there is an open issue for a new connector component ("New component: Blob Attribute Uploader Connector") which attempts to resolve the above issue by proposing a connector that will, in the instrumentation/collector, do the following:
- Upload the full attribute data to a blob storage system (such as Google Cloud Storage, Amazon S3, Azure Blob Storage, or -- in the future -- any other blob storage backend supported by the CDK or that chooses to contribute to the connector).
- Remove the original attribute that would otherwise be too large.
- Inject attribute(s) that contain references to where the data was uploaded.
It is with respect to this last step that we need an established standard / data model for how to represent the uploaded data. The "SetInMap" function in the "foreign attribute" internal library of the draft connector shows the solution currently being implemented/pursued. However, I'd like more community feedback/input to ensure that there is agreement on the approach.
Describe the solution you'd like
Summary:
| Before uploading | After uploading |
| --- | --- |
| somekey | somekey.ref.uri |
| | somekey.ref.content_type [OPTIONAL] |
Formally:
A backend system that sees an attribute matching the pattern "${prefix}.ref.uri" should assume that "${prefix}.ref.uri" contains a URI reporting the location of the original, full value of "${prefix}". If a key named "${prefix}" is also present, it likely contains a truncated and/or redacted copy of the original value, with "${prefix}.ref.uri" pointing to the location where the original, unadulterated, full version of the value has been recorded. If "${prefix}.ref.uri" exists, then "${prefix}.ref.content_type" may optionally also exist, containing a MIME type describing the data stored at the URI in "${prefix}.ref.uri".
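The recognition rule above can be sketched as follows. This is an illustrative sketch of how a backend might scan a flat attribute map for the proposed pattern; "find_references" and the returned dictionary shape are made-up names, not part of the proposal:

```python
# Hedged sketch: detect the proposed "${prefix}.ref.uri" pattern in a flat
# attribute map and gather the associated metadata for each prefix.

REF_URI_SUFFIX = ".ref.uri"
REF_CONTENT_TYPE_SUFFIX = ".ref.content_type"

def find_references(attributes: dict) -> dict:
    """Map each referenced prefix to its URI, optional MIME type, and any
    truncated/redacted inline copy that remains under the bare prefix."""
    refs = {}
    for key, value in attributes.items():
        if key.endswith(REF_URI_SUFFIX):
            prefix = key[: -len(REF_URI_SUFFIX)]
            refs[prefix] = {
                "uri": value,
                "content_type": attributes.get(prefix + REF_CONTENT_TYPE_SUFFIX),
                "inline_copy": attributes.get(prefix),  # possibly truncated/redacted
            }
    return refs
```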
Prerequisites / Interactions:
If this course of action were to be taken, we would need two changes to the "Attribute Naming" specification in OTel:
- Reserve ".ref." so that it can only be used for this purpose (".ref." to be used only for information about reference-typed values and their metadata).
- Exempt this usage from the rule that "Names SHOULD NOT coincide with namespaces".
With respect to the latter exemption, our understanding is that this rule exists to allow coalescing the flat attributes into a structured object. We believe that ".ref." fits the spirit of this rule even while violating its letter, because ".ref." does not introduce a conceptually new attribute; rather, it replaces the existing attribute with a different representation. Thus, if a backend were to implement a coalescing system to make attributes non-flat, it could combine all of the ".ref." attributes into a single "ReferenceValue" type serving as the value of the top-level attribute whose name contains no ".ref." component.
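As a sketch of that coalescing idea, here is one way the folding could work. The "ReferenceValue" shape below (a plain dictionary with "uri", "content_type", and "inline_copy" fields) is purely hypothetical:

```python
# Illustrative sketch: fold flat ".ref." attributes into a single structured
# value per top-level key, keeping any truncated inline copy alongside.

def coalesce_refs(attributes: dict) -> dict:
    out = {}
    for key, value in attributes.items():
        if ".ref." in key:
            prefix, _, field = key.partition(".ref.")
            ref = out.setdefault(prefix, {})
            if not isinstance(ref, dict):
                # A truncated inline copy was seen first; keep it alongside.
                ref = out[prefix] = {"inline_copy": ref}
            ref[field] = value
        else:
            existing = out.get(key)
            if isinstance(existing, dict):
                existing["inline_copy"] = value  # reference fields came first
            else:
                out[key] = value
    return out
```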
Describe alternatives you've considered
Use a compound data type
For example, we could use a "KeyValueList kvlist_value" to represent the reference not as several separate attributes but as a single complex attribute value. However, this would run contrary to the attribute specification, and existing OTel libraries -- following that restriction -- do not provide direct access to the underlying OTLP data model.
Introduce a new data type
We could attempt to introduce into OTLP (and into other parts of the specification) a new "ReferenceValue" data type. However, this would require significant effort across multiple languages, libraries, backend systems, etc., and thus seems like a non-starter.
Just move the large attributes to be events/logs
This assumes that the event/log backend can itself accept arbitrarily large data, which may not hold true for every backend system. There is an inherent tradeoff between reliability/latency and the amount of data that a backend can allow: to provide a very tight latency SLO that caps 99th-percentile latency, it is important to guarantee low variance in the size of requests, which in turn may limit the amount of data that an event/log backend can accept (while still providing such a tight time bound). When using such a system to index the metadata about the events and make them quickly available for searching, it would still be useful to have a client-side mechanism to route larger content that is not critical for indexing to a blob storage system better suited to large objects, and to make it easy to interlink and navigate between backend systems by referencing the storage location.
Replace/upload just the entire body of events/logs
While this might serve the purpose of "gen_ai.prompt" / "gen_ai.completion" if/when it moves from a span event to a more general event, it would still be necessary to provide a data model for representing that the event body had been uploaded/replaced with a reference. In addition, this would not cover many other cases (e.g. "http.request.body.content"), and -- beyond this -- it is desirable to upload a much more targeted/limited portion of the data in order to allow indexing / searching the other content that does fit within backend subsystem limits.
Additional context
Examples
Example 1: http.response.body.content [Span Attribute]
Before
# TracesData [Before]
resource_spans: {
resource: { … }
scope_spans: {
scope: { … }
spans: {
trace_id: …
span_id: …
…
attributes: {
key: "http.response.body.content"
value: { string_value: "{ \"results\": [ … ] }" } # very long API JSON response
}
}
…
}
}
}
After
# TracesData [After]
resource_spans: {
resource: { … }
scope_spans: {
scope: { … }
spans: {
trace_id: …
span_id: …
…
attributes: {
key: "http.response.body.content.ref.uri"
value: { string_value: "gs://some-bucket/traceAttachments/12345/7890/response.json" }
}
attributes: {
key: "http.response.body.content.ref.content_type"
value: { string_value: "application/json" }
}
}
…
}
}
}
Example 2: gen_ai.prompt / gen_ai.completion [Span Event Attribute]
Before
# TracesData [Before]
resource_spans: {
resource: { … }
scope_spans: {
scope: { … }
spans: {
trace_id: …
span_id: …
…
events: {
# Note that this precise event name is mandatory per:
# https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
name: "gen_ai.content.prompt"
# …
attributes: {
key: "gen_ai.prompt"
value: { string_value: "Imagine that there is a very long text prompt here…." }
}
}
}
events: {
# Note that this precise event name is mandatory per:
# https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
name: "gen_ai.content.completion"
attributes: {
key: "gen_ai.completion"
value: { string_value: "{ \"completions\": [...]}" } # very long completion JSON
}
}
}
…
}
}
}
After
# TracesData [After]
resource_spans: {
resource: { … }
scope_spans: {
scope: { … }
spans: {
trace_id: …
span_id: …
…
events: {
# Note that this precise event name is mandatory per:
# https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
name: "gen_ai.content.prompt"
# …
attributes: {
key: "gen_ai.prompt.ref.uri"
value: { string_value: "s3://bucket/some/path/to/the/prompt.txt" }
}
attributes: {
key: "gen_ai.prompt.ref.content_type"
value: { string_value: "text/plain" }
}
}
}
events: {
# Note that this precise event name is mandatory per:
# https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
name: "gen_ai.content.completion"
attributes: {
key: "gen_ai.completion.ref.uri"
value: { string_value: "azblob://account/container/path/to/response.json" }
}
attributes: {
key: "gen_ai.completion.ref.content_type"
value: { string_value: "application/json" }
}
}
}
…
}
}
}