Type size and endianness issues in the telemetry compression code

The `unittest_handle_GetTelemetrySubscriptions` test introduced with e3ae6e65612b6318211495594c8cac92111a8230 and v2.10.1 by @emasab, revealed a long-standing pointer size and endianness issue likely introduced with KIP 714 compression support in v2.5.0.

Specifically, the [v2.11.1 Debian s390x build](https://ci.debian.net/packages/libr/librdkafka/testing/s390x/63821402/) failed with this error:
```
 38s RDUT: FAIL: rdkafka_request.c:7012: unittest_handle_GetTelemetrySubscriptions: assert failed: rk->rk_telemetry.accepted_compression_types[0] == RD_KAFKA_COMPRESSION_GZIP: Expected 'gzip' compression type, got 'codec0x1000000?'
```

s390x is a 64-bit big-endian architecture. The value above is 0x1000000 == htonl(0x1) == htonl(RD_KAFKA_COMPRESSION_GZIP).

Digging a bit deeper:

`rk_telemetry.accepted_compression_types[]` is of type `rd_kafka_compression_t`, which is an enum. In standard C, sizeof(rd_kafka_compression_t) == sizeof(enum) == sizeof(int). In this case, that's == 4 (32-bits).

The code being tested is:
```c
void rd_kafka_handle_GetTelemetrySubscriptions(rd_kafka_t *rk,
                                               [...]
                                               void *opaque) {

[...]
                rk->rk_telemetry.accepted_compression_types_cnt = arraycnt;
                rk->rk_telemetry.accepted_compression_types =
                    rd_calloc(arraycnt, sizeof(rd_kafka_compression_t));

                for (i = 0; i < (size_t)arraycnt; i++)
                        rd_kafka_buf_read_i8(
                            rkbuf,
                            &rk->rk_telemetry.accepted_compression_types[i]);
```

However, `rd_kafka_buf_read_i8()` reads a int8_t, i.e. size == 1. **That's a bug**. Changing this to use an intermediate 8-bit value e.g.
```c
                for (i = 0; i < (size_t)arraycnt; i++) {
                        int8_t value;
                        rd_kafka_buf_read_i8(rkbuf, &value);
                        rk->rk_telemetry.accepted_compression_types[i] = value; // implicit cast int8_t -> int
                }
```
...makes the code correct, and the unit test pass across both little-endian and big-endian architectures.

Note that the unit test, `unittest_handle_GetTelemetrySubscriptions`  has this code:
```c
rd_kafka_buf_write_i8(rkbuf, RD_KAFKA_COMPRESSION_GZIP)
```
That's actually correct -- that's what the code being tested above expects: an int8_t.

**Additionally**, this is the push code, which is NOT tested by the unit test:
```c
rd_kafka_PushTelemetryRequest(rd_kafka_broker_t *rkb,
                              [...]
                              const rd_kafka_compression_t compression_type,
                              [...]
                              void *opaque) {
[...]
        size_t len = sizeof(rd_kafka_Uuid_t) + sizeof(int32_t) +
                     sizeof(rd_bool_t) + sizeof(compression_type) +
                     metrics_size;
        rkbuf = rd_kafka_buf_new_flexver_request(rkb, RD_KAFKAP_PushTelemetry,
                                                 1, len, rd_true);

        rd_kafka_buf_write_uuid(rkbuf, client_instance_id);
        rd_kafka_buf_write_i32(rkbuf, subscription_id);
        rd_kafka_buf_write_bool(rkbuf, terminating);
        rd_kafka_buf_write_i8(rkbuf, compression_type);
[...]
}
```

A `sizeof(compression_type)` is being allocated, but an i8 is actually being written. `rd_kafka_buf_write_i8()` accepts an int8_t as its second argument, but an int (e.g. int32_t) is being passed, which results in an implicit casted.

I don't think this is an overrun, nor manifests as an actual bug beyond wasting 4 bytes of memory to store 1, but it _is_ evidence of confusion with regards to the size of kafka_compression_t and implicit casts surrounding it, which can be a breeding ground for more serious bugs. At minimum, this should be `rd_kafka_buf_write_i8(rkbuf, (int8_t)compression_type);`

Instead of casting from/to int8_t on reads and writes (as suggested above), perhaps it should be considered to make the in-memory format match the buffer size, i.e. not using an enum for kafka_compression_t but an int8_t.

(I did not check any of the other types used in telemetry requests.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Type size and endianness issues in the telemetry compression code #5179

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Type size and endianness issues in the telemetry compression code #5179

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions