Skip to content

Type size and endianness issues in the telemetry compression code #5179

@paravoid

Description

The unittest_handle_GetTelemetrySubscriptions test introduced with e3ae6e6 and v2.10.1 by Emanuele Sabellico (@emasab), revealed a long-standing pointer size and endianness issue likely introduced with KIP 714 compression support in v2.5.0.

Specifically, the v2.11.1 Debian s390x build failed with this error:

 38s RDUT: FAIL: rdkafka_request.c:7012: unittest_handle_GetTelemetrySubscriptions: assert failed: rk->rk_telemetry.accepted_compression_types[0] == RD_KAFKA_COMPRESSION_GZIP: Expected 'gzip' compression type, got 'codec0x1000000?'

s390x is a 64-bit big-endian architecture. The value above is 0x1000000 == htonl(0x1) == htonl(RD_KAFKA_COMPRESSION_GZIP).

Digging a bit deeper:

rk_telemetry.accepted_compression_types[] is of type rd_kafka_compression_t, which is an enum. In standard C, sizeof(rd_kafka_compression_t) == sizeof(enum) == sizeof(int). In this case, that's == 4 (32-bits).

The code being tested is:

void rd_kafka_handle_GetTelemetrySubscriptions(rd_kafka_t *rk,
                                               [...]
                                               void *opaque) {

[...]
                rk->rk_telemetry.accepted_compression_types_cnt = arraycnt;
                rk->rk_telemetry.accepted_compression_types =
                    rd_calloc(arraycnt, sizeof(rd_kafka_compression_t));

                for (i = 0; i < (size_t)arraycnt; i++)
                        rd_kafka_buf_read_i8(
                            rkbuf,
                            &rk->rk_telemetry.accepted_compression_types[i]);

However, rd_kafka_buf_read_i8() reads a int8_t, i.e. size == 1. That's a bug. Changing this to use an intermediate 8-bit value e.g.

                for (i = 0; i < (size_t)arraycnt; i++) {
                        int8_t value;
                        rd_kafka_buf_read_i8(rkbuf, &value);
                        rk->rk_telemetry.accepted_compression_types[i] = value; // implicit cast int8_t -> int
                }

...makes the code correct, and the unit test pass across both little-endian and big-endian architectures.

Note that the unit test, unittest_handle_GetTelemetrySubscriptions has this code:

rd_kafka_buf_write_i8(rkbuf, RD_KAFKA_COMPRESSION_GZIP)

That's actually correct -- that's what the code being tested above expects: an int8_t.

Additionally, this is the push code, which is NOT tested by the unit test:

rd_kafka_PushTelemetryRequest(rd_kafka_broker_t *rkb,
                              [...]
                              const rd_kafka_compression_t compression_type,
                              [...]
                              void *opaque) {
[...]
        size_t len = sizeof(rd_kafka_Uuid_t) + sizeof(int32_t) +
                     sizeof(rd_bool_t) + sizeof(compression_type) +
                     metrics_size;
        rkbuf = rd_kafka_buf_new_flexver_request(rkb, RD_KAFKAP_PushTelemetry,
                                                 1, len, rd_true);

        rd_kafka_buf_write_uuid(rkbuf, client_instance_id);
        rd_kafka_buf_write_i32(rkbuf, subscription_id);
        rd_kafka_buf_write_bool(rkbuf, terminating);
        rd_kafka_buf_write_i8(rkbuf, compression_type);
[...]
}

A sizeof(compression_type) is being allocated, but an i8 is actually being written. rd_kafka_buf_write_i8() accepts an int8_t as its second argument, but an int (e.g. int32_t) is being passed, which results in an implicit casted.

I don't think this is an overrun, nor manifests as an actual bug beyond wasting 4 bytes of memory to store 1, but it is evidence of confusion with regards to the size of kafka_compression_t and implicit casts surrounding it, which can be a breeding ground for more serious bugs. At minimum, this should be rd_kafka_buf_write_i8(rkbuf, (int8_t)compression_type);

Instead of casting from/to int8_t on reads and writes (as suggested above), perhaps it should be considered to make the in-memory format match the buffer size, i.e. not using an enum for kafka_compression_t but an int8_t.

(I did not check any of the other types used in telemetry requests.)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions