Clock source not stable

Board: Jetson Orin NX
Kernel: 5.15.148

Hello,

we are experiencing a problem with the clock not being stable after ~6 minutes after boot.
When using simple code that sends one byte over UART every second, it works well for ~6 minutes, but starts to send each 0.92 seconds afterwards.

The code to test this is as follows:

int main() {
    const char* uart_path = "/dev/ttyTHS1";
    int uart_fd = open(uart_path, O_WRONLY | O_NOCTTY | O_SYNC);

    if (!configureUART(uart_fd)) {
        close(uart_fd);
        return 1;
    }

    using namespace std::chrono;
    auto next = steady_clock::now();

    while (true) {
        const char message = 255;
        ssize_t bytes_written = write(uart_fd, &message, 1);

        next += seconds(1);
        std::this_thread::sleep_until(next);
    }

    close(uart_fd);
    return 0;
}

I used a Saleae logic to measure the diff between two messages sent:

  • I tried different modes using nvpmodel, it did not change the behavior
  • I used jetson_clocks, it did not change the behavior
  • I tried using the performance governor, it did not change the behavior
  • I tried using different clocksources than arch_sys_counter, it did not change the behavior

The behavior is always the same. After a reboot it works well for ~6 minutes, then it will always take 0.92 seconds instead of 1 second. It does not get worse, even after an hour it will still take 0.92 seconds instead of 1 second, so it is not some small drift, but one specific event.

Except of that 1 byte there is no other traffic on UART.

Based on the measured diffs I assume that the CPU frequency changes after some time and causes the time to be calculated incorrectly.

This is also confirmed by the following test:

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    using namespace std::chrono;
    auto next = steady_clock::now();
    uint64_t ticks_last = 0;

    while (true) {
        uint64_t freq;
        asm volatile("mrs %0, cntfrq_el0" : "=r" (freq));
        uint64_t ticks;
        asm volatile("mrs %0, cntvct_el0" : "=r" (ticks));

        std::cout << "ticks: " << ticks << " " << "freq: " << freq << std::endl;
        std::cout << "diff: " << (ticks - ticks_last) / static_cast<double>(freq) << std::endl;

        ticks_last = ticks;
        next += seconds(1);
        std::this_thread::sleep_until(next);
    }

    return 0;
}

Which prints the following in the problematic state:

ticks: 33085888182 freq: 31250000
diff: 0.923026
ticks: 33114742542 freq: 31250000
diff: 0.92334
ticks: 33143579730 freq: 31250000
diff: 0.92279
ticks: 33172436056 freq: 31250000
diff: 0.923402
ticks: 33201276706 freq: 31250000
diff: 0.922901

Did anyone observe similar problems already? Thanks.

Could you verify by udelay() instead of sleep?

Please also use CLOCK_MONOTONIC_RAW API to check.
Get the CLOCK_MONOTONIC_RAW time like below.

t0 = clock_gettime();
udelay(1000)

while(1)
{
    t1 = clock_gettime();
    if ((t1 -t2) == 1s) {
        send char from UART;
        break;
    }
    else
        udelay(1000);
    }
}

Hello,

I adapted my code as follows

#include <chrono>
#include <iostream>
#include <thread>
#include <stdio.h>
#include <unistd.h>
#include <time.h>
#include <sys/time.h>

void timespec_diff(struct timespec *start, struct timespec *stop,
                   struct timespec *result)
{
    if ((stop->tv_nsec - start->tv_nsec) < 0) {
        result->tv_sec = stop->tv_sec - start->tv_sec - 1;
        result->tv_nsec = stop->tv_nsec - start->tv_nsec + 1000000000;
    } else {
        result->tv_sec = stop->tv_sec - start->tv_sec;
        result->tv_nsec = stop->tv_nsec - start->tv_nsec;
    }

    return;
}

int main() {
    using namespace std::chrono;
    auto next = steady_clock::now();
    uint64_t ticks_last = 0;

    struct timespec spec_last;

    while (true) {
        uint64_t freq;
        asm volatile("mrs %0, cntfrq_el0" : "=r" (freq));
        uint64_t ticks;
        asm volatile("mrs %0, cntvct_el0" : "=r" (ticks));

        struct timespec spec_now;
        struct timespec spec_diff;
        clock_gettime(CLOCK_MONOTONIC_RAW, &spec_now);
        timespec_diff(&spec_last, &spec_now, &spec_diff);

        std::cout << "ticks: " << ticks << " " << "freq: " << freq << std::endl;
        std::cout << "diff: " << (ticks - ticks_last) / static_cast<double>(freq) << std::endl;
        std::cout << "mono diff: " << spec_diff.tv_sec << " s " << spec_diff.tv_nsec << " ns" << std::endl;

        ticks_last = ticks;
        spec_last = spec_now;

        next += seconds(1);
        //std::this_thread::sleep_until(next);
        sleep(1);
    }

    return 0;
}

  • I can’t use any of the delay functions as everything is running in userspace, so I only have access to various variants of sleep
  • Something very weird I noticed: The problem will only happen if ethernet is connected. I was running the tool before for some hours without ethernet and it did not occur
  • In dmesg, journalctl I searched for things related to the clocksource, there is no change there. In the good and the bad case the clocksource is arch_sys_counter

Here is a good log, where ethernet was not connected:

ticks: 10911839280 freq: 31250000
diff: 1.00044
mono diff: 1 s 439264 ns

ticks: 10943147351 freq: 31250000
diff: 1.00186
mono diff: 1 s 1858336 ns

ticks: 10974401389 freq: 31250000
diff: 1.00013
mono diff: 1 s 129216 ns

ticks: 11005686235 freq: 31250000
diff: 1.00112
mono diff: 1 s 1115232 ns

ticks: 11036945195 freq: 31250000
diff: 1.00029
mono diff: 1 s 286560 ns

ticks: 11068256942 freq: 31250000
diff: 1.00198
mono diff: 1 s 1975936 ns

Here is a bad log, where ethernet was connected. You can actually see the jump from good to bad:

ticks: 13675890585 freq: 31250000
diff: 1.00027
mono diff: 1 s 273568 ns

ticks: 13707143613 freq: 31250000
diff: 1.0001
mono diff: 1 s 97120 ns

ticks: 13737045860 freq: 31250000
diff: 0.956872
mono diff: 0 s 956871776 ns

ticks: 13765907866 freq: 31250000
diff: 0.923584
mono diff: 0 s 923584288 ns

ticks: 13794758638 freq: 31250000
diff: 0.923225
mono diff: 0 s 923224544 ns

ticks: 13823610711 freq: 31250000
diff: 0.923266
mono diff: 0 s 923266400 ns

ticks: 13852462504 freq: 31250000
diff: 0.923257
mono diff: 0 s 923257536 ns

ticks: 13881313581 freq: 31250000
diff: 0.923234
mono diff: 0 s 923234240 ns

I verified the times with the UART send, and the external measurements shows the same values.

It gets really interesting when also measuring CLOCK_MONOTONIC, because then the output will look as follows in the error state:

ticks: 49254724153 freq: 31250000
diff: 0.923417
mono raw diff: 0 s 923418208 ns
mono diff: 1 s 395220 ns

ticks: 49283577845 freq: 31250000
diff: 0.923318
mono raw diff: 0 s 923318112 ns
mono diff: 1 s 286814 ns

So somehow the kernel feels the need to incorrectly adapt the RAW values in case ethernet is connected.

Looks like the problem doesn’t matter with the clocks, it more like the sleep API precise problem.
Please use usleep() to check the timediff == 1s to send char to UART to verify.

I was able to determine the cause of this issue. The problem was that the chrony NTP client was sending incorrect adjustment data to the kernel via the adjtimex system call. I really was not aware that NTP was able to mess up the CLOCK_MONOTONIC this much.

So the issue is not specific to the Jetson.

Thanks for all the help.

1 Like

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.