
Some timing metrics are in seconds but reported as milliseconds #20804

@viktorvia

Description


Apache Airflow version

2.2.2

What happened

When Airflow reports timing stats it passes either a timedelta or a raw numeric value. A timedelta is converted automatically to the correct unit of measurement, but a raw value is assumed to already be in the correct unit.

Unfortunately the Stats class, whether backed by statsd.StatsClient or a stub, expects milliseconds, while the Airflow code passes the value in seconds.

As a result, two of the timing metrics are off by a factor of 1000.

This affects dag_processing.last_duration.<dag_file> and smart_sensor_operator.loop_duration.

The rest either pass a timedelta or use a Stats.timer, which measures the duration on its own and is not affected.
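The mismatch can be sketched as follows. The helper below (to_millis is a hypothetical name) mirrors the convention of the statsd client: a timedelta is converted to milliseconds, while a bare number is taken to already be in milliseconds, so a value measured in seconds is under-reported by a factor of 1000.

```python
from datetime import timedelta

def to_millis(value):
    # Mirrors the statsd client's convention: a timedelta is treated as a
    # duration and converted to milliseconds; a bare number is assumed to
    # already be in milliseconds.
    if isinstance(value, timedelta):
        return value.total_seconds() * 1000
    return value

parse_duration = 1.5  # seconds, as measured by the scheduler

# Passing the raw float records 1.5 ms instead of the intended 1500 ms.
assert to_millis(parse_duration) == 1.5

# Passing a timedelta (or multiplying by 1000 first) gives the correct value.
assert to_millis(timedelta(seconds=parse_duration)) == 1500.0
```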

What you expected to happen

All timing metrics should be reported in the correct unit of measurement.

How to reproduce

Run a statsd-exporter and a Prometheus instance to collect the metrics, then compare them to the logs.

For the dag processing metric, the scheduler logs the durations, which can be compared directly to the gathered metric.

Operating System

Linux

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

Using the two services below, with the configs that follow, to process the metrics. The metrics can be viewed in the Prometheus UI on localhost:9090.

    prometheus:
        image: prom/prometheus:v2.32.1
        command:
            - --config.file=/etc/prometheus/config.yml
            - --web.console.libraries=/usr/share/prometheus/console_libraries
            - --web.console.templates=/usr/share/prometheus/consoles
        ports:
            - 9090:9090
        volumes:
            - ./prometheus:/etc/prometheus
    
    statsd-exporter:
        image: prom/statsd-exporter:v0.22.4
        command:
            - --statsd.mapping-config=/tmp/statsd_mapping.yml
        ports:
            - 9102:9102
            - 9125:9125
            - 9125:9125/udp
        volumes:
            - ./prometheus/statsd_mapping.yml:/tmp/statsd_mapping.yml

The prometheus config is:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: airflow_statsd
    scrape_interval: 1m
    scrape_timeout: 30s 
    static_configs:
      - targets:
        - statsd-exporter:9102

The metrics mapping for statsd-exporter is:

mappings:
  - match: "airflow.dag_processing.last_duration.*"
    name: "airflow_dag_processing_last_duration"
    labels:
      dag_file: "$1"

  - match: "airflow.collect_db_tags"
    name: "airflow_collect_db_tags"
    labels: {}

  - match: "airflow.scheduler.critical_section_duration"
    name: "airflow_scheduler_critical_section_duration"
    labels: {}

  - match: "airflow.dagrun.schedule_delay.*"
    name: "airflow_dagrun_schedule_delay"
    labels:
      dag_id: "$1"

  - match: "airflow.dag_processing.total_parse_time"
    name: "airflow_dag_processing_total_parse_time"
    labels: {}

  - match: "airflow.dag_processing.last_run.seconds_ago.*"
    name: "airflow_dag_processing_last_run_seconds_ago"
    labels:
      dag_file: "$1"
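With these mappings in place, the affected metric can be inspected in the Prometheus UI; for example (the dag_file label value below is hypothetical and depends on your DAG files):

```promql
airflow_dag_processing_last_duration{dag_file="example_dag_py"}
```

The value returned here will be roughly 1000 times smaller than the duration the scheduler logs for the same file.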

Anything else

No response

Are you willing to submit PR?

  • Yes, I am willing to submit a PR!
