Writing Logs to Elasticsearch: asctime field in JSON log output set to null #10024

@rgrizzell

Description

Apache Airflow version: 1.10.11

Kubernetes version (if you are using Kubernetes) (use `kubectl version`): N/A

Environment:

  • Cloud provider or hardware configuration: QEMU
  • OS (e.g. from /etc/os-release): Ubuntu 20.04 (and Fedora 32)
  • Kernel (e.g. uname -a): 5.4.0-42-generic (and 5.6.6-300.fc32.x86_64)
  • Install tools: Virtualenv
  • Others: Python 2.7.18 (and Python 3.8.3)

What happened:

If Airflow is set to log in plaintext, it prints log messages with asctime properly set:
[2020-07-22 15:13:13,783] {standard_task_runner.py:54} INFO - Started process 116982 to run task

With the configuration changed to emit JSON, asctime is set to null:
{"asctime": null, "task_id": "runme_0", "execution_date": "2020_07_23T00_00_00_000000", "filename": "standard_task_runner.py", "lineno": 54, "message": "Started process 281332 to run task", "try_number": "1", "levelname": "INFO", "dag_id": "example_bash_operator"}

What you expected to happen:

The asctime field should contain a proper timestamp when the log message is printed in JSON.

How to reproduce it:

Using a default Airflow installation with the Elasticsearch subpackage installed, follow the instructions for Writing Logs to Elasticsearch. A running Elasticsearch instance is not required to reproduce this bug.

Diff of changes to `airflow.cfg`
--- airflow.cfg
+++ airflow.cfg.new
@@ -9,7 +9,7 @@
 
 # Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
 # Set this to True if you want to enable remote logging.
-remote_logging = False
+remote_logging = True
 
 # Users must supply an Airflow connection id that provides access to the storage
 # location.
@@ -754,7 +754,7 @@
 
 [elasticsearch]
 # Elasticsearch host
-host =
+host = localhost:9200
 
 # Format of the log_id, which is used to query for a given tasks logs
 log_id_template = {dag_id}-{task_id}-{execution_date}-{try_number}
@@ -768,10 +768,10 @@
 frontend =
 
 # Write the task logs to the stdout of the worker, rather than the default files
-write_stdout = False
+write_stdout = True
 
 # Instead of the default log formatter, write the log lines as JSON
-json_format = False
+json_format = True
 
 # Log fields to also attach to the json output, if enabled
 json_fields = asctime, filename, lineno, levelname, message

Run the `example_bash_operator` DAG to generate log output, then check the log files or stdout for the JSON lines.
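
For triage, here is a sketch of a possible workaround: force the timestamp onto the record before serialization. This assumes the formatter lives at `airflow.utils.log.json_formatter.JSONFormatter`, as it does in 1.10.11, and it would still need to be wired into a custom logging config:

```python
from airflow.utils.log.json_formatter import JSONFormatter

class TimestampedJSONFormatter(JSONFormatter):
    """Sketch of a workaround: explicitly stamp the record so the
    asctime field in the JSON output is no longer null."""

    def format(self, record):
        # Stdlib logging only sets record.asctime when the format string
        # references it; set it unconditionally before serializing.
        record.asctime = self.formatTime(record, self.datefmt)
        return super(TimestampedJSONFormatter, self).format(record)
```

Swapping this class in for `JSONFormatter` in a copy of the default logging config should, as far as I can tell, yield a populated asctime field.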

Anything else we need to know:

I was able to reproduce this consistently regardless of the environment I was testing in. However, I did not try setting the configuration via environment variables, nor did I run it in any sort of containerized environment (such as Kubernetes).
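
For anyone who does want to try the environment-variable route, the same settings should map through Airflow's `AIRFLOW__{SECTION}__{KEY}` convention. A sketch (untested here, and assuming `remote_logging` still lives under `[core]` in 1.10.11):

```python
import os

# Same settings as the airflow.cfg diff above, expressed through Airflow's
# AIRFLOW__{SECTION}__{KEY} environment-variable convention. These would
# normally be exported in the worker's environment before Airflow starts.
os.environ["AIRFLOW__CORE__REMOTE_LOGGING"] = "True"
os.environ["AIRFLOW__ELASTICSEARCH__HOST"] = "localhost:9200"
os.environ["AIRFLOW__ELASTICSEARCH__WRITE_STDOUT"] = "True"
os.environ["AIRFLOW__ELASTICSEARCH__JSON_FORMAT"] = "True"
```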
