Description
Apache Airflow version: 1.10.11
Kubernetes version (if you are using kubernetes) (use kubectl version): N/A
Environment:
- Cloud provider or hardware configuration: QEMU
- OS (e.g. from /etc/os-release): Ubuntu 20.04 (and Fedora 32)
- Kernel (e.g. `uname -a`): 5.4.0-42-generic (and 5.6.6-300.fc32.x86_64)
- Install tools: Virtualenv
- Others: Python 2.7.18 (and Python 3.8.3)
What happened:
If Airflow is configured to log in plaintext, it prints log messages with asctime properly set:
[2020-07-22 15:13:13,783] {standard_task_runner.py:54} INFO - Started process 116982 to run task
With the configuration changed to emit JSON, asctime is set to null:
{"asctime": null, "task_id": "runme_0", "execution_date": "2020_07_23T00_00_00_000000", "filename": "standard_task_runner.py", "lineno": 54, "message": "Started process 281332 to run task", "try_number": "1", "levelname": "INFO", "dag_id": "example_bash_operator"}
What you expected to happen:
The asctime field should contain a proper timestamp when the log message is printed in JSON.
How to reproduce it:
Using a standard Airflow installation with the Elasticsearch subpackage installed, follow the instructions for Writing Logs to Elasticsearch. A running Elasticsearch instance is not required to reproduce this bug.
Diff of changes to `airflow.cfg`
--- airflow.cfg
+++ airflow.cfg.new
@@ -9,7 +9,7 @@
# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Set this to True if you want to enable remote logging.
-remote_logging = False
+remote_logging = True
# Users must supply an Airflow connection id that provides access to the storage
# location.
@@ -754,7 +754,7 @@
[elasticsearch]
# Elasticsearch host
-host =
+host = localhost:9200
# Format of the log_id, which is used to query for a given tasks logs
log_id_template = {dag_id}-{task_id}-{execution_date}-{try_number}
@@ -768,10 +768,10 @@
frontend =
# Write the task logs to the stdout of the worker, rather than the default files
-write_stdout = False
+write_stdout = True
# Instead of the default log formatter, write the log lines as JSON
-json_format = False
+json_format = True
# Log fields to also attach to the json output, if enabled
json_fields = asctime, filename, lineno, levelname, message
Run the `example_bash_operator` DAG to generate log output. Check log files or stdout for JSON output.
Anything else we need to know:
I was able to reproduce this consistently regardless of the environment I was testing in; however, I did not try setting the configuration via environment variables, nor did I try running it in any containerized environment (such as Kubernetes).
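A possible workaround, sketched below and untested: subclass the Elasticsearch JSON formatter so it stamps asctime itself before serializing, then point the task handler's logging config at the subclass. This assumes the 1.10.11 class lives at `airflow.utils.log.json_formatter.JSONFormatter` and behaves like a stock `logging.Formatter`:

```python
from airflow.utils.log.json_formatter import JSONFormatter


class TimestampedJSONFormatter(JSONFormatter):
    """Hypothetical workaround: set asctime explicitly so the JSON
    serializer finds a real timestamp instead of a missing attribute."""

    def format(self, record):
        # formatTime() is the stock logging.Formatter helper that normally
        # fills in asctime when a %(asctime)s format string is in use.
        record.asctime = self.formatTime(record, self.datefmt)
        return super(TimestampedJSONFormatter, self).format(record)
```

The Python-2-compatible `super()` call is deliberate, since the bug reproduces on Python 2.7.18 as well.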