Warn if robots.txt is accessed #17962

@thejens

Description

#17946 implements a /robots.txt endpoint to stop search engines from crawling Airflow in cases where it is (accidentally) exposed to the public Internet.

If we record GET requests to that endpoint, any hit is a strong signal that the deployment is exposed. We could then show a warning in the UI, or even trigger a kill switch on the deployment.

Some deployments are likely public on purpose and rely on auth mechanisms on the login endpoint, so there should be a config option to suppress the warnings.
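A rough sketch of what the recording and the opt-out could look like, written as a plain WSGI middleware so it does not depend on any Airflow internals. The class name `RobotsAccessMonitor`, the `warn_on_access` flag and the `warning()` helper are all hypothetical names made up for illustration; nothing here is existing Airflow API, and a real implementation would probably persist the hits somewhere instead of keeping them in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional


@dataclass
class RobotsAccessMonitor:
    """Hypothetical WSGI middleware that records GET requests to /robots.txt."""

    app: Callable  # the wrapped WSGI application
    warn_on_access: bool = True  # hypothetical config option; False suppresses the warning
    hits: List[datetime] = field(default_factory=list)  # in-memory only, for illustration

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Record the access, then pass the request through unchanged.
        if (
            environ.get("REQUEST_METHOD") == "GET"
            and environ.get("PATH_INFO") == "/robots.txt"
        ):
            self.hits.append(datetime.now(timezone.utc))
        return self.app(environ, start_response)

    def warning(self) -> Optional[str]:
        """Return a message the UI could display, or None if there is nothing to show."""
        if not self.warn_on_access or not self.hits:
            return None
        return (
            f"/robots.txt was requested {len(self.hits)} time(s), "
            f"most recently at {self.hits[-1].isoformat()}; "
            "this deployment may be reachable from the public Internet."
        )
```

Wrapping the app would then be something like `monitor = RobotsAccessMonitor(wsgi_app)`, with the UI calling `monitor.warning()` when rendering. Deployments that are public on purpose would set `warn_on_access=False`.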

An alternative approach would be to monitor for requests from user agents known to belong to crawlers, which would give the same signal.
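The user-agent variant might look roughly like this. The pattern below is only an illustrative, non-exhaustive list of tokens that well-known crawlers include in their `User-Agent` header, and `is_crawler` is a hypothetical helper name. User agents are trivially spoofed, so a match is a hint, not proof.

```python
import re
from typing import Optional

# Illustrative, non-exhaustive set of tokens found in common crawler User-Agent strings.
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|duckduckbot|baiduspider|yandexbot",
    re.IGNORECASE,
)


def is_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header looks like a known search engine crawler."""
    if not user_agent:
        return False
    return CRAWLER_PATTERN.search(user_agent) is not None
```

A request hook could call `is_crawler(request_headers.get("User-Agent"))` on every request and raise the same warning as a /robots.txt hit would.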

Use case/motivation

People who accidentally expose Airflow have a slightly higher chance of realising they've done so and tightening their security.

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
