Skip to content

Conversation

@kaxil
Copy link
Member

@kaxil kaxil commented Oct 24, 2020

closes: #11146

Adds a CLI command airflow kubernetes cleanup-pods --namespace NAMESPACE
to Clean up Kubernetes pods in evicted/failed/succeeded states for the Helm Chart

cc @alikhtag


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

closes: apache#11146

Adds a CLI command `airflow kubernetes cleanup-pods --namespace NAMESPACE`
to Clean up Kubernetes pods in evicted/failed/succeeded states for the Helm Chart
@kaxil kaxil requested review from ashb, dimberman and mik-laj October 24, 2020 00:24
@kaxil kaxil mentioned this pull request Oct 24, 2020
@kaxil
Copy link
Member Author

kaxil commented Oct 24, 2020

CI is green @mik-laj

config.load_incluster_config()
core_v1 = client.CoreV1Api()
print(f'Listing pods in namespace {namespace}')
pod_list = core_v1.list_namespaced_pod(namespace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On large clusters, retrieving the collection of some resource types may result in very large responses that can impact the server and client. For instance, a cluster may have tens of thousands of pods, each of which is 1-2kb of encoded JSON. Retrieving all pods across all namespaces may result in a very large response (10-20MB) and consume a large amount of server resources.
Should we use chunking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for a specific namespace, but yes chunking would be good - do you have any examples to hand?

Copy link
Member

@mik-laj mik-laj Oct 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continue_token = None
while True:
    pod_list = core_v1.list_namespaced_pod(namespace, limit=500, _continue=continue_token)
    for pod in pod__list.items:
        # Process item
    continue_token = pod_list.metadata._continue
    if not continue_token:
        break    

Not tested, but should work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 5e4a078

@mik-laj
Copy link
Member

mik-laj commented Oct 24, 2020

Are we planning to backport this changr to Airflow 1.10? if not, can you add documentation that this option is not supported in Airflow 1.10. we maintain compatible with Airflow 1.10 and 2.0, so when this is not possible, we shoyld have documentation to inform the user about these limitations.

imagePullPolicy: {{ .Values.images.airflow.pullPolicy }}
# Don't use entry point here, we don't need to wait on pg-bouncer etc being available.
command: ["airflow-cleanup-pods", "--namespace={{ .Release.Namespace }}"]
args: ["kubernetes", "cleanup-pods", "--namespace={{ .Release.Namespace }}"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment left: #11802 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I am planning to backport this, will add 1.10.13 milestone

@kaxil kaxil added this to the Airflow 1.10.13 milestone Nov 3, 2020
@kaxil kaxil requested a review from mik-laj November 3, 2020 13:31
@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Nov 3, 2020
@github-actions
Copy link

github-actions bot commented Nov 3, 2020

The PR should be OK to be merged with just subset of tests as it does not modify Core of Airflow. The committers might merge it or can add a label 'full tests needed' and re-run it to run all tests if they see it is needed!

@kaxil kaxil merged commit 980c725 into apache:master Nov 3, 2020
@kaxil kaxil deleted the cleanup-pods-cli-helm-chart branch November 3, 2020 15:28
potiuk pushed a commit that referenced this pull request Nov 15, 2020
potiuk pushed a commit that referenced this pull request Nov 16, 2020
potiuk pushed a commit that referenced this pull request Nov 16, 2020
kaxil added a commit that referenced this pull request Nov 18, 2020
@kaxil kaxil modified the milestones: Airflow 1.10.13, Airflow 1.10.14 Dec 1, 2020
kaxil added a commit that referenced this pull request Dec 3, 2020
ashb pushed a commit that referenced this pull request Dec 3, 2020
kaxil added a commit that referenced this pull request Dec 3, 2020
kaxil added a commit that referenced this pull request Dec 3, 2020
AntonyRileyAtVerto pushed a commit to vertoanalytics/incubator-airflow that referenced this pull request Feb 2, 2021
- BugFix: Tasks with ``depends_on_past`` or ``task_concurrency`` are stuck (apache#12663)
- Fix issue with empty Resources in executor_config (apache#12633)
- Fix: Deprecated config ``force_log_out_after`` was not used (apache#12661)
- Fix empty asctime field in JSON formatted logs (apache#10515)
- [AIRFLOW-2809] Fix security issue regarding Flask SECRET_KEY (apache#3651)
- [AIRFLOW-2884] Fix Flask SECRET_KEY security issue in www_rbac (apache#3729)
- [AIRFLOW-2886] Generate random Flask SECRET_KEY in default config (apache#3738)
- Add missing comma in setup.py (apache#12790)
- Bugfix: Unable to import Airflow plugins on Python 3.8 (apache#12859)
- Fix setup.py missing comma in ``setup_requires`` (apache#12880)
- Don't emit first_task_scheduling_delay metric for only-once dags (apache#12835)

- Update setup.py to get non-conflicting set of dependencies (apache#12636)
- Rename ``[scheduler] max_threads`` to ``[scheduler] parsing_processes`` (apache#12605)
- Add metric for scheduling delay between first run task & expected start time (apache#9544)
- Add new-style 2.0 command names for Airflow 1.10.x (apache#12725)
- Add Kubernetes cleanup-pods CLI command for Helm Chart (apache#11802)
- Don't let webserver run with dangerous config (apache#12747)
- Replace pkg_resources with importlib.metadata to avoid VersionConflict errors (apache#12694)

- Clarified information about supported Databases
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021

def _delete_pod(name, namespace):
"""Helper Function for cleanup_pods"""
core_v1 = client.CoreV1Api()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use get_kube_client here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup we should

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:CLI area:helm-chart Airflow Helm Chart okay to merge It's ok to merge this PR as it does not require more tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cleanup CronJob fails

3 participants