@potiuk potiuk commented Oct 14, 2020

Port forwarding during the Kubernetes tests started to behave
erratically: kubectl port-forward sometimes hangs indefinitely
rather than either connecting or failing.
We change the strategy a bit to allocate
increasing port numbers in case something like that happens.
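The retry strategy can be sketched roughly as follows. This is a hypothetical minimal sketch, not the actual Airflow CI script; the base port, service name, function names, and timeouts are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the strategy: each attempt uses a new, higher
# host port, so a previous kubectl that hung on a port cannot block
# the next try. Names and values here are illustrative assumptions.

FORWARD_PORT_BASE=8080     # assumed starting port
MAX_ATTEMPTS=5

forward_port_for_attempt() {
  # Port for attempt N (1-based): base, base+1, base+2, ...
  local attempt="$1"
  echo $(( FORWARD_PORT_BASE + attempt - 1 ))
}

start_port_forward() {
  local attempt port kubectl_pid
  for (( attempt=1; attempt<=MAX_ATTEMPTS; attempt++ )); do
    port="$(forward_port_for_attempt "${attempt}")"
    echo "Attempt ${attempt}: forwarding localhost:${port} -> webserver"
    # Run kubectl in the background and give the tunnel a bounded time
    # to become usable, instead of waiting on a possibly-hung process.
    kubectl port-forward svc/airflow-webserver "${port}:8080" &
    kubectl_pid=$!
    if timeout 10 bash -c \
        "until nc -z localhost ${port}; do sleep 1; done"; then
      echo "Forwarding established on port ${port}"
      return 0
    fi
    # This attempt hung or failed: kill it and move to the next port.
    kill "${kubectl_pid}" 2>/dev/null || true
  done
  return 1
}
```

The key point is that a hung kubectl never gets a second chance on the same port number, so the next attempt cannot collide with it.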



@potiuk potiuk requested review from dimberman, houqp and kaxil October 14, 2020 22:52
@potiuk potiuk force-pushed the more-stable-kubernetes-port-forwarding branch from a4e9a9a to 2782072 Compare October 14, 2020 22:54
@potiuk potiuk requested review from ashb and turbaszek October 14, 2020 23:01
@potiuk potiuk force-pushed the more-stable-kubernetes-port-forwarding branch from 2782072 to 740355d Compare October 14, 2020 23:31
@potiuk
Copy link
Member Author

potiuk commented Oct 15, 2020

Hey @dimberman -> I hope this one will fix the recent problems with kubernetes port-forward stability. There were a lot of problems caused by it recently and I implemented another workaround: increasing the port number between subsequent tries. I think it will solve the problem for good.

@potiuk potiuk force-pushed the more-stable-kubernetes-port-forwarding branch from 740355d to 3ff8087 Compare October 15, 2020 00:34
@potiuk
Copy link
Member Author

potiuk commented Oct 15, 2020

@kaxil @ashb @turbaszek -> this one should solve the Kubernetes problems we started to experience recently. They apparently were not related to the provider split, as I originally suspected, but to some changes in how port forwarding interacts with the GitHub Actions runner. So looking forward to reviews 👍

One more thing and maybe you can help me verify my theory.

I believe GA is kind of reusing workers without full restarts between jobs - that might be the reason for the 137 errors and resource exhaustion, because the machines are not fully cleaned up.

It could be a coincidence, but this is the only explanation for an error I saw yesterday, where some other jobs were affected by the kubectl background processes that we started in other jobs. That was with an earlier version of the fix, which did not yet have the trap that kills (first gently and then forcefully) all kubectl instances running in the background:

https://github.com/apache/airflow/runs/1256383093?check_suite_focus=true

There were seemingly unrelated errors in several other jobs. It looks as if those jobs (theoretically on different machines!) were affected by the hanging kubectls still running in the background, as if the 8080 port continued to be "taken". I am not 100% sure of that, but that is the only explanation I have for this. The errors went completely away when I added the trap to kill the kubectls (in unrelated jobs!).
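The trap described above can be sketched like this. It is a hypothetical minimal version, not the exact Airflow CI code; the function name and the process pattern are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cleanup trap: on exit, first ask all
# background kubectl port-forward processes to terminate (SIGTERM),
# then force-kill (SIGKILL) anything that is still hanging around.
# The pattern matched by pkill is an illustrative assumption.

kill_all_kubectls() {
  # Gentle first: SIGTERM lets kubectl shut down cleanly.
  pkill -f 'kubectl port-forward' 2>/dev/null || true
  sleep 1
  # Forceful second: SIGKILL for any kubectl that ignored SIGTERM.
  pkill -9 -f 'kubectl port-forward' 2>/dev/null || true
}

# Run the cleanup whenever the script exits, for any reason.
trap kill_all_kubectls EXIT
```

Registering the cleanup on EXIT (rather than only on success paths) matters here, because a hung kubectl is exactly the case where the script is likely to bail out early.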

@potiuk potiuk merged commit 3447b55 into apache:master Oct 15, 2020
@potiuk potiuk deleted the more-stable-kubernetes-port-forwarding branch October 15, 2020 09:06
potiuk added a commit that referenced this pull request Nov 14, 2020

(cherry picked from commit 3447b55)
@potiuk potiuk added the type:misc/internal Changelog: Misc changes that should appear in change log label Nov 14, 2020
potiuk added a commit that referenced this pull request Nov 16, 2020

(cherry picked from commit 3447b55)
potiuk added a commit that referenced this pull request Nov 16, 2020

(cherry picked from commit 3447b55)
kaxil pushed a commit that referenced this pull request Nov 18, 2020

(cherry picked from commit 3447b55)
cfei18 pushed a commit to cfei18/incubator-airflow that referenced this pull request Mar 5, 2021

(cherry picked from commit 3447b55)
Labels

area:dev-tools type:misc/internal Changelog: Misc changes that should appear in change log
