KEDA operator works slowly (or doesn't work) when some ScaledObject reconciliation produces timeouts #5083

@JorTurFer

Report

We recently ran into unexpected behavior after accidentally deploying a ScaledObject that targets a server producing timeouts (in our case, because of a network firewall). When that happened, KEDA stopped working or started behaving incorrectly.

I replicated the scenario by deploying a ScaledObject with a cron trigger that adds or removes 1 instance every minute. I then deployed a ScaledObject with a Kafka scaler targeting a Kafka cluster that is blocked at the network level, which produces timeouts during the initial setup in the reconciliation loop:
[image]

After this, KEDA stopped working because the operator was stuck in the reconciliation loop, even though I had set GOMAXPROCS: 8, so it's not a matter of routines. It looks as if there is a deadlock at some point.
The only fix in this case was to remove the misconfigured ScaledObject.

Our internal deployment runs v2.11.2, so the problem is not specific to the latest version.

Expected Behavior

KEDA should not suffer a service disruption because of a single misconfigured ScaledObject.

Actual Behavior

KEDA stops working properly.

Metadata

Labels

bug: Something isn't working

Projects

Status

Ready To Ship
