Conversation
| tags.datadoghq.com/version: {{ $tag }} | ||
| name: {{ $app }} | ||
| spec: | ||
| serviceName: "" |
| operator: Equal | ||
| value: 'true' | ||
| effect: NoSchedule | ||
| serviceAccountName: ml-k8s-admin |
There was a problem hiding this comment.
don't think this is defined ml-k8s-admin. does {{ include "modelEngine.fullname" . }} provide sufficient permissions?
There was a problem hiding this comment.
iiuc the serviceaccount that the endpoint builder and gateway use should have enough permission
| serviceAccountName: ml-k8s-admin | ||
| volumes: | ||
| - configMap: | ||
| name: ml-worker-config |
There was a problem hiding this comment.
{{ .Values.aws.configMap.name }} ?
| requests: | ||
| cpu: 1000m | ||
| volumeMounts: | ||
| - mountPath: /root/.aws/config |
|
|
||
| SQS_SAMPLE_COUNT = 10 | ||
|
|
||
| logger = make_logger("model_engine_server.core.celery_autoscaler") |
There was a problem hiding this comment.
nit: use logger_name() that you can import as well
| return CELERY_AUTOSCALER_EXCLUDED_NAMESPACES | ||
| except ModuleNotFoundError: | ||
| return [] | ||
| finally: |
| else: | ||
| raise ValueError("broker_type not redis or sqs, how did we get here?") | ||
|
|
||
| env = os.getenv("DD_ENV", "training") |
There was a problem hiding this comment.
do we want to default to training here?
| @@ -0,0 +1,635 @@ | |||
| import asyncio as aio | |||
There was a problem hiding this comment.
could this be moved to the celery/ folder?
| {{- $env := .Values.context }} | ||
| {{- $tag := .Values.tag }} | ||
| {{- $service_identifier := .Values.serviceIdentifier }} | ||
| {{- $num_shards := ternary 30 3 (eq .Values.context "prod") }} |
There was a problem hiding this comment.
the 30 and 3 values feel like they should be configurable in the helm chart
| operator: Equal | ||
| value: 'true' | ||
| effect: NoSchedule | ||
| serviceAccountName: ml-k8s-admin |
There was a problem hiding this comment.
iiuc the serviceaccount that the endpoint builder and gateway use should have enough permission
|
|
||
| for f in dataclasses.fields(CeleryAutoscalerParams): | ||
| k = f.name | ||
| # TODO: this technically can remain the same, since Celery deployments won't have the `channel` field and |
There was a problem hiding this comment.
I guess we can remove this comment probably? there is no NATS autoscaler anymore
| {{- $tag := .Values.tag }} | ||
| {{- $service_identifier := .Values.serviceIdentifier }} | ||
| {{- $num_shards := ternary 30 3 (eq .Values.context "prod") }} | ||
| {{- range $base_app := tuple "celery-autoscaler-elasticache" "celery-autoscaler-sqs" }} |
There was a problem hiding this comment.
it's probably fine to have this range but I think that we'd be able to choose which one based on whether we're having endpoints use redis or sqs at some point
yixu34
left a comment
There was a problem hiding this comment.
Let's parameterize the cost tags.
| metadata: | ||
| labels: | ||
| app: {{ $app }} | ||
| team: infra |
There was a problem hiding this comment.
These tags shouldn't be hardcoded, they're internal.
There was a problem hiding this comment.
see other deployments. something like
{{- include "modelEngine.selectorLabels.celeryAutoScaler" . | nindent 4 }}
{{- include "modelEngine.labels" . | nindent 4 }}
tags.datadoghq.com/service: {{ include "modelEngine.celeryAutoScalerName" . }}
should
There was a problem hiding this comment.
I think the labels are hardcoded in some other deployments (e.g. searching team: infra gives me this file... Do we want the product tag for the autoscaler to change from common to model-engine? (If this is ok, I can reuse modelEngine.labels as suggested 🙂)
There was a problem hiding this comment.
yeah i think that file should parametrized tags. don't see why balloon deployment needs a different product tag
| metadata: | ||
| labels: | ||
| app: {{ $app }} | ||
| team: infra |
There was a problem hiding this comment.
see other deployments. something like
{{- include "modelEngine.selectorLabels.celeryAutoScaler" . | nindent 4 }}
{{- include "modelEngine.labels" . | nindent 4 }}
tags.datadoghq.com/service: {{ include "modelEngine.celeryAutoScalerName" . }}
should
| ad.datadoghq.com/main.logs: '[{"service": "{{ $app }}", "source": "python"}]' | ||
| sidecar.istio.io/inject: "false" | ||
| labels: | ||
| app: {{ $app }} |
There was a problem hiding this comment.
these tags should also be templatized
| # num_shards is number of instances of the autoscaler | ||
| celery_autoscaler: | ||
| enabled: true | ||
| message_broker: sqs |
There was a problem hiding this comment.
this is already defined as the root-level "celeryBrokerType" (can see that at the bottom of this file)
seanshi-scale
left a comment
There was a problem hiding this comment.
had one nit about deduplicating info in the helm values but the rest looks good
Pull Request Summary
Move over the Celery autoscaler
Test Plan and Usage Guide
Installing a test deployment autoscaled an async endpoint from 0 to 1 pod