Skip to content

telemetry: use native stats by default#41991

Merged
istio-testing merged 2 commits intoistio:masterfrom
kyessenov:drop_wasm_stats
Nov 22, 2022
Merged

telemetry: use native stats by default#41991
istio-testing merged 2 commits intoistio:masterfrom
kyessenov:drop_wasm_stats

Conversation

@kyessenov
Copy link
Copy Markdown
Contributor

@kyessenov kyessenov commented Nov 14, 2022

Signed-off-by: Kuat Yessenov [email protected]

Follow up to #41441 but for non-telemetry API installation (default), until a plan to default to telemetry API emerges. This should give sufficient burn time in CI, and avoid support for rarely used Wasm version.

@kyessenov kyessenov requested a review from a team as a code owner November 14, 2022 21:16
@istio-testing istio-testing added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 14, 2022
@kyessenov kyessenov added the release-notes-none Indicates a PR that does not require release notes. label Nov 14, 2022
Copy link
Copy Markdown
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM just wanted to verify - are all the overrides possible compatible with native vs WASM? Or are there some obscure parts of the config that are not implemented/compatible?

@kyessenov
Copy link
Copy Markdown
Contributor Author

@howardjohn What exactly do you mean by overrides? Some CEL expressions that refer to Wasm attributes are not compatible, because it's not Wasm. Those are not documented, so unlikely to actually be in-use.

@howardjohn
Copy link
Copy Markdown
Member

I mean the .Values.telemetry.v2.prometheus.configOverride.outboundSidecar. I think we had established all fields exposed via Telemetry are compatible, but the configOverride can access arbitrary fields in the internal proto.

Signed-off-by: Kuat Yessenov <[email protected]>
@kyessenov kyessenov requested a review from a team as a code owner November 14, 2022 22:30
@kyessenov
Copy link
Copy Markdown
Contributor Author

The schema is the same for wasm and stats filter, but I marked some fields removed. Seems like istio fails on the removed xDS fields, so we need to set the flag.

@kyessenov
Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@zirain
Copy link
Copy Markdown
Member

zirain commented Nov 16, 2022

/retest

@zirain
Copy link
Copy Markdown
Member

zirain commented Nov 17, 2022

/retest

1 similar comment
@zirain
Copy link
Copy Markdown
Member

zirain commented Nov 17, 2022

/retest

@kyessenov
Copy link
Copy Markdown
Contributor Author

@zirain argh, another metric for istio version needs to be added. not sure why this test fails now only.

@zirain
Copy link
Copy Markdown
Member

zirain commented Nov 17, 2022

@zirain argh, another metric for istio version needs to be added. not sure why this test fails now only.

IC

istioctl x es sleep-69cfb4968f-h72rz | grep .istio_build
wasmcustom.component.proxy.tag.1.15.3;.istio_build: 1

@kyessenov
Copy link
Copy Markdown
Contributor Author

/retest

Copy link
Copy Markdown
Member

@zirain zirain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kyessenov Why do we still need the envoyfilter? Couldn't move it to native implement?

@kyessenov
Copy link
Copy Markdown
Contributor Author

@hzxuzhonghu We may not need this if #37645 gets done, but noone is assigned to that issue, so until then, this will give us some burn time with EnvoyFilters.

@istio-testing istio-testing merged commit 639632b into istio:master Nov 22, 2022
Comment thread pkg/envoy/proxy.go
// than the flush interval.
"--file-flush-interval-msec", "1000",
"--disable-hot-restart", // We don't use it, so disable it to simplify Envoy's logic
"--allow-unknown-static-fields",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed? Are there unknown fields now that we are accepting?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, because Wasm used a JSON parser that ignored fields with typos so I'm fairly sure some metric overrides had ill-defined fields.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "static" field flags affects ECDS is still not clear to me, likely a bug somewhere.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried we are going to miss real bugs now for this small telemetry case. But at the same time it would not be great if users with broken overrides pod's don't startup when they upgrade.

On the other hand, they would probably immediately find the issue and rollback/fix I would imagine if we did hard fail...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still get a warning for unknown fields.

I think we have to support unknown fields for automatic upgrades, there's just no other way to introduce new fields without putting conditions everywhere.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this is only for static config, right? So over XDS we are fine; static is tightly coupled with the Envoy version since the bootsrtap is generated by pilot-agent which is in the same container... except now we have bootstrap DS so maybe this makes sense now.

I am a bit confused how this dynamic XDS is impacted by allow-unknown-static-fields which I thought was only for startup bootstrap config

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused how this dynamic XDS is impacted by allow-unknown-static-fields which I thought was only for startup bootstrap config

Me, too, it's probably a bug but it's already shipped so we have to live with it.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. If its a bug might be worth filing an envoy issue? but this lgtm for now - even with it fixed, for Bootstrap DS we may want it.

kyessenov added a commit to kyessenov/istio that referenced this pull request Jan 5, 2023
* telemetry: use native stats by default

Signed-off-by: Kuat Yessenov <[email protected]>

* removed fields

Signed-off-by: Kuat Yessenov <[email protected]>

Signed-off-by: Kuat Yessenov <[email protected]>
istio-testing pushed a commit that referenced this pull request Jan 5, 2023
* update envoy

Signed-off-by: Kuat Yessenov <[email protected]>

* fix

Signed-off-by: Kuat Yessenov <[email protected]>

* fixes

Signed-off-by: Kuat Yessenov <[email protected]>

* test

Signed-off-by: Kuat Yessenov <[email protected]>

* telemetry: use native stats by default (#41991)

* telemetry: use native stats by default

Signed-off-by: Kuat Yessenov <[email protected]>

* removed fields

Signed-off-by: Kuat Yessenov <[email protected]>

Signed-off-by: Kuat Yessenov <[email protected]>

* fix

Signed-off-by: Kuat Yessenov <[email protected]>

* make gen

Signed-off-by: Kuat Yessenov <[email protected]>

Signed-off-by: Kuat Yessenov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes-none Indicates a PR that does not require release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants