-
Notifications
You must be signed in to change notification settings - Fork 844
Fix trivy scan failures w/ otel upgrades #5906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tracing/tracing.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @aluzzardi the jaeger exporter is deprecated, jaeger now says to just use plain otel. I had to add this to get lint passing.
Is this code still worth maintaining? I don't remember if it's from older iterations on metrics/logging or if it's still in use.
|
@helderco any clue why python tests would be failing consistently with this error all of a sudden? https://github.com/dagger/dagger/actions/runs/6541678095/job/17763530250?pr=5906#step:4:938 Doesn't seem related to any changes here (? at least I can't see how it could be). Hoping it's just some ephemeral thing, but let me know if you think it seems legit. |
|
the dagger-in-dagger engine tests are failing only on github-paid-runner consistently, all other tests are okay though 😕 can repro locally, |
vito
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving on principle so you can merge once CI is sorted out 🫡
If there's mismatches between dependency versions on otel libs between us and buildkit (and its various deps), then we can get schema merge errors. While sad and worth notifying the user about via a log statement, these are not actually fatal, so it's not worth exiting over. Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
This is a bit silly. The previous update to log otel failures caused the
word "merge" to appear in the output (it's an error about merging otel
schemas), which in turn caused our test assertion to fail because it's
checking that no "merge" (-op) vertexes show up in the output by
default.
There's not really any great way to make the test more robust than I can
see other than just asserting on a more specific string, "merge (". It's
important to have this test coverage though, so this seems like the best
solution.
Signed-off-by: Erik Sipsma <[email protected]>
|
Looking into the dagger-in-dagger engine tests failing on github-paid-runner, it seems it may be a legit problem (glad we have these tests if so!) I saw from a combination of SIGQUIT stacks and and that it kept spawning a short lived Unfortunately, that actual output gets entirely hidden (I'm not sure it's even possible for us to fix that without upstream changes), but with strace I could see that it seems to be printing the same So my best guess is that while I updated our code to ignore that error in the engine + cli, the I haven't pieced together every strand of spaghetti to explain exactly why this would only happen with the If the above is correct, then our options are:
cc @jedevc in case you have any ideas too |
|
Happy update: I gave a quick shot at just pulling I'm guessing that the deps w/ vulnerabilities only appear in the "server-side" rather than client side, so this makes sense. I think we should just go with this for now. Cleaning up the commit w/ comments on what's going on, then will merge if everything is green still 🤞 |
See comment in `buildctlBin` func in `internal/mage/util/engine.go` for reasoning. Signed-off-by: Erik Sipsma <[email protected]>
I suspect this is a conflict inside the cache volumes. We need to at least have different caches per python version. |
* engine: upgrade otel deps to address trivy scan result https://github.com/dagger/dagger/security/dependabot/381 Signed-off-by: Erik Sipsma <[email protected]> * engine: log+ignore otel tracer errors If there's mismatches between dependency versions on otel libs between us and buildkit (and its various deps), then we can get schema merge errors. While sad and worth notifying the user about via a log statement, these are not actually fatal, so it's not worth exiting over. Signed-off-by: Erik Sipsma <[email protected]> * core: ignore lint failure about deprecated jaeger exporter import Signed-off-by: Erik Sipsma <[email protected]> * assert on more specific message in internal pipeline test This is a bit silly. The previous update to log otel failures caused the word "merge" to appear in the output (it's an error about merging otel schemas), which in turn caused our test assertion to fail because it's checking that no "merge" (-op) vertexes show up in the output by default. There's not really any great way to make the test more robust than I can see other than just asserting on a more specific string, "merge (". It's important to have this test coverage though, so this seems like the best solution. Signed-off-by: Erik Sipsma <[email protected]> * ci: use buildctl binary from upstream image See comment in `buildctlBin` func in `internal/mage/util/engine.go` for reasoning. Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]> Signed-off-by: Christian Schlatter <[email protected]>
* engine: upgrade otel deps to address trivy scan result https://github.com/dagger/dagger/security/dependabot/381 Signed-off-by: Erik Sipsma <[email protected]> * engine: log+ignore otel tracer errors If there's mismatches between dependency versions on otel libs between us and buildkit (and its various deps), then we can get schema merge errors. While sad and worth notifying the user about via a log statement, these are not actually fatal, so it's not worth exiting over. Signed-off-by: Erik Sipsma <[email protected]> * core: ignore lint failure about deprecated jaeger exporter import Signed-off-by: Erik Sipsma <[email protected]> * assert on more specific message in internal pipeline test This is a bit silly. The previous update to log otel failures caused the word "merge" to appear in the output (it's an error about merging otel schemas), which in turn caused our test assertion to fail because it's checking that no "merge" (-op) vertexes show up in the output by default. There's not really any great way to make the test more robust than I can see other than just asserting on a more specific string, "merge (". It's important to have this test coverage though, so this seems like the best solution. Signed-off-by: Erik Sipsma <[email protected]> * ci: use buildctl binary from upstream image See comment in `buildctlBin` func in `internal/mage/util/engine.go` for reasoning. Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]>
* engine: upgrade otel deps to address trivy scan result https://github.com/dagger/dagger/security/dependabot/381 Signed-off-by: Erik Sipsma <[email protected]> * engine: log+ignore otel tracer errors If there's mismatches between dependency versions on otel libs between us and buildkit (and its various deps), then we can get schema merge errors. While sad and worth notifying the user about via a log statement, these are not actually fatal, so it's not worth exiting over. Signed-off-by: Erik Sipsma <[email protected]> * core: ignore lint failure about deprecated jaeger exporter import Signed-off-by: Erik Sipsma <[email protected]> * assert on more specific message in internal pipeline test This is a bit silly. The previous update to log otel failures caused the word "merge" to appear in the output (it's an error about merging otel schemas), which in turn caused our test assertion to fail because it's checking that no "merge" (-op) vertexes show up in the output by default. There's not really any great way to make the test more robust than I can see other than just asserting on a more specific string, "merge (". It's important to have this test coverage though, so this seems like the best solution. Signed-off-by: Erik Sipsma <[email protected]> * ci: use buildctl binary from upstream image See comment in `buildctlBin` func in `internal/mage/util/engine.go` for reasoning. Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]>
* engine: upgrade otel deps to address trivy scan result https://github.com/dagger/dagger/security/dependabot/381 Signed-off-by: Erik Sipsma <[email protected]> * engine: log+ignore otel tracer errors If there's mismatches between dependency versions on otel libs between us and buildkit (and its various deps), then we can get schema merge errors. While sad and worth notifying the user about via a log statement, these are not actually fatal, so it's not worth exiting over. Signed-off-by: Erik Sipsma <[email protected]> * core: ignore lint failure about deprecated jaeger exporter import Signed-off-by: Erik Sipsma <[email protected]> * assert on more specific message in internal pipeline test This is a bit silly. The previous update to log otel failures caused the word "merge" to appear in the output (it's an error about merging otel schemas), which in turn caused our test assertion to fail because it's checking that no "merge" (-op) vertexes show up in the output by default. There's not really any great way to make the test more robust than I can see other than just asserting on a more specific string, "merge (". It's important to have this test coverage though, so this seems like the best solution. Signed-off-by: Erik Sipsma <[email protected]> * ci: use buildctl binary from upstream image See comment in `buildctlBin` func in `internal/mage/util/engine.go` for reasoning. Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]>
* engine: upgrade otel deps to address trivy scan result https://github.com/dagger/dagger/security/dependabot/381 Signed-off-by: Erik Sipsma <[email protected]> * engine: log+ignore otel tracer errors If there's mismatches between dependency versions on otel libs between us and buildkit (and its various deps), then we can get schema merge errors. While sad and worth notifying the user about via a log statement, these are not actually fatal, so it's not worth exiting over. Signed-off-by: Erik Sipsma <[email protected]> * core: ignore lint failure about deprecated jaeger exporter import Signed-off-by: Erik Sipsma <[email protected]> * assert on more specific message in internal pipeline test This is a bit silly. The previous update to log otel failures caused the word "merge" to appear in the output (it's an error about merging otel schemas), which in turn caused our test assertion to fail because it's checking that no "merge" (-op) vertexes show up in the output by default. There's not really any great way to make the test more robust than I can see other than just asserting on a more specific string, "merge (". It's important to have this test coverage though, so this seems like the best solution. Signed-off-by: Erik Sipsma <[email protected]> * ci: use buildctl binary from upstream image See comment in `buildctlBin` func in `internal/mage/util/engine.go` for reasoning. Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]>
PRs and CI on main are now failing due to trivy finding vulns in our engine image, due to a new GHSA around otel: https://github.com/dagger/dagger/security/dependabot/381
Unfortunately, simply bumping otel libs is never "simple". After bumping our go.mod, the engine and CLI started exiting on startup w/ error
cannot merge resource due to conflicting Schema URL.I'd be lying if I said I understood all the inner machinations of otel, but from my understanding this is due to the fact that buildkit (and other transitive dependencies) have not yet upgraded their versions of otel libs. Due to the fact that these tracers are global to the process, we end up with mismatches in some versions and these failures.
There is a PR in buildkit open already to update their versions, but it stalled because they don't want to upgrade until their dep, containerd, has: moby/buildkit#4318
So, the solution here is to just log these errors rather than fail+exit because of them. If there's a better approach, I'm all ears 🙂