Add mayhem for API as a github workflow#43249
Conversation
Given the openapi spec and api server, this generates and runs random test payloads, looking for exceptional behavior. Mayhem for API docs: https://mayhem4api.forallsecure.com/docs/ mapi GitHub action docs: https://github.com/marketplace/actions/mayhem-for-api Signed-off-by: J. David Lowe <[email protected]>
|
I feel like I'm missing something important while reading this report -- "Internal Server Error" appears to be the response to most of this API fuzzing, which is kind of what I'd expect the daemon to return when fed otherwise garbage inputs? 🙈 😅 Can you give some idea of how we're intended to interpret this result? Is the implication that we should update our swagger to specifically say that "Internal Server Error" means bad arguments (which IMO is a bit implied?) or that we should be updating our parameters/types to be more specific on what they accept so that the fuzzer doesn't generate quite as many invalid values? (Aren't "invalid" inputs a major part of the point of a fuzzer? 😇) |
|
I guess there's some cases that return a 500, but should've returned a 4xx (most likely because we default to 500 if no specific error was assigned); for example, I see this one with a 500 status; So, the error is handled correctly, but not assigned an |
That's not an unreasonable expectation, and it appears to be true of dockerd, whether or not it's correct ;)
The hard-line position, taken by mapi by default, is that 500 means the server broke in some way, and that invalid inputs, no matter how bonkers, should result in 4xx. (And anecdotally, as someone with an operational role, I value that distinction so that I can have meaningful alerts triggered by 500s!) If that position isn't palatable to this project (understandable!), my suggestion would be to disable the generic |
Agreed (with some nuances); I like using the chart from https://www.codetinkerer.com/2015/12/04/choosing-an-http-status-code.html, which is quite helpful in case of doubt As to "500 means the server broke"; perhaps this is an area where HTTP status codes are not always a good fit for APIs. Generally, I'd describe these as "it's not your fault, it's ours" (the request is likely OK), but (unfortunately) we're not always able to tell if a failure is temporary, a configuration issue (could be an issue with the host), or a (temporary?) failure in one of the components that the daemon depends on (e.g., containerd, runc). Such errors remain to be a "best effort", in which cases, we will "handle" the error (so that the daemon as a whole remains functioning), and return the error information we have as error-message (to allow the user to investigate if needed).
I'm on the fence on this one. I think that many of these 500 errors should be considered a bug. There are some cases where we explicitly return a 500 (using Lines 63 to 66 in c94596a While those logs may help somewhat, the work is definitely not completed (and admitted, not on top of the priority list, unless it's causing breakage); that said, your tool may help to make more of these omissions visible.
This should be possible, to some extend: we could make the API return the stack if an error occurred. I don't think this should be the default behaviour, but we could add an option (perhaps an environment variable) to enable this. It won't be possible to return a stack trace for every error though; Golang errors do not contain a trace by default; we do use the https://github.com/pkg/errors package to generate some of our errors (which provides a stack trace), but that project has been "frozen", and many projects started to remove its use in favor of Golang's "native" error wrapping (which, as far as I'm aware, does not contain stack traces). For fun, I gave this a quick go during my lunch, and pushed it as a PR: #43252 |
Yep! "it's not your fault, it's ours" is a nicer way of phrasing what I was trying to convey with "the server broke in some way" (and those flowcharts are great, thanks!)
Yep, I imagine it would (especially given stack traces; that's exciting!) If the results are interesting, but fixing them isn't a high priority, it might make more sense to integrate on a schedule, rather that on pull requests. I don't love a PR-based integration until the baseline is clean, i.e. you can have some confidence that issues found were introduced in the current PR. |
|
This is really cool. |
- What I did
Added a github action to run Mayhem for API (which is a free automated testing tool for http APIs) against dockerd: given the openapi spec and api server, it generates and runs random test payloads, looking for exceptional behavior.
Disclosure: working on this tool is my day job!
- How I did it
The action builds and runs dockerd listening on a tcp port. There's a little bit of jiggery-pokery waiting for the server to start (without which the action is flaky, which nobody wants.) Then it runs mapi for 60 seconds; the results are reported as SARIF ("Code scanning results").
This is probably the easiest/most straightforward way to integrate. If you like mapi but aren't thrilled about the specifics of this workflow, there are lots of options, and I'd be more than happy to help get something working!
- How to verify it
Merging new workflows across forks is a bit weird. You can see some results in my fork, without doing any other work:
the action: https://github.com/makesoftwaresafe/moby/actions/runs/1850050316
the scan results: https://github.com/makesoftwaresafe/moby/pull/1/checks?check_run_id=5209023377
the details in the mayhem for API ui: https://mayhem4api.forallsecure.com/job/10840
To make it work in the upstream repo, there's a bit of out-of-PR effort to integrate:
- Description for the changelog
"Add mayhem for API as a github workflow"
- A picture of a cute animal (not mandatory but encouraged)