Add detection of buggy DEP cursors#1016
Conversation
|
I would be careful with this. Resetting the cursor and re-doing a fetch will cause all devices to re-sync. Would it be better to just go into the normal polling loop when this condition is detected? |
|
Yeah, I'm not 100% sure this is the way to go either, but it does follow the existing behavior when the cursor is explicitly invalid (which one could argue is the case here as well). By "normal polling loop" do you mean use the fetch request instead of the sync request? Presumably, it's going to have the same issue with the cursor (either returns the same one, or we delete it and sync all the devices again). An alternative might be to simply detect the issue (the cursor being the same), and exit the loop and log a message letting the user know what's going on. Then provide a way for the user to manually delete the cursor if they want - either provide docs for removing it from the db, or, even better, an API endpoint to do it. |
No I mean try and avoid doing a fetch. Keep the cursor and the sync mode - basically just pretend like the "more" condition isn't true and don't |
|
Ah, yeah that would be much less aggressive. I do wonder if Apple would eventually return the right thing, though, or if the sync would get stuck (possibly forever?) at that point. Maybe we should also log a message if the cursor is the same and When I get a free moment, I'll update this PR to do that instead. |
f57a540 to
bb14556
Compare
|
I pushed a new change to just exit the sync early and log an error message. |
|
Thinking about this more: I wonder if we're getting a cursor exhausted message for this and that's a clue that we should be ignoring this cursor. 🤔 |
|
In this particular case, the API responded 200 OK, more_to_follow=true, and with the same cursor used in the request. (If it had responded with any other HTTP status, the existing code would have errored out.) Based on Apple's docs, that's incorrect behavior from Apple's side for sure. |
Well, the logs in the Slack thread are inconclusive because I want to see the preceding Fetch runs but I'm curious if it got a cursor exhausted on the fetch run (up in line 289 in this PR) and then every next subsequent sync was using (or not handling) an exhausted cursor? Maybe? Just sorta thinking out loud. I guess the logic works either way; but if that's expected behavior maybe the logging could be omitted? Dunno, I'd have to see it in action I guess. I also ask because NanoDEP's sync logic is slightly different: |
Recently on the MacAdmins Slack @trekkim described an issue where the DEP syncer was syncing several times a second.
On closer inspection, Apple's response to the fetch devices endpoint used the same cursor sent in the request, and set the
more_to_followfield to true. This caused MicroMDM to make the same request over and over again in a tight loop.Ultimately the fix was to delete the "bad" cursor out of the database and start a new sync.
Ideally, Apple would fix this behavior on their side. Until then, this PR adds detection (return same cursor and
more_to_follow=true) and correction (fetch with an empty cursor) for this situation.