Skip to content

Fix flaky test TestServiceUpdateSecrets#38499

Merged
cpuguy83 merged 2 commits intomoby:masterfrom
olljanat:change_serviceIsUpdated2
Jan 9, 2019
Merged

Fix flaky test TestServiceUpdateSecrets#38499
cpuguy83 merged 2 commits intomoby:masterfrom
olljanat:change_serviceIsUpdated2

Conversation

@olljanat
Copy link
Contributor

@olljanat olljanat commented Jan 5, 2019

- What I did
Alternative for #38474 to solve #37547

- How I did it
Took original commit from #38474 as baseline and added new serviceSpecIsUpdated() function which checks only that service version is updated when only updating service labels.

- How to verify it
Run CI builds couple of times and survive without TestServiceUpdateSecrets or TestServiceUpdateConfigs failures.

@codecov
Copy link

codecov bot commented Jan 5, 2019

Codecov Report

Merging #38499 into master will decrease coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #38499      +/-   ##
==========================================
- Coverage   36.65%   36.62%   -0.04%     
==========================================
  Files         608      608              
  Lines       45174    45174              
==========================================
- Hits        16560    16544      -16     
- Misses      26329    26341      +12     
- Partials     2285     2289       +4

@olljanat
Copy link
Contributor Author

olljanat commented Jan 5, 2019

(reserved for my derek commands)

@derek derek bot added the area/testing label Jan 5, 2019
@olljanat
Copy link
Contributor Author

olljanat commented Jan 5, 2019

Wow 😮 this PR passed CI on first try. Haven't see that happening earlier.

Anyway as purpose is see if those flaky tests are fixed I will do force push to see if they passes second time.

@olljanat
Copy link
Contributor Author

olljanat commented Jan 5, 2019

Only RS1 failed on second run. Looks nice.
@thaJeztah @vdemeester PTAL

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some thoughts, but happy to get feedback on those 👍

@olljanat olljanat force-pushed the change_serviceIsUpdated2 branch from 457fcd7 to d5cc0ad Compare January 8, 2019 17:58
Tests which will re-deploy containers uses function serviceIsUpdated() to
make sure that service update really reached state UpdateStateCompleted.

Tests which will not re-deploy container uses function
serviceSpecIsUpdated to make sure that service version is increased.

Signed-off-by: Olli Janatuinen <[email protected]>
@olljanat olljanat force-pushed the change_serviceIsUpdated2 branch from d5cc0ad to b868ada Compare January 8, 2019 18:01
@olljanat olljanat changed the title integration: wait for service update to be completed - alternative (num 2) Fix flaky test TestServiceUpdateSecrets Jan 8, 2019
@olljanat
Copy link
Contributor Author

olljanat commented Jan 8, 2019

@kolyshkin you maybe also want to look this?
@cpuguy83 promised to do some checks for it (conversation is on slack)

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM to make our CI work (thanks!!)

But we should dig into this indeed to see why we can't rely on service.UpdateStatus for this; this feels like a band-aid for either a bug, or an issue in the design

@olljanat
Copy link
Contributor Author

olljanat commented Jan 9, 2019

@thaJeztah on that case IMO correct approach would be add similar label modify tests to swarmkit side.

@cpuguy83
Copy link
Member

cpuguy83 commented Jan 9, 2019

This is interesting that updating labels actually even sets the update status to nil.

@cpuguy83
Copy link
Member

cpuguy83 commented Jan 9, 2019

In swarmkit when the service is updated in the store, the UpdateStatus is always set to nil. I guess since there's nothing left to reconcile no update status needs to be set.

@cpuguy83
Copy link
Member

cpuguy83 commented Jan 9, 2019

This isServiceUpdated seems a bit weak. I think we should pass in the original spec version and make sure the new index version is > the original just like we do in isServiceSpecUpdated.

@olljanat
Copy link
Contributor Author

olljanat commented Jan 9, 2019

@cpuguy83 can be done but I would still prefer to merge this and create another PR for those improvements so we get CI working again.

UpdateStatus is set to Nil on https://github.com/docker/swarmkit/blob/0091d9285168040649140c76629e4284857e5365/manager/controlapi/service.go#L827-L828

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The old implementation just looks incorrect for what it was doing.

@cpuguy83 cpuguy83 merged commit 9dd4341 into moby:master Jan 9, 2019
@thaJeztah
Copy link
Member

Looking at that code; #38499 (comment)

As part of the Service update, controlapi.UpdateService() sets Updatestatus to nil to indicate that an update is in progress; after that the updated service is written to the store (store.UpdateService()), which updates the UpdatedAt and Version.

Updater.Run() checks if any tasks are "dirty" and need updating; https://github.com/docker/swarmkit/blob/486ad69951868792367a503042ef3f55b399cc40/manager/orchestrator/update/updater.go#L137-L141

	var dirtySlots []orchestrator.Slot
	for _, slot := range slots {
		if u.isSlotDirty(slot) {
			dirtySlots = append(dirtySlots, slot)
		}
	}

Some changes on a service don't require updates to the task (container); to check if updates are needed, these functions are called:

If changes are needed, len(dirtySlots) > 0, in which case and update is started (if no update is in progress yet). From that part of the code, it's clear that service.UpdateStatus == nil is required to even consider starting an update. Updater.startUpdate() is called, which sets the service's UpdateStatus to UpdateStatus_UPDATING, and sets the UpdateStatus.StartedAt to the current time.

However, in case of the service-label update, no updates to the tasks are needed, so len(dirtySlots) == 0; in that case, no action is needed, given that no update (or "rollback") was ever started for this. In that case, the Service.UpdateStatus is not updated, so remains nil

	// Abort immediately if all tasks are clean.
	if len(dirtySlots) == 0 {
		if service.UpdateStatus != nil &&
			(service.UpdateStatus.State == api.UpdateStatus_UPDATING ||
				service.UpdateStatus.State == api.UpdateStatus_ROLLBACK_STARTED) {
			u.completeUpdate(ctx, service.ID)
		}
		return
	}

Even if we would skip the check (if an updated or rollback was started), update.completeUpdate() checks service.UpdateStatus. If service.UpdateStatus == nil, it assumes that another update was started, so the service.UpdateStatus is not updated;

	if service.UpdateStatus == nil {
		// The service was changed since we started this update
		return nil
	}

Based on the above, it looks like service.UpdateStatus == nil is ambiguous; as it can either be;

  • An update was triggered, but no action was required (in which case it's equal to UpdateStatus_COMPLETED or UpdateStatus_ROLLBACK_COMPLETED)
  • An update was triggered, but the actual update was not yet started

I think we can make it non-ambiguous if we set the status to UpdateStatus_COMPLETED if there are no tasks to update (len(dirtySlots) == 0); this requires a change to Updater.Run() (change the check), and to update.completeUpdate()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants