
add service profile integration tests for service profile metrics #2685

Merged
dadjeibaah merged 15 commits into master from dad/sp-metrics-integration-tests on Apr 18, 2019

Conversation

@dadjeibaah (Contributor)

This PR is a continuation of #2638. It adds tests that verify retries are performed after a service profile is added for a service, and tests that timeout limits for a particular route are respected.

Fixes #2518

Dennis Adjei-Baah added 2 commits April 10, 2019 15:06
@dadjeibaah requested review from ihcsim and siggy on April 11, 2019 01:09
@dadjeibaah self-assigned this on Apr 11, 2019
Signed-off-by: Dennis Adjei-Baah <[email protected]>

Signed-off-by: Dennis Adjei-Baah <[email protected]>
@dadjeibaah force-pushed the dad/sp-metrics-integration-tests branch from 140c733 to c20eca0 on April 12, 2019 05:02

Signed-off-by: Dennis Adjei-Baah <[email protected]>

Signed-off-by: Dennis Adjei-Baah <[email protected]>

Signed-off-by: Dennis Adjei-Baah <[email protected]>

Dennis Adjei-Baah added 2 commits April 12, 2019 11:57

Signed-off-by: Dennis Adjei-Baah <[email protected]>

@ihcsim (Contributor) left a comment


Some questions and comments below.

cliLines = append(cliLines, routes)
}
if isWideOutput {
cmd = append(cmd, "-owide")
@ihcsim (Contributor) commented Apr 12, 2019

Probably not needed, unless you want to add an else to the --output json below. The JSON output already contains the extra effective_* and actual_* fields.

@dadjeibaah (Author)

Ah nice! Thanks for pointing that out, I had no idea.
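For reference, decoding that JSON output into a stats struct might look like the sketch below; the exact schema is assumed, with field names following the effective_*/actual_* convention mentioned above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routeStat is a hypothetical subset of the per-route JSON emitted by
// `linkerd routes -o json`; field names are illustrative.
type routeStat struct {
	Route            string  `json:"route"`
	EffectiveSuccess float64 `json:"effective_success"`
	ActualSuccess    float64 `json:"actual_success"`
	EffectiveRPS     float64 `json:"effective_rps"`
	ActualRPS        float64 `json:"actual_rps"`
}

// parseRouteStats decodes the CLI's JSON array into route stats.
func parseRouteStats(raw []byte) ([]routeStat, error) {
	var stats []routeStat
	err := json.Unmarshal(raw, &stats)
	return stats, err
}

func main() {
	// Sample JSON standing in for real CLI output.
	raw := []byte(`[{"route":"GET /","effective_success":1,"actual_success":0,"effective_rps":10,"actual_rps":30}]`)
	stats, err := parseRouteStats(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats[0].Route, stats[0].ActualRPS)
}
```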

case "timeouts", "budgets":
// If the P99 latency is greater than 500ms, retries are probably happening before applying
// the service profile and we can't reliably test the service profile.
assertion.assertFunc = func(rt *rowStat) bool { return rt.LatencyP99 < 500 }
@ihcsim (Contributor)

I think checking latency is a bit unreliable. If the intent here is to check that retries aren't happening, can we just check whether effective_success and actual_success are zero (since --failure-rate=1.0)?

@dadjeibaah (Author)

Good point. Testing for latency is pretty finicky; however, the behaviour we are testing is timeouts. The intent of this test case was to ensure that we aren't already hitting the retry timeout that we will eventually test for. My thinking here was: if latencies already surpass the threshold even before applying the service profile with the intended timeout, we can't reliably prove that the service profile is doing its job after it's applied.

Another interesting thing here is with --failure-rate=1.0, we will still see effective_success and actual_success be zero even after we've applied the service profile and the service is retrying requests. Similar to your suggestion, I think comparing actual_rps and effective_rps would suffice. 🤔
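The comparison suggested above can be sketched as a small assertion helper; the rowStat shape and field names here are illustrative, mirroring the actual_rps/effective_rps discussion:

```go
package main

import "fmt"

// rowStat mirrors a subset of the test's per-route stats (hypothetical fields).
type rowStat struct {
	EffectiveRPS float64
	ActualRPS    float64
}

// retriesActive reports whether the proxy appears to be retrying: with
// retries enabled the proxy sends more actual requests than the client's
// effective (logical) requests, so actual_rps exceeds effective_rps.
func retriesActive(rt *rowStat) bool {
	return rt.ActualRPS > rt.EffectiveRPS
}

func main() {
	fmt.Println(retriesActive(&rowStat{EffectiveRPS: 10, ActualRPS: 25})) // retries happening
	fmt.Println(retriesActive(&rowStat{EffectiveRPS: 10, ActualRPS: 10})) // no retries
}
```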

// set in the service profile. hello-timeouts-service and hello-budgets always fail
// so we expect all request latencies to be greater than or equal to the timeout set.
case "timeouts", "budgets":
assertion.assertFunc = func(rt *rowStat) bool { return rt.LatencyP99 >= 500 }
@ihcsim (Contributor)

Similar to above, checking latency can be unreliable. If the intent here is to check that retries happened, but requests continue to fail because of --failure-rate=1.0, we can try to see if this works:

  • Configure slow cooker to output a fixed number of requests via the -totalRequests flag
  • Make sure effective_success and actual_success are zero
  • Use linkerd top to check the count column for retried requests

@dadjeibaah (Author)

From my understanding of the initial ticket, I thought we were trying to specifically test the timeouts behavior. Maybe, since it's unreliable, we should just test that retries are being performed?

@ihcsim (Contributor)

Maybe, since it's unreliable, we should just test that retries are being performed?

SGTM, unless others have a way to do this. It isn't easy to reliably test timeouts without context. So far, I can't find anything in the service profile and route code that will be useful here, based on the little that I know.

Signed-off-by: Dennis Adjei-Baah <[email protected]>

Signed-off-by: Dennis Adjei-Baah <[email protected]>

@siggy (Member) left a comment


Looks good! Mostly TIOLI comments; good to go following Ivan's comment around -owide and a lint fix.


var TestHelper *testutil.TestHelper

type rowStat struct {
@siggy (Member)

TIOLI, you could make jsonRouteStats (from the cmd package) public and re-use that.

sourceName: "tap",
namespace: testNamespace,
deployName: "deploy/t1",
deployName: "deployment/t1",
@siggy (Member)

curious: was this change necessary to match against returned data from a routes command?

@dadjeibaah (Author)

Yeah, I had to use the full deployment name in order to get results back. I can make a change to the routes command to accept both deploy and deployment.
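A sketch of what accepting both forms could look like; normalizeResource is a hypothetical helper, not existing code:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeResource expands the short resource kind so that "deploy/t1"
// and "deployment/t1" are treated identically (illustrative helper).
func normalizeResource(arg string) string {
	parts := strings.SplitN(arg, "/", 2)
	if len(parts) != 2 {
		return arg // no kind prefix; leave untouched
	}
	kind := parts[0]
	if kind == "deploy" {
		kind = "deployment"
	}
	return kind + "/" + parts[1]
}

func main() {
	fmt.Println(normalizeResource("deploy/t1"))     // deployment/t1
	fmt.Println(normalizeResource("deployment/t1")) // deployment/t1
}
```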

ports:
- containerPort: 9999
restartPolicy: OnFailure

@siggy (Member)

ws? (stray whitespace?)


profile := &sp.ServiceProfile{}

// Grab the output and convert it to a service profile object for modification
@siggy (Member)

🆒

}
}

func assertRouteStat(assertion *routeStatAssertion, t *testing.T, assertFn func(stat *rowStat) error) {
@siggy (Member)

TIOLI: since routeStatAssertion is only used in one place, I'd probably just pass upstream, downstream, and namespace directly into assertRouteStat.

Dennis Adjei-Baah added 2 commits April 18, 2019 09:47
Signed-off-by: Dennis Adjei-Baah <[email protected]>
Signed-off-by: Dennis Adjei-Baah <[email protected]>

Signed-off-by: Dennis Adjei-Baah <[email protected]>
@dadjeibaah merged commit be61465 into master on Apr 18, 2019
@dadjeibaah deleted the dad/sp-metrics-integration-tests branch on April 18, 2019 18:01

@admc mentioned this pull request on Apr 18, 2019


Development

Successfully merging this pull request may close these issues.

Introduce ServiceProfile test plan
