add service profile integration tests for service profile metrics #2685

dadjeibaah merged 15 commits into master from
Conversation
Signed-off-by: Dennis Adjei-Baah <[email protected]>
Integration test results for 5688cef: fail 😕
Integration test results for 5e9096e: fail 😕
Integration test results for 140c733: fail 😕
dadjeibaah force-pushed from 140c733 to c20eca0
Integration test results for c20eca0: fail 😕
Integration test results for 88ba27b: fail 😕
Integration test results for 9e834d9: fail 😕
Integration test results for 192864e: fail 😕
Integration test results for 0eddb5b: fail 😕
Integration test results for 977d572: success 🎉
ihcsim left a comment:
Some questions and comments below.
```go
	cliLines = append(cliLines, routes)
}
if isWideOutput {
	cmd = append(cmd, "-owide")
```
Probably not needed, unless you want to add an else to the --output json below. The JSON output already contains the extra effective_* and actual_* fields.
Ah nice! Thanks for pointing that out, I had no idea.
```go
case "timeouts", "budgets":
	// If the P99 latency is greater than 500ms retries are probably happening before applying
	// the service profile and we can't reliably test the service profile.
	assertion.assertFunc = func(rt *rowStat) bool { return rt.LatencyP99 < 500 }
```
I think checking latency is a bit unreliable. If the intent here is to check that retries aren't happening, can we just check that effective_success and actual_success are zero (since --failure-rate=1.0)?
Good point. Testing for latency is pretty finicky; however, the behaviour we are testing is timeouts. The intent of this test case was to ensure that we aren't already hitting the retry timeout that we will eventually test for. My thinking here was: if we are already seeing timeouts surpass the threshold even before applying the service profile with the intended timeout, we can't reliably prove that the service profile is doing its job after it's applied.

Another interesting thing here is that with --failure-rate=1.0, we will still see effective_success and actual_success be zero even after we've applied the service profile and the service is retrying requests. Similar to your suggestion, I think comparing actual_rps and effective_rps would suffice. 🤔
```go
// set in the service profile. hello-timeouts-service and hello-budgets always fail,
// so we expect all request latencies to be greater than or equal to the timeout set.
case "timeouts", "budgets":
	assertion.assertFunc = func(rt *rowStat) bool { return rt.LatencyP99 >= 500 }
```
Similar to above, checking latency can be unreliable. If the intent here is to check that retries happened, but requests continue to fail because of --failure-rate=1.0, we can try to see if this works:

- Configure slow cooker to output a fixed number of requests via the -totalRequests flag
- Make sure effective_success and actual_success are zero
- Use linkerd top to check the count column for retried requests
From my understanding of the initial ticket, I thought we were trying to specifically test the timeouts behavior. Maybe, since it's unreliable, we should just test that retries are being performed?
> Maybe, since it's unreliable, we should just test that retries are being performed?

SGTM, unless others have a way to do this. It isn't easy to reliably test timeouts without context. So far, I can't find anything in the service profile and route code that will be useful here, based on the little that I know.
Integration test results for 11a8396: success 🎉
Integration test results for 6d3ba84: success 🎉
siggy left a comment:
Looks good! Mostly TIOLI comments; good to go following Ivan's comment around wide output and a lint fix.
```go
var TestHelper *testutil.TestHelper

type rowStat struct {
```
TIOLI, you could make jsonRouteStats (from the cmd package) public and re-use that.
```diff
  sourceName: "tap",
  namespace:  testNamespace,
- deployName: "deploy/t1",
+ deployName: "deployment/t1",
```
curious: was this change necessary to match against returned data from a routes command?
Yeah, I had to use the full deployment name in order to get data back. I can make a change to the routes command to accept both deploy and deployment.
```yaml
ports:
- containerPort: 9999
restartPolicy: OnFailure
```

```go
profile := &sp.ServiceProfile{}

// Grab the output and convert it to a service profile object for modification
```
```go
	}
}

func assertRouteStat(assertion *routeStatAssertion, t *testing.T, assertFn func(stat *rowStat) error) {
```
tioli: since routeStatAssertion is only used in one place, i'd probably just pass upstream, downstream, and namespace directly into assertRouteStat
Integration test results for 6496635: success 🎉
This PR is a continuation of #2638. It adds tests that verify retries are performed after a service profile is added for a service, as well as tests that check whether timeout limits for a particular route are respected.
Fixes #2518