flaky agentmgr.test.js tests in Travis #84

@alexkli

The tests in agentmgr.test.js have become very flaky as of Dec 2020, even though there have been few changes and few Travis job runs since the 1.3.0 release in July. This is possibly due to newer, faster hardware in Travis or Docker-related changes. A re-run can make the failures go away, but it usually takes a few attempts.

A similar flaky test was fixed in #82.

Example failed job: https://travis-ci.com/github/apache/openwhisk-wskdebug/jobs/464430797

Test failures:

  1) agentmgr
       should use non-concurrent agent if openwhisk does not support concurrency:
     Error: Timeout of 30000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/home/travis/build/apache/openwhisk-wskdebug/test/agentmgr.test.js)
  

  2) agentmgr
       should handle if the agent was left around from a previous run:
     Error: (HTTP code 500) server error - driver failed programming external connectivity on endpoint wskdebug-myaction-1608750742209 (70bb0576af18dc960ee9ed18f704100d9baffe0f994f783f49330e4bd99a36c7): Bind for 0.0.0.0:46747 failed: port is already allocated 
      at /home/travis/build/apache/openwhisk-wskdebug/node_modules/docker-modem/lib/modem.js:296:17
      at getCause (node_modules/docker-modem/lib/modem.js:326:7)
      at Modem.buildPayload (node_modules/docker-modem/lib/modem.js:295:5)
      at IncomingMessage.<anonymous> (node_modules/docker-modem/lib/modem.js:270:14)
      at endReadableNT (_stream_readable.js:1145:12)
      at process._tickCallback (internal/process/next_tick.js:63:19)

  3) agentmgr
       should remove backup action if --cleanup is set:
     Error: Unexpected error while polling agent for activation.
      at AgentMgr.waitForActivations (src/agentmgr.js:65:181)
      at process._tickCallback (internal/process/next_tick.js:68:7)
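
The "port is already allocated" error in failure 2 suggests that something, possibly a container left over from an earlier test, was still bound to the port. As a rough diagnostic sketch, a test could probe the port before starting the debugger. `isPortFree` is a hypothetical helper, not part of wskdebug. It only detects listeners visible to the host's TCP stack, so it assumes Docker publishes the port in a way that a plain bind would conflict with.

```javascript
const net = require("net");

// Hypothetical helper (not part of wskdebug): resolves to true if `port` can
// be bound on all IPv4 interfaces right now, false if it is already in use.
function isPortFree(port) {
    return new Promise((resolve, reject) => {
        const server = net.createServer();
        server.once("error", (err) => {
            if (err.code === "EADDRINUSE") {
                resolve(false);
            } else {
                // Anything else (e.g. EACCES) is unexpected, so surface it.
                reject(err);
            }
        });
        server.once("listening", () => {
            // The bind succeeded, so release the port again before reporting.
            server.close(() => resolve(true));
        });
        server.listen(port, "0.0.0.0");
    });
}
```

Logging the result of `isPortFree(test.port)` at the start of each test might help confirm whether the earlier timed-out test is what leaves the port occupied.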

It seems the test "should use non-concurrent agent if openwhisk does not support concurrency" fails to start or shut down the Docker container and times out waiting for that. This in turn appears to affect the subsequent tests, which then fail as well.

it("should use non-concurrent agent if openwhisk does not support concurrency", async function() {
    const action = "myaction";
    const code = `const main = () => ({ msg: 'CORRECT' });`;
    mockActivationDbAgent(action, code);
    const argv = {
        port: test.port,
        action: "myaction"
    };
    const dbgr = new Debugger(argv);
    await dbgr.start();
    dbgr.run();
    // wait a bit
    await test.sleep(500);
    await dbgr.stop();
    test.assertAllNocksInvoked();
});
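
One possible contributor is the fixed `test.sleep(500)`: on faster or slower hardware the agent may not have reached the expected state when `dbgr.stop()` is called. A minimal sketch of an alternative is to poll for a condition with a deadline instead of sleeping for a fixed time. `waitFor` and its option names are hypothetical and not part of wskdebug or its test utilities, and whether this addresses the flakiness is untested.

```javascript
// Hypothetical helper (not part of wskdebug): calls `condition` every
// `intervalMs` until it returns a truthy value, which is then returned.
// Rejects if the condition is still falsy once `timeoutMs` has elapsed.
async function waitFor(condition, { timeoutMs = 10000, intervalMs = 50 } = {}) {
    const deadline = Date.now() + timeoutMs;
    for (;;) {
        const result = await condition();
        if (result) {
            return result;
        }
        if (Date.now() >= deadline) {
            throw new Error(`waitFor: condition not met within ${timeoutMs}ms`);
        }
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
}
```

In the test above, `await test.sleep(500)` could then be replaced with something like `await waitFor(() => nock.isDone())`, assuming the mocks are plain nock interceptors. With a timeout below Mocha's 30000ms limit, a hang would fail with a more specific error than the generic timeout.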

Labels: bug, task
