bug: Okta source plugin rate limiting #10570

@pkaeding

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The Okta source plugin gets rate limited and loses some data.

The logs have messages like this:

{
    "level": "error",
    "module": "okta-src",
    "client": "okta",
    "error": "too many requests",
    "message": "table resolver finished with error",
    "table": "okta_application_group_assignments",
    "time": "2023-05-05T14:12:50Z"
}

Expected Behavior

If the service (Okta in this case) rate limits CQ, it should back off and retry, and not complete until all the data is gathered (or retries are exhausted).
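
To illustrate the behavior I would expect, here is a minimal sketch in Python. It is not the plugin's actual code: `TooManyRequests`, `with_backoff`, and all parameter names and defaults are hypothetical, and a real fix would more likely read the rate-limit headers returned by the API than guess at delays.

```python
import random
import time


class TooManyRequests(Exception):
    """Hypothetical error standing in for an HTTP 429 response."""


def with_backoff(fetch, max_retries=5, base_delay=1.0, max_delay=60.0,
                 sleep=time.sleep):
    """Call fetch(); on TooManyRequests, wait and retry.

    Gives up and re-raises once max_retries retries have been used,
    so the caller still sees the error when retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except TooManyRequests:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

With something like this wrapped around each paginated request, a table resolver would only finish with an error after every retry had failed, instead of on the first rate-limited response.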

CloudQuery (redacted) config

kind: source
spec:
  # Source spec section
  name: okta
  path: cloudquery/okta
  version: "v2.2.4"
  tables: ["*"]
  destinations: ["s3"]
  spec:
    # Required. Your Okta domain name
    domain: "https://${OKTA_DOMAIN}/"
    # Optional. Okta Token to access API, you can set this with OKTA_API_TOKEN environment variable
    # ⚠️ Warning - Your token should be kept secret and not committed to source control
    token: $${OKTA_TOKEN}
---
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  version: "v3.1.2"
  write_mode: "append" # s3 only supports 'append' mode
  # batch_size: 10000 # optional
  # batch_size_bytes: 5242880 # optional
  spec:
    bucket: "${CQ_S3_BUCKET}"
    region: "${AWS_REGION}" # Example: us-east-1
    path: "cloudquery/{{TABLE}}/{{YEAR}}/{{MONTH}}/{{DAY}}/{{UUID}}.json"
    format: "json"
    athena: false # <- set this to true for Athena compatibility

Steps To Reproduce

  1. Run cloudquery sync
  2. Observe errors in the log
  3. Examine the output tables and find some missing rows (I noticed them in the join tables, e.g. okta_group_users, okta_application_group_assignments, etc.)

CloudQuery (redacted) logs

{
    "level": "error",
    "module": "okta-src",
    "client": "okta",
    "error": "too many requests",
    "message": "table resolver finished with error",
    "table": "okta_application_group_assignments",
    "time": "2023-05-05T14:12:50Z"
}

CloudQuery version

2.5.3

Additional Context

Running in AWS Fargate, using ghcr.io/cloudquery/cloudquery:2.5

Pull request (optional)

  • I can submit a pull request
