bug: MongoDB destination plugin does not create indexes for primary keys #9614

@sulewicz

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The MongoDB destination plugin does not create indexes for primary keys. This makes incremental syncs CPU intensive, because every write operation executes a COLLSCAN over the entire target collection.

Sample operation from performance logs:
Command:

{ op: 'update',
  ns: '<redacted>.aws_iam_role_last_accessed_details',
  command: {
    q: { arn: 'arn:aws:iam::<redacted>:role/<redacted>',
         service_namespace: 's3' },
    u: { '$set': {<redacted>} }
  }
}

Plan:

{ stage: 'UPDATE',
nReturned: 0,
executionTimeMillisEstimate: 0,
works: 2251,
advanced: 0,
needTime: 2250,
needYield: 0,
saveState: 2,
restoreState: 2,
isEOF: 1,
nMatched: 0,
nWouldModify: 0,
nWouldUpsert: 1,
inputStage: 
 { stage: 'COLLSCAN',
   filter: 
    { '$and': 
       [ { arn: { '$eq': 'arn:aws:iam::<redacted>:role/<redacted>' } },
         { service_namespace: { '$eq': 'servicequotas' } } ] },
   nReturned: 0,
   executionTimeMillisEstimate: 0,
   works: 2251,
   advanced: 0,
   needTime: 2250,
   needYield: 0,
   saveState: 2,
   restoreState: 2,
   isEOF: 1,
   direction: 'forward',
   docsExamined: 2249 } },
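
To make the problem easier to spot in profiler output, here is a minimal sketch that walks a plan like the one above and collects every stage with a given name. The `find_stages` helper and the trimmed-down `plan` dictionary are hypothetical and written for illustration only; they are not part of CloudQuery or of any MongoDB driver.

```python
def find_stages(plan, stage_name):
    """Return every stage document in an execution plan whose 'stage' equals stage_name."""
    found = []
    if plan.get("stage") == stage_name:
        found.append(plan)
    # Most stages have a single child under 'inputStage'.
    child = plan.get("inputStage")
    if isinstance(child, dict):
        found.extend(find_stages(child, stage_name))
    # Some stages list several children under 'inputStages'.
    for child in plan.get("inputStages", []):
        found.extend(find_stages(child, stage_name))
    return found


# Trimmed copy of the plan above (hypothetical sample, most counters omitted).
plan = {
    "stage": "UPDATE",
    "nMatched": 0,
    "nWouldUpsert": 1,
    "inputStage": {
        "stage": "COLLSCAN",
        "nReturned": 0,
        "docsExamined": 2249,
    },
}

scans = find_stages(plan, "COLLSCAN")
print([scan["docsExamined"] for scan in scans])  # [2249]
```

In this plan the update examines 2249 documents and matches none of them, so the full scan is paid even though the operation ends in an upsert.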

Expected Behavior

MongoDB destination plugin should automatically create indexes for primary keys on all collections.
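
As a rough illustration of the requested behavior, the sketch below builds a MongoDB `createIndexes` command document for a compound index over a table's primary-key columns. The `build_pk_index_command` helper is hypothetical, and the column list (`arn`, `service_namespace`) is an assumption inferred from the update filter in the sample operation, not taken from the plugin's schema. This is not how the plugin is implemented.

```python
def build_pk_index_command(collection, pk_columns):
    """Build a `createIndexes` command document for a compound index on pk_columns."""
    if not pk_columns:
        raise ValueError("at least one primary-key column is required")
    # Ascending order on every column; dicts keep insertion order, which
    # defines the field order of the compound index.
    key = {column: 1 for column in pk_columns}
    # Mirrors MongoDB's default naming scheme, e.g. "arn_1_service_namespace_1".
    name = "_".join(f"{column}_1" for column in pk_columns)
    return {
        "createIndexes": collection,
        "indexes": [{"key": key, "name": name}],
    }


# Assumed primary key, inferred from the `q` filter in the sample update.
command = build_pk_index_command(
    "aws_iam_role_last_accessed_details",
    ["arn", "service_namespace"],
)
print(command)
```

With a driver such as PyMongo, a document like this could be passed to `Database.command`. The sketch leaves out `unique: true` on purpose, since building a unique index fails on a collection that already holds duplicate keys.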

CloudQuery (redacted) config

kind: source
spec:
  name: aws
  path: cloudquery/aws
  version: "v15.7.0"
  tables:
    - "aws_ec2_instances"
    - "aws_iam_users"
    - "aws_iam_roles"
  destinations: ["mongodb"]
  spec:
    regions:
      - us-east-1
    accounts:
      - id: "<redacted_account_id>"
---
kind: destination
spec:
  name: "mongodb"
  path: "cloudquery/mongodb"
  version: "v1.1.2"
  spec:
    connection_string: "mongodb://localhost:27017"
    database: "<redacted_db_name>"

Steps To Reproduce

Run a CloudQuery sync with the configuration above.

CloudQuery (redacted) logs

N/A

CloudQuery version

2.5.2

Additional Context

No response

Pull request (optional)

  • I can submit a pull request
