bug: aws_s3_bucket_encryption_rules node duplication in Neo4j #12392

@smokentar

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

A new aws_s3_bucket_encryption_rules node is created on every CloudQuery sync, even if a node for the S3 bucket already exists.
This happens because the load Cypher query uses _cq_id (which is always unique) as the only node property in the MERGE pattern.
To avoid the duplication, I suggest using the bucket_arn field as the node property instead (an S3 bucket can only have one set of encryption rules).
That way, if an aws_s3_bucket_encryption_rules node already exists for the bucket, the merge matches it instead of re-creating it on every sync.

Current load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_s3_bucket_encryption_rules {_cq_id: row._cq_id}) SET t = row" {...}

Expected Behavior

aws_s3_bucket_encryption_rules nodes are unique.
A CloudQuery sync does not create a new node every time. Instead, it creates a node only if one with the specified properties does not already exist.

Desired load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_s3_bucket_encryption_rules {bucket_arn: row.bucket_arn}) SET t = row" {...}
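
To illustrate the difference between the two MERGE keys, here is a minimal plain-Python sketch. It is not the Neo4j destination plugin's code: a dict stands in for the graph, `merge_rows` and `sync` are hypothetical helpers, the bucket ARN is made up, and it assumes (as described above) that every sync assigns a fresh _cq_id to each row.

```python
import uuid


def merge_rows(store, rows, key):
    """Rough model of `MERGE (t {key: row[key]}) SET t = row`.

    A node is matched on a single property; if it exists it is
    overwritten, otherwise a new one is created.
    """
    for row in rows:
        store[row[key]] = dict(row)
    return store


def sync(bucket_arns):
    """One hypothetical sync: each row gets a new _cq_id (assumption)."""
    return [{"_cq_id": str(uuid.uuid4()), "bucket_arn": arn} for arn in bucket_arns]


buckets = ["arn:aws:s3:::example-bucket"]  # made-up ARN for illustration

by_cq_id, by_arn = {}, {}
for _ in range(3):  # three consecutive syncs of the same bucket
    rows = sync(buckets)
    merge_rows(by_cq_id, rows, "_cq_id")      # current behavior
    merge_rows(by_arn, rows, "bucket_arn")    # suggested behavior

print(len(by_cq_id))  # 3 nodes: one per sync
print(len(by_arn))    # 1 node: reused across syncs
```

Keyed on _cq_id, the node count grows with every sync; keyed on bucket_arn, it stays at one node per bucket.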

CloudQuery (redacted) config

kind: source
spec:
  name: "aws"
  registry: "github"
  path: "cloudquery/aws"
  version: "v19.2.0"
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_elasticsearch_versions
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameters
    - aws_rds_cluster_parameter_groups
    - aws_rds_cluster_parameter_group_parameters
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
    - aws_servicequotas_quotas
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_directconnect_locations
    - aws_ram_resource_types
    - aws_lambda_runtimes
    - aws_docdb_event_categories
  destinations: ["neo4j"]
  spec:
    accounts:
      - id: "${ACCOUNT_ID}"
        role_arn: "${ROLE_ARN}"
        role_session_name: "Discovery"
    aws_debug: false

kind: destination
spec:
  name: "neo4j"
  registry: "github"
  path: "cloudquery/neo4j"
  version: "v4.0.1"
  # batch_size: 10000 # optional
  # batch_size_bytes: 5242880 # optional
  spec:
    connection_string: "${DB_ENDPOINT}"
    username: "${DB_USERNAME}"
    password: "${DB_PASSWORD}"

Steps To Reproduce

No response

CloudQuery (redacted) logs

TRC RUN "UNWIND $rows AS row MERGE (t:aws_s3_bucket_encryption_rules {_cq_id: row._cq_id}) SET t = row" {...}

CloudQuery version

v3.8.0

Additional Context

No response

Pull request (optional)

  • I can submit a pull request
