bug: aws_dynamodb_table_continuous_backups node duplication in Neo4j #12395

@smokentar

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

For aws_dynamodb_table_continuous_backups, a new node is created on every CloudQuery sync, even if a node for the given DynamoDB table already exists.
This happens because the load Cypher query uses _cq_id (which is always unique) as the only node property in the MERGE pattern.
To avoid the duplication, I suggest using the table_arn field as the node property instead (one DynamoDB table should have only one node reporting its continuous backup config).
That way, if an aws_dynamodb_table_continuous_backups node already exists for a given table, the merge will not duplicate it.

Current load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_dynamodb_table_continuous_backups {_cq_id: row._cq_id}) SET t = row"{...}

Expected Behavior

aws_dynamodb_table_continuous_backups nodes are unique per DynamoDB table.
A CloudQuery sync does not create a new node every time. Instead, it only creates a node if one with the specified properties doesn't already exist.

Desired load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_dynamodb_table_continuous_backups {table_arn: row.table_arn}) SET t = row"{...}
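To illustrate the difference between the two queries, here is a minimal sketch in plain Python that models the `MERGE ... SET t = row` behaviour with a dictionary. It does not talk to Neo4j or CloudQuery. The helper names (`merge_rows`, `sync`) and the example ARN are hypothetical, and it assumes, as described above, that every sync assigns a fresh `_cq_id` to each row.

```python
import uuid


def merge_rows(store, rows, key):
    """Toy model of: MERGE (t {<key>: row.<key>}) SET t = row.

    `store` maps the merge-key value to the node's properties.
    An existing entry is matched and overwritten; otherwise one is created.
    """
    for row in rows:
        store[row[key]] = dict(row)  # SET t = row replaces all properties
    return store


def sync(table_arns):
    """Simulate one sync: every row gets a new _cq_id (assumption from the issue)."""
    return [{"_cq_id": str(uuid.uuid4()), "table_arn": arn} for arn in table_arns]


# Hypothetical ARN, for illustration only.
arns = ["arn:aws:dynamodb:us-east-1:123456789012:table/example"]

by_cq_id = {}      # models the current query (merge on _cq_id)
by_table_arn = {}  # models the desired query (merge on table_arn)

for _ in range(3):  # three consecutive syncs of the same table
    rows = sync(arns)
    merge_rows(by_cq_id, rows, "_cq_id")
    merge_rows(by_table_arn, rows, "table_arn")

print(len(by_cq_id))      # 3 -> one node per sync (duplicates)
print(len(by_table_arn))  # 1 -> a single node, updated in place
```

Note that because of `SET t = row`, merging on table_arn would still overwrite the node's `_cq_id` with the latest value on each sync; only the node count stays stable.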

CloudQuery (redacted) config

kind: source
spec:
  name: "aws"
  registry: "github"
  path: "cloudquery/aws"
  version: "v19.2.0"
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_elasticsearch_versions
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameters
    - aws_rds_cluster_parameter_groups
    - aws_rds_cluster_parameter_group_parameters
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
    - aws_servicequotas_quotas
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_directconnect_locations
    - aws_ram_resource_types
    - aws_lambda_runtimes
    - aws_docdb_event_categories
  destinations: ["neo4j"]
  spec:
    accounts:
      - id: "${ACCOUNT_ID}"
        role_arn: "${ROLE_ARN}"
        role_session_name: "Discovery"
    aws_debug: false

kind: destination
spec:
  name: "neo4j"
  registry: "github"
  path: "cloudquery/neo4j"
  version: "v4.0.1"
  # batch_size: 10000 # optional
  # batch_size_bytes: 5242880 # optional
  spec:
    connection_string: "${DB_ENDPOINT}"
    username: "${DB_USERNAME}"
    password: "${DB_PASSWORD}"

Steps To Reproduce

No response

CloudQuery (redacted) logs

TRC RUN "UNWIND $rows AS row MERGE (t:aws_dynamodb_table_continuous_backups {_cq_id: row._cq_id}) SET t = row"{...}

CloudQuery version

v3.8.0

Additional Context

No response

Pull request (optional)

  • I can submit a pull request
