bug: aws_lambda_function_versions node duplication in Neo4j #12391

@smokentar

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

For aws_lambda_function_versions, new nodes are created on every CloudQuery sync, even when the version of the Lambda function is unchanged.
This happens because the load Cypher query uses _cq_id (which is always unique) as the only property in the MERGE pattern, so no existing node ever matches.
To avoid the duplication, I suggest using the version and function_arn fields as the node properties in the MERGE pattern instead. That way, if a version (e.g. 2) of a function_arn (e.g. xxxx) already exists, the merge will not re-create it on each sync.

Current load cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_lambda_function_versions {_cq_id: row._cq_id}) SET t = row"{...}

Expected Behavior

aws_lambda_function_versions nodes are unique.
A CloudQuery sync will not create a new node every time. Instead it will only create a node if one doesn't already exist with the specified properties.

Desired load cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_lambda_function_versions {version: row.version, function_arn: row.function_arn}) SET t = row"{...}
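The difference between the two queries can be sketched without a database. The snippet below is only a rough in-memory model of `MERGE ... SET t = row` (match on the key properties, then overwrite the node with the row). It is not the destination plugin's code. The `merge_rows` and `sync` helpers and the example ARN are hypothetical, and it assumes, as described above, that each sync produces a fresh _cq_id for the same function version.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Store = Dict[Tuple[str, ...], Row]


def merge_rows(store: Store, rows: List[Row], key_fields: Tuple[str, ...]) -> Store:
    """Rough model of `MERGE (t {<key_fields>}) SET t = row`.

    A node is matched on the values of key_fields; if none matches, a new
    one is created. Either way, its properties are replaced by the row.
    """
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        store[key] = dict(row)
    return store


def sync(sync_no: int) -> List[Row]:
    """Hypothetical sync output: same two versions, new _cq_id on each run (assumption)."""
    arn = "arn:aws:lambda:us-east-1:123456789012:function:example"  # made-up ARN
    return [
        {"_cq_id": f"id-{sync_no}-{version}", "function_arn": arn, "version": version}
        for version in ("1", "2")
    ]


by_cq_id: Store = {}
by_natural_key: Store = {}

for n in range(3):
    merge_rows(by_cq_id, sync(n), ("_cq_id",))
    merge_rows(by_natural_key, sync(n), ("function_arn", "version"))

print(len(by_cq_id))        # 6: two new nodes per sync
print(len(by_natural_key))  # 2: one node per function version
```

In this model, matching on (function_arn, version) keeps one node per function version across syncs, while matching on _cq_id grows by two nodes per sync.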

CloudQuery (redacted) config

kind: source
spec:
  name: "aws"
  registry: "github"
  path: "cloudquery/aws"
  version: "v19.2.0"
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_elasticsearch_versions
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameters
    - aws_rds_cluster_parameter_groups
    - aws_rds_cluster_parameter_group_parameters
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
    - aws_servicequotas_quotas
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_directconnect_locations
    - aws_ram_resource_types
    - aws_lambda_runtimes
    - aws_docdb_event_categories
  destinations: ["neo4j"]
  spec:
    accounts:
      - id: "${ACCOUNT_ID}"
        role_arn: "${ROLE_ARN}"
        role_session_name: "Discovery"
    aws_debug: false

kind: destination
spec:
  name: "neo4j"
  registry: "github"
  path: "cloudquery/neo4j"
  version: "v4.0.1"
  # batch_size: 10000 # optional
  # batch_size_bytes: 5242880 # optional
  spec:
    connection_string: "${DB_ENDPOINT}"
    username: "${DB_USERNAME}"
    password: "${DB_PASSWORD}"

Steps To Reproduce

No response

CloudQuery (redacted) logs

TRC RUN "UNWIND $rows AS row MERGE (t:aws_lambda_function_versions {_cq_id: row._cq_id}) SET t = row"{...}

CloudQuery version

v3.8.0

Additional Context

No response

Pull request (optional)

  • I can submit a pull request
