Description
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
For the aws_lambda_function_versions table, new nodes are created on every CloudQuery sync, even when the Lambda version has not changed.
This happens because the load Cypher query uses _cq_id (which is always unique) as the only property in the MERGE pattern, so the MERGE never matches a node written by a previous sync.
To avoid this duplication, I suggest using the version and function_arn fields as the MERGE properties. That way, if a version (e.g. 2) of a function_arn (e.g. xxxx) already exists, the node will not be re-created on each sync.
Current load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_lambda_function_versions {_cq_id: row._cq_id}) SET t = row"{...}
Expected Behavior
aws_lambda_function_versions nodes are unique.
A CloudQuery sync should not create a new node every time. Instead, it should create a node only if one with the specified properties does not already exist.
Desired load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_lambda_function_versions {version: row.version, function_arn: row.function_arn}) SET t = row"{...}
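The difference between the two MERGE patterns can be illustrated with a minimal in-memory sketch. It does not use Neo4j or CloudQuery code: `merge_rows` and `sync_rows` are hypothetical helpers, the ARN is a placeholder, and the sketch assumes (as reported above) that each sync produces a fresh _cq_id for the same Lambda version.

```python
import uuid


def merge_rows(store, rows, keys):
    """Roughly mimic `MERGE (t {k: row.k, ...}) SET t = row` on a list of dict nodes."""
    for row in rows:
        # MERGE: look for an existing node whose key properties all match the row.
        match = next(
            (node for node in store if all(node.get(k) == row[k] for k in keys)),
            None,
        )
        if match is None:
            # No match, so MERGE creates a new node.
            match = {}
            store.append(match)
        # SET t = row: replace all properties of the node with the row.
        match.clear()
        match.update(row)
    return store


def sync_rows():
    """One Lambda version per sync; _cq_id is assumed to be new on every sync."""
    return [
        {
            "_cq_id": str(uuid.uuid4()),
            "function_arn": "arn:aws:lambda:xxxx",  # placeholder ARN
            "version": "2",
        }
    ]


by_cq_id = []
by_version_and_arn = []
for _ in range(3):  # three consecutive syncs of the same, unchanged version
    merge_rows(by_cq_id, sync_rows(), keys=["_cq_id"])
    merge_rows(by_version_and_arn, sync_rows(), keys=["version", "function_arn"])

print(len(by_cq_id))            # 3 nodes: one duplicate per sync
print(len(by_version_and_arn))  # 1 node: the existing node is updated in place
```

With version and function_arn as the match keys, the node keeps a single identity across syncs, and only its properties (including _cq_id) are overwritten.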
CloudQuery (redacted) config
kind: source
spec:
  name: "aws"
  registry: "github"
  path: "cloudquery/aws"
  version: "v19.2.0"
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_elasticsearch_versions
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameters
    - aws_rds_cluster_parameter_groups
    - aws_rds_cluster_parameter_group_parameters
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
    - aws_servicequotas_quotas
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_directconnect_locations
    - aws_ram_resource_types
    - aws_lambda_runtimes
    - aws_docdb_event_categories
  destinations: ["neo4j"]
  spec:
    accounts:
      - id: "${ACCOUNT_ID}"
        role_arn: "${ROLE_ARN}"
        role_session_name: "Discovery"
    aws_debug: false
---
kind: destination
spec:
  name: "neo4j"
  registry: "github"
  path: "cloudquery/neo4j"
  version: "v4.0.1"
  # batch_size: 10000 # optional
  # batch_size_bytes: 5242880 # optional
  spec:
    connection_string: "${DB_ENDPOINT}"
    username: "${DB_USERNAME}"
    password: "${DB_PASSWORD}"
Steps To Reproduce
No response
CloudQuery (redacted) logs
TRC RUN "UNWIND $rows AS row MERGE (t:aws_lambda_function_versions {_cq_id: row._cq_id}) SET t = row"{...}
CloudQuery version
v3.8.0
Additional Context
No response
Pull request (optional)
- I can submit a pull request