Description
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
aws_regions: a new set of region nodes is created on every CloudQuery sync, even if they already exist for the given AWS account ID.
This happens because the load Cypher query uses _cq_id (which is always unique) as the only node property in the MERGE pattern.
A suggested fix is to use the account_id and region fields as the MERGE properties instead.
One AWS account should have only one set of discovered regions, currently 27.
With this change, a separate set of regions (27) is still created for each account (since accounts can have different opt_in_status / enabled values), but each account has only one set.
Current load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_regions {_cq_id: row._cq_id}) SET t = row"{...}
Expected Behavior
Each discovered AWS account has exactly one set of regions (27).
Desired load Cypher query:
TRC RUN "UNWIND $rows AS row MERGE (t:aws_regions {account_id: row.account_id, region: row.region}) SET t = row"{...}
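To illustrate the difference between the two MERGE keys, here is a minimal Python sketch. It does not use Neo4j or the plugin's code: a plain dict stands in for the graph, `merge` is a hypothetical helper that mimics "match on the key properties, then SET t = row", and `_cq_id` is assumed to be regenerated on every sync, as described above. The account ID and the three-region subset are made-up placeholders.

```python
import uuid


def sync_rows(account_id, regions):
    """Build the rows of one sync; _cq_id is assumed to be new on every sync."""
    return [
        {"_cq_id": str(uuid.uuid4()), "account_id": account_id, "region": region}
        for region in regions
    ]


def merge(store, rows, key_fields):
    """Mimic `MERGE (t {<key fields>}) SET t = row` on a dict-backed store."""
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        store[key] = dict(row)  # create the node, or overwrite its properties
    return store


regions = ["us-east-1", "eu-west-1", "ap-south-1"]  # illustrative subset, not all 27

by_cq_id = {}
by_account_region = {}
for _ in range(2):  # two consecutive syncs of the same account
    rows = sync_rows("111111111111", regions)
    merge(by_cq_id, rows, ["_cq_id"])
    merge(by_account_region, rows, ["account_id", "region"])

print(len(by_cq_id))           # 6: every sync adds a new set of nodes
print(len(by_account_region))  # 3: one set per account, updated in place
```

Keyed on (account_id, region), a second account would still get its own set of nodes, which matches the expected behavior above.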
CloudQuery (redacted) config
kind: source
spec:
  name: "aws"
  registry: "github"
  path: "cloudquery/aws"
  version: "v19.2.0"
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_elasticsearch_versions
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameters
    - aws_rds_cluster_parameter_groups
    - aws_rds_cluster_parameter_group_parameters
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
    - aws_servicequotas_quotas
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_directconnect_locations
    - aws_ram_resource_types
    - aws_lambda_runtimes
    - aws_docdb_event_categories
  destinations: ["neo4j"]
  spec:
    accounts:
      - id: "${ACCOUNT_ID}"
        role_arn: "${ROLE_ARN}"
        role_session_name: "Discovery"
    aws_debug: false
---
kind: destination
spec:
  name: "neo4j"
  registry: "github"
  path: "cloudquery/neo4j"
  version: "v4.0.1"
  # batch_size: 10000 # optional
  # batch_size_bytes: 5242880 # optional
  spec:
    connection_string: "${DB_ENDPOINT}"
    username: "${DB_USERNAME}"
    password: "${DB_PASSWORD}"
Steps To Reproduce
No response
CloudQuery (redacted) logs
TRC RUN "UNWIND $rows AS row MERGE (t:aws_regions {_cq_id: row._cq_id}) SET t = row"{...}
CloudQuery version
v3.8.0
Additional Context
No response
Pull request (optional)
- I can submit a pull request