Skip to content

Conversation

@bbernays
Copy link
Collaborator

@bbernays bbernays commented May 4, 2023

Summary

In this PR we add experimental support for 2 tables (aws_inspector2_findings and aws_cloudtrail_events).

TableOptions is a nested structure that uses the CQ table name as the first level key. After that each table uses the aws API call to hold the actual API input. So in JSON form the object looks like this:

{
    "table_options":{
        "aws_inspector2_findings":{
            "ListFindings":{}
        }
        "aws_cloudtrail_events":{
            "LookupEvents":{}
        }
    }
}
  spec:
    table_options:
      aws_inspector2_findings:
        list_findings:
          filter_criteria:
            finding_status:
              - comparison: EQUALS
                value: ACTIVE
      aws_accessanalyzer_analyzer_findings:
        list_findings:
          
      aws_cloudtrail_events:
        lookup_events:
          start_time: 2023-05-01T20:20:52Z
          end_time:   2023-05-03T20:20:52Z
          lookup_attributes:
            - attribute_key: EventName
              attribute_value: RunInstances

Update:

  1. In order to maintain all inputs in snake case, I wrote a script that inputs the input struct from the aws sdk and makes a local copy of the struct that uses the SDK caser to ensure consistent names
  2. All of the logic for accessing the inputs is in its own package under the client
  3. All of the validation and copying from one struct type to the sdk type is validated with unit tests

@hermanschaaf
Copy link
Member

Nice, I like this, and it's also a really unexpectedly small code change 😄

The only issue I see that I'd like to discuss a bit, is what if users want to apply multiple filters? For example, for aws_cloudtrail_events, LookupAttributes are unfortunately limited to a single key and value ("Currently the list can contain only one item."). If CQ users wanted to sync more... well, they'd need to do that in an entirely different sync.

One could argue that this is a limitation of the AWS API, take it up with them. But on the other hand, we do have ways to deal with this. For example, what if instead of using:

type ctAPIs struct {
	LookupEventsOpts cloudtrail.LookupEventsInput `json:"LookupEvents,omitempty"`
}

we did:

type ctAPIs struct {
	LookupEventsOpts []cloudtrail.LookupEventsInput `json:"LookupEvents,omitempty"`
}

(so LookupEvents becomes a slice). Then the config becomes this:

  spec:
    table_options:
      aws_cloudtrail_events:
        LookupEvents:
          - StartTime: 2023-05-01T20:20:52Z
            EndTime:   2023-05-03T20:20:52Z
            LookupAttributes:
              - AttributeKey: EventName
                AttributeValue: RunInstances

Which is almost exactly the same as before, except there's an extra dash. However it can then be used in this way:

  spec:
    table_options:
      aws_cloudtrail_events:
        LookupEvents:
          - StartTime: 2023-05-01T20:20:52Z
            EndTime:   2023-05-03T20:20:52Z
            LookupAttributes:
              - AttributeKey: EventName
                AttributeValue: RunInstances
          - StartTime: 2023-05-01T20:20:52Z
            EndTime:   2023-05-03T20:20:52Z
            LookupAttributes:
              - AttributeKey: EventName
                AttributeValue: StartInstances
          - StartTime: 2023-05-01T20:20:52Z
            EndTime:   2023-05-03T20:20:52Z
            LookupAttributes:
              - AttributeKey: EventName
                AttributeValue: StopInstances

You'd have to repeat the start time/end time if you need this, but at least you could then fetch all the events you need in a single sync. Inside the resolver, we could loop over each entry in the list and do them one-by-one (probably in separate goroutines, but not necessarily).

What do you think?

@bbernays
Copy link
Collaborator Author

bbernays commented May 4, 2023

@hermanschaaf - I like the idea of taking in an array of input arguments and then iterating over them! I think if we do it for a single table we should support it for all tables/APIs that support user provided inputs. One other thing that this will possibly generate will be duplicate data as the filters could theoretically overlap. Is that a concern?

@disq
Copy link
Member

disq commented May 4, 2023

Where do we stand on account-level filtering of these options? Each account gets its own list of TableOptions. But then you'd need catch-all etc. as well so it could get out of hand quickly. Maybe some kind of an account-id-match "metafield" can be included, which could take wildcards or multiple account ids. In a future iteration.

@bbernays
Copy link
Collaborator Author

bbernays commented May 5, 2023

Each account gets its own list of TableOptions

At this time I think we should start with just a single "global" TableOptions if users need an account specific filters for specific accounts/regions they can use multiple source blocks

@bbernays
Copy link
Collaborator Author

bbernays commented May 5, 2023

I will not be including the documentation in this PR. I am working on a follow up PR that will solely be the documentation

@bbernays bbernays linked an issue May 5, 2023 that may be closed by this pull request
1 task
svc := c.Services().Accessanalyzer
paginator := accessanalyzer.NewListAnalyzersPaginator(svc, &accessanalyzer.ListAnalyzersInput{})
for paginator.HasMorePages() {
// no need to override API call options anymore: https://github.com/aws/aws-sdk-go-v2/issues/1260
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we used to override the region due to a bug in the sdk, now we do it because our sdk clients aren't per account+region...

@hermanschaaf
Copy link
Member

I had a few comments on the codegen, but hoping we can get away with a shorter solution that doesn't require any codegen; here's a branch that demonstrates the idea (and some other changes): https://github.com/cloudquery/cloudquery/compare/aws-inputs-simplified

The basic idea is to use a custom JSON Unmarshal function that first unmarshals to a map[string]any, then transforms all the key names from snake to camel, then marshalling the map back to JSON and calling the default config object unmarshal on that instead. It seems to work in my testing and it's relatively simple I think

@bbernays
Copy link
Collaborator Author

bbernays commented May 9, 2023

@hermanschaaf - I am fine with your approach... In the end both approaches achieve the same thing with the exact same public interface. The difference that I see between the two solutions is the codegen solutions handles all of the name changing at build time rather than run time. Having access to the structs that define TableOptions might be important when it comes to documenting the object.

Copy link
Member

@hermanschaaf hermanschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the update! I think the only thing remaining now is to use a list instead of individual objects so that you can combine multiple filters.

@hermanschaaf
Copy link
Member

@bbernays I pushed a couple of commits to the branch:

  • A test to validate the snake case JSON parsing is working as expected
  • A fix for an edge case discovered by the above
  • Renaming table_options package tableoptions (not great for readability, but more consistent with our other packages)
  • A commit to use lists in the config like we discussed 600879c This was a bit trickier than I thought for Cloudtrail events, because there we have to carefully combine it with the state backend. I kept it backwards-compatible, but please have a look there and let me know whether you agree this is the right way to implement it. None of these are using additional goroutines yet. I think that's okay for now

Copy link
Member

@hermanschaaf hermanschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving so the PR is unblocked; since I pushed a couple of commits myself though it would be good to get a review of that :)

@hermanschaaf hermanschaaf added the automerge Automatically merge once required checks pass label May 16, 2023
@kodiakhq kodiakhq bot merged commit 161b11b into cloudquery:main May 16, 2023
kodiakhq bot pushed a commit that referenced this pull request May 16, 2023
🤖 I have created a release *beep* *boop*
---


## [17.3.0](plugins-source-aws-v17.2.0...plugins-source-aws-v17.3.0) (2023-05-16)


### This Release has the Following Changes to Tables
- Table `aws_acmpca_certificate_authorities` was added
- Table `aws_cloudformation_stack_templates` was added
- Table `aws_cloudformation_template_summaries` was added
- Table `aws_config_config_rule_compliance_details` was added
- Table `aws_efs_access_points` was added

### Features

* Add AWS Config Compliance Details table ([#10544](#10544)) ([9b43a2a](9b43a2a))
* Add Cloudformation Template Summaries table ([#10571](#10571)) ([3515db9](3515db9))
* **aws-services:** Support newly added regions ([#10806](#10806)) ([52b5e0f](52b5e0f))
* **aws:** Add private certificate authorities ([#10691](#10691)) ([9c97e85](9c97e85))
* **aws:** Add Support for Cloudformation Templates ([#10701](#10701)) ([7a23c2e](7a23c2e))
* **aws:** Add support for EFS Access Point ([#10803](#10803)) ([d994c85](d994c85))
* **aws:** Support Table level inputs ([#10564](#10564)) ([161b11b](161b11b))
* **deps:** Upgrade to Apache Arrow v13 (latest `cqmain`) ([#10605](#10605)) ([a55da3d](a55da3d))


### Bug Fixes

* **aws-policies:** Api Gateway stage logging for REST ([#10625](#10625)) ([f0d6f57](f0d6f57))
* **aws-policies:** Api Gateway stage logging for websockets ([#10702](#10702)) ([e667400](e667400))
* **aws:** Change column type of `aws_cloudformation_stack_templates.template_body` ([#10752](#10752)) ([75b9785](75b9785))
* **aws:** Fix the case where resrouce_id is null in ECS.2 of foundational policy ([#10692](#10692)) ([f5cf2d8](f5cf2d8))
* **aws:** Handle Cloudfront Regions in different partitions ([#10690](#10690)) ([158aab1](158aab1))
* **deps:** Update module github.com/aws/aws-sdk-go-v2/config to v1.18.25 ([#10786](#10786)) ([caca1a4](caca1a4))
* **deps:** Update module github.com/aws/aws-sdk-go-v2/service/acm to v1.17.11 ([#10789](#10789)) ([9122f84](9122f84))
* **deps:** Update module github.com/aws/aws-sdk-go-v2/service/amp to v1.16.11 ([#10790](#10790)) ([431905f](431905f))
* **deps:** Update module github.com/aws/aws-sdk-go-v2/service/amplify to v1.13.10 ([#10791](#10791)) ([81d175b](81d175b))
* **deps:** Update module github.com/cloudquery/plugin-pb-go to v1.0.8 ([#10798](#10798)) ([27ff430](27ff430))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
kodiakhq bot pushed a commit that referenced this pull request May 16, 2023
@bbernays bbernays deleted the aws-inputs branch July 6, 2023 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge Automatically merge once required checks pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Support Filtering Access Analyzer Findings feat: Configuration options for aws_cloudtrail_events table

4 participants