Skip to content

Refactor: CrawlerBase and PermissionsManager snapshotting pattern#2402

Merged
nfx merged 15 commits intomainfrom
crawler-snapshot-refactor
Sep 6, 2024
Merged

Refactor: CrawlerBase and PermissionsManager snapshotting pattern#2402
nfx merged 15 commits intomainfrom
crawler-snapshot-refactor

Conversation

@asnare
Copy link
Copy Markdown
Contributor

@asnare asnare commented Aug 9, 2024

Changes

This PR:

  • Refactors the existing permissions crawler so that it uses the same fetch/crawl/snapshot pattern as the rest of the crawlers.
  • Removes the permissions crawler's bespoke .cleanup() implementation in favour of the .reset() method already available on the base class.
  • Some dead code was removed on the permissions crawler: .load_for_all()
  • Specifies .snapshot() as an interface (on CrawlerBase) that all crawlers need to support.

Linked issues

Progresses #2074, by laying the ground for crawlers to support refreshing.

Note that this PR and #2392 have a behaviour conflict that will trigger a failure in test_runtime_crawl_permissions if either is merged: resolving this is trivial.

Functionality

  • modified existing workflow: assessment

Tests

  • updated unit tests
  • updated integration tests

asnare added 3 commits August 9, 2024 15:25
…hot pattern as the other crawlers.

The PermissionsCrawler predates the CrawlerBase implementation and hadn't been updated to use the current pattern.
@asnare asnare added the internal this pull request won't appear in release notes label Aug 9, 2024
@asnare asnare self-assigned this Aug 9, 2024
@asnare asnare requested review from a team and vsevolodstep-db August 9, 2024 14:09
@asnare asnare changed the title Crawler snapshot refactor Refactor: CrawlerBase and PermissionsManager snapshotting pattern Aug 9, 2024
@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 9, 2024

✅ 152/152 passed, 19 skipped, 2h49m32s total

Running from acceptance #5565

## Changes

This PR is stacked on top of #2402 and is further refactors the crawler
snapshotting pattern to use common code without changing the existing
system behaviour. This is intended as groundwork to the next step,
whereby `snapshot()` will be updated to support refreshing the
inventory.

### Linked issues

Progresses #2074, stacked on top of #2402.

### Tests

- existing unit tests (with minor updates)
- existing integration tests
@nfx nfx temporarily deployed to account-admin August 12, 2024 16:22 — with GitHub Actions Inactive
@asnare asnare requested review from JCZuurmond and nfx September 3, 2024 08:25
Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one tiny change


def has_groups(self) -> bool:
return len(self.snapshot()) > 0
first_element = islice(self.snapshot(), 1)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you revert the change on this method? i find it risky. it's fine to load snapshot twice, as it's materialized

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted it.

This was intended to be a poor-man's limit 1 because .snapshot() is normally lazy, etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pfft. And I just noticed the method isn't actually called from anywhere (including tests). So I'm just going to nuke it entirely.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and removed.

Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown
Collaborator

@nfx nfx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nfx nfx merged commit 92a8f47 into main Sep 6, 2024
@nfx nfx deleted the crawler-snapshot-refactor branch September 6, 2024 14:12
nfx pushed a commit that referenced this pull request Sep 10, 2024
## Changes

This PR updates some changes introduced in #2477 where it looks like
merging the changes from #2402 went awry.

### Linked issues

Progresses 2074, updates #2477.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal this pull request won't appear in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants