Add secret detection logic to Azure service principal crawler#950
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #950 +/- ##
=======================================
Coverage 87.94% 87.94%
=======================================
Files 43 44 +1
Lines 5258 5285 +27
Branches 943 948 +5
=======================================
+ Hits 4624 4648 +24
- Misses 430 433 +3
Partials 204 204 ☔ View full report in Codecov by Sentry. |
| from databricks.labs.ucx.framework.crawlers import CrawlerBase, SqlBackend | ||
|
|
||
| SECRET_PATTERN = r"{{secrets\/(.*)\/(.*)}}" | ||
| TENANT_PATTERN = r"https:\/\/login.microsoftonline.com\/(.*)\/oauth2\/token" |
There was a problem hiding this comment.
Technically, this only works for public cloud, but not for govcloud or china
There was a problem hiding this comment.
expanded regex for govcloud & cn
| if len(secret_matched) == 0: | ||
| logger.warning('Secret in config stored in plaintext.') | ||
| else: | ||
| secret_scope, secret_key = secret_matched[0][0], secret_matched[0][1] |
There was a problem hiding this comment.
Nesting gets too deep, split this method up
58185b2 to
3e005fa
Compare
| storage_account = storage_account_matched.group(1).strip(".") | ||
| tenant_key = "fs.azure.account.oauth2.client.endpoint." + storage_account | ||
| # adjust the key to lookup for tenant id & secret id | ||
| tenant_key = tenant_key + f".{storage_account}" |
There was a problem hiding this comment.
| tenant_key = tenant_key + f".{storage_account}" | |
| tenant_key = | |
| f"{tenant_key}.{storage_account}" |
More readable this way
| client_secret_key = client_secret_key + f".{storage_account}" | ||
|
|
||
| # retrieve client secret of spn | ||
| matching_secret_keys = [key for key in config.keys() if re.search(client_secret_key, key)] |
There was a problem hiding this comment.
Do you need suffix matching or full regex? Regex may yield dot-related bugs
| from databricks.labs.ucx.assessment.jobs import JobsMixin | ||
| from databricks.labs.ucx.framework.crawlers import CrawlerBase, SqlBackend | ||
|
|
||
| SECRET_PATTERN = r"{{secrets\/(.*)\/(.*)}}" |
There was a problem hiding this comment.
Can you make these constants as private fields of a mixin class of they are not used elsewhere?
* Added secret detection logic to Azure service principal crawler ([#950](#950)). * Create storage credentials based on instance profiles and existing roles ([#869](#869)). * Enforced `protected-access` pylint rule ([#956](#956)). * Enforced `pylint` on unit and integration test code ([#953](#953)). * Enforcing `invalid-name` pylint rule ([#957](#957)). * Fixed AzureResourcePermissions.load to call Installation.load ([#962](#962)). * Fixed installer script to reuse an existing UCX Cluster policy if present ([#964](#964)). * More `pylint` tuning ([#958](#958)). * Refactor `workspace_client_mock` to have combine fixtures stored in separate JSON files ([#955](#955)). Dependency updates: * Updated databricks-sdk requirement from ~=0.19.0 to ~=0.20.0 ([#961](#961)).
* Added secret detection logic to Azure service principal crawler ([#950](#950)). * Create storage credentials based on instance profiles and existing roles ([#869](#869)). * Enforced `protected-access` pylint rule ([#956](#956)). * Enforced `pylint` on unit and integration test code ([#953](#953)). * Enforcing `invalid-name` pylint rule ([#957](#957)). * Fixed AzureResourcePermissions.load to call Installation.load ([#962](#962)). * Fixed installer script to reuse an existing UCX Cluster policy if present ([#964](#964)). * More `pylint` tuning ([#958](#958)). * Refactor `workspace_client_mock` to have combine fixtures stored in separate JSON files ([#955](#955)). Dependency updates: * Updated databricks-sdk requirement from ~=0.19.0 to ~=0.20.0 ([#961](#961)).
## Changes - Add client secret detection logic to Azure service principal crawler, this is needed for #874 ### Tests <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [x] added unit tests - [x] added integration tests
* Added secret detection logic to Azure service principal crawler ([#950](#950)). * Create storage credentials based on instance profiles and existing roles ([#869](#869)). * Enforced `protected-access` pylint rule ([#956](#956)). * Enforced `pylint` on unit and integration test code ([#953](#953)). * Enforcing `invalid-name` pylint rule ([#957](#957)). * Fixed AzureResourcePermissions.load to call Installation.load ([#962](#962)). * Fixed installer script to reuse an existing UCX Cluster policy if present ([#964](#964)). * More `pylint` tuning ([#958](#958)). * Refactor `workspace_client_mock` to have combine fixtures stored in separate JSON files ([#955](#955)). Dependency updates: * Updated databricks-sdk requirement from ~=0.19.0 to ~=0.20.0 ([#961](#961)).
Changes
Tests