Add secret detection logic to Azure service principal crawler #950

Merged: nfx merged 6 commits into main from fix/spn-secret on Feb 19, 2024

Conversation

@nkvuong (Contributor) commented Feb 15, 2024

Changes

  • Add client secret detection logic to Azure service principal crawler; this is needed for #874

Tests

  • manually tested
  • added unit tests
  • added integration tests

codecov bot commented Feb 15, 2024

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (fc48c6f) 87.94% compared to head (a1ee029) 87.94%.

| Files | Patch % | Lines |
|---|---|---|
| src/databricks/labs/ucx/assessment/secrets.py | 91.48% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #950   +/-   ##
=======================================
  Coverage   87.94%   87.94%           
=======================================
  Files          43       44    +1     
  Lines        5258     5285   +27     
  Branches      943      948    +5     
=======================================
+ Hits         4624     4648   +24     
- Misses        430      433    +3     
  Partials      204      204           


from databricks.labs.ucx.framework.crawlers import CrawlerBase, SqlBackend

SECRET_PATTERN = r"{{secrets\/(.*)\/(.*)}}"
TENANT_PATTERN = r"https:\/\/login.microsoftonline.com\/(.*)\/oauth2\/token"

Collaborator

Technically, this only works for the public cloud, not for GovCloud or China.

Contributor Author

expanded regex for govcloud & cn
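
For readers following along, a pattern that also covers the sovereign clouds might look like the sketch below. The expanded regex itself is not shown in this thread, so the host list (`login.microsoftonline.us` for US Government, `login.chinacloudapi.cn` and `login.partner.microsoftonline.cn` for China) and the helper name `extract_tenant_id` are assumptions for illustration, not the code that was merged.

```python
import re
from typing import Optional

# Assumed host list: public cloud, US Government, and China login endpoints.
# Dots are escaped so that "." only matches a literal dot.
TENANT_PATTERN = re.compile(
    r"https://login\."
    r"(?:microsoftonline\.com"
    r"|microsoftonline\.us"
    r"|chinacloudapi\.cn"
    r"|partner\.microsoftonline\.cn)"
    r"/([^/]+)/oauth2/token"
)


def extract_tenant_id(endpoint: str) -> Optional[str]:
    """Return the tenant id embedded in an OAuth2 token endpoint, if any.

    Hypothetical helper, not part of the PR.
    """
    match = TENANT_PATTERN.search(endpoint)
    return match.group(1) if match else None
```

Using `[^/]+` instead of `(.*)` keeps the capture from running past the tenant segment if the URL carries extra path components.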

if len(secret_matched) == 0:
    logger.warning('Secret in config stored in plaintext.')
else:
    secret_scope, secret_key = secret_matched[0][0], secret_matched[0][1]

Collaborator

Nesting gets too deep, split this method up

Contributor Author

done
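
One way the split requested above could be done is to move the match-or-warn branch into a small helper that returns early. This is only a sketch: the helper name `parse_secret_reference` is hypothetical, and the refactoring that was actually merged is not shown in this thread.

```python
import logging
import re
from typing import Optional, Tuple

logger = logging.getLogger(__name__)

SECRET_PATTERN = r"{{secrets\/(.*)\/(.*)}}"


def parse_secret_reference(value: str) -> Optional[Tuple[str, str]]:
    """Return (scope, key) for a {{secrets/scope/key}} reference.

    Hypothetical helper: returning early on the plaintext case removes
    the if/else nesting from the caller.
    """
    secret_matched = re.findall(SECRET_PATTERN, value)
    if len(secret_matched) == 0:
        logger.warning('Secret in config stored in plaintext.')
        return None
    return secret_matched[0][0], secret_matched[0][1]
```

The caller then only needs a single `if reference is None: continue` style check instead of carrying the `else` branch inside its own loops.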

@nkvuong nkvuong marked this pull request as ready for review February 19, 2024 10:30
@nkvuong nkvuong requested review from a team and prajin-29 February 19, 2024 10:30
storage_account = storage_account_matched.group(1).strip(".")
tenant_key = "fs.azure.account.oauth2.client.endpoint." + storage_account
# adjust the key to lookup for tenant id & secret id
tenant_key = tenant_key + f".{storage_account}"

Collaborator

Suggested change
- tenant_key = tenant_key + f".{storage_account}"
+ tenant_key = f"{tenant_key}.{storage_account}"

More readable this way

client_secret_key = client_secret_key + f".{storage_account}"

# retrieve client secret of spn
matching_secret_keys = [key for key in config.keys() if re.search(client_secret_key, key)]

Collaborator

Do you need suffix matching or full regex? Regex may yield dot-related bugs
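
To illustrate the dot-related concern: when a config key is passed to `re.search` as the pattern, every `.` in it matches any character. The sketch below uses a contrived second key (`acctXdfs...`) purely to show the false positive; the account names are made up.

```python
import re

# Contrived config: the second key differs only where a dot is expected.
config = {
    "fs.azure.account.oauth2.client.secret.acct.dfs.core.windows.net": "a",
    "fs.azure.account.oauth2.client.secret.acctXdfs.core.windows.net": "b",
}
client_secret_key = "fs.azure.account.oauth2.client.secret.acct.dfs.core.windows.net"

# Unescaped regex: "." matches any character, so both keys are hits.
regex_hits = [key for key in config if re.search(client_secret_key, key)]

# Plain suffix match: only the exact key is a hit.
suffix_hits = [key for key in config if key.endswith(client_secret_key)]

# If a regex is really needed, escaping the key restores literal dots.
escaped_hits = [key for key in config if re.search(re.escape(client_secret_key), key)]
```

Here `regex_hits` contains both keys, while `suffix_hits` and `escaped_hits` contain only the intended one.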

from databricks.labs.ucx.assessment.jobs import JobsMixin
from databricks.labs.ucx.framework.crawlers import CrawlerBase, SqlBackend

SECRET_PATTERN = r"{{secrets\/(.*)\/(.*)}}"

Collaborator

Can you make these constants private fields of a mixin class if they are not used elsewhere?
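
A minimal sketch of what that suggestion could look like follows. The class names `SecretsMixin` and `ExampleCrawler` and the method names are illustrative assumptions; the thread does not show how the constants were finally organised.

```python
import re
from typing import Optional, Tuple


class SecretsMixin:
    """Hypothetical mixin that owns the secret-reference pattern."""

    # Private to the mixin, since nothing outside it needs the raw pattern.
    _SECRET_PATTERN = r"{{secrets\/(.*)\/(.*)}}"

    def _parse_secret_reference(self, value: str) -> Optional[Tuple[str, str]]:
        matched = re.findall(self._SECRET_PATTERN, value)
        if not matched:
            return None
        return matched[0][0], matched[0][1]


class ExampleCrawler(SecretsMixin):
    """Stand-in for a crawler class that mixes in the secret helpers."""

    def secret_reference(self, value: str) -> Optional[Tuple[str, str]]:
        return self._parse_secret_reference(value)
```

Keeping the pattern as a class attribute with a leading underscore signals that other modules should go through the mixin's methods rather than import the constant.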

@nfx nfx merged commit 93a6248 into main Feb 19, 2024
@nfx nfx deleted the fix/spn-secret branch February 19, 2024 12:16
qziyuan added a commit that referenced this pull request Feb 20, 2024
qziyuan added a commit that referenced this pull request Feb 20, 2024
nfx added a commit that referenced this pull request Feb 21, 2024
* Added secret detection logic to Azure service principal crawler ([#950](#950)).
* Create storage credentials based on instance profiles and existing roles ([#869](#869)).
* Enforced `protected-access` pylint rule ([#956](#956)).
* Enforced `pylint` on unit and integration test code ([#953](#953)).
* Enforcing `invalid-name` pylint rule ([#957](#957)).
* Fixed AzureResourcePermissions.load to call Installation.load ([#962](#962)).
* Fixed installer script to reuse an existing UCX Cluster policy if present ([#964](#964)).
* More `pylint` tuning ([#958](#958)).
* Refactor `workspace_client_mock` to combine fixtures stored in separate JSON files ([#955](#955)).

Dependency updates:

 * Updated databricks-sdk requirement from ~=0.19.0 to ~=0.20.0 ([#961](#961)).
@nfx nfx mentioned this pull request Feb 21, 2024
qziyuan added a commit that referenced this pull request Feb 21, 2024
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024
## Changes
- Add client secret detection logic to Azure service principal crawler;
this is needed for #874

### Tests

- [x] manually tested
- [x] added unit tests
- [x] added integration tests
dmoore247 pushed a commit that referenced this pull request Mar 23, 2024