
Create inventory of service principals and direct files access in Azure #249

@birbalgithub

Description


In Azure, data access is authorized using service principals, which can be configured in any of the settings below:
- Cluster conf
- Spark session
- Cluster policy
- DLT settings

Today, there is no automated way of knowing how many service principals are in use in the workspace or where they are being set (cluster, cluster policy, DLT, or notebook). As a result, it is hard for the customer to figure out how many STORAGE CREDENTIALS need to be created as part of the UC upgrade. Furthermore, workloads could be accessing files or mount points directly using service principal credentials, so it is important to list all such files and mount points to estimate the EXTERNAL LOCATIONS that need to be created. Additionally, it is useful to list the clusters/cluster policies/DLT pipelines/notebooks wherever service principal credentials are set, in order to identify their owners/users. This would help the customer map the required group permissions onto the Storage Credentials and External Locations.

We need a feature in the tool to create an inventory of all service principals and direct files/mount points currently in use in the workspace, along with the objects (clusters/cluster policies/DLT pipelines/notebooks) that use them. The direct files/mount points can be found by scanning code (notebooks, Scala, Java, SQL), and the service principals can be found by scanning the following four settings.

  1. Cluster conf settings
    spark.hadoop.fs.azure.account.auth.type OAuth
    spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
    spark.hadoop.fs.azure.account.oauth2.client.id
    spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets//}}
    spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com//oauth2/token

  2. Spark session level settings
    spark.conf.set("fs.azure.account.auth.type","OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type","org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id","")
    spark.conf.set("fs.azure.account.oauth2.client.secret",dbutils.secrets.get(scope="",key=""))
    spark.conf.set("fs.azure.account.oauth2.client.endpoint","https://login.microsoftonline.com//oauth2/token")

  3. DLT settings
    "configuration": {
    "spark.hadoop.fs.azure.account.auth.type": "OAuth",
    "spark.hadoop.fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "spark.hadoop.fs.azure.account.oauth2.client.id":"",
    "spark.hadoop.fs.azure.account.oauth2.client.secret":"{{secrets//}}",
    "spark.hadoop.fs.azure.account.oauth2.client.endpoint":"https://login.microsoftonline.com//oauth2/token"
    },

  4. Cluster policy level settings
    {
    "spark_conf.fs.azure.account.auth.type": {
    "type": "fixed",
    "value": "OAuth",
    "hidden": true
    },
    "spark_conf.fs.azure.account.oauth.provider.type": {
    "type": "fixed",
    "value": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "hidden": true
    },
    "spark_conf.fs.azure.account.oauth2.client.id": {
    "type": "fixed",
    "value": "",
    "hidden": true
    },

"spark_conf.fs.azure.account.oauth2.client.secret": {
"type": "fixed",
"value": "{{secrets//}}",
"hidden": true
},
"spark_conf.fs.azure.account.oauth2.client.endpoint": {
"type": "fixed",
"value": "https://login.microsoftonline.com//oauth2/token",
"hidden": true
}
}
