Create inventory of service principals and direct files access in Azure #249
Description
In Azure, data access is authorized using service principals, which can be configured in any of the following settings:
- Cluster conf
- Spark session
- Cluster policy
- DLT settings
Today, there is no automated way of knowing how many service principals are being used in the workspace or where they are being set (cluster, cluster policy, DLT, or notebook). As a result, it is hard for the customer to figure out how many STORAGE CREDENTIALS need to be created as part of the UC upgrade. Furthermore, workloads could be accessing files or mount points directly using service principal credentials, so it is important to list all those files and mount points in order to estimate the EXTERNAL LOCATIONS that need to be created. It is also useful to list the clusters, cluster policies, DLT pipelines, and notebooks where service principal credentials are being set, so their owners/users can be identified. This would help customers map permissions to the required groups on the Storage Credentials and External Locations.
We need a feature in the tool that creates an inventory of all service principals and all direct file/mount-point accesses currently in use in the workspace, along with the objects (cluster, cluster policy, DLT pipeline, notebook) that use them. The direct files/mount points can be found by scanning code (notebooks, Scala, Java, SQL), and the service principals can be found by scanning the following four settings.
Cluster conf settings

```text
spark.hadoop.fs.azure.account.auth.type OAuth
spark.hadoop.fs.azure.account.oauth.provider.type org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id <application-id>
spark.hadoop.fs.azure.account.oauth2.client.secret {{secrets/<scope>/<service-credential-key>}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint https://login.microsoftonline.com/<directory-id>/oauth2/token
```
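A minimal sketch of harvesting client ids from a cluster's spark conf mapping (as returned, for example, by the clusters API); the function name is a hypothetical, not the tool's API:

```python
def client_ids_from_spark_conf(spark_conf: dict) -> set:
    """Collect service-principal application (client) ids from a spark conf mapping.

    Substring match so that both the global key and per-storage-account variants
    (e.g. fs.azure.account.oauth2.client.id.<account>.dfs.core.windows.net) are caught.
    """
    return {
        value
        for key, value in spark_conf.items()
        if "fs.azure.account.oauth2.client.id" in key
    }

conf = {
    "spark.hadoop.fs.azure.account.auth.type": "OAuth",
    "spark.hadoop.fs.azure.account.oauth2.client.id": "11111111-2222-3333-4444-555555555555",
}
print(client_ids_from_spark_conf(conf))
```

Running this across every cluster in the workspace would give one input into the STORAGE CREDENTIALS count.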
Spark session level settings

```python
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret", dbutils.secrets.get(scope="<scope>", key="<service-credential-key>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
```
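Session-level settings only exist in notebook source, so they would have to be found by scanning code rather than API metadata. A rough sketch, assuming a regex over exported source is acceptable (the pattern is illustrative and would miss ids built from variables):

```python
import re

# Illustrative: capture the second argument of spark.conf.set(...) calls
# whose key starts with fs.azure.account.oauth2.client.id.
SESSION_SPN_RE = re.compile(
    r'spark\.conf\.set\(\s*"fs\.azure\.account\.oauth2\.client\.id[^"]*"\s*,\s*"([^"]+)"'
)

def session_client_ids(source: str) -> list:
    """Return client ids set at Spark session level in the given source code."""
    return SESSION_SPN_RE.findall(source)

cell = 'spark.conf.set("fs.azure.account.oauth2.client.id", "aaaa-bbbb")'
print(session_client_ids(cell))
```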
DLT settings

```json
"configuration": {
  "spark.hadoop.fs.azure.account.auth.type": "OAuth",
  "spark.hadoop.fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "spark.hadoop.fs.azure.account.oauth2.client.id": "<application-id>",
  "spark.hadoop.fs.azure.account.oauth2.client.secret": "{{secrets/<scope>/<service-credential-key>}}",
  "spark.hadoop.fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"
}
```
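For DLT, the pipeline spec carries these keys in its `configuration` block, so a sketch of the extraction, assuming the spec is available as JSON (the helper name is hypothetical):

```python
import json

def dlt_client_ids(pipeline_spec: str) -> set:
    """Pull SPN client ids out of a DLT pipeline spec's 'configuration' block."""
    spec = json.loads(pipeline_spec)
    configuration = spec.get("configuration", {})
    return {
        value
        for key, value in configuration.items()
        if "fs.azure.account.oauth2.client.id" in key
    }

spec = json.dumps({
    "name": "demo-pipeline",
    "configuration": {
        "spark.hadoop.fs.azure.account.auth.type": "OAuth",
        "spark.hadoop.fs.azure.account.oauth2.client.id": "aaaa-1111",
    },
})
print(dlt_client_ids(spec))
```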
Cluster policy level settings

```json
{
  "spark_conf.fs.azure.account.auth.type": {
    "type": "fixed",
    "value": "OAuth",
    "hidden": true
  },
  "spark_conf.fs.azure.account.oauth.provider.type": {
    "type": "fixed",
    "value": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "hidden": true
  },
  "spark_conf.fs.azure.account.oauth2.client.id": {
    "type": "fixed",
    "value": "<application-id>",
    "hidden": true
  },
  "spark_conf.fs.azure.account.oauth2.client.secret": {
    "type": "fixed",
    "value": "{{secrets/<scope>/<service-credential-key>}}",
    "hidden": true
  },
  "spark_conf.fs.azure.account.oauth2.client.endpoint": {
    "type": "fixed",
    "value": "https://login.microsoftonline.com/<directory-id>/oauth2/token",
    "hidden": true
  }
}
```
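Cluster policies store the same conf keys behind a `spark_conf.` prefix with per-key rules, so the policy scan is a small variation on the cluster-conf one. A sketch, assuming the policy definition JSON is in hand (the function name is illustrative):

```python
import json

def policy_client_ids(policy_definition: str) -> set:
    """Pull fixed SPN client ids from a cluster policy definition document."""
    definition = json.loads(policy_definition)
    return {
        rule.get("value")
        for key, rule in definition.items()
        if key.startswith("spark_conf.")
        and "fs.azure.account.oauth2.client.id" in key
        and isinstance(rule, dict)
        and rule.get("type") == "fixed"
    }

policy = json.dumps({
    "spark_conf.fs.azure.account.auth.type": {"type": "fixed", "value": "OAuth"},
    "spark_conf.fs.azure.account.oauth2.client.id": {"type": "fixed", "value": "bbbb-2222"},
})
print(policy_client_ids(policy))
```

Combining the four scans per object would yield the inventory described above: each service principal, plus the clusters, policies, pipelines, and notebooks that set it.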