Shared Storage

Guzzle shared storage holds the Guzzle jars, third-party dependency jars and extra deployment-specific jars in an Azure storage account. These jars are used when a job is triggered on a remote cluster.

note

Shared storage is configured when a remote cluster is used to run Guzzle jobs.

Network Architecture for Guzzle on Azure

When remote compute is used to run activities, it is mandatory to provide the full shared storage configuration for Azure Blob Storage. Shared storage simply means that the Guzzle jars, third-party dependency jars and extra deployment-specific jars are stored in an Azure storage account. The job configs and logs remain on the local Guzzle VM.

When a job is triggered on a remote cluster, the Guzzle API is used to read and write the Guzzle logs and to look up the job configs. The stored jars are installed on the remote cluster and used to run the job that Guzzle triggered there.

Set Up Shared Storage

  1. Go to the Manage menu from the top navigation bar

  2. Navigate to Environment Config -> Shared Storage

  3. Enter the following configuration details:

Properties to Set Up Shared Storage

| Property | Description | Default Value | Required |
| --- | --- | --- | --- |
| Account Name | The Azure storage account name. | None | Yes |
| Container | The Azure storage container name. | None | Yes |
| Authentication Type | The authentication type: (1) Access Key, which requires the access key of the storage account; (2) Service Principal, which requires the tenant id, client id and client secret. | None | Yes |
| Access Key | The access key of the storage account. It can be supplied in two ways: (1) Manual: provide the access key directly; (2) Azure Key Vault: provide the Key Vault name and the secret name under which the access key is stored in the Azure Key Vault instance. This option requires Key Vault to be integrated with Guzzle first. | None | Yes, when Authentication Type is Access Key |
| Tenant Id | The service principal tenant id. | None | Yes, when Authentication Type is Service Principal |
| Client Id | The service principal client id. | None | Yes, when Authentication Type is Service Principal |
| Client Secret | The service principal client secret. It can be supplied in two ways: (1) Manual: provide the client secret directly; (2) Azure Key Vault: provide the Key Vault name and the secret name under which the client secret is stored in the Azure Key Vault instance. This option requires Key Vault to be integrated with Guzzle first. | None | Yes, when Authentication Type is Service Principal |
| Container Directory | The directory inside the container where the jars are saved. Specify / to store the jars in the root directory of the container. | None | Yes |
| Databricks Secret | Information used to share the jars in the storage account with the remote cluster. Secret Scope: the secret scope defined in the Databricks workspace. Secret Key: the secret that contains the access key of the storage account used for shared storage. The scope can be a Key Vault-backed Databricks secret scope. | None | No |
| Sync Storage | Button that syncs the Guzzle jars, third-party dependency jars and extra deployment-specific jars from the Guzzle VM to the storage account. | | |
| Update | Button that saves all the properties. | | |
| Cancel | Button that discards the changed property values. | | |
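To illustrate how the Databricks Secret property might be consumed on the remote cluster, the sketch below reads the storage access key from a secret scope and registers it with Spark so that jars can be read from the container. It is a minimal sketch under stated assumptions, not Guzzle's actual implementation: the function name `configure_shared_storage` and its parameters are hypothetical, `spark` and `dbutils` are the objects Databricks provides in a notebook, and the `wasbs://` Blob endpoint is assumed (an ADLS Gen2 account would typically use `abfss://` with `dfs.core.windows.net` instead).

```python
def configure_shared_storage(spark, dbutils, account_name, container_name,
                             container_directory, secret_scope, secret_key):
    """Register the storage access key with Spark and return the jar location.

    Hypothetical helper: all names here are illustrative, not part of Guzzle.
    """
    # Read the storage access key from the Databricks secret scope.
    access_key = dbutils.secrets.get(scope=secret_scope, key=secret_key)

    # Assumed endpoint: Azure Blob Storage accessed through the wasbs scheme.
    host = f"{account_name}.blob.core.windows.net"
    spark.conf.set(f"fs.azure.account.key.{host}", access_key)

    # A container directory of "/" means the root of the container.
    directory = container_directory.strip("/")
    base = f"wasbs://{container_name}@{host}"
    return f"{base}/{directory}" if directory else base
```

Passing `spark` and `dbutils` as arguments keeps the helper easy to exercise outside Databricks with simple stand-in objects.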

Before updating the shared storage information, Guzzle validates the configuration to ensure that the storage account, container, access key, and folder are valid.
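The required-field rules from the properties table can be expressed as a simple pre-check, run before any call to Azure. The following is a minimal sketch, not Guzzle's own validation logic: the class `SharedStorageConfig` and the function `missing_fields` are hypothetical, the field names `account_name`, `container_name`, `container_directory`, `auth_type` and `access_key` follow the keys stored in guzzle-api.yml, and the `service_principal` value together with `tenant_id`, `client_id` and `client_secret` are assumed names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SharedStorageConfig:
    """Hypothetical container for the shared storage form values."""
    account_name: str
    container_name: str
    container_directory: str
    auth_type: str  # "access_key" or "service_principal" (second value assumed)
    access_key: Optional[str] = None
    tenant_id: Optional[str] = None      # assumed key name
    client_id: Optional[str] = None      # assumed key name
    client_secret: Optional[str] = None  # assumed key name


def missing_fields(cfg: SharedStorageConfig) -> List[str]:
    """Return the names of required fields that are empty for the chosen auth type."""
    required = ["account_name", "container_name", "container_directory"]
    if cfg.auth_type == "access_key":
        required.append("access_key")
    elif cfg.auth_type == "service_principal":
        required += ["tenant_id", "client_id", "client_secret"]
    else:
        # Unknown authentication type: report it instead of guessing.
        return ["auth_type"]
    return [name for name in required if not getattr(cfg, name)]
```

An empty result means every field required for the selected authentication type is present; checking that the account, container and credentials actually work still requires a call to the storage account.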

As soon as the user presses the Update button, Guzzle verifies the credentials and stores the configuration in the guzzle-api.yml file:

storage:
  type: "local"
  properties:
    auth_type: "access_key"
    access_key: "XXXXXX"
    container_directory: "/"
    container_name: "XXXXXXXXX"
    account_name: "XXXXX"
    databricks_secret:
      key: "XXXXX"
      scope: "XXXX"

Interface of Shared Storage using Access Key

Interface of Shared Storage using Service Principal