This Universal Task allows Stonebranch users to perform end-to-end Orchestration and Automation of Jobs & Clusters in Databricks environment, either in AWS or Azure.
This integration will use the Databricks URL and the user bearer token to connect with the Databricks environment. Users can perform the following actions with respect to the Databricks jobs.
Create and list jobs
Get job details
Run now jobs
Run submit jobs
Cancel run jobs
Also with respect to Databricks clusters, this integration can perform the following operations.
With respect to Databricks DBFS, this Universal Task also provides a feature to upload larger files.
Refer to Changelog for version history information.
This integration requires a Universal Agent and a Python runtime to execute the Universal Task against a Databricks environment. This Universal Task has been tested with the Azure Databricks environment API version 2.0.
Requires Python 3.7.6 or higher. Tested with the Universal Agent bundled Python distribution. It requires "requests" python module.
Software Requirements for Universal Agent
Software Requirements for Universal Controller
This task uses Python module "requests" to make REST API calls to the Databricks environment.
Databricks URL and user bearer token would be required as basic input for this Universal Task.
Authentication is possible either by generating a personal access token in Databricks Environment or an Azure AD-based authentication.
Using the Azure AD based Authentication involves some configuration as shown below in Azure (https://portal.azure.com).
Login to Azure; Use the App registrations services to create an application registration for Azure Databricks to get the client id / Client secret credentials
For the newly created Databricks App registration add the API permission for Azure Databricks as shown below:
Finally, add the service principal to Azure Databricks
Create a job in a Databricks environment from Universal Controller. Here, a JSON input for job creation in Databricks environment will be used.
List the jobs available within the Databricks environment.
Get Job details
Provides an existing job definition in Databricks by providing the job ID as input.
Run now Jobs
This feature helps to run an existing job in Databricks environment using the run time input parameters supplied in JSON from the Universal Task and the Universal Controller will be monitoring the execution of the job until it gets completed.
Run Submit jobs
This feature helps to run a job in Databricks environment that can be dynamically defined in JSON as an input parameter in the Universal Task and the Universal Controller will be monitoring the execution of the job until it gets completed.
Cancel Run job
Cancel an execution of a job that is in a running state within the Databricks environment.
Create a cluster in Databricks environment. Input to be provided in the JSON in a script in this Universal Task.
List the clusters available in the Databricks environment.
Start a cluster that is in a stopped state in Databricks.
Restart a cluster in the Databricks environment.
Terminate cluster in Databricks environment by providing cluster ID as input.
Get a Cluster info
Provides the definition of an existing cluster in Databricks environment in JSON.
Upload file to DBFS
Upload a file from local server to a Databricks file system DBFS.
Import Universal Template
To use this downloadable Universal Template, you first must perform the following steps:
- This Universal Task requires the Resolvable Credentials feature. Check that the Resolvable Credentials Permitted system property has been set to true.
- To import the Universal Template into your Controller, follow the instructions here.
- When the files have been imported successfully, refresh the Universal Templates list; the Universal Template will appear on the list.
Modifications of this integration, applied by users or customers, before or after import, might affect the supportability of this integration.
Configure Universal Task
For the new Universal Task type, create a new task, and enter the task-specific details that were created in the Universal Template.
|Input Type||Default Value||Type|
Specify the Databricks URL.
|Azure AD Authentication||Optional||False||Boolean||Check the box when authentication must go via Azure AD.|
|Azure Tenant ID||Required||Text||If Azure AD Authentication is checked then specify the Azure Tenant id from the App registrations for databricks.|
|Azure Client ID & Secret||Required||Credential||If Azure AD Authentication is checked then specify the runtime User as Azure client id & runtime Password as the Secret value from the App registrations for databricks.|
|Scope||Required||2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default||Text||If Azure AD Authentication is checked then specify the scope. The default value of scope would be "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default" as mentioned in the Azure Databricks documents|
Provide the Databricks Personal token or the Azure AD token.
Select a Function that would like to perform with Databricks.
Create Request Script
Feed the script for the new job creation or cluster in Databricks.
Provide the Databricks Job ID.
Job Run Request
Specify the parameters for Jar or notebook or python or spark-submit or the Job submit run request.
Specify the Databricks Run ID.
Provide the cluster ID.
Local file name
Local file name with path.
DBFS file name
Provide the Databricks file path and name.
Specify if the uploaded files need to be overwritten in DBFS.
|SSL Verify||Optional||True||Boolean||Check if this Universal Task requires certificate verification for Databricks REST-API calls.|
|Certificate Path and file name||Optional||Text|
Path of the certificate for SSL Verification, if the SSL Verify field is enabled.
|Status Check Interval||Optional||10||Integer||The interval for checking the status of the job.|
|Maximum Retry||Optional||10||Integer||Number of retries during waiting for the task to be completed.|
Run now Job
Run Now Job - Script Details
Run Submit Job
Upload Local File to DBFS
The exit codes for Universal Task are described in the following table.
This document references the following documents:
User documentation for creating Universal Templates in the Universal Controller user interface.
User documentation for creating Universal Tasks in the Universal Controller user interface.
Fixed: Retry Interval and Max Retry parsing secured. It was failing for actions that don't use these values.
Added: Retry count will reset on successful execution.
Fixed: Retry options added for connection errors also
Added: Timeout and Retry options added to requests module
Added: A field added for the sleep interval during job status check
Fixed: check Run Status command may not return run_state value
Added: Regenerate Azure Ad token while Job monitor if token expires
Added: Included azure Ad based access to DB via MSAL
Added: SSL Verify option added
Fixed: Usage for the Double quotes for the _scriptPath
Fixed: Print state run now print logs
Added: Updates for customer requirements
Added: Upload & Export functionalities