Disclaimer
Your use of this download is governed by Stonebranch’s Terms of Use, which are available at https://www.stonebranch.com/integration-hub/Terms-and-Privacy/Terms-of-Use/
Overview
AWS Glue is a serverless data-preparation service for extract, transform, and load (ETL) operations. It makes it easy for data engineers, data analysts, data scientists, and ETL developers to extract, clean, enrich, normalize, and load data.
This Universal Extension provides the capability to submit a new AWS Glue Job.
Software Requirements
This integration requires a Universal Agent and a Python runtime to execute the Universal Task.
Software Requirements for Universal Template and Universal Task
Requires Python 3.7.0 or higher. Tested with the Universal Agent bundled Python distribution.
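The version requirement above can be checked on the agent host before the template is imported. The following is a minimal sketch, not part of the extension itself; the helper name `python_is_supported` is hypothetical.

```python
import sys

# Minimum version stated in the requirements above.
MIN_PYTHON = (3, 7, 0)


def python_is_supported(version_info=None):
    """Return True when the interpreter meets the documented minimum.

    version_info defaults to the running interpreter; a tuple such as
    (3, 6, 9) can be passed in for testing.
    """
    current = tuple((version_info or sys.version_info)[:3])
    return current >= MIN_PYTHON


if __name__ == "__main__":
    print("supported" if python_is_supported() else "unsupported")
```

Running the script with the Python distribution bundled with the Universal Agent shows whether that interpreter satisfies the requirement.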
Software Requirements for Universal Agent
Both Windows and Linux agents are supported:
- Universal Agent for Windows x64 Version 7.0.0.0 and later with Python options installed.
- Universal Agent for Linux Version 7.0.0.0 and later with Python options installed.
Software Requirements for Universal Controller
Universal Controller Version 7.0.0.0 and later.
Network and Connectivity Requirements
The Universal Agent host running the extension must be able to reach the AWS Glue REST endpoints. The AWS credentials provided in the AWS Glue Universal Task must have sufficient permissions in AWS to invoke Glue jobs.
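A quick way to verify connectivity from the agent host is to open a TCP connection to the regional Glue endpoint. The sketch below assumes the standard commercial-region hostname pattern `glue.<region>.amazonaws.com` on port 443 and a direct connection (no proxy). Other AWS partitions use different domain suffixes, and the function names are hypothetical.

```python
import socket


def glue_endpoint_host(region):
    """Build the regional hostname, e.g. glue.us-east-1.amazonaws.com.

    Assumption: a standard commercial AWS region.
    """
    return f"glue.{region}.amazonaws.com"


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    host = glue_endpoint_host("us-east-1")
    print(host, "reachable" if can_reach(host) else "unreachable")
```

A successful connection only shows that the network path is open. It does not confirm that the credentials have permission to start Glue jobs.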
Key Features
This Universal Extension provides the following key features:
- Start a Glue job.
- Support authorization via IAM Role-Based Access Control (RBAC) strategy.
- Support Proxy communication via HTTP/HTTPS protocol.
Import Universal Template
To use the Universal Template, first perform the following steps:
- This Universal Task requires the Resolvable Credentials feature. Check that the Resolvable Credentials Permitted system property is set to true. For more information, see the Resolvable Credentials documentation.
- Download the provided ZIP file.
- In the Universal Controller UI, select Administration > Configuration > Universal Templates to display the current list of Universal Templates.
- Click Import Template.
- Select the template ZIP file and click Import.
When the template has been imported successfully, it appears in the list. Refresh your Navigation Tree to see these tasks in the Automation Center menu.
Configure Universal Task
For the new Universal Task type AWS Glue, create a new task, and enter the task-specific details that were created in the Universal Template.
Input Fields
The input fields for this Universal Extension are described below.
Field | Input type | Default value | Type | Description |
---|---|---|---|---|
Action | Required | Start Job Run | Choice | The action performed upon the task execution. Available action: Start Job Run. |
AWS Region | Required | - | Text | Region for the Amazon Web Service. For more information, see the AWS documentation on service endpoints and quotas. |
AWS Credentials | Required | - | Credentials | The AWS account credentials. They are comprised of: |
Role Based Access | Optional | False | Boolean | Enables authorization through role assumption: the client sends its own credentials together with the role it wants to assume from another user. If allowed, the client receives temporary credentials with time-limited access to some resources. |
Role ARN | Optional | - | Text | The Amazon Resource Name (ARN) of the role that is applied for the connection. Role ARN format example: arn:aws:iam::119322085622:role. Required when Role Based Access is True. |
Job Name | Required | - | Text | The name of the Glue job that will be invoked. |
Job Run ID | Optional | - | Text | The ID of a previous Job Run to retry. |
Security Configuration | Optional | - | Text | The name of the Security Configuration structure to be used with the Job Run. |
Worker Type | Optional | None | Choice | The type of predefined worker that is allocated when a job runs. Available options are: |
Number Of Workers | Optional | - | Integer | The number of workers of a defined Worker Type that are allocated when a job is executed. The maximum number of workers that can be defined are: |
Job Timeout | Optional | 2880 | Integer | The Job Run timeout in minutes. Note: 2880 minutes is the default timeout value provided by Amazon for new AWS Glue Jobs. It is suggested that users tune this parameter to the minimum practical value to avoid jobs running longer than expected. For more information, refer to the Amazon AWS Glue pricing guide. |
Notify Delay Period | Optional | - | Integer | After a job run starts, the number of minutes to wait before sending a job run delay notification. |
Input Arguments | Optional | - | Array | The job arguments specifically for this run. For this Job Run, they replace the default arguments set in the job definition itself. |
Proxy Type | Optional | HTTP | Choice | Type of proxy connection to be used. Available options are: |
Proxy | Optional | - | Text | Comma-separated list of proxy servers. Valid formats: http://proxyip:port or http://proxyip:port,https://proxyip:port. Required when Use Proxy is checked. |
Proxy CA Bundle File | Optional | - | Text | The path to a custom certificate bundle to use when establishing SSL/TLS connections with proxy. Used when Proxy Type is configured for "HTTPS" or "HTTPS With Credentials". |
Proxy Credentials | Optional | - | Credentials | Credentials to be used for the proxy communication. They are comprised of: |
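
To illustrate how the job-related input fields could relate to a Glue StartJobRun request, the sketch below builds the keyword arguments accepted by the `start_job_run` operation in the AWS SDK for Python (boto3). It is an assumption about the mapping, not the extension's actual implementation. The function name `build_start_job_run_kwargs` and its parameter names (borrowed from the field names in the Extension Output) are hypothetical.

```python
def build_start_job_run_kwargs(job_name, job_run_id=None, security_config=None,
                               worker_type=None, num_workers=None,
                               job_timeout=None, notify_delay_period=None,
                               input_arguments=None):
    """Map task input fields to StartJobRun request parameters.

    Optional fields that are left empty are omitted from the request.
    """
    if not job_name:
        raise ValueError("Job Name is required")

    kwargs = {"JobName": job_name}
    if job_run_id:
        kwargs["JobRunId"] = job_run_id            # retry a previous run
    if security_config:
        kwargs["SecurityConfiguration"] = security_config
    if worker_type:
        kwargs["WorkerType"] = worker_type
    if num_workers is not None:
        kwargs["NumberOfWorkers"] = int(num_workers)
    if job_timeout is not None:
        kwargs["Timeout"] = int(job_timeout)       # minutes
    if notify_delay_period is not None:
        kwargs["NotificationProperty"] = {
            "NotifyDelayAfter": int(notify_delay_period)  # minutes
        }
    if input_arguments:
        # Assumption: the Array field is a list of single-pair dicts,
        # which are merged into the flat dict that Glue expects.
        merged = {}
        for item in input_arguments:
            merged.update(item)
        kwargs["Arguments"] = merged
    return kwargs


# With boto3 installed, the result could be passed on like this:
#   boto3.client("glue", region_name="us-east-1").start_job_run(**kwargs)
```

Omitting empty optional fields lets the job definition's own defaults apply, which matches the description of Input Arguments as run-specific overrides.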
Extension Cancelation
When using a 7.0 or newer template, ensure that the “Always Cancel On Force Finish” option is checked. This minimizes the risk of leaving “orphan” processes running on the OS where the agent can no longer see them.
Task Examples
Start Job Run with only required arguments
Start a new Glue Job Run, providing only the required fields.
Start Job Run with all optional input arguments
Start a new Glue Job Run for a given Run ID (retries a previous execution), with all optional input arguments.
Start Job Run with Role ARN and Proxy configuration
Start a new Glue Job Run assuming a provided Role ARN, and also using a Proxy configuration.
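The role and proxy fields used in this example follow two rules from the Input Fields table: Role ARN is required when Role Based Access is True, and Proxy is a comma-separated list of URLs. The sketch below shows one way to validate and normalize those values. It is not the extension's code. The function names and the default session name are hypothetical, and keying the proxy dictionary by each URL's scheme is an assumption.

```python
def parse_proxies(proxy_field):
    """Turn 'http://host:port,https://host:port' into a scheme-keyed dict."""
    proxies = {}
    for entry in proxy_field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        scheme, sep, _ = entry.partition("://")
        if not sep or scheme not in ("http", "https"):
            raise ValueError(f"Unsupported proxy format: {entry!r}")
        proxies[scheme] = entry
    return proxies


def build_assume_role_params(role_based_access, role_arn,
                             session_name="ue-aws-glue"):
    """Return parameters for an STS AssumeRole request, or None.

    RoleArn and RoleSessionName are the two required AssumeRole
    parameters; the default session name is only a placeholder.
    """
    if not role_based_access:
        return None
    if not role_arn:
        raise ValueError("Role ARN is required when Role Based Access is True")
    return {"RoleArn": role_arn, "RoleSessionName": session_name}
```

Validating both fields before any request is sent surfaces configuration mistakes as input errors rather than as connection or authorization failures.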
Exit Codes
The exit codes for the AWS Glue Extension are described below.
Exit Code | Status Classification Code | Status Classification Description | Status Description |
---|---|---|---|
0 | SUCCESS | Successful Execution | SUCCESS: Successful Task execution |
1 | FAIL | Failed Execution | FAIL: < Error Description > |
2 | AUTHENTICATION_ERROR | Bad credentials | AUTHENTICATION_ERROR: Account cannot be authenticated. |
3 | AUTHORIZATION_ERROR | Insufficient Permissions | AUTHORIZATION_ERROR: Account is not authorized to perform the requested action. |
10 | CONNECTION_ERROR | Bad connection data or connection timed out | CONNECTION_ERROR: < Error Description > |
11 | CONNECTION_ERROR | Extension specific connection error | CONNECTION_ERROR: ProxyConnectionError: Failed to connect to proxy URL <url> |
20 | DATA_VALIDATION_ERROR | Input fields validation error | DATA_VALIDATION_ERROR: Some of the input fields cannot be validated. See STDERR for more details |
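
A script that post-processes task results could translate these exit codes into their classification codes. The following sketch mirrors the table above. The `should_retry` policy is an illustrative assumption (treating only connection errors as transient) and is not behavior documented for the extension.

```python
# Exit code -> Status Classification Code, copied from the table above.
EXIT_CODES = {
    0: "SUCCESS",
    1: "FAIL",
    2: "AUTHENTICATION_ERROR",
    3: "AUTHORIZATION_ERROR",
    10: "CONNECTION_ERROR",
    11: "CONNECTION_ERROR",   # extension-specific, e.g. proxy failure
    20: "DATA_VALIDATION_ERROR",
}


def classify(exit_code):
    """Return the classification code, or 'UNKNOWN' for unlisted codes."""
    return EXIT_CODES.get(exit_code, "UNKNOWN")


def should_retry(exit_code):
    """Hypothetical policy: retry only on connection errors."""
    return classify(exit_code) == "CONNECTION_ERROR"


if __name__ == "__main__":
    for code in (0, 3, 11, 42):
        print(code, classify(code), should_retry(code))
```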
Extension Output
The Extension Output for a successful AWS Glue Universal Task execution is described below.
{
  "exit_code": 0,
  "status_description": "SUCCESS: AWS Glue Job started successfully",
  "changed": true,
  "invocation": {
    "extension": "ue-aws-glue",
    "version": "1.0.0",
    "fields": {
      "action": "Start Job Run",
      "aws_credentials_user": "test-user",
      "aws_credentials_password": "****",
      "region": "us-east-1",
      "role_based_access": false,
      "role_arn": null,
      "job_name": "TestJob1",
      "job_run_id": null,
      "security_config": null,
      "worker_type": "G.1X",
      "num_workers": 3,
      "job_timeout": 2880,
      "notify_delay_period": 3,
      "input_arguments": [
        {
          "Stonebranch": "Extension"
        }
      ],
      "use_proxy": true,
      "proxy": "https://proxy.example.com:8080",
      "proxy_type": "HTTPS",
      "proxy_ca_bundle_file": "/tmp/proxy_ca.pem",
      "proxy_credentials_user": null,
      "proxy_credentials_password": null
    }
  },
  "result": {
    "out_job_run_id": "jr_c83819e1ded81e44fc05d8bfbbf9394b9c9edc7693312d0be05d51ab2fd921c7"
  }
}
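Downstream automation may need the Job Run ID from this output, for example to pass it to a later retry through the Job Run ID field. The sketch below reads the documented JSON structure. The function name `extract_job_run_id` is hypothetical, and the way the JSON text is obtained from the task instance is left to the caller.

```python
import json


def extract_job_run_id(extension_output):
    """Return result.out_job_run_id from an Extension Output JSON string.

    Raises RuntimeError when the output reports a non-zero exit code.
    """
    data = json.loads(extension_output)
    if data.get("exit_code") != 0:
        raise RuntimeError(data.get("status_description", "unknown failure"))
    return data["result"]["out_job_run_id"]


if __name__ == "__main__":
    sample = """{
      "exit_code": 0,
      "status_description": "SUCCESS: AWS Glue Job started successfully",
      "result": {"out_job_run_id": "jr_example"}
    }"""
    print(extract_job_run_id(sample))
```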
Document References
This document references the following documents:
Name | Location | Description |
---|---|---|
Universal Templates | https://docs.stonebranch.com/confluence/display/U70/Universal+Templates | User documentation for creating Universal Templates in the Universal Controller user interface. |
Universal Tasks | https://docs.stonebranch.com/confluence/display/UC70/Universal+Tasks | User documentation for creating Universal Tasks in the Universal Controller user interface. |
AWS Glue | https://docs.aws.amazon.com/glue/?id=docs_gateway | Documentation for AWS Glue |
IAM RBAC authorization model | https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html#introduction_attribute-based-access-control_compare-rbac | User Documentation for Comparing ABAC to the traditional RBAC model |