Disclaimer

Your use of this download is governed by Stonebranch’s Terms of Use, which are available at https://www.stonebranch.com/integration-hub/Terms-and-Privacy/Terms-of-Use/

Overview

AWS Glue is a serverless data-preparation service for extract, transform, and load (ETL) operations. It makes it easy for data engineers, data analysts, data scientists, and ETL developers to extract, clean, enrich, normalize, and load data.

This Universal Extension provides the capability to submit a new AWS Glue Job.

Software Requirements

This integration requires a Universal Agent and a Python runtime to execute the Universal Task.

Software Requirements for Universal Template and Universal Task

Requires Python 3.7.0 or higher. Tested with the Universal Agent bundled Python distribution.

Software Requirements for Universal Agent

Both Windows and Linux agents are supported:

  • Universal Agent for Windows x64 Version 7.0.0.0 and later, with Python options installed.
  • Universal Agent for Linux Version 7.0.0.0 and later, with Python options installed.

Software Requirements for Universal Controller

Universal Controller Version 7.0.0.0 and later.

Network and Connectivity Requirements

The Universal Agent host that runs the extension must be able to reach the AWS Glue REST endpoints. The AWS Credentials provided in the AWS Glue Universal Task must have sufficient permissions on AWS to invoke Glue Jobs.
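
As a rough pre-flight check, the sketch below tests whether the agent host can open a TCP connection to the regional Glue endpoint. It assumes the host name pattern glue.<region>.amazonaws.com used by standard AWS regions. The function names and the timeout are illustrative, and the check neither goes through a proxy nor verifies credentials or permissions.

```python
import socket


def glue_endpoint_host(region: str) -> str:
    """Return the regional AWS Glue endpoint host for a standard AWS region."""
    region = region.strip()
    if not region:
        raise ValueError("region must not be empty")
    return f"glue.{region}.amazonaws.com"


def can_reach_glue(region: str, timeout_seconds: float = 5.0) -> bool:
    """Try a plain TCP connection to port 443 of the Glue endpoint.

    Returns False on DNS failures, refused connections, and timeouts.
    """
    try:
        with socket.create_connection(
            (glue_endpoint_host(region), 443), timeout=timeout_seconds
        ):
            return True
    except OSError:
        return False
```

A failed check only indicates a network-level problem from the agent host; a successful one does not guarantee that the Glue API call itself will be authorized.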

Key Features

This Universal Extension provides the following key features:

  • Start a Glue job.
  • Support authorization via IAM Role-Based Access Control (RBAC) strategy.
  • Support Proxy communication via HTTP/HTTPS protocol.

Import Universal Template

To use the Universal Template, you first must perform the following steps:

  1. This Universal Task requires the Resolvable Credentials feature. Check that the Resolvable Credentials Permitted system property has been set to true.
  2. Download the provided ZIP file.
  3. In the Universal Controller UI, select Administration > Configuration > Universal Templates to display the current list of Universal Templates.
  4. Click Import Template.
  5. Select the template ZIP file and Import.

When the template has been imported successfully, the Universal Template appears on the list. Refresh your Navigation Tree to see these tasks in the Automation Center Menu.

Configure Universal Task

For the new Universal Task type AWS Glue, create a new task, and enter the task-specific details that were created in the Universal Template.

Input Fields

The input fields for this Universal Extension are described in the following table.

Field | Input type | Default value | Type | Description
Action | Required | Start Job Run | Choice | Action performed upon the task execution. Available action: Start Job Run.
AWS Region | Required | - | Text | Region for the Amazon Web Service. For more information, see the AWS documentation on service endpoints and quotas.
AWS Credentials | Required | - | Credentials | AWS account credentials, comprising the AWS access key ID and the AWS secret access key.
Role Based Access | Optional | False | Boolean | Enables authorization through role assumption: the client sends its own credentials together with the role that it wants to assume from another user. If allowed, the client receives temporary credentials with time-limited access to some resources.
Role ARN | Optional | - | Text | ARN of the Amazon role that is applied for the connection. Example Role ARN: arn:aws:iam::119322085622:role. Required when Role Based Access = "True".
Job Name | Required | - | Text | Name of the Glue job that will be invoked.
Job Run ID | Optional | - | Text | ID of a previous Job Run to retry.
Security Configuration | Optional | - | Text | Name of the Security Configuration structure to be used with the Job Run.
Worker Type | Optional | None | Choice | Type of predefined worker that is allocated when a job runs. Available options: Standard, G.1X, G.2X.
Number Of Workers | Optional | - | Integer | Number of workers of the defined Worker Type that are allocated when a job is executed. The maximum number of workers that can be defined is 299 for G.1X and 149 for G.2X. Required when Worker Type is not None.
Job Timeout | Optional | 2880 | Integer | Job Run timeout in minutes. Note: 2880 minutes is the default timeout value provided by Amazon for new AWS Glue Jobs. Users are advised to tune this parameter to the minimum suitable value so that jobs do not run longer than expected. For more information, refer to the Amazon AWS Glue pricing guide.
Notify Delay Period | Optional | - | Integer | After a job run starts, the number of minutes to wait before sending a job run delay notification.
Input Arguments | Optional | - | Array | Job arguments specifically for this run. For this Job Run, they replace the default arguments set in the job definition itself.
Proxy Type | Optional | HTTP | Choice | Type of proxy connection to be used. Available options: HTTP, HTTPS, HTTPS With Credentials. Visible only when Use Proxy = "True".
Proxy | Optional | - | Text | Comma-separated list of proxy servers. Valid formats: http://proxyip:port or http://proxyip:port,https://proxyip:port. Required when Use Proxy is checked.
Proxy CA Bundle File | Optional | - | Text | Path to a custom certificate bundle to use when establishing SSL/TLS connections with the proxy. Used when Proxy Type is configured for "HTTPS" or "HTTPS With Credentials".
Proxy Credentials | Optional | - | Credentials | Credentials to be used for the proxy communication, comprising a username and a password. Required when Proxy Type is configured for "HTTPS" or "HTTPS With Credentials".
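
To illustrate how these fields could relate to the underlying Glue StartJobRun request, the following sketch validates a few of the documented constraints and builds the request parameters. The request keys (JobName, JobRunId, SecurityConfiguration, WorkerType, NumberOfWorkers, Timeout, NotificationProperty, Arguments) are those of the Glue StartJobRun API. The Python function and argument names are illustrative assumptions and are not taken from the extension's source.

```python
from typing import Any, Dict, Optional

# Per-worker-type limits as stated in the Number Of Workers description.
MAX_WORKERS = {"G.1X": 299, "G.2X": 149}


def build_start_job_run_params(
    job_name: str,
    job_run_id: Optional[str] = None,
    security_config: Optional[str] = None,
    worker_type: Optional[str] = None,
    num_workers: Optional[int] = None,
    job_timeout: Optional[int] = 2880,
    notify_delay_period: Optional[int] = None,
    arguments: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Validate task fields and map them to StartJobRun request parameters."""
    if not job_name:
        raise ValueError("Job Name is required")
    params: Dict[str, Any] = {"JobName": job_name}
    if job_run_id:
        params["JobRunId"] = job_run_id
    if security_config:
        params["SecurityConfiguration"] = security_config
    if worker_type and worker_type != "None":
        if num_workers is None:
            raise ValueError("Number Of Workers is required when Worker Type is set")
        limit = MAX_WORKERS.get(worker_type)
        if limit is not None and num_workers > limit:
            raise ValueError(f"{worker_type} allows at most {limit} workers")
        params["WorkerType"] = worker_type
        params["NumberOfWorkers"] = num_workers
    if job_timeout is not None:
        params["Timeout"] = job_timeout  # minutes
    if notify_delay_period is not None:
        params["NotificationProperty"] = {"NotifyDelayAfter": notify_delay_period}
    if arguments:
        params["Arguments"] = dict(arguments)
    return params
```

Optional fields that are left empty are omitted from the request, so the defaults from the job definition apply.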

Extension Cancellation

When using a 7.0 or newer template, ensure that the “Always Cancel On Force Finish” option is checked. This minimizes the risk of leaving “orphan” processes running on the OS that the agent can no longer see.


Task Examples

Start Job Run with only required arguments

Start a new Glue Job Run, providing only the required Job Name field.
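
For orientation, a minimal sketch of the equivalent API call is shown here. It assumes a client object that behaves like boto3.client("glue"), passed in so that credentials, region, and proxy settings are configured elsewhere. The helper name start_job_run is illustrative.

```python
from typing import Any


def start_job_run(glue_client: Any, job_name: str) -> str:
    """Start a Glue job run by name and return the new Job Run ID.

    glue_client is expected to expose start_job_run(JobName=...) and to
    return a mapping that contains "JobRunId", as the boto3 Glue client does.
    """
    response = glue_client.start_job_run(JobName=job_name)
    return response["JobRunId"]
```

With boto3 installed, a call might look like start_job_run(boto3.client("glue", region_name="us-east-1"), "TestJob1").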

Start Job Run with all optional input arguments

Start a new Glue Job Run for a given Job Run ID (retrying a previous execution), with all optional input arguments.
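
Glue expects the arguments of a run as a single mapping of names to string values. The sketch below flattens an Input Arguments array into such a mapping. It assumes that the array holds name/value objects, as in the input_arguments entry of the Extension Output example. The helper name is hypothetical.

```python
from typing import Dict, Iterable, Mapping


def merge_input_arguments(pairs: Iterable[Mapping[str, object]]) -> Dict[str, str]:
    """Flatten a list of name/value objects into one arguments mapping.

    Later entries win when the same name appears twice. Values are coerced
    to strings because Glue job arguments are string-to-string pairs.
    """
    merged: Dict[str, str] = {}
    for pair in pairs:
        for name, value in pair.items():
            merged[str(name)] = str(value)
    return merged
```

The resulting mapping could then be passed as the Arguments parameter of the StartJobRun request, alongside JobRunId when retrying a previous run.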

Start Job Run with Role ARN and Proxy configuration

Start a new Glue Job Run that assumes the provided Role ARN and uses a Proxy configuration.
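
Role assumption can be sketched as follows, assuming a client object that behaves like boto3.client("sts"). The helper name and the default session name are hypothetical. The returned keys match the keyword arguments that boto3 accepts when creating a client with explicit credentials.

```python
from typing import Any, Dict


def assume_role_credentials(
    sts_client: Any,
    role_arn: str,
    session_name: str = "ue-aws-glue",  # hypothetical session name
) -> Dict[str, str]:
    """Exchange the caller's credentials for temporary ones tied to role_arn.

    sts_client is expected to expose assume_role(RoleArn=..., RoleSessionName=...)
    and to return a mapping with a "Credentials" entry, as AWS STS does.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
```

The temporary credentials expire after a limited time, so a long-lived process would need to request new ones rather than cache them indefinitely.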

Exit Codes

The exit codes for the AWS Glue Extension are described below.

Exit Code | Status Classification Code | Status Classification Description | Status Description
0 | SUCCESS | Successful Execution | SUCCESS: Successful Task execution.
1 | FAIL | Failed Execution | FAIL: < Error Description >.
2 | AUTHENTICATION_ERROR | Bad credentials | AUTHENTICATION_ERROR: Account cannot be authenticated.
3 | AUTHORIZATION_ERROR | Insufficient Permissions | AUTHORIZATION_ERROR: Account is not authorized to perform the requested action.
10 | CONNECTION_ERROR | Bad connection data or connection timed out | CONNECTION_ERROR: < Error Description >.
11 | CONNECTION_ERROR | Extension specific connection error | CONNECTION_ERROR: ProxyConnectionError: Failed to connect to proxy URL <url>.
20 | DATA_VALIDATION_ERROR | Input fields validation error | DATA_VALIDATION_ERROR: Some of the input fields cannot be validated. See STDERR for more details.
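
A script that post-processes task results might map these exit codes to their classification codes as in the sketch below. The mapping mirrors the table. The helper names and the "UNKNOWN" fallback are illustrative assumptions, not part of the extension.

```python
# Exit code -> status classification code, as listed in the exit code table.
STATUS_BY_EXIT_CODE = {
    0: "SUCCESS",
    1: "FAIL",
    2: "AUTHENTICATION_ERROR",
    3: "AUTHORIZATION_ERROR",
    10: "CONNECTION_ERROR",
    11: "CONNECTION_ERROR",  # extension-specific, for example a proxy failure
    20: "DATA_VALIDATION_ERROR",
}


def classify_exit_code(exit_code: int) -> str:
    """Return the classification code, or "UNKNOWN" for undocumented codes."""
    return STATUS_BY_EXIT_CODE.get(exit_code, "UNKNOWN")


def is_connection_error(exit_code: int) -> bool:
    """True for both generic and extension-specific connection errors."""
    return classify_exit_code(exit_code) == "CONNECTION_ERROR"
```

Grouping codes 10 and 11 this way lets a caller treat connectivity problems separately from credential or input errors.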

Extension Output

The Extension Output for a successful execution of the AWS Glue Universal Task is described below.

{
    "exit_code": 0,
    "status_description": "SUCCESS: AWS Glue Job started successfully",
    "changed": true,
    "invocation": {
        "extension": "ue-aws-glue",
        "version": "1.0.0",
        "fields": {
            "action": "Start Job Run",
            "aws_credentials_user": "test-user",
            "aws_credentials_password": "****",
            "region": "us-east-1",
            "role_based_access": false,
            "role_arn": null,
            "job_name": "TestJob1",
            "job_run_id": null,
            "security_config": null,
            "worker_type": "G.1X",
            "num_workers": 3,
            "job_timeout": 2880,
            "notify_delay_period": 3,
            "input_arguments": [
                {
                    "Stonebranch": "Extension"
                }
            ],
            "use_proxy": true,
            "proxy": "https://proxy.example.com:8080",
            "proxy_type": "HTTPS",
            "proxy_ca_bundle_file": "/tmp/proxy_ca.pem",
            "proxy_credentials_user": null,
            "proxy_credentials_password": null
        }
    },
    "result": {
        "out_job_run_id": "jr_c83819e1ded81e44fc05d8bfbbf9394b9c9edc7693312d0be05d51ab2fd921c7"
    }
}
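
A downstream step might read the Job Run ID from this output. The sketch below assumes that the Extension Output is available as a JSON string with the structure shown above. The helper name is hypothetical.

```python
import json
from typing import Optional


def extract_job_run_id(extension_output: str) -> Optional[str]:
    """Return result.out_job_run_id from a successful Extension Output document.

    Returns None when the exit code is non-zero or the field is missing.
    """
    document = json.loads(extension_output)
    if document.get("exit_code") != 0:
        return None
    return (document.get("result") or {}).get("out_job_run_id")
```

Returning None instead of raising keeps the caller's handling of failed runs explicit; malformed JSON still raises json.JSONDecodeError.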

Document References

This document references the following documents:

Name | Location | Description
Universal Templates | https://docs.stonebranch.com/confluence/display/UC72x/Universal+Templates | User documentation for creating Universal Templates in the Universal Controller user interface.
Universal Tasks | https://docs.stonebranch.com/confluence/display/UC72x/Universal+Tasks | User documentation for creating Universal Tasks in the Universal Controller user interface.
AWS Glue | https://docs.aws.amazon.com/glue/?id=docs_gateway | Documentation for AWS Glue.
IAM RBAC authorization model | https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html#introduction_attribute-based-access-control_compare-rbac | User documentation comparing ABAC to the traditional RBAC model.