AWS EMR

Disclaimer

Your use of this download is governed by Stonebranch’s Terms of Use, which are available at Stonebranch Integration Hub - Terms of Use.

Version Information

| Template Name | Extension Name | Version |
|---|---|---|
| AWS EMR | ue-aws-emr | 1.0.0 |

Refer to Changelog for version history information.

Overview

Amazon EMR (Elastic MapReduce) is a cloud-based service offered by Amazon Web Services (AWS) designed to process and analyze large data sets quickly and cost-effectively. AWS EMR Notebooks are Jupyter-based notebooks integrated with Amazon EMR that provide execution capabilities as well as an interactive environment for data exploration and analysis.

This integration provides the ability to start an AWS EMR notebook execution from UAC and optionally monitor its execution. It also allows Task authors to specify a wide variety of notebook execution configuration options.

Key Features

| Feature | Description |
|---|---|
| Start Notebook Execution | Start an AWS EMR Notebook execution with related parameters. |


Software Requirements

This integration requires a Universal Agent and a Python runtime to execute the Universal Task.

Area

Details

Python Version

Requires Python version 3.7 or 3.11. Tested with the Python distribution bundled with the Universal Agent.

Universal Agent

Both Windows and Linux agents are supported:

  • Universal Agent for Windows x64 Version >= 7.4.0.0

  • Universal Agent for Linux Version >= 7.4.0.0

Universal Controller

Universal Controller Version >= 7.4.0.0

Supported Actions

Action: Start Notebook Execution

This action initiates the execution of an AWS EMR notebook. The task author can configure it either to simply trigger the execution, or to trigger it and wait until it completes successfully or fails. AWS Credentials and AWS Region can either be specified in the Task definition or be set up in the task execution environment. For more details, refer to the 'Input Fields' chapter.
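As an illustration of how the task fields relate to the AWS StartNotebookExecution operation, the following is a minimal sketch that assembles the request from field values. It assumes the request would be sent with the AWS SDK for Python (boto3); the function name `build_start_request` and the merge behaviour for additional configuration are hypothetical and are not taken from the integration's source code.

```python
from typing import Any, Dict, Optional


def build_start_request(
    editor_id: str,
    relative_path: str,
    execution_engine_id: str,
    service_role: str,
    notebook_parameters: Optional[str] = None,
    notebook_execution_name: Optional[str] = None,
    additional_conf: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for the EMR StartNotebookExecution operation.

    Hypothetical helper: the key names follow the AWS API, the rest is an
    assumption made for illustration.
    """
    request: Dict[str, Any] = {
        "EditorId": editor_id,
        "RelativePath": relative_path,
        "ExecutionEngine": {"Id": execution_engine_id, "Type": "EMR"},
        "ServiceRole": service_role,
    }
    # Optional fields are only sent when the task author populated them.
    if notebook_parameters:
        request["NotebookParams"] = notebook_parameters
    if notebook_execution_name:
        request["NotebookExecutionName"] = notebook_execution_name
    # Additional configuration is merged last; ExecutionEngine entries extend
    # the engine definition instead of replacing it (an assumption).
    for key, value in (additional_conf or {}).items():
        if key == "ExecutionEngine":
            request["ExecutionEngine"].update(value)
        else:
            request[key] = value
    return request
```

With boto3 installed, the result could be passed on as `boto3.client("emr").start_notebook_execution(**request)`.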

Scenario A: Start Notebook Execution with required fields only and corresponding task output fields


Scenario B: Start Notebook Execution With Role Based Access (ARN)


Scenario C: Start Notebook Execution with Notebook Parameters (JSON)




Scenario D: Start Notebook Execution Wait For Completion Polling Interval

Scenario E: Start Notebook Execution with environment variables passed to the execution


Scenario F: Start Notebook Execution with Additional Configuration as JSON UAC Script



Action Output

Output Type

Description

Examples

EXTENSION

The extension output provides the following information:

  • exit_code, status, status_description: General info regarding the task execution.

  • invocation.fields: The task configuration used for this task execution.

  • result.notebook_execution_info: Field containing information regarding the notebook execution as retrieved from AWS.

  • result.errors: List of errors that might have occurred during execution.

Detailed information about the fields in the result.notebook_execution_info field can be found on the official AWS documentation page.

Successful Execution with Polling
{
    "exit_code": 0,
    "status_description": "Task executed successfully",
    "invocation": {
        "extension": "ue-aws-emr",
        "version": "1.0.0",
        "fields": {
            "action": "Start Notebook Execution",
            "credentials": {
                "user": <AWS_ACCESS_KEY>,
                "password": "****",
                "token": null,
                "key_location": null,
                "passphrase": null
            },
            "region": <AWS_DEFAULT_REGION>,
            "editor_id": <AWS_EMR_EDITOR_ID>,
            "relative_path": "/path/to/notebook.ipynb",
            "execution_engine_id": <AWS_EMR_EXECUTION_ENGINE_ID>,
            "service_role": <AWS_EMR_SERVICE_ROLE>,
            "wait_for_completion": true,
            "polling_interval": 15,
            "max_polls": 0,
            "notebook_parameters": null,
            "optional_parameters": false,
            "notebook_execution_name": null,
            "role_arn": null,
            "additional_conf_type": null,
            "additional_conf_text": null,
            "additional_conf_script": null
        }
    },
    "result": {
        "notebook_execution_info": {
            "NotebookExecutionId": <AWS_NOTEBOOK_EXECUTION_ID>,
            "EditorId": <AWS_EMR_EDITOR_ID>,
            "ExecutionEngine": {
                "Id": <AWS_EMR_EXECUTION_ENGINE_ID>,
                "Type": "EMR",
                "MasterInstanceSecurityGroupId": <AWS_SECURITY_GROUP_ID>
            },
            "NotebookExecutionName": null,
            "NotebookParams": null,
            "Status": "FINISHED",
            "StartTime": "2024-05-28 15:13:47.107000+03:00",
            "EndTime": "2024-05-28 15:14:09.189000+03:00",
            "Arn": <AWS_NOTEBOOK_EXECUTION_ARN>,
            "OutputNotebookURI": <AWS_OUTPUT_NOTEBOOK_URI>,
            "LastStateChangeReason": "Execution is finished for cluster <AWS_EMR_EXECUTION_ENGINE_ID>.",
            "NotebookInstanceSecurityGroupId": <AWS_SECURITY_GROUP_ID>,
            "Tags": null,
            "OutputNotebookS3Location": {
                "Bucket": <AWS_S3_BUCKET>,
                "Key": <AWS_S3_KEY>
            },
            "OutputNotebookFormat": "null"
        }
    }
}


Failed Execution - Invalid Editor Id
{
    "exit_code": 1,
    "status_description": "Notebook Execution has failed: An error occurred (ValidationException) when calling the StartNotebookExecution operation: editorId 'random_id' is not valid.",
    "invocation": {
        "extension": "ue-aws-emr",
        "version": "1.0.0",
        "fields": {
            "role_arn": null,
            "notebook_execution_name": null,
            "execution_engine_id": <AWS_EMR_EXECUTION_ENGINE_ID>,
            "additional_conf_type": null,
            "additional_conf_text": null,
            "polling_interval": 15,
            "max_polls": 0,
            "service_role": <AWS_EMR_SERVICE_ROLE>,
            "relative_path": "/path/to/notebook.ipynb",
            "notebook_parameters": null,
            "wait_for_completion": false,
            "action": "Start Notebook Execution",
            "optional_parameters": false,
            "region": <AWS_DEFAULT_REGION>,
            "editor_id": "random_id",
            "credentials": {
                "user": <AWS_ACCESS_KEY>,
                "password": "****",
                "key_location": null,
                "passphrase": null,
                "token": null
            }
        }
    },
    "result": {
        "errors": [
            "Notebook Execution has failed: An error occurred (ValidationException) when calling the StartNotebookExecution operation: editorId 'random_id' is not valid."
        ]
    }
}
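The extension output shown above can be consumed by downstream tooling. The following is a minimal sketch, assuming the output is available as a JSON string; the helper name `summarize_extension_output` and the shape of its return value are hypothetical.

```python
import json
from typing import Any, Dict


def summarize_extension_output(raw: str) -> Dict[str, Any]:
    """Extract the most commonly needed values from the extension output.

    Hypothetical helper; it only reads keys that appear in the examples above.
    """
    doc = json.loads(raw)
    result = doc.get("result", {})
    # notebook_execution_info is absent when the execution could not be started.
    info = result.get("notebook_execution_info") or {}
    return {
        "succeeded": doc.get("exit_code") == 0,
        "execution_id": info.get("NotebookExecutionId"),
        "status": info.get("Status"),
        "errors": result.get("errors", []),
    }
```

A failed run yields an empty `execution_id` and a populated `errors` list, so a caller can branch on `succeeded` without parsing STDERR.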

Input Fields

Name

Type

Description

Version Information

Action

Choice

The action performed upon the task execution. Available options:

  • Start Notebook Execution (default)

Default value is “Start Notebook Execution”. Note that “Notebook” is an AWS entity that was later renamed to “Workspace”; the AWS documentation uses both terms.

1.0.0

Region

Text

Region for the Amazon Web Service, e.g. "us-east-1".

When AWS Region is not populated as part of the task definition, the integration looks for the AWS Region in the task execution environment during task execution (AWS configuration file or AWS environment variables).

1.0.0

Credentials

Credentials

The Credentials definition should be as follows.

  • AWS Access Key ID as "Runtime User".

  • AWS Secret Access Key as "Runtime Password".

When AWS Credentials are not populated as part of the task definition, the integration looks for AWS Credentials in the task execution environment during task execution. Refer to AWS Credential Configuration Options for more information.
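The fallback behaviour for Region and Credentials can be sketched as follows. This is an illustration only, assuming the AWS SDK for Python (boto3) is used to create the session; the helper name `build_session_kwargs` is hypothetical.

```python
from typing import Dict, Optional


def build_session_kwargs(
    access_key_id: Optional[str] = None,
    secret_access_key: Optional[str] = None,
    region: Optional[str] = None,
) -> Dict[str, str]:
    """Return only the session arguments that the task definition populated.

    Hypothetical helper. Anything left out is resolved by the SDK from the
    task execution environment (environment variables or AWS config files).
    """
    kwargs: Dict[str, str] = {}
    # Credentials are only meaningful as a pair.
    if access_key_id and secret_access_key:
        kwargs["aws_access_key_id"] = access_key_id
        kwargs["aws_secret_access_key"] = secret_access_key
    if region:
        kwargs["region_name"] = region
    return kwargs
```

With boto3 installed, `boto3.Session(**build_session_kwargs(...))` would then use the task values where given and the environment otherwise.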

1.0.0

Assume Role ARN

Large Text

Role ARN of the AWS Assume Role functionality. The Assume Role functionality in AWS (Amazon Web Services) allows a user or service to take on the permissions of another IAM (Identity and Access Management) role temporarily.

If this field is left empty, role assumption is not performed.
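A minimal sketch of optional role assumption is shown below. It assumes an AWS STS client object such as the one returned by `boto3.client("sts")`; the helper name `assume_role_if_requested` and the session name `uac-emr-task` are hypothetical.

```python
from typing import Any, Dict, Optional


def assume_role_if_requested(
    sts_client: Any,
    role_arn: Optional[str],
    session_name: str = "uac-emr-task",  # hypothetical session name
) -> Optional[Dict[str, str]]:
    """Return temporary credentials for role_arn, or None when no ARN is set."""
    if not role_arn or not role_arn.strip():
        # Empty field: role assumption is skipped.
        return None
    response = sts_client.assume_role(
        RoleArn=role_arn.strip(),
        RoleSessionName=session_name,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary uses the argument names accepted by `boto3.Session`, so the temporary credentials can be used to create the session that talks to EMR.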

1.0.0

Service Role

Text

The name or ARN of the IAM role that is used as the service role for Amazon EMR (the Amazon EMR role) for the notebook execution.

1.0.0

Execution Engine ID

Text

The unique identifier of the execution engine. For an EMR cluster, this is the cluster ID.

1.0.0

Editor ID

Text

The unique identifier of the EMR Notebook to use for notebook execution.

1.0.0

Relative Path

Text

The path and file name of the notebook file for this execution, relative to the path specified for the EMR Notebook.

For example, if you specify a path of "s3://MyBucket/MyNotebooks" when you create an EMR Notebook for a Notebook with an ID of e-ABCDEFGHIJK1234567890ABCD (the Editor ID mentioned above), and you specify a Relative Path of "my_notebook_executions/notebook_execution.ipynb", the location of the file for the notebook execution is "s3://MyBucket/MyNotebooks/e-ABCDEFGHIJK1234567890ABCD/my_notebook_executions/notebook_execution.ipynb".
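The path composition described above can be reproduced with a few lines of Python. This is only an illustration of the documented rule; the helper name `notebook_file_location` is hypothetical, and the trimming of surplus slashes is an assumption.

```python
def notebook_file_location(notebook_base_path: str, editor_id: str, relative_path: str) -> str:
    """Join the EMR Notebook path, the Editor ID and the Relative Path.

    Hypothetical helper; surplus slashes at the joints are dropped (assumption).
    """
    parts = [
        notebook_base_path.rstrip("/"),
        editor_id.strip("/"),
        relative_path.lstrip("/"),
    ]
    return "/".join(parts)
```

Calling it with "s3://MyBucket/MyNotebooks", "e-ABCDEFGHIJK1234567890ABCD" and "my_notebook_executions/notebook_execution.ipynb" gives the location quoted in the example above.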

1.0.0

Notebook Parameters (JSON)

Large Text

Input parameters in JSON format passed to the Amazon EMR Notebook at runtime for execution.

Example Input
{
"number": 30,
"word": "random text",
"random_dict": {"random_key": "text"}
}
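Before a task is launched, the parameter text can be checked locally. The sketch below assumes the parameters must form a JSON object, as in the example input; the helper name `normalize_notebook_parameters` is hypothetical.

```python
import json
from typing import Optional


def normalize_notebook_parameters(text: Optional[str]) -> Optional[str]:
    """Validate the Notebook Parameters text and return it as compact JSON.

    Hypothetical helper. Returns None for an empty field and raises
    ValueError when the text is not a JSON object.
    """
    if text is None or not text.strip():
        return None
    parsed = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(parsed, dict):
        raise ValueError("Notebook Parameters must be a JSON object")
    return json.dumps(parsed)
```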

1.0.0

Wait for Completion

Checkbox

A switch that controls whether the UAC task will wait until the Notebook Execution is completed.

Default value is unchecked.

1.0.0

Polling Interval

Integer

The time (in seconds) between retries for getting the status of Notebook Execution.

Mandatory if Wait for Completion is “true”.

As a best practice, if the Notebook Execution is expected to take a long time to complete, set the Polling Interval to a larger value. A short interval triggers frequent status checks against AWS, which wastes resources for long-running jobs.

Default value is 15.

1.0.0

Max Number of Polls

Integer

Maximum number of polls. Together with the Polling Interval, it can be used to control the approximate maximum duration that the task waits for the Notebook Execution.

If left empty, the UAC Task Instance polls indefinitely, checking whether the Notebook Execution has completed or failed.

If the Maximum Number of Polls is reached, the exit code of the Universal Task Instance is 40.

Available if Wait for Completion is “true”.
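The interplay of Polling Interval and Max Number of Polls can be sketched as a simple loop. This is not the integration's implementation: the function name `wait_for_completion` is hypothetical, the status values are the AWS notebook execution states, and treating STOPPED as a failure as well as treating 0 like an empty Max Number of Polls are assumptions.

```python
import time
from typing import Callable, Optional, Tuple

SUCCESS_STATES = {"FINISHED"}
FAILURE_STATES = {"FAILED", "STOPPED"}  # STOPPED as failure is an assumption


def wait_for_completion(
    get_status: Callable[[], str],
    polling_interval: int = 15,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Poll get_status until a final state or until max_polls is exhausted.

    Returns (exit_code, last_status): 0 on success, 1 on failure and 40 when
    the maximum number of polls is reached. None or 0 means no limit.
    """
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in SUCCESS_STATES:
            return 0, status
        if status in FAILURE_STATES:
            return 1, status
        if max_polls and polls >= max_polls:
            return 40, status
        sleep(polling_interval)
```

With a Polling Interval of 15 and a Max Number of Polls of 40, such a loop gives up after roughly 10 minutes, which shows how the two fields together bound the waiting time.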

1.0.0

Additional Optional Parameters

Checkbox

A switch that controls if any additional optional parameters and configurations should be passed to the notebook execution.

Default value is unchecked.

1.0.0

Notebook Execution Name

Text

An optional name for the notebook execution.

Available if Additional Optional Parameters is “true”.

1.0.0

Provide Additional Configuration As

Choice

Specifies how additional configuration options for EMR Notebook execution can be defined on the UAC Task.

Available options are:

  • -- None --

  • As JSON Text (default)

  • As JSON UAC Script

Available if Additional Optional Parameters is “true”.

1.0.0

Additional Configuration (Text)

Large Text

Additional configuration options for the notebook execution.

Default Value
{
    "NotebookInstanceSecurityGroupId":"string",
    "ExecutionEngine": {
        "MasterInstanceSecurityGroupId": "string",
        "ExecutionRoleArn": "string"
    },
    "Tags":[
        {
            "Key": "string",
            "Value": "string"
        }
    ],
    "NotebookS3Location":{
        "Bucket": "string",
        "Key": "string"
    },
    "OutputNotebookS3Location":{
        "Bucket": "string",
        "Key": "string"
    },
    "EnvironmentVariables":{
        "string": "string"
    }
}

The default value is a placeholder intended to show which fields are available. It is not meant to be used as-is for an execution of the integration.

Task authors can use the provided default value as a template.

  • Root JSON elements (for example NotebookInstanceSecurityGroupId) are optional. If a JSON element is not required, it should be removed.

  • "string" is a placeholder and needs to be replaced with the real value. The value must be a valid JSON string, which means that, depending on the value, some characters need to be escaped according to the JSON grammar.

  • Information about the functionality of each field can be found on the Official AWS Guide.

Field is available if Provide Additional Configuration As is “As JSON Text”.
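Since the template must not be submitted with placeholders left in, a pre-check like the following could be used while authoring a task. It is a minimal sketch; the helper name `find_placeholders` is hypothetical, and a real value that happens to be exactly "string" would be reported as well.

```python
from typing import Any, List


def find_placeholders(node: Any, path: str = "") -> List[str]:
    """Return the paths of all elements that still hold the "string" placeholder."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "string":
                # Placeholder used as a key, e.g. inside EnvironmentVariables.
                found.append(child)
            else:
                found.extend(find_placeholders(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(find_placeholders(item, f"{path}[{index}]"))
    elif node == "string":
        found.append(path)
    return found
```

An empty result means every remaining element was given a real value; any reported path should either be filled in or removed from the JSON.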

1.0.0

Additional Configuration (Script)

Script

Provides the same functionality as the Additional Configuration (Text) option, with the added benefit that the configuration can be reused by saving it as a UAC script.

Available if Provide Additional Configuration As is “As JSON UAC Script”.

1.0.0

Output Fields

| Field | Type | Description | Introduced in Version |
|---|---|---|---|
| Notebook Execution ID | Text | The Notebook Execution ID. | 1.0.0 |
| Notebook Execution Status | Text | The latest retrieved status of the notebook execution. | 1.0.0 |

Cancellation and Rerun

On cancellation, the UAC Task Instance is stopped on the Universal Controller. Notebook executions that have already been started keep running on AWS; they can be stopped through the AWS Console.
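Besides the AWS Console, the AWS API also offers a StopNotebookExecution operation. The following minimal sketch is not part of the integration; it assumes an EMR client such as the one returned by `boto3.client("emr")`, and the helper name `stop_notebook_executions` is hypothetical.

```python
from typing import Any, Iterable, List


def stop_notebook_executions(emr_client: Any, execution_ids: Iterable[str]) -> List[str]:
    """Request a stop for each notebook execution ID and return the IDs handled.

    Hypothetical helper; IDs would typically come from the task's
    Notebook Execution ID output field.
    """
    stopped: List[str] = []
    for execution_id in execution_ids:
        emr_client.stop_notebook_execution(NotebookExecutionId=execution_id)
        stopped.append(execution_id)
    return stopped
```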

Exit Codes

| Exit Code | Status | Status Description | Meaning |
|---|---|---|---|
| 0 | Success | “SUCCESS: Task executed successfully.” | Successful Execution |
| 1 | Failure | “Execution Failed: <<Error Description>>” | Generic Error or notebook execution has reached a failed state |
| 2 | Failure | “Authentication Error: Account cannot be authenticated.” | Bad credentials |
| 3 | Failure | “Authorization Error: Account is not authorized to perform the requested action.” | Insufficient permissions |
| 10 | Failure | “Connection Error: <<Error Description>>” | Reserved for Connection Failure (generic connection errors). |
| 11 | Failure | “Connection Error: <<Error Description>>” | Reserved exit codes for connection errors that are specific to the integrated service/library. |
| 20 | Failure | “Data Validation Error: <<Error Description>>” | Input fields validation error. |
| 40 | Failure | “Polling Timeout: maximum poling timeout reached.” | Maximum number of polls has been reached. |
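For scripts that react to the outcome of a task, the exit codes listed above can be captured in an enumeration. This is a convenience sketch only; the member names and the helper `describe_exit_code` are hypothetical, while the numeric values are the ones documented above.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes of the integration; member names are illustrative."""

    SUCCESS = 0
    FAILURE = 1
    AUTHENTICATION_ERROR = 2
    AUTHORIZATION_ERROR = 3
    CONNECTION_ERROR = 10
    SERVICE_CONNECTION_ERROR = 11
    DATA_VALIDATION_ERROR = 20
    POLLING_TIMEOUT = 40


def describe_exit_code(code: int) -> str:
    """Return the symbolic name of a documented exit code, else "UNKNOWN"."""
    try:
        return ExitCode(code).name
    except ValueError:
        return "UNKNOWN"
```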

STDOUT and STDERR

STDOUT of this integration is empty. STDERR provides additional information to the user; its level of detail is controlled by the Log Level Task Definition field.

Backward compatibility is not guaranteed for the content of STDOUT/STDERR, which can change in future versions without notice.

How To

Import Universal Template

To use the Universal Template, you must first perform the following steps.

  1. This Universal Task requires the Resolvable Credentials feature. Check that the Resolvable Credentials Permitted system property has been set to true.

  2. To import the Universal Template into your Controller, follow these instructions.

  3. When the files have been imported successfully, refresh the Universal Templates list; the Universal Template will appear on the list.

Modifications of this integration, applied by users or customers, before or after import, might affect the supportability of this integration. For more information refer to Integration Modifications.

Configure Universal Task

For a new Universal Task, create a new task, and enter the required input fields.

Integration Modifications

Modifications applied by users or customers, before or after import, might affect the supportability of this integration. To retain the support level that applies to this integration, the following modifications are discouraged.

  • The Python code should not be modified.

  • Template Modifications

    • General Section

      • "Name", "Extension", "Variable Prefix", "Icon" should not be changed.

    • Universal Template Details Section

      • "Template Type", "Agent Type", "Send Extension Variables", "Always Cancel on Force Finish" should not be changed.

    • Result Processing Defaults Section

      • Success and Failure Exit codes should not be changed.

      • Success and Failure Output processing should not be changed.

    • Fields Restriction Section
      The setup of the template does not impose any restrictions. However, with respect to the "Exit Code Processing Fields" section:

      1. Success/Failure exit codes need to be respected.

      2. In principle, as STDERR and STDOUT outputs can change in follow-up releases of this integration, they should not be considered as a reliable source for determining success or failure of a task.

Users and customers are encouraged to report defects or feature requests to the Stonebranch Support Desk.

Document References

This document references the following documents:

| Document Link | Description |
|---|---|
| Universal Templates | User documentation for creating, working with, and understanding Universal Templates and Integrations. |
| Universal Tasks | User documentation for creating Universal Tasks in the Universal Controller user interface. |

Changelog

ue-aws-emr-1.0.0 (2024-06-06) 

Initial Version