Configuring DataStage

This topic provides the steps to configure the DataStage conversion stage.

Choose the ETL Type as DataStage.
In Input Artifacts, upload the source data via:
- Browse Files: To select the source files from the local system.
- Select From Data Source: To select the source files from the data source. To do so, follow the steps below:
  - Click Select From Data Source.
  - Choose repository.
  - Select data source.
  - Select the entities.
  - Click to save the source data source.

Select the Target Type to which you want to transform the source scripts. The target types are:

Spark
Snowflake
Databricks Notebook
Databricks Lakehouse
Delta Live Tables
AWS Glue Studio
Matillion ETL
AWS Glue Job

Click Data Configuration.

The Input column in the table below provides the input requirements based on the Target Type selection.

Target Type	Input
Spark	To convert the sequence jobs to their Airflow equivalent and generate the corresponding Python artifacts, turn on the Convert Sequence Jobs to Airflow toggle. If you leave this toggle off, the sequence jobs are converted to equivalent Spark jobs. In Output Type, the default output type for the transformation is Python. To perform syntax validation, turn on the Validation toggle. In Source Data Source, select the data source (DDL) that contains the corresponding metadata to ensure accurate query conversion. In Target Data Source, select the target data source on which to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables already exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
Snowflake	To perform syntax validation, turn on the Validation toggle. In Source Data Source, select the data source (DDL) that contains the corresponding metadata to ensure accurate query conversion. In Target Data Source, select the target data source on which to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables already exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
Delta Live Tables
Databricks Notebook	In Output Type, select Python or Jupyter as the output format for the generated artifacts. To convert the DataStage sequence jobs to their Databricks Workflows equivalent and generate the corresponding JSON artifacts, turn on the Convert Sequence Jobs to Databricks Workflows toggle. To perform syntax validation, turn on the Validation toggle. In Source Data Source, select the data source (DDL) that contains the corresponding metadata to ensure accurate query conversion. In Target Data Source, select the target data source on which to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables already exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
Databricks Lakehouse	In Output Type, select Python or DBT as the output format for the generated artifacts. To perform syntax validation, turn on the Validation toggle. In Source Data Source, select the data source (DDL) that contains the corresponding metadata to ensure accurate query conversion. In Target Data Source, select the target data source on which to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables already exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
AWS Glue Studio	In Target Database Details, provide the database name, schema name, and prefix. If a prefix is provided, the table name is displayed in prefix_database_tablename format. In AWS Glue Catalog Database, provide the AWS Glue catalog database connection details to connect to the database and schema. In S3 Bucket Base Path, specify the S3 storage repository path in which to store the files. In UDF File Location and UDF Jar Location, specify the UDF file path and the JAR path, respectively, to define the new UDF location. In Target Connection Name, provide the connection name to add the predefined connection to Glue. To perform syntax validation, turn on the Validation toggle. In Source Data Source, select the data source (DDL) that contains the corresponding metadata to ensure accurate query conversion. In Target Data Source, select the target data source on which to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables already exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
AWS Glue Job	In Data Interaction Technique, select Glue: Redshift. This data interaction method reads input data from Amazon Redshift, processes it in Glue, and stores the processed output data back in Redshift. In Default Database, select the appropriate default database (Teradata, Netezza, Oracle, etc.) for queries whose database type is not defined in the uploaded artifacts. If you select Not Sure, only the queries whose database type is available are converted. In Source Data Source, select the data source (DDL) that contains the corresponding metadata to ensure accurate query conversion.
Matillion ETL	In Output Type, the default output type for the transformation is JSON. To perform syntax validation, turn on the Validation toggle. In Source Data Source, select the data source (DDL) that contains the corresponding metadata to ensure accurate query conversion. In Target Data Source, select the target data source on which to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables already exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
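
The AWS Glue Studio row states that, when a prefix is provided, the table name is shown in prefix_database_tablename format. The following is a minimal Python sketch of that naming rule, for illustration only. The function name `glue_table_name` and the sample values are hypothetical, and the behavior when no prefix is given (joining only the database and table names) is an assumption, since this topic does not describe that case.

```python
def glue_table_name(database: str, table: str, prefix: str = "") -> str:
    """Compose a display name in prefix_database_tablename format.

    Assumption: with an empty prefix, only the database and table names
    are joined. The product's actual behavior in that case may differ.
    """
    parts = [prefix, database, table] if prefix else [database, table]
    return "_".join(parts)


# Hypothetical sample values, chosen only to illustrate the format.
print(glue_table_name("sales", "orders", prefix="stg"))  # stg_sales_orders
```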

Click Save to update the changes.
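
If Validation is turned on, it can help to confirm the two prerequisites from the table (input tables present on the target side, UDFs registered on the target data source) before executing the pipeline. The sketch below shows one way to compare what the transformed queries need against what the target has. It is not part of the product: the function name `missing_prerequisites`, the sample table and UDF names, and the case-insensitive comparison are all assumptions, and the lists of target objects would have to be gathered from your own target system.

```python
def missing_prerequisites(required_tables, required_udfs,
                          target_tables, target_udfs):
    """Return the tables and UDFs that the target side still lacks."""

    def norm(names):
        # Assumption: names are compared case-insensitively.
        return {name.lower() for name in names}

    return {
        "tables": sorted(norm(required_tables) - norm(target_tables)),
        "udfs": sorted(norm(required_udfs) - norm(target_udfs)),
    }


# Hypothetical inventory, for illustration only.
report = missing_prerequisites(
    required_tables=["sales.orders", "sales.customers"],
    required_udfs=["mask_email"],
    target_tables=["SALES.ORDERS"],
    target_udfs=[],
)
print(report)  # {'tables': ['sales.customers'], 'udfs': ['mask_email']}
```

An empty list under both keys suggests that the documented prerequisites are met. Anything listed would need to be created or registered on the target first.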

An alert pop-up message appears. This message prompts you to refer to your respective assessment to determine the anticipated quota deduction required when converting your scripts to the target. Then click Ok.

Click to provide a preferred pipeline name.
Click to execute the pipeline. Clicking (Execute) navigates you to the listing page, which shows the pipeline status as Running. The status changes to Success when the pipeline completes successfully.
Click the pipeline card to see the reports.

To view the DataStage conversion report, visit DataStage Conversion Report.
