Skip to the content
LeaplogicLeaplogic
  • Home
  • About Us
  • Contact
SIGN IN
  • Home
  • About Us
  • Contact

  • Getting Started
    • Before You Begin
    • Creating an Account
    • Logging into LeapLogic
    • Reset Password
    • Quick Tour of the Web Interface
    • LeapLogic in 15 minutes
      • Prerequisites
      • Step 1. Log into LeapLogic
      • Step 2. Create Assessment and Get Insights
      • Step 3. Create Transformation Pipeline and See Results
      • Step 4. Edit or Optimize the Transformed Code
      • Step 5: Complete the Transformation Lifecycle
  • Introduction to LeapLogic
    • Overview
    • High Level Architecture
    • Supported Legacy and Cloud Platforms
    • Key Features
  • LeapLogic Deployment Architecture & Prerequisites
  • Workload Assessment
    • Overview
    • Value Proposition
    • Creating Assessment
      • Prerequisites
      • Step 1. Provide Primary Inputs
        • Automation Coverage
      • Step 2. Add the Additional Inputs
        • Table Stat Extraction Steps
          • Teradata
          • Oracle
          • Netezza
      • Step 3. Update the Source Configuration
      • Step 4. Configure the Recommendation Settings
    • Assessment Listing
    • Understanding Insights and Recommendations
      • Volumetric Info
      • EDW
        • Oracle
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Vertica
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Snowflake
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Azure Synapse
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • SQL Server
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Teradata
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Netezza
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Google Big Query
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Redshift
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • PostgreSQL
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Duck DB
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • ClickHouse
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Exasol
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • DB2
          • Highlights
          • Analysis
          • Optimization
          • Recommendations
          • Lineage
          • Downloadable Reports
      • ETL
        • Informatica
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Ab Initio
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • DataStage
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Talend
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • SSIS
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Informatica BDM
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Oracle Data Integrator
          • Highlights
          • Analysis
          • Downloadable Reports
        • Pentaho
          • Highlights
          • Analysis
          • Downloadable Reports
        • Azure Data Factory
          • ARM Template
          • Highlights
          • Analysis
          • Downloadable Reports
        • Matillion
          • Highlights
          • Analysis
          • Downloadable Reports
        • SnapLogic
          • Highlights
          • Analysis
          • Downloadable Reports
      • Orchestration
        • AutoSys
          • Highlights
          • Analysis
          • Downloadable Reports
        • Control-M
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • SQL Server
          • Highlights
          • Analysis
      • BI
        • OBIEE
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Tableau
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • IBM Cognos
          • Highlights
          • Analysis
          • Downloadable Reports
        • MicroStrategy
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Power BI
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • SSRS
          • Highlights
          • Analysis
          • Downloadable Reports
        • SAP BO
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • WebFOCUS
          • Highlights
          • Analysis
          • Downloadable Reports
      • Analytics
        • SAS
          • Highlight
          • Analysis
          • Lineage
          • Downloadable Reports
        • Alteryx
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
      • Shell
        • Highlights
        • Analysis
        • Lineage
        • Downloadable Reports
      • Integrated Assessment (EDW, ETL, Orchestration, BI)
        • Highlights
        • Analysis
        • Optimization
        • Lineage
        • Recommendations
    • Managing Assessment Reports
      • Downloading Report
      • Input Report Utility
      • View Configuration
    • Lineage - Dependency Analysis
    • Column-Level Lineage
      • Manage Lineage
    • Complexity Calculation Logic
    • Key Benefits
    • Ad hoc Query
  • Script/ Query Log/ Code Extraction Prerequisites
    • Cloud
      • Azure Data Factory
      • Snowflake
      • Azure Synapse
      • Google BigQuery
      • Redshift
      • Azure SQL Database Hyperscale
      • Aurora PostgreSQL
    • EDW
      • Oracle
      • Netezza
      • Teradata
      • Vertica
      • SQL Server
      • Db2
      • MySQL
      • PostgreSQL
    • ETL
      • DataStage
      • Informatica
      • SSIS
      • Talend
      • ODI
      • IICS
      • DBT
      • Pentaho
      • Matillion
      • SnapLogic
      • Ab Initio
      • SAP BODS
      • TAC
      • WebFOCUS
    • BI
      • IBM Cognos
      • OBIEE
      • Tableau
      • Metabase
      • MicroStrategy
      • PowerBI
      • LeapLogic Utility for SAP BO
      • SAP BO Universe and Web Intelligence
      • SSRS
    • Analytics
      • SAS
      • Alteryx
    • Orchestration
      • AutoSys
      • Control-M
      • SQL Server
    • Mainframe
  • Metadata Management
    • Overview
    • Introduction to Data Catalog
      • Managing Data Catalog
        • Building Data Catalog
        • Insights to Data Catalog
        • Managing the Repository and Data Source
      • Creating Repository (Repo)
      • Creating Data Source
    • Tag Management
    • Key benefits
  • Batch Processing using Pipeline
    • Introduction
    • Designing Pipeline
      • How to create a pipeline
        • Configuring Migration Stage
          • Schema Optimization
        • Configuring Transformation Stage
          • On-premises to Cloud
          • Cloud-to-Cloud
            • Assigning Roles
            • Intelligent Modernization
          • LeapFusion
        • Configuring Validation Stage
          • Data Validation
            • Table
            • File
            • File and Table
            • Cell-by-cell validation
          • Query Validation
            • Query Validation (When Data is Available)
            • Query Validation (When Data is Not Available)
          • Schema Validation
        • Configuring Execution Stage
        • Configuring ETL Conversion Stage
          • Ab Initio
          • Informatica
          • Informatica BDM
          • Matillion
          • DataStage
          • SSIS
          • IICS
          • Talend
          • Oracle Data Integrator
          • Pentaho
          • SnapLogic
        • Configuring Mainframe Conversion Stage
          • Cobol
          • JCL
        • Configuring Orchestration Stage
          • AutoSys
          • Control-M
        • Configuring BI Conversion Stage
          • OBIEE to Power BI
          • OBIEE to AWS QuickSight
          • Tableau to Amazon QuickSight
          • Tableau to Power BI
          • Tableau to Superset
          • Tableau to Looker
          • IBM Cognos to Power BI
        • Configuring Analytics Conversion Stage
          • SAS
          • Alteryx
        • Configuring Script Conversion Stage
    • Key Features
      • How to schedule a pipeline
      • Configuring Parameters
  • Pipeline Reports
    • Overview of Pipeline Report
    • Pipeline Listing
    • Reports and Insights
      • Migration
      • Transformation
        • On-premises to Cloud
        • Cloud-to-Cloud
      • Validation
        • Data
          • File
          • Table
          • File and Table
        • Query
          • Query Validation Report (When Data is Available)
          • Query Validation Report (When Data is not Available)
        • Schema
      • Execution
      • ETL
        • Ab Initio
        • Informatica
        • Informatica BDM
        • Matillion
        • DataStage
        • SSIS
        • IICS
        • Talend
        • Oracle Data Integrator
        • Pentaho
        • SnapLogic
      • Mainframe
        • Cobol
        • JCL
      • Orchestration
        • AutoSys
        • Control-M
      • BI
        • OBIEE to Power BI
        • OBIEE to Amazon QuickSight
        • Tableau to Amazon QuickSight
        • Tableau to Power BI
        • Tableau to Superset
        • Tableau to Looker
        • IBM Cognos to Power BI
      • Analytics
        • SAS
        • Alteryx
      • Shell Script
      • Common Model
    • Automation Level Indicator
      • ETL
        • Informatica
        • Matillion
        • DataStage
        • Informatica BDM
        • SnapLogic
        • IICS
        • Ab Initio
        • SSIS
        • Talend
        • Pentaho
      • Orchestration
        • AutoSys
        • Control-M
      • EDW
      • Analytics
        • SAS
        • Alteryx
      • BI
      • Shell Script
    • Error Specifications & Troubleshooting
  • SQL Transformation
    • Overview
    • Creating and Executing the Online Notebook
      • How to Create and Execute the Notebook
      • Supported Features
    • Configuring the Notebook
      • Transformation
      • Unit Level Validation
      • Script Level Validation
    • Notebook Listing
  • Operationalization
    • Overview
      • Basic
      • Advanced
      • Cron Expression
    • Parallel Run Pipeline Listing
  • Transformation Source
    • Introduction
    • Creating Transformation Source Type
  • Governance
    • Summary of Governance - Roles and Permissions
    • User Creation
      • Creating a new User Account
    • Adding Roles and permissions
      • How to add Roles and Permissions to a new user?
    • Adding Group Accounts
    • Manage Quota
    • Product Usage Metrics
  • License
    • Workload Assessment
    • EDW Conversion
    • ETL Conversion
    • BI Conversion
    • Orchestration Conversion
    • Data Migration
    • Data Validation
  • LeapLogic Desktop Version
    • Overview
    • Registration and Installation
    • Getting Started
    • Creating Assessment
      • ETL
      • DML
      • Procedure
      • Analytics
      • Hadoop
    • Reports and Insights
      • Downloadable Reports
      • Reports for Estimation
    • Logging and Troubleshooting
    • Sample Scripts
    • Desktop vs. Web Version
    • Getting Help
  • LeapLogic (Version 5.0) Deployment
    • System Requirements
    • Prerequisites
    • Deployment
      • Extracting Package
      • Placing License Key
      • Executing Deployment Script
      • Accessing LeapLogic
    • Uploading License
    • Appendix
    • Getting Help
  • Removed Features
    • Configuring File Validation Stage
    • Variable Extractor Stage
      • Variable Extractor Report
    • Configuring Meta Diff Stage
      • Meta Diff
    • Configuring Data Load Stage
      • Data Load
    • Configuring Multi Algo Stage
  • FAQs
  • Tutorial Videos
  • Notice
Home   »  Batch Processing using Pipeline   »  Designing Pipeline   »  How to create a pipeline   »  Configuring ETL Conversion Stage  »  Configuring DataStage

Configuring DataStage

This topic provides steps to configure DataStage conversion stage.

  1. Choose the ETL Type as DataStage.
  2. In Input Artifacts, upload the source data via:
    • Browse Files: To select the source files from the local system.
    • Select From Data Source: To select the source files from the data source. To do so, follow the steps below:
      • Click Select From Data Source.
      • Choose repository.
      • Select data source.
      • Select the entities.
      • Click to save the source data source
  1. Select the Target Type to which you need to transform the source scripts. The Target types are:
    1. Spark
    2. Snowflake
    3. Databricks Notebook
    4. Databricks Lakehouse
    5. Delta Live Tables
    6. AWS Glue studio
    7. Matillion ETL
    8. AWS Glue Job
  1. Click Data Configuration.

    The Input column in the table below provides the input requirements based on the Target Type selection.

    Target Type Input
    Spark
    • To convert the sequence jobs to Airflow equivalent and generate the corresponding Python artifacts turn on the Convert Sequence Jobs to Airflow toggle. If you did not activate this toggle, the sequence jobs will convert to Spark equivalent jobs.
    • In Output Type, the default output type for the transformation is set to Python.
    • To perform syntax validation, turn on the Validation toggle.
      • In Source Data Source, select the data source (DDL) which contains corresponding metadata to ensure accurate query conversion.
      • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
     
    Snowflake
    • To perform syntax validation, turn on the Validation toggle.
    • In Source Data Source, select the data source (DDL) which contains corresponding metadata to ensure accurate query conversion.
    • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
     
    Delta Live Tables
    • Enable DLT Meta toggle to facilitate the creation of a bronze table within the Databricks Lakehouse. Rather than fetching data directly from the source such as flat files, this feature creates a bronze table (exact replica of the file) within Databricks and helps to refine data during data ingestion. With DLT Meta enabled, flat files are stored as tables within Databricks ensuring efficient data retrieval directly from these tables. This enhancement significantly boosts overall performance.
    • In DBFS Base Path, provide the DBFS base location where the source flat files and DDL files are stored. This information is required to create the bronze table in Databricks.
    • To perform syntax validation, turn on the Validation toggle.
    • In Source Data Source, select the data source (DDL) which contains corresponding metadata to ensure accurate query conversion.
    • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
    Databricks Notebook
    • In Output Type, select Python or Juypter as the output type format for the generated artifacts.
    • In Data Interaction Technique, select your data interaction method. Following are the options:
      • Databricks-Native: Select Databricks-Native to fetch, process, and store data in Databricks Notebook.
      • Databricks: Unity Catalog: Select Databricks: Unity Catalog to access data via Databricks Unity Catalog. In Databricks, the Unity Catalog serves as a metadata repository from which data is fetched, processed, and stored within the catalog.
      • Databricks: External: Select this data interaction technique to fetch input data from an external source such as Oracle, Netezza, Teradata, etc., and process that data in Databricks, and then move the processed data or output to an external target. For instance, if the source input file contains data from any external source like Oracle, you need to select Oracle as the Source Database Connection to establish the database connection and load the input data. Then data is processed in Databricks, and finally the processed or output data gets stored at an external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from an external source other than Oracle, such as Teradata, then by default, it will run on Databricks.
        • If the selected data interaction technique is Databricks: External, you need to specify the source database of your data. In the Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources like Oracle, Teradata, etc. If the database is selected, the converted code will have connection parameters (in the output artifacts) related to the database. If the database is not selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Databricks.
    • To convert the DataStage sequence jobs to Databricks workflows equivalent and generate the corresponding JSON artifacts, turn on the Convert Sequence Jobs to Databricks Workflows toggle.
    • In Default Database, select appropriate default database (Teradata, Netezza, Oracle, etc.) for queries where the database type is not defined in the uploaded artifacts. Selecting Not Sure will only convert queries whose database type is available.
    • To perform syntax validation, turn on the Validation toggle.
    • In Source Data Source, select the data source (DDL) which contains corresponding metadata to ensure accurate query conversion.
    • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
    Databricks Lakehouse
    • In Output Type, select Python or DBT as the output type format for the generated artifacts.
    • In Data Interaction Technique, select your data interaction method. Following are the options:
      • Databricks-Native: Select Databricks-Native to fetch, process, and store data in Databricks Lakehouse.
      • Databricks: Unity Catalog: Select Databricks: Unity Catalog to access data via Databricks Unity Catalog. In Databricks, the Unity Catalog serves as a metadata repository from which data is fetched, processed, and stored within the catalog.
      • Databricks: External: Select this data interaction technique to fetch input data from an external source such as Oracle, Netezza, Teradata, etc., and process that data in Databricks, and then move the processed data or output to an external target. For instance, if the source input file contains data from any external source like Oracle, you need to select Oracle as the Source Database Connection to establish the database connection and load the input data. Then data is processed in Databricks, and finally the processed or output data gets stored at an external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from an external source other than Oracle, such as Teradata, then by default, it will run on Databricks.
        • If the selected data interaction technique is Databricks: External, you need to specify the source database of your data. In the Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources like Oracle, Teradata, etc. If the database is selected, the converted code will have connection parameters (in the output artifacts) related to the database. If the database is not selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Databricks.
    • In Default Database, select appropriate default database (Teradata, Netezza, Oracle, etc.) for queries where the database type is not defined in the uploaded artifacts. Selecting Not Sure will only convert queries whose database type is available.
    • To perform syntax validation, turn on Validation toggle.
    • In Source Data Source, select the data source (DDL) which contains corresponding metadata to ensure accurate query conversion.
    • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
    AWS Glue Studio
    • In Target Database Details, provide database name, schema name, and prefix. If the prefix is provided, the table name displays in prefix_database_tablename format.
    • In AWS Glue Catalog Database, provide the AWS Glue catalog database connection details to connect the database and schema.
    • In S3 Bucket Base Path, specify the S3 storage repository path to store the files.
    • In UDF File Location and UDF Jar Location, specify the UDF file and Jar location path respectively to define the new UDF location.
    • In Target Connection Name, provide the connection name to add the predefined connection to Glue.
    • To perform syntax validation, turn on Validation toggle.
    • In Source Data Source, select the data source (DDL) which contains corresponding metadata to ensure accurate query conversion.
    • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
     
    AWS Glue Job
    • In Data Interaction Technique, Glue: Redshift is selected by default. This data interaction method consumes input data from Amazon Redshift, processes it in Glue, and stores the processed or output data back in Redshift. 
    • In Default Database, select appropriate default database (Teradata, Netezza, Oracle, etc.) for queries where the database type is not defined in the uploaded artifacts. Selecting Not Sure will only convert queries whose database type is available.
    • In Source Data Source, select the data source (DDL) which contains the corresponding metadata to ensure accurate query conversion.
    • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
     
    Matillion ETL
    • In Output Type, the default output type for the transformation is set to JSON language.
    • To perform syntax validation, turn on Validation toggle.
    • In Source Data Source, select the data source (DDL) which contains corresponding metadata to ensure accurate query conversion.
    • In Target Data Source, select the target data source to perform syntax validation. To successfully perform the syntax validation of the transformed queries, it is advisable to ensure that the required input tables are created or already present on the target side and secondly all the user-defined functions (UDFs) are registered on the target data source.
     
  1. Click Save to update the changes.

  1. An alert pop-up message appears. This message prompts you to refer your respective assessment to determine the anticipated quota deduction required when converting your scripts to target. Then click Ok.
  1. Click  to provide a preferred pipeline name.
  2. Click  to execute the pipeline. Clicking  (Execute) navigates you to the listing page which shows your pipeline status as Running state. It changes its state to Success when it is completed successfully
  3. Click pipeline card to see reports.

To view the DataStage conversion report, visit DataStage Conversion Report.


Next:

Configuring SSIS


To learn more, contact our support team or write to: info@leaplogic.io

Copyright © 2025 Impetus Technologies Inc. All Rights Reserved

  • Terms of Use
  • Privacy Policy
  • License Agreement
To the top ↑ Up ↑