AWS Glue Job
- In Data Interaction Technique, select your data interaction method. The following options are available (illustrative sketches of each option follow this list):
  - Glue-Redshift: Select Glue-Redshift to fetch input data from Amazon Redshift, process it in Glue, and store the processed output back in Redshift.
  - Glue: Data Catalog: This method accesses data through the Data Catalog, which serves as a metadata repository. The data is then processed in Glue, and the processed output is stored back through the Data Catalog.
    - In Storage Format, select the storage format of your data, such as Delta or Iceberg.
  - Glue: External: Select this data interaction method to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Glue, and then move the processed output to an external target. For instance, if the source input file contains data from Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data. The data is then processed in Glue, and the processed output is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, the job runs on Redshift by default.
    - If the selected data interaction technique is Glue: External, you must specify the source database of your data. In Source Database Connection, select the database you want to connect to; this establishes the database connection to load data from external sources such as Oracle or Teradata. If a database is selected, the converted code includes the related connection parameters in the output artifacts. If no database is selected, you must add the connection details manually to the parameter file to execute the dataset; otherwise, the job executes on Redshift by default.
  - Redshift ETL Orchestration via Glue: This method accesses, processes, and executes the workload in Amazon Redshift and uses Glue for orchestration. In this scenario, both the source data and the intermediate tables are converted to Redshift.
- In Attainable Automation, select how the system should calculate the achievable automation for transforming the source scripts.
  - Assessment-Based: Calculates the level of automation based on assessment logic. The conversion-config.json file contains a pre-defined automation percentage for each component, which you can modify as required (an illustrative example of editing this file appears after this list).
  - Transformation-Based: Calculates the level of automation based on the actual conversion. In this method, the automation percentage is calculated for each component from its used, supported, and unsupported properties.
- In Artifacts Location, specify the location from which external files such as parameter files and orchestration scripts are called.
- In S3 Bucket Base Path, provide the S3 bucket path under which the source and target files are stored.
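
For orientation, the following is a minimal PySpark sketch of what the Glue-Redshift and Glue: Data Catalog techniques look like at run time. All connection names, table names, credentials, and S3 paths are placeholders, not values generated by the tool.

```python
# Minimal sketch of the Glue-Redshift and Glue: Data Catalog interaction techniques.
# Every connection name, table name, credential, and S3 path is a placeholder.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

s3_base = "s3://my-bucket/migration"  # S3 Bucket Base Path (placeholder)

# Glue-Redshift: fetch the input table from Amazon Redshift. A temp directory
# under the S3 base path is required for the copy/unload step.
redshift_src = glue_context.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "url": "jdbc:redshift://cluster-host:5439/dev",
        "dbtable": "staging.orders",
        "user": "etl_user",
        "password": "***",
        "redshiftTmpDir": f"{s3_base}/tmp/",
    },
)

# ... transformations in Glue ...
processed = redshift_src

# Glue-Redshift: store the processed output back in Redshift through a Glue
# connection named "redshift-conn" (placeholder).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=processed,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "mart.orders_out", "database": "dev"},
    redshift_tmp_dir=f"{s3_base}/tmp/",
)

# Glue: Data Catalog: the same flow, but reading and writing via catalog tables.
catalog_src = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)
glue_context.write_dynamic_frame.from_catalog(
    frame=catalog_src, database="sales_db", table_name="orders_out"
)

job.commit()
```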
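The Glue: External technique follows the same pattern against an external database. The sketch below assumes Oracle as both source and target and a hypothetical parameter file carrying the connection details; the actual parameter-file keys produced in the output artifacts may differ.

```python
# Minimal sketch of the Glue: External pattern: read from an external source
# (Oracle here), transform in Glue, and write to an external target.
# The parameter-file name and keys and all connection values are hypothetical.
import json

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Connection details supplied through a parameter file kept in the artifacts location.
with open("params/oracle_connection.json") as f:   # hypothetical file name
    conn = json.load(f)
# Hypothetical content:
# {"url": "jdbc:oracle:thin:@//host:1521/ORCL", "user": "...", "password": "...",
#  "source_table": "SCOTT.ORDERS", "target_table": "SCOTT.ORDERS_OUT"}

src = glue_context.create_dynamic_frame.from_options(
    connection_type="oracle",
    connection_options={
        "url": conn["url"],
        "user": conn["user"],
        "password": conn["password"],
        "dbtable": conn["source_table"],
    },
)

# ... transformations in Glue ...
out = src

# Write the processed output to the external target (Oracle in this sketch).
glue_context.write_dynamic_frame.from_options(
    frame=out,
    connection_type="oracle",
    connection_options={
        "url": conn["url"],
        "user": conn["user"],
        "password": conn["password"],
        "dbtable": conn["target_table"],
    },
)
```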
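For Redshift ETL Orchestration via Glue, one plausible shape of an orchestration step is a Glue Python shell job that submits the SQL to Redshift through the Redshift Data API, as sketched below; the cluster, database, user, and statement are placeholders.

```python
# Minimal sketch of Redshift ETL orchestration via Glue: the SQL executes inside
# Redshift, and the Glue job only orchestrates it. All identifiers are placeholders.
import time

import boto3

client = boto3.client("redshift-data")

resp = client.execute_statement(
    ClusterIdentifier="my-cluster",   # placeholder
    Database="dev",
    DbUser="etl_user",
    Sql="INSERT INTO mart.orders_out SELECT * FROM staging.orders;",
)

# Poll until the statement finishes before moving to the next orchestration step.
while True:
    desc = client.describe_statement(Id=resp["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(5)

if desc["Status"] != "FINISHED":
    raise RuntimeError(f"Redshift statement failed: {desc.get('Error')}")
```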
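The exact schema of conversion-config.json is product-specific and not documented here. The hypothetical sketch below only illustrates the idea of loading the file and adjusting a pre-defined, per-component automation percentage.

```python
# Hypothetical illustration only: the real schema of conversion-config.json is
# product-specific. This sketch just shows the idea of adjusting a pre-defined,
# per-component automation percentage before running an assessment.
import json

with open("conversion-config.json") as f:
    config = json.load(f)

# Hypothetical structure: a mapping of component name to automation percentage,
# e.g. {"automationPercentage": {"Lookup": 90, "Expression": 100}}.
config.setdefault("automationPercentage", {})["Lookup"] = 85   # hypothetical key

with open("conversion-config.json", "w") as f:
    json.dump(config, f, indent=2)
```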
Databricks Lakehouse
- Choose the Output Type as Python 3.
- In Source Database Connection, select the required source database, such as Oracle, SQL Server, Teradata, or Netezza, to load the data.
- In DBFS File Base Path, specify the DBFS (Databricks File System) location from which the input files are fetched and where the transformed data is stored; in other words, it is the base path for input files and output data (see the sketch after this list).
- In Attainable Automation, select how the system should calculate the achievable automation for transforming the source scripts.
  - Assessment-Based: Calculates the level of automation based on assessment logic. The conversion-config.json file contains a pre-defined automation percentage for each component, which you can modify as required.
  - Transformation-Based: Calculates the level of automation based on the actual conversion. In this method, the automation percentage is calculated for each component from its used, supported, and unsupported properties.
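
A minimal sketch of the resulting Databricks flow, assuming Oracle as the Source Database Connection and a placeholder DBFS File Base Path; all URLs, credentials, tables, and paths are illustrative.

```python
# Minimal sketch of the Databricks flow: load from the selected source database
# over JDBC and keep inputs and outputs under the DBFS file base path.
# All URLs, credentials, table names, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()       # provided automatically on Databricks

dbfs_base_path = "dbfs:/FileStore/migration"     # DBFS File Base Path (placeholder)

# Source Database Connection: a JDBC read from Oracle in this sketch.
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//host:1521/ORCL")
    .option("dbtable", "SCOTT.ORDERS")
    .option("user", "scott")
    .option("password", "***")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# ... transformations ...
out = src

# Store the transformed output under the DBFS base path as a Delta table.
out.write.format("delta").mode("overwrite").save(f"{dbfs_base_path}/output/orders")
```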
Spark
- Choose the required Output Type.
- In Source Database Connection, select the required source database to load the data, such as Oracle, SQL Server, Teradata, Netezza, DB2, or Vertica (see the sketch after this list).
- Choose the Validation Type: None or Cluster. If the Validation Type is Cluster, upload the data source.
- In Attainable Automation, select how the system should calculate the achievable automation for transforming the source scripts.
  - Assessment-Based: Calculates the level of automation based on assessment logic. The conversion-config.json file contains a pre-defined automation percentage for each component, which you can modify as required.
  - Transformation-Based: Calculates the level of automation based on the actual conversion. In this method, the automation percentage is calculated for each component from its used, supported, and unsupported properties.
- In File Base Path, specify the base path for input files and output data.
- In Artifacts Location, specify the location from which external files such as parameter files and orchestration scripts are called.
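
A minimal sketch of the corresponding Spark flow, assuming SQL Server as the Source Database Connection and placeholder File Base Path and Artifacts Location values; the file names, parameter keys, URLs, and credentials shown are illustrative only.

```python
# Minimal sketch of the Spark target flow: read from the selected source database
# over JDBC and write the output under the File Base Path. All paths, URLs,
# credentials, and parameter-file keys are placeholders.
import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("converted-job").getOrCreate()

file_base_path = "/data/migration"                 # File Base Path (placeholder)
artifacts_location = "/data/migration/artifacts"   # Artifacts Location (placeholder)

# External files such as parameter files are picked up from the artifacts location.
with open(f"{artifacts_location}/job_params.json") as f:   # hypothetical file name
    params = json.load(f)

# Source Database Connection: a SQL Server JDBC read in this sketch.
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://host:1433;databaseName=sales")
    .option("dbtable", params["source_table"])     # hypothetical parameter key
    .option("user", params["user"])
    .option("password", params["password"])
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

# ... transformations ...
out = src

# Store the output under the file base path.
out.write.mode("overwrite").parquet(f"{file_base_path}/output/orders")
```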