AWS Glue Job
- In Data Interaction Technique, select your data interaction method. The following options are available (illustrative sketches of each option follow this list):
  - Glue-Redshift: Select Glue-Redshift to fetch input data from Amazon Redshift, process it in Glue, and store the processed output back in Redshift.
  - Glue: Data Catalog: This method accesses data through the Data Catalog, which serves as a metadata repository. The data is then processed in Glue, and the processed output is stored back through the Data Catalog.
    - In Storage Format, select the storage format of your data, such as Delta or Iceberg.
  - Glue: External: Select this data interaction method to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Glue, and then move the processed output to an external target. For instance, if the source input file contains data from Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data. The data is then processed in Glue, and the processed output is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, the job runs on Redshift by default.
    - If the selected data interaction technique is Glue: External, you must specify the source database of your data. In Source Database Connection, select the database you want to connect to; this establishes the database connection to load data from external sources such as Oracle or Teradata. If a database is selected, the converted code includes the related connection parameters in the output artifacts. If no database is selected, you must add the connection details manually to the parameter file to execute the dataset; otherwise, the job executes on Redshift by default.
  - Redshift ETL Orchestration via Glue: This method accesses, processes, and executes the workload in Amazon Redshift and uses Glue for orchestration. In this scenario, both the source data and the intermediate tables are converted to Redshift.
- In Attainable Automation, select how the system should calculate the achievable automation for transforming the source scripts.
  - Assessment-Based: Calculates the level of automation based on assessment logic. The conversion-config.json file contains a pre-defined automation percentage for each component, which you can modify as required (an illustrative example of editing this file appears after this list).
  - Transformation-Based: Calculates the level of automation based on the actual conversion. In this method, the automation percentage is calculated for each component from its used, supported, and unsupported properties.
- In Artifacts Location, specify the location from which external files such as parameter files and orchestration scripts are called.
- In S3 Bucket Base Path, provide the S3 bucket path under which the source and target files are stored.
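
For orientation, the following is a minimal PySpark sketch of what the Glue-Redshift and Glue: Data Catalog techniques look like at run time. All connection names, table names, credentials, and S3 paths are placeholders, not values generated by the tool.

```python
# Minimal sketch of the Glue-Redshift and Glue: Data Catalog interaction techniques.
# Every connection name, table name, credential, and S3 path is a placeholder.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

s3_base = "s3://my-bucket/migration"  # S3 Bucket Base Path (placeholder)

# Glue-Redshift: fetch the input table from Amazon Redshift. A temp directory
# under the S3 base path is required for the copy/unload step.
redshift_src = glue_context.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "url": "jdbc:redshift://cluster-host:5439/dev",
        "dbtable": "staging.orders",
        "user": "etl_user",
        "password": "***",
        "redshiftTmpDir": f"{s3_base}/tmp/",
    },
)

# ... transformations in Glue ...
processed = redshift_src

# Glue-Redshift: store the processed output back in Redshift through a Glue
# connection named "redshift-conn" (placeholder).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=processed,
    catalog_connection="redshift-conn",
    connection_options={"dbtable": "mart.orders_out", "database": "dev"},
    redshift_tmp_dir=f"{s3_base}/tmp/",
)

# Glue: Data Catalog: the same flow, but reading and writing via catalog tables.
catalog_src = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="orders"
)
glue_context.write_dynamic_frame.from_catalog(
    frame=catalog_src, database="sales_db", table_name="orders_out"
)

job.commit()
```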
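The Glue: External technique follows the same pattern against an external database. The sketch below assumes Oracle as both source and target and a hypothetical parameter file carrying the connection details; the actual parameter-file keys produced in the output artifacts may differ.

```python
# Minimal sketch of the Glue: External pattern: read from an external source
# (Oracle here), transform in Glue, and write to an external target.
# The parameter-file name and keys and all connection values are hypothetical.
import json

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Connection details supplied through a parameter file kept in the artifacts location.
with open("params/oracle_connection.json") as f:   # hypothetical file name
    conn = json.load(f)
# Hypothetical content:
# {"url": "jdbc:oracle:thin:@//host:1521/ORCL", "user": "...", "password": "...",
#  "source_table": "SCOTT.ORDERS", "target_table": "SCOTT.ORDERS_OUT"}

src = glue_context.create_dynamic_frame.from_options(
    connection_type="oracle",
    connection_options={
        "url": conn["url"],
        "user": conn["user"],
        "password": conn["password"],
        "dbtable": conn["source_table"],
    },
)

# ... transformations in Glue ...
out = src

# Write the processed output to the external target (Oracle in this sketch).
glue_context.write_dynamic_frame.from_options(
    frame=out,
    connection_type="oracle",
    connection_options={
        "url": conn["url"],
        "user": conn["user"],
        "password": conn["password"],
        "dbtable": conn["target_table"],
    },
)
```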
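For Redshift ETL Orchestration via Glue, one plausible shape of an orchestration step is a Glue Python shell job that submits the SQL to Redshift through the Redshift Data API, as sketched below; the cluster, database, user, and statement are placeholders.

```python
# Minimal sketch of Redshift ETL orchestration via Glue: the SQL executes inside
# Redshift, and the Glue job only orchestrates it. All identifiers are placeholders.
import time

import boto3

client = boto3.client("redshift-data")

resp = client.execute_statement(
    ClusterIdentifier="my-cluster",   # placeholder
    Database="dev",
    DbUser="etl_user",
    Sql="INSERT INTO mart.orders_out SELECT * FROM staging.orders;",
)

# Poll until the statement finishes before moving to the next orchestration step.
while True:
    desc = client.describe_statement(Id=resp["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(5)

if desc["Status"] != "FINISHED":
    raise RuntimeError(f"Redshift statement failed: {desc.get('Error')}")
```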
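The exact schema of conversion-config.json is product-specific and not documented here. The hypothetical sketch below only illustrates the idea of loading the file and adjusting a pre-defined, per-component automation percentage.

```python
# Hypothetical illustration only: the real schema of conversion-config.json is
# product-specific. This sketch just shows the idea of adjusting a pre-defined,
# per-component automation percentage before running an assessment.
import json

with open("conversion-config.json") as f:
    config = json.load(f)

# Hypothetical structure: a mapping of component name to automation percentage,
# e.g. {"automationPercentage": {"Lookup": 90, "Expression": 100}}.
config.setdefault("automationPercentage", {})["Lookup"] = 85   # hypothetical key

with open("conversion-config.json", "w") as f:
    json.dump(config, f, indent=2)
```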
Databricks Lakehouse
- Choose the Output Type as Python 3.
- In Source Database Connection, select the required source database, such as Oracle, SQL Server, Teradata, or Netezza, to load the data.
- In DBFS File Base Path, specify the DBFS (Databricks File System) location from which the input files are fetched and where the transformed data is stored; in other words, it is the base path for input files and output data (see the sketch after this list).
- In Attainable Automation, select how the system should calculate the achievable automation for transforming the source scripts.
  - Assessment-Based: Calculates the level of automation based on assessment logic. The conversion-config.json file contains a pre-defined automation percentage for each component, which you can modify as required.
  - Transformation-Based: Calculates the level of automation based on the actual conversion. In this method, the automation percentage is calculated for each component from its used, supported, and unsupported properties.
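
A minimal sketch of the resulting Databricks flow, assuming Oracle as the Source Database Connection and a placeholder DBFS File Base Path; all URLs, credentials, tables, and paths are illustrative.

```python
# Minimal sketch of the Databricks flow: load from the selected source database
# over JDBC and keep inputs and outputs under the DBFS file base path.
# All URLs, credentials, table names, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()       # provided automatically on Databricks

dbfs_base_path = "dbfs:/FileStore/migration"     # DBFS File Base Path (placeholder)

# Source Database Connection: a JDBC read from Oracle in this sketch.
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//host:1521/ORCL")
    .option("dbtable", "SCOTT.ORDERS")
    .option("user", "scott")
    .option("password", "***")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# ... transformations ...
out = src

# Store the transformed output under the DBFS base path as a Delta table.
out.write.format("delta").mode("overwrite").save(f"{dbfs_base_path}/output/orders")
```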
Spark
- Choose the required Output Type.
- In Source Database Connection, select the required source database to load the data, such as Oracle, SQL Server, Teradata, Netezza, DB2, or Vertica (see the sketch after this list).
- Choose the Validation Type: None or Cluster. If the Validation Type is Cluster, upload the data source.
- In Attainable Automation, select how the system should calculate the achievable automation for transforming the source scripts.
  - Assessment-Based: Calculates the level of automation based on assessment logic. The conversion-config.json file contains a pre-defined automation percentage for each component, which you can modify as required.
  - Transformation-Based: Calculates the level of automation based on the actual conversion. In this method, the automation percentage is calculated for each component from its used, supported, and unsupported properties.
- In File Base Path, specify the base path for input files and output data.
- In Artifacts Location, specify the location from which external files such as parameter files and orchestration scripts are called.
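
A minimal sketch of the corresponding Spark flow, assuming SQL Server as the Source Database Connection and placeholder File Base Path and Artifacts Location values; the file names, parameter keys, URLs, and credentials shown are illustrative only.

```python
# Minimal sketch of the Spark target flow: read from the selected source database
# over JDBC and write the output under the File Base Path. All paths, URLs,
# credentials, and parameter-file keys are placeholders.
import json

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("converted-job").getOrCreate()

file_base_path = "/data/migration"                 # File Base Path (placeholder)
artifacts_location = "/data/migration/artifacts"   # Artifacts Location (placeholder)

# External files such as parameter files are picked up from the artifacts location.
with open(f"{artifacts_location}/job_params.json") as f:   # hypothetical file name
    params = json.load(f)

# Source Database Connection: a SQL Server JDBC read in this sketch.
src = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://host:1433;databaseName=sales")
    .option("dbtable", params["source_table"])     # hypothetical parameter key
    .option("user", params["user"])
    .option("password", params["password"])
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)

# ... transformations ...
out = src

# Store the output under the file base path.
out.write.mode("overwrite").parquet(f"{file_base_path}/output/orders")
```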