Configuring SnapLogic – Leaplogic

This topic provides the steps to configure the SnapLogic conversion stage.

Select the ETL Type as SnapLogic.
In Input Artifacts, upload the source data via:
- Browse Files: To select the source files from the local system.
- Select From Data Source: To select the source files from the data source. To do so, follow the steps below:
  - Click Select From Data Source.
  - Choose the repository.
  - Select the data source.
  - Select the entities.
  - Click to save the source data source.

In Target Type, select the target (for example, Spark) to which you need to transform the source scripts.
Click Data Configuration.

The Input column in the table below provides the input requirements based on the Target Type selection.

Target Type	Input
Spark
- In Output Type, the default output type for the transformation is set to Python.
- In Data Interaction Technique, select your data interaction method. The options are:
  - Spark-Native: Select this technique to fetch, process, and store data in Spark.
  - Spark: External: Select this technique to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Spark, and then move the processed (output) data to an external target. For instance, if the source input file contains data from an external source such as Oracle, select Oracle in Databases to establish the database connection and load the input data. The data is then processed in Spark, and the output data is stored at the external target (Oracle). However, if you select Oracle in Databases but the source input file contains data from a different external source, such as Teradata, then by default it runs on Spark.
- In Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources such as Oracle or Teradata.
  - If a database is selected, the converted code includes the related connection parameters in the output artifacts.
  - If no database is selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Spark.

AWS Glue Job
- In Data Interaction Technique, select your data interaction method. The options are:
  - Glue: Redshift: Select this technique to fetch input data from Amazon Redshift, process it in Glue, and store the processed (output) data in Redshift. In this scenario, source data is converted to Redshift, whereas temporary or intermediate tables are converted to Spark.
  - Glue: Data Catalog: This method accesses data through the data catalog, which serves as a metadata repository. The data is then processed in Glue, and the processed (output) data is stored in the data catalog.
    - In Storage Format, select the storage format of your data, such as Delta or Iceberg.
  - Glue: External: Select this technique to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Glue, and then move the processed (output) data to an external target. For instance, if the source input file contains data from an external source such as Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data. The data is then processed in Glue, and the output data is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, then by default it runs on Redshift.
    - If the selected data interaction technique is Glue: External, you need to specify the source database of your data. In Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources such as Oracle or Teradata.
    - If a database is selected, the converted code includes the related connection parameters in the output artifacts.
    - If no database is selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Redshift.
  - Redshift ETL Orchestrated via Glue: This method accesses, processes, and executes data in Amazon Redshift and uses Glue for orchestration jobs. In this scenario, both source data and intermediate tables are converted to Redshift.
  - Glue: Hybrid: This technique can leverage three different data interaction techniques: Glue: Redshift, Glue: Data Catalog, and Glue: External. Depending on your use case, it takes input from data sources such as Redshift, Delta, Iceberg, an RDS instance, or an external source, processes it in Glue, and moves the processed (output) data into the respective data sources. For instance, you can select this option if your source tables reside on multiple data sources such as Redshift, Delta, Iceberg, external sources, and RDS instances.
    - In Input File, upload a CSV file that contains information about the tables, including their database type and name. To define tables residing on various data sources, download the template (a CSV file available from the Input File field), make the required changes, and then upload it.
- In Default Database, select the default database for queries whose database type is not defined in the uploaded artifacts. If you select Not Sure, only queries whose database type is available are converted.
- In Target Data Source, select the target data source to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.

AWS Glue Notebook
- In Data Interaction Technique, select your data interaction method. The options are:
  - Glue: Redshift: Select this technique to fetch input data from Amazon Redshift, process it in Glue, and store the processed (output) data in Redshift. In this scenario, source data is converted to Redshift, whereas temporary or intermediate tables are converted to Spark.
  - Glue: Data Catalog: This method accesses data through the data catalog, which serves as a metadata repository. The data is then processed in Glue, and the processed (output) data is stored in the data catalog.
    - In Storage Format, select the storage format of your data, such as Delta or Iceberg.
  - Glue: External: Select this technique to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Glue, and then move the processed (output) data to an external target. For instance, if the source input file contains data from an external source such as Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data. The data is then processed in Glue, and the output data is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, then by default it runs on Redshift.
    - If the selected data interaction technique is Glue: External, you need to specify the source database of your data. In Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources such as Oracle or Teradata.
    - If a database is selected, the converted code includes the related connection parameters in the output artifacts.
    - If no database is selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Redshift.
  - Redshift ETL Orchestrated via Glue: This method accesses, processes, and executes data in Amazon Redshift and uses Glue for orchestration jobs. In this scenario, both source data and intermediate tables are converted to Redshift.
- In Default Database, select the default database for queries whose database type is not defined in the uploaded artifacts. If you select Not Sure, only queries whose database type is available are converted.
- In Property file path, provide the S3 repo path where the property files are stored.
- In Dependent Utility Path, provide the S3 repo path where the utility files are stored as a wheel binary package.
- In Target Data Source, select the target data source to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.

AWS Glue Studio
- In Target Database Details, specify the database name, schema name, and prefix. If a prefix is provided, the table name is displayed in prefix_database_tablename format.
- In AWS Glue Catalog Database, provide the AWS Glue Catalog Database connection details to connect to the database and schema.
- In S3 Bucket Base Path, provide the S3 storage repository path where you need to store the source and target files.
- In UDF File Location, specify the UDF location.
- In UDF Jar Location, specify the new UDF jar location.
- In Target Data Source, select the target data source to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.

Databricks Lakehouse
- In Data Interaction Technique, select your data interaction method. The options are:
  - Databricks-Native: Select this technique to fetch, process, and store data in Databricks Lakehouse.
  - Databricks: Unity Catalog: Select this technique to access data via Databricks Unity Catalog. In Databricks, the Unity Catalog serves as a metadata repository from which data is fetched, processed, and stored within the catalog.
  - Databricks: External: Select this technique to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Databricks, and then move the processed (output) data to an external target. For instance, if the source input file contains data from an external source such as Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data. The data is then processed in Databricks, and the output data is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, then by default it runs on Databricks.
    - If the selected data interaction technique is Databricks: External, you need to specify the source database of your data. In Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources such as Oracle or Teradata.
    - If a database is selected, the converted code includes the related connection parameters in the output artifacts.
    - If no database is selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Databricks.
- In Default Database, select the default database for queries whose database type is not defined in the uploaded artifacts. If you select Not Sure, only queries whose database type is available are converted.
- In Target Data Source, select the target data source to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.

Databricks Notebook
- In Data Interaction Technique, select your data interaction method. The options are:
  - Databricks-Native: Select this technique to fetch, process, and store data in Databricks Notebook.
  - Databricks: Unity Catalog: Select this technique to access data via Databricks Unity Catalog. In Databricks, the Unity Catalog serves as a metadata repository from which data is fetched, processed, and stored within the catalog.
  - Databricks: External: Select this technique to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Databricks, and then move the processed (output) data to an external target. For instance, if the source input file contains data from an external source such as Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data. The data is then processed in Databricks, and the output data is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, then by default it runs on Databricks.
    - If the selected data interaction technique is Databricks: External, you need to specify the source database of your data. In Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources such as Oracle or Teradata.
    - If a database is selected, the converted code includes the related connection parameters in the output artifacts.
    - If no database is selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Databricks.
- In Default Database, select the default database for queries whose database type is not defined in the uploaded artifacts. If you select Not Sure, only queries whose database type is available are converted.
- In Target Data Source, select the target data source to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.

Delta Live Tables
- In Data Interaction Technique, select your data interaction method. The options are:
  - Databricks-Native: Select this technique to fetch, process, and store data in Delta Live Tables.
    - Enable the DLT Meta toggle to create a bronze table within the Databricks Lakehouse. Rather than fetching data directly from a source such as flat files, this feature creates a bronze table (an exact replica of the file) within Databricks and helps refine data during ingestion. With DLT Meta enabled, flat files are stored as tables within Databricks, so data is retrieved directly from these tables, which improves overall performance.
    - In DBFS Base Path, specify the DBFS location where the source flat files and DDL files are stored. This information is required to create the bronze table in Databricks.
  - Databricks: Unity Catalog: Select this technique to access data via Databricks Unity Catalog. In Databricks, the Unity Catalog serves as a metadata repository from which data is fetched, processed, and stored within the catalog.
  - Databricks: External: Select this technique to fetch input data from an external source such as Oracle, Netezza, or Teradata, process that data in Databricks, and then move the processed (output) data to an external target. For instance, if the source input file contains data from an external source such as Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data. The data is then processed in Databricks, and the output data is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, then by default it runs on Databricks.
    - Enable the DLT Meta toggle to create a bronze table within the Databricks Lakehouse. The toggle behaves as described for Databricks-Native: flat files are stored as bronze tables (exact replicas of the files) within Databricks, and data is retrieved directly from these tables.
    - If the selected data interaction technique is Databricks: External, you need to specify the source database of your data. In Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources such as Oracle or Teradata.
    - If a database is selected, the converted code includes the related connection parameters in the output artifacts.
    - If no database is selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, by default, it executes on Databricks.
    - In DBFS Base Path, provide the DBFS base location where the source flat files and DDL files are stored. This information is required to create the bronze table in Databricks.
- In Default Database, select the default database for queries whose database type is not defined in the uploaded artifacts. If you select Not Sure, only queries whose database type is available are converted.
- In Target Data Source, select the target data source to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.

Redshift ELT
- In Output Type, the default output type for the transformation is set to Python.
- In Database Name, provide the name of the target database in which you need to store the transformed data.
- In S3 Bucket Base Path, provide the S3 storage repository path where you need to store the source and target files.
- In Target Data Source, select the target data source to perform syntax validation. For syntax validation of the transformed queries to succeed, ensure that the required input tables exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
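
The external data interaction techniques described above rely on database connection parameters that are either generated in the output artifacts or added manually to the parameter file. The Python sketch below illustrates how such parameters might map to a Spark JDBC connection. The parameter names (`host`, `port`, `service`, `user`, `password`), the sample values, and the `build_jdbc_options` helper are assumptions made for illustration and do not reflect LeapLogic's actual parameter-file format; the option keys (`url`, `dbtable`, `user`, `password`, `driver`) are the ones accepted by Spark's JDBC data source.

```python
from typing import Dict

# Hypothetical connection details, standing in for what a parameter file might hold.
ORACLE_PARAMS = {
    "host": "db.example.com",
    "port": 1521,
    "service": "ORCLPDB1",
    "user": "etl_user",
    "password": "change-me",
}


def build_jdbc_options(params: Dict[str, object], table: str) -> Dict[str, str]:
    """Turn connection details into options for Spark's JDBC data source."""
    required = ("host", "port", "service", "user", "password")
    missing = [key for key in required if key not in params]
    if missing:
        # Mirrors the case where no database was selected and the
        # connection details still have to be filled in by hand.
        raise ValueError(f"missing connection parameters: {missing}")
    return {
        # Oracle thin-driver URL that addresses the database by service name.
        "url": f"jdbc:oracle:thin:@//{params['host']}:{params['port']}/{params['service']}",
        "dbtable": table,
        "user": str(params["user"]),
        "password": str(params["password"]),
        "driver": "oracle.jdbc.OracleDriver",
    }


options = build_jdbc_options(ORACLE_PARAMS, "SALES.ORDERS")
print(options["url"])

# With a SparkSession named `spark` available, the external read and write could look like:
#   df = spark.read.format("jdbc").options(**options).load()
#   target = {**options, "dbtable": "SALES.ORDERS_OUT"}  # hypothetical target table
#   df.write.format("jdbc").options(**target).mode("append").save()
```

Keeping the connection details in one dictionary means the same values can serve both the read from the external source and the write to the external target, with only the table name changing.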

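For AWS Glue Studio, the prefix_database_tablename naming rule can be pictured with a short Python sketch. The function name is hypothetical, and the behavior when no prefix is provided (falling back to database_tablename) is an assumption, since only the prefixed case is described here.

```python
def display_table_name(database: str, table: str, prefix: str = "") -> str:
    """Compose a table name in prefix_database_tablename format."""
    parts = [database, table]
    if prefix:
        # The prefix is prepended only when one is provided.
        parts.insert(0, prefix)
    return "_".join(parts)


print(display_table_name("sales", "orders", prefix="stg"))  # stg_sales_orders
print(display_table_name("sales", "orders"))  # sales_orders (assumed fallback)
```
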
Save to update the changes.

An alert pop-up message appears. It prompts you to refer to your respective assessment to determine the anticipated quota deduction for converting your scripts to the target. Click Ok.

Click to provide a preferred pipeline name.
Click (Execute) to execute the pipeline. This takes you to the listing page, which shows the pipeline status as Running. The status changes to Success when the pipeline completes successfully.
Click the pipeline card to see the report.

To view the SnapLogic conversion report, visit SnapLogic Conversion Report.