Databricks Notebook
- In Data Interaction Technique, select your data interaction method. The following options are available:
- Databricks-Native: Select Databricks-Native to fetch, process, and store data in the Databricks Lakehouse.
- Databricks: Unity Catalog: Select Databricks: Unity Catalog to access data through Databricks Unity Catalog, which serves as a centralized metadata repository; data is fetched, processed, and stored within the catalog.
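Unity Catalog addresses tables through a three-level namespace (`catalog.schema.table`). As an illustration only (the catalog, schema, and table names below are hypothetical placeholders, not values the tool produces), a query against a Unity Catalog table might be assembled like this:

```python
# Hypothetical sketch: building a fully qualified Unity Catalog identifier
# (catalog.schema.table). "main", "sales", and "orders" are placeholders.

def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return the three-level Unity Catalog table identifier."""
    return f"{catalog}.{schema}.{table}"

table = qualified_name("main", "sales", "orders")
query = f"SELECT * FROM {table}"
# In a Databricks notebook this would typically be executed as:
#   df = spark.sql(query)
```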
- Databricks: External: Select this data interaction technique to fetch input data from an external source such as Oracle, Netezza, or Teradata, process it in Databricks, and then move the processed output to an external target. For instance, if the source input file contains data from an external source like Oracle, select Oracle as the Source Database Connection to establish the database connection and load the input data; the data is then processed in Databricks, and the processed output is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection while the source input file contains data from a different external source, such as Teradata, the job runs on Databricks by default.
- If the selected data interaction technique is Databricks: External, specify the source database of your data. In Source Database Connection, select the database you want to connect to; this establishes the connection used to load data from external sources such as Oracle or Teradata. If a database is selected, the converted code includes the related connection parameters in the output artifacts. If no database is selected, you must add the connection details to the parameter file manually to execute the dataset; otherwise, it executes on Databricks by default.
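The exact connection parameters written to the parameter file are product-specific, but conceptually they resemble standard JDBC options. A minimal sketch, assuming an Oracle source (host, port, service name, and credentials below are placeholders, and the parameter names in the generated artifacts may differ):

```python
# Hypothetical sketch of Oracle JDBC connection options such as a converted
# job might read from its parameter file. All values are placeholders.

def oracle_jdbc_options(host: str, port: int, service: str,
                        user: str, password: str) -> dict:
    """Assemble JDBC options for a Spark read from an external Oracle source."""
    return {
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "driver": "oracle.jdbc.OracleDriver",
        "user": user,
        "password": password,
    }

opts = oracle_jdbc_options("db-host", 1521, "ORCLPDB1", "app_user", "secret")
# In Databricks this would typically feed a JDBC read, for example:
#   df = spark.read.format("jdbc").options(**opts).option("dbtable", "SRC.T1").load()
```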
- Convert Orchestration Jobs to Databricks Workflows converts Matillion orchestration jobs to their Databricks Workflows equivalents and generates the corresponding JSON artifacts. This feature is enabled by default.
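The generated JSON follows the Databricks Jobs/Workflows schema, in which each orchestration step becomes a task with explicit dependencies. A simplified sketch of what such an artifact can look like (the job name, task keys, and notebook paths are illustrative, and the fields the tool actually emits may differ):

```python
import json

# Illustrative (not tool-generated) Databricks Workflows job definition:
# two tasks, where "transform" depends on "extract". Paths are placeholders.
workflow = {
    "name": "converted_orchestration_job",
    "tasks": [
        {
            "task_key": "extract",
            "notebook_task": {"notebook_path": "/Converted/extract"},
        },
        {
            "task_key": "transform",
            "notebook_task": {"notebook_path": "/Converted/transform"},
            "depends_on": [{"task_key": "extract"}],
        },
    ],
}

artifact = json.dumps(workflow, indent=2)  # the JSON artifact as written to disk
```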
- In Validation Type, select None or Cluster as the mode of validation.
- None: Select this option if you do not want to perform any validation.
- Cluster: Select this option to perform syntax validation of the transformed queries.
- In Data Source, upload the corresponding data source. To successfully perform syntax validation of the transformed queries, ensure that the required input tables already exist on the target side and that all user-defined functions (UDFs) are registered on the target data source.
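As a hedged sketch of that preparation step (the table name, columns, and UDF below are hypothetical; on Databricks the statements would actually be run via `spark.sql` and `spark.udf.register`):

```python
# Hypothetical pre-validation setup. build_ddl only assembles the statement;
# on Databricks it would be executed with spark.sql(ddl).

def build_ddl(table: str, columns: dict) -> str:
    """Build a CREATE TABLE IF NOT EXISTS statement for a required input table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"

ddl = build_ddl("sales.orders", {"order_id": "BIGINT", "amount": "DECIMAL(10,2)"})

# Any UDF referenced by the transformed queries must also be registered
# before validation, for example:
#   spark.udf.register("normalize_amount", lambda x: round(x, 2))
```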