AWS Glue Studio |
- In Database Name, Schema Name, and Prefix, provide the database name, schema name, and prefix, respectively. If a prefix is provided, the table name is displayed in the prefix_database_tablename format.
- In AWS Glue Catalog Database, provide the AWS Glue catalog database connection details to connect to the database and schema.
- In S3 Bucket Base Path, specify the S3 storage repository path to store the files.
- In UDF File Location, specify the UDF file location.
- In UDF Jar Location, specify the jar location path to define the new UDF location.
|
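The prefix naming rule above can be sketched as follows. This is a minimal illustration only: `display_table_name` is a hypothetical helper, not part of the product, and the behavior when no prefix is given (returning the table name unchanged) is an assumption, since the text describes only the prefixed case.

```python
def display_table_name(database: str, table: str, prefix: str = "") -> str:
    """Compose a table name in the prefix_database_tablename format.

    Hypothetical helper for illustration only.
    """
    if not prefix:
        # Assumption: the documentation does not say what is displayed
        # without a prefix, so the table name is returned unchanged.
        return table
    # Documented case: prefix, database, and table name joined by underscores.
    return "_".join([prefix, database, table])


print(display_table_name("sales", "orders", prefix="stg"))
```

For example, a prefix of `stg`, a database named `sales`, and a table named `orders` would be composed as `stg_sales_orders`.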
AWS Glue Job |
- In Data Interaction Technique, select your data interaction method. The options are:
  - Glue-Redshift: Fetches input data from Amazon Redshift, processes it in Glue, and stores the processed output data in Redshift. In this scenario, source data is converted to Redshift, whereas temporary or intermediate tables are converted to Spark.
  - Glue: Data Catalog: Accesses data through the data catalog, which serves as a metadata repository. The data is then processed in Glue, and the processed output data is stored in the data catalog.
    - In Storage Format, select the storage format of your data, such as Delta or Iceberg.
  - Glue: External: Fetches input data from an external source such as Oracle, Netezza, or Teradata, processes it in Glue, and moves the processed output data to an external target. For instance, if the source input file contains data from Oracle, select Oracle as the source database to establish the database connection and load the input data. The data is then processed in Glue, and the processed output data is stored at the external target (Oracle). However, if you select Oracle as the Source Database Connection but the source input file contains data from a different external source, such as Teradata, it runs on Redshift by default.
    - If the selected data interaction technique is Glue: External, specify the source database of your data. In Source Database Connection, select the database you want to connect to. This establishes the database connection to load data from external sources such as Oracle or Teradata. If a database is selected, the converted code (in the output artifacts) includes the connection parameters for that database. If no database is selected, you need to add the database connection details manually to the parameter file to execute the dataset; otherwise, it executes on Redshift by default.
  - Redshift ETL Orchestration via Glue: Accesses, processes, and executes data in Amazon Redshift and uses Glue for orchestration jobs. In this scenario, both source data and intermediate tables are converted to Redshift.
- In Default Database, select the source database against which all the pipelines are configured.
|
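The conversion targets and the Redshift fallback described for the data interaction techniques can be summarized in a short sketch. This is an illustration under stated assumptions, not the product's implementation: `CONVERSION_TARGETS`, `external_source_target`, and the `manual_params` flag are hypothetical names, and the sketch assumes that manually added connection details point at the database the input file actually comes from.

```python
from typing import Optional

REDSHIFT = "Redshift"

# Conversion targets as stated for the two Redshift-based techniques.
CONVERSION_TARGETS = {
    "Glue-Redshift": {"source": REDSHIFT, "intermediate": "Spark"},
    "Redshift ETL Orchestration via Glue": {
        "source": REDSHIFT,
        "intermediate": REDSHIFT,
    },
}


def external_source_target(
    selected_db: Optional[str],
    input_db: str,
    manual_params: bool = False,
) -> str:
    """Return where a Glue: External dataset would execute (hypothetical helper).

    selected_db   -- value chosen in Source Database Connection, or None
    input_db      -- database the source input file actually comes from
    manual_params -- True if connection details were added manually
                     to the parameter file
    """
    if selected_db is None:
        # No database selected: runs on Redshift unless the connection
        # details were added manually (assumed to match input_db).
        return input_db if manual_params else REDSHIFT
    if selected_db.lower() == input_db.lower():
        # Selection matches the input data: use that external database.
        return selected_db
    # Selection does not match the input data: default to Redshift.
    return REDSHIFT


print(external_source_target("Oracle", "Oracle"))
print(external_source_target("Oracle", "Teradata"))
```

The first call models a matching selection and the second models the mismatch case (Oracle selected, Teradata input), which falls back to Redshift.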