
Creating Data Source

A data source contains information about a database or schema. To build metadata, you can extract any data set or access data from a variety of data sources. Before you start extracting data or entities, make sure that you have already set up a repository for the source.

To add a data source, follow these steps:

  1. In Repository, select the repository with which you want to associate the data source.
  2. In Data Source Name, enter your preferred data source name, and then provide the input requirements based on the category and type selected for the repository.

The Input column in the table below describes the input requirements for each category and type chosen when creating the repository.

Category Type Input
Big data Databricks Lakehouse
  • Database Name: Provide the database name from which you want to fetch the entities.
  • Workspace Name: Provide the domain name or creator's name used to log in to Databricks.
  • Cluster ID: Provide the cluster ID.
Additional information
  • Access Token: Provide an access token to authenticate with Databricks. In Databricks, authentication flows rely on tokens instead of passwords (see the connectivity sketch below).
  • Transport Mode: Provide the transport mode details. You can provide either a JDBC URL such as jdbc:spark://<address>;<transportMode>;<httpPath>;<Authentication Mechanism>;<UID>;<Password> or the transport mode itself; here, the transport mode is http.
  • SSL: Secure Sockets Layer (SSL) is used to establish secure communication between the client and the server.
  • HTTP Path: Provide the HTTP path used to identify and access the resource on the host.
  • Auth Mechanism: Provide the authentication mechanism to ensure security.
  • UID: Provide a unique identification number to ensure security.
Databricks
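For illustration, the sketch below shows how these Databricks inputs map onto a connection test. It assumes the databricks-sql-connector Python package, which is a tooling choice of this example rather than a LeapLogic requirement; all host, path, token, and database values are placeholders.

    from databricks import sql

    # All values are placeholders; substitute the Workspace Name, HTTP Path,
    # Access Token, and Database Name inputs described above.
    connection = sql.connect(
        server_hostname="<workspace-host>.cloud.databricks.com",
        http_path="<http-path>",
        access_token="<access-token>",
    )
    with connection.cursor() as cursor:
        cursor.execute("SHOW TABLES IN <database_name>")
        for table in cursor.fetchall():
            print(table)
    connection.close()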
Google Cloud BigQuery
  • Database Name: Provide the GCP database name from which to retrieve the metadata.
  • Bucket Name: Provide the Google Cloud bucket name, which can be used temporarily during data migration.
Hive
  • Database Name: Provide the database name from which you want to fetch the entities.
  • Warehouse Directory: Provide the warehouse directory, which acts as the database directory containing the Hive tables.
  • Keytab Path: A keytab (also known as a key table) contains service keys that are mainly used to allow server applications to accept authentications from clients.
  • Client Principal: Provide the client principal. Clients are the users who initiate communication for a service request; the client principal identifies the user for authentication.
  • Service Principal: Provide the identity for the service that runs on the hosts. For example, hive/xxxxxx-xxxxxx.xxxxxx.co.in@IMPETUS.CO.IN.
  • Metastore Principal: Provide the metastore principal, which can remove files and directories within the hive/warehouse directory. For example, hive/xxxxxx-xxxxxx.xxxxxx.co.in@IMPETUS.CO.IN (see the Kerberized connection sketch below).
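As a rough illustration of how the keytab and principals fit together, the sketch below opens a Kerberized Hive connection with the PyHive package. PyHive, the port, and all host and principal values are assumptions for demonstration, not LeapLogic requirements.

    # Activate the keytab first, for example:
    #   kinit -kt /path/to/hive.keytab <client-principal>
    from pyhive import hive

    conn = hive.Connection(
        host="<hive-server-host>",
        port=10000,                    # default HiveServer2 port
        database="<database-name>",    # Database Name input
        auth="KERBEROS",
        kerberos_service_name="hive",  # service part of the Service Principal
    )
    cursor = conn.cursor()
    cursor.execute("SHOW TABLES")
    print(cursor.fetchall())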
Spark
  • Database Name: Provide the database name from which you want to fetch the entities.
  • Keytab Path: A keytab (also known as a key table) contains service keys that are mainly used to allow server applications to accept authentications from clients.
  • Client Principal: Provide the client principal details. Clients are the users who initiate communication for a service request; the client principal identifies the user for authentication.
  • Service Principal: Provide the identity for the service that runs on the hosts.
  • Metastore Principal: Provide the metastore principal, which can remove files and directories within the database/warehouse directory.
  • Spark Service Principal: Provide the Spark service principal used to identify the Spark resources.
DDL Greenplum The data source is created automatically by extracting the data from the uploaded DDL file.
Netezza
Oracle
SQL Server
Teradata
Vertica
ETL AWS Glue
  • IAM Role: Provide the IAM role to ensure security.
  • Region: Refers to the physical location where the data centers are grouped.
  • Temp Directory: Used as a temporary directory for the job.
  • Schema Name: Provide the name of the schema.
  • Database Name: Provide the database name from which you want to fetch the entities.
  • Bucket Name: Provide the bucket name.
  • Access ID: Provide the access ID used to access the AWS console.
  • Secret Key: Provide the secret key used to access the AWS console (see the sketch below).
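For illustration, the sketch below exercises the AWS Glue inputs by listing tables with boto3; boto3 is an assumed tooling choice, and every value shown is a placeholder.

    import boto3

    glue = boto3.client(
        "glue",
        region_name="<region>",                # Region input
        aws_access_key_id="<access-id>",       # Access ID input
        aws_secret_access_key="<secret-key>",  # Secret Key input
    )
    # List the tables of the Glue database named in the Database Name input.
    response = glue.get_tables(DatabaseName="<database-name>")
    print([table["Name"] for table in response["TableList"]])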
File System Amazon S3
  • Schema Name: Provide the name of the schema.
  • Database Name: Provide the database name from which you want to fetch the entities.
  • Bucket Name: Provide the bucket name.
  • Access ID: Provide the access ID used to access the AWS console.
  • Secret Key: Provide the secret key used to access the AWS console (a verification sketch follows).
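Similarly, a short boto3 sketch (an assumed tooling choice; all values are placeholders) can confirm that the bucket and credentials are valid before the data source is created.

    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id="<access-id>",       # Access ID input
        aws_secret_access_key="<secret-key>",  # Secret Key input
    )
    # List a few objects from the bucket named in the Bucket Name input.
    response = s3.list_objects_v2(Bucket="<bucket-name>", MaxKeys=5)
    for obj in response.get("Contents", []):
        print(obj["Key"])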
Azure Data Lake Storage
  • Account Name: Provide the account name.
  • Version: Choose the version: Azure Data Lake Storage Gen1 or Azure Data Lake Storage Gen2.
If you choose Azure Data Lake Storage Gen1:
  • Username: Provide the username.
  • Password: Provide the password.
  • Client ID: Provide the client ID.
  • Credentials: Provide the credentials.
  • Refresh URL: Provide the refresh URL for authentication. To access the application, the refresh URL must be called with valid credentials.
  • Default Endpoints Protocol: The mounting protocol for Azure Data Lake Storage.
  • Account Key: Provide the account key.
If you choose Azure Data Lake Storage Gen2:
  • Container Name: Provide the container name, which organizes a set of blobs.
  • SAS Token: Provide the SAS token.
  • Storage Account Access Key: Provide the access key used to authorize data access (see the sketch below).
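For the Gen2 inputs, a hedged sketch with the azure-storage-blob package (an assumption of this example, not a LeapLogic dependency) shows how the account name, container name, and access key relate; all values are placeholders.

    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url="https://<account-name>.blob.core.windows.net",
        credential="<storage-account-access-key>",  # Storage Account Access Key input
    )
    # Enumerate blobs in the container named in the Container Name input.
    container = service.get_container_client("<container-name>")
    for blob in container.list_blobs():
        print(blob.name)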
DBFS
  • Access Token: Provide the access token to authenticate and access DBFS.
  • Path: Provide the path.
File Transfer Protocol Provide the warehouse directory that contains the data.
Secured File Transfer Protocol
  • Warehouse Directory: Provide the warehouse directory that contains the data/entities for every file.
  • Authentication: Choose the credential-based or key-based authentication method (see the sketch below).
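As an illustration of credential-based access, the sketch below lists the warehouse directory over SFTP using paramiko (an assumed package; all values are placeholders).

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # Credential-based authentication; for key-based, pass key_filename
    # instead of a password.
    client.connect("<host-address>", username="<username>", password="<password>")
    sftp = client.open_sftp()
    print(sftp.listdir("<warehouse-directory>"))  # Warehouse Directory input
    sftp.close()
    client.close()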
Unix File System
If you have specified "localhost" in the Host Address when creating the repository,
  • Warehouse Directory:  Provide warehouse directory which contains data.
If you have specified an IP address in the Host Address when creating the repository,
  • Warehouse Directory:  Provide warehouse directory which contains data.
  • Authentication: Choose Credential based or Key based authentication method.
Google Cloud Storage Provide the bucket name.
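For illustration, the bucket can be checked with the google-cloud-storage package (an assumed tooling choice; the bucket name is a placeholder and Application Default Credentials are assumed):

    from google.cloud import storage

    # Uses Application Default Credentials; the bucket name is a placeholder.
    client = storage.Client()
    for blob in client.list_blobs("<bucket-name>", max_results=5):
        print(blob.name)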
HDFS
  • Warehouse Directory: Provide the warehouse directory that contains the data/entities for every file.
  • Keytab Path: A keytab (also known as a key table) contains service keys that are mainly used to allow server applications to accept authentications from clients.
  • Client Principal: Provide the client principal details.
MPP Netezza Provide the Database Name from which you want to fetch the entities.
Teradata
RDBMS Azure Synapse
  • Schema Name: Provide the name of the schema.
  • Database Name: Provide the database name from which you want to fetch the entities.
Greenplum
Oracle
Redshift
Vertica
SQL Server
Snowflake
  • Schema Name: Provide the name of the schema.
  • Database Name: Provide the database name from which you want to fetch the entities.
  • Warehouse Name: Provide the warehouse name (see the connection sketch below).
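For illustration, the sketch below shows how the schema, database, and warehouse inputs combine in a connection built with snowflake-connector-python (an assumed package; all values are placeholders).

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="<account-identifier>",
        user="<username>",
        password="<password>",
        warehouse="<warehouse-name>",  # Warehouse Name input
        database="<database-name>",    # Database Name input
        schema="<schema-name>",        # Schema Name input
    )
    cursor = conn.cursor()
    cursor.execute("SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA()")
    print(cursor.fetchone())
    conn.close()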
Aurora PostgreSQL Provide the schema and database name.
Others
  • Driver Name: Provide the name of the driver.
  • Driver Class Name: Provide the driver class name.
  • Connection URI: Provide the connection URI.
  • Schema Name: Provide the name of the schema.
  • Database Name: Provide the database name from which you want to fetch the entities (see the sketch below).
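Because this option takes a raw driver and connection URI, a generic test can be sketched with the JayDeBeApi package (an assumption of this example; it needs a local JVM and the driver jar, and all values are placeholders).

    import jaydebeapi

    conn = jaydebeapi.connect(
        "<driver.class.Name>",         # Driver Class Name input
        "<connection-uri>",            # Connection URI input
        ["<username>", "<password>"],
        "/path/to/<driver>.jar",       # jar for the driver named in Driver Name
    )
    cursor = conn.cursor()
    cursor.execute("SELECT 1")         # use a probe query valid for your database
    print(cursor.fetchall())
    conn.close()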
Cloud SQL for Postgres
  • Database Name: Provide the database name from which you want to fetch the entities.
  • Auth Type: Select the authentication type: Password or IAM.
Business Intelligence Power BI No inputs are required.
Amazon QuickSight
  • AWS Account Id: Provide the AWS account ID.
  • Region: Refers to the physical location where the data centers are grouped.
  • Access Key: Provide the access key used to access Amazon QuickSight.
  • Secret Key: Provide the secret key used to access Amazon QuickSight.
  • Principal Username: Provide the username (see the sketch below).
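For illustration, the boto3 sketch below (assumed tooling; placeholder values) exercises the account ID, region, and key inputs by listing dashboards.

    import boto3

    quicksight = boto3.client(
        "quicksight",
        region_name="<region>",                # Region input
        aws_access_key_id="<access-key>",      # Access Key input
        aws_secret_access_key="<secret-key>",  # Secret Key input
    )
    # List dashboards in the account named in the AWS Account Id input.
    response = quicksight.list_dashboards(AwsAccountId="<aws-account-id>")
    for dashboard in response.get("DashboardSummaryList", []):
        print(dashboard["Name"])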
Looker
  • Looker EndPoint: Provide the URL to connect with the required Looker endpoint.
  • Client ID: Provide the client ID to access the Looker endpoint.
  • Client Secret: Provide the client secret to access the Looker endpoint (see the sketch below).
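For illustration, the client ID and client secret can be exchanged for an API token against the Looker endpoint; the sketch below uses the requests package, and the endpoint URL and credentials are placeholders.

    import requests

    # POST the Client ID and Client Secret to Looker's login endpoint to
    # obtain a short-lived API token.
    response = requests.post(
        "https://<looker-endpoint>/api/4.0/login",  # Looker EndPoint input
        data={"client_id": "<client-id>", "client_secret": "<client-secret>"},
        timeout=30,
    )
    response.raise_for_status()
    print(response.json()["access_token"])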
Version Control System Git
  • Endpoint: Provide the URL to connect with the required Git repository.
  • Client ID: Provide the client ID to access the endpoint.
  • Client Secret: Provide the client secret to access the endpoint.
  3. If the created repository is Teradata, in Database Name, enter the database name from which you want to fetch the entities.
  4. In Username and Password, provide a valid username and password to connect to the required database.
  5. In Connection Info, provide connection details, such as the input sources (for example, offline query logs or live query logs).
  6. Click Test to verify whether you can connect to the data source. When the tool connects to the schema/database successfully, a success icon is displayed; if not, an error icon is displayed.
  7. To facilitate importing and synchronizing the referred schema/database, turn on the Import Data Source toggle.
  8. In Tags and Description, provide tags and a description, respectively.
  9. In Email, provide your email address to receive system-generated notifications about updates made to the data source.
  10. Click the create button at the top right of the Data Source page to create the data source. When the data source is created successfully, the system displays a snackbar notification confirming the success.

To learn more, contact our support team or write to: info@leaplogic.io
