Deployment
This section describes, step by step, how to deploy LeapLogic 4.4.
The LeapLogic 4.4 release is shipped as a TAR package. Follow these steps to extract the package.
- Copy the LeapLogic 4.4 package, idw-{version}.tar.gz, to a directory on the edge node.
- Extract the package from the archive using the following command:
tar -xzf idw-{version}.tar.gz
A directory with the name idw-{version} is created.
- Change the current directory to the extracted directory using the following command:
cd idw-{version}
This directory is referred to as IDW_HOME. Ensure that the IDW_HOME directory is owned by the user deploying the LeapLogic 4.4 release.
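The extraction and ownership check described above can be sketched as two small shell helpers. This is only an illustration: the function names `extract_bundle` and `owned_by_current_user` are hypothetical and are not part of the LeapLogic bundle, and the ownership test assumes that the deploying user is the one running the script.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with LeapLogic.

# Extract a bundle archive into a target directory and print the name
# of the top-level directory found in the archive.
extract_bundle() {
    archive="$1"
    target="$2"
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target" || return 1
    # First path component of the first archive entry.
    tar -tzf "$archive" | head -n 1 | cut -d/ -f1
}

# Succeed only if the directory exists and is owned by the user
# running the script (-O is supported by bash and dash).
owned_by_current_user() {
    dir="$1"
    [ -d "$dir" ] && [ -O "$dir" ]
}
```

A possible use is to capture the printed directory name, export it as IDW_HOME, and stop the deployment if `owned_by_current_user "$IDW_HOME"` fails.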
The directory structure of IDW_HOME is as given below.
idw-{version}/
    application/
        assessment/
        commons/
        governance/
        license-validation/
        metadata/
        turin-pipeline/
        validation/
        web-ui-2.0/
        workload-migration/
    bin/
    conf/
    lib/
        connectors/
        hadoop_distro_libs/
        jars/
    platform/
        azcopy
        hadoop/
        hive/
        postgres/ (or “mysql”, “rds/”, as applicable)
        power bi/
        python/
        sasCommonFunctionDependency
        sbt/
        scala/
        spark/
Provide External Libraries
The IDW_HOME directory contains a lib folder. You need to provide external libraries for the LeapLogic services. The libraries are categorized as:
- Database Connectors: List of source databases (driver JARs for Teradata, Netezza, Oracle, etc.)
- Hadoop Distribution JARs: Libraries for Hadoop ecosystem components (Hadoop / Hive / Spark etc.)
LeapLogic and metadata components also use connector JARs. The directory location is “IDW_HOME/lib/connectors/”.
Before deployment, copy the required JARs into the respective directories or create links to them. For the Hadoop distribution JARs, the configure.sh script creates the symbolic links automatically when it is executed.
For on-premises deployments, copying the connector JARs is also automated. Execute the connectors-jars.sh script before the configure.sh script; it copies all the connector JARs in the deployment bundle to the “leaplogic_bundle_name/lib/connectors” location. However, for deployment on cloud (kept on AWS S3) or for a client (kept on Expanse), this step must still be performed manually.
The required JAR list is provided in the last sub-section of this section, Configure External Libraries.
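For the manual case, creating links for connector JARs could look like the sketch below. The helper name `link_connector_jars` and the example driver directory are assumptions made for illustration; only the IDW_HOME/lib/connectors/ destination comes from this guide.

```shell
#!/bin/sh
# Hypothetical helper: symlink every *.jar from a source directory into
# a connectors directory and print the number of links created.
# Pass an absolute source path so that the links resolve correctly.
link_connector_jars() {
    src="$1"
    dest="$2"
    count=0
    mkdir -p "$dest" || return 1
    for jar in "$src"/*.jar; do
        [ -e "$jar" ] || continue   # no match: the glob stays literal
        ln -sf "$jar" "$dest/" || return 1
        count=$((count + 1))
    done
    echo "$count"
}
```

For example, `link_connector_jars /opt/drivers "${IDW_HOME}/lib/connectors"` would link the drivers kept in a hypothetical /opt/drivers directory.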
Provide Environment Details
The IDW_HOME/bin folder contains a file named idw-env.sh. This file holds the environment-related configuration, such as JAVA_HOME, IDW_HOST, ports, and APP_USERNAME.
You need to configure the environment settings in the idw-env.sh file. The next few sections describe the properties to be configured in this file.
a) Node Configuration
It is recommended to deploy LeapLogic on an edge node. You need to specify the node settings like JAVA_HOME, hostname, database password and the ports configuration for LeapLogic services.
LeapLogic services use default ports from 13000 to 13300. You may edit the port configurations in idw-env.sh file as per your network policies.
To ensure security, it is recommended to specify encrypted passwords. Refer to Password Encryption Utility for more details.
The variables, other than port settings, that need to be configured in the idw-env.sh file for LeapLogic services are as follows.

| Property Name | Description |
| --- | --- |
| JAVA_HOME | The Java JDK 17 installation directory on the node |
| IDW_HOST | FQDN for the node |
| IDW_INTERNAL_HOST | Internal IP of the machine. Default value is fetched using `hostname -f` |

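Before running the later deployment scripts, it may help to confirm that the node settings are actually populated. The following is a minimal sketch, assuming idw-env.sh has already been sourced into the current shell; `check_required_vars` is a hypothetical helper, not a LeapLogic utility.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report which of the named variables
# are unset or empty. Returns non-zero if any of them is missing.
check_required_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_required_vars JAVA_HOME IDW_HOST IDW_INTERNAL_HOST`, run after `. "${IDW_HOME}/bin/idw-env.sh"`.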
b) Hadoop Distribution Details
LeapLogic services consume the underlying Hadoop environment and ecosystem components like Hive, Sqoop, etc. If a Hadoop cluster is available, you need to provide basic information about it. If not, set the IS_HADOOP_CLUSTER_PRESENT property to “false” in the idw-env.sh file and skip the rest of this section.
The common variables that need to be configured in the idw-env.sh file for Hadoop are as follows.

| Property Name | Description |
| --- | --- |
| JDK_VERSION_HADOOP_DISTRO | The version of JDK 1.8 used by the underlying Hadoop cluster |
| HADOOP_DISTRIBUTION | The underlying Hadoop distribution: CDH or EMR |
| FS_DEFAULT_NAME | Hadoop HDFS URI. A Hadoop URI (uniform resource identifier) consists of a scheme, an authority, and a path; together, these three parameters determine the fully qualified name of a directory or file. The URI format is `scheme://authority/path`. |
| HIVE_METASTORE_URI | Hive Metastore Service URI. The property hive.metastore.uris is a comma-separated list of metastore URIs on which a metastore service is running. |
| YARN_RESOURCE_MANAGER_HOST | Resource Manager FQDN |
| HADOOP_HOME | Directory on the edge node for the Hadoop installation. Set its value as per the cluster configuration. Its default value is `/opt/cloudera/parcels/CDH/lib/hadoop`. For example: `export HADOOP_HOME="/opt/cloudera/parcels/CDH/lib/hadoop";` `export HIVE_CLIENT_LIB_DIR="/opt/cloudera/parcels/CDH/lib/hive/lib";` |
| HADOOP_DISTRO_VERSION | Set the distribution version in this property, for example, 6.2. It appears in the file as `export HADOOP_DISTRO_VERSION=""` |
| HIVE_CLIENT_LIB_DIR | The directory on the edge node for Hive libraries |
| SPARK_YARN_HOME | The directory on the edge node for the Spark2 client home |
| WM_QLV_CLASSPATH_CDH | Path of the Hadoop and Spark JARs, set to enable the LeapLogic Query Validation feature. For example: `WM_QLV_CLASSPATH_CDH="${HADOOP_HOME}/client/*:${IDW_PLATFORM_DIR}/spark/jars/*:${IDW_PLATFORM_DIR}/hive/conf/hive-site.xml";` Note that `${HADOOP_HOME}/client/*` is available only in CDH 6.x and not in 7.x. |
| TURIN_DISTRO_CLASSPATH | Path of the Hadoop and Hive JARs, set to enable the Turin pipeline executor. For example: `${HIVE_CLIENT_LIB_DIR}/*:${HADOOP_HOME}/client/*` |
| HDFS_TEMP_LOCATION | The HDFS location where temporary files are placed. For example: `HDFS_TEMP_LOCATION=/tmp`. Note: Make sure that the end user has read and write permission on this location. |
| HDFS_UDFBINARY_LOCATION | The HDFS location where UDF JARs are placed. For example: `HDFS_UDFBINARY_LOCATION=/tmp/lib/udf-jars/wmg`. Note: Make sure that the end user has read and write permission on this location. |
| CLUSTER_MAPRED_HOME | The location of the MapReduce library. For example: `export CLUSTER_MAPRED_HOME="/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce";` |
| ENABLE_SAVE_TRANSLATED_QUERY_TO_CACHE | By default, this property remains false for on-premises deployments. Set it to true for AWS deployments. This is particularly useful for LeapLogic Runtime. Default: `export ENABLE_SAVE_TRANSLATED_QUERY_TO_CACHE="false";` |

The idw-env.sh file also contains a few properties related to YARN Resource Manager’s port for Hadoop services. You need to ensure that valid ports are configured as per the underlying Hadoop cluster.
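The scheme://authority/path structure described for FS_DEFAULT_NAME can be checked with plain shell parameter expansion. The sketch below is illustrative only: `split_uri` and the result variables `uri_scheme`, `uri_authority`, and `uri_path` are hypothetical names, and the host names and ports in the usage note are placeholders rather than values from a real cluster.

```shell
#!/bin/sh
# Hypothetical helper: split a URI of the form scheme://authority/path.
# Sets uri_scheme, uri_authority and uri_path; returns 1 if the value
# does not contain "://".
split_uri() {
    uri="$1"
    case "$uri" in
        *://*) ;;
        *) return 1 ;;
    esac
    uri_scheme="${uri%%://*}"      # text before the first "://"
    rest="${uri#*://}"             # text after the first "://"
    uri_authority="${rest%%/*}"    # up to the first "/"
    case "$rest" in
        */*) uri_path="/${rest#*/}" ;;
        *)   uri_path="" ;;
    esac
    return 0
}
```

For example, `split_uri "hdfs://namenode.example.com:8020/user/idw"` yields the scheme hdfs, the authority namenode.example.com:8020, and the path /user/idw. The same helper can be applied to each comma-separated entry of HIVE_METASTORE_URI.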
c) License Details
LeapLogic’s License module deducts quota in terms of queries, blocks, and scripts. If the customer consents to letting the LeapLogic Centralized Licensing server capture the quota deduction information at a pre-configured centralized location, the server stores the information about the consumed and remaining quota.
The variables that need to be configured in the idw-env.sh file for License and quota deduction are as follows.

| Property Name | Description |
| --- | --- |
| LICENSE_MANAGER_CONSENT_FLAG | Whether the customer has consented to storing the quota information in the centralized licensing server. This can be set as true or false. |
| LICENSE_MANAGER_ENDPOINT_URL | The URL of the centralized licensing server where the quota information will be saved for the customer. Its value is in the format: VALUE_LICENSE_MANAGER_ENDPOINT_URL |
| LICENSE_MANAGER_HTTPS_FLAG | If the deployment uses HTTPS, set LICENSE_MANAGER_HTTPS_FLAG to true |
| LICENSE_PRIVATE_KEY_FILEPATH | The path where the license private key file needs to be placed. The key is necessary for decryption. Its value is in the format: VALUE_LICENSE_PRIVATE_KEY_FILEPATH |

d) Kerberos Details
This configuration applies only when a Hadoop cluster is used. It is recommended, and is optional only if the underlying cluster is non-kerberised.
LeapLogic services can also interact with a Kerberos-enabled Hadoop environment. For such an environment, you need to configure Kerberos during LeapLogic deployment.
Set the value of the IS_KERBERISED variable to true in the idw-env.sh file.
The variables that need to be configured in the idw-env.sh file for Kerberos are as follows.

| Property Name | Description |
| --- | --- |
| IS_KERBERISED | Whether the cluster is kerberised. This can be set as true or false. |
| HIVE_PRINCIPAL | Hive principal for accessing the Hive services. |
| HDFS_PRINCIPAL | HDFS principal for accessing the HDFS services. |
| KERBERISED_PRINCIPAL | Kerberos user principal for authentication. |
| KERBERISED_USER_KEYTAB_FILE | Kerberos user keytab file. |
| KERBERISE_TICKET_CACHE | Kerberos environment ticket cache file location. |
| KERBERISED_REALM | Kerberos realm for the cluster. It is the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service. A realm name is often, but not always, the upper-case version of the name of the DNS domain over which it presides. |
| KINIT_COMMAND | Kerberos kinit command for user authentication. The kinit command obtains or renews a Kerberos ticket. |
| HIVE_METASTORE_PRINCIPAL | Kerberos principal for Hive Metastore. |

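As a rough illustration of how the realm and kinit settings relate, the sketch below derives a candidate realm from a DNS domain and prints a keytab-based kinit command line. Both function names are hypothetical, the upper-casing rule is only the common convention mentioned above, and the exact value LeapLogic expects in KINIT_COMMAND is not specified in this guide, so treat the output as a starting point.

```shell
#!/bin/sh
# Hypothetical helper: derive a candidate realm name from a DNS domain
# by upper-casing it. The real realm may differ; confirm it with the
# cluster administrator.
realm_from_domain() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Hypothetical helper: print (do not run) a kinit command line that
# authenticates a principal from a keytab file.
kinit_command() {
    keytab="$1"
    principal="$2"
    printf 'kinit -kt %s %s\n' "$keytab" "$principal"
}
```

For example, `kinit_command /etc/security/keytabs/idw.keytab "idw@$(realm_from_domain example.com)"` prints a command for a placeholder keytab path and principal; the keytab location and principal name are assumptions.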
e) SMTP Settings (recommended; optional only if access to an SMTP server is not available)
LeapLogic uses an SMTP server to send notification emails to the client. SMTP settings are required to enable notifications related to features like user registration, assessment completion and reports, etc.
The variables that need to be configured in the idw-env.sh file for SMTP are as follows.

| Property Name | Description |
| --- | --- |
| SMTP_MAIL_HOST | SMTP server hostname |
| SMTP_MAIL_PORT | SMTP server connection port |
| SMTP_LOGIN_ID | SMTP login ID of the mailing account |
| SMTP_LOGIN_PASSWORD | SMTP password of the mailing account |
| MONIT_EMAIL_TO_NOTIFY | The email address to which an automated notification is sent if any service is down |

f) Other Configurations
There are several other configurations in idw-env.sh file. The configurations include the host/port/directory settings for LeapLogic services. It is recommended to validate the port configurations for the services and ensure that the ports are available.
Following is the list of some advanced settings in the file that you may wish to enable or disable:

| Property Name | Description |
| --- | --- |
| CONFIGURE_HADOOP_LIBS | Configures Hadoop distribution JARs using shell scripts. It uses `idw-common-libs-${HADOOP_DISTRIBUTION}.txt` in the IDW_HOME/bin/ folder to create soft links for the required Hadoop distribution JARs. Recommended value: true. |
| IS_HADOOP_CLUSTER_PRESENT | This is required when LeapLogic is deployed on machines that are not a part of the Hadoop cluster. When this is set as “false”, the symlinks in translation_lib under IDW_HOME/lib will not be created. In case of single-node deployment, the recommended value for this property is false. |
| ASSESSMENT_MODE | Select local for single-node deployment, or select cluster when files need to be consumed through a cluster, for example from HDFS. |
| VALIDATION_MODE | Configures the mode of validation as local or cluster. By default, this is set to “local”. Change it to “cluster” when you want to perform cluster-level validation. For example: `export VALIDATION_MODE="local"; # local or cluster` |
| ENABLE_MULTITENANCY | The LeapLogic application supports organization-level multi-tenancy. Application users for different organizations will have restricted access to the application. Recommended value: true. |
| APP_USERNAME | Configures your default username used for login. |
| APP_PASSWORD | Configures your default password used for login. |
| DATABASE_PASSWORD | Configures the database password used for signing into your PostgreSQL database. |
| PLATFORM_COMPONENT_ENABLED_FLAG | By default, this flag is set to false. Set it to true when platform components are available in the bundle. |

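Because ASSESSMENT_MODE and VALIDATION_MODE each accept only local or cluster, a typo in idw-env.sh is easy to catch early. The following is a minimal sketch; `is_valid_mode` is a hypothetical helper and is not part of the LeapLogic scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the two mode values named in
# this guide, "local" and "cluster".
is_valid_mode() {
    case "$1" in
        local|cluster) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_valid_mode "$VALIDATION_MODE" || echo "VALIDATION_MODE must be local or cluster"` flags a value such as clustered.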
Execute Configure Script
Once the user configurations are done in idw-env.sh file, you need to execute the configure.sh script in the IDW_HOME/bin folder. It updates the configuration files for LeapLogic services as per the environment settings provided in idw-env.sh file.
${IDW_HOME}/bin/configure.sh
By default, for any fresh deployment, the APP_USERNAME and APP_PASSWORD strings are empty in the idw-env.sh file. When the configure.sh script is executed, it prompts for a username and password of your choice. Enter the username and password that you want to use as your login credentials for the application. The script stores the entered password in the idw-env.sh file in an encrypted format.
To enable logging, use the following command.
${IDW_HOME}/bin/execute ${IDW_HOME}/bin/configure.sh
The script prints the final status with the exit status of the script – “Exiting LeapLogic 4.4 script with exit status: VALUE_STATUS”.
- If the exit status is 0, the script completed successfully.
- If the exit status is any value other than 0, resolve the reported issues and re-execute the script.
Before moving to the next step, you need to ensure the following:
- The exit status of the script is 0.
- Any temporary files in the IDW_HOME/application/metadata/fileCache/ directory are deleted.
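The two checks above can be scripted. The sketch below is only an illustration under stated assumptions: `run_step` and `dir_is_empty` are hypothetical helpers, and the status line they print is not the message produced by the LeapLogic scripts themselves.

```shell
#!/bin/sh
# Hypothetical helper: run a command, print its exit status, and
# return that same status to the caller.
run_step() {
    status=0
    "$@" || status=$?
    echo "exit status: $status"
    return "$status"
}

# Hypothetical helper: succeed if the directory is absent or has no
# entries (hidden files count as entries).
dir_is_empty() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        return 0
    fi
    [ -z "$(ls -A "$dir")" ]
}
```

A possible sequence is `run_step "${IDW_HOME}/bin/configure.sh" && dir_is_empty "${IDW_HOME}/application/metadata/fileCache"`, which proceeds to the next step only when both conditions hold.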
Execute Initialization Script
Once the LeapLogic services are configured successfully, you need to execute the init.sh script in the IDW_HOME/bin folder. It initializes the database for the LeapLogic components. For higher security, when you execute the init.sh script for a Postgres database, it asks for a password. Use the database password that was configured in the idw-env.sh file. Use the following command.
${IDW_HOME}/bin/init.sh
As with configure.sh, the init.sh script should also return an exit status of 0.
UDF JAR File
LeapLogic contains a JAR file with Hive’s user defined functions.
LeapLogic UDF JAR is placed at the path: IDW_HOME/application/workload-migration/udf-binary/wmg-bdw-udfbinary.jar
To execute the transformed Hive queries containing these UDFs, you need to copy the UDF JAR file to the HDFS directory specified by HDFS_UDFBINARY_LOCATION. You must have read access on the JAR file. Alternatively, when the idw-deployment.sh script is executed, it automatically copies the UDF JARs to the required location.
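A manual copy of the UDF JAR could be sketched with the standard `hdfs dfs` commands as follows. The function `copy_udf_jar` and the `HDFS_CMD` override are hypothetical and exist only so that the commands can be previewed without a cluster; the sketch assumes the `hdfs` client is on the PATH and that HDFS_UDFBINARY_LOCATION is set as in idw-env.sh.

```shell
#!/bin/sh
# Hypothetical helper: create the target HDFS directory if needed and
# upload the JAR, overwriting an existing copy.
# Set HDFS_CMD=echo to print the commands instead of running them.
copy_udf_jar() {
    jar="$1"
    dest="$2"
    hdfs_cmd="${HDFS_CMD:-hdfs}"
    "$hdfs_cmd" dfs -mkdir -p "$dest" || return 1
    "$hdfs_cmd" dfs -put -f "$jar" "$dest/"
}
```

For example: `copy_udf_jar "${IDW_HOME}/application/workload-migration/udf-binary/wmg-bdw-udfbinary.jar" "${HDFS_UDFBINARY_LOCATION}"`.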
Deployment on EMR cluster
Once you set the HADOOP_DISTRIBUTION property to EMR, the deployment script takes care of the deployment activities automatically. However, you need to configure a few properties and perform a few activities manually, as stated below.
- Configure JAVA_HOME as /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.312.b07-0.66.amzn1.x86_64 on all the nodes.
- Change spark.yarn.jars=/usr/lib/spark/jars/*,/usr/lib/* in the file wm-cdp-spark.conf.template located at the path: /home/hadoop/idw-3.2-release-May-10/application/workload-migration/conf/utils/
- Create a new directory named jars at the location /usr/lib and copy all the dependent JARs into it. The dependent JARs can be found in the following locations:
- /usr/lib/hadoop
- /usr/lib/hadoop/lib
- /usr/lib/hadoop-hdfs
- /usr/lib/hadoop-hdfs/lib
- /usr/lib/hadoop-mapreduce
- /usr/lib/hadoop-yarn
- /usr/lib/hadoop-yarn/lib
- /usr/lib/sqoop
- /usr/lib/sqoop/lib
- /usr/lib/hive/lib
- /usr/lib/hbase
- /usr/lib/hbase/lib
- Comment out the following line: spark.yarn.jars=/opt/cloudera/parcels/CDH/lib/spark/jars/*,/opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-*.jar
- After running configure.sh, check whether the above-mentioned value is present in the file: wm-cdh-spark.conf located on the path /home/hadoop/idw-3.2-release-May-10/application/workload-migration/conf/utils/
- Copy the connector jars from this location /home/hadoop/idw-3.2-release-May-10/lib/connectors/ to /usr/lib/sqoop/lib/ location
- Create a new folder named redshift on the path /home/hadoop/idw-3.2-release-May-10/lib/connectors/ and move RedshiftJDBC42-1.1.17.1017.jar into this folder.
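The step that gathers dependent JARs into /usr/lib/jars can be sketched as a loop over the source directories. This is only an illustration: `collect_jars` is a hypothetical helper, it copies only files directly inside each listed directory, and writing under /usr/lib normally requires root privileges.

```shell
#!/bin/sh
# Hypothetical helper: copy every *.jar found directly inside the given
# source directories into dest and print the number of files copied.
# Directories that are missing or contain no JARs are skipped; a JAR
# with the same name in a later directory overwrites the earlier copy.
collect_jars() {
    dest="$1"
    shift
    copied=0
    mkdir -p "$dest" || return 1
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            [ -f "$jar" ] || continue
            cp "$jar" "$dest/" || return 1
            copied=$((copied + 1))
        done
    done
    echo "$copied"
}
```

For example: `collect_jars /usr/lib/jars /usr/lib/hadoop /usr/lib/hadoop/lib /usr/lib/hive/lib`, extended with the remaining directories from the list above.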
Furthermore, running validation requires a few Python plugins. Run the following commands to install them.
sudo yum -y install gcc python-setuptools python-devel postgresql-devel
sudo pip install --upgrade pip
sudo pip install psycopg2
Start Services
Once configure.sh and init.sh scripts are executed successfully, you can start/stop all the LeapLogic services using the start-all.sh and stop-all.sh scripts in the IDW_HOME/bin directory.
Execute the start-all.sh script to start all required LeapLogic services using a single command:
${IDW_HOME}/bin/start-all.sh
This command may take some time to complete.
Configure External Libraries
As described in Step 2 (Provide External Libraries) of the deployment process, LeapLogic services require external JARs to be either copied or linked before deployment.
a) Connector JARs
The connector JARs may include the JDBC connector JARs for Postgres, MySQL, Teradata, Netezza, Oracle, SQL Server, etc. You need to provide the JARs in the IDW_HOME/lib/connectors/ directory.
b) Hadoop Distribution JARs
LeapLogic services consume Hadoop APIs and require JARs for Hadoop ecosystem components. You need to provide these JARs from the environment in the IDW_HOME/lib/hadoop_distro_libs folder. The list of required JARs is available in the IDW_HOME/bin folder, in the file idw-common-libs-CDH.txt.
Also note that the hive.auto.convert.join property must be disabled (set to false) from the Cloudera Admin console.
If the version of the Hadoop distribution is different, you can copy the required JARs based on the distribution version. Alternatively, when the configure.sh script is executed, it automatically creates the symbolic links.
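After the JARs are copied or linked, a quick comparison against the list file can show what is still missing. The sketch below makes an assumption that this guide does not confirm: that idw-common-libs-CDH.txt contains one JAR file name per line. If the real file uses another layout (for example, full paths or patterns), the loop must be adapted. `report_missing_jars` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print each name from list_file that is absent
# from lib_dir. ASSUMPTION: one file name per line; blank lines and
# lines starting with '#' are skipped.
report_missing_jars() {
    list_file="$1"
    lib_dir="$2"
    while IFS= read -r name || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;
        esac
        [ -e "$lib_dir/$name" ] || echo "$name"
    done < "$list_file"
}
```

For example: `report_missing_jars "${IDW_HOME}/bin/idw-common-libs-CDH.txt" "${IDW_HOME}/lib/hadoop_distro_libs"`; empty output means that every listed name was found.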