Configuring Query Validation

This stage validates the transformed queries using the query validation engine to ensure successful query transformation. The outputs generated by the source and target (transformed) queries are compared to determine the accuracy of the transformed queries.

In query validation, there are two scenarios depending on the availability of test data.

  • When test data is available: In this case, you have access to a real enterprise dataset (the client has shared a test dataset or allowed access to a database containing the source tables) to validate the source and target queries. The source and target (transformed) queries are executed to generate outputs, and those outputs are compared to verify the accuracy of the transformed queries.
  • When test data is not available: In this case, real enterprise data is not available for testing, so the validation engine auto-generates test data based on complex query conditions to validate the transformed queries. The system inserts the auto-generated values into the source and target tables, executes both the source and target (transformed) queries, and compares the outputs to verify the accuracy of the transformed queries.
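Conceptually, the data-available scenario reduces to running both queries and comparing their result sets. The following minimal Python sketch illustrates the idea only; it is not LeapLogic's implementation, and the in-memory SQLite connections, the orders table, and the queries are hypothetical stand-ins for the real source and target systems.

    import sqlite3

    source = sqlite3.connect(":memory:")  # stands in for the source database
    target = sqlite3.connect(":memory:")  # stands in for the target database

    # Hypothetical shared test data on both systems.
    for conn in (source, target):
        conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.5), (2, 20.0)])

    source_query = "SELECT id, amount FROM orders ORDER BY id"  # original query
    target_query = "SELECT id, amount FROM orders ORDER BY id"  # transformed query

    source_rows = source.execute(source_query).fetchall()
    target_rows = target.execute(target_query).fetchall()

    # The transformation is considered accurate when both outputs match.
    print("match" if source_rows == target_rows else "mismatch")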

Validation process

Query validation consists of the following steps when test data is not available:

  • Generate synthetic test data covering various scenarios and edge cases using the validation engine (LeapLogic Engine or Embark).
  • Create and populate tables on the databases, inserting the generated synthetic test data into the source and target tables.
  • Execute the source and target (transformed) queries.
  • Compare the outputs generated after execution to validate the results.
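A hedged Python sketch of these four steps follows. The real engine derives its scenarios from the query's predicates and is far more sophisticated; the generate_rows function, the sales table, and the queries below are illustrative assumptions, not LeapLogic APIs.

    import random
    import sqlite3

    def generate_rows(n):
        # Step 1: synthesize rows, deliberately including edge cases (NULLs, zeros).
        edge_cases = [(0, None), (-1, 0.0)]
        randoms = [(random.randint(1, 100), round(random.uniform(0, 10000), 2))
                   for _ in range(n - len(edge_cases))]
        return edge_cases + randoms

    source, target = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    rows = generate_rows(30)

    # Step 2: create and populate identical tables on both databases.
    for conn in (source, target):
        conn.execute("CREATE TABLE sales (qty INTEGER, price REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Steps 3 and 4: execute the original and transformed queries, then compare.
    src_out = source.execute("SELECT SUM(qty * price) FROM sales").fetchone()
    tgt_out = target.execute("SELECT SUM(qty * price) FROM sales").fetchone()
    print("match" if src_out == tgt_out else "mismatch")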

To configure the Query Validation stage, follow the steps below:

  1. In Validation type, select Query to validate the transformed queries.
  2. Choose Data Availability as:
    • Yes: You have a real enterprise dataset to validate the transformed queries. See Configuring Query Validation (When Data is Available) for configuration details.
    • No: A real enterprise dataset is not available. In this case, LeapLogic auto-generates a sample dataset to validate the transformed queries. See Configuring Query Validation (When Data is Not Available) for configuration details.

Configuring Query Validation (When Data is Available)

To configure the query validation:

  1. In Source Type, select the source type such as Oracle, SQL Server, Teradata, or Netezza.
  2. In Source DDL File, upload the DDL file to define the source schema.
  3. Choose the Target Metadata as:
    1. None: Does not execute the source DDL. If any target table is missing, it is created during the execution of its respective query.
    2. Upload Target DDL: Executes the target DDL scripts to create the target schema. In Target DDL File, upload the target DDL files that define the target schema.
    3. Transform Source DDL: Transforms the source DDL to the target-equivalent format and executes it on the target.
  4. In Transformation Output, upload the Transformation stage output file to map the original and transformed queries. An integrated pipeline does not require the Transformation Output files to be uploaded; as part of the pipeline execution, the system automatically fetches the data from the Transformation stage.
  5. In Data Source Mapping, map the databases in the script to the source and target databases. To do so:
    1. In the Source/Target tab, two cards are available:
      1. Databases in Scripts
      2. Source Databases/Target Databases
    2. The Databases in Scripts card contains all the databases available in the selected script. The database(s) displayed in the Databases in Scripts card must be mapped to the Source Databases/Target Databases card.
    3. To add a dummy database to the Source Databases/Target Databases card, follow the steps below:
      1. Click + on the Source Databases/Target Databases card.
      2. Choose the repository.
      3. Select the data source.
      4. Click to save the source/target data source.
    4. To map the database(s) in the Databases in Scripts card to the Source Databases/Target Databases card, do either of the following:
      1. Drag and drop a database in the Databases in Scripts card onto a database in the Source Databases/Target Databases card.
      2. Select the checkbox of the preferred database in the Databases in Scripts card and then click the preferred database in the Source Databases/Target Databases card.
  6. Enter the IAM Role.
  7. In Storage Data Source, upload the storage data source.
  8. Click Source Configuration.
  9. In Query Parser, select the parser (PyHive, PySpark, or Hive) based on the uploaded transformation output. If you uploaded:
    1. .sql files in Transformation Output, choose Hive as the Query Parser. The parser parses the .sql files and fetches the original and target queries.
    2. .py files with Spark queries in Transformation Output, choose PySpark as the Query Parser. The parser parses the .py files and fetches the original and target queries.
    3. .py files with Hive queries in Transformation Output, choose PyHive as the Query Parser. The parser parses the .py files and fetches the original and target queries.
  10. In Functions, select Cell-by-Cell or the required aggregate functions to perform validation on the dataset (a sketch contrasting the two modes follows this list).
  11. In Driver Input, you can define the running sequence of the transformation output scripts.
  12. In Unique ID, provide the unique ID that is appended to the name of the original table copy on which the validation is performed.
  13. In Source Table Suffix, provide the suffix used for the source table copy in the comparison.
  14. In Target Table Suffix, provide the suffix used for the target table copy in the comparison.
  15. Select the Execution Level: Query by Query validates at the query level, while All Scripts validates all the scripts in one go.
  16. In On Mismatch, select whether to Continue or Stop the validation whenever the system encounters a mismatch.
  17. In Pipeline Execution, select Resume or Start Over to resume or restart the validation if the selected On Mismatch option is Stop.
  18. In Resultant Tables, select None, Cleanup, or Overwrite Original And Cleanup, where:
    1. None: Retains both the original and intermediate tables as-is.
    2. Cleanup: Cleans up all the intermediate tables.
    3. Overwrite Original And Cleanup: Overwrites the original (source) table with the new intermediate table and cleans up the other leftover tables.
  19. In Custom Properties, enter the Key and Value. For parameterized queries, any value to be replaced at execution/run time can be passed here as a key-value pair.
  20. In On Screen, select a value on the sliding bar to decide how many records are displayed on the UI.
  21. In Download (CSV), choose a value on the sliding bar to set the maximum number of mismatched records included in downloaded reports.
  22. Click Save to save the Validation stage. The system displays a snackbar pop-up with a success message when the Validation stage is saved.
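To make the Functions option (step 10) concrete, here is a hedged Python sketch contrasting Cell-by-Cell validation with aggregate-function validation. The row data and the particular aggregates are illustrative assumptions, not LeapLogic's internal code; both result sets are assumed to be fetched already.

    # Hypothetical source and target query outputs.
    source_rows = [(1, "A", 10.0), (2, "B", 20.0)]
    target_rows = [(1, "A", 10.0), (2, "B", 20.5)]

    # Cell-by-Cell: every cell of every row must match exactly.
    mismatches = [
        (row_idx, col_idx)
        for row_idx, (s_row, t_row) in enumerate(zip(source_rows, target_rows))
        for col_idx, (s, t) in enumerate(zip(s_row, t_row))
        if s != t
    ]
    print("cell mismatches:", mismatches)  # [(1, 2)]

    # Aggregate functions: compare per-column summaries instead of raw cells.
    def aggregates(rows, col):
        values = [row[col] for row in rows]
        return {"count": len(values), "sum": sum(values),
                "min": min(values), "max": max(values)}

    print(aggregates(source_rows, 2) == aggregates(target_rows, 2))  # False

Cell-by-Cell is the strictest check; aggregate functions trade precision for speed on large result sets.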


Configuring Query Validation (When Data is Not Available)

  1. In Source Type, select the source type such as Vertica, Oracle, SQL Server, Teradata, or Netezza.
  2. In Source DDL File, upload the DDL file via:
    1. Browse Files: Select the file from the local system.
    2. Select From Data Source: Select the file from a data source.
  3. Choose the Target Metadata as:
    1. None: Does not execute the source DDL. Missing target tables are created during the execution of their respective queries.
    2. Upload Target DDL: Executes the target DDL scripts to create the target schema. In Target DDL File, upload the target DDL files.
    3. Transform Source DDL: Transforms the source DDL to the target-equivalent format and executes it on the target.
  4. In Transformation Output, upload the Transformation stage output file to map the original and transformed queries. An integrated pipeline does not require the Transformation Output files to be uploaded; as part of the pipeline execution, the system automatically fetches the data from the Transformation stage.
  5. In Data Source Mapping, map the databases in the script to the source and target databases. To do so:
    1. In the Source/Target tab, two cards are available:
      1. Databases in Scripts
      2. Source Databases/Target Databases
    2. The Databases in Scripts card contains all the databases available in the selected script. The database(s) displayed in the Databases in Scripts card must be mapped to the Source Databases/Target Databases card.
    3. To add a dummy database to the Source Databases/Target Databases card, follow the steps below:
      1. Click + on the Source Databases/Target Databases card.
      2. Choose the repository.
      3. Select the data source.
      4. Click to save the source/target data source.
    4. To map the database(s) in the Databases in Scripts card to the Source Databases/Target Databases card, do either of the following:
      1. Drag and drop a database in the Databases in Scripts card onto a database in the Source Databases/Target Databases card.
      2. Select the checkbox of the preferred database in the Databases in Scripts card and then click the preferred database in the Source Databases/Target Databases card.
  6. In IAM Role, assign the IAM role.
  7. In Storage Data Source, upload the storage data source.
  8. Select the Source Configuration.
  9. In Stop On, select the criteria for stopping the validation:
    1. Data Generation Issue
    2. Not Matched
    3. Error
    4. Explain Plan Error
  10. In Population Size, select a value on the sliding bar. It defines the size of the raw sample data generated for comparison by the data generator.
  11. In Number of Records, select a value on the sliding bar to choose the best data sample for comparison.

Note:

In the query validation engine, the best results are obtained when Population Size is set to 30 and the Number of Records is set to 3.
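To illustrate how the two sliders interact, here is a hedged Python sketch: a population of raw candidate rows is generated first, and a smaller number of records is then selected from it for the actual comparison run. How LeapLogic scores candidates is internal to the engine; the edge-case-counting heuristic below is purely a hypothetical stand-in.

    import random

    POPULATION_SIZE = 30    # raw sample rows produced by the data generator
    NUMBER_OF_RECORDS = 3   # rows actually kept for the comparison run

    population = [(random.randint(-5, 5), random.choice([None, 0.0, 99.9]))
                  for _ in range(POPULATION_SIZE)]

    # Hypothetical scoring: prefer rows that exercise edge cases (NULLs, zeros).
    def score(row):
        return sum(1 for value in row if value in (None, 0, 0.0))

    best = sorted(population, key=score, reverse=True)[:NUMBER_OF_RECORDS]
    print(best)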

  12. In Validation Engine, select the engine used to generate the test data. The options are:
    1. LeapLogic Engine: Auto-generates the test data using the native LeapLogic engine for validating queries against the source and target data.
    2. Embark: Uses the Embark pretrained model to auto-generate the test data for validating queries against the source and target data.
  13. Click Save to save the Validation stage. When the Validation stage is successfully saved, the system displays a snackbar pop-up with a success message.
  14. Click to execute the integrated or standalone pipeline. The pipeline listing page opens and shows your pipeline in the Running state. When execution completes successfully, the status changes to Success.
  15. Click your pipeline card to see the reports.

To view the Query Validation Stage report, visit Query Validation Report.


Note:

In the query validation report:

  • Queries with the statement types Select, Update, Insert with Select, Create with Select, and Delete are valid use cases for query validation; queries with any other statement type may fall under the ‘N/A’ category.
  • There could be ‘Not Matched’ cases due to an incorrect transformation, or due to differences in the generated data for a given function; for example, the value of the current timestamp may differ between the source and the target.
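As a worked illustration of the second bullet, the sketch below shows why a column populated by the current timestamp produces a ‘Not Matched’ result even when the transformation is correct, and one way such mismatches are commonly triaged. The tolerance-based check is a general reviewing technique under assumed data, not a LeapLogic feature.

    from datetime import datetime, timedelta

    source_row = (1, "A", datetime(2025, 1, 1, 12, 0, 0))  # produced on the source
    target_row = (1, "A", datetime(2025, 1, 1, 12, 0, 3))  # produced 3 seconds later

    print(source_row == target_row)  # False: spurious mismatch, not a logic error

    # One way to triage: compare run-time-generated columns with a tolerance
    # (or exclude them from the comparison altogether).
    tolerance = timedelta(seconds=5)
    stable_columns_match = source_row[:2] == target_row[:2]
    timestamps_close = abs(source_row[2] - target_row[2]) <= tolerance
    print(stable_columns_match and timestamps_close)  # True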

To learn more, contact our support team or write to: info@leaplogic.io
