Skip to the content
LeaplogicLeaplogic
  • Home
  • About Us
  • Contact
SIGN IN
  • Home
  • About Us
  • Contact

  • Getting Started
    • Before You Begin
    • Creating an Account
    • Logging into LeapLogic
    • Reset Password
    • Quick Tour of the Web Interface
    • LeapLogic in 15 minutes
      • Prerequisites
      • Step 1. Log into LeapLogic
      • Step 2. Create Assessment and Get Insights
      • Step 3. Create Transformation Pipeline and See Results
      • Step 4. Edit or Optimize the Transformed Code
      • Step 5: Complete the Transformation Lifecycle
  • Introduction to LeapLogic
    • Overview
    • High Level Architecture
    • Supported Legacy and Cloud Platforms
    • Key Features
  • Workload Assessment
    • Overview
    • Value Proposition
    • Creating Assessment
      • Prerequisites
      • Step 1. Provide Primary Inputs
        • Automation Coverage
      • Step 2. Add the Additional Inputs
        • Table Stat Extraction Steps
          • Teradata
          • Oracle
          • Netezza
      • Step 3. Update the Source Configuration
      • Step 4. Configure the Recommendation Settings
    • Assessment Listing
    • Understanding Insights and Recommendations
      • Volumetric Info
      • EDW
        • Oracle
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Vertica
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Snowflake
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Azure Synapse
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • SQL Server
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Teradata
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Netezza
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Google Big Query
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Redshift
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • PostgreSQL
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Duck DB
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • ClickHouse
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • Exasol
          • Highlights
          • Analysis
          • Optimization
          • Lineage
          • Recommendations
          • Downloadable Reports
        • DB2
          • Highlights
          • Analysis
          • Optimization
          • Recommendations
          • Lineage
          • Downloadable Reports
      • ETL
        • Informatica
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Ab Initio
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • DataStage
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Talend
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • SSIS
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Informatica BDM
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Oracle Data Integrator
          • Highlights
          • Analysis
          • Downloadable Reports
        • Pentaho
          • Highlights
          • Analysis
          • Downloadable Reports
        • Azure Data Factory
          • ARM Template
          • Highlights
          • Analysis
          • Downloadable Reports
        • Matillion
          • Highlights
          • Analysis
          • Downloadable Reports
        • SnapLogic
          • Highlights
          • Analysis
          • Downloadable Reports
      • Orchestration
        • AutoSys
          • Highlights
          • Analysis
          • Downloadable Reports
        • Control-M
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • SQL Server
          • Highlights
          • Analysis
      • BI
        • OBIEE
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Tableau
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • IBM Cognos
          • Highlights
          • Analysis
          • Downloadable Reports
        • MicroStrategy
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • Power BI
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • SSRS
          • Highlights
          • Analysis
          • Downloadable Reports
        • SAP BO
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
        • WebFOCUS
          • Highlights
          • Analysis
          • Downloadable Reports
      • Analytics
        • SAS
          • Highlight
          • Analysis
          • Lineage
          • Downloadable Reports
        • Alteryx
          • Highlights
          • Analysis
          • Lineage
          • Downloadable Reports
      • Integrated Assessment (EDW, ETL, Orchestration, BI)
        • Highlights
        • Analysis
        • Optimization
        • Lineage
        • Recommendations
    • Managing Assessment Reports
      • Downloading Report
      • Input Report Utility
      • View Configuration
    • Complexity Calculation Logic
    • Key Benefits
    • Ad hoc Query
  • Metadata Management
    • Overview
    • Introduction to Data Catalog
      • Managing Data Catalog
        • Building Data Catalog
        • Insights to Data Catalog
        • Managing the Repository and Data Source
      • Creating Repository (Repo)
      • Creating Data Source
    • Tag Management
    • Key benefits
  • Batch Processing using Pipeline
    • Introduction
    • Designing Pipeline
      • How to create a pipeline
        • Configuring Migration Stage
          • Schema Optimization
        • Configuring Transformation Stage
          • On-premises to Cloud
          • Cloud-to-Cloud
          • LeapLogic Express
        • Configuring Validation Stage
          • Data Validation
            • Table
            • File
            • File and Table
            • Cell-by-cell validation
          • Query Validation
            • Query Validation (When Data is Available)
            • Query Validation (When Data is Not Available)
          • Schema Validation
        • Configuring Execution Stage
        • Configuring ETL Conversion Stage
          • Ab Initio
          • Informatica
          • Informatica BDM
          • Matillion
          • DataStage
          • SSIS
          • IICS
          • Talend
          • Oracle Data Integrator
          • Pentaho
          • SnapLogic
        • Configuring Mainframe Conversion Stage
          • Cobol
          • JCL
        • Configuring Orchestration Stage
          • AutoSys
          • Control-M
        • Configuring BI Conversion Stage
          • OBIEE to Power BI
          • OBIEE to AWS QuickSight
          • Tableau to Amazon QuickSight
          • Tableau to Power BI
          • Tableau to Superset
          • Tableau to Looker
          • IBM Cognos to Power BI
        • Configuring Analytics Conversion Stage
          • SAS
          • Alteryx
        • Configuring Script Conversion Stage
    • Key Features
      • How to schedule a pipeline
      • Configuring Parameters
  • Pipeline Reports
    • Overview of Pipeline Report
    • Pipeline Listing
    • Reports and Insights
      • Migration
      • Transformation
        • On-premises to Cloud
        • Cloud-to-Cloud
        • LeapLogic Express
      • Validation
        • Data
          • File
          • Table
          • File and Table
        • Query
          • Query Validation Report (When Data is Available)
          • Query Validation Report (When Data is not Available)
        • Schema
      • Execution
      • ETL
        • Ab Initio
        • Informatica
        • Informatica BDM
        • Matillion
        • DataStage
        • SSIS
        • IICS
        • Talend
        • Oracle Data Integrator
        • Pentaho
        • SnapLogic
      • Mainframe
        • Cobol
        • JCL
      • Orchestration
        • AutoSys
        • Control-M
      • BI
        • OBIEE to Power BI
        • OBIEE to Amazon QuickSight
        • Tableau to Amazon QuickSight
        • Tableau to Power BI
        • Tableau to Superset
        • Tableau to Looker
        • IBM Cognos to Power BI
      • Analytics
        • SAS
        • Alteryx
      • Shell Script
      • Common Model
    • Automation Level Indicator
      • ETL
        • Informatica
        • Matillion
        • DataStage
        • Informatica BDM
        • SnapLogic
        • IICS
        • Ab Initio
        • SSIS
        • Talend
        • Pentaho
      • Orchestration
        • AutoSys
        • Control-M
      • EDW
      • Analytics
        • SAS
        • Alteryx
      • BI
      • Shell Script
    • Error Specifications & Troubleshooting
  • SQL Transformation
    • Overview
    • Creating and Executing the Online Notebook
      • How to Create and Execute the Notebook
      • Supported Features
    • Configuring the Notebook
      • Transformation
      • Unit Level Validation
      • Script Level Validation
    • Notebook Listing
  • Operationalization
    • Overview
      • Basic
      • Advanced
      • Cron Expression
    • Parallel Run Pipeline Listing
  • Transformation Source
    • Introduction
    • Creating Transformation Source Type
  • Governance
    • Summary of Governance - Roles and Permissions
    • User Creation
      • Creating a new User Account
    • Adding Roles and permissions
      • How to add Roles and Permissions to a new user?
    • Adding Group Accounts
    • Default Quota Limits
    • Product Usage Metrics
  • License
    • EDW
    • ETL
  • LeapLogic Desktop Version
    • Overview
    • Registration and Installation
    • Getting Started
    • Creating Assessment
      • ETL
      • DML
      • Procedure
      • Analytics
      • Hadoop
    • Reports and Insights
      • Downloadable Reports
      • Reports for Estimation
    • Logging and Troubleshooting
    • Sample Scripts
    • Desktop vs. Web Version
    • Getting Help
  • LeapLogic (Version 4.8) Deployment
    • System Requirements
    • Prerequisites
    • Deployment
      • Extracting Package
      • Placing License Key
      • Executing Deployment Script
      • Accessing LeapLogic
    • Uploading License
    • Appendix
    • Getting Help
  • Removed Features
    • Configuring File Validation Stage
    • Variable Extractor Stage
      • Variable Extractor Report
    • Configuring Meta Diff Stage
      • Meta Diff
    • Configuring Data Load Stage
      • Data Load
    • Configuring Multi Algo Stage
  • FAQs
  • Tutorial Videos
  • Notice
Home   »  Workload Assessment   »  Understanding Insights and Recommendations   »  EDW   »  Redshift  »  Optimization

Optimization

This topic provides a detailed examination of the optimization process. LeapLogic’s intelligent grammar engine identifies optimization opportunities at schema, and orchestration levels.

In this Topic:

  • Schema
  • Orchestration

Related Topics:

  • Highlights
  • Analysis
  • Optimization
  • Lineage
  • Recommendations
  • Downloadable Reports

Schema

Schema refers to how data is organized within a database. Optimizing schema is critical based on the complexity of the queries in the schema. By interpreting the input files, DDL files help in retrieving schema-level information and generating corresponding recommendations for the target.

Schema optimization opportunities include:

  • Increasing parallelism
  • Joins and Update/Delete/Merge operations can be optimized

Target as Databricks

If the target is Databricks, the schema optimization is based on liquid clustering. Liquid clustering organizes data based on the columns specified in the Cluster By key.

  • Entity Name: Name of the entity.
  • Cluster By: Column name specified here is used to organize data within the table.
  • Frequency of Use: Displays the frequency of table used.
  • Primary Key: To identify unique row in the table. It will not accept NULL values.
  • Tune Based on Workload: If a Delta table experiences frequent MERGE, UPDATE, or DELETE operation, then this helps to tune file sizes for rewrites by setting the table property delta.tuneFileSizesForRewrites.
  • Tune Based on Table Sizes: Helps to configure the target file size based on source table size. For example, for 1 TB table size, the size of the target file should be 256 MB; for 10 TB, it should be 1 GB and likewise.
  • Compaction (bin-packing): Delta Lake on Databricks improves the speed of read queries from a table by coalescing small files into larger ones.
  • Data Source: Provides data source details.
  • Entity Type: Displays the type of entities.
  • Transactional: Displays the number of data flow components that are used to perform sorting, merging, data cleansing and so on.
  • Data Volume: Displays the quantity of data.
  • Unique Key: To identify records in a table.
  • Details: Provides additional details about the entities.

Target as AWS/ Snowflake/ GCP

If the target is AWS/ Snowflake/ GCP, the schema optimization is performed using the Partition By key.

  • Entity Name: Name of the entity.
  • Partition by: Divides the table data into smaller, separate partitions for efficient data retrieval.
  • Data Source: Provides data source details.
  • Entity Type: Displays the type of each entity.
  • Frequency of Use: Displays the frequency of table used.
  • Transactional: Displays the number of data flow components that are used to perform sorting, merging, data cleansing and so on.
  • Data Volume: Displays the quantity of data.
  • Primary Key: Displays the Primary Key which identifies each row in the table. It does not accept NULL values.
  • Unique Key: Displays the Unique key. The unique key ensures that the values are unique across all records in the table.
  • Details: Provides additional details about the entities.

Target as Hive/ Spark

If the target is Hive/ Spark, the schema optimization strategies are Partitioning, Bucketing, and Clustering. For schema optimization, determining the right partitioning strategy helps to logically divide the data into different directories for efficient data retrieval.

  • Entity Name: Name of the entity.
  • Partition By: Divides the table data into smaller, separate partitions for efficient data retrieval.
  • Cluster By: Groups data into different files for efficient data retrieval.
  • Split By: Splits data based on the selected column.
  • Sort By: Displays the Sort By key, which is used to arrange the data.
  • No. of Buckets: Displays the number of buckets used to divide data during clustering.
  • Distributed By: Displays details of columns used for distribution.
  • Data Source: Provides data source details.
  • Entity Type: Displays the type of entities.
  • Frequency of Use: Displays the frequency of table used.
  • Transactional: Displays the number of data flow components that are used to perform sorting, merging, data cleansing and so on.
  • Data Volume: Displays the quantity of data.
  • Primary Key: Displays the Primary Key which identifies each row in the table. It does not accept NULL values.
  • Unique Key: Displays the Unique key. The unique key ensures that the values are unique across all records in the table.
  • Details: Provides additional details about the entities.


Orchestration

This section details data splits and recommends the maximum possible number of queries that can be executed in parallel for orchestration optimization.

Orchestration helps to easily manage workflows to accomplish a specific task. In orchestration, the independent queries of each file are divided into different splits that run in parallel to increase efficiency.

As per the above figure, the file contains 20 queries, with 5 splits based on query dependency. By running the splits in parallel, we can enhance the performance and execution time.

To get more insight, navigate to Files‘ row > Number of splits row, you can view the query and graphical distribution of the statement type.

Orchestration’s GRAPH displays the dependencies between files, splits, and output tables. To get the dependency information regarding splits and queries, choose the preferred node.


To learn more, contact our support team or write to: info@leaplogic.io

Copyright © 2025 Impetus Technologies Inc. All Rights Reserved

  • Terms of Use
  • Privacy Policy
  • License Agreement
To the top ↑ Up ↑