Azure Data Factory Assessment Report
This topic describes the Azure Data Factory assessment report. The assessment analyzes workloads and produces in-depth insights that help you plan the migration. The Azure Data Factory assessment accepts only ZIP files (ARM templates) as input.
To learn how to export an ARM template from the Azure portal, click Export ARM Template.
In This Topic:
Highlights
The highlights section gives you a high-level overview of the analytics performed on the selected workloads. It includes information about resource types and pipelines.
Summary
This section summarizes the input source scripts and the associated workload inventory. It includes information about pipelines, activities, resources, procedures, and so on.
- Files: Displays the total number of input source files.
- Pipelines: Displays the total number of pipelines. A pipeline is a set of activities that together accomplish a task.
- Activities: Displays the total number of activities. Each pipeline can contain multiple activities. Activities are the primary components of a pipeline and specify actions such as validating data, deleting data, obtaining metadata, etc.
- Procedures: Displays the number of stored procedures used in the data factory. A stored procedure is a set of SQL queries that performs an action or task.
- Resources: Displays the number of resources. Resources are manageable services or entities, for instance, databases, storage accounts, virtual networks, etc.
- Entities: Displays the number of entities used in the dataset.
- URLs: Displays the number of URLs. A URL represents a service used to establish or invoke communication, such as sending an email or triggering a website.
- External Files: Displays the number of external files. External files are external libraries used in Azure Data Factory; files in external formats such as Bash, CMD, etc., are considered external files.
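The counts above are derived from the ARM template inside the input ZIP. As a rough sketch of how such counts could be produced, the snippet below tallies resource types from an exported template; the file name `ARMTemplateForFactory.json` inside the ZIP is an assumption based on a typical ADF export and may differ in your archive.

```python
import json
import zipfile
from collections import Counter

def count_adf_resources(zip_path: str) -> Counter:
    """Count resource types in an exported ADF ARM template ZIP.

    The member name ARMTemplateForFactory.json is an assumption based
    on a typical ADF export; adjust if your archive differs.
    """
    with zipfile.ZipFile(zip_path) as zf:
        name = next(n for n in zf.namelist()
                    if n.endswith("ARMTemplateForFactory.json"))
        template = json.loads(zf.read(name))
    # Each ARM resource has a type such as
    # "Microsoft.DataFactory/factories/pipelines"; keep the last segment.
    return Counter(r["type"].rsplit("/", 1)[-1]
                   for r in template.get("resources", []))
```

This is only an illustration of the input format, not the assessment tool's implementation.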
Resources
This section provides an overview of resource types such as datasets, linked services, and triggers.
- Datasets: A dataset is a collection of data used in various activities.
- Linked Services: Linked services are connection strings that contain connection details such as databases, URLs, file paths, etc., to connect to different services (cloud, legacy data warehouses, etc.).
- Triggers: Triggers are used to execute a pipeline. In Azure Data Factory, there are three types of triggers:
- Schedule: Executes the pipeline based on a predefined schedule.
- BlobEvents: Executes the pipeline when a blob is created or modified in Azure Blob Storage.
- Tumbling window: Executes the pipeline at periodic intervals from a specified start time, including past (backfill) windows.
Pipelines
This section provides information about the total number of pipelines within the entire inventory along with an assessment of their complexity.
Queries
This section provides an overview of unique, analyzed, and unanalyzed queries along with their complexity.
- Unique Query: Displays the number of unique queries. Duplicate queries are eliminated so that each distinct query is counted only once.
- Analyzed: Displays the number of analyzed queries.
- Not analyzed: Displays the number of queries that are not analyzed.
Analysis
This section provides a detailed examination of the source files.
Files
This section provides a comprehensive report of the source files along with information about the total number of files, pipelines, activities, datasets, and so on.
- File Name: Displays the file name, using the folder_name/file_name naming convention.
- Pipelines: Displays the number of pipelines in the file. A pipeline is a set of activities that together accomplish a task.
- Activities: Displays the number of activities. Each pipeline can contain multiple activities; activities are the primary components that specify actions such as validating data, deleting data, obtaining metadata, etc.
- Datasets: Displays the number of datasets. A dataset is a collection of data used in various activities.
- Procedures: Displays the number of procedures. A procedure is a set of SQL queries that performs an action or task.
- Queries: Displays the number of queries.
- Complexity: Displays the complexity of the file.
Pipelines
This section provides detailed information about pipelines, including their activities, complexity, dependency conditions, and relative file paths.
- Pipeline Name: Displays the name of the pipeline.
- Relative File: Displays the relative path of the file in which the pipeline is defined.
- Activities: Displays the number of activities in each pipeline.
- Complexity: Displays the pipeline complexity.
- Dependency Conditions: Provides details about the resources on which the pipeline depends. For instance, the pipeline may depend on other pipelines, datasets, linked services, etc.
Browse through each pipeline to get more insights into the associated activities.
- Activity Name: Displays the name of the activity.
- Type: Displays the type of each activity carried out within the pipeline, such as Copy Data, Filter, Get Metadata, etc.
- Activity Dependency: Provides details about the activity upon which the current activity depends, along with the dependency condition (Succeeded/Failed). For example, if the activity If Condition1 depends on the Get Metadata1 activity with the Succeeded condition, If Condition1 is invoked only when Get Metadata1 succeeds; if Get Metadata1 fails, If Condition1 is not executed.
- Activity State: Displays the state of the activity such as Active or Inactive.
- Activity Description: Provides a description of each activity.
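The activity dependencies described above correspond to each activity's dependsOn array in the pipeline definition. As a sketch (the dict shape follows the public ADF pipeline schema; the activity names are illustrative):

```python
def activity_dependencies(pipeline: dict):
    """Yield (activity, upstream_activity, conditions) triples from an
    ADF pipeline definition. The dict shape follows the public ADF
    pipeline schema; the sample below uses illustrative names."""
    for act in pipeline.get("properties", {}).get("activities", []):
        for dep in act.get("dependsOn", []):
            yield act["name"], dep["activity"], dep["dependencyConditions"]

pipeline = {
    "name": "pipeline1",
    "properties": {
        "activities": [
            {"name": "Get Metadata1", "type": "GetMetadata", "dependsOn": []},
            {"name": "If Condition1", "type": "IfCondition",
             "dependsOn": [{"activity": "Get Metadata1",
                            "dependencyConditions": ["Succeeded"]}]},
        ]
    },
}

for name, upstream, conds in activity_dependencies(pipeline):
    print(f"{name} runs after {upstream} when {conds}")
```

Here If Condition1 is reported as depending on Get Metadata1 with the Succeeded condition, mirroring the Activity Dependency column.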
Queries
This section provides detailed information about queries segregated into unique, analyzed, and not-analyzed queries.
Unique Queries
This section displays a list of all unique queries along with their total number. Duplicate queries are eliminated so that each distinct query is counted only once.
- File Name: Displays the file name associated with the query.
- Pipeline Name: Displays the name of the pipeline associated with the query.
- Source Type: Displays the type of database where queries will be processed.
- Query Type: Displays the query type.
- Complexity: Displays the query complexity.
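One plausible way to derive a unique-query list is to normalize each query and deduplicate, as sketched below. This is an illustration only; the assessment tool's actual normalization rules are not documented here.

```python
import re

def unique_queries(queries):
    """Deduplicate SQL queries after normalizing case, whitespace, and
    trailing semicolons. A sketch only -- not the assessment tool's
    actual algorithm."""
    seen, result = set(), []
    for q in queries:
        key = re.sub(r"\s+", " ", q.strip().rstrip(";")).lower()
        if key not in seen:
            seen.add(key)
            result.append(q)
    return result
```

Under this scheme, "SELECT 1;" and "select  1" would count as one unique query.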
Analyzed
This section lists all queries that meet the analysis criteria.
- File Name: Displays the file name associated with the query.
- Pipeline Name: Displays the name of the pipeline associated with the query.
- Source Type: Displays the type of database where queries will be processed.
- Query Type: Displays the query type.
- Complexity: Displays the query complexity.
Not Analyzed
This section lists all queries that did not meet the analysis criteria.
- File Name: Displays the file name associated with the query.
- Pipeline Name: Displays the name of the pipeline associated with the query.
- Source Type: Displays the type of database where queries will be processed.
- Query Type: Displays the query type.
- Complexity: Displays the query complexity.
Entities
This section displays a detailed analysis of the entities. It includes information about the type of entities, databases, and database type.
- Entity Name: Displays the name of the entity.
- Type: Displays the type of entity.
- Database Name: Displays the database name.
- Database Type: Displays the type of database such as Oracle, Redshift, etc., where the entity is present.
Resources
This section provides detailed information about various resource types such as datasets, linked services, and triggers. Resources are manageable services or entities, for instance, databases, storage accounts, virtual networks, etc.
Datasets
This section lists all the datasets. A dataset is a collection of data used in various activities.
- Dataset Name: Displays the name of the dataset.
- Type: Displays the dataset type.
- Relative File: Displays the relative file path of the dataset.
- Linked Service Name: Displays the name of the linked service associated with the dataset.
- Schema: Displays the associated schema name.
- Table: Displays the associated table name.
- Depend On: Provides details about the resources on which the dataset depends. For instance, the dataset may depend on other datasets, linked services, etc.
Linked Service
This section lists all the linked services. Linked services are connection strings that contain connection details such as databases, URLs, file paths, etc., to connect to different services (cloud, legacy data warehouses, etc.).
- Linked Service Name: Displays the name of the linked service.
- Type: Displays the type of the linked service.
- Relative File: Displays the relative file path of the linked service.
- Depend On: Provides details about the resources on which the linked service depends. For instance, the linked service may depend on other datasets, linked services, etc.
Trigger
This section lists all triggers. Triggers are used to execute a pipeline.
- Trigger Name: Displays the name of the trigger.
- Type: Displays the type of trigger. There are three types of triggers:
- Schedule: Executes the pipeline based on a predefined schedule.
- BlobEvents: Executes the pipeline when a blob is created or modified in Azure Blob Storage.
- Tumbling window: Executes the pipeline at periodic intervals from a specified start time, including past (backfill) windows.
- Relative File: Displays the relative file path of the trigger.
- Pipeline: Displays the pipeline associated with each trigger.
- Runtime State: Specifies the runtime state of each trigger, such as Started or Stopped.
- Frequency: Displays the frequency such as hour, minute, etc., at which the trigger is scheduled to execute.
- Interval: Provides the time interval at which the trigger is scheduled to execute.
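The Frequency, Interval, and Runtime State columns map to fields of the trigger definition. A minimal sketch of extracting them (the dict shape follows the ADF schedule-trigger schema; the trigger name is illustrative):

```python
def trigger_schedule(trigger: dict):
    """Return (frequency, interval, runtime_state) from a schedule
    trigger definition. The shape follows the ADF ScheduleTrigger
    schema; the sample trigger below is illustrative."""
    props = trigger["properties"]
    recurrence = props["typeProperties"]["recurrence"]
    return recurrence["frequency"], recurrence["interval"], props.get("runtimeState")

trigger = {
    "name": "trigger1",
    "properties": {
        "type": "ScheduleTrigger",
        "runtimeState": "Started",
        "typeProperties": {
            "recurrence": {"frequency": "Hour", "interval": 1},
        },
    },
}
```

For this sample, the report would show a Started trigger that fires every 1 hour.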
Artifacts
This section lists all the missing artifacts such as linked services, pipelines, triggers, etc., and the external files within the entire inventory.
Missing Files
This section offers a comprehensive view of all the missing artifacts, categorized into pipelines and resources.
All
This section lists all the missing artifacts. Missing artifacts are identified based on dependency conditions. For instance, if a pipeline depends on dataset A and dataset A is not found in the input JSON files, dataset A is considered a missing artifact.
- Artifact Name: Displays the name of the missing artifact.
- Type: Displays the type of missing artifact such as pipelines, resources, triggers, etc.
- Relative File: Displays the relative file where the missing artifact is written.
- Source Name: Displays the name of the dependent artifact of the missing artifact.
- Source Type: Specifies the type of the dependent artifact of the missing artifact.
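The dependency-based rule described above amounts to a set difference between referenced and defined artifacts. The sketch below is a simplification under that assumption, not the assessment tool's exact implementation:

```python
def find_missing_artifacts(defined, references):
    """Return referenced artifacts that were never defined.

    defined    -- set of (artifact_type, name) pairs found in the input files
    references -- iterable of (source_name, artifact_type, name)
                  dependency edges extracted from the same files

    A sketch of the dependency-based rule, not the tool's actual logic.
    """
    return [
        {"artifact": name, "type": kind, "source": source}
        for source, kind, name in references
        if (kind, name) not in defined
    ]
```

For example, if pipeline1 references datasets A and B but only A is defined in the input, B is reported as missing with pipeline1 as its source.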
Pipelines
This section lists all the missing pipelines.
- Artifact Name: Displays the name of the missing artifact.
- Relative File: Displays the relative file path of the pipeline.
- Source Name: Displays the name of the dependent artifact of the missing pipeline.
- Source Type: Specifies the type of dependent artifact of the missing pipeline.
Resources
This section lists all the missing resources.
- Artifact Name: Displays the name of the missing resource.
- Resource Type: Specifies the type of resource.
- Relative File: Displays the relative file path of the resource.
- Source Name: Displays the name of the dependent artifact of the missing resource.
- Source Type: Specifies the type of dependent artifact of the missing resource.
External Files
This section lists all the external files, including information about the associated pipelines, the type of each external file, relative file paths, and more.
- Artifact Name: Displays the name of the external file.
- Type: Specifies the type of external file.
- Relative File: Displays the relative file path of the source input file.
- Artifact File Path: Displays the path of the artifacts accessed from external resources, such as DBFS. For example, if the external file is stored in DBFS, the artifact file path represents the DBFS path; if the file is located in S3, the S3 path is shown as the artifact file path.
- Pipeline Name: Displays the associated pipeline name.
Downloadable Reports
Downloadable reports allow you to export detailed ADF assessment reports of your source data so that you can gain in-depth insights offline. To access these assessment reports, click Reports.
Types of Reports
In the Reports section, you can see various types of reports, such as Insights and Recommendations and Source Inventory Analysis. Each report type offers detailed information, allowing you to explore your assessment results.
Insights and Recommendations
This report provides in-depth insight into the source input files.
ADF Assessment Report.xlsx: This report provides insights into the source inventory, including pipelines.
This report contains the following information:
- Report Summary: Provides information about all the generated artifacts.
- Volumetric Info: Presents a summary of the aggregated inventory after analyzing the source files. For instance, it provides volumetric information about stored procedures, datasets, pipelines, activities, and more.
- Pipeline Summary: Lists all the pipelines associated with the input files. It also provides information about components and pipeline-level complexity.
Source Inventory Analysis
This is an intermediate report that helps debug failures and is used to calculate the final report. It includes all the generated CSV reports, such as ADF File Summary.csv, ADF Pipeline Summary.csv, Query Summary.csv, and more.
ADF File Summary.csv: This report provides information about ADF files including the count of pipelines, procedures, resources, and so on.
ADF Pipeline Summary.csv: This report provides information about pipelines including the total number of activities, external files, dependency conditions, and more.
Query Detail.csv: This report provides information about queries including the analyzed status, complexity, parsing status, and more. If the analyzed status is TRUE, it indicates that the query is analyzed successfully. Conversely, a FALSE status indicates that the query is not analyzed.
Query Summary.csv: This report provides information about queries including the number of analyzed queries, not analyzed queries, complexity, and so on.