Complexity Calculation Logic
This topic describes the logic for calculating complexity. The intelligent assessment engine calculates the workload complexity based on the count of the patterns applicable to the workload type. Depending on the input workload, it may also consider variables, lines of code, expression length, or the number of queries per job.
In This Topic:
- EDW – Query (Teradata, Oracle, SQL Server, Vertica, Netezza, BigQuery, Snowflake, Redshift)
- EDW – Procedural Code
- ETL (Informatica, DataStage, Ab Initio, Talend, SSIS, Matillion, ODI)
- Analytics (SAS, Alteryx)
- BI (OBIEE, IBM Cognos, MicroStrategy, SAP BO, Tableau, SSRS, Power BI)
- Orchestration (AutoSys, Control-M)
- Hadoop
EDW – Query
The table below describes the logic for calculating query complexity. It calculates the query complexity based on the minimum/maximum occurrences of functions, operations, sub-queries, joins, case clauses, and nested sub-queries, as well as the total length of the query.

| Category | Sub Category | Min | Max | Complexity Number |
| --- | --- | --- | --- | --- |
| Recursive | Recursive | 1 | 1 | 5 |
| Window Functions | windowfunctions | 1 | 4 | 1 |
| | windowfunctions | 5 | 10 | 2 |
| | windowfunctions | 11 | | 3 |
| Aggregate Functions | aggregatefunctions | 1 | 4 | 1 |
| | aggregatefunctions | 5 | 10 | 2 |
| | aggregatefunctions | 11 | | 3 |
| Union | union | 1 | 4 | 0.5 |
| | union | 5 | | 2 |
| Operation | minus | 1 | 1 | 1 |
| | minus | 2 | | 2 |
| Sub Query Clause | subquery with select clause | 1 | 2 | 1 |
| | subquery with select clause | 3 | 4 | 1.5 |
| | subquery with select clause | 5 | 5 | 2 |
| | subquery with select clause | 6 | 6 | 2.5 |
| | subquery with select clause | 7 | | 3 |
| | subquery with where clause | 1 | 2 | 0.5 |
| | subquery with where clause | 3 | 4 | 1 |
| | subquery with where clause | 5 | 6 | 2.5 |
| | subquery with where clause | 7 | | 3 |
| | subquery with from clause | 1 | 2 | 0.5 |
| | subquery with from clause | 3 | 4 | 1 |
| | subquery with from clause | 5 | | 2 |
| | subquery with other clause | 1 | 2 | 1 |
| | subquery with other clause | 3 | 4 | 2 |
| | subquery with other clause | 5 | | 3 |
| | subquery with in/any/all/exists | 1 | 2 | 0.5 |
| | subquery with in/any/all/exists | 3 | 4 | 1.5 |
| | subquery with in/any/all/exists | 5 | | 3 |
| Join | cross join | 1 | 2 | 0.5 |
| | cross join | 3 | 4 | 1 |
| | cross join | 5 | 6 | 2 |
| | cross join | 7 | | 3 |
| | inner join | 1 | 2 | 0.5 |
| | inner join | 3 | 5 | 1 |
| | inner join | 6 | 8 | 2 |
| | inner join | 9 | | 3 |
| | outer join | 1 | 2 | 0.5 |
| | outer join | 3 | 4 | 2 |
| | outer join | 5 | 6 | 3 |
| | outer join | 7 | 8 | 4 |
| | outer join | 9 | | 5 |
| Order By | order by columns/(having/group by) derived columns | 2 | | 1 |
| Length | length | 5001 | 10000 | 1 |
| | length | 10001 | 20000 | 2 |
| | length | 30001 | 40000 | 3 |
| | length | 40001 | 50000 | 4 |
| | length | 50001 | | 5 |
| Join Condition | where and join condition | 1 | 5 | 0 |
| | where and join condition | 6 | 10 | 0.5 |
| | where and join condition | 10 | 20 | 1 |
| | where and join condition | 21 | 30 | 2 |
| | where and join condition | 31 | 40 | 3 |
| | where and join condition | 40 | | 4 |
| Case Clause | case | 1 | 2 | 0.5 |
| | case | 3 | 6 | 1 |
| | case | 6 | | 1.5 |
| Sub Query | nested subquery | 1 | 1 | 1 |
| | nested subquery | 2 | 2 | 2 |
| | nested subquery | 3 | 3 | 3 |
| | nested subquery | 4 | 4 | 4 |
| | nested subquery | 5 | | 5 |
| select* | | | | 1 |
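As an illustration of how this table might be applied, the following Python sketch sums the complexity number of each pattern whose occurrence count falls within a Min/Max band. It reproduces only a few rows of the table, and the assumption that the per-pattern contributions are simply summed into an overall query score is ours for illustration, not a statement of the engine's internal implementation; the pattern counts are assumed to come from parsing the query.

```python
# Minimal sketch of range-based query scoring, assuming the engine sums the
# complexity number of every pattern whose count falls in a Min/Max band.
# The rules below reproduce a few rows of the table; None means "no upper bound".

QUERY_RULES = {
    "windowfunctions":    [(1, 4, 1), (5, 10, 2), (11, None, 3)],
    "aggregatefunctions": [(1, 4, 1), (5, 10, 2), (11, None, 3)],
    "union":              [(1, 4, 0.5), (5, None, 2)],
    "inner join":         [(1, 2, 0.5), (3, 5, 1), (6, 8, 2), (9, None, 3)],
    "outer join":         [(1, 2, 0.5), (3, 4, 2), (5, 6, 3), (7, 8, 4), (9, None, 5)],
    "nested subquery":    [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, None, 5)],
    "length":             [(5001, 10000, 1), (10001, 20000, 2), (30001, 40000, 3),
                           (40001, 50000, 4), (50001, None, 5)],
}

def pattern_score(pattern: str, count: int) -> float:
    """Return the complexity number for a pattern given its occurrence count."""
    for low, high, score in QUERY_RULES.get(pattern, []):
        if count >= low and (high is None or count <= high):
            return score
    return 0.0

def query_complexity(counts: dict[str, int]) -> float:
    """Sum the per-pattern contributions (assumed aggregation rule)."""
    return sum(pattern_score(p, c) for p, c in counts.items())

# Example: 2 inner joins, 6 aggregate functions, 1 nested subquery, 12,000-character query.
print(query_complexity({
    "inner join": 2, "aggregatefunctions": 6, "nested subquery": 1, "length": 12000,
}))  # 0.5 + 2 + 1 + 2 = 5.5
```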
EDW – Procedural Code
The table below describes the logic for calculating PL/SQL complexity. It calculates the PL/SQL complexity based on the number of queries, conditions, cursors, GOTO statements, loops, and lines of code.

| Complexity | Query Count | Condition Count | Cursor Count | GoTo Count | Lines of Code | Loop Count |
| --- | --- | --- | --- | --- | --- | --- |
| Simple | <=20 | <=5 | <=2 | <=5 | <=350 | <=2 |
| Medium | >=21 and <=40 | >=6 and <=10 | >=3 and <=5 | >=6 and <=10 | >=351 and <=1000 | >=3 and <=5 |
| Complex | >40 | >10 | >5 | >10 | >1000 | >5 |
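The following sketch shows one way the bands above could be applied in code. The combination rule, taking the highest band reached by any individual metric as the overall PL/SQL complexity, is an assumption made for illustration; the table itself does not state how the metrics are combined.

```python
# Minimal sketch of the PL/SQL classification, assuming the overall complexity
# is the highest band reached by any individual metric (an assumed rule).

PLSQL_BANDS = {
    # metric: (simple_max, medium_max); anything above medium_max is Complex
    "query_count":     (20, 40),
    "condition_count": (5, 10),
    "cursor_count":    (2, 5),
    "goto_count":      (5, 10),
    "lines_of_code":   (350, 1000),
    "loop_count":      (2, 5),
}

def plsql_complexity(metrics: dict[str, int]) -> str:
    level = "Simple"
    for name, value in metrics.items():
        simple_max, medium_max = PLSQL_BANDS[name]
        if value > medium_max:
            return "Complex"          # any metric in the Complex band wins
        if value > simple_max:
            level = "Medium"
    return level

# Example: 25 queries and 3 cursors push the object to Medium.
print(plsql_complexity({
    "query_count": 25, "condition_count": 4, "cursor_count": 3,
    "goto_count": 0, "lines_of_code": 300, "loop_count": 1,
}))  # Medium
```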
ETL
The tables below describe the logic for calculating ETL complexity.
Informatica
The table below describes the logic for calculating Informatica complexity. It calculates the complexity based on components, transformations, expressions, variables, running aggregate functions, and output column length.

| Script Severity | Components & Inputs/Outputs | Active/Passive Transformations | Expressions | Variables | Running Agg | Out Column Len | Other Transformations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Simple | <=30 | <=24 | <=5 | <=10 | <=2 | <=1000 | <=1 |
| Medium | >30 and <=100 | >24 and <=88 | >5 and <=10 | >10 and <=30 | >2 and <=6 | >1000 and <=4000 | >1 and <=2 |
| Complex | >100 | >88 | >10 | >30 | >6 | >4000 | >2 |
DataStage
The table below describes the logic for calculating DataStage complexity. It calculates the complexity based on the job type, the number of components, transformations, and more.

| Complexity | Job Type | Total Components | Active/Passive Transformations | Other Transformation | CTransformation Stage |
| --- | --- | --- | --- | --- | --- |
| Simple | Parallel/ Mainframe/ Server job | <=30 | <=24 | <=1 | <=5 |
| Medium | Parallel/ Mainframe/ Server job | <=100 | <=88 | <=2 | <=10 |
| Complex | Parallel/ Mainframe/ Server job | >100 | >88 | >2 | >10 |
Ab Initio
The table below describes the logic for calculating Ab Initio complexity. It calculates the complexity based on the number of components, lookup files, EBCDIC files, complex DMLs, and complex XFRs.

| Script Severity | Component Count | Lookup File Count | Ebcdic File Count | Complex DML Count | Complex XFR Count |
| --- | --- | --- | --- | --- | --- |
| Simple | <=20 | <=2 | <=2 | <=2 | <=2 |
| Medium | >20 and <=50 | >2 and <=3 | >2 and <=3 | >2 and <=3 | >2 and <=3 |
| Complex | >50 | >=4 | >=4 | >=4 | >=4 |
Talend
The table below describes the logic for calculating Talend complexity. It calculates the complexity based on the number of tMaps and components.

| Script Severity | tMap Count | Component Count |
| --- | --- | --- |
| 1 | <=1 | <=4 |
| 2 | >1 & <=3 | >4 & <=10 |
| 3 | >3 & <=7 | >10 & <=20 |
| 4 | >7 & <=12 | >20 & <=35 |
| 5 | >12 | >35 |
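For illustration, the sketch below maps the tMap count and the component count to the 1-5 scale above. Taking the higher of the two severities as the script severity is an assumption; the table does not state how the two counts are combined.

```python
# Minimal sketch for the Talend 1-5 scale, assuming the script severity is the
# higher of the severities implied by the tMap count and the component count.

def band(value: int, upper_bounds: list[int]) -> int:
    """Return a 1-based severity: the first band whose upper bound covers value."""
    for severity, bound in enumerate(upper_bounds, start=1):
        if value <= bound:
            return severity
    return len(upper_bounds) + 1

def talend_severity(tmap_count: int, component_count: int) -> int:
    tmap_sev = band(tmap_count, [1, 3, 7, 12])               # <=1, <=3, <=7, <=12, >12
    component_sev = band(component_count, [4, 10, 20, 35])   # <=4, <=10, <=20, <=35, >35
    return max(tmap_sev, component_sev)                      # assumed combination rule

print(talend_severity(tmap_count=2, component_count=18))  # max(2, 3) = 3
```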
SSIS
The table below describes the logic for calculating SSIS complexity. It calculates the complexity based on the number of components and loops.

| Graphs Severity (1-5 Scale) | Total Components | For Each Loop/For Loop |
| --- | --- | --- |
| Trivial (1) | 30 | 0 |
| Simple (2) | 80 | 1 |
| Medium (3) | 150 | 3 |
| Complex (4) | 200 | 6 |
| V Complex (5) | 200 | 6 |
Matillion
The table below describes the logic for calculating Matillion complexity. It calculates the complexity based on the number of joins, formulas, components, and more.

| Category | Simple | Medium | Complex | Very Complex |
| --- | --- | --- | --- | --- |
| Number of SQL Queries | <5 | 5-8 | 9-12 | >12 |
| Number of Joins | <5 | 5-10 | 11-30 | >30 |
| Number of Iterators | <5 | 5-10 | 11-20 | >20 |
| Number of Custom Logic | 0 | 1-4 | 5-20 | >20 |
| Number of Formulas / Calculators | <10 | 10-12 | 13-40 | >40 |
| Number of Aggregations | <5 | 5-10 | 11-30 | >30 |
| Number of Filters/ Conditions | <10 | 11-15 | 16-50 | >50 |
| Number of Python Scripts | <5 | 5-10 | 11-20 | >20 |
| Number of Orchestrations | <5 | 5-10 | 11-12 | >12 |
| Number of Components | <60 | 61-100 | 101-250 | >250 |
ODI
The table below shows the complexity matrix for ODI packages.

| Complexity | Trivial | Simple | Medium | Complex | V Complex | V. V. Complex |
| --- | --- | --- | --- | --- | --- | --- |
| Complexity Number Range | <=3 | <=6 | <=9 | <=15 | <=20 | >20 |
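The mapping from a package's complexity number to its bucket can be expressed directly, as in this minimal sketch.

```python
# Minimal sketch mapping an ODI package's complexity number to its bucket,
# following the ranges in the table above.

def odi_bucket(complexity_number: float) -> str:
    for bound, label in [(3, "Trivial"), (6, "Simple"), (9, "Medium"),
                         (15, "Complex"), (20, "V Complex")]:
        if complexity_number <= bound:
            return label
    return "V. V. Complex"

print(odi_bucket(7))   # Medium
print(odi_bucket(22))  # V. V. Complex
```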
Analytics
The tables below describe the logic for calculating the complexity of analytics scripts such as SAS and Alteryx.
SAS
The tables below describe the logic for calculating SAS complexity. They calculate the SAS complexity based on the number of SQL statements, external resources, SQL lines, conditional procedural statements, and more.

| Complexity | Total SQL Statement | External Resource | Total SQL Lines | Total Create Macro Lines | External Access | Conditional Procedural Statement | Proc Base SAS | Proc Other Advanced | Proc SAS ETS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Simple (1) | <4 | <4 | 200 | 200 | 4 | 3 | 4 | 1 | 1 |
| Medium (2) | <8 | 10 | 1000 | 500 | 8 | 50 | 10 | 2 | 2 |
| High (3) | <50 | 30 | 2000 | 1000 | 50 | 100 | 40 | 5 | 5 |
| Very High (4) | <100 | 80 | 3000 | 2000 | 100 | 500 | 100 | 10 | 10 |
| Very Very High (5) | >500 | 500 | 10000 | 10000 | 500 | 1000 | 500 | 50 | 50 |

| Complexity | Proc SAS STAT | Proc SAS OR | Proc SAS ODS | Proc SQL Count | Total Source Code Lines | Total Statement Count | Data Statement Count | Create Macros | Total Statement Count | SAS Script Generation Statement Count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Simple (1) | 1 | 1 | 1 | 4 | 100 | 50 | 3 | 8 | 50 | 0 |
| Medium (2) | 2 | 2 | 2 | 8 | 500 | 100 | 10 | 15 | 200 | 0 |
| High (3) | 5 | 5 | 5 | 50 | 1000 | 200 | 50 | 50 | 500 | 0 |
| Very High (4) | 10 | 10 | 10 | 100 | 2000 | 500 | 200 | 100 | 1000 | 0 |
| Very Very High (5) | 50 | 50 | 50 | 500 | 5000 | 1000 | 500 | 500 | 5000 | 1 |
Alteryx
The table below describes the logic for calculating Alteryx complexity. It calculates the complexity based on the number of nodes, the length of expressions, transformations, and more.

| Script Severity | Total Nodes | Expression Length | AIML | NLP | Reporting | SAS | Control | Transformation Basic | Transformation Advanced | InOut Advanced | InOut Basic | Display |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Trivial | <=15 | <=500 | 0 | 0 | 0 | 0 | 0 | <=15 | 0 | 0 | <=15 | <15 |
| Simple | >15 & <=30 | >500 & <=1000 | 0 | 0 | 0 | 0 | <=1 | >15 & <=30 | 0 | 0 | >15 & <=30 | <30 |
| Medium | >30 & <=100 | >1000 & <=4000 | <=1 | 0 | 0 | 0 | >1 & <=2 | >30 & <=100 | <=5 | <=5 | >30 & <=100 | <100 |
| Complex | >100 & <=200 | >4000 & <=10000 | >1 & <=5 | <=1 | <=1 | 0 | >2 & <=5 | >100 & <=200 | >5 & <=10 | >5 & <=10 | >100 & <=200 | <200 |
| Very Complex | >200 | >10000 | >5 | >1 | >1 | >0 | >5 | >200 | >10 | >10 | >200 | >200 |
BI
The tables below describe the logic for calculating BI complexity.
OBIEE
The complexity of OBIEE reports depends on the number of columns, filters, views, and prompts. There are three complexity levels: Simple, Medium, and Complex. The overall complexity of a report is determined by the highest complexity among its individual components.
The table below shows the complexity matrix for OBIEE reports.

| Complexity | Number of Columns | Number of Filters | Number of Views | Number of Prompts |
| --- | --- | --- | --- | --- |
| Simple | <=5 | <=1 | <=2 | <=1 |
| Medium | >5 and <=10 | >1 and <=2 | >2 and <=4 | >1 and <=2 |
| Complex | >10 | >2 | >4 | >2 |
For example, if a report has 4 columns (Simple), 1 view (Simple), and 3 filters (Complex), the overall report complexity is Complex. This is because the system always takes the highest complexity level among the individual components when determining the overall report complexity.
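The rule described above, bucket each attribute individually and take the highest bucket, can be sketched as follows; the attribute names used here are illustrative.

```python
# Sketch of the OBIEE rule described above: each attribute is bucketed
# individually and the report takes the highest bucket among them.

OBIEE_BANDS = {
    # attribute: (simple_max, medium_max); above medium_max is Complex
    "columns": (5, 10),
    "filters": (1, 2),
    "views":   (2, 4),
    "prompts": (1, 2),
}

LEVELS = ["Simple", "Medium", "Complex"]

def obiee_report_complexity(report: dict[str, int]) -> str:
    worst = 0
    for attribute, value in report.items():
        simple_max, medium_max = OBIEE_BANDS[attribute]
        level = 2 if value > medium_max else (1 if value > simple_max else 0)
        worst = max(worst, level)
    return LEVELS[worst]

# The example from the text: 4 columns (Simple), 1 view (Simple), 3 filters (Complex).
print(obiee_report_complexity({"columns": 4, "views": 1, "filters": 3}))  # Complex
```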
IBM Cognos
The complexity of IBM Cognos reports is determined by calculating an overall score from the report's objects and comparing it to predefined complexity thresholds.
The following table describes the predefined thresholds for the IBM Cognos complexity calculation.

| Complexity | Overall Score |
| --- | --- |
| Simple | <=3 |
| Medium | >3 and <=8 |
| Complex | >8 |

The following table displays the objects and their corresponding scores.

| Objects | Corresponding Score |
| --- | --- |
| Page | 1 |
| Visual | 1 |
| Field count in report across the queries <= 5 | 0 |
| Field count in report across the queries > 5 and <= 10 | 1 |
| Field count in report across the queries > 10 and <= 20 | 2 |
| Field count in report across the queries > 20 | 4 |
| 1 package | 0 |
| 2 packages | 2 |
| > 2 packages | 4 |
| Custom Queries | 4 + the sum of the complexity scores of the queries involved |
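For illustration, the following sketch computes an overall Cognos score by summing the object scores from the table and mapping the total to a complexity level. Summing the individual scores is an assumption, and the contribution of custom queries is passed in pre-computed rather than derived from the queries themselves.

```python
# Minimal sketch of the IBM Cognos scoring, assuming the overall score is the
# sum of the object scores in the table above.

def field_count_score(field_count: int) -> int:
    if field_count <= 5:
        return 0
    if field_count <= 10:
        return 1
    if field_count <= 20:
        return 2
    return 4

def package_score(package_count: int) -> int:
    if package_count <= 1:
        return 0
    if package_count == 2:
        return 2
    return 4

def cognos_complexity(pages: int, visuals: int, field_count: int,
                      packages: int, custom_query_score: int = 0) -> str:
    score = (pages * 1 + visuals * 1 + field_count_score(field_count)
             + package_score(packages) + custom_query_score)
    if score <= 3:
        return "Simple"
    if score <= 8:
        return "Medium"
    return "Complex"

# Example: 2 pages, 4 visuals, 12 fields, 1 package -> 2 + 4 + 2 + 0 = 8 -> Medium
print(cognos_complexity(pages=2, visuals=4, field_count=12, packages=1))  # Medium
```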
MicroStrategy
The table below describes the logic for calculating the dataset complexity of MicroStrategy. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
| Dataset Complexity | Total Column Count | Expression Count (Filter Included) | Table Count | Join Count |
| --- | --- | --- | --- | --- |
| Simple | 100 | 30 | 1-10 | <=10 |
| Medium | 101-200 | 31-60 | 11-20 | <=20 |
| Complex | 201+ | 61+ | 21+ | >21 |
The table below describes the logic for calculating the visual complexity of MicroStrategy. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count (Dimensions and Measures) | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | <5 | <3 | <3 |
| Medium | 5-10 | 3-6 | 3-5 |
| Complex | >10 | >6 | >5 |
SAP BO
The table below describes the logic for calculating the dataset complexity of SAP BO. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
| Dataset Complexity | Total Column Count | Expression Count (Filter Included) | Table Count | Join Count |
| --- | --- | --- | --- | --- |
| Simple | 100 | 30 | 1-10 | <=10 |
| Medium | 101-200 | 31-60 | 11-20 | <=20 |
| Complex | 201+ | 61+ | 21+ | >21 |
The table below describes the logic for calculating the visual complexity of SAP BO. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count (Dimensions and Measures) | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | <5 | <3 | <3 |
| Medium | 5-10 | 3-6 | 3-5 |
| Complex | >10 | >6 | >5 |
Tableau
The table below describes the logic for calculating the dataset complexity of Tableau. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
| Dataset Complexity | Total Column Count | Expression Count (Filter Included) | Table Count | Join Count |
| --- | --- | --- | --- | --- |
| Simple | 100 | 30 | 1-7 | <=6 |
| Medium | 101-200 | 31-60 | 8-15 | <=14 |
| Complex | 201+ | 61+ | 16+ | >15 |
The table below describes the logic for calculating the visual complexity of Tableau. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count (Dimensions and Measures) | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | <5 | <3 | <3 |
| Medium | 5-10 | 3-6 | 3-5 |
| Complex | >10 | >6 | >5 |
SSRS
The table below describes the logic for calculating the visual complexity of SSRS. It calculates the complexity based on the count of columns, calculated columns, filters and parameters.
| Visual Complexity | Total Column Count | Calculated Columns Count | Filter and Parameters Count | Comments |
| --- | --- | --- | --- | --- |
| Simple | <5 | <3 | <3 | Sub-reports should be considered as separate reports |
| Medium | 5-10 | 3-6 | 3-5 | |
| Complex | >10 | >6 | >5 | |
Power BI
The table below describes the logic for calculating the dataset complexity of Power BI. It calculates the complexity based on the count of columns, measures, tables, and relationships.
| Dataset Complexity | Total Column Count | Measure Count | Table Count | Relationship Count |
| --- | --- | --- | --- | --- |
| Simple | 0-50 | 0-14 | 1-5 | <=4 |
| Medium | 51-150 | 15-50 | 6-15 | <=14 |
| Complex | 150+ | 50+ | 15+ | >14 |
The table below describes the logic for calculating the visual complexity of Power BI. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | 0-3 | <3 | <4 |
| Medium | 4-8 | 3-6 | 4-9 |
| Complex | >8 | >6 | >9 |
Orchestration
The tables below describe the logic for calculating the complexity of orchestration scripts such as AutoSys and Control-M.
AutoSys
The table below describes the logic for calculating AutoSys job complexity.

| Job Complexity | Condition | Custom Calendar | Box Scheduled Jobs (Job and Parent Scheduled at Different Times) | Comment |
| --- | --- | --- | --- | --- |
| Simple | No | No | No | If all three objects (Condition, Custom Calendar, and Box Scheduled Jobs) are absent, the complexity is Simple |
| Medium | | | | If one or two of the objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Medium |
| Complex | Yes | Yes | Yes | If all three objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Complex |
The box complexity is determined based on the highest complexity factor of the box or the jobs within the box. If the box contains multiple jobs, the job complexity factor is the average of the complexities of all the jobs in the box.
The table below describes the logic for calculating AutoSys box complexity.

| Box Complexity | Condition | Custom Calendar | Box Scheduled Jobs (Job and Parent Scheduled at Different Times) | Comment |
| --- | --- | --- | --- | --- |
| Simple | No | No | No | If all three objects (Condition, Custom Calendar, and Box Scheduled Jobs) are absent, the complexity is Simple |
| Medium | | | | If one or two of the objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Medium |
| Complex | Yes | Yes | Yes | If all three objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Complex |

\* If the box contains multiple jobs, the box complexity is determined by the highest complexity factor between the box itself and the jobs within it.
The complexity levels are categorized as Simple (1), Medium (2), and Complex (3). The following scenarios show how box complexity is calculated when a box contains multiple jobs.
If a box has Simple (1) complexity and contains 3 jobs, J1, J2, and J3, with complexity values 1, 2, and 3 respectively, then the box and job complexity factors are as follows:
- The box complexity factor is 1, because the box complexity is Simple.
- The job complexity factor is 2, because the average job complexity is 2 (the sum of the job complexity values divided by the number of jobs).
Here, the complexity of the box is considered Medium (2) because the job complexity factor (2) is higher than the box complexity factor (1).
Similarly, if the box complexity factor is 3 and the job complexity factor is 2, the box complexity is considered Complex (3) because the box complexity factor (3) is higher than the job complexity factor (2).
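The scenarios above can be summarized in a short sketch: a job's complexity factor follows from how many of the three objects are present, and a box takes the higher of its own complexity factor and the average complexity factor of its jobs. Rounding the average to the nearest level is an assumption for cases where it is not a whole number.

```python
# Sketch of the AutoSys rules above: a job is Simple (1), Medium (2), or
# Complex (3) depending on how many of the three objects (condition, custom
# calendar, box-scheduled job) are present, and a box takes the higher of its
# own complexity factor and the average complexity of the jobs it contains.

def job_complexity(condition: bool, custom_calendar: bool, box_scheduled: bool) -> int:
    present = sum([condition, custom_calendar, box_scheduled])
    if present == 0:
        return 1  # Simple: all three objects absent
    if present == 3:
        return 3  # Complex: all three objects present
    return 2      # Medium: one or two objects present

def box_complexity(box_factor: int, job_factors: list[int]) -> int:
    job_factor = sum(job_factors) / len(job_factors)  # average job complexity
    return max(box_factor, round(job_factor))         # assumed rounding of the average

# Scenarios from the text: a Simple box (1) and a Complex box (3), each with
# jobs of complexity 1, 2, and 3 (average job factor 2).
print(box_complexity(1, [1, 2, 3]))  # 2 -> Medium
print(box_complexity(3, [1, 2, 3]))  # 3 -> Complex
```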
Control-M
The table below describes the logic for calculating Control-M folder and job complexity. The notes after the table describe how Smart folder complexity is determined.

| Folder and Job Complexity | Frequency | Dependencies | Comments |
| --- | --- | --- | --- |
| Simple | Every Day: the job is scheduled to run every day | <=1 | The complexity of a regular folder is always Simple. |
| Medium | Specific Dates: the job is scheduled to run on specific dates, such as Dec 30, June 15, etc. | 2 or 3 | The complexity of a Smart folder is determined as Medium or Complex based on the complexity of the jobs present in it (see the notes below). |
| Complex | Advanced schedule: the jobs are scheduled to run based on an advanced schedule, such as specific weekdays (for example, WEEKDAYS="1,4") or specific days of the month (for example, DAYS="1,6,4,17,27") | >=4 | |

Smart folder complexity is determined as follows:
- If the frequency of a Smart folder is Advanced schedule (scheduled to run on weekdays or specific days of the month), then the complexity is Complex.
- If the frequency of a Smart folder is Every Day (scheduled to run every day), then the complexity is Medium. If that Smart folder contains jobs with complexity higher than the Smart folder, its complexity is determined by the complexity level with the highest number of jobs.
- For instance, if the frequency complexity of a Smart folder is Medium and it contains 3 jobs, J1, J2, and J3, each with Complex complexity, then the complexity of the Smart folder is Complex. Similarly, if the frequency complexity of a Smart folder is Medium and it contains J1 and J2 with Medium complexity and J3 with Complex complexity, then the complexity of the Smart folder is Medium because the number of Medium complexity jobs (2) is greater than the number of Complex complexity jobs (1).
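The Smart folder rules above can be sketched as follows. This is an illustration only; the behaviour when two complexity levels are tied in job count is not defined by the rules and is left to Counter's ordering here.

```python
# Sketch of the Smart folder rule described above: an Advanced schedule makes
# the folder Complex; otherwise the folder starts at its frequency-based
# complexity and, if it holds jobs of higher complexity, takes the complexity
# level with the most jobs.

from collections import Counter

LEVELS = {"Simple": 1, "Medium": 2, "Complex": 3}

def smart_folder_complexity(frequency_complexity: str, job_complexities: list[str]) -> str:
    if frequency_complexity == "Complex":  # Advanced schedule
        return "Complex"
    if (not job_complexities
            or max(LEVELS[j] for j in job_complexities) <= LEVELS[frequency_complexity]):
        return frequency_complexity
    # Jobs exceed the folder's own complexity: take the most frequent job level.
    return Counter(job_complexities).most_common(1)[0][0]

# Examples from the text:
print(smart_folder_complexity("Medium", ["Complex", "Complex", "Complex"]))  # Complex
print(smart_folder_complexity("Medium", ["Medium", "Medium", "Complex"]))    # Medium
```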
Hadoop
The tables below describe the logic for calculating Hadoop complexity.
Hive/ Impala Job to Spark SQL
The table below describes the logic for calculating the complexity of Hive/ Impala jobs with Spark SQL as the target. It calculates the complexity based on the number of queries per job.

| Hive/ Impala Job to Spark SQL | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
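For illustration, the banding above can be expressed as a small helper. The same 1-5 banding applies to the other Hadoop tables in this section with the metric swapped (data frame operations, queries per job, or transformations), except the MapReduce/ Pig Latin table, which uses lines-of-code thresholds.

```python
# Minimal sketch mapping the number of queries per job to the 1-5 score above.

def hadoop_score(queries_per_job: int) -> int:
    for score, upper_bound in enumerate([5, 10, 20, 30], start=1):
        if queries_per_job <= upper_bound:
            return score
    return 5  # more than 30 queries per job

print(hadoop_score(12))  # 3
print(hadoop_score(45))  # 5
```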
PySpark/ Spark Job to PySpark/ Spark
The table below describes the logic for calculating the complexity of PySpark/ Spark jobs with PySpark/ Spark as the target. It calculates the complexity based on the number of data frame operations.

| PySpark/ Spark Job to PySpark/ Spark | Data Frame Operations |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
Hive/ Impala Job to Redshift/ Snowflake
The table below describes the logic for calculating the complexity of Hive/ Impala jobs with Redshift/ Snowflake as the target. It calculates the complexity based on the number of queries per job.

| Hive/ Impala Job to Redshift/ Snowflake | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
Spark SQL Job to Spark SQL
The table below describes the logic for calculating the complexity of Spark SQL jobs with Spark SQL as the target. It calculates the complexity based on the number of queries per job.

| Spark SQL Job to Spark SQL | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
MapReduce/ Pig Latin Job to PySpark/ Spark
The table below describes the logic for calculating the complexity of MapReduce/ Pig Latin jobs with PySpark/ Spark as the target. It calculates the complexity based on the lines of code.

| MapReduce/ Pig Latin Job to PySpark/ Spark | LOC |
| --- | --- |
| 1 | <=100 |
| 2 | <=500 |
| 3 | <=1000 |
| 4 | <=2500 |
| 5 | >2500 |
Spark SQL Job to Redshift/ Snowflake
The table below describes the logic for calculating the complexity of Spark SQL jobs with Redshift/ Snowflake as the target. It calculates the complexity based on the number of queries per job.

| Spark SQL Job to Redshift/ Snowflake | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
Sqoop/ Nifi Job to Spark SQL
The table below describes the logic for calculating the complexity of Sqoop/ Nifi jobs with Spark SQL as the target. It calculates the complexity based on the number of transformations.

| Sqoop/ Nifi Job to Spark SQL | Number of Transformations |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |