Complexity Calculation Logic
This topic describes the logic for calculating complexity. Based on the count of certain applicable patterns as per the workload type, the intelligent assessment engine calculates the workload complexity. As per the input workload, it may also consider variables, lines of code, expression length, or number of queries per job, etc.
In This Topic:
- EDW – Query (Teradata, Oracle, SQL Server, Vertica, Netezza, BigQuery, Snowflake, Redshift)
- EDW – Procedural Code
- ETL (Informatica, DataStage, Ab Initio, Talend, SSIS, Matillion, ODI)
- Analytics (SAS, Alteryx)
- BI (OBIEE, IBM Cognos, MicroStrategy, SAP BO, Tableau, SSRS, Power BI)
- Orchestration (AutoSys)
- Hadoop
EDW – Query
The below table describes the logic for calculating query complexity. It calculates the query complexity on the basis of minimum/maximum occurrences of functions, operations, sub-queries, joins, total length of the query, case clauses, and nested sub-queries.
Category |
Sub Category |
Min |
Max |
Complexity Number |
Recursive |
Recursive |
1 |
1 |
5 |
Window Functions |
windowfunctions |
1 |
4 |
1 |
windowfunctions |
5 |
10 |
2 |
windowfunctions |
11 |
|
3 |
Aggregate Functions |
aggregatefunctions |
1 |
4 |
1 |
aggregatefunctions |
5 |
10 |
2 |
aggregatefunctions |
11 |
|
3 |
Union |
union |
1 |
4 |
0.5 |
union |
5 |
|
2 |
Operation |
minus |
1 |
1 |
1 |
minus |
2 |
|
2 |
Sub Query clause |
subquery with select clause |
1 |
2 |
1 |
subquery with select clause |
3 |
4 |
1.5 |
subquery with select clause |
5 |
5 |
2 |
subquery with select clause |
6 |
6 |
2.5 |
subquery with select clause |
7 |
|
3 |
subquery with where clause |
1 |
2 |
0.5 |
subquery with where clause |
3 |
4 |
1 |
subquery with where clause |
5 |
6 |
2.5 |
subquery with where clause |
7 |
|
3 |
subquery with from clause |
1 |
2 |
0.5 |
subquery with from clause |
3 |
4 |
1 |
subquery with from clause |
5 |
|
2 |
subquery with other clause |
1 |
2 |
1 |
subquery with other clause |
3 |
4 |
2 |
subquery with other clause |
5 |
|
3 |
subquery with in /anyall/exists |
1 |
2 |
0.5 |
subquery with in /anyall/exists |
3 |
4 |
1.5 |
subquery with in /anyall/exists |
5 |
|
3 |
Join |
cross join |
1 |
2 |
0.5 |
cross join |
3 |
4 |
1 |
cross join |
5 |
6 |
2 |
cross join |
7 |
|
3 |
inner join |
1 |
2 |
0.5 |
inner join |
3 |
5 |
1 |
inner join |
6 |
8 |
2 |
inner join |
9 |
|
3 |
outer join |
1 |
2 |
0.5 |
outer join |
3 |
4 |
2 |
outer join |
5 |
6 |
3 |
outer join |
7 |
8 |
4 |
outer join |
9 |
|
5 |
Order By |
order by columns/(having/groupby) derived columns |
2 |
|
1 |
Length |
length |
5001 |
10000 |
1 |
length |
10001 |
20000 |
2 |
length |
30001 |
40000 |
3 |
length |
40001 |
50000 |
4 |
length |
50001 |
|
5 |
Join Condition |
where and join condition |
1 |
5 |
0 |
where and join condition |
6 |
10 |
0.5 |
where and join condition |
10 |
20 |
1 |
where and join condition |
21 |
30 |
2 |
where and join condition |
31 |
40 |
3 |
where and join condition |
40 |
|
4 |
Case Clause |
case |
1 |
2 |
0.5 |
case |
3 |
6 |
1 |
case |
6 |
|
1.5 |
Sub Query |
nested subquery |
1 |
1 |
1 |
nested subquery |
2 |
2 |
2 |
nested subquery |
3 |
3 |
3 |
nested subquery |
4 |
4 |
4 |
nested subquery |
5 |
|
5 |
select* |
|
|
|
1 |
EDW – Procedural Code
The below table describes the logic for calculating PL/SQL complexity. It calculates the PL/SQL complexity based on the number of queries, conditions, cursors, goto statements, loops, and lines of code.
Complexity |
Query Count |
Condition Count |
Cursor Count |
GoTo Count |
Line Of Code |
Loop Count |
Simple |
<=20 |
<=5 |
<=2 |
<=5 |
<=350 |
<=2 |
Medium |
>=21 and <=40 |
>=6 and <=10 |
>=3 and <=5 |
>=6 and <=10 |
>=351 and <=1000 |
>=3 and <=5 |
Complex |
>40 |
>10 |
>5 |
>10 |
>1000 |
>5 |
ETL
The below tables describe the logic for calculating ETL complexity.
Informatica
The below table describes the logic for calculating Informatica complexity. It calculates the complexity on the basis of components, transformations, expressions, variables, running aggregate functions, and length of columns.
Script Severity |
Components & Inputs/output |
Active/Passive Transformations |
Expressions |
Variables |
Running Agg |
Out Column Len |
Other Transformations |
Simple |
<=30 |
<=24 |
<=5 |
<=10 |
<=2 |
<=1000 |
<= 1 |
Medium |
>30 and <= 100 |
>24 and <= 88 |
>5 and <=10 |
>10 and <=30 |
>2 and <=6 |
>1000 and <=4000 |
>1 and <= 2 |
Complex |
>100 |
>88 |
>10 |
>30 |
>6 |
>4000 |
>2 |
DataStage
The below table describes the logic for calculating DataStage complexity. It calculates the complexity based on number of components, job type, transformations, and more.
Complexity |
Job Type |
Total Components |
Active/ Passive Transformations |
Other Transformation |
CTransformation Stage |
Simple |
Parallel/ Mainframe/ Server job |
<=30 |
<=24 |
<=1 |
<=5 |
Medium |
Parallel/ Mainframe/ Server job |
<=100 |
<=88 |
<=2 |
<=10 |
Complex |
Parallel/ Mainframe/ Server job |
>100 |
>88 |
>2 |
>10 |
Ab Initio
The below table describes the logic for calculating Ab Initio complexity. It calculates the complexity on the basis of number of components, lookup files, ebcdic files, dml, and xfr directory.
Script Severity |
Component Count |
Lookup File Count |
Ebcdic File Count |
Complex DML Count |
Complex XFR Count |
Simple |
<=20 |
<=2 |
<=2 |
<=2 |
<=2 |
Medium |
>20 and <=50 |
>2 and <=3 |
>2 and <=3 |
>2 and <=3 |
>2 and <=3 |
Complex |
> 50 |
>=4 |
>=4 |
>=4 |
>=4 |
Talend
The below table describes the logic for calculating Talend complexity. It calculates the complexity on the basis of tmaps, and components.
Script Severity |
Tmap Count |
Component Count |
1 |
<= 1 |
<=4 |
2 |
>1 & <=3 |
>4 & <=10 |
3 |
>3 & <=7 |
>10 & <=20 |
4 |
>7 & <=12 |
>20 & <=35 |
5 |
>12 |
>35 |
SSIS
The below table describes the logic for calculating SSIS complexity. It calculates the complexity on the basis of components and loops.
Graphs Severity (1-5 Scale) |
Total Components |
For Each Loop/For Loop |
Trivial (1) |
30 |
0 |
Simple (2) |
80 |
1 |
Medium (3) |
150 |
3 |
Complex (4) |
200 |
6 |
V Complex (5) |
200 |
6 |
Matillion
The below table describes the logic for calculating Matillion complexity. It calculates the complexity on the basis of joins, formulas, components, and more.
Category |
Simple |
Medium |
Complex |
Very Complex |
Number of SQL Queries |
<5 |
5-8 |
9-12 |
>12 |
Number of Joins |
<5 |
5-10 |
11-30 |
>30 |
Number of Iterators |
<5 |
5-10 |
11-20 |
>20 |
Number of Custom Logic |
0 |
1-4 |
5-20 |
>20 |
Number of Formulas / Calculators |
<10 |
10-12 |
13-40 |
>40 |
Number of Aggregations |
<5 |
5-10 |
11-30 |
>30 |
Number of Filters/ Conditions |
<10 |
11-15 |
16-50 |
>50 |
Number of Python Scripts |
<5 |
5-10 |
11-20 |
>20 |
Number of Orchestration |
<5 |
5-10 |
11-12 |
>12 |
Number of Components |
<60 |
61-100 |
101-250 |
>250 |
ODI
The below table shows the complexity matrix for ODI packages.
Complexity |
Trivial |
Simple |
Medium |
Complex |
V Complex |
V. V. Complex |
Complexity Number Range |
<=3 |
<=6 |
<=9 |
<=15 |
<=20 |
>20 |
Analytics
The below tables describe the logic for calculating the complexity of analytics scripts such as SAS and Alteryx.
SAS
The below table describes the logic for calculating SAS complexity. It calculates the SAS complexity based on the number of SQL statements, external resources, SQL lines, conditional procedural statements, and more.
Complexity |
Total SQL Statement |
External Resource |
Total Sql Lines |
Total Create Macro Lines |
External Acccess |
Conditional Procedural Statement |
Proc Base SAS |
Proc Other Advanced |
Proc SAS ETS |
Simple (1) |
<4 |
<4 |
200 |
200 |
4 |
3 |
4 |
1 |
1 |
Medium (2) |
<8 |
10 |
1000 |
500 |
8 |
50 |
10 |
2 |
2 |
High (3) |
<50 |
30 |
2000 |
1000 |
50 |
100 |
40 |
5 |
5 |
Very High (4) |
<100 |
80 |
3000 |
2000 |
100 |
500 |
100 |
10 |
10 |
Very Very High (5) |
>500 |
500 |
10000 |
10000 |
500 |
1000 |
500 |
50 |
50 |
Complexity |
Proc SAS STAT |
Proc SAS OR |
Proc SAS ODS |
Proc SQL Count |
Total Source Code Lines |
Total Statement Count |
Data Statement Count |
Create Macros |
Total Statement Count |
SAS Script Generation Statement Count |
Simple (1) |
1 |
1 |
1 |
4 |
100 |
50 |
3 |
8 |
50 |
0 |
Medium (2) |
2 |
2 |
2 |
8 |
500 |
100 |
10 |
15 |
200 |
0 |
High (3) |
5 |
5 |
5 |
50 |
1000 |
200 |
50 |
50 |
500 |
0 |
Very High (4) |
10 |
10 |
10 |
100 |
2000 |
500 |
200 |
100 |
1000 |
0 |
Very Very High (5) |
50 |
50 |
50 |
500 |
5000 |
1000 |
500 |
500 |
5000 |
1 |
Alteryx
The below table describes the logic for calculating Alteryx complexity. It calculates the complexity on the basis of nodes, length of expressions, transformations, and more.
Script Severity |
Total Nodes |
Expression Length |
AIML |
NLP |
Reporting |
SAS |
Control |
Transformation Basic |
Transformation Advanced |
InOut Advanced |
InOut Basic |
Display |
Trivial |
<=15 |
<=500 |
0 |
0 |
0 |
0 |
0 |
<=15 |
0 |
0 |
<=15 |
<15 |
Simple |
>15 & <=30 |
>500 & <=1000 |
0 |
0 |
0 |
0 |
<=1 |
>15 & <=30 |
0 |
0 |
>15 & <=30 |
<30 |
Medium |
>30 & <=100 |
>1000 & <=4000 |
<=1 |
0 |
0 |
0 |
>1 & <=2 |
>30 & <=100 |
<=5 |
<=5 |
>30 & <=100 |
<100 |
Complex |
>100 & <=200 |
>4000 & <=10000 |
>1 & <=5 |
<=1 |
<=1 |
0 |
>2 & <=5 |
>100 & <=200 |
>5 & <=10 |
>5 & <=10 |
>100 & <=200 |
<200 |
Very Complex |
>200 |
>10000 |
>5 |
>1 |
>1 |
>0 |
>5 |
>200 |
>10 |
>10 |
>200 |
>200 |
BI
The below tables describe the logic for calculating BI complexity.
OBIEE
The complexity of OBIEE reports depends on the number of columns, filters, views, or prompts. There are three complexity levels: simple, medium, and complex. The overall complexity of the report is determined by the highest complexity among the individual components.
The below table shows the complexity matrix for OBIEE reports.
Complexity |
Number of columns |
Number Of Filters |
Number Of Views |
Number Of Prompts |
Simple |
<= 5 |
<=1 |
<=2 |
<=1 |
Medium |
>5 AND <= 10 |
>1 AND <=2 |
>2 AND <=4 |
>1 AND <=2 |
Complex |
> 10 |
>2 |
>4 |
>2 |
For example, if the report has 4 columns (which falls under simple complexity), 1 view (which falls under simple complexity), and 3 filters (which falls under complex complexity), then overall report complexity is considered complex. This is because the system will always consider the highest complexity level among the individual components when determining overall report complexity.
IBM Cognos
The complexity of IBM Cognos is determined by calculating the object’s corresponding score and comparing it to predefined complexity values.
The following table describes the predefined values for complexity calculation for IBM Cognos.
Complexity |
Overall Score |
Simple |
<=3 |
Medium |
>3 and <=8 |
Complex |
> 8 |
The following table displays the objects and its corresponding values.
Objects |
Corresponding Score |
Page |
1 |
Visual |
1 |
Field count in report across the queries <= 5 |
0 |
Field count in report across the queries > 5 and <= 10 |
1 |
Field count in report across the queries > 10 and <= 20 |
2 |
Field count in report across the queries > 20 |
4 |
1 package |
0 |
2 packages |
2 |
> 2 packages |
4 |
Custom Queries |
4 + sum of complexity score of queries involved |
MicroStrategy
The table below describes the logic for calculating the dataset complexity of MicroStrategy. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
Dataset Complexity |
Total Column Count |
Expression Count (Filter Included) |
Table Count |
Join Count |
Simple |
100 |
30 |
1-10 |
<=10 |
Medium |
101-200 |
31-60 |
11-20 |
<=20 |
Complex |
201+ |
61+ |
21+ |
>21 |
The table below describes the logic for calculating the visual complexity of MicroStrategy. It calculates the complexity based on the count of columns, measures, and filters.
Visual Complexity |
Column Count (Dimensions and Masures) |
Measure Count |
Filter Count |
Simple |
<5 |
<3 |
<3 |
Medium |
5-10 |
3-6 |
3-5 |
Complex |
>10 |
>6 |
>5 |
SAP BO
The table below describes the logic for calculating the dataset complexity of SAP BO. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
Dataset Complexity |
Total Column Count |
Expression Count (Filter Included) |
Table Count |
Join Count |
Simple |
100 |
30 |
1-10 |
<=10 |
Medium |
101-200 |
31-60 |
11-20 |
<=20 |
Complex |
201+ |
61+ |
21+ |
>21 |
The table below describes the logic for calculating the visual complexity of SAP BO. It calculates the complexity based on the count of columns, measures, and filters.
Visual Complexity |
Column Count (Dimensions and Masures) |
Measure Count |
Filter Count |
Simple |
<5 |
<3 |
<3 |
Medium |
5-10 |
3-6 |
3-5 |
Complex |
>10 |
>6 |
>5 |
Tableau
The table below describes the logic for calculating the dataset complexity of Tableau. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
Dataset Complexity |
Total Column Count |
Expression Count (Filter Included) |
Table Count |
Join Count |
Simple |
100 |
30 |
1-7 |
<=6 |
Medium |
101-200 |
31-60 |
8-15 |
<=14 |
Complex |
201+ |
61+ |
16+ |
>15 |
The table below describes the logic for calculating the visual complexity of Tableau. It calculates the complexity based on the count of columns, measures, and filters.
Visual Complexity |
Column Count (Dimensions and Masures) |
Measure Count |
Filter Count |
Simple |
<5 |
<3 |
<3 |
Medium |
5-10 |
3-6 |
3-5 |
Complex |
>10 |
>6 |
>5 |
SSRS
The table below describes the logic for calculating the visual complexity of SSRS. It calculates the complexity based on the count of columns, calculated columns, filters and parameters.
Visual Complexity |
Total Column Count |
Calculated Columns Count |
Filter and Parameters count |
Comments |
Simple |
<5 |
<3 |
<3 |
Sub-reports should be considered as separate reports |
Medium |
5-10 |
3-6 |
3-5 |
Complex |
>10 |
>6 |
>5 |
Power BI
The table below describes the logic for calculating the dataset complexity of Power BI. It calculates the complexity based on the count of columns, measures, tables, and relationships.
Dataset Complexity |
Total Column Count |
Measure Count |
Table Count |
Relationship Count |
Simple |
0-50 |
0-14 |
1-5 |
<=4 |
Medium |
51-150 |
15-50 |
06-15 |
<=14 |
Complex |
150+ |
50+ |
15+ |
>14 |
The table below describes the logic for calculating the visual complexity of Power BI. It calculates the complexity based on the count of columns, measures, and filters.
Visual Complexity |
Column Count |
Measure Count |
Filter Count |
Simple |
0-3 |
<3 |
<4 |
Medium |
4-8 |
3-6 |
4-9 |
Complex |
>8 |
>6 |
>9 |
Orchestration – AutoSys
The below table describes the logic for calculating AutoSys job complexity.
Job Complexity |
Condition |
Custom Calendar |
Box scheduled jobs (Job and Parent scheduled at diff time) |
Comment |
Simple |
No |
No |
No |
If the three objects (Condition, Custom calendar, and Box scheduled jobs) are absent, then the complexity is Simple |
Medium |
|
|
|
If one or two of the objects (Condition, Custom calendar, Box scheduled jobs) are present, then the complexity is Medium |
Complex |
Yes |
Yes |
Yes |
If all three objects (Condition, Custom calendar, Box scheduled jobs) are present, then the complexity is Complex |
The below table describes the logic for calculating AutoSys box complexity. The box complexity is derived based on the number of job complexity.
Box |
Simple |
Medium |
Complex |
Comment |
Simple |
Higher count of "Simple" complexity |
|
|
- Higher count of Simple complexity: If a box contains a higher number of Simple complexity jobs compared to Medium or Complex complexity jobs, then the box complexity is considered Simple. For instance, if a box has 10 jobs with 5 simple, 3 Medium, and 2 Complex complexities, then it will be considered Simple complexity.
- Higher count of Medium complexity: If a box contains a higher number of Medium complexity jobs compared to Simple or Complex complexity jobs, then the box complexity is considered Medium. For instance, if a box has 10 jobs with 3 simple, 5 Medium and 2 Complex complexity, then it will be considered Medium complexity.
- Higher count of Complex complexity: If a box contains a higher number of Complex complexity jobs compared to Simple or Medium complexity jobs, then the box complexity is considered Complex. For instance, if a box has 10 jobs with 3 simple, 2 Medium and 5 Complex complexity, then it will be considered Complex complexity.
- Equal complexity counts: If the counts of Simple, Medium, and Complex complexity jobs are same, then it is considered Complex. For instance, if a box has 6 jobs with 2 Simple, 2 Medium and 2 Complex complexity, then it will be considered Complex complexity.
|
Medium |
|
Higher count of "Medium" complexity |
|
Complex |
|
|
Higher count of "Complex" complexity |
Hadoop
The below tables describe the logic for calculating Hadoop complexity.
Hive/ Impala Job to Spark SQL
The below table describes the logic for calculating the complexity of Hive/ Impala jobs with the target as Spark SQL. It calculates the complexity based on the number of queries per job.
Hive/ Impala Job to Spark SQL |
Number of queries per job |
1 |
<=5 |
2 |
<=10 |
3 |
<=20 |
4 |
<=30 |
5 |
>30 |
PySpark/ Spark Job to PySpark/ Spark
The below table describes the logic for calculating the complexity of PySpark/ Spark jobs with the target as PySpark/ Spark. It calculates the complexity on the basis of data frame operations.
PySpark/ Spark Job to Pyspark/ Spark |
Data Frame Operations |
1 |
<=5 |
2 |
<=10 |
3 |
<=20 |
4 |
<=30 |
5 |
>30 |
Hive/ Impala Job to Redshift/ Snowflake
The below table describes the logic for calculating the complexity of Hive/ Impala jobs with the target as Redshift/ Snowflake. It calculates the complexity based on the number of queries per job.
Hive/ Impala Job to Redshift/ Snowflake |
Number of queries per job |
1 |
<=5 |
2 |
<=10 |
3 |
<=20 |
4 |
<=30 |
5 |
>30 |
Spark SQL Job to Spark SQL
The below table describes the logic for calculating the complexity of Spark SQL jobs with the target as Spark SQL. It calculates the complexity based on the number of queries per job.
Spark SQL Job to Spark SQL |
Number of queries per job |
1 |
<=5 |
2 |
<=10 |
3 |
<=20 |
4 |
<=30 |
5 |
>30 |
MapReduce/ Pig Latin Job to PySpark/ Spark
The below table describes the logic for calculating the complexity of MapReduce/ Pig Latin jobs with the target as PySpark/ Spark. It calculates the complexity on the basis of lines of code.
MapReduce/ Pig Latin Job to Pyspark/ Spark |
LOC |
1 |
<=100 |
2 |
<=500 |
3 |
<=1000 |
4 |
<=2500 |
5 |
>2500 |
Spark SQL Job to Redshift/ Snowflake
The below table describes the logic for calculating the complexity of Spark SQL jobs with the target as Redshift/ Snowflake. It calculates the complexity based on the number of queries per job.
Spark SQL Job to Redshift/ Snowflake |
Number of queries per job |
1 |
<=5 |
2 |
<=10 |
3 |
<=20 |
4 |
<=30 |
5 |
>30 |
Sqoop/ Nifi Job to Spark SQL
The below table describes the logic for calculating the complexity of Sqoop/ Nifi jobs with the target as Spark SQL. It calculates the complexity on the basis of transformations.
Sqoop/ Nifi
Job to Spark SQL |
Number of transformations |
1 |
<=5 |
2 |
<=10 |
3 |
<=20 |
4 |
<=30 |
5 |
>30 |