Complexity Calculation Logic
This topic describes the logic for calculating complexity. The intelligent assessment engine calculates the workload complexity based on the count of the patterns applicable to the workload type. Depending on the input workload, it may also consider variables, lines of code, expression length, or the number of queries per job.
In This Topic:
- EDW – Query (Teradata, Oracle, SQL Server, Vertica, Netezza, BigQuery, Snowflake, Redshift)
- EDW – Procedural Code
- ETL (Informatica, DataStage, Ab Initio, Talend, SSIS, Matillion, ODI)
- Analytics (SAS, Alteryx)
- BI (OBIEE, IBM Cognos, MicroStrategy, SAP BO, Tableau, SSRS, Power BI)
- Orchestration (AutoSys, Control-M)
- Hadoop
EDW – Query
The table below describes the logic for calculating query complexity. It calculates the query complexity based on the minimum/maximum occurrences of functions, operations, sub-queries, joins, case clauses, and nested sub-queries, as well as the total length of the query.

| Category | Sub Category | Min | Max | Complexity Number |
| --- | --- | --- | --- | --- |
| Recursive | Recursive | 1 | 1 | 5 |
| Window Functions | windowfunctions | 1 | 4 | 1 |
| | windowfunctions | 5 | 10 | 2 |
| | windowfunctions | 11 | | 3 |
| Aggregate Functions | aggregatefunctions | 1 | 4 | 1 |
| | aggregatefunctions | 5 | 10 | 2 |
| | aggregatefunctions | 11 | | 3 |
| Union | union | 1 | 4 | 0.5 |
| | union | 5 | | 2 |
| Operation | minus | 1 | 1 | 1 |
| | minus | 2 | | 2 |
| Sub Query Clause | subquery with select clause | 1 | 2 | 1 |
| | subquery with select clause | 3 | 4 | 1.5 |
| | subquery with select clause | 5 | 5 | 2 |
| | subquery with select clause | 6 | 6 | 2.5 |
| | subquery with select clause | 7 | | 3 |
| | subquery with where clause | 1 | 2 | 0.5 |
| | subquery with where clause | 3 | 4 | 1 |
| | subquery with where clause | 5 | 6 | 2.5 |
| | subquery with where clause | 7 | | 3 |
| | subquery with from clause | 1 | 2 | 0.5 |
| | subquery with from clause | 3 | 4 | 1 |
| | subquery with from clause | 5 | | 2 |
| | subquery with other clause | 1 | 2 | 1 |
| | subquery with other clause | 3 | 4 | 2 |
| | subquery with other clause | 5 | | 3 |
| | subquery with in/any/all/exists | 1 | 2 | 0.5 |
| | subquery with in/any/all/exists | 3 | 4 | 1.5 |
| | subquery with in/any/all/exists | 5 | | 3 |
| Join | cross join | 1 | 2 | 0.5 |
| | cross join | 3 | 4 | 1 |
| | cross join | 5 | 6 | 2 |
| | cross join | 7 | | 3 |
| | inner join | 1 | 2 | 0.5 |
| | inner join | 3 | 5 | 1 |
| | inner join | 6 | 8 | 2 |
| | inner join | 9 | | 3 |
| | outer join | 1 | 2 | 0.5 |
| | outer join | 3 | 4 | 2 |
| | outer join | 5 | 6 | 3 |
| | outer join | 7 | 8 | 4 |
| | outer join | 9 | | 5 |
| Order By | order by columns/(having/group by) derived columns | 2 | | 1 |
| Length | length | 5001 | 10000 | 1 |
| | length | 10001 | 20000 | 2 |
| | length | 30001 | 40000 | 3 |
| | length | 40001 | 50000 | 4 |
| | length | 50001 | | 5 |
| Join Condition | where and join condition | 1 | 5 | 0 |
| | where and join condition | 6 | 10 | 0.5 |
| | where and join condition | 10 | 20 | 1 |
| | where and join condition | 21 | 30 | 2 |
| | where and join condition | 31 | 40 | 3 |
| | where and join condition | 40 | | 4 |
| Case Clause | case | 1 | 2 | 0.5 |
| | case | 3 | 6 | 1 |
| | case | 6 | | 1.5 |
| Sub Query | nested subquery | 1 | 1 | 1 |
| | nested subquery | 2 | 2 | 2 |
| | nested subquery | 3 | 3 | 3 |
| | nested subquery | 4 | 4 | 4 |
| | nested subquery | 5 | | 5 |
| select* | | | | 1 |
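As an illustration of how this table might be applied, the following Python sketch sums the complexity number of each pattern whose occurrence count falls within a Min/Max band. It reproduces only a few rows of the table, and the assumption that the per-pattern contributions are simply summed into an overall query score is ours for illustration, not a statement of the engine's internal implementation; the pattern counts are assumed to come from parsing the query.

```python
# Minimal sketch of range-based query scoring, assuming the engine sums the
# complexity number of every pattern whose count falls in a Min/Max band.
# The rules below reproduce a few rows of the table; None means "no upper bound".

QUERY_RULES = {
    "windowfunctions":    [(1, 4, 1), (5, 10, 2), (11, None, 3)],
    "aggregatefunctions": [(1, 4, 1), (5, 10, 2), (11, None, 3)],
    "union":              [(1, 4, 0.5), (5, None, 2)],
    "inner join":         [(1, 2, 0.5), (3, 5, 1), (6, 8, 2), (9, None, 3)],
    "outer join":         [(1, 2, 0.5), (3, 4, 2), (5, 6, 3), (7, 8, 4), (9, None, 5)],
    "nested subquery":    [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, None, 5)],
    "length":             [(5001, 10000, 1), (10001, 20000, 2), (30001, 40000, 3),
                           (40001, 50000, 4), (50001, None, 5)],
}

def pattern_score(pattern: str, count: int) -> float:
    """Return the complexity number for a pattern given its occurrence count."""
    for low, high, score in QUERY_RULES.get(pattern, []):
        if count >= low and (high is None or count <= high):
            return score
    return 0.0

def query_complexity(counts: dict[str, int]) -> float:
    """Sum the per-pattern contributions (assumed aggregation rule)."""
    return sum(pattern_score(p, c) for p, c in counts.items())

# Example: 2 inner joins, 6 aggregate functions, 1 nested subquery, 12,000-character query.
print(query_complexity({
    "inner join": 2, "aggregatefunctions": 6, "nested subquery": 1, "length": 12000,
}))  # 0.5 + 2 + 1 + 2 = 5.5
```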
EDW – Procedural Code
The table below describes the logic for calculating PL/SQL complexity. It calculates the PL/SQL complexity based on the number of queries, conditions, cursors, GOTO statements, loops, and lines of code.

| Complexity | Query Count | Condition Count | Cursor Count | GoTo Count | Lines of Code | Loop Count |
| --- | --- | --- | --- | --- | --- | --- |
| Simple | <=20 | <=5 | <=2 | <=5 | <=350 | <=2 |
| Medium | >=21 and <=40 | >=6 and <=10 | >=3 and <=5 | >=6 and <=10 | >=351 and <=1000 | >=3 and <=5 |
| Complex | >40 | >10 | >5 | >10 | >1000 | >5 |
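The following sketch shows one way the bands above could be applied in code. The combination rule, taking the highest band reached by any individual metric as the overall PL/SQL complexity, is an assumption made for illustration; the table itself does not state how the metrics are combined.

```python
# Minimal sketch of the PL/SQL classification, assuming the overall complexity
# is the highest band reached by any individual metric (an assumed rule).

PLSQL_BANDS = {
    # metric: (simple_max, medium_max); anything above medium_max is Complex
    "query_count":     (20, 40),
    "condition_count": (5, 10),
    "cursor_count":    (2, 5),
    "goto_count":      (5, 10),
    "lines_of_code":   (350, 1000),
    "loop_count":      (2, 5),
}

def plsql_complexity(metrics: dict[str, int]) -> str:
    level = "Simple"
    for name, value in metrics.items():
        simple_max, medium_max = PLSQL_BANDS[name]
        if value > medium_max:
            return "Complex"          # any metric in the Complex band wins
        if value > simple_max:
            level = "Medium"
    return level

# Example: 25 queries and 3 cursors push the object to Medium.
print(plsql_complexity({
    "query_count": 25, "condition_count": 4, "cursor_count": 3,
    "goto_count": 0, "lines_of_code": 300, "loop_count": 1,
}))  # Medium
```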
ETL
The tables below describe the logic for calculating ETL complexity.
Informatica
The table below describes the logic for calculating Informatica complexity. It calculates the complexity based on components, transformations, expressions, variables, running aggregate functions, and output column length.

| Script Severity | Components & Inputs/Outputs | Active/Passive Transformations | Expressions | Variables | Running Agg | Out Column Len | Other Transformations |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Simple | <=30 | <=24 | <=5 | <=10 | <=2 | <=1000 | <=1 |
| Medium | >30 and <=100 | >24 and <=88 | >5 and <=10 | >10 and <=30 | >2 and <=6 | >1000 and <=4000 | >1 and <=2 |
| Complex | >100 | >88 | >10 | >30 | >6 | >4000 | >2 |
DataStage
The table below describes the logic for calculating DataStage complexity. It calculates the complexity based on the job type, the number of components, transformations, and more.

| Complexity | Job Type | Total Components | Active/Passive Transformations | Other Transformation | CTransformation Stage |
| --- | --- | --- | --- | --- | --- |
| Simple | Parallel/ Mainframe/ Server job | <=30 | <=24 | <=1 | <=5 |
| Medium | Parallel/ Mainframe/ Server job | <=100 | <=88 | <=2 | <=10 |
| Complex | Parallel/ Mainframe/ Server job | >100 | >88 | >2 | >10 |
Ab Initio
The table below describes the logic for calculating Ab Initio complexity. It calculates the complexity based on the number of components, lookup files, EBCDIC files, complex DMLs, and complex XFRs.

| Script Severity | Component Count | Lookup File Count | Ebcdic File Count | Complex DML Count | Complex XFR Count |
| --- | --- | --- | --- | --- | --- |
| Simple | <=20 | <=2 | <=2 | <=2 | <=2 |
| Medium | >20 and <=50 | >2 and <=3 | >2 and <=3 | >2 and <=3 | >2 and <=3 |
| Complex | >50 | >=4 | >=4 | >=4 | >=4 |
Talend
The table below describes the logic for calculating Talend complexity. It calculates the complexity based on the number of tMaps and components.

| Script Severity | tMap Count | Component Count |
| --- | --- | --- |
| 1 | <=1 | <=4 |
| 2 | >1 & <=3 | >4 & <=10 |
| 3 | >3 & <=7 | >10 & <=20 |
| 4 | >7 & <=12 | >20 & <=35 |
| 5 | >12 | >35 |
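For illustration, the sketch below maps the tMap count and the component count to the 1-5 scale above. Taking the higher of the two severities as the script severity is an assumption; the table does not state how the two counts are combined.

```python
# Minimal sketch for the Talend 1-5 scale, assuming the script severity is the
# higher of the severities implied by the tMap count and the component count.

def band(value: int, upper_bounds: list[int]) -> int:
    """Return a 1-based severity: the first band whose upper bound covers value."""
    for severity, bound in enumerate(upper_bounds, start=1):
        if value <= bound:
            return severity
    return len(upper_bounds) + 1

def talend_severity(tmap_count: int, component_count: int) -> int:
    tmap_sev = band(tmap_count, [1, 3, 7, 12])               # <=1, <=3, <=7, <=12, >12
    component_sev = band(component_count, [4, 10, 20, 35])   # <=4, <=10, <=20, <=35, >35
    return max(tmap_sev, component_sev)                      # assumed combination rule

print(talend_severity(tmap_count=2, component_count=18))  # max(2, 3) = 3
```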
SSIS
The table below describes the logic for calculating SSIS complexity. It calculates the complexity based on the number of components and loops.

| Graphs Severity (1-5 Scale) | Total Components | For Each Loop/For Loop |
| --- | --- | --- |
| Trivial (1) | 30 | 0 |
| Simple (2) | 80 | 1 |
| Medium (3) | 150 | 3 |
| Complex (4) | 200 | 6 |
| V Complex (5) | 200 | 6 |
Matillion
The table below describes the logic for calculating Matillion complexity. It calculates the complexity based on the number of joins, formulas, components, and more.

| Category | Simple | Medium | Complex | Very Complex |
| --- | --- | --- | --- | --- |
| Number of SQL Queries | <5 | 5-8 | 9-12 | >12 |
| Number of Joins | <5 | 5-10 | 11-30 | >30 |
| Number of Iterators | <5 | 5-10 | 11-20 | >20 |
| Number of Custom Logic | 0 | 1-4 | 5-20 | >20 |
| Number of Formulas / Calculators | <10 | 10-12 | 13-40 | >40 |
| Number of Aggregations | <5 | 5-10 | 11-30 | >30 |
| Number of Filters/ Conditions | <10 | 11-15 | 16-50 | >50 |
| Number of Python Scripts | <5 | 5-10 | 11-20 | >20 |
| Number of Orchestrations | <5 | 5-10 | 11-12 | >12 |
| Number of Components | <60 | 61-100 | 101-250 | >250 |
ODI
The table below shows the complexity matrix for ODI packages.

| Complexity | Trivial | Simple | Medium | Complex | V Complex | V. V. Complex |
| --- | --- | --- | --- | --- | --- | --- |
| Complexity Number Range | <=3 | <=6 | <=9 | <=15 | <=20 | >20 |
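The mapping from a package's complexity number to its bucket can be expressed directly, as in this minimal sketch.

```python
# Minimal sketch mapping an ODI package's complexity number to its bucket,
# following the ranges in the table above.

def odi_bucket(complexity_number: float) -> str:
    for bound, label in [(3, "Trivial"), (6, "Simple"), (9, "Medium"),
                         (15, "Complex"), (20, "V Complex")]:
        if complexity_number <= bound:
            return label
    return "V. V. Complex"

print(odi_bucket(7))   # Medium
print(odi_bucket(22))  # V. V. Complex
```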
Analytics
The tables below describe the logic for calculating the complexity of analytics scripts such as SAS and Alteryx.
SAS
The tables below describe the logic for calculating SAS complexity. They calculate the SAS complexity based on the number of SQL statements, external resources, SQL lines, conditional procedural statements, and more.

| Complexity | Total SQL Statement | External Resource | Total SQL Lines | Total Create Macro Lines | External Access | Conditional Procedural Statement | Proc Base SAS | Proc Other Advanced | Proc SAS ETS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Simple (1) | <4 | <4 | 200 | 200 | 4 | 3 | 4 | 1 | 1 |
| Medium (2) | <8 | 10 | 1000 | 500 | 8 | 50 | 10 | 2 | 2 |
| High (3) | <50 | 30 | 2000 | 1000 | 50 | 100 | 40 | 5 | 5 |
| Very High (4) | <100 | 80 | 3000 | 2000 | 100 | 500 | 100 | 10 | 10 |
| Very Very High (5) | >500 | 500 | 10000 | 10000 | 500 | 1000 | 500 | 50 | 50 |

| Complexity | Proc SAS STAT | Proc SAS OR | Proc SAS ODS | Proc SQL Count | Total Source Code Lines | Total Statement Count | Data Statement Count | Create Macros | Total Statement Count | SAS Script Generation Statement Count |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Simple (1) | 1 | 1 | 1 | 4 | 100 | 50 | 3 | 8 | 50 | 0 |
| Medium (2) | 2 | 2 | 2 | 8 | 500 | 100 | 10 | 15 | 200 | 0 |
| High (3) | 5 | 5 | 5 | 50 | 1000 | 200 | 50 | 50 | 500 | 0 |
| Very High (4) | 10 | 10 | 10 | 100 | 2000 | 500 | 200 | 100 | 1000 | 0 |
| Very Very High (5) | 50 | 50 | 50 | 500 | 5000 | 1000 | 500 | 500 | 5000 | 1 |
Alteryx
The table below describes the logic for calculating Alteryx complexity. It calculates the complexity based on the number of nodes, the length of expressions, transformations, and more.

| Script Severity | Total Nodes | Expression Length | AIML | NLP | Reporting | SAS | Control | Transformation Basic | Transformation Advanced | InOut Advanced | InOut Basic | Display |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Trivial | <=15 | <=500 | 0 | 0 | 0 | 0 | 0 | <=15 | 0 | 0 | <=15 | <15 |
| Simple | >15 & <=30 | >500 & <=1000 | 0 | 0 | 0 | 0 | <=1 | >15 & <=30 | 0 | 0 | >15 & <=30 | <30 |
| Medium | >30 & <=100 | >1000 & <=4000 | <=1 | 0 | 0 | 0 | >1 & <=2 | >30 & <=100 | <=5 | <=5 | >30 & <=100 | <100 |
| Complex | >100 & <=200 | >4000 & <=10000 | >1 & <=5 | <=1 | <=1 | 0 | >2 & <=5 | >100 & <=200 | >5 & <=10 | >5 & <=10 | >100 & <=200 | <200 |
| Very Complex | >200 | >10000 | >5 | >1 | >1 | >0 | >5 | >200 | >10 | >10 | >200 | >200 |
BI
The tables below describe the logic for calculating BI complexity.
OBIEE
The complexity of OBIEE reports depends on the number of columns, filters, views, and prompts. There are three complexity levels: Simple, Medium, and Complex. The overall complexity of a report is determined by the highest complexity among its individual components.
The table below shows the complexity matrix for OBIEE reports.

| Complexity | Number of Columns | Number of Filters | Number of Views | Number of Prompts |
| --- | --- | --- | --- | --- |
| Simple | <=5 | <=1 | <=2 | <=1 |
| Medium | >5 and <=10 | >1 and <=2 | >2 and <=4 | >1 and <=2 |
| Complex | >10 | >2 | >4 | >2 |
For example, if a report has 4 columns (Simple), 1 view (Simple), and 3 filters (Complex), the overall report complexity is Complex. This is because the system always takes the highest complexity level among the individual components when determining the overall report complexity.
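The rule described above, bucket each attribute individually and take the highest bucket, can be sketched as follows; the attribute names used here are illustrative.

```python
# Sketch of the OBIEE rule described above: each attribute is bucketed
# individually and the report takes the highest bucket among them.

OBIEE_BANDS = {
    # attribute: (simple_max, medium_max); above medium_max is Complex
    "columns": (5, 10),
    "filters": (1, 2),
    "views":   (2, 4),
    "prompts": (1, 2),
}

LEVELS = ["Simple", "Medium", "Complex"]

def obiee_report_complexity(report: dict[str, int]) -> str:
    worst = 0
    for attribute, value in report.items():
        simple_max, medium_max = OBIEE_BANDS[attribute]
        level = 2 if value > medium_max else (1 if value > simple_max else 0)
        worst = max(worst, level)
    return LEVELS[worst]

# The example from the text: 4 columns (Simple), 1 view (Simple), 3 filters (Complex).
print(obiee_report_complexity({"columns": 4, "views": 1, "filters": 3}))  # Complex
```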
IBM Cognos
The complexity of IBM Cognos reports is determined by calculating an overall score from the report's objects and comparing it to predefined complexity thresholds.
The following table describes the predefined thresholds for the IBM Cognos complexity calculation.

| Complexity | Overall Score |
| --- | --- |
| Simple | <=3 |
| Medium | >3 and <=8 |
| Complex | >8 |

The following table displays the objects and their corresponding scores.

| Objects | Corresponding Score |
| --- | --- |
| Page | 1 |
| Visual | 1 |
| Field count in report across the queries <= 5 | 0 |
| Field count in report across the queries > 5 and <= 10 | 1 |
| Field count in report across the queries > 10 and <= 20 | 2 |
| Field count in report across the queries > 20 | 4 |
| 1 package | 0 |
| 2 packages | 2 |
| > 2 packages | 4 |
| Custom Queries | 4 + the sum of the complexity scores of the queries involved |
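For illustration, the following sketch computes an overall Cognos score by summing the object scores from the table and mapping the total to a complexity level. Summing the individual scores is an assumption, and the contribution of custom queries is passed in pre-computed rather than derived from the queries themselves.

```python
# Minimal sketch of the IBM Cognos scoring, assuming the overall score is the
# sum of the object scores in the table above.

def field_count_score(field_count: int) -> int:
    if field_count <= 5:
        return 0
    if field_count <= 10:
        return 1
    if field_count <= 20:
        return 2
    return 4

def package_score(package_count: int) -> int:
    if package_count <= 1:
        return 0
    if package_count == 2:
        return 2
    return 4

def cognos_complexity(pages: int, visuals: int, field_count: int,
                      packages: int, custom_query_score: int = 0) -> str:
    score = (pages * 1 + visuals * 1 + field_count_score(field_count)
             + package_score(packages) + custom_query_score)
    if score <= 3:
        return "Simple"
    if score <= 8:
        return "Medium"
    return "Complex"

# Example: 2 pages, 4 visuals, 12 fields, 1 package -> 2 + 4 + 2 + 0 = 8 -> Medium
print(cognos_complexity(pages=2, visuals=4, field_count=12, packages=1))  # Medium
```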
MicroStrategy
The table below describes the logic for calculating the dataset complexity of MicroStrategy. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
| Dataset Complexity | Total Column Count | Expression Count (Filter Included) | Table Count | Join Count |
| --- | --- | --- | --- | --- |
| Simple | 100 | 30 | 1-10 | <=10 |
| Medium | 101-200 | 31-60 | 11-20 | <=20 |
| Complex | 201+ | 61+ | 21+ | >21 |
The table below describes the logic for calculating the visual complexity of MicroStrategy. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count (Dimensions and Measures) | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | <5 | <3 | <3 |
| Medium | 5-10 | 3-6 | 3-5 |
| Complex | >10 | >6 | >5 |
SAP BO
The table below describes the logic for calculating the dataset complexity of SAP BO. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
| Dataset Complexity | Total Column Count | Expression Count (Filter Included) | Table Count | Join Count |
| --- | --- | --- | --- | --- |
| Simple | 100 | 30 | 1-10 | <=10 |
| Medium | 101-200 | 31-60 | 11-20 | <=20 |
| Complex | 201+ | 61+ | 21+ | >21 |
The table below describes the logic for calculating the visual complexity of SAP BO. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count (Dimensions and Measures) | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | <5 | <3 | <3 |
| Medium | 5-10 | 3-6 | 3-5 |
| Complex | >10 | >6 | >5 |
Tableau
The table below describes the logic for calculating the dataset complexity of Tableau. It calculates the complexity based on the count of columns, expressions, tables, and join statements.
| Dataset Complexity | Total Column Count | Expression Count (Filter Included) | Table Count | Join Count |
| --- | --- | --- | --- | --- |
| Simple | 100 | 30 | 1-7 | <=6 |
| Medium | 101-200 | 31-60 | 8-15 | <=14 |
| Complex | 201+ | 61+ | 16+ | >15 |
The table below describes the logic for calculating the visual complexity of Tableau. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count (Dimensions and Measures) | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | <5 | <3 | <3 |
| Medium | 5-10 | 3-6 | 3-5 |
| Complex | >10 | >6 | >5 |
SSRS
The table below describes the logic for calculating the visual complexity of SSRS. It calculates the complexity based on the count of columns, calculated columns, filters and parameters.
| Visual Complexity | Total Column Count | Calculated Columns Count | Filter and Parameters Count | Comments |
| --- | --- | --- | --- | --- |
| Simple | <5 | <3 | <3 | Sub-reports should be considered as separate reports |
| Medium | 5-10 | 3-6 | 3-5 | |
| Complex | >10 | >6 | >5 | |
Power BI
The table below describes the logic for calculating the dataset complexity of Power BI. It calculates the complexity based on the count of columns, measures, tables, and relationships.
| Dataset Complexity | Total Column Count | Measure Count | Table Count | Relationship Count |
| --- | --- | --- | --- | --- |
| Simple | 0-50 | 0-14 | 1-5 | <=4 |
| Medium | 51-150 | 15-50 | 6-15 | <=14 |
| Complex | 150+ | 50+ | 15+ | >14 |
The table below describes the logic for calculating the visual complexity of Power BI. It calculates the complexity based on the count of columns, measures, and filters.
| Visual Complexity | Column Count | Measure Count | Filter Count |
| --- | --- | --- | --- |
| Simple | 0-3 | <3 | <4 |
| Medium | 4-8 | 3-6 | 4-9 |
| Complex | >8 | >6 | >9 |
Orchestration
The tables below describe the logic for calculating the complexity of orchestration scripts such as AutoSys and Control-M.
AutoSys
The table below describes the logic for calculating AutoSys job complexity.

| Job Complexity | Condition | Custom Calendar | Box Scheduled Jobs (Job and Parent Scheduled at Different Times) | Comment |
| --- | --- | --- | --- | --- |
| Simple | No | No | No | If all three objects (Condition, Custom Calendar, and Box Scheduled Jobs) are absent, the complexity is Simple |
| Medium | | | | If one or two of the objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Medium |
| Complex | Yes | Yes | Yes | If all three objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Complex |
The box complexity is determined based on the highest complexity factor of the box or the jobs within the box. If the box contains multiple jobs, the job complexity factor is the average of the complexities of all the jobs in the box.
The table below describes the logic for calculating AutoSys box complexity.

| Box Complexity | Condition | Custom Calendar | Box Scheduled Jobs (Job and Parent Scheduled at Different Times) | Comment |
| --- | --- | --- | --- | --- |
| Simple | No | No | No | If all three objects (Condition, Custom Calendar, and Box Scheduled Jobs) are absent, the complexity is Simple |
| Medium | | | | If one or two of the objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Medium |
| Complex | Yes | Yes | Yes | If all three objects (Condition, Custom Calendar, Box Scheduled Jobs) are present, the complexity is Complex |

\* If the box contains multiple jobs, the box complexity is determined by the highest complexity factor between the box itself and the jobs within it.
The complexity levels are categorized as Simple (1), Medium (2), and Complex (3). The following scenarios show how box complexity is calculated when a box contains multiple jobs.
If a box has Simple (1) complexity and contains 3 jobs, J1, J2, and J3, with complexity values 1, 2, and 3 respectively, then the box and job complexity factors are as follows:
- The box complexity factor is 1, because the box complexity is Simple.
- The job complexity factor is 2, because the average job complexity is 2 (the sum of the job complexity values divided by the number of jobs).
Here, the complexity of the box is considered Medium (2) because the job complexity factor (2) is higher than the box complexity factor (1).
Similarly, if the box complexity factor is 3 and the job complexity factor is 2, the box complexity is considered Complex (3) because the box complexity factor (3) is higher than the job complexity factor (2).
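The scenarios above can be summarized in a short sketch: a job's complexity factor follows from how many of the three objects are present, and a box takes the higher of its own complexity factor and the average complexity factor of its jobs. Rounding the average to the nearest level is an assumption for cases where it is not a whole number.

```python
# Sketch of the AutoSys rules above: a job is Simple (1), Medium (2), or
# Complex (3) depending on how many of the three objects (condition, custom
# calendar, box-scheduled job) are present, and a box takes the higher of its
# own complexity factor and the average complexity of the jobs it contains.

def job_complexity(condition: bool, custom_calendar: bool, box_scheduled: bool) -> int:
    present = sum([condition, custom_calendar, box_scheduled])
    if present == 0:
        return 1  # Simple: all three objects absent
    if present == 3:
        return 3  # Complex: all three objects present
    return 2      # Medium: one or two objects present

def box_complexity(box_factor: int, job_factors: list[int]) -> int:
    job_factor = sum(job_factors) / len(job_factors)  # average job complexity
    return max(box_factor, round(job_factor))         # assumed rounding of the average

# Scenarios from the text: a Simple box (1) and a Complex box (3), each with
# jobs of complexity 1, 2, and 3 (average job factor 2).
print(box_complexity(1, [1, 2, 3]))  # 2 -> Medium
print(box_complexity(3, [1, 2, 3]))  # 3 -> Complex
```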
Control-M
The table below describes the logic for calculating Control-M folder and job complexity. The notes after the table describe how Smart folder complexity is determined.

| Folder and Job Complexity | Frequency | Dependencies | Comments |
| --- | --- | --- | --- |
| Simple | Every Day: the job is scheduled to run every day | <=1 | The complexity of a regular folder is always Simple. |
| Medium | Specific Dates: the job is scheduled to run on specific dates, such as Dec 30, June 15, etc. | 2 or 3 | The complexity of a Smart folder is determined as Medium or Complex based on the complexity of the jobs present in it (see the notes below). |
| Complex | Advanced schedule: the jobs are scheduled to run based on an advanced schedule, such as specific weekdays (for example, WEEKDAYS="1,4") or specific days of the month (for example, DAYS="1,6,4,17,27") | >=4 | |

Smart folder complexity is determined as follows:
- If the frequency of a Smart folder is Advanced schedule (scheduled to run on weekdays or specific days of the month), then the complexity is Complex.
- If the frequency of a Smart folder is Every Day (scheduled to run every day), then the complexity is Medium. If that Smart folder contains jobs with complexity higher than the Smart folder, its complexity is determined by the complexity level with the highest number of jobs.
- For instance, if the frequency complexity of a Smart folder is Medium and it contains 3 jobs, J1, J2, and J3, each with Complex complexity, then the complexity of the Smart folder is Complex. Similarly, if the frequency complexity of a Smart folder is Medium and it contains J1 and J2 with Medium complexity and J3 with Complex complexity, then the complexity of the Smart folder is Medium because the number of Medium complexity jobs (2) is greater than the number of Complex complexity jobs (1).
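The Smart folder rules above can be sketched as follows. This is an illustration only; the behaviour when two complexity levels are tied in job count is not defined by the rules and is left to Counter's ordering here.

```python
# Sketch of the Smart folder rule described above: an Advanced schedule makes
# the folder Complex; otherwise the folder starts at its frequency-based
# complexity and, if it holds jobs of higher complexity, takes the complexity
# level with the most jobs.

from collections import Counter

LEVELS = {"Simple": 1, "Medium": 2, "Complex": 3}

def smart_folder_complexity(frequency_complexity: str, job_complexities: list[str]) -> str:
    if frequency_complexity == "Complex":  # Advanced schedule
        return "Complex"
    if (not job_complexities
            or max(LEVELS[j] for j in job_complexities) <= LEVELS[frequency_complexity]):
        return frequency_complexity
    # Jobs exceed the folder's own complexity: take the most frequent job level.
    return Counter(job_complexities).most_common(1)[0][0]

# Examples from the text:
print(smart_folder_complexity("Medium", ["Complex", "Complex", "Complex"]))  # Complex
print(smart_folder_complexity("Medium", ["Medium", "Medium", "Complex"]))    # Medium
```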
Hadoop
The tables below describe the logic for calculating Hadoop complexity.
Hive/ Impala Job to Spark SQL
The table below describes the logic for calculating the complexity of Hive/ Impala jobs with Spark SQL as the target. It calculates the complexity based on the number of queries per job.

| Hive/ Impala Job to Spark SQL | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
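For illustration, the banding above can be expressed as a small helper. The same 1-5 banding applies to the other Hadoop tables in this section with the metric swapped (data frame operations, queries per job, or transformations), except the MapReduce/ Pig Latin table, which uses lines-of-code thresholds.

```python
# Minimal sketch mapping the number of queries per job to the 1-5 score above.

def hadoop_score(queries_per_job: int) -> int:
    for score, upper_bound in enumerate([5, 10, 20, 30], start=1):
        if queries_per_job <= upper_bound:
            return score
    return 5  # more than 30 queries per job

print(hadoop_score(12))  # 3
print(hadoop_score(45))  # 5
```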
PySpark/ Spark Job to PySpark/ Spark
The table below describes the logic for calculating the complexity of PySpark/ Spark jobs with PySpark/ Spark as the target. It calculates the complexity based on the number of data frame operations.

| PySpark/ Spark Job to PySpark/ Spark | Data Frame Operations |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
Hive/ Impala Job to Redshift/ Snowflake
The table below describes the logic for calculating the complexity of Hive/ Impala jobs with Redshift/ Snowflake as the target. It calculates the complexity based on the number of queries per job.

| Hive/ Impala Job to Redshift/ Snowflake | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
Spark SQL Job to Spark SQL
The table below describes the logic for calculating the complexity of Spark SQL jobs with Spark SQL as the target. It calculates the complexity based on the number of queries per job.

| Spark SQL Job to Spark SQL | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
MapReduce/ Pig Latin Job to PySpark/ Spark
The table below describes the logic for calculating the complexity of MapReduce/ Pig Latin jobs with PySpark/ Spark as the target. It calculates the complexity based on the lines of code.

| MapReduce/ Pig Latin Job to PySpark/ Spark | LOC |
| --- | --- |
| 1 | <=100 |
| 2 | <=500 |
| 3 | <=1000 |
| 4 | <=2500 |
| 5 | >2500 |
Spark SQL Job to Redshift/ Snowflake
The table below describes the logic for calculating the complexity of Spark SQL jobs with Redshift/ Snowflake as the target. It calculates the complexity based on the number of queries per job.

| Spark SQL Job to Redshift/ Snowflake | Number of Queries per Job |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |
Sqoop/ Nifi Job to Spark SQL
The table below describes the logic for calculating the complexity of Sqoop/ Nifi jobs with Spark SQL as the target. It calculates the complexity based on the number of transformations.

| Sqoop/ Nifi Job to Spark SQL | Number of Transformations |
| --- | --- |
| 1 | <=5 |
| 2 | <=10 |
| 3 | <=20 |
| 4 | <=30 |
| 5 | >30 |