Netezza Query Execution Logs Extraction and Source Code Prerequisites
This topic describes the query execution log extraction and source code prerequisites for the Netezza assessment.
Introduction
LeapLogic Assessment profiles the existing inventory, identifies complexity and lineage, and provides comprehensive recommendations for migration to modern data platforms such as Cloud and Hadoop.
Assessment checklist
Assessment Process
All the environment-specific variables are highlighted. The start and end dates must be consistent across all the queries.
Logs
This requires the query execution logs in CSV format. The CSV file must contain the following columns (as a #-separated file), retrieved from the _v_qryhist table of Netezza.
UserName#AppID#ClientID#StartTime#AMPCPUTime#TotalIOCount#ParserCPUTime#FirstRespTime#FirstStepTime#ProcID#QueryID#MaxAMPCPUTime#MaxAmpIO#Query_Execution_Time#QueryText#schemaName#totalCPU#totalIO
Netezza query logs with wrapper script
Artifacts
Download the above file and rename it to nz_query_log.ksh.
Follow the steps below to execute the wrapper script that extracts the query logs from Netezza (a consolidated command example appears after this list).
- Log in to your Netezza server or Netezza client machine (Linux-based) where the Netezza client utilities are installed
- Create a folder and copy the file nz_query_log.ksh into it
- Edit the nz_query_log.ksh file with the environment details and the date range for which the logs need to be fetched, as specified in Box-1
- Give execute permission to the nz_query_log.ksh file by running the command: chmod +x nz_query_log.ksh
- Run the following commands to execute the shell script:
- Ensure that the current directory is the above-created folder
- Run the command ./nz_query_log.ksh
- Executing the preceding step generates the output file, in the form of query logs, inside the output folder path that you specify in the OUT_FILE_PATH parameter of the nz_query_log.ksh file
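For reference, the sequence below consolidates these steps into shell commands; the folder path is illustrative only and should be replaced with your own.
# Illustrative sequence; the folder path is an example only.
mkdir -p /export/home/nz/assessment
cp nz_query_log.ksh /export/home/nz/assessment/
cd /export/home/nz/assessment
# Edit the Box-1 parameters in nz_query_log.ksh before running.
chmod +x nz_query_log.ksh
./nz_query_log.ksh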
Box-1: Required parameter entries in the nz_query_log.ksh script.
Edit the values of the following configuration parameters in the shell script (nz_query_log.ksh) according to your specific use case:
# Path of the folder where the log files will be generated
OUT_FILE_PATH='/export/home/nz/nz_hist_log'
# Input variables (start date, end date, Netezza db username, Netezza db password)
INPUT_START_DT=2020-04-01
INPUT_END_DT=2020-05-31
NZ_USER=admin
NZ_PASSWORD=password
# Provide the comma-separated database names for which log files will be generated. Leave this parameter blank to generate log files for all the databases.
LOG_ASSESSMENT_DATABASES=NEWTPCDS,TPDS
# Provide the name of the history database
HISTDB=HISTDB
# Provide the name of the history database user
HISTUSER=HISTDBOWNER
Except for LOG_ASSESSMENT_DATABASES, setting all the other variables listed above is mandatory.
The query log files are created for the given date range. For example, with the start and end dates set in the shell script variables above, the log file is generated with the name:
nz_out.log.2020-04-01_2020-05-31
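As an optional sanity check (not part of the wrapper script), you can confirm that the generated file exists and that its records carry the 18 expected #-separated columns; the paths below assume the sample OUT_FILE_PATH and date range shown above.
# Illustrative check; adjust the folder to your OUT_FILE_PATH value.
ls -lh /export/home/nz/nz_hist_log/nz_out.log.2020-04-01_2020-05-31
# Print the field count of the first line; 18 '#'-separated columns are expected.
head -1 /export/home/nz/nz_hist_log/nz_out.log.2020-04-01_2020-05-31 | awk -F'#' '{print NF}'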
Total I/O Usage by Day
Save the results of this query in a CSV file:
select cast(START_TIME as date) as startDate,
cast(END_TIME as date) as endDate,
SUM( HOST_DISK_READ_SECS + HOST_DISK_WRITE_SECS + HOST_FABRIC_SECS + SPU_DISK_READ_SECS + SPU_DISK_WRITE_SECS + SPU_FABRIC_SECS + SPU_DATA_DISK_READ_SECS + SPU_DATA_DISK_WRITE_SECS + SPU_TEMP_DISK_READ_SECS + SPU_TEMP_DISK_WRITE_SECS ) as TOTALREDWRITE
from _V_SCHED_GRA_EXT
group by 1,2
-- limit 10
Replace startDate and endDate with the actual assessment start and end dates for the export.
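If you are extracting the results from a Linux machine with the Netezza client installed, one possible way to capture them as a CSV file is shown below; the host, credentials, and file names are placeholders, and the same pattern can be reused for the other queries in this topic (assuming standard nzsql options).
# Placeholder connection values; save the query above as total_io_by_day.sql first.
# -A -t -F ',' produce unaligned, comma-separated rows without headers or footers.
nzsql -host nzhost -d SYSTEM -u admin -pw password -A -t -F ',' \
      -f total_io_by_day.sql -o total_io_by_day.csv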
Hardware Configuration
The hardware configuration is required for the calculation of total CPU utilization.
- Netezza series and specs (Cores, RAM, HDD/SSD)
- Number of Netezza nodes
- Total available AMPCPU seconds per day (an example calculation follows this list)
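The exact formula used by the assessment for the last item is not stated here; as a rough, assumed approximation, the available CPU seconds per day can be derived from the node and core counts, as sketched below.
# Assumed approximation (not a documented formula):
# CPU seconds per day = number of nodes x cores per node x 86,400 seconds in a day.
NODES=4            # hypothetical value
CORES_PER_NODE=16  # hypothetical value
echo $(( NODES * CORES_PER_NODE * 86400 ))   # 5529600 for this example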
Database Object Count
This query provides the object count per database and object type. Save its results in a CSV file.
select database::nvarchar(64) as database,OBJTYPE, count(OBJNAME) as Tables_count
from _V_OBJ_RELATION_XDB
group by database,OBJTYPE;
Database Volume
This query provides the allocated space (in GB) per database. Save its results in a CSV file.
SELECT ORX.database::nvarchar(64) AS "DatabaseName",
case when sum (SOD.allocated_bytes) is null then 0 else SUM(SOD.allocated_bytes)/1073741824 end AS "AllocatedSpace_GB"
FROM _V_SYS_OBJECT_DSLICE_INFO SOD INNER JOIN _V_OBJ_RELATION_XDB ORX ON ORX.objid = SOD.tblid
GROUP BY "DatabaseName"
ORDER BY "DatabaseName";
High Data Volume Tables
This query provides results for tables with high data volume. Save its results in a CSV file.
select objname as table_name,
database as db_name,
(allocated_bytes/1048576) as allocated_mbytes
from _v_sys_relation_xdb sys,
_v_sys_object_dslice_info ds
where ds.tblid = sys.objid
and dsid in (1,2,3,4)
and allocated_mbytes > 10240
order by
allocated_mbytes desc,
table_name,
db_name,
dsid;
This SQL collects tables with a data volume equal to or above 10 GB.
Databases
This query lists all the databases. Save its results in a CSV file.
select distinct database::nvarchar(64) from _V_OBJ_RELATION_XDB;
Users
This query lists the users along with their groups and owners. Save its results in a CSV file.
SELECT GROUPNAME,OWNER,USERNAME FROM _V_GROUPUSERS;
Data for Partitioning and Bucketing
This query extracts table size and column details, which are used for partitioning and bucketing recommendations. Save the results in a CSV file.
Note that this query returns details only for the database you are currently connected to; run it for all databases. Also, generate statistics before executing this query, as shown below.
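For reference, statistics can be generated with Netezza's GENERATE STATISTICS command, run once against each in-scope database; the database name and credentials below are placeholders.
# Placeholder connection values; GENERATE STATISTICS refreshes statistics
# for all tables in the connected database. Repeat for each in-scope database.
nzsql -d NEWTPCDS -u admin -pw password -c "GENERATE STATISTICS;"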
select DATABASE_NAME, Table_Name, TABLE_SIZE_BYTES, NUM_ROWS, Column_Name, to_number(num_of_Unique_Values, '999999999999999999999999999999') as num_unique_val
from(
SELECT
_V_TABLE."DATABASE" as database_name,
_V_TABLE.TABLENAME as Table_Name,
case when tbl_stat.used_bytes is null then 0 else tbl_stat.used_bytes end as table_size_bytes,
_V_TABLE.RELTUPLES as num_rows,
SUBSTR(_v_relation_column.attname || ' ', 1, 25) as Column_Name,
case when _v_relation_column.attdispersion = '0' then '0'
when _v_relation_column.attdispersion = '-1' then _V_TABLE.RELTUPLES::varchar(100)
else
TO_CHAR( (CAST((1.0/_v_relation_column.attdispersion) AS BIGINT)), ' 999,999,999,999,999 ' ) end as num_of_Unique_Values
FROM _v_relation_column
left outer join _v_statistic on
(_v_relation_column.objid = _v_statistic.objid AND
_v_relation_column.attnum = _v_statistic.attnum
)
inner join _V_TABLE
on (_v_relation_column.objid = _V_TABLE.OBJID)
inner join _v_table_storage_stat as tbl_stat
on(tbl_stat.OBJID = _V_TABLE.OBJID)
WHERE
_V_TABLE.OBJTYPE = 'TABLE'
AND
( _v_relation_column.schema=current_schema OR upper(_v_relation_column.schema) in ('DEFINITION_SCHEMA', 'INFORMATION_SCHEMA'))
ORDER BY
_V_TABLE.TABLENAME ,
_v_relation_column.attnum
) as a;
Source Code Assessment
Provide the following active or in-scope Netezza source code artifacts, as applicable to the migration scope.
Code Artifact | Criticality | Remarks
Orchestration Scripts (Control-M / Autosys / Cron etc.) | Must | To identify interdependencies across scheduler scripts / jobs, queries, and dependent workloads
Procedures / Functions | Must | To identify workload complexity, query patterns, query count etc. for effort estimation and technical debt
Views | Must | To identify view complexity, patterns, and effort estimations
Shell Scripts | Must | To identify count, dependencies, SQL queries and PL/SQL, logic (example: email, notification etc.), and effort estimations
DDL | Must | To identify column usage, and provide recommendations on column-level lineage and query optimization on the target system
DML / SQL Files | Must | To identify count, dependencies, SQL queries, and effort estimations