Oracle ETL测试:父子关系、值范围与数据类型验证及示例查询
Hey there! As someone who’s spent a fair bit of time doing ETL testing with Oracle, let’s walk through how to validate parent-child relationships, value ranges, and data types between your staging table T1 and target table T2. I’ll include concrete query examples you can adapt for your specific tables:
First, let’s assume your tables have a parent-child link—for example, T2 is the parent table with a primary key id, and T1 (the staging table) has a foreign key t2_id that references T2.id. We need to make sure every foreign key value in T1 exists in T2’s primary key, and optionally that all parent records in T2 have corresponding child records in T1 (if your business rules require it).
Example Queries:
-- 检查T1中存在但T2中不存在的外键值(如果有结果,说明父子关系存在异常) SELECT t1.t2_id, COUNT(*) AS invalid_record_count FROM T1 LEFT JOIN T2 ON T1.t2_id = T2.id WHERE T2.id IS NULL GROUP BY t1.t2_id;
-- (可选)检查T2中没有对应子记录的父记录(如果业务要求每个父必须有子) SELECT t2.id, COUNT(t1.t2_id) AS child_record_count FROM T2 LEFT JOIN T1 ON T2.id = T1.t2_id WHERE t1.t2_id IS NULL GROUP BY t2.id;
If both queries return no results, your parent-child relationships are valid.
This step ensures that the distribution of data (min, max, averages) in T2 matches what’s in T1, or that T2’s values fall within the expected range defined by T1. This works for numeric, date, and even string fields (though string ranges are less common).
Example Queries:
-- 对比T1和T2中数值/日期字段的极值与平均值 SELECT 'T1' AS source_table, MIN(transaction_amount) AS min_amount, MAX(transaction_amount) AS max_amount, ROUND(AVG(transaction_amount), 2) AS avg_amount, MIN(transaction_date) AS min_date, MAX(transaction_date) AS max_date FROM T1 UNION ALL SELECT 'T2' AS source_table, MIN(transaction_amount) AS min_amount, MAX(transaction_amount) AS max_amount, ROUND(AVG(transaction_amount), 2) AS avg_amount, MIN(transaction_date) AS min_date, MAX(transaction_date) AS max_date FROM T2;
-- 检查T2中是否存在超出T1值范围的记录 SELECT t2.id, t2.transaction_amount, t2.transaction_date FROM T2 WHERE t2.transaction_amount < (SELECT MIN(transaction_amount) FROM T1) OR t2.transaction_amount > (SELECT MAX(transaction_amount) FROM T1) OR t2.transaction_date < (SELECT MIN(transaction_date) FROM T1) OR t2.transaction_date > (SELECT MAX(transaction_date) FROM T1);
Matching results in the first query and no results in the second mean your value ranges are consistent.
We need to confirm two things here: (1) the column data types/lengths between T1 and T2 match (to avoid data truncation or type mismatches), and (2) the actual data stored in T2 conforms to its defined data type.
Example Queries:
-- 对比T1和T2的字段数据类型及长度 SELECT tc.column_name, (SELECT data_type || CASE WHEN data_type IN ('VARCHAR2', 'CHAR') THEN '(' || data_length || ')' WHEN data_type = 'NUMBER' THEN '(' || precision || ',' || scale || ')' ELSE '' END FROM USER_TAB_COLUMNS WHERE table_name = 'T1' AND column_name = tc.column_name) AS t1_column_def, (SELECT data_type || CASE WHEN data_type IN ('VARCHAR2', 'CHAR') THEN '(' || data_length || ')' WHEN data_type = 'NUMBER' THEN '(' || precision || ',' || scale || ')' ELSE '' END FROM USER_TAB_COLUMNS WHERE table_name = 'T2' AND column_name = tc.column_name) AS t2_column_def FROM USER_TAB_COLUMNS tc WHERE tc.table_name IN ('T1', 'T2') GROUP BY tc.column_name HAVING t1_column_def != t2_column_def;
-- (可选)检查T2中数值型字段是否包含无效数据(针对隐性转换场景) SELECT id, transaction_amount FROM T2 WHERE NOT REGEXP_LIKE(transaction_amount, '^[0-9]+(\.[0-9]+)?$');
No results in the first query mean your table structures match. For numeric fields, Oracle will throw errors during invalid data loads, but this second query can catch edge cases if you’re dealing with string-to-number conversions.
内容的提问来源于stack exchange,提问作者lifeofpy




