Oracle数据库批量记录校验Shell脚本超时优化方案咨询
你现在遇到的是典型的N+1查询性能瓶颈——10万次单独的SQL请求来回折腾数据库,光是网络开销和数据库的SQL解析成本就足以让任务超时。下面我给你分两种技术栈(Shell/PLSQL + Java)提供生产环境验证过的高效优化方案,直接就能上手:
一、Shell + PLSQL 方向优化(无需切换语言)
1. 批量导入临时表 + 单次关联查询(最优方案)
这是性能提升最明显的思路:把本地10万条数据一次性导入Oracle临时表,再通过一次JOIN/EXISTS查询完成批量校验,彻底避免逐行请求。
步骤1:创建会话级临时表
临时表只会在当前会话存在,会话结束自动清空,不会污染主库:
CREATE GLOBAL TEMPORARY TABLE temp_check_data ( col1 VARCHAR2(100), col2 VARCHAR2(100), testval3 VARCHAR2(100), -- 你要校验的字段 col4 VARCHAR2(100), testval5 VARCHAR2(100), -- 你要校验的字段 col6 VARCHAR2(100) ) ON COMMIT PRESERVE ROWS; -- 提交后保留数据,适合会话内批量操作
步骤2:用SQL Loader批量导入本地文本
Oracle自带的sqlldr是专门为批量数据导入优化的工具,比逐行INSERT快几十倍。先写一个控制文件data_loader.ctl:
LOAD DATA INFILE 'your_input_file.txt' BADFILE 'data.bad' -- 存储导入失败的行 DISCARDFILE 'data.dsc' -- 存储被丢弃的行 FIELDS TERMINATED BY '|' -- 指定分隔符为| TRAILING NULLCOLS -- 处理末尾空列 ( col1, col2, testval3, col4, testval5, col6 )
然后在Shell中执行导入命令:
sqlldr your_username/your_password@your_oracle_service control=data_loader.ctl log=data_loader.log
步骤3:执行批量校验查询
用一次查询找出不符合条件的记录(比如testval3和testval5都不存在于主表的行):
-- 方案1:用LEFT JOIN判断 SELECT t.* FROM temp_check_data t LEFT JOIN your_main_table m ON m.testval3 = t.testval3 OR m.testval5 = t.testval5 WHERE m.testval3 IS NULL AND m.testval5 IS NULL; -- 方案2:用EXISTS更高效(推荐) SELECT t.* FROM temp_check_data t WHERE NOT EXISTS ( SELECT 1 FROM your_main_table m WHERE m.testval3 = t.testval3 OR m.testval5 = t.testval5 );
最后可以用spool命令把查询结果导出到本地文件:
SPOOL check_result.txt -- 上面的校验SQL SPOOL OFF
2. 构造批量IN子句查询(无需临时表的折中方案)
如果不想创建临时表,可以把10万条记录的testval3/testval5分组,每次构造包含1000个值的IN子句(Oracle IN子句有长度限制,1000个值以内安全),减少查询次数。
Shell脚本示例:
#!/bin/bash INPUT_FILE="your_input_file.txt" BATCH_SIZE=1000 COUNT=0 VALUES="" # 逐行读取文本,提取需要校验的字段 while IFS='|' read -r col1 col2 testval3 col4 testval5 col6; do # 转义单引号,避免SQL语法错误和注入风险 ESCAPED_VAL3=$(echo "$testval3" | sed "s/'/''/g") ESCAPED_VAL5=$(echo "$testval5" | sed "s/'/''/g") VALUES+="('$ESCAPED_VAL3','$ESCAPED_VAL5')," COUNT=$((COUNT+1)) # 达到批量大小,执行一次查询 if [ $COUNT -eq $BATCH_SIZE ]; then VALUES=${VALUES%,} # 去掉最后一个逗号 # 构造批量查询SQL SQL="SELECT val3, val5, CASE WHEN EXISTS(SELECT 1 FROM your_main_table m WHERE m.testval3 = val3 OR m.testval5 = val5) THEN '存在' ELSE '不存在' END AS result FROM (VALUES $VALUES) t(val3, val5)" # 执行查询并保存结果 echo "$SQL" | sqlplus -s your_username/your_password@your_oracle_service >> check_result.txt # 重置计数器和参数 COUNT=0 VALUES="" fi done < "$INPUT_FILE" # 处理剩余的不足批量的记录 if [ $COUNT -gt 0 ]; then VALUES=${VALUES%,} SQL="SELECT val3, val5, CASE WHEN EXISTS(SELECT 1 FROM your_main_table m WHERE m.testval3 = val3 OR m.testval5 = val5) THEN '存在' ELSE '不存在' END AS result FROM (VALUES $VALUES) t(val3, val5)" echo "$SQL" | sqlplus -s your_username/your_password@your_oracle_service >> check_result.txt fi
二、Java 方向优化(适合灵活扩展业务逻辑)
1. JDBC批量插入临时表 + 关联查询
和Shell方案的思路一致,用Java实现批量导入和校验,适合后续需要添加复杂业务逻辑的场景。
Java代码示例:
import java.sql.*; import java.io.BufferedReader; import java.io.FileReader; public class OracleDataChecker { private static final String DB_URL = "jdbc:oracle:thin:@your_oracle_service"; private static final String DB_USER = "your_username"; private static final String DB_PASS = "your_password"; private static final String INPUT_FILE = "your_input_file.txt"; private static final int BATCH_SIZE = 1000; // 每1000条批量插入一次 public static void main(String[] args) { try (Connection conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASS)) { conn.setAutoCommit(false); // 关闭自动提交,提升批量插入效率 // 准备临时表插入语句 String insertSql = "INSERT INTO temp_check_data (col1, col2, testval3, col4, testval5, col6) VALUES (?, ?, ?, ?, ?, ?)"; try (PreparedStatement pstmt = conn.prepareStatement(insertSql); BufferedReader br = new BufferedReader(new FileReader(INPUT_FILE))) { String line; int count = 0; while ((line = br.readLine()) != null) { String[] cols = line.split("\\|", 6); // 按|分割,最多取6列 pstmt.setString(1, cols[0]); pstmt.setString(2, cols[1]); pstmt.setString(3, cols[2]); // testval3 pstmt.setString(4, cols[3]); pstmt.setString(5, cols[4]); // testval5 pstmt.setString(6, cols[5]); pstmt.addBatch(); count++; if (count % BATCH_SIZE == 0) { pstmt.executeBatch(); conn.commit(); } } // 处理剩余的未批量提交的数据 pstmt.executeBatch(); conn.commit(); } // 执行批量校验查询 String checkSql = "SELECT t.* FROM temp_check_data t WHERE NOT EXISTS (SELECT 1 FROM your_main_table m WHERE m.testval3 = t.testval3 OR m.testval5 = t.testval5)"; try (Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery(checkSql)) { // 输出校验结果,这里可以改成写入文件 while (rs.next()) { System.out.printf("%s|%s|%s|%s|%s|%s%n", rs.getString("col1"), rs.getString("col2"), rs.getString("testval3"), rs.getString("col4"), rs.getString("testval5"), rs.getString("col6")); } } } catch (Exception e) { e.printStackTrace(); } } }
2. Oracle JDBC数组批量传参(无需临时表)
利用Oracle JDBC的数组传递功能,把所有testval3/testval5打包成数组传递给数据库,在PLSQL中批量处理,适合纯校验场景。
Java代码示例:
import java.sql.*; import oracle.jdbc.OracleConnection; import oracle.sql.ARRAY; import oracle.sql.ArrayDescriptor; public class BatchParamChecker { private static final String DB_URL = "jdbc:oracle:thin:@your_oracle_service"; private static final String DB_USER = "your_username"; private static final String DB_PASS = "your_password"; private static final String INPUT_FILE = "your_input_file.txt"; public static void main(String[] args) { try (Connection conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASS)) { OracleConnection oracleConn = conn.unwrap(OracleConnection.class); // 读取所有需要校验的字段 BufferedReader br = new BufferedReader(new FileReader(INPUT_FILE)); String line; String[] val3List = new String[100000]; String[] val5List = new String[100000]; int index = 0; while ((line = br.readLine()) != null) { String[] cols = line.split("\\|", 6); val3List[index] = cols[2]; val5List[index] = cols[4]; index++; } br.close(); // 创建Oracle数组 ArrayDescriptor val3Desc = ArrayDescriptor.createDescriptor("VARCHAR2", oracleConn); ARRAY oracleVal3Array = new ARRAY(val3Desc, oracleConn, val3List); ArrayDescriptor val5Desc = ArrayDescriptor.createDescriptor("VARCHAR2", oracleConn); ARRAY oracleVal5Array = new ARRAY(val5Desc, oracleConn, val5List); // 执行批量校验查询 String sql = "SELECT t.idx, t.val3, t.val5, CASE WHEN EXISTS(SELECT 1 FROM your_main_table m WHERE m.testval3 = t.val3 OR m.testval5 = t.val5) THEN 'Y' ELSE 'N' END AS exists_flag " + "FROM (SELECT ROWNUM idx, column_value val3 FROM TABLE(?) v3) t1 " + "JOIN (SELECT ROWNUM idx, column_value val5 FROM TABLE(?) v5) t2 ON t1.idx = t2.idx"; try (PreparedStatement pstmt = oracleConn.prepareStatement(sql)) { pstmt.setArray(1, oracleVal3Array); pstmt.setArray(2, oracleVal5Array); ResultSet rs = pstmt.executeQuery(); while (rs.next()) { System.out.printf("行号:%d,testval3:%s,testval5:%s,是否存在:%s%n", rs.getInt("idx"), rs.getString("val3"), rs.getString("val5"), rs.getString("exists_flag")); } } } catch (Exception e) { e.printStackTrace(); } } }
关键前置优化
不管用哪种方案,一定要确保你的主表your_main_table在testval3和testval5上创建索引,否则1000万条表的全表扫描会拖慢所有查询:
-- 创建复合索引,适配OR查询场景 CREATE INDEX idx_main_testvals ON your_main_table(testval3, testval5);
内容的提问来源于stack exchange,提问作者Busybee




