You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

Oracle数据库批量记录校验Shell脚本超时优化方案咨询

优化10万条记录与Oracle数据库的校验性能方案

你现在遇到的是典型的N+1查询性能瓶颈——10万次单独的SQL请求来回折腾数据库,光是网络开销和数据库的SQL解析成本就足以让任务超时。下面我给你分两种技术栈(Shell/PLSQL + Java)提供生产环境验证过的高效优化方案,直接就能上手:


一、Shell + PLSQL 方向优化(无需切换语言)

1. 批量导入临时表 + 单次关联查询(最优方案)

这是性能提升最明显的思路:把本地10万条数据一次性导入Oracle临时表,再通过一次JOIN/EXISTS查询完成批量校验,彻底避免逐行请求。

步骤1:创建会话级临时表

临时表只会在当前会话存在,会话结束自动清空,不会污染主库:

CREATE GLOBAL TEMPORARY TABLE temp_check_data (
    col1 VARCHAR2(100),
    col2 VARCHAR2(100),
    testval3 VARCHAR2(100), -- 你要校验的字段
    col4 VARCHAR2(100),
    testval5 VARCHAR2(100), -- 你要校验的字段
    col6 VARCHAR2(100)
) ON COMMIT PRESERVE ROWS; -- 提交后保留数据,适合会话内批量操作

步骤2:用SQL Loader批量导入本地文本

Oracle自带的sqlldr是专门为批量数据导入优化的工具,比逐行INSERT快几十倍。先写一个控制文件data_loader.ctl

LOAD DATA
INFILE 'your_input_file.txt'
BADFILE 'data.bad' -- 存储导入失败的行
DISCARDFILE 'data.dsc' -- 存储被丢弃的行
FIELDS TERMINATED BY '|' -- 指定分隔符为|
TRAILING NULLCOLS -- 处理末尾空列
(
    col1,
    col2,
    testval3,
    col4,
    testval5,
    col6
)

然后在Shell中执行导入命令:

sqlldr your_username/your_password@your_oracle_service control=data_loader.ctl log=data_loader.log

步骤3:执行批量校验查询

用一次查询找出不符合条件的记录(比如testval3和testval5都不存在于主表的行):

-- 方案1:用LEFT JOIN判断
SELECT t.*
FROM temp_check_data t
LEFT JOIN your_main_table m 
    ON m.testval3 = t.testval3 OR m.testval5 = t.testval5
WHERE m.testval3 IS NULL AND m.testval5 IS NULL;

-- 方案2:用EXISTS更高效(推荐)
SELECT t.*
FROM temp_check_data t
WHERE NOT EXISTS (
    SELECT 1 FROM your_main_table m 
    WHERE m.testval3 = t.testval3 OR m.testval5 = t.testval5
);

最后可以用spool命令把查询结果导出到本地文件:

SPOOL check_result.txt
-- 上面的校验SQL
SPOOL OFF

2. 构造批量IN子句查询(无需临时表的折中方案)

如果不想创建临时表,可以把10万条记录的testval3/testval5分组,每次构造包含1000个值的IN子句(Oracle IN子句有长度限制,1000个值以内安全),减少查询次数。

Shell脚本示例:

#!/bin/bash

INPUT_FILE="your_input_file.txt"
BATCH_SIZE=1000
COUNT=0
VALUES=""

# 逐行读取文本,提取需要校验的字段
while IFS='|' read -r col1 col2 testval3 col4 testval5 col6; do
    # 转义单引号,避免SQL语法错误和注入风险
    ESCAPED_VAL3=$(echo "$testval3" | sed "s/'/''/g")
    ESCAPED_VAL5=$(echo "$testval5" | sed "s/'/''/g")
    VALUES+="('$ESCAPED_VAL3','$ESCAPED_VAL5'),"
    COUNT=$((COUNT+1))
    
    # 达到批量大小,执行一次查询
    if [ $COUNT -eq $BATCH_SIZE ]; then
        VALUES=${VALUES%,} # 去掉最后一个逗号
        # 构造批量查询SQL
        SQL="SELECT val3, val5, CASE WHEN EXISTS(SELECT 1 FROM your_main_table m WHERE m.testval3 = val3 OR m.testval5 = val5) THEN '存在' ELSE '不存在' END AS result FROM (VALUES $VALUES) t(val3, val5)"
        # 执行查询并保存结果
        echo "$SQL" | sqlplus -s your_username/your_password@your_oracle_service >> check_result.txt
        # 重置计数器和参数
        COUNT=0
        VALUES=""
    fi
done < "$INPUT_FILE"

# 处理剩余的不足批量的记录
if [ $COUNT -gt 0 ]; then
    VALUES=${VALUES%,}
    SQL="SELECT val3, val5, CASE WHEN EXISTS(SELECT 1 FROM your_main_table m WHERE m.testval3 = val3 OR m.testval5 = val5) THEN '存在' ELSE '不存在' END AS result FROM (VALUES $VALUES) t(val3, val5)"
    echo "$SQL" | sqlplus -s your_username/your_password@your_oracle_service >> check_result.txt
fi

二、Java 方向优化(适合灵活扩展业务逻辑)

1. JDBC批量插入临时表 + 关联查询

和Shell方案的思路一致,用Java实现批量导入和校验,适合后续需要添加复杂业务逻辑的场景。

Java代码示例:

import java.sql.*;
import java.io.BufferedReader;
import java.io.FileReader;

public class OracleDataChecker {
    private static final String DB_URL = "jdbc:oracle:thin:@your_oracle_service";
    private static final String DB_USER = "your_username";
    private static final String DB_PASS = "your_password";
    private static final String INPUT_FILE = "your_input_file.txt";
    private static final int BATCH_SIZE = 1000; // 每1000条批量插入一次

    public static void main(String[] args) {
        try (Connection conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASS)) {
            conn.setAutoCommit(false); // 关闭自动提交,提升批量插入效率

            // 准备临时表插入语句
            String insertSql = "INSERT INTO temp_check_data (col1, col2, testval3, col4, testval5, col6) VALUES (?, ?, ?, ?, ?, ?)";
            try (PreparedStatement pstmt = conn.prepareStatement(insertSql);
                 BufferedReader br = new BufferedReader(new FileReader(INPUT_FILE))) {

                String line;
                int count = 0;
                while ((line = br.readLine()) != null) {
                    String[] cols = line.split("\\|", 6); // 按|分割,最多取6列
                    pstmt.setString(1, cols[0]);
                    pstmt.setString(2, cols[1]);
                    pstmt.setString(3, cols[2]); // testval3
                    pstmt.setString(4, cols[3]);
                    pstmt.setString(5, cols[4]); // testval5
                    pstmt.setString(6, cols[5]);
                    pstmt.addBatch();
                    count++;

                    if (count % BATCH_SIZE == 0) {
                        pstmt.executeBatch();
                        conn.commit();
                    }
                }
                // 处理剩余的未批量提交的数据
                pstmt.executeBatch();
                conn.commit();
            }

            // 执行批量校验查询
            String checkSql = "SELECT t.* FROM temp_check_data t WHERE NOT EXISTS (SELECT 1 FROM your_main_table m WHERE m.testval3 = t.testval3 OR m.testval5 = t.testval5)";
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(checkSql)) {

                // 输出校验结果,这里可以改成写入文件
                while (rs.next()) {
                    System.out.printf("%s|%s|%s|%s|%s|%s%n",
                            rs.getString("col1"), rs.getString("col2"), rs.getString("testval3"),
                            rs.getString("col4"), rs.getString("testval5"), rs.getString("col6"));
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

2. Oracle JDBC数组批量传参(无需临时表)

利用Oracle JDBC的数组传递功能,把所有testval3/testval5打包成数组传递给数据库,在PLSQL中批量处理,适合纯校验场景。

Java代码示例:

import java.sql.*;
import oracle.jdbc.OracleConnection;
import oracle.sql.ARRAY;
import oracle.sql.ArrayDescriptor;

public class BatchParamChecker {
    private static final String DB_URL = "jdbc:oracle:thin:@your_oracle_service";
    private static final String DB_USER = "your_username";
    private static final String DB_PASS = "your_password";
    private static final String INPUT_FILE = "your_input_file.txt";

    public static void main(String[] args) {
        try (Connection conn = DriverManager.getConnection(DB_URL, DB_USER, DB_PASS)) {
            OracleConnection oracleConn = conn.unwrap(OracleConnection.class);

            // 读取所有需要校验的字段
            BufferedReader br = new BufferedReader(new FileReader(INPUT_FILE));
            String line;
            String[] val3List = new String[100000];
            String[] val5List = new String[100000];
            int index = 0;
            while ((line = br.readLine()) != null) {
                String[] cols = line.split("\\|", 6);
                val3List[index] = cols[2];
                val5List[index] = cols[4];
                index++;
            }
            br.close();

            // 创建Oracle数组
            ArrayDescriptor val3Desc = ArrayDescriptor.createDescriptor("VARCHAR2", oracleConn);
            ARRAY oracleVal3Array = new ARRAY(val3Desc, oracleConn, val3List);
            ArrayDescriptor val5Desc = ArrayDescriptor.createDescriptor("VARCHAR2", oracleConn);
            ARRAY oracleVal5Array = new ARRAY(val5Desc, oracleConn, val5List);

            // 执行批量校验查询
            String sql = "SELECT t.idx, t.val3, t.val5, CASE WHEN EXISTS(SELECT 1 FROM your_main_table m WHERE m.testval3 = t.val3 OR m.testval5 = t.val5) THEN 'Y' ELSE 'N' END AS exists_flag " +
                    "FROM (SELECT ROWNUM idx, column_value val3 FROM TABLE(?) v3) t1 " +
                    "JOIN (SELECT ROWNUM idx, column_value val5 FROM TABLE(?) v5) t2 ON t1.idx = t2.idx";
            try (PreparedStatement pstmt = oracleConn.prepareStatement(sql)) {
                pstmt.setArray(1, oracleVal3Array);
                pstmt.setArray(2, oracleVal5Array);
                ResultSet rs = pstmt.executeQuery();
                while (rs.next()) {
                    System.out.printf("行号:%d,testval3:%s,testval5:%s,是否存在:%s%n",
                            rs.getInt("idx"), rs.getString("val3"), rs.getString("val5"), rs.getString("exists_flag"));
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

关键前置优化

不管用哪种方案,一定要确保你的主表your_main_tabletestval3testval5上创建索引,否则1000万条表的全表扫描会拖慢所有查询:

-- 创建复合索引,适配OR查询场景
CREATE INDEX idx_main_testvals ON your_main_table(testval3, testval5);

内容的提问来源于stack exchange,提问作者Busybee

火山引擎 最新活动