Technical question: batch-updating multiple columns of existing rows in BigQuery by EVENT_ID

阿华AIGC实验室

2026-4-29

Solution for Batch Updating Columns in BigQuery Under Quota Limits

Looking at your scenario—where you need to populate multiple columns in primary_table from some_table_2 (matching on EVENT_ID) while avoiding BigQuery quota restrictions and preserving existing rows/structure—here are two reliable, actionable approaches:

Approach 1: Split Updates into Small Column Groups

Since you can't update all columns in one go due to quotas, break the columns into smaller logical groups and run separate UPDATE statements for each. This works best if each EVENT_ID has a 1:1 row match between tables. If you have 1:many matches (like your example), skip to Approach 2.
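Which approach applies depends on whether EVENT_ID repeats. The following is a minimal sketch of a check, using only the table and column names from the question. It lists every EVENT_ID that appears more than once in some_table_2; the same query can be run against primary_table. An empty result for both tables suggests the 1:1 case.

```sql
-- Lists EVENT_ID values that occur more than once in some_table_2.
-- No rows returned means EVENT_ID is unique in this table.
SELECT
  EVENT_ID,
  COUNT(*) AS row_count
FROM some_table_2
GROUP BY EVENT_ID
HAVING COUNT(*) > 1
ORDER BY row_count DESC;
```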

Example Batch 1: Update `HEIGHT` and `WEIGHT`

UPDATE primary_table p
SET 
  HEIGHT = s.HEIGHT,
  WEIGHT = s.WEIGHT
FROM some_table_2 s
WHERE p.EVENT_ID = s.EVENT_ID;

Example Batch 2: Update `SEX`

UPDATE primary_table p
SET SEX = s.SEX
FROM some_table_2 s
WHERE p.EVENT_ID = s.EVENT_ID;

Split further (e.g., one column per update) if needed to stay within quota limits.
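After each batch, it can help to confirm how many rows were actually filled before moving on. The sketch below assumes the target columns were NULL before the update, which the original question does not state, so treat the counts as indicative only.

```sql
-- Assumption: HEIGHT, WEIGHT, and SEX were NULL in primary_table before the updates.
-- Counts the rows that still have no value in each target column.
SELECT
  COUNT(*) AS total_rows,
  COUNTIF(HEIGHT IS NULL) AS missing_height,
  COUNTIF(WEIGHT IS NULL) AS missing_weight,
  COUNTIF(SEX IS NULL) AS missing_sex
FROM primary_table;
```

A non-zero count after a batch may simply mean the source value was itself NULL or that some EVENT_ID had no match in some_table_2.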

Approach 2: Use `MERGE` with Row Numbers for 1:Many Matches

Your example shows duplicate EVENT_ID rows in both tables. A direct UPDATE ... FROM does not work in that case: BigQuery rejects the statement with an error when a row in the target table matches more than one source row. To fix this, number the rows within each EVENT_ID in both tables so that each target row pairs with exactly one source row, then run batch MERGE operations.
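Pairing rows by position only lines up when both tables hold the same number of rows for each EVENT_ID. The sketch below, which again uses only the names from the question, reports every EVENT_ID whose row counts differ or that exists in only one of the two tables.

```sql
-- Compares per-EVENT_ID row counts between the two tables.
-- No rows returned means every EVENT_ID has the same number of rows in both.
WITH p_counts AS (
  SELECT EVENT_ID, COUNT(*) AS n
  FROM primary_table
  GROUP BY EVENT_ID
),
s_counts AS (
  SELECT EVENT_ID, COUNT(*) AS n
  FROM some_table_2
  GROUP BY EVENT_ID
)
SELECT
  COALESCE(p.EVENT_ID, s.EVENT_ID) AS EVENT_ID,
  p.n AS primary_rows,
  s.n AS source_rows
FROM p_counts p
FULL OUTER JOIN s_counts s
  ON p.EVENT_ID = s.EVENT_ID
WHERE p.n IS NULL OR s.n IS NULL OR p.n != s.n;
```

Any EVENT_ID listed here would leave some rows unpaired, so those cases are worth resolving before writing anything.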

Step 1: Verify Matching Logic First (Critical!)

Before making any changes, confirm your row matching works as expected with a test query:

SELECT
  p.EVENT_ID,
  p.TREATMENT_TIME,
  s.HEIGHT,
  s.WEIGHT,
  s.SEX
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY TREATMENT_TIME) AS row_num
  FROM primary_table
) p
JOIN (
  SELECT
    *,
    ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY HEIGHT) AS row_num -- Use a consistent sort field here
  FROM some_table_2
) s ON p.EVENT_ID = s.EVENT_ID AND p.row_num = s.row_num;

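Adjust the ORDER BY clauses to match your data's logical row order (e.g., use a unique ID or timestamp instead of HEIGHT/TREATMENT_TIME if available). The sort key should be unique within each EVENT_ID: when rows tie, ROW_NUMBER() numbers them in an arbitrary order, so the pairing can differ from one run to the next.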
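ROW_NUMBER() gives a stable numbering only when the ORDER BY key has no ties within a partition. As a rough check, assuming TREATMENT_TIME is the sort key chosen for primary_table, the query below lists tied rows; swapping in some_table_2 and its sort key (HEIGHT in the examples) covers the other side.

```sql
-- Finds rows in primary_table that share the same EVENT_ID and sort key.
-- No rows returned means the numbering within each EVENT_ID is deterministic.
SELECT
  EVENT_ID,
  TREATMENT_TIME,
  COUNT(*) AS tied_rows
FROM primary_table
GROUP BY EVENT_ID, TREATMENT_TIME
HAVING COUNT(*) > 1;
```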

Step 2: Run Batch `MERGE` Updates

Batch 1: Update `HEIGHT` and `WEIGHT`

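MERGE INTO primary_table p
USING (
  -- Pair rows by position inside the source query: window functions are not allowed in the MERGE ON clause.
  -- Assumes (EVENT_ID, TREATMENT_TIME) identifies a single row in primary_table.
  SELECT
    t.EVENT_ID,
    t.TREATMENT_TIME,
    s2.HEIGHT,
    s2.WEIGHT
  FROM (
    SELECT EVENT_ID, TREATMENT_TIME,
      ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY TREATMENT_TIME) AS row_num
    FROM primary_table
  ) t
  JOIN (
    SELECT EVENT_ID, HEIGHT, WEIGHT,
      ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY HEIGHT) AS row_num
    FROM some_table_2
  ) s2
  ON t.EVENT_ID = s2.EVENT_ID AND t.row_num = s2.row_num
) s
ON p.EVENT_ID = s.EVENT_ID
AND p.TREATMENT_TIME = s.TREATMENT_TIME
WHEN MATCHED THEN
  UPDATE SET
    HEIGHT = s.HEIGHT,
    WEIGHT = s.WEIGHT;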

Batch 2: Update `SEX`

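MERGE INTO primary_table p
USING (
  -- Same pairing as Batch 1; assumes (EVENT_ID, TREATMENT_TIME) identifies a single row in primary_table.
  SELECT t.EVENT_ID, t.TREATMENT_TIME, s2.SEX
  FROM (
    SELECT EVENT_ID, TREATMENT_TIME,
      ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY TREATMENT_TIME) AS row_num
    FROM primary_table
  ) t
  JOIN (
    SELECT EVENT_ID, SEX,
      ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY HEIGHT) AS row_num -- Keep the same sort field as above!
    FROM some_table_2
  ) s2
  ON t.EVENT_ID = s2.EVENT_ID AND t.row_num = s2.row_num
) s
ON p.EVENT_ID = s.EVENT_ID
AND p.TREATMENT_TIME = s.TREATMENT_TIME
WHEN MATCHED THEN
  UPDATE SET SEX = s.SEX;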

Key Notes

Quota Compliance: Split columns into as many small batches as needed to stay under BigQuery's update/merge quota limits. Even single-column updates work if required.
Preserve Structure: Both approaches keep primary_table's original schema intact—no new columns or rows are added.
Data Integrity: Always test your matching logic first with a SELECT query to ensure you're updating the correct rows before running write operations.

The question originates from Stack Exchange; asked by arielhasidim.