Question: batch-updating multiple columns of existing rows in BigQuery based on EVENT_ID
For your scenario, where you need to populate multiple columns in primary_table from some_table_2 (matching on EVENT_ID) while avoiding BigQuery quota restrictions and preserving the existing rows and structure, here are two reliable, actionable approaches:
Approach 1: Split Updates into Small Column Groups
Since you can't update all columns in one go due to quotas, break the columns into smaller logical groups and run a separate UPDATE statement for each. This only works if each EVENT_ID appears at most once in some_table_2, because BigQuery rejects an UPDATE in which a target row joins more than one source row. If EVENT_ID repeats in some_table_2 (as in your example), skip to Approach 2.
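As a quick pre-check for this approach, here is a minimal Python sketch of the condition BigQuery enforces: every target row may join at most one source row. It assumes you have already fetched the EVENT_ID values of both tables into Python lists; the helper name `multi_match_ids` is hypothetical. Inside BigQuery, a rough equivalent is a GROUP BY EVENT_ID HAVING COUNT(*) > 1 over some_table_2.

```python
from __future__ import annotations

from collections import Counter


def multi_match_ids(target_ids: list[str], source_ids: list[str]) -> set[str]:
    """Return EVENT_IDs for which a target row would join more than one source row.

    A plain UPDATE ... FROM is only safe when this set is empty.
    """
    source_counts = Counter(source_ids)
    # Only IDs that actually occur in the target table can cause a multi-match.
    return {eid for eid in set(target_ids) if source_counts[eid] > 1}


print(multi_match_ids(["A", "A", "B"], ["A", "B", "B"]))  # {'B'}
```

Duplicates on the target side alone are harmless: in the example, both "A" rows in the target each match exactly one source row.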
Example Batch 1: Update HEIGHT and WEIGHT
UPDATE primary_table p SET HEIGHT = s.HEIGHT, WEIGHT = s.WEIGHT FROM some_table_2 s WHERE p.EVENT_ID = s.EVENT_ID;
Example Batch 2: Update SEX
UPDATE primary_table p SET SEX = s.SEX FROM some_table_2 s WHERE p.EVENT_ID = s.EVENT_ID;
Split further (e.g., one column per update) if needed to stay within quota limits.
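With many columns, writing each batch by hand gets tedious. Below is a minimal sketch that generates UPDATE statements in the same form as the two batches above for any column grouping. `batched_updates` is a hypothetical helper: it only builds strings (running them through your BigQuery client is up to you), and it interpolates identifiers unescaped, so pass only trusted table and column names.

```python
from __future__ import annotations


def batched_updates(target: str, source: str, key: str,
                    columns: list[str], batch_size: int) -> list[str]:
    """Build one UPDATE ... FROM statement per group of `batch_size` columns."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    statements = []
    for start in range(0, len(columns), batch_size):
        group = columns[start:start + batch_size]
        # e.g. "HEIGHT = s.HEIGHT, WEIGHT = s.WEIGHT"
        assignments = ", ".join(f"{col} = s.{col}" for col in group)
        statements.append(
            f"UPDATE {target} p SET {assignments} "
            f"FROM {source} s WHERE p.{key} = s.{key};"
        )
    return statements


for stmt in batched_updates("primary_table", "some_table_2", "EVENT_ID",
                            ["HEIGHT", "WEIGHT", "SEX"], batch_size=2):
    print(stmt)
```

With batch_size=2 this reproduces Batch 1 and Batch 2 exactly; batch_size=1 gives one statement per column.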
Approach 2: Use MERGE with Row Numbers for 1:Many Matches
Your example shows duplicate EVENT_ID rows in both tables, so a direct UPDATE would fail: BigQuery does not pick one of the matching rows but raises the error "UPDATE/MERGE must match at most one source row for each target row". To fix this, compute a row number within each EVENT_ID in both tables so that rows pair up one-to-one, then run batched MERGE operations.
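To see what the row-number pairing does, here is a pure-Python sketch that mimics ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY ...) on both sides and then inner-joins on (EVENT_ID, row number). The helper names and sample rows are made up for illustration; nothing here talks to BigQuery.

```python
from __future__ import annotations

from collections import defaultdict


def number_rows(rows: list[dict], sort_field: str, key: str) -> dict:
    """Mimic ROW_NUMBER() OVER(PARTITION BY key ORDER BY sort_field).

    Returns {(key_value, row_num): row}. Python's sort is stable, so ties keep
    their input order; in BigQuery, tied rows are numbered in arbitrary order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    numbered = {}
    for key_value, group in groups.items():
        ordered = sorted(group, key=lambda r: r[sort_field])
        for row_num, row in enumerate(ordered, start=1):
            numbered[(key_value, row_num)] = row
    return numbered


def pair_by_row_number(target_rows, source_rows, target_sort, source_sort,
                       key="EVENT_ID"):
    """Inner-join both sides on (key, row_num)."""
    left = number_rows(target_rows, target_sort, key)
    right = number_rows(source_rows, source_sort, key)
    # Rows without a partner at the same position are dropped, as in a JOIN.
    return [(left[k], right[k]) for k in sorted(left) if k in right]


# Made-up sample rows for illustration only.
primary = [
    {"EVENT_ID": "E1", "TREATMENT_TIME": "10:00"},
    {"EVENT_ID": "E1", "TREATMENT_TIME": "09:00"},
]
source = [
    {"EVENT_ID": "E1", "HEIGHT": 180},
    {"EVENT_ID": "E1", "HEIGHT": 170},
]
for target_row, source_row in pair_by_row_number(primary, source, "TREATMENT_TIME", "HEIGHT"):
    print(target_row["TREATMENT_TIME"], source_row["HEIGHT"])
# 09:00 170
# 10:00 180
```

The earliest treatment is paired with the smallest HEIGHT, and so on: the alignment is entirely determined by the two ORDER BY choices.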
Step 1: Verify Matching Logic First (Critical!)
Before making any changes, confirm your row matching works as expected with a test query:
SELECT p.EVENT_ID, p.TREATMENT_TIME, s.HEIGHT, s.WEIGHT, s.SEX
FROM (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY TREATMENT_TIME) AS row_num
  FROM primary_table
) p
JOIN (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY HEIGHT) AS row_num  -- Use a consistent sort field here
  FROM some_table_2
) s
ON p.EVENT_ID = s.EVENT_ID AND p.row_num = s.row_num;
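Adjust the ORDER BY clauses to match your data's logical row order (e.g., use a unique ID or timestamp instead of HEIGHT/TREATMENT_TIME if available). The sort field should be unique within each EVENT_ID: ROW_NUMBER() numbers tied rows in an arbitrary order, so ties can pair different rows from one run to the next.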
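Ties in the sort field are the main way this pairing goes wrong. Here is a minimal Python sketch of a tie check, assuming the rows are available as dicts; `tied_sort_keys` is a hypothetical helper. In BigQuery the equivalent is a GROUP BY on EVENT_ID plus the sort column, with HAVING COUNT(*) > 1.

```python
from __future__ import annotations

from collections import Counter


def tied_sort_keys(rows: list[dict], sort_field: str, key: str = "EVENT_ID") -> list[tuple]:
    """Return (key, sort value) pairs that occur more than once.

    Any hit means ROW_NUMBER() may order those rows differently between runs.
    """
    counts = Counter((row[key], row[sort_field]) for row in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


rows = [
    {"EVENT_ID": "E1", "HEIGHT": 170},
    {"EVENT_ID": "E1", "HEIGHT": 170},
    {"EVENT_ID": "E2", "HEIGHT": 170},  # same HEIGHT, different EVENT_ID: not a tie
]
print(tied_sort_keys(rows, "HEIGHT"))  # [('E1', 170)]
```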
Step 2: Run Batch MERGE Updates
Batch 1: Update HEIGHT and WEIGHT
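MERGE INTO primary_table p
USING (
  -- Pair rows here: window functions are not allowed in the ON clause
  SELECT t.EVENT_ID, t.TREATMENT_TIME, s.HEIGHT, s.WEIGHT
  FROM (SELECT EVENT_ID, TREATMENT_TIME, ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY TREATMENT_TIME) AS row_num FROM primary_table) t
  JOIN (SELECT EVENT_ID, HEIGHT, WEIGHT, ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY HEIGHT) AS row_num FROM some_table_2) s
    ON t.EVENT_ID = s.EVENT_ID AND t.row_num = s.row_num
) src
ON p.EVENT_ID = src.EVENT_ID AND p.TREATMENT_TIME = src.TREATMENT_TIME  -- assumes this pair is unique in primary_table
WHEN MATCHED THEN UPDATE SET HEIGHT = src.HEIGHT, WEIGHT = src.WEIGHT;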
Batch 2: Update SEX
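MERGE INTO primary_table p
USING (
  SELECT t.EVENT_ID, t.TREATMENT_TIME, s.SEX
  FROM (SELECT EVENT_ID, TREATMENT_TIME, ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY TREATMENT_TIME) AS row_num FROM primary_table) t
  JOIN (SELECT EVENT_ID, SEX, ROW_NUMBER() OVER(PARTITION BY EVENT_ID ORDER BY HEIGHT) AS row_num FROM some_table_2) s  -- Keep the same sort fields as above!
    ON t.EVENT_ID = s.EVENT_ID AND t.row_num = s.row_num
) src
ON p.EVENT_ID = src.EVENT_ID AND p.TREATMENT_TIME = src.TREATMENT_TIME  -- assumes this pair is unique in primary_table
WHEN MATCHED THEN UPDATE SET SEX = src.SEX;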
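Before running batched writes against real data, you can dry-run the pairing-plus-batches flow locally. This sketch swaps in SQLite (through Python's built-in sqlite3 module) for BigQuery, so the dialects differ: SQLite has no MERGE, so each batch is an UPDATE with correlated subqueries keyed on (EVENT_ID, TREATMENT_TIME), which the sketch assumes uniquely identifies a row in primary_table. It needs SQLite 3.25 or newer for ROW_NUMBER(); the sample rows and the `apply_batch` helper are invented for illustration.

```python
from __future__ import annotations

import sqlite3

# SQLite stands in for BigQuery here; this is a local dry run only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE primary_table (EVENT_ID TEXT, TREATMENT_TIME TEXT,
                            HEIGHT REAL, WEIGHT REAL, SEX TEXT);
CREATE TABLE some_table_2 (EVENT_ID TEXT, HEIGHT REAL, WEIGHT REAL, SEX TEXT);
-- Made-up sample rows.
INSERT INTO primary_table (EVENT_ID, TREATMENT_TIME)
VALUES ('E1', '10:00'), ('E1', '09:00'), ('E2', '08:00');
INSERT INTO some_table_2
VALUES ('E1', 180, 80, 'M'), ('E1', 170, 65, 'F'), ('E2', 160, 55, 'F');

-- Pair rows once, by row number within each EVENT_ID.
CREATE TEMP TABLE src AS
SELECT t.EVENT_ID, t.TREATMENT_TIME, s.HEIGHT, s.WEIGHT, s.SEX
FROM (SELECT EVENT_ID, TREATMENT_TIME, ROW_NUMBER() OVER (
        PARTITION BY EVENT_ID ORDER BY TREATMENT_TIME) AS row_num
      FROM primary_table) t
JOIN (SELECT EVENT_ID, HEIGHT, WEIGHT, SEX, ROW_NUMBER() OVER (
        PARTITION BY EVENT_ID ORDER BY HEIGHT) AS row_num
      FROM some_table_2) s
  ON t.EVENT_ID = s.EVENT_ID AND t.row_num = s.row_num;
""")

# Assumed row key: (EVENT_ID, TREATMENT_TIME) is unique in primary_table.
MATCH = ("src.EVENT_ID = primary_table.EVENT_ID "
         "AND src.TREATMENT_TIME = primary_table.TREATMENT_TIME")


def apply_batch(columns: list[str]) -> None:
    """Copy one group of columns from the paired rows into primary_table."""
    assignments = ", ".join(
        f"{col} = (SELECT src.{col} FROM src WHERE {MATCH})" for col in columns
    )
    con.execute(f"UPDATE primary_table SET {assignments} "
                f"WHERE EXISTS (SELECT 1 FROM src WHERE {MATCH})")


apply_batch(["HEIGHT", "WEIGHT"])  # batch 1
apply_batch(["SEX"])               # batch 2
for row in con.execute("SELECT * FROM primary_table ORDER BY EVENT_ID, TREATMENT_TIME"):
    print(row)
# ('E1', '09:00', 170.0, 65.0, 'F')
# ('E1', '10:00', 180.0, 80.0, 'M')
# ('E2', '08:00', 160.0, 55.0, 'F')
```

Computing the pairing once into src means every batch uses the same alignment, even after earlier batches have already modified primary_table.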
Key Notes
- Quota Compliance: Split columns into as many small batches as needed to stay under BigQuery's update/merge quota limits. Even single-column updates work if required.
- Preserve Structure: Both approaches keep primary_table's original schema intact; no new columns or rows are added.
- Data Integrity: Always test your matching logic first with a SELECT query to ensure you're updating the correct rows before running write operations.
The question comes from Stack Exchange; it was asked by arielhasidim.




