如何在Impala/Hive中替代MINUS运算符实现时间差计算?
Hey there! Let's break down your two questions step by step—since Impala and Hive don't natively support the MINUS operator, and we also need to calculate that time difference for your Game table.
1. Replacing the MINUS Operator in Impala/Hive
First off, both Impala and Hive don't support MINUS (or EXCEPT, as it's called in some databases) directly. But we have three solid alternatives to get the same "difference set" result (records present in the first query but not the second):
Option 1: LEFT JOIN + IS NULL
This is the most reliable method, especially if your subquery might have NULL values (which can break NOT IN). Here's how to rewrite a MINUS query:
Original intended query:
SELECT col1, col2 FROM table1 MINUS SELECT col1, col2 FROM table2;
Rewritten with LEFT JOIN:
SELECT t1.col1, t1.col2 FROM table1 t1 LEFT JOIN table2 t2 ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 WHERE t2.col1 IS NULL;
Option 2: NOT IN
Use this if you're sure the subquery results have no NULL values (otherwise it'll return an empty set). It's more concise:
SELECT col1, col2 FROM table1 WHERE (col1, col2) NOT IN ( SELECT col1, col2 FROM table2 );
Pro tip: Add a WHERE col1 IS NOT NULL AND col2 IS NOT NULL to the subquery if you're unsure about NULLs.
Option 3: NOT EXISTS
This is logically intuitive and performs well, similar to the LEFT JOIN approach:
SELECT col1, col2 FROM table1 t1 WHERE NOT EXISTS ( SELECT 1 FROM table2 t2 WHERE t1.col1 = t2.col1 AND t1.col2 = t2.col2 );
2. Calculating Time Difference for the Game Table
To compute halftime_start - official_start and save it as firsttimeplayed, we'll use Impala's time functions—let's assume your Official_start and halftime_start are TIMESTAMP types (adjust if they're strings, but TIMESTAMP is ideal).
Calculating the Difference
If you want the difference in a specific unit (like minutes, seconds, or hours), use TIMESTAMPDIFF:
SELECT Official_start, halftime_start, -- Calculate difference in minutes; replace MINUTE with SECOND/HOUR/DAY as needed TIMESTAMPDIFF(MINUTE, Official_start, halftime_start) AS firsttimeplayed FROM Game;
If you prefer an interval format (e.g., 0 0:45:00 for 45 minutes), just subtract the timestamps directly:
SELECT Official_start, halftime_start, halftime_start - Official_start AS firsttimeplayed FROM Game;
Persisting the firsttimeplayed Field
If you want to save this field permanently:
Create a new table with the field
CREATE TABLE Game_with_firsttimeplayed AS SELECT *, TIMESTAMPDIFF(MINUTE, Official_start, halftime_start) AS firsttimeplayed FROM Game;
Update the original table (Impala only, requires ACID-enabled tables)
First add the column, then update its values:
-- Add the new column ALTER TABLE Game ADD COLUMN firsttimeplayed INT; -- Update the column with calculated values UPDATE Game SET firsttimeplayed = TIMESTAMPDIFF(MINUTE, Official_start, halftime_start);
Note: Impala's UPDATE works only on ACID-compliant tables (usually Parquet format with transactions enabled). If your table isn't ACID, stick with creating a new table instead.
内容的提问来源于stack exchange,提问作者user6203336




