You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

如何在Impala/Hive中替代MINUS运算符实现时间差计算?

Hey there! Let's break down your two questions step by step—since Impala and Hive don't natively support the MINUS operator, and we also need to calculate that time difference for your Game table.

1. Replacing the MINUS Operator in Impala/Hive

First off, both Impala and Hive don't support MINUS (or EXCEPT, as it's called in some databases) directly. But we have three solid alternatives to get the same "difference set" result (records present in the first query but not the second):

Option 1: LEFT JOIN + IS NULL

This is the most reliable method, especially if your subquery might have NULL values (which can break NOT IN). Here's how to rewrite a MINUS query:
Original intended query:

SELECT col1, col2 FROM table1
MINUS
SELECT col1, col2 FROM table2;

Rewritten with LEFT JOIN:

SELECT t1.col1, t1.col2
FROM table1 t1
LEFT JOIN table2 t2
  ON t1.col1 = t2.col1 AND t1.col2 = t2.col2
WHERE t2.col1 IS NULL;

Option 2: NOT IN

Use this if you're sure the subquery results have no NULL values (otherwise it'll return an empty set). It's more concise:

SELECT col1, col2 FROM table1
WHERE (col1, col2) NOT IN (
    SELECT col1, col2 FROM table2
);

Pro tip: Add a WHERE col1 IS NOT NULL AND col2 IS NOT NULL to the subquery if you're unsure about NULLs.

Option 3: NOT EXISTS

This is logically intuitive and performs well, similar to the LEFT JOIN approach:

SELECT col1, col2 FROM table1 t1
WHERE NOT EXISTS (
    SELECT 1 FROM table2 t2
    WHERE t1.col1 = t2.col1 AND t1.col2 = t2.col2
);

2. Calculating Time Difference for the Game Table

To compute halftime_start - official_start and save it as firsttimeplayed, we'll use Impala's time functions—let's assume your Official_start and halftime_start are TIMESTAMP types (adjust if they're strings, but TIMESTAMP is ideal).

Calculating the Difference

If you want the difference in a specific unit (like minutes, seconds, or hours), use TIMESTAMPDIFF:

SELECT
    Official_start,
    halftime_start,
    -- Calculate difference in minutes; replace MINUTE with SECOND/HOUR/DAY as needed
    TIMESTAMPDIFF(MINUTE, Official_start, halftime_start) AS firsttimeplayed
FROM Game;

If you prefer an interval format (e.g., 0 0:45:00 for 45 minutes), just subtract the timestamps directly:

SELECT
    Official_start,
    halftime_start,
    halftime_start - Official_start AS firsttimeplayed
FROM Game;

Persisting the firsttimeplayed Field

If you want to save this field permanently:

Create a new table with the field

CREATE TABLE Game_with_firsttimeplayed AS
SELECT
    *,
    TIMESTAMPDIFF(MINUTE, Official_start, halftime_start) AS firsttimeplayed
FROM Game;

Update the original table (Impala only, requires ACID-enabled tables)

First add the column, then update its values:

-- Add the new column
ALTER TABLE Game ADD COLUMN firsttimeplayed INT;

-- Update the column with calculated values
UPDATE Game
SET firsttimeplayed = TIMESTAMPDIFF(MINUTE, Official_start, halftime_start);

Note: Impala's UPDATE works only on ACID-compliant tables (usually Parquet format with transactions enabled). If your table isn't ACID, stick with creating a new table instead.

内容的提问来源于stack exchange,提问作者user6203336

火山引擎 最新活动