You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

含CONCAT函数的SQL执行顺序及GROUP BY使用别名的疑问

Understanding BigQuery SQL Execution & GROUP BY with SELECT Aliases

Great question! Let’s break this down clearly, especially focusing on BigQuery’s behavior since that’s what you’re working with in your Coursera course.

The "Standard SQL" vs. BigQuery Syntax Difference

In standard SQL, the logical execution order follows this sequence:

  1. FROM and any JOIN operations
  2. WHERE (filters rows before grouping)
  3. GROUP BY
  4. HAVING (filters groups after grouping)
  5. SELECT
  6. ORDER BY

By this standard, GROUP BY runs before SELECT, so you can’t reference aliases defined in SELECT in your GROUP BY clause—you’d have to repeat the entire CONCAT(fullVisitorId, CAST(visitID AS STRING)) expression instead.

But BigQuery (along with a few other databases like PostgreSQL) includes a convenient syntax extension that lets you reference SELECT aliases in GROUP BY. This works because BigQuery’s query parser resolves these aliases early on, mapping them to their underlying expressions before executing the GROUP BY step. It’s a quality-of-life feature to avoid duplicating complex logic.

Step-by-Step Execution of Your Query in BigQuery

Let’s walk through exactly how your query runs:

  1. Filter data with WHERE: First, BigQuery scans transaction_table and keeps only rows where sessionQualityDim > 60. This reduces the dataset size upfront, optimizing performance for subsequent steps.
  2. Resolve SELECT expressions: Next, BigQuery parses all expressions in your SELECT clause. It calculates CONCAT(fullVisitorId, CAST(visitID AS STRING)) for every filtered row, associates this value with the alias unique_session_id, and tracks the other fields (sessionQualityDim, productRevenue).
  3. Group the data: Now, BigQuery groups the rows using the values from unique_session_id (the computed concatenated string) and sessionQualityDim. Since it already mapped the alias to its underlying expression, you don’t need to rewrite the CONCAT logic here.
  4. Compute aggregate functions: For each group, BigQuery runs SUM(productRevenue) to calculate the transactions_revenue value for that group.
  5. Output the results: Finally, it returns the three fields you specified in SELECT: unique_session_id, sessionQualityDim, and transactions_revenue.

A Quick Standard SQL Comparison

If you were writing strictly standard SQL (without BigQuery’s extension), your query would require repeating the CONCAT expression in the GROUP BY clause, like this:

SELECT 
  CONCAT(fullVisitorId, CAST(visitID AS STRING)) AS unique_session_id, 
  sessionQualityDim, 
  SUM(productRevenue) AS transactions_revenue 
FROM transaction_table 
WHERE sessionQualityDim > 60 
GROUP BY CONCAT(fullVisitorId, CAST(visitID AS STRING)), sessionQualityDim

BigQuery’s alias support saves you from this redundant code, which is especially helpful for longer, more complex expressions.

内容的提问来源于stack exchange,提问作者Mr.Mahajan

火山引擎 最新活动