含CONCAT函数的SQL执行顺序及GROUP BY使用别名的疑问
Great question! Let’s break this down clearly, especially focusing on BigQuery’s behavior since that’s what you’re working with in your Coursera course.
The "Standard SQL" vs. BigQuery Syntax Difference
In standard SQL, the logical execution order follows this sequence:
FROMand anyJOINoperationsWHERE(filters rows before grouping)GROUP BYHAVING(filters groups after grouping)SELECTORDER BY
By this standard, GROUP BY runs before SELECT, so you can’t reference aliases defined in SELECT in your GROUP BY clause—you’d have to repeat the entire CONCAT(fullVisitorId, CAST(visitID AS STRING)) expression instead.
But BigQuery (along with a few other databases like PostgreSQL) includes a convenient syntax extension that lets you reference SELECT aliases in GROUP BY. This works because BigQuery’s query parser resolves these aliases early on, mapping them to their underlying expressions before executing the GROUP BY step. It’s a quality-of-life feature to avoid duplicating complex logic.
Step-by-Step Execution of Your Query in BigQuery
Let’s walk through exactly how your query runs:
- Filter data with
WHERE: First, BigQuery scanstransaction_tableand keeps only rows wheresessionQualityDim > 60. This reduces the dataset size upfront, optimizing performance for subsequent steps. - Resolve
SELECTexpressions: Next, BigQuery parses all expressions in yourSELECTclause. It calculatesCONCAT(fullVisitorId, CAST(visitID AS STRING))for every filtered row, associates this value with the aliasunique_session_id, and tracks the other fields (sessionQualityDim,productRevenue). - Group the data: Now, BigQuery groups the rows using the values from
unique_session_id(the computed concatenated string) andsessionQualityDim. Since it already mapped the alias to its underlying expression, you don’t need to rewrite theCONCATlogic here. - Compute aggregate functions: For each group, BigQuery runs
SUM(productRevenue)to calculate thetransactions_revenuevalue for that group. - Output the results: Finally, it returns the three fields you specified in
SELECT:unique_session_id,sessionQualityDim, andtransactions_revenue.
A Quick Standard SQL Comparison
If you were writing strictly standard SQL (without BigQuery’s extension), your query would require repeating the CONCAT expression in the GROUP BY clause, like this:
SELECT CONCAT(fullVisitorId, CAST(visitID AS STRING)) AS unique_session_id, sessionQualityDim, SUM(productRevenue) AS transactions_revenue FROM transaction_table WHERE sessionQualityDim > 60 GROUP BY CONCAT(fullVisitorId, CAST(visitID AS STRING)), sessionQualityDim
BigQuery’s alias support saves you from this redundant code, which is especially helpful for longer, more complex expressions.
内容的提问来源于stack exchange,提问作者Mr.Mahajan




