You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

寻求GROUP BY替代方案:移除SQL别名列重复值的技术方法

解决SQL查询中别名列去重的问题

首先得说清楚为啥你直接GROUP BY CONCZIP不行——SQL的执行顺序是先处理FROM/JOIN,然后WHERE,接着是GROUP BY,最后才轮到SELECT。这意味着当数据库执行GROUP BY的时候,你在SELECT里定义的别名CONCZIP还没被创建出来,自然识别不了它。

下面给你两种可行的解决办法,看你偏好哪种:

方法1:将别名对应的原始表达式放到GROUP BY中

既然不能直接用别名,那我们把生成CONCZIP的完整表达式直接写到GROUP BY里就行,虽然代码有点重复,但逻辑很直接:

SELECT 
    (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.DestinationSite) AS DESTZIP1,
    (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.SOURCESite) AS SOURCESITE1,
    ((SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.SOURCESite)) AS CONCZIP,
    TransportationPolicies.SourceSite,
    TransportationPolicies.DestinationSite
FROM TransportationPolicies
GROUP BY 
    (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.SOURCESite),
    DESTZIP1, SOURCESITE1, SourceSite, DestinationSite;

⚠️ 注意:这里把其他非聚合列也加到了GROUP BY里,因为大多数数据库要求SELECT里的非聚合列必须出现在GROUP BY中。如果同一个CONCZIP对应多个不同的SourceSiteDestinationSite,你可能需要用聚合函数(比如MAX(SourceSite))来选择其中一个值,而不是把它们都加到GROUP BY里。

方法2:用CTE(公共表表达式)先计算所有列,再外层去重

这种方法更整洁,先把所有需要的列(包括CONCZIP)在CTE里计算好,然后在外层查询直接GROUP BY CONCZIP就行:

WITH PolicyZips AS (
    SELECT 
        (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) AS DESTZIP1,
        (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite) AS SOURCESITE1,
        ((SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite)) AS CONCZIP,
        tp.SourceSite,
        tp.DestinationSite
    FROM TransportationPolicies tp
)
SELECT 
    DESTZIP1,
    SOURCESITE1,
    CONCZIP,
    SourceSite,
    DestinationSite
FROM PolicyZips
GROUP BY CONCZIP, DESTZIP1, SOURCESITE1, SourceSite, DestinationSite;

如果不需要保留所有对应列,只需要唯一的CONCZIP,可以调整外层查询,比如:

WITH PolicyZips AS (
    SELECT 
        ((SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite)) AS CONCZIP,
        tp.SourceSite,
        tp.DestinationSite,
        (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) AS DESTZIP1,
        (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite) AS SOURCESITE1
    FROM TransportationPolicies tp
)
SELECT 
    MAX(DESTZIP1) AS DESTZIP1,
    MAX(SOURCESITE1) AS SOURCESITE1,
    CONCZIP,
    MAX(SourceSite) AS SourceSite,
    MAX(DestinationSite) AS DestinationSite
FROM PolicyZips
GROUP BY CONCZIP;

这样就能得到每个唯一CONCZIP对应的任意一组关联列值,具体用MAX还是MIN取决于你的需求。

另外,顺便提一句:你当前的查询里用了很多相关子查询,其实可以改成JOIN来优化性能,比如把Sites表和TransportationPolicies表关联两次(一次对应SourceSite,一次对应DestinationSite),这样比多次子查询效率更高,代码也更易读:

WITH PolicyZips AS (
    SELECT 
        dest.SiteZipCode AS DESTZIP1,
        src.SiteZipCode AS SOURCESITE1,
        CONCAT(dest.SiteZipCode, '_', src.SiteZipCode) AS CONCZIP,
        tp.SourceSite,
        tp.DestinationSite
    FROM TransportationPolicies tp
    LEFT JOIN Sites dest ON dest.SiteName = tp.DestinationSite
    LEFT JOIN Sites src ON src.SiteName = tp.SourceSite
)
SELECT 
    DESTZIP1,
    SOURCESITE1,
    CONCZIP,
    SourceSite,
    DestinationSite
FROM PolicyZips
GROUP BY CONCZIP, DESTZIP1, SOURCESITE1, SourceSite, DestinationSite;

这个改写后的查询性能会更好,尤其是当数据量比较大的时候。

内容的提问来源于stack exchange,提问作者sherimve

火山引擎 最新活动