寻求GROUP BY替代方案:移除SQL别名列重复值的技术方法
首先得说清楚为啥你直接GROUP BY CONCZIP不行——SQL的执行顺序是先处理FROM/JOIN,然后WHERE,接着是GROUP BY,最后才轮到SELECT。这意味着当数据库执行GROUP BY的时候,你在SELECT里定义的别名CONCZIP还没被创建出来,自然识别不了它。
下面给你两种可行的解决办法,看你偏好哪种:
方法1:将别名对应的原始表达式放到GROUP BY中
既然不能直接用别名,那我们把生成CONCZIP的完整表达式直接写到GROUP BY里就行,虽然代码有点重复,但逻辑很直接:
SELECT (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.DestinationSite) AS DESTZIP1, (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.SOURCESite) AS SOURCESITE1, ((SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.SOURCESite)) AS CONCZIP, TransportationPolicies.SourceSite, TransportationPolicies.DestinationSite FROM TransportationPolicies GROUP BY (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = TransportationPolicies.SOURCESite), DESTZIP1, SOURCESITE1, SourceSite, DestinationSite;
⚠️ 注意:这里把其他非聚合列也加到了GROUP BY里,因为大多数数据库要求SELECT里的非聚合列必须出现在GROUP BY中。如果同一个CONCZIP对应多个不同的SourceSite或DestinationSite,你可能需要用聚合函数(比如MAX(SourceSite))来选择其中一个值,而不是把它们都加到GROUP BY里。
方法2:用CTE(公共表表达式)先计算所有列,再外层去重
这种方法更整洁,先把所有需要的列(包括CONCZIP)在CTE里计算好,然后在外层查询直接GROUP BY CONCZIP就行:
WITH PolicyZips AS ( SELECT (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) AS DESTZIP1, (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite) AS SOURCESITE1, ((SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite)) AS CONCZIP, tp.SourceSite, tp.DestinationSite FROM TransportationPolicies tp ) SELECT DESTZIP1, SOURCESITE1, CONCZIP, SourceSite, DestinationSite FROM PolicyZips GROUP BY CONCZIP, DESTZIP1, SOURCESITE1, SourceSite, DestinationSite;
如果不需要保留所有对应列,只需要唯一的CONCZIP,可以调整外层查询,比如:
WITH PolicyZips AS ( SELECT ((SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) + '_' + (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite)) AS CONCZIP, tp.SourceSite, tp.DestinationSite, (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.DestinationSite) AS DESTZIP1, (SELECT Sites.SiteZipCode FROM Sites WHERE Sites.SiteName = tp.SOURCESite) AS SOURCESITE1 FROM TransportationPolicies tp ) SELECT MAX(DESTZIP1) AS DESTZIP1, MAX(SOURCESITE1) AS SOURCESITE1, CONCZIP, MAX(SourceSite) AS SourceSite, MAX(DestinationSite) AS DestinationSite FROM PolicyZips GROUP BY CONCZIP;
这样就能得到每个唯一CONCZIP对应的任意一组关联列值,具体用MAX还是MIN取决于你的需求。
另外,顺便提一句:你当前的查询里用了很多相关子查询,其实可以改成JOIN来优化性能,比如把Sites表和TransportationPolicies表关联两次(一次对应SourceSite,一次对应DestinationSite),这样比多次子查询效率更高,代码也更易读:
WITH PolicyZips AS ( SELECT dest.SiteZipCode AS DESTZIP1, src.SiteZipCode AS SOURCESITE1, CONCAT(dest.SiteZipCode, '_', src.SiteZipCode) AS CONCZIP, tp.SourceSite, tp.DestinationSite FROM TransportationPolicies tp LEFT JOIN Sites dest ON dest.SiteName = tp.DestinationSite LEFT JOIN Sites src ON src.SiteName = tp.SourceSite ) SELECT DESTZIP1, SOURCESITE1, CONCZIP, SourceSite, DestinationSite FROM PolicyZips GROUP BY CONCZIP, DESTZIP1, SOURCESITE1, SourceSite, DestinationSite;
这个改写后的查询性能会更好,尤其是当数据量比较大的时候。
内容的提问来源于stack exchange,提问作者sherimve




