Excel数据透视表与Power Query重复条目处理方案——容量与需求表合并后重复计算问题咨询
我看了你遇到的核心问题:通过Power Query按RESOURCE NAME关联Capacity和Demand表后,数据透视表统计Capacity Hours和Total Capacity Cost时,会因为单个资源对应多条Demand记录而重复累加,导致结果失真,而需求类数据显示正常。下面给你两种针对性的解决方案:
核心原因分析
当你用左外连接关联Demand和Capacity表时,每条Demand记录都会复制对应资源的Capacity数据。比如Resource 1有2条Demand记录,它的Capacity(800)就会被计算2次,透视表求和后会显示1600,但实际应该统计为800。这是因为透视表默认的求和逻辑是对所有行的数值累加,而非按资源维度去重统计。
解决方案1:调整Power Query合并逻辑(生成无重复的汇总表)
这种方法先分别对两张表做维度汇总,再关联,从根源避免重复行:
步骤1:清理Capacity表,保留资源唯一容量记录
在Power Query中编辑CapacityTable,确保每个RESOURCE NAME只有一条有效容量记录(如果同一资源有多个项目/时间段的容量数据,可根据业务需求选择求和、取最大值等合并逻辑):
CapacityTable_Cleaned = let Source = Excel.CurrentWorkbook(){[Name="CapacityTable"]}[Content], #"Changed Type" = Table.TransformColumnTypes(Source,{{"RESOURCE NAME", type text}, {"SKILL GROUP", type text}, {"PROJECT", type text}, {"START DATE", type datetime}, {"END DATE", type datetime}, {"COST PER HOUR", Int64.Type}, {"CAPACITY", Int64.Type}}), // 按资源分组,汇总容量、保留成本和维度信息 #"Grouped Rows" = Table.Group(#"Changed Type", {"RESOURCE NAME"}, { {"Total Capacity", each List.Sum([CAPACITY]), type number}, {"Cost Per Hour", each List.Max([COST PER HOUR]), type number}, {"Skill Groups", each Text.Combine(List.Distinct([SKILL GROUP]), ", "), type text}, {"Assigned Projects", each Text.Combine(List.Distinct([PROJECT]), ", "), type text} }) in #"Grouped Rows"
步骤2:汇总Demand表,按资源统计总需求
同样在Power Query中编辑DemandTable,按RESOURCE NAME汇总需求数据:
DemandTable_Summarized = let Source = Excel.CurrentWorkbook(){[Name="DemandTable"]}[Content], #"Changed Type" = Table.TransformColumnTypes(Source,{{"RESOURCE NAME", type text}, {"SKILL GROUP", type text}, {"PROJECT", type text}, {"START DATE", type datetime}, {"END DATE", type datetime}, {"DEMAND", Int64.Type}}), #"Grouped Rows" = Table.Group(#"Changed Type", {"RESOURCE NAME"}, { {"Total Demand", each List.Sum([DEMAND]), type number}, {"Demand Projects", each Text.Combine(List.Distinct([PROJECT]), ", "), type text} }) in #"Grouped Rows"
步骤3:关联清理后的两张表
创建新的FinalTable_Cleaned,关联上述两个汇总表:
FinalTable_Cleaned = let Source = Table.NestedJoin(DemandTable_Summarized, {"RESOURCE NAME"}, CapacityTable_Cleaned, {"RESOURCE NAME"}, "CapacityData", JoinKind.FullOuter), #"Expanded CapacityData" = Table.ExpandTableColumn(Source, "CapacityData", {"Total Capacity", "Cost Per Hour", "Skill Groups", "Assigned Projects"}, {"Total Capacity", "Cost Per Hour", "Skill Groups", "Assigned Projects"}) in #"Expanded CapacityData"
生成的表中每个资源仅占一行,透视表统计时就不会出现容量重复计算的问题。
解决方案2:使用DAX度量值(不修改原始合并表,通过度量值修正统计)
如果不想调整Power Query的合并结果,可以直接在Excel数据模型中创建DAX度量值,按资源维度去重统计容量:
1. 准确统计总容量的度量值
Total Capacity (Correct) = SUMX( DISTINCT(CapacityTable[RESOURCE NAME]), CALCULATE(SUM(CapacityTable[CAPACITY])) )
2. 准确统计总容量成本的度量值
Total Capacity Cost (Correct) = SUMX( DISTINCT(CapacityTable[RESOURCE NAME]), CALCULATE(SUM(CapacityTable[CAPACITY]) * MAX(CapacityTable[COST PER HOUR])) )
在数据透视表中使用这些度量值代替直接求和CAPACITY和COST PER HOUR*CAPACITY,就能得到准确的结果。
关于按项目维度关联的补充
如果你的业务场景需要按RESOURCE NAME+PROJECT双重维度关联(比如同一资源在不同项目中有独立容量),可以修改Power Query的关联键为两个字段的组合:
FinalTable_ProjectLinked = let Source = Table.NestedJoin(DemandTable, {"RESOURCE NAME", "PROJECT"}, CapacityTable, {"RESOURCE NAME", "PROJECT"}, "CapacityTable", JoinKind.LeftOuter), #"Expanded CapacityTable" = Table.ExpandTableColumn(Source, "CapacityTable", {"SKILL GROUP", "START DATE", "END DATE", "COST PER HOUR", "CAPACITY"}, {"Capacity.SKILL GROUP", "Capacity.START DATE", "Capacity.END DATE", "Capacity.COST PER HOUR", "Capacity.CAPACITY"}) in #"Expanded CapacityTable"
注意:这种方式需要确保Demand表中的PROJECT在Capacity表中有对应记录,否则会出现空值,需根据实际业务判断是否适用。
内容的提问来源于stack exchange,提问作者techmaster




