如何高效实现带日期范围的数据无重叠分组与排序?
日期范围无重叠分组优化与性能提升方案
核心需求
将带有日期范围的DateVM数据分组,确保每组内的日期范围无重叠(允许间隔或连续衔接),默认Group值为0,需排序并更新Group字段,优化大数据集下的处理性能,以降低UI展示耗时。
数据结构定义
public class DateVM { public int Group { get; set; } public string Text { get; set; } public string BackColor { get; set; } public DateTime StartDate { get; set; } public DateTime EndDate { get; set; } public bool IsSorted {get;set;} }
示例数据
输入数据
Sample Data {Group:0, Text:Record1, BackColor:Red, StartDate:10/02/2025, EndDate:16/02/2025} {Group:0, Text:Record2, BackColor:Green, StartDate:17/02/2025, EndDate:20/02/2025} {Group:0, Text:Record3, BackColor:Blue, StartDate:17/02/2025, EndDate:20/02/2025} {Group:0, Text:Record4, BackColor:Red, StartDate:21/02/2025, EndDate:23/02/2025} {Group:0, Text:Record5, BackColor:Blue, StartDate:21/02/2025, EndDate:23/02/2025} {Group:0, Text:Record6, BackColor:Green, StartDate:24/02/2025, EndDate:03/03/2025} {Group:0, Text:Record7, BackColor:Blue, StartDate:24/02/2025, EndDate:02/03/2025} {Group:0, Text:Record8, BackColor:Red, StartDate:04/03/2025, EndDate:09/03/2025} {Group:0, Text:Record9, BackColor:Green, StartDate:04/03/2025, EndDate:09/03/2025} {Group:0, Text:Record10, BackColor:Red, StartDate:19/02/2025, EndDate:20/02/2025}
期望输出
Needed Output {Group:1, Text:Record1, BackColor:Red, StartDate:10/02/2025, EndDate:16/02/2025} {Group:1, Text:Record2, BackColor:Green, StartDate:17/02/2025, EndDate:20/02/2025} {Group:2, Text:Record3, BackColor:Blue, StartDate:17/02/2025, EndDate:20/02/2025} {Group:1, Text:Record4, BackColor:Red, StartDate:21/02/2025, EndDate:23/02/2025} {Group:2, Text:Record5, BackColor:Blue, StartDate:21/02/2025, EndDate:23/02/2025} {Group:1, Text:Record6, BackColor:Green, StartDate:24/02/2025, EndDate:03/03/2025} {Group:2, Text:Record7, BackColor:Blue, StartDate:24/02/2025, EndDate:02/03/2025} {Group:1, Text:Record8, BackColor:Red, StartDate:04/03/2025, EndDate:09/03/2025} {Group:2, Text:Record9, BackColor:Green, StartDate:04/03/2025, EndDate:09/03/2025} {Group:3, Text:Record10, BackColor:Red, StartDate:19/02/2025, EndDate:20/02/2025}
当前处理流程及问题
排序逻辑
先按StartDate排序:
dateVM.Sort(DatebarVMComparer.CompareByStartDate);
比较器实现(存在性能问题):
public static int CompareByStartDate(DateVM x, DateVM y) { if (x == null) { return y == null ? 0 : -1; } if (y == null) { return 1; } // 错误:DateVM的StartDate已是DateTime类型,无需重复Parse int retval = DateTime.Parse(x.StartDate).CompareTo(DateTime.Parse(y.StartDate)); return retval; }
分组逻辑
- 第一条记录加入第1组;
- 后续记录依次与已处理的所有组的日期范围比较,若重叠则移至下一组,否则加入当前组。
问题:大数据集下,每次分组都需遍历所有组的所有记录,时间复杂度为O(n²),效率极低、资源消耗大。
现有实现代码
public static IList<SomeObjectNonOverlapping> SortToNonOverlappingDates(List<DateVM> bars) { IList<SomeObjectNonOverlapping> response = new List<SomeObjectNonOverlapping>(); response = bars.Select((t, i) => new SomeObjectNonOverlapping { Text = t.Text, BackColor = t.BackColor, StartDate = DateTime.Parse(t.StartDate), EndDate = DateTime.Parse(t.EndDate), IsSorted = false }).ToList(); var group = 1; response.First().Group = group; response.First().IsSorted = true; for (var i = 1; i < response.Count; i++) { group = response.Max(x => x.Group); for (var g = 1; g <= group; g++) { if (!response.Where(x => x.IsSorted && x.Group == g).Except(response[i]).Any(range => (response[i].StartDate >= range.StartDate && response[i].StartDate <= range.EndDate))) { response[i].Group = g; response[i].IsSorted = true; break; } else { group++; } } } return response; }
性能优化建议
1. 修正排序逻辑,减少不必要的DateTime解析
DateVM的StartDate和EndDate已是DateTime类型,无需重复调用DateTime.Parse,直接比较即可:
public static int CompareByStartDate(DateVM x, DateVM y) { if (x == null) return y == null ? 0 : -1; if (y == null) return 1; int retval = x.StartDate.CompareTo(y.StartDate); // 若StartDate相同,按EndDate排序,进一步优化后续分组效率 if (retval == 0) { retval = x.EndDate.CompareTo(y.EndDate); } return retval; }
2. 预排序+维护分组最大结束日期,降低时间复杂度
- 预排序:先按
StartDate升序,StartDate相同则按EndDate升序排序,确保后续记录的StartDate不会早于已处理记录。 - 维护分组最大EndDate:用字典存储每个分组的最晚结束日期,判断当前记录能否加入某分组时,只需比较当前记录的
StartDate是否大于该分组的最大EndDate(因已排序,若当前StartDate大于分组最大EndDate,则与分组内所有记录无重叠)。
优化后的代码:
public static IList<SomeObjectNonOverlapping> SortToNonOverlappingDatesOptimized(List<DateVM> bars) { // 预排序:StartDate升序,StartDate相同则EndDate升序 var sortedBars = bars.OrderBy(x => x.StartDate).ThenBy(x => x.EndDate).ToList(); // 转换目标对象,直接复用DateTime类型,避免重复解析 var response = sortedBars.Select(t => new SomeObjectNonOverlapping { Text = t.Text, BackColor = t.BackColor, StartDate = t.StartDate, EndDate = t.EndDate, IsSorted = true, Group = 0 }).ToList(); if (!response.Any()) return response; // 存储每个分组的最大结束日期,以及当前最大分组号 var groupMaxEndDates = new Dictionary<int, DateTime>(); int currentMaxGroup = 1; // 处理第一个元素 var firstItem = response[0]; firstItem.Group = currentMaxGroup; groupMaxEndDates[currentMaxGroup] = firstItem.EndDate; // 处理剩余元素 for (int i = 1; i < response.Count; i++) { var currentItem = response[i]; bool assigned = false; // 遍历现有分组,寻找可加入的分组 foreach (var groupEntry in groupMaxEndDates) { int groupId = groupEntry.Key; DateTime groupMaxEnd = groupEntry.Value; // 已排序前提下,当前Item的StartDate >= 所有已处理Item的StartDate // 只要当前Item的StartDate > 分组最大EndDate,就无重叠 if (currentItem.StartDate > groupMaxEnd) { currentItem.Group = groupId; // 更新分组的最大EndDate为当前Item和原最大值的较大者 if (currentItem.EndDate > groupMaxEnd) { groupMaxEndDates[groupId] = currentItem.EndDate; } assigned = true; break; } } // 无可用分组时,新增分组 if (!assigned) { currentMaxGroup++; currentItem.Group = currentMaxGroup; groupMaxEndDates[currentMaxGroup] = currentItem.EndDate; } } return response; }
3. 避免重复计算最大分组号
原代码每次循环调用response.Max(x => x.Group),时间复杂度为O(n),改用变量currentMaxGroup跟踪当前最大分组号,每次新增分组时自增即可,时间复杂度降为O(1)。
C#日期范围处理类库推荐
可以使用TimePeriodLibrary.NET,它提供了专业的时间范围操作API,包括TimeRange类的IntersectsWith方法可快速判断两个时间范围是否重叠。不过针对当前需求,上述优化后的自定义逻辑已足够高效,无需引入第三方依赖。
内容的提问来源于stack exchange,提问作者TUS




