Why SAS Output Is Missing Observations After Adding a BY Statement, and How to Fix It
Let's break down what's happening here and how to fix it:
The Root Cause
When you use a BY statement in SAS procedures like PROC PRINT, SAS expects the input dataset to be sorted by the BY variable(s) (in your case, age), or to have an index built on that variable.
The sashelp.class dataset is ordered by name, not by age. When you run your second code without sorting first, PROC PRINT reads the observations in their stored order and expects the BY values to be in ascending order. As soon as it meets an age lower than the previous one, it writes an error to the log saying that the data set is not sorted in ascending sequence, and it stops processing the step. The observations after that point never reach the output, which is why some of your age > 14 rows appear to be "missing": they are not hidden by the formatting, the procedure simply stopped before printing them. Checking the log for that ERROR line confirms the diagnosis.
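To see the problem directly, a quick check is to print the filtered rows without any BY statement and read the age column from top to bottom. This is only a diagnostic sketch; it reuses the sashelp.class data and the WHERE filter from the question and assumes nothing else.

```sas
/* Diagnostic only: list the filtered rows in their stored order. */
/* No BY statement is used, so no sort order is required.         */
proc print data=sashelp.class;
    var name age;
    where age > 14;
run;
```

If the ages in this listing go up and then back down, a plain BY age step will stop at the first drop.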
Fixes You Can Use
Option 1: Sort the Dataset First
The most straightforward fix is to sort the dataset by age before using the BY statement. This ensures SAS can correctly group all observations by age:
/* Sort the dataset by age and output to a new dataset */
proc sort data=sashelp.class out=class_sorted;
    by age;
run;

/* Use the sorted dataset with PROC PRINT */
proc print data=class_sorted;
    var name sex age height weight;
    where age > 14;
    sum weight;
    by age;
run;
Option 2: Use the NOTSORTED Keyword
If you don't want to sort the dataset (e.g., you want to keep the original order of observations), add the NOTSORTED option to your BY statement. This tells SAS that the BY values are not necessarily in ascending order, so it stops checking the sort order and starts a new BY group every time the value of age changes from one observation to the next:
proc print data=sashelp.class;
    var name sex age height weight;
    where age > 14;
    sum weight;
    by age notsorted;
run;
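To see which groups NOTSORTED actually forms, one option is a DATA step that flags the start of each BY group with the automatic FIRST.age variable. This is a sketch, not part of the original answer; the dataset name groups_check and the variable group_start are made up for illustration.

```sas
/* Flag the first observation of each NOTSORTED BY group.     */
/* groups_check and group_start are illustrative names only.  */
data groups_check;
    set sashelp.class;
    where age > 14;
    by age notsorted;
    group_start = first.age;   /* 1 on the first row of a group, else 0 */
    keep name age group_start;
run;

proc print data=groups_check;
run;
```

Each row with group_start = 1 begins a new group, so an age that has more than one such row will be printed as more than one BY group.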
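Either approach prints all of your filtered observations (age > 14), including the ones that seemed to go missing. The difference is in the grouping: with Option 1 each age forms exactly one BY group with one weight subtotal, while with Option 2 an age that occurs in separate places in the dataset is printed as separate groups, each with its own subtotal.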
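The root cause section also mentions an index as an alternative to sorting. A minimal sketch of that route follows, assuming you work on a copy in the WORK library, since sashelp is normally read-only; the name class_copy is hypothetical.

```sas
/* Copy the data so an index can be added (class_copy is an illustrative name). */
data class_copy;
    set sashelp.class;
run;

/* Build a simple index on age. */
proc datasets library=work nolist;
    modify class_copy;
    index create age;
quit;

/* BY processing can now use the index instead of a physical sort. */
proc print data=class_copy;
    var name sex age height weight;
    where age > 14;
    sum weight;
    by age;
run;
```

For a dataset this small, sorting is the simpler choice; an index is mainly worth considering when the data is large or has to stay in its original physical order.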
This question comes from Stack Exchange; it was asked by S.Lee.




