使用Stata合并多数据集时遇type mismatch错误求助
date() in Stata When Merging Datasets Let's break down why you're hitting that type mismatch error and walk through how to fix it, along with addressing the date format inconsistency between your two datasets.
First: Understand the Root of the Type Mismatch
The date() function in Stata only works on string variables. If your DateCollected variable is already a numeric (date-formatted) variable, running generate dateCollected = date(DateCollected, "DMY") will throw a type mismatch error immediately.
Looking at your dataset samples:
- In
Creatinine.dta,DateCollectedis a string (e.g.,"12/5/2007"), which should work withdate()—but if some records have non-standard values (empty strings, garbage text, or mixed formats), that can also trigger errors. - Your
Report.dtauses numeric codes for categorical variables (e.g.,sex=2for Female) and string dates with month abbreviations (e.g.,"15-Apr-07"), which creates mismatches when merged withCreatinine.dta's stringsexand slash-separated dates.
Step 1: Diagnose the DateCollected Variable
Before running the generate command, run these lines to check what you're working with:
describe DateCollected codebook DateCollected
- If the output says
type: numeric, you don't need to convert it—just adjust the display format withformat %tdCCYY/NN/DD DateCollected. - If it's
type: string, look for problematic entries with:
This will show you any strings thatlist id DateCollected if missing(date(DateCollected, "DMY"))date()can't parse (e.g.,"N/A","07/12/2007"if some dates are MDY instead of DMY).
Step 2: Standardize Dates Across Both Datasets
To fix the format inconsistency between Creatinine.dta and Report.dta, clean each dataset separately first:
Clean Report.dta first:
use "Archive\Report.dta", clear // Convert string dates to numeric dates foreach var in dob tx1 censor_date DeathDate { gen `var'_num = date(`var', "DMY") // "DMY" recognizes month abbreviations like "Apr" format %tdCCYY/NN/DD `var'_num drop `var' rename `var'_num `var' } save "Archive\Report_cleaned.dta", replace
Clean Creatinine.dta next:
use "\\files\Creatinine.dta", clear // Fix DateCollected first gen dateCollected = date(DateCollected, "DMY") // Check for failed conversions and fix if needed (e.g., try MDY for mixed formats) list id DateCollected if missing(dateCollected) // replace dateCollected = date(DateCollected, "MDY") if missing(dateCollected) format %tdCCYY/NN/DD dateCollected drop DateCollected rename dateCollected DateCollected // Standardize other dates to match Report.dta foreach var in dob tx1 censor_date DeathDate { gen `var'_num = date(`var', "DMY") format %tdCCYY/NN/DD `var'_num drop `var' rename `var'_num `var' } // Fix sex variable to match Report.dta's numeric code gen sex_num = cond(sex == "Male", 1, cond(sex == "Female", 2, .)) drop sex rename sex_num sex label define sex1 1 "Male" 2 "Female" // Recreate label if needed label values sex sex1
Step 3: Merge the Cleaned Datasets
Now that both datasets have consistent variable types, run your merge safely:
merge m:1 id using "Archive\Report_cleaned.dta" // Keep only matched records (adjust based on your needs) drop if _merge != 3 drop _merge // Rest of your labeling and cleaning code here... label define org 1 "Heart" 2 "Lung" 3 "Liver" 5 "Multiple" 6 "Small Bowel" 7 "Pancreas" 8 "Stomach" label values organ1 organ2 organ3 org // ... (continue with your existing labeling, ordering, and duplicate removal)
Key Takeaways
- Always confirm variable types before running conversion functions like
date(). - Merge only after standardizing variable types (dates, categoricals) across all datasets.
- Check for failed date conversions—mixed formats (DMY vs MDY) are a common hidden issue.
内容的提问来源于stack exchange,提问作者user8722680




