You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

使用Stata合并多数据集时遇type mismatch错误求助

Fixing "Type Mismatch" Error with date() in Stata When Merging Datasets

Let's break down why you're hitting that type mismatch error and walk through how to fix it, along with addressing the date format inconsistency between your two datasets.

First: Understand the Root of the Type Mismatch

The date() function in Stata only works on string variables. If your DateCollected variable is already a numeric (date-formatted) variable, running generate dateCollected = date(DateCollected, "DMY") will throw a type mismatch error immediately.

Looking at your dataset samples:

  • In Creatinine.dta, DateCollected is a string (e.g., "12/5/2007"), which should work with date()—but if some records have non-standard values (empty strings, garbage text, or mixed formats), that can also trigger errors.
  • Your Report.dta uses numeric codes for categorical variables (e.g., sex=2 for Female) and string dates with month abbreviations (e.g., "15-Apr-07"), which creates mismatches when merged with Creatinine.dta's string sex and slash-separated dates.

Step 1: Diagnose the DateCollected Variable

Before running the generate command, run these lines to check what you're working with:

describe DateCollected
codebook DateCollected
  • If the output says type: numeric, you don't need to convert it—just adjust the display format with format %tdCCYY/NN/DD DateCollected.
  • If it's type: string, look for problematic entries with:
    list id DateCollected if missing(date(DateCollected, "DMY"))
    
    This will show you any strings that date() can't parse (e.g., "N/A", "07/12/2007" if some dates are MDY instead of DMY).

Step 2: Standardize Dates Across Both Datasets

To fix the format inconsistency between Creatinine.dta and Report.dta, clean each dataset separately first:

Clean Report.dta first:

use "Archive\Report.dta", clear
// Convert string dates to numeric dates
foreach var in dob tx1 censor_date DeathDate {
    gen `var'_num = date(`var', "DMY") // "DMY" recognizes month abbreviations like "Apr"
    format %tdCCYY/NN/DD `var'_num
    drop `var'
    rename `var'_num `var'
}
save "Archive\Report_cleaned.dta", replace

Clean Creatinine.dta next:

use "\\files\Creatinine.dta", clear
// Fix DateCollected first
gen dateCollected = date(DateCollected, "DMY")
// Check for failed conversions and fix if needed (e.g., try MDY for mixed formats)
list id DateCollected if missing(dateCollected)
// replace dateCollected = date(DateCollected, "MDY") if missing(dateCollected)
format %tdCCYY/NN/DD dateCollected
drop DateCollected
rename dateCollected DateCollected

// Standardize other dates to match Report.dta
foreach var in dob tx1 censor_date DeathDate {
    gen `var'_num = date(`var', "DMY")
    format %tdCCYY/NN/DD `var'_num
    drop `var'
    rename `var'_num `var'
}

// Fix sex variable to match Report.dta's numeric code
gen sex_num = cond(sex == "Male", 1, cond(sex == "Female", 2, .))
drop sex
rename sex_num sex
label define sex1 1 "Male" 2 "Female" // Recreate label if needed
label values sex sex1

Step 3: Merge the Cleaned Datasets

Now that both datasets have consistent variable types, run your merge safely:

merge m:1 id using "Archive\Report_cleaned.dta"
// Keep only matched records (adjust based on your needs)
drop if _merge != 3
drop _merge

// Rest of your labeling and cleaning code here...
label define org 1 "Heart" 2 "Lung" 3 "Liver" 5 "Multiple" 6 "Small Bowel" 7 "Pancreas" 8 "Stomach"
label values organ1 organ2 organ3 org
// ... (continue with your existing labeling, ordering, and duplicate removal)

Key Takeaways

  • Always confirm variable types before running conversion functions like date().
  • Merge only after standardizing variable types (dates, categoricals) across all datasets.
  • Check for failed date conversions—mixed formats (DMY vs MDY) are a common hidden issue.

内容的提问来源于stack exchange,提问作者user8722680

火山引擎 最新活动