Help counting the elements with absolute value greater than 0.8 in a 2134×2134 correlation matrix
Troubleshooting Your Correlation Matrix Element Counting Issue
Hey there! Let's break down why your count results don't match your histogram and fix the zero-count problem for values < -0.8.
First, Diagnosing the "50% > 0.8" Discrepancy
Correlation matrices are symmetric (each pair (i,j) has the same value as (j,i)) and have 1s along the diagonal (every variable correlates perfectly with itself). Here's what's almost certainly going on:
- Your original code length(TFcoTF[TFcoTF>.8]) counts all elements meeting the condition, including the 2134 diagonal 1s and both the upper and lower triangular copies of each off-diagonal value.
- If your histogram only displays the upper (or lower) triangle excluding the diagonal (a standard practice to avoid redundant data), your full-matrix count will be drastically higher than the histogram suggests: twice the number of unique pairs above 0.8, plus the 2134 diagonal entries.
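To make the double counting concrete, here is a minimal sketch on a made-up 4×4 symmetric matrix with 1s on the diagonal. The name toy and its values are purely illustrative stand-ins for TFcoTF, not your data.

```r
# Toy symmetric matrix with a unit diagonal (values invented for illustration)
toy <- matrix(c( 1.00,  0.90, -0.85,  0.20,
                 0.90,  1.00,  0.10,  0.95,
                -0.85,  0.10,  1.00, -0.30,
                 0.20,  0.95, -0.30,  1.00),
              nrow = 4, byrow = TRUE)

# Whole matrix: the diagonal and both triangles are counted
full_count <- sum(toy > 0.8)

# Unique off-diagonal pairs only
upper_count <- sum(toy[upper.tri(toy)] > 0.8)

# Diagonal entries (all 1, so all exceed 0.8)
diag_count <- sum(diag(toy) > 0.8)

# For a symmetric matrix the full count decomposes exactly
full_count == 2 * upper_count + diag_count
```

In this toy case the full matrix yields 8 hits while only 2 unique pairs exceed 0.8, which is the kind of gap you would see between a full-matrix count and an upper-triangle histogram.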
Fix to Match Your Histogram
To count only the unique, non-diagonal elements aligned with your histogram's scope, extract the upper triangle first:
    # Pull upper triangle elements (excludes diagonal and lower duplicates)
    upper_tri_elements <- TFcoTF[upper.tri(TFcoTF)]

    # Count values > 0.8 in this subset
    count_gt0.8 <- length(upper_tri_elements[upper_tri_elements > 0.8])

    # Or use sum() for a cleaner, more efficient approach:
    count_gt0.8 <- sum(upper_tri_elements > 0.8, na.rm = TRUE)
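Since the title asks about absolute values, the same upper-triangle idea extends to |r| > 0.8. The sketch below uses a made-up 4×4 matrix; toy, upper_vals, count_abs and prop_abs are illustrative names, and you would substitute TFcoTF for toy.

```r
# Toy symmetric matrix with a unit diagonal (values invented for illustration)
toy <- matrix(c( 1.00,  0.90, -0.85,  0.20,
                 0.90,  1.00,  0.10,  0.95,
                -0.85,  0.10,  1.00, -0.30,
                 0.20,  0.95, -0.30,  1.00),
              nrow = 4, byrow = TRUE)

# Unique off-diagonal pairs: n * (n - 1) / 2 of them
upper_vals <- toy[upper.tri(toy)]
n_pairs <- length(upper_vals)

# Count strong correlations of either sign
count_abs <- sum(abs(upper_vals) > 0.8, na.rm = TRUE)

# mean() of a logical vector gives the proportion directly
prop_abs <- mean(abs(upper_vals) > 0.8, na.rm = TRUE)
```

For a 2134×2134 matrix, n_pairs would be 2134 × 2133 / 2 = 2,275,911, so that is the denominator to use when comparing a count against histogram bar heights.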
Why Are You Getting 0 for Values < -0.8?
There are a few likely culprits here:
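- Your data truly has no values < -0.8: Verify this by checking the matrix's minimum value:
      min(TFcoTF, na.rm = TRUE)
  If the result is greater than -0.8, the 0 count is correct.
- Missing values (NA) are interfering: A comparison involving NA returns NA, and subsetting with an NA index keeps an NA element, so length(TFcoTF[TFcoTF < -0.8]) counts NAs as if they were matches, while sum(TFcoTF < -0.8) without na.rm returns NA instead of a number. NAs on their own would therefore inflate a length() count rather than zero it, but they still make the counts unreliable. Use sum() with na.rm = TRUE to count only genuine matches:
      count_lt_neg0.8 <- sum(TFcoTF < -0.8, na.rm = TRUE)
- Data type mismatch: Double-check your matrix is numeric (class() only tells you it is a matrix, so check the element type instead):
      typeof(TFcoTF)
      str(TFcoTF)
  If it's a character matrix, comparisons are done on strings and won't work as expected. Convert it with TFcoTF <- matrix(as.numeric(TFcoTF), nrow = nrow(TFcoTF)) first (clean up any non-numeric values if present).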
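The checks above can be tried on small made-up inputs to see how each situation behaves. In this sketch, vals, chr and num are hypothetical stand-ins, and storage.mode<- is used as one possible alternative to the matrix(as.numeric(...)) conversion because it keeps the dimensions in place.

```r
# --- NA handling on a toy vector (values invented for illustration) ---
vals <- c(0.9, -0.95, NA, 0.1)

# Subsetting by a logical index containing NA keeps an NA element,
# so length() counts the NA along with the real match
len_count <- length(vals[vals < -0.8])

# Without na.rm the sum itself is NA
sum_plain <- sum(vals < -0.8)

# With na.rm = TRUE only the genuine match is counted
sum_narm <- sum(vals < -0.8, na.rm = TRUE)

# --- Character matrix check and conversion ---
chr <- matrix(c("1", "-0.9", "-0.9", "1"), nrow = 2)
chr_type <- typeof(chr)          # element type, not the matrix class

num <- chr
storage.mode(num) <- "double"    # coerce in place, dimensions preserved
num_count <- sum(num[upper.tri(num)] < -0.8, na.rm = TRUE)
```

If the conversion produces a coercion warning, some entries were not parseable as numbers and became NA, which is worth inspecting before trusting any count.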
Quick Pro Tip
Using sum() on a logical vector is generally more efficient and readable than subsetting and taking length(), because it skips building the intermediate subset. With na.rm = TRUE it also ignores NAs, whereas length(x[x > 0.8]) silently counts them.
The question comes from Stack Exchange; asked by mike ropri.




