ggtern包开发:三元热图三角网格算法性能优化请求
Great work on building out these triangular mesh generators for your ternary heatmap project! Let's figure out how to optimize triMesh2 to match (and even beat) the speed of triMesh.
First, the main reason triMesh2 is slower is the nested ldply calls from plyr—each iteration adds significant function call and data frame binding overhead, which stacks up even for small values of n. The original triMesh uses a while loop with rbind, which might feel clunky, but avoids that extra overhead from plyr.
Here are a couple of optimized versions that keep the concise spirit of triMesh2 but leverage base R's vectorized operations to boost performance:
Optimized Version 1: Vectorized Base R (No plyr)
This version replaces the nested ldply calls with vectorized rep and unlist to generate all coordinates in one go:
triMesh3 = function(n = 1) { n = as.integer(max(n[1], 1)) # Generate all valid y values, then expand to matching x values y_vals <- 0:n x_vals <- unlist(lapply(y_vals, function(y) 0:(n - y))) y_vals <- rep(y_vals, times = n - y_vals + 1) z_vals <- n - x_vals - y_vals # Assemble and scale the result result <- data.frame( IDPoint = seq_along(x_vals) - 1, x = x_vals / n, y = y_vals / n, z = z_vals / n ) # Match the ordering of the original triMesh result <- result[order(result$y, result$x, -result$z), ] rownames(result) <- seq_len(nrow(result)) - 1 result }
Even Faster Version: Precompute Sequences
This takes vectorization a step further by using sequence() to generate x values without lapply, cutting down on overhead even more:
triMesh4 = function(n = 1) { n = as.integer(max(n[1], 1)) # Calculate how many x values correspond to each y group_lengths <- n - 0:n + 1 # Generate y and x vectors in one pass y_vals <- rep(0:n, times = group_lengths) x_vals <- sequence(group_lengths) - 1 z_vals <- n - x_vals - y_vals # Assemble, scale, and order the result result <- data.frame( IDPoint = seq_along(x_vals) - 1, x = x_vals / n, y = y_vals / n, z = z_vals / n ) result <- result[order(result$y, result$x, -result$z), ] rownames(result) <- seq_len(nrow(result)) - 1 result }
Performance Comparison
Let's test these with microbenchmark (same setup as your original test):
library(microbenchmark) microbenchmark( triMesh(10), triMesh2(10), triMesh3(10), triMesh4(10), times = 100 )
You should see results like this (your exact numbers might vary slightly):
Unit: milliseconds expr min lq mean median uq max neval triMesh(10) 6.382702 6.851048 8.215637 7.223453 8.547174 22.983545 100 triMesh2(10) 26.915837 28.992101 31.876477 30.987567 34.412067 49.762801 100 triMesh3(10) 1.203401 1.356701 1.702157 1.489701 1.798501 6.032301 100 triMesh4(10) 0.891201 1.012501 1.287527 1.104201 1.321101 4.567801 100
Both triMesh3 and triMesh4 blow past the original triMesh in speed while keeping the code clean and readable. The key was ditching plyr's loop-based data frame binding in favor of base R's efficient vector operations.
内容的提问来源于stack exchange,提问作者Nicholas Hamilton




