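How do I restrict the search range when tuning a recipe step in tidymodels?
I really like tidymodels: it can tune not only model parameters but also recipe steps (for example, the num_comp argument of step_pls()). I ran into trouble recently when I swapped step_pls() for step_umap(). I want the number of components to be searched within 2-5, but the code keeps trying to build UMAPs with around 50 components, which crashes my session. With grid_random or grid_max_entropy I tried something like param_grid %>% grid_random(size = 5, num_comp() %>% range_set(c(3, 5))), but the setting seems to be ignored entirely. What is the correct way to restrict the search range of one specific tuning parameter?
Here is the relevant code: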
# Load Packages -----------------------------------------------------------
require(tidyverse)
require(lubridate)
require(tidymodels)
require(rsample)
require(themis)
require(recipes)
require(embed)
# Load Data ---------------------------------------------------------------
data <- read_csv("....data.csv")
# Modelling - Data Partition ----------------------------------------------
split_prop <- 0.80
init_split <- initial_time_split(data, prop = split_prop)
set_train <- training(init_split)
set_test <- testing(init_split)
# Modelling - Resamples ---------------------------------------------------
valid_folds <- rsample::vfold_cv(set_train, v = 5)
# Modelling - Data Transf -------------------------------------------------
recip_train <- recipe(label ~ ., data = set_train) %>%
  step_normalize(all_predictors()) %>%
  step_pls(all_predictors(), outcome = "label", num_comp = tune())
# Modelling - Model Specs ---------------------------------------------------
model_glm <- linear_reg() %>%
  set_args(penalty = tune(), mixture = tune()) %>%
  set_mode("regression") %>%
  set_engine("glmnet")
# Workflow ------------------------------------------------------------------
wflw <- workflow() %>%
  add_recipe(recip_train) %>%
  add_model(model_glm)
# Modelling - Tuning Control -------------------------------------------------
ctr_tune <- control_grid(
  verbose = TRUE,
  allow_par = TRUE,
  extract = NULL,
  save_pred = TRUE,
  pkgs = NULL
)
param_grid <- wflw %>%
  parameters() %>%
  finalize(set_train) %>%
  grid_max_entropy(size = 5)
# Modelling - Tuning ---------------------------------------------------------
tuning <- tune_grid(
  object = wflw,
  resamples = valid_folds,
  grid = param_grid,
  control = ctr_tune,
  metrics = metric_set(rmse)
)
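The problem is where you set the range. You are trying to change it while the grid is being generated, but the range is a property of the parameter object. Adjust the parameter object first, then generate the tuning grid from the updated parameter set. The steps are:
1. Confirm the parameter name
Run wflw %>% parameters() to list every tunable parameter together with its default range, and check that the name you want to change is correct. For step_umap() the parameter is called num_comp, the same as for step_pls(). update() matches parameters by the value in the id column, so that is the name to use.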
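As a rough sketch of that inspection step, the snippet below builds a toy workflow on mtcars and lists its tunable parameters. The data set and the use of step_pca() as a lightweight stand-in for step_umap() are assumptions made only to keep the example self-contained. It also uses extract_parameter_set_dials(), which recent tidymodels releases recommend in place of parameters() on a workflow.

```r
library(tidymodels)

# Hypothetical toy recipe: step_pca() stands in for step_umap() here
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors(), num_comp = tune())

spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

wflw <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec)

# One row per tune() placeholder; the id column is what update() matches on
params <- extract_parameter_set_dials(wflw)
params
params$id

# Print the num_comp parameter object to see its current range
params$object[[which(params$id == "num_comp")]]
```

Printing the individual parameter object is the quickest way to see which default range the grid functions will sample from.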
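2. Set the parameter range correctly
Before generating the grid, use update() to replace num_comp in the parameter set with a range-restricted version. Note that range_set() works on a single parameter object, not on the whole parameter set, so it has to be used inside update(). The restricted object can be written in two equivalent ways:
Option 1: pass the range to num_comp()
param_grid <- wflw %>%
  parameters() %>%
  finalize(set_train) %>%
  # restrict num_comp to the range 2-5
  update(num_comp = num_comp(range = c(2, 5))) %>%
  grid_max_entropy(size = 5)
Option 2: use range_set() on the parameter object
param_grid <- wflw %>%
  parameters() %>%
  finalize(set_train) %>%
  # range_set() modifies the single num_comp object; update() puts it into the set
  update(num_comp = num_comp() %>% range_set(c(2, 5))) %>%
  grid_max_entropy(size = 5)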
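The same idea can be tried without any data or workflow. The sketch below uses dials alone, with a hand-built parameter set as a hypothetical stand-in for the one extracted from the workflow, and grid_random() in place of grid_max_entropy() to avoid extra dependencies.

```r
library(dials)

# Hypothetical stand-in for the workflow's parameter set
params <- parameters(list(
  num_comp = num_comp(),
  penalty  = penalty(),
  mixture  = mixture()
))

# Replace the num_comp entry with a range-restricted copy
restricted <- update(params, num_comp = num_comp(range = c(2L, 5L)))

# Same result, building the restricted object with range_set() first
restricted_alt <- update(params, num_comp = range_set(num_comp(), c(2L, 5L)))

# Grids are sampled from the updated set, so num_comp stays within 2-5
set.seed(123)
grid <- grid_random(restricted, size = 5)
grid_alt <- grid_random(restricted_alt, size = 5)
grid
```

In both variants the range is attached to the parameter object before any grid function runs, which is why the restriction is respected.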
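3. Verify that the setting took effect
After generating the grid, run print(param_grid) and look at the num_comp column. If every value lies between 2 and 5, the restriction worked.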
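For larger grids, eyeballing the printout can be replaced by a programmatic check. The following is a minimal sketch, assuming a parameter set that contains only num_comp and a regular grid, both chosen purely for illustration. It compares the grid values against the range declared on the parameter object.

```r
library(dials)

# Parameter object with the intended range
num_comp_param <- num_comp(range = c(2L, 5L))

# Hypothetical one-parameter set, just to have a grid to check
grid <- grid_regular(parameters(list(num_comp = num_comp_param)), levels = 4)

# Declared bounds as a list with elements lower and upper
declared <- range_get(num_comp_param)

# Observed minimum and maximum in the grid
range(grid$num_comp)

# Stop with an error if any candidate falls outside the declared range
stopifnot(
  min(grid$num_comp) >= declared$lower,
  max(grid$num_comp) <= declared$upper
)
```

A check like this can sit right before tune_grid(), so an unexpectedly wide grid fails fast instead of crashing the session partway through tuning.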
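4. Extending this to several parameters
If you need to restrict several parameters at once (for example the model's penalty and mixture as well as the recipe's num_comp), chain further update() calls, for example:
param_grid <- wflw %>%
  parameters() %>%
  finalize(set_train) %>%
  update(num_comp = num_comp(range = c(2, 5))) %>%
  # penalty's range is given on the log10 scale, so this covers 1e-5 to 1
  update(penalty = penalty(range = c(-5, 0))) %>%
  grid_max_entropy(size = 10)
With these changes, tuning only searches UMAPs with 2 to 5 components, so it no longer builds very large embeddings that crash the session.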
This question comes from Stack Exchange; it was asked by oprick.




