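Help needed: creating a subgroup returns a 500 error after upgrading self-managed GitLab to 13.9.1-ee
Background
- GitLab was previously running normally, and creating subgroups worked without any problem
- After upgrading from 13.3 to 13.9.1-ee, trying to create a new subgroup returns a 500 server error
Full environment information
System information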
System:
Current User: myuser
Using RVM: no
Ruby Version: 2.7.2p137
Gem Version: 3.1.4
Bundler Version: 2.1.4
Rake Version: 13.0.3
Redis Version: 6.0.10
Git Version: 2.29.0
Sidekiq Version: 5.2.9
Go Version: unknown
GitLab information
Version: 13.9.1-ee
Revision: 8ae438629fa
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 12.5
URL: https://my.gitlab.co.uk
HTTP Clone URL: https://my.gitlab.co.uk/some-group/some-project.git
SSH Clone URL: myuser@my.gitlab.co.uk:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: yes
Using Omniauth: yes
Omniauth Providers:
GitLab Shell
Version: 13.16.1
Repository storage paths:
- default: /app/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
Captured error log
{"method":"GET","path":"/groups/new","format":"html","controller":"GroupsController","action":"new","status":500,"time":"2021-04-19T11:59:30.155Z","params":[{"key":"parent_id","value":"189"}],"remote_ip":"10.78.XX.XXX","user_id":1,"username":"root","ua":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36","correlation_id":"XXXXXXXXXXXXXXXXXXXX","meta.user":"root","meta.caller_id":"GroupsController#new","meta.remote_ip":"10.78.XX.XXX","meta.feature_category":"subgroups","redis_calls":12,"redis_duration_s":0.0044269999999999995,"redis_read_bytes":1076,"redis_write_bytes":1647,"redis_cache_calls":11,"redis_cache_duration_s":0.004037,"redis_cache_read_bytes":893,"redis_cache_write_bytes":776,"redis_shared_state_calls":1,"redis_shared_state_duration_s":0.00039,"redis_shared_state_read_bytes":183,"redis_shared_state_write_bytes":871,"db_count":14,"db_write_count":0,"db_cached_count":3,"cpu_s":0.22893,"queue_duration_s":0.007548,"exception.class":"ActionView::Template::Error","exception.message":"undefined method `gitlab_subscription' for nil:NilClass","exception.backtrace":["ee/app/models/ee/namespace.rb:184:in `block in closest_gitlab_subscription'","lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'","ee/app/models/ee/namespace.rb:182:in `closest_gitlab_subscription'","ee/app/helpers/ee/subscribable_banner_helper.rb:61:in `decorated_subscription'","ee/app/helpers/ee/subscribable_banner_helper.rb:25:in `renew_subscription_path'","ee/app/views/layouts/header/_ee_subscribable_banner.html.haml:24","app/helpers/application_helper.rb:17:in `render_if_exists'","app/views/layouts/_page.html.haml:18","app/views/layouts/application.html.haml:13","app/controllers/application_controller.rb:125:in `render'","ee/lib/gitlab/ip_address_state.rb:10:in `with'","ee/app/controllers/ee/application_controller.rb:44:in `set_current_ip_address'","app/controllers/application_controller.rb:482:in 
`set_current_admin'","lib/gitlab/session.rb:11:in `with_session'","app/controllers/application_controller.rb:473:in `set_session_storage'","lib/gitlab/i18n.rb:73:in `with_locale'","lib/gitlab/i18n.rb:79:in `with_user_locale'","app/controllers/application_controller.rb:467:in `set_locale'","lib/gitlab/error_tracking.rb:52:in `with_context'","app/controllers/application_controller.rb:532:in `sentry_context'","app/controllers/application_controller.rb:460:in `block in set_current_context'","lib/gitlab/application_context.rb:56:in `block in use'","lib/gitlab/application_context.rb:56:in `use'","lib/gitlab/application_context.rb:22:in `with_context'","app/controllers/application_controller.rb:451:in `set_current_context'","lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'","lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'","lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'","lib/gitlab/metrics/transaction.rb:56:in `run'","lib/gitlab/metrics/rack_middleware.rb:16:in `call'","lib/gitlab/request_profiler/middleware.rb:17:in `call'","lib/gitlab/jira/middleware.rb:19:in `call'","lib/gitlab/middleware/go.rb:20:in `call'","lib/gitlab/etag_caching/middleware.rb:21:in `call'","lib/gitlab/middleware/multipart.rb:172:in `call'","lib/gitlab/middleware/read_only/controller.rb:50:in `call'","lib/gitlab/middleware/read_only.rb:18:in `call'","lib/gitlab/middleware/same_site_cookies.rb:27:in `call'","lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'","lib/gitlab/middleware/basic_health_check.rb:25:in `call'","lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'","lib/gitlab/middleware/request_context.rb:21:in `call'","config/initializers/fix_local_cache_middleware.rb:11:in `call'","lib/gitlab/metrics/requests_rack_middleware.rb:76:in `call'","lib/gitlab/middleware/release_env.rb:12:in `call'"],"db_duration_s":0.0096,"view_duration_s":0.0,"duration_s":0.22644}
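Analysis and resolution steps
Start with the core error: undefined method `gitlab_subscription' for nil:NilClass. It means the code tried to call gitlab_subscription, but the object it was called on is nil. The backtrace shows the failure is in the EE namespace model (ee/app/models/ee/namespace.rb, in closest_gitlab_subscription), which looks up the subscription that applies to a namespace. Because the call is made on nil, the namespace whose subscription is being read could not be resolved. The likely candidates are the parent group you are creating the subgroup under (parent_id=189) or one of its ancestors: its association data may have been lost, or may not have been linked correctly after the upgrade.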
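To check whether every failing request hits this same exception, the message can be pulled out of the JSON log. Below is a minimal sketch that uses only grep and sed. The function name extract_exception_message is hypothetical, and it assumes the message itself contains no escaped double quotes.

```shell
#!/bin/sh
# Hypothetical helper: read JSON log lines on stdin and print the value of
# the "exception.message" field for every line that has one.
# Assumption: the message does not contain escaped double quotes.
extract_exception_message() {
  grep -o '"exception\.message":"[^"]*"' \
    | sed -e 's/^"exception\.message":"//' -e 's/"$//'
}
```

On a default Omnibus install the Rails JSON log is normally /var/log/gitlab/gitlab-rails/production_json.log, so a usage along the lines of `sudo cat /var/log/gitlab/gitlab-rails/production_json.log | extract_exception_message | sort | uniq -c` should show how often each message occurs. Adjust the path if your logs live elsewhere.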
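Since the problem only appeared after the upgrade, the likely causes are database migrations that did not run to completion during the upgrade, or inconsistent association data between namespaces and subscriptions. Here are several fixes to try in order:
1. Re-run the database migrations
Migrations sometimes fail to complete during an upgrade because of network, permission, or similar problems, so run them again: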
sudo gitlab-rake db:migrate
After it finishes, be sure to restart the GitLab services:
sudo gitlab-ctl restart
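To see whether any migrations are actually pending, before or after re-running them, the output of `sudo gitlab-rake db:migrate:status` can be filtered for rows whose status is "down". The helper below is only a sketch: the name list_pending_migration_ids is hypothetical, and it assumes the usual Rails status table, where the first column is the status and the second is the migration ID.

```shell
#!/bin/sh
# Hypothetical helper: read `db:migrate:status` output on stdin and print
# the migration ID (second column) of every row whose status is "down".
# Assumption: rows look like "  down    20210201000000  Some migration name".
list_pending_migration_ids() {
  awk '$1 == "down" { print $2 }'
}
```

For example, `sudo gitlab-rake db:migrate:status | list_pending_migration_ids` should print nothing once all migrations have been applied.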
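2. Check the parent group's subscription data
Since the error points to the group with parent_id=189, check that group's subscription state in the database:
- First open the GitLab PostgreSQL console: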
sudo gitlab-psql -d gitlabhq_production
- Query the parent group's basic information to confirm it exists:
SELECT * FROM namespaces WHERE id = 189;
- Then query its subscription record:
SELECT * FROM gitlab_subscriptions WHERE namespace_id = 189;
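If no subscription record is found and your GitLab EE instance has a valid subscription, you may need to reactivate the subscription; if you manage the license yourself, try re-importing the license file.
3. Clear the GitLab cache
A stale or corrupted cache can also cause this kind of association problem, so clear the cache and try again: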
sudo gitlab-rake cache:clear
Again, restart the services after clearing the cache:
sudo gitlab-ctl restart
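4. Upgrade to the latest patch release in the 13.9 series
13.9.1 may contain this specific bug, and a later patch release (for example 13.9.6) has likely fixed it. Back up your data before upgrading, then follow the official upgrade procedure to move to the latest stable release in the 13.9 series.
5. Temporarily bypass the subscription banner (emergency workaround)
If none of the above resolves the problem for now, you can disable the subscription banner view that raises the error so that subgroup creation works again. The backtrace shows the banner is rendered through render_if_exists, so a missing file should simply be skipped:
- Rename the offending view file so that the application can no longer find it: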
sudo mv /opt/gitlab/embedded/service/gitlab-rails/ee/app/views/layouts/header/_ee_subscribable_banner.html.haml /opt/gitlab/embedded/service/gitlab-rails/ee/app/views/layouts/header/_ee_subscribable_banner.html.haml.bak
- Restart the GitLab services:
sudo gitlab-ctl restart
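This is only a stopgap. You should still fix the root cause with one of the earlier methods, otherwise subscription-related features may be affected.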
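Because the workaround has to be undone later, it may be safer to wrap the rename in a pair of small functions that check the file state first. This is a sketch only: disable_view and restore_view are hypothetical names, and both take the path of the view file as their single argument.

```shell
#!/bin/sh
# Hypothetical helpers for the rename workaround. Both take the path of the
# view file as their only argument.

# Move the file aside by appending ".bak"; refuse to overwrite an old backup.
disable_view() {
  [ -f "$1" ] || { echo "not found: $1" >&2; return 1; }
  [ ! -e "$1.bak" ] || { echo "backup already exists: $1.bak" >&2; return 1; }
  mv "$1" "$1.bak"
}

# Put the file back once the root cause has been fixed.
restore_view() {
  [ -f "$1.bak" ] || { echo "no backup found: $1.bak" >&2; return 1; }
  mv "$1.bak" "$1"
}
```

These would need to run in a root shell, passing the _ee_subscribable_banner.html.haml path from step 5, followed by `gitlab-ctl restart` each time. A later package upgrade will most likely reinstall the original file, in which case the leftover .bak copy can simply be deleted.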
The question comes from Stack Exchange; it was asked by mikita agrawal.




