
Fixing the RocksDB "File name too long" crash in Flink application mode

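For the crash caused by over-long RocksDB temporary file names during incremental restore, the fix in FLINK-31743 indeed does not cover every scenario. Below are several workarounds to try that do not require shortening the application name: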

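1. Customize the RocksDB local directory with short-ID placeholders

Set state.backend.rocksdb.localdir directly in the Flink configuration and use short-ID placeholders to reduce the path length while keeping it unique: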

flinkConfiguration:
  state.backend.rocksdb.localdir: /tmp/rdb/tm_${taskmanager.id:short}_job_${job.id:short}_op_${operator.id:short}

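The :short suffix is meant to truncate each ID to its first 8 characters, turning the long UUIDs and operator IDs into short strings and cutting the file name length at the source. Placeholder expansion for this option is not described in the Flink configuration reference, so confirm that your Flink version actually substitutes these values before relying on it.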
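To see how much such truncation would save, here is a minimal Python sketch. It does not call Flink; it only applies the "keep the first 8 characters" rule to the file name from the stack trace. The regular expression and the helper name shorten_ids are illustrative assumptions, not part of Flink or RocksDB.

```python
import re

# File name taken from the "Caused by" line of the stack trace.
LOG_FILE_NAME = (
    "tmp_tm_hydra-sql-adr-assoc-device-and-login-features-taskmanager-1-10"
    "_tmp_job_41471278f6601d1a7ab05da6958d83f7"
    "_op_KeyedProcessOperator_d4d5e8c74c3d05d8a9a53a9c312a6161__1_5_"
    "_uuid_aadf2786-a3dd-4fa9-acaa-59d560e05ce3"
    "_b5ea62d0-713f-46c4-bd4e-a4526f117f33_LOG"
)

# Matches either a UUID (8-4-4-4-12 hex groups) or a standalone 32-character hex ID.
LONG_ID = re.compile(
    r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
    r"|(?<![0-9a-f])[0-9a-f]{32}(?![0-9a-f])"
)


def shorten_ids(name: str, keep: int = 8) -> str:
    """Hypothetical helper: keep only the first `keep` characters of every long ID."""
    return LONG_ID.sub(lambda match: match.group(0)[:keep], name)


if __name__ == "__main__":
    shortened = shorten_ids(LOG_FILE_NAME)
    print(len(LOG_FILE_NAME), "->", len(shortened))
    print(shortened)
```

Each 32-character ID shrinks by 24 characters and each UUID by 28, so the IDs account for most of the savings, while the application-derived TaskManager name is left untouched.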

2. Turn off RocksDB log files entirely

If lowering the log level does not help, disable log generation through RocksDB's native options:

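flinkConfiguration:
  # OFF is not a valid RocksDB InfoLogLevel value; HEADER_LEVEL is the least verbose one
  state.backend.rocksdb.log.level: HEADER_LEVEL
  # Intended to force RocksDB to stop generating log files
  state.backend.rocksdb.extended-options: "keep_log_file_num=0;log_file_time_to_roll=0;log_file_size_to_roll=0"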

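keep_log_file_num=0 is intended to stop RocksDB from creating any log file, which would avoid the over-long log file name altogether. Test this on a non-production job first: verify that the extended-options key exists in your Flink version, and note that RocksDB may reject keep_log_file_num=0 as an invalid argument.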

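3. Map a short job name at the Operator level

In the FlinkDeployment YAML, set a short name via spec.jobName; this does not affect the application's business identifier or what the UI displays: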

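apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
spec:
  jobName: short-job-id  # this name is used when generating the RocksDB path
  name: hydra-sql-adr-assoc-device-and-login-features  # original application name, shown in the Flink UI
  flinkVersion: v1_22  # set this to the Flink version you actually run; v1_22 is not a released version
  # other deployment settings...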

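This keeps the application's descriptive name while shortening the job-name part of the RocksDB path. Confirm against the operator's CRD reference that your operator version accepts spec.jobName and spec.name before applying it.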

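4. Patch the temporary file naming used by incremental restore

If none of the configuration above works, the remaining option is to modify the Flink code and complete the FLINK-31743 fix yourself:

  • In the RocksDBIncrementalRestoreOperation class, find the code that builds the temporary DB path and replace the concatenated long operator ID and UUID with a short hash (for example, the first 8 characters of an MD5 digest).
  • Build a custom flink-statebackend-rocksdb jar and replace the corresponding dependency in the cluster.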
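The short-hash idea from the first bullet can be sketched as follows. This is a Python illustration of the naming scheme only; the real change would be made in Java inside Flink, and the function names short_hash and temp_db_name are hypothetical.

```python
import hashlib


def short_hash(identifier: str, length: int = 8) -> str:
    """Return the first `length` hex characters of the MD5 digest of `identifier`."""
    digest = hashlib.md5(identifier.encode("utf-8")).hexdigest()
    return digest[:length]


def temp_db_name(operator_id: str, instance_uuid: str) -> str:
    """Hypothetical naming scheme: replace both long IDs with their short hashes."""
    return f"op_{short_hash(operator_id)}_uuid_{short_hash(instance_uuid)}"


if __name__ == "__main__":
    # IDs copied from the stack trace, used here only as sample input.
    print(temp_db_name(
        "d4d5e8c74c3d05d8a9a53a9c312a6161",
        "aadf2786-a3dd-4fa9-acaa-59d560e05ce3",
    ))
```

Eight hex characters carry only 32 bits, so two different IDs can occasionally map to the same short hash. If the temporary directories must never collide, keep more characters or check for an existing directory before using the name.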

Stack trace of the failure

[...]
java.io.IOException: Error while opening RocksDB instance.
    at org.apache.flink.state.rocksdb.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:101)
    at org.apache.flink.state.rocksdb.restore.RestoredDBInstance.restoreTempDBInstanceFromLocalState(RestoredDBInstance.java:121)
    at org.apache.flink.state.rocksdb.restore.RocksDBIncrementalRestoreOperation.copyToBaseDBUsingTempDBs(RocksDBIncrementalRestoreOperation.java:788)
    at org.apache.flink.state.rocksdb.restore.RocksDBIncrementalRestoreOperation.mergeStateHandlesWithCopyFromTemporaryInstance(RocksDBIncrementalRestoreOperation.java:628)
    at org.apache.flink.state.rocksdb.restore.RocksDBIncrementalRestoreOperation.restoreFromMultipleStateHandles(RocksDBIncrementalRestoreOperation.java:446)
    at org.apache.flink.state.rocksdb.restore.RocksDBIncrementalRestoreOperation.restoreFromLocalState(RocksDBIncrementalRestoreOperation.java:326)
    at org.apache.flink.state.rocksdb.restore.RocksDBIncrementalRestoreOperation.lambda$restore$1(RocksDBIncrementalRestoreOperation.java:253)
    at org.apache.flink.state.rocksdb.restore.RocksDBIncrementalRestoreOperation.runAndReportDuration(RocksDBIncrementalRestoreOperation.java:893)
    at org.apache.flink.state.rocksdb.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:252)
    at org.apache.flink.state.rocksdb.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:390)
    ... 19 more
Caused by: org.rocksdb.RocksDBException: While open a file for appending: /tmp/rdb/tmp_tm_hydra-sql-adr-assoc-device-and-login-features-taskmanager-1-10_tmp_job_41471278f6601d1a7ab05da6958d83f7_op_KeyedProcessOperator_d4d5e8c74c3d05d8a9a53a9c312a6161__1_5__uuid_aadf2786-a3dd-4fa9-acaa-59d560e05ce3_b5ea62d0-713f-46c4-bd4e-a4526f117f33_LOG: File name too long
    at org.rocksdb.RocksDB.open(Native Method)
    at org.rocksdb.RocksDB.open(RocksDB.java:315)
    at org.apache.flink.state.rocksdb.RocksDBOperationUtils.openDB(RocksDBOperationUtils.java:89)
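To confirm that a given path fails because of a single over-long file name rather than the total path length, the last path component can be measured directly. The sketch below assumes the common Linux limit of 255 bytes per file name; the constant NAME_MAX and the helper names are illustrative, and the actual limit depends on the filesystem.

```python
import posixpath

# Path copied from the "Caused by" line of the stack trace above.
FAILING_PATH = (
    "/tmp/rdb/tmp_tm_hydra-sql-adr-assoc-device-and-login-features-taskmanager-1-10"
    "_tmp_job_41471278f6601d1a7ab05da6958d83f7"
    "_op_KeyedProcessOperator_d4d5e8c74c3d05d8a9a53a9c312a6161__1_5_"
    "_uuid_aadf2786-a3dd-4fa9-acaa-59d560e05ce3"
    "_b5ea62d0-713f-46c4-bd4e-a4526f117f33_LOG"
)

# Assumption: 255 bytes per path component, typical for ext4 and XFS.
NAME_MAX = 255


def name_length(path: str) -> int:
    """Length in bytes of the final path component (the file name)."""
    return len(posixpath.basename(path).encode("utf-8"))


def exceeds_limit(path: str, limit: int = NAME_MAX) -> bool:
    """True if the file name alone is longer than the per-component limit."""
    return name_length(path) > limit


if __name__ == "__main__":
    print(name_length(FAILING_PATH), exceeds_limit(FAILING_PATH))
```

Running the same check on the path produced after a configuration change gives a quick way to tell whether that workaround leaves enough headroom.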

The question originally comes from Stack Exchange; it was asked by Clemens Valiente.
