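Help needed: Hive Metastore does not detect a running Hadoop DataNode
I just deployed a Hadoop cluster with a single NameNode and a single DataNode using Docker Compose, and I hit an annoying problem when starting Hive: the DataNode is clearly running, but the Hive Metastore cannot find it, and the log keeps repeating this message: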
namenode:9870 is available. check for datanode:9871... datanode:9871 is not available yet try in 5s once again ...
Here is my docker-compose.yml:
    #HADOOP
    namenode:
      image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
      container_name: namenode
      restart: always
      expose:
        - "9870"
        - "54310"
        - "9000"
      ports:
        - 9870:9870
        - 9000:9000
      volumes:
        - ./data/hadoop_data/:/hadoop_data
      environment:
        - CLUSTER_NAME=test
        - CORE_CONF_fs_defaultFS=hdfs://namenode:9000
        - CORE_CONF_hadoop_http_staticuser_user=root
        - CORE_CONF_hadoop_proxyuser_hue_hosts=*
        - CORE_CONF_hadoop_proxyuser_hue_groups=*
        - CORE_CONF_io_compression_codecs=org.apache.hadoop.io.compress.SnappyCodec
        - HDFS_CONF_dfs_webhdfs_enabled=true
        - HDFS_CONF_dfs_permissions_enabled=false
        - HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false
        - HDFS_CONF_dfs_safemode_threshold_pct=0

    datanode:
      image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
      container_name: datanode
      restart: always
      expose:
        - "9871"
      environment:
        SERVICE_PRECONDITION: "namenode:9870"
      ports:
        - "9871:9871"
      env_file:
        - hive.env

    hive-server:
      image: bde2020/hive:2.3.2-postgresql-metastore
      container_name: hive-server
      volumes:
        - ./employee:/employee
      env_file:
        - hive.env
      environment:
        HIVE_CORE_CONF_javax_jdo_option_ConnectionURL: "jdbc:postgresql://hive-metastore/metastore"
        SERVICE_PRECONDITION: "hive-metastore:9083"
      depends_on:
        - hive-metastore
      ports:
        - "10000:10000"

    hive-metastore:
      image: bde2020/hive:2.3.2-postgresql-metastore
      container_name: hive-metastore
      env_file:
        - hive.env
      command: /opt/hive/bin/hive --service metastore
      environment:
        SERVICE_PRECONDITION: "namenode:9870 datanode:9871 hive-metastore-postgresql:5432"
      depends_on:
        - hive-metastore-postgresql
      ports:
        - "9083:9083"

    hive-metastore-postgresql:
      image: bde2020/hive-metastore-postgresql:2.3.0
      container_name: hive-metastore-postgresql
      volumes:
        - ./metastore-postgresql/postgresql/data:/var/lib/postgresql/data
      depends_on:
        - datanode
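Root cause and solution
After some digging, the core problem turns out to be the port used in the readiness check. Fixing it, plus a few small adjustments, resolves the issue:
1. Fix the wrong port in SERVICE_PRECONDITION
The SERVICE_PRECONDITION wait in the bde2020 images keeps probing each host:port until something answers (as far as I can tell it is a plain TCP connect rather than an HTTP request), so it only succeeds if a process is actually listening on that port. Nothing in the DataNode listens on 9871. In Hadoop 3.x that number is the default NameNode HTTPS port (dfs.namenode.https-address), while the DataNode defaults are 9864 for the web UI (dfs.datanode.http.address), 9866 for data transfer (dfs.datanode.address) and 9867 for IPC (dfs.datanode.ipc.address). Either check datanode:9864 instead, or drop the DataNode check altogether, because once the NameNode is healthy and the DataNode has registered with it, HDFS is usable.
To drop the DataNode check, change the hive-metastore environment variable: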
    hive-metastore:
      # ... keep the rest of the configuration
      environment:
        SERVICE_PRECONDITION: "namenode:9870 hive-metastore-postgresql:5432"
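To see why a probe against a port nobody listens on never succeeds, here is a minimal Python sketch of a TCP reachability check with retries. It is an illustration only, assuming the precondition wait behaves like a plain connect attempt; the function names port_open and wait_for are hypothetical and are not part of the bde2020 images.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


def wait_for(host: str, port: int, retries: int = 5, delay: float = 5.0) -> bool:
    """Probe host:port up to `retries` times, sleeping `delay` seconds between tries."""
    for _ in range(retries):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

Run from a container on the same Compose network, wait_for("datanode", 9864) should return True once the DataNode web server is up, while wait_for("datanode", 9871) keeps failing for as long as nothing listens on that port.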
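2. Confirm that the DataNode has registered with the NameNode
Open the NameNode web UI (http://localhost:9870) and go to the Datanodes page to see whether your DataNode is listed. If it has not registered, look for the reason in the DataNode log: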
docker logs datanode
You have already set HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check=false, which avoids the usual hostname-resolution pitfall, so registration is most likely fine.
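If you would rather check registration from a script than from the web UI, the NameNode also serves its metrics as JSON under /jmx. The sketch below reads the live DataNode count from that output. It assumes the FSNamesystemState bean exposes a NumLiveDataNodes attribute, which matches my understanding of Hadoop 3.x but is worth verifying against your own NameNode; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed URL: NameNode web port published on localhost, as in the compose file.
JMX_URL = (
    "http://localhost:9870/jmx"
    "?qry=Hadoop:service=NameNode,name=FSNamesystemState"
)


def live_datanodes(jmx_payload: str) -> int:
    """Extract NumLiveDataNodes from a NameNode /jmx JSON document (0 if absent)."""
    beans = json.loads(jmx_payload).get("beans", [])
    for bean in beans:
        if "NumLiveDataNodes" in bean:
            return int(bean["NumLiveDataNodes"])
    return 0


def fetch_live_datanodes(url: str = JMX_URL, timeout: float = 5.0) -> int:
    """Download the JMX document from the NameNode and return the live count."""
    with urlopen(url, timeout=timeout) as resp:
        return live_datanodes(resp.read().decode("utf-8"))
```

Calling fetch_live_datanodes() on the Docker host should return 1 for this single-DataNode setup once registration has completed, and 0 while the DataNode is still missing.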
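3. Add a more reliable health check (optional but recommended)
Add a health check to the DataNode that probes its own web UI. Querying the NameNode's /jmx endpoint for a DataNode bean does not work for this, because the NameNode answers with HTTP 200 and an empty result whether or not a DataNode exists. Note that this check shows the DataNode process is serving, not that it has finished registering, so step 2 is still worth doing:
    datanode:
      # ... keep the rest of the configuration
      healthcheck:
        # Probe the DataNode's own web UI (default port 9864 in Hadoop 3.x)
        test: ["CMD", "curl", "-f", "http://localhost:9864/"]
        interval: 10s
        timeout: 5s
        retries: 5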
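Then make hive-metastore wait for the DataNode's health status. The short depends_on list only controls start order, so use the long form with condition: service_healthy (supported by the Compose Specification and by file format 2.1 and later 2.x versions, but not by the legacy 3.x format):
    hive-metastore:
      # ... keep the rest of the configuration
      depends_on:
        hive-metastore-postgresql:
          condition: service_started
        datanode:
          condition: service_healthy
      healthcheck:
        test: ["CMD", "nc", "-zv", "localhost", "9083"]
        interval: 10s
        timeout: 5s
        retries: 5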
4. Make sure the Hive Metastore's HDFS configuration is correct
Check your hive.env file and make sure its HDFS settings match the NameNode, for example:
CORE_CONF_fs_defaultFS=hdfs://namenode:9000
That way Hive connects to the HDFS cluster through the NameNode address and does not depend on the mistaken DataNode port check.
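When comparing hive.env against the NameNode settings, it helps to know which Hadoop property each variable ends up as. The following Python sketch mirrors the name translation that, as far as I recall, the bde2020 entrypoint script applies (triple underscore becomes a dash, double underscore becomes a literal underscore, single underscore becomes a dot). Treat the rule as an assumption and check it against the entrypoint in your image; env_to_property is a hypothetical helper.

```python
def env_to_property(name: str, prefix: str) -> str:
    """Translate e.g. CORE_CONF_fs_defaultFS (prefix CORE_CONF) to fs.defaultFS."""
    if not name.startswith(prefix + "_"):
        raise ValueError(f"{name!r} does not start with {prefix + '_'!r}")
    key = name[len(prefix) + 1:]
    # Order matters: handle the longest underscore runs first.
    key = key.replace("___", "-")   # ___ -> dash
    key = key.replace("__", "@")    # __  -> placeholder for a literal underscore
    key = key.replace("_", ".")     # _   -> dot
    return key.replace("@", "_")


if __name__ == "__main__":
    print(env_to_property("CORE_CONF_fs_defaultFS", "CORE_CONF"))
    print(env_to_property(
        "HDFS_CONF_dfs_namenode_datanode_registration_ip___hostname___check",
        "HDFS_CONF",
    ))
```

Under this rule the two variables above map to fs.defaultFS and dfs.namenode.datanode.registration.ip-hostname-check, which are the property names to look for in the generated core-site.xml and hdfs-site.xml.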
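Summary
The core problem is that the readiness probe targeted the wrong port: the DataNode does not listen on 9871 (its web UI is on 9864), so the precondition check never succeeded and the Hive Metastore kept waiting as if the DataNode had not started. Adjust SERVICE_PRECONDITION, then confirm that the DataNode is registered, and the problem should be resolved.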
This question originally comes from Stack Exchange; asked by leop.




