Help request: Hive Testbench TPC-H/TPC-DS data generation fails on a Hadoop cluster

阿华AIGC实验室

2026-5-15

Troubleshooting Hive Testbench Data Generation Failures on Hadoop 2.9.0 + Hive 2.3.0 + Tez 0.9.0

Let's break down the issues you're facing and walk through actionable troubleshooting steps:

Key Observations from Your Logs

Both the TPC-H and TPC-DS MapReduce/Tez jobs report "completed successfully" but show 0 counters. That is highly abnormal: it means no data was actually processed or written.
The target HDFS temp directories remain empty despite job completion.
TPC-DS fails during the table optimization step for store_sales and store_returns, which is a downstream effect of missing raw data.
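
To confirm that the target directories really are empty, hdfs dfs -count prints the directory count, file count, content size, and path for each argument. The snippet below is a minimal sketch that flags paths containing no files. report_empty is a hypothetical helper name, and the paths in the usage line are the ones assumed elsewhere in this answer.

```shell
# Sketch only: report_empty is a hypothetical helper, not part of hive-testbench.
# Input (stdin): lines from `hdfs dfs -count`, whose columns are
#   DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
report_empty() {
  awk '{
    if ($2 == 0) print "EMPTY " $4
    else         print "OK " $4 " files=" $2 " bytes=" $3
  }'
}

# Usage on the cluster (paths assumed from this thread):
#   hdfs dfs -count /tmp/tpch-generate/10 /tmp/tpcds-generate/10 | report_empty
```

A path reported as EMPTY after a "successful" job confirms that the generator wrote its output somewhere else or wrote nothing at all.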

Step 1: Verify Temp Directory Paths (Local vs HDFS)

The most common root cause here is a mismatch between the script's expected output location and where the data is actually being written:

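Open tpch-setup.sh and tpcds-setup.sh and find the variable that holds the generation directory (called GENERATE_DIR here; the name may differ in your copy of the scripts).
- If it resolves to a local path such as /tmp/tpch-generate on a node's disk, that's a problem: MapReduce/Tez tasks run on cluster nodes, so locally written data is neither shared across the cluster nor visible in your HDFS temp directory. A path without a scheme is resolved against fs.defaultFS, so this happens when the job's configuration falls back to file:///.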
- Update it to an HDFS path, e.g.:
```
GENERATE_DIR="hdfs://192.168.10.15:8020/tmp/tpch-generate"
```
Check if local temp directories on your cluster nodes have any generated data:
```
ssh <cluster-node> ls /tmp/tpch-generate/10/
```
If you see files here, the data is being generated on local disk and never reaches HDFS. Check whether the setup scripts contain an upload step (for example, hdfs dfs -put) and whether it runs without errors.
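
Whether a given path ends up on local disk or in HDFS depends on its scheme and on fs.defaultFS. The following is a rough sketch of that resolution for absolute paths. qualify_path is a hypothetical helper written for illustration, and it does not handle relative paths, which Hadoop resolves against the user's home directory.

```shell
# Sketch only: qualify_path is a hypothetical helper, not part of Hadoop.
# Prints the fully qualified form of an absolute path, given fs.defaultFS.
qualify_path() {
  local path="$1" default_fs="$2"
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                       # scheme already present
    /*)    printf '%s%s\n' "${default_fs%/}" "$path" ;;   # resolved against fs.defaultFS
    *)     printf '%s\n' "$path" ;;                       # relative: left unchanged here
  esac
}

# Usage on a cluster node (hdfs getconf reads the client configuration):
#   qualify_path /tmp/tpch-generate "$(hdfs getconf -confKey fs.defaultFS)"
```

If the result starts with file://, the client configuration used by the job is not pointing at the NameNode, and that is worth fixing before rerunning the generator.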

Step 2: Fix HDFS Permissions

Your user (rapids) needs full read/write access to the temp directories:

Create and set permissions for the TPC-H temp directory:

hdfs dfs -mkdir -p /tmp/tpch-generate/10
hdfs dfs -chmod -R 777 /tmp/tpch-generate

Repeat the same for TPC-DS:

hdfs dfs -mkdir -p /tmp/tpcds-generate/10
hdfs dfs -chmod -R 777 /tmp/tpcds-generate

Test your user's HDFS access:
```
hdfs dfs -touchz /tmp/test_rapids && hdfs dfs -rm /tmp/test_rapids
```
If this fails, you need to adjust HDFS ACLs or add your user to the appropriate group.
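
Before changing permissions, it can help to read the current mode and owner from hdfs dfs -ls -d. The sketch below interprets one such line for a given user. can_write is a hypothetical helper; it ignores group membership, extended ACLs, and the HDFS superuser, so treat a "maybe" as a prompt to check the user's groups.

```shell
# Sketch only: can_write is a hypothetical helper (requires bash).
# Input: a user name and one line of `hdfs dfs -ls -d <dir>` output, whose
# columns are: permissions, replication, owner, group, size, date, time, path.
can_write() {
  local user="$1" line="$2" perms owner g o
  perms="$(printf '%s\n' "$line" | awk '{print $1}')"
  owner="$(printf '%s\n' "$line" | awk '{print $3}')"
  g="${perms:5:1}"   # group write bit
  o="${perms:8:1}"   # other write bit
  if [ "$owner" = "$user" ]; then
    # For the owner, only the owner bits apply.
    if [ "${perms:2:1}" = "w" ]; then echo yes; else echo no; fi
  elif [ "$g" = "w" ] && [ "$o" = "w" ]; then
    echo yes
  elif [ "$g" = "w" ] || [ "$o" = "w" ]; then
    echo maybe     # depends on whether the user is in the directory's group
  else
    echo no
  fi
}

# Usage on the cluster:
#   can_write rapids "$(hdfs dfs -ls -d /tmp/tpch-generate)"
```

A "no" here means the chmod or an ownership change above is required before the generator can write anything.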

Step 3: Inspect Map Task Logs for Hidden Errors

Even though the job says "completed successfully", individual map tasks might have silently failed or skipped data processing. Here's how to check:

Open the YARN WebUI at http://boray05:8088 (from your logs).
Locate the relevant job (e.g., job_1514226810133_0050 for TPC-H).
Click into the job, then view the logs for each map task. Look for:
- Errors related to input file not found
- Permission denied when writing output
- Issues with Tez container initialization
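
The same check can be done from the command line, assuming YARN log aggregation is enabled on your cluster. A MapReduce job ID maps to its YARN application ID by swapping the prefix, and yarn logs -applicationId prints the aggregated container logs. The helper names below are hypothetical, and the grep patterns are only a starting point.

```shell
# Sketch only: both helpers are hypothetical (requires bash).

# Convert a MapReduce job ID to the matching YARN application ID.
job_to_app_id() {
  printf '%s\n' "${1/#job_/application_}"
}

# Read log text on stdin and print numbered lines that match common failure signatures.
scan_task_log() {
  grep -n -E 'FileNotFoundException|No such file|Permission denied|AccessControlException' || true
}

# Usage on the cluster:
#   yarn logs -applicationId "$(job_to_app_id job_1514226810133_0050)" | scan_task_log
```

No output from scan_task_log does not prove the tasks are healthy; it only means none of these particular patterns appeared.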

Step 4: Validate Tez Configuration Compatibility

Your Tez setup looks mostly correct, but let's confirm a few critical points:

Ensure Hive is configured to use Tez:
Check hive-site.xml for this property, or run:
```
hive -e "set hive.execution.engine;"
```
It should print hive.execution.engine=tez.
Verify the Tez lib path exists in HDFS:
```
hdfs dfs -ls hdfs://192.168.10.15:8020/apps/tez/
```
You should see Tez jar files here. If not, re-deploy Tez to HDFS.
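Resolve SLF4J binding conflicts (optional, but it makes logging predictable):
Your logs show SLF4J: Class path contains multiple SLF4J bindings. SLF4J picks one binding at startup, so log output can end up somewhere you don't expect. Keep a single binding jar on the classpath (for example, either slf4j-log4j12-*.jar or log4j-slf4j-impl-*.jar) and remove the others from the Hive/Tez/Hadoop lib directories. Leave slf4j-api-*.jar in place: it is the API, not a binding.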
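
To see which binding jars are actually present, a find over the lib directories is usually enough. This is a minimal sketch: list_slf4j_bindings is a hypothetical helper, the file name patterns cover the common SLF4J bindings only, and the directories in the usage line are assumptions about your installation layout.

```shell
# Sketch only: list_slf4j_bindings is a hypothetical helper.
# Lists SLF4J binding jars under the given directories; slf4j-api is deliberately excluded.
list_slf4j_bindings() {
  find "$@" -type f \( \
      -name 'slf4j-log4j12-*.jar' -o \
      -name 'log4j-slf4j-impl-*.jar' -o \
      -name 'slf4j-simple-*.jar' -o \
      -name 'slf4j-jdk14-*.jar' -o \
      -name 'logback-classic-*.jar' \
    \) 2>/dev/null | sort
}

# Usage (directory layout is an assumption):
#   list_slf4j_bindings "$HIVE_HOME/lib" "$HADOOP_HOME/share/hadoop" "$TEZ_HOME"
```

More than one line of output corresponds to the "multiple SLF4J bindings" warning in your logs.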

Step 5: Check Version Compatibility

Your stack should be compatible (Hive 2.3.0 and Tez 0.9.0 both run on Hadoop 2.9.0), but double-check:

- Tez 0.9.0 requires Hadoop 2.7 or later (you're on 2.9.0, which is fine)
- Hive 2.3.0's Tez integration is stable with Tez 0.9.0
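
If you want to script the Hadoop version check, a version-aware comparison avoids mistakes such as treating 2.10 as older than 2.7. This is a small sketch that assumes GNU sort (for the -V option); version_ge is a hypothetical helper name.

```shell
# Sketch only: version_ge is a hypothetical helper; needs `sort -V` (GNU coreutils).
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a cluster node (the first line of `hadoop version` is "Hadoop <version>"):
#   installed="$(hadoop version | head -n 1 | awk '{print $2}')"
#   version_ge "$installed" 2.7.0 && echo "Hadoop $installed is new enough for Tez 0.9.0"
```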

If you've gone through all these steps and still have issues, share the map task logs from YARN. They will show the specific error that the job's "success" status is hiding.

The question comes from Stack Exchange; it was asked by robert.