Spark job submission fails: help with an error running the spark-solr Twitter example

阿华AIGC实验室

2026-5-26

Let's break down the possible issues and troubleshoot your failed spark-solr Twitter job submission step by step:

1. Mismatched spark-solr Jar Version

You mentioned trying to use spark-solr-3.4.4-shaded.jar, but your spark-submit command references ./target/spark-solr-3.1.1-shaded.jar, so the version numbers don't match. A mismatch like this is a common source of class-loading errors and compatibility issues.

Fix the command: Ensure the jar path and filename exactly match the version you intend to use. For example, if the 3.4.4 jar is in your current directory, update the path to ./spark-solr-3.4.4-shaded.jar, or confirm the jar in the target directory is indeed the 3.4.4 version.
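As a quick pre-flight check, the version in the jar path can be compared with the version you intend to run. This is only a minimal sketch: the helper names jar_version and check_jar are made up for illustration, and it assumes the jar follows the spark-solr-<version>-shaded.jar naming seen in your command.

```python
import re
from pathlib import Path

# Matches names like spark-solr-3.4.4-shaded.jar and captures the version part.
JAR_PATTERN = re.compile(r"spark-solr-(\d+(?:\.\d+)*)-shaded\.jar$")


def jar_version(jar_path):
    """Return the version embedded in a spark-solr shaded jar name, or None."""
    match = JAR_PATTERN.search(Path(jar_path).name)
    return match.group(1) if match else None


def check_jar(jar_path, expected_version):
    """Return a list of problems with the jar path (empty list means it looks fine)."""
    problems = []
    if not Path(jar_path).is_file():
        problems.append(f"jar not found: {jar_path}")
    found = jar_version(jar_path)
    if found != expected_version:
        problems.append(f"expected version {expected_version}, path says {found}")
    return problems
```

Calling check_jar("./target/spark-solr-3.1.1-shaded.jar", "3.4.4") would report the mismatch before spark-submit is ever started.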

2. Increase Log Verbosity to See Full Error Details

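Right now you only get truncated INFO Con... logs, so the actual failure is not visible. Adjust the logging configuration to capture what's actually failing:

Point the driver at a custom log4j.properties file by adding a parameter to your spark-submit command, like:

--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties"

Here /path/to/log4j.properties is a placeholder for wherever you keep the file. In that file, set the root logger to DEBUG to see more detail; a level such as ERROR would hide messages rather than reveal them. The option above applies to Spark builds that use log4j 1.x; Spark 3.3 and later use Log4j 2, where the system property is log4j.configurationFile instead. With the extra output you should see the full error stack trace, letting you pinpoint whether it's an auth issue, connection problem, or missing dependency.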
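A custom log4j.properties file could look roughly like the fragment below. It is a sketch modeled on the template Spark ships in its conf directory and assumes a Spark build that still uses log4j 1.x (releases before 3.3); the file location and the choice of DEBUG are assumptions you should adapt. The driver picks it up when -Dlog4j.configuration=file:<path to this file> is passed through spark.driver.extraJavaOptions.

```properties
# Send everything at DEBUG and above to the console (assumed level; WARN is quieter).
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

DEBUG output is very noisy, so it may be easier to redirect the driver's stderr to a file and search it for the first exception.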

3. Validate Twitter4J OAuth Credentials

You've included placeholders ? for your Twitter OAuth keys—these need to be replaced with valid, working credentials from the Twitter Developer Portal:

Double-check that consumerKey, consumerSecret, accessToken, and accessTokenSecret are all correctly copied (no typos, extra spaces, or missing characters).
Test the credentials independently with a simple Twitter4J test script to confirm they can successfully connect to the Twitter API, ruling out authentication as the root cause.
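Before testing against the Twitter API, a purely local check can catch leftover placeholders and stray whitespace. The sketch below does not contact Twitter at all; the function name credential_problems and the dictionary layout are assumptions made for illustration, using the four key names from your command.

```python
# The four OAuth settings the job expects.
REQUIRED_KEYS = ("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")


def credential_problems(creds):
    """Return human-readable problems found in a dict of OAuth credentials."""
    problems = []
    for key in REQUIRED_KEYS:
        value = creds.get(key)
        if value is None or value == "":
            problems.append(f"{key} is missing")
        elif value.strip("?") == "":
            # The value consists only of '?' characters, i.e. the placeholder.
            problems.append(f"{key} is still a placeholder")
        elif any(ch.isspace() for ch in value):
            problems.append(f"{key} contains whitespace")
    return problems
```

An empty result only means the values look well formed; whether Twitter accepts them still has to be confirmed with a real API call.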

4. Verify Solr Cluster and Collection Status

Confirm the zkHost localhost:9983 is the correct ZooKeeper address for your Solr cluster, and that the ZooKeeper service is running and reachable from your Spark node (no firewall or network blocks).
Check if the socialdata collection exists in Solr: use the Solr Admin UI (default at http://localhost:8983/solr) or query the Collections API at http://localhost:8983/solr/admin/collections?action=LIST to verify. If the collection doesn't exist, create it before running the job, for example with bin/solr create -c socialdata.
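Both checks can be scripted from the machine that runs the Spark driver. The following is a minimal sketch, assuming Solr answers at http://localhost:8983/solr and that the Collections API LIST action returns JSON with a "collections" array; the function names are made up for illustration.

```python
import json
import socket
from urllib.request import urlopen


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collection_in_response(response_text, name):
    """Check a Collections API LIST response (JSON text) for a collection name."""
    data = json.loads(response_text)
    return name in data.get("collections", [])


def collection_exists(solr_url, name, timeout=5.0):
    """Ask Solr for its collections and report whether `name` is among them."""
    url = f"{solr_url}/admin/collections?action=LIST&wt=json"
    with urlopen(url, timeout=timeout) as resp:
        return collection_in_response(resp.read().decode("utf-8"), name)
```

For the setup in the question, port_open("localhost", 9983) covers the ZooKeeper address and collection_exists("http://localhost:8983/solr", "socialdata") covers the collection. A successful TCP connection only shows the port is reachable, not that ZooKeeper is healthy.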

5. Check Spark and spark-solr Compatibility

Make sure your spark-solr version (3.4.4) is compatible with your installed Spark version. spark-solr version numbers do not track Spark version numbers: the spark-solr 3.x releases were built against Spark 2.x, so running one on a different Spark major version, or on a 2.x release far from the one it was built for, can cause compatibility conflicts.

Cross-reference the version compatibility table in the spark-solr README to confirm the release matches your Spark runtime, or switch to a spark-solr version built for your Spark release.
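To find out which Spark version is actually installed, the banner printed by spark-submit --version can be parsed. This is a sketch under a few assumptions: spark-submit is on the PATH, the banner contains a "version X.Y.Z" token, and the function names are illustrative. Both output streams are scanned because the banner may be written to stderr.

```python
import re
import subprocess

# Skip the "Using Scala version ..." line so only Spark's own version is matched.
VERSION_PATTERN = re.compile(r"(?<!Scala )version\s+(\d+)\.(\d+)\.(\d+)")


def parse_spark_version(banner):
    """Extract (major, minor, patch) from spark-submit --version output, or None."""
    match = VERSION_PATTERN.search(banner)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_spark_version():
    """Run spark-submit --version and parse the result (requires Spark on PATH)."""
    result = subprocess.run(
        ["spark-submit", "--version"], capture_output=True, text=True
    )
    return parse_spark_version(result.stdout + result.stderr)
```

The resulting tuple can then be compared by hand with the Spark version listed for your spark-solr release in the project's documentation.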

This question comes from Stack Exchange; it was asked by Jill Clover.