Spark job submission fails: help with an error running the spark-solr Twitter example
Let's break down the possible issues and troubleshoot your failed spark-solr Twitter job submission step by step:
1. Mismatched spark-solr Jar Version
You mentioned trying to use spark-solr-3.4.4-shaded.jar, but your spark-submit command references ./target/spark-solr-3.1.1-shaded.jar, so the version numbers don't match. A mismatch like this is a common source of class-loading errors and compatibility problems.
- Fix the command: Ensure the jar path and filename exactly match the version you intend to use. For example, if the 3.4.4 jar is in your current directory, update the path to ./spark-solr-3.4.4-shaded.jar, or confirm the jar in the target directory is indeed the 3.4.4 version.
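A quick way to catch this kind of mismatch before submitting is to read the version out of the jar filename and compare it with the one you intend to run. The sketch below assumes the jar follows the spark-solr-<version>-shaded.jar naming seen in the question; the helper names jar_version and check_jar are made up for illustration and are not part of spark-solr.

```python
import re
from pathlib import Path

# Matches shaded jar names such as spark-solr-3.4.4-shaded.jar
# (naming convention assumed from the filenames in the question).
JAR_PATTERN = re.compile(r"spark-solr-(\d+(?:\.\d+)*)-shaded\.jar$")


def jar_version(jar_path):
    """Return the version embedded in the jar filename, or None if it does not match."""
    match = JAR_PATTERN.search(Path(jar_path).name)
    return match.group(1) if match else None


def check_jar(jar_path, expected_version):
    """Return a list of problems with the jar path; an empty list means it looks fine."""
    problems = []
    version = jar_version(jar_path)
    if version is None:
        problems.append(f"{jar_path}: not a spark-solr shaded jar name")
    elif version != expected_version:
        problems.append(f"{jar_path}: version {version}, expected {expected_version}")
    if not Path(jar_path).is_file():
        problems.append(f"{jar_path}: file not found")
    return problems
```

Calling check_jar("./target/spark-solr-3.1.1-shaded.jar", "3.4.4") reports the version mismatch, and also reports a missing file if the path does not exist on disk.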
2. Increase Log Verbosity to See Full Error Details
Right now you only get truncated INFO Con... logs, which means the default INFO-level logging isn't showing the critical error details. Adjust the logging configuration to capture what's actually failing:
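- Point the driver at a custom log4j.properties file from your spark-submit command, like: --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/log4j.properties" (the path is a placeholder). Log4j does not read the root logger level from a -Dlog4j.rootLogger system property, so the level has to be set in the file. On Spark 3.3 and later, which use Log4j 2, the property is -Dlog4j.configurationFile instead.
- In that file, set the root logger to DEBUG to get more detail, or to WARN or ERROR to cut the INFO noise so the exception stands out. Either way the full error stack trace should become visible, letting you pinpoint whether it's an auth issue, connection problem, or missing dependency.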
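As a sketch of the custom log4j.properties route, the snippet below writes a minimal Log4j 1.x configuration and builds the matching spark-submit arguments. It assumes a Spark release that still uses Log4j 1.x (before 3.3); the function names and the output path are hypothetical.

```python
from pathlib import Path

# Minimal Log4j 1.x configuration: root logger at the chosen level, console output on stderr.
# Braces are doubled so that str.format leaves the Log4j pattern intact.
LOG4J_TEMPLATE = """\
log4j.rootLogger={level}, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{{yy/MM/dd HH:mm:ss}} %p %c{{1}}: %m%n
"""


def write_log4j_config(path, level="DEBUG"):
    """Write the properties file and return its absolute path."""
    path = Path(path)
    path.write_text(LOG4J_TEMPLATE.format(level=level))
    return path.resolve()


def driver_logging_conf(config_path):
    """Build the --conf arguments that point the driver JVM at the config file."""
    uri = Path(config_path).resolve().as_uri()
    return ["--conf", f"spark.driver.extraJavaOptions=-Dlog4j.configuration={uri}"]
```

If your command already sets spark.driver.extraJavaOptions (for example to pass the Twitter4J keys), append the -Dlog4j.configuration flag to that existing value; a second --conf with the same key would replace the first rather than merge with it.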
3. Validate Twitter4J OAuth Credentials
You've shown ? as placeholders for your Twitter OAuth keys. In the command you actually run, these must be valid, working credentials from the Twitter Developer Portal:
- Double-check that consumerKey, consumerSecret, accessToken, and accessTokenSecret are all correctly copied (no typos, extra spaces, or missing characters).
- Test the credentials independently with a simple Twitter4J test script to confirm they can successfully connect to the Twitter API, ruling out authentication as the root cause.
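Before testing against the API, a local sanity check can catch the copy-paste mistakes listed above. This sketch only inspects the strings and does not contact Twitter; the function name credential_problems is hypothetical, and the four key names are the ones from the question.

```python
REQUIRED_KEYS = ("consumerKey", "consumerSecret", "accessToken", "accessTokenSecret")


def credential_problems(creds):
    """Return a list of obvious problems in a dict of OAuth values; empty means none found."""
    problems = []
    for key in REQUIRED_KEYS:
        value = creds.get(key)
        if value is None or value == "":
            problems.append(f"{key}: missing")
        elif value.strip() == "?":
            # The literal placeholder from the example command was never replaced.
            problems.append(f"{key}: still the ? placeholder")
        elif any(ch.isspace() for ch in value):
            # Spaces, tabs, or newlines usually come from a sloppy copy-paste.
            problems.append(f"{key}: contains whitespace")
    return problems
```

An empty result does not prove the keys are valid; it only rules out the mechanical errors, so an actual API call is still the real test.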
4. Verify Solr Cluster and Collection Status
- Confirm that zkHost localhost:9983 is the correct ZooKeeper address for your Solr cluster, and that the ZooKeeper service is running and reachable from your Spark node (no firewall or network blocks).
- Check that the socialdata collection exists in Solr: use the Solr Admin UI (default at http://localhost:8983/solr) or query the Collections API at http://localhost:8983/solr/admin/collections?action=LIST. If the collection doesn't exist, create it before running the job, for example with bin/solr create -c socialdata.
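Both checks can also be scripted. The sketch below tests whether the ZooKeeper port accepts TCP connections and asks Solr's Collections API (action=LIST) for the collection names. The host, ports, and collection name are the ones from the question; the function names are made up for illustration.

```python
import json
import socket
from urllib.request import urlopen


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def parse_collections(list_response_body):
    """Extract collection names from a Collections API LIST response (JSON text)."""
    return json.loads(list_response_body).get("collections", [])


def fetch_collections(solr_url="http://localhost:8983/solr", timeout=5.0):
    """Query a running Solr node for its collection names."""
    url = f"{solr_url}/admin/collections?action=LIST&wt=json"
    with urlopen(url, timeout=timeout) as resp:
        return parse_collections(resp.read().decode("utf-8"))
```

With Solr running, port_reachable("localhost", 9983) and "socialdata" in fetch_collections() should both be true. An open port only shows that something is listening there, not that ZooKeeper is healthy, so treat the first check as a quick filter.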
5. Check Spark and spark-solr Compatibility
Make sure your spark-solr version (3.4.4) is compatible with your installed Spark version. spark-solr version numbers do not track Spark's, so 3.4.4 does not imply Spark 3.4. Each spark-solr release is built against a specific Spark and Solr version, and running it on a different Spark major version tends to fail with missing-class or missing-method errors.
- Cross-reference the spark-solr version documentation to confirm it matches your Spark runtime, or switch to a spark-solr version built for your Spark release.
This question comes from Stack Exchange; asked by Jill Clover.




