
Help request: application identification error when fetching metrics such as scan input records through the Spark UI REST API

Troubleshooting Spark REST API /sql Endpoint Error & Getting Your Target Metrics

Hey there, let's work through this issue you're having with Spark's REST API. I've dealt with similar Spark API quirks before, so here are some actionable steps to fix the error and get the metrics you need:

1. Verify if Your Application Has Executed SQL Queries

The /api/v1/applications/[app-id]/sql endpoint only has data to return if your Spark application has actually run SQL queries or DataFrame/Dataset operations, since those are what go through Spark's SQL engine. If your job is purely RDD-based with no SQL/DataFrame logic, there is no SQL execution data to report, so depending on your version you may get an empty result or an error rather than the metrics you expect.

Since your /stages endpoint works, double-check if your application includes any SQL statements, DataFrame.createOrReplaceTempView() calls, or DataFrame transformations that would trigger Spark's SQL engine.

2. Double-Check Spark Version Compatibility

Spark's REST API endpoints have evolved over versions. Make sure the version of Spark you're running matches the documentation you're referencing. For example:

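  • Older Spark versions (before 3.1.0, as far as I know) don't expose the /sql endpoint in the REST API at all. On those versions the SQL tab exists only in the web UI, so a request to /sql cannot succeed even though /stages and /jobs work.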
  • Some minor versions have had bugs with the /sql endpoint that were fixed in later releases.

You can confirm your Spark version by running:

spark-submit --version
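
If you'd rather check from the same place you call the API, the REST API also has a /api/v1/version endpoint that returns the Spark version as JSON. Here's a minimal sketch of that check. The helper names and the (3, 1) threshold are my own assumptions (the threshold reflects my understanding of when /sql was added), so adjust them if your docs say otherwise:

```python
import json
import re
from urllib.request import urlopen


def parse_spark_version(version_string):
    """Return (major, minor) from a string such as '3.3.1' or '3.1.0-SNAPSHOT'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def supports_sql_endpoint(version_string, minimum=(3, 1)):
    """True if the version is at least `minimum`.

    The default (3, 1) is an assumption about when /sql first appeared.
    """
    return parse_spark_version(version_string) >= minimum


def fetch_spark_version(base_url):
    """Read the version from /api/v1/version on a running UI or history server."""
    with urlopen(f"{base_url}/api/v1/version") as response:
        return json.load(response)["spark"]
```

Typical usage would be supports_sql_endpoint(fetch_spark_version("http://localhost:4040")) for a running application, or port 18080 for a history server with default settings.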

3. Alternative: Get Your Target Metrics from the /stages Endpoint

Since you can successfully call the /stages endpoint, you can extract the exact metrics you need from there without relying on the /sql endpoint:

  • Scan input records: Look for the inputRecords field on each stage object. This counts the records the stage read from data sources; inputBytes is the matching byte count.
  • Shuffle read size: Look for shuffleReadBytes on each stage object. This gives the total bytes read during shuffle operations.

In the Spark versions I've worked with, these counters sit directly on the stage object rather than inside a nested metrics object. Here's a trimmed-down snippet of what the relevant part of a stage response might look like:

{
  "stageId": 0,
  "inputRecords": 100000,
  "shuffleReadBytes": 52428800
}
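
To get application-wide totals you'd add these fields up across stages. Here's a minimal sketch, assuming the counters are top-level fields of each stage object and that each stage has a status field with values like "COMPLETE" or "SKIPPED". The function names and sample data are made up for illustration:

```python
import json
from urllib.request import urlopen


def fetch_stages(base_url, app_id):
    """GET /api/v1/applications/[app-id]/stages and return the parsed list."""
    url = f"{base_url}/api/v1/applications/{app_id}/stages"
    with urlopen(url) as response:
        return json.load(response)


def total_stage_metrics(stages, status="COMPLETE"):
    """Sum inputRecords and shuffleReadBytes over stages with the given status.

    Pass status=None to include every stage regardless of status.
    """
    totals = {"inputRecords": 0, "shuffleReadBytes": 0}
    for stage in stages:
        if status is not None and stage.get("status") != status:
            continue
        for key in totals:
            # Missing fields are treated as zero rather than raising.
            totals[key] += stage.get(key, 0)
    return totals


# Illustrative data only, not real output from a cluster.
sample_stages = [
    {"stageId": 0, "status": "COMPLETE", "inputRecords": 100000, "shuffleReadBytes": 0},
    {"stageId": 1, "status": "COMPLETE", "inputRecords": 0, "shuffleReadBytes": 52428800},
    {"stageId": 2, "status": "SKIPPED"},
]
print(total_stage_metrics(sample_stages))
```

Keep in mind that a retried stage shows up once per attempt, so summing everything can count the same input twice if your job had stage failures.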

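If you need to map these stage metrics to specific SQL queries, note that stage objects don't carry a SQL execution ID directly. The /api/v1/applications/[app-id]/jobs endpoint lists the stageIds of each job (plus its jobGroup, if one was set), and the description field on jobs and stages often contains the query text or call site, which helps you work out which query a stage belongs to.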
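
One way to get closer to per-query numbers without /sql is to roll stage metrics up to the job level using the /jobs response. This is only a sketch: it assumes each job object has jobId and stageIds fields and each stage object has stageId plus the top-level counters, and metrics_by_job is a hypothetical helper name of mine:

```python
def metrics_by_job(jobs, stages):
    """Map each jobId to the summed inputRecords / shuffleReadBytes of its stages."""
    # A stage may appear once per attempt, so accumulate by stageId first.
    per_stage = {}
    for stage in stages:
        entry = per_stage.setdefault(
            stage["stageId"], {"inputRecords": 0, "shuffleReadBytes": 0}
        )
        entry["inputRecords"] += stage.get("inputRecords", 0)
        entry["shuffleReadBytes"] += stage.get("shuffleReadBytes", 0)

    result = {}
    for job in jobs:
        totals = {"inputRecords": 0, "shuffleReadBytes": 0}
        for stage_id in job.get("stageIds", []):
            # Stages missing from the /stages response contribute nothing.
            metrics = per_stage.get(stage_id, {})
            totals["inputRecords"] += metrics.get("inputRecords", 0)
            totals["shuffleReadBytes"] += metrics.get("shuffleReadBytes", 0)
        result[job["jobId"]] = totals
    return result


# Illustrative data only.
sample_jobs = [
    {"jobId": 0, "stageIds": [0, 1]},
    {"jobId": 1, "stageIds": [2]},
]
sample_job_stages = [
    {"stageId": 0, "inputRecords": 100000, "shuffleReadBytes": 0},
    {"stageId": 1, "inputRecords": 0, "shuffleReadBytes": 52428800},
    {"stageId": 2, "inputRecords": 500, "shuffleReadBytes": 0},
]
print(metrics_by_job(sample_jobs, sample_job_stages))
```

From there, the job description (or jobGroup, if your application sets one) is what you'd use to decide which jobs belong to which query.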

4. Validate Request Details

  • Ensure you're using a GET request (the /sql endpoint only supports GET).
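  • Confirm the [app-id] you're using is exactly the ID returned by /api/v1/applications (call this endpoint to list the applications the server knows about and verify the ID matches). On YARN in cluster mode, applications also have attempt IDs, and the path becomes /api/v1/applications/[app-id]/[attempt-id]/sql.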
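
Here's a small sketch of that ID check. It assumes each entry in the /api/v1/applications response has an id field and an attempts list, where attempts may carry an attemptId (as on YARN in cluster mode). The helper names are hypothetical:

```python
from urllib.parse import quote


def find_application(applications, app_id):
    """Return the application entry whose id matches exactly, or None."""
    for app in applications:
        if app.get("id") == app_id:
            return app
    return None


def build_endpoint_url(base_url, app_id, endpoint, attempt_id=None):
    """Build /api/v1/applications/[app-id]/[attempt-id]/<endpoint>.

    The attempt segment is only added when attempt_id is given.
    """
    parts = [base_url.rstrip("/"), "api/v1/applications", quote(app_id, safe="")]
    if attempt_id is not None:
        parts.append(quote(str(attempt_id), safe=""))
    parts.append(endpoint.strip("/"))
    return "/".join(parts)


# Illustrative data only; the IDs below are made up.
sample_apps = [
    {"id": "application_1_0001", "attempts": [{"attemptId": "1"}]},
    {"id": "local-1700000000000", "attempts": [{}]},
]
app = find_application(sample_apps, "application_1_0001")
attempt_id = app["attempts"][0].get("attemptId") if app else None
print(build_endpoint_url("http://localhost:18080", "application_1_0001", "sql", attempt_id))
```

If find_application returns None, the ID you're passing isn't one the server recognizes, which would explain an application-not-found style error regardless of which endpoint you call.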

The question in this post comes from Stack Exchange; it was asked by etiel.
