SQL Alternatives to STUFF and GROUP BY in AWS Athena, Plus a Multi-Table Query Question

阿华AIGC实验室

2026-5-20

AWS Athena SQL Solutions: String Aggregation & Retrieval Table Queries

Hey there! Let's break down your Athena SQL challenges step by step.

1. String concatenation without STUFF + FOR XML PATH

Athena's query engine is built on Presto/Trino, which doesn't support SQL Server's STUFF and FOR XML PATH idiom for string concatenation. The go-to alternative is array_agg() combined with array_join(). Here's how to apply it to your search table:

Suppose you want to combine all term values for the same id into a single comma-separated string:

SELECT 
  id,
  array_join(array_agg(term), ', ') AS combined_terms
FROM search
GROUP BY id;

array_agg(term): Aggregates all term values for each id into an array.
array_join(..., ', '): Converts the array into a human-readable string, using the separator of your choice (replace ', ' with any delimiter like ';' or '|').
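If your workgroup runs Athena engine version 3 (the Trino-based one), listagg() may be a more direct replacement, since it does the aggregation and the joining in one call. This is only a sketch under that assumption; on older engine versions, stick with array_agg() plus array_join():

```sql
-- Assumes Athena engine version 3; WITHIN GROUP sets the order of the joined terms.
SELECT
  id,
  listagg(term, ', ') WITHIN GROUP (ORDER BY term) AS combined_terms
FROM search
GROUP BY id;
```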

array_agg() on its own doesn't guarantee the order of the elements. If you need the terms sorted before concatenation, add an ORDER BY inside array_agg:

SELECT 
  id,
  array_join(array_agg(term ORDER BY term), ', ') AS sorted_combined_terms
FROM search
GROUP BY id;
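If the same term can show up more than once per id, you may also want to drop duplicates before joining. Here is a minimal sketch of that, assuming the search table only needs its id and term columns. The sample rows are made up for illustration and supplied inline with VALUES, so the query can be tried without touching a real table:

```sql
-- Hypothetical sample rows standing in for the search table.
WITH sample_search AS (
  SELECT *
  FROM (
    VALUES
      (1, 'banana'),
      (1, 'apple'),
      (1, 'banana'),  -- duplicate term for id 1
      (2, 'cherry')
  ) AS t (id, term)
)
SELECT
  id,
  -- array_distinct drops repeated terms, array_sort orders what is left
  array_join(array_sort(array_distinct(array_agg(term))), ', ') AS combined_terms
FROM sample_search
GROUP BY id
ORDER BY id;
```

With these sample rows, id 1 comes back as apple, banana and id 2 as cherry. To run it against your real data, swap sample_search for search and drop the CTE.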

2. ID-based queries on the retrieval table

Since you didn't specify the exact requirement for the retrieval table, here are common scenarios with example queries:

Scenario 1: Fetch all records for a specific ID

SELECT * 
FROM retrieval 
WHERE id = 'your-target-id'; -- Replace with your actual ID value

Scenario 2: Join with the `search` table to link terms and retrieval data

If you want to combine data from both tables based on matching id:

SELECT 
  s.id,
  s.term,
  r.* -- Adjust to select specific columns from retrieval instead of all
FROM search s
JOIN retrieval r ON s.id = r.id;

Use LEFT JOIN instead if you want to include search records even when there's no matching entry in retrieval.
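One thing to watch with this join: if both tables hold several rows per id, every search row pairs with every retrieval row for that id, so the result can grow quickly. One possible way around that, assuming you only need one row per id on the search side, is to aggregate the terms first and then LEFT JOIN to retrieval. The CTE name terms_per_id is just illustrative:

```sql
-- Collapse search to one row per id before joining.
WITH terms_per_id AS (
  SELECT
    id,
    array_join(array_agg(term ORDER BY term), ', ') AS combined_terms
  FROM search
  GROUP BY id
)
SELECT
  t.id,
  t.combined_terms,
  r.* -- Adjust to select specific columns from retrieval instead of all
FROM terms_per_id t
LEFT JOIN retrieval r ON t.id = r.id; -- keeps ids that have no retrieval rows
```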

Scenario 3: Aggregate statistics per ID

For example, count how many records exist per id in the retrieval table, along with the maximum value of a numeric column:

SELECT 
  id,
  COUNT(*) AS record_count,
  MAX(some_numeric_column) AS max_value -- Replace with your actual column
FROM retrieval
GROUP BY id
ORDER BY record_count DESC;
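If you only care about ids above some threshold, a HAVING clause filters on the aggregated value after grouping. A small sketch, with the threshold of 1 picked arbitrarily for illustration:

```sql
-- Keep only ids that appear more than once in retrieval.
SELECT
  id,
  COUNT(*) AS record_count
FROM retrieval
GROUP BY id
HAVING COUNT(*) > 1 -- arbitrary threshold; adjust as needed
ORDER BY record_count DESC;
```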

If you have a more specific requirement (like filtering by other columns in retrieval), feel free to share the details, but these should cover the most common use cases.

The question in this post comes from Stack Exchange and was asked by Achal.