如何编写Hive SQL实现列数据拆分并生成多条记录(不使用默认函数)
Got it, let's work through this problem step by step. You need to split both the colon-separated Phone column and comma-separated Address column, then pair each split value with the original Name to create individual rows—all without relying on Hive's standard split/explode functions. Here's a practical approach:
Step 1: Create a Number Sequence Helper
First, we need a way to target each split position (1st, 2nd, 3rd value) since your sample data has exactly 3 entries per column. We'll use a CTE to generate this sequence:
WITH number_seq AS ( SELECT 1 AS idx UNION ALL SELECT 2 AS idx UNION ALL SELECT 3 AS idx )
If your actual data has more than 3 split values, just add more SELECT n AS idx UNION ALL lines to cover the maximum number of splits you expect.
Step 2: Main Query to Extract Split Values
Next, we'll join our original table with this number sequence, then use basic string functions to extract the nth value from each delimited column:
WITH number_seq AS ( SELECT 1 AS idx UNION ALL SELECT 2 AS idx UNION ALL SELECT 3 AS idx ) SELECT original.Name, -- Extract the nth Phone value (colon-separated) TRIM( SUBSTRING( original.Phone, -- Start position: 1 for first value, after (idx-1)th colon for others CASE idx WHEN 1 THEN 1 ELSE INSTR(original.Phone, ':', 1, idx-1) + 1 END, -- Length of the substring: from start to next colon (or end of string) CASE idx WHEN 3 THEN LENGTH(original.Phone) - INSTR(original.Phone, ':', 1, 2) ELSE INSTR(original.Phone, ':', 1, idx) - CASE idx WHEN 1 THEN 0 ELSE INSTR(original.Phone, ':', 1, idx-1) END - 1 END ) ) AS Phone, -- Extract the nth Address value (comma-separated) TRIM( SUBSTRING( original.Address, -- Start position: 1 for first value, after (idx-1)th comma for others CASE idx WHEN 1 THEN 1 ELSE INSTR(original.Address, ',', 1, idx-1) + 1 END, -- Length of the substring: from start to next comma (or end of string) CASE idx WHEN 3 THEN LENGTH(original.Address) - INSTR(original.Address, ',', 1, 2) ELSE INSTR(original.Address, ',', 1, idx) - CASE idx WHEN 1 THEN 0 ELSE INSTR(original.Address, ',', 1, idx-1) END - 1 END ) ) AS Address FROM your_table_name original CROSS JOIN number_seq WHERE -- Filter out rows where the nth value doesn't exist (for variable-length splits) (idx = 1 OR INSTR(original.Phone, ':', 1, idx-1) > 0) AND (idx = 1 OR INSTR(original.Address, ',', 1, idx-1) > 0) ORDER BY original.Name, idx;
How This Works
- Number Sequence: The
number_seqCTE gives us an index for each split position, letting us process each value in the delimited columns one by one. - String Extraction:
INSTRfinds the position of the nth delimiter (colon for Phone, comma for Address).SUBSTRINGuses those positions to pull out the substring between the previous delimiter (or start of the string) and the next delimiter (or end of the string).TRIMcleans up any accidental whitespace (a safe guard for real-world data).
- Filtering: The
WHEREclause ensures we don't generate rows for split positions that don't exist in the original data (e.g., if a row only has 2 Phone numbers, we won't create a 3rd row for it).
Notes for Adaptation
- Replace
your_table_namewith the actual name of your Hive table containing the input data. - If your data has a variable number of splits (some rows have 2, some have 4, etc.), expand the
number_seqCTE to cover the maximum number of splits you'll encounter. TheWHEREclause will automatically exclude any unused indices for rows with fewer splits.
内容的提问来源于stack exchange,提问作者Manindar




