如何从Hive的JSON列中获取与源数据一致的原始字符串?
get_json_object() Question
I’m running this SQL query to extract a specific JSON key’s value:
select get_json_object(xkey,'$.note') from db_name.table_name where id=638112577;
The source JSON contains this note key value:
{....,"note": "\u2604\ufe0freality stone \n\n\u2604\ufe0freality stone \n\n \u2604\ufe0freality stone",....}
But the result returned by get_json_object() looks fragmented like this:
(':comet:reality stone ',) ('',) (':comet:reality stone ',) ('',) (' :comet:reality stone',)
Is it possible to get the exact, unmodified original string value that matches the note key in the source JSON?
Answer
Absolutely—you can get the exact original string value corresponding to the note key. The fragmented output you’re seeing is almost certainly a formatting issue with your SQL client, not a flaw in the get_json_object() function itself. Here’s how to fix it:
Check your SQL client’s display settings
Most SQL clients automatically split results on line breaks (\n) or truncate long strings for readability. Look for settings to disable auto-splitting, enable full string display, or adjust the maximum column width. Once you tweak these, you should see the full, unbroken string exactly as it exists in the source JSON::comet:reality stone :comet:reality stone :comet:reality stoneUse an alternative JSON function if needed
If you’re still having issues (depending on your SQL engine, like Hive), try usingjson_tupleinstead—it’s designed to return raw JSON values without unexpected formatting:SELECT json_tuple(xkey, 'note') AS full_note FROM db_name.table_name WHERE id=638112577;Verify with a wrapper function
To confirm you’re getting the exact value (including leading/trailing spaces and line breaks), wrap the result in quotes usingconcat. This makes whitespace easier to spot in your output:SELECT concat('"', get_json_object(xkey,'$.note'), '"') AS quoted_note FROM db_name.table_name WHERE id=638112577;
A quick note on Unicode characters
The \u2604\ufe0f in your source JSON is a Unicode escape sequence for the comet emoji (:comet:). get_json_object() correctly parses this into the actual emoji character, which is part of the original string value—this is intentional behavior, not a modification of the data.
内容的提问来源于stack exchange,提问作者Devesh Ahuja




