You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

Apache Hive字符串转时间戳及CSV时间字段转NULL问题求助

Hey there! Let's work through your Apache Hive timestamp conversion questions step by step:

1. Converting the string "2017-10-31 20:42:00" to a Hive Timestamp

Hive offers two straightforward ways to handle this, since your string uses the standard yyyy-MM-dd HH:mm:ss format that Hive recognizes by default:

  • Using CAST(): This is the simplest method for standard formats:
    SELECT CAST('2017-10-31 20:42:00' AS TIMESTAMP);
    
  • Using to_timestamp(): This function is more flexible if you ever need to handle non-standard formats (you can explicitly define the pattern). For your case, both versions work:
    -- Omit the format pattern (Hive auto-detects standard formats)
    SELECT to_timestamp('2017-10-31 20:42:00');
    
    -- Explicitly specify the format for clarity/consistency
    SELECT to_timestamp('2017-10-31 20:42:00', 'yyyy-MM-dd HH:mm:ss');
    
2. Fixing NULL values when converting CSV string columns to Timestamp

It sounds like the issue is likely related to how your CSV data was parsed during import (specifically those surrounding double quotes) or unhandled invalid data. Here's how to resolve it:

First, diagnose the root cause

If converting returns NULL, check if your string columns still contain double quotes (like "2017-10-23 11:07:00" instead of 2017-10-23 11:07:00). Hive can't parse quoted strings as timestamps directly. Run this query to verify:

SELECT start_time FROM your_table WHERE start_time LIKE '"%"';

Solutions to fix NULL values

  • Clean existing quoted strings before conversion
    If your columns have leftover quotes, strip them first using regexp_replace() before converting:

    SELECT 
      to_timestamp(regexp_replace(start_time, '"', ''), 'yyyy-MM-dd HH:mm:ss') AS start_time_ts,
      to_timestamp(regexp_replace(end_time, '"', ''), 'yyyy-MM-dd HH:mm:ss') AS end_time_ts,
      to_timestamp(regexp_replace(last_updated, '"', ''), 'yyyy-MM-dd HH:mm:ss') AS last_updated_ts
    FROM your_table;
    
  • Import CSV correctly with quote handling (for future imports)
    To avoid this issue entirely when importing, use Hive's OpenCSVSerde to automatically parse quoted values. Create your table like this before uploading the CSV:

    CREATE TABLE your_table (
      start_time STRING,
      end_time STRING,
      last_updated STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      "separatorChar" = ",",
      "quoteChar" = "\"",
      "escapeChar" = "\\"
    )
    STORED AS TEXTFILE;
    

    This will strip the double quotes during import, so your string columns will have clean timestamp values ready for conversion.

  • Create a new table with Timestamp types (permanent solution)
    If you want to permanently switch the columns to Timestamp, create a new table with the correct types and insert converted data:

    -- Create the new table with Timestamp columns
    CREATE TABLE your_table_clean (
      start_time TIMESTAMP,
      end_time TIMESTAMP,
      last_updated TIMESTAMP
    )
    STORED AS TEXTFILE; -- Match your original table's storage format (ORC, Parquet, etc.)
    
    -- Insert converted data from the old table
    INSERT INTO your_table_clean
    SELECT 
      to_timestamp(regexp_replace(start_time, '"', ''), 'yyyy-MM-dd HH:mm:ss'),
      to_timestamp(regexp_replace(end_time, '"', ''), 'yyyy-MM-dd HH:mm:ss'),
      to_timestamp(regexp_replace(last_updated, '"', ''), 'yyyy-MM-dd HH:mm:ss')
    FROM your_table;
    
  • Check for invalid timestamp values
    If you still get NULLs after cleaning quotes, check for invalid dates/times (like February 30th, or times like 25:00:00) with this query:

    SELECT start_time 
    FROM your_table 
    WHERE start_time NOT RLIKE '^"\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"$';
    

    Fix any invalid entries before converting.


内容的提问来源于stack exchange,提问作者Shantesh

火山引擎 最新活动