First time using the Logstash Grok filter: stuck extracting JSON fields from logs into Elasticsearch

Fixing the Logstash Grok Filter to Extract JSON Log Lines Containing path

Hey there! I see you're struggling to get your Logstash Grok filter working correctly for extracting specific JSON segments from your access.log. Let's break this down and build a config that meets your needs.

First, Let's Diagnose the Problem

Your current Grok pattern doesn't match the actual format of your logs:

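  • The timestamp in your logs isn't in TIMESTAMP_ISO8601 format (something like 2020-09-06T10:30:37). Yours looks like September 6th 2020, 10:30:37:759 am: a spelled-out month, an ordinal day suffix, a colon before the milliseconds, and a 12-hour clock.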
  • You need to specifically target log lines that contain a path field in their JSON payload, and parse those JSON fields into usable fields for Elasticsearch.
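To see the mismatch concretely, here's a tiny Python check. The regex is a simplified stand-in for grok's TIMESTAMP_ISO8601, not the real definition, and the sample line is hypothetical: only its timestamp format comes from the question.

```python
import re

# Simplified approximation of grok's TIMESTAMP_ISO8601: YYYY-MM-DD, then 'T' or a space, then HH:MM
ISO8601_LIKE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}")

# Hypothetical log line; only the timestamp format is taken from the question
sample = "September 6th 2020, 10:30:37:759 am [f47ac10b-58cc-4372-a567-0e02b2c3d479] info: server started"

iso_match = ISO8601_LIKE.search(sample)
print(iso_match)  # None: nothing in the line looks like an ISO 8601 timestamp
```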

Step-by-Step Solution

1. Build a Grok Pattern to Match the Log Prefix

First, we'll create a Grok pattern that correctly captures the timestamp, UUID, log level, and the raw message (which can be either plain text or JSON). Here's the custom pattern we'll use:

%{MONTH:log_month} %{MONTHDAY:log_day}(?:st|nd|rd|th) %{YEAR:log_year}, %{TIME:log_time} (?<log_ampm>[AaPp][Mm]) \[%{DATA:uuid}\] %{LOGLEVEL:log-level}: %{GREEDYDATA:raw_message}

Let's break this down:

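  • %{MONTH:log_month}: Captures the month name (e.g., September). MONTH is grok's built-in month pattern and accepts full or abbreviated names.
  • %{MONTHDAY:log_day}(?:st|nd|rd|th): Captures the day (e.g., 6) and matches its ordinal suffix (1st, 2nd, etc.) without capturing the suffix. The day has to be captured, otherwise it can't be used later to rebuild the timestamp.
  • %{YEAR:log_year}: Captures the year (e.g., 2020)
  • %{TIME:log_time}: Captures the time (e.g., 10:30:37:759). Grok's SECOND sub-pattern allows a fractional part after a colon, so the milliseconds are included.
  • (?<log_ampm>[AaPp][Mm]): Captures am/pm with an inline named capture, since the standard grok pattern set has no AM/PM pattern
  • \[%{DATA:uuid}\]: Captures the UUID inside square brackets
  • %{LOGLEVEL:log-level}: Captures the log level (e.g., info)
  • %{GREEDYDATA:raw_message}: Captures everything after the "level: " separator, whether it's plain text or JSON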
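If you'd like to sanity-check the pattern outside Logstash, here's a rough Python equivalent. The sub-patterns are simplified hand translations of the grok definitions (not the exact upstream regexes). The field log-level becomes log_level because Python group names can't contain hyphens. The day of month is captured as log_day so the timestamp can be rebuilt later. The sample line and its JSON payload are hypothetical.

```python
import re

# Simplified stand-ins for the grok sub-patterns (hand translations, not the upstream definitions)
MONTH = r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
YEAR = r"\d{4}"
# Like grok's SECOND, allow a fractional part after ':', '.' or ',', so 37:759 matches
TIME = r"(?:2[0-3]|[01]?[0-9]):[0-5][0-9]:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?"
LOGLEVEL = r"(?:trace|debug|info|warning|warn|error|fatal)"

PREFIX = re.compile(
    rf"(?P<log_month>{MONTH}) (?P<log_day>{MONTHDAY})(?:st|nd|rd|th) (?P<log_year>{YEAR}), "
    rf"(?P<log_time>{TIME}) (?P<log_ampm>[AaPp][Mm]) \[(?P<uuid>.*?)\] "  # DATA is lazy, like .*?
    rf"(?P<log_level>{LOGLEVEL}): (?P<raw_message>.*)"  # GREEDYDATA is .*
)

# Hypothetical sample line with a JSON payload
sample = (
    'September 6th 2020, 10:30:37:759 am [f47ac10b-58cc-4372-a567-0e02b2c3d479] info: '
    '{"path": "/api/users", "originalUrl": "/api/users?page=2", "responseCode": 200}'
)
fields = PREFIX.match(sample).groupdict()
print(fields["log_day"], fields["log_time"], fields["log_level"])  # 6 10:30:37:759 info
```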

2. Filter Only JSON Lines with path

We'll use a conditional to target only log lines where raw_message contains the "path" key. Then we'll use Logstash's json filter to parse the JSON into individual fields—this is way more reliable than trying to parse JSON with Grok.
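Here's that conditional-then-parse logic sketched in Python. It checks for a quoted "path" key followed by a colon rather than the bare word, so a plain-text line such as "path not found" never reaches the JSON parser. On a parse failure it mimics the json filter's default _jsonparsefailure tag. The sample lines are hypothetical.

```python
import json
import re

# Match a JSON key named "path", not just the word path anywhere in the text
HAS_PATH_KEY = re.compile(r'"path"\s*:')

def extract(raw_message: str) -> dict:
    """Parse JSON for lines with a "path" key; otherwise keep the raw text as-is."""
    if HAS_PATH_KEY.search(raw_message):
        try:
            return json.loads(raw_message)
        except json.JSONDecodeError:
            # Logstash's json filter keeps the source field and tags the event instead
            return {"raw_message": raw_message, "tags": ["_jsonparsefailure"]}
    return {"raw_message": raw_message}

# Hypothetical raw messages
json_line = '{"path": "/api/users", "originalUrl": "/api/users?page=2", "responseCode": 200}'
text_line = "server started, no JSON here"

print(extract(json_line)["responseCode"])  # 200
print(extract(text_line))                  # {'raw_message': 'server started, no JSON here'}
```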

3. Convert the Log Timestamp to @timestamp

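By default Logstash sets @timestamp to the time it read the line, not the time written in the log, and Kibana and the date-based index name both rely on @timestamp. So we'll reassemble the captured timestamp parts into one string and parse it with the date filter. Because the logs use am/pm, the pattern needs the 12-hour-clock hour (h or hh), not the 24-hour HH.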
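Here's the same reassembly in Python, which makes the 12-hour-clock detail easy to see. Python's strptime codes are not the Joda-style codes the date filter uses: %I plays the role of h/hh, %p of a, and %f reads the 759 milliseconds. The input values are hypothetical ones taken from the sample timestamp.

```python
from datetime import datetime

def to_datetime(log_month: str, log_day: str, log_year: str, log_time: str, log_ampm: str) -> datetime:
    """Rebuild e.g. 'September 6, 2020 10:30:37:759 am' and parse it on a 12-hour clock."""
    full = f"{log_month} {log_day}, {log_year} {log_time} {log_ampm}"
    # %I = 12-hour hour, %p = am/pm (matched case-insensitively); %f turns '759' into 759000 microseconds
    return datetime.strptime(full, "%B %d, %Y %I:%M:%S:%f %p")

morning = to_datetime("September", "6", "2020", "10:30:37:759", "am")
evening = to_datetime("September", "6", "2020", "10:30:37:759", "pm")
print(morning.isoformat())  # 2020-09-06T10:30:37.759000
print(evening.hour)         # 22
```

With a 24-hour code in place of %I, the am/pm marker would no longer shift the hour, which is the same trap as using HH with a in the date filter.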

Full Logstash Configuration

Here's the complete config putting it all together:

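input {
  file {
    path => "/path/to/your/access.log"
    start_position => "beginning"
    # While testing (Linux/macOS), don't remember the read position, so the file is re-read on every run
    sincedb_path => "/dev/null"
  }
}

filter {
  # Match the log prefix with Grok (MONTH is grok's month-name pattern; am/pm uses an inline named capture)
  grok {
    match => { "message" => "%{MONTH:log_month} %{MONTHDAY:log_day}(?:st|nd|rd|th) %{YEAR:log_year}, %{TIME:log_time} (?<log_ampm>[AaPp][Mm]) \[%{DATA:uuid}\] %{LOGLEVEL:log-level}: %{GREEDYDATA:raw_message}" }
    remove_field => ["message"] # Only applied when the pattern matches
  }

  # Only rebuild the timestamp when grok actually captured its parts
  if [log_day] {
    # Combine timestamp parts, e.g. "September 6, 2020 10:30:37:759 am"
    mutate {
      add_field => { "full_log_timestamp" => "%{log_month} %{log_day}, %{log_year} %{log_time} %{log_ampm}" }
    }

    # h = 12-hour clock hour, which pairs with "a"; d and h accept one- or two-digit values
    date {
      match => ["full_log_timestamp", "MMMM d, yyyy h:mm:ss:SSS a"]
      target => "@timestamp"
      # Remove the helper fields only after a successful parse, so failures stay debuggable
      remove_field => ["full_log_timestamp", "log_month", "log_day", "log_year", "log_time", "log_ampm"]
    }
  }

  # Only parse lines whose JSON payload has a "path" key (not just the word path)
  if [raw_message] =~ /"path"\s*:/ {
    json {
      source => "raw_message"
      remove_field => ["raw_message"] # Remove the raw JSON after parsing
    }
  }
}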

output {
  elasticsearch {
    hosts => ["http://your-es-host:9200"]
    index => "access-logs-%{+YYYY.MM.dd}"
  }

  # Optional: Print to console for testing
  stdout {
    codec => rubydebug
  }
}
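One thing worth checking with this output: the %{+YYYY.MM.dd} in the index name is formatted from @timestamp in UTC, and the date filter interprets a timestamp with no zone in the Logstash host's time zone unless you set its timezone option. The Python sketch below shows how an early-morning line can land in the previous day's index. The +05:30 offset is a hypothetical host zone chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical Logstash host zone (+05:30); the log timestamps themselves carry no zone
HOST_TZ = timezone(timedelta(hours=5, minutes=30))

def index_for(local_dt: datetime) -> str:
    """Mimic 'access-logs-%{+YYYY.MM.dd}', which is formatted from @timestamp in UTC."""
    utc_dt = local_dt.replace(tzinfo=HOST_TZ).astimezone(timezone.utc)
    return "access-logs-" + utc_dt.strftime("%Y.%m.%d")

morning = datetime(2020, 9, 6, 10, 30, 37, 759000)  # 10:30 am local on the 6th
late_night = datetime(2020, 9, 7, 2, 0, 0)          # 2:00 am local on the 7th

print(index_for(morning))     # access-logs-2020.09.06
print(index_for(late_night))  # access-logs-2020.09.06 (still the 6th in UTC)
```

If you know which zone the logs are written in, setting timezone on the date filter (an IANA ID such as "Asia/Kolkata", purely as an example) makes the result independent of where Logstash runs.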

How to Test This

  1. First, validate your config with Logstash's built-in test tool:
    bin/logstash -f your-config.conf --config.test_and_exit
    
  2. Run Logstash with your config to see the output:
    bin/logstash -f your-config.conf
    
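    You should see the parsed JSON fields (like path, originalUrl, responseCode) in the console output, and the same events will be sent to your Elasticsearch index. If an event isn't parsed as expected, check its tags field for _grokparsefailure, _dateparsefailure, or _jsonparsefailure to see which filter failed.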

Key Notes

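  • The json filter automatically extracts all fields from the JSON payload, so you don't need to define each one manually. By default it writes them at the top level of the event; set its target option if you'd rather nest them under a single field.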
  • Lines without the path field will still be processed (their raw_message will remain as plain text), but won't get parsed into JSON fields.
  • We clean up unnecessary fields to keep your Elasticsearch documents tidy.

The question comes from Stack Exchange; it was asked by Shivam Gupta.
