First Time Using the Logstash Grok Filter: Trouble Extracting JSON Fields from Logs into Elasticsearch
Hey there! I see you're struggling to get your Logstash Grok filter to extract specific JSON segments from your access.log. Let's break this down and build a config that meets your needs.
First, Let's Diagnose the Problem
Your current Grok pattern doesn't match the actual format of your logs:
- The timestamp in your logs isn't TIMESTAMP_ISO8601 (it uses a format like September 6th 2020, 10:30:37:759 am).
- You need to specifically target log lines that contain a path field in their JSON payload, and parse those JSON fields into usable fields for Elasticsearch.
Step-by-Step Solution
1. Build a Grok Pattern to Match the Log Prefix
First, we'll create a Grok pattern that correctly captures the timestamp, UUID, log level, and the raw message (which can be either plain text or JSON). Here's the custom pattern we'll use:
    %{MONTH:log_month} %{MONTHDAY:log_day}(?:st|nd|rd|th) %{YEAR:log_year}, %{TIME:log_time} (?<log_ampm>[AaPp][Mm]) \[%{DATA:uuid}\] %{LOGLEVEL:log-level}: %{GREEDYDATA:raw_message}
Let's break this down:
- %{MONTH:log_month}: Captures the month name (e.g., September)
- %{MONTHDAY:log_day}(?:st|nd|rd|th): Captures the day and matches its suffix (1st, 2nd, etc.) without capturing the suffix itself
- %{YEAR:log_year}: Captures the year (e.g., 2020)
- %{TIME:log_time}: Captures the time (e.g., 10:30:37:759)
- (?<log_ampm>[AaPp][Mm]): Captures am/pm with an inline named group, since the stock Grok pattern set has no dedicated am/pm pattern
- \[%{DATA:uuid}\]: Captures the UUID inside square brackets
- %{LOGLEVEL:log-level}: Captures the log level (info)
- %{GREEDYDATA:raw_message}: Captures everything after the colon, whether it's plain text or JSON
2. Filter Only JSON Lines with path
We'll use a conditional to target only log lines where raw_message contains the "path" key. Then we'll use Logstash's json filter to parse the JSON into individual fields—this is way more reliable than trying to parse JSON with Grok.
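One thing to watch for with this step: depending on your Logstash version and ECS compatibility mode, the file input may already store the source file's location in a top-level path field, and a JSON key named path parsed into the event root would overwrite it. Here is a minimal sketch of one way around that, assuming you are fine with the parsed keys living under a nested field (the name payload is hypothetical, not something from your setup):

```conf
filter {
  # Parse the JSON only when the raw text contains a "path" key
  if [raw_message] =~ /"path"/ {
    json {
      source => "raw_message"
      # Hypothetical field name: parsed keys become [payload][path], [payload][originalUrl], ...
      target => "payload"
    }
  }
}
```

With a target set, the fields would show up nested (payload.path, payload.originalUrl, and so on) in Elasticsearch. Leave target out if you prefer them at the top level and don't need the file path.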
3. Convert the Log Timestamp to @timestamp
Elasticsearch works best with the @timestamp field, so we'll combine the captured timestamp parts and convert them to a proper date format.
Full Logstash Configuration
Here's the complete config putting it all together:
    input {
      file {
        path => "/path/to/your/access.log"
        start_position => "beginning"
      }
    }

    filter {
      # Match the log prefix with Grok
      grok {
        match => { "message" => "%{MONTH:log_month} %{MONTHDAY:log_day}(?:st|nd|rd|th) %{YEAR:log_year}, %{TIME:log_time} (?<log_ampm>[AaPp][Mm]) \[%{DATA:uuid}\] %{LOGLEVEL:log-level}: %{GREEDYDATA:raw_message}" }
        # Clean up the original message field if not needed
        remove_field => ["message"]
      }

      # Combine timestamp parts and convert to @timestamp
      mutate {
        add_field => { "full_log_timestamp" => "%{log_month} %{log_day}, %{log_year} %{log_time} %{log_ampm}" }
        remove_field => ["log_month", "log_day", "log_year", "log_time", "log_ampm"]
      }
      date {
        # hh is the 12-hour clock, which is what the am/pm marker (a) needs
        match => ["full_log_timestamp", "MMMM d, yyyy hh:mm:ss:SSS a"]
        target => "@timestamp"
        remove_field => ["full_log_timestamp"]
      }

      # Only process lines whose raw message contains a "path" key
      if [raw_message] =~ /"path"/ {
        json {
          source => "raw_message"
          # Remove the raw JSON after parsing
          remove_field => ["raw_message"]
        }
      }
    }

    output {
      elasticsearch {
        hosts => ["http://your-es-host:9200"]
        index => "access-logs-%{+YYYY.MM.dd}"
      }
      # Optional: Print to console for testing
      stdout { codec => rubydebug }
    }
How to Test This
- First, validate your config with Logstash's built-in test tool:
    bin/logstash -f your-config.conf --config.test_and_exit
- Run Logstash with your config to see the output:
    bin/logstash -f your-config.conf
You should see the parsed JSON fields (like path, originalUrl, responseCode) in the console output, and they'll be sent to your Elasticsearch index.
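If you want to iterate on the pattern without touching the real file or Elasticsearch, a throwaway pipeline that reads from stdin can help. This is only a sketch: the file name test-grok.conf is made up, and the pattern assumes the stock Grok pattern set (month as MONTH, am/pm matched with an inline named group).

```conf
# test-grok.conf (hypothetical file name)
input {
  stdin { }
}

filter {
  grok {
    match => { "message" => "%{MONTH:log_month} %{MONTHDAY:log_day}(?:st|nd|rd|th) %{YEAR:log_year}, %{TIME:log_time} (?<log_ampm>[AaPp][Mm]) \[%{DATA:uuid}\] %{LOGLEVEL:log-level}: %{GREEDYDATA:raw_message}" }
  }
}

output {
  # Print every event, including its tags, to the console
  stdout { codec => rubydebug }
}
```

Start it with bin/logstash -f test-grok.conf, paste a line from your access.log, and check the printed event. A _grokparsefailure entry in tags means the pattern did not match that line.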
Key Notes
- The json filter automatically extracts all fields from the JSON payload, so you don't need to define each one manually.
- Lines without the path field will still be processed (their raw_message will remain as plain text), but won't get parsed into JSON fields.
- We clean up unnecessary fields to keep your Elasticsearch documents tidy.
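Lines that don't match the prefix pattern at all are a separate case: Grok tags them with _grokparsefailure by default, and the timestamp steps would then run on unresolved placeholders. Here is a hedged sketch of one option, assuming you would rather discard such lines than index them half-parsed. It would go right after the grok block, before the mutate and date steps:

```conf
filter {
  # Events whose prefix did not match the Grok pattern
  if "_grokparsefailure" in [tags] {
    # Discard the event; remove this block to inspect such events in the stdout output instead
    drop { }
  }
}
```

Whether dropping is appropriate depends on how much you care about the unmatched lines. While you are still tuning the pattern, it is probably more useful to leave them in and look at them in the console output.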
The question comes from Stack Exchange; asked by Shivam Gupta.




