How do I filter a log file for specific keywords? (Sample log and grep command attached)
Hey there! Looks like you're trying to sift through Hive logs to pull out entries related to count queries and dagName values. Your initial grep command is a solid starting point, but we can refine it to get cleaner, more focused output. Let's walk through a couple of ways to do this:
Approach 1: Refine grep with sed for Cleaned-Up Output
This combo lets you isolate exactly the parts you care about and format them nicely:
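grep -E 'Dag name:|count\(\*\)' log.log | sed -E -e 's/.*Dag name: (.*)/Dag Name: \1/' -e t -e 's/.*(select count\(\*\) from [^ ]*).*/Count Query: \1/'
What this does:
grep -E uses extended regex to keep only lines containing either "Dag name:" or "count(*)". sed then strips the surrounding log noise and labels each match. The bare t after the first substitution ends processing of a line once it has been relabelled as a Dag name; without it, a Dag name line that also contains select count(*) would be rewritten a second time and lose its "Dag Name:" label.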
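To try the Approach 1 pipeline without the real log, here is a minimal sketch that wraps it in a shell function and feeds it three made-up lines. The function name extract_labels and the sample lines are hypothetical: they only imitate the timestamp-first shape of a Hive log line and are not taken from the question, so adjust the patterns if your real lines differ. This version includes the t guard so a "Dag name:" line keeps its label even when it also contains a count(*) query.

```shell
#!/bin/sh
# Sketch of Approach 1 as a reusable function.
# The sample log lines below are HYPOTHETICAL, not the asker's real log.

# extract_labels: reads log text on stdin, prints one labelled line per match.
extract_labels() {
  grep -E 'Dag name:|count\(\*\)' |
    sed -E \
      -e 's/.*Dag name: (.*)/Dag Name: \1/' \
      -e t \
      -e 's/.*(select count\(\*\) from [^ ]*).*/Count Query: \1/'
  # "-e t" on its own line: if the first substitution succeeded,
  # jump to the end of the script so the second one is skipped.
}

# Hypothetical input: a Dag name line, a plain count(*) line, and noise.
sample='2018-03-20T15:53:24,001 INFO [main] Dag name: select count(*) from reportingperiod(Stage-1)
2018-03-20T15:53:25,002 INFO [main] Executing command: select count(*) from reportingperiod
2018-03-20T15:53:26,003 INFO [main] Session closed'

printf '%s\n' "$sample" | extract_labels
```

With this input the first line comes out as "Dag Name: select count(*) from reportingperiod(Stage-1)", the second as "Count Query: select count(*) from reportingperiod", and the unrelated third line is dropped by grep.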
Approach 2: Use awk for Targeted Extraction
awk is well suited to parsing structured log lines. This command pulls out the relevant substrings directly and keeps the timestamp ($1, the first whitespace-separated field) for context. The offset of 10 skips the 9 characters of "Dag name:" plus the space that follows it:
awk '
  /Dag name:/   { print "Timestamp: " $1 " | Dag Name: " substr($0, index($0, "Dag name:") + 10) }
  /count\(\*\)/ { print "Timestamp: " $1 " | Count Query: " substr($0, index($0, "select count(*)")) }
' log.log
Example Output:
Timestamp: 2018-03-20T15:53:24,001 | Dag Name: select count(*) from reportingperiod(Stage-1)
Timestamp: 2018-03-20T15:53:24,001 | Count Query: select count(*) from reportingperiod(Stage-1)
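Here is a sketch of the awk approach run against a single hypothetical log line (made up to match the example output; it is not the asker's log). It differs from the one-liner in one place: instead of a fixed character offset, it trims whatever blanks follow "Dag name:" with sub(), so it keeps working if the spacing varies. The function name extract_with_timestamps is hypothetical.

```shell
#!/bin/sh
# Sketch of Approach 2 as a function, checked against one HYPOTHETICAL line.

# extract_with_timestamps: reads log text on stdin.
extract_with_timestamps() {
  awk '
    /Dag name:/ {
      # Everything after "Dag name:", with leading blanks trimmed,
      # instead of relying on a fixed character offset.
      rest = substr($0, index($0, "Dag name:") + length("Dag name:"))
      sub(/^[ \t]+/, "", rest)
      print "Timestamp: " $1 " | Dag Name: " rest
    }
    /count\(\*\)/ {
      # Only print when the lowercase "select count(*)" text is present.
      start = index($0, "select count(*)")
      if (start > 0)
        print "Timestamp: " $1 " | Count Query: " substr($0, start)
    }
  '
}

line='2018-03-20T15:53:24,001 INFO [main] Dag name: select count(*) from reportingperiod(Stage-1)'
printf '%s\n' "$line" | extract_with_timestamps
```

Because neither rule calls next, a line that contains both a Dag name and a count(*) query is printed twice, which is why the example output shows the same query under both labels. Adding next at the end of the first rule gives one output line per log entry instead.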
Note on Incomplete Log Lines
Your second log line cuts off at dagName... If you're seeing truncated entries regularly, check whether your logging or log rotation setup is splitting entries across lines, or use a tool like multitail to view the logs in real time without truncation.
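If entries really are being split across physical lines, grep only sees the fragment that happens to contain the keyword. One workaround is to glue continuation lines back onto their entry before filtering. This sketch assumes (an assumption, not something the question confirms) that every new log entry starts with an ISO-style timestamp like 2018-03-20T15:53:24,001; any line that doesn't is appended to the previous entry. The function name join_continuations and the sample lines are hypothetical.

```shell
#!/bin/sh
# Sketch: rejoin log entries that were split across several lines.
# ASSUMPTION: each new entry begins with "YYYY-MM-DDT..."; the sample is HYPOTHETICAL.

# join_continuations: reads log text on stdin, prints one line per entry.
join_continuations() {
  awk '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T/ {
      # A new entry starts: flush the previous one, if any.
      if (rec != "") print rec
      rec = $0
      next
    }
    # Continuation line: append it to the current entry with a space.
    { rec = (rec == "" ? $0 : rec " " $0) }
    END { if (rec != "") print rec }
  '
}

sample='2018-03-20T15:53:24,001 INFO [main] Dag name: select count(*)
from reportingperiod(Stage-1)
2018-03-20T15:53:25,002 INFO [main] Session closed'

# Join first, then filter, so the whole query survives the grep.
printf '%s\n' "$sample" | join_continuations | grep -F 'count(*)'
```

The joined output can be piped into either of the commands above in place of reading log.log directly. Interval expressions such as [0-9]{4} are avoided in the awk pattern because some older awk implementations do not support them.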
Question sourced from Stack Exchange; asked by Teju Priya.




