How to Filter a Log File for Specific Keywords? With Sample Logs and grep Commands

阿华AIGC实验室

2026-5-20

Extracting count Queries and dagName Entries from Hive Logs

Hey there! Looks like you're trying to sift through Hive logs to pull out entries related to count queries and dagName values. Your initial grep command is a solid starting point, but we can refine it to get cleaner, more focused output. Let's walk through a couple of ways to do this:

Approach 1: Refine grep with sed for Cleaned-Up Output

This combo lets you isolate exactly the parts you care about and format them nicely:

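grep -E 'Dag name:|count\(\*\)' log.log | sed -E -e 's/.*Dag name: (.*)/Dag Name: \1/' -e 't' -e 's/.*(select count\(\*\) from [^ ]*)/Count Query: \1/'

What this does:

grep -E uses extended regex to match either "Dag name:" or "count(*)".
sed strips out the extra log noise and formats the output with clear labels. The bare t between the two substitutions ends processing of a line as soon as the first substitution succeeds, so a Dag name that is itself a count(*) query keeps its "Dag Name:" label instead of being relabeled as a Count Query.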
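To try the sed approach without a real Hive log, here is a minimal sketch that wraps the pipeline in a shell function and runs it on a tiny sample file. The function name extract_count_and_dag and the sample log lines are hypothetical, made up for illustration rather than copied from real Hive output; only the "Dag name:" and "select count(*) from ..." patterns come from the commands above. The sed script is split into -e pieces with a bare t in the middle, which keeps a Dag name line from being relabeled by the second substitution.

```shell
#!/bin/sh
# Sketch only: the function name and the sample log lines are HYPOTHETICAL.

# Print labelled "Dag Name:" / "Count Query:" lines from the log file in $1.
# 't' with no label jumps to the end of the sed script when the previous
# substitution succeeded, so Dag name lines are not relabeled.
extract_count_and_dag() {
    grep -E 'Dag name:|count\(\*\)' "$1" |
        sed -E \
            -e 's/.*Dag name: (.*)/Dag Name: \1/' \
            -e 't' \
            -e 's/.*(select count\(\*\) from [^ ]*)/Count Query: \1/'
}

# Made-up sample log (not real Hive output) written to a temporary file.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2018-03-20T15:53:24,001 INFO [main] Dag name: select count(*) from reportingperiod(Stage-1)
2018-03-20T15:53:25,002 INFO [main] Running query: select count(*) from reportingperiod
2018-03-20T15:53:26,003 INFO [main] Unrelated message
EOF

extract_count_and_dag "$sample_log"
rm -f "$sample_log"
```

On this sample, the first line comes out as "Dag Name: select count(*) from reportingperiod(Stage-1)", the second as "Count Query: select count(*) from reportingperiod", and the unrelated line is dropped by grep.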

Approach 2: Use awk for Targeted Extraction

Awk is great for parsing structured log lines. This command directly pulls the relevant substrings while retaining timestamps for context:

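awk '
    /Dag name:/ {
        # "Dag name: " is 10 characters, so +10 skips the label and its trailing space
        print "Timestamp: " $1 " | Dag Name: " substr($0, index($0, "Dag name:") + 10)
    }
    /count\(\*\)/ {
        # $1 is the leading timestamp field; print from "select count(*)" to end of line
        print "Timestamp: " $1 " | Count Query: " substr($0, index($0, "select count(*)"))
    }
' log.log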

Example Output:

Timestamp: 2018-03-20T15:53:24,001 | Dag Name: select count(*) from reportingperiod(Stage-1)
Timestamp: 2018-03-20T15:53:24,001 | Count Query: select count(*) from reportingperiod(Stage-1)
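In the example output, one log line shows up twice: its Dag name is itself a count(*) query, so both awk rules fire. If you'd rather get a single output line per log entry, one option is to end the first rule with next, which makes awk move on to the next input line. The sketch below does that and computes the offset with length() instead of a hard-coded number. The function name extract_with_timestamps and the sample log lines are hypothetical, for illustration only.

```shell
#!/bin/sh
# Sketch only: the function name and the sample log lines are HYPOTHETICAL.

# Print one "Timestamp | Dag Name" or "Timestamp | Count Query" line per match in $1.
extract_with_timestamps() {
    awk '
        /Dag name: / {
            # Start right after the label and its trailing space.
            start = index($0, "Dag name: ") + length("Dag name: ")
            print "Timestamp: " $1 " | Dag Name: " substr($0, start)
            next  # skip the count(*) rule for this line
        }
        /count\(\*\)/ {
            # $1 is the leading timestamp; print from "select count(*)" onward.
            print "Timestamp: " $1 " | Count Query: " substr($0, index($0, "select count(*)"))
        }
    ' "$1"
}

# Made-up sample log (not real Hive output) written to a temporary file.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2018-03-20T15:53:24,001 INFO [main] Dag name: select count(*) from reportingperiod(Stage-1)
2018-03-20T15:53:25,002 INFO [main] Running query: select count(*) from reportingperiod
EOF

extract_with_timestamps "$sample_log"
rm -f "$sample_log"
```

The trade-off: with next, a Dag name line is reported only as a Dag Name, so if you also want every count(*) occurrence listed, keep the original two-rule version.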

Note on Incomplete Log Lines

Your second log line cuts off at dagName... If you're seeing truncated entries regularly, you might want to check whether your log rotation setup is splitting lines, or use a tool like multitail to view logs in real time without truncation.
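One rough way to check for split entries is to list the lines that do not start with a timestamp. If every entry begins with one, as the timestamps in the example output suggest, such lines are either continuations of a multi-line entry (a long query or a stack trace, for instance) or fragments left behind by a split. This is a heuristic sketch, not a diagnosis; the function name find_continuation_lines, the timestamp pattern, and the log.log* naming for rotated files are all assumptions.

```shell
#!/bin/sh
# Heuristic sketch: the function name is HYPOTHETICAL, and it ASSUMES every
# log entry starts with a timestamp like 2018-03-20T15:53:24,001.

# Print, with line numbers, every line of $1 that does not begin with a timestamp.
find_continuation_lines() {
    grep -nvE '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}' "$1"
}

# Usage: check the current log and any rotated copies (ASSUMED to be named log.log*).
for f in log.log*; do
    [ -f "$f" ] || continue
    echo "== $f"
    find_continuation_lines "$f"
done
```

A fragment sitting on the first line of a rotated file, right after a suspiciously short last line in the previous file, points more strongly at rotation than at a legitimately multi-line entry.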

The question in this post comes from Stack Exchange; it was asked by Teju Priya.