Python help: extracting SMTP email addresses from a pcap file with dpkt
Hey there! I see you've already nailed the basics of reading a PCAP file with dpkt—great start! Let's build on your existing code to pull out those SMTP From: and To: email addresses. Here's a step-by-step breakdown of how to make it work:
Step 1: Filter for Valid SMTP Traffic
First, we need to narrow the capture down to likely SMTP traffic. SMTP typically runs on port 25 (the default, used for relaying between servers) or 587 (the submission port), so we check whether the source or destination port of each TCP segment is one of these. We also skip segments with an empty TCP payload, such as bare ACKs, because they carry no SMTP text at all.
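If you want to try the filter on its own before wiring it into the packet loop, here is a minimal sketch. The helper name `is_smtp_candidate` and the `SMTP_PORTS` constant are made up for this illustration; they are not part of dpkt.

```python
# Hypothetical helper for illustration; the port set mirrors the two ports named above.
SMTP_PORTS = frozenset({25, 587})


def is_smtp_candidate(sport, dport, payload, ports=SMTP_PORTS):
    """Return True if a TCP segment is worth inspecting for SMTP text."""
    # Either direction of the conversation counts: client -> server has the
    # SMTP port as destination, server -> client has it as source.
    uses_smtp_port = sport in ports or dport in ports
    # Segments without payload (for example bare ACKs) carry nothing to parse.
    return uses_smtp_port and len(payload) > 0
```

With dpkt you would call it as `is_smtp_candidate(tcp.sport, tcp.dport, tcp.data)`.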
Step 2: Parse the SMTP Payload
SMTP is a text-based protocol, so we can decode the TCP payload from bytes to a string, split it into lines, and look for lines starting with From: or To: (these are the message headers the client sends after the DATA command). A simple regex then extracts the email address from each matching line.
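The parsing step can be tested without any PCAP at all if you pull it into a small function that takes raw payload bytes. This is only a sketch: `find_header_addresses` and `EMAIL_RE` are names invented here, and the regex is the same basic pattern used in the full script.

```python
import re

# Basic pattern for common addresses; it does not cover every valid form.
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")


def find_header_addresses(payload):
    """Return (header, address) pairs for From:/To: lines in one payload."""
    text = payload.decode("utf-8", errors="ignore")
    found = []
    for line in text.splitlines():
        # Split "Name: value" at the first colon; sep is empty if there is none.
        name, sep, value = line.strip().partition(":")
        if sep and name.lower() in ("from", "to"):
            match = EMAIL_RE.search(value)
            if match:
                found.append((name.capitalize(), match.group()))
    return found
```

Only the first address on each line is returned, which matches what the full script does.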
Updated Code with Explanations
Here's your modified code with all the necessary logic, plus comments to make it easy to follow:
```python
import re

import dpkt


def extract_smtp_emails(pcap_path):
    # Basic regex to match standard email addresses (works for most SMTP header cases)
    email_regex = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
    # SMTP ports to look for: 25 (default) and 587 (submission)
    smtp_ports = {25, 587}

    with open(pcap_path, 'rb') as f:
        pcap = dpkt.pcap.Reader(f)
        for ts, buf in pcap:
            try:
                # Parse the Ethernet frame
                eth = dpkt.ethernet.Ethernet(buf)

                # Skip if the payload isn't an IP packet
                if not isinstance(eth.data, dpkt.ip.IP):
                    continue
                ip = eth.data

                # Skip if the IP payload isn't a TCP segment
                if not isinstance(ip.data, dpkt.tcp.TCP):
                    continue
                tcp = ip.data

                # Filter for SMTP ports (check both source and destination)
                if tcp.dport not in smtp_ports and tcp.sport not in smtp_ports:
                    continue

                # Skip empty TCP payloads
                if not tcp.data:
                    continue

                # Convert payload bytes to string; errors='ignore' drops
                # undecodable bytes instead of raising
                smtp_content = tcp.data.decode('utf-8', errors='ignore')

                # Search each line for From: and To: headers
                for line in smtp_content.splitlines():
                    line_clean = line.strip()
                    if line_clean.lower().startswith('from:'):
                        email_match = email_regex.search(line_clean)
                        if email_match:
                            print(f"Found From email: {email_match.group()}")
                    elif line_clean.lower().startswith('to:'):
                        email_match = email_regex.search(line_clean)
                        if email_match:
                            print(f"Found To email: {email_match.group()}")
            except Exception:
                # Skip malformed packets to avoid crashing the script
                continue


# Run the function with your PCAP file
if __name__ == '__main__':
    extract_smtp_emails('test.pcap')
```
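One limitation of looking at each packet separately: a From: or To: line can be split across two TCP segments, and then neither half matches. If that bites you, a rough workaround is to join the payloads of each connection in sequence-number order before parsing. The sketch below is not a full TCP reassembler. It assumes no sequence-number wraparound, no overlapping segments, and no missing data, and `reassemble_flows` is a name invented for this example.

```python
from collections import defaultdict


def reassemble_flows(segments):
    """Join TCP payloads per flow in sequence-number order.

    `segments` is an iterable of (flow_key, seq, payload) tuples, where
    flow_key is any hashable value, e.g. (src_ip, sport, dst_ip, dport).
    """
    by_flow = defaultdict(dict)
    for flow_key, seq, payload in segments:
        if payload:
            # Keep only the first copy of each sequence number, which drops
            # simple retransmissions.
            by_flow[flow_key].setdefault(seq, payload)
    return {
        flow: b"".join(chunks[seq] for seq in sorted(chunks))
        for flow, chunks in by_flow.items()
    }
```

Inside the dpkt loop you could collect `((ip.src, tcp.sport, ip.dst, tcp.dport), tcp.seq, tcp.data)` for every SMTP segment, then run the header search over each joined byte string instead of over single packets.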
Key Details to Note:
- Error Handling: The isinstance checks skip frames that aren't IP/TCP, and the try/except around each packet skips anything that fails to parse, which is super common when working with real-world PCAPs.
- Regex Flexibility: The email regex is basic but covers most standard addresses. You can tweak it if you need to handle edge cases like quoted local parts.
- Port Checking: We check both source and destination ports because the capture contains both directions of the conversation. Packets from the client have 25/587 as the destination port, while replies from the server have it as the source port.
- Payload Decoding: Using errors='ignore' ensures we don't get stuck on non-UTF-8 bytes that might sneak into the SMTP traffic.
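On the regex point: if you run into headers with display names, quoted parts, or several recipients on one line, the standard library's `email.utils.getaddresses` may be a better fit than tweaking the pattern. A minimal sketch, assuming you already have one complete header line as a string (`addresses_from_header` is a made-up name):

```python
from email.utils import getaddresses


def addresses_from_header(line):
    """Return every address found in one 'From:' or 'To:' header line."""
    # Drop the header name; everything after the first colon is the value.
    _, _, value = line.partition(":")
    # getaddresses returns (display_name, address) pairs; keep non-empty addresses.
    return [addr for _name, addr in getaddresses([value]) if addr]
```

For example, `addresses_from_header('To: "Doe, Jane" <jane@example.com>, bob@example.org')` gives back both addresses, where the regex approach in the script would print only the first one.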
Give this a test run with your test.pcap file, and let me know if you hit any snags!
This question originally comes from Stack Exchange; it was asked by sophia.




