How do I iterate over the directories and files in an S3 bucket with a Boto3 Python script?
I totally get your frustration—S3's "folder" structure can feel counterintuitive at first because it doesn’t work like a traditional filesystem. Let me break this down for you with a practical script tailored exactly to your SNS delivery report use case.
First, a critical point to wrap your head around: S3 doesn’t have actual folders. What looks like a folder is just a prefix in an object’s key. For example, if you have a report at sns-delivery-reports/2024-05-20/daily-summary.csv, the sns-delivery-reports/2024-05-20/ part is the prefix that mimics a folder path.
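To make the prefix idea concrete, here is a small pure-Python sketch (no AWS calls) that imitates how a delimiter-based listing splits keys into "folders" and "files". The helper name `group_by_delimiter` and every key except the daily-summary example above are made up for illustration. It only approximates what S3 does on the server side.

```python
def group_by_delimiter(keys, prefix="", delimiter="/"):
    """Imitate a delimiter-based listing: split keys into 'folders' and 'files'."""
    common_prefixes = set()
    contents = []
    for key in keys:
        if not key.startswith(prefix):
            continue  # Outside the requested prefix
        rest = key[len(prefix):]
        if delimiter in rest:
            # Keep everything up to and including the first delimiter after the prefix
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            # No further delimiter: the key sits directly under the prefix
            contents.append(key)
    return sorted(common_prefixes), contents


# Hypothetical keys, modeled on the example path above
example_keys = [
    "sns-delivery-reports/2024-05-20/daily-summary.csv",
    "sns-delivery-reports/2024-05-21/daily-summary.csv",
    "sns-delivery-reports/readme.txt",
]

folders, files = group_by_delimiter(example_keys, prefix="sns-delivery-reports/")
print(folders)
print(files)
```

The two date "folders" come back as prefixes, while the object directly under the parent prefix comes back as a plain key.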
Here’s a complete script that will:
- Automatically find all your date-based "daily folders"
- Fetch every SNS report file inside those folders
- Let you either download files locally or process their content directly
```python
import os

import boto3


def process_sns_delivery_reports(bucket_name, parent_prefix="sns-delivery-reports/"):
    # Initialize S3 client (works just as well with boto3.resource if you prefer that syntax)
    s3_client = boto3.client('s3')

    # One paginator object can be reused for both listing passes
    paginator = s3_client.get_paginator('list_objects_v2')

    # First, get all date-based "folders" (prefixes) under your parent path
    folder_pages = paginator.paginate(
        Bucket=bucket_name,
        Prefix=parent_prefix,
        Delimiter='/'  # This tells S3 to group objects by the "/" delimiter, mimicking folders
    )

    # Iterate through each date folder
    for page in folder_pages:
        for prefix in page.get('CommonPrefixes', []):
            folder_path = prefix['Prefix']
            print(f"Processing folder: {folder_path}")

            # Now list all files within this date folder
            # (Delimiter is omitted here to grab every object under the prefix)
            file_pages = paginator.paginate(Bucket=bucket_name, Prefix=folder_path)

            # Process each individual report file
            for file_page in file_pages:
                for obj in file_page.get('Contents', []):
                    file_key = obj['Key']
                    if file_key.endswith('/'):
                        continue  # Skip "folder" placeholder objects, which have no file name

                    file_size = obj['Size']
                    last_modified = obj['LastModified']
                    print(f"  Found report: {file_key} (Size: {file_size} bytes, Updated: {last_modified})")

                    # Option 1: Download the file to your local machine
                    # Keep the path below parent_prefix so that same-named reports
                    # from different dates do not overwrite each other
                    relative_key = file_key[len(parent_prefix):]
                    local_save_path = os.path.join("sns-reports", *relative_key.split('/'))
                    os.makedirs(os.path.dirname(local_save_path), exist_ok=True)
                    s3_client.download_file(bucket_name, file_key, local_save_path)
                    print(f"  Saved to: {local_save_path}")

                    # Option 2: Read file content directly (no download needed)
                    # response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
                    # report_content = response['Body'].read().decode('utf-8')
                    # print(f"  Preview: {report_content[:150]}...")


if __name__ == "__main__":
    # Replace these with your actual bucket details
    YOUR_BUCKET_NAME = "your-s3-bucket-name"
    YOUR_PARENT_FOLDER_PREFIX = "sns-delivery-reports/"
    process_sns_delivery_reports(YOUR_BUCKET_NAME, YOUR_PARENT_FOLDER_PREFIX)
```
Key Details to Know:
- Delimiter='/': This is the trick to mimicking folder traversal. When it is set, S3 returns CommonPrefixes, which are the "folders" directly under your target path.
- Paginators: These handle large numbers of files and folders. S3 returns at most 1,000 keys per response, and a paginator automatically loops through all pages so you don’t miss any data.
- Flexible Processing: Pick between downloading files for later review or reading content directly to parse metrics right away (just comment/uncomment the relevant section).
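If you only need the list of "folders", the delimiter and paginator points above can be condensed into a small helper. This is a sketch rather than the only way to do it. The function names `iter_common_prefixes` and `list_date_folders` are illustrative, and the first helper takes plain response dictionaries so you can try it without AWS access.

```python
def iter_common_prefixes(pages):
    """Yield each 'folder' prefix from an iterable of list_objects_v2 response pages."""
    for page in pages:
        # A page with no grouped prefixes simply has no CommonPrefixes key
        for entry in page.get("CommonPrefixes", []):
            yield entry["Prefix"]


def list_date_folders(bucket_name, parent_prefix):
    """Return every 'folder' prefix directly under parent_prefix (requires AWS credentials)."""
    import boto3  # Imported here so the helper above stays usable without boto3 installed

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    pages = paginator.paginate(Bucket=bucket_name, Prefix=parent_prefix, Delimiter="/")
    return list(iter_common_prefixes(pages))


# Hand-written pages shaped like list_objects_v2 responses (hypothetical data)
fake_pages = [
    {"CommonPrefixes": [{"Prefix": "sns-delivery-reports/2024-05-19/"}]},
    {"KeyCount": 0},
    {"CommonPrefixes": [{"Prefix": "sns-delivery-reports/2024-05-20/"}]},
]
found_folders = list(iter_common_prefixes(fake_pages))
print(found_folders)
```

Separating the page-walking logic from the AWS call keeps it easy to unit test with fake pages like the ones above.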
To make this even more powerful, you could:
- Add date filtering to only process folders from the last 7 days (use datetime to compare the prefix date with today)
- Parse CSV/JSON report content to extract delivery success rates or error details
- Schedule the script to run daily using cron (Linux) or Task Scheduler (Windows)
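As a rough sketch of the date-filtering idea, the helper below checks whether a folder prefix falls inside the last 7 days. It assumes the final path segment is a date in YYYY-MM-DD form, as in the example path earlier. The name `is_recent_folder` is hypothetical, so adjust the format string if your bucket names its folders differently.

```python
from datetime import date, datetime, timedelta


def is_recent_folder(folder_prefix, today, days=7):
    """Return True if the last path segment is a YYYY-MM-DD date within the last `days` days."""
    segment = folder_prefix.rstrip("/").split("/")[-1]
    try:
        folder_date = datetime.strptime(segment, "%Y-%m-%d").date()
    except ValueError:
        return False  # Not a date-named folder, so skip it
    # Inclusive window: from `days` days ago up to and including today
    return today - timedelta(days=days) <= folder_date <= today


# A fixed reference date keeps the example reproducible; use date.today() in a real run
reference_day = date(2024, 5, 20)
print(is_recent_folder("sns-delivery-reports/2024-05-20/", reference_day))
print(is_recent_folder("sns-delivery-reports/2024-05-01/", reference_day))
```

In the main script, you could call this on each folder_path and skip the folder when it returns False.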
Let me know if you need help tweaking this to match your exact bucket structure or report format!
The question comes from Stack Exchange, asked by Monty.




