Starting from the Serverless Framework: configuring an EC2 instance to download and process MySQL data
Hey there! Since you already know how to spin up EC2 instances via Lambda, let's walk through solving your remaining questions step by step.
1. Where to Store Your Python Script & How to Fetch It on EC2 Startup
The most straightforward and scalable option is to store your script in Amazon S3: it's cheap, easy to access, and integrates with EC2 through the instance's IAM role. Here's how to set it up:
Step 1: Upload the Script to S3
- Create an S3 bucket (or use an existing one) and upload your Python script (e.g., data_processor.py) to it.
- Make sure your EC2 instance's IAM role has the s3:GetObject permission for this bucket/object (no hardcoded access keys needed!).
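If you'd rather script the upload than use the console, here is a minimal sketch built on boto3's `S3.Client.upload_file`. The `upload_script` helper, bucket name, and key are hypothetical placeholders. The client can be passed in, or it is created lazily, so the function can be exercised without AWS credentials.

```python
from typing import Any, Optional


def upload_script(local_path: str, bucket: str, key: str,
                  s3_client: Optional[Any] = None) -> str:
    """Upload the processing script and return the s3:// URI to use in User Data."""
    if s3_client is None:
        import boto3  # imported lazily so a stub client can be injected in tests
        s3_client = boto3.client("s3")
    # upload_file(Filename, Bucket, Key) switches to multipart uploads for large files
    s3_client.upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_script("data_processor.py", "your-bucket-name", "data_processor.py")` returns `s3://your-bucket-name/data_processor.py`, which is the path the User Data script copies from.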
Step 2: Fetch the Script on EC2 Launch via User Data
When you launch the EC2 instance from Lambda, include a User Data script that automatically pulls the script from S3 and runs it. User Data runs once, as root, on the instance's first boot. Here's an example User Data snippet for Linux instances:
#!/bin/bash
# Install required packages (adjust based on your script's dependencies)
yum install -y python3-pip
pip3 install pymysql boto3 requests

# Fetch the script from S3
aws s3 cp s3://your-bucket-name/data_processor.py /home/ec2-user/

# Make the script executable and run it
chmod +x /home/ec2-user/data_processor.py
python3 /home/ec2-user/data_processor.py
Note: For Windows instances, adjust the commands to use PowerShell and appropriate package managers.
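On the Lambda side, the User Data is passed to `run_instances`. The sketch below assumes a hypothetical event shape (`bucket`, `key`, `ami_id`, `instance_profile`) and a placeholder instance type; adapt those to your setup. `IamInstanceProfile` takes the instance profile name, not the role name. boto3 base64-encodes `UserData` for `run_instances` itself, so the plain script is passed. `InstanceInitiatedShutdownBehavior` decides whether an OS-level shutdown stops (`"stop"`, the EC2 default) or terminates (`"terminate"`) the instance.

```python
from typing import Any, Optional

# Mirrors the User Data script above; bucket and key are filled in per invocation
USER_DATA_TEMPLATE = """#!/bin/bash
yum install -y python3-pip
pip3 install pymysql boto3 requests
aws s3 cp s3://{bucket}/{key} /home/ec2-user/data_processor.py
python3 /home/ec2-user/data_processor.py
"""


def build_user_data(bucket: str, key: str) -> str:
    """Render the User Data script for a given script location in S3."""
    return USER_DATA_TEMPLATE.format(bucket=bucket, key=key)


def launch_worker(ami_id: str, instance_profile: str, user_data: str,
                  instance_type: str = "t3.micro",
                  shutdown_behavior: str = "stop",
                  ec2_client: Optional[Any] = None) -> str:
    """Launch one worker instance and return its instance ID."""
    if ec2_client is None:
        import boto3  # lazy import so a stub client can be injected in tests
        ec2_client = boto3.client("ec2")
    response = ec2_client.run_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        UserData=user_data,  # boto3 base64-encodes this automatically
        IamInstanceProfile={"Name": instance_profile},
        InstanceInitiatedShutdownBehavior=shutdown_behavior,  # "stop" or "terminate"
    )
    return response["Instances"][0]["InstanceId"]


def lambda_handler(event, context):
    # Hypothetical event keys; adjust to however your function is invoked
    user_data = build_user_data(event["bucket"], event["key"])
    instance_id = launch_worker(event["ami_id"], event["instance_profile"], user_data)
    return {"instance_id": instance_id}
```

Passing `shutdown_behavior="terminate"` is worth considering for a one-shot job: the shutdown options in Section 3 then clean the instance up entirely instead of leaving a stopped instance and its EBS volume behind.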
Alternative: If your script needs sensitive configuration (database host, user, password), store those values in SSM Parameter Store as SecureString parameters and read them at runtime; S3 is still the better place for the script itself.
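A minimal sketch for seeding those parameters with `SSM.Client.put_parameter`, assuming the `/mysql/...` naming that the processing script in Section 2 reads. The `store_mysql_config` helper is hypothetical, and every value is written as a `SecureString` (encrypted with the account's default `aws/ssm` key unless you choose another).

```python
from typing import Any, Dict, List, Optional


def store_mysql_config(config: Dict[str, str], prefix: str = "/mysql",
                       ssm_client: Optional[Any] = None) -> List[str]:
    """Write each config value as an encrypted SecureString parameter; return the names."""
    if ssm_client is None:
        import boto3  # lazy import so a stub client can be injected in tests
        ssm_client = boto3.client("ssm")
    names = []
    for field, value in config.items():
        name = f"{prefix}/{field}"
        # Overwrite=True lets you rerun this to rotate a password
        ssm_client.put_parameter(Name=name, Value=value, Type="SecureString", Overwrite=True)
        names.append(name)
    return names
```

Calling it with `{"host": ..., "user": ..., "password": ..., "db": ...}` creates `/mysql/host`, `/mysql/user`, `/mysql/password`, and `/mysql/db`.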
2. Configuring the EC2 Instance to Process MySQL Data & Send to DynamoDB
Your Python script will handle the core logic, but you need to set up permissions and network access first:
Prerequisites
- IAM Role Permissions: Attach a policy to your EC2 instance's IAM role that allows:
  - dynamodb:BatchWriteItem (used by batch_writer() for bulk writes) and/or dynamodb:PutItem on your target DynamoDB table
  - s3:GetObject for the script bucket
  - ssm:GetParameter for the /mysql/* parameters the script reads
  - (Optional) ec2:StopInstances if you want the script to stop the instance itself later
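- Network Access: Ensure your EC2 instance can reach your MySQL database, e.g. launch it in the same VPC and let the database's security group allow inbound MySQL traffic (TCP 3306) from the instance's security group, or use a publicly reachable endpoint with proper credentials. The instance also needs a route to the S3, SSM, and DynamoDB APIs (a public IP, a NAT gateway, or VPC endpoints).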
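The permissions listed above can be expressed as a single identity policy. The sketch below builds it as a Python dict; the helper name, region, account ID, bucket, key, and table name are placeholders, and the `/mysql/*` scope assumes the parameter names used later in the script. If your SecureString parameters use a customer managed KMS key rather than the default `aws/ssm` key, the role also needs `kms:Decrypt` on that key.

```python
from typing import Any, Dict


def instance_role_policy(region: str, account_id: str, bucket: str, script_key: str,
                         table_name: str, allow_self_stop: bool = True) -> Dict[str, Any]:
    """Build a least-privilege policy document for the worker instance's role."""
    statements = [
        {
            "Sid": "ReadScript",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/{script_key}",
        },
        {
            "Sid": "ReadMySQLConfig",
            "Effect": "Allow",
            "Action": "ssm:GetParameter",
            "Resource": f"arn:aws:ssm:{region}:{account_id}:parameter/mysql/*",
        },
        {
            "Sid": "WriteResults",
            "Effect": "Allow",
            # batch_writer() issues BatchWriteItem; PutItem covers single-item writes
            "Action": ["dynamodb:PutItem", "dynamodb:BatchWriteItem"],
            "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
        },
    ]
    if allow_self_stop:
        # Only needed for the API-based shutdown option
        statements.append({
            "Sid": "StopSelf",
            "Effect": "Allow",
            "Action": "ec2:StopInstances",
            "Resource": f"arn:aws:ec2:{region}:{account_id}:instance/*",
        })
    return {"Version": "2012-10-17", "Statement": statements}
```

`json.dumps(instance_role_policy(...), indent=2)` produces a document you can paste into the IAM console or an infrastructure template.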
Sample Python Script Outline
Here's a simplified version of what your data_processor.py could look like:
import pymysql
import boto3
import requests  # used by the shutdown helper in Section 3
from datetime import datetime, timezone
from decimal import Decimal


def get_ssm_parameter(param_name):
    # Helper to fetch sensitive configs from SSM Parameter Store
    ssm = boto3.client('ssm')
    response = ssm.get_parameter(Name=param_name, WithDecryption=True)
    return response['Parameter']['Value']


def main():
    # 1. Fetch MySQL credentials from SSM (never hardcode!)
    mysql_host = get_ssm_parameter('/mysql/host')
    mysql_user = get_ssm_parameter('/mysql/user')
    mysql_password = get_ssm_parameter('/mysql/password')
    mysql_db = get_ssm_parameter('/mysql/db')

    # 2. Connect to MySQL
    db_conn = pymysql.connect(
        host=mysql_host,
        user=mysql_user,
        password=mysql_password,
        database=mysql_db
    )

    # 3. Fetch all data for the specified user into memory
    try:
        with db_conn.cursor() as cursor:
            target_user_id = 'your-target-user-id'
            cursor.execute("SELECT * FROM your_table WHERE user_id = %s", (target_user_id,))
            user_data = cursor.fetchall()
    finally:
        # Close the connection even if the query fails
        db_conn.close()

    # 4. Perform row-level processing
    processed_data = []
    for row in user_data:
        # Example processing: transform values, filter, enrich
        # (assumes the columns are ordered user_id, value, timestamp)
        processed_row = {
            'user_id': row[0],
            # boto3's DynamoDB resource rejects Python floats, so use Decimal
            'processed_value': (Decimal(str(row[1])) * Decimal('1.2')).quantize(Decimal('0.01')),
            'record_timestamp': row[2].isoformat(),
            'processed_at': datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
        }
        processed_data.append(processed_row)

    # 5. Send results to DynamoDB. batch_writer() groups writes and resends unprocessed
    # items; the table's key (e.g. user_id + record_timestamp) must keep items distinct.
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('your-dynamodb-table')
    with table.batch_writer() as batch:
        for item in processed_data:
            batch.put_item(Item=item)


if __name__ == "__main__":
    main()
    # Add shutdown logic here (see Section 3)
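If one user's rows turn out to be too large to hold comfortably in memory, a variation is to read them in chunks with the DB-API `fetchmany` method. The generator below is a sketch: it works with any cursor that has already executed a query. Note that pymysql's default cursor still buffers the whole result client-side; for true streaming, open the cursor as `db_conn.cursor(pymysql.cursors.SSCursor)`.

```python
from typing import Any, Iterator, Sequence


def iter_row_chunks(cursor: Any, chunk_size: int = 1000) -> Iterator[Sequence[Any]]:
    """Yield rows from an already-executed DB-API cursor, chunk_size rows at a time."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:  # an empty list/tuple means the result set is exhausted
            break
        yield rows
```

Inside `main()`, each chunk can be transformed and handed to `batch.put_item` within the same `table.batch_writer()` block, so memory use stays bounded by `chunk_size` rather than by the user's total row count.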
3. Adding Auto-Shutdown to the Script
You have two reliable options to shut down the EC2 instance after the script finishes:
Option 1: OS-Level Shutdown (Simple)
Add this line at the end of your script (for Linux):
import subprocess

subprocess.run(["sudo", "shutdown", "-h", "now"], check=True)
For Windows, use:
subprocess.run(["shutdown", "/s", "/t", "0"], check=True)
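Note: User Data scripts run as root, so sudo isn't strictly needed there; if you run the script manually as ec2-user, the default Amazon Linux AMIs let ec2-user use sudo without a password. By default an OS-initiated shutdown stops (rather than terminates) an EBS-backed instance; this is controlled by the InstanceInitiatedShutdownBehavior launch parameter.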
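To keep one script working on both platforms, the shutdown command can be chosen at runtime. This is a sketch: `shutdown_command` and `shutdown_host` are hypothetical helpers, and the `sudo` prefix is only added on Linux when the process is not already root.

```python
import os
import platform
import subprocess
from typing import List


def shutdown_command(system: str, is_root: bool = True) -> List[str]:
    """Build the OS shutdown command for a platform.system() value."""
    if system == "Windows":
        return ["shutdown", "/s", "/t", "0"]
    if system == "Linux":
        cmd = ["shutdown", "-h", "now"]
        # Prefix sudo only when not already root (e.g. when run by hand as ec2-user)
        return cmd if is_root else ["sudo"] + cmd
    raise ValueError(f"unsupported platform: {system}")


def shutdown_host() -> None:
    """Shut down the machine this script is running on."""
    system = platform.system()
    # os.geteuid() only exists on Unix, so only consult it on Linux
    is_root = system != "Linux" or os.geteuid() == 0
    subprocess.run(shutdown_command(system, is_root), check=True)
```

Keeping the command construction separate from `subprocess.run` means the platform logic can be checked without actually powering anything off.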
Option 2: AWS API Shutdown (More Controlled)
Use boto3 to call the EC2 API to stop the instance. This requires the ec2:StopInstances permission on the IAM role, and lets you add error handling (e.g., only shut down if processing succeeded):
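import boto3
import requests

IMDS_URL = "http://169.254.169.254/latest"


def shutdown_instance():
    # Get the current instance ID from the EC2 metadata service.
    # IMDSv2 (required on newer AMIs) needs a session token first.
    token = requests.put(
        f"{IMDS_URL}/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
        timeout=2,
    ).text
    instance_id = requests.get(
        f"{IMDS_URL}/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
        timeout=2,
    ).text
    ec2 = boto3.client('ec2')
    ec2.stop_instances(InstanceIds=[instance_id])


# Call after main logic (wrap in try/except if needed)
shutdown_instance()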
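The "only shut down if processing succeeded" idea can be wrapped in a small helper. This is a sketch with a hypothetical `run_then_stop` function: `job` would be the script's `main` and `stop` would be `shutdown_instance`.

```python
import logging
from typing import Callable


def run_then_stop(job: Callable[[], None], stop: Callable[[], None],
                  stop_on_failure: bool = False) -> bool:
    """Run job(); call stop() if it succeeded (or always, if stop_on_failure). Return success."""
    try:
        job()
    except Exception:
        logging.exception("processing failed")
        if stop_on_failure:
            stop()
        return False
    stop()
    return True
```

Usage: `run_then_stop(main, shutdown_instance)`. Leaving the instance running after a failure lets you connect and inspect logs, at the cost of paying for it until you step in; `stop_on_failure=True` trades debuggability for a guaranteed stop.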
Final Tips
- Test your script manually on an EC2 instance first before integrating with Lambda—this helps debug permissions or network issues faster.
- For Lambda, ensure your execution role has ec2:RunInstances and iam:PassRole permissions so it can launch EC2 instances with the correct IAM role.
- Consider using Spot Instances for this batch workflow to cut costs, since it doesn't require continuous uptime; just make sure the job can be safely re-run if the instance is interrupted.
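A sketch of the Lambda execution role's policy covering those two permissions follows. The account ID and role name are placeholders. `iam:PassRole` is scoped to the worker's role ARN and, through the `iam:PassedToService` condition key, to EC2 only. `ec2:RunInstances` is left at `"*"` for brevity and can be narrowed to specific AMI, subnet, and instance ARNs. If you tag instances at launch, `ec2:CreateTags` is needed as well.

```python
from typing import Any, Dict


def lambda_launcher_policy(account_id: str, instance_role_name: str) -> Dict[str, Any]:
    """Build the Lambda execution role policy for launching the worker instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LaunchWorker",
                "Effect": "Allow",
                "Action": "ec2:RunInstances",
                "Resource": "*",  # broad for brevity; narrow in production
            },
            {
                "Sid": "PassInstanceRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{instance_role_name}",
                # Only allow handing this role to EC2, not to other services
                "Condition": {"StringEquals": {"iam:PassedToService": "ec2.amazonaws.com"}},
            },
        ],
    }
```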
The question behind this answer comes from Stack Exchange; it was asked by WeCanBeFriends.




