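Automating Jupyter Notebook execution on AWS: starting and stopping runs through a REST API, passing parameters, and which AWS components are needed
Components for automating Jupyter Notebook execution on AWS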
Hey there! Let's break down the AWS components you'll need to build this automated Jupyter Notebook execution workflow with REST API controls and parameter passing.
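Core AWS components
- Amazon EMR Notebooks: the core environment for running Jupyter Notebooks. A notebook attaches to an EMR cluster for execution, and the EMR API can start and stop executions and pass parameters, which covers your requirements.
- Amazon API Gateway: builds the REST API entry point, exposing endpoints such as /start and /stop that receive start/stop requests and parameters from the front end and forward them to the backend logic.
- AWS Lambda: the backend handler behind API Gateway. It calls the EMR notebook execution API, for example start_notebook_execution to start a run and stop_notebook_execution to terminate one, and forwards the parameters received by the API to the notebook.
- IAM Roles: required for access control. The Lambda function needs a role that is allowed to call the EMR API, and EMR Notebooks needs a role that can access S3 (where the notebook files are stored) and other resources.
- Amazon S3: stores your Jupyter Notebook files (.ipynb). EMR Notebooks reads the notebook content from S3, and execution results can be written back to S3 for archiving.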
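To make the IAM point a bit more concrete, here is a rough sketch of the kind of policy the Lambda role might carry. The helper name build_lambda_emr_policy and the example role ARN are made up for illustration, and the iam:PassRole statement is my assumption about what passing ServiceRole requires, so check the action list against the IAM documentation before relying on it.

```python
import json


def build_lambda_emr_policy(service_role_arn: str) -> dict:
    """Return an IAM policy document (as a dict) for the Lambda execution role.

    Hypothetical helper: the statements below are an assumption about the
    minimum needed to start, stop and describe notebook executions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "NotebookExecutions",
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:StartNotebookExecution",
                    "elasticmapreduce:StopNotebookExecution",
                    "elasticmapreduce:DescribeNotebookExecution",
                ],
                # "*" keeps the sketch short; narrow this in a real setup.
                "Resource": "*",
            },
            {
                # Assumed to be needed because the Lambda hands a service
                # role to EMR in start_notebook_execution.
                "Sid": "PassNotebookServiceRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": service_role_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and role name, for illustration only.
    arn = "arn:aws:iam::123456789012:role/EMR_Notebooks_DefaultRole"
    print(json.dumps(build_lambda_emr_policy(arn), indent=2))
```

The printed JSON can be pasted into an inline policy on the Lambda role and tightened from there.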
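Simple implementation approach
- Upload your Jupyter Notebook to an S3 bucket, then create an EMR Notebook in the EMR console and associate it with that S3 file and the target EMR cluster.
- In API Gateway, create a REST API with two endpoints, /start (with parameters) and /stop, each bound to the corresponding Lambda function.
- Write the Lambda function to handle the requests:
  - For /start requests, call EMR's start_notebook_execution and pass the parameters from the API through the NotebookParams argument as a JSON string (the notebook typically receives them through a cell tagged parameters, whose default values are overridden at run time, rather than through environment variables).
  - For /stop requests, call EMR's stop_notebook_execution with the target execution ID to terminate the run.
- Configure the IAM role permissions so that Lambda can call EMR and EMR can access S3.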
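One detail the steps above gloss over: if the REST API uses Lambda proxy integration, the function does not receive a neat action field. It gets the raw request, with the path in event["path"] and the body as a JSON string in event["body"]. Here is a minimal sketch of the routing part, assuming proxy integration and a body shaped like {"params": {...}} or {"execution_id": "..."}. The names parse_request and api_response are hypothetical, and base64-encoded bodies are ignored.

```python
import json


def parse_request(event: dict) -> dict:
    """Turn an API Gateway proxy event into a small action dict."""
    path = (event.get("path") or "").rstrip("/")
    raw_body = event.get("body") or "{}"  # body is None when no payload is sent
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return {"action": "error", "message": "Body is not valid JSON"}
    if not isinstance(body, dict):
        return {"action": "error", "message": "Body must be a JSON object"}

    if path.endswith("/start"):
        return {"action": "start", "params": body.get("params", {})}
    if path.endswith("/stop"):
        execution_id = body.get("execution_id")
        if not execution_id:
            return {"action": "error", "message": "execution_id is required"}
        return {"action": "stop", "execution_id": execution_id}
    return {"action": "error", "message": f"Unknown path: {path}"}


def api_response(status_code: int, payload: dict) -> dict:
    """Build the response shape that proxy integration expects back."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```

The dict returned by parse_request has the same action, params and execution_id keys that a handler branching on event['action'] expects, so it can sit in front of the EMR calls without changing them.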
Example code (calling the EMR API from Lambda)
import json

import boto3

# Initialize the EMR client
emr = boto3.client('emr', region_name='us-west-1')


def lambda_handler(event, context):
    action = event.get('action')

    # Example: handle a start request (parameters are read from the event)
    if action == 'start':
        # Parameters from the API request that should reach the notebook
        notebook_params = event.get('params', {})
        start_resp = emr.start_notebook_execution(
            EditorId='e-40AC8ZO6EGGCPJ4DLO48KGGGI',     # replace with your EMR Notebook ID
            RelativePath='boto3_demo.ipynb',            # path of the notebook file, relative to the notebook's S3 location
            ExecutionEngine={'Id': 'j-1HYZS6JQKV11Q'},  # replace with your EMR cluster ID
            ServiceRole='EMR_Notebooks_DefaultRole',    # replace with your EMR service role
            # NotebookParams must be a JSON string; str() on a dict produces
            # single-quoted Python syntax, which is not valid JSON
            NotebookParams=json.dumps(notebook_params)
        )
        execution_id = start_resp["NotebookExecutionId"]
        return {"status": "success", "execution_id": execution_id}

    # Example: handle a stop request
    elif action == 'stop':
        execution_id = event['execution_id']
        emr.stop_notebook_execution(NotebookExecutionId=execution_id)
        describe_resp = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
        # The status is nested under the 'NotebookExecution' key
        execution_status = describe_resp['NotebookExecution']['Status']
        return {"status": "success", "execution_status": execution_status}

    # Any other request
    else:
        return {"status": "error", "message": "Invalid action"}
If you want to test the EMR notebook execution API directly, you can also use the standalone script below:
import time

import boto3

emr = boto3.client('emr', region_name='us-west-1')

# Start an execution and pass parameters as a JSON string
start_resp = emr.start_notebook_execution(
    EditorId='e-40AC8ZO6EGGCPJ4DLO48KGGGI',
    RelativePath='boto3_demo.ipynb',
    ExecutionEngine={'Id': 'j-1HYZS6JQKV11Q'},
    ServiceRole='EMR_Notebooks_DefaultRole',
    NotebookParams='{"param1": "value1", "param2": "value2"}'
)
execution_id = start_resp["NotebookExecutionId"]
print(execution_id)
print("\n")

# Inspect the execution that was just started
describe_response = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
print(describe_response)
print("\n")

# List existing executions
list_response = emr.list_notebook_executions()
print("Existing notebook executions:\n")
for execution in list_response['NotebookExecutions']:
    print(execution)
    print("\n")

print("Sleeping for 5 sec...")
time.sleep(5)

# Stop the execution and check its status again
print("Stop execution " + execution_id)
emr.stop_notebook_execution(NotebookExecutionId=execution_id)
describe_response = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
print(describe_response)
print("\n")
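The script sleeps for a fixed 5 seconds, which is fine for a demo. In practice you would probably want to poll until the execution reaches a terminal state. A small sketch of that follows. wait_for_execution is a hypothetical helper, not part of boto3, and it assumes FINISHED, FAILED and STOPPED are the terminal statuses. It takes the describe call as an argument so it can be tried out without AWS access.

```python
import time

# Assumed terminal statuses of an EMR notebook execution.
TERMINAL_STATES = {"FINISHED", "FAILED", "STOPPED"}


def wait_for_execution(describe, execution_id, poll_seconds=10.0,
                       timeout_seconds=900.0, sleep=time.sleep):
    """Poll describe(NotebookExecutionId=...) until a terminal status.

    describe is expected to behave like emr.describe_notebook_execution,
    returning the status under response["NotebookExecution"]["Status"].
    Returns the final status, or raises TimeoutError once the time spent
    sleeping reaches timeout_seconds.
    """
    waited = 0.0
    while True:
        response = describe(NotebookExecutionId=execution_id)
        status = response["NotebookExecution"]["Status"]
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"Execution {execution_id} still {status} after {waited:.0f}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

With a real client this would be called as wait_for_execution(emr.describe_notebook_execution, execution_id). Inside Lambda, keep timeout_seconds well below the function timeout, or return the execution ID immediately and let the caller poll through a separate status endpoint.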
The question comes from Stack Exchange, asked by user22.




