Technical question: automating Jupyter Notebook execution on AWS, covering REST API start/stop, parameter passing, and the AWS components required

阿华AIGC实验室

2026-4-30

Components for automating Jupyter Notebook execution on AWS

Hey there! Let's break down the AWS components you'll need to build this automated Jupyter Notebook execution workflow with REST API controls and parameter passing.

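Core AWS components

Amazon EMR Notebooks: the core vehicle for running the Jupyter Notebook. A notebook can be attached to an EMR cluster for execution, and the EMR API provides operations to start and stop executions and to pass parameters, which matches your requirements.
Amazon API Gateway: builds the REST API entry point you need. It exposes endpoints such as /start and /stop, receives start/stop requests and parameters from the front end, and forwards them to the back-end logic.
AWS Lambda: the back-end handler for API Gateway. It calls the EMR notebook execution API, for example start_notebook_execution to start a run and stop_notebook_execution to terminate one, and passes the parameters received by the API on to the notebook.
IAM Roles: essential for access control. The Lambda function needs a role that can call the EMR API, and EMR Notebooks needs a role that can access resources such as S3 (where the notebook file is stored), so that every step of the flow is authorized.
Amazon S3: stores your Jupyter Notebook file (.ipynb). EMR Notebooks reads the notebook content from S3, and execution results can be written back to S3 for archiving.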
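
To make the IAM piece more concrete, here is a minimal sketch of a policy document for the Lambda execution role, built as a Python dict. The helper name build_lambda_emr_policy and the example role ARN are hypothetical. The iam:PassRole statement is an assumption based on the fact that the Lambda passes a ServiceRole to EMR, and "Resource": "*" is only a placeholder that you would probably want to narrow.

```python
import json


def build_lambda_emr_policy(service_role_arn):
    """Return an IAM policy document (as a dict) for the Lambda execution role.

    Hypothetical helper for illustration; adjust actions and resources to your setup.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Let the Lambda start, stop, and inspect notebook executions
                "Effect": "Allow",
                "Action": [
                    "elasticmapreduce:StartNotebookExecution",
                    "elasticmapreduce:StopNotebookExecution",
                    "elasticmapreduce:DescribeNotebookExecution",
                    "elasticmapreduce:ListNotebookExecutions",
                ],
                "Resource": "*",  # placeholder; narrow this in a real deployment
            },
            {
                # Assumption: needed because the Lambda hands ServiceRole to EMR
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": service_role_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Example ARN is made up; replace it with your EMR Notebooks service role
    example_arn = "arn:aws:iam::123456789012:role/EMR_Notebooks_DefaultRole"
    print(json.dumps(build_lambda_emr_policy(example_arn), indent=2))
```

The printed JSON can be pasted into the IAM console or attached with your deployment tool of choice.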

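A simple implementation outline

First upload your Jupyter Notebook to an S3 bucket, then create an EMR Notebook in the EMR console and associate it with that S3 file and the target EMR cluster.
In API Gateway, create a REST API that defines two endpoints, /start (with parameters) and /stop, and bind them to the corresponding Lambda function.
Write the Lambda function to handle requests: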
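- For /start requests, call EMR's start_notebook_execution and pass the API's parameters to the notebook through NotebookParams, which takes a JSON string. As far as I know, EMR injects these values papermill-style by overriding variables defined in a notebook cell tagged parameters, rather than exposing them as environment variables; check the EMR documentation for your release.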
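- For /stop requests, call EMR's stop_notebook_execution with the target execution ID to terminate the run.
Configure the corresponding IAM role permissions so that Lambda can call EMR and EMR can access S3.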
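
If the endpoints use API Gateway's Lambda proxy integration, the function receives the raw HTTP request (path plus a JSON string body) rather than a ready-made action field. The sketch below shows one way to map such an event onto an action/params/execution_id shape. The helper name parse_request and the body field names params and execution_id are assumptions for illustration, and it assumes the body is not base64-encoded.

```python
import json


def parse_request(event):
    """Map an API Gateway proxy event to a dict with an 'action' key.

    Hypothetical helper: assumes POST /start and POST /stop with JSON bodies.
    """
    path = (event.get("path") or "").rstrip("/")
    action = path.rsplit("/", 1)[-1]  # "/start" -> "start"

    # In proxy events the body is a string, or None when no body was sent
    raw_body = event.get("body") or "{}"
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return {"action": "invalid", "error": "body is not valid JSON"}
    if not isinstance(body, dict):
        return {"action": "invalid", "error": "body must be a JSON object"}

    if action == "start":
        return {"action": "start", "params": body.get("params", {})}
    if action == "stop":
        return {"action": "stop", "execution_id": body.get("execution_id")}
    return {"action": "invalid", "error": "unknown path: " + path}


if __name__ == "__main__":
    sample = {"path": "/start", "body": '{"params": {"param1": "value1"}}'}
    print(parse_request(sample))
```

Keeping the parsing separate from the EMR calls makes it easy to test the routing locally without touching AWS.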

Sample code (calling the EMR API from Lambda)

import json

import boto3

# Initialize the EMR client
emr = boto3.client('emr', region_name='us-west-1')

def lambda_handler(event, context):
    action = event.get('action')

    # Example: handle a start request (parameters are taken from the event)
    if action == 'start':
        # Parameters from the API request to pass to the notebook
        notebook_params = event.get('params', {})
        start_resp = emr.start_notebook_execution(
            EditorId='e-40AC8ZO6EGGCPJ4DLO48KGGGI',  # replace with your EMR Notebook ID
            RelativePath='boto3_demo.ipynb',         # notebook path relative to its S3 location
            ExecutionEngine={'Id':'j-1HYZS6JQKV11Q'},# replace with your EMR cluster ID
            ServiceRole='EMR_Notebooks_DefaultRole', # replace with your EMR service role
            # NotebookParams must be a JSON string; str() on a dict gives a Python repr, not JSON
            NotebookParams=json.dumps(notebook_params)
        )
        execution_id = start_resp["NotebookExecutionId"]
        return {"status": "success", "execution_id": execution_id}

    # Example: handle a stop request
    elif action == 'stop':
        execution_id = event.get('execution_id')
        if not execution_id:
            return {"status": "error", "message": "Missing execution_id"}
        emr.stop_notebook_execution(NotebookExecutionId=execution_id)
        describe_resp = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
        # The status is nested under the 'NotebookExecution' key of the response
        execution_status = describe_resp['NotebookExecution']['Status']
        return {"status": "success", "execution_status": execution_status}

    # Any other request
    else:
        return {"status": "error", "message": "Invalid action"}

If you want to test the EMR notebook execution API directly, you can also use the standalone script below:

import time

import boto3

emr = boto3.client('emr', region_name='us-west-1')
start_resp = emr.start_notebook_execution(
    EditorId='e-40AC8ZO6EGGCPJ4DLO48KGGGI',
    RelativePath='boto3_demo.ipynb',
    ExecutionEngine={'Id':'j-1HYZS6JQKV11Q'},
    ServiceRole='EMR_Notebooks_DefaultRole',
    NotebookParams='{"param1": "value1", "param2": "value2"}'  # parameters passed to the notebook, as a JSON string
)
execution_id = start_resp["NotebookExecutionId"]
print(execution_id)
print("\n")
describe_response = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
print(describe_response)
print("\n")
list_response = emr.list_notebook_executions()
print("Existing notebook executions:\n")
for execution in list_response['NotebookExecutions']:
    print(execution)
print("\n")
print("Sleeping for 5 sec...")
time.sleep(5)
print("Stop execution " + execution_id)
emr.stop_notebook_execution(NotebookExecutionId=execution_id)
describe_response = emr.describe_notebook_execution(NotebookExecutionId=execution_id)
print(describe_response)
print("\n")
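
The standalone script sleeps for a fixed 5 seconds before stopping the run. If you would rather wait for a run to finish, a polling loop along these lines is one option. This is only a sketch: wait_for_execution is a hypothetical helper, the terminal status names are taken from the EMR API as I understand it, and the response is assumed to nest the status under the NotebookExecution key.

```python
import time

# Statuses after which the execution will not change any further (assumed set)
TERMINAL_STATUSES = {"FINISHED", "FAILED", "STOPPED"}


def wait_for_execution(emr_client, execution_id, poll_seconds=10,
                       timeout_seconds=600, sleep=time.sleep):
    """Poll describe_notebook_execution until the run reaches a terminal status.

    emr_client is expected to be a boto3 EMR client, e.g. boto3.client('emr').
    Returns the final status string, or raises TimeoutError.
    """
    waited = 0
    while True:
        resp = emr_client.describe_notebook_execution(
            NotebookExecutionId=execution_id
        )
        status = resp["NotebookExecution"]["Status"]
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                "execution " + execution_id + " still " + status
                + " after " + str(waited) + " seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

A call such as wait_for_execution(emr, execution_id) fits the standalone script better than the Lambda handler, since a long poll can exceed the Lambda timeout.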

The question comes from Stack Exchange; it was asked by user22.