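Cross-account AssumeRole access to the Glue Catalog: Spark cannot query non-Iceberg tables
Solution: use separate catalog configurations for Iceberg tables and standard Glue tables
The root cause of your problem is that Iceberg's SparkCatalog implementation only handles Iceberg-format tables. It can list the metadata of non-Iceberg tables from the Glue Catalog, but it cannot resolve their storage format and schema, so queries against them fail with a TABLE_OR_VIEW_NOT_FOUND error. To support cross-account assume-role access for both kinds of table, configure two independent catalogs, one for Iceberg tables and one for standard Glue tables.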
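Spark session configuration
Keep the existing Iceberg catalog configuration and add a second catalog for standard Glue tables, specifying the role to assume through Hive metastore parameters: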
config_options = {
    # Catalog dedicated to Iceberg tables (existing settings, unchanged)
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.catalog.idt_iceberg": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.idt_iceberg.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
    "spark.sql.catalog.idt_iceberg.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    "spark.sql.catalog.idt_iceberg.client.factory": "org.apache.iceberg.aws.AssumeRoleAwsClientFactory",
    "spark.sql.catalog.idt_iceberg.client.assume-role.arn": ROLE_B_ARN,
    # Catalog dedicated to standard Glue tables (uses assume role)
    "spark.sql.catalog.idt_glue": "com.amazonaws.glue.catalog.metastore.AWSGlueCatalog",
    "spark.hadoop.hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueMetastoreClientFactory",
    "spark.hadoop.hive.metastore.glue.assume.role.arn": ROLE_B_ARN,
}
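For illustration, here is a minimal sketch of how a dictionary like config_options might be applied when the session is created. The helper names apply_config and build_session are hypothetical and not part of the original answer. The sketch assumes pyspark is installed wherever build_session is called; the import is deferred so that apply_config itself has no Spark dependency.

```python
from typing import Any, Mapping


def apply_config(builder: Any, options: Mapping[str, str]) -> Any:
    """Apply every key/value pair to a SparkSession-style builder."""
    for key, value in options.items():
        # SparkSession.Builder.config(key, value) returns the builder,
        # so the calls can be chained in a loop.
        builder = builder.config(key, value)
    return builder


def build_session(options: Mapping[str, str], app_name: str = "cross-account-glue"):
    """Create a SparkSession with the given options (requires pyspark)."""
    from pyspark.sql import SparkSession  # deferred import, assumption: pyspark is available

    builder = SparkSession.builder.appName(app_name)
    return apply_config(builder, options).getOrCreate()
```

With this in place, the session would be obtained with something like spark = build_session(config_options). Catalog settings of this kind generally need to be in place before the session is first created.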
Usage
- To query an Iceberg table, qualify it with the Iceberg catalog:
  SELECT * FROM idt_iceberg.database.iceberg_table
- To query a standard Glue table, qualify it with the standard catalog:
  SELECT * FROM idt_glue.database.standard_table
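When the same code has to handle both kinds of table, the catalog prefix can be chosen programmatically. The sketch below is one possible approach and is not part of the original answer. It assumes the table's Glue parameters are already available as a dict (for example from a Glue GetTable response) and relies on Iceberg's Glue integration marking its tables with a table_type parameter set to ICEBERG. The function names are hypothetical.

```python
from typing import Mapping, Optional

# Catalog names taken from the two-catalog setup described in this answer.
ICEBERG_CATALOG = "idt_iceberg"
GLUE_CATALOG = "idt_glue"


def catalog_for(parameters: Optional[Mapping[str, str]]) -> str:
    """Pick the catalog based on the Glue table parameters."""
    table_type = (parameters or {}).get("table_type", "")
    # Compare case-insensitively; anything that is not Iceberg goes to the Glue catalog.
    return ICEBERG_CATALOG if table_type.upper() == "ICEBERG" else GLUE_CATALOG


def qualified_name(database: str, table: str,
                   parameters: Optional[Mapping[str, str]] = None) -> str:
    """Build a catalog.database.table identifier for use in Spark SQL."""
    return f"{catalog_for(parameters)}.{database}.{table}"


if __name__ == "__main__":
    print(qualified_name("database", "iceberg_table", {"table_type": "ICEBERG"}))
    print(qualified_name("database", "standard_table", {"classification": "parquet"}))
```

The resulting name can then be interpolated into a query string passed to spark.sql.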
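Key notes
- Standard Glue tables rely on AWS's native AWSGlueCatalog implementation, which is built on Hive metastore logic. The RoleB ARN to switch to is therefore set through spark.hadoop.hive.metastore.glue.assume.role.arn, which enables cross-account access to both metadata and data.
- Make sure RoleB has:
  - Metadata permissions on the Glue Catalog, such as glue:GetTable and glue:GetDatabase
  - s3:GetObject and s3:ListBucket permissions on the S3 paths that back the standard tables
- Glue Spark jobs already include the dependencies that AWSGlueCatalog needs by default, so nothing extra has to be added.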
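The RoleB permissions named in the notes could be expressed as an IAM policy document along the following lines. This is a sketch only: the account ID, region, database, and bucket values are placeholders, the function name role_b_policy is hypothetical, and a real setup may need additional actions (for example glue:GetPartitions for partitioned tables).

```python
import json


def role_b_policy(account_id: str, region: str, database: str, bucket: str) -> dict:
    """Build a read-only policy covering Glue metadata and the backing S3 data."""
    glue_prefix = f"arn:aws:glue:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Metadata lookups against the catalog, the database, and its tables
                "Sid": "GlueMetadataRead",
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable"],
                "Resource": [
                    f"{glue_prefix}:catalog",
                    f"{glue_prefix}:database/{database}",
                    f"{glue_prefix}:table/{database}/*",
                ],
            },
            {
                # s3:ListBucket applies to the bucket itself
                "Sid": "ListDataBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # s3:GetObject applies to the objects inside the bucket
                "Sid": "ReadDataObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only
    policy = role_b_policy("111122223333", "us-east-1", "database", "example-bucket")
    print(json.dumps(policy, indent=2))
```

Splitting the S3 statement in two reflects that s3:ListBucket is granted on the bucket ARN while s3:GetObject is granted on object ARNs.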
The question originally comes from Stack Exchange and was asked by Haha.




