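Help: cannot get the EKS cluster CA certificate and JWT token after switching roles across AWS accounts
Unauthorized error when working with EKS clusters after switching IAM roles across AWS accounts
I wrote a Bash script that assumes an IAM role in each of several AWS accounts, lists the EKS clusters, updates kubeconfig, and fetches a JWT token and the CA certificate. When it runs, it fails with an Unauthorized error and cannot finish. The script and the error output follow: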
    #!/bin/bash

    # Set AWS region
    AWS_REGION="us-east-1"
    INPUT_FILE="inputfile"

    # Function to assume role and get temporary credentials
    assume_role() {
        local account_id=$1
        local account_name=$2
        local role_arn="arn:aws:iam::${account_id}:role/support-admin"
        local session_name="${account_id}"

        echo "Assuming role for account: ${account_name} (${account_id})"

        # Assume role and get temporary credentials
        credentials=$(aws sts assume-role --role-arn "${role_arn}" --role-session-name "${session_name}")
        if [ $? -ne 0 ]; then
            echo "Cannot assume role for account: ${account_id} (${account_name})"
            echo "--------------------------------------------------------------------"
            return
        fi

        export AWS_ACCESS_KEY_ID=$(echo "${credentials}" | jq -r '.Credentials.AccessKeyId')
        export AWS_SECRET_ACCESS_KEY=$(echo "${credentials}" | jq -r '.Credentials.SecretAccessKey')
        export AWS_SESSION_TOKEN=$(echo "${credentials}" | jq -r '.Credentials.SessionToken')

        list_eks_clusters "${account_name}"
        echo "--------------------------------------------------------------------"
    }

    # Function to list EKS clusters
    list_eks_clusters() {
        local account_name=$1

        echo "Listing EKS clusters for account: ${account_name}"
        clusters=$(aws eks list-clusters --region "${AWS_REGION}" | jq -r '.clusters[]')
        if [ $? -ne 0 ]; then
            echo "Error listing EKS clusters for account: ${account_name}"
            return
        fi

        for cluster in ${clusters}; do
            echo "Cluster: ${cluster}"
            # describe_cluster "${cluster}"
        done
    }

    # Function to describe EKS cluster, update kubeconfig, and get CA certificate
    describe_cluster() {
        local cluster_name=$1

        echo "Describing cluster: ${cluster_name}"
        cluster_info=$(aws eks describe-cluster --name "${cluster_name}" --region "${AWS_REGION}")
        if [ $? -ne 0 ]; then
            echo "Error describing cluster: ${cluster_name}"
            return
        fi

        cluster_endpoint=$(echo "${cluster_info}" | jq -r '.cluster.endpoint')
        echo "Cluster ${cluster_name} endpoint: ${cluster_endpoint}"

        # Update kubeconfig for the cluster and write to custom location
        CUSTOM_KUBECONFIG="kubeconfig"
        aws eks --region "${AWS_REGION}" update-kubeconfig --name "${cluster_name}" --kubeconfig "${CUSTOM_KUBECONFIG}"

        # Get the authentication token for the cluster
        export KUBECONFIG=kubeconfig
        kubectl config view --kubeconfig="${KUBECONFIG}"
        token=$(aws eks get-token --cluster-name "${cluster_name}" --region "${AWS_REGION}" | jq -r '.status.token')
        echo "Token: ${token}"
        kubectl config set-credentials arn:aws:eks:${AWS_REGION}:${account_id}:cluster/${cluster_name} --token="${token}"

        # Ensure the kubeconfig context is set correctly
        kubectl config use-context arn:aws:eks:${AWS_REGION}:${account_id}:cluster/${cluster_name}

        # Get all namespaces from the cluster using the token - works this here
        kubectl --kubeconfig="${KUBECONFIG}" get namespaces

        # Get the CA certificate from the vault-token secret in the kube-system namespace - dont get exact ca cert of cluster selected
        KUBE_CA_CERT=$(kubectl get secret vault-token -n kube-system -o json --token="${token}" | jq -r '.data | ."ca.crt"' | base64 --decode)
        echo "KUBE_CA_CERT: ${KUBE_CA_CERT}"
        echo "--------------------------------------------------------------------"
    }

    # Read accounts from input file and assume role for each account
    if [ ! -f "${INPUT_FILE}" ]; then
        echo "Input file not found: ${INPUT_FILE}"
        exit 1
    fi

    while IFS=, read -r account_id account_name; do
        assume_role "${account_id}" "${account_name}"
    done < "${INPUT_FILE}"
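Error output
Switched to context "arn:aws:eks:us-east-1:***:cluster/eks7-".
E0213 16:15:03.105457 57722 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server has asked for the client to provide credentials"
E0213 16:15:03.188119 57722 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server has asked for the client to provide credentials"
E0213 16:15:03.290776 57722 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: the server has asked for the client to provide credentials"
error: You must be logged in to the server (Unauthorized)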
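Analysis and fix
Root causes
- Implicit dependency on account_id: describe_cluster uses account_id but never receives it as an argument. The value is declared local in assume_role, so describe_cluster only sees it because Bash scopes local variables dynamically (a callee can read its caller's locals) and because the read loop at the bottom sets a global of the same name. That coupling is fragile: if the function is called from any other place, the context and credential names written to kubeconfig carry the wrong account ID or none at all.
- Redundant kubeconfig configuration: aws eks update-kubeconfig already configures kubeconfig to obtain a token through the AWS CLI on demand. Hardcoding a token by hand on top of that entry can conflict with it, and the token is short-lived, so it stops working once it expires.
- Wrong source for the CA certificate: the ca.crt stored in the vault-token secret is not the EKS cluster's CA certificate. The correct cluster CA certificate can be read directly from the output of aws eks describe-cluster, with no need to reach the cluster through kubectl.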
Key changes in the fix
1. Pass account_id to describe_cluster
Change list_eks_clusters to take account_id as a second parameter and forward it to describe_cluster:
    # Function to list EKS clusters
    list_eks_clusters() {
        local account_name=$1
        local account_id=$2

        echo "Listing EKS clusters for account: ${account_name}"
        clusters=$(aws eks list-clusters --region "${AWS_REGION}" | jq -r '.clusters[]')
        if [ $? -ne 0 ]; then
            echo "Error listing EKS clusters for account: ${account_name}"
            return
        fi

        for cluster in ${clusters}; do
            echo "Cluster: ${cluster}"
            describe_cluster "${cluster}" "${account_id}"
        done
    }
Also update the call to list_eks_clusters in assume_role:
    list_eks_clusters "${account_name}" "${account_id}"
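To see why passing the value explicitly is the more robust choice, the following minimal sketch contrasts the two styles. The function names caller, callee_implicit and callee_explicit are hypothetical and exist only for this illustration; they are not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper names, used only to illustrate variable visibility.

callee_implicit() {
    # Takes no parameter: reads whatever account_id is visible at call time.
    echo "implicit: ${account_id:-<unset>}"
}

callee_explicit() {
    # Receives the value as an argument, so it works from any call site.
    local account_id=$1
    echo "explicit: ${account_id}"
}

caller() {
    local account_id=$1
    callee_implicit                    # sees the caller's local (dynamic scoping)
    callee_explicit "${account_id}"
}
```

Called as `caller 111122223333`, both helpers print the ID. Called on its own with no global account_id set, callee_implicit prints `<unset>`, which is the situation the explicit parameter guards against.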
2. Simplify the kubeconfig setup and drop the manual token
Delete the code that fetches and sets a token by hand, and keep the configuration that aws eks update-kubeconfig writes automatically:
    # Remove the following block
    # token=$(aws eks get-token --cluster-name "${cluster_name}" --region "${AWS_REGION}" | jq -r '.status.token')
    # echo "Token: ${token}"
    # kubectl config set-credentials arn:aws:eks:${AWS_REGION}:${account_id}:cluster/${cluster_name} --token="${token}"
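Because the kubeconfig entry written by aws eks update-kubeconfig calls aws eks get-token with whatever AWS credentials are in the environment at request time, it can help to confirm which account those credentials belong to before running kubectl. The sketch below is one possible check, not part of the original answer; verify_identity is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the active AWS credentials
# belong to the expected account.
verify_identity() {
    local expected_account_id=$1
    local actual_account_id

    # get-caller-identity reports the account of the credentials in use.
    actual_account_id=$(aws sts get-caller-identity --query 'Account' --output text) || return 1

    if [ "${actual_account_id}" != "${expected_account_id}" ]; then
        echo "Expected account ${expected_account_id}, got ${actual_account_id}" >&2
        return 1
    fi
}
```

A call such as `verify_identity "${account_id}" || return` at the top of describe_cluster would stop early when the assumed-role credentials are not the ones in effect. Even with the right identity, the cluster still has to grant that IAM principal access, so a passing check narrows the problem down rather than ruling out Unauthorized entirely.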
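3. Read the CA certificate directly from the describe-cluster output
Replace the logic that reads the CA from a secret and extract the cluster's own CA certificate instead:
    # Get the cluster CA certificate
    KUBE_CA_CERT=$(echo "${cluster_info}" | jq -r '.cluster.certificateAuthority.data' | base64 --decode)
    echo "KUBE_CA_CERT: ${KUBE_CA_CERT}"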
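The same extraction can also be done without jq by letting the AWS CLI select the field with --query. The following is a sketch under that assumption; get_cluster_ca is a hypothetical helper name, and the check for "None" reflects how the CLI prints a missing value in text output.

```shell
#!/bin/bash
# Hypothetical helper: prints the PEM-encoded CA certificate of an EKS cluster.
get_cluster_ca() {
    local cluster_name=$1
    local region=$2
    local ca_b64

    # certificateAuthority.data is the base64-encoded CA certificate.
    ca_b64=$(aws eks describe-cluster \
        --name "${cluster_name}" \
        --region "${region}" \
        --query 'cluster.certificateAuthority.data' \
        --output text) || return 1

    # Text output renders a missing field as the literal string "None".
    if [ -z "${ca_b64}" ] || [ "${ca_b64}" = "None" ]; then
        echo "No CA data returned for cluster: ${cluster_name}" >&2
        return 1
    fi

    printf '%s' "${ca_b64}" | base64 --decode
}
```

For example, `get_cluster_ca "${cluster_name}" "${AWS_REGION}" > "ca_${cluster_name}.crt"` would write the certificate to a file; the file name here is only an illustration.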
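4. Make sure kubectl uses the intended kubeconfig
Pass the kubeconfig path explicitly when running kubectl, so that a KUBECONFIG value already set in the environment cannot interfere:
    kubectl --kubeconfig="${CUSTOM_KUBECONFIG}" get namespaces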
Complete fixed script
    #!/bin/bash

    # Set AWS region
    AWS_REGION="us-east-1"
    INPUT_FILE="inputfile"

    # Function to assume role and get temporary credentials
    assume_role() {
        local account_id=$1
        local account_name=$2
        local role_arn="arn:aws:iam::${account_id}:role/support-admin"
        local session_name="${account_id}"

        echo "Assuming role for account: ${account_name} (${account_id})"

        # Assume role and get temporary credentials
        credentials=$(aws sts assume-role --role-arn "${role_arn}" --role-session-name "${session_name}")
        if [ $? -ne 0 ]; then
            echo "Cannot assume role for account: ${account_id} (${account_name})"
            echo "--------------------------------------------------------------------"
            return
        fi

        export AWS_ACCESS_KEY_ID=$(echo "${credentials}" | jq -r '.Credentials.AccessKeyId')
        export AWS_SECRET_ACCESS_KEY=$(echo "${credentials}" | jq -r '.Credentials.SecretAccessKey')
        export AWS_SESSION_TOKEN=$(echo "${credentials}" | jq -r '.Credentials.SessionToken')

        list_eks_clusters "${account_name}" "${account_id}"
        echo "--------------------------------------------------------------------"
    }

    # Function to list EKS clusters
    list_eks_clusters() {
        local account_name=$1
        local account_id=$2

        echo "Listing EKS clusters for account: ${account_name}"
        clusters=$(aws eks list-clusters --region "${AWS_REGION}" | jq -r '.clusters[]')
        if [ $? -ne 0 ]; then
            echo "Error listing EKS clusters for account: ${account_name}"
            return
        fi

        for cluster in ${clusters}; do
            echo "Cluster: ${cluster}"
            describe_cluster "${cluster}" "${account_id}"
        done
    }

    # Function to describe EKS cluster, update kubeconfig, and get CA certificate
    describe_cluster() {
        local cluster_name=$1
        local account_id=$2

        echo "Describing cluster: ${cluster_name}"
        cluster_info=$(aws eks describe-cluster --name "${cluster_name}" --region "${AWS_REGION}")
        if [ $? -ne 0 ]; then
            echo "Error describing cluster: ${cluster_name}"
            return
        fi

        cluster_endpoint=$(echo "${cluster_info}" | jq -r '.cluster.endpoint')
        echo "Cluster ${cluster_name} endpoint: ${cluster_endpoint}"

        # Update kubeconfig for the cluster and write to custom location
        CUSTOM_KUBECONFIG="kubeconfig_${cluster_name}"
        aws eks --region "${AWS_REGION}" update-kubeconfig --name "${cluster_name}" --kubeconfig "${CUSTOM_KUBECONFIG}"

        export KUBECONFIG="${CUSTOM_KUBECONFIG}"
        kubectl config view --kubeconfig="${KUBECONFIG}"

        # Ensure the kubeconfig context is set correctly
        kubectl config use-context arn:aws:eks:${AWS_REGION}:${account_id}:cluster/${cluster_name}

        # Get all namespaces from the cluster
        kubectl --kubeconfig="${KUBECONFIG}" get namespaces

        # Get the cluster CA certificate directly from describe-cluster result
        KUBE_CA_CERT=$(echo "${cluster_info}" | jq -r '.cluster.certificateAuthority.data' | base64 --decode)
        echo "KUBE_CA_CERT: ${KUBE_CA_CERT}"
        echo "--------------------------------------------------------------------"
    }

    # Read accounts from input file and assume role for each account
    if [ ! -f "${INPUT_FILE}" ]; then
        echo "Input file not found: ${INPUT_FILE}"
        exit 1
    fi

    while IFS=, read -r account_id account_name; do
        assume_role "${account_id}" "${account_name}"
    done < "${INPUT_FILE}"
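One point the script leaves open: the exported AWS_* variables stay set after assume_role returns, so the aws sts assume-role call for the next account in the input file runs with the previous account's temporary credentials. That may fail unless the previous role is itself trusted by the next one. A possible way around this, sketched below, is to run each account's work in a subshell so the credentials disappear when it exits. The name with_assumed_role and its argument order are assumptions made for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: assume a role, then run a command with the temporary
# credentials inside a subshell so they never leak into the calling shell.
with_assumed_role() {
    local role_arn=$1
    local session_name=$2
    shift 2

    (
        # One call returns all three values on a single tab-separated line.
        creds=$(aws sts assume-role \
            --role-arn "${role_arn}" \
            --role-session-name "${session_name}" \
            --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
            --output text) || exit 1

        key_id=$(printf '%s\n' "${creds}" | cut -f1)
        secret_key=$(printf '%s\n' "${creds}" | cut -f2)
        session_token=$(printf '%s\n' "${creds}" | cut -f3)
        [ -n "${session_token}" ] || exit 1

        export AWS_ACCESS_KEY_ID="${key_id}"
        export AWS_SECRET_ACCESS_KEY="${secret_key}"
        export AWS_SESSION_TOKEN="${session_token}"

        # Run the caller's command with the assumed-role credentials.
        "$@"
    )
}
```

In the read loop this could be used as `with_assumed_role "arn:aws:iam::${account_id}:role/support-admin" "${account_id}" list_eks_clusters "${account_name}" "${account_id}"`, since shell functions defined in the script remain available inside the subshell.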
The question comes from Stack Exchange; it was asked by Raghu.




