
Help request: timeout when running the consumer group list command on Kafka 3.2.0

Kafka 3.2.0 + AWS MSK: Consumer Group List Command Timeout Issue Troubleshooting

Let's break down why you're hitting this timeout and walk through actionable fixes. Your core config looks right for MSK IAM auth, but there are a few easy-to-miss details and checks to run:

1. First, Verify Basic Network Connectivity

Timeouts often come down to blocked traffic before authentication even starts. Confirm that your client can reach every MSK broker on port 9098:

  • Run nc -zv b-2.amazonaws.com 9098 (or telnet b-2.amazonaws.com 9098 if netcat isn't installed) for each bootstrap server in your list. If any fail, you've found the issue:
    • Check your EC2 security group (if running on EC2) allows outbound traffic to the MSK cluster's security group on port 9098.
    • Confirm the MSK cluster's security group allows inbound traffic from your client's IP/security group on 9098.
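
The per-broker check above can be scripted so that every entry of the bootstrap list is probed in one go. This is a minimal sketch, assuming nc is installed; the function names split_bootstrap and check_brokers are made up for illustration, and the 5-second limit is an arbitrary choice.

```shell
# Split a comma-separated bootstrap list into one "host port" pair per line.
split_bootstrap() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r entry; do
    # Everything before the last colon is the host, everything after it the port.
    printf '%s %s\n' "${entry%:*}" "${entry##*:}"
  done
}

# Probe each broker and report whether a TCP connection could be opened.
# -z only tests the connection, -w 5 caps each attempt at five seconds.
check_brokers() {
  split_bootstrap "$1" | while read -r host port; do
    # stdin is redirected so nc cannot swallow the remaining list entries.
    if nc -z -w 5 "$host" "$port" </dev/null >/dev/null 2>&1; then
      echo "OK   $host:$port"
    else
      echo "FAIL $host:$port"
    fi
  done
}
```

Call it with the same list you pass to --bootstrap-server, for example check_brokers "b-2.amazonaws.com:9098,b-3.amazonaws.com:9098,b-1.amazonaws.com:9098". Any FAIL line points at a security group, route, or DNS problem and not at authentication.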

2. Fix IAM Auth Config & Permissions

Your IAM auth setup is almost there, but a tiny oversight could be causing silent auth retries that lead to timeouts:

  • Add required; to your JAAS config: Your current sasl.jaas.config line is missing the mandatory suffix. Update it to:
    sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
    
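    This is a common gotcha. Without the control flag and the trailing semicolon, Kafka cannot parse the JAAS entry, so the login module is never initialized. A malformed entry normally fails right away with a configuration error instead of a timeout, so if your line is already well-formed, keep going through the checks below.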
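  • Validate IAM permissions: The entity running the command (EC2 instance role, local IAM user) needs data-plane permissions from MSK IAM access control, which use the kafka-cluster: action prefix. The managed AmazonMSKReadOnlyAccess policy only covers the kafka: control-plane API (describing and listing clusters), so it does not authorize Kafka client requests. To connect and list consumer groups, attach a custom policy like this:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kafka-cluster:Connect", "kafka-cluster:DescribeCluster"],
                "Resource": "arn:aws:kafka:YOUR_REGION:YOUR_ACCOUNT_ID:cluster/YOUR_CLUSTER_NAME/*"
            },
            {
                "Effect": "Allow",
                "Action": ["kafka-cluster:DescribeGroup"],
                "Resource": "arn:aws:kafka:YOUR_REGION:YOUR_ACCOUNT_ID:group/YOUR_CLUSTER_NAME/*"
            }
        ]
    }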
    
  • Double-check MSK cluster auth settings: Ensure your MSK cluster has IAM authentication enabled, and that port 9098 is mapped to the SASL_SSL security protocol with AWS_MSK_IAM as the SASL mechanism (verify this in the AWS MSK console under "Client authentication").
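
The identity and listener checks above can also be done from the command line. The sketch below assumes the AWS CLI is installed and configured on the client host; the function names are hypothetical, and the cluster ARN is a placeholder you have to supply. Note that the Kafka client resolves credentials through the Java SDK's default provider chain, which may pick a different principal than the CLI if you rely on named profiles.

```shell
# Print the ARN of the IAM principal that the AWS CLI resolves on this host.
whoami_aws() {
  aws sts get-caller-identity --query Arn --output text
}

# Print the IAM-auth (port 9098) bootstrap string MSK reports for a cluster ARN.
# The caller needs the control-plane permission kafka:GetBootstrapBrokers.
iam_bootstrap() {
  aws kafka get-bootstrap-brokers --cluster-arn "$1" \
    --query BootstrapBrokerStringSaslIam --output text
}

# Sort a comma-separated broker list so two lists can be compared
# regardless of the order in which the brokers are written.
normalize_brokers() {
  printf '%s\n' "$1" | tr ',' '\n' | sort | paste -sd, -
}
```

If normalize_brokers "$(iam_bootstrap "$CLUSTER_ARN")" differs from the normalized list you pass to --bootstrap-server, or iam_bootstrap prints None, the cluster is probably not exposing an IAM listener on the addresses you are using.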

3. Add Explicit SSL Truststore Config

Even with IAM auth, the SASL_SSL connection needs valid root CA certificates to verify the MSK cluster's SSL certificate. Add these lines to your client.properties:

# Amazon Linux path; on other systems use $JAVA_HOME/lib/security/cacerts (JDK 9+) or $JAVA_HOME/jre/lib/security/cacerts (JDK 8)
ssl.truststore.location=/etc/pki/java/cacerts
ssl.truststore.password=changeit

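The default Java truststore password is changeit; if you've changed it, use your own password instead. Keep comments on their own lines in client.properties, because anything after the value (including a trailing # note) becomes part of the value. When ssl.truststore.location is unset, the client falls back to the JVM's default truststore, so this setting matters mainly when that default store is missing, customized, or lacks the Amazon root CAs. In that case the SSL handshake fails, and after retries the failure can surface as a timeout.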
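
Before pointing the client at a truststore, it helps to confirm that the file exists and opens with the password you plan to use. The following is a sketch only: the helper names are invented for illustration, the candidate paths are the Amazon Linux location mentioned above plus the usual JDK 9+ and JDK 8 layouts, and keytool is assumed to be on the PATH.

```shell
# Print the first argument that exists as a regular file; fail if none does.
first_existing() {
  for candidate in "$@"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Look for a cacerts file in the Amazon Linux location, then under JAVA_HOME.
find_cacerts() {
  first_existing /etc/pki/java/cacerts \
    "${JAVA_HOME:-}/lib/security/cacerts" \
    "${JAVA_HOME:-}/jre/lib/security/cacerts"
}

# Check that the store can be opened with the given password (default: changeit).
check_truststore() {
  keytool -list -keystore "$1" -storepass "${2:-changeit}" >/dev/null &&
    echo "truststore opens: $1"
}
```

A typical run is check_truststore "$(find_cacerts)". If keytool rejects the password or the file is missing, fix that first, since the Kafka client would hit the same problem during the handshake.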

4. Extend Client Timeout Settings

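The AdminClient in Kafka 3.2.0 already defaults request.timeout.ms to 30000 and default.api.timeout.ms to 60000, but kafka-consumer-groups.sh applies its own --timeout option, which defaults to 5000 ms. Connecting and authenticating against several brokers can exceed that on a slow link. Pass a larger value on the command line first (for example --timeout 30000). If requests still expire, raise the client limits in client.properties:

request.timeout.ms=60000
default.api.timeout.ms=120000

Don't lower metadata.max.age.ms for this purpose: it controls how often metadata is refreshed, not how long a request may take.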

5. Enable Debug Logs for Deep Diving

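If you're still stuck, enable debug logging to see exactly where the command is failing. The command-line tools in Kafka 3.2.0 log through log4j and read config/tools-log4j.properties by default, so an slf4j-simple system property has no effect. Copy that file, raise the root logger from WARN to DEBUG in the copy, and point KAFKA_LOG4J_OPTS at it (the /tmp path is only an example):

cp config/tools-log4j.properties /tmp/tools-log4j-debug.properties
sed -i 's/^log4j.rootLogger=WARN/log4j.rootLogger=DEBUG/' /tmp/tools-log4j-debug.properties
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/tmp/tools-log4j-debug.properties"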
./bin/kafka-consumer-groups.sh --bootstrap-server b-2.amazonaws.com:9098,b-3.amazonaws.com:9098,b-1.amazonaws.com:9098 --list --command-config bin/client.properties

Look for logs related to SSL handshakes, IAM token retrieval, or metadata fetching—this will pinpoint whether the issue is network, auth, or cluster-side.
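
Debug output is verbose, so a quick tally of well-known exception class names can narrow things down before you read the log line by line. This is a rough sketch: triage_log is a hypothetical helper, the file name groups.log is an assumption, and the four class names are only a starting set, not a complete list of what the client can log.

```shell
# Count the lines of a captured log that mention each failure class.
# SSLHandshakeException       -> TLS or truststore problem
# SaslAuthenticationException -> IAM credentials or permissions
# TimeoutException            -> request expired (network or slow brokers)
# DisconnectException         -> broker closed the connection
triage_log() {
  for pattern in SSLHandshakeException SaslAuthenticationException \
                 TimeoutException DisconnectException; do
    # grep -c prints 0 when nothing matches, which is what we want here.
    printf '%s %s\n' "$pattern" "$(grep -c "$pattern" "$1")"
  done
}
```

To use it, redirect both output streams of the list command into a file (append > groups.log 2>&1 to the command above) and run triage_log groups.log. The class with the highest count is usually the best place to start reading.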

This question originally comes from Stack Exchange; asked by Dushan.
