Help request: DnsService disappears on its own from an Azure Service Fabric test cluster

Troubleshooting a Missing DnsService in an Azure Service Fabric Cluster

Hey there, sorry to hear you hit this issue with your certificate-secured Azure Service Fabric test cluster. Here are the steps I would work through to get DnsService back up and running:

1. Verify Node & System Service Status First

  • Head to the Azure Portal and check if all your cluster nodes are in a healthy "Up" state. A failed or unresponsive node could disrupt system service deployment.
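  • Use PowerShell to connect to your cluster (replace the placeholders with your cluster details; the client connection endpoint is normally on port 19000, and the client certificate is read from the store on the machine you connect from):
    Connect-ServiceFabricCluster -ConnectionEndpoint <your-cluster-endpoint> -X509Credential -ServerCertThumbprint <cert-thumbprint> -FindType FindByThumbprint -FindValue <cert-thumbprint> -StoreLocation CurrentUser -StoreName My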
    
  • Run these commands to inspect system services:
    # Check all node statuses
    Get-ServiceFabricNode
    
    # Check the health of the system application (fabric:/System may not appear in Get-ServiceFabricApplication output)
    Get-ServiceFabricApplicationHealth -ApplicationName fabric:/System
    
    # Check if Dns Service is listed in system services
    Get-ServiceFabricService -ApplicationName fabric:/System
    
    If fabric:/System/DnsService doesn't show up in that list, the service is not currently deployed on the cluster.
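
The listing in the last command can be narrowed to the one service in question. This is only a sketch under stated assumptions: `Find-DnsService` is a hypothetical helper name, and it assumes each service object exposes `ServiceName`, `ServiceStatus`, and `HealthState`, as the objects returned by `Get-ServiceFabricService` normally do.

```powershell
# Hypothetical helper: picks the DNS system service out of a stream of service objects.
function Find-DnsService {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Service
    )
    process {
        # System services are named like fabric:/System/<ServiceName>.
        if ("$($Service.ServiceName)" -like '*/DnsService') {
            $Service | Select-Object ServiceName, ServiceStatus, HealthState
        }
    }
}

# Usage against a connected cluster (run Connect-ServiceFabricCluster first):
# $dns = Get-ServiceFabricService -ApplicationName fabric:/System | Find-DnsService
# if (-not $dns) { Write-Warning 'DnsService is not listed under fabric:/System' }
```

Keeping the filter in a function that only reads properties means it can be tried against saved output as well as a live cluster.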

2. Re-Deploy System Services

System services like DnsService run under the fabric:/System application, which the cluster itself manages. Before trying to force a redeployment, confirm that the service is actually enabled in the cluster configuration:

  • First, export your cluster manifest to confirm DnsService is configured. The cmdlet uses the session opened by Connect-ServiceFabricCluster and takes no connection parameters of its own:
    Get-ServiceFabricClusterManifest > clusterManifest.xml
    
    Open the XML file and look for a DnsService section in the fabric settings, for example <Section Name="DnsService"> containing <Parameter Name="IsEnabled" Value="true" />. If it is present, the service is supposed to be deployed.
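  • If DnsService is enabled in the manifest but still missing, you can try a monitored upgrade of the system application to force redeployment. Treat this as an experiment: fabric:/System is managed by the runtime, so the cluster may reject the request. On Azure-hosted clusters, DnsService is normally enabled through the DnsService entry in the cluster resource's addonFeatures, so re-applying that setting is likely the more reliable route.
    # Get the current system application version (this may return nothing for fabric:/System)
    $systemApp = Get-ServiceFabricApplication -ApplicationName fabric:/System
    if (-not $systemApp) { Write-Warning 'fabric:/System was not returned; skip the upgrade attempt' }
    else { Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/System -ApplicationTypeVersion $systemApp.ApplicationTypeVersion -Monitored }
    
    If the upgrade is accepted, monitor its progress in the Portal or with Get-ServiceFabricApplicationUpgrade -ApplicationName fabric:/System.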
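
The manifest check in this step can also be scripted instead of done by eye. The following is a minimal sketch, not an official tool: `Test-DnsServiceEnabled` is a hypothetical helper, and it assumes the setting appears as a `Section` element named `DnsService` with an `IsEnabled` parameter, which mirrors how DnsService is configured under `fabricSettings`.

```powershell
# Hypothetical helper: reports whether a cluster manifest enables the DNS system service.
function Test-DnsServiceEnabled {
    param(
        [Parameter(Mandatory = $true)]
        [string]$ManifestXml
    )
    $doc = [xml]$ManifestXml

    # local-name() sidesteps the manifest's default XML namespace.
    # Assumption: the setting lives in a Section element named "DnsService".
    $section = $doc.SelectSingleNode("//*[local-name()='Section'][@Name='DnsService']")
    if ($null -eq $section) { return $false }

    $param = $section.SelectSingleNode("*[local-name()='Parameter'][@Name='IsEnabled']")
    return ($null -ne $param -and $param.GetAttribute('Value') -eq 'true')
}

# Usage against a connected cluster:
# Test-DnsServiceEnabled -ManifestXml ((Get-ServiceFabricClusterManifest) -join "`n")
```

A result of `$false` only means the section or parameter was not found in that form; it is worth opening the file to confirm before drawing conclusions.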

3. Dig Into Node Logs for Root Cause

If redeployment doesn't work, check the Service Fabric trace logs on your VM nodes to find why the Dns Service failed to start after the reboot:

  • Windows nodes: Logs are at C:\ProgramData\SF\Log\Traces (on Azure VMs the Service Fabric data root is often D:\SvcFab, so also check D:\SvcFab\Log\Traces)
  • Linux nodes: Logs are at /var/log/servicefabric/traces
  • Search for keywords like DnsService, Failed to start, or Certificate to spot errors. Common causes include certificate permission problems, resource constraints, or corrupted service packages.
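
The keyword search above can be sketched as a small function. This is an illustration under assumptions rather than a definitive diagnostic: `Find-TraceHit` is a hypothetical helper, and it only gives readable results for text-based log files; binary trace files would need dedicated tooling.

```powershell
# Hypothetical helper: lists files under a trace folder that mention any of the keywords.
function Find-TraceHit {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path,
        [string[]]$Keyword = @('DnsService', 'Failed to start', 'Certificate')
    )
    # -SimpleMatch treats the keywords as literal text; -List stops at the first hit per file.
    Get-ChildItem -Path $Path -Recurse -File |
        Select-String -Pattern $Keyword -SimpleMatch -List |
        Select-Object Path, LineNumber, Line
}

# Usage on a Windows node:
# Find-TraceHit -Path 'C:\ProgramData\SF\Log\Traces'
```

Reporting one hit per file keeps the output short enough to decide which files deserve a closer read.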

4. Validate Certificate Configuration

Since your cluster uses certificate security, incorrect certificate setup might be blocking the Dns Service:

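  • On each node, confirm the cluster certificate is installed in the correct store (LocalMachine\My on Windows nodes, or the configured certificate location on Linux) and that the Service Fabric processes can read its private key (Windows: grant NETWORK SERVICE read access to the certificate private key). CurrentUser\My only applies to the client machine you run Connect-ServiceFabricCluster from.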
  • Test your cluster connection again with Test-ServiceFabricClusterConnection to ensure the certificate is valid and authentication works.
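
A quick look at the certificate itself can rule out the simplest failures before you dig into permissions. The sketch below is a hedged example: `Test-ClusterCertificate` is a hypothetical helper that only reads the standard `Thumbprint`, `HasPrivateKey`, `NotBefore`, and `NotAfter` properties, and it does not inspect the private key ACL for NETWORK SERVICE.

```powershell
# Hypothetical helper: summarizes the certificate properties that matter for cluster auth.
function Test-ClusterCertificate {
    param(
        [Parameter(Mandatory = $true)]
        $Certificate,
        [datetime]$Now = (Get-Date)
    )
    [pscustomobject]@{
        Thumbprint    = $Certificate.Thumbprint
        HasPrivateKey = [bool]$Certificate.HasPrivateKey
        Expired       = ($Certificate.NotAfter -lt $Now)
        NotYetValid   = ($Certificate.NotBefore -gt $Now)
    }
}

# Usage on a Windows node (replace the placeholder with your thumbprint):
# Get-ChildItem Cert:\LocalMachine\My |
#     Where-Object { $_.Thumbprint -eq '<cert-thumbprint>' } |
#     ForEach-Object { Test-ClusterCertificate -Certificate $_ }
```

If the usage pipeline returns nothing at all, the certificate is not in that store on the node, which is itself a useful finding.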

5. Use Azure's Built-in Cluster Repair

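If all else fails, check whether the Azure Portal offers a repair option for your cluster's system services. The tab and button names below may differ or be absent depending on the cluster type and portal version: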

  1. Go to your Service Fabric cluster in the Azure Portal
  2. Navigate to the Repair tab
  3. Select Repair system services and follow the prompts to let Azure automatically diagnose and fix missing or corrupted system services

The question comes from Stack Exchange; it was asked by Katrin Muck.
