Help needed: DnsService disappears on its own from an Azure Service Fabric test cluster
Hey there, sorry to hear you hit this frustrating issue with your certificate-secured Azure Service Fabric test cluster. Here are some actionable steps to get the DNS Service back up and running:
1. Verify Node & System Service Status First
- Head to the Azure Portal and check if all your cluster nodes are in a healthy "Up" state. A failed or unresponsive node could disrupt system service deployment.
- Use PowerShell to connect to your cluster (replace placeholders with your cluster details):
    Connect-ServiceFabricCluster -ConnectionEndpoint <your-cluster-endpoint> -X509Credential -ServerCertThumbprint <cert-thumbprint> -FindType FindByThumbprint -FindValue <cert-thumbprint> -StoreLocation CurrentUser -StoreName My
  (On a test cluster, the server thumbprint is usually the same cluster certificate.)
- Run these commands to inspect system services:
    # Check all node statuses
    Get-ServiceFabricNode
    # List all system applications
    Get-ServiceFabricApplication | Where-Object { $_.ApplicationName -eq "fabric:/System" }
    # Check if the DNS Service is listed in system services
    Get-ServiceFabricService -ApplicationName fabric:/System
  If the DNS Service (fabric:/System/DnsService) doesn't show up here, it confirms the service deployment is missing or corrupted.
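The checks in this step can be folded into one small helper. This is a minimal sketch, not part of the Service Fabric module: `Get-ClusterDnsSummary` is a hypothetical name, and it assumes the node objects expose `NodeName` and `NodeStatus` and the service objects expose `ServiceName`, as the output of `Get-ServiceFabricNode` and `Get-ServiceFabricService` does. It takes the objects as parameters so it can be tried without a live cluster.

```powershell
# Hypothetical helper (not part of the Service Fabric module).
function Get-ClusterDnsSummary {
    param(
        [object[]]$Nodes = @(),
        [object[]]$Services = @()
    )

    # Nodes whose NodeStatus is anything other than "Up".
    $notUp = @($Nodes | Where-Object { "$($_.NodeStatus)" -ne 'Up' })

    # The DNS system service is expected under this name.
    $dns = @($Services | Where-Object { "$($_.ServiceName)" -eq 'fabric:/System/DnsService' })

    [pscustomobject]@{
        NodesNotUp        = @($notUp | ForEach-Object { $_.NodeName })
        DnsServicePresent = ($dns.Count -gt 0)
    }
}

# Usage against an already-connected cluster:
# Get-ClusterDnsSummary -Nodes (Get-ServiceFabricNode) `
#     -Services (Get-ServiceFabricService -ApplicationName fabric:/System)
```

An empty `NodesNotUp` together with `DnsServicePresent = False` suggests the nodes are fine and the problem lies with the service deployment itself.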
2. Re-Deploy System Services
System services such as the DNS Service live under fabric:/System and are deployed by the cluster itself according to the cluster configuration. Start by confirming the configuration, then try to force a redeployment:
- First, export your cluster manifest to confirm the DNS Service is configured. Run this in the session opened with Connect-ServiceFabricCluster; the cmdlet takes no endpoint or certificate parameters of its own:
    Get-ServiceFabricClusterManifest > clusterManifest.xml
  Open the XML file and look for a DnsService entry, typically <Section Name="DnsService"> containing <Parameter Name="IsEnabled" Value="true" />. If it exists, the service is supposed to be deployed.
- Trigger a monitored upgrade of the system application to force redeployment (-Monitored requires a -FailureAction):
    # Get the current system application version
    $systemApp = Get-ServiceFabricApplication -ApplicationName fabric:/System
    Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/System -ApplicationTypeVersion $systemApp.ApplicationTypeVersion -Monitored -FailureAction Rollback
  Monitor the upgrade progress in the Portal or with Get-ServiceFabricApplicationUpgrade -ApplicationName fabric:/System. Because fabric:/System is managed by the cluster, this call may be rejected, especially when the target version is unchanged. In that case, re-apply the DNS Service setting in the cluster configuration instead (on Azure, the DnsService entry under addonFeatures in the cluster resource) so that a cluster configuration upgrade redeploys it.
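The manifest check in this step can also be scripted instead of reading the XML by eye. The sketch below assumes the DNS Service shows up in the manifest as a `Section` named `DnsService` with an `IsEnabled` parameter; `Test-DnsServiceEnabled` is a hypothetical helper name, and the layout is an assumption worth verifying against your own manifest.

```powershell
# Hypothetical helper: reports whether a cluster manifest enables the DNS Service.
# Assumes a <Section Name="DnsService"> with <Parameter Name="IsEnabled" Value="true" />.
function Test-DnsServiceEnabled {
    param([Parameter(Mandatory)][string]$ManifestXml)

    $doc = [xml]$ManifestXml

    # local-name() sidesteps the manifest's default XML namespace.
    $xpath = "//*[local-name()='Section' and @Name='DnsService']" +
             "/*[local-name()='Parameter' and @Name='IsEnabled']"
    $isEnabled = $doc.SelectSingleNode($xpath)

    # A missing section and a non-true value both count as "not enabled".
    return ($null -ne $isEnabled) -and ($isEnabled.GetAttribute('Value') -eq 'true')
}

# Usage with the exported file:
# Test-DnsServiceEnabled -ManifestXml (Get-Content .\clusterManifest.xml -Raw)
```

If this returns False while the cluster was created with the DNS Service enabled, the configuration itself has changed and redeploying the service alone is unlikely to help.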
3. Dig Into Node Logs for Root Cause
If redeployment doesn't work, check the Service Fabric trace logs on your VM nodes to find why the Dns Service failed to start after the reboot:
- Windows nodes: Logs are at C:\ProgramData\SF\Log\Traces (on Azure VMs the Service Fabric data root is often D:\SvcFab, so check D:\SvcFab\Log\Traces as well)
- Linux nodes: Logs are at /var/log/servicefabric/traces
- Search for keywords like DnsService, Failed to start, or Certificate to spot errors. Common issues include certificate permission problems, resource constraints, or corrupted service packages.
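The keyword search can be sketched as below. This assumes the trace files are readable as plain text; Service Fabric traces are often stored in binary formats such as .etl, which would need converting first. `Find-TraceKeyword` is a hypothetical helper name.

```powershell
# Hypothetical helper: searches text-readable trace files for failure keywords.
function Find-TraceKeyword {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string[]]$Keywords = @('DnsService', 'Failed to start', 'Certificate')
    )

    # -SimpleMatch treats each keyword as a literal string, not a regex.
    Get-ChildItem -Path $Path -File -Recurse |
        Select-String -Pattern $Keywords -SimpleMatch |
        Select-Object Path, LineNumber, Line
}

# Usage on a node (the path is whichever trace folder applies to your node):
# Find-TraceKeyword -Path 'C:\ProgramData\SF\Log\Traces'
```

Each result carries the file path, line number, and matching line, which makes it easier to correlate hits with the time of the reboot.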
4. Validate Certificate Configuration
Since your cluster uses certificate security, incorrect certificate setup might be blocking the Dns Service:
- On each node, confirm the cluster certificate is installed in the correct store (LocalMachine\My on Windows nodes, or the specified Linux store; CurrentUser\My applies only to the client certificate on the machine you connect from) and has the right permissions (Windows: grant NETWORK SERVICE read access to the certificate private key).
- Test your cluster connection again with Test-ServiceFabricClusterConnection to ensure the certificate is valid and authentication works.
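A quick sanity check of the certificate itself can be sketched as follows. `Get-CertificateProblem` is a hypothetical helper that only looks at the private key flag and the validity window; it does not inspect the private key ACL for NETWORK SERVICE, which still has to be checked separately. The usage line assumes a Windows node with the cluster certificate in LocalMachine\My.

```powershell
# Hypothetical helper: lists obvious problems with a certificate object.
function Get-CertificateProblem {
    param(
        [Parameter(Mandatory)]$Certificate,
        [datetime]$Now = (Get-Date)
    )

    $problems = @()
    if (-not $Certificate.HasPrivateKey) { $problems += 'no private key' }
    if ($Certificate.NotBefore -gt $Now) { $problems += 'not yet valid' }
    if ($Certificate.NotAfter -lt $Now)  { $problems += 'expired' }

    # Emits one string per problem; nothing is emitted for a healthy certificate.
    $problems
}

# Usage on a Windows node (assumes the cluster certificate is in LocalMachine\My):
# Get-ChildItem Cert:\LocalMachine\My |
#     Where-Object Thumbprint -eq '<cert-thumbprint>' |
#     ForEach-Object { Get-CertificateProblem -Certificate $_ }
```

No output for the matching certificate means these basic checks passed; no matching certificate at all means it is not installed in that store.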
5. Use Azure's Built-in Cluster Repair
If all else fails, check whether the Azure Portal offers a repair option for system services on your cluster (it may not be available for every cluster type or portal version):
- Go to your Service Fabric cluster in the Azure Portal
- Navigate to the Repair tab
- Select Repair system services and follow the prompts to let Azure automatically diagnose and fix missing or corrupted system services
The question comes from Stack Exchange; asked by Katrin Muck.




