问题描述
Deployment Azure Service Fabric 时,遇见了VMExtensionProvisioningError, 全文如下:
Deployment Name: 385A084F35B2BC55-xxxx Operation Id: 096CED Operation: Create Correlation Id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx Status: Failed Error: Code: VMExtensionProvisioningError Message: Multiple VM extensions failed to be provisioned on the VM. Please see the VM extension instance view for other failures. The first extension failed due to the error: VM Extension 'RunPowerShell_Frontdoor' is marked as failed since it depends upon the VM Extension 'ServiceFabricNodeVmExt_vmNodeType1Name' which has failed. Help Link: https://aka.ms/ev2/errors/troubleshooting
问题解答
第一步:收集日志
需要远程连接(RDP)到SF的节点中, 在目录 C:\WindowsAzure\Logs\Plugins 中收集VM Extension的安装日志
第二步:分析日志
在 Microsoft.Azure.ServiceFabric.ServiceFabricNode plugin 日志中,发现是由于在获取证书失败而出现的错误。
[09/09/2021 12:32:01.86] Executing: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\ServiceFabricExtensionHandler.exe enable [09/09/2021 12:32:03.62] Execution Complete. ###### Execution Output: 1> 2021-09-09T12:32:01.9973303Z: Information: Starting... 1> 2021-09-09T12:32:01.9973303Z: Information: MachineName: Frontdoor000000 1> 2021-09-09T12:32:01.9973303Z: Information: CurrentDirectory: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11 1> 2021-09-09T12:32:01.9973303Z: Information: UserName: SYSTEM 1> 2021-09-09T12:32:01.9973303Z: Information: Created mutex; this is the only instance of this process running 1> 2021-09-09T12:32:02.3723445Z: Information: HandlerManifest Version: 1 1> 2021-09-09T12:32:02.4348520Z: Information: [HandlerEnvironment] 1> 2021-09-09T12:32:02.4348520Z: Information: LogFolder: C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11 1> 2021-09-09T12:32:02.4348520Z: Information: ConfigFolder: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\RuntimeSettings 1> 2021-09-09T12:32:02.4348520Z: Information: StatusFolder: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\Status 1> 2021-09-09T12:32:02.4348520Z: Information: HeartbeatFile: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\Status\HeartBeat.Json 1> 2021-09-09T12:32:02.4348520Z: Information: DeploymentId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 1> 2021-09-09T12:32:02.4348520Z: Information: RoleName: _Frontdoor_0 1> 2021-09-09T12:32:02.4348520Z: Information: Instance: _Frontdoor_0 1> 2021-09-09T12:32:02.5442327Z: Information: Opening config file: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\1.1.0.11\RuntimeSettings\1.settings 1> 2021-09-09T12:32:02.5598642Z: Information: Config specified client cert FindBySubjectName 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' (store name = My) 1> 2021-09-09T12:32:02.5754780Z: Error: ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting) 1> 2021-09-09T12:32:02.5754780Z: Warning: Heartbeat: Not Ready: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' 1> 2021-09-09T12:32:02.5754780Z: Error: Heartbeat file updated, with error msg 1> 2021-09-09T12:32:02.5911044Z: Error: ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting) at Microsoft.Azure.ServiceFabric.Extension.Core.Common.PublicSettingsExtensions.AllowAccessToCerts(PublicSettings publicSettings, ICertificateUtility certificateUtility) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.ValidateDeployment(HandlerSettings settings) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallService() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallServiceWithRetry() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.Enable() at Microsoft.Azure.ServiceFabric.Extension.Handler.Program.Main(String[] args) in X:\bt\1162492\repo\src\HandlerExe\Program.cs:line 52 Execution Error: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting) at Microsoft.Azure.ServiceFabric.Extension.Core.Common.PublicSettingsExtensions.AllowAccessToCerts(PublicSettings publicSettings, ICertificateUtility certificateUtility) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.ValidateDeployment(HandlerSettings settings) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallService() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallServiceWithRetry() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.Enable() at Microsoft.Azure.ServiceFabric.Extension.Handler.Program.Main(String[] args) in X:\bt\1162492\repo\src\HandlerExe\Program.cs:line 52
最最关键的信息就是 Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with 'FindBySubjectName' 'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn' in store 'My' 。
第三步:继续查找不能找到证书的问题
在获取证书的日志中,发现SF是从KV(Key Vault: 密钥保管库)中获取的证书,而在第5步获取auth challenge时,从18:59分02秒 到 19:04分45秒,耗时超过了5 分钟。
- 2021-09-09 18:59:02: <info> [CertificateManager] Beginning refresh for: https://xxxxx-chinanorth-kv.vault.azure.cn/secrets/cluster
- 2021-09-09 18:59:02: <info> [WindowsCertificateStore] attempting to open store 'LocalMachine\MY'
- 2021-09-09 18:59:02: <debug> [WindowsCertificateStore] opening the 'LocalMachine' store..
- 2021-09-09 18:59:02: <debug> [WindowsCertificateStore] store opened successfully.
- 2021-09-09 18:59:02: <info> [KeyVaultClient] Getting new auth challenge
- 2021-09-09 19:04:45: <error> [CertificateManager] Refreshing 'https://xxxxx-chinanorth-kv.vault.azure.cn/secrets/cluster' failed with KeyVault exception: KeyVaultHttpClient.GetChallenge. - http_exception: what = 'WinHttpSendRequest: 12030: The connection with the server was terminated abnormall' code = 'windows:12030'
- 2021-09-09 19:04:45: <info> [CertificateManager] Beginning refresh for:
由此可以推断,由于网络的延迟导致部署SF时,向Key Vault请求认证,由于请求超时而导致部署失败。
在多次重试重新部署SF,部署成功。