Troubleshooting Steps for Error 503 Service Unavailable

Recently, one of our vSphere 6.5 environments had a "503 Service Unavailable" issue, and I decided to write down the steps we took to resolve it. This environment is a single vCenter Server Appliance (VCSA) instance with an embedded Platform Services Controller (PSC). The steps here are valid for VCSA 6.0 to 6.7, but make sure you read the reference KB articles at the bottom of this post to check whether these troubleshooting steps apply to your environment as well.

The following error was displayed in the vSphere Client when we tried to access vCenter:

503 Service Unavailable (Failed to connect to endpoint: [N7Vmacore4Http20NamedPipeServiceSpecE:0x00005619b5fb13a0] _serverNamespace = / action = Allow _pipeName =/var/run/vmware/vpxd-webserver-pipe)
 
Initially I thought this was a temporary "service down/busy" issue, so I gave it a couple of minutes before troubleshooting, but with no luck.

A 503 error can have various causes, so try the VMware "service unavailable" flow chart for a step-by-step troubleshooting approach.


But before starting that flow chart, I noticed an expired certificate when checking the site's certificate in the web browser. So I started by checking the expiry dates of the certificates in VECS (VMware Endpoint Certificate Store) with the following command:

for store in $(/usr/lib/vmware-vmafd/bin/vecs-cli store list | grep -v TRUSTED_ROOT_CRLS); do echo "[*] Store :" $store; /usr/lib/vmware-vmafd/bin/vecs-cli entry list --store $store --text | grep -ie "Alias" -ie "Not After";done;

This lists the alias and expiry date ("Not After") of every certificate in each store, except TRUSTED_ROOT_CRLS.


In the output, the Machine SSL certificate had expired, which was a likely cause of the 503 error. So I decided to renew it using the VMware Certificate Manager.
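Reading expiry dates by eye across every store is easy to get wrong (this is how the expired vpxd certificate was missed here). The following is a minimal bash sketch, not part of any KB, that reads the output of the loop above and prints only the expired entries. It assumes the filtered output has the "[*] Store :", "Alias :" and "Not After :" line shapes shown in the comments, and that GNU date is available; the function name find_expired_certs and the NOW_EPOCH override are hypothetical.

```shell
# find_expired_certs: read the vecs-cli loop output on stdin and print one
# line per certificate whose "Not After" date is earlier than now.
# ASSUMPTION: the input lines look like
#   [*] Store : MACHINE_SSL_CERT
#   Alias :	__MACHINE_CERT
#               Not After : Jan 10 09:00:00 2021 GMT
# NOW_EPOCH (hypothetical) overrides "now" as seconds since the epoch.
find_expired_certs() {
  local now="${NOW_EPOCH:-$(date +%s)}"
  local store="" alias="" line value expiry
  while IFS= read -r line; do
    # Value = everything after the first colon, minus leading whitespace.
    value=$(printf '%s\n' "$line" | sed 's/^[^:]*:[[:space:]]*//')
    case "$line" in
      *"Store :"*) store="$value" ;;
      *Alias*) alias="$value" ;;
      *"Not After"*)
        # GNU date parses openssl-style dates such as "Jan 10 09:00:00 2021 GMT".
        expiry=$(date -u -d "$value" +%s) || continue
        if [ "$expiry" -lt "$now" ]; then
          echo "EXPIRED: $store/$alias ($value)"
        fi
        ;;
    esac
  done
}
```

To use it on the VCSA, paste the function into the same SSH session and rerun the command above with the trailing semicolon after done replaced by | find_expired_certs.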

Note: the vpxd certificate had also expired, but I didn't notice it at the time.
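To avoid missing a certificate like this, one option is to export a single certificate from VECS and test it with openssl's -checkend option. Below is a minimal bash sketch; the function name cert_expiry_status and its output words are hypothetical, not from any KB.

```shell
# cert_expiry_status FILE [SECONDS]
# Prints "invalid" if FILE is not a readable PEM certificate, "expired" if it
# has already expired, "expiring" if it expires within SECONDS (default 0),
# and "ok" otherwise. openssl x509 -checkend N exits non-zero when the
# certificate will have expired N seconds from now.
cert_expiry_status() {
  local cert="$1" window="${2:-0}"
  if ! openssl x509 -in "$cert" -noout >/dev/null 2>&1; then
    echo "invalid"
    return 1
  fi
  if ! openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
    echo "expired"
  elif ! openssl x509 -in "$cert" -noout -checkend "$window" >/dev/null; then
    echo "expiring"
  else
    echo "ok"
  fi
}
```

For example, assuming the vpxd solution-user certificate lives in a store and alias both named vpxd, export it with /usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store vpxd --alias vpxd --output /tmp/vpxd.crt and then run cert_expiry_status /tmp/vpxd.crt 2592000 to also flag certificates expiring within 30 days.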

Run the following command on the VCSA:

/usr/lib/vmware-vmca/bin/certificate-manager


As mentioned earlier, I didn't notice that the vpxd certificate had expired, so I continued with option 3 (Replace Machine SSL certificate with VMCA Certificate). This prompts for the SSO administrator password and creates a certool.cfg file if your environment doesn't already have one. Even if one exists, you can reconfigure it here by pressing Y and answering the prompts for the certificate fields. Then press Y to continue the operation.

The process failed at 0%.

This is probably caused by an expired VMware Security Token Service (STS) signing certificate. To confirm, I ran the checksts Python script attached to KB 79248 (linked at the bottom of this page).


The output showed expired STS certificates, which is why the certificate renewal failed. So I used the fixsts shell script attached to KB 76719 to resolve the expired STS certificate warnings.

Warning

This script interacts with the VMDIR's database. Take an offline snapshot concurrently for all vCenter Servers and Platform Service Controllers in the SSO domain before running the script. Failing to do so may result in an unrecoverable error and require redeploying vCenter Server.

Notes:

This script should only be run once per SSO domain. In environments containing Horizon View, see Connection Server unable to accept vCenter thumbprint with an error "There was an error identifying the validity of the server" (67701).

To resolve the "Signing certificate is not valid" error:
 

  1. Download the fixsts.sh script attached to KB 76719 and upload it to the /tmp folder on the impacted PSC or vCenter Server with Embedded PSC.
  2. If the SCP client's upload connection to the vCenter is rejected, run this from an SSH session to the vCenter: chsh -s /bin/bash
  3. Connect to the PSC or vCenter Server with an SSH session if you have not already per Step 2.
  4. Navigate to the /tmp directory: cd /tmp
  5. Run chmod +x fixsts.sh to make the file executable.
  6. Run ./fixsts.sh.
  7. Restart services on all vCenters and/or PSCs in your SSO domain using the following commands:
    • service-control --stop --all
    • service-control --start --all
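Steps 4 to 7 can be sketched as a single bash function for one node. This is an illustration only, assuming fixsts.sh is already in /tmp, you are root in a bash shell, and the offline snapshots from the warning above have been taken. The function names and the DRY_RUN switch are hypothetical conveniences, not part of KB 76719, and step 7 still has to be repeated on every vCenter and PSC in the SSO domain.

```shell
# run: print the command instead of executing it when DRY_RUN=1
# (hypothetical switch for reviewing the steps before running them).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# fix_sts_and_restart [DIR]: steps 4-7 on the local node.
# The body runs in a subshell ( ... ) so the cd does not leak to the caller.
fix_sts_and_restart() (
  dir="${1:-/tmp}"
  cd "$dir" || exit 1
  run chmod +x fixsts.sh
  run ./fixsts.sh || exit 1
  # Step 7: stop and start all services on this node.
  run service-control --stop --all
  run service-control --start --all
)
```

Running DRY_RUN=1 fix_sts_and_restart first prints the four commands so they can be checked before anything touches VMDIR.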

While starting the services, I noticed that vpxd failed to start. I then ran Certificate Manager again, this time with option 4 instead of 3, to regenerate a new VMCA root certificate and replace all certificates.
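The choice made here can be summarised as a small, hypothetical bash helper: option 3 only replaces the Machine SSL certificate, so once another store such as vpxd also holds an expired certificate, this post moved to option 4. This reflects only what worked in this environment, not a VMware rule; check KB 2097936 for the other options (for example, replacing only the solution user certificates) before choosing. Expired entries in TRUSTED_ROOTS are assumed to be handled separately and are ignored.

```shell
# suggest_cm_option STORE...
# STORE... are the names of VECS stores that contain expired certificates.
# Hypothetical helper mirroring the decision in this post only:
#   no expired stores                    -> "none"
#   only MACHINE_SSL_CERT expired        -> "3" (Machine SSL with VMCA cert)
#   any other store expired (e.g. vpxd)  -> "4" (new VMCA root, replace all)
# TRUSTED_ROOTS is ignored (assumption: old roots are handled separately).
suggest_cm_option() {
  local store option="none"
  for store in "$@"; do
    case "$store" in
      TRUSTED_ROOTS) ;;  # ignored, see above
      MACHINE_SSL_CERT) [ "$option" = "none" ] && option="3" ;;
      *) option="4" ;;
    esac
  done
  echo "$option"
}
```

For this environment, suggest_cm_option MACHINE_SSL_CERT vpxd prints 4, matching the option that finally worked.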

This resolved the 503 issue for me. However, while researching possible scenarios, I noticed that some people get stuck at 85% during the certificate renewal process due to a known issue in VCSA 6.5 with the Update Manager service. You can resolve it by following KB 2150895.

Keep in mind that if you replace the Machine SSL or VMCA Root certificates, you will need to re-register 2nd-party solutions such as NSX, SRM, and vSphere Replication.

I didn't include many screenshots or details about each step here, so if you are not familiar with any of the steps above, please read the KB articles below to become familiar with the process.

"503 Service Unavailable" error on the vSphere Web Client when logging in or accessing the vCenter Server (67818)

Verify and resolve expired vCenter Server certificates using command line (82332)

How to use vSphere Certificate Manager to Replace SSL Certificates (2097936)

Checking Expiration of STS Certificate on vCenter Servers (79248)

"Signing certificate is not valid" error in VCSA 6.5.x, 6.7.x or vCenter Server 7.0.x (76719)

Replacing vCenter Server certificates fails when VMware Update Manager service is enabled (2150895) 

Using the 'lsdoctor' Tool (80469) 

Fix lookup service registration after SSL Certificate renewal
