Nutanix - Foundation upgrade fails because the Foundation service is running on one of the nodes


Today I tried to upgrade Foundation on one of our Nutanix clusters, and it failed with the error “Foundation Service running on one of the nodes, Test Failed”. I wondered why the Foundation service would be running on the CVMs, so I checked the recent tasks in Prism. Several LCM checks had failed recently, and I suspected they were the cause, since the Foundation service does not run during normal operation.



So I connected to one of the CVMs over SSH and ran the following command.


allssh 'genesis status | grep foundation'



The output showed that a few CVMs were running the Foundation service. In the highlighted output (apologies for the rough image; I didn't have a proper image editing tool when I wrote this article), the process IDs appear inside the brackets for each service, and on the CVMs that are not running the Foundation service the brackets are empty.
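With many CVMs, picking out the non-empty brackets by eye gets tedious. Here is a minimal sketch of a filter that could do it. The function name foundation_running_cvms is hypothetical, and the sketch assumes that allssh prints a banner line such as "================== 10.0.0.11 =================" before each CVM's output and that the service line looks like "foundation: [4021, 4050]" when running and "foundation: []" when stopped. Check both against your own output before relying on it.

```shell
# Hypothetical helper: reads the output of
#   allssh 'genesis status | grep foundation'
# on stdin and prints the IP of every CVM whose foundation PID list is non-empty.
# Assumed input format (verify on your cluster):
#   ================== 10.0.0.11 =================
#     foundation: [4021, 4050]
foundation_running_cvms() {
  awk '
    /^=+ .* =+$/ { ip = $2; next }       # banner line: remember which CVM follows
    /foundation: *\[[^]]/ { print ip }   # something other than "]" after "[" means PIDs exist
  '
}
```

On a CVM it could be used as: allssh 'genesis status | grep foundation' | foundation_running_cvms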


I then connected to each of those CVMs over SSH and issued the following command to stop the Foundation service.

genesis stop foundation



Once you press Enter, the command stops the Foundation service and lists the services currently running on the CVM. If you look closely at the highlighted area, no process ID is shown for Foundation.


Once I had stopped the Foundation service on each CVM, I was able to continue the Foundation upgrade as usual.
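If several CVMs are affected, logging in to each one by hand can be replaced by a small loop. This is only a sketch under stated assumptions: the function name stop_foundation_on and the DRY_RUN switch are hypothetical, and it assumes that key-based SSH between CVMs works and that sourcing /etc/profile in a non-interactive session puts genesis on the PATH. If that does not hold on your cluster, run the command interactively as described above.

```shell
# Hypothetical helper: stop the Foundation service on each CVM IP given as an argument.
# Set DRY_RUN=1 to print the commands instead of running them.
stop_foundation_on() {
  # Assumption: /etc/profile sets up the PATH so that genesis is found over SSH.
  remote_cmd='source /etc/profile; genesis stop foundation'
  for ip in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf "ssh %s '%s'\n" "$ip" "$remote_cmd"   # show what would be run
    else
      ssh "$ip" "$remote_cmd"                      # stop Foundation on this CVM
    fi
  done
}
```

A dry run such as DRY_RUN=1 followed by stop_foundation_on 10.0.0.11 10.0.0.13 prints the two ssh commands without touching the cluster. Afterwards, rerunning allssh 'genesis status | grep foundation' should show empty brackets on every CVM.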








Please note that if you destroy a cluster, the Foundation service starts and keeps running until you create a new cluster or add the nodes to an existing cluster. The scenario above involved a production cluster that had not been destroyed, so my guess was correct: LCM uses the Foundation service to run certain operations.

