Kubernetes cluster certificate management

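In a kubeadm-deployed Kubernetes cluster, the client and server certificates are valid for 1 year. kubeadm renews them automatically during a cluster upgrade, so they expire only if the cluster has not been upgraded in that time.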

The following instructions are for vanilla Kubernetes deployed with kubeadm. Please follow the relevant instructions if using other distributions of Kubernetes (Rancher, OpenShift, etc.), as there may be differences in the process.

MetalSoft strongly recommend that the certificates are monitored from an external source.
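
As one way to monitor from outside the cluster, the sketch below checks how close the API server's serving certificate is to expiry. It assumes GNU date and openssl are available on the monitoring host. The endpoint k8s-api.example.com:6443 is a hypothetical placeholder, and the function names are illustrative rather than part of any existing tooling.

```shell
# Print the notAfter date of the certificate served at host:port ($1).
remote_not_after() {
  openssl s_client -connect "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -enddate | cut -d= -f2
}

# Succeed if the date in $1 falls within $2 days of now.
# $3 optionally overrides "now" (epoch seconds). Requires GNU date (-d).
expires_within() {
  expiry=$(date -u -d "$1" +%s) || return 2
  now="${3:-$(date +%s)}"
  [ "$expiry" -le $(( now + $2 * 86400 )) ]
}

# Example (hypothetical endpoint), suitable for a cron job or monitoring probe:
#   if expires_within "$(remote_not_after k8s-api.example.com:6443)" 30; then
#     echo "WARNING: API server certificate expires within 30 days"
#   fi
```

This only sees the API server's serving certificate. The remaining certificates still need to be checked on the node with kubeadm certs check-expiration.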

Checking if certificates have expired

If the certificates expire, certain tasks will fail. For example, run the following on one of the Kubernetes nodes:

kubectl get pods -A

If the certificates have expired, this will result in the following error:

Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-03-23T14:32:50Z is after 2022-03-22T23:03:22Z

To check the certificates, run the following on the first node in the cluster:

kubeadm certs check-expiration

If the certificates have expired, you will receive output similar to this:

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Mar 22, 2022 23:03 UTC   <invalid>                               no
apiserver                  Mar 22, 2022 23:03 UTC   <invalid>       ca                      no
apiserver-etcd-client      Mar 22, 2022 23:03 UTC   <invalid>       etcd-ca                 no
apiserver-kubelet-client   Mar 22, 2022 23:03 UTC   <invalid>       ca                      no
controller-manager.conf    Mar 22, 2022 23:03 UTC   <invalid>                               no
etcd-healthcheck-client    Mar 22, 2022 23:03 UTC   <invalid>       etcd-ca                 no
etcd-peer                  Mar 22, 2022 23:03 UTC   <invalid>       etcd-ca                 no
etcd-server                Mar 22, 2022 23:03 UTC   <invalid>       etcd-ca                 no
front-proxy-client         Mar 22, 2022 23:03 UTC   <invalid>       front-proxy-ca          no
scheduler.conf             Mar 22, 2022 23:03 UTC   <invalid>                               no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Mar 20, 2031 23:03 UTC   8y              no
etcd-ca                 Mar 20, 2031 23:03 UTC   8y              no
front-proxy-ca          Mar 20, 2031 23:03 UTC   8y              no
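
To reduce this output to just the expired entries, the following minimal sketch filters it with awk. The function name expired_certs is illustrative, and it assumes the column layout shown above, where an expired certificate has <invalid> in the RESIDUAL TIME column.

```shell
# Print the name of every certificate whose RESIDUAL TIME is "<invalid>".
# Reads `kubeadm certs check-expiration` output on standard input.
expired_certs() {
  awk '{ for (i = 2; i <= NF; i++) if ($i == "<invalid>") { print $1; break } }'
}

# Typical use on a control-plane node:
#   kubeadm certs check-expiration | expired_certs
```

An empty result means none of the listed certificates is reported as expired.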

Renewing Kubernetes certificates

To renew the certificates, issue the following command on every control-plane node. If any control-plane node is skipped, etcd will fail:

kubeadm certs renew all

You should receive output similar to this:

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
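
Because the renewal has to run on every control-plane node, it can be convenient to drive it from one place. The sketch below loops over a list of nodes and stops at the first failure. The host names cp1, cp2 and cp3 are hypothetical, root SSH access is an assumption, and the function name is illustrative.

```shell
# Run `kubeadm certs renew all` on each node in turn, stopping at the first failure.
# $1 is the command used to reach a node (for example ssh); the rest are node names.
renew_on_nodes() {
  runner="$1"; shift
  for node in "$@"; do
    echo "Renewing certificates on $node"
    "$runner" "$node" kubeadm certs renew all || {
      echo "Renewal failed on $node" >&2
      return 1
    }
  done
}

# Example (hypothetical host names, assumes root SSH access):
#   renew_on_nodes ssh root@cp1 root@cp2 root@cp3
```

Passing the remote command as the first argument keeps the loop independent of how the nodes are reached.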

Restart kubelet on all nodes:

systemctl daemon-reload && systemctl restart kubelet

Once this is complete, copy admin.conf from /etc/kubernetes over the existing ~/.kube/config file:

cp /etc/kubernetes/admin.conf /root/.kube/config
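
The copy overwrites the existing kubeconfig, so any local changes to it are lost. As a more cautious variant, the sketch below keeps a timestamped backup first and restricts the file permissions. The function name and the backup naming scheme are illustrative assumptions.

```shell
# Copy a kubeconfig into place, keeping a timestamped backup of any existing file.
# $1 is the source file, $2 is the destination.
install_kubeconfig() {
  src="$1"; dst="$2"
  mkdir -p "$(dirname "$dst")" || return 1
  if [ -f "$dst" ]; then
    cp -p "$dst" "$dst.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  cp "$src" "$dst" && chmod 600 "$dst"
}

# Example, using the same paths as the command above:
#   install_kubeconfig /etc/kubernetes/admin.conf /root/.kube/config
```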

The Kubernetes static pods will then need to be restarted using the following procedure.

Warning

“Static Pods are managed by the local kubelet and not by the API Server, thus kubectl cannot be used to delete and restart them. To restart a static Pod you can temporarily remove its manifest file from /etc/kubernetes/manifests/ and wait for 20 seconds (see the fileCheckFrequency value in KubeletConfiguration struct). The kubelet will terminate the Pod if it’s no longer in the manifest directory. You can then move the file back and after another fileCheckFrequency period, the kubelet will recreate the Pod and the certificate renewal for the component can complete.”

mkdir -p /etc/kubernetes/_bak_manifests && mv /etc/kubernetes/manifests/* /etc/kubernetes/_bak_manifests/ && sleep 61 && mv /etc/kubernetes/_bak_manifests/* /etc/kubernetes/manifests/
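
The one-line command above can also be written as a small function, which makes the wait time adjustable and avoids an error when the manifest directory is empty. This is a sketch under the same assumptions as the command (default kubeadm paths), and the function name is illustrative.

```shell
# Move static Pod manifests aside, wait, then move them back.
# $1 manifest directory, $2 backup directory, $3 seconds to wait.
bounce_static_pods() {
  manifests="$1"; backup="$2"; wait_secs="$3"
  mkdir -p "$backup" || return 1
  # The -e test skips the unexpanded glob when the directory is empty.
  for f in "$manifests"/*; do
    if [ -e "$f" ]; then mv "$f" "$backup"/ || return 1; fi
  done
  sleep "$wait_secs"
  # Everything found in the backup directory is moved back, so start with it empty.
  for f in "$backup"/*; do
    if [ -e "$f" ]; then mv "$f" "$manifests"/ || return 1; fi
  done
}

# Example, matching the command above:
#   bounce_static_pods /etc/kubernetes/manifests /etc/kubernetes/_bak_manifests 61
```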

Check pods and certificates

Then check that the pods are all in the Running state:

kubectl get pod

You should receive output similar to this:

NAME                                  READY   STATUS    RESTARTS   AGE
auth-microservice-76d58f8666-6f24b    1/1     Running   0          55d
config-microservice-6cb6749b5-7h49r   1/1     Running   0          6d17h
controller-d96445fbf-pvz4s            1/1     Running   0          55d
couchdb-5cf5f9c6b4-xz9j5              1/1     Running   0          75d
event-microservice-5994c7f59d-zj4n7   1/1     Running   1          6d17h
gateway-api-74fffd489c-q5rpj          1/1     Running   2          6d17h
kafka-5bbb5b6b54-9wz7g                1/1     Running   0          294d
metal-cloud-ui-75b67dbdb9-2km7z       1/1     Running   0          55d
mysql-86f84d5f7b-49rs9                1/1     Running   0          55d
pdns-564dc7f7f4-wnwq8                 1/1     Running   0          55d
redis-5488cf8cb6-mnrhb                1/1     Running   0          55d
repo-76cc854495-m4nk7                 1/1     Running   0          55d
traefik-poc-67db8598c-svs2p           1/1     Running   0          71d
zookeeper-78cbb9749-cp25s             1/1     Running   0          55d
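
In a longer listing it is easy to miss a pod that is not healthy. As a minimal sketch, the filter below prints only the pods whose STATUS is neither Running nor Completed. It assumes the column layout shown above (kubectl get pod without -A, so STATUS is the third column), and the function name is illustrative.

```shell
# Print "name status" for every pod whose STATUS is not Running or Completed.
# Reads `kubectl get pod` output (without -A) on standard input.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Typical use:
#   kubectl get pod | pods_not_running
```

An empty result means every listed pod reports Running or Completed.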

Check that the certificates are now renewed by issuing this command:

kubeadm certs check-expiration

This should provide output similar to the below. The expiry date should now be 1 year in the future:

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Mar 23, 2023 19:27 UTC   364d                                    no
apiserver                  Mar 23, 2023 19:27 UTC   364d            ca                      no
apiserver-etcd-client      Mar 23, 2023 19:27 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Mar 23, 2023 19:27 UTC   364d            ca                      no
controller-manager.conf    Mar 23, 2023 19:27 UTC   364d                                    no
etcd-healthcheck-client    Mar 23, 2023 19:27 UTC   364d            etcd-ca                 no
etcd-peer                  Mar 23, 2023 19:27 UTC   364d            etcd-ca                 no
etcd-server                Mar 23, 2023 19:27 UTC   364d            etcd-ca                 no
front-proxy-client         Mar 23, 2023 19:27 UTC   364d            front-proxy-ca          no
scheduler.conf             Mar 23, 2023 19:27 UTC   364d                                    no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Mar 20, 2031 23:03 UTC   8y              no
etcd-ca                 Mar 20, 2031 23:03 UTC   8y              no
front-proxy-ca          Mar 20, 2031 23:03 UTC   8y              no

If using Calico as the CNI, ensure you also restart or recreate its node and controller pods.