A company reached out to us on a Saturday with a Kubernetes emergency.
Initial brief
During the initial briefing we were told that the cluster's certificates had expired, causing apiserver calls to fail. Worker nodes had been taken offline, and the single master node's taints had been altered so it would run workloads, as a troubleshooting measure. At the time of the call, etcd CLI commands were failing and the apiserver was unable to contact etcd.
Investigation
We found that a collection of certificates had been generated with the tls_rotate tool provided by CoreOS. Examining these certs showed that they had the required subjects and subject alternative names; the collection also included corresponding kubeconfig files for the various stages of rotation. We decided to move forward with this set of certificates for the rotation.
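For reference, this kind of inspection can be done with openssl; the file name below is a placeholder rather than an actual path from the engagement:

    # Check the validity window and subject of a candidate certificate
    openssl x509 -in apiserver.crt -noout -subject -dates
    # List its subject alternative names
    openssl x509 -in apiserver.crt -noout -text | grep -A1 "Subject Alternative Name"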
Etcd
Although etcd CLI commands were failing with default arguments, passing the correct certificate paths as command line arguments allowed us to query etcd. This showed that the etcd certs had in fact been correctly rotated, and that the apiserver only needed the correct CA and client keypair to access etcd.
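As a rough sketch, the query looks like this; the endpoint and file paths are assumptions, and the flag names depend on the etcdctl version (etcd v2 uses --ca-file/--cert-file/--key-file instead):

    # Query etcd health with explicit TLS material instead of the defaults
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/ssl/etcd/ca.crt \
      --cert=/etc/ssl/etcd/client.crt \
      --key=/etc/ssl/etcd/client.key \
      endpoint health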
Kubelet API Server
The next step was to establish a working API server. We'd need to plant the etcd CA cert and client keypair on the API server. Typically this would be done by editing a Kubernetes secret object, but that wasn't possible since the apiserver could not reach etcd. We found that the apiserver was launched by kubelet from a static definition in /etc/kubernetes, alongside a static definition of the apiserver secrets. Planting the appropriate certificate files gave us a working apiserver, but only for a few seconds: once kubelet bootstrapped, the pod was recreated from the stale definitions in etcd. On each kubelet restart, the values loaded from those stale definitions were written back to the /etc/kubernetes definitions, so kubelet was effectively caching and persisting the expired certificates.

Using the brief window of availability, we updated the apiserver secrets with the new certificates using kubectl (sketched below). Importantly, the certificate data had to be base64-encoded. This same process would be used again when updating secrets and config maps for other core services. After updating /etc/kubernetes/kubeconfig with the new apiserver CA cert and restarting kubelet, we had a stable, usable apiserver.
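A minimal sketch of the secret update, assuming a secret named kube-apiserver with a data key etcd-ca.crt (both placeholders):

    # Secret data must be base64-encoded before it is written back
    NEW_CA=$(base64 -w0 etcd-ca.crt)
    kubectl -n kube-system patch secret kube-apiserver \
      -p "{\"data\":{\"etcd-ca.crt\":\"${NEW_CA}\"}}"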
Flannel Networking
With the API server working, workloads were still not being scheduled on the (master) node. Initially we found errors in pod/kubelet logs about a missing flannel file (which flannel generates at runtime). Kubelet logs showed it was unable to launch other pods (including the flannel pod itself) because of this same error. To solve this chicken-and-egg problem we manually created the file with the expected configuration (sketched below), which allowed flannel to boot and manage its own copy of the config from then on. With the overlay network functioning it was time to bring in a second node. We restored the master's taints so that workloads would no longer be scheduled onto it. Node02 was given the updated kubeconfig and successfully joined the cluster.
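The runtime file flannel expects is typically /run/flannel/subnet.env; the path and the values below are assumptions and would have to match the cluster's actual pod network settings:

    # Hand-create the file flannel would normally generate at runtime
    cat <<EOF > /run/flannel/subnet.env
    FLANNEL_NETWORK=10.2.0.0/16
    FLANNEL_SUBNET=10.2.0.1/24
    FLANNEL_MTU=1450
    FLANNEL_IPMASQ=true
    EOF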
Remaining kube-system Services
After node02 joined the cluster, workloads were still not being scheduled onto it. Examining the kube-system namespace, we found that many core pods were still absent from the node. These pods are defined as deployments, but they had never been scheduled because kube-scheduler was not running; the application workloads were unscheduled for the same reason. kube-scheduler is itself defined as a deployment, so it could not be scheduled without a living scheduler. To solve this second chicken-and-egg problem we used kubectl to extract a single pod definition from the scheduler deployment and ran that pod manually with kubectl (a sketch follows below). However, the scheduler then threw routing errors attempting to contact the apiserver.
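A sketch of that extraction, with deployment and node names assumed; setting spec.nodeName in the hand-built pod binds it to a node without needing a working scheduler:

    # Dump the deployment, then hand-edit its pod template into a bare Pod manifest
    kubectl -n kube-system get deployment kube-scheduler -o yaml > scheduler-deploy.yaml
    # scheduler-pod.yaml: kind: Pod, a metadata.name, the spec taken from
    # .spec.template.spec, plus spec.nodeName: <master-node> to bypass scheduling
    kubectl -n kube-system create -f scheduler-pod.yaml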
We found that the node was missing routes that kube-dns was responsible for creating, so we extracted a kube-dns pod definition and manually ran a single pod in the same way. This time kube-dns threw certificate errors when hitting the API, due to the outdated ca.crt in the service account secrets. Those secrets are managed by the controller manager, so we extracted a kube-controller-manager pod definition and manually ran a single pod. After we updated the controller manager's secret, the controller manager updated all of the downstream service accounts with the correct CA. This allowed kube-dns to create routes on the node, which allowed the scheduler to reach the API server, and with a living scheduler the workloads were successfully scheduled onto the node.
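One way to confirm that refresh, with the secret name and CA path below as placeholders, is to decode the ca.crt from a service account token secret and compare it against the rotated CA:

    # The token secret's ca.crt should now match the rotated CA
    kubectl -n kube-system get secret default-token-xxxxx \
      -o jsonpath='{.data.ca\.crt}' | base64 -d | \
      diff - /etc/kubernetes/ssl/ca.crt && echo "CA matches"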
Traefik DNS
With workloads scheduled, the web services were still unreachable through their HTTP endpoints in a browser, although they were accessible via curl using the service IP and a faked Host header. The expected URLs were failing to resolve. Traefik logs showed failures when attempting to create DNS records through AWS, and the configured credentials were confirmed to be invalid. After updating the credentials, the records were created, resolution succeeded, and the endpoints were reachable. At this point the cluster was handed over to the client for deployment and validation.
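The curl check looked roughly like this; the IP and hostname are illustrative:

    # Hit the service IP directly while faking the Host header the ingress expects
    curl -H "Host: app.example.com" http://10.3.0.50/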